Page 1: Learning Morphological Disambiguation Rules for Turkish

Learning Morphological Disambiguation Rules for Turkish

Deniz Yuret

Ferhan Türe

Koç University, İstanbul

Page 2: Learning Morphological Disambiguation Rules for Turkish

Overview

Turkish morphology
The morphological disambiguation task
The Greedy Prepend Algorithm
Training
Evaluation

Page 3: Learning Morphological Disambiguation Rules for Turkish

Turkish Morphology

Turkish is an agglutinative language: many syntactic phenomena expressed by function words and word order in English are expressed by morphology in Turkish.

I will be able to go.

(go) + (able to) + (will) + (I)

git + ebil + ecek + im

Gidebileceğim.

Page 4: Learning Morphological Disambiguation Rules for Turkish

Fun with Turkish Morphology

Avrupa      Europe
lı          European
laş         become
tır         make
ama         not able to
dık         we were
larımız     those that
dan         from
mış         were
sınız       you

Avrupalılaştıramadıklarımızdanmışsınız

Page 5: Learning Morphological Disambiguation Rules for Turkish

So how long can words be?

uyu – sleep
uyut – make X sleep
uyuttur – have Y make X sleep
uyutturt – have Z have Y make X sleep
uyutturttur – have W have Z have Y make X sleep
uyutturtturt – have Q have W have Z …
…
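The alternation in the list above can be reproduced with a few lines of Python. This is only a sketch of the pattern shown on the slide for the stem uyu (add "t", then "tur", then "t", and so on); the function name `causative_chain` is hypothetical, and the code makes no attempt at general Turkish morphology such as vowel harmony.

```python
def causative_chain(stem, depth):
    """Stack causative suffixes the way the slide does for 'uyu'.

    Assumption: after a form ending in 't' the next suffix is 'tur',
    otherwise it is 't'. This matches the slide's example only.
    """
    forms = [stem]
    for _ in range(depth):
        last = forms[-1]
        forms.append(last + ("tur" if last.endswith("t") else "t"))
    return forms


for form in causative_chain("uyu", 5):
    print(form)
```

Nothing in the pattern bounds `depth`, which is the point of the slide: word length has no fixed limit.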

Page 6: Learning Morphological Disambiguation Rules for Turkish

Morphological Analyzer for Turkish

masalı
masal+Noun+A3sg+Pnon+Acc (= the story)
masal+Noun+A3sg+P3sg+Nom (= his story)
masa+Noun+A3sg+Pnon+Nom^DB+Adj+With (= with tables)

Oflazer, K. (1994). Two-level description of Turkish morphology. Literary and Linguistic Computing.

Oflazer, K., Hakkani-Tür, D. Z., and Tür, G. (1999). Design for a Turkish treebank. EACL’99.

Beesley, K. R. and Karttunen, L. (2003). Finite State Morphology. CSLI Publications.

Page 7: Learning Morphological Disambiguation Rules for Turkish

Features, IGs and Tags

126 unique features
9129 unique IGs
∞ unique tags
11084 distinct tags observed in 1M word training corpus

masa+Noun+A3sg+Pnon+Nom^DB+Adj+With
stem: masa
features: Noun, A3sg, Pnon, Nom, Adj, With
inflectional groups (IGs): Noun+A3sg+Pnon+Nom and Adj+With
derivational boundary: ^DB
tag: Noun+A3sg+Pnon+Nom^DB+Adj+With
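The decomposition above can be sketched as a small parser. The function name `parse_analysis` is hypothetical, and the code assumes that the stem never contains "+" and that the tag is everything after the stem; neither assumption is stated on the slide.

```python
def parse_analysis(analysis):
    """Split an analyzer output string into stem, inflectional groups, and tag.

    Assumptions: the stem is the text before the first '+', and '^DB'
    separates inflectional groups (IGs).
    """
    stem, _, tag = analysis.partition("+")
    igs = [ig.strip("+").split("+") for ig in tag.split("^DB")]
    return stem, igs, tag


stem, igs, tag = parse_analysis("masa+Noun+A3sg+Pnon+Nom^DB+Adj+With")
print(stem)  # masa
print(igs)   # [['Noun', 'A3sg', 'Pnon', 'Nom'], ['Adj', 'With']]
print(tag)   # Noun+A3sg+Pnon+Nom^DB+Adj+With
```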

Page 8: Learning Morphological Disambiguation Rules for Turkish

Why not just do POS tagging?

from Oflazer (1999)

Page 9: Learning Morphological Disambiguation Rules for Turkish

Why not just do POS tagging?

Inflectional groups can independently act as heads or modifiers in syntactic dependencies.

Full morphological analysis is essential for further syntactic analysis.

Page 10: Learning Morphological Disambiguation Rules for Turkish

Morphological disambiguation

Ambiguity rare in English: lives = live+s or life+s

More serious in Turkish: 42.1% of the tokens ambiguous

1.8 parses per token on average

3.8 parses for ambiguous tokens

Page 11: Learning Morphological Disambiguation Rules for Turkish

Morphological disambiguation

Task: pick correct parse given context

1. masal+Noun+A3sg+Pnon+Acc

2. masal+Noun+A3sg+P3sg+Nom

3. masa+Noun+A3sg+Pnon+Nom^DB+Adj+With

– Uzun masalı anlat: Tell the long story
– Uzun masalı bitti: His long story ended
– Uzun masalı oda: Room with long table

Page 12: Learning Morphological Disambiguation Rules for Turkish

Morphological disambiguation

Task: pick correct parse given context

1. masal+Noun+A3sg+Pnon+Acc

2. masal+Noun+A3sg+P3sg+Nom

3. masa+Noun+A3sg+Pnon+Nom^DB+Adj+With

Key Idea

Build a separate classifier for each feature.

Page 13: Learning Morphological Disambiguation Rules for Turkish

Decision Lists

1. If (W = çok) and (R1 = +DA)

Then W has +Det

2. If (L1 = pek)

Then W has +Det

3. If (W = +AzI)

Then W does not have +Det

4. If (W = çok)

Then W does not have +Det

5. If TRUE

Then W has +Det

Examples:
“pek çok alanda” (matched by rule 1)
“pek çok insan” (matched by rule 2)
“insan çok daha” (matched by rule 4)
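Applying a decision list means returning the class of the first rule whose conditions all hold. The sketch below encodes the five +Det rules and runs them on the three example phrases. The names `matches` and `classify` are hypothetical, and the expansion of the capital letters in suffix patterns (A, I, D) into character classes is an assumption about the notation, not something the slide defines.

```python
import re

# Assumed meaning of the capital letters used in suffix patterns.
ARCHIPHONEMES = {"A": "[ae]", "I": "[ıiuü]", "D": "[dt]"}

# (conditions, has_Det) pairs in application order; {} is the "If TRUE" rule.
RULES = [
    ({"W": "çok", "R1": "+DA"}, True),
    ({"L1": "pek"}, True),
    ({"W": "+AzI"}, False),
    ({"W": "çok"}, False),
    ({}, True),
]


def matches(pattern, word):
    """'+XYZ' is a suffix pattern; any other pattern is an exact match."""
    if word is None:
        return False
    if pattern.startswith("+"):
        regex = "".join(ARCHIPHONEMES.get(ch, re.escape(ch)) for ch in pattern[1:])
        return re.search(regex + "$", word) is not None
    return word == pattern


def classify(words, i):
    """Return (rule number, has_Det) for the word at position i."""
    window = {
        "L1": words[i - 1] if i > 0 else None,
        "W": words[i],
        "R1": words[i + 1] if i + 1 < len(words) else None,
    }
    for number, (conditions, label) in enumerate(RULES, start=1):
        if all(matches(p, window[attr]) for attr, p in conditions.items()):
            return number, label


for phrase, i in [("pek çok alanda", 1), ("pek çok insan", 1), ("insan çok daha", 1)]:
    print(phrase, classify(phrase.split(), i))
```

Because the final rule has no conditions, `classify` always returns a result.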

Page 14: Learning Morphological Disambiguation Rules for Turkish

Greedy Prepend Algorithm

GPA(data)
  dlist = NIL
  default-class = Most-Common-Class(data)
  rule = [If TRUE Then default-class]
  while Gain(rule, dlist, data) > 0
    do dlist = prepend(rule, dlist)
       rule = Max-Gain-Rule(dlist, data)
  return dlist
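The pseudocode can be turned into a small runnable sketch. Two simplifications are assumptions of this sketch rather than part of the slide: candidate rules test a single attribute only, and Gain is taken to be the net number of training instances that become correctly classified when the rule is prepended. The toy data is invented for illustration.

```python
from collections import Counter


def classify(dlist, attrs):
    """Return the class of the first rule whose conditions are all present."""
    for conditions, label in dlist:
        if conditions <= attrs:
            return label
    return None


def gain(rule, dlist, data):
    """Net change in correctly classified instances if rule is prepended."""
    conditions, label = rule
    total = 0
    for attrs, gold in data:
        if conditions <= attrs:
            total += (label == gold) - (classify(dlist, attrs) == gold)
    return total


def max_gain_rule(dlist, data):
    """Best single-attribute rule (a simplification of the rule search)."""
    candidates = {(frozenset([a]), gold) for attrs, gold in data for a in attrs}
    return max(
        candidates,
        key=lambda r: (gain(r, dlist, data), sorted(r[0]), repr(r[1])),
        default=(frozenset(), None),
    )


def gpa(data):
    dlist = []
    default_class = Counter(gold for _, gold in data).most_common(1)[0][0]
    rule = (frozenset(), default_class)  # [If TRUE Then default-class]
    while gain(rule, dlist, data) > 0:
        dlist.insert(0, rule)  # prepend: newer rules take precedence
        rule = max_gain_rule(dlist, data)
    return dlist


# Invented toy training set for a single feature (True = has the feature).
DATA = [
    ({"W=çok", "L1=pek"}, True),
    ({"W=çok", "L1=pek"}, True),
    ({"W=çok", "L1=insan"}, False),
    ({"W=çok", "L1=ev"}, False),
    ({"W=çok", "L1=o"}, False),
    ({"W=bu", "L1=ev"}, True),
    ({"W=bu", "L1=o"}, True),
    ({"W=bu", "L1=insan"}, True),
]

for conditions, label in gpa(DATA):
    print(sorted(conditions) or "TRUE", "->", label)
```

Each prepended rule has positive gain, so training accuracy increases strictly and the loop terminates.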

Page 15: Learning Morphological Disambiguation Rules for Turkish

Training Data

1M words of news material
Semi-automatically disambiguated
Created 126 separate training sets, one for each feature
Each training set only contains instances which have the corresponding feature in at least one of their parses
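A minimal sketch of how such per-feature training sets could be assembled. The input layout (context attributes, candidate parses, correct parse) and the function names are hypothetical; the sketch also assumes that DB counts as a feature and that an instance is labeled True when the correct parse contains the feature.

```python
import re
from collections import defaultdict


def features(parse):
    """Feature names of a parse: everything after the stem, split on '+' and '^'."""
    return set(re.split(r"[+^]", parse)[1:])


def build_training_sets(tokens):
    """tokens: iterable of (context_attributes, candidate_parses, correct_parse)."""
    training_sets = defaultdict(list)
    for attrs, candidates, correct in tokens:
        possible = set().union(*(features(p) for p in candidates))
        gold = features(correct)
        # The token joins the set of every feature that occurs in any parse.
        for feature in possible:
            training_sets[feature].append((attrs, feature in gold))
    return training_sets


tokens = [
    (
        {"L1=~uzun", "W=~masalı", "R1=~anlat"},
        [
            "masal+Noun+A3sg+Pnon+Acc",
            "masal+Noun+A3sg+P3sg+Nom",
            "masa+Noun+A3sg+Pnon+Nom^DB+Adj+With",
        ],
        "masal+Noun+A3sg+Pnon+Acc",
    )
]
sets = build_training_sets(tokens)
print(sorted(sets))
```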

Page 16: Learning Morphological Disambiguation Rules for Turkish

Input attributes

For a five word window:
The exact word string (e.g. W=Ali'nin)
The lowercase version (e.g. W=ali'nin)
All suffixes (e.g. W=+n, W=+In, W=+nIn, W=+'nIn, etc.)
Character types (e.g. Ali'nin would be described with W=UPPER-FIRST, W=LOWER-MID, W=APOS-MID, W=LOWER-LAST)

On average, 40 attributes per instance.
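The attribute extraction can be sketched as follows. Several details are assumptions of this sketch: the prefixes "=" (exact), "~" (lowercase) and "+" (suffix), the cap of four characters on suffix length, and the use of raw suffix strings instead of the capital-letter classes such as +nIn. Python's `str.lower` also does not apply Turkish casing rules for I and İ.

```python
def char_type(ch):
    """Coarse character class; the set of classes is an assumption."""
    if ch.isupper():
        return "UPPER"
    if ch.islower():
        return "LOWER"
    if ch == "'":
        return "APOS"
    if ch.isdigit():
        return "DIGIT"
    return "OTHER"


def word_attributes(word, max_suffix=4):
    lower = word.lower()  # caveat: not Turkish-aware for 'I' and 'İ'
    attrs = {"=" + word, "~" + lower}
    # Raw suffixes of the lowercase form (no vowel-class abstraction here).
    for k in range(1, min(max_suffix, len(lower)) + 1):
        attrs.add("+" + lower[-k:])
    attrs.add(char_type(word[0]) + "-FIRST")
    attrs.update(char_type(ch) + "-MID" for ch in word[1:-1])
    attrs.add(char_type(word[-1]) + "-LAST")
    return attrs


def window_attributes(words, i):
    """Attributes for the five-word window centered on position i."""
    positions = {"L2": i - 2, "L1": i - 1, "W": i, "R1": i + 1, "R2": i + 2}
    result = set()
    for name, j in positions.items():
        if 0 <= j < len(words):
            result |= {f"{name}={a}" for a in word_attributes(words[j])}
    return result


print(sorted(window_attributes(["Ali'nin", "kitabı"], 0)))
```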

Page 17: Learning Morphological Disambiguation Rules for Turkish

Sample decision lists

+Acc
0
1 W=+InI
1 W=+yI
1 W=UPPER0
1 W=+IzI
1 L1=~bu
1 W=~onu
1 R1=+mAK
1 W=~beni
0 W=~günü
1 W=+InlArI
1 W=~onları
0 W=+olAyI
0 W=~sorunu
… (672 rules)

+Prop
1
0 W=STFIRST
0 W==Türk
1 W=STFIRST R1=UCFIRST
0 L1==.
0 W=+AnAl
1 R1==,
0 W=+yAD
1 W=UPPER0
0 W=+lAD
0 W=+AK
1 R1=UPPER0 W==Milli
1 W=STFIRST R1=UPPER0
… (3476 rules)

Page 18: Learning Morphological Disambiguation Rules for Turkish

Models for individual features

[Figure: number of rules (axis from 0 to 7000) and accuracy (axis from 84 to 100) of the models for individual features: A3sg, Noun, Pnon, Nom, DB, Verb, Adj, Pos, P3sg, P2sg, Prop, Zero, Acc, Adverb, A3pl]

Page 19: Learning Morphological Disambiguation Rules for Turkish

Combining models

masal+Noun+A3sg+P3sg+Nom
masal+Noun+A3sg+Pnon+Acc

Decision list results and confidence (only distinguishing features necessary):
P3sg = yes (89.53%)
Nom = no (93.92%)
Pnon = no (95.03%)
Acc = yes (89.24%)

score(P3sg+Nom) = 0.8953 x (1 – 0.9392)
score(Pnon+Acc) = (1 – 0.9503) x 0.8924
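The two scores can be evaluated with a short sketch. The function name `parse_score` and the dictionary layout are hypothetical; the arithmetic follows the two formulas above, multiplying the confidence for a distinguishing feature predicted "yes" and one minus the confidence for a feature predicted "no".

```python
# Decision list outputs from the slide: feature -> (predicted present?, confidence)
PREDICTIONS = {
    "P3sg": (True, 0.8953),
    "Nom": (False, 0.9392),
    "Pnon": (False, 0.9503),
    "Acc": (True, 0.8924),
}


def parse_score(distinguishing_features, predictions):
    """Product over the parse's distinguishing features of P(feature present)."""
    score = 1.0
    for feature in distinguishing_features:
        present, confidence = predictions[feature]
        score *= confidence if present else 1.0 - confidence
    return score


scores = {
    "masal+Noun+A3sg+P3sg+Nom": parse_score(["P3sg", "Nom"], PREDICTIONS),
    "masal+Noun+A3sg+Pnon+Acc": parse_score(["Pnon", "Acc"], PREDICTIONS),
}
for parse, score in scores.items():
    print(f"{parse}: {score:.4f}")
print("selected:", max(scores, key=scores.get))
```

With these numbers the first parse scores about 0.0544 and the second about 0.0444, so the P3sg+Nom parse would be selected.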

Page 20: Learning Morphological Disambiguation Rules for Turkish

Evaluation

Test corpus: 1000 words, hand tagged
Accuracy: 95.87% (conf. int: 94.57-97.08)
Better than the training data !?

Page 21: Learning Morphological Disambiguation Rules for Turkish

Other Experiments

Retraining on own output: 96.03%
Training on unambiguous data: 82.57%
Forget disambiguation, let’s do tagging with a single decision list: 91.23%, 10000 rules

Page 22: Learning Morphological Disambiguation Rules for Turkish

Contributions

Learning morphological disambiguation rules using GPA decision list learner.

Reducing data sparseness and increasing noise tolerance using separate models for individual output features.

ECOC, WSD, etc.