Morphological Analysis of Hungarian in NooJ

Peter Vajda
Hungarian Academy of Sciences, Research Institute for Linguistics


Summary

1. Hungarian morphology

2. Linguistic resources

3. Some experiments with INTEX/NooJ

4. The solution

5. Examples

6. Derivation


Hungarian morphology

Hungarian is agglutinative (and sometimes inflectional). The suffixes:
• can have many forms (vowel harmony);
• can change the form of the stem (there are groups of stem variants): bokor (sg.) → bokr-ok (pl.); alma (sg.) → almá-k (pl.);
• sometimes begin with a linking vowel, e.g. the plural: -k / -ak / -ek / -ok / -ök.

A noun (adjective, numeral) can have about 700 to 800 forms; a verb can have about 80 forms.

Orthography: doubled digraphs cause difficulties, because only the first letter of a digraph is doubled: cs doubled is written ccs (not cscs), gy doubled is written ggy (not gygy).
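To make the allomorph choice above concrete, here is a minimal Python sketch of plural selection. It is not the NooJ grammar described in this talk: the vowel classes, the hand-set `lowering` flag and the function names are assumptions, and stem alternations such as bokor → bokr-ok are deliberately not handled.

```python
# Hypothetical sketch: choose the plural allomorph (-k/-ak/-ek/-ok/-ök)
# from the vowels of the stem. Assumptions (not the author's model):
#   * stem alternations such as bokor -> bokr- are NOT handled;
#   * "lowering" stems (ház -> házak) must be flagged by hand;
#   * i and í are treated as neutral vowels.

BACK = set("aáoóuú")
FRONT_ROUNDED = set("öőüű")
FRONT_UNROUNDED = set("eé")
NEUTRAL = set("ií")
VOWELS = BACK | FRONT_ROUNDED | FRONT_UNROUNDED | NEUTRAL

# Final a/e lengthen before a suffix: alma -> almá-k.
LENGTHEN = {"a": "á", "e": "é"}


def harmony_class(stem: str) -> str:
    """Classify a stem by its last non-neutral vowel."""
    for ch in reversed(stem.lower()):
        if ch in BACK:
            return "back"
        if ch in FRONT_ROUNDED:
            return "front_rounded"
        if ch in FRONT_UNROUNDED:
            return "front_unrounded"
    # Only neutral vowels (e.g. víz): default to front unrounded.
    return "front_unrounded"


def plural(stem: str, lowering: bool = False) -> str:
    """Return the nominative plural of a noun stem."""
    if stem[-1].lower() in VOWELS:
        # Vowel-final stems take a bare -k, lengthening final a/e.
        return stem[:-1] + LENGTHEN.get(stem[-1], stem[-1]) + "k"
    cls = harmony_class(stem)
    if lowering:
        # Lowering stems take -ak / -ek regardless of rounding.
        link = "a" if cls == "back" else "e"
    else:
        link = {"back": "o", "front_rounded": "ö", "front_unrounded": "e"}[cls]
    return stem + link + "k"


for noun in ["kalap", "kert", "gyümölcs", "alma", "hajó", "papír", "sofőr"]:
    print(noun, "->", plural(noun))
print("ház ->", plural("ház", lowering=True))
```

The `lowering` flag has to be supplied from outside because it cannot be read off the vowels: ház gives házak, while the similar-looking gáz gives gázok. Properties of this kind are what a dictionary-based approach stores as features of the stem.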


Nominal inflections

• 18 cases: nominative, accusative, dative, plus grammatical relations which are expressed by prepositions in French/English.
• Possession is expressed by suffixes, which mark the person and number of the possessor and the number of the possessed:
    ház-a-m, ház-a-d, ház-a (my/your/his/her house)
    ház-a-i-m, ház-a-i-d, ház-a-i (my/your/his/her houses)
• Anaphoric possessive: A ház Péteré 'The house is Péter’s'; A házak Péteréi 'The houses are Péter’s'.
• The maximal number of inflectional suffixes is five: barát-ai-tok-é-i-t '(I can see) those (things) of your (pl.) friends'.
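The possessive and anaphoric forms above are built by stacking suffixes in a fixed order. A minimal Python sketch of that concatenation follows; the function names and the hand-supplied possessive vowel are assumptions for illustration, only singular possessors are covered, and the possessor noun for -é is assumed to end in a consonant.

```python
# Hypothetical sketch: possessed and anaphoric forms built by plain
# concatenation in slot order. The possessive vowel ("a" for ház) is
# passed in by hand; only singular possessors are covered, and the
# possessor noun taking -é is assumed to end in a consonant.

PERSON_SG = {1: "m", 2: "d", 3: ""}


def possessed(stem: str, ps_vowel: str, person: int,
              plural_possessed: bool = False) -> str:
    """stem + possessive vowel + (plural -i) + person marker: ház-a-i-m."""
    return stem + ps_vowel + ("i" if plural_possessed else "") + PERSON_SG[person]


def anaphoric(possessor: str, plural_possessed: bool = False) -> str:
    """Anaphoric possessive -é, optionally followed by plural -i: Péter-é-i."""
    return possessor + "é" + ("i" if plural_possessed else "")


print([possessed("ház", "a", p) for p in (1, 2, 3)])
print([possessed("ház", "a", p, plural_possessed=True) for p in (1, 2, 3)])
print(anaphoric("Péter"), anaphoric("Péter", plural_possessed=True))
```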


Verbal inflections

• Two tenses (present, past); three moods (indicative, conditional, imperative); definite and indefinite conjugations:
    Néz-ek egy asztalt. 'I watch a table.'
    Néz-em az asztalt. 'I watch the table.'
• One special form where the subject is 1st person singular and the object is 2nd person: néz-lek 'I watch you'.
• The infinitive and the "conjugated infinitive" (sometimes rendered by the subjunctive in French).
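The choice between the indefinite, definite and -lak/-lek forms can be sketched for the 1st person singular present indicative. This is an illustrative Python sketch under stated assumptions, not the author's grammar: the suffix table and function names are hypothetical, and it covers only regular stems ending in a single consonant (a cluster-final stem such as tart takes -alak: tartalak; -ik verbs and verbs like ír → írok are also not handled).

```python
# Hypothetical sketch: 1sg present indicative of a regular verb stem
# ending in a single consonant. "indef" = indefinite conjugation,
# "def" = definite conjugation, "2obj" = 1sg subject with a 2nd person
# object (-lak/-lek). Cluster-final stems (tart -> tartalak), -ik verbs
# and exceptions such as ír -> írok are NOT covered.

BACK = set("aáoóuú")
FRONT_UNROUNDED = set("eé")
FRONT_ROUNDED = set("öőüű")

# object type -> (back, front unrounded, front rounded) suffix
SUFFIXES = {
    "indef": ("ok", "ek", "ök"),
    "def": ("om", "em", "öm"),
    "2obj": ("lak", "lek", "lek"),
}


def harmony_index(stem: str) -> int:
    """0 = back, 1 = front unrounded, 2 = front rounded (last non-neutral vowel)."""
    for ch in reversed(stem.lower()):
        if ch in BACK:
            return 0
        if ch in FRONT_UNROUNDED:
            return 1
        if ch in FRONT_ROUNDED:
            return 2
    return 1  # only i/í: default to front unrounded


def present_1sg(stem: str, obj: str) -> str:
    """obj is 'indef', 'def' or '2obj'."""
    return stem + SUFFIXES[obj][harmony_index(stem)]


for obj in ("indef", "def", "2obj"):
    print(present_1sg("néz", obj), present_1sg("lát", obj), present_1sg("köt", obj))
```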


The resources

Dictionary of Hungarian inflections (Elekfi, ’92): a traditional description, profound and exhaustive.
• Two-dimensional classification: vowel harmony (3 classes) and complex features of the stems (stem types, linking vowel, etc.; 55 classes).
• Altogether 1700 different sub-classes (paradigms).
• Systematic differences and similarities are hidden, so the classification is not convenient to use in finite-state transducers.

We have converted it into a database from which all the inflected forms can be retrieved.
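As an illustration of what "retrieving all the forms" from such a database can mean, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, the table names and the rows are hypothetical (only the class label C2A and the tags are borrowed from the dictionary entries shown on the next slide), and stem alternations are ignored: forms are produced by plain concatenation.

```python
import sqlite3

# Hypothetical schema: one row per (sub-class, suffix, tag) and one row
# per lemma with its sub-class. Rows are made up for illustration and
# stem alternations are ignored (stem + suffix concatenation only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paradigm (sub_class TEXT, suffix TEXT, tag TEXT);
CREATE TABLE lemma (lemma TEXT, sub_class TEXT);
""")
conn.executemany("INSERT INTO paradigm VALUES (?, ?, ?)", [
    ("C2A", "", "NOM"),
    ("C2A", "ak", "PL"),
    ("C2A", "at", "ACC"),
    ("C2A", "am", "PSe1"),
])
conn.execute("INSERT INTO lemma VALUES (?, ?)", ("ház", "C2A"))


def all_forms(word: str) -> list[tuple[str, str]]:
    """Retrieve every (form, tag) pair of a lemma by joining its paradigm."""
    rows = conn.execute(
        """SELECT l.lemma || p.suffix, p.tag
           FROM lemma AS l JOIN paradigm AS p ON l.sub_class = p.sub_class
           WHERE l.lemma = ?
           ORDER BY p.tag""",
        (word,),
    )
    return rows.fetchall()


print(all_forms("ház"))
```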


The experiments with INTEX/NooJ: the ‘brute-force’ method

• We created one graph per sub-class for testing INTEX: 1700 sub-graphs, with 45000 paths in the graphs…
• Using only dictionaries (.nod):
    • dictionary of stems (70000 words), e.g.
        ház,ház,N+C2A+stem=1+NW
    • dictionary of suffixes (one million entries), e.g.
        (*)ak,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PL}
        (*)am,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1}
        (*)at,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=ACC}
        (*)at,<$1=N+C2A1+stem=1>{$0,$1L,N$1S+ana=ACC}
        (*)amat,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1+ACC}
    • dictionary of lexical forms (which have a zero morpheme as suffix), e.g.
        ház,ház,N+ana=NOM
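The brute-force idea can be imitated in a few lines of Python: each listed suffix sequence records the stem class it attaches to, and a word is analysed by trying every split into a known stem plus a listed suffix sequence. This is a hypothetical sketch that only mimics the entries above; it does not reproduce NooJ's dictionary syntax or lookup machinery, and the names `STEMS`, `SUFFIXES`, `LEXICAL_FORMS` and `analyse` are invented for illustration.

```python
# Hypothetical sketch of the brute-force lookup: every suffix SEQUENCE is
# a separate entry naming the stem class it attaches to. The data mimic
# the dictionary entries above but are NOT NooJ syntax.

STEMS = {"ház": "N+C2A+stem=1"}

# Forms with a zero suffix are listed separately, as in the talk.
LEXICAL_FORMS = {"ház": [("ház", "NOM")]}

# suffix sequence -> list of (required stem class, analysis)
SUFFIXES = {
    "ak": [("N+C2A+stem=1", "PL")],
    "am": [("N+C2A+stem=1", "PSe1")],
    "at": [("N+C2A+stem=1", "ACC"), ("N+C2A1+stem=1", "ACC")],
    "amat": [("N+C2A+stem=1", "PSe1+ACC")],
}


def analyse(word: str) -> list[tuple[str, str]]:
    """Return (lemma, analysis) for each stem + suffix split that matches."""
    results = list(LEXICAL_FORMS.get(word, []))
    for i in range(len(word) - 1, 0, -1):
        stem, suffix = word[:i], word[i:]
        stem_class = STEMS.get(stem)
        if stem_class is None:
            continue
        for required, analysis in SUFFIXES.get(suffix, []):
            if required == stem_class:
                results.append((stem, analysis))
    return results


print(analyse("házamat"))
print(analyse("házakat"))  # PL+ACC is not listed, so nothing is found
```

In this scheme every new combination of suffixes (such as -ak-at) needs an entry of its own for every stem class, which is how the suffix table grows to the size quoted above.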


The linguistic solution

• Transform the database into a grammar based on morpho-phonological features.
• The grammatical features of stems and morphemes are in the dictionary.
• The features of the stems and the suffixes can be unified.

Grammar:
• describe the order of the morphemes;
• introduce features which select among the allomorphs.


The order of morphemes for nominals


Example: barát-a-i-tok-é-i-t

barát,N +PS +PL +ps_2 +ps_pl +ANAP +i +ACC
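The ordering constraint can be pictured as a small finite-state automaton over morpheme tags. The Python sketch below is a hypothetical, simplified reconstruction, not the graph shown in the talk: it encodes only the order stem, then plural or possessive (with person and number), then an optional anaphoric -é (with optional -i), then an optional case. Tags not on the slide (ps_1, ps_3, ps_sg, DAT) are assumptions.

```python
# Hypothetical, simplified automaton over morpheme tags for nominals.
# It is NOT the graph from the talk; it only encodes the order
#   stem > (PL | PS [PL] person number) > [ANAP [i]] > [case]
# Tags not shown on the slide (ps_1, ps_3, ps_sg, DAT) are assumptions.

PERSONS = ("ps_1", "ps_2", "ps_3")
NUMBERS = ("ps_sg", "ps_pl")
CASES = ("ACC", "DAT")

TRANSITIONS: dict[str, dict[str, str]] = {
    "stem": {"PL": "plural", "PS": "poss"},
    "poss": {"PL": "poss_pl", **{p: "person" for p in PERSONS}},
    "poss_pl": {p: "person" for p in PERSONS},
    "person": {n: "possessed" for n in NUMBERS},
    "anap": {"i": "anap_pl"},
}
# The anaphoric -é (ANAP) may follow a bare, plural or possessed stem.
for state in ("stem", "plural", "possessed"):
    TRANSITIONS.setdefault(state, {})["ANAP"] = "anap"
# A case suffix closes the word.
for state in ("stem", "plural", "possessed", "anap", "anap_pl"):
    for case in CASES:
        TRANSITIONS.setdefault(state, {})[case] = "case"

FINAL = {"stem", "plural", "possessed", "anap", "anap_pl", "case"}


def accepts(tags: list[str]) -> bool:
    """True if the tag sequence follows the slot order above."""
    state = "stem"
    for tag in tags:
        state = TRANSITIONS.get(state, {}).get(tag)
        if state is None:
            return False
    return state in FINAL


print(accepts(["PS", "PL", "ps_2", "ps_pl", "ANAP", "i", "ACC"]))  # barát-a-i-tok-é-i-t
print(accepts(["ACC", "PL"]))  # case before plural: rejected
```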


Morpho-phonological features

To introduce features, we examine the allomorphs:

HÁZ → HÁZ-A, HAJÓ → HAJÓ-JA (his/her house, his/her ship):
    ház,,N+nonj
    hajó,,N+j
HÁZ → HÁZ-AT, HAJÓ → HAJÓ-T (accusative):
    ház,,N+nonj+acclink
    hajó,,N+j+accnolink
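How such features select an allomorph can be sketched in Python: each allomorph lists the features it requires, and the first allomorph whose requirements are all present on the stem is chosen. This subset test is a simplification of unification, not NooJ's mechanism; the feature names nonj/j/acclink/accnolink come from the entries above, while the back/front features, the tables and `attach` are assumptions for illustration.

```python
# Hypothetical sketch: stems carry morpho-phonological features; each
# suffix allomorph lists the features it requires, and the first
# allomorph whose requirements are a subset of the stem's features wins
# (a simplification of unification). nonj/j/acclink/accnolink come from
# the slide; "back"/"front" are added assumptions. Other linking vowels
# (kalap-ot) would need further features.

LEXICON = {
    "ház": {"N", "nonj", "acclink", "back"},
    "hajó": {"N", "j", "accnolink", "back"},
}

ALLOMORPHS = {
    "PS3": [("a", {"nonj", "back"}), ("ja", {"j", "back"}),
            ("e", {"nonj", "front"}), ("je", {"j", "front"})],
    "ACC": [("at", {"acclink", "back"}), ("et", {"acclink", "front"}),
            ("t", {"accnolink"})],
}


def attach(stem: str, morpheme: str) -> str:
    """Append the allomorph of `morpheme` whose features the stem satisfies."""
    features = LEXICON[stem]
    for form, required in ALLOMORPHS[morpheme]:
        if required <= features:
            return stem + form
    raise ValueError(f"no allomorph of {morpheme} fits {stem}")


for stem in ("ház", "hajó"):
    print(attach(stem, "PS3"), attach(stem, "ACC"))
```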


The dictionary


The plural and the accusative

kalap-ot (hat, SG+ACC)
kalap-ok-at (hats, PL+ACC)
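The contrast between kalap-ot and kalap-ok-at can be sketched as follows: the stem's own linking vowel is used before the plural -k and the singular accusative -t, but after the plural -k the accusative linking vowel is only a or e. This is a hypothetical Python sketch; the lexicon layout is an assumption, and it covers only consonant-final stems that take a linking vowel in both positions (stems such as kés, with accusative kést, are left out).

```python
# Hypothetical sketch: the stem's linking vowel is stored in the lexicon
# and used before plural -k and singular accusative -t; after plural -k
# the accusative linking vowel is only a (back) or e (front). Only
# consonant-final stems that take a linking vowel in both positions are
# covered (not e.g. kés -> kést).

LEXICON = {
    "kalap": {"link": "o", "back": True},
    "ház": {"link": "a", "back": True},
    "kert": {"link": "e", "back": False},
    "gyümölcs": {"link": "ö", "back": False},
}


def sg_acc(stem: str) -> str:
    return stem + LEXICON[stem]["link"] + "t"


def pl(stem: str) -> str:
    return stem + LEXICON[stem]["link"] + "k"


def pl_acc(stem: str) -> str:
    # Plural -k is followed by -at / -et, never -ot / -öt.
    return pl(stem) + ("a" if LEXICON[stem]["back"] else "e") + "t"


for stem in LEXICON:
    print(stem, sg_acc(stem), pl_acc(stem))
```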


Derivation

• Derivation can change or leave the category (POS).
• It introduces new features:
    kosár 'basket' → kosar-ak (pl.)
    kosar-as 'basketball player' → kosar-as-ok (pl.)
• Simple cases are handled by graphs; others are listed as lemmas in the dictionary.
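A derivational suffix can be thought of as building a new lexicon entry with features of its own. The Python sketch below illustrates this with kosár → kosaras; the entry layout, the `suffix_stem` and `pl_link` fields and the function names are hypothetical and do not represent how the NooJ graphs encode derivation.

```python
# Hypothetical sketch: a derivational suffix builds a new lexicon entry
# from the stem variant used before suffixes (kosár -> kosar-), and the
# derived word gets its own category and inflectional features.
# Field names and values are assumptions for illustration.

LEXICON = {
    "kosár": {"pos": "N", "suffix_stem": "kosar", "pl_link": "a", "gloss": "basket"},
}

DERIVATIONS = {
    # suffix -> features of the derived word
    "as": {"pos": "N", "pl_link": "o", "gloss": "basketball player"},
}


def derive(lemma: str, suffix: str) -> str:
    """Add a derived lemma to the lexicon and return it."""
    base = LEXICON[lemma].get("suffix_stem", lemma)
    derived = base + suffix
    LEXICON[derived] = dict(DERIVATIONS[suffix], derived_from=lemma)
    return derived


def plural(lemma: str) -> str:
    entry = LEXICON[lemma]
    return entry.get("suffix_stem", lemma) + entry["pl_link"] + "k"


new = derive("kosár", "as")
print(plural("kosár"), new, plural(new))
```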


Assimilation and digraphs

Some suffixes (e.g. -val/-vel) enforce total assimilation: their initial v becomes a copy of the stem-final consonant.
    LÉC + VEL → LÉCCEL
    PÉCS + VEL → PÉCCSEL
    PLÉD + VEL → PLÉDDEL
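The interaction of assimilation and digraph spelling can be sketched in Python: after a vowel the v of -val/-vel surfaces, and after a consonant it is replaced by a copy of that consonant, where a digraph is doubled by repeating only its first letter. This is a hypothetical illustration, not the author's grammar; the helper names are invented, harmony is read from the last back or front vowel, and stems already ending in a double consonant (meggy, toll) are not handled.

```python
# Hypothetical sketch of the -val/-vel suffix. Assumptions: harmony is
# read from the last back/front vowel (i, í neutral); stems already
# ending in a double consonant (meggy, toll) are NOT handled.

DIGRAPHS = ("dzs", "cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")
BACK = set("aáoóuú")
FRONT = set("eéöőüű")
VOWELS = BACK | FRONT | set("ií")
LENGTHEN = {"a": "á", "e": "é"}


def is_back(stem: str) -> bool:
    for ch in reversed(stem.lower()):
        if ch in BACK:
            return True
        if ch in FRONT:
            return False
    return False  # only i/í: treated as front


def final_consonant(stem: str) -> str:
    """Return the last consonant letter, recognising digraphs and dzs."""
    low = stem.lower()
    for letter in DIGRAPHS:  # dzs is tested before dz and zs
        if low.endswith(letter):
            return stem[-len(letter):]
    return stem[-1]


def instrumental(stem: str) -> str:
    vowel = "a" if is_back(stem) else "e"
    if stem[-1].lower() in VOWELS:
        # After a vowel the v surfaces; final a/e lengthen (alma -> almával).
        return stem[:-1] + LENGTHEN.get(stem[-1], stem[-1]) + "v" + vowel + "l"
    # Total assimilation: v becomes a copy of the final consonant;
    # a digraph is doubled by repeating only its first letter (cs -> ccs).
    cons = final_consonant(stem)
    return stem[:-len(cons)] + cons[0] + cons + vowel + "l"


for word in ("léc", "Pécs", "pléd", "kalap", "alma"):
    print(word, "->", instrumental(word))
```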


Conclusion
• We have adapted the traditional description.
• We have described the inflectional morphology of Hungarian in NooJ grammars/dictionaries.
• We have handled some of the derivational morphology.

Objectives
• Find a simpler method for derivation.
• Disambiguation.
• Automatic methods to expand the dictionary.
• Automatic delegation of features.


Thank you