Morphological Analysis of Hungarian in NooJ
Peter Vajda
Hungarian Academy of Sciences
Research Institute for Linguistics
Summary
1. Hungarian morphology
2. Linguistic resources
3. Some experiments with INTEX/NooJ
4. The solution
5. Examples
6. Derivation
Hungarian morphology
Agglutinative (and sometimes inflectional)
The suffixes
  Can have many forms (vowel harmony)
  Can change the form of the stem (there are groups of variants)
    bokor (sg.), bokr-ok (pl.); alma (sg.), almá-k (pl.)
  Sometimes begin with a linking vowel
    plural: -k / -ak / -ek / -ok / -ök
A noun (adj., num.) can have ~700-800 forms
A verb can have ~80 forms
Orthography: there are difficulties when digraphs are doubled
  cs: cscs → ccs, gy: gygy → ggy
Nominal inflections
18 cases (nominative, accusative, dative + grammatical relations which are expressed by prepositions in French/English)
Expression of the possessives by suffixes, which mark the person and number of the possessor and the number of the possessed
  ház-a-m, ház-a-d, ház-a (my/your/his house)
  ház-a-i-m, ház-a-i-d, ház-a-i (my/your/his houses)
Anaphorical possessive
  A ház Péteré (The house is Péter's); A házak Péteréi (The houses are Péter's)
The maximal number of inflections is five
  barát-ai-tok-é-i-t: (I can see) those (things) of your friends'
Verbal inflections
Two tenses: present, past
Three moods: indicative, conditional, imperative
Definite and indefinite conjugations
  Néz-ek egy asztalt (I watch a table)
  Néz-em az asztalt (I watch the table)
One special form where the subject is in the 1st person and the object is in the 2nd: néz-lek (I watch you)
Infinitive and "conjugated infinitive" (sometimes subjunctive in French)
The resources
Dictionary of Hungarian inflections (Elekfi, '92)
  A traditional description, profound and exhaustive
  Two-dimensional classification:
    Vowel harmony (3 classes)
    Complex features of the stems (stem-types, linking vowel, etc., 55 classes)
  Altogether: 1700 different sub-classes (paradigms)
  Systematic differences and similarities are hidden
  Not convenient to use in finite-state transducers
We have converted it into a database from which we can retrieve all the forms
The experiments with INTEX/NooJ
‘Brute-force’ method
  We created one graph per sub-class for testing INTEX
  1700 sub-graphs
  45000 paths in the graphs…
Using only dictionaries (.nod)
  Dictionary of stems (70000 words)
    ház,ház,N+C2A+stem=1+NW
  Dictionary of suffixes (one million entries)
    (*)ak,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PL}
    (*)am,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1}
    (*)at,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=ACC}
    (*)at,<$1=N+C2A1+stem=1>{$0,$1L,N$1S+ana=ACC}
    (*)amat,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1+ACC}
  Dictionary of lexical forms (which have a zero morpheme as suffix)
    ház,ház,N+ana=NOM
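The dictionary-only approach above can be illustrated with a small sketch. It is a simplified model, not NooJ's own matching procedure: the tuple layout of `SUFFIXES` and the helpers `parse_stem_entry` and `analyse` are hypothetical, and the suffix constraints are reduced to "the stem must carry all of these features".

```python
# Sketch only: a simplified model of stem + suffix dictionary lookup.
# The data layout and function names are assumptions for illustration,
# not NooJ's actual .nod format or matching algorithm.

def parse_stem_entry(line):
    """Split 'form,lemma,F1+F2+...' into (form, lemma, set of features)."""
    form, lemma, feats = line.split(",", 2)
    return form, lemma, set(feats.split("+"))

STEMS = [parse_stem_entry("ház,ház,N+C2A+stem=1+NW")]

# (suffix string, features the stem must carry, analysis tag)
SUFFIXES = [
    ("ak", {"N", "C2A", "stem=1"}, "PL"),
    ("am", {"N", "C2A", "stem=1"}, "PSe1"),
    ("at", {"N", "C2A", "stem=1"}, "ACC"),
    ("at", {"N", "C2A1", "stem=1"}, "ACC"),  # other sub-class, must not match ház
    ("amat", {"N", "C2A", "stem=1"}, "PSe1+ACC"),
]

def analyse(word):
    """Return every 'lemma,N+ana=TAG' analysis licensed by the two dictionaries."""
    results = []
    for form, lemma, feats in STEMS:
        if word == form:
            # Zero-morpheme case: the bare stem is the nominative.
            results.append(f"{lemma},N+ana=NOM")
        for suffix, required, tag in SUFFIXES:
            # The suffix applies only if the stem has all required features.
            if word == form + suffix and required <= feats:
                results.append(f"{lemma},N+ana={tag}")
    return results

print(analyse("házat"))
print(analyse("házamat"))
```

Because every suffix entry repeats the sub-class features of the stems it attaches to, the suffix list grows with the number of sub-classes, which is consistent with the one million entries reported above.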
The linguistic solution
Transform the database into a grammar based on morpho-phonological features
The grammatical features of stems and morphemes are in the dictionary
The features of the stems and the suffixes can be unified
• Grammar
• We have to describe the order of the morphemes
• Introduce features which select from the allomorphs
The order of morphemes for nominals
barát-a-i-tok-é-i-t
barát,N +PS +PL +ps_2 +ps_pl +ANAP+i +ACC
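The fixed order of nominal morphemes can be sketched as a sequence of ranked slots. The pairing of each morpheme with its tag and the numeric ranks in `SLOT_RANK` are assumptions read off the single example above, and `in_slot_order` and `analysis` are hypothetical helpers, not part of the NooJ grammar.

```python
# Sketch only: the slot ranks and the morpheme-to-tag pairing are assumptions
# inferred from the example barát-a-i-tok-é-i-t.

SLOT_RANK = {
    "PS": 1,     # possessive marker
    "PL": 2,     # plural
    "ps_2": 3,   # person of the possessor
    "ps_pl": 3,  # number of the possessor (same morpheme as ps_2 here)
    "ANAP": 4,   # anaphoric possessive
    "i": 5,      # -i following the anaphoric possessive
    "ACC": 6,    # case
}

def in_slot_order(tags):
    """True if the tags never step back to an earlier slot."""
    ranks = [SLOT_RANK[t] for t in tags]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

def analysis(stem, pos, segments):
    """Build 'stem,POS+TAG+...' from (morpheme, tags) pairs, checking the order."""
    tags = [t for _, tag in segments for t in tag.split("+")]
    if not in_slot_order(tags):
        raise ValueError("morphemes are out of order")
    return f"{stem},{pos}+" + "+".join(tags)

segments = [("a", "PS"), ("i", "PL"), ("tok", "ps_2+ps_pl"),
            ("é", "ANAP"), ("i", "i"), ("t", "ACC")]
print("barát" + "".join(m for m, _ in segments))
print(analysis("barát", "N", segments))
```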
Morpho-phonological features
To introduce features we examine the allomorphs
  HÁZ                     HAJÓ
  HÁZ - A                 HAJÓ - JA
  ház,,N+nonj             hajó,,N+j
  HÁZ - AT                HAJÓ - T
  ház,,N+nonj+acclink     hajó,,N+j+accnolink
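How a stem feature selects one allomorph can be sketched as a lookup. The feature names come from the entries above; the `LEXICON` and `ALLOMORPHS` tables, the category labels `POSS` and `ACC`, and the `inflect` helper are hypothetical. The sketch fixes the linking vowel of the accusative to -at, which holds for ház but not for every stem (compare kalap - ot).

```python
# Sketch only: feature-driven allomorph selection for the two stems shown.
# Table layout and function name are assumptions for illustration.

LEXICON = {
    "ház": {"N", "nonj", "acclink"},
    "hajó": {"N", "j", "accnolink"},
}

# For each category: which stem feature selects which allomorph.
ALLOMORPHS = {
    "POSS": {"nonj": "a", "j": "ja"},
    "ACC": {"acclink": "at", "accnolink": "t"},
}

def inflect(stem, category):
    """Attach the allomorph whose selecting feature is present on the stem."""
    feats = LEXICON[stem]
    for feature, suffix in ALLOMORPHS[category].items():
        if feature in feats:
            return stem + suffix
    raise ValueError(f"no allomorph of {category} matches {stem}")

for stem in LEXICON:
    print(inflect(stem, "POSS"), inflect(stem, "ACC"))
```

With features on the stems, each suffix is listed once per allomorph rather than once per sub-class.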
The dictionary
The plural and the accusative
kalap - ot (hat, SG+ACC)
kalap - ok - at (hats, PL+ACC)
Derivation
Can change or leave the category (POS)
Introduce new features
  kosár, kosar - ak (pl.): basket
  kosar-as, kosar - as - ok (pl.): basketball player
Simple cases are handled by graphs
Others are listed as lemmas in the dictionary
Assimilation and digraphs
Some suffixes (e.g. val/vel) enforce total assimilation:
  LÉC + VEL → LÉCCEL
  PÉCS + VEL → PÉCCSEL
  PLÉD + VEL → PLÉDDEL
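The three examples above can be reproduced with a short sketch. It is a minimal illustration under stated assumptions, not the NooJ grammar: `add_val_vel` is a hypothetical helper, the caller chooses between val and vel (vowel harmony is not modelled), and trigraphs, stems that already end in a long consonant, and letter pairs that only look like digraphs are not handled.

```python
# Sketch only: total assimilation of the initial v of val/vel, with the
# orthographic shortening of doubled digraphs (cs + cs is written ccs).

DIGRAPHS = ("cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")
VOWELS = set("aáeéiíoóöőuúüű")

def add_val_vel(stem, suffix):
    """Attach 'val' or 'vel' (chosen by the caller) to a stem."""
    lower = stem.lower()
    if lower[-1] in VOWELS:
        # After a vowel the v is kept: hajó + val gives hajóval.
        return stem + suffix
    if lower.endswith(DIGRAPHS):
        # Doubled digraph: only its first letter is written twice.
        return stem[:-2] + stem[-2] * 2 + stem[-1] + suffix[1:]
    # Single-letter consonant: v becomes a copy of it.
    return stem + stem[-1] + suffix[1:]

for stem in ("léc", "Pécs", "pléd"):
    print(add_val_vel(stem, "vel"))
```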
Conclusion
We have adapted the traditional description
We have described the inflectional morphology of Hungarian in NooJ grammars/dictionaries
Handled some of the derivational morphology
Objectives
Find a simpler method for derivation
Disambiguation
Automatic methods to expand the dictionary
Automatic delegation of features
Thank you