View
220
Download
1
Tags:
Embed Size (px)
Citation preview
CMSC 723 / LING 645: Intro to Computational Linguistics
September 15, 2004: Dorr
More about FSA’s, Finite State Morphology (J&M 3)
Prof. Bonnie J. DorrDr. Christof Monz
TA: Adam Lee
More about FSAs
TransducersEquivalence of DFSAs and NFSAsRecognition as search: depth-first, breadth-
search
Regular languages
Regular languages are characterized by FSAs
For every NFSA, there is an equivalent DFSA.
Regular languages are closed under concatenation, Kleene closure, union.
Morphology
Definitions and Problems– What is Morphology?– Topology of Morphologies
Approaches to Computational Morphology– Lexicons and Rules– Computational Morphology Approaches
Morphology
The study of the way words are built up from smaller meaning units called Morphemes
Syntax Lexeme/Inflected Lexeme Grammars sentences
Morphology Morpheme/Allomorph Morphotactics words
Phonology Phoneme/Allophone Phonotactics letters
Abstract versus Realized HOP +PAST hop +ed hopped /hapt/
Phonology and Morphology
Phonology vs. OrthographyHistorical spelling
– night, nite – attention, mission, fish
Script Limitations– Spoken English has 14 vowels
• heed hid hayed head had hoed hood who’d hide how’d taught Tut toy enough
– English Alphabet has 5• Use vowel combinatios: far fair fare• Consonantal doubling (hopping vs. hoping)
Syntax and Morphology
Phrase-level agreement– Subject-Verb
• John studies hard (STUDY+3SG)
– Noun-Adjective• Las vacas hermosas
Sub-word phrasal structures– נויספרבש– נו+ים+ספר+ב+ש– That+in+book+PL+Poss:1PL– Which are in our books
conj
prep
noun
posspluralarticle
Topology of Morphologies
Concatenative vs. Templatic
Derivational vs. Inflectional
Regular vs. Irregular
Concatenative Morphology
Morpheme+Morpheme+Morpheme+…Stems: also called lemma, base form, root, lexeme
– hope+ing hoping hop hoppingAffixes
– Prefixes: Antidisestablishmentarianism– Suffixes: Antidisestablishmentarianism– Infixes: hingi (borrow) – humingi (borrower) in Tagalog– Circumfixes: sagen (say) – gesagt (said) in German
Agglutinative Languages– uygarlaştıramadıklarımızdanmışsınızcasına– uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına– Behaving as if you are among those whom we could not cause to become civilized
Templatic Morphology
Roots and Patterns
وكتمب
ب K T B
و? ??َم�
تك
בוכת
ב
ו? ??
תכ
maktuubwritten
ktuuvwritten
Templatic Morphology: Root Meaning
KTB: writing “stuff”
כתב
מכתב
כתב
כתיבspelling
כתובתaddress
كتب
كاتب
مكتوب
كتابbook
مكتبةlibrary
مكتبoffice
write
writer
letter
Derivational vs. Inflectional
Word Classes– Parts of speech: noun, verb, adjectives, etc.– Word class dictates how a word combines with
morphemes to form new words
Derivational morphology
Nominalization: computerization, appointee, killer, fuzziness
Formation of adjectives: computational, clueless, embraceable
CatVar: Categorial Variation Databasehttp://clipdemos.umiacs.umd.edu/catvar/
Inflectional morphology
Adds: Tense, number, person, mood, aspect
Word class doesn’t changeWord serves new grammatical roleFive verb forms in EnglishOther languages have (lots more)
Nouns and Verbs (in English)
Nouns have simple inflectional morphology– cat– cat+s, cat+’s
Verbs have more complex morphology
Regulars and Irregulars
Nouns– Cat/Cats– Mouse/Mice, Ox, Oxen, Goose, Geese
Verbs– Walk/Walked– Go/Went, Fly/Flew
Regular (English) Verbs
Morphological Form Classes Regularly Inflected Verbs
Stem walk merge try map
-s form walks merges tries maps
-ing form walking merging trying mapping
Past form or –ed participle walked merged tried mapped
Irregular (English) Verbs
Morphological Form Classes Irregularly Inflected Verbs
Stem eat catch cut
-s form eats catches cuts
-ing form eating catching cutting
Past form ate caught cut
-ed participle eaten caught cut
Computational Morphology
Finite State Morphology– Finite State Transducers (FST)
Input/Output
Analysis/Generation
Computational Morphology
WORD STEM (+FEATURES)* cats cat +N +PL cat cat +N +SG cities city +N +PLgeese goose +N +PLducks (duck +N +PL) or
(duck +V +3SG)merging merge +V +PRES-PART caught (catch +V +PAST-PART) or
(catch +V +PAST)
Building a Morphological Parser
The Rules and the Lexicon– General versus Specific– Regular versus Irregular– Accuracy, speed, space– The Morphology of a language
Approaches– Lexicon only– Lexicon and Rules
• Finite-state Automata• Finite-state Transducers
– Rules only
Lexicon-only Morphology
acclaim acclaim $N$
acclaim acclaim $V+0$
acclaimed acclaim $V+ed$
acclaimed acclaim $V+en$
acclaiming acclaim $V+ing$
acclaims acclaim $N+s$
acclaims acclaim $V+s$
acclamation acclamation $N$
acclamations acclamation $N+s$
acclimate acclimate $V+0$
acclimated acclimate $V+ed$
acclimated acclimate $V+en$
acclimates acclimate $V+s$
acclimating acclimate $V+ing$
• The lexicon lists all surface level and lexical level pairs
• No rules …?
• Analysis/Generation is easy
• Very large for English
• What about Arabic or Turkish?
• Chinese?
Building a Morphological Parser
The Rules and the Lexicon– General versus Specific– Regular versus Irregular– Accuracy, speed, space– The Morphology of a language
Approaches– Lexicon only– Lexicon and Rules
• Finite-state Automata• Finite-state Transducers
– Rules only
Lexicon and Rules:FSA Inflectional Noun Morphology
reg-noun Irreg-pl-noun Irreg-sg-noun plural
fox
cat
dog
geese
sheep
mice
goose
sheep
mouse
-s
• English Noun Lexicon
• English Noun Rule
Lexicon and Rules: FSA English Verb Inflectional Morphology
reg-verb-stem irreg-verb-stem irreg-past-verb past past-part pres-part 3sg
walkfrytalkimpeach
cutspeakspokensing
sang
caughtateeaten
-ed -ed -ing -s
Morphological Parsing
Finite-state automata (FSA) – Recognizer– One-level morphology
Finite-state transducers (FST)– Two-level morphology
• PC-Kimmo (Koskenniemi 83)
– input-output pair
Terminology for PC-Kimmo
Upper = lexical tapeLower = surface tapeCharacters correspond to pairs, written a:b If “a:a”, write “a” for shorthandTwo-level lexical entries# = word boundary ^ = morpheme boundaryOther = “any feasible pair that is not in this transducer”Final states indicated with “:” and non-final states
indicated with “.”
Spelling Rules
Name Rule Description Example
Consonant Doubling 1-letter consonant doubled before -ing/-ed beg/begging
E-deletion Silent e dropped before -ing and -ed make/making
E-insertion e added after s,z,x,ch,sh before s watch/watches
Y-replacement -y changes to -ie before -s, -i before -ed try/tries
K-insertion verbs ending with vowel + -c add -k panic/panicked
FSTs and ambiguity
Parse Example 1: unionizable– union +ize +able– un+ ion +ize +able
Parse Example 2: assess– assessv– assN +essN
Parse Example 3: tender– tenderAJ– tenNum+dAJ+erCMP
What to do about Global Ambiguity?
Accept first successful structure
Run parser through all possible paths
Bias the search in some manner
Computational Morphology
The Rules and the Lexicon– General versus Specific– Regular versus Irregular– Accuracy, speed, space– The Morphology of a language
Approaches– Lexicon only– Lexicon and Rules
• Finite-state Automata• Finite-state Transducers
– Rules only
Computational Morphology
The Rules and the Lexicon– General versus Specific– Regular versus Irregular– Accuracy, speed, space– The Morphology of a language
Approaches– Lexicon only– Lexicon and Rules
• Finite-state Automata• Finite-state Transducers
– Rules only (next time!!)