Jing-Shin Chang
Morphology & Finite-State Transducers
Morphology: the study of the constituents of words
  Word = {a set of morphemes, combined in language-dependent ways}
  Morpheme: a small meaning-bearing unit, e.g., books = book + s, cats = cat + s
Classes of morphemes
  stem (root)
  affixes (詞綴)
Morphological parsing (or analysis): breaking down surface forms (input forms) into stem and affixes
  e.g., foxes = “fox” + “-es” (+N, +PL)
  Stemming: mapping a surface form to its stem (extracting the stem from the surface form)
Morphological generation: generating surface forms from a stem and morphological features
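The relationship between parsing, stemming, and generation can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the three-word lexicon, the `+SG` tag for bare stems, and the function names `parse` and `generate` are hypothetical and do not come from the slides.

```python
# Toy lexicon (hypothetical entries): stem -> part of speech.
LEXICON = {"fox": "N", "cat": "N", "book": "N"}

# Plural suffixes, longest first so that "-es" is tried before "-s".
PLURAL_SUFFIXES = ["es", "s"]


def parse(surface):
    """Morphological parsing: return a list of (stem, features) analyses."""
    analyses = []
    if surface in LEXICON:
        # Bare stem; the +SG tag is an assumption made for this sketch.
        analyses.append((surface, ["+" + LEXICON[surface], "+SG"]))
    for suffix in PLURAL_SUFFIXES:
        stem = surface[: -len(suffix)]
        if surface.endswith(suffix) and LEXICON.get(stem) == "N":
            analyses.append((stem, ["+N", "+PL"]))
            break
    return analyses


def generate(stem, features):
    """Morphological generation: build a surface form from stem + features."""
    if "+PL" in features:
        suffix = "es" if stem.endswith(("s", "x", "z", "ch", "sh")) else "s"
        return stem + suffix
    return stem


print(parse("foxes"))                  # analysis of the slide's example
print(parse("cats")[0][0])             # stemming keeps only the stem
print(generate("fox", ["+N", "+PL"]))  # generation runs the other way
```

Stemming is the special case of parsing in which only the stem of the analysis is kept.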
Morphology & Finite-State Transducers
Applications: spelling check, tokenization for parsing
Knowledge for morphological analysis
  Morphological rules (morphotactics): the constituents of words and their order
  Spelling rules (orthographic rules): spelling changes
  Dictionary/lexicon: list of stems and affixes
    stems of regular words (plus irregular variants) serve as indexing keys
    enumerating all morphological variants is not efficient
    some morphemes are productive: they can be applied to all words or to new words, so it is impossible to list all the resulting forms
    morphological variants depend on spelling as well as pronunciation
    morphologically complex languages (e.g., Turkish) may have a large number of morphological variants
Morphology & Finite-State Transducers
Models for morphological analysis/generation
  Generate-and-test: enumerate all possibilities and test them against constraints
  FSA / two-level FST model: model the lexicon, morphological rules, and orthographic rules as finite-state automata or transducers
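The generate-and-test model can be illustrated with a short Python sketch. It covers only the first of the two models above, not the FST model. The stem set, the suffix set, and the function names are assumptions chosen for this example.

```python
# Hypothetical toy lexicon: known stems and known suffixes.
STEMS = {"fox", "cat", "book", "walk"}
SUFFIXES = {"", "s", "es", "ed", "ing"}


def candidates(surface):
    """Generate step: every way to cut the surface form into stem + suffix."""
    for i in range(1, len(surface) + 1):
        yield surface[:i], surface[i:]


def analyze(surface):
    """Test step: keep only the candidates licensed by the lexicon."""
    return [
        (stem, suffix)
        for stem, suffix in candidates(surface)
        if stem in STEMS and suffix in SUFFIXES
    ]


print(analyze("foxes"))
print(analyze("walking"))
print(analyze("dogs"))  # no stem "dog" in the toy lexicon
```

The approach is easy to write but tests every split point against the lexicon, which is one reason to prefer compiling the lexicon and rules into finite-state machines.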
English Morphology
Morphology: the study of the way words are built up from smaller meaning-bearing units (morphemes)
  Morpheme: the minimal meaning-bearing unit in a language
Classes of morphemes
  stem (root): the main morpheme of the word, supplying the main meaning
  affixes (詞綴): add additional meanings
Affixes
  prefixes: un-happy
  suffixes: eat-s
  infixes: inserted inside the stem
    Tagalog (a Philippine language): hingi (“borrow”) => h-um-ingi (agent of borrow)
  circumfixes: German sagen (“to say”) => ge-sag-t (“said”, past participle)
English Morphology
Affixes
  concatenative: prefixes and suffixes
  non-concatenative: infixes and templatic morphology
Templatic (root-and-pattern) morphology: Arabic, Hebrew, and other Semitic languages
  Hebrew: lmd (“learn”, “study”) (tri-consonantal root)
    active voice template CaCaC => lamad (‘he studied’)
    intensive template CiCeC => limed (‘he taught’)
    intensive passive template CuCaC => lumad (‘he was taught’)
Multiple affixes: un-believabl-y
Agglutinative languages: languages that tend to string affixes together (Turkish, Japanese, Korean)
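Root-and-pattern morphology can be made concrete by interleaving the consonants of a root with the vowels of a template. The sketch below reproduces the Hebrew lmd examples from this slide. The function name `apply_template` and the convention that `C` marks a consonant slot are assumptions of the sketch.

```python
def apply_template(root, template):
    """Fill each 'C' slot of the template with the next root consonant."""
    if template.count("C") != len(root):
        raise ValueError("template slots and root consonants do not match")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in template)


for template in ("CaCaC", "CiCeC", "CuCaC"):
    print(template, "=>", apply_template("lmd", template))
```

Because the root consonants are spread across the word rather than attached at one edge, this kind of morphology cannot be described by simple prefix or suffix concatenation.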
English Morphology
Inflection: stem + morphemes => same class
  e.g., book + s => books (same meaning, same part of speech (詞類))
Derivation: stem + morphemes => different class
  e.g., computerize + ation => computerization [verb => noun]
English Morphology
Inflectional morphology
  only nouns, verbs, adjectives, and adverbs can be inflected
Noun: plural, possessive
  Regular: plural (+s/+es/+ies), possessive (+’s, +s’)
  Irregular: ox => ox-en, mouse => mice
Verb (main, modal/auxiliary, primary/be)
  Forms: stem (present / infinitive), -s (present, 3rd person singular), -ing (gerund / present participle), -ed (past / past participle / perfect)
  Regular: (+s/+es, -y+ies), -e+ing/+ing/+.ing (consonant doubling), +d/+ed/+.ed
  Irregular: e.g., eat => ate, eaten (+en), catch => caught
  Consonant doubling: (short vowel) + single consonant => double the consonant; -c => -ck (picnicked)
Adjective/adverb: comparative, superlative
  happy => happier, happiest, happily
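The regular spelling patterns listed above can be written as ordered rules. The following Python sketch is a rough approximation, not a complete account of English spelling: the function names are hypothetical, the doubling test looks only at letters (consonant + vowel + consonant), and the -y => -ied rule for past forms is added here although it is not listed on the slide. Because doubling really depends on pronunciation, the letter-based test is reliable mainly for one-syllable verbs, and irregular forms such as ate or caught still have to come from the lexicon.

```python
VOWELS = "aeiou"


def plural(noun):
    """Regular noun plural: +es after sibilants, -y+ies, otherwise +s."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"


def _doubles(verb):
    """Letter-based guess: consonant + vowel + consonant ending (not w, x, y)."""
    return (
        len(verb) >= 3
        and verb[-1] not in VOWELS + "wxy"
        and verb[-2] in VOWELS
        and verb[-3] not in VOWELS
    )


def ing_form(verb):
    if verb.endswith("c"):
        return verb + "king"           # -c => -ck
    if verb.endswith("e") and not verb.endswith(("ee", "oe", "ye")):
        return verb[:-1] + "ing"       # -e+ing
    if _doubles(verb):
        return verb + verb[-1] + "ing" # consonant doubling
    return verb + "ing"


def past_form(verb):
    if verb.endswith("c"):
        return verb + "ked"            # picnic => picnicked
    if verb.endswith("e"):
        return verb + "d"              # +d
    if verb.endswith("y") and len(verb) > 1 and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"       # assumption: not listed on the slide
    if _doubles(verb):
        return verb + verb[-1] + "ed"  # consonant doubling
    return verb + "ed"


print(plural("fox"), plural("fly"), plural("cat"))
print(ing_form("make"), ing_form("stop"), ing_form("eat"))
print(past_form("picnic"), past_form("stop"), past_form("walk"))
```

The third-person singular -s form of regular verbs follows the same spelling pattern as `plural`.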
English Morphology
Derivational morphology
  usually results in a different class
  needs part-of-speech (POS) conversion from the root POS and the affixes to get the correct POS
Nominalization: V/A => N
  computerize => computerization
  more examples …
N/V => A
  computation => computational
  more examples …
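The POS conversion step can be pictured as a lookup keyed on the suffix and the POS of the base. The sketch below is a hypothetical illustration limited to the two examples on this slide. The rule table, the two-word lexicon, and the function name `derive` are assumptions, and the final -e deletion is the only spelling change it handles.

```python
# Hypothetical rule table: (suffix, POS of base, POS of result).
DERIVATION_RULES = [
    ("ation", "V", "N"),  # computerize => computerization
    ("al", "N", "A"),     # computation => computational
]

# Toy lexicon giving the POS of each base form.
LEXICON = {"computerize": "V", "computation": "N"}


def derive(base, suffix):
    """Return (derived word, derived POS) if a rule licenses base + suffix."""
    base_pos = LEXICON[base]
    for rule_suffix, from_pos, to_pos in DERIVATION_RULES:
        if rule_suffix == suffix and from_pos == base_pos:
            # Drop a final -e before a vowel-initial suffix.
            drop_e = base.endswith("e") and suffix[0] in "aeiou"
            stem = base[:-1] if drop_e else base
            return stem + suffix, to_pos
    raise ValueError(f"no rule attaches -{suffix} to a base of class {base_pos}")


print(derive("computerize", "ation"))
print(derive("computation", "al"))
```

Keying the rules on the base POS is what blocks combinations such as attaching -ation to a noun.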
Chinese Morphology
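Chinese morphemes
  hard to distinguish from characters, words, and compound words
  free morphemes
  bound morphemes
Examples
  副-總統 (vice-president), 前-妻 (ex-wife), 非-經濟 (因素) (non-economic (factors))
  學生-們 (students)
  哈日-族 (Japanophiles), 銀髮-族 (the silver-haired, i.e. senior citizens)
  工業-化 (industrialization), 綠-化 (greening), 藍-化 (“turning blue”), 腐-化 (corruption, decay), 石-化 (petrification), 神-化 (deification)
  公務-員 (civil servant), 業務-員 (sales representative), 推銷-員 (salesperson), 運動-員 (athlete)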
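The examples above all consist of a free stem plus a single-character bound morpheme at one edge, which suggests a very simple splitting heuristic. The Python sketch below is only an illustration: the prefix and suffix lists contain just the bound morphemes shown on this slide, and the function name `split_affix` is hypothetical. It will over-segment any word that merely happens to begin or end with one of these characters, which reflects the difficulty noted above of telling morphemes from characters and words.

```python
# Bound morphemes taken from the slide's examples only.
PREFIXES = ("副", "前", "非")
SUFFIXES = ("們", "族", "化", "員")


def split_affix(word):
    """Split off one leading or trailing bound morpheme, if present."""
    if len(word) > 1 and word.startswith(PREFIXES):
        return (word[0], word[1:])
    if len(word) > 1 and word.endswith(SUFFIXES):
        return (word[:-1], word[-1])
    return (word,)


for word in ("副總統", "學生們", "工業化", "運動員", "學生"):
    print("-".join(split_affix(word)))
```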