Morphology & Finite-State Transducers

Page 1: Morphology & Finite-State Transducers

Jing-Shin Chang 1

Morphology & Finite-State Transducers

Morphology: the study of the constituents of words
  • Word = {a set of morphemes, combined in language-dependent ways}
  • morpheme: a small meaning-bearing unit, e.g., books = book + s, cats = cat + s

Classes of Morphemes
  • stem (root)
  • affixes (詞綴)

Morphological Parsing (or Analysis): breaking down surface forms (or input forms) into stem and affixes
  • e.g., foxes = “fox” + “-es” (+N, +PL)
  • stemming: mapping a surface form to its stem (extracting the stem from the surface form)

Morphological Generation: generating surface forms from a stem and morphological features
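
Parsing and generation are inverse mappings, which a few lines of Python can illustrate. This is only a minimal sketch: the three-word lexicon, the function names `parse_noun` and `generate_noun`, and the `+SG` tag are assumptions made for illustration and do not come from the slides.

```python
# Toy lexicon of noun stems (hypothetical, for illustration only).
NOUN_STEMS = {"fox", "cat", "book"}

def parse_noun(surface):
    """Return a list of (stem, features) analyses for a surface noun form."""
    analyses = []
    if surface in NOUN_STEMS:
        # "+SG" is an assumed tag for the bare (singular) form.
        analyses.append((surface, ["+N", "+SG"]))
    # Strip "-es" or "-s" and check the remainder against the lexicon.
    for suffix in ("es", "s"):
        stem = surface[: -len(suffix)]
        if surface.endswith(suffix) and stem in NOUN_STEMS:
            analyses.append((stem, ["+N", "+PL"]))
    return analyses

def generate_noun(stem, features):
    """Return the surface form for a stem plus morphological features."""
    if "+PL" not in features:
        return stem
    # Stems ending in a sibilant take "-es"; all others take "-s".
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"
    return stem + "s"

print(parse_noun("foxes"))
print(generate_noun("cat", ["+N", "+PL"]))
```

Note that this parser overgenerates: it would also accept the misspelling "foxs" as fox +N +PL, because it knows nothing about spelling rules.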

Page 2: Morphology & Finite-State Transducers

Morphology & Finite-State Transducers

Applications: spelling checking, tokenization for parsing

Knowledge for Morphological Analysis
  • morphological rules (morphotactics): the constituents of words and their order
  • spelling rules (orthographic rules): spelling changes

Dictionary/Lexicon: list of stems and affixes
  • stems of regular words (plus irregular variants) serve as indexing keys
  • it is not efficient to enumerate all morphological variants
      • some morphemes are productive: they can be applied to all words or to new words, so the resulting forms are impossible to list in full
      • morphological variants depend on spelling as well as pronunciation
      • morphologically complex languages (e.g., Turkish) may have a large number of morphological variants

Page 3: Morphology & Finite-State Transducers

Morphology & Finite-State Transducers

Models for morphological analysis/generation
  • generate-and-test: enumerate all possibilities and test them against constraints
  • FSA / two-level FST model: model the lexicon, morphological rules, and orthographic rules as finite-state automata or transducers
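
The two models can be contrasted in a small sketch. The transducer below maps a lexical string (stem letters plus tags) to a surface string, and the generate-and-test function reuses it by enumerating candidate analyses and keeping those whose generated form matches the input. The state numbering, the `+SG` tag, and all function names are assumptions for illustration; orthographic rules such as e-insertion are deliberately left out, so this toy machine would produce "foxs" for fox +N +PL.

```python
EPS = ""  # empty output (epsilon)

def make_noun_fst():
    """Build a toy transducer from lexical symbols to surface letters."""
    transitions = {}
    for ch in "abcdefghijklmnopqrstuvwxyz":
        transitions[(0, ch)] = (0, ch)   # copy stem letters unchanged
    transitions[(0, "+N")] = (1, EPS)    # the category tag has no surface form
    transitions[(1, "+SG")] = (2, EPS)   # singular: nothing is added
    transitions[(1, "+PL")] = (2, "s")   # plural surfaces as "s"
    return transitions, 0, {2}           # (transitions, start state, final states)

def transduce(fst, symbols):
    """Run the transducer; return the surface string, or None if rejected."""
    transitions, state, finals = fst
    output = []
    for symbol in symbols:
        if (state, symbol) not in transitions:
            return None
        state, out = transitions[(state, symbol)]
        output.append(out)
    return "".join(output) if state in finals else None

def generate_and_test(fst, surface, stems):
    """Enumerate candidate analyses and keep those that generate `surface`."""
    analyses = []
    for stem in stems:
        for number in ("+SG", "+PL"):
            if transduce(fst, list(stem) + ["+N", number]) == surface:
                analyses.append((stem, "+N", number))
    return analyses

fst = make_noun_fst()
print(transduce(fst, list("cat") + ["+N", "+PL"]))
print(generate_and_test(fst, "cats", ["cat", "dog"]))
```

Generate-and-test is simple but scales with the size of the lexicon, whereas a transducer can in principle be run in either direction without enumerating candidates.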

Page 4: Morphology & Finite-State Transducers

English Morphology

Morphology: the study of the way words are built up from smaller meaning-bearing units (morphemes)
  • morpheme: the minimal meaning-bearing unit in a language

Classes of Morphemes
  • stem (root): the main morpheme of the word, supplying the main meaning
  • affixes (詞綴): add additional meanings

Affixes:
  • prefixes: un-happy
  • suffixes: eat-s
  • infixes: inserted inside the stem
      • Tagalog (a Philippine language): hingi (“borrow”) => h-um-ingi (agent of borrowing)
  • circumfixes: placed around the stem
      • German: sagen (“to say”) => ge-sag-t (“said”, past participle)
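
The four affix types differ only in where the affix attaches, which can be sketched as four string operations. The function names and the `position` parameter are assumptions for illustration; in particular, placing the Tagalog infix after the first consonant is just what this one example requires, not a general rule.

```python
def add_prefix(stem, affix):
    """Attach the affix before the stem."""
    return affix + stem

def add_suffix(stem, affix):
    """Attach the affix after the stem."""
    return stem + affix

def add_infix(stem, affix, position):
    """Insert the affix inside the stem, after `position` characters."""
    return stem[:position] + affix + stem[position:]

def add_circumfix(stem, before, after):
    """Wrap the stem with one part of the affix on each side."""
    return before + stem + after

print(add_prefix("happy", "un"))         # unhappy
print(add_suffix("eat", "s"))            # eats
print(add_infix("hingi", "um", 1))       # humingi
print(add_circumfix("sag", "ge", "t"))   # gesagt
```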

Page 5: Morphology & Finite-State Transducers

English Morphology

Affixes:
  • concatenative: prefixes & suffixes
  • non-concatenative: infixes & templatic morphology

Templatic (root-and-pattern) morphology: found in Semitic languages such as Arabic and Hebrew
  • Hebrew: lmd (“learn”, “study”), a tri-consonantal root
      • active voice template CaCaC => lamad (‘he studied’)
      • intensive template CiCeC => limed (‘he taught’)
      • intensive passive template CuCaC => lumad (‘he was taught’)

Multiple affixes: un-believ-abl-y

Agglutinative languages: languages that tend to string affixes together (Turkish, Japanese, Korean)
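
Root-and-pattern morphology can be sketched as slot filling: each C in the template is replaced by the next consonant of the root, and the vowels are copied as they are. The function name `apply_template` is an assumption for illustration.

```python
def apply_template(root, template):
    """Fill the C slots of a template with the consonants of the root, in order."""
    if template.count("C") != len(root):
        raise ValueError("template must have one C slot per root consonant")
    consonants = iter(root)
    return "".join(next(consonants) if slot == "C" else slot for slot in template)

for template in ("CaCaC", "CiCeC", "CuCaC"):
    print(template, "=>", apply_template("lmd", template))
```

Unlike prefixing or suffixing, the root and the pattern are interleaved, which is why this kind of morphology is called non-concatenative.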

Page 6: Morphology & Finite-State Transducers

English Morphology

Inflection: stem + morphemes => same class
  • e.g., book + s => books (same meaning, same part of speech (詞類))

Derivation: stem + morphemes => different class
  • e.g., computerize + ation => computerization [verb => noun]

Page 7: Morphology & Finite-State Transducers

English Morphology

Inflectional Morphology: only nouns, verbs, adjectives, and adverbs can be inflected

Noun: plural, possessive
  • Regular: plural (+s / +es / -y +ies), possessive (+’s, +s’)
  • Irregular: ox => ox-en, mouse => mice

Verb (main, modal, primary/be):
  • Forms: stem (present / infinitive), -s (present, third-person singular), -ing (gerund / present participle), -ed (past / past participle / perfect)
  • Regular: +s / +es / -y +ies; -e +ing / +ing / +.ing (consonant doubling); +d / +ed / +.ed (consonant doubling)
  • Irregular: e.g., eat => ate, eaten (+en); catch => caught
  • Consonant doubling: (short vowel) + single consonant => double the consonant; -c => -ck (picnicked)

Adjective/Adverb: comparative/superlative
  • happy => happier, happiest, happily
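
The regular spelling rules and the irregular exceptions listed above can be sketched as rule functions backed by small lookup tables. Everything here is an illustrative assumption: the function names, the tiny irregular tables, and above all the consonant-doubling test, which looks only at letters (consonant + vowel + consonant) and ignores stress, so it would wrongly double in a verb such as "visit".

```python
VOWELS = set("aeiou")
IRREGULAR_PLURALS = {"ox": "oxen", "mouse": "mice"}   # tiny sample table
IRREGULAR_PAST = {"eat": "ate", "catch": "caught"}    # tiny sample table

def add_s(word):
    """Regular -s ending, used for noun plurals and third-person singular verbs."""
    if word.endswith(("s", "x", "z", "ch", "sh")):
        return word + "es"
    if len(word) > 1 and word.endswith("y") and word[-2] not in VOWELS:
        return word[:-1] + "ies"
    return word + "s"

def plural(noun):
    """Irregular plurals are looked up; everything else follows the rule."""
    return IRREGULAR_PLURALS.get(noun) or add_s(noun)

def needs_doubling(verb):
    """Letter-based heuristic: consonant + vowel + final consonant (not w, x, y)."""
    return (len(verb) >= 3
            and verb[-1] not in VOWELS and verb[-1] not in "wxy"
            and verb[-2] in VOWELS
            and verb[-3] not in VOWELS)

def ing_form(verb):
    if verb.endswith("c"):
        return verb + "king"                  # picnic -> picnicking
    if len(verb) > 2 and verb.endswith("e") and verb[-2] not in "eioy":
        return verb[:-1] + "ing"              # make -> making
    if needs_doubling(verb):
        return verb + verb[-1] + "ing"        # stop -> stopping
    return verb + "ing"

def past_form(verb):
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("c"):
        return verb + "ked"                   # picnic -> picnicked
    if verb.endswith("e"):
        return verb + "d"                     # bake -> baked
    if len(verb) > 1 and verb.endswith("y") and verb[-2] not in VOWELS:
        return verb[:-1] + "ied"              # try -> tried
    if needs_doubling(verb):
        return verb + verb[-1] + "ed"         # stop -> stopped
    return verb + "ed"

print(plural("fox"), plural("ox"), ing_form("stop"), past_form("picnic"))
```

Checking the irregular table before applying any rule mirrors the lexicon design described earlier: irregular variants are stored, regular ones are computed.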

Page 8: Morphology & Finite-State Transducers

English Morphology

Derivational Morphology
  • usually results in a different class
  • needs part-of-speech (POS) conversion from the root POS and the affixes to get the correct POS

Nominalization: V/A => N
  • computerize => computerization
  • more examples …

N/V => A
  • computation => computational
  • more examples …
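
POS conversion can be sketched as a table that records, for each derivational suffix, which root classes it accepts and which class it produces. The table contents, the single-letter POS labels, and the function name `derive` are assumptions for illustration; the only spelling adjustment handled is dropping a stem-final "e" before a vowel-initial suffix.

```python
# suffix -> (accepted root POS, resulting POS); a small illustrative sample
DERIVATIONAL_SUFFIXES = {
    "ation": ({"V"}, "N"),   # computerize (V) -> computerization (N)
    "al": ({"N"}, "A"),      # computation (N) -> computational (A)
    "able": ({"V"}, "A"),    # read (V) -> readable (A)
}

def derive(stem, pos, suffix):
    """Return (derived word, new POS), or raise if the suffix does not fit."""
    accepted, result_pos = DERIVATIONAL_SUFFIXES[suffix]
    if pos not in accepted:
        raise ValueError(f"-{suffix} does not attach to a stem of class {pos}")
    # Drop a stem-final "e" before a suffix that starts with a vowel.
    if stem.endswith("e") and suffix[0] in "aeiou":
        stem = stem[:-1]
    return stem + suffix, result_pos

print(derive("computerize", "V", "ation"))
print(derive("computation", "N", "al"))
```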

Page 9: Morphology & Finite-State Transducers

Chinese Morphology

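Chinese Morphemes
  • hard to distinguish from characters, words, and compound words
  • free morphemes
  • bound morphemes

Examples
  • 副-總統 (vice president), 前-妻 (ex-wife), 非-經濟 (因素) (non-economic (factors))
  • 學生-們 (students)
  • 哈日-族 (Japanophiles), 銀髮-族 (the silver-haired, i.e., senior citizens)
  • 工業-化 (industrialize), 綠-化 (make green), 藍-化 (turn “blue”), 腐-化 (become corrupt), 石-化 (petrify), 神-化 (deify)
  • 公務-員 (civil servant), 業務-員 (sales representative), 推銷-員 (salesperson), 運動-員 (athlete)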
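
As a rough sketch, the bound morphemes in these examples can be stripped from a word by matching against small affix lists. The lists contain only the affixes shown above, and the function name `split_affixes` is an assumption for illustration. Because the same characters also occur inside ordinary words, a real analyzer would need a lexicon to rule out false matches; this sketch has none.

```python
# Bound morphemes taken from the examples above (not a complete inventory).
PREFIXES = ("副", "前", "非")
SUFFIXES = ("們", "族", "化", "員")

def split_affixes(word):
    """Return (prefix, stem, suffix); empty strings mark missing affixes."""
    prefix = suffix = ""
    # Keep at least one character for the stem in each step.
    if len(word) > 1 and word.startswith(PREFIXES):
        prefix, word = word[0], word[1:]
    if len(word) > 1 and word.endswith(SUFFIXES):
        word, suffix = word[:-1], word[-1]
    return prefix, word, suffix

for example in ("副總統", "學生們", "工業化", "運動員"):
    print(example, "=>", split_affixes(example))
```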

Page 10: Morphology & Finite-State Transducers
