51
CMSC 723 / LING 645: Intro to Computational Linguistics September 15, 2004: Dorr More about FSA’s, Finite State Morphology (J&M 3) Prof. Bonnie J. Dorr Dr. Christof Monz TA: Adam Lee

CMSC 723 / LING 645: Intro to Computational Linguistics September 15, 2004: Dorr More about FSA’s, Finite State Morphology (J&M 3) Prof. Bonnie J. Dorr

  • View
    220

  • Download
    1

Embed Size (px)

Citation preview

CMSC 723 / LING 645: Intro to Computational Linguistics

September 15, 2004: Dorr

More about FSA’s, Finite State Morphology (J&M 3)

Prof. Bonnie J. DorrDr. Christof Monz

TA: Adam Lee

More about FSAs

TransducersEquivalence of DFSAs and NFSAsRecognition as search: depth-first, breadth-

search

Recognition using NFSAs

NFSA Recognition of “baaa!”

Breadth-first Recognition of “baaa!”

should be q2

Regular languages

Regular languages are characterized by FSAs

For every NFSA, there is an equivalent DFSA.

Regular languages are closed under concatenation, Kleene closure, union.

Concatenation

Kleene Closure

Union

Morphology

Definitions and Problems– What is Morphology?– Topology of Morphologies

Approaches to Computational Morphology– Lexicons and Rules– Computational Morphology Approaches

Morphology

The study of the way words are built up from smaller meaning units called Morphemes

Syntax Lexeme/Inflected Lexeme Grammars sentences

Morphology Morpheme/Allomorph Morphotactics words

Phonology Phoneme/Allophone Phonotactics letters

Abstract versus Realized HOP +PAST hop +ed hopped /hapt/

Phonology and Morphology

Phonology vs. OrthographyHistorical spelling

– night, nite – attention, mission, fish

Script Limitations– Spoken English has 14 vowels

• heed hid hayed head had hoed hood who’d hide how’d taught Tut toy enough

– English Alphabet has 5• Use vowel combinatios: far fair fare• Consonantal doubling (hopping vs. hoping)

Syntax and Morphology

Phrase-level agreement– Subject-Verb

• John studies hard (STUDY+3SG)

– Noun-Adjective• Las vacas hermosas

Sub-word phrasal structures– נויספרבש– נו+ים+ספר+ב+ש– That+in+book+PL+Poss:1PL– Which are in our books

conj

prep

noun

posspluralarticle

Topology of Morphologies

Concatenative vs. Templatic

Derivational vs. Inflectional

Regular vs. Irregular

Concatenative Morphology

Morpheme+Morpheme+Morpheme+…Stems: also called lemma, base form, root, lexeme

– hope+ing hoping hop hoppingAffixes

– Prefixes: Antidisestablishmentarianism– Suffixes: Antidisestablishmentarianism– Infixes: hingi (borrow) – humingi (borrower) in Tagalog– Circumfixes: sagen (say) – gesagt (said) in German

Agglutinative Languages– uygarlaştıramadıklarımızdanmışsınızcasına– uygar+laş+tır+ama+dık+lar+ımız+dan+mış+sınız+casına– Behaving as if you are among those whom we could not cause to become civilized

Templatic Morphology

Roots and Patterns

وكتمب

ب K T B

و? ??َم�

تك

בוכת

ב

ו? ??

תכ

maktuubwritten

ktuuvwritten

Templatic Morphology: Root Meaning

KTB: writing “stuff”

כתב

מכתב

כתב

כתיבspelling

כתובתaddress

كتب

كاتب

مكتوب

كتابbook

مكتبةlibrary

مكتبoffice

write

writer

letter

Derivational vs. Inflectional

Word Classes– Parts of speech: noun, verb, adjectives, etc.– Word class dictates how a word combines with

morphemes to form new words

Derivational morphology

Nominalization: computerization, appointee, killer, fuzziness

Formation of adjectives: computational, clueless, embraceable

CatVar: Categorial Variation Databasehttp://clipdemos.umiacs.umd.edu/catvar/

Inflectional morphology

Adds: Tense, number, person, mood, aspect

Word class doesn’t changeWord serves new grammatical roleFive verb forms in EnglishOther languages have (lots more)

Nouns and Verbs (in English)

Nouns have simple inflectional morphology– cat– cat+s, cat+’s

Verbs have more complex morphology

Regulars and Irregulars

Nouns– Cat/Cats– Mouse/Mice, Ox, Oxen, Goose, Geese

Verbs– Walk/Walked– Go/Went, Fly/Flew

Regular (English) Verbs

Morphological Form Classes Regularly Inflected Verbs

Stem walk merge try map

-s form walks merges tries maps

-ing form walking merging trying mapping

Past form or –ed participle walked merged tried mapped

Irregular (English) Verbs

Morphological Form Classes Irregularly Inflected Verbs

Stem eat catch cut

-s form eats catches cuts

-ing form eating catching cutting

Past form ate caught cut

-ed participle eaten caught cut

“To love” in Spanish

Computational Morphology

Finite State Morphology– Finite State Transducers (FST)

Input/Output

Analysis/Generation

Computational Morphology

WORD STEM (+FEATURES)* cats cat +N +PL cat cat +N +SG cities city +N +PLgeese goose +N +PLducks (duck +N +PL) or

(duck +V +3SG)merging merge +V +PRES-PART caught (catch +V +PAST-PART) or

(catch +V +PAST)

Building a Morphological Parser

The Rules and the Lexicon– General versus Specific– Regular versus Irregular– Accuracy, speed, space– The Morphology of a language

Approaches– Lexicon only– Lexicon and Rules

• Finite-state Automata• Finite-state Transducers

– Rules only

Lexicon-only Morphology

acclaim acclaim $N$

acclaim acclaim $V+0$

acclaimed acclaim $V+ed$

acclaimed acclaim $V+en$

acclaiming acclaim $V+ing$

acclaims acclaim $N+s$

acclaims acclaim $V+s$

acclamation acclamation $N$

acclamations acclamation $N+s$

acclimate acclimate $V+0$

acclimated acclimate $V+ed$

acclimated acclimate $V+en$

acclimates acclimate $V+s$

acclimating acclimate $V+ing$

• The lexicon lists all surface level and lexical level pairs

• No rules …?

• Analysis/Generation is easy

• Very large for English

• What about Arabic or Turkish?

• Chinese?

Building a Morphological Parser

The Rules and the Lexicon– General versus Specific– Regular versus Irregular– Accuracy, speed, space– The Morphology of a language

Approaches– Lexicon only– Lexicon and Rules

• Finite-state Automata• Finite-state Transducers

– Rules only

Lexicon and Rules:FSA Inflectional Noun Morphology

reg-noun Irreg-pl-noun Irreg-sg-noun plural

fox

cat

dog

geese

sheep

mice

goose

sheep

mouse

-s

• English Noun Lexicon

• English Noun Rule

Lexicon and Rules: FSA English Verb Inflectional Morphology

reg-verb-stem irreg-verb-stem irreg-past-verb past past-part pres-part 3sg

walkfrytalkimpeach

cutspeakspokensing

sang

caughtateeaten

-ed -ed -ing -s

FSA for Derivational Morphology: Adjectival Formation

More Complex Derivational Morphology

Using FSAs for Recognition: English Nouns and their Inflection

Morphological Parsing

Finite-state automata (FSA) – Recognizer– One-level morphology

Finite-state transducers (FST)– Two-level morphology

• PC-Kimmo (Koskenniemi 83)

– input-output pair

Terminology for PC-Kimmo

Upper = lexical tapeLower = surface tapeCharacters correspond to pairs, written a:b If “a:a”, write “a” for shorthandTwo-level lexical entries# = word boundary ^ = morpheme boundaryOther = “any feasible pair that is not in this transducer”Final states indicated with “:” and non-final states

indicated with “.”

Four-Fold View of FSTs

As a recognizerAs a generatorAs a translatorAs a set relater

Nominal Inflection FST

Lexical and Intermediate Tapes

Spelling Rules

Name Rule Description Example

Consonant Doubling 1-letter consonant doubled before -ing/-ed beg/begging

E-deletion Silent e dropped before -ing and -ed make/making

E-insertion e added after s,z,x,ch,sh before s watch/watches

Y-replacement -y changes to -ie before -s, -i before -ed try/tries

K-insertion verbs ending with vowel + -c add -k panic/panicked

Chomsky and Halle Notation

ε → e / xsz

^ __ s #

Intermediate-to-Surface Transducer

State Transition Table

Two-Level Morphology

Sample Run

KIMMO DEMO

FSTs and ambiguity

Parse Example 1: unionizable– union +ize +able– un+ ion +ize +able

Parse Example 2: assess– assessv– assN +essN

Parse Example 3: tender– tenderAJ– tenNum+dAJ+erCMP

What to do about Global Ambiguity?

Accept first successful structure

Run parser through all possible paths

Bias the search in some manner

Computational Morphology

The Rules and the Lexicon– General versus Specific– Regular versus Irregular– Accuracy, speed, space– The Morphology of a language

Approaches– Lexicon only– Lexicon and Rules

• Finite-state Automata• Finite-state Transducers

– Rules only

Computational Morphology

The Rules and the Lexicon– General versus Specific– Regular versus Irregular– Accuracy, speed, space– The Morphology of a language

Approaches– Lexicon only– Lexicon and Rules

• Finite-state Automata• Finite-state Transducers

– Rules only (next time!!)

Readings for next time

J&M Chapter 6