Morphology: Words and their Parts

Page 1: Morphology: Words and their Parts

Morphology: Words and their Parts

CS 4705

Page 2: Morphology: Words and their Parts

Basic Uses of Morphology

• The study of how words are composed from smaller, meaning-bearing units (morphemes)

• Applications:
– Spelling correction: referece
– Hyphenation algorithms: refer-ence
– Part-of-speech analysis: googler
– Text-to-speech: grapheme-to-phoneme conversion
• hothouse (/T/ or /D/)

Page 3: Morphology: Words and their Parts

– Speech recognition: phoneme-to-grapheme conversion

– Amusing poetry and artificial languages in standardized tests

• ‘Twas brillig and the slithy toves…

• Muggles moogled migwiches

Page 4: Morphology: Words and their Parts

What is a word?

• In formal languages, words are arbitrary strings
• In natural languages, words are made up of meaningful subunits called morphemes
– Allows for productivity: googled, texted
– Abstract concepts denoting entities or relationships in the world
• Roots +
• Syntactic or grammatical elements
– Realizations of morphemes: morphs
• Door realizes door; take and took realize take

Page 5: Morphology: Words and their Parts

• Allomorphs are classes of related morphs that realize a given morpheme
– Allomorphs of plural s in English include en, men, es
– Take and took are allomorphs of take
– In sum: morpheme [s] is realized by an allomorph class that includes the related morphs {en, men, es}

– Syntactic or grammatical morphemes can convey many things
– In Italian, they mark nouns for gender and number

        Singular    Plural
Masc    pomodoro    pomodori
Fem     cipolla     cipolle

pomodor-, cipoll-: stems, may or may not occur on their own as words
– Stem may not occur as a word: derivative/deriv
– Base form (lemma) occurs as word: derivative/derive
– Sometimes the same: cars has stem ‘car’ and base form or lemma ‘car’ too
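The Italian table above can be read as "stem + gender/number suffix". A minimal sketch of that reading follows; the dictionary layout and the function name `inflect` are assumptions made for illustration, and only the four suffixes shown on the slide are covered.

```python
# Suffixes come from the pomodoro/cipolla table on the slide.
# The data layout and the name `inflect` are hypothetical.
SUFFIXES = {
    ("masc", "sg"): "o",
    ("masc", "pl"): "i",
    ("fem", "sg"): "a",
    ("fem", "pl"): "e",
}


def inflect(stem: str, gender: str, number: str) -> str:
    """Attach a gender/number suffix to a bare stem such as 'pomodor-'."""
    return stem.rstrip("-") + SUFFIXES[(gender, number)]


print(inflect("pomodor-", "masc", "pl"))  # pomodori
print(inflect("cipoll-", "fem", "sg"))    # cipolla
```

Note that the stems themselves (pomodor-, cipoll-) never surface as words here; only stem plus suffix does.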

Page 6: Morphology: Words and their Parts

What useful information does morphology give us?

• Different things in different languages

– Spanish: hablo, hablaré/ English: I speak, I will speak

– English: book, books/ Japanese: hon, hon

• Languages differ in how they encode morphological information

– Isolating languages (e.g. Cantonese) have no affixes: each word usually has 1 morpheme

– Agglutinative languages (e.g. Finnish, Turkish) build words from prefixes and suffixes added to a stem (like beads on a string), with each feature realized by a single affix, e.g. Finnish

Page 7: Morphology: Words and their Parts

epäjärjestelmällistyttämättömyydellänsäkäänköhän ‘Wonder if he can also ... with his capability of not causing things to be unsystematic’

– Inflectional languages (e.g. English) merge different features into a single affix (e.g. ‘s’ in likes indicates both person and tense); and the same feature can be realized by different affixes

– Polysynthetic languages (e.g. Inuit languages) express much of their syntax in their morphology, incorporating a verb’s arguments into the verb, e.g. Western Greenlandic

Aliikusersuillammassuaanerartassagaluarpaalli.
aliiku-sersu-i-llammas-sua-a-nerar-ta-ssa-galuar-paal-li
entertainment-provide-SEMITRANS-one.good.at-COP-say.that-REP-FUT-sure.but-3.PL.SUBJ/3SG.OBJ-but
'However, they will say that he is a great entertainer, but ...'

– So….different languages may require very different morphological analyzers

Page 8: Morphology: Words and their Parts

Morphology Can Help Define Word Classes

• AKA morphological classes, parts-of-speech
• Closed vs. open (function vs. content) class words
– Pronoun, preposition, conjunction, determiner,…
– Noun, verb, adverb, adjective,…
• Identifying word classes is useful for almost any task in NLP, from translation to speech recognition to topic detection…very basic semantics

Page 9: Morphology: Words and their Parts

(English) Inflectional Morphology

Word stem + grammatical morpheme --> different forms of same word
– Usually produces word of same class
– Usually serves a syntactic or grammatical function (e.g. agreement)
like --> likes or liked
bird --> birds
• Nominal morphology
– Plural forms
• s or es
• Irregular forms (goose/geese)

Page 10: Morphology: Words and their Parts

• Mass vs. count nouns (fish/fish(es), email or emails?)

– Possessives (cat’s, cats’)

• Verbal inflection

– Main verbs (sleep, like, fear) relatively regular
• -s, -ing, -ed

• And productive: emailed, instant-messaged, faxed, homered

• But some are not:

– eat/ate/eaten, catch/caught/caught

– Primary (be, have, do) and modal verbs (can, will, must) often irregular and not productive

» Be: am/is/are/were/was/been/being

– Irregular verbs few (~250) but frequently occurring

Page 11: Morphology: Words and their Parts

• Particles occur in only one form; in English:
– Prepositions: to, from
– Adverbs: happily, quickly
– Conjunctions: but, and
– Articles: the, a, an
– Japanese?

• So….English inflectional morphology is fairly easy to model….with some special cases...
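One way to picture "fairly easy to model, with some special cases" is a default rule plus an exception list. The sketch below is an assumption-laden toy, not a full analyzer: the exception dictionaries contain only forms mentioned on the slides, and the function names are hypothetical. Spelling changes such as y --> ies are deliberately left out.

```python
# Exception lists hold only irregular forms mentioned on the slides.
IRREGULAR_PLURALS = {"goose": "geese", "fish": "fish"}
IRREGULAR_PASTS = {"eat": "ate", "catch": "caught"}


def pluralize(noun: str) -> str:
    """Look up irregular plurals first, else apply the regular -s / -es rule."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def past_tense(verb: str) -> str:
    """Look up irregular pasts first, else apply the regular -ed rule."""
    if verb in IRREGULAR_PASTS:
        return IRREGULAR_PASTS[verb]
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"


print(pluralize("bird"), pluralize("goose"))      # birds geese
print(past_tense("email"), past_tense("catch"))   # emailed caught
```

Because the regular rule is the fallback, new verbs such as "fax" or "homer" are inflected without being listed, which is what productivity means in practice.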

Page 12: Morphology: Words and their Parts

Derivational Morphology

• Word stem + syntactic/grammatical morpheme --> new words
– Usually produces word of different class
– Incomplete process: derivational morphs cannot be applied to just any member of a class
• Verbs --> nouns
– -ize verbs --> -ation nouns
– generalize, realize --> generalization, realization
– synthesize but no *synthesization

Page 13: Morphology: Words and their Parts

• Verbs, nouns --> adjectives
– embrace, pity --> embraceable, pitiable
– care, wit --> careless, witless
• Adjective --> adverb
– happy --> happily
• Process selective in unpredictable ways
– Less productive: nerveless/*evidence-less, malleable/*sleep-able, rar-ity/*rareness
– Meanings of derived terms harder to predict by rule
• clueless, careless, nerveless, sleepless

Page 14: Morphology: Words and their Parts

• Derivation can be applied recursively:
– Hospital --> hospitalize --> hospitalization --> prehospitalization …
– Morphological analysis identifies concatenative processes as well as morphemes
[pre[[[hospital]ize]ation]]
– But there are bracketing paradoxes
unhappier
[un[happier]]: not happier
[[unhappy]er]: more unhappy
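The bracketed structures above record the order in which affixes were attached. A small sketch of that idea follows; the function name `bracket` and the (kind, form) encoding of affixes are assumptions for illustration. It works on morphemes only, so it shows happy + er rather than the spelled form happier.

```python
def bracket(root: str, affixes: list[tuple[str, str]]) -> str:
    """Wrap a root in one bracket layer per affix, innermost affix first.

    Each affix is a (kind, form) pair where kind is 'prefix' or 'suffix'.
    """
    structure = f"[{root}]"
    for kind, form in affixes:
        if kind == "prefix":
            structure = f"[{form}{structure}]"
        else:
            structure = f"[{structure}{form}]"
    return structure


# Recursive derivation: hospital --> hospitalize --> hospitalization --> prehospitalization
print(bracket("hospital", [("suffix", "ize"), ("suffix", "ation"), ("prefix", "pre")]))
# [pre[[[hospital]ize]ation]]

# The two readings of 'unhappier' differ only in attachment order.
print(bracket("happy", [("suffix", "er"), ("prefix", "un")]))  # [un[[happy]er]]
print(bracket("happy", [("prefix", "un"), ("suffix", "er")]))  # [[un[happy]]er]
```

The same morphemes in a different attachment order yield a different structure, which is the ambiguity behind the bracketing paradox.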

Page 15: Morphology: Words and their Parts

Compounding

• Two base forms join to form a new word
– Bedtime, Wienerschnitzel, Rotwein
– Careful? Compound or derivation?

Page 16: Morphology: Words and their Parts

Affixes can be attached to stems in different ways

– Prefixation
• Immaterial

– Suffixation: more common across languages than prefixation

• Trying

– Circumfixation: combine prefixation and suffixation

• Gesagt

Page 17: Morphology: Words and their Parts

– Infixation
• English: Absobl**dylutely
• Bontoc: ‘um’ turns adjectives and nouns into verbs (kilad (red) --> kumilad (to be red))
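The Bontoc example can be sketched as string insertion. The placement rule used here (insert the infix just before the first vowel, i.e. after the initial consonant) is an assumption inferred from the single pair kilad / kumilad on the slide; the function name `infix_um` is hypothetical.

```python
VOWELS = "aeiou"


def infix_um(stem: str, infix: str = "um") -> str:
    """Insert the infix immediately before the first vowel of the stem.

    Assumed rule, generalized from kilad --> kumilad.
    """
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + infix + stem[i:]
    # No vowel found: leave the stem unchanged rather than guess.
    return stem


print(infix_um("kilad"))  # kumilad
```

Unlike prefixation or suffixation, the affix lands inside the stem, so a parser cannot recover the stem by stripping material from either edge.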

Page 18: Morphology: Words and their Parts

Concatenative vs. Non-concatenative Morphology

• Semitic root-and-pattern morphology
– Root (2-4 consonants) conveys basic semantics (e.g. Arabic /ktb/)
– Vowel pattern conveys voice and aspect
– Derivational template (binyan) identifies word class

Page 19: Morphology: Words and their Parts

Template    Vowel pattern          Gloss
            active     passive
CVCVC       katab      kutib       write
CVCCVC      kattab     kuttib      cause to write
CVVCVC      ka:tab     ku:tib      correspond
tVCVVCVC    taka:tab   tuku:tib    write each other
nCVVCVC     nka:tab    nku:tib     subscribe
CtVCVC      ktatab     ktutib      write
stVCCVC     staktab    stuktib     dictate
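The table can be reproduced by interleaving a consonantal root with a template and a vowel pattern. The sketch below is one possible encoding, not the analysis the slides assume: the digit notation (1, 2, 3 for root consonants, a repeated digit for doubling, ":" for length) is invented here so that each C slot is unambiguous, and the vowel rule (all "a" for active; all "u" with a final "i" for passive) is read off the forms in the table.

```python
# Slide template name -> hypothetical slot string.
# Digits pick root consonants, V is a vowel slot, other characters are literal.
TEMPLATES = {
    "CVCVC": "1V2V3",
    "CVCCVC": "1V22V3",
    "CVVCVC": "1V:2V3",
    "tVCVVCVC": "tV1V:2V3",
    "nCVVCVC": "n1V:2V3",
    "CtVCVC": "1tV2V3",
    "stVCCVC": "stV12V3",
}


def realize(root: str, template: str, voice: str) -> str:
    """Fill a template with root consonants and an active/passive vowel pattern."""
    slots = TEMPLATES[template]
    n_vowels = slots.count("V")
    if voice == "active":
        vowels = ["a"] * n_vowels
    elif voice == "passive":
        vowels = ["u"] * (n_vowels - 1) + ["i"]
    else:
        raise ValueError(f"unknown voice: {voice}")
    vowel_iter = iter(vowels)
    out = []
    for ch in slots:
        if ch.isdigit():
            out.append(root[int(ch) - 1])  # root consonant by position
        elif ch == "V":
            out.append(next(vowel_iter))   # next vowel of the pattern
        else:
            out.append(ch)                 # affixal consonant or length mark
    return "".join(out)


print(realize("ktb", "CVCVC", "active"))      # katab
print(realize("ktb", "tVCVVCVC", "passive"))  # tuku:tib
```

Root, template, and vowel pattern are three independent inputs, which is why this kind of morphology is called non-concatenative: no single contiguous substring corresponds to the root.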

Page 20: Morphology: Words and their Parts

Morphotactics

• What are the ‘rules’ for constructing a word in a given language?
– Pseudo-intellectual vs. *intellectual-pseudo
– Rational-ize vs. *ize-rational
– Cretin-ous vs. *cretin-ly vs. *cretin-acious
• Possible ‘rules’
– Suffixes are suffixes and prefixes are prefixes
– Certain affixes attach to certain types of stems (nouns, verbs, etc.)
– Certain stems can/cannot take certain affixes

Page 21: Morphology: Words and their Parts

• Semantics: In English, un- cannot attach to adjectives that already have a negative connotation:
– Unhappy vs. *unsad
– Unhealthy vs. *unsick
– Unclean vs. *undirty
• Phonology: In English, -er cannot attach to words of more than two syllables
– great, greater
– Happy, happier
– Competent, *competenter
– Elegant, *eleganter
– Unruly, ?unrulier
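The syllable constraint on -er can be approximated with a crude syllable counter. This is only a heuristic sketch: counting runs of vowel letters (with y treated as a vowel) is an assumption that miscounts words with silent letters, and the function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable count: number of runs of vowel letters (y counts as a vowel)."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def allows_er(word: str) -> bool:
    """Apply the slide's rule: -er attaches only to words of at most two syllables."""
    return count_syllables(word) <= 2


for adjective in ["great", "happy", "competent", "elegant", "unruly"]:
    print(adjective, count_syllables(adjective), allows_er(adjective))
# great 1 True
# happy 2 True
# competent 3 False
# elegant 3 False
# unruly 3 False
```

The heuristic rejects "unruly" outright, whereas the slide marks unrulier only as questionable, so a hard syllable cutoff is stricter than speakers' judgments.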

Page 22: Morphology: Words and their Parts

Morphological Parsing

• These regularities enable us to create software to parse words into their component parts
– Known words and new ones (e.g. Pneumonoultramicroscopicsilicovolcanoconiosis, Columbianize, Columbianization)

Page 23: Morphology: Words and their Parts

Morphological Representations: Evidence from Human Performance

• Hypotheses:
– Full listing hypothesis: words listed
– Minimum redundancy hypothesis: morphemes listed
• Experimental evidence:
– Priming experiments (Does seeing/hearing one word facilitate recognition of another?) suggest neither
– Regularly inflected forms (e.g. cars) prime their stems (car), but derived forms (e.g. management) do not prime theirs (manage)

Page 24: Morphology: Words and their Parts

– But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart)

• Speech errors suggest affixes must be represented separately in the mental lexicon– ‘easy enoughly’ for ‘easily enough’

Page 25: Morphology: Words and their Parts

Summing Up

• Different languages have different morphological systems
– If we can discover how to decode such a system, we can identify useful information about the word class and the semantic meaning of a word
– Morphological regularities provide basis for building (automatic) morphological analyzers
• Next time: Read Ch 3.2-3.6
– HW1 will be assigned (check the course syllabus and courseworks)

Page 26: Morphology: Words and their Parts

Announcements

• HW1 will now be due 9/25/07
• WICS lunch tomorrow at noon in the CS Lounge, 452 MUDD (rsvp to [email protected])