COMP 791A: Statistical Language Processing

Linguistic Essentials
Chap. 3

Levels of study of NLP

Lexical
  Possible words in a given language
    rose
    ?gellapou

Phonetics & phonology
  How words are related to sounds
    rose [roz]

Parts-of-speech & Morphology
  How words are constructed from basic meaning units (morphemes)
    friend + ly --> friendly
    friend + s --> friends
    rose + ly ≠ rosely
    woman + s ≠ womans

Phrase Structure and Syntax
  How words can be ordered to form correct sentences
    ?Red the is rose / adj det verb noun
    The rose is red / det noun verb adj

Levels of study of NLP (con’t)

Semantics
  What words mean (lexical semantics, word sense disambiguation)
    chair --> furniture / person
  How word meanings are combined into the meaning of sentences
    The chair is broken.
    The chair is sick.

Pragmatics
  How language conventions affect the literal meaning (interpretation)
    Do you have the time?
    Do you have the children?

Discourse
  How surrounding sentences affect interpretation
    The chair’s leg is broken. He went skiing last weekend.
    The chair’s leg is broken. Someone placed a 500 kg package on it.

World-Knowledge
  How general knowledge about the world affects interpretation
    The prof sent the student to see the chair because he was fed up with his behavior.
    The prof sent the student to see the chair because he wanted to see him.
    The prof sent the student to see the chair because he was talking in class.

Levels of study of NLP

  Lexical
  Phonetics & phonology
  Parts-of-speech & Morphology
  Phrase Structure and Syntax
  Semantics
  Pragmatics
  Discourse
  World-Knowledge

Parts of Speech and Morphology

Parts of Speech (POS)
  also called word / lexical / syntactic / grammatical categories, tags or classes
  Ex: noun, verb, adjective, preposition, …

Morphology
  study and description of word formation in a language
  modification of a root form (stem) by affixes
  affixes: prefixes, suffixes, infixes, circumfixes
  and exceptions…
    thief --> thieves
    chief --> chiefs

Word categories are systematically related by morphological processes
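
The thief/chief pair above can be turned into a small sketch of rule-plus-exception morphology. The suffix rules and the `EXCEPTIONS` table below are illustrative assumptions, not a complete account of English plurals.

```python
# Toy pluralizer: a few regular suffix rules plus an exception lexicon.
# EXCEPTIONS is an assumed, hand-written table for illustration only.
EXCEPTIONS = {"chief": "chiefs", "woman": "women"}

def pluralize(noun: str) -> str:
    """Return a plural form using exceptions first, then suffix rules."""
    if noun in EXCEPTIONS:
        return EXCEPTIONS[noun]
    if noun.endswith("f"):
        return noun[:-1] + "ves"          # thief -> thieves
    if noun.endswith(("s", "x", "ch", "sh")):
        return noun + "es"                # church -> churches
    return noun + "s"                     # friend -> friends

for word in ["thief", "chief", "friend", "woman"]:
    print(word, "-->", pluralize(word))
```

Looking up exceptions before applying rules is the design choice that lets "chief" escape the f --> ves rule.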

Morphological processes

Inflection
  to indicate case, gender, number, tense, person, mood, or voice
  does not change the word’s grammatical class or meaning significantly
    car --> cars
    talk --> talking

Derivation
  creation of a new word
  may have a different meaning and/or grammatical class
    infect --> disinfect
    grateful --> ungrateful
    wide (adjective) --> widely (adverb)
    teach (verb) --> teacher (noun)

Compounding
  merging 2 or more words into a single one
  written as separate words but pronounced as a single word / denotes 1 single concept, so merits an entry in the lexicon
    tea kettle, disk drive, mad cow disease

Classes of POS

Open (lexical) class
  things, actions, events, …
    ex. cat, John, eat
  new words can be added easily
  nouns, verbs, adjectives, adverbs
  some languages do not have all these categories

Closed (functional) class
  generally function/grammatical words
    ex. the, in, and, for
  relatively fixed membership
  prepositions, determiners, pronouns, conjunctions, particles, numerals, auxiliary verbs

Main POS

Open class
  Noun – refers to entities like people, places, things or ideas.
  Adjective – describes the properties of nouns or pronouns.
  Verb – describes actions, activities and states.
  Adverb – describes a verb, an adjective or another adverb.

Closed class
  Pronoun – a word that takes the place of a noun or other expression.
  Determiner – describes the particular reference of a noun.
  Preposition – expresses spatial or time relationships.
  …

Nouns (open)

Entities like people, places, things or ideas
  ex: dog, tree, Mary, idea

Typical inflections:
  number (singular, plural)
  gender (masculine, feminine, neuter)
  case (nominative, genitive, accusative, dative)

Sub-categories:
  proper nouns (John)
  adverbial nouns (today, home)

Verbs (open)

Actions, activities, and states
  The men work in the field.
  The men are working in the field.
  The men are in the field.

Typical inflections:
  tense: present, past, future
  other inflections: number, person
  aspect: progressive, perfective
  voice: active, passive

Sub-categories:
  auxiliaries (considered closed-class words)
    ex: be, do, will
  modal verbs (considered closed-class words)
    ex: can, should, could
  main verbs

Main verbs

Transitive
  requires a direct object (found with the questions: what? or whom?)
    ?The child broke.
    The child broke a glass.

Intransitive
  does not require a direct object
    The train arrived.

Some verbs can be both transitive and intransitive
    The ship sailed the seas. (transitive)
    The ship sails at noon. (intransitive)
    I met my friend at the airport. (transitive)
    The delegates met yesterday. (intransitive)

Adjectives (open)

Properties and attributes
  long road
  rainy day
  attractive hat

Typical inflections:
  number, gender, case

Sub-categories:
  comparative (richer)
  superlative (richest)

Adverbs (open)

Words added to a verb, an adjective, another adverb or other expression to expand its meaning
  You must set up the copy now.
  Mary walks gracefully.
  Sometimes I take a walk in the woods.
  Jack usually leaves the house at seven.
  I have always admired her.

Sub-categories:
  locative (here)
  degree (very)
  manner (slowly)
  temporal (late, yesterday (noun?))

Closed class categories

Determiners: words that make specific the denotation of a noun phrase
  articles: the hat, a hat
  demonstratives: this hat, that hat
  possessives: John’s hat, my hat, her book
  wh-determiners: which hat, whose hat
  quantifiers: some hat, every hat

Prepositions: words that show the relationship between certain words in a sentence
  The accident occurred under the bridge.
  by, to, at, …

Conjunctions: words used to join other words or groups of words
  or, when, but, and, …

Auxiliary & modal verbs:
  be, do, can, may, should, …

Closed class categories (con’t)

Particles: words that are added to main verbs to construct different verbs
  check + out = check out, make + up = make up
  Ex:
    She made up a story.
    She made it up.
  particles vs. prepositions
    she <ran up> a bill / she <ran> <up> a hill

Numerals:
  one, third

Closed class categories (con’t)

Pronouns: words that replace a noun or even another sentence
  ex: she, ourselves, mine, that

Sub-categories:
  Personal:
    You are very nice.
  Possessive:
    Mine is nicer.
  Interrogative: used to ask questions: who?, what?, which?
    Who is that girl?
  Demonstrative: points out definite persons, places or things: this, these, that
    This is my book.
    He said he was busy, but that was a lie.
  Relative: joins the clause it introduces to its own antecedent: who, which, that
    She is the girl who won the race.
  ...

Other parts of speech

Interjections: Ouch!

Negatives: no, not

Politeness markers: Hello, bye

Existential: There are 3 students sleeping.

Summary

Open class
  nouns             cat, spirit
  verbs             eat, cook
  adjectives        slow, large
  adverbs           slowly

Closed class
  prepositions      on, under, at
  determiners       a, the, some
  pronouns          she, who, I, other
  conjunctions      and, but, or
  auxiliary verbs   can, may, should
  particles         up, on, off
  numerals          one, two, first

The substitution test

Basic test to determine if 2 words belong to the same POS class

  The { sad | intelligent | green | fat | … } one is in the corner.

POS Tagging

Automatically assign POS tags to words in a text.
  Children/NOUN eat/VERB sweet/ADJECTIVE candy/NOUN
  The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN
  The/ARTICLE news/NOUN has/AUXILIARY been/MAIN VERB quite/ADVERB sad/ADJECTIVE in/PREPOSITION fact/NOUN ./PERIOD
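
To make the word/TAG notation concrete, here is a minimal sketch of a most-frequent-tag baseline trained on the first two tagged sentences above. The function names and the default tag for unknown words are assumptions made for illustration; this is not the tagging method of the course.

```python
from collections import Counter, defaultdict

# Two of the tagged example sentences from the slide.
TAGGED = [
    "Children/NOUN eat/VERB sweet/ADJECTIVE candy/NOUN",
    "The/ARTICLE children/NOUN ate/VERB the/ARTICLE cake/NOUN",
]

def read_tagged(line):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(token.rsplit("/", 1)) for token in line.split()]

def train(lines):
    """Count how often each lowercased word carries each tag."""
    counts = defaultdict(Counter)
    for line in lines:
        for word, tag in read_tagged(line):
            counts[word.lower()][tag] += 1
    return counts

def tag(sentence, counts, default="NOUN"):
    """Give each word its most frequent tag; unknown words get `default` (an assumption)."""
    tagged = []
    for word in sentence.split():
        seen = counts.get(word.lower())
        tagged.append((word, seen.most_common(1)[0][0] if seen else default))
    return tagged

counts = train(TAGGED)
print(tag("The children eat the candy", counts))
```

A lookup baseline like this ignores context entirely, which is exactly what breaks on ambiguous words.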

Why do POS Tagging?

1st step towards NLU
  easier than full NLU (results > 95% accuracy)

Useful for:
  speech recognition / synthesis (better accuracy)
    how to recognize/pronounce a word
    CONtent/noun vs. conTENT/adj
  stemming in IR
    which morphological affixes the word can take
    adjective - ly = noun (friendly - ly = friend)
  indexing in IR
    pick out nouns, which may be more important than other words in indexing documents

Tag Sets

A tag indicates one of the conventional parts of speech.

Different tag sets have been used
  Ex. Brown Tag Set, Penn Treebank Tag Set

Tag examples:
  NP    Proper noun
  NN    Singular noun
  AT    Article
  DET   Determiner

More on this later

Penn Treebank Tag Set

Tag   Description                                Examples
CC    conjunction, coordinating                  and but either et for less minus neither nor or plus so therefore
CD    numeral, cardinal                          mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one
DT    determiner                                 all an another any both del each either every half la many much
IN    preposition or subordinating conjunction   astride among upon whether out inside pro despite on by throughout
JJ    adjective or numeral, ordinal              third ill-mannered pre-war regrettable oiled calamitous first
JJR   adjective, comparative                     bleaker braver breezier briefer brighter brisker broader bumper
NN    noun, common, singular or mass             common-carrier cabbage knuckle-duster Casino afghan shed
NNP   noun, proper, singular                     Motown Venneboerger Czestochwa Ranzer Conchita Trumplane
NNS   noun, common, plural                       undergraduates scotches bric-a-brac products bodyguards facets
PRP   pronoun, personal                          hers herself him himself it itself me myself one oneself ours
RB    adverb                                     occasionally unabatingly maddeningly adventurously professedly
RP    particle                                   aboard about across along apart around aside at away back
TO    "to" as preposition or infinitive marker   to
VB    verb, base form                            ask assemble assess assign assume atone attention avoid bake
VBD   verb, past tense                           dipped pleaded swiped wore soaked tidied convened halted
VBG   verb, present participle or gerund         telegraphing stirring focusing angering judging stalling lactating
VBN   verb, past participle                      imitated dilapidated aerosolized chaired languished panelized used
VBP   verb, present tense, not 3rd p. singular   predominate wrap resort sue twist spill cure lengthen brush
VBZ   verb, present tense, 3rd p. singular       bases reconstructs marks mixes displeases seals carps weaves

Ambiguities in POS tagging

Children eat sweet candy/noun.
Too much boiling will candy/verb the molasses.

Fruit flies/? like/? a banana.
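
The "Fruit flies like a banana" example can be made concrete by enumerating every tag sequence a lexicon allows. The lexicon below is a hand-written assumption covering only this sentence.

```python
from itertools import product

# Assumed toy lexicon: each word maps to the tags it may carry.
LEXICON = {
    "fruit": ["NOUN"],
    "flies": ["NOUN", "VERB"],
    "like": ["VERB", "PREPOSITION"],
    "a": ["ARTICLE"],
    "banana": ["NOUN"],
}

def candidate_taggings(sentence):
    """Return every (word, tag) sequence the lexicon permits."""
    words = sentence.lower().split()
    options = [LEXICON[word] for word in words]
    return [list(zip(words, tags)) for tags in product(*options)]

taggings = candidate_taggings("Fruit flies like a banana")
print(len(taggings))  # 4 candidate sequences
for tagging in taggings:
    print(" ".join(f"{word}/{tag}" for word, tag in tagging))
```

Two ambiguous words already give four candidates; a tagger has to pick one of them using context.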

Syntax or Phrase Structure

Syntax
  study of the regularities and constraints of word order and phrase structure
    the book is red vs. red book is the

Grammar
  expresses the relations among the constituents of a sentence

Constituents

also called syntactic structures

Main constituents:
  S: sentence
    The boy is happy.
  NP: noun phrase
    the little boy
    Sam Smith
    I
    three boys from Montreal
  VP: verb phrase
    eat an apple
    sing
    leave Boston in the morning
  PP: prepositional phrase
    in the morning
    about my ticket
  AdjP: adjective phrase
    really funny
    rather clear
    very large
  AdvP: adverb phrase
    slowly
    really slowly

Sentence Moods/Types

Declarative
  Mary eats.
  S --> NP VP

Imperative
  Eat!
  S --> VP

Yes-No Question
  Did Mary eat?
  S --> Aux NP VP

Wh-Question
  When did Mary eat?
  S --> WH-pro Aux NP VP

Noun Phrases

NP --> pre-modifiers head post-modifiers

head: central noun in the NP
  the little boy, the boy from Montreal

pre-modifiers:
  determiners, cardinals, ordinals, quantifiers
    the boy, two boys, first boy, several boys
  AdjP
    funny boy, really funny boy

post-modifiers:
  PP
    flights from Montreal
  non-finite clause
    gerundive (-ing)
      flights arriving from Montreal
    -ed
      dinner served on board, jewels stolen from the queen
    infinitive form
      flight to arrive from Montreal
  relative clause
    flight that arrives from Montreal, girl who won the race

Verb Phrases

VP --> head-verb complements adjuncts

Some VPs:
  Verb          eat.
  Verb NP       leave Montreal.
  Verb NP PP    leave Montreal in the morning.
  Verb PP       leave in the morning.
  Verb S        think I would like the fish.
  Verb VP       want to leave.
                want to leave Montreal.
                want to leave Montreal in the morning.
                want to want to leave Montreal in the morning.

Subcategorisation frames

Some verbs can take complements that others cannot
  I want to fly.
  * I find to fly.

Verbs are subcategorized according to the complements they can take --> subcategorisation frames
  traditionally: transitive vs intransitive
  nowadays: up to 100 subcategories / frames

Frame          Verb            Example
empty          eat, sleep      I eat.
NP             prefer, find    I prefer apples.
NP NP          show, give      Show me your hand.
PPfrom PPto    fly, travel     I fly from Montreal to Toronto.
VPto           prefer, want    I prefer to leave.
S              mean            Does this mean you are going to leave me?
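
The frame table can be read as a lookup from verbs to the complement patterns they accept. The sketch below encodes only the verbs and frames listed on the slide; the name `licenses` and the string encoding of frames are illustrative assumptions.

```python
# Verb -> set of subcategorisation frames, copied from the slide's table.
FRAMES = {
    "eat": {"empty"},
    "sleep": {"empty"},
    "prefer": {"NP", "VPto"},
    "find": {"NP"},
    "show": {"NP NP"},
    "give": {"NP NP"},
    "fly": {"PPfrom PPto"},
    "travel": {"PPfrom PPto"},
    "want": {"VPto"},
    "mean": {"S"},
}

def licenses(verb: str, frame: str) -> bool:
    """True if the table lists `frame` as a possible complement pattern for `verb`."""
    return frame in FRAMES.get(verb, set())

print(licenses("want", "VPto"))  # I want to fly.
print(licenses("find", "VPto"))  # * I find to fly.
```

A real lexicon would list many more frames per verb; here "find" is rejected with VPto only because the slide's table gives it the NP frame alone.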

Prepositional phrases

PP --> Preposition NP
  from Japan
  inside my blue bag

Adjective Phrases

AdjP --> Adj Modifiers
  tall
  very tall
  taller than Mary

Adverb Phrases

AdvP --> Adv Modifiers
  affirmatively
  very graciously
  rather secretively

Context Free Grammars

a set of non-terminal symbols
  constituents & parts-of-speech
  S, NP, VP, PP, Det, N, V, ...

a set of terminal symbols
  lexicon of words & punctuation
  cat, mouse, nurses, eat, ...

a non-terminal designated as the starting symbol
  sentence S

a set of re-write rules
  having a single non-terminal on the LHS and one or more terminals or non-terminals on the RHS
    S --> NP VP
    NP --> Pro | PN | Det Nominal

A simple context-free grammar

The Grammar           The Lexicon
S --> NP VP           NNS --> children
NP --> AT NNS         NNS --> students
NP --> AT NN          NNS --> mountains
NP --> NP PP          VBD --> slept
VP --> VP PP          VBD --> ate
VP --> VBD            VBD --> saw
VP --> VBD NP         AT --> the
PP --> IN NP          IN --> in
                      IN --> of
                      NN --> cake
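
As a rough illustration of how such a grammar can be applied, here is a small chart-based (CKY-style) recognizer for the grammar and lexicon above. It is a sketch under two assumptions: the rule for prepositional phrases is read as PP --> IN NP, and the single unary rule VP --> VBD is handled by a one-step closure. The names `recognize` and `close_unary` are made up for this example.

```python
from itertools import product

# Binary rules, indexed by their right-hand side: (B, C) -> {A} for A --> B C.
BINARY = {
    ("NP", "VP"): {"S"},
    ("AT", "NNS"): {"NP"},
    ("AT", "NN"): {"NP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
    ("VBD", "NP"): {"VP"},
    ("IN", "NP"): {"PP"},   # assumption: the slide's "P --> IN NP" defines PP
}
UNARY = {"VBD": {"VP"}}     # VP --> VBD
LEXICON = {
    "children": {"NNS"}, "students": {"NNS"}, "mountains": {"NNS"},
    "slept": {"VBD"}, "ate": {"VBD"}, "saw": {"VBD"},
    "the": {"AT"}, "in": {"IN"}, "of": {"IN"}, "cake": {"NN"},
}

def close_unary(labels):
    """Add labels reachable by one unary rule (enough for this grammar)."""
    closed = set(labels)
    for label in labels:
        closed |= UNARY.get(label, set())
    return closed

def recognize(sentence):
    """Return True if the grammar derives the sentence from S."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i][j] holds the labels that can span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = close_unary(LEXICON.get(word, set()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            found = set()
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    found |= BINARY.get((left, right), set())
            chart[i][j] = close_unary(found)
    return "S" in chart[0][n]

print(recognize("The children ate the cake"))
print(recognize("The children slept in the mountains"))
print(recognize("ate the children cake"))
```

A chart is used rather than naive top-down expansion because the rules NP --> NP PP and VP --> VP PP are left-recursive.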

A parse tree

a tree representation of the application of the grammar to a specific sentence.

S
  NP
    AT   The
    NNS  children
  VP
    VBD  ate
    NP
      AT   the
      NN   cake

Stochastic Grammars

Grammars obtained by adding probabilities to “algebraic” (i.e., non-probabilistic) grammars.

  1.0   S --> NP VP
  0.4   NP --> AT NNS
  0.4   NP --> AT NN
  0.2   NP --> NP PP
  0.1   VP --> VP PP
  0.1   VP --> VBD
  0.8   VP --> VBD NP
  1.0   PP --> IN NP
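
With these rule probabilities, the probability of a parse tree is commonly taken to be the product of the probabilities of the rules used in it. The sketch below computes this for the tree of "The children ate the cake". It assumes lexical rules have probability 1 (the slide gives none) and reads the last rule as PP --> IN NP; the tuple encoding of trees is also an assumption.

```python
# Rule probabilities from the slide, keyed by (LHS, RHS).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("AT", "NNS")): 0.4,
    ("NP", ("AT", "NN")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("VP", ("VP", "PP")): 0.1,
    ("VP", ("VBD",)): 0.1,
    ("VP", ("VBD", "NP")): 0.8,
    ("PP", ("IN", "NP")): 1.0,   # assumption: "P --> IN NP" defines PP
}

# A tree is (label, child, child, ...); a preterminal is (tag, word).
TREE = ("S",
        ("NP", ("AT", "The"), ("NNS", "children")),
        ("VP", ("VBD", "ate"),
               ("NP", ("AT", "the"), ("NN", "cake"))))

def tree_prob(tree):
    """Multiply the probabilities of all grammar rules used in the tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return 1.0  # lexical rule: probability assumed to be 1
    rhs = tuple(child[0] for child in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        prob *= tree_prob(child)
    return prob

print(tree_prob(TREE))  # 1.0 * 0.4 * 0.8 * 0.4, approximately 0.128
```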

Syntactic Dependencies

Local dependency
  dependency between two words expressed within the same syntactic rule
    The 3/plural books/plural.
  n-gram models capture this very well.

Non-local dependency
  two words can be syntactically dependent even though they occur far apart in a sentence
  Ex: subject-verb agreement
    The children who found a wallet on the street yesterday while walking their dog were given a reward.
  a challenge for certain statistical NLP approaches (ex. n-grams) that model only local dependencies.
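
A quick way to see why the agreement example is hard for n-grams is to measure how far apart the subject and the verb are. The helper name below is an assumption; tokenization is plain whitespace splitting.

```python
SENTENCE = ("The children who found a wallet on the street yesterday "
            "while walking their dog were given a reward")

def min_ngram_order(tokens, first, second):
    """Smallest n for which a single n-gram window contains both words."""
    gap = abs(tokens.index(second) - tokens.index(first))
    return gap + 1

tokens = SENTENCE.split()
print(min_ngram_order(tokens, "children", "were"))        # 14
print(min_ngram_order("The 3 books".split(), "3", "books"))  # 2
```

A bigram already sees "3 books" together, while "children ... were" would need a 14-gram, far beyond the orders typically used in practice.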

Difficulties in parsing

Attachment ambiguity
  The children ate the cake with a spoon.
    The children ate (the cake with a spoon). ??
    The children (ate with a spoon). ??
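
The two readings can be counted mechanically with a chart that stores the number of trees per label and span. The sketch below reuses the simple context-free grammar from the earlier slide, reads its preposition rule as PP --> IN NP, and adds lexicon entries for "with", "a" and "spoon"; those three entries and the function name are assumptions.

```python
from collections import Counter

# (parent, left child, right child) for each binary rule.
BINARY = [
    ("S", "NP", "VP"), ("NP", "AT", "NNS"), ("NP", "AT", "NN"),
    ("NP", "NP", "PP"), ("VP", "VP", "PP"), ("VP", "VBD", "NP"),
    ("PP", "IN", "NP"),
]
LEXICON = {
    "the": "AT", "children": "NNS", "ate": "VBD", "cake": "NN",
    "with": "IN", "a": "AT", "spoon": "NN",   # last three entries are assumed
}

def count_parses(sentence):
    """Count the distinct S trees the grammar assigns to the sentence."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return 0
    chart = {}
    for i, word in enumerate(words):
        cell = Counter()
        if word in LEXICON:
            cell[LEXICON[word]] = 1
        cell["VP"] += cell["VBD"]          # unary rule VP --> VBD
        chart[i, i + 1] = cell
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = Counter()
            for k in range(i + 1, j):
                for parent, left, right in BINARY:
                    cell[parent] += chart[i, k][left] * chart[k, j][right]
            chart[i, j] = cell
    return chart[0, n]["S"]

print(count_parses("The children ate the cake with a spoon"))  # 2
print(count_parses("The children ate the cake"))               # 1
```

The two trees differ only in whether the PP attaches to the NP (NP --> NP PP) or to the VP (VP --> VP PP).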

Other difficulties

NP bracketing
  plastic cat food can cover
    --> ? (plastic cat) (food can) cover
    --> ? plastic (cat food can) cover
    --> ? (plastic cat food) (can cover)

Conjunctions and appositives
  Maddy, my dog, and Samy
    --> ? (Maddy, my dog), and (Samy)
    --> ? (Maddy), (my dog), and (Samy)
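
To get a feel for how quickly NP bracketing ambiguity grows, the sketch below counts and enumerates the fully binary bracketings of a word sequence. Restricting to binary groupings is an assumption made for simplicity; the groupings shown on the slide are flatter.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_bracketings(n):
    """Number of fully binary bracketings of a sequence of n words."""
    if n <= 1:
        return 1
    return sum(count_bracketings(k) * count_bracketings(n - k)
               for k in range(1, n))

def bracketings(words):
    """Enumerate every fully binary bracketing as a string."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for k in range(1, len(words)):
        for left in bracketings(words[:k]):
            for right in bracketings(words[k:]):
                results.append(f"({left} {right})")
    return results

words = "plastic cat food can cover".split()
print(count_bracketings(len(words)))  # 14
print(bracketings(words)[0])
```

These counts are the Catalan numbers, so the number of candidate structures grows very fast with the length of the noun compound.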

Another Ambiguity: Garden-Path Sentences

well-studied class of syntactic ambiguity
the sentence is re-analysed when the last word is encountered
humans have difficulty analysing such sentences

Example:
  The horse raced past the barn fell.
  (the horse that was raced past the barn) fell.

Garden Path: Wrong Parse

[S [NP The horse] [VP raced past the barn]] fell

dt: determiner
n: noun
v: verb
p: preposition
S: sentence
NP: noun phrase
VP: verb phrase
PP: prepositional phrase

Garden Path: Right Parse

[S [NP The horse [PAP raced past the barn]] [VP fell]]

dt: determiner
n: noun
v: verb
p: preposition
S: sentence
NP: noun phrase
VP: verb phrase
PP: prepositional phrase
PAP: passive phrase

Semantics

the study of the meaning of words, constructions, and utterances

can be divided into two parts:
  lexical semantics
    meaning of words
  compositional semantics
    meaning of sentences and discourse
    the meaning of the whole often differs from the meaning of the parts

Lexical Semantics

Meaning of individual words
  I went to the bank of Montreal and deposited 50$.
  I went to the bank of the river and dangled my feet.

Word Sense Disambiguation
  Determining which sense of a word is used in a specific sentence

Semantic relations between words:
  hypernymy, hyponymy, synonymy, antonymy, meronymy, holonymy, polysemy, homonymy and homophony.
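
One simple way to picture word sense disambiguation is to pick the sense whose typical context words overlap most with the sentence, in the spirit of the simplified Lesk algorithm. The sense labels and signature word sets below are hand-written assumptions, not taken from any dictionary.

```python
# Assumed sense signatures for "bank": words that tend to co-occur with each sense.
SENSES = {
    "bank/finance": {"money", "deposit", "deposited", "account", "loan"},
    "bank/river": {"river", "water", "shore", "feet", "fishing"},
}

def disambiguate(sentence):
    """Return the sense whose signature shares the most words with the sentence."""
    context = set(sentence.lower().replace(".", " ").split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))

print(disambiguate("I went to the bank of Montreal and deposited 50$."))
print(disambiguate("I went to the bank of the river and dangled my feet."))
```

With no overlapping words at all, `max` simply returns the first sense listed, so a real system would need a better fallback such as the most frequent sense.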

Meaning of sentences

The cat eats the mouse = The mouse is eaten by the cat.

Goal:
  build a representation of the meaning of the sentence
  attach semantic roles to constituents

Some characteristics of a sentence that influence semantic interpretation:
  Type         declarative, interrogative, imperative, exclamatory
  Polarity     positive, negative
  Tense        past, present, future
  Voice        active, passive

Some semantic roles (different from syntactic roles):
  Agent        the doer of a volitional act
  Patient      the thing that is affected by an act
  Recipient    the receiver of an object
  Instrument   the instrument used to perform an act
  Time         the time the act is performed
  Location     the location of an act or object
  …

Semantic Roles

Ex:
  John [AGENT] hit Peter [PATIENT] with a ball [INSTRUMENT].

Ex:
  I ate spaghetti with meatballs [INGREDIENT_OF_SPAGHETTI]
  I ate spaghetti with salad [SIDE_DISH_OF_SPAGHETTI]
  I ate spaghetti with a fork [INSTRUMENT]
  I ate spaghetti with a friend [ACCOMPANIER_OF_EATING]

Important for machine translation…
  I [AGENT: PERSON_LACKING_SOMEONE] miss you [PATIENT: PERSON_MISSED]
  ?Je [PATIENT: PERSON_MISSED] te [AGENT: PERSON_LACKING_SOMEONE] manque.
  Tu [PATIENT: PERSON_MISSED] me [AGENT: PERSON_LACKING_SOMEONE] manques.

Pragmatics

goes beyond the study of the meaning of a sentence

tries to explain what the speaker is really expressing

understanding how people use language socially (ex. figures of speech, speech acts, discourse analysis, …)
  Ex: Could you spare some change?

Discourse Analysis

In logic: A ∧ B ∧ C ≡ C ∧ B ∧ A

Not in NL:
  John visited Paris. He bought Mary some expensive cologne. Then he flew home. He went to Kmart. He bought some underwear.
  John visited Paris. Then he flew home. He went to Kmart. He bought Mary some expensive cologne. He bought some underwear.

NL text must be coherent
  ? Bill went to see his mother. The trunk is what makes the bonsai, it gives it both its grace and power.

Using world knowledge

Using our general knowledge of the world to interpret a sentence/discourse

Ex: A man was killed yesterday because a jealous husband returned home earlier than usual.

Ex: Silence of the lambs…