Natural Language Processing

Chapter Two: Syntax

Binyam Tekalign, Debre Birhan University, 28 October 2014

Natural Language Processing

• NLP is the branch of computer science focused on developing systems that allow computers to communicate with people using everyday language.

• Also called Computational Linguistics

• Also concerns how computational methods can aid the understanding of human language

Word Classes and Part-of-Speech Tagging

Word Classes and POS Tagging8

Background

• Part of speech:

• Noun, verb, pronoun, preposition, adverb, conjunction, particle, and article

• Recent lists of POS (also known as word classes, morphological classes, or lexical tags) have much larger numbers of word classes.

• 45 for Penn Treebank

• http://www.cs.colorado.edu/~martin/SLP/Figures/

• 87 for the Brown corpus,

• http://www.scs.leeds.ac.uk/amalgam/tagsets/brown.html

• 146 for the C7 tagset

• http://www.comp.lancs.ac.uk/ucrel/claws7tags.html

Why Do We Care about Parts of Speech?

• Predicting what words can be expected next

Personal pronoun (e.g., I, she) ____________

• Stemming

-s marks the third-person singular on verbs but the plural on nouns

• As the basis for syntactic parsing and then meaning extraction

I will lead the group into the lead smelter.

• Machine translation

• Text summarization

English Word Classes

• Two broad subcategories of POS:

1. Closed class

2. Open class

Cont..

1. Closed class

– Having relatively fixed membership, e.g., prepositions

– Function words:

• Grammatical words like of, and, or, you

• Very short, occur frequently, and play an important role in grammar.

2. Open class

• Four major open classes occurring in the languages of the world:

• nouns, verbs, adjectives, and adverbs.

Open Class: Noun

• The name given to the syntactic class in which the words for most people, places, or things occur

• Thus, nouns include

• Concrete terms like ship and chair,

• Abstractions like bandwidth and relationship, and

• Verb-like terms like pacing

• Nouns in English

• Tend to occur with determiners (a goat),

• To take possessives (IBM’s annual revenue), and

• To occur in the plural form (goats)

Open Class: Noun

• Nouns are traditionally grouped into proper nouns and common nouns.

• Proper nouns:

• IBM, Abebe

• Common nouns

• Count nouns:

• occur in both singular and plural (goat/goats)

• Mass nouns:

• snow, salt

Open Class: Verb

• Most of the words referring to actions and processes including main verbs like

• draw, provide, differ, and go.

• A number of morphological forms:

• non-3rd-person singular (eat),

• 3rd-person singular (eats),

• progressive (eating),

• past participle (eaten)

Open Class: Adjectives

• Terms describing properties or qualities

• Most languages have adjectives for the concepts of color (white, black), age (old,

young), and value (good, bad), but

• There are languages without adjectives, e.g., Chinese.


Open Class: Adverbs

• Words viewed as modifying something (often verbs)

• Directional (or locative) adverbs: specify the direction or location of some action,

• here, downhill

• Degree adverbs: specify the extent of some action, process, or property,

• extremely, very, somewhat

• Manner adverbs: describe the manner of some action or process,

• slowly, slinkily, delicately

• Temporal adverbs: describe the time that some action or event took place,

• yesterday, Monday

Closed Classes

• Some important closed classes in English

• Prepositions: on, under, over, near, by, at, from, to, with

• Determiners: a, an, the

• Pronouns: she, who, I, others

• Conjunctions: and, but, or, as, if, when

• Auxiliary verbs: can, may, should, are

• Particles: up, down, on, off, in, out, at, by

• Numerals: one, two, three, first, second, third

Prepositions

• Prepositions occur before noun phrases; semantically, they are relational

Prepositions (and particles) of English from CELEX

Particles

• A particle is a word that resembles a preposition or an adverb and that often combines with a verb to form a larger unit called a phrasal verb

English single-word particles from Quirk et al. (1985)

Conjunctions

• Conjunctions are used to join two phrases, clauses, or sentences.

• and, or, but

Coordinating and subordinating conjunctions of English from the CELEX on-line dictionary.

Closed Classes: Pronouns

• Pronouns act as a kind of shorthand for referring to some noun phrase, entity, or event.

• Personal pronouns: refer to persons or entities (you, she, I, it, me, etc.)

• Possessive pronouns: forms of personal pronouns indicating actual possession or just an abstract relation between the person and some object.

• Wh-pronouns: used in certain question forms, or may act as complementizers.

Pronouns of English from the CELEX on-line dictionary.

Closed Classes: Auxiliary Verbs

• Auxiliary verbs: mark certain semantic features of a main verb

English modal verbs from the CELEX on-line dictionary.

Closed Classes: Others

• Interjections: oh, ah, hey, man

• Negatives: no, not

• Politeness markers: please, thank you

• Greetings: hello, goodbye

Tagsets for English

• There are a small number of popular tagsets for English, many of which evolved from the 87-tag tagset used for the Brown corpus.

• Three are commonly used:

• The small 45-tag Penn Treebank tagset

• The medium-sized 61-tag C5 tagset used by the Lancaster UCREL project’s CLAWS tagger to tag the British National Corpus, and

• The larger 146-tag C7 tagset

Penn Treebank POS tags


Part-of-Speech Tagging

• POS tagging (tagging)

• The process of assigning a POS or other lexical marker to each word in a corpus.

• Also applied to punctuation marks

• Tags for natural language are much more ambiguous than tokens in computer languages.

• Taggers play an increasingly important role in speech recognition, NL parsing, and IR

An Example

WORD: the girl kissed the boy on the cheek

LEMMA: the girl kiss the boy on the cheek

TAG: +DET +NOUN +VPAST +DET +NOUN +PREP +DET +NOUN

Labelling words for POS can be done by

• dictionary lookup

• morphological analysis

• “tagging”

Part-of-Speech Tagging

• The input to a tagging algorithm is a string of words and a specified tagset of the kind described previously.

VB DT NN .
Book that flight .

VBZ DT NN VB NN ?
Does that flight serve dinner ?

• Automatically assigning a tag to a word is not trivial

– For example, book is ambiguous: it can be a verb or a noun

– Similarly, that can be a determiner or a complementizer.

Part-of-Speech Tagging

• Many tagging algorithms fall into two classes:

• Rule-based taggers

• Involve a large database of hand-written disambiguation rules

• Typically more than 1000 hand-written rules

• Stochastic taggers

• Resolve tagging ambiguities by using a training corpus to compute the probability of a given word having a given tag in a given context.
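The stochastic idea above can be sketched in its simplest form as a most-frequent-tag tagger. This is only an illustration under stated assumptions: the three training sentences, the function names, and the default tag for unknown words are all made up here, and the sketch ignores context entirely, which a real stochastic tagger would not do.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made training data: sentences of (word, tag) pairs.
TRAINING = [
    [("book", "VB"), ("that", "DT"), ("flight", "NN")],
    [("the", "DT"), ("book", "NN"), ("is", "VBZ"), ("here", "RB")],
    [("read", "VB"), ("that", "DT"), ("book", "NN")],
]


def train(sentences):
    """Count how often each word occurs with each tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts


def tag_probability(counts, word, tag):
    """Relative-frequency estimate of P(tag | word)."""
    total = sum(counts[word].values())
    return counts[word][tag] / total if total else 0.0


def tag(words, counts, default="NN"):
    """Give every word its most frequent training tag (no context used)."""
    tagged = []
    for word in words:
        seen = counts.get(word.lower())
        best = seen.most_common(1)[0][0] if seen else default
        tagged.append((word, best))
    return tagged


counts = train(TRAINING)
result = tag(["Book", "that", "flight"], counts)
print(result)
```

With this toy data, book is seen twice as a noun and once as a verb, so the command Book that flight gets the noun tag for Book. That error is exactly why the slide stresses the tag "in a given context".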

Syntax

• Syntax: from Greek syntaxis, “setting out together”

• Refers to the way words are arranged together and the relationships between them.

• The goal of syntax is

• to model the knowledge that people unconsciously have about the grammar of their native language

Main ideas of syntax:

• Constituency

• Groups of words may behave as a single unit or phrase

• e.g., NP

• Grammatical relations

• A formalization of ideas from traditional grammar about SUBJECT and OBJECT

• e.g., in She ate her breakfast, She is the subject and her breakfast is the object

• Subcategorization and dependencies

• Referring to certain kinds of relations between words and phrases,

• e.g., the verb want can be followed by an infinitival phrase, as in I want to fly to Detroit.

Constituency

• NP:

• A sequence of words surrounding at least one noun, e.g.,

• three parties from Brooklyn arrive …

• they sit

• the reason he comes into the Hot Box

• Preposed or postposed constructions,

• e.g., the PP on September seventeenth can be placed in a number of different locations

• On September seventeenth, I’d like to fly from Atlanta to Denver.

• I’d like to fly on September seventeenth from Atlanta to Denver.

• I’d like to fly from Atlanta to Denver on September seventeenth.

NPs

• NP -> Pronoun

• I came, you saw it, they conquered

• NP -> Proper-Noun

• Los Angeles is west of Texas

• John Hennessy is the president of Stanford

• NP -> Det Noun

• The president

• NP -> Nominal

• Nominal -> Noun Noun

• A morning flight to Denver

PPs

• PP -> Preposition NP

• From LA

• To the store

• On Tuesday morning

• With lunch

Syntax

• Why should we care?

• Grammar checkers

• Question answering

• Information extraction

• Machine translation

Context-Free Grammars (CFG)

Cont..

• A context-free grammar is a notation for describing languages.

• It is more powerful than finite automata or regular expressions,

• but it still cannot define all possible languages.

04/21/2023 Speech and Language Processing - Jurafsky and Martin

Context-Free Grammars

• Terminals

• We’ll take these to be words

• Non-Terminals

• The elements in a language

• Like Noun, Noun phrase NP, verb phrase VP and sentence S

• A start symbol S, which is a member of the non-terminals

• Rules

• Rules are equations that consist of a single non-terminal on the left and any number of

terminals and non-terminals on the right.

Cont..

A → B C

Means that A can be rewritten as B followed by C, regardless of the context in which A is found

S → NP VP

• A language that is defined by some CFG is called a context-free language.

Cont..

• Noun → flight | breeze | trip | morning | …

• Verb → is | prefer | like | need | want | fly …

• Adjective → cheapest | non-stop | first | latest

• Pronoun → me | I | you | it | …

• Proper-Noun → Alaska | Baltimore | Chicago | …

• Determiner → the | a | an | this | these | that

• Preposition → from | to | on | near | …

• Conjunction → and | or | but | …

Rules (rule on the left, example on the right)

S → NP VP                  I + want a morning flight

NP → Pronoun               I

   | Proper-Noun           Los Angeles

   | Det Nominal           a + flight

Nominal → Noun Nominal     morning + flight

   | Noun                  flights

VP → Verb                  do

   | Verb NP               want + a flight

   | Verb NP PP            leave + Boston + in the morning

   | Verb PP               leaving + on Thursday

PP → Preposition NP        from + Los Angeles
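The rules above can be read as rewrite instructions. The following sketch, which is an illustration and not part of the original slides, stores a fragment of the grammar as a Python dictionary and replays a leftmost derivation of I want a morning flight. The dictionary layout, the function name, and the idea of passing rule choices as indices are assumptions made for the example.

```python
# Fragment of the slide grammar: non-terminal -> list of right-hand sides.
# Only the lexical entries needed for the example sentence are included.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal"]],
    "Nominal": [["Noun", "Nominal"], ["Noun"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "NP", "PP"], ["Verb", "PP"]],
    "PP": [["Preposition", "NP"]],
    "Pronoun": [["I"]],
    "Det": [["a"]],
    "Noun": [["morning"], ["flight"]],
    "Verb": [["want"]],
}


def leftmost_derivation(start, choices):
    """Rewrite the leftmost non-terminal once per entry in `choices`.

    Each choice is the index of the rule to apply for that non-terminal.
    Returns every sentential form, from the start symbol to the last one.
    """
    form = [start]
    steps = [list(form)]
    for choice in choices:
        idx = next(i for i, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:idx] + GRAMMAR[form[idx]][choice] + form[idx + 1:]
        steps.append(list(form))
    return steps


# S => NP VP => Pronoun VP => I VP => I Verb NP => ... => I want a morning flight
steps = leftmost_derivation("S", [0, 0, 0, 1, 0, 2, 0, 0, 0, 1, 1])
for form in steps:
    print(" ".join(form))
```

Each rule is applied without looking at the neighbouring symbols, which is what "context-free" refers to.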

Sentence-Level Constructions

• There are a great number of possible overall sentence structures, but

• four are particularly common and important:

• Declarative structure,

• imperative structure,

• yes-no-question structure,

• wh-question structure.

Sentence-Types

• Declaratives: A plane left

S -> NP VP

• Imperatives: Leave!

S -> VP

• Yes-No Questions: Did the plane leave?

S -> Aux NP VP

• WH Questions: When did the plane leave?

S -> WH Aux NP VP

Sentences with declarative structure

• –A subject NP followed by a VP

• The flight should be eleven a.m. tomorrow.

• I need a flight to Seattle leaving from Baltimore making a stop in Minneapolis.

• The return flight should leave at around seven p.m.

• I want a flight from Atlanta to Chicago.

• I plan to leave on July first around six thirty in the evening.

• S → NP VP

Sentence with imperative structure

– Often begin with a VP and have no subject.

– Almost always used for commands and suggestions

• Show the lowest fare.

• Show me the cheapest fare that has lunch.

• List all flights between five and seven p.m.

• Show me all the flights leaving Baltimore.

• Show me flights arriving within thirty minutes of each other.

• Show me the last flight to leave.

– S → VP

Sentences with yes-no-question structure

– Begin with auxiliary, followed by a subject NP, followed by a VP.

• Do any of these flights have stops?

• Does American’s flight eighteen twenty five serve dinner?

• Can you give me the same information for United?

– S → Aux NP VP

The wh-subject-question structure

– Identical to the declarative structure, except that the first NP contains

some wh-word.

• What airlines fly from Burbank to Denver?

• Which flights serve breakfast?

• Which of these flights have the longest layover in Nashville?

– S → Wh-NP VP

• The wh-non-subject-question structure

• What flights do you have from Atlanta to Washington?

– S → Wh-NP Aux NP VP

Auxiliaries

• Auxiliaries or helping verbs

– A subclass of verbs

– Including the modal verbs can, could, may, might, must, will, would, shall, and should

– The perfect auxiliary have,

– The progressive auxiliary be, and

– The passive auxiliary be.

Parsing and Syntax

Parsing

• derive the syntactic structure of a sentence based on a language model (grammar)

• construct a parse tree, i.e. the derivation of the sentence based on the grammar (rewrite system)

Outline

• Language, Syntax, Parsing

• Problems in Parsing: Ambiguity

• Bottom-up vs. Top-down Parsing

• Chart Parsing

• Earley Algorithm

04/21/2023 COSC 709: Natural Language Processing 56

Sample Grammar

Grammar (S, NT, T, P)

Sentence symbol S ∈ NT; parts of speech ⊂ NT; constituents ⊂ NT

Terminals (words) T

Grammar rules P ⊂ NT × (NT ∪ T)*

S → NP VP (statement)

S → Aux NP VP (question)

S → VP (command)

NP → Det Nominal

NP → Proper-Noun

Nominal → Noun | Noun Nominal | Nominal PP

VP → Verb | Verb NP | Verb PP | Verb NP PP

PP → Prep NP

Det → that | this | a

Noun → book | flight | meal | money

Proper-Noun → Houston | American Airlines | TWA

Verb → book | include | prefer

Aux → does

Prep → from | to | on

Parsing Task

Parse "Does this flight include a meal?"

Sample Parse Tree

(S (Aux does) (NP (Det this) (Nominal (Noun flight))) (VP (Verb include) (NP (Det a) (Nominal (Noun meal)))))

Problems in Parsing

Ambiguity

“Peter saw Mary with the telescope”

syntactic/structural ambiguity – several parse trees are possible, e.g. for the sentence above

semantic/lexical ambiguity – several word meanings, e.g. bank (where you get money) and (river) bank

even different word categories are possible (interim), e.g. “He books the flight.” vs. “The books are here.”
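Structural ambiguity can be made concrete by counting parse trees. The sketch below is an illustration added here, not the method from the slides: it uses a CKY-style table over a tiny hand-written grammar in which every rule has exactly two symbols on the right. The grammar, the lexicon, and the function name are assumptions chosen so that the telescope sentence has both readings.

```python
from collections import defaultdict

# Hypothetical binary grammar: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the seeing event
    ("NP", "NP", "PP"),   # the PP modifies Mary
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "peter": ["NP"], "mary": ["NP"], "saw": ["V"],
    "with": ["P"], "the": ["Det"], "telescope": ["N"],
}


def count_parses(sentence):
    """Return the number of distinct parse trees rooted in S."""
    words = sentence.lower().split()
    n = len(words)
    # trees[(i, j, cat)] = number of trees of category cat over words[i:j]
    trees = defaultdict(int)
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            trees[i, i + 1, cat] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):          # split point
                for parent, left, right in BINARY_RULES:
                    trees[i, j, parent] += trees[i, k, left] * trees[k, j, right]
    return trees[0, n, "S"]


ambiguous = count_parses("Peter saw Mary with the telescope")
unambiguous = count_parses("Peter saw Mary")
print(ambiguous, unambiguous)
```

The two trees differ only in where the PP with the telescope attaches: to the verb phrase or to the noun phrase Mary.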

Bottom-up and Top-down Parsing

• Bottom-up – from word nodes to the sentence symbol

• Top-down – from the sentence symbol to words

(Figure: the sample parse tree for “does this flight include a meal”.)

Top Down Parsing

Input: book that flight. A failed expansion is marked X.

1. S → NP VP, NP → Pronoun: book is not a pronoun. X

2. S → NP VP, NP → ProperNoun: book is not a proper noun. X

3. S → NP VP, NP → Det Nominal: book is not a determiner. X

4. S → Aux NP VP: book is not an auxiliary. X

5. S → VP, VP → Verb: book matches Verb, but that is left over. X

6. S → VP, VP → Verb NP: book matches Verb.

7. NP → Pronoun: that is not a pronoun. X

8. NP → ProperNoun: that is not a proper noun. X

9. NP → Det Nominal: that matches Det.

10. Nominal → Noun: flight matches Noun; the whole input is covered and the parse succeeds.
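The top-down search traced above can be sketched as a small backtracking recognizer. This is an illustration under stated assumptions, not the slides' own code: the function names are made up, the Pronoun entries are invented because the sample grammar lists none, multi-word proper nouns are left out, and the left-recursive rule Nominal → Nominal PP is omitted on purpose because a naive top-down parser would loop on it.

```python
# Sample grammar without the left-recursive rule Nominal -> Nominal PP.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Pronoun"], ["Proper-Noun"], ["Det", "Nominal"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],
    "VP": [["Verb"], ["Verb", "NP"], ["Verb", "PP"], ["Verb", "NP", "PP"]],
    "PP": [["Prep", "NP"]],
}
LEXICON = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Proper-Noun": {"houston", "twa"},
    "Pronoun": {"i", "she"},          # assumed entries, not in the slides
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
}


def parse(symbol, words, start):
    """Yield every position where `symbol`, starting at `start`, can end."""
    if symbol in LEXICON:
        if start < len(words) and words[start] in LEXICON[symbol]:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:       # try each rule in order, backtracking
        yield from parse_sequence(rhs, words, start)


def parse_sequence(symbols, words, start):
    """Yield end positions for a sequence of symbols matched left to right."""
    if not symbols:
        yield start
        return
    for middle in parse(symbols[0], words, start):
        yield from parse_sequence(symbols[1:], words, middle)


def recognize(sentence):
    """True if some expansion of S covers the whole sentence."""
    words = sentence.lower().split()
    return any(end == len(words) for end in parse("S", words, 0))


print(recognize("book that flight"))
print(recognize("does this flight include a meal"))
print(recognize("that book flight"))
```

The rule order in the dictionary mirrors the trace: S → NP VP and S → Aux NP VP are tried and abandoned before S → VP succeeds.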

Bottom Up Parsing

Input: book that flight. A failed attempt is marked X.

1. book is first taken as a Noun and reduced to Nominal (Nominal → Noun).

2. Nominal → Nominal Noun is tried, but the next word that is not a noun. X

3. Nominal → Nominal PP is tried, which needs a PP after book.

4. that is a Det; flight is a Noun and reduces to Nominal; NP → Det Nominal gives the NP that flight.

5. No PP can be built from that flight, and no VP or S can be built over Nominal followed by NP. X

6. book is retaken as a Verb.

7. VP → Verb gives a VP over book alone; S → VP then leaves that flight uncovered. X

8. VP → VP PP is tried, but that flight is not a PP. X

9. VP → Verb NP combines book with the NP that flight.

10. S → VP covers the whole input and the parse succeeds.
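The bottom-up search traced above can be sketched as an exhaustive reduction search: start from the words, replace any matching right-hand side by its left-hand side, and stop when only S remains. This is a deliberately naive illustration, not the slides' own procedure; the breadth-first strategy, the names, and the lower-cased single-word lexicon are assumptions.

```python
from collections import deque

# Sample grammar as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")), ("NP", ("Proper-Noun",)),
    ("Nominal", ("Noun",)), ("Nominal", ("Noun", "Nominal")),
    ("Nominal", ("Nominal", "PP")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
    ("VP", ("Verb", "PP")), ("VP", ("Verb", "NP", "PP")),
    ("PP", ("Prep", "NP")),
]
# Word -> possible parts of speech (book is ambiguous).
LEXICON = {
    "that": ["Det"], "this": ["Det"], "a": ["Det"],
    "book": ["Noun", "Verb"], "flight": ["Noun"], "meal": ["Noun"],
    "money": ["Noun"], "include": ["Verb"], "prefer": ["Verb"],
    "does": ["Aux"], "from": ["Prep"], "to": ["Prep"], "on": ["Prep"],
    "houston": ["Proper-Noun"], "twa": ["Proper-Noun"],
}


def successors(form):
    """All forms reachable from `form` by one lexical or rule reduction."""
    for i, symbol in enumerate(form):
        for category in LEXICON.get(symbol, []):
            yield form[:i] + (category,) + form[i + 1:]
    for lhs, rhs in RULES:
        width = len(rhs)
        for i in range(len(form) - width + 1):
            if form[i:i + width] == rhs:
                yield form[:i] + (lhs,) + form[i + width:]


def bottom_up_recognize(sentence):
    """True if the words can be reduced to the single symbol S."""
    start = tuple(sentence.lower().split())
    queue, seen = deque([start]), {start}
    while queue:
        form = queue.popleft()
        if form == ("S",):
            return True
        for nxt in successors(form):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(bottom_up_recognize("book that flight"))
print(bottom_up_recognize("flight that book"))
```

The search also builds structures such as Nominal over book that never lead to S, which is the inefficiency the next slide points out.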

Problems with Bottom-up and Top-down Parsing

• Problems with left-recursive rules like NP → NP PP:

• don’t know how many times recursion is needed

• Pure bottom-up or top-down parsing is inefficient because

• it generates and explores too many structures which in the end turn out to be invalid.

• Combine the top-down and bottom-up approaches:

• Start with the sentence; use rules top-down (look-ahead); read input; try to find the shortest path from the input to the highest unparsed constituent (from left to right).

• Chart Parsing / Earley Parser

Chart Parsing / Earley Algorithm

• The Earley parser is based on chart parsing

• Essence: Integrate top-down and bottom-up parsing.

• Top-down:

• Start with the S symbol.

• Generate all applicable rules for S.

• Go further down with the left-most constituent in the rules and add rules for these constituents until you encounter a left-most node on the RHS which is a word category (POS).

• Bottom-up:

• Read the input word and compare.

• If the word matches, mark it as recognized and move parsing on to the next category in the rule(s).

Chart

A chart is a graph with n+1 nodes marked 0 to n for a sequence of n input words. Arcs indicate the recognized part of the RHS of a rule. The • indicates recognized constituents in rules.

A directed acyclic graph representation of the three dotted rules above.

Chart Parsing / Earley Parser 1

Chart

Sequence of n input words; n+1 nodes marked 0 to n.

States in the chart represent possible rules and recognized constituents.

The recognized part of the RHS of a rule is covered by an arc.

Interim state

S → • VP, [0,0]

top-down look at rule S → VP

nothing of the RHS of the rule is recognized yet (• is far left)

arc at the beginning, no coverage (covers no input word; the arc begins and ends at node 0)

Chart Parsing / Earley Parser 2

Interim states

NP → Det • Nominal, [1,2]

top-down look with rule NP → Det Nominal

Det recognized (• after Det)

arc covers one input word, which is between node 1 and node 2

look next for Nominal, top-down

NP → Det Nominal •, [1,3]

Nominal was recognized, move • after Nominal

move end of arc to cover Nominal; change 2 to 3

structure is completely recognized; arc is inactive;

mark NP as recognized in other rules (move •), bottom-up
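A chart state as described above is a rule, a dot position, and a pair of node numbers. One possible way to hold this in code is sketched below; the class name, field names, and method names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """A dotted rule plus the span [start, end] of its arc in the chart."""
    lhs: str
    rhs: tuple
    dot: int      # number of RHS symbols already recognized
    start: int    # node where the arc begins
    end: int      # node where the arc currently ends

    def is_complete(self):
        """The dot is at the far right: the whole RHS is recognized."""
        return self.dot == len(self.rhs)

    def next_category(self):
        """The symbol right of the dot, or None for a complete state."""
        return None if self.is_complete() else self.rhs[self.dot]

    def advance(self, new_end):
        """Move the dot over one symbol and extend the arc to new_end."""
        return State(self.lhs, self.rhs, self.dot + 1, self.start, new_end)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} → {' '.join(symbols)}, [{self.start},{self.end}]"


initial = State("S", ("VP",), 0, 0, 0)
partial = State("NP", ("Det", "Nominal"), 1, 1, 2)
finished = partial.advance(3)
print(initial)
print(partial)
print(finished)
```

The three printed states correspond to the three interim states shown on the slides: nothing recognized, Det recognized, and the complete NP.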

Charts 0 to 3d for “Book this flight” (dotted rules on the arcs of each chart)

Chart 0: S → • VP; VP → • V NP

Chart 1: V (Book); VP → V • NP; S → • VP; NP → • Det Nom

Chart 2: V (Book); Det (this); VP → V • NP; S → • VP; NP → Det • Nom

Chart 3a: V (Book); Det (this); Noun (flight); Nom → Noun •; NP → Det • Nom; VP → V • NP; S → • VP

Chart 3b: as 3a, but with NP → Det Nom •

Chart 3c: as 3b, but with VP → V NP •

Chart 3d: as 3c, but with S → VP •

Earley Parser

Earley Algorithm - Functions

predictor

generates new rules for a partly recognized RHS with a constituent right of the • (top-down generation)

scanner

if a word category (POS) is found right of the •, the scanner reads the next input word and adds a rule for it to the chart (bottom-up mode)

completer

if a rule is completely recognized (the • is far right), the recognition state of earlier rules in the chart advances: the • is moved over the recognized constituent (bottom-up recognition).

Earley – Chart for “book that flight” (from 2nd edition)

Earley-Algorithm

function EARLEY-PARSE(words, grammar) returns chart
    ENQUEUE((γ → • S, [0,0]), chart[0])
    for i from 0 to LENGTH(words) do
        for each state in chart[i] do
            if INCOMPLETE?(state) and NEXT-CAT(state) is not a part of speech then
                PREDICTOR(state)
            elseif INCOMPLETE?(state) and NEXT-CAT(state) is a part of speech then
                SCANNER(state)
            else
                COMPLETER(state)
        end
    end
    return(chart)

procedure PREDICTOR((A → α • B β, [i,j]))
    for each (B → γ) in GRAMMAR-RULES-FOR(B, grammar) do
        ENQUEUE((B → • γ, [j,j]), chart[j])
    end

procedure SCANNER((A → α • B β, [i,j]))
    if B ∈ PARTS-OF-SPEECH(word[j]) then
        ENQUEUE((B → word[j] •, [j,j+1]), chart[j+1])
    end

procedure COMPLETER((B → γ •, [j,k]))
    for each (A → α • B β, [i,j]) in chart[j] do
        ENQUEUE((A → α B • β, [i,k]), chart[k])
    end

procedure ENQUEUE(state, chart-entry)
    if state is not already in chart-entry then
        PUSH(state, chart-entry)
    end
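The pseudocode above translates fairly directly into Python. The version below is a minimal recognizer sketch under stated assumptions: states are plain tuples, the dummy start symbol is called GAMMA, words are lower-cased, and multi-word proper nouns are left out of the lexicon. It only answers whether the sentence is accepted; it does not build parse trees.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("Proper-Noun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal"), ("Nominal", "PP")],
    "VP": [("Verb",), ("Verb", "NP"), ("Verb", "PP"), ("Verb", "NP", "PP")],
    "PP": [("Prep", "NP")],
}
PARTS_OF_SPEECH = {
    "Det": {"that", "this", "a"},
    "Noun": {"book", "flight", "meal", "money"},
    "Proper-Noun": {"houston", "twa"},
    "Verb": {"book", "include", "prefer"},
    "Aux": {"does"},
    "Prep": {"from", "to", "on"},
}


def enqueue(state, chart_entry):
    """Add a state only if it is not already in the chart entry."""
    if state not in chart_entry:
        chart_entry.append(state)


def earley_recognize(sentence):
    """True if the grammar accepts the sentence."""
    words = sentence.lower().split()
    chart = [[] for _ in range(len(words) + 1)]
    # A state is (lhs, rhs, dot, start, end).
    enqueue(("GAMMA", ("S",), 0, 0, 0), chart[0])
    for i in range(len(words) + 1):
        for state in chart[i]:        # the list may grow while we iterate
            lhs, rhs, dot, start, end = state
            if dot < len(rhs) and rhs[dot] not in PARTS_OF_SPEECH:
                # PREDICTOR: expand the non-terminal right of the dot.
                for expansion in GRAMMAR[rhs[dot]]:
                    enqueue((rhs[dot], expansion, 0, end, end), chart[end])
            elif dot < len(rhs):
                # SCANNER: the next category is a POS; check the next word.
                if end < len(words) and words[end] in PARTS_OF_SPEECH[rhs[dot]]:
                    enqueue((rhs[dot], (words[end],), 1, end, end + 1),
                            chart[end + 1])
            else:
                # COMPLETER: advance every state that was waiting for lhs.
                for w_lhs, w_rhs, w_dot, w_start, _ in chart[start]:
                    if w_dot < len(w_rhs) and w_rhs[w_dot] == lhs:
                        enqueue((w_lhs, w_rhs, w_dot + 1, w_start, end),
                                chart[end])
    return ("GAMMA", ("S",), 1, 0, len(words)) in chart[len(words)]


print(earley_recognize("book that flight"))
print(earley_recognize("book that flight from houston"))
print(earley_recognize("that book flight"))
```

Because ENQUEUE refuses duplicates, the left-recursive rule Nominal → Nominal PP is predicted only once per position, so the loop that troubles a pure top-down parser does not occur here.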

Earley-Algorithm (copy from 2nd edition): figures showing the main function, the procedures, and the complete algorithm

Chart-Parser Algorithm (just FYI)

http://tomato.banatao.berkeley.edu:8080/parser/parser.html

• I have a car

• I have expensive car

• What are you doing

• I am running

• Get out of jail

• Do not do that

• Could you please give me the coffee

• http://nlpdotnet.com/services/Tagger.aspx

• http://textanalysisonline.com/nltk-stanford-postagger


Sentence-Types

• Declaratives: A plane left

S -> NP VP

• Imperatives: Leave!

S -> VP

• Yes-No Questions: Did the plane leave? Does he know the case?

S -> Aux NP VP

• WH Questions: When did the plane leave?

S -> WH Aux NP VP

An Exercise: The city hall parking lot in town

• NP → NP NP PP

• NP → Det Nom

• NP → Adj Nom

• NP → Nom Nom

• Nom → NP Nom

• Nom → N

• PP → Prep NP

• N → city | hall | lot | town

• Adj → parking

• Prep → to | for | in

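Earley chart for “I have a car”.

State                 Operation

S → • NP VP           Predictor
S → • Aux NP VP       Predictor
S → • VP              Predictor
VP → • V NP           Predictor
VP → • V NP           Predictor
VP → • V PP           Predictor
NP → • Det Nom        Predictor
NP → • Pro            Predictor
Nom → • N             Predictor
Pro → I •             Scanner
NP → Pro •            Completer
S → NP • VP           Completer
VP → • V NP           Predictor
VP → • V NP PP        Predictor
VP → • V PP           Predictor
V → have •            Scanner
VP → V • NP           Completer
VP → V • PP           Completer
NP → • Det Nom        Predictor
NP → • Pro            Predictor
Det → a •             Scanner
NP → Det • Nom        Completer
Nom → • N Nom         Predictor
Nom → • N PP          Predictor
Nom → • N             Predictor
N → car •             Scanner
Nom → N •             Completer
NP → Det Nom •        Completer
VP → V NP •           Completer
S → NP VP •           Completer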