Human Language Technology
Part of Speech (POS) Tagging II
Rule-based Tagging
April 2005 CLINT Lecture IV 2
Acknowledgment
Most slides taken from Bonnie Dorr’s course notes: www.umiacs.umd.edu/~bonnie/courses/cmsc723-03
Jurafsky & Martin Chapter 5
Bibliography
A. Voutilainen, "Morphological disambiguation", in Karlsson, Voutilainen, Heikkilä and Anttila (eds), Constraint Grammar, pp. 165-284, Mouton de Gruyter, 1995. See [e-book]
EngCG Rule-Based Tagger (Voutilainen 1995)
Rules based on English Constraint Grammar
Two-stage design
Uses the ENGTWOL lexicon
Hand-written disambiguation rules
ENGTWOL Lexicon
Based on the two-level (TWOL) morphology of English (hence the name)
56,000 entries for English word stems
Each entry annotated with morphological and syntactic features
Sample ENGTWOL Lexicon
Examples of constraints (informal)
Discard all verb readings if to the left there is an unambiguous determiner, and between that determiner and the ambiguous word itself there are no nominals (nouns, abbreviations, etc.).
Discard all finite verb readings if the immediately preceding word is "to".
Discard all subjunctive readings if to the left there are no instances of the subordinating conjunction "that" or "lest".
The first constraint would discard the verb reading (next slide)
There are about 1,100 constraints altogether
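Informal constraints of this kind are easy to picture as operations on sets of candidate readings. The following is a minimal toy sketch (not the real EngCG rule engine) of the second constraint above, "discard all finite verb readings if the immediately preceding word is to"; the token representation and the safeguard of never deleting a word's last remaining reading are assumptions for illustration.

```python
# Toy illustration of one EngCG-style constraint (NOT the actual CG rule
# language or engine). A sentence is a list of (word, set-of-readings)
# pairs; a constraint removes readings but never the last one.

def discard_finite_verbs_after_to(sentence):
    """Discard VFIN (finite verb) readings when the previous word is 'to'."""
    for i in range(1, len(sentence)):
        prev_word, _ = sentence[i - 1]
        word, readings = sentence[i]
        if prev_word.lower() == "to" and len(readings) > 1:
            survivors = {r for r in readings if "VFIN" not in r}
            if survivors:  # CG safeguard: keep at least one reading
                sentence[i] = (word, survivors)
    return sentence

# "to move": the finite-verb reading of "move" is removed,
# the infinitive and noun readings survive.
sent = [("to", {"INFMARK"}),
        ("move", {"V INF", "V PRES VFIN", "N NOM SG"})]
discard_finite_verbs_after_to(sent)
```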
Actual Constraint Syntax
Given input: "that"
If
    (+1 A/ADV/QUANT)
    (+2 SENT-LIM)
    (NOT -1 SVOC/A)
Then eliminate non-ADV tags
Else eliminate ADV tag

This rule selects the adverbial sense of that, as in "it isn't that odd", and eliminates the ADV reading in other contexts
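The If/Then/Else structure above can be mirrored directly in code. This is a hedged toy rendering of that one rule, with an assumed token format (a dict holding a word and a set of tag strings); the positional tests `+1`, `+2`, and `-1` become index lookups relative to "that".

```python
# Toy rendering of the "that" rule (assumed token format, not the real
# CG syntax): if the next word is an adjective/adverb/quantifier (+1),
# the word after it is a sentence boundary (+2), and the previous word
# is not an SVOC/A verb (-1), keep only the ADV reading; otherwise
# discard the ADV reading.

A_ADV_QUANT = {"A", "ADV", "QUANT"}

def apply_that_rule(tokens, i):
    """tokens: list of {'word': str, 'readings': set of tag strings}."""
    tok = tokens[i]
    if tok["word"].lower() != "that":
        return
    nxt  = tokens[i + 1] if i + 1 < len(tokens) else None
    nxt2 = tokens[i + 2] if i + 2 < len(tokens) else None
    prev = tokens[i - 1] if i > 0 else None
    cond = (nxt is not None
            and any(r.split()[0] in A_ADV_QUANT for r in nxt["readings"])
            and nxt2 is not None and "SENT-LIM" in nxt2["readings"]
            and not (prev and any("SVOC/A" in r for r in prev["readings"])))
    if cond:
        tok["readings"] = {r for r in tok["readings"] if r.startswith("ADV")}
    else:
        tok["readings"] = {r for r in tok["readings"] if not r.startswith("ADV")}

# "it isn't that odd ." -- the adverbial reading of "that" is selected.
tokens = [
    {"word": "isn't", "readings": {"V PRES VFIN"}},
    {"word": "that",  "readings": {"ADV", "PRON DEM SG",
                                   "DET CENTRAL DEM SG", "CS"}},
    {"word": "odd",   "readings": {"A ABS"}},
    {"word": ".",     "readings": {"SENT-LIM"}},
]
apply_that_rule(tokens, 1)
```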
ENGCG Tagger
Stage 1: Run words through the morphological analyzer to get all possible parts of speech. E.g. for the phrase "the tables", we get the following output:

"<the>"
    "the" <Def> DET CENTRAL ART SG/PL
"<tables>"
    "table" N NOM PL
    "table" <SVO> V PRES SG3 VFIN

Stage 2: Apply constraints to rule out incorrect POS readings
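Stage 1 can be sketched as a lookup that returns every candidate reading for each word. The dictionary below is an assumption for illustration: ENGTWOL itself is a full two-level morphological analyzer operating on 56,000 stems, not a plain word list like this.

```python
# Minimal sketch of Stage 1 (assumed data; the real ENGTWOL analyzer
# derives readings from stems via two-level morphology rather than
# listing inflected forms): look each word up and return ALL candidate
# readings, leaving disambiguation to Stage 2.

LEXICON = {
    "the":    ['"the" <Def> DET CENTRAL ART SG/PL'],
    "tables": ['"table" N NOM PL',
               '"table" <SVO> V PRES SG3 VFIN'],
}

def analyze(words):
    """Return a dict mapping each word to its list of candidate readings."""
    return {w: LEXICON.get(w.lower(), ["<unknown>"]) for w in words}

for word, readings in analyze(["the", "tables"]).items():
    print(f'"<{word}>"')
    for r in readings:
        print(f"    {r}")
```

Note that "tables" comes back two-ways ambiguous (noun vs. finite verb); it is exactly this residual ambiguity that the roughly 1,100 Stage 2 constraints whittle down.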
Example
WORD        TAGS
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS (subord. conj)
salivation  N NOM SG
Performance
Tested on examples from the Wall Street Journal, the Brown Corpus, and the Lancaster-Oslo-Bergen (LOB) Corpus
After application of the rules, 93-97% of all words are fully disambiguated, and 99.7% of all words retain the correct reading.
At the time, this performance was superior to that of other taggers
However, one should not discount the hand-engineering effort needed to create such a system