February 2007 CSA3050: Tagging III and Chunking

CSA2050: Natural Language Processing

Tagging 3 and Chunking

• Transformation Based Tagging
• Chunking

Tagging 3 and Chunking Lecture

• Slides based on Mike Rosner and Marti Hearst notes

• Additions from NLTK tutorials

3 Approaches to Tagging

1. Rule-Based Tagger: ENGTWOL Tagger (Voutilainen 1995)

2. Stochastic Tagger: HMM-based Tagger

3. Transformation-Based Tagger: Brill Tagger (Brill 1995)

Transformation-Based Tagging

• A combination of rule-based and stochastic tagging methodologies:
  – like rule-based tagging: rules are used to specify tags in a certain environment;
  – like stochastic tagging: machine learning is used;
  – the learning method is Transformation-Based Learning (TBL).

Transformation Based Error Driven Learning

[Diagram after Brill (1996): unannotated text → initial state → annotated text; the learner compares the annotated text with the TRUTH and produces transformation rules]

TBL Requirements

• Initial State Annotator

• List of allowable transformations

• Scoring function

• Search strategy

Initial State Annotation

• Input
  – Corpus
  – Dictionary
  – Frequency counts for each entry

• Output
  – Corpus tagged with most frequent tags
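A minimal sketch of the initial state annotator, assuming the dictionary with frequency counts is learned from a small tagged corpus of (word, tag) pairs. The toy corpus, the function names and the fallback tag NN for words missing from the dictionary are illustrative assumptions, not part of the slide.

```python
from collections import Counter, defaultdict

def build_lexicon(tagged_corpus):
    """Dictionary with frequency counts, reduced to word -> most frequent tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}

def initial_state(words, lexicon, default_tag="NN"):
    """Tag every word with its most frequent tag; default_tag is an assumed
    fallback for words that are not in the dictionary."""
    return [lexicon.get(word, default_tag) for word in words]

# Toy training data (illustrative only): "race" is NN twice and VB once.
training = [("the", "DT"), ("race", "NN"), ("to", "TO"), ("race", "NN"),
            ("race", "VB"), ("expected", "VBN")]
lexicon = build_lexicon(training)
print(initial_state(["to", "race", "tomorrow"], lexicon))  # ['TO', 'NN', 'NN']
```

Note that "race" after "to" receives NN here: exactly the kind of error a later transformation is meant to correct.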

TBL Requirements

• Initial State Annotator

• List of allowable transformations

• Scoring function

• Search strategy

Transformations

Each transformation comprises
• A source tag
• A target tag
• A triggering environment

Example
• Source tag: NN
• Target tag: VB
• Triggering environment: previous tag is TO

More Examples

Source tag   Target tag   Triggering environment
NN           VB           previous tag is TO
VBP          VB           one of the three previous tags is MD
JJR          RBR          next tag is JJ
VBP          VB           one of the two previous words is n’t
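A transformation can be represented as a source tag, a target tag and a trigger function that inspects the words and current tags around position i. The sketch below encodes the four rules from the table; the Transformation class and the trigger helpers are hypothetical names, and checking triggers against the tags as they were before the pass (so that one change cannot trigger another in the same pass) is one design choice among several.

```python
from dataclasses import dataclass
from typing import Callable, List

Trigger = Callable[[List[str], List[str], int], bool]  # (words, tags, i) -> bool

@dataclass(frozen=True)
class Transformation:
    source: str        # tag to be rewritten
    target: str        # tag it is rewritten to
    trigger: Trigger   # triggering environment
    name: str = ""

    def apply(self, words: List[str], tags: List[str]) -> List[str]:
        """Return a retagged copy. Triggers are tested against the tags as they
        were before this pass, so one change cannot trigger another."""
        return [self.target if tag == self.source and self.trigger(words, tags, i) else tag
                for i, tag in enumerate(tags)]

# Hypothetical helpers for the triggering environments in the table.
def prev_tag_is(z: str) -> Trigger:
    return lambda words, tags, i: i >= 1 and tags[i - 1] == z

def one_of_prev_tags_is(z: str, n: int) -> Trigger:
    return lambda words, tags, i: z in tags[max(0, i - n):i]

def next_tag_is(z: str) -> Trigger:
    return lambda words, tags, i: i + 1 < len(tags) and tags[i + 1] == z

def one_of_prev_words_is(w: str, n: int) -> Trigger:
    return lambda words, tags, i: w in words[max(0, i - n):i]

RULES = [
    Transformation("NN", "VB", prev_tag_is("TO"), "NN -> VB / previous tag is TO"),
    Transformation("VBP", "VB", one_of_prev_tags_is("MD", 3), "VBP -> VB / MD in previous 3 tags"),
    Transformation("JJR", "RBR", next_tag_is("JJ"), "JJR -> RBR / next tag is JJ"),
    Transformation("VBP", "VB", one_of_prev_words_is("n't", 2), "VBP -> VB / n't in previous 2 words"),
]

words = ["I", "want", "to", "race"]
tags = ["PRP", "VBP", "TO", "NN"]
print(RULES[0].apply(words, tags))  # ['PRP', 'VBP', 'TO', 'VB']
```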

Allowable transforms based on fixed schemas

[Table: schemas 1-9, each marking with * which of the context tags t(i-3), t(i-2), t(i-1), t(i), t(i+1), t(i+2), t(i+3) its triggering environment refers to]

Set of Possible Transformations

The set of possible transformations is enumerated by allowing
• every possible tag or word
• in every possible slot
• in every possible schema

This set can get quite large.
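To see how quickly the candidate set grows, the following sketch enumerates every instantiation of a few tag-only schemas. The five schemas are hypothetical stand-ins for the nine in the table (word-based triggers are left out), and the seven-tag tagset is a toy.

```python
from itertools import product

# Hypothetical tag-based schemas (not the slide's exact nine): each lists the
# context offsets, relative to position i, where the triggering tag Z may occur.
SCHEMAS = {
    "previous tag is Z": (-1,),
    "next tag is Z": (1,),
    "tag two before is Z": (-2,),
    "one of the two previous tags is Z": (-2, -1),
    "one of the three previous tags is Z": (-3, -2, -1),
}

def enumerate_transformations(tagset, schemas=SCHEMAS):
    """Yield (source, target, schema, offsets, z): every tag in every slot of
    every schema. Rules whose source equals their target would change nothing."""
    for (name, offsets), source, target, z in product(schemas.items(), tagset, tagset, tagset):
        if source != target:
            yield source, target, name, offsets, z

tagset = ["DT", "JJ", "MD", "NN", "TO", "VB", "VBP"]
candidates = list(enumerate_transformations(tagset))
print(len(candidates))  # 5 schemas * 7 sources * 6 targets * 7 trigger tags = 1470
```

With a 45-tag tagset the same five schemas already give 5 × 45 × 44 × 45 = 445,500 candidate rules, and schemas that refer to words multiply the count by the vocabulary size.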

TBL Requirements

• Initial State Annotator

• List of allowable transformations

• Scoring function

• Search strategy

Scoring Function

For a given tagging state of the corpus and a given transformation,
for every word position in the corpus:
• if the rule applies and changes an incorrect tag into the correct one, increment the score by 1;
• if the rule applies and changes a correct tag into an incorrect one, decrement the score by 1.
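A sketch of the scoring function, reading "yields an incorrect tag" as "turns a correct tag into an incorrect one", so the score is the net number of errors removed; a change from one wrong tag to another wrong tag scores 0. The trigger is passed as a function of (words, tags, i), and the toy sentence and gold tags are illustrative assumptions.

```python
def score(source, target, trigger, words, tags, gold):
    """Net number of errors a transformation removes from the current tagging:
    +1 for every tag it turns from wrong into right, -1 for every tag it turns
    from right into wrong (wrong-to-wrong changes score 0)."""
    total = 0
    for i, tag in enumerate(tags):
        if tag == source and trigger(words, tags, i):
            if target == gold[i]:
                total += 1          # error corrected
            elif tag == gold[i]:
                total -= 1          # correct tag destroyed
    return total

def prev_tag_is_to(words, tags, i):
    return i > 0 and tags[i - 1] == "TO"

# Toy corpus state (illustrative): current tags vs. gold standard.
words = ["to", "race", "to", "school", "to", "win"]
tags  = ["TO", "NN",   "TO", "NN",     "TO", "NN"]
gold  = ["TO", "VB",   "TO", "NN",     "TO", "VB"]
print(score("NN", "VB", prev_tag_is_to, words, tags, gold))  # 1 (+1 race, -1 school, +1 win)
```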

TBL Requirements

• Initial State Annotator

• List of allowable transformations

• Scoring function

• Search strategy

The Basic Algorithm

• Label every word with its most likely tag

• Repeat the following while improvement > threshold:
  – Examine every possible transformation, selecting the one that results in the most improved tagging
  – Retag the data according to this rule
  – Append this rule to the output list

• Return the output list of transformations
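The basic algorithm can be sketched as a greedy loop. To keep it short, this assumes a single hypothetical schema, "previous tag is Z", so a rule is a (source, target, Z) triple; the gain of a rule is its net score as defined above, and learning stops once the best gain no longer exceeds the threshold. Triggers are read from the tags before each pass.

```python
from itertools import product

def applies(tags, i, rule):
    """Hypothetical single schema 'previous tag is Z'; rule = (source, target, z)."""
    source, _, z = rule
    return i > 0 and tags[i] == source and tags[i - 1] == z

def gain(tags, gold, rule):
    """Net score: errors corrected minus errors introduced."""
    target = rule[1]
    return sum((target == gold[i]) - (tags[i] == gold[i])
               for i in range(len(tags)) if applies(tags, i, rule))

def retag(tags, rule):
    """Apply the rule everywhere; triggers are read from the tags before the pass."""
    return [rule[1] if applies(tags, i, rule) else t for i, t in enumerate(tags)]

def learn(initial_tags, gold, tagset, threshold=0):
    """Greedy TBL loop; with threshold >= 0 every accepted rule removes at least
    one error, so the loop always terminates."""
    tags, learned = list(initial_tags), []
    candidates = [r for r in product(tagset, repeat=3) if r[0] != r[1]]
    while True:
        # Examine every possible transformation and pick the most improving one.
        best = max(candidates, key=lambda r: gain(tags, gold, r))
        if gain(tags, gold, best) <= threshold:   # improvement not above threshold
            break
        tags = retag(tags, best)                    # retag the data with this rule
        learned.append(best)                        # append it to the output list
    return learned, tags

tagset = ["DT", "MD", "NN", "TO", "VB"]
gold    = ["TO", "VB", "DT", "NN", "TO", "VB", "MD", "VB"]
initial = ["TO", "NN", "DT", "NN", "TO", "NN", "MD", "NN"]  # toy most-frequent-tag output
rules, final = learn(initial, gold, tagset)
print(rules)  # [('NN', 'VB', 'TO'), ('NN', 'VB', 'MD')]
```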

TBL: Remarks

• Execution speed: the TBL tagger is slower than the HMM approach.
• Learning speed is slow: Brill’s implementation took over a day (600k tokens).

BUT …

• It learns a small number of simple, non-stochastic rules.
• It can be made to work faster with Finite State Transducers.

Tagging Unknown Words

• New words are added to (newspaper) language at a rate of 20+ per month
• Plus many proper names …
• Unknown words increase error rates by 1-2%
• Methods:
  – Assume the unknowns are nouns.
  – Assume the unknowns have a probability distribution similar to words occurring once in the training set.
  – Use morphological information, e.g. words ending in -ed tend to be tagged VBN.
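A rough sketch of an unknown-word guesser that combines two of these methods: morphological cues first, and the noun assumption as a fallback. Only the -ed to VBN rule comes from the slide; the other suffixes, the capitalisation test for proper names and the sentence_initial flag are illustrative assumptions, and practical guessers learn such cues from data.

```python
# Hypothetical suffix table; only "-ed -> VBN" comes from the slide.
SUFFIX_TAGS = [("ed", "VBN"), ("ing", "VBG"), ("ly", "RB"), ("s", "NNS")]

def guess_unknown_tag(word, sentence_initial=False):
    """Guess a tag for a word that is not in the dictionary."""
    if word[:1].isupper() and not sentence_initial:
        return "NNP"                 # capitalised mid-sentence: likely a proper name
    lower = word.lower()
    for suffix, tag in SUFFIX_TAGS:  # first matching suffix wins
        if lower.endswith(suffix):
            return tag
    return "NN"                      # fallback: assume unknowns are nouns

for w in ["blogged", "googling", "Valletta", "wiki"]:
    print(w, guess_unknown_tag(w))
```

The suffix rules are deliberately crude (for instance, every word ending in -s becomes NNS), which is why combining them with the distribution of rare training words is attractive.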

Evaluation

• The result is compared with a manually coded “Gold Standard”
  – Typically accuracy reaches 95-97%
  – This may be compared with the result for a baseline tagger (one that uses no context).

• Important: 100% accuracy is impossible even for human annotators.
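Accuracy against a gold standard is the proportion of tokens whose tag matches. The toy sequences below are illustrative, with the baseline standing in for a context-free most-frequent-tag tagger.

```python
def accuracy(predicted, gold):
    """Proportion of tokens whose tag matches the gold standard."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("need non-empty tag sequences of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)

# Toy gold standard and two taggers' outputs (illustrative only).
gold     = ["TO", "VB", "DT", "NN", "TO", "VB", "MD", "VB"]
baseline = ["TO", "NN", "DT", "NN", "TO", "NN", "MD", "NN"]  # most frequent tag, no context
tagger   = ["TO", "VB", "DT", "NN", "TO", "VB", "MD", "NN"]
print(accuracy(baseline, gold), accuracy(tagger, gold))  # 0.625 0.875
```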

A word of caution

• 95% accuracy: every 20th token wrong

• 96% accuracy: every 25th token wrong
  – an improvement of 25% from 95% to 96% ???

• 97% accuracy: every 33rd token wrong

• 98% accuracy: every 50th token wrong
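The arithmetic behind this slide: at accuracy a, the error rate is 1 - a and, on average, one token in 1 / (1 - a) is wrong. Going from 95% to 96% lengthens the gap between errors from 20 to 25 tokens (25% longer), but it removes 1/5 of the remaining errors, a 20% relative error reduction, while accuracy rises by only one percentage point. The sketch uses exact fractions to avoid floating-point noise; the function names are illustrative.

```python
from fractions import Fraction

def tokens_per_error(accuracy_pct):
    """On average, one error every 1 / (1 - accuracy) tokens."""
    return 1 / (1 - Fraction(accuracy_pct, 100))

def relative_error_reduction(old_pct, new_pct):
    """Share of the remaining errors removed when accuracy rises."""
    old_err = 1 - Fraction(old_pct, 100)
    new_err = 1 - Fraction(new_pct, 100)
    return (old_err - new_err) / old_err

for acc in (95, 96, 97, 98):
    print(f"{acc}%: one error every {float(tokens_per_error(acc)):.1f} tokens")
# 95%: one error every 20.0 tokens
# 96%: one error every 25.0 tokens
# 97%: one error every 33.3 tokens
# 98%: one error every 50.0 tokens

print(relative_error_reduction(95, 96))                 # 1/5: 20% fewer errors
print(tokens_per_error(96) / tokens_per_error(95) - 1)  # 1/4: 25% longer gap between errors
```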

How much training data is needed?

• When working with the STTS (50 tags) we observed

• a strong increase in accuracy when training on 10,000, 20,000, …, 50,000 tokens,

• a slight increase in accuracy when training on up to 100,000 tokens,

• hardly any increase thereafter.

Summary

• Tagging decisions are conditioned on a wider range of events than the HMM models mentioned earlier. For example, left and right context can be used simultaneously.

• Learning and tagging are simple, intuitive and understandable.

• Transformation-based learning has also been applied to sentence parsing.

The Three Approaches Compared

• Rule Based
  – Hand crafted rules
  – It takes too long to come up with good rules
  – Portability problems

• Stochastic
  – Find sequence with highest probability (Viterbi)
  – Result of training not accessible to humans
  – Large storage needs for intermediate results whilst training

• Transformation
  – Rules are learned
  – Small number of rules
  – Rules can be inspected and modified by humans

Shallow/Chunk Parsing

Goal: divide a sentence into a sequence of chunks.

• Chunks are non-overlapping regions of a text: [I] saw [a tall man] in [the park].

• Chunks are non-recursive
  – A chunk cannot contain other chunks

• Chunks are non-exhaustive
  – Not all words are included in chunks
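These three properties can be checked mechanically if chunks are stored as half-open (start, end) token ranges, a representation assumed here for illustration. A flat list of spans is non-recursive by construction, so the sketch only checks that spans are non-empty, in range and non-overlapping; tokens outside every span are allowed (non-exhaustive).

```python
def is_valid_chunking(spans, n_tokens):
    """True if the (start, end) spans form a valid flat chunking of n_tokens."""
    last_end = 0
    for start, end in sorted(spans):
        if not (0 <= start < end <= n_tokens) or start < last_end:
            return False             # empty, out of range, or overlapping
        last_end = end
    return True

tokens = "I saw a tall man in the park .".split()
np_chunks = [(0, 1), (2, 5), (6, 8)]   # [I] saw [a tall man] in [the park] .
print([" ".join(tokens[s:e]) for s, e in np_chunks])     # ['I', 'a tall man', 'the park']
print(is_valid_chunking(np_chunks, len(tokens)))         # True
print(is_valid_chunking([(2, 5), (4, 6)], len(tokens)))  # False: overlap
```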

Chunk Parsing Examples

• Noun-phrase chunking: [I] saw [a tall man] in [the park].

• Verb-phrase chunking: The man who [was in the park] [saw me].

• Prosodic chunking: [I saw] [a tall man] [in the park].

• Question answering:
  – What [Spanish explorer] discovered [the Mississippi River]?

Motivation

• Locating information
  – e.g., text retrieval
    • Index a document collection on its noun phrases

• Ignoring information
  – Generalize in order to study higher-level patterns
    • e.g. phrases involving “gave” in Penn treebank: gave NP; gave up NP in NP; gave NP up; gave NP help; gave NP to NP
  – Sometimes a full parse has too much structure
    • Too nested
    • Chunks usually are not recursive

Representation

• BIO (or IOB)
• Trees
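A minimal sketch of the BIO (IOB) encoding, assuming half-open (start, end) chunk spans and a single chunk label NP: B- marks the first token of a chunk, I- a token inside it, and O a token outside every chunk. The function names are hypothetical; NLTK itself provides tree2conlltags and conlltags2tree for converting between chunk trees and IOB triples.

```python
def spans_to_bio(n_tokens, spans, label="NP"):
    """Encode (start, end) chunk spans as one B-/I-/O tag per token."""
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

def bio_to_spans(tags):
    """Decode B-/I-/O tags back into spans (labels are ignored in this sketch)."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and start is None):
            if start is not None:
                spans.append((start, i))   # close the open chunk
                start = None
            if tag != "O":
                start = i                  # B- opens a chunk; a stray I- is treated as B-
    if start is not None:
        spans.append((start, len(tags)))
    return spans

tokens = "I saw a tall man in the park .".split()
bio = spans_to_bio(len(tokens), [(0, 1), (2, 5), (6, 8)])
print(list(zip(tokens, bio)))
```

Because B- always opens a new chunk, two adjacent chunks such as [the little] [cat] remain distinguishable, which a plain in/out encoding could not express.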

Comparison with Full Parsing

• Parsing is usually an intermediate stage
  – Builds structures that are used by later stages of processing

• Full parsing is a sufficient but not necessary intermediate stage for many NLP tasks
  – Parsing often provides more information than we need

• Shallow parsing is an easier problem
  – Less word-order flexibility within chunks than between chunks
  – More locality:
    • Fewer long-range dependencies
    • Less context-dependence
    • Less ambiguity

Chunks and Constituency

Constituents: [[a tall man] [in [the park]]].

Chunks: [a tall man] in [the park].

• A constituent is part of some higher unit in the hierarchical syntactic parse

• Chunks are not constituents
  – Constituents are recursive

• But chunks are typically subsequences of constituents
  – Chunks do not cross major constituent boundaries

Chunk Parsing in NLTK

• Chunk parsers usually ignore lexical content
  – Only need to look at part-of-speech tags

• Possible steps in chunk parsing
  – Chunking, unchunking
  – Chinking
  – Merging, splitting

• Evaluation
  – Compare to a baseline
  – Evaluate in terms of
    • Precision, Recall, F-Measure
    • Missed (False Negative), Incorrect (False Positive)
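Chunk-level evaluation can be sketched by comparing sets of chunk spans: a predicted chunk counts as correct only if its boundaries match a gold chunk exactly. The function name and the (start, end) span representation are assumptions for illustration.

```python
def chunk_scores(gold_spans, predicted_spans):
    """Precision, recall and F-measure over exact-match chunk spans, plus the
    missed (false negative) and incorrect (false positive) chunks."""
    gold, pred = set(gold_spans), set(predicted_spans)
    correct = gold & pred
    precision = len(correct) / len(pred) if pred else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    missed = sorted(gold - pred)        # false negatives
    incorrect = sorted(pred - gold)     # false positives
    return precision, recall, f_measure, missed, incorrect

# [I] saw [a tall man] in [the park] .  vs. a chunker that left out "a"
gold = [(0, 1), (2, 5), (6, 8)]
pred = [(0, 1), (3, 5), (6, 8)]
p, r, f, missed, incorrect = chunk_scores(gold, pred)
print(missed, incorrect)   # [(2, 5)] [(3, 5)]
```

A single boundary error costs both a missed and an incorrect chunk, so precision and recall both drop to 2/3 here.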

Chunk Parsing in NLTK

• Define a regular expression that matches the sequences of tags in a chunk

A simple noun phrase chunk regexp (note that <NN.*> matches any tag starting with NN):

<DT>? <JJ>* <NN.*>

• Chunk all matching subsequences:

the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN

[the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]

• If matching subsequences overlap, the first one gets priority
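The chunk rule can be sketched with Python's re module: write the tag sequence as a string such as "<DT><JJ><NN>", translate the tag pattern into an ordinary regular expression, and let finditer return the leftmost, non-overlapping matches, so the first of two overlapping matches gets priority. This is a stdlib-only sketch in the spirit of NLTK's regexp chunker, not NLTK's code; tag_regex, regexp_chunk and bracket are hypothetical names (current NLTK exposes this functionality through nltk.RegexpParser).

```python
import re

def tag_regex(tag_pattern):
    """Translate an NLTK-style tag pattern such as '<DT>?<JJ>*<NN.*>' into a
    regex over tag strings like '<DT><JJ><NN>'. Inside a tag, '.' becomes
    [^<>] so that it can never run across a tag boundary."""
    p = re.sub(r"\s+", "", tag_pattern)
    return p.replace("<", "(?:<(?:").replace(">", ")>)").replace(".", "[^<>]")

def regexp_chunk(tagged, tag_pattern):
    """Return (start, end) token spans for the leftmost, non-overlapping
    matches; finditer resumes after each match, so earlier matches win."""
    s = "".join(f"<{tag}>" for _, tag in tagged)
    spans = []
    for m in re.finditer(tag_regex(tag_pattern), s):
        if m.end() > m.start():                       # skip empty matches
            # Each tag contributes exactly one '<', so counting them
            # converts character offsets into token indices.
            spans.append((s.count("<", 0, m.start()), s.count("<", 0, m.end())))
    return spans

def bracket(tagged, spans):
    """Render chunks in the slide's [word/TAG ...] notation."""
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    out = []
    for i, (word, tag) in enumerate(tagged):
        token = f"{word}/{tag}"
        if i in starts:
            token = "[" + token
        if i + 1 in ends:
            token += "]"
        out.append(token)
    return " ".join(out)

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
spans = regexp_chunk(sentence, "<DT>? <JJ>* <NN.*>")
print(bracket(sentence, spans))
# [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
```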

Unchunking

• Remove any chunk with a given pattern
  – e.g., UnChunkRule('<NN|DT>+', 'Unchunk NNDT')
  – Combine with the chunk rule <NN|DT|JJ>+

• Chunk all matching subsequences:
  – Input:
    the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  – Apply chunk rule:
    [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
  – Apply unchunk rule:
    [the/DT little/JJ cat/NN] sat/VBD on/IN the/DT mat/NN
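The chunk-then-unchunk sequence can be sketched on (start, end) token spans. Here a chunk is removed only if its entire tag sequence matches the unchunk pattern, which reproduces the result above; the helper names are hypothetical and this is a stdlib sketch, not NLTK's UnChunkRule implementation.

```python
import re

def tag_regex(tag_pattern):
    """NLTK-style tag pattern -> regex over strings like '<DT><NN>'."""
    p = re.sub(r"\s+", "", tag_pattern)
    return p.replace("<", "(?:<(?:").replace(">", ")>)").replace(".", "[^<>]")

def tags_of(tagged, start, end):
    return "".join(f"<{tag}>" for _, tag in tagged[start:end])

def chunk(tagged, tag_pattern):
    """Chunk rule: leftmost, non-overlapping matches become (start, end) chunks."""
    s = tags_of(tagged, 0, len(tagged))
    return [(s.count("<", 0, m.start()), s.count("<", 0, m.end()))
            for m in re.finditer(tag_regex(tag_pattern), s) if m.end() > m.start()]

def unchunk(tagged, spans, tag_pattern):
    """Unchunk rule: drop every chunk whose whole tag sequence matches."""
    regex = re.compile(tag_regex(tag_pattern))
    return [(s, e) for s, e in spans if not regex.fullmatch(tags_of(tagged, s, e))]

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
chunks = chunk(sentence, "<NN|DT|JJ>+")
print(chunks)                                  # [(0, 3), (5, 7)]
print(unchunk(sentence, chunks, "<NN|DT>+"))   # [(0, 3)]
```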

Chinking

• A chink is a subsequence of the text that is not a chunk.

• Define a regular expression that matches the sequences of tags in a chink.
  A simple chink regexp for finding NP chunks: (<VB.?>|<IN>)+

• First apply a chunk rule to chunk everything
  – Input:
    the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  – ChunkRule('<.*>+', 'Chunk everything'):
    [the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN]
  – Apply the chink rule above:
    [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
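Chinking can be sketched as cutting every match of the chink pattern out of the existing chunks, splitting a chunk in two when the chink falls in its middle. The sentence starts as a single chunk, mirroring the effect of ChunkRule('<.*>+', ...); the function names are hypothetical and the code is a stdlib sketch rather than NLTK's ChinkRule.

```python
import re

def tag_regex(tag_pattern):
    """NLTK-style tag pattern -> regex over strings like '<DT><NN>'."""
    p = re.sub(r"\s+", "", tag_pattern)
    return p.replace("<", "(?:<(?:").replace(">", ")>)").replace(".", "[^<>]")

def chink(tagged, spans, chink_pattern):
    """Remove every match of chink_pattern from inside the (start, end) chunks."""
    regex = re.compile(tag_regex(chink_pattern))
    result = []
    for start, end in spans:
        s = "".join(f"<{tag}>" for _, tag in tagged[start:end])
        cursor = start
        for m in regex.finditer(s):
            if m.end() == m.start():
                continue                              # ignore empty matches
            chink_start = start + s.count("<", 0, m.start())
            chink_end = start + s.count("<", 0, m.end())
            if chink_start > cursor:
                result.append((cursor, chink_start))  # chunk part before the chink
            cursor = chink_end
        if end > cursor:
            result.append((cursor, end))              # chunk part after the last chink
    return result

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
everything = [(0, len(sentence))]                      # one chunk over the whole sentence
print(chink(sentence, everything, "(<VB.?>|<IN>)+"))   # [(0, 3), (5, 7)]
```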

Merging

• Combine adjacent chunks into a single chunk
  – Define a regular expression that matches the sequences of tags on both sides of the point to be merged

• Example:
  – Merge a chunk ending in JJ with a chunk starting with NN
    MergeRule('<JJ>', '<NN>', 'Merge adjs and nouns')
    [the/DT little/JJ] [cat/NN] sat/VBD on/IN the/DT mat/NN
    [the/DT little/JJ cat/NN] sat/VBD on/IN the/DT mat/NN

• Splitting is the opposite of merging
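A sketch of the merge step on (start, end) token spans: two chunks are joined when they are adjacent, the first ends with a match of the left pattern and the second begins with a match of the right pattern. The names are hypothetical and this is not NLTK's MergeRule code; splitting would be the mirror image, cutting one span in two at a point whose left and right contexts match.

```python
import re

def tag_regex(tag_pattern):
    """NLTK-style tag pattern -> regex over strings like '<DT><NN>'."""
    p = re.sub(r"\s+", "", tag_pattern)
    return p.replace("<", "(?:<(?:").replace(">", ")>)").replace(".", "[^<>]")

def tags_of(tagged, start, end):
    return "".join(f"<{tag}>" for _, tag in tagged[start:end])

def merge(tagged, spans, left_pattern, right_pattern):
    """Merge adjacent chunks when the end of the first matches left_pattern
    and the beginning of the second matches right_pattern."""
    left_re = re.compile("(?:" + tag_regex(left_pattern) + ")$")
    right_re = re.compile(tag_regex(right_pattern))
    merged = []
    for start, end in sorted(spans):
        if merged:
            prev_start, prev_end = merged[-1]
            if (prev_end == start
                    and left_re.search(tags_of(tagged, prev_start, prev_end))
                    and right_re.match(tags_of(tagged, start, end))):
                merged[-1] = (prev_start, end)   # join with the previous chunk
                continue
        merged.append((start, end))
    return merged

sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"), ("sat", "VBD"),
            ("on", "IN"), ("the", "DT"), ("mat", "NN")]
spans = [(0, 2), (2, 3)]     # [the/DT little/JJ] [cat/NN] sat/VBD on/IN the/DT mat/NN
print(merge(sentence, spans, "<JJ>", "<NN>"))   # [(0, 3)]
```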

Next Sessions…

• NLTK Exercises