Chunking

Ewan Klein
ICL, 14 November 2005

Outline

• Motivation
• What are chunks?
• Chunking in NLTK-Lite
• Chunking in Cass
• Chunking as Tagging
• Summary and Reading

Problems with Full Parsing, 1

Goal: Build a complete parse tree for a sentence.

• Coverage and ambiguity:
    • No complete grammar of any language
    • Sapir: “All grammars leak”
    • As coverage increases, so does ambiguity.
    • Problem of ranking parses by degree of ‘plausibility’
• Low accuracy
    • Unbounded dependencies hard to parse
    • Errors tend to propagate

Problems with Full Parsing, 2

• Speed:
    • Complexity of rule-based chart parsing is O(n^3) in length of sentence, multiplied by factor O(G^2), where G is size of grammar.
    • Practical results are often better, but still slow for parsing large (e.g., billion words) corpora in reasonable time.
    • Finite state machines have worst-case complexity O(n) in length of string.
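To make the gap concrete, here is a rough back-of-the-envelope sketch. It treats the bounds above as if they were exact step counts, which they are not, and the function names and example sizes (a 25-word sentence, a 1,000-rule grammar) are illustrative assumptions rather than figures from the lecture.

```python
def chart_parse_steps(n: int, g: int) -> int:
    """Upper-bound step count for chart parsing: n^3 * G^2."""
    return n ** 3 * g ** 2


def finite_state_steps(n: int) -> int:
    """Upper-bound step count for a finite state machine: n."""
    return n


n, g = 25, 1000  # hypothetical sentence length and grammar size
ratio = chart_parse_steps(n, g) // finite_state_steps(n)
print(chart_parse_steps(n, g))  # 15625000000
print(ratio)                    # 625000000
```

Even if practical chart parsers do far better than the bound, the ratio suggests why finite-state methods are attractive for very large corpora.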

Motivations for Parsing

Why parse sentences in the first place?

• Parsing is usually an intermediate stage in a larger processing framework.
• Full parsing is a sufficient but not necessary step for many NLP tasks.
• Full parsing often provides more information than we need or can deal with.

Partial Parsing / Chunking

Assign a partial structure to a sentence.

• Don’t try to deal with all of language
• Don’t attempt to resolve all semantically significant decisions
• Use deterministic grammars for easy-to-parse pieces, and other methods for other pieces, depending on task.
    • “easy to parse” = no ambiguity & no recursion
• Partial parsing is usually:
    • easier to implement
    • more robust
    • faster

Chunking, 1

Goal: Divide a sentence into a sequence of chunks.

• Abney (1994):
    [when I read] [a sentence], [I read it] [a chunk] [at a time]
• Chunks are non-overlapping regions of text:
    [walk] [straight past] [the lake]
• (Usually) each chunk contains a head, with the possible addition of some preceding function words and modifiers:
    [walk] [straight past] [the lake]   (heads: walk, past, lake)
• Chunks are non-recursive:
    • A chunk cannot contain another chunk of the same category
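The non-overlapping property above can be checked mechanically. This is a minimal sketch, assuming chunks are stored as half-open (start, end) token spans. The representation and the helper name `is_valid_chunking` are illustrative choices, not something taken from the lecture.

```python
def is_valid_chunking(spans, n_tokens):
    """True if spans are in range, non-empty, and pairwise non-overlapping."""
    prev_end = 0
    for start, end in sorted(spans):
        if not (prev_end <= start < end <= n_tokens):
            return False
        prev_end = end
    return True


tokens = ["walk", "straight", "past", "the", "lake"]
chunks = [(0, 1), (1, 3), (3, 5)]  # [walk] [straight past] [the lake]
print([tokens[s:e] for s, e in chunks])
print(is_valid_chunking(chunks, len(tokens)))            # True
print(is_valid_chunking([(0, 2), (1, 3)], len(tokens)))  # False: spans overlap
```

Because the spans form a flat list that may not overlap, no chunk can contain another, so this representation also rules out recursion by construction.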

Chunking, 2

• Chunks are non-exhaustive
    • Some words in a sentence may not be grouped into a chunk
      [take] [the second road] that [is] on [the left hand side]
• NP postmodifiers (e.g., PPs, relative clauses) are often recursive and/or structurally ambiguous:
    • they are not included in noun chunks.
• Chunks are typically subsequences of constituents (they don’t cross constituent boundaries)
    • noun groups: everything in NP up to and including the head noun
    • verb groups: everything in VP (including auxiliaries) up to and including the head verb

Chunk Parsing: Accuracy

Chunk parsing attempts to do less, but does it more accurately.

• Smaller solution space
• Less word-order flexibility within chunks than between chunks.
• Better locality:
    • doesn’t attempt to deal with unbounded dependencies
    • less context-dependence
    • doesn’t attempt to resolve ambiguity; it only does those things which can be done reliably
      [the boy] [saw] [the man] [with a telescope]
    • less error propagation

Chunk Parsing: Domain Independence

Chunk parsing can be relatively domain independent, in that

• Dependencies involving lexical or semantic information tend to occur at levels ‘higher’ than chunks:
    • attachment of PPs and other modifiers
    • argument selection
    • constituent re-ordering

Chunk Parsing: Efficiency

Chunk parsing is more efficient:

• smaller solution space
• relevant context is small and local
• chunks are non-recursive
• can be implemented with a finite state automaton (FSA)
• can be applied to very large text sources
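The non-recursive, local nature of chunks is what lets a finite state automaton do the work in a single left-to-right pass. The following is a minimal sketch, assuming a noun-group pattern DT? JJ* NN+ (an illustrative pattern, not one given in the lecture). The state names, the transition table and the function `noun_chunks` are likewise hypothetical.

```python
# Transition table for the assumed noun-group pattern DT? JJ* NN+.
START, AFTER_DT, AFTER_JJ, IN_NOUNS = range(4)
TRANSITIONS = {
    (START, "DT"): AFTER_DT,
    (START, "JJ"): AFTER_JJ,
    (START, "NN"): IN_NOUNS,
    (AFTER_DT, "JJ"): AFTER_JJ,
    (AFTER_DT, "NN"): IN_NOUNS,
    (AFTER_JJ, "JJ"): AFTER_JJ,
    (AFTER_JJ, "NN"): IN_NOUNS,
    (IN_NOUNS, "NN"): IN_NOUNS,
}
ACCEPTING = {IN_NOUNS}


def noun_chunks(tags):
    """Return (start, end) spans of noun groups; each tag is examined at most twice."""
    spans, state, begin, i = [], START, 0, 0
    while i < len(tags):
        nxt = TRANSITIONS.get((state, tags[i]))
        if nxt is not None:
            if state == START:
                begin = i            # a new candidate chunk starts here
            state = nxt
            i += 1
        elif state == START:
            i += 1                   # this tag cannot start a chunk: skip it
        else:
            if state in ACCEPTING:
                spans.append((begin, i))
            state = START            # retry the same tag from the start state
    if state in ACCEPTING:
        spans.append((begin, len(tags)))
    return spans


tags = ["VB", "DT", "NN", "IN", "DT", "JJ", "NN", "NN"]
print(noun_chunks(tags))  # [(1, 3), (4, 8)]
```

The loop never looks further back than the current tag, so running time grows linearly with the length of the input, in line with the O(n) bound for finite state machines.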

Psycholinguistic Motivations

• Chunks as processing units: evidence that humans tend to read texts one chunk at a time
• Chunks are phonologically relevant
    • prosodic phrase breaks
    • rhythmic patterns
• Chunking might be a first step in full parsing

Chunking with Regular Expressions, 1

• Assume input is tagged.
• Identify chunks (e.g., noun groups) by sequences of tags:

    announce  any  new  policy  measures  in  his   ...
    VB        DT   JJ   NN      NNS       IN  PRP$

Chunking with Regular Expressions, 2

• Define rules in terms of tag patterns:
    rule = parse.ChunkRule('<DT><JJ><NN><NNS>', 'Modified plural NPs')

Chunking with Regular Expressions, 3

• rule = parse.ChunkRule('<DT><JJ><NN><NNS>', 'Modified plural NPs')
• Extending the example:

    in  his   Mansion  House  speech
    IN  PRP$  NNP      NNP    NN

• DT or PRP$: '<DT|PRP$><JJ><NN><NNS>'
• JJ and NN are optional: '<DT|PRP$><JJ>*<NN>*<NNS>'
• we can have NNPs: '<DT|PRP$><JJ>*<NNP>*<NN>*<NNS>'
• NN or NNS: '<DT|PRP$><JJ>*<NNP>*<NN>*<NN|NNS>'
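One way to see what each generalisation buys is to test the four patterns against both example noun groups. The sketch below does not use NLTK-Lite. It hand-translates each tag pattern into an ordinary Python regular expression over a space-terminated tag string, which is an assumption made only for illustration. The dollar sign in PRP$ is escaped because it is a metacharacter in Python's `re` syntax.

```python
import re

# Tag sequences from the two examples, one space-terminated tag per token.
measures = "DT JJ NN NNS "   # any new policy measures
speech = "PRP$ NNP NNP NN "  # his Mansion House speech

# Hand-translated equivalents of the four tag patterns.
stages = {
    "DT or PRP$":     r"(DT |PRP\$ )(JJ )(NN )(NNS )",
    "optional JJ/NN": r"(DT |PRP\$ )(JJ )*(NN )*(NNS )",
    "allow NNPs":     r"(DT |PRP\$ )(JJ )*(NNP )*(NN )*(NNS )",
    "NN or NNS head": r"(DT |PRP\$ )(JJ )*(NNP )*(NN )*(NN |NNS )",
}

results = {}
for name, regex in stages.items():
    results[name] = tuple(
        bool(re.fullmatch(regex, seq)) for seq in (measures, speech)
    )
    print(f"{name:15} measures={results[name][0]} speech={results[name][1]}")
```

Under this translation every stage accepts the first noun group, but only the last one also accepts the second, since "speech" is a singular NN head.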

Tag Patterns in Chunk Rules

• NLTK-Lite tag patterns are a special kind of Regular Expression:
    • Use '< >' for grouping instead of '( )', e.g. '<JJ>*', '<NN|NNS>*'
    • Wildcard '.' never matches beyond tag boundaries, e.g. '<NN.*>' matches '<NN>' and '<NNS>', but not '<NN|JJ>'
    • Whitespace is ignored in tag patterns, e.g. '<NN | JJ>' is equivalent to '<NN|JJ>'
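The three conventions above can be mimicked with a small translation into ordinary regular expressions. This is a sketch of the idea only, not NLTK-Lite's actual implementation. The function names `tag_pattern_to_regex` and `matches` are hypothetical, and tag sequences are assumed to be encoded as strings such as '<DT><NN>'.

```python
import re


def tag_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a tag pattern into a regex over strings like '<DT><NN>'."""
    pattern = re.sub(r"\s+", "", pattern)  # whitespace is ignored
    # '<' and '>' act as grouping brackets around one tag's alternatives.
    regex = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # Inside a tag, '.' may match anything except the tag delimiters.
    regex = regex.replace(".", "[^<>]")
    return re.compile(regex)


def matches(pattern, tags):
    """True if the whole tag sequence is matched by the tag pattern."""
    encoded = "".join(f"<{t}>" for t in tags)
    return tag_pattern_to_regex(pattern).fullmatch(encoded) is not None


print(matches("<JJ>*<NN|NNS>", ["JJ", "JJ", "NNS"]))  # True
print(matches("<NN.*>", ["NNS"]))                     # True
print(matches("<NN.*>", ["NN", "JJ"]))                # False: wildcard stays inside one tag
print(matches("<NN | JJ>", ["JJ"]))                   # True
```

Tags that contain regular-expression metacharacters, such as PRP$, would need escaping by the caller in this sketch.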

Chunk Grammars

Approach adopted in Cass (Abney)

• Recognition carried out by a cascade of FSAs: output of one is the input to another
    Level 0: tagged words
    Level 1: all sequences at level 0 that match a given pattern are replaced by appropriate label
        • e.g., date expressions replaced by the label Date
    Level n: do something with output of Level n − 1
• Strings that don’t match a pattern are just passed on unchanged
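The level-by-level idea can be sketched in a few lines. To keep the example short, it matches fixed symbol sequences rather than full regular expressions, which is a simplification of what Cass does. The function name `relabel`, the tag sequence and the label TimePP are all assumptions introduced for illustration.

```python
def relabel(symbols, pattern, label):
    """Replace every occurrence of `pattern` (a list of symbols) with `label`."""
    out, i, k = [], 0, len(pattern)
    while i < len(symbols):
        if symbols[i:i + k] == pattern:
            out.append(label)
            i += k
        else:
            out.append(symbols[i])  # no match here: pass the symbol on unchanged
            i += 1
    return out


# Level 0: tags for a phrase such as "lecture on 14 November 2005".
level0 = ["NN", "IN", "CD", "NNP", "CD"]
level1 = relabel(level0, ["CD", "NNP", "CD"], "Date")  # date expression -> Date
level2 = relabel(level1, ["IN", "Date"], "TimePP")     # uses the output of level 1
print(level1)  # ['NN', 'IN', 'Date']
print(level2)  # ['NN', 'TimePP']
```

The second call can only succeed because the first has already introduced the Date label, which is the sense in which the output of one level is the input to the next.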

CASS RegEx Grammar

Automata defined by a ‘regular expression grammar’

    :chunks
    nx -> DT? NN+
    vx -> VBZ | VBD | BE VBG
    :phrases
    vp -> vx nx*
    pp -> IN nx
    :clause
    c -> pp* nx pp* vp pp*

CASS Example

take/VBP the/DT road/NN on/IN the/DT left/NN

[vx take/VBP] [nx the/DT road/NN] on/IN [nx the/DT left/NN]

[vx take/VBP] [nx the/DT road/NN] [pp on/IN [nx the/DT left/NN]]

[c [vx take/VBP] [nx the/DT road/NN] [pp on/IN [nx the/DT left/NN]]]
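A cascade in the style of the regular expression grammar can be sketched as follows. This is not Cass itself. It assumes each level makes one left-to-right pass, takes the longest match at each position and passes unmatched symbols through. The names `LEVELS`, `apply_level` and `show` are hypothetical, and the sentence is a made-up one chosen so that every rule of the grammar as written gets exercised.

```python
import re

# One list of (label, pattern) rules per level. Patterns are written over
# space-terminated symbols, so "NN " can never match inside "NNS ".
LEVELS = [
    [("nx", r"(?:DT )?(?:NN )+"), ("vx", r"VBZ |VBD |BE VBG ")],
    [("vp", r"vx (?:nx )*"), ("pp", r"IN nx ")],
    [("c", r"(?:pp )*nx (?:pp )*vp (?:pp )*")],
]


def apply_level(nodes, rules):
    """One left-to-right pass: longest match wins, unmatched nodes pass through."""
    text = "".join(label + " " for label, _ in nodes)
    starts = [0]  # character offset of every node boundary in `text`
    for label, _ in nodes:
        starts.append(starts[-1] + len(label) + 1)
    index_at = {offset: i for i, offset in enumerate(starts)}
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    out, i = [], 0
    while i < len(nodes):
        best = None
        for name, rx in compiled:
            m = rx.match(text, starts[i])
            if m and m.end() > starts[i] and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            out.append(nodes[i])
            i += 1
        else:
            j = index_at[best[1]]
            out.append((best[0], nodes[i:j]))  # wrap the matched nodes
            i = j
    return out


def show(node):
    """Render a node in the bracketed notation used on the slides."""
    label, body = node
    if isinstance(body, str):
        return f"{body}/{label}"
    return "[" + label + " " + " ".join(show(n) for n in body) + "]"


sentence = "the/DT dog/NN chased/VBD the/DT cat/NN in/IN the/DT park/NN"
nodes = [(tag, word) for word, tag in (tok.split("/") for tok in sentence.split())]
stages = []
for rules in LEVELS:
    nodes = apply_level(nodes, rules)
    stages.append(" ".join(show(n) for n in nodes))
    print(stages[-1])
```

Each printed line corresponds to one level of the cascade, ending with a single c node that spans the whole sentence.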

CoNLL Notation for Chunks

• Instead of using bracketing, as in
    announce [any new policy measures] in [his ...
• we tag words according to where they are in a chunk:

    announce  any   new   policy  measures  in  his   ...
    VB        DT    JJ    NN      NNS       IN  PRP$
    O         B-NP  I-NP  I-NP    I-NP      O   B-NP

  where B-NP is ‘Begin noun chunk’, I-NP is ‘Inside noun chunk’ and O is ‘Outside any chunk’.
• Known both as BIO and IOB tagging
• Used in CoNLL shared tasks
• Allows off-the-shelf statistical taggers to be used for chunking as well as POS tagging
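The bracketed and tagged notations carry the same information, so converting between them is mechanical. This is a minimal sketch, assuming chunks are given as half-open (start, end, label) spans over token positions. The function names `spans_to_iob` and `iob_to_spans` are illustrative, and the example stops at "his" because the original sentence is truncated there.

```python
def spans_to_iob(n_tokens, spans):
    """Convert labelled chunk spans (start, end, label) to one IOB tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def iob_to_spans(tags):
    """Recover (start, end, label) spans from a well-formed IOB tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a final chunk
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
    return spans


words = ["announce", "any", "new", "policy", "measures", "in", "his"]
spans = [(1, 5, "NP"), (6, 7, "NP")]
iob = spans_to_iob(len(words), spans)
print(iob)                # ['O', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-NP']
print(iob_to_spans(iob))  # [(1, 5, 'NP'), (6, 7, 'NP')]
```

Since every token receives exactly one tag, chunking reduces to a sequence labelling problem of the same shape as POS tagging.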

Summary

• Chunking is less ambitious than full parsing, but more efficient.
• May be sufficient for many practical tasks:
    • Information Extraction
    • Question Answering
    • Extracting subcategorization frames
    • Providing features for machine learning, e.g., for building Named Entity recognizers.
• Two main approaches:
    1. Regular expressions over tag sequences
    2. Tagging with IOB tags
• Cass extends regular expression approach using a cascade of finite state transducers.

Reading

• Jurafsky and Martin, Section 10.5
• NLTK-Lite Chunk Parsing Tutorial
• Steven Abney. Parsing By Chunks. In: Robert Berwick, Steven Abney and Carol Tenny (eds.), Principle-Based Parsing. Kluwer Academic Publishers, Dordrecht. 1991.
• Steven Abney. Partial Parsing via Finite-State Cascades. J. of Natural Language Engineering, 2(4): 337-344. 1996.
• Abney’s publications: http://www.vinartus.net/spa/publications.html

Extra Tutorial

• Extra tutorial on writing tag patterns
• 5.00pm Tuesday 15th Nov, HCRC Seminar Room, 2 Buccleuch Place
