Chunking
Ewan Klein [email protected]
ICL — 14 November 2005
Ewan Klein [email protected] Chunking
Outline
Motivation
What are chunks?
Chunking in NLTK-Lite
Chunking in Cass
Chunking as Tagging
Summary and Reading
Problems with Full Parsing, 1
Goal: Build a complete parse tree for a sentence.
- Coverage and ambiguity:
  - No complete grammar of any language
  - Sapir: “All grammars leak”
  - As coverage increases, so does ambiguity.
  - Problem of ranking parses by degree of ‘plausibility’
- Low accuracy
- Unbounded dependencies hard to parse
- Errors tend to propagate
Problems with Full Parsing, 2
- Speed:
  - Complexity of rule-based chart parsing is O(n^3) in length of sentence, multiplied by factor O(G^2), where G is size of grammar.
  - Practical results are often better, but still slow for parsing large (e.g., billion words) corpora in reasonable time.
- Finite state machines have worst-case complexity O(n) in length of string.
Motivations for Parsing
Why parse sentences in the first place?
- Parsing is usually an intermediate stage in a larger processing framework.
- Full parsing is a sufficient but not necessary step for many NLP tasks.
- Full parsing often provides more information than we need or can deal with.
Partial Parsing / Chunking
Assign a partial structure to a sentence.
- Don’t try to deal with all of language
- Don’t attempt to resolve all semantically significant decisions
- Use deterministic grammars for easy-to-parse pieces, and other methods for other pieces, depending on task.
  - “easy to parse” = no ambiguity & no recursion
- Partial parsing is usually:
  - easier to implement
  - more robust
  - faster
Chunking, 1
Goal: Divide a sentence into a sequence of chunks.
- Abney (1994):
    [when I read] [a sentence], [I read it] [a chunk] [at a time]
- Chunks are non-overlapping regions of text:
    [walk] [straight past] [the lake]
- (Usually) each chunk contains a head, with the possible addition of some preceding function words and modifiers
    [walk] [straight past] [the lake]
- Chunks are non-recursive:
  - A chunk cannot contain another chunk of the same category
Chunking, 2
- Chunks are non-exhaustive
  - Some words in a sentence may not be grouped into a chunk
      [take] [the second road] that [is] on [the left hand side]
- NP postmodifiers (e.g., PPs, relative clauses) are often recursive and/or structurally ambiguous:
  - they are not included in noun chunks.
- Chunks are typically subsequences of constituents (they don’t cross constituent boundaries)
  - noun groups: everything in NP up to and including the head noun
  - verb groups: everything in VP (including auxiliaries) up to and including the head verb
Chunk Parsing: Accuracy
Chunk parsing attempts to do less, but does it more accurately.
- Smaller solution space
- Less word-order flexibility within chunks than between chunks.
- Better locality:
  - doesn’t attempt to deal with unbounded dependencies
  - less context-dependence
  - doesn’t attempt to resolve ambiguity; only do those things which can be done reliably
      [the boy] [saw] [the man] [with a telescope]
  - less error propagation
Chunk Parsing: Domain Independence
Chunk parsing can be relatively domain independent, in that
- Dependencies involving lexical or semantic information tend to occur at levels ‘higher’ than chunks:
  - attachment of PPs and other modifiers
  - argument selection
  - constituent re-ordering
Chunk Parsing: Efficiency
Chunk parsing is more efficient:
- smaller solution space
- relevant context is small and local
- chunks are non-recursive
- can be implemented with a finite state automaton (FSA)
- can be applied to very large text sources
Psycholinguistic Motivations
- Chunks as processing units: evidence that humans tend to read texts one chunk at a time
- Chunks are phonologically relevant
  - prosodic phrase breaks
  - rhythmic patterns
- Chunking might be a first step in full parsing
Chunking with Regular Expressions, 1
- Assume input is tagged.
- Identify chunks (e.g., noun groups) by sequences of tags:
    announce  any  new  policy  measures  in  his   . . .
    VB        DT   JJ   NN      NNS       IN  PRP$
Chunking with Regular Expressions, 2
- Assume input is tagged.
- Identify chunks (e.g., noun groups) by sequences of tags:
    announce  any  new  policy  measures  in  his   . . .
    VB        DT   JJ   NN      NNS       IN  PRP$
- Define rules in terms of tag patterns
    rule = parse.ChunkRule('<DT><JJ><NN><NNS>', 'Modified plural NPs')
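To make the rule above concrete, here is a minimal sketch in plain Python of what applying a fixed tag-sequence rule amounts to. It does not use NLTK-Lite; the function name `chunk_fixed_sequence` and the list-of-pairs representation of a tagged sentence are assumptions made for illustration only.

```python
def chunk_fixed_sequence(tagged, tag_sequence):
    """Group every non-overlapping run of tokens whose tags equal tag_sequence.

    tagged is a list of (word, tag) pairs. The result is a list in which a
    chunk appears as a list of (word, tag) pairs and every other token is
    passed through unchanged.
    """
    out, i, n = [], 0, len(tag_sequence)
    while i < len(tagged):
        window = [tag for _, tag in tagged[i:i + n]]
        if window == tag_sequence:
            out.append(tagged[i:i + n])  # a chunk
            i += n
        else:
            out.append(tagged[i])        # an unchunked token
            i += 1
    return out


sentence = [("announce", "VB"), ("any", "DT"), ("new", "JJ"),
            ("policy", "NN"), ("measures", "NNS"), ("in", "IN"),
            ("his", "PRP$")]
result = chunk_fixed_sequence(sentence, ["DT", "JJ", "NN", "NNS"])
print(result)
```

Only "any new policy measures" is grouped; the other three tokens stay outside any chunk, which illustrates why a fixed sequence is too rigid and why tag patterns with alternation and repetition are needed.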
Chunking with Regular Expressions, 3
- rule = parse.ChunkRule('<DT><JJ><NN><NNS>', 'Modified plural NPs')
- Extending the example:
    in  his   Mansion  House  speech
    IN  PRP$  NNP      NNP    NN
  - DT or PRP$: '<DT|PRP$><JJ><NN><NNS>'
  - JJ and NN are optional: '<DT|PRP$><JJ>*<NN>*<NNS>'
  - we can have NNPs: '<DT|PRP$><JJ>*<NNP>*<NN>*<NNS>'
  - NN or NNS: '<DT|PRP$><JJ>*<NNP>*<NN>*<NN|NNS>'
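The final pattern can be checked against both example phrases with an ordinary regular expression. The sketch below is hand-written with Python's `re` module and is not NLTK-Lite's implementation; the encoding of a tag sequence as a string such as `<DT><JJ>`, the escaping of `$`, and the helper names `encode` and `np_spans` are assumptions for illustration.

```python
import re

# Hand translation of '<DT|PRP$><JJ>*<NNP>*<NN>*<NN|NNS>'.
# '$' is escaped because it is a regex metacharacter.
NP = re.compile(r"(?:<(?:DT|PRP\$)>)(?:<JJ>)*(?:<NNP>)*(?:<NN>)*(?:<(?:NN|NNS)>)")


def encode(tags):
    """Encode a tag list as one string, e.g. ['DT', 'NN'] -> '<DT><NN>'."""
    return "".join("<%s>" % t for t in tags)


def np_spans(tags):
    """Return (start, end) token indices, end exclusive, for each match."""
    s = encode(tags)
    spans = []
    for m in NP.finditer(s):
        start = s[:m.start()].count("<")          # tokens before the match
        end = start + m.group(0).count("<")       # tokens inside the match
        spans.append((start, end))
    return spans


first = ["VB", "DT", "JJ", "NN", "NNS", "IN", "PRP$"]   # announce any new ...
second = ["IN", "PRP$", "NNP", "NNP", "NN"]              # in his Mansion House speech
print(np_spans(first))   # [(1, 5)]
print(np_spans(second))  # [(1, 5)]
```

In the second phrase the final `<NN|NNS>` can only be satisfied if `<NN>*` gives back the one NN it consumed, which ordinary regex backtracking handles.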
Tag Patterns in Chunk Rules
- NLTK-Lite tag patterns are a special kind of Regular Expression:
  - Use '< >' for grouping instead of '( )', e.g. '<JJ>*', '<NN|NNS>*'
  - Wildcard '.' never matches beyond tag boundaries, e.g. '<NN.*>' matches '<NN>' and '<NNS>', but not '<NN|JJ>'
  - Whitespace is ignored in tag patterns, e.g. '<NN | JJ>' is equivalent to '<NN|JJ>'
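One way to picture these conventions is as a translation from a tag pattern into an ordinary regular expression over a string of angle-bracketed tags. The following is a rough sketch of such a translation, not NLTK-Lite's own code; the function name `tag_pattern_to_regex` and the decision to treat `$` literally are assumptions.

```python
import re


def tag_pattern_to_regex(tag_pattern):
    """Translate a tag pattern into a compiled regex over '<TAG><TAG>...'."""
    # Whitespace is ignored in tag patterns.
    pattern = re.sub(r"\s+", "", tag_pattern)
    # '<' and '>' delimit one tag and also act as a group.
    pattern = pattern.replace("<", "(?:<(?:").replace(">", ")>)")
    # '.' may match any character except a tag delimiter.
    pattern = pattern.replace(".", "[^<>]")
    # '$' occurs in tags such as PRP$, so match it literally.
    pattern = pattern.replace("$", r"\$")
    return re.compile(pattern)


nn_any = tag_pattern_to_regex("<NN.*>")
print(bool(nn_any.fullmatch("<NN>")))      # True
print(bool(nn_any.fullmatch("<NNS>")))     # True
print(bool(nn_any.fullmatch("<NN><JJ>")))  # False: '.' cannot cross a boundary
```

Because the wildcard is rewritten to exclude `<` and `>`, a quantifier inside the brackets can never swallow a neighbouring tag, while a quantifier after the closing `>` repeats whole tags.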
Chunk Grammars
Approach adopted in Cass (Abney)
- Recognition carried out by a cascade of FSAs: output of one is the input to another
    Level 0: tagged words
    Level 1: all sequences at level 0 that match a given pattern are replaced by appropriate label
      - e.g., date expressions replaced by the label Date
    Level n: do something with output of Level n − 1
- Strings that don’t match a pattern are just passed on unchanged
CASS RegEx Grammar
Automata defined by a ‘regular expression grammar’
    :chunks
      nx -> DT? NN+
      vx -> VBZ | VBD | BE VBG
    :phrases
      vp -> vx nx*
      pp -> IN nx
    :clause
      c -> pp* nx pp* vp pp*
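The cascade idea can be sketched in a few lines of Python: each level rewrites the output of the previous one, and anything that matches no rule is passed on unchanged. This is an illustrative approximation, not Cass itself. The longest-match, left-to-right strategy, the node representation, all function names, and the example sentence (chosen so that every level applies) are assumptions.

```python
import re

# The three levels of the grammar above, as (label, right-hand side) pairs.
LEVELS = [
    [("nx", "DT? NN+"), ("vx", "VBZ | VBD | BE VBG")],
    [("vp", "vx nx*"), ("pp", "IN nx")],
    [("c", "pp* nx pp* vp pp*")],
]


def rhs_to_regex(rhs):
    """Turn e.g. 'DT? NN+' into a regex over a string like 'DT NN NN '."""
    alternatives = []
    for alt in rhs.split("|"):
        parts = []
        for sym in alt.split():
            m = re.fullmatch(r"(\w+)([?*+]?)", sym)
            parts.append("(?:%s )%s" % (re.escape(m.group(1)), m.group(2)))
        alternatives.append("".join(parts))
    return re.compile("(?:" + "|".join(alternatives) + ")")


def apply_level(nodes, rules):
    """Replace the longest match at each position by a labelled node."""
    compiled = [(label, rhs_to_regex(rhs)) for label, rhs in rules]
    out, i = [], 0
    while i < len(nodes):
        rest = "".join(node[0] + " " for node in nodes[i:])
        best_label, best_len = None, 0
        for label, rx in compiled:
            m = rx.match(rest)
            if m and m.group(0).count(" ") > best_len:
                best_label, best_len = label, m.group(0).count(" ")
        if best_len:
            out.append((best_label, nodes[i:i + best_len]))
            i += best_len
        else:
            out.append(nodes[i])  # no rule matched: pass on unchanged
            i += 1
    return out


def cascade(tagged):
    """Run all levels over a list of (word, tag) pairs."""
    nodes = [(tag, word) for word, tag in tagged]  # level 0: tagged words
    for rules in LEVELS:
        nodes = apply_level(nodes, rules)
    return nodes


def show(node):
    """Render a node in the bracketed word/TAG notation."""
    label, body = node
    if isinstance(body, str):
        return "%s/%s" % (body, label)
    return "[%s %s]" % (label, " ".join(show(child) for child in body))


tagged = [("the", "DT"), ("dog", "NN"), ("chased", "VBD"), ("the", "DT"),
          ("cat", "NN"), ("in", "IN"), ("the", "DT"), ("garden", "NN")]
bracketed = " ".join(show(n) for n in cascade(tagged))
print(bracketed)
```

Each level sees only the labels produced so far, so the clause rule never inspects individual words or tags, only the `nx`, `vp` and `pp` nodes built below it.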
CASS Example
take/VBP the/DT road/NN on/IN the/DT left/NN
[vx take/VBP] [nx the/DT road/NN] on/IN [nx the/DT left/NN]
[vx take/VBP] [nx the/DT road/NN] [pp on/IN [nx the/DT left/NN]]
[c [vx take/VBP] [nx the/DT road/NN] [pp on/IN [nx the/DT left/NN]]]
CoNLL Notation for Chunks
- Instead of using bracketing, as in
    announce [any new policy measures] in [his ...
- we tag words according to where they are in a chunk:
    announce  any   new   policy  measures  in  his   . . .
    VB        DT    JJ    NN      NNS       IN  PRP$
    O         B-NP  I-NP  I-NP    I-NP      O   B-NP
  where B-NP is ‘Begin noun chunk’, I-NP is ‘Inside noun chunk’ and O is ‘Outside any chunk’.
- Known both as BIO and IOB tagging
- Used in CoNLL shared tasks
- Allows off-the-shelf statistical taggers to be used for chunking as well as POS tagging
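The correspondence between bracketed chunks and IOB tags is mechanical, which is what lets a tagger's output be read back as chunks. A small sketch of both directions follows; the function names and the (start, end, label) span representation with an exclusive end index are assumptions for illustration.

```python
def to_iob(n_tokens, chunks):
    """Convert non-overlapping (start, end, label) spans into IOB tags."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


def from_iob(tags):
    """Recover (start, end, label) spans from a list of IOB tags.

    An I- tag that does not continue an open chunk of the same label is
    treated like O in this sketch.
    """
    chunks, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a final chunk
        inside = tag.startswith("I-") and start is not None and tag[2:] == label
        if not inside:
            if start is not None:
                chunks.append((start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
    return chunks


words = ["announce", "any", "new", "policy", "measures", "in", "his"]
iob = to_iob(len(words), [(1, 5, "NP"), (6, 7, "NP")])
print(iob)            # ['O', 'B-NP', 'I-NP', 'I-NP', 'I-NP', 'O', 'B-NP']
print(from_iob(iob))  # [(1, 5, 'NP'), (6, 7, 'NP')]
```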
Summary
- Chunking is less ambitious than full parsing, but more efficient.
- May be sufficient for many practical tasks:
  - Information Extraction
  - Question Answering
  - Extracting subcategorization frames
  - Providing features for machine learning, e.g., for building Named Entity recognizers.
- Two main approaches:
  1. Regular expressions over tag sequences
  2. Tagging with IOB tags
- Cass extends regular expression approach using a cascade of finite state transducers.
Reading
- Jurafsky and Martin, Section 10.5
- NLTK-Lite Chunk Parsing Tutorial
- Steven Abney. Parsing By Chunks. In: Robert Berwick, Steven Abney and Carol Tenny (eds.), Principle-Based Parsing. Kluwer Academic Publishers, Dordrecht. 1991.
- Steven Abney. Partial Parsing via Finite-State Cascades. J. of Natural Language Engineering, 2(4): 337-344. 1996.
- Abney’s publications: http://www.vinartus.net/spa/publications.html
Extra Tutorial
- Extra tutorial on writing tag patterns
- 5.00pm Tuesday 15th Nov, HCRC Seminar Room, 2 Buccleuch Place