CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 5 (17/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Classical Part of Speech (PoS) Tagging


Page 1: Classical Part of Speech (PoS) Tagging

CS460/IT632 Natural Language Processing/Language Technology for the Web

Lecture 5 (17/01/06)
Prof. Pushpak Bhattacharyya

IIT Bombay

Classical Part of Speech (PoS) Tagging

Page 2: Classical Part of Speech (PoS) Tagging

17/01/06 Prof. Pushpak Bhattacharyya, IIT Bombay

2

Approach to Classical PoS Tagging

1. Lexicon labeling
– Look at the dictionary
– Obtain all tags for the words in the sentence
– Plug them in as labels for these words

2. Disambiguation
– Use rules to eliminate tags

3. Repeat the disambiguation process until
– all the tags are disambiguated, or
– no further change occurs.
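The three steps above can be sketched as a small tag-elimination loop. This is a minimal sketch: the toy lexicon and the single illustrative rule are assumptions for demonstration, not the lecture's data.

```python
# Toy lexicon mapping words to their possible tags (illustrative only).
LEXICON = {
    "look": {"VB"},
    "at": {"PREP"},
    "that": {"DET", "PRON", "ADV", "COMPLEMENTIZER"},
    "man": {"NN", "VB"},
}

def lexicon_labeling(words):
    # Step 1: attach every dictionary tag to each word.
    return [set(LEXICON.get(w.lower(), {"UNK"})) for w in words]

def rule_det_after_prep(words, tags, i):
    # Illustrative rule: right after a preposition, 'that' is a determiner.
    if words[i].lower() == "that" and i > 0 and tags[i - 1] == {"PREP"}:
        return {"DET"}
    return None  # rule does not apply at this position

def disambiguate(words, tags, rules):
    # Steps 2-3: apply elimination rules until no tag set changes.
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for i in range(len(words)):
                keep = rule(words, tags, i)
                # Only narrow the tag set; never empty it entirely.
                if keep is not None and tags[i] & keep and tags[i] - keep:
                    tags[i] &= keep
                    changed = True
    return tags

words = "Look at that man".split()
tags = disambiguate(words, lexicon_labeling(words), [rule_det_after_prep])
```

After the loop, the four-way ambiguity of 'that' is narrowed to DET, while a word the rules say nothing about (here 'man') keeps all its lexicon tags.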

Page 3: Classical Part of Speech (PoS) Tagging


Possible Tags: an Example

• Possible tags for ‘that’:

1. DET (Determiner)

2. PRON (Pronoun)

3. ADV (Adverb)

4. COMPLEMENTIZER

Page 4: Classical Part of Speech (PoS) Tagging


Usage Examples of ‘that’

1. ‘That’ as DET
– Look at that man.

2. ‘That’ as PRON
– That will never be understood.

3. ‘That’ as ADV
– They have spent that much!

4. ‘That’ as COMPLEMENTIZER
– She tells me that she is fine.

Page 5: Classical Part of Speech (PoS) Tagging


A Disambiguation Rule

• Given input ‘that’:

If ( +1 A / ADV / QUANT)

( +2 SENT_LIM)

( NOT -1 SVOC / A)

Then

eliminate non-ADV tags

Else

eliminate ADV tag

Page 6: Classical Part of Speech (PoS) Tagging


Semantics of the Rule

• The conditions are combined by ANDing.

• The rule is read as:
– the next word is an adjective, adverb, or quantifier, AND
– the word two positions ahead is a sentence delimiter, AND
– the previous word is not a ‘consider’-type (SVOC) word.
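Read this way, the rule can be sketched as membership tests over small word-class sets. The sets below are illustrative stand-ins for real lexical attributes, not the lecture's data.

```python
# Toy word-class sets standing in for real lexical attributes.
A_ADV_QUANT = {"odd", "much", "fine"}      # adjectives / adverbs / quantifiers
SENT_LIM = {".", "!", "?", ";"}            # sentence delimiters
SVOC = {"consider", "deem", "find"}        # 'consider'-type (SVOC) verbs

def that_rule(words, i):
    """Return the set of tags to keep for 'that' at position i."""
    nxt  = words[i + 1] if i + 1 < len(words) else ""
    nxt2 = words[i + 2] if i + 2 < len(words) else ""
    prev = words[i - 1] if i > 0 else ""
    if nxt in A_ADV_QUANT and nxt2 in SENT_LIM and prev not in SVOC:
        return {"ADV"}                             # eliminate non-ADV tags
    return {"DET", "PRON", "COMPLEMENTIZER"}       # eliminate the ADV tag
```

On "They have spent that much !" the rule keeps only ADV, while on "They consider that odd ." the SVOC condition on the preceding verb blocks the ADV reading.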

Page 7: Classical Part of Speech (PoS) Tagging


Apply the Disambiguation Rule

• Sentences 1, 2, and 4 do not satisfy the conditions of the rule.

• Sentence 3 does satisfy them, viz.:
– QUANT: ‘much’
– SENT_LIM: ‘!’
– the word at -1, ‘spent’, is not an SVOC verb

Page 8: Classical Part of Speech (PoS) Tagging


How to obtain Attributes and Rules

• We need:
– lexical attributes
– disambiguation rules

• Both can be obtained:
– manually, or
– by learning

• Obtaining lexical attributes is easy; learning disambiguation rules is not trivial.

Page 9: Classical Part of Speech (PoS) Tagging


Specification for Rule learning

1. Rules have to be compact, i.e., each condition should be as specific as possible.

2. A rule should cover a lot of phenomena.

3. Rules have to be non-conflicting.

Page 10: Classical Part of Speech (PoS) Tagging


Brill’s Tagger

• Learns rules with an algorithm called “transformation-based error-driven learning”.

• Uses an AI search technique, viz.:
– start with a state space
– use an algorithm (BFS, DFS, A*, etc.) for searching the space

Page 11: Classical Part of Speech (PoS) Tagging


Brill’s tagging as search

• S0: Seed tagged text

• S1,S2: Generated states

• O1,O2: Operators (Rules)

S0 --O1--> S1, S0 --O2--> S2

• Operators have an LHS (condition) and an RHS (actions).
• Generated states are obtained by performing the actions.

Page 12: Classical Part of Speech (PoS) Tagging


Learning using Templates

• Brill’s learning uses Templates.

• Templates are instantiated based on the training situation.

• Steps in learning:
– Look at the training corpus
– Instantiate the templates
– Arrive at a set of rules satisfying the performance criteria

Page 13: Classical Part of Speech (PoS) Tagging


An Example Template

• Change tag ‘a’ to tag ‘b’ when:
– the preceding (following) word is tagged ‘z’
– the word two before (after) is tagged ‘z’
– one of the two preceding (following) words is tagged ‘z’
– one of the three preceding (following) words is tagged ‘z’
– the preceding word is tagged ‘z’ and the following word is tagged ‘w’
– the preceding (following) word is tagged ‘z’ and the word two before (after) is tagged ‘w’
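Each template above is a tag pair plus a context test over neighbouring tags. A sketch of instantiating one template follows; the concrete tag values (DET, ADV, JJ) are illustrative choices, not values learned in the lecture.

```python
# Templates as parameterized context tests; an instantiated rule is
# a triple (from_tag, to_tag, context_test).
def following_tagged(z):
    # Template: "the following word is tagged z"
    return lambda tags, i: i + 1 < len(tags) and tags[i + 1] == z

def preceding_tagged(z):
    # Template: "the preceding word is tagged z"
    return lambda tags, i: i > 0 and tags[i - 1] == z

def apply_rule(tags, rule):
    frm, to, test = rule
    # Rewrite every position whose tag matches and whose context test holds.
    return [to if t == frm and test(tags, i) else t
            for i, t in enumerate(tags)]

# Instantiation: "change DET to ADV when the following word is tagged JJ".
rule = ("DET", "ADV", following_tagged("JJ"))
tagged = apply_rule(["PPS", "VB", "DET", "JJ"], rule)
```

The context test is evaluated against the original tag sequence, so one application of a rule is a single simultaneous rewrite rather than a cascading one.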

Page 14: Classical Part of Speech (PoS) Tagging


Brill’s Tagger illustration

• Example:
– They consider that odd.

• Tagged correctly as:
– They_PPS consider_VB that_ADV odd_JJ.

• The next step is to learn rules and decide which templates to instantiate.
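One greedy step of error-driven learning on this example might look as follows. The initial tagging, the candidate rule instantiations, and the scoring are assumptions sketched for illustration, not the exact procedure from the lecture.

```python
# Greedy step of transformation-based error-driven learning (sketch).
truth   = ["PPS", "VB", "ADV", "JJ"]   # They consider that odd .
current = ["PPS", "VB", "DET", "JJ"]   # assumed initial (lexicon) tagging

def apply_rule(tags, rule):
    frm, to, test = rule
    return [to if t == frm and test(tags, i) else t for i, t in enumerate(tags)]

def errors(tags):
    # Number of positions where the tagging disagrees with the truth.
    return sum(t != g for t, g in zip(tags, truth))

# Candidate rules instantiated from the templates (illustrative values):
candidates = [
    ("DET", "ADV",  lambda tags, i: i + 1 < len(tags) and tags[i + 1] == "JJ"),
    ("DET", "PRON", lambda tags, i: i > 0 and tags[i - 1] == "VB"),
]

# Keep the candidate that most reduces the error count, then apply it.
best = min(candidates, key=lambda r: errors(apply_rule(current, r)))
current = apply_rule(current, best)
```

Here the first candidate fixes the mistagged ‘that’ (DET to ADV before JJ) and drives the error count to zero, so the greedy step selects it over the second.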

Page 15: Classical Part of Speech (PoS) Tagging


Viterbi Algorithm illustration

• A two-state machine over states S0 and S1 with input symbols a and b; each edge carries a symbol and a probability (edge assignments recovered from the search values on the next slide):
– S0 -a-> S0 (0.2), S0 -a-> S1 (0.2)
– S0 -b-> S0 (0.1), S0 -b-> S1 (0.5)
– S1 -a-> S0 (0.1), S1 -a-> S1 (0.4)
– S1 -b-> S0 (0.2), S1 -b-> S1 (0.3)

– Consider this state machine for the input sequence aabb.
– The next slide shows the search steps that find the state sequence with the maximum product of probabilities.
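The search over this machine can be sketched directly. A minimal Viterbi sketch, assuming the per-edge probabilities reconstructed from the search values on the next slide:

```python
# Transition table: (state, symbol) -> list of (next_state, probability).
P = {
    ("S0", "a"): [("S0", 0.2), ("S1", 0.2)],
    ("S0", "b"): [("S0", 0.1), ("S1", 0.5)],
    ("S1", "a"): [("S0", 0.1), ("S1", 0.4)],
    ("S1", "b"): [("S0", 0.2), ("S1", 0.3)],
}

def viterbi(seq, start="S0"):
    # best maps each state to the highest-probability path ending there.
    best = {start: (1.0, [start])}
    for sym in seq:
        nxt = {}
        for s, (p, path) in best.items():
            for t, q in P[(s, sym)]:
                cand = (p * q, path + [t])
                # Pruning: keep only the best-scoring path ending in t.
                if t not in nxt or cand[0] > nxt[t][0]:
                    nxt[t] = cand
        best = nxt
    # Return the overall best (probability, state sequence).
    return max(best.values())

prob, path = viterbi("aabb")
```

Because only one path per state survives each step, the work per input symbol is bounded by the number of states, which is what makes the search linear in the sequence length.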

Page 16: Classical Part of Speech (PoS) Tagging


• Search over the input aabb (a path marked × is pruned because a better path ends in the same state):

a: S0S0 (0.2), S0S1 (0.2)

a: S0S0S0 (0.04), S0S0S1 (0.04) ×, S0S1S0 (0.02) ×, S0S1S1 (0.08)

b: S0S0S0S0 (0.004) ×, S0S0S0S1 (0.02) ×, S0S1S1S0 (0.016), S0S1S1S1 (0.024)

b: S0S1S1S0S0 (0.0016) ×, S0S1S1S0S1 (0.008), S0S1S1S1S0 (0.0048), S0S1S1S1S1 (0.0072) ×

• The best state sequence is S0S1S1S0S1, with product value 0.008.

Page 17: Classical Part of Speech (PoS) Tagging


Remarks

• The algorithm prunes a path if another path ends in the same state with a higher product value; the Markov assumption (the future depends only on the current state) guarantees this pruning is safe.

• Complexity, for a sequence of length T over 2 states:
– without the Markov assumption: O(2^T) paths (exponential)
– with the Markov assumption: O(2T) (linear), since only the best path per state survives each step