CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 5 (17/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Classical Part of Speech (PoS) Tagging


Page 1: Classical Part of Speech (PoS) Tagging

CS460/IT632 Natural Language Processing/Language Technology for the Web

Lecture 5 (17/01/06)
Prof. Pushpak Bhattacharyya

IIT Bombay

Classical Part of Speech (PoS) Tagging

Page 2: Classical Part of Speech (PoS) Tagging

17/01/06 Prof. Pushpak Bhattacharyya, IIT Bombay

2

Approach to Classical PoS Tagging

1. Lexicon labeling
– Look at the dictionary
– Obtain all tags for the words in the sentence
– Plug them in as labels for these words

2. Disambiguation
– Use rules to eliminate tags

3. Repeat the disambiguation process until
– all the tags are disambiguated, or
– no further change occurs.
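The three steps above can be sketched as a small tag-elimination loop. This is a minimal sketch: the toy lexicon and the single illustrative rule are assumptions for demonstration, not the lecture's data.

```python
# Toy lexicon mapping words to their possible tags (illustrative only).
LEXICON = {
    "look": {"VB"},
    "at": {"PREP"},
    "that": {"DET", "PRON", "ADV", "COMPLEMENTIZER"},
    "man": {"NN", "VB"},
}

def lexicon_labeling(words):
    # Step 1: attach every dictionary tag to each word.
    return [set(LEXICON.get(w.lower(), {"UNK"})) for w in words]

def rule_det_after_prep(words, tags, i):
    # Illustrative rule: right after a preposition, 'that' is a determiner.
    if words[i].lower() == "that" and i > 0 and tags[i - 1] == {"PREP"}:
        return {"DET"}
    return None  # rule does not apply at this position

def disambiguate(words, tags, rules):
    # Steps 2-3: apply elimination rules until no tag set changes.
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for i in range(len(words)):
                keep = rule(words, tags, i)
                # Only narrow the tag set; never empty it entirely.
                if keep is not None and tags[i] & keep and tags[i] - keep:
                    tags[i] &= keep
                    changed = True
    return tags

words = "Look at that man".split()
tags = disambiguate(words, lexicon_labeling(words), [rule_det_after_prep])
```

After the loop, the four-way ambiguity of 'that' is narrowed to DET, while a word the rules say nothing about (here 'man') keeps all its lexicon tags.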

Page 3: Classical Part of Speech (PoS) Tagging


Possible Tags: an Example

• Possible tags for ‘that’:

1. DET (Determiner)

2. PRON (Pronoun)

3. ADV (Adverb)

4. COMPLEMENTIZER

Page 4: Classical Part of Speech (PoS) Tagging


Usage Examples of ‘that’

1. ‘That’ as DET
– Look at that man.

2. ‘That’ as PRON
– That will never be understood.

3. ‘That’ as ADV
– They have spent that much!

4. ‘That’ as COMPLEMENTIZER
– She tells me that she is fine.

Page 5: Classical Part of Speech (PoS) Tagging


A Disambiguation Rule

• Given input ‘that’:

If ( +1 A / ADV / QUANT)

( +2 SENT_LIM)

( NOT -1 SVOC / A)

Then

eliminate non-ADV tags

Else

eliminate ADV tag

Page 6: Classical Part of Speech (PoS) Tagging


Semantics of the Rule

• The conditions are combined by ANDing.

• The rule is read as:
– the next word is an adjective, adverb, or quantifier, AND
– the word two positions ahead is a sentence delimiter, AND
– the previous word is not a ‘consider’-type (SVOC) word.
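Read this way, the rule can be sketched as membership tests over small word-class sets. The sets below are illustrative stand-ins for real lexical attributes, not the lecture's data.

```python
# Toy word-class sets standing in for real lexical attributes.
A_ADV_QUANT = {"odd", "much", "fine"}      # adjectives / adverbs / quantifiers
SENT_LIM = {".", "!", "?", ";"}            # sentence delimiters
SVOC = {"consider", "deem", "find"}        # 'consider'-type (SVOC) verbs

def that_rule(words, i):
    """Return the set of tags to keep for 'that' at position i."""
    nxt  = words[i + 1] if i + 1 < len(words) else ""
    nxt2 = words[i + 2] if i + 2 < len(words) else ""
    prev = words[i - 1] if i > 0 else ""
    if nxt in A_ADV_QUANT and nxt2 in SENT_LIM and prev not in SVOC:
        return {"ADV"}                             # eliminate non-ADV tags
    return {"DET", "PRON", "COMPLEMENTIZER"}       # eliminate the ADV tag
```

On "They have spent that much !" the rule keeps only ADV, while on "They consider that odd ." the SVOC condition on the preceding verb blocks the ADV reading.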

Page 7: Classical Part of Speech (PoS) Tagging


Apply the Disambiguation Rule

• Sentences 1, 2, and 4 do not satisfy the conditions of the rule.

• Sentence 3 does satisfy them, viz.:
– QUANT: ‘much’
– SENT_LIM: ‘!’
– the word at -1, ‘spent’, is not an SVOC verb

Page 8: Classical Part of Speech (PoS) Tagging


How to obtain Attributes and Rules

• We need:
– lexical attributes
– disambiguation rules

• Both can be obtained:
– manually, or
– by learning

• Obtaining lexical attributes is easy; learning disambiguation rules is not trivial.

Page 9: Classical Part of Speech (PoS) Tagging


Specification for Rule learning

1. Rules have to be compact, i.e., each condition should be as specific as possible.

2. A rule should cover a lot of phenomena.

3. Rules have to be non-conflicting.

Page 10: Classical Part of Speech (PoS) Tagging


Brill’s Tagger

• Learns rules with an algorithm called “transformation-based error-driven learning”.

• Uses an AI search technique, viz.:
– start with a state space
– use an algorithm (BFS, DFS, A*, etc.) for searching the space

Page 11: Classical Part of Speech (PoS) Tagging


Brill’s tagging as search

• S0: Seed tagged text

• S1,S2: Generated states

• O1,O2: Operators (Rules)

S0 --O1--> S1, S0 --O2--> S2

• Operators have an LHS (condition) and an RHS (actions).
• Generated states are obtained by performing the actions.

Page 12: Classical Part of Speech (PoS) Tagging


Learning using Templates

• Brill’s learning uses Templates.

• Templates are instantiated based on the training situation.

• Steps in learning:
– Look at the training corpus
– Instantiate the templates
– Arrive at a set of rules satisfying the performance criteria

Page 13: Classical Part of Speech (PoS) Tagging


An Example Template

• Change tag ‘a’ to tag ‘b’ when:
– the preceding (following) word is tagged ‘z’
– the word two before (after) is tagged ‘z’
– one of the two preceding (following) words is tagged ‘z’
– one of the three preceding (following) words is tagged ‘z’
– the preceding word is tagged ‘z’ and the following word is tagged ‘w’
– the preceding (following) word is tagged ‘z’ and the word two before (after) is tagged ‘w’
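Each template above is a tag pair plus a context test over neighbouring tags. A sketch of instantiating one template follows; the concrete tag values (DET, ADV, JJ) are illustrative choices, not values learned in the lecture.

```python
# Templates as parameterized context tests; an instantiated rule is
# a triple (from_tag, to_tag, context_test).
def following_tagged(z):
    # Template: "the following word is tagged z"
    return lambda tags, i: i + 1 < len(tags) and tags[i + 1] == z

def preceding_tagged(z):
    # Template: "the preceding word is tagged z"
    return lambda tags, i: i > 0 and tags[i - 1] == z

def apply_rule(tags, rule):
    frm, to, test = rule
    # Rewrite every position whose tag matches and whose context test holds.
    return [to if t == frm and test(tags, i) else t
            for i, t in enumerate(tags)]

# Instantiation: "change DET to ADV when the following word is tagged JJ".
rule = ("DET", "ADV", following_tagged("JJ"))
tagged = apply_rule(["PPS", "VB", "DET", "JJ"], rule)
```

The context test is evaluated against the original tag sequence, so one application of a rule is a single simultaneous rewrite rather than a cascading one.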

Page 14: Classical Part of Speech (PoS) Tagging


Brill’s Tagger illustration

• Example:
– They consider that odd.

• Tagged correctly as:
– They_PPS consider_VB that_ADV odd_JJ.

• The next step is to learn rules and decide which templates to instantiate.
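One greedy step of error-driven learning on this example might look as follows. The initial tagging, the candidate rule instantiations, and the scoring are assumptions sketched for illustration, not the exact procedure from the lecture.

```python
# Greedy step of transformation-based error-driven learning (sketch).
truth   = ["PPS", "VB", "ADV", "JJ"]   # They consider that odd .
current = ["PPS", "VB", "DET", "JJ"]   # assumed initial (lexicon) tagging

def apply_rule(tags, rule):
    frm, to, test = rule
    return [to if t == frm and test(tags, i) else t for i, t in enumerate(tags)]

def errors(tags):
    # Number of positions where the tagging disagrees with the truth.
    return sum(t != g for t, g in zip(tags, truth))

# Candidate rules instantiated from the templates (illustrative values):
candidates = [
    ("DET", "ADV",  lambda tags, i: i + 1 < len(tags) and tags[i + 1] == "JJ"),
    ("DET", "PRON", lambda tags, i: i > 0 and tags[i - 1] == "VB"),
]

# Keep the candidate that most reduces the error count, then apply it.
best = min(candidates, key=lambda r: errors(apply_rule(current, r)))
current = apply_rule(current, best)
```

Here the first candidate fixes the mistagged ‘that’ (DET to ADV before JJ) and drives the error count to zero, so the greedy step selects it over the second.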

Page 15: Classical Part of Speech (PoS) Tagging


Viterbi Algorithm illustration

• A two-state machine over states S0 and S1 with input symbols a and b; each edge carries a symbol and a probability (edge assignments recovered from the search values on the next slide):
– S0 -a-> S0 (0.2), S0 -a-> S1 (0.2)
– S0 -b-> S0 (0.1), S0 -b-> S1 (0.5)
– S1 -a-> S0 (0.1), S1 -a-> S1 (0.4)
– S1 -b-> S0 (0.2), S1 -b-> S1 (0.3)

– Consider this state machine for the input sequence aabb.
– The next slide shows the search steps that find the state sequence with the maximum product of probabilities.
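The search over this machine can be sketched directly. A minimal Viterbi sketch, assuming the per-edge probabilities reconstructed from the search values on the next slide:

```python
# Transition table: (state, symbol) -> list of (next_state, probability).
P = {
    ("S0", "a"): [("S0", 0.2), ("S1", 0.2)],
    ("S0", "b"): [("S0", 0.1), ("S1", 0.5)],
    ("S1", "a"): [("S0", 0.1), ("S1", 0.4)],
    ("S1", "b"): [("S0", 0.2), ("S1", 0.3)],
}

def viterbi(seq, start="S0"):
    # best maps each state to the highest-probability path ending there.
    best = {start: (1.0, [start])}
    for sym in seq:
        nxt = {}
        for s, (p, path) in best.items():
            for t, q in P[(s, sym)]:
                cand = (p * q, path + [t])
                # Pruning: keep only the best-scoring path ending in t.
                if t not in nxt or cand[0] > nxt[t][0]:
                    nxt[t] = cand
        best = nxt
    # Return the overall best (probability, state sequence).
    return max(best.values())

prob, path = viterbi("aabb")
```

Because only one path per state survives each step, the work per input symbol is bounded by the number of states, which is what makes the search linear in the sequence length.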

Page 16: Classical Part of Speech (PoS) Tagging


• Search over the input aabb (a path marked × is pruned because a better path ends in the same state):

a: S0S0 (0.2), S0S1 (0.2)

a: S0S0S0 (0.04), S0S0S1 (0.04) ×, S0S1S0 (0.02) ×, S0S1S1 (0.08)

b: S0S0S0S0 (0.004) ×, S0S0S0S1 (0.02) ×, S0S1S1S0 (0.016), S0S1S1S1 (0.024)

b: S0S1S1S0S0 (0.0016) ×, S0S1S1S0S1 (0.008), S0S1S1S1S0 (0.0048), S0S1S1S1S1 (0.0072) ×

• The best state sequence is S0S1S1S0S1, with product value 0.008.

Page 17: Classical Part of Speech (PoS) Tagging


Remarks

• The algorithm prunes a path if another path ends in the same state with a higher product value; the Markov assumption (the future depends only on the current state) guarantees this pruning is safe.

• Complexity, for a sequence of length T over 2 states:
– without the Markov assumption: O(2^T) paths (exponential)
– with the Markov assumption: O(2T) (linear), since only the best path per state survives each step