
CS460/IT632 Natural Language Processing / Language Technology for the Web

Lecture 2 (06/01/06)
Prof. Pushpak Bhattacharyya
IIT Bombay

Part of Speech (PoS) Tagging


Tagging or Annotation

● The purpose is disambiguation.
● A word can have a number of labels.
● The problem is to give it a unique label.
● PoS tagging makes use of the "local context", whereas sense tagging needs "long-distance dependencies" and is hence more difficult.
● PoS tagging is needed mainly in parsing, and also in other applications.


Approaches

● Rule-based approach
● Statistical approach
– We will mainly focus on the statistical approach.


Types of Tagging Tasks

● PoS
● Named entity
● Sense
● Parse tree


PoS Tagging

● Example:
– "The Orange ducks clean the bills."
● Assign tags to each word from the lexicon; multiple possibilities exist.


Lexicon dictionary

● The:
– DT (Determiner)
● Orange:
– NN (Noun)
– JJ (Adjective)
● Duck:
– NN
– VB (Base-form verb)
● Clean:
– NN
– VB
● Bill:
– NN
– VB

JJ, VB, NN, etc. are called syntactic categories, or PoS tags.


PoS tagging as a sequence labelling task

● Task is to assign the correct PoS tag sequence to the words.

● It can be:
– Unigram: consider one word at a time while deciding the sequence.
– Multigram: consider multiple words.
● There are 16 (= 1×2×2×2×1×2) possible sequences for the "duck" example; they are enumerated in the sketch below.
● It is a classification problem: classify each word into the right tag category.
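A minimal sketch of this tag-sequence space, assuming the toy lexicon from the previous slide (the Python code is illustrative, not part of the lecture). itertools.product enumerates every combination, confirming the 1×2×2×2×1×2 = 16 count:

```python
from itertools import product

# Toy lexicon from the slides: each word maps to its possible PoS tags.
lexicon = {
    "The": ["DT"],
    "Orange": ["NN", "JJ"],
    "ducks": ["NN", "VB"],
    "clean": ["NN", "VB"],
    "the": ["DT"],
    "bills": ["NN", "VB"],
}
sentence = ["The", "Orange", "ducks", "clean", "the", "bills"]

# Cartesian product over the per-word tag lists gives every candidate sequence.
sequences = list(product(*(lexicon[w] for w in sentence)))
print(len(sequences))  # 16 = 1*2*2*2*1*2
for seq in sequences[:3]:
    print(list(zip(sentence, seq)))
```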


Challenges

● Lexical ambiguity: multiple tag choices per word.
● Morphological analysis: finding the root word.
● Tokenization: finding word boundaries.
– In Thai, there are no blank spaces between words.
– Non-trivial even with spaces (example: capturing boundaries when a word is continued on the next line with a "-"); see the sketch below.
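A minimal tokenization sketch (an illustration, not the lecture's method), handling the hyphenated line-break case above; the regular expressions are assumptions for plain English text:

```python
import re

text = "The committee post-\nponed the meeting until Monday."

# Rejoin words split across a line break with a hyphen ("post-\nponed" -> "postponed").
# Caveat: this also merges genuinely hyphenated compounds that break at end of line;
# real tokenizers consult a lexicon before deciding.
joined = re.sub(r"-\n", "", text)

# Split into word and punctuation tokens.
tokens = re.findall(r"\w+|[^\w\s]", joined)
print(tokens)
# ['The', 'committee', 'postponed', 'the', 'meeting', 'until', 'Monday', '.']
```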


Named Entity tagging

● Example 1:
– "Mohan went to school in Kolkata"
● Tagged as:
– "Mohan_Person went to school_Place in Kolkata_Place"
● Example 2:
– "Kolkata bore the brunt of 1947 riots when 1947 children died at Kolkata."
– Tagged as: "Kolkata_? bore the brunt of 1947_year riots when 1947_num children died at Kolkata_Place."
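A minimal gazetteer-style sketch (an assumption for illustration; the lecture does not prescribe a method). Plain lookup tags the easy cases but leaves ambiguous tokens such as the two readings of "1947" unresolved, which is exactly the difficulty the slide points at:

```python
# Hypothetical gazetteer: surface form -> entity label.
gazetteer = {"Mohan": "Person", "Kolkata": "Place"}

def tag_entities(tokens):
    """Attach a gazetteer label where one exists; flag bare numbers as ambiguous."""
    out = []
    for tok in tokens:
        if tok in gazetteer:
            out.append(f"{tok}_{gazetteer[tok]}")
        elif tok.isdigit():
            out.append(f"{tok}_?")  # year or count? context is needed to decide
        else:
            out.append(tok)
    return out

print(" ".join(tag_entities("Kolkata bore the brunt of 1947 riots".split())))
# Kolkata_Place bore the brunt of 1947_? riots
```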


Sense tagging

● Detecting the meaning of each word.
● Our example tagged as:
– The Orange_{colour} ducks_{bird} clean the bills_{body_part}
● Here, sense tagging is done by means of hypernymy (labelling a word with its superordinate concept).
● Semantic relations like hypernymy are stored in the lexical resource called "WordNet"; see the lookup sketch below.
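A minimal WordNet lookup using NLTK (an assumed toolkit; the lecture does not name one, and the corpus must first be fetched with nltk.download('wordnet')):

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

# Each synset is one sense of "bill"; hypernyms() gives its superordinate concepts.
for syn in wn.synsets("bill", pos=wn.NOUN):
    print(syn.name(), "->", [h.name() for h in syn.hypernyms()])

# One of the listed noun senses is the bird's beak (the beak synset includes the
# lemma "bill"); its hypernym chain leads up to body part, matching the
# bills_{body_part} reading above.
```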


Parse Tree tagging

● Example parse tree: (shown as a figure in the original slides; the equivalent bracketed form appears on the next slide)


Parse Tree tagging (contd.)

● Given a grammar, one can construct the parse tree.
● Annotation produces the following structure:
– [[The_DT [Orange_JJ ducks_NN]NP]NP [clean_VB [the_DT [bills_NN]NP]NP]VP]S
● This bracketed structure is called the Penn Treebank form.
● From the Treebank form, one can arrive at a grammar through learning; see the sketch below.
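A minimal sketch of reading a Treebank-style bracketing and extracting grammar rules from it, using NLTK's Tree class (an assumed tool, not the lecture's; the slide's annotation is rewritten in the (TAG word) bracket style that nltk.Tree reads):

```python
from nltk import Tree

# The slide's bracketed annotation in S-expression form.
t = Tree.fromstring(
    "(S (NP (DT The) (NP (JJ Orange) (NN ducks)))"
    "   (VP (VB clean) (NP (DT the) (NP (NN bills)))))"
)
t.pretty_print()

# "Learning" a grammar in the simplest sense: read the productions off the tree.
for prod in t.productions():
    print(prod)
# S -> NP VP, NP -> DT NP, NP -> JJ NN, VP -> VB NP, DT -> 'The', ...
```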


Statistical Formulation of the PoS tagging problem

● Input:
– W1, W2, ..., Wn: the words of the sentence
– C1, C2, ..., Cm: the repository of lexical tags (DT, JJ, NN, etc.)
● Output:
– The "best" PoS tag sequence Ci1, Ci2, Ci3, ..., Cin for the given words.
● "Best" means:
– P(Ci1, Ci2, Ci3, ..., Cin | W1, W2, ..., Wn) is the maximum over all possible C-sequences.


Statistical Formulation of the PoS tagging problem (contd.)

● Example:
– P(DT JJ NN | The Orange duck) > P(DT NN VB | The Orange duck) is required.
● Why?
– Because given the phrase "The Orange duck", there is overwhelming evidence in the corpus that "DT JJ NN" is the right tag sequence.


Mathematical machinery


Bayes Theorem

● P(A|B) = P(A) · P(B|A) / P(B)
– where,
– P(A): prior probability
– P(A|B): posterior probability
– P(B|A): likelihood
● Why apply Bayes' theorem?
– This is the Generative vs. Discriminative model question.
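A minimal numeric check of the theorem with made-up probabilities (purely illustrative, not corpus estimates):

```python
# Assumed toy numbers: A = "tag is NN", B = "word is 'duck'".
p_a = 0.3             # prior P(A)
p_b_given_a = 0.02    # likelihood P(B|A)
p_b = 0.01            # evidence P(B)

# Bayes theorem: posterior = prior * likelihood / evidence.
p_a_given_b = p_a * p_b_given_a / p_b
print(p_a_given_b)  # 0.6
```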


Apply Bayes theorem

P(Ci1, Ci2, Ci3, ..., Cin | W1, W2, ..., Wn) = P(C|W)

= P(C) · P(W|C) / P(W)

where,
C = <Ci1, Ci2, Ci3, ..., Cin>
W = <W1, W2, ..., Wn>


Best tag sequence

C* = <Ci1, Ci2, Ci3, ..., Cin>*, where * signifies the best C-sequence
   = argmax P(C|W)

● As the denominator P(W) is common to all candidate tag sequences, therefore:

C* = argmax P(C) · P(W|C)

● A brute-force sketch of this argmax follows.
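A brute-force version of the argmax with assumed toy probabilities for a two-word phrase (real systems estimate these from a corpus and use dynamic programming rather than enumeration; the per-word factorisation of P(W|C) is the usual independence assumption, not stated on the slide):

```python
from itertools import product

words = ["Orange", "ducks"]
lexicon = {"Orange": ["NN", "JJ"], "ducks": ["NN", "VB"]}

# Assumed toy models (made-up numbers): P(C) per tag sequence, P(w|c) per word.
p_seq = {("JJ", "NN"): 0.4, ("NN", "NN"): 0.2, ("NN", "VB"): 0.3, ("JJ", "VB"): 0.1}
p_word_given_tag = {("Orange", "JJ"): 0.05, ("Orange", "NN"): 0.01,
                    ("ducks", "NN"): 0.03, ("ducks", "VB"): 0.02}

def score(tags):
    """P(C) * P(W|C), with P(W|C) approximated as a product of per-word terms."""
    p = p_seq[tags]
    for w, c in zip(words, tags):
        p *= p_word_given_tag[(w, c)]
    return p

best = max(product(*(lexicon[w] for w in words)), key=score)
print(best, score(best))  # ('JJ', 'NN') wins under these numbers
```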


Processing the 1st part: P(C)

P(C) = P(Ci1, Ci2, Ci3, ..., Cin)
     = P(Ci1) · P(Ci2|Ci1) · P(Ci3|Ci1 Ci2) · ... · P(Cin|Ci1 Ci2 ... Cin-1)

(on applying the chain rule of probability)

Ex: P(DT JJ NN) = P(DT) · P(JJ|DT) · P(NN|DT JJ)
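A numeric sketch of this chain-rule expansion with assumed toy conditionals (not corpus-derived):

```python
# Assumed toy probabilities for P(DT JJ NN) = P(DT) * P(JJ|DT) * P(NN|DT JJ).
p_dt = 0.2              # P(DT)
p_jj_given_dt = 0.3     # P(JJ|DT)
p_nn_given_dt_jj = 0.8  # P(NN|DT JJ)

p_dt_jj_nn = p_dt * p_jj_given_dt * p_nn_given_dt_jj
print(p_dt_jj_nn)  # 0.048
```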


Markov assumption

● A tag depends only on a window of nearby tags, not on everything that the chain rule of probability demands.
● A Kth-order Markov assumption considers only the previous K tags.
● Typical values: K = 3 for English, and (it seems) K = 5 for Hindi.


Apply assumption

With K = 2 (a window of two tags: the current one and the one before it, i.e., the bigram case), the problem becomes:

P(C) = Π P(Ci | Ci-1), for i = 1..n

where C0 is the sentence-beginning marker (see the sketch below).
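A minimal sketch of this bigram factorisation over the running example, with an assumed toy transition table and "<s>" standing in for the C0 sentence-beginning marker:

```python
from math import prod

# Assumed toy transition table P(tag | previous tag); "<s>" is the C0 marker.
trans = {("<s>", "DT"): 0.5, ("DT", "JJ"): 0.3, ("JJ", "NN"): 0.6,
         ("NN", "VB"): 0.4, ("VB", "DT"): 0.5, ("DT", "NN"): 0.4}

# The/DT Orange/JJ ducks/NN clean/VB the/DT bills/NN
tags = ["DT", "JJ", "NN", "VB", "DT", "NN"]

# P(C) = product over i of P(Ci | Ci-1), with C0 = "<s>".
p_c = prod(trans[(prev, cur)] for prev, cur in zip(["<s>"] + tags, tags))
print(p_c)  # 0.5 * 0.3 * 0.6 * 0.4 * 0.5 * 0.4 = 0.0072
```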


Exercise given in the lecture

● Contrast PoS tagging with sense tagging.
● Find an example to show the difference.
