
Introduction to Classification

Shallow Processing Techniques for NLP, Ling570

November 9, 2011

Roadmap

Classification problems:

Definition

Solutions

Case studies

Based on slides by F. Xia

Example: Text Classification

Task: Given an article, predict its category

Categories: sports, entertainment, news, weather, ...; spam / not spam

What kind of information is useful for this task?

Classification Task

Task: C is a finite set of labels (aka categories, classes); given x, determine its category y in C

Instance: (x, y), where x is the thing to be labeled/classified and y is the label/class

Data: a set of instances; labeled data: y is known; unlabeled data: y is unknown

Training data, test data

Text Classification Examples

Spam filtering

Call routing

Sentiment classificationPositive/NegativeScore: 1 to 5

POS Tagging

Task: Given a sentence, predict the tag of each word

Is this a classification problem?

Categories: N, V, Adj, ...

What information is useful?

How do POS tagging and text classification differ? POS tagging is a sequence labeling problem.

Word Segmentation

Task: Given a string, break it into words

Categories: B(reak), NB (no break); or B(eginning), I(nside), E(nd)

e.g. c1 c2 || c3 c4 c5
c1/NB c2/B c3/NB c4/NB c5/B
c1/B c2/E c3/B c4/I c5/E

What type of task? Also sequence labeling. A sketch of the two tag sets follows.
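To make the two tag sets concrete, here is a minimal Python sketch (mine, not the course's; the helper names are invented) that converts a segmentation into per-character tags:

```python
def to_nb_tags(words):
    """NB/B scheme: a character gets B if a word ends after it, else NB."""
    tags = []
    for w in words:
        tags += ["NB"] * (len(w) - 1) + ["B"]
    return tags

def to_bie_tags(words):
    """BIE scheme: B(eginning), I(nside), E(nd) of each word."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("B")  # singleton word; the slides don't cover this case
        else:
            tags += ["B"] + ["I"] * (len(w) - 2) + ["E"]
    return tags

words = [["c1", "c2"], ["c3", "c4", "c5"]]   # c1 c2 || c3 c4 c5
print(to_nb_tags(words))    # ['NB', 'B', 'NB', 'NB', 'B']
print(to_bie_tags(words))   # ['B', 'E', 'B', 'I', 'E']
```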

Solving a Classification Problem

Two Stages

Training. Learner: training data → classifier

Testing. Decoder: test data + classifier → classification output

Also: preprocessing, postprocessing, evaluation
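As a schematic sketch only (the "learner" below is a placeholder majority-class classifier, not anything from this course), the two stages might look like:

```python
from collections import Counter

def train(labeled_data):
    """Learner: training data -> classifier (here: majority class)."""
    majority = Counter(y for _, y in labeled_data).most_common(1)[0][0]
    return lambda x: majority   # the returned "classifier" ignores x

def decode(classifier, test_data):
    """Decoder: test data + classifier -> classification output."""
    return [classifier(x) for x in test_data]

clf = train([("doc1", "Spam"), ("doc2", "Spam"), ("doc3", "NotSpam")])
print(decode(clf, ["doc4", "doc5"]))   # ['Spam', 'Spam']
```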

Representing Input

Potentially infinite values to represent

Represent input as a feature vector: x = <v1, v2, v3, ..., vn>, i.e. x = <f1=v1, f2=v2, ..., fn=vn>

What are good features?

Example I: Spam Tagging

Classes: Spam/Not Spam

Input: Email messages

Doc1
Western Union Money Transfer
[email protected]
Bishops Square Akpakpa E1 6AO, Cotonou
Benin Republic
Website: http://www.westernunion.com/info/selectCountry.asP
Phone: +229 99388639

Attention Beneficiary,

This to inform you that the federal ministry of finance Benin Republic has started releasing scam victim compensation fund mandated by United Nation Organization through our office.

I am contacting you because our agent have sent you the first payment of $5,000 for your compensation funds total amount of $500 000 USD (Five hundred thousand united state dollar)

We need your urgent response so that we shall release your payment information to you.

You can call our office hot line for urgent attention(+22999388639)

Doc2
Hello! my dear. How are you today and your family? I hope all is good, kindly pay Attention and understand my aim of communicating you today through this Letter, My names is Saif al-Islam al-Gaddafi the Son of former Libyan President. i was born on 1972 in Tripoli Libya, By Gaddafi’s second wive. I want you to help me clear this fund in your name which i deposited in Europe please i would like this money to be transferred into your account before they find it. the amount is 20.300,000 million GBP British Pounds sterling through a

Doc3
from: [email protected]

Apply for loan at 3% interest Rate..Contact us for details.

Doc4
from: [email protected]

REMINDER:

If you have not received a PIN number to vote in the elections and have not already contacted us, please contact either Drago Radev ([email protected]) or Priscilla Rasmussen ([email protected]) right away.

Everyone who has not received a pin but who has contacted us already will get a new pin over the weekend.

Anyone who still wants to join for 2011 needs to do this by Monday (November 7th) in order to be eligible to vote.

And, if you do have your PIN number and have not voted yet, remember every vote counts!

What are good features?

Possible Features

Words!

Feature for each word: binary: presence/absence; integer: occurrence count; particular word types: money/sex/...: [Vv].*gr.*

Errors: spelling, grammar

Images

Header info
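A small sketch of the three word-feature flavors above (the feature names and example words are illustrative, not from the slides):

```python
import re
from collections import Counter

def extract_features(text):
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {
        "has_money": int("money" in counts),                # binary: presence/absence
        "count_payment": counts["payment"],                 # integer: occurrence count
        "viagra_like": int(any(re.match(r"[Vv].*gr.*", t)   # word-type regex from the slide
                               for t in tokens)),
    }

print(extract_features("We need your urgent response to release your payment"))
# {'has_money': 0, 'count_payment': 1, 'viagra_like': 0}
```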

Representing Input: Attribute-Value Matrix

            f1         f2          ...    fm
            Currency   Country            Date    Label
x1 = Doc1   1          1           0.3    0       Spam
x2 = Doc2   1          1           1.75   1       Spam
...
xn = Doc4   0          0           0      2       NotSpam
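One way such a matrix might be assembled (a sketch under my own conventions; the feature values below are made up to echo the table):

```python
def to_avm(docs, feature_names):
    """docs: list of (feature_dict, label) pairs -> rows of values plus label."""
    return [[feats.get(f, 0) for f in feature_names] + [label]
            for feats, label in docs]

columns = ["currency", "country", "date"]   # stand-ins for f1 .. fm
docs = [({"currency": 1, "country": 1, "date": 0}, "Spam"),
        ({"currency": 0, "country": 0, "date": 2}, "NotSpam")]
for row in to_avm(docs, columns):
    print(row)
# [1, 1, 0, 'Spam']
# [0, 0, 2, 'NotSpam']
```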

Classifier

Result of training on input data, with or without class labels

Formal perspective: f(x) = y, where x is the input and y is in C

More generally: f(x) = {(ci, scorei)}, where x is the input, each ci is in C, and scorei is the score for that category assignment

Testing

Input: test data (e.g. an AVM) and the classifier

Output: a decision matrix; can assign the highest scoring class to each input

      x1    x2    x3    ...
c1    0.1   0.1   0.2   ...
c2    0     0.8   0     ...
c3    0.2   0     0.7   ...
c4    0.7   0.1   0.1   ...
...

Highest-scoring class per input: x1 → c4, x2 → c2, x3 → c3
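Picking the highest-scoring class per input is a one-liner; a sketch using the matrix above (the layout is mine):

```python
scores = {                        # scores[c][i] = score of class c for input xi
    "c1": [0.1, 0.1, 0.2],
    "c2": [0.0, 0.8, 0.0],
    "c3": [0.2, 0.0, 0.7],
    "c4": [0.7, 0.1, 0.1],
}
best = [max(scores, key=lambda c: scores[c][i]) for i in range(3)]
print(best)   # ['c4', 'c2', 'c3'], matching the assignments above
```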

Evaluation

Confusion matrix:

             Gold
System       +     -
    +        TP    FP
    -        FN    TN

Precision: TP/(TP+FP)

Recall: TP/(TP+FN)

F-score: 2PR/(P+R)

Accuracy: (TP+TN)/(TP+TN+FP+FN)

Why F-score? Why not just accuracy?

Evaluation Example

Confusion matrix:

             Gold
System       +     -
    +        1     4
    -        5     90

Precision: TP/(TP+FP) = 1/(1+4) = 1/5

Recall: TP/(TP+FN) = 1/(1+5) = 1/6

F-score: 2PR/(P+R) = (2 × 1/5 × 1/6)/(1/5 + 1/6) = 2/11

Accuracy = (1+90)/(1+4+5+90) = 91%
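A quick sketch (the function name is mine) that recomputes the example's numbers exactly:

```python
from fractions import Fraction

def evaluate(tp, fp, fn, tn):
    p = Fraction(tp, tp + fp)                   # precision
    r = Fraction(tp, tp + fn)                   # recall
    f = 2 * p * r / (p + r)                     # F-score
    acc = Fraction(tp + tn, tp + fp + fn + tn)  # accuracy
    return p, r, f, acc

print(evaluate(tp=1, fp=4, fn=5, tn=90))
# (Fraction(1, 5), Fraction(1, 6), Fraction(2, 11), Fraction(91, 100))
```

Note how accuracy looks excellent (91%) while the F-score (2/11) exposes how badly the system does on the rare positive class.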

Classification Problem Steps

Input processing:
Split data into training/dev/test (a split sketch follows this slide)
Convert data into an Attribute-Value Matrix: identify candidate features, perform feature selection, create the AVM representation

Training

Testing

Evaluation
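A minimal sketch of the first step; the 80/10/10 proportions and the shuffling are my assumptions, not the course's:

```python
import random

def split(instances, dev_frac=0.1, test_frac=0.1, seed=0):
    data = instances[:]                    # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)
    n_dev = int(len(data) * dev_frac)
    n_test = int(len(data) * test_frac)
    return data[n_dev + n_test:], data[:n_dev], data[n_dev:n_dev + n_test]

train_set, dev_set, test_set = split(list(range(100)))
print(len(train_set), len(dev_set), len(test_set))   # 80 10 10
```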

Classification Algorithms

Will be covered in detail in 572:

Nearest Neighbor

Naïve Bayes

Decision Trees

Neural Networks

Maximum Entropy

Feature Design & Representation

What sorts of information do we want to encode? Words, frequencies, ngrams, morphology, sentence length, etc.

Issue: learning algorithms work on numbers. Many work only on binary values (0/1); others work on any real-valued input.

How can we represent different information numerically? In binary?

Representation

Words/tags/ngrams/etc.

One feature per item: binary (presence/absence) or real-valued (counts)

Binarizing numeric features: single threshold; multiple thresholds; binning (one binary feature per bin), sketched below
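A sketch of the three binarization options (the thresholds and bin edges are invented for illustration):

```python
def single_threshold(x, t=5):
    return {"x>5": int(x > t)}

def multiple_thresholds(x, ts=(1, 5, 10)):
    return {f"x>{t}": int(x > t) for t in ts}

def binning(x, edges=(0, 1, 5, 10)):
    """One binary feature per bin; the last bin is open-ended."""
    feats = {f"{lo}<=x<{hi}": int(lo <= x < hi)
             for lo, hi in zip(edges, edges[1:])}
    feats[f"x>={edges[-1]}"] = int(x >= edges[-1])
    return feats

print(binning(7))   # {'0<=x<1': 0, '1<=x<5': 0, '5<=x<10': 1, 'x>=10': 0}
```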

Feature Template

Example: Prevword (or w-1)

A template corresponds to many features, e.g. for "time flies like an arrow":
w-1=<s>
w-1=time
w-1=flies
w-1=like
w-1=an
...

Shorthand for binary-valued features: at a given position, e.g. w-1=<s> = 0 while w-1=time = 1
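A sketch of expanding the Prevword template over the example sentence (the helper name is mine):

```python
def prevword_features(sentence):
    """Instantiate the w-1 template once per token; each instantiation
    is a binary feature that fires (value 1) at that position."""
    tokens = ["<s>"] + sentence.split()
    return [f"w-1={prev}" for prev in tokens[:-1]]

print(prevword_features("time flies like an arrow"))
# ['w-1=<s>', 'w-1=time', 'w-1=flies', 'w-1=like', 'w-1=an']
```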

AVM Example: Time flies like an arrow

Note: this is a compact form of the true sparse vector (w-1=w is 0 or 1, for each w in the vocabulary)

      w-1     w0      w-1w0         w+1     label
x1    <s>     Time    <s> Time      flies   N
x2    Time    flies   Time flies    like    V
x3    flies   like    flies like    an      P

Example: NER

Named Entity tagging:
John visited New York last Friday
→ [person John] visited [location New York] [time last Friday]

As a classification problem:
John/PER-B visited/O New/LOC-B York/LOC-I last/TIME-B Friday/TIME-I

Input?

Categories?
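A sketch converting bracketed spans into the slide's TYPE-B / TYPE-I / O tags (the span triples are my own encoding):

```python
def spans_to_tags(tokens, spans):
    """spans: list of (start, end, TYPE) with end exclusive."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"{etype}-B"
        for i in range(start + 1, end):
            tags[i] = f"{etype}-I"
    return list(zip(tokens, tags))

tokens = ["John", "visited", "New", "York", "last", "Friday"]
spans = [(0, 1, "PER"), (2, 4, "LOC"), (4, 6, "TIME")]
print(spans_to_tags(tokens, spans))
# [('John', 'PER-B'), ('visited', 'O'), ('New', 'LOC-B'),
#  ('York', 'LOC-I'), ('last', 'TIME-B'), ('Friday', 'TIME-I')]
```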

Example: Coreference

Queen Elizabeth set about transforming her husband, King George VI, into a viable monarch. Logue, a renowned speech therapist, was summoned to help the King overcome his speech impediment...

Can be viewed as a classification problem

What are the inputs?

What are the categories?

What features would be useful?

HW#7: Viterbi!

Implement the Viterbi algorithm

Use a standard, precomputed, presmoothed model (made available after HW#6 is handed in)

Testing & Evaluation: convert the output format to enable comparison with the gold standard; compare to the gold standard and produce a score
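Since the assignment is Viterbi, here is a compact sketch of the algorithm for a bigram HMM. The dict-based model format is my assumption, not the actual Ling570 handout format, and a real implementation should work in log space to avoid underflow:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: V[t - 1][r] * trans_p[r][s])
            V[t][s] = V[t - 1][prev] * trans_p[prev][s] * emit_p[s].get(obs[t], 0.0)
            back[t][s] = prev
    path = [max(states, key=lambda s: V[-1][s])]
    for t in range(len(obs) - 1, 0, -1):   # follow backpointers
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy model (numbers invented for illustration)
states = ["N", "V"]
start = {"N": 0.6, "V": 0.4}
trans = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit = {"N": {"time": 0.6, "flies": 0.3}, "V": {"time": 0.1, "flies": 0.8}}
print(viterbi(["time", "flies"], states, start, trans, emit))   # ['N', 'V']
```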