Text Categorization With Support Vector Machines: Learning With Many Relevant Features
By Thorsten Joachims, Presented By Meghneel Gore


Page 1

Text Categorization With Support Vector Machines: Learning With Many Relevant Features

By Thorsten Joachims
Presented By Meghneel Gore

Page 2

Goal of Text Categorization

- Classify documents into a number of pre-defined categories
- Documents can be in multiple categories
- Documents can be in none of the categories

Page 3

Applications of Text Categorization

- Categorization of news stories for online retrieval
- Finding interesting information from the WWW
- Guiding a user's search through hypertext

Page 4

Representation of Text

- Removal of stop words
- Reduction of each word to its stem
- Preparation of the feature vector

Page 5

Representation of Text

[Figure: a document vector, listing word stems with their frequencies, e.g. comput: 2, process: 1, buy: 2, memory: 3, ...]

This is a document vector.
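Below is a minimal Python sketch of the representation pipeline from the last two slides: stop-word removal, crude stemming, and counting stems into a document vector. The stop list and the suffix-stripping stemmer are toy assumptions for illustration, not the components used in the paper, which relies on a standard stop-word list and a real stemmer.

```python
from collections import Counter

STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in"}  # toy stop list

def stem(word):
    """Naive suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def document_vector(text):
    tokens = [t.lower().strip(".,:;!?") for t in text.split()]
    stems = [stem(t) for t in tokens if t and t not in STOP_WORDS]
    return Counter(stems)  # stem -> frequency, i.e. the document vector

print(document_vector("Buying computers: the process of buying memory and memory."))
# Counter({'buy': 2, 'memory': 2, 'computer': 1, 'proces': 1})  (note the crude stems)
```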

Page 6

What's Next...

- Appropriateness of support vector machines for this application
- Support vector machine theory
- Conventional learning methods
- Experiments
- Results
- Conclusions

Page 7

Why SVMs?

- High-dimensional input space
- Few irrelevant features
- Sparse document vectors
- Text categorization problems are linearly separable

Page 8

Support Vector Machines

[Figure: visualization of a support vector machine]

Page 9

Support Vector Machines

Structural risk minimization: with probability at least $1 - \eta$, the true error of a hypothesis $h$ is bounded by

$$P(\mathrm{error}(h)) \le \mathrm{train\_error}(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) - \ln\frac{\eta}{4}}{n}}$$

where $n$ is the number of training examples and $d$ is the VC dimension of the hypothesis space.
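To see how this bound behaves, the sketch below evaluates it numerically. All inputs (training error, VC dimension $d$, sample size $n$, confidence parameter $\eta$) are made-up illustrative values, not numbers from the paper.

```python
import math

def risk_bound(train_error, d, n, eta):
    """train_error + sqrt((d*(ln(2n/d) + 1) - ln(eta/4)) / n), as on the slide."""
    complexity = (d * (math.log(2 * n / d) + 1) - math.log(eta / 4)) / n
    return train_error + math.sqrt(complexity)

# Hypothetical inputs: n = 9603 examples (the Reuters training set size),
# an assumed VC dimension of 50, 2% training error, confidence 1 - eta = 0.95.
print(round(risk_bound(0.02, d=50, n=9603, eta=0.05), 3))  # -> 0.211
```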

Page 10

Support Vector Machines

We define a structure of hypothesis spaces $H_i$ such that their respective VC dimensions $d_i$ increase.

Page 11

Support Vector Machines

Lemma [Vapnik, 1982]: Consider hyperplanes

$$h(\vec{d}) = \mathrm{sign}\{\vec{w} \cdot \vec{d} + b\}$$

as hypotheses.

Page 12

Support Vector Machines

Awwithbdw

,1

If all example vectors are contained in A hypersphere of radius R and it is Required that

Page 13

Support Vector Machines

then this set of hyperplanes has a VC dimension $d$ bounded by

$$d \le \min\left([R^2 A^2],\ n\right) + 1.$$

Page 14

Support Vector Machines

Minimize

$$\|\vec{w}\|$$

such that

$$y_i\,[\vec{w} \cdot \vec{d}_i + b] \ge 1 \quad \text{for all } i.$$
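For concreteness, here is a minimal scikit-learn sketch of training such a maximum-margin text classifier. This is an assumed setup for illustration, not the paper's own experimental software; LinearSVC solves the soft-margin variant of the problem above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy binary task with made-up documents: is a news story about oil?
train_docs = [
    "oil prices rise sharply",
    "wheat harvest falls",
    "crude oil exports grow",
    "corn futures drop",
]
train_labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer(stop_words="english")  # stop-word removal, as earlier
X = vectorizer.fit_transform(train_docs)            # sparse document vectors

clf = LinearSVC(C=1.0)  # C controls the soft-margin trade-off
clf.fit(X, train_labels)

print(clf.predict(vectorizer.transform(["oil output rises"])))  # expected: [1]
```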

Page 15

Conventional Learning Methods

- Naïve Bayes classifier
- Rocchio algorithm
- k-nearest neighbors
- Decision tree classifier

Page 16

Naïve Bayes Classifier

Consider a document vector with attributes $a_1, a_2, \ldots, a_n$ and target values $v_j \in V$. Bayesian approach:

$$v_{MAP} = \operatorname*{argmax}_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n)$$

Page 17

Naïve Bayes Classifier

We can rewrite that using Bayes' theorem as

$$v_{MAP} = \operatorname*{argmax}_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, a_2, \ldots, a_n)} = \operatorname*{argmax}_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j),$$

since the denominator does not depend on $v_j$.

Page 18

Naïve Bayes Classifier

The Naïve Bayes method assumes that the attributes are independent given the class:

$$v_{NB} = \operatorname*{argmax}_{v_j \in \{like,\, dislike\}} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$

$$= \operatorname*{argmax}_{v_j \in \{like,\, dislike\}} P(v_j)\, P(a_1 = \text{"Mary"} \mid v_j)\, P(a_2 = \text{"had"} \mid v_j) \cdots P(a_n = \text{"snow"} \mid v_j)$$
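A toy Python sketch of this decision rule, using made-up priors and conditional probabilities for the like/dislike example (all numbers are illustrative), computed in log space for numerical stability:

```python
import math

priors = {"like": 0.6, "dislike": 0.4}
# P(word | class); the attributes are assumed conditionally independent:
cond = {
    "like":    {"Mary": 0.05, "had": 0.10, "snow": 0.01},
    "dislike": {"Mary": 0.01, "had": 0.10, "snow": 0.03},
}

def v_nb(words):
    """argmax over classes of log P(v) + sum_i log P(a_i | v)."""
    scores = {
        v: math.log(priors[v]) + sum(math.log(cond[v][w]) for w in words)
        for v in priors
    }
    return max(scores, key=scores.get)

print(v_nb(["Mary", "had", "snow"]))  # -> 'like' with these made-up numbers
```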

Page 19

Experiments

- Datasets
- Performance measures
- Results

Page 20

Datasets

- Reuters-21578 dataset
  - 9,603 training examples
  - 3,299 test documents
- Ohsumed corpus
  - 10,000 training documents
  - 10,000 test examples

Page 21

Performance Measures

- Precision: the probability that a document predicted to be in class ‘x’ truly belongs to that class
- Recall: the probability that a document belonging to class ‘x’ is classified into that class
- Precision/recall breakeven point: the value at which precision equals recall
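A small sketch of these measures computed from raw counts; the tp/fp/fn values below are illustrative:

```python
def precision(tp, fp):
    return tp / (tp + fp)  # P(truly in class x | predicted to be in class x)

def recall(tp, fn):
    return tp / (tp + fn)  # P(classified into class x | truly in class x)

tp, fp, fn = 80, 20, 20
print(precision(tp, fp), recall(tp, fn))  # 0.8 0.8: this point is the breakeven
```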

Page 22

Results

[Table: precision/recall break-even point on the Ohsumed dataset]

Page 23

Results

[Table: precision/recall break-even point on the Reuters dataset]

Page 24

Conclusions

- Introduces SVMs for text categorization
- Theoretical and empirical evidence that SVMs are well suited for text categorization
- Consistent improvement in accuracy over other methods