Text Categorization With Support Vector Machines: Learning With Many Relevant Features
By Thorsten Joachims, Presented By Meghneel Gore


Page 1

Text Categorization With Support Vector Machines: Learning With Many Relevant Features

By Thorsten Joachims
Presented By Meghneel Gore

Page 2

Goal of Text Categorization

- Classify documents into a number of pre-defined categories
- Documents can be in multiple categories
- Documents can be in none of the categories

Page 3

Applications of Text Categorization

- Categorization of news stories for online retrieval
- Finding interesting information from the WWW
- Guiding a user's search through hypertext

Page 4

Representation of Text

- Removal of stop words
- Reduction of each word to its stem
- Preparation of the feature vector

Page 5

Representation of Text

[Figure: a document vector, listing word stems with their frequencies, e.g. comput: 2, process: 1, buy: 2, memory: 3, ...]

This is a document vector.
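Below is a minimal Python sketch of the representation pipeline from the last two slides: stop-word removal, crude stemming, and counting stems into a document vector. The stop list and the suffix-stripping stemmer are toy assumptions for illustration, not the components used in the paper, which relies on a standard stop-word list and a real stemmer.

```python
from collections import Counter

STOP_WORDS = {"a", "an", "the", "is", "to", "of", "and", "in"}  # toy stop list

def stem(word):
    """Naive suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def document_vector(text):
    tokens = [t.lower().strip(".,:;!?") for t in text.split()]
    stems = [stem(t) for t in tokens if t and t not in STOP_WORDS]
    return Counter(stems)  # stem -> frequency, i.e. the document vector

print(document_vector("Buying computers: the process of buying memory and memory."))
# Counter({'buy': 2, 'memory': 2, 'computer': 1, 'proces': 1})  (note the crude stems)
```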

Page 6

What's Next...

- Appropriateness of support vector machines for this application
- Support vector machine theory
- Conventional learning methods
- Experiments
- Results
- Conclusions

Page 7

Why SVMs?

- High-dimensional input space
- Few irrelevant features
- Sparse document vectors
- Text categorization problems are linearly separable

Page 8

Support Vector Machines

[Figure: visualization of a support vector machine]

Page 9

Support Vector Machines

Structural risk minimization: with probability at least $1 - \eta$, the true error of a hypothesis $h$ is bounded by

$$P(\mathrm{error}(h)) \le \mathrm{train\_error}(h) + \sqrt{\frac{d\left(\ln\frac{2n}{d} + 1\right) - \ln\frac{\eta}{4}}{n}}$$

where $n$ is the number of training examples and $d$ is the VC dimension of the hypothesis space.
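To see how this bound behaves, the sketch below evaluates it numerically. All inputs (training error, VC dimension $d$, sample size $n$, confidence parameter $\eta$) are made-up illustrative values, not numbers from the paper.

```python
import math

def risk_bound(train_error, d, n, eta):
    """train_error + sqrt((d*(ln(2n/d) + 1) - ln(eta/4)) / n), as on the slide."""
    complexity = (d * (math.log(2 * n / d) + 1) - math.log(eta / 4)) / n
    return train_error + math.sqrt(complexity)

# Hypothetical inputs: n = 9603 examples (the Reuters training set size),
# an assumed VC dimension of 50, 2% training error, confidence 1 - eta = 0.95.
print(round(risk_bound(0.02, d=50, n=9603, eta=0.05), 3))  # -> 0.211
```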

Page 10

Support Vector Machines

We define a structure of hypothesis spaces $H_i$ such that their respective VC dimensions $d_i$ increase.

Page 11

Support Vector Machines

Lemma [Vapnik, 1982]: Consider hyperplanes

$$h(\vec{d}) = \mathrm{sign}\{\vec{w} \cdot \vec{d} + b\}$$

as hypotheses.

Page 12

Support Vector Machines

Awwithbdw

,1

If all example vectors are contained in A hypersphere of radius R and it is Required that

Page 13

Support Vector Machines

then this set of hyperplanes has a VC dimension $d$ bounded by

$$d \le \min\left([R^2 A^2],\ n\right) + 1.$$

Page 14

Support Vector Machines

Minimize

$$\|\vec{w}\|$$

such that

$$y_i\,[\vec{w} \cdot \vec{d}_i + b] \ge 1 \quad \text{for all } i.$$
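For concreteness, here is a minimal scikit-learn sketch of training such a maximum-margin text classifier. This is an assumed setup for illustration, not the paper's own experimental software; LinearSVC solves the soft-margin variant of the problem above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy binary task with made-up documents: is a news story about oil?
train_docs = [
    "oil prices rise sharply",
    "wheat harvest falls",
    "crude oil exports grow",
    "corn futures drop",
]
train_labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer(stop_words="english")  # stop-word removal, as earlier
X = vectorizer.fit_transform(train_docs)            # sparse document vectors

clf = LinearSVC(C=1.0)  # C controls the soft-margin trade-off
clf.fit(X, train_labels)

print(clf.predict(vectorizer.transform(["oil output rises"])))  # expected: [1]
```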

Page 15

Conventional Learning Methods

- Naïve Bayes classifier
- Rocchio algorithm
- k-nearest neighbors
- Decision tree classifier

Page 16

Naïve Bayes Classifier

Consider a document vector with attributes $a_1, a_2, \ldots, a_n$ and target values $v_j \in V$. Bayesian approach:

$$v_{MAP} = \operatorname*{argmax}_{v_j \in V} P(v_j \mid a_1, a_2, \ldots, a_n)$$

Page 17

Naïve Bayes Classifier

We can rewrite that using Bayes' theorem as

$$v_{MAP} = \operatorname*{argmax}_{v_j \in V} \frac{P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, a_2, \ldots, a_n)} = \operatorname*{argmax}_{v_j \in V} P(a_1, a_2, \ldots, a_n \mid v_j)\, P(v_j),$$

since the denominator does not depend on $v_j$.

Page 18

Naïve Bayes Classifier

The Naïve Bayes method assumes that the attributes are independent given the class:

$$v_{NB} = \operatorname*{argmax}_{v_j \in \{like,\, dislike\}} P(v_j) \prod_{i=1}^{n} P(a_i \mid v_j)$$

$$= \operatorname*{argmax}_{v_j \in \{like,\, dislike\}} P(v_j)\, P(a_1 = \text{"Mary"} \mid v_j)\, P(a_2 = \text{"had"} \mid v_j) \cdots P(a_n = \text{"snow"} \mid v_j)$$
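A toy Python sketch of this decision rule, using made-up priors and conditional probabilities for the like/dislike example (all numbers are illustrative), computed in log space for numerical stability:

```python
import math

priors = {"like": 0.6, "dislike": 0.4}
# P(word | class); the attributes are assumed conditionally independent:
cond = {
    "like":    {"Mary": 0.05, "had": 0.10, "snow": 0.01},
    "dislike": {"Mary": 0.01, "had": 0.10, "snow": 0.03},
}

def v_nb(words):
    """argmax over classes of log P(v) + sum_i log P(a_i | v)."""
    scores = {
        v: math.log(priors[v]) + sum(math.log(cond[v][w]) for w in words)
        for v in priors
    }
    return max(scores, key=scores.get)

print(v_nb(["Mary", "had", "snow"]))  # -> 'like' with these made-up numbers
```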

Page 19

Experiments

- Datasets
- Performance measures
- Results

Page 20

Datasets

- Reuters-21578 dataset
  - 9,603 training examples
  - 3,299 test documents
- Ohsumed corpus
  - 10,000 training documents
  - 10,000 test examples

Page 21

Performance Measures

- Precision: the probability that a document predicted to be in class ‘x’ truly belongs to that class
- Recall: the probability that a document belonging to class ‘x’ is classified into that class
- Precision/recall breakeven point: the value at which precision equals recall
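A small sketch of these measures computed from raw counts; the tp/fp/fn values below are illustrative:

```python
def precision(tp, fp):
    return tp / (tp + fp)  # P(truly in class x | predicted to be in class x)

def recall(tp, fn):
    return tp / (tp + fn)  # P(classified into class x | truly in class x)

tp, fp, fn = 80, 20, 20
print(precision(tp, fp), recall(tp, fn))  # 0.8 0.8: this point is the breakeven
```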

Page 22

Results

[Table: precision/recall break-even point on the Ohsumed dataset]

Page 23

Results

[Table: precision/recall break-even point on the Reuters dataset]

Page 24

Conclusions

- Introduces SVMs for text categorization
- Theoretical and empirical evidence that SVMs are well suited for text categorization
- Consistent improvement in accuracy over other methods