Multi-class SVM with Negative Data Selection for Web Page Classification
Chih-Ming Chen, Hahn-Ming Lee and Ming-Tyan Kao
International Joint Conference on Neural Networks 2004
Motivation
• Several new websites are launched every day
• Users need to search them quickly and efficiently
• Search engines organize websites under a topic hierarchy (taxonomy)
• This requires a classifier: one-against-all SVM
• Catch: the huge amount of negative data increases training time
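Why the negative set grows so large can be seen from how one-against-all training data is built: each category gets its own binary SVM, and every document outside that category becomes a negative example. The sketch below is a minimal illustration assuming single-label documents; `one_against_all_labels`, `split_sizes` and the toy corpus are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Tuple


def one_against_all_labels(labels: List[str], target: str) -> List[int]:
    """Binary labels for the SVM that separates `target` from every other class:
    +1 for documents of `target`, -1 for all remaining documents."""
    return [1 if y == target else -1 for y in labels]


def split_sizes(labels: List[str]) -> Dict[str, Tuple[int, int]]:
    """(positives, negatives) of the binary training set built for each class."""
    total = len(labels)
    return {c: (n, total - n) for c, n in Counter(labels).items()}


if __name__ == "__main__":
    # Hypothetical toy corpus: 10 single-label documents over 3 topics.
    toy = ["crude"] * 3 + ["trade"] * 2 + ["acq"] * 5
    print(one_against_all_labels(toy, "trade"))
    print(split_sizes(toy))  # {'crude': (3, 7), 'trade': (2, 8), 'acq': (5, 5)}
```

Every binary problem sees the whole corpus, so the negatives for a small category approach the full training set size as the number of categories grows.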
Negative Data Selection
Support vectors in the negative data are much more similar to the positive data than the other negative data are
Negative Data Selection
1. Feature Selection: top n keywords from the positive data
2. All websites are represented as vectors of these top n keywords.
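3. Cosine Similarity between a positive document D_p and a negative document D_k:
   $$Sim(D_p, D_k) = \frac{\sum_{j=1}^{n} p_j k_j}{\sqrt{\sum_{j=1}^{n} p_j^2}\,\sqrt{\sum_{j=1}^{n} k_j^2}}$$
   where p_j and k_j are the weights of the j-th keyword in D_p and D_k.
   Document frequency of keyword term_k in category cat_i, where n(·) counts documents:
   $$DF(term_k) = \frac{n(doc_{all}, term_k, cat_i)}{n(doc_{all}, cat_i)}$$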
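Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes documents are already tokenized into word lists, ranks keywords by DF within the positive category, and uses raw keyword counts as vector weights (the slides do not state a weighting scheme). The names `top_keywords`, `to_vector` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def top_keywords(positive_docs: List[List[str]], n: int) -> List[str]:
    """Step 1 (sketch): keep the n terms with the highest document frequency
    in the positive category, DF = #positive docs containing term / #positive docs."""
    doc_freq = Counter()
    for doc in positive_docs:
        doc_freq.update(set(doc))  # count a term at most once per document
    num_docs = len(positive_docs)
    # Highest DF first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(doc_freq, key=lambda term: (-doc_freq[term] / num_docs, term))
    return ranked[:n]


def to_vector(doc: List[str], keywords: Sequence[str]) -> List[float]:
    """Step 2 (sketch): one coordinate per keyword. Raw term counts are an
    assumed weighting; the slides do not specify one."""
    counts = Counter(doc)
    return [float(counts[k]) for k in keywords]


def cosine_similarity(p: Sequence[float], k: Sequence[float]) -> float:
    """Step 3: Sim(D_p, D_k) = sum_j p_j * k_j / (||p|| * ||k||).
    Returns 0.0 when either vector contains none of the keywords."""
    dot = sum(a * b for a, b in zip(p, k))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in k))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    # Hypothetical tokenized documents.
    positives = [["oil", "price", "opec"], ["oil", "barrel"], ["oil", "opec", "crude"]]
    negative = ["oil", "stock", "oil"]
    keywords = top_keywords(positives, 2)  # ['oil', 'opec']
    print(cosine_similarity(to_vector(positives[0], keywords), to_vector(negative, keywords)))
```

Here "oil" appears in all three positive documents and "opec" in two, so they are the top two keywords; the negative document maps to [2.0, 0.0] and the first positive document to [1.0, 1.0], giving a similarity of 1/√2 ≈ 0.707.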
Negative Data Selection
• Plot the similarity scores of the negative documents to the positive documents in descending order, with the negative documents along the x-axis
[Figure: similarity scores in descending order (y-axis) vs. negative documents (x-axis), with the convergence point marked]
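The selection step can be sketched as follows, under two assumptions that the slides do not confirm: each negative document is scored against the centroid of the positive keyword vectors, and the convergence point is taken to be the first rank whose score lies within a tolerance of the lowest score, i.e. where the descending curve flattens out. `convergence_index`, `select_negatives` and the `tol` parameter are hypothetical; the slides only mark the point on the plot. The sketch keeps the negatives ranked before that point, in line with the observation that support vectors come from the negatives most similar to the positive data.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_negatives(negatives: List[Vector], positives: List[Vector]) -> List[Tuple[int, float]]:
    """Score each negative against the positive centroid (an assumed aggregation)
    and return (index, score) pairs sorted from most to least similar."""
    center = [sum(col) / len(positives) for col in zip(*positives)]
    scored = [(i, cosine(v, center)) for i, v in enumerate(negatives)]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def convergence_index(scores: Sequence[float], tol: float = 0.05) -> int:
    """Hypothetical rule: first rank whose score is within `tol` of the lowest
    score on the descending curve."""
    if not scores:
        return 0
    floor = scores[-1]
    for i, s in enumerate(scores):
        if s - floor <= tol:
            return i
    return len(scores)


def select_negatives(negatives: List[Vector], positives: List[Vector], tol: float = 0.05) -> List[int]:
    """Indices of the negative documents ranked before the convergence point."""
    ranked = rank_negatives(negatives, positives)
    cut = convergence_index([s for _, s in ranked], tol)
    return [i for i, _ in ranked[:cut]]


if __name__ == "__main__":
    # Hypothetical 2-keyword vectors.
    pos = [[1.0, 1.0], [1.0, 0.0]]
    neg = [[2.0, 1.0], [0.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
    print(select_negatives(neg, pos))  # [0, 3]
```

With the positive centroid at [1.0, 0.5], negatives 0 and 3 score about 1.0 and 0.95 while negatives 1 and 2 both sit on the floor of about 0.45, so only the first two are kept.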
Experiments
• Reuters dataset (10802 training, 565 test)
Class      Positive training docs   Negative training docs
Crude      580                      10222
Trade      475                      10327
Dlr        162                      10640
Nat-gas    92                       10710
Acq        2357                     8445
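The table's arithmetic can be checked directly: for every class, positives plus negatives equal the 10802 training documents, and the negative-to-positive ratio shows how skewed each one-against-all problem is. This small sketch only uses the numbers in the table; `imbalance_ratios` is a hypothetical helper.

```python
from typing import Dict, Tuple

# (positive, negative) training documents per class, copied from the table above.
REUTERS_TRAIN: Dict[str, Tuple[int, int]] = {
    "Crude": (580, 10222),
    "Trade": (475, 10327),
    "Dlr": (162, 10640),
    "Nat-gas": (92, 10710),
    "Acq": (2357, 8445),
}
TOTAL_TRAIN = 10802


def imbalance_ratios(counts: Dict[str, Tuple[int, int]], total: int) -> Dict[str, float]:
    """Negatives per positive for each binary problem; raises if a row does not
    account for the whole training set."""
    ratios = {}
    for name, (pos, neg) in counts.items():
        if pos + neg != total:
            raise ValueError(f"{name}: {pos} + {neg} != {total}")
        ratios[name] = neg / pos
    return ratios


if __name__ == "__main__":
    for name, ratio in imbalance_ratios(REUTERS_TRAIN, TOTAL_TRAIN).items():
        print(f"{name:8s} {ratio:6.1f} negatives per positive")
```

The ratios range from about 3.6 negatives per positive for Acq to about 116.4 for Nat-gas.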