PEBL: Web Page Classification without Negative Examples
Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang
IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004
Presented by Chirayu Wongchokprasitti
Introduction
Web page classification is one of the main techniques for Web mining
Constructing a classifier requires positive and negative training examples
Collecting negative training examples is laborious and must be done carefully to avoid bias
Typical Learning Framework
Positive Example Based Learning (PEBL) Framework
Learn from positive data and unlabeled data
Unlabeled data are random samples of the universal set
Apply the Mapping-Convergence (M-C) Algorithm
Mapping-Convergence (M-C) Algorithm
The algorithm is divided into two stages
Mapping stage: use any classifier that does not generate false negatives
The authors chose 1-DNF (monotone Disjunctive Normal Form)
Convergence stage: use a classifier that maximizes the margin
The authors chose SVM (Support Vector Machine)
Mapping Stage
Use a weak classifier to draw an initial approximation of “strong” negative data.
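First, identify strong positive features by comparing how often each feature occurs in the positive data and in the unlabeled data.
If a feature's frequency in the positive data is higher than its frequency in the universal (unlabeled) data, it is a strong positive feature.
Then filter out every unlabeled example that could possibly be positive, that is, any example containing a strong positive feature, leaving only strong negatives.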
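The mapping stage above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each page is represented as a set of features (for example, words), and the function names `strong_positive_features` and `map_strong_negatives` are hypothetical.

```python
def strong_positive_features(positive_docs, unlabeled_docs):
    """Features that occur relatively more often in the positive data
    than in the unlabeled (universal) sample."""
    features = set().union(*positive_docs, *unlabeled_docs)
    strong = set()
    for f in features:
        # Relative document frequency of feature f in each collection.
        freq_pos = sum(f in d for d in positive_docs) / len(positive_docs)
        freq_unl = sum(f in d for d in unlabeled_docs) / len(unlabeled_docs)
        if freq_pos > freq_unl:
            strong.add(f)
    return strong


def map_strong_negatives(positive_docs, unlabeled_docs):
    """Split the unlabeled data into strong negatives (no strong positive
    feature at all) and the remaining, still undecided examples."""
    strong = strong_positive_features(positive_docs, unlabeled_docs)
    negatives = [d for d in unlabeled_docs if not (d & strong)]
    remaining = [d for d in unlabeled_docs if d & strong]
    return strong, negatives, remaining


# Toy data (made up for illustration): pages as sets of words.
positive = [{"professor", "course", "research"},
            {"professor", "publication"},
            {"course", "syllabus", "professor"}]
unlabeled = [{"professor", "course"}, {"shop", "price"},
             {"news", "sports"}, {"price", "research"}]

strong, negatives, remaining = map_strong_negatives(positive, unlabeled)
print(sorted(strong))
print(negatives)
```

Because any example with even one strong positive feature is held back, the filter is deliberately conservative: it may keep many true negatives in the undecided pool, but it is unlikely to label a true positive as negative.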
Convergence Stage
Use SVM to narrow down the class boundary
Iterate SVM a number of times to extract negative data from the unlabeled data
The boundary will converge to the true boundary.
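The iteration in the convergence stage can be sketched as the loop below. This is only an illustration of the control flow under stated assumptions: the learner is pluggable, and the stand-in used here is a nearest-centroid classifier, not an SVM, so it does not maximize the margin and carries none of the paper's convergence guarantees. The names `converge` and `train_nearest_centroid` are hypothetical.

```python
def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(pos, neg):
    """Stand-in learner (NOT an SVM): a point is positive if it is at
    least as close to the positive centroid as to the negative one."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, c_pos) <= sq_dist(x, c_neg)


def converge(positives, strong_negatives, remaining,
             train=train_nearest_centroid, max_iter=50):
    """Repeatedly retrain, move newly rejected examples into the negative
    set, and stop once no undecided example is classified as negative."""
    negatives = list(strong_negatives)
    undecided = list(remaining)
    classify = train(positives, negatives)
    for _ in range(max_iter):
        new_neg = [x for x in undecided if not classify(x)]
        if not new_neg:
            break  # boundary no longer moves
        undecided = [x for x in undecided if classify(x)]
        negatives.extend(new_neg)
        classify = train(positives, negatives)
    return classify, negatives, undecided


# Toy 1-D data (made up): positives near 0, one strong negative at 10.
positives = [(0.0,), (1.0,)]
strong_negatives = [(10.0,)]
remaining = [(0.5,), (3.0,), (4.5,), (6.0,), (8.0,)]

classify, negatives, undecided = converge(positives, strong_negatives, remaining)
print(negatives)
print(undecided)
```

In this toy run the boundary moves toward the positive data on each pass as more negatives are absorbed. To follow the slides more closely, `train` could be replaced by a function that fits an SVM and returns its prediction function.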
Support Vector Machines
Visualization of a Support Vector Machine
Convergence of SVM
Data Flow Diagram
Experimental Results
Results are reported as the precision-recall breakeven point (P-R)
Experiment 1: the Internet, using DMOZ as the universal set
Experiment 2: university CS departments, using the WebKB data set
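For readers unfamiliar with the precision-recall breakeven point, one common way to compute it is sketched below. The slides do not say exactly how the paper computed P-R, so this is an assumption: examples are ranked by classifier score, and the top k are predicted positive, where k is the number of true positives in the data. At that cutoff precision and recall are equal by construction. The function name `pr_breakeven` is hypothetical.

```python
def pr_breakeven(scores, labels):
    """Precision (= recall) when exactly as many examples are predicted
    positive as there are true positives. Labels are 1 (positive) or 0."""
    n_pos = sum(labels)
    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    # True positives among the n_pos highest-scoring examples.
    tp = sum(label for _, label in ranked[:n_pos])
    return tp / n_pos


# Toy scores and labels (made up for illustration).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(pr_breakeven(scores, labels))
```

With three true positives and two of them ranked in the top three, both precision and recall equal 2/3 at the cutoff.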
Mixture Models
Experiment 1
Experiment 2
Mixture Models
Summary and Conclusions
PEBL framework eliminates the need for manually collecting negative training examples
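The Mapping-Convergence (M-C) algorithm, using only positive and unlabeled data, achieves classification accuracy as high as that of a traditional SVM trained with positive and negative examples
PEBL still needs a faster training procedure, since the M-C algorithm retrains the SVM at every iteration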