PEBL: Web Page Classification without Negative Examples. Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang. IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004. Presented by Chirayu Wongchokprasitti.

PEBL: Web Page Classification without Negative Examples

Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang

IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004

Presented by Chirayu Wongchokprasitti

Introduction

Web page classification is one of the main techniques for Web mining

Constructing a classifier requires positive and negative training examples

Collecting negative training examples is laborious and must be done carefully to avoid bias

Typical Learning Framework

Positive Example Based Learning (PEBL) Framework

Learn from positive data and unlabeled data

Unlabeled data are random samples of the universal set

Apply the Mapping-Convergence (M-C) Algorithm

Mapping-Convergence (M-C) Algorithm

Divided into two stages

Mapping stage: use any classifier that does not generate false negatives; the authors chose 1-DNF (monotone Disjunctive Normal Form)

Convergence stage: maximize the margin; the authors chose SVM (Support Vector Machine)

Mapping Stage

Use a weak classifier to draw an initial approximation of “strong” negative data.

First, identify strong positive features by comparing how frequently each feature occurs in the positive data and in the unlabeled data.

A feature is a strong positive feature if it occurs more frequently in the positive data than in the unlabeled (universal) data.

Then filter out every possible positive from the unlabeled data, leaving only strong negatives.
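The mapping-stage steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each page is represented as a set of features (for example, words), and the function names and toy data are hypothetical.

```python
def strong_positive_features(positive_docs, unlabeled_docs):
    """Return features whose document frequency is higher in P than in U."""
    def doc_freq(docs):
        # Fraction of documents in which each feature appears.
        counts = {}
        for doc in docs:
            for feature in set(doc):
                counts[feature] = counts.get(feature, 0) + 1
        return {f: c / len(docs) for f, c in counts.items()}

    pos_freq = doc_freq(positive_docs)
    unl_freq = doc_freq(unlabeled_docs)
    return {f for f, freq in pos_freq.items() if freq > unl_freq.get(f, 0.0)}


def strong_negatives(unlabeled_docs, strong_features):
    """Keep only unlabeled documents that contain no strong positive feature."""
    return [doc for doc in unlabeled_docs if not strong_features & set(doc)]


# Toy data (hypothetical): feature sets for positive and unlabeled pages.
positive = [{"professor", "research", "course"}, {"professor", "teaching", "home"}]
unlabeled = [
    {"professor", "course", "home"},
    {"shop", "cart", "home"},
    {"news", "sports", "home"},
    {"research", "lab", "news"},
]

features = strong_positive_features(positive, unlabeled)
negatives = strong_negatives(unlabeled, features)
print(sorted(features))
print(negatives)
```

In this toy data "home" is not a strong positive feature because it is more frequent in the unlabeled set, so the two pages that contain only "home" plus unrelated words survive as strong negatives.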

Convergence Stage

Use SVM to narrow down the class boundary

Iterate SVM repeatedly to extract more negative data from the unlabeled data

The boundary converges to the true boundary.
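The iteration described above can be sketched as a loop that retrains a classifier on the positives and the negatives accumulated so far, then moves newly rejected unlabeled points into the negative set until none are left. This is only a sketch of the loop structure: to stay dependency-free it uses a nearest-centroid classifier as a stand-in for the SVM the authors chose, and all function names, the `max_iters` cap, and the toy points are assumptions made for illustration.

```python
def centroid(points):
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_nearest_centroid(positives, negatives):
    """Stand-in for SVM training (assumption): returns predict(x) -> True if positive."""
    pos_c, neg_c = centroid(positives), centroid(negatives)
    return lambda x: sq_dist(x, pos_c) <= sq_dist(x, neg_c)


def convergence_stage(positives, strong_negatives, remaining,
                      train=train_nearest_centroid, max_iters=100):
    """Iteratively grow the negative set from the remaining unlabeled points."""
    negatives = list(strong_negatives)
    remaining = list(remaining)
    predict = train(positives, negatives)
    for _ in range(max_iters):
        new_negatives = [x for x in remaining if not predict(x)]
        if not new_negatives:
            break  # no unlabeled point was rejected: the boundary has stopped moving
        negatives.extend(new_negatives)
        remaining = [x for x in remaining if predict(x)]
        predict = train(positives, negatives)  # retrain with the enlarged negative set
    return predict, negatives


# Toy 2-D data (hypothetical).
positives = [(0, 0), (1, 0), (0, 1)]
strong_negs = [(10, 10)]                      # output of the mapping stage
rest = [(0.5, 0.5), (5, 5), (6, 6), (9, 9)]   # unlabeled points not yet classified

predict, negatives = convergence_stage(positives, strong_negs, rest)
print(negatives)
print(predict((1, 1)), predict((7, 7)))
```

Passing a different `train` function, for example one that wraps an SVM library, swaps in another classifier without changing the loop.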

Support Vector Machines

Visualization of a Support Vector Machine

Convergence of SVM

Data Flow Diagram

Experimental Results

Results are reported as the precision-recall breakeven point (P-R)

Experiment 1: the Internet, using DMOZ as the universal set

Experiment 2: University CS department, using the WebKB data set

Mixture Models
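For readers unfamiliar with the precision-recall breakeven point used to report the results, the sketch below shows one common way to compute it: rank the test pages by classifier score and cut the ranking at the number of true positives, where precision and recall are equal by construction. Whether the paper computes it exactly this way is an assumption; the function name and toy scores are hypothetical.

```python
def pr_breakeven(scores, labels):
    """Precision (= recall) when the top-k ranked items are predicted positive,
    with k equal to the number of true positives."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank items from highest to lowest classifier score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos


# Toy scores (hypothetical): label 1 marks a truly positive page.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

value = pr_breakeven(scores, labels)
print(round(value, 3))
```

Here two of the three top-ranked pages are true positives, so precision and recall both equal 2/3 at the breakeven cutoff.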

Experiment 1

Experiment 2

Mixture Models

Summary and Conclusions

The PEBL framework eliminates the need to collect negative training examples manually

The Mapping-Convergence (M-C) algorithm achieves classification accuracy as high as that of traditional SVM

PEBL's training time still needs to be reduced