PEBL: Web Page Classification without Negative Examples


PEBL: Web Page Classification without Negative Examples

Hwanjo Yu, Jiawei Han, Kevin Chen-Chuan Chang

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, JAN 2004

Outline

Problem statement
Motivation
Related work
Main contribution
Technical details
Experiments
Summary

Problem Statement

Classify web pages into “user-interesting” classes, e.g. a “Home-Page” classifier or a “Call for Papers” classifier.

Negative samples are not given explicitly.

Only positive and unlabeled samples are available.

Motivation

Collecting negative samples may be delicate and arduous:

Negative samples must uniformly represent the universal set.

Manually collected negative training examples could be biased.

Predefined classes usually do not match users’ diverse and changing search targets.

Challenges

Collecting unbiased unlabeled data from the universal set: random sampling of web pages on the Internet.

Achieving classification accuracy from positive and unlabeled data as high as from labeled data: the PEBL framework (Mapping-Convergence algorithm using SVM).

Related Work

Semisupervised learning

Requires a sample of labeled (+/-) and unlabeled data: EM algorithm, Transductive SVM.

Single-class learning or classification

Rule-based (k-DNF): not tolerant of sparse, high-dimensional data; requires knowledge of the proportion of positive instances in the universal set.

Probability-based: requires prior probabilities for each class; assumes linear separability.

OSVM, neural networks.

Main Contribution

Collecting only positive samples speeds up the process of building classifiers.

The universal set of unlabeled samples can be reused for training different classifiers; this supports example-based querying on the Internet.

PEBL achieves accuracy as high as that of a typical framework without loss of efficiency in testing.

SVM Overview

Mapping-Convergence Algorithm

Mapping stage: a weak classifier (Φ1) draws an initial approximation of “strong” negative data. Φ1 must not generate false negatives.

Convergence stage: runs iteratively, using a second base classifier (Φ2) that maximizes the margin to make progressively better approximations of the negative data. Φ2 must maximize the margin.
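The convergence stage above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: a midpoint rule on 1-D points stands in for the margin-maximizing SVM Φ2, and all names and data are hypothetical.

```python
def midpoint_classifier(pos, neg):
    # Toy stand-in for the margin-maximizing classifier Phi_2:
    # on 1-D data the max-margin boundary is the midpoint between
    # the innermost positive and negative points (assumes neg < pos).
    boundary = (min(pos) + max(neg)) / 2.0
    return lambda x: x >= boundary  # True -> classified positive

def mapping_convergence(pos, unlabeled, strong_neg):
    # Convergence stage: repeatedly retrain on the accumulated
    # negatives, move newly rejected unlabeled points into the
    # negative set, and stop when no new negatives appear.
    negatives = list(strong_neg)
    candidates = list(unlabeled)
    while True:
        clf = midpoint_classifier(pos, negatives)
        new_negs = [x for x in candidates if not clf(x)]
        if not new_negs:
            return clf, negatives
        negatives += new_negs
        candidates = [x for x in candidates if clf(x)]

clf, negs = mapping_convergence([8, 9, 10], [3, 4, 5, 6, 7, 8.5], [1, 2])
# The boundary is pushed toward the positive class one step per
# iteration, so the final negative set grows well beyond the
# initial "strong" negatives [1, 2].
```

Because each iteration can only add negatives (never remove them), the boundary moves monotonically toward the positive class, which is why the loop terminates.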

Mapping Stage

Comparing the frequency of features within the positive and unlabeled samples gives a list of positive features.

Filter out all unlabeled samples containing positive features, leaving behind only the “strong” negative samples.
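A minimal sketch of this filtering step, with documents as lists of tokens. The frequency comparison and the `ratio` threshold are illustrative assumptions, not the paper's exact criterion.

```python
from collections import Counter

def strong_negatives(pos_docs, unl_docs, ratio=1.0):
    # Document frequency of each feature within a document set.
    def doc_freq(docs):
        counts = Counter()
        for d in docs:
            counts.update(set(d))
        return {f: counts[f] / len(docs) for f in counts}

    fp, fu = doc_freq(pos_docs), doc_freq(unl_docs)
    # Assumed criterion: a feature is "positive" if it occurs
    # relatively more often in P than in U.
    pos_feats = {f for f in fp if fp[f] > ratio * fu.get(f, 0.0)}
    # Unlabeled docs with no positive feature are kept as
    # "strong" negatives.
    return [d for d in unl_docs if not (set(d) & pos_feats)]

pos = [["resume", "skills"], ["resume", "education"]]
unl = [["resume", "jobs"], ["sports", "news"], ["weather", "news"]]
# The first unlabeled doc contains the positive feature "resume",
# so only the last two survive as strong negatives.
```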


Experiments

LIBSVM is used for the SVM implementation, with Gaussian kernels for better text-categorization accuracy.

Experiment 1: The Internet

2388 pages from DMOZ as the unlabeled dataset.
368 personal homepages, 449 non-homepages.
192 college admission pages, 450 non-admission pages.
188 resume pages, 533 non-resume pages.

Experiment 2: University CS Department

4093 pages from WebKB as the unlabeled dataset.
1641 student pages, 662 non-student pages.
504 project pages, 753 non-project pages.
1124 faculty pages, 729 non-faculty pages.

Precision-Recall (P-R) breakeven point is used as the performance measure.
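The P-R breakeven point can be computed with a standard ranking argument: precision equals recall exactly when the number of predicted positives equals the number of true positives, so it suffices to rank by classifier score and inspect the top-|P| slice. A small sketch (hypothetical scores and labels):

```python
def pr_breakeven(scores, labels):
    # Rank items by score; at the cutoff where exactly n_pos items
    # are predicted positive, precision == recall == tp / n_pos.
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = sum(lab for _, lab in ranked[:n_pos])
    return tp / n_pos

# Two positives; one of the top-2 scores belongs to a positive,
# so precision = recall = 0.5 at the breakeven cutoff.
pr_breakeven([0.9, 0.8, 0.7, 0.3], [1, 0, 1, 0])
```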

Compared against:

TSVM: traditional SVM
OSVM: one-class SVM
UN: treating unlabeled data as negative instances


Summary

Classifying web pages of a user-interesting class traditionally requires laborious preprocessing, such as collecting negative samples.

PEBL framework eliminates the need for negative training samples.

M-C algorithm achieves accuracy as high as traditional SVM.

Training time incurs only an additional multiplicative logarithmic factor on top of SVM training.
