Online Learning by Projecting: From Theory to Large Scale Web-spam filtering. Yoram Singer. Based on joint work with: Koby Crammer (UPenn), Ofer Dekel (Google/HUJI), Vineet Gupta (Google), Joseph Keshet (HUJI), Andrew Ng (Stanford), Shai Shalev-Shwartz (HUJI). UT Austin AIML Seminar, Jan. 27, 2005


Page 1: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Yoram Singer

Based on joint work with: Koby Crammer (UPenn), Ofer Dekel (Google/HUJI), Vineet Gupta (Google), Joseph Keshet (HUJI), Andrew Ng (Stanford), Shai Shalev-Shwartz (HUJI)

UT Austin AIML Seminar, Jan. 27, 2005

Page 2: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Online Binary Classification

No animal eats bees

Pearls melt in vinegar

Dr. Seuss finished Dartmouth

There are weapons of mass destruction in Iraq

True

False

True

Page 3: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Binary Classification

• Instances (documents, signals):

• Labels (true/false, good/bad):

• Classification and Prediction:

• Mistakes and losses:

Page 4: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Online Binary Classification

• Initialize your classifier ( )

• For t = 1,2,3,…,T,…

• Receive an instance:

• Predict label:

• Receive true label: [suffer “loss”/error]

• Update classifier ( )

Goal: suffer small losses while learning
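The protocol on this slide can be sketched as a short loop. This is a minimal illustration, not the speaker's code: the names `run_online`, `predict`, and `perceptron_update` are hypothetical, and the classic Perceptron rule is swapped in only as a placeholder for the "update classifier" step.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def predict(w, x):
    """Predict a label in {-1, +1}; ties are broken toward +1 (an assumption)."""
    return 1 if dot(w, x) >= 0 else -1


def perceptron_update(w, x, y):
    """Placeholder update rule (classic Perceptron), not the PA update."""
    if y * dot(w, x) <= 0:
        return [wi + y * xi for wi, xi in zip(w, x)]
    return list(w)


def run_online(examples, update, dim):
    """Run the online protocol and count prediction mistakes."""
    w = [0.0] * dim                  # initialize the classifier
    mistakes = 0
    for x, y in examples:            # for t = 1, 2, 3, ...
        y_hat = predict(w, x)        # receive an instance, predict its label
        if y_hat != y:               # receive the true label, suffer a loss
            mistakes += 1
        w = update(w, x, y)          # update the classifier
    return w, mistakes
```

Any update rule with the same signature can be passed in place of `perceptron_update`.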

Page 5: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Why Online?

• Adaptive

• Simple to implement

• Fast, small memory footprint

• Can be converted to batch learning (O2B)

• Formal guarantees

• But: might not be as effective as well-designed batch learning algorithms

Page 6: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Linear Classifiers & Margins

• The prediction is formed as follows:

• The margin of an example w.r.t

Positive Margin

Negative Margin
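The formulas on this slide did not survive extraction. Assuming the standard linear setting, where the prediction is the sign of w·x and the margin of an example (x, y) is y(w·x), a small sketch looks as follows. The function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def predict(w, x):
    """Linear prediction: the sign of w.x (ties go to +1 by assumption)."""
    return 1 if dot(w, x) >= 0 else -1


def margin(w, x, y):
    """Signed margin y * (w.x): positive iff the example is classified correctly."""
    return y * dot(w, x)
```

A positive margin corresponds to a correct prediction and a negative margin to a mistake, which matches the two labels on the slide.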

Page 7: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Separability Assumption

Page 8: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Classifier Update - Passive Mode

Page 9: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Prediction & Margin Errors

Page 10: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Hinge Loss
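Only the slide title is recoverable here. Assuming the usual definition with a unit margin requirement, the hinge loss is zero when the margin is at least 1 and grows linearly otherwise. The function name is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(w, x, y):
    """Hinge loss max(0, 1 - y * (w.x)) of the classifier w on the example (x, y)."""
    return max(0.0, 1.0 - y * dot(w, x))
```

Note that a correctly classified example can still suffer a loss when its margin is below 1, which is the distinction between prediction errors and margin errors.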

Page 11: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Version Space

In case of a prediction mistake then must reside

Page 12: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Mistake Aggressive Mode

is projected onto the feasible (dual) space

Page 13: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Passive-Aggressive Update
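The update formula itself is missing from the extracted slide. In the published Passive-Aggressive papers, projecting w onto the half-space of classifiers with zero hinge loss on the current example has a closed form with step size τ = loss / ‖x‖². A minimal sketch under that assumption, with a hypothetical function name:

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def pa_update(w, x, y):
    """One Passive-Aggressive step for binary classification.

    Passive: if the hinge loss is zero, w is returned unchanged.
    Aggressive: otherwise w is projected onto {v : y * (v.x) >= 1}.
    """
    loss = max(0.0, 1.0 - y * dot(w, x))
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)
    tau = loss / sq_norm
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

After an aggressive step the margin on the current example is exactly 1, so the new classifier sits on the boundary of the feasible set.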

Page 14: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Three Decision Problems: A Unified View

Classification Regression Uniclass

Page 15: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

The Generalized PA Algorithm

• Each example induces a set of consistent hypotheses (half-space, hyper-slab, ball)

• The new vector is set to be the projection of onto set of consistent hyp.

Classification Regression Uniclass
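The same projection idea can be sketched for the other two problems. This follows the published PA formulation as an assumption, since the slide's formulas are lost: regression uses an ε-insensitive loss (the consistent set is a hyper-slab) and uniclass uses the distance to the target beyond ε (the consistent set is a ball). Function and parameter names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def pa_regression_update(w, x, y, eps):
    """Project w onto the hyper-slab {v : |v.x - y| <= eps}."""
    error = dot(w, x) - y
    loss = max(0.0, abs(error) - eps)
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)
    tau = loss / sq_norm
    direction = 1.0 if error < 0 else -1.0   # move the prediction toward y
    return [wi + direction * tau * xi for wi, xi in zip(w, x)]


def pa_uniclass_update(w, y, eps):
    """Project w onto the ball of radius eps centered at the target y."""
    diff = [yi - wi for yi, wi in zip(y, w)]
    dist = math.sqrt(dot(diff, diff))
    loss = max(0.0, dist - eps)
    if loss == 0.0:
        return list(w)
    return [wi + loss * di / dist for wi, di in zip(w, diff)]
```

In both cases the updated vector lands on the boundary of the consistent set, just as in the classification update.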

Page 16: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Loss Bound (Classification)

• If there exists such that

• Then

where

PA makes a bounded number of mistakes
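The formulas of this bound were lost in extraction. The published analysis of PA states it as follows; the symbols are assumptions chosen to match the constant B used on the proof-sketch slides (ℓ_t is the hinge loss of w_t on round t, and B bounds the instance norms).

```latex
% Assumed notation: \ell_t = \max\{0,\, 1 - y_t (w_t \cdot x_t)\}, \quad \|x_t\| \le B.
\[
\exists\, u \ \text{such that}\ y_t\,(u \cdot x_t) \ge 1 \ \text{for all } t
\quad\Longrightarrow\quad
\sum_{t=1}^{T} \ell_t^{2} \;\le\; B^{2}\,\|u\|^{2},
\qquad \text{where } \|x_t\| \le B .
\]
% A prediction mistake implies \ell_t \ge 1, so the number of mistakes
% is also at most B^2 \|u\|^2.
```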

Page 17: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Proof Sketch

• Define:

• Upper bound:

• Lower bound:

Lipschitz Condition

Page 18: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Proof Sketch (Cont.)

• Combining upper and lower bounds

• L=B for classification and regression

• L=1 for uniclass
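The steps of the proof sketch can be written out for the classification case. This is a reconstruction under assumptions, not the slide's original formulas: w_1 = 0, u attains zero hinge loss on every round, and L is the Lipschitz constant named on the slide.

```latex
\begin{align*}
\text{Define:}\quad
  \Delta_t &= \|w_t - u\|^{2} - \|w_{t+1} - u\|^{2} \\
\text{Upper bound:}\quad
  \sum_{t=1}^{T} \Delta_t &= \|w_1 - u\|^{2} - \|w_{T+1} - u\|^{2} \;\le\; \|u\|^{2} \\
\text{Lower bound:}\quad
  \Delta_t &= 2\tau_t\, y_t\,(u - w_t)\cdot x_t - \tau_t^{2}\|x_t\|^{2}
   \;\ge\; 2\tau_t \ell_t - \tau_t^{2}\|x_t\|^{2}
   \;=\; \frac{\ell_t^{2}}{\|x_t\|^{2}} \;\ge\; \frac{\ell_t^{2}}{L^{2}} \\
\text{Combining:}\quad
  \sum_{t=1}^{T} \ell_t^{2} &\le L^{2}\,\|u\|^{2}
\end{align*}
```

The lower bound uses τ_t = ℓ_t / ‖x_t‖², y_t (u·x_t) ≥ 1, and y_t (w_t·x_t) = 1 − ℓ_t on rounds with positive loss.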

Page 19: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Unrealizable Case

???

Page 20: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Unrealizable Case (Classification)

PA-I

PA-II
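The two variants differ from the basic update only in the step size. In the published PA papers, PA-I caps the step at an aggressiveness parameter C and PA-II shrinks it smoothly; the slide's own formulas are lost, so the sketch below assumes those definitions. The function name and the `variant` argument are hypothetical.

```python
def pa_step_size(loss, sq_norm, C, variant):
    """Step size tau for the PA family, given the hinge loss and ||x||^2.

    "PA":    tau = loss / ||x||^2             (realizable case)
    "PA-I":  tau = min(C, loss / ||x||^2)     (capped step)
    "PA-II": tau = loss / (||x||^2 + 1/(2C))  (damped step)
    """
    if variant == "PA":
        return loss / sq_norm
    if variant == "PA-I":
        return min(C, loss / sq_norm)
    if variant == "PA-II":
        return loss / (sq_norm + 1.0 / (2.0 * C))
    raise ValueError("unknown variant: " + variant)
```

A smaller C makes both variants less aggressive, which limits the damage a single mislabeled example can do.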

Page 21: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

(Not-really) Aggressive Updates

Page 22: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Mistake Bound for PA-I

• Loss suffered by PA-I on round t:

• Loss suffered by any fixed vector:

• #Mistakes made by PA-I is at most:

Page 23: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Loss Bound for PA-II

• Loss suffered by PA-II on round t:

• Loss suffered by any fixed vector:

• Cumulative loss ( ) of PA-II is at most:

Page 24: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Beyond Binary Decision Problems

• Applications and generalizations of PA:

• Multiclass categorization

• Topic ranking and filtering

• Hierarchical classification

• Sequence learning (Markov Networks)

• Segmentation of sequences

• Learning of pseudo-metrics

Page 25: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Movie Recommendation System

Recommender System

Page 26: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Recommending by Projecting

• Project

• Apply Thresholds

[Figure: projection axis w; rank levels 1 to 4; thresholds b1, b2, b3]
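The two steps, project and apply thresholds, can be sketched as follows. This follows the published PRank prediction rule as an assumption: with sorted thresholds b_1 ≤ ... ≤ b_{k-1}, the predicted rank is the first level r whose threshold exceeds the score w·x, and k if none does. The function name is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def prank_predict(w, b, x):
    """Predict a rank in 1..k from k-1 sorted thresholds b."""
    score = dot(w, x)                      # project the instance onto w
    for r, b_r in enumerate(b, start=1):   # apply the thresholds in order
        if score < b_r:
            return r
    return len(b) + 1                      # above every threshold: top rank
```

With three thresholds this yields the four rank levels shown in the figure.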

Page 27: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Prank Update

[Figure: projection axis w; rank levels 1 to 5; thresholds b1, b2, b3, b4]

Page 28: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Prank Update

[Figure: the projection w·x on axis w; rank levels 1 to 5; thresholds b1, b2, b3, b4]

Page 29: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

PRank

[Figure: projection axis w; rank levels 1 to 5; thresholds b1, b2, b3, b4; the correct rank interval is marked]

Page 30: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Prank Update

[Figure: projection axis w; rank levels 1 to 5; thresholds b1, b2, b3, b4; label {2, 3}]

Page 31: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

PRank Update

[Figure: projection axis w; thresholds b1, b2, b3, b4]

Page 32: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

PRank Update

[Figure: the vector w drawn twice, together with the instance x]
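The update animated on these slides can be sketched in code. This follows the published PRank rule as an assumption, since the slides keep only figure labels: every threshold that lies on the wrong side of the score moves toward it by one unit, and w moves along x by the net count. The function name is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def prank_update(w, b, x, y):
    """One PRank step for true rank y in 1..k, with k-1 thresholds b."""
    score = dot(w, x)
    taus = []
    for r, b_r in enumerate(b, start=1):
        # Threshold r should lie above the score iff the true rank is <= r.
        y_r = -1 if y <= r else 1
        # A threshold on the wrong side (or tied) contributes to the update.
        taus.append(y_r if (score - b_r) * y_r <= 0 else 0)
    step = sum(taus)
    new_w = [wi + step * xi for wi, xi in zip(w, x)]
    new_b = [b_r - t for b_r, t in zip(b, taus)]
    return new_w, new_b
```

Thresholds that already agree with the true rank are left untouched, so a correct prediction with no ties changes nothing.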

Page 33: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

EachMovie Database

• 74424 registered viewers

• 1648 listed movies

• Viewers rated subsets of movies

• Demo: online movie recommendation

Page 34: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

PA@Google: Web Spam Filtering

[With Vineet Gupta]

• Query: “hotels palo alto”

• Spammers:

  • Cardinal Hotel - Palo Alto - Reviews of Cardinal Hotel... Palo Alto, California 94301 United States. Deals on Palo Alto hotels. ... More Palo Alto hotels. ... Research other Palo Alto hotels. Is this hotel not right for you? ... www.tripadvisor.com/Hotel_Review-g32849-d79154-…

  • Palo Alto Hotels - Cheap Hotels - Palo Alto Hotels ... Book Palo Alto Hotels Online or Call Toll Free 1-800-359-7234. ... Keywords: Palo Alto Hotel Discounts - Cheap Hotels in Palo Alto. Hotels In Palo Alto. ... www.hotelsbycity.com/california/hotels-palo-alto-…

Page 35: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Enhancements for Web Spam

• Various “signals” features

• Design of special kernels

• Multi-tier feedback (label):
  • +2 navigational site (e.g. www.stanford.edu)
  • +1 on topic
  • -1 off topic
  • -2 nuke the spammer

• Loss is sensitive to site label

• Algorithmic modifications due to scale:
  • Online-to-batch conversions
  • Re-projections of old examples

• Part of a recent revision to search (Google3)

Page 36: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Web Spam Filtering - Results

• Specific queries and domains are heavily spammed:
  • Over 50% of the returned URLs for travel search
  • Certain countries are more spam-prone

• Training set size: over half a million domains

• Training time: 2 hours to 5 days

• Test set size: the entire web crawled by Google (over 100 million domains)

• A few hours to filter all domains on hundreds of CPUs

• Current reduction achieved (estimate): 50% of spammers

Page 37: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Summary

• Unified online framework for decision problems

• Simple and efficient algorithms (“kernelizable”)

• Analyses for realizable and unrealizable cases

• Numerous applications

• Batch learning conversions & generalization

• Generalizations using general Bregman projections

• Approximate projections for large scale problems

• Applications of PA to other decision problems

Page 38: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Related Work

• Projections Onto Convex Sets (POCS):
  • Y. Censor & S.A. Zenios, “Parallel Optimization” (Hildreth’s projection algorithm), Oxford UP, 1997
  • H.H. Bauschke & J.M. Borwein, “On Projection Algorithms for Solving Convex Feasibility Problems”, SIAM Review, 1996

• Online Learning:
  • M. Herbster, “Learning additive models online with fast evaluating kernels”, COLT 2001
  • J. Kivinen, A. Smola, and R.C. Williamson, “Online learning with kernels”, IEEE Trans. on SP, 2004

Page 39: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Relevant Publications

• Online Passive Aggressive Algorithms, CDSS’03 CSKSS’05

• Family of Additive Online Algorithms for Category Ranking, CS’03

• Ultraconservative Online Algorithms for Multiclass Problems, CS’02 CS’03

• On the algorithmic implementation of Multiclass SVM, CS’03

• PRanking with Ranking, CS’01 CS’04

• Large Margin Hierarchical Classification, DKS’04

• Learning to Align Polyphonic Music, SKS’04

• Online and Batch Learning of Pseudo-metrics, SSN’04

• The Power of Selective Memory:

• A Temporal Kernel-Based Model for Tracking Hand-Movements from Neural Activities, SCPVS’04

• Self-Bounded Learning of Prediction Suffix Trees, DSS’04

Page 40: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Hierarchical Classification: Motivation

Phonetic transcription of DECEMBER

Gross error

Small errors

T ix s eh m bcl b er

d AE s eh m bcl b er

d ix s eh NASAL bcl b er

Page 41: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Phonetic Hierarchy

PHONEMES
  Obstruents
    Plosives: b g d k p t
    Fricatives: f v sh s th dh zh z
    Affricates: jh ch
  Silences
  Sonorants
    Nasals: n m ng
    Liquids: l y w r
    Vowels
      Front: iy ih ey eh ae
      Center, Back: oy ow uh uw aa ao er aw ay

Page 42: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Common Constructions

• Ignore the hierarchy - solve as multiclass

• A greedy approach: solve a multiclass problem at each node

[Figure: one classifier C for the flat approach; a tree of classifiers C for the greedy approach]

Page 43: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Hierarchical Classifier

• Assume and

• Associate a prototype with each label

• Classification rule:

[Figure: label tree with nodes labeled W0 to W10]

Page 44: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Hierarchical Classifier (cont.)

• Define

[Figure: label tree with nodes labeled W0 to W10]

Page 45: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

A Metric Over Labels

• A given hierarchy defines a metric over the set of labels via graph distance

[Figure: label tree with two marked labels, a and b]
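A graph distance over a label tree can be computed by walking both labels up to their lowest common ancestor. The sketch below assumes the hierarchy is given as a child-to-parent dictionary; that representation and the function names are hypothetical.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root (the root has no parent entry)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges on the unique tree path between labels a and b."""
    steps_from_a = {n: d for d, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:        # first common ancestor on b's path
            return steps_from_a[n] + steps_from_b
    raise ValueError("labels are not in the same tree")
```

For example, two siblings are at distance 2, while a label and its parent are at distance 1, so confusing nearby labels is counted as a smaller error.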

Page 46: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

From PA to Hieron

• Replace a simple margin constraint with a tree-based margin constraint:

- correct label - predicted label
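A sketch of one Hieron step, following the published large-margin hierarchical classification paper as an assumption because the slide's constraint is lost: each label's score sums the prototypes along its path to the root, the correct label must beat the predicted one by the square root of their tree distance, and only prototypes on the non-shared parts of the two paths are changed. All names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def path_nodes(label, parent):
    """Set of nodes on the path from `label` to the root, both included."""
    nodes = {label}
    while label in parent:
        label = parent[label]
        nodes.add(label)
    return nodes


def path_score(label, w, parent, x):
    """Score of a label: sum of w[v].x over the nodes on its path."""
    return sum(dot(w[v], x) for v in path_nodes(label, parent))


def hieron_update(w, parent, x, y):
    """Return (updated prototypes, predicted label) for the example (x, y)."""
    y_hat = max(w, key=lambda v: path_score(v, w, parent, x))
    new_w = {v: list(vec) for v, vec in w.items()}
    sq_norm = dot(x, x)
    if y_hat == y or sq_norm == 0.0:
        return new_w, y_hat                      # passive: nothing to fix
    p_y, p_hat = path_nodes(y, parent), path_nodes(y_hat, parent)
    gamma = len(p_y ^ p_hat)                     # tree distance in edges
    loss = (path_score(y_hat, w, parent, x)
            - path_score(y, w, parent, x) + math.sqrt(gamma))
    tau = loss / (gamma * sq_norm)
    for v in p_y - p_hat:                        # push the correct path up
        new_w[v] = [wi + tau * xi for wi, xi in zip(new_w[v], x)]
    for v in p_hat - p_y:                        # push the predicted path down
        new_w[v] = [wi - tau * xi for wi, xi in zip(new_w[v], x)]
    return new_w, y_hat
```

After the step, the correct label's score exceeds the predicted label's score by exactly the square root of their tree distance.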

Page 47: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Hieron - Update

[Figure: label tree with nodes labeled w1 to w10]

Page 48: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Hieron - Update

[Figure: label tree showing only nodes w6, w7, and w10]

Page 49: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Sample Run on Synthetic Data

The hierarchy given to the algorithm

An edge indicates that prototypes are “close”


Page 50: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Experiments with Hieron

• Compared two models:
  • Hieron with knowledge of the correct hierarchy
  • Hieron without knowledge of the correct hierarchy (flat)

Datasets used

Dataset              # train   # test   # labels   depth
DMOZ (web pages)     8576      4-FCV    316        8
Speech (phonemes)    80000     20000    40         4
Synthetic data       12100     6050     121        4

Page 51: Online Learning by Projecting: From Theory to Large Scale Web-spam filtering

Experimental Results

• Each graph shows the difference between the error histograms of the two models

• Hieron makes fewer “gross” mistakes

• State-of-the-art results for frame-based phoneme classification

[Figure panels: DMOZ, Phoneme (TIMIT), Synthetic]