Online Learning by Projecting: From Theory to Large Scale Web-Spam Filtering
Yoram Singer
Based on joint work with:
Koby Crammer (Upenn), Ofer Dekel (Google/HUJI), Vineet Gupta (Google), Joseph Keshet (HUJI),
Andrew Ng (Stanford), Shai Shalev-Shwartz (HUJI)
UT Austin AIML Seminar, Jan. 27, 2005
Online Binary Classification
No animal eats bees
Pearls melt in vinegar
Dr. Seuss finished Dartmouth
There are weapons of mass destruction in Iraq
True
False
True
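Binary Classification
• Instances (documents, signals): $\mathbf{x} \in \mathbb{R}^n$
• Labels (true/false, good/bad): $y \in \{-1, +1\}$
• Classification and Prediction: $\hat{y} = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x})$
• Mistakes and losses: a mistake occurs when $\hat{y} \neq y$; the loss $\ell(\mathbf{w}; (\mathbf{x}, y))$ measures how badly $\mathbf{w}$ does on $(\mathbf{x}, y)$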
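Online Binary Classification
• Initialize your classifier ($\mathbf{w}_1$)
• For t = 1,2,3,…,T,…
  • Receive an instance: $\mathbf{x}_t$
  • Predict label: $\hat{y}_t = \mathrm{sign}(\mathbf{w}_t \cdot \mathbf{x}_t)$
  • Receive true label: $y_t$ [suffer “loss”/error]
  • Update classifier ($\mathbf{w}_t \to \mathbf{w}_{t+1}$)
Goal: suffer small losses while learning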
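The protocol above can be sketched as a generic loop. This is a minimal sketch, not the deck's implementation: `run_online` and the `update` callback are hypothetical names, vectors are plain Python lists, and mapping a zero score to +1 is an assumed tie-breaking convention.

```python
from typing import Callable, Iterable, List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sign(z: float) -> int:
    """Map a score to a label; z == 0 is sent to +1 (assumed tie-breaking rule)."""
    return 1 if z >= 0 else -1


def run_online(
    stream: Iterable[Tuple[Vector, int]],
    dim: int,
    update: Callable[[Vector, Vector, int], Vector],
) -> Tuple[Vector, int]:
    """Run the online protocol; return the final classifier and the mistake count."""
    w = [0.0] * dim                # initialize the classifier
    mistakes = 0
    for x, y in stream:            # t = 1, 2, 3, ...
        y_hat = sign(dot(w, x))    # receive x_t and predict a label
        if y_hat != y:             # receive y_t and record the error
            mistakes += 1
        w = update(w, x, y)        # hand the example to the (hypothetical) update rule
    return w, mistakes
```

Passing an update that returns `w` unchanged reduces the loop to a fixed classifier; a concrete rule such as the PA step plugs into the same slot.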
Why Online?
• Adaptive
• Simple to implement
• Fast, small memory footprint
• Can be converted to batch learning (O2B)
• Formal guarantees
• But: might not be as effective as well-designed batch learning algorithms
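Linear Classifiers & Margins
• The prediction is formed as follows: $\hat{y} = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x})$
• The margin of an example $(\mathbf{x}, y)$ w.r.t. the classifier $\mathbf{w}$: $y(\mathbf{w} \cdot \mathbf{x})$
(Figure panels: Positive Margin, Negative Margin)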
Separability Assumption
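Classifier Update - Passive Mode
If $y_t(\mathbf{w}_t \cdot \mathbf{x}_t) \ge 1$, keep the classifier: $\mathbf{w}_{t+1} = \mathbf{w}_t$
Prediction & Margin Errors
• Prediction error: $y_t(\mathbf{w}_t \cdot \mathbf{x}_t) \le 0$
• Margin error: $0 < y_t(\mathbf{w}_t \cdot \mathbf{x}_t) < 1$
Hinge Loss
$\ell(\mathbf{w}; (\mathbf{x}, y)) = \max\{0,\ 1 - y(\mathbf{w} \cdot \mathbf{x})\}$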
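The two error types and the hinge loss can be checked numerically with a short sketch. The helper names are hypothetical, and treating a zero margin as a prediction error is an assumed convention (such a round has hinge loss at least 1, which is what the mistake bounds rely on).

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def margin(w: Vector, x: Vector, y: int) -> float:
    """Signed margin y (w . x) of the example (x, y) with respect to w."""
    return y * dot(w, x)


def hinge_loss(w: Vector, x: Vector, y: int) -> float:
    """max{0, 1 - y (w . x)}: zero only when the margin is at least 1."""
    return max(0.0, 1.0 - margin(w, x, y))


def error_type(w: Vector, x: Vector, y: int) -> str:
    m = margin(w, x, y)
    if m <= 0:
        return "prediction error"   # wrong side of the hyperplane (ties counted here)
    if m < 1:
        return "margin error"       # right side, but inside the unit margin
    return "none"
```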
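Version Space
The vectors that attain zero hinge loss on the current example: $\{\mathbf{w} : y_t(\mathbf{w} \cdot \mathbf{x}_t) \ge 1\}$
In case of a prediction mistake, $\mathbf{w}_t$ must reside outside this set
Mistake: Aggressive Mode
$\mathbf{w}_t$ is projected onto the feasible (dual) space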
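Passive-Aggressive Update
$\mathbf{w}_{t+1} = \arg\min_{\mathbf{w}} \tfrac{1}{2}\|\mathbf{w} - \mathbf{w}_t\|^2$ s.t. $y_t(\mathbf{w} \cdot \mathbf{x}_t) \ge 1$
Closed form: $\mathbf{w}_{t+1} = \mathbf{w}_t + \tau_t y_t \mathbf{x}_t$, where $\tau_t = \max\{0,\ 1 - y_t(\mathbf{w}_t \cdot \mathbf{x}_t)\} / \|\mathbf{x}_t\|^2$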
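A minimal sketch of the closed-form update, assuming dense vectors stored as Python lists; `pa_update` is a hypothetical name. When the hinge loss is zero the step is passive; otherwise $\tau_t$ moves $\mathbf{w}_t$ exactly onto the boundary $y_t(\mathbf{w} \cdot \mathbf{x}_t) = 1$, which is the Euclidean projection onto the half-space.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def pa_update(w: Vector, x: Vector, y: int) -> Vector:
    """One Passive-Aggressive step for binary classification."""
    loss = max(0.0, 1.0 - y * dot(w, x))    # hinge loss of the current vector
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)                      # passive: margin already >= 1 (or x = 0)
    tau = loss / sq_norm                    # aggressive: land on y (w . x) = 1
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

Starting from $\mathbf{w} = \mathbf{0}$ with $\mathbf{x} = (1, 1)$, $y = +1$: the loss is 1, $\tau = 1/2$, and the new vector $(0.5, 0.5)$ has margin exactly 1.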
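Three Decision Problems: A Unified View
(Figure panels: Classification, Regression, Uniclass)
The Generalized PA Algorithm
• Each example induces a set of consistent hypotheses:
  • Classification (half-space): $\{\mathbf{w} : y(\mathbf{w} \cdot \mathbf{x}) \ge 1\}$
  • Regression (hyper-slab): $\{\mathbf{w} : |\mathbf{w} \cdot \mathbf{x} - y| \le \epsilon\}$
  • Uniclass (ball): $\{\mathbf{w} : \|\mathbf{w} - \mathbf{x}\| \le \epsilon\}$
• The new vector $\mathbf{w}_{t+1}$ is set to be the projection of $\mathbf{w}_t$ onto the set of consistent hypotheses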
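Each consistent set has a simple Euclidean projection. The sketch below assumes the standard form of each set (half-space, $\epsilon$-wide slab, $\epsilon$-ball); the function names are hypothetical, and projections that divide by $\|\mathbf{x}\|^2$ simply return $\mathbf{w}$ unchanged when $\mathbf{x} = \mathbf{0}$.

```python
import math
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def axpy(a: float, x: Vector, w: Vector) -> Vector:
    """Return w + a * x."""
    return [wi + a * xi for wi, xi in zip(w, x)]


def project_halfspace(w: Vector, x: Vector, y: int) -> Vector:
    """Classification: project onto {v : y (v . x) >= 1}."""
    loss = max(0.0, 1.0 - y * dot(w, x))
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)
    return axpy(y * loss / sq_norm, x, w)


def project_slab(w: Vector, x: Vector, y: float, eps: float) -> Vector:
    """Regression: project onto the hyper-slab {v : |v . x - y| <= eps}."""
    residual = y - dot(w, x)
    loss = max(0.0, abs(residual) - eps)
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)
    # Move along x toward the nearer face of the slab.
    return axpy(math.copysign(loss, residual) / sq_norm, x, w)


def project_ball(w: Vector, x: Vector, eps: float) -> Vector:
    """Uniclass: project onto the ball {v : ||v - x|| <= eps}."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    dist = math.sqrt(dot(diff, diff))
    loss = max(0.0, dist - eps)
    if loss == 0.0:
        return list(w)
    # Move toward x until the distance equals eps.
    return axpy(loss / dist, diff, w)
```

In every case the step is the loss of the current vector, applied in the direction that shrinks it, so the three problems share one passive-aggressive template.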
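Loss Bound (Classification)
• If there exists $\mathbf{u}$ such that $y_t(\mathbf{u} \cdot \mathbf{x}_t) \ge 1$ for all $t$
• Then $\sum_t \ell_t^2 \le B^2 \|\mathbf{u}\|^2$
where $\ell_t = \max\{0,\ 1 - y_t(\mathbf{w}_t \cdot \mathbf{x}_t)\}$, $B = \max_t \|\mathbf{x}_t\|$, and $\mathbf{w}_1 = \mathbf{0}$
PA makes a bounded number of mistakes: every mistake has $\ell_t \ge 1$, so there are at most $B^2 \|\mathbf{u}\|^2$ of them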
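Proof Sketch
• Define: $\Delta_t = \|\mathbf{w}_t - \mathbf{u}\|^2 - \|\mathbf{w}_{t+1} - \mathbf{u}\|^2$, for a vector $\mathbf{u}$ that attains zero loss on every example
• Upper bound: $\sum_{t=1}^{T} \Delta_t = \|\mathbf{w}_1 - \mathbf{u}\|^2 - \|\mathbf{w}_{T+1} - \mathbf{u}\|^2 \le \|\mathbf{u}\|^2$ (telescoping sum, $\mathbf{w}_1 = \mathbf{0}$)
• Lower bound: $\Delta_t \ge \|\mathbf{w}_{t+1} - \mathbf{w}_t\|^2 \ge \ell_t^2 / L^2$ (the first step holds because $\mathbf{w}_{t+1}$ is the projection of $\mathbf{w}_t$ onto a convex set containing $\mathbf{u}$)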
Lipschitz Condition: $|\ell(\mathbf{w}) - \ell(\mathbf{w}')| \le L\,\|\mathbf{w} - \mathbf{w}'\|$
Proof Sketch (Cont.)
• Combining upper and lower bounds: $\sum_t \ell_t^2 \le L^2 \|\mathbf{u}\|^2$
• L=B for classification and regression ($B = \max_t \|\mathbf{x}_t\|$); L=1 for uniclass
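The realizable bound can be sanity-checked empirically. This sketch generates synthetic data on which a chosen vector $\mathbf{u}$ has margin at least 1, runs PA from $\mathbf{w}_1 = \mathbf{0}$, and compares $\sum_t \ell_t^2$ with $B^2 \|\mathbf{u}\|^2$; the generator, $\mathbf{u} = (3, -2)$, and $T = 500$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, int]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def separable_stream(u: Vector, T: int, seed: int = 0) -> List[Example]:
    """Synthetic examples on which u has margin >= 1 (the realizable case)."""
    rng = random.Random(seed)
    data: List[Example] = []
    while len(data) < T:
        x = [rng.uniform(-1.0, 1.0) for _ in u]
        score = dot(u, x)
        if abs(score) >= 1.0:                  # discard points inside u's margin
            data.append((x, 1 if score > 0 else -1))
    return data


def run_pa(data: List[Example]) -> Tuple[float, int, float]:
    """Run PA from w_1 = 0; return (sum of squared losses, mistakes, B)."""
    w = [0.0] * len(data[0][0])
    sq_loss, mistakes, B = 0.0, 0, 0.0
    for x, y in data:
        B = max(B, dot(x, x) ** 0.5)
        m = y * dot(w, x)
        if m <= 0:                             # a mistake implies loss >= 1
            mistakes += 1
        loss = max(0.0, 1.0 - m)
        sq_loss += loss ** 2
        if loss > 0.0:
            tau = loss / dot(x, x)
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return sq_loss, mistakes, B


u = [3.0, -2.0]
sq_loss, mistakes, B = run_pa(separable_stream(u, 500))
bound = B ** 2 * dot(u, u)                     # B^2 ||u||^2
```

The chain `mistakes <= sq_loss <= bound` should hold for any seed, since each mistake contributes at least 1 to the squared loss.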
Unrealizable Case
???
Unrealizable Case (Classification)
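PA-I: $\tau_t = \min\{C,\ \ell_t / \|\mathbf{x}_t\|^2\}$
PA-II: $\tau_t = \ell_t / \big(\|\mathbf{x}_t\|^2 + \tfrac{1}{2C}\big)$
(both plugged into $\mathbf{w}_{t+1} = \mathbf{w}_t + \tau_t y_t \mathbf{x}_t$, with aggressiveness parameter $C > 0$ and hinge loss $\ell_t$)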
(Not-really) Aggressive Updates
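The three variants differ only in how $\tau_t$ is computed. A sketch with hypothetical function names, assuming $C > 0$:

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def step_size(loss: float, sq_norm: float, variant: str = "PA", C: float = 1.0) -> float:
    """tau_t for PA, PA-I and PA-II; C > 0 is the aggressiveness parameter."""
    if variant == "PA":
        return loss / sq_norm
    if variant == "PA-I":
        return min(C, loss / sq_norm)                 # cap the step at C
    if variant == "PA-II":
        return loss / (sq_norm + 1.0 / (2.0 * C))     # shrink the step smoothly
    raise ValueError(f"unknown variant: {variant}")


def update(w: Vector, x: Vector, y: int, variant: str = "PA", C: float = 1.0) -> Vector:
    loss = max(0.0, 1.0 - y * dot(w, x))
    sq_norm = dot(x, x)
    if loss == 0.0 or sq_norm == 0.0:
        return list(w)                                # passive round
    tau = step_size(loss, sq_norm, variant, C)
    return [wi + tau * y * xi for wi, xi in zip(w, x)]
```

With $\ell_t = 2$, $\|\mathbf{x}_t\|^2 = 1$ and $C = 0.5$, PA steps by 2, PA-I is capped at 0.5, and PA-II steps by $2 / (1 + 1) = 1$, so a single noisy example can no longer drag $\mathbf{w}$ arbitrarily far.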
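Mistake Bound for PA-I
• Loss suffered by PA-I on round t: $\ell_t = \max\{0,\ 1 - y_t(\mathbf{w}_t \cdot \mathbf{x}_t)\}$
• Loss suffered by any fixed vector $\mathbf{u}$: $\ell_t^* = \max\{0,\ 1 - y_t(\mathbf{u} \cdot \mathbf{x}_t)\}$
• #Mistakes made by PA-I is at most: $\max\{B^2, 1/C\}\,\big(\|\mathbf{u}\|^2 + 2C \sum_t \ell_t^*\big)$, where $\|\mathbf{x}_t\| \le B$ for all $t$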
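Loss Bound for PA-II
• Loss suffered by PA-II on round t: $\ell_t = \max\{0,\ 1 - y_t(\mathbf{w}_t \cdot \mathbf{x}_t)\}$
• Loss suffered by any fixed vector $\mathbf{u}$: $\ell_t^* = \max\{0,\ 1 - y_t(\mathbf{u} \cdot \mathbf{x}_t)\}$
• Cumulative loss ($\sum_t \ell_t^2$) of PA-II is at most: $\big(B^2 + \tfrac{1}{2C}\big)\big(\|\mathbf{u}\|^2 + 2C \sum_t (\ell_t^*)^2\big)$, where $\|\mathbf{x}_t\| \le B$ for all $t$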
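Both bounds hold for every comparison vector $\mathbf{u}$, so they can be checked on non-separable data. In this sketch the noisy generator, the 10% label-flip rate, $C = 1$, and the choice of $\mathbf{u}$ (the vector that produced the clean labels) are illustrative assumptions; a mistake is counted whenever $\ell_t \ge 1$, i.e. the margin is non-positive.

```python
import random
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, int]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def hinge(w: Vector, x: Vector, y: int) -> float:
    return max(0.0, 1.0 - y * dot(w, x))


def noisy_stream(u: Vector, T: int, flip_prob: float, seed: int = 0) -> List[Example]:
    """Label points by sign(u . x), then flip each label with probability flip_prob."""
    rng = random.Random(seed)
    data: List[Example] = []
    for _ in range(T):
        x = [rng.uniform(-1.0, 1.0) for _ in u]
        y = 1 if dot(u, x) >= 0 else -1
        if rng.random() < flip_prob:
            y = -y                              # label noise: not separable
        data.append((x, y))
    return data


def run(data: List[Example], variant: str, C: float) -> Tuple[int, float]:
    """Run PA-I or PA-II from w_1 = 0; return (mistakes, sum of squared losses)."""
    w = [0.0] * len(data[0][0])
    mistakes, sq_loss = 0, 0.0
    for x, y in data:
        loss = hinge(w, x, y)
        if loss >= 1.0:                         # non-positive margin: a mistake
            mistakes += 1
        sq_loss += loss ** 2
        sq_norm = dot(x, x)
        if loss > 0.0 and sq_norm > 0.0:
            if variant == "PA-I":
                tau = min(C, loss / sq_norm)
            else:                               # "PA-II"
                tau = loss / (sq_norm + 1.0 / (2.0 * C))
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return mistakes, sq_loss


def bounds(data: List[Example], u: Vector, C: float) -> Tuple[float, float]:
    """Right-hand sides of the PA-I mistake bound and the PA-II loss bound."""
    B2 = max(dot(x, x) for x, _ in data)       # B^2 = max_t ||x_t||^2
    star = [hinge(u, x, y) for x, y in data]   # losses of the fixed vector u
    pa1 = max(B2, 1.0 / C) * (dot(u, u) + 2.0 * C * sum(star))
    pa2 = (B2 + 1.0 / (2.0 * C)) * (dot(u, u) + 2.0 * C * sum(s * s for s in star))
    return pa1, pa2
```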
Beyond Binary Decision Problems
• Applications and generalizations of PA:
• Multiclass categorization
• Topic ranking and filtering
• Hierarchical classification
• Sequence learning (Markov Networks)
• Segmentation of sequences
• Learning of pseudo-metrics
Movie Recommendation System
Recommender System
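Recommending by Projecting
(Figure: the direction $\mathbf{w}$, with thresholds $b_1, b_2, b_3$ separating rank levels 1 to 4.)
• Project: compute $\mathbf{w} \cdot \mathbf{x}$
• Apply Thresholds: predict the rank level whose interval contains the projection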
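Prediction by projecting and thresholding can be sketched as follows, assuming the finite thresholds are stored in non-decreasing order with an implicit last threshold $b_k = +\infty$; `prank_predict` is a hypothetical name.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def prank_predict(w: Vector, b: List[float], x: Vector) -> int:
    """Return the first rank r whose threshold b_r exceeds the projected score.

    b holds the k - 1 finite thresholds b_1 <= ... <= b_{k-1}; b_k = +inf is
    implicit, so ranks run from 1 to k = len(b) + 1.
    """
    score = dot(w, x)                            # project
    for r, threshold in enumerate(b, start=1):   # apply thresholds
        if score < threshold:
            return r
    return len(b) + 1
```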
PRank Update
(Figure sequence: the projection $\mathbf{w} \cdot \mathbf{x}$ on the real line, with thresholds $b_1, \dots, b_4$ separating rank levels 1 to 5; later frames mark the correct rank interval, highlight thresholds {2, 3}, and show the updated $\mathbf{w}$.)
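The update pictured above can be sketched as follows. It follows the PRank rule as it is commonly stated (each threshold lying on the wrong side of $\mathbf{w} \cdot \mathbf{x}$ moves one unit toward the score, and $\mathbf{w}$ moves by the sum of those unit steps times $\mathbf{x}$); treat it as an illustrative reconstruction with hypothetical names, not the authors' code.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def prank_update(w: Vector, b: List[float], x: Vector, y: int) -> Tuple[Vector, List[float]]:
    """One PRank-style step for an example x with true rank y (1-based)."""
    score = dot(w, x)
    taus = []
    for r, threshold in enumerate(b, start=1):
        y_r = 1 if y > r else -1                    # should the score lie above b_r?
        violated = y_r * (score - threshold) <= 0   # b_r is on the wrong side
        taus.append(y_r if violated else 0)
    step = sum(taus)
    new_w = [wi + step * xi for wi, xi in zip(w, x)]
    new_b = [br - tr for br, tr in zip(b, taus)]
    return new_w, new_b
```

For example, with $\mathbf{w} = (0)$, thresholds $(-1, 0.5, 1.5, 2.5)$, $\mathbf{x} = (1)$ and true rank 4, the score 0 falls in rank 2; thresholds 2 and 3 are violated, so both move down by one and $\mathbf{w}$ becomes $(2)$, which places the new score in rank 4.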
EachMovie Database
• 74424 registered viewers
• 1648 listed movies
• Viewers rated subsets of movies
• Demo: online movie recommendation
PA@Google: Web Spam Filtering
[With Vineet Gupta]
• Query: “hotels palo alto”
• Spammers:
  • Cardinal Hotel - Palo Alto - Reviews of Cardinal Hotel... Palo Alto, California 94301 United States. Deals on Palo Alto hotels. ... More Palo Alto hotels. ... Research other Palo Alto hotels. Is this hotel not right for you? ... www.tripadvisor.com/Hotel_Review-g32849-d79154-…
  • Palo Alto Hotels - Cheap Hotels - Palo Alto Hotels ... Book Palo Alto Hotels Online or Call Toll Free 1-800-359-7234. ... Keywords: Palo Alto Hotel Discounts - Cheap Hotels in Palo Alto. Hotels In Palo Alto. ... www.hotelsbycity.com/california/hotels-palo-alto-…
Enhancements for Web Spam
• Various “signals” (features)
• Design of special kernels
• Multi-tier feedback (label):
  • +2 navigational site (e.g. www.stanford.edu)
  • +1 on topic
  • -1 off topic
  • -2 nuke the spammer
• Loss is sensitive to site label
• Algorithmic modifications due to scale:
  • Online-to-batch conversions
  • Re-projections of old examples
• Part of a recent revision to search (Google3)
Web Spam Filtering - Results
• Specific queries and domains are heavily spammed:
  • Over 50% of the URLs returned for travel searches
  • Certain countries are more spam-prone
• Training set size: over half a million domains
• Training time: 2 hours to 5 days
• Test set size: the entire web crawled by Google (over 100 million domains)
• A few hours to filter all domains on 100’s of CPUs
• Current reduction achieved (estimate): 50% of spammers
Summary
• Unified online framework for decision problems
• Simple and efficient algorithms (“kernelizable”)
• Analyses for realizable and unrealizable cases
• Numerous applications
• Batch learning conversions & generalization
• Generalizations using general Bregman projections
• Approximate projections for large scale problems
• Applications of PA to other decision problems
Related Work
• Projections Onto Convex Sets (POCS):
  • Y. Censor and S.A. Zenios, “Parallel Optimization” (Hildreth’s projection algorithm), Oxford UP, 1997
  • H.H. Bauschke and J.M. Borwein, “On Projection Algorithms for Solving Convex Feasibility Problems”, SIAM Review, 1996
• Online Learning:
  • M. Herbster, “Learning additive models online with fast evaluating kernels”, COLT 2001
  • J. Kivinen, A. Smola, and R.C. Williamson, “Online learning with kernels”, IEEE Trans. on SP, 2004
Relevant Publications
• Online Passive Aggressive Algorithms, CDSS’03 CSKSS’05
• Family of Additive Online Algorithms for Category Ranking, CS’03
• Ultraconservative Online Algorithms for Multiclass Problems, CS’02 CS’03
• On the algorithmic implementation of Multiclass SVM, CS’03
• Pranking with Ranking, CS’01 CS’04
• Large Margin Hierarchical Classification, DKS’04
• Learning to Align Polyphonic Music, SKS’04
• Online and Batch Learning of Pseudo-metrics, SSN’04
• The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees, DSS’04
• A Temporal Kernel-Based Model for Tracking Hand-Movements from Neural Activities, SCPVS’04
Hierarchical Classification: Motivation
Phonetic transcription of DECEMBER
Gross error
Small errors
T ix s eh m bcl b er
d AE s eh m bcl b er
d ix s eh NASAL bcl b er
Phonetic Hierarchy
PHONEMES
• Silences
• Sonorants
  • Nasals: n m ng
  • Liquids: l y w r
  • Vowels
    • Front: iy ih ey eh ae
    • Center / Back: oy ow uh uw aa ao er aw ay
• Obstruents
  • Plosives: b g d k p t
  • Fricatives: f v sh s th dh zh z
  • Affricates: jh ch
Common Constructions
• Ignore the hierarchy: solve as multiclass
• A greedy approach: solve a multiclass problem at each node
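Hierarchical Classifier
• Assume instances $\mathbf{x} \in \mathbb{R}^n$ and labels that are the nodes of a rooted tree
• Associate a prototype $\mathbf{W}^v$ with each label $v$
• Classification rule: $\hat{y} = \arg\max_{v} \mathbf{W}^v \cdot \mathbf{x}$
(Figure: a label tree with prototypes W0 at the root through W10.)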
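Hierarchical Classifier (cont.)
• Define $\mathbf{w}^v = \mathbf{W}^v - \mathbf{W}^{A(v)}$, where $A(v)$ is the parent of $v$ (and $\mathbf{w}^{\mathrm{root}} = \mathbf{W}^{\mathrm{root}}$)
• Then $\mathbf{W}^v = \sum_{u \in P(v)} \mathbf{w}^u$, where $P(v)$ is the path from the root to $v$
(Figure: the same tree, W0 to W10.)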
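The classification rule can be sketched with a parent map for the tree and per-node vectors $\mathbf{w}^v$; each prototype $\mathbf{W}^v$ is accumulated along the path from $v$ to the root. The tree encoding, the names, and the toy example are assumptions for illustration.

```python
from typing import Dict, List, Optional

Vector = List[float]
Tree = Dict[int, Optional[int]]  # node -> parent (None marks the root)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def path_to_root(v: int, parent: Tree) -> List[int]:
    """Nodes on the path from v up to the root, inclusive: P(v)."""
    path = []
    node: Optional[int] = v
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def prototype(v: int, parent: Tree, w: Dict[int, Vector]) -> Vector:
    """W^v = sum of the node vectors w^u over u in P(v)."""
    dim = len(next(iter(w.values())))
    W = [0.0] * dim
    for u in path_to_root(v, parent):
        W = [a + b for a, b in zip(W, w[u])]
    return W


def classify(x: Vector, labels: List[int], parent: Tree, w: Dict[int, Vector]) -> int:
    """Hierarchical rule: the label whose prototype scores highest on x."""
    return max(labels, key=lambda v: dot(prototype(v, parent, w), x))
```

Because sibling labels share every ancestor vector, their prototypes differ only in the vectors below their common ancestor, which is what keeps nearby labels "close".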
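A Metric Over Labels
(Figure: two labels, a and b, in the tree.)
• A given hierarchy defines a metric over the set of labels via graph distance: $\gamma(a, b)$ is the number of edges on the tree path between $a$ and $b$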
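Since each node has a unique path to the root, $\gamma(a, b)$ equals the size of the symmetric difference of the two root paths: the nodes from $a$ and from $b$ up to (but excluding) their lowest common ancestor, one per edge. A sketch, assuming the same parent-map encoding as a hypothetical convention:

```python
from typing import Dict, List, Optional

Tree = Dict[int, Optional[int]]  # node -> parent (None marks the root)


def path_to_root(v: int, parent: Tree) -> List[int]:
    """Nodes on the path from v up to the root, inclusive."""
    path = []
    node: Optional[int] = v
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tree_distance(a: int, b: int, parent: Tree) -> int:
    """gamma(a, b): number of edges on the tree path between a and b."""
    return len(set(path_to_root(a, parent)) ^ set(path_to_root(b, parent)))
```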
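From PA to Hieron
• Replace a simple margin constraint with a tree-based margin constraint:
$\mathbf{W}^{y} \cdot \mathbf{x} - \mathbf{W}^{\hat{y}} \cdot \mathbf{x} \ge \sqrt{\gamma(y, \hat{y})}$, where $y$ is the correct label and $\hat{y}$ is the predicted label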
Hieron - Update
(Figure: the label tree with node vectors w1 to w10; a second frame highlights w6, w7, and w10.)
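A sketch of a Hieron-style step that enforces the tree-based constraint with a minimal change: only the node vectors on the symmetric difference of the two root paths move, by $+\alpha\mathbf{x}$ on the correct label's side and $-\alpha\mathbf{x}$ on the predicted label's side. The step size $\alpha = \ell / (\gamma(y, \hat{y})\,\|\mathbf{x}\|^2)$, with $\ell = \max\{0,\ \mathbf{W}^{\hat{y}} \cdot \mathbf{x} - \mathbf{W}^{y} \cdot \mathbf{x} + \sqrt{\gamma(y, \hat{y})}\}$, is an assumption of this sketch: because $\gamma$ node vectors move, the margin difference grows by exactly $\alpha\,\gamma\,\|\mathbf{x}\|^2 = \ell$, so the constraint holds with equality afterwards. Names and the tree encoding are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]
Tree = Dict[int, Optional[int]]  # node -> parent (None marks the root)


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def path_to_root(v: int, parent: Tree) -> List[int]:
    path = []
    node: Optional[int] = v
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def prototype(v: int, parent: Tree, w: Dict[int, Vector]) -> Vector:
    """W^v = sum of w^u over the path from v to the root."""
    W = [0.0] * len(next(iter(w.values())))
    for u in path_to_root(v, parent):
        W = [a + b for a, b in zip(W, w[u])]
    return W


def hieron_update(w: Dict[int, Vector], parent: Tree, x: Vector,
                  y: int, y_hat: int) -> Dict[int, Vector]:
    """Return new node vectors after one step on (x, y) with prediction y_hat."""
    new_w = {v: list(vec) for v, vec in w.items()}
    p_y = set(path_to_root(y, parent))
    p_hat = set(path_to_root(y_hat, parent))
    gamma = len(p_y ^ p_hat)                     # tree distance gamma(y, y_hat)
    sq_norm = dot(x, x)
    if gamma == 0 or sq_norm == 0.0:
        return new_w
    loss = max(0.0, dot(prototype(y_hat, parent, w), x)
               - dot(prototype(y, parent, w), x) + math.sqrt(gamma))
    if loss == 0.0:
        return new_w                             # constraint already satisfied
    alpha = loss / (gamma * sq_norm)
    for v in p_y - p_hat:                        # pull the correct branch toward x
        new_w[v] = [a + alpha * b for a, b in zip(new_w[v], x)]
    for v in p_hat - p_y:                        # push the wrong branch away from x
        new_w[v] = [a - alpha * b for a, b in zip(new_w[v], x)]
    return new_w
```

Shared ancestors of $y$ and $\hat{y}$ are left untouched, so a "small" mistake between siblings changes only two node vectors while a gross mistake spreads the correction over a longer path.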
Sample Run on Synthetic Data
The hierarchy given to the algorithm
An edge indicates that prototypes are “close”
Experiments with Hieron
• Compared two models:
  • Hieron with knowledge of the correct hierarchy
  • Hieron without knowledge of the correct hierarchy (flat)
Datasets used
                     # train   # test       # labels   depth
DMOZ (web pages)     8576      4-fold CV    316        8
Speech (phonemes)    80000     20000        40         4
Synthetic data       12100     6050         121        4
Experimental Results
• Each graph shows the difference between the error histograms of the two models
• Hieron makes fewer “gross” mistakes
• State-of-the-art results for frame-based phoneme classification
(Figure panels: DMOZ, Phoneme (TIMIT), Synthetic)