A Simple Probabilistic Approach to Learning from Positive and Unlabeled Examples Dell Zhang (BBK) and Wee Sun Lee (NUS)

Problem

Supervised Learning

Problem

Semi-Supervised Learning

Problem

PU Learning

Problem

Unlabeled Examples Help

Problem

PU Learning

To distinguish the interesting instances (the positive class C+) from other instances (the negative class C-) by learning a classifier from a set of positive examples P and a set of unlabeled examples U

There is no labeled negative example!
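The setting above can be illustrated with a small sketch that turns a fully labeled data set into a PU data set by hiding labels. The function name `make_pu_split` and the parameter `keep_fraction` are hypothetical and chosen only for this illustration; they do not come from the slides.

```python
import random
from typing import List, Sequence, Tuple


def make_pu_split(
    examples: Sequence[int],
    labels: Sequence[int],
    keep_fraction: float,
    seed: int = 0,
) -> Tuple[List[int], List[int]]:
    """Build a positive set P and an unlabeled set U from labeled data.

    Assumption for this sketch: each positive example (label 1) keeps its
    label with probability `keep_fraction`; everything else, including
    every negative example, goes into U without a label.
    """
    rng = random.Random(seed)
    P, U = [], []
    for x, y in zip(examples, labels):
        if y == 1 and rng.random() < keep_fraction:
            P.append(x)  # known positive
        else:
            U.append(x)  # label hidden: may be positive or negative
    return P, U


if __name__ == "__main__":
    examples = list(range(10))
    labels = [1] * 5 + [0] * 5
    P, U = make_pu_split(examples, labels, keep_fraction=0.5)
    print("P:", P)
    print("U:", U)
```

Note that U is a mixture: it contains all the negatives plus whichever positives were not labeled, which is why it cannot simply be treated as the negative class.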

Applications

To automatically filter web pages according to a user's preference
    the browsed or bookmarked pages can be used as positive examples, while unlabeled examples can be easily collected from the web

To automatically find machine learning literature
    the ICML papers can be used as positive examples, while unlabeled examples can be easily collected from the ACM or IEEE digital library

To automatically identify cancer patients
    the patients known to have cancers can be used as positive examples, while unlabeled examples can be easily collected from the patient database

To automatically discover future customers for direct marketing
    the current customers of the company can be used as positive examples, while unlabeled examples can be purchased at a low cost compared with obtaining negative examples

……

Approaches

Existing Approaches
    PNB (Denis et al. 2002); PNCT (Denis et al. 2003)
    S-EM (Liu et al. 2002); RC-SVM (Li & Liu 2003)
    PEBL (Yu et al. 2004); SVMC (Yu 2005)
    PN-SVM (Fung et al. 2005)
    W-LR (Lee & Liu 2003); B-SVM (Liu et al. 2003)

Our Proposed Approach
    B-Pr

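Our Approach

A Probabilistic Model

[Diagram: an example x ∈ C+ goes to P with probability 1 − p and to U with probability p; an example x ∈ C- goes to U with probability 1]

$$\Pr[P \mid x] = \Pr[C_+ \mid x]\,(1 - p)$$

$$\Pr[U \mid x] = \Pr[C_+ \mid x]\,p + \Pr[C_- \mid x]$$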
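As a small numerical illustration of this model, the sketch below maps the class posterior Pr[C+ | x] to the posteriors of landing in P or U, assuming (as the slide's formulas indicate) that a positive example is left unlabeled with probability p and that a negative example is never labeled. The function name `labeling_posteriors` is hypothetical.

```python
from typing import Tuple


def labeling_posteriors(pr_pos: float, p: float) -> Tuple[float, float]:
    """Return (Pr[P|x], Pr[U|x]) given Pr[C+|x] and the unlabeling rate p.

    Pr[P|x] = Pr[C+|x] * (1 - p)
    Pr[U|x] = Pr[C+|x] * p + Pr[C-|x]
    """
    if not 0.0 <= pr_pos <= 1.0:
        raise ValueError("pr_pos must be a probability")
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    pr_neg = 1.0 - pr_pos            # Pr[C-|x]
    pr_P = pr_pos * (1.0 - p)        # positive and labeled
    pr_U = pr_pos * p + pr_neg       # positive but unlabeled, or negative
    return pr_P, pr_U


if __name__ == "__main__":
    pr_P, pr_U = labeling_posteriors(pr_pos=0.8, p=0.5)
    print(pr_P, pr_U, pr_P + pr_U)
```

The two posteriors always sum to 1, because every example ends up in exactly one of P and U.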

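Our Approach

$$f(x) = \mathrm{sgn}\left(\Pr[C_+ \mid x] - \Pr[C_- \mid x]\right)$$

$$\Pr[C_+ \mid x] - \Pr[C_- \mid x] = \frac{1 + p}{1 - p}\,\Pr[P \mid x] - \Pr[U \mid x]$$

$$f(x) = \mathrm{sgn}\left(b\,\Pr[P \mid x] - \Pr[U \mid x]\right), \qquad b = \frac{1 + p}{1 - p}$$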
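The biased decision rule, f(x) = sgn(b Pr[P | x] − Pr[U | x]) with b = (1 + p) / (1 − p), can be checked numerically against the ideal rule sgn(Pr[C+ | x] − Pr[C- | x]). The sketch below is only an illustration: the function names are hypothetical, and treating a tie (a difference of exactly zero) as negative is an assumption made here, since the slides do not say how sgn(0) is handled.

```python
def bias(p: float) -> float:
    """b = (1 + p) / (1 - p) for an unlabeling probability p in [0, 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return (1.0 + p) / (1.0 - p)


def classify_pu(pr_P: float, pr_U: float, p: float) -> int:
    """Biased rule: +1 if b * Pr[P|x] - Pr[U|x] > 0, else -1 (ties -> -1)."""
    return 1 if bias(p) * pr_P - pr_U > 0 else -1


def classify_ideal(pr_pos: float) -> int:
    """Ideal rule: +1 if Pr[C+|x] - Pr[C-|x] > 0, else -1 (ties -> -1)."""
    return 1 if pr_pos - (1.0 - pr_pos) > 0 else -1


if __name__ == "__main__":
    p = 0.5
    for pr_pos in (0.1, 0.4, 0.6, 0.9):
        # Posteriors of P and U implied by the probabilistic model.
        pr_P = pr_pos * (1.0 - p)
        pr_U = pr_pos * p + (1.0 - pr_pos)
        print(pr_pos, classify_pu(pr_P, pr_U, p), classify_ideal(pr_pos))
```

With p = 0.5 the bias is b = 3, so an example is classified as positive only when Pr[P | x] exceeds one third of Pr[U | x]; the two rules agree on every input in the loop.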

Our Approach

Biased PrTFIDF (B-Pr)

Estimate $\Pr[P \mid x]$ and $\Pr[U \mid x]$
    PrTFIDF (Joachims 1997)

Estimate $b$
    Maximize $\frac{pr}{\Pr[C_+]} = \frac{r^2}{\Pr[f(x) = 1]}$ on a held-out validation set (Lee & Liu 2003); in this criterion $p$ and $r$ denote precision and recall

Linear Time Complexity!
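Choosing b by maximizing r² / Pr[f(x) = 1] on a held-out validation set could look roughly like the sketch below. It assumes, following our reading of Lee & Liu (2003), that recall r is estimated on the validation positives and Pr[f(x) = 1] on the whole validation set. The function name `select_bias`, the candidate grid, and the toy scores are all hypothetical; the slides do not specify how candidate values of b are generated.

```python
from typing import List, Optional, Sequence, Tuple

Scores = Tuple[float, float]  # (Pr[P|x], Pr[U|x]) for one example


def predicted_positive(b: float, scores: Scores) -> bool:
    """Biased rule: positive if b * Pr[P|x] - Pr[U|x] > 0."""
    pr_P, pr_U = scores
    return b * pr_P - pr_U > 0


def select_bias(
    val_pos: Sequence[Scores],
    val_all: Sequence[Scores],
    candidates: Sequence[float],
) -> Tuple[Optional[float], float]:
    """Return the candidate b maximizing r^2 / Pr[f(x) = 1], and its score."""
    best_b, best_score = None, float("-inf")
    for b in candidates:
        # Recall estimated on the labeled positives of the validation set.
        r = sum(predicted_positive(b, s) for s in val_pos) / len(val_pos)
        # Fraction of the whole validation set predicted positive.
        frac = sum(predicted_positive(b, s) for s in val_all) / len(val_all)
        if frac == 0.0:
            continue  # criterion undefined when nothing is predicted positive
        score = r * r / frac
        if score > best_score:
            best_b, best_score = b, score
    return best_b, best_score


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    val_pos: List[Scores] = [(0.6, 0.4), (0.5, 0.5), (0.3, 0.7)]
    val_unl: List[Scores] = [(0.4, 0.6), (0.15, 0.85), (0.1, 0.9), (0.05, 0.95)]
    b, score = select_bias(val_pos, val_pos + val_unl, [1.0, 2.0, 4.0, 10.0])
    print(b, score)
```

The criterion needs no labeled negatives: both quantities can be computed from P and U alone, and each candidate costs one pass over the validation set.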

Experiments

Reuters-21578

B-Pr>RC-SVM>PEBL (p=0.55)

RC-SVM>B-Pr>PEBL (p=0.85)

Experiments

20NewsGroups

B-Pr>W-LR>S-EM (p=0.3)

B-Pr>W-LR>S-EM (p=0.7)

Conclusion

A New Approach to Learning from Positive and Unlabeled Examples
    As effective as the state-of-the-art approaches
    Yet simpler and faster

Thank you

Questions? Comments? Suggestions? ……