
Data mining project poster


MISSION

Build a spam filter using a semi-supervised learning method.

WHY?

Labelled data is usually hard to obtain, so we want to use the unlabelled data as much as possible.

HOW?

Use a semi-supervised learning method to implement the spam filter.

Figure legend: Spam / Ham, Labelled / Unlabelled

01 We compared the performance of three semi-supervised learning methods (self-training, EM-based and graph-based) and chose the best.

02 Since self-training performed best of the three, we compared the performance of its different base learners (Bayesian, Decision Tree and AdaBoost).

03 We ensembled the semi-supervised learning method and the supervised learning method with Bagging.
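The Bagging step can be illustrated with a minimal sketch. The poster does not say how the models were combined, so this assumes the usual recipe: train each base learner on a bootstrap resample of the labelled data and take a majority vote. The names `bagging_fit`, `bagging_predict` and `train_keyword_model` are hypothetical, and the keyword-count learner is only a toy stand-in for the real classifiers.

```python
import random
from collections import Counter


def train_keyword_model(sample):
    """Toy base learner (assumption): votes by word counts seen per class."""
    spam_words, ham_words = Counter(), Counter()
    for text, label in sample:
        target = spam_words if label == "spam" else ham_words
        target.update(text.lower().split())

    def classify(text):
        words = text.lower().split()
        spam_hits = sum(spam_words[w] for w in words)
        ham_hits = sum(ham_words[w] for w in words)
        return "spam" if spam_hits > ham_hits else "ham"

    return classify


def bagging_fit(train_fns, data, n_models=5, seed=0):
    """Train n_models learners, each on a bootstrap resample of data.

    train_fns is a list of training functions; they are used in rotation,
    so a supervised and a semi-supervised trainer could be mixed.
    """
    rng = random.Random(seed)
    models = []
    for i in range(n_models):
        # Bootstrap: draw len(data) examples with replacement.
        sample = [rng.choice(data) for _ in data]
        train_fn = train_fns[i % len(train_fns)]
        models.append(train_fn(sample))
    return models


def bagging_predict(models, text):
    """Return the label chosen by the majority of the models."""
    votes = Counter(model(text) for model in models)
    return votes.most_common(1)[0][0]


data = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch with the project team", "ham"),
]
models = bagging_fit([train_keyword_model], data, n_models=5, seed=0)
label = bagging_predict(models, "free money")
```

Passing several training functions in `train_fns` is one way to put supervised and semi-supervised learners into the same ensemble; whether the poster's authors did it this way is not stated.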


Labelled data -> Train classifier -> Apply on unlabelled data -> Enhance classifier
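The four-step loop in this diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the poster's implementation: the base learner is a small multinomial Naive Bayes with add-one smoothing, and the function names, the confidence `threshold` and the `max_rounds` limit are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a tiny multinomial Naive Bayes model (assumed base learner)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter(labels)
    for text, label in zip(docs, labels):
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return the model's estimate of P(spam | text)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_score = {}
    for label in ("spam", "ham"):
        # Smoothed class prior, then smoothed word likelihoods.
        score = math.log((class_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        log_score[label] = score
    top = max(log_score.values())
    spam = math.exp(log_score["spam"] - top)
    ham = math.exp(log_score["ham"] - top)
    return spam / (spam + ham)


def self_train(labelled, unlabelled, threshold=0.8, max_rounds=10):
    """Train, pseudo-label confident unlabelled texts, retrain, repeat."""
    docs = [text for text, _ in labelled]
    labels = [label for _, label in labelled]
    pool = list(unlabelled)
    model = train_nb(docs, labels)            # train classifier
    for _ in range(max_rounds):
        remaining, added = [], 0
        for text in pool:                     # apply on unlabelled data
            p_spam = predict_proba(model, text)
            if p_spam >= threshold:
                docs.append(text); labels.append("spam"); added += 1
            elif p_spam <= 1 - threshold:
                docs.append(text); labels.append("ham"); added += 1
            else:
                remaining.append(text)        # not confident enough yet
        pool = remaining
        if added == 0:
            break
        model = train_nb(docs, labels)        # enhance classifier
    return model, pool


labelled = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting at noon tomorrow", "ham"),
    ("lunch with the project team", "ham"),
]
unlabelled = ["claim free money", "project meeting tomorrow", "hello there"]
model, leftover = self_train(labelled, unlabelled, threshold=0.8)
```

Only predictions above the confidence threshold are added to the training set; texts the model is unsure about (here, one made of words it has never seen) stay in the unlabelled pool, which limits how much a wrong pseudo-label can reinforce itself.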