Introduction to Support Vector Machines for Data Mining

Page 1: Introduction to Support Vector Machines for Data Mining

Introduction to Support Vector Machines for Data Mining

Mahdi Nasereddin Ph.D.

Pennsylvania State University

School of Information Sciences and Technology

Page 2: Introduction to Support Vector Machines for Data Mining

Agenda

• Introduction
• Support Vector Machines
• Preliminary Experimentation
• Conclusion
• Questions?

Page 3: Introduction to Support Vector Machines for Data Mining

Data Mining Techniques:

• Neural Networks
• Decision Trees
• Multivariate Adaptive Regression Splines (MARS)
• Rule Induction
• Nearest Neighbor Method and discriminant analysis
• Genetic Algorithms
• Support Vector Machines

Page 4: Introduction to Support Vector Machines for Data Mining

Support Vector Machines

• First introduced at COLT-92 by Boser, Guyon and Vapnik, building on the work of Vapnik and Chervonenkis
• Based on Statistical Learning Theory
• Applications
  • Classification
  • Regression
• Basic Theory

Page 5: Introduction to Support Vector Machines for Data Mining

Successful Applications of SVMs

• Protein Structure Prediction: http://www.cs.umn.edu/~hpark/papers/surface.pdf
• Intrusion Detection: www.cs.nmt.edu/~IT
• Handwriting Recognition
• Detecting Steganography in digital images: http://www.cs.dartmouth.edu/~farid/publications/ih02.html

Page 6: Introduction to Support Vector Machines for Data Mining

Successful Applications of SVMs

• Breast Cancer Prognosis: Chemotherapy Effect on Survival Rate (Lee, Mangasarian and Wolberg, 2001)
• Particle and Quark-Flavour Identification in High Energy Physics (http://wwwrunge.physik.uni-freiburg.de/preprints/EHEP9901.ps)
• Function Approximation

Page 7: Introduction to Support Vector Machines for Data Mining

Support Vector Machines (Linearly separable case)

[Figure: plot with horizontal axis 0 to 20 and vertical axis -2 to 10]

Page 8: Introduction to Support Vector Machines for Data Mining

Support Vector Machines (Linearly separable case)

[Figure: plot with horizontal axis 0 to 20 and vertical axis -2 to 10]

Page 9: Introduction to Support Vector Machines for Data Mining

Support Vector Machines (Linearly separable case)

[Figure: plot with horizontal axis 1 to 20 and vertical axis -2 to 10]

Page 10: Introduction to Support Vector Machines for Data Mining

Non-Linearly separable case

Page 11: Introduction to Support Vector Machines for Data Mining

SVM for Regression

In the case of regression, the goal is to construct a hyperplane that is close to as many points as possible.

For both classification and regression, learning is done via quadratic programming, which has a single global optimum.
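One common way to make "close to as many points as possible" precise is the epsilon-insensitive loss of standard SVM regression: points within a tube of half-width epsilon around the hyperplane cost nothing. The slides do not say which formulation is meant, so this is only a sketch under that assumption; the data and the value of epsilon are invented.

```python
def epsilon_insensitive_loss(w, b, xs, ys, epsilon):
    """Sum of max(0, |y - (w.x + b)| - epsilon) over all points."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += max(0.0, abs(y - prediction) - epsilon)
    return total


# Hypothetical 1-D data: three points near the line y = x, one far from it.
xs = [(1.0,), (2.0,), (3.0,), (4.0,)]
ys = [1.1, 1.9, 3.0, 5.0]

# Only the last point falls outside the tube, so only it contributes.
loss = epsilon_insensitive_loss((1.0,), 0.0, xs, ys, epsilon=0.25)
print(loss)
```

The LS-SVM variant used later in this deck replaces this loss with a squared-error term.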

Page 12: Introduction to Support Vector Machines for Data Mining

Strengths and Weaknesses of SVM

Strengths

• Training is relatively easy
  • No local optima, unlike in neural networks
• It scales relatively well to high-dimensional data

Weaknesses

• Need a “good” kernel function
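As an illustration of what a kernel function is, the sketch below implements the widely used Gaussian (RBF) kernel and builds its Gram matrix. The slides do not name a specific kernel; the choice of RBF, the points, and the value of gamma are assumptions made for the example.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, gamma):
    """Matrix of kernel values between every pair of points."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Hypothetical points: two close together, one far away.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
K = gram_matrix(points, gamma=0.5)

for row in K:
    print(row)
```

Nearby points get a kernel value close to 1 and distant points a value close to 0. How quickly the value decays is set by gamma, which is one reason choosing a "good" kernel takes some tuning.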

Page 13: Introduction to Support Vector Machines for Data Mining

Preliminary Experimentation: Forecasting GDP using Oil Prices (with F. Malik)

Forecasting model

Objective: To predict the Gross Domestic Product (GDP) for the next quarter using
• Oil prices (including time lag)
• GDP time
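A forecasting setup like this is usually turned into a supervised learning problem by pairing each quarter's GDP value with lagged values of the inputs. The sketch below shows one way to build such rows. The number of lags, the function name, and the sample values are hypothetical; the slides do not state how many lags were used.

```python
def make_lagged_rows(gdp, oil, n_lags):
    """Build (features, target) pairs for one-step-ahead forecasting.

    Features for quarter t are the previous n_lags GDP values followed by
    the previous n_lags oil values; the target is GDP at quarter t.
    """
    rows, targets = [], []
    for t in range(n_lags, len(gdp)):
        rows.append(gdp[t - n_lags:t] + oil[t - n_lags:t])
        targets.append(gdp[t])
    return rows, targets


# Hypothetical quarterly series (illustrative values only).
gdp = [0.01, 0.02, 0.03, 0.04]
oil = [0.1, 0.2, 0.3, 0.4]

rows, targets = make_lagged_rows(gdp, oil, n_lags=2)
print(rows)
print(targets)
```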

Page 14: Introduction to Support Vector Machines for Data Mining

Data Set

• We looked at quarterly oil prices and GDP data
  • January 1947 – December 2002
• Oil price data were obtained from the Bureau of Labor Statistics
• GDP data were obtained from the Bureau of Economic Analysis
• We used the growth rate of GDP and the growth rate of oil prices
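The slides do not define "growth rate"; a minimal sketch, assuming the simple period-over-period definition (x_t - x_{t-1}) / x_{t-1}, is shown below. A log-difference definition is also common and would give slightly different numbers. The sample levels are invented.

```python
def growth_rates(series):
    """Period-over-period growth: (current - previous) / previous."""
    return [(curr - prev) / prev for prev, curr in zip(series, series[1:])]


# Hypothetical quarterly levels (illustrative only).
levels = [100.0, 102.0, 99.96]

rates = growth_rates(levels)
print(rates)
```

Note that a series of n levels yields n - 1 growth rates, so the first quarter of the sample has no usable value.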

Page 15: Introduction to Support Vector Machines for Data Mining

Models

• Neural Networks
  • Back-propagation
  • One hidden layer
  • Delta rule was used for training
• LS-SVM (Van Gestel, 2001)
  • Matlab toolbox

Page 16: Introduction to Support Vector Machines for Data Mining

Experimentation

Created the training data, holding out GDP for the last 40 quarters as test data

Trained the neural network and the SVM

Used the model to predict GDP, and calculated the error of prediction
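The procedure above can be sketched as a chronological hold-out split followed by an error calculation. The sketch uses mean absolute error (MAE), the measure reported in the results. The function names and the tiny data set are hypothetical, and the "predictions" are made-up numbers standing in for a trained model's output.

```python
def split_last(rows, targets, n_test):
    """Hold out the final n_test observations as the test set."""
    train = (rows[:-n_test], targets[:-n_test])
    test = (rows[-n_test:], targets[-n_test:])
    return train, test


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all test points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical data: six quarters, the last two held out
# (the experiment in these slides held out 40).
rows = [[0], [1], [2], [3], [4], [5]]
targets = [0.01, 0.02, 0.00, 0.01, 0.01, 0.03]

(train_x, train_y), (test_x, test_y) = split_last(rows, targets, n_test=2)

# Stand-in predictions; a real run would come from the trained model.
predicted = [0.02, 0.01]
mae = mean_absolute_error(test_y, predicted)
print(mae)
```

Splitting by time rather than at random keeps the test quarters strictly after the training quarters, which matches how a forecast would be used.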

Page 17: Introduction to Support Vector Machines for Data Mining

Results

Model             MAE
Neural Network    0.0044
LS-SVM            0.0052

Page 18: Introduction to Support Vector Machines for Data Mining

Good References

Introductions

• Martin Law, “An Introduction to Support Vector Machines”
• Andrew Moore, “Support Vector Machines”, www.cs.cmu.edu/~awm
• N. Cristianini, www.support-vector.net/tutorial.html

In depth

• Support Vector Machines book, www.support-vector.net

Page 19: Introduction to Support Vector Machines for Data Mining

Questions

E-mail: [email protected]

Presentation will be posted (by Friday) at http://www.bklv.psu.edu/faculty/nasereddin