Introduction Support Vector Regression QSAR Problems and Data SVMs for QSAR Linear Program Feature Selection Model Selection and Bagging Computational Results Discussion

Page 1: Introduction   Support Vector Regression   QSAR Problems and Data   SVMs for QSAR

Introduction

Support Vector Regression

QSAR Problems and Data

SVMs for QSAR

Linear Program Feature Selection

Model Selection and Bagging

Computational Results

Discussion

Page 2

Support Vector Regression

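ε-insensitive loss function

$$
L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\ |y - (w \cdot x + b)| - \varepsilon\bigr)
$$

Residuals with $|y - (w \cdot x + b)| \le \varepsilon$ lie inside the tube and incur no loss.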
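A minimal sketch of the ε-insensitive loss for a linear model, assuming the usual definition max(0, |y − (w·x + b)| − ε). The function names and the sample numbers are hypothetical and only illustrate the flat region inside the tube.

```python
def linear_predict(w, x, b):
    """Linear model f(x) = w . x + b for plain Python sequences."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def eps_insensitive_loss(y, y_pred, eps):
    """Zero when |y - y_pred| <= eps, otherwise the excess over eps."""
    return max(0.0, abs(y - y_pred) - eps)


if __name__ == "__main__":
    w, b = [0.5, -1.0], 0.25        # hypothetical weights and bias
    x, y = [2.0, 1.0], 1.0          # hypothetical sample and target
    f = linear_predict(w, x, b)
    # A narrow tube penalizes the residual; a wide tube ignores it.
    print(eps_insensitive_loss(y, f, eps=0.1))
    print(eps_insensitive_loss(y, f, eps=1.0))
```

Errors smaller than ε cost nothing, so only points on or outside the tube influence the fitted function.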

Page 3

Quadratic SVMs with L2-norm

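$$
\begin{aligned}
\min_{\alpha,\,\alpha^*}\quad & \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)K_{ij} + \varepsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^*) - \sum_{i=1}^{l} y_i(\alpha_i-\alpha_i^*) \\
\text{s.t.}\quad & \sum_{i=1}^{l}(\alpha_i-\alpha_i^*) = 0, \\
& 0 \le \alpha_i,\ \alpha_i^* \le C, \quad i = 1,\dots,l
\end{aligned}
$$

where $K_{ij} = K(x_i, x_j)$.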
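A small sketch of how a dual solution is used, assuming the standard ε-SVR dual with coefficients α and α* and the kernel expansion f(x) = Σ (α_i − α_i*) K(x_i, x) + b. The Gaussian kernel and all function names here are assumptions for illustration; the slides do not state which kernel was used.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||u - v||^2); the kernel choice is assumed."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svr_predict(x, train_x, alpha, alpha_star, b, kernel=rbf_kernel):
    """Kernel expansion f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b."""
    return sum((a - a_s) * kernel(xi, x)
               for xi, a, a_s in zip(train_x, alpha, alpha_star)) + b


def dual_objective(train_x, y, alpha, alpha_star, eps, kernel=rbf_kernel):
    """Value of the epsilon-SVR dual objective in minimization form."""
    beta = [a - a_s for a, a_s in zip(alpha, alpha_star)]
    quad = 0.5 * sum(bi * bj * kernel(xi, xj)
                     for bi, xi in zip(beta, train_x)
                     for bj, xj in zip(beta, train_x))
    tube = eps * sum(a + a_s for a, a_s in zip(alpha, alpha_star))
    fit = sum(yi * bi for yi, bi in zip(y, beta))
    return quad + tube - fit
```

Only training points with α_i ≠ α_i* contribute to the prediction, which is why the solution is described in terms of support vectors.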

Page 4

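Linear SVMs with L1-norm (ν-SVR)

$$
\begin{aligned}
\min\quad & \frac{1}{l}\sum_{j=1}^{l}(\alpha_j+\alpha_j^*) + \frac{C}{l}\sum_{i=1}^{l}(\xi_i+\xi_i^*) + C\nu\varepsilon \\
\text{s.t.}\quad & y_i - \sum_{j=1}^{l}(\alpha_j-\alpha_j^*)K_{ij} - b \le \varepsilon + \xi_i, \\
& \sum_{j=1}^{l}(\alpha_j-\alpha_j^*)K_{ij} + b - y_i \le \varepsilon + \xi_i^*, \\
& \alpha_j,\ \alpha_j^*,\ \xi_i,\ \xi_i^*,\ \varepsilon \ge 0, \quad i, j = 1,\dots,l
\end{aligned}
$$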
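To make the roles of the terms concrete, here is a sketch that evaluates the L1-norm ν-SVR objective for a candidate solution. It assumes the usual linear-programming form, (1/l) Σ (α_j + α_j*) + (C/l) Σ (ξ_i + ξ_i*) + C ν ε, with the slacks set to the smallest values that satisfy the two inequality constraints. The function name and the argument layout are hypothetical.

```python
def lp_svr_objective(K, y, alpha, alpha_star, b, eps, C, nu):
    """Objective of the L1-norm nu-SVR linear program for given variables.

    K is the l x l kernel matrix as a list of rows. alpha, alpha_star and
    eps are assumed nonnegative. Each slack takes its minimal feasible
    value, so the result is the best objective for this (alpha, alpha*, b, eps).
    """
    l = len(y)
    beta = [a - a_s for a, a_s in zip(alpha, alpha_star)]
    slack_sum = 0.0
    for i in range(l):
        f_i = sum(beta[j] * K[i][j] for j in range(l)) + b
        xi = max(0.0, y[i] - f_i - eps)        # under-prediction beyond the tube
        xi_star = max(0.0, f_i - y[i] - eps)   # over-prediction beyond the tube
        slack_sum += xi + xi_star
    l1_term = sum(alpha) + sum(alpha_star)
    return l1_term / l + C * slack_sum / l + C * nu * eps
```

Because the objective and the constraints are linear in all variables, any LP solver can be used in place of a quadratic programming solver; the tube width ε is itself optimized, with ν controlling the trade-off.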

Page 5

QSAR Problems and Data

SVMs for QSAR

Statistical Analysis, QSAR Model Building

Calculation of Descriptors

3D Geometry Optimization

Preparation of Input Data (Bioactivity value, Structures)

Page 6

Data Sets

HIV dataset: five classes of anti-HIV molecules, 64 molecules, 620 descriptors

Lombardo benchmark dataset: blood-brain barrier partitioning, 62 molecules, 649 descriptors

Data Matrix

              descriptor 1   descriptor 2   ...   descriptor m   Activity
Molecule 1    x11            x12            ...   x1m            ln BB
Molecule 2    x21            x22            ...   x2m            ln BB
...
Molecule n    xn1            xn2            ...   xnm            ln BB

Page 7

Data Matrix

              descriptor 1   descriptor 2   descriptor 3   ...   descriptor m   Activity
Molecule 1    x11            x12            x13            ...   x1m            ln BB
Molecule 2    x21            x22            x23            ...   x2m            ln BB
...
Molecule n    xn1            xn2            xn3            ...   xnm            ln BB

Page 8

SVMs for QSAR

Construct Datasets

Final Model

Optimize Model

Model Selection (C, ε, ν, kernel parameter)

Bagging Models

Feature Selection

Page 9

Linear Program Feature Selection

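$$
\begin{aligned}
\min\quad & \sum_{j=1}^{n}(\alpha_j+\alpha_j^*) + \frac{C}{l}\sum_{i=1}^{l}(\xi_i+\xi_i^*) + C\nu\varepsilon \\
\text{s.t.}\quad & y_i - \sum_{j=1}^{n}(\alpha_j-\alpha_j^*)x_{ij} - b \le \varepsilon + \xi_i, \\
& \sum_{j=1}^{n}(\alpha_j-\alpha_j^*)x_{ij} + b - y_i \le \varepsilon + \xi_i^*, \\
& \alpha_j,\ \alpha_j^*,\ \xi_i,\ \xi_i^*,\ \varepsilon \ge 0, \quad i = 1,\dots,l,\ j = 1,\dots,n
\end{aligned}
$$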
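One plausible way to turn the solution of this linear program into a feature subset is sketched below. It assumes that the weight of descriptor j is w_j = α_j − α_j* and that descriptors whose weight is driven to zero by the L1 penalty are dropped. The tolerance and both function names are hypothetical; the slides do not spell out the exact selection rule.

```python
def select_features(alpha, alpha_star, tol=1e-8):
    """Return indices j whose weight w_j = alpha_j - alpha_j* is nonzero."""
    weights = [a - a_s for a, a_s in zip(alpha, alpha_star)]
    return [j for j, w in enumerate(weights) if abs(w) > tol]


def reduce_matrix(X, kept):
    """Keep only the selected descriptor columns of a row-major data matrix."""
    return [[row[j] for j in kept] for row in X]


if __name__ == "__main__":
    alpha = [0.0, 0.3, 0.0, 0.0]        # hypothetical LP solution
    alpha_star = [0.0, 0.0, 0.0, 1.2]
    kept = select_features(alpha, alpha_star)
    X = [[1, 2, 3, 4], [5, 6, 7, 8]]    # two molecules, four descriptors
    print(kept, reduce_matrix(X, kept))
```

The reduced matrix can then be passed to a nonlinear SVM, so the sparse linear model serves only as a filter.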

Page 10

Bagging

• Different validation sets give different models

• Many local minima in SVM parameter search

• Average models
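The averaging step can be sketched as follows. This assumes bootstrap resampling of the training set, as in standard bagging; the slides mention different validation sets, so the exact resampling scheme used there may differ. `fit_mean_model` is a hypothetical stand-in learner, not the SVM from the slides.

```python
import random


def bootstrap_sample(X, y, rng):
    """Draw len(y) training pairs with replacement."""
    idx = [rng.randrange(len(y)) for _ in range(len(y))]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_mean_model(X, y):
    """Stand-in learner (hypothetical): predicts the training mean of y."""
    mean_y = sum(y) / len(y)
    return lambda x: mean_y


def bagged_predictor(X, y, fit, n_models=10, seed=0):
    """Train n_models learners on resampled data and average their outputs."""
    rng = random.Random(seed)
    models = [fit(*bootstrap_sample(X, y, rng)) for _ in range(n_models)]
    return lambda x: sum(m(x) for m in models) / len(models)
```

Any function that takes (X, y) and returns a callable predictor can be passed as `fit`, so an SVM trainer could replace the stand-in. Averaging reduces the variance caused by the choice of split and by local minima in the parameter search.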

Model Selection

• Choose SVM model parameters: C, ε or ν, and the kernel parameter

• Select evaluation function Q2

• Evaluate on testing data

• Adjust using cross validation
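The selection loop above can be sketched as a k-fold search. Two assumptions are made here and should be checked against the authors' definitions: Q2 is taken as the sum of squared prediction errors divided by the total sum of squares about the mean (so lower is better), and folds are assigned by index modulo k. `make_fit` is a hypothetical factory that maps a parameter value to a training function.

```python
def q2(y_true, y_pred):
    """Assumed definition: squared error / total sum of squares about the mean."""
    mean_y = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_y) ** 2 for t in y_true)
    return sse / sst


def k_fold_predictions(X, y, fit, k=10):
    """Predict every sample from a model trained on the other folds."""
    n = len(y)
    preds = [None] * n
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = model(X[i])
    return preds


def select_parameter(X, y, make_fit, candidates, k=10):
    """Return the candidate with the lowest cross-validated Q2, plus all scores."""
    scores = {c: q2(y, k_fold_predictions(X, y, make_fit(c), k))
              for c in candidates}
    return min(scores, key=scores.get), scores
```

In practice `candidates` would be a grid over C, ε or ν, and the kernel setting, with each grid point evaluated on held-out folds only.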

Page 11

Computational Results

Methods (10-fold CV)   Full Data (649)    LP FS (21)       NN SA (9)
                       Q2      q2         Q2     q2        Q2     q2
L1-SVM                 .384    .382       .157   .153      .219   .217
L2-SVM                 .310    .292       .171   .160      .247   .245
NN                     .320    .301       .222   .193      .247   .238

Page 12

[Figure: four scatterplots of Predicted Response vs. Observed Response, both axes from 0 to 1, with points labeled by molecule number 1 to 62]

SCATTERPLOT DATA (SVM1LOMBFULL): Q2 = 0.384, q2 = 0.382, RMSE = 0.500

SCATTERPLOT DATA (SVM1LOMBLPFS): Q2 = 0.157, q2 = 0.153, RMSE = 0.316

SCATTERPLOT DATA (SVM1LOMBNNSA): Q2 = 0.219, q2 = 0.217, RMSE = 0.117

SCATTERPLOT DATA (SVM2LOMBLPFS): Q2 = 0.171, q2 = 0.160, RMSE = 0.104

Page 13

This work is supported by NSF (IIS-9979860 and 970923)

Discussion

Robust optimization methods

LPFS outperforms NNSA

L1-SVM can run faster than L2-SVM

• May improve LPFS method

• May improve performance of L1-SVM