26
S1 Supplementary Information Development and Application of a Comprehensive Machine Learning Program for Predicting Molecular Biochemical and Pharmacological Properties Hwanho Choi, Hongsuk Kang, Kee-Choo Chung, and Hwangseo Park* Sejong Battery Institute, Sejong University, 209 Neungdong-ro, Kwangjin-gu, Seoul 05006, South Korea * Author to whom correspondence should may be addressed. [email protected] Chemical structures of the molecules in pK i (thrombin), logD, P Caco-2 , and CL int datasets. Linear correlation diagrams between the experimental and calculated logD values of 53 SAMPL5 molecules in the 5-fold external cross- validation. Linear correlation diagrams between the experimental and calculated logCL int values of 37 drug molecules in the 5-fold external cross- validation. S2-S16 S17-S21 S22-S26 Electronic Supplementary Material (ESI) for Physical Chemistry Chemical Physics. This journal is © the Owner Societies 2019

Supplementary Information Properties Molecular Biochemical

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Supplementary Information Properties Molecular Biochemical

S1

Supplementary Information

Development and Application of a Comprehensive Machine Learning Program for Predicting

Molecular Biochemical and Pharmacological Properties

Hwanho Choi, Hongsuk Kang, Kee-Choo Chung, and Hwangseo Park*

Sejong Battery Institute, Sejong University, 209 Neungdong-ro, Kwangjin-gu, Seoul 05006, South Korea

*Author to whom correspondence should may be addressed.

[email protected]

Chemical structures of the molecules in pKi (thrombin), logD, PCaco-2,

and CLint datasets.

Linear correlation diagrams between the experimental and calculated

logD values of 53 SAMPL5 molecules in the 5-fold external cross-

validation.

Linear correlation diagrams between the experimental and calculated

logCLint values of 37 drug molecules in the 5-fold external cross-

validation.

S2-S16

S17-S21

S22-S26

Electronic Supplementary Material (ESI) for Physical Chemistry Chemical Physics.This journal is © the Owner Societies 2019

Page 2: Supplementary Information Properties Molecular Biochemical

S2

Table S1. Chemical structures of 88 thrombin inhibitors. Indicated in red are the

molecules in the training set.

01 02 03

04 05 06

07 08 09

10 11 12

13 14 15

Page 3: Supplementary Information Properties Molecular Biochemical

S3

16 17 18

19 20 21

22 23 24

25 26 27

28 29 30

31 32 33

Page 4: Supplementary Information Properties Molecular Biochemical

S4

34 35 36

37 38 39

40 41 42

43 44 45

46 47 48

49 50 51

Page 5: Supplementary Information Properties Molecular Biochemical

S5

52 53 54

55 56 57

58 59 60

61 62 63

64 65 66

67 68 69

Page 6: Supplementary Information Properties Molecular Biochemical

S6

70 71 72

73 74 75

76 77 78

79 80 81

82 83 84

85 86 87

Page 7: Supplementary Information Properties Molecular Biochemical

S7

88

Page 8: Supplementary Information Properties Molecular Biochemical

S8

Table S2. Chemical structures of 53 SAMPL5 molecules

SAMPL5_002 SAMPL5_003 SAMPL5_004

SAMPL5_005 SAMPL5_006 SAMPL5_007

SAMPL5_010 SAMPL5_011 SAMPL5_013

SAMPL5_015 SAMPL5_017 SAMPL5_019

SAMPL5_020 SAMPL5_021 SAMPL5_024

SAMPL5_026 SAMPL5_027 SAMPL5_033

Page 9: Supplementary Information Properties Molecular Biochemical

S9

SAMPL5_037 SAMPL5_042 SAMPL5_044

SAMPL5_045 SAMPL5_046 SAMPL5_047

SAMPL5_048 SAMPL5_049 SAMPL5_050

SAMPL5_055 SAMPL5_056 SAMPL5_058

SAMPL5_059 SAMPL5_060 SAMPL5_061

SAMPL5_063 SAMPL5_065 SAMPL5_067

Page 10: Supplementary Information Properties Molecular Biochemical

S10

SAMPL5_068 SAMPL5_069 SAMPL5_070

SAMPL5_071 SAMPL5_072 SAMPL5_074

SAMPL5_075 SAMPL5_080 SAMPL5_081

SAMPL5_082 SAMPL5_083 SAMPL5_084

SAMPL5_085 SAMPL5_086 SAMPL5_088

SAMPL5_090 SAMPL5_092

Page 11: Supplementary Information Properties Molecular Biochemical

S11

Table S3. Chemical structures of 38 drug molecules in Caco-2 dataset. Indicated in red

are the molecules in the training set.

Salicylic acid Dopamine Urea

Phenytoin Acyclovir Alprenolol

Atenolol Caffeine Chlorpromazine

Clonidine Desipramine Diazepam

Ganciclovir Indomethacin Labetalol

Page 12: Supplementary Information Properties Molecular Biochemical

S12

Metoprolol Pindolol Propranolol

Terbutaline Timolol Dexamethasone

Corticosterone Hydrocortisone Estradiol

Sucrose Progesterone Aminophenazone

Testosterone Mannitol Phencyclidine

Zidovudine Nadolol Nicotine

Page 13: Supplementary Information Properties Molecular Biochemical

S13

Bremazocine Scopolamine Sulfasalazine

Meloxicam Warfarin

Page 14: Supplementary Information Properties Molecular Biochemical

S14

Table S4. Chemical structures of 37 drug molecules in CLint dataset

Phenytoin Metoprolol Amitryptyline

Omeprazole Diltiazem Clozapine

Alprazolam Zolpidem Imipramine

Tenoxicam Methohexital Chlorpromazine

Methoxsalen Propranolol Diclofenac

Nicardipine Prednisone Verapamil

Page 15: Supplementary Information Properties Molecular Biochemical

S15

Midazolam Diazepam Triazolam

Quinidine Ibuprofen Diphenhydramine

Fluvastatin Tolbutamide Desipramine

Propafenone Dexamethasone Amobarbital

Hexobarbital Phenacetin Nilvadipine

Lorcainide FK1052 FK480

Page 16: Supplementary Information Properties Molecular Biochemical

S16

Tenidap

Page 17: Supplementary Information Properties Molecular Biochemical

S17

Figure S1. Linear correlation diagrams between the experimental and calculated results for

logD values of the molecules in training (black) and test set (red).

Fold #1

Compounds in the training set: 002, 003, 004, 006, 007, 010, 011, 015, 017, 019, 021, 024, 026, 027, 037, 042, 045, 046, 048, 049, 050, 055, 056, 058, 059, 060, 063, 065, 070, 072, 074, 075, 081, 082, 083, 084, 088, 090, 092

Compounds in the test set: 005, 013, 020, 033, 044, 047, 061, 067, 068, 069, 071, 080, 085, 086

Page 18: Supplementary Information Properties Molecular Biochemical

S18

Figure S2. Linear correlation diagrams between the experimental and calculated results for

LogD values of the molecules in training (black) and test set (red).

Fold #2

Compounds in the training set: 003, 005, 006, 007, 010, 011, 015, 017, 019, 020, 024, 027, 033, 037, 042, 044, 045, 048, 049, 055, 058, 059, 060, 061, 065, 067, 068, 069, 072, 074, 075, 080, 082, 083, 084, 085, 088, 090, 092

Compounds in the test set: 002, 004, 013, 021, 026, 046, 047, 050, 056, 063, 070, 071, 081, 086

Page 19: Supplementary Information Properties Molecular Biochemical

S19

Figure S3. Linear correlation diagrams between the experimental and calculated results for

LogD values of the molecules in training (black) and test set (red).

Fold #3

Compounds in the training set: 002, 003, 004, 005, 006, 007, 010, 011, 013, 015, 017, 020, 021, 024, 026, 027, 033, 037, 044, 046, 047, 048, 049, 050, 055, 058, 059, 063, 065, 069, 071, 080, 081, 082, 083, 084, 085, 088, 090

Compounds in the test set: 019, 042, 045, 056, 060, 061, 067, 068, 070, 072, 074, 075, 086, 092

Page 20: Supplementary Information Properties Molecular Biochemical

S20

Figure S4. Linear correlation diagrams between the experimental and calculated results for

LogD values of the molecules in training (black) and test set (red).

Fold #4

Compounds in the training set: 002, 003, 006, 007, 010, 011, 013, 015, 017, 019, 021, 026, 027, 033, 037, 042, 044, 045, 046, 047, 055, 056, 058, 059, 060, 061, 063, 068, 070, 071, 072, 074, 080, 082, 083, 085, 086, 088, 090

Compounds in the test set: 004, 005, 020, 024, 048, 049, 050, 065, 067, 069, 075, 081, 084, 092

Page 21: Supplementary Information Properties Molecular Biochemical

S21

Figure S5. Linear correlation diagrams between the experimental and calculated results for

LogD values of the molecules in training (black) and test set (red).

Fold #5

Compounds in the training set: 002, 003, 005, 006, 007, 010, 013, 015, 017, 019, 020, 024, 026, 027, 033, 037, 042, 044, 045, 048, 049, 056, 059, 061, 063, 065, 068, 069, 070, 071, 074, 075, 080, 082, 083, 084, 085, 090, 092

Compounds in the test set: 004, 011, 021, 046, 047, 050, 055, 058, 060, 067, 072, 081, 086, 088,

Page 22: Supplementary Information Properties Molecular Biochemical

S22

Figure S6. Linear correlation diagrams between the experimental and calculated results for

logCL values of the molecules in training (black) and test set (red).

Fold #1

Compounds in the training set: phenytoin, amitryptyline, omeprazole, clozapine, zolpidem, tenoxicam, methohexital, chlorpromazine, methoxsalen, propranolol, diclofenac, nicardipine, prednisone, verapamil, midazolam, diazepam, triazolam, quinidine, ibuprofen, diphenhydramine, fluvastatin, tolbutamide, desipramine, amobarbital, phenacetin, fk1052, lorcainide, tenidap

Compounds in the test set: metoprolol, diltiazem, alprazolam, imipramine, propafenone, dexamethasone, hexobarbital, nilvadipine, fk480

Page 23: Supplementary Information Properties Molecular Biochemical

S23

Figure S7. Linear correlation diagrams between the experimental and calculated results for

logCL values of the molecules in training (black) and test set (red).

Fold #2

Compounds in the training set: metoprolol, omeprazole, diltiazem, clozapine, alprazolam, imipramine, tenoxicam, methohexital, methoxsalen, propranolol, diclofenac, nicardipine, verapamil, midazolam, diazepam, triazolam, quinidine, ibuprofen, fluvastatin, desipramine, propafenone, dexamethasone, amobarbital, hexobarbital, phenacetin, nilvadipine, fk480, lorcainide

Compounds in the test set: phenytoin, amitryptyline, zolpidem, chlorpromazine, prednisone, diphenhydramine, tolbutamide, fk1052, tenidap

Page 24: Supplementary Information Properties Molecular Biochemical

S24

Figure S8. Linear correlation diagrams between the experimental and calculated results for

logCL values of the molecules in training (black) and test set (red).

Fold #3

Compounds in the training set: phenytoin, amitryptyline, omeprazole, diltiazem, alprazolam, zolpidem, imipramine, tenoxicam, methoxsalen, propranolol, nicardipine, prednisone, verapamil, midazolam, diazepam, triazolam, quinidine, ibuprofen, diphenhydramine, fluvastatin, tolbutamide, desipramine, propafenone, amobarbital, phenacetin, nilvadipine, fk1052, fk480

Compounds in the test set: metoprolol, clozapine, methohexital, chlorpromazine, diclofenac, dexamethasone, hexobarbital, lorcainide, tenidap

Page 25: Supplementary Information Properties Molecular Biochemical

S25

Figure S9. Linear correlation diagrams between the experimental and calculated results for

logCL values of the molecules in training (black) and test set (red).

Fold #4

Compounds in the training set: metoprolol, amitryptyline, omeprazole, clozapine, alprazolam, zolpidem, tenoxicam, methohexital, chlorpromazine, methoxsalen, propranolol, diclofenac, nicardipine, prednisone, verapamil, diazepam, triazolam, quinidine, ibuprofen, fluvastatin, desipramine, propafenone, dexamethasone, amobarbital, phenacetin, nilvadipine, fk1052, tenidap

Compounds in the test set: phenytoin, diltiazem, imipramine, midazolam, diphenhydramine, tolbutamide, hexobarbital, fk480, lorcainide

Page 26: Supplementary Information Properties Molecular Biochemical

S26

Figure S10. Linear correlation diagrams between the experimental and calculated results for

logCL values of the molecules in training (black) and test set (red).

Fold #5

Compounds in the training set: phenytoin, metoprolol, amitryptyline, omeprazole, diltiazem, zolpidem, imipramine, tenoxicam, methohexital, chlorpromazine, methoxsalen, diclofenac, prednisone, midazolam, triazolam, quinidine, diphenhydramine, fluvastatin, tolbutamide, desipramine, dexamethasone, amobarbital, hexobarbital, nilvadipine, fk1052, fk480, lorcainide, tenidap

Compounds in the test set: clozapine, alprazolam, propranolol, nicardipine, verapamil, diazepam, ibuprofen, propafenone, phenacetin