CORAL SEA. Workflow.
The software “CORALSEA” is a tool for building quantitative structure-property/activity relationships (QSPRs/QSARs).
In CORALSEA, the molecular structure is represented by SMILES (simplified molecular input-line entry system).
For details, please see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
For this demo of CORALSEA we use our model from the article “THE DEFINITION OF THE MOLECULAR STRUCTURE FOR POTENTIAL ANTI-MALARIA AGENTS BY THE MONTE CARLO METHOD”, Struct. Chem. 2013; 24:1369–1381.
You can develop a better model, but for now please follow our suggestions.
The first step is to prepare the SMILES file, which is the input for CORALSEA:
+1 COc1ccc2c(c1)NC(C)=C(CCCCCCC)C2=O 7.332
+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903
+3 O=C1c2ccccc2NC(C)=C1CCCCCCC 6.979
+4 O=C1c2ccccc2NC(C)=C1CCCCCCCCC 7.400
#5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652
-6 O=C1c3ccccc3NC(C)=C1c2ccccc2 6.270
+7 O=C2c3ccccc3NC(C)=C2Cc1ccccc1 5.207
+8 O=C1c2ccccc2NC(C)=C1Br 7.110
-9 O=C1c2ccccc2NC(C)=C1\C=C\CCCCCCC 7.824
+10 C=C(CCCCCCC)C=1C(=O)c2ccccc2NC=1C 7.472
+12 O=C2c3ccccc3NC(C)=C2/C=C/c1ccccc1 5.827
+13 COc1ccc2NC(C)=C(Br)C(=O)c2c1 5.934
-14 Cc1ccc2NC(C)=C(Br)C(=O)c2c1 6.583
#15 Brc1ccc2NC(C)=C(Br)C(=O)c2c1 6.470
+17 Fc1ccc2NC(C)=C(Br)C(=O)c2c1 6.903
+18 Clc1ccc2NC(C)=C(C#CCCCC)C(=O)c2c1 4.336
#19 COc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.675
-21 COc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 5.859
-22 COc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.295
-23 COc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 6.570
+24 COc3cccc1c3NC(C)=C(C1=O)c2ccccc2 5.779
-25 Clc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.279
#26 Clc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 5.485
#28 Clc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.324
-29 Clc1ccc2NC(C)=C(C(=O)c2c1)c3ccccc3 6.110
-30 Clc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 5.731
-31 Clc1ccc2NC(C)=C(C(=O)c2c1Cl)c3ccccc3 5.493
#33 Clc1cc2NC(C)=C(C(=O)c2c(Cl)c1)c3ccccc3 5.464
#34 COc1ccc3c(c1)C(=O)C(Cc2ccccc2)=C(C)N3C 5.094
+35 COc1ccc3c(c1)N(C)C(C)=C(Cc2ccccc2)C3=O 5.106
+36 Fc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.081
+37 Clc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.815
+38 Brc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.602
#39 Fc1cc2c(cc1OC)NC(C)=C(CC)C2=O 6.793
+41 Brc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.440
-44 Clc1cc2c(cc1OC)NC(C)=C(C2=O)C3CCCCC3 6.401
+45 Clc1cc3c(cc1OC)NC(C)=C(Cc2ccccc2)C3=O 7.164
-46 Clc1cc2c(cc1OC)NC(C)=C(C)C2=O 7.564
#47 CC(C)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 6.712
+48 CC(CC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.199
+49 Clc1cc2c(cc1OC)NC(C)=CC2=O 5.731
-50 Clc1cc2c(cc1OC)NC(C)=C(C#CCCCC)C2=O 5.376
#53 CC(C)(C)OC(=O)/C=C/C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.271
Each compound is represented by (1) the type, one of [+, -, #]; (2) the ID, which can be a CAS (Chemical Abstracts Service) number or any other number; (3) the SMILES; and (4) the endpoint value.
“+” indicates the sub-training set; “-” indicates the calibration set; “#” indicates the test set.
The sub-training set acts as the developer of the model, the calibration set as its critic, and the test set as its estimator.
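Before running CORALSEA, it can help to check that every line of the input file follows this layout. The sketch below is not part of CORALSEA. It assumes that the three fields are separated by whitespace and that the type marker is written directly in front of the ID, as in the list above. The names parse_line and split_sets are hypothetical helpers introduced only for this illustration.

```python
from collections import defaultdict

# Marker-to-set mapping as described in the slides.
SET_NAMES = {"+": "sub-training", "-": "calibration", "#": "test"}


def parse_line(line):
    """Split one 'typeID SMILES endpoint' line into its four fields."""
    head, smiles, endpoint = line.split()
    marker, compound_id = head[0], head[1:]
    if marker not in SET_NAMES:
        raise ValueError(f"unknown set marker {marker!r} in line: {line!r}")
    return SET_NAMES[marker], compound_id, smiles, float(endpoint)


def split_sets(lines):
    """Group the parsed compounds by the set they belong to."""
    sets = defaultdict(list)
    for line in lines:
        if line.strip():  # skip empty lines
            set_name, compound_id, smiles, endpoint = parse_line(line)
            sets[set_name].append((compound_id, smiles, endpoint))
    return dict(sets)


# Three lines taken from the demo file above.
sample = [
    "+1 COc1ccc2c(c1)NC(C)=C(CCCCCCC)C2=O 7.332",
    "#5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652",
    "-6 O=C1c3ccccc3NC(C)=C1c2ccccc2 6.270",
]
sets = split_sets(sample)
for name, compounds in sets.items():
    print(name, len(compounds))
```

A line with a missing field or an unexpected marker raises a ValueError, so a malformed file is detected before the model is built.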
MyFile.txt
It is a good idea to reserve some substances as an "invisible" validation set for the final estimation of the model:
10
*11 O=C1c2ccccc2NC(C)=C1C\C=C\CCCCCC 6.728
*16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900
*20 COc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 4.624
*27 Clc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 4.805
*32 Clc1cc2c(cc1Cl)NC(C)=C(C2=O)c3ccccc3 6.456
*40 Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.559
*42 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCC)C2=O 8.530
*43 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCCCC)C2=O 8.779
*51 C=C(CCCCC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.830
*52 Clc1cc2c(cc1OC)NC(C)=C(\C=C\CCCCC)C2=O 7.975
The file for this validation has the following format:
(1) the number of compounds; (2) the list of compounds in the above-mentioned format: type, ID, SMILES, endpoint value.
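The count on the first line is easy to get out of step with the list when compounds are added or removed by hand. The following sketch reads such a file and checks the count. It is an illustration only, under the assumption that fields are whitespace-separated and the type marker is attached to the ID. The function name read_validation is hypothetical, and the example text is a shortened two-compound file built from entries 16 and 20 above.

```python
def read_validation(text):
    """Parse a validation file: a compound count, then one compound per line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    declared = int(lines[0])
    compounds = []
    for line in lines[1:]:
        head, smiles, endpoint = line.split()
        # head is the type marker ('*' in the example above) fused with the ID
        compounds.append((head[0], head[1:], smiles, float(endpoint)))
    if len(compounds) != declared:
        raise ValueError(
            f"first line declares {declared} compounds, found {len(compounds)}"
        )
    return compounds


# Shortened example: two of the ten validation compounds.
example = """2
*16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900
*20 COc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 4.624
"""
validation = read_validation(example)
for marker, compound_id, smiles, endpoint in validation:
    print(marker, compound_id, smiles, endpoint)
```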
MyInput.txt
To start your work, download CORALSEA.zip from www.insilico.eu/coral. When the download is done, place the folder "CORALSEA" on your computer.
To carry out a QSPR/QSAR analysis for a CLASSIFICATION MODEL, do the following:
(i) Insert “#TRNCLBTST-1.txt” in the folder;
(ii) Insert “#Input-1.txt” in the folder;
(iii) Click CORALSEA.exe.
“#TRNCLBTST.txt” is the file that contains the training (TRN), calibration (CLB), and test (TST) sets.
“#Input.txt” contains the data that are not visible while the model is being built.
The following appears on your screen: [screenshot]
The file “#Output-1.txt” contains the statistical characteristics for the validation set (“#Output-1.txt” is placed in the folder “Model”).
To carry out a QSPR/QSAR analysis for a REGRESSION MODEL, do the following:
(i) Insert “#TRNCLBTST.txt” in the folder;
(ii) Insert “#Input-1.txt” in the folder;
(iii) Click CORALSEA.exe.
“#TRNCLBTST.txt” is the file that contains the training (TRN), calibration (CLB), and test (TST) sets.
“#Input.txt” contains the data that are not visible while the model is being built.
The following appears on your screen: [screenshot]
Insert the name “#TRNCLBTST-1.txt” in the text box. After this, select “Classic Scheme” or “Balance of Correlation” for your QSPR/QSAR investigation.
The following appears on your screen: [screenshot]
You can use the “classic scheme”, the balance of correlations, and the ideal slopes C1, C1’.
The following appears on your screen: [screenshot]
You can choose your mode, e.g. (1) define Dstart=0.25; (2) set Nepoch=20; after this you must (3) click “Save method”, otherwise the method remains the same.
The following appears on your screen: [screenshot]
The program will carry out the Monte Carlo optimization with various thresholds and numbers of epochs. When the calculation is completed, the preferable values of the threshold and the number of epochs can be found in the file “Search/BestMDL.txt”.
The contents of the file “Search/BestMDL.txt” will be approximately the following: [screenshot]
One can see that the preferable threshold (T*) is 2 and the preferable number of epochs (N*) is 15. This information can be used to build a robust model.
An attempt to build a robust model…
Create the folder “MyCORALSEA-T2-N15” (a copy of “MyCORALSEA”).
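The copy can be made by hand in the file manager. If several (T*, N*) combinations are to be tried, a small script keeps the folder names consistent. The sketch below is not part of CORALSEA. The function name clone_work_folder is hypothetical, and the demonstration runs in a temporary directory with a dummy input file, so no real data are touched.

```python
import shutil
import tempfile
from pathlib import Path


def clone_work_folder(source, threshold, n_epochs):
    """Copy `source` to a sibling folder named '<source>-T<threshold>-N<n_epochs>'."""
    src = Path(source)
    dst = src.with_name(f"{src.name}-T{threshold}-N{n_epochs}")
    shutil.copytree(src, dst)  # raises FileExistsError if dst already exists
    return dst


# Demonstration on a throwaway directory with a one-line dummy input file.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "MyCORALSEA"
    work.mkdir()
    (work / "#TRNCLBTST-1.txt").write_text("+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903\n")
    copy = clone_work_folder(work, threshold=2, n_epochs=15)
    copied_name = copy.name
    copied_files = sorted(p.name for p in copy.iterdir())
    print(copied_name, copied_files)
```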
Run CORALSEA.exe in this folder “MyCORALSEA-T2-N15”
Click “Load method”
The following appears on your screen: [screenshot]
(1) Insert Nepoch=15;
(2) Click “Building up preferable model (T*,N*)” (here T*=2, N*=15);
(3) Insert Threshold=2;
(4) Click “Continue”.
The folder “Model” contains the parameters of the QSPR/QSAR model.
The file “#Output-1.txt” contains the statistics for the invisible validation set.
The following appears on the screen: [screenshot]
(1) Insert the name “MyInput.txt” instead of “#Input-1.txt”;
(2) Click “Start of DCW and Endpoint calculation for SMILES input file”.
The following appears on the screen: [screenshot]
After these actions, the file “Model/Output.txt” will contain the results of the calculation for the compounds from “MyInput.txt”.
Click “OK”.
The following appears on the screen: [screenshot]
You will see a graphical representation of the sub-training, calibration, test, and validation sets.
One can also calculate the model for an individual SMILES:
(1) Insert the SMILES in the indicated box;
(2) Click “Start of DCW and Endpoint Calculation for Inserted SMILES”.
The contents of “Model/DemoDesc.txt” are the following:
DCW is DCW(2,15) for NC(CCCNC(N)=N)C(O)=O; Endpoint=2.9412.
This example is only a demo; NC(CCCNC(N)=N)C(O)=O is apparently outside the domain of applicability.
These slides have shown the "technology"; to understand the "philosophy", please read the file "ReadMe.pdf".