Institute of Biomedical Chemistry of Rus. Acad. Med. Sci.; *A.N. Sysin Institute of Human Ecology and Environmental Health of Rus. Acad. Med. Sci., Moscow, Russia.

(Q)SAR and (Q)AAR analysis of ToxCast Dataset Using PASS and GUSAR approaches

Vladimir Poroikov, Dmitry Filimonov, Alexey Zakharov, Alexey Lagunin, Sergey Novikov*


Acknowledgements. We gratefully acknowledge Prof. Alex Tropsha for his kind assistance in presenting the results at the ToxCast Poster Session. The work was supported in part by the FP7 project 200787 (OpenTox) and ISTC project # 3777.

Apologies. I am sorry that I did not obtain a US visa in time and was therefore unable to take part in the ToxCast Workshop on May 14-15, 2009. If you have any questions or suggestions, please do not hesitate to contact me: [email protected]; tel: 7 499 246-0920; fax: 7 499 245-0857.

Introduction

The aims of the study were:
(1) To estimate whether ToxCast Phase 1 (TC1) in vivo data can be predicted from structural formulae, physical-chemical properties and in vitro data from the TC1 dataset.
(2) To estimate whether molecules from the TC1 dataset can be prioritized for toxicological testing using an integral parameter.

Materials

In vivo and in vitro assay data for chemical compounds from the ToxCast Phase 1 dataset were used for (quantitative) structure-activity relationship ((Q)SAR) and (quantitative) activity-activity relationship ((Q)AAR) analysis. The CPDB dataset (CPDB, 25 October 2007) was used as the training set for predicting carcinogenicity in the in vivo ToxCast assays. The data were extracted from the EPA Distributed Structure-Searchable Toxicity (DSSTox) Public Database Network [1]. We used 1397 compounds that had been tested in the standard two-year rodent carcinogenicity bioassay. Small inorganic compounds (e.g. NO2), oils, paraffins and mixtures of compounds were excluded from the set.

Methods

PASS program. PASS (Prediction of Activity Spectra for Substances) is a computer program that evaluates the general biological potential of a molecule on the basis of its structural formula [2]. MNA ("Multilevel Neighbourhoods of Atoms") descriptors are used to represent a compound's structure. The list of predictable biological activities contains 3750 types (PASS 2009.1 version), including main and side pharmacological effects (antihypertensive, hepatoprotective, anti-inflammatory, etc.), mechanisms of action (5-hydroxytryptamine agonist, cyclooxygenase inhibitor, adenosine uptake inhibitor, etc.), specific toxicities (mutagenicity, carcinogenicity, teratogenicity, etc.) and metabolic terms (CYP1A substrate, CYP3A4 inhibitor, CYP2C9 inducer, etc.). The mean accuracy calculated by leave-one-out cross-validation is 95%. PASS predictions for TC1 molecules are presented at the ToxCast web-site and can be used as parameters characterizing these compounds in biological space.

QNA descriptors. A molecular structure is described as a set of QNA (Quantitative Neighborhoods of Atoms) descriptors [3]. QNA descriptors are based on the ionization potential (IP) and electron affinity (EA) of each atom in the molecule. They are calculated as follows:

$$P_i = \sum_k B_i^{-1/2} \left(\exp\left(-\tfrac{1}{2}C\right)\right)_{ik} B_k^{-1/2}, \qquad Q_i = \sum_k B_i^{-1/2} \left(\exp\left(-\tfrac{1}{2}C\right)\right)_{ik} B_k^{-1/2} A_k,$$

where $A_k = \tfrac{1}{2}(IP_k + EA_k)$, $B_k = IP_k - EA_k$, and $C$ is the molecular connectivity matrix. Thus, each atom of a molecule is described by two values, P and Q. Since molecules differ in their number of atoms, the number of P and Q values grows with the size of the molecule, whereas regression analysis requires the molecular structure to be described by a vector of fixed length. Therefore, Chebyshev polynomials are used to represent a molecular structure as a vector:

$$T_{n_P n_Q}(P, Q) = \cos\left(n_P \arccos(\tanh P')\right) \cdot \cos\left(n_Q \arccos(\tanh Q')\right),$$

where $T_n$ is the Chebyshev polynomial of degree n, and P' and Q' are the orthonormalized representations of the P and Q values (zero mean, unit variance and no correlation between P' and Q'). The $T_n(P, Q)$ values are calculated for each atom of a molecule. A whole molecule is represented by the average values of the Chebyshev polynomials over all atoms; therefore, the length of the vector is defined by the number of Chebyshev polynomials, m. On the one hand, a large number of Chebyshev polynomials can describe complex structure-activity relationships; on the other hand, a long vector representing the structure may lead to overtraining in regression analysis. Therefore, the initial value of m is set to half the number of molecules in the training set.

Self-Consistent Regression (SCR). Self-consistent regression can obtain the best QSAR/QSPR model for a training set with a large number of descriptors. SCR is based on the regularized least-squares method. The main feature of the SCR method is the removal of variables that describe the target value poorly [4].

Integral parameter ToxDose. Using the dose characteristics from all 75 end-points for which experimental data were obtained in vivo, we calculated the integral parameter ToxDose for 283 compounds.
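The QNA formulas can be illustrated with a small sketch. It is not the GUSAR implementation: it assumes that the connectivity matrix C is a plain adjacency matrix (1 for bonded atoms, 0 otherwise), that "th" denotes the hyperbolic tangent, and that the Chebyshev term is the product of the two cosines. The function names and the IP/EA values in the usage lines are illustrative only.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(m, terms=40):
    """Matrix exponential by truncated Taylor series (adequate for small matrices)."""
    n = len(m)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for order in range(1, terms):
        # term becomes m**order / order!
        term = [[v / order for v in row] for row in mat_mul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def qna_descriptors(connectivity, ip, ea):
    """Return per-atom (P, Q) lists following the formulas in the text."""
    n = len(ip)
    a = [0.5 * (ip[k] + ea[k]) for k in range(n)]   # A_k = (IP_k + EA_k) / 2
    b = [ip[k] - ea[k] for k in range(n)]           # B_k = IP_k - EA_k
    e = mat_exp([[-0.5 * c for c in row] for row in connectivity])
    p = [sum(e[i][k] / math.sqrt(b[i] * b[k]) for k in range(n))
         for i in range(n)]
    q = [sum(e[i][k] * a[k] / math.sqrt(b[i] * b[k]) for k in range(n))
         for i in range(n)]
    return p, q


def chebyshev_term(n_p, n_q, p_norm, q_norm):
    """Assumed product form: cos(n_P arccos(tanh P')) * cos(n_Q arccos(tanh Q'))."""
    return (math.cos(n_p * math.acos(math.tanh(p_norm)))
            * math.cos(n_q * math.acos(math.tanh(q_norm))))


# Usage: a three-atom chain C-C-O with illustrative IP/EA values (eV).
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
p_values, q_values = qna_descriptors(adjacency,
                                     ip=[11.26, 11.26, 13.62],
                                     ea=[1.26, 1.26, 1.46])
print(p_values, q_values)
```

The orthonormalization of P and Q into P' and Q' and the averaging of the polynomial values over atoms are left out; they would be applied between `qna_descriptors` and `chebyshev_term`.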

Analysis of Chemical Space

[Figure: distributions for four compound sets: TOXCAST_309_0_MF.txt, 10_MF.txt, MDDR 2003 Principle_MF.txt, roadmap_2008-01-28_MF.txt. X-axis ticks from 0 to 100; y-axis ticks from 0 to 8.]

We compared the distribution of molecules from the ToxCast Phase 1 dataset with compounds from the PASS training set (10_MF), MDDR 2003 and RoadMap 2008. The compounds from the ToxCast Phase 1 dataset contain fewer non-hydrogen atoms than typical drug-like molecules and compounds from the RoadMap dataset. The average molecular weight is 416 Da in the PASS training set and 302 Da in the ToxCast Phase 1 dataset, which is smaller than that of drug-like compounds.

Analysis of Biological Parameters

[Figure: x-axis: Molecular Weight, ticks from 0 to 600; y-axis ticks from 0 to 1.]

[Equation: ToxDose, expressed in terms of D_k, m and n]

where: D is the LEL (lowest effect level) value; m is the number of end-points for a particular compound; n is the total number of tests.

PASS prediction of the carcinogenicity

To predict carcinogenicity with PASS, we used the compounds from CPDB as a training set. After training, the average accuracy of prediction in leave-one-out cross-validation (LOO CV) was 74%. With the trained version of PASS we predicted carcinogenicity for 306 compounds from ToxRefDB. Four compounds that consist of two components were excluded from the prediction. The accuracy of prediction for ToxRefDB is given below.

| End-point | NA | TP | TN | FP | FN | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|---|---|---|---|
| CHR_Mouse_LiverTumors | 71 | 31 | 108 | 41 | 64 | 0.33 | 0.72 | 0.57 |
| CHR_Mouse_LungTumors | 71 | 11 | 138 | 11 | 84 | 0.12 | 0.93 | 0.61 |
| CHR_Mouse_Tumorigen | 71 | 25 | 129 | 41 | 49 | 0.34 | 0.76 | 0.63 |
| CHR_Rat_LiverTumors | 62 | 8 | 159 | 28 | 58 | 0.12 | 0.85 | 0.66 |
| CHR_Rat_TesticularTumors | 62 | 12 | 133 | 101 | 7 | 0.63 | 0.57 | 0.57 |
| CHR_Rat_ThyroidTumors | 62 | 3 | 212 | 20 | 18 | 0.14 | 0.91 | 0.85 |
| CHR_Rat_Tumorigen | 62 | 8 | 184 | 46 | 15 | 0.35 | 0.8 | 0.76 |

NA: data not available; TP: true positive; TN: true negative; FP: false positive; FN: false negative; Sensitivity = TP/(TP+FN); Specificity = TN/(TN+FP); Accuracy = (TP+TN)/(TP+TN+FP+FN).
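The three quality measures follow directly from the confusion counts. The short sketch below recomputes them for two rows of the table; the function name `confusion_metrics` is ours and not part of PASS.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy


# Counts taken from the CHR_Mouse_LiverTumors and CHR_Rat_ThyroidTumors rows.
for name, counts in [
    ("CHR_Mouse_LiverTumors", dict(tp=31, tn=108, fp=41, fn=64)),
    ("CHR_Rat_ThyroidTumors", dict(tp=3, tn=212, fp=20, fn=18)),
]:
    sens, spec, acc = confusion_metrics(**counts)
    print(name, round(sens, 2), round(spec, 2), round(acc, 2))
```

Compounds counted under NA do not enter any of the three ratios.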

Accuracy of prediction varies from 0.57 (CHR_Mouse_LiverTumors and CHR_Rat_TesticularTumors) to 0.85 (CHR_Rat_ThyroidTumors). Sensitivity varies from 0.12 (CHR_Mouse_LungTumors and CHR_Rat_LiverTumors) to 0.63 (CHR_Rat_TesticularTumors). Specificity varies from 0.57 (CHR_Rat_TesticularTumors) to 0.93 (CHR_Mouse_LungTumors). On average, rat end-points were predicted more accurately than mouse end-points.

QSAR Models for Rat Cholinesterase Inhibitors

We collected 45 inhibitors of rat cholinesterase (CHR_Rat_CholinesteraseInhibition) from TC1. The toxicity end-point based on the EC50 (mg/kg) values was used to build QSAR models with QNA descriptors and SCR. Eighteen QSAR models were created by the QNA/SCR approach. Only four of them have Q² > 0.50 and R² > 0.60. These models were additionally validated by leave-10%-out cross-validation: the procedure was repeated 20 times and the average R² of prediction was calculated. The results of the validation are presented below.

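| Name | Number | R² | Q² | Fisher | SD | Variables | L10%OCV |
|---|---|---|---|---|---|---|---|
| model 1 | 45 | 0.697 | 0.567 | 10.706 | 0.699 | 8 | 0.71 |
| model 2 | 45 | 0.684 | 0.559 | 10.073 | 0.706 | 8 | 0.70 |
| model 3 | 45 | 0.676 | 0.553 | 11.366 | 0.709 | 7 | 0.65 |
| model 4 | 45 | 0.686 | 0.520 | 7.960 | 0.744 | 10 | 0.52 |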

Three of the four QSAR models (models 1-3) have an average R² of prediction above 0.60. Therefore, these models can be considered robust and predictive.
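The repeated leave-10%-out procedure can be sketched as follows. This is an illustration only: an ordinary one-variable least-squares line stands in for the QNA/SCR model, the R² of prediction is assumed to be 1 - SS_res/SS_tot over the held-out predictions, and all names and data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def leave_10pct_out_r2(xs, ys, repeats=20, seed=0):
    """Average R2 of prediction over repeated random 10-fold splits."""
    rng = random.Random(seed)
    n = len(xs)
    mean_y = sum(ys) / n
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    scores = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::10] for i in range(10)]  # each fold holds ~10% of the data
        ss_res = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            slope, intercept = fit_line([xs[i] for i in train],
                                        [ys[i] for i in train])
            # Squared errors on compounds the model has not seen.
            ss_res += sum((ys[i] - (slope * xs[i] + intercept)) ** 2
                          for i in fold)
        scores.append(1.0 - ss_res / ss_tot)
    return sum(scores) / repeats


# Usage with 45 synthetic points (the size of the cholinesterase set).
noise = random.Random(1)
xs = [float(i) for i in range(45)]
ys = [2.0 * x + 1.0 + noise.gauss(0.0, 1.0) for x in xs]
print(leave_10pct_out_r2(xs, ys))
```

Averaging over 20 random partitions reduces the dependence of the estimate on any single split, which matters for a set of only 45 compounds.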

[Figure: ToxDose. Y-axis: dose, mkM/kg, logarithmic scale from 0.01 to 10000; x-axis ticks from 0 to 1. Series: CHR_Mouse_KidneyPathology, CHR_Mouse_LiverHypertrophy, CHR_Mouse_LiverNecrosis, CHR_Mouse_LiverProliferativeLesions, CHR_Mouse_LiverTumors, CHR_Mouse_LungTumors, CHR_Mouse_Tumorigen, CHR_Rat_CholinesteraseInhibition, CHR_Rat_KidneyNephropathy, CHR_Rat_KidneyProliferativeLesions, CHR_Rat_LiverHypertrophy, CHR_Rat_LiverNecrosis, CHR_Rat_LiverProliferativeLesions, CHR_Rat_LiverTumors, CHR_Rat_SpleenPathology, CHR_Rat_TesticularAtrophy, CHR_Rat_TesticularTumors, CHR_Rat_ThyroidHyperplasia, CHR_Rat_ThyroidProliferativeLesions, CHR_Rat_ThyroidTumors, CHR_Rat_Tumorigen, ToxDose_mkM.]

We calculated the integral parameter ToxDose for twenty carcinogenicity end-points for mouse and rat. ToxDose values correlate with the other carcinogenicity end-points and may be used to prioritize molecules from the TC1 dataset for toxicological testing. The data for cholinesterase inhibition differ significantly from those for all other end-points; thus they were excluded from further analysis.

Results of PASS Training for Predicting Different Categories of ToxDose

Ninety-five percent of the ToxCast compounds are discriminated from drug-like molecules. For the different ToxDose groupings, the PASS accuracy of recognition varies from 59.1% to 75.0%, and the most toxic compounds are predicted better. Thus, PASS prediction could be applied to set priorities for testing the compounds that are most probably toxic.

References

1) http://www.epa.gov/NCCT/dsstox/
2) Poroikov V, Filimonov D. 2005. PASS: Prediction of Biological Activity Spectra for Substances. In: Predictive Toxicology (Christoph Helma, ed.). Boca Raton, Taylor & Francis Group, LLC, 459-478.
3) Lagunin, A.; Zakharov, A.; Filimonov, D.; Poroikov, V. A new approach to QSAR modelling of acute toxicity. SAR and QSAR in Environmental Research 2007, 18, 285-298.
4) Filimonov, D.; Akimov, D.; Poroikov, V. The Method of Self-Consistent Regression for the Quantitative Analysis of Relationships Between Structure and Properties of Chemicals. Pharm. Chem. J. 2004, 1, 21-24.

Biological activities predicted by PASS can be directly compared with TC1 in vivo data in only a few cases (carcinogenicity and cholinesterase inhibition). Comparison of TC1 in vivo data within the same species and between species leads to the following conclusions:
1) Correlation coefficients between the in vivo data for the same species vary from 0.75 to 0.95.
2) Correlation coefficients for the same tissues between species are less than 0.10.

Comparison of the in vivo data, expressed as the integral parameter ToxDose, with the in vitro data showed that the maximal value of the correlation coefficient is 0.26 (ToxDose vs. ToxCast Novascreen data). Thus, no significant correlation between the in vivo and in vitro data was found.
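The poster does not state which correlation coefficient was used. As a sketch, assuming the Pearson coefficient and pairwise deletion of compounds with missing values, a comparison between two end-points could look like this (the function name and the toy values are hypothetical):

```python
import math


def pearson(xs, ys):
    """Pearson correlation over pairs where both values are available."""
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return cov / (sx * sy)


# Toy values for two end-points; None marks a compound without data.
endpoint_a = [1.0, 2.5, None, 4.0, 8.0]
endpoint_b = [0.8, 2.0, 3.0, None, 9.5]
print(pearson(endpoint_a, endpoint_b))
```

Pairwise deletion keeps only compounds measured in both end-points, so coefficients for different pairs of end-points may rest on different numbers of compounds.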

| No | Activity Type | Number | IEP, % (Data1) | IEP, % (Data2) |
|---|---|---|---|---|
| 1 | ToxCast | 301 | 5.151 | 5.267 |
| 2 | CHR_ToxDose < 1.0 mkM/kg | 9 | 25.689 | 5.706 |
| 3 | CHR_ToxDose < 3.16 mkM/kg | 19 | 28.623 | 9.155 |
| 4 | CHR_ToxDose < 10.0 mkM/kg | 51 | 28.052 | 6.487 |
| 5 | CHR_ToxDose < 31.6 mkM/kg | 89 | 25.008 | 5.24 |
| 6 | CHR_ToxDose < 100.0 mkM/kg | 145 | 27.272 | 6.457 |
| 7 | CHR_ToxDose < 316.0 mkM/kg | 176 | 24.13 | 6.011 |
| 8 | CHR_ToxDose < 1000.0 mkM/kg | 196 | 23.499 | 5.727 |
| 9 | CHR_ToxDose 0.316-3.16 mkM/kg | 15 | 33.091 | 14.14 |
| 10 | CHR_ToxDose 1.0-10.0 mkM/kg | 42 | 26.576 | 7.669 |
| 11 | CHR_ToxDose 3.16-31.6 mkM/kg | 70 | 28.502 | 7.024 |
| 12 | CHR_ToxDose 10.0-100.0 mkM/kg | 94 | 38.76 | 10.417 |
| 13 | CHR_ToxDose 31.6-316.0 mkM/kg | 87 | 33.645 | 9.508 |
| 14 | CHR_ToxDose 100.0-1000.0 mkM/kg | 51 | 40.901 | 15.01 |
| 15 | CHR_ToxDose 316.0-3160.0 mkM/kg | 26 | 34.029 | 14.915 |

Here: Data1 is the 301 chemicals from the ToxCast in vivo dataset together with 100 drug-like chemical compounds; Data2 is the 301 chemicals from the ToxCast in vivo dataset together with 60620 chemical compounds from the PASS training set.

Conclusions

1) Compounds from the TC1 dataset are smaller than typical drug-like structures and molecules from the RoadMap set.
2) No significant correlation between the in vivo and in vitro data from the TC1 set was observed.
3) Despite the chemical dissimilarity between the TC1 compounds and drug-like molecules, PASS-based prediction of carcinogenicity can be obtained with reasonable accuracy.
4) The integral parameter ToxDose, which characterizes general toxicity, can be predicted by PASS with reasonable accuracy. Thus, this approach could be recommended for prioritizing chemicals for testing.