Journal of Hazardous Materials 183 (2010) 913–922
journal homepage: www.elsevier.com/locate/jhazmat
0304-3894/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.jhazmat.2010.07.116

QSTR with extended topochemical atom (ETA) indices. 14. QSAR modeling of toxicity of aromatic aldehydes to Tetrahymena pyriformis

Kunal Roy*, Rudra Narayan Das
Drug Theoretics and Cheminformatics Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India
* Corresponding author. Tel.: +91 98315 94140; fax: +91 33 2837 1078. E-mail addresses: kunalroy [email protected], [email protected] (K. Roy). URL: http://sites.google.com/site/kunalroyindia/ (K. Roy).

Article info
Article history: Received 17 June 2010; received in revised form 27 July 2010; accepted 27 July 2010; available online 6 August 2010.
Keywords: QSTR; QSAR; Tetrahymena pyriformis; ETA; Validation; Toxicity

Abstract
Aldehydes are a toxic class of chemicals causing severe health hazards. Against this background, quantitative structure–toxicity relationship (QSTR) models have been developed in the present study using Extended Topochemical Atom (ETA) indices for a large group of 77 aromatic aldehydes for their acute toxicity against the protozoan ciliate Tetrahymena pyriformis. The ETA models have been compared with those developed using various non-ETA topological indices. An attempt was also made to include the n-octanol/water partition coefficient (log Ko/w) as an additional descriptor, considering the importance of hydrophobicity in toxicity prediction. Thirty different models were developed using different chemometric tools. All the models have been validated using internal and external validation techniques. The statistical quality of the ETA models was found to be comparable to that of the non-ETA models. The ETA models have shown the important effects of steric bulk, lipophilicity, the presence of substituents containing electronegative atoms and the functionality of the aldehydic oxygen on the toxicity of the aldehydes. The best ETA model (without using log Ko/w) shows encouraging statistical quality (Q²int = 0.709, Q²ext = 0.744). It is interesting to note that some of the topological models reported here are better in statistical quality than previously reported models using quantum chemical descriptors.
© 2010 Elsevier B.V. All rights reserved.

1. Introduction

With the advent of modern science, technology and industrialization, the use of chemicals is increasing, accompanied by a growing number of new chemicals. Hence, the harmful effects of chemical contamination and poisoning are having a societal impact. This has led both the chemical industry and regulatory agencies to focus on the effects that a particular chemical would exert in the environment prior to its release into the ecosystem [1]. As per the European Union's Registration, Evaluation and Authorization of Chemicals (REACH) program, all chemicals produced or imported at >1 ton per annum (tpa) in the European Union will need to be assessed for human and environmental hazards [2]. As per the European Union White Paper concerning a future chemical policy, methods to determine the potential hazardous effects of chemicals on living organisms must be developed and used [3]. In this perspective, experimental toxicity testing of chemicals that are exposed to the environment would be the most reliable approach. Though experimental testing yields the most reliable data regarding the effect of chemicals, screening of a large number of potential toxicants is troublesome [4]. Toxicological experimentation and the generation of data from it are often time consuming and costly. Development of predictive models in the form of quantitative structure–activity relationships (QSARs) offers a good opportunity to fill data gaps in environmental risk assessment and regulatory concerns in such cases [5]. QSAR uses chemical toxicity data for a comparatively small number of compounds to predict the toxicity of a large number of compounds. The Office of Toxic Substances of the US Environmental Protection Agency has developed several QSARs from small sets of available data points, assuming the nature of the relationship between a chemical class and the corresponding toxicity [6]. QSAR is now accepted as a very efficient tool to predict adverse effects of xenobiotics in risk assessment due to its consistent application in the design of more efficacious pharmaceuticals and pesticides [7]. QSAR is very useful in ecotoxicological studies for the prediction of toxicity of compounds [8,9].

Toxicity data for a large number of chemicals, obtained by reliable and robust methods, are available for the inhibition of the growth of the protozoan ciliate Tetrahymena pyriformis [10–13]. T. pyriformis presents the most extensive toxicity dataset for population growth inhibition. Members of Ciliophora are also frequently used in toxicity assessment as a protozoan test system. The relationship between fish toxicity and toxic potency to T. pyriformis is well established [14]. Hence, a dataset of chemicals showing toxicity against T.


pyriformis is effective and reliable. QSAR plays an important role in predicting toxicity of different classes of compounds like aromatic and aliphatic aldehydes, ketones, esters, alcohols etc. A global QSAR model was developed by Cronin and Schultz for the prediction of the toxicity of phenols to T. pyriformis using 166 phenols [15], and the model was refined afterwards using 250 compounds (Cronin et al.) [12]. Zhao et al. used the same dataset and developed a QSAR model with hydrophobicity, ionization and electronic parameters [16] and later on with ionization and polarity of phenols [17]. In another QSAR study, Roy and coworkers have developed models for six different groups of aliphatic compounds such as alcohols, esters, acids, aldehydes, ketones and amines in order to provide a robust prediction of aliphatic toxicity against T. pyriformis [18].

Aldehydes present a very essential class of chemical compounds that are used widely in different types of chemical industries. They cause tissue damage, cytotoxicity, mutagenicity and carcinogenicity, thus leading to severe health hazards [19]. Aldehydes are chiefly used in agrochemicals, pharmaceuticals, food, flavor and fragrance industries [20]. Owing to the widespread use of aldehydes and related compounds and their presence in nature causing environmental hazards, it is essential to measure or predict toxicity of all such compounds [21].

Netzeva and Schultz developed a QSAR model for the toxicity of aldehydes to T. pyriformis with the maximum acceptor superdelocalizability and the octanol–water partition coefficient (log Ko/w) as predictor variables [4]. Kar et al. developed QSAR models on the same dataset using quantum topological molecular similarity (QTMS) descriptors along with log Ko/w and obtained encouraging results [22].

In this present study we have attempted to develop predictive models relating toxicity of aromatic aldehydes to T. pyriformis with extended topochemical atom (ETA) parameters. The ETA models have been compared with those developed using various non-ETA topological indices. Attempt was also made to include the n-octanol/water partition coefficient (log Ko/w) as an additional descriptor considering the importance of hydrophobicity in toxicity prediction. The objective of this study has been to develop predictive models using simple and straight-forward approach of topological descriptors.

2. Materials and methods

2.1. The dataset

In this present work, 40-h population growth impairment testing data of 77 aromatic aldehydes (Table 1) to the ciliate T. pyriformis (strain GL-C) reported by Netzeva and Schultz [4] was used to study the usefulness and predictive ability of ETA parameters in comparison with other non-ETA topological parameters. In addition to ETA and non-ETA topological descriptors, log Ko/w, whose values were taken from the literature [4], was used as a predictor variable considering importance of hydrophobicity for toxicity prediction.

Table 1. List of aromatic aldehydes and their log Ko/w values along with observed and calculated toxicity values to Tetrahymena pyriformis (pIGC50).

S. no. | Compound | log Ko/w (a) | Observed (a) | Calculated, Model 4 | Calculated, Model 29
1 | 4-Nitrobenzaldehyde | 1.56 | 0.203 | 0.202 | 0.097
2 | 1-Naphthaldehyde | 2.67 | 0.423 | 0.780 | 0.856
3 | 4-Biphenylcarboxaldehyde | 3.38 | 1.119 | 1.117 | 1.155
4 | 4-Bromobenzaldehyde | 2.48 | 0.587 | 0.573 | 0.448
5 | 4-Cyanobenzaldehyde | 1.21 | 0.043 | 0.213 | −0.115
6 | Benzaldehyde | 1.48 | −0.196 | −0.161 | −0.221
7 | p-Tolualdehyde | 1.99 | −0.057 | 0.019 | 0.121
8 | 4-Fluorobenzaldehyde | 1.54 | −0.127 | 0.030 | −0.028
9 | 4-Chlorobenzaldehyde | 2.13 | 0.400 | 0.355 | 0.288
10 | 4-Ethylbenzaldehyde | 2.52 | 0.291 | 0.204 | 0.338
11* | Terephthaldicarboxaldehyde | 1.36 | −0.086 | 0.158 | −0.017
12 | 4-Anisaldehyde | 1.65 | −0.047 | 0.016 | 0.108
13 | 4-Ethoxybenzaldehyde | 2.31 | 0.073 | 0.324 | 0.406
14 | 4-Acetamidobenzaldehyde | 1.25 | −0.224 | 0.277 | −0.193
15 | 2-Tolualdehyde | 2.26 | 0.011 | 0.069 | 0.307
16* | 3-Tolualdehyde | 1.99 | 0.081 | 0.040 | 0.121
17 | 2-Chlorobenzaldehyde | 2.33 | 0.487 | 0.426 | 0.428
18 | 3-Chlorobenzaldehyde | 2.26 | 0.406 | 0.384 | 0.373
19* | 2-Nitrobenzaldehyde | 1.74 | 0.167 | 0.344 | 0.214
20 | 3-Nitrobenzaldehyde | 1.47 | 0.178 | 0.260 | 0.038
21 | Phenyl-1,3-dialdehyde | 1.36 | 0.183 | 0.198 | −0.017
22 | 2-Anisaldehyde | 1.72 | 0.148 | 0.103 | 0.164
23 | 3-Anisaldehyde | 1.71 | 0.232 | 0.052 | 0.147
24 | 3-Bromobenzaldehyde | 2.48 | 0.506 | 0.593 | 0.448
25 | 3-Fluorobenzaldehyde | 1.76 | 0.154 | 0.060 | 0.115
26 | 2,4-Dichlorobenzaldehyde | 3.08 | 1.036 | 1.070 | 0.998
27 | 2,4-Dimethoxybenzaldehyde | 1.79 | −0.056 | −0.022 | 0.158
28 | 2,4,5-Trimethoxybenzaldehyde | 1.19 | −0.101 | 0.073 | −0.336
29 | 4-(Dimethylamino)benzaldehyde | 1.81 | 0.231 | 0.024 | 0.259
30 | 4-Phenoxybenzaldehyde | 3.96 | 1.257 | 1.075 | 1.447
31 | 2-Bromobenzaldehyde | 2.48 | 0.477 | 0.622 | 0.458
32 | 2-Fluorobenzaldehyde | 1.76 | 0.079 | 0.104 | 0.126
33* | 4-Butoxybenzaldehyde | 3.37 | 0.716 | 0.773 | 0.885
34* | 4-(Pentyloxy)benzaldehyde | 3.89 | 1.179 | 1.016 | 1.122
35 | 4-Isopropylbenzaldehyde | 2.92 | 0.67 | 0.435 | 0.565
36 | Pentafluorobenzaldehyde | 2.39 | 0.815 | 1.071 | 0.649
37 | 2-Chloro-5-nitrobenzaldehyde | 2.25 | 0.527 | 0.810 | 0.526
38 | 2-Chloro-6-fluorobenzaldehyde | 2.51 | 0.155 | 0.775 | 0.706
39 | 3-Cyanobenzaldehyde | 1.18 | −0.02 | 0.255 | −0.134
40* | 2-Chloro-3-hydroxy-4-methoxybenzaldehyde | 1.72 | 0.204 | 0.459 | 0.483
41 | 6-Chloro-2-fluoro-3-methylbenzaldehyde | 3.01 | 1.238 | 0.948 | 1.045
42 | 3-Chloro-2-fluoro-5-(trifluoromethyl)benzaldehyde | 3.5 | 1.723 | 1.391 | 1.344
43 | 2,3,5-Trichlorobenzaldehyde | 3.69 | 1.499 | 1.614 | 1.484
44 | 2-Fluorenecarboxaldehyde | 3.43 | 1.499 | 1.556 | 1.314
45 | 2-Methyl-1-naphthaldehyde | 3.17 | 1.231 | 1.072 | 1.155
46 | 4-Methyl-1-naphthaldehyde | 3.17 | 1.123 | 1.038 | 1.155
47* | Phenanthrene-9-carboxaldehyde | 3.84 | 1.708 | 1.702 | 1.511
48 | 5-Hydroxy-2-nitrobenzaldehyde | 1.75 | 0.329 | 0.413 | 0.503
49* | 3-Hydroxy-4-nitrobenzaldehyde | 1.47 | 0.273 | 0.431 | 0.321
50 | 3-Hydroxybenzaldehyde | 1.38 | 0.085 | −0.018 | 0.161
51 | 3-Hydroxy-4-methoxybenzaldehyde | 0.97 | −0.142 | −0.070 | 0.015
52 | 3,4-Dimethoxy-5-hydroxycarboxaldehyde | 0.69 | −0.39 | −0.036 | −0.274
53 | 2,3-Dihydroxybenzaldehyde | 1.03 | 0.111 | 0.178 | 0.095
54 | 2,5-Dihydroxybenzaldehyde | 1.33 | 0.277 | 0.178 | 0.280
55 | 3,4-Dihydroxybenzaldehyde | 1.03 | 0.107 | 0.113 | 0.085
56* | 3,4,5-Trihydroxybenzaldehyde | 0.42 | −0.196 | 0.144 | −0.279
57 | 2,3,4-Trihydroxybenzaldehyde | 0.79 | 0.001 | 0.179 | −0.039
58* | 2,4,6-Trihydroxybenzaldehyde | 0.72 | 0.128 | 0.214 | −0.085
59* | 2,4-Dihydroxybenzaldehyde | 1.33 | 0.515 | 0.152 | 0.280
60 | 3-Ethoxy-2-hydroxycarboxaldehyde | 2.17 | 0.85 | 0.334 | 0.706
61* | 3-Methoxysalicylaldehyde | 1.37 | 0.377 | −0.002 | 0.275
62 | 3,5-Dibromosalicylaldehyde | 3.42 | 1.648 | 1.348 | 1.534
63 | 4,6-Dimethoxy-2-hydroxybenzaldehyde | 1.86 | 0.617 | 0.033 | 0.487
64 | 2-Hydroxy-3-nitrocarboxaldehyde | 1.84 | 0.87 | 0.520 | 0.561
65* | 2-Chloro-4-hydroxycarboxaldehyde | 2.28 | 0.89 | 0.588 | 0.839
66* | 4-Hydroxy-3-nitrobenzaldehyde | 1.48 | 0.614 | 0.313 | 0.327
67 | 4-Hydroxybenzaldehyde | 1.35 | 0.266 | −0.047 | 0.142
68* | 2-Hydroxy-1-naphthaldehyde | 2.99 | 1.32 | 0.860 | 1.328
69 | 5-Bromovanillin | 1.92 | 0.617 | 0.595 | 0.612
70* | 4-Hydroxy-1-naphthaldehyde | 2.62 | 1.05 | 0.810 | 1.085
71 | 5-Bromosalicylaldehyde | 2.8 | 1.107 | 0.738 | 1.109
72 | 5-Chlorosalicylaldehyde | 2.65 | 1.009 | 0.612 | 1.079
73* | 2-Hydroxybenzaldehyde | 1.81 | 0.424 | 0.025 | 0.451
74 | 3-Bromo-4-hydroxycarboxaldehyde | 2.15 | 0.61 | 0.673 | 0.686
75* | 3-Methoxy-4-hydroxybenzaldehyde | 1.21 | −0.03 | −0.061 | 0.171
76 | 3,5-Dibromo-4-hydroxycarboxaldehyde | 2.77 | 0.89 | 1.289 | 1.112
77* | 3-Ethoxy-4-hydroxybenzaldehyde | 1.58 | 0.015 | 0.280 | 0.313

a From ref. [4].
* Test set compounds.

2.2. Types of descriptors

2.2.1. ETA descriptors

The ETA [23–37] indices formulated based on modification of TAU descriptors [38–50], which were developed in the valence electron mobile (VEM) environment, have been used to develop the QSTR models for 40-h T. pyriformis toxicity of 77 aromatic aldehydes. Some basic parameters of the ETA scheme used in the development of QSTR models for the above mentioned dataset have been defined in Supplementary Materials section. The definitions of the important ETA parameters are also presented in Table S1 in Supplementary Materials section. Fig. 1 shows the eight common non-hydrogen atoms shared by all 77 aldehydic derivatives, collectively referred to as the common skeleton.

Fig. 1. Arbitrary numbering of non-hydrogen common atoms in the aromatic aldehyde skeleton.

2.2.2. Non-ETA descriptors

Along with ETA descriptors, other selected non-ETA topological descriptors have also been calculated for the given dataset of compounds to assess the performance of ETA descriptors in comparison to the non-ETA descriptors. Different non-ETA descriptors considered here are presented in Table 2.

Table 2. Categorical list of non-ETA descriptors used in the QSTR analysis.

Category of descriptors | Name of descriptors
Topological | Balaban J, Kappa shape indices (1κ, 2κ, 3κ, 1κam, 2κam, 3κam), Flexibility φ, Subgraph count (SC-0, SC-1, SC-2, SC-3_P, SC-3_C), Molecular Connectivity index (0χ, 1χ, 2χ, 3χp, 3χc, 0χv, 1χv, 2χv, 3χvp, 3χvc), Zagreb, Wiener W, Electrotopological state parameters (S_sCH3, S_ssCH2, S_dsCH, S_tsC, S_aaCH, S_sssCH, S_dssC, S_aasC, S_aaaC, S_ssssC, S_ssNH, S_tN, S_sssN, S_ddsN, S_sOH, S_dO, S_ssO, S_sF, S_sCl, S_sBr)
Structural | MW, Rotlbonds, Hbond acceptor, Hbond donor
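To make the kind of quantity in the topological category concrete, the following is a minimal sketch, not the Cerius2 implementation used by the authors, of three of the named indices (Wiener W, Zagreb, and the simple first-order connectivity index 1χ) for the hydrogen-suppressed graph of benzaldehyde. The atom numbering is an assumption made for this example and need not match Fig. 1; the valence variants (χv) additionally require valence delta values and are not shown.

```python
from collections import deque
from math import sqrt

# Hydrogen-suppressed graph of benzaldehyde. The numbering is an assumption
# for this sketch: atoms 1-6 ring carbons, 7 carbonyl carbon, 8 oxygen.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 1), (1, 7), (7, 8)]


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def distances_from(adj, start):
    """Breadth-first search: topological distance from start to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def wiener_index(adj):
    """Sum of shortest-path distances over all unordered atom pairs."""
    total = sum(sum(distances_from(adj, v).values()) for v in adj)
    return total // 2  # every pair was counted from both ends


def first_zagreb_index(adj):
    """Sum of squared vertex degrees."""
    return sum(len(nbrs) ** 2 for nbrs in adj.values())


def first_order_connectivity(adj, edges):
    """Simple (non-valence) 1-chi: sum over bonds of (deg_i * deg_j) ** -0.5."""
    return sum(1.0 / sqrt(len(adj[a]) * len(adj[b])) for a, b in edges)


ADJ = adjacency(EDGES)
W = wiener_index(ADJ)
M1 = first_zagreb_index(ADJ)
CHI1 = first_order_connectivity(ADJ, EDGES)
print(W, M1, round(CHI1, 3))
```

All three indices depend only on the connectivity of the heavy atoms, which is what makes this class of descriptor cheap to compute compared with quantum chemical descriptors.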


Table 3. Comparison of models obtained using different chemometric tools.

Type of descriptors | Model type | Model no. | No. of descriptors | LVs | Descriptors | R² | R²a | Q²int | r²m(LOO) | rmsepint | Q²ext(F1) | Q²ext(F2) | Q²ext(F3) | r²m(test) | rmsepext | r²m(overall) | cR²p
ETA | GFA linear | 1 | 5 | – | Σα, [η′F]Cl, [Σε/N], [Σβ′s], [η′F]1 | 0.758 | 0.735 | 0.693 | 0.661 | 0.285 | 0.694 | 0.693 | 0.693 | 0.688 | 0.285 | 0.621 | 0.725
ETA | GFA spline | 2 | 4 | – | [η′F]Cl, Σα, [Σε/N], ⟨Σβ′s − 1.125⟩ | 0.749 | 0.730 | 0.701 | 0.667 | 0.282 | 0.703 | 0.702 | 0.702 | 0.673 | 0.281 | 0.639 | 0.648
ETA | G/PLS linear | 3 | 6 | 5 | [Σα]², [η′F]6, [η′F]8, [Σε/N], [Σβ′s], [η′F]Cl | 0.761 | 0.738 | 0.677 | 0.654 | 0.292 | 0.705 | 0.704 | 0.704 | 0.690 | 0.280 | 0.654 | 0.678
ETA | G/PLS spline | 4 | 6 | 5 | [η′F]8, [η′F]Cl, ⟨Σβ′s − 1.13636⟩, Σα, ⟨[η′F]NO2 − 0⟩, ⟨Σε/N − 0.653158⟩ | 0.793 | 0.773 | 0.709 | 0.680 | 0.278 | 0.744 | 0.743 | 0.743 | 0.717 | 0.261 | 0.683 | 0.706
ETA | Stepwise MLR | 5 | 2 | – | [Σα]², [η′F]Cl | 0.587 | 0.572 | 0.537 | 0.609 | 0.350 | 0.611 | 0.610 | 0.610 | 0.606 | 0.322 | 0.572 | –
Non-ETA | GFA linear | 6 | 5 | – | S_dsCH, S_ssO, S_sCl, S_sCH3, 1χv | 0.757 | 0.734 | 0.705 | 0.675 | 0.280 | 0.718 | 0.717 | 0.717 | 0.712 | 0.274 | 0.665 | 0.625
Non-ETA | GFA spline | 7 | 4 | – | S_sCl, ⟨0.723055 − S_dsCH⟩, 2χv, ⟨S_ssO − 5.56191⟩ | 0.773 | 0.756 | 0.726 | 0.682 | 0.269 | 0.618 | 0.617 | 0.617 | 0.594 | 0.318 | 0.641 | 0.522
Non-ETA | G/PLS linear | 8 | 5 | 2 | S_aasC, S_sCl, 1χv, 2χv, S_ssO | 0.740 | 0.731 | 0.693 | 0.679 | 0.285 | 0.705 | 0.705 | 0.704 | 0.698 | 0.280 | 0.683 | 0.579
Non-ETA | G/PLS spline | 9 | 4 | 2 | ⟨S_ssO − 5.07023⟩, ⟨SC-0 − 11⟩, S_sCl, 3χvc | 0.770 | 0.762 | 0.730 | 0.715 | 0.267 | 0.599 | 0.598 | 0.598 | 0.570 | 0.326 | 0.678 | 0.538
Non-ETA | Stepwise MLR | 10 | 4 | – | 2χv, S_sCH3, S_sF, S_sCl | 0.751 | 0.732 | 0.708 | 0.673 | 0.278 | 0.697 | 0.696 | 0.696 | 0.707 | 0.284 | 0.635 | –
ETA + Non-ETA | GFA linear | 11 | 4 | – | 2χv, S_sCl, S_sF, S_sCH3 | 0.751 | 0.732 | 0.708 | 0.673 | 0.278 | 0.698 | 0.697 | 0.697 | 0.710 | 0.283 | 0.635 | 0.622
ETA + Non-ETA | GFA spline | 12 | 5 | – | [η′F]1, ⟨S_ssO − 5.07023⟩, [η′F]Cl, Wiener, ⟨3.65968 − 2χv⟩ | 0.804 | 0.785 | 0.756 | 0.699 | 0.254 | 0.730 | 0.729 | 0.729 | 0.738 | 0.268 | 0.696 | 0.738
ETA + Non-ETA | G/PLS linear | 13 | 4 | 3 | 2χv, [η′F]local, [Σε/N], [Σα]P | 0.750 | 0.736 | 0.683 | 0.631 | 0.305 | 0.701 | 0.700 | 0.700 | 0.680 | 0.282 | 0.640 | 0.649
ETA + Non-ETA | G/PLS spline | 14 | 4 | 3 | 2χv, [η′F]Br, ⟨2.475 − [Σα]P⟩, S_sCH3 | 0.723 | 0.708 | 0.639 | 0.616 | 0.309 | 0.711 | 0.710 | 0.710 | 0.702 | 0.277 | 0.641 | 0.624
ETA + Non-ETA | Stepwise MLR | 15 | 3 | – | 2χv, [Σε/N], [Σβ′s] | 0.736 | 0.721 | 0.684 | 0.659 | 0.289 | 0.643 | 0.643 | 0.642 | 0.677 | 0.308 | 0.608 | –
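As a consistency check on the R²a column, the sketch below applies the standard adjusted-R² formula to a few rows of Table 3. Two things here are assumptions rather than statements from the paper: the training set size of 58 (77 compounds minus the 19 starred test compounds of Table 1), and the choice of p as the number of descriptors for the GFA and stepwise MLR models but the number of latent variables (LVs) for the G/PLS models, which is what appears to reproduce the tabulated values.

```python
def adjusted_r2(r2, n_train, n_terms):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_train - 1) / (n_train - n_terms - 1)


N_TRAIN = 58  # assumption: 77 compounds minus the 19 starred test compounds

# (model no., R^2, number of terms p, R^2a reported in Table 3)
CASES = [
    (1, 0.758, 5, 0.735),  # GFA linear: p = number of descriptors
    (2, 0.749, 4, 0.730),  # GFA spline: p = number of descriptors
    (5, 0.587, 2, 0.572),  # stepwise MLR: p = number of descriptors
    (4, 0.793, 5, 0.773),  # G/PLS spline: p = number of latent variables
]

RESULTS = {model: adjusted_r2(r2, N_TRAIN, p) for model, r2, p, _ in CASES}
for model, _, _, reported in CASES:
    print(model, round(RESULTS[model], 3), reported)
```

With six descriptors instead of five latent variables, Model 4 would give about 0.769 rather than the tabulated 0.773, which is why the LV count is used for the G/PLS row in this sketch.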

2.2.3. Physicochemical descriptor

n-Octanol/water partition coefficient or log Ko/w was used as the physicochemical descriptor considering its importance in toxicity prediction. The partition coefficient value (log Ko/w) was used in combination with ETA, non-ETA and the combined set (ETA and non-ETA) of descriptors to assess its role for the prediction of toxicity of aldehydes in presence of other topological descriptors. The log Ko/w values were collected from ref. [4].

2.3. Software used for generation of descriptors

Different ETA descriptors η, ηR, ηF, ηB and contributions of different vertices or fragments to ηF were calculated, using distance matrix and VEM vertex counts as inputs, using the GW-BASIC programs KRETA1 and KRETA2 [51]. The toxicity dataset has also been modeled by using other non-ETA descriptors which have been calculated using Cerius2 version 4.10 software [52].

2.4. Model development

2.4.1. Statistical analyses performed

The statistical analyses were carried out employing partial least squares (PLS) and multiple linear regression (MLR) as the statistical tools. Different methods used for the selection of descriptors were Stepwise Regression (for MLR) [53], Genetic Function Approximation (GFA) (for MLR and PLS [54–58]). The details of stepwise MLR, GFA [59–61] and G/PLS [62,63] techniques have been given in Supplementary Materials section.

2.4.2. Software used for model development

MINITAB [64] was used for stepwise regression and partial least squares methods. Cerius2 version 4.10 [52] was used for GFA and G/PLS analyses. STATISTICA [65] was used for the determination of the LOO (leave-one-out) values of the training set compounds.

2.4.3. Statistical parameters

The statistical quality of the equations [66] was judged by calculating several metrics namely correlation coefficient (r or R), explained variance (R²a) and variance ratio (F) at specified degrees of freedom (df). The regression constants and F ratios of all the accepted equations are significant at 95 and 99% levels, respectively, if not stated otherwise. Internal validation parameters (Q², r²m(LOO) [67] and rmsepint [68]), external validation parameters (Q²ext(F1), Q²ext(F2) [69,70] and Q²ext(F3) [71], r²m(test) [72] and rmsepext [68]) and overall validation parameter [r²m(overall)] [70] were also reported. The r²m metrics have been recently introduced by the present authors' group [67,72] and extensively used by them [73–77] and also other research groups [78–82]. We have also performed process randomization test for the genetic models [83] and reported the cR²p values [84]. The details of the validation parameters have been given in Supplementary Materials section.

3. Results and discussion

Five different models were developed applying GFA (linear and spline), G/PLS (linear and spline), and stepwise regression methods for each set of descriptors (ETA, non-ETA and combined set). Table 3 shows a comparison among different models obtained for ETA, non-ETA and combined set of descriptors. It is observed that during use of different chemometric methods, a few descriptors emerge in the best equations repetitively in different models representing their obvious importance in predicting the toxicity of the selected class of compounds.

Table 4. Selected models developed with topological descriptors along with log Ko/w.

Type of descriptors | Model type | Model no. | No. of descriptors | LVs | Descriptors | R² | R²a | Q²int | r²m(LOO) | rmsepint | Q²ext(F1) | Q²ext(F2) | Q²ext(F3) | r²m(test) | rmsepext | r²m(overall) | cR²p
ETA + log Ko/w | G/PLS spline | 19 | 5 | 3 | ⟨0.182018 − [η′F]F⟩, ⟨[η′F]CHO − 0.261886⟩, ⟨[Σα]Y − 0.5⟩, ⟨0.6738 − Σε/N⟩, log Ko/w | 0.885 | 0.879 | 0.858 | 0.849 | 0.194 | 0.842 | 0.841 | 0.841 | 0.806 | 0.205 | 0.837 | 0.688
Non-ETA + log Ko/w | G/PLS spline | 24 | 5 | 4 | log Ko/w, SC-3_C, ⟨17.5956 − S_sOH⟩, S_tN, S_dO | 0.904 | 0.897 | 0.886 | 0.882 | 0.174 | 0.900 | 0.900 | 0.900 | 0.900 | 0.163 | 0.886 | 0.656
ETA + Non-ETA + log Ko/w | G/PLS spline | 29 | 4 | 3 | ⟨3.20702 − [η′F]local⟩, log Ko/w, ⟨0.15882 − [η′F]OH⟩, 2… | 0.895 | 0.889 | 0.870 | 0.868 | 0.186 | 0.902 | 0.902 | 0.902 | 0.896 | 0.161 | 0.877 | 0.719


3.1. Models with ETA descriptors

Two GFA (linear and spline) and one G/PLS (spline) models contain the term Σα and the rest two models contain the term [Σα]², signifying the importance of Σα, i.e., the molecular bulk. The term [η′F]Cl is present in the best equations using ETA descriptors for all the five chemometric methods signifying the contribution of chlorine atom (i.e., lipophilicity). Similarly, the terms Σε/N and Σβ′s are present in the best equations for GFA and G/PLS methods and these signify the contribution of electronegative atoms (Σε/N) and contribution of sigma electrons (Σβ′s). The GFA and G/PLS spline models provide a more explanatory contribution of Σβ′s; the GFA spline model shows the presence of the term ⟨Σβ′s − 1.125⟩ (a value of Σβ′s less than 1.125 positively contributes) while the G/PLS spline model shows the presence of ⟨Σβ′s − 1.13636⟩ (a value of Σβ′s less than 1.13636 positively contributes) thereby suggesting a possible range of the value of Σβ′s required for the toxicity. Similarly, the G/PLS spline model also contains the term ⟨Σε/N − 0.653158⟩ suggesting a range (a value of Σε/N more than 0.653158 positively contributes) of values for the descriptor Σε/N required for the toxicity. The GFA linear model also shows contribution of [η′F]1, i.e., contribution of the aromatic carbon to which aldehyde group is attached. The G/PLS linear model similarly shows contributions of [η′F]8 (the aldehydic carbonyl oxygen), [η′F]6 (ortho carbon to C1) and [η′F]NO2 (the nitro substituent). From Table 3, it is observed that the G/PLS spline model is the best model on the basis of r²m(overall) (0.683). Model 1 (GFA linear) and model 2 (GFA spline) are comparable in terms of rmsepint (having values 0.285 and 0.282 respectively), rmsepext (having values 0.285 and 0.281 respectively), adjusted R² (R²a) (having values 0.735 and 0.730 respectively) and r²m(LOO) (having values 0.661 and 0.667 respectively) metrics. Model 1 (GFA linear) and model 3 (G/PLS linear) are comparable in terms of model R² values (0.758 and 0.761 respectively), adjusted R² (R²a) values (0.735 and 0.738 respectively), and r²m(test) (having values 0.688 and 0.690 respectively) parameters. Again, model 1, model 2 and model 3 are comparable in terms of R²a (having values 0.735, 0.730 and 0.738 respectively) and rmsepext (having values 0.285, 0.281 and 0.280 respectively). Similarly model 2 (GFA spline) and model 4 (G/PLS spline) are comparable on the basis of corresponding leave-one-out cross validation parameter Q²int (having values 0.701 and 0.709 respectively). Such comparable values of quality metrics for different models obtained by applying different techniques suggest proper selection of variables for modeling the toxicity.
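The external validation figures for Model 4 can be checked directly against the test set listed in Table 1. The sketch below recomputes rmsepext, Q²ext(F2) and r²m(test) from the observed and Model 4 calculated pIGC50 values of the 19 starred compounds. The formulas are assumptions based on the usual definitions of these metrics, since the paper defers them to its Supplementary Materials: Q²ext(F2) uses the test set mean as reference, and r²m = r² × (1 − √|r² − r0²|), with r0² taken from the regression of observed on calculated values through the origin. Q²ext(F1) and Q²ext(F3) additionally need training set statistics and are left out.

```python
from math import sqrt

# (observed pIGC50, Model 4 calculated pIGC50) for the 19 starred test
# compounds of Table 1, in table order (11, 16, 19, ..., 77).
TEST_SET = [
    (-0.086, 0.158), (0.081, 0.040), (0.167, 0.344), (0.716, 0.773),
    (1.179, 1.016), (0.204, 0.459), (1.708, 1.702), (0.273, 0.431),
    (-0.196, 0.144), (0.128, 0.214), (0.515, 0.152), (0.377, -0.002),
    (0.890, 0.588), (0.614, 0.313), (1.320, 0.860), (1.050, 0.810),
    (0.424, 0.025), (-0.030, -0.061), (0.015, 0.280),
]


def press(pairs):
    """Sum of squared prediction errors."""
    return sum((obs - calc) ** 2 for obs, calc in pairs)


def rmsep(pairs):
    """Root mean square error of prediction."""
    return sqrt(press(pairs) / len(pairs))


def q2_f2(pairs):
    """1 - PRESS / SS, with SS taken about the mean of the test observations."""
    mean_obs = sum(obs for obs, _ in pairs) / len(pairs)
    ss = sum((obs - mean_obs) ** 2 for obs, _ in pairs)
    return 1.0 - press(pairs) / ss


def r2m(pairs):
    """r^2 * (1 - sqrt(|r^2 - r0^2|)), r0^2 from the fit through the origin."""
    n = len(pairs)
    mean_obs = sum(obs for obs, _ in pairs) / n
    mean_calc = sum(calc for _, calc in pairs) / n
    s_oo = sum((obs - mean_obs) ** 2 for obs, _ in pairs)
    s_cc = sum((calc - mean_calc) ** 2 for _, calc in pairs)
    s_oc = sum((obs - mean_obs) * (calc - mean_calc) for obs, calc in pairs)
    r2 = s_oc ** 2 / (s_oo * s_cc)
    # Slope of observed = k * calculated, forced through the origin.
    k = sum(obs * calc for obs, calc in pairs) / sum(calc ** 2 for _, calc in pairs)
    r2_0 = 1.0 - sum((obs - k * calc) ** 2 for obs, calc in pairs) / s_oo
    return r2 * (1.0 - sqrt(abs(r2 - r2_0)))


RMSEP_EXT = rmsep(TEST_SET)
Q2_F2 = q2_f2(TEST_SET)
R2M_TEST = r2m(TEST_SET)
print(round(RMSEP_EXT, 3), round(Q2_F2, 3), round(R2M_TEST, 3))
```

Under these assumed definitions the three values come out close to the 0.261, 0.743 and 0.717 listed for Model 4 in Table 3, which suggests the definitions match the ones the authors used.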

.2. Models with non-ETA descriptors

All five models involving non-ETA descriptors contain the descriptor S_sCl, signifying the importance of the chlorine atom (–Cl). The descriptor S_dsCH is present in model 6 (GFA linear) and model 7 (GFA spline) and it suggests the importance of the fragment of type =CH–. The fragment S_sCH3 is present in model 6 (GFA linear) and model 10 (stepwise regression), signifying the importance of the –CH3 (methyl) group. Model 6 (GFA linear), model 7 (GFA spline), model 8 (G/PLS linear) and model 9 (G/PLS spline) contain another descriptor, S_ssO, which refers to the importance of the ether type linkage –O–. The fragment S_aasC is present in model 8 (G/PLS spline), referring to the importance of the substituted aromatic carbon fragment (a carbon with two aromatic bonds and one single bond). Model 10 (stepwise regression) has shown the importance of the fluorine atom (–F). A required range of the descriptor S_ssO was obtained in model 7 (GFA spline), ⟨S_ssO − 5.56191⟩ (a value of S_ssO less than 5.56191 positively contributes), and in model 9 (G/PLS spline), ⟨S_ssO − 5.07023⟩ (a value of S_ssO less than 5.07023 positively contributes). Model 7 also provided such a relationship for S_dsCH: ⟨0.723055 − S_dsCH⟩ (a value of S_dsCH less than 0.723055 is required for the toxicity).

The connectivity index ¹χᵛ was found to be present in model 6 and model 8, signifying the importance of one-bond fragments and branching considering the valence electron contribution. Similarly, ²χᵛ was present in model 7, model 8 and model 10, signifying the importance of second order valence molecular connectivity. The parameter ³χᵛ_c was present in model 9, signifying the importance of third order valence molecular connectivity of cluster type. Model 9 also shows the presence, and thereby the importance, of the subgraph count index SC-0, the zero order subgraph, which refers to the number of vertices in a molecular graph. The model provides a required range for SC-0: ⟨SC-0 − 11⟩ (a value of SC-0 greater than 11 is required for a positive contribution).

The quality metrics calculated for the non-ETA models are also comparable across models. Model 6 and model 8 are comparable in terms of their adjusted R² (0.734 and 0.731 respectively), rmsepint (0.280 and 0.285 respectively) and r²m(LOO) (0.675 and 0.679 respectively) metrics. Model 7 and model 8 are comparable in terms of r²m(LOO) (0.682 and 0.679 respectively). Similarly, model 7 and model 9 are comparable considering R² (0.773 and 0.770 respectively), the leave-one-out cross-validation correlation coefficient Q²int (0.726 and 0.730 respectively) and the root mean square error in prediction of the training set, rmsepint (0.269 and 0.267 respectively). Model 6 (GFA linear), model 8 (G/PLS linear) and model 10 (stepwise regression) are comparable in terms of adjusted R² (R²a) values (0.734, 0.731 and 0.732 respectively) and r²m(LOO) (0.675, 0.679 and 0.673 respectively). Model 6 and model 10 are comparable in terms of r²m(test) (0.712 and 0.707 respectively), while model 8 and model 10 are comparable in terms of rmsepext (0.280 and 0.284 respectively).
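The r²m metrics compared above follow ref. [67]. As a hedged illustration only, the sketch below assumes the commonly reported form r²m = r² × (1 − √|r² − r0²|), where r² is the squared correlation between observed and predicted values and r0² is the corresponding coefficient for a regression line forced through the origin. The function name `rm2` and the choice of regressing observed on predicted values are assumptions, not details confirmed by this article.

```python
import numpy as np

def rm2(y_obs, y_pred):
    """r^2_m under the assumed definition r2 * (1 - sqrt(|r2 - r0_2|))."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Squared Pearson correlation between observed and predicted values
    r2 = np.corrcoef(y_obs, y_pred)[0, 1] ** 2
    # Slope of the least-squares line through the origin (observed vs predicted)
    k = np.sum(y_obs * y_pred) / np.sum(y_pred ** 2)
    # Determination coefficient of that zero-intercept line
    r0_2 = 1.0 - np.sum((y_obs - k * y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    return r2 * (1.0 - np.sqrt(abs(r2 - r0_2)))
```

Applied to leave-one-out predictions this would give r²m(LOO), to test set predictions r²m(test), and to both pooled together r²m(overall). A model whose predictions correlate well but are systematically shifted is penalised, because r0² then falls below r².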

3.3. Models with combined set (ETA and non-ETA) of descriptors

The valence molecular connectivity index of second order, ²χᵛ, is present in all five models obtained from the combined descriptor set and hence the parameter is important. The electrotopological state parameter for the methyl group (–CH3), i.e., S_sCH3, is also important and is present in model 11 and model 14. The electronic parameter Σε/N and the molecular bulk parameter Σα are also important in exerting toxic effects and are present in model 13 (G/PLS linear) and model 15 (stepwise regression). Different models are also comparable in terms of their quality metric parameters. Model 11 and model 13 are comparable in terms of R² (0.751 and 0.750 respectively), adjusted R² (R²a) (0.732 and 0.736 respectively), rmsepext (0.283 and 0.282 respectively) and r²m(overall) (0.635 and 0.640 respectively). A similar comparison can be made between model 13 and model 14 for the following statistical parameters: rmsepint (0.305 and 0.309 respectively) and r²m(overall) (0.640 and 0.641 respectively). Model 13 and model 15 can be compared considering their leave-one-out cross-validation parameter Q²int (0.683 and 0.684 respectively) and r²m(test) (0.680 and 0.677 respectively). The r²m(overall) values of models 11, 13 and 14 (0.635, 0.640 and 0.641 respectively) are comparable.

3.4. Comparison of ETA and non-ETA models

Among the best models involving ETA and non-ETA parameters, it can be concluded that ETA models provide better external validation characteristics in terms of predictive R², i.e., Q²ext(F1) (0.744; model 4), Q²ext(F2) (0.743; model 4), Q²ext(F3) (0.743; model 4) and r²m(test) (0.717; model 4), which are greater than the corresponding values of the non-ETA models (values being 0.718, 0.717, 0.717 and 0.712 respectively; model 6). External validation being the most conclusive proof of the predictive ability of models, this observation is very interesting. The models involving the combined set of descriptors show better external validation parameter values (0.730, 0.729, 0.729 and 0.738 respectively; model 12) than those of the non-ETA models, implying that ETA descriptors improve the quality metric values of non-ETA models when used in combination and thereby provide more robust prediction of the toxicity of the given dataset of compounds.
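The three external validation coefficients compared above differ only in the reference sum of squares, as discussed in refs. [68,70,71]. The sketch below is a minimal illustration under that assumption; the function name `external_q2` and the dictionary keys are hypothetical and are not taken from any software used in this study.

```python
import numpy as np

def external_q2(y_train, y_test, y_test_pred):
    """External validation metrics for test set predictions.

    Assumed definitions:
      F1 references the test set deviations from the training set mean,
      F2 references the test set deviations from the test set mean,
      F3 compares mean squared errors, using the training set variance.
    """
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    y_test_pred = np.asarray(y_test_pred, dtype=float)

    press = np.sum((y_test - y_test_pred) ** 2)
    q2_f1 = 1.0 - press / np.sum((y_test - y_train.mean()) ** 2)
    q2_f2 = 1.0 - press / np.sum((y_test - y_test.mean()) ** 2)
    tss_train = np.sum((y_train - y_train.mean()) ** 2)
    q2_f3 = 1.0 - (press / len(y_test)) / (tss_train / len(y_train))
    rmsep_ext = np.sqrt(press / len(y_test))
    return {"Q2_F1": q2_f1, "Q2_F2": q2_f2, "Q2_F3": q2_f3, "rmsep_ext": rmsep_ext}
```

When the training and test set means nearly coincide, F1 and F2 give nearly the same value, which is in line with the near-identical Q²ext(F1) and Q²ext(F2) values reported for model 4.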

3.5. Discussion on the best ETA model

The best ETA model on the basis of r²m(overall) is obtained from the G/PLS spline technique. The concerned equation is shown below:

$$
\begin{aligned}
\mathrm{pIGC_{50}} = {} & -5.073 + 12.498 \times [\eta'_F]_{8} + 1.077 \times [\eta'_F]_{\mathrm{Cl}} - 8.406 \times \left\langle \sum \beta'_s - 1.13636 \right\rangle \\
& + 0.666 \times \sum \alpha + 0.186 \times \left\langle [\eta'_F]_{\mathrm{NO_2}} - 0 \right\rangle + 2.752 \times \left\langle \sum \varepsilon / N - 0.653158 \right\rangle
\end{aligned}
\tag{1}
$$

R² = 0.793, R²a = 0.773, Q²int = 0.709, r²m(LOO) = 0.680, rmsepint = 0.278, Q²ext(F1) = 0.744, Q²ext(F2) = 0.743, Q²ext(F3) = 0.743, r²m(test) = 0.717, rmsepext = 0.261, r²m(overall) = 0.683

Eq. (1) [Model 4; Table 3] has six ETA descriptors which are able to explain 77.30% and predict 70.90% of the toxicity (pIGC50) of the training set compounds. The term [η′F]8 shows the positive effect exerted by the aldehydic carbonyl oxygen atom, whereas [η′F]Cl refers to the positive effect of chlorine (lipophilicity) in producing toxicity of aromatic aldehydes. It may be noted that the best equation using the combined set of descriptors (model 12, GFA spline) shows the presence of the descriptor [η′F]Cl with a positive coefficient and the best equation involving non-ETA descriptors (model 8, G/PLS linear) shows the presence of the descriptor S_sCl with a positive coefficient. Both observations confirm the obvious positive contribution of the chlorine atom (lipophilicity) in exerting the toxicity. A similar positive effect is also shown by the nitro group as a substituent. Eq. (1) suggests a required value range for the descriptor Σε/N: a value greater than 0.653158 positively contributes towards the toxicity of aromatic aldehydes, thus signifying the contribution of electronegative atoms of the aldehydic compounds towards the toxicity. The term Σα corresponds to the positive effect of molecular bulk of the compounds towards the toxicity. The equation also explains the contribution of the sum of sigma electron, Σβ′s (a measure of occurrence of electronegative atoms), showing that a value lower than 1.13636 positively contributes towards the toxicity. Table 1 shows calculated values of the toxicity of aldehydes according to Eq. (1).
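To make the role of the spline terms in Eq. (1) concrete, the following sketch evaluates the equation for a given set of descriptor values. It assumes the usual truncated power spline convention of genetic function approximation [61], in which ⟨x⟩ equals x when x is positive and zero otherwise. The function and argument names are hypothetical, and the descriptor values must come from an ETA calculation that is not shown here.

```python
def hinge(x):
    """Spline term <x>: returns x when x is positive, otherwise 0 (assumed convention)."""
    return x if x > 0.0 else 0.0

def pigc50_eq1(eta_f_8, eta_f_cl, sum_beta_s, sum_alpha, eta_f_no2, sum_eps_over_n):
    """Evaluate Eq. (1) (model 4, G/PLS spline) for one compound.

    Argument names are illustrative labels for the six ETA descriptors:
    [eta'_F]_8, [eta'_F]_Cl, sum(beta'_s), sum(alpha), [eta'_F]_NO2, sum(epsilon)/N.
    """
    return (
        -5.073
        + 12.498 * eta_f_8
        + 1.077 * eta_f_cl
        - 8.406 * hinge(sum_beta_s - 1.13636)       # penalises sum(beta'_s) above 1.13636
        + 0.666 * sum_alpha
        + 0.186 * hinge(eta_f_no2 - 0.0)            # active only when a nitro contribution exists
        + 2.752 * hinge(sum_eps_over_n - 0.653158)  # rewards sum(epsilon)/N above 0.653158
    )
```

Under this convention the Σβ′s term is zero for values up to 1.13636 and lowers the predicted pIGC50 above that knot, while the Σε/N term is zero up to 0.653158 and raises the prediction above it.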

3.6. Models with log Ko/w descriptor

We have also modeled the toxicity using ETA, non-ETA and combined sets of descriptors along with log Ko/w as a physicochemical descriptor. Table 4 shows only selected models involving log Ko/w. Other models are listed in Table S4 in the Supplementary Materials section. It was observed that when models are developed with the n-octanol/water partition coefficient (log Ko/w) as one of the predictor variables, the quality of all types of models is better than that of the corresponding models developed without using log Ko/w. On the basis of the r²m(overall) metric, it was found that the G/PLS spline method provided the best models using ETA, non-ETA and combined sets of descriptors when used along with log Ko/w. The r²m(overall) values for the best models are 0.837 (ETA + log Ko/w), 0.886 (non-ETA + log Ko/w) and 0.877 (combined + log Ko/w), which are higher than the best r²m(overall) values for models developed without log Ko/w as a descriptor (0.683, 0.683 and 0.696 respectively). Hence it can be concluded that the n-octanol/water partition coefficient improves the model quality when used with other topological descriptors and that it is an important parameter in predicting toxicity of aromatic aldehydes. A similar observation was also made in ref. [22].

4. Overview of the models

From the comparison of all quality metric parameters calculated for all models, it was observed that ETA models produced comparable results with respect to the non-ETA models in predicting toxicity of aromatic aldehydes. The physicochemical parameter log Ko/w improved model quality in both cases and therefore is essential in the prediction of toxicity. From the best ETA model (selected on the basis of r²m(overall); model number 4, G/PLS spline), it was found that the aldehydic carbonyl oxygen functionality imparts a positive contribution towards the toxicity. A similar positive contribution is exerted by the chlorine atom (lipophilicity) and the nitro group (NO2) as substituents and by the molecular bulk of the compounds in producing toxic effects. The contribution of Σβ′s (a measure of electronegative atom count) is negative when its value is greater than 1.13636, i.e., a compound showing a Σβ′s value greater than 1.13636 will be less toxic. A threshold relationship is likewise obtained for the contribution of the electronegativity measure, Σε/N, where a value greater than 0.653158 positively contributes towards the toxicity. It appears that there should be a required number of hetero-atoms in the molecules relative to the molecular size for exerting the toxicity.

The robustness of all 30 related models was tested by performing the process randomization (Y-scrambling) operation. The randomization operation was performed in Cerius 2 (version 4.10) [52] software at the confidence level of 95% (the variable selection strategy was repeated at each cycle of randomization). From this test, we obtained the mean randomized correlation coefficient Rr. By using this Rr value and the non-randomized correlation coefficient value R, we calculated the parameter cR²p as defined in the Supplementary Materials section. Table S5 in the Supplementary Materials section shows the detailed results of the process randomization test. The values of cR²p for all the models varied from a minimum of 0.522 (model 7, non-ETA, GFA spline) to a maximum of 0.853 (model 17, ETA + log Ko/w, GFA spline), suggesting that the non-random models were not obtained by chance because all the models passed the threshold value of cR²p, i.e., 0.5 (Table S5 in the Supplementary Materials section).
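The randomization test can be sketched as follows. Since the definition of cR²p is given only in the Supplementary Materials, the code assumes the commonly reported form cR²p = R × √(R² − Rr²), with Rr the mean correlation coefficient of the scrambled models. It also simplifies the procedure: an ordinary least-squares fit on a fixed descriptor set replaces the Cerius 2 models, so variable selection is not repeated in each cycle, and the default of 19 permutations is an assumption about how a 95% confidence level maps to the number of trials. All function names are hypothetical.

```python
import numpy as np

def c_r2p(r, r_rand):
    """Assumed definition: cR2p = R * sqrt(R^2 - Rr^2), floored at zero."""
    return r * np.sqrt(max(r ** 2 - r_rand ** 2, 0.0))

def fit_r(X, y):
    """Correlation coefficient R between y and its least-squares fit on X."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return np.corrcoef(y, A @ coef)[0, 1]

def y_scramble(X, y, n_trials=19, seed=0):
    """Return (R, mean randomized R, cR2p) for a fixed descriptor matrix X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    r = fit_r(X, y)
    # Refit the model on randomly permuted responses and average R over the trials
    r_rand = float(np.mean([fit_r(X, rng.permutation(y)) for _ in range(n_trials)]))
    return r, r_rand, c_r2p(r, r_rand)
```

A model passes the test used here when the returned cR²p exceeds 0.5, that is, when the scrambled responses cannot be fitted nearly as well as the true ones.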

5. Comparison with previously reported models on this dataset

Netzeva and Schultz [4] developed QSAR models on the dataset of the present work using maximum acceptor superdelocalizability and the octanol–water partition coefficient (log Ko/w) as predictor variables and reported the quality of the model on the basis of internal validation parameters. Their best model for 77 compounds showed a model R² value of 0.805 and a corresponding cross-validated R² of 0.789 [4]. Kar et al. [22] developed QSAR models on the same dataset using QTMS descriptors along with log Ko/w and reported the quality of their model in terms of internal as well as external validation parameters. The best model in that work showed an R² value of 0.829, a leave-one-out cross-validated R² (Q²(LOO)) value of 0.804 (internal validation parameter), and a predictive R² (R²pred) value of 0.886 (external validation parameter) [22]. In our present work, we have reported both internal and external validation parameters along with the overall validation parameter. Table 5 shows a comparison of the presently developed models with those reported earlier. The comparison shows higher predictive ability of the models presently developed by us than those reported previously, as evidenced by the values of internal and external validation parameters. It is interesting to note that some of the models developed here using only topological parameters along with log Ko/w are better in statistical quality than those reported previously [22] and developed using, along with log Ko/w, quantum chemical descriptors calculated at a sufficiently high level of theory. Topological descriptors may be computed in a straightforward manner, making them less computer intensive than quantum chemical descriptors. Thus, the models developed and reported here may be preferred over the previously reported models involving quantum chemical descriptors in terms of usability and reproducibility in addition to predictability of models.

Table 5
Comparison of statistical parameters of the models developed in the present work with previously reported models.

Reference | Level of theory for quantum chemical descriptors | Modeling technique | Model/Eq. no. | n(training) | n(test) | No. of descriptors | Descriptors | R² | Q²int | R²pred | r²m(test) | r²m(overall)
Netzeva and Schultz [4] | AM1 | – | Eq. (9) Ref [26] | 77 (whole set) | – | 2 | Amax, log Ko/w | 0.805 | 0.789 | – | – | –
Kar et al. [22] | HF/6-31G(d) | – | Model 19 Ref [24] | 58 | 19 | 3 | �3, log Ko/w | 0.829 | 0.804 | 0.886 | 0.780 | 0.788
The present work | (Topological descriptors) | GFA linear | Model 26 | 58 | 19 | 4 | SC-3_C, S_sOH, [β′s], log Ko/w | 0.894 | 0.872 | 0.911 | 0.890 | 0.814
The present work | (Topological descriptors) | G/PLS spline | Model 29 | 58 | 19 | 4 | ⟨3.20702 − [η′F]local⟩, log Ko/w, ⟨0.15882 − [η′F]OH⟩, ²χ | 0.895 | 0.870 | 0.902 | 0.896 | 0.877

6. Conclusion

In this work, we have developed QSTR models for T. pyriformis toxicity data of a large diverse set of aromatic aldehydes by dividing the dataset into a training set of 58 and a test set of 19 compounds. We have calculated various ETA parameters for all the 77 compounds and developed suitable models by using GFA, G/PLS and stepwise regression as statistical techniques. The models were rigorously validated by internal, external and overall validation techniques. In addition to the ETA models, various non-ETA models were also developed for comparison. The n-octanol/water partition coefficient (log Ko/w) was also employed as a physicochemical descriptor, in addition to ETA and/or non-ETA descriptors, to account for the importance of hydrophobicity in the prediction of toxicity of aldehydic chemicals. From the results, it was observed that ETA models provided robust prediction of the toxicity of the aldehydes and the statistical quality of the models was comparable with that of the corresponding non-ETA models. Finally, incorporation of the partition coefficient (log Ko/w) as a descriptor showed improved model quality, signifying its obvious importance in predicting toxicity of aromatic aldehydes. From the ETA models it was found that toxicity of the aromatic aldehydes increases with the increase in molecular bulk of the compounds and with the presence of electronegative substituents (like O, N, F) and lipophilic substituents, e.g., chlorine. The aldehydic oxygen was also important for toxicity. Some of the models reported here are better in statistical quality than previously reported models involving complex quantum chemical descriptors, which require more computational time than topological descriptors. Thus, the models reported here may be used more conveniently than the previously reported models, with better confidence of prediction accuracy.

Acknowledgement

Financial assistance from the Ministry of Human Resource Development, Govt. of India, New Delhi in the form of a scholarship to RND is thankfully acknowledged.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.jhazmat.2010.07.116.

References

[1] J.R. Seward, E.L. Hamblen, T.W. Schultz, Regression comparisons of Tetrahymena pyriformis and Poecilia reticulata toxicity, Chemosphere 47 (2002) 93–101.

[2] A.O. Aptula, D.W. Roberts, Mechanistic applicability domains for non-animal based prediction of toxicological end points: general principles and application to reactive toxicity, Chem. Res. Toxicol. 19 (2006) 1097–1105.


[3] F.M. Christensen, J.H.M. de Bruijn, B.G. Hansen, S.J. Munn, B. Sokull-Klüttgen, F. Pedersen, Assessment tools under the new European Union chemicals policy, GMI 41 (2003) 5–19, http://ecb.jrc.ec.europa.eu/documents/REACH/PUBLICATIONS/Assessment tools REACH040122.pdf (last accessed on 14 June 2010).

[4] T.I. Netzeva, T.W. Schultz, QSARs for the aquatic toxicity of aromatic aldehydes from Tetrahymena data, Chemosphere 61 (2005) 1632–1643.

[5] A.D. De Weese, T.W. Schultz, Structure–activity relationships for aquatic toxicity to Tetrahymena: halogen-substituted aliphatic esters, Environ. Toxicol. 16 (2001) 54–60.

[6] C.M. Auer, J.V. Nabholz, K.P. Baetcke, Mode of action and the assessment of chemical hazards in the presence of limited data: use of structure–activity relationships (SAR) under TSCA, Section 5, Environ. Health Perspect. 87 (1990) 183–197.

[7] S.P. Bradbury, Quantitative structure–activity relationships and ecological risk assessment: an overview of predictive aquatic toxicology research, Toxicol. Lett. 79 (1995) 229–237.

[8] J.D. McKinney, A. Richard, C. Waller, M.C. Newman, F. Gerberick, The practice of structure activity relationships (SAR) in toxicology, Toxicol. Sci. 56 (2000) 8–17.

[9] T.W. Schultz, M.T.D. Cronin, J.D. Walker, A.O. Aptula, Quantitative structure–activity relationships (QSARs) in toxicology: a historical perspective, J. Mol. Struct. (Theochem.) 622 (2003) 1–22.

[10] K. Pirselova, S. Balaz, T.W. Schultz, Model-based QSAR for ionizable compounds: toxicity of phenols against Tetrahymena pyriformis, Arch. Environ. Contam. Toxicol. 30 (1996) 170–177.

[11] T.W. Schultz, A.P. Bearden, J.S. Jaworska, A novel approach for estimating toxicity of phenols, SAR QSAR Environ. Res. 5 (1996) 99–112.

[12] M.T.D. Cronin, A.O. Aptula, J.C. Duffy, T.I. Netzeva, P.H. Rowe, I.V. Valkova, T.W. Schultz, Comparative assessment of methods to develop QSARs for the prediction of the toxicity of phenols to Tetrahymena pyriformis, Chemosphere 49 (2002) 1201–1221.

[13] J. Damborsky, T.W. Schultz, Comparison of the QSAR models for toxicity and biodegradability of anilines and phenols, Chemosphere 34 (1997) 429–446.

[14] T.W. Schultz, Structure–toxicity relationships for benzene evaluated with Tetrahymena pyriformis, Chem. Res. Toxicol. 12 (1999) 1262–1267.

[15] M.T.D. Cronin, T.W. Schultz, Structure–toxicity relationships for phenols to Tetrahymena pyriformis, Chemosphere 32 (1996) 1453–1468.

[16] Y.H. Zhao, X. Yuan, L.M. Su, W.C. Qin, M.H. Abraham, Classification of toxicity of phenols to Tetrahymena pyriformis and subsequent derivation of QSARs from hydrophobic, ionization and electronic parameters, Chemosphere 75 (2009) 866–871.

[17] Y.H. Zhao, X.J. Zhang, Y. Wen, F.T. Sun, Z. Guo, W.C. Qin, H.W. Qin, J.L. Xu, L.X. Sheng, M.H. Abraham, Toxicity of organic chemicals to Tetrahymena pyriformis: effect of polarity and ionization on toxicity, Chemosphere 79 (2010) 72–77.

[18] D.R. Roy, R. Parthasarathi, B. Maiti, V. Subramanian, P.K. Chattaraj, Electrophilicity as a possible descriptor for toxicity prediction, Bioorg. Med. Chem. 13 (2005) 3405–3412.

[19] P.J. O’Brien, A.G. Siraki, N. Shangari, Theme: industrial dust & chemical toxicology, Crit. Rev. Toxicol. 35 (2005) 609–662.

[20] R.C. Prince, D.E. Gunson, Just plain vanilla? Trends Biochem. Sci. 19 (1994) 521–1521.

[21] I. Kahn, S. Sild, U. Maran, Modeling the toxicity of chemicals to Tetrahymena pyriformis using heuristic multi-linear regression and heuristic back-propagation in neural networks, J. Chem. Inf. Model. 47 (2007) 2271–2279.

[22] S. Kar, A.P. Harding, K. Roy, P.L.A. Popelier, QSAR with quantum topological molecular similarity indices: toxicity of aromatic aldehydes to Tetrahymena pyriformis, SAR QSAR Environ. Res. 21 (2010) 149–168.

[23] K. Roy, G. Ghosh, Introduction of extended topochemical atom (ETA) Indices in the valence electron mobile (VEM) environment as tools for QSAR/QSPR studies, Internet Electron. J. Mol. Des. 2 (2003) 599–620.

[24] K. Roy, G. Ghosh, QSTR with extended topochemical atom indices. 2. Fish toxicity of substituted benzenes, J. Chem. Inf. Comput. Sci. 44 (2004) 559–567.

[25] K. Roy, G. Ghosh, QSTR with extended topochemical atom indices. 3. Toxicity of nitrobenzenes to Tetrahymena pyriformis, QSAR Comb. Sci. 23 (2004) 99–108.

[26] K. Roy, G. Ghosh, QSTR with extended topochemical atom indices. 4. Modeling of the acute toxicity of phenylsulfonyl carboxylates to Vibrio fischeri using principal component factor analysis and principal component regression analysis, QSAR Comb. Sci. 23 (2004) 526–535.

[27] K. Roy, G. Ghosh, QSTR with extended topochemical atom indices. Part 5. Modeling of the acute toxicity of phenylsulfonyl carboxylates to Vibrio fischeri using genetic function approximation, Bioorg. Med. Chem. 13 (2005) 1185–1194.

[28] K. Roy, G. Ghosh, QSTR with extended topochemical atom (ETA) indices. VI. Acute toxicity of benzene derivatives to tadpoles (Rana japonica), J. Mol. Model. 12 (2006) 306–316.

[29] K. Roy, I. Sanyal, QSTR with extended topochemical atom indices. 7. QSAR of substituted benzenes to Saccharomyces cerevisiae, QSAR Comb. Sci. 25 (2006) 359–371.

[30] K. Roy, G. Ghosh, QSTR with extended topochemical atom (ETA) indices. 8. QSAR for the inhibition of substituted phenols on germination rate of Cucumis sativus using chemometric tools, QSAR Comb. Sci. 25 (2006) 846–859.

[31] K. Roy, G. Ghosh, QSTR with extended topochemical atom (ETA) indices. 9. Comparative QSAR for the toxicity of diverse functional organic compounds to Chlorella vulgaris using chemometric tools, Chemosphere 70 (2007) 1–12.

[32] K. Roy, G. Ghosh, QSTR with extended topochemical atom indices. 10. Modeling of toxicity of organic chemicals to humans using different chemometric tools, Chem. Biol. Drug Des. 72 (2008) 383–394.


[33] K. Roy, G. Ghosh, QSTR with extended topochemical atom (ETA) indices. 11. Comparative QSAR of acute NSAID cytotoxicity in rat hepatocytes using chemometric tools, Mol. Simul. 35 (2009) 648–659.

[34] K. Roy, G. Ghosh, QSTR with extended topochemical atom (ETA) indices. 12. QSAR for the toxicity of diverse aromatic compounds to Tetrahymena pyriformis using chemometric tools, Chemosphere 77 (2009) 999–1009.

[35] K. Roy, G. Ghosh, QSTR with extended topochemical atom (ETA) Indices. 13. Modeling of hERG K+ channel blocking activity of diverse functional drugs using different chemometric tools, Mol. Simul. 35 (2009) 1256–1268.

[36] K. Roy, I. Sanyal, P.P. Roy, QSPR of the bioconcentration factors of nonionic organic compounds in fish using extended topochemical atom (ETA) indices, SAR QSAR Environ. Res. 17 (2006) 563–582.

[37] K. Roy, I. Sanyal, G. Ghosh, QSPR of n-octanol/water partition coefficient of nonionic organic compounds using extended topochemical atom (ETA) indices, QSAR Comb. Sci. 25 (2006) 629–646.

[38] D.K. Pal, C. Sengupta, A.U. De, A new topochemical descriptor (TAU) in molecular connectivity concept: part I – aliphatic compounds, Indian J. Chem. 27B (1988) 734–739.

[39] D.K. Pal, C. Sengupta, A.U. De, Introduction of a novel topochemical index and exploitation of group connectivity concept to achieve predictability in QSAR and RDD, Indian J. Chem. 28B (1989) 261–267.

[40] D.K. Pal, M. Sengupta, C. Sengupta, A.U. De, QSAR with TAU (t) indices: part I – polymethylene primary diamines as amebicidal agents, Indian J. Chem. 29B (1990) 451–454.

[41] D.K. Pal, S.K. Purkayastha, C. Sengupta, A.U. De, Quantitative structure–property relationships with TAU indices: part I – research octane numbers of alkane fuel molecules, Indian J. Chem. 31B (1992) 109–114.

[42] K. Roy, D.K. Pal, A.U. De, C. Sengupta, Comparative QSAR with molecular negentropy molecular connectivity, STIMS and TAU indices: part I. Tadpole narcosis of diverse functional acyclic compounds, Indian J. Chem. 38B (1999) 664–671.

[43] K. Roy, D.K. Pal, A.U. De, C. Sengupta, Comparative QSAR studies with molecular negentropy, molecular connectivity, STIMS and TAU indices. Part II: general anaesthetic activity of aliphatic hydrocarbons, halocarbons and ethers, Indian J. Chem. 40B (2001) 129–135.

[44] K. Roy, A. Saha, Comparative QSPR studies with molecular connectivity, molecular negentropy and TAU Indices. Part I: molecular thermochemical properties of diverse functional acyclic compounds, J. Mol. Model. 9 (2003) 259–270.

[45] K. Roy, A. Saha, Comparative QSPR studies with molecular connectivity, molecular negentropy and TAU indices. Part 2: lipid–water partition coefficient of diverse functional acyclic compounds, Internet Electron. J. Mol. Des. 2 (2003) 288–305.

[46] K. Roy, A. Saha, QSPR with TAU indices: water solubility of diverse functional acyclic compounds, Internet Electron. J. Mol. Des. 2 (2003) 475–491.

[47] K. Roy, S. Chakroborty, C.C. Ghosh, A. Saha, QSPR with TAU indices: molar thermochemical properties of diverse functional acyclic compounds, J. Indian Chem. Soc. 81 (2004) 115–125.

[48] K. Roy, A. Saha, QSPR with TAU indices: boiling points of sulfides and thiols, Indian J. Chem. 43A (2004) 1369–1376.

[49] K. Roy, A. Saha, QSPR with TAU indices: molar refractivity of diverse functional acyclic compounds, Indian J. Chem. 44B (2005) 1693–1707.

[50] K. Roy, A. Saha, QSPR with TAU indices: part 5. Liquid heat capacity of diverse functional organic compounds, J. Indian Chem. Soc. 83 (2006) 351–355.

[51] GW-BASIC programs RRR98, KRETA1, KRETA2, KRPRES1 and KRPRES2 were developed by Kunal Roy and standardized using known data sets.

[52] The Cerius 2 version 4.10 is a product of Accelrys Inc., San Diego, CA, USA, Available at http://www.accelrys.com/cerius2, 2005.

[53] R.B. Darlington, Regression and Linear Models, McGraw-Hill, New York, 1990.

[54] H. Wold, Research Papers in Statistics, Wiley, New York, NY, USA, 1966.

[55] H. Jores-Kong, H. Wold, Systems Under Indirect Observation: Causality, Structure, Prediction, North-Holland, Amsterdam, The Netherlands, 1982.

[56] S. Wold, PLS for multivariate linear modeling, in: H. van de Waterbeemd (Ed.), Chemometric Methods in Molecular Design, VCH, Weinheim, 1995, pp. 195–218.

[57] G.M.S. da Silva, C.M. Sant’Anna, E. Barreiro, A novel 3D-QSAR comparative molecular field analysis (CoMFA) model of imidazole and quinazolinone functionalized p38 MAP kinase inhibitors, Bioorg. Med. Chem. 12 (2004) 3159–3166.

[58] S.S. Kulkarni, V.M. Kulkarni, Three-dimensional quantitative structure–activity relationship of interleukin 1-beta converting enzyme inhibitors: a comparative molecular field analysis study, J. Med. Chem. 42 (1999) 373–380.

[59] J. Holland, Adaptation in Artificial and Natural Systems, University of Michigan Press, Ann Arbor, MI, 1975.

[60] J. Friedman, Multivariate Adaptive Regression Splines, Technical Report No. 102, Laboratory for Computational Statistics, Department of Statistics, Stanford University, Stanford, CA, Nov 1988 (revised Aug 1990).

[61] D. Rogers, A.J. Hopfinger, Application of genetic function approximation to quantitative structure–activity relationships and quantitative structure–property relationships, J. Chem. Inf. Comput. Sci. 34 (1994) 854–866.

[62] Y. Fan, L.M. Shi, K.W. Kohn, Y. Pommier, J.N. Weinstein, Quantitative structure antitumor activity relationships of camptothecin analogues: cluster analysis and genetic algorithm-based studies, J. Med. Chem. 44 (2001) 3254–3263.

[63] T. Kimura, K. Hasegawa, K. Funatsu, GA strategy for variable selection in QSAR studies: GA-based region selection for CoMFA modeling, J. Chem. Inf. Comput. Sci. 38 (1998) 276–282.

[64] MINITAB is a statistical software of Minitab Inc., USA, <http://www.minitab.com>.

[65] STATISTICA is a statistical software of STATSOFT Inc., USA, <http://www.statsoft.com/>.

[66] G.W. Snedecor, W.G. Cochran, Statistical Methods, Oxford & IBH Publishing Co. Pvt. Ltd., New Delhi, 1967.

[67] P.P. Roy, S. Paul, I. Mitra, K. Roy, On two novel parameters for validation of predictive QSAR models, Molecules 14 (2009) 1660–1701.

[68] V. Consonni, D. Ballabio, R. Todeschini, Evaluation of model predictive ability by external validation techniques, J. Chemom. 24 (2010) 194–201.

[69] D.M. Hawkins, The problem of overfitting, J. Chem. Inf. Comput. Sci. 44 (2004) 1–12.

[70] G. Schuurmann, R.-U. Ebert, J. Chen, B. Wang, R. Kuhne, External validation and prediction employing the predictive squared correlation coefficient-test set activity mean vs training set activity mean, J. Chem. Inf. Model. 48 (2008) 2140–2145.

[71] V. Consonni, D. Ballabio, R. Todeschini, Comments on the definition of the Q² parameter for QSAR validation, J. Chem. Inf. Model. 49 (2009) 1669–1678.

[72] P.P. Roy, K. Roy, On some aspects of variable selection for partial least squares regression models, QSAR Comb. Sci. 27 (2008) 302–313.

[73] I. Mitra, A. Saha, K. Roy, Quantitative structure–activity relationship modeling of antioxidant activities of hydroxybenzalacetones using quantum chemical, physicochemical and spatial descriptors, Chem. Biol. Drug Des. 73 (2009) 526–536.

[74] K. Roy, P.L.A. Popelier, Exploring predictive QSAR models for hepatocyte toxicity of phenols using QTMS descriptors, Bioorg. Med. Chem. Lett. 18 (2008) 2604–2609.

[75] K. Roy, P.L.A. Popelier, Exploring predictive QSAR models using quantum topological molecular similarity (QTMS) descriptors for toxicity of nitroaromatics to Saccharomyces cerevisiae, QSAR Comb. Sci. 27 (2008) 1006–1012.

[76] K. Roy, P. Popelier, Predictive QSPR modeling of acidic dissociation constant (pKa) of phenols in different solvents, J. Phys. Org. Chem. 22 (2009) 186–196.

[77] I. Mitra, P.P. Roy, S. Kar, P.K. Ojha, K. Roy, On further application of r²m as a metric for validation of QSAR models, J. Chemom. 24 (2010) 22–33.

[78] A. Nargotra, S. Sharma, J.L. Koul, P.L. Sangwan, I.A. Khan, A. Kumar, S.C. Taneja, S. Koul, Quantitative structure activity relationship (QSAR) of piperine analogs for bacterial NorA efflux pump inhibitors, Eur. J. Med. Chem. 44 (2009) 4128–4135.

[79] R. Hu, J.-P. Doucet, M. Delamar, R. Zhang, QSAR models for 2-amino-6-arylsulfonylbenzonitriles and congeners HIV-1 reverse transcriptase inhibitors based on linear and nonlinear regression methods, Eur. J. Med. Chem. 44 (2009) 2158–2171.

[80] A.A. Toropov, A.P. Toropova, E. Benfenati, QSPR modeling bioconcentration factor (BCF) by balance of correlations, Eur. J. Med. Chem. 44 (2009) 2544–2551.

[81] A. Basu, K. Jasu, V. Jayaprakash, N. Mishra, P. Ojha, S. Bhattacharya, Development of CoMFA and CoMSIA models of cytotoxicity data of anti-HIV-1-phenylamino-1H-imidazole derivatives, Eur. J. Med. Chem. 44 (2009) 2400–2407.

[82] Z. Gong, B. Xia, R. Zhang, X. Zhang, B. Fan, Quantitative structure–activity relationship study on fish toxicity of substituted benzenes, QSAR Comb. Sci. 27 (2008) 967–976.

[83] K. Roy, S. Paul, Exploring 2D and 3D QSARs of 2,4-diphenyl-1,3-oxazolines for ovicidal activity against Tetranychus urticae, QSAR Comb. Sci. 28 (2009) 406–425.

[84] R. Todeschini, Milano Chemometrics, Italy 2010 (Personal Communication).