
QSTR with Extended Topochemical Atom Indices. 7.* QSAR of Substituted Benzenes to Saccharomyces cerevisiae

Kunal Roy and Indrani Sanyal

Drug Theoretics and Cheminformatics Lab, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Faculty of Engineering and Technology, Jadavpur University, Kolkata 700 032, India, E-mail: [email protected], [email protected], URL: http://www.geocities.com/kunalroy_in

Keywords: QSAR; QSTR; ETA; Topological indices; Toxicology; FA-MLR; PLS; PCRA

Received: September 12, 2005; Accepted: November 7, 2005

DOI: 10.1002/qsar.200530172

Abstract

Because the experimental determination of the toxicological properties of commercial chemicals is a costly and time-consuming process, there is a need for mathematical predictive tools that quantify such properties theoretically. Against this background, we have modeled the nonspecific toxicity of 51 substituted benzenes to the yeast Saccharomyces cerevisiae using extended topochemical atom (ETA) indices. Principal component factor analysis (FA) was used as a data-preprocessing step to reduce the dimensionality of the data matrix and to identify important variables that are free of collinearity. Multiple linear regression (MLR) analyses show that the best ETA model has the following statistical quality: n = 51, Q² = 0.851, R²a = 0.874, R = 0.940, F = 87.9 (df 4, 46), s = 0.235, PRESS = 3.3. We have also modeled the toxicity data using other topological descriptors, including the Wiener, Hosoya Z, molecular connectivity, kappa shape, Balaban J and E-state parameters, as well as physicochemical parameters such as AlogP98, MolRef, H_bond_acceptor and H_bond_donor. The best such model has the following quality: n = 51, Q² = 0.837, R²a = 0.855, R = 0.929, F = 98.9 (df 3, 47), s = 0.253, PRESS = 3.6. Using a combined set of ETA and non-ETA parameters gives the following results: n = 51, Q² = 0.824, R²a = 0.852, R = 0.940, F = 73.0 (df 4, 46), s = 0.255, PRESS = 3.9. Besides FA-MLR, stepwise regression and partial least squares (PLS) analysis were used as additional statistical tools. The ETA models suggest negative contributions from the functionality of amino and carboxylic acid substituents on the benzene ring and from the presence of electronegative atoms, and positive contributions from branching and from the functionality of the chloro substituent. Principal component regression analysis (PCRA) with factor scores as independent variables gave Q² values of 0.926, 0.878 and 0.869 and R² values of 0.942, 0.903 and 0.899 for factor scores derived from the ETA, non-ETA and combined matrices, respectively. Thus, the ETA descriptors appear to have significant potential in QSAR/QSPR/QSTR studies, which warrants extensive evaluation.

1 Introduction

The effect of hazardous chemicals and pollutants on the ecosystem is a matter of great concern, considering that, although a large number of chemical compounds (to the tune of tens of thousands) are in commercial use, relatively few of these have been subjected to adequate assessment of their hazardous environmental properties [1]. Evaluating environmentally safe levels of dangerous chemicals requires a set of toxicological data on organisms representative of the ecosystems, which is often unavailable or inadequate [2]. Faced with the task of screening a large number of chemicals for an increasing array of toxicity endpoints with limited resources, quantitative structure–activity relationships (QSARs) have been used in many diverse problem settings as a complement to experimental data [3]. In a QSAR model, an attempt is made to relate a particular biological activity of a representative set of molecules to molecular descriptors by means of a statistical method such as regression analysis, partial least squares (PLS) analysis or neural networks. The design of powerful QSAR models requires the selection of an appropriate statistical method and an accurate description of the molecules. With advances in computational efficiency and toxicological understanding, QSARs are evolving into a popular and indispensable tool for assessing the potential toxic effects of organic chemicals. Considering the ever-increasing production of new chemicals, and the need to optimize resources to assess thousands of existing chemicals in commerce, regulatory agencies have turned to QSARs as essential tools to help prioritize tiered risk assessments when empirical data are not available to evaluate toxicological effects [4]. Progress in designing scientifically credible QSARs is intimately associated with the development of empirically derived databases of well-defined and quantified toxicity endpoints, based on a strategic evaluation of diverse sets of chemical structures, modes of toxic action, and species. The US Environmental Protection Agency (EPA) designed ASTER (Assessment Tools for the Evaluation of Risk), which integrates AQUIRE (a database of aquatic toxicity) and QSAR (a database of physicochemical properties and QSAR models), to assist regulators in producing assessments [5]. Apart from predicting ecological and human health effects, QSARs are being used to help industry design safer chemicals for commercial use [3]. QSARs have also been used to explore the mechanisms of toxic action of chemicals [6].

QSAR Comb. Sci. 25, 2006, No. 4, 359 – 371 © 2006 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim 359

* For parts 1 – 6, vide references 22 – 27.

Full Papers

Different QSAR methods have been used for ecotoxicological risk assessment. A QSAR model based on the atom-type E-state index has been reported for predicting the toxicity to fathead minnow of a diverse set of 140 organic chemicals [7]. The toxicity of organic chemicals to Chlorella vulgaris was modeled using multiple linear regression and k-nearest neighbours methods [8]. Classification techniques such as discriminant analysis, quadratic discriminant analysis and regularized discriminant analysis have been used for toxicity prediction [9]. The toxicity of diverse aromatic compounds to Tetrahymena pyriformis was modeled using partial least squares and a Bayesian regularized neural network [10, 11]. 3-D QSAR studies on the toxicity of chlorophenols have been reported using the CATALYST software [12]. Mazzatorta et al. have modeled the toxicity of 562 organic chemicals using neural and fuzzy-neural networks [13]. Recently, Huuskonen has modeled the toxicity of organic chemicals using the E-state index [14]. Physicochemical descriptors have been used by Kulkarni et al. [15] to design eco-friendly molecules with lower toxicity. QSARs for predicting aqueous toxicity to Poecilia reticulata were developed with CODESSA by Katritzky et al. [16]. A QSAR was built with weighted holistic invariant molecular (WHIM) indices as well as physicochemical parameters by Di Marzio et al. [17]. Principal component analysis was used as a classification tool for the toxicity data of dangerous chemicals by Vighi et al. [18]. Basak et al. [19] have used H-QSAR for predicting the toxicity of chemicals. Devillers [20] has derived a general model for predicting the acute toxicity of pesticides.

In the present work, we have modeled the nonspecific toxicity data of 51 substituted benzenes against the yeast Saccharomyces cerevisiae [21] using the recently introduced extended topochemical atom (ETA) indices [22–27], which were developed in the valence electron mobile (VEM) environment as an extension of the TAU concept [28–39]. For this purpose we have used multiple linear regression, stepwise regression analysis, PLS analysis and principal component regression analysis (PCRA) as the statistical tools. The best relation obtained has been compared with models based on selected topological and physicochemical descriptors and also with models reported previously.

2 Materials and methods

Definitions of some of the basic parameters used in the ETA scheme are given below.

2.1 The core count (α)

The core count α for a nonhydrogen vertex was defined as [22]

$$\alpha = \frac{Z - Z^{v}}{Z^{v}} \cdot \frac{1}{PN - 1} \tag{1}$$

In Eq. 1, Z and Zv represent the atomic number and the number of valence electrons, respectively, and PN stands for the period number. The hydrogen atom is taken as the reference, so α for hydrogen is zero. The α values of the atoms commonly found in organic compounds are highly correlated (r = 0.946) with (uncorrected) van der Waals volume. Thus, the sum Σα over all nonhydrogen atoms of a molecule (instead of the vertex count NV) may be taken as a gross measure of molecular bulk.
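Eq. 1 is easy to evaluate by hand, but a short function makes the arithmetic explicit. The Python sketch below assumes standard periodic-table values for (Z, Zv, PN); the names `ATOMS` and `core_count` are illustrative and are not taken from the ETA papers or the KRETA programs.

```python
# Minimal sketch of the ETA core count alpha (Eq. 1).
# ATOMS maps element symbol -> (atomic number Z, valence electrons Zv, period PN);
# these are standard periodic-table values, not data from the paper.
ATOMS = {
    "C": (6, 4, 2),
    "N": (7, 5, 2),
    "O": (8, 6, 2),
    "F": (9, 7, 2),
    "Cl": (17, 7, 3),
    "Br": (35, 7, 4),
}


def core_count(symbol: str) -> float:
    """alpha = (Z - Zv) / Zv * 1 / (PN - 1); hydrogen is the reference (alpha = 0)."""
    if symbol == "H":
        return 0.0
    z, zv, pn = ATOMS[symbol]
    return (z - zv) / zv / (pn - 1)


for sym in ("C", "O", "Cl", "Br"):
    print(sym, round(core_count(sym), 3))
```

For 4-chlorophenol this gives 0.5 for each ring carbon, 1/3 for oxygen and 10/14 ≈ 0.714 for chlorine, close to the αi values listed in Table 1.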

2.2 The electronegativity measure (ε)

We defined a term ε as a measure of electronegativity in the following manner [22]:

$$\varepsilon = -\alpha + 0.3\,Z^{v} \tag{2}$$

It was found [22] that ε correlates well (r = 0.937) with Pauling's electronegativity scale.
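Eq. 2 builds directly on Eq. 1. The sketch below, again assuming standard (Z, Zv, PN) values and illustrative function names, evaluates ε for the elements most relevant to this data set.

```python
# Sketch of the ETA electronegativity measure (Eq. 2): eps = -alpha + 0.3 * Zv.
# (Z, Zv, PN) triples are standard periodic-table values supplied for illustration.
ATOMS = {"H": (1, 1, 1), "C": (6, 4, 2), "N": (7, 5, 2), "O": (8, 6, 2), "Cl": (17, 7, 3)}


def core_count(symbol: str) -> float:
    """Eq. 1; hydrogen is the reference atom with alpha = 0."""
    if symbol == "H":
        return 0.0
    z, zv, pn = ATOMS[symbol]
    return (z - zv) / zv / (pn - 1)


def epsilon(symbol: str) -> float:
    """Eq. 2."""
    _, zv, _ = ATOMS[symbol]
    return -core_count(symbol) + 0.3 * zv


for sym in ("H", "C", "N", "O", "Cl"):
    print(sym, round(epsilon(sym), 3))
```

For these five elements the ε ordering (H < C < N < Cl < O) matches the ordering on the Pauling scale. Differences Δε between bonded atoms are what the VEM count compares against the 0.3 threshold.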

2.3 The VEM count β

The VEM count β of the ETA scheme was defined as [22]

$$\beta = \sum x_{\sigma} + \sum y_{\pi} + \delta \tag{3}$$

In the above equation, x is the contribution of a sigma bond (σ), with a value of 0.5 for two bonded atoms of similar electronegativity (Δε ≤ 0.3) and 0.75 for two bonded atoms of different electronegativity (Δε > 0.3). For pi bonds (π), the contribution y depends on the type of double bond: (i) for a pi bond between two atoms of similar electronegativity (Δε ≤ 0.3), y is taken to be 1; (ii) for a pi bond between two atoms of different electronegativity (Δε > 0.3) or for a conjugated (nonaromatic) pi system, y is 1.5; (iii) for an aromatic system, y is taken as 2. δ is a correction factor of 0.5 per atom with a lone pair of electrons capable of resonance with the aromatic ring (e.g., the nitrogen of aniline or the oxygen of phenol). For a given part (substructure) of a molecular graph, Σβs and Σβns may be calculated by considering all bonds in the substructure (sigma bonds for the former; pi bonds and lone pairs of electrons for the latter). Σβ's (defined as [Σβs]/NV, NV being the vertex count) may be taken as a relative measure of the number of electronegative atoms in the substructure, while Σβ'ns (defined as [Σβns]/NV) may be taken as a relative measure of the electron richness (unsaturation) of the substructure.
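The bond-contribution rules of Eq. 3 fit in a small function. This is a sketch under stated assumptions: the caller supplies the ε of the vertex and of each σ-bonded non-hydrogen neighbour (hydrogens are suppressed, as in Table 1), and labels each π contribution with one of three hypothetical keys; the dictionary and function names are illustrative.

```python
# Sketch of the VEM count beta (Eq. 3) for one vertex of a hydrogen-suppressed graph.
# The pi-type labels are hypothetical names for the three cases of Section 2.3.
PI_CONTRIBUTION = {
    "similar": 1.0,              # pi bond with delta-epsilon <= 0.3
    "polar_or_conjugated": 1.5,  # pi bond with delta-epsilon > 0.3, or non-aromatic conjugation
    "aromatic": 2.0,             # atom in an aromatic system
}


def vem_count(eps_i, neighbour_eps, pi_types=(), resonant_lone_pair=False):
    """Return (beta_s, beta_ns) for one vertex.

    eps_i: epsilon of the vertex; neighbour_eps: epsilon of each sigma-bonded
    non-hydrogen neighbour; pi_types: labels from PI_CONTRIBUTION;
    resonant_lone_pair: adds delta = 0.5 for a lone pair in resonance with an aromatic ring.
    """
    beta_s = sum((0.5 if abs(eps_i - e) <= 0.3 else 0.75 for e in neighbour_eps), 0.0)
    beta_ns = sum((PI_CONTRIBUTION[t] for t in pi_types), 0.0)
    if resonant_lone_pair:
        beta_ns += 0.5
    return beta_s, beta_ns


EPS_C, EPS_O, EPS_CL = 0.7, 1.4667, 1.3857  # from Eq. 2, rounded

ring_c_oh = vem_count(EPS_C, [EPS_C, EPS_C, EPS_O], ["aromatic"])  # ring carbon bearing OH
phenol_o = vem_count(EPS_O, [EPS_C], resonant_lone_pair=True)      # phenolic oxygen
aryl_cl = vem_count(EPS_CL, [EPS_C], resonant_lone_pair=True)      # aromatic chlorine
print(ring_c_oh, phenol_o, aryl_cl)  # each pair sums to beta for that vertex
```

These give β = 3.75 for the substituted ring carbon and 1.25 for both the oxygen and the chlorine, the values listed for 4-chlorophenol in Table 1.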

2.4 The VEM vertex count γ

The VEM vertex count γi of the ith vertex in a molecular graph was defined as [22]

$$\gamma_i = \frac{\alpha_i}{\beta_i} \tag{4}$$

In the equation above, αi stands for the α value of the ith vertex and βi stands for the VEM count considering all bonds connected to the atom and its lone pairs of electrons (if any).
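Eq. 4 is a per-vertex division. The sketch below applies it to the α and β values of 4-chlorophenol from Table 1, writing α for O and Cl as the exact fractions from Eq. 1; the function name is illustrative.

```python
# Sketch of the VEM vertex count gamma_i = alpha_i / beta_i (Eq. 4) for 4-chlorophenol.
# Vertex order follows Table 1: ring carbons 1-6, then O (7) and Cl (8).
alpha = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1 / 3, 10 / 14]
beta = [3.75, 3.0, 3.0, 3.75, 3.0, 3.0, 1.25, 1.25]


def vem_vertex_counts(alpha, beta):
    """Return gamma_i for every non-hydrogen vertex."""
    return [a / b for a, b in zip(alpha, beta)]


gamma = vem_vertex_counts(alpha, beta)
print([round(g, 3) for g in gamma])
```

With the exact α = 1/3 the oxygen value is about 0.267; the 0.26 listed in Table 1 is what α rounded to 0.33 gives (0.33/1.25 = 0.264).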

2.5 The composite index η

The composite index η was defined in the following manner [22]:

$$\eta = \sum_{i<j}\left[\frac{\gamma_i\gamma_j}{r_{ij}^{2}}\right]^{0.5} \tag{5}$$

In Eq. 5, rij stands for the topological distance between the ith and jth atoms. When all heteroatoms in the molecular graph are replaced by carbon and all multiple bonds by single bonds, the resulting molecular graph is considered the reference alkane, and its composite index value is designated ηR [22].

2.6 The functionality index ηF

Considering functionality as the presence of heteroatoms (atoms other than carbon or hydrogen) and multiple bonds, the functionality index ηF was calculated [22] as ηR − η.

To avoid dependence of functionality on vertex count or bulk, another term η'F was defined as ηF/NV.
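The composite index (Eq. 5), its reference-alkane counterpart and the functionality term of Section 2.6 can be recomputed for the 4-chlorophenol example of Table 1. The sketch below is not the KRETA1/KRETA2 code; the 0-based vertex numbering and function names are assumptions, and topological distances come from a Floyd–Warshall shortest-path pass.

```python
from itertools import combinations

# Hypothetical 0-based numbering of the hydrogen-suppressed graph of 4-chlorophenol:
# ring atoms 0-5, phenolic O = 6 bonded to atom 0, Cl = 7 bonded to atom 3.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7)]
N_V = 8


def distance_matrix(n, edges):
    """Topological distances r_ij (Floyd-Warshall on an unweighted graph)."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for a, b in edges:
        d[a][b] = d[b][a] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def composite_index(gamma, dist):
    """Eq. 5: sum over pairs i < j of (gamma_i * gamma_j / r_ij**2) ** 0.5."""
    return sum((gamma[i] * gamma[j] / dist[i][j] ** 2) ** 0.5
               for i, j in combinations(range(len(gamma)), 2))


# gamma = alpha / beta for 4-chlorophenol (alpha from Eq. 1, beta as in Table 1).
alpha = [0.5] * 6 + [1 / 3, 10 / 14]
beta = [3.75, 3, 3, 3.75, 3, 3, 1.25, 1.25]
gamma = [a / b for a, b in zip(alpha, beta)]

# Reference alkane: all vertices carbon (alpha = 0.5), all bonds single
# (0.5 per sigma bond), so gamma_ref = 0.5 / (0.5 * degree) = 1 / degree.
degree = [sum(v in edge for edge in EDGES) for v in range(N_V)]
gamma_ref = [1 / k for k in degree]

dist = distance_matrix(N_V, EDGES)
eta = composite_index(gamma, dist)
eta_r = composite_index(gamma_ref, dist)
eta_f = eta_r - eta            # functionality index (Section 2.6)
eta_f_prime = eta_f / N_V      # size-corrected functionality
print(f"eta={eta:.3f} eta_R={eta_r:.3f} eta'_F={eta_f_prime:.3f}")
```

With exact α fractions this gives η ≈ 3.074, ηR ≈ 8.394 and η'F ≈ 0.665, which agree with the Table 1 entries to within about 0.003.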

2.7 The atom level index

The contribution of a particular position or vertex to the functionality can be determined in the following manner [22]:

$$[\eta]_i = \sum_{j \neq i}\left[\frac{\gamma_i\gamma_j}{r_{ij}^{2}}\right]^{0.5} \tag{6}$$

In Eq. 6, [η]i stands for the contribution of the ith vertex to η.

Similarly, the contribution [ηR]i of the ith vertex to ηR can be computed. The contribution [ηF]i of the ith vertex to functionality was defined [22] as [ηR]i − [η]i. To avoid dependence of this value on NV, a related term [η'F]i was defined [22] as [ηF]i/NV.
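Eq. 6 can be evaluated vertex by vertex in the same way as Eq. 5. The sketch below, using the same hypothetical numbering as before (ring 0–5, O = 6 on atom 0, Cl = 7 on atom 3), computes [η], [ηR] and [η'F] for the chlorine vertex. Because each pair (i, j) is counted once for i and once for j, the [η]i of all vertices sum to 2η, which the last lines check.

```python
# Sketch of the atom-level index (Eq. 6) and vertex functionality for 4-chlorophenol.
# Vertex numbering (0-based) is an assumption: ring 0-5, O = 6 on atom 0, Cl = 7 on atom 3.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7)]
N_V = 8


def distance_matrix(n, edges):
    """Topological distances r_ij (Floyd-Warshall on an unweighted graph)."""
    inf = float("inf")
    d = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for a, b in edges:
        d[a][b] = d[b][a] = 1
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def atom_level(gamma, dist, i):
    """[eta]_i = sum over j != i of (gamma_i * gamma_j / r_ij**2) ** 0.5."""
    return sum((gamma[i] * gamma[j] / dist[i][j] ** 2) ** 0.5
               for j in range(len(gamma)) if j != i)


alpha = [0.5] * 6 + [1 / 3, 10 / 14]
beta = [3.75, 3, 3, 3.75, 3, 3, 1.25, 1.25]
gamma = [a / b for a, b in zip(alpha, beta)]
degree = [sum(v in edge for edge in EDGES) for v in range(N_V)]
gamma_ref = [1 / k for k in degree]  # reference alkane: gamma = 1 / degree
dist = distance_matrix(N_V, EDGES)

cl = 7
eta_cl = atom_level(gamma, dist, cl)
eta_r_cl = atom_level(gamma_ref, dist, cl)
eta_f_prime_cl = (eta_r_cl - eta_cl) / N_V
print(round(eta_cl, 3), round(eta_r_cl, 3), round(eta_f_prime_cl, 3))

# Every pair contributes to both of its vertices, so the vertex values sum to 2 * eta.
total = sum(atom_level(gamma, dist, i) for i in range(N_V))
```

The chlorine values come out near 0.94, 2.10 and 0.15, the vertex-8 entries of Table 1.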

2.8 The local index η_local

When only bonded interactions are considered (rij = 1), the corresponding composite index may be written as η_local:

$$\eta_{\mathrm{local}} = \sum_{i<j,\ r_{ij}=1}\left[\gamma_i\gamma_j\right]^{0.5} \tag{7}$$

In a similar way, η_local^R for the corresponding reference alkane may also be calculated. The local functionality contribution (without considering global topology), η_local^F, may be calculated as η_local^R − η_local.
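Because Eq. 7 keeps only bonded pairs, it needs the edge list but no distance matrix. The sketch below uses the same hypothetical 4-chlorophenol numbering as the earlier sketches.

```python
# Sketch of the local indices (Eq. 7) for 4-chlorophenol: only bonded pairs (r_ij = 1) count.
# Hypothetical 0-based numbering: ring 0-5, O = 6 on atom 0, Cl = 7 on atom 3.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7)]
N_V = 8

alpha = [0.5] * 6 + [1 / 3, 10 / 14]
beta = [3.75, 3, 3, 3.75, 3, 3, 1.25, 1.25]
gamma = [a / b for a, b in zip(alpha, beta)]
degree = [sum(v in edge for edge in EDGES) for v in range(N_V)]
gamma_ref = [1 / k for k in degree]  # reference alkane: gamma = 1 / degree


def local_index(gamma, edges):
    """Eq. 7: each bonded pair contributes (gamma_i * gamma_j) ** 0.5."""
    return sum((gamma[i] * gamma[j]) ** 0.5 for i, j in edges)


eta_local = local_index(gamma, EDGES)
eta_local_r = local_index(gamma_ref, EDGES)
eta_local_f = eta_local_r - eta_local  # local functionality contribution
print(round(eta_local, 3), round(eta_local_r, 3), round(eta_local_f, 3))
```

The three values are close to the η_local, η_local^R and η_local^F entries of Table 1 (1.395, 3.786 and 2.392).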

2.9 The branching index ηB

Branching is calculated with respect to the η value of the corresponding normal alkane (the straight-chain compound with the same vertex count, obtained from the reference alkane), η_local^N, which may be conveniently calculated (for compounds with non-hydrogen vertex count NV ≥ 3) as [22]:

$$\eta^{N}_{\mathrm{local}} = 1.414 + (N_V - 3) \times 0.5 \tag{8}$$

The branching index ηB can be calculated as η_local^N − η_local^R + 0.086 NR, where NR stands for the number of rings in the molecular graph of the reference alkane. The NR term in the branching index expression is a correction factor for cyclicity. To express branching relative to molecular size, another term η'B was defined [22] as ηB/NV.
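The constant 1.414 in Eq. 8 is √2 rounded: in a straight chain, each of the two end bonds joins a terminal vertex (γ = 0.5/0.5 = 1) to an interior vertex (γ = 0.5/1.0 = 0.5) and contributes √0.5, while each remaining bond contributes √0.25 = 0.5. The sketch below checks this against Eq. 7 and then evaluates ηB and η'B for 4-chlorophenol using the η_local^R value from Table 1; the function names are illustrative.

```python
import math


def eta_normal_local(n_v):
    """Eq. 8: local index of the straight-chain alkane with n_v >= 3 vertices."""
    if n_v < 3:
        raise ValueError("Eq. 8 is stated for N_V >= 3")
    return 1.414 + (n_v - 3) * 0.5


def chain_local(n_v):
    """Eq. 7 evaluated directly on an n-alkane: end vertices have gamma = 1,
    interior vertices gamma = 0.5."""
    gamma = [1.0] + [0.5] * (n_v - 2) + [1.0]
    return sum(math.sqrt(gamma[k] * gamma[k + 1]) for k in range(n_v - 1))


def branching_index(eta_local_ref, n_v, n_rings):
    """Return (eta_B, eta'_B), with eta_B = eta_N_local - eta_R_local + 0.086 * N_R."""
    eta_b = eta_normal_local(n_v) - eta_local_ref + 0.086 * n_rings
    return eta_b, eta_b / n_v


for n in range(3, 9):
    print(n, round(chain_local(n), 4), eta_normal_local(n))

# 4-chlorophenol: eta_local^R = 3.786 (Table 1), N_V = 8, one ring.
eta_b, eta_b_prime = branching_index(3.786, 8, 1)
print(round(eta_b, 3), round(eta_b_prime, 4))
```

The result, η'B ≈ 0.027, matches the Table 1 entry.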

2.10 The shape indices

Terms such as [Σα]P/Σα, [Σα]Y/Σα and [Σα]X/Σα can be used as shape parameters. [Σα]P, [Σα]Y and [Σα]X stand for the sums of α values of the vertices that are joined to one, three and four other nonhydrogen vertices, respectively, in the molecular graph. The calculation of the different indices is illustrated in Table 1, taking the example of 4-chlorophenol.

In the present communication, the utility of the ETA parameters is demonstrated through a Quantitative Structure–Toxicity Relationship (QSTR) study, taking the nonspecific toxicity of 51 substituted benzenes to Saccharomyces cerevisiae [21] as the model data set (Table 2). The definitions of the important ETA parameters are given in Table 3.

Factor analysis (FA) has been performed as the data-preprocessing step for the identification of important descriptors for the subsequent multiple regression analysis [40–42]. For this purpose, the data matrix consisting of the descriptors has been subjected to principal component factor analysis using SPSS software [43]. The object of principal component analysis (PCA) is to reduce the number of variables of possible importance in characterizing an array of numbers. The data are transformed into a small number of linear combinations of the original variables, called principal components. The number of possible principal components is equal to the number of variables. The principal components are mutually uncorrelated, but together they explain the whole of the variance of the data. Each row of eigenvectors is called a principal component, and the sum of the squares of its terms is always equal to unity. Each eigenvalue is proportional to the fraction of the variance of the elements of the matrix that is explained by the corresponding component (for a correlation matrix, the eigenvalue divided by the number of variables gives that fraction). The sum of the squares of the eigenvectors in each principal component is called the communality of the row; this should equal one. The importance of an eigenvector in a row can, therefore, be assessed by calculating the communality without the vector and noting how far it deviates from unity. Because the eigenvalue decomposition can be applied only to square matrices, the data matrix is converted to the corresponding correlation matrix, which is derived from the standardized values of the data. The difference between PCA and FA is that the former reduces the number of rows, and the latter reduces the number of columns. Elements are shed from the right-hand sides of the principal components.

Table 1. Calculations of ETA parameters: example of 4-chlorophenol

Vertex-level values for 4-chlorophenol (vertex 7 = O, vertex 8 = Cl)
Vertex no.   1      2      3      4      5      6      7      8
αi           0.5    0.5    0.5    0.5    0.5    0.5    0.33   0.72
[βs]i        1.75   1      1      1.75   1      1      0.75   0.75
[βns]i       2      2      2      2      2      2      0.5    0.5
βi           3.75   3      3      3.75   3      3      1.25   1.25
γi           0.13   0.17   0.17   0.13   0.17   0.17   0.26   0.57
[η]i         0.75   0.74   0.75   0.82   0.75   0.74   0.66   0.94
[η'F]i       0.16   0.17   0.17   0.16   0.17   0.17   0.18   0.15

Vertex-level values for the reference alkane
Vertex no.   1      2      3      4      5      6      7      8
αi           0.5    0.5    0.5    0.5    0.5    0.5    0.5    0.5
[βs]i        1.5    1      1      1.5    1      1      0.5    0.5
[βns]i       0      0      0      0      0      0      0      0
βi           1.5    1      1      1.5    1      1      0.5    0.5
γi           0.33   0.5    0.5    0.33   0.5    0.5    1      1
[ηR]i        2.06   2.12   2.12   2.06   2.12   2.12   2.10   2.10

Molecular-level values   4-Chlorophenol   Reference alkane
η                        3.074            –
ηR                       –                8.392
η'F                      0.665            –
η_local                  1.395            –
η_local^R                –                3.786
η_local^F                2.392            –
η_local^N                –                3.914
η'B                      0.027            –
Σα                       4.045            –
[Σα]P                    1.045            –
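The shape sums of Section 2.10 and the Σα and [Σα]P entries of Table 1 depend only on each vertex's α and its number of non-hydrogen neighbours. The sketch below recomputes them for 4-chlorophenol; the vertex numbering and names are assumptions.

```python
# Sketch of the shape sums of Section 2.10 for 4-chlorophenol.
# Hypothetical 0-based numbering: ring 0-5, O = 6 on atom 0, Cl = 7 on atom 3.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7)]
ALPHA = [0.5] * 6 + [1 / 3, 10 / 14]  # Eq. 1 values: six carbons, then O, then Cl


def shape_sums(alpha, edges):
    """Return (sum alpha, [sum alpha]_P, [sum alpha]_Y, [sum alpha]_X), where P, Y and X
    collect vertices bonded to one, three and four non-hydrogen neighbours."""
    degree = [0] * len(alpha)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    def by_degree(d):
        return sum(x for x, k in zip(alpha, degree) if k == d)

    return sum(alpha), by_degree(1), by_degree(3), by_degree(4)


total, sum_p, sum_y, sum_x = shape_sums(ALPHA, EDGES)
print(f"sum alpha = {total:.3f}, [sum alpha]_P = {sum_p:.3f}, ratio = {sum_p / total:.3f}")
```

With exact α fractions, Σα ≈ 4.048 and [Σα]P ≈ 1.048; these differ from the Table 1 values (4.045 and 1.045) by about 0.003, within the rounding used there.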

The principal objectives of factor analysis are to display multidimensional data in a space of lower dimensionality with minimal loss of information (explaining >95% of the variance of the data matrix) and to extract the basic features behind the data, with the ultimate goal of interpretation and/or prediction. The factors were extracted by the principal component method and then rotated by VARIMAX rotation to obtain Thurstone's simple structure. Only factors describing 5% of the total variance were considered. The analyses were carried out based on the following postulates: (a) only variables with non-zero loadings in factors where the biological activity also has a non-zero loading were considered important in explaining the variance of the activity; (b) only variables with non-zero loadings in different factors were combined in regression equations; (c) the factor pattern indicates whether the biological activity can be explained satisfactorily in the parameter space; if not, a different set of variables is to be chosen.

Besides FA-MLR, stepwise regression and partial least squares (PLS) analyses were also performed as additional model-building techniques using MINITAB software [44]. Stepwise regression was performed with the critical F values set at 4 for forward entry and 3.5 for backward removal. In PLS regression, the prediction functions are represented by factor scores extracted as linear combinations of the original predictor variables, so that there is no correlation between the factor score variables used in the predictive regression model. Only those variables that are highly loaded in the factor loading pattern after VARIMAX rotation were taken into consideration. PLS regression was repeated, each time eliminating the variable with the smallest standardized regression coefficient, until the model giving the maximum Q² value was obtained. An attempt was also made to perform PCRA [42], taking factor scores as the predictor variables and using linear regression. In this case the principal components serve as latent variables. PCRA has the advantage that collinearities among the X variables are not a disturbing factor and that the number of variables included in the analysis may exceed the number of observations [42]. In PCRA, all descriptors are assumed to be important, whereas the aim of factor analysis is to identify the relevant descriptors.

The calculations of η, ηR, ηF, ηB and of the contributions of different vertices to ηF were done, using the distance matrix and VEM vertex counts as inputs, with the GW-BASIC programs KRETA1 and KRETA2 developed by one of the authors [45]. We have also modeled the toxicity data using other selected topological and physicochemical variables and compared the ETA models with the non-ETA ones. The values of the topological descriptors and physicochemical variables for the compounds were generated by the QSAR+ and Descriptor+ modules of the Cerius2 version 4.8 software [46]. The topological indices calculated are Balaban J, connectivity indices (0χ, 1χ, 2χ, 3χp, 3χc, 0χv, 1χv, 2χv, 3χpv, 3χcv), kappa shape indices (1κ, 2κ, 3κ, 1κα, 2κα, 3κα) and the Wiener index (W). Among the physicochemical variables, molar refractivity (MolRef), hydrophobicity (AlogP98), and the numbers of hydrogen bond donors and acceptors (H_bond_donor and H_bond_acceptor) were considered.

The statistical quality of the MLR equations [47] was judged by parameters such as explained variance (R²a, i.e., adjusted R²), correlation coefficient (r or R), standard error of estimate (s) and variance ratio (F) at specified degrees of freedom (df). Leave-one-out (LOO) cross-validation statistics [48, 49], namely the cross-validated R² (Q²) and the predicted residual sum of squares (PRESS), were obtained using the MINITAB software. Finally, leave-many-out cross-validation was applied to the final equations. All the accepted MLR equations have regression constants and F ratios significant at the 95% and 99% levels, respectively, unless stated otherwise. A compound was considered an outlier if its residual was more than twice the standard error of estimate of the particular equation.

Table 2. Observed and calculated toxicity of substituted benzenes against Saccharomyces cerevisiae

Sl. No.  Compound                   Obs(a)  Calc(b)  Res(b)  Calc(c)  Res(c)
1        Chlorobenzene              1.18    1.26     −0.08   1.25     −0.07
2        Bromobenzene               1.40    1.69     −0.29   1.63     −0.23
3        1,2-Dichlorobenzene        1.96    2.04     −0.08   1.96      0
4        1,4-Dichlorobenzene        1.96    2.03     −0.07   1.95      0.01
5        1,3-Dichlorobenzene        2.32    2.51     −0.19   2.45     −0.13
6        4-Bromo-chlorobenzene      2.08    2.31     −0.23   2.22     −0.14
7        1,2,3-Trichlorobenzene     2.41    2.63     −0.22   2.58     −0.17
8        1,2,4-Trichlorobenzene     2.54    2.63     −0.09   2.57     −0.03
9        2,5-Dichlorotoluene        2.33    2.35     −0.02   2.17      0.16
10       2,4,5-Trichlorotoluene     2.91    2.96     −0.05   2.82      0.09
11       3-Chlorobenzoic acid       1.72    1.47      0.25   1.43      0.29
12       4-Chlorobenzoic acid       1.85    1.46      0.39   1.42      0.43
13       3-Bromobenzoic acid        1.94    1.74      0.2    1.69      0.25
14       4-Fluorobenzoic acid       1.37    0.96      0.41   1.05      0.32
15       4-Bromobenzoic acid        1.95    2.12     −0.17   1.91      0.04
16       2-Aminobenzoic acid        0.79    0.96     −0.17   0.97     −0.18
17       3-Aminobenzoic acid        0.32    0.56     −0.24   0.75     −0.43
18       4-Aminobenzoic acid        0.23    0.92     −0.69   0.95     −0.72
19       Pentachlorophenol          3.83    3.59      0.24   3.82      0.01
20       2,4-Dichlorophenol         2.43    2.27      0.16   2.43      0
21       2-Methylphenol             1.38    1.23      0.15   1.15      0.23
22       2-Chlorophenol             1.43    1.59     −0.16   1.53     −0.1
23       4-Chlorophenol             1.63    1.58      0.05   1.52      0.11
24       2,6-Dimethylphenol         1.35    1.69     −0.34   1.49     −0.14
25       Phenol                     0.86    0.64      0.22   0.74      0.12
26       Nitrobenzene               1.01    0.96      0.05   1.03     −0.02
27       2-Chloronitrobenzene       1.65    1.73     −0.08   1.75     −0.1
28       3-Chloronitrobenzene       1.64    1.73     −0.09   1.74     −0.1
29       4-Chloronitrobenzene       1.65    1.72     −0.07   1.73     −0.08
30       4-Bromonitrobenzene        2.13    2.00      0.13   1.99      0.14
31       2-Methylnitrobenzene       1.52    1.42      0.1    1.34      0.18
32       3-Methylnitrobenzene       1.52    1.42      0.1    1.34      0.18
33       4-Methylnitrobenzene       1.50    1.42      0.08   1.34      0.16
34       3,4-Dichloronitrobenzene   2.20    2.28     −0.08   2.33     −0.13
35       2,4-Dichloronitrobenzene   2.24    2.29     −0.05   2.34     −0.1
36       o-Dinitrobenzene           1.41    1.44     −0.03   1.54     −0.13
37       m-Dinitrobenzene           1.45    1.44      0.01   1.54     −0.09
38       2,4-Dinitrobromobenzene    2.47    2.21      0.26   2.36      0.11
39       2,4-Dinitrochlorobenzene   1.90    2.03     −0.13   2.16     −0.26
40       2,4-Dinitrotoluene         2.02    1.77      0.25   1.76      0.26
41       2,3-Dinitrotoluene         1.97    1.77      0.2    1.76      0.21
42       2,6-Dinitrotoluene         1.61    1.77     −0.16   1.76     −0.15
43       3-Chloroaniline            1.80    1.43      0.37   1.48      0.32
44       4-Chloroaniline            1.80    1.44      0.36   1.48      0.32
45       4-Bromoaniline             1.91    1.79      0.12   1.81      0.1
46       2,4-Dichloroaniline        2.40    2.11      0.29   2.13      0.27
47       2,4,6-Tribromoaniline      3.12    2.95      0.17   3.19     −0.07
48       2,4,6-Trichloroaniline     2.45    2.63     −0.18   2.69     −0.24
49       4-Methylaniline            0.77    1.10     −0.33   1.14     −0.37
50       p-Phenylenediamine         0.89    0.78      0.11   1.01     −0.12
51       2-Chloro-4-nitroaniline    1.42    1.81     −0.39   1.38      0.04

Obs = observed; Calc = calculated; Res = residual = Obs − Calc. (a) Ref. [21]; (b) according to Eq. (9); (c) according to Eq. (10).

Table 3. Definitions of the different ETA parameters used in exploring the QSAR of the toxicity of substituted benzenes

Variable     Definition
Σα           Sum of α values of all non-hydrogen vertices of a molecule
[Σα]P        Sum of α values of all non-hydrogen vertices each of which is joined to only one other non-hydrogen vertex of the molecule
η            The composite ETA index
ηR           The composite index for the reference alkane
NV           Vertex count (excluding hydrogen)
N            Total number of atoms (including hydrogen)
Σβs          Sum of βs values of all non-hydrogen vertices of a molecule
Σβns         Sum of βns values of all non-hydrogen vertices of a molecule
Σβ's         Sum of β's values of all non-hydrogen vertices of a molecule; Σβ's is defined as [Σβs]/NV
Σβ'ns        Sum of β'ns values of all non-hydrogen vertices of a molecule; Σβ'ns is defined as [Σβns]/NV
Σε/N         Sum of ε/N values of all atoms including hydrogen
[Σα]²        Square of the sum of α values of all non-hydrogen vertices of a molecule
[η'F]Cl      Functionality for the chlorine atom
[η'F]NO2     Functionality for the nitro group
[η'F]Br      Functionality for the bromine atom
[η'F]CH3     Functionality for the methyl group
[η'F]NH2     Functionality for the amino group
[η'F]COOH    Functionality for the carboxylic group
[η'F]F       Functionality for the fluorine atom
[η'F]OH      Functionality for the hydroxyl group
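The VARIMAX step of the factor analysis described in this section can be illustrated in a few lines. The sketch below implements Kaiser's pairwise-rotation form of VARIMAX, with Kaiser row normalization, on an already-extracted loading matrix. It is a stand-in under stated assumptions, not the SPSS routine used in this work, and the 4 × 2 loading matrix is hypothetical rather than taken from Table 4. An orthogonal rotation leaves every variable's communality unchanged while pushing each variable toward one dominant factor, which is the simple structure the rotation aims for.

```python
import math


def varimax(loadings, normalize=True, tol=1e-10, max_sweeps=100):
    """VARIMAX rotation of a p x m loading matrix given as a list of rows.

    Kaiser's pairwise algorithm: each pair of factor columns is rotated by the
    angle that maximizes the varimax criterion for that pair, sweeping over all
    pairs until the angles become negligible. With normalize=True, rows are
    divided by the square root of their communality before rotation and
    rescaled afterwards (Kaiser normalization).
    """
    lam = [list(map(float, row)) for row in loadings]
    p, m = len(lam), len(lam[0])
    h = [math.sqrt(sum(x * x for x in row)) or 1.0 for row in lam]
    if normalize:
        lam = [[x / hi for x in row] for row, hi in zip(lam, h)]
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for j in range(m - 1):
            for k in range(j + 1, m):
                u = [row[j] ** 2 - row[k] ** 2 for row in lam]
                v = [2 * row[j] * row[k] for row in lam]
                sum_u, sum_v = sum(u), sum(v)                            # Kaiser's A, B
                c_term = sum(ui * ui - vi * vi for ui, vi in zip(u, v))  # Kaiser's C
                d_term = 2 * sum(ui * vi for ui, vi in zip(u, v))        # Kaiser's D
                phi = 0.25 * math.atan2(d_term - 2 * sum_u * sum_v / p,
                                        c_term - (sum_u ** 2 - sum_v ** 2) / p)
                largest_angle = max(largest_angle, abs(phi))
                cos_phi, sin_phi = math.cos(phi), math.sin(phi)
                for row in lam:
                    x, y = row[j], row[k]
                    row[j] = x * cos_phi + y * sin_phi
                    row[k] = -x * sin_phi + y * cos_phi
        if largest_angle < tol:
            break
    if normalize:
        lam = [[x * hi for x in row] for row, hi in zip(lam, h)]
    return lam


# Hypothetical unrotated loadings of four variables on two factors.
unrotated = [[0.7, 0.5], [0.6, 0.6], [0.5, -0.6], [0.6, -0.5]]
rotated = varimax(unrotated)
for before, after in zip(unrotated, rotated):
    print([round(x, 3) for x in after],
          "communality:", round(sum(x * x for x in before), 4),
          "->", round(sum(x * x for x in after), 4))
```

In practice the unrotated loadings would come from the eigenvectors of the descriptor correlation matrix, scaled by the square roots of their eigenvalues; that extraction step is omitted here.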

3 Results and Discussion

3.1 QSAR with ETA indices

Table 4 shows the results of factor analysis of the data matrix composed of the ETA descriptors and the toxicity. Ten factors explain 96.3% of the variance of the data matrix. The factor loading pattern after VARIMAX rotation shows that the toxicity of the substituted benzenes is highly loaded on factor 2, which in turn is highly loaded with [Σα]P/Σα and [Σα]P. The toxicity shows low loadings on the remaining factors: factor 1 (highly loaded with Σβs, Σβns, Σβ's, Σε/N and [η'F]NO2), factor 3 ([η'F]Cl, [η'F]Br), factor 4 ([η'F]CH3), factor 5 ([η'F]OH), factor 6 (Σβ'ns), factor 7 ([η'F]NH2), factor 8 ([η'F]F), factor 9 ([η'F]COOH) and factor 10 (Σα).

Table 4. Factor loadings(a) of the variables (ETA) after VARIMAX rotation (F1–F10 = factors 1–10)

Variable    F1      F2      F3      F4      F5      F6      F7      F8      F9      F10     Communality
pC          0.058   0.922*  −0.197  −0.031  −0.036   0.021   0.140   0.015  −0.182  −0.048   0.951
Σβs         0.933*  0.203   −0.055   0.112   0.048   0.012   0.056   0.033   0.139   0.142   0.974
Σβns        0.906*  −0.037   0.213   0.148   0.165   0.189   0.159  −0.047  −0.102   0.071   0.995
Σβ's        0.835*  0.046   −0.355  −0.102   0.018   0.089  −0.169   0.001   0.087   0.014   0.881
Σβ'ns       0.255   −0.052   0.013  −0.022  −0.082   0.929*  0.014  −0.079  −0.183  −0.014   0.977
Σε/N        0.786*  0.353   −0.309  −0.268  −0.062   0.025   0.236   0.080   0.013   0.034   0.977
Σα          0.338   0.467    0.204   0.008   0.268  −0.051  −0.009  −0.038   0.159   0.713*  0.983
[Σα]²       0.305   0.664    0.335   0.059   0.305   0.091   0.035  −0.059   0.383   0.045   0.904
[η'F]Cl    −0.074   0.495   −0.810* −0.191  −0.051  −0.015   0.080  −0.060  −0.134  −0.074   0.979
[η'F]NO2    0.831*  −0.117   0.265   0.179   0.187   0.173   0.135  −0.073  −0.256   0.008   0.961
[η'F]Br    −0.295   0.475    0.723* −0.237   0.120  −0.010   0.074  −0.036  −0.035   0.119   0.928
[η'F]CH3    0.072   −0.069   0.015   0.976* −0.028  −0.001   0.080  −0.035  −0.092   0.004   0.979
[η'F]NH2   −0.164   −0.069   0.005  −0.079   0.118  −0.008  −0.965* −0.042  −0.027   0.004   0.985
[η'F]COOH  −0.071   −0.214   0.074  −0.185   0.077  −0.452   0.049   0.206   0.760*  0.120   0.938
[η'F]F     −0.003   −0.086   0.011  −0.035   0.027  −0.085   0.040   0.984*  0.099  −0.017   0.997
[η'F]OH    −0.174   −0.063  −0.077   0.034  −0.952*  0.088   0.127  −0.034  −0.052  −0.110   0.988
[Σα]P       0.127   0.946*   0.023  −0.049  −0.009  −0.057  −0.036  −0.044  −0.019   0.214   0.966
[Σα]P/Σα    0.006   0.979*  −0.032  −0.041   0.051  −0.015  −0.008  −0.074  −0.021   0.066   0.975
Fraction of variance explained
            0.231   0.227    0.096   0.069   0.066   0.065   0.062   0.059   0.053   0.035   0.963

(a) Factor loadings greater than 0.7 are marked with asterisks.

Based on the results of factor analysis, a number of equations were generated using descriptors with high loadings on different factors, and the best model is noted below:

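$$\mathrm{pC} = 5.017(\pm 0.393)\,\frac{[\Sigma\alpha]_P}{\Sigma\alpha} + 0.962(\pm 0.258)\,[\eta'_F]_{\mathrm{Cl}} - 1.212(\pm 0.439)\,[\eta'_F]_{\mathrm{NH_2}} - 0.469(\pm 0.195)\,[\eta'_F]_{\mathrm{COOH}} + 0.144(\pm 0.123)$$

n = 51, Ra² = 0.874, R² = 0.884, R = 0.940, F = 87.7 (df 4, 46), s = 0.236, Q² = 0.851, SDEP = 0.254, SPRESS = 0.268, PRESS = 3.3   (9)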

The standard errors of the regression coefficients are given within parentheses. Eq. 9, with four predictor variables, could predict and explain 85.1% and 87.4% of the variance, respectively. The standard error of this equation is 0.24, and all the coefficients (except the intercept) are significant at the 95% level. The toxicity values calculated according to Eq. 9 are given in Table 1. A scatter plot of observed versus leave-one-out predicted values according to Eq. 9 is given in Figure 1a. Furthermore, the data set was divided into six classes according to the functional groups present on the benzene ring: halobenzenes, substituted benzoic acids, substituted phenols, substituted mono-nitrobenzenes, substituted dinitrobenzenes and substituted anilines. FA–MLR was performed for each of the classes, and the statistical quality of the equations obtained is presented in Table 5.

Next, another model-building attempt was made with the ETA descriptors using the stepwise regression method. The best model obtained is identical to Eq. 9. This was followed by the application of PLS analysis to the ETA descriptors, and the best model obtained is reported below:

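$$\mathrm{pC} = -3.906\,\Sigma\beta'_s + 1.797\,\Sigma\varepsilon/N + 0.226\,[\Sigma\alpha]_P + 2.802\,\frac{[\Sigma\alpha]_P}{\Sigma\alpha} + 0.904\,[\eta'_F]_{\mathrm{Cl}} - 0.586\,[\eta'_F]_{\mathrm{COOH}} + 1.448$$

n = 51, Ra² = 0.878, R² = 0.893, R = 0.945, Q² = 0.863, PRESS = 3.0   (10)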

Eq. 10 shows that, using six predictor variables, 86.3% predicted variance and 87.8% explained variance can be achieved. The positive coefficients of [Σα]P/Σα in Eq. 9 and Eq. 10 indicate that the toxicity increases with increase in branching. Again, the positive coefficients of [η′F]Cl in Eq. 9 and Eq. 10 suggest a positive contribution of the functionality of the chloro substituent. The negative coefficients of [η′F]COOH in Eq. 9 and Eq. 10, and that of [η′F]NH2 in Eq. 9, indicate that the toxicity decreases as the functionality values of the carboxylic acid and amino substituents increase. Finally, the negative coefficient of Σβ′s in Eq. 10 suggests that the toxicity decreases with increase in the number of electronegative atoms.
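As a minimal sketch of how Eq. 9 is applied, the function below evaluates its fitted coefficients for one compound, which also makes the sign interpretation above concrete: raising [η′F]NH2 lowers the predicted pC. The descriptor names used as dictionary keys are hypothetical labels chosen for this sketch, the descriptor values in the usage line are hypothetical placeholders (not entries from Table 1), and computing the ETA descriptors themselves is outside its scope.

```python
# Coefficients of Eq. 9 (FA-MLR on ETA descriptors); standard errors omitted.
# The key names are hypothetical labels for this sketch.
EQ9_COEFFICIENTS = {
    "sum_alpha_P_over_sum_alpha": 5.017,  # [Σα]P/Σα, branching term
    "eta_F_Cl": 0.962,                    # [η'F]Cl, chloro functionality
    "eta_F_NH2": -1.212,                  # [η'F]NH2, amino functionality
    "eta_F_COOH": -0.469,                 # [η'F]COOH, carboxylic acid functionality
}
EQ9_INTERCEPT = 0.144


def predict_pc_eq9(descriptors):
    """Return pC predicted by Eq. 9; all four descriptors must be supplied."""
    missing = set(EQ9_COEFFICIENTS) - set(descriptors)
    if missing:
        raise KeyError(f"missing descriptors: {sorted(missing)}")
    return EQ9_INTERCEPT + sum(
        coef * descriptors[name] for name, coef in EQ9_COEFFICIENTS.items()
    )


# Hypothetical descriptor values for illustration only (absent groups set to 0).
example = {
    "sum_alpha_P_over_sum_alpha": 0.3,
    "eta_F_Cl": 0.5,
    "eta_F_NH2": 0.0,
    "eta_F_COOH": 0.0,
}
print(round(predict_pc_eq9(example), 3))  # prints 2.13
```

Requiring every descriptor explicitly (rather than defaulting missing ones to zero) keeps a typo in a key from silently dropping a term.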


Figure 1. Scatter plots of observed versus leave-one-out predicted values for (a) Eq. (9), (b) Eq. (12) and (c) Eq. (16).
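The leave-one-out (LOO) quantities behind Figure 1 and the reported Q², PRESS, SDEP and SPRESS can be computed as in the minimal NumPy sketch below. It assumes the conventional definitions PRESS = Σ(y − ŷLOO)², Q² = 1 − PRESS/Σ(y − ȳ)², SDEP = √(PRESS/n) and SPRESS = √(PRESS/(n − p − 1)) for p descriptors; these are consistent with the values quoted for Eq. 9, since √(3.3/51) ≈ 0.254 and √(3.3/46) ≈ 0.268. The data in the usage lines are synthetic, not the benzene data set.

```python
import numpy as np


def loo_statistics(X, y):
    """Leave-one-out statistics for an ordinary least-squares model with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([X, np.ones(n)])      # descriptor columns + intercept column
    y_loo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i               # drop compound i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        y_loo[i] = A[i] @ coef                 # predict the left-out compound
    press = float(np.sum((y - y_loo) ** 2))
    ss_total = float(np.sum((y - y.mean()) ** 2))
    return {
        "y_loo": y_loo,
        "PRESS": press,
        "Q2": 1.0 - press / ss_total,
        "SDEP": float(np.sqrt(press / n)),
        "SPRESS": float(np.sqrt(press / (n - p - 1))),
    }


# Synthetic example: 30 "compounds", 2 descriptors.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 2))
y_demo = 1.0 + X_demo @ np.array([0.5, -0.3]) + 0.1 * rng.normal(size=30)
stats = loo_statistics(X_demo, y_demo)
print({k: round(v, 3) for k, v in stats.items() if k != "y_loo"})
```

Refitting n times is the most transparent form; for ordinary least squares the same residuals can also be obtained in one fit from the hat matrix as eᵢ/(1 − hᵢᵢ).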

Table 5. Summary of the results of FA–MLR on each of the classes using ETA descriptors

Sl. No. | Class | R² | Ra² | Q² | s | PRESS
1 | Halobenzenes | 0.980 | 0.974 | 0.967 | 0.080 | 0.080
2 | Benzoic acids | 0.888 | 0.870 | 0.791 | 0.262 | 0.767
3 | Phenols | 0.951 | 0.941 | 0.902 | 0.242 | 0.580
4 | Mono-nitrobenzenes | 0.993 | 0.992 | 0.985 | 0.035 | 0.020
5 | Dinitrobenzenes | 0.868 | 0.801 | –(a) | 0.167 | –(a)
6 | Anilines | 0.727 | 0.688 | 0.428 | 0.422 | 2.607

(a) LOO statistics could not be calculated due to single occurrence of a feature.
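Footnote (a) of Table 5 states that LOO statistics could not be calculated when a feature occurs only once in a class. One way to see why: if a descriptor is nonzero for a single compound, deleting that compound leaves an all-zero column, so the least-squares coefficient of that descriptor is no longer determined and the left-out compound cannot be predicted. The sketch below flags such compounds by checking the column rank of each leave-one-out design matrix; the five-compound matrix is a hypothetical toy example, and this is an illustration of the reasoning rather than the procedure implemented in the authors' software.

```python
import numpy as np


def undetermined_loo_compounds(X):
    """Indices of compounds whose removal makes the OLS design matrix rank-deficient."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    A = np.column_stack([X, np.ones(n)])        # include the intercept column
    full_rank = np.linalg.matrix_rank(A)
    flagged = []
    for i in range(n):
        reduced = np.delete(A, i, axis=0)        # design matrix without compound i
        if np.linalg.matrix_rank(reduced) < full_rank:
            flagged.append(i)
    return flagged


# Hypothetical class of five compounds: column 0 is a continuous descriptor,
# column 1 is a group-functionality term present in compound 3 only.
X_class = np.array([
    [0.10, 0.0],
    [0.25, 0.0],
    [0.40, 0.0],
    [0.55, 0.8],
    [0.70, 0.0],
])
print(undetermined_loo_compounds(X_class))  # prints [3]
```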


An attempt was also made to use factor scores as the predictor variables, to avoid the loss of information that accompanies selecting a few relevant descriptors from the full set, and a significant increase in statistical quality was obtained:

$$\mathrm{pC} = 0.612(\pm 0.024)\,f_1 - 0.131(\pm 0.024)\,f_3 + 0.093(\pm 0.024)\,f_7 - 0.121(\pm 0.024)\,f_9 + 1.777(\pm 0.023)$$

n = 51, Ra² = 0.937, R² = 0.942, R = 0.971, s = 0.166, F = 187.3 (df 4, 46), Q² = 0.926, PRESS = 1.6   (11)

Eq. 11 could predict and explain 92.6% and 93.7%, respectively, of the variance of the toxicity. The factor scores in Eq. 11 signify the importance of the corresponding variables, marked with asterisks in Table 4. To further check the predictive capacity of Eq. 9, leave-20%-out cross-validation was applied; the results, presented in Table 6, suggest that the equation is of acceptable quality. Next, an attempt was made to compare the ETA relations with non-ETA ones.
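Eq. 11 regresses pC on factor scores rather than on individual descriptors (principal component regression). The minimal NumPy sketch below shows the idea under stated assumptions: scores come from an unrotated principal component analysis of the standardized descriptor matrix only (the paper used VARIMAX-rotated factor scores, and Tables 4, 7 and 9 list pC itself among the factored variables), and which scores enter the regression is left to the caller. Because the scores are centred, the fitted intercept equals the mean of the response, which is consistent with the identical intercept (1.777) in Eqs. 11, 15 and 19. The data are synthetic.

```python
import numpy as np


def principal_component_scores(X, n_components):
    """Unit-variance principal component scores of the standardized columns of X."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each descriptor
    corr = np.corrcoef(Z, rowvar=False)                 # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]    # keep the largest ones
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs / np.sqrt(eigvals)             # scale scores to unit variance
    explained = eigvals / X.shape[1]                    # fraction of total variance
    return scores, explained


def pcr_fit(scores, y, selected):
    """OLS of y on the selected score columns; returns (coefficients, intercept)."""
    S = scores[:, selected]
    A = np.column_stack([S, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef[:-1], coef[-1]


# Synthetic example: 40 compounds, 5 partly correlated descriptors.
rng = np.random.default_rng(2)
X_pc = rng.normal(size=(40, 5))
X_pc[:, 1] += 0.8 * X_pc[:, 0]
y_pc = 2.0 + X_pc[:, 0] - 0.5 * X_pc[:, 3] + 0.1 * rng.normal(size=40)
scores, explained = principal_component_scores(X_pc, 3)
coef, intercept = pcr_fit(scores, y_pc, [0, 2])
print(np.round(explained, 3), np.round(coef, 3), round(float(intercept), 3))
```

Because the scores are mutually uncorrelated, adding or removing one score does not change the coefficients of the others, which is one reason the coefficients in Eqs. 11, 15 and 19 share a common standard error.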

3.2 QSAR with non-ETA indices

Table 7 shows the rotated loading matrix obtained from factor analysis of the data matrix composed of the toxicity values and selected topological and physicochemical descriptors. Seven factors explain 96.7% of the variance of the data matrix. The toxicity of the substituted benzenes was found to be highly loaded with factor 2, which is in turn highly loaded in φ, ⁰χᵛ, ¹χᵛ, ²χᵛ, ³χᵛP, ³χᵛC, AlogP98 and MolRef. The best relation formulated from these descriptors was the following:

$$\mathrm{pC} = 2.898(\pm 0.249)\,{}^{3}\chi^{v}_{C} - 0.057(\pm 0.014)\,\text{S\_sNH2} + 0.030(\pm 0.007)\,\text{S\_sCl} + 0.662(\pm 0.089)$$

n = 51, Ra² = 0.855, R² = 0.863, R = 0.929, F = 98.9 (df 3, 47), s = 0.253, Q² = 0.837, SDEP = 0.266, SPRESS = 0.277, PRESS = 3.6   (12)

Eq. 12 is a three-variable relation predicting 83.7% and explaining 85.5% of the variance of the toxicity. All the regression coefficients are significant at the 95% level. A scatter plot of observed versus leave-one-out predicted values according to Eq. 12 is given in Figure 1b. As for the ETA models, FA–MLR was performed using non-ETA descriptors for each of the classes into which the data set was subdivided, and the statistical quality of the equations obtained is presented in Table 8. To explore the possibility of improving the quality of the relation for all compounds using the non-ETA descriptors (Eq. 12), stepwise regression was performed, and the best model obtained is noted below:

$$\mathrm{pC} = 0.429(\pm 0.500)\,\mathrm{AlogP98} + 0.269(\pm 0.055)\,{}^{0}\chi^{v} - 0.086(\pm 0.314)\,\text{S\_sCH3} - 0.815(\pm 0.202)$$

n = 51, Ra² = 0.860, R² = 0.869, R = 0.932, F = 103.4 (df 3, 47), s = 0.248, Q² = 0.847, SDEP = 0.258, SPRESS = 0.269, PRESS = 3.4   (13)

Eq. 13 is a three-variable relation predicting 84.7% and explaining 86.0% of the variance of the toxicity. Subsequently, PLS analysis was performed and the following equation was obtained:

$$\mathrm{pC} = 0.084\,\text{H\_bond\_acc} + 0.296\,{}^{2}\chi^{v} + 1.004\,{}^{3}\chi^{v}_{C} + 0.185\,\mathrm{AlogP98} + 0.020\,\text{S\_sCl} - 0.186\,\text{S\_dssC} - 0.005$$

n = 51, Ra² = 0.868, R² = 0.884, R = 0.940, Q² = 0.867, PRESS = 2.9   (14)

Eq. 14 shows that, using six predictor variables, 86.7% predicted variance and 86.8% explained variance can be achieved.
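Eqs. 10, 14 and 18 were obtained by PLS. As an independent illustration (not the software used for the published models), the sketch below implements the textbook PLS1 NIPALS algorithm for a single response in NumPy. It works on mean-centred, unscaled descriptors, and the number of latent components is a caller-supplied assumption; any scaling or component selection used for the published equations is not reproduced here. The data are synthetic.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS1 regression by NIPALS; returns (coefficients, intercept) on the original scale."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E = X - x_mean                      # X residual matrix (centred)
    f = y - y_mean                      # y residual vector (centred)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)          # unit weight vector
        t = E @ w                       # latent score vector
        tt = t @ t
        p = E.T @ t / tt                # X loading
        c = (f @ t) / tt                # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - c * t                   # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # B = W (P'W)^-1 q
    intercept = y_mean - x_mean @ coef
    return coef, intercept


# Synthetic example: 25 compounds, 3 descriptors, 2 latent components.
rng = np.random.default_rng(1)
X_pls = rng.normal(size=(25, 3))
y_pls = X_pls @ np.array([1.0, -2.0, 0.5]) + 0.3 + 0.05 * rng.normal(size=25)
coef2, b0_2 = pls1_nipals(X_pls, y_pls, 2)
print(np.round(coef2, 3), round(float(b0_2), 3))
```

With as many components as descriptors, PLS1 reproduces ordinary least squares; using fewer components shrinks the solution, which is why PLS can retain more descriptors (six in Eqs. 10 and 14) than a stepwise MLR model of similar quality.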


Table 6. Results of leave-many-out cross-validation applied on Eqs. 9, 12 and 16. Model equation: pC = Σ bᵢxᵢ + a

Eq. no. | Type of cross-validation | Number of cycles | Average regression coefficients (± standard deviations) | Q² (Average Pres)
9 | Leave-20%-out | 5(a) | 5.030 (±0.254) [Σα]P/Σα + 0.960 (±0.148) [η′F]Cl − 1.24 (±0.193) [η′F]NH2 − 0.458 (±0.159) [η′F]COOH + 0.141 (±0.087) | 0.857 (0.204)
12 | Leave-20%-out | 5(a) | 2.911 (±0.071) ³χᵛC − 0.058 (±0.011) S_sNH2 + 0.030 (±0.002) S_sCl + 0.659 (±0.025) | 0.843 (0.189)
16 | Leave-20%-out | 5(a) | 2.879 (±0.084) ³χᵛC + 1.155 (±0.158) [η′F]Cl − 0.554 (±0.138) [η′F]COOH − 1.548 (±0.258) [η′F]NH2 + 0.700 (±0.028) | 0.838 (0.206)

Q² denotes cross-validated R². Average Pres means the average of the absolute values of the predicted residuals.
(a) Compounds were deleted in 5 cycles in the following manner: (1, 6, 11, 16, ..., 46, 51), (2, 7, 12, 17, ..., 42, 47), ..., (5, 10, 15, 20, ..., 45, 50).
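Footnote (a) of Table 6 specifies the deletion pattern: compound numbers are assigned cyclically to five groups, each group is left out in turn, and the model is refitted on the remaining compounds. The NumPy sketch below follows that scheme under stated assumptions: compounds are numbered 1 to n in Table 1 order, the predicted residuals of all five cycles are pooled to form Q², and the coefficient spread uses the sample standard deviation (ddof=1), which the table does not specify. The data in the usage lines are synthetic.

```python
import numpy as np


def cyclic_folds(n, k=5):
    """Split compounds 1..n into k deletion groups: (1, 1+k, ...), (2, 2+k, ...), ..."""
    return [list(range(start, n + 1, k)) for start in range(1, k + 1)]


def leave_group_out(X, y, k=5):
    """Leave-many-out CV with cyclic deletion; returns Q2, average |residual|, coef mean/sd."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([X, np.ones(n)])           # last column = intercept
    y_pred = np.empty(n)
    coefs = []
    for group in cyclic_folds(n, k):
        test = np.array(group) - 1                  # 1-based compound numbers -> row indices
        train = np.setdiff1d(np.arange(n), test)
        coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)
        y_pred[test] = A[test] @ coef               # predict the deleted group
        coefs.append(coef)
    residuals = y - y_pred
    q2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    coefs = np.array(coefs)
    return {
        "Q2": float(q2),
        "average_abs_residual": float(np.mean(np.abs(residuals))),
        "coef_mean": coefs.mean(axis=0),
        "coef_sd": coefs.std(axis=0, ddof=1),
    }


folds = cyclic_folds(51)
print(folds[0], folds[4])

# Synthetic example with 51 compounds and 2 descriptors.
rng = np.random.default_rng(3)
X_lmo = rng.normal(size=(51, 2))
y_lmo = 0.7 + X_lmo @ np.array([1.0, -0.4]) + 0.1 * rng.normal(size=51)
result = leave_group_out(X_lmo, y_lmo)
print(round(result["Q2"], 3), np.round(result["coef_mean"], 3))
```

With n = 51, the first group holds 11 compounds and the other four hold 10 each, so every compound is predicted exactly once.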


When factor scores were used as predictor variables, the following equation was obtained:

$$\mathrm{pC} = 0.586(\pm 0.030)\,f_2 - 0.138(\pm 0.030)\,f_3 + 0.152(\pm 0.030)\,f_4 - 0.112(\pm 0.030)\,f_5 + 1.777(\pm 0.030)$$

n = 51, Ra² = 0.895, R² = 0.903, R = 0.950, s = 0.215, F = 107.6 (df 4, 46), Q² = 0.878, PRESS = 2.7   (15)

Eq. 15 could predict and explain 87.8% and 89.5%, respectively, of the variance of toxicity. The factor scores in Eq. 15 signify the importance of the corresponding variables, marked with asterisks in Table 7. Next, an attempt was made to use non-ETA descriptors along with ETA ones to further improve the ETA models.
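Tables 4, 7 and 9 report loadings after VARIMAX rotation. The sketch below is an independent NumPy implementation of the raw varimax criterion using the common SVD-based iteration; it is not claimed to reproduce any particular statistics package. It rotates an already-extracted loading matrix and omits Kaiser row normalization, which many packages apply by default, so its output need not match the published tables exactly. The loading matrix in the usage lines is a hypothetical two-factor example.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (variables x factors) loading matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    R = np.eye(k)                       # accumulated rotation matrix
    criterion = 0.0
    for _ in range(max_iter):
        LR = L @ R
        # Gradient of the varimax criterion with respect to the rotation.
        G = L.T @ (LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        R = U @ Vt                      # nearest orthogonal matrix to G
        previous, criterion = criterion, np.sum(s)
        if previous != 0 and criterion < previous * (1 + tol):
            break                       # criterion no longer increasing
    return L @ R, R


# Hypothetical simple-structure loadings, deliberately rotated by 0.5 rad.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.85]])
theta = 0.5
turn = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
L_mixed = simple @ turn
rotated, R = varimax(L_mixed)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; only the distribution of loadings across factors moves towards the high/low pattern marked with asterisks in the tables.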

3.3 QSAR with ETA and non-ETA indices

Table 7. Factor loadings(a) of the non-ETA variables (topological and physicochemical) after VARIMAX rotation

Variables | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Factor 6 | Factor 7 | Communality
pC | 0.039 | 0.883 | −0.208 | 0.229 | −0.169 | −0.033 | 0.019 | 0.906
JX | 0.339 | 0.493 | 0.118 | 0.332 | −0.036 | 0.559 | −0.220 | 0.845
φ | 0.517 | 0.821* | −0.136 | 0.134 | 0.093 | −0.096 | −0.004 | 0.996
Wiener | 0.980* | 0.150 | −0.052 | −0.007 | −0.027 | 0.044 | −0.011 | 0.989
log Z | 0.979* | 0.186 | −0.037 | 0.021 | 0.024 | 0.054 | 0.007 | 0.998
Zagreb | 0.952* | 0.283 | −0.001 | 0.070 | 0.019 | 0.081 | 0.012 | 0.999
⁰χ | 0.963* | 0.253 | −0.011 | 0.050 | 0.019 | 0.071 | 0.012 | 0.999
¹χ | 0.980* | 0.181 | −0.036 | 0.010 | 0.020 | 0.051 | 0.008 | 0.998
²χ | 0.960* | 0.263 | 0.010 | 0.022 | 0.004 | 0.049 | 0.031 | 0.995
³χP | 0.853* | 0.410 | 0.004 | 0.236 | 0.069 | 0.162 | −0.009 | 0.982
³χC | 0.876* | 0.425 | 0.095 | 0.074 | −0.017 | 0.061 | 0.068 | 0.971
⁰χᵛ | 0.357 | 0.928* | −0.012 | −0.052 | −0.045 | 0.038 | −0.048 | 0.997
¹χᵛ | 0.305 | 0.945* | −0.015 | −0.079 | −0.024 | 0.021 | −0.047 | 0.996
²χᵛ | 0.115 | 0.987* | 0.014 | −0.088 | −0.050 | 0.013 | −0.040 | 0.999
³χᵛP | 0.196 | 0.933* | −0.017 | 0.191 | 0.007 | 0.094 | −0.061 | 0.958
³χᵛC | −0.027 | 0.990* | 0.051 | −0.069 | −0.068 | 0.016 | −0.027 | 0.995
¹κ | 0.973* | 0.217 | −0.022 | 0.028 | 0.018 | 0.059 | 0.011 | 0.999
²κα | 0.988* | 0.000 | −0.090 | −0.102 | 0.021 | −0.017 | 0.009 | 0.996
³κα | 0.874* | −0.138 | −0.039 | −0.324 | 0.003 | −0.215 | 0.112 | 0.948
¹κα | 0.812* | 0.563 | −0.047 | 0.129 | 0.051 | 0.024 | 0.007 | 0.999
¹κα | 0.730* | 0.643 | −0.165 | 0.052 | 0.083 | −0.108 | 0.001 | 0.995
³κα | 0.466 | 0.671 | −0.142 | −0.230 | 0.055 | −0.387 | 0.129 | 0.909
S_sCH3 | 0.045 | −0.075 | −0.124 | −0.076 | −0.058 | 0.942* | 0.023 | 0.920
S_aaCH | −0.670 | −0.556 | −0.122 | −0.404 | −0.037 | −0.099 | −0.121 | 0.962
S_dssC | −0.111 | 0.124 | −0.179 | 0.217 | −0.808* | 0.237 | −0.272 | 0.889
S_aasC | −0.726* | 0.382 | 0.241 | −0.268 | −0.100 | 0.256 | 0.003 | 0.879
S_sNH2 | −0.159 | −0.070 | 0.975* | −0.047 | −0.106 | −0.058 | −0.024 | 0.998
S_ddsN | −0.866* | 0.129 | 0.193 | 0.089 | 0.353 | −0.036 | 0.124 | 0.953
S_sOH | −0.109 | −0.127 | 0.009 | 0.115 | 0.941* | 0.077 | 0.009 | 0.934
S_dO | 0.922* | −0.191 | −0.192 | −0.176 | −0.197 | −0.022 | −0.054 | 0.997
S_sF | 0.042 | −0.103 | −0.033 | 0.020 | 0.173 | −0.026 | 0.944* | 0.936
S_sCl | −0.152 | 0.535 | −0.088 | 0.785* | −0.059 | −0.145 | −0.013 | 0.958
S_sBr | −0.190 | 0.571 | 0.054 | −0.719* | −0.023 | −0.117 | −0.054 | 0.899
H_bond_acc | 0.774* | 0.471 | −0.272 | 0.215 | 0.083 | −0.203 | 0.015 | 0.990
H_bond_don | −0.193 | −0.133 | 0.913* | 0.005 | 0.323 | −0.031 | −0.013 | 0.993
AlogP98 | −0.188 | 0.776* | −0.471 | 0.313 | −0.168 | 0.082 | −0.002 | 0.993
MolRef | 0.584 | 0.773* | 0.146 | −0.034 | −0.130 | 0.104 | −0.090 | 0.998
% variance | 0.421 | 0.292 | 0.067 | 0.057 | 0.054 | 0.047 | 0.030 | 0.967

(a) Factor loadings with absolute values greater than 0.7 are marked with asterisks.

Table 8. Summary of the results of FA–MLR on each of the classes using non-ETA descriptors

Sl. No. | Class | R² | Ra² | Q² | s | PRESS
1 | Halobenzenes | 0.878 | 0.843 | 0.741 | 0.205 | 0.627
2 | Benzoic acids | 0.888 | 0.870 | 0.791 | 0.262 | 0.767
3 | Phenols | 0.875 | 0.850 | 0.776 | 0.385 | 1.328
4 | Mono-nitrobenzenes | 0.961 | 0.950 | –(a) | 0.085 | –(a)
5 | Dinitrobenzenes | 0.879 | 0.818 | –(a) | 0.159 | –(a)
6 | Anilines | 0.751 | 0.715 | 0.428 | 0.403 | 2.606

(a) LOO statistics could not be calculated due to single occurrence of a feature.

Table 9 shows the factor analysis results for the data matrix composed of ETA and non-ETA descriptors. Eight factors explain 95.7% of the variance. The toxicity of the substituted benzenes was found to be highly loaded with factor 2, which is in turn highly loaded in [Σα]P, [Σα]P/Σα, φ, ⁰χᵛ, ¹χᵛ, ²χᵛ, ³χᵛP, ³χᵛC and AlogP98.

Table 9. Factor loadings(a) of the variables (ETA and non-ETA) after VARIMAX rotation

Variables | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Factor 6 | Factor 7 | Factor 8 | Communality
pC | 0.037 | 0.874 | 0.239 | −0.196 | −0.197 | −0.055 | 0.016 | 0.032 | 0.905
Σβs | 0.976* | 0.178 | 0.112 | −0.014 | −0.027 | 0.018 | −0.012 | 0.008 | 0.999
Σβns | 0.901* | −0.089 | −0.154 | −0.299 | −0.179 | 0.026 | −0.157 | −0.056 | 0.994
Σβ′s | 0.761* | −0.010 | 0.413 | −0.126 | 0.128 | −0.183 | −0.066 | −0.009 | 0.819
Σβ′ns | 0.258 | −0.127 | −0.062 | −0.750* | −0.040 | −0.136 | 0.191 | −0.068 | 0.710
Σε/N | 0.733* | 0.304 | 0.391 | −0.077 | −0.229 | −0.342 | 0.071 | 0.062 | 0.966
Σα | 0.467 | 0.561 | −0.288 | 0.139 | 0.084 | −0.026 | −0.289 | −0.054 | 0.729
[Σα]2 | 0.384 | 0.608 | −0.327 | 0.073 | −0.042 | −0.056 | −0.277 | −0.039 | 0.712
[η′F]Cl | −0.150 | 0.477 | 0.832* | −0.081 | −0.094 | −0.151 | 0.035 | −0.045 | 0.985
[η′F]NO2 | 0.801* | −0.171 | −0.182 | −0.387 | −0.159 | 0.072 | −0.182 | −0.075 | 0.923
[η′F]Br | −0.230 | 0.503 | −0.716* | 0.015 | −0.086 | −0.217 | −0.097 | −0.028 | 0.883
[η′F]CH3 | 0.162 | −0.084 | −0.033 | −0.105 | −0.108 | 0.941* | −0.010 | −0.035 | 0.944
[η′F]NH2 | −0.191 | −0.053 | −0.005 | −0.080 | 0.953* | −0.054 | −0.148 | −0.022 | 0.979
[η′F]COOH | 0.088 | −0.139 | −0.125 | 0.927* | 0.082 | −0.159 | 0.035 | 0.147 | 0.956
[η′F]F | 0.030 | −0.098 | −0.012 | 0.184 | −0.038 | −0.038 | −0.005 | 0.976* | 0.999
[η′F]OH | −0.236 | −0.091 | 0.107 | −0.102 | −0.140 | 0.074 | 0.885* | −0.042 | 0.895
[Σα]P | 0.175 | 0.978* | −0.016 | −0.046 | 0.038 | −0.051 | 0.020 | −0.031 | 0.995
[Σα]P/Σα | 0.036 | 0.971* | 0.051 | −0.073 | −0.020 | −0.041 | −0.049 | −0.053 | 0.959
JX | 0.343 | 0.518 | 0.171 | −0.271 | 0.169 | 0.403 | 0.428 | −0.063 | 0.867
φ | 0.508 | 0.816* | 0.153 | 0.098 | −0.157 | −0.103 | 0.037 | 0.001 | 0.994
Wiener | 0.975* | 0.172 | −0.050 | −0.024 | −0.048 | 0.057 | −0.002 | 0.012 | 0.989
log Z | 0.974* | 0.202 | 0.002 | 0.048 | −0.040 | 0.075 | −0.008 | 0.014 | 0.998
Zagreb | 0.947* | 0.298 | 0.053 | 0.039 | −0.004 | 0.095 | 0.013 | 0.018 | 0.999
⁰χ | 0.957* | 0.268 | 0.034 | 0.043 | −0.015 | 0.088 | 0.002 | 0.017 | 0.999
¹χ | 0.975* | 0.198 | −0.006 | 0.049 | −0.040 | 0.075 | −0.017 | 0.013 | 0.999
²χ | 0.953* | 0.278 | 0.022 | 0.055 | 0.003 | 0.080 | −0.040 | 0.019 | 0.998
³χP | 0.853* | 0.419 | 0.192 | 0.015 | 0.008 | 0.128 | 0.155 | 0.027 | 0.981
³χC | 0.867* | 0.437 | 0.097 | 0.056 | 0.085 | 0.096 | −0.051 | 0.033 | 0.976
⁰χᵛ | 0.339 | 0.936* | −0.025 | −0.013 | −0.023 | 0.025 | −0.018 | −0.042 | 0.996
¹χᵛ | 0.286 | 0.952* | −0.047 | 0.013 | −0.030 | 0.009 | −0.023 | −0.043 | 0.995
²χᵛ | 0.094 | 0.991* | −0.040 | −0.003 | −0.001 | 0.002 | −0.042 | −0.048 | 0.997
³χᵛP | 0.188 | 0.929* | 0.186 | −0.041 | −0.020 | 0.041 | 0.124 | −0.021 | 0.953
³χᵛC | −0.047 | 0.993* | −0.012 | −0.015 | 0.036 | 0.004 | −0.047 | −0.046 | 0.993
¹κ | 0.967* | 0.233 | 0.013 | 0.046 | −0.026 | 0.081 | −0.010 | 0.015 | 0.999
²κα | 0.981* | 0.018 | −0.112 | 0.073 | −0.097 | 0.029 | −0.080 | 0.006 | 0.997
³κα | 0.857* | −0.126 | −0.251 | 0.189 | −0.069 | −0.085 | −0.299 | 0.024 | 0.952
¹κα | 0.806* | 0.570 | 0.126 | 0.062 | −0.058 | 0.025 | 0.036 | 0.013 | 0.999
¹κα | 0.720* | 0.645 | 0.069 | 0.113 | −0.186 | −0.092 | −0.017 | 0.000 | 0.995
³κα | 0.441 | 0.663 | −0.092 | 0.259 | −0.197 | −0.269 | −0.310 | 0.013 | 0.918
S_sCH3 | 0.026 | −0.058 | −0.013 | −0.063 | −0.094 | 0.968* | 0.075 | −0.032 | 0.960
S_aaCH | −0.677 | −0.557 | −0.373 | 0.005 | −0.122 | −0.053 | −0.162 | −0.134 | 0.970
S_dssC | −0.089 | 0.146 | 0.121 | −0.920* | −0.088 | 0.157 | −0.036 | −0.182 | 0.957
S_aasC | −0.749* | 0.379 | −0.097 | 0.084 | 0.222 | 0.318 | −0.236 | −0.127 | 0.944
S_sNH2 | −0.162 | −0.057 | −0.032 | 0.034 | 0.971* | −0.060 | −0.125 | −0.049 | 0.996
S_ddsN | −0.870* | 0.093 | 0.181 | 0.382 | 0.152 | −0.027 | 0.092 | 0.060 | 0.979
S_sOH | −0.102 | −0.147 | 0.002 | 0.648 | −0.038 | −0.051 | 0.722* | 0.079 | 0.984
S_dO | 0.920* | −0.164 | −0.206 | −0.148 | −0.178 | 0.026 | −0.162 | −0.041 | 0.997
S_sF | 0.030 | −0.098 | −0.012 | 0.184 | −0.038 | −0.038 | −0.005 | 0.976* | 0.999
S_sCl | −0.128 | 0.492 | 0.827* | −0.083 | −0.088 | −0.160 | 0.057 | −0.041 | 0.987
S_sBr | −0.223 | 0.601 | −0.729* | 0.006 | 0.038 | −0.149 | −0.073 | −0.005 | 0.972
H_bond_acc | 0.779* | 0.469 | 0.171 | 0.027 | −0.283 | −0.224 | 0.092 | 0.053 | 0.998
H_bond_don | −0.192 | −0.130 | −0.028 | 0.329 | 0.887* | −0.085 | 0.195 | −0.010 | 0.995
AlogP98 | −0.188 | 0.757* | 0.362 | −0.205 | −0.461 | 0.069 | 0.000 | −0.008 | 0.997
% variance | 0.363 | 0.262 | 0.069 | 0.068 | 0.065 | 0.050 | 0.041 | 0.038 | 0.957

(a) Factor loadings with absolute values greater than 0.7 are marked with asterisks.

Based on the result of factor analysis, the following equation was obtained using the combined set of descriptors:

$$\mathrm{pC} = 2.857(\pm 0.249)\,{}^{3}\chi^{v}_{C} + 1.163(\pm 0.273)\,[\eta'_F]_{\mathrm{Cl}} - 0.562(\pm 0.210)\,[\eta'_F]_{\mathrm{COOH}} - 1.526(\pm 0.475)\,[\eta'_F]_{\mathrm{NH_2}} + 0.707(\pm 0.094)$$

n = 51, Ra² = 0.852, R² = 0.884, R = 0.940, F = 73.0 (df 4, 46), s = 0.255, Q² = 0.824, SDEP = 0.277, SPRESS = 0.291, PRESS = 3.9   (16)

Eq. 16 is a four-variable relation predicting 82.4% and explaining 85.2% of the variance of the toxicity. All the regression coefficients are significant at the 95% level. A scatter plot of observed versus leave-one-out predicted values according to Eq. 16 is given in Figure 1c. As in the previous two cases (ETA and non-ETA models), FA–MLR was repeated for each of the classes into which the data set was subdivided, and the statistical quality of the equations is presented in Table 10. An attempt was made to improve the relation obtained for the whole data set from the combined set of descriptors (Eq. 16) using stepwise regression, and the best model obtained is noted below:

$$\mathrm{pC} = 3.60(\pm 0.500)\,\frac{[\Sigma\alpha]_P}{\Sigma\alpha} + 0.296(\pm 0.055)\,\mathrm{AlogP98} + 0.91(\pm 0.314)\,\Sigma\varepsilon/N - 0.708(\pm 0.209)$$

n = 51, Ra² = 0.888, R² = 0.894, R = 0.946, F = 132.7 (df 3, 47), s = 0.222, Q² = 0.875, SDEP = 0.230, SPRESS = 0.239, PRESS = 2.7   (17)

Eq. 17 is a three-variable relation predicting 87.5% and explaining 88.8% of the variance of the toxicity. This was followed by the application of PLS analysis to the combined set of descriptors, and the best model obtained is reported below:

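$$\mathrm{pC} = 0.219\,[\eta'_F]_{\mathrm{NO_2}} + 2.172\,\frac{[\Sigma\alpha]_P}{\Sigma\alpha} + 1.056\,{}^{3}\chi^{v}_{C} + 0.565\,[\eta'_F]_{\mathrm{Cl}} + 1.389\,[\eta'_F]_{\mathrm{NH_2}} + 1.089\,[\eta'_F]_{\mathrm{OH}} - 0.079\,{}^{2}\kappa + 0.071\,\text{H\_bond\_acc} + 0.191\,\mathrm{AlogP98} - 0.099\,\text{H\_bond\_don}$$

n = 51, Ra² = 0.886, R² = 0.909, R = 0.953, Q² = 0.875, PRESS = 2.7   (18)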

Eq. 18 shows that, using ten predictor variables, 87.5% predicted variance and 88.6% explained variance can be achieved. Subsequently, PCRA was repeated with the combined set of descriptors using the factor scores as the predictor variables. The equation thus obtained was of the following statistical quality:

$$\mathrm{pC} = 0.580(\pm 0.031)\,f_2 - 0.159(\pm 0.031)\,f_3 - 0.130(\pm 0.031)\,f_4 - 0.131(\pm 0.031)\,f_5 + 1.777(\pm 0.031)$$

n = 51, Ra² = 0.890, R² = 0.899, R = 0.948, s = 0.220, F = 102.3 (df 4, 46), Q² = 0.869, PRESS = 2.9   (19)

Eq. 19 could predict and explain 86.9% and 89.0%, respectively, of the variance of toxicity. The factor scores in Eq. 19 signify the importance of the corresponding variables, marked with asterisks in Table 9. The relations derived from principal component regression analysis (Eqs. 11, 15 and 19) were of excellent statistical quality, with Q² values of 0.926, 0.878 and 0.869 and R² values of 0.942, 0.903 and 0.899 for factor scores derived from the ETA, non-ETA and combined matrices, respectively.
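The summary statistics quoted with each equation follow standard ordinary least-squares definitions. Assuming Ra² = 1 − (1 − R²)(n − 1)/(n − p − 1) and F = (R²/p)/[(1 − R²)/(n − p − 1)] for p predictors, the reported values can be cross-checked; for Eq. 11 (R² = 0.942, n = 51, p = 4) these give Ra² ≈ 0.937 and F ≈ 187, in line with the published 0.937 and 187.3 (the small difference in F comes from rounding of R²). A minimal NumPy sketch, with synthetic data in the usage lines:

```python
import numpy as np


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted in p)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall F ratio with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def ols_summary(X, y):
    """R^2, adjusted R^2, F and standard error of estimate s for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([X, np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ coef) ** 2))
    r2 = 1.0 - rss / float(np.sum((y - y.mean()) ** 2))
    return {
        "R2": r2,
        "Ra2": adjusted_r2(r2, n, p),
        "F": f_statistic(r2, n, p),
        "s": float(np.sqrt(rss / (n - p - 1))),
    }


# Cross-check against the statistics reported for Eq. 11 (n = 51, four factor scores).
print(round(adjusted_r2(0.942, 51, 4), 3), round(f_statistic(0.942, 51, 4), 1))

# Synthetic example of a full summary.
rng = np.random.default_rng(4)
X_fit = rng.normal(size=(30, 3))
y_fit = X_fit @ np.array([1.0, 0.5, 0.0]) + 0.2 * rng.normal(size=30)
print({k: round(v, 3) for k, v in ols_summary(X_fit, y_fit).items()})
```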

3.4 Overview of the models

A comparative study was carried out on the basis of the statistical qualities of the three different sets of models (ETA, non-ETA and the combined set including both ETA and non-ETA descriptors) obtained using FA–MLR, stepwise regression, PLS and PCRA as the statistical tools. Table 11 compares the statistical quality of the different models. For the ETA models, a significant increase in statistical quality (R² = 0.942, Ra² = 0.937, Q² = 0.926 and PRESS = 1.6) was obtained using PCRA compared with the three other techniques. Both FA–MLR and stepwise regression produced the same relation, with a predicted variance (Q²) of 0.851. The statistical quality obtained from PLS (Q² = 0.863) is slightly better than that of FA–MLR.

For the non-ETA models, the best statistical quality (R² = 0.903, Ra² = 0.895, Q² = 0.878 and PRESS = 2.7) was obtained using PCRA, although it was inferior to that of the PCRA equation with ETA descriptors. The statistical quality obtained using FA–MLR (Q² = 0.837) is slightly inferior to those obtained from stepwise (Q² = 0.847) and PLS (Q² = 0.867) regression.

For the combined models, the statistical quality obtained using stepwise regression (R² = 0.894, Ra² = 0.888, Q² = 0.875 and PRESS = 2.7) is almost comparable to that obtained using PLS (R² = 0.909, Ra² = 0.886, Q² = 0.875 and PRESS = 2.7). Unlike the previous two cases, this time the statistical quality obtained from PCRA (Q² = 0.869) is slightly inferior, whereas the predicted variance obtained from FA–MLR (Q² = 0.824) is the lowest of all the models.

Table 10. Summary of the results of FA–MLR on each of the classes using both ETA and non-ETA descriptors

Sl. No. | Class | R² | Ra² | Q² | s | PRESS
1 | Halobenzenes | 0.980 | 0.974 | 0.967 | 0.083 | 0.080
2 | Benzoic acids | 0.995 | 0.993 | –(a) | 0.063 | –(a)
3 | Phenols | 0.938 | 0.907 | 0.562 | 0.303 | 2.599
4 | Mono-nitrobenzenes | 0.961 | 0.950 | –(a) | 0.085 | –(a)
5 | Dinitrobenzenes | 0.868 | 0.801 | –(a) | 0.167 | –(a)
6 | Anilines | 0.751 | 0.715 | 0.428 | 0.403 | 2.606

(a) LOO statistics could not be calculated due to single occurrence of a feature.

QSAR models were previously reported [21] in which the same data set was divided into six classes (halobenzenes, substituted benzoic acids, substituted phenols, substituted mono-nitrobenzenes, substituted dinitrobenzenes and substituted anilines). The first three classes (halobenzenes, substituted benzoic acids and substituted phenols) were modeled using log Kow (octanol–water partition coefficient), and the R² values of the best models obtained were 0.98 (s = 0.07, F = 534.81), 0.99 (s = 0.06, F = 988.89) and 0.97 (s = 0.13, F = 223.64), respectively. The remaining three classes (substituted mono-nitrobenzenes, substituted dinitrobenzenes and substituted anilines) were modeled using an electronic parameter (Hammett sigma) along with log Kow, and the R² values of the best models obtained were 0.99 (s = 0.02, F = 1040.50), 0.96 (s = 0.08, F = 108.68) and 0.98 (s = 0.06, F = 775.10), respectively. However, that communication made no attempt to model the whole set of 51 compounds, whereas the focus of the present communication is to explore QSAR of the whole data set using different statistical techniques and descriptors. In the present study, we have also modeled each of the classes with ETA, non-ETA and combined indices using FA–MLR.

Table 5 shows the results of FA–MLR on each of the classes using ETA descriptors. The R² values obtained are quite comparable to those of the previously reported models. The maximum R² (0.993) is obtained for the mono-nitrobenzenes, whereas the R² for the anilines (0.727) is considerably lower. As QSAR was not reported for the whole data set in the previous communication [21], no comparison can be made with the present models (Eqs. 9, 10 and 11).

Table 8 gives the statistical qualities obtained for each of the classes using non-ETA descriptors. The mono-nitrobenzenes again top the list, with R² = 0.961, and the anilines give the lowest R² (0.751). Overall, however, the R² values obtained are inferior to those of the previous case.

Finally, Table 10 shows the results of FA–MLR when the ETA indices were combined with the non-ETA ones. In this case the benzoic acids produced the maximum R² (0.995) and, following the previous trends, the anilines produced the lowest R² (0.751). Table 10 shows an overall improvement in the R² values when the non-ETA descriptors are combined with the ETA ones, which means that the statistical quality of the non-ETA relations is improved by the ETA indices.

4 Conclusion

This study suggests that ETA parameters are sufficiently rich in chemical information to encode the structural features contributing significantly to the nonspecific toxicity of substituted benzenes to Saccharomyces cerevisiae. This indicates that ETA indices merit further assessment to explore their potential in QSAR/QSPR/QSTR modeling.

Acknowledgement

One of the authors (KR) thanks the All India Council for Technical Education (AICTE), New Delhi, for financial assistance under the Career Award for Young Teachers scheme.

References

[1] D. Mackay, J. Hubbarde, E. Webster, QSAR Comb. Sci. 2003, 22, 106 – 112.

[2] M. Vighi, P. Gramatica, F. Consolaro, R. Todeschini, Ecotoxicol. Environ. Saf. 2001, 49, 206 – 220.

[3] J. D. McKinney, A. Richard, C. Waller, M. C. Newman, F. Gerberick, Toxicol. Sci. 2000, 56, 8 – 17.

[4] S. P. Bradbury, C. L. Russom, G. T. Ankley, T. W. Schultz, J. D. Walker, Environ. Toxicol. Chem. 2003, 22, 1789 – 1798.

[5] C. L. Russom, E. B. Anderson, B. E. Greenwood, A. Pilli, Sci. Total Environ. 1991, 667, 109 – 110.

[6] S. Ren, Environ. Toxicol. 2002, 17, 119 – 127.

[7] J. Huuskonen, Chemosphere 2003, 50, 949 – 953.

[8] M. T. Cronin, T. I. Netzeva, J. C. Dearden, R. Edwards, A. D. Worgan, Chem. Res. Toxicol. 2004, 17, 545 – 554.

[9] P. Mazzatorta, E. Benfenati, P. Lorenzini, M. J. Vighi, J. Chem. Inf. Comput. Sci. 2004, 44, 105 – 112.

[10] M. T. Cronin, T. W. Schultz, Chem. Res. Toxicol. 2001, 14, 1284 – 1295.

[11] F. R. Burden, D. A. Winkler, Chem. Res. Toxicol. 2000, 13, 436 – 440.

[12] F. Briens, R. Bureau, S. Rault, Ecotoxicol. Environ. Saf. 1999, 43, 241 – 251.

[13] P. Mazzatorta, E. Benfenati, C. D. Neagu, G. Gini, J. Chem. Inf. Comput. Sci. 2003, 43, 513 – 518.

[14] J. Huuskonen, Chemosphere 2003, 50, 949 – 953.


Table 11. Comparison of statistical quality of different models

Model | Statistical tool | R² | Ra² | Q² | PRESS
ETA | FA–MLR | 0.884 | 0.874 | 0.851 | 3.3
ETA | Stepwise | 0.884 | 0.874 | 0.851 | 3.3
ETA | PLS | 0.893 | 0.878 | 0.863 | 3.0
ETA | PCRA | 0.942 | 0.937 | 0.926 | 1.6
Non-ETA | FA–MLR | 0.863 | 0.855 | 0.837 | 3.6
Non-ETA | Stepwise | 0.869 | 0.860 | 0.847 | 3.4
Non-ETA | PLS | 0.884 | 0.868 | 0.867 | 2.9
Non-ETA | PCRA | 0.903 | 0.895 | 0.878 | 2.7
Combined | FA–MLR | 0.884 | 0.852 | 0.824 | 3.9
Combined | Stepwise | 0.894 | 0.888 | 0.875 | 2.7
Combined | PLS | 0.909 | 0.886 | 0.875 | 2.7
Combined | PCRA | 0.899 | 0.890 | 0.869 | 2.9


[15] S. A. Kulkarni, D. V. Raje, T. Chakraborti, SAR QSAR Environ. Res. 2001, 12, 565 – 591.

[16] A. R. Katritzky, D. B. Tatham, U. Maran, J. Chem. Inf. Comput. Sci. 2001, 41, 1162 – 1176.

[17] W. Di Marzio, S. Galassi, R. Todeschini, F. Consolaro, Chemosphere 2001, 44, 401 – 406.

[18] M. Vighi, P. Gramatica, F. Consolaro, R. Todeschini, Ecotoxicol. Environ. Saf. 2001, 49, 206 – 220.

[19] S. C. Basak, G. D. Grunwald, B. D. Gute, K. Balasubramanian, D. Opitz, J. Chem. Inf. Comput. Sci. 2000, 40, 885 – 890.

[20] J. Devillers, SAR QSAR Environ. Res. 2001, 11, 397 – 417.

[21] Y.-Y. Liao, L.-S. Wang, Y.-B. He, H. Yang, Bull. Environ. Contam. Toxicol. 1996, 56, 460 – 466.

[22] K. Roy, G. Ghosh, Internet Electron. J. Mol. Des. 2003, 2, 599 – 620, http://www.biochempress.com.

[23] K. Roy, G. Ghosh, J. Chem. Inf. Comput. Sci. 2004, 44, 559 – 567.

[24] K. Roy, G. Ghosh, QSAR Comb. Sci. 2004, 23, 99 – 108.

[25] K. Roy, G. Ghosh, QSAR Comb. Sci. 2004, 23, 526 – 535.

[26] K. Roy, G. Ghosh, Bioorg. Med. Chem. 2004, 13, 1185 – 1194, http://dx.doi.org/10.1016/j.bmc.2004.11.014.

[27] K. Roy, G. Ghosh, J. Mol. Model. 2005, 11, in press, http://dx.doi.org/10.1007/s00894-005-0033-7.

[28] D. K. Pal, C. Sengupta, A. U. De, Indian J. Chem. 1988, 27B, 734 – 739.

[29] D. K. Pal, C. Sengupta, A. U. De, Indian J. Chem. 1989, 28B, 261 – 267.

[30] D. K. Pal, M. Sengupta, C. Sengupta, A. U. De, Indian J. Chem. 1990, 29B, 451 – 454.

[31] D. K. Pal, S. K. Purkayastha, C. Sengupta, A. U. De, Indian J. Chem. 1992, 31B, 109 – 114.

[32] K. Roy, D. K. Pal, A. U. De, C. Sengupta, Indian J. Chem. 1999, 38B, 664 – 671.

[33] K. Roy, D. K. Pal, A. U. De, C. Sengupta, Indian J. Chem. 2001, 40B, 129 – 135.

[34] K. Roy, A. Saha, J. Mol. Model. 2003, 9, 259 – 270.

[35] K. Roy, A. Saha, Internet Electron. J. Mol. Des. 2003, 2, 288 – 305, http://www.biochempress.com.

[36] K. Roy, A. Saha, Internet Electron. J. Mol. Des. 2003, 2, 475 – 491, http://www.biochempress.com.

[37] K. Roy, S. Chakroborty, C. C. Ghosh, A. Saha, J. Indian Chem. Soc. 2004, 81, 115 – 125.

[38] K. Roy, A. Saha, Indian J. Chem. 2004, 43A, 1369 – 1376.

[39] K. Roy, A. Saha, Indian J. Chem. 2005, 44B, 1693 – 1707.

[40] R. Franke, Theoretical Aspects of Rational Drug Design, Elsevier, Amsterdam, 1984, pp. 184 – 193.

[41] P. J. Lewi, Multivariate data analysis in structure-activity relationships; in: E. J. Ariens (Ed.), Drug Design, vol. 10, Academic Press, New York, 1980, pp. 307 – 342.

[42] R. Franke, A. Gruska, Principal component and factor analysis; in: H. van de Waterbeemd (Ed.), Chemometric Methods in Molecular Design, vol. 2, VCH, Weinheim, 1995, pp. 113 – 163.

[43] SPSS is statistical software of SPSS Inc., USA.

[44] MINITAB is statistical software of MINITAB Inc., USA.

[45] The GW-BASIC programs RRR98, KRETA1, KRETA2, KRPRES1 and KRPRES2 were developed by Kunal Roy and standardized using known data sets.

[46] Cerius 2 version 4.8 is a product of Accelrys Inc., San Diego, CA.

[47] G. W. Snedecor, W. G. Cochran, Statistical Methods, Oxford and IBH Publishing Co. Pvt. Ltd., New Delhi, 1967, pp. 381 – 418.

[48] S. Wold, L. Eriksson, Statistical Validation of QSAR Results; in: H. van de Waterbeemd (Ed.), Chemometric Methods in Molecular Design, VCH, Weinheim, 1995, pp. 312 – 317.

[49] A. K. Debnath, Quantitative Structure-Activity Relationship (QSAR): A Versatile Tool in Drug Design; in: A. K. Ghose, V. N. Viswanadhan (Eds.), Combinatorial Library Design and Evaluation, Marcel Dekker, Inc., New York, 2001, pp. 73 – 129.
