

QSTR with Extended Topochemical Atom Indices. 4. Modeling of the Acute Toxicity of Phenylsulfonyl Carboxylates to Vibrio fischeri using Principal Component Factor Analysis and Principal Component Regression Analysis

Kunal Roy* and Gopinath Ghosh

Drug Theoretics and Cheminformatics Lab, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032 (India), E-mail: [email protected], URL: http://www.geocities.com/kunalroy_in, Tel: +91-33-2414 6676, Fax: +91-33-2414 6677

Full Paper

The present paper deals with modeling of the acute toxicity of 56 phenylsulfonyl carboxylates to Vibrio fischeri. Principal component factor analysis has been used as the data-preprocessing step for the selection of independent variables for the subsequent multiple regression analysis. The statistical quality of the best model using ETA descriptors is as follows: $n = 56$, $Q^2 = 0.726$, $R_a^2 = 0.837$, $R = 0.923$, $F = 57.4$ (df 5, 50), $s = 0.186$, AVRES = 0.136. An attempt has also been made to model the data set with different non-ETA parameters (topological indices including Wiener, Hosoya Z, molecular connectivity, kappa shape, Balaban J and E-State parameters, apart from physicochemical parameters like AlogP98, MolRef and H-bond-acceptor), and the best model shows the following quality: $n = 56$, $Q^2 = 0.763$, $R_a^2 = 0.798$, $R = 0.903$, $F = 44.3$ (df 5, 50), $s = 0.207$, AVRES = 0.139. An attempt to use both ETA and non-ETA parameters has led to an equation which is somewhat better in statistical quality than the best ETA model: $n = 56$, $Q^2 = 0.779$, $R_a^2 = 0.861$, $R = 0.935$, $F = 69.4$ (df 5, 50), $s = 0.172$, AVRES = 0.127. Use of the ETA indices suggests negative contributions of steric bulk, branching, functionality of $C_{10}$, functionality of the chloro substituent at the $X_1$ position and presence of unsaturation at the substituent(s) on $C_{10}$, and positive contributions of functionality of $O_{13}$ and presence of substituents with electronegative atoms at the $R_2$ and $R_3$ positions. Using factor scores as independent variables, principal component regression analysis has been performed, and the derived relations are of excellent statistical quality ($Q^2$ values being 0.816, 0.828 and 0.848 and $R^2$ values being 0.894, 0.871 and 0.905 for factor scores derived from ETA, non-ETA and combined matrices respectively), comparable to that ($Q^2$ and $R^2$ values being 0.790 and 0.920 respectively) of the relation generated from comparative molecular field analysis on the same data set.

Key words: QSAR, QSTR, Extended topochemical atom index, ETA, TAU, VEM

* Author for correspondence.

QSAR Comb. Sci. 2004, 23, p. 526; DOI: 10.1002/qsar.200430891; © 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

1 Introduction

Quantitative structure-activity relationship (QSAR) analysis has become an indispensable tool in ecotoxicological risk assessments, which are used in formulating regulatory decisions of environmental protection agencies [1-3]. Due to the shortage of experimental data, QSAR estimates for the selection of persistent, bio-accumulative and toxic (PBT) substances appear as an attractive alternative [4]. It has been argued that all new chemicals should be assessed using a consistent and transparent methodology that uses chemical property data derived from QSARs, or experimental determination when possible, and applies evaluative or regio-specific environmental models [5]. QSAR methods routinely result in ecotoxicity estimations of acute and chronic toxicity to various organisms, and in fate estimations of physical/chemical properties, degradation, and bioconcentration [6]. It is now possible to predict accurately the potential of organic chemicals to cause diverse effects to a range of organisms and to degrade or partition within the environment [7]. QSARs have also been used in exploring the mechanism of toxic actions of chemicals [8].

Many QSAR approaches and statistical methods have been adopted to explore ecotoxicological modeling of diverse categories of organic compounds. Cui et al. have reported holographic QSAR for toxicity data of 83 benzene derivatives to the autotrophic Chlorella vulgaris [9]. Comparative molecular field analysis (CoMFA) was used to model acute toxicity of 56 phenylsulfonyl carboxylates on Vibrio fischeri [10]. The acute toxicity data of 20 alpha-substituted phenylsulfonyl acetates against Daphnia magna were modeled using theoretical linear solvation energy relationships and charge model descriptors [11]. The joint toxicity of 2,4-dinitrotoluene with aromatic compounds to Vibrio fischeri was subjected to a QSAR study using the energy of the lowest unoccupied molecular orbital [12]. Partial least squares and multiple regression analyses were used for modeling toxicity of aromatic compounds to Chlorella vulgaris [13]. Different classification techniques were applied on 235 pesticides using 153 descriptors by Mazzatorta et al. for the toxicity prediction [14].

Recently the present group of authors has introduced extended topochemical atom (ETA) indices [15-17] in the valence electron mobile (VEM) environment [18-28] and explored quantitative structure-toxicity relationships of compounds of different chemical groups [15-17]. In continuation of such effort, the present paper deals with modeling of the acute toxicity (Table 1) of phenylsulfonyl carboxylates to Vibrio fischeri. Because aromatic sulfones are extensively used [10] as intermediates in the manufacture of pesticides, herbicides and anthelmintics, and also as flotation agents and extractants in the petrochemical and metallurgical industries, QSTR modeling of these compounds is timely for predicting their ecological effects in case of accidental discharge.

2 Materials and Methods

Definitions of some of the basic parameters used in the ETA scheme are given below.

The core count of a non-hydrogen vertex [$\alpha$] is defined as [15]:

$$\alpha = \frac{Z - Z^v}{Z^v} \cdot \frac{1}{PN - 1} \tag{1}$$

In Eq. (1), $PN$ denotes the period number. With the hydrogen atom considered as the reference, $\alpha$ for hydrogen is taken to be zero. Again, another term $\epsilon$ (a measure of electronegativity) has been defined [15] in the following manner:

$$\epsilon = -\alpha + 0.3\,Z^v \tag{2}$$
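As an illustrative sketch (not the authors' KRETA programs), Eqs. (1) and (2) can be evaluated for a few atoms common in organic compounds. The sketch takes $Z$ and $Z^v$ as the atomic number and the number of valence electrons; the function and variable names are hypothetical, and hydrogen is left out because its $\alpha$ is fixed at zero by convention rather than computed.

```python
# Illustrative sketch of Eqs. (1) and (2); function and variable names are hypothetical.
# Each entry: (atomic number Z, valence electron number Zv, period number PN).
ELEMENTS = {
    "C": (6, 4, 2),
    "N": (7, 5, 2),
    "O": (8, 6, 2),
    "S": (16, 6, 3),
    "Cl": (17, 7, 3),
    "Br": (35, 7, 4),
}


def core_count(z, zv, pn):
    """Eq. (1): alpha = ((Z - Zv) / Zv) * (1 / (PN - 1)); undefined for PN = 1 (hydrogen)."""
    return (z - zv) / zv / (pn - 1)


def electronegativity_measure(z, zv, pn):
    """Eq. (2): epsilon = -alpha + 0.3 * Zv."""
    return -core_count(z, zv, pn) + 0.3 * zv


alpha = {sym: core_count(*vals) for sym, vals in ELEMENTS.items()}
epsilon = {sym: electronegativity_measure(*vals) for sym, vals in ELEMENTS.items()}

for sym in ELEMENTS:
    print(f"{sym:2s}  alpha = {alpha[sym]:.3f}  epsilon = {epsilon[sym]:.3f}")
```

Under these inputs carbon has $\alpha = 0.5$ and $\epsilon = 0.7$, and the heavier, larger atoms (S, Cl, Br) receive the larger core counts.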

It is interesting to note that the $\alpha$ values of different atoms (which are commonly found in organic compounds) have a high correlation ($r = 0.946$) [15] with (uncorrected) van der Waals volume, while $\epsilon$ has a good correlation ($r = 0.937$) with Pauling's electronegativity scale [15].

The VEM count $\beta$ of the ETA scheme is defined as:

$$\beta = \sum x\sigma + \sum y\pi + \delta \tag{3}$$

In the above equation, $\delta$ is a correction factor of value 0.5 per atom with a lone pair of electrons capable of resonance with an aromatic ring (e.g., nitrogen of aniline, oxygen of phenol, etc.). For calculation of the VEM count, the contribution of a sigma bond ($x$) between two atoms of similar electronegativity ($\Delta\epsilon \le 0.3$) is considered to be 0.5, and for a sigma bond between two atoms of different electronegativity ($\Delta\epsilon > 0.3$) it is considered to be 0.75. Again, in case of pi bonds, contributions ($y$) are considered depending on the type of double bond: (i) for a pi bond between two atoms of similar electronegativity ($\Delta\epsilon \le 0.3$), $y$ is taken to be 1; (ii) for a pi bond between two atoms of different electronegativity ($\Delta\epsilon > 0.3$) or for a conjugated (non-aromatic) pi system, $y$ is considered to be 1.5; (iii) for an aromatic pi system, $y$ is taken as 2.

The VEM vertex count $\gamma_i$ of the $i$th vertex in a molecular graph is defined as:

$$\gamma_i = \frac{\alpha_i}{\beta_i} \tag{4}$$
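The bond-contribution rules of Eq. (3) and the ratio of Eq. (4) may be sketched as below. This is a hedged illustration with hypothetical names. It assumes that bonds are counted on the hydrogen-suppressed graph, which is consistent with the normal-alkane expression given later as Eq. 8, and it takes $\Delta\epsilon \le 0.3$ as the threshold for "similar" electronegativity.

```python
# Illustrative sketch of Eqs. (3) and (4); names are hypothetical.
# Assumption: only bonds to non-hydrogen neighbours are counted (hydrogen-suppressed graph).

PI_CONTRIBUTION = {
    "similar": 1.0,                   # (i) pi bond, similar electronegativity
    "different_or_conjugated": 1.5,   # (ii) different electronegativity or conjugated pi system
    "aromatic": 2.0,                  # (iii) aromatic pi system
}


def sigma_contribution(delta_epsilon):
    """x = 0.5 for similar electronegativity (|delta| <= 0.3), otherwise 0.75."""
    return 0.5 if abs(delta_epsilon) <= 0.3 else 0.75


def vem_count(sigma_delta_eps, pi_kinds=(), resonance_lone_pair=False):
    """Eq. (3): beta = sum(x * sigma) + sum(y * pi) + delta."""
    beta = sum(sigma_contribution(d) for d in sigma_delta_eps)
    beta += sum(PI_CONTRIBUTION[kind] for kind in pi_kinds)
    if resonance_lone_pair:
        beta += 0.5  # delta correction for a lone pair in resonance with an aromatic ring
    return beta


def vem_vertex_count(alpha, beta):
    """Eq. (4): gamma = alpha / beta."""
    return alpha / beta


# Carbons of a normal alkane (alpha = 0.5, all C-C sigma bonds).
beta_internal_c = vem_count([0.0, 0.0])
gamma_internal_c = vem_vertex_count(0.5, beta_internal_c)
beta_terminal_c = vem_count([0.0])
gamma_terminal_c = vem_vertex_count(0.5, beta_terminal_c)

# Ether oxygen (alpha = 1/3) with two C-O sigma bonds; delta-epsilon from Eq. (2).
delta_eps_co = (-1 / 3 + 0.3 * 6) - (-0.5 + 0.3 * 4)
beta_ether_o = vem_count([delta_eps_co, delta_eps_co])
gamma_ether_o = vem_vertex_count(1 / 3, beta_ether_o)

print(gamma_internal_c, gamma_terminal_c, round(gamma_ether_o, 4))
```

The pi-bond branch is included for completeness but is not exercised here, since how a pi contribution is shared between the two bonded atoms is not spelled out in this section.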

In the above equation, $\alpha_i$ stands for the $\alpha$ value for the $i$th vertex and $\beta_i$ stands for the VEM count considering all bonds connected to the atom $i$ and its lone pair of electrons (if any).

Finally, the composite index $\eta$ is defined in the following manner:

$$\eta = \sum_{i<j} \left[ \frac{\gamma_i \gamma_j}{r_{ij}^2} \right]^{0.5} \tag{5}$$
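A minimal sketch of Eq. (5) follows, using hypothetical names and n-butane as the worked case. The $\gamma$ values (1 for a terminal carbon, 0.5 for an internal carbon) assume a hydrogen-suppressed graph with $\alpha = 0.5$ and 0.5 per C-C sigma bond; they are an illustration, not values taken from the paper's data set.

```python
import math


def composite_index(gamma, dist):
    """Eq. (5): eta = sum over i < j of sqrt(gamma_i * gamma_j / r_ij**2)."""
    n = len(gamma)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.sqrt(gamma[i] * gamma[j] / dist[i][j] ** 2)
    return total


# n-Butane as a 4-vertex path: the topological distance is |i - j|.
gamma_butane = [1.0, 0.5, 0.5, 1.0]
dist_butane = [[abs(i - j) for j in range(4)] for i in range(4)]

eta_butane = composite_index(gamma_butane, dist_butane)
print(round(eta_butane, 4))
```

Every vertex pair contributes, with more distant pairs weighted down by $1/r_{ij}$, so the index carries both local and global topology.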

In Eq. 5, $r_{ij}$ stands for the topological distance between the $i$th atom and the $j$th atom. Again, when all heteroatoms and multiple bonds in the molecular graph are replaced by carbon and single bonds respectively, the corresponding molecular graph may be considered as the reference alkane and the corresponding composite index value is designated as $\eta_R$. Considering functionality as the presence of heteroatoms (atoms other than carbon or hydrogen) and multiple bonds, the functionality index $\eta_F$ may be calculated as $\eta_R - \eta$. To avoid dependence of functionality on vertex count or bulk, we have defined [15] another term $\eta'_F$ as $\eta_F/N_V$. Again, one can determine the contribution of a particular position, vertex or substructure to functionality in the following manner:

$$[\eta]_i = \sum_{j \ne i} \left[ \frac{\gamma_i \gamma_j}{r_{ij}^2} \right]^{0.5} \tag{6}$$

In Eq. 6, $[\eta]_i$ stands for the contribution of the $i$th vertex to $\eta$. Similarly, the contribution of the $i$th vertex $[\eta_R]_i$ to $\eta_R$ can be computed. The contribution of the $i$th vertex $[\eta_F]_i$ to functionality may be defined as $[\eta_R]_i - [\eta]_i$. To avoid dependence of this value on $N_V$, a related term $[\eta'_F]_i$ was defined [15] as $[\eta_F]_i/N_V$.
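The vertex-level quantities can be illustrated on a three-vertex toy molecule, dimethyl ether, with propane as its reference alkane. This is a sketch under stated assumptions: a hydrogen-suppressed graph, $\gamma$ values derived from Eqs. (1) to (4) as shown in the comments, and hypothetical function names. It also checks a property that follows directly from Eqs. (5) and (6): since every pair is counted once from each end, the vertex contributions sum to $2\eta$.

```python
import math


def vertex_contribution(i, gamma, dist):
    """Eq. (6): [eta]_i = sum over j != i of sqrt(gamma_i * gamma_j / r_ij**2)."""
    return sum(
        math.sqrt(gamma[i] * gamma[j] / dist[i][j] ** 2)
        for j in range(len(gamma))
        if j != i
    )


def composite_index(gamma, dist):
    """Eq. (5), repeated here so the sketch is self-contained."""
    n = len(gamma)
    return sum(
        math.sqrt(gamma[i] * gamma[j] / dist[i][j] ** 2)
        for i in range(n)
        for j in range(i + 1, n)
    )


n_v = 3
dist = [[abs(i - j) for j in range(n_v)] for i in range(n_v)]

# Dimethyl ether C-O-C: each carbon has one C-O sigma bond (beta = 0.75),
# the oxygen has two (beta = 1.5); alpha(C) = 0.5, alpha(O) = 1/3.
gamma_ether = [0.5 / 0.75, (1 / 3) / 1.5, 0.5 / 0.75]
# Reference alkane (propane): gamma = 1 for terminal and 0.5 for internal carbon.
gamma_ref = [1.0, 0.5, 1.0]

contrib = [vertex_contribution(i, gamma_ether, dist) for i in range(n_v)]
contrib_ref = [vertex_contribution(i, gamma_ref, dist) for i in range(n_v)]

eta_f_oxygen = contrib_ref[1] - contrib[1]   # [eta_F]_i = [eta_R]_i - [eta]_i
eta_f_prime_oxygen = eta_f_oxygen / n_v      # [eta'_F]_i = [eta_F]_i / N_V

print(round(eta_f_oxygen, 4), round(eta_f_prime_oxygen, 4))
```

In this toy case the oxygen position carries a positive functionality contribution, because replacing it by carbon raises its $\gamma$ and hence its share of the reference index.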


Table 1. Observed, calculated and predicted toxicity (pC) of phenylsulfonyl carboxylates to Vibrio fischeri. Columns $R_1$ to $X_2$ give the structural features (a).

| Sl. | $R_1$ | $R_2$, $R_3$ | $X_1$ | $X_2$ | Obs. (b) | Calc. (c) | Pred. (c) | Calc. (d) | Pred. (d) | Calc. (e) | Pred. (e) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CH3 | -(CH2)2- | H | H | 2.28 | 2.20 | 2.19 | 2.23 | 2.22 | 2.19 | 2.17 |
| 2 | CH3 | -(CH2)3- | H | H | 2.12 | 2.01 | 2.00 | 1.83 | 1.80 | 1.82 | 1.79 |
| 3 | CH3 | -(CH2)4- | H | H | 1.91 | 1.85 | 1.84 | 1.74 | 1.72 | 1.74 | 1.73 |
| 4 | CH3 | -(CH2)5- | H | H | 1.81 | 1.70 | 1.70 | 1.65 | 1.63 | 1.67 | 1.66 |
| 5 | CH3 | -(CH2)2- | H | NO2 | 2.12 | 1.89 | 1.88 | 1.95 | 1.92 | 1.94 | 1.90 |
| 6 | CH(CH3)2 | -(CH2)2- | H | NO2 | 1.78 | 1.65 | 1.65 | 1.77 | 1.76 | 1.77 | 1.77 |
| 7 | CH(CH3)2 | -(CH2)3- | H | NO2 | 1.81 | 1.49 | 1.47 | 1.37 | 1.34 | 1.39 | 1.36 |
| 8 | CH(CH3)2 | -(CH2)5- | H | NO2 | 1.45 | 1.23 | 1.22 | 1.19 | 1.17 | 1.23 | 1.22 |
| 9 | CH(CH3)2 | -(CH2)6- | H | NO2 | 1.05 | 1.13 | 1.14 | 1.10 | 1.10 | 1.14 | 1.15 |
| 10 | CH3 | -(CH2)2- | H | Br | 1.89 | 1.82 | 1.82 | 2.08 | 2.10 | 1.95 | 1.96 |
| 11 | CH3 | -(CH2)3- | H | Br | 1.76 | 1.64 | 1.64 | 1.68 | 1.67 | 1.58 | 1.57 |
| 12 | CH3 | -(CH2)4- | H | Br | 1.60 | 1.50 | 1.50 | 1.59 | 1.59 | 1.50 | 1.50 |
| 13 | CH3 | -(CH2)5- | H | Br | 1.31 | 1.38 | 1.38 | 1.49 | 1.51 | 1.42 | 1.43 |
| 14 | CH3 | -(CH2)2- | H | Cl | 1.96 | 1.99 | 1.99 | 2.01 | 2.02 | 2.06 | 2.07 |
| 15 | CH3 | -(CH2)3- | H | Cl | 1.92 | 1.81 | 1.80 | 1.61 | 1.56 | 1.68 | 1.67 |
| 16 | CH(CH3)2 | -(CH2)2- | H | Cl | 1.86 | 1.75 | 1.74 | 1.83 | 1.82 | 1.90 | 1.90 |
| 17 | CH2(CH2)2CH3 | -(CH2)2- | H | Cl | 1.70 | 1.63 | 1.62 | 1.72 | 1.73 | 1.84 | 1.86 |
| 18 | CH(CH3)2 | -(CH2)4- | H | Cl | 1.51 | 1.43 | 1.43 | 1.33 | 1.31 | 1.43 | 1.43 |
| 19 | CH(CH3)2 | -(CH2)5- | H | Cl | 1.32 | 1.31 | 1.31 | 1.24 | 1.23 | 1.35 | 1.35 |
| 20 | CH(CH3)2 | -(CH2)6- | H | Cl | 0.90 | 1.20 | 1.22 | 1.15 | 1.19 | 1.26 | 1.29 |
| 21 | CH(CH3)2 | -(CH2)2- | H | CH3 | 1.96 | 1.81 | 1.81 | 1.94 | 1.94 | 1.93 | 1.93 |
| 22 | CH(CH3)2 | -(CH2)3- | H | CH3 | 1.46 | 1.64 | 1.64 | 1.54 | 1.55 | 1.55 | 1.55 |
| 23 | CH3 | -(CH2)2- | H | CH3 | 2.22 | 2.07 | 2.05 | 2.13 | 2.11 | 2.09 | 2.08 |
| 24 | CH2CH3 | -(CH2)2- | H | CH3 | 1.92 | 1.94 | 1.94 | 2.03 | 2.04 | 2.03 | 2.04 |
| 25 | CH2CH3 | -(CH2)3- | H | CH3 | 1.68 | 1.76 | 1.76 | 1.63 | 1.62 | 1.65 | 1.65 |
| 26 | CH(CH3)2 | -(CH2)4- | H | CH3 | 1.22 | 1.49 | 1.51 | 1.45 | 1.46 | 1.46 | 1.48 |
| 27 | CH(CH3)2 | -(CH2)5- | H | CH3 | 1.09 | 1.37 | 1.38 | 1.36 | 1.38 | 1.38 | 1.40 |
| 28 | CH3 | -(CH2)5- | H | CH3 | 1.40 | 1.58 | 1.59 | 1.54 | 1.55 | 1.56 | 1.57 |
| 29 | CH3 | H, H | H | NO2 | 1.29 | 1.50 | 1.56 | 1.52 | 1.54 | 1.45 | 1.49 |
| 30 | CH(CH3)2 | H, H | H | NO2 | 1.28 | 1.26 | 1.26 | 1.33 | 1.33 | 1.35 | 1.35 |
| 31 | CH3 | H, H | Cl | NO2 | 0.44 | 0.86 | 1.38 | 1.28 | 1.42 | 0.80 | 1.25 |
| 32 | CH(CH3)2 | H, H | Cl | NO2 | 1.13 | 0.67 | 0.27 | 1.09 | 1.08 | 0.73 | 0.39 |
| 33 | CH3 | H, H | NO2 | H | 1.49 | 1.48 | 1.48 | 1.47 | 1.47 | 1.54 | 1.55 |
| 34 | CH(CH3)2 | H, H | NO2 | H | 1.34 | 1.24 | 1.22 | 1.28 | 1.28 | 1.42 | 1.43 |
| 35 | CH3 | H, H | NO2 | Cl | 1.33 | 1.28 | 1.27 | 1.23 | 1.21 | 1.43 | 1.44 |
| 36 | CH(CH3)2 | H, H | NO2 | Cl | 1.45 | 1.06 | 0.98 | 1.04 | 0.97 | 1.30 | 1.29 |
| 37 | CH3 | H, CH3 | H | NO2 | 1.48 | 1.80 | 1.83 | 1.42 | 1.42 | 1.55 | 1.56 |
| 38 | CH3 | CH3, CH3 | H | NO2 | 1.42 | 1.73 | 1.74 | 1.64 | 1.67 | 1.54 | 1.56 |
| 39 | CH3 | CH2CH3, CH2CH3 | H | NO2 | 1.36 | 1.48 | 1.49 | 1.44 | 1.45 | 1.40 | 1.40 |
| 40 | CH3 | CH2(CH2)2CH3, CH2(CH2)2CH3 | H | NO2 | 1.10 | 1.07 | 1.07 | 1.07 | 1.06 | 1.07 | 1.07 |
| 41 | CH3 | CH2Ph, CH2Ph | H | NO2 | 0.60 | 0.62 | 0.63 | 0.66 | 0.67 | 0.70 | 0.71 |
| 42 | CH2CH3 | CH2(CH2)2CH3, CH2(CH2)2CH3 | H | NO2 | 1.08 | 0.99 | 0.98 | 0.97 | 0.96 | 0.99 | 0.98 |
| 43 | CH2CH3 | CH3, CH2Ph | H | NO2 | 0.98 | 0.97 | 0.97 | 1.05 | 1.06 | 1.06 | 1.06 |
| 44 | CH2CH3 | CH3, CH2CH=CH2 | H | NO2 | 1.12 | 1.34 | 1.35 | 1.37 | 1.39 | 1.33 | 1.34 |
| 45 | CH2CH3 | CH3, CH2-1-Naph | H | NO2 | 0.83 | 0.71 | 0.66 | 0.72 | 0.71 | 0.78 | 0.77 |
| 46 | CH(CH3)2 | CH2(CH2)2CH3, CH2(CH2)2CH3 | H | NO2 | 1.05 | 0.91 | 0.90 | 0.88 | 0.86 | 0.88 | 0.86 |
| 47 | Cyclohexyl | H, CH3 | H | NO2 | 1.19 | 1.22 | 1.22 | 0.99 | 0.98 | 1.27 | 1.28 |
| 48 | CH3 | H, CH2CO2CH2CH3 | H | NO2 | 1.00 | 1.31 | 1.33 | 1.09 | 1.10 | 1.20 | 1.21 |
| 49 | CH(CH3)2 | H, CH2CO2CH(CH3)2 | H | NO2 | 0.92 | 1.03 | 1.03 | 0.81 | 0.80 | 0.93 | 0.93 |


Again, considering only bonded interactions ($r_{ij} = 1$), the corresponding composite index is written as $\eta^{local}$.

$$\eta^{local} = \sum_{i<j,\; r_{ij}=1} \left( \gamma_i \gamma_j \right)^{0.5} \tag{7}$$

In a similar way, $\eta_R^{local}$ for the corresponding reference alkane may also be calculated. The local functionality contribution (without considering global topology), $\eta_F^{local}$, may be calculated as $\eta_R^{local} - \eta^{local}$. The branching index $\eta_B$ can be calculated as $\eta_N^{local} - \eta_R^{local} + 0.086\,N_R$, where $N_R$ stands for the number of rings in the molecular graph of the reference alkane. The $N_R$ term in the branching index expression represents a correction factor for cyclicity. $\eta_N^{local}$ indicates the $\eta$ value of the corresponding normal alkane (straight-chain compound of the same vertex count obtained from the reference alkane), which may be conveniently calculated as (when $N_V \ge 3$):

$$\eta_N^{local} = 1.414 + (N_V - 3) \times 0.5 \tag{8}$$
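The closed form of Eq. 8 can be checked against a direct evaluation of Eq. 7 on a path graph, and the branching index can be tried on a small branched alkane. The sketch below uses hypothetical names and assumes a hydrogen-suppressed alkane graph, where $\alpha = 0.5$ and each C-C sigma bond contributes 0.5 to $\beta$, so that $\gamma = 1/\text{degree}$.

```python
import math


def gamma_alkane(degree):
    """gamma for an alkane carbon: alpha = 0.5, beta = 0.5 per C-C bond."""
    return 0.5 / (0.5 * degree)


def eta_local(edges, n_vertices):
    """Eq. (7) for an alkane graph given as a list of bonded vertex pairs."""
    degree = [0] * n_vertices
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    gamma = [gamma_alkane(d) for d in degree]
    return sum(math.sqrt(gamma[a] * gamma[b]) for a, b in edges)


def eta_local_normal(n_vertices):
    """Eq. (8), valid for N_V >= 3."""
    return 1.414 + (n_vertices - 3) * 0.5


def path_edges(n_vertices):
    return [(i, i + 1) for i in range(n_vertices - 1)]


# Eq. (8) versus direct summation for normal alkanes C3 to C12.
max_gap = max(
    abs(eta_local(path_edges(n), n) - eta_local_normal(n)) for n in range(3, 13)
)

# Branching index of isobutane (acyclic, so N_R = 0): central carbon bonded to three methyls.
isobutane_edges = [(0, 1), (0, 2), (0, 3)]
n_rings = 0
eta_b_isobutane = eta_local_normal(4) - eta_local(isobutane_edges, 4) + 0.086 * n_rings

print(round(max_gap, 5), round(eta_b_isobutane, 4))
```

The two terminal bonds each contribute $\sqrt{1 \times 0.5} \approx 0.707$ (together 1.414) and each of the $N_V - 3$ internal bonds contributes 0.5, which is where the constants in Eq. 8 come from.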

To calculate the branching contribution relative to the molecular size, another term $\eta'_B$ has been defined as $\eta_B/N_V$.

In the present communication, the utility of ETA parameters has been demonstrated through a QSTR study taking the acute toxicity of phenylsulfonyl carboxylates to Vibrio fischeri [10] as the model data set (Table 1). The definitions of important ETA parameters are given in Table 2. Factor analysis has been performed as the data-preprocessing step for the identification of important descriptors for the subsequent multiple regression analysis [29-31]. For this purpose, the data matrix consisting of the descriptors has been subjected to principal component factor analysis using SPSS software [32]. In a typical factor analysis procedure, the data matrix is first standardized, and the correlation matrix and subsequently the reduced correlation matrix are constructed. An eigenvalue problem is then solved and the factor pattern can be obtained from the corresponding eigenvectors. The principal objectives of factor analysis are to display multidimensional data in a space of lower dimensionality with minimal loss of information and to extract basic features behind the data with the ultimate goal of interpretation and/or prediction. The factors were extracted by the principal component method and then rotated by VARIMAX rotation to obtain Thurstone's simple structure. Only variables with non-zero loadings in such factors where biological activity also has non-zero loading were considered important in explaining the variance of the activity. Further, variables with non-zero loadings in different factors were combined in regression equations. An attempt was also made to perform principal component regression analysis (PCRA) [31] taking factor scores as the predictor variables and adopting the backward


Table 1. (cont.)

| Sl. | $R_1$ | $R_2$, $R_3$ | $X_1$ | $X_2$ | Obs. (b) | Calc. (c) | Pred. (c) | Calc. (d) | Pred. (d) | Calc. (e) | Pred. (e) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 50 | CH(CH3)2 | CH2CO2CH2CH3, CH2CO2CH2CH3 | H | NO2 | 0.66 | 0.75 | 0.78 | 0.95 | 1.02 | 0.61 | 0.60 |
| 51 | CH3 | =CHPh | H | NO2 | 0.82 | 0.86 | 0.87 | 0.88 | 0.88 | 0.82 | 0.81 |
| 52 | CH2CH3 | =CHPh | H | NO2 | 0.75 | 0.78 | 0.79 | 0.78 | 0.79 | 0.75 | 0.75 |
| 53 | CH(CH3)2 | =CHPh | H | NO2 | 0.64 | 0.70 | 0.72 | 0.69 | 0.70 | 0.65 | 0.65 |
| 54 | CH2CH(CH3)2 | =CHPh | H | NO2 | 0.66 | 0.63 | 0.63 | 0.60 | 0.59 | 0.57 | 0.56 |
| 55 | CH(CH3)2 | =CHPh | H | CH3 | 0.89 | 0.80 | 0.78 | 0.90 | 0.90 | 0.76 | 0.73 |
| 56 | CH(CH3)2 | =CHPh | H | Cl | 0.80 | 0.75 | 0.74 | 0.78 | 0.77 | 0.73 | 0.71 |

(a) In cases of compounds 1-28, the $R_2$ and $R_3$ substituents correspond to alicyclic rings of different size.
(b) Obs. = Observed (Ref. 10); Calc. = Calculated; Pred. = Predicted.
(c) From Eq. 14.
(d) From Eq. 16.
(e) From Eq. 18.

Table 2. Definitions of important ETA parameters used in exploring QSAR of toxicity of phenylsulfonyl carboxylates.

| Parameter | Definition |
|---|---|
| $\Sigma\alpha$ | Sum of $\alpha$ values of all non-hydrogen vertices of a molecule |
| $[\Sigma\alpha]_P$ | Sum of $\alpha$ values of all non-hydrogen vertices each of which is joined to only one other vertex of the molecule |
| $[\eta'_F]_{10}$ | Functionality for the carbon atom at the 10th position |
| $[\eta'_F]_{13}$ | Functionality for the oxygen atom at the 13th position |
| $[\eta'_F]_{X_1\text{-Cl}}$ | Functionality for the chlorine substituent at the $X_1$ position in the phenyl ring |
| $[\eta'_F]_{X_2\text{-NO}_2}$ | Functionality for the nitro substituent at the $X_2$ position in the phenyl ring |
| $[\Sigma\beta'_s]_{Sub}$ | Sum of $\beta'_s$ values of substituents ($R_2$ and $R_3$) on $C_{10}$; $\beta'_s$ is defined as $[\Sigma\beta_s]/N_V$ for non-hydrogen substituent(s); in case hydrogen is present in the substituent position, the value for that position is taken as zero |
| $[\Sigma\beta'_{ns}]_{Sub}$ | Sum of $\beta'_{ns}$ values of substituents on $C_{10}$; $\beta'_{ns}$ is defined as $[\Sigma\beta_{ns}]/N_V$ for non-hydrogen substituent(s); in case hydrogen is present in the substituent position, the value for that position is taken as zero |


stepwise regression method. In this case, the principal components serve as latent variables. PCRA has the advantage that collinearities among X variables are not a disturbing factor and that the number of variables included in the analysis may exceed the number of observations [31]. In PCRA, all descriptors are assumed to be important, while the aim of factor analysis is to identify relevant descriptors.

The non-hydrogen common atoms of the compounds were numbered so that they maintain the same serial numbers in all the models (Figure 1). The calculations of $\eta$, $\eta_R$, $\eta_F$, $\eta_B$ and contributions of different vertices to $\eta_F$ were done, using the distance matrix and VEM vertex counts as inputs, by the GW-BASIC programs KRETA1 and KRETA2 developed by one of the authors [33]. We have also modeled the toxicity data using other selected topological and physicochemical variables and compared the ETA models with non-ETA ones. The values for the topological descriptors and physicochemical variables for the compounds have been generated by the QSAR+ and Descriptor+ modules of the Cerius2 version 4.6 software [34]. The various topological indices calculated are Wiener, Hosoya Z, Balaban J, connectivity indices ($^0\chi$, $^1\chi$, $^2\chi$, $^3\chi_p$, $^3\chi_c$, $^3\chi_{CH}$, $^0\chi^v$, $^1\chi^v$, $^2\chi^v$, $^3\chi^v_p$, $^3\chi^v_c$, $^3\chi^v_{CH}$), kappa shape indices ($^1\kappa$, $^2\kappa$, $^3\kappa$, $^1\kappa_\alpha$, $^2\kappa_\alpha$, $^3\kappa_\alpha$) and E-state parameters. Among the physicochemical variables, molar refractivity (MolRef), hydrophobicity (AlogP98) and H-bond-acceptor were considered.

The regression analyses were carried out using the program RRR98 [33]. The statistical quality of the equations [35] was judged by parameters like explained variance ($R_a^2$, i.e., adjusted $R^2$), correlation coefficient ($r$ or $R$), standard error of estimate ($s$) and variance ratio ($F$) at specified degrees of freedom (df). PRESS (leave-one-out) statistics [36, 37] were calculated using the programs KRPRES1 and KRPRES2 [33], and leave-one-out cross-validation $R^2$ ($Q^2$), predicted residual sum of squares (PRESS), standard deviation based on PRESS ($S_{PRESS}$), standard deviation of error of prediction (SDEP) and average absolute predicted residual ($Pres_{av}$) were reported. Finally, "leave-many-out" cross-validation was applied on the final equations. All the accepted equations have regression constants and F ratios significant at the 95% and 99% levels respectively, if not stated otherwise. A compound was considered an outlier if its residual is more than twice the standard error of estimate for a particular equation.
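The factor-extraction and rotation steps described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the SPSS procedure used by the authors: the data are synthetic random numbers, the function names are hypothetical, loadings are taken from the full correlation matrix (unities on the diagonal), and the rotation is a commonly used SVD-based varimax iteration.

```python
import numpy as np


def principal_component_loadings(X, n_factors):
    """Loadings of the first n_factors principal components of the correlation matrix."""
    R = np.corrcoef(X, rowvar=False)          # correlation matrix of standardized variables
    eigval, eigvec = np.linalg.eigh(R)        # eigenvalue problem (ascending eigenvalues)
    order = np.argsort(eigval)[::-1][:n_factors]
    loadings = eigvec[:, order] * np.sqrt(eigval[order])
    return loadings, eigval[order]


def varimax(L, max_iter=200, tol=1e-10):
    """Orthogonal varimax rotation; returns rotated loadings and the rotation matrix."""
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        Lr = L @ rotation
        target = Lr ** 3 - (1.0 / p) * Lr @ np.diag(np.sum(Lr ** 2, axis=0))
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ rotation, rotation


# Synthetic stand-in for a descriptor matrix: 56 "compounds", 6 correlated "descriptors".
rng = np.random.default_rng(0)
base = rng.normal(size=(56, 3))
X = np.hstack([base, base + 0.3 * rng.normal(size=(56, 3))])

loadings, eigval = principal_component_loadings(X, n_factors=3)
rotated, rotation = varimax(loadings)

communalities = (loadings ** 2).sum(axis=1)
communalities_rotated = (rotated ** 2).sum(axis=1)
explained_fraction = eigval.sum() / X.shape[1]

print(np.round(rotated, 3))
print(np.round(communalities_rotated, 3), round(float(explained_fraction), 3))
```

Because the rotation is orthogonal, the communalities (row sums of squared loadings) are unchanged by it; only the distribution of each variable's loading across factors becomes simpler to read.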

3 Results and Discussion

Table 3 shows that nine factors could explain 95.0% of the variance of the data matrix composed of the toxicity data and ETA descriptors. The toxicity data are highly loaded in factor 1 (which is in turn highly loaded in $\Sigma\alpha$, functionality contributions of atoms 1-9 and 11-13, and $[\Sigma\alpha]_{Sub}$) and moderately in factor 2 (which is highly loaded in $\Sigma\beta'$, $\Sigma\beta'_{ns}$, functionality contribution of atom 10, $\eta'_F$, $[\Sigma\beta'_{ns}]_{C10}$ and $[\Sigma\beta'_{ns}]_{Sub}$). Again, the toxicity data are moderately loaded in factor 9 (loaded in $[\eta'_F]_{X_1\text{-Cl}}$) and slightly loaded in factors 5 (loaded in $[\eta'_F]_{X_1\text{-NO}_2}$), 4 (loaded in $\eta'_B$ and $[\Sigma\alpha]_P/\Sigma\alpha$) and 8 (loaded in $[\eta'_F]_{X_2\text{-Cl}}$). Based on the results of the factor analysis, the following three relations of almost comparable quality were derived.

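$$pC = -1.232\,(\pm 1.273)\,[\Sigma\alpha]_P/\Sigma\alpha + 40.358\,(\pm 8.588)\,[\eta'_F]_{13} - 3.266\,(\pm 2.799)\,[\eta'_F]_{X_1\text{-Cl}} + 0.776\,(\pm 0.279)\,[\Sigma\beta'_s]_{Sub} - 0.635\,(\pm 0.168)\,[\Sigma\beta'_{ns}]_{Sub} - 2.212\,(\pm 0.904) \tag{9}$$

$n = 56$, $Q^2 = 0.704$, $R_a^2 = 0.826$, $R^2 = 0.842$, $R = 0.918$, $s = 0.192$, $F = 53.3$ (df 5, 50), AVRES = 0.139, PRESS = 3.464, SDEP = 0.249, $S_{PRESS} = 0.263$, $Pres_{av} = 0.168$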

$$pC = -3.927\,(\pm 2.783)\,[\eta'_F]_{X_1\text{-Cl}} - 6.310\,(\pm 3.592)\,[\eta'_F]_{10} - 0.229\,(\pm 0.032)\,\Sigma\alpha + 0.611\,(\pm 0.297)\,[\Sigma\beta'_s]_{Sub} + 3.760\,(\pm 0.456) \tag{10}$$

$n = 56$, $Q^2 = 0.730$, $R_a^2 = 0.824$, $R^2 = 0.837$, $R = 0.915$, $s = 0.193$, $F = 65.5$ (df 4, 51), AVRES = 0.148, PRESS = 3.160, SDEP = 0.238, $S_{PRESS} = 0.249$, $Pres_{av} = 0.172$

$$pC = -3.436\,(\pm 2.947)\,[\eta'_F]_{X_1\text{-Cl}} + 0.389\,(\pm 0.313)\,[\Sigma\beta'_s]_{Sub} - 13.401\,(\pm 3.999)\,[\eta'_F]_{10} + 60.911\,(\pm 9.000)\,[\eta'_F]_{13} - 3.472\,(\pm 0.809) \tag{11}$$

$n = 56$, $Q^2 = 0.692$, $R_a^2 = 0.803$, $R^2 = 0.817$, $R = 0.904$, $s = 0.205$, $F = 56.9$ (df 4, 51), AVRES = 0.156, PRESS = 3.600, SDEP = 0.254, $S_{PRESS} = 0.266$, $Pres_{av} = 0.182$
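The reported statistics are mutually consistent, which can be checked with the conventional textbook definitions. The formulas below are assumptions in the sense that the paper does not print them; the function names are hypothetical, and the inputs are the values quoted for Eq. 9 (five descriptors) and Eq. 10 (four descriptors).

```python
import math


def adjusted_r2(r2, n, p):
    """Explained variance: R_a^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def f_ratio(r2, n, p):
    """Variance ratio with (p, n - p - 1) degrees of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


def sdep(press, n):
    """Standard deviation of error of prediction: sqrt(PRESS / n)."""
    return math.sqrt(press / n)


def s_press(press, n, p):
    """Standard deviation based on PRESS: sqrt(PRESS / (n - p - 1))."""
    return math.sqrt(press / (n - p - 1))


def total_sum_of_squares(press, q2):
    """From Q^2 = 1 - PRESS / SS_total."""
    return press / (1 - q2)


n = 56
eq9 = {"p": 5, "r2": 0.842, "press": 3.464, "q2": 0.704}
eq10 = {"p": 4, "r2": 0.837, "press": 3.160, "q2": 0.730}

for label, eq in (("Eq. 9", eq9), ("Eq. 10", eq10)):
    print(
        label,
        round(adjusted_r2(eq["r2"], n, eq["p"]), 3),
        round(f_ratio(eq["r2"], n, eq["p"]), 1),
        round(sdep(eq["press"], n), 3),
        round(s_press(eq["press"], n, eq["p"]), 3),
        round(total_sum_of_squares(eq["press"], eq["q2"]), 2),
    )
```

Both equations imply nearly the same total sum of squares of pC (about 11.7), as they should, since they are fitted to the same 56 observed values.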

Figure 1. General structure of phenylsulfonyl carboxylates: the common atoms have been numbered 1 through 13.

The coefficient of $[\Sigma\alpha]_P/\Sigma\alpha$ in Eq. 9 is significant at the 90% level. The negative coefficient of $[\Sigma\alpha]_P/\Sigma\alpha$ in Eq. 9 indicates that the toxicity decreases with increase in branching. Again, the negative coefficient of $\Sigma\alpha$ in Eq. 10 indicates the negative contribution of steric bulk to the toxicity. The positive coefficient of $[\eta'_F]_{13}$ in Eq. 9 and the negative coefficients of $[\eta'_F]_{10}$ in Eqs. 10 and 11 signify positive and negative contributions of the functionalities of $O_{13}$ and $C_{10}$ respectively. The negative coefficients of $[\eta'_F]_{X_1\text{-Cl}}$ in Eqs. 9-11 indicate that the toxicity decreases as the functionality value of the Cl substituent at the $X_1$ position increases. The positive coefficients of $[\Sigma\beta'_s]_{Sub}$ in Eqs. 9-11 suggest that the toxicity increases with increase in the number of electronegative atoms at the $R_2$ and $R_3$ positions. Again, the negative coefficient of $[\Sigma\beta'_{ns}]_{Sub}$ in Eq. 9 suggests that the toxicity decreases with increase in unsaturation at the $R_2$ and $R_3$ positions.

When the squared term of $\Sigma\alpha$ is added to Eq. 10, a tangible rise in statistical quality occurs.

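$$pC = -3.761\,(\pm 2.689)\,[\eta'_F]_{X_1\text{-Cl}} - 6.511\,(\pm 3.469)\,[\eta'_F]_{10} + 0.656\,(\pm 0.289)\,[\Sigma\beta'_s]_{Sub} - 0.588\,(\pm 0.327)\,\Sigma\alpha + 0.017\,(\pm 0.015)\,[\Sigma\alpha]^2 + 5.630\,(\pm 1.751) \tag{12}$$

$n = 56$, $Q^2 = 0.726$, $R_a^2 = 0.837$, $R^2 = 0.852$, $R = 0.923$, $s = 0.186$, $F = 57.4$ (df 5, 50), AVRES = 0.136, PRESS = 3.207, SDEP = 0.239, $S_{PRESS} = 0.253$, $Pres_{av} = 0.163$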

Eq. 12 shows a parabolic relation with respect to $\Sigma\alpha$, which means that the toxicity decreases with increase in steric bulk up to a certain limiting value. When $[\eta'_F]_{13}$ was used instead of $\Sigma\alpha$ in Eq. 12, the predicted variance increased to some extent (0.732 vs. 0.726), though the explained variance decreased marginally (0.826 vs. 0.837).
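The limiting value mentioned for Eq. 12 can be estimated from its $\Sigma\alpha$ terms alone. The short derivation below is a sketch that takes those terms as $-0.588\,\Sigma\alpha + 0.017\,[\Sigma\alpha]^2$, i.e., with the signs implied by the statement that toxicity first decreases with steric bulk; it ignores the confidence intervals of the two coefficients, which are wide for the quadratic term.

```latex
\frac{\partial\, pC}{\partial\, \Sigma\alpha}
  = -0.588 + 2 \times 0.017\, \Sigma\alpha = 0
\quad\Longrightarrow\quad
\Sigma\alpha^{*} = \frac{0.588}{2 \times 0.017} \approx 17.3
```

Under this reading, predicted toxicity falls with increasing $\Sigma\alpha$ for molecules with $\Sigma\alpha$ below roughly 17.3 and would level off near that value.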

$$pC = -3.831\,(\pm 2.758)\,[\eta'_F]_{X_1\text{-Cl}} + 0.493\,(\pm 0.299)\,[\Sigma\beta'_s]_{Sub} - 9.457\,(\pm 3.907)\,[\eta'_F]_{10} + 26.236\,(\pm 3.646)\,[\eta'_F]_{13} - 0.006\,(\pm 0.002)\,[\Sigma\alpha]^2 \tag{13}$$

$n = 56$, $Q^2 = 0.732$, $R_a^2 = 0.826$, $R^2 = 0.839$, $R = 0.916$, $s = 0.192$, $F = 613.6$ (df 5, 51), AVRES = 0.142, PRESS = 3.135, SDEP = 0.237, $S_{PRESS} = 0.250$, $Pres_{av} = 0.166$

According to the equation statistics, Eq. 12 is the best ETA model, while considering the cross-validation statistics, Eq. 13 is the best one. The intercept of Eq. 13 was set to zero.

An attempt was also made to use factor scores as the predictor variables to avoid


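Table 3. Factor loadings of the variables (ETA parameters) after VARIMAX rotation.

| Variables | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Factor 5 | Factor 6 | Factor 7 | Factor 8 | Factor 9 | Communalities |
|---|---|---|---|---|---|---|---|---|---|---|
| pC | .660 | -.469 | -.098 | -.181 | -.247 | .018 | .027 | .128 | -.345 | .895 |
| $\Sigma\alpha$ | -.964 | .196 | .101 | .073 | -.038 | .059 | -.025 | -.023 | -.047 | .991 |
| $[\Sigma\alpha]_P/\Sigma\alpha$ | .343 | .165 | .145 | .729 | .285 | -.117 | -.258 | .020 | .230 | .911 |
| $[\Sigma\alpha]_Y/\Sigma\alpha$ | -.005 | .554 | .153 | .592 | .266 | -.358 | .175 | -.077 | .139 | .936 |
| $[\Sigma\alpha]_X/\Sigma\alpha$ | .483 | -.650 | -.163 | -.198 | -.346 | .252 | .018 | .074 | -.161 | .937 |
| $\eta'_B$ | -.523 | .205 | -.068 | .715 | .068 | -.028 | .132 | .081 | .051 | .864 |
| $\Sigma\beta'_s$ | .569 | .148 | .510 | .324 | .012 | .065 | .265 | .060 | .013 | .790 |
| $\Sigma\beta'_{ns}$ | .212 | .880 | .223 | .106 | .108 | .217 | .084 | -.089 | .022 | .954 |
| $\Sigma\beta'$ | .267 | .848 | .271 | .138 | .104 | .213 | .111 | -.077 | .022 | .958 |
| $[\eta'_F]_1$ | .962 | .045 | .015 | .007 | .217 | .128 | .004 | .064 | .047 | .998 |
| $[\eta'_F]_2$ | .959 | .013 | .124 | .033 | .026 | .116 | .007 | -.062 | -.157 | .979 |
| $[\eta'_F]_3$ | .867 | .126 | .269 | .207 | .253 | .050 | .102 | -.095 | .143 | .989 |
| $[\eta'_F]_4$ | .747 | .152 | .436 | .040 | .235 | .060 | .293 | -.202 | .133 | .976 |
| $[\eta'_F]_5$ | .884 | .102 | .298 | .196 | .065 | .056 | .091 | -.160 | .159 | .985 |
| $[\eta'_F]_6$ | .959 | .053 | .133 | .095 | .156 | .102 | .033 | -.033 | .102 | .997 |
| $[\eta'_F]_7$ | .964 | .140 | -.038 | -.060 | .102 | .161 | -.021 | .078 | .021 | .996 |
| $[\eta'_F]_8$ | .948 | -.144 | -.086 | -.085 | .087 | .199 | -.073 | .088 | .016 | .994 |
| $[\eta'_F]_9$ | .948 | -.144 | -.086 | -.085 | .087 | .199 | -.073 | .088 | .016 | .994 |
| $[\eta'_F]_{10}$ | .316 | .883 | -.010 | .000 | .264 | -.100 | .004 | .033 | .139 | .979 |
| $[\eta'_F]_{11}$ | .949 | .061 | -.123 | -.140 | -.007 | .180 | -.063 | .115 | -.028 | .990 |
| $[\eta'_F]_{12}$ | .912 | -.211 | -.178 | -.134 | -.014 | .093 | -.079 | .120 | .027 | .956 |
| $[\eta'_F]_{13}$ | .965 | .081 | -.077 | -.083 | .043 | .140 | -.003 | .105 | .006 | .985 |
| $[\eta'_F]_{14}$ | .638 | .137 | .154 | -.151 | -.034 | .639 | -.137 | -.072 | -.004 | .906 |
| $\eta'_F$ | -.237 | .736 | .343 | .314 | .168 | .226 | .219 | -.186 | -.001 | .976 |
| $[\eta'_F]_{X_2\text{-NO}_2}$ | -.190 | .250 | .590 | .323 | -.134 | -.051 | .184 | -.581 | .177 | .974 |
| $[\eta'_F]_{X_2\text{-Cl}}$ | .085 | -.048 | .129 | .066 | .069 | -.111 | .066 | .935 | -.029 | .927 |
| $[\eta'_F]_{X_2\text{-Br}}$ | .102 | -.121 | .014 | -.011 | -.056 | .158 | -.932 | -.008 | -.066 | .926 |
| $[\eta'_F]_{X_2\text{-CH}_3}$ | .040 | -.138 | -.893 | .042 | -.151 | -.077 | .130 | -.173 | -.039 | .897 |
| $[\eta'_F]_{X_1\text{-NO}_2}$ | .291 | .137 | -.008 | .114 | .849 | -.001 | .078 | .229 | -.191 | .932 |
| $[\eta'_F]_{X_1\text{-Cl}}$ | .147 | .073 | .071 | .146 | .018 | -.002 | .052 | -.053 | .937 | .938 |
| $[\Sigma\alpha]_R$ | -.430 | -.020 | -.044 | .100 | .042 | -.822 | .163 | .107 | -.012 | .914 |
| $[\Sigma\alpha]_{Sub}$ | -.880 | .160 | .003 | -.100 | -.144 | .370 | .013 | -.050 | -.117 | .984 |
| $[\Sigma\beta'_{ns}]_{C10}$ | -.223 | .851 | -.162 | -.201 | -.171 | -.201 | -.105 | .062 | -.025 | .926 |
| $[\Sigma\alpha]_{Sub}^2$ | -.809 | .216 | .074 | .019 | .005 | .473 | .111 | -.051 | -.069 | .951 |
| $[\Sigma\alpha]_{X_2}$ | -.310 | .129 | .494 | .465 | -.266 | -.038 | -.489 | -.207 | .101 | .937 |
| $[\Sigma\beta'_s]_{Sub}$ | -.240 | -.244 | -.184 | -.171 | -.765 | .078 | -.064 | .098 | -.389 | .936 |
| $[\Sigma\beta'_{ns}]_{Sub}$ | -.466 | .846 | -.075 | -.076 | -.153 | .087 | .015 | .022 | -.077 | .982 |
| % variance | .431 | .205 | .086 | .068 | .041 | .039 | .033 | .025 | .022 | .950 |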


loss of information on selection of relevant molecular descriptors from the set of descriptors, and a significant increase in statistical quality was obtained.

pC = 0.304 (±0.044) f1 � 0.216 (±0.044) f2 � 0.045 (±0.044) f3 � 0.083 (±0.044) f4 � 0.114 (±0.044) f5 � 0.059 (±0.044) f8 � 0.159 (±0.044) f9 + 1.359 (±0.042)   (14)

n = 56, Q² = 0.816, Ra² = 0.879, R² = 0.894, R = 0.946, s = 0.161
F = 57.8 (df 7, 48), AVRES = 0.117, PRESS = 2.146, SDEP = 0.196, SPRESS = 0.211, Presav = 0.143
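Several of the statistics quoted with Eq. 14 are tied together by simple formulas. The following sketch assumes the common textbook definitions, which are not stated in this section: adjusted R² from R², the overall F ratio from R², SDEP = sqrt(PRESS/n) and SPRESS = sqrt(PRESS/(n − k − 1)), where k is the number of predictor terms. The helper `derived_statistics` is a hypothetical name; with these assumptions the reported values for Eq. 14 are reproduced to the printed precision.

```python
import math


def derived_statistics(n, k, r2, press):
    """Recompute summary statistics from n, k, R^2 and PRESS (assumed definitions)."""
    dof = n - k - 1                                  # residual degrees of freedom
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / dof   # explained variance, Ra^2
    f_ratio = (r2 / k) / ((1.0 - r2) / dof)          # variance ratio F(k, dof)
    sdep = math.sqrt(press / n)                      # standard deviation of error of prediction
    spress = math.sqrt(press / dof)                  # PRESS-based standard error
    return {"dof": dof, "Ra2": adjusted_r2, "F": f_ratio, "SDEP": sdep, "SPRESS": spress}


# Values reported for Eq. 14: n = 56 compounds, 7 factor scores.
eq14 = derived_statistics(n=56, k=7, r2=0.894, press=2.146)
for name, value in eq14.items():
    print(f"{name:7s} {value:.3f}")
```

The same function can be applied to the other equations by substituting their n, k, R² and PRESS values.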

Eq. 14 could predict and explain 81.6% and 87.9%, respectively, of the variance of the toxicity. The factor scores in Eq. 14 signify the importance of the variables shown in boldface in Table 3. Compounds 20 and 32 act as outliers for Eq. 14.

Table 4 shows that ten factors could explain 95.8% of the variance of the data matrix composed of the toxicity data

532 © 2004 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim QSAR Comb. Sci. 2004, 23

Table 4. Factor loadings of the variables (non-ETA parameters) after VARIMAX rotation.

Variables   Factor                                                       Commu-
                1     2     3     4     5     6     7     8     9    10  nalities
pC          −.705  .256  .372 −.002 −.002  .382 −.002 −.050 −.121 −.102  .874
Wiener       .953 −.201 −.050 −.091 −.078 −.037 −.023 −.080 −.101  .083  .991
LogZ         .169  .019 −.218  .838 −.014 −.035 −.131  .060 −.103  .048  .814
⁰χ           .959 −.220 −.014 −.029 −.080 −.086 −.053  .034 −.076  .086  .999
¹χ           .976 −.136 −.026 −.017 −.059 −.063 −.004 −.076 −.093  .031  .994
²χ           .953 −.139 −.015  .001 −.093 −.058 −.102 −.082 −.021  .190  .993
³χP          .895  .038  .218  .031  .027 −.060  .106 −.246 −.209  .049  .972
³χC          .633 −.330  .105 −.024 −.140 −.019 −.284  .136  .170  .541  .961
³χCH        −.303  .147  .099 −.047 −.024  .928  .028 −.056  .030  .019  .992
⁰χv          .981  .010  .109  .052  .074 −.074 −.058  .078  .007  .038  .999
¹χv          .955  .121  .187  .127  .062 −.065 −.063  .029  .013 −.056  .994
²χv          .810  .280  .434  .190  .139  .010 −.094 −.018  .052  .044  .993
³χvP         .546  .359  .651  .153  .198 −.023  .089 −.098 −.109 −.159  .969
³χvC         .305  .314  .766  .075  .181  .185  .061  .053 −.013  .312  .956
³χvCH       −.303  .147  .099 −.047 −.024  .928  .028 −.056  .030  .019  .992
¹κ           .927 −.290 −.055 −.047 −.089 −.123 −.062  .126 −.071  .053  .999
²κ           .901 −.291 −.187 −.051 −.092 −.164 −.053  .131 −.077 −.037  .997
³κ           .722 −.400 −.409 −.079 −.141 −.145 −.203  .173  .047  .071  .974
¹κα          .936 −.244  .013 −.032 −.046 −.124 −.094  .185 −.017  .020  .999
²κα          .919 −.233 −.112 −.029 −.042 −.167 −.088  .198 −.014 −.086  .997
³κα          .739 −.358 −.353 −.062 −.105 −.144 −.252  .238  .119  .038  .970
Balaban J    .065 −.535 −.094 −.221 −.038 −.261  .059  .729 −.012 −.070  .957
LogP         .452  .409  .172  .631 −.070 −.080  .364 −.046  .175 −.095  .985
MR           .914  .005  .270  .005 −.006 −.114 −.031  .096  .054 −.062  .939
MolRef       .989  .026 −.014  .081  .001 −.079  .032 −.033 −.046  .054  .999
AlogP98      .767  .382  .169  .400  .087 −.079  .158  .038  .137 −.118  .995
H-bond-acc   .346 −.802 −.093 −.335  .018 −.062 −.246  .115  .147  .080  .989
HOMO        −.058  .147  .140 −.021  .960 −.045  .054 −.027 −.060 −.030  .978
LUMO        −.205  .783  .078 −.108 −.394 −.015  .064  .114  .148 −.057  .871
Dipole       .602 −.434  .060  .005  .367 −.103  .360 −.046 −.134  .088  .856
S-sCH3       .478  .169  .159  .155 −.228  .019 −.259  .672 −.103  .280  .966
S-ssCH2      .054  .361  .425  .531  .082 −.121 −.072  .011  .091 −.553  .937
S-dsCH       .324  .144 −.757  .103 −.000 −.001  .191 −.023 −.113  .176  .790
S-aaCH       .518  .348 −.317 −.039 −.020 −.016  .360 −.515 −.168  .160  .941
S-sssCH     −.187  .202  .099  .033  .072  .064  .880 −.050  .028 −.029  .875
S-dssC      −.456  .291  .378  .599  .014  .015  .318 −.142  .090 −.107  .936
S-aasC       .061  .888  .089  .096  .236  .146  .097 −.150 −.107  .086  .937
S-ssssC     −.234 −.228 −.823  .134 −.088 −.193 −.289 −.085  .144 −.019  .959
S-ddsN      −.212  .902  .180  .020  .212  .144  .000 −.035  .083 −.091  .973
S-dO         .477 −.797 −.062 −.175 −.166 −.092 −.137  .081 −.176  .076  .995
S-ssO        .471 −.095  .095 −.609  .011  .042 −.534  .245 −.063  .032  .963
S-ddssS     −.663  .566  .236  .288  .118  .156  .062 −.076  .014 −.128  .963
S-sCl       −.108  .074 −.037  .002 −.108  .040  .025 −.015  .958  .070  .951
S-sBr       −.140  .076  .073  .000  .974  .009  .023 −.055 −.041 −.027  .985
% variance   .446  .193  .072  .061  .052  .039  .031  .027  .022  .016  .958
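To illustrate what a VARIMAX rotation does to a loading matrix, here is a minimal sketch of Kaiser's pairwise (planar) varimax procedure in plain Python. It is not the SPSS routine used for Table 4: it applies the raw criterion without Kaiser row normalization, and the small two-factor loading matrix `simple` is invented for the demonstration. The names `rotate_pair` and `varimax` are hypothetical.

```python
import math


def rotate_pair(loadings, i, j):
    """Rotate factors i and j in place by Kaiser's varimax angle; return the angle."""
    p = len(loadings)
    a = b = c = d = 0.0
    for row in loadings:
        x, y = row[i], row[j]
        u = x * x - y * y
        v = 2.0 * x * y
        a += u
        b += v
        c += u * u - v * v
        d += 2.0 * u * v
    angle = 0.25 * math.atan2(d - 2.0 * a * b / p, c - (a * a - b * b) / p)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    for row in loadings:
        x, y = row[i], row[j]
        row[i] = x * cos_a + y * sin_a
        row[j] = -x * sin_a + y * cos_a
    return angle


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Sweep over all factor pairs until no rotation angle exceeds tol."""
    rotated = [list(row) for row in loadings]
    n_factors = len(rotated[0])
    for _ in range(max_sweeps):
        largest = 0.0
        for i in range(n_factors - 1):
            for j in range(i + 1, n_factors):
                largest = max(largest, abs(rotate_pair(rotated, i, j)))
        if largest < tol:
            break
    return rotated


# Invented example: a simple structure that has been mixed by a 30 degree rotation.
simple = [[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]]
theta = math.radians(30.0)
mixed = [[x * math.cos(theta) - y * math.sin(theta),
          x * math.sin(theta) + y * math.cos(theta)] for x, y in simple]

rotated = varimax(mixed)
for before, after in zip(mixed, rotated):
    print([round(v, 3) for v in before], "->", [round(abs(v), 3) for v in after])
```

Because the rotation is orthogonal, the communality of each variable (the row sum of squared loadings) is unchanged; only the distribution of loadings across factors becomes easier to interpret.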


and non-ETA descriptors. The toxicity values are highly loaded with factor 1, which is in turn highly loaded in zero, first and second order connectivity, ³χP and kappa shape indices, molar refractivity (MR, MolRef) and AlogP98. Again, the toxicity is moderately loaded with factor 3 (loaded in ³χvC and S-ssssC), factor 6 (loaded in ³χCH and ³χvCH) and factor 2 (loaded in H-bond-acc, LUMO, S-aasC, S-ddsN and S-dO).

Based on the results of the factor analysis, the following equation was obtained:

pC = −0.021 (±0.004) MolRef + 1.240 (±0.675) ³χvCH � 0.179 (±0.088) S-ssssC � 0.257 (±0.194) S-ddsN � 0.022 (±0.025) S-sCl + 2.885 (±0.354)   (15)

n = 56, Q² = 0.763, Ra² = 0.798, R² = 0.816, R = 0.903, s = 0.207
F = 44.3 (df 5, 50), AVRES = 0.139, PRESS = 2.769, SDEP = 0.222, SPRESS = 0.235, Presav = 0.156

The regression coefficient of S-sCl in Eq. 15 is significant at the 90% level. The explained variance (0.798) of Eq. 15 is less than those of Eqs. 9–13. However, the predicted variance (0.763) of Eq. 15 is more than those of Eqs. 9–13. Eq. 15 indicates that the toxicity decreases with increase in molecular bulk and branching, as evidenced by the negative and positive coefficients of MolRef and ³χvCH, respectively. Again, the importance of E-state terms is evident from the appearance of these terms in Eq. 15.

When factor scores were used as predictor variables, a tangible rise in statistical quality was obtained:

pC = −0.325 (±0.047) f1 + 0.118 (±0.047) f2 + 0.171 (±0.047) f3 + 0.176 (±0.047) f6 − 0.056 (±0.047) f9 − 0.047 (±0.047) f10 + 1.359 (±0.046)   (16)

n = 56, Q² = 0.828, Ra² = 0.856, R² = 0.871, R = 0.934, s = 0.175
F = 55.4 (df 6, 49), AVRES = 0.115, PRESS = 2.015, SDEP = 0.190, SPRESS = 0.203, Presav = 0.132
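The coefficients of Eq. 16 can be related directly to the pC row of Table 4. The sketch below rests on an assumption that is not stated in the text: the factor scores are mutually uncorrelated and standardized, so that each regression coefficient equals the loading of pC on that factor multiplied by the standard deviation of pC. Under that assumption the ratio of coefficient magnitude to loading magnitude should be one constant, and the same constant should reproduce Q² from PRESS. All variable names are illustrative.

```python
# Loadings of pC on the factors used in Eq. 16 (pC row of Table 4).
pc_loadings = {1: -0.705, 2: 0.256, 3: 0.372, 6: 0.382, 9: -0.121, 10: -0.102}
# Magnitudes of the corresponding coefficients as printed in Eq. 16.
reported_magnitudes = {1: 0.325, 2: 0.118, 3: 0.171, 6: 0.176, 9: 0.056, 10: 0.047}
n_compounds, press = 56, 2.015

# Each ratio is an estimate of the standard deviation of pC.
ratios = {f: reported_magnitudes[f] / abs(pc_loadings[f]) for f in pc_loadings}
s_y = sum(ratios.values()) / len(ratios)

# Coefficients implied by loading * s_y, including their signs.
implied = {f: pc_loadings[f] * s_y for f in pc_loadings}

# Total sum of squares of pC and the cross-validated Q^2 implied by PRESS.
total_ss = (n_compounds - 1) * s_y ** 2
q2 = 1.0 - press / total_ss

print(f"estimated s(pC) = {s_y:.3f}")
for factor, value in implied.items():
    print(f"f{factor}: implied {value:+.3f}, reported magnitude {reported_magnitudes[factor]:.3f}")
print(f"Q2 implied by PRESS = {q2:.3f}")
```

The near-constant ratio is what allows the signs of the factor-score coefficients to be read off from the signs of the pC loadings.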

The regression coefficient of f10 in Eq. 16 is significant at the 94% level. Eq. 16, based on factor scores of the data matrix of non-ETA variables, is statistically comparable to Eq. 14, based on factor scores of the data matrix of ETA variables. This shows that the matrix composed of ETA descriptors is as rich in chemical information as the matrix composed of different and diverse types of non-ETA descriptors. Compound 31 acts as an outlier for Eq. 16.

When the data matrix composed of both ETA and non-ETA parameters was considered, eleven factors could explain 95.3% of the variance (table not shown for brevity). The toxicity is highly loaded with factor 1 (which in turn has high loading in Wiener, zero, first and second order connectivity, ³χP and kappa indices, MR, MolRef, AlogP98, NV, �, functionalities of atoms 1–9 and 11–13 and [�]Sub). Again, the toxicity is moderately loaded with factors 3 (loaded in S-dsCH, �/ns and functionality of atom 10, [�/ns]C10 and [�/ns]Sub) and 7 (loaded in ³χCH and ³χvCH). Apart from these, the toxicity shows considerable loadings with factors 2 (loaded in H-bond-acc, LUMO, S-aasC, [�/F]X2-NO2 and [�/F]X2-CH3), 8 (loaded in Balaban J and S-sCH3) and 11 (loaded in [�/F]X1-Cl). Based on the results of factor analysis, the following best equation was obtained:

pC = −0.151 (±0.026) ⁰χv � 1.202 (±0.546) ³χvCH � 10.872 (±2.963) [�/F]10 � 4.532 (±2.382) [�/F]X1-Cl � 0.296 (±0.297) [�/F]X2-NO2 + 4.000 (±0.444)   (17)

n = 56, Q² = 0.779, Ra² = 0.861, R² = 0.874, R = 0.935, s = 0.172
F = 69.4 (df 5, 50), AVRES = 0.127, PRESS = 2.589, SDEP = 0.215, SPRESS = 0.228, Presav = 0.152

The regression coefficient of [�/F]X2-NO2 in Eq. 17 is significant at the 94% level. Eq. 17 is statistically better than both Eqs. 13 and 15. Using factor scores as the predictor variables, the following equation with excellent statistical qualities (84.8% predicted variance, 89.3% explained variance) was obtained:

pC = −0.301 (±0.041) f1 � 0.108 (±0.041) f2 � 0.189 (±0.041) f3 � 0.188 (±0.041) f7 � 0.062 (±0.041) f8 � 0.122 (±0.041) f11 + 1.359 (±0.040)   (18)

n = 56, Q² = 0.848, Ra² = 0.893, R² = 0.905, R = 0.951, s = 0.151
F = 77.5 (df 6, 49), AVRES = 0.113, PRESS = 1.775, SDEP = 0.178, SPRESS = 0.190, Presav = 0.134

Eq. 18 shows that the combined matrix of the ETA and non-ETA descriptors is richer in chemical information than the ETA matrix, but the difference is only marginal: there has been a gain of only 3% predicted variance and 1.5% explained variance compared to Eq. 14. The factor scores in Eq. 18 signify the importance of the variables on which the factors are highly loaded. Compounds 20 and 32 act as outliers for Eq. 18.

The intercorrelations (r) among predictor variables of different equations are given in Table 5. The calculated and predicted toxicity values according to Eqs. 14, 16 and 18 are given in Table 1.

The derived relations (Eqs. 14, 16 and 18) from principal component regression analysis are of excellent statistical quality (Q² values of 0.816, 0.828 and 0.848 and R² values of 0.894, 0.871 and 0.905 for factor scores derived from the ETA, non-ETA and combined matrices, respectively), which is comparable to that (Q² and R² values of 0.790 and 0.920, respectively) of the relation generated from comparative molecular field analysis on the same data set [10].


4 Conclusion

This study suggests that ETA parameters are sufficiently rich in chemical information to encode the structural features contributing significantly to the acute toxicity of phenylsulfonyl carboxylates to Vibrio fischeri. This indicates that ETA indices merit further assessment to explore their potential in QSAR/QSPR/QSTR modeling.

Acknowledgement

A financial grant from J.U. Research Fund is thankfully acknowledged. One of the authors (KR) thanks the All India Council for Technical Education, New Delhi (Govt. of India) for financial assistance under the Career Award for Young Teachers (CAYT) scheme.

References

[1] C. L. Russom, E. B. Anderson, B. E. Greenwood, A. Pilli, Sci. Total Environ. 1991, 109–110, 667–670.
[2] E. M. Hulzebos, R. Posthumus, SAR QSAR Environ. Res. 2003, 14, 285–316.
[3] M. T. Cronin, J. S. Jaworska, J. D. Walker, M. H. Comber, C. D. Watts, A. P. Worth, Environ. Health Perspect. 2003, 111, 1391–1401.
[4] L. Carlsen, J. D. Walker, QSAR Comb. Sci. 2003, 22, 49–57.
[5] D. Mackay, E. Webster, SAR QSAR Environ. Res. 2003, 14, 7–16.
[6] M. Zeeman, C. M. Auer, R. G. Clements, J. V. Nabholz, R. S. Boethling, SAR QSAR Environ. Res. 1995, 3, 179–201.
[7] M. H. I. Comber, J. D. Walker, C. Watts, J. Hermens, Environ. Toxicol. Chem. 2003, 22, 1822–1828.
[8] S. Ren, Environ. Toxicol. 2002, 17, 119–127.
[9] S. Cui, X. Wang, S. Liu, L. Wang, SAR QSAR Environ. Res. 2003, 14, 223–231.
[10] X. Liu, Z. Yang, L. Wang, SAR QSAR Environ. Res. 2003, 14, 183–190.
[11] X. Liu, B. Wang, Z. Huang, S. Han, L. Wang, Chemosphere 2003, 50, 403–408.
[12] X. Yuan, G. Lu, J. Zhao, J. Environ. Sci. Health Part A. Tox. Hazard Subst. Environ. Eng. 2002, 37, 573–578.
[13] T. I. Netzeva, J. C. Dearden, R. Edwards, A. D. Worgan, M. T. Cronin, J. Chem. Inf. Comput. Sci. 2004, 44, 258–265.
[14] P. Mazzatorta, E. Benfenati, P. Lorenzini, M. Vighi, J. Chem. Inf. Comput. Sci. 2004, 44, 105–112.
[15] K. Roy, G. Ghosh, Internet Electron. J. Mol. Des. 2003, 2, 599–620, http://www.biochempress.com.
[16] K. Roy, G. Ghosh, J. Chem. Inf. Comput. Sci. 2004, 44, 559–567.
[17] K. Roy, G. Ghosh, QSAR Comb. Sci. 2004, 23, 99–108.
[18] D. K. Pal, C. Sengupta, A. U. De, Indian J. Chem. 1988, 27B, 734–739.
[19] D. K. Pal, C. Sengupta, A. U. De, Indian J. Chem. 1989, 28B, 261–267.
[20] D. K. Pal, M. Sengupta, C. Sengupta, A. U. De, Indian J. Chem. 1990, 29B, 451–454.
[21] D. K. Pal, S. K. Purkayastha, C. Sengupta, A. U. De, Indian J. Chem. 1992, 31B, 109–114.
[22] K. Roy, D. K. Pal, A. U. De, C. Sengupta, Indian J. Chem. 1999, 38B, 664–671.
[23] K. Roy, D. K. Pal, A. U. De, C. Sengupta, Indian J. Chem. 2001, 40B, 129–135.
[24] K. Roy, A. Saha, J. Mol. Model. 2003, 9, 259–270.
[25] K. Roy, A. Saha, Internet Electron. J. Mol. Des. 2003, 2, 288–305, http://www.biochempress.com.
[26] K. Roy, A. Saha, Internet Electron. J. Mol. Des. 2003, 2, 475–491, http://www.biochempress.com.
[27] K. Roy, S. Chakroborty, C. C. Ghosh, A. Saha, J. Indian Chem. Soc. 2004, 81, 115–125.
[28] K. Roy, A. Saha, Indian J. Chem. 2004, 43A, 1369–1376.
[29] R. Franke, Theoretical Aspects of Rational Drug Design. Elsevier, Amsterdam, 1984, pp. 184–193.
[30] P. J. Lewi, Multivariate data analysis in structure-activity relationships; in: E. J. Ariens, (Ed.), Drug Design, vol. 10, Academic Press, New York 1980, pp. 307–342.
[31] R. Franke, A. Gruska, Principal component and factor analysis; in: H. van de Waterbeemd, (Ed.), Chemometric Methods in Molecular Design, vol. 2, VCH, Weinheim 1995, pp. 113–163.
[32] SPSS is statistical software of SPSS Inc., USA.


Table 5. Intercorrelation (r) matrix.

Variable            1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
 1 [�]p/�       1.000 0.253 0.335 0.477 0.432 0.091 0.310 0.319 0.348 0.020 0.187 0.369 0.250 0.226 0.195
 2 [�/F]13            1.000 0.128 0.233 0.363 0.380 0.921 0.897 0.930 0.335 0.302 0.185 0.136 0.957 0.323
 3 [�/F]X1-Cl               1.000 0.444 0.093 0.229 0.145 0.142 0.160 0.090 0.268 0.194 0.364 0.149 0.256
 4 [�/s]Sub                       1.000 0.087 0.556 0.209 0.191 0.212 0.443 0.583 0.602 0.197 0.228 0.231
 5 [�/ns]Sub                            1.000 0.531 0.618 0.623 0.628 0.225 0.294 0.189 0.122 0.480 0.225
 6 [�/F]10                                    1.000 0.156 0.142 0.152 0.165 0.845 0.374 0.109 0.281 0.136
 7 �                                                1.000 0.996 0.993 0.363 0.221 0.233 0.137 0.980 0.317
 8 [�]2                                                   1.000 0.986 0.351 0.229 0.251 0.149 0.968 0.323
 9 MolRef                                                       1.000 0.373 0.214 0.207 0.150 0.978 0.301
10 ³χvCH                                                              1.000 0.215 0.339 0.098 0.363 0.245
11 S-ssssC                                                                  1.000 0.349 0.145 0.297 0.129
12 S-ddsN                                                                         1.000 0.169 0.179 0.770
13 S-sCl                                                                                1.000 0.115 0.301
14 ⁰χv                                                                                        1.000 0.272
15 [�/F]X2-NO2                                                                                      1.000
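An intercorrelation matrix of this kind is typically scanned for pairs of descriptors that carry nearly the same information. The sketch below lists the pairs in Table 5 with r of at least 0.9; the threshold is an arbitrary illustrative choice, not one taken from the paper, and the variables are simply numbered 1 to 15 in the order in which they appear in the table. The function name `highly_correlated_pairs` is hypothetical.

```python
# Upper triangle of Table 5, one list per variable, starting at the diagonal.
upper_triangle = [
    [1.000, 0.253, 0.335, 0.477, 0.432, 0.091, 0.310, 0.319, 0.348, 0.020, 0.187, 0.369, 0.250, 0.226, 0.195],
    [1.000, 0.128, 0.233, 0.363, 0.380, 0.921, 0.897, 0.930, 0.335, 0.302, 0.185, 0.136, 0.957, 0.323],
    [1.000, 0.444, 0.093, 0.229, 0.145, 0.142, 0.160, 0.090, 0.268, 0.194, 0.364, 0.149, 0.256],
    [1.000, 0.087, 0.556, 0.209, 0.191, 0.212, 0.443, 0.583, 0.602, 0.197, 0.228, 0.231],
    [1.000, 0.531, 0.618, 0.623, 0.628, 0.225, 0.294, 0.189, 0.122, 0.480, 0.225],
    [1.000, 0.156, 0.142, 0.152, 0.165, 0.845, 0.374, 0.109, 0.281, 0.136],
    [1.000, 0.996, 0.993, 0.363, 0.221, 0.233, 0.137, 0.980, 0.317],
    [1.000, 0.986, 0.351, 0.229, 0.251, 0.149, 0.968, 0.323],
    [1.000, 0.373, 0.214, 0.207, 0.150, 0.978, 0.301],
    [1.000, 0.215, 0.339, 0.098, 0.363, 0.245],
    [1.000, 0.349, 0.145, 0.297, 0.129],
    [1.000, 0.169, 0.179, 0.770],
    [1.000, 0.115, 0.301],
    [1.000, 0.272],
    [1.000],
]


def highly_correlated_pairs(triangle, threshold=0.9):
    """Return (row, column, r) for off-diagonal entries with r >= threshold (1-based)."""
    pairs = []
    for i, row in enumerate(triangle):
        for offset, r in enumerate(row[1:], start=1):
            if r >= threshold:
                pairs.append((i + 1, i + 1 + offset, r))
    return pairs


pairs = highly_correlated_pairs(upper_triangle)
for first, second, r in pairs:
    print(f"variables {first} and {second}: r = {r:.3f}")
```

Descriptors flagged this way would normally not be used together in one regression equation.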


[33] The GW-BASIC programs RRR98, KRETA1, KRETA2, KRPRES1 and KRPRES2 were developed by Kunal Roy and standardized using known data sets.
[34] Cerius2 version 4.6 is a product of Accelrys Inc., San Diego, CA.
[35] G. W. Snedecor, W. G. Cochran, Statistical Methods, Oxford and IBH Publishing Co. Pvt. Ltd., New Delhi 1967, pp. 381–418.
[36] S. Wold, L. Eriksson, Statistical validation of QSAR results, in: H. van de Waterbeemd, (Ed.), Chemometric Methods in Molecular Design, VCH, Weinheim 1995, pp. 312–317.
[37] A. K. Debnath, Quantitative structure-activity relationship (QSAR): A versatile tool in drug design, in: A. K. Ghose, V. N. Viswanadhan, (Eds.), Combinatorial Library Design and Evaluation, Marcel Dekker, Inc., New York 2001, pp. 73–129.

Received on June 15, 2004; Accepted on July 12, 2004
