
Exploring 2D and 3D QSARs of 2,4-Diphenyl-1,3-oxazolines for Ovicidal Activity Against Tetranychus urticae

Kunal Roy* and Somnath Paul

Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata-700032, Fax: +91-33-2837-1078, E-mail: [email protected], URL: http://www.geocities.com/kunalroy_in

Keywords: GFA, G/PLS, K-means cluster, QSAR, Two-spotted spider mites

Received: December 3, 2008; Accepted: December 5, 2008

DOI: 10.1002/qsar.200810130

Abstract

Quantitative Structure-Activity Relationship (QSAR) studies have been carried out for the ovicidal activity of 2,4-diphenyl-1,3-oxazoline analogues against the two-spotted spider mite Tetranychus urticae, which causes serious damage to agricultural products. The studies have been performed with 2D (physicochemical, structural, and topological) and 3D (shape, spatial, electronic, and molecular field) descriptors. The chemometric tools used for the analyses are Genetic Function Approximation (GFA) and Genetic Partial Least Squares (G/PLS). The whole dataset (n = 90) was divided into a training set (75% of the dataset) and a test set (the remaining 25%) by K-means clustering of the standardized topological and structural descriptor matrix. Models developed from the training set were used to predict the activity of the test set compounds. All the models have been validated internally, externally, and by the Y-randomization technique. However, different models emerged as the best ones according to different validation criteria. We have therefore tried a consensus model, which is based on the results obtained by all predictive models and may provide the most stable solution. Models obtained using 2D parameters revealed that the chain length of the substituent at the para position of the 4-phenyl ring is a critical factor. Lipophilicity of the molecule also plays a dominant role in the ovicidal activity. Models generated from 3D descriptors suggest that the shape of the substituents should be optimum and that lipophilic substituents having electronegative atoms with positive charge distributed over a surface may enhance the ovicidal activity. The model obtained from Molecular Field Analysis (MFA) suggests that bulky substituents with optimally distributed charge may increase the ovicidal activity.

Full Papers

QSAR Comb. Sci. 28, 2009, No. 4, 406-425. © 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Supporting information for this article is available on the WWW under www.qcs.wiley-vch.de

1 Introduction

Agriculture has played a crucial role in the development of human civilization. However, agricultural productivity may be greatly diminished by different types of weeds, fungi, pests, insects, etc., amongst which insects have played a key role. There should be a control measure to protect agricultural products from insects, either chemically or biologically (genetically). Genetic control is very complicated and expensive, making chemical control the first choice. Insecticides are broadly classified into three classes: (1) neuroreactive compounds; (2) respiratory inhibitors; and (3) insect growth inhibitors. Neuroreactive compounds are classified into four major classes [1]: (a) organochlorides (potent alkylating agents); (b) organophosphates (anticholinesterases); (c) pyrethroids (which open the sodium channels of the neuronal membrane of insects); and (d) neonicotinoids (neurotoxic). The use of insecticides is believed to be one of the major factors behind the increase in agricultural productivity in the 20th century. Nearly all insecticides have the potential to significantly alter ecosystems: many are toxic to humans and others are concentrated in the food chain. It is necessary to balance agricultural needs with environmental and health issues when using insecticides [2]. Some phytophagous mites take the leading role in severe damage to agricultural products. One of the most important mites is Tetranychus urticae Koch (Acari: Tetranychidae), which has a detrimental effect on cultivated products including vegetables (peppers, tomatoes, potatoes, beans), crops (corn), and flowers (roses). The red, two-spotted spider mites lay eggs on the leaves and threaten host plants by sucking cell contents from the leaves cell by cell, leaving very tiny, pale spots or scars where the green epidermal cells have been destroyed. These mites produce lesions on the green epidermal cells and thus can significantly reduce the photosynthetic capability of plants, greatly reducing their production of nutrients, followed by their death [3]. A number of acaricides with various modes of action have been developed to control these mites. Among them are acaricides that inhibit the growth and molting of mites; flucycloxuron [4], flufenoxuron [5], and hexythiazox [6] are a few examples of this class. These compounds are of interest because they are effective against strains resistant to respiratory inhibitors and neurotoxic compounds, and are thought to be safe for mammals [7].

Computer-aided chemical design has been generally accepted and extensively applied in modern drug discovery, ecotoxicological modeling, and the design of agrochemicals for its high efficiency in the design of new compounds and the optimization of lead compounds, thus saving both time and cost in large-scale experimental synthesis and biological tests [8]. Quantitative Structure-Activity Relationship (QSAR) helps us to understand the Structure-Activity Relationship (SAR) in a quantitative manner. It is one of the most important applications of chemometrics, giving information useful for the design of new compounds acting on a specific target. QSAR attempts to find a consistent relationship between biological activity or toxicity and molecular properties. Thus, QSAR models can be used to predict the activity of new compounds. QSAR models have been reported by different groups of researchers for agrochemicals such as herbicides, fungicides, and insecticides. Zhu et al. [9] have developed QSAR models for pyrazolo-[5,1-d][1,2,3,5]-tetrazin-4(3H)-one derivatives (herbicides) using physicochemical parameters in their recent study. Wang et al. [10] have carried out CoMFA and CoMSIA studies on a new family of sulfonylurea herbicides. Zou et al. [11] have performed QSAR studies on a series of novel 5-[1-aryl-1,4-dihydro-6-methylpyridazin-4-one-3-yl]-2-arylamino-1,3,4-thiadiazoles (fungicides) using hydrophobicity and electronic parameters. Grodnitzky and Coats [12] have developed QSAR models to predict the insect toxicity of monoterpenoids.

Recently, Suzuki et al. [7] have developed classical QSAR models of 2-(2',6'-difluorophenyl)-4-phenyl-1,3-oxazoline derivatives for the ovicidal activity against T. urticae using hydrophobic (π and log P), electronic (σ), and steric (Es and STERIMOL) parameters and some indicator variables (Io, Im, Ip). In the present paper, we have performed further QSAR studies on the dataset reported by Suzuki et al. [7] with 2D (physicochemical, structural, and topological) and 3D (shape, spatial, electronic, and molecular field) descriptors to gain deeper insight into the SARs.

2 Experimental Section

The ovicidal activity data (LC50) of 90 2-(2',6'-difluorophenyl)-4-phenyl-1,3-oxazoline derivatives [7] were converted to reciprocal logarithmic values [pLC50 = −log LC50 (M)], which have been used for the QSAR analysis. There is only one region of structural variation in the compounds, which is the R position of the phenyl ring. Thus, the present QSAR study explores the impact of substitutional variation at the 4-phenyl ring of the 1,3-oxazoline nucleus on the ovicidal activity of the compounds. The structures of the compounds are shown in Scheme 1 and the associated ovicidal activities are listed in Table 1. The range of the ovicidal activity values is quite wide (6.1 log units).

Table 1. Ovicidal activity (observed and calculated) of 90 2-(2',6'-difluorophenyl)-4-phenyl-1,3-oxazoline derivatives (Scheme 1).

Compound no.    Observed [7]    Calculated
                                Eq. (M4)    Eq. (M6)    Eq. (M7)
1               4.71            5.09        4.11        5.26
2               3.74            4.70        4.13        5.27
3a              4.76            4.43        4.85        5.21
4               3.76            4.54        3.63        4.13
5               3.78            4.40        4.06        4.28
6a              4.74            5.04        4.92        4.66
7               5.77            4.82        5.28        4.86
8               3.74            3.58        4.11        5.25
9               3.76            3.29        5.99        3.29
10              4.76            4.86        4.99        4.47
11              4.78            3.34        6.03        4.67
12              4.74            4.19        4.48        4.59
13a             4.77            3.80        5.26        5.07
14              5.74            6.53        5.00        5.22
15              7.76            7.14        5.93        7.20
16              7.78            7.84        7.66        8.04
17a             8.8             8.64        9.13        8.04
18              8.8             8.64        8.67        8.04
19a             8.8             8.46        7.77        7.92
20              8.84            8.38        9.01        8.04
21              8.87            8.13        8.58        8.04
22              8.9             7.89        9.11        8.04
23              8.93            7.64        8.95        8.04
24              7.97            7.28        8.20        8.04
25              3.74            4.63        4.25        5.97
26              4.76            5.64        5.48        6.28
27              7.78            6.24        6.27        7.94
28              7.8             6.85        6.79        6.61
29              8.82            8.11        7.81        6.37
30              8.89            8.47        8.86        7.94
31a             8.92            8.23        8.01        6.38
32a             7.96            7.86        7.26        6.37
33a             6.97            7.74        7.03        6.37
34a             7.84            8.28        9.24        7.34
35              8.85            8.11        6.55        7.34
36              5.79            6.44        6.48        6.22
37              5.82            7.66        7.32        6.36
38              6.92            7.96        7.47        6.37
39              3.81            2.50        4.05        6.15
40              2.83            2.64        2.29        3.15
41              5.74            7.18        6.18        6.47
42              7.77            7.36        7.17        6.27
43              7.83            7.32        7.65        6.42

Table 1. (cont.)

Compound no.    Observed [7]    Calculated
                                Eq. (M4)    Eq. (M6)    Eq. (M7)
44              6.82            7.44        7.86        6.36
45              3.78            5.46        5.47        6.45
46              8.82            8.41        7.89        8.04
47a             3.76            6.21        4.57        5.19
48a             8.89            7.76        7.46        8.04
49              5.79            6.98        6.09        6.23
50              7.84            7.94        7.61        7.95
51a             6.9             7.61        8.31        7.88
52              7.92            7.49        8.01        7.90
53              6.93            7.37        7.84        7.93
54              5.79            5.37        5.50        5.87
55              5.81            6.34        6.59        5.41
56              6.84            7.19        7.45        7.95
57              7.86            7.81        7.42        7.96
58              8.87            7.86        8.21        7.52
59              7.81            6.54        5.69        5.72
60a             5.83            6.71        6.27        5.42
61              5.88            6.68        6.17        5.77
62              8.86            7.06        8.08        7.95
63              7.87            7.68        7.60        7.96
64              7.89            7.73        7.79        7.46
65              6.89            7.56        7.40        7.95
66a             8.84            6.29        7.49        5.73
67              5.92            7.31        6.86        7.95
68              5.79            7.10        6.72        6.51
69a             8.86            8.35        8.26        8.04
70              8.88            8.23        8.14        8.04
71              8.89            8.10        7.75        8.04
72              7.92            7.86        7.67        8.04
73              6.95            7.61        7.33        8.04
74              6.77            7.16        6.84        6.51
75              8.79            7.33        7.40        6.49
76a             7.81            6.87        7.04        8.04
77              8.84            8.38        7.98        8.04
78              8.88            8.13        7.47        8.04
79              8.91            7.88        6.89        8.04
80              5.94            7.63        6.54        8.04
81              5.97            7.39        6.36        8.04
82a             5.79            6.93        6.44        6.50
83a             6.82            7.10        6.43        6.47
84              4.76            4.99        4.71        6.44
85              5.86            7.40        8.32        5.81
86a             5.77            6.20        6.38        5.41
87              6.79            6.37        6.95        5.03
88              5.88            7.00        7.74        8.04
89              5.79            5.80        6.07        5.63
90              5.82            5.97        6.90        6.11

a Test set members.

Scheme 1. Structures of the compounds.

2.1 Descriptors

We have performed QSAR studies on the dataset reported by Suzuki et al. [7] with 2D (physicochemical, structural, and topological) and 3D (shape, spatial, electronic, and molecular field) descriptors. The categorical list [13] of descriptors used in the development of the QSAR models is reported in Table 2.

2.2 Cluster Analysis

The main aim of any QSAR modeling is that the developed model should be robust enough to make accurate and reliable predictions of the biological activities of new compounds. The models were cross-validated using the Leave-One-Out (LOO) method. However, internal validation does not ascertain that the model will perform well on a new set of data. In most cases, an appropriate external dataset is not available for prediction purposes. Thus, the whole dataset was divided into a training set and a test set (external evaluation set). The models developed from the training set were externally validated using the test set. The predictive capacity of a model for new chemical entities is influenced by the chemical nature of the training set molecules used for development of the model [14-16]. In practice, the test set molecules will be predicted well when these molecules are structurally very similar to the training set molecules. The reason is that the model has captured all features common to the training set molecules. There are different techniques available for division of the dataset into training and test sets, such as statistical molecular design, self-organizing maps, clustering, Kennard-Stone selection, sphere exclusion, etc. [17]. In the present case we have used clustering as the method for training set selection.

Cluster analysis [18] is a technique for arranging objects into groups. There are two types of clustering: (i) hierarchical clustering and (ii) non-hierarchical clustering. Hierarchical clustering techniques proceed by either a series of successive mergers or a series of successive divisions. Agglomerative hierarchical methods start with the individual objects. Thus, there are initially as many clusters as objects. The most similar objects are grouped first, and these initial groups are merged according to their similarities. Eventually, as the similarity decreases, all subgroups are fused into a single cluster. Divisive hierarchical methods work in the opposite direction. An initial single group of objects is divided into two subgroups such that the objects in one subgroup are far from the objects in the other. These subgroups are then further divided into dissimilar subgroups; the process continues until there are as many subgroups as objects, that is, until each object forms a group.

Non-hierarchical clustering techniques are designed to group items, rather than variables, into a collection of K clusters. The number of clusters, K, may either be specified in advance or determined as part of the clustering procedure. One of the important non-hierarchical techniques is K-means clustering [19], which has been used in the present study. This method assigns each item to the cluster having the nearest centroid (mean). In its simplest version, the process is composed of three steps [20]:

i. Partitioning the items into K initial clusters.

ii. Assigning an item to the cluster whose centroid (mean) is nearest (distance is usually computed as the Euclidean distance between standardized observations), followed by recalculation of the centroid for the cluster receiving the new item and for the cluster losing the item.

iii. Repeating step ii until no more reassignments take place.
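The three steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in the study: the round-robin initial partition, the rule that a cluster is never emptied, and the function names `centroid` and `kmeans` are assumptions introduced here, and the six two-dimensional points are made-up standardized values.

```python
import math

def centroid(points):
    """Mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def members(items, labels, c):
    """All items currently assigned to cluster c."""
    return [x for x, lab in zip(items, labels) if lab == c]

def kmeans(items, k):
    """Cluster `items` (tuples of standardized values) into k groups."""
    # Step i: arbitrary initial partition (round-robin is an assumption).
    labels = [i % k for i in range(len(items))]
    cents = [centroid(members(items, labels, c)) for c in range(k)]
    moved = True
    while moved:                                  # Step iii: repeat until stable
        moved = False
        for i, x in enumerate(items):
            old = labels[i]
            # Step ii: nearest centroid by Euclidean distance.
            new = min(range(k), key=lambda c: math.dist(x, cents[c]))
            strictly_closer = math.dist(x, cents[new]) < math.dist(x, cents[old])
            if new == old or not strictly_closer or labels.count(old) == 1:
                continue                          # no move (and never empty a cluster)
            labels[i] = new
            # Recompute only the centroids of the losing and receiving clusters.
            for c in (old, new):
                cents[c] = centroid(members(items, labels, c))
            moved = True
    return labels, cents

# Hypothetical standardized descriptor vectors forming two obvious groups.
items = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
labels, cents = kmeans(items, 2)
print(labels)
```

Each move sends an item to a strictly closer centroid, so the within-cluster sum of squares decreases at every reassignment and the loop terminates.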

In this paper, the total dataset (n = 90) was divided into a training set (n = 71) and a test (external evaluation) set (n = 19) (75 and 25%, respectively, of the total number of compounds) based on clusters obtained from K-means clustering applied to the standardized topological and structural descriptor matrix. All the parameters were standardized to values between 0 and 1, and the whole dataset was clustered into four subgroups, from each of which 25% of the compounds were selected as members of the test set. The serial numbers of the compounds in the different clusters are shown in Table 3.
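As a rough illustration of the two operations described in this paragraph, the sketch below rescales a descriptor column to the range 0 to 1 and then draws about 25% of each cluster as test compounds. How the test members were picked within a cluster is not stated in the text, so the seeded random draw, the rounding rule, and the names `min_max_scale` and `split_by_cluster` are assumptions; the cluster membership shown is hypothetical and is not taken from Table 3.

```python
import random

def min_max_scale(column):
    """Rescale a list of numbers linearly onto the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:                       # constant descriptor: map everything to 0.0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def split_by_cluster(clusters, fraction=0.25, seed=0):
    """Draw `fraction` of every cluster as test compounds (random draw assumed)."""
    rng = random.Random(seed)
    test = []
    for cluster_members in clusters.values():
        n_test = max(1, round(fraction * len(cluster_members)))
        test.extend(rng.sample(sorted(cluster_members), n_test))
    test = sorted(test)
    train = sorted(m for ms in clusters.values() for m in ms if m not in test)
    return train, test

# Hypothetical cluster membership (compound serial numbers), for illustration only.
clusters = {1: [1, 2, 3, 4], 2: [5, 6, 7, 8, 9, 10, 11, 12]}
train, test = split_by_cluster(clusters)
print(min_max_scale([2.0, 4.0, 6.0]))   # [0.0, 0.5, 1.0]
print(len(train), len(test))            # 9 3
```

Sampling inside each cluster, rather than from the pooled dataset, keeps every region of descriptor space represented in both the training and the test set.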

2.3 Molecular Shape Analysis (MSA)

MSA was used as a 3D QSAR technique. In our study, the steps to perform MSA were [21]:

1) Conformational analysis. The first operation in MSA is the conformational analysis of the analogues. The conformers were generated with the “optimal search method” option, followed by energy minimization.

2) Hypothesizing an active conformer. The aim of this step is to select a conformer that is present in the rate-limiting step for the activity in a biological reaction. The final QSAR equation was obtained with the minimum-energy conformer of the most active compound (23) as the active conformer.

3) Selection of a candidate shape reference compound. The shape reference compound is the molecule that is used when shape descriptors are calculated. MSA compares all other molecules to the shape reference compound (the global minimum of compound 23) and provides information about each comparison.

4) Performing pair-wise molecular superposition. Each study molecule was aligned to the shape reference compound using the Maximum Common Subgraph (MCSG) method to calculate the shape descriptors.

5) Measurement of molecular shape commonality. After alignment, various shape descriptors, based on relative shape similarity with the shape reference compound, were calculated for each study molecule.

6) Other molecular descriptors. In addition to the shape descriptors, other molecular features were determined by calculating spatial, physicochemical, and electronic parameters.

7) Construction of QSAR. QSAR equations were generated using Genetic Function Approximation (GFA) with linear and spline options as the statistical tool.

Table 2. Categorical list of descriptors used in the development of QSAR models.

Category of descriptors: Name of the descriptors

Topological: Balaban (Jx), Kappa shape index (1κ, 2κ, 3κ, 1κam, 2κam, 3κam), Flexibility (φ), Subgraph count (SC_0, SC_1, SC_2, SC-3_P, SC-3_C), Connectivity index (0χ, 1χ, 2χ, 3χp, 3χc, 0χv, 1χv, 2χv, 3χvp, 3χvc), Wiener, Zagreb, Electro-topological state fragment type (S_sCH3, S_ssCH2, S_aaCH, S_sssCH, S_dssC, S_aasC, S_ssssC, S_dsN, S_sssN, S_sOH, S_ddssS, S_dO, S_ssO, S_ssS, S_dssS, S_sF, S_sCl, S_sBr), Atom based electro-topological state index.

Structural: MW, Rotlbonds, Hbond acceptor, Hbond donor.

Physicochemical: Alog P, Alog P98, MR, Molref, π.

Shape: DIFFV, COSV, Fo, NCOSV, ShapeRMS.

Electronic: Apol, Dipole-mag, HOMO, LUMO, Sr.

Spatial: RadOfGyration, Jurs_SASA, Jurs_PPSA_1, Jurs_PNSA_1, Jurs_DPSA_1, Jurs_PPSA_2, Jurs_PNSA_2, Jurs_DPSA_2, Jurs_PPSA_3, Jurs_PNSA_3, Jurs_DPSA_3, Jurs_FPSA_1, Jurs_FNSA_1, Jurs_FPSA_2, Jurs_FNSA_2, Jurs_FPSA_3, Jurs_FNSA_3, Jurs_WPSA_1, Jurs_WNSA_1, Jurs_WPSA_2, Jurs_WNSA_2, Jurs_WPSA_3, Jurs_WNSA_3, Jurs_RPCG, Jurs_RNCG, Jurs_RPCS, Jurs_RNCS, Jurs_TPSA, Jurs_TASA, Jurs_RPSA, Jurs_RASA, Shadow_XY, Shadow_XZ, Shadow_YZ, Shadow_XYfrac, Shadow_XZfrac, Shadow_YZfrac, Shadow_nu, Shadow_Xlength, Shadow_Ylength, Shadow_Zlength, Area, Vm, Density, PMI_mag.

Molecular field (probes used): H+, CH3, CH3+, CH3−.

Table 3. Serial numbers of compounds under different clusters.

Cluster number    Serial number of compounds
1                 1, 2, 3, 8, 9, 14, 15, 16, 17, 18, 36, 37, 39, 43, 45, 47, 84.
2                 4, 5, 6, 7, 10, 11, 12, 13, 25, 26, 27, 41, 42, 44, 49, 54, 55, 59, 60, 68, 74, 75, 76, 82, 83, 86, 87, 89, 90.
3                 21, 22, 23, 24, 30, 31, 32, 33, 38, 48, 51, 52, 53, 64, 67, 71, 72, 73, 79, 80, 81.
4                 19, 20, 28, 29, 34, 35, 40, 46, 50, 56, 57, 58, 61, 62, 63, 65, 66, 69, 70, 77, 78, 85, 88.

2.4 GFA-Multiple Linear Regression

Genetic algorithms are derived from an analogy with the evolution of DNA [22]. The GFA algorithm was originally inspired by (i) Holland's genetic algorithm and (ii) Friedman's multivariate adaptive regression splines (MARS) algorithm. In this algorithm an individual or model is represented as a 1D string of bits. A distinctive feature of GFA is that it produces a population of models (e.g., 100), instead of generating a single model, as do most other statistical methods. Genetic algorithms can produce models superior to those developed using stepwise regression techniques because the basis functions are selected genetically. Descriptors selected by this algorithm were subjected to multiple linear regression for generation of models. A “fitness function”, the Lack of Fit (LOF) score, is used to estimate the quality of an individual or model, so that the best individual or model receives the best fitness score. The error measurement term LOF is determined by the following equation:

$$\mathrm{LOF} = \frac{\mathrm{LSE}}{\left(1 - \dfrac{c + d\,p}{M}\right)^{2}} \tag{1}$$

In Eq. (1), c is the number of basis functions (other than the constant term); d is the smoothing parameter (adjustable by the user); M is the number of samples in the training set; LSE is the Least Squares Error; and p is the total number of features contained in all basis functions.
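To make the role of the penalty term concrete, Eq. (1) can be written as a small function. This is a minimal sketch: the function name `lack_of_fit` and the numerical inputs are illustrative assumptions, with M = 71 matching the training set size and d = 1.0 matching the smoothness setting reported in this section.

```python
def lack_of_fit(lse, c, d, p, m):
    """LOF = LSE / (1 - (c + d*p)/M)**2, following Eq. (1)."""
    penalty = 1.0 - (c + d * p) / m
    if penalty <= 0:
        raise ValueError("model is too complex for the number of samples")
    return lse / penalty ** 2

# Hypothetical models with the same least squares error but different sizes:
# the larger model receives the worse (higher) LOF score.
small = lack_of_fit(lse=0.50, c=4, d=1.0, p=4, m=71)
large = lack_of_fit(lse=0.50, c=8, d=1.0, p=8, m=71)
print(small < large)   # True
```

Because the denominator shrinks as c and p grow, adding terms lowers LOF only when the gain in LSE outweighs the penalty, which is what discourages over-long equations.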

Once the models in the population have been rated using the LOF score, the genetic crossover operation is repeatedly performed. Initially, two good models are probabilistically selected as parents; each parent is randomly cut into two pieces, and a new model (child) is generated using a piece from each parent. After many mating steps, i.e., genetic crossover operations, the average fitness of the individuals (models) in the population increases as good combinations of genes are discovered and spread through the population. GFA can build not only linear models but also higher-order polynomials, splines, and Gaussians. In the present work, linear and spline terms have been used. For the development of GFA models, Cerius2 version 4.10 [13] has been used. The mutation probabilities were kept at 10000 iterations. Smoothness (d) was kept at 1.00. The initial equation length was set to 4, and the length of the final equation was not fixed.
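The crossover step described above might be sketched as follows. This is not the Cerius2 implementation: representing a model as a list of descriptor names, weighting parent selection by 1/LOF, and dropping repeated terms from the child are all assumptions made for illustration, and the two parent models and their LOF scores are hypothetical (the descriptor names are borrowed from Table 2).

```python
import random

def select_parents(population, lof_scores, rng):
    """Pick two parents; a lower LOF gives a higher weight (1/LOF is assumed)."""
    weights = [1.0 / score for score in lof_scores]
    return rng.choices(population, weights=weights, k=2)

def crossover(parent_a, parent_b, rng):
    """Cut each parent at a random point; join the head of A to the tail of B."""
    cut_a = rng.randint(1, len(parent_a) - 1)
    cut_b = rng.randint(1, len(parent_b) - 1)
    child = parent_a[:cut_a] + parent_b[cut_b:]
    # Drop repeated terms while keeping their first-seen order.
    return list(dict.fromkeys(child))

rng = random.Random(1)
# Hypothetical four-term models (initial equation length 4).
model_a = ["AlogP98", "Jurs_SASA", "Wiener", "Zagreb"]
model_b = ["MR", "Jurs_TPSA", "HOMO", "LUMO"]
parents = select_parents([model_a, model_b], [0.5, 1.0], rng)
child = crossover(model_a, model_b, rng)
print(child)
```

Since the cut points are drawn independently for each parent, the child can be shorter or longer than either parent, which is consistent with the length of the final equation not being fixed.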

2.5 Molecular Field Analysis (MFA)

MFA [23] attempts to postulate and represent the essential features of a receptor site from the aligned common features of the molecules in 3D space. MFA is a 3D-QSAR approach that computes the steric, electrostatic, and hydrogen-bond donor/acceptor interactions of a given series of molecules, using probes within a regularly spaced grid. It quantifies the interaction energy between a probe molecule and a set of aligned target molecules. Interaction energies measured and analyzed for a set of 3D structures can be useful in establishing SARs. To generate an energy field (also known as a probe map), a probe molecule is placed at a random location and then moved about a target molecule within a defined 3D grid. At each defined point in the grid, an energy calculation is performed, measuring the interaction energy between the probe and the target molecule. Atoms in the target molecule are fixed, so that intramolecular energy in the target is ignored. When a complete probe map has been calculated for each molecule in the target set, the energy values for each point in the grid can be reported in columns added to the study table. For a set of structures for which energy fields are generated, some or all of the grid data points can be used as descriptors in generating QSARs and analyzing SARs. The independent variable columns were selected automatically. MFA was performed using the QSAR module of Cerius2 version 4.10 [13]. A regression analysis was performed using the Genetic Partial Least Squares (G/PLS) method, which combines the best features of GFA and PLS. A rectangular field was generated using the probes H+, CH3, CH3+, and CH3−. A grid spacing of 2 Å was used, and fields at 2184 points were generated. The energy cutoff was kept at −50 to +50 kcal. The charge calculation method was set to the Gasteiger type. Alignments were done on the basis of the common substructure (CSS) method, which starts by defining a core model substructure to find a match in all of the molecules under study. The mutation probabilities were kept at 1000 iterations. Smoothness (d) was kept at 1.00. The initial equation length was set to 4, and the length of the final equation was not fixed. All the variables were scaled.
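The idea of a probe map can be illustrated with a deliberately simplified sketch: a point-charge probe is evaluated at every node of a rectangular grid with 2 Å spacing, and each energy is truncated to the range −50 to +50. Several things here are assumptions and not the Cerius2 procedure: only a Coulomb term is computed, the two-atom "molecule" and its charges are invented, and the 12 × 13 × 14 grid is merely one hypothetical box that happens to give 2184 points (the actual box dimensions are not stated in the text).

```python
import itertools
import math

COULOMB = 332.06  # kcal*Angstrom/(mol*e^2), electrostatic conversion factor

def grid_points(origin, counts, spacing=2.0):
    """Yield the nodes of a regular rectangular grid."""
    for i, j, k in itertools.product(*(range(n) for n in counts)):
        yield (origin[0] + i * spacing,
               origin[1] + j * spacing,
               origin[2] + k * spacing)

def probe_energy(point, atoms, probe_charge=1.0, cutoff=50.0):
    """Coulomb energy of a point-charge probe, truncated to [-cutoff, +cutoff]."""
    energy = 0.0
    for xyz, charge in atoms:
        r = max(math.dist(point, xyz), 1e-6)   # guard against division by zero
        energy += COULOMB * probe_charge * charge / r
    return max(-cutoff, min(cutoff, energy))

# Hypothetical target: (coordinates in Angstrom, partial charge in e).
atoms = [((11.0, 12.0, 13.0), -0.4), ((12.2, 12.0, 13.0), 0.4)]
field = [probe_energy(p, atoms)
         for p in grid_points((0.0, 0.0, 0.0), (12, 13, 14))]
print(len(field))   # 2184
```

Each entry of `field` would correspond to one descriptor column for this probe, which is why variable selection (here G/PLS) is needed before regression.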

2.6 G/PLS

G/PLS [13, 24-25] is a statistical method that combines the best features of GFA and PLS. Both of these methods are valuable statistical techniques for QSAR modeling where the number of descriptors is more than the number of samples. GFA is used to select the appropriate variables to be used in the development of a model. It is followed by PLS regression as the fitting technique to weigh the relative contribution of the selected variables in the final model. G/PLS retains the ease of interpretation of GFA by back-transforming the PLS components to the original variables. There is no chance of over-fitting of the model.

2.7 Validation Methods

The robustness of the models should be verified using different types of validation criteria. For validation of QSAR models, four strategies [26] are usually adopted: (i) internal validation or cross-validation; (ii) validation by dividing the dataset into training and test compounds; (iii) data randomization or Y-scrambling; and (iv) true external validation by application of the model to new external data. Owing to the lack of a true external evaluation set, the total dataset has been divided into an internal evaluation (training) set and an external evaluation (test) set, so we have performed only the first three validation techniques. Most QSAR modeling methods implement the LOO or Leave-Many-Out (LMO) cross-validation procedures, which are internal validation techniques. The outcome of the cross-validation procedure is the cross-validated $R^2$ (LOO-$Q^2$ or LMO-$Q^2$), which is used as a criterion of both robustness and predictive ability of the model. In this paper, we have used LOO cross-validation as the internal validation tool. The cross-validated squared correlation coefficient (LOO-$Q^2$) is calculated according to this equation:

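$$Q^2 = 1 - \frac{\sum \left(Y_{\mathrm{obs(training)}} - Y_{\mathrm{cal(training)}}\right)^2}{\sum \left(Y_{\mathrm{obs(training)}} - \bar{Y}_{\mathrm{training}}\right)^2} \tag{2}$$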

In Eq. (2), $\bar{Y}_{\mathrm{training}}$ represents the average activity value of the training set, while $Y_{\mathrm{obs(training)}}$ and $Y_{\mathrm{cal(training)}}$ represent the observed and calculated activity values of the training set compounds, respectively. Often, a high $Q^2$ value ($Q^2 > 0.5$) is considered as proof of the high predictive ability of the model [27].
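A leave-one-out calculation of Eq. (2) can be sketched for the simplest possible case, a one-descriptor linear model. The single-descriptor restriction, the helper names `fit_line` and `loo_q2`, and the data values are assumptions chosen to keep the example short; the models in this paper use several descriptors, and the calculated value for each compound is taken here to be its prediction from the model fitted without that compound.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_q2(xs, ys):
    """Leave-one-out Q2 following Eq. (2)."""
    y_mean = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without compound i, then predict compound i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - press / ss_total

# Hypothetical descriptor values and activities.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.9, 5.2, 5.8, 7.1, 8.0, 8.8]
print(loo_q2(xs, ys) > 0.5)   # True
```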

Models are generated based on the training set compounds, and the predictive capacity of the models is judged based on the predictive $R^2$ ($R^2_{\mathrm{pred}}$) values calculated according to the following equation [28]:

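$$R^2_{\mathrm{pred}} = 1 - \frac{\sum \left(Y_{\mathrm{obs(test)}} - Y_{\mathrm{pred(test)}}\right)^2}{\sum \left(Y_{\mathrm{obs(test)}} - \bar{Y}_{\mathrm{training}}\right)^2} \tag{3}$$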

In Eq. (3), Ypred(test) and Yobs(test) indicate predicted and ob-served activity values respectively of the test set com-pounds and Y training indicates the mean activity value of thetraining set compounds. The value of R2

pred for an accepta-ble model should be more than 0.5.
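Eq. (3) can be sketched in a few lines. The function name `r2_pred` and the numbers below are hypothetical and serve only to show that the reference sum of squares uses the training set mean, not the test set mean.

```python
def r2_pred(y_obs_test, y_pred_test, y_train_mean):
    """Eq. (3): predictive R^2 on the test set, referenced to the training mean."""
    ss_res = sum((o - p) ** 2 for o, p in zip(y_obs_test, y_pred_test))
    ss_ref = sum((o - y_train_mean) ** 2 for o in y_obs_test)
    return 1.0 - ss_res / ss_ref


# Hypothetical values for illustration only.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
training_mean = 6.0
print(r2_pred(observed, predicted, training_mean))
```

For these toy values the residual sum of squares is 1.0 and the reference sum of squares is 6.0, so the result is 5/6, about 0.833.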

Further statistical significance of the relationship between the ovicidal activity and the chemical structure descriptors was assessed by randomization (Y-randomization) of the model development process. The test was done by repeatedly scrambling the activity values to generate QSAR models and comparing the resulting scores with the score of the original QSAR model generated from nonrandomized activity values. If the score of the original QSAR model is significantly better than those of the random models, then that model should be considered statistically robust [29]. The randomization tests have been performed at the 90% confidence level.
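The randomization test described above can be sketched as follows. This is a simplified illustration, assuming a one-descriptor linear model scored by $R^2$; the function names, the number of trials, and the toy data are hypothetical and do not reproduce the Cerius2 procedure used in the paper.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-descriptor linear fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def y_randomization(x, y, n_trials=99, seed=0):
    """Return the original R^2 and the fraction of scrambled runs that match or beat it."""
    rng = random.Random(seed)
    original = r_squared(x, y)
    hits = 0
    for _ in range(n_trials):
        scrambled = y[:]          # copy, so the original activities stay untouched
        rng.shuffle(scrambled)    # break the descriptor-activity pairing
        if r_squared(x, scrambled) >= original:
            hits += 1
    return original, hits / n_trials


# Hypothetical toy data (not taken from the paper).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
activity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
original_r2, fraction = y_randomization(descriptor, activity)
print(original_r2, fraction)
```

Under this sketch, a model would pass at the 90% confidence level when fewer than 10% of the scrambled runs score at least as well as the original model.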

2.8 Software

MINITAB [30] was used for linear regression. Cerius2 version 4.10 [13] was used for GFA and MFA analyses. STATISTICA [31] and SPSS [32] were used to determine the LOO calculated values of the training set compounds and the correlation matrix of the descriptors, respectively.

3 Results and Discussion

In this paper, the total dataset (n = 90) was divided into a training set (n = 71) and a test (external evaluation) set (n = 19) (75 and 25%, respectively, of the total number of compounds) based on clusters obtained from K-means clustering applied to the standardized topological and structural descriptor matrix. The Principal Component Analysis (PCA) score plot (Figure 1) of the first three components of the standardized topological and structural descriptor matrix shows the distribution of training and test set compounds in 3D space and their cluster membership (1–4). It may be noted that the division of the whole dataset into training and test sets was done by K-means clustering and not by using the PCA score plot. However, the plot shows that each test set compound is located in the close vicinity of at least one training set compound in the 3D space. The values of important variables appearing in the reported equations are shown in Tables S1 and S2.

3.1 Classical Type QSAR

3.1.1 QSAR using Topological, Structural and Physicochemical Descriptors

Models have been generated with topological, structural, and physicochemical descriptors using GFA with linear and spline options as the statistical tool. The mutation probability was kept at 10 000 iterations. In case of the GFA linear technique, the following equation was obtained with acceptable LOO internal variance ($Q^2$) and predicted variance ($R^2_{\mathrm{pred}}$).


Figure 1. PCA score plot of the first three components of the standardized topological and structural descriptor matrix.


$$\begin{aligned}
\mathrm{pLC_{50}} ={}& -431.08(\pm 98.98) - 67.880(\pm 15.580)\,{}^{1}\kappa + 65.700(\pm 15.010)\,\mathrm{SC\_1} + 1.384(\pm 0.563)\,{}^{3}\kappa_{am} \\
& - 0.228(\pm 0.048)\,\mathrm{S\_dO} - 1.072(\pm 0.274)\,S_{17} + 0.767(\pm 0.259)\,S_{16}
\end{aligned}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.598;\ R^2_a = 0.560;\ F = 15.9\ (df\ 6, 64);\ Q^2 = 0.514;\ \mathrm{PRESS} = 105.9;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.595;\ r^2 = 0.586;\ r^2_0 = 0.584;\ r^2_m = 0.560 \tag{M1}$$

The above model could explain 56% of the variance (adjusted coefficient of determination). The LOO predicted variance was found to be 51.4%. The predictive potential of this model was determined by the predicted $R^2$ on the test set compounds, which was found to be 0.595. The squared correlation coefficient between the observed and predicted activity of the test set compounds was 0.586; with the intercept set to zero, it was 0.584.

Using the standardized variable matrix for regression, the significance level of the descriptors is found to be of the order: ${}^{1}\kappa$, SC_1, ${}^{3}\kappa_{am}$, S_dO, $S_{17}$, $S_{16}$. However, in the above model, the variables SC_1 and ${}^{1}\kappa$ are highly intercorrelated, although the equation shows acceptable internal and external validation statistics. We anticipated that there may be a parabolic relationship between the activity and the parameter SC_1, as the regression coefficients of SC_1 and ${}^{1}\kappa$ are of opposite sign. Thus, ${}^{1}\kappa$ may be acting like a squared term of SC_1. To verify this, we used the squared term of SC_1 instead of ${}^{1}\kappa$ and generated a model with better internal predicted variance and external predictive potential than the previous model (Eq. M1).

$$\begin{aligned}
\mathrm{pLC_{50}} ={}& -37.096(\pm 8.617) + 3.194(\pm 0.657)\,\mathrm{SC\_1} - 0.065(\pm 0.014)\,(\mathrm{SC\_1})^2 + 1.504(\pm 0.565)\,{}^{3}\kappa_{am} \\
& - 0.223(\pm 0.047)\,\mathrm{S\_dO} - 1.104(\pm 0.269)\,S_{17} + 0.767(\pm 0.256)\,S_{16}
\end{aligned}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.607;\ R^2_a = 0.570;\ F = 16.5\ (df\ 6, 64);\ Q^2 = 0.520;\ \mathrm{PRESS} = 104.6;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.607;\ r^2 = 0.597;\ r^2_0 = 0.595;\ r^2_m = 0.570 \tag{M1a}$$

Though Eq. (M1a) is better than the previously mentioned one (Eq. M1), there is again a high intercorrelation, here between the parameters SC_1 and ${}^{3}\kappa_{am}$. To avoid this type of high intercorrelation, we omitted the term ${}^{3}\kappa_{am}$ and used the bilinear modeling technique.

$$\begin{aligned}
\mathrm{pLC_{50}} ={}& -4.464(\pm 1.796) + 0.446(\pm 0.074)\,\mathrm{SC\_1} - 0.676(\pm 0.163)\,\log\left(\beta \cdot 10^{\mathrm{SC\_1}} + 1\right) \\
& - 0.221(\pm 0.048)\,\mathrm{S\_dO} - 1.119(\pm 0.273)\,S_{17} + 0.800(\pm 0.260)\,S_{16}
\end{aligned}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.586;\ R^2_a = 0.554;\ F = 18.4\ (df\ 5, 65);\ Q^2 = 0.513;\ \mathrm{PRESS} = 106.1;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.556;\ r^2 = 0.546;\ r^2_0 = 0.544;\ r^2_m = 0.522 \tag{M1b}$$

Though the external validation statistics of Eq. (M1b) are slightly inferior to those of Eq. (M1a), Eq. (M1b) does not suffer from the problem of high intercorrelation. SC_1 is the number of first-order subgraphs, i.e., the number of edges connecting the vertices of the molecular graph (the number of bonds in the molecule). As the chain length of para substituents on the 4-phenyl ring increases, the value of SC_1 also goes up. The model suggests that the ovicidal activity of 2-(2',6'-difluorophenyl)-4-phenyl-1,3-oxazoline derivatives first increases linearly with an increase in the value of SC_1 of the molecules up to an optimum SC_1 value of 29, and then the rate of increase in activity diminishes gradually. The optimum value of SC_1 can be calculated by the following equation:

$$\mathrm{SC\_1_{opt}} = \log \frac{a}{\beta\,(b - a)} \tag{4}$$

In Eq. (4), $\beta$ is equal to $3.95 \times 10^{-29}$, $a$ is the regression coefficient of SC_1 (0.446), and $b$ is the regression coefficient of the bilinear term of SC_1 (0.676). This reflects that long chain substitution is favorable for better ovicidal activity.
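As a quick numerical check of Eq. (4), the short sketch below evaluates the optimum with the coefficients quoted for Eq. (M1b). The function name `bilinear_optimum` is hypothetical, and the logarithm is assumed to be base 10, consistent with the $10^{\mathrm{SC\_1}}$ term in the bilinear model.

```python
import math


def bilinear_optimum(a, b, beta):
    """Eq. (4): optimum descriptor value of a bilinear model, log10(a / (beta * (b - a)))."""
    return math.log10(a / (beta * (b - a)))


# Coefficients of SC_1 in Eq. (M1b) and the reported beta.
sc1_opt = bilinear_optimum(a=0.446, b=0.676, beta=3.95e-29)
print(round(sc1_opt, 2))  # 28.69, in line with the reported optimum of about 29
```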

S_dO indicates the electro-topological state value of the fragment "=O". It has a detrimental effect on the ovicidal activity, which can be confirmed in the case of compounds 39 and 40, which have "=O" in their structure. As compound 39 [−S(=O)CH3] and compound 40 (−SO2CH3) contain such fragments at the para position of their 4-phenyl ring, their activity is very low.

In the above model, $S_{17}$ is the atom type electro-topological state index of the para position of the 4-phenyl ring. It has a negative contribution toward the ovicidal activity. This E-state index reflects the electronic interaction of each atom within the molecular topological environment of the molecule. The presence of para substituents containing an electronegative atom directly attached to the 4-phenyl ring or in its close vicinity may reduce the value of $S_{17}$. This has been observed for the compound pairs 16 and 35, and 14 and 44. In compound 35, the substituent at the para position of the 4-phenyl ring contains highly electronegative atoms (−OCH2CF3), whereas in compound 16 the substituent does not contain electronegative atoms [−CH(CH3)2]. So, the value of $S_{17}$ is very low for compound 35 (0.068), whereas that of compound 16 is quite high (1.25), and the activity of compound 35 is greater than that of compound 16. Similarly, in compound 44, the para substituent contains highly electronegative atoms (−CF3), whereas in compound 14 it does not (−CH3). So, the value of $S_{17}$ is low for compound 44 (0.099), whereas that of compound 14 is high (1.22), and the activity of compound 44 is greater than that of compound 14. It may be noted here that the value of $S_{17}$ for compound 25 is significantly lower than those of compounds 20–23. However, the former is less active than the latter compounds, as compound 25 has a considerably


low value of SC_1 (the optimum being 29). Thus, it appears that the presence of electronegative atom(s) at the para position of the 4-phenyl ring will contribute positively to the ovicidal activity, provided the para substituent is of sufficient length (and hence lipophilicity). $S_{16}$ is the atom type electro-topological state index of the meta position of the 4-phenyl ring. It has a favorable contribution toward the activity, as evidenced by the positive regression coefficient. The value of the electro-topological state atomic index at this position may be influenced by long chain substituents at the para position. The presence of long chain (lipophilic) para substitution may enhance the value of $S_{16}$ and thus increase the ovicidal activity.

In case of the GFA spline technique, the following equation was generated with acceptable LOO internal variance ($Q^2$) and predicted variance ($R^2_{\mathrm{pred}}$).

$$\begin{aligned}
\mathrm{pLC_{50}} ={}& 27.154(\pm 4.831) - 0.207(\pm 0.038)\,\langle 136 - \mathrm{Zagreb} \rangle - 0.363(\pm 0.091)\,\mathrm{SC\text{-}3\_P} \\
& - 0.217(\pm 0.045)\,\mathrm{S\_dO} - 2.481(\pm 0.673)\,\langle S_{17} - 1.237 \rangle
\end{aligned}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.623;\ R^2_a = 0.600;\ F = 27.2\ (df\ 4, 66);\ Q^2 = 0.580;\ \mathrm{PRESS} = 91.4;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.627;\ r^2 = 0.637;\ r^2_0 = 0.636;\ r^2_m = 0.617 \tag{M2}$$

Zagreb is defined as the sum of the squares of vertex valences. In the above model, it is present as a spline term, and its negative regression coefficient suggests that a Zagreb value of less than 136 contributes negatively to the activity. When the value of Zagreb is greater than 136, the spline term vanishes and this negative contribution is removed. This reflects that long chain substituents can increase the ovicidal activity. SC-3_P is the number of third-order subgraphs in the molecular graph where the path length is three, and it shows a detrimental impact on the ovicidal activity. Another model, not reported here, suggests that if the value of SC-3_P is less than 53, it has a positive contribution, but when the value is greater than 53, it shows a negative contribution, which can be explained by compounds 21–24. For compounds 21, 22, and 23, the value of SC-3_P is less than 53, and the activity goes up with an increase in the length of the substituent. But, as the value of SC-3_P crosses the mark of 53, the activity of compound 24 comes down. $S_{17}$ is the atom type electro-topological state index of the para position of the 4-phenyl ring. It is present as a spline parameter and has a negative effect on the activity. The presence of para substituents containing an electronegative atom directly attached to the 4-phenyl ring or in its close vicinity may reduce the value of $S_{17}$. If the value of $S_{17}$ is less than 1.237, then the contribution of the spline term in the above model is nullified.
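The behavior of the spline terms can be made concrete with a small sketch. It assumes the usual truncated form in which a term in angle brackets equals its argument when positive and zero otherwise, which matches the statement above that the $S_{17}$ term is nullified below 1.237. The function names and the descriptor values passed in are hypothetical; only the coefficients come from Eq. (M2).

```python
def spline(value):
    """Truncated term <value>: equals value when positive, otherwise zero."""
    return max(0.0, value)


def predict_m2(zagreb, sc3_p, s_do, s17):
    """Point estimate of pLC50 from Eq. (M2), ignoring the coefficient standard errors."""
    return (27.154
            - 0.207 * spline(136 - zagreb)
            - 0.363 * sc3_p
            - 0.217 * s_do
            - 2.481 * spline(s17 - 1.237))


# Hypothetical descriptor values, chosen only to switch the Zagreb spline on and off.
above_knot = predict_m2(zagreb=140, sc3_p=50, s_do=0.0, s17=1.0)
below_knot = predict_m2(zagreb=130, sc3_p=50, s_do=0.0, s17=1.0)
print(above_knot, below_knot)
```

With Zagreb above the knot at 136 the spline term contributes nothing, whereas a value of 130 lowers the estimate by 0.207 × 6 = 1.242 log units.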

3.1.2 QSAR using Topological, Structural, and Physicochemical Descriptors including Substituents' Hydrophobicity

Here, we have used the hydrophobic substituent constant ($\pi$) as an additional variable. In case of the GFA linear technique, the following equation was obtained with satisfactory LOO internal variance ($Q^2$) and predicted variance ($R^2_{\mathrm{pred}}$).

$$\begin{aligned}
\mathrm{pLC_{50}} ={}& 5.312(\pm 1.025) + 2.909(\pm 0.501)\,{}^{1}\chi^{v} - 4.720(\pm 0.809)\,{}^{3}\chi^{v}_{p} + 1.660(\pm 0.177)\,\pi_p \\
& - 0.002(\pm 0.001)\,\mathrm{Wiener} - 0.272(\pm 0.101)\,\mathrm{S\_ssCH2} - 0.957(\pm 0.209)\,S_{17}
\end{aligned}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.743;\ R^2_a = 0.718;\ F = 30.8\ (df\ 6, 64);\ Q^2 = 0.688;\ \mathrm{PRESS} = 68.0;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.569;\ r^2 = 0.570;\ r^2_0 = 0.569;\ r^2_m = 0.552 \tag{M3}$$

Based on regression on the standardized variable matrix, the significance level of the descriptors is of the order ${}^{1}\chi^{v}$, ${}^{3}\chi^{v}_{p}$, $\pi_p$, Wiener, S_ssCH2, $S_{17}$. However, in the above model, the variables (${}^{1}\chi^{v}$, ${}^{3}\chi^{v}_{p}$, $\pi_p$, Wiener, S_ssCH2) are highly intercorrelated, and thus we eliminated ${}^{1}\chi^{v}$, ${}^{3}\chi^{v}_{p}$, Wiener, and S_ssCH2 and retained the less intercorrelated variables, i.e., $\pi_p$ and $S_{17}$. We anticipated that there may be a parabolic relationship between the biological activity and $\pi_p$, as the parameters $\pi_p$ and Wiener have coefficients of opposite sign. So, we used the squared term of $\pi_p$ instead of Wiener and generated the following model.

$$\mathrm{pLC_{50}} = 6.191(\pm 0.290) + 1.471(\pm 0.166)\,\pi_p - 0.171(\pm 0.028)\,\pi_p^2 - 0.865(\pm 0.236)\,S_{17}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.619;\ R^2_a = 0.602;\ F = 36.4\ (df\ 3, 67);\ Q^2 = 0.545;\ \mathrm{PRESS} = 99.1;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.601;\ r^2 = 0.591;\ r^2_0 = 0.589;\ r^2_m = 0.565 \tag{M3a}$$

The model is better than the previously mentioned one [Eq. (M3)]. We have used the bilinear modeling technique to describe the effects of the variables on the activity more significantly.

$$\mathrm{pLC_{50}} = 6.001(\pm 0.278) + 1.765(\pm 0.182)\,\pi_p - 1.951(\pm 0.269)\,\log\left(\beta \cdot 10^{\pi_p} + 1\right) - 0.849(\pm 0.221)\,S_{17}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.664;\ R^2_a = 0.649;\ F = 44.2\ (df\ 3, 67);\ Q^2 = 0.626;\ \mathrm{PRESS} = 81.5;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.629;\ r^2 = 0.623;\ r^2_0 = 0.621;\ r^2_m = 0.595 \tag{M3b}$$

The parameter $\pi_p$ indicates the hydrophobicity of the substituents at the para position of the 4-phenyl ring. It has a positive contribution toward the activity, as evidenced by the positive regression coefficient. The model suggests that the ovicidal activity of 2-(2',6'-difluorophenyl)-4-phenyl-1,3-oxazoline derivatives first increases linearly with an increase in


the value of $\pi_p$ of the substituents up to an optimum $\pi_p$ value of 2.90, and then the rate of increase in activity diminishes gradually. The optimum value of $\beta$ in the above equation [Eq. (M3b)] is $1.2 \times 10^{-2}$.

In case of the GFA spline technique, the following equation was obtained with agreeable LOO internal variance ($Q^2$) and predicted variance ($R^2_{\mathrm{pred}}$).

$$\begin{aligned}
\mathrm{pLC_{50}} ={}& 14.049(\pm 1.821) - 1.873(\pm 0.179)\,\langle 1.98 - \pi_p \rangle - 0.923(\pm 0.196)\,S_{17} \\
& - 0.120(\pm 0.036)\,\mathrm{SC\text{-}3\_P} + 8.540(\pm 2.318)\,\langle 0.14 - \pi_m \rangle
\end{aligned}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.740;\ R^2_a = 0.724;\ F = 47.0\ (df\ 4, 66);\ Q^2 = 0.700;\ \mathrm{PRESS} = 65.4;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.625;\ r^2 = 0.615;\ r^2_0 = 0.614;\ r^2_m = 0.596 \tag{M4}$$

In the above model, $\pi_p$ is present as a spline variable, and the spline term has a detrimental contribution toward the ovicidal activity. If the value of $\pi_p$ is less than 1.98, it lessens the activity, but if the value is greater than 1.98, the unfavorable effect may be nullified. So, long chain substitution at the para position of the 4-phenyl ring may be favorable for the activity. The parameter $\pi_m$ indicates the hydrophobicity of the substituents at the meta position. In the above model, $\pi_m$ is present as a spline variable, and the spline term has a positive contribution toward the ovicidal activity. The value of $\pi_m$ should be less than 0.14 for a favorable effect. So, less lipophilic substituents at the meta position of the 4-phenyl ring may be favorable for the activity.

3.2 3D QSAR

3.2.1 MSA

We have further performed 3D QSAR to obtain additional information about the effect of shape, spatial arrangement of atoms, and charge distribution of the substituents on the biological activity. This study was conducted using MSA descriptors along with additional descriptors such as physicochemical, spatial, and electronic parameters. We have developed two types of models: (i) using shape, spatial, and electronic descriptors; (ii) using shape, spatial, electronic, and physicochemical parameters, in both cases using GFA with the spline option as the statistical tool. Figure 2 shows the aligned geometry of the training set compounds used in MSA.

3.2.2 QSAR using Shape, Spatial, and Electronic Descriptors

Models have been generated with shape, spatial, and electronic descriptors using GFA with the spline option as the statistical tool. The mutation probability was kept at 10 000 iterations. In case of the GFA spline technique, the following equation was obtained with acceptable LOO internal variance ($Q^2$) and predicted variance ($R^2_{\mathrm{pred}}$).

$$\begin{aligned}
\mathrm{pLC_{50}} ={}& 12.245(\pm 1.488) - 16.054(\pm 2.557)\,\mathrm{Jurs\_RPCG} - 0.009(\pm 0.001)\,\langle 1275.64 - \mathrm{PMI\_mag} \rangle - 0.001(\pm 0.0001)\,\mathrm{Apol} \\
& + 0.014(\pm 0.004)\,\mathrm{COSV} + 2.660(\pm 0.694)\,\langle -10.733 - \mathrm{HOMO} \rangle + 0.718(\pm 0.279)\,\mathrm{ShapeRMS}
\end{aligned}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.666;\ R^2_a = 0.635;\ F = 21.3\ (df\ 6, 64);\ Q^2 = 0.615;\ \mathrm{PRESS} = 83.9;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.678;\ r^2 = 0.752;\ r^2_0 = 0.740;\ r^2_m = 0.670 \tag{M5}$$

Jurs_RPCG is the partial charge of the most positive atom divided by the total positive charge:

$$\mathrm{Jurs\_RPCG} = \frac{Q^{+}_{\max}}{Q^{+}} \tag{5}$$

where $Q^{+}_{\max}$ is the charge of the most positive atom and $Q^{+}$ is the total positive charge. Jurs_RPCG has an unfavorable contribution toward the activity, as evidenced by the negative regression coefficient. This indicates that if the positive charge is concentrated on a particular atom, the activity may be diminished. So, for better activity, positive charge should be distributed over a surface. PMI_mag is the magnitude of the principal moment of inertia, which is the resultant of the moments of inertia about the three axes. The moments of inertia are calculated for a series of straight lines through the center of mass and are associated with the principal axes of the ellipsoid. In the above model, PMI_mag is present as a spline variable, and the spline term has a detrimental contribution toward the ovicidal activity. If the value of PMI_mag is less than 1275.64, the ovicidal activity will be low. Apol is the sum of the atomic polarizabilities of the whole molecule. It has an unfavorable contribution toward the ovicidal


Figure 2. Aligned geometry of the training set members used in MSA.


activity. This can be explained by the example of compounds 23 and 24, and compounds 31–33. This suggests that the ovicidal activity initially increases as the value of Apol increases, but after a certain substituent length is attained, Apol negatively influences the activity. This implies that the size of the substituent should be optimum. COSV is the common overlap steric volume, which is the volume shared by two superimposed molecules: the common volume between each individual molecule and the molecule selected as the shape reference compound. This is a measure of how similar in steric shape the analogues are to the shape reference candidate. It has a favorable contribution toward the activity, as evidenced by the positive regression coefficient. When the common overlap steric volume of individual compounds increases with respect to the shape reference compound, better activity may be obtained. HOMO is the energy of the highest occupied molecular orbital. Molecules with high HOMO energy can easily donate electrons, and HOMO thus represents the nucleophilicity of a molecule. In the above model, it is present as a spline variable, and the spline term has a positive contribution toward the ovicidal activity. ShapeRMS is the root mean square deviation between the individual molecule and the shape reference compound. It has a positive contribution toward the ovicidal activity. When the values of COSV of two compounds are similar, an increase in the value of ShapeRMS may raise the activity, as is the case for compounds 16 and 18.

3.2.3 QSAR using Shape, Spatial, Electronic, and Physicochemical Descriptors

Models have been generated with shape, spatial, electronic, and physicochemical descriptors using GFA with the spline option as the statistical tool. The mutation probability was kept at 10 000 iterations. In case of the GFA spline technique, the following equation was obtained with satisfactory LOO internal variance ($Q^2$) and predicted variance ($R^2_{\mathrm{pred}}$).

$$\begin{aligned}
\mathrm{pLC_{50}} ={}& 10.106(\pm 0.434) - 2.082(\pm 0.227)\,\langle 5.838 - \mathrm{AlogP98} \rangle - 0.012(\pm 0.002)\,\langle 1061.54 - \mathrm{PMI\_mag} \rangle \\
& + 0.320(\pm 0.068)\,\langle 78.714 - \mathrm{Molref} \rangle - 327.551(\pm 86.110)\,\langle 0.036 - \mathrm{Jurs\_FPSA\_3} \rangle - 0.009(\pm 0.003)\,\mathrm{NCOSV}
\end{aligned}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.711;\ R^2_a = 0.688;\ F = 31.9\ (df\ 5, 65);\ Q^2 = 0.666;\ \mathrm{PRESS} = 72.7;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.755;\ r^2 = 0.752;\ r^2_0 = 0.748;\ r^2_m = 0.704 \tag{M6}$$

AlogP98 is the calculated value of the logarithm of the partition coefficient using Ghose and Crippen's method [33]. It is present as a spline variable which contributes negatively to the ovicidal activity. If the value of AlogP98 exceeds the critical limit of 5.838, the negative effect may be nullified. So, for better ovicidal activity, the molecule should be optimally lipophilic. Molref is the atom type molar refractivity, estimated by the group contribution method suggested by Ghose and Crippen. It is present as a spline variable and contributes positively to the ovicidal activity. If the value of Molref is higher than 78.714, then better activity may remain. So, for greater ovicidal activity, the size of the molecule should be large, i.e., the substituents should be bulky.

Jurs_FPSA-3 is the fractional charged partial positive surface area. It is calculated as the atomic charge weighted positive surface area (PPSA-3) divided by the total molecular Solvent Accessible Surface Area (SASA):

$$\mathrm{Jurs\_FPSA\text{-}3} = \frac{\mathrm{PPSA\text{-}3}}{\mathrm{SASA}} \tag{6}$$

It is present as a spline parameter in the above model and has a detrimental contribution toward the ovicidal activity.

NCOSV is the non-common overlap steric volume, i.e., the difference between the volume of the individual molecule and the common overlap steric volume. It has a detrimental contribution toward the activity, as evidenced by the negative regression coefficient. If the non-common volume of an individual molecule differs from the shape reference compound to a large extent, the ovicidal activity may be reduced.

The intercorrelation ($r^2$) matrices for Eqs. (M4) and (M6) are shown in Tables S3 and S4, respectively, of the Supporting Information.

3.2.4 MFA

MFA is a method for quantifying the interaction energy between a probe molecule and a set of aligned target molecules in QSAR. The mutation probability was kept at 1000 iterations. Figure 3 shows the aligned geometry of the training set compounds used in MFA. In case of the G/PLS spline technique, the following equation was obtained with satisfactory LOO internal variance ($Q^2$) and predicted variance ($R^2_{\mathrm{pred}}$).

$$\begin{aligned}
\mathrm{pLC_{50}} ={}& 8.040 - 0.272\,\langle 3.177 - \mathrm{CH_3^-}/1100 \rangle - 0.452\,\langle 3.158 - \mathrm{CH_3}/1294 \rangle \\
& - 0.037\,\langle \mathrm{H^+}/1253 - 1.882 \rangle - 0.236\,\langle -4.627 - \mathrm{H^+}/1266 \rangle
\end{aligned}$$

$$n_{\mathrm{Training}} = 71;\ R^2 = 0.574;\ R^2_a = 0.561;\ F = 45.8\ (df\ 2, 68);\ Q^2 = 0.531;\ \mathrm{PRESS} = 102.1;\ n_{\mathrm{Test}} = 19;\ R^2_{\mathrm{pred}} = 0.517;\ r^2 = 0.582;\ r^2_0 = 0.582;\ r^2_m = 0.582 \tag{M7}$$

$\mathrm{CH_3}/1294$ indicates the interaction of the steric probe at grid point number 1294 with the molecules. It is present as a spline term, and that spline parameter has a detrimental effect on the activity. The value of $\mathrm{CH_3}/1294$ should be greater than 3.158 to cancel out this effect. Interaction with grid point 1294 is possible when a long chain substituent is present at the para position of the 4-phenyl ring. This suggests that the presence of bulky substituents at the para position of the 4-phenyl ring may enhance the activity. $\mathrm{CH_3^-}/1100$ indicates the interaction of the electronegative


probe at lattice point 1100 with the molecules. It is present as a spline descriptor and has a negative contribution toward the activity. The value of $\mathrm{CH_3^-}/1100$ should be greater than 3.177 for better activity. $\mathrm{H^+}/1253$ and $\mathrm{H^+}/1266$ are the interactions of electropositive probes at grid points 1253 and 1266, respectively. Both are present as spline variables and have detrimental contributions to the ovicidal activity. The value of $\mathrm{H^+}/1253$ should be less than 1.882 and that of $\mathrm{H^+}/1266$ should be greater than −4.627. The above results demonstrate that neither purely electronegative nor purely electropositive substituents enhance the activity. For better activity, the charge distribution among the substituents should be optimum. Substituents having an electronegative atom with distributed positive charge may amplify the ovicidal activity. Figure 4 shows the most active compound (compound 23) within the MFA grid along with the important interaction points. Substituents with optimum length and lipophilicity and proper charge distribution will enhance the ovicidal activity.

3.3 Further Test on Model Predictivity

3.3.1 Model Instability Analysis for Classical LOO Validation

The model instability analysis can be applied during LOO validation by calculating the difference in the descriptor regression coefficients, measured and summarized for all descriptors and compounds used in validation. This is defined as the Model Instability Coefficient (MIC) [34], which can be calculated by the following equation:

$$\mathrm{MIC} = \sqrt{\frac{\sum_i \mathrm{DIC}_i^2}{N_D}} \tag{7}$$

In the above equation, $N_D$ is the number of descriptors used and $\mathrm{DIC}_i$ is the descriptor instability coefficient for descriptor $i$:

$$\mathrm{DIC}_i = 100 \times \sqrt{\frac{\sum_{j=1}^{N} \left(\Delta C_j^2 / E^2\right)}{N}} \tag{8}$$

In the equation given above, $\Delta C_j$ is the difference between the descriptor regression coefficients with the $j$th compound in and out of the training set, $N$ is the number of compounds selected for validation, and $E$ is the 95% confidence interval of the descriptor regression coefficient. If the MIC value is less than 100%, the model is considered to be stable; if it is over 100%, the model is considered to be unstable. The actual value of MIC is a quantitative measure of the model instability. Analogously, DIC indicates the instability for each descriptor separately. The values of DIC and MIC for Eqs. (M4) and (M6) are listed in Table 4.
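Eq. (7) can be checked directly against the tabulated values. The sketch below takes the DIC values listed in Table 4 for Eqs. (M4) and (M6) and recomputes MIC as their root mean square; the function name `model_instability` is hypothetical.

```python
import math


def model_instability(dic_values):
    """Eq. (7): root mean square of the descriptor instability coefficients."""
    return math.sqrt(sum(d ** 2 for d in dic_values) / len(dic_values))


# DIC values as listed in Table 4.
dic_m4 = [1.174, 5.975, 6.435, 6.307]
dic_m6 = [4.650, 5.277, 5.325, 5.456, 6.042]

print(round(model_instability(dic_m4), 3))  # 5.438, matching the tabulated MIC for M4
print(round(model_instability(dic_m6), 3))  # 5.368, matching the tabulated MIC for M6
```

Both results are far below the 100% threshold, in line with the conclusion that the two models are stable.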

The variation in predicted values with compounds in and out of the training set during validation can also be summarized. This is defined as the Model Value Instability Coefficient (MVIC) [34], which can be evaluated by the following equation:

$$\mathrm{MVIC} = 100 \times \sqrt{\frac{\sum_{j}^{N} \left(\Delta V_j^2 / E^2\right)}{N}} \tag{9}$$


Figure 3. Aligned geometry of the training set compounds used in MFA.

Figure 4. The most active compound (compound 23) within the MFA grid showing its important interaction points.


In the above equation, $\Delta V_j$ is the difference between the calculated/predicted values with the compound in and out of the training set, $N$ is the total number of compounds selected for validation, and $E$ is the 95% confidence interval of the calculated activity value with the compound in the training set. If MVIC is less than 100%, the model can be considered stable, while a value of MVIC greater than 100% indicates instability of the model and indicates that the model is underfitted. The values of MVIC for Eqs. (M4) and (M6) are listed in Table 4. It is found that the MIC and MVIC values of Eqs. (M4) and (M6) are within the acceptable range.
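Eqs. (8) and (9) share the same form, so both can be sketched with one helper. The function name, the per-compound treatment of $E$, and all numbers below are assumptions made for illustration; they are not data from the paper.

```python
import math


def instability_coefficient(deltas, intervals):
    """Shared form of Eqs. (8) and (9): 100 * sqrt(mean((delta / E)^2))."""
    ratios = [(d / e) ** 2 for d, e in zip(deltas, intervals)]
    return 100.0 * math.sqrt(sum(ratios) / len(ratios))


# Hypothetical in/out differences for four left-out compounds and a
# hypothetical 95% confidence interval of 2.0 for each of them.
deltas = [0.10, -0.10, 0.05, -0.05]
intervals = [2.0, 2.0, 2.0, 2.0]
value = instability_coefficient(deltas, intervals)
print(round(value, 2))
```

For these toy inputs the coefficient is about 3.95%, which would be read as stable since it is well under 100%.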

3.3.2 Modeling Power of a Model

The 'descriptive power' statistic $D_p$ [35] attempts to assess the stability of the b-coefficients ($b$) based on the estimated overall relative uncertainty $U(b)$ according to the following equation:

$$D_p = 100 \times \left(1 - \mathrm{mean}\left|\frac{U(b)}{b}\right|\right) \tag{10}$$

Here, the $b$-vector is obtained using the full model (i.e., all the compounds in the dataset) and the $U(b)$ vector is obtained by jack-knifing: in each cross-validation step, the sub-model generates a vector of coefficients that are perturbed with respect to $b$. For each coefficient $b_j$ ($j$ goes from 0 to the number of descriptors),

$$U(b_j) = 2\sqrt{\sum \left(b - b_{pi}\right)^2 \times \frac{N_0 - 1}{N_0}} \tag{11}$$

Here, $i$ represents the cross-validation step. $U(b)$ can be regarded as an approximately 95% confidence interval. The value of $D_p$ should be in the range of 0–100% for an acceptable model. The values of $D_p$ for Eqs. (M4) and (M6) are listed in Table 5.

The 'predictive power' statistic ($P_p$) [35] attempts to estimate the predictive capability of a model.

$$P_p = 100 \times \left(1 + \frac{\dfrac{N_0\,\mathrm{RMSEC}^2}{N_0 - 1} - \dfrac{2\,(N_0 - 1)\,\mathrm{RMSECV}^2}{N_0}}{s(y)^2}\right) \tag{12}$$

In Eq. (12), RMSEC and RMSECV are the root mean square error of calibration and the root mean square error of cross-validation, respectively, while $s(y)^2$ is the variance of the observed $y$ data. The value of $P_p$ should be in the range of 0–100% for an acceptable model. The values of $P_p$ for Eqs. (M4) and (M6) are listed in Table 5.

The global 'modeling power' ($M_p$) [35] can be calculated by the following equation:

$$M_p = \frac{f_{Dp} D_p + f_{Pp} P_p}{f_{Dp} + f_{Pp}} \tag{13}$$

In the above equation, $f_{Dp}$ and $f_{Pp}$ are weighting factors that represent the relative importance that the user wants to confer on the descriptive and predictive aspects. For simplicity, $f_{Dp}$ and $f_{Pp}$ can be taken in the range 0–1 subject to $f_{Pp} = 1 - f_{Dp}$, so that the above equation becomes

$$M_p = f_{Dp} D_p + \left(1 - f_{Dp}\right) P_p \tag{14}$$

We have used 50% weightage for both $f_{Dp}$ and $f_{Pp}$. The calculated values of $M_p$ for Eqs. (M4) and (M6) are listed in Table 5. It is found that Eqs. (M4) and (M6) have modeling powers of 63.0 and 65.7%, respectively.
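The three statistics of Eqs. (10), (12), and (14) can be sketched as plain functions. All function names and input numbers below are hypothetical, since the underlying coefficient uncertainties and error terms are not reported in the text; the sketch only shows how the pieces combine under equal weighting.

```python
def descriptive_power(b, u_b):
    """Eq. (10): 100 * (1 - mean |U(b)/b|) over all coefficients."""
    ratios = [abs(u / c) for c, u in zip(b, u_b)]
    return 100.0 * (1.0 - sum(ratios) / len(ratios))


def predictive_power(n0, rmsec, rmsecv, var_y):
    """Eq. (12), with var_y standing for s(y)^2."""
    numerator = n0 * rmsec ** 2 / (n0 - 1) - 2 * (n0 - 1) * rmsecv ** 2 / n0
    return 100.0 * (1.0 + numerator / var_y)


def modeling_power(dp, pp, f_dp=0.5):
    """Eq. (14): weighted combination with f_Pp = 1 - f_Dp."""
    return f_dp * dp + (1.0 - f_dp) * pp


# Hypothetical inputs for illustration only.
dp = descriptive_power(b=[2.0, 4.0], u_b=[1.0, 1.0])
pp = predictive_power(n0=10, rmsec=0.5, rmsecv=0.6, var_y=1.0)
mp = modeling_power(dp, pp)
print(round(dp, 2), round(pp, 2), round(mp, 2))
```

With equal weights, the modeling power is simply the average of the descriptive and predictive powers.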

3.4 Additional Test on External Validation

Table 4. Results of model instability analysis for classical LOO validation for the best two models.

| Eq. no. | Type of descriptors | Model type | Descriptors | DIC | MIC | MVIC |
|---|---|---|---|---|---|---|
| M4 | Topological + Structural + Physicochemical (a) | GFA-Spline | SC-3_P | 1.174 | 5.438 | 4.571 |
| | | | $S_{17}$ | 5.975 | | |
| | | | $\langle 1.98 - \pi_p \rangle$ | 6.435 | | |
| | | | $\langle 0.14 - \pi_m \rangle$ | 6.307 | | |
| M6 | Shape + Spatial + Electronic + Physicochemical | GFA-Spline | NCOSV | 4.650 | 5.368 | 4.433 |
| | | | $\langle 1061.54 - \mathrm{PMI\_mag} \rangle$ | 5.277 | | |
| | | | $\langle 78.714 - \mathrm{Molref} \rangle$ | 5.325 | | |
| | | | $\langle 5.838 - \mathrm{AlogP98} \rangle$ | 5.456 | | |
| | | | $\langle 0.036 - \mathrm{Jurs\_FPSA\_3} \rangle$ | 6.042 | | |

(a) Along with $\pi$ values.

The models were also subjected to the test for criteria of external validation as suggested by Golbraikh and Tropsha [36]. To assess the predictive potential of the models, squared correlation coefficient values between the observed and predicted values of the test set compounds with intercept ($r^2$) and without intercept ($r^2_0$) were calculated. Interchange of the axes gives the value of $r'^2_0$. According to Golbraikh and Tropsha [36], models are considered acceptable if they satisfy all of the following conditions:

i. $Q^2 > 0.5$.
ii. $r^2 > 0.6$.
iii. $(r^2 - r^2_0)/r^2 < 0.1$ or $(r^2 - r'^2_0)/r^2 < 0.1$.
iv. $0.85 \le k \le 1.15$ or $0.85 \le k' \le 1.15$.

When the observed values of the test set compounds (Y-axis) are plotted against the predicted values of the compounds (X-axis) with the intercept set to zero, the slope of the fitted line gives the value of $k$. Interchange of the axes gives the value of $k'$. A list of values of the different validation parameters defined above for the different models is given in Table 6.
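The four acceptance conditions and the slopes $k$ and $k'$ can be sketched as follows. The function names are hypothetical, the observed and predicted values are toy numbers, and the $r'^2_0$ value passed in is an assumption; only $Q^2$, $r^2$, and $r^2_0$ are the values reported with Eq. (M6).

```python
def slope_through_origin(y, x):
    """Slope of the least-squares line y = k*x forced through the origin."""
    return sum(a * b for a, b in zip(y, x)) / sum(b * b for b in x)


def passes_external_criteria(q2, r2, r2_0, r2_0_prime, k, k_prime):
    """Conditions i-iv as listed above; returns True only if all four hold."""
    cond_i = q2 > 0.5
    cond_ii = r2 > 0.6
    cond_iii = (r2 - r2_0) / r2 < 0.1 or (r2 - r2_0_prime) / r2 < 0.1
    cond_iv = 0.85 <= k <= 1.15 or 0.85 <= k_prime <= 1.15
    return cond_i and cond_ii and cond_iii and cond_iv


# Hypothetical test-set values used only to obtain k and k'.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
k = slope_through_origin(observed, predicted)        # observed on Y, predicted on X
k_prime = slope_through_origin(predicted, observed)  # axes interchanged

# Q2, r2 and r0^2 as reported with Eq. (M6); r'0^2 is assumed equal to r0^2 here.
print(passes_external_criteria(0.666, 0.752, 0.748, 0.748, k, k_prime))
```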

It has been previously shown [37] that $R^2_{pred}$ may not be sufficient to indicate the external predictivity of a model. The value of $R^2_{pred}$ is mainly controlled by $\sum (Y_{obs(test)} - \bar{Y}_{training})^2$, i.e., the sum of squared differences between the observed values of the test set compounds and the mean observed activity of the training set. Thus, it may not truly reflect the predictive capability on a new dataset. Besides this, a high squared correlation coefficient ($r^2$) between observed and predicted values of the test set compounds does not necessarily mean that the predicted values are very near to the observed activities (there may be a considerable numerical difference between the values even though a good overall intercorrelation is maintained). So, to better indicate the external predictive potential of the model, a modified $r^2$ [$r^2_{m(test)}$] was introduced by the following equation [37]:

$$r^2_{m(test)} = r^2 \left(1 - \sqrt{r^2 - r_0^2}\right) \qquad (15)$$

The value of $r^2_{m(test)}$ should be greater than 0.5 for an acceptable model. The values of $r^2_{m(test)}$ for the different models are reported in Table 7. According to this parameter, the model with the best external predictivity is model M6.
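As a rough arithmetic check of Eq. (15), the difference $r^2 - r_0^2$ for model M6 can be backed out from the rounded entries of Table 6 ($r^2 = 0.752$ and $(r^2 - r_0^2)/r^2 = 0.005$). Because the tabulated values are rounded to three decimals, the result is only approximate:

$$r^2 - r_0^2 \approx 0.005 \times 0.752 = 0.00376, \qquad r^2_{m(test)} \approx 0.752 \left(1 - \sqrt{0.00376}\right) \approx 0.752 \times 0.9387 \approx 0.706$$

This agrees, within rounding, with the value of 0.704 reported for model M6.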

Previously the concept of $r^2_m$ was applied only to the test set prediction [37], but it can as well be applied to the training set if one considers the correlation between observed and LOO-predicted values of the training set compounds [38]. More interestingly, this can be used for the whole set considering LOO-predicted values for the training set and predicted values of the test set compounds. The advantages of such consideration are: (i) Unlike external validation parameters ($R^2_{pred}$ etc.), the $r^2_{m(overall)}$ statistic is not based only on a limited number of test set compounds. It includes predictions for both test set and training set (using LOO predictions) compounds. Thus, this statistic is based on predictions for a comparatively large number of compounds. In many cases, the test set is considerably small, and regression-based external validation parameters may be less reliable and highly dependent on individual test set observations. In such cases, the $r^2_{m(overall)}$ statistic may be advantageous. (ii) In many cases, comparable models are obtained where some models show comparatively better internal validation parameters and some other models show relatively superior external validation parameters. This may create a problem in selecting the final model. The $r^2_{m(overall)}$ statistic may be used for selection of the best predictive models from among comparable models. For the present QSAR study, we have determined $r^2_m$ values for both training (based on LOO-predicted values) and test sets and also for the whole set for the reported models, and the results are shown in Table 7. According to the aforementioned novel statistical parameter $r^2_{m(overall)}$, model M4 is the best one.
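A minimal Python sketch of how $r^2_m$ could be evaluated for the training set (LOO predictions), the test set, and the pooled set is given below. It follows Eq. (15) with observed values on the Y-axis and predicted values on the X-axis. The function names and the through-origin expression for $r_0^2$ are assumptions made for illustration, not the authors' implementation.

```python
import math
from statistics import mean


def r2_and_r0_2(y_obs, y_pred):
    """Return (r2, r0_2) for observed (Y-axis) versus predicted (X-axis) values."""
    mo, mp = mean(y_obs), mean(y_pred)
    s_op = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    s_oo = sum((o - mo) ** 2 for o in y_obs)
    s_pp = sum((p - mp) ** 2 for p in y_pred)
    r2 = s_op ** 2 / (s_oo * s_pp)
    # Least-squares slope of the line forced through the origin.
    k = sum(o * p for o, p in zip(y_obs, y_pred)) / sum(p * p for p in y_pred)
    r0_2 = 1.0 - sum((o - k * p) ** 2 for o, p in zip(y_obs, y_pred)) / s_oo
    return r2, r0_2


def r2m(y_obs, y_pred):
    """Modified r2 of Eq. (15): r2 * (1 - sqrt(r2 - r0_2))."""
    r2, r0_2 = r2_and_r0_2(y_obs, y_pred)
    # Guard against a tiny negative difference caused by floating-point rounding.
    return r2 * (1.0 - math.sqrt(max(r2 - r0_2, 0.0)))


def r2m_overall(train_obs, train_loo_pred, test_obs, test_pred):
    """Pool LOO predictions (training set) with test-set predictions."""
    return r2m(list(train_obs) + list(test_obs),
               list(train_loo_pred) + list(test_pred))
```

Calling `r2m` on the training set with LOO predictions gives $r^2_{m(LOO)}$, calling it on the test set gives $r^2_{m(test)}$, and `r2m_overall` gives the pooled statistic.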

Table 5. Results of modeling power of the best two models.

Eq. no. | Type of descriptors | Model type | Descriptive power (Dp) | Predictive power (Pp) | Modeling power (Mp)
M4 | Topological + Structural + Physicochemical (a) | GFA-Spline | 58.47% | 67.60% | 63.04%
M6 | Shape + Spatial + Electronic + Physicochemical | GFA-Spline | 67.42% | 64.00% | 65.71%

(a) Along with $\pi$ values.

Table 6. External validation criteria according to Golbraikh and Tropsha [36] for the best two models.

Eq. no. | Type of descriptors | Model type | $r^2$ | $Q^2$ | $(r^2 - r_0^2)/r^2$ | $k$
M4 | Topological + Structural + Physicochemical (a) | GFA-Spline | 0.615 | 0.700 | 0.002 | 1.001
M6 | Shape + Spatial + Electronic + Physicochemical | GFA-Spline | 0.752 | 0.666 | 0.005 | 1.008

(a) Along with $\pi$ values.

3.5 Applicability Domain (AD)

When a compound is highly dissimilar to all compounds of the modeling set, reliable prediction of its activity is unlikely. A concept of the AD [39] was used to avoid such an unjustified extrapolation of activity predictions. In this study, the AD was defined as a threshold distance ($D_T$) between a compound under prediction and its nearest neighbor in the training set, calculated as follows [39]:

$$D_T = \bar{y} + z\sigma \qquad (16)$$

Here, $\bar{y}$ is the average Euclidean distance of each compound from the center of its cluster. We have used topological and structural parameters to define the clusters. $\sigma$ is the standard deviation of these Euclidean distances, and $z$ is an arbitrary parameter to control the significance level. We set the value of $z$ to 0.5 [39]. In our study, compounds 32, 33, 34, and 86 from the test set are located outside the AD. So, we have eliminated those compounds from the test set and determined the predictive potential of model M6. The external predictive potential parameter ($R^2_{pred}$) increased significantly ($R^2_{pred} = 0.792$) on removal of these four test set compounds.
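The threshold of Eq. (16) can be illustrated with a short Python sketch. Several choices here are assumptions rather than details given in the text: a single cluster is used, its center is taken as the descriptor-wise mean of the training compounds, the population standard deviation is used for $\sigma$, and a query compound is flagged when the distance to its nearest training compound exceeds $D_T$. The function names are hypothetical.

```python
from math import dist
from statistics import mean, pstdev


def ad_threshold(train, z=0.5):
    """D_T = mean distance to the cluster center + z * standard deviation (Eq. 16)."""
    n_desc = len(train[0])
    # Cluster center assumed to be the descriptor-wise mean of the training rows.
    center = [mean(row[j] for row in train) for j in range(n_desc)]
    distances = [dist(row, center) for row in train]
    return mean(distances) + z * pstdev(distances)


def outside_domain(query, train, z=0.5):
    """True if the query's nearest training neighbor is farther away than D_T."""
    nearest = min(dist(query, row) for row in train)
    return nearest > ad_threshold(train, z)
```

In practice the rows of `train` would hold the standardized topological and structural descriptors of the training compounds, and each test compound would be passed as `query`.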

3.6 Process Randomization

Robustness of the models relating the ovicidal activity with selected descriptors was judged by randomization (Y-randomization) of the model development process. The test was done by repeatedly scrambling the activity values to generate QSAR models from the whole pool of descriptors and then comparing the resulting scores with the score of the original QSAR model generated from nonrandomized activity values. In each case except MFA, the average correlation coefficient ($R_r$) of the randomized models was significantly less than the correlation coefficient ($R$) of the nonrandomized model. The results of process randomization are shown in Table 8.

Table 7. Comparison of statistical quality parameters and validation parameters of the models.

Column groups: model quality ($R^2$, $R_a^2$, $F$, $s$); internal validation parameters ($Q^2$, PRESS, $r^2_{m(LOO)}$); external validation parameters ($R^2_{pred}$, $r^2_{m(test)}$); overall validation parameter ($r^2_{m(overall)}$); model randomization ($R_r^2$, $R_p^2$).

Eq. no. | Type of descriptors | Model type | $R^2$ | $R_a^2$ | $F$ | $s$ | $Q^2$ | PRESS | $r^2_{m(LOO)}$ | $R^2_{pred}$ | $r^2_{m(test)}$ | $r^2_{m(overall)}$ | $R_r^2$ (b) | $R_p^2$
M1 | Topological + Structural + Physicochemical | GFA-Linear | 0.598 | 0.560 | 15.9 | 1.169 | 0.514 | 105.9 | 0.393 | 0.595 | 0.560 | 0.414 | 0.076 | 0.432
M2 | Topological + Structural + Physicochemical | GFA-Spline | 0.623 | 0.600 | 27.2 | 1.116 | 0.580 | 91.4 | 0.433 | 0.627 | 0.617 | 0.446 | 0.049 | 0.472
M3 | Topological + Structural + Physicochemical (a) | GFA-Linear | 0.743 | 0.718 | 30.8 | 0.935 | 0.688 | 68.0 | 0.521 | 0.569 | 0.552 | 0.514 | 0.081 | 0.605
M4 | Topological + Structural + Physicochemical (a) | GFA-Spline | 0.740 | 0.724 | 47.0 | 0.926 | 0.700 | 65.4 | 0.529 | 0.625 | 0.596 | 0.535 | 0.052 | 0.614
M5 | Shape + Spatial + Electronic | GFA-Spline | 0.666 | 0.635 | 21.3 | 1.065 | 0.615 | 83.9 | 0.455 | 0.678 | 0.670 | 0.468 | 0.088 | 0.506
M6 | Shape + Spatial + Electronic + Physicochemical | GFA-Spline | 0.711 | 0.688 | 31.9 | 0.984 | 0.666 | 72.7 | 0.500 | 0.755 | 0.704 | 0.526 | 0.068 | 0.570
M7 | Molecular field descriptors | G/PLS-Spline | 0.574 | 0.561 | 45.8 | 1.168 | 0.531 | 102.1 | 0.531 | 0.517 | 0.582 | 0.512 | 0.005 | 0.433
M8 | Consensus | - | 0.736 | - | - | - | 0.786 | 46.6 | 0.601 | 0.716 | 0.633 | 0.611 | - | -

(a) Along with $\pi$ values.
(b) Squared mean $R$ for random models.

3.7 Model Randomization

Further statistical significance of the relationship between the ovicidal activity and the descriptors was checked by a randomization test (Y-randomization) of the models. This technique ensures the robustness of the model. The values of the dependent variable were randomly scrambled and new QSAR models were developed keeping the independent variable matrix unchanged. The randomization tests for the models have been performed at the 99% confidence level. The test has been done by shuffling the ovicidal activity values, and for each model the average value of the correlation coefficient ($R_r$) was calculated. For an acceptable QSAR model, the average correlation coefficient ($R_r$) of the randomized models should be less than the correlation coefficient ($R$) of the nonrandomized model. No clear-cut recommendation was found in the literature for the difference between the average correlation coefficient ($R_r$) of randomized models and the correlation coefficient ($R$) of the nonrandomized model. We have used a parameter $R_p^2$ in the present paper, which penalizes the model $R^2$ for the difference between the squared mean correlation coefficient ($R_r^2$) of randomized models and the squared correlation coefficient ($R^2$) of the nonrandomized model. The above-mentioned novel parameter can be calculated by the following equation:

$$R_p^2 = R^2 \sqrt{R^2 - R_r^2} \qquad (17)$$

This novel parameter $R_p^2$ ensures that the models thus developed are not obtained by chance. For an acceptable QSAR model, the value of $R_p^2$ should be greater than 0.5. The values of $R^2$, $R_r^2$, and $R_p^2$ for the different models are reported in Table 7. According to this parameter, model M4, i.e., the QSAR model with topological, structural, and physicochemical descriptors including substituent hydrophobicity, is the best one.
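The model randomization test and Eq. (17) can be sketched in a few lines of Python. This is a simplified illustration, not the procedure used in the study: a single-descriptor linear model is assumed so that $R$ reduces to the absolute Pearson correlation, and the function names, the number of trials, and the seed are hypothetical. Following the footnote to Table 7, $R_r^2$ is taken as the square of the mean $R$ over the randomized models.

```python
import math
import random
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r2p(r2, r2_r):
    """Eq. (17): R_p^2 = R^2 * sqrt(R^2 - R_r^2); undefined if R_r^2 exceeds R^2."""
    return r2 * math.sqrt(r2 - r2_r)


def y_randomization(x, y, n_trials=100, seed=0):
    """Shuffle the activities, keep the descriptor fixed, and return (R^2, R_r^2)."""
    rng = random.Random(seed)
    r2 = pearson_r(x, y) ** 2
    random_rs = []
    for _ in range(n_trials):
        y_shuffled = list(y)
        rng.shuffle(y_shuffled)
        random_rs.append(abs(pearson_r(x, y_shuffled)))
    r2_r = mean(random_rs) ** 2  # squared mean R of the randomized models
    return r2, r2_r
```

With the Table 7 entries for model M4 ($R^2 = 0.740$, $R_r^2 = 0.052$), `r2p(0.740, 0.052)` rounds to 0.614, and for model M6 `r2p(0.711, 0.068)` rounds to 0.570, matching the tabulated $R_p^2$ values.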

3.8 Comparison with Previous Work

Suzuki et al. [7] worked on the same dataset, eliminating four compounds (compounds 36 – 38 and 45) as outliers, and generated classical QSAR models on the basis of the remaining 86 compounds. They used hydrophobic ($\pi$ and log P), electronic ($\sigma$), and steric ($E_s$ and STERIMOL) parameters and some indicator variables ($I_o$, $I_m$, $I_p$). They obtained a parabolic relationship between log P (lipophilicity) and the ovicidal activity. The best model generated by them using 86 compounds had a correlation coefficient ($R$) of 0.861 (i.e., $R^2 \approx 0.741$), which is quite close to that of our best model, M4 ($R^2 = 0.740$). However, Suzuki et al. did not validate their models either internally or externally. We have validated the model developed by Suzuki et al. [7] internally, and the LOO-predicted variance ($Q^2$) on the basis of 86 compounds was found to be 0.689. This is less than the internal predictive variance ($Q^2 = 0.700$) obtained for Eq. (M4) of our study, which included the 4 compounds that had been excluded by Suzuki et al. [7]. This reflects the superiority of our model [Eq. (M4)] over the previously reported model [7]. When we excluded those 4 compounds from the training set, developed an equation using the same combination of variables as present in Eq. (M4), and then validated the model internally, the $Q^2$ value obtained was 0.726, which is tangibly higher than the $Q^2$ value (0.689) of the model reported by Suzuki et al.

It may be noted here that we have not used any indicator variables in our study as used by Suzuki et al. [7], and we have performed different sorts of validation [internal, external, process randomization, model randomization, and checking special recommendation criteria for different aspects of validation (vide supra)]. In addition to the classical type of QSAR, we have also performed 3D QSAR analysis.

Table 8. Results of randomization test applied on the model development process.

Eq. no. | Type of descriptors | Model type | $R$ | $R_r$
M1 | Topological + Structural + Physicochemical | GFA-Linear | 0.756 | 0.323
M2 | Topological + Structural + Physicochemical | GFA-Spline | 0.798 | 0.336
M3 | Topological + Structural + Physicochemical (a) | GFA-Linear | 0.861 | 0.372
M4 | Topological + Structural + Physicochemical (a) | GFA-Spline | 0.865 | 0.348
M5 | Shape + Spatial + Electronic | GFA-Spline | 0.808 | 0.450
M6 | Shape + Spatial + Electronic + Physicochemical | GFA-Spline | 0.811 | 0.447
M7 | Molecular field descriptors | G/PLS-Spline | 0.693 | 0.519

(a) Along with $\pi$ values.

3.9 Consensus Modeling

We have used models that are considered acceptable according to Golbraikh and Tropsha [36] and used $R^2$, $Q^2$, $R^2_{pred}$, $r^2_{m(LOO)}$, $r^2_{m(test)}$, and $r^2_{m(overall)}$ to evaluate the statistical quality of the QSAR models. However, different models were found to be the best according to different criteria. The GFA model M4 was found to be the best model among the models derived from 2D descriptors on the basis of the highest internal ($Q^2 = 0.700$) and external ($R^2_{pred} = 0.625$) predictive potential. Also, based upon the $r^2_{m(test)}$ values, which penalize a model for large differences between observed and predicted values, the model generated by GFA using 2D descriptors (model M4) was found to be superior ($r^2_{m(test)} = 0.596$) in comparison to the other models derived from 2D parameters. Again, the GFA model M6 was found to be the best model among the models derived from 3D descriptors based on maximum internal ($Q^2 = 0.666$) and external ($R^2_{pred} = 0.755$) predictive power. Furthermore, based upon the $r^2_{m(test)}$ values, the model generated by GFA using 3D descriptors (model M6) was found to be superior ($r^2_{m(test)} = 0.704$) in comparison to the other models derived from 3D descriptors. According to the newly introduced parameter $r^2_{m(overall)}$, the best model among all the models generated by GFA is model M4 [$r^2_{m(overall)} = 0.535$]. We have tried a consensus model [39], which is based on the results obtained by all predictive models (models M1 – M7), with the hope that it may provide the most stable solution. In general, consensus prediction implies averaging the predictions for each compound made by individual models for continuous QSAR, or majority voting for classification QSAR, using all models passing the validation criteria. The consensus modeling has been done by averaging the pLC50 values calculated (for the training set) and predicted (for the test set) by all seven types of models. The statistical qualities of the consensus model are included in Table 7.
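The averaging step behind the consensus model can be sketched as follows. The Python function below is a hypothetical illustration: it assumes each model's pLC50 values are stored as a list in the same compound order, and the model names and numbers in the usage lines are invented placeholders rather than values from the study.

```python
from statistics import mean


def consensus_prediction(predictions):
    """Average the pLC50 values given by the individual models, compound by compound.

    predictions: dict mapping a model name to a list of pLC50 values,
    all lists being in the same compound order.
    """
    lengths = {len(values) for values in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must provide a value for every compound")
    # zip(*...) groups the values of all models for one compound at a time.
    return [mean(per_compound) for per_compound in zip(*predictions.values())]


# Usage with placeholder numbers (not data from the paper):
example = {"M1": [5.0, 6.0], "M2": [7.0, 8.0]}
averaged = consensus_prediction(example)
```

The averaged values can then be passed to the same validation statistics as the individual models to fill in the consensus row of a comparison table.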

4 Conclusions

The present studies have been performed with 2D (physicochemical, structural, and topological) and 3D (shape, spatial, electronic, and molecular field) descriptors. The whole dataset (n = 90) was divided into a training set (75% of the dataset) and a test set (remaining 25%) on the basis of the K-means clustering technique applied to the standardized topological and structural descriptor matrix. Models developed from the training set compounds were used to predict the activity of the test set compounds. A comparison of the statistical quality of the different models is given in Table 7. Modeling with 2D descriptors reflected the importance of Zagreb, subgraph count (SC_1, SC-3_P), electro-topological state indices of specific fragment types (S_dO, S_ssCH2), electro-topological state indices of specific atoms ($S_{16}$, $S_{17}$), Molref, and hydrophobicity of substituents and the whole molecule ($\pi_m$, $\pi_p$, Alog P98). Among the 3D parameters, spatial (Jurs_RPCG, Jurs_FPSA_3, PMI_mag), shape (COSV, NCOSV, ShapeRMS), and electronic (HOMO, Apol) descriptors showed importance. MFA suggests the importance of probes (H$^+$, CH$_3$, CH$_3^-$) at definite locations. The GFA model M4 was found to be the best model among the models derived from 2D descriptors on the basis of the highest internal ($Q^2 = 0.700$) and external ($R^2_{pred} = 0.625$) predictive potential. Also, based upon the $r^2_{m(test)}$ values, which penalize a model for large differences between observed and predicted values, the model generated by GFA using 2D descriptors (model M4) was found to be superior ($r^2_{m(test)} = 0.596$) in comparison to the other models derived from 2D parameters. Again, the GFA model M6 was found to be the best model among the models derived from 3D descriptors based on maximum internal ($Q^2 = 0.666$) and external ($R^2_{pred} = 0.755$) predictive power. Furthermore, based upon the $r^2_{m(test)}$ values, the model generated by GFA using 3D descriptors (model M6) was found to be superior ($r^2_{m(test)} = 0.704$) in comparison to the other models derived from 3D descriptors. According to the newly introduced parameter $r^2_{m(overall)}$, the best model among all the models generated by GFA is model M4 [$r^2_{m(overall)} = 0.535$]. According to the parameter $R_p^2$ reported in this paper, model M4 ($R_p^2 = 0.614$) is the best one. On the basis of the potential of external predictivity, model M6 is the best one. If we use the concept of AD on model M6 and eliminate the four test set compounds located outside the AD, then we obtain a significantly higher external predictive variance ($R^2_{pred} = 0.792$).

Models obtained by using 2D parameters revealed that the chain length of the substituent at the para position of the 4-phenyl ring is a critical factor. Initially the ovicidal activity is enhanced as the substituent chain length increases, but after a certain limit the activity decreases even though the chain length increases. This implies that the lipophilicity of the substituents should be optimum. Electro-topological state indices of specific atoms ($S_{17}$, $S_{16}$) and the substituent hydrophobicity parameter suggest that the presence of long-chain para substituents containing an electronegative atom directly attached to the 4-phenyl ring or in its close vicinity may increase the ovicidal activity. The value of the substituent hydrophobicity constant at the para position should be within 1.98 – 2.90 for optimal activity. Lipophilicity of the whole molecule also plays a dominant role. Models generated from 3D descriptors suggest that the shape of the substituents should be optimum and that lipophilic substituents having electronegative atoms with positive charge distributed over the surface may enhance the ovicidal activity. The model obtained from MFA suggests that bulky substituents with optimally distributed charge may increase the ovicidal activity. The results of our present study may be useful for the design and development of novel compounds having better ovicidal activity against T. urticae.

References

[1] http://en.wikipedia.org/wiki/Insecticides.
[2] http://en.wikipedia.org/wiki/Insecticide.
[3] http://en.wikipedia.org/wiki/Red_spider_mite.
[4] A. C. Grosscurt, M. T. Harr, B. Jongsma, A. Stoker, Pestic. Sci. 1988, 22, 51 – 59.
[5] B. S. Clarke, P. J. Jewess, Pestic. Sci. 1990, 28, 377 – 388.
[6] L. Nippon Soda Co., Japan Pestic. Information 1984, 44, 21 – 24.
[7] J. Suzuki, I. Tanji, Y. Ota, K. Toda, Y. Nakagawa, J. Pestic. Sci. 2006, 31, 409 – 416.
[8] G. F. Yang, X. Huang, Curr. Pharm. Des. 2006, 12, 4601 – 4611.
[9] Y. Q. Zhu, C. Wu, H. Li, X. M. Zou, X. K. Si, F. Z. Hu, H. Z. Yang, J. Agric. Food Chem. 2007, 55, 1364 – 1369.
[10] J. G. Wang, Z. M. Li, N. Ma, B. L. Wang, L. Jiang, S. S. Pang, Y. T. Lee, L. W. Guddat, R. G. Duggleby, J. Comput. Aided Mol. Des. 2005, 19, 801 – 820.
[11] X. J. Zou, G. Y. Jin, Z. X. Zhang, J. Agric. Food Chem. 2002, 50, 1451 – 1454.
[12] A. J. Grodnitzky, R. J. Coats, J. Agric. Food Chem. 2002, 50, 4576 – 4580.
[13] Cerius2 Version 4.10 is a product of Accelrys Inc., San Diego, CA.
[14] L. Eriksson, J. Jaworska, A. P. Worth, M. T. D. Cronin, R. M. McDowell, Environ. Health Perspect. 2003, 111, 1361 – 1375.
[15] R. Guha, P. C. Jurs, J. Chem. Inf. Model. 2005, 45, 65 – 73.
[16] J. T. Leonard, K. Roy, QSAR Comb. Sci. 2006, 25, 235 – 251.
[17] K. Roy, Expert. Opin. Drug Discov. 2007, 2, 1567 – 1577.
[18] B. Everitt, S. Landau, M. Leese, Cluster Analysis, Arnold, London 2001.
[19] E. R. Dougherty, J. Barrera, M. Brun, S. Kim, R. M. Cesar, Y. Chen, M. Bittner, J. M. Trent, J. Comput. Biol. 2002, 9, 105 – 126.
[20] A. R. Johnson, W. D. Wichern, Applied Multivariate Statistical Analysis, 5th edition 2005, 668 – 730.
[21] A. J. Hopfinger, J. S. Tokarsi, In: Charifson PS (Ed.), Practical Applications of Computer-Aided Drug Design, Marcel Dekker, New York 1997, 105 – 164.
[22] D. Rogers, A. J. Hopfinger, J. Chem. Inf. Comput. Sci. 1994, 34, 854 – 866.
[23] A. Hirashima, T. Eiraku, E. Kuwano, M. Eto, Internet. Electron. J. Mol. Des. 2003, 2, 511 – 526.
[24] S. Wold, In: van de Waterbeemd H, (Ed.), Chemometric Methods in Molecular Design, Weinheim, VCH 1995, p. 195.
[25] Y. Fan, L. M. Shi, K. W. Kohn, Y. Pommier, J. N. Weinstein, J. Med. Chem. 2001, 44, 3254.
[26] P. P. Roy, J. T. Leonard, K. Roy, Chemom. Intell. Lab. Sys. 2008, 90, 31 – 42.
[27] H. Kubinyi, F. A. Hamprecht, T. Mietzner, J. Med. Chem. 1998, 41, 2553 – 2564.
[28] G. R. Marshall, in: H. Kubinyi (Ed.), 3D QSAR in Drug Design – Theory, Methods and Applications, ESCOM, Leiden 1994, 117 – 133.
[29] S. Deswal, N. Roy, Eur. J. Med. Chem. 2006, 11, 1339 – 1346.
[30] MINITAB is a statistical software of Minitab Inc., USA.
[31] STATISTICA is a statistical software of STATSOFT Inc., USA.
[32] SPSS is a statistical software of SPSS Inc., USA.
[33] A. Ghose, V. N. Viswanadhan, J. J. Wendoloski, J. Phys. Chem. 1998, 102, 3762 – 3772.
[34] E. Kolossov, R. Stanforth, SAR QSAR Environ. Res. 2007, 18, 89 – 100.
[35] S. Sagradoa, M. T. D. Cronin, Anal. Chim. Acta 2008, 609, 169 – 174.
[36] A. Golbraikh, A. Tropsha, J. Mol. Graph. Model. 2002, 20, 269 – 276.
[37] P. P. Roy, K. Roy, QSAR Comb. Sci. 2008, 27, 302 – 313.
[38] P. P. Roy, K. Roy, Chem. Biol. Drug Des. 2008, 72, 370 – 382.
[39] L. Zhang, H. Zhu, T. Oprea, A. Golbraikh, A. Tropsha, Pharm. Res. 2008, 25, 1902 – 1914.

QSAR Comb. Sci. 28, 2009, No. 4, 406 – 425 www.qcs.wiley-vch.de © 2009 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
