

  • Indian Journal of Chemistry, Vol. 42A, June 2003, pp. 1426-1435

    Use of distance-based topological indices in modeling antihypertensive activity: Case of 2-aryl-imino-imidazolidines

    Vijay K Agrawal a,*, Sneha Karmarkar b, Padmakar V Khadikar b & Shachi Shrivastava a

    a QSAR and Computer Chemical Laboratories, APS University, Rewa 486 003, India, e-mail: vijay-agrawal@lycos.com

    b Research Division, Laxmi Fumigation and Pest Control Pvt. Ltd., 3 Khatipura, Indore 452007, India

    and

    Istvan Lukovits

    Chemical Research Center, Hungarian Academy of Sciences, H-1525 Budapest, P.O.B. 17, Hungary, e-mail: lukovits@chemres.hu

    Received 31 October 2002

    This paper describes topological modeling of the antihypertensive activity of 2-aryl-imino-imidazolidines. A large pool of distance-based topological indices consisting of W, B, χ, J, Sz, and log RB is initially used for this purpose. An excellent model is obtained in multi-parametric regression containing χ and log RB along with two indicator parameters.

    A basic problem in drug design consists of finding a compound satisfying various constraints defined over a spectrum of chemical and biological properties. Although the problem of designing drugs pervades much of pharmaceutical research, statisticians have yet to become significantly involved in this important realm of research.

    In contrasting graph theoretical schemes with "traditional" quantitative structure-activity relationship (QSAR) methods 1-5, one must not be wary of using a complementary approach. Traditional QSAR is usually based on a large number of empirical parameters 1-7. The graph theoretical approach involves a rather small set of structural or graph invariants. In QSAR, one uses statistical methods in order to select critical descriptors and demonstrate a structure-activity correlation. In graph theory, one manipulates a structure algebraically, using partial order and ranking based on selected standards. Of course, graph theoretical descriptors also yield structure-property or structure-activity correlations 8-10.

    Application of graph theory in QSAR covers a variety of topics, from the study of various physicochemical data to biological activity and toxicity, including graph theoretical descriptors and pattern recognition 1-10. The prime distinction between graph theoretical schemes and traditional QSAR is that the former is "structure explicit" while the latter is "structure cryptic". The former uses well defined mathematical invariants which have a direct structural interpretation, while the latter is mostly expressed in terms of properties that remain to be interpreted structurally. The combination of topological QSAR methodology with the experience and intuition of experts in drug design may result in a much more organized search for novel drugs for human, animal, and plant therapy.

    It has been known for some time that certain invariants of the molecular graph, usually referred to as topological indices, can be used to demonstrate QSAR in pharmacology. One such index is the Wiener index (W), introduced 50 years ago by the American chemist Harry Wiener 11. W is now considered to represent a measure of the compactness of molecules. However, its relation with the molecular van der Waals area has been demonstrated only recently 12.

    Recently, a new topological index introduced by Gutman 13,14 was named the Szeged index 14 and abbreviated as Sz. This index is considered a modification of W for cyclic graphs. Some of the properties of Sz are reported in the literature 15-17, but very few applications of Sz in QSAR studies are known 18-22.


    Nowadays, the usual practice in developing QSAR models is to initially use a large set of topological indices. If needed, additional structure-related molecular descriptors and/or indicator parameters can also be used. In the present study we have, therefore, used six distance-based topological indices: the Wiener (W), branching (B), first-order connectivity (χ), Balaban (J), Szeged (Sz) and log RB indices. In addition, we have also initially used a set of four indicator parameters, the details of which are given in the next section 23.

    Materials and Methods

    (i) Definitions of the Wiener (W), Szeged (Sz), Balaban (J) and molecular connectivity (χ) indices

    Let G be the usual, hydrogen atom depleted, graph representation of the molecule under consideration 24. Hence, G is a connected graph without directed and multiple edges and without loops. V(G) and E(G) denote the vertex and edge sets of G, respectively. If e is an edge of G connecting the vertices u and v, then e = uv. The number of vertices of G is denoted by |G|.

    The distance between the vertices of G is defined as usual 24: the distance d(u, v|G) between two vertices u and v of G is equal to the length of the shortest path connecting these vertices.

    Wiener index (W)

    The Wiener index (W) 11 of a graph G is the sum of the distances between all pairs of vertices of G:

    $$W = W(G) = \frac{1}{2} \sum_{v \in V(G)} \sum_{u \in V(G)} d(v, u|G) = \frac{1}{2} \sum_{v \in V(G)} d(v|G) \tag{1}$$

    where d(v|G) is called the distance number of vertex v and is defined as:

    $$d(v|G) = \sum_{u \in V(G)} d(v, u|G) \tag{2}$$

    Szeged index (Sz)

    Let e = uv ∈ E(G). Then we define two subsets of the vertex set of G as follows:

    $$N_1(e|G) = \{x \in V(G) \mid d(x, u|G) < d(x, v|G)\} \tag{3}$$

    $$N_2(e|G) = \{x \in V(G) \mid d(x, u|G) > d(x, v|G)\} \tag{4}$$

    The numbers of elements in the sets N1(e|G) and N2(e|G) are denoted by n1(e|G) and n2(e|G), respectively. Thus, n1(e|G) counts the vertices of G lying closer to vertex u than to v. The meaning of n2(e|G) is analogous. The vertices equidistant from both ends of the edge uv belong neither to N1(e|G) nor to N2(e|G).

    The Szeged index 13,14 of the graph G is the sum of all edge contributions:

    $$Sz = Sz(G) = \sum_{e \in E(G)} n_1(e|G)\, n_2(e|G) \tag{5}$$
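The definitions of W and Sz in Eqs (1) to (5) can be sketched in a few lines of Python. This is only an illustrative sketch: the edge list `EDGES`, its atom numbering and the helper names are assumptions made here, with the graph taken to be the hydrogen-depleted skeleton of the unsubstituted parent compound (compound 18). Under that assumption the sketch gives W = 209 and Sz = 309, the values listed for compound 18 in Table 2.

```python
from collections import deque

# Hydrogen-depleted graph assumed for the unsubstituted parent compound (compound 18).
# Hypothetical atom numbering chosen for this sketch:
#   0-5 benzene ring, 6 imino N, 7-11 imidazolidine ring (7 = ring C bonded to atom 6)
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7),
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 7)]


def adjacency(edges):
    """Build an adjacency list {vertex: [neighbours]} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def distances_from(adj, source):
    """Breadth-first search giving d(source, x|G) for every vertex x."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def wiener_index(edges):
    """Eq. (1): half the sum of the distance numbers d(v|G) of all vertices."""
    adj = adjacency(edges)
    return sum(sum(distances_from(adj, v).values()) for v in adj) // 2


def szeged_index(edges):
    """Eq. (5): sum over edges e = uv of n1(e|G) * n2(e|G)."""
    adj = adjacency(edges)
    dist = {v: distances_from(adj, v) for v in adj}
    total = 0
    for u, v in edges:
        n1 = sum(1 for x in adj if dist[x][u] < dist[x][v])  # vertices closer to u
        n2 = sum(1 for x in adj if dist[x][u] > dist[x][v])  # vertices closer to v
        total += n1 * n2
    return total


if __name__ == "__main__":
    print("W  =", wiener_index(EDGES))
    print("Sz =", szeged_index(EDGES))
```

For an acyclic graph the two indices coincide, which is a quick sanity check: the path with four vertices gives W = Sz = 10.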

    Balaban index (J)

    The Balaban index, J (the average distance sum connectivity index), is defined 25,26 by:

    $$J = \frac{M}{\mu + 1} \sum_{\text{bonds } i-j} (d_i d_j)^{-1/2} \tag{6}$$

    where M is the number of bonds in a graph G, μ is the cyclomatic number of G and the d_i (i = 1, 2, 3, ..., N) are the distance sums (distance degrees) of the atoms in G, such that

    $$d_i = \sum_{j=1}^{N} (D)_{ij} \tag{7}$$

    with (D)_ij the elements of the distance matrix of G. The cyclomatic number μ of G indicates the number of independent cycles in G and is equal to the minimum number of cuts (removals of bonds) necessary to convert a polycyclic structure into an acyclic structure:

    $$\mu = M - N + 1 \tag{8}$$
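A minimal sketch of the Balaban index for a simple (all-carbon-like) graph follows, using unit bond lengths and no hetero-atom correction. The edge list `EDGES` and its atom numbering are assumptions made for illustration; the graph is taken to be the hydrogen-depleted skeleton of compound 18. With this assumption the sketch gives J ≈ 1.6943, the value listed for compound 18 in Table 2.

```python
from collections import deque
from math import sqrt

# Hydrogen-depleted graph assumed for compound 18 (hypothetical numbering):
#   0-5 benzene ring, 6 imino N, 7-11 imidazolidine ring
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7),
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 7)]


def adjacency(edges):
    """Build an adjacency list {vertex: [neighbours]} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def distance_sum(adj, source):
    """Distance sum d_i: sum of shortest-path lengths from `source` to all atoms."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values())


def cyclomatic_number(edges):
    """mu = M - N + 1 for a connected graph with M bonds and N atoms."""
    return len(edges) - len(adjacency(edges)) + 1


def balaban_index(edges):
    """J = M / (mu + 1) * sum over bonds i-j of (d_i * d_j) ** -0.5."""
    adj = adjacency(edges)
    d = {v: distance_sum(adj, v) for v in adj}
    bond_sum = sum(1.0 / sqrt(d[u] * d[v]) for u, v in edges)
    return len(edges) / (cyclomatic_number(edges) + 1) * bond_sum


if __name__ == "__main__":
    print("mu =", cyclomatic_number(EDGES))
    print("J  = %.4f" % balaban_index(EDGES))
```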

    One way to compute the Balaban index (J) for a hetero-system is to modify the elements of the distance matrix as follows:

    (i) The diagonal elements:

    $$(D)_{ii} = 1 - \frac{Z_C}{Z_i} \tag{9}$$

    where Z_C = 6 and Z_i is the atomic number of the given element.

    (ii) The off-diagonal elements:

    $$(D)_{ij} = \sum_{r} k_r \tag{10}$$

    where the summation is over all bonds. The bond parameter k_r is given by:

    $$k_r = \frac{1}{b_r} \cdot \frac{Z_C^2}{Z_i Z_j}$$

    where b_r is the bond weight with values: 1 for a single bond, 2 for a double bond, 1.5 for an aromatic bond and 3 for a triple bond.

    Molecular connectivity indices (χ)

    The connectivity index χ = χ(G) of a graph G is defined by Randic 3,4,27 as follows:

    $$\chi = \sum_{\text{bonds } i-j} (\delta_i \delta_j)^{-1/2} \tag{11}$$

    where δ_i and δ_j are the valences of the vertices i and j, equal to the number of bonds connected to the atoms i and j in G.

    In the case of hetero-systems the connectivity is given in terms of the valence delta values δ_i^v and δ_j^v of atoms i and j and is denoted by χ^v. This version of the connectivity index is called the valence connectivity index and is defined 3,4,27 as:

    $$\chi^v = \sum_{\text{bonds } i-j} (\delta_i^v \delta_j^v)^{-1/2} \tag{12}$$

    where the sum is taken over all bonds i-j of the molecule. Valence delta values are given by the following expression:

    $$\delta_i^v = \frac{Z_i^v - H_i}{Z_i - Z_i^v - 1} \tag{13}$$

    where Z_i is the atomic number of atom i, Z_i^v is the number of valence electrons of atom i and H_i is the number of hydrogen atoms attached to atom i.

    Nowadays the connectivity and the valence connectivity indices expressed by Eqs (11) and (12) are termed first-order connectivity and first-order valence connectivity indices, respectively. Lower or higher order indices are also possible and are defined analogously.
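The first-order connectivity index of Eq. (11) needs only the vertex degrees, as the following sketch shows. The edge list `EDGES` and its atom numbering are assumptions made for illustration (the hydrogen-depleted skeleton of compound 18); the valence version of Eq. (12) is not covered here. With this assumption the sketch gives χ ≈ 5.9495, the value listed for compound 18 in Table 2.

```python
from math import sqrt

# Hydrogen-depleted graph assumed for compound 18 (hypothetical numbering):
#   0-5 benzene ring, 6 imino N, 7-11 imidazolidine ring
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
         (0, 6), (6, 7),
         (7, 8), (8, 9), (9, 10), (10, 11), (11, 7)]


def vertex_degrees(edges):
    """delta_i: number of bonds attached to each atom of the graph."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def connectivity_index(edges):
    """Eq. (11): sum over bonds i-j of (delta_i * delta_j) ** -0.5."""
    degree = vertex_degrees(edges)
    return sum(1.0 / sqrt(degree[u] * degree[v]) for u, v in edges)


if __name__ == "__main__":
    print("chi = %.4f" % connectivity_index(EDGES))
```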

    The branching index log RB has been calculated by the method described by Todeschini et al. 8-10

    (ii) Indicator parameters

    These are dummy parameters that are sometimes used to obtain better (i.e. statistically more significant) QSAR models in multivariate regression analysis. In the present study we have used four such dummy parameters (indicator parameters): Ip1, Ip2, Ip3 and Ip4. The indicator parameter Ip1 is equal to one if a chloro group is present; otherwise its value is zero. Ip2 is equal to one if a methyl group is present and zero in the absence of a methyl group. Ip3 and Ip4 are equal to one if two or more methyl or chloro groups are present, respectively; otherwise their values are zero.
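The indicator rules above can be written as a small function of the substituent counts. In the sketch below the tuple layout of `COMPOUNDS` (number of chloro groups, number of methyl groups, log(1/ED50)) is an assumption made for illustration, with the counts read off the substituent names in Table 1. As a consistency check, the resulting indicators give correlation coefficients of about 0.6165 (activity vs Ip1) and 0.4947 (Ip1 vs Ip4), the values that appear in Table 3.

```python
from math import sqrt

# (number of Cl groups, number of CH3 groups, log(1/ED50)) for compounds 1-18,
# counted from the substituent names in Table 1 (layout assumed for this sketch).
COMPOUNDS = [
    (2, 0, 2.14), (3, 0, 1.41), (2, 0, 1.37), (2, 1, 1.22), (1, 1, 1.18),
    (0, 2, 0.85), (2, 0, 0.68), (1, 1, 0.68), (2, 1, 0.57), (1, 2, 0.52),
    (2, 0, 0.32), (1, 0, 0.15), (1, 2, -0.04), (1, 1, -0.05), (0, 3, -0.07),
    (0, 2, -0.56), (0, 1, -0.61), (0, 0, -2.10),
]


def indicators(n_chloro, n_methyl):
    """Return (Ip1, Ip2, Ip3, Ip4) following the rules stated in the text."""
    ip1 = 1 if n_chloro >= 1 else 0   # a chloro group is present
    ip2 = 1 if n_methyl >= 1 else 0   # a methyl group is present
    ip3 = 1 if n_methyl >= 2 else 0   # two or more methyl groups
    ip4 = 1 if n_chloro >= 2 else 0   # two or more chloro groups
    return ip1, ip2, ip3, ip4


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    ip = [indicators(cl, me) for cl, me, _ in COMPOUNDS]
    activity = [act for _, _, act in COMPOUNDS]
    ip1 = [row[0] for row in ip]
    ip4 = [row[3] for row in ip]
    print("r(activity, Ip1) = %.4f" % pearson(activity, ip1))
    print("r(Ip1, Ip4)      = %.4f" % pearson(ip1, ip4))
```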

    (iii) Statistics

    Multiple regression analysis 28,29 for correlating the antihypertensive activity of the compounds under study was done using the Regress-1 software compiled by one of the authors (IL).

    Results

    The formulae of the 2-aryl-imino-imidazolidines, their antihypertensive activity (expressed as log 1/ED50), and the indicator parameters (Ip1, Ip2, Ip3 and Ip4) are listed in Table 1.

    Table 2 shows the distance-based topological indices used in the present study.

    Table 1. 2-Aryl-imino-imidazolidines used in the present study, their log(1/ED50) and indicator values

    Compd No.  R                 log(1/ED50)  Ip1  Ip2  Ip3  Ip4
    1          2,6-Di-Cl          2.14        1    0    0    1
    2          2,4,6-Tri-Cl       1.41        1    0    0    1
    3          2,3-Di-Cl          1.37        1    0    0    1
    4          2,6-Di-Cl,4-Me     1.22        1    1    0    1
    5          2-Cl,6-Me          1.18        1    1    0    0
    6          2,6-Di-Me          0.85        0    1    1    0
    7          2,4-Di-Cl          0.68        1    0    0    1
    8          2-Cl,4-Me          0.68        1    1    0    0
    9          2,4-Di-Cl,6-Me     0.57        1    1    0    1
    10         2,4-Di-Me,6-Cl     0.52        1    1    1    0
    11         2,5-Di-Cl          0.32        1    0    0    1
    12         2-Cl               0.15        1    0    0    0
    13         2,6-Di-Me,4-Cl    -0.04        1    1    1    0
    14         2-Me,4-Cl         -0.05        1    1    0    0
    15         2,4,6-Tri-Me      -0.07        0    1    1    0
    16         2,4-Di-Me         -0.56        0    1    1    0
    17         2-Me              -0.61        0    1    0    0
    18         H                 -2.10        0    0    0    0

    Table 2. Values of topological indices calculated for the compounds used in the present study

    Compd No.  W    χ       J       Sz   log RB
    1          301  6.7709  1.8619  437   94.6795
    2          365  7.1647  1.8830  534  113.8738
    3          306  6.7709  1.8270  447   95.5756
    4          365  7.1647  1.8830  534  113.8738
    5          301  6.7709  1.8619  437   94.6795
    6          301  6.7709  1.8619  437   94.6795
    7          313  6.7540  1.7867  461   96.8441
    8          313  6.7540  1.7867  461   96.8441
    9          365  7.1647  1.8830  534  113.8738
    10         365  7.1647  1.8830  534  113.8738
    11         308  6.7540  1.8156  451   96.0864
    12         253  6.3602  1.7759  370   79.0361
    13         365  7.1647  1.8830  534  113.8738
    14         313  6.7540  1.7867  461   96.8441
    15         365  7.1647  1.8830  534  113.8738
    16         313  6.7540  1.7867  461   96.8441
    17         253  6.3602  1.7759  370   79.0361
    18         209  5.9495  1.6943  309   64.7790

    Correlations between the aforementioned molecular descriptors and antihypertensive activity (log 1/ED50) are given in Table 3.

    Table 4 records the statistical parameters and quality of various statistically significant uni- and multivariate regression equations. Table 5 collects different significant models. The estimated antihypertensive activity values (log 1/ED50) obtained from the most significant QSAR models are presented in Table 6 and are compared with the observed values.

    Finally, Fig. 1 displays the correlation between observed and estimated (obtained by using the most significant correlation expressions) antihypertensive activities (log 1/ED50).

    Table 3. Correlation matrix for the inter-correlation of structural descriptors and their correlation with the activity

                 log(1/ED50)  W       χ       J       Sz      log RB  Ip1      Ip2      Ip3      Ip4
    log(1/ED50)   1.0000
    W             0.4844      1.0000
    χ             0.5576      0.9930  1.0000
    J             0.6704      0.8522  0.9029  1.0000
    Sz            0.4662      0.9995  0.9887  0.8346  1.0000
    log RB        0.5015      0.9993  0.9962  0.8707  0.9976  1.0000
    Ip1           0.6165      0.3819  0.3917  0.3291  0.3805  0.3821   1.0000
    Ip2          -0.1216      0.3936  0.3904  0.3344  0.3939  0.3919  -0.2403   1.0000
    Ip3          -0.1906      0.3756  0.3751  0.3544  0.3738  0.3775  -0.4462   0.4947   1.0000
    Ip4           0.5803      0.3024  0.3141  0.2936  0.2991  0.3060   0.4947  -0.5325  -0.4947  1.0000

    Discussion

    The data in Table 1 show that degeneracy exists in the antihypertensive activity (log 1/ED50). The data in Table 2 also indicate that a similar type of degeneracy exists in the distance-based topological indices. The degeneracy in these indices is to be expected because they belong to the first-generation topological indices as described by Balaban 30.

    It is worth mentioning that the magnitude of all topological indices used increases as the molecule becomes bigger through substitution. Mono-substituted compounds exhibit lower, while tri-substituted compounds exhibit greater, values of these indices.

    Table 4. Regression parameters and quality of the proposed models

    Model  Parameter  Ai (i = 1, 2, 3, 4)   B          Se      R2      R       F-ratio  Q (R/Se)  Prob.
    No.    used
    1      J          11.6041 (±3.2105)     -20.7965   0.7312  0.4495  0.6704  13.064   0.9168    2.327×10^-3
    2      χ          21.8397 (±4.6460)     -100.5836  0.5599  0.6973  0.8351  17.280   1.4915    1.280×10^-4
           log RB     -0.4847 (±0.1108)
    3      χ          1.0620 (±0.5653)      -7.5071    0.7210  0.4981  0.7058  7.444    0.9789    5.681×10^-3
           Ip1        0.9753 (±0.4124)
    4      χ          2.0300 (±0.5932)      -12.9252   0.7570  0.4467  0.6684  6.056    0.8829    0.0118
           Ip2        -0.7628 (±0.3976)
    5      χ          1.1842 (±0.5506)      -7.9678    0.7246  0.4931  0.7020  7.295    0.9688    6.124×10^-3
           Ip4        0.8567 (±0.3690)
    6      J          9.0756 (±2.8973)      -16.8373   0.6231  0.6252  0.7907  12.511   1.2690    6.359×10^-4
           Ip1        0.9208 (±0.3472)
    7      J          14.6081 (±2.7900)     -26.0082   0.5942  0.6592  0.8119  14.507   1.3664    3.117×10^-4
           Ip3        -1.0158 (±0.3344)
    8      log RB     0.0211 (±0.0139)      -2.3944    0.7459  0.4628  0.6803  6.462    0.9120    9.457×10^-3
           Ip1        1.0320 (±0.4248)
    9      log RB     0.0440 (±0.0151)      -3.4594    0.8072  0.3710  0.6091  4.425    0.7546    0.039
           Ip2        -0.7163 (±0.4242)
    10     log RB     0.0453 (±0.0144)      -3.7746    0.7753  0.4198  0.6479  5.426    0.8357    0.0169
           Ip3        0.1190 (±0.4406)
    11     log RB     0.0242 (±0.0136)      -2.3047    0.7531  0.4525  0.6727  6.199    0.8932    0.0101
           Ip4        0.8976 (±0.3824)
    12     W          0.0062 (±0.0044)      -2.2767    0.7530  0.4526  0.6728  6.201    0.8935    0.0109
           Ip1        1.0479 (±0.4288)
    13     W          0.3584 (±0.1134)      -2.2962    0.6976  0.5302  0.7282  8.465    1.0439    3.462×10^-3
           Sz         -0.2389 (±0.0778)
    14     W          0.0072 (±0.0043)      -2.1977    0.7604  0.4418  0.6647  5.936    0.8741    0.0126
           Ip4        0.9100 (±0.3857)
    15     Sz         0.0039 (±0.0030)      -2.1587    0.7597  0.4428  0.6654  5.960    0.8759    0.0125
           Ip1        1.0650 (±0.4323)
    16     Sz         0.0047 (±0.0030)      -2.0870    0.7679  0.4308  0.6563  5.676    0.8547    0.0146
           Ip4        0.9228 (±0.3891)
    17     W          0.3428 (±0.0894)      -1.8070    0.5492  0.7282  0.8533  12.502   1.5537    3.006×10^-4
           Sz         -0.2309 (±0.0613)
           Ip1        0.9997 (±0.3130)
    18     W          0.3296 (±0.0956)      -1.7925    0.5845  0.6922  0.8320  10.493   1.4234    7.034×10^-4
           Sz         -0.2210 (±0.0654)
           Ip4        0.8088 (±0.2980)
    19     W          0.0053 (±0.0042)      -2.0113    0.7208  0.5319  0.7293  5.303    1.0118    0.0119
           Ip1        0.7464 (±0.4547)
           Ip4        0.6237 (±0.4051)
    20     Sz         0.0033 (±0.0029)      -1.8956    0.7268  0.5240  0.7239  5.137    0.9960
           Ip1        0.7591 (±0.4585)
           Ip4        0.6310 (±0.4083)
    21     χ          27.6660 (±8.0179)     -85.7528   0.5594  0.7180  0.8474  11.883   1.5148    3.865×10^-4
           J          -27.0832 (±13.2711)
           Sz         -0.1140 (±0.0320)
    22     χ          -1.4589 (±0.0424)     -21.0851   0.6041  0.6712  0.8193  9.527    1.3562    1.101×10^-3
           J          16.7831 (±6.1820)
           Ip1        1.0337 (±0.3462)

    (Contd.)

    Table 5. Various correlation models and their qualities of correlations

    Model No.  Regression expression
    1    log(1/ED50) = 11.6041(±3.2105) J - 20.7965
    2    log(1/ED50) = 21.8397(±4.6460) χ - 0.4847(±0.1108) log RB - 100.5836
    3    log(1/ED50) = 1.0620(±0.5653) χ + 0.9753(±0.4124) Ip1 - 7.5071
    4    log(1/ED50) = 2.0300(±0.5932) χ - 0.7628(±0.3976) Ip2 - 12.9252
    5    log(1/ED50) = 1.1842(±0.5506) χ + 0.8567(±0.3690) Ip4 - 7.9678
    6    log(1/ED50) = 9.0756(±2.8973) J + 0.9208(±0.3472) Ip1 - 16.8373
    7    log(1/ED50) = 14.6081(±2.7900) J - 1.0158(±0.3344) Ip3 - 26.0082
    8    log(1/ED50) = 0.0211(±0.0139) log RB + 1.0320(±0.4248) Ip1 - 2.3944
    9    log(1/ED50) = 0.0440(±0.0151) log RB - 0.7163(±0.4242) Ip2 - 3.4594
    10   log(1/ED50) = 0.0453(±0.0144) log RB + 0.1190(±0.4406) Ip3 - 3.7746
    11   log(1/ED50) = 0.0242(±0.0136) log RB + 0.8976(±0.3824) Ip4 - 2.3047
    12   log(1/ED50) = 0.0062(±0.0044) W + 1.0479(±0.4288) Ip1 - 2.2767
    13   log(1/ED50) = 0.3584(±0.1134) W - 0.2389(±0.0778) Sz - 2.2962
    14   log(1/ED50) = 0.0072(±0.0043) W + 0.9100(±0.3857) Ip4 - 2.1977
    15   log(1/ED50) = 0.0039(±0.0030) Sz + 1.0650(±0.4323) Ip1 - 2.1587
    16   log(1/ED50) = 0.0047(±0.0030) Sz + 0.9228(±0.3891) Ip4 - 2.0870
    17   log(1/ED50) = 0.3428(±0.0894) W - 0.2309(±0.0613) Sz + 0.9997(±0.3130) Ip1 - 1.8070
    18   log(1/ED50) = 0.3296(±0.0956) W - 0.2210(±0.0654) Sz + 0.8088(±0.2980) Ip4 - 1.7925
    19   log(1/ED50) = 0.0053(±0.0042) W + 0.7464(±0.4547) Ip1 + 0.6237(±0.4051) Ip4 - 2.0113
    20   log(1/ED50) = 0.0033(±0.0029) Sz + 0.7591(±0.4585) Ip1 + 0.6310(±0.4083) Ip4 - 1.8956
    21   log(1/ED50) = 27.6660(±8.0179) χ - 27.0832(±13.2711) J - 0.1140(±0.0320) Sz - 85.7528
    22   log(1/ED50) = -1.4589(±0.0424) χ + 16.7831(±6.1820) J + 1.0337(±0.3462) Ip1 - 21.0851
    23   log(1/ED50) = 11.3886(±2.2930) χ - 0.0532(±0.0117) Sz + 0.9144(±0.2710) Ip1 - 53.2102
    24   log(1/ED50) = 20.1589(±3.5781) χ - 0.4553(±0.0849) log RB + 0.8426(±0.2456) Ip1 - 92.6412
    25   log(1/ED50) = 21.8309(±3.7442) χ - 0.4745(±0.0893) log RB - 0.7152(±0.2371) Ip2 - 101.0870
    26   log(1/ED50) = 21.6818(±3.2809) χ - 0.4698(±0.0783) log RB - 0.9011(±0.2247) Ip3 - 100.7197
    27   log(1/ED50) = 20.4524(±3.5339) χ - 0.4600(±0.0840) log RB + 0.7572(±0.2163) Ip4 - 93.8642
    28   log(1/ED50) = 21.7110(±2.8883) χ - 0.4667(±0.0689) log RB - 0.4507(±0.2003) Ip2 - 0.7027(±0.2166) Ip3 - 101.0070
    29   log(1/ED50) = 19.6818(±2.9852) χ - 0.4466(±0.0708) log RB + 0.5884(±0.2254) Ip1 + 0.5372(±0.2004) Ip4 - 90.2705
    30   log(1/ED50) = 22.2253(±6.5808) χ - 18.8563(±10.8269) J - 0.0944(±0.0261) Sz + 0.8017(±0.2613) Ip1 - 73.3811
    31   log(1/ED50) = 20.6144(±3.2623) χ - 0.4568(±0.0772) log RB + 0.6114(±0.2520) Ip1 - 0.4564(±0.2303) Ip2 - 95.1417
    32   log(1/ED50) = 0.3282(±0.0833) W - 0.2214(±0.0571) Sz + 0.7497(±0.3212) Ip1 + 0.5211(±0.2874) Ip4 - 1.6047

    mono-parametric models are not adequate for modeling the pharmaceutical activity.

    At this stage we would like to emphasize that the use of the correlation coefficient, R, or the coefficient of determination, R2, as the sole criterion for the quality of a regression is deficient and can be misleading. Hence, conclusions based on R or R2 have to be taken with due reservation. It is desirable to verify such correlations with some other statistical criteria, such as the standard error of estimate (Se) and/or the F-ratio. On the basis of R and Se, a quantity named the quality factor (Q) has been proposed in the literature 31; it is the ratio of the correlation coefficient (R) to the standard error of estimate (Se), viz. Q = R/Se. We have, therefore, used Q-values for describing the quality of statistically significant correlations.
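As a small illustration of the quality factor Q = R/Se, the sketch below recomputes Q from reported R and Se values and ranks the models by it. The dictionary `MODELS` and its labels are assumptions made for this sketch; the (R, Se) pairs are those reported for Eqs (14), (16), (17) and (18), with Se for Eq. (14) taken from Table 4.

```python
def quality_factor(r, se):
    """Quality factor Q = R / Se (correlation coefficient over standard error)."""
    if se <= 0:
        raise ValueError("standard error of estimate must be positive")
    return r / se


# Label -> (R, Se) as reported for each model; the labels are hypothetical names.
MODELS = {
    "Eq. 14": (0.6704, 0.7312),
    "Eq. 16": (0.9269, 0.3954),
    "Eq. 17": (0.9456, 0.3557),
    "Eq. 18": (0.9480, 0.3481),
}


def rank_by_quality(models):
    """Return (label, Q) pairs sorted from the highest to the lowest Q."""
    scored = [(label, quality_factor(r, se)) for label, (r, se) in models.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for label, q in rank_by_quality(MODELS):
        print("%s  Q = %.4f" % (label, q))
```

A higher Q rewards a model both for a larger R and for a smaller Se, so two models with similar R can still be separated by their standard errors.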

    In view of the above, we have used the maximum R2 improvement method 28,29 to derive prediction models. This method finds the "best" one-variable model, the "best" two-variable model and so forth for the prediction of the property/activity relationship. Several models (combinations of variables) were examined to identify combinations of variables with good prediction capabilities. To obtain the most reliable results, in all regression models developed we have examined a variety of statistics associated with the residuals, i.e. the Shapiro-Wilk test for normality and Cook's D statistic for outliers 28,29.

    The regression parameters as well as quality of statistically significant correlations are given in Table 4.

    Table 4 shows that only one mono-parametric model is possible, namely that using the Balaban index (J):

    $$\log(1/ED_{50}) = -20.7965 + 11.6041(\pm 3.2105)\,J \tag{14}$$

    n = 18, Se = 0.7312, R = 0.6704, F = 13.064, Q = 0.9168

    Table 6. Estimated log(1/ED50) values using models 28 and 29 and their comparison with the observed ones

    Compd No.  Obs. log(1/ED50)
    1           2.140
    2           1.410
    3           1.370
    4           1.220
    5           1.180
    6           0.850
    7           0.680
    8           0.680
    9           0.570
    10          0.520
    11          0.320
    12          0.150
    13         -0.040
    14         -0.050
    15         -0.070
    16         -0.560
    17         -0.610
    18         -2.100


    common fit in all regression analysis in describing descriptors that are highly inter-correlated. He further stated that by discarding one of the descriptors that commonly duplicates another we may be discarding a descriptor that nevertheless carries useful structural information in a way that does not parallel other descriptors.

    Thus, following Randic 33, we may say that in the referred bi-parametric model containing χ and log RB, their information contents may be different. However, such unknown information content is yet to be investigated. Randic claims that, in spite of high collinearity between χ and log RB, the bi-parametric regression equations can be considered statistically justified. Another result in favor of this finding is that the coefficients of both χ and log RB are considerably higher than their respective standard deviations, so that such model(s) are considered statistically significant.

    Successive regression analysis resulted in several three-parameter models (Table 4). Of these, the model containing χ, log RB and Ip3 is found to be better than the bi-parametric model discussed above. The best three-parametric model is:

    $$\log(1/ED_{50}) = -100.7197 + 21.6818(\pm 3.2809)\,\chi - 0.4698(\pm 0.0783)\,\log RB - 0.9011(\pm 0.2247)\,Ip_3 \tag{16}$$

    n = 18, Se = 0.3954, R = 0.9269, F = 28.467, Q = 2.3442

    Once again, this model contains the highly linearly correlated χ and log RB indices. However, its statistical relevance follows from the Randic arguments made above. In addition, this model contains the indicator parameter Ip3, which accounts for the presence of multiple -CH3 groups in the drug moiety. The negative sign associated with the coefficient of the Ip3 term in the above model shows that multi-substitution by methyl groups has an adverse effect on the antihypertensive activity of the compounds.

    Further, stepwise regression once again resulted in several four-parameter models (Table 4), out of which the two models containing (i) χ, log RB, Ip1, Ip4 and (ii) χ, log RB, Ip2, Ip3, respectively, were found to be optimal:

    $$\log(1/ED_{50}) = -90.2705 + 19.6818(\pm 2.9852)\,\chi - 0.4466(\pm 0.0708)\,\log RB + 0.5884(\pm 0.2254)\,Ip_1 + 0.5372(\pm 0.2004)\,Ip_4 \tag{17}$$

    n = 18, Se = 0.3557, R = 0.9456, F = 27.443, Q = 2.6584

    $$\log(1/ED_{50}) = -101.0050 + 21.7110(\pm 2.8883)\,\chi - 0.4667(\pm 0.0689)\,\log RB - 0.4507(\pm 0.2003)\,Ip_2 - 0.7027(\pm 0.2166)\,Ip_3 \tag{18}$$

    n = 18, Se = 0.3481, R = 0.9480, F = 28.814, Q = 2.7234

    The statistics involved show that this latter model (Eq. 18) is the most appropriate one for modeling the antihypertensive activity. This model shows the dominating influence of the methyl group in the exhibition of antihypertensive activity of the compounds used.

    Given the size of the sample, we cannot attempt regression analysis of still higher order. This is because there is a rule of thumb stating that the number of descriptors used in a statistically significant model should be at most one fourth of the number of compounds in the set. In our case there are 18 compounds; hence, at most four descriptors can be used.

    In order to confirm our findings we have estimated antihypertensive activities from these two best models and compared the results with the observed activities. Such a comparison is shown in Table 6. The correlation between estimated and experimental activities, and the residue, i.e. the difference between the observed and estimated activity, support our proposition that the model expressed by Eq. 18 is the best. The predictive correlation coefficients (rpred = 0.894 and 0.899) for the models expressed by Eqs 17 and 18, respectively, confirm our findings.
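The comparison of estimated and observed activities can be sketched for Eq. (17) as follows. The tuple layout of `DATA` is an assumption made for illustration; the descriptor values are taken from Tables 1 and 2, with χ for compound 2 assumed equal to the 7.1647 shared by the other tri-substituted compounds, and Ip1 and Ip4 derived from the substituent pattern. Under these assumptions the residuals reproduce, to rounding, the reported Se = 0.3557 and R = 0.9456.

```python
from math import sqrt

# (chi, log RB column of Table 2, Ip1, Ip4, observed log(1/ED50)) for compounds 1-18.
# chi of compound 2 is assumed to be 7.1647, as for the other tri-substituted compounds.
DATA = [
    (6.7709, 94.6795, 1, 1, 2.14), (7.1647, 113.8738, 1, 1, 1.41),
    (6.7709, 95.5756, 1, 1, 1.37), (7.1647, 113.8738, 1, 1, 1.22),
    (6.7709, 94.6795, 1, 0, 1.18), (6.7709, 94.6795, 0, 0, 0.85),
    (6.7540, 96.8441, 1, 1, 0.68), (6.7540, 96.8441, 1, 0, 0.68),
    (7.1647, 113.8738, 1, 1, 0.57), (7.1647, 113.8738, 1, 0, 0.52),
    (6.7540, 96.0864, 1, 1, 0.32), (6.3602, 79.0361, 1, 0, 0.15),
    (7.1647, 113.8738, 1, 0, -0.04), (6.7540, 96.8441, 1, 0, -0.05),
    (7.1647, 113.8738, 0, 0, -0.07), (6.7540, 96.8441, 0, 0, -0.56),
    (6.3602, 79.0361, 0, 0, -0.61), (5.9495, 64.7790, 0, 0, -2.10),
]


def eq17(chi, log_rb, ip1, ip4):
    """Estimated log(1/ED50) from the four-parameter model of Eq. (17)."""
    return -90.2705 + 19.6818 * chi - 0.4466 * log_rb + 0.5884 * ip1 + 0.5372 * ip4


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def fit_statistics(data, n_descriptors=4):
    """Return (estimates, residuals, Se, R) for Eq. (17) applied to `data`."""
    observed = [row[4] for row in data]
    estimated = [eq17(*row[:4]) for row in data]
    residuals = [o - e for o, e in zip(observed, estimated)]
    # Standard error of estimate with n - k - 1 degrees of freedom
    se = sqrt(sum(r * r for r in residuals) / (len(data) - n_descriptors - 1))
    return estimated, residuals, se, pearson(observed, estimated)


if __name__ == "__main__":
    estimated, residuals, se, r = fit_statistics(DATA)
    for number, (est, res) in enumerate(zip(estimated, residuals), start=1):
        print("%2d  est = %7.3f  residue = %7.3f" % (number, est, res))
    print("Se = %.4f, R = %.4f" % (se, r))
```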

    Conclusion

    From the above study, it may be concluded that out of the pool of distance-based topological indices, χ and log RB are the most appropriate indices for modeling, monitoring, and estimating the antihypertensive activity of the compounds used.

    Acknowledgement

    One of the authors (PVK) is highly obliged and thankful to Prof. Ivan Gutman for introducing him to the fascinating field of Chemical Topology and Graph Theory.

    References

    1 Kier L B & Hall L H, Advances in drug research, (Academic Press, New York), 1992.

    2 Chemical applications of topology and graph theory, edited by R B King, (Elsevier, Amsterdam), 1983.

    3 Kier L B & Hall L H, Molecular connectivity in structure-activity relationship, (Wiley, New York), 1986.

    4 Kier L B & Hall L H, Molecular connectivity in chemistry and drug research, (Academic Press, New York), 1976.

    5 Chemical applications of graph theory, edited by A T Balaban, (Academic Press, London), 1976.

    6 Trinajstic N, Chemical graph theory, (CRC Press, Boca Raton), 1983.

    7 Trinajstic N, Chemical graph theory, (CRC Press, Boca Raton), 1992.

    8 Topological indices and related descriptors in QSAR and QSPR, edited by J Devillers & A T Balaban, (Gordon & Breach, Amsterdam), 1999.

    9 Comparative QSAR, edited by J Devillers, (Taylor and Francis, Washington), 1998.

    10 Todeschini R & Consonni V, Handbook of molecular descriptors, (Wiley-VCH, Weinheim), 2000.

    11 Wiener H, J Am chem Soc, 69 (1947) 17.

    12 Gutman I, Yeh Y N, Lee S L & Luo Y L, Indian J Chem, 32A (1993) 651.

    13 Gutman I, Graph Theory Notes New York, 27 (1994) 9.

    14 Khadikar P V, Deshpande N V, Kale P P, Dobrynin A, Gutman I & Domotor G, J chem Inf Comput Sci, 35 (1995) 547.

    15 Gutman I, Popovic L, Khadikar P V, Karmarkar S, Joshi S & Mandloi M, MATCH Commun math comput Chem, 35 (1997) 91.

    16 Gutman I, Khadikar P V, Rajput P V & Karmarkar S, J Serb chem Soc, 60 (1995) 759.

    17 Gutman I, Khadikar P V & Khaddar T, MATCH Commun math comput Chem, 35 (1997) 105.

    18 Khadikar P V, Karmarkar S, Joshi S & Gutman I, J Serb chem Soc, 61 (1996) 89.

    19 Karmarkar S, Karmarkar S, Joshi S, Das A & Khadikar P V, J Serb chem Soc, 62 (1997) 227.

    20 Agrawal V K & Khadikar P V, Bioorg med Chem, 9 (2001) 3035.

    21 Agrawal V K, Sharma R & Khadikar P V, Bioorg med Chem, 10 (2002) 2993.

    22 Agrawal V K, Sharma R & Khadikar P V, Bioorg med Chem, 10 (2002) 3571.

    23 Timmermans P B & van Zwieten P A, J med Chem, 20 (1971) 1636.

    24 Buckley F & Harary F, Distance in graphs, (Addison-Wesley, Reading), 1990.

    25 Balaban A T, Chem Phys Lett, 89 (1982) 399.

    26 Khadikar P V, Sharma S, Sharma V, Joshi S, Lukovits I & Kaveeshwar M, Bull Soc chim Belg, 106 (1997) 767.

    27 Randic M, J Am chem Soc, 97 (1975) 6609.

    28 Box G E B, Hunter W G & Hunter J S, Statistics for experimenters, (Wiley, New York), 1978.

    29 Chatterjee S, Hadi A S & Price B, Regression analysis by example, (Wiley, New York), 2000.

    30 Balaban A T, Bonchev D, Mekenyan O, Charton M & Motoc I (Eds), Steric effects in drug design, (Akademie-Verlag, Berlin), 1983.

    31 Pogliani L, Amino Acids, 6 (1994) 141.

    32 Mandloi M, Sikarwar A, Sapre N S, Karmarkar S & Khadikar P V, J chem Inf Comput Sci, 40 (2000) 57.

    33 Randic M, Croat chem Acta, 66 (1993) 289.