Quantitative Structure-Activity Relationships (QSAR)
Comparative Molecular Field Analysis (CoMFA)
Gijs Schaftenaar
Outline
• Introduction
• Structures and activities
• Analysis techniques: Free-Wilson, Hansch
• Regression techniques: PCA, PLS
• Comparative Molecular Field Analysis
QSAR: The Setting
Quantitative structure-activity relationships are used when there is little or no receptor information, but there are measured activities of (many) compounds.
From Structure to Property
[Figure: structure drawings of a series of related compounds, with a plot of EC50 (axis ticks 0 to 9 and 1 to 15)]
From Structure to Property
[Figure: structure drawings of a series of related compounds, with LD50 as the property]
QSAR: Which Relationship?
Quantitative structure-activity relationships correlate chemical/biological activities with structural features or with atomic, group or molecular properties, within a range of structurally similar compounds.
Free Energy of Binding and Equilibrium Constants
The free energy of binding is related to the reaction constants of ligand-receptor complex formation:
$\Delta G_{\mathrm{binding}} = -2.303\,RT \log K = -2.303\,RT \log (k_{\mathrm{on}} / k_{\mathrm{off}})$
Equilibrium constant $K$
Rate constants $k_{\mathrm{on}}$ (association) and $k_{\mathrm{off}}$ (dissociation)
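The relation between free energy and equilibrium constant can be sketched numerically. This is a minimal illustration, assuming a temperature of 298.15 K and hypothetical rate constants (neither is given on the slide); the function names are made up for the example.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def delta_g_from_k(K, temperature=298.15):
    """Free energy of binding in kJ/mol from the equilibrium constant K."""
    return -2.303 * R * temperature * math.log10(K)

def k_from_delta_g(delta_g, temperature=298.15):
    """Inverse relation: equilibrium constant from a free energy in kJ/mol."""
    return 10 ** (-delta_g / (2.303 * R * temperature))

# Hypothetical rate constants (assumption, not from the slides):
k_on = 1.0e6    # association, 1/(M*s)
k_off = 1.0e-3  # dissociation, 1/s
K = k_on / k_off
dg = delta_g_from_k(K)
print(f"K = {K:.3g}, delta G = {dg:.2f} kJ/mol")
```

At this temperature each factor of ten in K changes the free energy by about 5.7 kJ/mol.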
Concentration as Activity Measure
• A critical molar concentration C that produces the biological effect is related to the equilibrium constant K
• Usually log (1/C) is used (cf. pH)
• For meaningful QSARs, activities need to be spread out over at least 3 log units
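As a small worked example, the conversion to log (1/C) can be applied to the EC50 values of the capsaicin analogs listed later in the deck. The sketch assumes those EC50 values are in micromolar, an assumption that reproduces the log(1/EC50) column of that table; the function name is hypothetical.

```python
import math

def log_inverse_conc(conc_micromolar):
    """Return log10(1/C) with C converted from micromolar to molar."""
    conc_molar = conc_micromolar * 1e-6
    return math.log10(1.0 / conc_molar)

# EC50 values of the capsaicin analogs (assumed to be in micromolar)
ec50_um = {"H": 11.80, "Cl": 1.24, "NO2": 4.58, "CN": 26.50,
           "C6H5": 0.24, "NMe2": 4.39, "I": 0.35}

activities = {x: log_inverse_conc(c) for x, c in ec50_um.items()}
spread = max(activities.values()) - min(activities.values())

for x, a in activities.items():
    print(f"{x:5s} {a:.2f}")
print(f"spread: {spread:.2f} log units")
```

For this particular series the activities span roughly two log units, so it falls short of the three log units recommended above.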
Free Energy of Binding
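$\Delta G_{\mathrm{binding}} = \Delta G_0 + \Delta G_{\mathrm{hb}} + \Delta G_{\mathrm{ionic}} + \Delta G_{\mathrm{lipo}} + \Delta G_{\mathrm{rot}}$
Term                          Contribution                            Energy
$\Delta G_0$                  entropy loss (translation + rotation)   +5.4
$\Delta G_{\mathrm{hb}}$      ideal hydrogen bond                     -4.7
$\Delta G_{\mathrm{ionic}}$   ideal ionic interaction                 -8.3
$\Delta G_{\mathrm{lipo}}$    lipophilic contact                      -0.17
$\Delta G_{\mathrm{rot}}$     entropy loss (rotatable bonds)          +1.4
(Energies in kJ/mol per unit feature)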
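The additive scheme can be written as a short function. This is only a sketch: the per-feature energies are the ones listed above, while the feature counts in the example call are hypothetical and the function name is invented for illustration.

```python
# Per-feature contributions in kJ/mol, as listed in the table above
DELTA_G0 = 5.4      # entropy loss (translation + rotation), counted once
DELTA_G_HB = -4.7   # per ideal hydrogen bond
DELTA_G_IONIC = -8.3  # per ideal ionic interaction
DELTA_G_LIPO = -0.17  # per unit of lipophilic contact
DELTA_G_ROT = 1.4   # per rotatable bond

def binding_free_energy(n_hbond, n_ionic, n_lipo, n_rot):
    """Sum the additive contributions to the free energy of binding (kJ/mol)."""
    return (DELTA_G0
            + n_hbond * DELTA_G_HB
            + n_ionic * DELTA_G_IONIC
            + n_lipo * DELTA_G_LIPO
            + n_rot * DELTA_G_ROT)

# Hypothetical ligand: 2 hydrogen bonds, 1 ionic interaction,
# 100 units of lipophilic contact, 3 rotatable bonds
example = binding_free_energy(2, 1, 100, 3)
print(f"{example:.1f} kJ/mol")
```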
Basic Assumption in QSAR
The structural properties of a compound contribute in a linearly additive way to its biological activity, provided there are no non-linear dependencies of transport or binding on some properties.
An Example: Capsaicin Analogs
X       EC50 (µM)   log(1/EC50)
H       11.80       4.93
Cl      1.24        5.91
NO2     4.58        5.34
CN      26.50       4.58
C6H5    0.24        6.62
NMe2    4.39        5.36
I       0.35        6.46
NHCHO   ?           ?
[Structure: capsaicin analog scaffold bearing the variable substituent X (labels shown: X, NH, O, OH, MeO)]
An Example: Capsaicin Analogs
X       log(1/EC50)   MR      π       σ       Es
H       4.93          1.03    0.00    0.00    0.00
Cl      5.91          6.03    0.71    0.23    -0.97
NO2     5.34          7.36    -0.28   0.78    -2.52
CN      4.58          6.33    -0.57   0.66    -0.51
C6H5    6.62          25.36   1.96    -0.01   -3.82
NMe2    5.36          15.55   0.18    -0.83   -2.90
I       6.46          13.94   1.12    0.18    -1.40
NHCHO   ?             10.31   -0.98   0.00    -0.98
MR = molar refractivity (polarizability) parameter; π = hydrophobicity parameter;
σ = electronic sigma constant (para position); Es = Taft size parameter
An Example: Capsaicin Analogs
$\log(1/\mathrm{EC}_{50}) = -0.89 + 0.019\,\mathrm{MR} + 0.23\,\pi - 0.31\,\sigma - 0.14\,E_s$
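To see how such an equation is used for the untested NHCHO analog, the sketch below simply evaluates it with the descriptor values from the table. It assumes the coefficients 0.23 and -0.31 belong to the hydrophobicity and sigma parameters respectively (the symbols are missing in the source, so this follows the order of the descriptor legend); the names `COEFFS` and `predict` are hypothetical.

```python
# Regression coefficients as printed on the slide.
# Assumption: 0.23 multiplies pi and -0.31 multiplies sigma.
INTERCEPT = -0.89
COEFFS = {"MR": 0.019, "pi": 0.23, "sigma": -0.31, "Es": -0.14}

def predict(descriptors):
    """Evaluate the linear QSAR equation for one substituent."""
    return INTERCEPT + sum(COEFFS[name] * descriptors[name] for name in COEFFS)

# Descriptor values of the NHCHO substituent, taken from the table
nhcho = {"MR": 10.31, "pi": -0.98, "sigma": 0.00, "Es": -0.98}
prediction = predict(nhcho)
print(f"predicted value for NHCHO: {prediction:.2f}")
```

The result is on whatever scale the equation was fitted on. The intercept suggests that this scale is offset from the log(1/EC50) column of the table (for instance if EC50 was kept in micromolar), but the source does not say so, and no conversion is applied here.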
Free-Wilson Analysis
$\log(1/C) = \sum_i a_i x_i + \mu$
$x_i$: presence of group i (0 or 1)
$a_i$: activity group contribution of group i
$\mu$: activity value of unsubstituted compound
Free-Wilson Analysis
+ Computationally straightforward
– Predictions only for substituents already included
– Requires large number of compounds
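A Free-Wilson prediction is a sum of group contributions on top of the activity of the unsubstituted compound. The sketch below illustrates this with made-up numbers: the group contributions and the base activity are hypothetical, not fitted values from the slides. It also shows why no prediction is possible for a substituent that was not part of the fitted set.

```python
# Hypothetical fitted values (assumptions for illustration only)
MU = 5.0                                        # unsubstituted compound
GROUP_CONTRIBUTIONS = {"Cl": 0.9, "OMe": 0.3}   # a_i for each known group

def free_wilson(groups_present):
    """Predict log(1/C) as mu + sum_i a_i * x_i with x_i in {0, 1}."""
    unknown = set(groups_present) - set(GROUP_CONTRIBUTIONS)
    if unknown:
        # No contribution a_i was ever fitted for these groups
        raise ValueError(f"no contribution known for: {sorted(unknown)}")
    total = MU
    for group, a_i in GROUP_CONTRIBUTIONS.items():
        x_i = 1 if group in groups_present else 0   # indicator variable
        total += a_i * x_i
    return total

print(free_wilson({"Cl"}))
print(free_wilson({"Cl", "OMe"}))
```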
Hansch Analysis
Drug transport and binding affinity depend nonlinearly on lipophilicity:
$\log(1/C) = a (\log P)^2 + b \log P + c\,\sigma + k$
$P$: n-octanol/water partition coefficient
$\sigma$: Hammett electronic parameter
$a, b, c$: regression coefficients
$k$: constant term
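The parabolic dependence on log P implies an optimum lipophilicity whenever the coefficient a is negative. The following sketch evaluates the Hansch equation and locates that optimum; the coefficient values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
# Hypothetical regression coefficients (assumptions, not from the slides)
A, B, C, K_CONST = -0.1, 0.8, 0.5, 3.0

def hansch(log_p, sigma):
    """log(1/C) = a*(log P)^2 + b*log P + c*sigma + k"""
    return A * log_p ** 2 + B * log_p + C * sigma + K_CONST

def optimal_log_p():
    """Vertex of the parabola in log P; a maximum only if a < 0."""
    return -B / (2.0 * A)

best = optimal_log_p()
print(f"optimal log P = {best:.1f}, activity there = {hansch(best, 0.0):.1f}")
```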
Hansch Analysis
+ Fewer regression coefficients needed for correlation
+ Interpretation in physicochemical terms
+ Predictions for other substituents possible
Molecular Descriptors
• Simple counts of features, e.g. of atoms, rings, H-bond donors, molecular weight
• Physicochemical properties, e.g. polarisability, hydrophobicity (logP), water-solubility
• Group properties, e.g. Hammett and Taft constants, volume
• 2D Fingerprints based on fragments
• 3D Screens based on fragments
2D Fingerprints
[Structure: capsaicin analog with X = Br (labels shown: Br, NH, O, OH, MeO)]
C   N   O   P   S   X   F   Cl  Br  I   Ph  CO  NH  OH  Me  Et  Py  CHO SO  C=C C≡C C=N Am  Im
1   1   1   0   0   1   0   0   1   0   1   1   1   1   1   0   0   0   0   1   0   0   1   0
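A fragment-based fingerprint is a fixed-order list of bits, one per fragment key. The sketch below rebuilds the bit string shown above from a hand-written set of fragments. In practice the fragments would be detected by substructure search, which is not shown here; the triple bond key is written as `C#C` in the code to stay in ASCII.

```python
# Fragment keys in the order used on the slide ("C#C" stands for the triple bond)
KEYS = ["C", "N", "O", "P", "S", "X", "F", "Cl", "Br", "I", "Ph", "CO",
        "NH", "OH", "Me", "Et", "Py", "CHO", "SO", "C=C", "C#C", "C=N",
        "Am", "Im"]

def fingerprint(fragments_present):
    """Return one bit per key: 1 if the fragment occurs in the molecule."""
    return [1 if key in fragments_present else 0 for key in KEYS]

# Fragments of the bromo-substituted capsaicin analog, supplied by hand
bromo_analog = {"C", "N", "O", "X", "Br", "Ph", "CO", "NH", "OH", "Me",
                "C=C", "Am"}

bits = fingerprint(bromo_analog)
print(" ".join(str(b) for b in bits))
```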
Principal Component Analysis (PCA)
• Many (>3) variables to describe objects = high dimensionality of descriptor data
• PCA is used to reduce dimensionality
• PCA extracts the most important factors (principal components or PCs) from the data
• Useful when correlations exist between descriptors
• The result is a new, small set of variables (PCs) which explain most of the data variation
Different Views on PCA
• Statistically, PCA is a multivariate analysis technique closely related to eigenvector analysis
• In matrix terms, PCA is a decomposition of matrix $X$ into two smaller matrices plus a set of residuals: $X = TP^{T} + R$
• Geometrically, PCA is a projection technique in which X is projected onto a subspace of reduced dimensions
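The matrix view can be made concrete with a one-component decomposition. The sketch below is a minimal pure-Python illustration, assuming mean-centering of the columns and a simple power iteration to find the first component (a production analysis would use a linear algebra library and would usually scale the descriptors as well). The data are the MR, π, σ and Es values from the capsaicin descriptor table.

```python
import math

# Rows = substituents, columns = MR, pi, sigma, Es (capsaicin descriptor table)
X = [
    [1.03, 0.00, 0.00, 0.00],
    [6.03, 0.71, 0.23, -0.97],
    [7.36, -0.28, 0.78, -2.52],
    [6.33, -0.57, 0.66, -0.51],
    [25.36, 1.96, -0.01, -3.82],
    [15.55, 0.18, -0.83, -2.90],
    [13.94, 1.12, 0.18, -1.40],
    [10.31, -0.98, 0.00, -0.98],
]

def center(rows):
    """Subtract the column mean from every entry."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def first_component(rows, iterations=200):
    """Scores t and unit-length loadings p of the first PC (power iteration)."""
    n_cols = len(rows[0])
    p = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        t = [sum(v * pj for v, pj in zip(row, p)) for row in rows]  # t = X p
        p = [sum(t[i] * rows[i][j] for i in range(len(rows)))       # p = X^T t
             for j in range(n_cols)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
    t = [sum(v * pj for v, pj in zip(row, p)) for row in rows]
    return t, p

Xc = center(X)
t, p = first_component(Xc)
# Residual matrix R = X - t p^T for a single component
R = [[Xc[i][j] - t[i] * p[j] for j in range(len(p))] for i in range(len(Xc))]
total_ss = sum(v * v for row in Xc for v in row)
resid_ss = sum(v * v for row in R for v in row)
explained = 1.0 - resid_ss / total_ss
print(f"variance explained by PC1: {explained:.3f}")
```

Because the columns are not scaled here, MR, which has by far the largest numerical range, dominates the first component.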
Partial Least Squares (PLS)
$y_1 = a_0 + a_1 x_{11} + a_2 x_{12} + a_3 x_{13} + \dots + e_1$   (compound 1)
$y_2 = a_0 + a_1 x_{21} + a_2 x_{22} + a_3 x_{23} + \dots + e_2$   (compound 2)
$y_3 = a_0 + a_1 x_{31} + a_2 x_{32} + a_3 x_{33} + \dots + e_3$   (compound 3)
…
$y_n = a_0 + a_1 x_{n1} + a_2 x_{n2} + a_3 x_{n3} + \dots + e_n$   (compound n)
$Y = XA + E$
$X$ = independent variables
$Y$ = dependent variables
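A one-component PLS model for a single dependent variable can be sketched in a few lines. This is a simplified illustration (one latent variable, mean-centered but unscaled data, no cross-validation), not a full PLS implementation. The data are the seven capsaicin analogs with measured activity, and the function name is hypothetical.

```python
import math

# Descriptors MR, pi, sigma, Es and activities log(1/EC50) of the seven
# capsaicin analogs with measured activity
X = [
    [1.03, 0.00, 0.00, 0.00],
    [6.03, 0.71, 0.23, -0.97],
    [7.36, -0.28, 0.78, -2.52],
    [6.33, -0.57, 0.66, -0.51],
    [25.36, 1.96, -0.01, -3.82],
    [15.55, 0.18, -0.83, -2.90],
    [13.94, 1.12, 0.18, -1.40],
]
y = [4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46]

def mean(values):
    return sum(values) / len(values)

def pls1_one_component(X, y):
    """Return intercept a0 and coefficients a_j of a one-component PLS1 model."""
    n, m = len(X), len(X[0])
    x_means = [mean(col) for col in zip(*X)]
    y_mean = mean(y)
    Xc = [[v - mu for v, mu in zip(row, x_means)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector w ~ X^T y: direction of maximal covariance with y
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(m)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Scores t = X w, then regress y on the single score vector
    t = [sum(v * wj for v, wj in zip(row, w)) for row in Xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    coeffs = [q * wj for wj in w]
    a0 = y_mean - sum(c * mu for c, mu in zip(coeffs, x_means))
    return a0, coeffs

a0, coeffs = pls1_one_component(X, y)
fitted = [a0 + sum(c * v for c, v in zip(coeffs, row)) for row in X]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
ss_tot = sum((yi - mean(y)) ** 2 for yi in y)
r2 = 1.0 - ss_res / ss_tot
print(f"a0 = {a0:.3f}, coefficients = {[round(c, 4) for c in coeffs]}, R2 = {r2:.3f}")
```

Unlike ordinary least squares, the model is built from a latent variable that summarizes all descriptors, which is what makes PLS usable when descriptors outnumber compounds.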
PLS – Cross-validation
• Squared correlation coefficient $R^2$
• Value between 0 and 1 (> 0.9 for a good model)
• Indicating explanatory power of regression equation
With cross-validation:
• Cross-validated squared correlation coefficient $Q^2$
• Value of at most 1, negative when the model predicts worse than the mean (> 0.5 for a good model)
• Indicating predictive power of regression equation
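The difference between the fitted and the cross-validated statistic can be shown with leave-one-out cross-validation. To keep the sketch short it uses ordinary least squares on a single descriptor (the hydrophobicity parameter π of the capsaicin analogs) instead of PLS; the leave-one-out scheme and the function names are assumptions made for this illustration.

```python
# Hydrophobicity parameter pi and log(1/EC50) of the seven capsaicin analogs
xs = [0.00, 0.71, -0.28, -0.57, 1.96, 0.18, 1.12]
ys = [4.93, 5.91, 5.34, 4.58, 6.62, 5.36, 6.46]

def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def total_ss(y):
    my = sum(y) / len(y)
    return sum((yi - my) ** 2 for yi in y)

def r_squared(x, y):
    """R2 = 1 - SS_res / SS_tot, all compounds used for fitting."""
    slope, intercept = fit_line(x, y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / total_ss(y)

def q_squared_loo(x, y):
    """Q2 = 1 - PRESS / SS_tot, each compound predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    return 1.0 - press / total_ss(y)

r2 = r_squared(xs, ys)
q2 = q_squared_loo(xs, ys)
print(f"R2 = {r2:.2f}, Q2 = {q2:.2f}")
```

With this definition the cross-validated value never exceeds the fitted one, which is why it is the stricter indicator of predictive power.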
PCA vs PLS
• PCA: The principal components describe the variance in the independent variables (descriptors)
• PLS: The principal components describe the variance in both the independent variables (descriptors) and the dependent variable (activity)
Comparative Molecular Field Analysis (CoMFA)
• Set of chemically related compounds
• Common substructure required
• 3D structures needed (e.g., Corina-generated)
• Bioactive conformations of the active compounds are to be aligned
CoMFA Alignment
[Figure: molecules with ring labels A, B, C, D aligned on a common "Pharmacophore" of points L separated by distances d1, d2, d3]
CoMFA Model Derivation
• Molecules are positioned in a regular grid according to alignment
• Probes are used to determine the molecular field:
Van der Waals field (probe is neutral carbon)
$E_{\mathrm{vdw}} = \sum_i \left( A_i r_{ij}^{-12} - B_i r_{ij}^{-6} \right)$
Electrostatic field (probe is charged atom)
$E_c = \sum_i q_i q_j / (D r_{ij})$
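The field calculation can be sketched as a loop over grid points, summing the two energy terms over the atoms of a molecule. Everything numerical below is an assumption made for illustration: the two-atom "molecule", its parameters A and B, the probe charge, the dielectric constant D, the grid extent and spacing, and the minimum-distance clamp that avoids a division by zero when a grid point coincides with an atom.

```python
import math

# Hypothetical molecule: (x, y, z, partial charge q_i, A_i, B_i) per atom
ATOMS = [
    (0.0, 0.0, 0.0, -0.5, 1.0, 1.0),
    (1.5, 0.0, 0.0, 0.5, 1.0, 1.0),
]
PROBE_CHARGE = 1.0   # charge q_j of the electrostatic probe (assumption)
DIELECTRIC = 1.0     # dielectric constant D (assumption)
MIN_DISTANCE = 0.5   # clamp to avoid r = 0 at atom positions (assumption)

def fields_at(point, atoms=ATOMS):
    """Return (E_vdw, E_c) felt by the probe at one grid point."""
    e_vdw = 0.0
    e_c = 0.0
    for x, y, z, q, a, b in atoms:
        r = max(math.dist(point, (x, y, z)), MIN_DISTANCE)
        e_vdw += a * r ** -12 - b * r ** -6               # Lennard-Jones form
        e_c += q * PROBE_CHARGE / (DIELECTRIC * r)        # Coulomb term
    return e_vdw, e_c

def grid(lower, upper, spacing):
    """Regular cubic grid of points between lower and upper on each axis."""
    steps = int(round((upper - lower) / spacing)) + 1
    axis = [lower + k * spacing for k in range(steps)]
    return [(x, y, z) for x in axis for y in axis for z in axis]

# One (E_vdw, E_c) pair per grid point; these values become the descriptor
# columns for this molecule
field = {point: fields_at(point) for point in grid(-4.0, 4.0, 2.0)}
print(len(field), "grid points")
```

Each aligned molecule yields one such row of field values, and the resulting table of many columns is what is then correlated with activity.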