QSAR: Quantitative Structure-Activity Relationships
  Can one predict activity (or properties, in QSPR) simply from knowledge of the structure of the molecule?
  In other words, if one systematically changes a component, will it have a systematic effect on the activity?
Choice of Model
  Can approach in two directions:
    Simple to complex model
    Complex to simple model
Simplest Model
  Linear relationship between x and y: $Y = mX + b$
  Minimize the error by least squares: $\sum_i (Y_i - Y'_i)^2 = \sum_i [Y_i - (mX_i + b)]^2$
  $Y'_i$ is the predicted value
Least Squares
  Correlation coefficient: $-1 \le r \le 1$
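The least-squares line and its correlation coefficient have closed-form expressions in terms of sums over the data. The following is a minimal sketch in plain Python; the function name `least_squares_fit` and the sample data are made up for illustration and are not from the slides.

```python
from math import sqrt

def least_squares_fit(xs, ys):
    """Return slope m, intercept b and correlation coefficient r for y = m*x + b."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sums of squares and cross products about the means
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    m = sxy / sxx                # slope that minimizes the squared error
    b = y_mean - m * x_mean      # the fitted line passes through the mean point
    r = sxy / sqrt(sxx * syy)    # correlation coefficient
    return m, b, r

# Hypothetical data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b, r = least_squares_fit(xs, ys)
print(m, b, r)
```

Because the sample points lie exactly on a line, the fit recovers the slope and intercept and r reaches its upper limit of 1.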
Another test
  Is the line better than the mean?
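One common way to ask whether the line is better than the mean is R2 = 1 - SS_res/SS_tot, which compares the squared error of the fitted line with the squared error of simply predicting the mean for every point. The sketch below assumes this definition; the data and the line y = 1.4x - 0.5 (the least-squares line for these four points) are hypothetical.

```python
def r_squared(ys, predictions):
    """R2 = 1 - SS_res / SS_tot; 0 means no better than the mean, 1 means a perfect fit."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # error of the model
    ss_tot = sum((y - y_mean) ** 2 for y in ys)                  # error of the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical data and its least-squares line y = 1.4x - 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 4.0, 5.0]
line_predictions = [1.4 * x - 0.5 for x in xs]
mean_predictions = [sum(ys) / len(ys)] * len(ys)

r2_line = r_squared(ys, line_predictions)
r2_mean = r_squared(ys, mean_predictions)
print(r2_line, r2_mean)
```

Predicting the mean everywhere gives R2 = 0 by construction, so any R2 well above zero says the line explains variation that the mean cannot.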
[Figure: four scatter plots with fitted least-squares lines]
  A circle / 2 lines: y = 0.0676x - 0.3882 (R2 = 0.0045); y = 2.9562x - 0.2597 (R2 = 0.8686)
  One bad point / Wrong model: y = 2.8515x - 31.647 (R2 = 0.9179); y = 0.0008x + 275.11 (R2 = 0.978)
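The "one bad point" case can be reproduced numerically: a single distant point can produce a high R2 even when the remaining points show no trend at all. The sketch below uses invented data chosen to make the effect obvious; it is not the data behind the plots.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, which equals R2 for a straight-line fit."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Hypothetical cluster with no linear trend
core_x = [1.0, 2.0, 3.0, 4.0]
core_y = [2.0, 1.0, 1.0, 2.0]

# The same cluster plus one far-away point
r2_core = r_squared(core_x, core_y)
r2_with_outlier = r_squared(core_x + [100.0], core_y + [100.0])
print(r2_core, r2_with_outlier)
```

The cluster alone gives R2 = 0, while adding the single point at (100, 100) pushes R2 above 0.99, so a plot of the data is worth checking before trusting the statistic.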
Multiple Regression
  $Y = f(X_1, X_2, \ldots, X_n)$
  Problems:
    Choice of model: linear, polynomial, etc.
    Visualization
    Interpretation
    Computationally demanding
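For the linear choice of model, Y = c0 + c1*X1 + ... + cn*Xn, the coefficients can be obtained by least squares in one call. This is a sketch using NumPy's `numpy.linalg.lstsq`; the descriptor values and the generating relation y = 1 + 2*x1 - 3*x2 are hypothetical.

```python
import numpy as np

# Hypothetical data: 5 compounds, 2 descriptors, generated from y = 1 + 2*x1 - 3*x2
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so that the intercept is fitted as well
A = np.column_stack([np.ones(len(X)), X])

# Least-squares solution of A @ coeffs = y
coeffs, residuals, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)  # intercept, coefficient of x1, coefficient of x2
```

With noise-free data the fit recovers the generating coefficients; with real activities the residuals indicate how much is left unexplained by the chosen model.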
Variable reduction
  Principal Component Analysis
Principal Component
  $PC_1 = a_{1,1}x_1 + a_{1,2}x_2 + \ldots + a_{1,n}x_n$
  $PC_2 = a_{2,1}x_1 + a_{2,2}x_2 + \ldots + a_{2,n}x_n$
  Keep only those components that possess the largest variation
  PCs are orthogonal to each other
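One standard way to obtain the coefficients a_k,i is to diagonalize the covariance matrix of the descriptors: the eigenvectors are the principal components and the eigenvalues are their variances. The sketch below follows that route with NumPy; the function name `principal_components` and the descriptor table are hypothetical.

```python
import numpy as np

def principal_components(X):
    """Return (variances, components) sorted by decreasing variance.

    Row k of `components` holds the coefficients a_k,1 ... a_k,n of PC_k.
    """
    Xc = X - X.mean(axis=0)                  # center each descriptor
    cov = np.cov(Xc, rowvar=False)           # descriptor covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix: orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]        # largest variance first
    return eigvals[order], eigvecs[:, order].T

# Hypothetical table: 5 molecules x 3 descriptors (the first two are strongly correlated)
X = np.array([[1.0, 2.0, 0.5],
              [2.0, 4.1, 0.4],
              [3.0, 5.9, 0.6],
              [4.0, 8.2, 0.5],
              [5.0, 9.8, 0.4]])

variances, components = principal_components(X)
explained = variances / variances.sum()

# Keep only PC1 and project the molecules onto it
scores = (X - X.mean(axis=0)) @ components[:1].T
print(explained)
```

Because two of the three hypothetical descriptors carry nearly the same information, PC1 alone accounts for almost all of the variance, which is the situation in which variable reduction pays off.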
Exploring QSAR
  Pick up the NONLIN program: http://www.trinity.edu/sbachrac/drugdesign2007/
  Unzip and install it on your computer
  Read the Read.Me and Nonlin.doc documentation
  Look at the HeatForm.NLR file with any word processor
Running NONLIN
  Start an MSDOS window
  Change to the directory where the code is: cd /d d:\nonlin
  Execute the program with the data file: Nonlin heatForm > output
Assignment
  Propose a QSAR scheme to predict the $\Delta H_f$ of the alkanes
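Early Examples
  Hammett (1930s-1940s)
  Ionization of benzoic acid and of its para- and meta-substituted analogs:
    $\mathrm{C_6H_5COOH} \rightleftharpoons \mathrm{C_6H_5COO^-} + \mathrm{H^+}$, equilibrium constant $K_0$
    $p\text{-}\mathrm{XC_6H_4COOH} \rightleftharpoons p\text{-}\mathrm{XC_6H_4COO^-} + \mathrm{H^+}$, equilibrium constant $K_p$
    $m\text{-}\mathrm{XC_6H_4COOH} \rightleftharpoons m\text{-}\mathrm{XC_6H_4COO^-} + \mathrm{H^+}$, equilibrium constant $K_m$
  $\sigma_{para} = \log_{10}(K_p / K_0)$
  $\sigma_{meta} = \log_{10}(K_m / K_0)$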
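Hammett (cont.)
  Now suppose we have a related series:
    $\mathrm{XC_6H_4CH_2COOH} \rightleftharpoons \mathrm{XC_6H_4CH_2COO^-} + \mathrm{H^+}$, equilibrium constant $K'_X$
  $\log_{10}(K'_X / K'_0) = \rho\sigma$
  $\sigma$ reflects sensitivity to the substituent
  $\rho$ reflects sensitivity to the different system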
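Hammett (cont.)
  Linear Free Energy Relationship
  $\Delta G = -2.303RT\log_{10}K$
  So $\Delta G - \Delta G_0 = -2.303RT\sigma$
  and $\Delta G' - \Delta G'_0 = -2.303RT\rho\sigma$
  Therefore $\Delta G' - \Delta G'_0 = \rho(\Delta G - \Delta G_0)$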
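The Hammett relations can be checked with a few lines of arithmetic. In the sketch below all equilibrium constants, the value of rho, and the function names are hypothetical and chosen only to make the numbers easy to follow.

```python
from math import log10

R = 8.314    # gas constant, J/(mol K)
T = 298.15   # temperature, K (assumed)

def hammett_sigma(k_substituted, k_parent):
    """sigma = log10(K_X / K_0) for the reference (benzoic acid) series."""
    return log10(k_substituted / k_parent)

def predict_log_k(log_k0_related, rho, sigma):
    """log10 K'_X = log10 K'_0 + rho * sigma for a related series."""
    return log_k0_related + rho * sigma

# Hypothetical constants: the substituted acid is 10 times stronger than the parent
sigma = hammett_sigma(1.0e-3, 1.0e-4)

# Hypothetical related series with rho = 0.5 and log10 K'_0 = -4.3
rho = 0.5
log_k_related = predict_log_k(-4.3, rho, sigma)

# Free energy differences in the reference and related series
ddG_reference = -2.303 * R * T * sigma
ddG_related = -2.303 * R * T * rho * sigma
print(sigma, log_k_related, ddG_reference, ddG_related)
```

The two free energy differences differ by exactly the factor rho, which is the linear free energy relationship stated above.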
Free-Wilson Analysis
  $\log(1/C) = \sum_i a_i + \mu$
  where $\log(1/C)$ = predicted activity, $a_i$ = contribution per group, and $\mu$ = activity of reference
Free-Wilson example
  $\log(1/C)$ = -0.30 [m-F] + 0.21 [m-Cl] + 0.43 [m-Br] + 0.58 [m-I] + 0.45 [m-Me] + 0.34 [p-F] + 0.77 [p-Cl] + 1.02 [p-Br] + 1.43 [p-I] + 1.26 [p-Me] + 7.82
  [Figure: parent structure (containing N and Br, as the HCl salt) with substituent positions X and Y; activity of analogs]
  Problems: at least two substituent positions are necessary, and the model can only predict new combinations of the substituents used in the analysis.
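Evaluating the Free-Wilson equation amounts to a table lookup: each bracketed term is 1 if that group is present and 0 otherwise. The sketch below copies the coefficients from the equation above; treating H as contributing 0 at either position is an assumption, and the function name is made up.

```python
# Group contributions copied from the Free-Wilson equation above.
# Assumption: an unsubstituted position (H) contributes 0.
META = {"H": 0.0, "F": -0.30, "Cl": 0.21, "Br": 0.43, "I": 0.58, "Me": 0.45}
PARA = {"H": 0.0, "F": 0.34, "Cl": 0.77, "Br": 1.02, "I": 1.43, "Me": 1.26}
MU = 7.82  # activity of the reference compound

def free_wilson_log_inv_c(meta, para):
    """Predicted log(1/C) for a given meta and para substituent."""
    return META[meta] + PARA[para] + MU

print(free_wilson_log_inv_c("Br", "Me"))  # 0.43 + 1.26 + 7.82
print(free_wilson_log_inv_c("H", "H"))    # reference compound
```

A substituent that was not in the analysis (for example "NO2") raises a `KeyError`, which mirrors the limitation noted above: only new combinations of already-fitted groups can be predicted.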
Hansch Analysis
  $\log(1/C) = a\pi + b\sigma + c$
  where $\pi(x) = \log P_{RX} - \log P_{RH}$
  and $\log P$ is the octanol/water partition coefficient
  This is also a linear free energy relation
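A short numerical sketch of the Hansch terms follows. The log P values for benzene (2.13) and chlorobenzene (2.84) and the sigma value for para-Cl (0.23) are approximate literature values quoted only for illustration; the coefficients a, b and c are hypothetical, since in practice they come from a regression on measured activities.

```python
def hansch_pi(log_p_rx, log_p_rh):
    """Hydrophobic substituent constant: pi = log P(RX) - log P(RH)."""
    return log_p_rx - log_p_rh

def hansch_log_inv_c(pi, sigma, a, b, c):
    """Hansch equation: log(1/C) = a*pi + b*sigma + c."""
    return a * pi + b * sigma + c

# Approximate literature log P values, used here only as an illustration
pi_cl = hansch_pi(log_p_rx=2.84, log_p_rh=2.13)

# Hypothetical regression coefficients a, b, c
activity = hansch_log_inv_c(pi_cl, sigma=0.23, a=1.0, b=-0.5, c=3.0)
print(pi_cl, activity)
```

The design mirrors the Hammett treatment: pi plays the role of sigma for hydrophobicity, and the fitted coefficients a and b measure how sensitive the activity is to each effect.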
Molecular Descriptors
  Simple rules for describing some aspect of a molecule
  Structure Property
  2D descriptors use only the atoms and connection information of the molecule
  Internal 3D descriptors use 3D coordinate information about each molecule; however, they are invariant to rotations and translations of the conformation
  External 3D descriptors also use 3D coordinate information but additionally require an absolute frame of reference (e.g., molecules docked into the same receptor)
Descriptor examples
  Physical Properties
    MW
    log P (octanol/water partition)
    bp, mp
    Dipole moment
    Solubility
Descriptor examples
  Structural descriptors
    2D
      Atom/bond counts
      Number of non-H atoms
      Number of rotatable bonds
      Number of each functional group
      2C chains, 3C chains, 4C chains, 5C chains, etc.
      Rings and their size
    3D
      Number of accessible conformations
      Surface area
Topological Descriptors
  Wiener Path Index
  [Figure: carbon skeleton with atoms numbered 1-7]
  Distance Matrix
         1  2  3  4  5  6  7
     1   0  1  2  3  4  2  3
     2   1  0  1  2  3  1  2
     3   2  1  0  1  2  2  1
     4   3  2  1  0  1  3  2
     5   4  3  2  1  0  4  3
     6   2  1  2  3  4  0  3
     7   3  2  1  2  3  3  0
  $w = \sum_i \sum_{j>i} d_{ij}$
  w = 46
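The distance matrix and the Wiener index can be generated from the bond list alone. In the sketch below the bond list is inferred from the entries equal to 1 in the distance matrix above (a five-carbon chain 1-2-3-4-5 with atom 6 on atom 2 and atom 7 on atom 3); the function names are made up, and distances are found by breadth-first search.

```python
from collections import deque

def distance_matrix(n_atoms, bonds):
    """Topological distances (number of bonds on the shortest path) between all atoms."""
    adjacency = {i: [] for i in range(1, n_atoms + 1)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    matrix = []
    for start in range(1, n_atoms + 1):
        # Breadth-first search outward from `start`
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in dist:
                    dist[neighbor] = dist[atom] + 1
                    queue.append(neighbor)
        matrix.append([dist[j] for j in range(1, n_atoms + 1)])
    return matrix

def wiener_index(matrix):
    """Sum of d_ij over all pairs with j > i (upper triangle of the matrix)."""
    n = len(matrix)
    return sum(matrix[i][j] for i in range(n) for j in range(i + 1, n))

# Bond list inferred from the distance matrix (assumption: heavy-atom skeleton only)
bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
d = distance_matrix(7, bonds)
w = wiener_index(d)
print(w)
```

Summing only the upper triangle counts each pair of atoms once, which reproduces w = 46 for this skeleton.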
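Topological Descriptors
  Randić Index
  [Figure: carbon skeleton annotated with vertex valences, bond values, and edge terms]
  Valence at each vertex: 1, 1, 3, 3, 1, 2, 1
  Bond values (product of the valences at the two ends of each bond): 3, 3, 9, 3, 6, 2
  Edge terms (reciprocal of the square root of each bond value): .577, .577, .333, .577, .408, .707
  Sum of edge terms: 3.179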
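The Randić index follows directly from the recipe above: count the bonds at each atom, multiply the two counts for every bond, and sum the reciprocal square roots. The sketch assumes a seven-carbon skeleton whose vertex valences (1, 1, 3, 3, 1, 2, 1) match those listed above, namely a five-carbon chain with one-carbon branches on atoms 2 and 3; the atom numbering and function name are hypothetical.

```python
from math import sqrt

def randic_index(n_atoms, bonds):
    """Sum over bonds of 1 / sqrt(valence_a * valence_b)."""
    valence = {i: 0 for i in range(1, n_atoms + 1)}
    for a, b in bonds:
        valence[a] += 1
        valence[b] += 1
    return sum(1.0 / sqrt(valence[a] * valence[b]) for a, b in bonds)

# Assumed skeleton: chain 1-2-3-4-5 with atom 6 on atom 2 and atom 7 on atom 3
bonds = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (3, 7)]
chi = randic_index(7, bonds)
print(round(chi, 3))
```

Computed without intermediate rounding the sum comes out near 3.181; adding the edge terms after rounding each to three decimals gives 3.179, which accounts for the small difference from the value quoted above.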
Predict bp of alkanes
  [Figure: bp plotted against Wiener Index, with least-squares line y = 1.5225x + 7.2917, R2 = 0.9547]
3D Molecular Descriptors
  Potential energy
  Solvation energy
  Water accessible surface area
  Water accessible surface area of all atoms with positive (negative) partial charge
Pharmacophore
  Specification of the spatial arrangement of a small number of atoms or functional groups
  With the model in hand, search databases for molecules that fit this spatial environment
Creating a Pharmacophore
  [Figure: chemical structures highlighting the O, O, and OH groups that define the pharmacophore]
3D Pharmacophore searching
  With the pharmacophore in hand, search databases containing 3-D structures of molecules for molecules that fit
  Can rank these “hits” using the scoring system described later
Pharmacophore Descriptors
  Number of acidic atoms
  Number of basic atoms
  Number of hydrogen bond donor atoms
  Number of hydrophobic atoms
  Sum of VDW surface areas of hydrophobic atoms
Lipinski’s Rule of 5
  Potential drug candidates should:
    Have 5 or fewer H-bond donors (expressed as the sum of OHs and NHs)
    Have a MW < 500
    Have a log P less than 5
    Have 10 or fewer H-bond acceptors (expressed as the sum of Ns and Os)
  Adv. Drug Delivery Rev., 1997, 23, 3
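The four criteria translate into a simple filter. The sketch below takes precomputed property values as arguments (how they are computed is outside its scope); the function name and the two example property sets are hypothetical.

```python
def lipinski_violations(h_donors, h_acceptors, mol_weight, log_p):
    """Return the list of Rule of 5 criteria that a compound fails."""
    violations = []
    if h_donors > 5:
        violations.append("more than 5 H-bond donors")
    if mol_weight >= 500:
        violations.append("MW not below 500")
    if log_p >= 5:
        violations.append("log P not below 5")
    if h_acceptors > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations

# Hypothetical property sets for two candidate molecules
drug_like = lipinski_violations(h_donors=2, h_acceptors=5, mol_weight=350.4, log_p=2.1)
poor_candidate = lipinski_violations(h_donors=6, h_acceptors=12, mol_weight=620.0, log_p=5.6)
print(drug_like)
print(poor_candidate)
```

Returning the failed criteria rather than a single yes/no makes it easy to apply a looser filter, for example tolerating one violation.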
Docking
  Interact a ligand with a receptor
  Need to do the following:
    A) select appropriate ligands
    B) select appropriate conformation of receptor
    C) select appropriate conformations of ligands
    D) combine the ligand and receptor (docking)
    E) evaluate these combinations and rank order them
Selection of Ligands
  Want drug-like molecules
    250 < MW < 500
    Lipinski’s rules
  Search through databases
    Available Chemicals Directory (ACD)
    World Drug Index
    NCI Drug database
    In-house databases
Receptor Conformation
  Usually the receptor is assumed to be static
  Get structure from X-ray or NMR experiment
  Protein Data Bank (http://www.rcsb.org/pdb/): 41385 structures
Ligand Conformation
  Rigid or flexible
  If rigid, optimize the structure, then use it throughout the docking procedure
  If flexible, can
    A) create a set of low energy conformations and then use this set as a collection of rigid structures in docking
    B) optimize structure within active site of receptor, i.e. dock and optimize together
Docking
  Place ligand in appropriate location for interacting with the receptor
  Methodological problems:
    1) No best method for defining shape
    2) No general solution for packing irregular objects (the knapsack problem)
Docking Algorithmic Components
  Receptor and ligand description (keep in mind relative errors of structures, etc.)
  Bind the ligand to the receptor (configuration/conformation search)
    Geometric search (match ligand and receptor site descriptions)
    Search for minimum energy: molecular dynamics (MD) or Monte Carlo (MC)
  Evaluation of the dock ($\Delta G_{bind}$), also called scoring
Descriptor Matching Method
  DOCK program
    1) Generate molecular surface for receptor
    2) Generate spheres to fill the active site (usually 30-50 spheres)
    3) Match sphere centers to the ligand atoms (originally just the lowest-energy conformer; now multiple conformers are used, but each is still rigid). Generates 10K orientations per ligand. Shape-driven!
    4) Score the interaction
Fragment-Joining Method
  FlexX, LUDI
    Place base fragments into microstates of the active site (fragments can be small molecules like benzene, formaldehyde, formamide, naphthol, etc.)
    Optimize position of the base fragment
    Join fragments with small connecting chains made of CH2, CO, CONH, etc.
Scoring (evaluation of the dock)
  Want to quickly evaluate the strength of the interaction between ligand and receptor
  Full free energy computation
    Expensive
    Requires excellent force fields
  Empirical method
    Fast and cheap
    Requires fitting to a broad set of ligand/receptor complexes
Empirical Scoring
  Method of Böhm (LUDI, FlexX, etc.)
  $\Delta G_{bind} = \Delta G_0 + \Delta G_{hb} \sum_{h\text{-}bonds} f(\Delta R, \Delta\alpha) + \Delta G_{ion} \sum_{ion} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} A_{lipo} + \Delta G_{rot} N_{ROT}$
    $\Delta G_0$: reduction in binding energy due to loss of rotation and translation of ligand
    $\Delta G_{hb}$: contribution from ideal hydrogen bond
    $\Delta G_{ion}$: contribution from ionic interactions
    $\Delta G_{lipo}$: contribution from lipophilic interactions
    $\Delta G_{rot}$: contribution from freezing rotations within ligand
  These come from empirical fits.
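Böhm Method (cont.)
  $f(\Delta R, \Delta\alpha)$ are penalty functions for non-ideal interactions: distances too short/long, angles not linear
  $f(\Delta R, \Delta\alpha) = f_1(\Delta R)\, f_2(\Delta\alpha)$
  $f_1(\Delta R) = 1$ for $\Delta R \le 0.2$ Å; $1 - (\Delta R - 0.2)/0.4$ for $0.2 < \Delta R \le 0.6$ Å; $0$ for $\Delta R > 0.6$ Å
  $f_2(\Delta\alpha) = 1$ for $\Delta\alpha \le 30°$; $1 - (\Delta\alpha - 30)/50$ for $30° < \Delta\alpha \le 80°$; $0$ for $\Delta\alpha > 80°$
  $\Delta R$ is the deviation from the ideal H...O/N distance of 1.9 Å
  $\Delta\alpha$ is the deviation from the ideal N/O-H…O/N angle of 180°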
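The two penalty functions are piecewise linear and are easy to write down directly. The sketch below implements the definitions above; taking the absolute value of the deviations (so that bonds that are too short and too long are penalized alike) is an assumption, as are the function names.

```python
IDEAL_DISTANCE = 1.9   # ideal H...O/N distance in Angstrom
IDEAL_ANGLE = 180.0    # ideal N/O-H...O/N angle in degrees

def f1(delta_r):
    """Distance penalty; delta_r is the deviation from the ideal distance in Angstrom."""
    delta_r = abs(delta_r)
    if delta_r <= 0.2:
        return 1.0
    if delta_r <= 0.6:
        return 1.0 - (delta_r - 0.2) / 0.4
    return 0.0

def f2(delta_alpha):
    """Angle penalty; delta_alpha is the deviation from the ideal angle in degrees."""
    delta_alpha = abs(delta_alpha)
    if delta_alpha <= 30.0:
        return 1.0
    if delta_alpha <= 80.0:
        return 1.0 - (delta_alpha - 30.0) / 50.0
    return 0.0

def hbond_penalty(distance, angle):
    """f(dR, dalpha) = f1(dR) * f2(dalpha) for one hydrogen bond."""
    return f1(distance - IDEAL_DISTANCE) * f2(IDEAL_ANGLE - angle)

print(hbond_penalty(1.9, 180.0))  # ideal geometry, full contribution
print(hbond_penalty(2.3, 155.0))  # stretched bond, partial contribution
print(hbond_penalty(2.6, 180.0))  # too long, no contribution
```

Each function falls linearly from 1 to 0 between its two thresholds, so a hydrogen bond contributes fully near the ideal geometry and not at all once either deviation is large.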
Böhm Method (cont.)
  $A_{lipo}$ is the lipophilic contact surface, evaluated by a coarse grid of boxes
  $N_{ROT}$ is the number of rotatable bonds: acyclic sp3-sp3, sp3-sp2 and sp2-sp2. No terminal groups or flexibility of rings incorporated.
  H.-J. Böhm, J. Comput.-Aided Mol. Des., 1994, 8, 243-256
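Once the individual terms are known, the empirical score is a weighted sum. The sketch below assembles it from precomputed hydrogen-bond and ionic penalty values, the lipophilic contact area, and the rotatable-bond count. The coefficient values are placeholders chosen for easy arithmetic, not the fitted values from the paper, and the function name is hypothetical.

```python
def empirical_binding_score(hbond_penalties, ionic_penalties, lipo_area, n_rot, coeffs):
    """dG_bind = dG0 + dGhb*sum(f_hb) + dGion*sum(f_ion) + dGlipo*A_lipo + dGrot*N_ROT."""
    return (coeffs["dG0"]
            + coeffs["dGhb"] * sum(hbond_penalties)
            + coeffs["dGion"] * sum(ionic_penalties)
            + coeffs["dGlipo"] * lipo_area
            + coeffs["dGrot"] * n_rot)

# Placeholder coefficients (NOT the published fit); negative values favor binding
placeholder_coeffs = {"dG0": 5.0, "dGhb": -5.0, "dGion": -8.0, "dGlipo": -0.2, "dGrot": 1.5}

# Hypothetical complex: two H-bonds (one ideal, one distorted), one ideal ionic
# contact, 100 square Angstrom of lipophilic contact, 4 rotatable bonds
score = empirical_binding_score(
    hbond_penalties=[1.0, 0.5],
    ionic_penalties=[1.0],
    lipo_area=100.0,
    n_rot=4,
    coeffs=placeholder_coeffs,
)
print(score)
```

Passing the coefficients in as data keeps the functional form separate from the fit, which is convenient when comparing variations of the scheme that differ only in their fitted parameters.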
Scoring alternatives
  Many variations on the Böhm scheme
    Buried polar term, desolvation term, different forms for the lipophilic term, metal bonding, etc.
  Combine scoring functions, i.e. QSAR with scoring functions as variables
  Use empirical score to select a set of hits, then refine with free energy minimization