SMA5233 Particle Methods and Molecular Dynamics
Lecture 5: Applications in Biomolecular Simulation and Drug Design
A/P Chen Yu Zong
Tel: 6516-6877
Email: [email protected]
http://bidd.nus.edu.sg
Room 08-14, level 8, S16
National University of Singapore

Proteins
Proteins are life’s machines, tools and structures
– Many jobs, many shapes, many sizes

Proteins
Proteins are life’s machines, tools and structures
– Nature reuses designs for similar jobs
[Figure: example structures with PDB IDs 1hdd, 1enh, 1f43, 1ftt, 1bw5, 1du6, 1cqt]

Proteins
Proteins are hetero-polymers of specific sequence
– There are 20 common polymeric units (amino acids)
– Composed of a variety of basic chemical moieties
– Chain lengths range from 40 amino acids on up
[Figure: example sequence M K L V D Y A G E]

Proteins
Proteins are hetero-polymers that adopt a unique fold
[Figure: example sequence M K L V D Y A G E]

Proteins
Protein folding as a reaction
[Figure: free energy diagram with Reactants, Transition state and Products; free energy axis marked "Bad" and "Good"]

Proteins
Protein folding …
[Figure: free energy diagram with states Native, Transition state and Denatured / Partially Unfolded; free energy axis marked "Bad" and "Good"]

Proteins
Folded proteins
– The native state is the folded, active, functional, biologically relevant state (an ensemble of conformers)
– Static 3D coordinates of some proteins’ atoms are available from X-ray crystallography & NMR
– Static 3D coordinates of some proteins’ atoms are available from the PDB, http://www.pdb.org
[Figure: free energy diagram with states Native, Transition state and Denatured / Partially Unfolded; free energy axis marked "Bad" and "Good"]

Proteins
Folded proteins are complex and dynamic molecules
[Figure: free energy diagram with states Native, Transition state and Denatured / Partially Unfolded; free energy axis marked "Bad" and "Good"]

Molecular Dynamics
MD provides atomic resolution of native dynamics
– PDB ID: 3chy, E. coli CheY, 1.66 Å X-ray crystallography
– 3chy, hydrogens added
– 3chy, waters added (i.e. solvated)
– 3chy, waters and hydrogens hidden
– Native state simulation of 3chy at 298 Kelvin, waters and hydrogens hidden

Proteins
Folding & unfolding at atomic resolution
– The denatured state is a disordered, non-functional, heterogeneous ensemble of conformers
[Figure: free energy diagram with states Native, Transition state and Denatured / Partially Unfolded; free energy axis marked "Bad" and "Good"]

Proteins
Protein folding, why we care how it happens
– Many diseases are related to protein folding and/or misfolding in response to genetic mutation.
– We need to comprehend folding to build nano-scale biomachines (that could produce energy, etc.).
[Figure: free energy diagram with states Native, Transition state and Denatured / Partially Unfolded; three positions are labeled "mutation"]

Proteins
Protein folding takes > 10 μs (often much longer)
[Figure: free energy diagram with states Native, Transition state and Denatured / Partially Unfolded; free energy axis marked "Bad" and "Good"]

Proteins
Protein folding is the reverse of protein unfolding
[Figure: free energy diagram with states Native, Transition state and Denatured / Partially Unfolded; free energy axis marked "Bad" and "Good"]

Proteins
Protein unfolding is relatively invariant to temperature
[Figure: free energy diagram with states Native, Transition state and Denatured / Partially Unfolded, shown as a function of temperature; free energy axis marked "Bad" and "Good"]

Molecular Dynamics
MD provides atomic resolution of folding / unfolding
– Unfolding simulation (reversed) of 3chy at 498 Kelvin, waters & hydrogens hidden

Forces Involved in Protein Folding
– Electrostatic interactions
– van der Waals interactions
– Hydrogen bonds
– Hydrophobic interactions (hydrophobic molecules associate with each other in water as if the water molecules repelled them, much like oil/water separation; the presence of water is essential for this interaction)
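
Energy Functions used in Molecular Simulation

$$
V_{\text{total}} = \sum_{\text{bonds}} K_r (r - r_0)^2
 + \sum_{\text{angles}} K_\theta (\theta - \theta_0)^2
 + \sum_{\text{dihedrals}} K_\phi \left[ 1 + \cos(n\phi) \right]
 + \sum_{\text{Hbonds}} \left[ \frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}} \right]
 + \sum_{\text{van der Waals pairs}} \left[ \frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}} \right]
 + \sum_{\text{electrostatic pairs}} \frac{q_i q_j}{\varepsilon\, r_{ij}}
$$

– Bond stretching term (bond length r)
– Angle bending term (bond angle Θ)
– Dihedral term (torsion angle Φ)
– H-bonding term (distance r between O and H)
– Van der Waals term (pair distance r)
– Electrostatic term (distance r between charges + and −)
The non-bonded pair terms are the most time-demanding part.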
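
A minimal Python sketch of the two pairwise non-bonded terms (van der Waals 12-6 and electrostatic), summed over all atom pairs. It assumes one A and one B shared by every pair and a constant dielectric eps. These simplifications and the name nonbonded_energy are illustrative assumptions, not taken from the lecture. Every atom interacts with every other atom, which is why this part dominates the run time.

```python
import numpy as np

def nonbonded_energy(coords, charges, A, B, eps=1.0):
    """Return (E_vdw, E_elec) summed over all pairs i < j.

    coords  : (N, 3) array of positions
    charges : (N,) array of partial charges
    A, B    : 12-6 coefficients, assumed identical for every pair (simplification)
    eps     : constant dielectric (assumption)
    """
    n = len(coords)
    e_vdw = 0.0
    e_elec = 0.0
    for i in range(n - 1):
        # distances from atom i to all atoms j > i
        d = coords[i + 1:] - coords[i]
        r = np.sqrt((d * d).sum(axis=1))
        e_vdw += np.sum(A / r**12 - B / r**6)
        e_elec += np.sum(charges[i] * charges[i + 1:] / (eps * r))
    return float(e_vdw), float(e_elec)

# Two opposite unit charges one length unit apart (arbitrary units)
pair = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(nonbonded_energy(pair, np.array([1.0, -1.0]), A=1.0, B=2.0))
```

Real force fields use per-pair coefficients, cutoffs and neighbor lists; none of that is attempted here.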

System for MD Simulations
– Without water molecules: 304 atoms
– With water molecules: 304 + 7,377 = 7,681 atoms

MD Requires Huge Computational Cost
– The time step of MD (Δt) is limited to about 1 fsec (10^-15 sec).
  ← Δt should be approximately one-tenth of the period of the fastest motion in the system. In biomolecular simulations the fastest motions are the bond stretching motions of light atoms (e.g. O-H, C-H), whose periods are about 10^-14 sec, so Δt is usually set to about 1 fsec.
– A huge number of water molecules has to be used in biomolecular MD simulations.
  ← The number of atom pairs evaluated for non-bonded interactions (van der Waals, electrostatic) increases as N^2 (N is the number of atoms).
– It is difficult to simulate for a long time. Usually simulations of a few tens of nanoseconds are performed.
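
The two cost drivers above can be put into numbers with a short Python sketch. The function names are illustrative. The atom counts are the ones quoted for the peptide system with and without water, and the pair count is the plain all-pairs count N(N-1)/2, with no cutoff assumed.

```python
def n_steps(sim_time_s, dt_s=1e-15):
    """Number of MD steps needed to cover sim_time_s with time step dt_s."""
    return round(sim_time_s / dt_s)

def n_pairs(n_atoms):
    """Number of distinct atom pairs, N(N-1)/2, which grows as N^2."""
    return n_atoms * (n_atoms - 1) // 2

print(n_steps(10e-9))                 # 10 ns at 1 fs -> 10000000 steps
print(n_pairs(304))                   # peptide only  -> 46056 pairs
print(n_pairs(7681))                  # with water    -> 29495040 pairs
print(n_pairs(7681) / n_pairs(304))   # roughly 640 times more pairs per step
```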

Time Scales of Protein Motions and MD
[Figure: logarithmic time axis from 10^-15 s (fs) through 10^-12 s (ps), 10^-9 s (ns), 10^-6 s (μs) and 10^-3 s (ms) to 10^0 s, locating bond stretching, elastic vibrations of proteins, permeation of an ion in a porin channel, α-helix folding, β-hairpin folding and protein folding, together with the range covered by MD]
It is still difficult to simulate the whole process of protein folding using the conventional MD method.

Much Faster, Much Larger!
Special-purpose computer
– Non-bonded interactions are calculated on a special chip developed only for this purpose.
– For example:
  MDM (Molecular Dynamics Machine) or MD-Grape: RIKEN
  MD Engine: Taisho Pharmaceutical Co. and Fuji Xerox Co.
Parallelization
– A single job is divided into several smaller ones, which are calculated on multiple CPUs simultaneously.
– Today, almost all MD programs for biomolecular simulations (e.g. AMBER, CHARMm, GROMOS, NAMD, MARBLE, etc.) can run on parallel computers.

Brownian Dynamics (BD)
– The dynamic contributions of the solvent are incorporated as a dissipative random force (Einstein’s derivation of 1905). Therefore, water molecules are not treated explicitly.
– Since the BD algorithm is derived under the conditions that solvent damping is large and the inertial memory is lost in a very short time, longer time steps can be used.
– The BD method is suitable for long-time simulation.

System for BD Simulations
– Without water molecules: 304 atoms
– With water molecules: 304 + 7,377 = 7,681 atoms

Algorithm of BD
The Langevin equation can be expressed as

$$ m_i \frac{d^2 \mathbf{r}_i}{dt^2} = -\zeta_i \frac{d\mathbf{r}_i}{dt} + \mathbf{F}_i + \mathbf{R}_i \qquad (7) $$

Here, r_i and m_i represent the position and mass of atom i, respectively. ζ_i is a frictional coefficient determined by Stokes’ law, ζ_i = 6π a_i^Stokes η, in which a_i^Stokes is the Stokes radius of atom i and η is the viscosity of water. F_i is the systematic force on atom i. R_i is a random force on atom i having zero mean, <R_i(t)> = 0, and variance <R_i(t)·R_j(t')> = 6 ζ_i k_B T δ_ij δ(t − t'); it derives from the effects of the solvent.

For the overdamped limit, we set the left-hand side of eq. 7 to zero:

$$ \frac{d\mathbf{r}_i}{dt} = \frac{\mathbf{F}_i}{\zeta_i} + \frac{\mathbf{R}_i}{\zeta_i} \qquad (8) $$

The integrated form of eq. 8 is called Brownian dynamics:

$$ \mathbf{r}_i(t + \Delta t) = \mathbf{r}_i(t) + \frac{\mathbf{F}_i(t)}{\zeta_i}\,\Delta t + \sqrt{\frac{2 k_B T\,\Delta t}{\zeta_i}}\;\boldsymbol{\omega}_i \qquad (9) $$

where Δt is a time step and ω_i is a random noise vector obtained from a Gaussian distribution.
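
A minimal Python sketch of one Brownian dynamics update in the form of eq. 9, with the friction coefficient taken from Stokes' law. Several things here are illustrative assumptions and not values from the lecture: the function names, the SI units, the 2 Å Stokes radius, the water viscosity of about 1e-3 Pa·s, the harmonic test force and the 10 fs step. The noise vector is assumed to have unit variance per component.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K

def stokes_friction(radius, viscosity):
    """Stokes' law: zeta = 6 * pi * a * eta."""
    return 6.0 * np.pi * radius * viscosity

def bd_step(r, force, zeta, dt, temperature, rng):
    """One Brownian dynamics step for positions r of shape (N, 3).

    r(t + dt) = r(t) + F/zeta * dt + sqrt(2 kB T dt / zeta) * omega
    """
    omega = rng.standard_normal(r.shape)      # unit-variance Gaussian noise
    drift = force / zeta[:, None] * dt        # systematic displacement
    kick = np.sqrt(2.0 * K_B * temperature * dt / zeta)[:, None] * omega
    return r + drift + kick

# Hypothetical usage: four beads in a weak harmonic well (all numbers are assumptions)
rng = np.random.default_rng(0)
zeta = np.full(4, stokes_friction(2.0e-10, 1.0e-3))
r = np.zeros((4, 3))
for _ in range(1000):
    force = -1.0e-2 * r                       # hypothetical spring, N/m
    r = bd_step(r, force, zeta, dt=1.0e-14, temperature=298.0, rng=rng)
print(np.abs(r).max())                        # displacement scale in metres
```

No velocities are stored: in the overdamped limit the positions respond directly to the force and the noise, which is what permits the longer time step.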

Computational Time of BD
Computational time required for 1 nsec simulation of a peptide

Algorithm   Computer             # of atoms   Time (sec)   Efficiency
MD          Pentium4 2.8 GHz     7,681        2,057        1.00
BD          Pentium4 2.8 GHz     304          38.8         53.0
BD+MTS†     Pentium4 2.8 GHz     304          12.8         161
BD+MTS†     IBM Regatta 8 CPU    304          3.4          605

† MTS (multiple time step) algorithm: this method reduces the frequency of calculation of the most time-demanding part (the non-bonded energy terms).
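
The efficiency column appears to be the MD run time divided by each method's run time. That definition is inferred from the numbers rather than stated on the slide. A short Python check under that assumption, using the timings from the table:

```python
MD_TIME_S = 2057.0  # MD on Pentium4 2.8 GHz, 7,681 atoms, 1 nsec of simulation

# (algorithm, computer, run time in seconds), copied from the table
timings = [
    ("MD",     "Pentium4 2.8 GHz",  2057.0),
    ("BD",     "Pentium4 2.8 GHz",  38.8),
    ("BD+MTS", "Pentium4 2.8 GHz",  12.8),
    ("BD+MTS", "IBM Regatta 8 CPU", 3.4),
]

def relative_efficiency(time_s, reference_s=MD_TIME_S):
    """Speed-up relative to the MD reference run (assumed definition)."""
    return reference_s / time_s

for algorithm, computer, time_s in timings:
    print(f"{algorithm:7s} {computer:18s} {relative_efficiency(time_s):7.1f}")
```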

Folding Simulation of an α-Helical Peptide using BD
[Figure: the fraction of native contacts (0 to 1.0) versus simulation time (0 to 400 nsec)]

Folding Simulation of a β-Hairpin Peptide using BD
[Figure: the fraction of native contacts (0 to 1.0) versus simulation time (0 to 400 nsec)]
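
The folding plots track the fraction of native contacts. The slides do not say how a contact is defined, so the Python sketch below assumes a simple distance cutoff between residues at least three positions apart. The 6.5 Å cutoff, the separation rule and the function names are all assumptions made for illustration.

```python
import numpy as np

def contact_pairs(coords, cutoff=6.5, min_separation=3):
    """Residue pairs (i, j), with j - i >= min_separation, closer than cutoff."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.triu_indices(n, k=min_separation)
    mask = dist[i, j] < cutoff
    return set(zip(i[mask].tolist(), j[mask].tolist()))

def fraction_native_contacts(coords, native_pairs, cutoff=6.5):
    """Fraction of the native contact pairs that are formed in coords."""
    if not native_pairs:
        return 0.0
    formed = sum(bool(np.linalg.norm(coords[i] - coords[j]) < cutoff)
                 for i, j in native_pairs)
    return formed / len(native_pairs)

# Toy example: a compact "native" chain and a fully extended one (coordinates in Å)
native = np.array([[0, 0, 0], [3, 0, 0], [3, 3, 0], [0, 3, 0], [0, 0, 3]], dtype=float)
extended = np.array([[3.8 * k, 0.0, 0.0] for k in range(5)])
native_pairs = contact_pairs(native)
print(fraction_native_contacts(native, native_pairs))    # 1.0
print(fraction_native_contacts(extended, native_pairs))  # 0.0
```

Evaluated on each frame of a trajectory, this gives the kind of curve shown in the two folding plots.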

Time Scales of Protein Motions and BD
[Figure: logarithmic time axis from 10^-15 s (fs) through 10^-12 s (ps), 10^-9 s (ns), 10^-6 s (μs) and 10^-3 s (ms) to 10^0 s, locating bond stretching, elastic vibrations of proteins, permeation of an ion in a porin channel, α-helix folding, β-hairpin folding and protein folding, together with the ranges covered by MD and by BD]
The BD method allows us to simulate for a long time.

EXAMPLE: Unfolding of Staphylococcal protein A through high temperature MD simulations
D. Alonso and V. Daggett, PNAS 2000; 97: 133-138.

Simulation Methodology
– Starting structures: NMR structures 1edk and 1bbd
– ENCAD program
– The protein was initially minimized for 1,000 steps in vacuo.
– The minimized protein was then solvated with water in a box (approximately 1 g/ml) extending a minimum of 10 Å from the protein.
– The box dimensions were then increased uniformly to yield the experimental liquid water density for the temperature of interest (0.997 g/ml at 298 K and 0.829 g/ml at 498 K).
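
Uniformly enlarging the box lowers the density in proportion to 1/volume, so each box edge is scaled by the cube root of the density ratio. A short Python sketch, assuming the starting density is exactly 1 g/ml (the slide says "approximately"); the function name is illustrative.

```python
def box_scale_factor(rho_initial, rho_target):
    """Factor applied to every box edge so that the density drops to rho_target."""
    return (rho_initial / rho_target) ** (1.0 / 3.0)

for temperature_k, rho in [(298, 0.997), (498, 0.829)]:
    s = box_scale_factor(1.0, rho)
    print(f"{temperature_k} K: scale each box edge by {s:.4f}")
```

Under this assumption the 498 K box edges come out roughly 6% longer than the initial ones, while the 298 K box is almost unchanged.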

Simulation Methodology (cont)
– The systems were then equilibrated by minimizing the water for 2,000 steps, minimizing water and protein for 100 steps, performing MD of the water for 4,000 steps, minimizing the water for 2,000 steps, minimizing the protein for 500 steps, and minimizing the protein and water for 1,000 steps.
– Production MD simulations were then run using a 2-fs time step for several ns (T = 298 K, T = 498 K).

[Figure: RMSD and radius of gyration (RG) as a function of simulation time]
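
A minimal Python sketch of the two quantities in this plot. It assumes the frame has already been superimposed on the reference structure, so no fitting step is included, and it uses equal masses unless masses are supplied. The slides do not say which atoms were used, so the choice of atoms is left to the caller.

```python
import numpy as np

def rmsd(coords, reference):
    """Root-mean-square deviation between two (N, 3) coordinate sets (no fitting)."""
    diff = coords - reference
    return float(np.sqrt((diff * diff).sum(axis=1).mean()))

def radius_of_gyration(coords, masses=None):
    """Mass-weighted RMS distance of the atoms from their center of mass."""
    if masses is None:
        masses = np.ones(len(coords))         # equal masses (assumption)
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = ((coords - center) ** 2).sum(axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

# Two points at x = -1 and x = +1; the "frame" is the reference shifted by one unit
reference = np.array([[-1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
frame = reference + np.array([1.0, 0.0, 0.0])
print(rmsd(frame, reference))          # 1.0
print(radius_of_gyration(reference))   # 1.0
```

In an unfolding run both values are expected to rise as the structure expands away from the starting conformation.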

[Figure: snapshots along the unfolding trajectory]

EXAMPLE 2: Identification of the N-terminal peptide binding site of GRP94
– GRP94: glucose-regulated protein 94
– VSV8 peptide: derived from vesicular stomatitis virus
Gidalevitz T, Biswas C, Ding H, Schneidman-Duhovny D, Wolfson HJ, Stevens F, Radford S, Argon Y. J Biol Chem. 2004

Biological and Drug Design Motivation
– The complex between the two molecules strongly stimulates the response of the T-cells of the immune system.
– The grp94 protein alone does not have this property.
– The activity that stimulates the immune response is due to the ability of grp94 to bind different peptides.
– Characterization of the peptide binding site is highly important for drug design.
– Either the peptides or non-peptide inhibitors derived from them can be developed into drugs for treating immune-related or immune-regulating diseases.

GRP94 molecule
– There was no structure of the grp94 protein. Homology modeling was used to predict a structure, using another protein with 52% identity.
– Recently the structure of grp94 was published. The RMSD between the crystal structure and the model is 1.3 Å.

Docking and Structure Optimization
– PatchDock was applied to dock the two molecules without any binding site constraints, followed by MD or MM simulation to optimize the docked structure.
– The docking results clustered in two cavities:

GRP94 molecule
– There is a binding site for inhibitors between the helices.
– There is another cavity, formed by the beta sheet, on the opposite side.

Advantages of Fully Atomic Models
– Detailed level of description of protein and solvent
Disadvantages of Fully Atomic Models
– Computationally very costly
– Cannot reach the long time and length scales of biological interest
  Can use enhanced sampling techniques (see talks by Andrij and Arturo)
  Can use simplified (coarse-grained) models

Molecular Dynamics
Scalable, parallel MD & analysis software: in lucem Molecular Mechanics (ilmm) [1]
1. Beck, Alonso, Daggett (2004), University of Washington, Seattle

Molecular Dynamics
ilmm is written in C (ANSI / POSIX)
– 64 bit math
– POSIX threads / MPI
Software design philosophy:
– Kernel
  Compiles user’s molecular mechanics programs
  Schedules execution across processors and machines
– Modules, e.g.
  Molecular Dynamics
  Analysis
[Figure: CPUs linked by POSIX threads (multiprocessor machines) + CPUs linked by the Message Passing Interface (multiple machines); labeled "VERY high bandwidth"]

Dynameomics
Simulate a representative protein from all folds [1]
[Figure: population coverage versus fold; 150 folds represent ~75% of known protein structures]
1. Day R., Beck D. A. C., Armen R., Daggett V. Protein Science (2003) 12(10): 2150-2160.

Dynameomics
Simulate a representative protein from all folds
– Native (folded) dynamics
  20 nanosecond simulation at 298 Kelvin
– Folding / unfolding pathway
  3 x 2 ns simulations at 498 K
  2 x 20 ns simulations at 498 K
– Each target requires 6 simulations = MANY CPU HOURS

Dynameomics
NERSC DOE INCITE award
– 2,000,000+ hours
– 906 simulations of 151 protein folds on Seaborg
– One to two simulations per node (8 – 16 CPUs / simulation)
– Opportunity to tune ilmm for maximum performance
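
The simulation count follows directly from the per-target protocol (one 20 ns run at 298 K, three 2 ns runs and two 20 ns runs at 498 K). A short Python tally, with illustrative variable names: the nominal total comes to roughly 10 μs, the same order as the ~11 μs of combined simulation time quoted for the project. The sketch does not try to account for the difference.

```python
# (number of runs, length in ns, temperature in K) for one target
protocol = [
    (1, 20, 298),   # native-state dynamics
    (3, 2, 498),    # short unfolding runs
    (2, 20, 498),   # long unfolding runs
]
n_targets = 151

runs_per_target = sum(n for n, _, _ in protocol)
ns_per_target = sum(n * length for n, length, _ in protocol)
total_runs = n_targets * runs_per_target
total_us = n_targets * ns_per_target / 1000.0

print(runs_per_target)   # 6
print(total_runs)        # 906
print(ns_per_target)     # 66
print(total_us)          # nominal total in microseconds, about 10
```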

Dynameomics
Load balancing
– Even distribution of non-bonded pairs to processors
– ~20% faster
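
To see why distributing pairs rather than atoms matters, the Python sketch below compares two ways of splitting an all-pairs list over workers. It is only an illustration of the idea and is not ilmm's actual scheme. The function names are hypothetical, and a real code would split a cutoff-based neighbor list.

```python
def pairs_per_worker_by_atom(n_atoms, n_workers):
    """Naive split: contiguous blocks of atoms i, each atom owning its pairs (i, j > i)."""
    counts = [0] * n_workers
    block = -(-n_atoms // n_workers)          # ceiling division
    for i in range(n_atoms):
        counts[i // block] += n_atoms - 1 - i
    return counts

def pairs_per_worker_even(n_atoms, n_workers):
    """Balanced split: divide the pair list itself as evenly as possible."""
    total = n_atoms * (n_atoms - 1) // 2
    base, extra = divmod(total, n_workers)
    return [base + (1 if w < extra else 0) for w in range(n_workers)]

print(pairs_per_worker_by_atom(8, 4))   # [13, 9, 5, 1]
print(pairs_per_worker_even(8, 4))      # [7, 7, 7, 7]
```

With the naive split the first worker carries most of the pairs and the others wait for it; the even split keeps all workers busy for about the same time.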

Dynameomics
Parallel efficiency
– Threaded computations on 16 CPU IBM Nighthawk
[Figure: parallel efficiency (0 to 1) for 1, 2, 4, 8, 12 and 16 CPUs]

$$ e(p) = \frac{1}{p}\,\frac{t(1)}{t(p)} $$

where p is the number of processors, t(p) is the run-time using p processors, and e(p) is the parallel efficiency.
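
The definition of parallel efficiency translates directly into Python. The timings in the usage lines are hypothetical round numbers chosen for illustration, not measurements from the slide.

```python
def parallel_efficiency(t1, tp, p):
    """e(p) = (1 / p) * t(1) / t(p): 1.0 means perfect scaling."""
    return t1 / (p * tp)

# Hypothetical timings in seconds (not from the lecture)
print(parallel_efficiency(100.0, 25.0, 4))    # 1.0, ideal speed-up on 4 CPUs
print(parallel_efficiency(100.0, 12.5, 16))   # 0.5, half the ideal on 16 CPUs
```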

Dynameomics
Simulate a representative from the top 151 folds
– 151 folds represent about 75% of known proteins
– ~11 μs of combined simulation time from 906 simulations!
– ~2 terabytes of data (w/ 40 to 60% compression!)
– ~75 / 151 have been analyzed
– Validated against experiment where possible

Dynameomics
Now what?
– Simulate the top 1130 folds (>90%)
  More CPU time
– Share simulation data from the top 151 folds w/ the world: www.dynameomics.org
  Coordinates and analyses available via WWW
  Microsoft SQL database w/ On-Line Analytical Processing (OLAP)
  End-user queries of coordinate data, analyses, etc.
– Data mining
  More CPU time, clever statistical algorithms, etc.