SMA5233 Particle Methods and Molecular Dynamics Lecture 5: Applications in Biomolecular Simulation...

Preview:

Citation preview

SMA5233 SMA5233 Particle Methods and Molecular DynamicsParticle Methods and Molecular Dynamics

Lecture 5: Applications in Biomolecular Simulation and Lecture 5: Applications in Biomolecular Simulation and Drug Design Drug Design

A/P Chen Yu ZongA/P Chen Yu Zong

Tel: 6516-6877Tel: 6516-6877Email: Email: phacyz@nus.edu.sgphacyz@nus.edu.sg

http://http://bidd.nus.edu.sgbidd.nus.edu.sgRoom 08-14, level 8, S16 Room 08-14, level 8, S16

National University of SingaporeNational University of Singapore

22

Proteins Proteins Proteins are life’s machines, tools and structuresProteins are life’s machines, tools and structures– Many jobs, many shapes, many sizesMany jobs, many shapes, many sizes

33

Proteins Proteins Proteins are life’s machines, tools and structuresProteins are life’s machines, tools and structures– Nature reuses designs for similar jobsNature reuses designs for similar jobs

1hdd

1enh 1f43 1ftt

1bw5 1du6 1cqt

44

Proteins Proteins

Proteins are hetero-polymers of specific Proteins are hetero-polymers of specific sequencesequence

– There are 20 common polymeric units (amino There are 20 common polymeric units (amino acids)acids)

Composed of a variety of basic chemical moietiesComposed of a variety of basic chemical moieties

– Chain lengths range from 40 amino acids on upChain lengths range from 40 amino acids on up

M K L V D Y A G E

55

Proteins Proteins Proteins are hetero-polymers that adopt a Proteins are hetero-polymers that adopt a unique foldunique fold

M K L V D Y A G E

66

Proteins Proteins Protein folding as a reactionProtein folding as a reaction

Products

Reactants

Transition state

Free

Energy

Bad

Good

77

Proteins Proteins Protein folding … Protein folding …

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Bad

Good

88

Proteins Proteins Folded proteinsFolded proteins

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Folded, active, functional, biologically relevant state (ensemble of conformers)

Bad

Good

99

Proteins Proteins Folded proteinsFolded proteins

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Static, 3D coordinates of some proteins’ atoms are available from x-ray crystallography & NMR

Bad

Good

1010

Proteins Proteins Folded proteinsFolded proteins

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Static, 3D coordinates of some proteins’ atoms are available from PDB http://www.pdb.org

Bad

Good

1111

Proteins Proteins Folded proteins are complex and dynamic Folded proteins are complex and dynamic moleculesmolecules

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Bad

Good

1212

Proteins Proteins Folded proteins are complex and dynamic Folded proteins are complex and dynamic moleculesmolecules

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Bad

Good

1313

Molecular DynamicsMolecular DynamicsMD provides atomic resolution of native MD provides atomic resolution of native dynamicsdynamics

PDB ID: 3chy, E. coli CheY 1.66 Å X-ray crystallography

1414

Molecular DynamicsMolecular DynamicsMD provides atomic resolution of native MD provides atomic resolution of native dynamicsdynamics

PDB ID: 3chy, E. coli CheY 1.66 Å X-ray crystallography

1515

Molecular DynamicsMolecular DynamicsMD provides atomic resolution of native MD provides atomic resolution of native dynamicsdynamics

3chy, hydrogens added

1616

Molecular DynamicsMolecular DynamicsMD provides atomic resolution of native MD provides atomic resolution of native dynamicsdynamics

3chy, waters added (i.e. solvated)

1717

Molecular DynamicsMolecular DynamicsMD provides atomic resolution of native MD provides atomic resolution of native dynamicsdynamics

3chy, waters and hydrogens hidden

1818

Molecular DynamicsMolecular DynamicsMD provides atomic resolution of native MD provides atomic resolution of native dynamicsdynamics

native state simulation of 3chy at 298 Kelvin, waters and hydrogens hidden

1919

Proteins Proteins Folding & unfolding at atomic resolutionFolding & unfolding at atomic resolution

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Disordered, non-functional, heterogeneous ensemble of conformers

Bad

Good

2020

Proteins Proteins Protein folding, why we care how it happensProtein folding, why we care how it happens

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

mutation

mutation

mutation

Many diseases are related to protein folding and / or misfolding in response to genetic mutation.

2121

Proteins Proteins Protein folding, why we care how it happensProtein folding, why we care how it happens

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

mutation

mutation

mutation

We need to comprehend folding to build nano-scale biomachines (that could produce energy, etc…)

2222

Proteins Proteins Protein folding takes > 10 Protein folding takes > 10 μμs (often much s (often much longer)longer)

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Bad

Good

2323

Proteins Proteins Protein folding takes > 10 Protein folding takes > 10 μμs (often much s (often much longer)longer)

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Bad

Good

2424

Proteins Proteins Protein folding is the reverse of protein Protein folding is the reverse of protein unfoldingunfolding

Native

Denatured /

Partially Unfolded

Transition state

Free

Energy

Bad

Good

2525

Proteins Proteins Protein unfolding is relatively invariant to Protein unfolding is relatively invariant to temperaturetemperature

Native Denatured /

Partially Unfolded

Transition state

Free

Energy

Temperature

Bad

Good

2626

Molecular DynamicsMolecular DynamicsMD provides atomic resolution of folding / MD provides atomic resolution of folding / unfoldingunfolding

unfolding simulation (reversed) of 3chy at 498 Kelvin, waters & hydrogens hidden

2727

Forces Involved in the Protein Forces Involved in the Protein FoldingFolding

Electrostatic interactionsElectrostatic interactions

van der Waals interactionsvan der Waals interactions

Hydrogen bondsHydrogen bonds

Hydrophobic interactionsHydrophobic interactions ((Hydrophobic molecules associate with each other in Hydrophobic molecules associate with each other in water solvent as if water molecules is the repellent to water solvent as if water molecules is the repellent to them. It is like oil/water separation. them. It is like oil/water separation. The presence of The presence of water is important for this interactionwater is important for this interaction.).)

2828

Energy Functions used in Energy Functions used in Molecular SimulationMolecular Simulation

pairs ,ticelectrosta

pairs , der Waalsvan

612

Hbonds

1012

dihedralsangles

2

0

bonds

2

0totalcos1

jiij

ji

jiij

ij

ij

ij

ij

ij

ij

ij

b

r

qq

r

B

r

A

r

D

r

C

nKKrrKV

Electrostatic term

H-bonding term

Van der Waals term

Bond stretching term

Dihedral termAngle bending term

r ΦΘ

+ ーO H

rr r

The most time demanding part.

2929

System for MD SimulationsSystem for MD Simulations

Without water molecules With water molecules

# of atoms: 304

# of atoms: 304 + 7,377 = 7,681

3030

MD Requires Huge MD Requires Huge Computational CostComputational Cost

Time step of MD (Time step of MD (ΔΔtt) is limited up to about 1 fsec ) is limited up to about 1 fsec ((1010-15 -15 sec)sec)..←← The size of The size of ΔΔt t should beshould be approximately one-tenth the time of the fastest motion in approximately one-tenth the time of the fastest motion in the system. For simulation of a protein, because bond stretching motions of light the system. For simulation of a protein, because bond stretching motions of light atoms (ex. O-H, C-H), whose periods are about atoms (ex. O-H, C-H), whose periods are about 1010-14 -14 sec, are the fastest motions in sec, are the fastest motions in the system for biomolecular simulations, the system for biomolecular simulations, ΔΔtt is usually set to about 1 fsec. is usually set to about 1 fsec.

Huge number of water molecules have to be used in biomolecular Huge number of water molecules have to be used in biomolecular MD simulations.MD simulations.← ← The number of atom-pairs evaluated for non-bonded interactions (van der Waals, The number of atom-pairs evaluated for non-bonded interactions (van der Waals, electrostatic interactions) increases in order of electrostatic interactions) increases in order of N N 2 2 ((NN is the number of atoms). is the number of atoms).

It is difficult to simulate for long time. It is difficult to simulate for long time. Usually a few tens of Usually a few tens of nanoseconds simulation is performed.nanoseconds simulation is performed.

3131

Time Scales of Protein Motions Time Scales of Protein Motions and MDand MD

Time

10-15 10-610-910-12 10-3 100

(s)(fs) (ps) (μs)(ns) (ms)

Bond stretching

Permeation of an ion in Porin channel

Elastic vibrations of proteins

It is still difficult to simulate a whole process of a protein folding using the conventional MD method.

MD

α-Helix folding

β-Hairpin folding

Protein folding

3232

Much Faster, Much Larger!Much Faster, Much Larger!

Special-purpose computerSpecial-purpose computer– Calculation of non-bonded interactions is performed using the Calculation of non-bonded interactions is performed using the

special chip that is developed only for this purpose. special chip that is developed only for this purpose. – For example;For example;

MDM (Molecular Dynamics Machine) or MD-GrapeMDM (Molecular Dynamics Machine) or MD-Grape: : RIKENRIKENMD EngineMD Engine: : Taisho Pharmaceutical Co., and Fuji Xerox Co. Taisho Pharmaceutical Co., and Fuji Xerox Co.

ParallelizationParallelization– A single job is divided into several smaller ones and they are A single job is divided into several smaller ones and they are

calculated on multi CPUs simultaneously. calculated on multi CPUs simultaneously. – Today, almost MD programs for biomolecular simulations (ex. Today, almost MD programs for biomolecular simulations (ex.

AMBER, CHARMm, GROMOS, NAMD, MARBLEAMBER, CHARMm, GROMOS, NAMD, MARBLE,, etc) can run etc) can run on parallel computers.on parallel computers.

3333

Brownian Dynamics (BD)Brownian Dynamics (BD)The dynamic contributions of the solvent are The dynamic contributions of the solvent are incorporated as a dissipative random force incorporated as a dissipative random force (Einstein’s derivation on 1905). Therefore, (Einstein’s derivation on 1905). Therefore, water water molecules are not treated explicitly.molecules are not treated explicitly.Since BD algorithm is derived under the conditions Since BD algorithm is derived under the conditions that solvent damping is large and the inertial that solvent damping is large and the inertial memory is lost in a very short time, memory is lost in a very short time, longer time-longer time-steps can be used.steps can be used.

BD method is suitable for long time simulation.BD method is suitable for long time simulation.

3434

System for BDSystem for BD SimulationsSimulations

Without water molecules With water molecules

# of atoms: 304

# of atoms: 304 + 7,377 = 7,681

3535

Algorithm of BDAlgorithm of BD

The The Langevin equationLangevin equation can be expressed as can be expressed as

Here, Here, rrii and and mmii represent the position and mass of atom represent the position and mass of atom ii, respectively. , respectively. ζζii is a frictional is a frictional coefficient and is determined by the Stokes’ law, that is, coefficient and is determined by the Stokes’ law, that is, ζζii = 6 = 6ππaaii

StokesStokesηη in which in which aaiiStokesStokes is a is a

Stokes radius of atom Stokes radius of atom ii and and ηη is the viscosity of water. is the viscosity of water. FFii is the systematic force on atom is the systematic force on atom ii. . RRii is a random force on atom is a random force on atom ii having a zero mean < having a zero mean <RRii((tt)> = 0 and a variance <)> = 0 and a variance <RRii((tt))RRjj((tt)> = )> = 66ζζiikTkTδδijijδδ((tt));; this derives from the effects of solvent. this derives from the effects of solvent.

For the overdamped limit, we set the left of eq.7 to zero,For the overdamped limit, we set the left of eq.7 to zero,

The integrated equation of eq. 8 is called The integrated equation of eq. 8 is called Brownian dynamicsBrownian dynamics;;

where where ΔtΔt is a time step and is a time step and ωωii is a random noise vector obtained from Gaussian is a random noise vector obtained from Gaussian distribution.distribution.

iii

ii

i ttm RF

rr

d

d

d

d2

2

iii

it

RFr

d

d

i

ii

i

iit

Tkt

tttt ω

Frr

B

2)()()(

(7)

(9)

(8)

3636

Computational Time of BDComputational Time of BD

AlgorithmAlgorithm ComputerComputer# # of of

atomsatomsTime Time (sec)(sec)

EfficiencyEfficiency

MDMDPentium4 Pentium4 2.8 GHz2.8 GHz

7,6817,681 2,0572,057 1.001.00

BDBDPentium4 Pentium4 2.8 GHz2.8 GHz

304304 38.838.8 53.053.0

BDBD+MTS+MTS††

Pentium4 Pentium4 2.8 GHz2.8 GHz

304304 12.812.8 161161

BDBD+MTS+MTS††

IBM IBM Regatta Regatta 8 CPU8 CPU

304304 3.43.4 605605

†MTS(Multiple time step) algorithm: This method reduces the frequency of calculation of the most time-demanding part ( non-bonded energy terms ) .

Computational time required for 1 nsec simulation of a peptide

3737

Folding Simulation of an Folding Simulation of an αα-Helical Peptide using BD-Helical Peptide using BD

Th

e f

ract

ion

of

nati

ve c

on

tact

s

Simulation time (nsec)0 300200100 400

0

0.20.40.60.81.0

3838

Folding Simulation of an Folding Simulation of an ββ-Hairpin Peptide using BD-Hairpin Peptide using BD

Th

e f

ract

ion

of

nati

ve c

on

tact

s

Simulation time (nsec)0 300200100 400

00.20.40.60.81.0

3939

Time Scales of Protein Motions Time Scales of Protein Motions and BDand BD

Time

10-15 10-610-910-12 10-3 100

(s)(fs) (ps) (μs)(ns) (ms)

BD method allows us to simulate for long time.

BD

α-Helix folding

β-Hairpin folding

Protein folding

MD

Bond stretching

Permeation of an ion in Porin channel

Elastic vibrations of proteins

4040

EXAMPLE: Unfolding of Staphylococcal EXAMPLE: Unfolding of Staphylococcal protein A through high temperature MD protein A through high temperature MD

simulationssimulations

D. Alonso and V. Daggett, PNAS 2000; 97: 133-138.

4141

Simulation MethodologySimulation Methodology

Starting structure: NMR structures 1edk and 1bbd Starting structure: NMR structures 1edk and 1bbd

ENCAD programENCAD program

The protein was initiallyThe protein was initially minimizedminimized 1,000 steps 1,000 steps in vacuoin vacuo. .

The minimized protein was thenThe minimized protein was then solvatedsolvated with water in a with water in a box (approximately 1 g/ml) extending a minimum of box (approximately 1 g/ml) extending a minimum of 10 Å from the protein10 Å from the protein

The box dimensions were then increased uniformly to yield The box dimensions were then increased uniformly to yield the experimental liquid water density for the temperature of the experimental liquid water density for the temperature of interest (0.997 g/ml at 298 K and 0.829 at 498 K) interest (0.997 g/ml at 298 K and 0.829 at 498 K)

4242

Simulation Methodology (cont)Simulation Methodology (cont)

The systems were then The systems were then equilibratedequilibrated by minimizing the by minimizing the water for 2,000 steps, minimizing water and protein for water for 2,000 steps, minimizing water and protein for 100 steps, performing MD of the water for 4,000 steps, 100 steps, performing MD of the water for 4,000 steps, minimizing the water for 2,000 steps, minimizing the minimizing the water for 2,000 steps, minimizing the protein for 500 steps, and minimizing the protein and water protein for 500 steps, and minimizing the protein and water for 1,000 steps. for 1,000 steps.

ProductionProduction MD simulations were then run using a 2-fs MD simulations were then run using a 2-fs time step for several ns (T=298K, T=498K)time step for several ns (T=298K, T=498K)

4343

RMSD and RG as a function of simulation time

4444

Snapshots along the unfolding trajectory

4545

EXAMPLE 2: Identification of the N-EXAMPLE 2: Identification of the N-terminal peptide binding site of GRP94terminal peptide binding site of GRP94

GRP94 GRP94 - Glucose - Glucose regulated protein 94regulated protein 94

VSV8 peptide VSV8 peptide - derived from - derived from vesicular stomatitis virusvesicular stomatitis virus

Gidalevitz T, Biswas C, Ding H, Schneidman-Duhovny D, Wolfson HJ, Stevens F, Radford S, Argon Y. J Biol Chem. 2004

4646

Biological and Drug Design Biological and Drug Design MotivationMotivation

The complex between the two molecules highly The complex between the two molecules highly stimulates the response of the T-cells of the immune stimulates the response of the T-cells of the immune system.system. The grp94 protein alone does not have this property. The grp94 protein alone does not have this property. The activity that stimulates the immune response is The activity that stimulates the immune response is due to the ability of grp94 to bind different peptides.due to the ability of grp94 to bind different peptides. Characterization of peptide binding site is highly Characterization of peptide binding site is highly important for drug design. important for drug design. Either the peptides or their derived non-peptide Either the peptides or their derived non-peptide inhibitors can be developed into drugs for treating inhibitors can be developed into drugs for treating immune related or immune-regulating diseasesimmune related or immune-regulating diseases

4747

GRP94 moleculeGRP94 molecule

There was no structure of grp94 protein. Homology There was no structure of grp94 protein. Homology modeling was used to predict a structure using another modeling was used to predict a structure using another protein with 52% identity.protein with 52% identity.

Recently the structure of grp94 was published. The RMSD Recently the structure of grp94 was published. The RMSD between the crystal structure and the model is 1.3A.between the crystal structure and the model is 1.3A.

4848

Docking and Structure Docking and Structure OptimizationOptimization

PatchDock was applied to dock the two molecules, without any binding site PatchDock was applied to dock the two molecules, without any binding site constraints followed by MD or MM simulation to optimize the docked structure. constraints followed by MD or MM simulation to optimize the docked structure. Docking results were clustered in the two cavities:Docking results were clustered in the two cavities:

4949

GRP94 moleculeGRP94 molecule There is a binding site for inhibitors between the helices.There is a binding site for inhibitors between the helices. There is another cavity produced by beta sheet on the There is another cavity produced by beta sheet on the opposite side.opposite side.

5050

Advantages of Fully Atomic ModelsAdvantages of Fully Atomic Models

Computationally very costlyComputationally very costly

Cannot reach the long time and length scales of Cannot reach the long time and length scales of biological interestbiological interest

Disadvantages of Fully Atomic ModelsDisadvantages of Fully Atomic Models

Detailed level of description of protein and solventDetailed level of description of protein and solvent

Can use enhanced sampling techniques (see talks by Andrij and Arturo)

Can use simplified (coarse-grained) models

5151

Molecular DynamicsMolecular Dynamics

Scalable, parallel MD & analysis software:Scalable, parallel MD & analysis software:

in lucem Molecular Mechanics1

ilmm

1. Beck, Alonso, Daggett, (2004) University of Washington, Seattle

5252

Molecular DynamicsMolecular Dynamicsililmmmm isis written in C (ANSI / POSIX)written in C (ANSI / POSIX)64 bit math64 bit mathPOSIX threads / MPIPOSIX threads / MPI

Software design philosophy:Software design philosophy:– KernelKernel

Compiles user’s molecular mechanics programsCompiles user’s molecular mechanics programsSchedules execution across processor and machinesSchedules execution across processor and machines

– Modules, e.g.Modules, e.g.Molecular DynamicsMolecular DynamicsAnalysisAnalysis

CPU CPU

POSIX threads

(multiprocessor machines)

CPU CPU

Message Passing Interface

(multiple machines)

+

VERY high bandwidth

5353

Molecular DynamicsMolecular Dynamicsililmmmm isis written in C (ANSI / POSIX)written in C (ANSI / POSIX)64 bit math64 bit mathPOSIX threads / MPIPOSIX threads / MPI

Software design philosophy:Software design philosophy:– KernelKernel

Compiles user’s molecular mechanics programsCompiles user’s molecular mechanics programsSchedules execution across processor and machinesSchedules execution across processor and machines

– Modules, e.g.Modules, e.g.Molecular DynamicsMolecular DynamicsAnalysisAnalysis

CPU CPU

POSIX threads

(multiprocessor machines)

CPU CPU

Message Passing Interface

(multiple machines)

+

VERY high bandwidth

5454

Dynameomics Dynameomics Simulate representative protein from all foldsSimulate representative protein from all folds

1. Day R., Beck D. A. C., Armen R., Daggett V. Protein Science (2003) 10: 2150-2160.

fold

pop

ula

tion

cove

rag

e

150 folds represent ~ 75%of known protein structures

fold

1

5555

Dynameomics Dynameomics

Simulate representative protein from all Simulate representative protein from all foldsfolds– Native (folded) dynamicsNative (folded) dynamics

20 nanosecond simulation at 298 Kelvin20 nanosecond simulation at 298 Kelvin

– Folding / unfolding pathwayFolding / unfolding pathway3 x 2 ns simulations at 498 K3 x 2 ns simulations at 498 K

2 x 20 ns simulations at 498 K2 x 20 ns simulations at 498 K

– Each target requires 6 simulationsEach target requires 6 simulations

==

MANY CPU HOURSMANY CPU HOURS

5656

Dynameomics Dynameomics

NERSC DOE INCITE awardNERSC DOE INCITE award– 2,000,000 + hours2,000,000 + hours– 906 simulations of 151 protein folds on Seaborg906 simulations of 151 protein folds on Seaborg

– One to two simulations per node (8 – 16 CPUs / One to two simulations per node (8 – 16 CPUs / simulation)simulation)

– Opportunity to tune Opportunity to tune ililmm for maximum mm for maximum performanceperformance

5757

DynameomicsDynameomics

Load balancingLoad balancing– Even distribution of non-bonded pairs to processorsEven distribution of non-bonded pairs to processors

~20%faster

5858

DynameomicsDynameomics

Parallel efficiency Parallel efficiency – Threaded computations on 16 CPU IBM NighthawkThreaded computations on 16 CPU IBM Nighthawk

0

0.2

0.4

0.6

0.8

1

1CPU

2CPU

4CPU

8CPU

12CPU

16CPU

par

alle

l ef

fici

ency

p, number of processors

t(p), run-time using p processors

)(

)1(1)(

pt

t

ppeparallel efficiency,

5959

Dynameomics Dynameomics

Simulate representative from top 151 foldsSimulate representative from top 151 folds– 151 folds represent about 75% of known 151 folds represent about 75% of known

proteinsproteins~ 11 ~ 11 μμs of combined sim. time from 906 sims!s of combined sim. time from 906 sims!

~ 2 terabytes of data (w/ 40 to 60% compression!)~ 2 terabytes of data (w/ 40 to 60% compression!)

~ 75 / 151 have been analyzed~ 75 / 151 have been analyzed

Validated against experiment where possibleValidated against experiment where possible

6060

Dynameomics Dynameomics Now what?Now what?– Simulate the top 1130 folds (>90%)Simulate the top 1130 folds (>90%)

More CPU timeMore CPU time– Share simulation data from top 151 folds w/ world:Share simulation data from top 151 folds w/ world:

www.dynameomics.orgwww.dynameomics.org

Coordinates, analyses, available via WWWCoordinates, analyses, available via WWW

MicrosoftSQL database w/ On-Line Analytical MicrosoftSQL database w/ On-Line Analytical Processing (OLAP)Processing (OLAP)

End-user queries of coordinate data, analyses, etc.End-user queries of coordinate data, analyses, etc.– Data miningData mining

More CPU time, clever statistical algorithms, etc.More CPU time, clever statistical algorithms, etc.

Recommended