BioSimGRID: A GRID Database of Biomolecular Simulations Mark S.P. Sansom mark@biop.ox.ac.uk

BioSimGRID: A GRID Database of Biomolecular Simulations

Mark S.P. Sansom

http://indigo1.biop.ox.ac.uk

mark@biop.ox.ac.uk

Overview

Introduction to biomolecular simulations

www.biosimgrid.org

Why?

Case study – added value from comparisons

How?

Progress towards a prototype of BioSimGRID

The future?

Towards computational systems biology

MD Simulations: from Structure to Dynamics

Molecular simulations as a tool for protein structure analysis

MD – Newtonian simulation of molecular dynamics using an empirical forcefield

Why? - Proteins move

X-ray structure: average structure at 100 K in crystal

MD simulations: dynamics at 300 K in water (& membrane)

Challenge: to relate structural dynamics to biological function

Molecular Dynamics

Describe the forces on all atoms:

bonded (bonds, angles, dihedrals)

non-bonded (van der Waals, electrostatics)

Describe the initial atom positions

Integrate: F = ma (a few million times…)

Result: positions and energies of all atoms during a few nanoseconds

Applications: liquids … peptides … proteins … membranes

Membrane + protein + water = ca. 50,000 atoms

Need for comparative analysis of simulations – GRID data and collaboration

Need for efficient parallelisation – clusters and/or HPC
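The "Integrate: F = ma" step above can be illustrated with a minimal sketch. The slides do not name an integrator, so the velocity Verlet scheme used here is an assumption (it is a common choice in MD codes), and the one-dimensional harmonic "bond" stands in for a full empirical forcefield. All function names and parameters are hypothetical.

```python
import math


def harmonic_force(x, k=1.0, x0=0.0):
    """Force from a single harmonic 'bond' term: F = -k (x - x0)."""
    return -k * (x - x0)


def harmonic_energy(x, k=1.0, x0=0.0):
    """Potential energy of the same harmonic term: U = k (x - x0)^2 / 2."""
    return 0.5 * k * (x - x0) ** 2


def velocity_verlet(x, v, mass, dt, n_steps,
                    force=harmonic_force, energy=harmonic_energy):
    """Integrate F = ma for one particle in one dimension.

    Returns two lists with one entry per step: positions and total energies.
    Velocity Verlet is assumed here; the slides only say 'Integrate: F = ma'.
    """
    positions, energies = [], []
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        positions.append(x)
        energies.append(0.5 * mass * v * v + energy(x))
    return positions, energies


# Start at x = 1 with zero velocity and take 1000 steps of dt = 0.01.
positions, energies = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
```

A real simulation repeats the same loop for every atom in three dimensions, with the force summed over all bonded and non-bonded terms, which is why a few nanoseconds requires a few million steps.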

Current Paradigm for MD Simulations

Target selection: literature based; interesting protein/problem

System preparation: highly interactive; slow; idiosyncratic

Simulation: diversity of protocols

Analysis: highly interactive; slow; idiosyncratic

Dissemination: traditional – papers, posters, talks

Archival: ‘archive’ data … and then mislay the tape!

Integrating Simulations and Structural Biology of Proteins

Novel structure (RCSB)

Sequence alignment

Biomedically relevant homologue(s)

Homology model(s)

MD simulations

Biomolecular simulation database

Comparative analysis

Evaluation/refinement of model

Biological and pharmacological simulation & modelling, e.g. drug discovery

bacterial K channel

mammalian K channel

dynamics in membrane

drug docking calculations

Interaction site dynamics

[Figure labels: bioinformatics & structural biology; BioSimGRID; drug discovery]

Comparative Simulations: Drug Receptors

Why? – increase significance of results

Sampling – long simulations and multiple simulations

Sampling via biology – exploiting evolution

Biology emerges from comparisons…

e.g. mammalian receptor vs. bacterial binding protein

Rat GluR2 EC fragment

Major receptor in mammalian brains – drug target

MD simulations with/without bound ligands

Analyse inter-domain motions

[Figure labels: glutamate; S1; S2]

GluR2 – Flexibility & Gating…

Flexibility depends on ligand occupancy & species

Gating mechanism – decrease in flexibility on channel activation

But … incomplete sampling

Need: longer simulations & comparative simulations

[Figure: flexibility ranking empty >> Kainate > Glutamate, with states labelled “OFF” and “ON”]

[Figure: RMSD versus time (ns), 0 to 2.0 ns, for the empty, +Kai and +Glu simulations]
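The RMSD-versus-time comparison for the GluR2 simulations can be sketched as follows. This is a minimal illustration, not the analysis used in the talk: it assumes frames are plain lists of (x, y, z) tuples and it skips the least-squares superposition that is normally applied before computing RMSD. The function names are hypothetical.

```python
import math


def rmsd(frame_a, frame_b):
    """Root-mean-square deviation between two frames.

    Each frame is an equal-length list of (x, y, z) tuples.
    No structural superposition is performed (a simplifying assumption).
    """
    if not frame_a or len(frame_a) != len(frame_b):
        raise ValueError("frames must be non-empty and of equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(frame_a, frame_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(frame_a))


def rmsd_series(trajectory, reference_index=0):
    """RMSD of every frame in a trajectory against one reference frame."""
    reference = trajectory[reference_index]
    return [rmsd(reference, frame) for frame in trajectory]


# Two-atom toy trajectory: the second frame is shifted by 1 unit along z.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
series = rmsd_series([start, shifted])
```

Running the same function over the empty, kainate-bound and glutamate-bound trajectories gives one curve per simulation, which is the kind of comparison the plot summarises.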

GlnBP – A Bacterial Binding Protein

GlnBP – bacterial 2-domain periplasmic binding protein

Similar fold to mammalian GluR2

X-ray shows ligand binding induces domain closure

MD shows ligand binding reduces inter-domain motions - cf. GluR2 simulations

[Figure: X-ray structures (empty; Gln bound, + Gln) and MD simulation (empty; Gln bound)]

Main Initial Tasks

To establish a distributed database environment

To develop Grid/Web services using GT3/OGSA infrastructure

To develop software tools for interrogation and data-mining

To develop generic analysis tools

Annotation of simulation data with biological and structural data from other databases

Collaborating groups: York, Nottingham, Birmingham, Oxford, RAL, Southampton, London

Dividing up the Tasks

• Oxford

– database management system (Bing Wu)

– (meta)data curatorship & integration (Kaihsu Tai)

• Southampton

– application programming interface & data retrieval (Muan Hong Ng)

– generic analysis tools (Stuart Murdock)

Database Design: Simplified

table trajectory: one entry for each trajectory

table coordinate: {x, y, z}, one entry for each atom in each residue in each frame in each trajectory

table atom: one entry for each atom in each residue in each trajectory

table residue: one entry for each residue in each trajectory

table frame: one entry for each frame in each trajectory

dictionary tables

metadata tables
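The simplified table layout can be sketched as a small relational schema. This is an illustration only: the slides do not give column names, keys or the database engine, so the in-memory SQLite database, every column name, and the helper functions below are assumptions.

```python
import sqlite3

# Hypothetical columns; only the five table names come from the slides.
SCHEMA = """
CREATE TABLE trajectory (
    trajectory_id INTEGER PRIMARY KEY,
    name          TEXT
);
CREATE TABLE frame (
    trajectory_id INTEGER REFERENCES trajectory(trajectory_id),
    frame_id      INTEGER,
    time_ps       REAL,
    PRIMARY KEY (trajectory_id, frame_id)
);
CREATE TABLE residue (
    trajectory_id INTEGER REFERENCES trajectory(trajectory_id),
    residue_id    INTEGER,
    name          TEXT,
    PRIMARY KEY (trajectory_id, residue_id)
);
CREATE TABLE atom (
    trajectory_id INTEGER REFERENCES trajectory(trajectory_id),
    residue_id    INTEGER,
    atom_id       INTEGER,
    name          TEXT,
    PRIMARY KEY (trajectory_id, atom_id)
);
CREATE TABLE coordinate (
    trajectory_id INTEGER REFERENCES trajectory(trajectory_id),
    frame_id      INTEGER,
    atom_id       INTEGER,
    x REAL, y REAL, z REAL,
    PRIMARY KEY (trajectory_id, frame_id, atom_id)
);
"""


def build_demo_db():
    """Create the schema in memory and load a one-atom, two-frame trajectory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO trajectory VALUES (1, 'demo')")
    conn.executemany("INSERT INTO frame VALUES (1, ?, ?)", [(0, 0.0), (1, 1.0)])
    conn.execute("INSERT INTO residue VALUES (1, 1, 'GLY')")
    conn.execute("INSERT INTO atom VALUES (1, 1, 1, 'CA')")
    conn.executemany(
        "INSERT INTO coordinate VALUES (1, ?, 1, ?, ?, ?)",
        [(0, 0.0, 0.0, 0.0), (1, 0.5, 0.0, 0.0)],
    )
    return conn


def atom_path(conn, trajectory_id, atom_id):
    """Return (frame_id, x, y, z) for one atom across all frames, in frame order."""
    cursor = conn.execute(
        "SELECT frame_id, x, y, z FROM coordinate "
        "WHERE trajectory_id = ? AND atom_id = ? ORDER BY frame_id",
        (trajectory_id, atom_id),
    )
    return cursor.fetchall()


conn = build_demo_db()
path = atom_path(conn, trajectory_id=1, atom_id=1)
```

Because the coordinate table holds one row per atom per frame, it dominates storage: a 50,000-atom system saved for 1,000 frames already yields 50 million rows, which is one reason to keep atom, residue and frame descriptions in separate, much smaller tables.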

Database Design: A More Complete Version

Simulation Metadata

Difficult to extract from published literature

This is a prototype: a needs analysis with users/depositors must be conducted

Annotation/links to other biological databases essential

metadata: id, molecules, author, depositors, affiliations, publications, method, src_stru, ref_stru, prog, ver, hardware, num_of_proc, timestep, num_of_frame, ens_type, thermostat, solvent, forcefield, ele_stat, equ_prot, hyd_atom, unit_shape, …

Database Editor & SQL Query Capability

BioSimGRID Prototype

Target date for prototype: July 2003

Deliverables to Date…

• Database schema

• Sample database (with test trajectories)

• Prototype shared between 2 sites

• Analysis tools – preliminary versions

• Interface to database for data retrieval

• Python hosting environment

Roadmap

Dec 2002 – project started

July 2003 – (internal) prototype

September 2003 – working prototype (All Hands meeting)

November 2003 – test ‘real world’ applications

December 2003 – multi-site prototype

2004 – multi-site deposition of data

2005 – open up to additional groups for deposition/testing

Future Directions

HTMD – simulations coupled to structural genomics

Diamond light source

Computational systems biology – virtual outer membrane

HPCx

Multiscale biomolecular simulations – from QM/MM to meso-scale modelling

GRID-enabled simulations

Combine all of these with BioSimGRID…

Structural Genomics & HTMD

Overall vision – simulation as an integral component of structural genomics

Needs capacity computation – GRID?

MD database (distributed) – BioSimGRID

synchrotron

MD database

novel biology…

compute GRID

Towards a Virtual Outer Membrane (vOM)

[Figure: outer membrane (OM) components: OmpT, OmpX, OmpA, OmpF, PhoE, FhuA, TolC, LamB, FhuD, MalE, PiBP, PLA, OpcA, TonB, Pi]

First step towards computational systems biology – a suitable system

Bacterial OMs – 5 or 6 proteins = 90% of protein content

Structures or good homology models of proteins are available

Complex lipid – outer leaflet is lipopolysaccharide (LPS)

Minimum system size ca. 2.5 × 10^6 atoms; simulation times ca. 50 ns

cf. current FhuA – 80,000 atoms & 10 ns – need HPCx
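The gap between the current FhuA simulation and the proposed vOM system can be put into rough numbers. The sketch below assumes that compute cost scales linearly with both atom count and simulated time, which is an optimistic simplification (long-range electrostatics usually scale somewhat worse); the function name is hypothetical.

```python
def relative_cost(atoms, time_ns, ref_atoms, ref_time_ns):
    """Cost of one simulation relative to a reference run.

    Assumes cost is proportional to (number of atoms) x (simulated time).
    """
    return (atoms / ref_atoms) * (time_ns / ref_time_ns)


# vOM target (2.5 x 10^6 atoms, 50 ns) versus current FhuA (80,000 atoms, 10 ns).
vom_vs_fhua = relative_cost(2.5e6, 50, 80_000, 10)
print(f"vOM needs about {vom_vs_fhua:.0f}x the compute of the FhuA run")
```

Under that assumption the system is about 31 times larger and the run 5 times longer, so the vOM simulation costs on the order of 150 times the FhuA run, consistent with the stated need for HPCx.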

Multiscale Biomolecular Simulations

Membrane bound enzymes – major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids)

Complex multi-scale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases

Need for GRID-based integrated simulations

Oxford

Dr Phil Biggin

Dr Carmen Domene

Dr Alessandro Grottesi

Dr Andrew Hung

Dr Daniele Bemporad

Dr Shozeb Haider

Dr Kaihsu Tai

Dr Bing Wu

George Patargias

Oliver Beckstein

Yalini Pathy

Pete Bond

Jonathan Cuthbertson

Sundeep Deol

Jeff Campbell

Loredana Vaccaro

Jennifer Johnston

Katherine Cox

Robert d’Rozario

John Holyoake

Andrew Pang

BBSRC DTI

The Wellcome Trust GSK

EC (TMR) OeSC (EPSRC & DTI)

EPSRC OSC (JIF)

MRC

BioSimGRID

Leo Caves (York)

Simon Cox (Southampton)

Jon Essex (Southampton)

Paul Jeffreys (Oxford)

Charles Laughton (Nottingham)

David Moss (Birkbeck)

Oliver Smart (Birmingham)

Southampton

Dr Stuart Murdock

Dr Muan Hong Ng

Dr Richard Maurer

Dr Hans Fangohr

Steve Johnston
