From Sequence Analysis to Simulations: Applications of HPC in Modern Biology

R. Sankararamakrishnan

Department of Biological Sciences & Bioengineering

IIT-Kanpur

IIT-K REACH Symposium 2010

Oct 9th 2010

Computers and Computing in Biology

Bioinformatics

Computational Biology

Mathematical Biology

Biostatistics

Biomathematics

Quantitative Biology

Biophysics

What is Bioinformatics? - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

What is Computational Biology? - The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems.

- NIH Definition http://www.bisti.nih.gov/

Definitions

Explosive growth of biological data

HPC Applications: Three examples

Evolutionary relationship among a given set of protein or DNA sequences

Drug Discovery and Design

Structure-function relationship of large biomolecular assemblies

I. HPC in Phylogenetics

Phylogeny and Phylogenetic tree

Study of evolutionary relationships (sequences/species)

Relationships between organisms with common ancestor

Phylogenetic tree is a graph representing evolutionary history of sequences/species

Phylogenetic trees can be represented in two different ways

Rooted tree: has a unique root node, which gives the direction of evolution

Unrooted tree: makes no assumption about common ancestry

Example species: Human, Chimpanzee, Gorilla, Orangutan

Molecular phylogeny in a criminal investigation

Maximum Likelihood Method – An Introduction

David Mount (2002)

For each unrooted tree, there will be many possible rooted trees

Number of possible rooted and unrooted trees

Species    Rooted trees                     Unrooted trees
4          15                               3
5          105                              15
10         34,459,425                       2,027,025
15         213,458,046,676,875              7,905,853,580,625
20         8,200,794,532,637,891,559,375    221,643,095,476,699,771,875
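The tree counts follow from a standard closed form: for n species there are (2n-3)!! rooted and (2n-5)!! unrooted bifurcating trees, where k!! is the double factorial. The sketch below is only an illustration of that formula; the function names are hypothetical and not taken from the talk.

```python
def double_factorial(k: int) -> int:
    """Return k!! = k * (k - 2) * (k - 4) * ... ; defined as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_trees(n_species: int) -> int:
    """Number of rooted bifurcating trees for n_species >= 2: (2n - 3)!!"""
    return double_factorial(2 * n_species - 3)


def unrooted_trees(n_species: int) -> int:
    """Number of unrooted bifurcating trees for n_species >= 3: (2n - 5)!!"""
    return double_factorial(2 * n_species - 5)


if __name__ == "__main__":
    for n in (4, 5, 10, 15, 20):
        print(f"{n:>3}  {rooted_trees(n):>30,}  {unrooted_trees(n):>28,}")
```

Each unrooted tree with n species has 2n-3 branches on which a root can be placed, which is why the rooted count is always (2n-3) times the unrooted count.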

Maximum likelihood phylogeny problem is NP-hard

Very CPU intensive

For trees containing more than 20 to 25 sequences, an exhaustive search for the ML tree is no longer feasible

Efficient heuristic tree search algorithms are required to reduce the size of the search space

Recently developed algorithms:

IQPNNI, PHYML, GARLI, RAxML

None of these algorithms is guaranteed to find the ML tree; they only yield the best-known ML tree

Computing phylogenetic trees using ML method
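To make the cost of ML tree search concrete, the following minimal sketch scores a single fixed tree. It assumes the Jukes-Cantor (JC69) substitution model and Felsenstein's pruning recursion, neither of which is specified in the slides; the tree encoding, the `pair` helper, the branch lengths and the sequences are all hypothetical. A search program such as RAxML has to repeat a calculation of this kind for every candidate tree it visits.

```python
import math

BASES = "ACGT"


def jc69_prob(same: bool, t: float) -> float:
    """JC69 probability of ending in the same/different base after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def pair(left, right, t=0.1):
    """Hypothetical helper: internal node joining two subtrees with equal branch lengths."""
    return ((left, t), (right, t))


def conditional_likelihoods(node, site):
    """Pruning step: likelihood of the data below `node` for each possible base at `node`.

    `node` is a leaf name (str) or a tuple of (child, branch_length) pairs;
    `site` maps each leaf name to its observed base at one alignment column.
    """
    if isinstance(node, str):
        return [1.0 if base == site[node] else 0.0 for base in BASES]
    result = [1.0] * len(BASES)
    for child, t in node:
        child_l = conditional_likelihoods(child, site)
        for i, x in enumerate(BASES):
            result[i] *= sum(
                jc69_prob(x == y, t) * child_l[j] for j, y in enumerate(BASES)
            )
    return result


def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods, with equal base frequencies at the root."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {name: seq[k] for name, seq in alignment.items()}
        root_l = conditional_likelihoods(tree, site)
        total += math.log(sum(0.25 * value for value in root_l))
    return total


if __name__ == "__main__":
    # Toy alignment (made up): A and B agree with each other, as do C and D.
    alignment = {"A": "AAGT", "B": "AAGT", "C": "CCGT", "D": "CCGT"}
    tree_ab_cd = pair(pair("A", "B"), pair("C", "D"))
    tree_ac_bd = pair(pair("A", "C"), pair("B", "D"))
    print("lnL ((A,B),(C,D)):", log_likelihood(tree_ab_cd, alignment))
    print("lnL ((A,C),(B,D)):", log_likelihood(tree_ac_bd, alignment))
```

The sites are independent in this model, so the per-site loop is the natural place to split work across processors.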

Parallelization strategy

Ott et al. (2008)

RAxML performance in some HPC platforms

Ott et al. (2008)

212 sequences, 566,470 base pairs

One of the largest datasets analyzed under ML

IBM BlueGene/L; 1024 CPUs

7 distinct tree searches in 14 hours

Phylogenetic analysis of plant channel proteins identified new subfamily

Bansal and Sankararamakrishnan, BMC Struct. Biol. (2007)

Gupta and Sankararamakrishnan, BMC Plant Biol. (2009)

II. HPC in Drug Discovery & Drug Design

“Is there really a case where a drug that is on the market was designed by a computer?”

“The reality is that the use of computers and computer methods permeates all aspects of drug discovery today”

Jorgensen (2004)

Roles of Computation in Drug Discovery

“Drug discovery is complex: Successful teams and companies need to be congratulated, whereas search for one individual or computer program is counterproductive. There is not going to be a voila moment at the computer terminal. Instead, there is systematic use of wide-ranging computational tools to facilitate and enhance the drug discovery process”

Computation in Drug Discovery

Jorgensen (2004)

Structure-based Drug Design – An Introduction

http://csb.stanford.edu/levitt/demo_lectures/lec7/Lecture7/Discovering_Drugs/pages/Structure_Based_Drug_Design.html

http://www.biocryst.com/our_science

Wim Hol, www.bmsc.washington.edu/WimHol/sbdd3.JPG

Lead Generation

Lead optimization

De novo design

Virtual screening

Bleicher et al. (2003)

All drugs presently on the market are estimated to target fewer than 500 biomolecules

Docking & Scoring

Drug targets and Drug discovery: Issues

Issues: Scoring function, solvent effect and protein flexibility

Four proteins: trypsin, HIV PR, CDK2 and AChE

Test set for each protein: 10,000 randomly selected compounds

6000 docking poses were selected for the top 1000 compounds

They served as initial conformations for MD simulations

Combination of docking and MD showed a higher and more stable enrichment performance than docking method used alone
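Enrichment performance is commonly summarised as an enrichment factor: the hit rate among the top-ranked fraction of a screened library divided by the hit rate in the whole library. The sketch below uses that common definition, which may differ in detail from the metric used in the study summarised above; the function name and the two rankings are hypothetical and are not data from that study.

```python
def enrichment_factor(ranked_is_active, top_fraction):
    """Enrichment factor for a ranked screening list.

    ranked_is_active: list of booleans, best-scored compound first,
                      True where the compound is a known active.
    top_fraction:     fraction of the list to inspect, e.g. 0.1 for the top 10%.
    """
    n_total = len(ranked_is_active)
    n_top = max(1, round(n_total * top_fraction))
    hits_total = sum(ranked_is_active)
    if hits_total == 0:
        raise ValueError("the list contains no active compounds")
    hits_top = sum(ranked_is_active[:n_top])
    # (hits_top / n_top) / (hits_total / n_total), arranged to limit rounding error
    return (hits_top * n_total) / (n_top * hits_total)


if __name__ == "__main__":
    # Made-up rankings of 100 compounds containing 10 actives.
    docking_only = [True] * 2 + [False] * 8 + [True] * 8 + [False] * 82
    docking_plus_md = [True] * 6 + [False] * 4 + [True] * 4 + [False] * 86
    print("docking only:  ", enrichment_factor(docking_only, 0.1))
    print("docking + MD:  ", enrichment_factor(docking_plus_md, 0.1))
```

A value of 1 means the ranking is no better than random selection; larger values mean actives are concentrated near the top of the list.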

A special purpose computer, MDGRAPE-3, was used for MD simulations

It is a cluster of personal computers

Each computer is equipped with 24 MDGRAPE-3 chips and has a peak speed of approximately 2 Tflops

50 such computers were used

Average computational time for a single protein-ligand complex is 2.5 h

For 6,000 protein-ligand conformations, calculations were completed in a week

Steered Molecular Dynamics to compute the force required to extract the inhibitors from enzymes

A small spring is connected to the ligand in the complex

The free end of the spring is pulled at constant velocity into the surrounding water

Force is determined from the extension of the spring and recorded as a function of time

Strongly bound inhibitors: higher peak forces

Weaker inhibitors: flatter profiles

Steered MD in Drug Discovery

Jorgensen, 2010
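The pulling protocol can be illustrated with a one-dimensional toy model. This is only a sketch under strong assumptions: the ligand is a single overdamped particle, the binding site is a Gaussian well, there is no thermal noise, and all parameter values are arbitrary. None of this comes from the cited work; real steered MD is run on the full solvated complex with an MD package.

```python
import math


def pull_profile(well_depth, well_width=1.0, k=10.0, v=1.0,
                 gamma=1.0, dt=1e-3, t_end=10.0):
    """Return the spring force at each time step while pulling at constant velocity.

    The particle starts at the bottom of the well U(x) = -well_depth * exp(-x^2 / (2 w^2)).
    The spring anchor moves as v * t, so the spring force is k * (v * t - x).
    Overdamped dynamics: gamma * dx/dt = F_spring + F_well (explicit Euler steps).
    """
    x = 0.0
    forces = []
    for step in range(int(t_end / dt)):
        t = step * dt
        f_spring = k * (v * t - x)  # force from the spring extension
        f_well = (-well_depth * x / well_width ** 2
                  * math.exp(-x * x / (2.0 * well_width ** 2)))  # -dU/dx
        x += dt * (f_spring + f_well) / gamma
        forces.append(f_spring)
    return forces


if __name__ == "__main__":
    strong = pull_profile(well_depth=10.0)  # stands in for a strongly bound inhibitor
    weak = pull_profile(well_depth=2.0)     # stands in for a weakly bound inhibitor
    print("peak force, deep well:   ", max(strong))
    print("peak force, shallow well:", max(weak))
```

In this toy model the deeper well produces the higher peak force before the particle escapes, while the shallow well gives a flatter force-time profile, which is the qualitative behaviour described above.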

Protein-protein interactions in programmed cell death

Lama and Sankararamakrishnan, Proteins (2008)

Lama and Sankararamakrishnan, Biochemistry (2010)

Bcl-2 family complex structures

Total number of atoms: ~50,000 to ~75,000

Simulation period: 50 ns

III. Large Biomolecular Assemblies

The first biomolecular simulation was performed in 1977

GlpF: 81,006 atoms; AQP1: 75,057 atoms; PfAQP: 81,503 atoms

A 30 ns production run was performed for each of the three systems.

Each simulation takes ~40 days CPU time (Total CPU time ~ 120 days).
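A rough throughput figure makes these run times easier to compare with other simulations. The sketch below treats the quoted ~40 days as elapsed time per 30 ns run, which is an assumption, and the 2 fs integration time step is likewise a typical value assumed here rather than one stated in the slides. The function names are hypothetical.

```python
def ns_per_day(simulated_ns: float, elapsed_days: float) -> float:
    """Simulation throughput in nanoseconds of trajectory per day."""
    return simulated_ns / elapsed_days


def integration_steps(simulated_ns: float, timestep_fs: float = 2.0) -> int:
    """Number of MD steps needed; 1 ns = 1e6 fs. The 2 fs default is an assumption."""
    return round(simulated_ns * 1e6 / timestep_fs)


if __name__ == "__main__":
    print("throughput (ns/day):", ns_per_day(30.0, 40.0))
    print("integration steps:  ", integration_steps(30.0))
```

Under these assumptions each ~80,000-atom system advances at well under 1 ns per day and needs millions of force evaluations, which is why such runs are carried out on HPC resources.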

MD simulations of channel proteins in bilayers

Alok Jain, Ravi Verma and R. Sankararamakrishnan, Manuscript in preparation

Complete virus: 1 million atoms (Freddolino et al., 2006)

Arrays of light-harvesting proteins – 1 million atoms (Chandler et al., 2008)

Simulations reaching the million-atom mark

BAR domain proteins – 2.3 million atoms (Yin et al., 2009)

The flagellum – 2.4 million atoms (Kitao et al., 2006)

Minimization and equilibration

Cluster of 48 AMD Athlon 2600+ processors

Simulation

256 Altix nodes at NCSA @UIUC

1.1 ns/day

Complete virus: 1 million atoms

(Freddolino et al., 2006)

Functions of large molecular machines

30S ribosome

Fungal fatty acid synthase

Gumbart et al. (2009)

2.7 million atoms

50 ns simulation

MD of protein-conducting channel bound to ribosome

Largest system simulated to date

Bacterial ribosomes are important targets for antibiotics

Phylogenetic analysis

Large Biomolecular systems

Drug Design & Discovery

HPC Platforms for Biology Applications

FPGA boards: Field-programmable gate arrays are ICs which can be programmed. FPGA boards with commonly used bioinformatics algorithms are available

Graphics-Processing Unit (GPU): All bioinformatics applications

Grid Computing: Many applications

Distributed Computing: Protein folding, Drug docking

Cloud Computing:

Acknowledgements

Anjali Bansal

Dilraj Lama

Alok Jain

Tuhin Kumar Pal

Priyanka Srivastava

Vivek Modi

Ravi Kumar Verma

Krishna Deepak

Phani Deep

DST, DBT, CSIR, MHRD
