DO (WILL) GRIDS MATTER IN DRUG DISCOVERY? Arthur Thomas SIB/Vital-IT and SwissBioGrid

DO (WILL) GRIDS MATTER IN DRUG DISCOVERY?

Arthur Thomas

SIB/Vital-IT and SwissBioGrid

Biology: Big Science!

Osaka/Hitachi UHVEM: World’s largest electron microscope

Argonne Advanced Photon Source: World’s largest X-ray Crystallography System

US NHMFL: 900 MHz 21-T wide-bore NMR Facility

Automation Partnership: HTS “Factory”, 2.5 x 10^5 / 8 hr, 10^7 data points/year

Sanger Institute Sequencing Factory

Siemens PET scanner

Biology: Big Data!

• 32,000 measures/spectrum
• 900 spectra/LC run = 28,800,000 measurements (55 MB)/LC run

• 55 MB/LC run
• 3 MS-MS/spectrum
• 200 KB/MS-MS
• (900 x 3 x 200 KB) + 55 MB = 595 MB

• 10 spectra/mm = 100 spectra/mm2
• 100 x 100 = 10,000 spectra/cm2

• 16 x 16 cm2 gel
• 16 x 16 x 10,000 = 2,560,000 spectra/gel
• 2,560,000 x 200 KB = 512 GB

[Source: Ron Appel (SIB)]
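The arithmetic on this slide can be re-run as a short script. This is a minimal sketch, assuming decimal units (1 MB = 1,000 KB) and taking the 55 MB per LC run as given rather than deriving it; the variable names are illustrative only.

```python
# Back-of-envelope proteomics data volumes, using the figures quoted on the slide.
# Assumption: decimal units (1 MB = 1,000 KB, 1 GB = 1,000,000 KB).

MEASURES_PER_SPECTRUM = 32_000
SPECTRA_PER_LC_RUN = 900
MS_SCAN_MB_PER_LC_RUN = 55        # taken from the slide, not derived here
MSMS_PER_SPECTRUM = 3
KB_PER_MSMS = 200

measurements_per_run = MEASURES_PER_SPECTRUM * SPECTRA_PER_LC_RUN
msms_mb_per_run = SPECTRA_PER_LC_RUN * MSMS_PER_SPECTRUM * KB_PER_MSMS / 1_000
total_mb_per_run = msms_mb_per_run + MS_SCAN_MB_PER_LC_RUN

# Gel imaging: 10 spectra per mm along each axis, on a 16 cm x 16 cm gel.
spectra_per_mm2 = 10 * 10
spectra_per_cm2 = spectra_per_mm2 * 100      # 100 mm^2 per cm^2
spectra_per_gel = 16 * 16 * spectra_per_cm2
gel_gb = spectra_per_gel * KB_PER_MSMS / 1_000_000

print(f"{measurements_per_run:,} measurements per LC run")
print(f"{total_mb_per_run:.0f} MB per LC run including MS-MS")
print(f"{spectra_per_gel:,} spectra per gel, about {gel_gb:.0f} GB")
```

Under these assumptions the per-run total is 595 MB and the per-gel product of 2,560,000 spectra at 200 KB each comes to about 512 GB.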


[Source: Selinger et al. Trends in Biotech. (2003)]

Biology: Big Data!

Source: GenomeNet, Kyoto

~1000 different biology reference data bases:

• Genome/Nucleotide Sequence Databases
• RNA sequence databases
• Protein sequence databases
• Structure Databases
• Metabolic and Signaling Pathways
• Human Genes and Diseases
• Microarray and other Gene Expression Databases
• Proteomics Resources
• Other Molecular Biology Databases
• Organelle databases
• Plant databases
• Immunological databases

Source: M Y Galperin, Nucleic Acids Research (2006)

Biology: Visualisation! Collaboration!

NCMIR “BioWall”

SAGE HP Halo Collaboration Studio

Drug Discovery & Development: 12+ years, $1-1.25 billion

[Figure labels: ‘Omics (sequence homology, gene expression, proteomics); combinatorial libraries; HTS; QSAR; ADME/Tox; system and disease modelling; trial design]

Paradigm Change

Old Science            New Science
Classical chemistry    Combinatorial chemistry
Basic biology          ‘Omics, Biotechnology
Experimentation        Computation
Low throughput         High throughput
Animal studies         Molecular imaging


The Discovery Sieve

Getting Less and Less for More and More

Source: PPD Inc.

Pharma Challenges

• Declining productivity and ROI
  – $1+ billion to bring a drug to market, $1 million/day revenue lost to delay, declining post-patent lifetimes (5-7 years)
  – Most drug candidates fail
    • 1:10 development candidates fail
    • 1:2 clinical trial candidates fail
  – Number of NCEs has been falling for a decade
  – 2:3 drugs do not generate a lifetime return
  – Blockbuster (“one size fits all”) and “me too” mentalities not sustainable; many patents (~$72b) expiring in next 5 years
  – Stricter regulation (pre- and post-market), greater price pressure and greater liability (Vioxx, Baycol, …)
• Deluge of data, drought of knowledge
  – Huge investment in high-throughput data generation technologies not matched by investment in data analysis technologies
  – Poorly integrated data silos
• Increasingly collaborative landscape
  – Challenges of sharing information across enterprise boundaries

New Pharma Ecosystem?

• 1,500 ($50b+) pharma/biotech partnerships in last 7 years
  – e.g. 50% of Roche pharma/diagnostic revenues from licensing deals

Source: Recombinant Capital

Typical Grid Applications

• Drug Discovery
  – Sequence analysis
  – Microarray analysis/network inference
  – Virtual Screening (Autodock, CHARMM, Glide, FlexX)
• Development
  – ADME, PK/PD (NONMEM, WinNonLin)
  – Trial design (TrialSimulator)
  – Process validation, compliance
• Marketing
  – Market data analysis (SAS, SPSS)

“Instead of spending millions of dollars and years in the lab screening hundreds of thousands of compounds, now it will be possible to screen hundreds of millions of molecules in months” (Graham Richards)

Pharma Grids: the Good News

• J&JPRD [1]
  – 1,200 rising to 3,000 PCs; mix of Linux (clusters) and Windows (desktops)
  – 20+ applications
  – United Devices GridMP
• Novartis [2]
  – Began in 2001
  – Now 2,700+ PCs (out of 65,000), 5+ Tflops; 25,000 PCs eventually?
  – Apps: docking, genome annotation, chemoinformatics, clinical trial simulation, text mining
  – $400k investment, $2+ million annual savings
  – United Devices GridMP for PC farm
  – Rigidly standardized PC environment
• GSK [1]
  – 1,000+ PCs
  – $1 million estimated annual savings
  – United Devices GridMP for PC farm

[1] Source: United Devices, Inc.
[2] Source: Manuel Peitsch, Novartis
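As a rough illustration of the Novartis figures above, the payback period follows directly from the quoted investment and savings. This is a minimal sketch, assuming savings at the quoted lower bound of $2 million per year and ignoring running costs, which the slide does not give.

```python
# Rough payback estimate from the Novartis figures quoted above.
# Assumptions: savings taken at the quoted lower bound ($2 million/year);
# running costs are ignored because the slide does not state them.

investment_usd = 400_000
annual_savings_usd = 2_000_000
grid_pcs = 2_700
total_pcs = 65_000

payback_months = investment_usd / annual_savings_usd * 12
share_of_pcs = grid_pcs / total_pcs

print(f"Payback in at most about {payback_months:.1f} months")
print(f"{share_of_pcs:.1%} of company PCs enrolled")
```

On these numbers the investment is recovered in under three months while only about 4% of the company's PCs are enrolled.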

Pharma Grids: not-so-good News

“Less than half of the top 20 pharmaceutical companies are implementing Grids” [William Fellows, 451 Group]

Barriers to Grid adoption

• Difficulty of Building a Business Case
  – Cui bono?
  – Measuring the ROI?
• Unsuitable licensing models: driving open source?
• Trust and Access Control issues
  – Extending to the balkanized (fire-walled) global enterprise
  – Extending to the whole development ecosystem
• Technical Barriers
  – Lack of suitable (“embarrassingly parallel”) applications
  – Heterogeneity of platforms
  – Poor standardization of middleware (commercial vs open source): will SOA (OGSA) solve this?
  – Poor data grid management, semantic integration: driving development of ontologies?
  – Limited bandwidth: increasing use of Lambda rails?
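To make the first technical barrier concrete: a workload is "embarrassingly parallel" when it splits into work units that never need to talk to each other, as when ligands are docked one at a time. The following is a minimal sketch only. `split_into_work_units`, `score_ligand` and the ligand IDs are hypothetical placeholders, and a local thread pool stands in for grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_work_units(items, unit_size):
    """Cut a list into independent chunks, one per grid job."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]

def score_ligand(ligand_id):
    """Hypothetical stand-in for one docking run; returns a dummy score."""
    return ligand_id, len(ligand_id) * 1.0

def run_work_unit(unit):
    # Each unit depends only on its own inputs, so units can run
    # on any machine and in any order.
    return [score_ligand(ligand_id) for ligand_id in unit]

ligands = [f"LIG{n:05d}" for n in range(1, 11)]   # made-up ligand IDs
units = split_into_work_units(ligands, unit_size=4)

# A thread pool stands in for grid nodes here; a real grid would ship
# each unit to a remote machine through its middleware.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = [pair
               for unit_result in pool.map(run_work_unit, units)
               for pair in unit_result]

print(len(units), "work units,", len(results), "ligands scored")
```

Applications whose steps depend on each other's intermediate results do not decompose this way, which is why they are poorly suited to loosely coupled PC grids.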

Overcoming the Barriers: Building a Business Case

• Capacity Improvement
  – Driven by ROI
  – Reduced build and running costs of PC Grids cf. dedicated clusters
• R&D Process Innovation
  – Driven by need for new ways of doing
  – Collaborative research (industry/academia)
  – “Open source research” (NIH, Wellcome)

Overcoming the Barriers: Technical

• Software
  – Less intrusive, more standardized middleware
  – Web services, OGSA
• Data Management
  – DataGrid technologies
• Data Integration
  – Ontologies and shared knowledge spaces
• “Utility/On-Demand” Computing
• Bandwidth
  – National and international LambdaRails
• Virtual Laboratories/Organizations

LambdaRails™

Source: OptiPuter Group

SwissBioGrid: A National Resource

• Dedicated to large-scale computational applications in bioinformatics, modelling, chemoinformatics and bio-medical sciences
• CSCS manages GRID infrastructure, middleware, security
• SIB/Vital-IT has primary responsibility for providing bioinformatics application validation and optimization, Web services, database services
• Some sites compute-intensive, some data-intensive

SwissBioGrid: A Mixture of Clusters and PCs

[Figure: SwissBioGrid sites, with NorduGRID/ARC and the ProtoGRID Metascheduler]

• UniZH Matterhorn (Sun Grid Engine)
• SIB Vital-IT (Platform LSF)
• ETHZ Hreidar (Sun Grid Engine)
• CSCS
  – Ticino Cluster (Itanium, LSF)
  – Terrane Cluster (PS 5, PBS)
  – Sun Cluster (PBS)
• UniBS/FMI PC farms
• UniBS BC2 cluster (Platform LSF)

Some Good News… “Open source discovery” is thriving!

• Anthrax (7,000+ CPU years)
• Smallpox (68,000+ CPU years)

– 400,000+ CPUs, 53,000+ CPU years to date, 75+ CPU years/day

• Human Proteome folding, Phase II (761+ CPU years)
• Cancer project Phase II (437+ CPU years)
• AIDS project (25,000+ CPU years)

• Dengue [10 million infections, 100,000 deaths/year]
  – Autodock, Glide
  – Mixed PC and cluster Grid
  – 130,000 ligands from NCI DTP library docked against dengue NS5 protein
  – ~1 CPU min/dock
  – 70 hits found, being evaluated in vitro
  – Plan to dock 2.7 million ligands from ZINC library
  – 1875 CPU-days for 1 target/1 site/1 parameter set/1 library (“parameter sweep”)
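The CPU-time figure for the planned ZINC run follows from the per-dock cost. This is a minimal sketch, assuming exactly 1 CPU-minute per docking (the slide says "~1"); the function and its parameters are illustrative names only.

```python
# CPU-time estimates for the dengue docking campaign, from the slide's figures.
# Assumption: exactly 1 CPU-minute per docking (the slide says "~1").

CPU_MIN_PER_DOCK = 1
MIN_PER_DAY = 24 * 60

def cpu_days(n_ligands, n_targets=1, n_sites=1, n_parameter_sets=1):
    """CPU-days for a full sweep: every ligand against every combination."""
    docks = n_ligands * n_targets * n_sites * n_parameter_sets
    return docks * CPU_MIN_PER_DOCK / MIN_PER_DAY

nci_days = cpu_days(130_000)       # NCI DTP library, already docked
zinc_days = cpu_days(2_700_000)    # planned ZINC run

print(f"NCI DTP: {nci_days:.0f} CPU-days")
print(f"ZINC:    {zinc_days:.0f} CPU-days")
```

The cost multiplies across the sweep: adding a second target, binding site or parameter set doubles the total, which is why the figure is quoted per target, site, parameter set and library.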

More Good News… WISDOM

• Malaria [500 million infections, 1.3 million deaths/year]
  – Autodock, FlexX
  – 80 CPU years in 6 weeks
  – 1,000,000 ligands against 11 targets
  – Top 1,000 hits identified
• Avian Flu [the next Big One]
  – 77 CPU years on 2000 computers
  – 300,000 ligands against 8 Influenza A neuraminidase targets
  – Hits now being analyzed
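The WISDOM figures imply a certain scale of concurrent computing. This is a minimal sketch, assuming 365-day years, 7-day weeks and a perfectly even load, none of which the slide states; the function names are illustrative.

```python
# Average concurrency implied by the WISDOM figures quoted above.
# Assumptions: 365-day years, 7-day weeks, and perfectly even load.

DAYS_PER_YEAR = 365

def average_cpus(cpu_years, elapsed_days):
    """Mean number of CPUs kept busy over the whole campaign."""
    return cpu_years * DAYS_PER_YEAR / elapsed_days

def elapsed_days_needed(cpu_years, n_cpus):
    """Elapsed days if n_cpus were busy the whole time."""
    return cpu_years * DAYS_PER_YEAR / n_cpus

malaria_cpus = average_cpus(80, 6 * 7)
flu_days = elapsed_days_needed(77, 2000)

print(f"Malaria run: about {malaria_cpus:.0f} CPUs busy on average")
print(f"Avian flu run: about {flu_days:.0f} days on 2000 computers")
```

On these assumptions the malaria run kept roughly 695 CPUs busy for six weeks, and the avian flu run corresponds to about two weeks of fully loaded machines.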

[Figure legend: VitalIT IA64, VitalIT Nocona, BC2 Athlon, BC2 Opteron, BC2 PC-Grid, Uni ZH PC-Grid, Novartis PC-Grid]

From Data Sharing to Knowledge Sharing

• DataGrid
  – SwissBioGrid experiment in data grid using Avaki
  – Complex update patterns
• KnowledgeGrid
  – Aggressive use of ontologies for knowledge standardization and sharing
    • Gene Ontology

Thank You!

Questions?