
Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne


Page 1: The Future of Bioinformatics (with examples from structural bioinformatics)

Feb. 25, 2004
World University Network - Worldwide Broadcast

Philip E. Bourne
The University of California San Diego

[email protected]
http://www.sdsc.edu/pb/Talks

Page 2: Outline

• Bioinformatics thus far
• Today – a growth discipline
• Drivers
  • Data
  • Complexity – biological and data
• The interface to medical informatics and systems biology
• Challenges
  • The devil is in the details
  • Quality control
  • Fundamentals versus relevance to biology

"You can observe a lot just by watching."

Page 3: Bioinformatics Thus Far – Pre 1970

Bioinformatics (2003) 19 2176-2190

1945 Biochemical Pathways – Horowitz
1953 Structure of DNA – W&C
1969 Genetic Variation

1953 Game Theory – von Neumann and Morgenstern
1959 Grammars – Chomsky
1962 Information Theory – Shannon & Weaver
1966 Cellular automata – von Neumann

1962 Molecular Homology – Florkin
1965 Evolutionary Patterns – Pauling
1966 Molecular Modeling – Levinthal
1967 Phylogenetic Trees – Fitch
1969 Properties – Ptitsyn
1970 Dynamic Programming – N&W

Page 4: Bioinformatics Thus Far – 1970s: Problem Definition

Improved Sequence Alignments – Sankoff
Structural Patterns and Properties – Richards
Smith Waterman Algorithm
Exon/Introns – Gilbert
Structure Prediction – Levitt; Chou and Fasman; Scheraga
Public Resources – Dayhoff, PDB

Page 5: Bioinformatics Thus Far – 1980s: Computational Biology Emerges

Domains recognized – Rashin
Tree of Life Emerges
FASTA – Lipman & Pearson
Profiles – Gribskov
Reductionism begins – Thornton, Sander
Neural nets – Hopfield
Molecular computing – Conrad
Nanotechnology – Drexler
Clustering – Shepard
Relational Databases
Networks – EMBLnet, BIONET

Page 6: Bioinformatics Thus Far – 1990s: Bioinformatics and Biotechnology Emerge

Human Genome Project
Internet/Web

Page 7: So What is Bioinformatics Today?

• A relatively new term for a scientific endeavor that has been around much longer
• Medical informatics preceded it, and defined some of the foundations?
• A scientific endeavor driven out of a paradigm shift in which biology became a data driven science
• A scientific endeavor that has gained from fundamental developments in computer and information science, e.g., algorithms, ontologies, Bayesian networks, neural networks, text mining …
• A growth discipline …

"Do you mean now?" -- When asked for the time.

Page 8: Bioinformatics - A Vice Chancellor’s View

[Figure: growth of biological data and complexity over time.
Flow: Biological Experiment → Data → Information → Knowledge → Discovery
Steps: Collect, Characterize, Compare, Model, Infer
Complexity levels: Sequence, Structure, Assembly, Sub-cellular, Cellular, Organ, Higher-life
Year axis: 90, 95, 00, 05
Other axes and trends: Computing Power; Sequencing Technology; Data (1, 10, 100, 1000, 100000); # People/Web Site (10^6, 10^2, 1); Complexity; Technology
Milestones: ESTs; E. coli Genome; Yeast Genome; C. elegans Genome; 1 Small Genome/Mo.; Human Genome Project; Human Genome; Gene Chips; Virus Structure; Ribosome; Model Metabolic Pathway of E. coli; Genetic Circuits; Neuronal Modeling; Cardiac Modeling; Brain Mapping]

(C) Copyright Phil Bourne 1998

Page 9: Growth in Bioinformatics as Measured by ISMB Attendance

[Figure: ISMB attendance chart. Surviving labels: 1500; 2002; Edmonton, Canada]

http://www.iscb.org/history.shtml

Page 10: Growth in the Journal Bioinformatics

[Figure: two charts for the Bioinformatics journal, each covering 1997 to 2003. One plots Submissions (axis 0 to 1400); the other plots Impact Factor (axis 0 to 5).]

Page 11: Drivers – Data Growth and Data Complexity

Consider Macromolecular Structure as an example

Page 12

Bourne Bioinformatics Editorial 1999 15(9):715

“Over the next 5 years there will be an estimated 10 major structural genomics efforts each yielding 200 structures per year. While these efforts will deplete regular structure determination efforts, improvements in technology and a general expansion of the field will continue to yield 50 structures per week worldwide outside of the structural genomics initiatives.”

Net result 35,000 structures by 2005

"You can observe a lot just by watching."

There were 11,000 structures at the time of this prediction
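
The arithmetic behind that prediction can be checked with a short sketch. The rates and the starting count are the ones quoted on the slide; treating them as constant over the five years and using 52 weeks per year are assumptions made here for illustration, and the function name is hypothetical.

```python
def project_pdb_holdings(current, years, sg_efforts, sg_per_effort_per_year,
                         other_per_week, weeks_per_year=52):
    """Project total holdings, assuming constant deposition rates (an assumption)."""
    # Structures from the structural genomics efforts.
    from_structural_genomics = sg_efforts * sg_per_effort_per_year * years
    # Structures from all other labs worldwide.
    from_other_labs = other_per_week * weeks_per_year * years
    return current + from_structural_genomics + from_other_labs


# Figures quoted on the slide: 11,000 structures at the time of the prediction,
# 10 efforts x 200 structures/year, plus 50 structures/week elsewhere, for 5 years.
projected = project_pdb_holdings(current=11_000, years=5, sg_efforts=10,
                                 sg_per_effort_per_year=200, other_per_week=50)
print(projected)  # 34000
```

Under these assumptions the total comes to 34,000, which is in line with the rounded figure of 35,000 given on the slide.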

Page 13: PDB Growth Curve

Approx. 24,000 structures today
In 2003 approx. 5,000 structures were deposited

Page 14: History

Predictions Can Be Good

Page 15: A Data Centric View of the Future

• Data complexity
• High throughput data collection
• Database versus literature
• Bioinformatics as data driver
• Data representation
• Data integration

"If you come to a fork in the road, take it."

Page 16: Numbers and Complexity

(a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA (e) antibodies (f) viruses (g) actin (h) the nucleosome (i) myosin (j) ribosome

Courtesy of David Goodsell, TSRI

Complexity is increasing

Page 17: Complexity - The Ribosome: A Nanomachine

"The ribosome, together with its accessories, is probably the most sophisticated machine ever made." R. Garrett (1999) Nature 400

• Translates mRNA into protein
• Molecular Mass: 2.6 million
• Maximum Dimension ~25 nm
• 2/3 RNA – performs catalysis
• 1/3 protein – outer scaffold for the RNA

[Figure labels: protein, mRNA, 30S, 50S]

Figure from J. Frank, Wadsworth Center, NY

Page 18: High Throughput - The Structural Genomics Pipeline (X-ray Crystallography)

Basic Steps

Target Selection → Crystallomics (Isolation, Expression, Purification, Crystallization) → Data Collection → Structure Solution → Structure Refinement → Functional Annotation → Publish

Bioinformatics Throughout the Process

• Bioinformatics: distant homologs, domain recognition
• Automation; Bioinformatics: empirical rules
• Automation; better sources
• Software integration; decision support
• MAD phasing; automated fitting
• Bioinformatics: alignments, protein-protein interactions, protein-ligand interactions, motif recognition
• No?

Page 19: An Aside on the Future of Publishing

Full Description Captured as the Paper/Database is Written/Deposited

Does away with ...

“… the p53 core domain structure consists of a β sandwich that serves as a scaffold for two large loops and a loop-sheet-helix motif ...” Science Vol. 265, p346

Corresponding structure from the PDB: 1TSR

? Oops!
β sandwich? Where?
Large loop? Which one??
Loop-sheet-helix???

Page 20: BioEditor - A DTD Driven Domain Specific Editor

http://bioeditor.sdsc.edu

Page 21

Bourne et al. 2004 Pacific Symposium on Biocomputing
http://www-smi.stanford.edu/projects/helix/psb04/bourne.doc

Structural Genomics Targets and their Status from http://targetdb.rcsb.org

Page 22: The Data - Bioinformatics Cycle

Result – Computation and Experiment Become More Synergistic

[Cycle diagram between Data and Bioinformatics, with two arcs labeled "Turn Data into Knowledge" and "Turn Knowledge into New Data Requirements"]

Page 23: Deuterium Exchange Mass Spec to Predict Structure

[Diagram labels: Target Protein; Sequence; Structure Templates; X-ray or NMR; Homology; Threading; ab initio; others; CASP; DXMS; COREX; Stability; Amino Acid; Profile Match Method; Best Structure(s)]

Page 24: Biological Representation

• The Gene Ontology changes everything
  • Molecular function
  • Biochemical process
  • Cellular location
  • DAG – machine usable
• The number of papers referencing the gene ontology has increased dramatically in the last year
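
To make "DAG – machine usable" concrete, here is a minimal sketch of how software can exploit a directed acyclic graph of terms. The term names and parent links below are illustrative only and are not taken from the actual Gene Ontology; the function names are likewise hypothetical.

```python
# Hypothetical mini-ontology (NOT real Gene Ontology relationships).
# Each term maps to its parent terms. A term may have several parents,
# which is what makes the structure a DAG rather than a tree.
PARENTS = {
    "molecular function": [],
    "catalytic activity": ["molecular function"],
    "binding": ["molecular function"],
    "kinase activity": ["catalytic activity"],
    "ATP binding": ["binding"],
    "protein kinase activity": ["kinase activity", "ATP binding"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def annotated_with(gene_terms, query, parents=PARENTS):
    """True if any annotation is the query term or a descendant of it."""
    return any(t == query or query in ancestors(t, parents) for t in gene_terms)


# A gene product annotated with a specific term is found by a general query.
print(annotated_with(["protein kinase activity"], "binding"))
```

The point of the design is that an annotation made at a specific term is automatically retrievable through any of its more general ancestors, something free-text descriptions cannot offer.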

Page 25: Biological Data Representation Future

• Tools to construct ontologies from free text?
• Ontologies for details of function, protein-protein interaction, protocols, complete pathway information

Page 26: Data Integration

Web Services – the holy grail of interoperability?

Page 27: Web Services

• It's not CORBA – biologists can do it
• Easy to implement
• Platform independent
• Driver to force data providers to define and publish a detailed API
• Compelling - introduces the prospect of global workflow

Page 28: Perl Web Services Client Example

A small Perl program to access all Pubmed abstracts containing the word ‘ferritin’

    use SOAP::Lite;

    # Call the remote pubmedAbstractQuery method with the search term
    # given on the command line. The service is expected to return a
    # reference to a list of IDs.
    $ids_ref = SOAP::Lite
        -> uri('http://server.location.edu/pdbWebServices')
        -> proxy('http://server.location.edu/pdbWebServices')
        -> pubmedAbstractQuery($ARGV[0])
        -> result;

    # Dereference the list and print the IDs separated by spaces.
    @ids = @{$ids_ref};
    print "@ids\n";

    Mycomputer(1)% web_service.pl ferritin
    1AEW 1AQO 1BCF 1BFR 1BG7 1DPS 1EUM 1FHA 1JGC 1JI5 1JIG 1MFR 1QGH 1RCC 1RCD 1RCE 1RCG 1RCI 1RYT 2FHA
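
For readers curious about what such a client sends, the following is a rough sketch, in Python with only the standard library, of a SOAP 1.1 request envelope for the same call. The service URI and method name are the placeholders from the slide; the parameter element name `term` is an assumption made here, and a real SOAP toolkit may name and type the parameter differently. Nothing is sent over the network.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_request(service_uri, method, argument):
    """Build a SOAP 1.1 request envelope as a string (illustrative sketch)."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The method element lives in the service's own namespace.
    call = ET.SubElement(body, f"{{{service_uri}}}{method}")
    # "term" is a hypothetical parameter name chosen for this example.
    param = ET.SubElement(call, "term")
    param.text = argument
    return ET.tostring(envelope, encoding="unicode")


# Placeholder URI and method name taken from the slide.
print(build_request("http://server.location.edu/pdbWebServices",
                    "pubmedAbstractQuery", "ferritin"))
```

The envelope is plain XML carried over HTTP, which is why a client needs so little code on any platform.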

Page 29: A Biological Complexity Perspective

Page 30

[Figure: scales of biological research.
Headings: SCIENTIFIC RESEARCH & DISCOVERY; REPRESENTATIVE DISCIPLINE; EXAMPLE UNITS; REPRESENTATIVE TECHNOLOGY
Labels: Cell Biology, Anatomy, Physiology, Proteomics, Genomics, Medicinal Chemistry; Organisms, Organs, Cells, Macromolecules, Biopolymers, Atoms & Molecules; MRI, Heart, Neuron, Structure, Sequence, Protease Inhibitor, Electron Microscopy, Migratory Sensors, Ventricular Modeling, X-ray Crystallography, Protein Docking
Also marked: Technologies; Training; Infrastructure; You Are Here]

Page 31: The Post-Genomic Era

The “New” Central Dogma: Genomes → Gene Products → Structure & Function → Pathways & Physiology

~ Scientific Challenges - Deciphering the genome, mapping the genotype-phenotype relationships, dissecting organismic function, engineering organisms with altered functionality, figuring out complex traits and polymorphism, understanding physiology.

~ Algorithmic Challenges - comparisons of whole and partial genomes, metrics for similarity and homology, metabolic reconstruction, dissecting pathways, and whole cell modeling.

~ Computational Challenges - creation of the informatics infrastructure; creation, annotation, curation and dissemination of databases; development of parallel computational methods.

Page 32: Interaction Networks

A Protein Interaction Map of Drosophila melanogaster

L. Giot, et al. Science, Vol. 302, Issue 5651, 1727-1736, December 5, 2003

Page 33: Phenomena in biological systems may be organized in several layers.

Populations
  • Ecological Communities
  • Populations of a Species

Physiology and Organisms
  • Integrative physiology, Homeostasis
  • Organs, Tissues
  • Cells

Pathways and Information Transfer
  • Integrated metabolism, regulatory, developmental pathways
  • Simple pathways for information transfer, regulation, development
  • Simple metabolic pathways for creating & using other molecules

Biological Macromolecules and Structures
  • Biomolecular Assemblies; ligand-receptor complexes
  • Molecules and Structures created by genes, gene products
  • Gene Products: RNAs; Proteins
  • Genes and Genomes

Physics and Chemistry
  • e.g. Physical Chemistry, Organic Chemistry, Information theory, Constraints of self-assembling adaptive systems

Page 34: Each system layer builds from lower system layers & acquires new emergent properties

Populations
  • Ecological Communities
  • Populations of a Species

Physiology and Organisms
  • Integrative physiology, Homeostasis
  • Organs, Tissues
  • Cells

Pathways and Information Transfer
  • Integrated metabolism, regulatory, developmental pathways
  • Simple pathways for information transfer, regulation, development
  • Simple metabolic pathways for creating & using other molecules

Biological Macromolecules and Structures
  • Biomolecular Assemblies; ligand-receptor complexes
  • Molecules and Structures created by genes, gene products
  • Gene Products: RNAs; Proteins
  • Genes and Genomes

Physics and Chemistry
  • e.g. Physical Chemistry, Organic Chemistry, Information theory, Constraints of self-assembling adaptive systems

[Diagram labels: New Emergent Properties; Physics and Chemistry; Genes and Genomes; Biomolecular Structure & Function; Biochemical Pathways & Processes; Developmental & Physiological Processes; Tissue & Organismal Physiology; Ecological Processes & Populations]

Page 35: The Next Response

• Translational medicine
• Personalized medicine
• Merger of medical, chem and bioinformatics
• Training in cooperative in silico and experimental research
• Centers that reflect that training, i.e., different to NCBI or EBI

"Think! How the hell are you gonna think and hit at the same time?"

Page 36

Statement of the Director, NIGMS, before the House Appropriations Subcommittee on Labor, HHS, Education, Thursday, February 25, 1999

Page 37: Near Term Challenges

Better Resources and Algorithms

Page 38: Current Data Resources and Algorithms are Challenged by Biological Complexity

• Our understanding of biological complexity is not reflected in the current generation of biological data resources
• Hence these resources do not enable the next generation
• Algorithms are often limited since complexity implies variation
• Consider an example - the protein kinase-like superfamily

Page 39: The SCOP Classification Hierarchy

[Diagram, levels from the root down:
SCOP Root
Class: α/β, α+β
Fold: TIM β/α-barrel; NAD(P)-binding Rossmann; Cellulases
Superfamily: TIM; PLP-binding barrel; (Trans)glycosidases
Family: β-Glucanase; α-Amylase (N); β-Amylase
PDB Domains: 1e43 (a:1-393) d1e43a2; 1e3x (a:1-393) d1e3xa2; 1e3z (a:1-393) d1e3za2
Side annotations: "Related by homology"; "Determined by structure"]

Courtesy Steven Brenner

Page 40: An Example of a Structural Superfamily: The Protein Kinase-Like Superfamily

Superfamily: not all eukaryotic or protein kinases: some homologues discovered in bacteria that phosphorylate antibiotics, others phosphorylate lipids

Typical Kinase Core (c-Src, PDB ID: 2SRC)

SCOP grouping for kinases

1) Class: Alpha+Beta

2) Fold: Protein Kinase Catalytic Core

3) Superfamily: Protein Kinase Catalytic Core

4) Families:

a) Ser/Thr Kinases

b) Tyr Kinases

c) Atypical Kinases

d) Antibiotic Kinases

e) Lipid Kinases
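
The grouping above maps naturally onto a small tree. The sketch below encodes only the names listed on this slide; the `Node` class and the helper functions are hypothetical and are not part of SCOP or any SCOP tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of the classification (hypothetical helper type)."""
    level: str
    name: str
    children: list = field(default_factory=list)

    def add(self, level, name):
        child = Node(level, name)
        self.children.append(child)
        return child


def build_kinase_branch():
    """Encode the Class > Fold > Superfamily > Family grouping from the slide."""
    cls = Node("Class", "Alpha+Beta")
    fold = cls.add("Fold", "Protein Kinase Catalytic Core")
    superfamily = fold.add("Superfamily", "Protein Kinase Catalytic Core")
    for family in ("Ser/Thr Kinases", "Tyr Kinases", "Atypical Kinases",
                   "Antibiotic Kinases", "Lipid Kinases"):
        superfamily.add("Family", family)
    return cls


def lineage(root, family_name, path=()):
    """Return the (level, name) path from the root to a family, or None."""
    path = path + ((root.level, root.name),)
    if root.level == "Family" and root.name == family_name:
        return list(path)
    for child in root.children:
        found = lineage(child, family_name, path)
        if found:
            return found
    return None


for level, name in lineage(build_kinase_branch(), "Lipid Kinases"):
    print(f"{level}: {name}")
```

Walking from a family up to its class is a single path lookup, which is what makes a strict hierarchy convenient for software even when, as the slide notes, the biology inside a superfamily is diverse.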

Page 41: Evolution of the Kinase Superfamily: Comparison of Three Superfamily Members

• A: Casein kinase 1 (PDB ID: 1CSN)

• B: Aminoglycoside kinase (PDB ID: 1J7L)

• C: Phosphatidylinositol 3-kinase (PDB ID: 1E8X)

• D: The previous three structures with only their shared region superposed (1CSN: light blue, 1J7L: red, 1E8X: yellow).

• The three kinases share a minimal core required for ATP binding and phosphotransfer.

Page 42: Our Algorithms Need to Continue to Evolve

Consider structure comparison and alignment of the diverse protein kinases

Page 43: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Feb. 25, 2004Feb. 25, 2004 World University Network - Worldwide BroadcastWorld University Network - Worldwide Broadcast

An Example of Manual vs. Automated with Combinatorial Extension An Example of Manual vs. Automated with Combinatorial Extension (CE)(CE)•The manual alignment can be used to better understand the limitations of our automated method

•Alignment of helix C of two tyrosine kinases

•Insulin Receptor Kinase (PDB ID: 1IR3)

•c-Src (PDB ID: 2SRC)

•Can be aligned with 40% sequence identity, 3.0 Å RMSD

•In Src, the C-helix is displaced and rotated outward

•The rotation pushes the N-terminal end of the helix far out from the N-terminal end of the IRK helix

•CE gaps part of this region (yellow), splitting the helix and aligning part of IRK helix C with the loop leading to helix C in Src

Orange: IRK, Blue: c-Src, Yellow: CE gap region

Page 44: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

An Example of Manual vs. Automated with CE

•A closer look:

•The CE alignment puts closer C-alpha positions together but does not respect helical relationships

•The hand alignment respects the helix but aligns more distant C-alpha positions

CE alignment Hand alignment
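
The two figures quoted for the IRK/c-Src comparison (percent identity and RMSD) can be computed from any pairwise alignment. The sketch below is illustrative only: it assumes the two structures are already superposed and that the alignment is given as two gapped strings plus paired C-alpha coordinates. The function names and toy inputs are hypothetical, and this is not CE's scoring scheme.

```python
import math

def percent_identity(seq_a, seq_b):
    """Percent of aligned, non-gap columns that hold identical residues."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue ('-' marks a gap).
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

def rmsd(coords_a, coords_b):
    """RMSD in the input units over paired, already superposed C-alpha atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

if __name__ == "__main__":
    # Toy data, not taken from 1IR3 or 2SRC.
    print(percent_identity("ACDE-G", "ACQEFG"))
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 4, 0)]))
```

Because RMSD rewards pairing the closest C-alpha atoms, an alignment can score well on this measure while breaking a helix, which is the trade-off the slide describes.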

Page 45: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Improving CEfam: Multiple Alignments with CE

•Example with strands 1 and 2 of kinase superfamily

•A: original

•B: optimal parameters

•C: manual

•The optimal parameters also improved results for other protein superfamilies, as judged by visual analysis

•Just as sequence alignments are benchmarked against structure alignments, structure alignments should be benchmarked against manual results

•Improvement in optimization is now being folded into the next generation of CE

Page 46: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Near Term Challenges - Quality Control

Consider an example

The definition of domains from 3-D structure

Page 47: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

The 3D Domain Assignment Problem

A domain is a fundamental structural, functional and evolutionary unit of a protein:

Compact

Stable

Has a hydrophobic core

Folds independently

Performs a specific function

Can be reshuffled and combined in different ways

Evolution works at the level of the domain

Page 48: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Exact assignment of domains remains a difficult and unresolved problem.

There is no complete agreement among experts on domain assignment given a protein structure.

Expert methods agree on 80% of all existing manual assignments; the remaining 20% represent “difficult” cases.

Expert assignment #1

Expert assignment #2

Expert assignment #3

Page 49: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Manual vs. automatic consensuses: do they overlap?

Manual and automatic consensus agree: 328 chains (77.3% of chains with consensus)

Automatic consensus only: 46 chains (10.9% of chains with consensus)

Manual consensus only: 47 chains (11.1% of chains with consensus)

Automatic consensus and manual consensus disagree: 3 chains (0.7% of chains with consensus)

Chains with manual consensus: 375 (80% of entire dataset)

Chains with automatic consensus: 374 (80% of entire dataset)

Chains with consensus (automatic or manual): 424 (90.6% of entire dataset)

Veretnik et al. 2003 JMB submitted
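
As a quick quality-control check on the overlap figures, the shares can be recomputed from the four chain counts. This is only a sketch: the dictionary keys are shorthand labels, and the dataset size is inferred from the 90.6% figure rather than stated on the slide. With conventional rounding, two of the recomputed shares differ from the slide in the last digit (77.4 vs. 77.3 and 10.8 vs. 10.9), which suggests the slide values were adjusted to sum to exactly 100%.

```python
# Chain counts as given on the slide.
counts = {
    "manual and automatic agree": 328,
    "automatic consensus only": 46,
    "manual consensus only": 47,
    "automatic and manual disagree": 3,
}

# Chains that have at least one consensus (automatic or manual).
total_with_consensus = sum(counts.values())

# Share of each category among chains with consensus, rounded to one decimal.
shares = {
    label: round(100 * n / total_with_consensus, 1)
    for label, n in counts.items()
}

# ASSUMPTION: the dataset size is not stated; it is inferred here from
# "424 chains = 90.6% of entire dataset".
implied_dataset_size = round(total_with_consensus / 0.906)

if __name__ == "__main__":
    print(total_with_consensus)
    for label, share in shares.items():
        print(f"{label}: {share}%")
    print(implied_dataset_size)
```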

Page 50: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

1cjaa (actin-fragmin kinase, slime mold): an unusual kinase [complex interface]

Domain assignments shown: 1 domain; 1 domain + unassigned; 4 domains

Methods compared: DALI, CATH, SCOP, PDP, DomainParser

typical kinase

Page 51: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Near Term Challenges – High Throughput

Page 52: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

integrated Genomic Annotation Pipeline - iGAP

Deduced protein sequences

Prediction of:

•signal peptides (SignalP, PSORT)

•transmembrane regions (TMHMM, PSORT)

•coiled coils (COILS)

•low complexity regions (SEG)

Structural assignment of domains by PSI-BLAST on FOLDLIB

Only sequences w/out A-prediction

Only sequences w/out A-prediction

Structural assignment of domains by 123D on FOLDLIB

Create PSI-BLAST profiles for protein sequences

Store assigned regions in the DB

Functional assignment by PFAM, NR, PSIPred assignments

FOLDLIB

NR, PFAM

Building FOLDLIB:

•PDB chains

•SCOP domains

•PDP domains

•CE matches PDB vs. SCOP

•90% sequence non-identical

•minimum size 25 aa

•coverage (90%, gaps <30, ends <30)

Domain location prediction by sequence

structure info

sequence info

SCOP, PDB

Page 53: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

integrated Genomic Annotation Pipeline - iGAP

~800 genomes @ 10k-20k ORFs per genome = ~10^7 ORFs

Compute cost labels on the pipeline flowchart:

4 CPU years

228 CPU years

3 CPU years

9 CPU years

252 CPU years

3 CPU years

10^4 entries

Li, et al., (2003) Genome Biology
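
The scale estimate on this slide can be reproduced with a few lines of arithmetic. The sketch below takes the ORF total to be on the order of 10^7 and treats the six CPU-year labels as separate, additive pipeline steps; that additivity is an assumption, since the transcript does not say which step each label belongs to. The `wall_clock_days` helper and the CPU count in the usage line are hypothetical and assume perfect parallel scaling.

```python
# Figures from the slide.
genomes = 800
orfs_per_genome_low, orfs_per_genome_high = 10_000, 20_000

# Range for the total number of ORFs to annotate.
orfs_total_low = genomes * orfs_per_genome_low
orfs_total_high = genomes * orfs_per_genome_high

# CPU-year labels from the flowchart.
# ASSUMPTION: the labels are separate steps whose costs add up.
cpu_years_per_step = [4, 228, 3, 9, 252, 3]
total_cpu_years = sum(cpu_years_per_step)

def wall_clock_days(cpu_years, n_cpus):
    """Idealized elapsed days, assuming perfect scaling across n_cpus."""
    return cpu_years * 365 / n_cpus

if __name__ == "__main__":
    print(f"{orfs_total_low:.1e} to {orfs_total_high:.1e} ORFs")
    print(total_cpu_years, "CPU years in total")
    # Hypothetical grid of 1000 CPUs, chosen only for illustration.
    print(wall_clock_days(total_cpu_years, 1000), "days")
```

Two of the six labels account for most of the total, which is why distributing this workload across grid resources is the next topic.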

Page 54: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Towards Workflows and the Grid

XML

iGAP

Executables, Parameters, Input, Output, Resources

APST

Data Manager

Compute Manager

Scheduler

Grid Resource Information

Storage

Compute

Grid Middleware

MDS/NWS/Ganglia

SSH/GRAM/GASS

PBS/Loadleveler/Condor

SCP/GASS/SRB/FTP
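
The slide describes each pipeline step to the scheduler in XML, in terms of executables, parameters, input, output and resources. A minimal sketch of such a description is shown below. The element names, the `Task` class and the example values are hypothetical, chosen to mirror the slide's labels; this is not the actual APST schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class Task:
    """One pipeline step, described by the five items named on the slide."""
    executable: str
    parameters: list = field(default_factory=list)
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    resources: dict = field(default_factory=dict)

def task_to_xml(task):
    """Serialize a Task to an XML string (hypothetical element names)."""
    root = ET.Element("task")
    ET.SubElement(root, "executable").text = task.executable
    params = ET.SubElement(root, "parameters")
    for value in task.parameters:
        ET.SubElement(params, "parameter").text = value
    for name in task.inputs:
        ET.SubElement(root, "input").text = name
    for name in task.outputs:
        ET.SubElement(root, "output").text = name
    # Resource requirements become attributes, e.g. <resources cpus="1" />.
    ET.SubElement(root, "resources", {k: str(v) for k, v in task.resources.items()})
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Placeholder step name and file names, for illustration only.
    step = Task("run_step", ["--example-flag"], ["orfs.fasta"], ["hits.out"], {"cpus": 1})
    print(task_to_xml(step))
```

Keeping the description declarative lets a scheduler decide where a step runs and how its files are staged, without changing the pipeline itself.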

Page 55: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

THE EOL GRID CONSORTIUM

EOL

Industrial Partners: IBM, Ceres

Titech Japan

SDSC Blue Horizon

The EOL Cluster

Sun Enterprise Server

BII Singapore

Encyclopedia Proteomics Inc.

Page 56: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Near Term Challenges –

We need to overcome the “high noon” problem

Page 57: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

High Noon – A Working Definition

12:00

The cost:benefit ratio of entry to bioinformatics tools and resources is too high for the majority of biologists

Thus, those who could gain and contribute most from the services provided are not users

Page 58: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

One Approach - MBT

Java toolkit for developing custom molecular visualization applications

High-quality interactive rendering of:

•sequence

•structure

•function

http://mbt.sdsc.edu

Page 59: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

MBT Functionality

Provides

Data loading

•Local files (PDB, mmCIF, Fasta, etc.)

•Compressed files (zip, gzip)

•Remote (http, ftp, OpenMMS?, EJB?)

Efficient data access

•Raw data

•Derived data (StructureMap)

Visualization (plug-in viewers)

Page 60: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

MBT Architecture

Page 61: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Future - The Structure Should be the User Interface

Ligand - What other entries contain this?

Chain - What other entries have chains with >90% sequence identity?

Residue - What is the environment of this residue?

Page 62: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

On-going and Longer Term Challenges

Page 63: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Outstanding Problems in Sequence Analysis & Comparison

Exon recognition

Protein coding gene modeling

Protein/EST alignment

Large scale sequence comparison and alignment

Synteny recognition

Polymorphism / variation detection

Regulatory pattern recognition

Repetitive DNA characterization

RNA gene modeling

Page 64: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Exemplar Bioinformatics Problems

1. Full genome comparisons

2. Rapid assessment of polymorphic variations

3. Complete construction of orthologous and paralogous groups

4. Structure resolution of large assemblies/complexes

5. Dynamical simulation of realistic systems

6. Rapid structural/topological clustering of proteins

7. Protein folding

8. Computer simulation of membrane insertion

9. Simulation of cellular pathways / sensitivity analysis of pathway stoichiometry and kinetics

Page 65: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Bringing the Data View and the Complexity View Together to Define the Bioinformatics “Engineering” Challenge

Easy access to any type of biological data across databases

Ability to go across databases and types of data

Rapidly infer knowledge from new genome sequences

Find relationships between sequence, structure and function of gene products

Relate genotype to phenotype in species

Access and apply polymorphism data seamlessly

A single computer interface (Web browser?)

Computer platform independence

Total opaqueness of format differences

Compute on a point and click mode

Seamless access to files, file uploads and downloads

Multimedia capabilities on the interface

Ability to integrate new tools/databases painlessly

Page 66: Feb. 25, 2004 World University Network - Worldwide Broadcast The Future of Bioinformatics (with examples from structural bioinformatics) Philip E. Bourne

Acknowledgements

To all those who have chosen bioinformatics as a career and make the field so rich

Particularly those who do so for lesser rewards – the data providers and annotators

My group for the fun we had discussing this topic

http://rinkworks.com/said/yogiberra.shtml

"I didn't really say everything I said."