Bioinformatics: Data-driven molecular biology. Mikhail Gelfand, A.A. Kharkevich Institute for Information Transmission Problems, RAS, Moscow. II Spanish-Russian Forum on Information and Communication Technologies, Madrid, 21-25 / IX / 2009

Page 1: Bioinformatics :  Data-driven molecular biology

Bioinformatics: Data-driven molecular biology

Mikhail Gelfand
A.A. Kharkevich Institute for Information Transmission Problems, RAS, Moscow

II Spanish-Russian Forum on Information and Communication Technologies
Madrid, 21-25 / IX / 2009

Page 2: Bioinformatics :  Data-driven molecular biology

Exponential increase of data volume

[Figure: growth over time, 1982-2007, on a logarithmic vertical axis from 100 to 100,000,000,000. Red: papers (PubMed); blue: sequence fragments (GenBank); green: nucleotides (GenBank).]

Of 18 million papers in PubMed, ~675 thousand match the keywords “bioinformat* OR comput*”

Page 3: Bioinformatics :  Data-driven molecular biology

622 complete genomes (bacteria)

[Bar chart: complete bacterial genomes per year, 1995-2007; vertical axis 0-200]
1995: 3; 1996: 3; 1997: 6; 1998: 6; 1999: 7; 2000: 19; 2001: 25; 2002: 30; 2003: 48; 2004: 66; 2005: 81; 2006: 142; 2007: 186
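As a quick consistency check, the bar labels in this chart can be read as yearly counts of newly completed bacterial genomes for 1995-2007. That reading is an assumption (the extracted labels are not explicitly tied to years), but the values do add up to the 622 genomes in the slide title. A minimal sketch:

```python
from itertools import accumulate

# Assumed reading of the bar labels: one count per year, 1995 through 2007.
years = list(range(1995, 2008))
new_genomes = [3, 3, 6, 6, 7, 19, 25, 30, 48, 66, 81, 142, 186]

# Running total of complete bacterial genomes by the end of each year.
cumulative = list(accumulate(new_genomes))

for year, new, total in zip(years, new_genomes, cumulative):
    print(f"{year}: {new:4d} new, {total:4d} total")
```

Under this reading, the final running total matches the slide title, and more than half of all genomes were completed in the last two years shown.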

Page 4: Bioinformatics :  Data-driven molecular biology

>45 thousand Google hits on “genome deciphered”

Top 10 hits:
• bioremediation
  – bacterium Pseudomonas
• agriculture and biotech
  – crop and biofuel plant Sorghum
  – rice
• medicine
  – pathogenic bacterium Staphylococcus
  – SARS (atypical pneumonia) virus
  – Brugia worm (elephantiasis)
• individual genome (medicine)
  – James Watson
• science / model organism
  – macaque
• science / evolution
  – mammoth (mitochondrial)
  – platypus

Page 5: Bioinformatics :  Data-driven molecular biology

Sequencing is just the beginning

Bacterial genome: several million nucleotides

600 to 9,000 genes (~90% of a genome codes for proteins)

This slide: 0.1% of the Escherichia coli genome

Human genome: 3 billion nucleotides, 25-30 thousand genes

polymorphisms (individual differences): ~1 per 1,000 nucleotides

differences between human and chimpanzee: ~1 per 100 nucleotides
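The orders of magnitude on this slide can be made concrete with a little arithmetic. The sketch below uses the slide's round numbers plus one figure that is not on the slide: an E. coli genome size of about 4.6 million nucleotides (the commonly cited size for strain K-12), which is labeled as an assumption in the code.

```python
HUMAN_GENOME_NT = 3_000_000_000   # from the slide: 3 billion nucleotides
POLYMORPHISM_RATE = 1 / 1000      # from the slide: ~1 per 1,000 nucleotides
HUMAN_CHIMP_RATE = 1 / 100        # from the slide: ~1 per 100 nucleotides
ECOLI_GENOME_NT = 4_600_000       # assumption, not on the slide (E. coli K-12)
SLIDE_FRACTION = 0.001            # from the slide: 0.1% of the genome


def expected_differences(genome_nt: int, rate: float) -> int:
    """Expected number of differing positions at a per-nucleotide rate."""
    return round(genome_nt * rate)


polymorphic_sites = expected_differences(HUMAN_GENOME_NT, POLYMORPHISM_RATE)
human_vs_chimp = expected_differences(HUMAN_GENOME_NT, HUMAN_CHIMP_RATE)
nt_on_slide = round(ECOLI_GENOME_NT * SLIDE_FRACTION)

print(f"Polymorphic positions in a human genome: ~{polymorphic_sites:,}")
print(f"Human vs chimpanzee differences: ~{human_vs_chimp:,}")
print(f"Nucleotides shown on one slide: ~{nt_on_slide:,}")
```

With these inputs, the rates correspond to roughly 3 million polymorphic positions, roughly 30 million human-chimpanzee differences, and a few thousand nucleotides of E. coli sequence on a single slide.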

Page 6: Bioinformatics :  Data-driven molecular biology

Not just genomes

Other types of large-scale experiments / datasets:
• State of the genome (gene expression)
  – methylation
  – nucleosome positioning
  – histone modifications
• Transcriptomics, protein abundance (gene expression)
• Protein-protein interactions
  – signaling etc.
  – functional complexes
• Protein-DNA interactions (regulation)
• etc. etc.

Page 7: Bioinformatics :  Data-driven molecular biology

Goals

• Functional annotation of genes and proteins
  – biological function
  – regulation (in what conditions)
• Functional annotation of genomes
  – metabolic reconstruction and modeling
  – regulatory networks and development
  – prediction of organism properties from its genome

Page 8: Bioinformatics :  Data-driven molecular biology

Applications: biotechnology

• Improvement of production strains (chemistry, pharma, food industry)
  – via modeling of metabolic pathways
• New enzymes (new functions, stress tolerance)
  – via sequencing and functional annotation
• Biofuels
  – fast-growing, stress-tolerant plants; identification of genes
  – microbes as producers of ethanol or fatty acids: targeted genome design

Page 9: Bioinformatics :  Data-driven molecular biology

Applications: medicine and pharma

• Personalized medicine
  – identification of predisposing alleles: lifestyle
  – pharmacogenomics (metabolic alleles)
  – diagnostics
• Drug targets (chronic disease)
  – analysis of signaling pathways
• Anti-infectives
  – identification of drug targets
• Drug design; identification of drug candidates
  – modeling of protein structure and interactions of proteins with small molecules

Page 10: Bioinformatics :  Data-driven molecular biology

Methods. Integration of data

• Systems biology: integration of diverse datasets for one organism
• Comparative genomics: simultaneous analysis of genomic data for many organisms
• Comparative systems biology: understanding the evolution of gene regulation and expression, signaling etc.
• Comparative structural biology

Page 11: Bioinformatics :  Data-driven molecular biology

Bioinformatics in Russia

• Few high-throughput experiments
  – Open data
  – Collaborations
  – Theory (evolution), methods, algorithms
• Highlights:
  – Evolution (IITP RAS) and taxonomy (IPCB MSU)
  – Regulation (FBB MSU, GosNIIGenetika, IITP RAS, ICaG SB RAS)
  – Annotation (FBB MSU, IITP RAS)
  – Protein Structure (IPR RAS, IMB RAS, IPCB MSU, BF MSU)
  – Modeling
    • Metabolism (IPCB MSU, ICaG SB RAS)
    • Regulation (SpBSPU, ICaG SB RAS)
  – Drug design (IBMC RAMS)

Page 12: Bioinformatics :  Data-driven molecular biology

Research and Training Center “Bioinformatics”, Institute for Information Transmission Problems (5 years: 2003-2009)

• Molecular evolution
  – Alternative splicing as a driver of evolution in eukaryotes
  – Positive selection
• Comparative genomics of regulation in bacteria
  – Evolution of regulatory pathways
  – Protein-DNA interactions
• Annotation
  – Gene recognition
  – Functional annotation
  – Regulation

Page 13: Bioinformatics :  Data-driven molecular biology

Comparative genomics in action: confirmed predictions

• Regulatory mechanisms
  – riboswitches (riboflavin – vitamin B2, thiamin – vitamin B1)
  – antisense regulation of the methionine-cysteine pathway
  – role of the ribosome in zinc homeostasis
• Regulators: NrdR, MtaR/MetR, CmbR, NiaR
• Enzymes: FadE, ThiN, TenA, CobZ, CobX/CbiZ, PduX, NagP, NagB-II
• Microcins (capistruin, Burkholderia thailandensis)
• Transporters
  – ABC transporters with universal energizing components: Co, Ni, biotin (vitamin H), thiamin (vitamin B1), riboflavin (vitamin B2)
  – other: threonine, methionine, oligogalacturonides, N-acetylglucosamine, corrinoids, niacin, riboflavin, Co
• Regulatory motifs: nitrogen fixation, fatty acid biosynthesis, iron homeostasis, catabolism of chitin and pectin
• Regulatory sites: several dozen

Page 14: Bioinformatics :  Data-driven molecular biology

Functional annotation of genomes

[Chart of functional categories. Legend: Translation; Transcription; Replication and repair; Division; Signaling pathways; Outer membrane; Motility; Protein turnover; Ions; Defense; Secretion; Energy; Sugars; Amino acids; Nucleotides; Coenzymes; Lipids; Secondary metabolism; Poorly characterized; Uncharacterized]

First Russian bacterial genome, Acholeplasma laidlawii (2008). Sequencing and proteomics: Institute of Physico-Chemical Medicine; annotation: IITP. ~1.5 Mb; ~1400 genes. Established function for ~80% of genes; metabolic reconstruction
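For a sense of scale, the approximate figures quoted for Acholeplasma laidlawii translate into gene counts and gene density as follows. This is a back-of-the-envelope sketch using only the slide's rounded values; the variable names are illustrative.

```python
GENOME_NT = 1_500_000        # from the slide: ~1.5 Mb
GENES = 1400                 # from the slide: ~1400 genes
ANNOTATED_FRACTION = 0.80    # from the slide: function established for ~80%

# Genes with and without an established function.
annotated = round(GENES * ANNOTATED_FRACTION)
unannotated = GENES - annotated

# Average genome length per gene, including intergenic sequence.
nt_per_gene = GENOME_NT / GENES

print(f"Genes with established function: ~{annotated}")
print(f"Genes without established function: ~{unannotated}")
print(f"Average spacing: ~{nt_per_gene:.0f} nucleotides per gene")
```

So roughly 280 genes remain without an assigned function, and the genome carries about one gene per kilobase.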

Page 15: Bioinformatics :  Data-driven molecular biology

Publications (refereed)

[Bar chart: refereed publications per year, 2003-2009, plus the average; vertical axis 0-35. Series: Book Chapters; Russian Journals; International Journals; Collaboration (USA); Collaboration (Europe)]

Page 16: Bioinformatics :  Data-driven molecular biology

Collaborations

• European Molecular Biology Laboratory *
• Germany
  – Humboldt University, Berlin
  – Munich Technical University
• France
  – Lyon University
• United Kingdom
  – University of East Anglia
• Spain
  – Center for Genome Regulation (Barcelona)
• USA
  – MIT
  – Burnham Institute *
  – Lawrence Berkeley National Laboratory *
  – Stowers Institute *
  – Rutgers University
• China
  – China-Germany Partner Institute of Molecular Genetics (Shanghai)
• Industry
  – Biomax (Germany)
  – Integrated Genomics (USA)

Bold: on-going

* Former students

Page 17: Bioinformatics :  Data-driven molecular biology