Genomics, Bioinformatics and the Revolution in Biology Jonathan Pevsner, Ph.D. Kennedy Krieger Institute/ Johns Hopkins School of Medicine

Page 1: Genomics, Bioinformatics and the Revolution in Biology

Genomics, Bioinformatics and the Revolution in Biology

Jonathan Pevsner, Ph.D.
Kennedy Krieger Institute / Johns Hopkins School of Medicine

Page 2: Genomics, Bioinformatics and the Revolution in Biology

Outline

Three views of bioinformatics and genomics
  • Informatics
  • From small to large
  • From genotype to phenotype

The chromosomes

SNPs, HapMap, and the 1000 Genomes project

Page 3: Genomics, Bioinformatics and the Revolution in Biology

• Bioinformatics is the interface of biology and computers. It is the analysis of proteins, genes and genomes using computer algorithms and databases.

• Genomics is the analysis of genomes, including the nature of genetic elements on chromosomes. The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.

• Genetics is the study of the origin and expression of individual uniqueness.

Definitions of bioinformatics and genomics

Page 4: Genomics, Bioinformatics and the Revolution in Biology

Three views of bioinformatics and genomics

1. The field of informatics

2. From small to large

3. From genotype to phenotype

Page 5: Genomics, Bioinformatics and the Revolution in Biology

[Figure: the field of informatics, spanning tool-users and tool-makers. Labels: bioinformatics, genomics, medical informatics, public health informatics, infrastructure, databases, algorithms.]

Page 6: Genomics, Bioinformatics and the Revolution in Biology

Three views of bioinformatics and genomics

1. The field of informatics

2. From small to large

3. From genotype to phenotype

Page 7: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → phenotype

Page 8: Genomics, Bioinformatics and the Revolution in Biology

[Figure: Rapid growth of DNA sequences. Total number of DNA base pairs in GenBank/WGS by year (1982, 1992, 2002, 2008). Axes: sequences (millions) and base pairs (billions), scale 0 to 200.]

Page 9: Genomics, Bioinformatics and the Revolution in Biology

Time of development

Body region, physiology, pharmacology, pathology

Page 10: Genomics, Bioinformatics and the Revolution in Biology
Page 11: Genomics, Bioinformatics and the Revolution in Biology
Page 12: Genomics, Bioinformatics and the Revolution in Biology

The Origin of Species (1859)

It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us.

Source: Origin of Species, Chapter 15

Page 13: Genomics, Bioinformatics and the Revolution in Biology
Page 14: Genomics, Bioinformatics and the Revolution in Biology

[Figure: the eukaryotes (Baldauf et al. 2000). Labeled groups: animals, fungi, plants, slime mold, Giardia, Trichomonas, Paramecium, Trypanosoma, Plasmodium.]

Page 15: Genomics, Bioinformatics and the Revolution in Biology

Wolfe et al. (1999)

Page 16: Genomics, Bioinformatics and the Revolution in Biology

Wolfe et al. (1999)

8 chromosomes (5,000 genes)

16 chromosomes (10,000 genes)

16 chromosomes (6,000 genes)

Page 17: Genomics, Bioinformatics and the Revolution in Biology

Paramecium tetraurelia: a ciliate with two nuclei, 40,000 genes, and three whole-genome duplications

Page 18: Genomics, Bioinformatics and the Revolution in Biology

Phylogenetic footprinting

Population shadowing

Phylogenetic shadowing

Page 19: Genomics, Bioinformatics and the Revolution in Biology

Three views of bioinformatics and genomics

1. The field of informatics

2. From small to large

3. From genotype to phenotype

Page 20: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Page 21: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Phenotype

We see 500 inpatients and 13,000 outpatients per year at the Kennedy Krieger Institute. Why do children engage in self-injurious behavior? In many cases, there are chromosomal insults.

Page 22: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

From genotype…

…to phenotype

Page 23: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

DNA → RNA → protein → cellular phenotype → clinical phenotype

Page 24: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

DNA → RNA → protein

Central dogma of molecular biology: DNA is transcribed into RNA, and RNA is translated into protein.

Central dogma of bioinformatics/genomics: the genome is transcribed into the transcriptome, and the transcriptome is translated into the proteome.
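The two steps of the central dogma can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the function names and the example sequence are assumptions, and the sketch takes the common shortcut of replacing T with U on the coding strand rather than modeling the template strand.

```python
# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a DNA coding strand (T replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example input (an assumption for illustration).
example_dna = "ATGGTGCACCTGACTCCTGAGTAA"
example_mrna = transcribe(example_dna)    # "AUGGUGCACCUGACUCCUGAGUAA"
example_protein = translate(example_mrna)  # "MVHLTPE"
```

Applied to every gene in a genome, the same two functions give a rough picture of the genome → transcriptome → proteome view, although real transcripts also involve splicing and regulation that this sketch ignores.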

Page 25: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Over 200 billion base pairs of DNA have now been sequenced, from >165,000 organisms.

[Figure: growth of sequence data by year (1982, 1992, 2002, 2008); vertical axis from 0 to 200.]

Page 26: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Scope of bioinformatics

Sequence analysis
  • Pairwise alignment
  • Multiple sequence alignment
  • Phylogeny
  • Database searching (e.g. BLAST)

Functional genomics
  • RNA studies; gene expression profiling
  • Proteomics; protein structure
  • Gene function

Page 27: Genomics, Bioinformatics and the Revolution in Biology

Pairwise alignments in the 1950s

β-corticotropin (sheep):  ala gly glu asp asp glu
Corticotropin A (pig):    asp gly ala glu asp glu

Oxytocin:     CYIQNCPLG
Vasopressin:  CYFQNCPRG
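A short Python sketch of the simplest kind of pairwise comparison, applied to the oxytocin and vasopressin sequences above. It assumes two sequences of equal length and no gaps, so it is not a full alignment algorithm; the function names are illustrative.

```python
def match_line(seq_a: str, seq_b: str) -> str:
    """Return a string with '|' under identical residues and ' ' elsewhere."""
    return "".join("|" if a == b else " " for a, b in zip(seq_a, seq_b))


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions that are identical (ungapped, equal length)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only handles equal-length sequences")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


oxytocin = "CYIQNCPLG"
vasopressin = "CYFQNCPRG"

print(oxytocin)
print(match_line(oxytocin, vasopressin))
print(vasopressin)
print(f"{percent_identity(oxytocin, vasopressin):.1f}% identity")  # 7 of 9 positions match
```

Sequences of different lengths need gaps, which is where dynamic programming methods for pairwise alignment come in.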

Page 28: Genomics, Bioinformatics and the Revolution in Biology

Early example of sequence alignment: globins (1961)

H.C. Watson and J.C. Kendrew, “Comparison Between the Amino-Acid Sequences of Sperm Whale Myoglobin and of Human Hæmoglobin.” Nature 190:670-672, 1961.

myoglobin globins:

Page 29: Genomics, Bioinformatics and the Revolution in Biology

2e Fig. 5.21

LAGAN

Page 30: Genomics, Bioinformatics and the Revolution in Biology

Multiple sequence alignment of five globins: ClustalW

Page 31: Genomics, Bioinformatics and the Revolution in Biology

Praline

Page 32: Genomics, Bioinformatics and the Revolution in Biology

MUSCLE

Page 33: Genomics, Bioinformatics and the Revolution in Biology

Probcons

Page 34: Genomics, Bioinformatics and the Revolution in Biology

TCoffee

Page 35: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Scope of bioinformatics

Sequence analysis
  • Pairwise alignment
  • Multiple sequence alignment
  • Phylogeny
  • Database searching (e.g. BLAST)

Functional genomics
  • RNA studies; gene expression profiling
  • Proteomics; protein structure
  • Gene function

Page 36: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Four bases: A, G, C, T arranged in base pairs along a double helix (1953).

Human genome project: sequencing all ~3 billion base pairs (2003).

Page 37: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

1995: first genome sequence (a bacterium)
2000: fruit fly genome, plant
2003: human genome
2008:
  -- two individual human genomes finished
  -- 1,000 human genomes (launched)
  -- SNPs used to study chromosomes

Page 38: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Page 39: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Page 40: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Time of development

Body region, physiology, pharmacology, pathology

Page 41: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Page 42: Genomics, Bioinformatics and the Revolution in Biology

DNA → RNA → protein → cell → pathway → organism → population

Phenotype

Genotype

Page 43: Genomics, Bioinformatics and the Revolution in Biology

Outline

Three views of bioinformatics and genomics
  • Informatics
  • From small to large
  • From genotype to phenotype

The chromosomes

SNPs, HapMap, and the 1000 Genomes project

Page 44: Genomics, Bioinformatics and the Revolution in Biology

Eukaryotic genomes are organized into chromosomes

Genomic DNA is organized in chromosomes. The diploid number of chromosomes is constant in each species (e.g. 46 in human). Chromosomes are distinguished by a centromere and telomeres.

The chromosomes are routinely visualized by karyotyping (imaging the chromosomes during metaphase, when each chromosome is a pair of sister chromatids).

Page 45: Genomics, Bioinformatics and the Revolution in Biology

Fig. 16.19, page 565

Page 46: Genomics, Bioinformatics and the Revolution in Biology

human chromosome 21 at NCBI

nucleolar organizing center

centromere

Page 47: Genomics, Bioinformatics and the Revolution in Biology

nucleolar organizing center

centromere

human chromosome 21 at www.ensembl.org

Page 48: Genomics, Bioinformatics and the Revolution in Biology

human chromosome 21 at UCSC Genome Browser

centromere

Page 49: Genomics, Bioinformatics and the Revolution in Biology

human chromosome 21 at UCSC Genome Browser

centromere

Page 50: Genomics, Bioinformatics and the Revolution in Biology

First P.G. mitosis in polar view. Tradescantia virginiana, Commelinaceae, n = 9 (from aberrant plant with 22 chromosomes). 2 BE - CV smears. x 1200. Printed on multigrade paper. Darlington.

Page 51: Genomics, Bioinformatics and the Revolution in Biology

First P.G. mitosis in Paris quadrifolia, Liliaceae, showing all stages from prophase to telophase. n = 10 (cf. Darlington 1937, 1941). 2 BE – CV smear, 8 mm objective. x 800. Darlington.

Page 52: Genomics, Bioinformatics and the Revolution in Biology

Root tip squashes showing anaphase separation. Fritillaria pudica, 3x = 39, spiral structure of chromatids revealed by pressure after cold treatment. 2 BD – Feulgen; x 3000. Darlington.

Page 53: Genomics, Bioinformatics and the Revolution in Biology

Cleavage mitosis in the morula of the teleostean fish, Coregonus clupeoides, in the middle of anaphase. Spindle structure revealed by slow fixation. Section cut at 10 µ. x 4000. Strong Flemming, haematoxylin. Prep. and photo by P.C. Koller. Darlington.

Page 54: Genomics, Bioinformatics and the Revolution in Biology

The eukaryotic chromosome: Robertsonian fusion creates one metacentric by fusion of two acrocentrics

Ohno (1970) Plate II

ordinary male house mouse (Mus musculus, 2n = 40)

male tobacco mouse (Mus poschiavinus, 2n = 26)

Page 55: Genomics, Bioinformatics and the Revolution in Biology

The spectrum of variation

Category of variation          Size             Type
Single base pair changes       1 bp             SNPs, point mutations
Small insertions/deletions     1 – 50 bp
Short tandem repeats           1 – 500 bp       microsatellites
Fine-scale structural var.     50 bp – 5 kb     del, dup, inv, tandem repeats
Retroelement insertions        0.3 – 10 kb      SINEs, LINEs, LTRs, ERVs
Intermediate-scale struct.     5 kb – 50 kb     del, dup, inv, tandem repeats
Large-scale structural var.    50 kb – 5 Mb     del, dup, inv, large tandem repeats
Chromosomal variation          >>5 Mb           aneuploidy

Adapted from Sharp AJ et al. (2006) Annu Rev Genomics Hum Genet 7:407-42
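The size ranges in this table overlap, so size alone does not identify a category. As a rough sketch (names are illustrative, and treating ">>5 Mb" as "anything above 5 Mb" is an assumption), the Python below lists every category whose size range contains a given variant length.

```python
# (category, minimum size in bp, maximum size in bp or None for open-ended)
SIZE_CLASSES = [
    ("Single base pair changes", 1, 1),
    ("Small insertions/deletions", 1, 50),
    ("Short tandem repeats", 1, 500),
    ("Fine-scale structural variation", 50, 5_000),
    ("Retroelement insertions", 300, 10_000),
    ("Intermediate-scale structural variation", 5_000, 50_000),
    ("Large-scale structural variation", 50_000, 5_000_000),
    # Assumption: ">>5 Mb" is modeled as strictly greater than 5 Mb.
    ("Chromosomal variation", 5_000_001, None),
]


def candidate_categories(size_bp: int) -> list[str]:
    """Return all categories whose size range includes size_bp."""
    return [
        name
        for name, smallest, largest in SIZE_CLASSES
        if size_bp >= smallest and (largest is None or size_bp <= largest)
    ]


print(candidate_categories(1))      # three categories share the 1 bp size
print(candidate_categories(2_000))  # fine-scale structural or retroelement
```

Deciding among the candidates needs the variant type (deletion, duplication, inversion, repeat, retroelement), which the size does not provide.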

Page 56: Genomics, Bioinformatics and the Revolution in Biology

Across the genome, there are four possible SNP calls:
[1] homozygous (AA)
[2] homozygous (BB)
[3] heterozygous (AB)
[4] no call

Page 57: Genomics, Bioinformatics and the Revolution in Biology

In a deleted region, there are three possible SNP calls:
[1] A (interpreted as AA)
[2] B (interpreted as BB)
[3] no call

Across the genome, there are four possible SNP calls:
[1] homozygous (AA)
[2] homozygous (BB)
[3] heterozygous (AB)
[4] no call
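Because a deleted region cannot produce heterozygous (AB) calls, a long stretch of consecutive SNPs without any AB call is one hint of a deletion. The Python below is a minimal sketch of that idea; the function name, the "NC" label for no call, and the `min_length` threshold are assumptions for illustration. A run of homozygous calls can arise for other reasons as well, so a result here is only a candidate region to check against other evidence.

```python
def homozygous_runs(calls: list[str], min_length: int = 5) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of runs with no AB call.

    `calls` is an ordered list of SNP calls along a chromosome, each one of
    "AA", "BB", "AB", or "NC" (no call). No calls do not break a run.
    """
    runs = []
    start = None
    for i, call in enumerate(calls):
        if call == "AB":
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
        elif start is None:
            start = i
    if start is not None and len(calls) - start >= min_length:
        runs.append((start, len(calls)))
    return runs


# Hypothetical calls for twelve consecutive SNPs.
calls = ["AA", "AB", "BB", "AB", "AA", "BB", "AA", "NC", "BB", "AA", "AB", "BB"]
print(homozygous_runs(calls))  # [(4, 10)]
```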

Page 58: Genomics, Bioinformatics and the Revolution in Biology

Single nucleotide polymorphisms (SNPs) to investigate chromosomes: A case of 7p deletion

AA AB BB

Page 59: Genomics, Bioinformatics and the Revolution in Biology

AA AB BB

A B

A case of 7p deletion

Page 60: Genomics, Bioinformatics and the Revolution in Biology

A B

• Deletions (and duplications) such as these are called copy number variants (CNVs).
• CNVs commonly occur in normal individuals.
• When found in individuals with disease, we can tell if they are inherited (likely to be benign) or occur de novo (more likely to be disease-associated) by comparison to the parents’ genotypes.
• Recent papers report many CNVs in disease.
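The comparison to the parents can be sketched as an interval-overlap check. The Python below is an illustration under stated assumptions, not the method used in the slides: the `CNV` record, the 50% reciprocal-overlap threshold, and the coordinates are all hypothetical.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Fraction of overlap relative to the longer-affected of the two CNVs."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def classify(child: CNV, parental: list[CNV], min_overlap: float = 0.5) -> str:
    """Label a child's CNV as inherited if a parent carries a matching CNV."""
    for parent_cnv in parental:
        if reciprocal_overlap(child, parent_cnv) >= min_overlap:
            return "inherited"
    return "de novo candidate"


# Hypothetical coordinates for illustration only.
child_deletion = CNV("7", 1_000_000, 3_000_000, "del")
mother_cnvs = [CNV("7", 1_100_000, 3_050_000, "del")]

print(classify(child_deletion, mother_cnvs))  # inherited
print(classify(child_deletion, []))           # de novo candidate
```

The label "de novo candidate" is deliberate: a CNV missing from the parental calls may also reflect a missed call in a parent, so it would need confirmation.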

A case of 7p deletion

Page 61: Genomics, Bioinformatics and the Revolution in Biology

A case of trisomy 21 (Down syndrome)

AAA AAB ABB BBB
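The genotype classes follow directly from the number of chromosome copies: one copy gives A or B, two copies give AA, AB, BB, and three copies give the four classes shown here. A small Python sketch, with function names chosen for illustration, enumerates the classes and the fraction of B alleles in each.

```python
from fractions import Fraction


def genotype_classes(copy_number: int) -> list[str]:
    """Unordered genotypes for a biallelic SNP at a given copy number."""
    return ["A" * (copy_number - b) + "B" * b for b in range(copy_number + 1)]


def b_allele_fractions(copy_number: int) -> list[Fraction]:
    """Fraction of B alleles in each genotype class, in the same order."""
    return [Fraction(b, copy_number) for b in range(copy_number + 1)]


for copies in (1, 2, 3):
    print(copies, genotype_classes(copies), [str(f) for f in b_allele_fractions(copies)])
# 1 ['A', 'B'] ['0', '1']
# 2 ['AA', 'AB', 'BB'] ['0', '1/2', '1']
# 3 ['AAA', 'AAB', 'ABB', 'BBB'] ['0', '1/3', '2/3', '1']
```

A chromosome with n copies yields n + 1 genotype classes, which is why a deletion shows two classes and trisomy 21 shows four.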

Page 62: Genomics, Bioinformatics and the Revolution in Biology

Three cases of 10q deletion

Page 63: Genomics, Bioinformatics and the Revolution in Biology
Page 64: Genomics, Bioinformatics and the Revolution in Biology

Deafness gene?

Page 65: Genomics, Bioinformatics and the Revolution in Biology

The International HapMap Project

► A catalog of common genetic variants that occur in humans
► The project’s goal is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared
► An initial focus has been on four groups (n=270):

CEU: European ancestry (30 trios), Utah residents
YRI: African ancestry (30 trios), Yoruba in Ibadan, Nigeria
JPT/CHB: Asian ancestry (90 individuals), Japanese in Tokyo, Japan; Han Chinese in Beijing, China

► Phase I (2005): > 1 million SNPs
  Phase II (2007): added 2.1 million SNPs

Page 66: Genomics, Bioinformatics and the Revolution in Biology

The International HapMap Project

► In addition to CEU, YRI, and JPT/CHB, additional populations have been genotyped, including:

  • Maasai in Kinyawa, Kenya
  • Luhya in Webuye, Kenya
  • Gujarati Indians in Houston, TX
  • Toscani in Italy
  • Mexican ancestry in Los Angeles
  • African ancestry in southwestern US

Page 67: Genomics, Bioinformatics and the Revolution in Biology
Page 68: Genomics, Bioinformatics and the Revolution in Biology
Page 69: Genomics, Bioinformatics and the Revolution in Biology
Page 70: Genomics, Bioinformatics and the Revolution in Biology
Page 71: Genomics, Bioinformatics and the Revolution in Biology

The ENCODE project

► The ENCyclopedia Of DNA Elements (ENCODE) project was launched in 2003
► Pilot phase: devise and test high-throughput approaches to identify functional elements. Efforts center on 44 DNA targets. These cover about 1 percent of the human genome, or about 30 million base pairs.
► Second phase: technology development.
► Third phase: production. Expand the ENCODE project to analyze the remaining 99 percent of the human genome.

Page 72: Genomics, Bioinformatics and the Revolution in Biology

The ENCODE project

Goal of ENCODE: build a list of all sequence-based functional elements in human DNA. This includes:
► protein-coding genes
► non-protein-coding genes
► regulatory elements involved in the control of gene transcription
► DNA sequences that mediate chromosomal structure and dynamics.

Page 73: Genomics, Bioinformatics and the Revolution in Biology

ENCODE data at the UCSC Genome Browser: beta globin

HBB, HBD, HBG1, HBG2, HBE1

Page 74: Genomics, Bioinformatics and the Revolution in Biology

ENCODE data at the UCSC Genome Browser: beta globin (50,000 base pairs including HBB, HBD, HBG1, HBG2, HBE1)

Page 75: Genomics, Bioinformatics and the Revolution in Biology

ENCODE tracks available at the UCSC Genome Browser

Page 76: Genomics, Bioinformatics and the Revolution in Biology

EGASP: the human ENCODE Genome Annotation Assessment Project

EGASP goals:

[1] Assess the accuracy of computational methods to predict protein-coding genes. Eighteen groups competed to make blind gene predictions; these were evaluated relative to reference annotations generated by the GENCODE project.

[2] Assess the completeness of the current human genome annotations as represented in the ENCODE regions.
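One common way to score a gene prediction against a reference annotation is at the nucleotide level: what fraction of annotated coding bases were predicted (sensitivity), and what fraction of predicted bases are annotated (precision, often reported as "specificity" in the gene-finding literature). The Python below is a minimal sketch of that calculation; it is not the EGASP evaluation protocol, and the function names and exon coordinates are hypothetical.

```python
def covered_positions(intervals: list[tuple[int, int]]) -> set[int]:
    """Expand half-open (start, end) intervals into a set of base positions."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(
    predicted: list[tuple[int, int]],
    reference: list[tuple[int, int]],
) -> tuple[float, float]:
    """Return (sensitivity, precision) of predicted exons versus reference exons."""
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    precision = true_positives / len(pred) if pred else 0.0
    return sensitivity, precision


# Hypothetical exon coordinates for illustration only.
reference_exons = [(100, 200), (300, 400)]
predicted_exons = [(100, 200), (310, 420)]

sensitivity, precision = nucleotide_accuracy(predicted_exons, reference_exons)
print(f"sensitivity = {sensitivity:.3f}, precision = {precision:.3f}")
# sensitivity = 0.950, precision = 0.905
```

Expanding intervals into sets is simple but memory-hungry; for whole-genome annotations an interval-based calculation would be the practical choice.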

Page 77: Genomics, Bioinformatics and the Revolution in Biology

UCSC: tracks for Gencode and for various gene prediction algorithms (focus on 50 kb encompassing five globin genes)

JIGSAW

Gencode

Page 78: Genomics, Bioinformatics and the Revolution in Biology

On bioinformatics

“Science is about building causal relations between natural phenomena (for instance, between a mutation in a gene and a disease). The development of instruments to increase our capacity to observe natural phenomena has, therefore, played a crucial role in the development of science - the microscope being the paradigmatic example in biology. With the human genome, the natural world takes an unprecedented turn: it is better described as a sequence of symbols. Besides high-throughput machines such as sequencers and DNA chip readers, the computer and the associated software becomes the instrument to observe it, and the discipline of bioinformatics flourishes.”

Page 79: Genomics, Bioinformatics and the Revolution in Biology

On bioinformatics

“However, as the separation between us (the observers) and the phenomena observed increases (from organism to cell to genome, for instance), instruments may capture phenomena only indirectly, through the footprints they leave. Instruments therefore need to be calibrated: the distance between the reality and the observation (through the instrument) needs to be accounted for. This issue of Genome Biology is about calibrating instruments to observe gene sequences; more specifically, computer programs to identify human genes in the sequence of the human genome.”

Martin Reese and Roderic Guigó, Genome Biology 2006 7(Suppl I):S1, introducing EGASP, the Encyclopedia of DNA Elements (ENCODE) Genome Annotation Assessment Project

Page 80: Genomics, Bioinformatics and the Revolution in Biology

The 1000 Genomes Project

Goal: To create a deep catalog of human genetic variation in multiple populations.

[1] Discover variants (SNPs, copy number variants, insertions/deletions). Include ~all variants with allele frequencies >1% across the genome (and >0.1-0.5% in gene regions)

[2] Estimate the frequencies of variant alleles
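Estimating the frequency of a variant allele amounts to counting alleles across the sampled genotypes. The Python below is a minimal sketch for a single biallelic SNP, using the AA/AB/BB call labels from the earlier slides; the function name and the sample genotypes are assumptions, and no calls are simply skipped.

```python
from collections import Counter


def allele_frequencies(genotypes: list[str]) -> dict[str, float]:
    """Estimate A and B allele frequencies from diploid genotype calls."""
    counts = Counter()
    for genotype in genotypes:
        if genotype in ("AA", "AB", "BB"):  # ignore no calls
            counts.update(genotype)         # adds one count per allele letter
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no usable genotype calls")
    return {allele: counts[allele] / total for allele in "AB"}


# Hypothetical calls for one SNP in six individuals ("NC" = no call).
sample = ["AA", "AB", "AB", "BB", "AA", "NC"]
print(allele_frequencies(sample))  # {'A': 0.6, 'B': 0.4}
```

With five usable diploid calls there are only ten alleles, so the estimate is coarse; detecting variants at frequencies near 1% requires far more sampled chromosomes, which is part of the motivation for sequencing many individuals.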

Page 81: Genomics, Bioinformatics and the Revolution in Biology

The 1000 Genomes Project

Secondary goals:
• Characterize SNPs
• Improve the human reference sequence
• Study regions under selection
• Study variation across populations
• Study mutation and recombination

Page 82: Genomics, Bioinformatics and the Revolution in Biology

The 1000 Genomes Project

Current approaches include sequencing two HapMap trios (one from YRI, one CEU; father/mother/child) at 20X depth using next generation sequencing technology.
For one individual, 20X depth = 60 gigabases
For one trio, 20X depth = 180 gigabases
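The gigabase figures follow from depth × genome size × number of individuals, using the ~3 billion base pair human genome size given earlier in the talk. A short Python check of the arithmetic (the function name is illustrative):

```python
GENOME_SIZE_BP = 3_000_000_000  # ~3 billion base pairs, as stated in the slides


def bases_required(depth: int, individuals: int = 1,
                   genome_size_bp: int = GENOME_SIZE_BP) -> int:
    """Total sequenced bases needed for a given average depth of coverage."""
    return depth * genome_size_bp * individuals


per_individual = bases_required(20)          # 60,000,000,000 bases = 60 gigabases
per_trio = bases_required(20, individuals=3)  # 180,000,000,000 bases = 180 gigabases

print(per_individual / 1e9, "gigabases per individual")
print(per_trio / 1e9, "gigabases per trio")
```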

In another approach, sequence many individuals (n=1000) from the extended HapMap collection at lighter coverage.

Page 83: Genomics, Bioinformatics and the Revolution in Biology

Conclusions

We briefly surveyed the fields of bioinformatics and genomics. Bioinformatics serves biology, and genomics depends on the tools of bioinformatics.

There are rapid advances in available technologies, such as next generation sequencing, that allow us to address fundamental biological questions at unprecedented resolution. These questions include the nature of variation within and between genomes of individuals, groups (gender, ethnicity, disease status), and across species. Other questions, posed decades ago, concern biological processes such as development, metabolism, adaptation, and function.