Genomics and High Throughput Sequencing Technologies: Applications

Page 1: Genomics and High Throughput Sequencing Technologies: Applications

Genomics and High Throughput Sequencing Technologies: Applications

Jim Noonan
Department of Genetics

Page 2: Genomics and High Throughput Sequencing Technologies: Applications

Outline

Personal genome sequencing
• Rationale: understanding human disease
• Variant discovery and interpretation
• Genome reduction strategies (exome sequencing)

Functional analysis of biological systems using sequencing
• Transcriptome analysis: RNA-seq
• Regulatory element discovery: ChIP-seq
• Chromatin state profiling and the ‘histone code’
• Large-scale efforts: ENCODE and the NIH Epigenome Roadmap

Page 3: Genomics and High Throughput Sequencing Technologies: Applications

Whole genome sequencing: 1000 Genomes

Page 4: Genomics and High Throughput Sequencing Technologies: Applications

Nature 467:1061 (2010)

Page 5: Genomics and High Throughput Sequencing Technologies: Applications

The genetic architecture of human disease

State, MW. Neuron 68:254 (2010)

Page 6: Genomics and High Throughput Sequencing Technologies: Applications

Cooper and Shendure, Nat Rev Genet 12:628 (2011)

Challenge: Interpreting genetic variation

Page 7: Genomics and High Throughput Sequencing Technologies: Applications

Protein-sequence based

DNA-sequence based

Tools for identifying rare damaging mutations

Page 8: Genomics and High Throughput Sequencing Technologies: Applications

Damages protein

Conserved

Cooper and Shendure, Nat Rev Genet 12:628 (2011)

All humans have rare damaging mutations

Page 9: Genomics and High Throughput Sequencing Technologies: Applications

Genome reduction: Exome sequencing

Bamshad et al. Nat Rev Genet 12:745 (2011)

Page 10: Genomics and High Throughput Sequencing Technologies: Applications

De novo mutation

• Likely to have functional effect
• Recurrence in independent affected individuals
• Absence in controls
• Reveal critical pathways in disease

Screen unrelated trios for recurrence

Finding disease-causing rare variants by exome sequencing
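
The trio screen described on this slide can be sketched in a few lines. This is a minimal illustration, assuming genotypes have already been called and are encoded as the number of alternate alleles (0, 1 or 2). The function names, the data layout and the gene labels (`GENE_A`, `GENE_B`) are hypothetical and are not taken from the cited studies. The "absence in controls" criterion is not modeled here, and a real analysis would also filter on read depth and genotype quality, because genotyping errors can mimic de novo events.

```python
from collections import Counter


def find_de_novo(child, mother, father):
    """Return variants present in the child but absent from both parents.

    Each argument maps a variant key (chrom, pos, ref, alt) to the number
    of alternate alleles called in that individual (0, 1 or 2).
    A variant missing from a parent's dict is treated as homozygous
    reference (an assumption; real data should separate "no call" from
    "reference").
    """
    return [
        variant
        for variant, alt_count in child.items()
        if alt_count > 0
        and mother.get(variant, 0) == 0
        and father.get(variant, 0) == 0
    ]


def recurrent_genes(trios, variant_to_gene, min_families=2):
    """Report genes hit by de novo variants in at least min_families trios."""
    families_per_gene = Counter()
    for child, mother, father in trios:
        genes_hit = {
            variant_to_gene[v]
            for v in find_de_novo(child, mother, father)
            if v in variant_to_gene
        }
        families_per_gene.update(genes_hit)  # one count per family per gene
    return {g: n for g, n in families_per_gene.items() if n >= min_families}


# Toy data: two unrelated trios carry different de novo variants in GENE_A.
v1 = ("chr1", 1000, "A", "G")
v2 = ("chr1", 1500, "C", "T")
v3 = ("chr2", 2000, "G", "A")  # inherited from the mother in trio 1
gene_of = {v1: "GENE_A", v2: "GENE_A", v3: "GENE_B"}

trio1 = ({v1: 1, v3: 1}, {v3: 1}, {})  # (child, mother, father)
trio2 = ({v2: 1}, {}, {})

print(find_de_novo(*trio1))
print(recurrent_genes([trio1, trio2], gene_of))
```

In the toy data only `v1` is reported as de novo in the first trio, because `v3` is also present in the mother. `GENE_A` is flagged because it is hit in both families.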

Page 11: Genomics and High Throughput Sequencing Technologies: Applications

Sanders et al., Nature 485:237 (2012)

Page 12: Genomics and High Throughput Sequencing Technologies: Applications

Outline

Personal genome sequencing
• Rationale: understanding human disease
• Variant discovery and interpretation
• Genome reduction strategies (exome sequencing)
• Challenges to de novo genome assembly using short reads

Functional analysis of biological systems using sequencing
• Transcriptome analysis: RNA-seq
• Regulatory element discovery: ChIP-seq
• Chromatin state profiling and the ‘histone code’
• Large-scale efforts: ENCODE and the NIH Epigenome Roadmap

Page 13: Genomics and High Throughput Sequencing Technologies: Applications

mRNA-seq workflow

Martin and Wang, Nat Rev Genet 12:671 (2011)
Wang et al. Nat Rev Genet 10:57 (2009)

Page 14: Genomics and High Throughput Sequencing Technologies: Applications

Gene expression profiling by massively parallel RNA sequencing (RNA-seq)

Page 15: Genomics and High Throughput Sequencing Technologies: Applications

Mapping RNA-seq reads and quantifying transcripts

Page 16: Genomics and High Throughput Sequencing Technologies: Applications

Quantifying gene expression by RNA-seq

Use existing gene annotation:
• Align to genome plus annotated splices
• Depends on high-quality gene annotation
• Which annotation to use: RefSeq, GENCODE, UCSC?
• Isoform quantification?
• Identifying novel transcripts?

Reference-guided alignments:
• Align to genome sequence
• Infer splice events from reads
• Allows transcriptome analyses of genomes with poor gene annotation

De novo transcript assembly:
• Assemble transcripts directly from reads
• Allows transcriptome analyses of species without reference genomes

Page 17: Genomics and High Throughput Sequencing Technologies: Applications

Normalization methods: Reads per kilobase of feature length per million mapped reads (RPKM)

RNA-seq reads mapped to reference

• What is a “feature?”
• What about genomes with poor genome annotation?
• What about species with no sequenced genome?

For a detailed comparison of normalization methods, see Bullard et al. BMC Bioinformatics 11:94 (2010).
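
The RPKM definition above translates directly into arithmetic. The sketch below is a minimal illustration; the function name and the example numbers are made up for this note, and the "feature" is assumed to be whatever interval set the reads were counted over (an exon, a transcript, or a gene model).

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature length per million mapped reads.

    RPKM = read_count / (feature_length_bp / 1e3) / (total_mapped_reads / 1e6)
         = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and mapped read total must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2 kb feature in a library of
# 20 million mapped reads.
print(rpkm(500, 2000, 20_000_000))  # 12.5
```

Doubling either the feature length or the library size halves the value, which is the point of the normalization: long features and deeply sequenced libraries collect more reads at the same expression level.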

Page 18: Genomics and High Throughput Sequencing Technologies: Applications

Wang et al. Nat Rev Genet 10:57 (2009)

What depth of sequencing is required to characterize a transcriptome?

Page 19: Genomics and High Throughput Sequencing Technologies: Applications

Considerations

Gene length:
• Long genes are detected before short genes

Expression level:
• High expressors are detected before low expressors

Complexity of the transcriptome:
• Tissues with many cell types require more sequencing

Feature type:
• Composite gene models
• Common isoforms
• Rare isoforms

Detection vs. quantification:
• Obtaining confident expression level estimates (e.g., “stable” RPKMs) requires greater coverage

Page 20: Genomics and High Throughput Sequencing Technologies: Applications

Pervasive alternative splicing in humans

Wang et al. Nature 456:470 (2008)

Page 21: Genomics and High Throughput Sequencing Technologies: Applications

Map reads to genome

Map remaining reads to known splice junctions

Composite gene model approach

• Requires good gene models
• Isoforms are ignored
• Which annotation to use: RefSeq, GENCODE, UCSC?
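
One common way to build a composite gene model is to take the union of the exons of all annotated isoforms of a gene. The slide does not say how its composite models were constructed, so the union rule below is an assumption, as are the function names and the toy coordinates. Intervals are half-open `(start, end)` pairs on a single chromosome.

```python
def composite_gene_model(isoforms):
    """Merge the exons of all isoforms into non-overlapping intervals.

    isoforms: list of isoforms, each a list of (start, end) exon intervals.
    Overlapping or directly abutting exons are collapsed into one interval,
    so isoform-specific structure is deliberately discarded.
    """
    exons = sorted(exon for isoform in isoforms for exon in isoform)
    merged = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


def model_length(model):
    """Total exonic length of a composite model in base pairs."""
    return sum(end - start for start, end in model)


# Two hypothetical isoforms of one gene.
isoform_a = [(100, 200), (300, 400)]
isoform_b = [(150, 250), (300, 400), (500, 600)]

model = composite_gene_model([isoform_a, isoform_b])
print(model)                # [(100, 250), (300, 400), (500, 600)]
print(model_length(model))  # 350
```

The merged length can serve as the feature length when normalizing read counts, at the cost of ignoring which isoform the reads came from.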

Page 22: Genomics and High Throughput Sequencing Technologies: Applications

Strategies for transcript assembly

Garber et al. Nat Methods 8:469 (2011)

Page 23: Genomics and High Throughput Sequencing Technologies: Applications

ChIP-seq

• General transcription machinery

• Transcription factors

• Modifications to histone tails

• Methylated DNA

Page 24: Genomics and High Throughput Sequencing Technologies: Applications

Noonan and McCallion, Ann Rev Genomics Hum Genet 11:1 (2010)

Rationale: identifying regulatory elements in genomes

Page 25: Genomics and High Throughput Sequencing Technologies: Applications

ChIP-seq peak calling

ChIP-seq is an enrichment method
• Requires a statistical framework for determining the significance of enrichment

ChIP-seq ‘peaks’ are regions of enriched read density relative to an input control
• Input = sonicated chromatin collected prior to immunoprecipitation
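
To make the idea of testing enrichment against input concrete, here is a minimal sketch that scores one genomic window. It assumes a Poisson background whose expected count is the input count scaled by library size. That model is chosen for illustration only and is not attributed to any particular peak caller on these slides. The function names, the pseudocount and the example counts are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed by direct summation.

    Adequate for modest window-level counts; subtracting the CDF from 1
    loses precision when the resulting p-value is extremely small.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def window_enrichment(chip_count, input_count, chip_total, input_total,
                      pseudocount=1.0):
    """Return (fold enrichment, p-value) for one window.

    The expected ChIP count is the input count (plus a pseudocount, so
    windows with zero input reads do not give a zero expectation) scaled
    by the ratio of library sizes.
    """
    scale = chip_total / input_total
    expected = (input_count + pseudocount) * scale
    fold = chip_count / expected
    return fold, poisson_sf(chip_count, expected)


# Hypothetical windows from libraries of equal size (10 million reads each).
fold, p = window_enrichment(30, 4, 10_000_000, 10_000_000)
print(f"enriched window:   fold={fold:.1f}, p={p:.2e}")

fold_bg, p_bg = window_enrichment(5, 4, 10_000_000, 10_000_000)
print(f"background window: fold={fold_bg:.1f}, p={p_bg:.2f}")
```

A genome-wide scan runs this test on a very large number of windows, so the p-values need multiple-testing correction before peaks are called.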

Page 26: Genomics and High Throughput Sequencing Technologies: Applications

There are many ChIP-seq peak calling methods

Wilbanks and Facciotti PLoS ONE 5:e11471 (2010)

Page 27: Genomics and High Throughput Sequencing Technologies: Applications

Zhou et al. Nat Rev Genet 12:7 (2011)

The histone code

Page 28: Genomics and High Throughput Sequencing Technologies: Applications

Mapping and analysis of chromatin state dynamics in nine human cell types

Ernst et al., Nature 473:43 (2011)

Cell types:
• H1 ESC
• K562 (erythrocyte derived)
• GM12878 (B-lymphoblastoid)
• HepG2 (hepatocellular carcinoma)
• HUVEC (umbilical vein endothelium)
• HSMM (skeletal muscle myoblasts)
• NHLF (lung fibroblast)
• NHEK (epidermal keratinocytes)
• HMEC (mammary epithelium)

Marks:
• H3K4me3 (promoter/enhancer)
• H3K4me2 (promoter/enhancer)
• H3K4me1 (enhancer)
• H3K9ac (promoter/enhancer)
• H3K27ac (promoter/enhancer)
• H3K36me3 (transcribed regions)
• H4K20me1 (transcribed regions)
• H3K27me3 (Polycomb repression)
• CTCF

Page 29: Genomics and High Throughput Sequencing Technologies: Applications

Mapping and analysis of chromatin state dynamics in nine human cell types

Ernst et al., Nature 473:43 (2011)

Page 30: Genomics and High Throughput Sequencing Technologies: Applications

Chromatin state dynamics at WLS

Ernst et al., Nature 473:43 (2011)

Page 31: Genomics and High Throughput Sequencing Technologies: Applications

• Annotation based on nearest TSS

Functions associated with putative promoter and enhancer states
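
Assigning each putative regulatory element to its nearest transcription start site (TSS) can be sketched with a binary search over sorted TSS positions. This is a simplified illustration, assuming a single chromosome and ignoring strand. The function name, gene labels and coordinates are hypothetical.

```python
from bisect import bisect_left


def nearest_tss(position, tss_sorted):
    """Return (gene, distance) for the TSS closest to position.

    tss_sorted: list of (tss_position, gene_name) tuples on one chromosome,
    sorted by position. If two TSSs are equally close, the upstream one
    (lower coordinate) is returned.
    """
    if not tss_sorted:
        raise ValueError("no TSS annotations supplied")
    positions = [pos for pos, _ in tss_sorted]
    i = bisect_left(positions, position)
    # Only the TSS just before and just after the position can be nearest.
    candidates = tss_sorted[max(i - 1, 0): i + 1]
    tss_pos, gene = min(candidates, key=lambda t: abs(t[0] - position))
    return gene, abs(tss_pos - position)


# Hypothetical annotation and element midpoints.
tss = [(1000, "GENE_A"), (5000, "GENE_B"), (9000, "GENE_C")]

print(nearest_tss(5600, tss))   # ('GENE_B', 600)
print(nearest_tss(200, tss))    # ('GENE_A', 800)
print(nearest_tss(12000, tss))  # ('GENE_C', 3000)
```

Nearest-TSS assignment is a convenient heuristic, not evidence of regulation: an enhancer does not necessarily act on the closest gene.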

Page 32: Genomics and High Throughput Sequencing Technologies: Applications

ChIP-seq: enhancer identification in vivo

• p300 = enhancer-associated factor

Visel et al. Nature 457:854 (2009)

• p300 binding = ~90% predictive of enhancer activity

Page 33: Genomics and High Throughput Sequencing Technologies: Applications

Myers, PLoS Biol 9:e1001046 (2011)

Systematic experimental annotation of regulatory functions

Page 34: Genomics and High Throughput Sequencing Technologies: Applications

http://genome.ucsc.edu/ENCODE/

The ENCODE Project

Page 35: Genomics and High Throughput Sequencing Technologies: Applications

http://www.roadmapepigenomics.org/

The NIH Roadmap Epigenomics Project

Page 36: Genomics and High Throughput Sequencing Technologies: Applications

Myers, PLoS Biol 9:e1001046 (2011)

ENCODE cell lines

Page 37: Genomics and High Throughput Sequencing Technologies: Applications

http://genome.ucsc.edu/ENCODE/

ENCODE Project data access

Page 38: Genomics and High Throughput Sequencing Technologies: Applications

Genome Browser interface and data types

Genome Viewer

Categories of data: displayed as tracks
• Discrete intervals (genes) or continuous (transcription)

Hyperlinks and pulldown tabs for individual tracks
• Go to track description page
• Hide or show data in genome viewer

Some tracks include multiple datasets (‘subtracks’)
• Go to track description page to select

Page 39: Genomics and High Throughput Sequencing Technologies: Applications

ENCODE Transcription track

Display options

Subtracks

Page 40: Genomics and High Throughput Sequencing Technologies: Applications

Conclusions

Personal genomics is becoming a reality
• Genome sequencing will be a routine diagnostic tool
• $5,000 to sequence a single genome, about the current cost of clinical resequencing of single genes
• Your genome will be sequenced
• Long-read sequencing will solve de novo assembly issues
• Data analysis and interpretation

RNA-seq and ChIP-seq
• Identifying genes and annotating regulatory function within and among genomes
• Computational issues: data normalization, peak calling, differential expression and binding
• Large-scale studies revealing regulatory architecture of human & model genomes