50
Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health Center Chem 395 Bioanalytical Chemistry Storrs, Connecticut February 8, 2011

Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

  • View
    215

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Modern Genomic Analysis Methods

Janet HagerTranslational Genomics Core

Department of Genetics and Developmental BiologyUniversity of Connecticut Health Center

Chem 395 Bioanalytical ChemistryStorrs, Connecticut

February 8, 2011

Page 2: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Genomic Resources at the UCHC Translational Genomics Core

Ill

Sample QCAgilent Bioanalyzer 2100Nanodrop 1000

MicroarraysIllumina BeadStation and BeadChip: Gene Expression and Infinium and GoldenGate genotypingIllumina BeadXpress: Custom Veracode genotyping panels Affymetrix Chip and Fluidics Station: Gene Expression, CNV, genotyping

Deep SequencingIllumina Genome Analyzer IIxIlumina HiSeq2000

4Department of Genetics and Developmental BiologyCell and Genome Sciences Center - 400 Farmington Avenue

Page 3: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 4: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing Method

The basic goal for sample prep in all 2nd generation next gen sequencing are the same:

Generate large numbers of unique "polonies" (polymerase generated colonies) or “clusters” that can be simultaneously sequenced.

These parallel reactions occur on the surface of a "flow cell" (basically a water-tight microscope slide) which provides a large surface area for many thousands of parallel chemical reactions.

Page 5: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 6: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing Method

Step 1: Sample Preparation

DNA sample of interest is sheared to appropriate size (average ~800bp) using a compressed air device known as a nebulizer.

Ends of the DNA are polished, and two unique adapters are ligated to the fragments.

Ligated fragments of the size range of 150-200bp are isolated via gel extraction and amplified using limited cycles of PCR.

Complete detailed protocols for DNA, RNA and small RNA library preparation

This process is a fairly straightforward multi-step molecular biology process

Page 7: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 8: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing MethodSteps 2-6: Cluster Generation by Bridge Amplification

454 and ABI SOLiD methods use a bead-based emulsion PCR to generate "polonies”

Illumina uses a unique "bridged" amplification reaction that occurs on the surface of the flow cell.

Flow cell surface is coated with single stranded oligonucleotides that correspond to the sequences of the adapters ligated during the sample preparation stage.

Single-stranded, adapter-ligated fragments are bound to the surface of the flow cell exposed to reagents for polyermase-based extension.

Page 9: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing Method

Page 10: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing Method

Page 11: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing Method

Steps 2-6: Cluster Generation by Bridge Amplification

Repeated denaturation and extension results in localized amplification of single molecules in millions of unique locations across the flow cell surface.

This process occurs in what is referred to as Illumina's "cluster station", an automated flow cell processor.

Priming occurs as the free/distal end of a ligated fragment "bridges" to a complementary oligo on the surface.

Page 12: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing Method

Page 13: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing Method

Page 14: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 15: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing MethodSteps 7-12: Sequencing by Synthesis

Flow cell containing millions of unique clusters is now loaded into the sequencer for automated cycles of extension and imaging.

The first cycle of sequencing consists first of the incorporation of a single fluorescent nucleotide, followed by high resolution imaging of the entire flow cell.

Images represent the data collected for the first base. Any signal above background identifies the physical location of a cluster (or polony), and the fluorescent emission identifies which of the four bases was incorporated at that position.

This cycle is repeated, one base at a time, generating a series of images each representing a single base extension at a specific cluster.

Base calls are derived with an algorithm that identifies the emission color over time. Illumina reads range from 36-150 bases.

Page 16: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing Method

Structure of reversible terminator

Page 17: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Next Gen Sequencing Method

Page 18: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 19: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Genome Analyzer IIx

The Genome AnalyzerIIx has a single 8 sample flowcell = 8 samples per run

• up to 2 x 150 bp read lengths • up to 640 million paired-end reads per flow cell,• enabling a broad range of high-throughput sequencing applications:

• RNA-seq: expression, isoform, splice variants

• small RNA-seq: microRNA, ncRNA

• Whole genome-seq: SNP variants, CNV

• Exome-seq: expression, variants

• ChIP-seq: transcription factor regulatory sites

Page 20: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Genome Analyzer IIx

Page 21: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Genome Analyzer IIx Optical Path

Page 22: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Genome Analyzer IIx

Flow cell imaging

Page 23: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina Genome Analyzer IIx

Page 24: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina HiSeq 2000

Page 25: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina HiSeq 2000

Page 26: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina HiSeq 2000

Page 27: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina HiSeq 2000

Page 28: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Illumina HiSeq 2000

Two flow cell capacity with 8 sample lanes per flow cell = 16 samples per run

Page 29: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 30: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 31: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 32: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 33: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 34: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 35: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 36: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 37: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 38: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 39: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Other 2nd Generation Sequencing Platforms

• ABI SOLiD – sequencing by ligation; two base calling; short read.• Roche 454 – emulsion PCR; pyrosequencing; long read• Life Technologies Ion Torrent PGM – Personal Genome Machine. Ion Torrent uses the simplest sequencing chemistry including natural nucleotides, no enzymatic cascade, no fluorescence, no chemiluminescence, no optics, no light: The Chip is the Machine.TM

• Helicos – Single molecule sequencing

Page 40: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 41: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 42: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Ion Torrent

• Ion Torrent uses a high-density array of micro-machined wells to perform this biochemical process in a massively parallel way. • Each well holds a different DNA template generated by emulsion PCR. Beneath the wells is an ion-sensitive layer and beneath that a proprietary Ion sensor.• The Ion Personal Genome Machine (PGM™) sequencer then sequentially floods the chip with one nucleotide after another. • If the next nucleotide that floods the chip is not a match, no voltage change will be recorded and no base will be called.

Page 43: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Ion Torrent

When a nucleotide is incorporated into a strand of DNA by a polymerase, a hydrogen ion is released as a byproduct.

Semiconductor based technology

Page 44: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

3rd Generation Sequencing Platforms

• Pacific Bioscience PacBio RS – Single molecule, real-time analysis. SMRT DNA sequencing is performed on SMRT Cells, each patterned with 150,000 zero mode waveguides or ZMWs. Each ZMW contains a single DNA polymerase, providing the window to observe DNA sequencing in real-time. The PacBio RS system continuously monitors ZMWs in sets of 75,000 at a time.

• Oxford Nanopore  – Nanopores offer a label-free, electrical, single-molecule DNA sequencing method, obviating the need for amplification or labeling, by detecting a direct electrical signal. While the evolution of other technologies relies on improvements in existing chemical, optical or bioinformatics procedures, nanopores will bypass these to deliver a genuinely revolutionary sequencing method.

Page 45: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

• Exonuclease sequencingUsing a processive enzyme to cleave individual nucleotides from a DNA strand and pass them through a protein nanopore.  Oxford Nanopore has a commercialisation agreement with Illumina for this method.

• Strand sequencingIdentifying individual nucleotides on a DNA strand as it passes intact through a protein nanopore.

• Solid state sequencingUsing synthetic materials, rather than protein pores, to create nanopores.

Nanopore sequencing technology developmenthttp://vimeo.com/18630569

Page 46: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Exonuclease sequencing: Combining a protein nanopore and processive enzyme for the sequential identification of DNA bases as they pass through the pore.Alpha haemolysin nanopore showing cyclodextrin adapter molecule (the DNA binding site).In the exonuclease sequencing method, a protein nanopore is coupled with a processive enzyme, an exonuclease. The enzyme cleaves individual DNA bases from a DNA strand. These bases enter the nanopore and undergo a binding event before passing through the pore. During this binding event, they cause characteristic disruption in current that can be used to identify the DNA bases in sequence. This method is combined with Oxford Nanopore's proprietary electronics system for scaled-up nanopore sensing.

Structure of the Protein Nanopore

Page 47: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

• Covalent attachment of a cyclodextrin molecule to the inside surface of the nanopore. • Acts as a binding site for individual DNA bases and allows accurate measurement of their passage through the nanopore binding site. • Allows the identification of individual nucleoside 5’-monophosphate molecules (DNA bases) to a standard commensurate with a high resolution DNA sequencing technology.

• Electronic trace showing DNA bases in solution entering the nanopore. • The bases individually, transiently bind to the cyclodextrin adapter. • Each time a base passes through the pore there is a disruption in the current. • Diagram shows four different magnitudes of disruption which can be classified as C, G,

A or T.

Page 48: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health
Page 49: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Examples of Computational Solutions

• Cloud computing tools such as Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp. Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85. Crossbow is available from http://bowtie-bio.sourceforge.net/crossbow/ webcite.

• DNAnexus – a cost effective storage and computational toolset – web based, running on the cloud

• Galaxy – suite of genomic tools developed and hosted by Penn State. Web based and running on Penn servers with NHGRI support

Page 50: Modern Genomic Analysis Methods Janet Hager Translational Genomics Core Department of Genetics and Developmental Biology University of Connecticut Health

Development of Computational Biology Programs

We have made significant progress and investment in the tools for genomic studies

Progress is being made in building infrastructure for collection and integration of clinical data

Focus needed on developing a program in computational biology to attract faculty and research analysts

To assist clinicians and scientists with highthroughput genomic data analysis

The data anaysis challenge is huge!