RNA/DNA Sequencing and Genotyping Analysis
Bryce Christensen, PhD, Director of Services and Statistical Geneticist
Genetic Data Visualization and Analysis with Golden Helix

Exploring DNA/RNA-Seq Analysis Results with Golden Helix GenomeBrowse and SVS


DESCRIPTION

GenomeBrowse, a free visualization tool for all types of sequence data, was introduced in 2012 to broad acclaim. Researchers using GenomeBrowse discovered a product far beyond the status quo with seamless navigation of sequence alignments and other genomic data using a fluid, fast, and intuitive interface that just "made sense." Recent updates to GenomeBrowse, including support for VCF files and BED files and the ability to export tables of data extracted from viewable annotation tracks, further improved the product and created new synergy with Golden Helix SNP & Variation Suite (SVS). This webcast will demonstrate the ability of GenomeBrowse to stream sequence alignment data from the Amazon Cloud, seamlessly transitioning between whole genome views and base-pair resolution in the context of both public and custom annotation tracks. We will show how GenomeBrowse can be used in conjunction with SVS to highlight false variant calls, confirm the inheritance pattern of putative functional variants, and aid in the interpretation of a variant's impact. Examples of RNA-seq expression analysis, somatic variation in cancer, and family-based DNA-seq analysis will be included.




Questions during the presentation

Use the Questions pane in your GoToWebinar window


SNP & Variation Suite (SVS)

Core Features
- Powerful Data Management
- Rich Visualizations
- Robust Statistics
- Flexible
- Easy to use

Application Packages
- Genotype Analysis
- DNA sequence analysis
- CNV Analysis
- RNA-seq differential expression
- Family-Based Association


GenomeBrowse
- RNA-seq expression profiling analysis
- DNA-seq variant validation
- Genomic interpretation


Today’s Agenda

1. Getting started with GenomeBrowse and new features
2. RNA-Seq differential expression example using SVS and GenomeBrowse
3. Somatic mutation analysis example


Acknowledgments

NA12878 WGS data is from the Illumina Genome Network.

RNA-seq example data was provided by EA; the same data is available to all SVS and GenomeBrowse users.

Gastric cancer sample pair described in: Zang, et al., Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes. Nat. Genet. 44, 570-574 (2012).

A big thank you to the Golden Helix product development team!


Questions?

Use the Questions pane in your GoToWebinar window


END


Visualization Experience

Natural zooming and navigation controls that mimic familiar panning and scrolling actions.

Coverage and pile-up views with different modes to highlight mismatches and look for strand bias.

Deep, stable stacking algorithms to look at all reads in a pile-up, not just the first 10 or 20.

Easily generate exportable tables of any data in the viewable window.

Context-sensitive information by clicking on any feature.

A dynamic labeling system which gives optimal detail on annotation features without cluttering the view.
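Stacking every read, instead of capping the pile-up at the first 10 or 20 rows, can be illustrated with a greedy interval-packing pass. This is a minimal sketch under assumed inputs (reads as 0-based, half-open `(start, end)` tuples; the names `stack_reads` and `min_gap` are hypothetical), not GenomeBrowse's actual stacking algorithm. Sorting by start and breaking ties deterministically means the same set of reads always lands in the same rows.

```python
from typing import List, Tuple


def stack_reads(reads: List[Tuple[int, int]], min_gap: int = 1) -> List[int]:
    """Assign each read (start, end), half-open, to a pile-up display row.

    Reads are visited in order of start position; each goes into the
    lowest-numbered row whose last read ends at least `min_gap` bases
    before it starts. There is no row cap, so every read gets a row.
    Returns row indices in the original input order.
    """
    order = sorted(range(len(reads)), key=lambda i: (reads[i][0], reads[i][1], i))
    row_ends: List[int] = []  # end coordinate of the last read placed in each row
    rows = [0] * len(reads)
    for i in order:
        start, end = reads[i]
        for r, row_end in enumerate(row_ends):
            if row_end + min_gap <= start:
                row_ends[r] = end  # reuse the first row with enough room
                rows[i] = r
                break
        else:
            row_ends.append(end)  # no free row: open a new one
            rows[i] = len(row_ends) - 1
    return rows
```

For example, `stack_reads([(0, 10), (5, 15), (12, 20), (16, 25)])` packs four reads into two rows, while fifty identical reads produce fifty rows rather than being truncated.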


Data Streaming

Cloud-based repository of public annotations including dbSNP, 1000 Genomes, NHLBI 6500 Exomes, UCSC Known Genes, Ensembl, the OMIM catalog, and much more.

All public annotation tracks are hosted on the cloud with optimized on-demand streaming so that you don’t have to download them to start viewing data.

Annotations are updated frequently and automatically by Golden Helix so that you can have immediate access to the most up-to-date information.

Additional species are also available, including cattle, sheep, major food crops, and model organisms.


Flexible Data Storage Options

Easily manage repositories of BAM files, whether they are on local hard drives or network-attached storage, with an integrated download manager that can create local copies of cloud-based public or private data files.

GenomeBrowse is tightly integrated with the EA Pipeline, so EA customers have immediate streaming access to their RNA-Seq analysis outputs, saving terabytes of data downloads.

GenomeBrowse users are required to create an account, enabling secure connections to cloud-based data sources.

Illumina BaseSpace integration coming soon.
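Streaming just the visible part of a large remote BAM file depends on the file being indexed. As one illustration, assuming standard `.bai` indexing (the binning scheme from the SAM/BAM specification, not necessarily GenomeBrowse's internal mechanism), alignments are grouped into hierarchical bins, and a viewer only needs the bins overlapping the current window. The sketch below is a Python port of the specification's `reg2bin`/`reg2bins` routines; the `LEVELS` table is a naming choice made here.

```python
from typing import List

# (bin offset, bit shift) for each level of the SAM/BAM binning scheme.
# Bin 0 spans 512 Mbp; the levels below span 64 Mbp, 8 Mbp, 1 Mbp, 128 kbp and 16 kbp.
LEVELS = ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14))


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin that fully contains the 0-based, half-open region [beg, end)."""
    end -= 1
    for offset, shift in reversed(LEVELS):  # try the finest (16 kbp) level first
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0


def reg2bins(beg: int, end: int) -> List[int]:
    """Every bin that may hold alignments overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for offset, shift in LEVELS:
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins
```

In the standard design, a reader looks up the bins from `reg2bins` in the index, then fetches only the file chunks listed for them (for a cloud-hosted file, via byte-range requests) instead of downloading the whole BAM.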


SVS Example: Ogden Syndrome Data

Lethal X-linked recessive disease affecting males in multiple generations of a family in Ogden, Utah

Sequenced 5 members of the family, including 3 carriers, to identify the causal mutation.

Dr. Lyon graciously shared this data with GHI.


Study Design

Five family members sequenced using X-chromosome exome capture
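A design like this supports a simple inheritance filter: for an X-linked recessive disease, a candidate variant should be on chrX, carried by affected males, heterozygous in carrier females, and absent in unaffected males. The sketch below is a hypothetical illustration of that logic, not the SVS workflow used in the study. It assumes genotypes are already reduced to ALT-allele counts per sample (`None` for a missing call), and the function name and sample labels are invented for the example.

```python
from typing import Dict, Iterable, Optional


def fits_x_linked_recessive(
    chrom: str,
    alt_counts: Dict[str, Optional[int]],
    affected_males: Iterable[str],
    carrier_females: Iterable[str],
    unaffected_males: Iterable[str] = (),
) -> bool:
    """True if one variant's genotypes fit an X-linked recessive pattern.

    alt_counts (hypothetical encoding): number of ALT alleles called per sample.
    Hemizygous males normally show 1; a caller reporting them as 1/1 gives 2,
    which still counts as carrying the allele.
    """
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.upper() != "X":
        return False
    # Every affected male must carry the ALT allele (missing counts as not carrying).
    if any((alt_counts.get(s) or 0) < 1 for s in affected_males):
        return False
    # Carrier females are expected to be heterozygous.
    if any(alt_counts.get(s) != 1 for s in carrier_females):
        return False
    # Unaffected males must not carry it; missing calls are treated as failures.
    if any(alt_counts.get(s) != 0 for s in unaffected_males):
        return False
    return True
```

Treating missing calls as failures keeps the filter strict. A looser version might let them pass and flag the variant for review in the browser instead.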


The Human Reference Sequence

Genome Reference Consortium (GRCh37)
- Released Feb 2009; the previous build was NCBI36 (March 2006)
- 9 alt loci and 187 patches (11 patch releases)

Supercontigs: large unplaced contigs
- Some localized to a chromosome, some of unknown location

Does not include a mitochondrial reference
- UCSC hg19 includes the older NCBI36 MT sequence
- The 1000 Genomes Project uses the revised Cambridge Reference Sequence (rCRS)
- 1000 Genomes provides a “g1k” reference that includes the rCRS, human herpesvirus 4 type 1, supercontigs, and “decoy” sequence

v38 genome (GRCh38) coming this summer:
- Incorporates all patches into the reference
- Some allele fixes so that the reference matches the major allele
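The mitochondrial mismatch above is a common pitfall when mixing UCSC-style (hg19) and GRC/1000 Genomes-style files. Their nuclear chromosomes share the same sequence and differ only in naming (`chr1` vs `1`). Their mitochondrial sequences are different: hg19 chrM is 16,571 bp and the rCRS is 16,569 bp. The sketch below is a hypothetical helper, not part of any Golden Helix product. It renames primary chromosomes and refuses to rename chrM silently.

```python
HG19_CHRM_LENGTH = 16571  # UCSC hg19 chrM (older NC_001807 sequence)
RCRS_LENGTH = 16569       # revised Cambridge Reference Sequence (NC_012920)

PRIMARY = {str(i) for i in range(1, 23)} | {"X", "Y"}


def ucsc_to_grc(chrom: str) -> str:
    """Rename a UCSC hg19 primary chromosome to GRCh37 / 1000 Genomes style."""
    core = chrom[3:] if chrom.startswith("chr") else chrom
    if core == "M":
        # Renaming chrM to MT would silently mix two different MT sequences.
        raise ValueError(
            f"hg19 chrM ({HG19_CHRM_LENGTH} bp) is not the rCRS ({RCRS_LENGTH} bp); "
            "MT coordinates need conversion, not renaming"
        )
    if core in PRIMARY:
        return core
    # Unplaced contigs and alt haplotypes use different naming schemes entirely.
    raise ValueError(f"{chrom}: no simple rename; consult a contig alias table")
```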


Single Nucleotide Variants (i.e. SNVs or SNPs)

Single base substitution from reference

Note that “reference” is not always the “major” allele

“Multi-allelic” sites have more than 2 cataloged alleles

Gholson Lyon, 2012
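Variant classes can be read roughly from REF/ALT allele lengths in a VCF-style record. The sketch below is a simplified, hypothetical classifier. It assumes alleles are already in minimal form: with untrimmed alleles, a record such as REF=AT, ALT=GT would be mislabeled as an MNP when it is really an SNV. It also labels a site with more than one ALT allele as multi-allelic.

```python
from typing import List


def classify_variant(ref: str, alts: List[str]) -> str:
    """Rough classification by allele length; assumes minimal (trimmed) alleles."""
    if not alts:
        raise ValueError("at least one ALT allele is required")
    kinds = []
    for alt in alts:
        if len(ref) == 1 and len(alt) == 1:
            kinds.append("SNV")
        elif len(ref) == len(alt):
            kinds.append("MNP")  # same length, more than one base
        elif len(ref) < len(alt):
            kinds.append("insertion")
        else:
            kinds.append("deletion")
    if len(alts) > 1:
        return "multi-allelic (" + ", ".join(kinds) + ")"
    return kinds[0]
```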


Small Insertions/Deletions

Generally defined as being < 150 bp (often much shorter)

Frameshift insertions/deletions are an important “loss of function” class of variants
- However, InDels whose length is divisible by three are “in-frame” when in a coding region

Hard to call consistently; poor concordance between algorithms.

Where to call an InDel in a homopolymer?
- GTTTAC vs. GTTTTAC
- 01234567
- How do you describe the insertion? Insertion of T at 5? Or insertion of T at 1?
- CGI’s v1 pipeline called insertions at the end of the homopolymer while others called them at the beginning; CGI now always calls them at the beginning

MNPs (multi-nucleotide polymorphisms) can also be called differently
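The homopolymer ambiguity can be made concrete. In GTTTAC, inserting one T anywhere in the T run gives the same GTTTTAC, so callers must pick a convention. Left-normalization, as applied by VCF normalization tools such as `bcftools norm`, shifts the event to its leftmost equivalent position. This minimal sketch uses its own hypothetical convention (insert before 0-based index `pos`), which may not match the slide's numbering. It shifts an insertion fully left or fully right and includes a frameshift check.

```python
from typing import Tuple


def left_align_insertion(ref: str, pos: int, ins: str) -> Tuple[int, str]:
    """Shift an insertion (placed before 0-based index `pos`) as far left as
    possible without changing the resulting sequence."""
    while pos > 0 and ref[pos - 1] == ins[-1]:
        ins = ref[pos - 1] + ins[:-1]  # rotate the inserted bases
        pos -= 1
    return pos, ins


def right_align_insertion(ref: str, pos: int, ins: str) -> Tuple[int, str]:
    """Mirror image: shift the insertion as far right as possible."""
    while pos < len(ref) and ref[pos] == ins[0]:
        ins = ins[1:] + ref[pos]
        pos += 1
    return pos, ins


def apply_insertion(ref: str, pos: int, ins: str) -> str:
    return ref[:pos] + ins + ref[pos:]


def is_frameshift(indel_length: int) -> bool:
    """In a coding region, lengths divisible by three keep the reading frame."""
    return indel_length % 3 != 0
```

Both placements describe the same haplotype: `apply_insertion("GTTTAC", 1, "T")` and `apply_insertion("GTTTAC", 4, "T")` both yield `"GTTTTAC"`, which is why a shared convention matters when comparing call sets.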


Copy Number Variants

Best results with WGS
- CNVs > 10 kb are fairly accurate
- Under 10 kb is problematic

Detecting deletions
- Can see coverage drop to near zero
- Harder to pinpoint the breakpoint
- Possible false positives in low-mappability regions

Amplifications
- Can see coverage jump
- False positives due to sample prep or sequence artifacts

Need a “baseline”; look at the log ratio
- Somatic detection uses normal tissue
- Can use a control population

Venter vs Watson WGS CNV-seq

64 kbp gain in the DDAH1 gene of NA12878
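The baseline log ratio idea can be sketched in a few lines. The baseline may be a matched normal for somatic work or a control-population average. This is a minimal sketch under stated assumptions: read depth is already counted in fixed windows, each sample is normalized only by its total depth, and the gain/loss thresholds are hypothetical. Real CNV tools also correct for GC content and mappability and segment adjacent windows.

```python
import math
from typing import List


def log2_ratios(sample: List[int], baseline: List[int], pseudocount: float = 0.5) -> List[float]:
    """Per-window log2(sample / baseline) depth ratio, each scaled by its total depth."""
    if len(sample) != len(baseline):
        raise ValueError("sample and baseline must cover the same windows")
    s_total, b_total = sum(sample), sum(baseline)
    if s_total == 0 or b_total == 0:
        raise ValueError("total depth must be positive")
    # The pseudocount keeps windows with zero reads (e.g. homozygous deletions) finite.
    return [
        math.log2(((s + pseudocount) / s_total) / ((b + pseudocount) / b_total))
        for s, b in zip(sample, baseline)
    ]


def call_windows(ratios: List[float], gain: float = 0.4, loss: float = -0.6) -> List[str]:
    """Label windows with hypothetical thresholds. In a diploid genome, one extra
    copy is about log2(3/2) = 0.58 and one lost copy is log2(1/2) = -1."""
    return ["gain" if r >= gain else "loss" if r <= loss else "neutral" for r in ratios]
```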


Structural Variants

Looking for:
- Balanced rearrangements
- Inversions
- Translocations
- Complex rearrangements

Signals to detect SVs:
- Paired-end mappings / insert length
- Depth of coverage
- Split-read mapping

Translocations can result in “fusion” genes.
- For example, the BCR-ABL fusion gene is central to the pathogenesis of certain leukemias.
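The paired-end signal can be illustrated by classifying one read pair against the library's expected insert-size distribution. This hypothetical sketch assumes a standard (non-mate-pair) Illumina library with forward-reverse (FR) orientation. It approximates insert size as the distance between mate start positions and uses an arbitrary `k` standard-deviation cutoff. Real SV callers cluster many supporting pairs and combine them with depth and split-read evidence.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int        # leftmost mapped position of mate 1
    reverse1: bool   # True if mate 1 maps to the reverse strand
    chrom2: str
    pos2: int
    reverse2: bool


def classify_pair(p: ReadPair, mean_insert: float, sd_insert: float, k: float = 4.0) -> str:
    """Label a single read pair with the SV signal it suggests (rough view)."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal: translocation candidate"
    if p.reverse1 == p.reverse2:
        return "same strand: inversion candidate"
    # FR libraries expect the forward-strand mate on the left.
    left_is_reverse = p.reverse1 if p.pos1 <= p.pos2 else p.reverse2
    if left_is_reverse:
        return "outward-facing (RF): tandem duplication candidate"
    span = abs(p.pos2 - p.pos1)  # approximates insert size; ignores read length
    if span > mean_insert + k * sd_insert:
        return "long insert: deletion candidate"
    if span < mean_insert - k * sd_insert:
        return "short insert: insertion candidate"
    return "concordant"
```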


Example 1kb Inversion (intron of APP)

Let’s take a look at this one…


About Golden Helix

Golden Helix: Leaders in Genetic Analytics

Founded in 1998

Multi-disciplinary: computer science, bioinformatics, statistics, genetics

Software and analytic services

In Everything We Do…

Empowerment

Simplicity

Responsiveness

Excellence


Hundreds of Customers World-Wide


Over 750 Published Citations


Visualization

Genome browsers:
- Validate variant calls
- Look at gene annotations, problematic regions, and population catalogs
- Compare samples where no variant was called

Free genome browsers:
- IGV: popular desktop browser from the Broad Institute
- UCSC: web-based, extensive annotations
- GenomeBrowse: designed to be publication-ready; smooth zoom and navigation
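A typical check when validating a variant call in a browser is strand bias. A true variant is usually supported by reads from both strands, while some artifacts appear on only one. The idea is similar in spirit to GATK's FisherStrand (FS) annotation. This minimal sketch tests the 2×2 table of reference/alternate read counts by strand with a two-sided Fisher exact test written against the standard library. The function names and the `alpha` cutoff are hypothetical choices.

```python
from math import comb
from typing import Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:  # hypergeometric P(top-left cell == x) with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more likely than the observed one (small tolerance for rounding).
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def strand_bias(ref_fwd: int, ref_rev: int, alt_fwd: int, alt_rev: int,
                alpha: float = 0.001) -> Tuple[float, bool]:
    """Return (p-value, flagged) for reference vs. alternate reads by strand."""
    p = fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)
    return p, p < alpha
```

For example, a site whose alternate reads all come from the forward strand while reference reads are balanced, `strand_bias(20, 20, 15, 0)`, gives a small p-value worth inspecting in the pile-up view.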