golden-helix-inc
DESCRIPTION
GenomeBrowse, a free visualization tool for all types of sequence data, was introduced in 2012 to broad acclaim. Researchers using GenomeBrowse discovered a product far beyond the status quo with seamless navigation of sequence alignments and other genomic data using a fluid, fast, and intuitive interface that just "made sense." Recent updates to GenomeBrowse, including support for VCF files and BED files and the ability to export tables of data extracted from viewable annotation tracks, further improved the product and created new synergy with Golden Helix SNP & Variation Suite (SVS). This webcast will demonstrate the ability of GenomeBrowse to stream sequence alignment data from the Amazon Cloud, seamlessly transitioning between whole genome views and base-pair resolution in the context of both public and custom annotation tracks. We will show how GenomeBrowse can be used in conjunction with SVS to highlight false variant calls, confirm the inheritance pattern of putative functional variants, and aid in the interpretation of a variant's impact. Examples of RNA-seq expression analysis, somatic variation in cancer, and family-based DNA-seq analysis will be included.
RNA/DNA Sequencing and Genotyping Analysis
Bryce Christensen, PhD Director of Services and Statistical Geneticist
Genetic Data Visualization and Analysis with Golden Helix
Use the Questions pane in your GoToWebinar window
Questions during the presentation
Core Features Packages Core Features
Powerful Data Management
Rich Visualizations
Robust Statistics
Flexible
Easy to use
Applications
Genotype Analysis
DNA sequence analysis
CNV Analysis
RNA-seq differential expression
Family Based Association
SNP & Variation Suite (SVS)
RNA-seq expression profiling analysis
DNA-seq variant validation
Genomic interpretation
GenomeBrowse
Today’s Agenda
1 Getting started with GenomeBrowse and new features
2 RNA-Seq differential expression example using SVS and GenomeBrowse
3 Somatic mutation analysis example
Acknowledgments
NA12878 WGS data is from Illumina Genome Network
RNA-seq example data provided by EA; the same data is available to all SVS and GenomeBrowse users.
Gastric cancer sample pair described here:
- Zang, et al., Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes. Nat. Genet. 44, 570-574 (2012).
A big thank you to the Golden Helix product development team!
Questions?
Use the Questions pane in your GoToWebinar window
END
Visualization Experience
Natural zooming and navigation controls that mimic familiar panning and scrolling actions.
Coverage and pile-up views with different modes to highlight mismatches and look for strand bias.
Deep, stable stacking algorithms to look at all reads in a pile-up, not just the first 10 or 20.
Easily generate exportable tables of any data in the viewable window.
Context-sensitive information by clicking on any feature.
A dynamic labeling system which gives optimal detail on annotation features without cluttering the view.
Data Streaming
Cloud-based repository of public annotations including dbSNP, 1000 Genomes, NHLBI 6500 Exomes, UCSC Known Genes, Ensembl, the OMIM catalog, and much more.
All public annotation tracks are hosted on the cloud with optimized on-demand streaming so that you don’t have to download them to start viewing data.
Annotations are updated frequently and automatically by Golden Helix so that you can have immediate access to the most up-to-date information.
Additional species are also available including cattle, sheep, major food crops, model organisms, etc.
Flexible Data Storage Options
Easily manage repositories of BAM files, whether they are on local hard drives or network-attached storage. An integrated download manager can be used to create local copies of cloud-based public or private data files.
GenomeBrowse is tightly integrated with the EA Pipeline so that EA customers have immediate streaming access to their RNA-Seq analysis outputs, saving terabytes of data download.
GenomeBrowse users are required to create an account, enabling secure connections to cloud-based data sources.
- Illumina BaseSpace integration coming soon.
SVS Example: Ogden Syndrome Data
Lethal X-linked recessive disease affecting males in multiple generations of a family in Ogden, Utah
Sequenced 5 members of the family, including 3 carriers, to identify the causal mutation.
Dr. Lyon graciously shared this data with GHI.
Study Design
Five family members sequenced using X-chromosome exome capture
The Human Reference Sequence
Genome Reference Consortium (GRCh37)
- Feb 2009, previous was NCBI36 March 2006
- 9 alt loci and 187 patches (11 patch releases)
Supercontigs: Large unplaced contigs
- Some localized to chr level and some unknown
Does not include a Mitochondrial reference
- UCSC hg19 includes older NCBI 36 MT
- 1000 Genomes Project using revised Cambridge Reference Sequence (rCRS)
- Provide “g1k” reference: includes rCRS, Human herpesvirus 4 type 1, supercontigs and “decoy” sequence
v38 genome coming this summer:
- Incorporate all patches into the reference
- Some allele fixes to have reference match major allele
Single Nucleotide Variants (i.e. SNVs or SNPs)
Single base substitution from reference
Note that “reference” is not always the “major” allele
“Multi-allelic” sites have more than 2 cataloged alleles
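To make the distinction between the “reference” and the “major” allele concrete, here is a minimal Python sketch. The function name `summarize_site` and its input (a plain, non-empty list of observed alleles, one per sampled chromosome) are illustrative assumptions, not part of SVS or GenomeBrowse.

```python
from collections import Counter

def summarize_site(ref, observed_alleles):
    """Summarize one site from the reference base and the alleles seen in a cohort.

    Hypothetical helper for illustration; assumes observed_alleles is non-empty.
    """
    counts = Counter(observed_alleles)
    major_allele, _ = counts.most_common(1)[0]  # most frequently observed allele
    distinct = set(counts) | {ref}              # all cataloged alleles, reference included
    return {
        "major_allele": major_allele,
        "ref_is_major": major_allele == ref,
        "multi_allelic": len(distinct) > 2,     # more than 2 cataloged alleles
    }

# Reference is A, but G is the most common allele in this toy cohort.
print(summarize_site("A", ["G", "G", "G", "A"]))
# Three distinct alleles (A, G, T) make this a multi-allelic site.
print(summarize_site("A", ["A", "A", "G", "T"]))
```

In the first call the reference base is the minor allele, which is the situation the slide warns about.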
Gholson Lyon, 2012
Small Insertions/Deletions
Generally defined as being < 150bp (often much shorter)
Frameshift insertions/deletions important “loss of function” class of variants
- Although InDels divisible by three are “in-frame” when in coding region
Hard to call consistently. Poor concordance between algorithms.
Where to call an InDel in a homopolymer?
- GTTTAC
- GTTTTAC
- 01234567
- How do you describe the insertion? Ins of T at 5? Or ins of T at 1?
- CGI in their v1 pipeline preferred calling insertion at end, others at beginning, now always at beginning
MNP – Can also be called differently
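The two InDel points above (frame status and ambiguous placement in a homopolymer) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular variant caller: the helper names are hypothetical, and the position is taken to be 0-based with the insertion placed immediately before that reference base. Production tools work on VCF records, which use a different convention.

```python
def is_frameshift(indel_length):
    """An InDel in a coding region keeps the reading frame only if its length is divisible by three."""
    return indel_length % 3 != 0

def apply_insertion(ref, pos, ins):
    """Return the sequence produced by inserting `ins` immediately before ref[pos] (0-based)."""
    return ref[:pos] + ins + ref[pos:]

def left_align_insertion(ref, pos, ins):
    """Shift an insertion as far left as possible without changing the resulting sequence.

    While the base to the left of the insertion point equals the last inserted base,
    rotate the inserted sequence by one base and move the insertion point left.
    """
    while pos > 0 and ref[pos - 1] == ins[-1]:
        ins = ins[-1] + ins[:-1]
        pos -= 1
    return pos, ins

ref = "GTTTAC"
# Describe the extra T at the end of the homopolymer run (just before the A) ...
end_call = (4, "T")
# ... and normalize it to the beginning of the run.
start_call = left_align_insertion(ref, *end_call)

print(start_call)
print(apply_insertion(ref, *end_call) == apply_insertion(ref, *start_call))
print(is_frameshift(1), is_frameshift(3))
```

Both descriptions produce the same sequence, GTTTTAC, which is why callers that disagree on placement can report the same event at different coordinates.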
Copy Number Variants
Best results with WGS
- CNVs > 10kb pretty accurate.
- Under 10kb problematic.
Detecting Deletions
- Can see coverage drop to near zero
- Harder to pinpoint breakpoint
- Possible false positives in low-mappability regions
Amplifications
- Can see coverage jump
- False positives due to sample prep or sequence artifacts
Need “baseline,” look at Log Ratio
- Somatic detection uses normal tissues
- Can have control population
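As a rough illustration of the “Log Ratio” idea, the sketch below compares per-bin coverage in a sample against a baseline (for example a matched normal tissue or a control population average). The function names, the pseudocount, and the gain/loss thresholds are assumptions chosen for the example, and the depths are assumed to be already normalized for library size.

```python
import math

def log2_ratio(sample_depth, baseline_depth, pseudocount=0.5):
    """log2 of sample over baseline coverage; the pseudocount avoids log(0) and division by zero."""
    return math.log2((sample_depth + pseudocount) / (baseline_depth + pseudocount))

def classify_bin(sample_depth, baseline_depth, gain_threshold=0.4, loss_threshold=-0.6):
    """Label a bin as gain, loss, or neutral. The thresholds are illustrative, not calibrated."""
    ratio = log2_ratio(sample_depth, baseline_depth)
    if ratio >= gain_threshold:
        return "gain"
    if ratio <= loss_threshold:
        return "loss"
    return "neutral"

# Toy per-bin mean depths: sample vs. baseline.
sample_bins = [30, 31, 60, 62, 29, 1, 0, 30]
baseline_bins = [30, 30, 30, 30, 30, 30, 30, 30]

for s, b in zip(sample_bins, baseline_bins):
    print(s, b, round(log2_ratio(s, b), 2), classify_bin(s, b))
```

A coverage jump shows up as a positive log ratio and a drop to near zero as a strongly negative one; real pipelines add segmentation and corrections for GC content and mappability before calling.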
Venter vs Watson WGS CNV-seq
64kbp gain in DDAH1 Gene of NA12878
Structural Variants
Looking for:
- Balanced rearrangements
- Inversions
- Translocations
- Complex
Signals to detect SV:
- Paired-end mappings / insert length
- Depth of coverage
- Split-read mapping
Translocations can result in “fusion” genes.
- For example, the BCR-ABL fusion gene is central in the pathogenesis of certain leukemias.
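The paired-end signal listed above can be sketched as a simple rule set on one read pair. This is a deliberately simplified illustration: the function name, the tuple-style inputs, and the insert-size parameters (mean 300 bp, standard deviation 30 bp, cutoff of 3 standard deviations) are all assumptions, and real SV callers combine many pairs with depth and split-read evidence.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  mean_insert=300, sd_insert=30, n_sd=3):
    """Classify a read pair by mate chromosome, orientation, and apparent insert length.

    Hypothetical helper; strands are '+' or '-', positions are mapped start coordinates.
    """
    if chrom1 != chrom2:
        return "translocation_candidate"   # mates map to different chromosomes
    if strand1 == strand2:
        return "inversion_candidate"       # mates map to the same strand
    span = abs(pos2 - pos1)
    if span > mean_insert + n_sd * sd_insert:
        return "deletion_candidate"        # mates map farther apart than the library allows
    if span < mean_insert - n_sd * sd_insert:
        return "insertion_candidate"       # mates map closer together than expected
    return "concordant"

print(classify_pair("chr9", 1000, "+", "chr22", 5000, "-"))
print(classify_pair("chr21", 1000, "+", "chr21", 1300, "+"))
print(classify_pair("chr1", 1000, "+", "chr1", 6000, "-"))
print(classify_pair("chr1", 1000, "+", "chr1", 1310, "-"))
```

A single discordant pair is weak evidence on its own; clusters of pairs giving the same label around one locus are what make a candidate worth inspecting in a browser.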
Example 1kb Inversion (intron of APP)
Let’s take a look at this one…
Golden Helix: Leaders in Genetic Analytics
Founded in 1998
Multi-disciplinary: computer science, bioinformatics, statistics, genetics
Software and analytic services
In Everything We Do…
Empowerment
Simplicity
Responsiveness
Excellence
About Golden Helix
Hundreds of Customers World-Wide
- Over 750 Published Citations
Visualization
Genome browsers:
- Validate variant calls
- Look at gene annotations, problematic regions, population catalogs
- Compare samples where no variant called
Free Genome Browsers:
- IGV
  - Popular desktop by Broad
- UCSC
  - Web-based, extensive annotations
- GenomeBrowse
  - Designed to be publication ready
  - Smooth zoom and navigation