Whole Exome Sequencing for Variant Discovery and Prioritisation

[Figure: exome publication counts per year, 2005 to 2013 (y-axis: Papers)]

2013: ~ 800 papers

2014: ~ 1200 papers

Forero DA, 2012

Exomes: Publication Trends

Total: 925 (Oct 2012)

NGS Variation Discovery Workflow (resequencing-based)

Variant Discovery Application: Disease

• Written out, the genome would amount to almost 2,000 books containing 1.5 million letters each (an average book of 200 pages)!

• This information is contained in nearly every cell of the body.

Monogenic Diseases

• A single causal mutation

• How do we find it in all those ‘books’?

• A bioinformatics challenge

• NGS sequencers can only read short fragments

• So the ‘library’ arrives as fragments of pages of the books!

Mendelian Disease Gene Discovery

Gilissen, Genome Biol 2011

Opportunities and Challenges

• Exomes are more cost-effective: sequence patient DNA and filter out common SNPs; compare parent-child trios; compare paired normal/cancer samples

• Challenges:
– Many Mendelian disorders still cannot be interpreted
– Rare variants need large sample sizes
– The exome may miss relevant regions (e.g. novel non-coding genes)

Shendure, Genome Biol 2011
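The trio comparison in the first bullet can be sketched as a simple filter: keep the child's variants that neither parent carries and that are not common SNPs. This is a minimal illustration, assuming genotypes have already been called and reduced to alternate-allele counts (0, 1 or 2) per variant; the function `candidate_de_novo`, the dictionary layout and the toy calls are hypothetical, and real data would also need read-depth and genotype-quality checks in each parent.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
VariantKey = Tuple[str, int, str, str]


def candidate_de_novo(child: Dict[VariantKey, int],
                      mother: Dict[VariantKey, int],
                      father: Dict[VariantKey, int],
                      common_snps: Set[VariantKey]) -> List[VariantKey]:
    """Return the child's variants that are absent from both parents and
    not in a set of common SNPs. Genotypes are alt-allele counts (0, 1, 2)."""
    hits = []
    for key, alt_count in child.items():
        if alt_count == 0:
            continue  # the child does not carry the alternate allele
        if key in common_snps:
            continue  # common polymorphism: unlikely to explain a rare disorder
        # A missing parental call is treated as homozygous reference here;
        # real data need depth checks to separate "absent" from "not covered".
        if mother.get(key, 0) == 0 and father.get(key, 0) == 0:
            hits.append(key)
    return sorted(hits)


# Toy trio (hypothetical calls)
child = {("chr1", 100, "A", "G"): 1,   # absent in both parents
         ("chr2", 200, "C", "T"): 1,   # inherited from the mother
         ("chr3", 300, "G", "A"): 2}   # common SNP
mother = {("chr2", 200, "C", "T"): 1}
father: Dict[VariantKey, int] = {}
common = {("chr3", 300, "G", "A")}

print(candidate_de_novo(child, mother, father, common))
# [('chr1', 100, 'A', 'G')]
```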

Why exome sequencing?

• WGS is still too costly

• WES targets only ~1% of the human genome

• Mendelian disorders are mostly caused by variants that disrupt protein-coding sequences

• A large fraction of rare non-synonymous variants in the human genome are predicted to be deleterious

• Splice sites are also enriched for highly functional variation

• The goal is to find variants with large effect sizes

A representation of the relationship between the size of the mutational target and the frequency of disease for disorders caused by de novo mutations

Gilissen, Genome Biol 2011

Bamshad, Nat Rev Genet 2011

Maximizing the chances of finding disease-causing rare variants using exome sequencing

Example: Comparative Sequencing

• Detection of somatic mutations in paired normal/cancer samples

• Higher mutation yield and better identification of causal genes than for Mendelian disorders

Meyerson et al, Nat Rev Genet 2010
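Somatic mutation detection in normal/cancer pairs comes down to finding alleles that are well supported in the tumor but absent from the matched normal. The sketch below assumes per-site reference/alternate read counts are already available; `AlleleCounts`, `somatic_candidates` and every threshold are hypothetical illustrations, not values taken from any published caller. Dedicated somatic callers model sequencing error, tumor purity and subclonality far more carefully.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Site = Tuple[str, int]  # (chromosome, position)


@dataclass
class AlleleCounts:
    ref: int  # reads supporting the reference allele
    alt: int  # reads supporting the alternate allele

    @property
    def depth(self) -> int:
        return self.ref + self.alt

    @property
    def vaf(self) -> float:
        # variant allele fraction; 0.0 when the site has no coverage
        return self.alt / self.depth if self.depth else 0.0


def somatic_candidates(tumor: Dict[Site, AlleleCounts],
                       normal: Dict[Site, AlleleCounts],
                       min_depth: int = 10,
                       min_tumor_vaf: float = 0.05,
                       max_normal_vaf: float = 0.01) -> List[Site]:
    """Sites with alt support in the tumor but (essentially) none in the
    matched normal. Thresholds are illustrative, not recommended values."""
    hits = []
    for site, t in tumor.items():
        n = normal.get(site)
        # Both samples need enough reads before "absent in normal" means anything.
        if n is None or t.depth < min_depth or n.depth < min_depth:
            continue
        if t.vaf >= min_tumor_vaf and n.vaf <= max_normal_vaf:
            hits.append(site)
    return sorted(hits)


tumor = {("chr1", 1000): AlleleCounts(ref=30, alt=20),  # somatic candidate
         ("chr2", 2000): AlleleCounts(ref=25, alt=25),  # also in normal: germline
         ("chr3", 3000): AlleleCounts(ref=3, alt=3)}    # too few tumor reads
normal = {("chr1", 1000): AlleleCounts(ref=40, alt=0),
          ("chr2", 2000): AlleleCounts(ref=20, alt=20),
          ("chr3", 3000): AlleleCounts(ref=50, alt=0)}

print(somatic_candidates(tumor, normal))  # [('chr1', 1000)]
```

Requiring coverage in the normal is the key design choice: a variant that is merely unobserved in a poorly covered normal sample could still be germline.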

Pierce, Am J Hum Genet 2010

Perrault syndrome (HSD17B4)

But exome analysis of a single patient can be informative

Exome sequencing procedure

Read Mapping

• Mapping hundreds of millions of reads to the reference genome is CPU- and RAM-intensive, and ‘slow’

• Read quality decreases along the length of the read (small single-nucleotide mismatches or indels: real or artifact?)

• Very few mappers handle indels appropriately

• Mapping output: SAM (BAM) or BED
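To make "index the reference, then map" concrete, here is a toy seed-and-verify mapper: it indexes every k-mer of the reference, looks up the read's first k-mer, and counts mismatches at each candidate position. It is a teaching sketch only; the function names are hypothetical, it assumes ungapped alignment, and it misses reads whose first k-mer contains an error. Production mappers such as BWA and Bowtie use compressed Burrows-Wheeler (FM) indexes rather than a hash table. The last example shows the indel problem from the bullets above: a single deleted base turns a perfect match into a run of mismatches.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer of the reference to the 0-based positions where it occurs."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def map_read(read: str, reference: str, index: Dict[str, List[int]],
             k: int, max_mismatches: int = 2) -> List[Tuple[int, int]]:
    """Seed with the read's first k-mer, then verify the full read by counting
    mismatches. Ungapped: a read containing an indel will usually fail."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    # best hits first: fewest mismatches, then leftmost position
    return sorted(hits, key=lambda h: (h[1], h[0]))


reference = "TTGACCGTAGGCATCCGTAGGCTTAA"
idx = build_kmer_index(reference, k=4)
print(map_read("CATCCGTT", reference, idx, k=4))  # [(11, 1)]  one mismatch
print(map_read("CCGTAGGC", reference, idx, k=4))  # [(4, 0), (14, 0)]  repeat: two equal hits
print(map_read("CATCGTAG", reference, idx, k=4))  # []  one deleted base breaks the ungapped match
```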

Mapped Data: SAM specification

• Simple, generic sequence alignment format

• Describes the alignment of reads to a reference

• Flexible: stores all the alignment information

• Keeps track of chromosome position, alignment quality and alignment features (extended CIGAR)

• Includes mate-pair/paired-end information

• The original FASTQ data can be reproduced from SAM (and BAM)

SAM FIELDS
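The SAM specification defines 11 mandatory tab-separated fields (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL), optionally followed by TAG:TYPE:VALUE fields. The sketch below splits one alignment line into those fields and decodes the extended CIGAR string; the helper names and the example read are hypothetical, and for real work a maintained library such as pysam or htslib is preferable to hand parsing.

```python
import re
from typing import Any, Dict, List, Tuple

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Split one alignment line (not an @ header line) into named fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    rec: Dict[str, Any] = dict(zip(SAM_FIELDS, cols[:11]))
    for name in INT_FIELDS:
        rec[name] = int(rec[name])
    rec["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields, left unparsed
    return rec


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """'50M2I48M' -> [(50, 'M'), (2, 'I'), (48, 'M')]; '*' -> []."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_span(cigar: str) -> int:
    """Reference bases covered: M, D, N, = and X consume the reference."""
    return sum(n for n, op in parse_cigar(cigar) if op in "MDN=X")


line = "\t".join(["read1", "99", "chr1", "100", "60", "50M2I48M",
                  "=", "300", "250", "A" * 100, "I" * 100, "NM:i:2"])
rec = parse_sam_line(line)
flag = rec["FLAG"]
print(rec["RNAME"], rec["POS"], reference_span(rec["CIGAR"]))  # chr1 100 98
print(bool(flag & 0x1), bool(flag & 0x10))  # True False (paired, forward strand)
```

The 2-base insertion (`2I`) consumes read bases but not reference bases, which is why a 100-base read spans only 98 reference positions.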

BAM format

• Binary version of SAM: more compact

• Makes downstream analysis independent of the mapping program

• Allows most operations on an alignment to work on a stream without loading the whole alignment into memory

• Allows the file to be indexed by genomic position to efficiently retrieve all reads aligning to a locus
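Why does sorting and indexing by position make locus queries cheap? The toy class below keeps coordinate-sorted alignments per chromosome and uses binary search to jump straight to the reads that can overlap a region. It is a hypothetical in-memory illustration only: the real BAM index (BAI) uses a binning scheme plus a linear index over compressed file offsets, and in practice one would query an indexed BAM through samtools or pysam.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Alignment = Tuple[str, int, int, str]  # (chrom, start, end, read name), 1-based inclusive


class ToyPositionIndex:
    """In-memory illustration of position-based lookup on sorted alignments.
    NOT the BAI format; it only shows why coordinate sorting helps."""

    def __init__(self, alignments: Iterable[Alignment]) -> None:
        self._recs: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
        for chrom, start, end, name in alignments:
            self._recs[chrom].append((start, end, name))
        self._starts: Dict[str, List[int]] = {}
        self._max_len: Dict[str, int] = {}
        for chrom, recs in self._recs.items():
            recs.sort()  # coordinate sort, as for an indexed BAM
            self._starts[chrom] = [s for s, _, _ in recs]
            self._max_len[chrom] = max(e - s + 1 for s, e, _ in recs)

    def fetch(self, chrom: str, qstart: int, qend: int) -> List[str]:
        """Names of reads overlapping chrom:qstart-qend (1-based, inclusive)."""
        recs = self._recs.get(chrom)
        if not recs:
            return []
        starts = self._starts[chrom]
        # No read is longer than max_len, so any overlapping read must start
        # at or after qstart - max_len + 1 ...
        lo = bisect.bisect_left(starts, qstart - self._max_len[chrom] + 1)
        # ... and reads starting after qend cannot overlap.
        hi = bisect.bisect_right(starts, qend)
        return [name for s, e, name in recs[lo:hi] if e >= qstart]


index = ToyPositionIndex([("chr1", 100, 199, "r1"), ("chr1", 150, 249, "r2"),
                          ("chr1", 400, 499, "r3"), ("chr2", 100, 199, "r4")])
print(index.fetch("chr1", 200, 300))  # ['r2']
print(index.fetch("chr1", 190, 190))  # ['r1', 'r2']
```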

VCF format

• Emerging standard for storing variant data

• Originally designed for SNPs and short indels, it also works for structural variants

• Consists of a header section and a data section

• The data section is tab-delimited, with each line consisting of at least 8 mandatory fields

VCF FIELDS
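A VCF data line starts with the 8 fixed fields CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO; optional FORMAT and per-sample genotype columns may follow, and header lines begin with `#`. The minimal reader below is a sketch with hypothetical helper names: it leaves INFO values as strings and does not consult the header's type declarations, which a full parser (e.g. pysam or htslib) would.

```python
from typing import Any, Dict, Iterable, Iterator

# The 8 mandatory (fixed) VCF columns, in order.
VCF_FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_info(info: str) -> Dict[str, Any]:
    """'DP=30;DB' -> {'DP': '30', 'DB': True}; values are left as strings."""
    out: Dict[str, Any] = {}
    if info == ".":
        return out
    for entry in info.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            out[key] = value
        else:
            out[entry] = True  # Flag-type INFO key with no value
    return out


def parse_vcf(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # '##' meta-information lines and the '#CHROM' column line
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            raise ValueError("a VCF data line has at least 8 fields")
        rec: Dict[str, Any] = dict(zip(VCF_FIXED, cols[:8]))
        rec["POS"] = int(rec["POS"])  # 1-based position
        rec["ALT"] = rec["ALT"].split(",")  # multiple alt alleles are comma-separated
        rec["QUAL"] = None if rec["QUAL"] == "." else float(rec["QUAL"])
        rec["INFO"] = parse_info(rec["INFO"])
        rec["SAMPLES"] = cols[8:]  # FORMAT plus per-sample columns, if any
        yield rec


vcf_text = ("##fileformat=VCFv4.2\n"
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
            "chr1\t12345\t.\tA\tG,T\t50.5\tPASS\tDP=30;DB\n")
for rec in parse_vcf(vcf_text.splitlines()):
    print(rec["CHROM"], rec["POS"], rec["ALT"], rec["INFO"])
# chr1 12345 ['G', 'T'] {'DP': '30', 'DB': True}
```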

Variant Prioritization

• Heuristic filtering to identify novel genes for Mendelian disorders

Stitziel et al, Genome Biol 2011
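One common form of heuristic filtering removes variants that are frequent in the population, keeps those with a potentially damaging consequence, and then looks for genes hit in several affected individuals. The sketch below illustrates that idea under stated assumptions: annotation is already done, the consequence labels, the 1% frequency cut-off and the gene names are hypothetical, and inheritance models (e.g. requiring two hits per gene for recessive disease) are ignored.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical consequence labels treated as potentially damaging.
DAMAGING = {"missense", "stop_gained", "frameshift", "splice_site"}


@dataclass(frozen=True)
class AnnotatedVariant:
    gene: str
    consequence: str   # e.g. "missense", "synonymous" (illustrative labels)
    pop_freq: float    # alt-allele frequency in a reference population


def passes_filters(v: AnnotatedVariant, max_freq: float = 0.01) -> bool:
    """Keep rare variants with a potentially damaging consequence."""
    return v.consequence in DAMAGING and v.pop_freq <= max_freq


def candidate_genes(cases: Dict[str, List[AnnotatedVariant]],
                    min_cases: int) -> List[str]:
    """Genes carrying at least one filtered variant in >= min_cases affected individuals."""
    carriers: Dict[str, Set[str]] = defaultdict(set)
    for case_id, variants in cases.items():
        for v in variants:
            if passes_filters(v):
                carriers[v.gene].add(case_id)
    return sorted(g for g, ids in carriers.items() if len(ids) >= min_cases)


AV = AnnotatedVariant
cases = {
    "p1": [AV("GENE_A", "missense", 0.0), AV("GENE_B", "synonymous", 0.0),
           AV("GENE_C", "missense", 0.2)],          # GENE_C variant is common
    "p2": [AV("GENE_A", "stop_gained", 0.001), AV("GENE_C", "missense", 0.0)],
    "p3": [AV("GENE_A", "frameshift", 0.0)],
}
print(candidate_genes(cases, min_cases=3))  # ['GENE_A']
print(candidate_genes(cases, min_cases=1))  # ['GENE_A', 'GENE_C']
```

Lowering `min_cases` trades specificity for sensitivity, which helps when the disorder is genetically heterogeneous and not every patient carries a variant in the same gene.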

More than just SNVs and ‘short’ indels

Example WES-based variant discovery workflow

1. Map the reads to a reference genome
– Index the reference genome
– Map (BWA, Bowtie, Novoalign, etc.)
2. Sort the BAM file
3. Remove PCR duplicates
4. Realign around indels (‘optional’)
5. Call variants
6. Recalibrate quality scores (‘optional’)
7. Filter variants
8. Basic variant annotation
9. Biological interpretation only starts here
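Step 3 rests on a simple observation: PCR copies of the same DNA fragment align to the same 5' position on the same strand. The sketch below collapses such single-end reads and keeps the copy with the highest summed base quality. It is a simplified, hypothetical illustration (the `Read` layout and `remove_duplicates` are not from any tool); paired-end data should use both mates' positions, and in practice tools such as Picard MarkDuplicates or samtools markdup do this job.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Read:
    name: str
    chrom: str
    five_prime: int   # 5' end on the reference (after accounting for clipping)
    reverse: bool     # True if aligned to the reverse strand
    quals: List[int]  # per-base Phred quality scores


def remove_duplicates(reads: List[Read]) -> List[Read]:
    """Collapse single-end reads sharing chromosome, 5' position and strand,
    keeping the copy with the highest summed base quality."""
    best: Dict[Tuple[str, int, bool], Read] = {}
    for r in reads:
        key = (r.chrom, r.five_prime, r.reverse)
        kept = best.get(key)
        if kept is None or sum(r.quals) > sum(kept.quals):
            best[key] = r
    return sorted(best.values(), key=lambda r: (r.chrom, r.five_prime, r.reverse))


reads = [Read("r1", "chr1", 100, False, [30] * 5),
         Read("r2", "chr1", 100, False, [40] * 5),  # same start and strand as r1, higher quality
         Read("r3", "chr1", 100, True, [20] * 5),   # same start, other strand: kept
         Read("r4", "chr1", 250, False, [30] * 5)]
print([r.name for r in remove_duplicates(reads)])  # ['r2', 'r3', 'r4']
```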
