
GATK Best Practices Workflow on DRAGEN - edicogenome.com · GATK Best Practices Workflow on DRAGEN is a complete end-to solution that includes all required analysis phases as specified


GATK Best Practices Workflow on DRAGEN

Accelerate your GATK Best Practices Workflow on the DRAGEN Platform

Accelerated and Cost-Effective Complete End-to-End Solution

The GATK Best Practices Workflow for Germline SNPs and Indels in Whole Genomes and Exomes is available on the DRAGEN Platform for customers that have a valid GATK License from the Broad Institute. Harnessing the tremendous processing power of the DRAGEN Bio-IT Platform, the GATK Best Practices Workflow on DRAGEN reduces the time required to analyze a whole genome from FASTQ to VCF at 30x coverage to ~22 minutes. This time saving translates directly to significant cost savings both onsite and in the cloud. DRAGEN supports versions 3.1 and 3.6 of the GATK-HC Variant Caller.

GATK Best Practices for Germline SNPs and Indels in Whole Genomes and Exomes on the DRAGEN Platform

The GATK Best Practices Workflow on DRAGEN is a complete end-to-end solution that includes all required analysis phases as specified by the Broad Institute. The workflow is fully configurable and ready for use out-of-the-box. The workflow ingests BCL or FASTQ files and produces BAM, VCF and/or gVCF files as well as depth of coverage metrics. GATK Best Practices on DRAGEN can be utilized onsite, in the cloud or as a seamless hybrid cloud that includes a fully functional and easy to use Web Portal and Workflow Management System.


Pre-Processing

The first phase of the GATK Best Practices Workflow involves recommendations to properly prepare raw data files for analysis.

Map to Reference: The first step is mapping the sequenced reads to the reference genome to produce a file in SAM/BAM format sorted by coordinate. The workflow offers supreme flexibility of data analysis and can handle any raw data input format. It can stream BCL data directly from sequencer storage, a solution unique to the DRAGEN Platform, enabling the customer to go directly from raw sequencing data to an output VCF. DRAGEN can also convert BCL to FASTQ or BAM/CRAM and then proceed with the GATK Best Practices Workflow.

Mark Duplicates: Once data has been mapped to the reference genome, the workflow marks duplicate reads from DNA fragments that may have been sequenced multiple times because of PCR amplification or optical duplicate artifacts. Marking duplicates mitigates biases introduced by data generation steps. The process flags the duplicate reads but does not remove them.
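The flag-but-keep behavior described above can be illustrated with a minimal Python sketch. It assumes a simplified, hypothetical read record (a dict with `name`, `chrom`, `pos`, `flag` and `qual_sum`) and groups reads only by chromosome, start position and strand. Production duplicate markers use more information (for example mate positions and clipping), and nothing here reflects how DRAGEN implements the step in hardware.

```python
from collections import defaultdict

DUPLICATE_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate
REVERSE_FLAG = 0x10     # SAM flag bit: read mapped to the reverse strand


def mark_duplicates(reads):
    """Set the duplicate bit on all but the best read at each position.

    Each read is a simplified, hypothetical record (not a full SAM line).
    """
    groups = defaultdict(list)
    for read in reads:
        strand = bool(read["flag"] & REVERSE_FLAG)
        groups[(read["chrom"], read["pos"], strand)].append(read)

    for group in groups.values():
        # Keep the read with the highest summed base quality unflagged.
        best = max(group, key=lambda r: r["qual_sum"])
        for read in group:
            if read is not best:
                read["flag"] |= DUPLICATE_FLAG
    return reads  # nothing is removed; duplicates are only flagged


reads = [
    {"name": "r1", "chrom": "chr1", "pos": 100, "flag": 0, "qual_sum": 900},
    {"name": "r2", "chrom": "chr1", "pos": 100, "flag": 0, "qual_sum": 950},
    {"name": "r3", "chrom": "chr1", "pos": 250, "flag": 0, "qual_sum": 870},
]
marked = mark_duplicates(reads)
```

All three reads remain in the output. Only `r1`, which shares a start position with the higher-quality `r2`, has the duplicate bit set, so downstream tools can choose to ignore it.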

Variant Discovery

GATK Best Practices divides variant discovery into two separate steps: variant calling and variant filtering. The first step is designed to maximize sensitivity while the filtering step aims to deliver a level of specificity that can be customized for each project.

Call Variants: The workflow is capable of calling SNPs and Indels simultaneously via local de-novo assembly of haplotypes in an active region. The workflow uses hardware-accelerated implementations of the Smith-Waterman and PairHMM algorithms for variant calling.
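For readers unfamiliar with Smith-Waterman, the following is a plain software sketch of the local alignment score it computes. It uses a linear gap penalty and illustrative scoring values that are assumptions, not parameters taken from GATK or DRAGEN, and it says nothing about how the hardware-accelerated version is built.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty and keeps only two rows of the scoring
    matrix, since only the score (not the alignment) is returned.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below zero: that is what makes it local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
unrelated = smith_waterman_score("AAAA", "TTTT")
```

Every cell depends only on its three neighbors, so cells along an anti-diagonal can be computed at the same time. That regular dependency pattern is one reason the algorithm lends itself to parallel hardware.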

Joint Genotype: The workflow produces an output gVCF file for each individual sample providing a comprehensive record of each position in the genome along with genotype likelihoods. The gVCF files are fed into the joint genotyping tool to produce a multi-sample VCF for subsequent joint or family analysis.

Filter Variants: GATK Best Practices recommends filtering the raw variant callset using variant quality score recalibration (VQSR), which uses machine learning to identify annotation profiles of variants that are likely to be real, and assigns a VQSLOD score to each variant.
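The idea of a log-odds score such as VQSLOD can be shown with a toy Python sketch. It assumes a single annotation and one Gaussian per class, fitted to made-up values. GATK's VQSR is considerably more elaborate (it models several annotations jointly and trains on curated resource sets), so this is only a sketch of the scoring principle, not the GATK or DRAGEN implementation.

```python
import math
from statistics import mean, stdev


def gaussian_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and std sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def fit(values):
    """Fit a single Gaussian (mean, sample standard deviation)."""
    return mean(values), stdev(values)


def log_odds(x, good_model, bad_model):
    """Log odds that x came from the 'good' model rather than the 'bad' one."""
    return gaussian_logpdf(x, *good_model) - gaussian_logpdf(x, *bad_model)


# Made-up annotation values, for illustration only.
trusted = [18.0, 20.5, 22.0, 19.5, 21.0]   # variants assumed to be real
suspect = [3.0, 5.5, 4.0, 6.0, 2.5]        # variants assumed to be artifacts
good_model, bad_model = fit(trusted), fit(suspect)

score_like_trusted = log_odds(20.0, good_model, bad_model)
score_like_suspect = log_odds(4.0, good_model, bad_model)
```

A variant whose annotation resembles the trusted set receives a positive score and one that resembles the suspect set receives a negative score. A project can then choose the score cutoff that gives the specificity it needs.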


GATK Best Practices on DRAGEN

Analysis Phases

GATK Best Practices describes the key principles of the processing and analysis steps required to go from raw reads coming off a sequencing instrument through to an appropriately filtered variant callset that can be used in downstream analyses. GATK Best Practices breaks the workflow into two required analysis phases and one optional phase that describes ways to handle or fine-tune the output VCF file.

1. Pre-Processing

Pre-processing starts from raw sequence data and produces analysis-ready BAM files. This involves alignment to a reference genome and data cleanup to correct for technical biases and make the data suitable for analysis.

2. Variant Discovery

Variant discovery starts from analysis-ready BAM files and produces a callset in VCF format. This involves identifying genomic variation in one or more individuals and applying filtering methods appropriate to the experimental design.

3. Callset Refinement*

After a callset has been generated and filtered according to GATK Best Practices recommendations, there are several options to evaluate and refine the variant and genotype calls further.

The GATK Best Practices documentation: software.broadinstitute.org/gatk/best-practices/

*Optional


Comprehensive Platform for Genomic Data Analysis and Storage

The GATK Best Practices Workflow on DRAGEN plugs into the larger DRAGEN ecosystem, a comprehensive platform for analyzing, reanalyzing, storing and archiving genomic data at the lowest cost and highest fidelity. The DRAGEN Platform is based on the highly reconfigurable DRAGEN Bio-IT Processor, which uses a field-programmable gate array (FPGA) to provide hardware-accelerated implementations of genome pipeline algorithms, such as BCL conversion, compression, mapping, alignment, sorting, duplicate marking and haplotype variant calling. The highly flexible DRAGEN Platform allows users to effortlessly set up and manage highly complex workflows and can be utilized onsite, in the cloud or as a seamless hybrid cloud that includes a fully functional and easy-to-use Web Portal and Workflow Management System. The Hybrid Cloud configuration provides users with the flexibility to scale up to the cloud during times of high demand and return to onsite analysis when demand is reduced.

Drag and Drop Web Portal

The DRAGEN Web Portal enables customers to use a single interface to manage both the onsite and cloud platforms. Customers can manage the complete DRAGEN Platform through the Web Portal, including selecting or customizing genomic pipelines, syncing hybrid cloud analysis with the onsite DRAGEN server and FPGA platform, automating and scheduling concurrent single, multi-sample and/or population calling pipelines, and compressing data files. The DRAGEN Web Portal features an information-rich dashboard that enables customers to manage everything at a glance, including the complex automation functionality of the Workflow Management System and all DRAGEN ultra-rapid analysis algorithms.

Workflow Management System

The Workflow Management System (WMS) is comprehensive in terms of functional capabilities: users can effortlessly automate their workflows between any number of sequencing instruments of any type, can automate their pipelines across any number of samples, and can spread these samples across an infinite number of lanes and flow cells. The WMS can integrate with any Laboratory Information Management System (LIMS).

Many of the largest genome sequencing centers in the world use this technology to run thousands of genomes in a year, without the need for widespread resources to facilitate and manage the work. The WMS is scalable in terms of file size, number of files, and/or number of DRAGENs. It enables production-scale, clinical-grade genomic analysis.


About Edico Genome

At Edico Genome, we’re helping usher in the new era of personalized medicine by enabling a fundamental change in healthcare with customized treatments and data-driven insights tailored to the individual. At the heart of personalized medicine, DNA sequencing technology is advancing at an even more rapid pace than the cell phone revolution. By increasing the speed and accuracy of NGS data analysis, such as whole genome sequencing (WGS), our computing platform makes it easier to discover links between DNA sequence variations and human disease.


@edicogenome

GATK Best Practices Workflow                       | DRAGEN Onsite     | On CPU Only         | DRAGEN Speed Up
BCL / FASTQ -> VCF                                 | 0:26:28 / 0:22:11 | 32:18:20 / 30:20:20 | ~74X / ~82X
BCL / FASTQ -> gVCF                                | 0:29:27 / 0:28:31 | 38:51:34 / 36:53:29 | ~80X / ~78X
Joint Calling (3 gVCF files -> FASTQ -> Joint VCF) | 2:24:00           | 166:45:00           | ~69X
+ VQSR Recalibration                               | 0:6:58            | 1:50:00             | ~15X

Dataset: SRA056922 NA12878 @ 30x
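The speed-up column is the ratio of CPU-only runtime to DRAGEN runtime. A short Python sketch, using three of the onsite timings above, shows the arithmetic. The helper names are hypothetical, and the computed ratios should land close to the rounded figures in the table rather than match them exactly.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speed_up(cpu_time, dragen_time):
    """Ratio of CPU-only runtime to DRAGEN runtime."""
    return to_seconds(cpu_time) / to_seconds(dragen_time)


# (CPU-only time, DRAGEN onsite time), copied from the onsite table.
onsite_runs = {
    "BCL -> VCF": ("32:18:20", "0:26:28"),
    "FASTQ -> VCF": ("30:20:20", "0:22:11"),
    "Joint calling": ("166:45:00", "2:24:00"),
}

for name, (cpu, dragen) in onsite_runs.items():
    print(f"{name}: ~{speed_up(cpu, dragen):.0f}X")
```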

DRAGEN Tools and Features

In addition to the GATK Best Practices Workflow, the following tools are available on the DRAGEN Platform.

Depth of Coverage: This tool processes a set of BAM files to determine coverage at different levels of partitioning and aggregation.

dbSNP Annotation: This tool is designed to annotate variant calls based on their context, as opposed to functional annotation.

VCF Hard Filtering: This tool is designed for hard-filtering variant calls based on specified criteria.
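Hard filtering can be sketched in a few lines of Python. The record layout, the filter names and the thresholds below are assumptions chosen for illustration. `QD` and `FS` are standard GATK annotations, but the sketch does not reflect the DRAGEN tool's actual options or defaults.

```python
def hard_filter(record, criteria):
    """Return the names of all criteria the record fails, or ["PASS"]."""
    failed = [name for name, fails in criteria.items() if fails(record["info"])]
    return failed or ["PASS"]


# Hypothetical thresholds, chosen only for illustration.
criteria = {
    "LowQD": lambda info: info.get("QD", 0.0) < 2.0,
    "HighFS": lambda info: info.get("FS", 0.0) > 60.0,
}

# Simplified stand-ins for VCF records.
records = [
    {"chrom": "chr1", "pos": 1000, "info": {"QD": 12.4, "FS": 1.2}},
    {"chrom": "chr1", "pos": 2000, "info": {"QD": 1.1, "FS": 75.0}},
]
filters = [hard_filter(record, criteria) for record in records]
```

Unlike VQSR, each criterion is a fixed cutoff applied independently. This makes hard filtering easy to reason about and usable on small callsets where a trained model is not practical.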

GATK Best Practices Workflow | DRAGEN Cloud     | On CPU Only         | DRAGEN Speed Up
BCL / FASTQ -> VCF           | 0:8:45 / 0:6:12  | 32:18:20 / 30:20:20 | ~242X / ~151X
BCL / FASTQ -> gVCF          | 0:10:20 / 8:16   | 38:51:34 / 36:53:29 | ~233X / ~148X
Joint Calling (3 gVCF Files) | 1:20:23          | 166:45:00           | ~125X

DRAGEN Cloud

Best Practice Platform Speeds