© 2010 LabKey Software, www.labkey.com. Managing Next Generation Sequencing and Multiplexed Genotyping Data Using Open Source LabKey Server. Adam Rauch.


Page 1: LabKey Software Company Overview

© 2010 LabKey Software www.labkey.com

Managing Next Generation Sequencing and Multiplexed Genotyping Data Using Open Source LabKey Server

Adam Rauch

Page 2: LabKey Software Company Overview

LabKey Software 2010

LabKey Software Company Overview

LabKey Software is a consulting company
• Spun off from the McIntosh Lab (part owned by FHCRC)
• Professional software engineers from Amazon, Microsoft, BEA, etc.
• Work in partnership with scientists
• For-profit fee-for-service contracts
• Non-profit grant sub-awards
  – Co-investigators with a shared research agenda
• All development approved by and relevant to FHCRC

Development & support around LabKey Server
• Extending the base LabKey Server platform
• Creating customized lab-specific solutions
• Hosting LabKey Server
• Support


Page 3: LabKey Software Company Overview


What Is LabKey Server?

An open-source, web-based platform for organizing, analyzing & sharing scientific data
• Data integration and analysis for assays
  – Proteomics, flow cytometry, plate-based assays, etc.
• Study data management
  – Combines demographic, clinical, assay & specimen data

LabKey Server powers many deployments…
• CPAS: FHCRC proteomics repository
• Atlas Science Portal: SCHARP’s HIV vaccine studies
• AdaptiveTCR: Customer analytics for ImmunoSEQ NGS
• UW (Katze, Heinecke, et al.), USC, Markey, Harvard, IDRI, TGen, Wisconsin Primate EHR, UC Denver, etc.


Page 4: LabKey Software Company Overview


Dave O’Connor Lab, University of Wisconsin

• Academic research lab
• Focus: understanding SIV using nonhuman primate models & applying NHP methods to human HIV disease research


Page 5: LabKey Software Company Overview

O’Connor Lab SIV/HIV Research

Host Immune Genetics (source: modified from Yewdell et al., Nature Reviews Immunology 2003)

Virus Genetics (source: Korber et al., British Medical Bulletin 2001)

Page 6: LabKey Software Company Overview

Importance of MHC Class I

Host Immune Genetics
• MHC class I molecules dictate immunity to disease
• High degree of polymorphism within the MHC class I peptide-binding domain
• Specific MHC alleles associated with superior control of HIV infection

Source: modified from Yewdell et al., Nature Reviews Immunology 2003

Page 7: LabKey Software Company Overview

Importance of Viral Variability

Virus Genetics
• HIV has a fast replication cycle and a high mutation rate
• Evolution of the virus causes escape from immune responses
• Specific mutations are associated with resistance to antiretroviral drug therapy

Source: Korber et al., British Medical Bulletin 2001

Page 8: LabKey Software Company Overview


Sequencing in the O’Connor Lab


• 2005 – 2009: Sanger sequencing
  – “Prohibitively expensive” for most experiments
• 2009: Roche/454 GS FLX at UIUC
• 2010: Roche/454 GS Junior in lab

• Roche/454 GS Junior
  – Long-read instrument, critical for genotyping
  – Identical to GS FLX, but 1/8 throughput & lower cost
  – ~100,000 reads per run (~1¢ per read), average ~560 bp read length
  – 115 runs this year
• MID tagging
  – Allows pooling multiple samples (30-100) into a single run
• Galaxy server
  – Open-source sequence analysis tool (Giardine et al., Genome Res 2005)
  – Lab has built custom workflow to match sequences to known MHC alleles
  – Uses BLAT, transitioning to AGILE (Northwestern alignment tool)
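To illustrate the MID tagging idea: each read in a pooled run starts with a short sample-specific tag, and software sorts the reads back to their samples by that tag. The following is a minimal Python sketch, assuming 10-base tags at the start of each read and exact tag matching. The tag sequences, sample names, and function names are invented for illustration and are not taken from the lab's pipeline.

```python
from collections import defaultdict

# Illustrative tags only; the slides do not list the MID sequences in use.
MID_TAGS = {
    "AAAAACCCCC": "sample_A",
    "GGGGGTTTTT": "sample_B",
}
MID_LENGTH = 10  # assumed tag length


def demultiplex(reads, mid_tags=MID_TAGS, mid_length=MID_LENGTH):
    """Group reads by their leading MID tag.

    The tag is stripped from each read. Reads whose tag is not
    recognised are collected under the key "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:mid_length], read[mid_length:]
        by_sample[mid_tags.get(tag, "unassigned")].append(insert)
    return dict(by_sample)


def expected_reads_per_sample(reads_per_run, pooled_samples):
    """Average reads per sample if a run's reads split evenly."""
    return reads_per_run // pooled_samples
```

Under the even-split assumption, a run of ~100,000 reads gives roughly 3,300 reads per sample when 30 samples are pooled and about 1,000 when 100 are pooled.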

Page 9: LabKey Software Company Overview

Roche/454 MHC Workflow

• Total RNA isolation and cDNA synthesis
  – RNA isolation ~4 hrs; cDNA synthesis ~2 hrs

• Primary PCR amplification
  – plus SPRI purification, quantification, pooling ~3 hrs

• emPCR
  – set-up ~1 hr, run ~5.5 hrs

• Breaking and enrichment
  – ~3 hrs

• Roche/454 GS Junior run
  – set-up ~1.5 hrs; run time ~10 hrs

• Data processing and analysis
  – run processing ~2 hrs; analysis time varies

www.454.com
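The step times above can be added up to get a rough end-to-end figure. This small Python sketch uses the approximate hours from the slide and leaves out the analysis step, whose time varies. The step labels and the `total_hours` helper are illustrative.

```python
# Approximate durations in hours, as listed on the workflow slide.
# Analysis time is excluded because the slide says it varies.
WORKFLOW_HOURS = [
    ("RNA isolation", 4.0),
    ("cDNA synthesis", 2.0),
    ("Primary PCR, SPRI purification, quantification, pooling", 3.0),
    ("emPCR set-up", 1.0),
    ("emPCR run", 5.5),
    ("Breaking and enrichment", 3.0),
    ("GS Junior set-up", 1.5),
    ("GS Junior run", 10.0),
    ("Run processing", 2.0),
]


def total_hours(steps=WORKFLOW_HOURS):
    """Sum the per-step durations."""
    return sum(hours for _, hours in steps)


print(f"Approximate total: {total_hours():g} hours")  # prints "Approximate total: 32 hours"
```

So a single run takes about 32 hours from RNA isolation to processed reads, before any analysis.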

Page 10: LabKey Software Company Overview


PROBLEM: DATA MANAGEMENT!

There is a real disconnect between the ability to collect next-generation sequence data (easy) and the ability to analyze it meaningfully (hard)

Dave O’Connor


Page 11: LabKey Software Company Overview


Problem: Data Management

As volume has increased, the lab has found it difficult to manage all of its sequencing data & metadata:
• Run metadata
• Run metrics
• Sequencing reads and quality scores
• Sample information and multiplex identifiers (MIDs)
• Reference sequences for genotyping experiments
• Genotyping matches

O’Connor asked LabKey to build a system that can:
• Store sequencing and genotyping data in a single database that links all the tables, allowing arbitrary queries and reports
• Provide tools for analysis, querying, visualization and export
• Automate data workflows for efficiency & consistency
• Eventually, link sequencing results to their primate EHR system


Page 12: LabKey Software Company Overview


LabKey Sequencing System

[Diagram components: Reads, Quality Scores, Metrics, Sample Information, Reference Sequences, Sequencing and Genotyping Database, Galaxy Genotyping Workflow, External Tools, Analysis, Reporting, Export, Visualization]

Page 13: LabKey Software Company Overview

Database Schema

Tables in the genotyping schema and their columns:

• Metrics (genotyping): Run, [...]
• Runs (genotyping): RowId, MetaDataId, Container, CreatedBy, Created, Path, FileName, Status
• AllelesJunction (genotyping): MatchId, SequenceId
• Analyses (genotyping): RowId, Run, CreatedBy, Created, Description, Path, FileName, Status, SequenceDictionary, SequencesView
• AnalysisSamples (genotyping): Analysis, SampleId
• Reads (genotyping): RowId, Run, Name, Mid, Sequence, Quality
• ReadsJunction (genotyping): MatchId, ReadId
• Dictionaries (genotyping): RowId, Container, CreatedBy, Created
• Matches (genotyping): RowId, Analysis, SampleId, Reads, [Percent], AverageLength, PosReads, NegReads, PosExtReads, NegExtReads
• Samples (genotyping): SampleId, [...]
• Sequences (genotyping): RowId, Dictionary, Uid, AlleleName, Initials, GenbankId, ExptNumber, Comments, Locus, Species, Origin, Sequence, PreviousName, LastEdit, Version, ModifiedBy, Translation, Type, IpdAccession, Reference, RegIon, Id, Variant, UploadId, FullLength, AlleleFamily
• MetaData (genotyping): Run, [...]
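As a rough illustration of how linked tables allow arbitrary queries, the sketch below rebuilds a small subset of the schema in an in-memory SQLite database and counts the reads behind each matched allele. This is a stand-in and not LabKey Server's own storage or query API. The column types, the foreign-key relationships (inferred from the column names), the sample rows, the allele names, and the function names are all assumptions made for the example.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory subset of the genotyping schema with sample rows."""
    conn = sqlite3.connect(":memory:")
    # Column types and REFERENCES clauses are assumed, not taken from the slides.
    conn.executescript("""
        CREATE TABLE Reads (RowId INTEGER PRIMARY KEY, Run INTEGER, Name TEXT,
                            Mid INTEGER, Sequence TEXT, Quality TEXT);
        CREATE TABLE Sequences (RowId INTEGER PRIMARY KEY, AlleleName TEXT);
        CREATE TABLE Matches (RowId INTEGER PRIMARY KEY, Analysis INTEGER,
                              SampleId INTEGER, Reads INTEGER);
        CREATE TABLE ReadsJunction (MatchId INTEGER REFERENCES Matches(RowId),
                                    ReadId INTEGER REFERENCES Reads(RowId));
        CREATE TABLE AllelesJunction (MatchId INTEGER REFERENCES Matches(RowId),
                                      SequenceId INTEGER REFERENCES Sequences(RowId));
    """)
    # Made-up rows: three reads, two reference alleles, two matches.
    conn.executemany("INSERT INTO Reads VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, "read1", 1, "ACGT", "IIII"),
        (2, 1, "read2", 1, "ACGT", "IIII"),
        (3, 1, "read3", 2, "TTGA", "IIII"),
    ])
    conn.executemany("INSERT INTO Sequences VALUES (?, ?)",
                     [(1, "Allele-A"), (2, "Allele-B")])
    conn.executemany("INSERT INTO Matches VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 2), (2, 1, 2, 1)])
    conn.executemany("INSERT INTO ReadsJunction VALUES (?, ?)",
                     [(1, 1), (1, 2), (2, 3)])
    conn.executemany("INSERT INTO AllelesJunction VALUES (?, ?)",
                     [(1, 1), (2, 2)])
    return conn


def reads_per_allele(conn):
    """Join matches to alleles and reads, counting distinct reads per allele."""
    return conn.execute("""
        SELECT s.AlleleName, COUNT(DISTINCT rj.ReadId) AS ReadCount
        FROM Matches m
        JOIN AllelesJunction aj ON aj.MatchId = m.RowId
        JOIN Sequences s ON s.RowId = aj.SequenceId
        JOIN ReadsJunction rj ON rj.MatchId = m.RowId
        GROUP BY s.AlleleName
        ORDER BY s.AlleleName
    """).fetchall()
```

Calling `reads_per_allele(build_demo_db())` returns one row per allele with its read count. The two junction tables are what let a single match point to many reads and to one or more candidate alleles.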

Page 14: LabKey Software Company Overview


Demo


Page 15: LabKey Software Company Overview


Possible Future Directions

• Respond to O’Connor lab’s near-term needs
  – Genomics-specific analytics
  – Additional export formats
  – Tighter integration with Galaxy
  – Support for amplicon-designated reads
  – Match combining
  – Simplify configuration and operation

• Integrate with Wisconsin primate EHR
• Better integration with R / Bioconductor
• Visualization
• Other sequencing platforms: Illumina, PacBio…


Page 16: LabKey Software Company Overview


Acknowledgements

O’Connor Laboratory
• David O’Connor
• Simon Lank
• Julie Karl
• Benjamin Bimber

LabKey Software
• Mark Igra
• Brian Connolly
• Elizabeth Nelson
• Josh Eckels
• Matthew Bellew
• et al.

Page 17: LabKey Software Company Overview


Questions?
