Next-Generation Informatics

DESCRIPTION

Talk from the Bioinformatics session of the Advances in Genome Biology and Technology 2009 meeting.

Next-Generation Informatics

David Dooling <ddooling@wustl.edu>

AGBT Bioinformatics

2009-02-05

Framing the problem

[Chart: x-axis years 2000 to 2010; series: Moore's Law, CPU, Storage, Sequence, Personnel]

Different perspectives

LIMS

LIMS - Illumina/Solexa

LIMS - Roche/454

Analysis

Analysis - cDNA

• Solexa cDNA reads
• SNPs
• Indels
• Gene expression (to exquisite sensitivity)
• [Transcriptome] OR [Genome + Splice Junctions (SJs)] OR [Genome]
• Variant discovery / ASE
• Splice isotypes
• Novel genes
• Read depth
• Maq/Tophat
• Maq
• Reads map to “non-genic” regions
• Velvet
• GenScan
• Reads map to novel SJs or introns
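The "Read depth" and "Gene expression" labels in the diagram can be illustrated with a simple per-gene read count. This is only a minimal sketch, not the pipeline described in the talk: the gene coordinates, the read positions, the 36-base read length, and the `reads_per_gene` helper are all hypothetical, and a real analysis would take its alignments from a mapper such as Maq.

```python
from collections import Counter

# Hypothetical gene annotations: name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 800, 1200),
}


def reads_per_gene(read_starts, genes, read_length=36):
    """Count reads overlapping each gene; reads hitting no gene are 'non-genic'."""
    counts = Counter()
    for chrom, start in read_starts:
        end = start + read_length - 1
        hit = False
        for name, (g_chrom, g_start, g_end) in genes.items():
            # Two intervals overlap when each starts before the other ends.
            if chrom == g_chrom and start <= g_end and end >= g_start:
                counts[name] += 1
                hit = True
        if not hit:
            counts["non-genic"] += 1
    return counts


# Made-up mapped read positions as (chromosome, start) pairs.
reads = [("chr1", 120), ("chr1", 480), ("chr1", 600), ("chr1", 1190), ("chr2", 50)]
depth = reads_per_gene(reads, GENES)
```

Reads that fall outside every annotated gene are tallied separately, mirroring the diagram's branch for reads that map to “non-genic” regions.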

Project Lead

Changing pipelines

Changing pipelines - LIMS

Prep
• PCR
• cDNAs
• Hybrid Selection
• Bisulfite
• Sample Pooling
• Jumping Libraries
• WGS

Tech-Specific Prep / Detection
• 3730
• Solexa
• 454
• SOLiD
• Church Polony (?)
• Helicos (?)

Primary Analysis
• Phred
• (Technology-specific)
• Flow-space
• Color-space
• ...

Submission
• NCBI Trace
• Project Archives (e.g., DCC)
• NCBI SRA
• NCBI Medical Archive

Courtesy of Toby Bloom

Changing pipelines - Analysis

Aligners: BLAST, BLAT, PASH, ssaha, runMapping, ELAND, mapreads, Arachne, MAQ, exonerate, SHRiMP, SPLIGN, Mosaik, SLIM Search, SXOligoSearch, SOAP2, NovoCraft, Bowtie, Tophat

Assemblers: Phrap, Arachne, PCAP, Phusion, Euler, ATLAS, Newbler, Velvet, Forge, SSAKE, VCAKE, Euler-USR, SHARCGS, CABOG

Framing the solution

Past is prologue

Convert this…

… into this

UR

• Object-relational mapping (ORM) layer
  – Interact with persistence layer (e.g., relational database) through objects and methods
  – Automatic, dynamic class definitions
  – Moose-like object definition syntax [1]
• Object context
  – In-memory transactions (even across databases)
  – Caching/deferred loading
• Dynamic command-line interface
• Integrated documentation system

[1] http://www.iinteractive.com/moose/
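The "object context" idea on this slide (in-memory transactions plus caching and deferred loading) can be sketched in a few lines. The example below is only an illustration in Python using the standard `sqlite3` module; it is not UR's API, which is a Perl framework. The `Context` and `Sample` classes and the `sample` table are hypothetical names invented for the sketch.

```python
import sqlite3


class Context:
    """Tracks loaded objects and pending changes in memory (illustrative only)."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}     # id -> Sample, so each row is loaded at most once
        self.dirty = set()  # ids with uncommitted in-memory changes

    def get(self, sample_id):
        # Deferred loading: query the database only on first access.
        if sample_id not in self.cache:
            row = self.conn.execute(
                "SELECT id, name FROM sample WHERE id = ?", (sample_id,)
            ).fetchone()
            if row is None:
                raise KeyError(sample_id)
            self.cache[sample_id] = Sample(self, *row)
        return self.cache[sample_id]

    def commit(self):
        # Write all pending in-memory changes in one database transaction.
        with self.conn:
            for sample_id in self.dirty:
                self.conn.execute(
                    "UPDATE sample SET name = ? WHERE id = ?",
                    (self.cache[sample_id].name, sample_id),
                )
        self.dirty.clear()

    def rollback(self):
        # Discard in-memory changes; the objects reload on next access.
        for sample_id in self.dirty:
            del self.cache[sample_id]
        self.dirty.clear()


class Sample:
    """Hypothetical persistent object; attribute writes are recorded in the context."""

    def __init__(self, context, sample_id, name):
        self._context = context
        self.id = sample_id
        self._name = name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value
        self._context.dirty.add(self.id)


# Usage with a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO sample VALUES (1, 'tumor')")
conn.commit()

ctx = Context(conn)
ctx.get(1).name = "normal"             # change exists only in memory
ctx.rollback()                         # discard it
name_after_rollback = ctx.get(1).name  # reloaded from the database
ctx.get(1).name = "relapse"
ctx.commit()                           # now written to the database
stored = conn.execute("SELECT name FROM sample WHERE id = 1").fetchone()[0]
```

The point of the design is that callers work with ordinary objects and attributes, while the context decides when the database is read or written.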

Genome Workflow

Genome Model

Past is prologue…

… but with a wrinkle

• Lab personnel accept the software you give them
• Analysts are more than happy to develop their own
• We need to make it easy for analysts to build tools within the system
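One way to picture "easy for analysts to build tools within the system" is a registry where an analyst's plain function becomes a named tool the system can discover and run. The following is a hedged sketch of that general pattern in Python, not the mechanism used in the talk (which is built on Perl and UR); the `tool` decorator, the `TOOLS` registry, the `run` helper, and the `gc-content` example are all hypothetical.

```python
# Hypothetical registry mapping tool names to analyst-written functions.
TOOLS = {}


def tool(name):
    """Register a plain function as a named tool the system can look up."""
    def register(func):
        TOOLS[name] = func
        return func
    return register


@tool("gc-content")
def gc_content(sequence):
    """Fraction of G and C bases in a sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def run(name, *args):
    """Look up a registered tool by name and call it with the given arguments."""
    return TOOLS[name](*args)


result = run("gc-content", "ACGT")
```

The analyst writes only the function body; registration, lookup, and invocation stay with the surrounding system.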

Easy Perl API

Pairing

Analyst

Programmer

Variant Detection Pipeline

cDNA Analysis

16S Pipeline

Assembly and Annotation Pipeline

Challenges

• There is still much more work to do
• Sequencing is demolishing Moore’s law
• The cult of traces
• The richness of data
• Visualization

CIRCOS

Thanks

Web Site: http://genome.wustl.edu/
Blog: http://www.politigenomics.com/
LIMS Paper: http://www.biomedcentral.com/1471-2105/8/362
UR Presentation: http://www.media-landscape.com/yapc/2006-06-27.ScottSmith/
