Next-Generation Informatics
David Dooling <[email protected]>
AGBT Bioinformatics
2009-02-05
Framing the problem
[Figure: trends over 2000 to 2010 for Moore's Law, CPU, Storage, Sequence, and Personnel]
Different perspectives
LIMS
LIMS - Illumina/Solexa
LIMS - Roche/454
Analysis
Analysis - cDNA
Solexa cDNA reads
SNPs, indels
Gene expression (to exquisite sensitivity)
[Transcriptome] OR [Genome + splice junctions (SJs)] OR [Genome]
Variant discovery / ASE
Splice isotypes
Novel genes
Read depth
Maq / Tophat
Maq
Reads map to "non-genic" regions
Velvet, GenScan
Reads map to novel SJs or introns
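One way to read the diagram above is as a routing step: reads that map inside annotated genes contribute to read depth (and so to expression estimates), while reads that map to "non-genic" regions are set aside for assembly and gene prediction. The following is a minimal Python sketch of that idea only. The `GENES` table, `gene_for`, `route_reads`, and the alignment tuples are all hypothetical names and data made up for illustration; they are not output from Maq or Tophat and not part of the pipeline shown in the talk.

```python
from collections import Counter

# Hypothetical gene annotation: name -> (chromosome, start, end), 1-based inclusive.
GENES = {
    "geneA": ("chr1", 100, 500),
    "geneB": ("chr1", 900, 1200),
}


def gene_for(chrom, pos):
    """Return the annotated gene containing a mapped position, or None."""
    for name, (g_chrom, start, end) in GENES.items():
        if chrom == g_chrom and start <= pos <= end:
            return name
    return None


def route_reads(alignments):
    """Split alignments into per-gene read counts and non-genic leftovers."""
    depth = Counter()   # reads per annotated gene (a crude expression proxy)
    non_genic = []      # read ids that would go on to assembly/gene prediction
    for read_id, chrom, pos in alignments:
        gene = gene_for(chrom, pos)
        if gene is None:
            non_genic.append(read_id)
        else:
            depth[gene] += 1
    return depth, non_genic


# Made-up alignments: (read id, chromosome, mapped position).
alignments = [
    ("r1", "chr1", 120),
    ("r2", "chr1", 480),
    ("r3", "chr1", 950),
    ("r4", "chr1", 700),   # falls between the two annotated genes
    ("r5", "chr2", 150),   # no annotation on this chromosome
]
depth, non_genic = route_reads(alignments)
```

A real pipeline would work from aligner output and a proper annotation index rather than a linear scan; the sketch only shows the split between genic and "non-genic" reads.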
Project Lead
Changing pipelines
Changing pipelines - LIMS
Prep: PCR, cDNAs, Hybrid Selection, Bisulfite, Sample Pooling, Jumping Libraries, WGS
Tech-Specific Prep / Detection: 3730, Solexa, 454, SOLiD, Church Polony(?), Helicos(?), …
Primary Analysis: Phred, (Technology-specific), Flow-space, Color-space, …
Submission: NCBI Trace, Project Archives (e.g., DCC), NCBI SRA, NCBI Medical Archive
Courtesy of Toby Bloom
Changing pipelines - Analysis
Aligners: BLAST, BLAT, PASH, ssaha, runMapping, ELAND, mapreads, Arachne, MAQ, exonerate, SHRiMP, SPLIGN, Mosaik, SLIM Search, SXOligoSearch, SOAP2, NovoCraft, Bowtie, Tophat
Assemblers: Phrap, Arachne, PCAP, Phusion, Euler, ATLAS, Newbler, Velvet, Forge, SSAKE, VCAKE, Euler-USR, SHARCGS, CABOG
Framing the solution
Past is prologue
Convert this…
… into this
UR
• Object-relational mapping (ORM) layer
– Interact with persistence layer (e.g., relational database) through objects and methods
– Automatic, dynamic class definitions
– Moose-like object definition syntax [1]
• Object context
– In-memory transactions (even across databases)
– Caching/deferred loading
• Dynamic command-line interface
• Integrated documentation system
[1] http://www.iinteractive.com/moose/
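The ORM and object-context bullets can be illustrated with a small sketch. UR itself is a Perl framework and its API is not shown in these slides, so the following is a Python illustration of the general ideas only: classes generated from the table schema, rows loaded on first access and cached, and changes held in memory until commit or rollback. `Context`, `define_class`, and the `sample` table are hypothetical names chosen for the example.

```python
import sqlite3


class Context:
    """Hypothetical object context: caches rows and buffers changes in memory."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}     # (table, id) -> row dict, filled on first read
        self.pending = {}   # (table, id) -> {column: uncommitted value}

    def define_class(self, table):
        """Build a class whose properties mirror the table's columns."""
        info = self.conn.execute(f"PRAGMA table_info({table})").fetchall()
        columns = [row[1] for row in info]   # row[1] is the column name
        context = self

        def make_property(col):
            def getter(obj):
                return context.read(table, obj.id, col)

            def setter(obj, value):
                context.pending.setdefault((table, obj.id), {})[col] = value

            return property(getter, setter)

        def init(obj, obj_id):
            obj.id = obj_id   # no database access happens here

        attrs = {col: make_property(col) for col in columns if col != "id"}
        attrs["__init__"] = init
        return type(table.capitalize(), (object,), attrs)

    def read(self, table, obj_id, col):
        key = (table, obj_id)
        if col in self.pending.get(key, {}):
            return self.pending[key][col]    # uncommitted in-memory value wins
        if key not in self.cache:            # deferred loading, then caching
            cur = self.conn.execute(f"SELECT * FROM {table} WHERE id = ?", (obj_id,))
            row = cur.fetchone()
            if row is None:
                raise KeyError(key)
            self.cache[key] = dict(zip([d[0] for d in cur.description], row))
        return self.cache[key][col]

    def commit(self):
        for (table, obj_id), changes in self.pending.items():
            for col, value in changes.items():
                self.conn.execute(
                    f"UPDATE {table} SET {col} = ? WHERE id = ?", (value, obj_id)
                )
            self.cache.pop((table, obj_id), None)   # reload lazily next time
        self.conn.commit()
        self.pending.clear()

    def rollback(self):
        self.pending.clear()   # discard in-memory changes only


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.execute("INSERT INTO sample VALUES (1, 'lib-001', 'queued')")
conn.commit()

ctx = Context(conn)
Sample = ctx.define_class("sample")   # class generated from the schema
s = Sample(1)
s.status = "sequenced"                # changed in memory only
in_memory = s.status
in_db = conn.execute("SELECT status FROM sample WHERE id = 1").fetchone()[0]
ctx.rollback()
after_rollback = s.status
s.status = "analyzed"
ctx.commit()
after_commit = conn.execute("SELECT status FROM sample WHERE id = 1").fetchone()[0]
```

Between the assignment and the commit, the database row is unchanged while the object already reports the new value; that gap is what the in-memory transaction bullet refers to. Transactions spanning several databases are not covered by this sketch.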
Genome Workflow
Genome Model
Past is prologue…
… but with a wrinkle
• Lab personnel accept the software you give them
• Analysts are more than happy to develop their own
• We need to make it easy for analysts to build tools within the system
Easy Perl API
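The slides do not reproduce the Perl API itself, so the following is only a Python sketch of the general pattern behind "easy for analysts to build tools within the system" combined with a dynamic command-line interface: an analyst writes one small class, and a sub-command is generated from its declared parameters. `Tool`, `CountReads`, `command_name`, `build_parser`, `run`, and the `genome-tool` program name are all hypothetical.

```python
import argparse


class Tool:
    """Base class; every subclass becomes a sub-command automatically."""

    params = {}   # parameter name -> (type, help text)

    def execute(self, **kwargs):
        raise NotImplementedError


class CountReads(Tool):
    """Count reads at or above a minimum length."""

    params = {
        "reads": (str, "comma-separated read sequences"),
        "min_length": (int, "shortest read to count"),
    }

    def execute(self, reads, min_length):
        return sum(1 for r in reads.split(",") if len(r) >= min_length)


def command_name(cls):
    """Turn a class name such as CountReads into count-reads."""
    return "".join(
        "-" + c.lower() if c.isupper() and i else c.lower()
        for i, c in enumerate(cls.__name__)
    )


def build_parser():
    """Generate one sub-command per Tool subclass from its declared params."""
    parser = argparse.ArgumentParser(prog="genome-tool")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for cls in Tool.__subclasses__():
        sub = subparsers.add_parser(command_name(cls), help=cls.__doc__)
        for name, (kind, help_text) in cls.params.items():
            sub.add_argument(
                "--" + name.replace("_", "-"), type=kind, required=True, help=help_text
            )
        sub.set_defaults(tool=cls)
    return parser


def run(argv):
    """Parse argv, instantiate the selected tool, and execute it."""
    args = vars(build_parser().parse_args(argv))
    tool = args.pop("tool")
    args.pop("command")
    return tool().execute(**args)


result = run(["count-reads", "--reads", "ACGT,AC,ACGTAC", "--min-length", "4"])
```

The design point is that the analyst never touches the parser: declaring `params` on a new subclass is enough for the command, its options, and its help text to appear.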
Variant Detection Pipeline
cDNA Analysis
16S Pipeline
Assembly and Annotation Pipeline
Challenges
• There is still much more work to do
• Sequencing is demolishing Moore’s law
• The cult of traces
• The richness of data
• Visualization
CIRCOS
Thanks
Web Site http://genome.wustl.edu/
Blog http://www.politigenomics.com/
LIMS Paper http://www.biomedcentral.com/1471-2105/8/362
UR Presentation http://www.media-landscape.com/yapc/2006-06-27.ScottSmith/