Talk from the Bioinformatics session of the Advances in Genome Biology and Technology 2009 meeting.
Next-Generation Informatics
David Dooling <ddooling@wustl.edu>
AGBT Bioinformatics, 2009-02-05
Framing the problem
[Chart, 2000 to 2010: Moore’s Law, CPU, Storage, Sequence, Personnel]
Different perspectives
LIMS
LIMS - Illumina/Solexa
LIMS - Roche/454
Analysis
Analysis - cDNA
Solexa cDNA reads → Maq/Tophat → [Transcriptome] OR [Genome + Splice Junctions (SJs)] OR [Genome]
• Read depth → Gene expression (to exquisite sensitivity)
• Variant discovery/ASE → SNPs, Indels
• Reads map to novel SJs or introns → Splice isotypes
• Maq: reads map to “non-genic” regions → Velvet/GenScan → Novel genes
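The read-depth and “non-genic” branches of this diagram reduce to counting reads per annotated gene and keeping the reads that land outside every annotation. A minimal Python sketch, for illustration only: the function names and the (read_id, chrom, pos) and (gene, chrom, start, end) tuples are hypothetical stand-ins for real Maq/Tophat output and an annotation file, each read is placed by its leftmost aligned base, and RPKM is one common normalization swapped in here, not something the slide specifies.

```python
from collections import Counter


def assign_reads(alignments, genes):
    """Count reads per annotated gene and collect reads that hit no gene.

    alignments: iterable of (read_id, chrom, pos), pos = leftmost aligned base.
    genes: iterable of (gene_name, chrom, start, end), 0-based, half-open.
    A linear scan keeps the sketch short; real pipelines use interval indexes.
    """
    genes = list(genes)
    counts = Counter()
    non_genic = []
    for read_id, chrom, pos in alignments:
        hits = [name for name, g_chrom, start, end in genes
                if g_chrom == chrom and start <= pos < end]
        if hits:
            counts.update(hits)  # a read inside overlapping genes counts for each
        else:
            non_genic.append((read_id, chrom, pos))  # candidate novel-gene evidence
    return counts, non_genic


def rpkm(counts, gene_lengths, total_mapped):
    """Reads per kilobase of gene per million mapped reads."""
    return {gene: n * 1e9 / (gene_lengths[gene] * total_mapped)
            for gene, n in counts.items()}
```

The non-genic reads would then be the input to a de novo step such as the Velvet/GenScan branch.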
Project Lead
Changing pipelines
Changing pipelines - LIMS
Prep: PCR, cDNAs, Hybrid Selection, Bisulfite, Sample Pooling, Jumping Libraries, WGS
Tech-Specific Prep/Detection: 3730, Solexa, 454, SOLiD, Church Polony (?), Helicos (?), …
Primary Analysis: Phred; technology-specific (Flow-space, Color-space, …)
Submission: NCBI Trace, Project Archives (e.g., DCC), NCBI SRA, NCBI Medical Archive
Courtesy of Toby Bloom
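Primary analysis is where the platforms diverge most: 454 reports flow-space signals and SOLiD reports color-space calls. A minimal Python sketch, assuming the standard TACG flow order for 454 and SOLiD's two-base encoding (with A, C, G, T coded 0 to 3, each color is the XOR of adjacent base codes). The dispatch table and function names are hypothetical, and the rounding base caller is a naive stand-in for the vendors' real signal processing.

```python
BASES = "ACGT"       # 2-bit codes A=0, C=1, G=2, T=3
FLOW_ORDER = "TACG"  # 454 nucleotide flow order, repeated cyclically


def decode_color_space(read):
    """Decode a color-space read like 'T012' (known leading base + colors).

    Each color is the XOR of the 2-bit codes of two adjacent bases, so the
    next base is the previous code XOR the color. Missing calls ('.') are
    not handled in this sketch.
    """
    code = BASES.index(read[0])
    bases = []
    for color in read[1:]:
        code ^= int(color)
        bases.append(BASES[code])
    return "".join(bases)


def call_flow_space(flowgram, flow_order=FLOW_ORDER):
    """Naive base caller: round each flow signal to a homopolymer length."""
    calls = []
    for i, signal in enumerate(flowgram):
        length = max(0, round(signal))
        calls.append(flow_order[i % len(flow_order)] * length)
    return "".join(calls)


# Hypothetical dispatch table: one primary-analysis step per technology.
PRIMARY_ANALYSIS = {
    "SOLiD": decode_color_space,
    "454": call_flow_space,
}


def primary_analysis(technology, raw):
    """Run the technology-specific step, failing loudly for unknown platforms."""
    step = PRIMARY_ANALYSIS.get(technology)
    if step is None:
        raise ValueError(f"no primary-analysis step registered for {technology!r}")
    return step(raw)
```

Keeping the platform-specific code behind one lookup means a new instrument adds an entry rather than a new pipeline.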
Changing pipelines - Analysis
Aligners: BLAST, BLAT, PASH, ssaha, runMapping, ELAND, mapreads, MAQ, exonerate, SHRiMP, SPLIGN, Mosaik, SLIM Search, SXOligoSearch, SOAP2, NovoCraft, Bowtie, Tophat
Assemblers: Phrap, Arachne, PCAP, Phusion, Euler, ATLAS, Newbler, Velvet, Forge, SSAKE, VCAKE, Euler-USR, SHARCGS, CABOG
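Many of the aligners listed index short k-mers and then verify candidate hits. The Python sketch below shows that generic seed-and-verify idea under simplifying assumptions: one exact seed taken from the start of the read, Hamming-distance verification, forward strand only, and no indels. The function names are hypothetical, and this is not the algorithm of any particular tool above.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def map_read(read, reference, index, k, max_mismatches=2):
    """Seed with the read's first k-mer, then verify the whole read.

    Returns (position, mismatches) hits in reference order. A read whose seed
    contains an error is missed; real tools use multiple or spaced seeds.
    """
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits
```

For example, with reference "ACGTACGTTTGACCA" and k = 4, the read "ACGTTTGA" seeds at positions 0 and 4 but only position 4 passes verification with at most two mismatches.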
Framing the solution
Past is prologue
Convert this…
… into this
Convert this…
… into this
UR
• Object-relational mapping (ORM) layer
  – Interact with the persistence layer (e.g., a relational database) through objects and methods
  – Automatic, dynamic class definitions
  – Moose¹-like object definition syntax
• Object context
  – In-memory transactions (even across databases)
  – Caching/deferred loading
• Dynamic command-line interface
• Integrated documentation system
1 - http://www.iinteractive.com/moose/
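The object-context idea can be sketched in Python, although UR itself is a Perl framework and its real API differs. In this hypothetical sketch, each class gets its own plain dict standing in for a separate database; objects are loaded on first access and cached so repeated lookups return the same instance; and edits stay in memory until commit() writes them back or rollback() discards them. Unlike a real implementation, commit() here is not atomic across stores.

```python
from dataclasses import asdict, dataclass


@dataclass
class Sample:  # hypothetical domain class
    id: str
    name: str
    status: str


@dataclass
class Run:  # hypothetical domain class kept in a different "database"
    id: str
    sample_id: str
    instrument: str


class ObjectContext:
    """Toy object context: identity map, deferred loading, in-memory changes.

    stores maps each class to a dict {id: row_dict}; each dict stands in for
    a separate database. Not UR's API, and commit() is not atomic.
    """

    def __init__(self, stores):
        self._stores = stores
        self._cache = {}   # (class, id) -> object; one instance per row
        self.loads = 0     # number of reads from the backing stores

    def get(self, cls, obj_id):
        key = (cls, obj_id)
        if key not in self._cache:  # deferred: load only on first request
            self.loads += 1
            self._cache[key] = cls(**self._stores[cls][obj_id])
        return self._cache[key]

    def commit(self):
        # Write every cached object back to whichever store owns its class.
        for (cls, obj_id), obj in self._cache.items():
            self._stores[cls][obj_id] = asdict(obj)

    def rollback(self):
        # Drop uncommitted edits; objects are reloaded on next access.
        self._cache.clear()
```

Because callers only ever see objects, code that edits a Sample and a Run together does not need to know the two live in different stores.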
Genome Workflow
Genome Model
Past is prologue…
… but with a wrinkle
• Lab personnel accept the software you give them
• Analysts are more than happy to develop their own
• We need to make it easy for analysts to build tools within the system
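One way to make analyst-written tools first-class is to generate the command-line interface from the code itself, in the spirit of UR's dynamic command-line interface and integrated documentation. The Python sketch below is an assumption, not UR's mechanism: a hypothetical @command decorator registers a function, and argparse subcommands, arguments, and help text are derived from its signature and docstring. gc_content is a toy example tool.

```python
import argparse
import inspect

COMMANDS = {}  # hypothetical registry: command name -> function


def command(func):
    """Decorator an analyst adds to expose a function as a subcommand."""
    COMMANDS[func.__name__.replace("_", "-")] = func
    return func


def build_parser(prog="genome"):
    """Derive subcommands, arguments, and help text from registered functions."""
    parser = argparse.ArgumentParser(prog=prog)
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, func in COMMANDS.items():
        doc = inspect.getdoc(func) or ""
        sub = subparsers.add_parser(name, help=doc.split("\n")[0], description=doc)
        for param in inspect.signature(func).parameters.values():
            if param.default is inspect.Parameter.empty:
                sub.add_argument(param.name)  # required positional argument
            else:
                # Options take their type from the default (str/int/float only).
                sub.add_argument("--" + param.name.replace("_", "-"),
                                 dest=param.name,
                                 type=type(param.default),
                                 default=param.default)
        sub.set_defaults(_func=func)
    return parser


def main(argv=None):
    args = vars(build_parser().parse_args(argv))
    func = args.pop("_func")
    del args["command"]
    return func(**args)


@command
def gc_content(sequence, precision=3):
    """Report the GC fraction of a DNA sequence."""
    seq = sequence.upper()
    gc = sum(base in "GC" for base in seq)
    return round(gc / len(seq), precision)
```

With this, main(["gc-content", "ACGTGG"]) returns 0.667, and the subcommand's --help text comes straight from the docstring, so a new tool is one documented, decorated function.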
Easy Perl API
Pairing: Analyst + Programmer
Variant Detection Pipeline
cDNA Analysis
16S Pipeline
Assembly and Annotation Pipeline
Challenges
• There is still much more work to do
• Sequencing is demolishing Moore’s law
• The cult of traces
• The richness of data
• Visualization
CIRCOS
Thanks
Web Site: http://genome.wustl.edu/
Blog: http://www.politigenomics.com/
LIMS Paper: http://www.biomedcentral.com/1471-2105/8/362
UR Presentation: http://www.media-landscape.com/yapc/2006-06-27.ScottSmith/