Scalable Algorithms for Next-Generation Sequencing Data Analysis
Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
Next-Generation Sequencing
• Roche/454
• Illumina HiSeq
• SOLiD 5500
• Ion Proton
• PacBio RS
• Oxford Nanopore
Ongoing Projects
• Transcriptome analysis
- Transcriptome quantification and differential expression analysis
- Computational deconvolution of heterogeneous samples
- Transcriptome and meta-transcriptome assembly
• Viral quasispecies
- Quasispecies reconstruction from NGS reads
- IBV evolution and vaccine optimization
- Transmission graphs
• Immunoinformatics
- Genomics-guided immunotherapy
- Deep panning for early cancer detection
• Sequencing error correction, genome assembly and scaffolding, metabolomics, biomarker selection, …
- More info & software at http://dna.engr.uconn.edu
Transcriptome Quantification
• RNA-PhASE pipeline for allele-specific isoform expression
• IsoEM algorithm for isoform expression estimation
- Incorporates fragment length distribution, hexamer bias correction, …
[Figure: R² achieved by IsoEM and Cufflinks on the HBR and UHR samples of the Ion Torrent MAQC datasets]
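IsoEM's full model is not spelled out on this slide. As a rough illustration of how expectation-maximization can estimate isoform expression, here is a minimal Python sketch. It assumes each read is summarized only by the set of isoforms it is compatible with, and that reads are sampled uniformly along each isoform's effective length. The fragment length distribution and hexamer bias correction mentioned above are deliberately omitted, and the function name and input format are hypothetical.

```python
def em_isoform_frequencies(reads, lengths, iterations=200):
    """Estimate relative isoform frequencies with a basic EM loop.

    reads: non-empty list of sets; each set holds the ids of the isoforms
           a read is compatible with (hypothetical input format)
    lengths: dict mapping isoform id to effective length (> 0)
    iterations: fixed number of EM rounds (no convergence test here)
    """
    isoforms = sorted(lengths)
    # theta[i]: fraction of reads generated by isoform i, initialized uniformly.
    theta = {i: 1.0 / len(isoforms) for i in isoforms}
    for _ in range(iterations):
        expected = {i: 0.0 for i in isoforms}
        for compatible in reads:
            # E-step: split the read among its compatible isoforms in
            # proportion to theta_i / length_i.
            weights = {i: theta[i] / lengths[i] for i in compatible}
            total = sum(weights.values())
            for i, w in weights.items():
                expected[i] += w / total
        # M-step: new read fractions are the normalized expected counts.
        theta = {i: expected[i] / len(reads) for i in isoforms}
    # Convert read fractions into transcript (molecule) fractions.
    per_base = {i: theta[i] / lengths[i] for i in isoforms}
    norm = sum(per_base.values())
    return {i: per_base[i] / norm for i in isoforms}


reads = [{"A"}] * 60 + [{"B"}] * 20 + [{"A", "B"}] * 20
print(em_isoform_frequencies(reads, {"A": 1000, "B": 1000}))  # approximately {'A': 0.75, 'B': 0.25}
```

In this toy input the 20 ambiguous reads end up split 3:1, matching the ratio of the uniquely assigned reads. The final division by length is what separates "share of reads" from "share of transcripts": a long isoform attracts more reads per molecule.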
Differential Expression
• Fast expression estimation makes accurate bootstrapping-based methods practical
[Figure: MAQC 454 datasets, UHRR (SRX002934) vs. HBRR (SRX002935)]
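The slide does not detail the bootstrapping procedure. One common variant is a percentile bootstrap over reads, sketched below under that assumption. Within each condition, reads are resampled with replacement and expression is re-estimated on every replicate. The spread of the resulting log2 fold changes then gives an interval. `estimate` can be any fast estimator. The toy `fraction_estimate`, which just counts reads per gene label, and the other names are hypothetical stand-ins. The pseudocount is also an assumption of this sketch.

```python
import math
import random
from collections import Counter


def fraction_estimate(reads):
    """Toy estimator (hypothetical): relative abundance = share of reads per label."""
    counts = Counter(reads)
    return {label: c / len(reads) for label, c in counts.items()}


def bootstrap_log2_fc(sample_a, sample_b, estimate, feature,
                      n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for log2(expr_b / expr_a) of `feature`.

    estimate: any function mapping a list of reads to {feature: expression}
    """
    rng = random.Random(seed)
    pseudo = 1e-9  # avoids log(0) if a feature vanishes from a replicate
    values = []
    for _ in range(n_boot):
        # Resample reads with replacement, separately within each condition.
        boot_a = rng.choices(sample_a, k=len(sample_a))
        boot_b = rng.choices(sample_b, k=len(sample_b))
        expr_a = estimate(boot_a).get(feature, 0.0)
        expr_b = estimate(boot_b).get(feature, 0.0)
        values.append(math.log2((expr_b + pseudo) / (expr_a + pseudo)))
    values.sort()
    lo = values[int(alpha / 2 * (n_boot - 1))]
    hi = values[int((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


cond_a = ["g1"] * 50 + ["g2"] * 50
cond_b = ["g1"] * 80 + ["g2"] * 20
lo, hi = bootstrap_log2_fc(cond_a, cond_b, fraction_estimate, "g1")
print(f"log2 FC of g1: 95% interval [{lo:.2f}, {hi:.2f}]")
```

The estimator runs twice per replicate, so its speed directly limits how many replicates are affordable. That is where fast isoform-level estimation pays off.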
Computational Deconvolution of Heterogeneous Samples
• Goal: characterize the expression of mesoderm progenitor cells
- Data: whole-transcriptome expression data for NSB cell mixtures + single-cell qPCR data for a few genes
• Three-step approach
- Cluster single-cell qPCR data and infer “reduced” cell-type signatures
- Infer mixing proportions based on the reduced signatures using quadratic programming
- Infer full expression signatures based on the mixing proportions, solving one quadratic program per gene
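Steps 2 and 3 are both least-squares problems with linear constraints. The sketch below solves them with projected gradient descent, a swapped-in solver rather than whatever quadratic programming package the authors used. Mixing proportions are constrained to the probability simplex (nonnegative, summing to 1). Per-gene signatures are constrained only to be nonnegative. Matrix orientations, function names and the fixed iteration count are illustrative assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection onto {x : x >= 0, sum(x) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, shift = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:
            shift = t
    return [max(x - shift, 0.0) for x in v]


def project_nonnegative(v):
    """Euclidean projection onto the nonnegative orthant."""
    return [max(x, 0.0) for x in v]


def constrained_least_squares(A, b, project, iterations=5000):
    """Minimize ||A x - b||^2 over the convex set defined by `project`.

    A is a list of rows. Uses projected gradient descent with step 1 / L,
    where L = 2 * ||A||_F^2 upper-bounds the gradient's Lipschitz constant.
    """
    n = len(A[0])
    x = project([1.0 / n] * n)
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    for _ in range(iterations):
        residual = [sum(a * xj for a, xj in zip(row, x)) - bi
                    for row, bi in zip(A, b)]
        grad = [2.0 * sum(row[j] * r for row, r in zip(A, residual))
                for j in range(n)]
        x = project([xj - step * g for xj, g in zip(x, grad)])
    return x


def mixing_proportions(reduced_signatures, mixture_profile):
    """Step 2: reduced_signatures is genes x cell types; one mixed sample."""
    return constrained_least_squares(reduced_signatures, mixture_profile,
                                     project_to_simplex)


def full_signature(proportions, gene_profile):
    """Step 3: proportions is samples x cell types; one gene across samples."""
    return constrained_least_squares(proportions, gene_profile,
                                     project_nonnegative)


signatures = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(mixing_proportions(signatures, [3.0, 7.0, 5.0]))  # close to [0.3, 0.7]
```

Step 3 calls `full_signature` once per gene with the same proportion matrix. The per-gene problems are independent, so they parallelize trivially.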
[Figure: gene with exons 1-7 and putative transcripts t1-t4 (t1: all seven exons; t2: skips exon 2; t3: skips exon 6; t4: skips exons 2 and 6)]
Reference-Guided Transcriptome Reconstruction
TRIP: Transcriptome Reconstruction using Integer Programming
• Select the smallest set of putative transcripts that yields a good statistical fit between
- the fragment length distribution empirically determined during library preparation
- the fragment lengths implied by “mapping” read pairs
[Figure: a read pair mapped to putative transcripts 1-2-3 and 1-3 (exon lengths 200) implies fragment lengths of 500 and 300, respectively; fragment length distribution with mean 500 and std. dev. 50]
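TRIP formulates transcript selection as an integer program. The sketch below replaces that with exhaustive search over transcript subsets. It also uses a much cruder notion of fit: each read pair must imply a fragment length within `k` standard deviations of the library mean on at least one selected transcript. It only illustrates the "smallest set that explains all read pairs" idea on tiny inputs. The names, the threshold `k` and the dictionary input format are hypothetical.

```python
from itertools import combinations


def fits(length, mean, sd, k=2.0):
    """Crude fit test: implied fragment length within k standard deviations."""
    return abs(length - mean) <= k * sd


def select_transcripts(pairs, mean, sd, k=2.0):
    """Smallest set of candidate transcripts explaining every read pair.

    pairs: list of dicts, one per read pair, mapping each candidate
           transcript the pair maps to -> implied fragment length.
    Exhaustive (exponential) search stands in for an integer program.
    """
    if not pairs:
        return set()
    candidates = sorted({t for pair in pairs for t in pair})
    for size in range(1, len(candidates) + 1):
        for chosen in combinations(candidates, size):
            if all(any(t in pair and fits(pair[t], mean, sd, k) for t in chosen)
                   for pair in pairs):
                return set(chosen)
    return None  # at least one pair has no well-fitting transcript


pairs = [{"1-2-3": 500, "1-3": 300}, {"1-3": 480}]
print(select_transcripts(pairs, mean=500, sd=50))
```

Take a library mean of 500 and a standard deviation of 50. A pair implying a 500-base fragment on 1-2-3 but 300 on 1-3 is explained only by 1-2-3. The second (hypothetical) pair maps only to 1-3, which forces 1-3 into the selected set as well.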
De Novo (Meta)Transcriptome Assembly of Bugula neritina and its Symbiont
• Uncultured bacterial symbiont produces bryostatins
- Symbiont absent in Northern Atlantic populations
• Developing a scalable multi-sample metatranscriptome assembly pipeline based on differential-coverage clustering of reads
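The pipeline itself is not described on the slide. The sketch below only illustrates the differential-coverage idea under simple assumptions. Each contig (or read group) has a coverage vector across samples. Profiles are log-transformed and centered so that overall sequencing depth cancels out. A small k-means with deterministic farthest-point initialization then groups contigs whose coverage rises and falls together. Because the symbiont is absent from Northern Atlantic populations, its contigs should show near-zero coverage in those samples. The sample layout, coverage values and function names are hypothetical.

```python
import math


def centered_log_profile(coverages):
    """log(1 + coverage) per sample, centered so overall depth cancels out."""
    logs = [math.log1p(c) for c in coverages]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def dist2(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def cluster_by_coverage(coverage_table, k=2, iterations=100):
    """k-means on centered log-coverage profiles; returns one label per contig."""
    points = [centered_log_profile(row) for row in coverage_table]
    # Deterministic farthest-point initialization.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster becomes empty.
            new_centers.append([sum(col) / len(members) for col in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# Hypothetical coverage of four contigs in samples [north1, north2, south1, south2].
coverage = [
    [50, 40, 60, 55],  # present everywhere (host-like)
    [10, 12, 9, 11],   # present everywhere at lower depth (host-like)
    [0, 0, 30, 25],    # absent from the northern samples (symbiont-like)
    [0, 1, 8, 10],     # absent from the northern samples (symbiont-like)
]
print(cluster_by_coverage(coverage))  # [0, 0, 1, 1]
```

Centering the log profiles is the key design choice here. Without it, a low-coverage host contig can land closer to a symbiont contig than to a high-coverage host contig, because raw depth dominates the distance.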
Acknowledgements
Sahar Al Seesi, Abdul Banday, Amir Bayegan, Gabriel Ilie, Caroline Jakuba, James Lindsay, Rahul Kanadia, Craig Nelson, Marius Nicolae, Adrian Caciula, Nicole Lopanik, Serghei Mangul, Yvette Temate Tiagueu, Alex Zelikovsky