Scalable Algorithms for Next-Generation Sequencing Data Analysis
Ion Mandoiu, UTC Associate Professor in Engineering Innovation, Department of Computer Science & Engineering


Page 1: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

Scalable Algorithms for Next-Generation Sequencing Data Analysis

Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering

Page 2: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

Next Generation Sequencing

• Roche/454
• Illumina HiSeq
• SOLiD 5500
• Ion Proton
• PacBio RS
• Oxford Nanopore

Page 3: Scalable Algorithms for  Next-Generation Sequencing Data Analysis


Ongoing Projects

• Transcriptome Analysis
- Transcriptome quantification and differential expression analysis
- Computational deconvolution of heterogeneous samples
- Transcriptome and meta-transcriptome assembly

• Viral quasispecies
- Quasispecies reconstruction from NGS reads
- IBV evolution and vaccine optimization
- Transmission graphs

• Immunoinformatics
- Genomics-guided immunotherapy
- Deep panning for early cancer detection

• Sequencing error correction, genome assembly and scaffolding, metabolomics, biomarker selection, …
- More info & software at http://dna.engr.uconn.edu

Page 4: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

Transcriptome Quantification

• RNA-PhASE pipeline for allele-specific isoform expression

[Figure: diagram with labels A, B, C, A, C]

• IsoEM algorithm for isoform expression estimation
- Incorporates fragment length distribution, hexamer bias correction, …

[Figure: R2 (axis from 0 to 0.8) for IsoEM HBR, Cufflinks HBR, IsoEM UHR, and Cufflinks UHR on Ion Torrent MAQC datasets]
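
The expectation-maximization idea behind isoform expression estimation can be sketched in a few lines. This is only a minimal illustration and not the IsoEM implementation: it ignores the fragment length distribution and hexamer bias correction mentioned on the slide, and the function name `em_isoform_frequencies`, the read-compatibility input format, and the toy data are all assumptions made for the example.

```python
def em_isoform_frequencies(read_compat, iso_lengths, n_iter=200):
    """Basic EM loop for relative isoform frequencies (illustrative only).

    read_compat: read_compat[r] lists the indices of the isoforms that
                 read r is compatible with (hypothetical input format).
    iso_lengths: length of each isoform.
    Returns isoform frequencies (fractions of transcripts) summing to 1.
    """
    k = len(iso_lengths)
    n_reads = len(read_compat)
    # theta[i]: fraction of reads generated by isoform i; start uniform.
    theta = [1.0 / k] * k
    for _ in range(n_iter):
        counts = [0.0] * k
        # E-step: split each read among its compatible isoforms, weighting
        # by theta[i] / length[i] (reads assumed uniform along an isoform).
        for compat in read_compat:
            weights = [theta[i] / iso_lengths[i] for i in compat]
            total = sum(weights)
            for i, w in zip(compat, weights):
                counts[i] += w / total
        # M-step: new read fractions are the expected read counts.
        theta = [c / n_reads for c in counts]
    # Convert read fractions to transcript frequencies by length-normalizing.
    per_base = [theta[i] / iso_lengths[i] for i in range(k)]
    norm = sum(per_base)
    return [x / norm for x in per_base]


# Toy data (assumed): isoform 0 has length 300, isoform 1 has length 200.
# 10 reads fit only isoform 0; 50 reads fit both isoforms.
toy_reads = [[0]] * 10 + [[0, 1]] * 50
print(em_isoform_frequencies(toy_reads, [300, 200]))
```

On this toy input the estimates settle near 0.4 and 0.6: the 10 reads unique to the longer isoform imply about 30 reads from it in total, and length normalization then favors the shorter isoform.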

Page 5: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

Differential Expression

• Fast estimation enables the use of accurate bootstrapping-based methods

[Figure: MAQC 454 datasets, UHRR (SRX002934) vs. HBRR (SRX002935)]
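
A bootstrapping-based differential expression test might look roughly like the sketch below. It is not the method behind the slide: expression is re-estimated on each bootstrap replicate with a plain read fraction standing in for a real estimator, and the function names, the fold-change threshold, and the toy read counts are all hypothetical.

```python
import random


def bootstrap_fractions(counts, n_boot, rng):
    """Resample reads with replacement; return per-gene read fractions
    for each bootstrap replicate."""
    genes = list(counts)
    weights = [counts[g] for g in genes]
    total = sum(weights)
    replicates = []
    for _ in range(n_boot):
        sample = rng.choices(genes, weights=weights, k=total)
        tally = {g: 0 for g in genes}
        for g in sample:
            tally[g] += 1
        replicates.append({g: tally[g] / total for g in genes})
    return replicates


def fold_change_support(counts_a, counts_b, gene,
                        min_fold=2.0, n_boot=200, seed=0):
    """Fraction of paired bootstrap replicates in which the gene changes
    by at least min_fold in either direction (assumed decision rule)."""
    rng = random.Random(seed)
    reps_a = bootstrap_fractions(counts_a, n_boot, rng)
    reps_b = bootstrap_fractions(counts_b, n_boot, rng)
    pseudo = 1e-9  # avoids division by zero for unexpressed genes
    hits = 0
    for ra, rb in zip(reps_a, reps_b):
        ratio = (ra[gene] + pseudo) / (rb[gene] + pseudo)
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits += 1
    return hits / n_boot


# Toy read counts per gene in two conditions (made-up numbers).
cond_a = {"g1": 400, "g2": 100, "g3": 500}
cond_b = {"g1": 100, "g2": 100, "g3": 800}
print(fold_change_support(cond_a, cond_b, "g1"))
print(fold_change_support(cond_a, cond_b, "g2"))
```

A gene would be called differentially expressed when its support exceeds a chosen cutoff. Because every replicate requires a full re-estimation, the cost of the estimator dominates, which is why fast estimation matters here.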

Page 6: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

Computational Deconvolution of Heterogeneous Samples

• Goal: characterize expression of mesoderm progenitor cells
– Whole-transcriptome expression data for NSB cell mixtures + single-cell qPCR data for few genes

• Three-step approach
– Cluster single-cell qPCR data and infer “reduced” cell type signatures
– Infer mixing proportions based on reduced signatures using quadratic programming
– Infer full expression signatures based on mixing proportions, solving one quadratic program per gene
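
The second step above, inferring mixing proportions, can be written as a small quadratic program: minimize ||S p - m||^2 subject to p >= 0 and sum(p) = 1, where S holds the reduced signatures and m is the mixture expression. The sketch below solves it with projected gradient descent instead of a dedicated QP solver, purely to stay dependency-free. The function names and the toy signature matrix are assumptions, not taken from the talk.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        t = (cumulative - 1.0) / j
        if uj - t > 0:  # holds for a prefix of j; keep the last such t
            theta = t
    return [max(x - theta, 0.0) for x in v]


def mixing_proportions(signatures, mixture, n_iter=2000):
    """signatures[g][c]: expression of gene g in cell type c.
    mixture[g]: expression of gene g in the mixed sample.
    Returns proportions p minimizing ||S p - m||^2 on the simplex."""
    n_genes = len(signatures)
    n_types = len(signatures[0])
    # Step 1/L, where L = 2 * ||S||_F^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(x * x for row in signatures for x in row))
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(row[c] * p[c] for c in range(n_types)) - m
                    for row, m in zip(signatures, mixture)]
        # Gradient of the squared error: 2 * S^T (S p - m).
        grad = [2.0 * sum(signatures[g][c] * residual[g]
                          for g in range(n_genes))
                for c in range(n_types)]
        p = project_to_simplex([p[c] - step * grad[c]
                                for c in range(n_types)])
    return p


# Toy example (made-up values): 4 genes, 3 cell types, true mix 0.5/0.3/0.2.
S = [[10, 1, 1], [1, 10, 1], [1, 1, 10], [5, 5, 5]]
m = [5.5, 3.7, 2.8, 5.0]
print(mixing_proportions(S, m))
```

The third step has the same least-squares shape with the roles swapped: the proportions are fixed and the unknowns are one gene's expression levels across cell types, which gives one small quadratic program per gene.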

Page 7: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

Reference-Guided Transcriptome Reconstruction

[Figure: four transcripts, t1 to t4, drawn over exons 1 to 7]
t1: 1 2 3 4 5 6 7
t2: 1 3 4 5 6 7
t3: 1 2 3 4 5 7
t4: 1 3 4 5 7

Page 8: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

TRIP: Transcriptome Reconstruction using Integer Programming

• Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution
– empirically determined during library preparation
– implied by “mapping” read pairs

[Figure: read pair mapped to transcripts with exons 1 3 and exons 1 2 3 (labels: 500, 300, 200); fragment length distribution with mean 500, std. dev. 50]
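
The selection criterion can be illustrated on a toy case. The sketch below computes the fragment length a read pair implies on each candidate transcript, then finds the smallest transcript set under which every read pair has a plausible fragment length. It replaces the integer program with brute-force enumeration and uses a simple "within 2 standard deviations" rule as the notion of fit; both are simplifications chosen for the example. The exon lengths of 200 and the read positions are assumptions loosely based on the figure labels, and all function names are hypothetical.

```python
from itertools import combinations


def implied_fragment_length(transcript, exon_lengths, left, right):
    """Fragment length implied by a read pair on one transcript.

    transcript: ordered list of exon ids.
    left, right: (exon id, offset in exon) of the fragment's two ends.
    Returns None if an end falls in an exon the transcript lacks.
    """
    starts = {}
    position = 0
    for exon in transcript:
        starts[exon] = position
        position += exon_lengths[exon]
    if left[0] not in starts or right[0] not in starts:
        return None
    return (starts[right[0]] + right[1]) - (starts[left[0]] + left[1])


def fits(length, mean=500.0, std=50.0, max_z=2.0):
    """Assumed fit rule: implied length within max_z std. devs of the mean."""
    return length is not None and abs(length - mean) <= max_z * std


def smallest_explaining_set(transcripts, exon_lengths, read_pairs):
    """Smallest transcript subset giving every read pair a fitting length
    (exhaustive search; only practical for tiny examples)."""
    names = sorted(transcripts)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if all(any(fits(implied_fragment_length(
                            transcripts[t], exon_lengths, left, right))
                       for t in subset)
                   for left, right in read_pairs):
                return list(subset)
    return None


exon_lengths = {1: 200, 2: 200, 3: 200}
transcripts = {"t13": [1, 3], "t123": [1, 2, 3]}
pair = ((1, 50), (3, 150))  # one end in exon 1, the other in exon 3
for name, exons in transcripts.items():
    print(name, implied_fragment_length(exons, exon_lengths, *pair))
print(smallest_explaining_set(transcripts, exon_lengths, [pair]))
```

With a mean of 500 and a standard deviation of 50, the pair implies a length of 300 on the exon-skipping transcript (4 standard deviations short) and 500 on the full transcript, so only the latter is needed to explain it.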

Page 9: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

De Novo (Meta)Transcriptome Assembly of Bugula neritina and its Symbiont

• Uncultured bacterial symbiont produces bryostatins
- Symbiont absent in Northern Atlantic populations

Page 10: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

De Novo (Meta)Transcriptome Assembly of Bugula neritina and its Symbiont

• Developing a scalable multi-sample metatranscriptome assembly pipeline based on differential-coverage clustering of reads
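
The idea of differential-coverage clustering can be sketched as follows: sequences from the same organism should show similar coverage patterns across samples, so they can be grouped by the shape of their per-sample coverage vectors. The greedy cosine-similarity clustering below is only one simple way to do this and is not the pipeline from the talk. It assumes per-sample coverage has already been computed for each sequence, and the names, threshold, and coverage values are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity between two coverage vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (math.sqrt(sum(a * a for a in u))
            * math.sqrt(sum(b * b for b in v)))
    return dot / norm if norm else 0.0


def cluster_by_coverage(profiles, min_similarity=0.9):
    """Greedy clustering: each sequence joins the first cluster whose seed
    profile it resembles, otherwise it starts a new cluster.

    profiles: dict mapping sequence name -> coverage in each sample.
    Returns a list of clusters, each a list of sequence names.
    """
    clusters = []  # each entry: {"seed": coverage vector, "members": names}
    for name, vec in profiles.items():
        for cluster in clusters:
            if cosine(vec, cluster["seed"]) >= min_similarity:
                cluster["members"].append(name)
                break
        else:
            clusters.append({"seed": vec, "members": [name]})
    return [c["members"] for c in clusters]


# Toy coverage across three samples (made-up numbers): seq3 and seq4 have
# no coverage in the third sample, as a symbiont missing there would.
profiles = {
    "seq1": [30, 28, 31],
    "seq2": [60, 55, 64],
    "seq3": [12, 10, 0],
    "seq4": [25, 22, 0],
}
print(cluster_by_coverage(profiles))
```

Cosine similarity compares the shape of the coverage profile and ignores its scale, so sequences expressed at different levels can still fall into the same cluster.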

Page 11: Scalable Algorithms for  Next-Generation Sequencing Data Analysis

Acknowledgements

Sahar Al Seesi
Abdul Banday
Amir Bayegan
Gabriel Ilie
Caroline Jakuba

James Lindsay
Rahul Kanadia
Craig Nelson
Marius Nicolae

Adrian Caciula
Nicole Lopanik
Serghei Mangul
Yvette Temate Tiagueu
Alex Zelikovsky