1

Ray and Ray Cloud Browser for Metagenomics

Sébastien Boisvert @sebhtml
Université Laval, Québec, Canada

Beatles and Bioinformatics! #BeatlesAndBioinformatics University of Liverpool

27th November 2013 13:00

Talk: 40 minutes
Questions: 5 min


2

Where is Laval University ?

In Québec City


3

Canada is in the Commonwealth of Nations too !

● Canadian money

Photo: http://www.bridgeandtunnelclub.com/bigmap/outoftown/canada/money/


4

Super computing at Laval University

colosse
#314 top500 06/2012
7616 Intel Xeon X5560 cores
Mellanox Technologies MT26428
332 kW


5

Plan

● Background
● Parallelism
● Ray & metagenomics
● Compare samples with Surveyor
● Interactive visualization
● Futures


6

Background


7

We buy sequencers and computers but...

● We have:

– DNA sequencers to read genetic code (parallel)

– Supercomputers to compute stuff in the general sense (parallel)

Mardis, E. R. (2011, February). A decade's perspective on DNA sequencing technology. Nature 470 (7333), 198-203.

Sanger, F., S. Nicklen, and A. R. Coulson (1977). DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences 74 (12), 5463-5467.

Shendure, J. and H. Ji (2008, October). Next-generation DNA sequencing. Nature Biotechnology 26 (10), 1135-1145.

Sanger, F. (2001, March). The early days of DNA sequences. Nat Med 7 (3), 267-268.

Afuah, A. N. and J. M. Utterback (1991, December). The emergence of a new supercomputer architecture. Technological Forecasting and Social Change 40 (4), 315-328.


8

Trend

● However:

– Genomics needs more parallel software that scales with biology's huge problems

Pollack, A. (2011). DNA sequencing caught in deluge of data. New York Times 1.

Baker, M. (2010, July). Next-generation sequencing: adjusting to data overload. Nature Methods 7 (7), 495-499.

Trelles, O., P. Prins, M. Snir, and R. C. Jansen (2011, February). Big data, but are we ready? Nature Reviews Genetics 12 (3), 224.

(2013, October). In need of an upgrade. Nature Biotechnology 31 (10), 857.

McPherson, J. D. (2009, November). Next-generation gap. Nature Methods 6 (11 Suppl), S2-S5.

Mardis, E. (2010). The $1,000 genome, the $100,000 analysis? Genome Medicine 2 (11), 84+.


9

I created some useful software

● Ray: genome assembly, metagenome assembly, taxonomic profiling, sample comparison

● RayPlatform: platform on which Ray is built

● Ray Cloud Browser: visualization of large genome graphs


10

In this talk

● Ray (C++, started with bacterial genome assembly)
● Ray Meta (assembling metagenomes with Ray)
● Ray Communities (profiling metagenomes with Ray)
● Ray Surveyor (comparing DNA sequencing samples without reference; Ray -run-surveyor)
● Ray Cloud Browser (separate project)


11

Our original idea in 2010

● Mixing reads from different technologies (454 + Illumina)

● 2010 paper about Ray heuristics:

Boisvert, S., F. Laviolette, and J. Corbeil (2010, November). Ray: simultaneous assembly of reads from a mix of high-throughput sequencing technologies. Journal of Computational Biology 17 (11), 1519-1533.


12

Mixing sequencing reads

Figure from: Journal of Computational Biology 17 (11), 1519-1533.


13

Platform

● Goal: build a platform for distributed genomic computing

● Thread-based programming is hard
● Message passing is easy to understand and scales, but is harder to program
● Solution: a framework to abstract everything


14

Platform perks

● Plugin interface
● Actor model interface

● Runtimes:

– Actor playground

– Standard mode

– Mini-ranks


15

RayPlatform's scalability

● Ray's scalability is measurable

Sample SRS011098 from Human Microbiome Project (202 487 723 reads)

Figure from:

Godzaridis, Boisvert, et al. Big Data (accepted)


16

Parallelism


17

Software should be parallel too

● Highly parallel genomic assays

Nature Reviews Genetics 7, 632-644 (August 2006)

● Couple of reviews about need for speed

Flicek, P. (2009, March). The need for speed. Genome biology 10 (3), 1-4.

Bonetta, L. (2006, February). Genome sequencing in the fast lane. Nature Methods 3 (2), 141-147.

Schatz, M. C., B. Langmead, and S. L. Salzberg (2010, July). Cloud computing and the DNA data race. Nature Biotechnology 28 (7), 691-693.


18

What is concurrency

● Several actions performed simultaneously during a period of time

● Example: give 1000000 sequences to 10 computers: each processes 100000 seq. simultaneously

● Threads are local to 1 computer
● Processes can be distributed
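
The example above (1000000 sequences, 10 workers, 100000 sequences each) can be sketched in a few lines. This is only an illustration: Python's standard `concurrent.futures` thread pool stands in for the 10 computers, and `split_evenly`, `count_bases` and `process_concurrently` are hypothetical names, not part of Ray. Ray itself distributes work across processes with message passing rather than a local thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, workers):
    """Split items into `workers` contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), workers)
    chunks, start = [], 0
    for rank in range(workers):
        size = base + (1 if rank < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def count_bases(chunk):
    """Toy per-worker task: total number of bases in a chunk of sequences."""
    return sum(len(sequence) for sequence in chunk)


def process_concurrently(sequences, workers=10):
    """Give each worker one chunk and combine the partial results."""
    chunks = split_evenly(sequences, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(count_bases, chunks))
    return chunks, sum(partial_results)
```

With 1000000 input sequences and `workers=10`, `split_evenly` returns 10 chunks of 100000 sequences each. Because threads are local to one computer, a distributed version would replace the pool with processes on several machines.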


19

Actor model for programming genomic tools

● In a nutshell: actors send messages to each other and can spawn actors

● Video: http://channel9.msdn.com/Shows/Going+Deep/Hewitt-Meijer-and-Szyperski-The-Actor-Model-everything-you-wanted-to-know-but-were-afraid-to-ask

Hewitt, C., P. Bishop, and R. Steiger (1973). A universal modular ACTOR formalism for artificial intelligence. In Proceedings of the 3rd international joint conference on Artificial intelligence, IJCAI'73, San Francisco, CA, USA, pp. 235-245. Morgan Kaufmann Publishers Inc.

Agha, G. (1986). Actors: a model of concurrent computation in distributed systems. Cambridge, MA, USA: MIT Press.
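
The nutshell definition above (actors send messages to each other and can spawn actors) can be made concrete with a toy, single-threaded sketch. All class names here (`ActorSystem`, `Actor`, `Master`, `Worker`) and the message layout `(kind, payload, sender)` are assumptions made for illustration; this is not RayPlatform's actual interface.

```python
from collections import deque


class ActorSystem:
    """Toy scheduler: one global queue of (target, message) pairs."""

    def __init__(self):
        self.actors = {}
        self.queue = deque()

    def spawn(self, actor_class, name):
        self.actors[name] = actor_class(self, name)
        return name

    def deliver(self, target, message):
        self.queue.append((target, message))

    def run(self):
        # Each actor handles one message at a time, in arrival order.
        while self.queue:
            target, message = self.queue.popleft()
            self.actors[target].receive(message)


class Actor:
    def __init__(self, system, name):
        self.system = system
        self.name = name

    def send(self, target, message):
        self.system.deliver(target, message)

    def spawn(self, actor_class, name):
        return self.system.spawn(actor_class, name)

    def receive(self, message):
        raise NotImplementedError


class Worker(Actor):
    def receive(self, message):
        kind, payload, sender = message
        if kind == "count":
            # Reply to whoever asked, with the number of bases in the batch.
            bases = sum(len(sequence) for sequence in payload)
            self.send(sender, ("result", bases, self.name))


class Master(Actor):
    def __init__(self, system, name):
        super().__init__(system, name)
        self.total = 0
        self.replies = 0

    def receive(self, message):
        kind, payload, sender = message
        if kind == "start":
            # Spawn one worker per batch and send it its share of the work.
            for index, batch in enumerate(payload):
                worker = self.spawn(Worker, f"worker-{index}")
                self.send(worker, ("count", batch, self.name))
        elif kind == "result":
            self.total += payload
            self.replies += 1
```

Usage: spawn a `Master`, deliver it `("start", batches, None)`, then call `run()`. No actor reads another actor's state; everything goes through messages, which is what makes the model easy to distribute.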


21

Ray & metagenomics


22

Metagenomics (started in 1998)

● DNA sequencing is cheap
● Bacteria in complex communities cannot be cultured easily
● Metagenomics: direct DNA sequencing from uncultured microorganisms
● Field started by Jo Handelsman in 1998

Handelsman, J. (2004, December). Metagenomics: Application of genomics to uncultured microorganisms. Microbiology and Molecular Biology Reviews 68 (4), 669-685.

The microbiome explored: recent insights and future challenges. Blaser, Bork, Fraser, Knight & Wang Nature Reviews Microbiology 11, 213-217 (March 2013)

Handelsman et al. (Oct 1998) Chemistry & biology 5 (10).


23

Existing metagenomic tools do ABC, we do XYZ

● Metagenomic sequencing data must be analyzed
● Methods A, B, C (16S = metagenomics)
● We propose X, Y and Z (whole genome shotgun + k-mers)

● Also, so many choices (tools, sequencers), most do ABC, we do XYZ

Loman, N. J., C. Constantinidou, J. Z. Chan, M. Halachev, M. Sergeant, C. W. Penn, E. R. Robinson, and M. J. Pallen (2012, September). High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nature Reviews Microbiology 10 (9), 599-606.

Kahvejian, A., J. Quackenbush, and J. F. Thompson (2008, October). What would you do if you could sequence everything? Nature Biotechnology 26 (10), 1125-1133.

Metagenomics: DNA sequencing of environmental samples Nature Reviews Genetics 6, 805-814 (November 2005)


24

Some concepts

● Taxonomy: the branch of science concerned with classification, especially of organisms; systematics.

● Taxon: taxonomic group
● Taxonomic tree: a tree of taxa
● Leaf: a tree node without children
● OTU: operational taxonomic unit


25

Taxonomic profiling with kmers

● Kmers: DNA words of length k
● Given (1) a taxonomic tree and (2) data (usually reads or kmers) on the tree's leaves
● LCA: Last Common Ancestor to classify each kmer to a node (possibly not a leaf)
● Colored = labeled with a taxon or genome identifier
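
The bullets above can be sketched on a toy tree. This is a minimal illustration, not Ray Communities' implementation: the tree is a hypothetical child-to-parent dictionary, the function names are invented for this example, and reverse complements are ignored for brevity.

```python
def kmers(sequence, k):
    """All DNA words of length k in a sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def path_to_root(parent, node):
    """Nodes from `node` up to the root, in that order."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def last_common_ancestor(parent, nodes):
    """Deepest node that lies on the root path of every given node."""
    nodes = list(nodes)
    common = path_to_root(parent, nodes[0])
    for node in nodes[1:]:
        other = set(path_to_root(parent, node))
        common = [candidate for candidate in common if candidate in other]
    return common[0]


def color_kmers(genomes, k):
    """Color each k-mer with the set of leaves (genomes) that contain it."""
    colors = {}
    for leaf, sequence in genomes.items():
        for kmer in kmers(sequence, k):
            colors.setdefault(kmer, set()).add(leaf)
    return colors


def classify_kmers(parent, genomes, k):
    """Assign each k-mer to the last common ancestor of its colors."""
    return {
        kmer: last_common_ancestor(parent, sorted(leaves))
        for kmer, leaves in color_kmers(genomes, k).items()
    }
```

A k-mer found in a single genome stays on that leaf; a k-mer shared by several genomes moves up to an internal node, which is why the classification is possibly not a leaf.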


26

Examples

● Annotation with k-mers: Edwards, R. A., R. Olson, T. Disz, G. D. Pusch, V. Vonstein, R. Stevens, and R. Overbeek (2012, December). Real time metagenomics: using k-mers to annotate metagenomes. Bioinformatics (Oxford, England) 28 (24), 3316-3317.

● “Ray Communities” => Boisvert et al. 2012 Genome Biology

● Scalable taxonomic assignment: Ames, S. K., D. A. Hysom, S. N. Gardner, G. S. Lloyd, M. B. Gokhale, and J. E. Allen (2013, September). Scalable metagenomic taxonomy classification using a reference genome database. Bioinformatics 29 (18), 2253-2260.


27

Profile with kmers using Ray Communities

● Genome abundance
● Taxon abundance (good correlation with Metaphlan)
● Gene Ontology


28

UniFrac is mathematically sound

● Use taxon profiles
● UniFrac: distance between 2 community samples

Lozupone, C. and R. Knight (2005, December). UniFrac: a new phylogenetic method for comparing microbial communities. Applied and Environmental Microbiology 71 (12), 8228-8235.
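
As a rough sketch of the idea, the unweighted form of UniFrac can be computed as the fraction of branch length that leads to taxa from only one of the two samples. The tree encoding (child mapped to a `(parent, branch_length)` pair) and the function names are assumptions made for this example; a real analysis would rely on an established implementation.

```python
def covered_branches(tree, leaves):
    """Branches (named by their child node) on the paths from leaves to root."""
    covered = set()
    for node in leaves:
        while node in tree:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(tree, sample_a, sample_b):
    """Unique branch length divided by total branch length of both samples.

    Assumes at least one sample is non-empty (otherwise the total is zero).
    """
    branches_a = covered_branches(tree, sample_a)
    branches_b = covered_branches(tree, sample_b)

    def length(branches):
        return sum(tree[node][1] for node in branches)

    unique = length(branches_a ^ branches_b)
    total = length(branches_a | branches_b)
    return unique / total
```

Two identical samples give 0, and two samples that share no branch give 1.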


29

Ray Meta

● “Ray Meta” => metagenome assembly with Ray
● Binning with coverage may not be accurate because coverage depth changes with GC content and other factors

● Ray trick: instead of binning with coverage, bin with graph seeds (locality)

Boisvert, S., F. Raymond, E. Godzaridis, F. Laviolette, and J. Corbeil (2012, December). Ray meta: scalable de novo metagenome assembly and profiling. Genome Biology 13 (12), R122+.

● http://genomebiology.com/2012/13/12/R122


30

Assembled proportions of bacterial genomes for a simulated metagenome with sequencing errors

1000 bacterial genomes with power law distribution
3*10^9 reads
Simulated errors
Figure 1, Boisvert et al. 2012 Genome Biology

Good assembly proportion of contained genomes within metagenome


31

Estimated bacterial genome proportions

● With kmers
● Uniquely-colored k-mers

A: 100-genome metagenome

B: 1000-genome metagenome

Figure 2, Boisvert et al. 2012 Genome Biology


32

Enterotypes

● 3 enterotypes: Arumugam, M. (...) and P. Bork (2011, April). Enterotypes of the human gut microbiome. Nature 473 (7346), 174-180.

● 2 enterotypes: Wu, G. D. (...) and J. D. Lewis (2011, October). Linking long-term dietary patterns with gut microbial enterotypes. Science (New York, N.Y.) 334 (6052), 105-108.

● Can we reproduce that with k-mer-based classification?


33

Reproduction of enterotypes with k-mer based profiling

● Data: Qin et al. 2010 Nature (MetaHIT)

Figure 4, Boisvert et al. 2012 Genome Biology


34

Some quotes

● Snake assembly in Assemblathon 2:

“The Ray assembly was ranked 1st overall, and also ranked 1st for all individual measures except multiplicity (where it still had a better than average performance).” GigaScience 2013, 2:10

● E. coli sequencing on MiSeq:

“Ray stood apart as the most accurate of the three assemblers, based on the number of inversions, relocations, SNPs, and a visual inspection of the associated dot plots” BMC Genomics 2013, 14:675

● “Ray will be a good validation assembler” Bastien Chevreux (Mira assembler author) http://article.gmane.org/gmane.science.biology.ray-genome-assembler/696


36

Using a graph to mine variation

Bubble caused by variation or sequencing error


37

Comparing metagenome samples

● Idea: compare samples without a reference
● Be it variants, or kmer content
● For kmer presence/absence, don't use coverage
● For RNA-Seq or taxon abundances, compare normalized kmer counts
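
The two modes of comparison above can be illustrated with small helpers. The specific measures used here (Jaccard similarity for presence/absence, Manhattan distance between normalized profiles) are stand-ins chosen for this sketch, not measures stated in the talk, and the function names are hypothetical.

```python
def presence_absence_similarity(counts_a, counts_b):
    """Jaccard similarity on k-mer sets; coverage (the counts) is ignored."""
    kmers_a, kmers_b = set(counts_a), set(counts_b)
    return len(kmers_a & kmers_b) / len(kmers_a | kmers_b)


def normalize(counts):
    """Turn raw k-mer counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


def abundance_distance(counts_a, counts_b):
    """Manhattan distance between two normalized k-mer profiles."""
    profile_a, profile_b = normalize(counts_a), normalize(counts_b)
    all_kmers = set(profile_a) | set(profile_b)
    return sum(
        abs(profile_a.get(kmer, 0.0) - profile_b.get(kmer, 0.0))
        for kmer in all_kmers
    )
```

Normalizing first keeps a deeply sequenced sample from looking different from a shallow one merely because it has more reads.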


38

Compare genomic content without a ref. with Surveyor

● Set of biological samples
● DNA sequencing for each
● Use Actor Model to compare a lot of samples
● Build a de Bruijn graph that contains all of them (à la fermi or Cortex), but distributed
● In development

Iqbal, Z., I. Turner, and G. McVean (2013, January). High-throughput microbial population genomics using the cortex variation assembler. Bioinformatics 29 (2), 275-276. Cortex for microbial populations

Iqbal, Z., M. Caccamo, I. Turner, P. Flicek, and G. McVean (2012, February). De novo assembly and genotyping of variants using colored de bruijn graphs. Nature Genetics 44 (2), 226-232. Cortex

Li, H. (2012, July). Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly. Bioinformatics 28 (14), 1838-1844. Fermi
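
A de Bruijn graph that contains several samples at once can be sketched by coloring each k-mer with the samples in which it occurs. This is a single-process toy with hypothetical function names, not the distributed implementation in Ray Surveyor; reverse complements and sequencing errors are ignored.

```python
from collections import defaultdict


def build_colored_graph(samples, k):
    """Map each k-mer (vertex) to the set of samples (colors) containing it."""
    colors = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                colors[read[i:i + k]].add(name)
    return dict(colors)


def successors(colors, kmer):
    """Edges are implicit: follow k-mers that overlap by k-1 symbols."""
    return [
        kmer[1:] + base
        for base in "ACGT"
        if kmer[1:] + base in colors
    ]


def shared_kmers(colors, sample_a, sample_b):
    """K-mers present in both samples."""
    return {
        kmer
        for kmer, names in colors.items()
        if sample_a in names and sample_b in names
    }
```

A vertex with two successors of different colors is where the samples diverge, which is the starting point of a bubble.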


39

Ray -run-surveyor

● Existing methods enumerate variation entries
● Genomic word content may also be interesting
● Compare many samples (their kmer content)


40

Legionella

● 2012 outbreak in Quebec City
● What's the source of contamination?
● 3 suspect cooling towers
● On the Illumina MiSeq


41

Samples

● 22 patient-samples
● 3 source-tower-samples (metagenomic)
● 2 epidemic-strain-environmental-samples
● 7 environmental-samples
● 4 contemporaneous-samples
● 5 old-1996-samples


42

Questions

● Are the 2012 strains similar to the 1996 (also in Québec City) strains ?

● Which cooling tower is the most-likely source of contamination ?


43

Similarity matrix (k spectrum kernel)

Ref. for spectrum kernel: Leslie, C., E. Eskin, and W. S. Noble (2002). The spectrum kernel: a string kernel for SVM protein classification. Pacific Symposium on Biocomputing, 564-575.
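
The k-spectrum kernel of two samples is the inner product of their k-mer count vectors; a similarity matrix evaluates it for every pair of samples. The sketch below uses hypothetical function names and raw counts; replacing every count with 1 would give the presence/absence variant. Which variant Ray Surveyor uses is not stated on the slide.

```python
from collections import Counter


def spectrum(reads, k):
    """K-mer count vector (the k-spectrum) of a set of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def spectrum_kernel(spectrum_x, spectrum_y):
    """Inner product of two k-spectra (missing k-mers count as 0)."""
    if len(spectrum_x) > len(spectrum_y):
        spectrum_x, spectrum_y = spectrum_y, spectrum_x
    return sum(count * spectrum_y[kmer] for kmer, count in spectrum_x.items())


def similarity_matrix(samples, k):
    """Kernel value for every pair of samples, in sorted sample order."""
    names = sorted(samples)
    spectra = {name: spectrum(samples[name], k) for name in names}
    matrix = [
        [spectrum_kernel(spectra[a], spectra[b]) for b in names]
        for a in names
    ]
    return names, matrix
```

The matrix is symmetric, and its diagonal holds each sample's similarity with itself.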


44

Kernel-based distance matrix

For kernel distance formula: Schölkopf, B. (2000). The kernel trick for distances. In NIPS, pp. 301-307.

d(x, y)^2 = k(x, x) + k(y, y) - 2 k(x, y)
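
Applied to a matrix of kernel values, the formula turns similarities into distances. A minimal sketch, assuming the input is a square, symmetric list of lists of kernel values (the function names are hypothetical):

```python
import math


def kernel_distance(k_xx, k_yy, k_xy):
    """d(x, y) from the kernel values k(x, x), k(y, y) and k(x, y)."""
    squared = k_xx + k_yy - 2 * k_xy
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))


def distance_matrix(kernel_matrix):
    """Pairwise distances derived from a square matrix of kernel values."""
    size = len(kernel_matrix)
    return [
        [
            kernel_distance(
                kernel_matrix[i][i], kernel_matrix[j][j], kernel_matrix[i][j]
            )
            for j in range(size)
        ]
        for i in range(size)
    ]
```

For example, with k(x, x) = 3, k(y, y) = 3 and k(x, y) = 2, the squared distance is 3 + 3 - 4 = 2.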


45

Tree

Towers are outliers and their placement may not be accurate.


46

Similarity between patient samples & tower samples

               towers/002-1  towers/006-1  towers/010-1
pat/ID120206          11187         12528         11329
pat/ID120368          11168         12513         11315
pat/ID120369          11282         12617         11427
pat/ID120370          11272         12613         11421
pat/ID120371          11289         12621         11434
pat/ID120713          11225         12566         11368
pat/KID119442         11092         12445         11239
pat/KID119444         11097         12449         11244
pat/KID119445         11117         12468         11261
pat/KID119536         11138         12488         11287
pat/KID119537         11175         12518         11321
pat/KID119788         11193         12536         11336
pat/KID119957         11092         12445         11239
pat/KID119958         11144         12494         11292
pat/KID119960         11265         12602         11408
pat/KID120069         11089         12442         11236
pat/KID120070         11154         12501         11299
pat/KID120071         11116         12467         11261
pat/KID120111         11219         12559         11365
pat/KID120112         11172         12518         11319
pat/KID120113         11357         12686         11497
pat/KID120114         11235         12577         11381

Smallest distance


47

Interactive visualization


48

Visualizing a microbiota with nucleic acid probes

Figure 2, Handelsman (2004) Microbiology and Molecular Biology Reviews 68 (4), 669-685.


49

Observation

● Visualization is important to reach out to the general public

● People like beautiful things


50

Structural metagenomics visualization

● Ray Cloud Browser
● Project started to debug genome assembly code
● http://genome.ulaval.ca:10208/client/
● All you need is a modern web browser


51

Ray Cloud Browser: interactively skim processed genomics data with energy

Frontend: Javascript, canvas

Backend: C++

https://github.com/sebhtml/Ray-Cloud-Browser


52

Computing DNA layout for display

Barnes-Hut algorithm: Barnes, J. and P. Hut (1986, December). A hierarchical O(N log N) force-calculation algorithm. Nature 324 (6096), 446-449.


53

Evolution path: linear -> bubble -> hairy bubble -> super bubble

Onodera, T., K. Sadakane, and T. Shibuya (2013). Detecting superbubbles in assembly graphs. In A. Darling and J. Stoye (Eds.), Algorithms in Bioinformatics, Volume 8126 of Lecture Notes in Computer Science, pp. 338-348. Springer Berlin Heidelberg.

Hairy bubbles


54

Interactive too


55

Bird's view


56

Lumps

Howe, A. C., J. Pell, R. Canino-Koning, R. Mackelprang, S. Tringe, J. Jansson, J. M. Tiedje, and C. T. Brown (2012, December). Illumina sequencing artifacts revealed by connectivity analysis of metagenomic datasets.

http://dskernel.blogspot.ca/2013/01/metagenome-lumps-artifactual-mutations.html


58

Lumps


59

Lumps


60

SRS011134

● Demo (2 min): http://genome.ulaval.ca:10208/client/
● Genomic DNA from stool of a male
● http://sra.dnanexus.com/samples/SRS011134


62

Futures

● Genomics needs more scalable & parallel software
● More parallel
● More push-button
● Robustness
● K-mer-based (paper: realtime kmers)


64

Acknowledgements

● Invitation: Nicholas J. Loman, University of Birmingham

● Arrangements: Lesley Parsons, University of Liverpool


65

Acknowledgements

● Funding: Canadian Institutes of Health Research (doctoral award)

● Compute time: Compute Canada & Calcul Québec (colosse and Mammouth Parallèle II)

● Jacques Corbeil (director) & François Laviolette (codirector)


66

Acknowledgements

● Jean-François Erdelyi (from France) for working on Ray Cloud Browser during the 2013 summer


67

Acknowledgements

● E. Godzaridis for comments and suggestions for my talk


68

Questions

● Don't forget to tweet!
● @sebhtml
● #BeatlesAndBioinformatics