Pipeline or pipe dream?
Transitioning a public health microbiology laboratory network to WGS & bioinformatics
Dr Torsten Seemann
1st Midlands Molecular Microbiology Meeting - Mon 15 Sep 2014 - Birmingham, UK
Introduction
Only 4 hours until drinks.
(Very) South East Midlands
About me
● Previous life
o B.Sc - computer science, data compression
o B.E - elec & comp sys engineering (abandoned)
o Ph.D - digital image processing
● Bioinformatician: microbial genomics
o primarily bacterial pathogens
o genomics data analysis
o tool development: Prokka, Nesoni, VelvetOptimiser
Nomadic bioinformatics
Microbial Diagnostic Unit
● Oldest public health lab in Australia
o established 1897 in Melbourne
o large historical isolate collection back to 1950s
● National reference laboratory
o Salmonella, Listeria, EHEC
● WHO regional reference lab
o vaccine preventable invasive bacterial pathogens
New director
● Professor Ben Howden
o clinician, microbiologist, pathologist
o early adopter of genomics and bioinformatics
o long term collaborator on MRSA and VRE
● Mandate
o modernise service delivery
o enhance research output and collaboration
o nationally lead the conversion to WGS
Transitioning
“If you want to make enemies, try to change something.”
Existing workflow
Traditional typing
● PFGE - Pulsed Field Gel Electrophoresis
● MLST - Multi-Locus Sequence Typing
● MLVA - Multi-Locus VNTR Analysis
Drawbacks
● Low resolution
o only gives rough idea of relationship
● Labour intensive
o lots of tedious lab work
● Relatively expensive
o in time and consumables
A single assay
● Whole Genome Sequencing (WGS)
o backward compatible with most existing typing
● Complete snapshot
o all variation: SNVs, insertions, structural changes
o plasmids, phage, resistance & virulence genes
● High throughput
o now cheaper (<£50) and faster (<24h)
New workflow
Refocussing
● Small scale → big scale
o orders of magnitude increase in samples processed
● Manual → automated
o robots for colony picking & library preparation
● Benchtop → desktop
o bioinformatics, visualization, interpretation
● Paper → electronic
o data storage, backups, LIMS system
Implementation
Warning: may contain some bioinformatics
The WGS assay
● Millions of DNA sequences
● Typically 50-300 bp each
● Includes quality information
● File size ~ 1 gigabyte
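
The slide does not name a file format, but short reads with per-base quality are commonly delivered as FASTQ. A minimal sketch of reading such a file, assuming four-line FASTQ records and Phred+33 quality encoding (both assumptions; `parse_fastq`, `mean_quality` and the two reads are illustrative, not from the talk):

```python
from io import StringIO
from typing import Iterator, TextIO, Tuple


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read, assuming Phred+33 encoding."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


# Two made-up reads, for illustration only.
example = StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nTTGACCA\n+\n!!!!!!!\n"
)
reads = list(parse_fastq(example))
qualities = [mean_quality(q) for _, _, q in reads]
```

A real run would stream a (usually gzip-compressed) file instead of holding all reads in a list, since a typical file is around a gigabyte.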
Using short reads
● Read mapping
o align all reads to a reference genome
● De novo assembly
o reconstruct the source replicons into contigs
● Alignment-free methods
o examine the nucleotide content of the reads directly
Read mapping
● Choose an existing reference genome
● Find best fit for each read on the reference
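
Finding the best fit for a read can be illustrated with a brute-force search. This is a toy sketch, assuming ungapped placement scored by mismatch count; production read mappers use an indexed reference and allow gaps, so it is not how any real tool works. The function names and sequences are made up for illustration.

```python
from typing import Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_fit(read: str, reference: str) -> Tuple[int, int]:
    """Return (position, mismatches) of the best ungapped placement of read.

    Ties are resolved in favour of the leftmost position.
    """
    best_pos, best_mm = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        mm = hamming(read, reference[pos:pos + len(read)])
        if mm < best_mm:
            best_pos, best_mm = pos, mm
    return best_pos, best_mm


reference = "ACGTTGCATGACCGTA"            # made-up reference
exact = best_fit("GCATGA", reference)     # read identical to the reference
one_snv = best_fit("GCTTGA", reference)   # same read with one substitution
```

The second read still lands at the same position with a single mismatch, which is the signal a variant caller would later interpret as a candidate SNV.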
Use case: read mapping
Genome deletions
● Regions in reference where no reads align
● DNA not present in sequenced isolate
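
The idea can be sketched as a coverage scan: count how many reads cover each reference position, then report the stretches with zero depth. This assumes alignments are already available as 0-based (start, length) pairs; the helper names and numbers are hypothetical.

```python
from typing import List, Optional, Tuple


def coverage(ref_len: int, alignments: List[Tuple[int, int]]) -> List[int]:
    """Per-base read depth from 0-based (start, length) alignments."""
    depth = [0] * ref_len
    for start, length in alignments:
        for pos in range(start, min(start + length, ref_len)):
            depth[pos] += 1
    return depth


def uncovered_regions(depth: List[int]) -> List[Tuple[int, int]]:
    """Half-open (start, end) intervals where depth is zero."""
    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, d in enumerate(depth):
        if d == 0 and start is None:
            start = pos                    # a gap opens here
        elif d > 0 and start is not None:
            regions.append((start, pos))   # the gap closed just before pos
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


# Made-up alignments on a 20 bp reference: nothing maps to positions 8-13.
depth = coverage(20, [(0, 5), (3, 5), (14, 6)])
gaps = uncovered_regions(depth)
```

In practice a minimum gap length would probably be applied as well, because short zero-depth stretches can also arise from low coverage or from repeats that reads cannot be placed in uniquely.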
De novo assembly
Like a jigsaw puzzle, except
● we don’t have the box (unknown target)
● missing pieces (coverage bias)
● broken pieces (sequencing errors)
● duplicate pieces (repeats)
● disconnected sub-puzzles (multiple replicons)
● random pieces from another puzzle (contamination)
● no corner or edge pieces (circular genomes)
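
The jigsaw analogy can be made concrete with a toy greedy assembler that repeatedly joins the two pieces with the longest suffix/prefix overlap. This is a simplified stand-in: assemblers such as Velvet build de Bruijn graphs instead, and this sketch ignores sequencing errors, reverse complements, repeats and reads contained inside other reads. All names and sequences are invented for illustration.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge the best-overlapping pair until no overlap >= min_len remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, -1, -1)  # (overlap size, index of left, index of right)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return contigs
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


# Three overlapping reads from one made-up sequence plus one unrelated read,
# standing in for contamination or a second replicon.
reads = ["ATGGCGT", "GCGTGCA", "TGCAATCC", "CCCGGGTT"]
contigs = greedy_assemble(reads)
```

The unrelated read stays as its own contig, which mirrors the "disconnected sub-puzzles" and "pieces from another puzzle" cases in the list above.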
Use case: de novo assembly
● Novel DNA
o mobile elements
o plasmids
o phage
● Structural changes
o inversions & rearrangements
o large insertions & deletions
o plasmid integration
[Figure: Mutant vs. Wildtype]
k-mer analysis
● Build a “signature” from all sub-reads of length k
● Compare signature to database of signatures of known genomes
k=4
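
A minimal sketch of the two steps, using k=4 as on the slide. The "signature" here is a plain count of every length-k substring, and signatures are compared with Jaccard similarity on k-mer presence. That comparison is an assumption chosen for simplicity, not the method of any particular tool, and real tools typically use much longer k-mers. The database contents and function names are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_signature(reads: Iterable[str], k: int = 4) -> Counter:
    """Count every substring of length k across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def jaccard(sig_a: Counter, sig_b: Counter) -> float:
    """Shared k-mers divided by all distinct k-mers (presence/absence only)."""
    a, b = set(sig_a), set(sig_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def closest_genome(reads: Iterable[str], database: Dict[str, str], k: int = 4) -> str:
    """Name of the database genome whose k-mer set best matches the reads."""
    sig = kmer_signature(reads, k)
    return max(
        database,
        key=lambda name: jaccard(sig, kmer_signature([database[name]], k)),
    )


# Toy database of two made-up "genomes".
database = {"genome_A": "ACGTACGTTGCA", "genome_B": "GGGCCCTTTAAAGG"}
reads = ["ACGTACG", "CGTTGCA"]
best = closest_genome(reads, database)
```

No alignment is performed at any point: the reads are only broken into k-mers and looked up, which is what makes this family of methods fast.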
Use case: k-mer analysis
 1.04   1046   1046  U        0  unclassified
98.96  99624    142  -        1  root
98.81  99473      1  -   131567  cellular organisms
98.81  99472    194  D        2  Bacteria
98.57  99233    111  P     1224  Proteobacteria
98.45  99110    318  C     1236  Gammaproteobacteria
98.07  98728      0  O    91347  Enterobacteriales
98.07  98728  52477  F      543  Enterobacteriaceae
44.95  45256    665  G      561  Escherichia
44.20  44498  33391  S      562  Escherichia coli
 8.84   8899   8899  -  1274814  Escherichia coli APEC O78
 0.29    287      0  -   244319  Escherichia coli O26:H11
 0.29    287    287  -   573235  Escherichia coli O26:H11 str 11368
 0.21    216    216  -   316401  Escherichia coli ETEC H10407
 0.19    193      0  -   168807  Escherichia coli O127:H6
 0.19    193    193  -   574521  Escherichia coli O127:H6 str E2348/69
http://ccb.jhu.edu/software/kraken
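
A report like the one above can be read programmatically. The sketch below assumes the six whitespace-separated columns of a Kraken-style report: percentage of reads in the clade, reads in the clade, reads assigned directly to the taxon, rank code, NCBI taxonomy ID, and name. `ReportRow` and `parse_report` are illustrative names, and the rows are a subset copied from the slide.

```python
from typing import List, NamedTuple


class ReportRow(NamedTuple):
    percent: float      # % of reads in this clade
    clade_reads: int    # reads in this clade (taxon plus descendants)
    direct_reads: int   # reads assigned directly to this taxon
    rank: str           # U, D, P, C, O, F, G, S or '-'
    taxid: int          # NCBI taxonomy ID
    name: str


def parse_report(text: str) -> List[ReportRow]:
    """Parse report lines; the name is the last field and may contain spaces."""
    rows = []
    for line in text.strip().splitlines():
        pct, clade, direct, rank, taxid, name = line.split(None, 5)
        rows.append(
            ReportRow(float(pct), int(clade), int(direct), rank, int(taxid), name.strip())
        )
    return rows


report = """\
 1.04 1046 1046 U 0 unclassified
98.96 99624 142 - 1 root
98.07 98728 52477 F 543 Enterobacteriaceae
44.95 45256 665 G 561 Escherichia
44.20 44498 33391 S 562 Escherichia coli
 8.84 8899 8899 - 1274814 Escherichia coli APEC O78
"""
rows = parse_report(report)

# Best-supported species call, and total reads (unclassified + root clade).
species = max((r for r in rows if r.rank == "S"), key=lambda r: r.percent)
total_reads = rows[0].clade_reads + rows[1].clade_reads
```

Here the species-level call is Escherichia coli at 44.20% of reads, while more than half of the family-level reads (52477) could not be resolved below Enterobacteriaceae.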
Progress
Implement. Deploy. ?????. Profit $$$.
Current status
● Sequencer
o replacing MiSeq with NextSeq-500 + robots
● Software
o most components written, some incomplete
o not a fully automated pipeline yet
o no friendly user interface yet
● Need a project name!
A vision for Australia
● A common online system for all labs
o upload samples
o automated analysis pipelines (some customization for each genus)
o easy submission to ENA and Genbank
● Access control
o each lab controls their own data
o jurisdictions can share data in outbreaks
Cooperation
● International
o Global Microbial Identifier consortium
● UK
o PHE, ngMicrobes, CLIMB, ENA
● USA
o FDA GenomeTrakr, NCBI SRA
● NZ
o New Zealand already participating in our PHLN
Conclusion
Your post-prandial blood glucose is now peaking.
Resistance to change
● Protecting empires
o “this is how we’ve always done it”, job redundancies
● Expense of instruments
o capital purchase, maintenance, new staff
● Fear of the unknown
o lack bioinformatics, infrastructure, software, training
● Legal requirements
o must do PFGE, validation, accreditation
Upcoming .au meetings
● Lorne Infection & Immunity
o Feb 2015 @ Lorne (beach, near Melbourne)
● Australian ASM
o Jul 2015 @ Canberra (near Parliament)
● BacPath
o Sep 2015 @ San Remo (beach, near Melbourne)
Opportunities
● Warwick-Monash Alliance
o seed funding, joint positions, shared PhD students
o www.monashwarwick.org
● Birmingham-Melbourne
o both members of Universitas 21
o www.universitas21.com
Acknowledgements
Uni Brum: Nick Loman, Ian Henderson, Cathy Wardius, Allie Hardwick
Loman family
My family
Monash Uni: David Powell, Dieter Bulach, Ross Coppel
Uni Melb: Tim Stinear, Jason Kwong
MDU: Ben Howden, Kim Barton
VLSCI: Andrew Lonie, Helen Gardiner
Contact
[email protected]
@torstenseemann
Blog TheGenomeFactory.blogspot.com
Web bioinformatics.net.au