joao-andre-carrico
Microbial Typing: discriminating strains below the species/subspecies level
Genomics: antibiotic resistance/virulence factor gene presence/absence, mobile genetic element detection
http://en.wikipedia.org/wiki/File:ChronicleOfADeathForetold.JPG
WGS in molecular typing:
Gene-by-gene: wgMLST, cgMLST, rMLST, MLST, eMLST, MLST+
SNP comparison approaches: comparison with reference strains
Ability to recover most current sequence-based typing information in a single experimental procedure
The Ideal Scenario
Microbiological Sample → Magic Box of NGS Wonders for Microbiology
Completely characterized strain:
• Antibiotic resistance profile
• Multilocus Sequence Typing (MLST)
• Virulence factors present
• Other SBTM information, e.g.:
  • spa (S. aureus)
  • emm (Group A Streptococcus)
Desired end result:
Risk assessment of the strain and useful application of the data to clinical practice
Comparison between groups of strains
https://pmcvariety.files.wordpress.com/2014/06/eli-wallach-dead-good-bad-ugly.jpg?w=670&h=377&crop=1
My goals / areas that I want to apply WGS to:
• Microbial population structure
• Microbial evolution
• Microbial genomics: gene structure, genome synteny, mobile genetic element detection
My toolbox is chosen based on my questions and what I want to do!
Trying to avoid:
“I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.” - Abraham H. Maslow (1962), Toward a Psychology of Being
Sequence QA/QC: FastQC - http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Adaptor and quality trimming: Trimmomatic - http://www.usadellab.org/cms/?page=trimmomatic
Assembly: SPAdes - http://bioinf.spbau.ru/spades
Assembly: Velvet - http://www.ebi.ac.uk/~zerbino/velvet/
Mapping: Bowtie2 - http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
Annotation: Prokka - http://www.vicbioinformatics.com/software.prokka.shtml
Whole genome comparison: BRIG (BLAST Ring Image Generator) - http://brig.sourceforge.net/
Whole genome comparison: MAUVE - http://darlinglab.org/mauve/mauve.html
http://rugbyea.com/wp-content/uploads/2013/05/blast.jpg
http://www.ecohealthypets.com/writable/pet_report_photos/photo/480x/ball_python_2.jpg
- Perform the same analysis over tens, hundreds or thousands of strains: your own and publicly available
- Integrate multiple analyses in a single pipeline
- Pipelines = reproducibility (if not, something is very wrong)
http://www.ebi.ac.uk/ena
http://www.ncbi.nlm.nih.gov/sra
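A minimal sketch of the "same analysis over many strains" idea: one function builds the same QC, assembly and annotation commands for every sample, so each strain is processed identically. The read file naming (`<sample>_1.fastq.gz`, `<sample>_2.fastq.gz`), the directory layout and the function names are assumptions for illustration; trimming is left out for brevity, and nothing is executed here.

```python
from pathlib import PurePosixPath


def build_commands(sample, reads_dir, out_dir):
    """Return the command lines for one paired-end sample (nothing is run)."""
    # Assumed naming convention for the paired reads of each sample.
    r1 = PurePosixPath(reads_dir) / f"{sample}_1.fastq.gz"
    r2 = PurePosixPath(reads_dir) / f"{sample}_2.fastq.gz"
    out = PurePosixPath(out_dir) / sample
    return [
        # FastQC writes its reports to an existing output directory.
        ["fastqc", str(r1), str(r2), "-o", str(out / "qc")],
        # SPAdes paired-end assembly; contigs.fasta ends up in the -o directory.
        ["spades.py", "-1", str(r1), "-2", str(r2), "-o", str(out / "assembly")],
        # Prokka annotation of the assembled contigs.
        ["prokka", "--outdir", str(out / "annotation"), "--prefix", sample,
         str(out / "assembly" / "contigs.fasta")],
    ]


def build_pipeline(samples, reads_dir="reads", out_dir="results"):
    """Map every sample name to the identical list of steps."""
    return {sample: build_commands(sample, reads_dir, out_dir)
            for sample in samples}


if __name__ == "__main__":
    for sample, steps in build_pipeline(["strainA", "strainB"]).items():
        for step in steps:
            print(sample, " ".join(step))
```

Each command list could then be passed to `subprocess.run(step, check=True)`; keeping the plan separate from its execution makes it easy to log exactly what was run for each strain.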
Gene-by-gene / extended MLST approaches are my favorite
Why?
Allele-based classification “buffers” the effect of recombination in the analysis
Stable nomenclature for alleles facilitates data exchange by schema creation
Easy to expand and visualize up to thousands of genomes with MST-like approaches
Lower computing requirements
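To illustrate why allele-based comparison is cheap, here is a small sketch: the distance between two strains is simply the number of loci with different allele numbers, and a minimum spanning tree is built over those distances. This uses plain Prim's algorithm, not goeBURST (which adds its own tie-break rules), and the profiles are invented toy data.

```python
def hamming(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(profile_a, profile_b))


def minimum_spanning_tree(profiles):
    """Plain Prim's algorithm over pairwise allelic distances.

    profiles: dict mapping strain name -> tuple of allele numbers.
    Returns a list of (from, to, distance) edges.
    """
    names = sorted(profiles)
    if not names:
        return []
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge connecting the tree to a strain outside it;
        # ties are broken by name order, which is an arbitrary choice.
        dist, u, v = min(
            (hamming(profiles[u], profiles[v]), u, v)
            for u in sorted(in_tree)
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


if __name__ == "__main__":
    # Toy 7-locus MLST-like profiles (invented for illustration).
    toy = {
        "ST1": (1, 1, 1, 1, 1, 1, 1),
        "ST2": (1, 1, 1, 1, 1, 1, 2),
        "ST3": (1, 1, 1, 1, 1, 3, 2),
        "ST4": (4, 4, 4, 1, 1, 1, 1),
    }
    for edge in minimum_spanning_tree(toy):
        print(edge)
```

A single recombination event that replaces one gene changes one allele number, so it counts as one difference regardless of how many SNPs it introduced; this is the "buffering" effect mentioned above.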
Bacterial Isolate Genome Sequence Database (BIGSdb): Jolley & Maiden 2010, BMC Bioinformatics 11:595
http://pubmlst.org/software/database/bigsdb/
PROs: Freely available, open-source, handles thousands of genomes, has several schemas implemented for MLST for several bacterial species, and some extended MLST and core genome MLST (mainly Neisseria sp. but soon to be expanded)
CONs: Requires Perl knowledge to install and maintain
Ridom SeqSphere+ http://www.ridom.com/seqsphere/
Commercial software with client-server solutions from assembly to allele calling and visualization for core genome MLST (MLST+/cgMLST)
Applied Maths - Bionumerics 7.5 http://www.applied-maths.com/news/bionumerics-version-75-released
Commercial software with client-server solutions from assembly to allele calling and visualization for whole genome MLST (wgMLST)
Schema = set of loci to be used
What is a locus? A gene or part of a gene
How to choose the loci:
1. Start from reference genomes
2. Decide if you want core genes only or core+accessory genes
3. Use a method to compare CDS/ORF of reference genomes:
   1. OrthoMCL - www.orthomcl.org
   2. CD-HIT - cd-hit.org
4. Parse the output to:
   1. Remove paralogous genes
   2. Decide which are core genes and which are accessory genes
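Step 4 can be sketched as follows, assuming the OrthoMCL or CD-HIT output has already been parsed into a dictionary of clusters, each holding (genome, gene) pairs. That input layout, the function name and the `core_fraction` parameter are hypothetical; real tools differ in their output formats and in how strictly "core" is defined.

```python
from collections import Counter


def classify_clusters(clusters, genomes, core_fraction=1.0):
    """Split gene clusters into core, accessory and paralogous.

    clusters: dict mapping cluster id -> list of (genome, gene_id) pairs.
    genomes: list of all reference genome names.
    core_fraction: fraction of genomes a cluster must be present in to be
        called core (1.0 = strict core; an assumed, adjustable cutoff).
    """
    core, accessory, paralogous = [], [], []
    for cluster_id, members in sorted(clusters.items()):
        copies = Counter(genome for genome, _gene in members)
        if any(n > 1 for n in copies.values()):
            # More than one copy in a single genome: treat as paralogous
            # and keep it out of the schema.
            paralogous.append(cluster_id)
        elif len(copies) >= core_fraction * len(genomes):
            core.append(cluster_id)
        else:
            accessory.append(cluster_id)
    return core, accessory, paralogous


if __name__ == "__main__":
    genomes = ["G1", "G2", "G3"]
    clusters = {
        "c1": [("G1", "g1_001"), ("G2", "g2_001"), ("G3", "g3_001")],
        "c2": [("G1", "g1_002"), ("G2", "g2_002")],
        "c3": [("G1", "g1_003"), ("G1", "g1_004"), ("G2", "g2_003"), ("G3", "g3_002")],
    }
    print(classify_clusters(clusters, genomes))
```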
At this point different algorithms/software use:
- BLAST (n/p/x)
- Different criteria and parameters to call an allele as a coding sequence or part of a coding sequence
[Flowchart: allele-calling workflow]
Gene file: translate to protein, build the gene BLAST database, self-BLAST and calculate BSR
Genome: run Prodigal on the genome, translate the CDS to protein, BLAST against the gene BLAST database and calculate BSR
Decision:
- No BLAST match or BSR <= 0.6: LNF
- BSR = 1 and same DNA sequence: Exact Match
- LOT?: LOT
- BSR > 0.6: Inferred Allele (add new allele to the gene file, calculate the BSR of the new allele, re-do the gene BLAST database)
Result: allelic profile
Prodigal (Prokaryotic Dynamic Programming Gene finding Algorithm):
BSR: Blast Score Ratio
LOT: Locus On the Tip (of a contig)
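The decision step of the allele-calling workflow can be sketched like this. BSR is taken as the BLAST score of the query against the locus divided by the locus self-score; the 0.6 cutoff comes from the slide. The order of the checks is one reading of the flowchart, and the layout of the `hit` record and the function names are hypothetical.

```python
def blast_score_ratio(query_score, self_score):
    """BSR: score of the hit divided by the score of the locus against itself."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return query_score / self_score


def call_allele(hit, known_alleles, bsr_threshold=0.6):
    """Classify one locus in one genome.

    hit: None when BLAST found nothing, else a dict with the assumed keys
        'score', 'self_score', 'dna' and 'on_contig_tip'.
    known_alleles: dict mapping DNA sequence -> allele number (updated in
        place when a new allele is inferred).
    Returns (call, allele_number or None).
    """
    if hit is None:
        return ("LNF", None)
    bsr = blast_score_ratio(hit["score"], hit["self_score"])
    if bsr <= bsr_threshold:
        return ("LNF", None)
    if bsr == 1 and hit["dna"] in known_alleles:
        return ("EXACT", known_alleles[hit["dna"]])
    if hit["on_contig_tip"]:
        return ("LOT", None)
    # BSR > threshold and a sequence not seen before: infer a new allele
    # and add it to the known set so later genomes can match it exactly.
    new_id = max(known_alleles.values(), default=0) + 1
    known_alleles[hit["dna"]] = new_id
    return ("INF", new_id)


if __name__ == "__main__":
    alleles = {"ATGAAATAA": 1}
    hit = {"score": 180, "self_score": 200, "dna": "ATGAAGTAA", "on_contig_tip": False}
    print(call_allele(hit, alleles), alleles)
```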
Core Genome addressing synteny:
Core Genome Addressing synteny and paralogy:
http://www.phyloviz.net
Open source and Freely available!
Can be easily applied to:
- MLST
- MLVA
- SNP data*
- Gene presence/absence
*Conversion of VCF to PHYLOViZ: https://github.com/nickloman/misc-genomics-tools/blob/master/scripts/vcf2phyloviz.py (Thanks Nick!)
PROs:
- Handles thousands of profiles
- Fast calculation
- Easy to annotate and explore metadata
- Allows for basic statistics on profiles and metadata
- Allows for advanced statistics on MSTs (PLoS One. 2015 Mar 23;10(3):e0119315)
- Exports high quality graphical formats
- Allows plugin development
CONs:
- goeBURST and goeBURST MST only (Neighbour Joining and UPGMA soon)
- Java knowledge needed to code new plugins
MEGA (http://www.megasoftware.net/)
Splitstree (http://www.splitstree.org/)
Geneious (http://www.geneious.com/)
Multipurpose software: very useful for sequence alignment visualization, tree building and annotation visualization
(commercial software)
No need to take sides on choosing an approach. Gene-by-gene, SNP, K-mer methods should be used depending on the problem at hand and the questions
Because tools and sequencing methodologies are still evolving, easy-to-use “big red button” approaches are difficult to implement
Beware of differences in software/algorithm versions, which can lead to different results
Always be critical of the results you have, and try to understand whether you have a nail or a screw before picking up the hammer at hand
UMMI Members: Mickael Silva, Sergio Santos, Bruno Gonçalves, Adriana Policarpo, Mário Ramirez, José Melo-Cristino
FP7 PathoNGenTrace (http://www.patho-ngen-trace.eu/): Dag Harmsen (Univ. Muenster), Stefan Niemann (Research Center Borstel), Keith Jolley, James Bray and Martin Maiden (Univ. Oxford), Joerg Rothganger (RIDOM), Hannes Pouseele (Applied Maths)
Genome Canada IRIDA project (www.irida.ca), Integrated Rapid Infectious Disease Analysis: Franklin Bristow, Thomas Matthews, Aaron Petkau, Morag Graham and Gary Van Domselaar (NML, PHAC), Ed Taboada and Peter Kruczkiewicz (Lab Foodborne Zoonoses, PHAC), Fiona Brinkman (SFU), William Hsiao (BCCDC)
INESC-ID Members: Alexandre Francisco, Cátia Vaz, Pedro Tiago Monteiro
Twitter Microbial Bioinf community: Nick Loman, Torsten Seemann, Will Schaik, Mick Watson, Jennifer Gardy, and many, many others
Draft Scientific Programme:
Plenaries:
1) Small Scale Microbial Epidemiology
2) Large Scale Microbial Epidemiology
3) Bioinformatics for Genome-based Microbial Epidemiology
4) Population Genetics: Pathogen Emergence
5) Population Dynamics: Transmission networks and surveillance
6) Molecular Epidemiology for Global Health and One Health
Parallel Sessions
1) Food and Environmental pathogens
2) Microbial Forensics
3) Virus
4) Fungi and Yeasts
5) Novel Diagnostics methodologies
6) Novel Typing approaches
7) Phylogenetic Inference
8) Interactive Illustration Platforms