Basic Sequence Analysis
Lars Rønn Olsen, Assistant Professor, Technical University of Denmark
Learning objectives
After today, you will be able to:
• Understand how BLAST works
• Use BLAST for sequence similarity search
• Understand the theory behind de novo assembly of genomes from sequence reads
• Understand the theory behind short read alignment
• Understand how multiple sequence alignment works
• Use multiple sequence alignment to examine sequence variability
• Use public web services to predict vaccine targets in pathogens
Example: vaccine design workflow
Pathogen of interest: Dengue virus
General pathogen information: Wikipedia, PubMed
Species information: NCBI Taxonomy
Genomic sequence data: NCBI GenBank
Gene information: GeneCards, AmiGO
Genomic sequence data: Whole genome sequencing
Protein sequence data: NCBI Protein, SwissProt/UniProt
Gene expression profiles: NCBI GEO
Selection of vaccine targets
(Diagram legend: Database, Tool)
BLAST - Searching databases for sequences
BLAST (Basic Local Alignment Search Tool) is a tool to query a database for sequences similar to an input sequence.
Imagine you have sequenced a gene from an unknown sample, and you would like to know what it is. You can use BLAST at NCBI to compare your sequence to ALL the 197,390,691 sequences in GenBank!
https://blast.ncbi.nlm.nih.gov/Blast.cgi
Example: you would like to know what the following sequence is:
X) AATGCCG
You have the following three sequences in your database:
A) CGTGTGATC
B) AATGCCG
C) GCTGTGAC
Example: the same query, but sequence B in the database now differs from it at one position:
X) AATGCCG
A) CGTGTGATC
B) AATCCCG
C) GCTGTGAC
Aligned (tip: use the font "Courier New" so that the columns line up):
X) AATGCCG
B) AATCCCG
Example: the same query, but sequence B now also contains an extra base, so the alignment needs a gap:
X) AATGCCG
A) CGTGTGATC
B) AATCCCCG
C) GCTGTGAC
Aligned:
X) AATG-CCG
B) AATCCCCG
Example: the same query against three imperfect database sequences:
X) AATGCCG
A) AATCCCG
B) AATCCCCG
C) AATCC
Aligned:
X) AATGCCG
A) AATCCCG

X) AATG-CCG
B) AATCCCCG

X) AATGCCG
C) AAT-CC-

Which match is best?
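One way to answer the question is to score each alignment column by column. The sketch below is only an illustration and not BLAST's actual scoring scheme: the values +1 for a match, -1 for a mismatch and -2 for a gap are assumptions chosen for this example, and score_alignment is a hypothetical helper.

```python
# Hypothetical scoring values chosen for illustration; BLAST uses its own
# substitution matrices and gap penalties.
MATCH, MISMATCH, GAP = 1, -1, -2


def score_alignment(query: str, subject: str) -> int:
    """Score two already-aligned sequences of equal length, column by column."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            score += GAP
        elif q == s:
            score += MATCH
        else:
            score += MISMATCH
    return score


# The three alignments from the slide, written as (query, subject) pairs.
alignments = {
    "A": ("AATGCCG", "AATCCCG"),
    "B": ("AATG-CCG", "AATCCCCG"),
    "C": ("AATGCCG", "AAT-CC-"),
}

scores = {name: score_alignment(q, s) for name, (q, s) in alignments.items()}
best = max(scores, key=scores.get)
for name in sorted(scores):
    print(name, scores[name])
print("best:", best)
```

Under these assumed values alignment A scores highest, because it needs one mismatch and no gaps. Other penalty choices can change the ranking.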
Sequencing read alignment - Tools for next generation sequencing
Raw sequencing output
Base calls
Quality control/trimming
Sequencing without reference: De novo assembly
Sequencing with reference: Short read alignment
Gene or genome sequence
GenBank/RefSeq
Sequence read archive
(Diagram legend: Database, Tool)
De novo assembly - Making sense of sequencing reads without a reference gene or genome
Reads:
A) ATGACGTTT
B) TTTCTGAAA
C) AAATTCCCC
D) CCCCTGGCC
Overlapping the reads end to start:
A) ATGACGTTT
B)       TTTCTGAAA
C)             AAATTCCCC
D)                   CCCCTGGCC
Assembled sequence:
ATGACGTTTCTGAAATTCCCCCTGGCC
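The overlap-and-merge idea can be sketched in a few lines. This is a toy illustration under stated assumptions, not how real assemblers work: the reads are already in the correct order, are error free, and consecutive reads share exactly three bases. OVERLAP = 3 and the function name merge_reads are choices made for this example.

```python
OVERLAP = 3  # assumed overlap length between consecutive reads


def merge_reads(reads: list[str], overlap: int = OVERLAP) -> str:
    """Merge ordered reads whose ends overlap by exactly `overlap` bases."""
    contig = reads[0]
    for read in reads[1:]:
        if contig[-overlap:] != read[:overlap]:
            raise ValueError(f"{read} does not overlap the current contig")
        # Append only the part of the read that extends the contig.
        contig += read[overlap:]
    return contig


reads = ["ATGACGTTT", "TTTCTGAAA", "AAATTCCCC", "CCCCTGGCC"]
contig = merge_reads(reads)
print(contig)
```

With the four reads from the slide, the merged sequence is the 27-base assembly shown above.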
With repetitive reads, more than one assembly is possible:
A) ATGACGCCC
B) CCCCTGCCC
C) CCCTTCCCC
D) CCCCTGGCC
Order A) B) C) D):
ATGACGCCCCTGCCCTTCCCCCTGGCC
Order A) C) B) D):
ATGACGCCCTTCCCCCTGCCCCTGGCC
Which assembly is correct?
An additional read, E, matches both candidate assemblies, so on its own it does not settle the order:
E) TTCCCCCTG
ATGACGCCCCTGCCCTTCCCCCTGGCC (order A, B, C, D)
ATGACGCCCTTCCCCCTGCCCCTGGCC (order A, C, B, D)
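The ambiguity can be demonstrated by trying every read order. The sketch below assumes, for illustration only, error-free reads that overlap by exactly three bases. The helper chain and the constant OVERLAP are hypothetical names, and real assemblers use graph-based methods rather than brute-force permutations.

```python
from itertools import permutations

OVERLAP = 3  # assumed overlap length between consecutive reads


def chain(reads, overlap=OVERLAP):
    """Return the merged sequence if every read overlaps the next, else None."""
    contig = reads[0]
    for read in reads[1:]:
        if contig[-overlap:] != read[:overlap]:
            return None
        contig += read[overlap:]
    return contig


reads = {"A": "ATGACGCCC", "B": "CCCCTGCCC", "C": "CCCTTCCCC", "D": "CCCCTGGCC"}

# Try every read order and keep the ones that form an unbroken chain.
candidates = {}
for order in permutations(reads):
    contig = chain([reads[name] for name in order])
    if contig is not None:
        candidates["".join(order)] = contig

for order, contig in sorted(candidates.items()):
    print(order, contig)

# Check whether the extra read E occurs in each candidate assembly.
read_e = "TTCCCCCTG"
e_fits = {order: read_e in contig for order, contig in candidates.items()}
print(e_fits)
```

Two read orders survive, and read E occurs in both resulting sequences.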
Short read alignment - Aligning reads to a reference gene or genome
Reference: ATGACGTCAGCTGTTGGCGACATCGTTCGATCAGTCGATTATTCGATAATCGCTCTCTTAG
Reads: ATGACG TTGGCG CGTTCG AGTCGA TTCGAT TCTCTT
Read depth "x"
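Placing reads on a reference and counting read depth can be sketched as follows. This is a simplified illustration that assumes exact matches only. Real short read aligners use an indexed reference and tolerate mismatches. The helper find_all is a hypothetical name.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


reference = "ATGACGTCAGCTGTTGGCGACATCGTTCGATCAGTCGATTATTCGATAATCGCTCTCTTAG"
reads = ["ATGACG", "TTGGCG", "CGTTCG", "AGTCGA", "TTCGAT", "TCTCTT"]

# Read depth: how many reads cover each reference position.
depth = [0] * len(reference)
hits = {}
for read in reads:
    hits[read] = find_all(reference, read)
    # Simplification: a read that matches in several places is counted at
    # its first match only.
    if hits[read]:
        first = hits[read][0]
        for i in range(first, first + len(read)):
            depth[i] += 1

for read, positions in hits.items():
    print(read, positions)
print("".join(str(d) for d in depth))
```

Note that the read TTCGAT matches the reference in two places, which shows why repeated sequence makes read placement ambiguous.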
Make your sequences available - Other researchers can benefit immensely from your work!
Your sequence reads can be deposited in the Sequence Read Archive.
https://www.ncbi.nlm.nih.gov/sra
Your assembled/mapped sequences can be deposited in GenBank.
Predicting T cell epitopes - Tools for vaccine target discovery
Protein sequences
Sequence variability analysis: Multiple sequence alignment
Prediction of epitopes: HLA allele, HLA allele frequency
Selection of epitopes
Immune Epitope Database
Epitopes for vaccine
(Diagram legend: Database, Tool)
Multiple sequence alignment - Aligning sequences to determine variability
Multiple sequence alignment is a type of algorithm that allows you to compare two or more sequences simultaneously.
This is highly useful, for example when analyzing the variability of a certain protein in a pathogen.
There are many different algorithms for this purpose, one of the most famous being ClustalW from 1994.
In fact, this tool has been cited 53,288 times (number 10 in the paper mountain) and is still cited heavily.
HOWEVER! The main author of ClustalW, Des Higgins, has asked people to stop using and citing it, as there are upgrades and other, better tools available today!
How does it work? Very similar to BLAST, except all sequences are considered:
Input sequences:
AATCCCGA
AATCCCCGT
AATCCT
Aligned:
AATCCC-GA
AATCCCCGT
AATCC---T
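Once sequences are aligned, variability can be read off column by column. The sketch below is one simple way to do this, assuming the alignment has already been computed by a dedicated tool. The helper column_summary and the choice of "most common symbol and its frequency" as the measure are assumptions for illustration.

```python
from collections import Counter

alignment = ["AATCCC-GA", "AATCCCCGT", "AATCC---T"]


def column_summary(aligned: list[str]) -> list[tuple[str, float]]:
    """For each column, return the most common symbol and its frequency."""
    summary = []
    for column in zip(*aligned):
        symbol, count = Counter(column).most_common(1)[0]
        summary.append((symbol, count / len(column)))
    return summary


summary = column_summary(alignment)
for position, (symbol, fraction) in enumerate(summary, start=1):
    print(position, symbol, f"{fraction:.2f}")

# Columns in which every sequence carries the same base (no gaps).
conserved = [
    i for i, (s, f) in enumerate(summary, start=1) if f == 1.0 and s != "-"
]
print("fully conserved columns:", conserved)
```

For the three aligned sequences above, the first five columns are fully conserved and the remaining columns vary.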
In practice, you copy or download all your sequences of interest in FASTA format.
The FASTA format looks like this:
>This line is the header. You can write whatever you want here
ATCAGACTGTGCTGATCG…
You then paste or upload the sequences to a multiple sequence alignment web server (or install it locally if you have very large datasets) and run the alignment. One is webPRANK, which is available through EBI (European Bioinformatics Institute).
http://www.ebi.ac.uk/goldman-srv/webprank/
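The FASTA format described above is simple enough to read with a few lines of code. The parser below is a minimal sketch, and the two records in it are made up for illustration. For real work an established library is a safer choice.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped; join them into one string.
            records[header] += line
    return records


# Made-up example records (not real data).
example = """>seq1 first example
ATCAGACTGT
GCTGATCG
>seq2 second example
AATGCCG
"""

records = parse_fasta(example)
for header, sequence in records.items():
    print(header, len(sequence), sequence)
```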
Prediction of T cell epitopes - What are T cell epitopes?
Virus proteins are cleaved to short peptides
Some of these peptides bind to the human leukocyte antigen protein (HLA)
When virus peptides are presented on the surface of the infected cell, T cells kill the infected cells
The HLA protein comes in different flavors
Finding T cell epitopes - Traditional approach to finding immunogenic regions in pathogens
Prediction of T cell epitopes - Computational approach to finding immunogenic regions in pathogens
Rappuoli R (2000) Reverse Vaccinology, Curr Opin Microbiol
Prediction of T cell epitopes - How to predict peptide binding to HLA
There are a lot of algorithms to predict epitopes in pathogens and cancer cells.
Among the best performing is NetMHC.
http://www.cbs.dtu.dk/services/NetMHC/
9-mer peptides, each classified as "Epitopes" or "Not epitopes":
MRCVGVGNR, RCVGVGNRD, CVGVGNRDF, VGVGNRDFV, GVGNRDFVE, VGNRDFVEG, GNRDFVEGL, NRDFVEGLS, DFVEGLSGA, FVEGLSGAT, VEGLSGATW, EGLSGATWV, GLSGATWVD
Protein sequence:
MRCVGVGNRDFVEGLSGATWVDVVLFQCLESIEGKAVQHENLKYTVIITVHTGDQHQVG
Potential epitopes:
GVGNRDFVE, VGNRDFVEG, FVEGLSGAT, VEGLSGATW
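Cutting a protein into overlapping peptides is a sliding-window operation. The sketch below assumes a peptide length of 9, as in the peptides listed above. The function name peptides is a hypothetical helper, and submitting each peptide to a predictor such as NetMHC is not shown.

```python
def peptides(protein: str, length: int = 9) -> list[str]:
    """Return every overlapping peptide of the given length."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


protein = "MRCVGVGNRDFVEGLSGATWVDVVLFQCLESIEGKAVQHENLKYTVIITVHTGDQHQVG"
nine_mers = peptides(protein)

print(len(nine_mers), "peptides")
for peptide in nine_mers[:5]:
    print(peptide)
```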
HLA databases - Databases with information about the human leukocyte antigen
HLA comes in different flavors (alleles).
Different alleles bind different peptides.
Humans carry up to six different HLA class I alleles (three from each parent).
Different HLA alleles are prevalent in different populations; see the HLA allele frequency database:
http://www.allelefrequencies.net/
You can also explore the sequences of the HLAs in the HLA allele database:
https://www.ebi.ac.uk/ipd/imgt/hla/
Selection of epitopes - Choosing HLA binders
Combining variability analysis (multiple sequence alignment) and HLA binding predictions lets you pick the best epitopes for your vaccine.
This can be done manually or using, for example, the BlockCons tool:
http://met-hilab.cbs.dtu.dk/blockcons/
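Doing the selection manually amounts to filtering peptides on two criteria. The sketch below is not how BlockCons works. It only illustrates the idea, and every number in it, including the conservation values, the binder labels and the 0.9 cutoff, is invented for the example.

```python
# All numbers below are invented for illustration; they are not predictions.
candidates = [
    # (peptide, conservation across strains from 0 to 1, predicted binder?)
    ("GVGNRDFVE", 0.98, True),
    ("VGNRDFVEG", 0.95, False),
    ("FVEGLSGAT", 0.60, True),
    ("VEGLSGATW", 0.99, True),
]

MIN_CONSERVATION = 0.9  # assumed cutoff


def select_epitopes(rows, min_conservation=MIN_CONSERVATION):
    """Keep peptides that are both conserved and predicted to bind HLA."""
    return [
        peptide
        for peptide, conservation, binds in rows
        if binds and conservation >= min_conservation
    ]


selected = select_epitopes(candidates)
print(selected)
```

A peptide is kept only if it passes both filters, so a strong binder from a variable region is dropped, as is a conserved peptide that is not predicted to bind.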
After you have selected your potential epitopes, you should check whether they are present in human proteins, as vaccination with these may lead to lack of efficacy or potentially autoimmunity.
How would you do this?
Answer: use BLAST in GenBank, and search for the peptides in human proteins.
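The logic of this check can be sketched with an exact substring search. This is a simplified stand-in for a BLAST search, not a replacement for it: the two "human" sequences are invented for illustration, and shared_with_host is a hypothetical helper.

```python
def shared_with_host(peptides, host_proteins):
    """Map each peptide to the host proteins that contain it exactly."""
    return {
        peptide: [name for name, seq in host_proteins.items() if peptide in seq]
        for peptide in peptides
    }


# Invented sequences for illustration; a real check would search human
# proteins with BLAST.
host_proteins = {
    "hypothetical_protein_1": "MKTAYIAKQRVEGLSGATWQISFVKSHFSRQ",
    "hypothetical_protein_2": "MSDNGPQNQRNAPRITFGGP",
}
peptides = ["GVGNRDFVE", "VEGLSGATW"]

matches = shared_with_host(peptides, host_proteins)
not_in_host = [p for p, found in matches.items() if not found]
print(matches)
print("not found in host proteins:", not_in_host)
```

Peptides that turn up in a host protein would be removed from the candidate list.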
Immune Epitope Database (IEDB) - Databases with epitopes in an array of pathogen species
The Immune Epitope Database contains epitopes that other researchers have reported in the literature.
Use it to see if anyone has worked experimentally with your predicted peptides before. If someone has already tested whether a peptide gives rise to an immune response, you can use this to inform your decision to use it or not.
http://www.iedb.org/
16 years of reverse vaccinology - What have we achieved?
Rappuoli R (2000) Reverse Vaccinology, Curr Opin Microbiol
1,779 reports of novel or improved prediction algorithms
4,622 reports of novel vaccine targets
142 active clinical trials of various forms of DNA vaccines
2 approved DNA vaccines... protecting horses against West Nile virus and dogs against melanoma.
There are a lot of tools out there
Use Google to search for tools for your specific questions.
There are also online fora where you can ask questions if you cannot find the answer. For example BioStars:
https://www.biostars.org/
You can also explore tools in the tools registry at:
https://bio.tools/
Take home messages
• Exploring and re-analyzing published data is incredibly useful - when you sequence something, contribute to the research community and upload your raw and processed sequence data!