Introduction to Apollo: Collaborative genome annotation editing
A webinar for the i5K Research Community – Calanoida (copepod)
Monica Munoz-Torres | @monimunozto
Berkeley Bioinformatics Open-Source Projects (BBOP), Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory
i5k Pilot Project Species Calls | 17 October, 2016
http://GenomeArchitect.org
Outline
• Today you will discover effective ways to extract valuable information about a genome through curation efforts.
After this talk you will...
• Better understand ‘curation’ in the context of genome annotation:
assembled genome → automated annotation → manual annotation
• Become familiar with Apollo’s environment and functionality.
• Learn to identify homologs of known genes of interest in your newly sequenced genome.
• Learn how to corroborate and modify automatically annotated gene models using all available evidence in Apollo.
Experimental design, sampling.
Comparative analyses
Official / Merged Gene Set
Manual Annotation
Automated Annotation
Sequencing
Assembly
Synthesis & dissemination.
This is our focus.
We must care about curation
Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild
The gene set of an organism informs a variety of studies:
• Characterization: Gene number, GC%, TEs, repeats.
• Functional assignments.
• Molecular evolution, sequence conservation.
• Gene families.
• Metabolic pathways.
• What makes an organism what it is?
What makes a bee a “bee”?
Genome Curation
Identifies elements that best represent the underlying biology and eliminates elements that reflect systemic errors of automated analyses.
Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and experimental data.
Apollo
Gene Ontology Resources
A few things to remember when conducting manual annotation
BIO-REFRESHER
• KEEP A GLOSSARY HANDY: from contig to splice site
• WHAT IS A GENE? defining your goal
• TRANSCRIPTION: mRNA in detail
• TRANSLATION: reading frames, etc.
• GENOME CURATION: steps involved
The gene: a “moving target”
“The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.”
Gerstein et al., 2007. Genome Res
"Gene structure" by Daycd- Wikimedia Commons
mRNA
• Although mRNAs are short-lived, understanding them is crucial, as they will become the center of your work.
Reading frames
• In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF).
• ORF = Start signal + coding sequence (divisible by 3) + Stop signal
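The ORF definition above can be checked mechanically. The following is a minimal Python sketch, assuming the standard genetic code (ATG as ‘Start’; TAA, TAG, TGA as ‘Stop’) and a spliced coding sequence written 5’ to 3’. The function name `is_complete_orf` is hypothetical and is not part of Apollo.

```python
# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(cds: str) -> bool:
    """Return True if cds is Start + coding sequence (multiple of 3) + Stop."""
    cds = cds.upper()
    # An ORF needs at least a start and a stop codon, and whole codons only.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    starts_ok = codons[0] == "ATG"
    stops_ok = codons[-1] in STOP_CODONS
    # A stop codon before the last codon would truncate the product.
    no_internal_stop = not any(c in STOP_CODONS for c in codons[1:-1])
    return starts_ok and stops_ok and no_internal_stop


print(is_complete_orf("ATGAAATAA"))     # True
print(is_complete_orf("ATGTAAAAATAA"))  # False: in-frame stop before the end
```

Non-canonical starts, which the slides mention as a very rare case, would be rejected by this simplified check.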
Splice sites
• The spliceosome catalyzes the removal of introns and the ligation of flanking exons.
• Splicing signals (from the point of view of an intron):
– One splice signal (site) on the 5’ end: usually GT (less common: GC)
– And a 3’ end splice site: usually AG
– Canonical splice sites look like this: …]5’-GT/AG-3’[…
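As an illustration of the splice signals listed above, here is a small Python sketch that labels an intron by its first and last two bases. It assumes the intron sequence is given 5’ to 3’ on the transcribed strand; `classify_intron` and its labels are hypothetical names chosen for this example, not Apollo terminology.

```python
def classify_intron(intron: str) -> str:
    """Label an intron by its donor (5' end) and acceptor (3' end) dinucleotides."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron too short to carry both splice signals")
    donor, acceptor = intron[:2], intron[-2:]
    if donor == "GT" and acceptor == "AG":
        return "canonical"
    if donor == "GC" and acceptor == "AG":
        return "less common (GC donor)"
    return "non-canonical"


print(classify_intron("GTAAGTTTTCAG"))  # canonical
print(classify_intron("GCAAGTTTTCAG"))  # less common (GC donor)
print(classify_intron("ATAAGTTTTCAC"))  # non-canonical
```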
Exons and Introns
• Introns can interrupt the reading frame of a gene by inserting a sequence:
– between two consecutive codons
– between the first and second nucleotide of a codon
– or between the second and third nucleotide of a codon
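The three positions above are conventionally called intron phases 0, 1 and 2 (this naming is standard usage rather than something stated on the slide). A minimal Python sketch, assuming you know how many coding bases each exon contributes; `intron_phases` is a hypothetical helper name:

```python
def intron_phases(exon_cds_lengths):
    """Return the phase of the intron that follows each exon except the last.

    Phase 0: intron lies between two consecutive codons.
    Phase 1: intron lies between the 1st and 2nd nucleotide of a codon.
    Phase 2: intron lies between the 2nd and 3rd nucleotide of a codon.
    """
    phases = []
    coding_bases_so_far = 0
    for length in exon_cds_lengths[:-1]:
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


print(intron_phases([9, 4, 5]))   # [0, 1]
print(intron_phases([10, 4, 4]))  # [1, 2]
```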
"Exon and Intron classes”. Licensed under Fair use via Wikipedia
Prediction & Annotation
GENE PREDICTION & ANNOTATION
PREDICTION & ANNOTATION
• Identification and annotation of genome features:
– primarily focuses on protein-coding genes.
– also identifies RNAs (tRNA, rRNA, long and small non-coding RNAs (ncRNA)), regulatory motifs, repetitive elements, etc.
– happens in 2 phases:
1. Computation phase
2. Annotation phase
COMPUTATION PHASE
a. Experimental data are aligned to the genome: expressed sequence tags, RNA-sequencing reads, proteins (also from other species).
b. Gene predictions are generated:
- ab initio: based on nucleotide sequence and composition, e.g. Augustus, GENSCAN, geneid, fgenesh, etc.
- evidence-driven: identifying also domains and motifs, e.g. SGP2, JAMg, fgenesh++, etc.
Result: the single most likely coding sequence, no UTRs, no isoforms.
Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
ANNOTATION PHASE
Experimental data (evidence) and predictions are synthesized into gene annotations.
Result: gene models that generally include UTRs, isoforms, evidence trails.
Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174
5’UTR 3’UTR
In some cases, algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation.
CONSENSUS GENE SETS
Gene models may be organized into sets using:
• combiners for automatic integration of predicted sets, e.g. GLEAN, EvidenceModeler
or
• tools packaged into pipelines, e.g. MAKER, PASA, Gnomon, Ensembl, etc.
ANNOTATION: needs some refinement
No one is perfect, least of all automated annotation.
New technologies bring new challenges:
• Assembly errors can cause fragmented annotations
• Limited coverage makes precise identification a difficult task
MANUAL ANNOTATION: improving predictions
Precise elucidation of biological features encoded in the genome requires careful examination and review.
Schiex et al. Nucleic Acids Res 2003; 31(13):3738-3741
Automated Predictions
Experimental Evidence
Manual Annotation – to the rescue.
cDNAs, HMM domain searches, RNAseq, genes from other species.
GENOME CURATION: an inherently collaborative task
So many sequences, not enough hands.
Apis mellifera | Alexander Wild | www.alexanderwild.com
We have provided continuous training and support for hundreds of geographically dispersed scientists to conduct manual annotation efforts in order to recover coding sequences in agreement with all available biological evidence.
Collaboration is key!
APOLLO
• Collaborative work distills invaluable knowledge.
• A little training goes a long way! Wet lab scientists can easily learn to maximize the generation of accurate, biologically supported gene models.
Apollo
APOLLO: versatile genome annotation editing
• Apollo is a web-based genome annotation editor, integrated with JBrowse
• Supports real time collaboration & generates analysis-ready data
USER-CREATED ANNOTATIONS
EVIDENCE TRACKS
ANNOTATOR PANEL
BECOMING ACQUAINTED WITH APOLLO
General process of curation
1. Select or find a region of interest, e.g. scaffold.
2. Select appropriate evidence tracks to review the gene model.
3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.
4. If necessary, adjust the gene model.
5. Check your edited gene model for integrity and accuracy by comparing it with available homologs.
6. Comment and finish.
Apollo - version at i5K Workspace@NAL
4. Becoming Acquainted with Web Apollo.
The Sequence Selection Window
Sort
“Old Track Select Page”
APOLLO: annotation editing environment
Color by CDS frame, toggle strands, set color scheme and highlights.
- Upload evidence files (GFF3, BAM, BigWig)
- combination track
- sequence search track
Query the genome using BLAT.
Navigation and zoom.
Search for a gene model or a scaffold.
Get coordinates and “rubber band” selection for zooming.
Login
User-created annotations.
New annotator panel.
Evidence Tracks
Stage and cell-type specific transcription data.
http://genomearchitect.org/web_apollo_user_guide
USER NAVIGATION
Annotator panel.
• Choose appropriate evidence from list of “Tracks” on annotator panel.
• Select & drag elements from evidence track into the ‘User-created Annotations’ area.
• Hovering over annotation in progress brings up an information pop-up.
• Creating a new annotation
Adding a gene model
Editing functionality
Example: Adding an exon supported by experimental data
• RNAseq reads show evidence in support of a transcribed product that was not predicted.
• Add exon by dragging up one of the RNAseq reads.
Example: Adjusting exon boundaries supported by experimental data
Curating with Apollo
• ‘Zoom to base level’ reveals the DNA Track.
• Color exons by CDS from the ‘View’ menu.
Zoom in/out with keyboard: shift + arrow keys up/down
• Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.
Annotating simple cases
“Simple case”:
- the predicted gene model is correct or nearly correct, and
- this model is supported by evidence that completely or mostly agrees with the prediction.
- evidence that extends beyond the predicted model is assumed to be non-coding sequence.
The following are simple modifications.
ANNOTATING SIMPLE CASES
BECOMING ACQUAINTED WITH APOLLO SIMPLE CASES
• A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
• Check ‘Start’ and ‘Stop’ signals after each edit.
ADDING EXONS
If transcript alignment data are available & extend beyond your original annotation, you may extend or add UTRs.
1. Right click at the exon edge and ‘Zoom to base level’.
2. Place the cursor over the edge of the exon until it becomes a black arrow then click and drag the edge of the exon to the new coordinate position that includes the UTR.
ADDING UTRs
To add a new spliced UTR to an existing annotation also follow the procedure for adding an exon.
To modify an exon boundary and match data in the evidence tracks: select both the [offending] exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.
In some cases all the data may disagree with the annotation, in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
MATCHING EXON BOUNDARY TO EVIDENCE
1. Two exons from different tracks sharing the same start/end coordinates display a red bar to indicate matching edges.
2. Selecting the whole annotation or one exon at a time, use this edge-matching function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square [ ] brackets to scroll from exon to exon. Use curly { } brackets to scroll from annotation to annotation.
3. Check if cDNA / RNAseq reads lack one or more of the annotated exons or include additional exons.
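The idea behind edge-matching can be sketched in a few lines of Python. This is only an illustration of the concept, not Apollo's implementation: it assumes exons are `(start, end)` coordinate pairs on the same scaffold, and `matching_edges` is a hypothetical name.

```python
def matching_edges(annotation_exons, evidence_exons):
    """For each annotated exon, report whether its start and end
    coincide with the start or end of any evidence exon."""
    evidence_starts = {start for start, _ in evidence_exons}
    evidence_ends = {end for _, end in evidence_exons}
    return [
        (start in evidence_starts, end in evidence_ends)
        for start, end in annotation_exons
    ]


annotation = [(100, 200), (300, 400)]
evidence = [(100, 200), (310, 400)]
# The second exon's start (300) has no support in the evidence (310).
print(matching_edges(annotation, evidence))  # [(True, True), (False, True)]
```

An edge reported as `False` is a boundary worth reviewing at base level against the evidence tracks.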
CHECKING EXON INTEGRITY
Non-canonical splice site flags.
Double click: selection of feature and sub-features
Evidence Tracks Area
‘User-created Annotations’ Track
Edge-matching
Apollo’s editing logic (brain):
• selects longest ORF as CDS
• flags non-canonical splice sites
ORFs AND SPLICE SITES
Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.
Canonical splice sites:
forward strand: 5’-…exon]GT/AG[exon…-3’
reverse strand, not reverse-complemented: 3’-…exon]GA/TG[exon…-5’
SPLICE SITES
Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. GC donor), they should be flagged with a comment.
Exon/intron splice site error warning
Curated model
Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.
If ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further up or downstream, depending on evidence (proteins, RNAseq).
It may be present outside the predicted gene model, within a region supported by another evidence track.
In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
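A longest-ORF search of the kind described above can be sketched as follows. This is a simplified illustration and not Apollo's actual code: it assumes the exons have already been spliced into one transcript written 5’ to 3’, uses only canonical ATG starts and the standard stop codons, and `longest_orf` is a hypothetical name.

```python
# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript: str):
    """Return (start, end) of the longest ATG...Stop ORF as 0-based,
    half-open transcript coordinates, or None if there is none."""
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start after the previous stop
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


transcript = "CCATGAAATGATTTATGCCCGGGAAATAGCC"
span = longest_orf(transcript)
print(span)                          # (14, 29)
print(transcript[span[0]:span[1]])   # ATGCCCGGGAAATAG
```

Choosing a different in-frame ‘Start’, as the slide suggests when evidence disagrees, amounts to overriding the start coordinate this search returns.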
‘Start’ AND ‘Stop’ SITES
Annotating complex cases
Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!
1. In ‘User-created Annotations’ area shift-click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu.
2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.
3. Check the resulting translation by querying a protein database e.g. UniProt, NCBI nr. Add comments to record that this annotation is the result of a merge.
Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
COMPLEX CASES: merge two gene predictions on the same scaffold
BECOMING ACQUAINTED WITH APOLLO COMPLEX CASES
One or more splits may be recommended when:
- different segments of the predicted protein align to two or more different gene families
- predicted protein doesn’t align to known proteins over its entire length
- Transcript data may support a split, but first verify whether they are alternative transcripts.
COMPLEX CASES: split a gene prediction
DNA Track
‘User-created Annotations’ Track
COMPLEX CASES: annotate frameshifts and correct single-base errors
Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
COMPLEX CASES: correcting selenocysteine-containing proteins
1. Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.
2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.
3. The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.
4. Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits.
5. In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
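The three kinds of sequence alteration described in step 3 can be modelled on a plain string. This Python sketch is an illustration under stated assumptions, not Apollo's behaviour verbatim: positions are 0-based, "to the right of the cursor" is interpreted as "after the selected base", and `apply_alteration` is a hypothetical name. The input sequence is never modified, mirroring the point that the underlying genomic sequence stays frozen.

```python
def apply_alteration(seq: str, kind: str, pos: int, value) -> str:
    """Return a modified copy of seq.

    insertion:    value is the string inserted after the base at pos.
    deletion:     value is the number of bases removed, starting at pos.
    substitution: value is the string that replaces bases starting at pos.
    """
    if kind == "insertion":
        return seq[:pos + 1] + value + seq[pos + 1:]
    if kind == "deletion":
        return seq[:pos] + seq[pos + int(value):]
    if kind == "substitution":
        return seq[:pos] + value + seq[pos + len(value):]
    raise ValueError(f"unknown alteration kind: {kind}")


reference = "ACGT"
print(apply_alteration(reference, "insertion", 1, "TT"))     # ACTTGT
print(apply_alteration(reference, "deletion", 1, 2))         # AT
print(apply_alteration(reference, "substitution", 1, "GG"))  # AGGT
print(reference)                                             # ACGT (unchanged)
```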
COMPLEX CASES: annotating frameshifts and correcting single-base errors & selenocysteines
• Information Editor
The Annotation Information Editor
• Add PubMed IDs
• Include GO terms as appropriate from any of the three ontologies
• Write comments stating how you have validated each model.
• Keeping track of each edit
Annotations, annotation edits, and History: stored in a centralized database.
Follow the checklist until you are happy with the annotation!
And remember to…
– comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence.
– or add a comment to inform the community of unresolved issues you think this model may have.
Always Remember: Apollo curation is a community effort so please use comments to communicate the reasons for your annotation. Your comments will be visible to everyone.
COMPLETING THE ANNOTATION
Checklist
• Check ‘Start’ and ‘Stop’ sites.
• Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…
• Check if you can annotate UTRs, for example using RNA-Seq data:
– align it against relevant genes/gene family
– blastp against NCBI’s RefSeq or nr
• Check for gaps in the genome.
• Additional functionality may be necessary:
– merging 2 gene predictions - same scaffold
– ‘merging’ 2 gene predictions - different scaffolds
– splitting a gene prediction
– annotating frameshifts
– annotating selenocysteines, correcting single-base and other assembly errors, etc.
• Add:
– Important project information in the form of comments
– IDs from public databases e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert.
– Comments about the kinds of changes you made to the gene model of interest, if any.
– Any appropriate functional assignments, e.g. via BLAST, RNA-Seq data, literature searches, etc.
CHECKLIST: for accuracy and integrity
MANUAL ANNOTATION CHECKLIST
Genome curation with i5k
i5K Workspace@NAL
The collaborative curation process at i5k
1. A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. HVIT_v0.5.3-Models
2. i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS):
Warning!
• If it’s not on either track, it won’t make the OGS!
• If it’s there and it shouldn’t, it will still make the OGS!
The ‘Replace Models’ rules
http://tinyurl.com/apollo-i5k-replace
3. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation. Use your judgment, try choosing a different model to begin the annotation.
4. Isoforms: drag original and alternatively spliced form to ‘User-created Annotations’ area.
5. If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label as ‘Delete’ on the Information Editor.
6. Overlapping interests? Collaborate to reach agreement.
7. Follow guidelines for i5K Pilot Species Projects, at http://goo.gl/LRu1VY
Example
What’s new?... finding inspiration in PubMed.
“Molecular analysis of bed bug populations from across the USA and Europe found that >80% and >95% of the respective populations contained V419L and/or L925I mutations in the voltage-gated sodium channel gene, indicating widespread distribution of target-site-based pyrethroid resistance.”
Homalodisca vitripennis | Alexander Wild | www.alexanderwild.com
Halyomorpha halys | Fondazione Edmund Mach - Italy
Now for our species of interest. . .
Curation example using the Hyalella azteca genome (amphipod crustacean).
What do we know about this genome?
• Currently publicly available data at NCBI:
– >37,000 nucleotide seqs → scaffolds, mitochondrial genes
– 344 amino acid seqs → mitochondrion
– 47 ESTs
– 0 conserved domains identified
– 0 “gene” entries submitted
• Data at i5K Workspace@NAL (annotation hosted at USDA)
- 10,832 scaffolds: 23,288 transcripts: 12,906 proteins
PubMed Search: what’s new?
“Ten populations differed by at least 550-fold in sensitivity to pyrethroids.”
“Sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), shows that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.”
“The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”
How many sequences are there, publicly available, for our gene of interest?
• Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).
• NaCP60E (Sodium channel protein 60E; D. melanogaster).
– MF: voltage-gated cation channel activity (IDA, GO:0022843).
– BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
– CC: voltage-gated sodium channel complex (IEA, GO:0001518).
And what do we know about them?
Retrieving sequences for a sequence similarity search.
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAT search input
BLAT search results
• High-scoring segment pairs (hsp) are listed in tabulated format.
• Clicking on one line of results sends you to those coordinates.
BLAST at i5K https://i5k.nal.usda.gov/blast
BLAST at i5K: hsps in “BLAST+ Results” track
Creating a new gene model: drag and drop
• Apollo automatically calculates longest ORF.
• In this case, ORF includes the high-scoring segment pairs (hsp), marked here in blue.
• Note that gene is transcribed from reverse strand.
Available Tracks
Get Sequence
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Also, flanking sequences (other gene models) vs. NCBI nr
In this case, two gene models upstream, at 5’ end.
BLAST hsps
Review alignments
HaztTmpM006234
HaztTmpM006233
HaztTmpM006232
Hypothesis for vgsc gene model
Editing: merge the three models
Merge by dropping an exon or gene model onto another.
or…
Merge by selecting two exons (holding down “Shift”) and using the right click menu.
Result of merging the gene models:
Editing: correct offending splice site
Modify exon/intron boundary:
- Drag the end of the exon to the nearest canonical splice site.
or
- Use right-click menu.
Editing: set translation start
Editing: delete exon not supported by evidence
Delete first exon from HaztTmpM006233
Editing: add an exon supported by RNAseq
• RNAseq reads show evidence in support of transcribed product, which was not predicted.
• Add exon at coordinates 97946-98012 by dragging up one of the RNAseq reads.
Editing: adjust offending splice site using evidence
Editing: adjust other boundaries supported by evidence
Finished model
Corroborate integrity and accuracy of the model:
- Start and Stop
- Exon structure and splice sites …]5’-GT/AG-3’[…
- Check the predicted protein product vs. NCBI nr, UniProt, etc.
Information Editor
• DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq
• PubMed identifier: PMID:24065824
• Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518.
• Comments
• Name, Symbol
• Approve / Delete radio button
Comments (if applicable)
Go play!
PUBLIC DEMO
APOLLO ON THE WEB: instructions
At i5K
1. Register for access to Apollo at the i5K Workspace@NAL at https://i5k.nal.usda.gov/web-apollo-registration
2. Contact the coordinator for each species community to receive more information about how to contribute. Contact info is available on each organism’s page.
Public Honeybee demo available at:
http://GenomeArchitect.org/WebApolloDemo
Username: demo@demo.com
Password: demo
APOLLO: demonstration
Demonstration video is available at https://youtu.be/VgPtAP_fvxY
OUTLINE
• BIO-REFRESHER: biological concepts for curation
• ANNOTATION: automatic predictions
• MANUAL ANNOTATION: necessary, collaborative
• APOLLO: advancing collaborative curation
• EXAMPLE: demos
Apollo Development
Nathan Dunn, Technical Lead
Eric Yao
Christine Elsik’s Lab, University of Missouri
Suzi Lewis, Principal Investigator
BBOP
Moni Munoz-Torres, Project Manager
Deepak Unni
JBrowse. Ian Holmes’ Lab, University of California, Berkeley
• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community & i5K Steering Committee.
• Stephen Ficklin, GenSAS, Washington State University
• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231
• For your attention, thank you!
Apollo
Nathan Dunn
Deepak Unni §
Gene Ontology
Chris Mungall
Seth Carbon
Heiko Dietze
BBOP
Learn more about Apollo at http://GenomeArchitect.org
Thank you!
NAL at USDA
Monica Poelchau
Mei-Ju Chen
Christopher Childers
Gary Moore
HGSC at BCM
fringy Richards
Kim Worley
JBrowse Eric Yao *
Interface Updates
Annotator Panel
gene
mRNA
Update: Transforming coordinates
Bringing exons closer together to facilitate annotation of gene models with long introns.
1,275bp
Concept for Apollo v2.1 – Northern Spring 2016
Transforming coordinates
Assembly artifacts may cause gene models to be split across two or more scaffolds. To facilitate annotation, Apollo allows the generation of an artificial space where the annotation can be completed.
Scaffold 1
Scaffold 2
Genome Assembly
. . . . . .
Scaffold n