Introduction to Apollo - i5k Research Community – Calanoida (copepod)


Introduction to Apollo: Collaborative genome annotation editing

A webinar for the i5K Research Community – Calanoida (copepod)

Monica Munoz-Torres | @monimunozto

Berkeley Bioinformatics Open-Source Projects (BBOP), Environmental Genomics & Systems Biology Division, Lawrence Berkeley National Laboratory

i5k Pilot Project Species Calls | 17 October, 2016

http://GenomeArchitect.org

Outline

• Today you will discover effective ways to extract valuable information about a genome through curation efforts.

After this talk you will...

• Better understand ‘curation’ in the context of genome annotation: assembled genome → automated annotation → manual annotation

• Become familiar with Apollo’s environment and functionality.

• Learn to identify homologs of known genes of interest in your newly sequenced genome.

• Learn how to corroborate and modify automatically annotated gene models using all available evidence in Apollo.

Experimental design, sampling.

Comparative analyses

Official / Merged Gene Set

Manual Annotation

Automated Annotation

Sequencing

Assembly

Synthesis & dissemination.

This is our focus.

We must care about curation

Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild

The gene set of an organism informs a variety of studies:
• Characterization: Gene number, GC%, TEs, repeats.
• Functional assignments.
• Molecular evolution, sequence conservation.
• Gene families.
• Metabolic pathways.
• What makes an organism what it is?

What makes a bee a “bee”?

Genome Curation

Identifies elements that best represent the underlying biology and eliminates elements that reflect systemic errors of automated analyses.

Assigns function through comparative analysis of similar genome elements from closely related species using literature, databases, and experimental data.

Apollo

Gene Ontology Resources

A few things to remember when conducting manual annotation

BIO-REFRESHER

• KEEP A GLOSSARY HANDY: from contig to splice site

• WHAT IS A GENE? Defining your goal

• TRANSCRIPTION: mRNA in detail

• TRANSLATION: reading frames, etc.

• GENOME CURATION: steps involved

The gene: a “moving target”

“The gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products.”

Gerstein et al., 2007. Genome Res

"Gene structure" by Daycd- Wikimedia Commons

mRNA

• Although mRNAs are short-lived, understanding them is crucial, as they will become the center of your work.

Reading frames

• In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF).

• ORF = Start signal + coding sequence (divisible by 3) + Stop signal
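The ORF definition can be sketched in a few lines of Python. This is a minimal illustration, not part of Apollo: the function name `is_complete_orf` and the toy sequences are hypothetical, and a canonical ATG start plus the standard stop codons (TAA, TAG, TGA) are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_orf(cds):
    """Return True if cds is: ATG start + in-frame codons + one terminal stop."""
    cds = cds.upper()
    # The whole ORF must split evenly into codons.
    if len(cds) < 6 or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # A stop codon before the last position would truncate the protein.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


print(is_complete_orf("ATGGCCAAATGA"))  # ATG GCC AAA TGA
print(is_complete_orf("ATGGCCAATGA"))   # length is not a multiple of 3
```

The second call fails only because one base is missing, which is the kind of frame problem a single-base assembly error can introduce.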


Splice sites

• The spliceosome catalyzes the removal of introns and the ligation of flanking exons.

• Splicing signals (from the point of view of an intron):
– One splice signal (site) on the 5’ end: usually GT (less common: GC)
– And a 3’ end splice site: usually AG
– Canonical splice sites look like this: …]5’-GT/AG-3’[…
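As a rough illustration of these signals, the sketch below pulls the first and last two bases of a forward-strand intron and labels them. The function names, the toy sequence, and the 0-based, end-exclusive coordinates are assumptions made for this example only.

```python
def splice_signals(genome, intron_start, intron_end):
    """Return the (donor, acceptor) dinucleotides of a forward-strand intron.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    intron = genome[intron_start:intron_end].upper()
    return intron[:2], intron[-2:]


def classify_splice_site(donor, acceptor):
    """Label an intron by its 5' donor and 3' acceptor dinucleotides."""
    if acceptor == "AG" and donor == "GT":
        return "canonical"
    if acceptor == "AG" and donor == "GC":
        return "less common GC donor"
    return "non-canonical"


# Toy sequence: exon "ATGGCC", intron "GTAAGTTTTCAG", exon "AAATGA".
genome = "ATGGCC" + "GTAAGTTTTCAG" + "AAATGA"
donor, acceptor = splice_signals(genome, 6, 18)
print(donor, acceptor, classify_splice_site(donor, acceptor))
```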


Exons and Introns

• Introns can interrupt the reading frame of a gene by inserting a sequence:
– between two consecutive codons
– between the first and second nucleotide of a codon
– or between the second and third nucleotide of a codon
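These three positions are commonly called intron phases 0, 1 and 2. A minimal sketch, assuming the lengths of the coding portions of each exon are known (the function name and the toy lengths are hypothetical): the phase of an intron is the number of coding bases upstream of it, modulo 3.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron between consecutive CDS exons.

    Phase 0: intron falls between two codons.
    Phase 1: intron falls after the first nucleotide of a codon.
    Phase 2: intron falls after the second nucleotide of a codon.
    """
    phases = []
    coding_bases_so_far = 0
    # There is one intron after every exon except the last.
    for length in cds_exon_lengths[:-1]:
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


# Four coding exons of 6, 4, 4 and 4 bases (18 coding bases in total).
print(intron_phases([6, 4, 4, 4]))
```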

"Exon and Intron classes”. Licensed under Fair use via Wikipedia

Prediction & Annotation

GENE PREDICTION & ANNOTATION

PREDICTION & ANNOTATION

• Identification and annotation of genome features:
– primarily focuses on protein-coding genes.
– also identifies RNAs (tRNA, rRNA, long and small non-coding RNAs (ncRNA)), regulatory motifs, repetitive elements, etc.
– happens in 2 phases:
1. Computation phase
2. Annotation phase


COMPUTATION PHASE

a. Experimental data are aligned to the genome: expressed sequence tags, RNA-sequencing reads, proteins (also from other species).

b. Gene predictions are generated:
- ab initio: based on nucleotide sequence and composition, e.g. Augustus, GENSCAN, geneid, fgenesh, etc.
- evidence-driven: identifying also domains and motifs, e.g. SGP2, JAMg, fgenesh++, etc.

Result: the single most likely coding sequence, no UTRs, no isoforms.

Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174


ANNOTATION PHASE

Experimental data (evidence) and predictions are synthesized into gene annotations.

Result: gene models that generally include UTRs, isoforms, evidence trails.

Yandell & Ence. Nature Rev 2012 doi:10.1038/nrg3174

[Figure: gene model showing the 5’ UTR and 3’ UTR.]

In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation.

CONSENSUS GENE SETS

Gene models may be organized into sets using:

• combiners for automatic integration of predicted sets, e.g. GLEAN, EvidenceModeler

or

• tools packaged into pipelines, e.g. MAKER, PASA, Gnomon, Ensembl, etc.

ANNOTATION: needs some refinement

No one is perfect, least of all automated annotation.

New technologies bring new challenges:
• Assembly errors can cause fragmented annotations
• Limited coverage makes precise identification a difficult task

MANUAL ANNOTATION: improving predictions

Precise elucidation of biological features encoded in the genome requires careful examination and review.

Schiex et al. Nucleic Acids 2003 (31)13:3738-3741

Automated Predictions

Experimental Evidence

Manual Annotation – to the rescue.

cDNAs, HMM domain searches, RNAseq, genes from other species.

GENOME CURATION: an inherently collaborative task

So many sequences, not enough hands.

Apis mellifera | Alexander Wild | www.alexanderwild.com

We have provided continuous training and support for hundreds of geographically dispersed scientists to conduct manual annotation efforts in order to recover coding sequences in agreement with all available biological evidence.

Collaboration is key!

APOLLO

• Collaborative work distills invaluable knowledge.

• A little training goes a long way! Wet lab scientists can easily learn to maximize the generation of accurate, biologically supported gene models.

Apollo

APOLLO: versatile genome annotation editing

• Apollo is a web-based genome annotation editor, integrated with JBrowse

• Supports real time collaboration & generates analysis-ready data

USER-CREATED ANNOTATIONS

EVIDENCE TRACKS

ANNOTATOR PANEL

BECOMING ACQUAINTED WITH APOLLO

General process of curation

1. Select or find a region of interest, e.g. scaffold.

2. Select appropriate evidence tracks to review the gene model.

3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working.

4. If necessary, adjust the gene model.

5. Check your edited gene model for integrity and accuracy by comparing it with available homologs.

6. Comment and finish.

Apollo - version at i5K Workspace@NAL

The Sequence Selection Window

Sort

“Old Track Select Page”

APOLLO: annotation editing environment

Color by CDS frame, toggle strands, set color scheme and highlights.

- Upload evidence files (GFF3, BAM, BigWig)
- combination track
- sequence search track

Query the genome using BLAT.

Navigation and zoom.

Search for a gene model or a scaffold.

Get coordinates and “rubber band” selection for zooming.

Login

User-created annotations.

New annotator panel.

Evidence Tracks

Stage and cell-type specific transcription data.

http://genomearchitect.org/web_apollo_user_guide


USER NAVIGATION

Annotator panel.

• Choose appropriate evidence from list of “Tracks” on annotator panel.

• Select & drag elements from evidence track into the ‘User-created Annotations’ area.

• Hovering over annotation in progress brings up an information pop-up.

• Creating a new annotation

Adding a gene model

Editing functionality

Example: Adding an exon supported by experimental data

• RNAseq reads show evidence in support of a transcribed product that was not predicted.
• Add exon by dragging up one of the RNAseq reads.

Example: Adjusting exon boundaries supported by experimental data

Curating with Apollo

• ‘Zoom to base level’ reveals the DNA Track.

• Color exons by CDS from the ‘View’ menu.

Zoom in/out with keyboard: shift + arrow keys up/down

• Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.

Annotating simple cases

“Simple case”:
- the predicted gene model is correct or nearly correct, and
- this model is supported by evidence that completely or mostly agrees with the prediction.
- evidence that extends beyond the predicted model is assumed to be non-coding sequence.

The following are simple modifications.

ANNOTATING SIMPLE CASES

• A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.

• Check ‘Start’ and ‘Stop’ signals after each edit.

ADDING EXONS

If transcript alignment data are available & extend beyond your original annotation, you may extend or add UTRs.

1. Right click at the exon edge and ‘Zoom to base level’.

2. Place the cursor over the edge of the exon until it becomes a black arrow, then click and drag the edge of the exon to the new coordinate position that includes the UTR.

ADDING UTRs

To add a new spliced UTR to an existing annotation, also follow the procedure for adding an exon.

To modify an exon boundary and match data in the evidence tracks: select both the [offending] exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.

In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.

MATCHING EXON BOUNDARY TO EVIDENCE

1. Two exons from different tracks sharing the same start/end coordinates display a red bar to indicate matching edges.

2. Selecting the whole annotation or one exon at a time, use this edge-matching function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square [ ] brackets to scroll from exon to exon. Use curly { } brackets to scroll from annotation to annotation.

3. Check if cDNA/RNAseq reads lack one or more of the annotated exons or include additional exons.
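The edge-matching idea can be illustrated outside the browser. The sketch below is not Apollo code: the function name, the `(start, end)` tuple representation, and the coordinates are all hypothetical. For each annotated exon it reports whether its start and its end coincide with an edge in an evidence track.

```python
def edge_matches(annotation_exons, evidence_exons):
    """For each annotated exon, report whether its start and end match evidence.

    Exons are (start, end) tuples. Starts are compared with starts and
    ends with ends, mirroring the red bars drawn on matching edges.
    """
    evidence_starts = {start for start, _ in evidence_exons}
    evidence_ends = {end for _, end in evidence_exons}
    return [
        (start in evidence_starts, end in evidence_ends)
        for start, end in annotation_exons
    ]


annotation = [(100, 200), (300, 420), (500, 650)]
rnaseq = [(100, 200), (300, 400), (480, 650)]

for exon, (start_ok, end_ok) in zip(annotation, edge_matches(annotation, rnaseq)):
    print(exon, "start matches:", start_ok, "end matches:", end_ok)
```

Edges that do not match (here the end of the second exon and the start of the third) are the boundaries worth inspecting at base level.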

CHECKING EXON INTEGRITY

Non-canonical splice sites flags.

Double click: selection of feature and sub-features

Evidence Tracks Area

‘User-created Annotations’ Track

Edge-matching

Apollo’s editing logic (brain):
• selects longest ORF as CDS
• flags non-canonical splice sites

ORFs AND SPLICE SITES

Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.

Canonical splice sites:

forward strand: 5’-…exon]GT/AG[exon…-3’

reverse strand, not reverse-complemented: 3’-…exon]GA/TG[exon…-5’
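A short sketch may help in recognizing a reverse-strand intron on screen. It takes a toy intron written 5’ to 3’ on its own strand and shows how it reads when simply reversed (3’ to 5’, not complemented) and how the same position reads on the forward strand. The sequence and names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Toy intron, written 5'->3' on the strand the gene is transcribed from.
intron = "GTAAGTCTCAG"

# The same bases written 3'->5' (reversed, not complemented): GA ... TG.
reversed_only = intron[::-1]

# What the forward strand shows at the same position, 5'->3': CT ... AC.
forward_view = reverse_complement(intron)

print(reversed_only[:2], reversed_only[-2:])
print(forward_view[:2], forward_view[-2:])
```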

SPLICE SITES

Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. GC donor), they should be flagged with a comment.

Exon/intron splice site error warning

Curated model

Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.

If ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further up or downstream, depending on evidence (proteins, RNAseq).

It may be present outside the predicted gene model, within a region supported by another evidence track.

In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
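The longest-ORF idea can be sketched as follows. This is an illustration only and not Apollo's actual implementation: the function name is hypothetical, the exons are assumed to be already spliced into one forward-oriented transcript, and only canonical ATG starts and TAA/TAG/TGA stops are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return (start, end) of the longest ATG...stop ORF, or None.

    Coordinates are 0-based, end-exclusive, relative to the spliced transcript.
    All three forward reading frames are scanned.
    """
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for another ORF further downstream
    return best


# Three toy exons spliced together into one transcript.
exons = ["CCATGGCC", "AAAGGT", "TGATTT"]
transcript = "".join(exons)
orf = longest_orf(transcript)
print(orf, transcript[orf[0]:orf[1]])
```

If the evidence points to a different in-frame ‘Start’, the curator overrides this default, which is why the slide asks you to check it after every edit.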

‘Start’ AND ‘Stop’ SITES

Annotating complex cases

Evidence may support joining two or more different gene models. Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!

1. In ‘User-created Annotations’ area, shift-click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu.

2. Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.

3. Check the resulting translation by querying a protein database, e.g. UniProt, NCBI nr. Add comments to record that this annotation is the result of a merge.

Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.

COMPLEX CASES: merge two gene predictions on the same scaffold

One or more splits may be recommended when:
- different segments of the predicted protein align to two or more different gene families
- predicted protein doesn’t align to known proteins over its entire length
- Transcript data may support a split, but first verify whether they are alternative transcripts.

COMPLEX CASES: split a gene prediction

DNA Track

‘User-created Annotations’ Track

COMPLEX CASES: annotate frameshifts and correct single-base errors

Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.

COMPLEX CASES: correcting selenocysteine-containing proteins

1. Apollo allows annotators to make single base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.

2. If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.

3. The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.

4. Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits.

5. In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
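The three kinds of modification described above can be mimicked on a plain string to see their effect on a coding sequence. This is a hedged sketch, not Apollo's code: `apply_alteration`, its argument names, and the 0-based positions are assumptions of the example. As in Apollo, the reference itself is never changed; each call returns a corrected copy.

```python
def apply_alteration(reference, kind, position, value):
    """Return a corrected copy of reference; the reference is left untouched.

    position is 0-based (assumption of this sketch).
    - "insertion": value is a string inserted to the right of position.
    - "deletion": value is the number of bases removed, starting at position.
    - "substitution": value is a string replacing bases starting at position.
    """
    if kind == "insertion":
        return reference[:position + 1] + value + reference[position + 1:]
    if kind == "deletion":
        return reference[:position] + reference[position + value:]
    if kind == "substitution":
        return reference[:position] + value + reference[position + len(value):]
    raise ValueError(f"unknown alteration kind: {kind}")


reference = "ATGGCCAAATGA"
print(apply_alteration(reference, "insertion", 2, "C"))     # one base added
print(apply_alteration(reference, "deletion", 3, 1))        # one base removed
print(apply_alteration(reference, "substitution", 3, "A"))  # one base changed
print(reference)                                            # still the original
```

A single inserted or deleted base shifts every downstream codon, which is why these edits should always be recorded in the ‘Comments’ section.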

COMPLEX CASES: annotating frameshifts and correcting single-base errors & selenocysteines

• Information Editor

The Annotation Information Editor

• Add PubMed IDs
• Include GO terms as appropriate from any of the three ontologies
• Write comments stating how you have validated each model.

• Keeping track of each edit

Annotations, annotation edits, and History: stored in a centralized database.

Follow the checklist until you are happy with the annotation!

And remember to…
– comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence.
– or add a comment to inform the community of unresolved issues you think this model may have.

Always remember: Apollo curation is a community effort, so please use comments to communicate the reasons for your annotation. Your comments will be visible to everyone.

COMPLETING THE ANNOTATION

Checklist

• Check ‘Start’ and ‘Stop’ sites.

• Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…

• Check if you can annotate UTRs, for example using RNA-Seq data:
– align it against relevant genes/gene family
– blastp against NCBI’s RefSeq or nr

• Check for gaps in the genome.

• Additional functionality may be necessary:
– merging 2 gene predictions - same scaffold
– ‘merging’ 2 gene predictions - different scaffolds
– splitting a gene prediction
– annotating frameshifts
– annotating selenocysteines, correcting single-base and other assembly errors, etc.

• Add:
– Important project information in the form of comments
– IDs from public databases, e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert.
– Comments about the kinds of changes you made to the gene model of interest, if any.
– Any appropriate functional assignments, e.g. via BLAST, RNA-Seq data, literature searches, etc.

CHECKLIST: for accuracy and integrity

MANUAL ANNOTATION CHECKLIST

Genome curation with i5k

i5K Workspace@NAL

The collaborative curation process at i5k

1. A computationally predicted consensus gene set has been generated using multiple lines of evidence; e.g. HVIT_v0.5.3-Models

2. i5K Projects will integrate consensus computational predictions with manual annotations to produce an updated Official Gene Set (OGS):

Warning!
• If it’s not on either track, it won’t make the OGS!
• If it’s there and it shouldn’t, it will still make the OGS!

The ‘Replace Models’ rules

http://tinyurl.com/apollo-i5k-replace

3. In some cases algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation. Use your judgment; try choosing a different model to begin the annotation.

4. Isoforms: drag original and alternatively spliced form to ‘User-created Annotations’ area.

5. If an annotation needs to be removed from the consensus set, drag it to the ‘User-created Annotations’ area and label as ‘Delete’ on the Information Editor.

6. Overlapping interests? Collaborate to reach agreement.

7. Follow guidelines for i5K Pilot Species Projects, at http://goo.gl/LRu1VY

Example

What’s new?... finding inspiration in PubMed.

“Molecular analysis of bed bug populations from across the USA and Europe found that >80% and >95% of the respective populations contained V419L and/or L925I mutations in the voltage-gated sodium channel gene, indicating widespread distribution of target-site-based pyrethroid resistance.”

Homalodisca vitripennis | Alexander Wild | www.alexanderwild.com

Halyomorpha halys | Fondazione Edmund Mach - Italy

Now for our species of interest. . .

Curation example using the Hyalella azteca genome (amphipod crustacean).

What do we know about this genome?

• Currently publicly available data at NCBI:
– >37,000 nucleotide seqs → scaffolds, mitochondrial genes
– 344 amino acid seqs → mitochondrion
– 47 ESTs
– 0 conserved domains identified
– 0 “gene” entries submitted

• Data at i5K Workspace@NAL (annotation hosted at USDA)
– 10,832 scaffolds: 23,288 transcripts: 12,906 proteins

PubMed Search: what’s new?

“Ten populations differed by at least 550-fold in sensitivity to pyrethroids.”

“Sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), shows that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.”

“The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”

How many sequences are there, publicly available, for our gene of interest?

• Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).

• NaCP60E (Sodium channel protein 60E; D. melanogaster).
– MF: voltage-gated cation channel activity (IDA, GO:0022843).
– BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
– CC: voltage-gated sodium channel complex (IEA, GO:0001518).

And what do we know about them?

Retrieving sequences for a sequence similarity search.

>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDGQMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR

BLAT search input

BLAT search results

• High-scoring segment pairs (hsp) are listed in tabulated format.

• Clicking on one line of results sends you to those coordinates.

BLAST at i5K https://i5k.nal.usda.gov/blast

BLAST at i5K: hsps in “BLAST+ Results” track

Creating a new gene model: drag and drop

• Apollo automatically calculates longest ORF.

• In this case, ORF includes the high-scoring segment pairs (hsp), marked here in blue.

• Note that gene is transcribed from reverse strand.

Available Tracks

Get Sequence

http://blast.ncbi.nlm.nih.gov/Blast.cgi

Also, flanking sequences (other gene models) vs. NCBI nr

In this case, two gene models upstream, at 5’ end.

BLAST hsps

Review alignments

HaztTmpM006234

HaztTmpM006233

HaztTmpM006232

Hypothesis for vgsc gene model

Editing: merge the three models

Merge by dropping an exon or gene model onto another.

or…

Merge by selecting two exons (holding down “Shift”) and using the right-click menu.

Result of merging the gene models:

Editing: correct offending splice site

Modify exon/intron boundary:
- Drag the end of the exon to the nearest canonical splice site.
or
- Use right-click menu.

Editing: set translation start

Editing: delete exon not supported by evidence

Delete first exon from HaztTmpM006233

Editing: add an exon supported by RNAseq

• RNAseq reads show evidence in support of transcribed product, which was not predicted.
• Add exon at coordinates 97946-98012 by dragging up one of the RNAseq reads.

Editing: adjust offending splice site using evidence

Editing: adjust other boundaries supported by evidence

Finished model

Corroborate integrity and accuracy of the model:
- Start and Stop
- Exon structure and splice sites …]5’-GT/AG-3’[…
- Check the predicted protein product vs. NCBI nr, UniProt, etc.

Information Editor

• DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq

• PubMed identifier: PMID:24065824

• Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518.

• Comments

• Name, Symbol

• Approve / Delete radio button

Comments (if applicable)

Go play!

PUBLIC DEMO

APOLLO ON THE WEB: instructions

At i5K

1. Register for access to Apollo at the i5K Workspace@NAL at https://i5k.nal.usda.gov/web-apollo-registration

2. Contact the coordinator for each species community to receive more information about how to contribute. Contact info is available on each organism’s page.

Public Honey bee demo available at:

http://GenomeArchitect.org/WebApolloDemo

Username: demo@demo.com

Password: demo

APOLLO: demonstration

Demonstration video is available at https://youtu.be/VgPtAP_fvxY

OUTLINE

• BIO-REFRESHER: biological concepts for curation

• ANNOTATION: automatic predictions

• MANUAL ANNOTATION: necessary, collaborative

• APOLLO: advancing collaborative curation

• EXAMPLE: demos

Apollo Development

Nathan Dunn, Technical Lead

Eric Yao

Christine Elsik’s Lab, University of Missouri

Suzi Lewis, Principal Investigator

BBOP

Moni Munoz-Torres, Project Manager

Deepak Unni

JBrowse. Ian Holmes’ Lab, University of California, Berkeley

• Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
• § Christine G. Elsik (PI). University of Missouri.
• * Ian Holmes (PI). University of California Berkeley.
• Arthropod genomics community & i5K Steering Committee.
• Stephen Ficklin, GenSAS, Washington State University
• Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231
• For your attention, thank you!

Apollo

Nathan Dunn

Deepak Unni §

Gene Ontology

Chris Mungall

Seth Carbon

Heiko Dietze

BBOP

Learn more about Apollo at http://GenomeArchitect.org

Thank you!

NAL at USDA

Monica Poelchau

Mei-Ju Chen

Christopher Childers

Gary Moore

HGSC at BCM

fringy Richards

Kim Worley

JBrowse Eric Yao *

Interface Updates

Annotator Panel

gene

mRNA

Update: Transforming coordinates

Bringing exons closer together to facilitate annotation of gene models with long introns.

1,275bp

Concept for Apollo v2.1 – Northern Spring 2016

Transforming coordinates

Assembly artifacts may cause gene models to be split across two or more scaffolds. To facilitate annotation, Apollo allows the generation of an artificial space where the annotation can be completed.

[Figure: genome assembly with Scaffold 1, Scaffold 2, …, Scaffold n placed side by side.]
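One way to picture such an artificial space is as a simple coordinate projection: scaffolds are laid end to end and each position is shifted by the offset of its scaffold. The sketch below is only a conceptual illustration under that assumption and does not describe how Apollo implements the feature; the function names, scaffold names, and lengths are hypothetical.

```python
def build_artificial_space(scaffold_lengths, order, padding=0):
    """Return the offset of each scaffold when laid end to end in the given order."""
    offsets = {}
    cursor = 0
    for name in order:
        offsets[name] = cursor
        cursor += scaffold_lengths[name] + padding
    return offsets


def to_artificial(offsets, scaffold, position):
    """Project a 0-based scaffold position into the artificial space."""
    return offsets[scaffold] + position


lengths = {"scaffold_1": 1000, "scaffold_2": 500}
offsets = build_artificial_space(lengths, ["scaffold_1", "scaffold_2"])

# Two exons of one gene model that the assembly split across scaffolds.
exon_a = to_artificial(offsets, "scaffold_1", 950)
exon_b = to_artificial(offsets, "scaffold_2", 25)
print(exon_a, exon_b)
```

In the projected space the two exons sit 75 positions apart, so they can be viewed and annotated as one model.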
