
PomBase conventions for improving annotation depth, breadth, consistency and accuracy


Annotation numbers are important… but numbers aren’t everything.

•  Downstream use of annotation for data mining and data analysis is limited by errors, inconsistencies and omissions.
•  PomBase uses a combination of annotation conventions, to improve information content (annotation coverage and specificity, with less redundancy), and QC mechanisms, to identify possible annotation inconsistencies and errors.
•  In combination, these mechanisms address many recurring annotation issues.

1. The definition is critical

All ontology terms have a “fixed” definition.
•  If a definition is misleading or incorrect, its meaning cannot be changed. Instead, the term is obsoleted and its annotations are migrated.
•  This makes annotations very robust to ontology changes: if a term needs to be repositioned, the annotations remain correct.
•  We annotate to the definition, not the term name. Always check the definition.

2. Improving annotation specificity

•  i) Consider descendant terms
•  ii) Veto use of uninformative terms

2i. Consider descendants

Annotate as specifically as the experiment allows, and be unambiguous about the biology:
•  regulation: positive or negative?
•  translation: cytoplasmic or mitochondrial?
•  transport: of what? to where? how?
•  chromosome segregation: mitotic or meiotic?

If the available terms are insufficient, request a more specific term.
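
Finding the candidate descendants of a broad term is a walk down the ontology graph. The sketch below assumes a tiny, hand-written is_a graph (the term set and the `descendants` helper are illustrative, not PomBase or GO tooling) and lists every term below a broad term, i.e. the options to consider before settling on the broad one.

```python
from collections import defaultdict

# Hypothetical toy ontology: child term -> its is_a parents.
# Real GO also uses part_of and regulates relations; only is_a is modelled here.
IS_A = {
    "cytoplasmic translation": {"translation"},
    "mitochondrial translation": {"translation"},
    "sister chromatid segregation": {"chromosome segregation"},
    "mitotic sister chromatid segregation": {"sister chromatid segregation"},
    "meiotic chromosome segregation": {"chromosome segregation"},
}


def descendants(term, is_a):
    """Return every term below `term` by walking the inverted is_a graph."""
    children = defaultdict(set)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].add(child)
    found, stack = set(), [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


for broad in ("translation", "chromosome segregation"):
    print(broad, "->", sorted(descendants(broad, IS_A)))
```

A term with no descendants is already as specific as the ontology allows; if it is still too vague for the data, that is the cue to request a new term.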

2i. Consider descendants, e.g.

•  For a carboxylic acid carrier, “carboxylic acid transport” looks initially OK.
•  However, “transmembrane transport” is not explicit here… carboxylic acid might be transported in other ways…

2i. Consider descendants, e.g.

More specific annotation can provide additional detail, e.g.
•  substrate
•  type (transmembrane)
•  sometimes directionality

Additional parents increase the information content, because the gene is annotated indirectly to more terms.
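
The gain in information content can be made concrete by computing what each annotation implies through is_a propagation. A minimal sketch, assuming a hand-written toy graph in which “carboxylic acid transmembrane transport” has two parents; `implied_terms` is an illustrative helper, not a PomBase function.

```python
# Hypothetical toy is_a graph: child term -> parent terms.
IS_A = {
    "carboxylic acid transmembrane transport": {
        "carboxylic acid transport",
        "transmembrane transport",
    },
    "carboxylic acid transport": {"transport"},
    "transmembrane transport": {"transport"},
}


def implied_terms(term, is_a):
    """The annotated term plus all of its ancestors (transitive closure)."""
    seen, stack = {term}, [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


broad = implied_terms("carboxylic acid transport", IS_A)
specific = implied_terms("carboxylic acid transmembrane transport", IS_A)
# Terms gained by choosing the more specific annotation:
print(sorted(specific - broad))
```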

2ii. Veto use of uninformative terms

Identify the set of ontology terms where more specific annotation should be possible (more biological detail). Examples:
•  cellular process -> which?
•  translation -> cytoplasmic? mitochondrial?
•  transport -> of what? to where?

Some GO terms are already flagged as not for manual annotation. Review and improve annotations to vetoed terms.

PomBase blocks a lot of the upper ontology levels: 1175 GO terms are blocked for annotation (only ~50 violations).
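
A veto list can be enforced as a simple QC check over the annotation set. In this sketch the blocklist and the gene/term pairs are hypothetical placeholders (the real list of 1175 terms is not reproduced); the check only reports annotations that use a blocked term, so a curator can review them.

```python
# Hypothetical blocklist of uninformative terms (not PomBase's real list).
DO_NOT_ANNOTATE = {"cellular process", "translation", "transport"}

# Hypothetical (gene, GO term) annotations.
ANNOTATIONS = [
    ("geneA", "transport"),
    ("geneA", "carboxylic acid transmembrane transport"),
    ("geneB", "cytoplasmic translation"),
    ("geneC", "cellular process"),
]


def veto_violations(annotations, blocked):
    """Return the annotations that use a blocked term, in input order."""
    return [(gene, term) for gene, term in annotations if term in blocked]


for gene, term in veto_violations(ANNOTATIONS, DO_NOT_ANNOTATE):
    print(f"{gene}: '{term}' is blocked; annotate to a more specific term")
```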

3. Improve the ontologies

3i. Missing parents

Original arrangement:

These process annotations were originally in different branches of the ontology, so annotations to all of them were required.

New arrangement:

Collapsed 6 processes to 2. Exactly the same information content, less redundancy, and the annotation is easier for users to interpret.
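
Once the missing parents exist, redundant annotations can be collapsed mechanically: any annotated term that is an ancestor of another annotated term adds nothing. A minimal sketch with a hypothetical two-branch toy graph (placeholder term names), reproducing the “6 processes to 2” pattern and checking that the implied term set is unchanged.

```python
# Hypothetical toy is_a graph: two chains of three terms each.
IS_A = {
    "process 1a": {"process 1b"},
    "process 1b": {"process 1c"},
    "process 2a": {"process 2b"},
    "process 2b": {"process 2c"},
}


def ancestors(term, is_a):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def closure(terms, is_a):
    """Everything implied by a set of annotations (terms plus ancestors)."""
    out = set(terms)
    for term in terms:
        out |= ancestors(term, is_a)
    return out


def most_specific(terms, is_a):
    """Drop every term that is an ancestor of another annotated term."""
    implied = set()
    for term in terms:
        implied |= ancestors(term, is_a)
    return {term for term in terms if term not in implied}


annotated = {"process 1a", "process 1b", "process 1c",
             "process 2a", "process 2b", "process 2c"}
collapsed = most_specific(annotated, IS_A)
print(len(annotated), "->", len(collapsed), sorted(collapsed))
```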

3ii. Report incorrect parents

AKA “True Path Violations” or “TPVs”. For example:

protein maturation
--protein processing (part_of)
----proteolysis (part_of)

(not all proteolysis is processing or maturation)
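
One way a TPV can surface is as a contradiction between a propagated annotation and an explicit negative (NOT) annotation for the same gene. The sketch assumes a hypothetical protease annotated to proteolysis and NOT to protein maturation, and compares propagation over the incorrect link with propagation once the link is removed; the graphs and the `conflicts` helper are illustrative only.

```python
# Hypothetical graphs: child -> parents; is_a and part_of both propagate.
WITH_TPV = {
    "proteolysis": {"protein processing"},         # the incorrect part_of link
    "protein processing": {"protein maturation"},  # part_of
}
FIXED = {
    "protein processing": {"protein maturation"},  # bad link removed
}


def ancestors(term, graph):
    """All terms reachable from `term` via parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in graph.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def conflicts(positive, negative, graph):
    """(term, ancestor) pairs where a positive annotation propagates to a
    term the same gene is explicitly NOT annotated to."""
    return [(term, anc) for term in positive
            for anc in sorted(ancestors(term, graph)) if anc in negative]


positive = ["proteolysis"]         # hypothetical degradative protease
negative = {"protein maturation"}  # curated NOT annotation
print("with TPV:", conflicts(positive, negative, WITH_TPV))
print("fixed:   ", conflicts(positive, negative, FIXED))
```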

4. The power of Annotation Extensions

Annotation extensions (AEs) provide additional specificity for a GO annotation, e.g.
•  Target gene (kinase substrate, TF regulation target)
•  Location of a function
•  Localization dependencies (protein A localizes protein B)
•  Spatial and temporal aspects of processes, functions and locations (cell cycle stage of occurrence)
•  ADD an example of a gene product-specific AE

See: Huntley et al. A method for increasing expressivity of Gene Ontology annotations using a compositional approach. PMID:24885854

4. Annotation Extension, e.g. cdc2

cyclin-dependent protein serine/threonine kinase
•  has substrate jh2, involved in negative regulation of conjugation with cellular fusion
•  directly inhibits srw1, involved in positive regulation of G1/S transition
•  has substrate drc1, involved in positive regulation of mitotic cell cycle DNA replication
•  has substrate cdc18, orc2, involved in negative regulation of DNA replication during mitotic G2 phase
•  has substrate xlf1, involved in negative regulation of double-strand break repair via nonhomologous end joining, during mitotic G2 phase
•  has substrate rap1, involved in negative regulation of mitotic telomere tethering at nuclear periphery during mitotic M phase
•  has substrate hcn1, during mitotic M phase
•  has substrate cut3, involved in positive regulation of mitotic chromosome condensation during mitotic metaphase
•  has substrate mde4, involved in correction of merotelic attachment, mitotic during mitotic metaphase
•  has substrate nsk1, involved in negative regulation of attachment of mitotic spindle microtubules during mitotic metaphase
•  has substrate mde4, cut7, involved in negative regulation of mitotic spindle elongation during mitotic metaphase
•  has substrate klp9, involved in negative regulation of mitotic spindle elongation during mitotic anaphase A
•  directly inhibits clp1, involved in negative regulation of exit from mitosis
•  has substrate byr4, involved in positive regulation of septation initiation signaling
•  directly inhibits dis2
•  has substrate rum1, crb2, sds23

Link function (cyclin-dependent kinase) to target genes, processes and temporal information.
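
An annotation extension can be modelled as a tuple of (relation, target) pairs attached to an ordinary GO annotation, which makes the cdc2 annotations queryable. The sketch encodes three of the cdc2 annotations listed above; the class names are hypothetical, and the relation labels reuse the slide's wording (“has substrate”, “involved in”, “during”) rather than formal relation identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extension:
    relation: str  # slide wording, e.g. "has substrate", "during"
    target: str    # a gene or an ontology term


@dataclass(frozen=True)
class Annotation:
    gene: str
    term: str
    extensions: tuple = ()


CDK = "cyclin-dependent protein serine/threonine kinase"
ANNOTATIONS = [
    Annotation("cdc2", CDK, (
        Extension("has substrate", "cut3"),
        Extension("involved in", "positive regulation of mitotic chromosome condensation"),
        Extension("during", "mitotic metaphase"))),
    Annotation("cdc2", CDK, (
        Extension("has substrate", "klp9"),
        Extension("involved in", "negative regulation of mitotic spindle elongation"),
        Extension("during", "mitotic anaphase A"))),
    Annotation("cdc2", CDK, (
        Extension("has substrate", "hcn1"),
        Extension("during", "mitotic M phase"))),
]


def substrates(annotations, during=None):
    """Substrates recorded in the extensions, optionally limited to one stage."""
    found = set()
    for ann in annotations:
        pairs = {(ext.relation, ext.target) for ext in ann.extensions}
        if during is None or ("during", during) in pairs:
            found |= {target for rel, target in pairs if rel == "has substrate"}
    return found


print(sorted(substrates(ANNOTATIONS, during="mitotic metaphase")))
```

This sketch matches stage terms literally; a real query would also use the ontology, e.g. to treat mitotic metaphase as part of mitotic M phase.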

Alternative (human CDK1):

Not scalable or maintainable.

4. Using AEs for effectors

•  The reciprocal of the extension is generated automatically and called “target of”
•  This collects the known “upstream effectors” on the cdc2 page
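
Generating the reciprocal “target of” view amounts to inverting the extension links. A sketch, assuming the extensions have been flattened into (effector, relation, target) triples; the triples are a subset of the cdc2 list and `target_of` is a hypothetical helper.

```python
from collections import defaultdict

# (effector, relation, target) triples from cdc2 annotation extensions (subset).
TRIPLES = [
    ("cdc2", "has substrate", "cut3"),
    ("cdc2", "has substrate", "klp9"),
    ("cdc2", "directly inhibits", "clp1"),
    ("cdc2", "directly inhibits", "srw1"),
]


def target_of(triples):
    """Invert effector -> target links into target -> [(relation, effector)]."""
    upstream = defaultdict(list)
    for effector, relation, target in triples:
        upstream[target].append((relation, effector))
    return dict(upstream)


for gene, effectors in sorted(target_of(TRIPLES).items()):
    for relation, effector in effectors:
        print(f"{gene}: target of {effector} ({relation})")
```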

4. Using Annotation Extensions to generate networks/pathways

•  We can use effector-substrate connections to generate networks (interaction, metabolic, regulatory)
•  These provide directional links to support pathway reconstruction

(Figure: example network; nodes include sty1, cmk2, srk1, rum1, atf1, gsa1, gpx1, ntp1, sro1, ish1)
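
Because each extension is directional (effector -> target), the links can be loaded as a directed graph and searched for paths. A minimal sketch using breadth-first search; the gene names and links are hypothetical placeholders, not the network in the figure.

```python
from collections import defaultdict, deque

# Hypothetical (effector, relation, target) links taken from AEs.
TRIPLES = [
    ("kinA", "has substrate", "kinB"),
    ("kinB", "has substrate", "tfC"),
    ("tfC", "regulates transcription of", "geneD"),
    ("kinA", "directly inhibits", "phosE"),
]


def build_network(triples):
    """Directed adjacency: effector -> set of targets."""
    graph = defaultdict(set)
    for effector, _relation, target in triples:
        graph[effector].add(target)
    return graph


def directed_path(graph, start, goal):
    """Shortest effector -> ... -> target path by breadth-first search, or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


NETWORK = build_network(TRIPLES)
print(directed_path(NETWORK, "kinA", "geneD"))
```

Keeping the edges directed is what lets the network support pathway reconstruction: the path from kinA to geneD exists, but there is no path back.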

4. Automated AE networks, e.g.

44/59 connected in an automated network based on annotated connections within “regulation of G2/M transition” (fission yeast). (A network for each GO slim category, from the slim page.)
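
A figure such as “44/59 connected” can be computed by restricting the extension links to genes in one slim category and counting the genes that take part in at least one link. The sketch assumes this definition of “connected” and uses hypothetical genes and links.

```python
# Hypothetical members of one slim category and hypothetical AE links.
CATEGORY = {"g1", "g2", "g3", "g4", "g5"}
TRIPLES = [
    ("g1", "has substrate", "g2"),
    ("g2", "directly inhibits", "g3"),
    ("g4", "has substrate", "gX"),  # target outside the category
]


def connected_members(members, triples):
    """Members linked to another member of the same category by an AE."""
    linked = set()
    for effector, _relation, target in triples:
        if effector in members and target in members and effector != target:
            linked.update((effector, target))
    return linked


linked = connected_members(CATEGORY, TRIPLES)
print(f"{len(linked)}/{len(CATEGORY)} connected")
```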

5. Suppress redundant IEA annotation

•  PomBase pipelines filter redundant IEA (Inferred from Electronic Annotation) evidence
•  This removes >90% of IEAs (because a manual annotation to the same or a more specific term already exists)
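
The filtering rule can be sketched as: drop an IEA annotation when the same gene already has a manual (non-IEA) annotation to that term or to any descendant of it, because the manual annotation then implies the IEA. This assumes is_a-only propagation over a hypothetical toy graph with made-up annotations; the NAS/TAS/IC filtering mentioned on the slides is not modelled.

```python
from collections import defaultdict

# Hypothetical toy is_a graph: child term -> parent terms.
IS_A = {
    "carboxylic acid transmembrane transport": {
        "carboxylic acid transport",
        "transmembrane transport",
    },
    "carboxylic acid transport": {"transport"},
    "transmembrane transport": {"transport"},
}


def ancestors(term, is_a):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def filter_redundant_iea(annotations, is_a):
    """annotations: (gene, term, evidence_code) tuples; keeps input order."""
    covered = defaultdict(set)  # gene -> terms implied by its manual annotations
    for gene, term, evidence in annotations:
        if evidence != "IEA":
            covered[gene] |= {term} | ancestors(term, is_a)
    return [(gene, term, evidence) for gene, term, evidence in annotations
            if evidence != "IEA" or term not in covered[gene]]


ANNOTATIONS = [
    ("geneA", "carboxylic acid transmembrane transport", "IDA"),
    ("geneA", "transmembrane transport", "IEA"),
    ("geneA", "transport", "IEA"),
    ("geneA", "ion transport", "IEA"),
]
for row in filter_redundant_iea(ANNOTATIONS, IS_A):
    print(row)
```

Any IEA that survives (here “ion transport”) raises the question posed later on the slides: is the mapping wrong, is a parent missing from the ontology, or is a manual annotation missing?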

5. Suppress redundant IEA annotation

13 annotations are reduced to 4.

Same information, fewer terms.

5. Suppress redundant IEA, QC of mappings

Incorrect annotations are more easily spotted: Mis16 is not involved in ‘chromatin modification’ -> fix mapping.

5. Suppress redundant IEA, QC of ontology (SPBC543.05c)

Missing parents in the ontology are more obvious: “inorganic anion exchanger” should be an ‘ancestor’ of GO:0005452, to suppress the IEA as redundant.

5. Suppress redundant IEA annotation

•  >40,000 fission yeast IEAs are available.
•  PomBase filters 36,000 as redundant and retains 4,000 (so IEAs are at least 90% accurate, if the manual annotation is correct).
•  It is easier to evaluate the remaining IEAs to identify/fix anomalies.

(Figure: reducing IEAs over time)

5. Suppress redundant IEA

•  More concise view with zero loss of information
•  IEA mappings derived from a single experiment/publication can be interpreted as proof by repetition, and make weak EXP data appear multiply supported/acceptable
•  Fewer annotations, easier QC of the remaining IEAs

Q: “Why isn’t an IEA covered by manual annotation?” Either:
1.  Incorrect mapping
2.  Missing parent in ontology
3.  Missing annotation -> find supporting evidence and annotate manually (EXP or ISO)

(PomBase also filters NAS/TAS/IC)

6. Annotate by process (pathway)

•  Annotating by process rather than “ad hoc” improves consistency and allows ‘annotation gaps’ to be targeted
•  Papers are processed more quickly: you become familiar with an area of biology and the techniques used, don’t need to read the background every time, and recognise phenotypes.

6. Annotate by process or pathway

From PMID:22898774

Regulation of the metaphase/anaphase transition by the MCC, the APC and upstream signalling. Identify obvious missing annotation, for example between complex members.

6. Annotate by process or pathway

(Figure: network of cdc20, proteasome, APC, separase, cohesin subunit, securin, post-transition, SAC/MCC)

QC can be performed on processes or components, e.g. use STRING to evaluate outliers (potential annotation errors). Input list: “regulation of mitotic metaphase/anaphase transition”.

We can also ask “are any complex members missing?”
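
The “are any complex members missing?” question can be asked automatically: if most subunits of a complex are annotated to a process and some are not, the unannotated ones are candidates for a missing annotation. A sketch with hypothetical subunit names; the 50% threshold is an arbitrary assumption, not a PomBase rule.

```python
# Hypothetical complex membership and process annotations.
COMPLEX = {"sub1", "sub2", "sub3", "sub4"}
ANNOTATED_TO_PROCESS = {"sub1", "sub2", "sub3", "otherGene"}


def missing_members(complex_members, annotated, min_fraction=0.5):
    """Members lacking the process annotation, reported only when at least
    `min_fraction` of the complex already carries it."""
    have = complex_members & annotated
    if len(have) < min_fraction * len(complex_members):
        return set()
    return complex_members - annotated


print(sorted(missing_members(COMPLEX, ANNOTATED_TO_PROCESS)))
```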

7. Assess annotation at the organismal level

•  We are annotating whole organisms… use a holistic, whole-annotation approach
•  Evaluate annotation breadth (coverage) using slims
•  Evaluate intersections between slim processes

7. Evaluate organismal annotation coverage using “slims”

•  EXP-supported BP
•  ISO/IEA-inferred BP
•  ‘Unknowns’:
   •  species-specific, no inference possible
   •  conserved, but unannotated in any species
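
Slim coverage can be sketched by propagating each gene's process annotations up the ontology and recording which slim terms they reach. The toy graph, slim and gene annotations below are hypothetical; genes that reach no slim term (including genes with no process annotation) are collected separately.

```python
# Hypothetical toy is_a graph and slim (a small set of high-level terms).
IS_A = {
    "cytoplasmic translation": {"translation"},
    "ribosomal large subunit biogenesis": {"ribosome biogenesis"},
    "mitotic sister chromatid segregation": {"chromosome segregation"},
}
SLIM = {"translation", "ribosome biogenesis", "chromosome segregation"}

# Hypothetical gene -> directly annotated biological process terms.
GENE_TERMS = {
    "g1": {"cytoplasmic translation"},
    "g2": {"cytoplasmic translation", "ribosomal large subunit biogenesis"},
    "g3": {"mitotic sister chromatid segregation"},
    "g4": set(),  # no process annotation: an 'unknown'
}


def ancestors(term, is_a):
    """All terms reachable from `term` by following parent links."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def slim_counts(gene_terms, slim, is_a):
    """Genes per slim term (each gene counted once per term), plus the genes
    that reach no slim term at all."""
    counts = {term: 0 for term in slim}
    unmapped = set()
    for gene, terms in gene_terms.items():
        hits = set()
        for term in terms:
            hits |= ({term} | ancestors(term, is_a)) & slim
        for term in hits:
            counts[term] += 1
        if not hits:
            unmapped.add(gene)
    return counts, unmapped


COUNTS, UNMAPPED = slim_counts(GENE_TERMS, SLIM, IS_A)
print(COUNTS, "unmapped:", sorted(UNMAPPED))
```

A gene can fall into more than one slim category (g2 here); those overlaps are what the intersection checks examine.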

7. Browsable slim

7. Sensible assignments?

Periodically check that the contents of each slim class look sensible (e.g. DNA recombination).

7. Visual slim, all pombe proteins

(Figure: slim categories with gene counts, TOTAL 5054. Groups: CELL DIVISION 751; MITOCHONDRIAL ORG/EXP 280; MEMBRANES, TRAFFICKING, CELL SURFACE 787; SMALL MOLECULE TM TRANSPORT 288; CENTRAL MET, ENERGY AND BUILDING BLOCKS 549; EXPRESSION 1294; PROTEIN ASSEMBLY/STABILITY 765; signalling 404; sexual reproductive process 262 (many intersections); Other 290 (no intersections; includes adhesion, many proteases, peroxins); Unknown 830)

7. Evaluate intersections between slim categories

Many GO processes are rarely co-annotated because they are functionally, spatially or temporally distant. For example, we would not expect “ribosome biogenesis” to intersect with “vitamin metabolism”. We can use this observation to identify potential conflicts.
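
Intersections can be computed pairwise from the gene sets of each slim category, and pairs expected to be disjoint can be flagged for review. The gene sets and the expected-disjoint list below are hypothetical; in practice the expected-disjoint pairs would come from curator judgement, as in the ribosome biogenesis / vitamin metabolism example.

```python
from itertools import combinations

# Hypothetical slim category -> genes annotated to it (after propagation).
SLIM_GENES = {
    "ribosome biogenesis": {"g1", "g2", "g9"},
    "vitamin metabolism": {"g5", "g9"},
    "cytoplasmic translation": {"g1", "g3"},
}
# Hypothetical pairs that should rarely or never share genes.
EXPECTED_DISJOINT = {frozenset({"ribosome biogenesis", "vitamin metabolism"})}


def slim_intersections(slim_genes):
    """Shared genes for every unordered pair of slim categories."""
    return {(a, b): slim_genes[a] & slim_genes[b]
            for a, b in combinations(sorted(slim_genes), 2)}


def flag_conflicts(slim_genes, expected_disjoint):
    """Expected-disjoint pairs that nevertheless share genes."""
    return {pair: sorted(genes)
            for pair, genes in slim_intersections(slim_genes).items()
            if genes and frozenset(pair) in expected_disjoint}


print(flag_conflicts(SLIM_GENES, EXPECTED_DISJOINT))
```

Each flagged gene is then checked by hand: the cause may be an annotation error, an incorrect mapping or an ontology error, as in the examples that follow.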

(Figures: slim intersections, Oct 2014, Feb 2015 and March 2016)

7. Identifies ontology errors (e.g.)

DNA metabolism and chromosome segregation do not usually intersect. Regulation of chromosome condensation should not be a DNA metabolic process.

7. Ontology error (e.g.)

Folic acid is classified as an amino acid by ChEBI, so folate metabolism is also classified under amino acid metabolism. This needs to be fixed in ChEBI, which will fix GO.

Genes annotated to folic acid metabolism were also incorrectly annotated to amino acid metabolism.

7. Finds incorrect mappings (e.g.)

Intersection between tRNA metabolism and transcription: Elongator is no longer thought to have a direct role in transcription, so the mapping was removed.

8. Consider author intent

Think about the biology the author intended, e.g.
•  rubidium ion transmembrane transporter/transport: rubidium ion is used as an assay for K+ transport, not for rubidium itself (a non-physiological substrate)
•  Apoptosis (RPS19): the Rps19 mutant displayed condensed DNA, a fragmented nucleus and caspase activation, indicative of apoptosis. Since RPS19 has an essential role in ribosome biogenesis, apoptosis is likely to be an indirect effect of disrupting an upstream process, translation (i.e. an experimental readout).

9. Communication with the author and community curation

•  Most authors are happy to discuss their publications. If unsure about an annotation, ask them. PomBase routinely uses the authors as a QC step to refine annotation.
•  Most authors are happy to curate their own papers (especially PhD/postdoc/recent papers). >40% of *** papers assigned to the community have been returned with high-quality annotation.

9. Community curation

•  … Authors also curate their own recent papers

Co-curation by author and curator improves annotation quality.

Some example sessions

•  http://tinyurl.com/q2bgyqv
•  http://tinyurl.com/p7d979b
•  http://tinyurl.com/o72bzul

Very detailed annotation is possible because Canto guides the user step by step to construct genotypes and ontology-based annotations. “Drill down” to more specific terms is assisted, and prompts are provided for AEs of specified types for certain terms.

Is it successful?

•  Numbers
•  Show increased uptake graph?
•  Accuracy high, but often omissions
•  Once people have done one, they are happy to repeat
•  Better on subsequent sessions
•  More success closer to publication date
•  Motivation: publication visibility (data dissemination)

10. Prioritise error fixing

•  Fixing known errors takes precedence over new annotation, like critical bugs in code
•  Even small errors often uncover larger issues, and fixing them can fix many problems simultaneously across multiple species
•  Prevents propagation of annotation errors


Summary


Spareslides

8. GO vs. phenotype

•  GO annotations should reflect a gene's direct involvement in, or role in regulating, processes or functions. In contrast, phenotype annotations indicate that a mutation causes a change in a process, but may reflect downstream or indirect effects.
•  For example: ER membrane defect -> nuclear envelope defect -> chromosome decondensation defect -> defects in the next round of DNA replication. Clearly a DNA replication phenotype alone is not enough to make a “DNA replication” GO annotation.
•  At PomBase we only make GO annotations based on phenotypes if:
   i) the phenotype is known to be completely determinative for the process
   ii) additional data support GO inference from the phenotype (location, orthology)
•  Often the experiments do not definitively resolve the exact process that the gene is involved in.
•  Intersections between processes are useful for identifying annotation errors caused by indirect annotation.