Persistent identifiers for museum specimens, NeIC workshop, August 2015
- 1. Persistent Identifiers, NeIC workshop August 2015 in Oslo
Dag Endresen, GBIF Norway, UiO Natural History Museum
- 2. The purpose of identifiers is to name things,
making it possible to refer to them.
- 3. Name ambiguity: Many things (in GBIF) are named 123
Catalog number: 123, GBIF ID: 543392241, urn:catalog:CAS:BOT:123, Bigelowia juncea
Catalog number: 123, GBIF ID: 1030591721, UAMb:Herb:123, Sphagnum girgensohnii
Catalog number: 123, GBIF ID: 893477175, Parides erithalion
Catalog number: 123, GBIF ID: 1050327334, Cinchona ledgeriana
Catalog number: 123, GBIF ID: 231564351, Umbrina canariensis
Catalog number: 123, GBIF ID: 931031820, Bromus kalmii
Catalog number: 123, GBIF ID: 283363, urn:occurrence:Arctos:MVZ:Egg:123:164, Mercurialis ovata
Catalog number: 123, GBIF ID: 896547722, urn:occurrence:Arctos:MVZ:Egg:123:164, Contopus sordidulus veliei
- 4. When is the identifier good enough?
Unique and persistent, within a given context.
The common experience is that an identifier is created within a system or within a context, and that at a later date it needs to be used in another or larger context (Karen Coyle 2006).
Expanding context:
1. Within one museum collection (catalog number).
2. Within a network between museum collections (collection code + catalogue number).
3. Within a biodiversity information network (institution code + collection/dataset code + catalogue number).
4. On the Internet (e.g. http URI, DOI, LSID, etc.).
5. Larger contexts are possible to imagine in the future!
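
The expanding contexts listed above can be sketched as progressively qualified identifiers. This is a minimal illustration in Python. The context labels (`collection`, `museum_network`, `biodiversity_network`) and the colon-joined string format are assumptions made for the example, not a GBIF or museum convention. The codes `CAS`, `BOT`, `UAMb` and `Herb` are taken from the identifier strings on slide 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Specimen:
    institution_code: str
    collection_code: str
    catalog_number: str


def identifier_in_context(s: Specimen, context: str) -> str:
    """Return an identifier that is meant to be unique within the given context.

    The context names and the colon-joined format are hypothetical.
    """
    if context == "collection":
        return s.catalog_number
    if context == "museum_network":
        return f"{s.collection_code}:{s.catalog_number}"
    if context == "biodiversity_network":
        return f"{s.institution_code}:{s.collection_code}:{s.catalog_number}"
    raise ValueError(f"unknown context: {context}")


a = Specimen("CAS", "BOT", "123")
b = Specimen("UAMb", "Herb", "123")

# The bare catalog number is only unique inside one collection, so it collides here.
print(identifier_in_context(a, "collection") == identifier_in_context(b, "collection"))

# The fully qualified form tells the two specimens apart.
print(identifier_in_context(a, "biodiversity_network"))
print(identifier_in_context(b, "biodiversity_network"))
```

Each wider context needs one more qualifier, which is why an identifier minted for a single collection tends to be ambiguous once it is aggregated elsewhere.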
- 5. Expanding context
[Figure labels: Internet, Museum, Identifier]
- 6. Identifiers for museum collections
The longevity of museums leads to:
the need to use identifiers from our past in the current highly networked digital systems (Karen Coyle 2006 [talking about libraries]).
Specify a namespace for the identifiers?
URI: uniform resource identifier (unique in the context of the web).
URN: uniform resource name (name not tied to location).
URL: uniform resource locator (network location as identifier).
PURL: persistent URL (commitment to service longevity).
Something else?
DOI: digital object identifier
ARK: archival resource key
UUID: universally unique identifier
- 7. Persistent Identifier (PID)
Globally Unique Identifier (GUID)
Uniform Resource Identifier (URI)
Persistent Uniform Resource Locator (PURL)
Life Science Identifier (LSID)
Digital Object Identifier (DOI)
Handle System (Handle)
Archival Resource Key (ARK, EZID)
Universally Unique Identifier (UUID)
- 8. Photo: Smithsonian National Museum of Natural History, USNM-445024-Eutoxeres-aquila
PURL
Reuse existing identifiers
- 9. Globally unique
Scalability, number of IDs
Community acceptance
Long-term life-cycle
Resolvable, resolution service(s)
Cost per identifier
People-friendly or machine-friendly
Solution for the generation of new IDs
Central generation, PID issuer
Distributed generation at source
- 10. A UUID is a 16-octet (128-bit) number, written as 36 characters.
Example: 41d9cbb4-4590-4265-8079-ca44d46d27c3
The probability of one duplicate would be about 50% if every person on earth created 600 million UUIDs.
Allows for easy generation at source in a distributed network.
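
The properties on this slide can be checked with Python's standard `uuid` module. The sketch below generates an identifier locally, confirms the 16-octet and 36-character sizes, and estimates how many random UUIDs are needed before a duplicate becomes likely. The estimate uses the usual birthday-bound approximation and assumes version 4 UUIDs with 122 random bits. The helper name `uuids_for_collision_probability` is made up for this example.

```python
import math
import uuid

# Generate a random (version 4) UUID locally, with no central issuer involved.
new_id = uuid.uuid4()
print(new_id)

# 16 octets in binary form, 36 characters in the canonical text form.
print(len(new_id.bytes), len(str(new_id)))

# Parse the example from the slide and inspect its version field.
example = uuid.UUID("41d9cbb4-4590-4265-8079-ca44d46d27c3")
print(example.version)


def uuids_for_collision_probability(p: float, random_bits: int = 122) -> float:
    """Birthday-bound approximation: n ~ sqrt(2 * N * ln(1 / (1 - p)))."""
    space = 2 ** random_bits
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


n_half = uuids_for_collision_probability(0.5)
print(f"{n_half:.2e} UUIDs for a 50% chance of at least one duplicate")
print(f"{n_half / 600e6:.2e} people holding 600 million UUIDs each")
```

Under these assumptions the 50% point lies at a few times 10^18 identifiers, which corresponds to billions of people holding 600 million UUIDs each. That is the same order of magnitude as the figure quoted on the slide.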
- 11. http PURL UUID
http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
- 12. Identifier -> Resolver -> Location -> Specimen
The resolver is a system to resolve locations from identifiers, enabling retrieval even when the location changes.
http://purl.org/nhmuio/id/[UUID] -> http://gbif.no/resolver/[UUID]
Non-information object (http redirect): http 303 redirect
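
The redirect step can be sketched as a pure function that maps a PURL to an HTTP status and a `Location` header. The two URL prefixes and the 303 status come from the slide. The function name `resolve`, the UUID validation, and the 404 response for unknown input are assumptions added for illustration, since the slides do not describe how the actual PURL service or the gbif.no resolver is configured.

```python
import uuid

PURL_PREFIX = "http://purl.org/nhmuio/id/"
RESOLVER_PREFIX = "http://gbif.no/resolver/"


def resolve(identifier_url: str) -> tuple[int, dict[str, str]]:
    """Return (status, headers) for a request to a specimen PURL.

    Hypothetical sketch: a well-formed PURL yields a 303 See Other redirect
    to the resolver, anything else yields 404 (an assumed behaviour).
    """
    if not identifier_url.startswith(PURL_PREFIX):
        return 404, {}
    suffix = identifier_url[len(PURL_PREFIX):]
    try:
        specimen_id = uuid.UUID(suffix)
    except ValueError:
        return 404, {}
    return 303, {"Location": f"{RESOLVER_PREFIX}{specimen_id}"}


status, headers = resolve(
    "http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3"
)
print(status, headers["Location"])
```

Because the identifier and the location are separate strings, the resolver prefix can change later without touching the identifier printed on the specimen label.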
- 13. http://purl.org/nhmuio/id/UUID -> http://gbif.no/resolver/UUID
http://purl.org/gbifnorway/id/UUID -> http://gbif.no/resolver/UUID
- 14. Including machine-readable formats
- 15. Catalog number: O-L-000014
http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3
- 16. UUID QR codes for museum objects at NHM-UiO provide:
Machine-readable identifiers (using a simple smartphone or a barcode reader).
New and efficient workflows for collection management.
Deployment of stable identifiers appropriate for databasing.
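
One way such a workflow might consume a scanned label is sketched below. It assumes that the QR code encodes the full PURL of the specimen, which the slides suggest but do not state. The function name `uuid_from_scanned_label` is hypothetical. Decoding the QR image itself is left to the scanner or phone app.

```python
import uuid
from urllib.parse import urlparse


def uuid_from_scanned_label(payload: str) -> uuid.UUID:
    """Extract the specimen UUID from the text decoded from a QR label.

    Assumes the payload is a URL whose last path segment is the UUID.
    Raises ValueError if that segment is not a valid UUID.
    """
    path = urlparse(payload.strip()).path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    return uuid.UUID(last_segment)


scanned = "http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3"
specimen_id = uuid_from_scanned_label(scanned)
print(specimen_id)
```

Validating the scanned text before it reaches the database catches the case where an operator scans a different barcode, for example one that carries only the human-readable catalog number.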
- 17. http://purl.org/nhmuio/id/41d9cbb4-4590-4265-8079-ca44d46d27c3 (machine friendly)
Catalog number: O-L-000014 (human friendly)
Efficient workflow routines
- 18. http://gbif.no/transcribe/
- 19. Some key challenges for the group work
Many of the original source datasets indexed by GBIF are regularly updated and re-indexed by the GBIF portal. Without stable and persistent identifiers, information on the same herbarium specimen (or species observation) is sometimes included more than once, leading to duplicated information: duplicated in the sense of more than one (unlinked) data record for the same real-world entity.
Without stable and persistent identifiers for herbarium specimens (and species observations) it is difficult to link the same data record indexed at different re-indexing cycles of the GBIF portal. When a data record previously indexed is not re-identified in a new version of a given dataset, then the record is deleted from the portal, and the link to previous versions of this data record is lost.
A composite key identifier (such as the Darwin Core triplet) based on a combination of the metadata attributes for institution code (dwc:institutionCode), collection code (dwc:collectionCode), and the local specimen identifier (dwc:catalogNumber) is generally used as the specimen identifier in GBIF. However, all three metadata attributes can (and do) sometimes change.
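
The effect of a changing triplet on re-indexing can be illustrated with a small sketch. It compares two versions of a one-record dataset, once keyed by the Darwin Core triplet and once keyed by a persistent identifier. The field values (`O`, `L`, `Lichens`, `000014`) are invented for the example. Carrying the UUID in `occurrenceID` is an assumption made here, and the matching logic is a simplification, not a description of how the GBIF portal works.

```python
def triplet(record):
    """Composite key built from the three Darwin Core attributes."""
    return (
        record["institutionCode"],
        record["collectionCode"],
        record["catalogNumber"],
    )


def occurrence_id(record):
    """Key based on a persistent identifier (assumed to be a UUID)."""
    return record["occurrenceID"]


def compare_versions(old, new, key):
    """Classify records as kept, deleted or added between two dataset versions."""
    old_keys = {key(r) for r in old}
    new_keys = {key(r) for r in new}
    return {
        "kept": old_keys & new_keys,
        "deleted": old_keys - new_keys,
        "added": new_keys - old_keys,
    }


SPECIMEN_UUID = "41d9cbb4-4590-4265-8079-ca44d46d27c3"

# Hypothetical record before and after the collection code was renamed.
old_version = [{
    "occurrenceID": SPECIMEN_UUID,
    "institutionCode": "O",
    "collectionCode": "L",
    "catalogNumber": "000014",
}]
new_version = [{
    "occurrenceID": SPECIMEN_UUID,
    "institutionCode": "O",
    "collectionCode": "Lichens",
    "catalogNumber": "000014",
}]

by_triplet = compare_versions(old_version, new_version, triplet)
by_id = compare_versions(old_version, new_version, occurrence_id)

# Keyed by triplet, the same specimen looks like one deletion plus one new record.
print(by_triplet)
# Keyed by the persistent identifier, the record is recognised as the same one.
print(by_id)
```

With the triplet as key, the renamed collection code breaks the link between the two versions. With the persistent identifier, the link survives the rename.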
What could be a best-practice guideline for identifier resolution? Is it useful to define and agree on a (set of) common and well-defined response format(s)? Is it useful to provide recommendations for a set of metadata profiles with a clear set of defined metadata attributes? Or would more general principles and more open recommendations be more likely to stand the test of time and remain relevant with the emergence of new information infrastructure technologies?
Challenges, pros and cons of reusing object identifiers and metadata attribute terms declared by others without full control of how these objects and terms are maintained. Objects and concepts declared for a particular purpose will often not exactly match the needs of another purpose. How to optimally reuse each other's OWL ontologies, metadata vocabularies and data object models?
Should identifiers identify the real-world physical objects, the entities that the collection curators and users of the information care about? Or should the identifier be assigned to database records? Real-world entities will not have a signature byte sequence and will rely on interpretation of when an object is considered to be the same thing.
- 20. Dag Endresen, Christian Svindseth
Gary Larson, 1987
Workshop in Oslo 26th Aug