DataONE: Data Observational Network for Earth
Rebecca Koskela, William Michener, Dave Vieglais, Amber Budden
OSTP/NITRD Data Sharing and Metadata Curation: Obstacles and Strategies
May 29, 2013


Page 2: The metadata problem

Page 3: Metadata standards

[Figure: bar chart of metadata standards in use. Categories: DIF, DwC, DC, EML, FGDC, Open GIS, ISO, My Lab, none. Bar values: 12, 21, 26, 95, 95, 96, 97, 266, 676.]

Scientists want to share data
• Use other researchers' datasets if easily accessible: 84%
• Willing to share data across a broad group of researchers: 81%
• Appropriate to create new datasets from shared data: 76%
• Currently share all of their data: 6%

but don't know how to and, if they do, want to get proper credit for doing so.

Page 4: Some solutions

• Make it easy to describe data
• Provide credit to the data/metadata author
• Citation
• Promote discoverability
• Mandates (ideally, funded!)

Page 5: Best Practices and Software Tools

Page 6: Making it easy to describe data

Intercept researchers where they already work

Page 7: Data & Metadata (EML)

Page 8: Credit: Dryad repository for journal data & metadata

Page 9: Promoting data citations via Dryad

Article
Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. PLoS ONE 6(3): e18011. doi:10.1371/journal.pone.0018011

Dryad data package
Wu D, Wu M, Halpern A, Rusch DB, Yooseph S, Frazier M, Venter JC, Eisen JA (2011) Data from: Stalking the fourth domain in metagenomic data: searching for, discovering, and interpreting novel, deep branches in phylogenetic trees of phylogenetic marker genes. Dryad Digital Repository. doi:10.5061/dryad.8384

Page 10: Promoting data discovery

Provide universal access to data about life on earth and the environment

1. Building community
2. Developing sustainable data discovery and interoperability solutions
3. Enabling science through tools and services

[Figure: data life cycle. Stages: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze.]

Page 11: DataONE: Three major components for a flexible, scalable, sustainable network

Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network-wide services
• ensure content availability (preservation)
• replication services

Page 12: DataONE: Three major components for a flexible, scalable, sustainable network (continued)

Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data


Page 14: DataONE: Three major components for a flexible, scalable, sustainable network (continued)

Investigator Toolkit
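The split of responsibilities between Coordinating Nodes and Member Nodes can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the DataONE API: the class names, method names, node names, and the identifier are all hypothetical, and replication here is a simple in-memory copy.

```python
class MemberNode:
    """Hypothetical Member Node: holds copies of data objects."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # identifier -> data bytes


class CoordinatingNode:
    """Hypothetical Coordinating Node: keeps the metadata catalog and
    tracks which Member Nodes hold a copy of each object."""

    def __init__(self, member_nodes):
        self.member_nodes = {node.name: node for node in member_nodes}
        self.catalog = {}  # identifier -> {"metadata": ..., "holders": [...]}

    def register(self, identifier, metadata, origin, data):
        """Store the data on its origin node and catalog its metadata."""
        self.member_nodes[origin].objects[identifier] = data
        self.catalog[identifier] = {"metadata": metadata, "holders": [origin]}

    def replicate(self, identifier, copies):
        """Copy the object to further nodes until `copies` holders exist."""
        entry = self.catalog[identifier]
        source = self.member_nodes[entry["holders"][0]]
        for name, node in self.member_nodes.items():
            if len(entry["holders"]) >= copies:
                break
            if name not in entry["holders"]:
                node.objects[identifier] = source.objects[identifier]
                entry["holders"].append(name)
        return entry["holders"]


# Invented node names and identifier, for illustration only.
nodes = [MemberNode("node-a"), MemberNode("node-b"), MemberNode("node-c")]
coordinator = CoordinatingNode(nodes)
coordinator.register("example-id-1", {"title": "Example dataset"}, "node-a", b"1,2,3")
holders = coordinator.replicate("example-id-1", copies=2)
print(holders)
```

The point of the sketch is that the catalog, not any single Member Node, knows where every copy lives, which is what lets the network keep content available if one node goes offline.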

Page 15: DataONE: Enabling data discovery

[Figure: metadata from Member Nodes feeding a common search index.
Member Nodes shown: ORNL DAAC, KNB, PISCO, SANParks, ESA, USGS CSAS, ONEShare, UC Merritt, LTER, CLO/AKN.
Metadata standards shown beside the nodes: FGDC, ISO, DIF, EML (EML appears most often).
Processing boxes: Extract and Align Metadata; Augment Metadata; Internal Metadata Index; Search API.]
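The "Extract and Align Metadata" step can be pictured as mapping records written in different standards onto one set of index fields. The Python sketch below is an assumption-laden illustration: the flat records, the field mappings, and the function names are hypothetical simplifications, not the full element paths of EML or FGDC and not DataONE's actual indexing code.

```python
# Simplified, hypothetical mappings: source field -> common index field.
FIELD_MAPS = {
    "EML": {"title": "title", "creator": "author", "abstract": "abstract"},
    "FGDC": {"title": "title", "origin": "author", "abstract": "abstract"},
}


def align(record, standard):
    """Translate one flat metadata record into the common index schema."""
    mapping = FIELD_MAPS[standard]
    aligned = {
        common: record[source]
        for source, common in mapping.items()
        if source in record
    }
    aligned["source_standard"] = standard
    return aligned


def search(index, term):
    """Case-insensitive substring search over titles and abstracts."""
    term = term.lower()
    return [
        entry
        for entry in index
        if term in entry.get("title", "").lower()
        or term in entry.get("abstract", "").lower()
    ]


# Invented records, for illustration only.
records = [
    ({"title": "Kelp forest fish counts", "creator": "A. Researcher",
      "abstract": "Annual dive surveys."}, "EML"),
    ({"title": "Watershed land cover", "origin": "B. Mapper",
      "abstract": "Classified satellite imagery."}, "FGDC"),
]
index = [align(record, standard) for record, standard in records]
authors = [entry["author"] for entry in index]
hits = [entry["title"] for entry in search(index, "satellite")]
print(authors)
print(hits)
```

Once both records share the same field names, a single search function can serve every Member Node regardless of the standard its metadata was written in.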

Page 16: Information Center for the Environment (ICE), UC Davis

ICE Collects Water Data

[Figure labels: ICE Collectors; ICE Users (agencies, citizens, faculty); DataONE Users; Investigator Toolkit.]

Page 17: Some remaining challenges

• Semantic mediation
• Provenance
• Improving metadata quality over time

Page 18: Powerful Data Discovery via Semantics

[Figure labels: query; term matching (TF-IDF); topic model; formal ontologies/controlled vocabularies; Improved Precision; Improved Recall; Automated annotation.]

Outcomes
• Enhanced models for knowledge representation in earth and environmental sciences
• Powerful model-driven search interface for data discovery
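Of the techniques named on this slide, term matching with TF-IDF is the simplest to show in code. The sketch below is a minimal illustration, assuming a naive whitespace tokenizer and the plain log(N / df) form of inverse document frequency; the abstracts are invented, and nothing here is taken from DataONE's search implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (deliberately naive)."""
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document by summing TF-IDF weights of the query terms.

    TF is the term count divided by document length; IDF is log(N / df),
    so a term that appears in every document contributes nothing.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                tf = counts[term] / len(tokens)
                idf = math.log(n_docs / df[term])
                score += tf * idf
        scores.append(score)
    return scores


# Hypothetical metadata abstracts, invented for illustration only.
abstracts = [
    "soil carbon measurements from forest plots",
    "stream water temperature and nitrate measurements",
    "bird observation counts from forest surveys",
]
scores = tf_idf_scores("water nitrate", abstracts)
best = max(range(len(abstracts)), key=scores.__getitem__)
print(abstracts[best])
```

Pure term matching only finds documents that share the query's exact words, which is the gap the slide's topic models and controlled vocabularies are meant to close.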

Page 19: Provenance

Origin, context, derivation, ownership, history of (data) artifacts

• Record processing history, data lineage
• dependency graph
• W3C standard: PROV
• DataONE Extension: D-PROV
• Workflow provenance
• System agnostic!
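The dependency graph mentioned above can be sketched as a mapping from each derived artifact to the artifacts it came from, which is the direction of the wasDerivedFrom relation in W3C PROV. This is a minimal illustration using a plain dictionary rather than the PROV or D-PROV data models; the file names and the `lineage` helper are hypothetical.

```python
from collections import deque

# Each key is a derived artifact; the value lists what it was derived from.
# All artifact names are invented for illustration.
derived_from = {
    "figure.png": ["summary.csv"],
    "summary.csv": ["cleaned.csv"],
    "cleaned.csv": ["raw_sensor.csv", "calibration.csv"],
}


def lineage(artifact, graph):
    """Return every upstream artifact, nearest first (breadth-first walk)."""
    seen = []
    queue = deque(graph.get(artifact, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(graph.get(current, []))
    return seen


upstream = lineage("figure.png", derived_from)
print(upstream)
```

Walking the graph backwards from a result answers the lineage question ("which raw data does this figure depend on?") without reference to the workflow system that produced it.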


Page 20: Improving metadata quality for data reuse

[Figure: Information Content plotted against Time (< 1 yr). Stages shown: Planning, Collection, Assure, Documentation, Archive. Marked level: Sufficient for Sharing and Reuse.]

Page 21: Mandates (ideally, funded!)

Page 22: DataONE: Supporting scientific data preservation, discovery, and innovation