The Dryad Data Repository: Metadata Workflows and Processes

2nd Data Management Workshop
November 28th – 29th 2014
University of Cologne, Germany

Jane Greenberg, Professor, College of Computing & Informatics (CCI); Director, Metadata Research Center <MRC>
Erin Clary, Dryad Curator, CCI/MRC


http://datadryad.org/


Pre-populated metadata field


Elsevier’s ScienceDirect: EXAMPLE: Dryad
Unmack, et al., Phylogeny and biogeography… Molecular Phylogenetics and Evolution. http://dx.doi.org/10.1016/j.ympev.2012.12.019



Data downloads reuse citation

Observations motivating the study of metadata capital:

1. Metadata generation costs money
2. Metadata reuse is a BIG part of Dryad’s workflow
3. Metadata reuse via OAI
4. Metadata reuse via data sharing, reuse, and repurposing

Download 10678 times


Greenberg J, Swauger S, Feinstein EM (2013) Data from: Metadata capital in a data repository. Proceedings of the International Conference on Dublin Core and Metadata Applications http://dx.doi.org/10.5061/dryad.8c1p6


Journal    Re.Wrkfl  Blackout
AmNtrl     N         N
MBE        N         N
BioRisk    Y         N
BMJ Open   Y         N
…. Y

Type            Total    30 days
Data packages   6867     198
Data files      21056    977
Journals        364      77
Authors         24500    3492
Downloads       639314   36006

• Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals

• X >10GB = $15,$10+


http://wiki.datadryad.org/Sample_Dryad_Content#Examples_by_file_type


Technology
• DSpace
• DOIs via CDL/DataCite
• CC0 (<m> + data)
• Integration with specialized repositories and databases
• Federated searching with TreeBASE and KNB LTER
• TreeBASE submission (OAI-PMH)
• GenBank (currently in development)
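The OAI-PMH item above refers to a harvesting protocol in which a client asks a repository for records through a small set of HTTP verbs. As a minimal sketch, the snippet below only builds a `ListRecords` request URL using the standard `verb` and `metadataPrefix` parameters. The base URL is a placeholder assumption, not Dryad's actual endpoint, and the helper name `list_records_url` is made up for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, assumed for illustration only.
EXAMPLE_BASE_URL = "https://repository.example.org/oai/request"
print(list_records_url(EXAMPLE_BASE_URL))
```

A harvester would fetch this URL and parse the XML response; that step is left out here because it depends on the real endpoint.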

Governance
“non-profit status, 12 member Board of Directors”

Sets policy, goals
• science, journals, societies, OCLC, MS

2006 Dryad development – NESCent + <MRC>
• Stakeholders: journals, publishers and scientific societies, and researchers.

2009-2012: Interim Board

$ PAYMENT-Sept. 1,2014


Sustainability: Plan Comparison

Payment Plan | Member | Non-member | Minimum purchase
1. Voucher Plan | USD$65 per data package | USD$70 per data package | 25 vouchers
2. Deferred Payment Plan | USD$70 per data package | USD$75 per data package | 1 yr contract
3. Subscription Plan | Annual fee based on USD$25 per published research article | Annual fee based on USD$30 per published research article | 2 yr contract
For individuals: Pay on acceptance | NA | USD$80 per data package, payable by the submitter | 1 data package
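To compare the per-package plans above for a given number of deposits, a rough sketch follows. The prices are copied from the plan comparison. The function name `plan_cost` and the reading of the 25-voucher minimum as "pay for at least 25 packages" are assumptions. The Subscription Plan is left out because its fee depends on a journal's article count, which the slide does not give.

```python
from typing import Optional

# Price per data package in USD, taken from the plan comparison.
PER_PACKAGE_USD = {
    "voucher": {"member": 65, "non_member": 70},
    "deferred": {"member": 70, "non_member": 75},
    "pay_on_acceptance": {"non_member": 80},  # listed as NA for members
}
VOUCHER_MINIMUM = 25  # assumption: at least 25 vouchers are billed


def plan_cost(plan: str, packages: int, member: bool) -> Optional[int]:
    """Return the total cost in USD, or None where the slide lists NA."""
    tier = "member" if member else "non_member"
    price = PER_PACKAGE_USD[plan].get(tier)
    if price is None:
        return None
    billed = max(packages, VOUCHER_MINIMUM) if plan == "voucher" else packages
    return price * billed


for plan in PER_PACKAGE_USD:
    print(plan, plan_cost(plan, packages=10, member=False))
```

Under these assumptions the voucher plan only becomes the cheapest per-package option once an organization expects to deposit close to the 25-voucher minimum.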


More on growth and sustainability

Membership: http://datadryad.org/pages/membershipOverview

Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing

Journal integration: http://datadryad.org/pages/journalIntegration


Metadata research & development

1. Curation workflow - cognitive walkthroughs
2. Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008, Greenberg, et al, 2010; Greenberg 2009; 2010)
3. Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
4. Instantiation - multi-method study (comprehensions assessment) (Greenberg, RDAP, 2010, UNAM 2012)
5. Name-authority control - exploratory study (Haven, 2009, INLS 720)
6. KO/metadata community practices - Concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
7. Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST)
8. Vocabulary needs (HIVE) – mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
9. Metadata theory – deductive analysis (Greenberg, 2009)


Singapore Framework

Dryad DCAP, ver. 3.0
• bibo (The Bibliographic Ontology)
• dcterms (Dublin Core terms)
• dryad (Dryad)
• DwC (Darwin Core)

Vision
1. Simple: automatic metadata gen; heterogeneous datasets *Data-package centric
2. Interoperable: harvesting, cross-system searching
3. Semantic Web compatible: sustainable; supporting machine processing

Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.


Helping Interdisciplinary Vocabulary Engineering (HIVE)


Amy

DATA

publication


Package metadata harvested from email

Subj. 177 (gr. 97%, rd. 2%, bl. 1%)

Contr. 101 (gr. 99%, bl. 1%)


Modified Capital-sigma notation

Reuse

$$R + \sum_{i=1}^{n} a_i = R + a_1 + a_2 + a_3 + \dots + a_n$$

R = value of the metadata record
i = number of usages
a = incremental increase in value
n = maximum number of reuse
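The reuse formula above adds the base value of a metadata record to the incremental value gained at each reuse. A minimal sketch of that arithmetic follows. The function name `metadata_capital` and the numeric values are hypothetical, chosen only to illustrate the sum, and do not come from the study.

```python
from typing import Sequence


def metadata_capital(record_value: float, increments: Sequence[float]) -> float:
    """Return R + a_1 + a_2 + ... + a_n for a record reused n times."""
    return record_value + sum(increments)


# Hypothetical values: R = 10.0 and three reuses with diminishing increments.
R = 10.0
a = [2.0, 1.5, 0.5]
print(metadata_capital(R, a))  # 10.0 + 4.0 = 14.0
```

With no reuse the list of increments is empty and the result is simply R, which matches the n = 0 case of the sum.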


Author/Submitter | Curator

100 metadata instantiations
• 8 of 12 metadata properties had reuse @ 50% or greater
• 5 of 8 confirmed reuse at 80% or higher.
• Basic bib. vs. complex


Author

Subject

Dcterms.spatial

DwC.ScientificName


Conclusion…other Valuation Approaches

Market cap of Facebook per user: $40 – $300
Revenues per record per user: $4 – $7 per year

• Facebook
• Experian

Market prices of personal data:

• $0.50 for street address
• $2.00 for date of birth
• $8 for social security number
• $3 for driver’s license number
• $35 for military record

SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.


Concluding comments

• Success story
• Contribution, have to start somewhere…
• Good timing, the right discipline
• Confirmed use, reuse
• Machine capabilities
• An educative commons, intellectually engaging


http://wiki.datadryad.org/Sample_Dryad_Content


Acknowledgments
• Dryad Consortium Board, journal partners, and data authors
• NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)
• Drexel/UNC <Metadata Research Center>: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swauger, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary
• U British Columbia: Michael Whitlock
• NCSU Digital Libraries: Kristin Antelman
• HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts
• Yale/TreeBASE: Youjun Guo, Bill Piel
• DataONE: Rebecca Koskela, Bill Michener, Dave Vieglais, and many others
• British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole
• Oxford University: David Shotton


http://datadryad.org http://blog.datadryad.org http://datadryad.org/wiki

http://code.google.com/p/[email protected]

Facebook: Dryad Twitter: @datadryad

http://ils.unc.edu/mrc/hive/ http://code.google.com/p/hive-mrc/

Metadata Research Center: http://cci.drexel.edu/mrc
