Dryad Curation Practices
January 2014
Dryad Package/File Structure
[Diagram: a DATA PACKAGE (metadata) is linked to the PUBLICATION/ARTICLE and contains DATA FILES; each DATA FILE has its own metadata and one or more bitstreams (data, optional readme). The scholarly publication/article associated with the Dryad data package is not stored in Dryad.]
A Dryad data package is a conceptual and metadata object. It contains a summary description of all the constituent data files and creates the link with the associated publication. Each data file has a metadata description and at least one bitstream (additional bitstreams, such as readme files, are optional). Metadata pertaining to the publication (citation, publication date, article DOI) is stored in the data package. Metadata pertaining to each file and its embargo period is stored in each file record.
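The package/file structure described above can be sketched as a small data model. This is only an illustration: the class and attribute names are hypothetical and do not come from the Dryad software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Bitstream:
    name: str
    kind: str = "data"  # "data" or "readme" (hypothetical labels)


@dataclass
class DataFile:
    title: str
    bitstreams: List[Bitstream]
    embargo_type: str = "none"  # embargo is recorded on each file record

    def __post_init__(self):
        # Each data file has at least one bitstream; readme files are optional extras.
        if not self.bitstreams:
            raise ValueError("a data file needs at least one bitstream")


@dataclass
class DataPackage:
    title: str
    # Publication metadata lives on the package; the article itself is not stored.
    article_doi: Optional[str] = None
    article_citation: Optional[str] = None
    article_publication_date: Optional[str] = None
    files: List[DataFile] = field(default_factory=list)


package = DataPackage(
    title="Data from: Example article title",
    files=[
        DataFile(
            title="measurements",
            bitstreams=[
                Bitstream("measurements.csv"),
                Bitstream("README.txt", kind="readme"),
            ],
        )
    ],
)
```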
Important Curation Documents
• Curation manuals
– http://wiki.datadryad.org/Curation
• [email protected]
– Notifications of new submissions, newly published articles, other assignments
• Integrated journal metadata emails
– Access via [email protected]
– Also correspond with authors using this account: send as help@datadryad.org
• Curator office whiteboard
– Google doc shared with dryadassistant google account
– Includes login information for Dryad user accounts, EZID, etc.
• Submission tracking spreadsheet
– Google doc shared with dryadassistant google account
• Templates for correspondence
– http://wiki.datadryad.org/Templates_for_Correspondence
Integrated and Non-Integrated Journals
• Non-Integrated
– No coordination between journal and Dryad (no metadata emails, journal contact addresses for reporting, etc.)
• Integrated
– Metadata emails send info ahead of submission
– May use review workflow or only archive data after manuscript acceptance
– May require ‘blackout’ of Dryad submission until article publication
– Journal contacts are notified upon submission for review (if using review workflow), acceptance to blackout queue, approval/archiving, and weekly summary
The original integrated workflow is represented to the right. Some journals now also use a review workflow with additional steps or require the Dryad data package to remain hidden until after article publication (what we call “blackout”).
Further integration details are available in the following presentation: http://wiki.datadryad.org/wg/dryad/images/c/c6/DryadIntegrationOverview.pdf
Basic Integrated Workflow (no review)
1. Author submits manuscript to journal
2. Journal reports accepted manuscript to Dryad; Dryad creates provisional record
3. Journal invites author to submit data to Dryad & provides link to provisional record
4. Author submits data to Dryad & receives DOI
5. Dryad curator approves submission & sends DOI to author & journal
6. Dryad publishes data files with link to article; Journal adds Dryad DOI to all forms of article
Review Workflow
• Journal sends manuscript information to Dryad before manuscript acceptance and invites authors to upload data.
• Dryad submission is routed to private review workspace, not main curation and publication queue. Passkey link is sent to journal for editor/reviewer access to Dryad submission.
• Author may continue to add files while submission is in review workspace.
• Journal sends second metadata email to Dryad with manuscript acceptance notification, triggering any associated submission to move from review to curation.
• Curator inspects and approves, queues, or rejects submission, as in basic workflow.
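One way to picture the movement of a submission between review, curation, blackout, and archiving is as a small table of allowed transitions. The sketch below is an assumption-laden illustration: the status and event names are hypothetical and are not taken from the Dryad interface.

```python
from enum import Enum


class Status(Enum):
    REVIEW = "in review workspace"
    CURATION = "in curation"
    BLACKOUT = "in publication blackout"
    ARCHIVED = "approved and archived"
    REJECTED = "rejected"


# Allowed moves keyed by (current status, event). Event names are hypothetical.
TRANSITIONS = {
    # The journal's acceptance email moves a submission from review to curation.
    (Status.REVIEW, "acceptance_email"): Status.CURATION,
    # In curation, the curator approves, queues (blackout), or rejects.
    (Status.CURATION, "approve"): Status.ARCHIVED,
    (Status.CURATION, "accept_with_blackout"): Status.BLACKOUT,
    (Status.CURATION, "reject"): Status.REJECTED,
    # After article publication, a blackout item is approved and archived.
    (Status.BLACKOUT, "approve"): Status.ARCHIVED,
}


def advance(status, event):
    """Return the next status, or raise ValueError if the move is not allowed."""
    try:
        return TRANSITIONS[(status, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed while {status.value}") from None
```

For example, `advance(Status.REVIEW, "approve")` raises an error, reflecting that a submission in review is not approved until the journal's acceptance email has moved it to curation.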
Review Workflow
1. Author submits manuscript to journal
2. Journal reports manuscript under review to Dryad; Dryad creates provisional record
3. Journal invites author to submit data to Dryad & provides link to provisional record
4. Author submits data to Dryad, using link sent by journal to provisional record
5. Dryad sends review passcode and DOI to author & journal
6. Upon article acceptance, journal notifies Dryad
7. Dryad publishes data files with link to article; Journal adds Dryad DOI to all forms of article
Navigation
• Notifications of new tasks go to [email protected]
• Log in to the Dryad site with your email and password
– http://datadryad.org/password-login
• Dryad site, nested under your account name:
– My Submissions are submissions you have created
– My Tasks are submissions you can act on as a curator
– Workflow Overview provides a way to search for items before or after archiving, and to force changes in their status that aren’t always available in the interface
Overview of New Submission Processing
[Flowchart: a new submission is checked in turn against JOURNAL SETTINGS (integrated or non-integrated; review? blackout? metadata email? spreadsheet entry?), ARTICLE STATUS (review, accepted, published, not published), and FILES (appropriate or not appropriate). Possible outcomes are APPROVE, BLACKOUT, REJECT, or ERROR.]
Journal Settings
• See JournalSubmissionTracking spreadsheet shared in google docs.
• First tab (“Notes”) lists each integrated journal and its review and blackout settings.
• Also search for duplicate submissions or notes in appropriate sheet.
Article Status
• If integrated submission, should be indicated in metadata email (not stated = accepted manuscript)
• Look for article DOI or volume information in the submitted metadata, as clues to published status.
• Google search and/or visit publisher website.
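The article-status clues above can be summarized as a first-guess helper. This is a minimal sketch: the function name, parameters, and returned strings are hypothetical, and the result is only a starting point for the web search and publisher-site check.

```python
def guess_article_status(integrated, email_status=None, article_doi=None, volume_info=None):
    """Return a first guess at article status from the available clues.

    integrated:   True if the journal is integrated with Dryad.
    email_status: status stated in the journal's metadata email, if any.
    article_doi / volume_info: values found in the submitted metadata, if any.
    """
    # An article DOI or volume information hints that the article may be out.
    if article_doi or volume_info:
        return "possibly published; confirm on publisher website"
    if integrated:
        # For integrated journals, an unstated status means an accepted manuscript.
        return email_status or "accepted"
    return "unknown; search for the article"
```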
Claiming Submissions
1. New submissions will be listed on the My Tasks page under the heading In Curation: Unclaimed
2. Once claimed, submission will appear in In Curation: Claimed list on My Tasks page.
3. Click Edit item(s) button (lower right when viewing the claimed submission) and open package and all files in tabs to inspect files and edit all metadata
Inspecting Files
• Check for technical problems, corrupt files, files that won’t open in expected software, etc.
• Files should contain something that looks like data, with a very broad definition of data (supplementary figures, multimedia, etc., are ok, the manuscript itself is not).
• Look for copyright statements and licenses (generally unacceptable).
• Look for identifiable human subject data (err on the side of caution, see guidelines article at http://dx.doi.org/10.1136/bmj.c181).
• Look for duplicated files, data files uploaded in place of readme files, etc., and clean these up.
Rejecting Submissions
• The most common reasons for rejection are inappropriate files, submissions associated with integrated journals for which we have no metadata email, and integrated submissions that should have been directed to the review workspace but the author did not use the integrated process. A submission might also be rejected because a journal is out of scope, but always consult a senior curator before rejecting for this reason.
• When rejecting a submission, you must enter a reason. This reason will be sent to the submitter; it should be courteous and should explain clearly to them what the problem was and how they can fix it if they wish to resubmit.
• See Templates for Correspondence wiki page for common rejection explanations.
Editing Metadata
• REMEMBER: some metadata, such as author names, is repeated on the package and files and will need to be edited in both places.
• Scan over all metadata to see if it looks reasonable and to identify problems.
• Strip any formatting tags or mangled characters. International or special characters can often be copied and pasted from metadata email or other source on the web.
• Check the journal name, especially for non-integrated journals. It should match exactly the name already in use in the repository. If it’s a new journal, ask a senior curator about establishing a new name.
• Author names should be formatted as
– LastName, FirstName M. I.
– Note the spaces between and periods after middle initials
– Remove any titles, such as “PhD”
• Data package title should be formatted as
– Data from: Article title in sentence case: no caps following colons, either.
• Add specialized keywords (geographic, temporal, scientific name), moving them from/to general subject keywords, as appropriate. Scientific names should be Latin (common names go in dc:subject instead) and should be recognized by http://eol.org/.
• Look for line breaks, especially in article abstract and file descriptions, and edit these fields as needed so that the content stays clear when it is displayed without line breaks.
• Check for inappropriate embargoes (e.g., ‘custom’ when we have no info from journal, ‘untilArticleAppears’ when article is out) and adjust as needed. If custom embargo, add embargo period (from journal) as dryad:curatorNote in file metadata.
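The author-name and package-title conventions above could be applied with helpers along these lines. This is a rough sketch, not part of the Dryad tooling: the function names and the short list of titles to strip are assumptions, and real names need a curator's eye (for example, a short all-capitals given name would be mistaken for initials).

```python
import re

# Illustrative list of titles to strip; not taken from the curation manual.
TITLES = {"phd", "dr", "prof"}


def format_author(last, given):
    """Return 'LastName, FirstName M. I.' from a surname and a given-name string."""
    tokens = [t for t in re.split(r"[\s.,]+", given) if t]
    tokens = [t for t in tokens if t.lower() not in TITLES]
    parts = []
    for i, tok in enumerate(tokens):
        if i > 0 and tok.isupper() and len(tok) <= 3:
            # Run-together initials such as "MI" become "M. I."
            parts.extend(f"{ch}." for ch in tok)
        elif len(tok) == 1:
            parts.append(f"{tok.upper()}.")
        else:
            parts.append(tok)
    return f"{last.strip()}, {' '.join(parts)}"


def format_package_title(article_title):
    """Prefix the article title with 'Data from: ' exactly once.

    Sentence case is left to the curator, since proper nouns and
    scientific names cannot be lower-cased safely by a script.
    """
    title = " ".join(article_title.split())  # collapse stray line breaks and spaces
    if title.lower().startswith("data from:"):
        title = title[len("data from:"):].strip()
    return f"Data from: {title}"
```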
Approving/Archiving (no blackout)
1. Check for duplicates and notes in tracking spreadsheet, if you haven’t already done so.
2. For submissions formerly in review, update title, authors, and abstract as necessary to match the acceptance notice.
3. Click Approve. Email notification is sent automatically.
4. Visit Dryad Data Packages collection page http://datadryad.org/handle/10255/3 and find item in Recent Submissions list (if not there, look for it on My Tasks page or track down any error).
5. Check that the DOI resolves (there may be a few minutes’ delay after registering). DOIs should have been automatically registered; check EZID if there is a problem.
6. Update submission tracking spreadsheet.
Spring 2013 update: do not delete DOIs that were duplicated in the metadata upon approval. These no longer create broken links, and leaving the duplicates in place may help the developers track down the underlying problem. December 2013: We believe the problem is solved. Please alert the Senior Curator if you notice a duplicated DOI.
Placing Submission in Publication Blackout Queue
You should have already claimed the item with Dryad Queue account, inspected files, edited metadata, and checked for duplicate submissions at this point.
1. Add entry to submission tracking spreadsheet (or update existing entry for submission that was previously in review).
2. Click Accept with Blackout button. Once file has successfully moved into the blackout queue, click send on the acceptance email.
Searching for items to update
Sometimes you will know to update entries because of journal alert emails (from integrated journals), and sometimes you will be searching by article titles to find published items (from nonintegrated journals).
1. Integrated Journals: Do a title search in the journal tracking spreadsheet for articles that are listed in the update mailers sent to the Dryad Assistant gmail account. Update as outlined on the next slide. If a newly published article is listed in the spreadsheet as being in review, check the journal.submit gmail account for an acceptance email before asking for the item to be pushed to curation. Never ask for a package in review to be pushed to curation unless you have an acceptance email from the journal editor.
2. Nonintegrated Journals: Do google searches on titles listed in the tracking spreadsheet. Be aware that titles sometimes change between Dryad submission and publication. Update published articles as listed in the next slide.
Updating Archived Items Once Article is Published
You have made a match between a published article and a Dryad data package that needs to be updated.
1. Check author names, article title, and article abstract against published article and update if needed.
2. Edit package dc:date.issued to match earliest (online) publication date of article (format as YYYY-MM-DD)
3. Add article DOI to package dc:relation.isreferencedby (format as doi:####)
4. Add article citation to package dc:identifier.citation or update existing citation (if advance access online article now has print citation). Format as:
LastName F, LastName FM (YYYY) Article title in sentence case. Journal Name Vol(Num): page-page.
or
LastName F, LastName FM (YYYY) Article title in sentence case. Journal Name, online in advance of print.
5. Lift embargoes or set embargo end dates for each file, as appropriate. Go to the Item Embargo pane in Edit Item to work with embargoes.
6. Visit public view of package page (leave Edit Item) and verify article citation, resolvable article DOI, and updated embargoes.
7. Update submission tracking spreadsheet.
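The formats called for in steps 2 through 4 (YYYY-MM-DD dates, doi:#### identifiers, and the two citation patterns) can be sketched as small formatting helpers. The function names and the example values are hypothetical; only the output patterns come from the steps above.

```python
from datetime import date


def format_issued(published):
    """Format a publication date as YYYY-MM-DD for dc:date.issued."""
    return published.isoformat()


def format_article_doi(doi):
    """Normalize a DOI or DOI URL to the doi:#### form."""
    doi = doi.strip()
    prefixes = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/", "doi:")
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return f"doi:{doi}"


def format_citation(authors, year, title, journal, volume=None, issue=None, pages=None):
    """Build the citation string; authors is a list of 'LastName FM' strings."""
    head = f"{', '.join(authors)} ({year}) {title.rstrip('.')}."
    if volume is None:
        # No print citation yet: use the advance-access form.
        return f"{head} {journal}, online in advance of print."
    locator = f"{volume}({issue})" if issue else str(volume)
    if pages:
        locator += f": {pages}"
    return f"{head} {journal} {locator}."


print(format_citation(["Smith J", "Jones RM"], 2013, "Example article title",
                      "Example Journal", 13, 4, "759-770"))
```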
Approving Submission out of Publication Blackout
1. Find the package by searching My Tasks or Workflow Overview page. Claim the task.
2. Update metadata as described in Updating Archived Items Once Article is Published. Because this submission isn’t archived yet, dates won’t have been added to the metadata by the system, so you will add the article publication date to package as dc:date.issued, instead of editing an existing value.
3. Click Approve and Archive.
4. Visit Dryad homepage and find item in Recently Published list or on the Data Packages Collection page if files are embargoed. (If not in either place, look for it on My Tasks page or track down any error.)
5. Check that the DOI resolves.
6. Update file embargoes (lift or set end date, as appropriate). Go to the Item Embargo pane in Edit Item for each file to work with embargoes.
7. Update submission tracking spreadsheet.
Genomic Resources Notes (from MolEcolRes)
• These are a new (as of summer 2013) type of article being submitted to Molecular Ecology Resources.
• They are submitted to Dryad with XXXX instead of dates.
• Once the articles have been accepted, change the X’s to the accepted dates.
• Then add the citation, which should look like this:
– Genome Resources Development Consortium et al. (2013) Genomic Resources Notes accepted 1 February 2013–31 March 2013. Molecular Ecology Resources 13(4): 759. doi:10.1111/1755-0998.12123
Adding data in review to the spreadsheet
When data associated with an article in review is submitted, it must be entered into the spreadsheet.
• Refer to the email notification forwarded from the senior curator for title, author, DOI, and review workspace URL.
• Search the spreadsheet for the title and manuscript number to make sure that it isn't a duplicate.
– If it’s a duplicate, enter in red in the “Notes” column of each duplicate: “has dup”
– Duplicate submission warnings are in the process of being created. Ideally, if someone has data in review and tries to re-submit it in the first-time submission area, they’ll be taken to their existing submission.
• Refer to the metadata email ([email protected]) to see if the submitter is the lead author. For submitter Smith and lead author Jones, write “Jones (submitted by coauthor Smith)”
• In the final field, paste the reviewer key URL
• Make the row’s background color light gray
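The duplicate search and the author-field convention above can be sketched as follows. This assumes the spreadsheet rows have been exported as a list of dictionaries; the keys `title` and `manuscript_number` are hypothetical column names, not the actual spreadsheet headers.

```python
def submitter_label(lead_author, submitter):
    """Return the author field, noting when a coauthor made the submission."""
    if submitter == lead_author:
        return lead_author
    return f"{lead_author} (submitted by coauthor {submitter})"


def find_duplicates(rows, title, manuscript_number=None):
    """Return rows whose title or manuscript number matches the new entry."""
    def norm(text):
        # Compare titles case-insensitively and ignore spacing differences.
        return " ".join(str(text).lower().split())

    return [
        row for row in rows
        if norm(row.get("title", "")) == norm(title)
        or (manuscript_number and row.get("manuscript_number") == manuscript_number)
    ]
```

Because titles sometimes change between submission and publication, an exact title match like this one will miss some duplicates; the manuscript number is the more reliable key when it is available.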
PACKAGE METADATA GUIDE
Field | Metadata element | Repeatable? | Required? | Format/notes
Authors | dc:contributor.author | repeatable | required | LastName, FirstName M.
Corresponding author | dc:contributor.correspondingAuthor | not repeatable | required | LastName, FirstName M.
Spatial coverage | dc:coverage.spatial | repeatable | optional | place names, geographic coordinates, etc
Temporal coverage | dc:coverage.temporal | repeatable | optional | intended for geologic timespans, but years and other values are accepted
Approval timestamp | dc:date.accessioned | not repeatable | required | system-generated upon submission approval
Approval timestamp | dc:date.available | not repeatable | required | system-generated upon submission approval
Article publication date | dc:date.issued | not repeatable | required | system-generated to match approval date, later edited by curator to article publication
Data package DOI | dc:identifier | not repeatable | required | doi:10.5061/dryad.####
Article citation | dc:identifier.citation | not repeatable | optional | modified PLoS citation style
Journal’s manuscript ID | dc:identifier.manuscriptNumber | not repeatable | optional | only for integrated submissions
Data package handle | dc:identifier.uri | not repeatable | required | http://hdl.handle.net/10255/dryad.####, system-generated upon submission approval
Abstract | dc:description | not repeatable | optional | article abstract
Component data file DOIs | dc:relation.haspart | repeatable | required | doi:10.5061/dryad.####/1, doi:10.5061/dryad.####/2, etc
Article volume, issue, year | dc:relation.ispartofseries | not repeatable | optional | only present if entered by depositor during submission
Article DOI | dc:relation.isreferencedby | not repeatable | optional | doi:####
Keywords | dc:subject | repeatable | optional |
Data package title | dc:title | not repeatable | required | Data from: Article title
Record type | dc:type | not repeatable | required | system-generated, now set to “Article”
Curator note | dryad.curatorNote | repeatable | optional | rarely used
Scientific names | dwc:ScientificName | repeatable | optional | Latin taxon names
Journal name | prism:publicationName | not repeatable | required | use authorized form of name only
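The repeatable and required columns of the package metadata guide can be read as validation rules. The sketch below checks a handful of the fields; it assumes a record is held as a dictionary mapping each metadata element to a list of values, which is an illustrative layout rather than the repository's actual storage format, and the example DOI is made up.

```python
# (repeatable, required) for a subset of the package fields listed in the guide.
PACKAGE_RULES = {
    "dc:contributor.author": (True, True),
    "dc:contributor.correspondingAuthor": (False, True),
    "dc:identifier": (False, True),
    "dc:title": (False, True),
    "prism:publicationName": (False, True),
    "dc:subject": (True, False),
    "dc:relation.isreferencedby": (False, False),
}


def check_package(record):
    """Return a list of problems found in a package metadata record."""
    problems = []
    for name, (repeatable, required) in PACKAGE_RULES.items():
        values = record.get(name) or []
        if required and not values:
            problems.append(f"{name}: required value is missing")
        if not repeatable and len(values) > 1:
            problems.append(f"{name}: only one value is allowed")
    for title in record.get("dc:title") or []:
        if not title.startswith("Data from: "):
            problems.append("dc:title: should start with 'Data from: '")
    return problems


record = {
    "dc:contributor.author": ["Smith, Jane M.", "Jones, Robert"],
    "dc:contributor.correspondingAuthor": ["Smith, Jane M."],
    "dc:identifier": ["doi:10.5061/dryad.example"],  # made-up identifier
    "dc:title": ["Data from: Example article title"],
    "prism:publicationName": ["Example Journal"],
}
print(check_package(record))
```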
FILE METADATA GUIDE
Field | Metadata element | Repeatable? | Required? | Format/notes
Authors | dc:contributor.author | repeatable | required | LastName, FirstName M.
Spatial coverage | dc:coverage.spatial | repeatable | optional | place names, geographic coordinates, etc
Temporal coverage | dc:coverage.temporal | repeatable | optional | intended for geologic timespans, but years and other values are accepted
Approval timestamp | dc:date.accessioned | not repeatable | required | system-generated upon submission approval
Bitstream availability timestamp | dc:date.available | not repeatable | required | system-generated upon availability of bitstreams for download (will not appear if file is embargoed)
Approval date | dc:date.issued | not repeatable | required | system-generated upon submission approval
Data file DOI | dc:identifier | not repeatable | required | doi:10.5061/dryad.####/#
Data file handle | dc:identifier.uri | not repeatable | required | http://hdl.handle.net/10255/dryad.####, system-generated upon submission approval
File description | dc:description | not repeatable | optional | brief file description entered by depositor
Associated data package DOI | dc:relation.ispartof | not repeatable | required | doi:10.5061/dryad.####
Rights information | dc:rights.uri | not repeatable | required | CC0 URI for all items except a few legacy items under Original License
Keywords | dc:subject | repeatable | optional |
Data file title | dc:title | not repeatable | required |
Record type | dc:type | not repeatable | required | system-generated, now set to “Dataset”
Curator note | dryad.curatorNote | repeatable | optional | rarely used, mostly to specify custom embargo dates
Scientific names | dwc:ScientificName | repeatable | optional | Latin taxon names
Embargo end date | dc:date.embargoedUntil | not repeatable | optional | YYYY-MM-DD; has the value 9999-01-01 for embargoed items when the article has not yet been published, then edited by curator to real date; not present for items that were never embargoed or after embargo has been lifted (see dc:date.available for embargo lifting timestamp)
Embargo type | dc:type.embargo | not repeatable | required | controlled list of values: none, untilArticleAppears, oneyear, custom
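The embargo fields lend themselves to a quick consistency check. The sketch below combines the controlled list of embargo types and the 9999-01-01 placeholder from the file metadata guide with the "inappropriate embargo" cases mentioned under Editing Metadata. The function name, parameters, and messages are hypothetical.

```python
from datetime import date

EMBARGO_TYPES = {"none", "untilArticleAppears", "oneyear", "custom"}
PLACEHOLDER_END = date(9999, 1, 1)  # set while the article is not yet published


def embargo_problems(embargo_type, embargoed_until=None,
                     article_published=False, journal_embargo_info=False):
    """Return a list of embargo settings a curator should look at."""
    if embargo_type not in EMBARGO_TYPES:
        return [f"unknown embargo type: {embargo_type}"]
    problems = []
    if embargo_type == "custom" and not journal_embargo_info:
        problems.append("custom embargo without information from the journal")
    if embargo_type == "untilArticleAppears" and article_published:
        problems.append("untilArticleAppears but the article is already out")
    if article_published and embargoed_until == PLACEHOLDER_END:
        problems.append("placeholder end date 9999-01-01 still set after publication")
    return problems
```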