Tracking Data Reuse Motivations, Methods, and Obstacles
Heather Piwowar
DataONE postdoc with NESCent and Dryad
@researchremix
IASSIST2011 #iassist
http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
http://www.flickr.com/photos/jsmjr/62443357/
http://www.flickr.com/photos/camilleharrington/3587294608/
http://www.flickr.com/photos/rkuhnau/3318245976/
http://www.flickr.com/photos/conformpdx/1796399674/
http://www.flickr.com/photos/rkuhnau/3317418699/
http://www.flickr.com/photos/zemlinki/261617721/
http://www.flickr.com/photos/tracenmatt/3020786491/
http://www.flickr.com/photos/the-o/2078239333/
http://www.flickr.com/photos/ryanr/142455033/
?
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
http://www.flickr.com/photos/archeon/2941655917/
In 2009, 116 articles cited ORNL DAAC data.
Finding these articles took 70-80 hours
across at least 12 resources, all chosen from a deep understanding of this specific research domain;
then the full text of every hit was manually reviewed
Valerie Enriquez interview with James Kidder
http://openwetware.org/wiki/DataONE:Notebook/Reuse_of_repository_data
publicly archived dataset
dataset has an identifier? (DOI, URL, accession #)
IDs are difficult to identify unambiguously in full text unless they have a unique pattern (DOI) or an unusual prefix or suffix.
search in full text of all papers
search in reference sections of all papers
sort hits to disambiguate reuse from submission
dataset submission record mentions data collection article publication?
gather papers that cite the data collection paper
sort hits to disambiguate reuse from other citation contexts
dataset submission record has submitter name or dataset title?
with dataset unique ID
with (submitter surname AND repository name), and also (dataset title AND repository name)
with (first author surname AND repository name)
with dataset unique ID
DOI/ID search not supported by ISI Web of Science or Scopus
DOI/ID search works in Google Scholar, but scope is poorly defined, results are messy.
This citation pattern (dataset DOI/ID in references section) is used almost exclusively for dataset reuse. Manual disambiguation not required: can be automated pending API support.
Disambiguation is time consuming: most citations are not in the context of reuse
Requires access to full text of search hits for sorting
This flow still misses attributions embedded in supplementary information, reuses attributed through a query description, etc.
Disambiguation is time consuming
Requires access to full text of search hits for sorting
Only finds citations indexed by citation databases
DOI/ID reference search is possible in full-text portals like PubMed Central and HighWire Press; however, portal coverage is limited and search is not restricted to the references section.
Citation history export is time consuming: automation not supported.
This citation pattern (citation to data creation paper) is very common in some subdisciplines, so probably finds most reuses.
This citation pattern (accession numbers in full text) is very common in some subdisciplines, so probably finds most reuses.
Requires ability to query full text across all literature that may contain reuse
Link to data collection paper often missing from dataset submission record, especially when dataset submission predates article publication.
Does not require access to full text
How to identify Dataset Reuse in the published literature
Names and titles are messy identifiers
Heather Piwowar, v1.0, CC-BY
This citation pattern is currently rare
This citation pattern is difficult to track with existing tool limitations
with data collection article’s journal, volume, page, etc.
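The search labels in the flowchart above ("with dataset unique ID", "with (submitter surname AND repository name)", and so on) can be sketched as a small query builder. This is only an illustration: the `DatasetRecord` fields, the `build_queries` helper, and the quoted-phrase/`AND` syntax are assumptions, not the interface of any particular search engine or repository, and the DOI in the usage note is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical subset of a dataset submission record."""
    repository: str                              # repository name
    unique_id: Optional[str] = None              # DOI, URL, or accession number
    submitter_surname: Optional[str] = None
    title: Optional[str] = None
    first_author_surname: Optional[str] = None   # of the data collection article


def quote(term: str) -> str:
    """Wrap a term in double quotes so it is searched as an exact phrase."""
    return '"' + term.replace('"', "") + '"'


def build_queries(record: DatasetRecord) -> List[str]:
    """Return one query string per search branch the record can support."""
    repo = quote(record.repository)
    queries = []
    if record.unique_id:
        # Branch: search with dataset unique ID.
        queries.append(quote(record.unique_id))
    if record.submitter_surname:
        # Branch: submitter surname AND repository name.
        queries.append(f"({quote(record.submitter_surname)} AND {repo})")
    if record.title:
        # Branch: dataset title AND repository name.
        queries.append(f"({quote(record.title)} AND {repo})")
    if record.first_author_surname:
        # Branch: first author surname AND repository name.
        queries.append(f"({quote(record.first_author_surname)} AND {repo})")
    return queries
```

For example, `build_queries(DatasetRecord(repository="Dryad", unique_id="10.1234/example", submitter_surname="Smith"))` yields one query for the ID and one for the surname-plus-repository pair. Hits from the name and title branches would still need the manual sorting the flowchart describes, since names and titles are messy identifiers.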
10 * 100 = 1000
publication-based datasets
deposited in 2005
1. following citations to the paper that describes the data
collection, then filtering.
2. searching for accession numbers, urls, and DOIs in
full text
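The second approach, searching full text for identifiers, might look like the sketch below once full text is in hand. The DOI pattern is a commonly used approximation rather than a complete DOI grammar, and the accession pattern (`GSE` followed by digits) is just one illustrative prefix; the function name and the sample text are assumptions for the example.

```python
import re
from typing import Dict, List

# Approximate DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")

# Illustrative accession pattern with an unusual prefix (assumed: GSE + digits).
ACCESSION_PATTERN = re.compile(r"\bGSE\d+\b")


def find_identifiers(full_text: str) -> Dict[str, List[str]]:
    """Return the distinct DOIs and accession numbers found in full_text."""
    # Strip trailing sentence punctuation that the DOI pattern may swallow.
    # Caveat: this also trims a DOI that legitimately ends in ")".
    dois = {match.rstrip(".,;)") for match in DOI_PATTERN.findall(full_text)}
    accessions = set(ACCESSION_PATTERN.findall(full_text))
    return {"dois": sorted(dois), "accessions": sorted(accessions)}


if __name__ == "__main__":
    sample = "We reanalyzed GSE1234 (doi:10.1234/example.5678). See also GSE1234."
    print(find_identifiers(sample))
```

A match only shows that an identifier appears in the text; telling reuse apart from the original submission still requires the filtering step.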
http://api.plos.org/2011/05/31/announcing_the_plos_search_api/
2005 long time ago
biomedicine familiar, also very dominant
search interfaces not well designed for this task
helpdesks are very helpful
stay tuned for results
poster at ASIS&T, SIGUSE
I post my data, code, and statistical scripts: http://researchremix.org
Share yours too!
-> Open Notebook Science
http://www.flickr.com/photos/myklroventine/892446624/
https://notebooks.dataone.org/tracking1000datasets/
thank you
Todd Vision,
Estephanie Sta Maria
Jonathan Carlson
Dryad and DataONE teams
The open science online community and those who release their articles, datasets and photos openly