Open data: Enhancing preservation, reproducibility, and innovation

Page 1: Open data: Enhancing preservation, reproducibility, and innovation

OPEN DATA: ENHANCING

PRESERVATION, REPRODUCIBILITY, AND

INNOVATION

Clarke Iakovakis, Scholarly Communications Librarian

Neumann Library CC BY-SA 3.0-2.5-2.0-1.0 image courtesy Daniel Tenerife - Own work. Title: "Social Red" https://commons.wikimedia.org/wiki/File:Social_Red.jpg#mediaviewer/File:Social_Red.jpg

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Page 2: Open data: Enhancing preservation, reproducibility, and innovation

CHANGE IN MINDSET

“data is no longer regarded as static or stale, whose usefulness is finished once the purpose for

which it was collected was achieved.”

- Kenneth Cukier and Viktor Mayer-Schönberger

"in some fields, the data are coming to be viewed as an essential end product of research, comparable in value to journal articles or

conference papers”

- Christine Borgman

Page 3: Open data: Enhancing preservation, reproducibility, and innovation

OUTLINE

• Data-centric scholarship

• Benefits & challenges of open data

• Defining open data

• Reproducibility

• Public use & data management plans

• Data reuse

• Concerns and open questions

• Where to deposit data?

Page 4: Open data: Enhancing preservation, reproducibility, and innovation

SECTION 1

FROM DOCUMENT TO DATA-CENTRIC VIEW OF SCHOLARSHIP

Page 5: Open data: Enhancing preservation, reproducibility, and innovation

WHAT WE MEAN BY “DATA”

A wide definition:

any information that can be stored in digital form, including text, numbers, images, video or movies,

audio, software, algorithms, equations, animations, models, simulations, etc. Such data

may be generated by...observation, computation, or experiment

- National Science Board

National Science Board. Long-Lived Data Collections: Enabling Research and Education in the 21st Century. Arlington, VA (2005): 13. https://www.nsf.gov/pubs/2005/nsb0540/nsb0540_3.pdf

Page 6: Open data: Enhancing preservation, reproducibility, and innovation

WHAT IS RESEARCH DATA?

collected, observed, accessed, or created, for the purposes of analysis to produce and validate original

research results.

What is a routine collection at one point can become research data in the future

Thus research data are very much about when they are used, as well as what they constitute, and

the purpose for which they are to be used

University of Edinburgh. “Research Data Explained.” http://mantra.edina.ac.uk/researchdataexplained/

Page 7: Open data: Enhancing preservation, reproducibility, and innovation

Hard Science: Scientific data generated by instrumented research projects

Social science: data generated from government statistics, online surveys, behavioral models

Humanities: bodies of text, digital images and video, models of historic sites

WHAT IS RESEARCH DATA?

Borgman, C. L. (2012), The conundrum of sharing research data. J Am Soc Inf Sci Tec, 63: 1059–1078. doi:10.1002/asi.22634

Page 8: Open data: Enhancing preservation, reproducibility, and innovation

Applying information technology to research problems

Collaborations across disciplines & increasing size of collaborations

Increasing the complexity and quantity of research data

DATA INTENSIVE RESEARCH

Page 9: Open data: Enhancing preservation, reproducibility, and innovation

DATA INTENSIVE RESEARCH

• Scientific instruments generate data at greater speeds, densities, and detail

• Digitization of older print & analog data

• Born digital data

• Data storage capacity increases & storage costs decrease, enabling preservation of data

• Improvements in searching, analysis & visualization tools

Page 10: Open data: Enhancing preservation, reproducibility, and innovation

World’s technological installed capacity to store information (table SA1) (16).

M Hilbert, and P López Science 2011;332:60-65

Page 11: Open data: Enhancing preservation, reproducibility, and innovation

SLOAN DIGITAL SKY SURVEY

The most distant quasar ever discovered (at least as of October 2003). The redshift 6.4 quasar is seen at a time when the universe was just 800 million years old. The light-travel time from this object to us is about 13 billion years.

http://www.sdss.org

Page 12: Open data: Enhancing preservation, reproducibility, and innovation

Data is the new oil. It’s valuable, but if unrefined it cannot really be used. It

has to be changed into gas, plastic, chemicals, etc. to create a valuable entity that drives

profitable activity; so must data be broken down, analyzed for it

to have value

- Clive Humby

Image © Against All Odds Productions

Page 13: Open data: Enhancing preservation, reproducibility, and innovation

VALUE OF DATA

Pryor, Graham. “Why Manage Research Data?” Managing Research Data. London: Facet Publishing, 2012.

Page 14: Open data: Enhancing preservation, reproducibility, and innovation

VALUE OF DATA

Value of a dataset can be

• Immediate

• Gained over time

• Transient

• Little (i.e. it’s easier to recreate than curate)

Borgman, C. L. (2012), The conundrum of sharing research data. J Am Soc Inf Sci Tec, 63: 1059–1078. doi:10.1002/asi.22634

Page 15: Open data: Enhancing preservation, reproducibility, and innovation

VALUE OF DATA

“Fundamentally, there is a shift from a document-centric view of scholarship to a data-

centric view of scholarship”

- Sayeed Choudhury

Choudhury, Sayeed. "Data curation: An ecological perspective." College & Research Libraries News 71, no. 4 (2010): 194-196.

Page 16: Open data: Enhancing preservation, reproducibility, and innovation

WHY OPEN?

Data that underpin a journal article should be made concurrently available in an accessible

database.

We are now on the brink of an achievable aim: for all science literature to be online, for all of the data to be online and for the two to be

interoperable.

Adapted from Jonathan Tedds, Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The Open Research Challenge: Peer Review and Publication of Research Data." Licensed under CC BY

Royal Society June 2012, Science as an Open Enterprise, http://royalsociety.org/policy/projects/science-public-enterprise/report/

Page 17: Open data: Enhancing preservation, reproducibility, and innovation

DATA AVAILABILITY

Vines, Timothy H, Arianne Y K. Albert, Rose L Andrew, Florence Débarre, Dan G Bock, Michelle T Franklin, Kimberly J Gilbert, et al. "The Availability of Research Data Declines Rapidly with Article Age." Current Biology 24, no. 1 (1/6/ 2014): 94-97. https://linkinghub.elsevier.com/retrieve/pii/S0960-9822(13)01400-0

Researchers requested data sets from a relatively homogeneous set of 516 articles published 1991-2011 in the field of zoology

Tracking down the authors & getting a response was the first challenge.

For every yearly increase in article age, the odds of the data set being reported as extant decreased by 17%

When the authors did give the status of their data, the proportion of data sets that still existed dropped from 100%

in 2011 to 33% in 1991
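
A worked reading of the 17% figure may help. The sketch below assumes the decline compounds at a constant rate, so each additional year multiplies the odds by 0.83. The function names and the starting probability of 0.95 are illustrative assumptions, not values from Vines et al. Odds and probabilities are different quantities, so a 17% yearly drop in odds is not a 17-point drop in the chance that a data set survives.

```python
def odds_multiplier(years: float, yearly_decline: float = 0.17) -> float:
    """Factor applied to the odds after `years`, assuming constant compounding."""
    return (1.0 - yearly_decline) ** years


def probability_to_odds(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def extant_probability(start_probability: float, years: float) -> float:
    """Chance the data set is still reported extant after `years`."""
    start_odds = probability_to_odds(start_probability)
    return odds_to_probability(start_odds * odds_multiplier(years))


if __name__ == "__main__":
    # 0.95 is a hypothetical starting point chosen only for illustration.
    for age in (0, 5, 10, 20):
        print(age, round(extant_probability(0.95, age), 2))
```

Under this reading, the odds after ten years are roughly 0.83 to the tenth power (about 0.16) times the starting odds, which is why older data sets become hard to recover so quickly.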

Page 18: Open data: Enhancing preservation, reproducibility, and innovation

DATA AVAILABILITY

Vines, Timothy H, Arianne Y K. Albert, Rose L Andrew, Florence Débarre, Dan G Bock, Michelle T Franklin, Kimberly J Gilbert, et al. "The Availability of Research Data Declines Rapidly with Article Age." Current Biology 24, no. 1 (1/6/ 2014): 94-97. https://linkinghub.elsevier.com/retrieve/pii/S0960-9822(13)01400-0

Many of these missing data sets could be retrieved only with considerable effort by the

authors, and others are completely lost to science

Page 19: Open data: Enhancing preservation, reproducibility, and innovation

DATA LOSS

Adapted from Mitcham, Jenny & Lindsey Myers. “Managing your research data”. Licensed under CC BY-NC-SA

Page 20: Open data: Enhancing preservation, reproducibility, and innovation

Adapted from Mitcham, Jenny & Lindsey Myers. “Managing your research data”. Licensed under CC BY-NC-SA

DATA LOSS

Page 21: Open data: Enhancing preservation, reproducibility, and innovation

DATA LOSS

• Human error

• Natural disaster

• Facilities infrastructure failure

• Storage failure

• Server hardware/software failure

• Application software failure

• Format obsolescence

• Legal encumbrance

• Malicious attack

• Loss of staffing competencies

• Loss of institutional commitment

• Loss of financial stability

Peters, Christie. Research Data Management: Basics and Best Practices. http://uknowledge.uky.edu/cgi/viewcontent.cgi?article=1000&context=rdsc_workshops. Licensed under CC BY

Page 22: Open data: Enhancing preservation, reproducibility, and innovation

DISCUSSION

• Have you seen a shift to a data-centric research culture in your discipline?

• Is data availability a concern among you or your colleagues?

• Other ideas & questions

• Up next: Open Data Benefits and Challenges

Page 23: Open data: Enhancing preservation, reproducibility, and innovation

SECTION 2

BENEFITS AND CHALLENGES OF OPEN DATA

Page 24: Open data: Enhancing preservation, reproducibility, and innovation

WHAT IS OPEN DATA?

Page 25: Open data: Enhancing preservation, reproducibility, and innovation

HIGH ASPIRATIONS, LOW UPTAKE

• Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities (2003: 572 institutions)

• Recommendations for Access to Data from Publicly Funded Research (2006, all OECD member states)

Page 26: Open data: Enhancing preservation, reproducibility, and innovation

CULTURE CHANGE?

A survey of 17,000 UK doctoral students

Showed that they are privately open to sharing resources

But in practice, followed behaviors of supervisors

And fear losing future publication opportunities

Researchers of Tomorrow – The Research Behaviour of Generation Y Doctoral Students. London, United Kingdom: JISC. Retrieved from: http://www.jisc.ac.uk/publications/reports/2012/researchers-of-tomorrow.

Tenopir, C, Dalton, E D, Allard, S, Frame, M, Pjesivac, I, Birch, B, Pollock, D and Dorsett, K (2015). Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide. PLoS ONE 10(8): e0134826. DOI: https://doi.org/10.1371/journal.pone.0134826

Page 27: Open data: Enhancing preservation, reproducibility, and innovation

STRUCTURAL BARRIERS

Small data could initially be published as part of the original publication as tables

As size and complexity of data grew and publishers enforced page limits, data publication

was prohibited or impossible

Klump, J., (2017). Data as Social Capital and the Gift Culture in Research. Data Science Journal. 16, p.14. DOI: http://doi.org/10.5334/dsj-2017-014

Page 28: Open data: Enhancing preservation, reproducibility, and innovation

WHAT IS OPEN DATA?

The Open Definition (opendefinition.org):

“Open data and content can be freely used, modified, and shared by anyone for

any purpose”

Screencap © Open Data Commons Attribution License:

http://opendatacommons.org/licenses/by/summary/

Page 29: Open data: Enhancing preservation, reproducibility, and innovation

FAIR Data Principles:

Findable

Accessible

Interoperable

Reusable

WHAT IS OPEN DATA?

Page 30: Open data: Enhancing preservation, reproducibility, and innovation

OPEN SCIENCE

CC-BY-SA image courtesy PLOS One. Credit: Ainsley Seago.

doi:10.1371/journal.pbio.1001779.g001

Table © Fecher & Friesike. From Open Science: One Term, Five Schools of Thought

http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2272036

Page 31: Open data: Enhancing preservation, reproducibility, and innovation

RATIONALES FOR SHARING RESEARCH DATA

• Stakeholders

    • Researchers

    • Public

    • Journals

    • Funders

    • Libraries

• Motivations to share

    • Needs of research community

    • Needs of the public at large

• Beneficiaries of sharing

    • Those who produce the data

    • Those who use the data

Page 32: Open data: Enhancing preservation, reproducibility, and innovation

REPLICATION & REPRODUCIBILITY

Page 33: Open data: Enhancing preservation, reproducibility, and innovation

Alexander, Ruth. “Reinhart, Rogoff... and Herndon: The student who caught out the profs.” BBC News http://www.bbc.com/news/magazine-22223190

“This week, economists have been astonished to find that a famous academic paper often

used to make the case for austerity cuts contains major

errors. Another surprise is that the mistakes, by two eminent

Harvard professors, were spotted by a student doing his

homework.”

REPLICATION/REPRODUCIBILITY

Page 34: Open data: Enhancing preservation, reproducibility, and innovation

REPLICATION/REPRODUCIBILITY

• 90% of respondents to a recent survey in Nature agreed that there is a ‘reproducibility crisis’

• Increasing number of retractions

• Failures to replicate high profile studies

• Underlying causes

    • Mechanized reporting of statistical results

    • Publication bias towards statistically significant results

Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016).

Page 35: Open data: Enhancing preservation, reproducibility, and innovation

REPLICATION/REPRODUCIBILITY

• Transparency and Openness Promotion (TOP) Guidelines (https://osf.io/9f6gx/wiki/Guidelines/)

• Badges to articles with open data

• The Peer Reviewers' Openness Initiative

• Open Science Framework Reproducibility Project (https://osf.io/ezcuj/wiki/home/)

• Science Exchange Reproducibility Initiative (http://validation.scienceexchange.com/#/)

Page 36: Open data: Enhancing preservation, reproducibility, and innovation

JOURNAL MANDATES

• Mandatory requirement to archive data publicly unless there is a valid reason not to

• Response to low voluntary uptake

• To allow reproduction of reported results

• Ecology, evolution, biology

• These policies do work to increase data archiving

• However, the quality varies…

Roche DG, Kruuk LEB, Lanfear R, Binning SA (2015) Public Data Archiving in Ecology and Evolution: How Well Are We Doing? PLoS Biol 13(11): e1002295. https://doi.org/10.1371/journal.pbio.1002295

Page 37: Open data: Enhancing preservation, reproducibility, and innovation

JOURNAL MANDATES

Researchers surveyed 100 datasets associated with nonmolecular studies in journals that commonly publish ecological and evolutionary research and

have a strong public data archiving (PDA) policy.

Out of these datasets, 56% were incomplete, and 64% were archived in a way that partially or entirely

prevented reuse.

Roche DG, Kruuk LEB, Lanfear R, Binning SA (2015) Public Data Archiving in Ecology and Evolution: How Well Are We Doing? PLoS Biol 13(11): e1002295. https://doi.org/10.1371/journal.pbio.1002295

Page 38: Open data: Enhancing preservation, reproducibility, and innovation

REPLICATION/REPRODUCIBILITY

"True reproducibility requires deep engagement with the epistemological questions of a given

research specialty, and the very different ways in which investigators obtain and value evidence"

“As rationale for sharing research, reproducibility…risks reducing the research process to a set of mechanistic procedures”

Borgman, C. L. (2012), The conundrum of sharing research data. J Am Soc Inf Sci Tec, 63: 1059–1078. doi:10.1002/asi.22634

Page 39: Open data: Enhancing preservation, reproducibility, and innovation

REPRODUCIBILITY AS RATIONALE

• Where data deposit is required as condition of publication (e.g. Protein Data Bank), researchers will comply

• Data sharing more likely if

    • Materials/documentation are automated

    • Data is not sensitive/no licensing restrictions apply

    • Publication is completed

    • Data is not part of a long-term study integral to researcher’s career

Borgman, C. L. (2012), The conundrum of sharing research data. J Am Soc Inf Sci Tec, 63: 1059–1078. doi:10.1002/asi.22634

Page 40: Open data: Enhancing preservation, reproducibility, and innovation

DISCUSSION

Is there a reproducibility crisis?

If so, to what extent can data sharing remedy the crisis?

Other questions/comments

Up next: data management plans & sharing for public use

Page 41: Open data: Enhancing preservation, reproducibility, and innovation

PUBLIC USE

Page 42: Open data: Enhancing preservation, reproducibility, and innovation

PUBLIC USE

Tax monies should be leveraged to serve the public good

Data should not be hoarded by researchers

Public understanding of research

Evidence-based advocacy

Education & teaching

Citizen science

Policymakers

Borgman, C. L. (2012), The conundrum of sharing research data. J Am Soc Inf Sci Tec, 63: 1059–1078. doi:10.1002/asi.22634

Data Sharing in the Sciences

Page 43: Open data: Enhancing preservation, reproducibility, and innovation

OPEN GOVERNMENT DATA

White House Office of Science & Technology Policy memo: “Expanding Public Access to the Results of Federally Funded Research” (Feb 2013)

digitally formatted scientific data resulting from unclassified research supported wholly

or in part by Federal funding should be stored and publicly accessible to search, retrieve,

and analyze.

Page 44: Open data: Enhancing preservation, reproducibility, and innovation

OPEN GOVERNMENT DATA

• Data is hard (or even impossible) to find

• Data cannot be readily used

• Unavailable, unclear, restrictive licensing terms

https://blog.okfn.org/files/2017/06/FinalreportTheStateofOpenGovernmentDatain2017.pdf

Global Open Data Index (GODI): https://index.okfn.org/

“Measures the openness of government data according to the Open Definition”

Page 45: Open data: Enhancing preservation, reproducibility, and innovation

DATA MANAGEMENT POLICIES

• NSF

• NIH

• NEH

• NASA

• NOAA

• CDC

• Gates Foundation

http://dms.data.jhu.edu/data-management-resources/plan-research/funders-data-sharing-requirement/funder-data-related-mandates-and-public-access-plans/

Page 46: Open data: Enhancing preservation, reproducibility, and innovation

DATA MANAGEMENT PLANS

• Roles and responsibilities

• Description of data and metadata

• Storage, backup and security

• Provisions for privacy, confidentiality, intellectual property rights and other rights

• Data access and sharing

• Data reuse, redistribution and production of derivatives

• Archiving and preservation

University of Iowa Libraries. Data Management Plans. Licensed under CC BY. http://guides.lib.uiowa.edu/c.php?g=132111&p=900990
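
The checklist above can be treated as a simple data structure when reviewing a draft plan. The sketch below is a minimal illustration, assuming a plan is stored as a mapping from section name to free text. The section keys and the `missing_sections` helper are hypothetical names invented for this example, not part of any funder's template or tool.

```python
from typing import List, Mapping

# Section keys mirror the bullets on this slide; the key names are assumptions.
DMP_SECTIONS = (
    "roles_and_responsibilities",
    "data_and_metadata_description",
    "storage_backup_security",
    "privacy_confidentiality_ip",
    "access_and_sharing",
    "reuse_redistribution_derivatives",
    "archiving_and_preservation",
)


def missing_sections(plan: Mapping[str, str]) -> List[str]:
    """Return the checklist sections that are absent or blank in a draft plan."""
    return [name for name in DMP_SECTIONS if not plan.get(name, "").strip()]


if __name__ == "__main__":
    # A hypothetical, partly written draft.
    draft = {
        "roles_and_responsibilities": "The PI oversees all deposits.",
        "access_and_sharing": "   ",  # whitespace only, so still counted as missing
    }
    for name in missing_sections(draft):
        print("Still to write:", name)
```

A check like this only confirms that each section has some text; whether the text is adequate remains a matter for peer review and program management.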

Page 47: Open data: Enhancing preservation, reproducibility, and innovation

NSF DATA SHARING POLICY

What constitutes reasonable data management and access

…and reasonable length of time

will be determined by the community of interest through the process of peer review and program

management

NSF Data Management & Sharing FAQ. https://www.nsf.gov/bfa/dias/policy/dmpfaqs.jsp

Page 48: Open data: Enhancing preservation, reproducibility, and innovation

NSF DATA SHARING POLICY

Annual reports must include information on the progress on data management and sharing of

research products

Final project reports are to contain a more thorough updating of the original DMP, including

how your data is archived.

http://dms.data.jhu.edu/data-management-resources/plan-research/funders-data-sharing-requirement/funder-data-related-mandates-and-public-access-plans/

Page 49: Open data: Enhancing preservation, reproducibility, and innovation

PUBLIC HEALTH

WHO seeks a paradigm shift in the approach to information sharing in emergencies, from one limited by embargoes set for publication timelines, to open sharing using modern fit-for-purpose pre-publication platforms.

Opting in to data and results sharing should be the default practice and the onus should be placed on data generators and stewards at the local, national and international level to explain any decision to opt out from sharing data and results during public health emergencies

World Health Organization “Developing global norms for sharing data and results during public health emergencies”

Page 50: Open data: Enhancing preservation, reproducibility, and innovation

PUBLIC HEALTH

Many publishers, NGOs and research funders committed to free research sharing in light of the

Zika outbreak

Wellcome Trust. “Statement on data sharing in public health emergencies.”

Journal signatories will make all content concerning the Zika virus free to access. Any data or

preprint deposited for unrestricted dissemination ahead of submission of any paper will not pre-empt its

publication in these journals.

Funder signatories will require researchers undertaking work relevant to public health emergencies to

set in place mechanisms to share quality-assured interim and final data as rapidly and widely as

possible, including with public health and research communities and the World Health Organization.

Wiley, Taylor and Francis, and Elsevier are not signatories.

Page 51: Open data: Enhancing preservation, reproducibility, and innovation

SHERPA JULIET

Searchable database and single focal point of up-to-date information concerning funders'

policies and their requirements on open access, publication and data archiving.

http://v2.sherpa.ac.uk/juliet/

Page 52: Open data: Enhancing preservation, reproducibility, and innovation

DISCUSSION

Do you have experience with data management plans?

Up next: Data reuse

Page 53: Open data: Enhancing preservation, reproducibility, and innovation

DATA REUSE

Page 54: Open data: Enhancing preservation, reproducibility, and innovation

ASKING NEW QUESTIONS OF EXTANT DATA

• Encourages meta-analyses & data combination

• Exploring new questions and identifying new relationships

Page 55: Open data: Enhancing preservation, reproducibility, and innovation

HUBBLE SPACE TELESCOPE DATA REUSE

• General Observing (GO) paper: At least one author was investigator on the GO proposal that obtained the data.

• Archival Research (AR) paper: No overlap between the paper authors and investigators on the GO proposal that obtained the data.

• GO+AR: Combination of GO data sets with AR data sets.

Adapted from Jonathan Tedds, Distinguished Lecture at DLab, UC Berkeley, 12 Sep 2013: "The Open Research Challenge: Peer Review and Publication of Research Data." Licensed under CC BY.

Royal Society June 2012, Science as an Open Enterprise, http://royalsociety.org/policy/projects/science-public-enterprise/report/

Papers based upon reuse of archived observations now

exceed those based on the use described

in the original proposal.

https://archive.stsci.edu/hst/bibliography/pubstat.html

Page 56: Open data: Enhancing preservation, reproducibility, and innovation

ASKING NEW QUESTIONS OF EXTANT DATA

• Assessing veracity requires domain expertise & misinterpretation is a serious risk

• Depends on extensive documentation & description

• The farther the user is from the point of data origin

    • The more documentation required

    • The more effort required by reuser

    • The greater the risk of misinterpretation

• Benefits prospective users more than producers

Borgman, C. L. (2012), The conundrum of sharing research data. J Am Soc Inf Sci Tec, 63: 1059–1078. doi:10.1002/asi.22634

Data Sharing in the Sciences

Page 57: Open data: Enhancing preservation, reproducibility, and innovation

ADVANCING RESEARCH AND INNOVATION

• Data-intensive fields (astronomy, social sciences, economics)

• Comparisons across time and space (ecology, biology, sociology)

Borgman, C. L. (2012), The conundrum of sharing research data. J Am Soc Inf Sci Tec, 63: 1059–1078. doi:10.1002/asi.22634

Data Sharing in the Sciences

Page 58: Open data: Enhancing preservation, reproducibility, and innovation

ADVANCING RESEARCH AND INNOVATION

• Maximizing the use of data

• Increasing the impact of findings

• Progressing the state of research

• Laying broader foundation for knowledge

• Diversifying perspectives

Fischer, B.A., & Zigmond, M.J. (2010). The essential nature of sharing in science. Science and Engineering Ethics, 16(4), 783–799.

Page 59: Open data: Enhancing preservation, reproducibility, and innovation

DATA SHARING ASSOCIATED WITH CITATION IMPACT

Examined the citation history of 85 cancer microarray clinical trial publications with respect to the availability of their data.

The 48% of trials with publicly available microarray data received 85% of the aggregate citations

Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. https://doi.org/10.1371/journal.pone.0000308
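
To see what the two percentages imply per paper, the arithmetic can be sketched as below. This is an unadjusted back-of-the-envelope ratio computed only from the 48% and 85% figures on this slide. The function name is an assumption for illustration, and the result is not the adjusted effect estimated in the paper.

```python
def per_paper_citation_ratio(share_of_papers: float, share_of_citations: float) -> float:
    """Ratio of average citations per paper: data-sharing group vs. the rest.

    Both arguments are fractions between 0 and 1 describing the sharing group.
    """
    sharing_rate = share_of_citations / share_of_papers
    other_rate = (1.0 - share_of_citations) / (1.0 - share_of_papers)
    return sharing_rate / other_rate


if __name__ == "__main__":
    # 48% of trials shared data and received 85% of the aggregate citations.
    print(round(per_paper_citation_ratio(0.48, 0.85), 1))
```

On these raw shares, a paper with public data averaged about six times the citations of one without, before accounting for any other differences between the two groups.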

Page 60: Open data: Enhancing preservation, reproducibility, and innovation

DATA SHARING ASSOCIATED WITH CITATION IMPACT

Does not imply causation

• But there may be mechanisms in which data sharing did stimulate greater citations

• Exposure

• Reanalysis

• Enthusiasm and synergy around a specific research question

Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. https://doi.org/10.1371/journal.pone.0000308

Page 61: Open data: Enhancing preservation, reproducibility, and innovation

SECTION 3

CONCERNS AND OPEN QUESTIONS

Page 62: Open data: Enhancing preservation, reproducibility, and innovation

RESEARCHER CONCERNS

• Data is competitive advantage

• Data is intellectual capital

• Time & effort required to prepare data for archiving

• Lack of recognition & other extrinsic incentives

• Concerns about data misinterpretation

Roche, D. G., Kruuk, L. E. B., Lanfear, R., & Binning, S. A. (2015). Public data archiving in ecology and evolution: How well are we doing? PLoS Biology, 13(11). doi:http://dx.doi.org/10.1371/journal.pbio.1002295

Page 63: Open data: Enhancing preservation, reproducibility, and innovation

OPEN QUESTIONS

• What data to share?

• What is sharing?

• What is interpretable and reusable?

• How to reward/give credit?

• How to document without extensive labor?

• How to handle misuse/misinterpretation?

• Restricting access/de-identification

Borgman, C. L. (2012), The conundrum of sharing research data. J Am Soc Inf Sci Tec, 63: 1059–1078. doi:10.1002/asi.22634

Data Sharing in the Sciences

Page 64: Open data: Enhancing preservation, reproducibility, and innovation

OPEN QUESTIONS

• Lack of demonstrated demand for research data outside genomics, climate science, astronomy, social science, demographics

• How open is it?

• Who owns the copyright? Is data public domain?

• How to validate data?

• Preserving data

Borgman, C. L. (2012), The conundrum of sharing research data. J Am Soc Inf Sci Tec, 63: 1059–1078. doi:10.1002/asi.22634

Data Sharing in the Sciences

Page 65: Open data: Enhancing preservation, reproducibility, and innovation

SECTION 4

WHERE TO ARCHIVE?

Page 66: Open data: Enhancing preservation, reproducibility, and innovation

WHERE TO ARCHIVE?

• Downloadable files on author’s webpage

• Repositories

• Publishers

• Torrents (academictorrents.com)

• APIs (Application Programming Interfaces)

Page 67: Open data: Enhancing preservation, reproducibility, and innovation

DATA REPOSITORIES

• Institutional Repositories

• Subject Repositories

http://oad.simmons.edu/oadwiki/Data_repositories

Page 68: Open data: Enhancing preservation, reproducibility, and innovation

DATA REPOSITORIES

• Government (data.gov, data.worldbank.org)

• Multidisciplinary (figshare)

Page 69: Open data: Enhancing preservation, reproducibility, and innovation

ACCESSING & USING OPEN DATA

• Open source software: R

• rOpenSci (ropensci.org)

• rOpenGov (https://ropengov.github.io/projects/)

• Run My Code (http://www.runmycode.org)

• Google Public Data Explorer (https://www.google.com/publicdata)

www.r-project.org
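
As an illustration of access through an API, the sketch below builds a request URL for the World Bank data service mentioned earlier and parses a response. It is written in Python rather than R, and it assumes the service's v2 URL pattern and a response shaped as a two-element JSON array (metadata, then records). The helper names are hypothetical, and the sample payload uses made-up placeholder numbers so the example runs offline.

```python
import json
from urllib.parse import quote

BASE_URL = "https://api.worldbank.org/v2"  # assumed public endpoint


def build_indicator_url(country: str, indicator: str) -> str:
    """Build a request URL for one country and one indicator, asking for JSON."""
    return f"{BASE_URL}/country/{quote(country)}/indicator/{quote(indicator)}?format=json"


def parse_indicator_response(text: str) -> dict:
    """Map year -> value, assuming a [metadata, records] JSON array."""
    payload = json.loads(text)
    records = payload[1] or []
    # Skip years for which no value was reported.
    return {r["date"]: r["value"] for r in records if r["value"] is not None}


# Placeholder payload in the assumed shape; the numbers are made up, not real statistics.
SAMPLE = json.dumps([
    {"page": 1, "pages": 1},
    [
        {"date": "2001", "value": 110},
        {"date": "2000", "value": 100},
        {"date": "1999", "value": None},
    ],
])

if __name__ == "__main__":
    print(build_indicator_url("US", "SP.POP.TOTL"))
    print(parse_indicator_response(SAMPLE))
```

To fetch live data, the URL could be passed to `urllib.request.urlopen` and the body handed to `parse_indicator_response`; that step is left out here so the sketch does not depend on network access.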

Page 70: Open data: Enhancing preservation, reproducibility, and innovation

QUESTIONS?