Research data management : [part of] PROOF course Finding and controlling scientific literature and data, Eindhoven University of Technology, 2015 / Leon Osinski

research data management, data stewardship, research data management planning, research data labs, research data archives


Research data management

PROOF course Finding and controlling scientific literature and data

TU/e, 2015

Leon Osinski, TU/e IEC/Library

Available under CC BY-SA license, which permits copying and redistributing the material in any medium or format & adapting the material for any purpose, provided the original author and source are credited & you distribute the adapted material under the same license as the original


Agenda

1. Research data management [RDM]: what and why

2. RDM before your research: data management plan

[discussion]

3. RDM during your research: protecting and sharing your data via a data lab

4. RDM after your research: publishing and archiving your data via a data archive

Source: Research Data Netherlands / Marina Noordegraaf


Research data management [RDM]

RDM: caring* for your data with the purpose of

1. protecting their mere existence, and

2. making them available to others - during and after your research project

Data sharing implies RDM, or: RDM prepares the way for sharing your data during and after the project

*Goodman A, et al. (2014) Ten simple rules for the care and feeding of scientific data. PLoS Comput Biol 10(4): e1003542. doi:10.1371/journal.pcbi.1003542

“Rule 3. Conduct science with a particular level of reuse in mind”


Why share research data? #1

During your research

Because you work together with other researchers

After your research

Because of scientific integrity: validating results by replication requires data

Because of re-using results: data-driven science

Because your data are unique / not easily repeatable (long term observational data)

Because you benefit from it: increases your visibility and enhances the trustworthiness of your research


EC: Horizon 2020 #1: Open research data pilot

“… aims to improve and maximise access to and re-use of research data generated by projects for the benefit of society and the economy.”

“Regarding the digital research data (…), the beneficiaries must: deposit in a research data repository and take measures to make it possible (…) to access, mine, exploit, reproduce, and disseminate – free of charge for any user (…) the data …”

“Participating projects will be required to develop a Data Management Plan (DMP), in which they will specify what data will be open.” [ italics mine ]


EC: Horizon 2020 #2: Open research data pilot: data management plan [DMP]

The DMP should address:

1. Data set reference and name

2. Data set description

3. Standards and metadata

4. Data sharing

5. Archiving and preservation

Research data should be:

1. Discoverable

2. Accessible

3. Assessable and intelligible

4. Useable beyond the original purpose

5. Interoperable

DMP template by 3TU.Datacentrum
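As a rough illustration, the five topics a DMP should address could be kept as a small structured record per data set, so that empty sections are easy to spot. This is a minimal sketch and not the Horizon 2020 or 3TU.Datacentrum template itself; the class name, the field names and the example values are assumptions made for this example.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DataManagementPlan:
    """One record per data set; the fields mirror the five DMP topics."""
    reference_and_name: str = ""
    description: str = ""
    standards_and_metadata: str = ""
    data_sharing: str = ""
    archiving_and_preservation: str = ""

    def missing_sections(self) -> List[str]:
        """Return the names of the sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical example values, for illustration only.
plan = DataManagementPlan(
    reference_and_name="DS-001 flow measurements",
    description="Raw sensor readings in CSV format",
    data_sharing="Open after publication",
)
print(plan.missing_sections())
```

Running the sketch lists the sections that still need text, which can serve as a simple reminder while drafting the plan.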


NWO pilot data management: scope

“The pilot applies to the following seven funding rounds:

Vici

Research talent (Social sciences)

Innovative public private partnership in ICT (Physical sciences)

Fund new chemical innovations (Chemical sciences)

HTM call (Hightech materials) (Technology foundation STW)

Urbanising deltas of the world of security and the rule of law (WOTRO)

Open programme (Earth and life sciences).”


NWO pilot data management: additional information #1

“Researchers are expected to answer four questions about data management in the research proposal (data management section).”

“After a proposal has been awarded funding, the researcher should elaborate the section into a data management plan. Within four months of the research project being awarded funding, the researcher must have submitted the first version of the data management plan to NWO.”

“For this data management plan, NWO has chosen a template that matches the guidelines for data management from Horizon 2020 as closely as possible.” [italics mine]


NWO pilot data management: additional information #2

“During the pilot, the data management section will not be included in the decision about the awarding of funding.”

“NWO understands ‘data’ to be both collected, unprocessed data as well as analysed, generated data. (…). NWO only requests storage of data that are relevant for reuse.” [italics mine]


Research data management: discussion topics and questions

Storage and back-up

Where do you keep your research data?

Is there a back-up? Where?

Are data selections made? Not everything is to be stored but…?

Metadata and documentation

Do you describe your research data? Who measured or collected what, when, how? Other context information?

Are you content with the way you document or describe your research data? Do you succeed in finding the right (version of your) research data?

Can other researchers understand and (re-)use your research data (during and after research)? Should they be able to?

Access and re-use

Who can access your research data?

What will happen to your research data when you leave TU/e?

Would you consider publishing your research data, i.e. making them publicly available?


Data management plan assignment [ N=5 ]

Collection: Observation during measurements (lab journal), measurement data (from apparatus, tiff files), simulation data, Matlab, Excel, PDFs, Origin (creation of graphs), .csv, .ascii, questionnaire, SPSS, GIS

Storage, backup: Own laptop, network drive, portable/external hard drive, cloud storage (secondary backup), measurement-pc, user-pc

Documentation (aimed at understanding and re-use): lab journal, accompanying Excel-/Word-files, naming, organizing data in folders + READMEs, organized by date of acquisition and method of measurement

Access (during your research): all users of the apparatus, access policy of network drive, SVN (version control + access control), under confidentiality, openly after publication, open

Sharing (when your research is done): with colleagues, conferences, through university file servers, published as part of thesis (open), unknown

Preservation (when your research is done and in the long run): DVDs (raw and processed data), no archiving, data can be produced by running the models at any time, unknown


RDM during your research: protecting and sharing your data

Protection against physical loss and destruction
storage, backup
data classification and retention; different treatment of different data

Protection against intellectual loss and unretrievability - using the correct data
Metadata, data documentation
+ catalogue metadata, for discovery: creator & title data set, abstract …
+ study metadata: more or less similar to the Methodology section of a paper: info on provenance of data, workflow of data collection, instruments used, data validation
+ data-level metadata, for re-use by humans and machines, often embedded in software packages: variable and code descriptions in tables or databases, codebook
+ license-information: what are others allowed to do with your data?
file-naming, organizing data in folders, versioning, using a relational database [ instead of Excel ]

Protection against unauthorized use
access control

Source: Research Data Netherlands / Marina Noordegraaf
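The metadata layers named on this slide (catalogue, study, data-level, plus license information) can be sketched as one machine-readable record stored next to the data. This is only an illustration: the key names, the helper function and all example values are assumptions, not a metadata standard.

```python
import json

# Hypothetical record; key names and values are illustrative assumptions.
metadata = {
    "catalogue": {  # for discovery
        "creator": "A. Researcher",
        "title": "Example data set",
        "abstract": "Short summary of what the data set contains.",
    },
    "study": {  # comparable to the Methodology section of a paper
        "provenance": "Measured in the lab",
        "workflow": "Description of the data collection steps",
        "instruments": ["example sensor"],
        "validation": "How the data were checked",
    },
    "data_level": {  # codebook: variable descriptions
        "variables": {"temp_c": "Temperature in degrees Celsius"},
    },
    "license": "CC BY-SA",  # what others are allowed to do with the data
}

REQUIRED = ("catalogue", "study", "data_level", "license")


def missing_layers(record: dict) -> list:
    """Return the metadata layers that are absent or empty."""
    return [key for key in REQUIRED if not record.get(key)]


print(missing_layers(metadata))        # prints [] because every layer is filled in
print(json.dumps(metadata, indent=2))  # text that could be saved alongside the data
```

Keeping such a record as plain JSON text next to the data files makes it readable by both humans and software.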


File-naming

File-naming conventions help you find your data, help others to find your data and help track which version of a file is most current

A good file name distinguishes a file from files with similar subjects as well as different versions of the file

Avoid using special characters in a file name: \ / : * ? < > | [ ] & $ , .

Use underscores instead of periods or spaces to separate logical elements in a file name

Avoid very long names: usually 25 characters is sufficient length

Use descriptive names, indicative of the content

Names should include all necessary descriptive information independent of where it is stored

Include dates

Include a version number on files

Be consistent

Add a readme.txt to each folder in which the file naming and its meaning is explained

Source: File naming conventions
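The conventions on this slide can be turned into a small helper that builds a file name and a checker that flags violations. This is a minimal sketch under stated assumptions: the YYYYMMDD date format, the `_v01` version suffix and both function names are choices made for this example and are not prescribed by the slide.

```python
import re
from datetime import date

# Characters the slide advises against, plus spaces (periods are in the list already).
FORBIDDEN = set('\\/:*?<>|[]&$,. ')
MAX_LENGTH = 25  # guideline from the slide, applied here to the name without extension


def build_name(description: str, day: date, version: int, extension: str) -> str:
    """Join description, date and version number with underscores."""
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", description).strip("_")
    return f"{cleaned}_{day:%Y%m%d}_v{version:02d}.{extension}"


def problems(filename: str) -> list:
    """List the conventions a file name violates (empty list means none found)."""
    stem, _, _ext = filename.rpartition(".")
    stem = stem or filename  # file names without an extension
    found = []
    if any(ch in FORBIDDEN for ch in stem):
        found.append("special character, space or period in name")
    if len(stem) > MAX_LENGTH:
        found.append("name longer than 25 characters")
    if not re.search(r"\d{8}", stem):
        found.append("no date")
    if not re.search(r"_v\d+$", stem):
        found.append("no version number")
    return found


name = build_name("flow test", date(2015, 3, 6), 1, "csv")
print(name)                            # flow_test_20150306_v01.csv
print(problems(name))                  # []
print(problems("my data.final.xlsx"))  # three problems reported
```

Whatever scheme is chosen, the readme.txt in each folder is the place to document it, so that others can interpret the names.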


File organization

Source: Beatriz Ramirez, Data management plan for the PhD project: development and application of a monitoring system to assess the impacts of climate and land cover changes on eco-hydrological processes in an eastern Andes catchment area


RDM during your research: data labs

Dataverse Network: data lab for active research data where you may

store your data in an organized and safe way
clearly describe your data
apply version control to your data
arrange access to your data
get recognition for your data
[collaborate on your data]

Data lab surrogates: Google Drive, Dropbox, [ SURFdrive ], Beehub…

SURF Filesender [data transfer up to 100 GB]

Storage and backup of data through DANS [Data Archiving and Networked Services]

Data transfer: up to 2 GB per dataset

Dataverse 3TU.Datacentrum: up to 50 GB free


RDM during your research: Dataverse Network and Mendeley workshop

Workshop on Dataverse Network, by Leon Osinski

Workshop on Mendeley, by Rikie Deurenberg

We will contact you to ask if you’re interested!


RDM after your research: sharing data after your research #1

On request (informal, peer to peer sharing)

“Reinhart and Rogoff kindly provided us with the working spreadsheet from the RR analysis. With the working spreadsheet, we were able to approximate closely the published RR results. While using RR's working spreadsheet, we identified coding errors, selective exclusion of available data, and unconventional weighting of summary statistics.”

Herndon, T., Ash, M., Pollin, R. (2013), Does high public debt consistently stifle economic growth? : a critique of Reinhart and Rogoff

“I'd like to thank E.J. Masicampo and Daniel LaLande for sharing and allowing me to share their data…”

Daniël Lakens (2014), What p-hacking really looks like: A comment on Masicampo & LaLande (2012)

On a (personal) website

“Let me start by saying that the reason why I put all excel files online, including all the detailed excel formulas about data constructions and adjustments, is precisely because I want to promote an open and transparent debate about these important and sensitive measurement issues.”

Thomas Piketty, My response to the Financial Times, HuffPost The Blog, 29-05-2014; originally published as Addendum: Response to FT, 28-05-2014


RDM after your research: sharing data after your research #2

A data journal

Journal of open psychology data, Geoscience data journal, Data in brief, Scientific data, Frontiers data reports

A data archive or repository

Catalogues of research data repositories: Databib, Re3data.org

Zenodo, Figshare, DANS, Dryad, B2SHARE

3TU.Datacentrum
+ small and medium-sized data sets, long tail data
+ static data, ‘frozen’ data sets
+ preferably nonproprietary software formats suitable for long term preservation
+ DOIs [ persistent identifier for citability and retrievability ]
+ open access
+ long-term availability, Data Seal of Approval
+ Data Citation Index (Thomson Reuters)
+ self-upload (single data sets < 4 GB)
+ special collections of related data sets

Source: www.aukeherrema.nl
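Because a DOI is a persistent identifier, a data set can be cited with a stable link built from it. The following is a minimal sketch, assuming the common https://doi.org/ resolver prefix; the function name is made up for this example, and the DOI used is the one of Goodman et al. (2014).

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI, with or without a 'doi:' or resolver prefix, into a resolvable link."""
    doi = doi.strip()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


print(doi_to_url("doi:10.1371/journal.pcbi.1003542"))
# https://doi.org/10.1371/journal.pcbi.1003542
```

Storing the bare DOI and generating the link when needed keeps citations valid even if the landing page of the archive moves.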


RDM after your research: sharing the data of your PhD thesis

Attach your data to your publication

“What research data and waste have in common is that it’s worthwhile to reuse them.”

Lilliana Abarca-Guerrero (2014), A construction waste generation model for developing countries, PhD thesis TU/e, proposition 9

“Psychology journals should require, as a condition for publication, that data supporting the results in the paper are accessible in an appropriate public archive”

Daniël Lakens (2014), Psychology journals should make data sharing a requirement for publication


RDM: time consuming and laborious but also…

“Oh yes, there are certainly benefits from this. Doing this once means it will be easier in the future (increased efficiency), so one benefit is reduced future opportunity costs. Other benefits include personal satisfaction and the indirect benefits that come from archiving and publishing in OA journals – I can now list the datasets and code on NSF Biosketches as a “product” resulting from previous funding. As I say in the post, I also expect future publications to be much easier to produce because the data and code are well organized and annotated. I will be doing the same calculations for the next paper using these data/code and writing a follow-up post.” [ italics mine ]

Emilio M. Bruna


Support

Data Coach [ website ]

Data librarian

Leon Osinski, Merle Rodenburg

Recommended reading

Van den Eynden, Veerle et al. (2011), Managing and sharing data: best practice for researchers, UK Data Archive

Van den Eynden, Veerle et al. (2014), Managing and sharing research data: a guide to good practice, London: Sage [available via TU/e Library]

Recommended online course

Essentials 4 data support [English & Dutch]


Wrap up

Be prepared to share your data after your research because it’s required and because you benefit from it

Preparation = careful and responsible data management during your research

[You’ll receive an evaluation form after the course by e-mail. Don’t forget to fill it in.]

Source: Research Data Netherlands / Marina Noordegraaf


URLs of mentioned webpages, in order of appearance

1. Website IEC/Library [TU/e]: http://w3.tue.nl/en/services/library/

2. Data sharing increases visibility: http://dx.doi.org/10.7717/peerj.175

3. Data sharing enhances trustworthiness: http://dx.doi.org/10.1371/journal.pone.0026828

4. Data availability policy journals: http://www.nap.edu/openbook.php?record_id=10613&page=33

5. Data availability policy American Economic Review: https://www.aeaweb.org/aer/data.php

6. Data availability policy PLoS: http://www.plos.org/plos-data-policy-faq/

7. Data availability policy Nature: http://www.nature.com/authors/policies/availability.html

8. VSNU Code of Scientific Conduct (Dutch, revision 2014): http://www.vsnu.nl/files/documenten/Domeinen/Onderzoek/Code_wetenschapsbeoefening_2004_(2014).pdf

9. KNAW responsible research data management: https://www.knaw.nl/en/news/publications/responsible-research-data-management-and-the-prevention-of-scientific-misconduct?set_language=en

10. Research evaluators (Standard evaluation protocol 2015-2021): http://www.vsnu.nl/SEP

11. Radboud University research data policy: http://www.ru.nl/library/services-0/research/expert-centre/vm/policy-radboud/

12. TU/e Code of Scientific Conduct: http://www.tue.nl/en/university/about-the-university/integrity/scientific-integrity/

13. NWO and research data: http://www.nwo.nl/en/news-and-events/dossiers/datamanagement

14. ZonMW Toegang tot data: http://www.zonmw.nl/nl/programmas/programma-detail/toegang-tot-data-ttdata/algemeen/

15. Horizon 2020 Guidelines on data management: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf

16. Data management plan template (3TU.Datacentrum): http://datacentrum.3tu.nl/en/what-we-offer/data-management-plan/

17. Loss of data: http://www.cursor.tue.nl/en/news-article/artikel/doctorate-ends-in-drama-after-car-burglary-1/

18. Storage, back up of data: http://www.data-archive.ac.uk/create-manage/storage

19. Catalogue metadata: http://www.data-archive.ac.uk/create-manage/document/metadata

20. Study metadata: http://www.data-archive.ac.uk/create-manage/document/study-level

21. Data-level metadata: http://www.data-archive.ac.uk/create-manage/document/data-level

22. File naming: http://www.ncdcr.gov/portals/26/pdf/guidelines/filenaming.pdf

23. Organizing data: http://www.wageningenur.nl/en/Expertise-Services/Facilities/Library/Expertise/Write-cite/Research-data-1/Data-management-plans.htm [example 2]

24. Version control: http://www.data-archive.ac.uk/create-manage/format/versions

25. Using a relational database: http://geekgirls.com/category/office/databases/ , see also http://www.datacarpentry.org and http://dx.doi.org/10.1890/0012-9623-90.2.205

26. Kien Leong (2010), The seven deadly spreadsheet sins: http://production-scheduling.com/seven-deadly-spreadsheet-sins/

27. Dataverse Network: http://www.dataverse.nl

28. Google Drive: https://www.google.com/drive/

29. Dropbox: http://www.dropbox.com

30. SURFdrive: https://surfdrive.surf.nl

31. Beehub: https://beehub.nl/system/

32. Data on request (Reinhart-Rogoff paper): http://dx.doi.org/10.1257/aer.100.2.573

33. Data on request (blog post Daniel Lakens): http://daniellakens.blogspot.nl/2014/09/what-p-hacking-really-looks-like.html

34. Data on personal website (Thomas Piketty): http://piketty.pse.ens.fr/en/capital21c2

35. Data journal: Journal of Open Psychology Data: http://openpsychologydata.metajnl.com/

36. Data journal: Geoscience Data Journal: http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2049-6060

37. Data journal: Data in brief: http://www.journals.elsevier.com/data-in-brief

38. Data journal: Scientific data: http://www.nature.com/sdata/

39. Data journal: Frontiers data reports: http://www.frontiersin.org/news/Data_Reports_a_new_type_of_peer-reviewed_article_in_Frontiers_journals/1051?utm_source=FRN&utm_medium=ECOM&utm_campaign=TWT_FRN_1502_datareport

40. Research data catalogue: Databib: http://databib.org/

41. Research data catalogue: Re3data.org: http://service.re3data.org/search/results?term=

42. Publishing data: Zenodo: http://www.zenodo.org/

43. Publishing data: Figshare: http://www.figshare.com

44. Publishing data: DANS: http://www.dans.knaw.nl/en

45. Publishing data: Dryad: http://datadryad.org/

46. Publishing data: B2SHARE: https://b2share.eudat.eu/

47. Publishing data: 3TU.Datacentrum: http://data.3tu.nl/

48. Long tail research data: http://www.nature.com/neuro/journal/v17/n11/fig_tab/nn.3838_F1.html

49. Nonproprietary software formats: http://datacentrum.3tu.nl/fileadmin/editor_upload/File_formats/Digital_Preservation_Support_levels.pdf

50. Data Seal of Approval: http://www.datasealofapproval.org

51. Data Citation Index (Thomson Reuters): http://wokinfo.com/products_tools/multidisciplinary/dci/

52. Self upload 3TU.Datacentrum: https://data.3tu.nl/account/signin/?next=/upload/

53. Data set underlying PhD thesis Lilliana Abarca-Guerrero: http://dx.doi.org/10.4121/uuid:31d9e6b3-77e4-4a4c-835e-5c3b211edcfc

54. PhD thesis Lilliana Abarca-Guerrero: http://repository.tue.nl/770952

55. Blogpost Daniël Lakens: http://daniellakens.blogspot.nl/2014/12/psychology-journals-should-require-data.html

56. Emilio M. Bruna, The opportunity cost of my #OpenScience… : http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/

57. Data Coach: http://w3.tue.nl/en/services/library/about/services/datacoach/

58. Van den Eynden, V. et al. Managing and sharing data: best practice for researchers: http://www.data-archive.ac.uk/media/2894/managingsharing.pdf

59. Essentials 4 data support: http://datasupport.researchdata.nl/
