Guest lecture: Library and data?
www.slideshare.net/hugobesemer (use on WURNET Chrome, Firefox)
20160920, Hugo Besemer
Two different things
●An example of data modelling challenges for the library, or if you wish: data is dirty ...
●Data management planning at Wageningen University
Data is dirty
The problem
I am on the tenure track, and the university wants me to publish in “Q1” journals
My research is funded by NWO/EU/..., and they want me to publish in “Open access” journals
Journals catalogue
●issn
●title
●topics
Open_access
●issn
●Open access status (boolean)
Quartiles
●issn
●quartile
SELECT j.title, j.issn
FROM Journals AS j
INNER JOIN Open_access AS o ON o.issn = j.issn
INNER JOIN Quartiles AS q ON q.issn = j.issn
WHERE j.topics = 'mine' AND o.status = 'yes' AND q.quartile = 'Q1'
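The three-table model and the query can be tried out with a small in-memory database. This is a minimal sketch, assuming the tables are joined on issn as in the diagram; the function names, the integer encoding of the boolean status, the ISSN-like keys and the sample rows are all invented for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Journals (issn TEXT PRIMARY KEY, title TEXT, topics TEXT);
        CREATE TABLE Open_access (issn TEXT PRIMARY KEY, status INTEGER);
        CREATE TABLE Quartiles (issn TEXT PRIMARY KEY, quartile TEXT);

        -- Placeholder keys, not real ISSNs.
        INSERT INTO Journals VALUES
            ('0000-0001', 'Journal A', 'mine'),
            ('0000-0002', 'Journal B', 'mine'),
            ('0000-0003', 'Journal C', 'mine'),
            ('0000-0004', 'Journal D', 'other');
        INSERT INTO Open_access VALUES
            ('0000-0001', 1), ('0000-0002', 0),
            ('0000-0003', 1), ('0000-0004', 1);
        INSERT INTO Quartiles VALUES
            ('0000-0001', 'Q1'), ('0000-0002', 'Q1'),
            ('0000-0003', 'Q2'), ('0000-0004', 'Q1');
        """
    )
    return conn


def find_journals(conn, topic):
    """Return (title, issn) for open access Q1 journals on one topic."""
    query = """
        SELECT j.title, j.issn
        FROM Journals AS j
        INNER JOIN Open_access AS o ON o.issn = j.issn
        INNER JOIN Quartiles AS q ON q.issn = j.issn
        WHERE j.topics = ? AND o.status = 1 AND q.quartile = 'Q1'
        ORDER BY j.title
    """
    return conn.execute(query, (topic,)).fetchall()


if __name__ == "__main__":
    print(find_journals(build_demo_db(), "mine"))
```

The query only works this cleanly because every table uses the same issn value for the same journal and each attribute has a single value per journal.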
Let’s look in Nottingham for online status
But we can also go to Lund
Confusion from Amsterdam
Things change all the time
So we have learned....
ISSN (primary key) is ambiguous
●so you need to harmonize data
Open access status is ambiguous
●Gold, Green or Hybrid
●Discussion: which one do we take?
There are several sources for online status
●Discussion: which one do we take?
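Harmonizing ISSNs can start with normalizing how they are written and checking the ISSN check digit. The sketch below is one possible approach, not the library's actual procedure. One common cause of ambiguity is that a journal can carry more than one ISSN, for example for its print and online editions, so the sketch also maps variants to a single key. The function names and the `aliases` table are assumptions made for illustration.

```python
import re


def issn_check_digit(first7):
    """Compute the ISSN check character from the first seven digits."""
    # Weights run from 8 down to 2 over the seven digits.
    total = sum(int(d) * w for d, w in zip(first7, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def normalize_issn(raw):
    """Return the ISSN as 'NNNN-NNNC', or None if it is not valid."""
    compact = re.sub(r"[^0-9X]", "", raw.upper())
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    if issn_check_digit(compact[:7]) != compact[7]:
        return None
    return compact[:4] + "-" + compact[4:]


def harmonize(raw, aliases):
    """Map any known variant ISSN to the single key used in the catalogue.

    `aliases` is a hypothetical lookup table: variant ISSN -> preferred ISSN.
    """
    issn = normalize_issn(raw)
    if issn is None:
        return None
    return aliases.get(issn, issn)


if __name__ == "__main__":
    # Two ISSNs treated here as print and online editions of one journal.
    aliases = {"1476-4687": "0028-0836"}
    for raw in ["0028 0836", "14764687", "issn 0028-0836", "0028-0837"]:
        print(repr(raw), "->", harmonize(raw, aliases))
```

Rows that come back as `None` are the ones that need manual attention before the tables can be joined.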
Journals catalogue
●issn
●title
●topics
Quartiles
●issn
●quartile
Romeo Sherpa (colours)
●issn
●Romeo Sherpa (colours)
DOAJ (Romeo gold)
●issn
●APC
Hybrid publisher
●issn
●APC
Now for the quartiles
[Figure: quartiles Q1, Q2, Q3, Q4]
How do we compare numbers?
Scientist Z. Math has a publication from 2003 with 17 citations
Scientist M. Biology has a publication from 2009 with 24 citations
[Figures: Baselines for Mathematics; Baselines for Molecular Biology. x-axis: years after publication (0 to 12); y-axis: cumulative no. citations (0 to 400); curves: baseline, top 10%, top 1%]
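One way to compare the two publications is to divide each citation count by the baseline for its field and publication year, so that a value above 1 means "cited more than expected". The sketch below illustrates that arithmetic only. The baseline values are hypothetical placeholders, not numbers read from the charts, and the function and variable names are invented for the example.

```python
# HYPOTHETICAL baselines: expected cumulative citations for a paper
# in a given field and publication year. Replace with real values.
HYPOTHETICAL_BASELINES = {
    ("Mathematics", 2003): 8.0,
    ("Molecular Biology", 2009): 30.0,
}


def relative_citation_score(citations, field, year, baselines):
    """Return citations divided by the baseline for that field and year."""
    baseline = baselines[(field, year)]
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return citations / baseline


if __name__ == "__main__":
    papers = [
        ("Z. Math", "Mathematics", 2003, 17),
        ("M. Biology", "Molecular Biology", 2009, 24),
    ]
    for author, field, year, citations in papers:
        score = relative_citation_score(
            citations, field, year, HYPOTHETICAL_BASELINES
        )
        print(f"{author}: {citations} citations, relative score {score:.2f}")
```

With these placeholder baselines the paper with fewer raw citations gets the higher relative score, which is the point of normalizing by field.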
What does that mean for our E-R diagram?
Quartile distribution depends on topic
Journals catalogue
●issn
●title
●topics
Quartiles
●issn
●topics
●quartile
Romeo Sherpa (colours)
●issn
●Romeo Sherpa (colours)
DOAJ (Romeo gold)
●issn
●APC
Hybrid publisher
●issn
●APC
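If the quartile depends on the topic, one way to model it is to key the Quartiles table on the pair (issn, topic) instead of on issn alone. The following is a sketch of that idea under assumed table and column names; the journal and its quartile values are invented.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Journals (issn TEXT PRIMARY KEY, title TEXT);
CREATE TABLE Quartiles (
    issn TEXT REFERENCES Journals(issn),
    topic TEXT,
    quartile TEXT,
    PRIMARY KEY (issn, topic)  -- one quartile per journal per topic
);
"""


def build_demo_db():
    """Create an in-memory database with one invented journal."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Journals VALUES ('0000-0001', 'Journal A')")
    conn.executemany(
        "INSERT INTO Quartiles VALUES (?, ?, ?)",
        [
            ("0000-0001", "Mathematics", "Q1"),
            ("0000-0001", "Molecular Biology", "Q2"),
        ],
    )
    return conn


def quartile_for(conn, issn, topic):
    """Return the quartile of a journal within one topic, or None."""
    row = conn.execute(
        "SELECT quartile FROM Quartiles WHERE issn = ? AND topic = ?",
        (issn, topic),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_demo_db()
    print(quartile_for(conn, "0000-0001", "Mathematics"))
    print(quartile_for(conn, "0000-0001", "Molecular Biology"))
```

A query for "Q1 journals" then has to state the topic as well, because the same journal can sit in different quartiles for different topics.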
Data management planning at Wageningen University
Wageningen UR policy – What’s in place
●Data management plan for PhD projects
●Data management plans for research groups
●Data management planning course
●Options for data publishing
●Code Repository
●“Support hub”
Wageningen UR data policy – What needs to be resolved
●Registration and accessibility of data for ongoing research
●Storage (security, “getting rid of external hard drives”)
●Research notes
●Legal issues?
Day-to-day issues (from a workshop for PE&RC)
●We are human
●Synchronizing between different platforms
●Relationships between files
●What is a logical file / folder structure?
●Collaborating on files
Some terminology: retention
Retention: the obligation to produce, upon request, the data underlying publications for a certain time
●For verification purposes or as a basis for further work
●Often required by scientific organizations or publishers
●The “Netherlands Code of Conduct for Academic Practice” requires 10 years
●The rule is seldom enforced
More terminology: ‘long term storage’
‘Long term storage’ is the term used in the DMP format
‘Long term’ means:
●With sufficient documentation on project, file and parameter / variable level
●In a format that is usable in the future (so preferably “flat files”)
More terminology: ‘publishing data’
We prefer “Data Publishing” as it implies making the data persistently accessible
●That’s only possible in a service with a long-term mission
●It should come with a persistent identifier, independent of its current or future location
Persistent identifiers
http://hdl.handle.net/1902.1/UOVMCPSWOL
http://dx.doi.org/10.1594/PANGAEA.701380
●Scheme / Resolver: http://hdl.handle.net/, http://dx.doi.org/
●Prefix (identifying institution): 1902.1, 10.1594
●Suffix (identifying this dataset): UOVMCPSWOL, PANGAEA.701380
To get a persistent identifier for your dataset you need to store it with a service, and the resolver will redirect users there
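The three parts of a persistent identifier can be pulled apart with the standard library. The sketch below assumes the identifier is written as a resolver URL followed by "prefix/suffix", as in the two examples on this slide; the function name and the returned dictionary keys are invented for illustration.

```python
from urllib.parse import urlparse


def split_persistent_identifier(url):
    """Split a resolver URL into resolver, prefix and suffix."""
    parsed = urlparse(url)
    # The path looks like "/<prefix>/<suffix>"; split on the first slash only,
    # because a suffix may itself contain slashes.
    prefix, _, suffix = parsed.path.lstrip("/").partition("/")
    if not parsed.netloc or not prefix or not suffix:
        raise ValueError(f"not a resolver URL with prefix/suffix: {url!r}")
    return {
        "resolver": f"{parsed.scheme}://{parsed.netloc}/",
        "prefix": prefix,
        "suffix": suffix,
    }


if __name__ == "__main__":
    for pid in [
        "http://hdl.handle.net/1902.1/UOVMCPSWOL",
        "http://dx.doi.org/10.1594/PANGAEA.701380",
    ]:
        print(split_persistent_identifier(pid))
```

Only the resolver part depends on where the identifier is looked up; the prefix and suffix stay the same if the dataset moves.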
An example
An example (continued)
An example (continued 2)
Publish all data?
Services (1)
Disciplinary services with a specific data model
●EBI, NCBI (bioinformatics), for example SRA
●Pangaea (spatial)
●GBIF (biodiversity)
Generic (multidisciplinary) services
Services (2)
DANS
●URL: http://www.dans.knaw.nl/en/
●Single file size: unknown
●Total disk space: n.a.
●Paid: € 2.85 per GB (WUR covers first 500 GB)
●Private/public: Public (part of Royal Dutch Academy for Sciences – KNAW)
●Special relationships: Wageningen UR Library acts as front office
3TU Datacentrum
●URL: http://datacentrum.3tu.nl/en/home/
●Single file size: -
●Total disk space: n.a.
●Paid: € 3.50 per GB (WUR covers first 500 GB)
●Private/public: Public, owned by Dutch Technical Universities
●Special relationships: Wageningen UR Library acts as front office
Dryad
●URL: http://datadryad.org/
●Single file size: 2GB
●Total disk space: Extra charge for larger sets
●Paid: $120 (> 20 GB extra charge)
●Private/public: Not-for-profit company governed by members
●Special relationships: Reduced fee or free for certain journals, see http://datadryad.org/pages/journalLookup
Figshare
●URL: https://figshare.com/
●Single file size: 5GB
●Total disk space: 20 GB
●Paid: N
●Private/public: Private, Macmillan inc.
●Special relationships: Embedded in PLOS article submission workflow
Zenodo
●URL: http://www.zenodo.org/
●Single file size: 2GB
●Total disk space: “Please be aware that we cannot offer infinite space for free, so donations from heavy users towards sustainability are encouraged”
●Paid: N
●Private/public: Public, CERN
●Special relationships: EU (output of the Openaire plus project and used for data in the EU data management pilot)
That’s all