Open data: Benefits for the researcher, benefits for society
Kevin Ashley, Digital Curation Centre
www.dcc.ac.uk @kevingashley
Reusable with attribution: CC-BY
The DCC is supported by Jisc
A summary
• Why data reuse?
• What stops us?
• Related issues – software & methods
• The case for reuse – again
2015-05-28 Kevin Ashley –ORD2015 - CC-BY
An alternative summary
Being Selfish
… and still benefiting others
Being Just Good Enough
What’s possible now
Thanks to:
Neil Chue Hong (@npch), Software Sustainability Institute, ORCID: 0000-0002-8876-7606
David Flanders (@dfflanders) and Dr Steven Manos (DrStevenManos), University of Melbourne
Cameron Neylon (@CameronNeylon)
All my colleagues at the DCC
My home – the DCC
• Mission – to increase capability and capacity for research data services in UK institutions
• Not just a UK problem – an international one
• Training, shared services, guidance, policy, standards, futures
DATA REUSE HAPPENS – AND NOT ALWAYS IN THE WAY YOU EXPECT
The Old Weather project
Data for research, not from research
Data reuse stories
• The palaeontologist who saved years of work with archaeological data
• The 19th-century ships’ logs that help us model climate change
• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull
Should all data be open?
• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication
Some conundrums
• Releasing genome data is OK when it’s:
– An identified human subject
– An anonymous human subject
– Your pet dog
– Another mammal
– An insect
– A plant
– A virus
Data reuse - messages
Often your data tells stories that your publications do not
Not all data comes from other researchers
One person’s noise is another person’s signal
Discipline-bounded data discovery doesn’t give us all we need or want
Why care?
• Data is expensive – an investment
• Reuse:
– More research
– Teaching & Learning
– Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements
Why does this matter?
• Research quality
– How close can we get to the truth?
• Research speed
– How quickly can we get to the truth?
• Research finance
– How much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society
G8 UK – endorses OA
Open Data Charter
Policy Paper, 18 June 2013
164 universities in UK*
*2011 HESA data
71 (43%) > 5% research income
115 (70%) > £1m income from research
£4.4 billion total research grants
≈ PLN 26.6 billion
Business case for UK investment in data reuse
• National infrastructure costs £1.5m/year
• 5 years before data reuse is fully active
• 10,000 datasets per year captured
• 1 in 100 datasets reused each year
• £30,000 saved each time data is reused
• Saving: £3m/year – twice the running cost
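The arithmetic behind the saving figure can be checked with a short sketch. The numbers come from the slide; the function and variable names are illustrative, and the calculation assumes the steady state the slide describes (reuse fully active, 1 in 100 of a year's 10,000 datasets reused). It does not model the five-year ramp-up or any accumulation of older datasets.

```python
# Figures are taken from the slide; all names here are illustrative assumptions.
RUNNING_COST_GBP = 1_500_000    # national infrastructure cost per year
DATASETS_PER_YEAR = 10_000      # datasets captured each year
REUSE_ONE_IN = 100              # 1 in 100 datasets reused each year
SAVING_PER_REUSE_GBP = 30_000   # saved each time data is reused


def annual_saving(datasets: int, one_in: int, saving_per_reuse: int) -> int:
    """Yearly saving in GBP once reuse is fully active (steady state only)."""
    reuses_per_year = datasets // one_in
    return reuses_per_year * saving_per_reuse


saving = annual_saving(DATASETS_PER_YEAR, REUSE_ONE_IN, SAVING_PER_REUSE_GBP)
ratio = saving / RUNNING_COST_GBP

print(f"Saving: £{saving:,}/year, {ratio:.1f}x the running cost")
# prints: Saving: £3,000,000/year, 2.0x the running cost
```

With these inputs, 100 reuses a year at £30,000 each give £3m, which is twice the £1.5m running cost, matching the slide.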
95% of research results are never published
http://www.flickr.com/photos/sethw/113073189/
If a million postdocs repeat a million experiments…
http://flickr.com/photos/heymans/480396810/
And 25% of those don’t work…
http://flickr.com/photos/cliche/120070310/
…how much taxpayers’ money is that?
http://flickr.com/photos/luismimunoznajar/2093185804/
Slides: Cameron Neylon
More benefits: patient safety
… and institutional reputation
BUT WHAT ABOUT ME BEING SELFISH?
Funders are making demands
Findable, citable data has value
• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication, if data is:
– Archived
– Citable
– Discoverable
• All benefit – researcher; institution; publisher
Citability
• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240%, social sciences *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #
* Amy Pienta, George Alter, Jared Lyle (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307
† Piwowar H, Vision TJ (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1
# Edwin Henneken, Alberto Accomazzi (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618
Traditional skills can win
• Google Flu gets it wrong:
– Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). The Parable of Google Flu: Traps in Big Data Analysis. Science, 343, Forthcoming.
• The data tells us why:
– Lazer, David; Kennedy, Ryan; King, Gary; Vespignani, Alessandro, 2014, "Replication data for: The Parable of Google Flu: Traps in Big Data Analysis", http://dx.doi.org/10.7910/DVN/24823 UNF:5:BJh9WzZQNEeSEpV3EWs+xg== IQSS Dataverse Network [Distributor] V1 [Version]
• Personalisation; suggested searches; other UI changes
41 datasets – none bigger than 1 Mbyte
Data made available before paper was published – result was immediate impact
What stops data reuse
• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential
Excuses – and responses
• “People will ask questions”
– So use a data centre or repository
• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction
• “It’s not interesting”
– Let others be the judge – your noise is my signal
• “I might get another paper out of it”
– Up to a point. We might get more research out of it
• “I don’t have permission”
– A real problem. But solvable at senior level
• “It’s too bad/complicated” – see above
• “It’s not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well
See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/
Why open software?
• Quicker start
• Better flexibility
• Improved robustness
• Increases collaborators
• Greater research impact
• Easier to work with industry
• No added cost
– Caveat: over what you should already be doing
What’s software got to do with my research?
Slide: Neil Chue Hong
The research community relies on software
Do you use research software?
What would happen to your research without software?
56% develop their own software
71% have no formal software training
Survey of researchers from 15 UK Russell Group universities conducted by SSI between August and October 2014. DOI: 10.5281/zenodo.14809
Slide: Neil Chue Hong
The modern researcher…
• … worries about:
– Data management and analysis
– Reproducible research
– Scalable simulations
– Integration of models and workflows
– Collaboration
Picture of Otto Stern from Emilio Segre Visual Archives. Copyright American Institute of Physics. Used with permission
Slide: Neil Chue Hong
Open software is good for science and good for you
• Benefits
– More collaborators
– More citations
– More benefit to others
– Increased robustness
– Increased reuse
– Reduced replication of effort
• Far more than the drawbacks
– More structured collaboration
Slide: Neil Chue Hong
Improve your research impact
Vandewalle (2012) DOI: 10.1109/MCSE.2012.63
Slide: Neil Chue Hong
Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/
Special Issue: Reproducible Research, Computing in Science and Engineering, July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production", CSCW 2013.
Software engineering (vs) software/data carpentry
Software carpenters craft their research atop the digital infrastructure to produce novel science.
Software engineers maintain, own and operate digital infrastructure.
Teaching researchers to code
Community exemplar: #SWCarpentry
Publishing data & software papers is easy
http://openresearchsoftware.metajnl.com
http://bit.ly/softwarejournals
http://dx.doi.org/10.6084/m9.figshare.942289
Slide: Neil Chue Hong
THERE’S HELP FOR DATA SHARING AS WELL
Roles and Responsibilities
What data to keep
How to cite data
What data to keep
Acquire research data skills
Finally…
• Sharing data is good for you
• It’s good for all of us
• It isn’t as hard as you think – start today!
It’s amazing what people will share…
Data reuse from Hubble