Open data: Benefits for the researcher, Benefits for Society Kevin Ashley Digital Curation Centre www.dcc.ac.uk @kevingashley [email protected] Reusable with attribution: CC-BY The DCC is supported by Jisc

Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society


Page 1: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Open data: Benefits for the researcher, Benefits for Society

Kevin Ashley
Digital Curation Centre
www.dcc.ac.uk
@kevingashley
[email protected]

Reusable with attribution: CC-BY
The DCC is supported by Jisc

Page 2: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

A summary

• Why data reuse?
• What stops us?
• Related issues – software & methods
• The case for reuse - again

2015-05-28 Kevin Ashley –ORD2015 - CC-BY

Page 3: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

An alternative summary

Being Selfish

What’s possible now

… and still benefiting others

Being Just Good Enough

Thanks to:
Neil Chue Hong (@npch), Software Sustainability Institute, ORCID: 0000-0002-8876-7606
David Flanders (@dfflanders), Dr Steven Manos (DrStevenManos), University of Melbourne
All my colleagues at the DCC
Cameron Neylon (@CameronNeylon)

Page 4: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

My home – the DCC

• Mission – to increase capability and capacity for research data services in UK institutions

• Not just a UK problem – an international one

• Training, shared services, guidance, policy, standards, futures

Page 5: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

DATA REUSE HAPPENS – AND NOT ALWAYS IN THE WAY YOU EXPECT

Page 6: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

The Old Weather project

Data for research, not from research

Page 7: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships’ logs that help us model climate change

• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull

Page 8: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Should all data be open?

• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication

Page 9: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Some conundrums

• Releasing genome data is OK when it’s:
  – An identified human subject
  – An anonymous human subject
  – Your pet dog
  – Another mammal
  – An insect
  – A plant
  – A virus

Page 10: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Data reuse - messages

Often your data tells stories that your publications do not

Not all data comes from other researchers

One person’s noise is another person’s signal

Discipline-bounded data discovery doesn’t give us all we need or want

Page 11: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Why care?

• Data is expensive – an investment
• Reuse:
  – More research
  – Teaching & Learning
  – Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements

Page 12: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Why does this matter?

• Research quality
  – How close can we get to the truth?
• Research speed
  – How quickly can we get to the truth?
• Research finance
  – How much does the truth cost?

• Improving one or more of these is of interest to all actors:
• Researchers as data creators
• Researchers as data reusers
• Research institutions
• Funders – hence government and society

Page 13: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

G8 UK - Endorses OA
Open Data Charter
Policy Paper
18 June 2013

Page 14: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

164 universities in UK*

71 (43%) > 5% research income

115 (70%) > £1m income from research

*2011 HESA data
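
The two percentages on this slide follow from the counts against the 164-university total. A minimal Python check, assuming simple rounding to the nearest whole percent; the function name `share_percent` is illustrative and not part of the original slides:

```python
# Counts are taken from the slide (2011 HESA data); all names are illustrative.
TOTAL_UNIVERSITIES = 164


def share_percent(count: int, total: int = TOTAL_UNIVERSITIES) -> int:
    """Return count as a percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)


# 71 universities exceed the 5% research-income threshold on the slide.
above_5_percent = share_percent(71)
# 115 universities exceed the £1m research-income threshold on the slide.
above_1m = share_percent(115)

print(f"71 of {TOTAL_UNIVERSITIES} = {above_5_percent}%")
print(f"115 of {TOTAL_UNIVERSITIES} = {above_1m}%")
```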

Page 15: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

£4.4 billion total research grants

=~ PLN 26.6 billion

Page 16: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Business case for UK investment in data reuse

• National infrastructure costs £1.5m/year
• 5 years before data reuse is fully active
• 10,000 datasets per year captured
• 1 in 100 datasets reused each year
• £30,000 saved each time data is reused
• Saving: £3m/year – twice the running cost
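
The arithmetic behind the £3m/year figure can be sketched in a few lines of Python. This is a simplified steady-state reading of the slide: it assumes the 1-in-100 reuse rate applies to one year's 10,000 captured datasets and ignores the 5-year ramp-up. The function and constant names are illustrative, not from the original talk.

```python
# All figures come from the slide; the names and the steady-state
# simplification (no 5-year ramp-up) are assumptions for illustration.
INFRASTRUCTURE_COST_GBP = 1_500_000  # national infrastructure, per year


def annual_reuse_saving(datasets_per_year: int,
                        reuse_one_in: int,
                        saving_per_reuse_gbp: int) -> int:
    """Yearly saving in GBP once data reuse is fully active."""
    reused_per_year = datasets_per_year // reuse_one_in
    return reused_per_year * saving_per_reuse_gbp


saving = annual_reuse_saving(
    datasets_per_year=10_000,
    reuse_one_in=100,
    saving_per_reuse_gbp=30_000,
)
ratio = saving / INFRASTRUCTURE_COST_GBP

print(f"Saving: £{saving:,}/year, {ratio:.1f}x the running cost")
```

Changing `reuse_one_in` or `saving_per_reuse_gbp` shows how sensitive the case is to those two estimates: under this model the saving only covers the running cost if at least 50 datasets a year are reused at £30,000 each.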

Page 17: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

95% of research results are never published

http://www.flickr.com/photos/sethw/113073189/

Slide: Cameron Neylon

Page 18: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

If a million postdocs repeat a million experiments…

http://flickr.com/photos/heymans/480396810/

Slide: Cameron Neylon

Page 19: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

And 25% of those don’t work…

http://flickr.com/photos/cliche/120070310/

Slide: Cameron Neylon

Page 20: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

…how much taxpayers’ money is that?

http://flickr.com/photos/luismimunoznajar/2093185804/

Slide: Cameron Neylon

Page 21: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

More benefits: patient safety

Page 22: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

… and institutional reputation

Page 23: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

BUT WHAT ABOUT ME BEING SELFISH?

Page 24: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Funders are making demands

Page 25: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Findable, citable data has value

• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication, if data is:
  – Archived
  – Citable
  – Discoverable
• All benefit – researcher; institution; publisher

Page 26: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Citability

• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
  – Alter, Pienta, Lyle – 240%, social sciences *
  – Piwowar, Vision – 9% (microarray data) †
  – Henneken, Accomazzi – 20% (astronomy) #

† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1 http://dx.doi.org/10.7287/peerj.preprints.1v1

* Amy Pienta, George Alter, Jared Lyle, (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

# Edwin Henneken, Alberto Accomazzi, (2011) Linking to Data - Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618

Page 27: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Traditional skills can win

• Google Flu gets it wrong:
  • Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). The Parable of Google Flu: Traps in Big Data Analysis. Science, 343, Forthcoming.
• The data tells us why:
  • Lazer, David; Kennedy, Ryan; King, Gary; Vespignani, Alessandro, 2014, "Replication data for: The Parable of Google Flu: Traps in Big Data Analysis", http://dx.doi.org/10.7910/DVN/24823 UNF:5:BJh9WzZQNEeSEpV3EWs+xg== IQSS Dataverse Network [Distributor] V1 [Version]
• Personalisation; suggested searches; other UI changes

41 datasets – none bigger than 1 Mbyte

Data made available before paper was published – result was immediate impact

Page 28: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

What stops data reuse

• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential

Page 29: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Excuses – and responses

• “People will ask questions”
  – So use a data centre or repository
• “It will be misinterpreted”
  – Stuff happens. Also, openness encourages correction
• “It’s not interesting”
  – Let others be the judge – your noise is my signal
• “I might get another paper out of it”
  – Up to a point. We might get more research out of it
• “I don’t have permission”
  – A real problem. But solvable at senior level
• “It’s too bad/complicated” – see above
• “It’s not a priority”
  – Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well

See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/

Page 30: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Why open software?

• Quicker start
• Better flexibility
• Improved robustness
• Increases collaborators
• Greater research impact
• Easier to work with industry
• No added cost
  – Caveat: over what you should already be doing

Page 31: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

What’s software got to do with my research?

Slide: Neil Chue Hong

Page 32: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

The research community relies on software

Do you use research software?

What would happen to your research without software?

56% Develop their own software

71% Have no formal software training

Survey of researchers from 15 UK Russell Group universities conducted by SSI between August - October 2014. DOI: 10.5281/zenodo.14809

Slide: Neil Chue Hong

Page 33: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

The modern researcher…

• … worries about:
  – Data management and analysis
  – Reproducible research
  – Scalable simulations
  – Integration of models and workflows
  – Collaboration

Picture of Otto Stern from Emilio Segre Visual Archives. Copyright American Institute of Physics. Used with permission

Slide: Neil Chue Hong

Page 34: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Open software is good for science and good for you

• Benefits
  – More collaborators
  – More citations
  – More benefit to others
  – Increased robustness
  – Increased reuse
  – Reduced replication of effort
• Far more than the drawbacks
  – More structured collaboration

Slide: Neil Chue Hong

Page 35: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Improve your research impact

Vandewalle (2012) DOI: 10.1109/MCSE.2012.63

Slide: Neil Chue Hong

Page 36: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

, it’

Victoria Stodden, AMP 2011 http://www.stodden.net/AMP2011/
Special Issue Reproducible Research, Computing in Science and Engineering, July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production" CSCW 2013.

Page 37: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

software engineering (vs) software/data carpentry

Software carpenters craft their research atop the digital infrastructure to produce novel science.

Software engineers maintain, own and operate digital infrastructure.

Teaching researchers to code

Community exemplar: #SWCarpentry

Page 38: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Publishing data & software papers is easy

http://openresearchsoftware.metajnl.com
http://bit.ly/softwarejournals
http://dx.doi.org/10.6084/m9.figshare.942289

Slide: Neil Chue Hong

Page 39: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Page 40: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

THERE’S HELP FOR DATA SHARING AS WELL

Page 41: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Page 42: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Roles and Responsibilities

What data to keep

Page 43: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

How to cite data

What data to keep

Page 44: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Acquire research data skills

Page 45: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Finally…

• Sharing data is good for you
• It’s good for all of us
• It isn’t as hard as you think – start today!

Page 46: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

It’s amazing what people will share…

Page 47: Kevin Ashley_Sharing research data: benefits for the researcher, benefits for society

Data reuse from Hubble