Kevin Ashley, Sharing research data: benefits for the researcher, benefits for society

Open data: Benefits for the researcher, Benefits for Society

Kevin Ashley
Digital Curation Centre

www.dcc.ac.uk
@kevingashley

Kevin.ashley@ed.ac.uk

Reusable with attribution: CC-BY
The DCC is supported by Jisc

A summary

• Why data reuse?
• What stops us?
• Related issues – software & methods
• The case for reuse – again

2015-05-28 Kevin Ashley –ORD2015 - CC-BY

An alternative summary

Being Selfish

What’s possible now

… and still benefiting others

Being Just Good Enough

Thanks to:
Neil Chue Hong (@npch), Software Sustainability Institute, ORCID: 0000-0002-8876-7606
David Flanders (@dfflanders), Dr Steven Manos (@DrStevenManos), University of Melbourne
All my colleagues at the DCC
Cameron Neylon (@CameronNeylon)

My home – the DCC

• Mission – to increase capability and capacity for research data services in UK institutions

• Not just a UK problem – an international one

• Training, shared services, guidance, policy, standards, futures

DATA REUSE HAPPENS – AND NOT ALWAYS IN THE WAY YOU EXPECT

The Old Weather project

Data for research, not from research

Data reuse stories

• The palaeontologist who saved years of work with archaeological data

• The 19th-century ships' logs that help us model climate change

• The ‘noise’ from research radar that mapped dust from Eyjafjallajökull

Should all data be open?

• NO
• Many reasons – most to do with human subjects
• But data existence should always be open
• Allows discovery & negotiation on use
• Avoids pointless replication

Some conundrums

• Releasing genome data is OK when it's:
– An identified human subject
– An anonymous human subject
– Your pet dog
– Another mammal
– An insect
– A plant
– A virus

Data reuse - messages

Often your data tells stories that your publications do not

Not all data comes from other researchers

One person’s noise is another person’s signal

Discipline-bounded data discovery doesn't give us all we need or want

Why care?

• Data is expensive – an investment
• Reuse:
– More research
– Teaching & Learning
– Planning
• Impact – with or without publication
• Accountability
• Legal & regulatory requirements

Why does this matter?

• Research quality
– How close can we get to the truth?
• Research speed
– How quickly can we get to the truth?
• Research finance
– How much does the truth cost?
• Improving one or more of these is of interest to all actors:
– Researchers as data creators
– Researchers as data reusers
– Research institutions
– Funders – hence government and society

G8
UK – Endorses OA
Open Data Charter – Policy Paper, 18 June 2013

164 universities in UK*

*2011 HESA data

71 (43%) with > 5% of income from research

115 (70%) with > £1m income from research
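The percentages on this slide are shares of the 164 universities. A quick Python check of the rounding, using only the counts given on the slide (the variable names are illustrative):

```python
TOTAL_UNIVERSITIES = 164  # UK universities, 2011 HESA data (from the slide)

# Counts from the slide, keyed by a short label for each group.
counts = {
    "> 5% of income from research": 71,
    "> £1m income from research": 115,
}

# Share of all universities, rounded to a whole percentage.
shares = {label: round(100 * n / TOTAL_UNIVERSITIES) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{counts[label]} ({pct}%) with {label}")
```

This prints 43% and 70%, matching the figures shown.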

£4.4 billion total research grants

≈ PLN 26.6 billion

Business case for UK investment in data reuse

• National infrastructure costs £1.5m/year
• 5 years before data reuse is fully active
• 10,000 datasets per year captured
• 1 in 100 datasets reused each year
• £30,000 saved each time data is reused
• Saving: £3m/year – twice the running cost
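The £3m figure follows directly from the slide's assumptions: 10,000 datasets × 1/100 reused × £30,000 per reuse. A minimal Python sketch of that steady-state arithmetic follows. The helper name `annual_reuse_saving` is hypothetical, and the sketch assumes reuse is already fully active, so it does not model the five-year ramp-up.

```python
def annual_reuse_saving(datasets_per_year: int,
                        reuse_one_in: int,
                        saving_per_reuse_gbp: float) -> float:
    """Yearly saving once reuse is fully active (hypothetical helper).

    Assumes a fixed fraction (1 in `reuse_one_in`) of the datasets
    captured each year is reused, each reuse saving the same amount.
    """
    reuses_per_year = datasets_per_year / reuse_one_in
    return reuses_per_year * saving_per_reuse_gbp


# Figures taken from the business-case slide.
RUNNING_COST_GBP = 1_500_000  # national infrastructure, per year

saving = annual_reuse_saving(
    datasets_per_year=10_000,     # datasets captured per year
    reuse_one_in=100,             # 1 in 100 datasets reused each year
    saving_per_reuse_gbp=30_000,  # saved each time data is reused
)

print(f"Saving: £{saving:,.0f}/year")                           # Saving: £3,000,000/year
print(f"Saving / running cost: {saving / RUNNING_COST_GBP:.1f}")  # Saving / running cost: 2.0
```

Varying one assumption shows how much margin the case has: `annual_reuse_saving(10_000, 200, 30_000)` gives £1.5m, so the infrastructure would only break even if reuse fell to 1 in 200 datasets per year.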

http://www.flickr.com/photos/sethw/113073189/

95% of research results are never published

Slide: Cameron Neylon

http://flickr.com/photos/heymans/480396810/

If a million postdocs repeat a million experiments…

Slide: Cameron Neylon

http://flickr.com/photos/cliche/120070310/

And 25% of those don’t work…

Slide: Cameron Neylon

…how much taxpayers' money is that?

http://flickr.com/photos/luismimunoznajar/2093185804/
Slide: Cameron Neylon

More benefits: patient safety

… and institutional reputation

BUT WHAT ABOUT ME BEING SELFISH?

Funders are making demands

Findable, citable data has value

• Important to link publications to data (and vice versa)
• Increases citations – of data & publication
• Increases reuse (hence value)
• But effects exist even without publication, if data is:
– Archived
– Citable
– Discoverable
• All benefit – researcher; institution; publisher

Citability

• Making data available increases citations
• Everyone – academic, funder, institution – loves citations
• Want evidence?
– Alter, Pienta, Lyle – 240% (social sciences) *
– Piwowar, Vision – 9% (microarray data) †
– Henneken, Accomazzi – 20% (astronomy) #

* Pienta A, Alter G, Lyle J. (2010) The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data. http://hdl.handle.net/2027.42/78307

† Piwowar H, Vision TJ. (2013) Data reuse & the open data citation advantage. PeerJ PrePrints 1:e1v1. http://dx.doi.org/10.7287/peerj.preprints.1v1

# Henneken E, Accomazzi A. (2011) Linking to Data – Effect on Citation Rates in Astronomy. http://arxiv.org/abs/1111.3618

Traditional skills can win

• Google Flu gets it wrong:
– Lazer, D., Kennedy, R., King, G., and Vespignani, A. (2014). The Parable of Google Flu: Traps in Big Data Analysis. Science, 343, Forthcoming.
• The data tells us why:
– Lazer, David; Kennedy, Ryan; King, Gary; Vespignani, Alessandro, 2014, "Replication data for: The Parable of Google Flu: Traps in Big Data Analysis", http://dx.doi.org/10.7910/DVN/24823 UNF:5:BJh9WzZQNEeSEpV3EWs+xg== IQSS Dataverse Network [Distributor] V1 [Version]
• Personalisation; suggested searches; other UI changes

41 datasets – none bigger than 1 Mbyte

Data made available before the paper was published – result was immediate impact

What stops data reuse
• Loss
• Destruction
• Pride
• Gluttony
• Ineptitude
• Concealment
• Bureaucracy
• Complexity
• Procrastination
• Lack of potential

Excuses – and responses
• “People will ask questions”
– So use a data centre or repository
• “It will be misinterpreted”
– Stuff happens. Also, openness encourages correction
• “It’s not interesting”
– Let others be the judge – your noise is my signal
• “I might get another paper out of it”
– Up to a point. We might get more research out of it
• “I don’t have permission”
– A real problem. But solvable at senior level
• “It’s too bad/complicated” – see above
• “It’s not a priority”
– Unfortunately, funders are making it so. But if you looked at the evidence, it would be your priority as well

See e.g. Carly Strasser’s blog: http://datapub.cdlib.org/2013/04/24/closed-data-excuses-excuses/

Why open software?

• Quicker start
• Better flexibility
• Improved robustness
• Increases collaborators
• Greater research impact
• Easier to work with industry
• No added cost
– Caveat: over what you should already be doing

What’s software got to do with my research?

Slide: Neil Chue Hong

The research community relies on software

Do you use research software?

What would happen to your research without software?

Survey of researchers from 15 UK Russell Group universities conducted by SSI between August and October 2014. DOI: 10.5281/zenodo.14809

56% develop their own software
71% have no formal software training

Slide: Neil Chue Hong

The modern researcher…

• … worries about:
– Data management and analysis
– Reproducible research
– Scalable simulations
– Integration of models and workflows
– Collaboration

Picture of Otto Stern from Emilio Segrè Visual Archives. Copyright American Institute of Physics. Used with permission.

Slide: Neil Chue Hong

Open software is good for science and good for you

• Benefits
– More collaborators
– More citations
– More benefit to others
– Increased robustness
– Increased reuse
– Reduced replication of effort
• Far more than the drawbacks
– More structured collaboration

Slide: Neil Chue Hong

Improve your research impact

Vandewalle (2012) DOI: 10.1109/MCSE.2012.63

Slide: Neil Chue Hong

Victoria Stodden, AMP 2011, http://www.stodden.net/AMP2011/; Special Issue: Reproducible Research, Computing in Science and Engineering, July/August 2012, 14(4)
Howison and Herbsleb (2013) "Incentives and Integration In Scientific Software Production", CSCW 2013.

Software engineering vs software/data carpentry

Software carpenters craft their research atop the digital infrastructure to produce novel science.

Software engineers maintain, own and operate digital infrastructure.

Teaching researchers to code

Community exemplar: #SWCarpentry

Publishing data & software papers is easy

http://openresearchsoftware.metajnl.com
http://bit.ly/softwarejournals

http://dx.doi.org/10.6084/m9.figshare.942289
Slide: Neil Chue Hong

THERE’S HELP FOR DATA SHARING AS WELL

Roles and Responsibilities

What data to keep

How to cite data

What data to keep

Acquire research data skills

Finally…

• Sharing data is good for you
• It’s good for all of us
• It isn’t as hard as you think – start today!

It’s amazing what people will share…

Data reuse from Hubble
