(Linked) Data Curation challenges
Kevin Ashley
Director, Digital Curation Centre
www.dcc.ac.uk
[email protected]
Reusable with attribution: CC-BY
The DCC is supported by Jisc

Page 2: (Linked) Data Curation challenges

Acknowledgements

• John Wilkins & Cameron Neylon
• Ideas, images, slides, inspiration

2013-07-05 Kevin Ashley – CC-BY

Page 3: (Linked) Data Curation challenges

Data views and processes

• Administration
• Discovery
• Work-level description
• Discipline-level interpretation

Page 4: (Linked) Data Curation challenges

Administrative view

Data from projects funded by NERC

Data produced by the department of linguistics

Page 5: (Linked) Data Curation challenges

Discovery view

Data about reproductive behaviour in freshwater fish

Page 6: (Linked) Data Curation challenges

Work-level description

Page 9: (Linked) Data Curation challenges

Data is variable

• Not always textual
• Not always tabular
• Not always fixed
• Not always clearly authored – think of archival provenance
• Not always associated with publication

Page 10: (Linked) Data Curation challenges

95% of research results are never published

Image: http://www.flickr.com/photos/sethw/113073189/

Page 11: (Linked) Data Curation challenges

If a million postdocs repeat a million experiments…

Image: http://flickr.com/photos/heymans/480396810/

Page 12: (Linked) Data Curation challenges

And 25% of those don’t work…

Image: http://flickr.com/photos/cliche/120070310/

Page 13: (Linked) Data Curation challenges

…how much taxpayers’ money is that?

Image: http://flickr.com/photos/luismimunoznajar/2093185804/

Page 14: (Linked) Data Curation challenges

I need that data now!!! I don’t care how messy it is – I can fix it!

I’ve wasted too much of my life fixing other people’s bad data. I’m not interested until you’ve cleaned it up and documented it. Besides, I have other things to think about.

Page 15: (Linked) Data Curation challenges

Grandfather’s axe

[email protected] CC-BY-NC-SA

When is my dataset a new dataset?

Page 16: (Linked) Data Curation challenges

Authorship

• Reference data – cell-level provenance versus single author data table
• ‘Cleaned’ data – can pass through many hands
• Synthesis…

Page 19: (Linked) Data Curation challenges

Potential wins

• Provenance of machine-gathered data – linking observations to instrument descriptions
• Linking data in multiple places
• Data and publications and plans
• Robust assertions about data versioning
• Association of data with institutions
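The first win above, linking observations to instrument descriptions, can be sketched with plain subject–predicate–object triples. This is only an illustration: every identifier and predicate name below (obs:42, madeBy, describedBy and so on) is hypothetical and does not come from the slides or from any particular vocabulary.

```python
# A minimal sketch of observation-to-instrument provenance as triples.
# All identifiers and predicate names are hypothetical placeholders.
triples = [
    ("obs:42", "madeBy", "instrument:thermo-7"),
    ("obs:42", "hasValue", "11.3"),
    ("obs:43", "madeBy", "instrument:thermo-7"),
    ("instrument:thermo-7", "describedBy", "doc:thermo-7-calibration"),
]


def objects(subject, predicate):
    """Return every object linked from subject by predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def instrument_descriptions(observation):
    """Follow observation -> instrument -> description links."""
    descriptions = []
    for instrument in objects(observation, "madeBy"):
        descriptions.extend(objects(instrument, "describedBy"))
    return descriptions


descriptions_for_42 = instrument_descriptions("obs:42")
```

Because the description hangs off the instrument rather than off each observation, both observations reach the same calibration record without it being copied.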

Page 20: (Linked) Data Curation challenges

networks of people…

Page 22: (Linked) Data Curation challenges

More wins

• Assertions at table and variable group level
• Linking that crosses disciplinary boundaries:
  – Biochemistry and neuroscience
  – Naval history, economics and climate science
• Linking that crosses research and administrative boundaries

Page 23: (Linked) Data Curation challenges

IGFBP-5 plays a role in the regulation of cellular senescence via a p53-dependent pathway and in aging-associated vascular diseases

After John Wilbanks

Page 24: (Linked) Data Curation challenges

Tylenol

SameAs:

• N-acetyl-p-aminophenol
• Acetaminophen
• Paracetamol
• N-(4-hydroxyphenyl)ethanamide
• N-(4-hydroxyphenyl)acetamide
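One way to read this slide: each SameAs link is a pairwise assertion, and the names that end up connected form one group referring to the same substance. A rough sketch of that grouping, using a small union-find, follows. Which pairs are actually linked on the original diagram is not recoverable from the text, so the pairs in `links` are an assumption made for illustration.

```python
def same_as_groups(links):
    """Group names connected by pairwise same-as links (union-find)."""
    parent = {}

    def find(name):
        # Register unseen names as their own root, then walk to the root.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for name in parent:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Assumed pairings; the slide only shows that these names are linked.
links = [
    ("Tylenol", "Acetaminophen"),
    ("Acetaminophen", "Paracetamol"),
    ("N-acetyl-p-aminophenol", "Acetaminophen"),
    ("Paracetamol", "N-(4-hydroxyphenyl)acetamide"),
    ("N-(4-hydroxyphenyl)ethanamide", "N-(4-hydroxyphenyl)acetamide"),
]
groups = same_as_groups(links)
```

With these links, a search for any one of the six names can be widened to the other five, which is the cross-disciplinary linking the previous slides describe.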

Page 25: (Linked) Data Curation challenges

“I never had an idea that couldn’t be improved by sharing it with as many people as possible…”

Bill Hooker (2006)
http://3quarksdaily.blogs.com/3quarksdaily/2006/10/the_future_of_s_1.html

Page 26: (Linked) Data Curation challenges

[Diagram with stages: Idea, Develop, Fund, Plan, Record, Process, Publish, Read]

Page 29: (Linked) Data Curation challenges

Challenge? Opportunity

• Linked data can improve administration of research and research data
• The real potential is in improving research quality and efficiency
• The same actors can’t do both
• The actions don’t need to be in lock-step