A centre of expertise in data curation and preservation. 2nd International Digital Curation Conference, November 2006. Reflections on open scholarship: process, product and people. Dr Liz Lyon, DCC Associate Director, Outreach; Director, UKOLN, University of Bath, UK. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0.

Page 1:

a centre of expertise in data curation and preservation

2nd International Digital Curation Conference, November 2006

Reflections on open scholarship: process, product and people

This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0

Funded by:

Dr Liz Lyon,

DCC Associate Director, Outreach; Director, UKOLN, University of Bath, UK

Page 2:

Three themes

• How? – Unpacking the title: open scholarship

• What? – Creating and using science-ready archives

• Who? – “Digital natives” as data scientists

Page 3:

What do we mean by “open”?

• Publicly available?

• Shared?

• Inclusive?

• Collaborative?

• Participative?

• Non-proprietary?

Page 4:

Scholarship today? “Open Access”

Page 5:

Data-centric 2020 vision

Data-driven science

Page 6:

Reference datasets as infrastructure

Page 7:

Research into neglected tropical diseases http://www.thesynapticleap.org/

“Open source science”

Page 8:

http://openwetware.org/wiki/Main_Page

Page 9:

Page 10:

Page 11:

Synthetic biology: materials for (bio) mash-ups? Interesting IPR issues…..

Page 12:

Bioblog

Blogs, blogs and meta-blogs….

Page 13:

The Tool Box?

http://www.flickr.com/photos/64696485@N00/13146762/

Page 14:

The Peer Review Process?

Page 15:

The Scientific Paper?

http://www.ch.ic.ac.uk/wiki2/index.php/Mauveine

Page 16:

Crystal Structure reports - data-rich scientific articles

• 3-d positional coordinates
• Atomic motions
• Molecular geometry
• Chemical bonding
• Crystal packing
• Chemical behaviour arising from structure

• Two dedicated IUCr journals: Acta Cryst. C, E

• Important part of scientific discussion in many other titles: Acta Cryst. B, D, F

Original slide: Brian McMahon, IUCr

Validation of data through publication

Page 17:

• Data-centric scholarly “publications”

• Raw, primary, derived data integrated with interpretations

• Mandatory submission of data with text

Page 18:

The database publication?

Page 19:

http://declanbutler.info/blog/?p=58#more-58

The “mash-up”

Data from FAO, WHO

+

Google Earth
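One way to picture such a mash-up is a small script that turns tabular records into a KML file that Google Earth can open. The sketch below is illustrative only: the field names (`name`, `lat`, `lon`, `cases`) and the two rows are made-up placeholders, not FAO or WHO data, and this is not claimed to be how the linked mash-up was built.

```python
import xml.etree.ElementTree as ET

# OGC KML 2.2 namespace.
KML_NS = "http://www.opengis.net/kml/2.2"


def records_to_kml(records):
    """Return a KML document (as a string) with one Placemark per record."""
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for rec in records:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = rec["name"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = (
            f"Reported cases: {rec['cases']}"
        )
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are written longitude first, then latitude, then altitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{rec['lon']},{rec['lat']},0"
        )
    return ET.tostring(kml, encoding="unicode")


# Hypothetical placeholder rows, for illustration only.
PLACEHOLDER_RECORDS = [
    {"name": "Example site A", "lat": 10.0, "lon": 20.0, "cases": 3},
    {"name": "Example site B", "lat": -5.5, "lon": 30.25, "cases": 1},
]

kml_text = records_to_kml(PLACEHOLDER_RECORDS)
```

Writing `kml_text` to a file with a `.kml` extension would give something Google Earth can load; the curation question is then what happens to the underlying records once the mash-up itself disappears.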

Page 20:

Pause for thought…

• Big science communities
– Grid-enabled applications
– Large managed open data archives
– Funder policy driver

• Small(er) science communities
– Collaborative and social software
– Evolving open wikis and blogs
– Grassroots driver

• Curation and preservation issues
– Burgeoning wiki and blog content
– Web archiving

• Positioning of repositories???

Page 21:

Big science

Funder-mandated sharing?

Top down

Small science

Community culture Discipline? Institution?

Bottom up

“science-ready archives”

Page 22:

• Laboratory protocols: common practice

• Instrumentation: proprietary software

• Standard specifications and formats

Data capture

Page 23:

• Working towards standard specifications in the lab
– Open Microscopy Environment (OME)
– Medical imaging: DICOM
– Flow cytometry standard (FCS)
– Mass Spectrometry Standards Working Group: mzData vs mzXML

• Laboratory management data systems in development

Page 24:

[Figure: DAME signal processing workflows using Grid Services. Workflow diagram with areas labelled Airport, DS&S and Rolls Royce, and roles Maintenance Engineer, Maintenance Analyst (Fleet Manager) and Domain Expert. Steps shown include: Aircraft Lands, Visual Inspection, Provide Information, Brief Diagnosis / Prognosis, Detailed Diagnosis / Prognosis, Detailed Analysis, Analyst Decision, Expert Decision, Sign-off Diagnosis, Maintenance Procedure, Maintenance Result and Release Engine. Branch labels include: known, unknown, clear, information required, diagnosis, fault resolved, fault unresolved.]

RepoMMan: Repository Metadata and Management (Univ Hull) using WS-BPEL

Workflow: m2m?

e-Scientist desktop?

Slide: Carole Goble

Page 25:

Silchester: A VRE for Archaeology

Page 26:

Harmonisation and normalisation

• Standard Deposit API (GNU eprints, DSpace, Fedora)

• Dublin Core Application Profile for ePrints (+ Eduserv)

• Requirements: richer metadata set, support for value-added services, version identification, appropriate copy (OA), citations

• Based on FRBR

• Data model for scholarly works

• Application profile includes simple and qualified DC properties

Page 27:

The ePrints application profile

• simple DC properties (the usual suspects …)
– identifier, title, abstract, subject, creator, publisher, type, language, format

• qualified DC properties
– access rights, licence, date available, bibliographic citation, references, date modified

• new properties
– grant number, affiliation institution, status, version, copyright holder

• properties from other schemes
– funder, supervisor, editor (MARC relators)
– name, family name, given name, workplace homepage, mailbox, homepage (FOAF)

• clearer use of existing relationships
– has version, is part of

• new relationship properties
– has adaptation, has translation, is expressed as, is manifested as, is available as

• vocabularies
– access rights, entity type, resource type and status

Slide: Julie Allinson, UKOLN, Andy Powell, Eduserv
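To make the simple and qualified property groups on this slide more concrete, here is a minimal sketch that serialises a few of them as XML using the standard Dublin Core namespaces. It is an illustration under stated assumptions, not the profile's official encoding: the `record` wrapper element, the `PROPERTY_MAP` name and every value in `example` are hypothetical, and the profile's new properties and relationship properties are left out because their identifiers are not given on the slide.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # simple Dublin Core elements
DCTERMS = "http://purl.org/dc/terms/"     # DCMI Metadata Terms (qualified DC)

# Slide label -> (namespace, DCMI term). Only a subset is mapped here.
PROPERTY_MAP = {
    "identifier": (DC, "identifier"),
    "title": (DC, "title"),
    "creator": (DC, "creator"),
    # "abstract" is defined in the DCMI terms namespace.
    "abstract": (DCTERMS, "abstract"),
    "access rights": (DCTERMS, "accessRights"),
    "licence": (DCTERMS, "license"),
    "date available": (DCTERMS, "available"),
    "bibliographic citation": (DCTERMS, "bibliographicCitation"),
}


def to_xml(record):
    """Serialise a {slide label: value} dict; unmapped labels raise KeyError."""
    ET.register_namespace("dc", DC)
    ET.register_namespace("dcterms", DCTERMS)
    root = ET.Element("record")  # hypothetical wrapper element
    for label, value in record.items():
        namespace, term = PROPERTY_MAP[label]
        ET.SubElement(root, f"{{{namespace}}}{term}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
example = {
    "title": "Example eprint",
    "creator": "A. N. Author",
    "access rights": "Open access",
    "date available": "2006-11-01",
}

xml_text = to_xml(example)
```

Keeping the slide labels as dictionary keys makes the split visible: the "usual suspects" sit in the `dc` namespace, while refinements such as access rights and date available come from `dcterms`.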

Page 28:

Use DC Application Profile for ePrints?

Page 29:

Data description and discovery

• Validation, publication & discovery of data models & schema

• eBank Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/

• Harmonisation and normalisation of metadata and semantics

• DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145

• Rights & Citation policy http://ecrystals.chem.soton.ac.uk/rights.html

• Crystallography: a community working together
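The DOI on this slide shows how a dataset can be cited with a persistent identifier: a registrant prefix, a slash, then a suffix chosen by the registrant. The sketch below splits the slide's DOI into those two parts and rebuilds the resolver URL. The function names are hypothetical, the validation is deliberately loose, and the `http://dx.doi.org/` resolver is simply the one shown on the slide.

```python
from urllib.parse import quote


def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    # Loose check only: DOI prefixes start with "10." and the suffix must be non-empty.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi, resolver="http://dx.doi.org/"):
    """Build a resolver URL for a DOI, percent-encoding unsafe characters."""
    split_doi(doi)  # raises ValueError if the string is not DOI-shaped
    return resolver + quote(doi, safe="/")


dataset_doi = "10.1594/ecrystals.chem.soton.ac.uk/145"
prefix, suffix = split_doi(dataset_doi)
url = resolver_url(dataset_doi)
```

Here the suffix carries the repository host name and a record number, so the identifier stays readable while the resolver is free to redirect if the repository moves.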

Page 30:

[Figure: eCrystals ‘Global Federation’ Model, 23/10/2006. Components shown: data creation & capture in “Smart lab”; laboratory repository; institutional data repositories; subject repository; aggregator services; presentation services / portals; publishers (peer-review journals, conference proceedings, etc.); institution library & information services. Functions and flows labelled: deposit; validation; publication; search, harvest; data discovery, linking, citation; data analysis, transformation, mining, modelling; curation; preservation.]

Page 31:

Data deposit & sharing: roles and responsibilities

• Funder
• Institution
• Faculty
• Individual

Noor et al PLoS Biol 4(7) 2006

Page 32:

eBank Project exemplar

Adding value: aggregating & linking data + interpretations

Page 33:

“Repository wow-factor”…

…or adding value through user interface tools…

Page 34:

Facilitating use and re-use: text mining tools

• Adding value

Page 35:

Second pause for thought…

• We need to work with instrument suppliers

• We need to understand more about workflow

• We need to develop new ways of adding value to datasets through innovative user tools and services

• We need more evidence of how data is used and re-used (or not…)

Page 36:

Getting the skills mix

• Communities, teams, individuals

• International Virtual Observatory Alliance
– Global community
– Virtual organisation

• Multi-disciplinary team approach
– eBank Project exemplar: computer scientists, domain scientists (chemists), digital library experts
– Lessons learnt: e-Science Human Factors Audit Report 2006, Roy Kalawsky, Loughborough

• NSF Report 2005, Long-lived digital data collections
– “Data scientist”

Page 37:

?

Wanted! data scientist

Page 38:

Digital natives as data scientists?

• eBank Project: assessing role of research data in u/g Chemical Informatics and MChem courses at Univ. of Southampton

• Pedagogic evaluation by Grainne Conole

• Report imminent…

Page 39:

“Well basically I’ve done nothing like it before, so it’s the first time I’ve sort of delved into computing or computational chemistry … quite nice, quite enjoyed starting off with just like a string of data and pop it into say a database, just a flat string of numbers basically and then come out with a crystal structure, which is exactly what it should represent which is quite cool”

“There were several parts to the course – We started off with how to get 2D and 3D representations of molecules onto a computer using a one-dimensional format, a SMILE string …so just ways of like getting data into a format so that it can be easily shared between different computers or different people without having to change lots of things”

Source: Grainne Conole
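The "one-dimensional format" the student describes is a SMILES string, which writes a molecule as a single line of text. The sketch below shows, under heavy simplification, how much can be read back from such a line: it counts the non-hydrogen atoms in a SMILES string. It is a toy tokenizer with a hypothetical function name, not a full SMILES parser: it handles only the unbracketed organic-subset atoms, ignores bonds, branches and ring-closure digits, and rejects bracket atoms such as `[Na+]`.

```python
from collections import Counter

TWO_LETTER = ("Cl", "Br")            # two-letter organic-subset atoms
ONE_LETTER = set("BCNOPSFI")         # one-letter organic-subset atoms
AROMATIC = set("bcnops")             # aromatic atoms are written in lower case
SKIPPED = set("0123456789()-=#:/\\.%")  # ring closures, branches, bonds, dots


def count_heavy_atoms(smiles):
    """Count non-hydrogen atoms in a simple SMILES string (no bracket atoms)."""
    counts = Counter()
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        # Check two-letter symbols first so "Cl" is not read as carbon.
        if smiles[i:i + 2] in TWO_LETTER:
            counts[smiles[i:i + 2]] += 1
            i += 2
        elif ch in ONE_LETTER:
            counts[ch] += 1
            i += 1
        elif ch in AROMATIC:
            counts[ch.upper()] += 1
            i += 1
        elif ch in SKIPPED:
            i += 1
        else:
            raise ValueError(f"unsupported SMILES character {ch!r} at position {i}")
    return dict(counts)


ethanol = count_heavy_atoms("CCO")
benzene = count_heavy_atoms("c1ccccc1")
acetic_acid = count_heavy_atoms("CC(=O)O")
```

Because the whole structure fits in a short plain-text string, it can be stored in a database column or passed between programs unchanged, which is the sharing benefit the quote points to.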

Page 40:

New skills requirements:

• interdisciplinary

• quantitative

• data curation

Integrate within the curriculum

Wingreen & Botstein, Nat Rev Mol Cell Biol 7, 2006

Page 41:

Final pause for thought…

• Various approaches to develop and obtain digital curation skills

• Skills are there but often in discrete communities: we need to bring communities together (like at this conference…)

• Integration within the curriculum: undergraduate students, library & information science, archival studies, computer science

• Provide recognition and a career path for emerging data scientists

Page 42:

Take home messages

• Scholarship is changing fast

• Big science and open source science both create significant digital curation challenges

• Science-ready archives are the goal

• Native data scientists are coming

• The culture will change too……….

Page 43:

Thank you….