
Digital | Curation | Centre

An Introduction to the UK Digital Curation Centre

Dr Liz Lyon,

DCC Associate Director (Outreach); Director, UKOLN, University of Bath, UK

Funded by:

This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0

CURL/SCONUL Workshop

December 2005

2


Overview

• About the Digital Curation Centre
– Organisation and structure

• What is digital curation?
– e-Research cycle

• DCC activities
– Development activity
– Research agenda
– Advisory services
– Outreach programme

3


UK Digital Curation Centre

• Development activities

• Research agenda

• Delivering services

• Outreach Programme

• http://www.dcc.ac.uk/

4


DCC people (some of them…)

• Management & Co-ordination
– Director: Chris Rusbridge (University of Edinburgh)

• Community Support & Outreach
– Led by Dr Liz Lyon (UKOLN, University of Bath)

• Service Definition & Delivery
– Led by Professor Seamus Ross (HATII, University of Glasgow)

• Development
– Led by Dr David Giaretta (Astronomical Software & Services, CCLRC)

• Research
– Led by Professor Peter Buneman (University of Edinburgh)

5


What is digital curation?

• Data preservation: static; for later use?

• Data curation: dynamic; in use now (and the future)?

“maintaining and adding value to a trusted body of digital information for current and future use”

6


(Very simple) e-Research Cycle and Data Curation

• Formulate hypothesis / ideas, test, experiment, observe: data creation, collection & capture

• Adding value: data linking, annotation, visualisation, simulation

• (New) knowledge extraction: data mining, modelling, analysis, synthesis

• Scholarly communications: data disclosure, publication, citation, discovery, re-use

• Data management, storage & validation: description, deposit, self-archiving, preservation, certification

Underpinning the cycle: e-Infrastructure, open access, collaboration. The links between stages are labelled “Data processing”.


7

(Very simple) e-Research Cycle and Data Curation

8


9


Engineering Product Information

EPSRC Grand Challenge Project, Prof Chris McMahon, University of Bath

10


– Access Grid
– Collaborative telematic art
– Modify spaces for performers
– Interplay: Hallucinations

11


Data capture & integration into research workflows

• R4L (Repository for the Laboratory) project, JISC-funded: automated data capture from instrumentation and deposit of results (chemistry)

• SMART TEA: electronic laboratory notebook with annotations
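Automated capture and deposit, as in R4L, can be illustrated with a small sketch. This is not R4L project code: the record fields (`instrument`, `captured_at`, `sha256`), the functions `make_deposit_record` and `verify`, and the file and instrument names are all hypothetical. The point it shows is that fixity information (a checksum) can be recorded at the moment data leaves the instrument, so later curation steps can confirm the file has not changed.

```python
import hashlib
from datetime import datetime, timezone


def make_deposit_record(filename: str, payload: bytes, instrument: str) -> dict:
    """Build a hypothetical deposit record for one file captured from an instrument.

    The SHA-256 checksum is taken at capture time, so any later copy of the
    file can be checked against it (a fixity check).
    """
    return {
        "filename": filename,
        "instrument": instrument,  # hypothetical field: which instrument produced the file
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def verify(record: dict, payload: bytes) -> bool:
    """Return True if the payload still matches the checksum stored at capture."""
    return hashlib.sha256(payload).hexdigest() == record["sha256"]


record = make_deposit_record("spectrum-001.dat", b"abc", "diffractometer-1")
print(record["sha256"])
print(verify(record, b"abc"), verify(record, b"abd"))
```

A real deposit would also carry descriptive metadata and go to a repository; the checksum is simply the smallest piece that makes later validation possible.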

12

(Very simple) e-Research Cycle and Data Curation

13


• Workflows: Research & e-Science; Learning & Teaching

• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media

• Data analysis, transformation, mining, modelling

• Peer-reviewed publications: journals, conference proceedings

• Repositories: institutional, e-prints, subject, data, learning objects

• Aggregator services: national, commercial

• Presentation services: subject, media-specific, data, commercial portals

• Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules

• Learning object creation, re-use

• Quality assurance bodies

• Links between these: deposit / self-archiving; harvesting metadata; validation; publication; resource discovery, linking, embedding; searching, harvesting, embedding

The scholarly knowledge cycle.

Liz Lyon, Ariadne, July 2003.


© Liz Lyon (UKOLN, University of Bath), 2005
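The “harvesting metadata” link between repositories and aggregator services is commonly implemented with OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting). The slide does not name a protocol, so treat that as an assumption. The sketch below builds a `ListRecords` request and parses a response, pulling out Dublin Core titles and the `resumptionToken` used to page through large result sets. The repository URL and the sample response are invented, and no network call is made.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core records."""
    if token:
        # resumptionToken is an exclusive argument: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the dc:title values and the resumptionToken (None when no more pages)."""
    root = ET.fromstring(xml_text)
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    token_el = root.find(".//oai:resumptionToken", NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return titles, token


# Invented sample response, trimmed to the elements used above.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

titles, token = parse_records(SAMPLE)
print(titles, token)
print(list_records_url("http://repository.example.org/oai", token))
```

An aggregator would loop, issuing the next request with the returned token until none comes back.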

14


Disciplinary data-centres

15


eBank UK Project

• Two key themes:
– Open access to datasets
– Linking research data to publications and to learning

• UKOLN, University of Southampton, University of Manchester

• e-Science application ‘CombeChem’: Grid-enabled combinatorial chemistry + National Crystallography Service

• Resource Discovery Network / PSIgate physical sciences portal

http://www.ukoln.ac.uk/projects/ebank-uk/

16


A data repository entry

17


Access to the underlying data: complex objects

ecrystals.chem.soton.ac.uk

18


Data descriptions

• Validation, publication & discovery of data models & schemas

• Managing complex objects

• Metadata packaging standards
– METS
– MPEG-21 DIDL

• Semantic descriptions
– Formal controlled vocabularies
– High-level and domain ontologies
– Inter-disciplinary discovery

• Informal approaches: Web 2.0 “folksonomies”
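To illustrate metadata packaging, the sketch below uses Python's standard `xml.etree.ElementTree` to wrap the parts of a hypothetical complex object (a data file and a derived image) in a minimal METS document: a `fileSec` listing the files and a `structMap` tying them together. The labels, file IDs and URLs are invented. A production METS record would normally also carry descriptive and administrative metadata sections, and MPEG-21 DIDL would express the same packaging with a different schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def q(ns: str, tag: str) -> str:
    """Qualified name in ElementTree's {namespace}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(label: str, files: List[Tuple[str, str, str]]) -> str:
    """Package the files of one complex object as a minimal METS document.

    files: (file_id, mime_type, url) for each component.
    """
    mets = ET.Element(q(METS, "mets"))
    file_grp = ET.SubElement(ET.SubElement(mets, q(METS, "fileSec")), q(METS, "fileGrp"))
    for file_id, mime, url in files:
        f = ET.SubElement(file_grp, q(METS, "file"), {"ID": file_id, "MIMETYPE": mime})
        ET.SubElement(f, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): url})
    # structMap is the one mandatory METS section: it states how the parts fit together.
    div = ET.SubElement(ET.SubElement(mets, q(METS, "structMap")), q(METS, "div"), {"LABEL": label})
    for file_id, _, _ in files:
        ET.SubElement(div, q(METS, "fptr"), {"FILEID": file_id})
    return ET.tostring(mets, encoding="unicode")


print(build_mets("Crystal structure dataset", [
    ("F1", "text/plain", "http://repository.example.org/data/structure.cif"),
    ("F2", "image/png", "http://repository.example.org/data/structure.png"),
]))
```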

19


Trusted digital repositories

• Audit Checklist for Certification

• Draft report published August 2005

• Research Libraries Group: RLG-NARA Task Force

• Defined criteria under 4 categories:
– Organisation
– Functions, processes & procedures
– Designated community & usability
– Technologies & technical infrastructure

20


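OAIS Reference Model (functional entities, after figure 4-1 of the OAIS standard)

• PRODUCER submits a SIP to Ingest

• Ingest passes the AIP to Archival Storage and its Descriptive Info to Data Management

• CONSUMER sends queries and orders to Access; Access obtains result sets from Data Management and AIPs from Archival Storage, and delivers a DIP to the consumer

• Preservation Planning and Administration support these functions, within the wider MANAGEMENT context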
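The OAIS information flow can be traced in a few lines of code. This is a deliberately simplified toy, not an implementation of ISO 14721: a producer submits a Submission Information Package (SIP); Ingest turns it into an Archival Information Package (AIP) held in Archival Storage and records its description for Data Management; Access answers a consumer's query with a result set and fulfils an order by returning a Dissemination Information Package (DIP). All class, method and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SIP:
    """Submission Information Package: what the producer hands over."""
    content: bytes
    description: str


@dataclass
class AIP:
    """Archival Information Package: content plus information needed to preserve it."""
    aip_id: str
    content: bytes
    description: str
    provenance: List[str] = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination Information Package: what the consumer receives."""
    aip_id: str
    content: bytes


class Archive:
    """Toy archive wiring together Ingest, Archival Storage, Data Management and Access."""

    def __init__(self) -> None:
        self.storage: Dict[str, AIP] = {}    # Archival Storage
        self.catalogue: Dict[str, str] = {}  # Data Management: descriptive info per AIP

    def ingest(self, sip: SIP, producer: str) -> str:
        """Ingest: turn a SIP into an AIP, store it and catalogue its description."""
        aip_id = f"aip-{len(self.storage) + 1}"
        self.storage[aip_id] = AIP(aip_id, sip.content, sip.description,
                                   [f"ingested from {producer}"])
        self.catalogue[aip_id] = sip.description
        return aip_id

    def query(self, term: str) -> List[str]:
        """Access asks Data Management: return the ids of matching AIPs (a result set)."""
        return [i for i, d in self.catalogue.items() if term.lower() in d.lower()]

    def order(self, aip_id: str) -> DIP:
        """Access fulfils an order by deriving a DIP from the stored AIP."""
        aip = self.storage[aip_id]
        return DIP(aip.aip_id, aip.content)


archive = Archive()
new_id = archive.ingest(SIP(b"raw measurements", "Gas chromatography results"), producer="field lab")
hits = archive.query("chromatography")
dip = archive.order(hits[0])
```

In a real archive the AIP would also carry Representation Information and fixity data, and the DIP might be transformed (for example, to a current format) rather than copied.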

21


DCC: Development

• “DCC Approach to Digital Curation”, based on the Reference Model for an Open Archival Information System (OAIS), ISO 14721:
– Monitoring international standards
– Development of a Representation Information (RI) registry/repository (DCC-RR)
– Recommendations for tools and methods for generating Representation Information
– Creating test-beds for digital curation tools

Development info: see http://dev.dcc.ac.uk for details of the Wiki and email list, open to all
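In OAIS terms, Representation Information is what a user needs to interpret a data object, and RI is itself data that may need further RI, forming a network. A registry such as the DCC-RR can be pictured as a lookup from an identifier to its RI plus the identifiers that RI depends on. The sketch below resolves that network recursively. The identifiers, descriptions and the `resolve` function are invented for illustration; this is not the DCC-RR's interface.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical registry entries: identifier -> (description, identifiers of the RI it relies on).
REGISTRY: Dict[str, Tuple[str, List[str]]] = {
    "fmt:fits": ("FITS file format specification", ["std:ascii"]),
    "fmt:cif": ("Crystallographic Information File specification", ["fmt:star", "std:ascii"]),
    "fmt:star": ("STAR file syntax", ["std:ascii"]),
    "std:ascii": ("ASCII character encoding", []),
}


def resolve(identifier: str, registry: Dict[str, Tuple[str, List[str]]] = REGISTRY) -> List[str]:
    """List every RI identifier needed to interpret `identifier`, dependencies first.

    Depth-first traversal of the RI network; the `seen` set avoids repeats and
    stops infinite loops if the network contains a cycle. A KeyError means some
    required RI is missing from the registry: the gap a curator would need to fill.
    """
    order: List[str] = []
    seen: Set[str] = set()

    def visit(ri_id: str) -> None:
        if ri_id in seen:
            return
        seen.add(ri_id)
        _, dependencies = registry[ri_id]
        for dep in dependencies:
            visit(dep)
        order.append(ri_id)

    visit(identifier)
    return order


print(resolve("fmt:cif"))
```

The traversal stops at entries with no dependencies, which play the role of the knowledge a designated community is assumed to already have.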

22

(Very simple) e-Research Cycle and Data Curation

23


Persistent identifiers for data citation

• Identify use cases: depositor, author, service provider, reader, publisher, ?

• Schemes: DOI, Handle, ARK, PURL

• Global identification: express as http URIs

• Added-value services: CrossRef, resolution service, integration (Globus), look-up service

• Domain identifiers: e.g. International Chemical Identifier (InChI) codes

• “Google molecules using InChIs” demo: Peter Murray-Rust, University of Cambridge

• DCC Workshop, June 2005, Glasgow
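The point about expressing identifiers as http URIs can be made concrete: each scheme has a public resolver, so an identifier becomes actionable on the Web once it is prefixed with the resolver's address. The sketch below assumes today's public resolvers (`https://doi.org/` for DOIs, `https://hdl.handle.net/` for Handles, `https://n2t.net/` for ARKs); links from 2005 typically used `http://dx.doi.org/`. PURLs are already http URLs and pass through unchanged. The function name and the sample Handle and ARK values are hypothetical.

```python
# Public resolvers in current use (assumption: https forms).
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
}


def to_http_uri(identifier: str) -> str:
    """Express a prefixed identifier (doi:, hdl:, ark:/) as an actionable http URI."""
    scheme, sep, rest = identifier.partition(":")
    scheme = scheme.lower()
    if not sep:
        raise ValueError(f"no scheme prefix in {identifier!r}")
    if scheme in ("http", "https"):
        return identifier  # PURLs are already http URLs
    if scheme == "ark":
        # The N2T resolver takes the whole ARK, including its "ark:" label.
        return "https://n2t.net/" + identifier
    if scheme in RESOLVERS:
        return RESOLVERS[scheme] + rest
    raise ValueError(f"unsupported scheme {scheme!r}")


print(to_http_uri("doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p"))
```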

24


One approach to data citation using DOIs

• Publication & citation of scientific primary data: STD-DOI Project, National Library for Science & Technology (TIB), University of Hanover, Germany, http://www.std-doi.de

• DOI registry for datasets

• Data publication agents: World Data Center Climate, GeoForschungsZentrum Potsdam

• Data requirements: quality control, long-term curation, use of the DOI resolver

• Exemplar data citation:
– Kamm, H; Machon, L; Donner, S (2004): Gas chromatography (KTB Field Lab), GFZ Potsdam. doi:10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p
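The exemplar citation follows a fixed pattern: authors, year, title, publisher, DOI. A small formatter can generate that pattern from structured metadata, the kind of consistency a DOI registry for datasets makes possible. The `DatasetCitation` class and its field names are hypothetical; the output layout is copied from the exemplar on this slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical metadata record for citing a dataset that has a DOI."""
    authors: List[str]  # "Surname, Initial" strings, in citation order
    year: int
    title: str
    publisher: str      # the data publication agent
    doi: str            # bare DOI, without the "doi:" prefix

    def render(self) -> str:
        """Format the citation in the layout of the slide's exemplar."""
        return (f"{'; '.join(self.authors)} ({self.year}): {self.title}, "
                f"{self.publisher}. doi:{self.doi}")


citation = DatasetCitation(
    authors=["Kamm, H", "Machon, L", "Donner, S"],
    year=2004,
    title="Gas chromatography (KTB Field Lab)",
    publisher="GFZ Potsdam",
    doi="10.1594/GFZ/ICDP/KTB/ktb-geoch-gaschr-p",
)
print(citation.render())
```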

25

(Very simple) e-Research Cycle and Data Curation

26


Adding value: eBank linking data to publications

27


Linking research to learning - embedding eBank aggregator service in a science portal for student learners

28


Adding value through annotation

DCC Research at the University of Edinburgh

• Scientific databases: Annotation scoping report

• AstroDAS: distributed annotation servers in astronomy

• New annotation model and prototype: top-ranked demonstration at a recent database conference
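The general idea behind a distributed annotation system such as AstroDAS is that annotations live on separate servers, keyed by the identifier of the object they describe, and a client merges them on demand without altering the source data. A minimal sketch of that idea, with in-memory dictionaries standing in for the servers; the object identifiers, notes and function name are invented, and this is not AstroDAS code.

```python
from typing import Dict, List, Tuple

# Hypothetical annotation servers, each mapping an object identifier to notes.
# The underlying catalogue data is never modified.
SERVER_A: Dict[str, List[str]] = {
    "obj:1": ["possible variable star"],
    "obj:2": ["blended with neighbouring source"],
}
SERVER_B: Dict[str, List[str]] = {
    "obj:1": ["cross-matched with a second survey"],
}


def merged_annotations(object_id: str,
                       servers: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Gather (server name, note) pairs for one object from every server.

    Keeping the server name with each note records where it came from,
    a simple form of annotation provenance.
    """
    merged = []
    for name, server in servers.items():
        for note in server.get(object_id, []):
            merged.append((name, note))
    return merged


servers = {"A": SERVER_A, "B": SERVER_B}
print(merged_annotations("obj:1", servers))
```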

29


DCC Research agenda

• Publishing & integrating scientific databases

• ‘Archiving’ past states of volatile databases

• Database provenance and annotation

• Organisational dynamics of trusted repositories

• Automating metadata extraction

• Cost-benefit analysis of data curation

• Rights and responsibilities
– “Public domain, public interest, public funding” paper, Waelde & McGinley
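One simple way to archive past states of a database that is updated in place is to store each distinct value of a record once, tagged with the range of releases in which it held; any earlier state can then be rebuilt without keeping full copies. The sketch below illustrates that idea under the assumption of a key-value database published as numbered releases. It is not the DCC research prototype, and the class, method and record names are hypothetical. A key missing from a later snapshot simply stops having its range extended.

```python
from typing import Dict, List


class VersionedArchive:
    """Archive of a key-value database across numbered releases.

    Each key maps to a list of [value, first_version, last_version] entries,
    so a value that stays the same is stored once, however many releases it spans.
    """

    def __init__(self) -> None:
        self.history: Dict[str, List[list]] = {}
        self.version = 0

    def add_version(self, snapshot: Dict[str, str]) -> int:
        """Merge a full snapshot of the current database into the archive."""
        self.version += 1
        v = self.version
        for key, value in snapshot.items():
            entries = self.history.setdefault(key, [])
            if entries and entries[-1][0] == value and entries[-1][2] == v - 1:
                entries[-1][2] = v             # unchanged since last release: extend range
            else:
                entries.append([value, v, v])  # new key, changed value, or reappearance
        return v

    def state_at(self, v: int) -> Dict[str, str]:
        """Reconstruct the database exactly as it was at release v."""
        return {
            key: value
            for key, entries in self.history.items()
            for value, first, last in entries
            if first <= v <= last
        }


archive = VersionedArchive()
archive.add_version({"gene1": "ATG", "gene2": "CCG"})
archive.add_version({"gene1": "ATG", "gene2": "CCA"})  # gene2 corrected
archive.add_version({"gene1": "ATG"})                  # gene2 withdrawn
print(archive.state_at(1))
```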

30

(Very simple) e-Research Cycle and Data Curation

31


Facilitate “post-processing” and knowledge extraction

Enable the acquisition of newly-derived information and knowledge

• Run complex algorithms over primary datasets

• Mining (data, text, structures)

• Modelling (economic, climate, mathematical, biological)

• Analysis (statistical, lexical, pattern matching, gene)

32


33


DCC Case Study published: Wide Field Astronomy Unit

34


Supporting the community

• DCC Outreach & Services:
– HELPDESK@dcc.ac.uk (legal and technical guidance)
– Curation Manual (45 chapters planned), Briefing Papers
– Workshops: Future-proofing Institutional Web sites, Jan 19-20, London
– Information Days: regional
– 1st International DCC Conference, Bath, Sept 2005
– PV2005, November, Edinburgh
– 2nd International Conference, November 2006, Glasgow (tbc)

35


International Journal of Digital Curation

• www.ijdc.net

• Peer-reviewed, with an Editorial Board

• Editor (research): Peter Buneman

• Production editor: Richard Waller

• Papers for submission are very welcome!

• First issue soon…

36


Associates Network

Goals

Develop understanding, share best practice, advance research, promote recognition, develop consensus

Membership

International groups, national bodies, industry partners, funders, research groups, HEIs, FEIs, individuals…

Benefits

Early access to R&D outputs, advisory services, training, input to definition and design, community participation

Discussion forum at www.dcc.ac.uk: please join us!

37


Developing skills & collaboration

• NSF report: “Data scientist”

• Develop hybrid skills

• Embed in u/g, p/g curriculum

• Facilitate community collaboration:
– Researchers
– Data centres
– Libraries & archives

• New roles?

• Achieve cultural change


Thank you. Questions?

e.lyon@ukoln.ac.uk

Join the DCC Associates Network at www.dcc.ac.uk
