Liz Lyon Associate Director, Outreach Chris Rusbridge, DCC Director UK Digital Curation Centre One Year On Digital Curation Centre a centre of support for data curation and preservation


Liz Lyon Associate Director, Outreach

Chris Rusbridge, DCC Director

UK Digital Curation Centre One Year On

Digital Curation Centre: a centre of support for data curation and preservation


Overview

• Why is digital curation important?

• What are the challenges that the DCC faces?

• About the people and our collaborative approach

• Addressing the issues

• How can you contribute to the DCC?


Curation?

“maintaining and adding value to a trusted body of digital information for current and future use”


Digital curation continuum

• Data preservation: static. For later use?

• Data curation: dynamic. In use now (and the future)?


Assuring permanent access to the records of science & the humanities?

Long term access to primary data

• Increasing data volumes from eScience and Grid-enabled / cyberinfrastructure applications

• Changing research paradigm: data-driven science, “big science”

• Observational data, simulations, large-scale experimentation

• Multi-media resources, statistical data, surveys, geo-spatial data……


Facilitate “post-processing” and knowledge extraction

Enable the acquisition of newly-derived information and knowledge

• Run complex algorithms over primary datasets

• Mining (data, text, structures)

• Modelling (economic, climate, mathematical, biological)

• Analysis (statistical, lexical, pattern matching, gene)

• Presentation (visualisation, rendering)


Provide additional functionality beyond digital preservation processes

Annotations

• Gene and protein sequences

• e-Lab books (Smart Tea Project in chemistry)


The scholarly knowledge cycle: linking research data to publications

eBank UK Project: http://www.ukoln.ac.uk/projects/ebank-uk/

Diagram labels:

• Research & e-Science workflows

• Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media

• Data analysis, transformation, mining, modelling

• Data curation: databases & databanks

• Repositories: institutional, e-prints, subject, data, learning objects

• Peer-reviewed publications: journals, conference proceedings

• Aggregator services: national, commercial

• Presentation services: subject, media-specific, data, commercial portals

• Resource discovery, linking, embedding

• Connecting steps: deposit / self-archiving, validation, publication, harvesting metadata, searching, harvesting, embedding, linking

Emerging policy on open access to data


DCC people (some of them…)

• Management & Co-ordination
– Director Chris Rusbridge (University of Edinburgh)

• Community Support & Outreach
– Led by Dr Liz Lyon (UKOLN, University of Bath)

• Service Definition & Delivery
– Led by Professor Seamus Ross (HATII [ERPANET], University of Glasgow)

• Development
– Led by Dr David Giaretta (Astronomical Software & Services, CCLRC)

• Research
– Led by Professor Peter Buneman (Informatics, University of Edinburgh)


The challenges we face

Standards

• Interoperability issues: technical & hopefully soluble

Scale

• Volume and diversity of datasets

Culture

• Bringing communities together

• Library/information science/archives “document tradition”

• Domain research (chemists, astronomers, biologists)

• Computer science (databases)

• Commercial suppliers (storage technology)


More challenges……

Process

• Highly-distributed organisation: use collaborative tools

Skills

• Distributed amongst the 4 partners & beyond

Engagement

• Lots of existing work and many significant players

Impact

• Visible & measurable, in the short & long-term

Meeting expectations (which are high…..)

• Of the community and our funders


User requirements analysis

Commissioned study

• Leona Carpenter

• Reporting now

• Desk-based research

• Focus groups

• Interviews

Results will inform research, development, service definition / delivery and outreach

Recommendations and priority tasks


Some sound bytes…

R&D issues: Annotation services, Ontology development, Automating metadata creation, Tools and toolkits, Data Format Description Language, Identifiers, Registries, Economic and cost-benefits studies

Advisory services: “Ask-a-Curator”, FAQs, reports, briefings, awareness-raising materials, best practice guidance, Storage media, “Like Erpanet”, advise Government, Research Councils, funding bodies

Professional development: Short courses, conferences, seminars, workshops, secondments to DCC and to working repository services

Outreach: Leadership for the future, case studies, sharing solutions, collaboration with other partners, international peers, industry links

Taxonomy of “Users”


Outline Taxonomy of digital curation users by role

1. Data Creators

2. Data Curators

3. Data Re-users

4. Policy makers

-funding bodies

-other leaders


Data Preservers

Data publishers


Outline Taxonomy by significant function of organisational entity

1. Research

2. Service provision

3. Learning & teaching

4. Funders

5. Policy / strategy makers

“Designated communities”


Commercial


Service definition & delivery

• Advisory services
– Responses to queries, from legal to technical guidance: [email protected]
– Site visits (National Institute of Environmental eScience)

• Information Services
– Briefing Documents: Freedom of Information by Mags McGinley
– DIGITAL CURATION MANUAL
– 20 chapters written by community experts, e.g. Metadata written by Michael Day, UKOLN
– Peer-reviewed
– Checklist for Compliance with best practices and standards
– Technology Watch


Services: workshops

• 2005 Programme
– Preservation of medical databases: 24-25 May at the Gulbenkian Institute, Lisbon, in collaboration with ERPANET & the Wellcome Trust
– Institutional repositories: 6 July at the University of Cambridge, UK, in collaboration with DSpace
– Cost models: in collaboration with the Digital Preservation Coalition, July at the British Library
– Persistent identifiers: liaising with NISO, summer, UK location tbc


Development approach

• OAIS (Open Archival Information System) linkage: focus on representation information
– Link to global work on format registries?
– Concentrate on scientific data formats?

• Repository
– Representation Information
– Standards and Tools
– Aim for OAIS compliance

• Persistent identifiers

• Certification… RLG task force

• Open development wiki and email list


OAIS Reference Model – Functional Model

[Figure: OAIS functional model. Actors: PRODUCER, CONSUMER, MANAGEMENT. Functional entities: Ingest, Archival Storage, Data Management, Access, Administration, Preservation Planning. Flows: SIP (from the Producer into Ingest), AIP, Descriptive Info, queries, result sets, orders, DIP (from Access to the Consumer).]

How relevant to curation?


Representation Net


Representation Information More detail

How does this relate to format registries?


High Level View

Example of use of Representation Information Labelling


Registry issues?

• Trusted repository of Representation Information
– Authenticity of information
– Access control
– Certificates/Digests (are they trustable over the long term?)

• Findability
– Persistent IDs

• What can we rely on?
– Labels (to support automated processing)

• Extensibility

• Distributed
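As a rough illustration of the digest and persistent-ID points above, the following Python sketch stores a piece of Representation Information under an identifier and later checks it against the digest recorded at deposit. The record layout, the `repinfo:` identifier scheme and the function names are assumptions made for this example; they are not taken from the DCC registry.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RepInfoRecord:
    """Hypothetical registry entry; the field names are illustrative only."""
    persistent_id: str  # assumed identifier scheme, e.g. "repinfo:example-format"
    label: str          # machine-readable label to support automated processing
    content: bytes      # the Representation Information itself
    sha256: str         # digest recorded at deposit time


def deposit(registry: dict, persistent_id: str, label: str, content: bytes) -> RepInfoRecord:
    """Store content under a persistent ID and record its SHA-256 digest."""
    if persistent_id in registry:
        raise KeyError(f"{persistent_id} is already registered")
    digest = hashlib.sha256(content).hexdigest()
    record = RepInfoRecord(persistent_id, label, content, digest)
    registry[persistent_id] = record
    return record


def is_intact(record: RepInfoRecord) -> bool:
    """Recompute the digest and compare it with the one recorded at deposit."""
    return hashlib.sha256(record.content).hexdigest() == record.sha256


registry = {}
rec = deposit(registry, "repinfo:example-format", "format-description",
              b"field A: 4-byte integer")
print(is_intact(rec))
```

A digest of this kind only shows that the bytes are unchanged since deposit. Whether the digest algorithm itself remains trustworthy over the long term is the open question the slide raises.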


Registry development

• Simple PHP prototype

• Scoping study: unification
– Formats, standards, tools

• More robust prototype in development
– Based on ebXML & JAXR
– Potentially distributed, cooperative maintenance model


Development Roadmap

• Registry: complete prototype, link to PRONOM, GDFR etc, handover to service

• Representation information: describe CCLRC (science) data using EAST, etc

• Certification work continues

• Additional tools: metadata extraction

• Testbeds, interactions with others


Research approaches

• Publishing & integrating scientific databases

• ‘Archiving’ past states of volatile databases

• Database provenance and annotation

• Organisational dynamics of trusted repositories

• Automating metadata extraction

• Cost-benefit analysis of data curation

• Rights and responsibilities


The database picture

Source data Curated data: classified, cleaned, annotated, integrated, cross-linked


Curated Databases are Central

Much/most scientific data is now in databases

• They often do not contain source experimental data. Sometimes just annotation/metadata

• They borrow extensively from, and refer to, other databases

• You are now judged by your data as well as your (paper) publications!!

• These databases are built and maintained with a great deal of human or computational effort.

What makes a database?
– It has internal structure or it changes. Size alone doesn’t qualify


Archiving (preserving) volatile databases

• How do you preserve something that changes every hour or minute?
– Important for the scientific record: someone might have cited your data at time t.

• Current practice
– Create versions (how often?)
– Log changes
– Use diffs
– Do nothing (common!)
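To make the "log changes" option concrete, here is a minimal Python sketch that rebuilds the state of a small key-value database at a chosen time t by replaying a change log. The `Change` structure, the integer timestamps and the example keys are assumptions for illustration only; a real curated database would need a richer change model.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Change:
    """One logged change; a value of None means the key was deleted."""
    timestamp: int  # assumed: any monotonically increasing clock
    key: str
    value: Optional[Any]


def state_at(log: list, t: int) -> dict:
    """Rebuild the database contents as of time t by replaying the log."""
    state = {}
    for change in sorted(log, key=lambda c: c.timestamp):
        if change.timestamp > t:
            break  # later changes were not yet visible at time t
        if change.value is None:
            state.pop(change.key, None)
        else:
            state[change.key] = change.value
    return state


log = [
    Change(1, "gene:42", "function unknown"),
    Change(5, "gene:42", "kinase"),
    Change(7, "gene:99", "transporter"),
    Change(9, "gene:99", None),
]

# The record as a reader would have seen (and cited) it at t = 6.
print(state_at(log, 6))
```

Replaying a log trades storage for computation: nothing is lost between versions, but reconstructing an old state costs time proportional to the length of the log, which is one reason periodic versions and diffs are often combined with it.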


Curated databases – some issues

• Integrating and publishing data so that someone else can use it.

• Annotating existing data and moving annotations to other databases

• Provenance: where did this data come from?

• Archiving: how do you preserve something that is constantly changing?


How do we cite data?

• A URL or citation to an article is already unsatisfactory.
– DCC client complaint: “I spend a lot of time searching [electronic documents] for the part that is relevant to the citation.”

• The problem is much worse when you are citing something in a very large database.

• How do you use a citation to locate data?

• How do you ensure that the citation persists?
– Connections with DB archiving and DOIs
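One way to picture the link between citation and database archiving is a citation that names the database, the archived version and the entry within it, and is resolved against the archive instead of the live database. The Python sketch below is only an illustration under that assumption; the `DataCitation` fields, the `database@version#key` string form and the example data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation structure; the field names are illustrative only."""
    database: str  # which database was cited
    version: str   # which archived version the author saw
    key: str       # which entry inside that version

    def __str__(self) -> str:
        return f"{self.database}@{self.version}#{self.key}"


def resolve(citation: DataCitation, archive: dict):
    """Look up the cited entry in the archived version, not in the live database."""
    try:
        return archive[citation.database][citation.version][citation.key]
    except KeyError:
        raise LookupError(f"cannot resolve {citation}") from None


# Toy archive: database name -> version label -> entries.
archive = {
    "exampledb": {
        "2005-03": {"P12345": "annotation as of March"},
        "2005-04": {"P12345": "annotation as of April"},
    }
}

cite = DataCitation("exampledb", "2005-03", "P12345")
print(cite, "->", resolve(cite, archive))
```

Because the version is part of the citation, the same entry cited a month later resolves to different content, which is the behaviour a reader checking the scientific record would expect.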


Research approaches

• Publishing & integrating scientific databases

• ‘Archiving’ past states of volatile databases

• Database provenance and annotation

• Organisational dynamics of trusted repositories

• Automating metadata extraction

• Cost-benefit analysis of data curation

• Rights and responsibilities
– “Public domain, public interest, public funding” paper, Waelde & McGinley


www.dcc.ac.uk


• www.ijdc.net

• Launch planned June/July

• Peer-reviewed contributions

• Peter Buneman Editor (research)

• Production editor Philip Hunter


Sample issue

Full papers

Invited articles

News & views

Papers for submission are very welcome!


1st DCC International Conference

• Location - Bath UK

• 29-30 September 2005

• Keynote speakers

Cliff Lynch CNI

Graham Cameron European Bio-informatics Institute

• DCC Research update

• Social highlights


Associates Network

Goals

Develop understanding, share best practice, advance research, promote recognition, develop consensus

Membership

International groups, national bodies, industry partners, funders, research groups, HEIs, FEIs, individuals……

Benefits

Early access to R&D outputs, advisory services, training, input to definition and design, community participation

Discussion Forum www.dcc.ac.uk Please join us!


[Diagram: DCC partners and related organisations]

DCC partners: CCLRC, UKOLN, UofG, UofE

Groupings: Research Councils; HEIs & FE; Research Institutes; International Collaborations; Standards Bodies

Organisations shown: CMS-Bristol, NIEeS, RG, Durham, WT-CFG, Leicester, IC, Maastricht, Oxford, Dutch NA, Swiss NA, Urbino, UNC, Salzburg, SDSC, NEODC, CEH, RI, NCS, RLG, Innogen, NHS, Capri, NTUA, INRIA, HUJ, UPC, Max-Planck, MIMAS, IASSIST, LDC, ACM, Data Archive, EDG, GridPP, EGEE, Cambridge, Jodrell Bank, DLI (US), DPC, DELOS, ESA, NASA, NARA, CNES, BNSC, TU Vienna, UPenn, EBI, MRC HGU, Kyoto, USC, GSK, Roslin, IBM Almaden, JHU, CSIRO, Caltech, CDS, ESO, OCLC, AHDS, Microsoft, IBM, Oracle, BT, STK, BADC, BODC, IVOA, ILRT, Council for Museums, Archives & Libraries, RDN, So’ton, OAI, NOF, NLA, NeSC


Acknowledgements

Slides from Peter Buneman, David Giaretta and others used with thanks.


How you can help us

How does OAIS relate to curation?

How do format registries relate to representation information?

Who else is working across these areas?

What outcomes would you like to see?