Transcript
Page 1: Digital | Curation | Centre Digital Curation Centre  Peter Burnhill, Michael Day, David Giaretta, Liz Lyon, Robin Rice, Bridget Robinson and

Digital | Curation | Centre

Digital Curation Centre www.dcc.ac.uk

Peter Burnhill, Michael Day, David Giaretta, Liz Lyon, Robin Rice, Bridget Robinson and Seamus Ross

Funded by:

Page 2:

Session Overview

1. Introduction & Briefing

2. Towards a Technical Model of Digital Curation: our R&D

3. Planning Delivery of Services & the Associates Network

Page 3:

1. Introduction & Briefing

• Background story on the DCC ‘So who’s that new kid on the block?’

• What is digital curation anyway? – ‘adding value’ & ‘ensuring longevity’

• Aims & objectives for the DCC – ‘improving the quality of what is done’

• Our planning & our progress – timelines & deliverables

• How does this relate to the JISC Programme?

Page 4:

Background to the DCC (1)

• Two parallel policy concerns

1. Neglect of digital heritage, especially given investment in digitisation programmes

• JISC Continuing Access and Digital Preservation Strategy, 2002-2005

– eLib Programme, eLib3, Circular 5/97: Digital Preservation

• Digital Preservation Coalition formed in 2002

2. Differing data sharing practices in eScience, especially given huge data volumes

• Links between eScience Programme and JISC

• Report commissioned by JISC Committee for Support of Research (Lord & Macdonald, May 2003)

– twin drivers: Digital Preservation & Continuing Access (e-Science)

– identified need for national digital curation centre

Page 5:

Interpretation of JISC policy

JISC plays 3 roles

1. promotes, supports & develops management & preservation of institutional and community digital materials for UK benefit

2. partner to Research Councils/AHRB & other national/international bodies

3. as an organisation: appropriate grant conditions for JISC-funded creation of digital resources; good practice for JISC created/managed materials

• “escalating scale and complexity of digital resources to be curated and the subsequent urgency of developing a critical mass of expertise, shared services and tools, for long-term digital preservation … require a step change in investment and approaches.”

– “Over the next three years a greater emphasis on development of production services and tools … needed to build on previous research studies and projects.”

• “Digital preservation remains a challenging area in which techniques, costs, and skills are still in development: advocacy, dissemination and training, to embed preservation needs as appropriate in JISC funding programmes.”

Page 6:

Interpreting the implementation plan

• Risk assessment studies, eg ePrints

– Calls to implement studies’ recommendations for services and integration of preservation activity & standards into repositories funded by JISC.

• Series of community calls to support records management and digital preservation in institutions - cf FOI compliance.

• Establish Digital Curation Centre to:

– Provide central focus of skilled staff & research

• links to wider network of development activity, researchers, & services

– Develop set of central services, standards, and tools

• for a range of distributed digital data centres & preservation services,

• across the Information Environment & Research Grid.

• JISC Partnership funding,

– eg Web-archiving study: jointly funded by JCIE and Wellcome Trust

• Digital Preservation Coalition as an independent entity with JISC membership and sector activity supported by JISC.

• National preservation of e-journals, through RLN/RSLG

Page 7:

Back to the DCC Background (2)

• JISC Circular 6/03, initially issued June 2003

– Call postponed, revised & re-issued with more significant research component

– Joint funding: JISC and e-Science Core Programme

– £750K pa (outreach, services & development); £250K pa (research)

– Unlikely that any single organisation could do what’s expected

– Expressions of Interest & Full Proposals from Consortia

– Final selection made in December 2003

– Negotiations & clarification in January 2004

Page 8:

Designation of DCC

• Task entrusted to Consortium of four institutional partners

– Universities of Edinburgh (lead), Glasgow & Bath together with CCLRC (Rutherford Appleton and Daresbury Laboratories)

– brought together through the National eScience Centre

• jointly managed by Universities of Edinburgh & Glasgow

• Two 3-year awards made:

– JISC funding started on 1st March 2004

– EPSRC grant funding starts on 1st September 2004

• Phase One set-up

– some ‘early deliverables’ of website & helpdesk

– preparation for full operation & launch of services in October

– planning formal opening for early November 2004

Page 9:

Responsibilities across the DCC

• Them with titles …

– Peter Burnhill, Director (Phase One), with Robin Rice, Phase One Project Co-ordinator

• (from EDINA & Data Library, University of Edinburgh)

– Peter Buneman, Research Director (& PI on EPSRC grant)

• Professor of Informatics, University of Edinburgh

– Liz Lyon, Associate Director (Community Support & Outreach)

• Director of UKOLN, University of Bath

– Seamus Ross, Associate Director (Service Definition & Delivery)

• Director of HATII [ERPANET], University of Glasgow

– David Giaretta, Associate Director (Development)

• Head of Astronomical Software & Services, CCLRC

• Two significant & well known ‘Ex Portfolio’ names

– Malcolm Atkinson, Director, NeSC

– Chris Rusbridge, Director, Information Services, UofGlasgow

Page 10:

[Diagram: functional management & collaboration. Labels: community support & outreach; research; development co-ordination; service definition & delivery; management & admin support; industry; research collaborators; standards bodies; testbeds & tools; communities of practice: users; curation organisations eg DPC; Collaborative Associates Network of Data Organisations.]

Page 11:

What is this digital curation anyway?

The term Digital Curation is a new invention.

• Digital Data Curation Task Force - Report of Strategy Discussion Day (2002)

– citing Tony Hey citing use by Dr John Taylor, Director General of the Research Councils, to distinguish the actions involved in caring for digital data beyond its original use, from digital preservation. The concept’s reach extends beyond libraries.

• The e-Science Curation Report (2003) proposed the following distinctions:

– Curation: managing & promoting the use of data from point of creation, to ensure it is fit for contemporary purpose and available for discovery & re-use.

• For dynamic datasets this may mean continuous enrichment or updating to keep it fit for purpose.

• Higher levels of curation will involve maintaining links with annotation & with other published materials.

– Archiving: curation activity which ensures that data are properly selected and stored, can be accessed, and that

• logical and physical integrity is maintained over time, including security and authenticity.

– Preservation: activity within archiving in which specific items of data are maintained over time so that they can still be accessed and understood through changes in technology.

Page 12:

Digital curation redefined ...

digital curation: ... digital objects and data, over their life-cycle, for current & future generations of use ...

= f(data curation & digital preservation)

• data curation [when high current/ongoing interest]

– actions needed to maintain and utilise digital data & research results over entire life-cycle

– data creation & management; adding value; generating new sources of information & knowledge, for use

• digital preservation [for longevity; fall off in interest]

– long-run technological/legal accessibility & usability

– storage, maintenance & accessibility of information content in digital material over the long-term, for use

• OAIS concept of designated community

Page 13:

Data curation in action

• Astronomy

– Integrating and analysing distributed data (AstroGrid)

– publishing multi-TB sky surveys (SuperCOSMOS & WFCAM)

– interoperability standards (IVO Alliance)

• BioInformatics

– data publishing: generic tools for XML export (EBI Biomart)

– annotation tools for massive data sets (Pubmed, VOTable)

– archiving tools for dynamic data sets (biological DBs)

• Environmental sciences

– spatio-temporal annotation (OS Mastermap/ Mouse Atlas)

• Document management

– Repository certification (RLG Task Force)

Page 14:

Digital preservation approaches

• Migration & Refreshment

• Emulation & Encapsulation

• Digital Archaeology & Rescue

• Document Format Specification

Robin Rice & Najla Semple, http://www.lib.ed.ac.uk/sites/digpres/

Page 15:

Communities of Practice: Social Sciences (IASSIST)

• History of sharing – economical in terms of both data collector and respondent

• Data about humans – problems of confidentiality confronted early on

• Mixed blessing of agreed proprietary formats (OSIRIS, SPSS, etc.) allows migration

• ‘Future-proofing’ - 30 years of data advocacy!

– Tradition of data archiving & data citation

– Building new data standards out of common experience

• data archivists & data librarians: the new digital curators?

• www.iassistdata.org

Page 16:

Unifying Themes for DCC

• ‘data as evidence’

– for one or more designated communities

• ‘archival responsibility’

– at one or more institutional levels

– with institutional policies & individuals’ competence

• engage/discover communities of practice, to invoke/provoke good practices

– appraisal & retention/disposal

– logical & physical integrity: authenticity/security

• research problems in productive research domains

– eg Informatics, Law School

Page 17:

Aims & Objectives for the DCC

‘quality improvement in data curation & digital preservation’

– Initial focus: data as evidence for scholarly conclusions

– Wider remit: worlds of scholarly communication & eLearning

• twin aims: excellence in research & excellence in service

• need to bridge across communities:

– universities & research institutes

– scientific data tradition & document tradition

– multi-sectoral, international

Page 18:

We are all curators now ...

• The term “curation” builds on our understanding of the word “curator”

– one who keeps something for the public good, the value of which often needs to be brought out by the curator.

1. this open context implies more support for explicit policies with regard to data sharing, and it has major implications for structuring and tools.

2. the digital curator as ‘store-keeper’ closely linked to promoting new science, looking forward to identify new ways to serve present and future researchers.

• digital curator should take an active role in promoting and adding value to holdings

– manage the value of collection

– adding links and annotation to provide context

– recording provenance of changes made

Page 19:

Planning & Progress

• We must plan for the Long Term, with our 2020 Vision - 15yrs

– we have large territory, and large expectation

• multi-disciplinary, multi data type, multi tradition/profession

• national and international, but also local and hidden from view

• a lot is going on

– how to ensure that we do something sensible with the ££’s and the trust we have been given?

– who/what should we plan to affect/effect?

• policy-makers; ‘responsible curators’; (researchers?)

• how do we wish to be judged, and when?

• collaboration & win-win-win scenarios

Page 20:

Foci of attention in set-up phase

• Users: client, peer and policy communities

– outreach & community support; service definition/delivery; development co-ordination; research agenda

– user requirements analysis: Leona Carpenter (Focus Groups)

• Consortium: ‘organisation’ from partner participation

– roles; commitment; norming/performing; operational communication; consortium agreement (IPR)

• Employers: institutional settings

– re-deployment/appointments; accommodation; commitment/reporting

-> Project Plan, as living document

Page 21:

Phase One Progress, March -

• weekly AccessGrid/telecon; two face2face meetings

– defining programme of deliverables; re-deploying & recruiting staff; planning appointment of full time director in time for Launch

• early ‘deliverables’:

– www.dcc.ac.uk with links, presentations & progress updates

– [email protected] for contacts & offers of collaboration

• project plan submitted to JISC, late May 2004

• defining R & D programme & services for delivery, eg curation architecture; repository of tools & technical information

• engaging curators in existing community of practice

Page 22:

Towards a Technical Model of Digital Curation: our R&D

David Giaretta

Funded by:

Page 23:

What can we rely on in the Long Term

• The bits - BIT PRESERVATION

• Paper documents that people can read

– ISO standards

• The information we collect

– either in the far future DCC or its successor

• Some kind of remote access

• Some kind of computers

• People?

Page 24:

Preservation “vs” Current Use

• There are already very many architectures to support immediate use of information

– Including JISC architecture

– Aim to support these

• Therefore chose to be guided by

– long-term preservation aspects

– to promote this we should emphasise “interoperability” and “automated use” as far as possible.

– based initially on OAIS Reference Model

– but add other ideas later

– bear e-Science in mind

Page 25:

OAIS Reference Model – Functional Model

[Figure (labelled 4-1.2): OAIS functional model. Functional entities: Ingest; Data Management; Archival Storage; Access; Administration; Preservation Planning. External roles: PRODUCER; CONSUMER; MANAGEMENT. Flows shown: SIP; AIP; DIP; Descriptive Info; queries; result sets; orders.]
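The package flow named in the functional model (SIP in, AIP held, DIP out) can be sketched in a few lines of Python. This is only an illustration: the class names, the `Archive` container and its `ingest`, `query` and `order` methods are hypothetical names chosen here, and the in-memory dictionaries merely stand in for Archival Storage and Data Management. The OAIS Reference Model defines the concepts, not this code.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package, as delivered by a Producer."""
    content: bytes
    description: str


@dataclass
class AIP:
    """Archival Information Package, as held in Archival Storage."""
    identifier: str
    content: bytes
    descriptive_info: str


@dataclass
class DIP:
    """Dissemination Information Package, as delivered to a Consumer."""
    identifier: str
    content: bytes


@dataclass
class Archive:
    # Hypothetical in-memory stand-ins for Archival Storage and Data Management.
    storage: dict = field(default_factory=dict)
    catalogue: dict = field(default_factory=dict)

    def ingest(self, sip: SIP) -> str:
        """Ingest: turn a SIP into an AIP and register its Descriptive Info."""
        identifier = f"aip-{len(self.storage) + 1}"
        aip = AIP(identifier, sip.content, sip.description)
        self.storage[identifier] = aip
        self.catalogue[identifier] = aip.descriptive_info
        return identifier

    def query(self, term: str) -> list:
        """Data Management: return identifiers whose description matches."""
        return [i for i, d in self.catalogue.items() if term.lower() in d.lower()]

    def order(self, identifier: str) -> DIP:
        """Access: build a DIP from the stored AIP."""
        aip = self.storage[identifier]
        return DIP(aip.identifier, aip.content)
```

A Producer would call `ingest`, and a Consumer would call `query` and then `order`; Administration and Preservation Planning are deliberately left out of the sketch.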

Page 26:

OAIS – Preservation Planning - key aspects

• Representation Net

• Designated Communities & Knowledge Base

Page 27:

Representation Net

Page 28:

Preservation Issues

Given a file or a stream of bits, how does one know what Representation Information is needed (this question applies to Representation Information itself as well as to the digital objects we are primarily interested in preserving and using); how does one know, for example, if this thing is in FITS format?

• Someone may simply “know” what it is and how to deal with it i.e. the bits are within the Knowledge Base

• One may be able to recognise the format by looking for various types of patterns.

• One may feed the bits into all available interpreters to see which accept the data as valid

• Other means….

• The only safe way: have an associated label which points to the appropriate Representation Information

– Note this does not exclude the other methods e.g. for data rescue
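The ordering of these options (trust a label first, fall back to pattern recognition for rescue) can be sketched as follows. This is a minimal illustration, not a DCC tool: the function name `identify_format`, the tiny signature table and the example label string are assumptions made here. The byte signatures themselves are the well-known leading bytes of FITS, PDF and PNG files.

```python
from typing import Optional

# Illustrative signature table; a real registry would be far larger.
SIGNATURES = {
    "FITS": b"SIMPLE  =",
    "PDF": b"%PDF-",
    "PNG": b"\x89PNG\r\n\x1a\n",
}


def identify_format(data: bytes, label: Optional[str] = None) -> tuple:
    """Return (format_or_label, method) for a stream of bits.

    A label, when present, is trusted first; pattern matching is
    only a fallback, e.g. for data rescue.
    """
    if label is not None:
        return label, "label"
    for name, magic in SIGNATURES.items():
        if data.startswith(magic):
            return name, "pattern"
    return None, "unknown"
```

For example, `identify_format(data, label="dcc:ri/fits")` (a made-up identifier) returns the label untouched, whereas unlabelled bits are only guessed at, which is why the slide calls labelling the only safe way.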

Page 29:

High Level View

Example of use of Representation Information Labelling

Page 30:

Implications

• A label must be attached to each digital object as a necessary (but not sufficient) condition for long-term preservation – logical attachment or packaging TBD by the DCC.

• The label should at least identify Representation Information. For long-term preservation this label must therefore be a DCC persistent identifier.

– allow some normalisation

• In order for the Representation Information to be persistent it should either be held with the data object itself or be part of a central repository – part of the DCC. Thus the DCC needs a DCC Representation Information Repository. This repository would include

– a Format Repository (covering structural information); automated use would be supported by use of formal description languages such as EAST (ISO 15889, http://east.cnes.fr/ ) or DFDL (http://forge.gridforum.org/projects/dfdl-wg/)

– a Semantic Repository with, for example, Data Dictionaries and Ontologies

– a Software Repository – with appropriate emulation capabilities

• Each piece of digital RI is also a digital object – which is understood either by the users’ Knowledge Base OR by further Representation Information. Therefore each piece of RI also has a label pointing to further RI.
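The recursion described here (a label points to RI, which carries labels to further RI, until the Knowledge Base is reached) can be sketched as a simple graph walk. The function name, the dictionary-based repository and the identifier strings are hypothetical; the sketch only shows where the recursion stops.

```python
def resolve_representation_net(label, repository, knowledge_base):
    """Collect every piece of RI needed to understand an object.

    label          -- identifier carried by the digital object
    repository     -- maps an RI identifier to the labels of further RI
    knowledge_base -- identifiers the Designated Community already understands
    """
    needed = []
    seen = set()
    pending = [label]
    while pending:
        current = pending.pop()
        if current in seen or current in knowledge_base:
            continue  # recursion stops at the Knowledge Base
        if current not in repository:
            raise KeyError(f"no Representation Information for {current!r}")
        seen.add(current)
        needed.append(current)
        pending.extend(repository[current])
    return needed
```

A missing entry raises an error rather than being skipped, since an unresolvable label means the object cannot be fully understood.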

Page 31:

Designated Community

• Techniques must be created for

– defining a Knowledge Base

– linking a Knowledge Base to a Designated Community

– linking Representation Information to a Knowledge Base if possible

Page 32:

Representation Information (1)

• Structure – including Formats

– Distinguish

• formats which are used mainly for rendering – to be followed by human inspection, and

• formats used for automated processing

• Implications:

– Representation Information Repository should define selected file formats using EAST and DFDL

– Definitions should include scientific objects and humanities objects
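The idea of a format definition that supports automated processing can be illustrated with a declarative record description that drives the decoding. This sketch uses neither EAST nor DFDL syntax; the field table, the field names and the `decode_record` function are assumptions made purely for illustration, built on Python's standard `struct` module.

```python
import struct

# Hypothetical declarative description: (field name, struct format code).
RECORD_DESCRIPTION = [
    ("station_id", "H"),   # unsigned 16-bit integer
    ("timestamp", "I"),    # unsigned 32-bit integer
    ("temperature", "f"),  # 32-bit float
]


def decode_record(data: bytes, description, byte_order: str = ">") -> dict:
    """Decode one fixed-length record using only its description."""
    layout = byte_order + "".join(code for _, code in description)
    values = struct.unpack(layout, data[:struct.calcsize(layout)])
    return {name: value for (name, _), value in zip(description, values)}
```

The point of the design is that the decoder contains no knowledge of the format: everything it needs is in the description, which is the kind of thing a Format Repository would hold.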

Page 33:

Representation Information (2)

• Semantics

– Hard problem

• start with Data Dictionaries

– Implications: the Representation Information Repository should include Data Dictionaries, followed by more general semantics

Page 34:

Representation Information (3) Time Dependent Information

– Many, perhaps most, datasets change over time and the state at each particular moment in time may be important. It may be useful to break the issue into separate parts.

• at each moment in time we could, in principle, take a snapshot and store it. That snapshot has its associated Representation Net.

• efficient storage of a series of snapshots may lead one to store differences or include time tags in the data (see for example P. Buneman, S. Khanna, and Wang-Chiew Tan. On the Propagation of Deletions and Annotations through Views. Proc. 21st ACM Symp. on Principles of Database Systems).

– Additional Representation Information would be needed which describes how to get to a particular time's snapshot from the efficiently encoded version.

– Also applies to ANNOTATION – who said what and when did they say it

– Implications:

• These are areas of active research within the consortium and the DCC should be able to provide

– advice and well tested tools for certain forms of efficient encoding of time dependent information

– advice on annotation

– identifiers and Representation, perhaps in the form of software, for the associated encodings
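Getting to a particular time's snapshot from an efficiently encoded version can be sketched with a base snapshot plus time-tagged differences. This is a generic delta scheme written for illustration, not the technique of the paper cited on this slide; the function name and the convention that `None` marks a deletion are assumptions of the sketch.

```python
def snapshot_at(base, deltas, when):
    """Rebuild the dataset state at time `when`.

    base   -- dict holding the initial snapshot
    deltas -- list of (timestamp, changes) pairs; in `changes` a value of
              None marks a deleted key (an assumption of this sketch)
    """
    state = dict(base)
    for timestamp, changes in sorted(deltas, key=lambda d: d[0]):
        if timestamp > when:
            break
        for key, value in changes.items():
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state
```

The rule "apply every difference up to the requested time, in time order" is itself Representation Information: without it the stored differences cannot be turned back into a snapshot.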

Page 35:

Representation Information (4)

• Actions and Processes (Behaviour?)

– Some information has, as an integral part of its content, an implicit or explicit process associated with it – this could be argued to be a type of semantics, however it is probably sufficiently different to need special classification. An example of this is a database or other time dependent or reactive system such as a Neural Net.

– Emulations – Universal Virtual Computer (UVC)

– Implications:

• Support Software emulation via a UVC (possibly based on JVM)

• Support time dependent or reactive systems

Page 36:

Persistent Ids

• Implications:

– Use of existing, or creation of new, infrastructure (standards, protocols, servers etc) for persistent IDs with adequate flexibility and longevity

• as part of the succession planning, agreement would be needed with appropriate organisation to act as backup and inheritor of DCC data.

Page 37:

Archival Information Package

Page 38:

Preservation Description Info

Page 39:

AIP implications – PDI

• define standard Preservation Metadata – based initially on OCLC work – including Michael Day’s work and also CCLRC work etc

• define adequate Packaging technique – almost certainly XML based

• define recommended tools and procedures for creating Fixity Information such as checksums and digests, together with associated Representation Information

• investigate authentication systems
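Fixity Information of the kind mentioned above can be produced with Python's standard `hashlib` module. The sketch below is only one possible shape: the function names and the dictionary layout are assumptions, and SHA-256 is chosen here as an example rather than as a DCC recommendation.

```python
import hashlib


def make_fixity(data: bytes, algorithm: str = "sha256") -> dict:
    """Return Fixity Information: the algorithm name plus the hex digest.

    Recording the algorithm alongside the digest is what lets a future
    reader know how to re-check it.
    """
    digest = hashlib.new(algorithm, data).hexdigest()
    return {"algorithm": algorithm, "digest": digest}


def verify_fixity(data: bytes, fixity: dict) -> bool:
    """Recompute the digest and compare it with the stored value."""
    current = hashlib.new(fixity["algorithm"], data).hexdigest()
    return current == fixity["digest"]
```

Storing the algorithm name with the digest is a small instance of the slide's point that Fixity Information needs its own associated Representation Information.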

Page 40:

Audit and Certification

• Implications:

– facilitate production of standard(s) on which a certification program can be based

– work to establish accreditation and certification body in preparation for offering audit and certification services

– audit, certification and accreditation are potential sources of long term funding for the DCC

– software certification will require testbeds and testing procedures.

• Hardware and software systems will need to be purchased, hired or borrowed. The DCC associates would be useful partners.

• We might expect Commercial software to be offered to us by the manufacturer for testing

• Testing commercial software could be fee based.

Page 41:

Implications for Research

• Research needed on Representation Information (Structure and Semantics) e.g.

– Investigate fundamental limitations of bit-level descriptions and existing tools.

– Contribute to DFDL definition

– Investigate capabilities needed to describe rendered format (including Word, PDF etc)

• Data Virtualisation – define Science objects and “Humanities” objects

• Research is needed to:

– Support Software emulation via a UVC (possibly based on JVM)

– Support time dependent or reactive systems

• Research is needed to provide a solid basis on which we can develop persistent IDs with adequate flexibility and longevity

• Research is needed to allow the DCC to:

– define standard Preservation Metadata – based initially on OCLC work

– define adequate Packaging technique – almost certainly XML based

– investigate authentication systems with a view to preparing recommendations for users and consider offering, for example, a (fee-based) key storage service.

• A rigorous theoretical basis must be put in place from which we can create techniques for:

– defining a Knowledge Base

– linking a Knowledge Base to a Designated Community

– linking Representation Information to a Knowledge Base if possible

Page 42:

Curation Manual

• Put in place quickly using international experts

• Updates annually

• Build to “curation encyclopaedia”

Page 43:

Document format specification

• Borrowed from the records management tradition: institutions create documents in standard or open formats, which are easier to preserve.

• Much easier to do in a strict records management environment with a published policy of retention schedules and a clear knowledge of internally produced records.

• Stipulating a specific file format is harder in a research environment where a wide range of digital materials are produced and have to be preserved.

• The move to DDI DTD in social science data world may be seen as an example of this preservation technique.

Page 44:

Services & Development

• Turns Research into ‘Products for Research’ that our communities can use with confidence

– tracking and testing tools and standards

• that are correct, usable, reliable, well documented, e.g. for ingest, repository management, data exchange, ontologies

• working with tool developers wherever possible

• developing testbeds & interworking with other testbeds

– aim to gain leverage formats

• working with other projects worldwide

• using generic tools and techniques

– to develop strategies for emerging digital formats

– Metadata standards

• long-term viability of metadata

• Registries underpin, to provide basis of Advisory Service

Page 45:

[Figure: Level 1 curation (© Philip Lord, 2003). Labels: Scientist; Research Process; Primary data; Secondary (derived) data; Tertiary data for publication; Patent data; Web Content; Publication Process; Primary publication; Secondary publication; Tertiary publication; Peer Review; Pre-prints & e-Prints; Publication archives; Library - Peers - Public - Industry.]

Page 46:

[Figure: Level 2 curation (© Philip Lord, 2003). Labels: Scientist; Research Process; Primary data; Secondary (derived) data; Tertiary data for publication; Patent data; Web Content; Publication Process; Primary publication; Secondary publication; Tertiary publication; Peer Review; e-Prints; Publication archives; Library - Peers - Public - Industry; Research based on data; Metadata; Archivist; Archived data.]

Page 47:

[Figure: Level 3 curation (© Philip Lord, 2003). Labels: Scientist; Research Process; Primary data; Secondary (derived) data; Tertiary data for publication; Patent data; Web Content; Publication Process; Primary publication; Secondary publication; Tertiary publication; Peer Review; e-Prints; Publication archives; Library - Peers - Public - Industry; Research based on data; Metadata; Curation; Curator; Curation Process; Data repositories; Archived data.]

Page 48:

Faith in the medium

?

Page 49:

Faith in the technology