A centre of expertise in data curation and preservation

IMechE Workshop, London, 26th September 2006

Looking to the longer term: some perspectives on data curation and preservation

This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0

Funded by:

Dr Liz Lyon, DCC Associate Director, Outreach; Director, UKOLN, University of Bath, UK


Page 2

About UKOLN

• “a centre of expertise in digital information management”

• Funding: Joint Information Systems Committee (JISC) + Museums, Libraries & Archives Council (MLA)

• Portfolio of R&D projects: Delos, DRIVER, Grand Challenge

• 29+ staff based at the University of Bath

• Inform the library, information, education and cultural heritage communities

• Policy, advocacy at national level, build innovative Web-based systems & services, R&D, e-journal Ariadne, workshops and conferences

• http://www.ukoln.ac.uk/

Acknowledgement: Alex Ball, Grand Challenge Project

Page 3

UK Digital Curation Centre

• Digital Curation Centre

• Funded by JISC & EPSRC

• Development activities

• Research agenda

• Delivering services

• Outreach Programme

• http://www.dcc.ac.uk/

Page 4

Overview

• Data curation and digital preservation issues

• Draw on research and scholarship perspectives

• Data / information flows and the “business process”

• UK Digital Curation Centre activities

“maintaining and adding value to a trusted body of digital information for current and future use”

Page 5

Data-centric 2020 vision

Reference datasets as infrastructure?

Page 6

(Very simple) Product Research Cycle & Data Curation

Cycle stages (diagram):

• Formulate ideas / hypothesis, test, experiment, observe, design: data creation, collection & capture

• Adding value: data linking, annotation, visualisation, simulation

• (New) knowledge extraction: data mining, modelling, analysis, synthesis

• Scholarly communications & business transactions: data disclosure, publication, citation, discovery, re-use

• Data management, storage & validation: description, deposit, self-archiving, preservation, certification

Arrows between stages labelled: Data processing

Also shown: e-Infrastructure; Open ?? access; Collaboration

Page 7

DAME signal processing workflows using Grid Services (workflow diagram)

Roles: Maintenance Engineer; Maintenance Analyst (Fleet Manager); Domain Expert

Sites: Rolls Royce; DS&S; Airport

Activities: Aircraft Lands; Visual Inspection; Provide Information; Quote Diagnosis; Brief Diagnosis / Prognosis; Check Diagnoses; Maintenance Procedure; Diagnosis Result; Release Engine; Maintenance Result; Detailed Diagnosis / Prognosis; Provide Further Details; Request Information; Sign-off Diagnosis; Analyst Decision; Detailed Analysis; Request Further Details; Expert Decision

Decision outcomes: [information required]; [diagnosis]; [known]; [unknown]; [clear]; [fault resolved]; [fault unresolved]; complete

• RepoMMan: Repository Metadata and Management (Hull) using WS-BPEL

• Are your engineering workflows identified and described?

Workflow: e-Scientist desktop?

Slide: Carole Goble

Page 8

Research outputs in institutional repositories: engineering

Page 9

“JISC Vision”: a global landscape of federated repositories

Diagram layers: portals; fusion layer (‘repository federator’); repositories

Heterogeneous: metadata formats, content formats, identifiers, packaging standards

Homogeneous: metadata formats, content formats, identifiers, packaging standards

From Andy Powell: http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/presentations/jiie-jcs-2005/

• Multi-disciplinary, cross-sectoral

• National, institutional

• Different platforms

• Many format types: data, eprints, images, geospatial

• e-Framework and Information Environment context

• Define common + domain-specific + repository “services”

• Interoperability based on open standards, software tools
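The “fusion layer” idea on this slide can be sketched in a few lines: a federator takes records from repositories that use different field names and maps them onto one common set of fields before merging them. This is a minimal sketch under stated assumptions: the repository names, field mappings and the rule that records sharing an identifier are the same object are all hypothetical, not part of the JISC Information Environment architecture.

```python
from dataclasses import dataclass

# Hypothetical mappings from each repository's local field names to common fields.
FIELD_MAPS = {
    "repo_a": {"dc:title": "title", "dc:creator": "creator", "dc:identifier": "identifier"},
    "repo_b": {"Title": "title", "Author": "creator", "URI": "identifier"},
}


@dataclass(frozen=True)
class CommonRecord:
    """Homogeneous view of a record, as presented to portals."""
    title: str
    creator: str
    identifier: str
    source: str


def normalise(repo: str, record: dict) -> CommonRecord:
    """Map one heterogeneous record onto the common fields; absent fields become ''."""
    mapping = FIELD_MAPS[repo]
    common = {mapping[k]: v for k, v in record.items() if k in mapping}
    return CommonRecord(
        title=common.get("title", ""),
        creator=common.get("creator", ""),
        identifier=common.get("identifier", ""),
        source=repo,
    )


def federate(harvests: dict) -> list:
    """Normalise records from every repository, keeping the first copy of each identifier."""
    seen = set()
    merged = []
    for repo, records in harvests.items():
        for record in records:
            rec = normalise(repo, record)
            if rec.identifier:
                if rec.identifier in seen:
                    continue  # assumption: same identifier means same object
                seen.add(rec.identifier)
            merged.append(rec)
    return merged
```

In practice the mappings would come from each repository's metadata format and the records from a harvesting protocol; both are left out here to keep the focus on normalisation.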

Page 10

Pilot Engineering Repository Xsearch (PerX): http://www.engineering.ac.uk/

Page 11

Page 12

STEP (ISO 10303)

Interoperability???

Page 13

Repositories and OAIS Reference Model

“an archive consisting of an organisation of people and systems that has accepted the responsibility to preserve information and make it available for a Designated Community ... an identified group of potential consumers who should be able to understand a particular set of information”

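OAIS functional model (diagram, figure 4-1): the functional entities Ingest, Archival Storage, Data Management, Administration, Preservation Planning and Access, set between PRODUCER, CONSUMER and MANAGEMENT. The Producer submits a SIP to Ingest; Ingest passes the AIP to Archival Storage and Descriptive Info to Data Management; Access takes AIPs from Archival Storage and Descriptive Info from Data Management, answers Consumer queries with result sets, accepts orders and delivers a DIP.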
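The package flow in the OAIS diagram can be illustrated with a minimal sketch: a Submission Information Package (SIP) from the Producer becomes an Archival Information Package (AIP) at Ingest, its Descriptive Info is catalogued, and a Dissemination Information Package (DIP) is built for the Consumer on request. OAIS defines these packages conceptually; the class names, fields and the `aip-N` identifier scheme below are simplifying assumptions, not a prescribed data structure.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what the Producer hands to Ingest."""
    content: bytes
    descriptive_info: dict


@dataclass
class AIP:
    """Archival Information Package: what Archival Storage keeps."""
    aip_id: str
    content: bytes
    descriptive_info: dict


@dataclass
class DIP:
    """Dissemination Information Package: what Access delivers to the Consumer."""
    aip_id: str
    content: bytes
    descriptive_info: dict


@dataclass
class Archive:
    storage: dict = field(default_factory=dict)    # Archival Storage: aip_id -> AIP
    catalogue: dict = field(default_factory=dict)  # Data Management: aip_id -> Descriptive Info

    def ingest(self, sip: SIP) -> str:
        """Turn a SIP into an AIP, store it and record its Descriptive Info."""
        aip_id = f"aip-{len(self.storage) + 1}"  # hypothetical identifier scheme
        self.storage[aip_id] = AIP(aip_id, sip.content, dict(sip.descriptive_info))
        self.catalogue[aip_id] = dict(sip.descriptive_info)
        return aip_id

    def query(self, **criteria) -> list:
        """Consumer query: AIP ids whose Descriptive Info matches every criterion."""
        return [aip_id for aip_id, info in self.catalogue.items()
                if all(info.get(k) == v for k, v in criteria.items())]

    def order(self, aip_id: str) -> DIP:
        """Consumer order: build a DIP from the stored AIP."""
        aip = self.storage[aip_id]
        return DIP(aip.aip_id, aip.content, dict(aip.descriptive_info))
```

Administration and Preservation Planning are deliberately omitted; they govern policy and migration rather than the package flow shown here.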

Page 14

Assuring permanence: digital preservation

• Trusted Digital Repository Audit Checklist for Certification (draft), Research Libraries Group-NARA Taskforce, 2005. Defined criteria:
  – Organisation
  – Functions, processes & procedures
  – Designated community & usability
  – Technologies & technical infrastructure

• Revised checklist based on feedback and pilot audits (KB, BADC)

• Self-certification: DINI-Zertifikat (DINI certificate) requirements & recommendations:
  – Server policy / guidelines
  – Author support
  – Legal issues
  – Authenticity and integrity
  – Cataloguing
  – Access statistics
  – Long-term sustainability

• Has your repository / PLM been audited?
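One common, concrete technique behind an “authenticity and integrity” criterion is fixity checking: record a cryptographic checksum when a file is deposited and recompute it at audit time. The sketch below is a minimal illustration, assuming SHA-256 and an in-memory manifest; neither the algorithm nor the manifest format is prescribed by the RLG-NARA checklist or the DINI certificate.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(paths) -> dict:
    """At deposit: record a checksum per file (assumed manifest format: path -> hex digest)."""
    return {str(p): sha256_of(Path(p)) for p in paths}


def audit(manifest: dict) -> list:
    """At audit: return the files that are missing or whose checksum has changed."""
    failures = []
    for name, expected in manifest.items():
        p = Path(name)
        if not p.exists() or sha256_of(p) != expected:
            failures.append(name)
    return failures
```

Reading in chunks keeps memory use flat for large engineering datasets; a real repository would also store the manifest separately from the files it protects.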

Page 15

Interdisciplinary discovery

• Validation, publication & discovery of data models & schema

• Harmonisation and normalisation of metadata and semantics

• Packaging standards: METS, MPEG-21 DIDL

• Formal high-level and domain ontologies

• ePrints DC Application Profile http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile

• eBank Application Profile for crystallography data http://www.ukoln.ac.uk/projects/ebank-uk/schemas/

• What data models and metadata schema are in place?
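An application profile pins down which metadata elements a record must or may carry. A minimal check of a record against such a profile might look like the sketch below. It uses Dublin Core element names, but which elements are mandatory or repeatable here is a hypothetical assumption, not the actual ePrints or eBank Application Profile.

```python
# Hypothetical profile: element name -> (mandatory?, repeatable?)
PROFILE = {
    "title": (True, False),
    "creator": (True, True),
    "identifier": (True, False),
    "date": (False, False),
    "type": (False, False),
}


def validate(record: dict, profile: dict = PROFILE) -> list:
    """Check a record (element -> list of values) against a profile.

    Returns human-readable problems; an empty list means the record conforms.
    """
    problems = []
    for element, (mandatory, repeatable) in profile.items():
        values = record.get(element, [])
        if mandatory and not values:
            problems.append(f"missing mandatory element: {element}")
        if not repeatable and len(values) > 1:
            problems.append(f"element not repeatable: {element}")
    for element in record:
        if element not in profile:
            problems.append(f"element not in profile: {element}")
    return problems
```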

Page 16

Persistent identifiers for data citation

• How will they be used? We need use cases: depositor, author, service provider, researcher, publisher?

• Schemes: DOI, Handle, ARK, PURL

• Global identification: express as http URIs

• Data citation (human and machine-actionable)

• Publication & citation of scientific primary data project, National Library for Science & Technology (TIB), University of Hanover, Germany. STD-DOI Project: DOI registry for datasets http://www.std-doi.de

• Is there a data citation policy?

• What persistent identifiers have been assigned to your data?
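The advice to express identifiers as http URIs can be illustrated with a small resolver table. This sketch assumes the public resolvers https://doi.org/ for DOIs, https://hdl.handle.net/ for Handles and https://n2t.net/ for ARKs; PURLs are already http URIs and pass through unchanged. The citation layout in `cite` is a hypothetical example, not a recommended citation standard.

```python
# Assumed public resolvers for each identifier scheme.
RESOLVERS = {
    "doi": "https://doi.org/",
    "hdl": "https://hdl.handle.net/",
    "ark": "https://n2t.net/",
}


def to_http_uri(scheme: str, identifier: str) -> str:
    """Express a persistent identifier as an actionable http(s) URI."""
    scheme = scheme.lower()
    if scheme == "purl":
        if not identifier.startswith(("http://", "https://")):
            raise ValueError("a PURL should already be an http(s) URI")
        return identifier
    if scheme == "ark" and not identifier.startswith("ark:"):
        identifier = "ark:/" + identifier.lstrip("/")
    try:
        return RESOLVERS[scheme] + identifier
    except KeyError:
        raise ValueError(f"unknown scheme: {scheme}") from None


def cite(creators, year, title, publisher, scheme, identifier) -> str:
    """Human-readable citation that ends in a machine-actionable URI (hypothetical layout)."""
    uri = to_http_uri(scheme, identifier)
    return f"{'; '.join(creators)} ({year}). {title}. {publisher}. {uri}"
```

Keeping the resolver prefix in one table means the stored identifier stays scheme-native, and only the presentation layer decides which resolver to use.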

Page 17

Discovering data: eBank Project

Coles, S.J., Day, N.E., Murray-Rust, P., Rzepa, H.S., Zhang, Y., Org. Biomol. Chem., 2005, (10), 1832-1834. DOI: 10.1039/b502828k

• Domain identifier: International Chemical Identifier (InChI) code

• Google a molecule using its InChI

Slide from Simon Coles

Domain identifiers for engineering?

Page 18

Format migration challenges? CAD Program Compatibility Chart http://www.okino.com/conv/filefrmt_cad.htm

Page 19

Registry development

Page 20

Development: Representation Information Registry Repository

• “DCC Approach to Digital Curation” based on OAIS

• Representation Information Registry Repository

• Prototype demonstrator: based on 2 key concepts to facilitate sharing of the curation effort
  – Curation Persistent Identifier (CPID)
  – Descriptive “label” (structural, semantic, other metadata)

• Development of (M2M) tools and interfaces for creating, using and re-using representation information

• http://dev.dcc.ac.uk Wiki and email list

• EU CASPAR Integrated Project http://www.casparpreserves.info/pages/1/index.htm

• Task Force on the Permanent Access to the Records of Science http://tfpa.kb.nl/
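The two concepts behind the prototype, a Curation Persistent Identifier (CPID) and a descriptive label, lend themselves to a simple sketch: each label points to further CPIDs it depends on, and a client walks those links until it reaches representation information it already understands. The CPID strings, label fields and the notion of a “known” set are hypothetical assumptions for illustration; they are not the DCC registry's actual schema or interface.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Label:
    """Descriptive label for one piece of representation information (hypothetical fields)."""
    cpid: str
    description: str
    depends_on: list = field(default_factory=list)  # CPIDs needed to interpret this one


def required_repinfo(start: str, registry: dict, known: set) -> list:
    """Breadth-first walk of the representation network from `start`.

    Returns the CPIDs a consumer still needs, in visiting order. Items already in
    `known` are skipped, and their own dependencies are not followed.
    """
    needed = []
    queue = deque([start])
    seen = {start}
    while queue:
        cpid = queue.popleft()
        if cpid in known:
            continue
        needed.append(cpid)
        for dep in registry[cpid].depends_on:
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return needed
```

The `seen` set stops the walk from looping if two labels happen to reference each other.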

Page 21

Registry API

• Allows applications to talk to many different registry implementations, e.g. GDFR, PRONOM, UDDI

• GUI access via Web browser: http://registry.dcc.ac.uk
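The point of a common registry API is that client code is written once against a shared interface, with one adapter per registry implementation. The sketch below shows that adapter pattern with two in-memory stand-ins. The interface name, the `lookup` method and the entries are hypothetical, and no real calls to GDFR, PRONOM or UDDI are made.

```python
from abc import ABC, abstractmethod
from typing import Optional


class RegistryBackend(ABC):
    """Common interface every registry adapter implements (hypothetical)."""
    name: str

    @abstractmethod
    def lookup(self, format_key: str) -> Optional[dict]:
        """Return a description of the format, or None if this registry has no entry."""


class InMemoryRegistry(RegistryBackend):
    """Stand-in for a real registry adapter; its data is illustrative only."""

    def __init__(self, name: str, entries: dict):
        self.name = name
        self._entries = entries

    def lookup(self, format_key: str) -> Optional[dict]:
        return self._entries.get(format_key)


class RegistryClient:
    """Queries each backend in turn and reports which registry answered."""

    def __init__(self, backends: list):
        self.backends = backends

    def lookup(self, format_key: str) -> Optional[tuple]:
        for backend in self.backends:
            entry = backend.lookup(format_key)
            if entry is not None:
                return backend.name, entry
        return None
```

Adding support for another registry then means writing one more `RegistryBackend` subclass; the client code does not change.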

Page 22

Adding value through annotation: research at the University of Edinburgh

• Scientific databases: Annotation scoping report

• New annotation model + prototype MONDRIAN

• Intuitive visual interface iMONDRIAN

• Annotate sets of values

• Support for querying annotations
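Annotating sets of values, rather than single cells, and then querying those annotations can be sketched as below. This is a deliberately simple illustration, assuming a table whose values are addressed by (row, column) pairs. It is not the MONDRIAN annotation model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    text: str
    cells: frozenset  # set of (row_id, column) pairs the annotation covers


class AnnotatedTable:
    def __init__(self, rows: dict):
        self.rows = rows  # row_id -> {column: value}
        self.annotations = []

    def annotate(self, text: str, cells) -> Annotation:
        """Attach one annotation to a whole set of values."""
        cells = frozenset(cells)
        for row_id, column in cells:
            if column not in self.rows[row_id]:
                raise KeyError(f"no value at ({row_id}, {column})")
        ann = Annotation(text, cells)
        self.annotations.append(ann)
        return ann

    def annotations_for(self, row_id, column) -> list:
        """All annotation texts that cover a given value."""
        return [a.text for a in self.annotations if (row_id, column) in a.cells]

    def cells_annotated_with(self, keyword: str) -> set:
        """Query the annotations: which values carry an annotation mentioning `keyword`?"""
        return {cell for a in self.annotations if keyword in a.text for cell in a.cells}
```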

Page 23

Nature, 23 March 2006: OTMI (Open Text Mining Interface)

NaCTeM: http://www.nactem.ac.uk/

Emerging tools: TerMine, GENIA, Cafetiere

Knowledge extraction:

• Mining (data, text, structures)

• Modelling (economic, climate, mathematical, biological…)

• Analysis (statistical, lexical, gene…)

Page 24

Supporting the community: Services

• [email protected]

• Legal / technical guidance

• Curation Manual: 45 chapters planned
  – Metadata (umbrella)
  – Open Source
  – Archival metadata
  – Preservation metadata
  – Selection & appraisal
  – Curating emails

• Briefing Papers
  – Curating emails
  – Digital repositories
  – Geospatial data
  – Data protection
  – eScience data

• Case studies

Page 25

DCC Case Study published: Wide Field Astronomy Unit

Page 26

Supporting the community: Outreach & Services

• Workshops:
  – Geospatial data, NeSC, 27 October
  – OAIS 5 year Review, October
  – Audit & Certification Forum, October
  – Records Management, L’pool, 30 Nov
  – Curation & Preservation Training, Dec
  – 2007: Preservation of journals (tbc)
  – 2007: Legal environment (tbc)
  – 2007: Preparing for audit (tbc)

• Information Days: British Library, L’pool, UCL

• 2nd International DCC Conference, 21-22 November, Glasgow

• Keynotes: Hans F. Hoffmann, CERN; Clifford Lynch, CNI

Page 27

DCC Phase 2: 2007-2010

• Working more closely with data centres, e-Science Programmes and Research Councils

• SCARP Project: disciplinary approach

• JISC Digital Repository Programme collaboration

• RepInfo Registry service migration

• Define self-assessment procedures and tools

• Collaborate with CASPAR, DPE and PLANETS (EU-funded Digital Preservation Projects)

• Workshop Programme, International Conference 2007

Page 28

University of Bath, 13 September 2006


Thank you. Questions?

[email protected]

Join the DCC Associates Network at www.dcc.ac.uk