A centre of expertise in digital information management

www.ukoln.ac.uk

UKOLN is supported by:

Data Repositories and JISC Repository Landscape

Mahendra Mahey

Repositories Research Officer, Repositories Research Team, UKOLN

GRADE Project Meeting (all partners), Edinburgh, 30 October 2006.

This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0

Data Repositories Landscape

Disconnected landscape

[Diagram labels: Institutions (twice), Data Centre (four times), “?” (twice)]

JISC Funds

• Data Centres
  – MIMAS*
  – AHDS*
  – UK Data Archive*
  – EDINA

* Also receive funding from Research Councils UK

JISC Information Environment Architecture

(Idealised) Technical Infrastructure for Services

Andy Powell, 2005

Institutional Repositories Holding Research Data

• Very few around the world are doing this, and are they up to the job?
  – Versioning
  – Authentication at individual asset level

• Other methods in use are informal and ad hoc; lots of data is slipping through the net

• Do repositories offer a better way to do this? Different data types lead to problems with existing software

• Data cluster projects
  – eBank
  – SPECTRa
  – GRADE
  – CLADDIER
  – ARROW
  – DART

• The idea of linking papers to the underlying data of experiments and research is very appealing: stORe project and Open Access!

• IRs can hold some (orphaned) data but not all; there is still a role for data centres

Data Centres

• Have been storing data for years and predate the trendy ‘r’ word; they are the experts
• They can teach institutions many lessons
• A lot of mystery and suspicion between Data Centres and Institutions; communication and dialogue are needed between the two, and across disciplines
• Time and money saving?
• Data centres argue that being subject specific is a good thing; rationalising?
• Storing and curation has become a science in its own right, e.g. bioinformatics
• Offer
  – Databases
  – Web access
  – Tools to explore the information
  – Systems to capture the information
  – Service centres
• Custodianship, acquisition and ownership
  – Depend on goodwill of community
  – Add value, service and organisation; require lots of money to continue

Data Centre Infrastructure Can be Complex!

• Reactome
• EnsEMBL: Genome Annotation
• EMBL-Bank: DNA sequences
• UniProt: Protein Sequences
• Array-Express: Microarray Expression Data
• EMSD: Macromolecular Structure Data
• IntAct: Protein Interactions

Institutional and Data Centre practice exist

[Diagram components: Laboratory repository; Institutional data repositories; Publishers (peer-review journals, conference proceedings, etc.); Aggregator services; Presentation services / portals]

[Diagram flow labels: Deposit; Validation; Publication; Search, harvest; Data discovery, linking, citation; Data analysis, transformation, mining, modelling]

Road Map for Digital Repository / Preservation Projects Focusing on Data

[Diagram labels: DRP Projects; Data Cluster Meetings; Road Map Required; Workshop; Briefing Paper; Interviews and Surveys; 06/09 Call; Data Centres]

Data Cluster

• GRADE
• R4L
• SPECTRa
• CLADDIER
• stORe
• eBank

UKOLN - Data Repositories Research (Consultancy)

• To define how institutions (collectively and individually) and scientific data centres can together effectively achieve:
  – Preservation
  – Access: Managed and Open
  – Reuse: Data Citation, Data Mining and Reinterpretation

• To identify the mechanisms, business processes and good practice by which these functions can be achieved

• To facilitate dialogue between data centres, institutions and other key players and to define a collaborative way forward

Dr Liz Lyon

Identifying and defining inter-relationships

1. Socio-cultural, organisational, legal
2. Technical interoperability
3. Roles & responsibilities

[Diagram labels: Access, Preservation, Re-use]

See briefing paper produced for workshop

Socio-cultural, organisational, political and legal issues

• Highly diverse in awareness, practice and skills

• Need to understand the full spectrum of research practice, workflows and associated data flows, both within and between disciplines/sub-disciplines

Hierarchy of Drivers

• Level 0: deliver project.

• Level 1: meet ‘good scientific practice’.

• Level 2: support own science.

• Level 3: employer’s requirements.

• Level 4: funder’s requirements.

• Level 5: public policy requirements.

Slide from Mark Thorley: NERC

RC UK - Funding Body

Socio-legal conclusions

• Use a questionnaire and send it to data centres; disciplines will be different

• Promote use & interoperability through metadata standards. Resource discovery standards should be promoted & developed by learned societies (membership arms) and subject communities, discipline by discipline (not by data curators). Bottom up rather than top down.

• Education: recognise the very wide range of understanding amongst disciplines re the value of data curation centres/IRs/archives. Need to go out and promote why they exist and why they should be used. Focus at community level.

• Each research council should have a written ‘meaty’ data policy, disseminated and policed.

• Legal issues: the JISC legal centre is valuable, but there is a lack of clarity and guidance where law exists re use of digital objects, IP etc. Need clarity of law and guidance on how best to interpret it: straightforward answers to straightforward questions. Model licences for use, interpretation, confidentiality, disclosure.

• Academics & data centres need to be told the differences between data banks/data centres etc. and IRs. IRs have not had enough institutional buy-in yet.

• JISC could investigate why subject repositories are more successful than IRs. JISC policy should reflect what is happening on the ground.

• JISC should help sell IRs better

Technical Interoperability

• Federation models

• interoperability and inter-relationships between repositories

Open Access

• Good thing but…
  – Are the tools up to the job?

• OAI-PMH
• Dublin Core
• Use METS as packaging standard; momentum building?

• Papers, not data
• For data, do these map to the other metadata schemas that have been developed, or to extensions of DC?
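The "papers not data" concern can be made concrete with a small sketch. It builds an OAI-PMH `ListRecords` request for unqualified Dublin Core (`oai_dc`) and parses a hand-written sample response. The base URL, the record identifier and the sample record are illustrative assumptions, not taken from any real repository; only the OAI-PMH verb, the `metadataPrefix` parameter and the XML namespaces come from the published standards.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (base_url is hypothetical)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hand-written sample response for illustration only (not real repository output).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example survey dataset</dc:title>
          <dc:type>Dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per record: its identifier plus its Dublin Core fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        fields = {}
        for element in record.iter():
            # Keep only elements in the Dublin Core element namespace.
            if element.tag.startswith(f"{{{DC_NS}}}"):
                name = element.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "dc": fields})
    return records
```

Calling `parse_records(SAMPLE_RESPONSE)` yields the title and a `type` of `Dataset`, and nothing else. Unqualified Dublin Core can say that something is a dataset, but it has no obvious place for discipline-specific description, which is why mappings to other schemas or DC extensions come up.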

Federation

• Monolithic solutions fail

• Aggregation of institutional repositories is essential

Data Centre’s View

Technical

• Need to define what is meant by semantics of structured data and publish guidelines at the levels of metadata, classification/subject areas/factual names, with agreed conventions layered on top, e.g. identifiers.

• Application profiles: who should be keeper of those definitions (e.g. registries), and who funds and owns them?

• Scientists concentrate on narrow areas but connections are to other, wider areas

• Time series data are different: how to discover and use them? It is more difficult to define discovery metadata for time series. Data might not be logically the same.

• Data curation responsibility at institutional level/data centre: data curation requires specialisms, and data centres could feed this expertise back to institutions. Need a flow of expertise from Data Centres to institutions
  – Invitations to work in a data centre for a week (happening in Australia)

• Mixed economy re organisational responsibility is inevitable: some federation will be there

• How to express quality: role for provenance and audit as a means to express quality; also ranking and annotation

• Curation of data is of more interest to scientists than interoperability as a means of marketing/selling it.

Roles, Rights & Responsibilities

• ‘Scientist’: Creation and use of data.
• ‘Data centre’: Curation of and access to data.
• ‘User’: Use of 3rd party data.
• ‘Funder’: Set / react to public policy drivers.
• ‘Publisher’: Maintain integrity of the scientific record.

From Mark Thorley: NERC

Roles & Responsibilities

• Individual scientists to deposit data using domain standards of an acceptable quality
• Re-users should acknowledge where the data came from and, where appropriate, improve its quality.
• Institutions should have policies that mandate data deposit in an appropriate place, not necessarily an IR.
• Publishers/journals/editors should mandate open deposit of data.
• Curators collect, describe and connect data; idea of a community proxy role: define standards for the domain, working in and with the scientists
• Funders should enforce their data deposit policies where possible
• Funders should recognise the emerging need for new infrastructure and provide appropriate funding for this infrastructure and for the resulting actions
• Users and funders should feed back views on the data stored to the data centre manager
• Click-use licence: says if you enhance the data you must give it back, but how can the data centre police that policy? Versioning is an issue here.
• Value of “good enough” versus “completely comprehensive” descriptions (Graham C)
• Who is responsible for ownership of the data to make changes? If there are multiple versions, the last one is not necessarily the best
• Competitive views: risk of sabotage of other groups’ work is possible.
• Who checks provenance of anything new? Curators?

Small Science vs Big Science

“Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”

‘Lost in a Sea of Science Data’ S.Carlson, The Chronicle of Higher Education (23/06/2006)

Dataset publishing

• Re-examine concept of Dataset Publishing (Callahan, Johnson, and Shelley 1996)
  – analogous to publishing papers
  – rewards for publishing datasets (e.g. promotion, RAE)
  – procedures (e.g. standards to use, peer review) & resources to manage procedures

• Should minimise time and effort required
  – need tools to assist in creation, maintenance and dissemination of dataset descriptions

• Means of ‘putting’ into a public/community
  – Deposit and Share are too cosy
  – to ‘publicate’, to issue

• Terms of access and use
  – Open?
  – Privilege of membership
  – Payment of money

Taken from Peter Burnhill

Spatial is Special

• Why?
• Geo research data not deposited; lots of data slipping through the net, not falling under RC remit; data being lost or shared informally. May be a case for a national repository?

• Fears about legality of resources, e.g. OS data; researchers really want to share in a big way

• Should data be deposited in Data Centres?
• Academics not comfortable about sharing on a larger scale?
• IRs not geared up to handle data?
• DSpace does not allow editing of metadata
• Problem with the ISO standard used for geo data (ISO 19115) and DC
• Mapping done, further work needed; from wing mirror to Smart Car?
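The ISO 19115 and DC problem noted on this slide can be illustrated with a heavily simplified crosswalk. This is a sketch under stated assumptions and not the mapping produced by GRADE: the flat field names are a hypothetical simplification of ISO 19115 elements, the `CROSSWALK` table is illustrative, and the sample record is invented. The bounding box is written using the component names of the DCMI Box encoding scheme.

```python
# Hypothetical, simplified crosswalk from flattened ISO 19115 field names
# to unqualified Dublin Core elements. Illustrative only.
CROSSWALK = {
    "title": "title",
    "abstract": "description",
    "topicCategory": "subject",
    "geographicBoundingBox": "coverage",
}


def bounding_box_to_dcmi_box(box):
    """Serialise an ISO 19115 style bounding box as a DCMI Box string."""
    return (
        f"northlimit={box['northBoundLatitude']}; "
        f"eastlimit={box['eastBoundLongitude']}; "
        f"southlimit={box['southBoundLatitude']}; "
        f"westlimit={box['westBoundLongitude']}"
    )


def iso19115_to_dc(record):
    """Map a flattened ISO 19115 record to DC; also report what was lost."""
    dc, unmapped = {}, []
    for name, value in record.items():
        target = CROSSWALK.get(name)
        if target is None:
            # No Dublin Core home in this sketch: the field is dropped.
            unmapped.append(name)
        elif name == "geographicBoundingBox":
            dc[target] = bounding_box_to_dcmi_box(value)
        else:
            dc[target] = value
    return dc, sorted(unmapped)


# Invented sample record for illustration.
SAMPLE_RECORD = {
    "title": "Example land cover survey",
    "abstract": "Illustrative geospatial dataset description.",
    "geographicBoundingBox": {
        "westBoundLongitude": -3.4,
        "eastBoundLongitude": -3.0,
        "southBoundLatitude": 55.8,
        "northBoundLatitude": 56.0,
    },
    "referenceSystemInfo": "EPSG:27700",
    "spatialResolution": "1:50000",
}
```

Running `iso19115_to_dc(SAMPLE_RECORD)` returns the mapped Dublin Core fields together with the names of the fields that had no target. In this sketch the reference system and spatial resolution fall into that second list, which is the kind of loss that makes a simple mapping feel like a wing mirror rather than the whole car.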

Responsibility of Data Providers

• Responsibility of publicly funded research to share data
• ‘Free our Data’ Guardian work
• INSPIRE work

GRADE’s input

• Important that GRADE inputs into this work as it will set direction of research and focus on GEOSPATIAL DATA Repository work

• Interviews held with Rebecca and David

Road Map for Digital Repository / Preservation Projects Focusing on Data

[Diagram labels: DRP Projects; Data Cluster Meetings; Road Map Required; Workshop; Briefing Paper; Interviews and Surveys; 06/09 Call; Data Centres]

Data Cluster

• GRADE
• R4L
• SPECTRa
• CLADDIER
• stORe
• eBank

We need your input!

[email protected] [email protected]