The eCrystals Federation

Dr Simon Coles, University of Southampton, UK
Dr Liz Lyon, UKOLN, University of Bath, UK

Open Repositories 2008, University of Southampton, April 2008

This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 3.0
http://creativecommons.org/licenses/by-sa/3.0/

Themes

1. Context: Open science, institutional data repositories, crystallography exemplar
2. Scale: repository federations
3. Integration: Lab workflow and semantic challenges
4. Longevity: Digital curation, preservation and sustainability
5. Community: DCC Data Forum

Open Science

…is happening now

• Blogging of results data

• Open grant proposals

• Community repositories for data

• Open Notebook Science (ONS) tutorials in Second Life

eBank Project – building the eCrystals Data Repository

ePrints platform @ Southampton

Institutional Repository exemplar

Embedded in workflow

http://ecrystals.chem.soton.ac.uk

Started Sept 2003

Scholarly knowledge cycle context

UKOLN-led interdisciplinary team

Scaling Up Report

Interviews & analysis of a discipline: crystallography

Synthesis:
• IR Policy & Practice
• Laboratory Practice & Workflows
• Technical Interoperability & Standards
• Metadata Schema & Application Profiles
• Semantic Interoperability
• Data Citation, Identifiers & Linking
• Federation Architectures & Third Party Services
• Rights & Licensing
• Data Quality & Validation
• Preservation, Curation & Sustainability

Recommendations, commentary

Scaling Up Report

Phase 3 findings:

Diverse lab practice

LIMS and proprietary formats

Data policy should reflect lab practice & institutional model

Data quality criteria/validation

“Prior publication” problem

We need scalable assignment of “terms” for data discovery

No discipline preservation model

nλ = 2d sinθ (Bragg's law)
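Rearranged for the interplanar spacing d, the equation above (Bragg's law) gives d = nλ / (2 sin θ). A minimal Python sketch of that arithmetic follows; the function name `d_spacing` and the example values (Cu Kα wavelength of 1.5406 Å, θ = 20°) are illustrative assumptions, not taken from the slides.

```python
import math


def d_spacing(wavelength: float, theta_deg: float, n: int = 1) -> float:
    """Return the interplanar spacing d from Bragg's law, n*lambda = 2*d*sin(theta).

    wavelength and the returned d share one length unit (angstroms here);
    theta_deg is the Bragg angle in degrees (half the 2-theta scattering angle).
    """
    if not 0 < theta_deg < 90:
        raise ValueError("Bragg angle must lie strictly between 0 and 90 degrees")
    return n * wavelength / (2 * math.sin(math.radians(theta_deg)))


# Hypothetical example: Cu K-alpha radiation, first-order reflection at theta = 20 degrees.
d = d_spacing(1.5406, 20.0)
print(f"d = {d:.4f} angstroms")
```

Higher orders (n > 1) scale d linearly, which is why the order is kept as an explicit parameter rather than assumed to be 1.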

eCrystals Repository

ePrints.org v3.0

Repository Foundations

• Using simple Dublin Core
  • Crystal structure
  • Title (Systematic IUPAC Name)
  • Authors
  • Affiliation
  • Creation Date

• Additional chemical information through Qualified Dublin Core
  • Empirical formula
  • International Chemical Identifier (InChI)
  • Compound Class & Keywords

• Specifies which ‘datasets’ are present in an entry

• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/

• DOI links http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145

• Rights & Citation http://ecrystals.chem.soton.ac.uk/rights.html

Learned society + subject repository support
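The simple Dublin Core layer described above can be illustrated with a standard `oai_dc` record built from the Python standard library. The helper name `build_oai_dc`, the benzene example values and the choice to carry the empirical formula and InChI as `dc:subject` and `dc:identifier` are assumptions for illustration; the eBank-UK application profile defines its own qualified Dublin Core mapping, which is not reproduced here, and affiliation is omitted because simple DC has no dedicated element for it.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_oai_dc(fields: list[tuple[str, str]]) -> str:
    """Serialise (element, value) pairs as a simple Dublin Core record.

    Element names must be simple DC elements such as "title", "creator", "date".
    Repeated elements (e.g. several creators) are allowed and kept in order.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical entry: values are illustrative, not a real eCrystals record.
record = build_oai_dc([
    ("title", "benzene"),                                  # systematic IUPAC name
    ("creator", "A. Crystallographer"),                    # hypothetical author
    ("date", "2008-04-01"),                                # hypothetical creation date
    ("subject", "C6H6"),                                   # empirical formula (assumed mapping)
    ("identifier", "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H"),  # InChI (assumed mapping)
])
print(record)
```

Keeping the record as ordered pairs rather than a dict lets repeated elements such as multiple `dc:creator` entries survive unchanged.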

Federation interoperability & linking services

• Roll-out in 2 phases led by University of Southampton
• Establish Federation policies, application profile, mappings
• Bi-directional links with derived articles in “publisher repositories” (IUCr, Royal Society of Chemistry (RSC), Chemistry Central): scholarly knowledge cycle

• StOReLink project: test linking options (StORe middleware and CLADDIER)

• OAI-ORE Testbed

eChemistry project
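OAI-ORE describes a bundle of related web resources as an aggregation published through a resource map. The Python sketch below builds one such map, aggregating a crystal-structure dataset with an article derived from it, and serialises it as N-Triples using the ORE terms `describes`, `isDescribedBy` and `aggregates`. Every `example.org` URI and helper name is hypothetical, and this is not the testbed's actual data model.

```python
ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(s: str, p: str, o: str) -> str:
    """Format one N-Triples statement whose subject, predicate and object are all URIs."""
    return f"<{s}> <{p}> <{o}> ."


def resource_map(rem: str, aggregation: str, resources: list[str]) -> list[str]:
    """Build ORE triples: the resource map describes an aggregation of the given resources."""
    lines = [
        triple(rem, RDF_TYPE, ORE + "ResourceMap"),
        triple(rem, ORE + "describes", aggregation),
        triple(aggregation, RDF_TYPE, ORE + "Aggregation"),
        triple(aggregation, ORE + "isDescribedBy", rem),
    ]
    lines += [triple(aggregation, ORE + "aggregates", r) for r in resources]
    return lines


# Hypothetical URIs: a crystal-structure dataset and the journal article derived from it.
rem_lines = resource_map(
    "http://example.org/rem/structure-1",
    "http://example.org/aggregation/structure-1",
    ["http://example.org/data/structure-1.cif", "http://example.org/article/structure-1"],
)
print("\n".join(rem_lines))
```

This covers only the repository side; a genuinely bi-directional link would also need the publisher's record to point back at the dataset.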

Validation and Reproducibility

We need to:
• Provide accurate data and information that will allow an experiment to be reproduced
• Record the provenance of a dataset
• Provide an ‘audit trail’ from workflow capture
• Relate components of a dataset to steps in the workflow
• Share the workflow and record of an experiment
• Provide automated approaches to validation
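One way to provide an audit trail that relates dataset components to workflow steps is a hash-chained log: each entry records a step, the SHA-256 digests of the files it produced, and the hash of the previous entry, so any later edit breaks the chain. This is a swapped-in technique for illustration, not the eCrystals workflow-capture mechanism; the step names and file contents are hypothetical.

```python
import hashlib
import json


def _entry_hash(entry: dict) -> str:
    """Hash an entry's canonical JSON form (sorted keys) with SHA-256."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode("utf-8")).hexdigest()


def append_step(log: list[dict], step: str, outputs: dict[str, bytes]) -> None:
    """Record a workflow step, digesting each output file and chaining to the previous entry."""
    entry = {
        "step": step,
        "outputs": {name: hashlib.sha256(data).hexdigest() for name, data in outputs.items()},
        "prev": log[-1]["hash"] if log else None,
    }
    entry["hash"] = _entry_hash(entry)
    log.append(entry)


def verify(log: list[dict]) -> bool:
    """Recompute every entry hash and check that each entry points at its predecessor."""
    prev = None
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev"] != prev or _entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


# Hypothetical workflow: file names and contents are placeholders.
log: list[dict] = []
append_step(log, "data collection", {"frames.raw": b"raw diffraction frames"})
append_step(log, "structure refinement", {"final.cif": b"data_example\n_cell_length_a 5.4307(2)\n"})
print(verify(log))  # True
```

The chain only detects tampering if the latest hash is also stored somewhere the editor cannot rewrite, for example alongside the deposited repository record.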

Laboratory practice & workflow

• Community standard CIF
• Mixed lab practice: central service facility versus single “staff crystallographer” in department
• Achieve end-to-end workflow
• Lack of integration with LIMS
• Instrument manufacturers with proprietary formats
• “Repository Lite” for smaller lab operations?

X-ray diffractometers

Crystallographic schema underpins CIF (Crystallographic Information Framework), but is limited to data parameters e.g. cell_length_a

Semantic issues
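A CIF data item pairs a data name such as `_cell_length_a` with a value, where a trailing parenthesised number gives the standard uncertainty in the last digits (so `5.4307(2)` means 5.4307 with an uncertainty of 0.0002). The Python sketch below reads only single-line `_name value` items; it deliberately ignores `loop_` blocks, multi-line text fields and the rest of the CIF grammar, and the sample values are hypothetical. It also shows the limitation noted above: the data names carry measurement parameters, not chemical context.

```python
import re

# A numeric CIF value with an optional standard uncertainty, e.g. "5.4307(2)" or "90".
_NUMBER_SU = re.compile(r"^([-+]?\d+(?:\.(\d+))?)(?:\((\d+)\))?$")


def parse_value(raw: str):
    """Return (value, su) for numbers, or the unquoted string for text values."""
    m = _NUMBER_SU.match(raw)
    if m is None:
        return raw.strip("'\"")
    number, decimals, su_digits = m.groups()
    places = len(decimals) if decimals else 0
    su = int(su_digits) / 10**places if su_digits else 0.0
    return float(number), su


def parse_simple_cif(text: str) -> dict:
    """Collect single-line '_name value' items; loops and multi-line fields are skipped."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            items[parts[0]] = parse_value(parts[1])
    return items


# Hypothetical fragment loosely modelled on a cubic cell.
sample = """data_example
_cell_length_a    5.4307(2)
_cell_angle_alpha 90
_symmetry_space_group_name_H-M 'F d -3 m'
"""
print(parse_simple_cif(sample))
```

A production reader should use a full CIF parser; this sketch only makes the name/value/uncertainty structure visible.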

IUCr Acta Cryst 1992

Limited set of keywords describing methods, properties & applications, compounds, attributes

No established crystallography dictionary or controlled vocabulary to give chemistry context

Federation = Repository 2.0?

• Facilitate interaction and participation beyond conventional disciplinary boundaries
• Multi-disciplinary search and browse functionality
• Support tags, terms, comments, ratings…
• Automatic tag / term validation & enhancement
• Develop domain semantics / vocabulary
• Use domain-specific authority files
• Facilitate and improve automated indexing
• Link data to all associated digital objects / people
• Apply across a heterogeneous Federation
• Mine to “discover and innovate” rather than (just) “find”
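Automatic tag validation against a domain authority file can be sketched as mapping each free-text tag to a preferred term through a table of variant forms. The vocabulary below is a hypothetical stand-in (the deck itself notes that no established crystallography vocabulary exists), and real authority files carry much more structure than a flat synonym list.

```python
# Hypothetical authority file: preferred term -> accepted variant spellings/acronyms.
AUTHORITY = {
    "X-ray diffraction": ["xrd", "x ray diffraction"],
    "single-crystal X-ray diffraction": ["scxrd", "single crystal xrd"],
    "polymorphism": ["polymorph", "polymorphs"],
}

# Invert once into a lookup keyed on lower-case variants; preferred terms match themselves.
_LOOKUP = {v.lower(): pref for pref, variants in AUTHORITY.items() for v in variants}
_LOOKUP.update({pref.lower(): pref for pref in AUTHORITY})


def normalise_tags(tags: list[str]) -> tuple[list[str], list[str]]:
    """Split user tags into preferred terms (deduplicated, in order) and unrecognised tags."""
    accepted, unknown = [], []
    for tag in tags:
        key = " ".join(tag.lower().split())  # case-fold and collapse internal whitespace
        preferred = _LOOKUP.get(key)
        if preferred is None:
            unknown.append(tag)
        elif preferred not in accepted:
            accepted.append(preferred)
    return accepted, unknown


accepted, unknown = normalise_tags(
    ["XRD", "Single  Crystal XRD", "polymorphs", "x-ray diffraction", "shiny"]
)
print(accepted)  # ['X-ray diffraction', 'single-crystal X-ray diffraction', 'polymorphism']
print(unknown)   # ['shiny']
```

Returning the unrecognised tags rather than discarding them leaves room for a curator to review them and extend the vocabulary.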

Challenges?

• How are tags, terms, comments, ratings assigned?
• Informal tags and/or a structured KOS (knowledge organisation system)?
• How is a vocabulary curated and maintained?
• Can a vocabulary be transformed into an ontology in the Semantic Web sense?
• Disambiguation, acronyms, IUPAC names
• Persistent identification for data citation
• Granularity of data citation: dataset or value?
• Advocacy: becoming part of the lab culture
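For the question of moving a vocabulary into a Semantic Web form, the W3C SKOS vocabulary is one established target: each term becomes a `skos:Concept` with a `skos:prefLabel`, variants become `skos:altLabel`, and hierarchy uses `skos:broader`. The Python sketch below emits N-Triples; the base URI, the input dictionary layout and the two terms are hypothetical. SKOS is a thesaurus model rather than a full OWL ontology, so this is only a first step.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/vocab/"  # hypothetical concept-scheme namespace


def literal(text: str, lang: str = "en") -> str:
    """Format a language-tagged N-Triples literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def to_skos(terms: dict[str, dict]) -> list[str]:
    """Convert {slug: {"pref": str, "alt": [str], "broader": slug}} into SKOS triples."""
    out = []
    for slug, info in terms.items():
        s = f"<{BASE}{slug}>"
        out.append(f"{s} <{RDF_TYPE}> <{SKOS}Concept> .")
        out.append(f"{s} <{SKOS}prefLabel> {literal(info['pref'])} .")
        for alt in info.get("alt", []):
            out.append(f"{s} <{SKOS}altLabel> {literal(alt)} .")
        if info.get("broader"):
            out.append(f"{s} <{SKOS}broader> <{BASE}{info['broader']}> .")
    return out


# Hypothetical two-term vocabulary.
vocab = {
    "diffraction": {"pref": "diffraction"},
    "xrd": {"pref": "X-ray diffraction", "alt": ["XRD"], "broader": "diffraction"},
}
skos_lines = to_skos(vocab)
print("\n".join(skos_lines))
```

Recording acronyms such as "XRD" as `altLabel` values is one way SKOS addresses the disambiguation and acronym challenge listed above.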

eCrystals Curation & Preservation Study

Working with the Digital Curation Centre

Examined four main areas:

1. Audit and certification (TRAC, DRAMBORA, NESTOR, ISO International repository audit and certification BOF Group)

2. The Open Archival Information System (OAIS) and Representation Information (RI)

3. eBank-UK application profile and preservation metadata

4. ePrints.org repository platform

Recommendations

http://www.ukoln.ac.uk/projects/ebank-uk/curation/eBank3-WP4-Report%20(Revised).pdf

eCrystals Federation: Preservation & sustainability Recommendations

Data repositories
• Use DRAMBORA Interactive for self-assessment
• Add PREMIS preservation metadata
• Collect eCrystals representation information
• Examine repository platform conformance to OAIS Reference Model
• Survey partner preservation policies

Digital Curation Centre partnership

Dealing with Data Report

• DataSets Mapping and Gap Analysis (UK)

• Data Curation & Preservation Strategy (UK)

• Data Audit Framework (HE Institutions)

• Institutional Data Management, Preservation & Sharing Policy

• Data Management & Sharing Policy (Funders)

• Data Management Plan (Projects)

• Rec 5: Data Networking Forum (People) linked to RIN Framework Principle 1

Inaugural Research Data Forum

• 19-20 March 2008 in Manchester
• Joint DCC – RIN event
• Data centre managers, IR managers, funders & policy makers
• Aims & Objectives:
  – Improve data acquisition, management, analysis, validation, archiving and dissemination
  – Increase awareness of national & international data policies and standards
  – Facilitate co-operation between organisations and individuals
  – Exchange experience and best practice
• Next meeting in autumn (TBC)

Heard at the Forum…

“protected by PDF”
“Rembrandt in the attic”
“Don’t forget the researcher!”
“stuff isn’t getting done”
“demand outstrips supply…”
“careers developed more by luck than judgement”
“Data managers as failed scientists”
“need to sit down and write the manual”
“teeth and sticks and carrots”
“professionalising data management”
“Data is not just about eScience/eResearch”
“we need services not projects!”

OR2008???