eCrystals Federation: Open Repositories for Data-driven Science

Dr Liz Lyon, UKOLN, University of Bath, UK

Dr Simon Coles, University of Southampton, UK

Chemical Informatics Workshop, Manchester, March 2008

This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0

http://creativecommons.org/licenses/by-sa/3.0/

Themes

1. Context: Institutional data repositories, crystallography exemplar
2. Scale: repository federations
3. Longevity: Digital curation and preservation
4. Integration: Semantic challenges

eBank Project – building the eCrystals Data Repository

ePrints platform @ Southampton

Institutional Repository exemplar

Embedded in workflow

http://ecrystals.chem.soton.ac.uk

Started Sept 2003

Scholarly knowledge cycle context

UKOLN-led interdisciplinary team

Scaling Up Report

Phase 3 findings:

Data policy should reflect lab practice & institutional model

Diverse lab practice

LIMS proprietary formats

Data quality criteria/validation

“Prior publication” problem

We need automated assignment of terms for data discovery

No discipline preservation model

nλ = 2 d sinθ
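Bragg's law relates the X-ray wavelength λ, the spacing d between lattice planes and the diffraction angle θ, with n the integer order of the reflection. As a minimal sketch, the equation can be rearranged to recover d from a measured angle; the function name `d_spacing`, the choice of Cu Kα1 radiation (λ ≈ 1.5406 Å, a common laboratory source) and the 15° example angle are illustrative assumptions, not values taken from the slides.

```python
import math

# Assumption: Cu Kα1 radiation, a common laboratory X-ray source (wavelength in ångströms)
CU_K_ALPHA1 = 1.5406


def d_spacing(theta_deg: float, wavelength: float = CU_K_ALPHA1, n: int = 1) -> float:
    """Solve Bragg's law, n*lambda = 2*d*sin(theta), for the plane spacing d.

    theta_deg is the Bragg angle theta in degrees (half of the 2-theta angle
    that diffractometers usually report). The result has the units of wavelength.
    """
    if not 0 < theta_deg < 90:
        raise ValueError("theta must lie strictly between 0 and 90 degrees")
    if n < 1:
        raise ValueError("the reflection order n must be a positive integer")
    return n * wavelength / (2 * math.sin(math.radians(theta_deg)))


if __name__ == "__main__":
    # Hypothetical first-order reflection observed at theta = 15 degrees
    print(f"d = {d_spacing(15.0):.4f} Å")
```

Taking θ rather than 2θ keeps the function term-by-term aligned with the equation as written on the slide.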

eCrystals Repository

ePrints.org v3.0

Repository Foundations

• Using simple Dublin Core:
  • Crystal structure
  • Title (Systematic IUPAC Name)
  • Authors
  • Affiliation
  • Creation Date
• Additional chemical information through Qualified Dublin Core:
  • Empirical formula
  • International Chemical Identifier (InChI)
  • Compound Class & Keywords
• Specifies which ‘datasets’ are present in an entry
• Application Profile: http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• DOI links: http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Rights & Citation: http://ecrystals.chem.soton.ac.uk/rights.html

Learned society + subject repository support
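The metadata fields listed on this slide can be illustrated with a small Dublin Core record built using Python's standard library. This is a sketch only: the element mapping (for example, affiliation as dc:publisher, InChI as dc:identifier, empirical formula folded into dc:description) is a hypothetical simplification, and the eBank application profile at the schema URL above defines the real mapping and qualifiers. The compound, depositor name and date in the usage example are illustrative placeholders.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)


def build_dc_record(entry: dict) -> ET.Element:
    """Build a simple Dublin Core (oai_dc) record for one crystal structure entry.

    The element mapping is a hypothetical illustration; the eBank
    application profile defines the real mapping and qualifiers.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")

    def add(element: str, value: str) -> None:
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value

    add("type", "Crystal structure")
    add("title", entry["iupac_name"])  # systematic IUPAC name
    for author in entry["authors"]:
        add("creator", author)
    add("publisher", entry["affiliation"])  # assumption: affiliation carried as dc:publisher
    add("date", entry["created"])  # creation date, ISO 8601
    # Chemistry-specific fields, flattened into simple DC for this sketch
    add("identifier", entry["inchi"])  # an InChI identifies itself via its "InChI=" prefix
    add("description", f"Empirical formula: {entry['formula']}")
    for keyword in entry["keywords"]:  # compound class & keywords
        add("subject", keyword)
    return root


if __name__ == "__main__":
    record = build_dc_record({
        "iupac_name": "benzoic acid",
        "authors": ["A. N. Other"],  # hypothetical depositor
        "affiliation": "University of Southampton",
        "created": "2008-03-01",  # hypothetical creation date
        "inchi": "InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)",
        "formula": "C7 H6 O2",
        "keywords": ["carboxylic acid"],
    })
    print(ET.tostring(record, encoding="unicode"))
```

A production oai_dc record would normally also declare its XML Schema location; that is left out here to keep the sketch short.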

Federation interoperability & linking services

• Roll-out in 2 phases led by University of Southampton
• Establish Federation policies, application profile, mappings
• Bi-directional links with derived articles in “publisher repositories” (IUCr, Royal Society of Chemistry (RSC), Chemistry Central): scholarly knowledge cycle

• StOReLink project - Test linking options: StORe middleware and CLADDIER

• OAI-ORE Testbed

eChemistry project
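The OAI-ORE testbed and the bi-directional article links mentioned above can be sketched as an ORE-style aggregation serialised as N-Triples. This is a hypothetical illustration rather than the Federation's actual resource maps: all URIs are placeholders, and the choice of `dcterms:references` / `dcterms:isReferencedBy` for the data-to-article link is an assumption. The ORE specification also expects metadata about the resource map itself (such as its creator and modification time), which is omitted here for brevity.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Format one triple whose terms are all IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def resource_map(rem: str, aggregation: str, datasets: list[str], article: str) -> list[str]:
    """Describe one data entry as an ORE aggregation, linked both ways to a derived article."""
    lines = [
        ntriple(rem, RDF_TYPE, ORE + "ResourceMap"),
        ntriple(rem, ORE + "describes", aggregation),
        ntriple(aggregation, RDF_TYPE, ORE + "Aggregation"),
    ]
    # Each dataset file in the entry becomes an aggregated resource
    lines += [ntriple(aggregation, ORE + "aggregates", ds) for ds in datasets]
    # Bi-directional data <-> article links (predicate choice is an assumption)
    lines.append(ntriple(aggregation, DCTERMS + "isReferencedBy", article))
    lines.append(ntriple(article, DCTERMS + "references", aggregation))
    return lines


if __name__ == "__main__":
    base = "http://repository.example.org/entry/145"  # hypothetical URIs throughout
    for line in resource_map(
        rem=base + "/rem",
        aggregation=base + "#aggregation",
        datasets=[base + "/structure.cif", base + "/raw-frames.zip"],
        article="http://publisher.example.org/article/xyz",
    ):
        print(line)
```

Emitting plain N-Triples strings avoids a dependency on an RDF library; a real deployment would more likely use one.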

Laboratory practice & workflow

• Community standard CIF
• Mixed lab practice – central service facility versus single “staff crystallographer” in department
• Achieve end-to-end workflow
• Challenge of instrument manufacturers with proprietary formats
• “Repository Lite” for smaller lab operations?

X-ray diffractometers

eBank-UK Phase 3 Curation & Preservation Study: Sustainability issues

http://www.ukoln.ac.uk/projects/ebank-uk/curation/

Examined four main areas:

1. Audit and certification (TRAC, DRAMBORA, NESTOR, ISO International repository audit and certification BOF Group)
2. The Open Archival Information System (OAIS) and Representation Information (RI)
3. eBank-UK application profile and preservation metadata
4. ePrints.org repository platform

Recommendations:

Self-assessment using DRAMBORA

Consider Representation Information in wider context

Develop preservation strategy

Capture preservation metadata - PREMIS
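One concrete kind of preservation metadata that PREMIS covers is fixity: a message digest recorded for each object, plus events logging later fixity checks. The sketch below computes a SHA-256 digest with `hashlib` and stores it in plain dictionaries whose keys borrow PREMIS semantic-unit names (`messageDigestAlgorithm`, `messageDigest`, `eventType`, `eventOutcome`). It is a simplified, assumed structure, not the PREMIS XML schema, and the file name and content are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def fixity_record(identifier: str, content: bytes) -> dict:
    """Capture PREMIS-style object fixity (digest algorithm + value) at ingest."""
    return {
        "objectIdentifier": identifier,
        "size": len(content),
        "fixity": {
            "messageDigestAlgorithm": "SHA-256",
            "messageDigest": hashlib.sha256(content).hexdigest(),
        },
    }


def fixity_check_event(record: dict, content: bytes) -> dict:
    """Recompute the digest and log the outcome as a PREMIS-style event."""
    current = hashlib.sha256(content).hexdigest()
    passed = current == record["fixity"]["messageDigest"]
    return {
        "eventType": "fixity check",
        "eventDateTime": datetime.now(timezone.utc).isoformat(),
        "linkingObjectIdentifier": record["objectIdentifier"],
        "eventOutcome": "pass" if passed else "fail",
    }


if __name__ == "__main__":
    original = b"data_example\n_cell_length_a 5.4310(2)\n"  # hypothetical CIF content
    record = fixity_record("example.cif", original)
    print(fixity_check_event(record, original)["eventOutcome"])  # pass
    print(fixity_check_event(record, original + b"#")["eventOutcome"])  # fail
```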

The crystallographic schema that underpins CIF (Crystallographic Information Framework) is limited to data parameters, e.g. cell_length_a

Semantic issues
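To illustrate what the CIF schema does and does not provide, the sketch below pulls simple `_data_name value` pairs out of a CIF block and splits a numeric value such as `5.4310(2)` into its value and standard uncertainty. What comes out is a set of measured parameters (cell lengths, space group) with no chemistry context. The parser is a deliberately minimal assumption: it ignores `loop_` tables, semicolon-delimited multi-line fields and exponent notation, and the sample values are illustrative only.

```python
import re
from typing import Optional

# A CIF number such as 5.4310(2): value plus optional standard uncertainty in parentheses
# (exponent notation is not handled in this sketch)
CIF_NUMBER = re.compile(r"^([+-]?\d+(?:\.\d*)?)(?:\((\d+)\))?$")


def parse_cif_items(text: str) -> dict[str, str]:
    """Collect single-line `_data_name value` pairs from a CIF data block.

    Deliberately minimal: loop_ tables, semicolon-delimited multi-line
    fields and trailing comments are not handled.
    """
    items = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("_"):
            continue  # skips data_ headers, loop_ keywords, comments and loop rows
        parts = line.split(None, 1)
        if len(parts) == 2:  # a lone tag (e.g. a loop header) has no value here
            items[parts[0]] = parts[1].strip("'\"")
    return items


def cif_number(value: str) -> tuple[float, Optional[float]]:
    """Split a CIF numeric value into (value, standard uncertainty)."""
    match = CIF_NUMBER.match(value)
    if match is None:
        raise ValueError(f"not a simple CIF number: {value!r}")
    number, su_digits = match.groups()
    if su_digits is None:
        return float(number), None
    decimals = len(number.split(".")[1]) if "." in number else 0
    # The s.u. applies to the last quoted digits: 5.4310(2) -> 0.0002
    return float(number), int(su_digits) * 10.0 ** -decimals


if __name__ == "__main__":
    sample = """data_example
_cell_length_a                    5.4310(2)
_symmetry_space_group_name_H-M    'F d -3 m'
"""
    items = parse_cif_items(sample)
    print(items["_cell_length_a"], cif_number(items["_cell_length_a"]))
```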

IUCr Acta Cryst 1992

Limited set of keywords describing methods, properties & applications, compounds, attributes

No established crystallography dictionary or controlled vocabulary to give chemistry context

What do we want to do?

• Support depositors’ keyword/term assignment
• Facilitate and improve automated indexing
• Support advanced search / browse
• Allow metadata validation & enhancement
• Apply across a heterogeneous Federation
• Cross search, cross browse functionality
• Link data to all associated digital objects
• Develop domain semantics / vocabulary
• Use domain-specific authority files
• Mine to “discover” rather than “find”
• Achieve full inter-disciplinary integration
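Supporting depositors' term assignment and automated indexing both depend on mapping free text onto a controlled vocabulary. The sketch below is a naive illustration of that idea: each preferred term carries alternative labels (acronyms, variants), loosely analogous to preferred and alternative labels in a thesaurus, and a term is assigned when any of its labels appears as a whole phrase in the text. The mini-vocabulary is entirely hypothetical; as the previous slide notes, no established crystallography vocabulary exists.

```python
import re

# Hypothetical mini-vocabulary: preferred term -> alternative labels (acronyms, variants).
# Not an established crystallography vocabulary.
VOCABULARY = {
    "single-crystal X-ray diffraction": ["single crystal xrd", "sc-xrd", "x-ray crystallography"],
    "powder diffraction": ["pxrd", "powder xrd"],
    "hydrogen bonding": ["h-bond", "h-bonding", "hydrogen bond"],
    "polymorphism": ["polymorph", "polymorphs"],
}


def _normalise(text: str) -> str:
    """Lower-case, collapse punctuation to single spaces, and pad for whole-phrase matching."""
    return " " + re.sub(r"[^a-z0-9-]+", " ", text.lower()) + " "


def assign_terms(text: str) -> list[str]:
    """Return the preferred terms whose label (or an alternative) appears in the text."""
    haystack = _normalise(text)
    found = []
    for preferred, alternatives in VOCABULARY.items():
        labels = [preferred, *alternatives]
        if any(_normalise(label) in haystack for label in labels):
            found.append(preferred)
    return found


if __name__ == "__main__":
    abstract = "Structure solved by SC-XRD; extensive H-bonding network."
    print(assign_terms(abstract))
```

Plain substring matching on padded, normalised text keeps the sketch short; it would not handle IUPAC-name disambiguation or ambiguous acronyms, which the next slide lists as open issues.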

Some (semantic) issues…

• How are terms assigned?
• Informal tags and/or a structured KOS (knowledge organisation system)?
• How is a vocabulary curated and maintained?
• Can a vocabulary be transformed into an ontology (in the Semantic Web sense)?
• Disambiguation, acronyms, IUPAC names
• Persistent identification for data citation
• Granularity of data citation
• Data (and metadata) quality, provenance, validation
• Embedding within complex workflows
• Use collaborative social approaches?
• Community adoption: becomes part of the culture

Federation

Questions?

Slides will be available at:

http://wiki.ecrystals.chem.soton.ac.uk/index.php

http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html

This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0

http://creativecommons.org/licenses/by-sa/3.0/