eCrystals Federation: Open Repositories for Data-driven Science
Dr Liz Lyon, UKOLN, University of Bath, UK
Dr Simon Coles, University of Southampton, UK
Chemical Informatics Workshop, Manchester, March 2008
This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0
http://creativecommons.org/licenses/by-sa/3.0/
Themes
1. Context: Institutional data repositories, crystallography exemplar
2. Scale: Repository federations
3. Longevity: Digital curation and preservation
4. Integration: Semantic challenges
eBank Project – building the eCrystals Data Repository
ePrints platform @ Southampton
Institutional Repository exemplar
Embedded in workflow
http://ecrystals.chem.soton.ac.uk
Started Sept 2003
Scholarly knowledge cycle context
UKOLN-led interdisciplinary team
Scaling Up Report
Phase 3 findings:
Data policy should reflect lab practice & institutional model
Diverse lab practice
LIMS proprietary formats
Data quality criteria/validation
“Prior publication” problem
We need automated assignment of terms for data discovery
No discipline preservation model
nλ = 2d sin θ (Bragg's law)
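Bragg's law links the X-ray wavelength λ, the interplanar spacing d and the diffraction angle θ for reflection order n. A minimal Python sketch solving it for d follows; the Cu Kα wavelength (1.5406 Å) and the 2θ value are illustrative assumptions, not data taken from the eCrystals repository.

```python
import math


def bragg_d_spacing(wavelength: float, two_theta_deg: float, order: int = 1) -> float:
    """Solve n*lambda = 2*d*sin(theta) for the interplanar spacing d.

    wavelength is in angstroms; two_theta_deg is the measured 2-theta angle in degrees,
    so theta is half of it. The returned d has the same unit as wavelength.
    """
    theta = math.radians(two_theta_deg / 2.0)
    if not 0.0 < theta < math.pi / 2:
        raise ValueError("2-theta must lie strictly between 0 and 180 degrees")
    return order * wavelength / (2.0 * math.sin(theta))


# Illustrative values only: Cu K-alpha wavelength, first-order reflection at 2-theta = 30 degrees.
CU_K_ALPHA = 1.5406
d = bragg_d_spacing(CU_K_ALPHA, 30.0)
print(f"d = {d:.4f} angstrom")
```

Diffractometers usually report 2θ, which is why the sketch halves the angle before taking the sine.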
The eCrystals Repository
ePrints.org v3.0
Repository Foundations
• Using simple Dublin Core:
  • Crystal structure
  • Title (Systematic IUPAC Name)
  • Authors
  • Affiliation
  • Creation Date
• Additional chemical information through Qualified Dublin Core:
  • Empirical formula
  • International Chemical Identifier (InChI)
  • Compound Class & Keywords
• Specifies which ‘datasets’ are present in an entry
• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• DOI links http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
• Rights & Citation http://ecrystals.chem.soton.ac.uk/rights.html
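The metadata layers listed above can be illustrated with a small sketch that serialises one entry as simple Dublin Core (dc:) elements plus one Qualified Dublin Core refinement (dcterms:created), using only Python's standard library. The wrapper element, the placement of affiliation, InChI and formula onto particular DC elements, and all sample values are assumptions for illustration; the authoritative mapping is the eBank-UK application profile linked above.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def add(parent: ET.Element, ns: str, name: str, text: str) -> None:
    """Append one namespaced child element with text content."""
    ET.SubElement(parent, f"{{{ns}}}{name}").text = text


def build_record(entry: dict) -> ET.Element:
    # 'record' is a hypothetical wrapper element, not part of any standard.
    record = ET.Element("record")
    # Simple Dublin Core elements.
    add(record, DC, "title", entry["title"])
    for author in entry["authors"]:
        add(record, DC, "creator", author)
    # Assumption: affiliation recorded as dc:publisher.
    add(record, DC, "publisher", entry["affiliation"])
    # Qualified DC: dcterms:created refines the generic date element.
    add(record, DCTERMS, "created", entry["created"])
    # Assumption: InChI carried as an identifier; formula, class and keywords as subjects.
    add(record, DC, "identifier", entry["inchi"])
    add(record, DC, "subject", entry["formula"])
    for term in [entry["compound_class"], *entry["keywords"]]:
        add(record, DC, "subject", term)
    return record


# Entirely hypothetical sample entry.
example = {
    "title": "urea",
    "authors": ["A. Depositor", "B. Crystallographer"],
    "affiliation": "Example University",
    "created": "2008-03-01",
    "inchi": "InChI=1S/CH4N2O/c2-1(3)4/h(H4,2,3,4)",
    "formula": "C H4 N2 O",
    "compound_class": "small organic molecule",
    "keywords": ["single-crystal X-ray diffraction"],
}

record = build_record(example)
print(ET.tostring(record, encoding="unicode"))
```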
Learned society + subject repository support
Federation interoperability & linking services
• Roll-out in 2 phases led by University of Southampton
• Establish Federation policies, application profile, mappings
• Bi-directional links with derived articles in “publisher repositories” (IUCr, Royal Society of Chemistry (RSC), Chemistry Central): scholarly knowledge cycle
• StOReLink project: test linking options (StORe middleware and CLADDIER)
• OAI-ORE Testbed
eChemistry project
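A bi-directional link between a dataset and its derived article is the kind of relationship the OAI-ORE model expresses: a resource map describes an aggregation that aggregates both resources. The sketch below emits such a map as N-Triples with plain Python; the ORE terms (ore:ResourceMap, ore:Aggregation, ore:describes, ore:aggregates) and the dcterms:references / dcterms:isReferencedBy pair are real vocabulary, while every example.org URI is a hypothetical placeholder and the choice of N-Triples over other ORE serialisations is an assumption.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(s: str, p: str, o: str) -> str:
    """Format one N-Triples statement whose object is a URI."""
    return f"<{s}> <{p}> <{o}> ."


def resource_map(rem: str, aggregation: str, dataset: str, article: str) -> list[str]:
    """Describe an aggregation of one dataset and one article, linked in both directions."""
    return [
        triple(rem, RDF_TYPE, ORE + "ResourceMap"),
        triple(rem, ORE + "describes", aggregation),
        triple(aggregation, RDF_TYPE, ORE + "Aggregation"),
        triple(aggregation, ORE + "aggregates", dataset),
        triple(aggregation, ORE + "aggregates", article),
        # The two directions of the data <-> article link.
        triple(dataset, DCTERMS + "isReferencedBy", article),
        triple(article, DCTERMS + "references", dataset),
    ]


# Hypothetical URIs for illustration only.
lines = resource_map(
    "http://example.org/rem/1",
    "http://example.org/aggregation/1",
    "http://example.org/data/1",
    "http://example.org/article/1",
)
print("\n".join(lines))
```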
Laboratory practice & workflow
• Community standard CIF
• Mixed lab practice: central service facility versus single “staff crystallographer” in department
• Achieve end-to-end workflow
• Challenge of instrument manufacturers with proprietary formats
• “Repository Lite” for smaller lab operations?
X-ray diffractometers
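CIF stores most single-valued items as a data name followed by a value, with any standard uncertainty in parentheses. The deliberately minimal sketch below pulls such items out of CIF text; it is a toy under stated assumptions that ignores loop_ blocks, multi-line text fields and most quoting rules, so a real workflow would use a full CIF parser. The sample values are hypothetical.

```python
import re

# Matches lines like "_cell_length_a   10.2345(6)" or "_symmetry_space_group_name_H-M 'P 21/c'".
ITEM = re.compile(r"^(_\S+)\s+('([^']*)'|\"([^\"]*)\"|\S+)\s*$")


def parse_simple_items(cif_text: str) -> dict[str, str]:
    """Return data names mapped to raw values, for single-line items only."""
    items = {}
    for line in cif_text.splitlines():
        m = ITEM.match(line.strip())
        if m:
            # Use the unquoted content when the value was quoted.
            quoted = m.group(3) if m.group(3) is not None else m.group(4)
            items[m.group(1)] = quoted if quoted is not None else m.group(2)
    return items


def numeric(value: str) -> float:
    """Drop a trailing standard uncertainty, e.g. '10.2345(6)' -> 10.2345."""
    return float(value.split("(", 1)[0])


sample = """data_example
_cell_length_a   10.2345(6)
_cell_length_b   7.891(1)
_cell_angle_beta 98.76(2)
_symmetry_space_group_name_H-M 'P 21/c'
loop_
_atom_site_label
_atom_site_fract_x
C1 0.1234(2)
"""
items = parse_simple_items(sample)
print(numeric(items["_cell_length_a"]))
```

Loop headers such as _atom_site_label have no value on their line, so the sketch skips them rather than misreading loop rows.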
eBank-UK Phase 3 Curation & Preservation Study: Sustainability issues
http://www.ukoln.ac.uk/projects/ebank-uk/curation/
Examined four main areas:
1. Audit and certification (TRAC, DRAMBORA, NESTOR, ISO International repository audit and certification BOF Group)
2. The Open Archival Information System (OAIS) and Representation Information (RI)
3. eBank-UK application profile and preservation metadata
4. ePrints.org repository platform
Recommendations:
Self-assessment using DRAMBORA
Consider Representation Information in wider context
Develop preservation strategy
Capture preservation metadata (PREMIS)
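PREMIS records fixity information (a message digest and the algorithm that produced it) alongside an object's size, which lets a repository detect later corruption of a deposited file. The sketch below computes and re-checks those values with hashlib; the dictionary keys merely borrow PREMIS semantic-unit names and are not a PREMIS XML serialisation, and SHA-256 and the file name are assumptions.

```python
import hashlib
import os
import tempfile


def fixity_record(path: str, algorithm: str = "sha256") -> dict:
    """Return size and message digest for one file, reading it in 64 KiB chunks.

    Keys borrow PREMIS semantic-unit names; this is not a PREMIS XML document.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return {
        "size": os.path.getsize(path),
        "messageDigestAlgorithm": algorithm,
        "messageDigest": digest.hexdigest(),
    }


def verify(path: str, recorded: dict) -> bool:
    """Recompute the digest with the recorded algorithm and compare."""
    current = fixity_record(path, recorded["messageDigestAlgorithm"])
    return current["messageDigest"] == recorded["messageDigest"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.cif")  # hypothetical deposited file
    with open(path, "wb") as fh:
        fh.write(b"data_example\n")
    fixity = fixity_record(path)
    intact = verify(path, fixity)  # file unchanged since the record was made
    with open(path, "ab") as fh:
        fh.write(b"# edited\n")
    tampered = verify(path, fixity)  # digest no longer matches
print(fixity, intact, tampered)
```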
A crystallographic schema underpins CIF (Crystallographic Information Framework) but is limited to data parameters, e.g. cell_length_a
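Parameters such as cell_length_a are purely numerical: they support calculations but carry no chemical context. As a hedged illustration, the standard general (triclinic) unit-cell volume formula, V = abc·sqrt(1 − cos²α − cos²β − cos²γ + 2 cosα cosβ cosγ), can be applied directly to the six cell parameters. The parameter values below are hypothetical.

```python
import math


def cell_volume(a: float, b: float, c: float, alpha: float, beta: float, gamma: float) -> float:
    """General (triclinic) unit-cell volume.

    Lengths in angstroms, angles in degrees; the result is in cubic angstroms.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    term = 1.0 - ca**2 - cb**2 - cg**2 + 2.0 * ca * cb * cg
    if term <= 0.0:
        raise ValueError("the three angles do not describe a valid cell")
    return a * b * c * math.sqrt(term)


# Hypothetical parameters keyed by CIF data names, uncertainties already stripped.
cell = {
    "_cell_length_a": 10.0,
    "_cell_length_b": 8.0,
    "_cell_length_c": 6.0,
    "_cell_angle_alpha": 90.0,
    "_cell_angle_beta": 90.0,
    "_cell_angle_gamma": 90.0,
}
names = ("_cell_length_a", "_cell_length_b", "_cell_length_c",
         "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma")
volume = cell_volume(*(cell[n] for n in names))
print(f"V = {volume:.1f} cubic angstroms")
```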
Semantic issues
IUCr Acta Cryst 1992
Limited set of keywords describing methods, properties & applications, compounds, attributes
No established crystallography dictionary or controlled vocabulary to give chemistry context
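Without a controlled vocabulary, depositor keywords vary in spelling, case and abbreviation. A minimal sketch of mapping free-text keywords onto preferred terms through a synonym table follows; the vocabulary entries are hypothetical examples, not an existing crystallography thesaurus.

```python
# Hypothetical controlled vocabulary: preferred term -> accepted variants.
VOCABULARY = {
    "single-crystal X-ray diffraction": ["scxrd", "single crystal xrd"],
    "hydrogen bonding": ["h-bonding", "hydrogen bond", "hydrogen bonds"],
    "polymorphism": ["polymorph", "polymorphs"],
}

# Invert into a lookup table keyed by case-folded variant (preferred terms included).
LOOKUP = {}
for preferred, variants in VOCABULARY.items():
    LOOKUP[preferred.casefold()] = preferred
    for variant in variants:
        LOOKUP[variant.casefold()] = preferred


def normalise(keywords: list[str]) -> tuple[list[str], list[str]]:
    """Split keywords into (preferred terms found, keywords with no match)."""
    mapped, unmatched = [], []
    for kw in keywords:
        key = " ".join(kw.split()).casefold()  # collapse whitespace, ignore case
        term = LOOKUP.get(key)
        if term is None:
            unmatched.append(kw)
        elif term not in mapped:
            mapped.append(term)
    return mapped, unmatched


mapped, unmatched = normalise(["SCXRD", "Hydrogen  Bonds", "polymorph", "solvate"])
print(mapped, unmatched)
```

Unmatched keywords are returned rather than dropped, so they can be reviewed as candidates for the vocabulary.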
What do we want to do?
• Support depositors’ keyword/term assignment
• Facilitate and improve automated indexing
• Support advanced search / browse
• Allow metadata validation & enhancement
• Apply across a heterogeneous Federation
• Cross search, cross browse functionality
• Link data to all associated digital objects
• Develop domain semantics / vocabulary
• Use domain-specific authority files
• Mine to “discover” rather than “find”
• Achieve full inter-disciplinary integration
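Cross-searching a heterogeneous federation typically starts with harvesting. ePrints repositories expose metadata over OAI-PMH, so a federation service could harvest oai_dc records from each member and search them together. The sketch below builds a ListRecords request URL, parses a response and runs a naive term match; the base URL and the inline response are hypothetical, and a production harvester would also follow resumptionToken paging and handle OAI-PMH errors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str) -> str:
    """Build an OAI-PMH ListRecords request for simple Dublin Core metadata."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})


def parse_records(xml_text: str, source: str) -> list[dict]:
    """Extract identifier, title and subjects from each record, tagged with its repository."""
    root = ET.fromstring(xml_text)
    out = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # deleted records carry no metadata
            continue
        out.append({
            "source": source,
            "id": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "subjects": [s.text for s in dc.findall("dc:subject", NS)],
        })
    return out


def cross_search(records: list[dict], term: str) -> list[dict]:
    """Naive case-insensitive match on title and subjects across all harvested records."""
    t = term.casefold()
    return [r for r in records
            if t in r["title"].casefold() or any(t in (s or "").casefold() for s in r["subjects"])]


# Hypothetical response from one federation member.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>urea</dc:title>
          <dc:subject>hydrogen bonding</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://repo.example.org/cgi/oai2"))
records = parse_records(SAMPLE, "repo.example.org")
hits = cross_search(records, "hydrogen")
print([h["title"] for h in hits])
```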
Some (semantic) issues...
• How are terms assigned?
• Informal tags and/or structured KOS?
• How is a vocabulary curated and maintained?
• Can a vocabulary be transformed into an ontology (in the Semantic Web sense)?
• Disambiguation, acronyms, IUPAC names
• Persistent identification for data citation
• Granularity of data citation
• Data (and metadata) quality, provenance, validation
• Embedding within complex workflows
• Use collaborative social approaches?
• Community adoption: becomes part of the culture
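One way to approach disambiguation across repositories is to group records by a structure-derived identifier such as the InChI listed among the repository metadata, rather than by names or acronyms that differ between depositors. The sketch below groups hypothetical federated records by InChI; the record fields, repository labels and names are assumptions.

```python
from collections import defaultdict


def group_by_inchi(records: list[dict]) -> dict[str, list[tuple[str, str]]]:
    """Map each InChI to the (repository, local name) pairs that describe that structure."""
    groups = defaultdict(list)
    for rec in records:
        inchi = rec.get("inchi")
        if inchi:  # records without an InChI cannot be matched this way
            groups[inchi].append((rec["repository"], rec["name"]))
    return dict(groups)


UREA = "InChI=1S/CH4N2O/c2-1(3)4/h(H4,2,3,4)"
records = [
    {"repository": "repo-a", "name": "urea", "inchi": UREA},
    {"repository": "repo-b", "name": "carbamide", "inchi": UREA},
    {"repository": "repo-b", "name": "unknown sample", "inchi": None},
]
groups = group_by_inchi(records)
print(groups)
```

Two differently named entries collapse onto one structure, while the record lacking an InChI is left for other disambiguation methods.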
Questions?
Slides will be available at:
http://wiki.ecrystals.chem.soton.ac.uk/index.php
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html