AHM, Nottingham, September 2004
eBank UK : linking research data, scholarly communication and learning.
Dr Liz Lyon, UKOLN, University of Bath
Dr Simon Coles, School of Chemistry, University of Southampton
Overview
• In context: scholarly communications
– Open Access
– Data, information, workflows and provenance
• The data publication bottleneck
– e-Science and crystallography
– Comb-e-chem Project
• eBank UK
– Information architecture and data flow
– Interoperability issues
• Challenges for the future
Scholarly communications
Current chemistry publishing protocols
Ideas and interpretations
Results & derived data
Hooks into the literature
Raw data!
“It is envisaged that the sharing of primary data would prevent unnecessary repetition of experiments and enable scientists to build directly on each others’ work, creating greater efficiencies and productivity in the research process.”
The government line
Research & e-Science workflows
Aggregator services: national, commercial
Repositories: institutional, e-prints, subject, data, learning objects
Data curation: databases & databanks
Validation
Harvesting metadata
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Deposit / self-archiving
Peer-reviewed publications: journals, conference proceedings
Publication
Validation
Data analysis, transformation, mining, modelling
Searching, harvesting, embedding
Presentation services: subject, media-specific, data, commercial portals
Resource discovery, linking, embedding
Linking
The scholarly knowledge cycle.
Liz Lyon, eBankUK article. Ariadne, July 2003.
Learning & Teaching workflows
Aggregator services: national, commercial
Repositories: institutional, e-prints, subject, data, learning objects
Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
Harvesting metadata
Resource discovery, linking, embedding
Peer-reviewed publications: journals, conference proceedings
Validation
Resource discovery, linking, embedding
Deposit / self-archiving
Learning object creation, re-use
Searching, harvesting, embedding
Quality assurance bodies
Validation
Presentation services: subject, media-specific, data, commercial portals
Learning & Teaching workflows
Research & e-Science workflows
Aggregator services: national, commercial
Repositories: institutional, e-prints, subject, data, learning objects
Data curation: databases & databanks
Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
Validation
Harvesting metadata
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Resource discovery, linking, embedding
Deposit / self-archiving
Peer-reviewed publications: journals, conference proceedings
Publication
Validation
Data analysis, transformation, mining, modelling
Resource discovery, linking, embedding
Deposit / self-archiving
Learning object creation, re-use
Searching, harvesting, embedding
Quality assurance bodies
Validation
Presentation services: subject, media-specific, data, commercial portals
Resource discovery, linking, embedding
Linking
Learning & Teaching workflows
Research & e-Science workflows
Aggregator services: eBank UK
Repositories: institutional, e-prints, subject, data, learning objects
Data curation: databases & databanks
Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
Validation
Harvesting metadata
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Resource discovery, linking, embedding
Deposit / self-archiving
Peer-reviewed publications: journals, conference proceedings
Publication
Validation
Data analysis, transformation, mining, modelling
Resource discovery, linking, embedding
Deposit / self-archiving
Learning object creation, re-use
Searching, harvesting, embedding
Quality assurance bodies
Validation
Presentation services: subject, media-specific, data, commercial portals
Resource discovery, linking, embedding
Linking
The Data Publication Bottleneck
Data Overload!
How do we disseminate?
EPSRC National Crystallography Service
The data deluge
CombeChem: An EPSRC pilot project
X-Ray e-Lab
Analysis
Properties
Properties e-Lab
Simulation
Video
Diffractometer
Grid Middleware
Structures Database
Grid
E-Scientists
Entire E-Science Cycle: encompassing experimentation, analysis, publication, research, learning
Institutional Archive
Local Web Publisher
Holdings
Digital Library
E-Scientists
Graduate Students
Undergraduate Students
Virtual Learning Environment
E-Experimentation
E-Scientists
Technical Reports
Reprints
Peer-Reviewed Journal & Conference Papers
Preprints & Metadata
Certified Experimental Results & Analyses
Data, Metadata & Ontologies
The eBank UK Project
eBank UK project
• JISC-funded for 1 year from September 2003
• UKOLN at the University of Bath (lead), University of Southampton, University of Manchester
• “Building the links between research data, scholarly communication and learning”
• Exemplar: e-Science testbed ‘Combechem’
– Grid-enabled combinatorial chemistry
– Crystallography, laser and surface chemistry examples
– Development of an e-Lab using pervasive computing technology
– National Crystallography Service
• Resource Discovery Network / PSIgate physical sciences portal
• http://www.ukoln.ac.uk/projects/ebank-uk/
The project team
• UKOLN
– Michael Day
– Monica Duke
– Rachel Heery
– Liz Lyon
– +
– Andy Powell
• Southampton
– Les Carr
– Simon Coles
– Jeremy Frey
– Chris Gutteridge
– Mike Hursthouse
• Manchester
– John Blunden-Ellis
First steps: establishing common ground…
• Understand the data creation process
• Terminology and definitions
– Data
– Metadata
– Datafile
– Dataset
– Data holding
• Different views
– Digital library researchers, computer scientists, chemists
– Generic vs specific
– Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema
Progress update
• Version 2.0 eBank metadata schema
• Enhanced ePrints.org software
• Pilot institutional e-data repository for harvesting (raw, derived, results data)
• Exports records as ebank_dc and oai_dc
• Validation of schema
• Pilot eBank UK aggregator service
• Developing search interface Version 1.0
• Testing with PSIgate physical sciences portal
– embedding eBank UK
Crystallography workflow
• Initialisation: mount new sample on diffractometer & set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File format)
• Report: generate Crystal Structure Report
RAW DATA DERIVED DATA RESULTS DATA
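The workflow stages and the three data categories on this slide could be modelled as below. This is a minimal sketch only: the assignment of each stage to raw, derived or results data is an assumption made for illustration (the slide text lists the three labels without stating the mapping), and the names `STAGES`, `category_of` and `stages_in` are hypothetical.

```python
from enum import Enum


class DataCategory(Enum):
    RAW = "raw data"
    DERIVED = "derived data"
    RESULTS = "results data"


# Stage order follows the slide. The category attached to each stage is an
# ASSUMPTION for illustration; the slide only names the three categories.
STAGES = [
    ("Initialisation", DataCategory.RAW),
    ("Collection", DataCategory.RAW),
    ("Processing", DataCategory.DERIVED),
    ("Solution", DataCategory.DERIVED),
    ("Refinement", DataCategory.DERIVED),
    ("CIF", DataCategory.RESULTS),
    ("Report", DataCategory.RESULTS),
]


def category_of(stage_name: str) -> DataCategory:
    """Return the (assumed) data category produced by a workflow stage."""
    for name, category in STAGES:
        if name.lower() == stage_name.lower():
            return category
    raise KeyError(f"unknown workflow stage: {stage_name}")


def stages_in(category: DataCategory) -> list:
    """List the stages, in workflow order, that yield the given category."""
    return [name for name, cat in STAGES if cat is category]
```

Keeping the stage order in a single list means a deposit tool could walk the same sequence when asking which files belong to which part of a data holding.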
Deposition into the archive
An Archive entry
ecrystals.chem.soton.ac.uk
For a demo come to the JISC booth!
Today @ 13:00 & during tea
All the way back to the underlying data…
Some metadata issues
• Using simple and qualified Dublin Core
• Additional chemical information in schema for harvesting e.g. empirical formula
• Schema contains International Chemical Identifier (InChI)
• Links to all datasets associated with an experiment
• Links to individual datasets within an experiment
• Links to eprints (and other published literature) derived from the data
• Using vocabularies specific to crystallography
• Engaging the broader scientific community to ensure different schemas are compliant and standards can emerge
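To make the points above concrete, here is a rough sketch of how a record combining Dublin Core with chemical extras might be assembled. It is not the eBank schema: the `ebank` namespace URI, the `record`, `empiricalFormula` and `inchi` element names, the example URLs and the choice to carry the eprint link as `dcterms:isReferencedBy` on the data record are all assumptions. Only the `dc` and `dcterms` namespaces and the terms `identifier`, `type`, `references` and `isReferencedBy` are standard Dublin Core.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # simple Dublin Core
DCTERMS = "http://purl.org/dc/terms/"     # qualified Dublin Core terms
# HYPOTHETICAL namespace standing in for the real ebank_dc schema.
EBANK = "http://example.org/ebank_dc/"

ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("ebank", EBANK)


def build_record(identifier, formula, inchi, dataset_urls, eprint_urls):
    """Build an illustrative data-holding record as an XML element."""
    record = ET.Element(f"{{{EBANK}}}record")
    ET.SubElement(record, f"{{{DC}}}identifier").text = identifier
    ET.SubElement(record, f"{{{DC}}}type").text = "CrystalStructure"
    # Chemical extras; these two element names are assumptions.
    ET.SubElement(record, f"{{{EBANK}}}empiricalFormula").text = formula
    ET.SubElement(record, f"{{{EBANK}}}inchi").text = inchi
    for url in dataset_urls:   # links to individual datasets
        ET.SubElement(record, f"{{{DCTERMS}}}references").text = url
    for url in eprint_urls:    # links to eprints derived from the data
        ET.SubElement(record, f"{{{DCTERMS}}}isReferencedBy").text = url
    return record


# Illustrative values only (benzene); the URLs are placeholders.
record = build_record(
    identifier="http://example.org/holding/1",
    formula="C6 H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    dataset_urls=["http://example.org/holding/1/raw",
                  "http://example.org/holding/1/cif"],
    eprint_urls=["http://example.org/eprint/42"],
)
xml_text = ET.tostring(record, encoding="unicode")
```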
ebank_dc record (XML)
Crystal structure (data holding)
Crystal structure report (HTML)
Dataset
Dataset
Institutional repository
Deposit
Dataset
dc:identifier
dcterms:references
Linking
dc:type=“CrystalStructure” and/or “Collection”
Model input Andy Powell, UKOLN.
Eprint oai_dc record (XML)
dcterms:isReferencedBy
dc:type=“Eprint” and/or “Text”
Data flow in eBank
Eprint “jump-off” page (HTML)
dc:identifier
Eprint manifestation (e.g. PDF)
Linking
ebank_dc record (XML)
Crystal structure (data holding)
Crystal structure report (HTML)
Dataset
Dataset
Institutional repository
eBank UK aggregator service
ePrint UK aggregator service
Subject service
Deposit
Harvesting OAI-PMH ebank_dc
Harvesting OAI-PMH oai_dc
Harvesting OAI-PMH oai_dc
Searching, linking and embedding
Searching, linking and embedding
Searching, linking and embedding
Dataset
dc:identifier
dcterms:references
Linking
dc:type=“CrystalStructure” and/or “Collection”
Model input Andy Powell, UKOLN.
PSIgate portal
Eprint oai_dc record (XML)
dcterms:isReferencedBy
dc:type=“Eprint” and/or “Text”
Data flow in eBank
Eprint “jump-off” page (HTML)
dc:identifier
Eprint manifestation (e.g. PDF)
Linking
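The harvesting arrows in this data flow use OAI-PMH. A minimal sketch of the aggregator side follows, assuming a hypothetical repository base URL and a hand-written sample response; the `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments and the OAI 2.0 namespace are part of the protocol, while the function names and sample values are illustrative.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
DC = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no metadataPrefix with it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response.

    Each record is a dict with the OAI identifier and any dc:type values.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI}}}record"):
        header_id = rec.findtext(f"{{{OAI}}}header/{{{OAI}}}identifier")
        types = [t.text for t in rec.iter(f"{{{DC}}}type")]
        records.append({"identifier": header_id, "types": types})
    token = root.findtext(f".//{{{OAI}}}resumptionToken")
    return records, (token or None)


# Hand-written sample response, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:type>CrystalStructure</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_records(SAMPLE)
```

A harvester would repeat the request with the returned token until none is supplied, requesting `ebank_dc` for the eBank UK aggregator and `oai_dc` for general-purpose services.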
Harvesting: OAIster
Linking and aggregating: Search & discover
For a demo come to the JISC booth!
Today @ 13:00 & during tea or the buffet
Linking and aggregating: Hit browsing
And finally…eBank embedded in a science portal
Currently we are…
• Assessing outcomes of a Consultation Workshop held in August e.g.
– Cost-benefit issues for researchers?
– RAE / assessment impact?
– Disciplinary differences?
• Presenting a demonstrator
• Completing supporting studies on (1) Provenance and (2) Data models and schema
• Promoting Open Access and Open eData Archives to international crystallographic organisations, publishers, learned societies
• Phase 2 proposal funding sought for further 12 months
Challenges for the future
Phase 2 plan… (1)
• Continue to progress towards generic metadata schemas
• Validation against other schema
– CLRC Scientific Metadata Model
• Modify Eprints.org software to allow for more generic scientific data and schemas
• Metadata enhancement: subject keyword additions based on knowledge of keywords in related publications
• Investigate identifiers e.g. International Chemical Identifier (InChI code)
• Explore context sensitive linking: find me
– Datasets by this person; Journal articles by this person; Datasets related to this subject; Journal articles on this subject; Learning objects by this person; Learning objects on this subject
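The "find me" queries listed above amount to filtering aggregated records by type plus a person or a subject. The sketch below shows one way this might look; the `Record` fields, the `find_me` function and the sample entries are hypothetical and are not taken from the eBank UK search interface.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    title: str
    kind: str                      # "Dataset", "Eprint" or "LearningObject"
    creators: list = field(default_factory=list)
    subjects: list = field(default_factory=list)


def find_me(records, kind, creator=None, subject=None):
    """Return records of one kind that match a creator and/or a subject."""
    hits = []
    for rec in records:
        if rec.kind != kind:
            continue
        if creator is not None and creator not in rec.creators:
            continue
        if subject is not None and subject not in rec.subjects:
            continue
        hits.append(rec)
    return hits


# Made-up sample records for illustration.
CATALOGUE = [
    Record("Structure A", "Dataset", ["A. Chemist"], ["crystallography"]),
    Record("Paper on A", "Eprint", ["A. Chemist"], ["crystallography"]),
    Record("Intro module", "LearningObject", ["B. Tutor"], ["crystallography"]),
    Record("Structure B", "Dataset", ["C. Other"], ["surface chemistry"]),
]

datasets_by_person = find_me(CATALOGUE, "Dataset", creator="A. Chemist")
learning_on_subject = find_me(CATALOGUE, "LearningObject", subject="crystallography")
```

Starting from any one record, a portal could run these queries with that record's creators and subjects to generate the context-sensitive links.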
Phase 2… (2)
• Full embedding into the crystallographic research and publishing communities
• Chemistry workflow embedding
– SMART TEA e synthesis Lab
– Other analytical techniques in chemistry
• e-Learning embedding and pedagogic evaluation
– Undergraduate chemical informatics courses
– Introduction to visiting schools
• Expand into other physical, mathematical, geological and engineering sciences
• Feasibility study in related domains – bio and medical sciences
• Feasibility study in unrelated domains – arts and humanities
Thank you.
Questions?…..