UKOLN is supported by:
Realising the scholarly knowledge cycle:
The experience of eBank UK
Dr Liz Lyon, UKOLN, University of Bath, UK
CNI Task Force Meeting Spring 2004
Alexandria, Virginia,
www.bath.ac.uk
a centre of expertise in digital information management
www.ukoln.ac.uk
CNI Spring 2004 2
Overview
• Setting the scene
  – e-Research trends
  – Towards a common infrastructure
• The scholarly knowledge cycle
  – Data, information and workflows
  – Provenance
• eBank UK Project
  – The experience so far
  – Issues arising
• Challenges for the future
Setting the scene
“The next generation of research breakthroughs will rely upon new ways of handling the immense amounts of data that are being produced by modern research methods and equipment, such as telescopes, particle accelerators, genome sequencers and biological imagers….Similar developments are having an impact in the arts and humanities, and in the social sciences.”
A Vision for Research,
Research Councils UK, December 2003.
Report of the National Science Foundation
Blue-Ribbon Advisory Panel on Cyberinfrastructure
2003
http://www.cise.nsf.gov/sci/reports/toc.cfm
UK e-Science Programme
“e-Science is about global collaboration in key areas of science and the next generation of infrastructure that will enable it.”
John Taylor, Director General, Research Councils, UK
Powering the Virtual Universe
http://www.astrogrid.org
(Edinburgh, Belfast, Cambridge, Leicester, London, Manchester, RAL)
AstroGrid will provide advanced, Grid-based federation and data mining tools to facilitate better and faster scientific output.
Picture credits: “NASA / Chandra X-ray Observatory / Herman Marshall (MIT)”, “NASA/HST/Eric Perlman (UMBC)”, “Gemini Observatory/OSCIR”, “VLA/NSF/Eric Perlman (UMBC)/Fang Zhou, Biretta (STScI)/F Owen (NRA)”
e-Research: the trends?
• Increasingly data-intensive, quantitative
• Open access to data and information
  – OECD Declaration January 2004
• Implementing new science
• Inter-disciplinary
• New disciplines e.g. Astro-informatics
• New skills requirements
  – IT + statistics + domain
• Collaborative – virtual / transient – communities / organisations
• Highly distributed resources
New resources… used in new ways
• Primary / original data
  – Observational, experimental, numeric, genomic, 2/3D molecular structures, satellite images, electron micrographs, wave spectra, CAD, musical compositions, VR, performances, animations
• Data and information
  – Creation, discovery, gathering, aggregation, dis-aggregation, replication, federation, manipulation, transformation, linking, annotation, editing/versioning, validation, (self-)archiving, deposit, publication, curation
• Knowledge extraction and management
  – Analysis (textual, musical, statistical, mathematical, visual, chemical, gene…)
  – Mining (text, data, structures…)
  – Modelling (economic, mathematical, biological…)
  – Simulation (molecular, physical, environmental, games…)
  – Presentation (visualisation, rendering…)
Towards a common infrastructure
• UK e-Science Programme & JISC Development
• e-Science Phase 2 2003 – 2006
  – A National e-Science Centre linked to a network of Regional Grid Centres
  – An Open Middleware Infrastructure Institute (OMII) based on common standards (Web Services)
• JISC Information Environment
  – Technical architecture based on open standards (Web Services, OAI-PMH, Z39.50, RSS…) http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/
  – A Digital Curation Centre (DCC) http://www.dcc.ac.uk/
  – Virtual Research Environments?
• A changing landscape of scholarly communications
The scholarly knowledge cycle
Research & e-Science workflows
Aggregator services: national, commercial
Repositories: institutional, e-prints, subject, data, learning objects
Data curation: databases & databanks
Validation
Harvesting metadata
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Deposit / self-archiving
Peer-reviewed publications: journals, conference proceedings
Publication
Validation
Data analysis, transformation, mining, modelling
Searching, harvesting, embedding
Presentation services: subject, media-specific, data, commercial portals
Resource discovery, linking, embedding
Linking
Learning & Teaching workflows
Aggregator services: national, commercial
Repositories: institutional, e-prints, subject, data, learning objects
Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
Harvesting metadata
Resource discovery, linking, embedding
Peer-reviewed publications: journals, conference proceedings
Validation
Resource discovery, linking, embedding
Deposit / self-archiving
Learning object creation, re-use
Searching, harvesting, embedding
Quality assurance bodies
Validation
Presentation services: subject, media-specific, data, commercial portals
Learning & Teaching workflows
Research & e-Science workflows
Aggregator services: national, commercial
Repositories: institutional, e-prints, subject, data, learning objects
Data curation: databases & databanks
Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
Validation
Harvesting metadata
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Resource discovery, linking, embedding
Deposit / self-archiving
Peer-reviewed publications: journals, conference proceedings
Publication
Validation
Data analysis, transformation, mining, modelling
Resource discovery, linking, embedding
Deposit / self-archiving
Learning object creation, re-use
Searching, harvesting, embedding
Quality assurance bodies
Validation
Presentation services: subject, media-specific, data, commercial portals
Resource discovery, linking, embedding
Linking
Learning & Teaching workflows
Research & e-Science workflows
Aggregator services: eBank UK
Repositories: institutional, e-prints, subject, data, learning objects
Data curation: databases & databanks
Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules
Validation
Harvesting metadata
Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media
Resource discovery, linking, embedding
Deposit / self-archiving
Peer-reviewed publications: journals, conference proceedings
Publication
Validation
Data analysis, transformation, mining, modelling
Resource discovery, linking, embedding
Deposit / self-archiving
Learning object creation, re-use
Searching, harvesting, embedding
Quality assurance bodies
Validation
Presentation services: subject, media-specific, data, commercial portals
Resource discovery, linking, embedding
Linking
The eBank UK Project
eBank UK project
• JISC-funded for 1 year from September 2003
• UKOLN (lead), University of Southampton, University of Manchester
• “Building the links between research data, scholarly communication and learning”
• e-Science testbed Comb-e-Chem
  – Grid-enabled combinatorial chemistry
  – Crystallography, laser and surface chemistry
  – Development of an e-Lab using pervasive computing technology
  – National Crystallography Service
• Resource Discovery Network PSIgate physical sciences portal
• http://www.ukoln.ac.uk/projects/ebank-uk/
The project team
• UKOLN
  – Michael Day
  – Monica Duke
  – Rachel Heery
  – Liz Lyon
  – + Andy Powell
• Southampton
  – Les Carr
  – Simon Coles
  – Jeremy Frey
  – Chris Gutteridge
  – Mike Hursthouse
• Manchester
  – John Blunden-Ellis
Key Deliverables
1. Requirements specification
2. Pilot service
3. Two supporting studies:
  – Provenance: review of current research
  – Feasibility report on dataset description and schema
4. Consultative evaluation workshop and report
5. Recommendations for future work
Diagram by Andy Powell, UKOLN
Pilot service – technical architecture
Comb-e-Chem Project
X-Ray e-Lab
Analysis
Properties
Properties e-Lab
Simulation
Video
Diffractometer
Grid Middleware
Structures Database
Crystallography workflow
• Initialisation: mount new sample on diffractometer & set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structures
• Refinement: refine structure
• CIF: produce CIF
• Report: generate Crystal Structure Report
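The stages above form a strictly ordered pipeline. A minimal sketch of that ordering follows; the `Stage` enum and the `remaining_stages` helper are illustrative names chosen here, not part of the Comb-e-Chem or eBank UK software.

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    """Crystallography workflow stages, numbered in the order the slide lists them."""
    INITIALISATION = 1  # mount sample on diffractometer, set up data collection
    COLLECTION = 2      # collect data
    PROCESSING = 3      # process and correct images
    SOLUTION = 4        # solve structure
    REFINEMENT = 5      # refine structure
    CIF = 6             # produce CIF
    REPORT = 7          # generate Crystal Structure Report


def remaining_stages(completed: Stage) -> List[Stage]:
    """Return the stages still to run after `completed`, in workflow order."""
    return [stage for stage in Stage if stage > completed]


if __name__ == "__main__":
    for stage in remaining_stages(Stage.PROCESSING):
        print(stage.name)
```

Modelling the stages as an ordered enumeration is only one option; it makes it easy to ask which outputs should already exist for a given sample at a given point in the workflow.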
First steps: establishing common ground…
• Understand the data creation process
• Terminology and definitions
  – Data
  – Metadata
  – Datafile
  – Dataset
  – Data holding
• Different views
  – Digital library researchers, computer scientists, chemists
  – Generic vs specific
  – Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema
ebank_dc record (XML)
Crystal structure (data holding)
Crystal structure report (HTML)
Dataset
Dataset
Institutional repository
eBank UK aggregator service
ePrint UK aggregator service
Subject service
Deposit
Harvesting OAI-PMH ebank_dc
Harvesting OAI-PMH oai_dc
Harvesting OAI-PMH oai_dc
Searching, linking and embedding
Searching, linking and embedding
Searching, linking and embedding
Dataset
dc:identifier
dcterms:references
Linking
dc:type=“CrystalStructure” and/or “Collection”
Model input Andy Powell, UKOLN.
PSIgate portal
Eprint oai_dc record (XML)
dcterms:isReferencedBy
dc:type=“Eprint” and/or “Text”
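A rough sketch of how the data-side record in this model might be serialised follows. It is one reading of the diagram, not the eBank UK schema itself: it assumes dc:identifier points at the data holding and dcterms:references at the related eprint. The two Dublin Core namespace URIs are the standard ones; the `EBANK` namespace, the `record` element name and the example URLs are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
DCTERMS = "http://purl.org/dc/terms/"     # DCMI metadata terms namespace
EBANK = "http://example.org/ebank_dc/"    # hypothetical placeholder namespace


def build_record(holding_url: str, eprint_url: str) -> ET.Element:
    """Build an ebank_dc-style record linking a data holding to an eprint."""
    ET.register_namespace("dc", DC)
    ET.register_namespace("dcterms", DCTERMS)
    ET.register_namespace("ebank_dc", EBANK)

    record = ET.Element(f"{{{EBANK}}}record")  # element name is an assumption
    ET.SubElement(record, f"{{{DC}}}type").text = "CrystalStructure"
    # Assumed reading of the diagram: the identifier resolves to the data holding.
    ET.SubElement(record, f"{{{DC}}}identifier").text = holding_url
    # Assumed reading of the diagram: references links out to the eprint.
    ET.SubElement(record, f"{{{DCTERMS}}}references").text = eprint_url
    return record


if __name__ == "__main__":
    rec = build_record("http://example.org/holding/1", "http://example.org/eprint/9")
    print(ET.tostring(rec, encoding="unicode"))
```

The eprint's oai_dc record would carry the reverse link, dcterms:isReferencedBy, so that a service harvesting either record can follow the relationship in both directions.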
Where are we now?
• Version 1.0 eBank metadata schema
• Pilot eBank repository for harvesting
• Exports records as ebank_dc and oai_dc
• Validation of schema
  – Against harvesting and searching
  – Against user requirements
  – Against other schemas
• Concept of a collection and a Collection Level Description
• Implementing the pilot service
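Because the pilot repository exports both ebank_dc and oai_dc, a harvester chooses between them with the OAI-PMH metadataPrefix argument. The sketch below only builds ListRecords request URLs; `BASE_URL` is a made-up endpoint, not the address of the eBank UK repository, and no network request is made.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://repository.example.org/oai"  # hypothetical endpoint


def list_records_url(base_url: str, metadata_prefix: str,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: when a partial
    list is being continued it is sent alone with the verb, so the
    metadataPrefix is left out of follow-up requests.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(list_records_url(BASE_URL, "ebank_dc"))  # richer, data-aware records
    print(list_records_url(BASE_URL, "oai_dc"))    # simple Dublin Core records
```

An aggregator that only understands simple Dublin Core would request oai_dc, while a data-aware service would ask the same repository for ebank_dc.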
Challenges for the future
What next?
The metadata schema… some issues
• Reduce to its simplest form or reflect the complexity?
• ebank_dc versus oai_dc
• Compatibility with other schemas
  – CLRC Scientific Metadata Model vs 1.0 2001 (under revision) http://www-dienst.rl.ac.uk/library/2002/tr/dltr-2002001.pdf
• Investigate packaging options
  – METS
  – MPEG 21 DIDL
  – ??
• Expand to include SMART e-Lab metadata e.g. sample preparation
…and also…
• Investigate identifiers e.g. International Chemical Identifier
• Metadata enhancement - subject keyword additions to datasets based on knowledge of keywords in related publications
• Develop search interface – embedding eBank UK
• Testing with PSIgate physical sciences portal
• Explore context-sensitive linking: find me
  – Datasets by this person
  – Journal articles by this person
  – Datasets related to this subject
  – Journal articles on this subject
  – Learning objects by this person
  – Learning objects on this subject
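The "find me" queries listed above all share one shape: pick a resource type, then filter by person or by subject. A minimal in-memory sketch of that idea follows; the `Record` class, the `find` function and the sample records are hypothetical illustrations, not the eBank UK search interface.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """A harvested metadata record reduced to the fields the queries need."""
    title: str
    kind: str                       # "Dataset", "Eprint" or "LearningObject"
    creator: str
    subjects: Tuple[str, ...] = ()


def find(records: List[Record], kind: str,
         creator: Optional[str] = None,
         subject: Optional[str] = None) -> List[Record]:
    """Return records of one kind, optionally filtered by creator and/or subject."""
    hits = []
    for record in records:
        if record.kind != kind:
            continue
        if creator is not None and record.creator != creator:
            continue
        if subject is not None and subject not in record.subjects:
            continue
        hits.append(record)
    return hits


# Invented sample data, for illustration only.
RECORDS = [
    Record("Structure A", "Dataset", "A. Chemist", ("crystallography",)),
    Record("Paper on A", "Eprint", "A. Chemist", ("crystallography",)),
    Record("Structure B", "Dataset", "B. Chemist", ("surface chemistry",)),
    Record("Intro tutorial", "LearningObject", "B. Chemist", ("crystallography",)),
]

if __name__ == "__main__":
    # "Datasets by this person"
    print([r.title for r in find(RECORDS, "Dataset", creator="A. Chemist")])
    # "Learning objects on this subject"
    print([r.title for r in find(RECORDS, "LearningObject", subject="crystallography")])
```

In a real service the same queries would run against harvested metadata, which is why consistent creator and subject values across datasets, eprints and learning objects matter.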
Potential longer term impact
1. Track data, information and workflows in e-research and scholarly communications – knowledge audit??
2. Validate the accuracy and authenticity of derived works – ideas audit??
3. Facilitate explicit referencing and acknowledgment of original contributors – intellectual integrity??
4. Raise standards associated with publication of research outputs – academic publishing rigour??
5. Implement open access to and dissemination of data and information – enhance the research process??
6. Give students links to original data underpinning published works – enhance the learning process??
Thank you.
Questions?…..