UKOLN is supported by:

From research data to new knowledge: a lifecycle approach.

Dr Liz Lyon, Director

UKOLN, University of Bath, UK

JISC/SURF/CNI Conference May 2005, Amsterdam.

www.bath.ac.uk

a centre of expertise in digital information management

www.ukoln.ac.uk

Overview

1. Scholarly communications in flux

2. e-Research and the diversity of data

3. Repositories & meta-functionality
• Realising the link to learning: eBank UK
• Providing value-added services
• Enabling knowledge extraction & post-processing

4. Look at (some of) the issues en route

1. Scholarly communications in flux

A medieval scriptorium…

Research & e-Science workflows

Aggregator services: national, commercial

Repositories: institutional, e-prints, subject, data, learning objects

Harvesting metadata

Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media

Deposit / self-archiving

Peer-reviewed publications: journals, conference proceedings

Publication

Validation

Data analysis, transformation, mining, modelling

Searching, harvesting, embedding

Presentation services: subject, media-specific, data, commercial portals

Resource discovery, linking, embedding

The scholarly knowledge cycle. Liz Lyon, Ariadne, July 2003.

Learning & Teaching workflows

Aggregator services: national, commercial

Repositories: institutional, e-prints, subject, data, learning objects

Institutional presentation services: portals, Learning Management Systems, u/g, p/g courses, modules

Harvesting metadata

Resource discovery, linking, embedding

Peer-reviewed publications: journals, conference proceedings

Validation

Resource discovery, linking, embedding

Deposit / self-archiving

Learning object creation, re-use

Searching, harvesting, embedding

Quality assurance bodies

Validation

Presentation services: subject, media-specific, data, commercial portals

Learning & Teaching workflows

Research & e-Science workflows

[Diagram: the two workflow diagrams from the preceding slides combined into one, with the same labels.]

2. e-Research and the diversity of data

Assuring permanent open access to the records of science & the humanities?

Long term access to primary data

• Increasing data volumes from eScience and Grid-enabled / cyberinfrastructure applications

• Changing research paradigm: data-driven science, “big science”

• Observational data, simulations, large-scale experimentation, computations

• Multi-media resources, statistical data, surveys, geo-spatial data……

Diversity of data collections

• Very large, relatively homogeneous: Large Hadron Collider (LHC) outputs from CERN
• Smaller, heterogeneous and richer collections: World Data Centre for Solar-terrestrial Physics, CCLRC
• Small-scale laboratory results: “jumping robots” project at the University of Bath
• Population survey data: UK Biobank

• Highly sensitive, personal data: patient care records

Taxonomy of data collections

• Research collections: jumping robots
• Community collections: Flybase at Indiana (with UC Berkeley)
• Reference collections: Protein Data Bank

Source: NSF Long-Lived Digital Data Collections, draft report, March 2005

Evolution……

Repository evolution:

• 1971: Research collection, <12 files
• 2005: Reference collection, >2700 structures deposited in 6 months

1. Issues: research data as content

• Sharing it!
• Data diversity
– Homo- or heterogeneous
– Raw and derived / processed
– Sensitivity
– Fast or slow growth in volume
• Repository evolution:
– Likelihood to scale up (from bytes to petabytes)
– Quality assurance (from the start)
– Community-based standards development (“folksonomies”)
– Build robust services

3. Repositories & meta-functionality

eBank UK: linking research data to learning

• JISC-funded September 2003, Phase 2 February 2005
• UKOLN at the University of Bath (lead), University of Southampton, University of Manchester
• Exemplar: e-Science testbed ‘Combechem’
– Grid-enabled combinatorial chemistry
– Crystallography, laser and surface chemistry examples
– Development of an e-Lab using pervasive computing technology
– National Crystallography Service

• Resource Discovery Network / PSIgate physical sciences portal

• http://www.ukoln.ac.uk/projects/ebank-uk/

Learning & Teaching workflows

Research & e-Science workflows

[Diagram: the combined workflow diagram with the same labels as on the earlier workflow slides, except that the aggregator services are now labelled eBank UK.]

Data Flow in eBank UK

[Diagram labels: Create; Submit; Store/link; Data files; Metadata; Institutional repository (present HTML); OAI-PMH; Harvest (XML); eBank aggregator (Index and Search; present HTML)]
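A minimal sketch of the harvest step, assuming the institutional repository answers OAI-PMH requests with records in the standard `oai_dc` (Dublin Core) format. The slide names OAI-PMH and an XML harvest but not the metadata schema, so the schema choice, the sample response and the function name `parse_list_records` are illustrative assumptions rather than the eBank UK implementation.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response; identifier and field values are made up.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:101</identifier>
        <datestamp>2005-04-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example crystal structure dataset</dc:title>
          <dc:creator>Example, A.</dc:creator>
          <dc:subject>crystallography</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Turn an OAI-PMH ListRecords response into a list of plain dicts."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        fields = {}
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is not None:
            for element in dc:
                # Strip the "{namespace}" prefix to get the bare element name.
                name = element.tag.split("}", 1)[-1]
                fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, **fields})
    return records


records = parse_list_records(SAMPLE_RESPONSE)
```

An aggregator could then index the resulting dictionaries for search; fetching the response over HTTP and handling resumption tokens are left out of this sketch.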

Comb-e-Chem Project

[Diagram labels: X-Ray e-Lab; Analysis; Properties; Properties e-Lab; Simulation; Video; Diffractometer; Grid Middleware; Structures Database]

The digital repository

ecrystals.chem.soton.ac.uk

Acknowledgement: Simon Coles

Access to the underlying data

Harvesting: OAIster

Aggregating: search & discover

Linking to publications

eBank embedded in a science portal

eBank Phase 2: linking to learning

• Embedding in e-Learning processes
• Evaluating the pedagogical benefits
– MChem course
– Chemical informatics course

2. Issues: generic data models, metadata schema & terminology

• Validation against other schema
– CCLRC Scientific Data Model Version 2
• Complex digital objects and packaging options
– METS
– MPEG 21 DIDL
• Terminologies
– Domain: crystallography
– Inter-disciplinary e.g. biomaterials
– Metadata enhancement: subject keyword additions to datasets based on knowledge of keywords in related publications
– Meaningful resource discovery?
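One way to picture the metadata enhancement idea is the sketch below, which copies keywords from related publications into a dataset's subject list. The record layout (`subjects`, `related_publications`, `keywords`) and the function name `enhance_subjects` are hypothetical, chosen only for illustration; the slides do not describe how eBank UK implements this.

```python
def enhance_subjects(dataset, publications):
    """Return a copy of `dataset` whose subjects include related publications' keywords."""
    related_ids = set(dataset.get("related_publications", []))
    subjects = list(dataset.get("subjects", []))
    # Compare case-insensitively so "Crystallography" does not duplicate "crystallography".
    seen = {s.lower() for s in subjects}
    for pub in publications:
        if pub["id"] not in related_ids:
            continue
        for keyword in pub.get("keywords", []):
            if keyword.lower() not in seen:
                seen.add(keyword.lower())
                subjects.append(keyword)
    return {**dataset, "subjects": subjects}


# Made-up example records.
dataset = {
    "id": "ds1",
    "subjects": ["crystallography"],
    "related_publications": ["pub1"],
}
publications = [
    {"id": "pub1", "keywords": ["Crystallography", "hydrogen bonding"]},
    {"id": "pub2", "keywords": ["laser chemistry"]},
]

enhanced = enhance_subjects(dataset, publications)
```

Whether keywords added this way give meaningful resource discovery is exactly the open question the slide raises; the sketch only shows the mechanics.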

3. Issues: linking and identifiers

• Links to individual datasets within an experiment
• Links to all datasets associated with an experiment or a data collection
• Links to derived eprints and published literature
• Context sensitive linking: find me
– Datasets by this author / creator
– Datasets related to this subject
– Learning objects by this author / creator
– Learning objects related to this subject
• Identifiers and persistence
– “generic”
– domain: International Chemical Identifier (InChI code)
• Resource discovery: Google Scholar?
• Provenance: authenticity, authority, integrity?
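The "find me" queries can be sketched as lookups in a small index keyed by resource type and by creator or subject. The record fields, the sample data and the helper names `build_index` and `find_me` are assumptions made for this example, not part of any eBank UK interface.

```python
from collections import defaultdict

# Made-up records standing in for harvested metadata.
RECORDS = [
    {"id": "ds1", "type": "dataset",
     "creators": ["Example, A."], "subjects": ["crystallography"]},
    {"id": "ds2", "type": "dataset",
     "creators": ["Other, B."], "subjects": ["crystallography"]},
    {"id": "lo1", "type": "learning object",
     "creators": ["Example, A."], "subjects": ["chemical informatics"]},
]


def build_index(records, field):
    """Map (resource type, field value) to the ids of matching records."""
    index = defaultdict(list)
    for record in records:
        for value in record.get(field, []):
            index[(record["type"], value)].append(record["id"])
    return index


def find_me(index, resource_type, value):
    """Answer 'find me <resource_type>s by this creator / on this subject'."""
    return index.get((resource_type, value), [])


by_creator = build_index(RECORDS, "creators")
by_subject = build_index(RECORDS, "subjects")

datasets_by_author = find_me(by_creator, "dataset", "Example, A.")
learning_objects_by_author = find_me(by_creator, "learning object", "Example, A.")
datasets_on_subject = find_me(by_subject, "dataset", "crystallography")
```

Matching on creator name strings is fragile in practice, which is one reason the slide pairs context-sensitive linking with persistent identifiers.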

4. Issues: embedding and workflow

• Into the crystallographic publishing community
– International Union of Crystallography
• Into the chemistry research workflow
– SMART TEA Digital Lab Book e-synthesis Lab
– Other analytical techniques and instrumentation
• Into the curriculum and e-Learning workflows
– MChem course
– Undergraduate Chemical Informatics courses

Repositories and digital curation

• Data preservation: static. For later use?
• Data curation: dynamic. In use now (and the future)?

“maintaining and adding value to a trusted body of digital information for current and future use”

Provide value-added services

Annotation

• e-Lab books (Smart Tea Project in chemistry)

• Gene and protein sequences

Enable “post-processing” and knowledge extraction

The acquisition of newly-derived information and knowledge from repository content

• Run complex algorithms over primary datasets

• Mining (data, text, structures)

• Modelling (economic, climate, mathematical, biological)

• Analysis (statistical, lexical, pattern matching, gene)

• Presentation (visualisation, rendering)

5. Issues: “knowledge services”

• Layered over repositories
– Annotation
– Mining, modelling, analysis
– Visualisation
• Across multiple repositories
– Grid enabled applications
– Highly distributed, dynamic and collaborative
• Associated with curatorial responsibility
– UK Digital Curation Centre http://www.dcc.ac.uk

Issues summary

1. Research data is diverse, increasing rapidly in volume and complexity

2. Repository collections are dynamic and evolve

3. Technical challenges associated with interoperability, persistence, provenance, resource discovery and infrastructure provision

4. Embedding in workflow is critical: scholarly communications, research practice, learning

5. Knowledge extraction tools will generate new discoveries based on repository content

6. Repository solutions must scale: M2M processing will become the norm……