
21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle


The curation of laboratory experimental data as part of the overall data lifecycle

Jeremy G. Frey
School of Chemistry, University of Southampton, UK

21 Nov 2006
DCC Conference, Glasgow


If you do things right at the start then all the following processes are much easier!

Exponentially growing amount of data - the future overwhelms the past


The CombeChem Project

End to End linking of data and information

Publication@Source

So collect data with regard to how it could eventually be used
Make sure the metadata is of high quality
Record properly at source in Digital Form

The Chemistry Lab

People & Machines working together


CombeChem

Smart Lab

R4L

e-Bank

E-Malaria

Instruments on the Grid

BioSimGrid

Statistics


The concept of Publication @ Source

[Figure labels: Goal, Literature, Plan & COSHH, Synthesis, Analysis, Digital Model, Information Integration, Report, Knowledge; Smart Laboratory, Smart Workflow, Smart Storage, Smart HCI, Smart Dissemination]

Not just one laboratory but many co-laboratories working together


If only I knew exactly how she did this experiment

I know all this supplementary information could be useful but will people really remember the format? Is it worth all the hassle?

I wish I could get the numbers from this graph - the pdf is not much use.

I wish I had recorded things at the start the way I do now…..

Typical Laboratory


First, they do an online search

Need to make the data available

Need to be able to find it

But how to expose it?


I am sure we collected that information a few years ago…

The details should be in her thesis…..

Can you read what he says here….?

Can you find the file of data that were used to make the plot?

Some of these problems are due to the lack of information recorded at the time. Others are due to loss of information over time.


What are the people up to?

Capture Data and Context
People
Process
Environment


Permanent, documented and primary record of laboratory observations


Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook

If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA


COSHH

Leverage off things we already have to do – “We have a cunning plan”


[Figure: plan, process and record for an example synthesis. CombeChem, 30 January 2004; gvh, hrm, gms]

Ingredient List

  Fluorinated biphenyl 0.9 g
  Br11OCB 1.59 g
  Potassium Carbonate 2.07 g
  Butanone 40 ml

Plan (To Do List)

  Dissolve 4-fluorinated biphenyl in butanone
  Add K2CO3 powder
  Heat at reflux for 1.5 hours
  Cool and add Br11OCB
  Heat at reflux until completion
  Cool and add water (30 ml)
  Extract with DCM (3 x 40 ml)
  Combine organics, dry over MgSO4 & filter
  Remove solvent in vacuo
  Fuse compound to silica & column in ether/petrol

Process

  1 Add
  2 Add
  3 Reflux
  4 Cool
  5 Add
  6 Reflux
  7 Cool
  8 Add
  9 Liquid-liquid extraction
  10 Dry
  11 Filter (Buchner)
  12 Remove Solvent by Rotary Evaporation
  13 Fuse
  14 Column Chromatography

Record (inputs and observations)

  Sample of 4-fluorinated biphenyl: Weigh, 0.9031 g
  Butanone: Measure, 40 ml
  Sample of K2CO3 powder: Weigh, 2.0719 g
  Sample of Br11OCB: Weigh, 1.5918 g
  Water: Measure, 30 ml
  DCM: Measure, 3 of 40 ml
  MgSO4: Measure, excess
  Silica
  Ether/Petrol: Ratio

Annotations

  Butanone dried via silica column and measured into 100 ml RB flask.
  Used 1 ml extra solvent to wash out container.
  Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45 min, next step 14:15.
  Inorganics dissolve 2 layers. Added brine ~20 ml.
  Organics are yellow solution
  Washed MgSO4 with DCM ~50 ml

Observation Types

  weight - grammes
  measure - ml, drops
  annotate - text
  temperature - K, °C

Key

  Process
  Input
  Literal
  Observation

Future Questions

  Whether to have many subclasses of processes or fewer with annotations
  How to depict destructive processes
  How to depict taking lots of samples
  What is the observation/process boundary? e.g. MRI scan
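
The plan, process and record on this slide can be read as a small data model. The following is a minimal sketch under stated assumptions, not the CombeChem implementation: the class names `Observation` and `ProcessStep` and the helpers `total_mass_g` and `example_record` are hypothetical, and "g" is used as shorthand for the slide's "grammes". The step descriptions, quantities and observation types are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Observation types and their permitted units, as listed on the slide.
OBSERVATION_UNITS = {
    "weight": {"g"},
    "measure": {"ml", "drops"},
    "annotate": {"text"},
    "temperature": {"K", "°C"},
}


@dataclass
class Observation:
    kind: str                      # one of the keys of OBSERVATION_UNITS
    unit: str
    value: Optional[float] = None  # numeric reading, if any
    text: str = ""                 # free-text annotation, if any

    def __post_init__(self) -> None:
        # Reject units that the slide does not list for this observation type.
        if self.unit not in OBSERVATION_UNITS.get(self.kind, set()):
            raise ValueError(f"unit {self.unit!r} not valid for {self.kind!r}")


@dataclass
class ProcessStep:
    number: int
    planned: str                   # what the plan says should happen
    observations: List[Observation] = field(default_factory=list)

    def record(self, obs: Observation) -> None:
        """Attach what was actually observed to the planned step."""
        self.observations.append(obs)


def total_mass_g(steps: List[ProcessStep]) -> float:
    """Sum every recorded weight observation across the steps."""
    return sum(o.value for s in steps for o in s.observations
               if o.kind == "weight" and o.value is not None)


def example_record() -> List[ProcessStep]:
    """First three steps of the slide's plan, with the recorded values."""
    steps = [
        ProcessStep(1, "Dissolve 4-fluorinated biphenyl in butanone"),
        ProcessStep(2, "Add K2CO3 powder"),
        ProcessStep(3, "Heat at reflux for 1.5 hours"),
    ]
    steps[0].record(Observation("weight", "g", 0.9031))
    steps[0].record(Observation("measure", "ml", 40.0))
    steps[1].record(Observation("weight", "g", 2.0719))
    steps[2].record(Observation(
        "annotate", "text",
        text="Started reflux at 13.30. Only reflux for 45 min."))
    return steps
```

Keeping the planned text and the observations on the same step object is one way to preserve the context the slide asks for: the deviation from the plan (45 minutes instead of 1.5 hours) stays attached to the step it belongs to.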


[Figure: enlarged detail of the first three steps of the plan (dissolve 4-fluorinated biphenyl in butanone, add K2CO3 powder, heat at reflux for 1.5 hours), showing the weighed and measured quantities (0.9031 g, 40 ml, 2.0719 g), the annotations on drying the butanone and on the shortened 45 min reflux, and the ingredient list.]

Pub-Sub systems provide the flexible & extensible approach to distribution of real time laboratory monitoring & archiving

[Figure labels: Data Source (x2), Message Broker, Archive Client, Web Client, Mobile phone, PDA, Translator Service, BLOG, “Air Conditioning failed”]

Smart Laboratory Spaces
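
The publish-subscribe pattern named on this slide can be sketched in a few lines. This is an in-process stand-in for illustration only, assuming nothing about the broker actually used in the project: the class `MessageBroker`, the function `run_demo` and the topic name `lab/environment` are hypothetical. The message text is the one shown in the figure.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, str], None]


class MessageBroker:
    """Toy broker: routes each published message to every subscriber of its topic."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message and return how many subscribers received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


def run_demo() -> Tuple[int, List[Tuple[str, str]], List[str]]:
    broker = MessageBroker()
    archive: List[Tuple[str, str]] = []   # stands in for the archive client
    blog: List[str] = []                  # stands in for the blog feed

    broker.subscribe("lab/environment", lambda t, m: archive.append((t, m)))
    broker.subscribe("lab/environment", lambda t, m: blog.append(f"ALERT: {m}"))

    # A data source publishes without knowing who is listening.
    delivered = broker.publish("lab/environment", "Air Conditioning failed")
    return delivered, archive, blog
```

The data source never refers to the archive or the blog directly, so a new client (a phone or PDA in the figure) can be added by subscribing, without touching the publisher.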


But what about the laboratory environment?

“I just realized, Howard, that everything in this apartment is more sophisticated than we are”

Semantic DataGrid

CombeChem used, tested & strained the Semantic Web for

  Enhanced (annotated) DataGrid over multiple diverse stores
  Storage of Provenance Information
  Some Data Storage
  Annotated multimedia streams
  Units & Properties Ontology
  Multiple Triple Stores
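
To make the idea of annotated data spread over multiple triple stores concrete, here is a minimal sketch in plain Python. It is not the CombeChem schema or any real triple-store API: the class `TripleStore`, the function `query_all`, and every `ex:` and `unit:` identifier are hypothetical names chosen for illustration. Only the measured value 0.9031 g comes from the earlier slide.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Tiny in-memory store of subject-predicate-object statements."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(t for t in self._triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))


def query_all(stores: Iterable[TripleStore], s: Optional[str] = None,
              p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Run the same pattern over several stores and merge the answers."""
    results: Set[Triple] = set()
    for store in stores:
        results.update(store.match(s, p, o))
    return sorted(results)


def build_example() -> Tuple[TripleStore, TripleStore]:
    # One store holds the value and its unit ...
    data = TripleStore()
    data.add("ex:mass1", "ex:hasValue", "0.9031")
    data.add("ex:mass1", "ex:hasUnit", "unit:gram")
    data.add("ex:mass1", "ex:measures", "ex:sampleA")
    # ... a second store holds the provenance of that value.
    prov = TripleStore()
    prov.add("ex:mass1", "ex:recordedDuring", "ex:step1")
    prov.add("ex:mass1", "ex:recordedBy", "ex:researcher1")
    return data, prov
```

Because both stores use the same identifier for the measurement, a query across them returns the number, its unit, and who recorded it in one answer.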


Laboratory “Blogs”

Laboratory notebook is a Blog
Encourage and facilitate collaboration
Need a data repository behind the Blog

  R4L
  E-Bank

Flexible
Service oriented approach being developed

A VRE


Instrument Blog

‘Blog-jects’


The ‘Scientific Blog’ is being tried in an attempt to combine laboratory notebooks and publication


Format Issues – everyday and for the long term


Note the use of “YouTube”

An experiment that failed… Publishable? Useful?


Record the ‘Scientific Conversation’ – this part of the record often exists only in the ‘grey literature’

CoAKTing

Memetic


Laboratory IRs and Information Management


Repositories


Validation

Increasing the value of data
How to bring all the necessary information together to enable appropriate validation
Increasingly difficult & expensive to achieve
Need provenance and context
Essential step otherwise just a collection of items


Why?

Publishing Data and Information

Loss


SVG “active” graphics

Link to data, follow links back to the raw data archive

Link to simulation, full simulation data archived in BioSimGrid

R4L

Paper organized using RDF


Access to information requires crossing administrative domains

[Figure labels: Researcher, Research Group (x2), Institution, National Archive, International Database]


Subversive and furtive sharing & exploitation of data in virtual space

[Figure labels: Data, CAS, RDF, OAI Taxi, E-Labs, user, Digital Repository]


He is charged with expressing contempt for meta-data


Metadata Lifecycle

Creation and maintenance of metadata
Need a metadata infrastructure as well as a data infrastructure
Capture process as well as results
Automatic metadata generation when possible
Human annotation will always be needed
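
The split between automatic metadata generation and human annotation can be illustrated with a short sketch. The function names `auto_metadata` and `annotate`, the field names, and the instrument label `balance-01` are assumptions made for this example, not a metadata schema from the talk.

```python
import hashlib
from datetime import datetime, timezone


def auto_metadata(name: str, payload: bytes, instrument: str) -> dict:
    """Build the metadata that needs no human input at capture time."""
    return {
        "name": name,
        "size_bytes": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),  # fixity check for later curation
        "instrument": instrument,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "annotations": [],
    }


def annotate(record: dict, author: str, note: str) -> dict:
    """Return a copy of the record with one human annotation appended."""
    updated = dict(record)
    updated["annotations"] = record["annotations"] + [
        {"author": author, "note": note}
    ]
    return updated
```

A record would be created as the data file is written and annotated afterwards, for example `annotate(auto_metadata("reflux_log.csv", data, "balance-01"), "researcher", "Had to change heater stirrer")`. The automatic fields cost the researcher nothing, while the annotation carries the context that no instrument can supply.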


Plans

Plans are useful
This is the way things are supposed to be done
The Plan provides a digital context so increases the value of planning
Key to our ‘Smart Lab’ approach….
Is it the best way?


Who is responsible

Context is crucial for curation
every person, on each step of the process of converting data to knowledge
Need to consider the future access to this information by themselves and others.


Information Providers

Information Consumers

These are the same people – if we can ‘talk’ to ourselves efficiently over time then that is a good start to be able to ‘talk’ to others


All I am saying is that now is the time to develop the technology to deflect an asteroid

We must speed up the knowledge discovery process


PEOPLE

Southampton ECS, MATHS & CHEMISTRY
IT-INNOVATION
BRISTOL
UKOLN
CCLRC
INDIANA
SYDNEY
MANCHESTER

EPSRC e-Science & Chemistry Programmes

JISC e-Infrastructure

DTI

See web site for full details and links

www.combechem.org