21 Nov 2006 Jeremy G. Frey University of Southampton
DCC Conference Glasgow
The curation of laboratory experimental data as part of the overall data lifecycle
Jeremy G. Frey
School of Chemistry, University of Southampton, UK
21 Nov 2006
DCC Conference, Glasgow
If you do things right at the start then all the following processes are much easier!
Exponentially growing amount of data - the future overwhelms the past
The CombeChem Project
End to End linking of data and information
Publication@Source
So collect data with regard to how it could eventually be used
Make sure the metadata is of high quality
Record properly at source in Digital Form
The Chemistry Lab
People & Machines working together
Combechem
Smart Lab
R4L
e-Bank
E-Malaria
Instruments on the Grid
BioSimGrid
Statistics
Plan & COSHH
Digital Model
Information Integration
Report
Knowledge
Goal
Literature
Synthesis
not just one laboratory but many co-laboratories
working together
Analysis
Smart Laboratory
Smart Storage
Smart Dissemination
Smart HCI
The concept of Publication @ Source
Smart Workflow
If only I knew exactly how she did this experiment
I know all this supplementary information could be useful but will people really remember the format? Is it worth all the hassle?
I wish I could get the numbers from this graph - the pdf is not much use.
I wish I had recorded things at the start the way I do now…..
Typical Laboratory
First, they do an online search
Need to make the data available
Need to be able to find it
But how to expose it?
I am sure we collected that information a few years ago…
The details should be in her thesis…..
Can you read what he says here….?
Can you find the file of data that were used to make the plot?
Some of these problems are due to the lack of information recorded at the time. Others are due to loss of information over time.
What are the people up to?
Capture Data and Context
People
Process
Environment
Permanent, documented and primary record of laboratory observations
Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook
If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA
COSHH
Leverage off things we already have to do – “We have a cunning plan”
Plan and process record for the synthesis (diagram)
Combechem, 30 January 2004, gvh, hrm, gms
Diagram layers: To Do List, Plan, Process, Record
Ingredient List
Fluorinated biphenyl 0.9 g
Br11OCB 1.59 g
Potassium Carbonate 2.07 g
Butanone 40 ml
Plan
Dissolve 4-fluorinated biphenyl in butanone
Add K2CO3 powder
Heat at reflux for 1.5 hours
Cool and add Br11OCB
Heat at reflux until completion
Cool and add water (30 ml)
Liquid-liquid extraction: extract with DCM (3 x 40 ml)
Combine organics, dry over MgSO4 & filter
Remove solvent in vacuo
Fuse compound to silica & column in ether/petrol
Process, with inputs and observations
1 Add: Sample of 4-fluorinated biphenyl (Weigh: 0.9031 grammes); Butanone (Measure: 40 ml)
2 Add: Sample of K2CO3 powder (Weigh: 2.0719 g)
3 Reflux
4 Cool
5 Add: Sample of Br11OCB (Weigh: 1.5918 g)
6 Reflux
7 Cool
8 Add: Water (Measure: 30 ml)
9 Liquid-liquid extraction: DCM (Measure: 3 of 40 ml)
10 Dry: MgSO4 (Measure: excess)
11 Filter (Buchner)
12 Remove solvent by rotary evaporation
13 Fuse: Silica
14 Column chromatography: Ether/Petrol (Ratio)
Annotations (Annotate: text)
Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container.
Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45 min, next step 14:15.
Inorganics dissolve 2 layers. Added brine ~20 ml.
Organics are yellow solution
Washed MgSO4 with DCM ~50 ml
Observation Types
weight - grammes
measure - ml, drops
annotate - text
temperature - K, °C
Key
Process
Input
Literal
Observation
Future Questions
Whether to have many subclasses of processes or fewer with annotations
How to depict destructive processes
How to depict taking lots of samples
What is the observation/process boundary? e.g. MRI scan
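The plan and process record in the diagram above could be held in a structure along the following lines. This is only a minimal sketch, not the CombeChem data model: the class names `Observation` and `ProcessStep`, the helper `unrecorded`, and the short unit label `g` for grammes are all assumptions made for illustration. The values are the ones shown for the first three steps.

```python
from dataclasses import dataclass, field
from typing import List

# Units permitted for each observation type, following the
# "Observation Types" list on the slide ("g" stands for grammes).
ALLOWED_UNITS = {
    "weight": {"g"},
    "measure": {"ml", "drops"},
    "annotate": {"text"},
    "temperature": {"K", "°C"},
}


@dataclass
class Observation:
    """One recorded observation: a type, a value and a unit."""
    kind: str
    value: object
    unit: str

    def __post_init__(self) -> None:
        # Reject units that the observation type does not allow.
        if self.unit not in ALLOWED_UNITS.get(self.kind, set()):
            raise ValueError(f"unit {self.unit!r} is not valid for {self.kind!r}")


@dataclass
class ProcessStep:
    """A process in the record, linked to the plan step it carries out."""
    number: int
    process: str
    planned: str
    observations: List[Observation] = field(default_factory=list)

    def record(self, kind: str, value: object, unit: str) -> "ProcessStep":
        self.observations.append(Observation(kind, value, unit))
        return self


def unrecorded(steps: List[ProcessStep]) -> List[int]:
    """Return the numbers of steps that have no observations yet."""
    return [s.number for s in steps if not s.observations]


steps = [
    ProcessStep(1, "Add", "Dissolve 4-fluorinated biphenyl in butanone"),
    ProcessStep(2, "Add", "Add K2CO3 powder"),
    ProcessStep(3, "Reflux", "Heat at reflux for 1.5 hours"),
]
steps[0].record("weight", 0.9031, "g").record("measure", 40, "ml")
steps[0].record("annotate", "Butanone dried via silica column.", "text")
steps[1].record("weight", 2.0719, "g")

print(unrecorded(steps))
```

Keeping the planned text beside the recorded observations lets a deviation, such as the shortened reflux noted in the annotations, be stored against the step it belongs to.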
Detail of the plan and process record: steps 1 to 3
Ingredient List
Fluorinated biphenyl 0.9 g
Br11OCB 1.59 g
Potassium Carbonate 2.07 g
Butanone 40 ml
Plan
Dissolve 4-fluorinated biphenyl in butanone
Add K2CO3 powder
Heat at reflux for 1.5 hours
Process, with inputs and observations
1 Add: Sample of 4-fluorinated biphenyl (Weigh: 0.9031 grammes); Butanone (Measure: 40 ml)
2 Add: Sample of K2CO3 powder (Weigh: 2.0719 g)
3 Reflux
Annotations (Annotate: text)
Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container.
Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45 min, next step 14:15.
Pub-Sub systems provide the flexible & extensible approach to distribution of real time laboratory monitoring & archiving
Data Source
Archive Client
Web Client
Mobile phone
Data Source
PDA
Message Broker
Translator Service
BLOG
Air Conditioning failed
Smart Laboratory Spaces
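The publish-subscribe arrangement named on this slide can be illustrated with a small in-process broker. This is a sketch under stated assumptions, not the system used in the laboratory: the `MessageBroker` class, the topic names, the two client functions and the 25 °C alarm threshold are all hypothetical, and a real deployment would place a networked message broker between the data sources and the clients.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Message = Dict[str, object]
Handler = Callable[[str, Message], None]


class MessageBroker:
    """Routes each published message to every handler subscribed to its topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        # Copy the handler list so delivery is unaffected by new
        # subscriptions; .get avoids creating entries for unused topics.
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


archive: List[Tuple[str, Message]] = []
alerts: List[str] = []

MAX_CELSIUS = 25.0  # hypothetical alarm threshold


def archive_client(topic: str, message: Message) -> None:
    """Stores every reading it receives."""
    archive.append((topic, dict(message)))


def phone_client(topic: str, message: Message) -> None:
    """Raises an alert only when the room is too warm."""
    if message["celsius"] > MAX_CELSIUS:
        alerts.append(f"{topic}: {message['celsius']} °C, air conditioning failed?")


broker = MessageBroker()
broker.subscribe("lab/temperature", archive_client)
broker.subscribe("lab/temperature", phone_client)

broker.publish("lab/temperature", {"celsius": 21.5})
delivered = broker.publish("lab/temperature", {"celsius": 31.0})
unheard = broker.publish("lab/humidity", {"percent": 40})
```

The data source never needs to know who is listening, so a new client such as a blog or a PDA can be added with one further `subscribe` call. That decoupling is what makes the approach flexible and extensible.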
But what about the laboratory environment?
“I just realized, Howard, that everything in this apartment is more sophisticated than we are”
Semantic DataGrid
CombeChem used, tested & strained the Semantic Web for
Enhanced (annotated) DataGrid over multiple diverse stores
Storage of Provenance Information
Some Data Storage
Annotated multimedia streams
Units & Properties Ontology
Multiple Triple Stores
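Storing provenance as triples across several stores might look like the sketch below. It uses plain Python tuples rather than an RDF library. The `TripleStore` class, the `ex:` identifiers and the `ex:derivedFrom` predicate are assumptions made for illustration and are not taken from the CombeChem ontologies.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    """A minimal in-memory store of (subject, predicate, object) triples."""

    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self._triples: Set[Triple] = set(triples)

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        # None acts as a wildcard for that position.
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def match_all(stores: Iterable[TripleStore], s=None, p=None, o=None) -> List[Triple]:
    """Run the same pattern over several stores and merge the results."""
    found: Set[Triple] = set()
    for store in stores:
        found.update(store.match(s, p, o))
    return sorted(found)


def trace_provenance(stores: List[TripleStore], item: str) -> List[str]:
    """Follow ex:derivedFrom links from an item back towards its source."""
    chain = [item]
    while True:
        parents = match_all(stores, s=chain[-1], p="ex:derivedFrom")
        # Stop at the source, or if a link would revisit an item.
        if not parents or parents[0][2] in chain:
            return chain
        chain.append(parents[0][2])


# Hypothetical data split across two stores.
lab_store = TripleStore([
    ("ex:sample1", "ex:hasMass", "0.9031 g"),
    ("ex:spectrum1", "ex:derivedFrom", "ex:sample1"),
])
paper_store = TripleStore([
    ("ex:plot1", "ex:derivedFrom", "ex:spectrum1"),
])

chain = trace_provenance([lab_store, paper_store], "ex:plot1")
print(chain)
```

The trace crosses from the store that describes the published plot into the store that describes the laboratory sample, which is the kind of link that provenance storage over multiple stores has to support.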
Laboratory “Blogs”
Laboratory notebook is a Blog
Encourage and facilitate collaboration
Need a data repository behind the Blog
R4L
E-Bank
Flexible
Service oriented approach being developed
A VRE
Instrument Blog
‘Blog-jects’
The ‘Scientific Blog’ is being tried in an attempt to combine laboratory notebooks and publication
Format Issues – everyday and for the long term
Note the use of “YouTube”
An experiment that failed… Publishable? Useful?
Record the ‘Scientific Conversation’ – this part of the record often exists only in the ‘grey literature’
CoAKTing
Memetic
Laboratory IRs and Information Management
Repositories
Validation
Increasing the value of data
How to bring all the necessary information together to enable appropriate validation
Increasingly difficult & expensive to achieve
Need provenance and context
Essential step otherwise just a collection of items
Why?
Publishing Data and Information Loss
SVG “active” graphics
Link to data, follow links back to the raw data archive
Link to simulation, full simulation data archived in BioSimGrid
R4L
Paper organized using RDF
Access to information requires crossing administrative domains
Researcher
National Archive
Research Group
Institution
International Database
Research Group
Subversive and furtive sharing & exploitation of data in virtual space
Data
CAS
RDF
OAI Taxi
E-
user
Labs
Digital Repository
He is charged with expressing contempt for meta-data
Metadata Lifecycle
Creation and maintenance of metadata
Need a metadata infrastructure as well as a data infrastructure
Capture process as well as results
Automatic metadata generation when possible
Human annotation will always be needed
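As an illustration of automatic metadata generation combined with human annotation, the sketch below derives what it can from a data file and leaves one field for the person to fill in. The function name `generate_metadata`, the field names and the instrument label are assumptions, not part of any Smart Lab system.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def generate_metadata(path: Path, instrument: str, annotation: str = "") -> dict:
    """Build a metadata record: automatic fields plus a human annotation."""
    data = path.read_bytes()
    return {
        # Generated automatically from the file itself.
        "filename": path.name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "recorded_utc": datetime.now(timezone.utc).isoformat(),
        # Supplied by the instrument software or the person at the bench.
        "instrument": instrument,
        "annotation": annotation,
    }


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "weighing.txt"
    data_file.write_bytes(b"0.9031 g\n")
    meta = generate_metadata(
        data_file,
        instrument="balance-1",  # hypothetical instrument identifier
        annotation="Sample of 4-fluorinated biphenyl",
    )

print(meta["filename"], meta["size_bytes"])
```

The checksum and timestamp cost the researcher nothing, while the annotation carries context that no instrument can supply.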
Plans
Plans are useful
This is the way things are supposed to be done
The Plan provides a digital context so increases the value of planning
Key to our ‘Smart Lab’ approach….
Is it the best way?
Who is responsible
Context is crucial for curation
every person, on each step of the process of converting data to knowledge
Need to consider the future access to this information by themselves and others.
Information Providers
Information Consumers
These are the same people – if we can ‘talk’ to ourselves efficiently over time then that is a good start to be able to ‘talk’ to others
All I am saying is that now is the time to develop the technology to deflect an asteroid
We must speed up the knowledge discovery process
PEOPLE
Southampton ECS, MATHS & CHEMISTRY
IT-INNOVATION
BRISTOL
UKOLN
CCLRC
INDIANA
SYDNEY
MANCHESTER
EPSRC e-Science & Chemistry Programmes
JISC e-Infrastructure
DTI
See web site for full details and links
www.combechem.org