28 October 2005 Jeremy Frey, University of Southampton 1
“The CombeChem Experience”
CICC Workshop
28 October 2005
Bloomington Indiana
Chemical Data & Chemical Grids
• Chemical data, information & knowledge
  – Experiments, Simulation & Computation
• Exponential growth in generation of data
  – Need automatic capture of metadata
• Start in the laboratory – pervasive physical grid
• Computational chemistry very significant source
• Software to be used by chemists so must be simple to support & maintain – autonomic
Chemical Semantic Grid
• RDF (Resource Description Framework)
  – From the semantic web world
  – Best system for the description of chemical data and processes
  – Achieves the same as XML + unique identifiers + linking up in a simpler manner
• Large scale triple stores (so far up to 50 Million triples of molecular structures and properties)
• Need for scalable software solutions
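The claim that RDF bundles data, unique identifiers and linking can be illustrated with a minimal sketch. This is not the CombeChem software: it is a toy in-memory store of (subject, predicate, object) triples, and the `http://example.org/chem/` base URI, the `TripleStore` class and the `measured-in` and `performed-by` predicates are all hypothetical names chosen for illustration.

```python
class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """Return all objects for a given subject and predicate, sorted."""
        return sorted(o for (s, p, o) in self._triples
                      if s == subject and p == predicate)


EX = "http://example.org/chem/"  # hypothetical base URI, not from the slides

store = TripleStore()
molecule = EX + "molecule/1"
experiment = EX + "experiment/42"

# Literal-valued properties of the molecule (values taken from the deck).
store.add(molecule, EX + "has-cas", "22049-19-0")
store.add(molecule, EX + "has-empirical-formula", "C12H13NO2")

# Links are just triples whose object is another URI.
store.add(molecule, EX + "measured-in", experiment)
store.add(experiment, EX + "performed-by", EX + "person/a-chemist")

# Follow the link from molecule to experiment to person.
for exp in store.objects(molecule, EX + "measured-in"):
    print(exp, "->", store.objects(exp, EX + "performed-by"))
```

Because every node is a URI, adding a new kind of property or link needs no schema change, only more triples. A production triple store indexes the triples rather than scanning a set.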
He is charged with expressing contempt for meta-data
Permanent, documented and primary record of laboratory
observations
Observations are never collected on note pads,
filter paper or other temporary paper for later transfer into a
notebook
If you are caught using the “scrap of paper” technique,
your improperly recorded data may be confiscated by your TA
Digital record at source: don’t try to add metadata after the fact
Record the chemical processes as well as the data in RDF
Physical World
RDF
Old technology does not scale
Problems with relational databases
- information too variable and rapidly changing types
- multimedia, images are the output of current experiments
Create large semantically rich database of structures and properties
URI - InChI
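One way to read the "URI - InChI" pairing is that the InChI string itself identifies the molecule, so a URI can be derived from it. The sketch below assumes a simple percent-encoding scheme. The `http://example.org/inchi/` base and both function names are hypothetical, and the slides do not say how CombeChem actually mints its URIs.

```python
from urllib.parse import quote, unquote

BASE = "http://example.org/inchi/"  # hypothetical namespace, not from the slides


def inchi_to_uri(inchi: str) -> str:
    """Percent-encode the whole InChI so '/', '(' and ',' cannot break the URI path."""
    return BASE + quote(inchi, safe="")


def uri_to_inchi(uri: str) -> str:
    """Recover the InChI string from a URI produced by inchi_to_uri."""
    if not uri.startswith(BASE):
        raise ValueError("URI is not in the expected namespace")
    return unquote(uri[len(BASE):])


# InChI string as it appears later in the deck (1.12Beta format).
inchi = ("1.12Beta/C12H13NO2/c1-2-15-8-9-5-6-11(14)12-10(9)4-3-7-13-12"
         "/h1H3,2H2,3-7H,8H2,14H")
uri = inchi_to_uri(inchi)
print(uri)
print(uri_to_inchi(uri) == inchi)
```

Deriving the URI deterministically from the structure means two laboratories describing the same molecule produce the same identifier without coordinating beforehand.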
Property in RDF
<c:OrganicMolecule rdf:about="file:///storage/ba8efc2ce0edada69d63b02d1b8630c6.rdf">
  <c:has-inchi>1.12Beta/C12H13NO2/c1-2-15-8-9-5-6-11(14)12-10(9)4-3-7-13-12/h1H3,2H2,3-7H,8H2,14H</c:has-inchi>
  <c:has-cas>22049-19-0</c:has-cas>
  <c:has-empirical-formula>C12H13NO2</c:has-empirical-formula>
  <c:has-stereocentres>0</c:has-stereocentres>
  <c:has-property>
    <c:MeltingPoint>
      <c:has-information>
        <c:Information>
          <c:has-value>150</c:has-value>
          <c:has-uncertainty>
            <c:Range>
              <c:has-value>16</c:has-value>
            </c:Range>
          </c:has-uncertainty>
        </c:Information>
      </c:has-information>
    </c:MeltingPoint>
  </c:has-property>
</c:OrganicMolecule>
Currently testing on 200,000 compounds but about to go up by an order of magnitude
3Store is a scalable solution
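As a rough illustration of how a consumer might read the melting point record above, the following sketch parses an abridged copy of the fragment with Python's standard `xml.etree.ElementTree`. The slide does not show namespace declarations, so the `rdf:RDF` wrapper and the `http://example.org/combechem#` URI bound to the `c:` prefix are assumptions made only so the fragment parses. The `melting_point` helper is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
C_NS = "http://example.org/combechem#"  # assumed: the slide omits the real URI

# Abridged copy of the slide's fragment, wrapped in an assumed rdf:RDF root.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:c="{C_NS}">
  <c:OrganicMolecule rdf:about="file:///storage/ba8efc2ce0edada69d63b02d1b8630c6.rdf">
    <c:has-cas>22049-19-0</c:has-cas>
    <c:has-empirical-formula>C12H13NO2</c:has-empirical-formula>
    <c:has-property>
      <c:MeltingPoint>
        <c:has-information>
          <c:Information>
            <c:has-value>150</c:has-value>
            <c:has-uncertainty>
              <c:Range>
                <c:has-value>16</c:has-value>
              </c:Range>
            </c:has-uncertainty>
          </c:Information>
        </c:has-information>
      </c:MeltingPoint>
    </c:has-property>
  </c:OrganicMolecule>
</rdf:RDF>"""

NS = {"rdf": RDF_NS, "c": C_NS}


def melting_point(xml_text):
    """Extract the melting point value and its uncertainty range."""
    root = ET.fromstring(xml_text)
    mol = root.find("c:OrganicMolecule", NS)
    info = mol.find(
        "c:has-property/c:MeltingPoint/c:has-information/c:Information", NS)
    return {
        "subject": mol.get(f"{{{RDF_NS}}}about"),
        "cas": mol.find("c:has-cas", NS).text,
        "value": float(info.find("c:has-value", NS).text),
        "range": float(info.find("c:has-uncertainty/c:Range/c:has-value", NS).text),
    }


print(melting_point(DOC))
```

The slide gives no units for the value or the range, so the sketch returns bare numbers. Walking the XML tree like this is only adequate for a fixed serialisation; an RDF parser that yields triples would be the more robust choice at scale.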
You see that dark spooky image on the screen? That’s your credit history coming back to haunt you.
Provenance
Record experiments
Make data available (e-crystals, e-Bank)
Security and trust for experiments and data
Experiments on the Grid
national crystallography service
Chemistry Data private or public,
open or controlled access
Subversive and furtive exploitation of data
Data
CAS
PubMed
CML
RDF
E-Bank
E-crystals
R4L
Standards? Interoperable? Convertible? Useable?
Linking Chemistry to the Life-Sciences and the Environment
• Need to link up small and large molecule chemistry
  – Bio-Informatics
  – Medical informatics
• Need to link in place and time
  – Environmental Informatics
  – Spatial-Temporal issues at a cellular and organism level
• Statistical Modelling
Making sure Chemistry will not suffer from a data crunch
All I’m saying is that now is the time to develop the technology to deflect an asteroid