edward-blurock
ChemConnect: A use case example using cloud services
• Data is the backbone of modern scientific research
• Exchange of data is paramount to successful interaction between research groups
Motivation
Publications and conferences
Data exchanged between researchers (email, etc)
Virtual Research Environment
paper
Data files
Clouds (infrastructures)
Towards Virtual Research Environment
Cloud based Database
ChemConnect
Repository and connected data (first step towards an electronic scientific notebook backed by searchable data)
http://www.chemicalkinetics.info
Make the immense amount of data in the combustion community
not only available
but searchable
ChemConnect: Current Phase
Start with data set (in an accepted format)
Recognize interdependencies between data through connected relationships (semantic web concepts)
Parse the data set and produce fine-grained pieces of data
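The parsing step can be illustrated with a minimal Python sketch. It assumes a heavily simplified reaction line of the form `a + b = c`; real CHEMKIN input also carries rate parameters, other arrow forms and third bodies, none of which are handled here. The function name `reaction_to_triples` is hypothetical and is not taken from ChemConnect.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)

def reaction_to_triples(mechanism: str, reaction: str) -> List[Triple]:
    """Split a simplified 'a + b = c' reaction into fine-grained triples.

    Assumption: one '=' separates reactants from products and '+' separates
    species. This is an illustration, not a CHEMKIN parser.
    """
    left, right = reaction.split("=", 1)
    reactants = [s.strip().lower() for s in left.split("+") if s.strip()]
    products = [s.strip().lower() for s in right.split("+") if s.strip()]
    label = " + ".join(reactants) + " = " + " + ".join(products)

    triples: List[Triple] = [(mechanism, "hasReaction", label)]
    for species in reactants:
        triples.append((label, "hasReactant", species))
    for species in products:
        triples.append((label, "hasProduct", species))
    # dict.fromkeys keeps first-seen order while removing duplicates
    for species in dict.fromkeys(reactants + products):
        triples.append((mechanism, "hasSpecies", species))
    return triples

triples = reaction_to_triples("Mech-butane-2011", "c2h5+o2 = c2h5o2")
for triple in triples:
    print(triple)
```

One reaction line yields one `hasReaction` triple, one triple per reactant and product, and one `hasSpecies` triple per distinct species.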
ChemConnect: client-server Structure
CLIENT: User interface on browser, tablet or phone (adjustable for each)
SERVER: ChemConnect (generates interface, computing and responses)
Current (Prototype) status
• Number of data sets (primarily CHEMKIN): 29 mechanisms
• Data sources (public domain from web):
• LLNL, Galway, San Diego, Stanford, Lund, CNRS-Nancy
• Size of database: 1.5 GB
• Number of data objects: 800k (objects: 240MB, indices 1.2GB)
• Number of relationships (fine grain semantics): 600k
• In the next phase, these numbers will increase dramatically
• More mechanisms
• Different types of data (theoretical, experimental, more 2D-graphical)
ChemConnect database components
Repository of data sets
Description
References
Data in accepted format
Individual data objects
Build relationships between data objects
RDF: Resource Description Framework
Subject: The subject of the description
Predicate: The description of the relationship between subject and object
Object: The object of the description
Subject → Predicate → Object
Concept of ontologies from the semantic web
Relationships (example from CHEMKIN mechanism)
Object                 Relationship          Object
Mech-butane-2011       hasReaction           c2h5 + o2 = c2h5o2
Mech-butane-2011       hasSpecies            c2h5
c2h5o2 = c2h4o2h       hasReactant           c2h4o2h
c2h5o2 = c2h4o2h       hasProduct            c2h4o2h
c2h4o2h                isIsomer              c2h5o2
c2h4o2h                hasStandardEnthalpy   -276.51 kJ/mol
c2h5                   hasProduct            c2h5o2
c2h5                   hasProduct            c2h4o2h
c2h5o2 = c2h4o2h       subMechanism          C2
c2h5o2 = c2h4o2h       subMechanism          C2H5O2
c2h5 + o2 = c2h5o2     followedBy            c2h5o2 = c2h4o2h
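A few of the relationships listed above can be held in memory as plain (subject, relationship, object) tuples and indexed from both ends. The Python sketch below is only an illustration of that idea; the index layout and the helper `describe` are assumptions and say nothing about how ChemConnect stores its data.

```python
from collections import defaultdict

# A handful of rows copied from the relationship list above.
TRIPLES = [
    ("Mech-butane-2011", "hasSpecies", "c2h5"),
    ("c2h4o2h", "isIsomer", "c2h5o2"),
    ("c2h4o2h", "hasStandardEnthalpy", "-276.51 kJ/mol"),
    ("c2h5o2 = c2h4o2h", "subMechanism", "C2"),
]

# Index every triple by its subject and by its object.
by_subject = defaultdict(list)
by_object = defaultdict(list)
for s, p, o in TRIPLES:
    by_subject[s].append((p, o))
    by_object[o].append((s, p))

def describe(node):
    """Return every triple in which `node` is the subject or the object."""
    outgoing = [(node, p, o) for p, o in by_subject.get(node, [])]
    incoming = [(s, p, node) for s, p in by_object.get(node, [])]
    return outgoing + incoming

for triple in describe("c2h4o2h"):
    print(triple)
```

Indexing by object as well as by subject means a species can be found whether it was entered as the thing described or as the value of a relationship.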
Google Cloud Platform: datastore
ChemConnect example
keyword: repository (data set input)
Data set information
Data input
Direct link from website
Drag and drop a file
Text field
Searching through the connected data
Keyword
Searching with keywords can be viewed as moving through the connected data
Subject
Object
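One way to picture a keyword search as movement through connected data is a breadth-first walk that starts at the keyword and follows relationships in either direction. The Python sketch below is a minimal illustration under that assumption; the function names and the hop limit are hypothetical, and the actual ChemConnect search may work differently.

```python
from collections import deque

# Sample relationships in (subject, relationship, object) form.
TRIPLES = [
    ("Mech-butane-2011", "hasSpecies", "c2h5"),
    ("c2h5", "hasProduct", "c2h5o2"),
    ("c2h4o2h", "isIsomer", "c2h5o2"),
    ("c2h4o2h", "hasStandardEnthalpy", "-276.51 kJ/mol"),
]

def neighbours(node):
    """Yield nodes one relationship away, whether node is subject or object."""
    for s, _, o in TRIPLES:
        if s == node:
            yield o
        elif o == node:
            yield s

def search(keyword, max_hops):
    """Breadth-first walk: map each reachable node to its hop distance."""
    distance = {keyword: 0}
    queue = deque([keyword])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours(node):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance

print(search("c2h5", 2))
```

With a limit of two hops, the keyword `c2h5` reaches its mechanism and its product directly, and the isomer of that product one step further.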
keyword search: fine grained info
reaction information
find a reaction
keyword search
Connecting ’unrelated’ data
Passive connection:
You don’t need to know which structures you want to connect to.
If two data sets share an RDF subject or an RDF object, then they are connected.
Keyword: passive (independent data sets had no knowledge of other data sets)
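The passive connection idea can be sketched in Python as a search for nodes that appear, as subject or object, in more than one data set. The data set names `model-A` and `thermo-B` and the function `passive_links` are hypothetical, and the sketch assumes that shared nodes carry exactly the same label in each data set.

```python
def nodes(triples):
    """Every value that appears as a subject or an object."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}

def passive_links(datasets):
    """Map each shared node to the names of the data sets that mention it."""
    seen = {}
    for name, triples in datasets.items():
        for node in nodes(triples):
            seen.setdefault(node, set()).add(name)
    return {node: names for node, names in seen.items() if len(names) > 1}

# Two data sets loaded independently; neither refers to the other.
datasets = {
    "model-A": [("model-A", "hasSpecies", "c2h5o2")],
    "thermo-B": [
        ("c2h4o2h", "isIsomer", "c2h5o2"),
        ("c2h4o2h", "hasStandardEnthalpy", "-276.51 kJ/mol"),
    ],
}

links = passive_links(datasets)
print(links)
```

No explicit link was declared between the two data sets; the connection appears only because both mention `c2h5o2`.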
Linking data/models
[Diagram: independent data sources (CHEMKIN Model I, CHEMKIN Model II, an automatically generated CHEMKIN model, a 2-D structure, and computational chemistry calculations) linked through the species 1-Butyl-3-hydroperoxide, which appears under the labels C4H11O2, ch2ch2ch(ooh)ch3 and 1-c4hh8-3-ooh. Links shown: hasSpecies, isIsomer and hasThermo, the last pointing to Thermo data.]
Future directions
Data presentation
• Toward a tool for comparison and mechanism building (ex. shopping cart of data)
• Data object visualisation
• Presentation of search tree results
Increase number and source of data:
• Data sets with 2-D (sub)structures (connecting substructures to species/reactions)
• Experimental data from source groups (see discussion after this session)
• Supplementary data from journals
Query
• More complex searches
• Multiple keywords
• Interpretation/preprocessing of keyword expression before search
• Ordering and filtering results (passive and with check boxes)
Data Input
• Researchers can enter their own data
• Further develop concept of private, group shared and public data
Thank you. Any questions?
I encourage you to try ChemConnect and give feedback
www.chemicalkinetics.info