Electronic Laboratory Notebooks (ELNs) are routinely used to capture chemical reactions and experiments in multi-user settings. ELNs are optimized to make data capture as easy as possible, and as a result are sub-optimal for search and data retrieval. Scientists thus rarely use their in-house ELN as a source of reaction knowledge, despite the volume of institutionally relevant information stored within.
Institutions may also have multiple sources of internal reaction information, including ELNs potentially from more than one vendor and other in-house reaction databases. Scientists lack a single location from which they can search all the available sources.
Other data captured with the reaction is also relevant for scientists and management. Management lacks convenient tools to monitor performance and trends across scientists, projects, sites, conditions or other properties.
Reaction data is extracted from the ELN as an XML file and transformed into a new external, parallel database which is optimized for search. The reaction information is captured along with relevant metadata, such as the scientist, date, state of the experiment, chemical properties of the reactants and products, and reaction properties such as temperature, yield and solvents. Additional sources of reaction information can be merged into the data at this point. Custom web views into this new database then display query tools, results and performance metrics.
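The extract-and-transform step described above might be sketched as follows. This is only an illustration: the XML layout, the attribute names and the three-table SQLite schema are all assumptions made for the example, since the poster does not describe the actual export format or the target database design.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical ELN export. Real vendor schemas will differ.
SAMPLE_XML = """
<experiments>
  <experiment id="EXP-001" scientist="A. Chemist" date="2012-03-01" state="closed">
    <reaction temperature_c="25" yield_pct="82" solvent="THF">
      <reactant smiles="CC(=O)Cl"/>
      <reactant smiles="NCc1ccccc1"/>
      <product smiles="CC(=O)NCc1ccccc1"/>
    </reaction>
  </experiment>
</experiments>
"""

# Assumed search-side schema: one row per unique structure, one per reaction,
# and a link table recording each structure's role in a reaction.
SCHEMA = """
CREATE TABLE IF NOT EXISTS structure (
    id INTEGER PRIMARY KEY, smiles TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS reaction (
    id INTEGER PRIMARY KEY, experiment TEXT, scientist TEXT, date TEXT,
    state TEXT, temperature_c REAL, yield_pct REAL, solvent TEXT);
CREATE TABLE IF NOT EXISTS participant (
    reaction_id INTEGER, structure_id INTEGER, role TEXT);
"""


def load_reactions(xml_text, conn):
    """Copy reactions and their metadata from an XML export into the database."""
    conn.executescript(SCHEMA)
    root = ET.fromstring(xml_text)
    for exp in root.iter("experiment"):
        for rxn in exp.iter("reaction"):
            cur = conn.execute(
                "INSERT INTO reaction (experiment, scientist, date, state,"
                " temperature_c, yield_pct, solvent) VALUES (?, ?, ?, ?, ?, ?, ?)",
                (exp.get("id"), exp.get("scientist"), exp.get("date"),
                 exp.get("state"), float(rxn.get("temperature_c")),
                 float(rxn.get("yield_pct")), rxn.get("solvent")),
            )
            rxn_id = cur.lastrowid
            for role in ("reactant", "product"):
                for mol in rxn.findall(role):
                    smiles = mol.get("smiles")
                    # Each unique structure is stored once and gets one ID.
                    conn.execute(
                        "INSERT OR IGNORE INTO structure (smiles) VALUES (?)",
                        (smiles,))
                    structure_id = conn.execute(
                        "SELECT id FROM structure WHERE smiles = ?",
                        (smiles,)).fetchone()[0]
                    conn.execute(
                        "INSERT INTO participant VALUES (?, ?, ?)",
                        (rxn_id, structure_id, role))
    conn.commit()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    load_reactions(SAMPLE_XML, db)
    print(db.execute("SELECT COUNT(*) FROM structure").fetchone()[0])
```

Additional reaction sources could be merged by calling the same loader on each export, because structures are de-duplicated on insert.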
Performance is enhanced both by stepping outside the ELN to optimize search and by using a new fragment-based search methodology. As part of this transformation, each unique molecular structure is assigned an ID number and each reaction is assigned a reaction transformation fingerprint based on the fragments identified within the reactants, reagents and products.
Results are returned in multiple buckets to present the initial and most relevant results to the user while the search is still continuing.
Extremely rapid searching of in-house reaction databases: Turning ELN data into a searchable library
Philip J Skinner PhD, Scott Flicker, Joshua Wakefield, Sean Greenhow PhD, Megean Schoenberg, Kate Blanchard, Phil McHale DPhil, Sandra W Sessoms and Robin Smith
PerkinElmer Informatics, 100 CambridgePark Drive, Cambridge, MA 02140
A copy of this poster is available at www.cambridgesoft.com/code_land/Genius_ACS_2012.aspx
The Problem – Reaction Searching in ELNs
The Solution – Step Outside The ELN
Queries are run against a library of pre-determined reaction fragment fingerprints to optimize performance. Any given chemical reaction can be described by a set of fragments it contains, sourced from a predefined list. A library of reactions can be searched by comparing fragments in the search target with prospective hits. Thus for a typical reaction:
Fragments can be identified from the predefined list:
Once the fragments are identified, they are grouped by products and reactants. Fragments common to both sides are removed (factored) to create a transformation fingerprint.
When a query is made against the library of transformation fingerprints, results are returned to the user in buckets corresponding to decreasing match criteria. Within each bucket, results are ordered by decreasing product molecular weight. In the web view, the buckets are further grouped into Top Hits, Fragment Hits and Fuzzy Logic Hits, each corresponding to a set of successive buckets. With this approach, results are biased more towards Functional Group Interconversions than those of a traditional substructure-biased cartridge search.
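The bucketed return of results can be illustrated with a short generator. The poster does not define the match criteria, so the three tiers below (exact fingerprint match, hit contains every query fragment, hit shares at least one fragment) are assumptions, as are the `Reaction` class and its field names. For simplicity the sketch also uses one bucket per group, whereas the poster describes each group as a set of successive buckets.

```python
from dataclasses import dataclass


@dataclass
class Reaction:
    """Hypothetical library entry holding a factored transformation fingerprint."""
    rxn_id: str
    lost: frozenset      # fragment IDs left on the reactant side after factoring
    gained: frozenset    # fragment IDs left on the product side after factoring
    product_mw: float


def search(query_lost, query_gained, library):
    """Yield (bucket name, hits) pairs in order of decreasing match quality.

    Being a generator, the first bucket can be shown to the user while the
    later, looser buckets are still being computed.
    """
    tiers = [
        ("Top Hits",
         lambda r: r.lost == query_lost and r.gained == query_gained),
        ("Fragment Hits",
         lambda r: query_lost <= r.lost and query_gained <= r.gained),
        ("Fuzzy Logic Hits",
         lambda r: bool(query_lost & r.lost or query_gained & r.gained)),
    ]
    seen = set()
    for name, matches in tiers:
        # A reaction is reported only in the best bucket it qualifies for.
        hits = [r for r in library if r.rxn_id not in seen and matches(r)]
        # Within a bucket, order by decreasing product molecular weight.
        hits.sort(key=lambda r: r.product_mw, reverse=True)
        seen.update(r.rxn_id for r in hits)
        yield name, hits


if __name__ == "__main__":
    library = [
        Reaction("R1", frozenset({"004"}), frozenset({"002", "005"}), 150.0),
        Reaction("R2", frozenset({"004"}), frozenset({"002", "005"}), 300.0),
        Reaction("R3", frozenset({"004", "007"}), frozenset({"002", "005"}), 200.0),
        Reaction("R4", frozenset({"009"}), frozenset({"002"}), 120.0),
        Reaction("R5", frozenset({"010"}), frozenset({"011"}), 500.0),
    ]
    for name, hits in search(frozenset({"004"}), frozenset({"002", "005"}), library):
        print(name, [r.rxn_id for r in hits])
```

Excluding the Fuzzy Logic bucket would then amount to stopping iteration before the last tier.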
The Technology – Fragment Based Fingerprinting
Unfactored fingerprint
001(4).003(2).004->001(4).002(2).003(3).005(2)
Factored fingerprint
004->002(2).003.005(2)
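The factoring step can be reproduced from the two fingerprints above. The sketch below assumes the notation is `ID(count)` tokens joined by `.`, with reactants and products separated by `->` and a count of 1 written without parentheses. The function names are illustrative and not taken from the application.

```python
import re
from collections import Counter

# One fragment token: a numeric ID with an optional count in parentheses.
_TOKEN = re.compile(r"^(\d+)(?:\((\d+)\))?$")


def parse_side(side):
    """Turn '001(4).003(2).004' into a Counter of fragment ID -> count."""
    counts = Counter()
    if not side:
        return counts
    for token in side.split("."):
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognised fragment token: {token!r}")
        counts[match.group(1)] += int(match.group(2) or 1)
    return counts


def format_side(counts):
    """Inverse of parse_side, with fragment IDs in ascending order."""
    return ".".join(
        frag if counts[frag] == 1 else f"{frag}({counts[frag]})"
        for frag in sorted(counts)
    )


def factor(fingerprint):
    """Remove fragments common to both sides of an unfactored fingerprint."""
    left, right = fingerprint.split("->")
    reactants, products = parse_side(left), parse_side(right)
    # Counter subtraction keeps only positive remainders, so a fragment that
    # appears on both sides survives only on the side where it is in excess.
    return f"{format_side(reactants - products)}->{format_side(products - reactants)}"


if __name__ == "__main__":
    print(factor("001(4).003(2).004->001(4).002(2).003(3).005(2)"))
    # prints 004->002(2).003.005(2)
```

Fragment 001 cancels completely (four on each side), while fragment 003 appears twice in the reactants and three times in the products, so one copy remains on the product side.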
Performance was tested against a public reaction database containing approximately 500,000 reactions and 700,000 unique structures. Forty searches were run either sequentially or concurrently at 1-second intervals. For each search, the time to return the first hit and the time to either complete the search or return a maximum of 100 hits were recorded.
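The two timings recorded per search could be captured with a small harness along these lines. It is a sketch only: `time_search` and its `run_search` argument (a callable returning an iterator of hits) are hypothetical names, and only the sequential case is covered, not the concurrent runs at 1-second intervals.

```python
import time
from itertools import islice


def time_search(run_search, max_hits=100):
    """Return (seconds to first hit, seconds to finish, number of hits).

    Timing stops when the search is exhausted or max_hits have been
    returned, whichever comes first. The first value is None if the
    search produces no hits.
    """
    start = time.perf_counter()
    first_hit = None
    count = 0
    for _ in islice(run_search(), max_hits):
        if first_hit is None:
            first_hit = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first_hit, total, count


if __name__ == "__main__":
    def fake_search():
        # Stand-in for a real query against the reaction library.
        yield from range(250)

    first, total, hits = time_search(fake_search)
    print(f"first hit after {first:.6f} s, stopped after {total:.6f} s with {hits} hits")
```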
The application consists of three web views presented to the end-user, namely a dashboard to provide management level metrics and performance data, a query window and a results window.
Reaction searching can be optimized by “stepping outside” of the ELN
Novel fragment-based search gives a fast hit list that is more biased towards FGIs (Functional Group Interconversions)
Performance metrics can be simultaneously accessed and presented to management
Query tools and search results can be presented to the end user in an intuitive web-based application, Reaction Genius™
Performance Testing
Test environment: Oracle on a 2.4 GHz Celeron Core 2 dual db server; test application running on a 2.8 GHz single-core Pentium 4.
The User Experience
Performance metrics highlight the most productive scientists, teams, projects or sites
Dashboard is built on a widget model to allow easy customization and hence institutionally specific views into the data
Widgets provide real-time views of the most recent additions
Combined structure, chemical, experimental and hierarchical property search parameters
“Sharpen” provides a subsequent cartridge search
Expandable reaction graph to explore precursors and products throughout the synthetic scheme
Results are returned in buckets, with the most relevant results returned first
Fuzzy Logic buckets can be excluded
Conclusions