version 5.3, February 2010
Scientific & technical presentation
JChem Base
Introduction to JChem Base
High-performance, Java-based tools for the storage, search and retrieval of chemical structures and associated data.
The components can be integrated into web-based or standalone applications, in association with other ChemAxon tools.
Structural overview
Layers, from top to bottom:
• Application or Web application (the latter accessed from a Web browser)
• JChem Base API: chemical logic, structure cache
• JDBC driver: standard interface to the RDBMS
• RDBMS (e.g. Oracle, MySQL, etc.): storage and security
Compatibility and integration
File formats:
• SMILES
• MDL molfile (v2000 and v3000)
• MDL SDF
• RXN
• RDF
• MRV
• IUPAC name, InChI
• Markush DARC
• CDX
Integration: extensive API for
• Java
• .NET
• JChem Cartridge for Oracle
Database engines:
• Oracle
• MySQL
• MS SQL Server
• PostgreSQL
• MS Access
• IBM DB2
• Derby
• etc.
Operating systems:
• Windows
• Linux
• Mac OS X
• Solaris
• etc.
JSP example application
Features:
• Substructure, Superstructure, Full, Exact fragment, Similarity and Perfect search
• Molecular Descriptor similarity search with descriptor coloring
• Substructure hit alignment and coloring, inverse hit list
• Chemical Terms filter
• Import / Export
• Export of hits
• Insert / Modify / Delete structures
• AJAX in JChem Webservices
Structure search features
See detailed information on structure search: www.chemaxon.com/conf/Structural_Search.ppt
• Wide range of query atoms
• Query properties
• R-group queries
• Full SMARTS support
• Coordination compounds
• Link nodes
• Pseudo atoms, lone pairs
• Relative stereo
• Reaction search features
• Hit coloring, position variation
• Polymers
Search options
Some selected structure search options:
• Stereo on/off
• Ignore charge/isotope/radical/valence/polymers, etc.
• Vague bond matching options
• Chemical Terms filter
• Tautomer search
• Inverse hit list
• Maximum search time / number of hits
• Combine with non-structure conditions
• Ordering of results
• etc.
Performance (1)
Test environment for the performance figures: JChem Base 5.2.2, Intel Quad Q6600 2.4GHz, 8 GB RAM; Oracle 10.2.0.3
Compound registration:
Number of compounds    Elapsed time (duplicates not checked)    Elapsed time (duplicates checked)
10,000                 21 s                                     26 s
100,000                2 min 4 s                                2 min 34 s
200,000                4 min 24 s                               5 min 13 s
Substructure search in PubChem (19.5 million compounds):
Query                  Number of hits    Search time
[structure image]      2                 0.91 s
[structure image]      93                0.98 s
[structure image]      6,001             1.30 s
[structure image]      146,256           5.66 s
Performance (2)
Similarity search: Tanimoto >0.9
Query                  Number of hits    Search time
[structure image]      0                 3.39 s
[structure image]      0                 3.82 s
[structure image]      0                 3.33 s
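The Tanimoto threshold used in the similarity benchmark can be illustrated on binary fingerprints. The following is a minimal, self-contained sketch built on java.util.BitSet; it is not JChem's implementation, and the class and method names (TanimotoSketch, tanimoto, bits) are hypothetical.

```java
import java.util.BitSet;

/** Hypothetical illustration only: Tanimoto coefficient on binary fingerprints. */
final class TanimotoSketch {

    /** Tanimoto = |A AND B| / |A OR B|; taken as 1.0 here when both fingerprints are empty. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);                       // bits set in both fingerprints
        BitSet union = (BitSet) a.clone();
        union.or(b);                         // bits set in either fingerprint
        int unionCount = union.cardinality();
        return unionCount == 0 ? 1.0 : (double) common.cardinality() / unionCount;
    }

    /** Builds a 512-bit fingerprint with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet(512);
        for (int p : positions) {
            fp.set(p);
        }
        return fp;
    }

    public static void main(String[] args) {
        BitSet query = bits(1, 5, 9, 200);
        BitSet target = bits(1, 5, 9, 300);
        double t = tanimoto(query, target);  // 3 common bits, 5 bits in the union
        System.out.println("Tanimoto = " + t + ", hit at >0.9: " + (t > 0.9));
    }
}
```

With a threshold of 0.9, only targets sharing nearly all set bits with the query qualify, which is consistent with very short hit lists for such searches.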
Markush structures
Markush structure registration and search
• Markush features
• R-groups
• Atom lists, bond lists
• Position variation bond
• Link nodes and repeating units
• Homology variation (alkyl, aryl, etc.)
• Compatible Markush enumeration plugin
Administration with JChemManager
User interface for:
• creating tables
• import
• export
• deleting rows
• dropping tables
Most functions are also available from the command line.
Standardization
• Default standardization includes:
– Hydrogen removal
– Aromatization
• Custom standardization can be specified for each table by supplying an XML configuration file at table creation or in the “Table Options” dialog of JChem Manager (jcman)
Custom Standardization Example
[Figure: example structure before and after custom standardization]
See the Standardizer presentation: http://www.chemaxon.com/conf/Standardizer.ppt
The property table
The property table stores information about JChem structure tables, including:
• Fingerprint parameters
• Custom standardization rules
• Other table options and information
More than one property table can be used; each property table represents a particular JChem environment.
Table types
The table type controls which chemical structures are allowed and which operations are available:
• Molecule
• Reaction
• Markush
• Query
• Any structure
The structure of JChem tables
Column name                         Explanation
cd_id                               unique numeric identifier in the table
cd_structure                        the imported structure in the original format, without modifications (except for the removal of data fields)
cd_smiles; cd_smarts; cd_markush    the standardized structure, in a format that depends on the table type; used by the search process
cd_formula                          the formula of the standardized structure
cd_sortable_formula                 formula representation for alphanumerical sorting
cd_molweight                        the molecular weight of the standardized structure
cd_hash; cd_flags; cd_fp…           fields used internally for structure searching
cd_timestamp                        the date and time of the insertion of the row
[user fields]                       custom data fields can be added by the user
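Because the columns listed above are ordinary database columns, the non-structural ones can in principle be read with plain JDBC. The sketch below is an assumption-laden illustration: the table name my_structures and the class StructureTableQuerySketch are hypothetical, only the cd_id, cd_formula and cd_molweight column names come from the slide, and structure searching itself is expected to go through the JChem API rather than raw SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical sketch: reading the documented cd_* columns with plain JDBC. */
final class StructureTableQuerySketch {

    /** Builds a query over cd_id, cd_formula and cd_molweight for a simple table identifier. */
    static String sqlFor(String tableName) {
        // Table names cannot be bound as parameters, so accept only plain identifiers.
        if (!tableName.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("unexpected table name: " + tableName);
        }
        return "SELECT cd_id, cd_formula, cd_molweight FROM " + tableName
                + " WHERE cd_molweight < ? ORDER BY cd_id";
    }

    /** Prints id, formula and molecular weight of all rows lighter than maxWeight. */
    static void printLighterThan(Connection con, String tableName, double maxWeight)
            throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(sqlFor(tableName))) {
            ps.setDouble(1, maxWeight);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("cd_id") + "\t"
                            + rs.getString("cd_formula") + "\t"
                            + rs.getDouble("cd_molweight"));
                }
            }
        }
    }

    public static void main(String[] args) {
        // Prints the SQL text only; no database is contacted in this demo.
        System.out.println(sqlFor("my_structures"));
    }
}
```

The internal search columns (cd_hash, cd_flags, cd_fp…) are deliberately left out of the query, since the slide describes them as internal to the search process.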
Structural search in database
Two-stage method provides optimal performance:
1. Rapid pre-screening reduces the number of possible hit candidates
• Chemical Hashed Fingerprints are used for substructure and superstructure searches
• Hash code is used for duplicate filtering (usually during compound registration)
2. Graph search algorithm is used to determine the final hit list
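The screen-then-verify idea can be sketched as follows. This is a hypothetical illustration, not the JChem API: the names TwoStageSearchSketch, Entry, passesScreen and search are invented, the fingerprints are hand-written toy values, and the second stage is passed in as a predicate because a real graph matcher is beyond a short example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;
import java.util.function.BiPredicate;

/** Hypothetical sketch of two-stage substructure searching; not the JChem API. */
final class TwoStageSearchSketch {

    /** A stored structure together with its precomputed fingerprint. */
    static final class Entry {
        final int id;
        final String structure;
        final BitSet fingerprint;

        Entry(int id, String structure, BitSet fingerprint) {
            this.id = id;
            this.structure = structure;
            this.fingerprint = fingerprint;
        }
    }

    /** Stage 1: a target can contain the query only if every query bit is also set in the target. */
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);              // query bits absent from the target
        return missing.isEmpty();
    }

    /** Runs the cheap screen first and the expensive matcher only on surviving candidates. */
    static List<Integer> search(String query, BitSet queryFp, List<Entry> table,
                                BiPredicate<String, String> graphMatch) {
        List<Integer> hits = new ArrayList<>();
        for (Entry e : table) {
            if (!passesScreen(queryFp, e.fingerprint)) {
                continue;                    // rejected without a graph search
            }
            if (graphMatch.test(query, e.structure)) {
                hits.add(e.id);              // confirmed by stage 2
            }
        }
        return hits;
    }

    /** Builds a toy fingerprint with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet fp = new BitSet(512);
        for (int p : positions) {
            fp.set(p);
        }
        return fp;
    }

    public static void main(String[] args) {
        List<Entry> table = Arrays.asList(
                new Entry(1, "CCO", bits(3, 7)),
                new Entry(2, "CCN", bits(3, 9)),
                new Entry(3, "CCCO", bits(3, 7, 11)));
        // Stand-in verifier for the demo only: text containment is NOT a real graph search.
        BiPredicate<String, String> standIn = (q, t) -> t.contains(q);
        System.out.println(search("CO", bits(7), table, standIn));
    }
}
```

The screen can produce false positives but must never reject a true hit, which is why the second stage is still needed. For a superstructure search the subset test would run in the opposite direction.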
Structure Cache
• Contains fingerprints for screening and ChemAxon Extended SMILES for atom-by-atom search (ABAS)
• Instant access to the structures for the search process
• Reduced load on the database server
• Incremental update ensures minimum overhead after changes in the table
• Small memory footprint due to
– SMILES compression
– Optimized storage technique
• Approximately 100MB memory needed for 1 million typical drug-like structures (using default, 512-bit fingerprints)
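A back-of-the-envelope check of the memory figure: 512-bit fingerprints alone take 64 bytes per structure. The sketch below works through that arithmetic; it assumes "100MB" means 10^8 bytes and says nothing about how the cache actually divides its memory. The class name CacheMemorySketch is hypothetical.

```java
/** Hypothetical arithmetic sketch for the structure cache memory estimate. */
final class CacheMemorySketch {

    /** Bytes needed to hold one fixed-length fingerprint per structure. */
    static long fingerprintBytes(long structures, int fingerprintBits) {
        return structures * (fingerprintBits / 8);
    }

    public static void main(String[] args) {
        long structures = 1_000_000L;
        long totalBytes = 100L * 1_000_000L;     // assumption: "100MB" read as 10^8 bytes
        long fpBytes = fingerprintBytes(structures, 512);
        long restBytes = totalBytes - fpBytes;   // remainder for compressed SMILES and overhead
        System.out.println("Fingerprints: " + fpBytes / 1_000_000 + " MB");
        System.out.println("Remainder: " + restBytes / 1_000_000 + " MB, i.e. "
                + restBytes / structures + " bytes per structure");
    }
}
```

Under these assumptions roughly two thirds of the quoted footprint is fingerprint data, leaving a few dozen bytes per structure for the compressed SMILES and bookkeeping.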
Future plans
• Graphical user interface for R-group decomposition
• Arbitrary table structure (Java and .NET API for JChem index)
• Maximum common substructure search type
• Additional layer: JChem Server (later also as grid)
• Compound registration system API
Summary
ChemAxon’s JChem Base API provides sophisticated, high-performance tools for developers working with chemical structures and associated data.
Building on the JChem API is convenient because:
• Our various tools integrate seamlessly
• Both high and low level API classes are available
• Responsive developer-to-developer support
Links
• JChem home page: http://www.chemaxon.com/products/jchem-base
• Online tryout: http://www.chemaxon.com/jchem/examples.html
• API documentation: http://www.chemaxon.com/jchem/doc/api/index.html
• Brochure: www.chemaxon.com/brochures/JChemBase.pdf
Visit other technical presentations
MarvinSketch/View http://www.chemaxon.com/MarvinSketch_View.ppt
MarvinSpace http://www.chemaxon.com/MarvinSpace.ppt
Calculator Plugins http://www.chemaxon.com/Calculator_Plugins.ppt
JChem Base http://www.chemaxon.com/JChem_Base.ppt
JChem Cartridge http://www.chemaxon.com/JChem_Cartridge.ppt
Standardizer http://www.chemaxon.com/Standardizer.ppt
Screen http://www.chemaxon.com/Screen.ppt
JKlustor http://www.chemaxon.com/JKlustor.ppt
Fragmenter http://www.chemaxon.com/Fragmenter.ppt
Reactor http://www.chemaxon.com/Reactor.ppt