BUILDING A CROWDSOURCED CHEMICAL DATABASE FROM THE WEB
Árpád Figyelmesi
BACKGROUND
Chemistry in the deep
The Deep Web consists of the parts of the World Wide Web that are not indexed by standard search engines.
• Limited-access or script-generated pages
• Web archives
• Chemistry is hardly indexed
• Buried under irrelevant content
Chemicalize: original concept
Free, web-based, experimental demonstration and advertising application, for non-commercial use only.
chemicalize.org beta
Eight years ago…
History
• 2008 Alpha release
• 2009 Webpage annotation
• 2010 Property calculation
• 2011 Chemical & Web search
Crowdsourced web exploration
Public pages visited by Chemicalize users
Auto-annotation scripts
Search results
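The exploration loop above (public pages visited by users, fed to annotation scripts) means the same page can reach the pipeline many times under slightly different URLs. Below is a minimal Python sketch of one way to deduplicate submissions before annotation. It assumes a simple canonicalization (lowercase host, sorted query parameters, fragment dropped); `normalize_url` and `AnnotationQueue` are hypothetical names, not part of Chemicalize.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Map trivially different spellings of a URL to one canonical key."""
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    netloc = parts.netloc.lower()
    path = parts.path or "/"
    # Sort query parameters so ?b=2&a=1 and ?a=1&b=2 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # Drop the fragment: "#section" does not change the page content.
    return urlunsplit((parts.scheme, netloc, path, query, ""))


class AnnotationQueue:
    """Hypothetical queue of user-visited pages awaiting auto-annotation."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.pending: list[str] = []

    def submit(self, url: str) -> bool:
        """Queue a page once; return False for non-web or already seen URLs."""
        if urlsplit(url).scheme not in ("http", "https"):
            return False  # only public web pages are explored in this sketch
        key = normalize_url(url)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.pending.append(key)
        return True
```

With millions of visited URLs, a check like this keeps the annotation scripts from processing the same page twice.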
Contribution to PubChem (2013)
• 300k structures
• 350k web pages
• 100k novel structures
Popularity (2015)
• 25k users / month
• 1 million structures
• 2 million visited URLs
• A dozen blog posts and journal references
• Continuous, valuable user feedback
Dark side:
• Scalability & performance
• Maintenance & operation
• Abuse and unfair usage
NEW CHEMICALIZE
Vision
Preserve the current values, but make Chemicalize a professional and much more powerful platform.
• Improve reliability
• Extend functionality
• Know and understand users
Development
• Secure
• Reliable
• Scalable
• Extensible
• Simple
• Fast
Full redesign and enterprise-ready reimplementation in a modular cloud architecture.
New business model
• Free registration
• Free basic functions
• Free credits monthly
• Pay-per-use
• Credit package system
Enough for most typical use cases
For more intensive usage
Instant cheminformatics solutions
Current modules
Calculation
Names, identifiers, physicochemical properties, e.g. pKa, logP/logD, solubility…
Annotation
Chemical structure recognition and extraction from web pages
Search
Combined chemical and text search with relevance scoring, hit highlighting…
Compliance
Compliance check against regulations on psychotropic drugs, explosives and toxic agents
+ Extensible with further modules
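One way to read "extensible with further modules" is a shared dispatch interface that every module plugs into. The Python sketch below assumes a hypothetical name-to-handler registry; `module`, `dispatch` and both handlers are illustrative placeholders, and the handlers return dummy results rather than real calculations or regulatory checks.

```python
from typing import Callable, Dict

# Hypothetical registry: module name -> handler taking a structure string.
Handler = Callable[[str], dict]
MODULES: Dict[str, Handler] = {}


def module(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler under a unique module name."""
    def register(fn: Handler) -> Handler:
        if name in MODULES:
            raise ValueError(f"module {name!r} is already registered")
        MODULES[name] = fn
        return fn
    return register


@module("calculation")
def calculation(structure: str) -> dict:
    # Placeholder: a real module would return names, pKa, logP/logD, ...
    return {"input": structure, "properties": {}}


@module("compliance")
def compliance(structure: str) -> dict:
    # Placeholder list; a real check would consult regulatory databases.
    restricted = {"EXAMPLE-RESTRICTED"}
    return {"input": structure, "restricted": structure in restricted}


def dispatch(name: str, structure: str) -> dict:
    """Route a request to the named module; unknown names raise KeyError."""
    if name not in MODULES:
        raise KeyError(f"no module named {name!r}")
    return MODULES[name](structure)
```

Under this design, adding a module means writing one decorated function, while `dispatch` and its callers stay unchanged.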
NEW HEART
Annotation
Improved annotation view for modern web pages, with better CSS and JS support
• Google Patents
• ScienceDirect
• Wiley Online Library
Content
More preloaded content and proactive web exploration in addition to crowdsourcing
Processed in the first stage:
• English Wikipedia: 5 million articles
• USPTO grants: last 5 years
• Chemicalize: 800k URLs
Search
New engine offering unlimited combinations of chemical and keyword search
• Substructure, full, similarity
• Name, SMILES, InChI, CAS
• Full text, field
• Boolean, proximity, wildcard
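Similarity and substructure queries are commonly served with structural fingerprints. The slides do not say which method the new engine uses, so this Python sketch makes three assumptions: fingerprints are stored as sets of on-bit indices, the Tanimoto coefficient is the similarity score, and 0.7 is a hypothetical default threshold. The substructure screen is only a prefilter. Candidates that pass it still need an exact atom-by-atom match.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bit-set fingerprints."""
    union = len(a | b)
    if union == 0:
        return 1.0  # convention used here: two empty fingerprints are identical
    return len(a & b) / union


def similarity_search(query: Fingerprint, index: Dict[str, Fingerprint],
                      threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Return (id, score) pairs scoring at least `threshold`, best first."""
    hits = [(cid, tanimoto(query, fp)) for cid, fp in index.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


def substructure_screen(query: Fingerprint, candidate: Fingerprint) -> bool:
    """Necessary, not sufficient: every query bit must be set in the candidate."""
    return query <= candidate


if __name__ == "__main__":
    # Toy fingerprints, made up for illustration only.
    index = {"A": frozenset({1, 2, 3, 4}), "B": frozenset({1, 2, 3, 5}),
             "C": frozenset({7, 8})}
    print(similarity_search(frozenset({1, 2, 3, 4}), index, threshold=0.5))
    # -> [('A', 1.0), ('B', 0.6)]
```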
Query examples
acetylsalicylic acid AND fever
Matches Aspirin, acetylsalicylic acid, 2-(acetyloxy)benzoic acid and all chemically equivalent terms, together with fever.
SUB:benzene
Matches any structure that contains benzene as a substructure, for example toluene, phenol or benzoic acid.
SIM:viagra AND "half-life" AND "pulmonary arterial hypertension"
Matches structures chemically similar to Viagra together with the phrases "half-life" and "pulmonary arterial hypertension".
(c?emotherap* AND ("Phosphoinositide 3-kinases"~3 OR Pi3K)) AND FULL:idelalisib
Wildcard operators: ? for one character, * for
multiple characters. Proximity operator: "term1
term2"~distance. Phrase: "term1 term2".
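The wildcard and proximity operators described above can be approximated with regular expressions and token positions. The slides do not describe the real engine's tokenizer or scoring, so this simplified Python sketch assumes three things: tokens are split on whitespace and punctuation, * matches zero or more characters, and distance counts the words between two terms in either order. All function names are hypothetical.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercased word tokens; hyphens and punctuation act as separators."""
    return re.findall(r"\w+", text.lower())


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a term where ? is one character and * is any run of characters."""
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")  # zero or more characters in this sketch
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(text: str, pattern: str) -> bool:
    """True if any whole token of `text` matches the wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return any(regex.fullmatch(tok) for tok in tokenize(text))


def within_proximity(text: str, term1: str, term2: str, distance: int) -> bool:
    """True if the terms occur with at most `distance` words between them."""
    tokens = tokenize(text)
    pos1 = [i for i, t in enumerate(tokens) if t == term1.lower()]
    pos2 = [j for j, t in enumerate(tokens) if t == term2.lower()]
    return any(i != j and abs(i - j) - 1 <= distance
               for i in pos1 for j in pos2)
```

In this sketch, c?emotherap* accepts both "chemotherapy" and "chemotherapeutic". A multi-word phrase such as "Phosphoinositide 3-kinases"~3 would need the two-term proximity check generalized to a whole phrase.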
chemicalize.com
THANK YOU
Árpád Figyelmesi