Extraction of metabolism data from the literature
Peter Murray-Rust, Dept of Chemistry and TheContentMine
Lhasa, Leeds, UK, 2017-01-12
contentmine.org is supported by a grant to PMR as a
Thousands of scientists have to type the literature.
Machines should be doing it!
Special Thanks
Molecular Informatics, Cambridge: Peter Corbett, Andy Howlett, Daniel Lowe, Lezan Hawizy, Mark Williamson
OSCAR (chemical entities), OPSIN (name 2 structure), ChemicalTagger (recipes), “OSIRIS” (graphical chemistry)
What is “Content”?
http://www.plosone.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pone.0111303&representation=PDF CC-BY
SECTIONS
MAPS
TABLES
CHEMISTRY
TEXT
MATH
contentmine.org tackles these
Example papers – what do you want? What can we find?
Entity extraction
Reaction Schemes
Tables
Graphs
Entities
Plot
Maths?
Models?
Semantics in Wikidata
What’s the title?
Some demos
“… simulated by 21cmFAST is in principle independent”
“it is a feature of the 21cmFAST code, and is explained in §3.1.”
SciCodes[1]: Searching for software in arXiv[2]
[1] Proposal to LJ Arnold Foundation (Alice Allen ASCL and PMR)
Using the semi-numerical simulation, 21cmFAST,
[2] arxiv.org: the physics/maths/astronomy... preprint server
The language identifies the software!
arXiv has >500 mentions of “21cmFAST”
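A minimal sketch of how “the language identifies the software” could work in practice. It is not the SciCodes implementation: the cue phrases in `CUE_PATTERNS` and the function name `find_software_mentions` are assumptions, chosen only to match the three quoted sentences above.

```python
import re

# Hypothetical cue phrases (assumptions, not SciCodes rules): wording that
# tends to introduce a software name in astrophysics preprints.
CUE_PATTERNS = [
    re.compile(r"simulated by (?P<name>[A-Za-z0-9][\w.+-]*)"),
    re.compile(r"feature of the (?P<name>[A-Za-z0-9][\w.+-]*) code"),
    re.compile(r"simulation, (?P<name>[A-Za-z0-9][\w.+-]*)"),
]


def find_software_mentions(text):
    """Return the sorted candidate software names introduced by a cue phrase."""
    names = set()
    for pattern in CUE_PATTERNS:
        for match in pattern.finditer(text):
            # Drop sentence punctuation that the name pattern may have swallowed.
            names.add(match.group("name").rstrip(".,;"))
    return sorted(names)


# The three sentences quoted on the slide.
snippets = [
    "... simulated by 21cmFAST is in principle independent",
    "it is a feature of the 21cmFAST code, and is explained in §3.1.",
    "Using the semi-numerical simulation, 21cmFAST,",
]
mentions = [find_software_mentions(s) for s in snippets]
```

Each snippet yields the single candidate `21cmFAST`. A real system would need many more cue phrases and a way to filter false positives.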
http://chemicaltagger.ch.cam.ac.uk/
Typical chemical synthesis
Automatic semantic markup of chemistry
Could be used for analytical, crystallization, etc.
AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home
Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY:
AMI reads the complete diagram, recognizes the paths and generates the molecules. Then she creates a stop-frame animation showing how the 12 reactions lead into each other.
CLICK HERE FOR ANIMATION
(may be browser dependent)
UNITS
TICKS
QUANTITY
SCALE
TITLES
DATA!! 2000+ points
VECTOR PDF
Dumb PDF
CSV
Semantic Spectrum
2nd Derivative
Smoothing Gaussian Filter
Automatic extraction
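Once the points are out of the PDF and in CSV, the two processing steps named above (Gaussian smoothing and a 2nd derivative) are simple to apply. The following is a sketch under stated assumptions, not AMI's code: the function names, the default `sigma`, the edge clamping and the evenly spaced x-axis are all illustrative choices.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, sigma=2.0):
    """Gaussian-smooth a list of y-values; edges are clamped (an assumption)."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # repeat the end points
            acc += w * values[j]
        out.append(acc)
    return out


def second_derivative(values, step=1.0):
    """Central-difference 2nd derivative; assumes evenly spaced x-values."""
    return [(values[i - 1] - 2 * values[i] + values[i + 1]) / (step * step)
            for i in range(1, len(values) - 1)]


# Toy data standing in for extracted spectrum points.
parabola = [x * x for x in range(6)]
curvature = second_derivative(parabola)
flat = smooth([5.0] * 20)
```

The result of `second_derivative` is two points shorter than its input because the end points have no neighbour on one side. Smoothing first is the usual order, since differentiating amplifies noise.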
https://rawgit.com/ContentMine/amidemos/master/zika/full.dataTables.html
Search on publicly accessible papers on “Zika”
C) What’s the problem with this spectrum?
Org. Lett., 2011, 13 (15), pp 4084–4087
Original thanks to ChemBark
After AMI2 processing…
… AMI2 has detected a square
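How a square might be spotted among the paths of a vector graphic can be sketched as follows. This is an illustration only and not AMI2's actual method: the function name `is_axis_aligned_square`, the tolerance `tol` and the representation of a path as a list of (x, y) vertices are all assumptions.

```python
def is_axis_aligned_square(points, tol=1e-6):
    """True if a closed path of (x, y) vertices traces an axis-aligned square."""
    pts = list(points)
    # A closed path often repeats its first vertex at the end.
    if len(pts) == 5 and pts[0] == pts[-1]:
        pts = pts[:-1]
    if len(pts) != 4:
        return False

    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    width, height = x1 - x0, y1 - y0
    if width <= tol or abs(width - height) > tol:
        return False  # degenerate, or a rectangle with unequal sides

    # Map every vertex to a corner of the bounding box: (0 or 1, 0 or 1).
    corners = []
    for x, y in pts:
        cx = 0 if abs(x - x0) <= tol else 1 if abs(x - x1) <= tol else None
        cy = 0 if abs(y - y0) <= tol else 1 if abs(y - y1) <= tol else None
        if cx is None or cy is None:
            return False
        corners.append((cx, cy))
    if len(set(corners)) != 4:
        return False

    # Consecutive vertices must share exactly one coordinate (rules out a bow-tie).
    for i in range(4):
        a, b = corners[i], corners[(i + 1) % 4]
        if (a[0] != b[0]) == (a[1] != b[1]):
            return False
    return True


square = [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)]
rectangle = [(0, 0), (3, 0), (3, 2), (0, 2)]
bow_tie = [(0, 0), (2, 2), (2, 0), (0, 2)]
```

Here `square` passes while `rectangle` and `bow_tie` are rejected. A check like this only works on a vector PDF, where the drawing primitives survive; in a bitmap the shape would have to be recovered from pixels first.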