OMPOL – Visualisation of large chemical spaces
Peter Corbett, Colin Batchelor, Alexey Pshenichnov, Valery Tkachenko
Royal Society of Chemistry
ACS Spring 2016, San Diego, CA, March 17th 2016
Compounds, Reactions, Analytical Data, Text and References
ChemSpider Synthetic Pages
Chemical space – 10^60
Dimensions and complexity of science
RSC Data Repository
Data Repository
Properties, Names and Identifiers, Spectra, Articles, Data Collections, Patents, etc.
RSC Databases
RSC Compounds, RSC Reactions, RSC Spectra, RSC Crystals, RSC Polymers, RSC Materials, RSC Assays, RSC Algorithms, RSC Models …and on…
Record labels
Visualising Chemical Space
Need to be able to see what sorts of structures are in a collection, how they relate to each other, etc.
Could use something like clustering
Dimensionality reduction – chemical structures -> fingerprints -> high-dimensional space -> low-dimensional space
Standard technique – Principal Components Analysis (PCA)
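The slides do not say which fingerprint was used. The following is a minimal sketch of the "structures -> fingerprints" step. It assumes each structure has already been broken into a set of feature strings. The feature strings and the 16-bit length are hypothetical, because a cheminformatics toolkit would derive real features, such as substructures, from the structure itself. A hashed ("folded") fingerprint then maps every feature to one bit of a fixed-length vector:

```python
import zlib
from typing import Iterable, List


def hashed_fingerprint(features: Iterable[str], n_bits: int = 1024) -> List[int]:
    """Fold a collection of feature strings into a fixed-length 0/1 vector.

    CRC-32 is used instead of Python's built-in hash(), which is salted per
    process for strings and would give different bits on every run.
    Distinct features may collide on the same bit; that is the usual
    trade-off of a folded fingerprint.
    """
    bits = [0] * n_bits
    for feature in features:
        bits[zlib.crc32(feature.encode("utf-8")) % n_bits] = 1
    return bits


# Hypothetical feature strings standing in for a toolkit's substructure keys.
ethanol_features = {"C", "O", "C-C", "C-O"}
fp = hashed_fingerprint(ethanol_features, n_bits=16)
print(len(fp), sum(fp))  # 16 bits, at most 4 of them set
```

Each fingerprint becomes one row of the high-dimensional space; longer vectors reduce collisions at the cost of more columns.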
Dimensionality Reduction – first, make a molecule-feature matrix (one row per molecule, one column per fingerprint feature)
1 0 0 0 0 0 0 0 … 0
0 0 1 0 0 0 0 0 … 0
1 1 0 0 1 0 0 0 … 1
1 1 0 1 0 0 0 0 … 1
1 1 0 0 0 0 0 0 … 0
1 0 0 0 0 0 0 1 … 0
1 0 1 0 1 1 0 0 … 0
1 0 0 1 0 0 0 0 … 1
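One way to build such a matrix is to assign each distinct feature a column the first time it is seen. This assumes each molecule is described by a set of feature labels. The labels below are hypothetical and do not correspond to the columns shown above.

```python
from typing import Dict, List, Set, Tuple


def feature_matrix(molecules: List[Set[str]]) -> Tuple[List[List[int]], Dict[str, int]]:
    """Return (matrix, vocabulary) for a list of per-molecule feature sets.

    matrix[i][j] is 1 if molecule i has the feature assigned to column j.
    vocabulary maps each feature label to its column index.
    """
    vocab: Dict[str, int] = {}
    for feats in molecules:
        # Sort within each molecule so the column order does not depend on
        # set iteration order, only on the order of the molecules.
        for f in sorted(feats):
            vocab.setdefault(f, len(vocab))

    matrix = [[0] * len(vocab) for _ in molecules]
    for row, feats in zip(matrix, molecules):
        for f in feats:
            row[vocab[f]] = 1
    return matrix, vocab


mols = [{"C", "O"}, {"C", "N"}, {"O"}]
m, v = feature_matrix(mols)
print(v)  # {'C': 0, 'O': 1, 'N': 2}
print(m)  # [[1, 1, 0], [1, 0, 1], [0, 1, 0]]
```

With tens of thousands of molecules and thousands of features, most entries are zero. A sparse representation (for example `scipy.sparse.csr_matrix`) is then usually preferable to nested lists.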
PCA/SVD

The result
 0.209   0.078  -0.368  …
 0.030   0.297   0.174  …
 0.509   0.005   0.343  …
 0.514  -0.394   0.172  …
 0.320  -0.034  -0.198  …
 0.228   0.108  -0.791  …
 0.338   0.812   0.151  …
 0.403  -0.281   0.003  …
<--- Most important        Least important --->
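Below is a minimal sketch of the PCA/SVD step with NumPy. The toy matrix is made up and is not the slide's data. The sketch centres each column, takes the singular value decomposition, and keeps the leading columns of U·S as per-molecule coordinates. NumPy returns singular values in decreasing order, which matches the "most important" to "least important" ordering of the result columns.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Coordinates of each row of X on its first n_components principal components.

    With the column-centred matrix Xc = U @ diag(S) @ Vt, the scores are
    U * S (equivalently Xc @ Vt.T). np.linalg.svd returns S in decreasing
    order, so column 0 carries the most variance.
    """
    Xc = X - X.mean(axis=0)                      # PCA requires centred columns
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


# Toy 0/1 molecule-feature matrix (hypothetical, 5 molecules x 4 features)
X = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [1, 1, 1, 0],
              [1, 0, 1, 1],
              [0, 0, 0, 1]], dtype=float)
coords = pca_scores(X, n_components=2)   # one (x, y) pair per molecule to plot
print(coords.shape)  # (5, 2)
```

The sign of each component is arbitrary. For large sparse fingerprint matrices, truncated solvers such as `scipy.sparse.linalg.svds` compute only the leading components. These solvers do not centre the data for you, and centring a sparse matrix explicitly makes it dense.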
Plot on a graph
The problem
Need an interactive scatterplot
Web delivery => JavaScript
Need, at minimum, to click, mouseover, pan and zoom
Existing scatterplot libraries, e.g. flot.js, are plentiful and well supported…
…but do not scale well – become slow and unresponsive with ~40,000 data points
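Click and mouseover need a fast answer to "which point is under the cursor?". A naive scan over every point slows down as the data grows. The slides do not say how OMPOL handles this. One common approach buckets points into a uniform grid, so that a query only inspects the cells next to the cursor. It is sketched here in plain Python with a hypothetical `GridIndex` class. Coordinates are assumed to be already converted from screen pixels to data units.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


class GridIndex:
    """Uniform-grid spatial index for finding the point nearest the cursor.

    Points are bucketed by grid cell. A query with radius <= cell_size only
    needs the 3x3 block of cells around the cursor, so its cost depends on
    local density rather than on the total number of points.
    """

    def __init__(self, points: List[Point], cell_size: float):
        self.points = points
        self.cell = cell_size
        self.buckets: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        for i, (x, y) in enumerate(points):
            self.buckets[self._key(x, y)].append(i)

    def _key(self, x: float, y: float) -> Tuple[int, int]:
        return (math.floor(x / self.cell), math.floor(y / self.cell))

    def nearest(self, x: float, y: float, radius: float) -> Optional[int]:
        """Index of the closest point within radius of (x, y), or None."""
        if radius > self.cell:
            raise ValueError("radius must not exceed cell_size")
        cx, cy = self._key(x, y)
        best, best_d2 = None, radius * radius
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for i in self.buckets.get((cx + dx, cy + dy), ()):
                    px, py = self.points[i]
                    d2 = (px - x) ** 2 + (py - y) ** 2
                    if d2 <= best_d2:
                        best, best_d2 = i, d2
        return best


index = GridIndex([(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)], cell_size=1.0)
print(index.nearest(0.9, 0.9, radius=0.5))  # 1
print(index.nearest(3.0, 3.0, radius=0.5))  # None
```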
The solution
Make your own graph-plotting tool
OMPOL – One Million Points Of Light – an aspiration for scalability
HTML5 Canvas
“Google Maps”-style drawing:
Divide graph into panels
Draw panels as they come onto the screen
Assemble display from pre-drawn panels
Opportunity for better ways of exploring the data
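The panel scheme has two parts: working out which panels a viewport overlaps, and drawing each panel once and then reusing it. This is a plain-Python illustration, not OMPOL's JavaScript. The 256-pixel panel size, the doubling of scale per zoom level and the least-recently-used eviction are assumptions borrowed from web map tiling. `draw` is a hypothetical stand-in for rendering a panel's points onto an offscreen HTML5 canvas.

```python
import math
from collections import OrderedDict
from typing import Callable, Iterator, Tuple

TileKey = Tuple[int, int, int]  # (zoom level, column, row)


def visible_tiles(x0: float, y0: float, x1: float, y1: float,
                  zoom: int, tile_px: int = 256) -> Iterator[TileKey]:
    """Yield the panels overlapping the data-space viewport [x0, x1] x [y0, y1].

    Assumes one data unit is one pixel at zoom 0 and that each zoom level
    doubles the scale, so a panel covers tile_px / 2**zoom data units.
    """
    span = tile_px / 2 ** zoom
    for col in range(math.floor(x0 / span), math.ceil(x1 / span)):
        for row in range(math.floor(y0 / span), math.ceil(y1 / span)):
            yield (zoom, col, row)


class PanelCache:
    """Draw each panel on first use and keep the most recently used ones."""

    def __init__(self, draw: Callable[[TileKey], object], capacity: int = 256):
        self.draw = draw              # hypothetical renderer for one panel
        self.capacity = capacity
        self.panels: "OrderedDict[TileKey, object]" = OrderedDict()

    def get(self, key: TileKey) -> object:
        if key in self.panels:
            self.panels.move_to_end(key)      # mark as recently used
            return self.panels[key]
        panel = self.draw(key)                # only drawn when first needed
        self.panels[key] = panel
        if len(self.panels) > self.capacity:
            self.panels.popitem(last=False)   # evict least recently used
        return panel


drawn = []

def fake_draw(key: TileKey) -> str:
    drawn.append(key)
    return f"panel {key}"

cache = PanelCache(fake_draw, capacity=64)
for key in visible_tiles(0, 0, 512, 300, zoom=0):
    cache.get(key)
for key in visible_tiles(100, 0, 612, 300, zoom=0):  # pan right by 100 units
    cache.get(key)
print(len(drawn))  # 6: panning only drew the two newly exposed panels
```

Because a pan only exposes a strip of new panels, the cost of redrawing depends on the viewport size rather than on the total number of points.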
Example data
ChEBI
~50,000 compounds, of “Biological Interest”
Has an ontology of compound types
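An ontology makes it possible to select every compound classified under some parent class. This minimal sketch assumes the `is_a` relations have been loaded into a dictionary that maps each term to its parents. The terms form a small hand-made hierarchy and are not extracted from ChEBI. A breadth-first search collects every term below a chosen class. A term can have several parents, so visited terms are tracked; `select_points` is a hypothetical helper.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Sequence, Set


def descendants(is_a: Dict[str, Iterable[str]], root: str) -> Set[str]:
    """All terms that are (directly or transitively) is_a root, plus root."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parents in is_a.items():
        for parent in parents:
            children[parent].append(child)
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, ()):
            if child not in seen:        # a term may be reachable via several parents
                seen.add(child)
                queue.append(child)
    return seen


def select_points(point_terms: Sequence[str], is_a: Dict[str, Iterable[str]],
                  root: str) -> List[int]:
    """Indices of plotted points whose term falls under root."""
    wanted = descendants(is_a, root)
    return [i for i, term in enumerate(point_terms) if term in wanted]


# Toy hierarchy (hand-made, not taken from ChEBI)
is_a = {
    "ethanol": ["primary alcohol"],
    "propan-2-ol": ["secondary alcohol"],
    "primary alcohol": ["alcohol"],
    "secondary alcohol": ["alcohol"],
    "acetic acid": ["carboxylic acid"],
}
point_terms = ["ethanol", "acetic acid", "propan-2-ol"]
print(select_points(point_terms, is_a, "alcohol"))  # [0, 2]
```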
What we’re going to show
Display data from dimensionality reduction
Selecting data points and sets of data points
“Narrowing down” a cluster of compounds based on its distribution in multiple dimensions
Exporting data
Using name and ontology information to select groups of points
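One reading of "narrowing down" a cluster is repeated filtering: each step keeps only the selected points that fall within a range on one more principal component. Exporting then writes the selected rows out. The sketch below follows that reading. The function names, the CSV layout and the toy coordinates are all hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional, Sequence, Tuple

Coords = Sequence[Sequence[float]]


def narrow(points: Coords, ranges: Dict[int, Tuple[float, float]],
           candidates: Optional[Sequence[int]] = None) -> List[int]:
    """Indices (from candidates, or all points) whose coordinate lies in
    [lo, hi] for every dimension d listed in ranges."""
    pool = range(len(points)) if candidates is None else candidates
    return [i for i in pool
            if all(lo <= points[i][d] <= hi for d, (lo, hi) in ranges.items())]


def export_csv(points: Coords, names: Sequence[str], indices: Sequence[int]) -> str:
    """CSV text with one row per selected point: name, then its coordinates."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    n_dims = len(points[0]) if points else 0
    writer.writerow(["name"] + [f"dim{d + 1}" for d in range(n_dims)])
    for i in indices:
        writer.writerow([names[i], *points[i]])
    return buf.getvalue()


# Toy coordinates on three components (hypothetical values)
pts = [(1.0, 2.0, -1.0), (3.0, 0.5, 2.0), (3.2, -1.0, -0.5), (0.4, 1.5, 0.0)]
names = ["a", "b", "c", "d"]
step1 = narrow(pts, {0: (2.5, 3.5)})          # cluster on component 1
step2 = narrow(pts, {2: (0.0, 5.0)}, step1)   # refine on component 3
print(step1, step2)  # [1, 2] [1]
print(export_csv(pts, names, step2))
```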
How scalable?
Works very nicely with ~50,000 data points and all features enabled
During development, was able to work with 1M and, on occasion, 10M data points
Only in 2D, and without all features enabled
Conclusion
Interacting with large (tens of thousands to millions of data points) multidimensional data sets is now a definite possibility