Sharing Science Data: Semantically Reimagining the IUPAC Solubility Series Data
Stuart J. Chalk
Department of Chemistry, University of North Florida
SCTY 132 - Pacifichem 2015
Thoughts on Policy and Procedures
Case in Point – the SDS
Development Perspective
The Scientific Data Model
Database Structure for SDM
Mapping NIST DB to SDM DB
Current Status
Management and Tools
To Do List
Acknowledgements
Overview
http://aspiresquared.co.uk/2011/01/metadata-what-is-its-purpose/
Scientific publications need to be reimagined
  Describe work and highlight advancements, yes!
  ...but also provide access to original data as default
  For all research (not just federally funded)
  With appropriate attribution/provenance…
  …and timestamp and digital signature
Thoughts on Policies and Procedures
Implementation needs to be designed by scientists to engage the community and get buy-in to change paradigm
National societies need to promote good practices in digital collection, annotation and reporting of scientific data …
... and mandate that it be taught as part of the curriculum in chemistry!
Don’t publish useful scientific data in ways you can’t use it!
Thoughts on Policies and Procedures
Case in Point…
IUPAC Solubility Data Series
  1979-present
  103 volumes
  Paper volumes until 1996 (up to volume 65)
  Electronic articles in JPCRD
Abstracting solubility data from the literature by scientists, careful reporting of values with context
Great resource but why not electronic?
Understanding the NIST SDS DB structure
Migration strategy development
Database migration (iterative)
Scientific Data Model development and implementation
UNF DB design
Website development
Ingest of chemical identifiers
Identification of DOIs for references
Database cleanup
Activities
A general model to describe/contain scientific data focuses on the data, not the view of the data
Therefore, how the data is stored in a database should not be governed by how the user views it
Separating the view and the data structure is a major focus
Our perspective
Systems (NIST DB) are two different views of SDS data
Reports are a view of data from one reference about one or more chemical systems
Evaluations are an aggregated view of data from many reports
Data in tables are aggregated views of data from reports and optionally preparer/evaluator supplemental data
Development Perspective
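The idea that reports and evaluations are two views over the same stored data can be sketched in Python as follows. This is only an illustration: the `DataPoint` fields, the `STORE` list, and both view functions are hypothetical names, and the numbers are invented placeholders, not values from the SDS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    """One stored value; all field names here are hypothetical."""
    reference: str        # key of the source publication
    system: str           # chemical system the value belongs to
    temperature_k: float  # placeholder condition
    solubility: float     # placeholder measured value


# The store is a flat collection that knows nothing about how it is displayed.
# Values are invented placeholders, not real solubility data.
STORE = [
    DataPoint("ref-A", "system-1", 298.15, 1.0),
    DataPoint("ref-A", "system-2", 298.15, 2.0),
    DataPoint("ref-B", "system-1", 298.15, 1.1),
]


def report_view(store, reference):
    """Report: data from one reference about one or more chemical systems."""
    by_system = defaultdict(list)
    for point in store:
        if point.reference == reference:
            by_system[point.system].append(point)
    return dict(by_system)


def evaluation_view(store, system):
    """Evaluation: data for one system aggregated across many reports."""
    by_reference = defaultdict(list)
    for point in store:
        if point.system == system:
            by_reference[point.reference].append(point)
    return dict(by_reference)
```

Both functions read the same `STORE`; adding a third kind of view would not require changing how the data is stored.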
Existing NIST DB Model
Scientific Data Model (SDM)
SDM Database Structure
Each table has one to many rows
Each row has one to many columns
Data in a row is of different types
  Conditional Data – property values that define the context of the data
  Experimental Data – the measured values in the paper
  Supplemental Data (other property data)
    From the authors of the publication evaluated
    Calculated by the preparer or evaluator
Notes
Interpreting Data in Tables
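A minimal Python sketch of this row interpretation, assuming each cell is tagged with its role and, for supplemental data, with who supplied it. The class and field names (`Role`, `Source`, `Cell`, `Row`) are hypothetical and are not taken from the SDM schema; the values are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    CONDITIONAL = "conditional"    # defines the context of the data
    EXPERIMENTAL = "experimental"  # measured value reported in the paper
    SUPPLEMENTAL = "supplemental"  # other property data


class Source(Enum):
    AUTHOR = "author"        # from the authors of the publication
    PREPARER = "preparer"    # calculated by the preparer
    EVALUATOR = "evaluator"  # calculated by the evaluator


@dataclass
class Cell:
    prop: str
    value: float
    unit: str
    role: Role
    source: Source = Source.AUTHOR


@dataclass
class Row:
    cells: list
    notes: list = field(default_factory=list)

    def by_role(self, role):
        """Return the cells in this row that play the given role."""
        return [cell for cell in self.cells if cell.role is role]


# Placeholder row: one condition, one measurement, one derived value.
row = Row(
    cells=[
        Cell("temperature", 298.15, "K", Role.CONDITIONAL),
        Cell("solubility", 1.0, "mol/kg", Role.EXPERIMENTAL),
        Cell("mole fraction", 0.018, "", Role.SUPPLEMENTAL, Source.PREPARER),
    ],
    notes=["placeholder note"],
)
```

Tagging each cell, rather than fixing column positions, lets a table mix any number of conditions, measurements, and supplemental values per row.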
[Figure labels: Conditional Data, Experimental Data, Supplemental Data]
Interpreting Data in Tables
All tables completely transferred except
  SOLDATA – 75% transferred
  SOLID_DATA – 50% transferred
  GAS_DATA, PARAMETERS_TB, FIGURES – not started
TO DOs
  Separate Evaluations from Reports
  Clean up and link references properly
  Get links to references that don’t have a DOI
  Additional integration into other APIs
  Write up API documentation
  Solubility Ontology
Current Status
Project files are hosted on GitHub
  Collaborative environment
  Tracking issues
  Website (TO DO)
Communication through Slack
  Integration with GitHub (issues and commits)
PHP coding using PhpStorm
  Enforces coding standards, code checks, TO DOs
  GitHub (commits and issues) and MySQL
Management and Tools
Finish the transfer of data!
Replicate the current SRD website
  Standardize GUI using Bootstrap CSS
  Check browser compatibility (NIST list of browsers?)
  Check for compliance with Section 508 for access
  Add exit script() for leaving NIST website
Documentation
  PHP code, SDM schema, GUI, API
Do roundtrip verification of the data
New website will replace existing one by summer 2016
To Do List
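One way the roundtrip verification item could work is sketched below in Python: reshape legacy rows into a role-tagged table, rebuild the legacy shape from that table, and report any row that does not come back unchanged. Everything here is an assumption for illustration: the `sdm` table layout, the column names, and the use of an in-memory SQLite database as a stand-in for the project's MySQL databases.

```python
import sqlite3


def roundtrip_mismatches(legacy_rows):
    """Return legacy rows that do not survive legacy -> SDM -> legacy.

    Each legacy row is (point_id, temperature, solubility); point_id is
    assumed to be unique. The table layout is hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE sdm (point_id INTEGER, role TEXT, prop TEXT, value REAL)"
    )
    # Forward mapping: one wide legacy row becomes two role-tagged rows.
    for point_id, temperature, solubility in legacy_rows:
        db.executemany(
            "INSERT INTO sdm VALUES (?, ?, ?, ?)",
            [
                (point_id, "conditional", "temperature", temperature),
                (point_id, "experimental", "solubility", solubility),
            ],
        )
    # Reverse mapping: rebuild the wide shape from the role-tagged rows.
    rebuilt = {}
    for point_id, prop, value in db.execute(
        "SELECT point_id, prop, value FROM sdm"
    ):
        rebuilt.setdefault(point_id, {})[prop] = value
    db.close()

    mismatches = []
    for row in legacy_rows:
        point_id, temperature, solubility = row
        got = rebuilt.get(point_id, {})
        if (got.get("temperature"), got.get("solubility")) != (
            temperature,
            solubility,
        ):
            mismatches.append(row)
    return mismatches
```

An empty result means every row was recovered exactly; any returned row points at data that the mapping lost or altered.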
UNF
  Matthew Morse (UNF Senior)
    NIST SDS DB migration to UNF DB, SDM Testing
  Israel Hurst (UNF Junior)
    NIST SDS DB data transfer to UNF DB, UNF DB Cleanup
  John Turner (UNF Senior)
    Website implementation, UNF DB Cleanup
NIST
  Bob Hanisch (ODI Director)
  Adam Morey, Peter Linstrom, Don Burgess, Angela Lee
Acknowledgements
Interested in knowing more about chemical data, semantics and knowledge representation?
251st American Chemical Society Meeting
San Diego, CA, USA
March 13-17, 2016
ACS Chemical Informatics Division “CINF Data Summit”
  All five days of the meeting
  Three symposia
    “Tomayto vs. Tomahto: Overcoming Incompatibilities in Scientific Data”
    “Global Initiatives in Research Data Management & Discovery”
    “Chemistry, Data, & the Semantic Web: An Important Triple to Advance Science”
Shameless Plug
[email protected]
Phone: 904-620-5311
Skype: stuartchalk
Twitter: @StuChalk
LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
ORCID: http://orcid.org/0000-0002-0703-7776
ResearcherID: http://www.researcherid.com/rid/D-8577-2013
Questions?