Reactions to The Open Spectral Database
http://osdb.info
Stuart J. Chalk, Department of Chemistry, University of North Florida
Instigator: Tony Williams
SCTY 28 – Pacifichem 2015
What would Jean-Claude Bradley have wanted?
Share and Reuse Research Data!
How Do You Make Everything Open?
JCAMP Implementation
The Open Spectral Database
Data Model
Live Demo (fingers crossed)
Future Plans
Conclusion
Outline
What Would JCB Have Wanted?
Simple: Openness as the norm not the exception
Data made available, without restriction, so it's useful
Mechanisms/tools to make data available
Formats to allow others to get the data…
…but also so it's easy to use
Annotated data to make it easy to find
Community-driven promotion of and action on these issues
Ryan P. Womack (2015) Research Data in Core Journals in Biology, Chemistry, Mathematics, and Physics. PLoS ONE 10(12): e0143460. doi:10.1371/journal.pone.0143460
Share and Reuse Research Data!
You have to know/define what “everything” means
Open Data
Open Data Model
Open and useable data structures
Open Code
Open to input from the community on all aspects
Open to add, extend, change, and rethink all of this
How Do You Make Everything Open?
Spectral data – There are many formats but only one open and generally accepted standard – JCAMP
It's not perfect…
…but it's an output format people can share
Let's extract the data and metadata, and infer as much as possible, from JCAMP files
Not as easy as it seems…
First Attempt
Great data exchange format, however…
…not meant to be computer input…
…more a way to get data out so a human can process
Missing parameters (metadata)
Missing data
Incorrect values
Extra data
Incorrectly compressed
Challenges with JCAMP
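The kinds of problems listed above can be caught with simple checks once a file's labels and ordinate values have been read. The following is a minimal Python sketch, not the OSDB code (the site itself is written in PHP). The function name `check_spectrum`, the message wording, and the choice of `REQUIRED` labels are assumptions made for illustration; the label list is only a subset of what a real validator would require.

```python
# Illustrative subset of labels a spectrum file is expected to carry (assumption).
# Labels are written in normalized form: upper case, no spaces or dashes.
REQUIRED = ("TITLE", "JCAMPDX", "DATATYPE", "XUNITS", "YUNITS",
            "FIRSTX", "LASTX", "NPOINTS")


def check_spectrum(ldrs, y_values):
    """Return a list of human-readable problems found in one parsed spectrum.

    ldrs     -- dict mapping normalized labels to their string values
    y_values -- list of ordinate values read from the data table
    """
    problems = []

    # Missing parameters (metadata)
    for label in REQUIRED:
        if not ldrs.get(label, "").strip():
            problems.append(f"missing parameter: {label}")

    # Incorrect values: NPOINTS must be an integer
    npoints = None
    if ldrs.get("NPOINTS", "").strip():
        try:
            npoints = int(ldrs["NPOINTS"])
        except ValueError:
            problems.append("incorrect value: NPOINTS is not an integer")

    # Missing or extra data: compare the declared and actual point counts
    if npoints is not None:
        found = len(y_values)
        if found < npoints:
            problems.append(f"missing data: expected {npoints} points, found {found}")
        elif found > npoints:
            problems.append(f"extra data: expected {npoints} points, found {found}")

    return problems
```

Returning a list of problems, rather than raising on the first one, lets an upload report every issue in a file at once.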
Upload JCAMP spectra
Data and metadata extracted
Organize metadata so it can be used to find data
Use REST based website and API to make data available and allow searching – document API
Make the website available as a project on GitHub and invite the community to get involved
The Open Spectral Database
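To illustrate how organized metadata makes spectra findable, here is a small in-memory Python sketch. It is not the OSDB implementation, which uses MySQL behind a PHP REST API. The class `SpectrumIndex`, the field names, and the spectrum identifiers are all hypothetical.

```python
import json
from collections import defaultdict


class SpectrumIndex:
    """Toy metadata index: maps (field, value) pairs to spectrum ids."""

    def __init__(self):
        self._records = {}              # spectrum id -> metadata dict
        self._index = defaultdict(set)  # (field, value) -> set of ids

    def add(self, spectrum_id, metadata):
        """Store one spectrum's metadata and index every field."""
        self._records[spectrum_id] = metadata
        for field, value in metadata.items():
            self._index[(field.lower(), str(value).lower())].add(spectrum_id)

    def search(self, **criteria):
        """Return a JSON list of records matching all criteria (case-insensitive)."""
        ids = None
        for field, value in criteria.items():
            hits = self._index.get((field.lower(), str(value).lower()), set())
            ids = hits if ids is None else ids & hits
        if ids is None:                 # no criteria: return everything
            ids = set(self._records)
        return json.dumps([{"id": i, **self._records[i]} for i in sorted(ids)])


# Hypothetical records, for illustration only
index = SpectrumIndex()
index.add("s1", {"technique": "IR", "compound": "ethanol"})
index.add("s2", {"technique": "NMR", "compound": "ethanol"})
print(index.search(compound="Ethanol", technique="ir"))
```

Returning JSON from `search` mirrors what a REST endpoint would send back to a client, so the same records can feed both a web page and other applications.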
Apache 2.4 (http://httpd.apache.org)
PHP 5.6 (http://www.php.org)
CakePHP 2.7 PHP Framework (http://cakephp.org)
MySQL 5.5 (http://www.mysql.com)
jQuery (JavaScript) (http://jquery.com)
Flot for jQuery (http://www.flotcharts.org)
JSmol (http://jmol.org)
Bootstrap CSS (http://getbootstrap)
eXtensible Markup Language (http://www.w3.org/TR/xml/)
JavaScript Object Notation (JSON) (http://json.org)
JSON for Linked Data (JSON-LD) (http://www.w3.org/TR/json-ld/)
Technology
JCAMP file is imported into PHP as an array, then
Clean
Uncomment ($$)
Separate
Labeled Data Records (LDRs)
Parameters (##.)
User Defined Labels (##$)
Validate
Standardize
Decompress
Convert to output format or store in database
Ingestion Process
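The first steps of the ingestion process (clean, strip comments, separate the three kinds of labels) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the site's PHP code: the function names `normalize_label` and `split_jcamp` and the sample file are made up, and the validate, standardize, and decompress steps are left out.

```python
import re


def normalize_label(label):
    """Normalize a label: drop spaces, dashes, slashes, underscores; upper-case."""
    return re.sub(r"[\s\-/_]", "", label).upper()


def split_jcamp(text):
    """Split JCAMP-DX text into (core LDRs, ##. parameters, ##$ user labels)."""
    core, params, user = {}, {}, {}
    current = None  # (dict, key) of the record that continuation lines belong to
    for raw in text.splitlines():
        line = raw.split("$$", 1)[0].strip()  # remove $$ comments and padding
        if not line:
            continue                          # clean: skip empty lines
        if line.startswith("##"):
            label, _, value = line[2:].partition("=")
            if not label.strip():             # "##=" is a comment record
                current = None
                continue
            if label.startswith("."):
                target, key = params, normalize_label(label[1:])
            elif label.startswith("$"):
                target, key = user, normalize_label(label[1:])
            else:
                target, key = core, normalize_label(label)
            target[key] = value.strip()
            current = (target, key)
        elif current is not None:
            # Continuation line (for example, rows of a data table)
            target, key = current
            target[key] = (target[key] + "\n" + line).strip()
    return core, params, user


# Made-up sample file, for illustration only
sample = """##TITLE= Example spectrum $$ hypothetical
##JCAMP-DX= 4.24
##DATA TYPE= INFRARED SPECTRUM
##.OBSERVE FREQUENCY= 400.0
##$INSTRUMENT NOTE= made up
##XYDATA= (X++(Y..Y))
400 1.0 2.0
402 3.0 4.0
##END=
"""
core, params, user = split_jcamp(sample)
print(core["DATATYPE"], params, user)
```

Keeping the three label groups in separate dictionaries makes it straightforward to validate the standard labels strictly while passing user defined labels through untouched.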
In order to organize the data and metadata it is distributed across a number of tables in the database
This is a generic science data model that is being used for multiple projects
Not limited to spectra or even just Chemistry data
Data Model
Data Model
File upload
Export formats
Search API
Live Demo
Semantic Annotation
Enthusiastic Feedback with constructive comments…
Spectral list is boring; needs molecules linked to spectra
Less metadata on the spectral page with option to see more
Revise homepage to make it more inviting
Reactions to Alpha Version
Again Enthusiastic…
“Love the layout! Very clean…”
“Nice Work!” (Twitter comment)
… with constructive comments
Needs a zoom spectra feature
Clicking on spectrum provides data that is not useful
Maybe you could use JSpecView rather than Flot?
Reactions to Beta Version
Handle more complicated JCAMP files
Handle file formats other than JCAMP
Export in AnIML format
Expand the API
Improve Flot viewer functionality (e.g. zoom)
Add JSpecView spectral viewer
Endpoint summary page
Document the website (GitHub)
Document how to contribute to the website (GitHub)
Solicit feature requests and encourage contributions
Things To Do
Take Home
The OSD is open for the community to develop and implement ideas about open spectral data re:
Data Model
API features
Export Formats
Services
Community Involvement!
Use as a data source for other applications
Submission of feature requests
Participation as code contributor
Phone: 904-620-5311
Skype: stuartchalk
Twitter: @StuChalk
LinkedIn/SlideShare: https://www.linkedin.com/in/stuchalk
ORCID: http://orcid.org/0000-0002-0703-7776
ResearcherID: http://www.researcherid.com/rid/D-8577-2013
Questions?