DANS is an institute of KNAW and NWO
Data Archiving and Networked Services
Stop making tools!
Nobody likes them anyway...
Christophe Guéret (@cgueret)
New Trends in eHumanities, 16 April 2015
Data-driven research
[Diagram: New data, Existing data; Data collection; Data cleaning and integration; Data processing; Tool; Happy users]
What kind of tool?
● Could be
– An interactive web site
– An “app” for smart phones
– Stand-alone software
● The goal is always to let users consume the data for their needs
● Actual tooling will depend on the skills and preferences of the team member coding it!
Behaving scholars do a bit more
[Diagram: New data, Existing data; Data collection; Data cleaning and integration; Data processing; Tool; Happy users]
The myths of long-term use
● Data and software sent to a trusted digital repository will for sure be re-used later
● Tools can be maintained after the project and further improved to fit new needs
● If the tool is not being used enough it should be adapted to fit more user needs
In reality
● Data that is not easy to use is not used
● Tools are not maintained once the person who coded them has moved on to other things
● It is not possible to make everyone happy and fit all research questions with one tool
Data re-use: could you do it?
CEDAR is all open on GitHub: data, queries and scripts.
● Usage example:
– Download dumps
– Install triple store
– Load data & wait
– Recursively query for provenance
Data is the important thing
http://redmonk.com/jgovernor/2007/04/05/why-applications-are-like-fish-and-data-is-like-wine/
Data
Tool
Where we're going we don't need “tools”
So what needs to be done?
Do not bake the data into the tool. Instead, build the tool on top of the data, and ensure others can do the same
[Diagram: Data collection; Data cleaning and integration; Data processing; Data exposition; Tool 1, Tool 2, ...]
In fact, do not write any tool
● Focus on exposing the data
– Less time spent coding and less code
– Easier and cheaper to maintain
● To increase availability, expose your data on the Web
● Exposing != Make a package and put it somewhere
The magic keyword 1: “API”
● “In computer programming, an application programming interface (API) is a set of routines, protocols, and tools for building software applications” - Wikipedia
● Regardless of data, all the software you use is a layered cake bound by software APIs
– Presentation software > GUI toolkit > Rendering System > Operating System > Hardware
Example (courtesy of Wikipedia)
● In this code “nextLine” and “close” are part of the API of “Scanner”
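A comparable illustration in Python, offered as a sketch rather than the Wikipedia listing the slide refers to: `readline` and `close` are part of the API of `io.StringIO`, just as “nextLine” and “close” are part of the API of “Scanner”. The helper name `first_line` is made up for this example.

```python
import io


def first_line(text: str) -> str:
    """Return the first line of `text` using only the reader's public API."""
    reader = io.StringIO(text)  # an in-memory, file-like object
    try:
        # readline() is one routine the API exposes; we do not need to
        # know how the object stores or scans its content.
        return reader.readline().rstrip("\n")
    finally:
        # close() is another part of the same API.
        reader.close()


print(first_line("hello\nworld\n"))
```

The calling code depends only on the names the object exposes, not on how it is implemented.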
APIs can be on the Web too
● HTTP can be used as an API too.
● Get a specific record from a database
– http://example.com/api?action=show&id=500
● Delete a record in a database
– http://example.com/api?action=delete&id=500
● But don't do it that way! This is abusing the role of the “GET” method from HTTP
Generic design for tool + API
● Tools consume the data provided by a set of APIs over the Web
● If you are coding tools
– Forget about server-side page rendering
– Learn JavaScript
[Diagram: Data, API, Tool; technologies shown: MySQL, R, ... and HTTP, JSON, ...]
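A minimal sketch of this split, assuming a hypothetical API that returns a JSON object with a `records` list. The names `count_records` and `fake_fetch`, the URL, and the payload shape are all invented for illustration. The tool only needs a way to fetch text from a URL; it knows nothing about the database behind the API.

```python
import json
from typing import Callable


def count_records(fetch: Callable[[str], str], url: str) -> int:
    """A tiny 'tool': it consumes JSON from an API and summarises it."""
    payload = json.loads(fetch(url))
    return len(payload["records"])


def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for an HTTP GET on the API, so that the
    # sketch runs without a network connection.
    return json.dumps({"records": [{"id": 1}, {"id": 2}]})


print(count_records(fake_fetch, "http://example.com/api/records"))
```

Against a live API, `fetch` could be replaced by a function that calls `urllib.request.urlopen(url)` and decodes the response body.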
The magic keyword 2: “REST”
● “Representational State Transfer (REST) is a software architecture style consisting of guidelines and best practices for creating scalable web services” - Wikipedia
● For example: instead of using GET to do a delete just use the DELETE method from HTTP on the target resource
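As a rough sketch of the idea, the function below dispatches on the HTTP method rather than on an `action` parameter. The `/records/<id>` path layout, the in-memory `records` dictionary and the name `handle` are assumptions made for this example; a real service would sit behind a web framework.

```python
from typing import Dict, Optional, Tuple


def handle(method: str, path: str,
           records: Dict[str, dict]) -> Tuple[int, Optional[dict]]:
    """Return (HTTP status code, body) for a request on /records/<id>."""
    prefix = "/records/"
    if not path.startswith(prefix):
        return 404, None
    record_id = path[len(prefix):]
    if record_id not in records:
        return 404, None  # Not Found
    if method == "GET":
        # GET only reads the resource; it never changes the data.
        return 200, records[record_id]
    if method == "DELETE":
        # The method itself carries the intent; no action=delete needed.
        del records[record_id]
        return 204, None  # No Content
    return 405, None  # Method Not Allowed


store = {"500": {"name": "example"}}
print(handle("GET", "/records/500", store))
print(handle("DELETE", "/records/500", store))
print(handle("GET", "/records/500", store))
```

Because reads and deletes use different methods on the same resource URL, a crawler or cache that only issues GET requests cannot delete anything by accident.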
The magic keyword 3: “JSON”
● “JSON (/ˈdʒeɪsən/ JAY-sən), or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs” - Wikipedia
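For a concrete feel of “attribute–value pairs”, here is a small sketch using Python's standard `json` module. The record itself is made up for illustration.

```python
import json

# A hypothetical data object made of attribute-value pairs.
record = {"id": 500, "name": "Example school", "tags": ["primary", "public"]}

# Serialise to human-readable text that can be sent over HTTP.
text = json.dumps(record, indent=2, sort_keys=True)
print(text)

# Parse the text back into a data object on the receiving side.
parsed = json.loads(text)
```

The text form is what travels between the API and the tool; each side converts it to and from its own native data structures.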
A step further with JSON-LD
● JSON-LD is Linked Data expressed in JSON. Let users follow links across datasets
● Example of JSON data that is not JSON-LD
Ok, but what is the API call to get more information about the board?
● Need to figure it out in some way
● With LD you would get a link
Part of the result from http://api.openonderwijsdata.nl/api/v1/get_document/duo/po_school/2013-20YF
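The difference can be sketched with two hypothetical records. Neither is the actual API response, and the `example.org` URLs and field names are invented. The first holds a bare identifier that the user must somehow turn into an API call. The second is JSON-LD in which the board is given as a link. The helper `find_links` is a naive scan over top-level values, not a JSON-LD processor.

```python
# Plain JSON: the board is only an opaque identifier.
plain = {"name": "Example school", "board_id": 12345}

# JSON-LD: the context declares that "board" holds an IRI (a link).
linked = {
    "@context": {
        "name": "http://schema.org/name",
        "board": {"@id": "http://example.org/vocab/board", "@type": "@id"},
    },
    "@id": "http://example.org/schools/1",
    "name": "Example school",
    "board": "http://example.org/boards/12345",
}


def find_links(document: dict) -> list:
    """Return top-level string values that look like HTTP(S) links."""
    return sorted(
        value
        for key, value in document.items()
        if key != "@context"
        and isinstance(value, str)
        and value.startswith(("http://", "https://"))
    )


print(find_links(plain))
print(find_links(linked))
```

With the linked record, a client can simply request the board's URL instead of guessing which API call accepts `12345`.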
Web APIs
● There are a lot of them (> 12k) and their number is increasing rapidly. See: http://www.programmableweb.com/
● Some examples:
– https://dev.twitter.com/rest/public
– http://www.slideshare.net/developers/documentation
– http://developer.rottentomatoes.com/docs
– https://www.flickr.com/services/api/
Bonuses
© All Seeing, Flickr
Give less to share more
● Noticed something about the examples given on the previous slide on Web APIs?
● None of them would give you a copy of their dataset, yet they have an API to let you access the data!
● => APIs enable fine-grained access to data
Monetize a service, not a dataset
● APIs open up the opportunity for monetizing the usage of the data instead of the data itself
● Users can be charged per API call
● Similar “download VS API” approaches
– Paid game VS Free to play
– Music download VS Streaming music
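One way to picture charging per call is a thin wrapper that counts requests per API key. This is a sketch under assumed names (`MeteredApi`, `api_key`, a flat price in cents), not a description of how any of the services mentioned actually bill.

```python
from collections import Counter
from typing import Dict, Optional


class MeteredApi:
    """Serve records and count calls per API key (hypothetical design)."""

    def __init__(self, records: Dict[str, dict], price_per_call_cents: int):
        self._records = records
        self._price = price_per_call_cents
        self._calls = Counter()

    def get(self, api_key: str, record_id: str) -> Optional[dict]:
        # Every call is counted, whether or not the record exists.
        self._calls[api_key] += 1
        return self._records.get(record_id)

    def bill_cents(self, api_key: str) -> int:
        # Usage, not a copy of the dataset, is what gets charged.
        return self._calls[api_key] * self._price


api = MeteredApi({"500": {"name": "example"}}, price_per_call_cents=2)
api.get("alice", "500")
api.get("alice", "missing")
print(api.bill_cents("alice"))
```

The same per-key counter also gives the provider a record of who uses which data and how often.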
Extra technical bonuses
● Most of the processing happens on the client side, so fewer resources are needed to serve the data
● Finer tracking of data usage
● Extra possibilities to do caching, do round-robin, use CDNs etc. => easier to scale
Ending on some more examples...
Facilityregistry.org
The website is the API. No interface of any kind
Nlgis.nl
API and a simple data visualisation tool using it
Lod.cedar-project.nl/cedar
Generic query interface + extra API
To summarise
● When your data is ready to be shared, first make an API for it. This will minimise friction in re-use.
● If you want/need to write an end-user tool, make it use your own API (and others!)
● Plan maintenance for the API to keep it running.