DataGraft Platform: RDF Database-as-a-Service
Open Data Workshop @ Oslo
July 2nd, 2015
http://dapaas.eu/
Marin Dimitrov, Ontotext (Bulgaria)
The Role of an RDF DBaaS
[Diagram components: Grafter, Grafterizer, RDF DBaaS, Open Data Portal]
• Transform tabular data into RDF
• Publish (Linked) data services, instead of static datasets
• Lower-cost & easier data publishing process
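A minimal sketch of what "transform tabular data into RDF" can mean in practice, assuming the simplest possible mapping: one subject per row, one predicate per column, every cell a plain literal. This is not Grafter's or Grafterizer's API; the `http://example.org/` base URI and both function names are hypothetical.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(value: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text: str, base_uri: str = "http://example.org/") -> list:
    """Turn each non-empty CSV cell into one triple: row IRI, column IRI, cell value."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base_uri}row/{index}>"  # hypothetical IRI scheme
        for column, value in row.items():
            # Skip surplus cells (column is None) and empty or missing values.
            if column is None or not value:
                continue
            predicate = f"<{base_uri}column/{quote(column.strip(), safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


if __name__ == "__main__":
    for line in rows_to_ntriples("name,value\nA,1\nB,\n"):
        print(line)
```

A real transformation would normally map columns to terms from existing vocabularies and assign datatypes, rather than minting one predicate per column header.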
RDF DBaaS Requirements
• Elastic
– dynamically adapt to growing data & query volumes
• High availability & resilience
– no single points of failure (SPOFs), “graceful degradation” upon failures
• Cost efficient
• Host a large number of data services (databases)
– each probably with low to moderate data & query volume
• Security & isolation of the multi-tenant databases
Not easy to achieve all three (elasticity, high availability, cost efficiency)!
Cloud Architecture (AWS)
• AWS based
– Network storage, compute & auto-scaling, load balancing, integration services, …
• Ontotext GraphDB as the RDF DB engine
– OpenRDF REST API
• Docker for containerisation
• An RDF DBaaS is…
– A GraphDB instance…
– Running within a Docker container…
– Storing its data on a private NAS (EBS) volume
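The "one database = one GraphDB instance in one container on one private volume" model can be sketched as the `docker run` invocation a data node might issue per tenant. This is an illustration only: the image name, data directory, container port and the `TenantDatabase` type are all hypothetical, and the slides do not say how DataGraft actually starts its containers.

```python
from dataclasses import dataclass


@dataclass
class TenantDatabase:
    """One tenant database, as described on the slide (all fields hypothetical)."""
    name: str         # container name, one per database
    volume_path: str  # host mount point of the tenant's private EBS volume
    host_port: int    # port exposed on the data node for this database


def docker_run_command(
    db: TenantDatabase,
    image: str = "example/graphdb:latest",  # hypothetical image name
    data_dir: str = "/opt/graphdb/data",    # hypothetical path inside the container
    container_port: int = 8080,             # hypothetical GraphDB HTTP port
) -> list:
    """Build the argv for starting one isolated GraphDB container."""
    return [
        "docker", "run", "-d",
        "--name", db.name,
        "-v", f"{db.volume_path}:{data_dir}",       # private volume: data isolation
        "-p", f"{db.host_port}:{container_port}",   # one published port per tenant
        image,
    ]


if __name__ == "__main__":
    db = TenantDatabase(name="tenant-42", volume_path="/mnt/ebs/tenant-42", host_port=9042)
    print(" ".join(docker_run_command(db)))
```

Keeping each database in its own container with its own volume is what gives the multi-tenant isolation listed in the requirements: a tenant's data never shares a process or a disk with another tenant's.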
Evaluation
• Elastic
– Routing nodes, data nodes + NAS storage grow as usage grows
• High availability & resilience
– Strategies for dealing with failures in data, routing, Coordinator nodes
• Cost efficient
– Cloud native architecture -> cost savings
– Multi-tenant model -> cost savings
– Elastic: return under-utilised or unused resources back to CSP
OpenRDF REST API
Resource                                          Operations              Comments
/repositories                                     GET                     Get info on DB repos
/repositories/<REPOSITORY>                        GET, POST, PUT, DELETE  Create*, delete, query a repository
/repositories/<REPOSITORY>/size                   GET                     Get the number of triples in a repository
/repositories/<REPOSITORY>/statements             GET, POST, PUT, DELETE  Add, read, update, delete statements
/repositories/<REPOSITORY>/rdf-graphs/<GRAPH>     GET, POST, PUT, DELETE  Same as above
/settings                                         GET, PUT                Configure the DBaaS*
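A short sketch of how a client might address two of the resources listed above, assuming a hypothetical service root `https://dbaas.example.org` and no authentication (the slides do not describe how requests are authenticated). The helpers only build `urllib.request.Request` objects; nothing is sent until one is passed to `urllib.request.urlopen`.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://dbaas.example.org"  # hypothetical service root


def repository_url(repository: str, suffix: str = "", base_url: str = BASE_URL) -> str:
    """Build /repositories/<REPOSITORY><suffix>, percent-encoding the repository id."""
    return f"{base_url.rstrip('/')}/repositories/{quote(repository, safe='')}{suffix}"


def size_request(repository: str, base_url: str = BASE_URL) -> Request:
    """GET /repositories/<REPOSITORY>/size: number of triples in the repository."""
    return Request(repository_url(repository, "/size", base_url), method="GET")


def add_statements_request(repository: str, turtle: str, base_url: str = BASE_URL) -> Request:
    """POST /repositories/<REPOSITORY>/statements: add the given Turtle data."""
    return Request(
        repository_url(repository, "/statements", base_url),
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="POST",
    )


if __name__ == "__main__":
    req = add_statements_request(
        "demo", '<http://example.org/s> <http://example.org/p> "o" .'
    )
    print(req.get_method(), req.full_url)
```

In the OpenRDF protocol, POST on `/statements` adds to the existing data while PUT replaces it, so the choice of verb matters when republishing a dataset.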
Standards Compliant
• Standard SPARQL endpoint, Linked Data endpoint
• Variety of 3rd party tools can be used to query, explore or visualise Open (Linked) Data
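Because the endpoint follows the SPARQL standards, a generic client is enough to query it. The sketch below, which assumes a hypothetical endpoint URL, builds a SPARQL 1.1 Protocol query request (the `query` parameter on a GET) and flattens a response in the standard SPARQL JSON results format. Function names are illustrative, not part of any DataGraft client.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def sparql_select_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


def parse_bindings(results_json: str) -> list:
    """Flatten SPARQL JSON results into one {variable: value} dict per solution."""
    document = json.loads(results_json)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in document["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Hypothetical endpoint; pass the request to urllib.request.urlopen to send it.
    request = sparql_select_request(
        "https://dbaas.example.org/repositories/demo",
        "SELECT * WHERE { ?s ?p ?o } LIMIT 10",
    )
    print(request.full_url)
```

The flattening drops each binding's `type` and any datatype or language tag, which is fine for display but loses information a stricter client would keep.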
Benefits
• Enables live data services instead of static data files
• Data publishers don’t need to worry about infrastructure (databases, availability, cloud)
• Developers get reliable access to data services and simple APIs, and can use various 3rd party tools