Graph Databases and Web Frameworks (NodeJS, AngularJS, GridFS, OpenLink Virtuoso)


DESCRIPTION

Invited Lecture on NoSQL databases and modern web-development frameworks.

• JavaScript + JSON = easy parsing, less verbose code

• NodeJS = asynchronous everything. Needs precise flow control

• ElasticSearch = Scalable indexing, easy to use JSON API

• GridFS = Transparent scaling for huge numbers of large files; querying using JSON-based API

• Graph Databases = Model certain problems better than their relational counterparts. Simpler queries using SPARQL. Less mature than RDBMSs. No transactions.

• Socket.io = Real-time library for client-server-client push communication


Web frameworks and graph databases

Overview and code demos

João Rocha da Silva

May 2014

joaorosilva@gmail.com

Contents

• Modeling limits of relational databases

• Entities with variable attributes

• Time-variant values

• Inheritance

• Hierarchies (parents of parents of parents…)

Contents (cont’d)

• Modeling problems in a graph

• Ontologies and SPARQL

• OpenLink Virtuoso

• Scalable file storage: GridFS within MongoDB

• Scalable document indexing : ElasticSearch

• NodeJS and asynchronous flow control

• AngularJS for dynamic web interfaces

• BONUS : Socket.io sneak peek


Relational databases

• Good when you know everything about the problem at the time of modeling

• A column can only be of a single type (VARCHAR, int, etc)

• Hard to document

• Model can become too attached to the code

Relational databases

• Handling historical values = complex SQL

• Hierarchies = Foreign Key loops

• Variable attributes, inheritance = NULL-filled columns plus "if" hell, or many JOINs

Relational models

[Figure: one of 78,826 tables and counting, with the callouts “Beautiful, meaningful column names ;-)” and “Even better table names”. Source: SAP]

[Figure: an entity with variable, time-dependent attributes stored in a table. Callouts: fixed attributes, attribute name, value (always varchar), timestamps. Source: CKAN]
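The table layout in the figure (one row per attribute value, every value a string, plus a timestamp) can be sketched in plain JavaScript to show why reading it back is awkward. This is only an illustration: the row shape, the entity names, and the sample values are hypothetical and are not taken from CKAN.

```javascript
// Hypothetical entity-attribute-value rows: every value is stored as a string,
// mirroring the "value (always varchar)" column.
const rows = [
  { entity: "dataset-1", attribute: "title", value: "DCB Base Data", timestamp: "2014-01-01" },
  { entity: "dataset-1", attribute: "specimenWidth", value: "100", timestamp: "2014-01-01" },
  { entity: "dataset-1", attribute: "specimenWidth", value: "120", timestamp: "2014-02-01" },
  { entity: "dataset-2", attribute: "title", value: "Other data", timestamp: "2014-01-15" },
];

// Rebuild one entity as it looked on a given date: for each attribute, keep
// the most recent row whose timestamp is not after that date.
// ISO dates (YYYY-MM-DD) compare correctly as plain strings.
function entityAsOf(allRows, entityId, date) {
  const latest = {};
  for (const row of allRows) {
    if (row.entity !== entityId || row.timestamp > date) continue;
    const seen = latest[row.attribute];
    if (!seen || row.timestamp >= seen.timestamp) latest[row.attribute] = row;
  }
  const result = {};
  for (const [attribute, row] of Object.entries(latest)) result[attribute] = row.value;
  return result;
}

console.log(entityAsOf(rows, "dataset-1", "2014-01-15"));
console.log(entityAsOf(rows, "dataset-1", "2014-03-01"));
```

In SQL the same "latest value per attribute" logic typically needs a self-join or a grouped subquery, which is the complexity the previous slides point at.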

Graph models

Graph databases

• Represent entities (Users, Products, Places…) as vertices (entity types are called classes)

• Connections between them are directed graph edges (edge types are called properties)

• The meaning of these connections is expressed in ontologies that can be shared and reused

Representing a person using ontologies

[Diagram: the vertex http://www.fe.up.pt/~pro11004 with three outgoing edges]

• foaf:name → “João Rocha”

• rdf:type → up:PhDStudent

• org:memberOf → http://www.fe.up.pt/

Ontologies referenced: http://www.w3.org/TR/rdf-schema/ and http://www.foaf-project.org/
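One way to read the diagram is as three subject-predicate-object triples. The sketch below holds them in a plain JavaScript array and collects the outgoing edges of a vertex; the pairing of each property with its value is an interpretation of the figure, and the `describe` helper is hypothetical.

```javascript
// The person from the figure as subject-predicate-object triples.
// Prefixed names (foaf:name, rdf:type, ...) are kept as plain strings here.
const person = "http://www.fe.up.pt/~pro11004";
const triples = [
  [person, "foaf:name", "João Rocha"],
  [person, "rdf:type", "up:PhDStudent"],
  [person, "org:memberOf", "http://www.fe.up.pt/"],
];

// Collect the outgoing edges of a vertex as a property -> values map.
// A property can repeat, so each one maps to an array of values.
function describe(allTriples, subject) {
  const description = {};
  for (const [s, p, o] of allTriples) {
    if (s !== subject) continue;
    (description[p] = description[p] || []).push(o);
  }
  return description;
}

console.log(describe(triples, person));
```

Adding a new attribute to the person means appending one more triple; no schema change is involved.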

Getting all the students

SELECT ?uri ?attribute ?value
FROM <http://myorganization.com/data>
WHERE {
  ?uri rdf:type up:Student .
  ?uri ?attribute ?value
}

• Will fetch all the students, regardless of their type

• Will also return their attributes (“database columns”)

• Different types of students will have different attributes
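A minimal sketch of how such a query could be prepared for a SPARQL endpoint from NodeJS. It uses rdf:type, the property RDF defines for class membership, and sends the query in the `query` parameter of a GET request as the SPARQL 1.1 Protocol allows. The namespace IRI for `up:` and the endpoint address are assumptions; the slides give neither.

```javascript
// Hypothetical namespace for the "up:" prefix; the slides do not give its IRI.
const UP_NAMESPACE = "http://example.org/up#";

// Build the "all students and their attributes" query for a given graph.
function buildStudentQuery(graphUri) {
  return [
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
    `PREFIX up: <${UP_NAMESPACE}>`,
    "SELECT ?uri ?attribute ?value",
    `FROM <${graphUri}>`,
    "WHERE {",
    "  ?uri rdf:type up:Student .",
    "  ?uri ?attribute ?value",
    "}",
  ].join("\n");
}

// SPARQL 1.1 Protocol: a query can be sent via GET in the "query" parameter.
// URL and its searchParams take care of percent-encoding.
function buildRequestUrl(endpoint, query) {
  const url = new URL(endpoint);
  url.searchParams.set("query", query);
  return url.toString();
}

const query = buildStudentQuery("http://myorganization.com/data");
// Assumed endpoint location; adjust to the actual server.
const requestUrl = buildRequestUrl("http://localhost:8890/sparql", query);
console.log(query);
console.log(requestUrl);
```

The resulting URL can be fetched with any HTTP client; asking for `application/sparql-results+json` in the `Accept` header returns the bindings as JSON, which fits the JavaScript side naturally.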

Inference

• Transitive Properties (subclass of subclass…)

• Subclasses

• Multiple Inheritance Handling (Student + Researcher + ScholarshipHolder)

Saves coding time spent writing complex queries
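To make the idea concrete, here is a small in-memory sketch of subclass inference: following subclass links transitively, including multiple inheritance. A graph database with inference enabled does this work for you; the class hierarchy below is hypothetical and only loosely based on the class names used in these slides.

```javascript
// Hypothetical class hierarchy: each class lists its direct superclasses.
const subClassOf = {
  "up:PhDStudent": ["up:Student", "up:Researcher"], // multiple inheritance
  "up:MScStudent": ["up:Student"],
  "up:Student": ["foaf:Person"],
  "up:Researcher": ["foaf:Person"],
};

// All superclasses reachable by following subClassOf transitively.
function superClasses(cls, hierarchy) {
  const found = new Set();
  const pending = [cls];
  while (pending.length > 0) {
    const current = pending.pop();
    for (const parent of hierarchy[current] || []) {
      if (!found.has(parent)) {
        found.add(parent);
        pending.push(parent);
      }
    }
  }
  return found;
}

// An instance belongs to its asserted class and to every inferred superclass.
function isInstanceOf(assertedClass, targetClass, hierarchy) {
  return assertedClass === targetClass || superClasses(assertedClass, hierarchy).has(targetClass);
}

console.log([...superClasses("up:PhDStudent", subClassOf)]);
console.log(isInstanceOf("up:PhDStudent", "up:Student", subClassOf));
```

This is why a query for up:Student can also return PhD students without the query naming every subtype.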

Nothing comes for free

• Aggregation operators are slow

• Transactions are not supported in standard SPARQL

• (“SPARQL 1.1 Query/Update Services should be atomic but that they are not required to be atomic.”)

• Graph DBMS Solutions are in early stages (many bugs, many “beta”s, many mailing lists…)

An example application

Dendro (dendro-dev.fe.up.pt:3001)

• Dropbox and File/Folder description platform

• Variable descriptions

• Time-dependent values

• Directory structures (hierarchy)

• Need for simple querying…

[Diagram: revision creation, with regions labelled “Current dataset version” and “Past Revisions”. Nodes: Pn, Fn, Dn, Dn-1, Un. Edges: nie:isLogicalPartOf, dc:isReferencedBy, dc:isVersionOf, dc:title, dcb:specimenWidth, dcb:initialCrackLength, dc:created, dc:modified, dc:creator. Literals: “DCB Base Data”, 120, 280mm, 01/01/2014^^xsd:date. Annotations: added property instance, changed modification timestamp, revision creation timestamp.]

Change recording

[Diagram: a change node C. Edges: ddr:pertainsTo, ddr:changedDescriptor (ddr:initialCrackLength), ddr:operation (“add”). Literal: “DCB Base Data”.]
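The change-recording idea can be sketched as a comparison between two revisions of a resource's descriptors that emits one record per difference. This is an illustration under assumptions, not Dendro's actual code: the record shape and the function name are hypothetical, only the "add" operation appears in the diagram, and the sample values reuse literals from the diagrams without claiming which node they belong to.

```javascript
// Compare two revisions of a resource's descriptors and emit one change
// record per difference. Only "add" appears on the slide; "edit" and
// "delete" are assumed names for the other cases.
function recordChanges(previous, current) {
  const changes = [];
  for (const [descriptor, value] of Object.entries(current)) {
    if (!(descriptor in previous)) {
      changes.push({ changedDescriptor: descriptor, operation: "add", newValue: value });
    } else if (previous[descriptor] !== value) {
      changes.push({ changedDescriptor: descriptor, operation: "edit", newValue: value });
    }
  }
  for (const descriptor of Object.keys(previous)) {
    if (!(descriptor in current)) {
      changes.push({ changedDescriptor: descriptor, operation: "delete" });
    }
  }
  return changes;
}

// Illustrative revisions: the newer one gains a single descriptor.
const previousRevision = { "dc:title": "DCB Base Data", "dcb:specimenWidth": "120" };
const currentVersion = { ...previousRevision, "dcb:initialCrackLength": "280mm" };
console.log(recordChanges(previousRevision, currentVersion));
```

In a graph store each record would become a change node linked to the revision it pertains to, so the history can be queried like any other data.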

• Socket.io: real-time events

• NodeJS: business logic

• AngularJS: dynamic interfaces à la Google Docs

• GridFS: files

• OpenLink Virtuoso: database

• ElasticSearch: free-text search

Code Demos

NodeJS (Dendro) http://192.168.5.75:3001

GridFS http://192.168.5.75:27017

OpenLink Virtuoso http://192.168.5.75:8890

ElasticSearch http://192.168.5.75:9200/_plugin/head/

Socket.io (BattleBits) http://localhost:3000

Conclusions

• JavaScript + JSON = easy parsing, less verbose code

• NodeJS = asynchronous everything. Needs precise flow control

• ElasticSearch = Scalable indexing, easy to use JSON API

• GridFS = Transparent scaling for huge numbers of large files; querying using JSON-based API

• Graph Databases = Model certain problems better than their relational counterparts. Simpler queries using SPARQL. Less mature than RDBMSs. No transactions.

• Socket.io = Real-time library for client-server-client push communication
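The point about flow control can be illustrated with a short sketch. Because every I/O call in NodeJS returns before its result is ready, the code has to state explicitly whether steps run one after another or all at once. The example uses Promises and async/await, which are newer than this talk (callback-based helpers were the norm at the time); `fakeTask` is a hypothetical stand-in for a real database or file call.

```javascript
// A stand-in for an asynchronous call (database query, file read, ...).
// It records its name in `log` when it completes.
function fakeTask(name, delayMs, log) {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(name);
      resolve(name);
    }, delayMs);
  });
}

// Run task factories one after another; each starts only when the previous finished.
async function runInSeries(taskFactories) {
  const results = [];
  for (const makeTask of taskFactories) {
    results.push(await makeTask());
  }
  return results;
}

// Start every task at once and wait for all of them.
function runInParallel(taskFactories) {
  return Promise.all(taskFactories.map((makeTask) => makeTask()));
}

async function demo() {
  const seriesLog = [];
  await runInSeries([
    () => fakeTask("slow", 30, seriesLog),
    () => fakeTask("fast", 5, seriesLog),
  ]);
  const parallelLog = [];
  await runInParallel([
    () => fakeTask("slow", 30, parallelLog),
    () => fakeTask("fast", 5, parallelLog),
  ]);
  return { seriesLog, parallelLog };
}

demo().then((logs) => console.log(logs));
```

In series the tasks complete in the order they were listed; in parallel the faster task completes first, although `Promise.all` still returns the results in list order. Choosing the wrong mode is a common source of subtle bugs, for example indexing a file before it has finished uploading.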

João Rocha da Silva is an Informatics Engineering PhD student at the Faculty of Engineering of the University of Porto. He specializes in research data management, applying the latest Semantic Web technologies to the adequate preservation and discovery of research data assets.

He is experienced in many programming languages (JavaScript with Node, PHP with MVC frameworks, Ruby on Rails, J2EE, etc.) running on the major operating systems (everyday Mac user). Regardless of language, he is a quick learner who can adapt to any new technology quickly and effectively.

He is also an experienced freelance iOS developer with several apps published on the App Store, and a self-taught DIY mechanic with a special interest in classic cars, particularly his 1987 Toyota Corolla GT Twin Cam, also known as Hachi-Roku or AE86.

Research Data Management and Semantic Web Researcher, Web & iPhone Developer

João Rocha da Silva

joaorosilva@gmail.com