OrientDB - Time Series and Event Sequences - Codemotion Milan 2014

Time flows, my friend

Luigi Dell’Aquila
Orient Technologies LTD
Twitter: @ldellaquila

Managing event sequences and time series with a Document-Graph Database

Codemotion Milan 2014

Time What…?

Time series: A time series is a sequence of data points, typically consisting of successive measurements made over a time interval (Wikipedia)

Time What…?

Event sequences:

• A set of events with a timestamp
• A set of relationships “happened before/after”
• Cause and effect relationships

Time What…?

Time as a dimension:

• Direct:
  – Eg. begin and end of relationships (I’m a friend of John since…)
• Calculated
  – Eg. Speed (distance/time)

Time What…?

Time as a constraint:

• Query execution time!

The problem: Fast and Effective

Fast and Effective

Fast write: Time doesn’t wait! Writes just arrive

Fast read: a lot of data to be read in a short time

Effective manipulation: complex operations like
- Aggregation
- Prediction
- Analysis

Current approaches

0. Relational approach: table

Timestamp            Value
2014:11:21 14:35:00  1321
2014:11:21 14:35:01  2444
2014:11:21 14:35:02  2135
2014:11:21 14:35:03  1833

Current approaches

0. Relational approach: table

HH  MM  SS  Value
14  35  0   1321
14  35  1   2444
14  35  2   2135
14  35  3   1833

Current approaches

0. Relational – Advantages

• Simple
• It can be used together with your application data (operational)

Current approaches

0. Relational – Disadvantages

• Slow read (relies on an index)
• Slow insert (update the index…)

Current approaches

1. Document Database

• Collections of Documents instead of tables
• Schemaless
• Complex data structures

Current approaches

1. Document approach: Minute Based

{
  timestamp: "2014-11-21 12.05",
  load: [10, 15, 3, … 30] // array of 60, one per second
}

Current approaches

1. Document approach: Hour Based

{
  timestamp: "2014-11-21 12.00",
  load: {
    0: [10, 15, 3, … 30], // array of 60, one per second
    1: [0, 12, 31, … 24],
    …
    59: [10, 10, 1, … 16]
  }
}
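The hour-based layout can be illustrated with a small in-memory sketch. It is not tied to any database driver: the `hourBuckets` store, the `recordSample` name and the UTC-based key are assumptions made only for this example.

```javascript
// Hypothetical in-memory store: one document per hour, keyed by "YYYY-MM-DD HH".
const hourBuckets = new Map();

// Record one per-second sample in the hour-based layout:
// { timestamp, load: { <minute>: [60 values, one per second] } }
function recordSample(date, value) {
  // e.g. "2014-11-21T12:05:03.000Z" -> "2014-11-21 12"
  const key = date.toISOString().slice(0, 13).replace("T", " ");
  let doc = hourBuckets.get(key);
  if (!doc) {
    // First sample of the hour: this is the single "insert".
    doc = { timestamp: key + ":00", load: {} };
    hourBuckets.set(key, doc);
  }
  const minute = date.getUTCMinutes();
  const second = date.getUTCSeconds();
  if (!doc.load[minute]) {
    // Pre-allocate the 60 per-second slots of this minute.
    doc.load[minute] = new Array(60).fill(null);
  }
  // Every later sample is an in-place "update" of the same document.
  doc.load[minute][second] = value;
  return doc;
}
```

All samples of the same hour land in one document, which is why a whole hour can be fetched with a single read.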

Current approaches

1. Document approach – Advantages

• Fast write: one insert per time window, then in-place updates (eg. 1 insert + 59 updates for a minute-based document)
• Fast fetch

Current approaches

1. Document approach – Disadvantages

• Fixed time windows
• Single point per unit
• How to pre-aggregate?
• Relationships with the rest of the world?
• Relationships between events?

Current approaches

2. Graph Database

• Nodes/Edges instead of tables
• Index free adjacency
• Fast traversal
• Dynamic structure

Current approaches

2. Graph approach: linked sequence

[Figure: event vertices (… e3, e4, e5) linked in sequence by “next” edges; the timestamp is stored on the vertex]

Current approaches

2. Graph approach: linked sequence (tag based)

[Figure: event vertices labelled with tag lists ([Tag1, Tag2], [Tag1], [Tag2]) and linked by per-tag edges “nextTag1” and “nextTag2”]

Current approaches

2. Graph approach: Hierarchy

[Figure: hierarchy of time vertices, with a Minutes level (… 2 … 60) above a Seconds level]

Current approaches

2. Graph approach: mixed

[Figure: Minutes level (… 2 … 60) and Seconds level, mixed approach]

Current approaches

2. Graph approach – Advantages

• Flexible
• Events can be connected together in different ways
• You can connect events to other entities
• Fast traversal of dynamic time windows
• Fast aggregation (based on hierarchy)

Current approaches

2. Graph approach – Disadvantages

• Slow writes (vertex + edge + maintenance)
• Not so fast reads

Can we mix different models and get all the advantages?

Can we mix all this with the rest of application logic?

Multi-Model!

• Document database (schema-free, complex properties)
• Graph database (index-free adjacency, fast traversal)
• SQL (extended)
• Operational (schema - ACID)
• OO concepts (Classes, inheritance, polymorphism)
• REST/JSON interface
• Native Javascript (extend query language, expose services, event hooks)
• Distributed (Multi-master replica/sharding) architecture

OrientDB

First step: put them together

[Figure: Minutes hierarchy (… 2 … 60); a minute vertex holds a document of per-second values { 0: 1000, 1: 1500, … 59: 96 }]

OrientDB


Document <- IT’S A VERTEX TOO!!!

OrientDB

Hours…

{
  0: { 0: 1000, 1: 1500, … 59: 210 },
  1: { … },
  …
  59: { … }
}

Document

Where should I stop?

It depends on my domain and requirements.

OrientDB

Result:
• Same insert speed as the Document approach
• But with the flexibility of a Graph
• (as a side effect of mixing models, documents can also contain “pointers” to other elements of the app domain)

OrientDB

Second step: Pre-aggregate

[Figure: Minutes hierarchy (… 2 … 60); a minute vertex holds a document of per-second values { 0: 1000, 1: 1500, … 59: 96 }]


OrientDB

How to aggregate

Hooks: Server side triggers (Java or Javascript), executed when DB operations happen (eg. Insert or update)

Java interface:

public RESULT onBeforeInsert(…);
public void onAfterInsert(…);
public RESULT onBeforeUpdate(…);
public void onAfterUpdate(…);

OrientDB

Aggregation logic

• Second 0 -> insert
• Second 1 -> update
• …
• Second 57 -> update
• Second 58 -> update
• Second 59 -> update + aggregate
  – Write aggregate value on minute vertex
• Minute == 59? Calculate aggregate on hour vertex
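The aggregation steps above can be sketched in plain JavaScript. This is only an illustration of the logic, not the OrientDB hook API: the `onSampleWritten` name and the vertex shapes (`minutes`, `values`, `sum`) are assumptions made for the example.

```javascript
// Hypothetical vertex shapes (not an OrientDB API):
//   hour   = { minutes: { <minute>: minuteVertex }, sum: null }
//   minute = { values: { <second>: number }, sum: null }
// "sum" plays the role of the pre-aggregated value.
function onSampleWritten(hour, minute, second, value) {
  if (!hour.minutes[minute]) {
    // First sample of the minute -> insert the minute vertex.
    hour.minutes[minute] = { values: {}, sum: null };
  }
  const minuteVertex = hour.minutes[minute];
  // Following samples -> update the same vertex.
  minuteVertex.values[second] = value;

  if (second === 59) {
    // Last second of the minute: write the aggregate on the minute vertex.
    minuteVertex.sum = Object.values(minuteVertex.values)
      .reduce((acc, v) => acc + v, 0);

    if (minute === 59) {
      // Last minute of the hour: aggregate the minute sums on the hour vertex.
      hour.sum = Object.values(hour.minutes)
        .reduce((acc, m) => acc + (m.sum === null ? 0 : m.sum), 0);
    }
  }
  return hour;
}
```

Until second 59 arrives the minute keeps `sum = null`, which marks it as incomplete for readers.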

OrientDB

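[Figure: Minutes hierarchy with pre-aggregated sums on the vertices (sum = 1000, sum = 15000, sum = 300); complete vertices carry a sum, the incomplete one has sum = null and still holds its raw values (1: 12, … 59: 3)]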

OrientDB

Query logic:
• Traverse from root node to specified level (filtering based on vertex data)
• Is there aggregate value?
  – Yes: return it
  – No: go one level down and do the same

Aggregation on a level will be VERY fast if you have horizontal edges!
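The query logic can be illustrated with a short recursive sketch over an in-memory tree. The vertex shape (`children`, `value`) and the `aggregateOf` name are assumptions for this example; only `aggregateVal` follows the property name used in the deck's query.

```javascript
// Hypothetical vertex shape:
//   { aggregateVal: number | null, value?: number, children: [vertex, ...] }
function aggregateOf(vertex) {
  if (vertex.aggregateVal !== null && vertex.aggregateVal !== undefined) {
    // A pre-aggregated value exists: return it without going deeper.
    return vertex.aggregateVal;
  }
  if (!vertex.children || vertex.children.length === 0) {
    // Leaf without an aggregate: fall back to its raw value, if any.
    return typeof vertex.value === "number" ? vertex.value : 0;
  }
  // No aggregate yet: go one level down and do the same.
  return vertex.children.reduce((acc, child) => acc + aggregateOf(child), 0);
}
```

Only incomplete branches are explored down to the raw values, so the cost depends on how much of the window is still missing an aggregate.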

OrientDB

How to calculate aggregate values with a query
Input params:
- Root node (suppose it is #11:11)

select sum(aggregateVal) from ( traverse out() from #11:11 while in().aggregateVal is null)

With the same logic you can query based on time windows

OrientDB

Third step: Complex domains

[Figure: Minutes hierarchy (1, 2, … 60) with a document attached to a minute vertex]

{
  0: {val: 1000},
  1: {val: 1500},
  …
  59: {
    val: 96,
    eventTags: [tag1, tag2]
    …
  }
}

Document <- Enrich the domain

OrientDB

Another use case: Event Categories and OO

[Figure: tag-based linked sequence; events carry tag lists ([Tag1, Tag2, Tag3], [Tag1], [Tag1, Tag2], [Tag2], [Tag3]) and are linked by per-tag edges nextTag1, nextTag2 and nextTag3]

OrientDB

Another use case: Event Categories and OO

Suppose tags are hierarchical categories (Classes for vertices and/or edges)

[Figure: edge class hierarchy with nextTAG at the top, nextTagX and nextTag3 below it, and nextTag1 and nextTag2 at the bottom]

OrientDB

Subset of events

TRAVERSE out('nextTag1') FROM <e1>

[Figure: the sequence with only the nextTag1 edges followed (events tagged [Tag1, Tag2] and [Tag1], up to e4)]

OrientDB

Subset of events

TRAVERSE out('nextTag2') FROM <e1>

[Figure: the sequence with only the nextTag2 edges followed (events tagged [Tag1, Tag2, Tag3], [Tag1, Tag2] and [Tag2], up to e5)]

OrientDB

Subset of events (Polymorphic!!!)

TRAVERSE out('nextTagX') FROM <e1>

[Figure: both nextTag1 and nextTag2 edges are followed (events tagged [Tag1, Tag2], [Tag1] and [Tag2])]
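A plain JavaScript sketch may help to see what a polymorphic traversal does. It is not OrientDB code: the `parentClass` map assumes that nextTag1 and nextTag2 extend nextTagX, and that nextTagX and nextTag3 extend nextTAG; `isA` and `traverseOut` are hypothetical helper names.

```javascript
// Assumed edge class hierarchy (child -> parent), following the deck's naming.
const parentClass = {
  nextTag1: "nextTagX",
  nextTag2: "nextTagX",
  nextTagX: "nextTAG",
  nextTag3: "nextTAG",
};

// True if edgeClass is the given class or one of its subclasses.
function isA(edgeClass, ancestor) {
  for (let c = edgeClass; c !== undefined; c = parentClass[c]) {
    if (c === ancestor) return true;
  }
  return false;
}

// edges: array of { from, to, cls }.
// Returns the vertices reachable from start (start included), following only
// outgoing edges whose class is edgeClass or a subclass of it.
function traverseOut(edges, start, edgeClass) {
  const visited = new Set([start]);
  const stack = [start];
  while (stack.length > 0) {
    const v = stack.pop();
    for (const e of edges) {
      if (e.from === v && isA(e.cls, edgeClass) && !visited.has(e.to)) {
        visited.add(e.to);
        stack.push(e.to);
      }
    }
  }
  return [...visited];
}
```

Asking for the parent class returns the union of what the subclasses would return, without listing them in the query.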

Connect all this with the rest of your application domain

You’ll see, everything will get more complex: you will discover new time-related dimensions (speed, position…) and new needs (complex forecasting)

CHASE!

• Your target is running away
• You have informers that track his moves (coordinates at a point in time) and give you additional (unstructured) information
• You have a street map
• You want to:
  – Catch him ASAP
  – Predict his moves
  – Be sure that he is inside an area

• Map is made of points and distances
• You also have speed limits for streets

[Figure: map points (point1 … pointN) connected by street edges, each with a distance and a max speed: 1 Km at 70 Km/h, 2 Km at 120 Km/h, 8 Km at 90 Km/h]


• Distance / Speed = TIME!!!
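As a small worked example, the three streets of the map can be turned into minimum travel times. The `travelMinutes` helper and the field names are assumptions for illustration, and the result is a lower bound that assumes driving at the speed limit.

```javascript
// Minimum travel time in minutes for a street, driving at its speed limit.
function travelMinutes(distanceKm, maxSpeedKmh) {
  return (distanceKm * 60) / maxSpeedKmh;
}

// The three streets shown on the map (distance and max speed from the slide).
const mapStreets = [
  { distanceKm: 1, maxSpeedKmh: 70 },
  { distanceKm: 2, maxSpeedKmh: 120 },
  { distanceKm: 8, maxSpeedKmh: 90 },
];

const streetMinutes = mapStreets.map(
  (s) => travelMinutes(s.distanceKm, s.maxSpeedKmh)
);
```

The three streets take roughly 0.86, 1 and 5.33 minutes, so each street edge can carry a time weight next to its distance.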

You have a time series of your target’s moves

{
  Timestamp: 29/11/2014 17:15:00
  LAT: 19.12223
  LON: 42.134
}

{
  Timestamp: 29/11/2014 17:55:00
  LAT: 19.12223
  LON: 42.134
}

Event sequence

[Figure: timestamped events (20/11/2014 13:20:00, 21/11/2014 14:35:00, 29/11/2014 17:55:00) linked as an event sequence and connected to map points, which are linked to each other by street edges]

Vertices and edges are also documents

So you can store complex information inside them

{
  timestamp: 22213989487987,
  lat: xxxx,
  lon: yyy,
  informer: 15,
  additional: {
    speed: 120,
    description: "the target was in a car",
    car: {
      model: "Fiat 500",
      licensePlate: "AA 123 BB"
    }
  }
}

Now you can:
• Predict his moves (eg. statistical methods, interpolation on lat/lon + time)
• Calculate how far he can be (based on last position, avg speed and street data)
• Reach him quickly (shortest path, Dijkstra)
• … intelligence?
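The "shortest path, Dijkstra" idea can be sketched over the street map, using travel time (distance / max speed) as the edge weight. This is a minimal in-memory illustration: the `fastestTimes` name and the street record shape are assumptions, streets are treated as two-way, and a linear scan replaces the priority queue for brevity.

```javascript
// streets: array of { from, to, distanceKm, maxSpeedKmh }, treated as two-way.
// Returns a Map: point -> minimum travel time from source, in hours.
function fastestTimes(streets, source) {
  const adjacency = new Map();
  const link = (a, b, hours) => {
    if (!adjacency.has(a)) adjacency.set(a, []);
    adjacency.get(a).push({ to: b, hours });
  };
  for (const s of streets) {
    const hours = s.distanceKm / s.maxSpeedKmh; // distance / speed = time
    link(s.from, s.to, hours);
    link(s.to, s.from, hours);
  }

  const best = new Map([[source, 0]]);
  const settled = new Set();
  while (true) {
    // Pick the closest point that is not settled yet.
    let current = null;
    for (const [point, time] of best) {
      if (!settled.has(point) && (current === null || time < best.get(current))) {
        current = point;
      }
    }
    if (current === null) break; // nothing left to settle
    settled.add(current);
    // Relax the streets leaving the settled point.
    for (const { to, hours } of adjacency.get(current) || []) {
      const candidate = best.get(current) + hours;
      if (!best.has(to) || candidate < best.get(to)) best.set(to, candidate);
    }
  }
  return best;
}
```

Filtering the result by the time elapsed since the last sighting gives the set of points where the target could be; the smallest entry towards a given point gives the quickest way to reach it.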

But to have all this you need:
• An easy way for your informers to send time series events

Hint: REST interface

With OrientDB you can expose Javascript functions as REST services!

And you need:
• An extended query language

Eg.
TRAVERSE out("street") FROM (
  SELECT out("point") FROM #11:11 // my last event
) WHILE canBeReached($current, #11:11)

(where he could be)

With OrientDB you can write

function canBeReached(node, event)

in Javascript and use it in your queries
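The deck does not show the body of `canBeReached`, so the following is only one possible sketch in plain JavaScript. It assumes that `node` and `event` are objects with `lat`/`lon` in degrees, that `event.timestamp` is in milliseconds, and that a fixed top speed applies; the extra `nowMs` parameter, the `haversineKm` helper and the speed constant are all hypothetical. It uses straight-line distance, so it is an optimistic bound that ignores the street graph.

```javascript
const ASSUMED_MAX_SPEED_KMH = 120; // assumption: fastest street on the map
const EARTH_RADIUS_KM = 6371;

// Great-circle distance between two lat/lon points (haversine formula).
function haversineKm(lat1, lon1, lat2, lon2) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// True if the target, last seen at `event`, could have reached `node` by nowMs.
function canBeReached(node, event, nowMs = Date.now()) {
  const elapsedHours = (nowMs - event.timestamp) / 3600000;
  if (elapsedHours < 0) return false; // the event is in the future
  const distanceKm = haversineKm(event.lat, event.lon, node.lat, node.lon);
  return distanceKm <= ASSUMED_MAX_SPEED_KMH * elapsedHours;
}
```

A stricter version would sum the travel times of the streets actually traversed instead of using the straight-line distance.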

It’s just a game, but think about:
• Fraud detection
• Traffic routing
• Multi-dimensional analytics
• Forecasting
• …

Summary

One model is not enough

One of the most common issues my customers have is:

“I have a zoo of technologies in my application stack, and it’s getting worse every day”

My answer is: Multi-Model DB


of course ;-)

From: “choose the right data model for your use case”

To: “Your application has multiple data models, you need all of them!”

This is NoSQL 2.0!!!

Thank you!

@ldellaquila

l.dellaquila@orientechnologies.com
