OrientDB - Time Series and Event Sequences - Codemotion Milan 2014


Time flows, my friend

Luigi Dell’Aquila
Orient Technologies LTD
Twitter: @ldellaquila

Managing event sequences and time series with a Document-Graph Database

Codemotion Milan 2014

Time What…?


Time series: A time series is a sequence of data points, typically consisting of successive measurements made over a time interval (Wikipedia)

Event sequences:

• A set of events with a timestamp
• A set of “happened before/after” relationships
• Cause-and-effect relationships

Time as a dimension:

• Direct: e.g. the beginning and end of a relationship (“I have been a friend of John since…”)
• Calculated: e.g. speed (distance/time)


Time as a constraint:

• Query execution time!

The problem: Fast and Effective

• Fast write: time doesn’t wait, writes just keep arriving
• Fast read: a lot of data has to be read in a short time
• Effective manipulation: complex operations such as aggregation, prediction and analysis

Current approaches


0. Relational approach: table

Timestamp              Value
2014-11-21 14:35:00    1321
2014-11-21 14:35:01    2444
2014-11-21 14:35:02    2135
2014-11-21 14:35:03    1833

0. Relational approach: table (time split into columns)

HH   MM   SS   Value
14   35   0    1321
14   35   1    2444
14   35   2    2135
14   35   3    1833

0. Relational – Advantages

• Simple
• It can be used together with your application data (operational)

0. Relational – Disadvantages

• Slow reads (they rely on an index)
• Slow inserts (the index has to be updated…)

1. Document Database

• Collections of documents instead of tables
• Schemaless
• Complex data structures

1. Document approach: Minute Based

{
  timestamp: "2014-11-21 12:05",
  load: [10, 15, 3, … 30] // array of 60, one per second
}

1. Document approach: Hour Based

{
  timestamp: "2014-11-21 12:00",
  load: {
    0: [10, 15, 3, … 30], // array of 60, one per second
    1: [0, 12, 31, … 24],
    …
    59: [10, 10, 1, … 16]
  }
}

1. Document approach – Advantages

• Fast write: one insert for every 60 writes, the rest are updates
• Fast fetch
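A minimal sketch of the minute-based write path in Java, assuming an in-memory map stands in for the document collection; `MinuteBucketStore`, `write` and `fetchMinute` are hypothetical names, not an OrientDB API. The first sample of each minute creates the document, and every later sample only updates one slot of its 60-element array.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a collection of minute-based documents.
class MinuteBucketStore {
    // One "document" per minute: key = timestamp truncated to the minute,
    // value = the load array with one slot per second.
    private final Map<LocalDateTime, long[]> documents = new HashMap<>();
    int inserts = 0;
    int updates = 0;

    void write(LocalDateTime ts, long value) {
        LocalDateTime minute = ts.truncatedTo(ChronoUnit.MINUTES);
        long[] load = documents.get(minute);
        if (load == null) {
            load = new long[60];          // first sample of the minute: insert
            documents.put(minute, load);
            inserts++;
        } else {
            updates++;                    // every other sample: update one slot
        }
        load[ts.getSecond()] = value;
    }

    // A single lookup returns the whole minute (null if nothing was written).
    long[] fetchMinute(LocalDateTime ts) {
        return documents.get(ts.truncatedTo(ChronoUnit.MINUTES));
    }

    public static void main(String[] args) {
        MinuteBucketStore store = new MinuteBucketStore();
        LocalDateTime start = LocalDateTime.of(2014, 11, 21, 12, 5, 0);
        for (int s = 0; s < 120; s++) {   // two minutes, one sample per second
            store.write(start.plusSeconds(s), s);
        }
        System.out.println(store.inserts + " inserts, " + store.updates + " updates"); // 2 inserts, 118 updates
    }
}
```

Two minutes of one-per-second samples produce 2 inserts and 118 updates. The same fixed 60-slot array is also where the "fixed time windows" limitation of this approach comes from.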

1. Document approach – Disadvantages

• Fixed time windows
• Single point per unit
• How to pre-aggregate?
• Relationships with the rest of the world?
• Relationships between events?

2. Graph Database

• Nodes/edges instead of tables
• Index-free adjacency
• Fast traversal
• Dynamic structure

2. Graph approach: linked sequence

e1 -next-> e2 -next-> e3 -next-> e4 -next-> e5

(timestamp on vertex)
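To make the linked sequence concrete, here is a sketch with a hypothetical `SeqEvent` vertex that stores its timestamp and one outgoing `next` edge. Reading a dynamic time window means following `next` from a start vertex until the timestamps pass the end of the window; how the start vertex is found (for example through an index) is left out as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event vertex: timestamp stored on the vertex, one outgoing "next" edge.
class SeqEvent {
    final String name;
    final long timestamp;                 // e.g. epoch milliseconds
    SeqEvent next;                        // null for the latest event

    SeqEvent(String name, long timestamp) {
        this.name = name;
        this.timestamp = timestamp;
    }
}

class LinkedSequence {
    // Walks "next" edges from 'start' and keeps events with from <= timestamp <= to.
    // The early stop relies on timestamps increasing along the sequence.
    static List<String> window(SeqEvent start, long from, long to) {
        List<String> result = new ArrayList<>();
        for (SeqEvent e = start; e != null && e.timestamp <= to; e = e.next) {
            if (e.timestamp >= from) {
                result.add(e.name);
            }
        }
        return result;
    }

    // Builds e1 -next-> e2 -next-> ... -next-> e5, one event per second.
    static SeqEvent build() {
        SeqEvent first = null;
        SeqEvent prev = null;
        for (int i = 1; i <= 5; i++) {
            SeqEvent e = new SeqEvent("e" + i, i * 1000L);
            if (prev == null) {
                first = e;
            } else {
                prev.next = e;            // create the "next" edge
            }
            prev = e;
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(window(build(), 2000L, 4000L)); // [e2, e3, e4]
    }
}
```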

2. Graph approach: linked sequence (tag based)

e1 [Tag1, Tag2]
e2 [Tag1]
e3 [Tag2]
e4 [Tag1]
e5 [Tag1, Tag2]

nextTag1: e1 -> e2 -> e4 -> e5
nextTag2: e1 -> e3 -> e5
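A sketch of the tag-based variant, assuming each event keeps at most one outgoing edge per tag (the `TaggedEvent` and `TagSequences` classes are hypothetical). Following only the Tag1 edges or only the Tag2 edges yields each category's own sequence, with e1 and e5 shared by both.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event vertex with at most one outgoing "next" edge per tag.
class TaggedEvent {
    final String name;
    final Map<String, TaggedEvent> nextByTag = new HashMap<>();

    TaggedEvent(String name) {
        this.name = name;
    }
}

class TagSequences {
    // Follows only the edges of one tag ("Tag1" stands for the nextTag1 edges).
    static List<String> follow(TaggedEvent start, String tag) {
        List<String> names = new ArrayList<>();
        for (TaggedEvent e = start; e != null; e = e.nextByTag.get(tag)) {
            names.add(e.name);
        }
        return names;
    }

    // The example: nextTag1 is e1 -> e2 -> e4 -> e5, nextTag2 is e1 -> e3 -> e5.
    static TaggedEvent example() {
        TaggedEvent e1 = new TaggedEvent("e1");
        TaggedEvent e2 = new TaggedEvent("e2");
        TaggedEvent e3 = new TaggedEvent("e3");
        TaggedEvent e4 = new TaggedEvent("e4");
        TaggedEvent e5 = new TaggedEvent("e5");
        e1.nextByTag.put("Tag1", e2);
        e2.nextByTag.put("Tag1", e4);
        e4.nextByTag.put("Tag1", e5);
        e1.nextByTag.put("Tag2", e3);
        e3.nextByTag.put("Tag2", e5);
        return e1;
    }

    public static void main(String[] args) {
        TaggedEvent e1 = example();
        System.out.println(follow(e1, "Tag1")); // [e1, e2, e4, e5]
        System.out.println(follow(e1, "Tag2")); // [e1, e3, e5]
    }
}
```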

2. Graph approach: Hierarchy

[Diagram: a tree of vertices Days -> Hours (24 per day) -> Minutes (60 per hour) -> Seconds, with the events e1, e2, e3 … e60 as leaves]
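In the hierarchy, every timestamp corresponds to exactly one path from a day vertex down to a second. A minimal sketch, assuming the path is simply the day, hour, minute and second of the timestamp; `HierarchyPath` is a hypothetical helper, and hours and minutes are 0-based here.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical position of an event in a Days -> Hours -> Minutes -> Seconds tree.
final class HierarchyPath {
    final LocalDate day;   // day vertex
    final int hour;        // 0..23, child of the day vertex
    final int minute;      // 0..59, child of the hour vertex
    final int second;      // 0..59, leaf under the minute vertex

    private HierarchyPath(LocalDate day, int hour, int minute, int second) {
        this.day = day;
        this.hour = hour;
        this.minute = minute;
        this.second = second;
    }

    static HierarchyPath of(LocalDateTime ts) {
        return new HierarchyPath(ts.toLocalDate(), ts.getHour(), ts.getMinute(), ts.getSecond());
    }

    @Override
    public String toString() {
        return day + "/" + hour + "/" + minute + "/" + second;
    }

    public static void main(String[] args) {
        System.out.println(of(LocalDateTime.of(2014, 11, 21, 14, 35, 2))); // 2014-11-21/14/35/2
    }
}
```

Two events hang under the same minute vertex exactly when their paths agree on day, hour and minute, which is what lets an aggregate stored on that vertex summarise them.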

2. Graph approach: mixed

[Diagram: the same Days -> Hours -> Minutes -> Seconds hierarchy with events e1, e2, e3 … e60, combined with the linked sequence]

2. Graph approach – Advantages

• Flexible
• Events can be connected together in different ways
• You can connect events to other entities
• Fast traversal of dynamic time windows
• Fast aggregation (based on the hierarchy)

2. Graph approach – Disadvantages

• Slow writes (vertex + edge + maintenance)
• Reads are not that fast either

Can we mix different models and get all the advantages?

Can we mix all this with the rest of application logic?

Multi-Model!

• Document database (schema-free, complex properties)
• Graph database (index-free adjacency, fast traversal)
• SQL (extended)
• Operational (schema, ACID)
• OO concepts (classes, inheritance, polymorphism)
• REST/JSON interface
• Native JavaScript (extend the query language, expose services, event hooks)
• Distributed architecture (multi-master replication/sharding)

OrientDB

First step: put them together

[Diagram: a Graph of Days -> Hours -> Minutes; each minute holds its per-second values in a single Document:]

{
  0: 1000,
  1: 1500,
  …
  59: 96
}

Graph: Days -> Hours -> Minutes
Document: the per-minute values <- IT’S A VERTEX TOO!!!

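The graph can also stop one level higher, with a whole hour in one document:

[Diagram: a Graph of Days -> Hours; each hour holds a Document with one entry per minute, each entry holding one value per second:]

{
  0: { 0: 1000, 1: 1500, … 59: 210 },
  1: { … },
  …
  59: { … }
}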

Where should I stop?

It depends on my domain and requirements.

Result:
• Same insert speed as the Document approach
• But with the flexibility of a Graph
• (As a side effect of mixing models, documents can also contain “pointers” to other elements of the application domain)

Second step: Pre-aggregate

[Diagram: the same Graph (Days -> Hours -> Minutes) with per-minute Documents { 0: 1000, 1: 1500, … 59: 96 }. A sum() of each minute document is stored on its minute vertex, and the sum() of the minutes is stored on the hour vertex.]

How to aggregate

Hooks: server-side triggers (Java or JavaScript), executed when DB operations happen (e.g. insert or update)

Java interface:

public RESULT onBeforeInsert(…);
public void onAfterInsert(…);
public RESULT onBeforeUpdate(…);
public void onAfterUpdate(…);

Aggregation logic

• Second 0 -> insert
• Second 1 -> update
• …
• Second 57 -> update
• Second 58 -> update
• Second 59 -> update + aggregate
  – Write the aggregate value on the minute vertex
  – Minute == 59? Calculate the aggregate on the hour vertex
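The aggregation rule can be simulated in plain Java. This is a sketch, not an OrientDB hook: `MinuteNode`, `HourNode`, `PreAggregator` and `onSample` are hypothetical, samples are assumed to arrive in order, and in OrientDB the same logic would be triggered from the hook callbacks listed above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minute vertex: 60 per-second values plus its pre-aggregated sum.
class MinuteNode {
    final long[] values = new long[60];
    Long sum;                                   // null until second 59 arrives
}

// Hypothetical hour vertex: its minute children plus its pre-aggregated sum.
class HourNode {
    final Map<Integer, MinuteNode> minutes = new HashMap<>();
    Long sum;                                   // null until minute 59 is aggregated
}

class PreAggregator {
    final HourNode hour = new HourNode();
    int inserts = 0;
    int updates = 0;

    // Reacts to one sample, as an insert/update hook would.
    void onSample(int minute, int second, long value) {
        MinuteNode m = hour.minutes.get(minute);
        if (m == null) {                        // first sample (second 0) -> insert
            m = new MinuteNode();
            hour.minutes.put(minute, m);
            inserts++;
        } else {                                // later seconds -> update
            updates++;
        }
        m.values[second] = value;
        if (second == 59) {                     // second 59 -> update + aggregate
            long s = 0;
            for (long v : m.values) {
                s += v;
            }
            m.sum = s;                          // write aggregate on the minute vertex
            if (minute == 59) {                 // minute 59 -> aggregate on the hour vertex
                long h = 0;
                for (MinuteNode child : hour.minutes.values()) {
                    // Simplification: a minute that never completed counts as 0.
                    h += child.sum == null ? 0L : child.sum;
                }
                hour.sum = h;
            }
        }
    }

    public static void main(String[] args) {
        PreAggregator agg = new PreAggregator();
        for (int min = 0; min < 60; min++) {
            for (int sec = 0; sec < 60; sec++) {
                agg.onSample(min, sec, 1);      // one unit per second
            }
        }
        System.out.println(agg.hour.minutes.get(0).sum + " " + agg.hour.sum); // 60 3600
    }
}
```

A minute that never reaches second 59 keeps `sum == null`, i.e. it stays an incomplete period; handling gaps or late samples is outside this sketch.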

[Diagram: the Days -> Hours -> Minutes graph with per-minute documents such as { 0: 1, 1: 12, … 59: 3 }. Complete periods already carry their aggregate (e.g. sum = 1000, sum = 15000, sum = 300), while incomplete ones still have sum = null.]

Query logic:

• Traverse from the root node down to the specified level (filtering based on vertex data)
• Is there an aggregate value?
  – Yes: return it
  – No: go one level down and do the same

Aggregation on a level will be VERY fast if you have horizontal edges!
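The read side can be sketched as a recursive descent: use a vertex's pre-aggregated value when it has one, otherwise ask its children. The `TimeNode` and `AggregateQuery` classes are hypothetical, leaves hold raw values, the numbers are illustrative, and the horizontal-edge optimisation is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vertex of the time hierarchy (day, hour, minute, ...).
class TimeNode {
    final List<TimeNode> children = new ArrayList<>();
    final Long aggregate;    // pre-aggregated sum, or null while the period is incomplete
    long rawValue;           // only used by leaves that have no aggregate

    TimeNode(Long aggregate) {
        this.aggregate = aggregate;
    }

    TimeNode add(TimeNode child) {
        children.add(child);
        return this;
    }
}

class AggregateQuery {
    int visited = 0;         // vertices touched by this query object

    // Is there an aggregate value? Yes: return it. No: go one level down.
    long sum(TimeNode node) {
        visited++;
        if (node.aggregate != null) {
            return node.aggregate;
        }
        if (node.children.isEmpty()) {
            return node.rawValue;
        }
        long total = 0;
        for (TimeNode child : node.children) {
            total += sum(child);
        }
        return total;
    }

    static TimeNode leaf(long value) {
        TimeNode t = new TimeNode(null);
        t.rawValue = value;
        return t;
    }

    public static void main(String[] args) {
        // Complete hour: its aggregate is used and its minutes are never visited.
        TimeNode completeHour = new TimeNode(16000L)
                .add(new TimeNode(1000L)).add(new TimeNode(15000L));
        // Incomplete minute: no aggregate yet, so its raw values are read.
        TimeNode openMinute = new TimeNode(null).add(leaf(1)).add(leaf(12)).add(leaf(3));
        TimeNode openHour = new TimeNode(null)
                .add(new TimeNode(300L)).add(new TimeNode(200L)).add(openMinute);
        TimeNode day = new TimeNode(null).add(completeHour).add(openHour);

        AggregateQuery q = new AggregateQuery();
        System.out.println(q.sum(day) + " (visited " + q.visited + " vertices)"); // 16516 (visited 9 vertices)
    }
}
```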

How to calculate aggregate values with a query

Input params:
- Root node (suppose it is #11:11)

select sum(aggregateVal) from (traverse out() from #11:11 while in().aggregateVal is null)

With the same logic you can query based on time windows
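A sketch of a time-window query on the same principle, assuming a day whose minute and hour aggregates are all complete (`DayAggregates` is hypothetical): whole hours inside the window are read from the hour vertex, and only the ragged edges fall back to minute values.

```java
// Hypothetical pre-aggregated day: one sum per minute vertex and one per hour vertex.
class DayAggregates {
    final long[][] minuteSums = new long[24][60];
    final long[] hourSums = new long[24];
    int reads = 0;                               // aggregate values read by the last query

    // Stores a complete minute and keeps its hour's aggregate up to date.
    void put(int hour, int minute, long sum) {
        hourSums[hour] += sum - minuteSums[hour][minute];
        minuteSums[hour][minute] = sum;
    }

    // Sum over [from, to), both given in minutes since midnight (0..1440).
    long window(int from, int to) {
        reads = 0;
        long total = 0;
        int t = from;
        while (t < to) {
            if (t % 60 == 0 && t + 60 <= to) {   // a whole hour lies inside the window
                total += hourSums[t / 60];
                t += 60;
            } else {                             // ragged edge: fall back to one minute
                total += minuteSums[t / 60][t % 60];
                t += 1;
            }
            reads++;
        }
        return total;
    }

    public static void main(String[] args) {
        DayAggregates day = new DayAggregates();
        for (int h = 0; h < 24; h++) {
            for (int m = 0; m < 60; m++) {
                day.put(h, m, 1);                // one unit per minute
            }
        }
        long sum = day.window(10 * 60 + 30, 13 * 60 + 15);      // 10:30 to 13:15
        System.out.println(sum + " from " + day.reads + " reads"); // 165 from 47 reads
    }
}
```

For 10:30 to 13:15 this reads 30 minute values, 2 hour values and 15 minute values, 47 reads in total instead of 165.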

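Third step: Complex domains

[Diagram: a Graph of Hours -> Minutes; each minute is a Document whose entries can hold more than a bare value:]

{
  0: {val: 1000},
  1: {val: 1500},
  …
  59: {
    val: 96,
    eventTags: [tag1, tag2],
    …
  }
}

Graph on top, Document at the bottom <- Enrich the domain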

Another use case: Event Categories and OO

e1 [Tag1, Tag2, Tag3]
e2 [Tag1]
e3 [Tag2]
e4 [Tag1]
e5 [Tag1, Tag2]
one more event [Tag3]

nextTag1: e1 -> e2 -> e4 -> e5
nextTag2: e1 -> e3 -> e5
nextTag3: e1 -> the [Tag3] event

Suppose tags are hierarchical categories (classes for vertices and/or edges):

nextTAG
  nextTagX (extends nextTAG)
    nextTag1 (extends nextTagX)
    nextTag2 (extends nextTagX)
  nextTag3 (extends nextTAG)

Subset of events

TRAVERSE out('nextTag1') FROM <e1>

Result: e1 -> e2 -> e4 -> e5 (the events tagged Tag1)

TRAVERSE out('nextTag2') FROM <e1>

Result: e1 -> e3 -> e5 (the events tagged Tag2)

Subset of events (Polymorphic!!!)

TRAVERSE out('nextTagX') FROM <e1>

Result: e1, e2, e3, e4, e5. nextTag1 and nextTag2 are both subclasses of nextTagX, so both kinds of edges are followed.
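Polymorphic traversal can be imitated with Java subclassing: an edge matches the requested class if it is an instance of it. This is a hypothetical in-memory sketch, not OrientDB's implementation; the class names mirror the slide, and "e6" is an invented name for the Tag3 event.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical edge classes mirroring the slide's hierarchy:
// NextTag <- NextTagX <- {NextTag1, NextTag2}, and NextTag <- NextTag3.
class NextTag {
    final String to;                     // target vertex
    NextTag(String to) { this.to = to; }
}
class NextTagX extends NextTag { NextTagX(String to) { super(to); } }
class NextTag1 extends NextTagX { NextTag1(String to) { super(to); } }
class NextTag2 extends NextTagX { NextTag2(String to) { super(to); } }
class NextTag3 extends NextTag { NextTag3(String to) { super(to); } }

class PolymorphicTraversal {
    private final Map<String, List<NextTag>> out = new HashMap<>();

    void edge(String from, NextTag e) {
        out.computeIfAbsent(from, k -> new ArrayList<>()).add(e);
    }

    // Breadth-first traversal following only edges that are instances of 'type';
    // asking for NextTagX therefore also follows NextTag1 and NextTag2 edges.
    Set<String> traverse(String start, Class<? extends NextTag> type) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        seen.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            for (NextTag e : out.getOrDefault(v, Collections.<NextTag>emptyList())) {
                if (type.isInstance(e) && seen.add(e.to)) {
                    queue.add(e.to);
                }
            }
        }
        return seen;
    }

    static PolymorphicTraversal example() {
        PolymorphicTraversal g = new PolymorphicTraversal();
        g.edge("e1", new NextTag1("e2"));
        g.edge("e1", new NextTag2("e3"));
        g.edge("e1", new NextTag3("e6"));
        g.edge("e2", new NextTag1("e4"));
        g.edge("e4", new NextTag1("e5"));
        g.edge("e3", new NextTag2("e5"));
        return g;
    }

    public static void main(String[] args) {
        PolymorphicTraversal g = example();
        System.out.println(g.traverse("e1", NextTag1.class)); // [e1, e2, e4, e5]
        System.out.println(g.traverse("e1", NextTagX.class)); // [e1, e2, e3, e4, e5]
    }
}
```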

Connect all this with the rest of your application domain

You’ll see, everything will get more complex: you will discover new time-related dimensions (speed, position…) and new needs (complex forecasting).

CHASE!

Chase

• Your target is running away
• You have informers who track his moves (coordinates at a point in time) and give you additional (unstructured) information
• You have a street map
• You want to:
  – Catch him ASAP
  – Predict his moves
  – Be sure that he is inside an area


• The map is made of points and distances
• You also have speed limits for the streets

[Diagram: map points point1 … pointN connected by streets, e.g. Distance: 1 km, max speed: 70 km/h; Distance: 2 km, max speed: 120 km/h; Distance: 8 km, max speed: 90 km/h]


• Distance / Speed = TIME!!!
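A worked example of distance / speed = time for the three streets on the map, as a minimal Java sketch (the `Street` class is hypothetical). The result is the shortest possible driving time, assuming the target drives at the speed limit the whole way.

```java
import java.util.Locale;

// Hypothetical street edge with the two attributes shown on the map slide.
class Street {
    final double distanceKm;
    final double maxSpeedKmh;

    Street(double distanceKm, double maxSpeedKmh) {
        this.distanceKm = distanceKm;
        this.maxSpeedKmh = maxSpeedKmh;
    }

    // Time = distance / speed, converted from hours to minutes.
    double minMinutes() {
        return distanceKm / maxSpeedKmh * 60.0;
    }

    public static void main(String[] args) {
        Street[] streets = {new Street(1, 70), new Street(2, 120), new Street(8, 90)};
        for (Street s : streets) {
            System.out.printf(Locale.ROOT, "%.0f km at %.0f km/h: %.2f min%n",
                    s.distanceKm, s.maxSpeedKmh, s.minMinutes());
        }
    }
}
```

This gives roughly 0.86, 1.00 and 5.33 minutes for the three streets.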

You have a time series of your target’s moves:

{Timestamp: 29/11/2014 17:15:00, LAT: 19.12223, LON: 42.134}
{Timestamp: 29/11/2014 17:55:00, LAT: 19.12223, LON: 42.134}

Each of these documents is an Event, and together they form an event sequence.

[Diagram: the event sequence (events at 20/11/2014 13:20:00, 21/11/2014 14:35:00 and 29/11/2014 17:55:00) drawn over the street map; each Event is linked to the Map point where it happened (“Where”).]

Vertices and edges are also documents, so you can store complex information inside them:

{
  timestamp: 22213989487987,
  lat: xxxx,
  lon: yyy,
  informer: 15,
  additional: {
    speed: 120,
    description: "the target was in a car",
    car: {
      model: "Fiat 500",
      licensePlate: "AA 123 BB"
    }
  }
}

Now you can:
• Predict his moves (e.g. statistical methods, interpolation on lat/lon + time)
• Calculate how far away he can be (based on his last position, average speed and street data)
• Reach him quickly (shortest path, Dijkstra)
• … intelligence?
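For the simplest form of prediction, interpolation on lat/lon + time, here is a sketch that linearly interpolates (or extrapolates) between two sightings. It treats latitude and longitude as plane coordinates and assumes constant velocity, both rough approximations that are only reasonable over short distances; `Sighting`, `MovePredictor` and the coordinates are made up for illustration.

```java
// Hypothetical sighting sent by an informer: time in seconds and a lat/lon position.
class Sighting {
    final long t;
    final double lat;
    final double lon;

    Sighting(long t, double lat, double lon) {
        this.t = t;
        this.lat = lat;
        this.lon = lon;
    }
}

class MovePredictor {
    // Linear interpolation between two sightings (extrapolation when t lies outside
    // [a.t, b.t]), treating lat/lon as plane coordinates and velocity as constant.
    static double[] positionAt(Sighting a, Sighting b, long t) {
        if (a.t == b.t) {
            throw new IllegalArgumentException("sightings must have different times");
        }
        double f = (double) (t - a.t) / (b.t - a.t);
        return new double[] {a.lat + f * (b.lat - a.lat), a.lon + f * (b.lon - a.lon)};
    }

    public static void main(String[] args) {
        Sighting a = new Sighting(0, 45.0, 9.0);
        Sighting b = new Sighting(600, 45.1, 9.2);   // ten minutes later
        double[] p = positionAt(a, b, 900);          // estimate five minutes after b
        System.out.printf("lat %.4f, lon %.4f%n", p[0], p[1]);
    }
}
```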

But to have all this you need:
• An easy way for your informers to send time series events

Hint: REST interface

With OrientDB you can expose JavaScript functions as REST services!

And you need:
• An extended query language, e.g.:

TRAVERSE out('street') FROM (
  SELECT out('point') FROM #11:11
) WHILE canBeReached($current, #11:11)

(where he could be, starting from my last event, #11:11)

With OrientDB you can write

function canBeReached(node, event)

in JavaScript and use it in your queries.
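One possible way to implement something like `canBeReached`, sketched in Java rather than as an OrientDB JavaScript function: run Dijkstra on street travel times (distance / max speed) from the last sighting and keep every point whose earliest arrival fits in the elapsed time. The `StreetMap` class, its methods, the point names and the idea of passing elapsed minutes are all assumptions; the slide only names the function.

```java
import java.util.AbstractMap;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical street map: two-way streets weighted by minimum travel time (minutes).
class StreetMap {
    private final Map<String, Map<String, Double>> adj = new HashMap<>();

    // Adds a two-way street; its weight is distance / max speed, in minutes.
    void street(String a, String b, double km, double maxKmh) {
        double minutes = km / maxKmh * 60.0;
        adj.computeIfAbsent(a, k -> new HashMap<>()).put(b, minutes);
        adj.computeIfAbsent(b, k -> new HashMap<>()).put(a, minutes);
    }

    // Dijkstra from 'start', pruned at 'budget' minutes: every point the target could
    // have reached, mapped to its earliest possible arrival time.
    Map<String, Double> reachable(String start, double budget) {
        Map<String, Double> best = new HashMap<>();
        PriorityQueue<Map.Entry<String, Double>> queue =
                new PriorityQueue<>(Map.Entry.<String, Double>comparingByValue());
        best.put(start, 0.0);
        queue.add(new AbstractMap.SimpleEntry<>(start, 0.0));
        while (!queue.isEmpty()) {
            Map.Entry<String, Double> cur = queue.poll();
            String node = cur.getKey();
            double time = cur.getValue();
            if (time > best.get(node)) continue;   // stale entry, a shorter path was found
            Map<String, Double> streets = adj.getOrDefault(node, Collections.<String, Double>emptyMap());
            for (Map.Entry<String, Double> s : streets.entrySet()) {
                double arrival = time + s.getValue();
                Double known = best.get(s.getKey());
                if (arrival <= budget && (known == null || arrival < known)) {
                    best.put(s.getKey(), arrival);
                    queue.add(new AbstractMap.SimpleEntry<>(s.getKey(), arrival));
                }
            }
        }
        return best;
    }

    // Could the target be at 'node', given his last sighting at 'lastPoint'
    // and the minutes elapsed since then?
    boolean canBeReached(String node, String lastPoint, double elapsedMinutes) {
        return reachable(lastPoint, elapsedMinutes).containsKey(node);
    }

    public static void main(String[] args) {
        StreetMap map = new StreetMap();
        map.street("p1", "p2", 1, 70);    // ~0.86 min
        map.street("p2", "p3", 2, 120);   // 1 min
        map.street("p1", "p4", 8, 90);    // ~5.33 min
        System.out.println(new TreeMap<>(map.reachable("p1", 2.0)).keySet()); // [p1, p2, p3]
        System.out.println(map.canBeReached("p4", "p1", 2.0));                // false
    }
}
```

Pruning at the budget is safe because travel times are never negative: any point reachable within the budget has a shortest path whose every prefix also fits in it.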

It’s just a game, but think about:
• Fraud detection
• Traffic routing
• Multi-dimensional analytics
• Forecasting
• …

Summary

One model is not enough

One of the most common issues my customers have is:

“I have a zoo of technologies in my application stack, and it’s getting worse every day”

My answer is: Multi-Model DB

of course ;-)

From: “choose the right data model for your use case”

To: “your application has multiple data models, you need all of them!”

This is NoSQL 2.0!!!

Thank you!

@ldellaquila

l.dellaquila@orientechnologies.com