Time flows, my friend
Luigi Dell’Aquila
Orient Technologies LTD
Twitter: @ldellaquila
Managing event sequences and time series with a Document-Graph Database
Codemotion Milan 2014
Time What…?
Time series: A time series is a sequence of data points, typically consisting of successive measurements made over a time interval (Wikipedia)
Time What…?
Event sequences:
• A set of events with a timestamp
• A set of relationships “happened before/after”
• Cause and effect relationships
Time What…?
Time as a dimension:
• Direct:
  – E.g. begin and end of relationships (I’m a friend of John since…)
• Calculated:
  – E.g. speed (distance/time)
Time What…?
Time as a constraint:
• Query execution time!
The problem: Fast and Effective
Fast and Effective
Fast write: Time doesn’t wait! Writes just arrive
Fast read: a lot of data to be read in a short time
Effective manipulation: complex operations like
- Aggregation
- Prediction
- Analysis
Current approaches
0. Relational approach: table
Timestamp            Value
2014:11:21 14:35:00  1321
2014:11:21 14:35:01  2444
2014:11:21 14:35:02  2135
2014:11:21 14:35:03  1833
Current approaches
0. Relational approach: table
HH  MM  SS  Value
14  35  0   1321
14  35  1   2444
14  35  2   2135
14  35  3   1833
Current approaches
0. Relational – Advantages
• Simple
• It can be used together with your application data (operational)
Current approaches
0. Relational – Disadvantages
• Slow read (relies on an index)
• Slow insert (update the index…)
Current approaches
1. Document Database
• Collections of Documents instead of tables
• Schemaless
• Complex data structures
Current approaches
1. Document approach: Minute Based
{
  timestamp: "2014-11-21 12.05",
  load: [10, 15, 3, … 30] // array of 60, one per second
}
Current approaches
1. Document approach: Hour Based
{
  timestamp: "2014-11-21 12.00",
  load: {
    0: [10, 15, 3, … 30], // array of 60, one per second
    1: [0, 12, 31, … 24],
    …
    59: [10, 10, 1, … 16]
  }
}
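A minimal sketch of how a writer could fill the hour-based document: the first sample of an hour inserts the document, every later sample only updates a slot in it. The `store` Map stands in for a document collection, and `hourKey` and `recordSample` are hypothetical names used only for this illustration. Times are treated as UTC.

```javascript
// Zero-pad a number to two digits.
const pad = (n) => String(n).padStart(2, "0");

// Build the hour key in the "YYYY-MM-DD HH.00" style used by the slide (UTC).
function hourKey(date) {
  const day = `${date.getUTCFullYear()}-${pad(date.getUTCMonth() + 1)}-${pad(date.getUTCDate())}`;
  return `${day} ${pad(date.getUTCHours())}.00`;
}

// Store one per-second sample in the hour-based document.
// `store` is a plain Map standing in for a document collection (assumption).
function recordSample(store, date, value) {
  const key = hourKey(date);
  let doc = store.get(key);
  const op = doc ? "update" : "insert";
  if (!doc) {
    doc = { timestamp: key, load: {} };
    store.set(key, doc);
  }
  const minute = date.getUTCMinutes();
  // One array of 60 slots per minute, one slot per second.
  if (!doc.load[minute]) doc.load[minute] = new Array(60).fill(null);
  doc.load[minute][date.getUTCSeconds()] = value;
  return op;
}

const store = new Map();
console.log(recordSample(store, new Date(Date.UTC(2014, 10, 21, 12, 5, 0)), 10)); // first sample of the hour
console.log(recordSample(store, new Date(Date.UTC(2014, 10, 21, 12, 5, 1)), 15)); // same document, new slot
```

The fixed 60-slot arrays make the "fixed time windows" and "single point per unit" limits of this layout easy to see: a second sample for the same second simply overwrites the first.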
Current approaches
1. Document approach – Advantages
• Fast write: one insert per 60 updates
• Fast fetch
Current approaches
1. Document approach – Disadvantages
• Fixed time windows
• Single point per unit
• How to pre-aggregate?
• Relationships with the rest of the world?
• Relationships between events?
Current approaches
2. Graph Database
• Nodes/Edges instead of tables
• Index-free adjacency
• Fast traversal
• Dynamic structure
Current approaches
2. Graph approach: linked sequence
e1 --next--> e2 --next--> e3 --next--> e4 --next--> e5
(timestamp on vertex)
Current approaches
2. Graph approach: linked sequence (tag based)
[Diagram: events e1 … e5, each labelled with its tags (e.g. [Tag1, Tag2], [Tag1], [Tag2]), chained by per-tag edges nextTag1 and nextTag2]
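A minimal sketch of the tag-based linked sequence, using in-memory objects rather than real vertices and edges. The helper names (`createSequence`, `appendEvent`, `followTag`) and the sample tag assignment are illustrative and not taken from the slides. Keeping a "last event per tag" pointer means appending an event only touches one previous event per tag.

```javascript
// A sequence remembers, for every tag, the last event that carried it.
function createSequence() {
  return { events: [], lastByTag: new Map() };
}

// Append an event and link it from the previous event of each of its tags.
function appendEvent(seq, id, timestamp, tags) {
  const event = { id, timestamp, tags, next: {} };
  for (const tag of tags) {
    const prev = seq.lastByTag.get(tag);
    if (prev) prev.next[tag] = event; // plays the role of a "next<Tag>" edge
    seq.lastByTag.set(tag, event);
  }
  seq.events.push(event);
  return event;
}

// Follow the edges of a single tag, starting from a given event.
function followTag(start, tag) {
  const ids = [];
  for (let e = start; e; e = e.next[tag]) ids.push(e.id);
  return ids;
}

// Sample data (made up for this sketch).
const seq = createSequence();
const e1 = appendEvent(seq, "e1", 1, ["Tag1", "Tag2"]);
appendEvent(seq, "e2", 2, ["Tag1"]);
appendEvent(seq, "e3", 3, ["Tag2"]);
appendEvent(seq, "e4", 4, ["Tag1"]);
appendEvent(seq, "e5", 5, ["Tag1", "Tag2"]);

console.log(followTag(e1, "Tag1").join(" ")); // e1 e2 e4 e5
console.log(followTag(e1, "Tag2").join(" ")); // e1 e3 e5
```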
Current approaches
2. Graph approach: Hierarchy
[Diagram: a tree with levels Days, Hours, Minutes and Seconds; events e1, e2, e3 … e60 are attached at the lowest level]
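A rough sketch of how an event could be attached to such a hierarchy: the timestamp is split into day, hour, minute and second, and the vertex for each level is created on demand. Plain objects with a `children` Map stand in for vertices and edges; `createRoot`, `pathFor` and `attachEvent` are hypothetical names, and times are treated as UTC.

```javascript
const LEVELS = ["day", "hour", "minute", "second"];

function createRoot() {
  return { level: "root", children: new Map(), events: [] };
}

// Split a timestamp into the keys used at each level of the hierarchy.
function pathFor(date) {
  return [
    date.toISOString().slice(0, 10), // day, e.g. "2014-11-21"
    date.getUTCHours(),
    date.getUTCMinutes(),
    date.getUTCSeconds(),
  ];
}

// Walk down from the root, creating missing vertices, and attach the event
// to the vertex of its second.
function attachEvent(root, date, event) {
  let node = root;
  pathFor(date).forEach((key, depth) => {
    if (!node.children.has(key)) {
      node.children.set(key, { level: LEVELS[depth], key, children: new Map(), events: [] });
    }
    node = node.children.get(key);
  });
  node.events.push(event);
  return node;
}

const root = createRoot();
attachEvent(root, new Date(Date.UTC(2014, 10, 21, 14, 35, 0)), "e1");
attachEvent(root, new Date(Date.UTC(2014, 10, 21, 14, 35, 1)), "e2");
console.log(root.children.size); // both events share the same day vertex
```

Each insert may create up to four vertices plus their edges, which is the write overhead the graph approach pays for fast traversal of arbitrary time windows.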
Current approaches
2. Graph approach: mixed
[Diagram: the Days / Hours / Minutes / Seconds hierarchy again, with events e1, e2, e3 … e60 at the lowest level]
Current approaches
2. Graph approach – Advantages
• Flexible
• Events can be connected together in different ways
• You can connect events to other entities
• Fast traversal of dynamic time windows
• Fast aggregation (based on hierarchy)
Current approaches
2. Graph approach – Disadvantages
• Slow writes (vertex + edge + maintenance)
• Not so fast reads
Can we mix different models and get all the advantages?
Can we mix all this with the rest of application logic?
Multi-Model!
• Document database (schema-free, complex properties)
• Graph database (index-free adjacency, fast traversal)
• SQL (extended)
• Operational (schema - ACID)
• OO concepts (classes, inheritance, polymorphism)
• REST/JSON interface
• Native Javascript (extend query language, expose services, event hooks)
• Distributed (multi-master replica/sharding) architecture
OrientDB
First step: put them together
Graph: hierarchy of Days → Hours (1 … 24) → Minutes (1 … 60)
Document (one per minute vertex) <- IT’S A VERTEX TOO!!!
{
  0: 1000,
  1: 1500,
  …
  59: 96
}
OrientDB
First step: put them together
Graph: hierarchy of Days → Hours (1 … 24)
Document (one per hour vertex):
{
  0: {
    0: 1000,
    1: 1500,
    …
    59: 210
  },
  1: { … },
  …
  59: { … }
}
Where should I stop?
It depends on my domain and requirements.
OrientDB
Result:
• Same insert speed as the Document approach
• But with the flexibility of a Graph
• (as a side effect of mixing models, documents can also contain “pointers” to other elements of the app domain)
OrientDB
Second step: Pre-aggregate
Graph: hierarchy of Days → Hours (1 … 24) → Minutes (1 … 60), with sum() annotations on two levels of the hierarchy
Document (one per minute vertex) <- IT’S A VERTEX TOO!!!
{
  0: 1000,
  1: 1500,
  …
  59: 96
}
OrientDB
How to aggregate
Hooks: server-side triggers (Java or Javascript), executed when DB operations happen (e.g. insert or update)
Java interface:
public RESULT onBeforeInsert(…);
public void onAfterInsert(…);
public RESULT onBeforeUpdate(…);
public void onAfterUpdate(…);
OrientDB
Aggregation logic
• Second 0 -> insert
• Second 1 -> update
• …
• Second 57 -> update
• Second 58 -> update
• Second 59 -> update + aggregate
  – Write aggregate value on minute vertex
• Minute == 59? Calculate aggregate on hour vertex
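The aggregation steps above can be sketched in plain Javascript as follows. This is not OrientDB hook code: `newHour` and `writeSample` are hypothetical helpers that imitate what an after-insert / after-update hook could do, with plain objects standing in for the hour and minute vertices.

```javascript
// In-memory stand-in for an hour vertex and its minute vertices (assumption).
function newHour() {
  return { minutes: {}, sum: null };
}

// Imitates the work a hook could do for each incoming sample.
function writeSample(hour, minute, second, value) {
  if (!hour.minutes[minute]) {
    hour.minutes[minute] = { values: {}, sum: null }; // first sample of the minute: insert
  }
  const doc = hour.minutes[minute];
  doc.values[second] = value; // every later sample: update

  if (second === 59) {
    // Minute is complete: write its aggregate on the minute vertex.
    doc.sum = Object.values(doc.values).reduce((a, b) => a + b, 0);
    if (minute === 59) {
      // Hour is complete: aggregate the minute sums on the hour vertex.
      hour.sum = Object.values(hour.minutes).reduce(
        (a, m) => a + (m.sum === null ? 0 : m.sum), 0);
    }
  }
}

const hour = newHour();
for (let m = 0; m < 60; m++) {
  for (let s = 0; s < 60; s++) writeSample(hour, m, s, 1);
}
console.log(hour.minutes[0].sum, hour.sum); // 60 3600
```

The sketch follows the slide literally and triggers aggregation on second 59 and minute 59. A production version would also need a rule for samples that arrive late or never arrive.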
OrientDB
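[Diagram: the Days / Hours / Minutes hierarchy with an aggregate on each vertex. Complete vertices carry a value (sum = 1000, sum = 15000, sum = 300); incomplete vertices have sum = null. The minute document holds the raw per-second values, e.g. { 0: 1, 1: 12, … 59: 3 }]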
OrientDB
Query logic:
• Traverse from root node to specified level (filtering based on vertex data)
• Is there an aggregate value?
  – Yes: return it
  – No: go one level down and do the same
Aggregation on a level will be VERY fast if you have horizontal edges!
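The query logic above can be sketched as a small recursive function. The vertex shape (`sum`, `children`, `values`) and the function name `total` are assumptions made for this illustration. The point is that the recursion stops as soon as a pre-aggregated value is found and only descends into incomplete branches.

```javascript
// Sum a subtree, stopping wherever a pre-aggregated value is available.
function total(vertex) {
  // Aggregate already written on this vertex: return it without descending.
  if (vertex.sum !== null && vertex.sum !== undefined) return vertex.sum;
  // No aggregate yet: go one level down and do the same.
  if (vertex.children) {
    return vertex.children.reduce((acc, child) => acc + total(child), 0);
  }
  // Lowest level: scan the raw per-second values of the document.
  return Object.values(vertex.values || {}).reduce((a, b) => a + b, 0);
}

const hourVertex = {
  sum: null, // incomplete hour
  children: [
    { sum: 1000, values: {} },              // complete minute
    { sum: 300, values: {} },               // complete minute
    { sum: null, values: { 0: 1, 1: 12 } }, // current, incomplete minute
  ],
};
console.log(total(hourVertex)); // 1313
```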
OrientDB
How to calculate aggregate values with a query
Input params:
- Root node (suppose it is #11:11)
select sum(aggregateVal) from (
  traverse out() from #11:11 while in().aggregateVal is null
)
With the same logic you can query based on time windows
OrientDB
Third step: Complex domains
Graph: hierarchy of Hours → Minutes (1, 2 … 60)
Document <- Enrich the domain
{
  0: {val: 1000},
  1: {val: 1500},
  …
  59: {
    val: 96,
    eventTags: [tag1, tag2]
    …
  }
}
OrientDB
Another use case: Event Categories and OO
[Diagram: the tag-based event sequence (events e1 … e5 chained by nextTag1 and nextTag2 edges) extended with a third tag: the tag lists now include Tag3, and an event tagged [Tag3] is reached through a nextTag3 edge]
OrientDB
Another use case: Event Categories and OO
Suppose tags are hierarchical categories (Classes for vertices and/or edges)
nextTAG
├─ nextTagX
│  ├─ nextTag1
│  └─ nextTag2
└─ nextTag3
OrientDB
Subset of events
TRAVERSE out('nextTag1') FROM <e1>
[Diagram: the traversal returns e1, e2, e4 and e5, connected by nextTag1 edges; tags shown: [Tag1, Tag2, Tag3], [Tag1], [Tag1, Tag2], [Tag1]]
OrientDB
Subset of events
TRAVERSE out('nextTag2') FROM <e1>
[Diagram: the traversal returns e1, e3 and e5, connected by nextTag2 edges; tags shown: [Tag1, Tag2, Tag3], [Tag1, Tag2], [Tag2]]
OrientDB
Subset of events (Polymorphic!!!)
TRAVERSE out('nextTagX') FROM <e1>
[Diagram: the traversal returns e1, e2, e3, e4 and e5, following both nextTag1 and nextTag2 edges; tags shown: [Tag1, Tag2, Tag3], [Tag1], [Tag1, Tag2], [Tag1], [Tag2]]
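A small sketch of what "polymorphic" means for such a traversal: asking for edges of class nextTagX also follows edges of its subclasses. It assumes that nextTag1 and nextTag2 extend nextTagX, and that nextTagX and nextTag3 extend nextTAG, which is one reading of the class diagram. The edge list and the helpers `isA` and `traverseOut` are made up for illustration and are not OrientDB APIs.

```javascript
// Assumed edge-class hierarchy: class -> parent class.
const parentOf = {
  nextTag1: "nextTagX",
  nextTag2: "nextTagX",
  nextTagX: "nextTAG",
  nextTag3: "nextTAG",
};

// True if `cls` is `ancestor` or one of its (transitive) subclasses.
function isA(cls, ancestor) {
  for (let c = cls; c !== undefined; c = parentOf[c]) {
    if (c === ancestor) return true;
  }
  return false;
}

// Breadth-first traversal following outgoing edges of the given class,
// including edges whose class is a subclass of it.
function traverseOut(edges, start, edgeClass) {
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const v = queue.shift();
    for (const e of edges) {
      if (e.from === v && isA(e.cls, edgeClass) && !seen.has(e.to)) {
        seen.add(e.to);
        queue.push(e.to);
      }
    }
  }
  return [...seen];
}

// Sample edges (made up for this sketch).
const edges = [
  { from: "e1", to: "e2", cls: "nextTag1" },
  { from: "e2", to: "e4", cls: "nextTag1" },
  { from: "e4", to: "e5", cls: "nextTag1" },
  { from: "e1", to: "e3", cls: "nextTag2" },
  { from: "e3", to: "e5", cls: "nextTag2" },
  { from: "e5", to: "e6", cls: "nextTag3" },
];

console.log(traverseOut(edges, "e1", "nextTag1").join(" ")); // e1 e2 e4 e5
console.log(traverseOut(edges, "e1", "nextTagX").join(" ")); // e1 e2 e3 e4 e5
```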
Connect all this with the rest of your application domain
You’ll see, everything will get more complex: you will discover new time-related dimensions (speed, position…) and new needs (complex forecasting)
CHASE!
Chase
• Your target is running away
• You have informers that track his moves (coordinates at a point in time) and give you additional (unstructured) information
• You have a street map
• You want to:
  – Catch him ASAP
  – Predict his moves
  – Be sure that he is inside an area
Chase
• Map is made of points and distances
• You also have speed limits for streets
[Diagram: map points point1 … pointN connected by streets; each street carries a distance and a speed limit]
Distance: 1 km, max speed: 70 km/h
Distance: 2 km, max speed: 120 km/h
Distance: 8 km, max speed: 90 km/h
Chase
• Map is made of points and distances
• You also have speed limits for streets
• Distance / Speed = TIME!!!
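As a quick worked example of "Distance / Speed = TIME", the sketch below computes the minimum travel time for the three streets shown on the map slide. The function name `travelTimeSeconds` is made up for this illustration.

```javascript
// Minimum time needed to drive a street at its speed limit, in seconds.
function travelTimeSeconds(distanceKm, maxSpeedKmh) {
  return (distanceKm / maxSpeedKmh) * 3600;
}

const streets = [
  { distanceKm: 1, maxSpeedKmh: 70 },
  { distanceKm: 2, maxSpeedKmh: 120 },
  { distanceKm: 8, maxSpeedKmh: 90 },
];

for (const s of streets) {
  const seconds = travelTimeSeconds(s.distanceKm, s.maxSpeedKmh);
  console.log(`${s.distanceKm} km at ${s.maxSpeedKmh} km/h: about ${Math.round(seconds)} s`);
}
```

So the three streets take roughly 51 s, 60 s and 320 s at the speed limit, and these times can be used as edge weights on the street graph.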
Chase
You have a time series of your target’s moves
{
  Timestamp: 29/11/2014 17:15:00
  LAT: 19,12223
  LON: 42,134
}
{
  Timestamp: 29/11/2014 17:55:00
  LAT: 19,12223
  LON: 42,134
}
{
  Timestamp: 29/11/2014 17:55:00
  LAT: 19,12223
  LON: 42,134
}
[Diagram labels: Event (each document), Event sequence (the chain of events)]
Chase
You have a time series of your target’s moves
[Diagram: the street map (legend: Street, Map point) with the target’s positions marked at 20/11/2014 1:20:00 PM and 21/11/2014 2:35:00 PM]
Chase
You have a time series of your target’s moves
[Diagram: the street map with the event sequence laid over it; events at 20/11/2014 13:20:00, 21/11/2014 14:35:00 and 29/11/2014 17:55:00. Labels: Event, Event sequence, Where, Street, Map point]
Chase
Vertices and edges are also documents
So you can store complex information inside them
{
  timestamp: 22213989487987,
  lat: xxxx,
  lon: yyy,
  informer: 15,
  additional: {
    speed: 120,
    description: "the target was in a car",
    car: {
      model: "Fiat 500",
      licensePlate: "AA 123 BB"
    }
  }
}
Chase
Now you can:
• Predict his moves (e.g. statistical methods, interpolation on lat/lon + time)
• Calculate how far he can be (based on last position, avg speed and street data)
• Reach him quickly (shortest path, Dijkstra)
• … intelligence?
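To make the "how far he can be" and "shortest path" bullets concrete, here is a minimal Dijkstra sketch over a street graph whose edge weights are travel times (distance / speed limit). The point names p1 … p3 are invented, streets are assumed to be two-way, and a simple linear scan replaces a priority queue for brevity. `shortestTimes` and `reachableWithin` are hypothetical helper names.

```javascript
// Dijkstra over travel times in seconds. streets: [{from, to, km, maxKmh}].
function shortestTimes(streets, source) {
  // Build an adjacency list; every street can be driven in both directions (assumption).
  const adj = new Map();
  for (const s of streets) {
    const t = (s.km / s.maxKmh) * 3600;
    for (const [a, b] of [[s.from, s.to], [s.to, s.from]]) {
      if (!adj.has(a)) adj.set(a, []);
      adj.get(a).push({ to: b, t });
    }
  }
  const dist = new Map([...adj.keys()].map((k) => [k, Infinity]));
  dist.set(source, 0);
  const done = new Set();
  while (done.size < dist.size) {
    // Pick the closest point that is not settled yet.
    let u = null;
    for (const [k, d] of dist) {
      if (!done.has(k) && (u === null || d < dist.get(u))) u = k;
    }
    if (u === null || dist.get(u) === Infinity) break; // the rest is unreachable
    done.add(u);
    for (const { to, t } of adj.get(u) || []) {
      if (dist.get(u) + t < dist.get(to)) dist.set(to, dist.get(u) + t);
    }
  }
  return dist;
}

// Points the target may have reached within a time budget.
function reachableWithin(dist, seconds) {
  return [...dist].filter(([, d]) => d <= seconds).map(([k]) => k);
}

const streets = [
  { from: "p1", to: "p2", km: 1, maxKmh: 70 },
  { from: "p2", to: "p3", km: 2, maxKmh: 120 },
  { from: "p1", to: "p3", km: 8, maxKmh: 90 },
];
const dist = shortestTimes(streets, "p1");
console.log(Math.round(dist.get("p3")));       // fastest p1 -> p3 goes through p2
console.log(reachableWithin(dist, 60).join(" ")); // p1 p2
```

Using the speed limit as the target's speed gives an optimistic upper bound on how far he can be; an average observed speed from the event documents could be substituted.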
Chase
But to have all this you need:
• An easy way for your informers to send time series events
Hint: REST interface
With OrientDB you can expose Javascript functions as REST services!
Chase
And you need:
• An extended query language
E.g.
TRAVERSE out("street") FROM (
  SELECT out("point") FROM #11:11 // my last event
) WHILE canBeReached($current, #11:11)
(where he could be)
Chase
With OrientDB you can write
function canBeReached(node, event)
in Javascript and use it in your queries
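One possible shape for such a function, written as plain Javascript rather than as a registered OrientDB function. The slides only give the signature `canBeReached(node, event)`. The extra `now` and `maxSpeedKmh` parameters, the field names `lat`, `lon` and `timestamp` (milliseconds), and the straight-line haversine distance are all assumptions of this sketch.

```javascript
const EARTH_RADIUS_KM = 6371;

// Great-circle distance between two {lat, lon} points given in degrees.
function haversineKm(a, b) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(b.lat - a.lat);
  const dLon = rad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(a.lat)) * Math.cos(rad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Could the target have moved from the event's position to `node` in the
// time elapsed since the event? Straight-line distance is used, so the
// answer is optimistic compared with real street distances.
function canBeReached(node, event, now, maxSpeedKmh = 120) {
  const elapsedHours = (now - event.timestamp) / 3600000;
  return haversineKm(event, node) <= maxSpeedKmh * elapsedHours;
}

const lastEvent = { lat: 45.0, lon: 9.0, timestamp: 0 };
const mapPoint = { lat: 45.0, lon: 9.1 }; // roughly 8 km east of the event
console.log(canBeReached(mapPoint, lastEvent, 10 * 60 * 1000)); // ten minutes later
console.log(canBeReached(mapPoint, lastEvent, 1 * 60 * 1000));  // one minute later
```

A stricter version would compare the elapsed time with the shortest travel time along the streets instead of the straight-line distance.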
Chase
It’s just a game, but think about:
• Fraud detection
• Traffic routing
• Multi-dimensional analytics
• Forecasting
• …
Summary
One model is not enough
One of the most common issues of my customers is:
“I have a zoo of technologies in my application stack, and it’s getting worse every day”
My answer is: Multi-Model DB
of course ;-)
From: “choose the right data model for your use case”
To: “Your application has multiple data models, you need all of them!”
This is NoSQL 2.0!!!