Where are your vertices and what are they talking about?

Roberto Franchini

whoami(1)

More than 15 years of experience, proud to be a programmer

Member of OrientDB team, tech lead for full-text & spatial indexes, JDBC driver and Docker images

Wrote software for NLP and opinion mining on fast data/big data

JUG-Torino co-lead

#orientdb at #jugmi

Meet OrientDB

The First Ever Multi-Model Database Combining Flexibility of Documents with Connectedness of Graphs

The Five Imperatives

1. Availability and Integrity
2. Scalability and Performance
3. Relationships and Connections
4. Data Model Complexity
5. Agility and Ease of Use

Availability and Integrity

• Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions

[Figure: two master nodes kept in sync by multi-master replication]

Scalability and Performance

• Multi-Master Replication, Sharding and Auto-Discovery to Simplify Ops

[Figure: two master nodes joined by an auto-discovered node]

Complex Relationships

No costs to traverse relationships:
• Recommendation engines
• Master Data Management
• Information Clustering
• Social Applications
• Spatial Apps

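[Figure: example graph linking John, genres (Thriller, Comedy), movies (Pulp Fiction, Mr Bean), theaters (Theater A, Theater B, Theater C) and cities (NYC, San José) through "Lives in" and "Likes" edges]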

Flexible Data Model

{
  "@rid": "12:382",
  "@class": "people",
  "first": "John",
  "last": "Power",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}

[Figure: John connected to Comedy by a Likes edge]

General purpose solution:
• Schema-less
• Nested documents
• Rich indexing and querying
• Developer friendly

Agility and Ease of Use

• Flexible data model supports rapid iterations

• Hybrid or schema-full modes guarantee data quality

• Graph model allows natural modeling of complex relationships


Developers are more productive and programming is easier

API & Standards

• Support for TinkerPop standard for Graph DB: Gremlin language and Blueprints API

• SQL + extensions for graphs
• JDBC driver to connect any BI tool
• HTTP/JSON support
• Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more

A multi-model operational database can be the system of record for modern enterprises and the database of choice for ISV/OEMs

[Figure: graph with Snow Patrol (Band), Luca (Account), Jill (Account), Indie (Genre) and "123, 1st Street Austin, TX" (Location)]

Graphs

{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}

Schema-less structures

Object Oriented

Key-Value pairs

Geo-Spatial

Full-Text

Multi-Model represents the intersection of multiple models in just one product

[Figure: Graph, Document, Object, Key/Value, Full-Text and Spatial models overlapping in a single Multi-model core]

Graph databases

Just Data

[Figure: unconnected items: Order #134 (Order), John (Provider), Bruno (Provider), Frank (Customer), Commodore Amiga 1200 (Product), Monitor 40” (Product), Mouse (Product)]


Data by itself has little value; it’s the relationship between data that gives it incredible value

Data and relationships

[Figure: the same items (Order #134, John, Bruno, Frank, Commodore Amiga 1200, Monitor 40”, Mouse) now connected by Makes, Has and Sells edges]

Every developer knows the Relational Model, but who knows the Graph one?

Back to school: Graph Theory crash course

Basic Graph

[Figure: Roberto connected to Milan by a Visited edge]

Vertices and Edges can have properties

Edges are directed

* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model

Property Graph Model*

[Figure: Roberto (company: OrientDB) connected to Milan (country: Italy) by a Visited edge (on: 2016)]

An Edge connects only 2 vertices

Use multiple edges to represent 1-N and N-M relationships

[Figure: 1-N and N-M relationships, with a Worked edge (on: 2016) shown next to the Visited edge]

[Figure: vertex Rob (#13:55, out = #22:11) and vertex Milan (#15:99, in = #22:11) linked by the Visited edge (#22:11, on: 2016), which stores out = #13:55 and in = #15:99]

Connections use persistent pointers

Each element in the Graph has its own immutable Record ID
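As a rough illustration of the Record ID idea, here is a minimal in-memory sketch in Python. It is not OrientDB code: the `Graph` class and its method names are hypothetical, and a plain dict stands in for storage. It shows an edge record holding the RIDs of its two vertices, and each vertex holding the RID of the edge, so a traversal is a chain of direct lookups rather than a join.

```python
class Graph:
    """Toy store: every record lives under an immutable record ID (RID)."""

    def __init__(self):
        self.records = {}

    def add_vertex(self, rid, **props):
        # A vertex keeps the RIDs of its outgoing and incoming edges.
        self.records[rid] = {"out": [], "in": [], **props}

    def add_edge(self, rid, label, out_rid, in_rid, **props):
        # An edge connects exactly two vertices and points at both by RID.
        self.records[rid] = {"label": label, "out": out_rid, "in": in_rid, **props}
        self.records[out_rid]["out"].append(rid)
        self.records[in_rid]["in"].append(rid)

    def neighbours(self, rid, label):
        # Follow pointers: vertex -> edge -> vertex, no index scan needed.
        edges = (self.records[e] for e in self.records[rid]["out"])
        return [self.records[e["in"]] for e in edges if e["label"] == label]


# Same RIDs as in the figure: Rob (#13:55) visited Milan (#15:99) via edge #22:11.
g = Graph()
g.add_vertex("#13:55", name="Rob")
g.add_vertex("#15:99", name="Milan")
g.add_edge("#22:11", "Visited", "#13:55", "#15:99", on=2016)
visited = [v["name"] for v in g.neighbours("#13:55", "Visited")]
```

A 1-N or N-M relationship in this sketch is simply several edge records that share a vertex RID.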

Congrats! This is your diploma in «Graph Theory»

Searching for something

Vertices and Edges are Documents

{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Frank",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}

[Figure: Frank connected to Order by a Makes edge]

General purpose solution:
• JSON
• Schema-less
• Schema-full
• Schema-hybrid
• Nested documents
• Rich indexing and querying
• Developer friendly

Schema

• Property types: STRING, DATE, DATETIME, BYTE, BOOLEAN, SHORT, BINARY
• Constraints: MANDATORY, NOTNULL, MIN, MAX, READONLY, REGEX
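To give a feel for what such constraints do, the following Python sketch validates a document against a small schema. It is only an illustration under stated assumptions: the `USER_SCHEMA` layout and the `validate` helper are hypothetical and do not reflect how OrientDB stores or enforces its schema. Fields that the schema does not declare are accepted, in the spirit of the schema-hybrid mode.

```python
import re

# Hypothetical schema for one class; the keys mirror the constraint names
# listed above, but the layout is an assumption made for this sketch.
USER_SCHEMA = {
    "userId": {"type": int, "mandatory": True, "notnull": True, "min": 1},
    "screenName": {"type": str, "mandatory": True, "regex": r"\w{1,15}"},
    "lang": {"type": str},
}


def validate(doc, schema):
    """Return a list of constraint violations (empty when the doc is valid)."""
    errors = []
    for field, rules in schema.items():
        if field not in doc:
            if rules.get("mandatory"):
                errors.append(f"{field}: mandatory")
            continue
        value = doc[field]
        if value is None:
            if rules.get("notnull"):
                errors.append(f"{field}: notnull")
            continue
        if not isinstance(value, rules["type"]):
            errors.append(f"{field}: expected {rules['type'].__name__}")
            continue
        if "min" in rules and value < rules["min"]:
            errors.append(f"{field}: below min")
        if "max" in rules and value > rules["max"]:
            errors.append(f"{field}: above max")
        if "regex" in rules and not re.fullmatch(rules["regex"], value):
            errors.append(f"{field}: regex mismatch")
    return errors


# "bio" is not declared in the schema and is still accepted.
ok = validate({"userId": 1024, "screenName": "robfrankie", "bio": "free text"}, USER_SCHEMA)
bad = validate({"userId": 0, "screenName": "not a valid name"}, USER_SCHEMA)
```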

Schema

• Define indexes on single property or multiple properties
  – UNIQUE
  – NOT UNIQUE
  – FULL TEXT (Lucene)
  – SPATIAL (Lucene)

Polymorphic domain schema

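[Figure: class diagram. Vertex classes: Actor (name: string, surname: string), with Customer and Provider inheriting from it; Product (name: string, qty: int); Order (number: int, date: datetime). Edge classes: Makes; Sells (price: decimal); Has (price: decimal). Legend: V = Vertex, E = Edge.]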

Who?

A Vertex is just a Document

We can define indexes on fields

CREATE CLASS User EXTENDS V
CREATE PROPERTY User.userId LONG
CREATE INDEX User.userId ON User(userId) UNIQUE

SELECT FROM User WHERE userId = 1024
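What a UNIQUE index buys can be sketched in a few lines of Python. This is a toy model, not OrientDB's index implementation: the `UniqueIndex` class is hypothetical and a dict stands in for the real index structure. It maps each key to one record ID, rejects a second record with the same key, and answers a lookup without scanning every User.

```python
class UniqueIndex:
    """Toy unique index: maps a key to exactly one record ID."""

    def __init__(self, field):
        self.field = field
        self.entries = {}

    def put(self, rid, doc):
        key = doc[self.field]
        if key in self.entries and self.entries[key] != rid:
            raise ValueError(f"duplicate key {key!r} for {self.field}")
        self.entries[key] = rid

    def get(self, key):
        # Direct lookup instead of scanning every User record.
        return self.entries.get(key)


# Hypothetical sample records, keyed by record ID.
users = {
    "#12:0": {"userId": 1024, "screenName": "alice"},
    "#12:1": {"userId": 2048, "screenName": "bob"},
}
index = UniqueIndex("userId")
for rid, doc in users.items():
    index.put(rid, doc)

found = users[index.get(1024)]

try:
    index.put("#12:2", {"userId": 1024})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```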

What?

OK, but my Users can describe themselves with free text. How can I find users describing themselves as programmers?

CREATE PROPERTY User.description STRING
CREATE INDEX User.description ON User(description) FULLTEXT ENGINE LUCENE

SELECT FROM User WHERE description LUCENE "programmer"

Where?

Users write articles with geolocation data inside. I want all the articles posted from the Milan area.

CREATE CLASS Article EXTENDS V
CREATE PROPERTY Article.geo EMBEDDED OPoint
CREATE INDEX Article.geo ON Article (geo) SPATIAL ENGINE LUCENE

SELECT * FROM Article WHERE ST_WITHIN(geo,
  ST_Buffer(ST_GeomFromText('POINT(8.959091 46.005473)'), 1)) = true
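The geometry behind that query can be approximated in plain Python. The sketch assumes that buffering a point by 1 yields a circle of radius 1 in the same planar units as the coordinates (degrees here); the real spatial module builds a polygon through JTS, so treat this only as an approximation. The `within_buffer` helper and the two article locations are hypothetical.

```python
import math


def within_buffer(point, center, distance):
    """True if `point` lies inside the circle of radius `distance` around `center`.

    Coordinates are (x, y) pairs in the same planar units, read here as
    (longitude, latitude) in degrees.
    """
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    return math.hypot(dx, dy) < distance


center = (8.959091, 46.005473)   # the POINT used in the query above
articles = {
    "near": (9.19, 45.4642),      # hypothetical article location
    "far": (12.4964, 41.9028),    # hypothetical article location
}
hits = [name for name, geo in articles.items() if within_buffer(geo, center, 1)]
```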

Twitter Graph

[Figure: Twitter graph with vertex classes User, Tweet, Source and Hashtag, and edge classes Posts, Follows, Retweets, ReplyTo, Using and Tags]

User schema

CREATE CLASS User EXTENDS V
CREATE PROPERTY User.userId LONG
CREATE INDEX User.userId ON User(userId) UNIQUE

CREATE PROPERTY User.description STRING
CREATE PROPERTY User.screenName STRING
CREATE PROPERTY User.lang STRING
CREATE PROPERTY User.location STRING

Tweet schema

CREATE CLASS Tweet EXTENDS V
CREATE PROPERTY Tweet.tweetId LONG
CREATE INDEX Tweet.tweetId ON Tweet(tweetId) UNIQUE
CREATE PROPERTY Tweet.text STRING
CREATE PROPERTY Tweet.lang STRING
CREATE PROPERTY Tweet.location STRING
CREATE PROPERTY Tweet.createdAt DATETIME
CREATE PROPERTY Tweet.isRetweeted BOOLEAN
CREATE PROPERTY Tweet.isRetweet BOOLEAN

Indexes

CREATE INDEX User.description ON User(description) FULLTEXT ENGINE LUCENE

CREATE INDEX Tweet.text ON Tweet(text) FULLTEXT ENGINE LUCENE

CREATE PROPERTY Tweet.geo EMBEDDED OPoint
CREATE INDEX Tweet.geo ON Tweet (geo) SPATIAL ENGINE LUCENE

Relations

CREATE CLASS Posts EXTENDS E

CREATE CLASS Hashtag EXTENDS V
CREATE PROPERTY Hashtag.label STRING

CREATE CLASS Tags EXTENDS E

CREATE CLASS Source EXTENDS V
CREATE PROPERTY Source.name STRING

CREATE CLASS Using EXTENDS E

CREATE CLASS Follows EXTENDS E
CREATE CLASS Retweets EXTENDS E
CREATE CLASS ReplyTo EXTENDS E
CREATE CLASS Mentions EXTENDS E
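How a tweet could be mapped onto these classes is sketched below in plain Python, with dicts and sets standing in for the database. Several things are assumptions made for the example: the shape of the tweet payload, the edge directions (User Posts Tweet, Tweet Tags Hashtag, Tweet Using Source, Tweet ReplyTo Tweet) and the `upsert`, `ingest` and `users_tweeting` helpers. The upsert by key mirrors the role of the UNIQUE indexes on userId and tweetId.

```python
from collections import defaultdict

vertices = {}                      # (class, key) -> properties
edges = defaultdict(set)           # edge class -> {(from_vertex, to_vertex)}


def upsert(cls, key, **props):
    """Create the vertex if it is new, otherwise merge the properties."""
    vertex_id = (cls, key)
    vertices.setdefault(vertex_id, {}).update(props)
    return vertex_id


def ingest(tweet):
    """Map one tweet payload (a plain dict, shape assumed) onto the schema."""
    user = upsert("User", tweet["userId"], screenName=tweet["screenName"])
    post = upsert("Tweet", tweet["tweetId"], text=tweet["text"])
    edges["Posts"].add((user, post))
    for label in tweet.get("hashtags", []):
        edges["Tags"].add((post, upsert("Hashtag", label.lower(), label=label.lower())))
    if "source" in tweet:
        edges["Using"].add((post, upsert("Source", tweet["source"], name=tweet["source"])))
    if "replyTo" in tweet:
        edges["ReplyTo"].add((post, upsert("Tweet", tweet["replyTo"])))


def users_tweeting(label):
    """Walk Hashtag <- Tags <- Tweet <- Posts <- User."""
    tagged = {t for t, h in edges["Tags"] if h == ("Hashtag", label)}
    return sorted(vertices[u]["screenName"] for u, t in edges["Posts"] if t in tagged)


ingest({"userId": 1, "screenName": "alice", "tweetId": 10,
        "text": "Graphs! #OrientDB", "hashtags": ["OrientDB"], "source": "web"})
ingest({"userId": 2, "screenName": "bob", "tweetId": 11,
        "text": "@alice agreed #orientdb", "hashtags": ["orientdb"], "replyTo": 10})
ingest({"userId": 3, "screenName": "carol", "tweetId": 12, "text": "lunch"})

fans = users_tweeting("orientdb")
```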

It’s demo time

docker run --name jugmi16 -d \
  -v ~/local/orientdb/jugmi16/config:/orientdb/config \
  -v ~/local/orientdb/jugmi16/databases:/orientdb/databases \
  -p 2424:2424 -p 2480:2480 \
  -e ORIENTDB_ROOT_PASSWORD=rootpwd \
  -e ORIENTDB_NODE_NAME=ved1 \
  orientdb/orientdb-spatial:latest server.sh

Run the Docker!


Full text

Based on Lucene

Configurable

Analyzers

Stopwords

Access type

Full text

CREATE INDEX City.name ON City(name) FULLTEXT ENGINE LUCENE METADATA {
  "directory_type": "nio",
  "use_compound_file": false,
  "ram_buffer_MB": "16",
  "max_buffered_docs": "-1",
  "max_buffered_delete_terms": "-1",
  "ram_per_thread_MB": "1024",
  "default": "org.apache.lucene.analysis.standard.StandardAnalyzer",
  "description_index": "org.apache.lucene.analysis.standard.StandardAnalyzer",
  "description_index_stopwords": [ "the", "is" ]
}
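To make the analyzer and stopword settings more concrete, here is a small Python sketch of an inverted index. It is a simplification, not Lucene: the `analyze` function only lowercases and splits on non-alphanumeric characters, whereas a real analyzer does considerably more, and requiring every query term to match is a choice made for this example. The sample records are hypothetical.

```python
import re
from collections import defaultdict

STOPWORDS = {"the", "is"}   # same two words as the index metadata above


def analyze(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index(docs):
    """Inverted index: term -> set of record IDs containing it."""
    index = defaultdict(set)
    for rid, text in docs.items():
        for term in analyze(text):
            index[term].add(rid)
    return index


def search(index, query):
    """Return the records that contain every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


docs = {
    "#12:0": "Proud to be a programmer",
    "#12:1": "The coffee is the fuel",
    "#12:2": "Programmer and JUG co-lead",
}
index = build_index(docs)
programmers = search(index, "programmer")
stopword_only = search(index, "the is")
```

Because stopwords are dropped at indexing time, they take no space in the index and a query made only of stopwords matches nothing.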

Spatial

Lucene, Spatial4J, JTS

Geometry data

Point, line, polygon, multiline, multipolygon

Functions

Follows the Open Geospatial Consortium (OGC) standard for extending SQL to support spatial data.

Implements a subset of SQL-MM functions with ST prefix (Spatial Type)

Spatial

Functions

ST_AsText(geom)

ST_GeomFromText(text)

ST_Equals(geom1,geom2)

ST_Within(geom1,geom2)

ST_Contains(geom1,geom2)

….

Spatial

SELECT ST_Intersects(ST_GeomFromText('POINT(0 0)'),
  ST_GeomFromText('LINESTRING ( 2 0, 0 2 )'));

Result → (false)

SELECT ST_Disjoint(ST_GeomFromText('POINT(0 0)'),
  ST_GeomFromText('LINESTRING ( 2 0, 0 2 )'));

Result → (true)
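The two results can be checked by hand with a short Python sketch that covers only the point-versus-linestring case. It is an illustration, not the JTS algorithm used by the spatial module, and the helper names are hypothetical. A point intersects a linestring when it lies on one of its segments; disjoint is the negation.

```python
def on_segment(p, a, b, eps=1e-9):
    """True if point p lies on the segment from a to b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > eps:
        return False          # p is not on the line through a and b
    return (min(a[0], b[0]) - eps <= p[0] <= max(a[0], b[0]) + eps
            and min(a[1], b[1]) - eps <= p[1] <= max(a[1], b[1]) + eps)


def intersects(point, linestring):
    """Point-vs-linestring case only: does the point touch any segment?"""
    return any(on_segment(point, a, b) for a, b in zip(linestring, linestring[1:]))


def disjoint(point, linestring):
    # Disjoint is defined as the negation of intersects.
    return not intersects(point, linestring)


line = [(2, 0), (0, 2)]
origin_hits = intersects((0, 0), line)      # the origin is off the line x + y = 2
origin_apart = disjoint((0, 0), line)
midpoint_hits = intersects((1, 1), line)    # (1, 1) sits on the segment
```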

OrientDB Features

First Multi-Model DBMS with a Graph-Engine

Open Source Apache2 license

Data Models are built into the core engine

Schema-less, Schema-full and Schema-mixed

Written in Java (runs on every platform)

Zero-config HA

Get Started for Free

OrientDB Community Edition is FREE for any purpose (Apache 2 license)

Udemy Getting Started Training is ★★★★★ and Free
http://www.orientechnologies.com/getting-started

OrientDB Enterprise is Free for Development

OrientDB At a Glance

70,000 downloads per month from 200+ countries

100+ code contributors on GitHub and 15,000+ commits

1,000s of users from SMBs to Fortune 10 companies

17+ years of product research

Global Coverage and 24x7 Support

Thanks!

ROME 18-19 MARCH 2016

http://www.orientdb.com
@robfrankie
r.franchini@orientdb.com

B-Side

How (the demo) it’s made

Roberto Franchini

How it’s made

Twitter4j for fetching stream

rxJava for stream processing (not so much processing)

OrientDB graph API

OrientDB Docker image (custom)