www.Objectivity.com Realize The Value In Your Big Data With Graph Technology Leon Guzenda - Objectivity, Inc. DBTA Webinar – January 17, 2013


Page 1: Dbta Webinar Realize Value of Big Data with graph  011713

www.Objectivity.com

Realize The Value In Your Big Data With Graph Technology

Leon Guzenda - Objectivity, Inc. DBTA Webinar – January 17, 2013

Page 2: Dbta Webinar Realize Value of Big Data with graph  011713

Overview

• Who We Are

• Current Big Data Analytics

• Relationship Analytics

• Graph Technologies

• The Big Data Connection Platform

Page 3: Dbta Webinar Realize Value of Big Data with graph  011713

About Objectivity Inc.

• Objectivity, Inc. is headquartered in Sunnyvale, California.

• Established in 1988 to tackle database problems that network/hierarchical/relational and file-based technologies struggle with.

• Objectivity has over two decades of Big Data and NoSQL experience

• Develops NoSQL platforms for managing and discovering relationships and patterns in complex data:

– Objectivity/DB - an object database that manages localized, centralized or distributed databases

– InfiniteGraph - a massively scalable graph database built on Objectivity/DB that enables organizations to find, store and exploit the relationships in their data

Embedded in hundreds of enterprises, government organizations and products - millions of deployments.

Page 4: Dbta Webinar Realize Value of Big Data with graph  011713

Human Intelligence (HUMINT) Analysis

Page 5: Dbta Webinar Realize Value of Big Data with graph  011713

Big Data Technologies Are Still Evolving

Page 6: Dbta Webinar Realize Value of Big Data with graph  011713

We All Know The Problem - Information Overload!

Volume, Velocity, Variety, Veracity, Value... Making sense of it all takes time and $$$

Current “Big Data” Analytics

Page 7: Dbta Webinar Realize Value of Big Data with graph  011713

A Typical “Big Data” Analytics Setup

[Architecture diagram labels:]

• Data Aggregation and Analytics Applications

• Commodity Linux Platforms and/or High Performance Computing Clusters

• Data types: Structured, Semi-Structured, Unstructured

• Data stores: Graph DB, Object DB, Doc DB, K-V Store, Hadoop, Column Store, Data W/H, RDBMS

Page 8: Dbta Webinar Realize Value of Big Data with graph  011713

Incremental Improvements Aren’t Enough

• All current solutions use the same basic architectural model.

• None of the current solutions has a way to store connections between entities in different silos.

• Most analytic technology focuses on the content of the data nodes, rather than the many kinds of connections between the nodes and the data in those connections.

• Why? Because traditional and earlier NoSQL solutions are bad at handling relationships.

• Graph databases can efficiently store, manage and query the many kinds of relationships hidden in the data.

Presenter
Presentation Notes
Thinking we should be less about Objy in the last bullet… possibly Object oriented and graph databases… ?
Page 9: Dbta Webinar Realize Value of Big Data with graph  011713

Not Only SQL – a group of 4 primary technologies

• Key-Value Stores

• “Big Table” Clones

• Document Databases

• Object and Graph databases

[Diagram labels: Graph Database, Graph Processing]

Page 10: Dbta Webinar Realize Value of Big Data with graph  011713

Not Only SQL – A group of 4 primary technologies

[Figure: the four technologies placed on a scale from “Simple” to “Highly Interconnected”]

Page 11: Dbta Webinar Realize Value of Big Data with graph  011713

Graph Theory Terminology...

VERTEX: A single node in a graph data structure

EDGE: A connection between a pair of VERTICES

PROPERTIES: Data items that belong to a particular Vertex

WEIGHT: A quantity associated with a particular Edge

GRAPH: A collection of linked Vertex and Edge objects

Example:

Vertex 1 – City: San Francisco, Pop: 812,826

Vertex 2 – City: San Jose, Pop: 967,487

Edge 1 – Road: US-101, Miles: 47.8
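The terminology above can be sketched in a few lines of plain Python. This is only an illustration, not the API of any graph database: the class names `Vertex`, `Edge` and `Graph`, the methods `add_vertex` and `add_edge`, and the edge label are assumptions made for the example. The city populations and the 47.8 mile weight come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A single node; `properties` holds the data items that belong to it."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A connection between a pair of vertices, carrying a weight."""
    tail: Vertex
    head: Vertex
    label: str
    weight: float


@dataclass
class Graph:
    """A collection of linked Vertex and Edge objects."""
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_vertex(self, name, **properties):
        vertex = Vertex(name, properties)
        self.vertices.append(vertex)
        return vertex

    def add_edge(self, tail, head, label, weight):
        edge = Edge(tail, head, label, weight)
        self.edges.append(edge)
        return edge


# Values taken from the slide; the label text is illustrative.
graph = Graph()
sf = graph.add_vertex("San Francisco", population=812_826)
sj = graph.add_vertex("San Jose", population=967_487)
road = graph.add_edge(sf, sj, label="Route 101", weight=47.8)
```

Keeping properties on the vertex and the weight on the edge mirrors the definitions on the slide; a real graph database would typically allow arbitrary properties on edges as well.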

Page 12: Dbta Webinar Realize Value of Big Data with graph  011713

...Graph Theory Terminology...

SIMPLE/UNDIRECTED GRAPH: A Graph where each Vertex may be linked to one or more Vertex objects via Edge objects and each Edge object is connected to exactly two Vertex objects. Furthermore, neither Vertex connected to an Edge is more significant than the other.

DIRECTED GRAPH: A Graph where, in each Vertex + Edge + Vertex group (an “Arc”), one Vertex is considered the “Tail” and the other the “Head”, so the Edge points from Tail to Head.

MIXED GRAPH: A Graph in which some Edges are Undirected and others are Directed.

Page 13: Dbta Webinar Realize Value of Big Data with graph  011713

...Graph Theory Terminology

LOOP: An Edge whose two ends are both connected to the same Vertex

MULTIGRAPH: A Graph that allows multiple Edges between the same pair of Vertices; many definitions also allow Loops

QUIVER: A Directed Graph where Vertices are allowed to be connected by multiple Arcs. A Quiver may include Loops.

WEIGHTED GRAPH: A Graph where a quantity is assigned to an Edge, e.g. a Length assigned to an Edge representing a road between two Vertices representing cities.

HALF EDGE: An Edge that is only connected to a single Vertex

LOOSE EDGE: An Edge that isn't connected to any Vertices.

CONNECTIVITY: Two Vertices are Connected if it is possible to find a path between them.
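The connectivity definition can be checked with a breadth-first search. The sketch below is a minimal illustration in plain Python; the function name `is_connected` and the four-vertex sample graph are assumptions, not part of the original deck.

```python
from collections import deque


def is_connected(adjacency, start, goal):
    """Return True if some path leads from `start` to `goal`."""
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            return True
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Undirected sample graph: each edge is listed in both directions.
# Vertex "D" has no edges, so it is not connected to anything else.
adjacency = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B"],
    "D": [],
}
```

Here `is_connected(adjacency, "A", "C")` is true because the path A, B, C exists, while `is_connected(adjacency, "A", "D")` is false.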

Page 14: Dbta Webinar Realize Value of Big Data with graph  011713

Relationship Analytics

Page 15: Dbta Webinar Realize Value of Big Data with graph  011713

Example 1 – Social Network Analysis

Sources may be covert or open:

• Telecom Call Detail Records

• Banking transactions

• Flight and hotel reservations

• MASINT

• Twitter, Facebook, Google+, LinkedIn, Plaxo, Flickr, YouTube

Page 16: Dbta Webinar Realize Value of Big Data with graph  011713

Example 2 – Finding Patterns In Open Source Data...

The Challenges

• Data Volumes

• Fast-Changing Data

• Sensitivity of Data

• Significance of Data

Page 17: Dbta Webinar Realize Value of Big Data with graph  011713

...Example 2 – Finding Patterns In Open Source Data

Page 18: Dbta Webinar Realize Value of Big Data with graph  011713

Example 3 – Logistics

Page 19: Dbta Webinar Realize Value of Big Data with graph  011713

Example 4 - Cyber Security...

Page 20: Dbta Webinar Realize Value of Big Data with graph  011713

… Example 4 - Cyber Security

Page 21: Dbta Webinar Realize Value of Big Data with graph  011713

Link Hunter - POC For A Federal Police Force

Run the live demo at objectivity.com [Resources, Live Demos]

Page 22: Dbta Webinar Realize Value of Big Data with graph  011713

MAKING GRAPH ANALYTICS WORK EFFICIENTLY

Presenter
Presentation Notes
This section seems out of place.
Page 23: Dbta Webinar Realize Value of Big Data with graph  011713

Relationship (Connection) Analytics...

A SQL Shortcoming

Think about the SQL query for finding all links between the two “blue” rows... it's hard!

[Diagram: linked rows spread across Table_A, Table_B, Table_C, Table_D, Table_E, Table_F and Table_G]

There are some kinds of complex relationship handling problems that SQL wasn't designed for.

Page 24: Dbta Webinar Realize Value of Big Data with graph  011713

Relationship (Connection) Analytics...

A SQL Shortcoming

InfiniteGraph - The solution can be found with a few lines of code

[Diagram: rows A3 and G4 highlighted among Table_A, Table_B, Table_C, Table_D, Table_E, Table_F and Table_G]
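To give a feel for why a graph view makes this kind of query short, here is a sketch in plain Python that finds every link chain between two rows. It does not use or imitate the InfiniteGraph API. The row identifiers other than A3 and G4, the `links` dictionary and the function `all_paths` are hypothetical and chosen only for illustration.

```python
def all_paths(links, start, goal, path=None):
    """Depth-first search returning every simple path from start to goal."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in links.get(start, ()):
        if nxt not in path:  # never revisit a row already on this path
            found.extend(all_paths(links, nxt, goal, path))
    return found


# Hypothetical cross-table links: "B1" means row 1 of Table_B, and so on.
links = {
    "A3": ["B1", "C2"],
    "B1": ["D5"],
    "C2": ["D5", "E7"],
    "D5": ["G4"],
    "E7": ["F2"],
    "F2": ["G4"],
}

paths = all_paths(links, "A3", "G4")
for p in paths:
    print(" -> ".join(p))
```

The equivalent SQL would need a separate join chain, or a recursive query, for each possible route through the tables, because the number of hops is not known in advance.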

Page 25: Dbta Webinar Realize Value of Big Data with graph  011713

Representing the Graph...

The existing data might look like this:

Events/Places and People/Orgs: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Situation X, Situation Y, Target T, Bank X, Cafe C

Facts:

• S Seen Near T

• A Banks at X

• A Called P

• A Seen At Y

• A Seen Near X

• P Emailed S

• P Called Q

• Q Seen Near T

• P Called R

• R Seen Near T

• X Paid S

• A Eats At C

Page 26: Dbta Webinar Realize Value of Big Data with graph  011713

Representing the Graph...

We start by identifying the nodes (Vertices) and the connections (Edges)

NODES (Events/Places and People/Orgs): Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Situation X, Situation Y, Target T, Bank X, Cafe C

CONNECTIONS (Facts):

• S Seen Near T

• A Banks at X

• A Called P

• A Seen At Y

• A Seen Near X

• P Emailed S

• P Called Q

• Q Seen Near T

• P Called R

• R Seen Near T

• X Paid S

• A Eats At C
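The step of turning facts into nodes and connections can be sketched as follows. This is plain Python for illustration only. The mapping of the ambiguous "X" to Situation X (for "Seen Near") and Bank X (for "Banks At" and "Paid"), and of "Eats At" to Cafe C, is our reading of the slides and should be treated as an assumption.

```python
from collections import defaultdict

# Each fact becomes one labeled, directed edge: (subject, label, object).
facts = [
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Seen Near", "Situation X"),
    ("Combatant A", "Banks At", "Bank X"),
    ("Combatant A", "Eats At", "Cafe C"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Civilian S", "Seen Near", "Target T"),
    ("Bank X", "Paid", "Civilian S"),
]

# The vertices are every distinct name that appears in any fact.
vertices = sorted({name for subj, _, obj in facts for name in (subj, obj)})

# Outgoing edges per vertex, stored as (label, target) pairs.
out_edges = defaultdict(list)
for subj, label, obj in facts:
    out_edges[subj].append((label, obj))

print(len(vertices), "vertices,", len(facts), "edges")
```

Storing the label on the edge keeps the kind of relationship ("Called", "Paid") available to later queries, which is the information the earlier slides say content-focused analytics tends to ignore.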

Page 27: Dbta Webinar Realize Value of Big Data with graph  011713

...Representing the Graph..

VERTEX (“Nodes”) and EDGE (“Connections”)

[Diagram cardinality labels: 2, N]

Page 28: Dbta Webinar Realize Value of Big Data with graph  011713

...Representing the Graph..

[Graph diagram: VERTEX objects (“Nodes”) joined by labeled EDGE objects (“Connections”)]

Vertices: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Situation X, Situation Y, Bank X, Cafe C, Target T

Edge labels: Called, Emailed, Seen At, Seen Near, Banks At, Paid, Eats At

Page 29: Dbta Webinar Realize Value of Big Data with graph  011713

...Analyzing the Graph...

[Graph diagram]

Vertices: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Situation X, Situation Y, Bank X, Cafe C, Target T

Edge labels: Called, Emailed, Seen At, Seen Near, Banks At, Paid, Eats At

Page 30: Dbta Webinar Realize Value of Big Data with graph  011713

...Threat Analysis

[Graph diagram with areas labeled SUSPECTS and NEEDS PROTECTION]

Vertices: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Situation X, Situation Y, Bank X, Target T

Edge labels: Called, Emailed, Seen At, Seen Near, Banks At, Paid
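One way such an analysis might start is by finding every vertex that lies on some chain of connections from Combatant A to Target T. The sketch below does this in plain Python by intersecting forward and backward reachability. The slide does not say how its SUSPECTS group was derived, so this is an assumed approach, and the edge list is our reading of the diagram.

```python
from collections import defaultdict, deque

# Directed edges (tail, head) as read from the diagram; labels omitted.
edges = [
    ("Combatant A", "Civilian P"), ("Combatant A", "Situation Y"),
    ("Combatant A", "Situation X"), ("Combatant A", "Bank X"),
    ("Civilian P", "Civilian S"), ("Civilian P", "Civilian Q"),
    ("Civilian P", "Civilian R"), ("Civilian Q", "Target T"),
    ("Civilian R", "Target T"), ("Civilian S", "Target T"),
    ("Bank X", "Civilian S"),
]


def reachable(adjacency, start):
    """Breadth-first search: every vertex reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency[vertex]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


forward = defaultdict(list)   # follow edges tail -> head
backward = defaultdict(list)  # follow edges head -> tail
for tail, head in edges:
    forward[tail].append(head)
    backward[head].append(tail)

from_a = reachable(forward, "Combatant A")
to_t = reachable(backward, "Target T")

# Vertices on at least one directed chain from A to T, endpoints excluded.
between = (from_a & to_t) - {"Combatant A", "Target T"}
print(sorted(between))
```

Situation X and Situation Y drop out because nothing leads from them to the target; an analyst would still need to judge which of the remaining vertices are genuinely of interest.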

Page 31: Dbta Webinar Realize Value of Big Data with graph  011713

Visual Analytics

Presenter
Presentation Notes
Note Object Oriented Databases as NOSQL here.
Page 32: Dbta Webinar Realize Value of Big Data with graph  011713

Copyright © Objectivity, Inc. 2012

Recognizing Graphs In Object Models...

[Diagram panels, each drawn with Object Class A instances:]

• Tree Structures

• Graph (Network) Structures

• 1-to-Many Relationship Data

• Many-to-Many Relationship Data

Page 33: Dbta Webinar Realize Value of Big Data with graph  011713

Graph Processing Technologies and APIs

Distributed Graph Processing

• Angrapa, Apache Hama, Faunus, Giraph, GoldenOrb, HipG, InfiniteGraph, JPregel, KDT, OpenLink Virtuoso, Phoebus, Pregel, Sedge, Scala Signal/Collect, Trinity, Parallel Boost Graph Library (PBGL)...

APIs and Graph Programming/Query Languages

• Blueprints, Bulbflow, Cypher, Gremlin, Pacer, Pipes, PyBlueprints, Pygr, Rexster, SPARQL, SPASQL, Styx...

Graph Data Interchange Formats

• DGML, DOT Language, GraphML, GML, GXL, XGMML, Trivial Graph Format...

Page 34: Dbta Webinar Realize Value of Big Data with graph  011713

Graph Database Technologies

• In Memory, e.g. YarcData, Apache Hama...

• RDF stores – AllegroGraph, BigData, OpenLink Virtuoso, R2DF...

• Document relationships – ArangoDB, OrientDB...

• Single server or embedded graph DBMSs – DEX, Filament, Graphbase, HyperGraphDB, Neo4j, VertexDB...

• Layers over existing DBMSs – Horton, InfoGrid, OQGRAPH...

• Distributed Graph DBMSs – InfiniteGraph, Titan...

Page 35: Dbta Webinar Realize Value of Big Data with graph  011713

Graph Databases Post-2003


Page 36: Dbta Webinar Realize Value of Big Data with graph  011713

Graph Databases Compared [UNSW]

SUPPORT FOR ESSENTIAL GRAPH QUERIES

Page 37: Dbta Webinar Realize Value of Big Data with graph  011713

THE BIG DATA CONNECTION PLATFORM

Presenter
Presentation Notes
This section seems out of place.
Page 38: Dbta Webinar Realize Value of Big Data with graph  011713

Big Data Connection Platform

[Platform diagram labels:]

• Data Visualization & Analytics

• Conventional & Graph Analytics

• ORACLE or Other Big Data Solutions +

Footnotes: *Now HP, *Now IBM

Presenter
Presentation Notes
By having a scalable and distributed platform that can manage connections between all types of disparate data, enterprise can easily capitalize on the best tools for the job at hand.
Page 39: Dbta Webinar Realize Value of Big Data with graph  011713

InfiniteGraph - The Enterprise Graph Database

• A high performance distributed database engine that supports analyst-time decision support and actionable intelligence

• Cost effective link analysis – flexible deployment on commodity resources (hardware and OS).

• Efficient, scalable, risk averse technology – enterprise proven.

• High Speed parallel ingest to load graph data quickly.

• Parallel, distributed queries

• Flexible plugin architecture

• Complementary technology

• Fast proof of concept – easy to use Graph API.

Page 40: Dbta Webinar Realize Value of Big Data with graph  011713

Basic Capabilities Of Most Graph Databases

• Rapid Graph Traversal

• Inclusive or Exclusive Selection

• Find the Shortest or All Paths Between Objects

[Diagram labels: Start, Finish, X]
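Two of these capabilities, shortest-path search and exclusive selection, can be combined in a short sketch. The code below is plain Python and is not the query interface of any particular graph database. The function `shortest_path`, its `excluded` parameter and the sample vertices "A", "B" and "C" are assumptions made for the example.

```python
from collections import deque


def shortest_path(adjacency, start, finish, excluded=frozenset()):
    """Fewest-hop path from start to finish, skipping excluded vertices."""
    if start in excluded or finish in excluded:
        return None
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == finish:
            path = []
            while vertex is not None:  # walk back to the start
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in previous and neighbor not in excluded:
                previous[neighbor] = vertex
                queue.append(neighbor)
    return None


adjacency = {
    "Start": ["A", "B"],
    "A": ["Finish"],
    "B": ["C"],
    "C": ["Finish"],
}

direct = shortest_path(adjacency, "Start", "Finish")
detour = shortest_path(adjacency, "Start", "Finish", excluded={"A"})
```

With nothing excluded the search goes through "A"; excluding "A" forces the longer route through "B" and "C". An inclusive selection would work the same way with the test inverted, visiting only vertices in an allowed set.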

Page 41: Dbta Webinar Realize Value of Big Data with graph  011713

InfiniteGraph 3.0

• Parallel Load & Search

• Computational & Visualization Plugins

[Diagram labels: Start, Total Path Latency, Display Fastest Path]
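A "fastest path" computation of the kind named on this slide can be illustrated with Dijkstra's algorithm over edge latencies. This is a generic sketch in plain Python, not InfiniteGraph's plugin interface, which the slide does not describe. The function `fastest_path` and the latency figures are hypothetical.

```python
import heapq


def fastest_path(latency, start, finish):
    """Dijkstra's algorithm: return (total latency, path) or None."""
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        cost, vertex = heapq.heappop(heap)
        if vertex == finish:
            path = [vertex]
            while vertex in previous:  # walk back to the start
                vertex = previous[vertex]
                path.append(vertex)
            return cost, path[::-1]
        if cost > best.get(vertex, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        for neighbor, weight in latency.get(vertex, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = vertex
                heapq.heappush(heap, (new_cost, neighbor))
    return None


# Hypothetical per-edge latencies in arbitrary units.
latency = {
    "Start": {"A": 5, "B": 2},
    "A": {"Finish": 1},
    "B": {"A": 1, "Finish": 6},
}

total, path = fastest_path(latency, "Start", "Finish")
```

The path with the fewest hops (Start, A, Finish) costs 6, while the three-hop route through "B" and "A" costs 4, which is why a weighted search is needed when total path latency matters more than hop count.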

Page 42: Dbta Webinar Realize Value of Big Data with graph  011713

Summary - Graph Analytics

• Can Be Used For:

– Social Network Analysis

– Pattern finding in open source data

– Logistics

– Campaign planning

– Energy usage, planning and protection

• The technology works best if the graph is extracted from existing sources and stored in a Graph Database.

Page 43: Dbta Webinar Realize Value of Big Data with graph  011713

Thank You!

Please take a look at objectivity.com for InfiniteGraph Online Demos, White Papers, Free Downloads, Samples & Tutorials
