www.Objectivity.com
Realize The Value In Your Big Data With Graph Technology
Leon Guzenda - Objectivity, Inc. DBTA Webinar – January 17, 2013
Overview
• Who We Are
• Current Big Data Analytics
• Relationship Analytics
• Graph Technologies
• The Big Data Connection Platform
About Objectivity Inc.
• Objectivity, Inc. is headquartered in Sunnyvale, California.
• Established in 1988 to tackle database problems that network/hierarchical/relational and file-based technologies struggle with.
• Objectivity has over two decades of Big Data and NoSQL experience
• Develops NoSQL platforms for managing and discovering relationships and patterns in complex data:
– Objectivity/DB - an object database that manages localized, centralized or distributed databases
– InfiniteGraph - a massively scalable graph database built on Objectivity/DB that enables organizations to find, store and exploit the relationships in their data
Embedded in hundreds of enterprises, government organizations and products - millions of deployments.
Big Data Technologies Are Still Evolving
We All Know The Problem - Information Overload!
Volume, Velocity, Variety, Veracity, Value...
Making sense of it all takes time and $$$
Current “Big Data” Analytics
A Typical “Big Data” Analytics Setup
[Diagram, top to bottom:]
Data Aggregation and Analytics Applications
Data stores: RDBMS, Data W/H, Column Store, Hadoop, K-V Store, Doc DB, Object DB, Graph DB
Data types: Structured, Semi-Structured, Unstructured
Commodity Linux Platforms and/or High Performance Computing Clusters
Incremental Improvements Aren’t Enough
• All current solutions use the same basic architectural model
• None of the current solutions have a way to store connections between entities in different silos
• Most analytic technology focuses on the content of the data nodes, rather than the many kinds of connections between the nodes and the data in those connections
• Why? Because traditional and earlier NoSQL solutions are bad at handling relationships.
• Graph databases can efficiently store, manage and query the many kinds of relationships hidden in the data.
Not Only SQL – a group of 4 primary technologies
• Key-Value Stores
• “Big Table” Clones
• Document Databases
• Object and Graph databases
[Diagram: a scale running from Simple to Highly Interconnected data, with the labels Graph Database and Graph Processing]
Graph Theory Terminology...
VERTEX: A single node in a graph data structure
EDGE: A connection between a pair of VERTICES
PROPERTIES: Data items that belong to a particular Vertex or Edge
WEIGHT: A quantity associated with a particular Edge
GRAPH: A collection of linked Vertex and Edge objects
Example:
– Vertex 1: City: San Francisco, Pop: 812,826
– Vertex 2: City: San Jose, Pop: 967,487
– Edge 1: Road: I-101, Miles: 47.8
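The San Francisco to San Jose example above maps directly onto a small data structure. The following Python sketch is illustrative only: the class and field names (`Vertex`, `Edge`, `Graph`, `properties`, `weight`) are assumptions made for this example and are not the InfiniteGraph API. It treats the mileage as the Edge's WEIGHT.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """A single node; `properties` holds its data items."""
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A connection between a pair of vertices, carrying a label and a weight."""
    tail: Vertex
    head: Vertex
    label: str
    weight: float = 1.0


@dataclass
class Graph:
    """A collection of linked Vertex and Edge objects."""
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_edge(self, tail, head, label, weight=1.0):
        # Register both endpoints if they are new, then store the edge.
        for v in (tail, head):
            if v not in self.vertices:
                self.vertices.append(v)
        edge = Edge(tail, head, label, weight)
        self.edges.append(edge)
        return edge


# Values taken from the slide's example.
sf = Vertex("Vertex 1", {"City": "San Francisco", "Pop": 812_826})
sj = Vertex("Vertex 2", {"City": "San Jose", "Pop": 967_487})
roads = Graph()
edge1 = roads.add_edge(sf, sj, label="I-101", weight=47.8)
```

Keeping the weight on the Edge rather than on either Vertex is what lets a traversal later add up distances along a path.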
...Graph Theory Terminology...
SIMPLE/UNDIRECTED GRAPH: A Graph where each VERTEX may be linked to one or more Vertex objects via Edge objects and each Edge object is connected to exactly two Vertex objects. Furthermore, neither Vertex connected to an Edge is more significant than the other.
DIRECTED GRAPH: A Graph where one Vertex in a Vertex + Edge + Vertex group (an “Arc” or “Path”) can be considered the “Head” of the Path and the other can be considered the “Tail”.
MIXED GRAPH: A Graph in which some paths are Undirected and others are Directed.
...Graph Theory Terminology
LOOP: An Edge that is doubly-linked to the same Vertex
MULTIGRAPH: A Graph that allows multiple Edges and Loops
QUIVER: A Graph where Vertices are allowed to be connected by multiple Arcs. A Quiver may include Loops.
WEIGHTED GRAPH: A Graph where a quantity is assigned to an Edge, e.g. a Length assigned to an Edge representing a road between two Vertices representing cities.
HALF EDGE: An Edge that is only connected to a single Vertex
LOOSE EDGE: An Edge that isn't connected to any Vertices.
CONNECTIVITY: Two Vertices are Connected if it is possible to find a path between them.
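The CONNECTIVITY definition can be checked mechanically with a breadth-first search. This is a minimal sketch over an undirected graph held as a plain adjacency map; the function names and the numbered vertices are hypothetical and chosen only for illustration.

```python
from collections import deque


def undirected(edge_pairs):
    """Build an adjacency map in which every Edge can be followed both ways."""
    adjacency = {}
    for a, b in edge_pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def is_connected(adjacency, start, goal):
    """Return True if some path links `start` to `goal` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            return True
        for neighbor in adjacency.get(vertex, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


# Hypothetical vertices: 1-2-3 form one component, 4-5 another, and 6 has a Loop.
example = undirected([(1, 2), (2, 3), (4, 5), (6, 6)])
```

Here `is_connected(example, 1, 3)` holds because of the path through vertex 2, while vertices 1 and 4 are not Connected.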
Relationship Analytics
Example 1 – Social Network Analysis
Sources may be covert or open:
• Telecom Call Detail Records
• Banking transactions
• Flight and hotel reservations
• MASINT
• Twitter, Facebook, Google+, LinkedIn, Plaxo, Flickr, YouTube
Example 2 – Finding Patterns In Open Source Data...
The Challenges
• Data Volumes
• Fast-Changing Data
• Sensitivity of Data
• Significance of Data
...Example 2 – Finding Patterns In Open Source Data
Example 3 – Logistics
Example 4 - Cyber Security...
… Example 4 - Cyber Security
Link Hunter - POC For A Federal Police Force
Run the live demo at objectivity.com [Resources, Live Demos]
MAKING GRAPH ANALYTICS WORK EFFICIENTLY
Relationship (Connection) Analytics... A SQL Shortcoming
Think about the SQL query for finding all links between the two “blue” rows... it's hard!!
[Figure: Table_A, Table_B, Table_C, Table_D, Table_E, Table_F, Table_G]
There are some kinds of complex relationship handling problems that SQL wasn't designed for.
...Relationship (Connection) Analytics: A SQL Shortcoming
InfiniteGraph - The solution can be found with a few lines of code
[Figure: Table_A, Table_B, Table_C, Table_D, Table_E, Table_F, Table_G, with rows A3 and G4 labeled]
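To show why this query is short once the rows are treated as vertices, here is a minimal sketch in plain Python. It is not InfiniteGraph code, and the links between rows are hypothetical, since the slide does not list them; only the row names A3 and G4 come from the figure.

```python
def all_simple_paths(links, start, goal, path=None):
    """Yield every path from `start` to `goal` that visits no row twice."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in links.get(start, ()):
        if nxt not in path:  # skip rows already on this path to avoid cycles
            yield from all_simple_paths(links, nxt, goal, path)


# Hypothetical foreign-key style links between rows of Table_A .. Table_G.
links = {
    "A3": ["B1", "C7"],
    "B1": ["D2"],
    "C7": ["D2", "E5"],
    "D2": ["F9"],
    "E5": ["F9"],
    "F9": ["G4"],
}

paths = list(all_simple_paths(links, "A3", "G4"))
```

The equivalent SQL would need a separate join chain for each possible route through the tables, or a recursive query, whereas the traversal does not need to know in advance which tables lie between the two rows.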
Representing the Graph...
The existing data might look like this:
Events/Places and People/Orgs: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Bank X, Cafe C, Situation X, Situation Y, Target T
Facts:
• A Banks At X
• A Called P
• A Seen At Y
• A Seen Near X
• A Eats At C
• P Emailed S
• P Called Q
• P Called R
• Q Seen Near T
• R Seen Near T
• S Seen Near T
• X Paid S
...Representing the Graph...
We start by identifying the nodes (Vertices) and the connections (Edges)
NODES: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Bank X, Cafe C, Situation X, Situation Y, Target T
CONNECTIONS: Banks At, Called, Eats At, Emailed, Paid, Seen At, Seen Near
[Diagram: VERTEX and EDGE boxes with the multiplicities 2 and N]
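The step of identifying Vertices and Edges can be sketched as a pass over the facts, each read as a (tail, label, head) triple. This is an illustrative Python sketch, not the presenter's code. Two readings are assumptions: "Seen Near X" is taken to mean Situation X, while "Banks At X" and "X Paid S" are taken to mean Bank X, and "A Eats At" is taken to point at Cafe C.

```python
from collections import defaultdict

# (tail vertex, edge label, head vertex), transcribed from the slide's facts.
# Assumption: the letter X is disambiguated into Situation X and Bank X.
FACTS = [
    ("Combatant A", "Banks At", "Bank X"),
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Seen Near", "Situation X"),
    ("Combatant A", "Eats At", "Cafe C"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Civilian S", "Seen Near", "Target T"),
    ("Bank X", "Paid", "Civilian S"),
]

vertices = set()                    # every distinct node mentioned in a fact
edges_by_label = defaultdict(list)  # connection type -> list of (tail, head)
for tail, label, head in FACTS:
    vertices.update((tail, head))
    edges_by_label[label].append((tail, head))
```

Grouping the Edges by label keeps each kind of connection (Called, Paid, Seen Near and so on) separately queryable.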
...Representing the Graph...
[Figure: the facts drawn as a graph. Each VERTEX (“Node”) is a person, organization, event or place: Combatant A, Civilians P, Q, R and S, Bank X, Cafe C, Situations X and Y, Target T. Each EDGE (“Connection”) is labeled Called, Seen Near, Seen At, Emailed, Banks At, Paid or Eats At.]
...Analyzing the Graph...
[Figure: the same graph of Combatant A, Civilians P, Q, R and S, Bank X, Cafe C, Situations X and Y and Target T, with its Called, Seen Near, Seen At, Emailed, Banks At, Paid and Eats At edges]
...Threat Analysis
[Figure: the same graph, without Cafe C, with groups of vertices marked SUSPECTS and NEEDS PROTECTION]
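The slide does not state how the SUSPECTS and NEEDS PROTECTION groups were derived. As one plausible illustration only, the Python sketch below finds every vertex that lies on some directed chain of connections from Combatant A to Target T. The edge list repeats the facts from the earlier slides, reading "Seen Near X" as Situation X and "Banks At X" and "X Paid S" as Bank X, which is an assumption; the function name is hypothetical.

```python
def reachable(adjacency, start):
    """Return every vertex that can be reached from `start` (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        vertex = stack.pop()
        for nxt in adjacency.get(vertex, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# (tail, head) pairs; edge labels are omitted because only direction matters here.
EDGES = [
    ("Combatant A", "Bank X"), ("Combatant A", "Civilian P"),
    ("Combatant A", "Situation Y"), ("Combatant A", "Situation X"),
    ("Civilian P", "Civilian S"), ("Civilian P", "Civilian Q"),
    ("Civilian P", "Civilian R"), ("Civilian Q", "Target T"),
    ("Civilian R", "Target T"), ("Civilian S", "Target T"),
    ("Bank X", "Civilian S"),
]

forward, backward = {}, {}
for tail, head in EDGES:
    forward.setdefault(tail, set()).add(head)
    backward.setdefault(head, set()).add(tail)

# A vertex is on an A-to-T chain if A can reach it and it can reach T.
on_a_path = reachable(forward, "Combatant A") & reachable(backward, "Target T")
```

With these edges the result is Civilians P, Q, R and S plus Bank X; Situations X and Y drop out because nothing leads from them to Target T.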
Visual Analytics
Copyright © Objectivity, Inc. 2012
Recognizing Graphs In Object Models...
• Tree Structures
• Graph (Network) Structures
• Relationship Data
[Diagrams: 1-to-Many Relationship Data and Many-to-Many Relationship Data, each drawn between Object Class A objects]
Graph Processing Technologies and APIs
Distributed Graph Processing
• Angrapa, Apache Hama, Faunus, Giraph, GoldenOrb, HipG, InfiniteGraph, JPregel, KDT, OpenLink Virtuoso, Phoebus, Pregel, Sedge, Scala Signal/Collect, Trinity, Parallel Boost Graph Library (PBGL)...
APIs and Graph Programming/Query Languages
• Blueprints, Bulbflow, Cypher, Gremlin, Pacer, Pipes, PYBlueprints, Pygr, Rexster, SPARQL, SPASQL, Styx...
Graph Data Interchange Formats
• DGML, Dot Language, GraphML, GML, GXL, XGMML, Trivial Graph Format...
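Of the interchange formats listed, Trivial Graph Format is simple enough to show in full: one line per node (id and label), a line containing only "#", then one line per edge (source id, target id, optional label). The following Python sketch writes that layout; the function name `to_tgf` and the sample data are assumptions for illustration.

```python
def to_tgf(nodes, edges):
    """Serialize a graph as Trivial Graph Format text.

    nodes: mapping of node id -> label
    edges: iterable of (source id, target id, label)
    """
    lines = [f"{node_id} {label}" for node_id, label in nodes.items()]
    lines.append("#")  # separator between the node list and the edge list
    lines += [f"{src} {dst} {label}" for src, dst, label in edges]
    return "\n".join(lines)


# Sample data reusing the two-city example from the terminology slides.
nodes = {1: "San Francisco", 2: "San Jose"}
edges = [(1, 2, "I-101")]
tgf_text = to_tgf(nodes, edges)
```

Formats such as GraphML or GXL carry typed properties as well, at the cost of an XML schema; TGF trades that away for a format that can be written or read in a few lines.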
Graph Database Technologies
• In Memory, e.g. YarcData, Apache Hama...
• RDF stores – AllegroGraph, BigData, OpenLink Virtuoso, R2DF...
• Document relationships – ArangoDB, OrientDB...
• Single server or embedded graph DBMSs – DEX, Filament, Graphbase, HyperGraphDB, Neo4j, VertexDB...
• Layers over existing DBMSs – Horton, Infogrid, OQGraph...
• Distributed Graph DBMSs – InfiniteGraph, Titan...
Graph Databases Post-2003
Graph Databases Compared [UNSW]
SUPPORT FOR ESSENTIAL GRAPH QUERIES
THE BIG DATA CONNECTION PLATFORM
[Diagram labels: Data Visualization & Analytics; Big Data Connection Platform; Conventional & Graph Analytics; ORACLE or Other Big Data Solutions; footnotes “*Now HP” and “*Now IBM”]
InfiniteGraph - The Enterprise Graph Database
• A high performance distributed database engine that supports analyst-time decision support and actionable intelligence
• Cost effective link analysis – flexible deployment on commodity resources (hardware and OS).
• Efficient, scalable, risk averse technology – enterprise proven.
• High Speed parallel ingest to load graph data quickly.
• Parallel, distributed queries
• Flexible plugin architecture
• Complementary technology
• Fast proof of concept – easy to use Graph API.
Basic Capabilities Of Most Graph Databases
• Rapid Graph Traversal
• Inclusive or Exclusive Selection
• Find the Shortest or All Paths Between Objects
[Figures: example traversals between Start and Finish vertices, with excluded vertices marked X]
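Two of these capabilities can be combined in one short traversal. The Python sketch below finds a path with the fewest edges using breadth-first search and supports an exclusion set, which is one reading of "Exclusive Selection" (the slide does not define the term). It is a generic illustration, not the API of any particular graph database, and the small graph is hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, finish, exclude=frozenset()):
    """Return a fewest-edges path from start to finish, or None if there is none.

    Vertices listed in `exclude` are never entered during the traversal.
    """
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == finish:
            path = []
            while vertex is not None:  # walk back to the start
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for nxt in adjacency.get(vertex, ()):
            if nxt not in previous and nxt not in exclude:
                previous[nxt] = vertex
                queue.append(nxt)
    return None


# Hypothetical graph with a two-hop route and a three-hop route.
routes = {
    "Start": ["a", "b"],
    "a": ["Finish"],
    "b": ["c"],
    "c": ["Finish"],
}
direct = shortest_path(routes, "Start", "Finish")
detour = shortest_path(routes, "Start", "Finish", exclude={"a"})
```

Excluding vertex "a" forces the longer route through "b" and "c"; excluding both "a" and "c" leaves no path at all.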
InfiniteGraph 3.0
• PARALLEL LOAD & SEARCH
• Computational & Visualization Plugins
[Figure labels: Start; Total Path Latency; Display Fastest Path]
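The "Total Path Latency" and "Display Fastest Path" labels suggest a computation over weighted edges. The slide does not show how the plugin works, so the following is only a generic sketch of the idea using Dijkstra's algorithm in Python, which assumes non-negative latencies. The function name, vertex names and latency values are all hypothetical.

```python
import heapq


def fastest_path(latency, start, finish):
    """Dijkstra's algorithm: return (total latency, path), or (inf, []) if unreachable."""
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        cost, vertex = heapq.heappop(heap)
        if vertex == finish:
            path = [finish]
            while path[-1] != start:  # walk back along the recorded predecessors
                path.append(previous[path[-1]])
            return cost, path[::-1]
        if cost > best.get(vertex, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for nxt, delay in latency.get(vertex, {}).items():
            new_cost = cost + delay
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                previous[nxt] = vertex
                heapq.heappush(heap, (new_cost, nxt))
    return float("inf"), []


# Hypothetical link latencies in milliseconds.
latency = {
    "Start": {"r1": 5, "r2": 2},
    "r1": {"Finish": 1},
    "r2": {"r1": 1, "Finish": 9},
}
total, path = fastest_path(latency, "Start", "Finish")
```

The fastest route here takes three hops rather than two, which is the kind of result a fewest-edges search would miss.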
Summary - Graph Analytics
• Can Be Used For:
– Social Network Analysis
– Pattern finding in open source data
– Logistics
– Campaign planning
– Energy usage, planning and protection
• The technology works best if the graph is extracted from existing sources and stored in a Graph Database.
Thank You!
Please take a look at objectivity.com for InfiniteGraph Online Demos, White Papers, Free Downloads, Samples & Tutorials