Augment Your Analytics Ecosystem Through Scalable Graph Analytics

Kiran Narsu, YarcData



Page 2

What is Graph Analytics?

• Models complex networks of data in a graph representation of nodes and edges
• The nodes represent entities of interest and the edges represent the relationships between entities
• The analysis of nodes and edges provides information on relationships in the data

Page 3

What is a Graph?

• Not a chart
• A graph is a fundamental data structure
• A collection of vertices (nodes) and edges (links, relationships, connections)
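As a minimal sketch of the definition above, a graph can be held as a set of vertices plus a list of labeled edges, indexed by source vertex for lookup. The names `vertices`, `edges` and `build_adjacency`, and the sample entities, are illustrative assumptions and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical sample data: vertices are entity names,
# edges are (source, label, target) tuples.
vertices = {"Alice", "Bob", "Acme"}
edges = [
    ("Alice", "knows", "Bob"),
    ("Alice", "works_for", "Acme"),
    ("Bob", "works_for", "Acme"),
]


def build_adjacency(edge_list):
    """Index edges by source vertex so outgoing links can be looked up directly."""
    adjacency = defaultdict(list)
    for source, label, target in edge_list:
        adjacency[source].append((label, target))
    return adjacency


adjacency = build_adjacency(edges)
print(adjacency["Alice"])
```

An adjacency index like this is one common in-memory layout; other layouts (edge lists, matrices) trade lookup speed against memory differently.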

Page 4

Graphs are Everywhere

Page 5

Simple Graph

1234 --First Name--> John

RDF Triple

Subject | Predicate | Object
1234 | First Name | John
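The triple on this slide can be sketched as a small typed record. This is an illustration only: the `Triple` class and its field names are assumptions, and plain strings stand in for the IRIs and literals that the W3C RDF model actually uses.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement (simplified: plain strings only)."""
    subject: str
    predicate: str
    obj: str  # named "obj" because "object" is a Python built-in


# The example from the slide: customer 1234 has the first name John.
fact = Triple(subject="1234", predicate="First Name", obj="John")
print(f"{fact.subject} --{fact.predicate}--> {fact.obj}")
```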

Page 6

Graph vs. Relational - Example

TRADITIONAL

Customer
Cust ID | Entity Type | Tax ID
1234 | Person | 999-99-9999

Account Position Mapping
AcctId | Instrument | Quantity | Instrument Type
AC567 | IBM | 1000 | Equity
AC567 | USIBM_OPT | 100 | Equity Option

CustomerAccount
Cust ID | Account ID
1234 | AC567

Account Master
Account ID | Account Type
AC567 | Trading Account

Person
Cust ID | Last Name | First Name
1234 | Smith | John

GRAPH

RDF Triples
1234,First Name,John
1234,Last Name,Smith
1234,Entity Type,Person
1234,Tax ID,999-99-9999
1234,Account ID,AC567
AC567,Account Type,Trading Account
AC567,Instrument,IBM
AC567,Instrument,USIBM_OPT
IBM,Quantity,1000
USIBM_OPT,Quantity,100
IBM,Instrument Type,Equity
USIBM_OPT,Instrument Type,Equity Option
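One way to picture the mapping from tables to triples is to treat each row's key as the subject and each remaining column as a predicate. The sketch below assumes that simple rule; the helper `rows_to_triples` is hypothetical, and the slide's handling of the Account Position Mapping table (where Quantity hangs off the instrument) would need extra logic that is not shown here.

```python
def rows_to_triples(rows, key_column):
    """Emit one (subject, predicate, object) triple per non-key column of each row."""
    triples = []
    for row in rows:
        subject = row[key_column]
        for column, value in row.items():
            if column != key_column:
                triples.append((subject, column, value))
    return triples


# Rows copied from three of the relational tables on the slide.
person = [{"Cust ID": "1234", "Last Name": "Smith", "First Name": "John"}]
customer = [{"Cust ID": "1234", "Entity Type": "Person", "Tax ID": "999-99-9999"}]
customer_account = [{"Cust ID": "1234", "Account ID": "AC567"}]

triples = (
    rows_to_triples(person, "Cust ID")
    + rows_to_triples(customer, "Cust ID")
    + rows_to_triples(customer_account, "Cust ID")
)
for triple in triples:
    print(",".join(triple))
```

Three separate tables collapse into one flat list of facts about subject 1234, which is the point the slide makes about graph representation.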

Page 7

Data Representation in Graph Format

RDF Triples
1234,First Name,John
1234,Last Name,Smith
1234,Entity Type,Person
1234,Tax ID,999-99-9999
1234,Account ID,AC567
AC567,Account Type,Trading Account
AC567,Instrument,IBM
AC567,Instrument,USIBM_OPT
IBM,Quantity,1000
USIBM_OPT,Quantity,100
IBM,Instrument Type,Equity
USIBM_OPT,Instrument Type,Equity Option

[Figure: the triples drawn as a graph. Node labels: 1234, John, Smith, Person, 999-99-9999, AC567, Trading Account, IBM, USIBM_OPT, 1000, 100, Equity, Equity Option, Option, Financial Instrument. Edge labels include First Name, Quantity and Type Of.]

Equity,Type Of,Financial Instrument
Equity Option,Type Of,Option
Option,Type Of,Financial Instrument
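The "Type Of" triples let a query climb the type hierarchy without any schema change. A minimal sketch of that inference follows, assuming the type name is spelled "Equity Option" in every triple; the functions `ancestors` and `is_a` are illustrative names, not part of any product described in this deck.

```python
triples = [
    ("IBM", "Instrument Type", "Equity"),
    ("USIBM_OPT", "Instrument Type", "Equity Option"),
    ("Equity", "Type Of", "Financial Instrument"),
    ("Equity Option", "Type Of", "Option"),
    ("Option", "Type Of", "Financial Instrument"),
]


def ancestors(type_name, triples):
    """Collect every type reachable from type_name by following "Type Of" edges."""
    parents = {}
    for subject, predicate, obj in triples:
        if predicate == "Type Of":
            parents.setdefault(subject, []).append(obj)
    found, stack = set(), [type_name]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_a(instrument, type_name, triples):
    """True if the instrument's declared type is type_name or a descendant of it."""
    direct = {o for s, p, o in triples if s == instrument and p == "Instrument Type"}
    return any(t == type_name or type_name in ancestors(t, triples) for t in direct)


print(is_a("USIBM_OPT", "Financial Instrument", triples))
```

Here `USIBM_OPT` is never declared a Financial Instrument directly; the answer follows from Equity Option being a type of Option, which is a type of Financial Instrument.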

Page 8

Emerging Questions Demand Emerging Approaches

Question | Graph Technique | Challenge
What is the shortest non-obvious path connecting two entities? | Path Analysis | Pre-loading all paths to analyze connections is difficult
Who are the central players in a given fraud event? | Betweenness Centrality | Fixed relational models inhibit finding entities who are central
What clusters or communities exist in a population? | Community Detection, Clustering | Finding communities without “bias” and then discovering attributes is technically difficult

Graph analytics can enable you to:

Connect widely disparate data, load it all in one place, and discover connections in the data, without knowing the questions in advance
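To make the path-analysis row concrete, here is a minimal breadth-first search sketch over a list of triples. It finds a shortest path by hop count and ignores edge direction; it does not attempt the "non-obvious" filtering the slide mentions. The function name `shortest_path` and the customer, address and phone identifiers are hypothetical.

```python
from collections import defaultdict, deque


def shortest_path(triples, start, goal):
    """Breadth-first search over the triples, treating every edge as undirected."""
    neighbors = defaultdict(set)
    for subject, _predicate, obj in triples:
        neighbors[subject].add(obj)
        neighbors[obj].add(subject)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no connection between start and goal


# Hypothetical data: C1 and C2 share an address, C2 and C3 share a phone number.
sample = [
    ("C1", "Address", "A1"),
    ("C2", "Address", "A1"),
    ("C2", "Phone", "P1"),
    ("C3", "Phone", "P1"),
]
print(shortest_path(sample, "C1", "C3"))
```

No path table is pre-loaded: the connection between C1 and C3 is discovered at query time by walking the shared address and phone nodes.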

Page 9

Graphs – The Ideal Structure for “Discovery”

Dynamic Data Sources
• Simple data model
• Support for multiple data types
• Schema information and data mix harmoniously
• Augment data and definitions

Increasing Volumes
• Low redundancy
• Compact data format
• Easy to add data “on-the-fly”

Greater Flexibility
• No fixed schema - no constraints on queries
• Relationships not hidden – true discovery
• Support for unique analytic techniques such as clustering, community detection, path analysis, etc.

[Figure labels: Business Questions, Data Structure]
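The "no fixed schema" and "add data on-the-fly" points can be illustrated with a toy in-memory triple store in which any position of a query pattern may be left open. This is a sketch under stated assumptions: the `TripleStore` class is hypothetical, the Email fact is invented sample data, and a production graph engine would index triples rather than scan them.

```python
class TripleStore:
    """Toy schema-less store: every fact is just a (subject, predicate, object) tuple."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("1234", "First Name", "John")
store.add("1234", "Account ID", "AC567")
# A new kind of fact needs no schema change: add a triple with a new predicate.
store.add("1234", "Email", "john@example.com")

print(store.match(subject="1234"))
print(store.match(predicate="Account ID"))
```

The same `match` call answers "everything about 1234" and "who holds any account" without either question having been anticipated when the data was loaded.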

Page 10

Graphs Are Ideal for Interactive, Iterative “Discovery”

• Graphs allow you to define and redefine your analysis as you go along
• Graphs allow you to explore connections and relationships
• Graphs can flexibly handle new and different data types and volumes

While great for discovery, graphs pose challenges for traditional approaches

• But: Graphs are hard to Partition. Unpredictable & extremely slow to follow relationships
• But: Graphs are not Predictable. High cost to follow multiple competing paths
• But: Graphs are highly Dynamic. High cost to load multiple, constantly changing datasets

Page 11

Graph Analytics with Urika

Supercomputing heritage applied to the largest Big Data challenges

Key Graph Requirements:

• Predictable, interactive performance on largest volumes of diverse data
• Flexibility to add new data sources rapidly, in hours as opposed to months
• Ability to analyze the “whole graph” without having to break it up across clusters
• Leverage and extend IT skill sets and use standards-based approaches
• Maximize portability of analytics

The Urika Response

Use ALL Your Data – No Subsets or Partitioning
• Large Shared Memory Architecture
• Up to 512 Terabytes of RAM

Get Answers in Seconds - Not Days or Weeks
• Thousands of Massively Multi-Threaded Processors
• 128 Threads/Processor

Load New Data in Minutes - Not Weeks
• Scalable I/O – Load data at up to 350TB per hour

Readily Deployable – No Proprietary Skills
• Easy to Use
• Open Standard Interface
• Linux and W3C

Page 12

Urika and Existing Analytic Environments

Hadoop Clusters

Page 13

Use Cases

Page 14

Customer Insight – Relationship Discovery

Goal: Identify new cross-sell and upsell opportunities through discovery of communities, networks or clusters

Data sets: Customer data, customer transaction data, portfolio, website traffic, positions, balances, demographics

Technical Challenges: Speed up ability to put facts together and identify hidden clusters, communities or affinities

Users: Product managers, business analysts

Usage model: Iteratively identify communities or clusters where there is an affinity which can be exploited by Marketing

Augmenting: Existing data warehouses, analytical tools

Page 15

Identify Unknown or Emerging Cyber Threats

Goal: Proactively identify unknown cyber threats by examining all relationships

Data sets: IP, MAC, BGP, Firewall, DNS, Netflow, Whois, NVD, CIDR…

Technical Challenges: Volume and Velocity of data; Temporal dependencies; Real-time response

Users: Cyber Analysts

Usage model: Iterative analysis of all patterns across all traffic to explore deviations in frequency of occurrence, derivative patterns of known threats and linking patterns through relationships in offline data

Augmenting: Existing data appliances

Page 16

Concluding Thoughts – Key Advantages of Graph Analytics

Business Advantages of Graph Analytics

• Interactively answer your most complex questions & discover new threats, breaches or revenue opportunities
• Assess the impact of new data on your analysis interactively, not after a 3-6 month data onboarding process
• Ask questions you’ve not thought of yet, against all your relevant data, and rapidly gain new insights
• Get up and running in weeks, while leveraging 100% of existing internal IT and business skills
• Leverage your investment and bring scale to any business problem
• Add a powerful NEW capability to your ecosystem, and improve effectiveness of existing infrastructure

Page 17

Who is YarcData?

A new division within Cray

100% focused on Big Data solutions

Rapidly-growing, multi-billion market

Experienced management team with deep enterprise roots

YarcData product proven at largest Gov’t/Intel clients

Page 18

Cray’s Vision: The Fusion of Supercomputing and Big & Fast Data

Modeling The World

Cray Supercomputers solving “grand challenges” in science, engineering and analytics

[Figure labels: Advanced Analytic Appliances, Storage & Data Management, Supercomputers]

• Data Models: Integration of datasets and math models for search, analysis, predictive modeling and knowledge discovery
• Math Models: Modeling and simulation augmented with data to provide the highest fidelity virtual reality results
• Data-Intensive Processing: High throughput event processing & data capture from sensors, data feeds and instruments

Page 19

One Way to Segment the “Big & Fast Data” Market…

Big Data Solutions

• Data Warehouses + Extensions (Oracle, Teradata, Greenplum, DB2)
• NoSQL Databases (MongoDB, CouchBase, DynamoDB, AsterData)
• Hadoop / MapReduce (Cloudera, HortonWorks, MapR, Intel)
• Graph Analytics (Neo4j, AllegroGraph, Objectivity, Virtuoso)

These solutions can compete, but also can be very complementary as each has strengths & weaknesses

Page 20

Cray Brings Supercomputing to Analytics

[Figure labels: Big Data, Fast Data, Enterprise Data (structured), SAN Interconnects, GRID, CLOUD, LAN/WAN interconnects, Ethernet Clusters, Distributed Memory, Global Memory. Systems shown: uRiKA (In-memory Graph Analytics), XC30 (MPP Global Memory), CS300 (Cluster Supercomputer & Hadoop).]

Page 21

It is Really About Decision Making through Fact Finding and Equation Solving

Key Function | Language | Data Approach | “Airline” Example
OLTP | Declarative (SQL) | Structured (relational) | ATM transactions; Buying a seat on an airplane
OLAP Ad Hoc | Declarative (SQL+UDF) or NoSQL | Structured (relational) | Business Intelligence analysis of bookings for new ad placements or discounting policy
Semantic Ad hoc | Declarative (SPARQL) | Linked, Open (graph-based) | Analyze social graphs and infer who might travel where
API for analysis | Procedural (MapReduce) | Unstructured (Hadoop files) | Application Framework for large scale weblog analysis
Data Assimilation | Procedural (C++, Fortran) | Data merged with simulations | Sensor data incorporated into the computer simulation
Optimize Models | Procedural (Solver Libs) | Optimization <-> Simulation | Complex Scheduling; Estimating empty seats
Simulate Models | Procedural (Fortran, C++) | Matrix Math (Systems of Eq’s) | Mathematical Modeling and simulation (design airplane)

[Figure labels: Analyst Query; Languages & Tools for Programmers]

Page 22

Thank You