GRAPH ANALYTICS AND MACHINE LEARNING

STANLEY WANG
SOLUTION ARCHITECT, TECH LEAD
@SWANG68 http://www.linkedin.com/in/stanley-wang-a2b143b

Mathematics on Graph

• An abstract representation of a set of entities where some pairs are connected by links;

Entity (Vertex, Node)

Link (Edge, Relationship)
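As a minimal sketch, assuming undirected and unweighted links, a set of entities and the links between them can be stored as an adjacency list that maps each entity to its neighbors. The entity names and the `build_graph` helper are hypothetical.

```python
from collections import defaultdict


def build_graph(links):
    """Build an undirected adjacency list (entity -> set of neighbors) from (u, v) links."""
    adj = defaultdict(set)
    for u, v in links:
        adj[u].add(v)  # each link is recorded in both directions
        adj[v].add(u)
    return dict(adj)


links = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]
graph = build_graph(links)
print(sorted(graph["carol"]))  # ['alice', 'bob', 'dave']
```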

What is Graph?

Constructing a Graph

Graph Affinity Matrix
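The slide's own construction did not survive extraction. As a minimal sketch, assuming the common Gaussian (RBF) kernel, an affinity matrix over data points x_i can be built as W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) with a zero diagonal. The function name `gaussian_affinity` and the bandwidth `sigma` are illustrative choices, not taken from the slides.

```python
import math


def gaussian_affinity(points, sigma=1.0):
    """Dense affinity matrix: W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), W[i][i] = 0."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))  # nearby points -> weight near 1
    return W


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
W = gaussian_affinity(points, sigma=1.0)
```

Nearby points receive weights close to 1 and distant points weights close to 0; `sigma` controls how quickly affinity decays with distance.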

Graph Laplacian Matrix
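Assuming the unnormalized definition L = D - W, where D is the diagonal degree matrix with D_ii = sum_j W_ij (normalized variants also exist), a plain-Python sketch for a symmetric weight matrix with zero diagonal:

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(W)
    degrees = [sum(row) for row in W]  # D_ii = weighted degree of node i
    return [[(float(degrees[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


# Path graph 0 - 1 - 2 with unit weights
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L = laplacian(W)
# L equals [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
```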

Update Function on Graph

Magical Properties of the Laplacian Matrix
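Two standard properties of the unnormalized Laplacian L = D - W can be checked numerically: every row sums to zero (so the constant vector is an eigenvector with eigenvalue 0), and f^T L f = 1/2 * sum_ij w_ij (f_i - f_j)^2 >= 0 for any vector f, so L is positive semidefinite. A further standard fact, not checked here, is that the multiplicity of the eigenvalue 0 equals the number of connected components. The sketch assumes a symmetric, nonnegative W with zero diagonal; the example weights and vector are arbitrary.

```python
def laplacian(W):
    """Unnormalized Laplacian L = D - W."""
    n = len(W)
    deg = [sum(row) for row in W]
    return [[(deg[i] if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def quadratic_form(L, f):
    """Compute f^T L f."""
    n = len(f)
    return sum(f[i] * L[i][j] * f[j] for i in range(n) for j in range(n))


def smoothness(W, f):
    """1/2 * sum_ij w_ij (f_i - f_j)^2: small when strongly linked nodes get similar values."""
    n = len(f)
    return 0.5 * sum(W[i][j] * (f[i] - f[j]) ** 2 for i in range(n) for j in range(n))


W = [[0, 2, 1],
     [2, 0, 0],
     [1, 0, 0]]
L = laplacian(W)
f = [1.0, -2.0, 3.0]

row_sums = [sum(row) for row in L]  # all zero: L times the constant vector is 0
qf, sm = quadratic_form(L, f), smoothness(W, f)
print(row_sums)  # [0.0, 0.0, 0.0]
print(qf, sm)    # 22.0 22.0
```

The identity f^T L f = 1/2 * sum_ij w_ij (f_i - f_j)^2 is what makes the Laplacian useful for smoothing and spectral clustering: minimizing it favors assignments where strongly linked nodes receive similar values.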

What is a Graph Database?

• A Database with an Explicit Graph Structure;

• Each Node Knows its Adjacent Nodes;

• As the Number of Nodes Increases, the Cost of a Local Step (One Hop) Remains the Same, i.e., O(1) in the Number of Nodes;

• An Index for Lookups;
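The bullets above can be illustrated with a toy in-memory store, under the assumption that each node holds direct references to its neighbors (often called index-free adjacency), while a separate index is used only to find the starting node. `Node` and `TinyGraphStore` are hypothetical names, not the API of any real graph database.

```python
class Node:
    """A stored node that keeps direct references to its adjacent nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct pointers to adjacent Node objects


class TinyGraphStore:
    """Hypothetical toy store: an index finds the start node, traversal follows pointers."""

    def __init__(self):
        self.index = {}  # name -> Node; used only to look up entry points

    def add_node(self, name):
        if name not in self.index:
            self.index[name] = Node(name)
        return self.index[name]

    def add_edge(self, a, b):
        node_a, node_b = self.add_node(a), self.add_node(b)
        node_a.neighbors.append(node_b)
        node_b.neighbors.append(node_a)

    def neighbors_of(self, name):
        start = self.index[name]  # one index lookup to find the entry point
        # The local step reads only this node's own neighbor list; its cost does not
        # depend on how many other nodes the store holds.
        return [n.name for n in start.neighbors]


store = TinyGraphStore()
store.add_edge("alice", "bob")
store.add_edge("alice", "carol")
print(store.neighbors_of("alice"))  # ['bob', 'carol']
```

The cost of a hop here depends on the degree of the current node, not on the total number of nodes, which is the property the slide attributes to graph databases.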

Relational Model vs Graph Model

Relational Model: Optimized for Aggregation

Graph Model: Optimized for Connections

SQL vs NoSQL

[Figure: Complexity vs. Size for Key-Value Store, Big Table Column Family, Document Databases, Graph Databases, and Relational Databases (RDBMS), with a region labeled "90% of Use Cases"]

Performance Comparison

Why Graph Databases?

[Figure: data models arranged by Value in Relationships, from Low to High: Key-Value (K V), BigTable (K V V V V), Document, Relational, Graph]

NoSQL and Big Data

• Traditional databases handle big data sets, too, but mostly structured data;

• NoSQL databases have poor analytics support;

• HDFS and MapReduce often work from text files;

• NoSQL is geared more toward high throughput; in CAP theorem terms, it is basically AP (availability and partition tolerance) rather than CP (consistency and partition tolerance);

• In practice, Big Data is likely to be a mix of text files, NoSQL, and SQL RDBMS;

Graph Terminology

• Graph Computation (Analytics):

o The whole graph is processed, typically over several iterations of vertex-centric computation.

o Examples: Belief Propagation, PageRank, Community Detection, Triangle Counting, Matrix Factorization, Machine Learning…

• Graph Database (Queries):

o Selective graph queries (comparable to SQL queries)

o Traversals: shortest-path, friends-of-friends,…
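Traversal queries such as friends-of-friends can be sketched as a breadth-first search capped at a hop limit; on an unweighted graph the BFS distances are also shortest-path lengths. The function names and the sample graph are hypothetical.

```python
from collections import deque


def within_hops(adj, start, max_hops):
    """BFS from start; returns {node: hop distance} for nodes within max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand past the hop limit
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def friends_of_friends(adj, person):
    """People exactly two hops away: neither the person nor a direct friend."""
    return sorted(n for n, d in within_hops(adj, person, 2).items() if d == 2)


adj = {
    "ann": ["bob", "cid"], "bob": ["ann", "dan"], "cid": ["ann", "eve"],
    "dan": ["bob", "fay"], "eve": ["cid"], "fay": ["dan"],
}
print(friends_of_friends(adj, "ann"))  # ['dan', 'eve']
```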

GRAPH ANALYTICS

What Can Graphs Model?

Graphs are Essential to ML

• Identify influential people and information;

• Discover communities;

• Understand people’s interests in common;

• Model complex real life data dependencies;

It’s all about GRAPH: The Value of Data is Proportional to the Number of Meaningful Relationships!

Complex Big Data Graph ML Algorithms

Graph Social Network Model

The model can readily be used in real-life applications such as customer classification, profiling, segmentation, and product recommendations.

Identifying Key People

Social Network Tie Recommendation

Full Stack Graph ML Algorithms

Typical Graph Analytics

Graph Analytics - PageRank

• PageRank measures the importance of nodes in a graph through link analysis. It is defined as the probability of landing on a node, which depends on:

o the probability of landing on one of the node’s neighbors;

o the probability of crossing the link from that neighbor to the node;

• Application: identify influential leaders;
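A minimal power-iteration sketch of PR(v) = (1 - d)/N + d * sum over links u -> v of PR(u)/outdeg(u). Here PR(u) plays the role of the probability of landing on neighbor u, and 1/outdeg(u) the probability of crossing the link u -> v. The damping factor d = 0.85 and the uniform redistribution of rank from dangling nodes (no out-links) are common conventions assumed here, not stated on the slide.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank over a dict: node -> list of nodes it links to."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)  # chance of crossing each link out of u
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # c
```

Node "c" collects links from three other nodes and ends up with the highest rank, while "d", which nobody links to, keeps only the baseline (1 - d)/N.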

Graph Analytics - Triangle Count

• Clustering coefficient (CC) is a measure of the degree to which nodes in a graph tend to cluster together;

• Computing the CC of a node reduces to counting the number of triangles around that node;

• CC indicates the degree to which a node’s neighbors are themselves neighbors;

• The CC of a graph is closely related to its transitivity;
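For an undirected graph, the local clustering coefficient of a node v with degree k is C(v) = 2T(v) / (k(k - 1)), where T(v) is the number of triangles through v, i.e., the number of linked pairs among v's neighbors. A plain-Python sketch; the function names and sample graph are hypothetical.

```python
from itertools import combinations


def triangles_at(adj, v):
    """Triangles through v = number of linked pairs among v's neighbors."""
    return sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])


def local_clustering(adj, v):
    """C(v) = 2 T(v) / (k (k - 1)); taken as 0.0 when the degree k is below 2."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    return 2.0 * triangles_at(adj, v) / (k * (k - 1))


adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
total_triangles = sum(triangles_at(adj, v) for v in adj) // 3  # each triangle is seen at 3 nodes
print(total_triangles, local_clustering(adj, 1))  # 1 0.3333333333333333
```

Node 1 has three neighbors, but only one of the three possible neighbor pairs (2, 3) is linked, so its coefficient is 1/3; node 2's two neighbors are linked to each other, giving 1.0.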

Graph Analytics - Connected Components

• A connected component is a maximal subgraph in which any two vertices are connected by a path; no additional vertices of the supergraph are connected to it;

• A directed graph is strongly connected if every vertex is reachable from every other vertex. The strongly connected components partition the graph into subgraphs that are themselves strongly connected;

• A spanning tree is a subgraph of the original graph that is a tree and connects all the vertices that were originally connected;

• A minimum spanning tree (MST) is a spanning tree such that the sum of the weights of its edges is not greater than the sum of the edge weights of any other spanning tree;
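Connected components of an undirected graph and minimum spanning trees can both be sketched with a union-find (disjoint-set) structure. The MST part follows Kruskal's algorithm, one standard choice among several (Prim's is another); on a disconnected graph it returns a minimum spanning forest, one tree per component. Strongly connected components of directed graphs need a different algorithm (e.g., Tarjan's) and are not covered. Names and data are illustrative.

```python
class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.size = {x: 1 for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets containing a and b; return False if they were already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def connected_components(nodes, edges):
    """Group the nodes of an undirected graph into connected components."""
    ds = DisjointSet(nodes)
    for u, v in edges:
        ds.union(u, v)
    groups = {}
    for x in nodes:
        groups.setdefault(ds.find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())


def kruskal_mst(nodes, weighted_edges):
    """Kruskal: take edges by increasing weight, keeping those that join two different trees."""
    ds = DisjointSet(nodes)
    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        if ds.union(u, v):
            tree.append((u, v, w))
    return tree


nodes = ["a", "b", "c", "d", "e"]
weighted = [("a", "b", 4), ("b", "c", 1), ("a", "c", 2), ("d", "e", 3)]
components = connected_components(nodes, [(u, v) for u, v, _ in weighted])
mst = kruskal_mst(nodes, weighted)
print(components)  # [['a', 'b', 'c'], ['d', 'e']]
print(mst)         # [('b', 'c', 1), ('a', 'c', 2), ('d', 'e', 3)]
```

Edge ("a", "b", 4) is skipped because "a" and "b" are already joined through "c" by cheaper edges.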

Graph Analytics - Betweenness centrality

• Betweenness centrality is an indicator of a node's centrality in a network: it counts the shortest paths between all pairs of other vertices that pass through that node (when a pair has several shortest paths, each pair contributes the fraction of them that pass through the node);

• A node with high betweenness centrality has a large influence on the transfer of items through the network;

• Betweenness centrality is related to a network's connectivity;
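Computing betweenness naively over all pairs is expensive. A common approach, assumed here, is Brandes' algorithm, shown for unweighted, undirected graphs: for every source it counts shortest paths with BFS, then accumulates each node's share in reverse BFS order. The final division by 2 accounts for each unordered pair being counted from both endpoints.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph given as node -> list of neighbors."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and recording predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: walk nodes farthest-first, accumulating each node's dependency on s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every unordered pair {s, t} was counted from both ends.
    return {v: c / 2 for v, c in bc.items()}


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
square = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(betweenness(path))    # {'a': 0.0, 'b': 1.0, 'c': 0.0}
print(betweenness(square))  # {'a': 0.5, 'b': 0.5, 'c': 0.5, 'd': 0.5}
```

In the square, each opposite pair has two shortest paths, so each middle node receives half the credit, illustrating the fractional counting described above.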

Graph Social Media Recommendation

Graph Computing Opportunity

Combined with leading tools such as graph databases, machine learning, high-performance computing, clustering, and streaming, graph computing technology is ready to take off in the Big Data era!

Distributed Graph Analytics System

How to Construct a Graph?

Graph ETL Data Flow

Graph ETL Example

Graph ETL Architecture