12
DataSToRM: Data Science and Technology Research Environment Mr. Vitaliy Gleyzer MIT Lincoln Laboratory 5 March 2018 The Future of Advanced (Secure) Computing This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air Force Contract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering. Distribution Statement A: Approved for public release: distribution unlimited. © 2018 Massachusetts Institute of Technology. Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

DataSToRM: Data Science and Technology Research …...Mr. Vitaliy Gleyzer MIT Lincoln Laboratory 5 March 2018 The Future of Advanced (Secure) Computing This material is based upon

  • Upload
    others

  • View
    10

  • Download
    0

Embed Size (px)

Citation preview

DataSToRM: Data Science and Technology Research

Environment

Mr. Vitaliy Gleyzer

MIT Lincoln Laboratory

5 March 2018

The Future of Advanced (Secure) Computing

This material is based upon work supported by the Assistant Secretary of Defense for Research and Engineering under Air ForceContract No. FA8721-05-C-0002 and/or FA8702-15-D-0001. Any opinions, findings, conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Assistant Secretary of Defense for Research and Engineering.

Distribution Statement A: Approved for public release: distribution unlimited.

© 2018 Massachusetts Institute of Technology.

Delivered to the U.S. Government with Unlimited Rights, as defined in DFARS Part 252.227-7013 or 7014 (Feb 2014). Notwithstanding any copyright notice, U.S. Government rights in this work are defined by DFARS 252.227-7013 or DFARS 252.227-7014 as detailed above. Use of this work other than as specifically authorized by the U.S. Government may violate any copyrights that exist in this work.

DataSToRM - 2VG 03/05/18

Advancing the State of Big Data Analytics: Raw Data to Insight

What new insight can

be gained from the

data?

What new information

should be collected?

analyticsdata

technologies

Big Data Application

presentation

Can analytics be modified

to allow efficient

implementation?

Can different data help

improve human

understanding?

How can existing

analytics be

accelerated?

How can the insight be

presented to improve

human cognition?

Are there new application

areas enabled by new

analytics?

How can data be

stored/collected more

efficiently?

DataSToRM - 3VG 03/05/18

Large-Scale Graph Applications Today

Development Environment

Lincoln Laboratory Supercomputing Center

Graph Analysis Frameworks and Databases

PageRank

Centrality

Walktrap

InfoMap

K-Truss

Algorithms

Visualization ToolkitsHardware Platform

……

BFS

MST

D4M

Cyber Data Analysis

Recommender Systems

Functional Brain Mapping

Applications

Personalized Healthcare

Fraud Detection

Drug Discovery

Diverse, quickly evolving ecosystem

DataSToRM - 4VG 03/05/18

Advancing the State of Big Data Analytics: Challenges

• Technology moves quickly– New algorithms and analytic techniques– New storage solutions– New processing technologies– New database technologies and frameworks– New applications

• New framework adoption is a serious investment

• How to leverage new technologies?

• How to enable co-design opportunities?

• How to integrate disparate communities to enable co-design?

Keeping up with big data technology is challenging

DataSToRM - 5VG 03/05/18

Example Standardization Efforts

Development Environment

Lincoln Laboratory Supercomputing Center

Graph Analysis Frameworks and Databases

PageRank

Centrality

Walktrap

InfoMap

K-Truss

Algorithms

Visualization ToolkitsHardware Platform

……

BFS

MST

D4M

Cyber Data Analysis

Recommender Systems

Functional Brain Mapping

Applications

Personalized Healthcare

Fraud Detection

Drug Discovery

GraphBLASApache

Gremlin

Different communities are attempting to unify and standardize interface and languages in the ecosystem

DataSToRM - 6VG 03/05/18

Example Standardization Efforts

Development Environment

Lincoln Laboratory Supercomputing Center

Graph Analysis Frameworks and Databases

PageRank

Centrality

Walktrap

InfoMap

K-Truss

Algorithms

Visualization ToolkitsHardware Platform

……

BFS

MST

D4M

Cyber Data Analysis

Recommender Systems

Functional Brain Mapping

Applications

Personalized Healthcare

Fraud Detection

Drug Discovery

GraphBLASApache

Gremlin

Different communities are attempting to unify and standardize interface and languages in the ecosystem

Backend interface standardization

enables hardware innovation!

• Parallel in-memory database

• 100–1000 graph edge traversal speed

• Simple linear algebra API

DataSToRM - 7VG 03/05/18

Unifying Principles for Big Data Graphs

• Graphs capture relationship information between entities

– Molecular forces– Social interactions– Semantic concepts– Vehicle tracks

• Graphs can be fully expressed in the language of linear algebra

– Represented as sparse matrices– Enable mathematic foundation for data analysis– Leverage existing linear algebra techniques and

methods– Define a small set of well-defined mathematical

operations

Graph Representations

Adjacency Matrix(N N)

Vertices

Ver

tice

s

Incidence matrix(N M)V

erti

ces

Edges

DataSToRM - 8VG 03/05/18

DataSToRM: Data Science and Technology Research Environment

Enables hardware diversity

Composed of analytics

Implemented on top of a standard API

Hardware acceleration of a small number of well-defined mathematical operations enable an extensive analytic ecosystem

Threat Detection Sentiment AnalysisRecommender

Engine …

Community Detection

Classification

API

Centrality Analysis

Hardware

Applications

Graph Analysis Kernels …

GraphBLAS (Semi-ring Linear Algebra API)

DataSToRM - 9VG 03/05/18

GraphBLAS Overview

• Five key operationsA = NxM(i,j,v) (i,j,v) = A C = A B C = A C C = A B = A . B

• Can be used to build 12 GraphBLAS standard functionsbuildMatrix, extractTuples, Transpose, mXm, mXv, vXm, extract, assign, eWiseAdd, eWiseMult, apply, reduce

• Can be used to build a variety of graph utility functionsTril(), Triu(), Degreed Filtered BFS, …

• Can be used to build a variety of graph algorithmsK-Truss, Jaccard Coefficient, Non-Negative Matrix Factorization, …

• That work on a wide range of graphsHyper, multi-directed, multi-weighted, multi-partite, multi-edge

Unifying interface for backend graph processing

DataSToRM - 10VG 03/05/18

Lincoln Laboratory Technologies Targeting Large-Scale Graph Analytics

Graph Processor LLSC

D4M

GraphBLAS

Graph AlgorithmsGraph Algorithms

Dynamic Distributed Dimensional

Data Model (D4M)

GraphBLAS

Standard API for graph analytics using

Sparse Linear Algebra primitivesData analysis framework based on

associative array algebra

Graph Processor Lincoln Laboratory Super

Computing Center (LLSC)

State-of-the-art super computing

environment

Novel graph processing architecture

• Simple hardware agnostic API

• Concise language for

complex graph analytics

• Mathematical closure

• Linear Algebra underpinning

• Scalable graph processing

hardware architecture

• Unprecedented performance

• Native linear algebra

instruction set

• Heterogeneous processing

capabilities

• Ideal technology integration

environment

DataSToRM - 11VG 03/05/18

Graph Processor Matrix Multiply Performance

1E+7

1E+8

1E+9

1E+10

1E+11

1E+12

1E+13

1E+14

1E+1 1E+2 1E+3 1E+4 1E+5 1E+6 1E+7 1E+8 1E+9

Trav

ers

ed

Ed

ges

Pe

r Se

con

dWatts

ASIC Graph Processor (Projected)

FPGA Graph Processor (Measured)

Cray XK7 Titan (Measured)

Cray XT4 Franklin (Measured)

Today’s State of the Art

Data Center

Applications

Embedded

Applications

16 Racks

64 Racks

1014

1013

1012

1011

1010

109

108

107

101 102 103 104 105 106 107 108 109

Graph Processor Performance

4 Racks

1 Racks

2 Nodes

FPGAASIC

100x

20171 Chassis System

Highly efficient graph processing technology that is 100s to 1000s of times more efficient compared to traditional architectures

DataSToRM - 12VG 03/05/18

Collaboration Opportunities

• Developing graph algorithms in the language of linear algebra– Community detections, subgraph isomorphism, subgraph matching, etc.

• Developing graph algorithms that can scale to datasets with billions to trillions of vertices– Sparsity-aware, distributed memory algorithms

• Identifying or developing new technologies to leverage the linear algebra abstraction– Compilers, optimizer, hardware accelerators, etc.

• Integrating GraphBLAS backend as part of popular frameworks – e.g., Apache TinkerPop, Neo4j, ElasticSearch, etc.