Distributed Deep Learning on MapR Converged Data Platform
Powered By NVIDIA GPUs

Dong Meng - [email protected]
Data Scientist, MapR Technologies

© 2017 MapR Technologies, MapR Confidential


Roadmaps

•  Enterprise Big Data Journey
•  Distributed Deep Learning
•  Apply Container Technology and NVIDIA GPUs
•  Demos


Enterprise Big data Journey


Great buildings have great foundations


Big Data Technology Development

•  2003: Google published "The Google File System"
•  2004: Google published the "MapReduce" paper
•  2008: Hadoop became a top-level Apache project
•  2009: Facebook created Hive
•  2010: Facebook implemented HBase
•  2011: LinkedIn presented Kafka
•  2014: Spark 1.0 was released
•  2014: Google released Kubernetes, an open-source, Go-based system inspired by Borg


MapR Development

•  2011: Industrial-grade data platform for big data analytics 1.0 (MapR File System)
•  2012: Industry's first visual big data ops dashboard in MapR Control System
•  2013: Industrial-grade NoSQL key-value store, MapRDB (implements HBase APIs)
•  2014: Global multi-datacenter replication; Fast Ingest 1.0
•  2015: Schema-free SQL engine for big data (Apache Drill); global table replication
•  2016: Global streaming (MapR Streams); JSON document DB; Fast Ingest 2.0; Spyglass monitoring
•  2017: Persisted data access for Docker containers; MapR Edge for edge computing


MapR Converged Data Platform

Files, Tables, Streams together on same platform

•  Components: Event Data Streams, Analytics & Machine Learning Engines, Operational Database, Cloud-scale Data Store
•  Shared Services: High Availability, Real Time, Security & Governance, Multi-tenancy, Disaster Recovery, Global Namespace
•  Converge-X™ Data Fabric
•  Deployment: On-Premise, In the Cloud, Hybrid


An Enterprise Foundation To Operationalize Data

Supports Open-Source APIs with Patented Speed, Scale, Reliability

•  Applications: Existing Enterprise Applications, Next-Gen Applications
•  APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
•  Components: Event Data Streams, Analytics & Machine Learning Engines, Operational Database, Cloud-scale Data Store
•  Shared Services: High Availability, Real Time, Security & Governance, Multi-tenancy, Disaster Recovery, Global Namespace
•  Converge-X™ Data Fabric
•  Deployment: Data Center; On-Premise, In the Cloud, Hybrid


From a User Perspective

Diagram labels: Cluster, Volume mount point, Directories, Files, Table, Streams


What we have observed:

•  Enterprise customers are starting to gain great value from big data platforms as the technologies mature
•  The data pipeline is growing from batch to real-time streaming
•  The volume of data collected is growing from TB to PB to EB
•  Customers are more motivated than ever to build intelligent applications based on collected data
•  Growing interest in machine learning
•  GPU cards are being added to clusters


Distributed Deep Learning


More DATA beats complex algorithms

"The Unreasonable Effectiveness of Data", published by Google


Distributed Machine Learning

•  Mahout: MapReduce
•  GraphLab: Graph-based parallel framework
•  Spark: In-memory dataflow system
•  H2O: Distributed machine learning library and server


Why Deep Learning

•  The performance of machine learning algorithms depends on the representation of the data they are given (feature engineering)
•  Finding the right representations (features) is difficult
•  Deep learning
   –  Learns multiple levels of representation
   –  Learns high-level abstractions
•  Growth of data volumes and computing power (GPUs)


Distributed Deep Learning System

From a system design perspective:
•  Distributed Data Storage – HDFS, MapRFS, Ceph
•  Consistency – Parameter Server
•  Fault Tolerance – Checkpoint reload
•  Resource Management – YARN, Mesos, Kubernetes
•  Programming Model – TensorFlow, Apache MXNet, Torch, Caffe
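The consistency and fault-tolerance items can be illustrated with a toy example. The sketch below is not how TensorFlow or MXNet implement a parameter server. It is a minimal single-process stand-in, assuming a two-parameter linear model and a JSON checkpoint file. The class name `ParameterServer`, the function `worker_gradient`, and the checkpoint file name are all hypothetical.

```python
import json
import os
import tempfile


class ParameterServer:
    """Holds the shared model weights; workers pull weights and push gradients."""

    def __init__(self, weights, lr=0.1):
        self.weights = list(weights)
        self.lr = lr
        self.step = 0

    def pull(self):
        # Workers read a copy of the current weights.
        return list(self.weights)

    def push(self, grads):
        # Apply one SGD update with the gradients sent by a worker.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grads)]
        self.step += 1

    def save(self, path):
        # Write to a temp file, then rename, so a crash never leaves a partial checkpoint.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"step": self.step, "weights": self.weights}, f)
        os.replace(tmp, path)

    @classmethod
    def restore(cls, path, lr=0.1):
        # Rebuild the server state from the last checkpoint.
        with open(path) as f:
            state = json.load(f)
        ps = cls(state["weights"], lr=lr)
        ps.step = state["step"]
        return ps


def worker_gradient(weights, x, y):
    """Gradient of the squared error (w*x + b - y)^2 for a single example."""
    w, b = weights
    err = w * x + b - y
    return [2 * err * x, 2 * err]


# Hypothetical checkpoint location; in a cluster this would sit on shared storage.
ckpt = os.path.join(tempfile.mkdtemp(), "model.ckpt.json")
ps = ParameterServer([0.0, 0.0])
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # samples of y = 2x + 1

for x, y in data:
    ps.push(worker_gradient(ps.pull(), x, y))
ps.save(ckpt)

# Simulate a failure: discard the server and reload it from the checkpoint.
recovered = ParameterServer.restore(ckpt)
```

Because the checkpoint is an ordinary file, writing it to distributed storage lets any node pick up the restored state after a failure.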


Distributed Deep Learning Model

Data Parallelism:
•  Data Partition
•  Train each model on mini-batches of data
Model Parallelism:
•  Model Partition
•  Separate the training task for each layer
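Data parallelism can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a one-parameter model `y = w*x` and a round-robin partition. The function names are hypothetical and the "workers" run sequentially in one process. Each worker computes a gradient on its own shard, and the size-weighted average of those gradients equals the gradient over the full batch.

```python
def gradient(w, batch):
    """Mean gradient of the squared error (w*x - y)^2 over a batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def partition(data, num_workers):
    """Data partition: deal the examples out to the workers round-robin."""
    return [data[i::num_workers] for i in range(num_workers)]


def data_parallel_step(w, data, num_workers, lr):
    shards = partition(data, num_workers)
    # Each worker holds a full replica of the model and sees only its shard.
    local_grads = [gradient(w, shard) for shard in shards]
    # Average the per-worker gradients, weighting by shard size.
    total = sum(g * len(s) for g, s in zip(local_grads, shards))
    return w - lr * total / len(data)


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # samples of y = 3x
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, data, num_workers=4, lr=0.01)
```

Model parallelism would instead split the model itself, for example by layer, across workers. It is not shown here.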


Distributed Deep Learning System

From a user perspective:
•  Ease of expression: for lots of crazy ML ideas/algorithms
•  Scalability: can run experiments quickly
•  Portability: can run on a wide variety of platforms
•  Reproducibility: easy to share and reproduce research
•  Production readiness: go from research to real products


Apply Container Technology and NVIDIA GPUs


Deep Learning Development Environment

Individual Research:
•  Laptop/Dev boxes
Research Lab Setting:
•  HPC clusters, high-speed job execution, small teams
Enterprise Deep Learning System:
•  Distributed Data Storage, Consistency, Fault Tolerance, Resource Management, Programming Model, Version and Deployment
•  Container, Kubernetes, MapR Data Platform


Containers are Great for Deployment and Research

Advantages
•  Reproducible work environments
•  Ease of deployment
•  Isolation of individual workspaces
•  Run across heterogeneous environments
•  Facilitate collaboration


Stateful Containers for Deep Learning

Persistent Storage: Audio Data, Video Feeds, Sensor Data

Advantages
•  Containerized workspaces
•  Keep your work between sessions
•  Manage work across many projects
•  Work with versioned datasets and models
•  Share work across containers, projects and/or teams


Microservices for Production Deep Learning

Event Streams & DB

Advantages
•  Deploy models to production as microservices
•  Use files, real-time streams and databases in production
•  Scale horizontally
•  Support both real-time and batch
•  May or may not be stateful


Kubernetes for Container Orchestration

•  Deploy once and run many times
•  Roll out new versions/roll back to old versions
•  Dynamic Scheduling and Elastic Scaling
•  Auto Fault Recovery and Scalable Computing
•  Isolation and Quota
•  Manage GPU resources
•  Mount MapR Volume to Persistent Volume


Kubernetes in a Heterogeneous GPU Cluster

•  Kubernetes master: CPU-only
•  Workers: NVIDIA driver, CUDA, cuDNN
•  Kubelet config must be updated on GPU workers: --feature-gates=Accelerators=true
•  Note: Containers also need to be configured for GPU support separately

Diagram: Frederic Tausch on GitHub
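As a rough illustration of how a pod might request a GPU and mount a MapR-backed persistent volume, the sketch below builds a Pod manifest as a Python dictionary and prints it as JSON, which `kubectl` accepts alongside YAML. It assumes the alpha-era resource name `alpha.kubernetes.io/nvidia-gpu` that went with the Accelerators feature gate; later Kubernetes releases moved to device plugins. The pod name, claim name, and mount path are hypothetical, and the NVIDIA driver library mounts that containers needed in that era are omitted.

```python
import json


def gpu_pod_manifest(name, image, gpus, claim_name, mount_path):
    """Build a Pod manifest that requests GPUs and mounts a persistent volume claim."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {
                    # Assumed resource name for the Accelerators feature gate.
                    "limits": {"alpha.kubernetes.io/nvidia-gpu": gpus},
                },
                "volumeMounts": [{"name": "data", "mountPath": mount_path}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


manifest = gpu_pod_manifest(
    name="tf-worker",                    # hypothetical pod name
    image="tensorflow/tensorflow:latest-gpu",
    gpus=1,
    claim_name="mapr-project1-claim",    # hypothetical claim bound to a MapR volume
    mount_path="/projects/project1",     # hypothetical mount path inside the container
)
print(json.dumps(manifest, indent=2))
```

Only pods that declare the GPU limit are scheduled onto GPU workers, so CPU-only pods can share the same cluster.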


Architecture

Data layer

Orchestration layer

Application layer


MapR as the Infrastructure for Distributed DL

•  Converged Data Platform for both Big Data and Deep Learning
•  Volume topology, by design, collocates the data with GPU computation/training
•  Performant POSIX file system adapts quickly to new deep learning technologies and frameworks
•  Mirroring enables model training/deployment across global data centers
•  Snapshots enable research reproducibility
•  MapRDB and MapR Streams provide the infrastructure for intelligent applications with deep learning models


MapR Volumes •  Volumes是containers的逻辑分组, 被mounted在文件目录上(just like Linux).

•  没有大小限制 •  每个volume可以设置不同的管

理策略 •  没有数目限制 •  Containers在其他节点上有备份 •  快照备份,镜像功能,数据定位(限制在特定节点内),数据核查

/projects

/project1

/users

/jsmith

/mjohnson

/projects

/users

Data Nodes


Pattern 1: Separate Clusters

MapR Converged Data Platform Tier

Dockerized GPU-based NVIDIA Tier, NVIDIA DGX systems as MapR client


Pattern 2: Collocated MapR + Kubernetes

MapR Converged Data Platforms powered by NVIDIA GPU Cards


Demos


Demo1: Parameter Server and Worker

•  POD1: TF Parameter Server
•  POD2: TF Worker1
•  POD3: TF Worker2
•  Persistent Volume: MapR Volume /projects/project1
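The three-pod layout can be described with a cluster specification. The sketch below is an illustration only, not the demo's actual code. It builds a plain dictionary mapping job names to `host:port` lists, which is the shape `tf.train.ClusterSpec` accepts, and wraps it in a per-pod config modelled on TensorFlow's `TF_CONFIG` layout. The hostnames, port, and checkpoint subdirectory are assumptions.

```python
import json


def cluster_spec(ps_hosts, worker_hosts, port=2222):
    """Map each job name to its list of host:port addresses."""
    return {
        "ps": ["%s:%d" % (h, port) for h in ps_hosts],
        "worker": ["%s:%d" % (h, port) for h in worker_hosts],
    }


def task_config(cluster, job_name, task_index):
    """Per-pod configuration: the shared cluster plus this pod's own role."""
    if task_index >= len(cluster[job_name]):
        raise ValueError("no task %d in job %r" % (task_index, job_name))
    return {"cluster": cluster, "task": {"type": job_name, "index": task_index}}


# Hypothetical pod hostnames for POD1, POD2, and POD3.
cluster = cluster_spec(["tf-ps-0"], ["tf-worker-0", "tf-worker-1"])
configs = [
    task_config(cluster, "ps", 0),
    task_config(cluster, "worker", 0),
    task_config(cluster, "worker", 1),
]

# Every pod would read data and write checkpoints under the same mounted volume.
checkpoint_dir = "/projects/project1/checkpoints"  # assumed subdirectory
print(json.dumps(configs[1], indent=2))
```

Since all three pods mount the same volume, a restarted worker can resume from the checkpoints its peers can already see.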


Demo2: Real Time Streams Face Detection

Global data plane [on-prem, hybrid, cloud]

•  MapR Edge: small footprint at the edge, convergence at the edge
•  Camera to capture video feeds
•  Deploy model at the edge to score data feed
•  Reliable replication between MapR Edge and MapR Converged Enterprise Edition
•  Aggregate, analyze data at core and refine models
•  Send updated model back to edge


Demo2: Real Time Streams Face Detection

•  Producers to capture video feeds
•  DL model to process streams
•  Consumers to output processed videos
•  Runs on MapR Edge
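The producer, model, and consumer stages can be sketched as a small in-process pipeline. This is a toy illustration under stated assumptions: `queue.Queue` stands in for a stream topic (the demo itself uses MapR Streams), and `detect_faces` is a hypothetical placeholder that counts marker characters rather than running a real face-detection model.

```python
import queue
import threading

STOP = object()  # sentinel that marks the end of a stream


def producer(frames, raw_stream):
    """Stands in for a camera publishing frames to a stream."""
    for frame in frames:
        raw_stream.put(frame)
    raw_stream.put(STOP)


def detect_faces(frame):
    """Placeholder for a DL model: counts the 'F' markers in a frame."""
    return {"frame_id": frame["id"], "faces": frame["pixels"].count("F")}


def model_stage(raw_stream, scored_stream):
    """Consume raw frames, score them, and publish the results downstream."""
    while True:
        frame = raw_stream.get()
        if frame is STOP:
            scored_stream.put(STOP)
            return
        scored_stream.put(detect_faces(frame))


def consumer(scored_stream, results):
    """Stands in for the application that outputs processed video."""
    while True:
        item = scored_stream.get()
        if item is STOP:
            return
        results.append(item)


frames = [
    {"id": 0, "pixels": "..F.."},
    {"id": 1, "pixels": "F..F."},
    {"id": 2, "pixels": "....."},
]
raw_stream, scored_stream, results = queue.Queue(), queue.Queue(), []
threads = [
    threading.Thread(target=producer, args=(frames, raw_stream)),
    threading.Thread(target=model_stage, args=(raw_stream, scored_stream)),
    threading.Thread(target=consumer, args=(scored_stream, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each stage only touches the stream on either side of it, so in a real deployment the stages could run as separate containers and scale independently.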


Q&A

ENGAGE WITH US

Dong Meng (孟东) [email protected]