Edge-Cloud Converged System for Machine Inference and Learning

Satyam Vaghani, VP & GM, IoT and AI

October 2018

Source: on-demand.gputechconf.com/gtc-eu/2018/pdf/e8479...




Background: The Intelligent Edge and some use cases


A key change: Edge trumps Cloud

• Cloud traffic: 8.6 ZB (2017) → 15.3 ZB (2020)

• IoT data: 256 ZB (2017) → 600 ZB (2020)

Sources: Cisco Global Cloud Index, Memoori
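The gap behind "Edge trumps Cloud" can be checked with a little arithmetic on the four figures above. This is a sketch that uses only the slide's numbers; the compound annual growth rate (CAGR) framing is our addition, not the slide's.

```python
# Figures from the slide, in zettabytes (sources: Cisco Global Cloud Index, Memoori).
CLOUD_TRAFFIC = {2017: 8.6, 2020: 15.3}
IOT_DATA = {2017: 256.0, 2020: 600.0}


def growth_factor(series: dict[int, float]) -> float:
    """Ratio of the last year's value to the first year's value."""
    first, last = min(series), max(series)
    return series[last] / series[first]


def cagr(series: dict[int, float]) -> float:
    """Compound annual growth rate between the first and last year."""
    years = max(series) - min(series)
    return growth_factor(series) ** (1 / years) - 1


for name, series in (("Cloud traffic", CLOUD_TRAFFIC), ("IoT data", IOT_DATA)):
    print(f"{name}: x{growth_factor(series):.2f} overall, {cagr(series):.1%} per year")

for year in (2017, 2020):
    print(f"{year}: IoT data is {IOT_DATA[year] / CLOUD_TRAFFIC[year]:.0f}x cloud traffic")
```

On these figures, cloud traffic grows about 1.8x (roughly 21% a year) while IoT data grows about 2.3x (roughly 33% a year), and IoT data is some 30x to 39x larger than all cloud traffic. Most of it therefore cannot simply be shipped to the cloud.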


Key consequence: Intelligent Edge

[Diagram. BEFORE: sensors → IoT gateway → cloud, with data ingestion, real-time processing, and long-term processing all in the cloud. AFTER: sensors → edge computing, which takes over data ingestion and real-time processing; long-term processing stays in the cloud.]


Use case: ‘Amazon Go’ for restaurants

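[Diagram: hundreds of edges (x 100s) connected to the cloud. The cloud pushes apps & models to the edges; the edges run machine inference and report anomalies back to the cloud.]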


Use case: Product quality check

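[Diagram: edges (multipliers shown: x 100s, x 10s) making pass/fail (✓✘) calls, connected to the cloud. The cloud pushes apps & models to the edges; the edges run machine inference, analytics, and actuation, and send insights back to the cloud.]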


Our journey and learnings: Building a PoC is easy; operationalizing it at industrial scale is H-A-R-D


The Intelligent Edge is Not Ready

TWO PROBLEMS PREVENT WIDESPREAD ADOPTION

• Distributed infrastructure burden

• AI operationalization at industrial scale


A Smart Airport Example

TOPOLOGY

Airport 1 (of 10), with counts of 2000 x, 100 x, and 250 x for three device types shown as icons (icons not preserved)


A Smart Airport use case

OBJECT OF INTEREST

Look for a red car at airport(s)

SFO AIRPORT (EDGE)

...
10:41AM: redcar=0
10:42AM: redcar=1
10:43AM: redcar=0
10:44AM: redcar=0
...

CLOUD

...
10:41AM: SFO, redcar=0
10:42AM: SFO, redcar=1
10:43AM: SFO, redcar=0
10:44AM: SFO, redcar=0
...
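The flow above can be sketched in a few lines: the edge keeps the video and emits only a small per-minute time series, and the cloud tags each record with its airport before running business logic across sites. This is a minimal sketch; the record types and the functions `ship_to_cloud` and `sightings` are hypothetical names, not Xi IoT APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeRecord:
    """One per-minute result from the car-recognition model at an edge (hypothetical type)."""
    minute: str
    redcar: int


@dataclass(frozen=True)
class CloudRecord:
    """The same result after the cloud has tagged it with its originating site (hypothetical type)."""
    minute: str
    site: str
    redcar: int


def ship_to_cloud(site: str, records: list[EdgeRecord]) -> list[CloudRecord]:
    """Hypothetical data mover: attach the site ID so results from many airports can be merged."""
    return [CloudRecord(r.minute, site, r.redcar) for r in records]


def sightings(records: list[CloudRecord]) -> list[tuple[str, str]]:
    """Cloud-side business logic: (site, minute) pairs where a red car was seen."""
    return [(r.site, r.minute) for r in records if r.redcar > 0]


sfo = [EdgeRecord("10:41AM", 0), EdgeRecord("10:42AM", 1),
       EdgeRecord("10:43AM", 0), EdgeRecord("10:44AM", 0)]
print(sightings(ship_to_cloud("SFO", sfo)))  # [('SFO', '10:42AM')]
```

The design point is the split: only a few bytes per minute leave each airport, while the raw surveillance feed stays at the edge.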


IoT infrastructure burden

APPS SPAN EDGE AND CLOUD AT PLANET SCALE

• Train model to recognize redcar

• Deploy model to selected edges

• Runtime for car recognition model

• Persistence for surveillance feed

• Sampling surveillance feed to match model input requirements

• Persistence of image recognition time series output

• Data mover to move time series output to cloud

• Persistence for time series data in cloud

• App runtime in cloud

• Security

[Diagram labels: business logic vs. infrastructure madness]
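To make one item on this list concrete, "sampling surveillance feed to match model input requirements" can be as simple as dropping frames so a model that handles only a few frames per second is not flooded by a faster camera. This sketch covers frame rate only (resolution and pixel format are other input requirements not shown), and `sample_frame_indices` is a hypothetical helper.

```python
def sample_frame_indices(source_fps: float, model_fps: float, n_frames: int) -> list[int]:
    """Pick which frames of a feed to pass to the model (hypothetical helper).

    Keeps roughly model_fps frames per second out of a feed recorded at
    source_fps, spaced evenly. If the model can keep up, every frame is kept.
    """
    if source_fps <= 0 or model_fps <= 0:
        raise ValueError("frame rates must be positive")
    if model_fps >= source_fps:
        return list(range(n_frames))
    step = source_fps / model_fps  # source frames per model frame
    indices = []
    next_pick = 0.0
    for i in range(n_frames):
        if i >= next_pick:
            indices.append(i)
            next_pick += step
    return indices


# One second of a 30 fps camera feeding a model that handles 5 fps.
print(sample_frame_indices(30, 5, 30))  # [0, 6, 12, 18, 24]
```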


Introducing Xi IoT

[Diagram: Xi IoT shown against the application stack PLATFORM, MIDDLEWARE, DEVOPS, BUSINESS LOGIC]


Xi IoT: High level architecture

[Diagram: sensors feed the EDGE, which connects to the CLOUD; both are managed from a SaaS Control Plane with an Operations Console and a Developer Console.

• INGRESS (edge): Data Ingestion, Sensor Actuation

• REAL TIME PROCESSING (edge): runtime for Machine Inference, Streaming Analytics, and Code (Containers/FaaS); persistence via Streaming Data Service and Unstructured Data Service; Pub Sub Service and Data Bus connecting the pieces

• LONG TERM PROCESSING (cloud): runtime for Machine Training, Long term Analytics, and Custom Code; persistence via Streaming Data Service, Unstructured Data Service, and Structured Data Service]
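The Pub Sub Service and Data Bus in this architecture decouple producers (sensors, models) from consumers (analytics, data movers): each side knows only a topic name. The toy in-process bus below illustrates that pattern; `DataBus`, the topic names, and the string-matching "model" are hypothetical stand-ins, not the Xi IoT API.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[[str, Any], None]


class DataBus:
    """Toy in-process publish/subscribe bus (hypothetical): parties share only topic names."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver message to every handler on topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(topic, message)
        return len(handlers)


bus = DataBus()
to_cloud: list[int] = []


def detect_red_car(topic: str, frame: str) -> None:
    # Stand-in for a real model: "detects" a red car from a text label.
    bus.publish("inference/redcar", int("red" in frame))


def move_to_cloud(topic: str, count: int) -> None:
    # Stand-in for a data mover that forwards results to the cloud.
    to_cloud.append(count)


bus.subscribe("camera/frames", detect_red_car)
bus.subscribe("inference/redcar", move_to_cloud)
for frame in ["blue car", "red car", "empty lot"]:
    bus.publish("camera/frames", frame)
print(to_cloud)  # [0, 1, 0]
```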


ML-specific learnings


#1: Handle inference hardware diversity

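[Diagram: models built with ML Framework 1 and ML Framework 2 enter the SaaS through Model Onboarding (Models 1, 2, 3). Model Deployment compiles them with Compiler 1, 2, or 3 for three edges with different accelerators (GPU, ASIC, FPGA); the edges run Models 1 and 2, Models 1 and 3, and Model 1.]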
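One way to read this slide is "onboard once, compile per accelerator": a model is onboarded a single time, then built once per distinct hardware target present in the fleet, and each edge receives the build for its hardware. In this sketch the one-to-one mapping of compilers to GPU/ASIC/FPGA is an assumption, and the registry only records which compiler would be used; a real system would invoke vendor toolchains there.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Artifact:
    """A model built for one kind of accelerator."""
    model: str
    target: str
    compiler: str


# Hypothetical compiler registry; the compiler-to-hardware mapping is assumed.
COMPILERS: dict[str, Callable[[str], Artifact]] = {
    "GPU": lambda m: Artifact(m, "GPU", "compiler-1"),
    "ASIC": lambda m: Artifact(m, "ASIC", "compiler-2"),
    "FPGA": lambda m: Artifact(m, "FPGA", "compiler-3"),
}


def deploy(model: str, edges: dict[str, str]) -> dict[str, Artifact]:
    """Build once per distinct accelerator, then map each edge to the matching build."""
    builds: dict[str, Artifact] = {}
    for target in sorted(set(edges.values())):
        if target not in COMPILERS:
            raise ValueError(f"no compiler registered for {target}")
        builds[target] = COMPILERS[target](model)
    return {edge: builds[target] for edge, target in edges.items()}


plan = deploy("model-1", {"edge-a": "GPU", "edge-b": "FPGA", "edge-c": "GPU"})
print({edge: a.compiler for edge, a in plan.items()})
# {'edge-a': 'compiler-1', 'edge-b': 'compiler-3', 'edge-c': 'compiler-1'}
```

Building per target rather than per edge keeps compile work proportional to hardware diversity, not fleet size.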


#2: Fit model to Edge constraints

[Diagram: inside the SaaS, Model Development (ML Framework 1, ML Framework 2) feeds Model Onboarding; a Fitting Strategy (TensorRT, quantization, NN pruning) adapts models to edge constraints and stores them in a Model Library; Model Deployment (Compiler 1, 2, 3) then targets the Edge Infrastructure.]
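Two of the fitting techniques named here can be shown in plain Python: magnitude pruning (zero out the smallest weights) and symmetric int8 quantization (store each weight as an 8-bit integer times one shared scale, about 4x smaller than float32). This sketch does not use TensorRT's API and works on a flat list rather than real tensors; per-tensor scaling is one common choice among several.

```python
def prune_by_magnitude(weights: list[float], sparsity: float) -> list[float]:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices of the k weights with the smallest absolute value.
    smallest = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in smallest else w for i, w in enumerate(weights)]


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ≈ scale * q, with q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [scale * v for v in q]


w = [0.02, -0.5, 0.9, -0.01, 0.3, 0.05, -1.27, 0.6]
pruned = prune_by_magnitude(w, 0.5)  # [0.0, -0.5, 0.9, 0.0, 0.0, 0.0, -1.27, 0.6]
q, scale = quantize_int8(pruned)
print(q)  # [0, -50, 90, 0, 0, 0, -127, 60]
```

Rounding bounds the reconstruction error of each weight by half the scale; pruning trades some accuracy for sparsity, so a fitting strategy would normally re-check model quality after each step.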


#3: Make inference hardware multi-tenant

[Diagram: three ways to share an edge GPU among containers/functions (Ctr/Fn):

• One GPU per workload: no sharing.

• VMs with vGPUs: static allocation, unclear benefit.

• An inference mux under centralized inference resource management, with H/W capability as input: dynamic allocation, dynamic rebalancing.]
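The inference mux idea can be sketched as a shared queue in front of one accelerator: every tenant submits requests, and the mux drains them in batches so the GPU runs one model call per batch instead of one per tenant. `InferenceMux` and its methods are hypothetical, and the model is any callable that maps a batch of inputs to a batch of outputs.

```python
from collections import deque
from typing import Callable, Sequence

BatchModel = Callable[[Sequence[float]], list[float]]


class InferenceMux:
    """Hypothetical inference multiplexer: many tenants, one shared accelerator."""

    def __init__(self, model: BatchModel, max_batch: int) -> None:
        self.model = model
        self.max_batch = max_batch
        self.queue: deque[tuple[str, float]] = deque()
        self.calls = 0  # how many times the shared accelerator was invoked

    def submit(self, tenant: str, x: float) -> None:
        self.queue.append((tenant, x))

    def drain(self) -> dict[str, list[float]]:
        """Run queued requests in batches and return results grouped by tenant."""
        results: dict[str, list[float]] = {}
        while self.queue:
            n = min(self.max_batch, len(self.queue))
            batch = [self.queue.popleft() for _ in range(n)]
            outputs = self.model([x for _, x in batch])
            self.calls += 1
            for (tenant, _), y in zip(batch, outputs):
                results.setdefault(tenant, []).append(y)
        return results


# A stand-in "model" that doubles its inputs, shared by three camera tenants.
mux = InferenceMux(lambda xs: [2 * x for x in xs], max_batch=4)
for tenant, x in [("cam-a", 1.0), ("cam-b", 2.0), ("cam-a", 3.0),
                  ("cam-c", 4.0), ("cam-b", 5.0), ("cam-c", 6.0)]:
    mux.submit(tenant, x)
out = mux.drain()
print(out, mux.calls)  # six requests served by two model calls
```

Because allocation happens per request rather than per VM, capacity follows whichever tenant is busy at the moment, which is the dynamic allocation the slide contrasts with static vGPU slices.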


#4: Stretch inferencing across nodes

[Diagram. LEFT: one edge whose GPU is shared by several containers/functions (Ctr/Fn) through an inference mux under centralized inference resource management (H/W capability input): hard to scale performance with use; no HA. RIGHT: the inference mux and centralized resource management stretched across two edges, each with its own GPU: easier to scale performance with use; additional HA benefit.]
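Stretching the mux across nodes turns it into a router: each request goes to the least-loaded healthy node, so adding a node adds capacity, and losing one only shrinks the pool. This is a minimal sketch; `Node`, `StretchedMux`, and the capacity-slot model are hypothetical, and in a real system the `healthy` flag would come from heartbeats.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One edge node with a GPU (hypothetical model of capacity and health)."""
    name: str
    capacity: int          # concurrent inference slots (assumed unit)
    healthy: bool = True
    in_flight: int = 0


@dataclass
class StretchedMux:
    """Hypothetical cluster-wide mux: route each request to the least-loaded healthy node."""
    nodes: list[Node] = field(default_factory=list)

    def route(self) -> str:
        candidates = [n for n in self.nodes if n.healthy and n.in_flight < n.capacity]
        if not candidates:
            raise RuntimeError("no healthy node has spare capacity")
        node = min(candidates, key=lambda n: n.in_flight / n.capacity)
        node.in_flight += 1
        return node.name

    def complete(self, name: str) -> None:
        """Release one slot on the named node when its request finishes."""
        for n in self.nodes:
            if n.name == name and n.in_flight > 0:
                n.in_flight -= 1
                return


mux = StretchedMux([Node("edge-1", 2), Node("edge-2", 2)])
print([mux.route() for _ in range(4)])  # ['edge-1', 'edge-2', 'edge-1', 'edge-2']

# HA: with edge-1 down, traffic goes to edge-2 instead of failing.
ha = StretchedMux([Node("edge-1", 2, healthy=False), Node("edge-2", 2)])
print(ha.route())  # edge-2
```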


#5: Advanced resource management

Workload redistribution due to contention or user-defined objectives

[Diagram: an edge with a GPU and a CPU running Models 1, 2, and 3, shown before and after a SaaS-orchestrated transition decided by a Global Resource Manager from user policies and edge constraints.]
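A global resource manager needs some placement rule. The sketch below uses a simple greedy heuristic of our choosing, not the deck's: give GPU memory to models in user-policy priority order and move whatever no longer fits to the CPU. `ModelDemand`, `place`, and the GB-of-GPU-memory metric are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelDemand:
    name: str
    gpu_mem_gb: float   # memory needed to run on the GPU (assumed metric)
    priority: int       # user policy: higher priority gets the GPU first


def place(models: list[ModelDemand], gpu_mem_gb: float) -> dict[str, str]:
    """Greedy placement: highest-priority models that fit go to the GPU, the rest to the CPU."""
    placement: dict[str, str] = {}
    free = gpu_mem_gb
    for m in sorted(models, key=lambda m: m.priority, reverse=True):
        if m.gpu_mem_gb <= free:
            placement[m.name] = "GPU"
            free -= m.gpu_mem_gb
        else:
            placement[m.name] = "CPU"
    return placement


models = [ModelDemand("model-1", 3.0, 3), ModelDemand("model-2", 3.0, 2),
          ModelDemand("model-3", 2.0, 1)]
print(place(models, 8.0))  # enough GPU memory: all three on the GPU
print(place(models, 6.0))  # contention: model-3 is moved to the CPU
```

Re-running the placement whenever edge constraints change (another tenant claims GPU memory, say) and applying the difference is what the slide calls a SaaS-orchestrated transition.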


#6: Update models at planet-scale

GOOD: APP-EMBEDDED MODELS

• Hard to construct/maintain

• Hard to update

• Hard to share

BETTER: DISAGGREGATED MODELS & CONTAINERS

• Hard to construct/maintain

• Easy to update

• Easy to share

BEST: FUNCTIONS & MODELS-AS-ARGUMENTS

• Easy to construct/maintain

• Easy to update, roll back

• Easy to share

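[Diagram: SaaS (Fitting Strategy, Model Library) deploying to three edges, one per approach: a container with an embedded model, a container with a separate model, and a function with a separate model.]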
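"Models-as-arguments" means the app function contains only business logic and receives the model as a parameter, so updating or rolling back a model is a pointer change in a library rather than a rebuild of the app. The sketch below illustrates that pattern; `ModelLibrary`, `detect`, and the toy scoring models are hypothetical, not Xi IoT's interfaces.

```python
from typing import Callable

Model = Callable[[float], float]


class ModelLibrary:
    """Hypothetical versioned model store with an 'active' pointer per model name."""

    def __init__(self) -> None:
        self._versions: dict[str, list[Model]] = {}
        self._active: dict[str, int] = {}

    def publish(self, name: str, model: Model) -> int:
        """Add a new version, make it active, and return its version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(model)
        self._active[name] = len(versions) - 1
        return self._active[name]

    def rollback(self, name: str) -> int:
        """Point back to the previous version and return its version number."""
        if self._active[name] == 0:
            raise ValueError("already at the oldest version")
        self._active[name] -= 1
        return self._active[name]

    def get(self, name: str) -> Model:
        return self._versions[name][self._active[name]]


def detect(frame_score: float, model: Model) -> bool:
    """App function: business logic only; the model is an argument, not baked in."""
    return model(frame_score) > 0.5


lib = ModelLibrary()
lib.publish("redcar", lambda x: x)        # v0
lib.publish("redcar", lambda x: x * 0.5)  # v1: an update that turns out worse
print(detect(0.8, lib.get("redcar")))     # False (v1 scores 0.4)
lib.rollback("redcar")
print(detect(0.8, lib.get("redcar")))     # True (v0 scores 0.8)
```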


#7: Converge learning & inferencing flows

BEFORE

Two different silos to operate: user management, security, data management, infrastructure.

AFTER

Unified control and data plane for learning and inferencing: uniform user management, security, data management, end-to-end infrastructure.

OT data can be easily fed back to learning.

[Diagram. BEFORE: a learning platform (learning apps, data lake) in the public/private cloud, separate from an edge platform (inference apps, OT data sources). AFTER: learning apps, inference apps, and other apps on edge and cloud platforms, connected by the Xi IoT data bus (OT data & data lake) and managed by the Xi IoT control plane.]
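One way to "feed OT data back to learning" over a shared data bus is to copy inference results the model was unsure about into the data lake for labeling and retraining. This confidence-threshold rule is a common active-learning heuristic that we swap in for illustration, not something the deck specifies; `Inference` and `split_for_feedback` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inference:
    sample_id: str
    label: str
    confidence: float


def split_for_feedback(results: list[Inference],
                       threshold: float) -> tuple[list[Inference], list[Inference]]:
    """Hypothetical routing rule on a shared data bus.

    Every result still goes to the inference apps; results below the
    confidence threshold are also copied to the data lake for the learning side.
    """
    to_apps = list(results)
    to_data_lake = [r for r in results if r.confidence < threshold]
    return to_apps, to_data_lake


results = [Inference("s1", "redcar", 0.97),
           Inference("s2", "redcar", 0.55),
           Inference("s3", "none", 0.40)]
apps, lake = split_for_feedback(results, threshold=0.6)
print([r.sample_id for r in lake])  # ['s2', 's3']
```

With one data plane, this rule is a subscription on the bus rather than a separate export pipeline between the two silos.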


Takeaways

1 The success of machine inferencing in enterprise IoT depends greatly on the success of the Intelligent Edge

2 There are many obstacles to operationalizing machine inference; it is important to overcome them with systems software rather than human intervention

3 Nutanix solved these problems while creating Xi IoT, with positive validation in the retail, manufacturing, and oil & gas verticals


Thank you