50
Building an Operating System for AI How Microservices and Serverless Computing Enable the Next Generation of Machine Intelligence Diego Oppenheimer, CEO [email protected]

Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Building an Operating System for AIHow Microservices and Serverless Computing Enable

the Next Generation of Machine Intelligence

Diego Oppenheimer, CEO

[email protected]

Page 2: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

About Me

Diego Oppenheimer - Founder and CEO - Algorithmia

● Product developer, entrepreneur, extensive background in all things data.

● Microsoft: PowerPivot, PowerBI, Excel and SQL Server.

● Founder of algorithmic trading startup

● BS/MS Carnegie Mellon University

Page 3: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Make state-of-the-art algorithms

discoverable and accessible

to everyone.

Page 4: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

4

Algorithmia.comAI/ML scalable infrastructure on demand + marketplace

● Function-as-a-service for Machine & Deep Learning

● Discoverable, live inventory of AI

● Monetizable

● Composable

● Every developer on earth can make their app intelligent

Page 5: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

“There’s an algorithm for that!”70K+ DEVELOPERS 5K+ ALGORITHMS

Page 6: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

6

How do we do it?

● ~5,000 algorithms (60k w/ different versions)

● Each algorithm: 1 to 1,000 calls a second, fluctuates, no devops

● ~15ms overhead latency

● Any runtime, any architecture

Page 7: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

• Two distinct phases: training and inference

• Lots of processing power

• Heterogenous hardware (CPUs, GPUs, TPUs, etc.)

• Limited by compute rather than bandwidth

• “Tensorflow is open source, scaling it is not.” - Kenny Daniel

Characteristics of AI

Page 8: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

8

TRAINING

Long compute cycle

Fixed load (Inelastic)

Stateful

OWNER: Data Scientists

Single user

Page 9: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

9

Analogous to dev tool chain.

Building and iterating over a model

is similar to building an app.

TRAINING

Long compute cycle

Fixed load (Inelastic)

Stateful

OWNER: Data Scientists

Single user

Page 10: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Use CaseJian Yang made an app to recognize food “SeeFood”. Fully trained. Works on his machine.

All rights reserved HBO

Page 11: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Use CaseHe deployed his trained model to a GPU-enabled server

GPU-enabled

Server

?

Page 12: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Use CaseThe app is a hit!

SeeFoodProductivity

All rights reserved HBO

Page 13: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

??

Use Case… and now his server is overloaded.

GPU-enabled

Server

?

xN

?

Page 14: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

MICROSERVICES: the design of a system as

independently deployable, loosely coupled

services.

We’ll be talking about Microservices & Serverless Computing

ADVANTAGES

• Maintainability

• Scalability

• Rolling deployments

SERVERLESS: the encapsulation, starting, and

stopping of singular functions per request, with a

just-in-time-compute model.

ADVANTAGES

• Cost / Efficiency

• Concurrency built-in

• Speed of development

• Improved latency

Page 15: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

15

INFERENCE

Short compute bursts

Elastic

Stateless

OWNER: DevOps

TRAINING

Long compute cycle

Fixed load (Inelastic)

Stateful

OWNER: Data Scientists

Multiple usersSingle user

Analogous to dev tool chain.

Building and iterating over a model

is similar to building an app.

Page 16: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

16

Analogous to dev tool chain.

Building and iterating over a model

is similar to building an app.

Analogous to an OS.

Running concurrent models

requires task scheduling.

INFERENCE

Short compute bursts

Elastic

Stateless

OWNER: DevOps

TRAINING

Long compute cycle

Fixed load (Inelastic)

Stateful

OWNER: Data Scientists

Multiple usersSingle user

Page 17: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

17

Metal or VM Containers

INFERENCE

Short compute bursts

Elastic

Stateless

OWNER: DevOps

TRAINING

Long compute cycle

Fixed load (Inelastic)

Stateful

OWNER: Data Scientists

Multiple usersSingle user

Page 18: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

18

Metal or VM Containers Kubernetes

INFERENCE

Short compute bursts

Elastic

Stateless

OWNER: DevOps

TRAINING

Long compute cycle

Fixed load (Inelastic)

Stateful

OWNER: Data Scientists

Multiple usersSingle user

Page 19: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

19

Metal or VM Containers Kubernetes

INFERENCE

Short compute bursts

Elastic

Stateless

Multiple users

OWNER: DevOps

TRAINING

Long compute cycle

Fixed load (Inelastic)

Stateful

Single user

OWNER: Data Scientists

Page 20: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

20

+ +

● Elastic

● Scalable

● Software agnostic

● Hardware agnostic

=

Why Microservices?

Page 21: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

21

● Cost / Efficiency

● Concurrency built-in

● Improved latency

Why Serverless?

Page 22: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Why Serverless - Cost EfficiencyC

alls

per

Second

Max calls/s

Avg calls/s

40

35

30

25

20

15

10

5

GP

U S

erv

er

Insta

nces

12

AM

02

AM

04

AM

06

AM

08

AM

10

AM

12

PM

02

PM

04

PM

06

PM

08

PM

10

PM

160

140

120

100

80

60

40

20

Jian Yang’s “SeeFood” is most active during lunchtime.

Page 23: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Traditional Architecture - Design for MaximumC

alls

per

Second

Max calls/s

Avg calls/s

40

35

30

25

20

15

10

5

12

AM

02

AM

04

AM

06

AM

08

AM

10

AM

12

PM

02

PM

04

PM

06

PM

08

PM

10

PM

40 machines 24 hours. $648 * 40 = $25,920 per month

GP

U S

erv

er

Insta

nces

160

140

120

100

80

60

40

20

Page 24: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Autoscale Architecture - Design for Local MaximumC

alls

per

Second

Max calls/s

Avg calls/s

40

35

30

25

20

15

10

5

12

AM

02

AM

04

AM

06

AM

08

AM

10

AM

12

PM

02

PM

04

PM

06

PM

08

PM

10

PM

19 machines 24 hours. $648 * 40 = $12,312 per month

GP

U S

erv

er

Insta

nces

160

140

120

100

80

60

40

20

Page 25: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Serverless Architecture - Design for MinimumC

alls

per

Second

Max calls/s

Avg calls/s

40

35

30

25

20

15

10

5

12

AM

02

AM

04

AM

06

AM

08

AM

10

AM

12

PM

02

PM

04

PM

06

PM

08

PM

10

PM

Avg. of 21 calls / sec, or equivalent of 6 machines. $648 * 6 = $3,888 per month

160

140

120

100

80

60

40

20

GP

U S

erv

er

Insta

nces

Page 26: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

??

Why Serverless - Concurrency

GPU-enabled

Servers

?

Load B

ala

ncer

Page 27: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Why Serverless - Improved LatencyPortability = Low Latency

Page 28: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

28

+ +

ALSO:

GPU Memory Management, Job Scheduling, Cloud Abstraction, etc.

Page 29: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

An Operating System for AI

Page 30: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

30

Runtime Abstraction

Support any

programming language

or framework, including

interoperability between

mixed stacks.

Elastic Scale

Prioritize and

automatically optimize

execution of concurrent

short-lived jobs.

Cloud Abstraction

Provide portability to

algorithms, including

public clouds or private

clouds.

Discoverability, Authentication, Instrumentation, etc.

Shell & Services

Kernel

Page 31: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

31

Kernel: Elastic Scale

User

Web Load Balancer

API Load Balancer

Web Servers

API Servers

Cloud Region #1

Worker xN

Docker(algorithm#1)

..

Docker(algorithm#n)

Cloud Region #2

Worker xN

Docker(algorithm#1)

..

Docker(algorithm#n)

Page 32: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

32

Composability

Composability is critical for AI workflows because of data

processing pipelines and ensembles.

Fruit or Veggie

Classifier

Fruit

Classifier

Veggie

Classifier

Page 33: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

33

Kernel: Elastic Scale + Intelligent Orchestration

CPU util, GPU util, Memory

util, IO util

FoodClassifier

CPU util, GPU util, Memory

util, IO util

FruitClassifier

CPU util, GPU util, Memory

util, IO util

VeggieClassifier

Page 34: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

34

Kernel: Elastic Scale + Intelligent Orchestration

Knowing that:

● Algorithm A always calls Algorithm B

● Algorithm A consumes X CPU, X Memory, etc

● Algorithm B consumes X CPU, X Memory, etc

Therefore we can slot them in a way that:

● Reduce network latency

● Increase cluster utilization

● Build dependency graphs

CPU util, GPU util, Memory

util, IO util

FoodClassifier

CPU util, GPU util, Memory

util, IO util

FruitClassifier

CPU util, GPU util, Memory

util, IO util

VeggieClassifier

Page 35: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

35

Kernel: Runtime Abstraction

FoodClassifier

FruitClassifier VeggieClassifier

Page 36: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

36

Kernel: Cloud Abstraction - Storage

# No storage abstraction

s3 = boto3.client("s3")

obj = s3.get_object(Bucket="bucket-name", Key="records.csv")

data = obj["Body"].read()

# With storage abstraction

data = Algorithmia().client.file("blob://records.csv").get()

s3://foo/bar

blob://foo/bar

hdfs://foo/bar

dropbox://foo/bar

etc.

Page 37: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

37

Compute EC2 CE VM Nova

Autoscaling Autoscaling Group Autoscaler Scale Set Heat Scaling Policy

Load BalancingElastic Load

BalancerLoad Balancer Load Balancer LBaaS

Remote Storage Elastic Block Store Persistent Disk File Storage Block Storage

Partial Source: Sam Ghods, KubeConf 2016

Kernel: Cloud Abstraction

Page 38: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

38

Summary - What makes an OS for AI?

Stack-agnostic

Composable

Self-optimizing

Auto-scaling

Monitorable

Discoverability

Page 39: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

iOS/AndroidBuilt-in App Store

(Discoverability)

Punched Cards1970s

UnixMulti-tenancy, Composability

DOSHardware Abstraction

GUI (Win/Mac)Accessibility

Page 40: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Punched Cards1970s

AI is here

iOS/AndroidBuilt-in App Store

(Discoverability)

Page 41: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Diego Oppenheimer

CEO

Thank you!

[email protected]

@doppenhe

FREE STUFF:

Signup with code: NVIDIAGTC18for $50 on us.

Page 42: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

more slides

Page 43: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

43Source: Jerry Chen, Greylock Ventures

The New Moats

Page 44: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

Punched Cards1970s

GitHub and HerokuToday

Page 45: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

# init

client = Algorithmia.client()

# get data (S3)

s3 = boto3.client("s3")

obj = s3.get_object(Bucket="bucket-name", Key="records.csv")

data = obj["Body"].read()

# remove seasonality

data = client.algo("ts/RemoveSeasonality").pipe(data).result

# forecast time series

data = client.algo("ts/ForecastLSTM").pipe(data).result

45

Kernel: Cloud Abstraction - Storage

# init

client = Algorithmia.client()

# get data (anything)

data = client.file("blob://records.csv").get()

# remove seasonality

data = client.algo("ts/RemoveSeasonality").pipe(data).result

# forecast time series

data = client.algo("ts/ForecastLSTM").pipe(data).result

Page 46: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

46

01 # MY_ALGORITHM.py

02

03 client = Algorithmia.client()

04 data = client.file("blob://records.csv").get()

05

06 # remove seasonality

07 data = client.algo("ts/RemoveSeasonality").pipe(data).result

08

09 # forecast time series

10 data = client.algo("ts/ForecastLSTM").pipe(data).result

Page 47: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

47

Kernel: Elastic Scale + Intelligent Orchestration

# MY_ALGORITHM.py

client = Algorithmia.client()

data = client.file("blob://records.csv").get()

# remove seasonality

data = client.algo("ts/RemoveSeasonality").pipe(data).result

# forecast time series

data = client.algo("ts/ForecastLSTM").pipe(data).result

CPU util, GPU util, Memory

util, IO util

MyAlgorithm

CPU util, GPU util, Memory

util, IO util

RemoveSeasonality

CPU util, GPU util, Memory

util, IO util

ForecastLSTM

Page 48: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

48

Kernel: Elastic Scale + Intelligent Orchestration

Knowing that:

● Algorithm A always calls Algorithm B

● Algorithm A consumes X CPU, X Memory, etc

● Algorithm B consumes X CPU, X Memory, etc

Therefore we can slot them in a way that:

● Reduce network latency

● Increase cluster utilization

● Build dependency graphs

CPU util, GPU util, Memory

util, IO util

MyAlgorithm

CPU util, GPU util, Memory

util, IO util

RemoveSeasonality

CPU util, GPU util, Memory

util, IO util

ForecastLSTM

Page 49: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

49

Kernel: Runtime Abstraction

# MY_ALGORITHM.py

client = Algorithmia.client()

data = client.file("blob://records.csv").get()

# remove seasonality

data = client.algo("ts/RemoveSeasonality").pipe(data).result

# forecast time series

data = client.algo("ts/ForecastLSTM").pipe(data).result

MyAlgorithm

RemoveSeasonality ForecastLSTM

Page 50: Building an Operating System for AIon-demand.gputechconf.com/gtc/2018/presentation/s8900-how-micro... · Building an Operating System for AI How Microservices and Serverless Computing

50

Challenges

● Machine learning

○ CPU/GPU/Specialized hardware

○ Multiple frameworks, languages, dependencies

○ Called from different devices/architectures

● “Snowflake” environments

○ Unique cloud hardware and services

● Uncharted territory

○ Not a lot of literature, errors messages sometimes cryptic (can’t just stackoverflow)