Kubernetes & AI with Run:AI, Red Hat & Excelero
AI WEBINAR | Date/Time: Tuesday, June 9 | 9 am PST

Kubernetes & AI with Run:AI, Red Hat & Excelero · Run:AI -Applying HPC Concepts to Kubernetes 20 With the advantages of K8s, plus some concepts from the world of HPC & distributed



Presenters:
• Omri Geller, CEO & Co-Founder
• William Benton, Engineering Manager
• Gil Vitzinger, Software Developer

Your Host: Tom Leyden, VP Marketing

AI WEBINAR: What’s next in technology and innovation?
Kubernetes & AI with Run:AI, Red Hat & Excelero

Kubernetes for AI Workloads
Omri Geller, CEO and co-founder, Run:AI

A Bit of History

Containers scale easily: they are lightweight and efficient, can run any workload, are flexible, and can be isolated. But they need orchestration.

Bare Metal
↓ needed flexibility and better utilization
Virtual Machines
↓ reproducibility and portability
Containers

Enter Kubernetes

• Track, schedule and operationalize
• Execute across different hardware
• Create efficient cluster utilization

Today, 60% of Those Who Deploy Containers Use K8s for Orchestration*

*Source: CNCF

Now let’s talk about AI

Computing Power Fuels Development of AI

Manual Engineering → Classical Machine Learning → Deep Learning


Artificial Intelligence is a Completely Different Ballgame

Experimentation R&D

New accelerators

Distributed computing

Data Science Workflows and Hardware Accelerators are Highly Coupled

Data scientists ↔ hardware accelerators: constant hassles

• Workflow limitations
• Under-utilized GPUs

This Leads to Frustration on Both Sides

• Data scientists are frustrated: speed and productivity are low
• IT leaders are frustrated: GPU utilization is low

AI Workloads are Also Built on Containers

• The container ecosystem for data science is growing
• NGC: NVIDIA pre-trained models for AI experimentation on Docker containers

How Can We Bridge The Divide?


Kubernetes, the De Facto Standard for Container Orchestration

Lacks the following capabilities:
• Multiple queues
• Automatic queueing/de-queueing
• Advanced priorities & policies
• Advanced scheduling algorithms
• Affinity-aware scheduling
• Efficient management of distributed workloads
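One gap in the list above, efficient management of distributed workloads, is often handled with gang (all-or-nothing) scheduling: a multi-worker training job is placed only if every worker fits at once, so a half-placed job never holds GPUs while it waits for the rest. The sketch below is a simplified first-fit illustration, assuming the scheduler knows the free GPUs on each node; `gang_schedule` is a hypothetical name, and this is not how Run:AI or the Kubernetes scheduler is implemented.

```python
from __future__ import annotations


def gang_schedule(free_gpus: dict[str, int], workers: int,
                  gpus_per_worker: int) -> dict[str, str] | None:
    """Place every worker of a distributed job, or none of them.

    free_gpus maps node name -> free GPUs on that node (hypothetical input).
    Returns a worker -> node mapping, or None if the whole gang cannot fit.
    """
    remaining = dict(free_gpus)  # work on a copy; the caller's view is untouched
    placement: dict[str, str] = {}
    for i in range(workers):
        # First fit: the first node (in insertion order) with enough free GPUs.
        node = next((n for n, free in remaining.items()
                     if free >= gpus_per_worker), None)
        if node is None:
            return None  # all-or-nothing: reserve nothing for a partial gang
        remaining[node] -= gpus_per_worker
        placement[f"worker-{i}"] = node
    return placement


if __name__ == "__main__":
    cluster = {"node-a": 4, "node-b": 2}
    print(gang_schedule(cluster, workers=3, gpus_per_worker=2))
    # A 4-worker job does not fit, so no worker is placed at all.
    print(gang_schedule(cluster, workers=4, gpus_per_worker=2))
```

Without the all-or-nothing rule, the 4-worker job would grab 6 GPUs for three workers and block other jobs while waiting for a fourth slot that never frees up.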

How is Experimentation Different?

Distinguishing Between Build and Training Workflows

Build:
• Development & debugging
• Interactive sessions
• Short cycles
• Performance is less important
• Low GPU utilization

Training:
• Training & HPO (hyperparameter optimization)
• Remote execution
• Long workloads
• Throughput is highly important
• High GPU utilization

Solution: Guaranteed Quotas

Fixed quotas:
• Fit build workloads
• GPUs are always available

Guaranteed quotas:
• Fit training workflows
• Users can go over quota
• More concurrent experiments
• More multi-GPU training
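To make the difference between fixed and guaranteed quotas concrete, here is a toy allocator, assuming each project declares a guaranteed number of GPUs and a current request. Every project first receives its request up to its guarantee; GPUs that would otherwise sit idle are then lent round-robin to projects that want to go over quota. Because the allocation is recomputed from scratch, a project that later claims its guarantee takes back GPUs that had been lent out, which in a real system means preempting over-quota jobs. The `Project` class and `allocate` function are hypothetical illustrations, not Run:AI's API or algorithm.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Project:
    name: str
    guaranteed: int  # GPUs the project is always entitled to
    requested: int   # GPUs its current jobs ask for


def allocate(projects: list[Project], total_gpus: int) -> dict[str, int]:
    """Toy guaranteed-quota allocation (hypothetical, for illustration)."""
    if sum(p.guaranteed for p in projects) > total_gpus:
        raise ValueError("guaranteed quotas exceed cluster capacity")
    # Step 1: each project gets what it asks for, up to its guarantee.
    alloc = {p.name: min(p.requested, p.guaranteed) for p in projects}
    free = total_gpus - sum(alloc.values())
    # Step 2: lend idle GPUs round-robin to projects that want to exceed quota.
    hungry = [p for p in projects if p.requested > alloc[p.name]]
    while free > 0 and hungry:
        for p in list(hungry):
            if free == 0:
                break
            alloc[p.name] += 1
            free -= 1
            if alloc[p.name] == p.requested:
                hungry.remove(p)
    return alloc


if __name__ == "__main__":
    # Team A is building (1 GPU); team B is training and would like 8.
    print(allocate([Project("A", 4, 1), Project("B", 4, 8)], total_gpus=8))
    # A now needs its full guarantee, so B's borrowed GPUs are reclaimed.
    print(allocate([Project("A", 4, 4), Project("B", 4, 8)], total_gpus=8))
```

Under fixed quotas, project B would be capped at 4 GPUs in the first call, leaving 3 GPUs idle.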


Queueing Management Mechanism
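A queueing mechanism with priorities and automatic queueing and de-queueing can be sketched with a priority heap: jobs wait until enough GPUs are free and then start on their own, highest priority first and first-in-first-out within a priority. This is a simplified, hypothetical model; `GpuJobQueue` is not a Run:AI or Kubernetes API, and it deliberately uses strict priority with no backfilling.

```python
from __future__ import annotations

import heapq
import itertools


class GpuJobQueue:
    """Toy GPU job queue with automatic de-queueing (hypothetical sketch).

    Lower priority number = more important. Ties are served in submission
    order. Strict priority: if the job at the head does not fit, nothing
    behind it starts (no backfilling).
    """

    def __init__(self, total_gpus: int) -> None:
        self.free = total_gpus
        self._heap: list[tuple[int, int, str, int]] = []
        self._seq = itertools.count()  # FIFO tie-breaker within a priority
        self.running: dict[str, int] = {}

    def submit(self, name: str, gpus: int, priority: int = 1) -> list[str]:
        heapq.heappush(self._heap, (priority, next(self._seq), name, gpus))
        return self._schedule()

    def finish(self, name: str) -> list[str]:
        self.free += self.running.pop(name)
        return self._schedule()

    def _schedule(self) -> list[str]:
        """Start queued jobs while the head of the queue fits."""
        started = []
        while self._heap and self._heap[0][3] <= self.free:
            _, _, name, gpus = heapq.heappop(self._heap)
            self.free -= gpus
            self.running[name] = gpus
            started.append(name)
        return started


if __name__ == "__main__":
    q = GpuJobQueue(total_gpus=4)
    print(q.submit("train-a", gpus=4))               # starts immediately
    print(q.submit("train-b", gpus=2))               # queued
    print(q.submit("notebook", gpus=1, priority=0))  # queued, ahead of train-b
    print(q.finish("train-a"))                       # both start automatically
```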


Run:AI - Stitching it All Together

Run:AI - Applying HPC Concepts to Kubernetes

With the advantages of K8s, plus some concepts from the world of HPC and distributed computing, we can bridge the gap:

• Data science teams gain productivity and speed
• IT teams gain visibility and maximal GPU utilization

Run:AI - Kubernetes-Based Abstraction Layer

• INTEGRABLE: Easily integrates with IT and data science platforms
• MULTI-CLOUD: Runs on any public, private, or hybrid cloud environment
• IT GOVERNANCE: Policy-based orchestration and queueing management

Run:AI

• Utilize Kubernetes across IT to improve resource utilization
• Speed up the experimentation process and time to market
• Easily scale infrastructure to meet the needs of the business

From 28% to 73% utilization, 2X speed, and $1M savings

Challenge: 28% average GPU utilization; inefficient and underutilized resources

Solution: after implementing Run:AI’s platform, 73% average GPU utilization
• Enabled 2x more experiments to run
• Saved $1M in additional GPU expenditures for 2020

Run:AI at a Glance

Venture Funded

• Founded in 2018
• Backed by top VCs
• Offices in Tel Aviv, New York, and Boston
• Fortune 500 customers
• Top cloud and virtualization engineers

Thank you


NVMesh in Kubernetes

What Is the NVMesh CSI Driver?

● What is the NVMesh CSI Driver?
○ CSI: Container Storage Interface
○ NVMesh as a storage backend in Kubernetes

● Main Features
○ Static provisioning
○ Dynamic provisioning
○ Block and file system volumes
○ Access modes (ReadWriteOnce, ReadWriteMany, ReadOnlyMany)
○ Extend volumes
○ Using NVMesh VPGs
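In Kubernetes, extending a dynamically provisioned volume generally requires the StorageClass to set `allowVolumeExpansion: true`; the PVC's `spec.resources.requests.storage` is then raised, for example with `kubectl patch pvc <name> --type merge -p '<patch>'`. How the NVMesh driver handles a given resize is assumed here rather than confirmed, so check its documentation. The sketch builds such a merge patch and refuses to shrink; `to_bytes` and `expansion_patch` are hypothetical helpers, and the quantity parser is a deliberate simplification that only understands the binary suffixes Ki to Ti.

```python
import json

# Binary suffixes only; a simplification of Kubernetes' quantity format.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity: str) -> int:
    """Parse a quantity such as '15Gi' (only the suffixes above are handled)."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes


def expansion_patch(current: str, new: str) -> str:
    """Return a JSON merge patch that grows a PVC from `current` to `new`."""
    if to_bytes(new) <= to_bytes(current):
        raise ValueError("new size must be larger than the current size")
    patch = {"spec": {"resources": {"requests": {"storage": new}}}}
    return json.dumps(patch)


if __name__ == "__main__":
    # Pass the printed patch to: kubectl patch pvc block-pvc --type merge -p '...'
    print(expansion_patch("15Gi", "20Gi"))
```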


CSI Driver Components

• NVMesh Management, exposing a REST API
• NVMesh CSI Controller, working alongside the Kubernetes Controller
• NVMesh CSI Node Driver on each node, each paired with an NVMesh Client
• NVMesh Targets

Dynamic Provisioning & Attach Flow

1. The user creates a Persistent Volume Claim (PVC). The Kubernetes Controller and the NVMesh CSI Controller ask NVMesh Management to create the volume (Create Volume) on the NVMesh Targets.
2. The user creates a Pod that uses the PVC. The volume is attached on the node (Attach / Detach), and the NVMesh Client exposes it as /dev/nvmesh/v1. The NVMesh CSI Node Driver then performs the OS mount and the Kubernetes internal mount, and finally the Pod mount into the user application Pods. Data flows between the node and the NVMesh Targets.

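Exposing an NVMesh Volume in a Pod

• CSI Stage Volume (once for each volume on the node): NVMesh attach, through the NVMesh Client, exposes the volume as /dev/nvmesh/v1. For a FileSystem volume, the device is formatted (mkfs) and mounted at kubelet/volume/mount.
• CSI Publish Volume (for each volume, for each Pod): the volume is bind-mounted into each Pod's volume directory, e.g. kubelet/pod1/volumes/v1 for User App POD 1 and kubelet/pod2/volumes/v1 for User App POD 2. A Block volume is bind-mounted as a device instead of being formatted and mounted.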
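The split between the two CSI node calls behaves like reference counting: staging happens once per volume per node, publishing once per Pod, and unstaging only after the last Pod on that node stops using the volume. The sketch below only records which steps would run; it performs no real attach or mount, and `NodeVolumeTracker`, its methods, and the log strings are hypothetical, loosely following the paths in the list above.

```python
from __future__ import annotations


class NodeVolumeTracker:
    """Records CSI node steps for volumes on a single node (simulation only)."""

    def __init__(self) -> None:
        self.publishers: dict[str, set[str]] = {}  # volume -> pods using it
        self.log: list[str] = []

    def publish(self, volume: str, pod: str, filesystem: bool = True) -> None:
        if volume not in self.publishers:
            # NodeStageVolume: once per volume on this node.
            self.log.append(f"stage {volume}: attach /dev/nvmesh/{volume}")
            if filesystem:
                self.log.append(f"stage {volume}: format if needed + mount at staging path")
            self.publishers[volume] = set()
        # NodePublishVolume: once per volume per pod.
        self.publishers[volume].add(pod)
        self.log.append(f"publish {volume} -> {pod}/volumes/{volume} (bind mount)")

    def unpublish(self, volume: str, pod: str) -> None:
        self.publishers[volume].discard(pod)
        self.log.append(f"unpublish {volume} from {pod}")
        if not self.publishers[volume]:
            # The last pod on this node is gone: NodeUnstageVolume.
            del self.publishers[volume]
            self.log.append(f"unstage {volume}: unmount + detach")


if __name__ == "__main__":
    node = NodeVolumeTracker()
    node.publish("v1", "pod1")
    node.publish("v1", "pod2")  # already staged: only a second bind mount
    print("\n".join(node.log))
```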


Usage Examples

    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: block-pvc
    spec:
      accessModes:
        - ReadWriteMany
      volumeMode: Block
      resources:
        requests:
          storage: 15Gi
      storageClassName: nvmesh-raid10

    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: nvmesh-custom-vpg
    provisioner: nvmesh-csi.excelero.com
    parameters:
      vpg: your_custom_vpg
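The `block-pvc` claim requests a raw block device (`volumeMode: Block`). In Kubernetes, a Pod consumes such a claim through `volumeDevices` with a `devicePath`, not through `volumeMounts`, which are for filesystem-mode volumes. The Python sketch below builds that Pod manifest as a plain dict and prints it as JSON, which `kubectl apply -f -` also accepts. The helper name, Pod name, container image, and device path are illustrative assumptions.

```python
import json


def pod_with_block_pvc(pod_name: str, claim: str, image: str,
                       device_path: str = "/dev/xvda") -> dict:
    """Build a Pod manifest that exposes a Block-mode PVC as a raw device."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                # Block-mode claims use volumeDevices, not volumeMounts.
                "volumeDevices": [{"name": "data", "devicePath": device_path}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim},
            }],
        },
    }


if __name__ == "__main__":
    manifest = pod_with_block_pvc("block-consumer", "block-pvc",
                                  image="registry.example.com/app:latest")
    print(json.dumps(manifest, indent=2))  # e.g. pipe into: kubectl apply -f -
```

Because the claim is ReadWriteMany, several such Pods could reference `block-pvc` at once; the application itself must then coordinate writes to the shared raw device.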


Summary

NVMesh Benefits for Kubernetes:

● Persistent storage that scales for stateful applications

● Predictable application performance – ensure that storage is not a bottleneck

● Scale your performance and capacity linearly

● Containers in a pod can access persistent storage presented to that pod, but with the freedom to restart the pod on an alternate physical node

● Choice of Kubernetes PVC access mode to match the storage to the application and file system requirements


Machine learning discovery, workflows, and systems on Kubernetes

William Benton, Engineering Manager and Senior Principal Engineer, Red Hat, Inc.

codifying problem and metrics → data collection and cleaning → feature engineering → model training and tuning → model validation → model deployment → monitoring, validation


Infrastructure surrounding the model code in a production ML system:

• configuration
• data collection
• feature extraction
• process management
• analysis tools
• monitoring
• serving infrastructure
• machine resource management
• data verification

(Adapted from Sculley et al., “Hidden Technical Debt in Machine Learning Systems.” NIPS 2015)


Who works with the data:

• data engineers: federate events, databases, and file/object storage; transform; archive
• data scientists: federate; transform (through a developer UI); train models
• application developers: use the models in management, web and mobile, and reporting applications


How Kubernetes can help

Immutable images

base image
configuration and installation recipes
user application code

Image layer hashes: 979229b9, 33721112, e8cae4f6, 2bb6ab16, a8296f7e, a6afd91e, 6b8cad3e

model in production on 16 July 2019
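Container image layers are content-addressed: each layer is identified by a digest of its contents, and the image identity covers the ordered list of layer digests. Changing any layer therefore yields a different image, while unchanged layers keep their IDs and can be reused, which is what lets an image pin down exactly which model was in production on a given date. The sketch imitates this with SHA-256 over toy layer contents; it is a simplification (real OCI images hash tar archives and a JSON manifest), `digest` and `image_id` are hypothetical helpers, and the 8-character IDs are truncated only for readability.

```python
from __future__ import annotations

import hashlib


def digest(data: bytes) -> str:
    """Short SHA-256 digest (first 8 hex chars, for readability only)."""
    return hashlib.sha256(data).hexdigest()[:8]


def image_id(layers: list[bytes]) -> tuple[list[str], str]:
    """Return per-layer digests and an image digest computed over them."""
    layer_ids = [digest(layer) for layer in layers]
    # The image identity covers the ordered list of layer digests.
    return layer_ids, digest("\n".join(layer_ids).encode())


if __name__ == "__main__":
    base = b"base image"
    recipes = b"pip install -r requirements.txt"
    ids1, img1 = image_id([base, recipes, b"model code v1"])
    ids2, img2 = image_id([base, recipes, b"model code v2"])
    print(ids1[:2] == ids2[:2])  # shared lower layers keep their IDs
    print(img1 != img2)          # different code, different image
```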

Stateless microservices

Declarative app configuration

https://route.my-awesome-app.ai
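Declarative configuration means stating the desired state, for example how many replicas of a model-serving app should run behind a route, and letting controllers converge the actual state toward it. A minimal reconciliation sketch, assuming a toy in-memory view of running replicas; the `reconcile` function and its action strings are hypothetical and far simpler than a real Kubernetes controller.

```python
from __future__ import annotations


def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Compute actions that move actual replica counts toward desired ones.

    Both dicts map app name -> replica count. Apps present only in `actual`
    are no longer declared, so they are scaled down to zero.
    """
    actions = []
    for app in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(app, 0), actual.get(app, 0)
        if want > have:
            actions.append(f"start {want - have} replica(s) of {app}")
        elif want < have:
            actions.append(f"stop {have - want} replica(s) of {app}")
    return actions


if __name__ == "__main__":
    desired = {"my-awesome-app": 3}
    actual = {"my-awesome-app": 1, "old-model": 2}
    for action in reconcile(desired, actual):
        print(action)
```

Once the actual state matches the declaration, `reconcile` returns no actions, so running it repeatedly is safe; that idempotence is what makes a declarative description a reliable source of truth.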

Integration and deployment

OK!

base image
configuration and installation recipes
application code

Data drift
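Data drift means the distribution of inputs seen in production moves away from the distribution the model was trained on. The slides do not name a drift metric; as one common illustration, the population stability index (PSI) compares binned feature distributions, summing (actual − expected) × ln(actual / expected) over the bins. The sketch assumes fixed bin edges and clips empty bins to a small epsilon; `bin_fractions` and `psi` are hypothetical helpers, and the thresholds often quoted for PSI (around 0.1 and 0.25) are rules of thumb, not properties of the method.

```python
from __future__ import annotations

import bisect
import math


def bin_fractions(values: list[float], edges: list[float]) -> list[float]:
    """Fraction of values in each bin defined by sorted interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected: list[float], actual: list[float], edges: list[float],
        eps: float = 1e-4) -> float:
    """Population stability index between a training and a production sample."""
    e = bin_fractions(expected, edges)
    a = bin_fractions(actual, edges)
    total = 0.0
    for ei, ai in zip(e, a):
        ei, ai = max(ei, eps), max(ai, eps)  # avoid log(0) for empty bins
        total += (ai - ei) * math.log(ai / ei)
    return total


if __name__ == "__main__":
    edges = [0.25, 0.5, 0.75]
    train = [i / 100 for i in range(100)]          # spread evenly over [0, 1)
    same = [i / 100 + 0.005 for i in range(100)]   # same shape: PSI of 0
    shifted = [0.5 + i / 200 for i in range(100)]  # mass moved into [0.5, 1)
    print(round(psi(train, same, edges), 3))
    print(round(psi(train, shifted, edges), 3))
```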


On-demand discovery with the Open Data Hub

$$
\begin{bmatrix}
0&0&0&1&1&0&1&0&1&0\\
0&0&1&0&0&0&1&1&0&0\\
1&0&1&1&0&1&0&0&0&0\\
0&0&0&0&0&0&1&1&0&1\\
0&1&0&0&1&0&0&1&0&0\\
1&0&0&0&0&1&0&1&1&0\\
0&0&1&0&1&0&1&0&0&0\\
0&1&0&0&0&1&0&0&1&1\\
0&0&0&0&1&0&0&1&0&1\\
1&1&0&0&0&0&0&0&0&1
\end{bmatrix}
\ast
\begin{bmatrix}
0.13&0.13\\
0.06&0.07\\
0.07&0.06\\
0.02&0.08\\
0.17&0.11\\
0.11&0.09\\
0.04&0.18\\
0.13&0.04\\
0.13&0.21\\
0.14&0.03
\end{bmatrix}
$$

• more storage
• sensitive data
• more CPUs
• better GPUs

https://opendatahub.io

PostgreSQL, MariaDB, Apache Spark SQL
Apache Kafka (via Strimzi)
Red Hat Ceph Storage
TensorFlow Serving, PyTorch Serving, Seldon
Spark, Katib, TFJob, PyTorch
Argo, Kubeflow Pipelines
OpenShift
JupyterHub, Apache Superset
Grafana, Prometheus

codifying problem and metrics → data collection and cleaning → feature engineering → model training and tuning → model validation → model deployment → monitoring, validation

OpenShift Pipelines: the feature engineering and model training and tuning stages (marked 2 and 3) are automated as a pipeline.

REST endpoint

OpenShift Serverless
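Serving a model as a REST endpoint fits serverless platforms best when the service is stateless: each request carries all the inputs it needs, so instances can be added or removed freely (including scaling to zero where the platform supports it). A minimal WSGI sketch using only Python's standard library; the linear model, its weights, the port, and the JSON request shape `{"features": [...]}` are hypothetical placeholders, not an Open Data Hub or OpenShift Serverless API.

```python
import io
import json
from wsgiref.simple_server import make_server

# Hypothetical model parameters, loaded once at startup and never mutated.
WEIGHTS = [0.4, -0.2, 0.1]
BIAS = 0.05


def predict(features: list) -> float:
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def app(environ, start_response):
    """Stateless handler: POST {"features": [...]} -> {"score": ...}."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"POST JSON to this endpoint\n"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        payload = json.loads(environ["wsgi.input"].read(length))
        features = [float(x) for x in payload["features"]]
        if len(features) != len(WEIGHTS):
            raise ValueError("wrong number of features")
    except (ValueError, KeyError, TypeError):
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b'expected {"features": [three numbers]}\n']
    body = json.dumps({"score": predict(features)}).encode()
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]


def call(payload: dict) -> tuple:
    """Invoke the app in-process with a JSON POST (no network needed)."""
    data = json.dumps(payload).encode()
    environ = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(data)),
               "wsgi.input": io.BytesIO(data)}
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    # Serve on port 8080 (an arbitrary choice) until interrupted.
    with make_server("", 8080, app) as server:
        server.serve_forever()
```

Because `app` keeps nothing between requests, any replica can answer any request, which is what allows the platform to scale it freely.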


Further resources

Open Data Hub web site: https://opendatahub.io

Contribute: https://github.com/opendatahub-io

Get involved: https://gitlab.com/opendatahub/opendatahub-community

ML workflows on OpenShift and Open Data Hub: https://bit.ly/ml-workflows-ocp


Thank you!