© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved | SageMaker Operators and Components Overview | Alex Chung, Senior Product Manager | Hallie Crosby, Service Solutions Architect | Amazon SageMaker and Kubernetes

Amazon SageMaker and Kubernetes - AWS

Page 1: Amazon SageMaker and Kubernetes - AWS

SageMaker Operators and Components Overview

Alex Chung, Senior Product Manager

Hallie Crosby, Service Solutions Architect

Amazon SageMaker and Kubernetes

Page 2: Amazon SageMaker and Kubernetes - AWS

AMAZON SAGEMAKER

KUBERNETES ECOSYSTEM

SCALING

Kubernetes Amazon SageMaker

Agenda

Overview of Amazon SageMaker

Adopting SageMaker

Overview of open source routes to SageMaker

Scaling ML with SageMaker

Resources to get started

Page 3: Amazon SageMaker and Kubernetes - AWS

The AWS ML stack: Broadest and most complete set of machine learning capabilities

AI SERVICES

Use cases: Vision, Speech, Text, Search, Chatbots, Personalization, Forecasting, Fraud, Development, Contact centers

Services: Amazon Rekognition, Amazon Polly, Amazon Transcribe (+Medical), Amazon Comprehend (+Medical), Amazon Translate, Amazon Textract, Amazon Kendra, Amazon Lex, Amazon Personalize, Amazon Forecast, Amazon Fraud Detector, Amazon CodeGuru, Contact Lens for Amazon Connect

ML SERVICES

Amazon SageMaker: SageMaker Studio IDE, Ground Truth, AWS Marketplace for ML, Neo, Augmented AI, Built-in algorithms, Notebooks, Experiments, Processing, Model training & tuning, Debugger, Autopilot, Model hosting, Model Monitor

ML FRAMEWORKS & INFRASTRUCTURE

Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, FPGA, Deep Graph Library, scikit

Page 4: Amazon SageMaker and Kubernetes - AWS

The machine learning workflow is iterative and complex

Collect and prepare training data

Choose or bring your own ML algorithm

Set up and manage environments for training

Train, debug, and tune models

Manage training runs

Deploy model in production

Monitor models

Validate predictions

Scale and manage the production environment

Page 5: Amazon SageMaker and Kubernetes - AWS

Use Amazon SageMaker to train and deploy models into production

Collect and prepare training data: Fully managed data processing jobs and data labeling workflows

Choose or bring your own ML algorithm: Collaborative notebooks, built-in algorithms/models

Set up and manage environments for training: One-click training

Train, debug, and tune models: Debugging and optimization

Manage training runs: Visually track and compare experiments

Deploy model in production: One-click deployment and auto-scaling

Monitor models: Automatically spot concept drift

Validate predictions: Add human review of predictions

Scale and manage the production environment: Fully managed with auto-scaling for 75% less

WEB-BASED IDE FOR ML

ML OPS LIFECYCLE

Page 6: Amazon SageMaker and Kubernetes - AWS

Experimentation, model development, and BYOS

Use script mode to build containers quickly that can be deployed in prod (or bring your own container)

Use SM Hosting for deployment of models

Experiment using SageMaker Studio and Experiments Manager

Iterate on new models when business use case changes

Page 7: Amazon SageMaker and Kubernetes - AWS

Amazon SageMaker Studio: Fully integrated development environment (IDE) for machine learning

Collaboration at scale: Without tracking code dependencies

Easy experiment management: Organize, track, and compare thousands of experiments

Automatic model generation: Full visibility and control without writing code

Higher quality ML models: Automatically debug errors, monitor models, and maintain high quality

Increased productivity: Code, build, train, deploy, and monitor in a unified visual interface

Page 8: Amazon SageMaker and Kubernetes - AWS

Amazon SageMaker Experiments: Organize, track, and compare training experiments

Tracking at scale: Track parameters and metrics across experiments and users

Custom organization: Organize experiments by teams, goals, and hypotheses

Visualization: Easily visualize experiments and compare

Metrics and logging: Log custom metrics using the Python SDK and APIs

Fast iteration: Quickly go back and forth, and maintain high quality

Page 9: Amazon SageMaker and Kubernetes - AWS

Use Amazon SageMaker Experiments to track and manage thousands of experiments

Page 10: Amazon SageMaker and Kubernetes - AWS

Pain points of self-managed ML platforms

The following are all obstacles to the core goal of building best-in-class models that solve business problems

Configuring proper scaling of compute has a learning curve

Right sizing instances for cost-efficiency is hard

Kubeflow needs additional configuration to use GPU or CPU nodes optimally

Libraries and toolkits need to be regularly updated, which increases technical debt that later needs to be paid off

Setting up k8s without prior experience is challenging

Additional management burden for the ops team

Page 11: Amazon SageMaker and Kubernetes - AWS

EXAMPLE CHALLENGE: Scaling Machine Learning

Single-instance compute (CPU or GPU)

Scaling to multiple instances to maximize performance

[Diagram labels: CLI, EC2 instance, Cluster]

Page 12: Amazon SageMaker and Kubernetes - AWS

SageMaker provides the building blocks for scalable machine learning

Includes over a dozen first-party algorithms, such as XGBoost

Convert existing containers with minimal changes to run in SageMaker

Deep Learning Containers provide the base layer of Apache MXNet, PyTorch, and TensorFlow frameworks

Integrated Debugger

Ground Truth data labelling

Compute is entirely managed by SageMaker. You specify parameters, instances, etc., and SageMaker makes it happen

One field change for Spot Instances
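As a rough illustration of "you specify parameters, instances, etc." and of the Spot switch, the sketch below builds a dictionary shaped like the SageMaker CreateTrainingJob request. The helper name `build_training_request`, the default instance type, and the volume size are assumptions made for this example. Note that enabling Managed Spot Training also requires a maximum wait time in the stopping condition, so in practice the change is a flag plus one limit.

```python
def build_training_request(job_name, image_uri, role_arn, output_s3,
                           instance_type="ml.m5.xlarge", instance_count=1,
                           use_spot=False, max_runtime_s=3600, max_wait_s=7200):
    """Return a dict shaped like the SageMaker CreateTrainingJob request.

    Hypothetical helper for illustration; it only assembles the request
    and never calls AWS.
    """
    stopping = {"MaxRuntimeInSeconds": max_runtime_s}
    if use_spot:
        # Spot jobs need a total wait budget that covers the runtime limit.
        if max_wait_s < max_runtime_s:
            raise ValueError("max_wait_s must be >= max_runtime_s")
        stopping["MaxWaitTimeInSeconds"] = max_wait_s
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3},
        # Scaling out is a matter of raising InstanceCount.
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 30,  # assumed size for the example
        },
        "StoppingCondition": stopping,
        # The Spot switch referred to on the slide.
        "EnableManagedSpotTraining": use_spot,
    }
```

With AWS credentials configured, a request like this could be passed to `boto3.client("sagemaker").create_training_job(**request)`; the image URI, role ARN, and bucket would be your own.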

Page 13: Amazon SageMaker and Kubernetes - AWS

Common IT constraints where SageMaker can still provide managed ML services

Hybrid cloud mandates

Portability requirements of application stack

Prior technology investments such as DIY ML platforms

On-premises data restrictions

Page 14: Amazon SageMaker and Kubernetes - AWS

Scaling ML with Amazon SageMaker from Kubernetes

1. Amazon SageMaker Components for Kubeflow Pipelines

2. Amazon SageMaker Operators for Kubernetes

Page 15: Amazon SageMaker and Kubernetes - AWS

ARCHITECTURE: Kubernetes

[Diagram: a developer submits YAML with kubectl to the master (API Server, Scheduler, Controller Manager, etcd); three worker nodes each run a kubelet, kube-proxy, and a container runtime hosting pods.]

Page 16: Amazon SageMaker and Kubernetes - AWS

Kubeflow Pipelines

End-to-end ML workflow orchestration

Experimentation and managing various trials/experiments

Reusable components and pipelines to create end-to-end solutions without having to rebuild each time

Page 17: Amazon SageMaker and Kubernetes - AWS

SageMaker Components for Kubeflow Pipelines help modularize your code

For each step you can develop code, package it into a container (or let SageMaker package it for you), and make that a default run that anyone within your company can mix and match

Batch transform

Training

Model deployment and updates

Ground Truth data labelling

Processing

Page 18: Amazon SageMaker and Kubernetes - AWS

SageMaker + Kubeflow for Machine Learning

Amazon SageMaker

Model development

Model training

Model deployment

Data preparation

Page 19: Amazon SageMaker and Kubernetes - AWS

Portability story and SageMaker (BYOC/BYOS)

Code and containers that run in SageMaker can run anywhere

Models developed in Kubeflow can be submitted to SageMaker for managed execution

Using open-source KFP components and Kubernetes operators, you can swap back to Kubernetes at any time

Page 20: Amazon SageMaker and Kubernetes - AWS

BYOC can run in a standard K8s environment

[Diagram labels: Model code, Amazon S3, Amazon SageMaker; BYOC container, Amazon Elastic Kubernetes Service]

Code can run in any generic container that you build yourself

BYOS is model code uploaded to S3 that then gets ingested by SageMaker

BYOC container built for SageMaker can be run in Kubernetes without SageMaker
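To make the last point concrete, here is a minimal sketch of running a SageMaker-style training container as a plain Kubernetes Job, with no SageMaker involved. The helper name `byoc_job_manifest`, the image name, and the command are placeholders assumed for this example; the manifest uses the standard `batch/v1` Job shape.

```python
import json


def byoc_job_manifest(name, image, command, env=None):
    """Build a Kubernetes batch/v1 Job that runs a training container once.

    Hypothetical helper: image and command are supplied by the caller.
    """
    container = {
        "name": "train",
        "image": image,
        "command": command,
        # Kubernetes expects env values as strings.
        "env": [{"name": k, "value": v} for k, v in sorted((env or {}).items())],
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed training run
            "template": {
                "spec": {"restartPolicy": "Never", "containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    manifest = byoc_job_manifest(
        name="byoc-train",
        image="example.com/my-team/byoc-train:latest",  # placeholder image
        command=["python", "/opt/ml/code/train.py"],
        env={"EPOCHS": "3"},
    )
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and submitted with `kubectl apply -f`. Data access, GPU requests, and storage are left out and would need to be added for a real workload.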

Page 21: Amazon SageMaker and Kubernetes - AWS

Amazon SageMaker accessible from Kubernetes

Kubeflow: Hybrid infrastructures, portability, composability

Amazon SageMaker: Fully managed infrastructure

Ground Truth labeling

Automatic model tuning

Built-in optimized algorithms

Managed Spot Training

Scalable inference endpoints

Model monitoring

Easy scalability

Page 22: Amazon SageMaker and Kubernetes - AWS

SageMaker Components vs. SageMaker Operators

                 SageMaker Components             SageMaker Operators
ARCHITECTURE     Kubeflow Pipelines components    Kubernetes Operator custom resources
KUBERNETES       Yes                              Yes
ORCHESTRATION    Self-hosted Kubeflow Pipelines   Kubernetes tools (e.g., Flyte, Argo)
DEV INTERFACE    Python                           YAML/custom extension by customer
GUI              KFP dashboard                    None/custom
EASE OF USE      Medium                           Advanced

Page 23: Amazon SageMaker and Kubernetes - AWS

RESOURCES

Getting started: CloudFormation Quick Start

SageMaker Components for Kubeflow Pipelines

SageMaker Operators for Kubernetes

Page 24: Amazon SageMaker and Kubernetes - AWS

Multiple ways to get started

Open-source standard APIs: K8s operators, KFP components

Quick Start template

Examples in GitHub repo

Page 25: Amazon SageMaker and Kubernetes - AWS

Amazon SageMaker Components for Kubeflow Pipelines

[Diagram: a Kubeflow pipeline made of pipeline steps; Amazon SageMaker components sit alongside other components.]

Component

Metadata

Input/output

Implementation (container)

Amazon EC2 Container Registry
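The three parts of a component listed above (metadata, input/output, and a container implementation) can be modelled in a few lines. This is a plain-Python sketch rather than the Kubeflow Pipelines SDK: the `Component` class, the `unresolved_inputs` helper, and the image names are all assumptions for illustration, and the nested layout of `to_spec` only loosely mirrors a component definition file.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Component:
    """One reusable pipeline step: metadata, inputs/outputs, and a container."""
    name: str                                   # metadata
    description: str                            # metadata
    image: str                                  # implementation: container image
    command: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)

    def to_spec(self) -> Dict:
        """Render the component as a nested dict with one key per part."""
        return {
            "name": self.name,
            "description": self.description,
            "inputs": [{"name": n} for n in self.inputs],
            "outputs": [{"name": n} for n in self.outputs],
            "implementation": {
                "container": {"image": self.image, "command": self.command},
            },
        }


def unresolved_inputs(steps: List[Component],
                      pipeline_params: Set[str]) -> Dict[str, List[str]]:
    """Return, per step, inputs that no earlier step or pipeline parameter provides."""
    available = set(pipeline_params)
    missing: Dict[str, List[str]] = {}
    for step in steps:
        gaps = [name for name in step.inputs if name not in available]
        if gaps:
            missing[step.name] = gaps
        # Outputs of this step become available to later steps.
        available.update(step.outputs)
    return missing
```

Because each step declares its inputs and outputs, steps can be reordered or swapped and a check like `unresolved_inputs` shows whether the pipeline still wires up, which is the mix-and-match property the components are meant to provide.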

Page 26: Amazon SageMaker and Kubernetes - AWS

Adapt your container for SageMaker training

1. Switch to a SageMaker-maintained Deep Learning Container as the base image, or keep your own base image and pip install sagemaker-training:

    FROM tensorflow/tensorflow:2.2.0rc2-gpu-py3-jupyter
    RUN pip install sagemaker-training

2. Place the training code under the /opt/ml/code directory:

    COPY train.py /opt/ml/code/train.py

3. Define train.py as the script entry point:

    ENV SAGEMAKER_PROGRAM train.py

https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html
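For context, a `train.py` that fits the container steps above might look like the sketch below. It assumes the sagemaker-training toolkit conventions: hyperparameters arrive as command-line arguments, and paths are exposed through `SM_*` environment variables. The input channel is assumed to be named `training`, and the "training" itself is a placeholder that only writes a marker file.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters from the command line and paths from SM_* variables."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=1)
    # Falls back to the conventional SageMaker paths when the variables are unset.
    parser.add_argument("--model-dir",
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train",
                        default=os.environ.get("SM_CHANNEL_TRAINING",
                                               "/opt/ml/input/data/training"))
    return parser.parse_args(argv)


def main(argv=None):
    """Placeholder training loop: record the settings where the model would go."""
    args = parse_args(argv)
    os.makedirs(args.model_dir, exist_ok=True)
    marker = os.path.join(args.model_dir, "model.txt")
    with open(marker, "w") as handle:
        handle.write(f"epochs={args.epochs} train_dir={args.train}\n")
    return marker
```

A real script would replace the marker file with model training and saving, and would end with the usual `if __name__ == "__main__": main()` guard so the container can launch it. Since the paths can be overridden by arguments, the same script also runs outside SageMaker.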

Page 27: Amazon SageMaker and Kubernetes - AWS

ARCHITECTURE: Operator

[Diagram: a developer submits YAML with kubectl to the master (API Server, Scheduler, etcd); the operator runs on a worker node alongside the kubelet, kube-proxy, container runtime, and pods.]

Page 28: Amazon SageMaker and Kubernetes - AWS

SageMaker Operators

KEY FEATURES

Amazon SageMaker Operators for training, tuning, inference

Natively interact with Amazon SageMaker jobs using Kubernetes tools (e.g., get pods, describe)

Stream and view logs from Amazon SageMaker in Kubernetes

Helm Charts to assist with setup and spec creation

[Diagram: kubectl apply (YAML) to the Kubernetes API Server, to the Amazon SageMaker Operator, to Amazon SageMaker]

Page 29: Amazon SageMaker and Kubernetes - AWS

Amazon SageMaker Operators for Kubernetes

Train, tune, and deploy models in Amazon SageMaker without leaving your Kubernetes environment

Use Kubernetes kubectl CLI to submit Amazon SageMaker jobs:

• Training jobs

• Hyperparameter tuning jobs

• Hosting deployments

• Batch transform jobs
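As a hedged illustration of submitting a training job through kubectl, the sketch below generates a `TrainingJob` custom resource as JSON. The helper name `training_job_resource`, the default instance type, the volume size, and the runtime limit are assumptions for this example. The `apiVersion` and field names follow the published operator samples as best recalled and should be checked against the CRDs installed in your cluster.

```python
import json


def training_job_resource(name, role_arn, region, image, output_s3,
                          instance_type="ml.m5.xlarge", instance_count=1):
    """Build a TrainingJob custom resource for the SageMaker operator.

    Hypothetical helper; field names are assumed from operator samples.
    """
    return {
        "apiVersion": "sagemaker.aws.amazon.com/v1",
        "kind": "TrainingJob",
        "metadata": {"name": name},
        "spec": {
            "roleArn": role_arn,
            "region": region,
            "algorithmSpecification": {
                "trainingImage": image,
                "trainingInputMode": "File",
            },
            "outputDataConfig": {"s3OutputPath": output_s3},
            "resourceConfig": {
                "instanceCount": instance_count,
                "instanceType": instance_type,
                "volumeSizeInGB": 30,  # assumed size for the example
            },
            "stoppingCondition": {"maxRuntimeInSeconds": 3600},
        },
    }


if __name__ == "__main__":
    resource = training_job_resource(
        name="demo-training-job",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # placeholder
        region="us-west-2",
        image="123456789012.dkr.ecr.us-west-2.amazonaws.com/example:latest",  # placeholder
        output_s3="s3://example-bucket/output",  # placeholder
    )
    print(json.dumps(resource, indent=2))
```

The output could be saved to a file and submitted with `kubectl apply -f`, after which the job should be listable with `kubectl get trainingjob`. Input data channels and hyperparameters are omitted here for brevity.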

Page 30: Amazon SageMaker and Kubernetes - AWS

Summary

SageMaker provides a fully managed service for authoring ML (Studio, Notebooks) and for compute infrastructure

Companies with hybrid (on-premises and cloud) use cases can use SageMaker with existing Kubernetes tools

Customers may have existing infrastructure on Kubernetes

Enterprises may want to adopt an open-source ML platform or may be looking for a multi-cloud strategy

Page 31: Amazon SageMaker and Kubernetes - AWS

ADDITIONAL RESOURCES

ONLINE WORKSHOP

https://eksworkshop.com/advanced/420_kubeflow/pipelines/

DOCUMENTATION

https://www.kubeflow.org/docs/aws/

BLOGS

https://towardsdatascience.com/kubernetes-and-amazon-sagemaker-for-machine-learning-distributed-training-hyperparameter-tuning-187c821e25b4

https://towardsdatascience.com/kubernetes-and-amazon-sagemaker-for-machine-learning-best-of-both-worlds-part-1-37580689a92f

Page 32: Amazon SageMaker and Kubernetes - AWS

Train and Deploy a Detectron2 object detection model using Amazon SageMaker Components

• Used a Mask R-CNN model from the Detectron2 model zoo, pretrained on the COCO 2017 dataset.

• Further fine-tuned this model on a custom dataset of aerial imagery.

• Drone images from TU Graz

• Goal: Detect people from a high vantage point.

• Code to Reproduce: https://github.com/HallieCrosby/detectron2/

Page 33: Amazon SageMaker and Kubernetes - AWS

Amazon SageMaker Components for Kubeflow Pipelines

[Diagram: a Kubeflow pipeline with three components: Training Job, Create Model, and Deploy Model. Each component consists of metadata, input/output, and an implementation (container), and is connected to a container registry and to SageMaker.]

Page 34: Amazon SageMaker and Kubernetes - AWS

Thank you!

Page 35: Amazon SageMaker and Kubernetes - AWS

What is Kubeflow?

Kubeflow is a machine learning toolkit for Kubernetes

[Stack diagram]

ML WORKLOADS: Modelling, training, tuning, serving…

KUBEFLOW

KUBERNETES

INFRASTRUCTURE: Cloud/on-prem

Page 36: Amazon SageMaker and Kubernetes - AWS

AWS ML infrastructure and services

ML services: Fully managed service that covers the entire machine learning workflow

Amazon SageMaker: Jupyter notebook instances, high-performance algorithms, large-scale training, optimization, one-click deployment, fully managed with auto-scaling

Image registry: Container image repository

Amazon Elastic Container Registry (Amazon ECR)

Management: Deployment, scheduling, scaling, and management of containerized applications

Amazon Elastic Container Service (Amazon ECS)

Amazon Elastic Kubernetes Service (Amazon EKS)

Compute: Where the containers run

Amazon EC2