30
Mitul Tiwari and Deepak Bobbarjung TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES

TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Mitul Tiwari and Deepak Bobbarjung

TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES

Page 2: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Introductions

Page 3: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Outline

Conversational AI and Deep Learning

Need for a Jobs framework on Kubernetes

Our Jobs architecture

Page 4: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Bot Training#1 AI/NLP Model.

Bot DeploymentBuild Once, Deploy Everywhere.

Our Conversational AI Platform

Bot BuilderNo Coding Required.

AI/NLP

Intents

Entities

Attributes

Speech to text

Language Translation

Sentiment Knowledge Base

Page 5: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

How to Make a Bot Intelligent?

• Natural Language Understanding

• Information Extraction

• Entities

• Intents

• Actions

• Natural Language Generation

• Generating Response

Page 6: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Deep Learning

• Traditional Machine Learning

• Human designed features and representations

• Optimize weights to combine

• Deep Learning

• Deep Neural Network

• Learn good features and multiple levels of representations

Page 7: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Deep Learning for NLP• Language Translation

• Image Captioning

• Text Summarization

• Parts-of-speech Tagging

• Named Entity Recognition

• Natural Language Generation

• Question-Answering

• Optical Character Recognition

• Speech Recognition

• Machine Reading Comprehension

Page 8: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Neural Network for Word Embedding

• Word Embedding: Word2Vec

• Embed words in continues vector space

• Semantically similar words are mapped to nearby points

• Enables powerful operations

• “King”-“Man”+”Woman” -> “Queen”

Page 9: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Bag of Words - Curse of Dimensionality

• Before word embeddings - Bag of words

• Dictionary of words & counts in the text

• Easy feature generation technique

• Limitations

• Hard to capture order of words

• Curse of dimensionality - limited vocabulary - similar words don’t match

Page 10: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Word Embeddings Cont’d

• Word Embedding: mapping words to a higher dimensional space, typically 200-500, e.g.,

• W(‘King’) = (0.2, -0.4, 0.9, …)

• W(‘Queen’) = (0.1, -0.3, 0.8, …)

• Learn representations of words

• How: two layer NN to learn word representations by predicting validity of phrases

Example of similar word vectors

Page 11: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Sequence Learning: Response Generation

• Automated Response Generation

• Sequence 2 Sequence Model

• Recurrent Neural Network (RNN)

• Long Short Term Memory Network (LSTM)

• Example: GMail Smart Reply

• Automated Response Suggestions

Page 12: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Sequence Learning: RNN And LSTM

• Recurrent Neural Network

• Output of a module go into a module of same type (recurrent)

• Good for capturing a sequence

• Long Short Term Memory Network

• Long running cell state: forget & add new values

• Output: combination of cell state, previous output, and new input

Page 13: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Training Deep Learning Models for NLP

• Intent Classification• Deep Learning — LSTM

• Information Extraction• Named Entity Recognition (NER)• Slot attributes

• Sentiment and Complaint Classifier

• Knowledge Base & Semantic Search

• Machine Reading Comprehension

Natural Language Understanding &

GenerationAnalysis for Complaints

Targeted Personalized Timely

Notification

Automatic Speech Recognition &

Generation

Deep Learning

Entity Graph & Knowledge Base

Text Sentiment Notifications Speech

Page 14: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Scaling Training Deep Learning Models For NLP

•Off line: Started with a script for training models

•Run Time: A service for prediction during runtime

•However, the number of models are reaching in thousands

•Hard to manage model training script for each of the bot

Conversation Plane (Run Time)

Control Plane(Offline)

Users

Interfaces

NLP Prediction (for example, Intent classification …)

Train Models

Store Models

Training Data Load Models

Page 15: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Outline

Conversational AI and Deep Learning

Need for a Jobs framework on Kubernetes

Our Jobs architecture

Page 16: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Training plane

Conversation plane Control planeData

User Data

Config Data

Orchestration

AI/NLPAPI Service

Configure Bots

Train Bottraining job

Hi

Classify (Hi)

model

Greetings intentWelcome to..

Passage AI Architecture

Page 17: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Passage AI Architecture

Training plane

(Jobs)

Page 18: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

When do we train a new model for a bot ?

• When a new bot is created

• When a bot is changed

• utterances are added or modified

• New training data is available

Page 19: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

# of Bots

0

125

250

375

500

August September October November

# of Bot changes per day

0

25

50

75

100

August September October November

Page 20: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Why do we need a Jobs framework?• Run jobs at scale

• Eliminate out of band scripts that tend to become ‘tribal’.

• APIs and UI for exposing jobs to our customers in our Bot Builder UI.

• Reporting and auditing around jobs.

Control plane

API Service

Create job specificationRun a jobShow last 10 job runs

Page 21: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Conversation plane

Why Kubernetes (K8S) For Our Microservices?

• Scale and availability of our microservices

Orchestration Service

K8s service K8s hpaK8s deployment

Pod Pod Pod Pod Pod

Nginx

Pod

Pod PodPod PodPod Pod

Page 22: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

On-Prem

Why Kubernetes?(Contd)

• Cloud Agnostic and On-prem ready

Staging

Conversation plane

Control plane

ProductionConversation

planeControl plane

IntegrationConversation

planeControl plane

Conversation plane

Control plane

$helm install passage-ai

StandbyConversation

planeControl plane

Page 23: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Why Create The Jobs Framework In Kubernetes?

• Handle scale and availability in the same way as our microservices Jobs Plane

Pod Pod Pod Pod Pod Pod

Pod PodPod PodPod Pod

• Jobs should also be cloud-agnostic and on-prem ready

• Same set of tools for monitoring, logging and auditing.

Page 24: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Outline

Conversational AI and Deep Learning

Need for a Jobs framework on Kubernetes

Our Jobs architecture

Page 25: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Example Job types in our system

• Training deep learning models

• Extracting and indexing knowledge base articles

• Nightly testing of our bots

Page 26: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Control planeAPI

Job Specification

Create a job spec

Schedule: “Every Monday at 1 AM”Job type: “Training”

- bot ID, training dataTraining specific params:

priority

Job ObjectControl plane

API

Trigger a job from a job spec

progress: 25 job_spec_id: <id>

- data: < confusion matrix>

description: performing training

state: in_progress

Page 27: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Jobs plane

Control planeAPI Service

jobs job_specs

Jobs Service

train

ing

Q

KB In

dex

Q

Bot T

estin

g Q

K8s deployment of training pods

Pod1

Pod4

Pod2

Pod3

Upda

te job

prog

ress

Jobs Architecture

Pod1

Pod2

Pod1

Pod2

Trigger training job from job spec

Create a Job (params)

Add item on Q

Pickup item

gpu

gpu

Page 28: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Jobs plane

Control plane

API Servicejobs job_specs

Jobs Service

train

ingQ

KB In

dex

Q

K8s deployment of training pods

Pod1

Pod4Pod3

Pod2

Upda

te job

prog

ressTrigger job from job spec

Jobs Architecture (scheduled jobs)

Pod1

Pod2

Pod1

Pod2

Scheduler

CronJob

Get job status

Bot T

estin

g Q

Page 29: TRAINING DEEP LEARNING MODELS AT SCALE …...TRAINING DEEP LEARNING MODELS AT SCALE USING KUBERNETES Introductions Outline Conversational AI and Deep Learning Need for a Jobs framework

Alternatives that we considered

Apache Airflow Azkaban