54
@deanwampler Executive Briefing: What it takes to use machine learning in fast data pipelines @deanwampler @deanwampler [email protected] polyglotprogramming.com/talks © Dean Wampler, 2007-2019

Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

Executive Briefing: What it takes to use machine learning in fast data pipelines

@deanwampler

@deanwampler [email protected] polyglotprogramming.com/talks

© Dean Wampler, 2007-2019

Page 2: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler@deanwampler

lbnd.io/fast-data-ebook

Data Streaming, in General

Page 3: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

What We’ll Discuss• Batch vs. streaming... and why• Data science vs. data engineering• Serving models in production• CI/CD Systems for ML• Example architecture• Updating Models in Production

@deanwampler

Page 4: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

Batch vs. streaming… and why

@deanwampler

Page 5: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

EnergyMedical

Finance

… and IoT

Telecom

Mobile@deanwampler

State of the art phone!

Page 6: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

EnergyMedical

Finance

… and IoT

Telecom

Mobile

Information value has a half life;

it decays with time

@deanwampler

Page 7: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

Data Science vs. Data Engineering

@deanwampler

Page 8: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

Software Engineering toolbox

Data Science toolbox

@deanwampler

Page 9: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

• Comfortable with uncertainty • Less process oriented• Iterative, experimental

• Uncomfortable with uncertainty• Process oriented• Agile Manifesto• … which does not

mention data!https://derwen.ai/s/6fqt

@deanwampler

Data Scientists Data Engineers

Page 10: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

Streaming Imposes New Requirements

@deanwampler

If you run something long enough, all rare problems

eventually happen!

Page 11: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

• Reliability - fault and “surprise” tolerant• Availability - “always on”• Low latency - for some definition of “low”• Scalability - up and down• Adaptability - ideally without restarts

Page 12: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

In other words: Microservices

Browse

REST

AccountOrders

ShoppingCart

API Gateway

Inventory

Page 13: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

Serving Models in Production

@deanwampler

Page 14: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

A Recent Kubeflow User Survey

@deanwamplerFrom this Kubeflow Overview

Page 15: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

• ~60% worry about missed opportunities• ~50% worry about loss of data team

productivity• ~45% worry about slow time-to-market• ~40% worry about customer dissatisfaction

Lack of Tool/Process Integration

@deanwamplerFrom a recent Lightbend survey

Page 16: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

•Why did the model reject that loan application?

Can You Answer this Question?

@deanwampler

(After you’ve been sued for discrimination…)

Page 17: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

•Which version of the model was used?• How was it trained? •When was this model deployed?•… and other questions you’ll need to answer to

understand what happened…

Which model was it?

@deanwampler

Page 18: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

CI/CD for ML?

@deanwamplera.k.a “MLops”

Page 19: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

• Version control - for models and code• Automation - builds, tests, quality checks,

artifact management & delivery• Necessary for reproducibility

CI/CD Process Required (1/4)

@deanwampler

Page 20: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

• Supports different launch configurations: • “dark” launches• A/B, Canary, and other testing scenarios

CI/CD Process Required (2/4)

@deanwampler

Page 21: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

• Auditing• Which model used to score this record?• Which records used to train this model?• Who accessed this model and when?

CI/CD Processes Required (3/4)

@deanwampler

Page 22: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

• Auditing• Which model used to score this record?• Which records used to train this model?• Who accessed this model and when?

CI/CD Processes Required (3/4)

@deanwampler

Models Are

Data

Page 23: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

• Auditing• Which model used to score this record?• Which records used to train this model?• Who accessed this model and when?

CI/CD Processes Required (3/4)

@deanwampler

GDPR - What if a customer asks you to delete their data? Do you also delete the

models trained with that data?

Page 24: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

•Monitoring• Resource utilization changes?• Quality metrics:•Match performance during training?• Concept drift?

CI/CD Processes Required (4/4)

@deanwampler

Page 25: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

• AutoML• Data safety and lineage•Model fairness and reproducibility•Model and feature artifact management

https://www.oreilly.com/ideas/9-ai-trends-on-our-radar@deanwampler

What’s Different from Microservice CI/CD?

Page 26: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

• CI/CD prefers deterministic measures of quality. How should you support the extra statistical indeterminacy data science introduces?

What’s Different from Microservice CI/CD?

Page 27: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

•Kubeflow - for Kubernetes• SageMaker - for AWS users•MLFlow - from the Spark community•… plus emerging vendors

CI/CD Suites for ML

@deanwampler

Page 28: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

Systems - Kubeflow

Page 29: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

Example Architectures

@deanwampler

Page 30: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

Example Architectures

@deanwampler

Timely Information Integrated with

Your Apps

Page 31: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwamplerDataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

Page 32: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwamplerDataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

Session management, REST microservices

Model Scoring

Model Training

Three groups of functionality

Page 33: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwamplerDataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

Embedded model serving

Page 34: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwamplerDataCenter

REST,gRPC,…

Spark

LowLatency

Ka9aStreamsAkkaStreams

Sessions

Streams

Storage

Device

Telemetry1

2. → Model Training

3

3. ← Model Storage4. → Boot up, historical data

3, 4

5. → Data Pipeline 7. ← Anomalies

5, 7Ingest Scores8

CorrectiveAction

9

TensorFlow,…

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker 6Model Serving

Model serving as a service

Page 36: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

DataCenter

REST,gRPC,…

Spark

LowLatency

Ka9aStreamsAkkaStreams

Sessions

Streams

Storage

Device

Telemetry1

2. → Model Training

3

3. ← Model Storage4. → Boot up, historical data

3, 4

5. → Data Pipeline 7. ← Anomalies

5, 7Ingest Scores8

CorrectiveAction

9

TensorFlow,…

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker 6Model Serving

Model Serving as a Service• Pros:• A familiar integration

pattern • Decouples “concerns”: AI

tools, scaling, upgrading, …• One system for training

and scoring

Page 37: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

• Cons:• Overhead of invocation,

e.g., REST• ML Pipeline becomes a

unique production work flow

Model Serving as a Service

DataCenter

REST,gRPC,…

Spark

LowLatency

Ka9aStreamsAkkaStreams

Sessions

Streams

Storage

Device

Telemetry1

2. → Model Training

3

3. ← Model Storage4. → Boot up, historical data

3, 4

5. → Data Pipeline 7. ← Anomalies

5, 7Ingest Scores8

CorrectiveAction

9

TensorFlow,…

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker 6Model Serving

Page 38: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

DataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

Embedded Model Serving• Pros:• Lowest scoring

overhead - interprocess communication only used for model updates• Performance tuning

focuses on one system, the data pipeline

Page 39: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

Embedded Model Serving• Cons:• Model parameters must

be serialized• More complexity• Model serving library

must be “compatible” with training system

DataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

Page 40: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

Updating Models in Production @deanwampler

Page 41: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

• Concept Drift - models grow stale• They have a half life, too• So, periodically retrain, then serve the new

model, ideally without downtime

Model Updates

Page 42: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

• How do you measure model quality?• What’s the trade-off between model

performance vs. retraining cost?• How far back in the data set do you go

when training?

Retraining Considerations

Page 43: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

DataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker1. Telemetry

9. Ingest Scores

10. Corrective Action

Complex to update embedded models!

DataCenter

REST,gRPC,…

Spark

LowLatency

Ka9aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1. Telemetry1

2. → Model Training

3

3. ← Model Storage4. → Boot up, historical data

3, 4

5. → Data Pipeline 7. ← Anomalies

5, 78. Ingest Scores8

9. Corrective Action9

TensorFlow,…

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker 66. Model Serving

Model updates can be straightforward

Page 44: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

• Kind of Model• Parameters and hyperparameters•When trained• Data used for training•When deployed, undeployed, etc.•…

Auditing

Page 45: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler

• Quality metrics• Serving metrics (how many records, scoring

times…)• Provenance of decision to retrain• The metrics gathered above that were used to

decide when to retrain

Auditing

Page 46: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwamplerMars last Summer

Dusty Milky Way

Page 47: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

References• Ideas:• Ben Lorica on 9 AI Trends• Paco Nathan’s Data Governance Talk

@deanwampler

You can get these slides with the links here: polyglotprogramming.com/talks

Page 48: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

References• Ideas:• O’Reilly Radar: Data, AI, others• distill.pub• The Algorithm• The Gradient

@deanwamplerpolyglotprogramming.com/talks

Page 50: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

References• Kubeflow•MLFlow• DVC • AWS SageMaker• Fiddler (explainable AI)

@deanwamplerpolyglotprogramming.com/talks

Page 51: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

References• General Information about Stream Processing• My O'Reilly Report on Architectures• Streaming Systems Book• Stream Processing with Apache Spark• Designing Data-Intensive Apps book

@deanwamplerpolyglotprogramming.com/talks

Page 53: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

References• Tutorials• Model serving in streams• Stream processing with Kafka and

microservices

@deanwamplerpolyglotprogramming.com/talks

Page 54: Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams … Sessions Streams Storage Device 1 Telemetry 2. → Model Training 3 3. ← Model Storage

@deanwampler [email protected]

polyglotprogramming.com/talks

Questions?