Executive Briefing: What it takes to use machine learning ......Ka9a Streams Akka Streams …...

Preview:

Citation preview

@deanwampler

Executive Briefing: What it takes to use machine learning in fast data pipelines

@deanwampler

@deanwampler dean@deanwampler.com polyglotprogramming.com/talks

© Dean Wampler, 2007-2019

@deanwampler@deanwampler

lbnd.io/fast-data-ebook

Data Streaming, in General

What We’ll Discuss• Batch vs. streaming... and why• Data science vs. data engineering• Serving models in production• CI/CD Systems for ML• Example architecture• Updating Models in Production

@deanwampler

Batch vs. streaming… and why

@deanwampler

EnergyMedical

Finance

… and IoT

Telecom

Mobile@deanwampler

State of the art phone!

EnergyMedical

Finance

… and IoT

Telecom

Mobile

Information value has a half life;

it decays with time

@deanwampler

Data Science vs. Data Engineering

@deanwampler

Software Engineering toolbox

Data Science toolbox

@deanwampler

• Comfortable with uncertainty • Less process oriented• Iterative, experimental

• Uncomfortable with uncertainty• Process oriented• Agile Manifesto• … which does not

mention data!https://derwen.ai/s/6fqt

@deanwampler

Data Scientists Data Engineers

Streaming Imposes New Requirements

@deanwampler

If you run something long enough, all rare problems

eventually happen!

@deanwampler

• Reliability - fault and “surprise” tolerant• Availability - “always on”• Low latency - for some definition of “low”• Scalability - up and down• Adaptability - ideally without restarts

@deanwampler

In other words: Microservices

Browse

REST

AccountOrders

ShoppingCart

API Gateway

Inventory

Serving Models in Production

@deanwampler

A Recent Kubeflow User Survey

@deanwamplerFrom this Kubeflow Overview

• ~60% worry about missed opportunities• ~50% worry about loss of data team

productivity• ~45% worry about slow time-to-market• ~40% worry about customer dissatisfaction

Lack of Tool/Process Integration

@deanwamplerFrom a recent Lightbend survey

•Why did the model reject that loan application?

Can You Answer this Question?

@deanwampler

(After you’ve been sued for discrimination…)

•Which version of the model was used?• How was it trained? •When was this model deployed?•… and other questions you’ll need to answer to

understand what happened…

Which model was it?

@deanwampler

CI/CD for ML?

@deanwamplera.k.a “MLops”

• Version control - for models and code• Automation - builds, tests, quality checks,

artifact management & delivery• Necessary for reproducibility

CI/CD Process Required (1/4)

@deanwampler

• Supports different launch configurations: • “dark” launches• A/B, Canary, and other testing scenarios

CI/CD Process Required (2/4)

@deanwampler

• Auditing• Which model used to score this record?• Which records used to train this model?• Who accessed this model and when?

CI/CD Processes Required (3/4)

@deanwampler

• Auditing• Which model used to score this record?• Which records used to train this model?• Who accessed this model and when?

CI/CD Processes Required (3/4)

@deanwampler

Models Are

Data

• Auditing• Which model used to score this record?• Which records used to train this model?• Who accessed this model and when?

CI/CD Processes Required (3/4)

@deanwampler

GDPR - What if a customer asks you to delete their data? Do you also delete the

models trained with that data?

•Monitoring• Resource utilization changes?• Quality metrics:•Match performance during training?• Concept drift?

CI/CD Processes Required (4/4)

@deanwampler

• AutoML• Data safety and lineage•Model fairness and reproducibility•Model and feature artifact management

https://www.oreilly.com/ideas/9-ai-trends-on-our-radar@deanwampler

What’s Different from Microservice CI/CD?

@deanwampler

• CI/CD prefers deterministic measures of quality. How should you support the extra statistical indeterminacy data science introduces?

What’s Different from Microservice CI/CD?

•Kubeflow - for Kubernetes• SageMaker - for AWS users•MLFlow - from the Spark community•… plus emerging vendors

CI/CD Suites for ML

@deanwampler

Systems - Kubeflow

Example Architectures

@deanwampler

Example Architectures

@deanwampler

Timely Information Integrated with

Your Apps

@deanwamplerDataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

@deanwamplerDataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

Session management, REST microservices

Model Scoring

Model Training

Three groups of functionality

@deanwamplerDataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

Embedded model serving

@deanwamplerDataCenter

REST,gRPC,…

Spark

LowLatency

Ka9aStreamsAkkaStreams

Sessions

Streams

Storage

Device

Telemetry1

2. → Model Training

3

3. ← Model Storage4. → Boot up, historical data

3, 4

5. → Data Pipeline 7. ← Anomalies

5, 7Ingest Scores8

CorrectiveAction

9

TensorFlow,…

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker 6Model Serving

Model serving as a service

@deanwampler

DataCenter

REST,gRPC,…

Spark

LowLatency

Ka9aStreamsAkkaStreams

Sessions

Streams

Storage

Device

Telemetry1

2. → Model Training

3

3. ← Model Storage4. → Boot up, historical data

3, 4

5. → Data Pipeline 7. ← Anomalies

5, 7Ingest Scores8

CorrectiveAction

9

TensorFlow,…

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker 6Model Serving

Model Serving as a Service• Pros:• A familiar integration

pattern • Decouples “concerns”: AI

tools, scaling, upgrading, …• One system for training

and scoring

@deanwampler

• Cons:• Overhead of invocation,

e.g., REST• ML Pipeline becomes a

unique production work flow

Model Serving as a Service

DataCenter

REST,gRPC,…

Spark

LowLatency

Ka9aStreamsAkkaStreams

Sessions

Streams

Storage

Device

Telemetry1

2. → Model Training

3

3. ← Model Storage4. → Boot up, historical data

3, 4

5. → Data Pipeline 7. ← Anomalies

5, 7Ingest Scores8

CorrectiveAction

9

TensorFlow,…

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker 6Model Serving

@deanwampler

DataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

Embedded Model Serving• Pros:• Lowest scoring

overhead - interprocess communication only used for model updates• Performance tuning

focuses on one system, the data pipeline

@deanwampler

Embedded Model Serving• Cons:• Model parameters must

be serialized• More complexity• Model serving library

must be “compatible” with training system

DataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

BrokerTelemetry

Ingest Scores

CorrectiveAction

Updating Models in Production @deanwampler

@deanwampler

• Concept Drift - models grow stale• They have a half life, too• So, periodically retrain, then serve the new

model, ideally without downtime

Model Updates

@deanwampler

• How do you measure model quality?• What’s the trade-off between model

performance vs. retraining cost?• How far back in the data set do you go

when training?

Retraining Considerations

@deanwampler

DataCenter

Mini-batch,Batch

Spark

LowLatencyMicroservices

Ka;aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1

2. → Model Training3. ← New models

2, 3

4. ← Model Storage5. → Boot up, historical data

4, 5

6. → Data Pipeline 7. → Model Serving 8. ← Anomalies

6, 7, 89

10

Spark

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker1. Telemetry

9. Ingest Scores

10. Corrective Action

Complex to update embedded models!

DataCenter

REST,gRPC,…

Spark

LowLatency

Ka9aStreamsAkkaStreams

Sessions

Streams

Storage

Device

1. Telemetry1

2. → Model Training

3

3. ← Model Storage4. → Boot up, historical data

3, 4

5. → Data Pipeline 7. ← Anomalies

5, 78. Ingest Scores8

9. Corrective Action9

TensorFlow,…

MicroserviceMicroserviceMicroservice

DeviceSessionMicroservices

Persistence

Ka+a Cluster

Broker 66. Model Serving

Model updates can be straightforward

@deanwampler

• Kind of Model• Parameters and hyperparameters•When trained• Data used for training•When deployed, undeployed, etc.•…

Auditing

@deanwampler

• Quality metrics• Serving metrics (how many records, scoring

times…)• Provenance of decision to retrain• The metrics gathered above that were used to

decide when to retrain

Auditing

@deanwamplerMars last Summer

Dusty Milky Way

References• Ideas:• Ben Lorica on 9 AI Trends• Paco Nathan’s Data Governance Talk

@deanwampler

You can get these slides with the links here: polyglotprogramming.com/talks

References• Ideas:• O’Reilly Radar: Data, AI, others• distill.pub• The Algorithm• The Gradient

@deanwamplerpolyglotprogramming.com/talks

References• Kubeflow•MLFlow• DVC • AWS SageMaker• Fiddler (explainable AI)

@deanwamplerpolyglotprogramming.com/talks

References• General Information about Stream Processing• My O'Reilly Report on Architectures• Streaming Systems Book• Stream Processing with Apache Spark• Designing Data-Intensive Apps book

@deanwamplerpolyglotprogramming.com/talks

References• Tutorials• Model serving in streams• Stream processing with Kafka and

microservices

@deanwamplerpolyglotprogramming.com/talks

@deanwampler dean@deanwampler.com

polyglotprogramming.com/talks

Questions?

Recommended