
Machine Learning Goes Production


Page 1: Machine Learning Goes Production

Machine Learning Goes Production!

Michał Łopuszyński

ICM, Warsaw, 2017.01.31

Engineering, maintenance costs, technical debt

Page 2: Machine Learning Goes Production

Hmmm... My telly says machine learning is amazingly cool.

Should I care about all this engineering, maintenance costs, and technical debt?

Page 3: Machine Learning Goes Production

Oh yes! You'd better!

Page 5: Machine Learning Goes Production

Example – Fast forward 5 years. Hey, can we?!?

doi:10.1126/science.1248506

Great supplementary material is available for this paper! Check this link.

Page 6: Machine Learning Goes Production

Not good. What to do?

Page 7: Machine Learning Goes Production

It's engineering, stupid!

Page 11: Machine Learning Goes Production

ML engineering – reading list

[Sculley] D. Sculley et al., “Hidden Technical Debt in Machine Learning Systems”, NIPS 2015

[Breck] E. Breck et al., “What's Your ML Test Score? A Rubric for ML Production Systems”, Reliable Machine Learning in the Wild – NIPS 2016 Workshop

[Zinkevich] M. Zinkevich, “Rules of Machine Learning: Best Practices for ML Engineering”, Google

There is also a presentation on this topic:
https://sites.google.com/site/wildml2016nips/SculleySlides1.pdf
(Reliable Machine Learning in the Wild – NIPS 2016 Workshop)

Page 12: Machine Learning Goes Production

One more cool thing about the above papers

[Figure: the hype curve – visibility vs. time – annotated with the position of ML now and of the discussed papers]

Page 13: Machine Learning Goes Production

So, what do they say?

Page 14: Machine Learning Goes Production

Wisdom learnt the hard way [Sculley]

“As the machine learning (ML) community continues to accumulate years of experience with live systems, a wide-spread and uncomfortable trend has emerged: developing and deploying ML systems is relatively fast and cheap, but maintaining them over time is difficult and expensive. This dichotomy can be understood through the lens of technical debt (...)”

Page 15: Machine Learning Goes Production

Technical debt? What does it even mean?

Page 21: Machine Learning Goes Production

Common anti-patterns [Sculley]

Glue code
Real systems = 5% ML code + 95% glue code. Rewrite general-purpose packages or wrap them in a common API (a sketch follows below).

Pipeline jungles
Especially indirect feedback loops are difficult to track!

Dead experimental code paths
Knight Capital case: $465M lost in 45 min. due to obsolete experimental code.

Abstraction debt
ML abstractions are much less developed than, e.g., those in relational databases.

Bad code smells (less severe anti-patterns)
• Plain-old-data smell
• Multi-language smell
• Prototype smell
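To make the glue-code remedy concrete, here is a minimal sketch, assuming a scikit-learn estimator as the wrapped package; the Model/SklearnModel names are my own invention, not from [Sculley]:

# Hypothetical sketch of "wrap in a common API": the surrounding system
# depends only on one small interface, and all package-specific glue
# lives in a single adapter class.
from abc import ABC, abstractmethod

class Model(ABC):
    """The only API the rest of the system may depend on."""

    @abstractmethod
    def fit(self, features, labels):
        ...

    @abstractmethod
    def predict(self, features):
        ...

class SklearnModel(Model):
    """Adapter keeping all scikit-learn specifics in one place."""

    def __init__(self, estimator):
        self._estimator = estimator

    def fit(self, features, labels):
        self._estimator.fit(features, labels)
        return self

    def predict(self, features):
        return self._estimator.predict(features)

The design point: swapping the underlying package later touches one adapter, not the 95% of the system that would otherwise be glue.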

Page 22: Machine Learning Goes Production

Configuration debt [Sculley]

“Another potentially surprising area where debt can accumulate is in the configuration of ML systems. (...) In a mature system which is being actively developed, the number of lines of configuration can far exceed the number of lines of the traditional code. Each configuration line has a potential for mistakes.”

• “It should be easy to specify a configuration as a small change from a previous configuration”

• “Configurations should undergo a full code review and be checked into a repository”

• “It should be hard to make manual errors, omissions, or oversights”

• “It should be easy to see, visually, the difference in configuration between two models”

• “It should be easy to automatically assert and verify basic facts about the configuration: features used, transitive closure of data dependencies, etc.”

• “It should be possible to detect unused or redundant settings”
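A minimal sketch of the first principle, specifying a configuration as a small change from a previous one; the helper and key names below are invented for illustration, not taken from the paper:

# Illustrative sketch (not from [Sculley]): every configuration is an
# explicit, reviewable delta against a base config that is checked into
# the repository. Unknown keys fail loudly, making manual typos hard to miss.
BASE_CONFIG = {
    "features": ["age", "country", "clicks_7d"],
    "learning_rate": 0.1,
    "regularization": 0.01,
}

def derive_config(base, **overrides):
    """Build a new configuration as a small change from `base`."""
    unknown = set(overrides) - set(base)
    if unknown:
        raise KeyError(f"unknown configuration keys: {sorted(unknown)}")
    return {**base, **overrides}

# The whole experiment is now a one-line, easily diffed change:
experiment_config = derive_config(BASE_CONFIG, learning_rate=0.05)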

Page 24: Machine Learning Goes Production

Monitoring [Zinkevich]

• Rule #8: “Know the freshness requirements of your system”

• Rule #9: “Detect problems before exporting models”

• Rule #10: “Watch for silent failures” (a sketch of this one follows below)

• Rule #11: “Give feature sets owners and documentation”
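Silent failures rarely crash anything; a stale upstream table, for example, quietly lowers feature coverage instead. A minimal sketch of coverage-based alerting follows; the threshold and function names are made up for illustration:

import math

def feature_coverage(rows, feature):
    """Fraction of rows where `feature` is present and finite."""
    if not rows:
        return 0.0
    valid = sum(
        1 for row in rows
        if isinstance(row.get(feature), (int, float))
        and math.isfinite(row[feature])
    )
    return valid / len(rows)

def check_coverage(rows, feature, expected=0.95):
    """Alert when coverage drops below the expected level."""
    coverage = feature_coverage(rows, feature)
    if coverage < expected:
        raise RuntimeError(
            f"possible silent failure: coverage of '{feature}' "
            f"is {coverage:.1%}, expected at least {expected:.0%}"
        )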

Page 25: Machine Learning Goes Production

What should be tested/monitored in ML systems [Breck]

Testing features and data
Test distributions, correlations, other statistical properties, the cost of each feature, ... (a minimal example follows after this list)

Testing model development
Test off-line scores vs. on-line performance (e.g., via an A/B test), the impact of hyperparameters, the impact of model freshness, quality on data slices, comparison with a simple baseline, ...

Testing ML infrastructure
Reproducibility of training, model quality before serving, fast roll-backs to previous versions, ...

Monitoring ML in production
NaNs or infinities in the output, computational performance or RAM usage problems, decrease in quality of results, ...
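For the “testing features and data” item, a drift check against training-time reference statistics might look like the sketch below; the mean/std criterion and the tolerance are illustrative simplifications, not [Breck]'s recipe:

import statistics

def assert_no_feature_drift(values, ref_mean, ref_std, tol=3.0):
    """Fail if the live mean drifted more than `tol` reference stds."""
    mean = statistics.mean(values)
    drift = abs(mean - ref_mean) / ref_std
    if drift > tol:
        raise AssertionError(
            f"feature drift: live mean {mean:.3f} lies {drift:.1f} "
            f"reference stds away from training mean {ref_mean:.3f}"
        )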

Page 26: Machine Learning Goes Production

Other areas of ML-related debt [Sculley]

Culture
Deletion of features, reduction of complexity, and improvements in reproducibility, stability, and monitoring are valued the same as (or more than!) improvements in accuracy. “(...) This is most likely to occur within heterogeneous teams with strengths in both ML research and engineering”

Reproducibility debt
ML-system behaviour is difficult to reproduce exactly because of randomized algorithms, non-determinism inherent in parallel processing, reliance on initial conditions, interactions with the external world, ...

Data testing debt
ML converts data into code. For that code to be correct, the data need to be correct. But how do you test data? (one possible answer is sketched after this list)

Process management debt
How are deployment, maintenance, configuration, and recovery of the infrastructure handled? Bad smell: a lot of manual work.
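One possible answer to the data-testing question, sketched under invented expectations (the field names reuse the hypothetical configuration example above): treat assumptions about the data as unit tests that run on every incoming batch.

def test_batch(batch):
    """Run data expectations as if they were unit tests."""
    assert len(batch) > 0, "empty batch"
    for row in batch:
        assert 0 <= row["age"] <= 130, f"implausible age: {row['age']}"
        assert row["country"], "missing country code"
        assert row["clicks_7d"] >= 0, "negative click count"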

Page 27: Machine Learning Goes Production

Measuring technical debt [Sculley]

• “Does improving one model or signal degrade others?”

• “What is the transitive closure of all data dependencies?”

• “How easily can an entirely new algorithmic approach be tested at full scale?”

• “How precisely can the impact of a new change to the system be measured?”

• “How quickly can new members of the team be brought up to speed?”

Page 28: Machine Learning Goes Production

Thank you!

Questions?

@lopusz