Markus Weimer [email protected] Making ML more useful to more people

Making ML more useful to more people · ML.NET: An open source and cross-platform machine learning framework Machine Learning made for .NET Developers Covers many developer scenarios

Page 1

Markus Weimer

[email protected]

Making ML more useful to more people

Page 2

Why Machine Learning? “Programming the UnProgrammable”

“It has exquisite buttons … with long sleeves … works for casual as well as business settings”

[Figure: f(x)]

Page 3

Point of view: Data Science is Software Engineering with Data

Models are Software

• Built as software, just with different tools

• Deployed and updated as software

• Tested as software

• Debugged like software

Training data needs management

• Data is private and increasingly regulated

• Data is dynamic (CRUD, retention policies, …)

• Best managed as part of the data estate

• Training and deployment of models needs to respect data governance

Page 4

https://dot.net/ml

ML.NET

Page 5

Brought to you by (amongst others)

Zeeshan Ahmed (Microsoft), Saeed Amizadeh (Microsoft), Mikhail Bilenko (Yandex), Rogan Carr (Microsoft), Wei-Sheng Chin (Microsoft), Yael Dekel (Microsoft), Xavier Dupre (Microsoft), Vadim Eksarevskiy (Microsoft), Senja Filipi (Microsoft), Tom Finley (Microsoft), Abhishek Goswami (Microsoft), Monte Hoover (Microsoft), Scott Inglis (Microsoft), Matteo Interlandi (Microsoft), Najeeb Kazmi (Microsoft), Gleb Krivosheev (Microsoft), Pete Luferenko (Microsoft), Ivan Matantsev (Microsoft), Sergiy Matusevych (Microsoft), Shahab Moradi (Microsoft), Gani Nazirov (Microsoft), Justin Ormont (Microsoft), Gal Oshri (Microsoft), Artidoro Pagnoni (Microsoft), Jignesh Parmar (Microsoft), Prabhat Roy (Microsoft), Zeeshan Siddiqui (Microsoft), Markus Weimer (Microsoft), Shauheen Zahirazami (Microsoft), Yiwen Zhu (Microsoft), …

Page 6

About .NET

• .NET has cool stuff ML people care about

  • C#: Like Java, but from the future

  • F#: Like Python, but with static types and multithreading

  • Almost-free calls into native code

• .NET is OSS and cross-platform

  • Windows (surprise!), Linux, macOS

  • Phones via Xamarin: Android, iOS

  • Interesting HW: Xbox, IoT devices, …

• Lots of developers build important stuff in .NET

  • 4M active; 450k added each month

  • 15% growth MoM in https://github.com/dotnet

  • Half the top-10k websites are built in .NET

Page 7

ML.NET: An open source and cross-platform machine learning framework

Machine Learning made for .NET Developers

• Covers many developer scenarios

• Available in C#, F# and VB.NET

Open source and cross-platform

• Windows, Linux, Mac

• x64, x86 (some), ARM (some)

Proven and extensible

• Development started ~10 years ago

• Received contributions (and scrutiny) from all over Microsoft

Page 8

This designed most of my slides used today ☺

Page 9

ML.NET is used in many products

• Many MS products use ML.NET (internally known as TLC).

• You have likely used ML.NET today ☺

• Why is that?

  • Many products are written in (ASP).NET

  • Using ML.NET is just like using any other .NET API

Page 10

Using a model is just like using code

// Load the trained model from a file.
var model = mlContext.Model.Load("mymodel.zip");
// Create a typed prediction function from the loaded model.
var predFunc = model.MakePredictionFunction<T_IN, T_OUT>(mlContext);
// Score a single input x.
var result = predFunc.Predict(x);

• Resource shipped with the app

• Standard software dependency

Training: Think sklearn, but with a statically typed language

Page 11

ML.NET captures end-to-end Machine Learning Pipelines

Data Ingestion

• Text

• SQL

• In Memory

Featurization and Transforms

• Text & Image featurization

• Pre-trained DNNs in ONNX, TensorFlow

• Feature transforms (normalization, pruning, …)

Learning Algorithms

• Supervised: Linear, Trees, Factorization Machines, …

• Unsupervised: PCA, LDA, K-Means, …

• Time Series

Page 12

ML.NET is fast & good

• Core infrastructure: IDataView

  • Carefully designed to avoid memory allocations

  • Only required data is lazily materialized

• Carefully tuned defaults

  • Many ML tasks are more alike than we’d like to admit ☺

GBDT Experiments done on Criteo, using default parameters
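The "only required data is lazily materialized" point can be illustrated with a toy sketch. This is not IDataView's API: the class `LazyView`, its methods, and the sample rows are hypothetical names made up for illustration. It only shows the general idea that derived columns are registered up front but computed when a cursor asks for them, and only for the columns the consumer needs.

```python
class LazyView:
    """Toy column-oriented view (hypothetical, not ML.NET's IDataView)."""

    def __init__(self, source_rows):
        self._source = source_rows   # iterable of dicts holding raw columns
        self._derived = {}           # column name -> (input column, function)
        self.evaluations = {}        # column name -> how often it was computed

    def add_column(self, name, input_column, fn):
        """Register a derived column; nothing is computed at this point."""
        self._derived[name] = (input_column, fn)
        self.evaluations[name] = 0
        return self

    def _value(self, row, name):
        # Derived columns are computed on demand from their input column.
        if name in self._derived:
            input_column, fn = self._derived[name]
            self.evaluations[name] += 1
            return fn(self._value(row, input_column))
        return row[name]

    def cursor(self, needed):
        """Yield one tuple per row with only the requested columns."""
        for row in self._source:
            yield tuple(self._value(row, name) for name in needed)


view = (
    LazyView([
        {"text": "Exquisite Buttons", "price": 30.0},
        {"text": "Long Sleeves", "price": 45.0},
    ])
    .add_column("lower", "text", str.lower)
    .add_column("tokens", "lower", str.split)
    .add_column("price_scaled", "price", lambda p: p / 100)
)

# Only "tokens" (and its input "lower") are computed; "price_scaled" never is.
rows = list(view.cursor(["tokens"]))
print(rows)
print(view.evaluations)
```

A consumer that needs only the token column pays nothing for the unused price transform, which is the property the slide highlights.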

Page 13

ML.NET’s journey to OSS

• Developed for almost a decade as an internal tool

• Open Sourced in May 2018 (at //build)

• MIT License, .NET Foundation

• Monthly releases ever since; 1.0rc1 this Tuesday

• Please check it out, and leave feedback

Page 14

Other efforts not discussed today

• Pretzel

  • Model compiler

  • Especially good at the many models → one program problem

  • http://www.markusweimer.com/publication/2018/10/23/pretzel/

• TorchSharp

  • PyTorch – Python + .NET

  • https://github.com/xamarin/TorchSharp

Page 15

Distributed Machine Learning where the Data is

Page 16

Resource Managers

• One cluster used by all workloads (interactive, batch, streaming, …)

• Resources are handed out as containers

  • A container is a slice of a machine

  • Fixed RAM, CPU, I/O, …

• Examples:

  • Azure Batch

  • Apache Hadoop YARN

  • Apache Mesos

  • Google Borg

[Figure: Container]

Page 17

Challenges

• Fault tolerance

• Pre-emption

• Elasticity

Page 18

Machine learning

• ML thrives with gang scheduling

  • Iterative

  • Fixed data sets

• Gangs are undesirable on shared clusters

  • Utilization is paramount

  • MPI: Wait …

  • MapReduce: Do the work slowly on fewer machines

• Let’s do better than that

Page 19

Approach I: Elastic ML

NeurIPS ’14

Page 20

Elastic ML

• Our solution:

  • Ramp up the workload with the allocations

  • In each iteration, add machines and data

• First iteration

Page 21

Elastic ML

• Our solution:

  • Ramp up the workload with the allocations

  • In each iteration, add machines and data

• Second iteration

Page 22

Elastic ML

• Our solution:

  • Ramp up the workload with the allocations

  • In each iteration, add machines and data

• End state
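The ramp-up idea on the Elastic ML slides can be illustrated with a toy sketch. It is not the method from the NeurIPS ’14 work: the function `elastic_fit`, the allocation `schedule`, the learning rate, and the tiny data set (generated from y = 3x) are all assumptions chosen for illustration. Each iteration runs one gradient step over only the partitions whose containers have been granted so far, instead of waiting for the whole gang.

```python
def partial_gradient(w, partition):
    """Summed squared-error gradient of the 1-D model y ~ w * x on one partition."""
    return sum(2 * (w * x - y) * x for x, y in partition)


def elastic_fit(partitions, schedule, lr=0.01, w=0.0):
    """Run one gradient step per entry of `schedule`.

    schedule[i] is the (hypothetical) number of containers granted by
    iteration i; each container holds one data partition.
    """
    history = []
    for granted in schedule:
        active = partitions[:granted]          # data that is reachable right now
        n = sum(len(p) for p in active)
        grad = sum(partial_gradient(w, p) for p in active) / n
        w -= lr * grad
        history.append((granted, n, w))
    return w, history


# Four partitions of points on the line y = 3x.
partitions = [
    [(1, 3), (2, 6)],
    [(3, 9), (4, 12)],
    [(5, 15), (6, 18)],
    [(7, 21), (8, 24)],
]

# Allocation ramps up: 1 container, then 2, then all 4 for the remaining steps.
schedule = [1, 2, 4] + [4] * 30

w, history = elastic_fit(partitions, schedule)
for granted, n, w_i in history[:3]:
    print(f"containers={granted} samples={n} w={w_i:.4f}")
print(f"final w={w:.6f}")
```

Training starts as soon as the first container arrives, and the early iterations on partial data still move the model toward the solution that the full data set gives.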

Page 23

Is it any good?

Page 24

Approach II: Coded computing

Yaoqing Yang (CMU), Matteo Interlandi, Saeed Amizadeh

NeurIPS ’18, ongoing work

Page 25

Or: Coded Computing

Original Data                          Coded Data
Container 1  Container 2  Container 3  Container 4        Container 5        Container 6
X[1]         X[2]         X[3]         X[1]+2X[2]+3X[3]   X[1]+4X[2]+9X[3]   X[1]+8X[2]+27X[3]
Y[1]         Y[2]         Y[3]         Y[1]+2Y[2]+3Y[3]   Y[1]+4Y[2]+9Y[3]   Y[1]+8Y[2]+27Y[3]
…            …            …            …                  …                  …

• Encode 3 splits into 6 splits

• Any 3 row blocks out of 6 are sufficient
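The "any 3 out of 6" property of this encoding can be checked with a small sketch. It uses the coefficients shown on the slide (1, 2, 3; 1, 4, 9; 1, 8, 27); the helper names `encode`, `solve`, and `decode` and the sample splits are hypothetical. It covers only the recovery of the original splits from surviving containers, not the coded training procedure itself, and uses exact fractions so the comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations

# Rows 1-3 keep the original splits; rows 4-6 are the coded combinations
# X[1]+2X[2]+3X[3], X[1]+4X[2]+9X[3], X[1]+8X[2]+27X[3] from the slide.
G = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 2, 3],
    [1, 4, 9],
    [1, 8, 27],
]


def encode(splits):
    """Return the 6 blocks (one per container) for 3 equally sized splits."""
    width = len(splits[0])
    return [
        [sum(g * s[j] for g, s in zip(row, splits)) for j in range(width)]
        for row in G
    ]


def solve(A, B):
    """Solve A X = B exactly by Gauss-Jordan elimination (A must be invertible)."""
    n = len(A)
    M = [[Fraction(v) for v in A[i]] + [Fraction(v) for v in B[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def decode(surviving):
    """Recover the 3 original splits from 3 surviving containers.

    `surviving` maps a container index (0-5) to the block it holds.
    """
    idx = sorted(surviving)[:3]
    return solve([G[i] for i in idx], [surviving[i] for i in idx])


splits = [[1, 2], [3, 4], [5, 6]]
blocks = encode(splits)

# Every choice of 3 containers out of 6 recovers the original splits.
for alive in combinations(range(6), 3):
    assert decode({i: blocks[i] for i in alive}) == splits
print("all 20 subsets of 3 containers decode correctly")
```

The check succeeds because every 3×3 submatrix of `G` is invertible, so up to three lost containers can be tolerated without recomputing anything.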

Page 26

Results

• Real dataset: 100,000 samples, 3,352 features.

• Distributed computing on 20 machines.

• Randomly pick 10 machines and let them randomly fail during the computation.

Page 27

Point of view: Data Science is Software Engineering with Data

Models are Software

• Built as software, just with different tools

• Deployed and updated as software

• Tested as software

• Debugged like software

Training data needs management

• Data is private and increasingly regulated

• Data is dynamic (CRUD, retention policies, …)

• Best managed as part of the data estate

• Training and deployment of models needs to respect data governance

Page 28

Many open questions

• For software, we have source control. For data and models, we have … ?

• For software, we have code reviews. For data, we have … ?

• For software, we have semantic versions. For data, we have … ?

• For software, we have debuggers. For models, we have … ?

• For software, we have signing. For models, we have … ?

• …

Page 29

Thanks for your time! Let’s stay in touch!

ML.NET is ML for .NET

https://dot.net/ml

https://github.com/dotnet/machinelearning

You can reach me at: [email protected]

@MarkusWeimer

http://markusweimer.com

Of course, we are hiring