Deep Learning, Microsoft Cognitive Toolkit (CNTK) and Azure Machine Learning Services

https://github.com/Microsoft/CNTK

Time per minibatch (minibatch size in parentheses); values in parentheses in the last row are from the v7 benchmark.

Network (batch)     Caffe       Cognitive Toolkit   MxNet         TensorFlow    Torch
FCN5 (1024)         55.329ms    51.038ms            60.448ms      62.044ms      52.154ms
AlexNet (256)       36.815ms    27.215ms            28.994ms      103.960ms     37.462ms
ResNet (32)         143.987ms   81.470ms            84.545ms      181.404ms     90.935ms
LSTM (256)          -           43.581ms            288.142ms     -             1130.606ms
  (v7 benchmark)    -           (44.917ms)          (284.898ms)   (223.547ms)   (906.958ms)
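To read the table row by row, the short sketch below picks the fastest toolkit per network and expresses every other time as a multiple of it. It only uses the v8 numbers transcribed from the table (the v7 LSTM values are left out), and the helper names are illustrative.

```python
# Per-minibatch times (ms) from the HKBU v8 table above; None marks "-" (not reported).
TIMES_MS = {
    "FCN5 (1024)": {"Caffe": 55.329, "Cognitive Toolkit": 51.038, "MxNet": 60.448,
                    "TensorFlow": 62.044, "Torch": 52.154},
    "AlexNet (256)": {"Caffe": 36.815, "Cognitive Toolkit": 27.215, "MxNet": 28.994,
                      "TensorFlow": 103.960, "Torch": 37.462},
    "ResNet (32)": {"Caffe": 143.987, "Cognitive Toolkit": 81.470, "MxNet": 84.545,
                    "TensorFlow": 181.404, "Torch": 90.935},
    "LSTM (256)": {"Caffe": None, "Cognitive Toolkit": 43.581, "MxNet": 288.142,
                   "TensorFlow": None, "Torch": 1130.606},
}


def fastest(row):
    """Return (toolkit, time) with the lowest reported time, skipping missing entries."""
    reported = {name: t for name, t in row.items() if t is not None}
    name = min(reported, key=reported.get)
    return name, reported[name]


def relative_to_fastest(row):
    """Each reported time divided by the fastest one (1.0 = fastest, lower is better)."""
    _, best = fastest(row)
    return {name: round(t / best, 2) for name, t in row.items() if t is not None}


if __name__ == "__main__":
    for net, row in TIMES_MS.items():
        print(net, fastest(row), relative_to_fastest(row))
```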

http://dlbench.comp.hkbu.edu.hk/

Benchmarking by HKBU, Version 8

Single Tesla K80 GPU, CUDA 8.0, cuDNN v5.1

Caffe: 1.0rc5(39f28e4)

CNTK: 2.0 Beta10(1ae666d)

MXNet: 0.93(32dc3a2)

TensorFlow: 1.0(4ac9c09)

Torch: 7(748f5e3)

[Figure: speed comparison across toolkits (samples/second; higher is better), December 2015. Chart notes: (2) only supports 1 GPU; achieved with the 1-bit gradient quantization algorithm.]
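The chart note credits part of the multi-GPU speed to 1-bit gradient quantization. The idea behind it: each gradient value is sent as a single sign bit, and the quantization error is kept locally and added to the next minibatch's gradient (error feedback), so no information is permanently lost. Below is a simplified sketch in that spirit; it quantizes one flat vector and reconstructs with the mean of the non-negative and negative values, which is an assumption made for brevity and not CNTK's actual implementation.

```python
def one_bit_quantize(grad, residual):
    """Quantize grad + residual to one bit per element, with error feedback.

    Elements >= 0 are reconstructed as the mean of the non-negative values,
    negative elements as the mean of the negative values.
    Returns (bits, (pos_value, neg_value), new_residual).
    """
    v = [g + r for g, r in zip(grad, residual)]  # add the error carried over from last time
    bits = [x >= 0 for x in v]                   # the 1 bit per element that gets transmitted
    pos = [x for x in v if x >= 0]
    neg = [x for x in v if x < 0]
    pos_value = sum(pos) / len(pos) if pos else 0.0
    neg_value = sum(neg) / len(neg) if neg else 0.0
    recon = [pos_value if b else neg_value for b in bits]
    new_residual = [x - q for x, q in zip(v, recon)]  # kept locally for the next minibatch
    return bits, (pos_value, neg_value), new_residual


def dequantize(bits, values):
    """Rebuild the approximate gradient on the receiving side."""
    pos_value, neg_value = values
    return [pos_value if b else neg_value for b in bits]
```

Because the residual is fed back in, the sum of everything transmitted over many minibatches tracks the sum of the true gradients.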

MICROSOFT COGNITIVE TOOLKIT: First Deep Learning Framework Fully Optimized for Pascal

Toolkit Delivering Near-Linear Multi-GPU Scaling: AlexNet Performance

[Figure: AlexNet training throughput (images/sec) for five configurations: 78; 2,400; 3,500; 7,600; 13,000.]

AlexNet training batch size 128, Grad Bit = 32, dual-socket E5-2699v4 CPUs (44 cores total). CNTK 2.0b3 (to be released) includes cuDNN 5.1.8, NCCL 1.6.1, NVLink enabled.

170x faster vs. CPU server
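The "170x" callout is consistent with dividing the largest bar by the smallest one (13,000 / 78 ≈ 167). The sketch below does that arithmetic, assuming bar 1 (78 images/sec) is the CPU server baseline; the chart does not label the GPU count behind bars 2 to 5, so the per-GPU scaling efficiency cannot be computed from these numbers alone.

```python
# Bar values (images/sec) read off the AlexNet chart above, in chart order 1-5.
# Assumption: bar 1 is the dual-socket CPU server baseline.
IMAGES_PER_SEC = [78, 2400, 3500, 7600, 13000]


def speedups(values, baseline_index=0):
    """Throughput of each bar relative to the chosen baseline bar, rounded to 0.1."""
    base = values[baseline_index]
    return [round(v / base, 1) for v in values]


ratios = speedups(IMAGES_PER_SEC)
print(ratios)
```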

$ pip install <url>

https://docs.microsoft.com/en-us/cognitive-toolkit/Setup-CNTK-on-your-machine


https://notebooks.azure.com/cntk/libraries/tutorials


Example: 2-hidden-layer feed-forward NN

$$h_1 = \sigma(W_1 x + b_1)$$
$$h_2 = \sigma(W_2 h_1 + b_2)$$
$$P = \mathrm{softmax}(W_{out} h_2 + b_{out})$$

with input $x \in \mathbb{R}^M$ and one-hot label $y \in \mathbb{R}^J$, and cross-entropy training criterion

$$ce = y^{\top} \log P, \qquad \sum_{\text{corpus}} ce = \max$$

In the Python form (x @ W1 + b1, …) vectors are rows, so each weight matrix there is the transpose of the one in the equations.


h1 = sigmoid (x @ W1 + b1)

h2 = sigmoid (h1 @ W2 + b2)

P = softmax (h2 @ Wout + bout)

ce = cross_entropy (P, y)

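[Figure: computation graph of the network. x → • W1 → + b1 → σ → h1 → • W2 → + b2 → σ → h2 → • Wout → + bout → softmax → P; cross_entropy(P, y) → ce. Here • is a matrix product, + an addition, σ the sigmoid.]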
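To make the graph concrete, here is a plain-Python sketch of the same forward pass, using the row-vector convention of the Python form (x @ W1 + b1). It is an illustration only, not how CNTK evaluates the graph. Note that it returns the negative of the slide's ce, i.e. -yᵀ log P, so minimizing it over the corpus is the same as maximizing Σ ce.

```python
import math


def matvec(x, W):
    """Row vector x (length M) times matrix W (M x N, list of rows) -> length N."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]


def add(a, b):
    return [ai + bi for ai, bi in zip(a, b)]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-t)) for t in v]


def softmax(v):
    m = max(v)  # subtract the max so exp() cannot overflow
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def forward(x, params):
    """h1 = sigmoid(x @ W1 + b1); h2 = sigmoid(h1 @ W2 + b2); P = softmax(h2 @ Wout + bout)."""
    W1, b1, W2, b2, Wout, bout = params
    h1 = sigmoid(add(matvec(x, W1), b1))
    h2 = sigmoid(add(matvec(h1, W2), b2))
    return softmax(add(matvec(h2, Wout), bout))


def cross_entropy(P, y):
    """Negative log-likelihood -y^T log P for a one-hot label y."""
    return -sum(yi * math.log(pi) for yi, pi in zip(y, P) if yi)
```

With all weights and biases at zero, both hidden layers output 0.5 everywhere and P is uniform, so the loss for any label is log J.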


LEGO-like composability allows CNTK to support a wide range of networks & applications

Scripts configure and execute training through the CNTK Python APIs…

trainer
• SGD (momentum, Adam, …)
• minibatching

reader
• minibatch source
• task-specific deserializer
• automatic randomization
• distributed reading

network
• model function
• criterion function
• CPU/GPU execution engine
• packing, padding

[Diagram labels: corpus, model]
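The reader's "automatic randomization" and the trainer's "minibatching" boil down to the loop sketched below: shuffle the sample order once per sweep when training, then cut it into fixed-size minibatches. This is a generic plain-Python illustration with a hypothetical `minibatches` helper; it does not reproduce how CNTK's reader randomizes or distributes data internally.

```python
import random


def minibatches(samples, batch_size, randomize=True, seed=0):
    """Yield successive minibatches covering one sweep over `samples`.

    With randomize=True the order is shuffled once per sweep (training);
    with randomize=False the original order is kept (evaluation).
    The last minibatch may be smaller than batch_size.
    """
    order = list(range(len(samples)))
    if randomize:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]
```

Passing a different seed per sweep gives a fresh order each epoch while keeping runs reproducible.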

from cntk import *

# reader
def create_reader(path, is_training):
    ...

# network
def create_model_function():
    ...

def create_criterion_function(model):
    ...

# trainer (and evaluator)
def train(reader, model):
    ...

def evaluate(reader, model):
    ...

# main function
model = create_model_function()

reader = create_reader(..., is_training=True)
train(reader, model)

reader = create_reader(..., is_training=False)
evaluate(reader, model)
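As a sketch of how the skeleton's pieces fit together, the version below fills every function in with plain Python for a hypothetical toy task: fitting y = 2x + 1 with minibatch SGD on a mean-squared-error criterion. The data, the in-memory reader (a list instead of a path), and the hyperparameters are all assumptions for illustration; no CNTK calls are involved.

```python
import random

# Hypothetical toy corpus: y = 2x + 1, noise-free.
TRAIN = [(x / 10.0, 2 * x / 10.0 + 1) for x in range(20)]
TEST = [(0.25, 1.5), (1.75, 4.5)]

# reader (reads an in-memory list instead of a file path)
def create_reader(data, is_training):
    def read(batch_size=4, seed=0):
        order = list(range(len(data)))
        if is_training:
            random.Random(seed).shuffle(order)  # randomize only for training
        for s in range(0, len(order), batch_size):
            yield [data[i] for i in order[s:s + batch_size]]
    return read

# network
def create_model_function():
    return {"w": 0.0, "b": 0.0}  # model: y_hat = w * x + b

def create_criterion_function(model):
    def criterion(batch):
        # mean squared error of the batch and its gradients w.r.t. w and b
        n = len(batch)
        errs = [(model["w"] * x + model["b"] - y, x) for x, y in batch]
        loss = sum(e * e for e, _ in errs) / n
        gw = sum(2 * e * x for e, x in errs) / n
        gb = sum(2 * e for e, _ in errs) / n
        return loss, gw, gb
    return criterion

# trainer (and evaluator)
def train(reader, model, epochs=200, lr=0.1):
    criterion = create_criterion_function(model)
    for epoch in range(epochs):
        for batch in reader(seed=epoch):
            _, gw, gb = criterion(batch)
            model["w"] -= lr * gw
            model["b"] -= lr * gb

def evaluate(reader, model):
    criterion = create_criterion_function(model)
    losses = [criterion(batch)[0] for batch in reader()]
    return sum(losses) / len(losses)

# main function
model = create_model_function()
train(create_reader(TRAIN, is_training=True), model)
test_loss = evaluate(create_reader(TEST, is_training=False), model)
```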

def create_reader(map_file, mean_file, is_training):
    # image preprocessing pipeline
    transforms = [
        ImageDeserializer.crop(crop_type='Random', ratio=0.8, jitter_type='uniRatio'),
        ImageDeserializer.scale(width=image_width, height=image_height,
                                channels=num_channels, interpolations='linear'),
        ImageDeserializer.mean(mean_file)
    ]
    # deserializer
    return MinibatchSource(ImageDeserializer(map_file, StreamDefs(
        features = StreamDef(field='image', transforms=transforms),
        labels = StreamDef(field='label', shape=num_classes)
    )), randomize=is_training,
        epoch_size = INFINITELY_REPEAT if is_training else FULL_DATA_SWEEP)
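The three transforms in create_reader (random crop, scale, mean subtraction) can be illustrated on a tiny grayscale image held as a list of rows. This sketch assumes ratio=0.8 means the crop side is 80% of the image side, and it swaps the requested 'linear' interpolation for nearest-neighbour sampling to stay short; the helper names are hypothetical and this is not the deserializer's implementation.

```python
import random


def random_crop(img, side_ratio=0.8, rng=random):
    """Crop a random window whose height and width are side_ratio of the original."""
    h, w = len(img), len(img[0])
    ch, cw = max(1, int(h * side_ratio)), max(1, int(w * side_ratio))
    top = rng.randint(0, h - ch)
    left = rng.randint(0, w - cw)
    return [row[left:left + cw] for row in img[top:top + ch]]


def scale_nearest(img, width, height):
    """Resize with nearest-neighbour sampling (stand-in for 'linear')."""
    h, w = len(img), len(img[0])
    return [[img[r * h // height][c * w // width] for c in range(width)]
            for r in range(height)]


def subtract_mean(img, mean_img):
    """Per-pixel mean subtraction; mean_img must already have the target size."""
    return [[p - m for p, m in zip(row, mrow)] for row, mrow in zip(img, mean_img)]


def preprocess(img, mean_img, width, height, is_training, rng=random):
    if is_training:  # random cropping acts as data augmentation, training only
        img = random_crop(img, 0.8, rng)
    return subtract_mean(scale_nearest(img, width, height), mean_img)
```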


Model

def model(x):
    h1 = Dense(400, activation=relu)(x)
    h2 = Dense(200, activation=relu)(h1)
    r = Dense(10, activation=None)(h2)
    return r

z = model(x)

Loss

cross_entropy_with_softmax(z, Y)

[Figure: 28 × 28 pixel input image]
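A quick parameter count shows where the fully connected model's capacity sits. The sketch assumes the 28 × 28 image is flattened to 784 inputs and that every Dense layer has a bias vector.

```python
def dense_params(n_in, n_out, bias=True):
    """Weights plus (optionally) biases of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


n_in = 28 * 28  # flattened 28 x 28 pixel image
layers = [(n_in, 400), (400, 200), (200, 10)]
per_layer = [dense_params(i, o) for i, o in layers]
total = sum(per_layer)
print(per_layer, total)
```

Roughly 80% of the weights are in the first layer, which is one motivation for the convolutional model that follows.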

Model

def model(x):
    h = Convolution2D((5,5), num_filters=8, …)(x)
    h = MaxPooling(…)(h)
    h = Convolution2D((5,5), num_filters=16, …)(h)
    h = MaxPooling(…)(h)
    r = Dense(output_classes, activation=None)(h)
    return r

z = model(x)
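The slide elides padding and pooling settings ("…"), so the feature-map size entering Dense depends on them. The sketch applies the standard output-size formula under assumed settings: padding 2 for the 5×5 convolutions (spatial size preserved) and 2×2 max pooling with stride 2. Other settings give different numbers.

```python
def conv_out(n, k, stride=1, pad=0):
    """Output length along one axis: floor((n + 2*pad - k) / stride) + 1."""
    return (n + 2 * pad - k) // stride + 1


# Assumed values for the elided arguments (not taken from the slide).
side = 28
side = conv_out(side, 5, pad=2)     # conv 1, 8 filters   -> 28
side = conv_out(side, 2, stride=2)  # pool 1              -> 14
side = conv_out(side, 5, pad=2)     # conv 2, 16 filters  -> 14
side = conv_out(side, 2, stride=2)  # pool 2              -> 7
dense_inputs = side * side * 16     # values fed into the final Dense layer
print(side, dense_inputs)
```

Without padding ("valid" convolution) the first 5×5 convolution alone would shrink 28 to 24.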

Problem: Tagging entities in Airline Travel Information Services (ATIS) data

[Figure: a recurrent (Rec) cell is applied to each word in turn and emits a slot tag]

Word:  show   burbank     to   seattle   flights   tomorrow
Tag:   o      From_city   o    To_city   o         Date

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

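[Figure: model architecture. Input x(t): a text token as a one-hot vector (e.g. 0 0 1 0) of size 1 × 943. E (embedding): i = 943, O = 150. L (LSTM): i = 150, O = 300, recurrent state h(t-1) → h(t). D (dense): i = 300, O = 129, a = sigmoid. Output y(t): class label.]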
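The layer sizes in the figure determine the model's parameter count. The sketch uses the standard LSTM formula (four gates, each with input weights, recurrent weights and a bias, no peephole connections) and an embedding without bias; CNTK's exact count could differ if other LSTM options are enabled.

```python
def embedding_params(vocab, dim):
    return vocab * dim  # one dim-sized row per token, no bias


def lstm_params(n_in, n_hidden):
    # 4 gates (input, forget, output, candidate), each with input weights,
    # recurrent weights and a bias; no peephole connections assumed.
    return 4 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)


def dense_params(n_in, n_out):
    return n_in * n_out + n_out


counts = {
    "E": embedding_params(943, 150),
    "L": lstm_params(150, 300),
    "D": dense_params(300, 129),
}
total = sum(counts.values())
print(counts, total)
```

The one-hot input never has to be multiplied out: a one-hot row times E simply selects one 150-dimensional row of E.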

emb_dim, hidden_dim, num_labels = 150, 300, 129

def model():
    return Sequential([
        Embedding(emb_dim),
        Recurrence(LSTM(hidden_dim), go_backwards=False),
        Dense(num_labels)
    ])

z = model()

lr_schedule = C.learning_rate_schedule([0.05]*3 + [0.025]*2 + [0.0125], C.UnitType.minibatch, epoch_size=100)

sgd_learner = C.sgd(z.parameters, lr_schedule)
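Our reading of the schedule above: each list entry applies for epoch_size = 100 units (minibatches here, because of C.UnitType.minibatch), and the last entry stays in effect for the rest of training. The sketch expands that rule in plain Python so the switch points are explicit; `lr_at` is a hypothetical helper, not part of CNTK.

```python
def lr_at(schedule, epoch_size, index):
    """Learning rate for minibatch `index` (0-based) under a piecewise schedule.

    Each entry lasts epoch_size minibatches; the last entry then applies
    for the rest of training.
    """
    return schedule[min(index // epoch_size, len(schedule) - 1)]


schedule = [0.05] * 3 + [0.025] * 2 + [0.0125]
```

Under this reading, minibatches 0 to 299 use 0.05, minibatches 300 to 499 use 0.025, and every later minibatch uses 0.0125.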

[Figure: ATIS training data (ATISTrain) read as a minibatch of 96 samples: #1, #2, #3, …, #96]
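The 96 sentences in a minibatch have different lengths. CNTK represents sequences along a dynamic axis, so user code does not pad them by hand. Internally, though, the engine still has to pack them together ("packing, padding" in the network box). The generic idea is padding to the longest sequence plus a mask, so padded positions do not count toward the loss. The sketch below is a plain-Python illustration of that idea, not CNTK's internal layout.

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length sequences to the longest one; return (padded, mask)."""
    max_len = max(len(s) for s in sequences)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask


def masked_mean(values, mask):
    """Average per-position values (e.g. per-word losses) over real positions only."""
    total = sum(v * m for row_v, row_m in zip(values, mask) for v, m in zip(row_v, row_m))
    count = sum(m for row in mask for m in row)
    return total / count
```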

Input feature (96 × x(t))

z = model()

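Loss: cross_entropy_with_softmax(z, Y)

Error: classification_error(z, Y)

Choose a learner (SGD, Adam, AdaGrad, etc.)

trainer = Trainer(z, (loss, error), learner)

trainer.train_minibatch({X: x_data, Y: y_data})   # x_data, y_data: hypothetical names for one minibatch of features and labels

One-hot encoded label (Y: 96 × 129, one label per sample or per word in a sequence)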
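For a single label vector, the loss and error above reduce to the textbook definitions sketched below. Combining softmax with cross-entropy lets the log-sum-exp trick avoid overflow for large logits. These are hypothetical plain-Python re-implementations that only borrow the CNTK names; the real functions operate on whole minibatches and sequences.

```python
import math


def cross_entropy_with_softmax(z, y):
    """-sum_j y_j * log softmax(z)_j, computed via log-sum-exp for stability."""
    m = max(z)
    log_sum_exp = m + math.log(sum(math.exp(t - m) for t in z))
    return -sum(yj * (zj - log_sum_exp) for zj, yj in zip(z, y))


def classification_error(z, y):
    """1.0 if the arg-max of z differs from the arg-max of the one-hot label y, else 0.0."""
    predicted = max(range(len(z)), key=z.__getitem__)
    target = max(range(len(y)), key=y.__getitem__)
    return float(predicted != target)
```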


Function z = CNTKLib.Times(weightParam, input) + biasParam;
Function loss = CNTKLib.CrossEntropyWithSoftmax(z, labelVariable);

Function conv = CNTKLib.Pooling(CNTKLib.Convolution(convParam, input), PoolingType.Average, poolingWindowShape);

Function resNetNode = CNTKLib.ReLU(CNTKLib.Plus(conv, input));

var parameterLearners = new List<Learner>() { Learner.AdamLearner(classifierOutput.Parameters(), learningRate, momentum) };

var trainer = Trainer.CreateTrainer(classifierOutput, trainingLoss, prediction, parameterLearners);

Function conv = CNTKLib.ReLU(CNTKLib.Convolution(convParams, features, strides ));

Function pooling = CNTKLib.Pooling(conv, PoolingType.Max, poolingWindow, stride, padding);

Function classifier = TestHelper.Dense(pooling, numClasses, device, Activation.None);

var minibatchSource = MinibatchSource.TextFormatMinibatchSource("Train_cntk_text.txt", streamConfigurations, MinibatchSource.InfinitelyRepeat);

var minibatchData = minibatchSource.GetNextMinibatch(minibatchSize, device);

var arguments = new Dictionary<Variable, MinibatchData> {

{ input, minibatchData[featureStreamInfo] },

{ labels, minibatchData[labelStreamInfo] }

};

trainer.TrainMinibatch(arguments, device);

https://github.com/Microsoft/CNTK/tree/master/Examples/TrainingCSharp

DATA SCIENCE & AI

KEY TRENDS
• Accelerating adoption of AI by developers (consuming models)
• Rise of hybrid training and scoring scenarios
• Push scoring/inference to where the event occurs (edge, cloud, on-prem)
• Some developers moving into deep learning as a non-traditional path to data science / AI development
• Growth of a diverse hardware arms race across all form factors (CPU / GPU / FPGA / ASIC / device)

CHALLENGES
• Data prep
• Model deployment & management
• Model lineage & auditing
• Explainability

Drone-based electric grid inspector powered by deep learning

Challenge
• Traditional power line inspection services are costly
• Demand for low-cost image scoring and support for multiple concurrent customers
• Needed powerful AI to execute on a drone solution

Solution
• Deep learning to analyze multiple streaming data feeds
• Azure GPUs support Single Shot MultiBox Detectors (SSD)
• Reliable, consistent, and highly elastic scalability with Azure Batch Shipyard

Identifying Snow Leopards: Computer vision and classification on Spark

[Figure: image → deep neural network → image features → Spark ML classifier (decision tree or logistic regression) → class, e.g. "snow leopard?"]

THE AI DEVELOPMENT LIFECYCLE

[Figure: data sources (social, LOB, graph, IoT, image, CRM) flow through INGEST → STORE → PREP & TRAIN → MODEL & SERVE into apps + insights, supported by data orchestration and monitoring, a data lake and storage, Hadoop/Spark/SQL and ML, IoT, and Azure Machine Learning.]

Azure Machine Learning Studio

Platform for data scientists to graphically build and deploy experiments
• Rapid experiment composition
• More than 100 easily configured modules for data prep, training, and evaluation
• Extensibility through R & Python
• Serverless training and deployment

Some numbers:
• Hundreds of thousands of deployed models serving billions of requests

Begin building now with the tools and platforms you know

Build, deploy, and manage models at scale
Boost productivity with agile development

Notebooks, IDEs
Azure Machine Learning Workbench
VS Code Tools for AI

NEW CAPABILITIES: Experimentation and Model Management Services

AZURE MACHINE LEARNING SERVICES

TRAIN & DEPLOY OPTIONS

AZURE: Spark, SQL Server, virtual machines, GPUs, container services
ON-PREMISES: SQL Server, Machine Learning Server
EDGE: Azure IoT Edge

AZURE ML EXPERIMENTATION: Experiment Everywhere

Local machine
Scale up to DSVM
Scale out with Spark on HDInsight
Azure Batch AI (Coming Soon)
ML Server

Command line tools
IDEs
Notebooks in Workbench

Experimentation service
• Manage project dependencies
• Manage training jobs locally, scaled up, or scaled out
• Git-based checkpointing and version control
• Service-side capture of run metrics, output logs, and models
• Use your favorite IDE and any framework

USE THE MOST POPULAR INNOVATIONS
USE ANY TOOL
USE ANY FRAMEWORK OR LIBRARY

AZURE ML MODEL MANAGEMENT: Deploy Everywhere

DOCKER
• Single-node deployment (cloud/on-prem)
• Azure Container Service
• Azure IoT Edge
• Microsoft ML Server
• Spark clusters
• SQL Server

• Deployment and management of models as HTTP services
• Container-based hosting of real-time and batch processing
• Management and monitoring through Azure Application Insights
• First-class support for SparkML, Python, Cognitive Toolkit, TF, and R; extensible to support others (Caffe, MXNet)
• Service authoring in Python
• Manage models
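"Service authoring in Python" amounts to writing a small scoring script that the hosting container calls. The sketch assumes the common init()/run() convention: init() loads the model once when the service starts, and run() scores each request. The JSON schema (a "data" key holding feature vectors) and the logistic-regression "model" with fixed weights are hypothetical placeholders for a real trained model.

```python
import json
import math

model = None


def init():
    """Load the model once at service start (here: hypothetical fixed weights)."""
    global model
    model = {"w": [0.8, -0.4], "b": 0.1}


def run(input_json):
    """Score one request: {"data": [[x1, x2], ...]} -> {"probabilities": [...]}."""
    rows = json.loads(input_json)["data"]
    probs = []
    for x in rows:
        z = sum(wi * xi for wi, xi in zip(model["w"], x)) + model["b"]
        probs.append(1.0 / (1.0 + math.exp(-z)))  # logistic output
    return json.dumps({"probabilities": probs})
```

Keeping model loading in init() means the cost is paid once per container rather than once per HTTP request.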

AI Powered Spreadsheets

VS Code extension with deep integration with Azure ML
• End-to-end development environment, from new project through training
• Support for remote training
• Job management
• On top of all of the goodness of VS Code (Python, Jupyter, Git, etc.)


Azure Machine Learning Workbench - What Is It?

Windows- and Mac-based companion for AI development
• Full environment setup (Python, Jupyter, etc.)
• Embedded notebooks
• Run history and comparison experience
• New data wrangling tools

AI-Powered Data Wrangling
• Rapidly sample, understand, and prep data
• Leverage PROSE and more for intelligent data prep by example
• Extend/customize transforms and featurization through Python
• Generate Python and PySpark for execution at scale

https://microsoft.github.io/prose/
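"Data prep by example" means the user supplies a few input/output pairs and the tool searches for a transformation program consistent with all of them. The brute-force sketch below shows that idea over a tiny, hypothetical DSL (split on a delimiter, take the n-th token, optionally change case). PROSE uses far richer program spaces and smarter search, so this is not its algorithm or API.

```python
def candidate_programs(max_index=3, delimiters=(" ", ",", "-", "@", ".")):
    """Enumerate programs of the form (delimiter, token index, case change)."""
    for d in delimiters:
        for i in range(max_index):
            for case in ("keep", "upper", "lower"):
                yield (d, i, case)


def apply_program(program, text):
    d, i, case = program
    parts = text.split(d)
    if i >= len(parts):
        return None  # program does not apply to this input
    token = parts[i].strip()
    return {"keep": token, "upper": token.upper(), "lower": token.lower()}[case]


def learn_from_examples(examples):
    """Return the first program consistent with every (input, output) pair, or None."""
    for program in candidate_programs():
        if all(apply_program(program, src) == dst for src, dst in examples):
            return program
    return None
```

For example, the two examples ("Jane Smith" → "SMITH") and ("John Doe" → "DOE") yield the program (" ", 1, "upper"), which then maps "Ada Lovelace" to "LOVELACE".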

Machine Learning & AI Portfolio: When to use what?

Questions: Build your own or consume pre-trained models? Which experience do you want? Deployment target? What engine(s) do you want to use?

Microsoft ML & AI products
• Build your own (Azure Machine Learning)
  Code first (on-prem): ML Server; engines: Hadoop, SQL Server (on-prem)
  Code first (cloud): AML services (Preview); engines/targets: SQL Server, Spark, Hadoop, Azure Batch, DSVM, Azure Container Service
  Visual tooling (cloud): AML Studio
• Consume: Cognitive Services, bots

http://aka.ms/aml_deep_dive


https://aischool.microsoft.com/learning-paths/SPYpcLhRMyEAa2maw6YoU

https://channel9.msdn.com/events/Ignite/Microsoft-Ignite-Orlando-2017/BRK4033


https://www.microsoft.com/en-us/cognitive-toolkit/

https://azure.microsoft.com/services/machine-learning-services/

https://azure.microsoft.com/services/virtual-machines/data-science-virtual-machines/
