Dell EMC Machine Learning Strategy
Jay Boisseau
HPC Technical Strategist
June 22, 2017
Restricted - Confidential
Tech Leaders Are Proclaiming Disruption, Revolution…
“Just as electricity 100 years ago transformed industry after industry after industry, I think AI
powered by deep learning will now do the same… It’s hard to think of an industry that will
not be transformed by AI in the next decade.” – Andrew Ng, former Baidu Chief Scientist
“AI is the most far-reaching technological advancement in our lifetime. It changes every
industry, every company, every thing.” – Jen-Hsun Huang, Nvidia CEO
“Smart machine technologies will be the most disruptive class of innovations over the next
10 years due to their computational power, scalability in analyzing large-scale data sets,
and rapid advances in neural networks.” – Gartner Report
Cognitive computing will become “the largest consumer of computing cycles by 2020” –
Rob High, IBM Watson CTO
“Machine learning is HPC’s 1st consumer killer app” – Jen-Hsun Huang
Tech Leaders Are Proclaiming Disruption, Revolution…
“When you get all this data coming in from hundreds of billions of connected
devices and apply to that artificial intelligence, it’s almost like a fourth
industrial revolution and an incredible opportunity for companies to re-
imagine themselves in this digital age….The last 30 years have been
incredible in IT, but the next 30 years will make it look like child’s play.”
– Michael Dell
SAP Sapphire Conference, May 16, 2017
The Hype Is High…
But Analysts & Experts Agree on AI Importance
Cognitive/AI
AI Has Already Proven Superior to Experts, Other Methods in Many Areas
• In chess and Go and poker
• In sequence analysis and tumor detection
• In retail suggestions and fraud detection
• In intrusion attempt analysis and terrorist threat detection
• In AI agents, autonomous driving, robots, and more
AI Solves Business Problems Across Many Verticals
IDC says in 2018, 75% of enterprise and ISV development will include AI/ML/cognitive in at least one application
Machine Learning – Disruptive, Transformative
(Nested diagram: Artificial Intelligence ⊃ Machine Learning (Statistical) ⊃ Deep Learning (Neural Networks))
Artificial Intelligence is the broader concept of
machines being able to carry out tasks in a way
that we would consider “smart”.
Machine Learning is a current application of
AI based around the idea that we should really
just be able to give machines access to data
and let them learn for themselves.
Deep Learning is an area of Machine Learning
research, which has been introduced with the
objective of moving Machine Learning closer to
one of its original goals: Artificial Intelligence.
A Neural Network is a computer system designed to work
by classifying information in the same way a human brain
does. It can be taught to recognize, for example, images,
and classify them according to elements they contain.
Background: Intelligence from Processing
Machine Learning – condensing data
into a high-dimensional probability
model to be used for:
CLASSIFICATION – using the model
to label or tag the data
INFERENCE – using the model to
deduce probable inputs given some
outputs
JUDGEMENT – summarizing the
content of the probability model
PREDICTION – using the model to
deduce probable outputs given some
inputs
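As a concrete illustration of "condensing data into a probability model" and its use for CLASSIFICATION, here is a minimal sketch. The spam/ham word data, the smoothing scheme, and helper names (`prob`, `classify`) are invented for this example, not part of any Dell EMC tooling.

```python
from collections import Counter, defaultdict

# toy training data: (label, text) pairs, invented for illustration
training = [("spam", "win prize now"), ("spam", "win money"),
            ("ham", "meeting now"), ("ham", "project meeting")]

# condense the data into per-class word counts (the "probability model")
counts = defaultdict(Counter)
for label, text in training:
    counts[label].update(text.split())

def prob(label, word):
    """Smoothed per-class word frequency from the condensed model."""
    c = counts[label]
    return (c[word] + 1) / (sum(c.values()) + 1)

def classify(text):
    """CLASSIFICATION: tag new data with the most probable class."""
    scores = {lbl: 1.0 for lbl in counts}
    for lbl in scores:
        for w in text.split():
            scores[lbl] *= prob(lbl, w)
    return max(scores, key=scores.get)

print(classify("win prize money"))   # -> 'spam'
print(classify("project meeting"))   # -> 'ham'
```

PREDICTION and INFERENCE would use the same model in the forward and reverse directions (probable outputs given inputs, and probable inputs given outputs).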
Background: The Machine Learning Process
Training Phase: training data → ML Algorithm → scoring → iterate until satisfied → trained parameters or weights (Ŵ)
Use Phase: real-world data → Classification Engine → useful intelligence
(The classification engine implements the ML model.)
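The two-phase process above can be sketched in a few lines: a training loop that iterates until it produces trained weights, and a use phase that applies them. The toy linear model, learning rate, and function names (`train`, `classify`) are illustrative assumptions only.

```python
# Training phase: iterate over the data, adjusting the trained
# parameters (weights) until the error is driven down.
def train(data, lr=0.01, epochs=500):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y   # scoring: compare prediction to label
            w -= lr * err * x       # adjust weights toward lower error
            b -= lr * err
    return w, b                     # the trained parameters, W-hat

# Use phase: the "classification engine" applies the trained model
# to real-world data to produce useful intelligence.
def classify(w, b, x, threshold=0.5):
    return 1 if w * x + b >= threshold else 0

# toy training data: small x -> class 0, large x -> class 1
data = [(1.0, 0.0), (2.0, 0.0), (4.0, 1.0), (5.0, 1.0)]
w, b = train(data)
print(classify(w, b, 1.5), classify(w, b, 4.5))
```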
Background: Machine Learning Requires Matrix Math
Matrix form of RSS* (minimize the error):

y = Hw + ε
RSS(w) = (y − Hw)ᵀ(y − Hw)

y = true values
H = training data
w = parameters or coefficients
Hw = predicted values
ε = error

*residual sum of squares
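The matrix form of RSS can be checked numerically. This sketch assumes NumPy and a made-up design matrix H; it minimizes RSS with `numpy.linalg.lstsq` and verifies the residual vanishes on noiseless data.

```python
import numpy as np

H = np.array([[1.0, 1.0],   # each row: [bias term, feature value]
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
true_w = np.array([0.5, 2.0])
y = H @ true_w              # noiseless targets, so the fit is exact

# lstsq finds the w that minimizes RSS(w) = (y - Hw)^T (y - Hw)
w_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
residual = y - H @ w_hat
rss = residual @ residual
print(w_hat, rss)
```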
http://scikit-learn.org/
Rearchitecting the Artificial Brain: Deep Neural Networks Learn Features in Layers
Deep Learning – Train Model, Then Run Inference Against It
Training
Computationally
Intensive: massive
data, massive
computations in
neural net
Billions of Tflops per
training run (train a
model)
Can sacrifice precision
(e.g. FP32, FP16) for
more performance
Inference
Less computationally
intensive, but still must be
fast (and often low
power)
Doesn’t require high
precision math, so can
use accelerators like
GPU & FPGA with INT8
support.
Can also be run in Xeon
& Xeon-Phi based
systems.
(Flow: Scalable Training → Neural Model → Scalable Inference → Edge/Users)
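To illustrate why inference can drop to INT8, here is a hedged sketch of symmetric linear quantization — a common scheme used for illustration here, not the specific one implemented by any particular GPU or FPGA.

```python
import numpy as np

# FP32 weights produced by training (values invented for illustration)
w_fp32 = np.array([0.82, -1.30, 0.05, 2.41], dtype=np.float32)

# symmetric quantization: map the largest magnitude onto 127
scale = np.abs(w_fp32).max() / 127.0
w_int8 = np.round(w_fp32 / scale).astype(np.int8)

# dequantize to see how much precision was sacrificed
w_deq = w_int8.astype(np.float32) * scale
max_err = np.abs(w_fp32 - w_deq).max()
print(w_int8, max_err)
```

The worst-case error is half a quantization step — small enough that classification decisions usually survive, which is why INT8-capable accelerators work for inference.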
Background: Math Performance is Key
• Most of the recent performance gains by GPUs and KNM are due to precision optimizations
– Precision evolution: 64-bit DP → 32-bit SP → 16-bit HP → some 8-bit
• But there is one more optimization step: specialized silicon
– Special 16-bit precision enhancements
– Better internal network, i.e. graph support and more connectivity
– Better use of memory
POV Point #1: Deep Learning is HPC
• Deep learning training certainly requires HPC techniques (and so
do other ML, and HPDA, techniques)
– HPC is not an acronym for 'parallel 3D PDE-solving time-dependent
applications written using MPI'
– HPC means high performance computing: computing in which the purpose and
design is for greater performance than any mainstream computing (mobile,
desktop, laptop, enterprise server)
• Deep learning is very data-driven, so it is also HPDA
• Note: Dell Ready Solutions & Alliances group includes both HPC and
HPDA solutions, so we're internally aligned
POV Point #2: Optimized DL Servers Are Needed
• DL workloads are different than traditional (simulation-based)
HPC workloads, and thus require different optimized servers
– DL is a two-phase process:
› training (which includes scoring, a kind of inferencing)
› inferencing
– Neither phase depends on 64-bit computations: can sacrifice precision for performance
› Inferencing can sometimes be accomplished in 8-bit!
– Training requires very highly parallel processors
– Inferencing can be conducted by very efficient, lightweight processors
POV Point #3: We Must Evaluate Multiple, Diverse Optimized Solutions
• DL training solutions:
– GPU-based: great at matrix math; V100 (Volta) now optimized for tensor operations(!)
– KNM: extra focus on scaling, variable precision
– Dedicated silicon:
› Nervana
› Graphcore
› Knupath
› Wave
• DL inferencing solutions:
– Xeon
– GPUs
– FPGAs
– Dedicated silicon (as Google is doing with TPU, Apple with forthcoming neural chip, etc.)
Background: What are Frameworks?
Deep Learning Frameworks
TensorFlow, MXNet, CNTK, Chainer, Neon, Theano…
Neural Network Libraries
cuDNN, cuBLAS, MKL, NCCL…
Hardware
All these frameworks allow deep learning
researchers to build models. They include basic
building blocks like layers which can be connected in
different ways to create a model.
In order to train the deep learning models, the
frameworks work with underlying neural network
libraries such as NVIDIA's cuDNN and Intel's MKL.
These libraries implement operations such as matrix
multiply that are important to deep learning models.
Finally, the models are trained on hardware like
NVIDIA GPUs, Intel's Xeon Phi processor or other
specialized processors.
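The layering described above can be sketched as a framework-level "layer" whose heavy lifting is a matrix multiply — the operation that libraries like cuDNN or MKL implement on the underlying hardware. The `Dense` class below is a generic illustration, not any framework's actual API.

```python
import numpy as np

class Dense:
    """A framework-style building block: one fully-connected layer."""
    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.standard_normal((in_dim, out_dim)) * 0.1
        self.b = np.zeros(out_dim)

    def __call__(self, x):
        # this matmul is the op a neural-network library optimizes
        return np.maximum(x @ self.W + self.b, 0.0)   # ReLU activation

# layers connected in sequence to form a model, as the frameworks allow
rng = np.random.default_rng(0)
model = [Dense(4, 8, rng), Dense(8, 2, rng)]
x = rng.standard_normal((3, 4))    # a batch of 3 inputs
out = x
for layer in model:
    out = layer(out)
print(out.shape)
```

In a real framework the same structure exists, but the matmul dispatches to cuDNN on NVIDIA GPUs or MKL on Intel processors instead of NumPy.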
Background: Most AI Frameworks Are Open Source
Key points: all the frameworks are open source, but some of the frameworks
supported by major players are:
• TensorFlow: Google
• MXNet: Amazon
• CNTK: Microsoft
• Neon: Intel
• Turi: Apple
We don’t have to develop the frameworks, but we need to develop
the servers that can give us the maximum performance.
POV Point #4: Customer Success Requires Solutions, Expertise – Not Just Optimized Servers
Machine Learning is changing and evolving weekly. Dell EMC needs to continue to evaluate new solutions—both HW and SW—and gain experience with applications and verticals. So how should we measure and benchmark potential products?
CUSTOMER SUCCESS – to be a broad-based supplier we need all Dell customers to be successful. Therefore, in addition to high performance ML/DL servers, we need solutions that provide:
– Ease of use
– Accuracy
– Hyper-tuning
This will require solutions with software & services to help customers use frameworks effectively.
This is what we need to “benchmark” – not just performance of frameworks.
Dell Activities Leading To Roadmap, New Solutions
• Conducting many, many software and hardware technology assessments & evaluations (e.g. GPUs, KNM, FPGAs, Nervana, Graphcore), and POCs (started 2016), e.g. BitFusion, others
• Benchmarking major frameworks (e.g. TensorFlow, CNTK, MXNet, Caffe2) to study performance & scaling characteristics, to create optimal solutions
• Evaluating potential software & services partners to offer best-in-class solutions
• Collecting customer solutions requirements, experiences, successes to develop complete solutions that provide max ROI
• Finalizing strategy and solutions roadmap – target summer 2017.
POV Point #5: It is NOT all about the hyperscalers!
• Same arguments for on-prem infrastructure apply here as for HPC
• No, they do not already have all your enterprise/science data—most corporate data remains on-prem
• Data origination, movement, security policies, etc. all remain challenges for public cloud usage
• Biggest advantage of hyperscalers currently is APIs (not cost)--but that's software and thus manageable
POV Point #6: You will want/need DL even if your current workload is mostly simulation
• You will want/need DL to complement your current HPC
efforts when
– there is no simulation option
› no physical laws for your problem
› uncertain/incomplete physics
– simulation accuracy is limited
› e.g. the limits of non-linear dynamics simulations
– for advanced data analysis of simulation output data
Summary & Closing Thoughts
• We get that it's hard--and we are evaluating & benching everything we can,
assessing software partners, and mapping solutions for customers and
workloads, and preparing a comprehensive roadmap (Summer 2017)
• DL is another part of a comprehensive IT solutions stack--but a very
important part for data rich problems (perhaps the most important)
• Some things that just weren't feasible before—autonomous driving, (good)
AR, robotics, etc.—will become realities!
• It's early, but we intend to win--and our vast enterprise presence will be
invaluable. So will your expertise and our partners (up next)
• Questions (both ways)?