NVIDIA DEEP LEARNING SOLUTIONS Alex Sabatier – Global Account Manager [email protected]





Nvidia Deep Learning Solutions
Alex Sabatier, Global Account Manager
[email protected]

#


Agenda
NVIDIA GPUs
Deep learning revolution
DGX-1: First AI Supercomputer

#

First, I am going to provide some insight about NVIDIA and the GPU, which is at the center of its strategy. Then we will discover why deep learning, this new field of AI that uses massive datasets to create software, is solving amazing problems across the whole industry. I also want to mention how NVIDIA GPUs made their way into the datacenter, not only for HPC and the nascent field of deep learning, but also to accelerate data analytics now that the amount of available data grows exponentially. You will discover what makes DGX-1 a unique platform, powered by the five miracles of the latest GPU generation, named Pascal. Its unique hardware configuration explains the unprecedented power of this appliance. NVIDIA takes great pride in the dedicated software stack, which lets users take advantage of the raw power without spending hours configuring and loading the various software elements. What we wanted to achieve is to give data scientists and deep learning practitioners a ready-to-use platform, powerful enough that they can directly express their creativity and solve issues they were not able to address up to now.

Ready? So let's start.

GPU Computing

NVIDIA: Computing for the Most Demanding Users

Computing Human Imagination

Computing Human Intelligence

#

We could say that NVIDIA makes computers that are loved by the most demanding users in the world: gamers, designers, and scientists.

NVIDIA pioneered GPU computing, a supercharged form of computing, 10 years ago, making CUDA, the GPU programming language, accessible on all of our platforms, including the gaming ones. Yes, a gamer can take a break from his favorite game and spend some time programming the GPU he was previously using for gaming.

The GPU evolved from a 3D graphics chip into a computing platform that gives humans the power to simulate virtual worlds: the fabulous and rich worlds created by game developers, of course, but also the creations of industrial designers and architects. 90% of workstations use NVIDIA graphics solutions to render and visualize their creations. The movie industry also uses our graphics solutions for CGI, special effects and animated movies. NVIDIA works very closely with application developers and their users to make sure the whole chain benefits from the power of the GPU. And that's not all: biologists and medical imaging specialists use GPUs to simulate a drug's effect on a specific molecule. GPUs are also used to simulate weather and make predictions, to run financial simulations, and to explore underground resources in oil and gas exploration. That's computing human imagination.

But GPUs are also amazingly powerful at understanding the world around them. Computer vision and deep learning algorithms allow computers to see and understand their surroundings: recognizing road lanes, traffic signs, trucks, cars, bikes and pedestrians in a self-driving car; helping a robot identify the objects it can grab; distinguishing valuable objects from garbage on a recycling line at a waste processing factory. The impact of visual recognition is almost infinite. And that is not all: machines can now listen to, understand and translate what they hear. Have you tried "OK Google" on an Android device? It works pretty well; it works even for me, with my thick French accent. With the ability to see and hear, those computers are simulating human intelligence. GPUs trained the DeepMind/Google system that beat the world champion at the game of Go.

So why do GPUs excel at those tasks? It all comes down to an architectural difference versus traditional CPUs.


GPUs: Accelerate Applications

GPU | CPU

#

GPU computing is a type of heterogeneous computing, that is, parallel computing with multiple processor architectures involving both CPU and GPU. The CPU is an old architecture known for being sequential and featuring a complex instruction set; CPUs now have a few cores: 2, 4, 8, up to 20-24 for the most powerful ones. A GPU contains up to several thousand cores able to execute similar instructions in the same clock cycle. This is highly efficient for today's computing applications, which have data-intensive requirements and a large amount of parallelism. GPUs are best at accelerating the parallel parts of any application by spreading the computation over those thousands of cores. The result is a significant speedup in application execution time. Andrew Ng, head of Baidu's AI research, views the GPU as a high-throughput parallel engine. The deep learning algorithm used by Andrew's team in Deep Speech 2, the name of his natural language processing system, requires a lot of parallel throughput to train the model. It would not be possible to use a CPU for that. Thanks to GPUs, the model is so performant that it is able to understand Mandarin spoken by a 5-year-old. Let me show you some numbers.
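The architectural difference can be felt even from Python. Below, NumPy's vectorized arithmetic stands in for the GPU's style of applying one instruction to many elements at once, against an element-by-element loop that mimics a single sequential core. This is only an analogy (NumPy runs on the CPU); real GPU code would use CUDA or a GPU-aware framework.

```python
# Sequential, one-element-at-a-time processing (a single CPU core's style)
# vs. one instruction applied to a whole array at once (the GPU's style,
# approximated here by NumPy's vectorized arithmetic).
# Illustrative analogy only -- NumPy still executes on the CPU.
import time
import numpy as np

data = np.random.rand(1_000_000).astype(np.float32)

# Sequential loop: one element per iteration
t0 = time.perf_counter()
out_seq = [x * 2.0 + 1.0 for x in data]
t_seq = time.perf_counter() - t0

# Data-parallel: the same multiply-add over the entire array in one call
t0 = time.perf_counter()
out_par = data * 2.0 + 1.0
t_par = time.perf_counter() - t0

print(f"sequential: {t_seq:.4f}s  vectorized: {t_par:.4f}s")
```

Both paths compute the same result; the data-parallel form is dramatically faster because the per-element bookkeeping disappears, which is the same reason spreading a computation over thousands of GPU cores pays off.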

GPU vs CPU: a painting analogy

#

Performance Gap Continues to Grow (charts: GFLOPS and memory bandwidth in GB/s for M1060, M2090, K20, K80 and Pascal GPUs vs CPUs)

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.

#

Those two diagrams show the evolution of compute power and memory bandwidth for both GPU (green curve) and CPU (blue curve) over the last 8 years. It is obvious that the trends are completely different. While CPU compute power, expressed here in floating-point operations per second, follows Moore's law, the GPU shows an ability to increase its compute power significantly generation after generation. The latest Pascal P100 Tesla GPUs are close to 5 TFLOPS, while the latest x86 CPUs are barely reaching half a TFLOPS. The performance increase is due to the parallelization of the computation, distributed across an increasing number of cores. GPUs have a powerful compute capability, but that is not enough: to be truly efficient, data needs to arrive quickly at the cores to be processed. The curves on the right-hand side show the evolution of memory bandwidth in GB/s; once again, the diagram speaks for itself. NVIDIA engineered its GPUs so that the compute power and memory bandwidth address the latest computing needs of the industry. Raw power and memory bandwidth are a great starting point, but it is when a great ecosystem develops solutions on the platform that it truly serves its purpose.

Credentials built over time

300K CUDA Developers, 4x Growth in 4 Years
Majority of HPC Applications are GPU-Accelerated: 410 and Growing
100% of Deep Learning Frameworks are Accelerated

#

NVIDIA built a platform that HPC can build on. It started 10 years ago, when CUDA became accessible to developers; accelerated computing is now the most important platform for HPC. No other company in the world comes close.

There is a wide ecosystem of application developers, researchers, students and users that NVIDIA feeds with the latest updates, training and conferences, creating a strong link that allows us to understand actual usage and future needs and provide the platform of tomorrow.

The majority of HPC apps are now accelerated; every HPC center can enjoy the benefits of GPUs. 410 apps now (9 of the top 10, 35 of the top 50). It is driven by the world's largest ecosystem of HPC developers: 4x growth. Why? CUDA and OpenACC uniquely provide a simple way to get performance on parallel architectures. DL: all frameworks are optimized for GPUs.

Training neural networks to perform these tasks is a huge computational challenge. It can take weeks or even months to design and train a neural network to perform a specific task at near-human levels of accuracy. Fortunately, much of this work can be performed in parallel on modern GPUs.

We created dedicated GPU-accelerated libraries and SDKs to allow those frameworks to be GPU accelerated. Name some of them.

Today these GPU-accelerated DL frameworks are being used by researchers and scientists worldwide to solve problems in computer vision, speech translation and natural language understanding that were previously considered impossible to solve.


nvidia powers the world's leading data centers for HPC and AI

#

You may have heard of the rise of GPUs in the data center. Microsoft Azure and Amazon AWS raced for the general availability of K80 clusters to enable GPU cloud computing services. Google is a massive user of NVIDIA GPUs; Jeff Dean came to our annual GPU conference, GTC, and claimed during his keynote that deep learning would not be possible without GPUs, because CPUs are too slow. That was a massive endorsement. IBM's cognitive machine Watson uses NVIDIA GPUs. The most powerful US supercomputers, at Oak Ridge and Livermore National Laboratories, also rely on NVIDIA GPUs.


Deep learning: A NEW computing model

"little girl is eating piece of cake"

LEARNING ALGORITHM

millions of trillions of FLOPS

Device

Inference

Training

#

Because the massively parallel GPU was widely available, more and more people started to pick it up; 4-5 years ago, AI researchers discovered it. The combination of deep neural networks, big data and powerful GPU platforms reignited AI. Deep learning, a new computing model where networks are trained to extract features from massive amounts of data, has proven to be remarkably effective at solving some of the most complex problems in computer science.

In this example a network is trained with millions of images; the learning phase requires trillions and trillions of operations to create a neural network. This neural net is now able to recognize and describe an image it has never seen before.

When the neural net struggles to understand a picture, the image is analysed by data scientists and added to the initial data set to train an updated model, which will succeed when a similar picture is presented.

It reminds me of this quote from Mandela: "I never lose; either I win or I learn something." Deep learning allows neural networks to learn from their mistakes.
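The learn-from-mistakes loop in the notes above (train, collect the examples the model still gets wrong, fold them back into the training set, train an updated model) can be sketched with a toy classifier. The tiny logistic-regression model and the synthetic data are illustrative assumptions, not the pipeline NVIDIA describes.

```python
# Toy sketch of the retraining loop: train briefly, find the examples the
# model misclassifies, add them back to the training set (emphasizing
# them), then train an updated model.
# Logistic regression on synthetic 2D data; illustrative only.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-class data: label is 1 when x0 + x1 > 0
X = rng.normal(size=(500, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def train(X, y, epochs=200, lr=0.5):
    """Plain gradient descent on the logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(w, b, X, y):
    return float(np.mean(((X @ w + b) > 0) == (y > 0.5)))

# First pass: deliberately short training, so some mistakes may remain
w, b = train(X, y, epochs=5)
wrong = ((X @ w + b) > 0) != (y > 0.5)

# "Learn from mistakes": duplicate the misclassified examples and retrain
X2 = np.vstack([X, X[wrong]])
y2 = np.concatenate([y, y[wrong]])
w2, b2 = train(X2, y2, epochs=200)

print(accuracy(w, b, X, y), accuracy(w2, b2, X, y))
```

In a real deep-learning pipeline the "model" is a neural network and the hard examples are curated by data scientists, but the shape of the loop is the same.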

impact of AI is huge for enterprises

Google: Hundreds of millions of dollars in power savings with AI-operated data center
Netflix: $1 billion savings per year with AI-assisted recommendation engine
AdTheorent: 300% higher user engagement for mobile advertisers during shopping season

#

Deep learning applications are highly valuable; let me give you some examples. Netflix uses a deep-learning-powered recommendation algorithm that submits personalized suggestions to the user. Netflix explained that strong recommendations increase the amount of time viewers watch content on Netflix, keeping subscriber churn as low as possible. According to a paper published by Netflix executives, the on-demand video streaming service claims its AI-assisted recommendation system saves the company $1 billion per year.

Google was a pioneer in deep learning, using GPUs for Search. Google CEO Sundar Pichai recently praised deep learning for new applications like the Assistant or the new messaging app Allo, which provides pre-formatted contextual answers to text messages. Google nailed the value of deep learning: there are now 1,600 applications using deep learning there. They even use it to manage the power of their numerous data centers. The neural networks control "about 120 variables in the data centers," including "the fans and the cooling systems, the windows and other things." The AI worked out the most efficient methods of cooling by analyzing data from sensors among the server racks, including information on things like temperatures and pump speeds.

http://www.theverge.com/2016/7/21/12246258/google-deepmind-ai-data-center-cooling

One last example that might help add context to the use of deep learning, and the type of results it can deliver, is AdTheorent. This company built a DL system to deliver real-time bidding assistance to advertisers bidding on ads for mobile devices. It helped advertisers reach an engagement level 200 to 300 percent higher than the industry average for the last holiday shopping season. This is an example of a system that needs a very fast response time and carries incredible value.


nvidia DGX-1: AI Supercomputer-in-a-Box
170 TFLOPS | 8x Tesla P100 16GB | NVLink Hybrid Cube Mesh
2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U 3200W

#

Because we wanted the whole community of data scientists and deep learning practitioners, across all industries, to be able to create those fantastic use cases, we decided to build the ultimate appliance for it. We know GPUs best, and the way to accelerate the DL frameworks, so we created DGX-1 to unleash deep learning's enormous promise. DGX-1 is an AI supercomputer-in-a-box: a plug-and-play appliance with the computing power of a 250-node HPC cluster. As I told you, a shrunk version of a data center, delivered with a built-in software stack supporting the main DL frameworks as well as GPU-accelerated database and graph applications. A dream come true for users, who just have to focus on their task and not spend cycles preparing and maintaining their software tools. Let's have a look inside.

#

170 TFLOPS FP16 | Accelerates Major AI Frameworks | Dual 10GbE, Quad IB 100Gb | 3RU 3200W

nvidia DGX-1


#

DGX-1 delivers 170 TFLOPS of FP16 performance thanks to 8 P100 SXM2 GPUs with 16GB of HBM2 each, ensuring a memory bandwidth of 732 GB/s, for a total of 28,672 CUDA cores! Those 8 GPUs are connected with NVLink, NVIDIA's new interconnect interface that allows data to be shared between GPUs 5 times faster than the latest PCIe Gen3 bus. The GPUs are connected in a hybrid cube mesh configuration, which means the farthest GPU is only 2 hops away. That is ideal when you want large jobs to be spread across several GPUs, and it is what you want for highly scalable systems.
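The "farthest GPU is only 2 hops away" property can be checked with a short breadth-first search over an assumed hybrid cube-mesh wiring: two fully connected quads of four GPUs, plus one link pairing each GPU with its counterpart in the other quad. The edge list here is an illustrative assumption; the real DGX-1 link map also doubles up some NVLink connections for bandwidth.

```python
# BFS over an assumed hybrid cube-mesh wiring of 8 GPUs: two fully
# connected quads (GPUs 0-3 and 4-7) plus one cross link per GPU to its
# counterpart in the other quad. Illustrative edge list, not NVIDIA's
# exact link map.
from collections import deque

quad_a, quad_b = [0, 1, 2, 3], [4, 5, 6, 7]
edges = (
    [(a, b) for i, a in enumerate(quad_a) for b in quad_a[i + 1:]]  # quad 0-3 mesh
    + [(a, b) for i, a in enumerate(quad_b) for b in quad_b[i + 1:]]  # quad 4-7 mesh
    + [(g, g + 4) for g in quad_a]                                    # cross links
)

adj = {g: set() for g in range(8)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def hops_from(src):
    """Shortest hop count from src to every GPU (breadth-first search)."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

max_hops = max(max(hops_from(g).values()) for g in range(8))
print(max_hops)  # prints 2: every GPU reaches every other in at most 2 hops
```

Each GPU uses exactly 4 links (3 within its quad, 1 across), matching the P100's four NVLink ports, and the graph's diameter is 2.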

Two 20-core Xeon CPUs with a total of 256GB of system memory are used for scheduling and pushing data to the GPUs. This 3RU box requires a total of 3200W of power and can be connected to other DGX-1 boxes, or to the rest of your datacenter, with dual 10Gb Ethernet and quad InfiniBand 100Gbps EDR interfaces.

P100 SXM2 is a new generation of GPU, with a dedicated form factor specifically designed for the data center.

DGX-1: A league of its own. Caffe on DeepMark. GeForce TITAN X and GTX 1080 system: Intel Core i7-5930K @ 3.5 GHz, 64 GB system memory | Tesla P100 (SXM2) system: dual-CPU server, Intel E5-2698 v4 @ 2.2 GHz, 256 GB system memory. (Chart compares GeForce GTX TITAN X, GeForce GTX 1080, Tesla P100, Quadro VCA with 8x Quadro M6000, and DGX-1 with 8x Tesla P100.)

#


Instant productivity: plug-and-play, supports every AI framework and accelerated analytics software applications

Performance optimized across the entire stack

Always up-to-date via the cloud

Mixed framework environments: bare metal and containerized

Direct access to NVIDIA experts

DGX STACK: Complete Analytics and Deep Learning platform

#

With the DGX-1 platform, you now have all of the tools needed to realize the benefits of GPU-accelerated deep learning. NVIDIA already provides a number of libraries for accelerating computational performance on GPUs; some of these, like NCCL, have been optimized specifically for the 8-GPU architecture of DGX-1. NVIDIA Docker containers provide packaged applications, such as DL frameworks, that are multi-GPU aware, and you can schedule and run these from the NVIDIA cloud service. All these tools also make it easy for you to build, package, and run your own containerized applications on DGX-1. In other words, with NVIDIA libraries, containers, and DGX-1, you've got everything you need for developing and running multi-GPU accelerated applications.

Container-Based Applications: easily run accelerated computing and deep learning frameworks through GPU-aware containers. Build your own containers and host them in a private repository through the cloud service.

NVIDIA Cloud Management: manage your node or cluster and run containers from the NVIDIA cloud service. Connecting to the cloud is easy; just plug in power and internet and you're ready to go.

NVIDIA Expertise At every step

Solution Architects: 1:1 support, network training setup, network optimization

Deep Learning Institute: certified expert instructors, worldwide workshops, online courses, onsite training

GTC Conferences: epicenter of industry leaders, global reach

Global Network of Partners: NVIDIA Partner Network, OEMs, startups


#


Nvidia deep learning partners

Graph and Data Analytics | Enterprises | Data Management | DL Frameworks | Enterprise DL | Services | Core Analytics Tech

#


Nvidia deep learning everywhere, every platform
TITAN X: available via etail in 200+ countries
DGX-1: the AI appliance for instant productivity

TESLA: servers in every shape and size

CLOUD: everywhere

#

Deep learning is a fundamentally new software model that needs a new computing platform. GPU computing is an ideal approach, and the GPU is the ideal processor. A combination of factors is essential to create a new computing platform: performance, programming productivity, and open accessibility.

Performance. NVIDIA GPUs are naturally great at parallel workloads and speed up DNNs by 10-20x, reducing each of the many training iterations from weeks to days.

Programmability. AI innovation moves at a breakneck pace. Ease of programming and developer productivity are paramount. The programmability and richness of NVIDIA's CUDA platform allow researchers to innovate quickly, building new configurations of CNNs, DNNs, deep inception networks, RNNs, LSTMs, and reinforcement learning networks.

Accessibility. Developers want to create anywhere and deploy everywhere. NVIDIA GPUs are available all over the world, from every PC OEM; in desktops, notebooks, servers, or supercomputers; and in the cloud from Alibaba, Amazon, Baidu, Google, IBM and Microsoft.


Visit the Deep Learning webpage: http://www.nvidia.com/object/deep-learning.html

#

Visit the Deep Learning page today.

#
