
Page 1

Accelerate your Inferencing with Intel® Deep Learning Boost

Shailen Sobhee

AI Software Technical Consultant

[email protected]

Page 2

Audience prerequisites

▪ Familiar with the deep learning stages
− training and inferencing
▪ Basic knowledge of hardware
− know what vector registers are, e.g. those used by AVX-512

Page 3

Outline

▪ What is Intel® Deep Learning Boost (Intel® DL Boost)?
▪ Why is Intel® DL Boost useful?
▪ What are Vector Neural Network Instructions (VNNI)?
▪ Sample results

Page 4

What is Intel® Deep Learning Boost?

Intel® DL Boost:

▪ extends the AVX-512 instruction set
▪ is designed to deliver significant, more efficient deep learning (inference) acceleration
▪ targets deep learning workloads optimized to use the Vector Neural Network Instructions (VNNI)
▪ is available on Intel® Xeon® Scalable processors, starting with the 2nd generation (codename "Cascade Lake")

Page 5

Deep Learning Foundations

• Heavy compute (matrix multiplications) is the foundation of many DL applications
• Multiply row × column values, accumulate into a single value
• Traditional HPC and many AI training workloads use floating point
• Massive dynamic range of values (FP32 goes up to ~2^128)

[Figure: matrix multiply A × B = C, with A [int8], B [int8], C [int32]]
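To make the row × column pattern concrete, here is a minimal sketch (my addition, not from the deck) of the multiply-accumulate at the heart of an int8 matrix multiply: 8-bit values are multiplied and summed into a 32-bit accumulator so the products cannot overflow.

```cpp
#include <cstdint>
#include <cstdio>

// One element of C = A x B: the dot product of a row of A with a
// column of B. int8 products are widened and accumulated in int32,
// mirroring the A [int8] x B [int8] = C [int32] diagram above.
int32_t dot_int8(const int8_t* row, const int8_t* col, int k) {
    int32_t acc = 0;
    for (int i = 0; i < k; ++i)
        acc += static_cast<int32_t>(row[i]) * static_cast<int32_t>(col[i]);
    return acc;
}

int main() {
    int8_t a[4] = {1, -2, 3, 4};
    int8_t b[4] = {5, 6, -7, 8};
    printf("%d\n", dot_int8(a, b, 4)); // 1*5 - 2*6 - 3*7 + 4*8 = 4
}
```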

Page 6

Convolution operation for inference

[Figure: convolution for inference. Inputs * Weights = Outputs]
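As an illustration (mine, not the deck's), a direct 2-D convolution for inference is the same multiply-accumulate pattern slid across the input; with int8 inputs and weights, the accumulator is again int32:

```cpp
#include <cstdint>
#include <vector>

// Minimal valid-padding 2-D convolution: int8 inputs and weights,
// int32 accumulation. H, W are input dims; R, S are filter dims.
std::vector<int32_t> conv2d(const std::vector<int8_t>& in, int H, int W,
                            const std::vector<int8_t>& wt, int R, int S) {
    const int OH = H - R + 1, OW = W - S + 1;
    std::vector<int32_t> out(OH * OW, 0);
    for (int oh = 0; oh < OH; ++oh)
        for (int ow = 0; ow < OW; ++ow) {
            int32_t acc = 0;
            for (int r = 0; r < R; ++r)
                for (int s = 0; s < S; ++s)
                    acc += int32_t(in[(oh + r) * W + (ow + s)]) *
                           int32_t(wt[r * S + s]);
            out[oh * OW + ow] = acc; // one output value per dot product
        }
    return out;
}
```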

Page 7

Why do we need Intel® Deep Learning Boost?

Page 8

The key term:

Quantization

Page 9

Here’s why Quantization matters

[Figure: a 32-bit floating-point value (e.g. 96.1924) occupies four bytes; the quantized 8-bit integer value 96 occupies a single byte]
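A minimal sketch of the conversion (my addition; assumes a simple symmetric scheme with a calibration-derived scale): divide by the scale, round, and clamp to the int8 range.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Symmetric linear quantization: map [-max_abs, +max_abs] to [-127, 127].
int8_t quantize(float x, float scale) {
    int v = static_cast<int>(std::lround(x / scale));
    return static_cast<int8_t>(std::clamp(v, -127, 127));
}

float dequantize(int8_t q, float scale) { return q * scale; }

int main() {
    float max_abs = 127.0f;          // assumed calibration range
    float scale = max_abs / 127.0f;  // = 1.0 here
    int8_t q = quantize(96.1924f, scale);
    printf("96.1924 -> %d -> %.1f\n", q, dequantize(q, scale)); // 96 -> 96.0
}
```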

Page 10

Here’s why Quantization matters

Lower Power

Lower memory bandwidth

Lower storage

Higher performance

Image credits: see backup

Important: the accuracy loss must stay acceptable

Page 11

VNNI instruction set

Page 12

Page 13

Intel® Deep Learning Boost: Optimizing AI inference

FP32 (1st gen Intel® Xeon® Scalable processor, without Intel® DL Boost):

▪ vpmadd231ps: INPUT FP32 × INPUT FP32 → OUTPUT FP32

INT8 (1st gen Intel® Xeon® Scalable processor, without Intel® DL Boost), three instructions:

▪ vpmaddubsw: INPUT INT8 × INPUT INT8 → OUTPUT INT16
▪ vpmaddwd: OUTPUT INT16 × CONSTANT INT16 → OUTPUT INT32
▪ vpaddd: OUTPUT INT32 + Accumulate INT32 → OUTPUT INT32

INT8 with VNNI (2nd gen Intel® Xeon® Scalable processor, with Intel® DL Boost), one NEW instruction:

▪ vpdpbusd: INPUT INT8 × INPUT INT8 + Accumulate INT32 → OUTPUT INT32

Microarchitecture view: in a given clock cycle, the FMA units on Port 0 and Port 5 execute either FP32 FMA operations or, with VNNI, the fused INT8 operations.
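For illustration (this sketch is mine, not from the slides), here is how the two INT8 paths look with AVX-512 intrinsics: the legacy sequence needs a vector of 16-bit ones for vpmaddwd, while _mm512_dpbusd_epi32 (vpdpbusd) fuses multiply, pairwise add, and accumulate into one instruction. Assumes an AVX-512 compiler target (e.g. -mavx512bw, plus -mavx512vnni for the VNNI path).

```cpp
#include <immintrin.h>

// Legacy path (1st gen Xeon Scalable): three instructions to multiply
// unsigned-by-signed int8 and accumulate into int32.
__m512i dot_accum_legacy(__m512i acc, __m512i a_u8, __m512i b_s8) {
    const __m512i ones = _mm512_set1_epi16(1);
    __m512i t16 = _mm512_maddubs_epi16(a_u8, b_s8); // vpmaddubsw: u8*s8 -> s16 (saturating)
    __m512i t32 = _mm512_madd_epi16(t16, ones);     // vpmaddwd: s16 pairs -> s32
    return _mm512_add_epi32(acc, t32);              // vpaddd: accumulate
}

// VNNI path (2nd gen Xeon Scalable with Intel DL Boost): one instruction.
__m512i dot_accum_vnni(__m512i acc, __m512i a_u8, __m512i b_s8) {
    return _mm512_dpbusd_epi32(acc, a_u8, b_s8);    // vpdpbusd
}
```

Besides removing two instructions, the fused form also sidesteps the intermediate INT16 saturation of vpmaddubsw, since vpdpbusd accumulates directly into INT32.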

Page 14

Here’s one tool in your arsenal to do it ☺

Page 15

Intel® Distribution of OpenVINO™ in a nutshell

Steps: (1) Trained DL model → (2) Model Optimizer → Optimized IR (IR .xml + .bin) → (3) Inference Engine (CNNNetwork, FP32/FP16) → (4) Your Program

Device plugins behind the Inference Engine:

▪ MKLDNN Plugin → CPU
▪ clDNN Plugin → GPU
▪ MyriadX Plugin → Movidius
▪ FPGA Plugin → Arria
▪ GNA Plugin → GNA
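As a rough sketch of steps 3 and 4 (my addition, based on the 2019-era Inference Engine C++ API; names such as Core::ReadNetwork and LoadNetwork may differ in newer OpenVINO releases), loading the IR and running inference on the CPU plugin looks like this:

```cpp
#include <inference_engine.hpp>

int main() {
    using namespace InferenceEngine;

    Core ie;                                              // Inference Engine core
    CNNNetwork network = ie.ReadNetwork("model.xml",      // IR produced by the
                                        "model.bin");     // Model Optimizer
    ExecutableNetwork exec = ie.LoadNetwork(network, "CPU"); // MKLDNN plugin
    InferRequest request = exec.CreateInferRequest();

    // Fill the input blob here (names come from network.getInputsInfo()),
    // then run synchronous inference.
    request.Infer();
    return 0;
}
```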

Page 16

Intel® Distribution of OpenVINO™ in a nutshell

Trained DL model → Model Optimizer → Optimized IR (IR .xml + .bin) → Inference Engine (CNNNetwork, FP32/FP16) → Your Program

Page 17

Intel® Distribution of OpenVINO™ in a nutshell

Trained DL model → Model Optimizer → Optimized IR (IR .xml + .bin) → Inference Engine (CNNNetwork, FP32/FP16) → Your Program

calibration_tool: takes the FP32 IR (.xml + .bin) together with validation data, runs a normalization process, and produces an INT8-ready IR (INT8 IR + statistics data). Note: FP16 doesn't work; the tool expects an FP32 IR.

Validation data examples: ImageNet for classification, Pascal VOC for object detection.

Page 18

Intel® Distribution of OpenVINO™ in a nutshell

Offline stage:

▪ Trained DL model → Model Optimizer → Optimized IR (IR .xml + .bin)
▪ calibration_tool: FP32 IR (.xml + .bin) + validation data → INT8-ready IR (INT8 IR + statistics data); FP16 doesn't work
▪ Validation data examples: ImageNet for classification, Pascal VOC for object detection

Online stage:

▪ Inference Engine (CNNNetwork) → Your Program

https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Int8Inference.html

Page 19

Sample results

▪ Demo 1
▪ Demo 2

Both executions on an Intel® Cascade Lake CPU.

Page 20

QUICK DEMO

Page 21

Sample results

Page 22

Key takeaway

Try the Intel® Distribution of OpenVINO™:

https://software.intel.com/en-us/openvino-toolkit

Benefit from faster inference speeds with INT8 by leveraging VNNI instructions on Intel® Cascade Lake CPUs.

Page 23

Summary

▪ What is Intel® Deep Learning Boost (Intel® DL Boost)?
▪ What are Vector Neural Network Instructions (VNNI)?
▪ Why is Intel® DL Boost useful?
▪ Intel® Distribution of OpenVINO™

Page 24

Thank you!