
Page 1

Accelerate your Inferencing with Intel® Deep Learning Boost

Shailen Sobhee

AI Software Technical Consultant

[email protected]

Page 2

Audience prerequisites

▪ Familiar with the deep learning stages
− training and inferencing
▪ Basic knowledge of hardware
− know what vector registers are, e.g. those used by AVX-512

Page 3

Outline

▪ What is Intel® Deep Learning Boost (Intel® DL Boost)?
▪ Why is Intel® DL Boost useful?
▪ What are Vector Neural Network Instructions (VNNI)?
▪ Sample results

Page 4

What is Intel® Deep Learning Boost?

Intel® DL Boost:

▪ extends the AVX-512 instruction set
▪ is designed to deliver significant, more efficient deep learning (inference) acceleration
▪ targets deep learning workloads optimized to use the Vector Neural Network Instructions (VNNI)
▪ is available on Intel® Xeon® Scalable processors, starting with the 2nd generation (codename "Cascade Lake")

Page 5

Deep Learning Foundations

• Heavy compute (matrix multiplications) is the foundation of many DL applications
• Multiply row × column values, accumulate into a single value
• Traditional HPC and many AI training workloads use floating point
• Massive dynamic range of values (FP32 goes up to ~2^128)

[Figure: matrix multiply A × B = C, with A [int8], B [int8], C [int32]]
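To make the row × column pattern concrete, here is a minimal sketch (my addition, not from the deck) of the multiply-accumulate at the heart of an int8 matrix multiply: 8-bit values are multiplied and summed into a 32-bit accumulator so the products cannot overflow.

```cpp
#include <cstdint>
#include <cstdio>

// One element of C = A x B: the dot product of a row of A with a
// column of B. int8 products are widened and accumulated in int32,
// mirroring the A [int8] x B [int8] = C [int32] diagram above.
int32_t dot_int8(const int8_t* row, const int8_t* col, int k) {
    int32_t acc = 0;
    for (int i = 0; i < k; ++i)
        acc += static_cast<int32_t>(row[i]) * static_cast<int32_t>(col[i]);
    return acc;
}

int main() {
    int8_t a[4] = {1, -2, 3, 4};
    int8_t b[4] = {5, 6, -7, 8};
    printf("%d\n", dot_int8(a, b, 4)); // 1*5 - 2*6 - 3*7 + 4*8 = 4
}
```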

Page 6

Convolution operation for inference

[Figure: convolution for inference. Inputs * Weights = Outputs]
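As an illustration (mine, not the deck's), a direct 2-D convolution for inference is the same multiply-accumulate pattern slid across the input; with int8 inputs and weights, the accumulator is again int32:

```cpp
#include <cstdint>
#include <vector>

// Minimal valid-padding 2-D convolution: int8 inputs and weights,
// int32 accumulation. H, W are input dims; R, S are filter dims.
std::vector<int32_t> conv2d(const std::vector<int8_t>& in, int H, int W,
                            const std::vector<int8_t>& wt, int R, int S) {
    const int OH = H - R + 1, OW = W - S + 1;
    std::vector<int32_t> out(OH * OW, 0);
    for (int oh = 0; oh < OH; ++oh)
        for (int ow = 0; ow < OW; ++ow) {
            int32_t acc = 0;
            for (int r = 0; r < R; ++r)
                for (int s = 0; s < S; ++s)
                    acc += int32_t(in[(oh + r) * W + (ow + s)]) *
                           int32_t(wt[r * S + s]);
            out[oh * OW + ow] = acc; // one output value per dot product
        }
    return out;
}
```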

Page 7

Why do we need Intel® Deep Learning Boost?

Page 8

The key term:

Quantization

Page 9

Here’s why Quantization matters

[Figure: a 32-bit floating-point value (e.g. 96.1924) occupies four bytes; the quantized 8-bit integer value 96 occupies a single byte]
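A minimal sketch of the conversion (my addition; assumes a simple symmetric scheme with a calibration-derived scale): divide by the scale, round, and clamp to the int8 range.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>

// Symmetric linear quantization: map [-max_abs, +max_abs] to [-127, 127].
int8_t quantize(float x, float scale) {
    int v = static_cast<int>(std::lround(x / scale));
    return static_cast<int8_t>(std::clamp(v, -127, 127));
}

float dequantize(int8_t q, float scale) { return q * scale; }

int main() {
    float max_abs = 127.0f;          // assumed calibration range
    float scale = max_abs / 127.0f;  // = 1.0 here
    int8_t q = quantize(96.1924f, scale);
    printf("96.1924 -> %d -> %.1f\n", q, dequantize(q, scale)); // 96 -> 96.0
}
```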

Page 10

Here’s why Quantization matters

Lower Power

Lower memory bandwidth

Lower storage

Higher performance

Image credits: see backup

Important: the accuracy loss must stay acceptable

Page 11

VNNI instruction set

Page 12

Page 13

Intel® Deep Learning Boost: Optimizing AI inference

FP32 (1st gen Intel® Xeon® Scalable processor, without Intel® DL Boost):

▪ vpmadd231ps: INPUT FP32 × INPUT FP32 → OUTPUT FP32

INT8 (1st gen Intel® Xeon® Scalable processor, without Intel® DL Boost), three instructions:

▪ vpmaddubsw: INPUT INT8 × INPUT INT8 → OUTPUT INT16
▪ vpmaddwd: OUTPUT INT16 × CONSTANT INT16 → OUTPUT INT32
▪ vpaddd: OUTPUT INT32 + Accumulate INT32 → OUTPUT INT32

INT8 with VNNI (2nd gen Intel® Xeon® Scalable processor, with Intel® DL Boost), one NEW instruction:

▪ vpdpbusd: INPUT INT8 × INPUT INT8 + Accumulate INT32 → OUTPUT INT32

Microarchitecture view: in a given clock cycle, the FMA units on Port 0 and Port 5 execute either FP32 FMA operations or, with VNNI, the fused INT8 operations.
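For illustration (this sketch is mine, not from the slides), here is how the two INT8 paths look with AVX-512 intrinsics: the legacy sequence needs a vector of 16-bit ones for vpmaddwd, while _mm512_dpbusd_epi32 (vpdpbusd) fuses multiply, pairwise add, and accumulate into one instruction. Assumes an AVX-512 compiler target (e.g. -mavx512bw, plus -mavx512vnni for the VNNI path).

```cpp
#include <immintrin.h>

// Legacy path (1st gen Xeon Scalable): three instructions to multiply
// unsigned-by-signed int8 and accumulate into int32.
__m512i dot_accum_legacy(__m512i acc, __m512i a_u8, __m512i b_s8) {
    const __m512i ones = _mm512_set1_epi16(1);
    __m512i t16 = _mm512_maddubs_epi16(a_u8, b_s8); // vpmaddubsw: u8*s8 -> s16 (saturating)
    __m512i t32 = _mm512_madd_epi16(t16, ones);     // vpmaddwd: s16 pairs -> s32
    return _mm512_add_epi32(acc, t32);              // vpaddd: accumulate
}

// VNNI path (2nd gen Xeon Scalable with Intel DL Boost): one instruction.
__m512i dot_accum_vnni(__m512i acc, __m512i a_u8, __m512i b_s8) {
    return _mm512_dpbusd_epi32(acc, a_u8, b_s8);    // vpdpbusd
}
```

Besides removing two instructions, the fused form also sidesteps the intermediate INT16 saturation of vpmaddubsw, since vpdpbusd accumulates directly into INT32.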

Page 14

Here’s one tool in your arsenal to do it ☺

Page 15

Intel® Distribution of OpenVINO™ in a nutshell

Steps: (1) Trained DL model → (2) Model Optimizer → Optimized IR (IR .xml + .bin) → (3) Inference Engine (CNNNetwork, FP32/FP16) → (4) Your Program

Device plugins behind the Inference Engine:

▪ MKLDNN Plugin → CPU
▪ clDNN Plugin → GPU
▪ MyriadX Plugin → Movidius
▪ FPGA Plugin → Arria
▪ GNA Plugin → GNA
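As a rough sketch of steps 3 and 4 (my addition, based on the 2019-era Inference Engine C++ API; names such as Core::ReadNetwork and LoadNetwork may differ in newer OpenVINO releases), loading the IR and running inference on the CPU plugin looks like this:

```cpp
#include <inference_engine.hpp>

int main() {
    using namespace InferenceEngine;

    Core ie;                                              // Inference Engine core
    CNNNetwork network = ie.ReadNetwork("model.xml",      // IR produced by the
                                        "model.bin");     // Model Optimizer
    ExecutableNetwork exec = ie.LoadNetwork(network, "CPU"); // MKLDNN plugin
    InferRequest request = exec.CreateInferRequest();

    // Fill the input blob here (names come from network.getInputsInfo()),
    // then run synchronous inference.
    request.Infer();
    return 0;
}
```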

Page 16

Intel® Distribution of OpenVINO™ in a nutshell

Trained DL model → Model Optimizer → Optimized IR (IR .xml + .bin) → Inference Engine (CNNNetwork, FP32/FP16) → Your Program

Page 17

Intel® Distribution of OpenVINO™ in a nutshell

Trained DL model → Model Optimizer → Optimized IR (IR .xml + .bin) → Inference Engine (CNNNetwork, FP32/FP16) → Your Program

calibration_tool: takes the FP32 IR (.xml + .bin) together with validation data, runs a normalization process, and produces an INT8-ready IR (INT8 IR + statistics data). Note: FP16 doesn't work; the tool expects an FP32 IR.

Validation data examples: ImageNet for classification, Pascal VOC for object detection.

Page 18

Intel® Distribution of OpenVINO™ in a nutshell

Offline stage:

▪ Trained DL model → Model Optimizer → Optimized IR (IR .xml + .bin)
▪ calibration_tool: FP32 IR (.xml + .bin) + validation data → INT8-ready IR (INT8 IR + statistics data); FP16 doesn't work
▪ Validation data examples: ImageNet for classification, Pascal VOC for object detection

Online stage:

▪ Inference Engine (CNNNetwork) → Your Program

https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Int8Inference.html

Page 19

Sample results

▪ Demo 1
▪ Demo 2

Both executions on an Intel® Cascade Lake CPU.

Page 20

QUICK DEMO

Page 21

Sample results

Page 22

Key takeaway

Try the Intel® Distribution of OpenVINO™:

https://software.intel.com/en-us/openvino-toolkit

Benefit from faster inference speeds with INT8 by leveraging VNNI instructions on Intel® Cascade Lake CPUs.

Page 23

Summary

▪ What is Intel® Deep Learning Boost (Intel® DL Boost)?
▪ What are Vector Neural Network Instructions (VNNI)?
▪ Why is Intel® DL Boost useful?
▪ Intel® Distribution of OpenVINO™

Page 24

Thank you!