Ahmed Zakaria, Software Engineer, Windows Graphics & AI Platform

Using AI for Interactive Applications



  • Why do we need inferencing on the edge/client?

    Low latency

    Flexibility

    Reduced operational costs

  • Is it possible?

    ML can be used in so many ways in games:

    • Character Animation
    • NPC Interaction
    • Scene Generation
    • Bots (testing, coop, competitive, etc.)
    • Art & Content pipelines
    • Runtime visual FX, animation
    • Telemetry/BI analysis
    • And so much more...

    *Photos on right do NOT run on Windows ML; illustrative examples of ML only; with the exception of Project Malmo, none are Microsoft projects.

    Project Malmö

  • Inferencing Examples: Smooth animation

    http://theorangeduck.com/media/uploads/other_stuff/phasefunction.pdf

  • Inferencing Examples: de-noising

[Figure: noisy input vs. recurrent auto-encoder output vs. reference]

    http://research.nvidia.com/sites/default/files/publications/dnn_denoise_author.pdf

  • Inferencing Examples: Super resolution

    https://developer.nvidia.com/deep-learning-materials-texture

  • Introducing Windows Machine Learning

    [Diagram: Training produces a Model; the Application adds and references the Model; Windows ML runs Inference]

    1. Load – Loads the model into the Windows ML runtime
    2. Bind – Wires up inputs and outputs to the model
    3. Eval – Evaluates the model and produces results

  • Open Neural Network Exchange

  • Demo Time!

  • Realtime Style Transfer

  • Unity ML-Agents 0.2 on WinML

  • Deep Dive!

  • Windows ML Architecture

    • WinML API
      • Available to Win32 & UWP applications
      • Available on all Windows editions in 2018

    • Inference Engine
      • Model & device resource management
      • Loads and compiles operator kernels
      • Executes the dataflow graph

    • Device Layer
      • CPU instruction optimizations up to AVX-512
      • DirectML generates DX12 compute shaders

    [Diagram: Application #1 and Application #2 → WinML Win32 API / WinML API (input surface, output surface) → WinML Runtime (Model, Inference Engine) → DirectML → Direct3D → CPU / GPU / NPU]

  • Building Blocks: ML Model

    • Graph of operators & Attributes

    • Input & Output Tensors

    • Weights


  • Building Blocks: Tensors

    Multi-dimensional arrays of scalars

    [Figure: an RGB image as a tensor, with Height (H) and R, G, B channel planes]

  • Operator Example: Fully Connected

    Each element in the output depends on all elements in the input:

    y_i = f( Σ_j w_ij · x_j )

    [Diagram: inputs x_0..x_2 fully connected by weights w to outputs y_0..y_2]

    Can be done as a matrix multiply!

    GPUs can do matrix multiplications very efficiently

  • Building Blocks: DX12 Meta Command

    • New hardware abstraction at a higher level
    • Flexible IHV implementation
      • Ex. NVIDIA uses tensor cores for the FP16 metacommand implementation
    • GPU algorithm function declaration
      • What are the inputs and outputs to this algorithm
      • Does it need scratch memory
      • Persistent space for look-up tables
      • Precision is well defined
    • Similar to a C-style call
    • Low cost for adding new metacommands vs. DDI/API surface
    • Enumeration & reflection APIs

  • Volta SM Tensor Cores

    Dedicated hardware for machine learning acceleration

    8 tensor cores per SM

    512 fused multiply-adds (FMA) / clock

    Mixed-precision operation

    110 TFLOPS peak on Titan V

  • Fully Connected as Meta Command

    Tensors: In, Filter, Bias, Out

    Precision: FP32, FP16

    Activation Fn: ELU, HARDMAX, HARD_SIGMOID, IDENTITY, LEAKY_RELU, LINEAR, LOG_SOFTMAX, PARAMETERIZED_RELU, PARAMETERIZED_SOFTPLUS

    Resources: Temporary Resource, Persistent Resource (transforms, lookups)

    Steps:

    • Create on Device
    • Initialize once on Command List
    • Execute many on Command List

  • DirectML Style Transfer GPU Performance

    NVIDIA Titan V @ 1080p, absolute perf (frames per second):

    HLSL FP32:                             3.4 fps
    FP32 MetaCommands:                     8.6 fps  (2.5x)
    FP16 MetaCommands with tensor cores:    27 fps  (3.1x)

  • Code!

  • WinML API: LoadModel

    ComPtr<IWinMLRuntime> spRuntime;
    WinMLCreateRuntime(&spRuntime);
    ComPtr<IWinMLModel> spModel;
    spRuntime->LoadModel(GetModelPath(), &spModel);

    View graph metadata:

    spModel->GetDescription(&pDescription);
    spModel->EnumerateMetadata(count, &key, &value);
    spModel->EnumerateModelInputs(count, &pDescriptor);
    spModel->EnumerateModelOutputs(count, &pDescriptor);

  • WinML API: Bind Float[]

    • Setup the D3D device to use for inferencing

    ComPtr<IWinMLEvaluationContext> spContext;
    ComPtr<ID3D12Device> spDevice; // Any D3D device which is compute capable!
    spRuntime->CreateEvaluationContext(spDevice.Get(), &spContext);

    • Connect input and output data

    WINML_BINDING_DESC bindDescriptor;
    bindDescriptor.BindType = WINML_BINDING_TENSOR;
    bindDescriptor.DataType = WINML_TENSOR_FLOAT;
    bindDescriptor.NumDimensions = 4;
    INT64 shape[4] = { 1, 3, 1080, 1920 };
    bindDescriptor.pShape = reinterpret_cast<INT64*>(&shape);
    bindDescriptor.Tensor.pData = &floatarray;
    spContext->BindValue(&bindDescriptor);

  • WinML API: Bind D3D12Resource

    • Setup the D3D device to use for inferencing

    ComPtr<IWinMLEvaluationContext> spContext;
    ComPtr<ID3D12Device> spDevice; // Any D3D device which is compute capable!
    spRuntime->CreateEvaluationContext(spDevice.Get(), &spContext);

    • Connect input and output data

    WINML_BINDING_DESC bindDescriptor;
    bindDescriptor.BindType = WINML_BINDING_RESOURCE;
    bindDescriptor.DataType = WINML_TENSOR_FLOAT;
    bindDescriptor.NumDimensions = 4;
    INT64 shape[4] = { 1, 3, 1080, 1920 };
    bindDescriptor.pShape = reinterpret_cast<INT64*>(&shape);
    bindDescriptor.Resource.pData = sp12Resource.Get();
    spContext->BindValue(&bindDescriptor);

  • WinML API: Eval

    • Evaluate model (inference)

    spRuntime->EvaluateModel(spContext.Get());

    • Commands are queued on the compute queue

  • WinML in PIX

  • Summary

    • Windows ML is a big step towards democratizing AI inference
    • DirectML with DX12 metacommands delivers the full performance of the hardware
    • Integrates with Direct3D 12 for low latency
    • Runs on all DX12 GPUs
    • Start experimenting with AI for games today!

  • Tools

    • Converters
      • winmltools – https://pypi.org/project/winmltools/
    • Project integration with Visual Studio
      • MLGen – in the Windows SDK
    • Visualization
      • Netron – https://github.com/lutzroeder/Netron
    • Debugging – PIX
      • ML events in the compute queue
      • Operator and model timing

  • Resources

    • ONNX Models
      • https://github.com/onnx/models
    • Windows Machine Learning Samples
      • https://github.com/Microsoft/Windows-Machine-Learning
    • Check out our forums: DirectXTech.com
      • Links to documentation
      • Resources for learning about ML
      • Compilation of recently published papers
      • ONNX converters
      • Tools and tips
      • Post your questions or share your thoughts here
      • Stay up to date with the latest happenings

  • Questions?

  • © 2018 Microsoft Corporation.

    All rights reserved. Microsoft, Xbox, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

    MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.