Ahmed Zakaria
Software Engineer
Windows Graphics & AI Platform
Using AI for Interactive Applications
Why do we need inferencing on the edge/client?
• Low latency
• Flexibility
• Reduced operational costs
ML can be used in so many ways in games:
• Character animation
• NPC interaction
• Scene generation
• Bots (testing, co-op, competitive, etc.)
• Art & content pipelines
• Runtime visual FX, animation
• Telemetry/BI analysis
• And so much more...
* Photos on right do NOT run on Windows ML; they are illustrative examples of ML only. With the exception of Project Malmo, none are Microsoft projects.
Is it possible?
Project Malmö
Inferencing Examples: Smooth animation
http://theorangeduck.com/media/uploads/other_stuff/phasefunction.pdf
Inferencing Examples: de-noising
[Figure panels: Noisy Input | Recurrent Auto-Encoder | Reference]
http://research.nvidia.com/sites/default/files/publications/dnn_denoise_author.pdf
Inferencing Examples: Super resolution
https://developer.nvidia.com/deep-learning-materials-texture
Introducing Windows Machine Learning
[Diagram: Application → add model reference → Model → Windows ML]
1. Load – Loads the model into the Windows ML runtime
2. Bind – Wires up inputs and outputs to the model
3. Eval – Evaluates the model and produces results
Training → Inference
Open Neural Network Exchange (ONNX)
Demo Time!
Realtime Style Transfer
Unity ML-Agents 0.2 on WinML
Deep Dive!
Windows ML Architecture
• WinML API
  • Available to Win32 & UWP applications
  • Available on all Windows editions in 2018
• Inference Engine
  • Model & device resource management
  • Loads and compiles operator kernels
  • Executes the dataflow graph
• Device Layer
  • CPU instruction optimizations up to AVX-512
  • DirectML generates DX12 compute shaders
[Architecture diagram: Application #1 / Application #2 with input and output surfaces → WinML API → WinML Runtime (model, inference engine) → DirectML & CPU paths → Direct3D → GPU / CPU / NPU]
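The inference engine's "execute dataflow graph" step can be sketched in miniature. Everything below (`Node`, `run_graph`, the toy scalar operators) is invented for illustration and is not the WinML API; a real engine schedules tensor kernels, not scalar lambdas.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A toy dataflow node: a named operator reading the outputs of upstream nodes.
struct Node {
    std::string name;
    std::vector<std::string> inputs;                     // names of upstream nodes
    std::function<float(const std::vector<float>&)> op;  // operator kernel
};

// Evaluate nodes in the order given (assumed already topologically sorted),
// wiring each node's inputs to previously produced values.
std::map<std::string, float> run_graph(const std::vector<Node>& nodes,
                                       std::map<std::string, float> values) {
    for (const auto& n : nodes) {
        std::vector<float> args;
        for (const auto& in : n.inputs) args.push_back(values.at(in));
        values[n.name] = n.op(args);
    }
    return values;
}

// Demo graph: y = relu(2*x - 3).
float demo(float x) {
    std::vector<Node> g = {
        {"scale", {"x"},     [](const std::vector<float>& a) { return 2.0f * a[0]; }},
        {"shift", {"scale"}, [](const std::vector<float>& a) { return a[0] - 3.0f; }},
        {"relu",  {"shift"}, [](const std::vector<float>& a) { return a[0] > 0 ? a[0] : 0.0f; }},
    };
    return run_graph(g, {{"x", x}}).at("relu");
}
```

For `x = 4` the graph produces `relu(2*4 - 3) = 5`; the runtime's job is to do exactly this traversal, but dispatching hardware kernels per operator.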
Building Blocks: ML Model
• Graph of operators & Attributes
• Input & Output Tensors
• Weights
Building Blocks: Tensors
Multi-dimensional arrays of scalars
[Diagram: an RGB image as a tensor — Height (H) axis, with R, G, B channels]
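Concretely, a tensor is just flat storage plus a shape. WinML's float tensor bindings later in the deck use NCHW order, e.g. `{ 1, 3, 1080, 1920 }` for one RGB 1080p image. The `TensorNCHW` type below is a sketch invented here, not a WinML type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A tensor as flat row-major storage plus an NCHW shape:
// N batches x C channels x H rows x W columns.
struct TensorNCHW {
    size_t N, C, H, W;
    std::vector<float> data;  // flat storage, size N*C*H*W
    TensorNCHW(size_t n, size_t c, size_t h, size_t w)
        : N(n), C(c), H(h), W(w), data(n * c * h * w, 0.0f) {}
    // Flat offset of element (n, c, h, w) in NCHW layout.
    size_t index(size_t n, size_t c, size_t h, size_t w) const {
        return ((n * C + c) * H + h) * W + w;
    }
    float& at(size_t n, size_t c, size_t h, size_t w) {
        return data[index(n, c, h, w)];
    }
};
```

With this layout, the green channel of a `{1, 3, H, W}` image starts at offset `H*W` — which is why the binding descriptors later in the deck only need a raw pointer plus a shape.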
Operator Example: Fully Connected
Each element in the output depends on all elements in the input:

yᵢ = f( Σⱼ wᵢⱼ · xⱼ )

Can be done as a matrix multiply!
GPUs can do matrix multiplications very efficiently.
Building Blocks: DX12 Metacommands
• New hardware abstraction at a higher level
• Flexible IHV implementation
  • Ex. NVIDIA uses tensor cores for the FP16 metacommand implementation
• GPU algorithm function declaration
  • What are the inputs and outputs to this algorithm
  • Does it need scratch memory
  • Persistent space for lookup tables
  • Precision is well defined
• Similar to a C-style call
• Low cost for adding new metacommands vs. DDI/API surface
• Enumeration & reflection APIs
Volta SM Tensor Cores
Dedicated hardware for machine learning acceleration
• 8 tensor cores per SM
• 512 fused multiply-add (FMA) / clock
• Mixed-precision operation
• 110 TFLOPs peak on Titan V
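The per-SM figure follows from 8 tensor cores × 64 FMA per core per clock = 512 FMA/SM/clock. Scaling across Titan V's 80 SMs (an assumed figure, not stated on the slide) lands near the quoted peak, counting each FMA as 2 floating-point ops; the ~1.34 GHz clock below is an assumption chosen to match, since actual boost clocks vary:

```cpp
#include <cassert>

// Back-of-the-envelope peak throughput: SMs * FMA-per-SM-per-clock * 2 ops
// per FMA * clock (GHz), expressed in TFLOPs.
double peak_tflops(int sms, int fma_per_sm_clock, double clock_ghz) {
    return sms * fma_per_sm_clock * 2.0 * clock_ghz / 1000.0;
}
```

80 × 512 × 2 × 1.34e9 ≈ 1.1e14 ops/s, i.e. roughly the 110 TFLOPs on the slide.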
Fully Connected as a Metacommand
• Inputs: In, Filter, Bias → Out
• Precision: FP32, FP16
• Activation function: ELU, HARDMAX, HARD_SIGMOID, IDENTITY, LEAKY_RELU, LINEAR, LOG_SOFTMAX, PARAMETERIZED_RELU, PARAMETERIZED_SOFTPLUS, …
• Resources: tensors, temporary resource, persistent resource (transforms, lookups), …
Steps:
• Create on device
• Initialize once on command list
• Execute many on command list
DirectML Style Transfer GPU Performance
NVIDIA Titan V @ 1080p, absolute perf

Configuration                          | Frames per second
HLSL FP32                              | 3.4
FP32 metacommands                      | 8.6  (2.5x)
FP16 metacommands with tensor cores    | 27   (3.1x over FP32 metacommands)
Code!
WinML API: LoadModel
ComPtr<IWinMLRuntime> spRuntime;
WinMLCreateRuntime(&spRuntime);
ComPtr<IWinMLModel> spModel;
spRuntime->LoadModel(GetModelPath(), &spModel);

View graph metadata:
spModel->GetDescription(&pDescription);
spModel->EnumerateMetadata(count, &key, &value);
spModel->EnumerateModelInputs(count, &pDescriptor);
spModel->EnumerateModelOutputs(count, &pDescriptor);
WinML API: Bind float[]
• Set up the D3D device to use for inferencing:
ComPtr<IWinMLEvaluationContext> spContext;
ComPtr<ID3D12Device> spDevice; // Any D3D device which is compute capable!
spRuntime->CreateEvaluationContext(spDevice.Get(), &spContext);
• Connect input and output data:
WINML_BINDING_DESC bindDescriptor;
bindDescriptor.BindType = WINML_BINDING_TENSOR;
bindDescriptor.DataType = WINML_TENSOR_FLOAT;
bindDescriptor.NumDimensions = 4;
INT64 shape[4] = { 1, 3, 1080, 1920 };
bindDescriptor.pShape = shape;
bindDescriptor.Tensor.pData = floatarray;
spContext->BindValue(&bindDescriptor);
WinML API: Bind D3D12Resource
• Set up the D3D device to use for inferencing:
ComPtr<IWinMLEvaluationContext> spContext;
ComPtr<ID3D12Device> spDevice; // Any D3D device which is compute capable!
spRuntime->CreateEvaluationContext(spDevice.Get(), &spContext);
• Connect input and output data:
WINML_BINDING_DESC bindDescriptor;
bindDescriptor.BindType = WINML_BINDING_RESOURCE;
bindDescriptor.DataType = WINML_TENSOR_FLOAT;
bindDescriptor.NumDimensions = 4;
INT64 shape[4] = { 1, 3, 1080, 1920 };
bindDescriptor.pShape = shape;
bindDescriptor.Resource.pData = sp12Resource.Get();
spContext->BindValue(&bindDescriptor);
WinML API: Eval
• Evaluate the model (inference):
spRuntime->EvaluateModel(&spContext);
• Commands are queued on the compute queue
WinML in PIX
Summary
• Windows ML is a big step towards democratizing AI inference
• DirectML with DX12 metacommands delivers the full performance of the hardware
• Integrates with Direct3D 12 for low latency
• Runs on all DX12 GPUs
• Start experimenting with AI for games today!
Tools
• Converters: winmltools – https://pypi.org/project/winmltools/
• Project integration with Visual Studio: MLGen – in the Windows SDK
• Visualization: Netron – https://github.com/lutzroeder/Netron
• Debugging: PIX
  • ML events in the compute queue
  • Operator and model timing
Resources
• ONNX models: https://github.com/onnx/models
• Windows Machine Learning samples: https://github.com/Microsoft/Windows-Machine-Learning
• Check out our forums: DirectXTech.com
  • Links to documentation
  • Resources for learning about ML
  • Compilation of recently published papers
  • ONNX converters
  • Tools and tips
  • Post your questions or share your thoughts here
  • Stay up to date with the latest happenings
Questions?
© 2018 Microsoft Corporation.
All rights reserved. Microsoft, Xbox, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing
market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.