Ahmed Zakaria
Software Engineer
Windows Graphics & AI Platform
Using AI for Interactive Applications
Why do we need inferencing on the edge/client?
• Low latency
• Flexibility
• Reduced operational costs
ML can be used in so many ways in games:
• Character animation
• NPC interaction
• Scene generation
• Bots (testing, co-op, competitive, etc.)
• Art & content pipelines
• Runtime visual FX, animation
• Telemetry/BI analysis
• And so much more...
* Photos on right do NOT run on Windows ML; they are illustrative examples of ML only. With the exception of Project Malmo, none are Microsoft projects.
Is it possible?
Project Malmö
Inferencing Examples: Smooth animation
http://theorangeduck.com/media/uploads/other_stuff/phasefunction.pdf
Inferencing Examples: de-noising
[Figure panels: Noisy Input | Recurrent Auto-Encoder | Reference]
http://research.nvidia.com/sites/default/files/publications/dnn_denoise_author.pdf
Inferencing Examples: Super resolution
https://developer.nvidia.com/deep-learning-materials-texture
Introducing Windows Machine Learning
[Diagram: Application → add model reference → Model → Windows ML]
1. Load – Loads the model into the Windows ML runtime
2. Bind – Wires up inputs and outputs to the model
3. Eval – Evaluates the model and produces results
Training → Inference
Open Neural Network Exchange (ONNX)
Demo Time!
Realtime Style Transfer
Unity ML-Agents 0.2 on WinML
Deep Dive!
Windows ML Architecture
• WinML API
  • Available to Win32 & UWP applications
  • Available on all Windows editions in 2018
• Inference Engine
  • Model & device resource management
  • Loads and compiles operator kernels
  • Executes the dataflow graph
• Device Layer
  • CPU instruction optimizations up to AVX-512
  • DirectML generates DX12 compute shaders
[Architecture diagram: Application #1 / Application #2 with input and output surfaces → WinML API → WinML Runtime (model, inference engine) → DirectML & CPU paths → Direct3D → GPU / CPU / NPU]
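The inference engine's "execute dataflow graph" step can be sketched in miniature. Everything below (`Node`, `run_graph`, the toy scalar operators) is invented for illustration and is not the WinML API; a real engine schedules tensor kernels, not scalar lambdas.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A toy dataflow node: a named operator reading the outputs of upstream nodes.
struct Node {
    std::string name;
    std::vector<std::string> inputs;                     // names of upstream nodes
    std::function<float(const std::vector<float>&)> op;  // operator kernel
};

// Evaluate nodes in the order given (assumed already topologically sorted),
// wiring each node's inputs to previously produced values.
std::map<std::string, float> run_graph(const std::vector<Node>& nodes,
                                       std::map<std::string, float> values) {
    for (const auto& n : nodes) {
        std::vector<float> args;
        for (const auto& in : n.inputs) args.push_back(values.at(in));
        values[n.name] = n.op(args);
    }
    return values;
}

// Demo graph: y = relu(2*x - 3).
float demo(float x) {
    std::vector<Node> g = {
        {"scale", {"x"},     [](const std::vector<float>& a) { return 2.0f * a[0]; }},
        {"shift", {"scale"}, [](const std::vector<float>& a) { return a[0] - 3.0f; }},
        {"relu",  {"shift"}, [](const std::vector<float>& a) { return a[0] > 0 ? a[0] : 0.0f; }},
    };
    return run_graph(g, {{"x", x}}).at("relu");
}
```

For `x = 4` the graph produces `relu(2*4 - 3) = 5`; the runtime's job is to do exactly this traversal, but dispatching hardware kernels per operator.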
Building Blocks: ML Model
• Graph of operators & Attributes
• Input & Output Tensors
• Weights
Building Blocks: Tensors
Multi-dimensional arrays of scalars
[Diagram: an RGB image as a tensor — Height (H) axis, with R, G, B channels]
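Concretely, a tensor is just flat storage plus a shape. WinML's float tensor bindings later in the deck use NCHW order, e.g. `{ 1, 3, 1080, 1920 }` for one RGB 1080p image. The `TensorNCHW` type below is a sketch invented here, not a WinML type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A tensor as flat row-major storage plus an NCHW shape:
// N batches x C channels x H rows x W columns.
struct TensorNCHW {
    size_t N, C, H, W;
    std::vector<float> data;  // flat storage, size N*C*H*W
    TensorNCHW(size_t n, size_t c, size_t h, size_t w)
        : N(n), C(c), H(h), W(w), data(n * c * h * w, 0.0f) {}
    // Flat offset of element (n, c, h, w) in NCHW layout.
    size_t index(size_t n, size_t c, size_t h, size_t w) const {
        return ((n * C + c) * H + h) * W + w;
    }
    float& at(size_t n, size_t c, size_t h, size_t w) {
        return data[index(n, c, h, w)];
    }
};
```

With this layout, the green channel of a `{1, 3, H, W}` image starts at offset `H*W` — which is why the binding descriptors later in the deck only need a raw pointer plus a shape.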
Operator Example: Fully Connected
Each element in the output depends on all elements in the input:

yᵢ = f( Σⱼ wᵢⱼ · xⱼ )

Can be done as a matrix multiply!
GPUs can do matrix multiplications very efficiently.
Building Blocks: DX12 Metacommands
• New hardware abstraction at a higher level
• Flexible IHV implementation
  • Ex. NVIDIA uses tensor cores for the FP16 metacommand implementation
• GPU algorithm function declaration
  • What are the inputs and outputs to this algorithm
  • Does it need scratch memory
  • Persistent space for lookup tables
  • Precision is well defined
• Similar to a C-style call
• Low cost for adding new metacommands vs. DDI/API surface
• Enumeration & reflection APIs
Volta SM Tensor Cores
Dedicated hardware for machine learning acceleration
• 8 tensor cores per SM
• 512 fused multiply-add (FMA) / clock
• Mixed-precision operation
• 110 TFLOPs peak on Titan V
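The per-SM figure follows from 8 tensor cores × 64 FMA per core per clock = 512 FMA/SM/clock. Scaling across Titan V's 80 SMs (an assumed figure, not stated on the slide) lands near the quoted peak, counting each FMA as 2 floating-point ops; the ~1.34 GHz clock below is an assumption chosen to match, since actual boost clocks vary:

```cpp
#include <cassert>

// Back-of-the-envelope peak throughput: SMs * FMA-per-SM-per-clock * 2 ops
// per FMA * clock (GHz), expressed in TFLOPs.
double peak_tflops(int sms, int fma_per_sm_clock, double clock_ghz) {
    return sms * fma_per_sm_clock * 2.0 * clock_ghz / 1000.0;
}
```

80 × 512 × 2 × 1.34e9 ≈ 1.1e14 ops/s, i.e. roughly the 110 TFLOPs on the slide.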
Fully Connected as a Metacommand
• Inputs: In, Filter, Bias → Out
• Precision: FP32, FP16
• Activation function: ELU, HARDMAX, HARD_SIGMOID, IDENTITY, LEAKY_RELU, LINEAR, LOG_SOFTMAX, PARAMETERIZED_RELU, PARAMETERIZED_SOFTPLUS, …
• Resources: tensors, temporary resource, persistent resource (transforms, lookups), …
Steps:
• Create on device
• Initialize once on command list
• Execute many on command list
DirectML Style Transfer GPU Performance
NVIDIA Titan V @ 1080p, absolute perf

Configuration                          | Frames per second
HLSL FP32                              | 3.4
FP32 metacommands                      | 8.6  (2.5x)
FP16 metacommands with tensor cores    | 27   (3.1x over FP32 metacommands)
Code!
WinML API: LoadModel
ComPtr<IWinMLRuntime> spRuntime;
WinMLCreateRuntime(&spRuntime);
ComPtr<IWinMLModel> spModel;
spRuntime->LoadModel(GetModelPath(), &spModel);

View graph metadata:
spModel->GetDescription(&pDescription);
spModel->EnumerateMetadata(count, &key, &value);
spModel->EnumerateModelInputs(count, &pDescriptor);
spModel->EnumerateModelOutputs(count, &pDescriptor);
WinML API: Bind float[]
• Set up the D3D device to use for inferencing:
ComPtr<IWinMLEvaluationContext> spContext;
ComPtr<ID3D12Device> spDevice; // Any D3D device which is compute capable!
spRuntime->CreateEvaluationContext(spDevice.Get(), &spContext);
• Connect input and output data:
WINML_BINDING_DESC bindDescriptor;
bindDescriptor.BindType = WINML_BINDING_TENSOR;
bindDescriptor.DataType = WINML_TENSOR_FLOAT;
bindDescriptor.NumDimensions = 4;
INT64 shape[4] = { 1, 3, 1080, 1920 };
bindDescriptor.pShape = shape;
bindDescriptor.Tensor.pData = floatarray;
spContext->BindValue(&bindDescriptor);
WinML API: Bind D3D12Resource
• Set up the D3D device to use for inferencing:
ComPtr<IWinMLEvaluationContext> spContext;
ComPtr<ID3D12Device> spDevice; // Any D3D device which is compute capable!
spRuntime->CreateEvaluationContext(spDevice.Get(), &spContext);
• Connect input and output data:
WINML_BINDING_DESC bindDescriptor;
bindDescriptor.BindType = WINML_BINDING_RESOURCE;
bindDescriptor.DataType = WINML_TENSOR_FLOAT;
bindDescriptor.NumDimensions = 4;
INT64 shape[4] = { 1, 3, 1080, 1920 };
bindDescriptor.pShape = shape;
bindDescriptor.Resource.pData = sp12Resource.Get();
spContext->BindValue(&bindDescriptor);
WinML API: Eval
• Evaluate the model (inference):
spRuntime->EvaluateModel(&spContext);
• Commands are queued on the compute queue
WinML in PIX
Summary
• Windows ML is a big step towards democratizing AI inference
• DirectML with DX12 metacommands delivers the full performance of the hardware
• Integrates with Direct3D 12 for low latency
• Runs on all DX12 GPUs
• Start experimenting with AI for games today!
Tools
• Converters: winmltools – https://pypi.org/project/winmltools/
• Project integration with Visual Studio: MLGen – in the Windows SDK
• Visualization: Netron – https://github.com/lutzroeder/Netron
• Debugging: PIX
  • ML events in the compute queue
  • Operator and model timing
Resources
• ONNX models: https://github.com/onnx/models
• Windows Machine Learning samples: https://github.com/Microsoft/Windows-Machine-Learning
• Check out our forums: DirectXTech.com
  • Links to documentation
  • Resources for learning about ML
  • Compilation of recently published papers
  • ONNX converters
  • Tools and tips
  • Post your questions or share your thoughts here
  • Stay up to date with the latest happenings
Questions?
© 2018 Microsoft Corporation.
All rights reserved. Microsoft, Xbox, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing
market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.