Ahmed Zakaria, Software Engineer, Windows Graphics & AI Platform

Using AI for Interactive Applications



  • Why do we need inferencing on the edge/client?

    Low latency

    Flexibility

    Reduced operational costs

  • Is it possible?

    ML can be used in so many ways in games:

    • Character Animation
    • NPC Interaction
    • Scene Generation
    • Bots (testing, coop, competitive, etc.)
    • Art & Content pipelines
    • Runtime visual FX, animation
    • Telemetry/BI analysis
    • And so much more...

    *Photos on right do NOT run on Windows ML; illustrative examples of ML only; with the exception of Project Malmo, none are Microsoft projects.

    Project Malmö

  • Inferencing Examples: Smooth animation

    http://theorangeduck.com/media/uploads/other_stuff/phasefunction.pdf

  • Inferencing Examples: de-noising

[Figure: noisy input vs. recurrent auto-encoder output vs. reference]

    http://research.nvidia.com/sites/default/files/publications/dnn_denoise_author.pdf

  • Inferencing Examples: Super resolution

    https://developer.nvidia.com/deep-learning-materials-texture

  • Introducing Windows Machine Learning

    [Diagram: Training produces a Model; the Application adds and references the Model; Windows ML runs Inference]

    1. Load – Loads the model into the Windows ML runtime
    2. Bind – Wires up inputs and outputs to the model
    3. Eval – Evaluates the model and produces results

  • Open Neural Network Exchange

  • Demo Time!

  • Realtime Style Transfer

  • Unity ML-Agents 0.2 on WinML

  • Deep Dive!

  • Windows ML Architecture

    • WinML API
      • Available to Win32 & UWP applications
      • Available on all Windows editions in 2018

    • Inference Engine
      • Model & device resource management
      • Loads and compiles operator kernels
      • Executes the dataflow graph

    • Device Layer
      • CPU instruction optimizations up to AVX-512
      • DirectML generates DX12 compute shaders

    [Diagram: Application #1 and Application #2 → WinML Win32 API / WinML API (input surface, output surface) → WinML Runtime (Model, Inference Engine) → DirectML → Direct3D → CPU / GPU / NPU]

  • Building Blocks: ML Model

    • Graph of operators & Attributes

    • Input & Output Tensors

    • Weights


  • Building Blocks: Tensors

    Multi-dimensional arrays of scalars

    [Figure: an RGB image as a tensor, with Height (H) and R, G, B channel planes]

  • Operator Example: Fully Connected

    Each element in the output depends on all elements in the input:

    y_i = f( Σ_j w_ij · x_j )

    [Diagram: inputs x_0..x_2 fully connected by weights w to outputs y_0..y_2]

    Can be done as a matrix multiply!

    GPUs can do matrix multiplications very efficiently

  • Building Blocks: DX12 Meta Command

    • New hardware abstraction at a higher level
    • Flexible IHV implementation
      • Ex. NVIDIA uses tensor cores for the FP16 metacommand implementation
    • GPU algorithm function declaration
      • What are the inputs and outputs to this algorithm
      • Does it need scratch memory
      • Persistent space for look-up tables
      • Precision is well defined
    • Similar to a C-style call
    • Low cost for adding new metacommands vs. DDI/API surface
    • Enumeration & reflection APIs

  • Volta SM Tensor Cores

    Dedicated hardware for machine learning acceleration

    8 tensor cores per SM

    512 fused multiply-adds (FMA) / clock

    Mixed-precision operation

    110 TFLOPS peak on Titan V

  • Fully Connected as Meta Command

    Tensors: In, Filter, Bias, Out

    Precision: FP32, FP16

    Activation Fn: ELU, HARDMAX, HARD_SIGMOID, IDENTITY, LEAKY_RELU, LINEAR, LOG_SOFTMAX, PARAMETERIZED_RELU, PARAMETERIZED_SOFTPLUS

    Resources: Temporary Resource, Persistent Resource (transforms, lookups)

    Steps:

    • Create on Device
    • Initialize once on Command List
    • Execute many on Command List

  • DirectML Style Transfer GPU Performance

    NVIDIA Titan V @ 1080p, absolute perf (frames per second):

    HLSL FP32:                             3.4 fps
    FP32 MetaCommands:                     8.6 fps  (2.5x)
    FP16 MetaCommands with tensor cores:    27 fps  (3.1x)

  • Code!

  • WinML API: LoadModel

    ComPtr<IWinMLRuntime> spRuntime;
    WinMLCreateRuntime(&spRuntime);
    ComPtr<IWinMLModel> spModel;
    spRuntime->LoadModel(GetModelPath(), &spModel);

    View graph metadata:

    spModel->GetDescription(&pDescription);
    spModel->EnumerateMetadata(count, &key, &value);
    spModel->EnumerateModelInputs(count, &pDescriptor);
    spModel->EnumerateModelOutputs(count, &pDescriptor);

  • WinML API: Bind Float[]

    • Setup the D3D device to use for inferencing

    ComPtr<IWinMLEvaluationContext> spContext;
    ComPtr<ID3D12Device> spDevice; // Any D3D device which is compute capable!
    spRuntime->CreateEvaluationContext(spDevice.Get(), &spContext);

    • Connect input and output data

    WINML_BINDING_DESC bindDescriptor;
    bindDescriptor.BindType = WINML_BINDING_TENSOR;
    bindDescriptor.DataType = WINML_TENSOR_FLOAT;
    bindDescriptor.NumDimensions = 4;
    INT64 shape[4] = { 1, 3, 1080, 1920 };
    bindDescriptor.pShape = reinterpret_cast<INT64*>(&shape);
    bindDescriptor.Tensor.pData = &floatarray;
    spContext->BindValue(&bindDescriptor);

  • WinML API: Bind D3D12Resource

    • Setup the D3D device to use for inferencing

    ComPtr<IWinMLEvaluationContext> spContext;
    ComPtr<ID3D12Device> spDevice; // Any D3D device which is compute capable!
    spRuntime->CreateEvaluationContext(spDevice.Get(), &spContext);

    • Connect input and output data

    WINML_BINDING_DESC bindDescriptor;
    bindDescriptor.BindType = WINML_BINDING_RESOURCE;
    bindDescriptor.DataType = WINML_TENSOR_FLOAT;
    bindDescriptor.NumDimensions = 4;
    INT64 shape[4] = { 1, 3, 1080, 1920 };
    bindDescriptor.pShape = reinterpret_cast<INT64*>(&shape);
    bindDescriptor.Resource.pData = sp12Resource.Get();
    spContext->BindValue(&bindDescriptor);

  • WinML API: Eval

    • Evaluate model (inference)

    spRuntime->EvaluateModel(spContext.Get());

    • Commands are queued on the compute queue

  • WinML in PIX

  • Summary

    • Windows ML is a big step towards democratizing AI inference
    • DirectML with DX12 metacommands delivers the full performance of the hardware
    • Integrates with Direct3D 12 for low latency
    • Runs on all DX12 GPUs
    • Start experimenting with AI for games today!

  • Tools

    • Converters
      • winmltools – https://pypi.org/project/winmltools/
    • Project integration with Visual Studio
      • MLGen – in the Windows SDK
    • Visualization
      • Netron – https://github.com/lutzroeder/Netron
    • Debugging – PIX
      • ML events in the compute queue
      • Operator and model timing

  • Resources

    • ONNX Models
      • https://github.com/onnx/models
    • Windows Machine Learning Samples
      • https://github.com/Microsoft/Windows-Machine-Learning
    • Check out our forums: DirectXTech.com
      • Links to documentation
      • Resources for learning about ML
      • Compilation of recently published papers
      • ONNX converters
      • Tools and tips
      • Post your questions or share your thoughts here
      • Stay up to date with the latest happenings

  • Questions?

  • © 2018 Microsoft Corporation.

    All rights reserved. Microsoft, Xbox, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.

    MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.