Maximizing GPU Power for Vision and Depth Sensor Processing

From NVIDIA's Tegra K1 to GPUs on the Cloud

Chen Sagiv, Eri Rubin

SagivTech Ltd.

Source: on-demand.gputechconf.com/...sensor-K1-tegra-cloud.pdf


Page 1:

Maximizing GPU Power for Vision and Depth Sensor Processing

From NVIDIA's Tegra K1 to GPUs on the Cloud

Chen Sagiv, Eri Rubin

SagivTech Ltd.

Page 2:

• Mobile Revolution

• Mobile – Cloud Concept

• 3D Imaging

• Two use cases:

• SceneNet on Tegra K1

• Depth Sensing on Tegra K1

• SagivTech Streaming Infrastructure

• Take-home Tips for Tegra K1

Today’s Talk

Page 3:

• Established in 2009 and headquartered in Israel

• Core domain expertise: GPU Computing and Computer Vision

• What we do: Technology, Solutions, Projects, EU Research, Training

• GPU expertise:

- Hard-core optimizations
- Efficient streaming for single- or multi-GPU systems
- Mobile GPUs

SagivTech Snapshot

Page 4:

• In 1984, this was cutting-edge science fiction in The Terminator

• 30 years later, science fiction is becoming a reality!

The Mobile Revolution is happening now!

Page 5:

The Combined Model: Mobile & Cloud Computing

Page 6:

• Understanding, interpretation and interaction with our surroundings via mobile devices

• Demand for immense processing power for implementation of computationally-intensive algorithms in real time with low latency

• Computation tasks are divided between the device and the server

• With CUDA – it’s simply easier!

Mobile – Cloud Concept

Page 7:

• Acquisition – Depth Sensors

• Processing – modeling, segmentation, recognition, tracking

• Visualization – Digital Holography

3D Imaging is happening now!

Page 8:

• If you’ve been to a concert recently, you’ve probably seen how many people take videos of the event with mobile phone cameras

• Each user has only one video – taken from one angle and location and of only moderate quality

Mobile Crowdsourcing Video Scene Reconstruction

Page 9:

Leverage the power of multiple mobile phone cameras to create a high-quality 3D video experience that is sharable via social networks.

The Idea behind SceneNet

Page 10:

Creation of the 3D Video Sequence

The scene is photographed by several people using their cell phone cameras.

The video data is transmitted via the cellular network to a High Performance Computing server.

Following time synchronization, resolution normalization and spatial registration, the several videos are merged into a 3D video cube.

Page 11:

The Event Community

A 3D video event is created.

The 3D video event will be available on the internet as a public or private event.

The event will create a community, where each member may provide another piece of the puzzle and view all of the information.

Page 12:

GPU Computing in SceneNet

Video Registration & 3D Reconstruction

Computational Acceleration

Page 13:

Bilateral Filter Acceleration on Tegra K1

Page 14:

Bilateral Filter Acceleration on Tegra K1

Page 15:

Bilateral Filter Acceleration on Tegra K1

Page 16:

Bilateral Filter Acceleration on Tegra K1

Image Size     1 CPU Thread   4 CPU Threads   GPU      Speedup (GPU vs. 4 CPU threads)
256 x 256      630 ms         170 ms          2.8 ms   x60
512 x 512      2550 ms        690 ms          12 ms    x57
1024 x 1024    10300 ms       2720 ms         45 ms    x60
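The bilateral filter being timed weights each neighbor by both spatial closeness and intensity similarity. The slides do not show the kernel itself, so the following is only a minimal CPU reference in C++; the image layout (row-major grayscale) and the sigma parameters are illustrative assumptions, not taken from the talk:

```cpp
#include <cmath>
#include <vector>

// Bilateral filter for one pixel of a row-major grayscale image.
// Each neighbor's weight is the product of a spatial Gaussian and a
// range (intensity-difference) Gaussian.
float bilateralPixel(const std::vector<float>& img, int w, int h,
                     int x, int y, int radius,
                     float sigmaSpace, float sigmaRange) {
    float center = img[y * w + x];
    float sum = 0.0f, norm = 0.0f;
    for (int dy = -radius; dy <= radius; ++dy) {
        for (int dx = -radius; dx <= radius; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;  // clamp at borders
            float v  = img[ny * w + nx];
            float ws = std::exp(-(dx * dx + dy * dy) /
                                (2.0f * sigmaSpace * sigmaSpace));
            float wr = std::exp(-(v - center) * (v - center) /
                                (2.0f * sigmaRange * sigmaRange));
            sum  += ws * wr * v;
            norm += ws * wr;
        }
    }
    return sum / norm;  // norm > 0: the center pixel always contributes
}
```

On a GPU, the speedups in the table above typically come from mapping one thread per output pixel and staging the neighborhood in shared memory; this sketch only fixes the arithmetic being accelerated.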

Page 17:

• The Mission: Running a depth sensing technology on a mobile platform

• The Challenge: First time on Tegra K1

• Extreme optimizations on a CPU-GPU platform to allow the device to handle other tasks in parallel

• The Expertise:

• Mantis Vision – the 3D core technology and Structured light algorithms

• SagivTech – the GPU computing expertise

• The bottom line: depth sensing is running in real time, in parallel with other compute-intensive applications!

First Depth Sensing Module for Mobile Devices – on Tegra K1

Page 18:

• In one word: Easy!

• Started with the most similar platform - GTX630, based on the GK208.

• Took only a few hours to transfer all the code.

• What's our secret?

Migrating from Discrete Kepler to K1

Page 19:

Our Infra is composed of a set of modules:

SagivTech Infra Stack

STInfraSys

STInfraGPU

STStreamingGPU

STMultiGPU

STCudaKernels

STCudaFunctions

STGLInterop

Page 20:

for (int ....) {
    START_BLOCK_TIME();
    // ... calculate some stuff ...
    TAKE_BLOCK_SUB_TIME("2. First Part");
    // ... calculate some stuff ...
    TAKE_BLOCK_SUB_TIME("3. Second Part");
}

Timing Code Sample: a simple one-liner to time a block

Page 21:

Timers:
---------
BENCHMARK:                 Recent Avg   Global Avg   Max time   Count
---------------------------------------------------------------------
|MyFunc.1. First Part         142.594      142.659    156.859     100
|MyFunc.2. Calculation        1706.63      1720.07    1987.78     100

Timing Code Sample: a simple one-liner to time a block
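The macro names and report format suggest a block timer built on a monotonic clock. Below is a guess at how such macros could be implemented with std::chrono; this is an illustrative sketch, not SagivTech's actual code, and it reduces the report to a running average and count:

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Illustrative re-creation of block timing macros: START_BLOCK_TIME()
// resets a reference point, and each TAKE_BLOCK_SUB_TIME(name) records
// the milliseconds elapsed since the previous mark under `name`.
struct BlockTimer {
    std::chrono::steady_clock::time_point mark = std::chrono::steady_clock::now();
    std::map<std::string, std::pair<double, long>> stats;  // name -> (total ms, count)

    void reset() { mark = std::chrono::steady_clock::now(); }

    double sub(const std::string& name) {
        auto now = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(now - mark).count();
        auto& s = stats[name];
        s.first += ms;
        ++s.second;
        mark = now;  // next sub-time measures from here
        return ms;
    }

    void report() const {
        std::printf("BENCHMARK: %-24s %12s %8s\n", "name", "avg (ms)", "count");
        for (const auto& kv : stats)
            std::printf("|%-24s %12.3f %8ld\n", kv.first.c_str(),
                        kv.second.first / kv.second.second, kv.second.second);
    }
};

static BlockTimer g_timer;
#define START_BLOCK_TIME()        g_timer.reset()
#define TAKE_BLOCK_SUB_TIME(name) g_timer.sub(name)
```

std::chrono::steady_clock is used rather than system_clock so that intervals stay correct even if the wall clock is adjusted mid-run.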

Page 22:

The major functionalities provided by the NDArray are:
• Initialize an NDArray of any arbitrary size
• Bind to an existing device/host pre-allocated pointer
• Copy to/from host/device
• Load and save functionality to/from file, especially useful for regression purposes
• Most of the functionality of the NDArray is done in an asynchronous manner

NDArray

Page 23:

• STL style code, no need to free and alloc

• Async is hidden from the user

NDArray Code Sample

st::CArray1D<int> arr_h1;                           // host array, empty until Init
st::CArray1D<int> arr_d1(iArrayLength, false, 512); // device array
arr_h1.Init(iArrayLength);                          // allocate iArrayLength elements on the host
arr_h1.Fill(11);                                    // fill with a constant value
arr_h1.CopyTo(arr_d1);                              // host-to-device copy, async is hidden
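The essence of this interface can be sketched on the CPU alone; the real CArray1D additionally manages device memory, streams, and asynchronous copies. Only Init, Fill, and CopyTo appear on the slide, so everything else here (class name included) is invented for illustration:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// CPU-only sketch of an STL-style 1D array: allocation is owned by the
// object, so there is no explicit alloc/free, matching the slide's point.
template <typename T>
class Array1D {
public:
    Array1D() = default;
    explicit Array1D(std::size_t n) { Init(n); }

    void Init(std::size_t n) { data_.assign(n, T{}); }            // (re)allocate
    void Fill(const T& v)    { std::fill(data_.begin(), data_.end(), v); }

    // In the real library this copy could be host<->device and asynchronous;
    // here it is a plain synchronous host-side copy.
    void CopyTo(Array1D<T>& dst) const { dst.data_ = data_; }

    std::size_t Size() const { return data_.size(); }
    const T& operator[](std::size_t i) const { return data_[i]; }

private:
    std::vector<T> data_;  // RAII: freed automatically on destruction
};
```

The RAII style is what makes "no need to free and alloc" work: buffer lifetime follows object lifetime, on host or device alike.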

Page 24:

Single line regression system

Regression Code Sample

st::RegressionParameters par = st::System::GetInstance().GetRegParams();
par.mode = regressionMode;
st::System::GetInstance().SetRegressionParams(par);

if (!ST_REGRESSION(h_cmpNDArr)) return 1;
return 0;

Page 25:

ST MultiGPU Real World Use Case

Four GPUs, four pipes

Utilization: 96%+

FPS: 20.46

Scaling: 3.79 – near-linear scaling!

Note: NO gaps in the profiler
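Gap-free utilization across four pipes comes from keeping every pipe fed from a shared work queue. A minimal host-side producer/consumer sketch in C++ (frames are stood in by integers, and each worker thread stands in for one GPU pipe; the real infra drives CUDA streams per GPU):

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Thread-safe frame queue: producers push, workers pop until the queue is
// closed and drained. This is the backbone that keeps pipes gap-free.
class FrameQueue {
public:
    void push(int frame) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(frame); }
        cv_.notify_one();
    }
    bool pop(int& frame) {  // returns false once closed and empty
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;
        frame = q_.front(); q_.pop();
        return true;
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
private:
    std::queue<int> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
};

// Runs `workers` consumer threads over `frames` frames; returns the number
// of frames processed (stand-in for per-GPU pipeline work).
int runPipeline(int frames, int workers) {
    FrameQueue q;
    std::atomic<int> processed{0};
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w)
        pool.emplace_back([&] {
            int f;
            while (q.pop(f)) ++processed;  // "process" the frame
        });
    for (int f = 0; f < frames; ++f) q.push(f);
    q.close();
    for (auto& t : pool) t.join();
    return processed.load();
}
```

Near-linear scaling (3.79 of 4, about 95% efficiency) requires that workers never starve, which is why the queue blocks instead of polling and the producer runs ahead of the consumers.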

Page 26:

GPU streaming

Page 27:

• Need to remember that Android is overlaid on a Linux base

• Code development and testing (including CUDA) can be done on any PC

• Profiling on Logan – NVProf for Logan – can be ported to your PC

Key Points for Developing on the K1

Page 28:

• There is a strong separation between the Android system and the NDK

• A CUDA developer doesn’t need to become an Android developer

• From the Android developer viewpoint this is simply a library

• An Android developer doesn’t need to become a CUDA developer

Key Points for Developing on the K1

Page 29:

• Only 1 SMX (compared to 15 on the K20X)

• Only one RAM, shared by the CPU and the GPU

• Shared memory is similar in behavior to shared memory in Kepler 2

• LDG - very useful, easy optimization

• We used Thrust and moved to CUB (for streams)

• Will be possible to use existing library infrastructure on Logan

Take Home Tips for CUDA on Tegra K1

Page 30:

• Development methodology is similar to discrete GPU development

• No dynamic parallelism

• No Hyper-Q

• Don’t underestimate Tegra’s CPU - the challenge is to divide work between the various components

Take Home Tips for CUDA on Tegra K1

Page 31:

This project is partially funded by the European Union under the 7th Framework Programme, FET-Open SME, Grant agreement no. 309169.

Mobile Crowdsourcing Video Scene Reconstruction

Page 32:

Thank You

For more information please contact:

Nizan Sagiv

nizan@sagivtech.com

+972 52 811 3456