Where Tegra meets Titan Prof Tom Drummond

Where Tegra meets Titan - NVIDIA, on-demand.gputechconf.com/gtc/2016/presentation/s...


Page 1:

Where Tegra meets Titan

Prof Tom Drummond

Page 2:

Computer vision is easy!

But first a diversion to 10th Century Persia …

… and the first recorded game of chess

Pages 3 to 7:

The rice and the chessboard

Page 8:

The rice and the chessboard

First half of the chessboard: 100 tons of rice

Page 9:

The rice and the chessboard

First half of the chessboard: 100 tons of rice

Second half of the chessboard: 400 billion tons of rice = 1000 years of production
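The two totals can be checked with a short calculation. This is only a rough sketch: it assumes one grain on the first square, a doubling on every following square, and about 25 mg per grain (the grain mass is an assumption, not a figure from the talk).

```latex
\begin{align*}
\text{first half:}\quad  \sum_{k=0}^{31} 2^{k} &= 2^{32}-1 \approx 4.3\times 10^{9}\ \text{grains}
   \;\approx\; 4.3\times 10^{9} \times 25\ \text{mg} \;\approx\; 107\ \text{tonnes} \\
\text{whole board:}\quad \sum_{k=0}^{63} 2^{k} &= 2^{64}-1 \approx 1.8\times 10^{19}\ \text{grains}
   \;\approx\; 4.6\times 10^{11}\ \text{tonnes}
\end{align*}
```

Under these assumptions the results have the same order of magnitude as the 100 tons and 400 billion tons quoted above, and the second half dwarfs the first by a factor of about $2^{32}$.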

And the moral of the story is …

Page 10:

The transistor and the chessboard

Page 11:

The transistor and the chessboard

1974: Intel 8080 (6,000 transistors)
1978: Intel 8086 (29,000 transistors)
1982: Intel 80286 (134,000 transistors)
1993: Intel Pentium (3,000,000 transistors)
2004: Intel P4 Prescott (125,000,000 transistors)

Page 12:

The transistor and the chessboard

How many on the last square…?

1974: Intel 8080 (6,000 transistors)
1978: Intel 8086 (29,000 transistors)
1982: Intel 80286 (134,000 transistors)
1993: Intel Pentium (3,000,000 transistors)
2004: Intel P4 Prescott (125,000,000 transistors)
This notebook > 2 trillion transistors

2004: Nvidia NV40 (222,000,000 transistors)
2006: Nvidia G80 (681,000,000 transistors)
2008: Nvidia GT200 (1,400,000,000 transistors)
2010: Nvidia GF104 (1,900,000,000 transistors)
2012: Nvidia GK104 (3,540,000,000 transistors)
2015: Nvidia GM200 (8,000,000,000 transistors)

Page 13:

Can run Moore’s law backwards

Q: According to Moore’s law, when was there just one transistor?
A: 1948

Page 14:

Can run Moore’s law backwards

Q: According to Moore’s law, when was there just one transistor?
A: 1948

In Dec 1947, Bardeen, Brattain and Shockley attached two gold contacts to a crystal of germanium…
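The backwards extrapolation can be written out explicitly. As a sketch only, assume the transistor count doubles every $T = 2$ years (the doubling period is an assumption, it is not stated on the slide) and take the 2015 GM200 with $8\times10^{9}$ transistors as the reference point $(t_0, N_0)$:

```latex
\begin{align*}
N(t) &= N_0 \cdot 2^{(t - t_0)/T} \\
N(t_1) = 1 \;\Rightarrow\; t_1 &= t_0 - T \log_2 N_0 \\
                               &= 2015 - 2 \times \log_2\!\left(8\times10^{9}\right) \\
                               &\approx 2015 - 2 \times 32.9 \;\approx\; 1949
\end{align*}
```

With these assumptions the extrapolation lands within about a year of the 1948 answer; the exact year shifts with the reference chip and doubling period chosen.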

Page 15:

Power

Moore’s law gives us increasing compute power

BUT

With great power comes great …

Page 16:

Moore’s Law is not always our friend!

Even with GPUs, compute on mobile devices is limited
Can’t put a K40 on a Quadrotor!

Page 17:

Moore’s Law is not always our friend!

Even with GPUs, compute on mobile devices is limited
But a TX1 fits just fine! (Stereolabs TX1 enabled drone)

Page 18:

ACRV

The Australian Research Council Centre of Excellence for Robotic Vision
• $25.5M over 7 years
• 13 Chief Investigators in 4 Universities
• 16 Research Fellows
• ~50 PhD students
• Research into:
  – Semantics (deep learning)
  – Robust vision (all weathers)
  – Vision and Action (closing the loop)
  – Algorithms and Architecture (constrained resources)

Page 19:

Distributed Robotic Vision

Simplest method is to just partition the problem somewhere, giving some tasks to the mobile and some to the server

[Diagram: mobile | server]

Page 20:

Distributed Robotic Vision

But often this isn’t the best solution, e.g. latency introduced by the network may be a problem

Many interesting solutions are not like this, e.g.:

[Diagram: processing blocks]
Obtain sensor data
Extract summary information
Compute accurate solution
Compute approximate solution
Compare
Calculate output
Update local model
Bring correction up to date
Calculate and send correction
Compute approximate solution

Page 21:

Distributed Robotic Vision

Want to create solutions to enable robotics in a distributed sensing and compute environment

[Diagram: three TX1 modules, eight K40 GPUs and two CPUs]

Page 22:

Distributed Localisation Service

[Diagram: processing boxes per device]
CCTV1: Build Image Pyramid, Extract landmarks, Build Descriptors, Index, Match
CCTV2: Build Image Pyramid, Extract landmarks, Build Descriptors, Index, Match
Robot: Build Image Pyramid, Extract landmarks, Build Descriptors
Compute 1: Compute Robot pose

Page 23:

Distributed Localisation Service

==3031== NVPROF is profiling process 3031, command: ./ComputeOrb 1
Frame# 1
Elapsed time : 5.955523 ms
Frame Elapsed time : 7.765627 ms
numCorners: 28304, nmsnumCorners: 5073
==3031== Profiling application: ./ComputeOrb 1
==3031== Profiling result:
Time(%)      Time  Calls       Avg       Min       Max  Name
 57.18%  3.2379ms      1  3.2379ms  3.2379ms  3.2379ms  OrbDescriptors(…)
 30.57%  1.7312ms      1  1.7312ms  1.7312ms  1.7312ms  (…)
  4.29%  242.92us      1  242.92us  242.92us  242.92us  fastcorner(…)
  4.00%  226.31us      1  226.31us  226.31us  226.31us  harris(…)
  1.46%  82.553us      1  82.553us  82.553us  82.553us  NMS(…)
  0.73%  41.458us      1  41.458us  41.458us  41.458us  cleansweep(…)

Speedup over CPU* implementation is 4-5X

* Intel Core2 Quad Q8400 @ 2.66 GHz

Page 24:

Sub-pixel localisation

Timing results (µs/keypoint):
Inverse Additive        672
Inverse Compositional   367
Ours                      7

[Diagram: processing boxes per device]
Camera 1: Find landmarks, Extract image patch, Compute sub-pixel correspondence on many subsequent frames
Compute 1: Compute matrix
Camera 2: Find landmarks, Extract image patch, Compute sub-pixel correspondence on many subsequent frames

Page 25:

Approximate Nearest Neighbor

Big data in high dimensional spaces
Given a query point, find the nearest reference point
Solution: FANNG (Fast Approximate Nearest Neighbor Graphs) @CVPR 2016
Can serve 1.2M queries/second at 90% recall in a database of 1M reference points in 128D space on Titan X

Page 26:

Approximate Nearest Neighbor

CUDA implementation requires a short priority queue, BUT:

    int array[30]; // very slow global memory

Solution is to treat a warp as a single unit, with the array spread over the warp in a single register:

    int array; // there are 32 of these in a warp
    ...
    // find the first entry in array that is > thresh
    int pq = __ffs(__ballot(array > thresh));
    ...
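The warp-wide lookup above can be sketched as a small standalone program. This is an illustration under stated assumptions, not the talk's implementation: it launches a single warp of 32 threads, the names `first_above` and `find_first_above` are hypothetical, and it uses `__ballot_sync`, the form that CUDA 9 and later provide in place of `__ballot`. Error checking is left out for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each lane keeps one slot of a 32-entry array in a register.
// Writes the 0-based index of the first entry > thresh to *out, or -1 if none.
__global__ void first_above(const int *values, int thresh, int *out)
{
    int lane  = threadIdx.x;       // single warp assumed: lanes 0..31
    int array = values[lane];      // this lane's slot of the warp-wide array

    // Bit i of the mask is set when lane i's entry exceeds thresh.
    unsigned mask = __ballot_sync(0xffffffffu, array > thresh);

    // __ffs is 1-based and returns 0 for an empty mask, hence the -1.
    int pq = __ffs(mask) - 1;

    if (lane == 0) *out = pq;
}

// Hypothetical host helper: copies 32 values to the GPU and runs one warp.
int find_first_above(const int *host_values, int thresh)
{
    int *d_values = nullptr, *d_out = nullptr, result = -2;
    cudaMalloc(&d_values, 32 * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_values, host_values, 32 * sizeof(int), cudaMemcpyHostToDevice);
    first_above<<<1, 32>>>(d_values, thresh, d_out);
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_values);
    cudaFree(d_out);
    return result;
}

int main()
{
    int values[32];
    for (int i = 0; i < 32; ++i) values[i] = 2 * i;   // 0, 2, 4, ..., 62

    // values[5] = 10 is the first entry above 9.
    printf("first index above 9: %d\n", find_first_above(values, 9));
    return 0;
}
```

The whole search costs one vote and one bit scan per warp, with no memory traffic for the array itself, which is the point of keeping it in registers.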

Page 27:

Approximate Nearest Neighbor

Want to keep the array sorted when we insert a new value, discarding the largest value

thread:                0   1   2   3   4   5   6   7
array:                 1   2   4   5   9  11  13  15
new_value:             8 in every thread (each thread sees this value)
ship value:            8   8   8   8   9  11  13  15    = max(new_value, array)
shuffle:               -   8   8   8   8   9  11  13
array (after write):   1   2   4   5   8   9  11  13    write new value if less than array
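The insert step can be sketched in CUDA as follows. This is a minimal illustration, not the talk's code: it uses a full 32-lane warp instead of the 8 threads drawn on the slide, starts from an empty queue filled with `INT_MAX`, and uses `__shfl_up_sync`, the CUDA 9+ shuffle. The names `warp_sorted_insert`, `build_queue` and `run_inserts` are hypothetical, and giving lane 0 `new_value` as its shifted-in value is an assumption, since the slide does not show what thread 0 receives.

```cuda
#include <climits>
#include <cstdio>
#include <cuda_runtime.h>

constexpr unsigned FULL_MASK = 0xffffffffu;
constexpr int WARP = 32;

// Insert new_value into a sorted array held one element per lane.
// Once all 32 slots are in use, the largest element falls off lane 31.
__device__ int warp_sorted_insert(int array, int new_value)
{
    int lane = threadIdx.x % WARP;
    int ship = max(new_value, array);                  // value to pass one lane up
    int shifted = __shfl_up_sync(FULL_MASK, ship, 1);  // take ship from lane - 1
    if (lane == 0) shifted = new_value;                // assumption: nothing sits below lane 0
    return min(array, shifted);                        // write only if less than array
}

__global__ void build_queue(const int *inputs, int n, int *out)
{
    int array = INT_MAX;                               // empty queue: every slot at INT_MAX
    for (int i = 0; i < n; ++i)
        array = warp_sorted_insert(array, inputs[i]);
    out[threadIdx.x] = array;
}

// Hypothetical host helper: inserts n values in order, returns all 32 slots.
void run_inserts(const int *inputs, int n, int *result)
{
    int *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, WARP * sizeof(int));
    cudaMemcpy(d_in, inputs, n * sizeof(int), cudaMemcpyHostToDevice);
    build_queue<<<1, WARP>>>(d_in, n, d_out);
    cudaMemcpy(result, d_out, WARP * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}

int main()
{
    // The slide's array, followed by the new value 8.
    int inputs[9] = {1, 2, 4, 5, 9, 11, 13, 15, 8};
    int result[WARP];
    run_inserts(inputs, 9, result);
    for (int i = 0; i < 9; ++i) printf("%d ", result[i]);
    printf("\n");
    return 0;
}
```

Because this sketch has 32 slots instead of 8, the value 15 moves up one lane instead of being discarded; with a full queue the entry in lane 31 would be the one dropped. Each insert costs one shuffle plus a `max` and a `min` per lane, with no loop over the array.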