A Powerful, Flexible, and Intuitive Deep Learning Framework @ NVIDIA GTC, April 6th, 2016. Shohei Hido, Chief Research Officer, Preferred Networks, Inc.


Page 1

A Powerful, Flexible, and Intuitive Deep Learning Framework

@ NVIDIA GTC, April 6th, 2016

Shohei Hido
Chief Research Officer
Preferred Networks, Inc.

Page 2

Overview

- Chainer is a Python-based deep learning framework
- Chainer v1.0 was released as open source in June 2015
- It does NOT rely on Theano, unlike other Python frameworks
- Chainer uses a unique scheme named Define-by-Run

http://chainer.org/

- Why do users still need another framework?
- How different and effective is Chainer?

2

Page 3

Preferred Networks (PFN): a startup that applies deep learning to industrial IoT

- Founded: March 2014
- Headquarters: Tokyo, Japan
- U.S. subsidiary: San Mateo, California
- Company size: 35 engineers & researchers
- Investors: Toyota, FANUC, NTT

(Figure: deep learning applied to industrial IoT, with target domains manufacturing, automotive, and healthcare.)

3

Page 4

Partnering with world-leading companies using Chainer

l  R&DcollaboraOononindustrialproblemswithreal-worlddata  Specificrequirements,modifiedalgorithms,manytrialsanderrors,etc

  Differentfrommakinggeneral-purposerecogniOonsystem

4

Partners: Toyota, FANUC, Panasonic, NTT, Cisco, NVIDIA

Page 5

Two types of background behind DL frameworks

1. Scalability-oriented

- Use cases in mind
  - Image/speech recognition systems
  - Fast DL as a service in the cloud
- Problem type
  - A few general applications
  - 10+ million training samples
  - 10+ node cluster with a fast network
- Possible bottlenecks
  - Tuning of well-known algorithms
  - Distributed computation for model/data-parallel training

2. Flexibility-oriented

- Use cases in mind
  - Algorithm research
  - R&D projects for new products
- Problem type
  - Various specific applications
  - 10+ k training samples
  - One node with multiple GPUs
- Possible bottlenecks
  - Trial and error in prototyping
  - Debugging, profiling & refactoring
  - (Wait time during compilation)

Page 6

Designed for efficient research & development

- Flexible: new kinds of complex models for various applications
- Intuitive: rapid prototyping and efficient trial and error
- Powerful: comparable performance on a single node with multiple GPUs

6

(Figure: a spectrum from scalability-oriented to flexibility-oriented frameworks.)

Page 7

Agenda

- Deep learning framework basics
- Introduction to Chainer
- CuPy: NumPy-compatible GPU library
- Performance and applications

7

Page 8

Neural network and computation

(Figure: a feedforward network with input units x1..xN, hidden units h1..hH and k1..kM, and output units y1..yM. Forward computation maps inputs such as text, images, or sensor data to outputs such as "Object: tulip", "Anomaly score: 0.35", or "Category: sports"; backward computation (backpropagation) propagates gradients in the reverse direction.)

8

Page 9

Chainer focuses on network representation/training

- Design choices for deep learning frameworks
  - How to build neural networks?
  - How to train neural networks?
  - Which text format/language for modeling?
  - Which language for computing?
  - Run with a GPU?
  - Run on multiple GPUs?
  - Run on multiple compute nodes?

9

Page 10

Building and training neural networks: Computational graph construction is the key

1. Construct a computational graph
   - Based on the network definition given by users
   - Chains of functions and operations on input variables

2. Compute loss and gradients
   - Forward computation calculates the loss for a minibatch
   - Backpropagation gives gradients for all parameters

3. Optimize the model
   - Update each parameter with its gradient
   - Repeat until convergence

Step 1 is the most important, and there are many approaches to it. A rough sketch of the three steps follows.
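As a framework-agnostic illustration of the three steps, here is a minimal NumPy sketch for a one-layer linear network with squared loss (all sizes and names are made up for the example):

import numpy as np

# Toy data: 100 samples, 4 features, 3 targets (arbitrary)
X = np.random.randn(100, 4).astype(np.float32)
T = np.random.randn(100, 3).astype(np.float32)
W = 0.1 * np.random.randn(4, 3).astype(np.float32)  # model parameters
lr = 0.1

for epoch in range(10):
    # Steps 1+2 (forward): the graph here is just "Y = X @ W", evaluated directly
    Y = X.dot(W)
    loss = ((Y - T) ** 2).mean()
    # Step 2 (backward): gradient of the mean squared loss w.r.t. W
    grad_W = 2.0 * X.T.dot(Y - T) / Y.size
    # Step 3 (optimize): plain SGD update, repeated until convergence
    W -= lr * grad_W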

10

Page 11

Building blocks

- These functionalities are very similar between frameworks
- But the structure, abstraction level, and interface are different
- It comes down to the design of a domain-specific language for NN
- Building blocks (roughly mapped onto Chainer's classes in the sketch below):
  - Array data structure (vector/matrix/tensor)
  - Operations & functions
  - Network (computational graph)
  - Optimizer (SGD/AdaGrad/Adam)
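A minimal sketch of how these building blocks appear in Chainer (covered in detail later; the layer sizes here are arbitrary):

import numpy as np
import chainer
from chainer import Variable, optimizers
import chainer.functions as F
import chainer.links as L

x = Variable(np.zeros((1, 784), dtype=np.float32))  # array data structure wrapped for autograd
h = F.relu(x)                                        # operation/function on Variables
model = chainer.Chain(l1=L.Linear(784, 10))          # network: a chain of parameterized links
optimizer = optimizers.SGD()                         # optimizer (SGD/AdaGrad/Adam, ...)
optimizer.setup(model)                               # attach the optimizer to the model's parameters
y = model.l1(h)                                      # forward pass builds the computational graph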

11

Page 12

Types of domain-specific language for neural networks

- Text DSL
  - Ex. Caffe (prototxt)
  - Ex. CNTK (NDL)
- Symbolic program: operations on symbols
  - Ex. Theano
  - Ex. TensorFlow
- Imperative program: direct computations on raw data arrays
  - Ex. Torch.nn
  - Ex. Chainer

# Symbolic definition
A = Variable('A')
B = Variable('B')
C = B * A
D = C + Constant(1)
# Compile
f = compile(D)
d = f(A=np.ones(10), B=np.ones(10)*2)

# Imperative declaration
a = np.ones(10)
b = np.ones(10) * 2
c = b * a
d = c + 1

%% Definition in text
f: {"A": "Variable", "B": "Variable",
    "C": ["B", "*", "A"], "ret": ["C", "+", 1]}

# Compile
f = compile("f.txt")
d = f(A=np.ones(10), B=np.ones(10)*2)

12

Ex. MXNet

Page 13

Comparison of DSL type

DSL type: Text DSL
- Pros: human-readable definition; non-programmers can easily edit the network
- Cons: users must study the format; the format might have to be extended for new algorithms

DSL type: Internal DSL, symbolic
- Pros: static analysis at compile time; optimization before training; easy to parallelize
- Cons: users must study special syntax; may need more effort to implement new algorithms

DSL type: Internal DSL, imperative
- Pros: less effort to learn the syntax; easy debugging and profiling; suitable for new algorithms with complex logic
- Cons: hard to optimize in advance; less efficient in memory allocation and parallelization

Chainer is at the extreme end of imperative programming, for high flexibility.

13

Page 14

Agenda

- Deep learning framework basics
- Introduction to Chainer
- CuPy: NumPy-compatible GPU library
- Performance and applications

14

Page 15

Chainer as an open-source project

- https://github.com/pfnet/chainer
- 50 contributors
- 1,277 stars & 255 forks
- 3,708 commits
- Active development & releases for the last 10 months: v1.0.0 (June 2015) to v1.7.2 (March 2016)

15

Original developer: Seiya Tokui

Page 16

Chainer software stack

(Figure: Chainer runs on NumPy, backed by BLAS on the CPU, and on CuPy, backed by CUDA and cuDNN on NVIDIA GPUs.)

- Chainer is built on top of NumPy and CUDA
- CuPy is also introduced as an equivalent of NumPy on the GPU

16

Page 17

Graph build scheme (1/2) - Define-and-Run: most frameworks use this scheme (Chainer does not)

- Define: build a computational graph based on the network definition
- Run: update the model (parameters) using the training dataset

(Figure: in the Define phase, auto-differentiation turns the network definition into a computational graph, gradient functions, and parameters; in the Run phase, training data flows through this fixed graph to compute loss and gradients, which update the parameters.)

17

Page 18

Graph build scheme (2/2) - Define-by-Run: computational graph construction on the fly

- No graph is constructed before training
- Instead, the graph is built at each forward computation
- The computational graph can be modified dynamically for each iteration/sample, or depending on some conditions (see the sketch after the figure)

(Figure: the model definition and training data drive each forward computation; the computational graph, gradient functions, and parameters are built on the fly, can change dynamically depending on conditions, and the parameters are then updated.)
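A minimal sketch of what this enables (the model, layer sizes, and branching rule below are invented for illustration): the forward code is ordinary Python, so a different graph can be recorded on every call.

import numpy as np
import chainer
from chainer import Variable
import chainer.functions as F
import chainer.links as L

class DynamicNet(chainer.Chain):
    def __init__(self):
        super(DynamicNet, self).__init__(
            l1=L.Linear(10, 10),
            l2=L.Linear(10, 10),
        )

    def __call__(self, x, n_steps):
        h = F.relu(self.l1(x))
        # The depth of the graph depends on a runtime value, so each
        # forward pass may build a different computational graph.
        for _ in range(n_steps):
            h = F.relu(self.l2(h))
        return h

model = DynamicNet()
x = Variable(np.random.randn(1, 10).astype(np.float32))
y_shallow = model(x, n_steps=1)  # records a graph with one recurrent step
y_deep = model(x, n_steps=5)     # records a graph with five recurrent steps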

18

Page 19

Define-by-Run example: MLP for MNIST

- Only the transformations between units are set up before training
- The connection is given as forward computation

l1 = Linear(784, n_units)
l2 = Linear(n_units, 10)

(Figure: x → Linear l1 (W, bias) → ReLU → h1 → Linear l2 (W, bias) → y)

def forward(x):
    h1 = ReLU(l1(x))
    return l2(h1)

19

Page 20

Define-by-Run: an interpreted language for neural networks

- Idea
  - Forward computation actually goes through the computational graph
  - By remembering the history, the actual graph can be obtained
- Advantages
  - Flexibility for new algorithms with complex components
    - Ex. recurrent, recursive, attention, memory, adversarial, etc.
  - Intuitive coding with a highly imperative network definition
    - Ex. stochastic networks whose graph changes at each iteration
- Current drawbacks
  - The graph is regenerated every time, even for fixed networks
  - No optimization, even for the static parts of graphs
    - JIT-like analysis and subgraph caching might be useful

20

Page 21

Basic components (1/2): Variable and Function

- Variable
  - A Variable wraps an array (.data)
  - It remembers its parent function (.creator)
  - It will be assigned a gradient (.grad)
  - It keeps track of not only data but also computations
- Function
  - A transformation between Variables
  - Stateless
  - e.g. sigmoid, tanh, ReLU, max pooling, dropout (see the sketch below)

(Figure: a Function transforms an input Variable x into an output Variable y.)
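A minimal sketch of these two objects in action (the array values are arbitrary):

import numpy as np
from chainer import Variable
import chainer.functions as F

x = Variable(np.array([[0.5, -1.2, 2.0]], dtype=np.float32))
y = F.relu(x)                  # a stateless Function application

print(y.data)                  # the wrapped array: [[0.5, 0., 2.]]
print(y.creator)               # the parent Function node that produced y

y.grad = np.ones_like(y.data)  # seed the output gradient
y.backward()                   # backpropagate through the recorded graph
print(x.grad)                  # gradient w.r.t. x: [[1., 0., 1.]]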

21

Page 22

Basic components (2/2): Link and Chain

- Link = a function with state
  - Parameters are also Variables, and gradients will be assigned to them
  - e.g. Linear (fully connected), LSTM, Convolution2D, word embedding
- Chain = a network
  - A Chain has a set of child Links
  - Forward computation is defined in .__call__()
  - e.g. MLP2, AlexNet, GoogLeNet, RNNLM, seq2seq, ...

(Figure: a Linear Link computes y = f(W*x + b) with parameters W and b; the MLP2 Chain composes Linear l1, ReLU, and Linear l2 to map x → h1 → y.)

22

Page 23

Backpropagation through computational graph

- Consider an objective (using Link.Linear): L = f(x * W + b)
- This computes the value of L in forward computation, and simultaneously builds the following computational graph
- The gradient of L can be computed with respect to any variable by backpropagation
- The optimizer then updates the values of the parameters (a worked sketch follows the figure)

(Figure: the computational graph x, W → "*" → "+" (with b) → f → L, where the round nodes are Variables and the square nodes are Functions.)
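A minimal worked sketch of this objective (the input/output sizes and the choice of f as a sum of squares are made up for illustration):

import numpy as np
from chainer import Variable
import chainer.functions as F
import chainer.links as L

lin = L.Linear(3, 2)   # holds W and b as parameter Variables
x = Variable(np.array([[1.0, 2.0, 3.0]], dtype=np.float32))

y = lin(x)             # forward: y = x*W^T + b; the graph is recorded as a side effect
loss = F.sum(y * y)    # a scalar objective L = f(x * W + b)
loss.backward()        # backpropagation fills .grad on x, lin.W, and lin.b

print(x.grad)          # dL/dx
print(lin.W.grad)      # dL/dW, used by the optimizer to update W
print(lin.b.grad)      # dL/db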

23

Page 24

Code sample (1/4): Multi-layer perceptron

# (Imports assumed by the slide)
import numpy as np
from chainer import Chain, Variable, optimizers
import chainer.functions as F
import chainer.links as L

class MLP2(Chain):
    def __init__(self):
        super(MLP2, self).__init__(
            l1=L.Linear(784, 100),
            l2=L.Linear(100, 10),
        )

    def __call__(self, x):
        h1 = F.relu(self.l1(x))
        y = self.l2(h1)
        return y

class Classifier(Chain):
    def __init__(self, predictor):
        super(Classifier, self).__init__(predictor=predictor)

    def __call__(self, x, t):
        y = self.predictor(x)
        self.accuracy = F.accuracy(y, t)
        self.loss = F.softmax_cross_entropy(y, t)
        return self.loss, self.accuracy

# Model and optimizer setup
model = Classifier(MLP2())
optimizer = optimizers.SGD()
optimizer.setup(model)

# Training loop with minibatches
for i in range(0, datasize, batchsize):
    x = Variable(x_tr[i:i+batchsize])
    t = Variable(y_tr[i:i+batchsize])
    model.zerograds()
    loss, acc = model(x, t)
    loss.backward()
    optimizer.update()

(Figure: the MLP2 chain, x → Linear l1 (W, bias) → ReLU → h1 → Linear l2 (W, bias) → y.)

24

Page 25

Code sample (2/4): Convolutional neural network

class AlexNet(Chain):
    def __init__(self):
        super(AlexNet, self).__init__(
            conv1=L.Convolution2D(3, 96, 11, stride=4),
            conv2=L.Convolution2D(96, 256, 5, pad=2),
            conv3=L.Convolution2D(256, 384, 3, pad=1),
            conv4=L.Convolution2D(384, 384, 3, pad=1),
            conv5=L.Convolution2D(384, 256, 3, pad=1),
            fc6=L.Linear(9216, 4096),
            fc7=L.Linear(4096, 4096),
            fc8=L.Linear(4096, 1000),
        )

    def __call__(self, x, t):
        h = F.max_pooling_2d(F.relu(
            F.local_response_normalization(self.conv1(x))), 3, stride=2)
        h = F.max_pooling_2d(F.relu(
            F.local_response_normalization(self.conv2(h))), 3, stride=2)
        h = F.relu(self.conv3(h))
        h = F.relu(self.conv4(h))
        h = F.max_pooling_2d(F.relu(self.conv5(h)), 3, stride=2)
        h = F.dropout(F.relu(self.fc6(h)), train=self.train)
        h = F.dropout(F.relu(self.fc7(h)), train=self.train)
        y = self.fc8(h)
        return y

* ImageNet Classification with Deep Convolutional Neural Networks http://www.image-net.org/challenges/LSVRC/2012/supervision.pdf

(Figure: the AlexNet graph, five conv2d layers followed by three linear layers.)

25

Page 26

Code sample (3/4): Recurrent neural network

class SimpleRNN(Chain):
    def __init__(self, n_vocab, n_units):
        super(SimpleRNN, self).__init__(
            embed=L.EmbedID(n_vocab, n_units),
            x2h=L.Linear(n_units, n_units),
            h2h=L.Linear(n_units, n_units),
            h2y=L.Linear(n_units, n_vocab),
        )
        self.h = None

    def __call__(self, x):
        y, h_new = self.fwd_one_step(x, self.h)
        self.h = h_new
        return y

    def fwd_one_step(self, x, h):
        x = F.tanh(self.embed(x))
        if h is None:
            h = F.tanh(self.x2h(x))
        else:
            h = F.tanh(self.x2h(x) + self.h2h(h))
        y = F.softmax(self.h2y(h))
        return y, h

(Figure: the recurrence unrolled over time, x_t → h → y_t for t = 1..4, with truncated BPTT length = 3; x_t is the input word, h the recurrent state, y_t the output.)

# Truncated BPTT (length = 3)
for i in range(0, datasize, batchsize):
    ...
    accum_loss += model(x, t)
    if i % bptt_length == 0:
        model.zerograds()
        accum_loss.backward()
        accum_loss.unchain_backward()
        optimizer.update()

26

Page 27

Code sample (4/4): Deep Networks with Stochastic Depth (a paper published on arXiv, March 30, 2016)

- A variant of Residual Net that skips connections stochastically
  - Outperformed the original Residual Net (ImageNet 2015 winner, MSR)
  - Stochastic skip: h ← ReLU(b * f_i(h) + h), where b ~ Bernoulli(p_i) and p_i is the survival probability

Taken from http://arxiv.org/abs/1603.09382v2, G. Huang et al.

# Mock code in Chainer
class StochasticResNet(Chain):

    def __init__(self, prob, size, ...):
        super(StochasticResNet, self).__init__(
            # Define f[i] the same way as for Residual Net
            ...
        )
        self.size = size
        self.p = prob  # survival probabilities

    def __call__(self, h):
        for i in range(self.size):
            b = numpy.random.binomial(1, self.p[i])
            c = self.f[i](h) + h if b == 1 else h
            h = F.relu(c)
        return h

27

Page 28

Miscellaneous

- Other features
  - Install with pip in one line: $ pip install chainer
  - Multi-GPU support by explicitly selecting the device ID to use
  - Pre-trained Caffe model import from the Model Zoo
  - Model serialization, save & load: HDF5 or NumPy npz (see the sketch below)
- Future directions (not only for Chainer)
  - JIT-like optimization during Define-by-Run
  - Memory consumption reduction (GPU memory is still small)
  - Handling variable-length inputs without minibatches
  - Maximizing performance in multi-node & multi-GPU environments

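A minimal sketch of the device-selection and serialization features (the model and file name are placeholders; the to_gpu call assumes a CUDA-capable GPU):

from chainer import Chain, serializers
import chainer.links as L

model = Chain(l1=L.Linear(784, 10))      # placeholder model

# Multi-GPU support: explicitly select the device ID to use
model.to_gpu(0)                          # move parameters to GPU 0 (requires CUDA/CuPy)
model.to_cpu()                           # ...and back to the CPU

# Model serialization: save & load (HDF5 shown; NumPy npz via save_npz/load_npz)
serializers.save_hdf5('mlp.model', model)
serializers.load_hdf5('mlp.model', model)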

28

Page 29

Agenda

- Deep learning framework basics
- Introduction to Chainer
- CuPy: NumPy-compatible GPU library
- Performance and applications

29

Page 30

CuPy: (partially-)NumPy-compatible GPU library

- Motivation: NumPy + CUDA = CuPy
  - NumPy is the standard library for numerical computation in Python
  - CUDA is the standard API for using GPUs for high performance
  - Unfortunately, NumPy does NOT work with CUDA
- CuPy supports:
  - Fast computation using NVIDIA's cuBLAS and cuDNN
  - Array indexing, slicing, transpose, and reshape
  - Most of the operations/functions in NumPy
    - Chainer v1.7.2 already supports more than 170 functions
  - User-defined functions and kernels
  - All dtypes, broadcasting, memory pools, etc.

30

Page 31

How to use CuPy

- Usage of CuPy: just replace NumPy with CuPy
- Conversion between numpy.ndarray and cupy.ndarray
- Ex. a CPU/GPU-agnostic logsumexp function (usage sketch below)

def logsumexp(x, axis=None):
    xp = cuda.get_array_module(x)  # get CuPy or NumPy
    x_max = x.max(axis)
    exp_sum = xp.exp(x - x_max).sum(axis)
    return x_max + xp.log(exp_sum)

import numpy, cupy
enable_cupy = True
xp = cupy if enable_cupy else numpy

w_c = cupy.asarray(numpy.ones(10))  # cupy.ndarray
w_n = cupy.asnumpy(cupy.ones(10))   # numpy.ndarray
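For instance, the same logsumexp function can run on either device (a sketch; the GPU branch assumes a CUDA-capable device with CuPy available):

import numpy as np
from chainer import cuda

a_cpu = np.random.rand(4, 5).astype(np.float32)
print(logsumexp(a_cpu, axis=1))   # dispatched to NumPy on the CPU

a_gpu = cuda.to_gpu(a_cpu)        # copy the array to the current GPU
print(logsumexp(a_gpu, axis=1))   # same code, dispatched to CuPy on the GPU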

31

Page 32

CuPy implementation: Optimized for performance & NumPy-compatibility

- Uses Cython for cupy.core & cupy.cuda
- Dynamic code generation & compilation
  - CUDA code is generated for the specific tensor dimension & data type
  - On-the-fly compilation by nvcc, with a binary cache (faster after first use)

(Figure: the CuPy stack: the cupy package exposes tensor operations & functions; cupy.core implements ndarray plus ufunc, elementwise, and reduction kernels; cupy.cuda is the CUDA Python wrapper over the CUDA libraries cuBLAS, cuRAND, and cuDNN.)

32

Page 33

CuPy performance on linear algebra: 5 to 25 times faster than NumPy

def test(xp):
    a = xp.arange(1000000).reshape(1000, -1)
    return a.T * 2

test(numpy)
t1 = datetime.datetime.now()
for i in range(1000):
    test(numpy)
t2 = datetime.datetime.now()
print(t2 - t1)

test(cupy)
t1 = datetime.datetime.now()
for i in range(1000):
    test(cupy)
t2 = datetime.datetime.now()
print(t2 - t1)

                     msec   speedup
NumPy                2,929   1.0
CuPy                   585   5.0
CuPy + MemoryPool      123   23.8

[email protected], 32 GB, GeForce GTX 970

33

Page 34

Use CuPy for GPU-based computation

- Three patterns are supported as wrappers
  - ElementwiseKernel: for element-wise computation
  - ReductionKernel: for reduce operations along an axis (see the sketch below)
  - ufunc: universal functions as in NumPy
- Ex. definition of an element-wise function:

squared_diff = cupy.ElementwiseKernel(
    'float32 x, float32 y',    # input
    'float32 z',               # output
    'z = (x - y) * (x - y)',   # operation
    'squared_diff')            # name

- Usage (automatic broadcasting and type checking are supported):

squared_diff(cupy.arange(10), 10)
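For comparison, a ReductionKernel can be written in the same style. The L2-norm example below mirrors the one in the CuPy documentation; treat the exact argument order as an assumption of this sketch:

l2norm = cupy.ReductionKernel(
    'T x',          # input params
    'T y',          # output params
    'x * x',        # map: applied to each element
    'a + b',        # reduce: how mapped values are combined
    'y = sqrt(a)',  # post-reduction map
    '0',            # identity value of the reduction
    'l2norm')       # kernel name

x = cupy.arange(10, dtype='float32').reshape(2, 5)
l2norm(x, axis=1)   # row-wise L2 norms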

34

Page 35

Agenda

- Deep learning framework basics
- Introduction to Chainer
- CuPy: NumPy-compatible GPU library
- Performance and applications

35

Page 36

Public benchmark results (CNN): Chainer shows comparable performance

- Forward computation time is almost the same as TensorFlow's
- Training with backward computation is slower, but this can be offset by having no compilation time while debugging/tuning

(Figure: forward and backward computation times (msec) for AlexNet, GoogLeNet, VGG-A, and OverFeat, comparing Torch, TensorFlow, Chainer, and Caffe (native).)

Taken from https://github.com/soumith/convnet-benchmarks, using cuDNN except for Caffe.

36

Page 37

Chainer can benefit from the latest CUDA libraries: e.g., the Winograd algorithm in cuDNN v5

- 3x3 convolutions are common in CNNs and are now computed with Winograd
- State-of-the-art CNN models (e.g., GoogLeNet, VGG-A) can be accelerated by up to 2.0x at test time (forward only)

(Figure: forward and backward computation times (msec) for AlexNet, GoogLeNet, VGG-A, and OverFeat with cuDNN v4 vs. cuDNN v5.)

Independently measured with a modified version of soumith/convnet-benchmarks. cuDNN v5 can be used in Chainer v1.8.0.

37

Page 38

Algorithm implementation in Chainer: A Neural Algorithm of Artistic Style (Gatys et al., 2015)

- https://github.com/mattya/chainer-gogh

(Figure: content image (cat) + style image = new artistic image.)

Main code: 45 lines.

38

Page 39

Chainer in industry: used in demonstrations & being commercialized

- Many collaborations are ongoing with Chainer-based computer vision, deep reinforcement learning, etc.
- Ex. 1: Chainer-controlled toy cars in the Toyota booth at CES 2016
- Ex. 2: FANUC's highly accurate bin-picking robot at IREX 2015
  - 8 hours of training to reach expert level; commercialization by the end of 2016

http://tinyurl.com/pfn-irex15, http://tinyurl.com/pfn-ces16

39

Page 40

Summary

- Chainer is a Python-based deep learning framework with a dynamic network construction scheme and CuPy
- It is designed for efficient research and prototyping, while keeping comparable performance thanks to NVIDIA GPUs
- Official web: http://chainer.org/
- GitHub: https://github.com/pfnet/chainer

Your contributions will be appreciated, and we are hiring!

40