Joy of Designing Deep Neural Networks
November 28, 2019
https://www.electricbrain.io/ #bigdata2019 @electricbrainio
Imagine

Creating a complex function, but not having to actually program it:

def computeSomething(data):
    ...
    newData = something  # I don't give a shit
    return newData
History

● Avid video game player
● Programming from a young age
At the beginning
Globulation 2
A different direction
● Took a business degree
● Freelanced on the side
● Did a startup (failed)
● Broke; back to programming for cash
Sensibill
● Receipt processing technology
Sensibill

{
  "items": [{
    "name": "T1 Cafvn Lt Frapp",
    "regularPriceTotal": "4.25"
  }],
  "receiptDate": "04/06/2016",
  "receiptNumber": "656335",
  "receiptTime": "07:35 am",
  "store": [{
    "addressLines": [
      "438 Richmond Street West",
      "Toronto, ON M5V 3S6"
    ],
    "name": "Starbucks Coffee Canada #",
    "storeID": "4495"
  }],
  "taxes": [{
    "amount": "0.55",
    "currencyCode": "",
    "percent": "13",
    "ruleID": "HST"
  }],
  "tenders": [{
    "amount": "4.80",
    "currencyCode": "",
    "currentBalance": "14.80",
    "maskedCardNumber": "**** 3616",
    "tenderType": "Sbux Card"
  }],
  "total": {
    "currencyCode": "",
    "grand": "4.80",
    "subtotal": "4.25"
  }
}
Stumbling Around
● Hand-baked heuristic algorithm
● Later turned out to be a variant of the k-nearest-neighbors algorithm
Valuable Experience

● Building and maintaining AI datasets
● Designing annotators
● Building a data-operations team
● Cleaning and transforming data
● Testing different algorithms (like decision trees)
A hint
Deep Neural Network Upgrade

● Upgraded to a recurrent neural network
● Massive improvement in accuracy
It begins

● Became obsessed with neural networks and how they are designed
● Passion for deep learning ignites!
Diving deep

● Learning everything I can about deep neural networks
My dirty secret

● As a programmer, I'm not a big fan of mathematical equations
Not a problem

● Diving deep into the equations is very rarely needed
● The graphs, charts, and anecdotes say everything
Example
● Choosing an activation function for a neural network
● What’s the difference between sigmoid and tanh? They’re both just non-linear equations
Sigmoid
● Outputs between 0 and 1
● For inputs, the action is in the center, close to 0
Tanh
● Outputs between -1 and 1
● For inputs, the action is in the center, close to 0
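The practical difference shows up immediately if you just evaluate the two functions. A quick plain-Python sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# evaluate both activations across a range of inputs:
# sigmoid squashes into (0, 1); tanh squashes into (-1, 1)
for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.3f}  tanh={math.tanh(x):.3f}")
```

Both curves are steepest near 0, which is why the "action" is in the center either way.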
When does it matter?
● Do you want negative numbers or not?
● If you don't care, it doesn't really matter. Use whichever gets the highest accuracy
Example: What do the layers do?
● Neural networks are made of layers
● There are many types of layers
● How do we understand them?
What is an LSTM?
i[t] = σ(W[x->i]x[t] + W[h->i]h[t−1] + W[c->i]c[t−1] + b[1->i])    (1)
f[t] = σ(W[x->f]x[t] + W[h->f]h[t−1] + W[c->f]c[t−1] + b[1->f])    (2)
z[t] = tanh(W[x->c]x[t] + W[h->c]h[t−1] + b[1->c])                 (3)
c[t] = f[t]c[t−1] + i[t]z[t]                                       (4)
o[t] = σ(W[x->o]x[t] + W[h->o]h[t−1] + W[c->o]c[t] + b[1->o])      (5)
h[t] = o[t]tanh(c[t])                                              (6)
(source: https://github.com/Element-Research/rnn)
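For programmers, those six equations transliterate almost line for line into code. Below is a toy scalar sketch of one LSTM step (my own illustration, not any library's implementation): every weight is a single fixed number where a real layer would have a learned matrix.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of the (peephole) LSTM equations above, with scalar state."""
    i = sigmoid(w["xi"] * x + w["hi"] * h_prev + w["ci"] * c_prev + w["bi"])  # (1) input gate
    f = sigmoid(w["xf"] * x + w["hf"] * h_prev + w["cf"] * c_prev + w["bf"])  # (2) forget gate
    z = math.tanh(w["xc"] * x + w["hc"] * h_prev + w["bc"])                   # (3) candidate
    c = f * c_prev + i * z                                                    # (4) new cell state
    o = sigmoid(w["xo"] * x + w["ho"] * h_prev + w["co"] * c + w["bo"])       # (5) output gate
    h = o * math.tanh(c)                                                      # (6) new hidden state
    return h, c

# run a tiny sequence with every weight fixed at 0.5 (purely illustrative)
w = {k: 0.5 for k in ["xi", "hi", "ci", "bi", "xf", "hf", "cf", "bf",
                      "xc", "hc", "bc", "xo", "ho", "co", "bo"]}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_step(x, h, c, w)
```

The gates (sigmoids, 0 to 1) decide how much information to let through; the cell state c is the memory that carries information across steps.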
What is an LSTM?
Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
LSTM Variant: GRU
Image Source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
What is a convolutional network?
output[i][j][k] = bias[k]
    + sum_l sum_{s=1..kW} sum_{t=1..kH} weight[s][t][l][k] * input[dW*(i−1)+s][dH*(j−1)+t][l]
(source: https://github.com/torch/nn/blob/master/doc/convolution.md)
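Stripped of channels, strides, and bias, that formula is just a pair of nested loops sliding a small weight window over the input. A minimal single-channel sketch (illustrative only, with dW = dH = 1 and no bias term):

```python
def conv2d_valid(image, kernel):
    """Naive single-channel 2D convolution: stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # the same small kernel is matched against every position
            out[i][j] = sum(kernel[s][t] * image[i + s][j + t]
                            for s in range(kh) for t in range(kw))
    return out

# a kernel that responds to vertical left-edges, applied to a tiny image
image = [[1, 0, 0],
         [1, 0, 0],
         [1, 0, 0]]
kernel = [[1, -1],
          [1, -1]]
result = conv2d_valid(image, kernel)  # strong response only along the edge
```

The kernel is a fixed pattern matched at every position, which is the intuition behind the pattern-matcher description of convolutions later in the talk.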
What is a convolutional network?
Image Source: http://deeplearning.net/tutorial/lenet.html
Understanding the layers
● Layers can be understood as math equations, but why bother?
● The high-level intuitions are much more useful
● You don’t need to know how the CPU works to write Python code. Same with deep learning.
Examples
● Linear/Dense = Generally combines / processes data
● Convolution = Matches a fixed set of patterns against the data
● Recurrent = Processes data of arbitrary size, like sequences/arrays
Examples
● Attention Layer = Suppresses irrelevant information
● Dropout = Prevents overfitting, spreads the knowledge across the vector
● Batch Norm = Speeds up learning
● Pooling = Combines nearby data together
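The "suppresses irrelevant information" intuition for attention fits in a few lines. This toy version (names and vectors invented for illustration) scores each item against a query, softmaxes the scores, and takes the weighted sum, so items that don't match the query are nearly zeroed out:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, items):
    # score each item by its dot product with the query,
    # then mix the items using the softmaxed scores as weights
    scores = [sum(q * v for q, v in zip(query, item)) for item in items]
    weights = softmax(scores)
    width = len(items[0])
    return [sum(w * item[k] for w, item in zip(weights, items))
            for k in range(width)]

query = [1.0, 0.0]
items = [[5.0, 0.0],   # relevant: aligned with the query
         [0.0, 5.0]]   # irrelevant: orthogonal to the query
mixed = attend(query, items)  # the irrelevant item is almost fully suppressed
```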
Programming Analogies
● Projection - cast<float64[]>(data)
● Dense Layer - function (data) {...}
● Recurrent Layer - for () {...} loop
● Convolutional Layer - RegExp
● Attention Layer - Map & Reduce
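The "recurrent layer is a for loop" analogy can be made literal. A sketch in Python rather than the JS-ish pseudocode above; the update rule here is an arbitrary stand-in for the learned one:

```python
def step(state, x):
    """Stand-in for the learned update inside a recurrent layer."""
    return [0.5 * s + xi for s, xi in zip(state, x)]

def recurrent_layer(sequence, state_size=2):
    # structurally, a recurrent layer is just this:
    # carry a state forward while consuming one element at a time
    state = [0.0] * state_size
    for x in sequence:
        state = step(state, x)
    return state

final = recurrent_layer([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The loop is what lets it handle input of arbitrary length: the state has a fixed size no matter how long the sequence is.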
The architecture
● At the architecture level, the math is hideous! It can only be understood as a graph
● Many intuitions form based on the graphs
● What's possible, what’s useful
Understanding the architecture
Edges and Nodes
● What is an edge?
    ○ Technically, it's a vector, e.g. <1, 5.3, 3.1>
    ○ Intuitively, it's information
● What's a node?
    ○ Technically, a bundle of math equations
    ○ Intuitively, an information-processing unit
The architecture: recurrent translation
Image Source: http://cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf
The architecture: recurrent translation
Image Source: http://cs224d.stanford.edu/lectures/CS224d-Lecture8.pdf
All information about that sentence!
The architecture: recurrent translation
Image Source: https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-2/
The intuition
● Any information can be represented, in the abstract, within a vector
● Vectors can be at various stages of processing, partway between the input and the finished output
The architecture: inception network
Image Source: https://arxiv.org/pdf/1409.4842.pdf
The intuition
● Sometimes it’s useful to process the same information multiple ways
The architecture: residual networks
The intuition
● Too many information-processing units in a row don't work, due to vanishing gradients
● Adding paths around processing modules allows networks to go deeper
● Works similarly to gossip
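A toy demonstration of that intuition (numbers invented for illustration, not a real network): stack ten "processing units" that each shrink their input, with and without an identity path around each one.

```python
def processing_unit(x):
    """Stand-in for a learned layer; here it just shrinks the signal."""
    return [0.1 * v for v in x]

def residual_block(x):
    # the identity path lets the input skip around the processing unit,
    # so the signal survives even when the unit contributes very little
    fx = processing_unit(x)
    return [a + b for a, b in zip(x, fx)]

def deep_stack(x, depth=10):
    for _ in range(depth):
        x = residual_block(x)
    return x

without_skips = [0.1 ** 10]     # ten shrinking units chained directly: signal vanishes
with_skips = deep_stack([1.0])  # the same units with skip paths: signal survives
```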
Going Rogue
● After enough reading, playing, and experimenting, you start to feel comfortable enough to create original designs
Big problem

● Infinite search space of possible architectures, each with a massive to infinite number of hyperparameters
● Intuition, experimentation, and trial and error dominate
Not like programming

● More or less, code either works or it doesn't
● A deep neural network always works at least a little bit. The question is, what makes it better or worse?
The Profound and Annoying Fact

● Julian Konomi: "The deep neural network always learns something"
● The only question is: did it learn something useful? Did it learn better than another design?
Example: Regulation Matching
● Client wants to detect if part of a project plan might violate a law or internal control
● Automated Compliance Review
Options ...
Options ...

Description:
Use a vanilla recurrent network stack

Process one paragraph at a time

Treat each regulation / control as an independent output

Output the probability that the whole paragraph violates each regulation

Hyperparameters:
● # of recurrent layers
● # of dense layers
● Type of recurrent layer
● Size of recurrent layer
● Type of activation function
● Size of dense layers
● Method for input word-vectors
● Use attention?
● Use residual connections?
● more….
Options ...
Options ...

Description:
Use a neural network with an external memory

Process the entire project plan rather than one paragraph at a time

Treat the regulations as a mutually exclusive, n-class output

Output, on a word-by-word basis, whether or not that word is describing a violation

Hyperparameters:
● # of recurrent layers in storage location stack
● # of recurrent layers in storage value stack
● # of recurrent layers in retrieval location stack
● Size of storage location recurrent layers
● Size of storage value recurrent layers
● Size of retrieval location recurrent layers
● # of dense layers at the end
● Type of activation function
● Size of dense layers
● Size of a single vector in NN memory
● Number of locations in NN memory
● Method for input word-vectors
● Use attention?
● Use residual connections?
● more….
Options ...
Options ...

Description:
Use two vanilla recurrent stacks, one for the project plan and one for the regulation

Process the project plan in sentences

Process the text of the regulation through the neural network as well, computing a 'regulation vector'

Use cosine distance between the 'regulation vector' and the 'project vector' to determine relevance

Hyperparameters:
● # of recurrent layers in project plan stack
● # of recurrent layers in regulation stack
● Size of project plan recurrent layers
● Size of regulation stack recurrent layers
● # of dense layers after project plan stack
● # of dense layers after regulation stack
● Type of activation function
● Size of dense layers
● Size of the matching vector
● Cutoff point to determine if project plan fails regulation
● Method for input word-vectors
● Use attention?
● Use residual connections?
● more….
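The matching step in this third option comes down to one standard formula. A sketch with invented stand-in vectors (the real ones would come out of the two recurrent stacks, and the cutoff is itself a hyperparameter):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# hypothetical outputs of the regulation stack and the project plan stack
regulation_vector = [0.9, 0.1, 0.3]
project_vector = [0.8, 0.2, 0.4]

relevance = cosine_similarity(regulation_vector, project_vector)
flagged = relevance > 0.8  # cutoff point: does this plan fail the regulation?
```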
Challenge and Joy
● Far too many techniques to test for any one client
● If only clients had unlimited money…. (**cough cough** Google)
● One must gain an intuition for what's likely to work - can't rely entirely on copying results from NIPS papers
Winging It
● Even the smartest PhDs don't really understand how or why deep neural networks work so well
● Deep learning is half science, half art
● Just dive in and get your hands dirty. Don't bother trying to "understand"
Conclusion
● Anyone with a technical background can learn to apply deep learning, and even to create novel architectures
● Mathematics not required
● Learning these intuitions is fun!
Have a great evening!