CS230 Lecture 2: Deep Learning Intuition
Kian Katanforoosh, Andrew Ng, Younes Bensouda Mourri
Recap

Model = Architecture + Learning Process

Learning loop: Input → Output → Loss → Gradients → Parameters

Things that can change:
- activation function
- optimizer
- hyperparameters
- …
Logistic Regression as a Neural Network

image2vector: flatten the image’s pixel values (255, 231, ..., 94, 142) into a column vector x(i) = (x1(i), x2(i), ..., xn−1(i), xn(i)), dividing each entry by 255 to normalize.

The model computes σ(wᵀx(i) + b) = 0.73.

Since 0.73 > 0.5, we predict “it’s a cat”.
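The pipeline above (flatten, normalize, apply σ(wᵀx + b), threshold at 0.5) can be sketched in NumPy; the image and the parameters below are made-up placeholders, not trained values:

```python
import numpy as np

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 64x64 RGB image with integer pixel values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3))

# image2vector: flatten into a column vector and normalize by 255.
x = image.reshape(-1, 1) / 255.0          # shape (12288, 1)

# Illustrative parameters; in practice w and b are learned.
w = np.zeros((x.shape[0], 1))
b = 0.0

y_hat = float(sigmoid(w.T @ x + b))       # probability that the image is a cat
label = "cat" if y_hat > 0.5 else "not a cat"
```

With zero weights the model is maximally uncertain (ŷ = 0.5), so the 0.5 threshold does not fire.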
Multi-class

image2vector: flatten and normalize the pixels as before to get x(i) = (x1(i), ..., xn(i)).

One logistic unit per class, each computing σ(wᵀx(i) + b) with its own parameters:

Cat? 0.73 > 0.5 → yes
Giraffe? 0.04 < 0.5 → no
Dog? 0.12 < 0.5 → no
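One way to read this slide: each class gets its own independent logistic unit over the same flattened input. A minimal sketch with random, untrained parameters (the class list matches the slide; everything else is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 12288                      # flattened 64x64x3 image
rng = np.random.default_rng(1)
x = rng.random((n, 1))         # stand-in for a normalized image vector

# One (w, b) pair per class: three independent logistic units.
classes = ["cat", "giraffe", "dog"]
W = rng.standard_normal((3, n)) * 0.01
b = np.zeros((3, 1))

probs = sigmoid(W @ x + b)     # shape (3, 1), one probability per class
decisions = {c: bool(p > 0.5) for c, p in zip(classes, probs.ravel())}
```

Because the units are independent sigmoids (not a softmax), several classes could fire at once; the slide treats each question separately.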
Neural Network (Multi-class)

Same setup: image2vector gives x(i) = (x1(i), ..., xn(i)), normalized by 255, and each class gets its own unit computing σ(wᵀx(i) + b). Drawn as a network, this is a single layer of three sigmoid neurons fully connected to the input.
Neural Network (1 hidden layer)

image2vector as before: x(i) = (x1(i), ..., xn(i)), normalized by 255.

The input feeds a hidden layer of three units (a1[1], a2[1], a3[1]), which feeds an output layer unit a1[2]. Output: 0.73 > 0.5, so we predict “Cat”.
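The forward pass of this 1-hidden-layer network can be written in a few lines; the weights below are random and untrained, purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 12288                       # flattened 64x64x3 input
rng = np.random.default_rng(2)
x = rng.random((n, 1))          # stand-in for a normalized image vector

# Layer sizes follow the slide: 3 hidden units, 1 output unit.
W1 = rng.standard_normal((3, n)) * 0.01
b1 = np.zeros((3, 1))
W2 = rng.standard_normal((1, 3)) * 0.01
b2 = np.zeros((1, 1))

a1 = sigmoid(W1 @ x + b1)       # hidden activations a1[1], a2[1], a3[1]
a2 = sigmoid(W2 @ a1 + b2)      # output activation a1[2]
prediction = "cat" if a2.item() > 0.5 else "not a cat"
```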
Deeper network: Encoding

Inputs x1(i), ..., x4(i) feed a first hidden layer (a1[1], ..., a4[1]), then a second hidden layer (a1[2], a2[2], a3[2]), then an output layer a1[3] producing ŷ(i). As information flows through narrower and narrower layers, the network compresses it into a compact representation: a technique called “encoding”.
Let’s build intuition on concrete applications.
Today’s outline

I. Day’n’Night classification
II. Face Recognition
III. Art generation
IV. Keyword Spotting
V. Image Segmentation

We will learn how to:
- Analyze a problem from a deep learning approach
- Choose an architecture
- Choose a loss and a training strategy
Day’n’Night classification (warm-up)

Goal: Given an image, classify it as taken “during the day” (0) or “during the night” (1).

1. Data? 10,000 images. Split? Bias?
2. Input? Resolution? (64, 64, 3)
3. Output? y = 0 or y = 1. Last activation? sigmoid
4. Architecture? A shallow network should do the job pretty well.
5. Loss? Easy: L = −[y log(ŷ) + (1 − y) log(1 − ŷ)]
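The loss above is the binary cross-entropy. A quick sketch showing why it rewards confident correct predictions and heavily punishes confident wrong ones:

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy: L = -[y log(y_hat) + (1 - y) log(1 - y_hat)]."""
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# A confident correct prediction costs little...
low = bce_loss(1, 0.99)
# ...while a confident wrong prediction costs a lot.
high = bce_loss(1, 0.01)
```

The clipping with `eps` is a standard numerical guard, not part of the slide’s formula.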
Face Verification

Goal: A school wants to use Face Verification for validating student IDs in facilities (dining halls, gym, pool, ...).

1. Data? A picture of every student, labelled with their name (e.g. “Bertrand”).
2. Input? Resolution? (412, 412, 3)
3. Output? y = 1 (it’s you) or y = 0 (it’s not you)
Face Verification

4. What architecture?

Simple solution: compare the database image and the input image, computing the distance pixel per pixel; if it is less than a threshold, then y = 1.

Issues:
- Background lighting differences
- A person can wear make-up, grow a beard, ...
- The ID photo can be outdated
Face Verification

4. What architecture? Our solution: encode information about a picture in a vector.

A Deep Network maps the database image to a 128-d encoding, e.g. (0.931, 0.433, 0.331, ..., 0.942, 0.158, 0.039), and the input image to another, e.g. (0.922, 0.343, 0.312, ..., 0.892, 0.142, 0.024).

distance = 0.4; since 0.4 < threshold, y = 1.

We gather all student face encodings in a database. Given a new picture, we compute its distance to the encoding of the card holder.
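The verification step itself is just an L2 distance between two 128-d encodings plus a threshold. In this sketch the encodings are random stand-ins (a real system would get them from the Deep Network) and the threshold value 0.7 is an assumption:

```python
import numpy as np

def verify(enc_database, enc_input, threshold=0.7):
    """Return (y, distance): y = 1 if the L2 distance between the
    two 128-d encodings is below the threshold (value assumed here)."""
    distance = float(np.linalg.norm(enc_database - enc_input))
    return int(distance < threshold), distance

rng = np.random.default_rng(3)
enc_card_holder = rng.random(128)

# Same person: a slightly perturbed encoding should verify.
enc_same = enc_card_holder + 0.01 * rng.standard_normal(128)
y_same, d_same = verify(enc_card_holder, enc_same)

# Different person: an unrelated encoding should not.
enc_other = rng.random(128)
y_other, d_other = verify(enc_card_holder, enc_other)
```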
Face Recognition

5. Loss? Training? We need more data so that our model understands how to encode: use public face datasets.

What we really want: similar pictures get similar encodings; different pictures get different encodings.

So let’s generate triplets (anchor, positive, negative): minimize the encoding distance between anchor and positive, and maximize the encoding distance between anchor and negative.

L = ||Enc(A) − Enc(P)||₂² − ||Enc(A) − Enc(N)||₂²
Recap

Model = Architecture + Learning Process: Input → Output → Loss → Gradients → Parameters.

Given a triplet (anchor, positive, negative), the network produces three 128-d encodings Enc(A), Enc(P), Enc(N), and the loss is

L = ||Enc(A) − Enc(P)||₂² − ||Enc(A) − Enc(N)||₂² + α
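The triplet loss can be computed directly from the three encodings. The margin value α = 0.2 and the clipping at zero (a hinge, commonly used in practice) are assumptions here; the slide shows only the unclipped form:

```python
import numpy as np

def triplet_loss(enc_a, enc_p, enc_n, alpha=0.2):
    """L = ||Enc(A) - Enc(P)||^2 - ||Enc(A) - Enc(N)||^2 + alpha,
    clipped at 0. The margin alpha=0.2 is an assumed value."""
    pos = np.sum((enc_a - enc_p) ** 2)   # anchor-positive distance (squared)
    neg = np.sum((enc_a - enc_n) ** 2)   # anchor-negative distance (squared)
    return max(pos - neg + alpha, 0.0)

rng = np.random.default_rng(4)
anchor = rng.random(128)
positive = anchor + 0.01 * rng.standard_normal(128)  # same person
negative = rng.random(128)                           # different person
loss = triplet_loss(anchor, positive, negative)
```

When the positive is already much closer than the negative, the hinge makes the loss zero, so such easy triplets contribute no gradient.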
Face Recognition

Goal: A school wants to use Face Identification to recognize students in facilities (dining hall, gym, pool, ...): K-Nearest Neighbors on the encodings.

Goal: You want to use Face Clustering to group pictures of the same person on your smartphone: K-Means Algorithm on the encodings.

Maybe we need to detect the faces first?
Art generation (Neural Style Transfer)

Goal: Given a picture, make it look beautiful.

1. Data? Let’s say we have any data we need.
2. Input? A content image and a style image.
3. Output? The generated image.

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
Art generation (Neural Style Transfer)

4. Architecture? We want a model that understands images very well, so we load an existing Deep Network trained for classification on ImageNet, for example. When an image forward-propagates through it, we can get information about the image’s content and its style by inspecting the layers: ContentC from the content image and StyleS from the style image.

5. Loss?

L = ||ContentC − ContentG||₂² + ||StyleS − StyleG||₂²

We are not learning parameters by minimizing L. We are learning an image!
Art generation (Neural Style Transfer): correct approach

Forward-propagate the generated image through the pretrained Deep Network, compute the loss

L = ||ContentC − ContentG||₂² + ||StyleS − StyleG||₂²

and update the pixels using the gradients. After 2000 iterations, the generated image combines the content with the style.
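The key idea, learning an image instead of parameters, can be shown with a toy pixel-space surrogate for the content and style terms. Real neural style transfer compares network activations, not raw pixels; everything below is a deliberately simplified stand-in:

```python
import numpy as np

rng = np.random.default_rng(5)
content = rng.random((8, 8))     # stand-in for ContentC
style = rng.random((8, 8))       # stand-in for StyleS

g = rng.random((8, 8))           # the generated "image" we will learn

lr = 0.1
for _ in range(2000):            # as on the slide: 2000 iterations
    # Toy loss L = ||content - g||^2 + ||style - g||^2
    grad = 2 * (g - content) + 2 * (g - style)
    g -= lr * grad               # update the pixels, not any parameters
```

For this quadratic surrogate, gradient descent drives the image toward the average of the content and style targets; with real activation-based losses, it instead produces an image whose content matches one picture and whose texture statistics match the other.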
Speech recognition: Keyword Spotting

Goal: Given an audio speech clip, detect the word “lion”.

1. Data? Many audio recordings (“words”).
2. Input? The audio clip.
3. Output? y = 1 (there is “lion”) or y = 0 (there isn’t “lion”); or, labelling each timestep,
y = (0,0,0,0,0,0,0,0,0,0,….,0,0,0,0,0,0,1,0,0, ….,0,0,0,0,0,0,0,0,0,0)
Speech recognition: Keyword Spotting

4. What architecture? A network that outputs, for each timestep, the probability that the keyword was just said:

0.12 0.01 0.27 … … … … … … … 0.21 0.92 0.43 … … … … … …

Threshold: 0.6
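Turning the per-timestep probabilities into a detection is a simple thresholding pass. The probability array below is a made-up example in the spirit of the slide’s numbers, not the slide’s actual data:

```python
import numpy as np

# Hypothetical per-timestep model outputs: probability that the
# keyword "lion" has just been said.
probs = np.array([0.12, 0.01, 0.27, 0.05, 0.10, 0.21, 0.92, 0.43])

THRESHOLD = 0.6
detections = np.flatnonzero(probs > THRESHOLD)   # timesteps where "lion" fires
detected = detections.size > 0
```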
Duties for next week

For Thursday 01/25, 9am:

C1M3
• Quiz: Shallow Neural Networks
• Programming Assignment: Planar data classification with one hidden layer

C1M4
• Quiz: Deep Neural Networks
• Programming Assignment: Building a deep neural network - Step by Step
• Programming Assignment: Deep Neural Network Application

Project
• Start the project!
• Fill in the AWS form to get GPU credits

Recommended