CS230: Lecture 2 Deep Learning Intuition
Kian Katanforoosh, Andrew Ng, Younes Bensouda Mourri

Source: cs230.stanford.edu/fall2018/slides_week2.pdf

Page 1: Title slide

Page 2: Recap

Page 3: Model = Architecture + Learning Process

[Diagram: input → model (parameters) → output → loss → gradients, which flow back to update the parameters]

Things that can change:
- Activation function
- Optimizer
- Hyperparameters
- …

Page 4: Logistic Regression as a Neural Network

image2vector: the pixel values of the image (255, 231, …, 94, 142) are flattened into a column vector, and each entry is divided by 255 to normalize it.

The resulting features x₁⁽ⁱ⁾, x₂⁽ⁱ⁾, …, xₙ₋₁⁽ⁱ⁾, xₙ⁽ⁱ⁾ feed a single sigmoid unit:

ŷ = σ(wᵀx⁽ⁱ⁾ + b) = 0.73

0.73 > 0.5 → “it’s a cat”
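A minimal numpy sketch of this forward pass; the image size and weight values are illustrative assumptions, not from the slides:

```python
import numpy as np

def image2vector(image):
    """Flatten an (H, W, 3) image into a column vector, normalized to [0, 1]."""
    return image.reshape(-1, 1) / 255.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative 64x64 RGB image: n = 64 * 64 * 3 = 12288 features.
image = np.random.randint(0, 256, size=(64, 64, 3))
x = image2vector(image)                    # shape (12288, 1)
w = np.random.randn(x.shape[0], 1) * 0.01  # hypothetical trained weights
b = 0.0

y_hat = float(sigmoid(w.T @ x + b))        # prediction in (0, 1), e.g. 0.73
print("it's a cat" if y_hat > 0.5 else "not a cat")
```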

Page 5: Logistic Regression (Multi-class)

The same image2vector input now feeds three independent sigmoid units, one per class:

ŷₖ = σ(wₖᵀx⁽ⁱ⁾ + bₖ)

Cat? 0.73 > 0.5
Giraffe? 0.04 < 0.5
Dog? 0.12 < 0.5

Page 6: Neural Network (Multi-class)

The same three sigmoid units σ(wᵀx⁽ⁱ⁾ + b), now drawn as a single layer of a neural network over the normalized pixel vector.

Page 7: Neural Network (1 hidden layer)

The normalized pixel vector x₁⁽ⁱ⁾, …, xₙ⁽ⁱ⁾ feeds a hidden layer with activations a₁[1], a₂[1], a₃[1], followed by an output layer producing a₁[2] = 0.73.

0.73 > 0.5 → Cat
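A sketch of the corresponding forward pass with one hidden layer; the slide does not specify the hidden activation, so ReLU is assumed here:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """x: (n, 1) normalized pixels; W1: (3, n) hidden layer; W2: (1, 3) output."""
    a1 = np.maximum(0.0, W1 @ x + b1)  # hidden activations a1[1], a2[1], a3[1]
    a2 = sigmoid(W2 @ a1 + b2)         # output activation a1[2]
    return float(a2)

n = 12288                              # 64 * 64 * 3, as in the previous sketch
x = np.random.rand(n, 1)
W1, b1 = np.random.randn(3, n) * 0.01, np.zeros((3, 1))
W2, b2 = np.random.randn(1, 3) * 0.01, np.zeros((1, 1))
print("Cat" if forward(x, W1, b1, W2, b2) > 0.5 else "Not a cat")
```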

Page 8: Deeper network: Encoding

x₁⁽ⁱ⁾, x₂⁽ⁱ⁾, x₃⁽ⁱ⁾, x₄⁽ⁱ⁾ → hidden layer (a₁[1]…a₄[1]) → hidden layer (a₁[2]…a₃[2]) → output layer (a₁[3]) → ŷ⁽ⁱ⁾

As the input flows through successive hidden layers, the network builds progressively more compact intermediate representations; this technique is called “encoding”.

Page 9: Let’s build intuition on concrete applications

Page 10: Today’s outline

I. Day’n’Night classification
II. Face verification and recognition
III. Neural style transfer (Art generation)
IV. Trigger-word detection
V. Shipping a model

We will learn tips and tricks to:
- Analyze a problem from a deep learning approach
- Choose an architecture
- Choose a loss and a training strategy

Page 11: Day’n’Night classification

Goal: Given an image, classify it as taken “during the day” (0) or “during the night” (1).

1. Data? 10,000 images. Split? Bias?
2. Input? An image. Resolution? (64, 64, 3)
3. Output? y = 0 or y = 1. Last activation? sigmoid
4. Architecture? A shallow network should do the job pretty well: this is an easy warm-up.
5. Loss? L = −[y log(ŷ) + (1 − y) log(1 − ŷ)]
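A hedged Keras sketch of such a shallow network with the sigmoid output and the cross-entropy loss above; the hidden-layer size and optimizer are assumptions:

```python
import tensorflow as tf

# Shallow binary classifier for (64, 64, 3) inputs.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(64, activation="relu"),    # hidden size is a guess
    tf.keras.layers.Dense(1, activation="sigmoid"),  # last activation: sigmoid
])

# "binary_crossentropy" is exactly L = -[y log(ŷ) + (1 - y) log(1 - ŷ)].
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, epochs=10, validation_split=0.1)
```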

Page 12: Face Verification

Goal: A school wants to use Face Verification for validating student IDs in facilities (dining halls, gym, pool, …).

1. Data? A picture of every student, labelled with their name (e.g., “Bertrand”).
2. Input? An image. Resolution? (412, 412, 3)
3. Output? y = 1 (it’s you) or y = 0 (it’s not you)

Page 13: Face Verification

Goal: same as above.

4. What architecture?

Simple solution: compute the distance between the database image and the input image, pixel per pixel; if it is less than a threshold, then y = 1.

Issues:
- Background lighting differences
- A person can wear make-up, grow a beard, …
- The ID photo can be outdated

Page 14: Face Verification

4. What architecture? Our solution: encode the information about a picture in a vector.

A deep network maps each image to a 128-d encoding, e.g. (0.931, 0.433, 0.331, …, 0.942, 0.158, 0.039) for the database image and (0.922, 0.343, 0.312, …, 0.892, 0.142, 0.024) for the input image.

distance = 0.4 and 0.4 < threshold → y = 1

We gather the encodings of all student faces in a database. Given a new picture, we compute the distance between its encoding and the encoding of the card holder.
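The decision itself reduces to thresholding the distance between two 128-d encodings; a minimal sketch with placeholder encodings and an assumed threshold:

```python
import numpy as np

def verify(enc_database, enc_input, threshold=0.7):
    """y = 1 if the L2 distance between the two encodings is below the threshold."""
    distance = np.linalg.norm(enc_database - enc_input)
    return 1 if distance < threshold else 0

enc_db = np.random.rand(128)                   # stands in for Enc(database image)
enc_in = enc_db + 0.01 * np.random.randn(128)  # a nearby encoding of the same face
print(verify(enc_db, enc_in))                  # → 1: the distance is small
```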

Page 15: Face Recognition

Goal: same as above.

4. Loss? Training? We need more data so that our model understands how to encode faces: use public face datasets.

What we really want: similar encodings for two pictures of the same person (minimize the encoding distance) and different encodings for pictures of different people (maximize the encoding distance). So let’s generate triplets: (anchor, positive, negative).

Page 16: Recap: Model = Architecture + Learning Process

Input → output → loss → gradients → update parameters, as before, but now the loss compares encodings:

L = ‖Enc(A) − Enc(P)‖₂² − ‖Enc(A) − Enc(N)‖₂²

where Enc(A), Enc(P), Enc(N) are the 128-d encodings of the anchor, positive, and negative images, e.g. Enc(A) = (0.13, 0.42, …, 0.33), Enc(P) = (0.95, 0.45, …, 0.34), Enc(N) = (0.01, 0.54, …, 0.01).
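A numpy sketch of this loss on 128-d encodings (the encodings are random placeholders; this is the slide’s version, without the margin term α added later):

```python
import numpy as np

def triplet_loss(enc_a, enc_p, enc_n):
    """L = ||Enc(A) - Enc(P)||² - ||Enc(A) - Enc(N)||²."""
    pos = np.sum((enc_a - enc_p) ** 2)  # small when anchor and positive match
    neg = np.sum((enc_a - enc_n) ** 2)  # large when anchor and negative differ
    return pos - neg

enc_a, enc_p, enc_n = (np.random.rand(128) for _ in range(3))
print(triplet_loss(enc_a, enc_p, enc_n))  # minimizing this shapes the encoder
```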

Page 17: Face Recognition

Goal: A school wants to use Face Identification to recognize students in facilities (dining hall, gym, pool, …) → run K-Nearest Neighbors over the stored encodings.

Goal: You want to use Face Clustering to group pictures of the same people on your smartphone → run the K-Means algorithm over the encodings.

Maybe we need to detect the faces first?
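For the identification case, a sketch of K-Nearest Neighbors over a database of encodings, assuming scikit-learn; the database below is a random placeholder:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical database: one 128-d encoding per labelled student photo.
encodings = np.random.rand(200, 128)
names = np.random.choice(["bertrand", "alice", "yu"], size=200)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(encodings, names)

new_encoding = np.random.rand(1, 128)  # encoding of a freshly detected face
print(knn.predict(new_encoding))       # majority name among the 5 nearest
```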

Page 18: Art generation (Neural Style Transfer)

Goal: Given a picture, make it look beautiful.

1. Data? Let’s say we have any data we want.
2. Input? A content image and a style image.
3. Output? The generated image.

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015

Page 19: Art generation (Neural Style Transfer)

4. Architecture? We want a model that understands images very well, so we load an existing deep network trained for classification (on ImageNet, for example).

5. Loss? When an image forward propagates through this network, we can get information about its content (Content_C) and its style (Style_S) by inspecting the layers.

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
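One way to “inspect the layers” is to build a sub-model that returns an intermediate activation; a Keras sketch assuming VGG19 (the network used in the Gatys et al. paper) and an illustrative layer choice:

```python
import tensorflow as tf

# Pretrained ImageNet classifier; we only read its activations.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False

# "block4_conv2" is an illustrative choice of content layer.
feature_extractor = tf.keras.Model(
    inputs=vgg.input,
    outputs=vgg.get_layer("block4_conv2").output,
)

image = tf.random.uniform((1, 224, 224, 3))  # stand-in for the content image
content_features = feature_extractor(image)  # "information about its content"
```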

Page 20: [image-only slide]

Page 21: Art generation (Neural Style Transfer): Correct Approach

The pretrained deep network is frozen. We forward propagate the generated image, compute the loss, and update the pixels of the generated image using the gradients. After 2000 iterations, the generated image has the content of the content image and the style of the style image.

L = ‖Content_C − Content_G‖₂² + ‖Style_S − Style_G‖₂²

We are not learning parameters by minimizing L. We are learning an image!

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
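A sketch of this pixel-update loop in TensorFlow; for brevity only the content term of the loss is shown, and all sizes and layer names are illustrative assumptions:

```python
import tensorflow as tf

# Frozen feature extractor (see the previous sketch).
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False
features = tf.keras.Model(vgg.input, vgg.get_layer("block4_conv2").output)

content_image = tf.random.uniform((1, 224, 224, 3))  # placeholder content image
target_content = features(content_image)             # Content_C

generated = tf.Variable(content_image)               # start from the content image
opt = tf.keras.optimizers.Adam(learning_rate=0.02)

for step in range(2000):                             # "after 2000 iterations"
    with tf.GradientTape() as tape:
        # ||Content_C - Content_G||²; the style term ||Style_S - Style_G||²
        # would be added the same way from the style layers.
        loss = tf.reduce_sum((features(generated) - target_content) ** 2)
    grads = tape.gradient(loss, generated)
    opt.apply_gradients([(grads, generated)])        # update pixels, not weights
```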

Page 22: Trigger word detection

Goal: Given a 10-second audio clip, detect the word “activate”.

1. Data? A bunch of 10-second audio clips. Distribution?
2. Input? x = a 10-second audio clip. Resolution? (sample rate)
3. Output? y = 0 or y = 1

Page 23: Let’s have an experiment!

Clip-level labels: y = 1, y = 0, y = 1

Sequential labels (one per time step, with a 1 where the trigger word occurs):
y = 000000000000000000000000000000000000000010000000000
y = 000000000000000000000000000000000000000000000000000
y = 000000000001000000000000000000000000000000000000000

Page 24: Trigger word detection

Goal: Given a 10-second audio clip, detect the word “activate”.

1. Data? A bunch of 10-second audio clips. Distribution?
2. Input? x = a 10-second audio clip (sequential). Resolution? (sample rate)
3. Output? Instead of a single y = 0 or y = 1 per clip, a sequential label y = 00..0000100000..000, or better y = 00..00001..1000..000 (a run of 1s after the word). Last activation? sigmoid at each time step (sequential)
4. Architecture? Sounds like it should be an RNN.
5. Loss? L = −(y log(ŷ) + (1 − y) log(1 − ŷ)), applied at each time step
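A small sketch of building such a sequential label vector, assuming we know the time step where the word ends; the number of steps and the span of 1s are assumptions:

```python
import numpy as np

def make_labels(num_steps, word_end_step, ones_span=50):
    """0 everywhere, with a run of 1s starting where the trigger word ends."""
    y = np.zeros(num_steps, dtype=np.float32)
    y[word_end_step : word_end_step + ones_span] = 1.0
    return y

y = make_labels(num_steps=1375, word_end_step=700)  # sizes are illustrative
print(int(y.sum()))  # 50 positive time steps instead of a single 1
```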

Page 25: Trigger word detection

What is critical to the success of this project?

1. Strategic data collection / labelling process: combine positive words, negative words, and background noise, and automate the labelling (y = 000000..000001..10000..000). Never give up!
2. Architecture search & hyperparameter tuning, with error analysis. Two candidate architectures (see the sketch below):
- Fourier transform of the audio → stacked LSTMs → σ at each time step
- Fourier transform of the audio → CONV + BN → stacked GRU + BN → σ at each time step
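A hedged Keras sketch of the second candidate (CONV + BN, then GRU + BN blocks, then a sigmoid at each time step); every size below is an assumption:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Input: spectrogram from the Fourier transform; (time steps, freq bins) assumed.
inputs = tf.keras.Input(shape=(5511, 101))
x = layers.Conv1D(196, kernel_size=15, strides=4, activation="relu")(inputs)
x = layers.BatchNormalization()(x)                 # CONV + BN
x = layers.GRU(128, return_sequences=True)(x)
x = layers.BatchNormalization()(x)                 # GRU + BN
x = layers.GRU(128, return_sequences=True)(x)
x = layers.BatchNormalization()(x)                 # GRU + BN
outputs = layers.TimeDistributed(
    layers.Dense(1, activation="sigmoid"))(x)      # σ at each time step
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```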

Page 26: Another way of solving the TWD problem?

Page 27: Trigger word detection (other method)

Goal: Given an audio speech, detect the word “lion”.

4. What architecture? Reuse the encoding idea from face verification: a deep network maps an audio snippet to a 128-d encoding, and we compare the encoding of the input audio with the encoding of a reference clip of the word (distance 0.4 < threshold → y = 1). Scanning across the clip gives one score per position (0.12, 0.01, 0.27, …, 0.21, 0.92, 0.43, …); with a threshold of 0.6, y = (0,0,0,0,…,0,0,1,0,0,…,0,0).

Train the encoder with the triplet loss: L = ‖Enc(A) − Enc(P)‖₂² − ‖Enc(A) − Enc(N)‖₂² + α

[For more on query-by-example trigger word detection, see: Guoguo Chen et al.: Query-by-example keyword spotting using long short-term memory networks (2015)]

Page 28: Featured in the magazine “The Most Beautiful Loss Functions 2015”

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi: You Only Look Once: Unified, Real-Time Object Detection

Page 29: App implementation

Page 30: Server-based or on-device?

On-device: the model (architecture + learned parameters) lives on the phone and produces the prediction (e.g., y = 0) locally.

Server-based: the app sends the input to a server hosting the model (architecture + learned parameters), which returns the prediction (e.g., y = 0).

Page 31: Server-based or on-device?

On-device: faster predictions; works offline.

Server-based: the app is light-weight; the app is easy to update.

Page 32: Duties for next week

For Wednesday 10/09, 10am:

C1M3
• Quiz: Shallow Neural Networks
• Programming Assignment: Planar data classification with one hidden layer

C1M4
• Quiz: Deep Neural Networks
• Programming Assignment: Building a deep neural network, step by step
• Programming Assignment: Deep Neural Network Application

Others:
• TA project mentorship (mandatory this week)
• Friday TA section (10/05): focus on git and neural style transfer
• Fill in the AWS form to get GPU credits for your projects