CS230 Lecture 2: Deep Learning Intuition
Kian Katanforoosh, Andrew Ng, Younes Bensouda Mourri
Recap

Model = Architecture + Learning Process

Learning loop: Input → Output → Loss → Gradients → Parameters

Things that can change:
- activation function
- optimizer
- hyperparameters
- …
Logistic Regression as a Neural Network

image2vector: flatten the image’s pixel values (255, 231, ..., 94, 142) into a column vector x(i) = (x1(i), x2(i), ..., xn−1(i), xn(i)), dividing each entry by 255 to normalize.

The model computes σ(wᵀx(i) + b) = 0.73.

Since 0.73 > 0.5, we predict “it’s a cat”.
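The pipeline above (flatten, normalize, apply σ(wᵀx + b), threshold at 0.5) can be sketched in NumPy; the image and the parameters below are made-up placeholders, not trained values:

```python
import numpy as np

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 64x64 RGB image with integer pixel values in [0, 255].
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3))

# image2vector: flatten into a column vector and normalize by 255.
x = image.reshape(-1, 1) / 255.0          # shape (12288, 1)

# Illustrative parameters; in practice w and b are learned.
w = np.zeros((x.shape[0], 1))
b = 0.0

y_hat = float(sigmoid(w.T @ x + b))       # probability that the image is a cat
label = "cat" if y_hat > 0.5 else "not a cat"
```

With zero weights the model is maximally uncertain (ŷ = 0.5), so the 0.5 threshold does not fire.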
Multi-class

image2vector: flatten and normalize the pixels as before to get x(i) = (x1(i), ..., xn(i)).

One logistic unit per class, each computing σ(wᵀx(i) + b) with its own parameters:

Cat? 0.73 > 0.5 → yes
Giraffe? 0.04 < 0.5 → no
Dog? 0.12 < 0.5 → no
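One way to read this slide: each class gets its own independent logistic unit over the same flattened input. A minimal sketch with random, untrained parameters (the class list matches the slide; everything else is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 12288                      # flattened 64x64x3 image
rng = np.random.default_rng(1)
x = rng.random((n, 1))         # stand-in for a normalized image vector

# One (w, b) pair per class: three independent logistic units.
classes = ["cat", "giraffe", "dog"]
W = rng.standard_normal((3, n)) * 0.01
b = np.zeros((3, 1))

probs = sigmoid(W @ x + b)     # shape (3, 1), one probability per class
decisions = {c: bool(p > 0.5) for c, p in zip(classes, probs.ravel())}
```

Because the units are independent sigmoids (not a softmax), several classes could fire at once; the slide treats each question separately.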
Neural Network (Multi-class)

Same setup: image2vector gives x(i) = (x1(i), ..., xn(i)), normalized by 255, and each class gets its own unit computing σ(wᵀx(i) + b). Drawn as a network, this is a single layer of three sigmoid neurons fully connected to the input.
Neural Network (1 hidden layer)

image2vector as before: x(i) = (x1(i), ..., xn(i)), normalized by 255.

The input feeds a hidden layer of three units (a1[1], a2[1], a3[1]), which feeds an output layer unit a1[2]. Output: 0.73 > 0.5, so we predict “Cat”.
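The forward pass of this 1-hidden-layer network can be written in a few lines; the weights below are random and untrained, purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n = 12288                       # flattened 64x64x3 input
rng = np.random.default_rng(2)
x = rng.random((n, 1))          # stand-in for a normalized image vector

# Layer sizes follow the slide: 3 hidden units, 1 output unit.
W1 = rng.standard_normal((3, n)) * 0.01
b1 = np.zeros((3, 1))
W2 = rng.standard_normal((1, 3)) * 0.01
b2 = np.zeros((1, 1))

a1 = sigmoid(W1 @ x + b1)       # hidden activations a1[1], a2[1], a3[1]
a2 = sigmoid(W2 @ a1 + b2)      # output activation a1[2]
prediction = "cat" if a2.item() > 0.5 else "not a cat"
```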
Deeper network: Encoding

Inputs x1(i), ..., x4(i) feed a first hidden layer (a1[1], ..., a4[1]), then a second hidden layer (a1[2], a2[2], a3[2]), then an output layer a1[3] producing ŷ(i). As information flows through narrower and narrower layers, the network compresses it into a compact representation: a technique called “encoding”.
Let’s build intuition on concrete applications.
Today’s outline

I. Day’n’Night classification
II. Face Recognition
III. Art generation
IV. Keyword Spotting
V. Image Segmentation

We will learn how to:
- Analyze a problem from a deep learning approach
- Choose an architecture
- Choose a loss and a training strategy
Day’n’Night classification (warm-up)

Goal: Given an image, classify it as taken “during the day” (0) or “during the night” (1).

1. Data? 10,000 images. Split? Bias?
2. Input? Resolution? (64, 64, 3)
3. Output? y = 0 or y = 1. Last activation? sigmoid
4. Architecture? A shallow network should do the job pretty well.
5. Loss? Easy: L = −[y log(ŷ) + (1 − y) log(1 − ŷ)]
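The loss above is the binary cross-entropy. A quick sketch showing why it rewards confident correct predictions and heavily punishes confident wrong ones:

```python
import numpy as np

def bce_loss(y, y_hat, eps=1e-12):
    """Binary cross-entropy: L = -[y log(y_hat) + (1 - y) log(1 - y_hat)]."""
    y_hat = np.clip(y_hat, eps, 1 - eps)   # avoid log(0)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# A confident correct prediction costs little...
low = bce_loss(1, 0.99)
# ...while a confident wrong prediction costs a lot.
high = bce_loss(1, 0.01)
```

The clipping with `eps` is a standard numerical guard, not part of the slide’s formula.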
Face Verification

Goal: A school wants to use Face Verification for validating student IDs in facilities (dining halls, gym, pool, ...).

1. Data? A picture of every student, labelled with their name (e.g. “Bertrand”).
2. Input? Resolution? (412, 412, 3)
3. Output? y = 1 (it’s you) or y = 0 (it’s not you)
Face Verification

4. What architecture?

Simple solution: compare the database image and the input image, computing the distance pixel per pixel; if it is less than a threshold, then y = 1.

Issues:
- Background lighting differences
- A person can wear make-up, grow a beard, ...
- The ID photo can be outdated
Face Verification

4. What architecture? Our solution: encode information about a picture in a vector.

A Deep Network maps the database image to a 128-d encoding, e.g. (0.931, 0.433, 0.331, ..., 0.942, 0.158, 0.039), and the input image to another, e.g. (0.922, 0.343, 0.312, ..., 0.892, 0.142, 0.024).

distance = 0.4; since 0.4 < threshold, y = 1.

We gather all student face encodings in a database. Given a new picture, we compute its distance to the encoding of the card holder.
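The verification step itself is just an L2 distance between two 128-d encodings plus a threshold. In this sketch the encodings are random stand-ins (a real system would get them from the Deep Network) and the threshold value 0.7 is an assumption:

```python
import numpy as np

def verify(enc_database, enc_input, threshold=0.7):
    """Return (y, distance): y = 1 if the L2 distance between the
    two 128-d encodings is below the threshold (value assumed here)."""
    distance = float(np.linalg.norm(enc_database - enc_input))
    return int(distance < threshold), distance

rng = np.random.default_rng(3)
enc_card_holder = rng.random(128)

# Same person: a slightly perturbed encoding should verify.
enc_same = enc_card_holder + 0.01 * rng.standard_normal(128)
y_same, d_same = verify(enc_card_holder, enc_same)

# Different person: an unrelated encoding should not.
enc_other = rng.random(128)
y_other, d_other = verify(enc_card_holder, enc_other)
```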
Face Recognition

5. Loss? Training? We need more data so that our model understands how to encode: use public face datasets.

What we really want: similar pictures get similar encodings; different pictures get different encodings.

So let’s generate triplets (anchor, positive, negative): minimize the encoding distance between anchor and positive, and maximize the encoding distance between anchor and negative.

L = ||Enc(A) − Enc(P)||₂² − ||Enc(A) − Enc(N)||₂²
Recap

Model = Architecture + Learning Process: Input → Output → Loss → Gradients → Parameters.

Given a triplet (anchor, positive, negative), the network produces three 128-d encodings Enc(A), Enc(P), Enc(N), and the loss is

L = ||Enc(A) − Enc(P)||₂² − ||Enc(A) − Enc(N)||₂² + α
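The triplet loss can be computed directly from the three encodings. The margin value α = 0.2 and the clipping at zero (a hinge, commonly used in practice) are assumptions here; the slide shows only the unclipped form:

```python
import numpy as np

def triplet_loss(enc_a, enc_p, enc_n, alpha=0.2):
    """L = ||Enc(A) - Enc(P)||^2 - ||Enc(A) - Enc(N)||^2 + alpha,
    clipped at 0. The margin alpha=0.2 is an assumed value."""
    pos = np.sum((enc_a - enc_p) ** 2)   # anchor-positive distance (squared)
    neg = np.sum((enc_a - enc_n) ** 2)   # anchor-negative distance (squared)
    return max(pos - neg + alpha, 0.0)

rng = np.random.default_rng(4)
anchor = rng.random(128)
positive = anchor + 0.01 * rng.standard_normal(128)  # same person
negative = rng.random(128)                           # different person
loss = triplet_loss(anchor, positive, negative)
```

When the positive is already much closer than the negative, the hinge makes the loss zero, so such easy triplets contribute no gradient.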
Face Recognition

Goal: A school wants to use Face Identification to recognize students in facilities (dining hall, gym, pool, ...): K-Nearest Neighbors on the encodings.

Goal: You want to use Face Clustering to group pictures of the same person on your smartphone: K-Means Algorithm on the encodings.

Maybe we need to detect the faces first?
Art generation (Neural Style Transfer)

Goal: Given a picture, make it look beautiful.

1. Data? Let’s say we have any data we need.
2. Input? A content image and a style image.
3. Output? The generated image.

Leon A. Gatys, Alexander S. Ecker, Matthias Bethge: A Neural Algorithm of Artistic Style, 2015
Art generation (Neural Style Transfer)

4. Architecture? We want a model that understands images very well, so we load an existing Deep Network trained for classification on ImageNet, for example. When an image forward-propagates through it, we can get information about the image’s content and its style by inspecting the layers: ContentC from the content image and StyleS from the style image.

5. Loss?

L = ||ContentC − ContentG||₂² + ||StyleS − StyleG||₂²

We are not learning parameters by minimizing L. We are learning an image!
Art generation (Neural Style Transfer): correct approach

Forward-propagate the generated image through the pretrained Deep Network, compute the loss

L = ||ContentC − ContentG||₂² + ||StyleS − StyleG||₂²

and update the pixels using the gradients. After 2000 iterations, the generated image combines the content with the style.
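The key idea, learning an image instead of parameters, can be shown with a toy pixel-space surrogate for the content and style terms. Real neural style transfer compares network activations, not raw pixels; everything below is a deliberately simplified stand-in:

```python
import numpy as np

rng = np.random.default_rng(5)
content = rng.random((8, 8))     # stand-in for ContentC
style = rng.random((8, 8))       # stand-in for StyleS

g = rng.random((8, 8))           # the generated "image" we will learn

lr = 0.1
for _ in range(2000):            # as on the slide: 2000 iterations
    # Toy loss L = ||content - g||^2 + ||style - g||^2
    grad = 2 * (g - content) + 2 * (g - style)
    g -= lr * grad               # update the pixels, not any parameters
```

For this quadratic surrogate, gradient descent drives the image toward the average of the content and style targets; with real activation-based losses, it instead produces an image whose content matches one picture and whose texture statistics match the other.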
Speech recognition: Keyword Spotting

Goal: Given an audio speech clip, detect the word “lion”.

1. Data? Many audio recordings (“words”).
2. Input? The audio clip.
3. Output? y = 1 (there is “lion”) or y = 0 (there isn’t “lion”); or, labelling each timestep,
y = (0,0,0,0,0,0,0,0,0,0,….,0,0,0,0,0,0,1,0,0, ….,0,0,0,0,0,0,0,0,0,0)
Speech recognition: Keyword Spotting

4. What architecture? A network that outputs, for each timestep, the probability that the keyword was just said:

0.12 0.01 0.27 … … … … … … … 0.21 0.92 0.43 … … … … … …

Threshold: 0.6
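Turning the per-timestep probabilities into a detection is a simple thresholding pass. The probability array below is a made-up example in the spirit of the slide’s numbers, not the slide’s actual data:

```python
import numpy as np

# Hypothetical per-timestep model outputs: probability that the
# keyword "lion" has just been said.
probs = np.array([0.12, 0.01, 0.27, 0.05, 0.10, 0.21, 0.92, 0.43])

THRESHOLD = 0.6
detections = np.flatnonzero(probs > THRESHOLD)   # timesteps where "lion" fires
detected = detections.size > 0
```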
Duties for next week

For Thursday 01/25, 9am:

C1M3
• Quiz: Shallow Neural Networks
• Programming Assignment: Planar data classification with one hidden layer

C1M4
• Quiz: Deep Neural Networks
• Programming Assignment: Building a deep neural network - Step by Step
• Programming Assignment: Deep Neural Network Application

Project
• Start the project!
• Fill in the AWS form to get GPU credits

Recommended