
Introduction to Convolutional Neural Network with TensorFlow

Page 1

Google confidential | Do not distribute

Introduction to Convolutional Neural Network with TensorFlow

Etsuji Nakai, Cloud Solutions Architect at Google, 2017/03/24 ver1.0

Page 2

Background & Objective

Page 3

● What's happening here?!

Image Classification Transfer Learning with Inception v3

https://codelabs.developers.google.com/codelabs/cpb102-txf-learning

Page 4

● Let's study the underlying mechanism with this (relatively) simple CNN.

Convolutional Neural Network with Two Convolution Layers

[Diagram: raw image → convolution filters → pooling layer → convolution filters → pooling layer → fully-connected layer → dropout layer → softmax function]

Page 5

● Launch Cloud Datalab.
○ https://cloud.google.com/datalab/docs/quickstarts

● Open a new notebook and execute the following command.
○ !git clone https://github.com/enakai00/cnn_introduction.git

● Find the notebook files in the "cnn_introduction" folder.

Jupyter Notebooks

Page 6

Logistic Regression

Page 7

● Training Set:
○ N data points on the (x, y) plane.
○ The data points belong to two categories, labeled as t = 1, 0.

● Problem to solve:
○ Find a straight line that classifies the given data.
○ If there is no perfect answer (one without any misclassification), find an optimal one in some sense.

Sample Problem

Page 8

● Define the straight line as below.
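(The equation image is missing from this transcript. Consistent with the TensorFlow code on page 17, the line is presumably f(x, y) = w0 + w1 x + w2 y = 0, with parameters w = (w0, w1, w2).)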

● We apply the maximum likelihood method to determine the parameter w.

● In other words, we will define a "probability to obtain the training set", and maximize it.

Logistic Regression: Theoretical Ground

[Figure: training data on the (x, y) plane, separated by a straight line]

Page 9

● The probability P(x, y) of t = 1 for a new data point at (x, y) should have the following properties.
○ P(x, y) = 0.5 on the separation line.
○ P(x, y) approaches 1 (or 0) as the point moves away from the separation line.

● This can be satisfied by translating f(x, y) into a probability through the logistic sigmoid function σ(a).
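(The definition image is missing here. The standard form, consistent with tf.sigmoid used on page 17, is σ(a) = 1 / (1 + e^(-a)), giving the probability P(x, y) = σ(f(x, y)).)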

Logistic Sigmoid Function

[Figure: the sigmoid surface over the (x, y) plane; P(x, y) increases in the direction normal to the separation line]

Page 10

● Using the probability defined on the previous page, calculate the probability of reproducing the training set {(x_n, y_n, t_n)}.

○ If t_n = 1, the probability of observing it at (x_n, y_n) is P_n = P(x_n, y_n).
○ If t_n = 0, the probability of observing it at (x_n, y_n) is 1 - P_n.
○ These results can be expressed by a single equation as below. (Remember that x^0 = 1 for any x.)

  P_n^(t_n) (1 - P_n)^(1 - t_n)

● Hence, the total probability of reproducing all data (the likelihood function) is expressed as:

  P = Π_n P_n^(t_n) (1 - P_n)^(1 - t_n)

Likelihood Function of Logistic Regression

Page 11

● Instead of maximizing the likelihood function, we generally minimize the following loss function, which avoids numerical underflow.
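(The formula image is missing here. Taking the negative logarithm of the likelihood P from the previous page gives E(w) = -Σ_n { t_n log P_n + (1 - t_n) log(1 - P_n) }, which matches the TensorFlow code on page 19.)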

Loss Function

Page 12

Gradient Descent Optimization

● By incrementally modifying the parameters in the direction opposite to the gradient vector, we may eventually reach the minimum.
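(The update-rule image is missing here; the standard gradient descent step is w_new = w - ε ∇E(w), where ε is the learning rate introduced on the next page.)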

Page 13

Learning Rate and Convergence Issue

● Learning rate ε decides the "step size" of each modification.

● The convergence of the optimization depends on the learning rate value.

[Figure: learning-rate behavior: one optimization path converges, another diverges]

http://sebastianruder.com/optimizing-gradient-descent/

Page 14

TensorFlow Programming

Page 15

Programming Style of TensorFlow

● All data is represented by "multidimensional lists".
○ In many cases, you can use a two-dimensional list, which is equivalent to a matrix. So by expressing models (functions) in matrix form, you can translate them into TensorFlow code.

● As a concrete example, we will write the following model (functions) in TensorFlow code.

○ Pay attention to distinguishing the following three objects.

■ Placeholder: a variable to store training data.
■ Variable: parameters to be adjusted by the training algorithm.
■ Functions constructed from Placeholders and Variables.

Page 16

Programming Style of TensorFlow

● The linear function representing the straight line can be expressed using matrices as below. (The equation images are missing from this transcript; the forms below are reconstructed from the code on page 17.)

  f(x, y) = x w + w0,  where x = (x, y) and w = (w1, w2)^T

● x should in general be treated as a Placeholder which holds multiple data points simultaneously. So let x_n = (x_n, y_n) represent the n-th data point, and using the matrix X which holds all the data points, you can write down the following matrix equation:

  F = X w + w0

○ Here the n-th element of F corresponds to the value of f for the n-th data point, and the "broadcast rule" is applied to the last part "+ w0". This means adding w0 to all matrix elements.
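A minimal sketch of this broadcast behavior, using NumPy for illustration (not part of the original slides; the data values are hypothetical):

import numpy as np

# Three data points (x_n, y_n) stacked into the matrix X.
X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
# Weights (w_1, w_2) as a column vector, plus the scalar w_0.
w = np.array([[0.5],
              [-0.5]])
w0 = 1.0

# w0 is broadcast, i.e. added to every element of Xw.
F = np.dot(X, w) + w0
print(F)   # [[0.5], [0.5], [0.5]]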

Page 17

Programming Style of TensorFlow

● Finally, by applying the sigmoid function σ to each element of F, the probability for each data point is calculated: P = σ(F).

○ The "broadcast rule" is applied to σ(F), meaning σ is applied to each element of F.

● These relationships are expressed in TensorFlow code as below.

x = tf.placeholder(tf.float32, [None, 2])
w = tf.Variable(tf.zeros([2, 1]))
w0 = tf.Variable(tf.zeros([1]))
f = tf.matmul(x, w) + w0
p = tf.sigmoid(f)

Page 18

Programming Style of TensorFlow

● This explains the relationship between the matrix calculations and the TensorFlow code.

x = tf.placeholder(tf.float32, [None, 2])
w = tf.Variable(tf.zeros([2, 1]))
w0 = tf.Variable(tf.zeros([1]))
f = tf.matmul(x, w) + w0
p = tf.sigmoid(f)

○ The Placeholder x stores the training data. Its matrix size is [None, 2]: the row size should be None so that it can hold an arbitrary number of data points.
○ The Variables w and w0 represent the parameters to be trained (initialized to 0 here).
○ The "broadcast rule" (similar to NumPy arrays) is applied to the calculations.

Page 19

Error Function and Training Algorithm

● To train the model (i.e. to adjust the parameters), we need to define the error function and the training algorithm.

t = tf.placeholder(tf.float32, [None, 1])

loss = -tf.reduce_sum(t*tf.log(p) + (1-t)*tf.log(1-p))

train_step = tf.train.AdamOptimizer().minimize(loss)

○ tf.reduce_sum adds up all matrix elements.
○ The Adam optimizer is used to minimize "loss".

Page 20

Calculations inside Session

● The TensorFlow code we have prepared so far just defines functions and their relations, without performing any calculation. We prepare a "Session", and the actual calculations are executed inside that session.

[Diagram: Placeholders and Variables feed the calculations, which are executed inside the Session]

Page 21

Using Session to Train the Model

● Create a new session and initialize Variables inside the session.

● By evaluating the training algorithm inside the session, the Variables are adjusted with the gradient descent method.
○ "feed_dict" specifies the data to be stored in the Placeholders.
○ When functions are evaluated in the session, the corresponding values are calculated using the current values of the Variables.

sess = tf.Session()
sess.run(tf.initialize_all_variables())

i = 0
for _ in range(20000):
    i += 1
    sess.run(train_step, feed_dict={x:train_x, t:train_t})
    if i % 2000 == 0:
        loss_val, acc_val = sess.run(
            [loss, accuracy], feed_dict={x:train_x, t:train_t})
        print ('Step: %d, Loss: %f, Accuracy: %f' % (i, loss_val, acc_val))

○ The gradient descent method is applied using the training data specified by feed_dict.
○ "loss" and "accuracy" are calculated using the current values of the Variables.

Page 22

Exercise

● Run through the Notebook:
○ No.1 Tensorflow Programming

Page 23

Linear Multicategory Classifier

Page 24

● Logistic regression gives the "probability of being classified as t = 1" for each data point in the training set.

● The parameters are adjusted to minimize the error function (the loss function defined on page 11):

  E(w) = -Σ_n { t_n log P_n + (1 - t_n) log(1 - P_n) }

Recap: Logistic Regression

[Figure: P(x, y) increases in the direction normal to the separation line]

Page 25

● Drawing the three-dimensional graph of z = f(x, y), we can see that the "tilted plate" divides the (x, y) plane into two classes.

● The logistic function σ translates the height on the plate into the probability of t = 1.

Graphical Interpretation of Logistic Regression

[Figure: the tilted plane z = f(x, y), translated into a probability by the logistic function σ]

Page 26

● How can we divide the plane into three classes (instead of two)?

● We can define three linear functions and classify a point based on "which of them has the maximum value at that point."

○ This is equivalent to dividing the plane with three tilted plates.
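(The function images are missing here. Consistent with the matrix form on page 30, the three functions are presumably f_i(x, y) = w_i0 + w_i1 x + w_i2 y for i = 1, 2, 3, and a point is assigned to the class whose f_i takes the largest value at that point.)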

Building Multicategory Linear Classifier

Page 27

● We can define the probability that a point (x, y) belongs to the i-th class with the following softmax function:

  P_i(x, y) = e^(f_i(x, y)) / Σ_j e^(f_j(x, y))

● This translates the magnitude of f_i(x, y) into a probability satisfying the following (reasonable) conditions:
○ 0 ≤ P_i ≤ 1
○ Σ_i P_i = 1
○ P_i > P_j whenever f_i(x, y) > f_j(x, y)

Translation to Probability with Softmax function

One-dimensional example of the "softmax translation."
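(A hypothetical worked example, not from the original slides: for f = (1, 2, 3), the softmax gives P ≈ (0.09, 0.24, 0.67); the largest f_i receives the largest probability, and the values sum to 1.)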

Page 28

Image Classification with Linear Multicategory Classifier

Page 29

● A grayscale image with 28x28 pixels can be represented as a 784-dimensional vector, which is a collection of 784 float numbers.
○ In other words, it corresponds to a single point in a 784-dimensional space!

Images as Points in High Dimensional Space

● When we spread a bunch of images into this 784-dimensional space, similar images may come together to form clusters of images.
○ If this assumption is correct, we can classify the images by dividing the 784-dimensional space with the softmax function.
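A minimal sketch of this flattening, using NumPy for illustration (not part of the original slides):

import numpy as np

# A 28x28 grayscale image as a two-dimensional array.
img = np.zeros((28, 28), dtype=np.float32)

# The same image as a point in 784-dimensional space.
vec = img.reshape(784)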

Page 30

Matrix Representation

● To divide an M-dimensional space into K classes, we prepare K linear functions. (The equation images are missing here; the forms below are reconstructed from the code on page 33.)

  f_k(x_1, ..., x_M) = w_1k x_1 + ... + w_Mk x_M + w_0k   (k = 1, ..., K)

● Defining the n-th image data as x_n = (x_n1, ..., x_nM), the values of the linear functions for all data can be represented as below. (The broadcast rule is applied to the "+ w" operation.)

  F = X W + w

  Here X is the N x M matrix which stacks all the data points, W is the M x K matrix of weight parameters, and w = (w_01, ..., w_0K).

Page 31

Matrix Representation

● Here is the summary of the matrix representation.

[Matrix diagram: F = X W + w, with the broadcast rule applied to w]

Page 32

Matrix Representation

● Finally, we can translate the result into probabilities by applying the softmax function. The probability that the n-th data point is classified into the k-th category is:

  P_nk = e^(F_nk) / Σ_k' e^(F_nk')

● TensorFlow has the "tf.nn.softmax" function, which calculates these probabilities directly from the matrix F.

Page 33

TensorFlow Codes of the Model

● The matrix representations we built so far can be written in TensorFlow code as below.
○ Pay attention to the difference between the Placeholder and the Variables.

x = tf.placeholder(tf.float32, [None, 784])
w = tf.Variable(tf.zeros([784, 10]))
w0 = tf.Variable(tf.zeros([10]))
f = tf.matmul(x, w) + w0
p = tf.nn.softmax(f)

Page 34

Loss Function

● The class label of the n-th data point is given by a vector t_n in the one-of-K representation: it has 1 only for the k-th element, meaning its class is k.

● Since the probability of having the correct answer for this data point is P_nk, the probability of having correct answers for all data is calculated as below. (The product over k' picks out only k' = k, the class of the n-th data point, because t_nk' = 1 only there.)

  P = Π_n Π_k' P_nk'^(t_nk')

● We define the loss function as below; minimizing the loss function is then equivalent to maximizing the probability P.

  E = -log P = -Σ_n Σ_k' t_nk' log P_nk'

Page 35

TensorFlow Codes for Loss Function

● The loss function and the optimization algorithm can be written in TensorFlow code as below.

● The code after them calculates the accuracy of the model.
○ "correct_prediction" is a list of bool values indicating "correct or incorrect."
○ "accuracy" is calculated by taking the mean of those bool values (1 for correct, 0 for incorrect).

t = tf.placeholder(tf.float32, [None, 10])

loss = -tf.reduce_sum(t * tf.log(p))

train_step = tf.train.AdamOptimizer().minimize(loss)

correct_prediction = tf.equal(tf.argmax(p, 1), tf.argmax(t, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
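(A hypothetical worked example, not from the original slides: if a row of p is (0.1, 0.7, 0.2), tf.argmax(p, 1) returns 1 for that row; if the corresponding row of t is (0, 1, 0), tf.argmax(t, 1) also returns 1, so the prediction counts as correct and contributes 1.0 to the mean.)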

Page 36

Comparing Predictions and Class Labels

● The following shows how we calculate the correctness of predictions.

[Diagram: the predicted class is the index of the maximum probability in each row of p; it is compared with the correct class indicated by the corresponding row of the label matrix t]

Page 37

Mini-Batch Optimization of Parameters

● We repeat the optimization operations using 100 samples at a time.

i = 0
for _ in range(2000):
    i += 1
    batch_xs, batch_ts = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, t: batch_ts})
    if i % 100 == 0:
        loss_val, acc_val = sess.run([loss, accuracy],
            feed_dict={x:mnist.test.images, t: mnist.test.labels})
        print ('Step: %d, Loss: %f, Accuracy: %f' % (i, loss_val, acc_val))

[Diagram: successive batches of 100 samples (batch_xs, batch_ts) are drawn from the image and label data, and an optimization step runs on each batch]

Page 38

Mini-Batch Optimization of Parameters

● Mini-batch optimization has the following advantages.
○ It reduces the memory usage.
○ Its random movement helps avoid being trapped in local minima.

[Figure: the simple gradient descent method using all training data at once can be trapped in a local minimum, while stochastic gradient descent with the mini-batch method can escape it and reach the true minimum]

Page 39

Exercise

[Figure: examples of correct and incorrect classification results]

● Run through the Notebook:
○ No.2 Softmax classifier for MNIST

Page 40

Basic Strategy of Convolutional Network

Page 41

● The linear categorizer assumes that samples can be classified with flat planes.

● This assumption cannot be perfect, and it fails to capture the global (topological) features of handwritten digits.

The limitation of Linear Categorizer

[Figure: correct and incorrect examples from the results of the linear classifier]

Page 42

● The convolutional neural network (CNN) uses image filters to extract features from images and applies hidden layers to classify them.

The Overview of Convolutional Neural Network

[Diagram: the same two-convolution-layer architecture as page 4: raw image → convolution filters → pooling layer → convolution filters → pooling layer → fully-connected layer → dropout layer → softmax function]

Page 43

● Convolutional filters are ... just image filters of the kind you sometimes apply in Photoshop!

Examples of Convolutional Filters

[Figure: a filter that blurs images and a filter that extracts vertical edges]

Page 44

● To classify the following training set, what would be the best filters?

Question

Page 45

● Apply image filters to capture various features of the image.
○ For example, if we want to classify the three characters "+", "-", "|", we can apply filters that extract vertical and horizontal edges as below.

● Apply the pooling layer to (deliberately) reduce the image resolution.
○ The information necessary for classification is just the density of the filtered image.

How Convolutional Neural Network Works

Page 46

# Imports added for completeness (assumed from the notebook context).
import numpy as np
import tensorflow as tf

def edge_filter():
    # Filter that responds to vertical edges.
    filter0 = np.array(
        [[ 2, 1, 0,-1,-2],
         [ 3, 2, 0,-2,-3],
         [ 4, 3, 0,-3,-4],
         [ 3, 2, 0,-2,-3],
         [ 2, 1, 0,-1,-2]]) / 23.0
    # Filter that responds to horizontal edges.
    filter1 = np.array(
        [[ 2, 3, 4, 3, 2],
         [ 1, 2, 3, 2, 1],
         [ 0, 0, 0, 0, 0],
         [-1,-2,-3,-2,-1],
         [-2,-3,-4,-3,-2]]) / 23.0
    # Pack the two 5x5 single-channel filters into shape [5, 5, 1, 2].
    filter_array = np.zeros([5,5,1,2])
    filter_array[:,:,0,0] = filter0
    filter_array[:,:,0,1] = filter1
    return tf.constant(filter_array, dtype=tf.float32)

TensorFlow code to apply the filters

x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1,28,28,1])

W_conv = edge_filter()
h_conv = tf.abs(tf.nn.conv2d(x_image, W_conv,
                             strides=[1,1,1,1], padding='SAME'))
h_conv_cutoff = tf.nn.relu(h_conv-0.2)

h_pool = tf.nn.max_pool(h_conv_cutoff, ksize=[1,2,2,1],
                        strides=[1,2,2,1], padding='SAME')

Page 47

● In this model, we use pre-defined (fixed) filters to capture vertical and horizontal edges.

● Question: How can we choose appropriate filters for more general images?

Simple Model to Classify "+", "-", "|".

[Diagram: input image → convolution filter → pooling layer → softmax]

Page 48

Exercise

● Run through the Notebook:
○ No.3 Convolutional Filter Example
○ No.4 Toy model with static filters

Page 49

Dynamic Optimization of Convolution Filters

Page 50

● In the convolutional neural network, we define the filters as a Variable; the optimization algorithm then tries to adjust the filter values to achieve better predictions.
○ The following code applies 16 filters to images with 28x28 pixels (= 784-dimensional vectors).

Dynamic Optimization of Filters

num_filters = 16

# Placeholder to store images
x = tf.placeholder(tf.float32, [None, 784])
x_image = tf.reshape(x, [-1,28,28,1])

# Define filters as Variables
W_conv = tf.Variable(tf.truncated_normal([5,5,1,num_filters], stddev=0.1))

# Apply filters and the pooling layer
h_conv = tf.nn.conv2d(x_image, W_conv, strides=[1,1,1,1], padding='SAME')
h_pool = tf.nn.max_pool(h_conv, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')

Page 51

Exercise

● Run through the Notebook:
○ No.5 Single layer CNN for MNIST
○ No.6 Single layer CNN for MNIST result

(Since the filtered images contain negative pixel values, the backgrounds of the images are not necessarily white.)

Page 52

● By adding more filter (and pooling) layers, we can build a multi-layer CNN.
○ Filters in different layers are believed to recognize different kinds of features, but the details are still under study.
○ The dropout layer is used to avoid overfitting by randomly cutting part of the connections during training.

Multi-layer Convolutional Neural Network

[Diagram: the multi-layer CNN architecture (the same diagram as page 4)]

Page 53

● Run through the Notebook:
○ No.7 CNN Handwriting Recognizer

Exercise

[Screenshots: the images after passing through the second filter layer, and the prediction of a handwritten number]

Page 54

Neural Network Basics

Page 55

Single Layer Neural Network

● This is an example of a single layer neural network.
○ The two nodes in the hidden layer transform the value of a linear function with the activation function.
○ There are several choices for the activation function. We will use the hyperbolic tangent in the following examples.

[Figure: candidate activation functions (logistic sigmoid, hyperbolic tangent, ReLU), and a network diagram with a hidden layer and an output layer]

Page 56

Single Layer Neural Network

● Since the output of the hyperbolic tangent changes quickly from -1 to 1, the outputs of the hidden layer effectively split the input space into discrete regions with straight lines.
○ In this example, the (x, y) plane is split into 4 regions.

[Figure: the (x, y) plane split into four regions ① ② ③ ④]

Page 57

Single Layer Neural Network

● Since the logistic sigmoid in the output node can classify the plane with a straight line, this single layer network can classify the 4 regions into two classes as below.

[Figure: the four regions grouped into two classes]

Page 58

Limitation of Single Layer Network

● On the other hand, this neural network cannot classify data in the following pattern.
○ How can you extend the network to cope with this data?

[Figure: a pattern of regions that cannot be classified with a straight line]

Page 59

Neural Network as Logical Units

● A single node (consisting of a linear function and an activation function) works as a logical unit for AND or OR as below.
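(A hypothetical worked example, not from the original slides: with inputs x1, x2 ∈ {0, 1} and a steep sigmoid-like activation, z = σ(x1 + x2 - 1.5) fires only when both inputs are 1, acting as AND, while z = σ(x1 + x2 - 0.5) fires when at least one input is 1, acting as OR.)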

Page 60

Neural Network as Logical Units

● Since the previous pattern is equivalent to XOR, we can combine the AND and OR units to make an XOR unit. As a result, the following "enhanced output node" can classify the previous pattern.

[Diagram: hidden nodes acting as OR and AND units are combined by the enhanced output node into an XOR unit]
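(In terms of logical operations, this is the standard construction XOR(a, b) = AND(OR(a, b), NOT(AND(a, b))): an output node that weights the OR unit positively and the AND unit negatively, with a suitable bias, reproduces XOR.)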

Page 61

Neural Network as Logical Units

● Combining the hidden layer and the "enhanced output unit" results in the following 2-layer neural network.
○ The first hidden layer extracts features as a combination of binary variables, and the second hidden layer plus the output node classify them as an XOR logical unit.

[Diagram: the first hidden layer extracts features; the second hidden layer and the output node classify them with an XOR logical unit]

Page 62

Exercise

● You can see the actual result on the Neural Network Playground.

○ http://goo.gl/VIvOaQ

Page 63

Thank you!
