Face Recognition

Presentación de PowerPointcdn.bdigital.org/PDF/BDC18/Workshop_Tensorflow.pdf# cat Face_Recognition • Pooling Layer This layer reduces the dimensionality of each feature map but


Face Recognition


# whoami Rubén Martínez Sánchez

• Twitter: @eldarsilver

• Computer Engineer (Universidad Politécnica Madrid)

• Security Researcher (Pentester)

• Certified Ethical Hacker (CEH)

• Member of MundoHacker (TV Show)

• Master Data Science Datahack

• Cloudera Developer Training for Apache Spark

• Cloudera Developer Training for Apache Hadoop


# ls() Agenda

• Face Recognition

• How to train your Network: Triplet Loss

• How to create your Dataset:

• Types of triplets

• Offline vs Online Triplet Mining

• Online Triplet Mining wins

• Hardening: Spatial Transformer Network

• Demo

• New paths: Spiking Neural Networks


# cat Face_Recognition

• Introduction

Face Recognition Pipeline: Image → HC (Haar Cascades) → Face → CNN + Transfer Learning + STN → Identity

Easily expandable: Embeddings.

• Haar Cascades


# cat Face_Recognition

• Convolutional Layer

A kernel slides over the image (N×N pixels). At each position, every pixel of the covered F×F region is multiplied element-wise by the corresponding kernel weight, and the products are summed into a single output value.

Stride (S) (1 by default).

Number of kernels.

Kernel size (F).

Zero Padding (P) (add zeros around the image).

Feature Map size = (N - F + 2P)/S + 1
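As a quick check of the feature map formula, here is a minimal sketch in plain Python. The function name `feature_map_size` and the choice to raise an error when the stride does not divide evenly are assumptions made for illustration; they do not come from the slides.

```python
def feature_map_size(n, f, p=0, s=1):
    """Side length of the feature map: (N - F + 2P) / S + 1.

    n: input side length, f: kernel side length,
    p: zero padding on each border, s: stride.
    """
    span = n - f + 2 * p
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    if span % s != 0:
        # Assumption: treat a non-integer result as a configuration error.
        raise ValueError("stride does not tile the padded input evenly")
    return span // s + 1


# 32x32 image, 5x5 kernel, no padding, stride 1.
print(feature_map_size(32, 5))            # 28
# Padding of 2 keeps the spatial size unchanged.
print(feature_map_size(32, 5, p=2))       # 32
# 7x7 input, 3x3 kernel, padding 1, stride 2.
print(feature_map_size(7, 3, p=1, s=2))   # 4
```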


# cat Face_Recognition

• Pooling Layer

This layer reduces the dimensionality of each feature map but retains the most important information.

Types: MaxPooling, Avg, Sum, etc.

It provides translation invariance depending on the size of the Receptive Field.

In a convolutional neural network, each unit in a hidden layer is connected only to a small number of units in the previous layer. This region is called the Receptive Field: the region of the input space that affects a particular unit of the network.

$$R_k = R_{k-1} + (f_k - 1) \prod_{i=1}^{k-1} s_i, \qquad R_0 = 1$$

where $R_k$ is the receptive field at layer k, $f_k$ is the filter size in layer k, and $s_i$ is the stride in layer i.

With a small receptive field, the effects of a pooling operator are only felt towards deeper layers.
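The receptive field recurrence (R_0 = 1, each layer adding (f_k - 1) times the product of the strides of the previous layers) can be sketched in a few lines. The function name and the example layer stack are hypothetical and chosen only to illustrate the arithmetic.

```python
def receptive_field(layers):
    """Receptive field after a stack of layers.

    layers: list of (filter_size, stride) tuples, first layer first.
    """
    r = 1       # R_0 = 1: a single input pixel
    jump = 1    # product of the strides of all previous layers
    for f, s in layers:
        r += (f - 1) * jump
        jump *= s
    return r


# Hypothetical stack: conv 3x3 (stride 1), max-pool 2x2 (stride 2), conv 3x3 (stride 1).
layers = [(3, 1), (2, 2), (3, 1)]
print(receptive_field(layers))  # 8
```

In this example the convolution placed after the pooling layer adds 4 pixels to the receptive field instead of 2, because the pooling stride doubles the step between neighbouring units.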


# cat Triplet_Loss

• Definition

Diagram: the anchor, positive, and negative images each pass through the same CNN (shared weights) to produce three embeddings, which feed the Triplet Loss.

The loss of a triplet (a, p, n):

$$L = \max\big(d(a, p) - d(a, n) + \text{margin},\ 0\big)$$
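A minimal sketch of the loss for a single triplet follows. The slides do not say which distance d is used, so Euclidean distance is an assumption here, as are the function names and the toy 2-D embeddings.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embeddings (assumed choice of d)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """L = max(d(a, p) - d(a, n) + margin, 0)."""
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


a = [0.0, 0.0]
p = [0.0, 1.0]   # d(a, p) = 1
n = [3.0, 4.0]   # d(a, n) = 5

# The negative is far away, so the triplet contributes no loss.
print(triplet_loss(a, p, n, margin=0.5))   # 0.0
# Swapping the roles puts the "positive" farther away than the "negative".
print(triplet_loss(a, n, p, margin=0.5))   # 4.5
```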


# cat How_to_create_your_Dataset

• Types of triplets

Easy triplets: $d(a, p) + \text{margin} < d(a, n)$

Semihard triplets: $d(a, p) < d(a, n) < d(a, p) + \text{margin}$

Hard triplets: $d(a, n) < d(a, p)$
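The three categories can be expressed as a small classifier over the two distances. This is only a sketch: the function name is hypothetical, and the slides use strict inequalities, so the handling of exact ties (grouped with semihard or easy below) is an assumption.

```python
def triplet_type(d_ap, d_an, margin):
    """Classify a triplet from d(a, p), d(a, n) and the margin."""
    if d_an < d_ap:
        return "hard"        # negative is closer to the anchor than the positive
    if d_an < d_ap + margin:
        return "semihard"    # correct ordering, but still inside the margin
    return "easy"            # loss is already 0


print(triplet_type(1.0, 0.5, 0.2))   # hard
print(triplet_type(1.0, 1.1, 0.2))   # semihard
print(triplet_type(1.0, 2.0, 0.2))   # easy
```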


# cat How_to_create_your_Dataset

• Offline Triplet Mining

At the start of each epoch, compute all the embeddings and build a list of hard or semihard triplets.

Create batches of N triplets:

Compute 3N embeddings + loss of these N triplets + backpropagation.

Not efficient.

• Online Triplet Mining

For each batch of N inputs:

Compute N embeddings → up to $N^3$ triplets.

More efficient.


# cat Online_Triplet_Mining_Wins

• Strategies

Batch of size N.

Batch all.

Batch hard.

• Batch all

Select all the valid triplets.

Average the loss over the hard and semihard triplets only, which avoids easy triplets (their loss is 0).
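To make the batch-all strategy concrete, the sketch below enumerates every valid triplet in a batch with plain Python loops and averages the loss over the triplets whose loss is positive, so easy triplets do not dilute the mean. Euclidean distance, the function names, and the toy 1-D embeddings are assumptions for illustration; a real implementation would vectorise this with tensors.

```python
import math
from itertools import permutations


def dist(u, v):
    """Euclidean distance (assumed choice of d)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def batch_all_loss(embeddings, labels, margin):
    """Return (mean loss over positive-loss triplets, #valid, #positive-loss)."""
    n = len(labels)
    losses = []
    for a, p in permutations(range(n), 2):      # a != p by construction
        if labels[a] != labels[p]:
            continue                            # p must share the anchor's label
        for k in range(n):
            if labels[k] == labels[a]:
                continue                        # the negative needs a different label
            d_ap = dist(embeddings[a], embeddings[p])
            d_an = dist(embeddings[a], embeddings[k])
            losses.append(max(d_ap - d_an + margin, 0.0))
    positive = [value for value in losses if value > 0.0]
    mean = sum(positive) / len(positive) if positive else 0.0
    return mean, len(losses), len(positive)


embeddings = [[0.0], [1.0], [3.0], [10.0]]
labels = [0, 0, 1, 1]
print(batch_all_loss(embeddings, labels, margin=0.5))   # (5.0, 8, 2)
```

Of the 8 valid triplets in this toy batch, only 2 have a positive loss (4.5 and 5.5), and only those enter the average.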


# cat Online_Triplet_Mining_Wins

• Batch hard

It finds the hardest positive and the hardest negative for each anchor.

Hardest positive:

Compute the pairwise distance matrix.

Compute a 2D mask of valid pairs (a, p): a tf.bool `Tensor` with shape [batch_size, batch_size]. mask_positive[a, p] is True if a and p are distinct and have the same label.

mask_positive = tf.cast(mask_positive, tf.float32) → Convert the boolean mask to floats before multiplying.

posit_dist = tf.multiply(mask_positive, pairwise_distance) → Sets to 0 the pairs where label(a) != label(p) or a == p.

hardest_posit = tf.reduce_max(posit_dist, axis=1)

Hardest negative:

Get the pairwise distance matrix.

Compute a 2D mask of valid pairs (a, n): they must have different labels.

mask_negative = tf.cast(mask_negative, tf.float32)

For each row → add the row maximum to the invalid pairs (a, n), so they are never selected as the minimum:

max_row = tf.reduce_max(pairwise_distance, axis=1, keepdims=True)

neg_dist = pairwise_distance + max_row * (1.0 - mask_negative)

hardest_negat = tf.reduce_min(neg_dist, axis=1)

Triplet Loss with Online Triplet Mining:

tf.reduce_mean(tf.maximum(hardest_posit + margin - hardest_negat, 0.0))
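The same batch-hard computation can be traced without TensorFlow. The sketch below is a plain-Python reference version, not the code used in the talk: Euclidean distance, the function names, and the requirement that every anchor has at least one negative in the batch are assumptions. An anchor without any positive gets a hardest-positive distance of 0, which mirrors the masked maximum.

```python
import math


def dist(u, v):
    """Euclidean distance (assumed choice of d)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def batch_hard_loss(embeddings, labels, margin):
    """Mean over anchors of max(hardest_pos + margin - hardest_neg, 0)."""
    n = len(labels)
    per_anchor = []
    for a in range(n):
        pos = [dist(embeddings[a], embeddings[p])
               for p in range(n) if p != a and labels[p] == labels[a]]
        neg = [dist(embeddings[a], embeddings[k])
               for k in range(n) if labels[k] != labels[a]]
        if not neg:
            # Assumption: the batch holds at least two distinct labels.
            raise ValueError("anchor %d has no negative in the batch" % a)
        hardest_pos = max(pos, default=0.0)   # farthest same-label sample
        hardest_neg = min(neg)                # closest different-label sample
        per_anchor.append(max(hardest_pos + margin - hardest_neg, 0.0))
    return sum(per_anchor) / n


embeddings = [[0.0], [1.0], [3.0], [10.0]]
labels = [0, 0, 1, 1]
print(batch_hard_loss(embeddings, labels, margin=0.5))   # 1.375
```

Only the third sample (embedding 3.0) contributes here: its hardest positive is 7 away and its hardest negative is 2 away, giving 5.5, which is then averaged over the 4 anchors.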


# cat Hardening_with_Spatial_Transformer_Networks

• Intro

Goal → Apply a geometric transformation to an input.

The parameters of the transformation are learnt using the backpropagation algorithm.

Properties:

Modular.

Specific Transformation for each input.

Trainable with Backpropagation.

Components:

Localisation Network.

Grid Generator.

Sampler.


# cat Hardening_with_Spatial_Transformer_Networks

• Localisation Network

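Goal → DNN or CNN estimating the parameters of a spatial transformation based on the input feature map.

Affine Transformations:

$$P' = \begin{bmatrix} a & b \\ d & e \end{bmatrix} \cdot \begin{bmatrix} x \\ y \end{bmatrix} + \begin{bmatrix} c \\ f \end{bmatrix} = \begin{bmatrix} ax + by + c \\ dx + ey + f \end{bmatrix}$$

$$\begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} \cdot \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} a \cdot x + b \cdot y + c \\ d \cdot x + e \cdot y + f \\ 1 \end{bmatrix}$$

Components:

Input: Feature Map of shape (h, w, c).

Output: Transformation matrix $\Theta$ of shape (6,).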
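To see what the six affine parameters do, the sketch below applies (a, b, c, d, e, f) to a single point. The function name and the three example parameter sets are hypothetical; in a Spatial Transformer these six values would be produced by the localisation network.

```python
def affine(theta, x, y):
    """Apply the affine map (a, b, c, d, e, f) to the point (x, y)."""
    a, b, c, d, e, f = theta
    return (a * x + b * y + c, d * x + e * y + f)


identity = (1, 0, 0, 0, 1, 0)        # leaves every point unchanged
translate = (1, 0, 2, 0, 1, -1)      # shift by (+2, -1)
scale = (0.5, 0, 0, 0, 0.5, 0)       # shrink coordinates by half

print(affine(identity, 3, 4))    # (3, 4)
print(affine(translate, 3, 4))   # (5, 3)
print(affine(scale, 3, 4))       # (1.5, 2.0)
```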


# cat Hardening_with_Spatial_Transformer_Networks

• Grid Generator

Goal → output a parameterised sampling grid.

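Creates a normalized meshgrid G of the same size as the input feature map U → set of indices $(x_t, y_t)$.

Apply an affine transformation to this meshgrid:

$$\begin{bmatrix} x_s \\ y_s \end{bmatrix} = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix} \cdot \begin{bmatrix} x_t \\ y_t \\ 1 \end{bmatrix}$$

Source: https://papers.nips.cc/paper/5854-spatial-transformer-networks.pdf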
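A grid generator along these lines can be sketched with nested lists. The normalisation range [-1, 1] is an assumption (the slide only says "normalized"), and `linspace` and `affine_grid` are hypothetical helper names written for this example rather than library calls.

```python
def linspace(start, stop, num):
    """Evenly spaced values from start to stop, inclusive."""
    if num == 1:
        return [start]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def affine_grid(theta, height, width):
    """Map every target coordinate (x_t, y_t) to a source coordinate (x_s, y_s).

    theta: 2x3 nested list [[t11, t12, t13], [t21, t22, t23]].
    Returns a height x width nested list of (x_s, y_s) tuples.
    """
    xs = linspace(-1.0, 1.0, width)    # assumed normalisation range
    ys = linspace(-1.0, 1.0, height)
    grid = []
    for y_t in ys:
        row = []
        for x_t in xs:
            x_s = theta[0][0] * x_t + theta[0][1] * y_t + theta[0][2]
            y_s = theta[1][0] * x_t + theta[1][1] * y_t + theta[1][2]
            row.append((x_s, y_s))
        grid.append(row)
    return grid


# Zooming in: the output samples only the central half of the input.
zoom = [[0.5, 0.0, 0.0],
        [0.0, 0.5, 0.0]]
grid = affine_grid(zoom, height=2, width=3)
print(grid[0][0])     # (-0.5, -0.5)
print(grid[-1][-1])   # (0.5, 0.5)
```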


# cat Hardening_with_Spatial_Transformer_Networks

• Grid Generator

Source: https://www.datahack.es/tensorflow-potenciar-convoluciones-spatial-transformers/


# cat Hardening_with_Spatial_Transformer_Networks

• Sampler

Goal → Produce the sampled output V using the initial feature map, the transformed meshgrid and a differentiable interpolation function (bilinear).
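Bilinear interpolation at one sampling location can be sketched as follows. This version takes pixel coordinates rather than normalised ones and clamps out-of-range locations to the border; both choices, along with the function name and the 2x2 example image, are assumptions for illustration.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a 2D image (nested list, indexed [row][col]) at float pixel coords."""
    h, w = len(image), len(image[0])
    # Assumption: clamp coordinates that fall outside the image.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0          # fractional offsets inside the cell
    top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
    bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
    return (1 - wy) * top + wy * bottom


image = [[0.0, 10.0],
         [20.0, 30.0]]
print(bilinear_sample(image, 0.5, 0.5))   # 15.0
print(bilinear_sample(image, 1.0, 0.0))   # 10.0
```

Because the output is a weighted sum of the four neighbouring pixels, it varies smoothly with the sampling coordinates, which is what lets gradients flow back to the transformation parameters.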


# ./Demo_Effect


# cat Spiking_Neural_Networks

• Biological Retina

• Source: https://slideplayer.com/slide/7469065


# cat Spiking_Neural_Networks

• Leaky Integrate and Fire Model

Membrane Capacitance Cm, Membrane Voltage Vm, Membrane Resistance Rm, Conductance G = 1 / Rm, Equilibrium Voltage Ve (between -30 mV and -90 mV), Voltage Threshold Vth, Voltage Reset Vr, Current Im.

Spike → Vm > Vth

Change of the cell voltage over time when an external current Im is applied ($\tau = C_m R_m$):

$$\frac{dV}{dt} = \frac{-(V_m - V_e) + I_m R_m}{\tau}$$


# cat Spiking_Neural_Networks

• Leaky Integrate and Fire Model

If our time step is $\Delta t$:

$$V(t + \Delta t) \approx V(t) + \frac{dV}{dt} \Delta t$$

Behavior of the LIF Neuron:

if Vm(t) > Vth:

    Vm(t + 1) = Vr

else:

    Vm(t + 1) = Vm(t) + dt * (-(Vm(t) - Ve) + Im * Rm) / tau
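The update rule above can be run as a short simulation. All parameter values below (voltages in mV, time in ms, and the input currents) are hypothetical defaults picked so that the behaviour is easy to see; they are not taken from the talk.

```python
def simulate_lif(current, steps, dt=1.0, tau=10.0, rm=1.0,
                 ve=-65.0, vth=-50.0, vr=-70.0):
    """Euler simulation of a Leaky Integrate and Fire neuron.

    Returns (voltage trace, list of step indices at which a spike was detected).
    """
    vm = ve                      # start at the equilibrium voltage
    trace, spikes = [], []
    for t in range(steps):
        if vm > vth:
            spikes.append(t)     # threshold crossed: emit a spike
            vm = vr              # and reset the membrane voltage
        else:
            vm = vm + dt * (-(vm - ve) + current * rm) / tau
        trace.append(vm)
    return trace, spikes


# No input: the membrane stays at the equilibrium voltage and never fires.
_, quiet = simulate_lif(current=0.0, steps=100)
# Strong input: the steady state (Ve + Im * Rm = -45 mV) lies above the threshold.
_, firing = simulate_lif(current=20.0, steps=100)
print(len(quiet), len(firing) > 0)
```

With a weaker input whose steady state stays below Vth, the voltage settles without ever spiking, which is the "leaky" part of the model.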


# poweroff

Thank you for your curiosity!!