Chernodub Paris Deeplearning

Deep Learning for Face Recognition

Artem Chernodub, Yurii Paschenko

IMMSP NASU

Paris, 30 September 2015.

ZZWolf

𝑝 (𝑥|𝑦 )=𝑝 ( 𝑦|𝑥 )𝑝 (𝑥)

𝑝 (𝑦 )

Biological-inspired models

Neuroscience

Machine Learning

2 / 50

Biological Neural Networks

3 / 50

Artificial Neural Networks

Traditional (Shallow) Neural Networks

Deep Neural Networks

Deep Feedforward Neural Networks

Recurrent Neural Networks

4 / 50

Deep Neural Networks

Deep Learning = Learning of Representations (Features)

The traditional model of pattern recognition (since the late 50's):

fixed/engineered features + trainable classifierHand-

crafted Feature

Extractor

TrainableClassifier

Trainable Feature

Extractor

TrainableClassifier

End-to-end learning / Feature learning / Deep learning:

trainable features + trainable classifier

6 / 50

Deep Feedforward Neural Networks

• 2-stage training process: i) unsupervised pre-training; ii) fine tuning (vanishing gradients problem is beaten!).

• Number of hidden layers > 1 (usually 6-9).• 100K – 100M free parameters.• No (or less) feature preprocessing stage. 7 / 50

FLOPS comparison

https://ru.wikipedia.org/wiki/FLOPS

Type Name Flops Cost

Mobile Raspberry Pi 1st Gen, 700 Mhz

0,04 Gflops $35

Mobile Apple A8 1,4 Gflops $700 (in iPhone 6)

CPU Intel Core i7-4930K (Ivy Bridge), 3.7 GHz

140 Gflops $700

CPU Intel Core i7-5960X (Haswell), 3.0 GHz

350 Gflops $1300

GPU NVidia GTX 980 4612 Gflops (single precision),

144 Gflops (double precision)

$600 + cost of PC (~$1000)

GPU NVidia Tesla K80 8740 Gflops (single precision),

2910 Gflops (double precision)

$4500 + cost of PC (~1500)

8 / 50



Sparse Autoencoders

9 / 50

Dimensionality reduction

• Use a stacked RBM as deep auto-encoder

1. Train RBM with images as input & output

2. Limit one layer to few dimensions

Information has to pass through middle layer

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.10 /

50

Original

Deep RBN

PCA


Olivetti face data, 25x25 pixel images reconstructed from 30 dimensions (625 30)


50

How to use unsupervised pre-training stage / 1

12 / 50


13 / 50


14 / 50


15 / 50

Unlabeled data

Unlabeled data is readily available

Example: Images from the web

1. Download 10’000’000 images2. Train a 9-layer DNN3. Concepts are formed by DNN


50


PCA Deep RBN

804’414 Reuters news stories, reduction to 2 dimensions


50

Hierarchy of trained representations

Low-level feature

Middle-level

feature

Top-level feature

Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]

18 / 50

Face Recognition

Standard Face Recognition Scenario

Step 1. Face Detection

Step 2. Face Alignment

Step 3. Face Matching

20 / 50

Face Matching (Face Recognition)

A) Training the classifier B) Matching two face images

Classifier

Features

Answer21 /

50

Recognition of Face Attributes

• man;• white;• 45-50 years;• glasses – no;• moustache –

yes;• beard – yes.

22 / 50

Beauty Recognition

http://www.linkface.cn/face/beauty

Чернодуб А.Н., Пащенко Ю.А., Головченко К.А. Нейросетевая система определения привлекательности лица человека // «Нейроинформатика-2013», Москва, 21-25 января 2013, c. 254 — 259. 23 /

50




Face Matching

EigenFaces, Principal Component Analysis (PCA) for Face Matching, 1991

M. Turk, A. Pentland. Face Recognition using EigenFaces // Journal of cognitive neuroscience 3 (1), p. 71-86.

• Well-known method for dimensionality reduction.• Building features from the training data.• Impressive visualization of the basis vectors.

25 / 50

Local Binary Pattern Histograms (LBPH), 2006

• Fast & robust features for texture descriptions applied to face recognition.

• distance for comparison of histograms.• Matrix of importance regions (weighted .

T. Ahonen, A. Hadid. Face Description with Local Binary Patterns:Application to Face Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 12, p. 2037-2041.

26 / 50

Deep Face (Facebook), 2014

Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014.

Model # of parameters

Accuracy, %, LFW

Deep Face Net 128M 97.25

Human level N/A ~ 97.64

Training data: 4M facial images

27 / 50

Convolutional Neural Networks: Return of Jedi

Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks

Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook 28 /

50

http://cs231n.github.io/convolutional-networks



http://www-labs.iro.umontreal.ca/~bengioy/DLbook/



DeepFace Training Framework

Step 1. Supervised training for identification

Step 2. Use produced features for face matching

Big Face database

~1-10M images,~ 1-5K persons

ConvolutionalNeuralNetwork

Feature

1-5Klabels

Feature 2

Feature 1 - Cosine metric- Euclid metric

- metric- Siamese networks

Similarity

29 / 50

Face Recognition Algorithm

Face localization

Normalization

Feature extraction

Comparing

Verification

metricFALSE

30 / 50

Databases for Deep Face Training

• Facebook’s Social Face Classification (SCF) dataset, 2014. 4.4M labeled faces, 4030 people with 800 to 1200 faces each. Private dataset.

• Visual Geometry Group Dataset, Oxford, 2015. 2622 people with 1000 faces each. Private dataset.

• CASIA WebFace Database, 2014. 10575 people, 500K faces. Public dataset.http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html

31 / 50

http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html




CASIA

• Dataset: CASIA WebFace (10 575 subjects and 494 414 face images)

• Alignment: 2D• Architectures: single CNN + triplet

loss• Verification metric:– Cosine– Joint Bayes

D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. Technical report, arXiv:1411.7923, 2014.

32 / 50

CASIA architecture

• Input: gray image 100x100

• Feature vector size: 320

• Parameters: ~5M

33 / 50

CASIA replication

Training data Accuracy on LFW

Normalized 97,27 %

No alignment 94,15 %

Our 2D alignment 96,23 %

34 / 50

Error rate vs Database size

Identities

Faces Error rate

1.5K 150K 3.1%

9K 450K 1.35%

18K 1.2M 0.87%

J. Liu, Y. Deng, T. Bai, Zh. Wei, C. Huang. Baidu Targeting Ultimate Accuracy: Face Recognition via Deep Embedding, 2015 http://hdl.handle.net/1721.1/97999

35 / 50

http://hdl.handle.net/1721.1/97999



Face Databases (benchmarks)

AT&T Database (Olivetti, ORL): 1992-1994

F. Samaria, A. Harter. Parameterisation of a Stochastic Model for Human Face Identification // Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, Sarasota FL, December 1994.

http://www.uk.research.att.com/

• Grayscale images, 92x112.

• 40 subjects, 10 images per subject.

• Different facial expressions, poses, glasses.

• Studio, single session.

37 / 50



FERET dataset, 1994-1996

P.J. Phillips, H. Moon, P. Rauss, S. A. Rizvi. The FERET September 1996 Database and Evaluation Procedure // First International Conference, AVBPA'97 Crans-Montana, Switzerland, March 12–14, 1997, pp. 395-402

http://www.nist.gov/srd/

• 2413 still facial images, representing 856 individuals, studio, two sessions.

• Different facial expressions, poses, light.

38 / 50



Labeled Faces in the Wild dataset, 2007

G. B. Huang, M. Ramesh, T. Berg, E. Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments // University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.http://vis-www.cs.umass.edu/lfw/

• Images with faces collected from the web.

• Pairs comparison, restricted mode.

• test: 10-fold cross-validation, 6000 face pairs.

39 / 50

http://vis-www.cs.umass.edu/lfw/

http://vis-www.cs.umass.edu/lfw/

Results for Face Matching, LWF (option “Labeled outside data”)

http://vis-www.cs.umass.edu/lfw/results.html

Method Accuracy

1 EigenFaces 60,20%

2 LBP-classic (3K features) 72,43%

3 High-dim LBP (100K features) 95,17%

4 DeepFace 97,25%

5 Human level 97,64%

6 DeepID3 99,53%

7 Baidu 99,77%

40 / 50



Face Detection

Viola-Jones Object Detector

• Very popular for Face Detection.• Different features except HAAR: LBP, SURF/SIFT, etc.• Available free in OpenCV library (http://opencv.org).• OpenCV implementation supports Windows, Linux,

Mac OS, iOS (not officially) and Android.• May be trained with MATLAB tool traincascade.

42 / 50

http://opencv.org/

Images pyramid for Viola-Jones

43 / 50

FDDB: Face Detection Data Set and Benchmark

Vidit Jain and Erik Learned-Miller. FDDB: A Benchmark for Face Detection in Unconstrained Settings // Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst. 2010.

http://vis-www.cs.umass.edu/fddb

• 5171 faces in a set of 2845 LWF images.

• Special software tool for testing.

44 / 50



Viola-Jones Object Detector Classifier Structure

P. Viola, M. Jones. Rapid object detection using a boosted cascade of simple features // Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001.

45 / 50

FDDB: state-of-the-art

http://vis-www.cs.umass.edu/fddb/results.html46 /

50

http://vis-www.cs.umass.edu/fddb/results.html

http://vis-www.cs.umass.edu/fddb/results.html

Face Alignment

3D Face Alignment

Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014. 48 /

50

2D Face Alignment

D. Yi, Z. Lei, S. Liao, S.Z. Li. Learning Face Representation from Scratch // CoRR abs/1411.7923 (2014). [CASIA]

49 / 50

Some Computer Vision / Machine Learning tools• OpenCV, VLFeat – popular Computer Vision

libraries.• mexopencv – MATLAB interface for

OpenCV.• pythonXY – all-in-one Machine Learning

Package.• libSVM – reliable cross-platform Support

Vector Machine library.• NetLab – Shallow Neural Networks & other

ML models, from Christopher Bishop’s book.

• Caffe, Theano, Torch, matconvnet – frameworks for training Deep Neural Networks. 50 /

50

contact: [email protected]

Thanks!

mailto:[email protected]

Documents

Chernodub Paris Deeplearning