51
Deep Learning for Face Recognition Artem Chernodub, Yurii Paschenko IMMSP NASU Paris, 30 September 2015. ZZWolf

Chernodub Paris Deeplearning

Embed Size (px)

DESCRIPTION

Deeplearning for Face recognition

Citation preview

Page 1: Chernodub Paris Deeplearning

Deep Learning for Face Recognition

Artem Chernodub, Yurii Paschenko

IMMSP NASU

Paris, 30 September 2015.

ZZWolf

Page 2: Chernodub Paris Deeplearning

𝑝 (𝑥|𝑦 )=𝑝 ( 𝑦|𝑥 )𝑝 (𝑥)

𝑝 (𝑦 )

Biological-inspired models

Neuroscience

Machine Learning

2 / 50

Page 3: Chernodub Paris Deeplearning

Biological Neural Networks

3 / 50

Page 4: Chernodub Paris Deeplearning

Artificial Neural Networks

Traditional (Shallow) Neural Networks

Deep Neural Networks

Deep Feedforward Neural Networks

Recurrent Neural Networks

4 / 50

Page 5: Chernodub Paris Deeplearning

Deep Neural Networks

Page 6: Chernodub Paris Deeplearning

Deep Learning = Learning of Representations (Features)

The traditional model of pattern recognition (since the late 50's):

fixed/engineered features + trainable classifierHand-

crafted Feature

Extractor

TrainableClassifier

Trainable Feature

Extractor

TrainableClassifier

End-to-end learning / Feature learning / Deep learning:

trainable features + trainable classifier

6 / 50

Page 7: Chernodub Paris Deeplearning

Deep Feedforward Neural Networks

• 2-stage training process: i) unsupervised pre-training; ii) fine tuning (vanishing gradients problem is beaten!).

• Number of hidden layers > 1 (usually 6-9).• 100K – 100M free parameters.• No (or less) feature preprocessing stage. 7 / 50

Page 8: Chernodub Paris Deeplearning

FLOPS comparison

https://ru.wikipedia.org/wiki/FLOPS

Type Name Flops Cost

Mobile Raspberry Pi 1st Gen, 700 Mhz

0,04 Gflops $35

Mobile Apple A8 1,4 Gflops $700 (in iPhone 6)

CPU Intel Core i7-4930K (Ivy Bridge), 3.7 GHz

140 Gflops $700

CPU Intel Core i7-5960X (Haswell), 3.0 GHz

350 Gflops $1300

GPU NVidia GTX 980 4612 Gflops (single precision),

144 Gflops (double precision)

$600 + cost of PC (~$1000)

GPU NVidia Tesla K80 8740 Gflops (single precision),

2910 Gflops (double precision)

$4500 + cost of PC (~1500)

8 / 50

Page 9: Chernodub Paris Deeplearning

Sparse Autoencoders

9 / 50

Page 10: Chernodub Paris Deeplearning

Dimensionality reduction

• Use a stacked RBM as deep auto-encoder

1. Train RBM with images as input & output

2. Limit one layer to few dimensions

Information has to pass through middle layer

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.10 /

50

Page 11: Chernodub Paris Deeplearning

Original

Deep RBN

PCA

Dimensionality reduction

Olivetti face data, 25x25 pixel images reconstructed from 30 dimensions (625 30)

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.11 /

50

Page 12: Chernodub Paris Deeplearning

How to use unsupervised pre-training stage / 1

12 / 50

Page 13: Chernodub Paris Deeplearning

How to use unsupervised pre-training stage / 2

13 / 50

Page 14: Chernodub Paris Deeplearning

How to use unsupervised pre-training stage / 3

14 / 50

Page 15: Chernodub Paris Deeplearning

How to use unsupervised pre-training stage / 4

15 / 50

Page 16: Chernodub Paris Deeplearning

Unlabeled data

Unlabeled data is readily available

Example: Images from the web

1. Download 10’000’000 images2. Train a 9-layer DNN3. Concepts are formed by DNN

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.16 /

50

Page 17: Chernodub Paris Deeplearning

Dimensionality reduction

PCA Deep RBN

804’414 Reuters news stories, reduction to 2 dimensions

G. E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks // Science 313 (2006), p. 504 – 507.17 /

50

Page 18: Chernodub Paris Deeplearning

Hierarchy of trained representations

Low-level feature

Middle-level

feature

Top-level feature

Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]

18 / 50

Page 19: Chernodub Paris Deeplearning

Face Recognition

Page 20: Chernodub Paris Deeplearning

Standard Face Recognition Scenario

Step 1. Face Detection

Step 2. Face Alignment

Step 3. Face Matching

20 / 50

Page 21: Chernodub Paris Deeplearning

Face Matching (Face Recognition)

A) Training the classifier B) Matching two face images

Classifier

Features

Answer21 /

50

Page 22: Chernodub Paris Deeplearning

Recognition of Face Attributes

• man;• white;• 45-50 years;• glasses – no;• moustache –

yes;• beard – yes.

22 / 50

Page 23: Chernodub Paris Deeplearning

Beauty Recognition

http://www.linkface.cn/face/beauty

Чернодуб А.Н., Пащенко Ю.А., Головченко К.А. Нейросетевая система определения привлекательности лица человека // «Нейроинформатика-2013», Москва, 21-25 января 2013, c. 254 — 259. 23 /

50

Page 24: Chernodub Paris Deeplearning

Face Matching

Page 25: Chernodub Paris Deeplearning

EigenFaces, Principal Component Analysis (PCA) for Face Matching, 1991

M. Turk, A. Pentland. Face Recognition using EigenFaces // Journal of cognitive neuroscience 3 (1), p. 71-86.

• Well-known method for dimensionality reduction.• Building features from the training data.• Impressive visualization of the basis vectors.

25 / 50

Page 26: Chernodub Paris Deeplearning

Local Binary Pattern Histograms (LBPH), 2006

• Fast & robust features for texture descriptions applied to face recognition.

• distance for comparison of histograms.• Matrix of importance regions (weighted .

T. Ahonen, A. Hadid. Face Description with Local Binary Patterns:Application to Face Recognition // IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 28, No. 12, p. 2037-2041.

26 / 50

Page 27: Chernodub Paris Deeplearning

Deep Face (Facebook), 2014

Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014.

Model # of parameters

Accuracy, %, LFW

Deep Face Net 128M 97.25

Human level N/A ~ 97.64

Training data: 4M facial images

27 / 50

Page 28: Chernodub Paris Deeplearning

Convolutional Neural Networks: Return of Jedi

Andrej Karpathy and Fei-Fei. CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/convolutional-networks

Yoshua Bengio, Ian Goodfellow and Aaron Courville. Deep Learning // An MIT Press book in preparation http://www-labs.iro.umontreal.ca/~bengioy/DLbook 28 /

50

Page 29: Chernodub Paris Deeplearning

DeepFace Training Framework

Step 1. Supervised training for identification

Step 2. Use produced features for face matching

Big Face database

~1-10M images,~ 1-5K persons

ConvolutionalNeuralNetwork

Feature

1-5Klabels

Feature 2

Feature 1 - Cosine metric- Euclid metric

- metric- Siamese networks

Similarity

29 / 50

Page 30: Chernodub Paris Deeplearning

Face Recognition Algorithm

Face localization

Normalization

Feature extraction

Comparing

Verification

metricFALSE

30 / 50

Page 31: Chernodub Paris Deeplearning

Databases for Deep Face Training

• Facebook’s Social Face Classification (SCF) dataset, 2014. 4.4M labeled faces, 4030 people with 800 to 1200 faces each. Private dataset.

• Visual Geometry Group Dataset, Oxford, 2015. 2622 people with 1000 faces each. Private dataset.

• CASIA WebFace Database, 2014. 10575 people, 500K faces. Public dataset.http://www.cbsr.ia.ac.cn/english/CASIA-WebFace-Database.html

31 / 50

Page 32: Chernodub Paris Deeplearning

CASIA

• Dataset: CASIA WebFace (10 575 subjects and 494 414 face images)

• Alignment: 2D• Architectures: single CNN + triplet

loss• Verification metric:– Cosine– Joint Bayes

D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. Technical report, arXiv:1411.7923, 2014.

32 / 50

Page 33: Chernodub Paris Deeplearning

CASIA architecture

• Input: gray image 100x100

• Feature vector size: 320

• Parameters: ~5M

33 / 50

Page 34: Chernodub Paris Deeplearning

CASIA replication

Training data Accuracy on LFW

Normalized 97,27 %

No alignment 94,15 %

Our 2D alignment 96,23 %

34 / 50

Page 35: Chernodub Paris Deeplearning

Error rate vs Database size

Identities

Faces Error rate

1.5K 150K 3.1%

9K 450K 1.35%

18K 1.2M 0.87%

J. Liu, Y. Deng, T. Bai, Zh. Wei, C. Huang. Baidu Targeting Ultimate Accuracy: Face Recognition via Deep Embedding, 2015 http://hdl.handle.net/1721.1/97999

35 / 50

Page 36: Chernodub Paris Deeplearning

Face Databases (benchmarks)

Page 37: Chernodub Paris Deeplearning

AT&T Database (Olivetti, ORL): 1992-1994

F. Samaria, A. Harter. Parameterisation of a Stochastic Model for Human Face Identification // Proceedings of 2nd IEEE Workshop on Applications of Computer Vision, Sarasota FL, December 1994.

http://www.uk.research.att.com/

• Grayscale images, 92x112.

• 40 subjects, 10 images per subject.

• Different facial expressions, poses, glasses.

• Studio, single session.

37 / 50

Page 38: Chernodub Paris Deeplearning

FERET dataset, 1994-1996

P.J. Phillips, H. Moon, P. Rauss, S. A. Rizvi. The FERET September 1996 Database and Evaluation Procedure // First International Conference, AVBPA'97 Crans-Montana, Switzerland, March 12–14, 1997, pp. 395-402

http://www.nist.gov/srd/

• 2413 still facial images, representing 856 individuals, studio, two sessions.

• Different facial expressions, poses, light.

38 / 50

Page 39: Chernodub Paris Deeplearning

Labeled Faces in the Wild dataset, 2007

G. B. Huang, M. Ramesh, T. Berg, E. Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments // University of Massachusetts, Amherst, Technical Report 07-49, October, 2007.http://vis-www.cs.umass.edu/lfw/

• Images with faces collected from the web.

• Pairs comparison, restricted mode.

• test: 10-fold cross-validation, 6000 face pairs.

39 / 50

Page 40: Chernodub Paris Deeplearning

Results for Face Matching, LWF (option “Labeled outside data”)

http://vis-www.cs.umass.edu/lfw/results.html

Method Accuracy

1 EigenFaces 60,20%

2 LBP-classic (3K features) 72,43%

3 High-dim LBP (100K features) 95,17%

4 DeepFace 97,25%

5 Human level 97,64%

6 DeepID3 99,53%

7 Baidu 99,77%

40 / 50

Page 41: Chernodub Paris Deeplearning

Face Detection

Page 42: Chernodub Paris Deeplearning

Viola-Jones Object Detector

• Very popular for Face Detection.• Different features except HAAR: LBP, SURF/SIFT, etc.• Available free in OpenCV library (http://opencv.org).• OpenCV implementation supports Windows, Linux,

Mac OS, iOS (not officially) and Android.• May be trained with MATLAB tool traincascade.

42 / 50

Page 43: Chernodub Paris Deeplearning

Images pyramid for Viola-Jones

43 / 50

Page 44: Chernodub Paris Deeplearning

FDDB: Face Detection Data Set and Benchmark

Vidit Jain and Erik Learned-Miller. FDDB: A Benchmark for Face Detection in Unconstrained Settings // Technical Report UM-CS-2010-009, Dept. of Computer Science, University of Massachusetts, Amherst. 2010.

http://vis-www.cs.umass.edu/fddb

• 5171 faces in a set of 2845 LWF images.

• Special software tool for testing.

44 / 50

Page 45: Chernodub Paris Deeplearning

Viola-Jones Object Detector Classifier Structure

P. Viola, M. Jones. Rapid object detection using a boosted cascade of simple features // Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001.

45 / 50

Page 46: Chernodub Paris Deeplearning

FDDB: state-of-the-art

http://vis-www.cs.umass.edu/fddb/results.html46 /

50

Page 47: Chernodub Paris Deeplearning

Face Alignment

Page 48: Chernodub Paris Deeplearning

3D Face Alignment

Y. Taigman, M. Yang, M.A. Ranzato, L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification // CVPR 2014. 48 /

50

Page 49: Chernodub Paris Deeplearning

2D Face Alignment

D. Yi, Z. Lei, S. Liao, S.Z. Li. Learning Face Representation from Scratch // CoRR abs/1411.7923 (2014). [CASIA]

49 / 50

Page 50: Chernodub Paris Deeplearning

Some Computer Vision / Machine Learning tools• OpenCV, VLFeat – popular Computer Vision

libraries.• mexopencv – MATLAB interface for

OpenCV.• pythonXY – all-in-one Machine Learning

Package.• libSVM – reliable cross-platform Support

Vector Machine library.• NetLab – Shallow Neural Networks & other

ML models, from Christopher Bishop’s book.

• Caffe, Theano, Torch, matconvnet – frameworks for training Deep Neural Networks. 50 /

50