

Deep Learning Face Attributes in the Wild

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang

The Chinese University of Hong Kong

{lz013, pluo, xtang}@ie.cuhk.edu.hk, [email protected]

IEEE 2015 International Conference on Computer Vision

3. Large-scale CelebFaces Attributes Dataset

202,599 face images

10,177 human identities

5 landmarks per image

40 attributes per image

6. Conclusions

With carefully designed pre-training strategies, our approach is robust to background clutter and face variations.

We devise a new fast feed-forward algorithm for locally shared filters to save redundant computation.

We have also revealed multiple important facts about learning face representation, which shed light on new directions of face localization and representation learning.

5.1. Experimental Results (Face Localization)

[Figure: two plots of True Positive Rates versus False Positive Per Image (0 to 0.5), comparing LNet, DPM [21], ACF Multi-view [29], SURF Cascade [17], and Face++ [1].]

[Figure: Recall Rates (FPPI = 0.1) versus Overlap Ratio (0 to 1), comparing LNet, DPM [21], ACF Multi-view [29], SURF Cascade [17], and LNet (w/o pre-training).]

5.2. Experimental Results (Attribute Prediction)

[Figure: (a) images with high and low neuron responses, panels (a.1) to (a.6), with labels Age, Hair Color, Race, Gender, Face Shape, and Eye Shape; (b) test images and neuron activations, panels (b.1) to (b.3), with labels Bangs, Brown Hair, Pale Skin, Narrow Eyes, High Cheek., Eyeglasses, Mustache, Black Hair, Smiling, Big Nose, Wear. Hat, Blond Hair, Wear. Lipstick, Asian, and Big Eyes.]

[Figure: Accuracy of ANet (FC), ANet (C4), and ANet (C3) on identity-related attributes (Male, White, Black, Asian) and identity-non-related attributes (Smiling, Wearing Hat, Rosy Cheeks, 5oClock Shadow).]

[Figure: Average Accuracy versus Percentage of Best Performing Neurons Used (100% down to 0%), comparing ANet (After fine-tuning) and HOG (After PCA); the single best performing neuron is marked.]

4. Overall Pipeline

[Figure: (a) LNet_o, (b) LNet_s, (c) ANet, (d) Extracting features to predict attributes. Diagram labels: inputs x_o, x_s, x_f; feature maps h^o(5), h^s(5), h^f(4); sizes m, n; FC layers; output y; Linear SVMs predicting attributes such as Smiling, Wavy Hair, No Beard, and High Cheekbones.]

LNets:

(i) pre-trained with massive general objects

(ii) face localization with weak supervision

ANet:

(i) pre-trained with massive face identities

(ii) attribute prediction by leveraging local features

LNets and ANet are jointly learned
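The pipeline stages listed above (localize the face, extract features, predict each attribute with a linear SVM) can be sketched as follows. This is only an illustrative skeleton, not the authors' implementation: the `localize` and `extract_features` callables stand in for the LNets and ANet, and the box convention, function names, and SVM parameters are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom); hypothetical convention
Image = List[List[float]]         # toy grayscale image as nested lists
LinearSVM = Tuple[Sequence[float], float]  # (weights, bias) of one trained classifier


def crop(image: Image, box: Box) -> Image:
    """Cut the localized region out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def linear_svm_decision(weights: Sequence[float], bias: float,
                        features: Sequence[float]) -> bool:
    """A linear SVM predicts 'true' when w . x + b is positive."""
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return score > 0.0


def predict_attributes(image: Image,
                       localize: Callable[[Image], Box],
                       extract_features: Callable[[Image], List[float]],
                       svms: Dict[str, LinearSVM]) -> Dict[str, bool]:
    """Localization stage, then feature stage, then one linear SVM per attribute."""
    face = crop(image, localize(image))      # stand-in for the LNets
    features = extract_features(face)        # stand-in for ANet
    return {name: linear_svm_decision(w, b, features)
            for name, (w, b) in svms.items()}
```

With stub stages (a fixed box and row means as features), `predict_attributes` returns one boolean per attribute name, which mirrors the true/false outputs shown on the poster.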

CelebA dataset: 20x larger than previous datasets

Available at: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
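The dataset figures stated on this poster (202,599 face images, 10,177 identities, 5 landmarks and 40 attributes per image) imply the overall annotation volume. The short calculation below is a sketch that assumes every image carries the full set of annotations, as the "per image" wording suggests; the variable names are illustrative.

```python
# Figures as stated on the poster.
NUM_IMAGES = 202_599
NUM_IDENTITIES = 10_177
LANDMARKS_PER_IMAGE = 5
ATTRIBUTES_PER_IMAGE = 40

# Assumption: every image is fully annotated.
total_attribute_labels = NUM_IMAGES * ATTRIBUTES_PER_IMAGE
total_landmarks = NUM_IMAGES * LANDMARKS_PER_IMAGE
mean_images_per_identity = NUM_IMAGES / NUM_IDENTITIES

print(f"attribute labels: {total_attribute_labels:,}")   # 8,103,960
print(f"landmark points:  {total_landmarks:,}")          # 1,012,995
print(f"images per identity (mean): {mean_images_per_identity:.1f}")  # 19.9
```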

[Figure: true/false predictions for the attributes Arched Eyebrows, Receding Hairline, Smiling, Mustache, and Young, shown for (a) HOG(landmarks)+SVM and (b) Our Method; (c) 5 attributes, 10 attributes, 20 attributes, 40 attributes.]

1. Overview

Problem

Face Attributes Prediction in the Wild

Arched Eyebrows?

Big Eyes?

Receding Hairline?

Mustache?

Performance

Method                 CelebA   LFWA
FaceTracer [ECCV08]    81%      74%
PANDA-w [CVPR14]       79%      71%
PANDA-l [CVPR14]       85%      81%
SC+ANet                83%      76%
LNets+ANet(w/o)        83%      79%
LNets+ANet             87%      84%

Running Time

LNets: 35ms, ANet: 14ms

2. Motivation

Existing methods: global and local methods

Global methods: not robust to deformations of objects

Local methods: rely on face localization and alignment, which fail on unconstrained face images with complex variations

Our idea: joint face localization and attribute prediction using only image-level attribute tags

[Figure: (a) single detector; (b) multi-view detector with View 1 to View N (labels: Frontal, Left); (c) face localization by attributes with Attr Config 1 to Attr Config N (labels: Brown Hair, Big Eyes, Smiling, Male, Black Hair, Sunglasses).]

[Figure: Percentage of Images versus Maximum Score, showing Response on Face Images and Response on Bg. Images separated by a threshold.]
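The response histogram on this poster (response on face images versus background images, plotted over the maximum score, with a marked threshold) suggests a simple decision rule: take the maximum of a response map and compare it with a threshold. The sketch below illustrates that rule only; the function names, the nested-list response map, and the threshold value are hypothetical and not taken from the poster.

```python
from typing import List

ResponseMap = List[List[float]]  # toy 2-D response map as nested lists


def max_response(response_map: ResponseMap) -> float:
    """Maximum score over all positions of the response map."""
    return max(max(row) for row in response_map)


def contains_face(response_map: ResponseMap, threshold: float) -> bool:
    """Treat the image as a face image when its maximum score exceeds the threshold."""
    return max_response(response_map) > threshold


# Hypothetical example values.
face_like = [[0.1, 0.2], [0.9, 0.3]]
background_like = [[0.1, 0.2], [0.15, 0.05]]
print(contains_face(face_like, threshold=0.5))        # True
print(contains_face(background_like, threshold=0.5))  # False
```

The quality of such a rule depends on how well the two score distributions separate, which is what the histogram is meant to show.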

Project Page: http://personal.ie.cuhk.edu.hk/~lz013/projects/FaceAttributes.html
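The Performance table and the running times reported on this poster can be turned into two summary numbers: the margin of LNets+ANet over the best other listed method on each dataset, and an end-to-end time. The sketch below assumes the two running times apply to the same unit of work and that the stages run one after the other; the poster states neither, so the combined figures are illustrative.

```python
# Accuracies (%) copied from the Performance table.
accuracy = {
    "FaceTracer [ECCV08]": {"CelebA": 81, "LFWA": 74},
    "PANDA-w [CVPR14]":    {"CelebA": 79, "LFWA": 71},
    "PANDA-l [CVPR14]":    {"CelebA": 85, "LFWA": 81},
    "SC+ANet":             {"CelebA": 83, "LFWA": 76},
    "LNets+ANet(w/o)":     {"CelebA": 83, "LFWA": 79},
    "LNets+ANet":          {"CelebA": 87, "LFWA": 84},
}
OURS = "LNets+ANet"


def margin_over_best_other(dataset: str) -> int:
    """Percentage-point gap between LNets+ANet and the best other row."""
    best_other = max(row[dataset] for name, row in accuracy.items() if name != OURS)
    return accuracy[OURS][dataset] - best_other


# Running times (ms) copied from the poster.
LNETS_MS = 35
ANET_MS = 14
total_ms = LNETS_MS + ANET_MS          # assumption: stages run sequentially
throughput_per_second = 1000 / total_ms

print(margin_over_best_other("CelebA"))   # 2
print(margin_over_best_other("LFWA"))     # 3
print(total_ms)                           # 49
print(round(throughput_per_second, 1))    # 20.4
```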