
Deep Learning Face Attributes in the Wild

Supplementary Material

Ziwei Liu1 Ping Luo1 Xiaogang Wang2 Xiaoou Tang1

1 Department of Information Engineering, The Chinese University of Hong Kong
2 Department of Electronic Engineering, The Chinese University of Hong Kong

{lz013,pluo,xtang}@ie.cuhk.edu.hk, [email protected]

1. Network Structures

Table 1 shows the network structures of LNet and ANet. The detailed information listed is: filter number × filter size (e.g. 96 × 11²), filter stride (e.g. str:4), pooling window size (e.g. pool:3²), and pooling stride (e.g. pool str:2). LRN denotes local response normalization and (l) denotes locally shared filters.

LNet:
  C1: 96 × 11², str:4, LRN, pool:3², pool str:2
  C2: 256 × 5², str:1, LRN, pool:3², pool str:2
  C3: 384 × 3², str:1
  C4: 384 × 3², str:1
  C5: 256 × 3², str:1, pool:3², pool str:2
ANet:
  C1: 20 × 4², str:1, pool:2², pool str:2
  C2: 40 × 3², str:1, pool:2², pool str:2
  C3: 60 × 3², str:1 (l), pool:2², pool str:2
  C4: 80 × 2², str:1 (l)
  C5: -

Table 1. Network structures of LNet and ANet.
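
For concreteness, the two convolutional stacks of Table 1 can be sketched in PyTorch as below. This is an illustrative reconstruction rather than the original implementation: the input resolution, padding, ReLU placement and the exact layers that pooling follows are our assumptions, and ANet's locally shared filters, marked (l), are approximated here by ordinary convolutions.

import torch
import torch.nn as nn

# LNet convolutional stack (C1-C5), AlexNet-like.
lnet_convs = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4),               # C1: 96 x 11^2, str:4
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5),                             # LRN
    nn.MaxPool2d(kernel_size=3, stride=2),                    # pool:3^2, pool str:2
    nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),   # C2: 256 x 5^2, str:1
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5),                             # LRN
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),  # C3: 384 x 3^2, str:1
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),  # C4: 384 x 3^2, str:1
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),  # C5: 256 x 3^2, str:1
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

# ANet convolutional stack (C1-C4); the (l) layers are plain Conv2d here.
anet_convs = nn.Sequential(
    nn.Conv2d(3, 20, kernel_size=4, stride=1),                # C1: 20 x 4^2, str:1
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),                    # pool:2^2, pool str:2
    nn.Conv2d(20, 40, kernel_size=3, stride=1),               # C2: 40 x 3^2, str:1
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(40, 60, kernel_size=3, stride=1),               # C3: 60 x 3^2, str:1 (l)
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(60, 80, kernel_size=2, stride=1),               # C4: 80 x 2^2, str:1 (l)
    nn.ReLU(inplace=True),
)

features = lnet_convs(torch.randn(1, 3, 227, 227))            # e.g. a 227 x 227 RGB input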

2. Effectiveness of LNet

Attribute-specific regions discovery Different attributes capture information from different regions of the face. We show that LNet automatically learns to discover these regions. Given an attribute, by converting the fully connected layers of LNet into fully convolutional layers following [2], we can locate the important region for this attribute. Fig. 1 shows some examples. The important regions of some attributes are locally distributed, such as ‘Bags Under Eyes’, ‘Straight Hair’ and ‘Wearing Necklace’, while others are globally distributed, such as ‘Young’, ‘Male’ and ‘Attractive’.
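
The conversion step can be sketched as follows. This is our own illustration of the technique from [2], not released code; the helper name fc_to_conv and the shapes involved are hypothetical.

import torch
import torch.nn as nn

def fc_to_conv(fc: nn.Linear, in_channels: int, spatial: int) -> nn.Conv2d:
    """Copy a fully connected layer into an equivalent convolution.

    `spatial` is the feature-map size the FC layer originally consumed, so the
    convolution reproduces the FC outputs at that scale and yields a dense
    response map when slid over the features of a larger input image.
    """
    conv = nn.Conv2d(in_channels, fc.out_features, kernel_size=spatial)
    with torch.no_grad():
        conv.weight.copy_(fc.weight.view(fc.out_features, in_channels, spatial, spatial))
        conv.bias.copy_(fc.bias)
    return conv

# Usage sketch: run the convolutional trunk on the full image, apply the
# converted classifier, and read off the map for the attribute of interest.
# response = converted_classifier(conv_features)   # (1, n_attributes, H', W')
# attr_map = response[0, attribute_index]          # response map for one attribute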

More examples of LNet response maps Fig. 4 shows more examples of LNet response maps on full images under different circumstances (lighting, pose, occlusion, image resolution, background clutter, etc.).

[Figure 1 shows input images and the corresponding LNet response maps for attributes such as Attractive, Wearing Lipstick, Wearing Earrings, Wearing Necklace, Receding Hairline, Bags Under Eyes, Chubby, Wearing Necktie, Male, Straight Hair, Pale Skin, Bangs, Brown Hair, No Beard and Young, arranged in panels (a), (b) and (c).]

Figure 1. Attribute-specific regions discovery.

3. Effectiveness of ANet

Attribute Grouping Here we show that the weight matrix at the FC layer of ANet can implicitly capture relations between attributes. Each column vector of the weight matrix can be viewed as a decision hyperplane that partitions the negative and positive samples of an attribute. By simply applying k-means to these vectors, the clusters show clear grouping patterns, which can be interpreted semantically. As shown in Fig. 2, Group #1, Group #2 and Group #4 demonstrate co-occurrence relationships between attributes, e.g. ‘Attractive’ and ‘Heavy Makeup’ are highly correlated. Attributes in Group #3 share similar color descriptors, while attributes in Group #6 correspond to certain texture and appearance traits.
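
A minimal sketch of this grouping analysis is given below, assuming W is the FC-layer weight matrix of ANet arranged with one column per attribute; the variable names and the unit-norm step are our assumptions rather than the original implementation.

import numpy as np
from sklearn.cluster import KMeans

def group_attributes(W, attribute_names, n_groups=6, seed=0):
    """Cluster the per-attribute weight vectors (columns of W) with k-means."""
    vectors = np.asarray(W, dtype=np.float64).T                         # one row per attribute
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)  # compare directions only
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=seed).fit_predict(vectors)
    groups = {g: [] for g in range(n_groups)}
    for name, g in zip(attribute_names, labels):
        groups[g].append(name)
    return groups

Applied to the 40 attribute columns, this would be expected to produce clusters analogous to the groups in Fig. 2, with frequently co-occurring attributes such as ‘Attractive’ and ‘Heavy Makeup’ falling into the same group.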

Semantic Concepts Emerging In the video we illustrate how semantic concepts emerge over the training iterations. Both the pre-training and fine-tuning processes are visualized.

4. Attribute Prediction

Performance on LFWA+ This experiment shows that the proposed approach can be generalized to attributes that are not present in the training stage.


Attribute            FaceTracer [1]   PANDA-w [3]   PANDA-l [3]   LNets+ANet
Asian                      86              80             86            95
Indian                     87              84             92            93
White                      74              71             82            87
Black                      91              88             93            97
Baby                       97              97             97            98
Child                      83              81             83            86
Middle Aged                77              74             76            80
Senior                     78              76             78            82
No Eyewear                 84              78             82            89
Frowning                   78              77             89            91
Harsh Light.               73              69             78            77
Flash                      82              80             87            89
Soft Light.                69              61             74            71
Outdoor                    73              72             79            82
Curly Hair                 66              68             75            76
F.V. Forehead              75              73             79            83
P.V. Forehead              80              83             81            88
Obs. Forehead              77              80             82            87
Eyes Open                  80              74             70            76
Mouth Closed               74              69             82            84
Teeth N.V.                 74              71             86            87
Round Jaw                  80              77             80            82
Square Face                82              80             80            83
Round Face                 80              78             77            84
Color Photo                81              77             90            91
Posed Photo                66              64             72            75
Shiny Skin                 81              80             80            84
Strong N. Lines            73              73             85            87
Flushed Face               75              69             79            83
Brown Eyes                 67              63             71            75
Average                    77              75             82            85

Table 2. Performance comparison (accuracy, %) of FaceTracer [1], PANDA-w [3], PANDA-l [3] and LNets+ANet on LFWA+.

[Figure 2 visualizes how the 40 face attributes are clustered into six groups, labeled Group #1 to Group #6.]

Figure 2. Attribute grouping.

[Figure 3 plots attribute prediction accuracy against training set size (1k, 3k and 10k images) for LNets+ANet and PANDA-l on CelebA, LFWA and LFWA+.]

Figure 3. Performance for different sizes of training dataset.

We manually label another 30 attributes on LFW and denote this extended dataset as LFWA+. Table 2 reports the attribute prediction results. LNets+ANet outperforms the other three approaches (FaceTracer [1], PANDA-w [3] and PANDA-l [3]) by 8, 10 and 3 percent on average, respectively. This demonstrates that our method learns discriminative face representations and has good generalization ability.

Size of Training Dataset We compare the attribute prediction accuracy of the proposed method with that of PANDA-l for different sizes of training dataset. Fig. 3 demonstrates that our method still performs well when the dataset size is small, whereas the performance of PANDA-l drops significantly.

References

[1] N. Kumar, P. Belhumeur, and S. Nayar. FaceTracer: A search engine for large collections of images with faces. In ECCV, pages 340–353, 2008.

[2] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.

[3] N. Zhang, M. Paluri, M. Ranzato, T. Darrell, and L. Bourdev. PANDA: Pose aligned networks for deep attribute modeling. In CVPR, 2014.


Figure 4. More examples of LNet response maps.