16
Deep Convolutional Neural Network using Triplet of Faces, Deep Ensemble, and Score- level Fusion for Face Recognition Bong-Nam Kang*, Yonghyun Kim , and Daijin Kim * Dept. of Creative IT Engineering, Dept. of Computer Science & Engineering {bnkang, gkyh0805, dkim}@postech.ac.kr IEEE 2017 Conference on Computer Vision and Pattern Recognition

Deep Convolutional Neural Network using Triplet of Faces

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

DeepConvolutionalNeuralNetworkusingTripletofFaces,DeepEnsemble,andScore-

levelFusionforFaceRecognition

Bong-NamKang*,Yonghyun Kim†,andDaijin Kim†

*Dept.ofCreativeITEngineering,†Dept.ofComputerScience&Engineering{bnkang,gkyh0805,dkim}@postech.ac.kr

IEEE 2017 Conference on Computer Vision and Pattern

Recognition

Introduction• Facerecognitioninunconstrainedenvironmentsisverychallengingproblem.§ Inter-classvariations

• Facialposes,expressions,andilluminationchangescauseproblemstomisidentifyfacesofdifferentidentitiesasthesameidentity.

§ Intra-classvariations

• Suchvariationswithinthesameidentitycouldoverwhelmthevariationsduetoidentitydifferencesandmakefacerecognitionchallenging.

2

HughGrant PhilMickelson

Sameperson?

HopeSoloJenniferCarpenter

• TrainingforFeatures

• Test

Overview

3

Deep Neural

Network (DNN)

Feature LearningTriplets of Faces Discriminative Features

Multiple DNNs

TL-Joint Bayesian Classifier

Sameor

Not?

Feature Extraction ClassificationInput Face Images

• JointLossFunctionforFeatureLearning

• TripletLoss𝑳𝒕𝒓𝒊𝒑𝒍𝒆𝒕𝒔

§ Theoutputofnetworkisrepresentedby𝐹 𝐼 ∈ 𝑅-.§ 𝑚 isamargin:definetheminimumratiobetweenthenegativepairsandthepositive

pairsintheEuclideanspace

DiscriminativeFeatureLearning(1/2)

4

𝑳𝒕𝒐𝒕𝒂𝒍 = 𝑳𝒕𝒓𝒊𝒑𝒍𝒆𝒕𝒔 + 𝑳𝒑𝒂𝒊𝒓𝒔 + 𝑳𝒊𝒅𝒆𝒏𝒕𝒊𝒕𝒚

𝑳𝒕𝒓𝒊𝒑𝒍𝒆𝒕𝒔 = 𝑚𝑎𝑥 0, 1 −𝐹 𝐼> − 𝐹 𝐼? @

𝐹 𝐼> − 𝐹 𝐼A @ + 𝑚

• PairwiseLoss𝑳𝒑𝒂𝒊𝒓𝒔§ Minimizetheabsolutedistancesbetweenthepositivedatainthetriplets𝑇.

• Identityloss𝑳𝒊𝒅𝒆𝒏𝒕𝒊𝒕𝒚

§ Usenegativelog-likelihoodlosswithsoftmax.§ Reflectcharacteristicsforeachidentity.§ Encouragetheseparabilityoffeatures

DiscriminativeFeatureLearning(2/2)

5

𝑳𝒑𝒂𝒊𝒓𝒔 = C 𝐹 𝐼> − 𝐹 𝐼A @@

(FG,FH)∈∀K

𝑳𝒊𝒅𝒆𝒏𝒕𝒊𝒕𝒚 = −Clog𝒆𝑭𝒊(𝑰𝒊)

∑ 𝒆𝑭𝒋(𝑰𝒊)𝒏𝒋S𝟏

𝒎

𝒊S𝟏

DescriptionusingDeepEnsemble(1/2)• FeaturesforEnsemble

§ InconventionalapplicationsofDCNN,theoutputofthelastfullyconnectedlayerf2 isusedonlyasafeature.

§ UseDNNfeaturestakenfromf1 andf2 fullyconnectedlayers.èMulti-scalefeatureeffect

6

Net.#1

Net.#2

f2

f1

f1 f2

DescriptionusingDeepEnsemble(2/2)• DeepEnsemble

7

Feature extraction from multiple Neural Networks and deep ensemble generation

ConcatenationDimensionality reduction by

PCA

Fusion• Score-levelfusion

§ UsesimilaritiesDCNNensembleandsimilaritiesofhigh-dim.LBPasfeatures.§ UseSupportVectorMachine(SVM)asaclassifier(recognizer).

8

ExperimentalResults(1/2)• Trainingdata

§ 4,048subjectswithmorethanequal10images(198,018).

§ 396,036faceimages(horizontalflipped)areusedtogenerateabout4M tripletsoffacesfortraining.

• Testdata- LFW(LabeledFacesintheWild)§ Eachof10foldersconsistsof300intra

pairsand300extrapairs(total:6,000pairs).

§ 10-foldcrossvalidation.

9

ExperimentalResults(2/2)• ResultsonLFW

10

Method No.ofimages

No. ofDNNs Feature dim. Accuracy(%)

Human - - - 97.53JointBayesian 99,773 - 8,000 92.42

Fisher vectorface N/A - 256 93.03Tom-vs-Peteclassifier 20,639 - 5,000 93.30

High-dim.LBP 99,773 - 2,000 95.17TL-JointBayesian 99,773 - 2,000 96.23

DeepFace 4M 9 4,096 x 4 97.25DeepID 202,599 120 150 (PCA) 97.45DeepID3 300,000 50 300 x 100 99.53FaceNet 200M 1 128 99.63

LearningfromScratch 494,414 2 320 97.73ProposedMethod(+JointBayesian) 198,018 4 1,024

(PCA) 96.23

ProposedMethod(+TL-JointBayesian) 198,018 4 1,024 (PCA) 98.33

ProposedMethod(Fusion) 198,018 4 6 99.08

Discussion&Conclusion

• JointLossFunctiontolearnadiscriminativefeatureiseffective

• Theproposedmethodismoreefficient§ Smallnumberofdata- only198,018trainingimages§ Only4different deepnetworkmodelsused§ Accuracy:99.08%(Score-levelFusion)

• Theproposedmethodisusefulwhen§ Theamountoftrainingdataisinsufficienttotraindeepneuralnetworks.

11

Thankyou!!!

12

Appendix

13

• Multi-scaleConvolutionLayerBlock(MCLB)§ Consistsof1x1,3x3,5x5convolutionlayers,and3x3maxpoolinglayer.

DeepConvolutionNeuralNetworkArchitecture

14

• StructureofDeepConvolutionNeuralNetwork§ ConstructedbystackingMCLBsontopofeachother(24layersdeep).§ AllconvolutionsandfullyconnectedlayersusetheReLU non-linearactivation.§ Averagepoolingtakestheaverageofeachfeaturemapandsumsoutthespatial

information.§ Dropoutisonlyappliedtothelastfullyconnectedlayerforregularization.

• Aftertrainingwithonlytripletloss,weobservedthattherangeofdistancesbetweeneachpairdatawasnotwithinthecertainrange.§ Althoughtheratioofthedistanceswaswithinthecertainrange,therangeoftheabsolutedistanceswasnotwithinthecertainrange.

DiscriminativeFeatureLearning

15

1.86

1.86

Ratiosofdistancesaresame.

Ifdistance≤ 1.25,positive(sameidentity).

Thesetwoimagearewithsameidentity.Acceptable?

ExperimentalResults• ResultsofJointLossFunctiononValidationSet

§ 55,747faceimagesareusedasavalidationset.

16

Accuracy(%) Error reduction

DNN+𝐿Z-[\]Z]^ (baseline) 88.17 -

DNN+𝐿]_Z`a[] + 𝐿Z-[\]Z]^ 91.32 26.62%

DNN+𝐿]_Z`a[] +𝐿`bZ_c + 𝐿Z-[\]Z]^ 93.45 44.63%