
Families in the Wild (FIW): Large-Scale Kinship Image Database and Benchmarks

Joseph P. Robinson, Ming Shao, Yue Wu, Yun Fu

Dept. of Electrical and Computer Engineering

Northeastern University

Boston, MA, USA

{jrobins1, mingshao, yuewu, yunfu}@ece.neu.edu

ABSTRACT

We present the largest kinship recognition dataset to date, Families in the Wild (FIW). Motivated by the lack of a single, unified dataset for kinship recognition, we aim to provide a dataset that captivates the interest of the research community. With only a small team, we were able to collect, organize, and label over 10,000 family photos of 1,000 families with our annotation tool, designed to mark complex hierarchical relationships and local label information in a quick and efficient manner. We include several benchmarks for two image-based tasks, kinship verification and family recognition. For this, we incorporate several visual features and metric learning methods as baselines. Also, we demonstrate that a pre-trained Convolutional Neural Network (CNN) used as an off-the-shelf feature extractor outperforms the other feature types. Results were then further boosted by fine-tuning two deep CNNs on FIW data: (1) for kinship verification, a triplet loss function was learned on top of the pre-trained network weights; (2) for family recognition, a family-specific softmax classifier was added to the network.

1. INTRODUCTION

Automatic kinship recognition in visual media is essential for many real-world applications: e.g., kinship verification [6, 8, 11, 26, 29, 30, 31, 32], automatic photo library management [21, 27], historic lineage and genealogical studies [2], social-media analysis [10], along with many security applications addressing missing persons, human trafficking, crime scene investigations, and even our overall human sensing capabilities, ultimately enhancing surveillance systems used both in real-time (e.g., the vBOLO [28] mission^{1,2}) and offline (e.g., searching for a subject in a large gallery [3, 23]). Thus, a gallery of imagery annotated with rich family information should lead to more powerful multimedia retrieval tools and complement many existing facial recognition systems (e.g., FBI's NGI^3).

1 vBOLO: joint effort of two DHS Centers of Excellence, ALERT & VACCINE (http://www.northeastern.edu/alert)
2 https://web-oup.s3-fips-us-gov-west-1.amazonaws.com/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

MM '16, October 15-19, 2016, Amsterdam, Netherlands
© 2016 ACM. ISBN 978-1-4503-3603-1/16/10...$15.00

DOI: http://dx.doi.org/10.1145/2964284.2967219

Figure 1: Sample faces for the 11 relationship types in FIW. Parent-child (top row): Father-Daughter (F-D), Father-Son (F-S), Mother-Daughter (M-D), Mother-Son (M-S). Grandparent-grandchild (middle row): same labeling convention as above. Siblings (bottom row): Sister-Brother (SIBS), Brother-Brother (B-B), Sister-Sister (S-S).

However, even after several years (i.e., since 2010 [8]), there are only a few vision systems capable of handling such tasks. Hence, kin-based technology in the visual domain has yet to truly evolve from research to reality.

We believe the reason that kinship recognition technology has not yet advanced to real-world applications is two-fold:

1. Current image datasets available for kinship tasks are not large enough to reflect the true data distributions of a family and its members.

2. Visual evidence for kin relationships is less discriminant than that of class types in other, more conventional machine vision problems (e.g., facial recognition or object classification), as many hidden factors affect the similarities of facial appearances amongst family members.

To that end, we introduce a large-scale image dataset for kinship recognition called Families in the Wild (FIW). To the best of our knowledge, FIW is by far the largest and most comprehensive kinship dataset available in the vision and multimedia communities (see Table 1). FIW includes 11,193 unconstrained family photos of 1,000 families, which is nearly 10x more than the next largest, Family101 [7]. Also, it is from these 1,000 families that 418,060 image pairs were extracted for the 11 relationship types (see Figure 1 and Table 2).

Thus far, attempts at image-based kinship recognition have focused on facial features, as is the case for this work as well.

3 https://www.eff.org/2014/fbi-to-have-52M-face-photos

Figure 2: Method used to construct FIW. Data Collection: a list of candidate families (each with a unique FID) and family photos (each with a unique PID) is collected. Data Annotation: a labeling tool optimized the process of marking the complex hierarchical structure of the 1,000 family trees of FIW; each family member is assigned a member ID (MID), and a per-family relationship matrix encodes pairwise relations (1-Parent of; 2-Spouse of; 4-Child of). Data Parsing: the two sets of labels generated by the tool are post-processed to partition the data for kinship verification and family recognition.

Kinship recognition is typically conducted in one or more of the following modes: (1) kinship verification, a binary classification problem, i.e., determining whether or not two or more persons are blood-related (kin or non-kin); (2) family recognition, a multi-class classification problem where the aim is to determine the family an individual in a photo belongs to; (3) kinship identification, a fine-grained categorization problem where the aim is to determine the type of relationship shared between two or more people. We focus on the first two modes in this paper, kinship verification and family recognition, which we briefly discuss next.

Kinship Verification. Previous efforts mainly focused on the 4 parent-child pair types. As research in both psychology and computer vision has revealed, different kin relations render different familial features, so the 4 kin relations are usually modeled independently. Hence, it is important to have more kin relationship types accessible. Thus, FIW provides 11 pairwise types (see Figure 1): 7 existing types (i.e., parent-child and siblings), but in sample sizes scaling up 10^5x larger; and 4 types offered to the research community for the first time (i.e., grandparent-grandchild). Also, existing datasets contain, at most, only a couple of hundred pairs per category (i.e., 1,000 in total). Such an insufficient amount leads to models that overfit the training data; hence, existing models do not generalize well to new, unseen test data. FIW now makes 418,060 pairs available.

Family Recognition. This is a challenging task that grows more difficult as the number of families increases, since families contain large intra-class variations, often with overlaps between classes. As with conventional facial recognition, when the targets are unconstrained faces in the wild [12] (i.e., varying in pose, illumination, expression, and scene), the level of difficulty further increases, and the same holds for kinship recognition. Thus, FIW poses the realistic challenges that must be overcome before transitioning to real-world applications.

Contributions. We make three distinct contributions:

1. We introduce the largest visual kinship dataset to date, Families in the Wild (FIW).^4 FIW is complete with rich label information for 1,000 family trees, from which 11 kin relationship types were extracted in numbers that are orders of magnitude larger than other datasets. This was made possible with an efficient annotation tool and procedure (Section 2).

4 FIW will be available upon publication of this paper.

2. We provide several benchmarks on FIW for both kinship verification and family recognition, including various low-level features, metric learning methods, and pre-trained Convolutional Neural Network (CNN) models. See Section 3 for experimental settings and results.

3. We fine-tune two CNNs: one with a triplet-loss layer on top, and the other with a softmax loss. Both yield a significant boost in performance over all other benchmarks for both tasks (Section 3.2).

2. FAMILIES IN THE WILD

We now discuss the procedure followed to collect, organize, and label 11,193 family photos of 1,000 families with minimal manual labor. Then, we statistically compare FIW with other related datasets.

2.1 Building FIW

The goal for FIW was to collect approximately 10 photos for each of 1,000 families, each family with at least 3 members. We now summarize the method for achieving this as a three-step process, which is visually depicted in Figure 2.

Step 1: Data Collection. A list of over 1,000 candidate families was made. To ensure diversity, we targeted groups of public figures worldwide by searching for <ethnicity OR country> AND <occupation> online (e.g., MLB [Baseball] Players, Brazilian Politicians, Chinese Actors, Denmark + Royal Family). Family photos were then collected using various search engines (e.g., Google, Bing, Yahoo) and social media outlets (e.g., Pinterest) to widen the search space. Those with at least 3 family members and 8 family photos were added to FIW under an assigned Family ID (FID).

Step 2: Data Annotation. We developed a labeling tool to quickly annotate a large corpus of family photos. All photos for a given FID are labeled sequentially. Labeling is done by clicking a family member's face; a face detector then initializes a resizable box around the face. Faces unseen by the detector are discarded, as these are assumed to be poorly resolved. The tool then prompts for the name of the member via a drop-down menu. The option new adds a member to a family under a unique member ID (MID), prompting for the name, gender, and relationship types shared with others previously added to the current family (FID). From there onward, labeling family members is just a matter of clicking.
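The paper does not name the face detector used by the annotation tool. As a rough illustration of the detect-then-adjust workflow, the sketch below uses dlib [14], which the paper cites for face normalization in Step 3; the detector choice and the image path are assumptions.

```python
import cv2
import dlib

# Illustrative only: dlib's frontal face detector stands in for the unnamed
# detector in the annotation tool. Faces it misses would be discarded.
detector = dlib.get_frontal_face_detector()

def initial_face_boxes(image_path):
    """Return one (left, top, right, bottom) box per detected face."""
    img = cv2.imread(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    return [(r.left(), r.top(), r.right(), r.bottom()) for r in detector(rgb, 1)]

# Example: boxes = initial_face_boxes("F0703/photo_1.jpg")  # hypothetical path
```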


Table 1: Comparison of FIW with related datasets.

Dataset             No. Families   No. People   No. Faces   Age Varies   Family Trees
CornellKin [8]      ✗              300          300         ✗            ✗
UBKinFace [20, 25]  ✗              400          600         ✓            ✗
KFW-I [16]          ✗              1,066        1,066       ✗            ✗
KFW-II [16]         ✗              2,000        2,000       ✗            ✗
TSKinFace [18]      ✗              2,589        ✗           ✓            ✓
Family101 [7]       101            607          14,816      ✓            ✓
FIW (Ours)          1,000          10,676       30,725      ✓            ✓

Step 3: Dataset Parsing. Two sets of labels are generated in Step 2: (1) image-level, containing names and corresponding facial locations; (2) family-level, containing a relationship matrix that represents the entire family tree. The relationship matrices are then referenced to generate lists of member pairs for the 11 relationship types. Next, face detections are normalized and cropped with [14], then stored according to the previously assigned FID → MID hierarchy. Lastly, lists of image pairs are generated for both kinship verification and family recognition.
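To make the parsing step concrete, below is a minimal sketch of turning a per-family relationship matrix into kin-pair lists. The CSV layout, the 0-based member indexing, and the entry encoding (1 = parent of, 2 = spouse of, 4 = child of, taken from the tool's legend in Figure 2) are assumptions for illustration, not a specification of the released format.

```python
import csv
import itertools

# Assumed encoding of the relationship matrix (illustrative only):
# entry R[i][j]: 1 = member i is a parent of member j, 2 = spouse, 4 = child.
PARENT, SPOUSE, CHILD = 1, 2, 4

def load_relationship_matrix(path):
    """Read a square relationship matrix from a hypothetical per-FID CSV."""
    with open(path, newline="") as f:
        return [[int(v) for v in row] for row in csv.reader(f)]

def kin_pairs(matrix):
    """Yield (mid_a, mid_b, relation) tuples for parent-child and sibling pairs."""
    n = len(matrix)
    for a, b in itertools.combinations(range(n), 2):
        if matrix[a][b] == PARENT:      # a is a parent of b
            yield a, b, "parent-child"
        elif matrix[a][b] == CHILD:     # a is a child of b
            yield b, a, "parent-child"
    # Siblings: two members sharing at least one parent.
    parents = [{p for p in range(n) if matrix[p][c] == PARENT} for c in range(n)]
    for a, b in itertools.combinations(range(n), 2):
        if parents[a] & parents[b]:
            yield a, b, "siblings"

if __name__ == "__main__":
    matrix = load_relationship_matrix("F0703.csv")   # hypothetical family file
    for pair in kin_pairs(matrix):
        print(pair)
```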

2.2 Database Statistics

Our FIW dataset far outdoes its predecessors in terms of quantity, quality, and purpose. FIW contains 11,193 family photos of 1,000 different families. There are about 10 images per family that include at least 3 and as many as 24 family members. We compare FIW to related datasets in Tables 1 and 2. Clearly, FIW provides more families, identities, facial images, relationship types, and labeled pairs; the pair count of FIW is orders of magnitude larger than the next largest (i.e., KFW-II).

3. EXPERIMENTS ON FIW

In this section, we first discuss the visual features and related methods used to benchmark the FIW dataset. We then report and review all benchmark results. Finally, we discuss the two methods used to fine-tune the pre-trained CNN model: (1) training a triplet loss for kinship verification and (2) learning a softmax classifier for family recognition. Both obtain top scores in their respective tasks.

3.1 Features and Related Methods

All features and methods covered here were used to benchmark FIW. First, we review handcrafted features, Scale Invariant Feature Transformation (SIFT) and Local Binary Patterns (LBP), which are both widely used in kinship verification [16] and facial recognition [19]. Next, we introduce VGG-Face, the pre-trained CNN model used here as an off-the-shelf feature extractor. Lastly, we review related metric learning methods.

SIFT [15] features have been widely applied in object and face recognition. As done in [16], we resized all facial images to 64 × 64, and set the block size to 16 × 16 with a stride of 8. Thus, there were a total of 49 blocks for each image, yielding a feature vector of length 128 × 49 = 6,272D.
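Below is a sketch of this block-wise SIFT setup using OpenCV. The grid follows the block size and stride above, while the keypoint size and the use of cv2.SIFT_create are assumptions (the paper does not specify an implementation).

```python
import cv2

def block_sift(gray, img_size=64, block=16, stride=8):
    """Dense SIFT on a fixed grid: 7 x 7 = 49 blocks -> 49 * 128 = 6,272-D vector."""
    gray = cv2.resize(gray, (img_size, img_size))
    sift = cv2.SIFT_create()
    # One keypoint per block, centered in the block; 'size' sets the descriptor
    # support region (an assumed choice).
    keypoints = [cv2.KeyPoint(x + block / 2, y + block / 2, block)
                 for y in range(0, img_size - block + 1, stride)
                 for x in range(0, img_size - block + 1, stride)]
    _, descriptors = sift.compute(gray, keypoints)
    return descriptors.reshape(-1)          # 6,272-D feature vector

# Example: feature = block_sift(cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE))
```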

LBP [1] has been frequently used for texture analysis and face recognition, as it describes the appearance of an image in a small, local neighborhood around a pixel. Once again, we followed the feature settings of [16] by first resizing each facial image to 64 × 64, and then extracting LBP features from 16 × 16 non-overlapping blocks with a radius of 2 pixels and the number of neighbors (i.e., samples) set to 8. We then generated a 256D histogram from each block, yielding a final feature vector of length 256 × 16 = 4,096D.
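A minimal sketch of the block-wise LBP descriptor described above, using scikit-image; the binning details are assumptions where [16] is not explicit.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.transform import resize

def block_lbp(gray, img_size=64, block=16, points=8, radius=2):
    """LBP codes per pixel, then a 256-bin histogram per 16x16 block -> 4,096-D."""
    gray = resize(gray, (img_size, img_size), anti_aliasing=True)
    codes = local_binary_pattern(gray, P=points, R=radius, method="default")
    hists = []
    for y in range(0, img_size, block):          # 4 x 4 = 16 non-overlapping blocks
        for x in range(0, img_size, block):
            patch = codes[y:y + block, x:x + block]
            hist, _ = np.histogram(patch, bins=256, range=(0, 256))
            hists.append(hist)
    return np.concatenate(hists)                 # 16 blocks * 256 bins = 4,096-D
```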

Table 2: Pair counts for FIW and related datasets.

Type     KFW-II [16]   SiblingFace [10]   GroupFace [10]   Family101 [7]   FIW (Ours)
B-B      ✗             232                40               ✗               86,000
S-S      ✗             211                32               ✗               86,000
SIBS     ✗             277                53               ✗               75,000
F-D      250           ✗                  69               147             45,000
F-S      250           ✗                  69               213             43,000
M-D      250           ✗                  62               148             44,000
M-S      250           ✗                  70               184             37,000
GF-GD    ✗             ✗                  ✗                ✗               410
GF-GS    ✗             ✗                  ✗                ✗               350
GM-GD    ✗             ✗                  ✗                ✗               550
GM-GS    ✗             ✗                  ✗                ✗               750
Total    1,000         720                395              607             418,060

VGG-Face CNN [17] uses a "Very Deep" architecture with very small convolutional kernels (i.e., 3 × 3) and convolutional stride (i.e., 1 pixel). This model was pre-trained on over 2.6 million images of 2,622 celebrity identities. For feature extraction, each face image was resized to 224 × 224 and then fed forward to the second-to-last fully-connected layer (i.e., fc7) of the CNN model, producing a 4,096D feature vector.
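For readers who want an off-the-shelf deep descriptor of this kind, the sketch below uses torchvision's generic VGG-16 as a stand-in, since the VGG-Face weights used in the paper are distributed separately; the layer slicing and preprocessing constants are assumptions.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Stand-in for VGG-Face: torchvision's VGG-16 (ImageNet weights). Converted
# VGG-Face weights would be needed to reproduce the setup in the paper.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
# Keep the classifier up to the second-to-last fully-connected layer (fc7-like output).
feature_head = torch.nn.Sequential(*list(vgg.classifier.children())[:-3])

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_feature(path):
    """Return a 4,096-D descriptor for one face image."""
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        x = vgg.avgpool(vgg.features(x)).flatten(1)
        return feature_head(x).squeeze(0)        # shape: (4096,)
```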

Metric Learning methods are commonly employed in the visual domain, some designed specifically for kinship recognition [5, 16, 18] and some as generic metric learning methods [4]. We chose two representative methods from these two categories to report benchmarks on: Neighborhood Repulsed Metric Learning (NRML) [16] and Information Theoretic Metric Learning (ITML) [4].

3.2 Fine-Tuned CNN Model

Weights of deep networks are typically trained on large amounts of generic source data and then fine-tuned on target data; this leverages a wider, more readily available source domain that resembles the target in modality, view, or both [9]. Following this notion, and motivated by recent success with deep learning on faces [17, 22, 24], we fine-tuned the VGG-Face model, improving results for both kinship verification and family recognition.

Kinship Verification. For each fold, we select the families that have more than 10 images in the remaining four folds; 90% of each family's images are used to fine-tune the model and the rest for validation. An average of 8,295 images are selected per fold. The experimental results can be found in Table 3 and Figure 3; these are the best results on FIW so far. Specifically, we removed the last fully-connected layer, which is used to identify the 2,622 people, and employed a triplet loss [19] as the loss function. The second-to-last fully-connected layer was the only non-frozen layer of the original CNN model, with an initial learning rate of 10^-5 that decreased by a factor of 10 every 700 iterations (out of 1,400). Batch size was 128 images, and other network settings were the same as in the original VGG-Face model. Training was done on a single GTX Titan X with about 10GB of GPU memory. Fine-tuning was done using the renowned Caffe [13] framework.
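The paper fine-tunes in Caffe; the following is a simplified PyTorch sketch of the same recipe, i.e., freeze everything except the second-to-last fully-connected layer and train it with a triplet margin loss. The backbone (generic VGG-16 in place of VGG-Face), the margin value, and the layer indices are assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stand-in backbone; the paper fine-tunes VGG-Face. Drop the final identity
# layer (2,622-way in VGG-Face; 1,000-way here) so the network outputs embeddings.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
model.classifier = nn.Sequential(*list(model.classifier.children())[:-1])

for p in model.parameters():
    p.requires_grad = False
for p in model.classifier[3].parameters():       # second-to-last FC layer stays trainable
    p.requires_grad = True

criterion = nn.TripletMarginLoss(margin=0.2)     # margin value is assumed
optimizer = torch.optim.SGD(model.classifier[3].parameters(), lr=1e-5)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=700, gamma=0.1)

def train_step(anchor, positive, negative):
    """One update on a batch (the paper uses 128 images per batch)."""
    loss = criterion(model(anchor), model(positive), model(negative))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```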


Figure 3: Relationship-specific ROC curves (true positive rate vs. false positive rate) depicting the performance of each method.

Table 3: Verification accuracy scores (%) for 5-fold experiment on FIW. No family overlap between folds.

Method   F-D   F-S   M-D   M-S   SIBS   B-B   S-S   GF-GD   GF-GS   GM-GD   GM-GS   Avg.

SIFT 56.1 56.5 56.4 55.3 58.7 50.3 57.4 59.3 66.9 60.4 56.9 57.7±4.0

LBP 55.0 55.3 55.4 55.9 57.1 56.8 55.8 58.5 59.1 55.6 60.1 56.8±1.7

VGG-Face 64.4 63.4 66.2 64.0 73.2 71.5 70.8 64.4 68.6 66.2 63.5 66.9±3.5

Fine-Tuned CNN 69.4 68.2 68.4 69.4 74.4 73.0 72.5 72.9 72.3 72.4 68.3 71.0±2.3

Family Recognition. As visual kinship recognition is essentially tied to facial features, we froze the weights in the lower levels of the VGG-Face network and replaced the final layer with a new softmax layer to classify the 316 families. Other settings are the same as those used for kinship verification.
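A corresponding sketch for family recognition: freeze the lower layers and swap in a 316-way classification head trained with softmax cross-entropy. Again, the generic VGG-16 backbone and the layer index are stand-ins for the actual VGG-Face setup.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_FAMILIES = 316

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.parameters():
    p.requires_grad = False                           # freeze the lower layers
model.classifier[6] = nn.Linear(4096, NUM_FAMILIES)   # new family-specific head (trainable)

criterion = nn.CrossEntropyLoss()                     # softmax + log-loss over 316 families
optimizer = torch.optim.SGD(model.classifier[6].parameters(), lr=1e-5)
```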

3.3 Experimental Settings

In this section, we provide benchmarks on FIW using all features and related methods mentioned in the previous section. The dimensionality of each feature is reduced to 100D using PCA. Experiments were done following a 5-fold cross-validation protocol; each fold was of equal size and with no family overlap between folds.

Experiment 1: Kinship Verification. We randomly select an equal number of positive and negative pairs for each fold and family. Cosine similarity is computed for each pair in the test fold. The average verification rate over all folds is reported in Table 3, showing that kinship verification is a challenging task. While some relations are relatively easy to recognize (e.g., B-B, SIBS, and S-S using SIFT, LBP, and VGG-Face features), results for other relations such as parent-child are still below 70.0%. Clearly, VGG-Face features perform much better than the hand-crafted features. Notice that grandparent-grandchild pairs typically have higher accuracies than parent-child pairs, which we believe is due to the differences in sample sizes. We also compare with the state-of-the-art metric learning methods NRML and ITML (see Figure 3), which improve the scores of the low-level features but are still outperformed by the CNN model fine-tuned on FIW.
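A sketch of this verification protocol (PCA to 100D, then cosine similarity per pair); the array shapes and the idea of picking the decision threshold on the training folds are assumptions about details the paper leaves unspecified.

```python
import numpy as np
from sklearn.decomposition import PCA

def cosine_scores(train_feats, pairs, dim=100):
    """pairs: (N, 2, D) array of feature pairs; returns one similarity per pair."""
    pca = PCA(n_components=dim).fit(train_feats)      # features reduced to 100-D
    a, b = pca.transform(pairs[:, 0]), pca.transform(pairs[:, 1])
    return np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

def verification_accuracy(scores, labels, threshold):
    """labels: 1 = kin, 0 = non-kin; threshold would be chosen on the training folds."""
    return float(np.mean((scores > threshold).astype(int) == labels))
```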

Table 4: Family recognition accuracy scores (%) for the 5-fold experiment on FIW (316 families). No family overlap between folds.

Method       Fold 1   Fold 2   Fold 3   Fold 4   Fold 5   Avg.
VGG          9.6      14.5     11.6     12.7     13.1     12.3±1.8
Fine-tuned   10.9     14.8     12.5     14.8     13.5     13.3±1.6

Experiment 2: Family Recognition. We again follow the 5-fold cross-validation protocol with no family overlap. We used families with 6 or more members, taking from each the 5 members with the most images. The results in Table 4 are from 316 families with 7,772 images. Each fold was made up of one member from each family. A multi-class SVM (i.e., one-vs-rest) was used to model the VGG-Face features for each family. We then improved the top-1 classification accuracy from 12.3±1.8% (VGG-Face) to 13.3±1.6% (our fine-tuned model).
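A sketch of the one-vs-rest SVM baseline on deep features; the regularization constant and the scikit-learn implementation are assumptions.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

def family_classifier(X_train, y_train):
    """X_train: (n_faces, 4096) deep features; y_train: family labels (0..315)."""
    clf = OneVsRestClassifier(LinearSVC(C=1.0, max_iter=10000))  # one SVM per family
    return clf.fit(X_train, y_train)

def top1_accuracy(clf, X_test, y_test):
    return float(np.mean(clf.predict(X_test) == y_test))
```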

4. DISCUSSION

We introduced a large-scale dataset of family photos captured in natural, unconstrained environments (i.e., Families in the Wild). An annotation tool was designed to quickly generate rich label information for over 10,000 photos of 1,000 families. Emphasis was put on diversity (i.e., families worldwide), data distribution (i.e., at least 3 members and 8 photos per family), sample sizes (i.e., multiple instances of each member, at various ages), quality (i.e., only faces seen by the detector), and quantity (i.e., much more data and new relationship types).

There are many interesting directions for future work on our dataset. We currently only verify whether a pair of images is kin or non-kin; also predicting the kin relationship type could add value and practical usefulness, bringing us closer to fine-grained categorization of entire family trees. FIW will be an ongoing effort that will continually grow and evolve. See the project page for downloads, updates, and to learn more about FIW.


Acknowledgements

This material is based upon work supported by the U.S. Department of Homeland Security, Science and Technology Directorate, Office of University Programs, under Grant Award 2013-ST-061-ED0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the U.S. Department of Homeland Security.

We would also like to thank all members of SMILE Lab who helped with the process of collecting and annotating the FIW dataset.

5. REFERENCES

[1] T. Ahonen, A. Hadid, and M. Pietikainen. Face description with local binary patterns: Application to face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 28(12):2037-2041, 2006.
[2] M. Almuashi, S. Z. M. Hashim, D. Mohamad, M. H. Alkawaz, and A. Ali. Automated kinship verification and identification through human facial images: a survey. Multimedia Tools and Applications, pages 1-43, 2015.
[3] N. Crosswhite, J. Byrne, O. M. Parkhi, C. Stauffer, Q. Cao, and A. Zisserman. Template adaptation for face verification and identification. arXiv preprint arXiv:1603.03958, 2016.
[4] J. V. Davis, B. Kulis, P. Jain, S. Sra, and I. S. Dhillon. Information-theoretic metric learning. In Proc. International Conference on Machine Learning (ICML), pages 209-216. ACM, 2007.
[5] A. Dehghan, E. Ortiz, R. Villegas, and M. Shah. Who do I look like? Determining parent-offspring resemblance via gated autoencoders. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1757-1764, 2014.
[6] H. Dibeklioglu, A. Salah, and T. Gevers. Like father, like son: Facial expression dynamics for kinship verification. In Proc. IEEE International Conference on Computer Vision (ICCV), pages 1497-1504, 2013.
[7] R. Fang, A. Gallagher, T. Chen, and A. Loui. Kinship classification by modeling facial feature heredity. In 20th IEEE International Conference on Image Processing (ICIP), pages 2983-2987. IEEE, 2013.
[8] R. Fang, K. D. Tang, N. Snavely, and T. Chen. Towards computational models of kinship verification. In IEEE International Conference on Image Processing (ICIP), pages 1577-1580. IEEE, 2010.
[9] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR, abs/1311.2524, 2013.
[10] Y. Guo, H. Dibeklioglu, and L. van der Maaten. Graph-based kinship recognition. In International Conference on Pattern Recognition (ICPR), pages 4287-4292, 2014.
[11] J. Hu, J. Lu, J. Yuan, and Y.-P. Tan. Large Margin Multi-metric Learning for Face and Kinship Verification in the Wild, pages 252-267. Springer International Publishing, Cham, 2015.
[12] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
[13] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[14] D. E. King. Dlib-ml: A machine learning toolkit. Journal of Machine Learning Research, 10:1755-1758, 2009.
[15] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.
[16] J. Lu, X. Zhou, Y.-P. Tan, Y. Shang, and J. Zhou. Neighborhood repulsed metric learning for kinship verification. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(2):331-345, 2014.
[17] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In Proc. British Machine Vision Conference (BMVC), 2015.
[18] X. Qin, X. Tan, and S. Chen. Tri-subject kinship verification: Understanding the core of a family. CoRR, abs/1501.02555, 2015.
[19] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 815-823, 2015.
[20] M. Shao, S. Xia, and Y. Fu. Genealogical face recognition based on UB KinFace database. In CVPR 2011 Workshops, pages 60-65. IEEE, 2011.
[21] M. Shao, S. Xia, and Y. Fu. Identity and kinship relations in group pictures. In Human-Centered Social Media Analytics, pages 175-190. Springer, 2014.
[22] Y. Sun, X. Wang, and X. Tang. Deep learning face representation from predicting 10,000 classes. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1891-1898, 2014.
[23] D. Wang, C. Otto, and A. K. Jain. Face search at scale: 80 million gallery. CoRR, abs/1507.07242, 2015.
[24] M. Wang, Z. Li, X. Shu, J. Tang, et al. Deep kinship verification. In IEEE 17th International Workshop on Multimedia Signal Processing (MMSP), pages 1-6. IEEE, 2015.
[25] S. Xia, M. Shao, and Y. Fu. Kinship verification through transfer learning. In Proc. International Joint Conference on Artificial Intelligence (IJCAI), volume 22, pages 2539-2544, 2011.
[26] S. Xia, M. Shao, and Y. Fu. Toward kinship verification using visual attributes. In 21st International Conference on Pattern Recognition (ICPR), pages 549-552. IEEE, 2012.
[27] S. Xia, M. Shao, J. Luo, and Y. Fu. Understanding kin relationships in a photo. IEEE Transactions on Multimedia, 14(4):1046-1056, 2012.
[28] F. Xiong, M. Gou, O. Camps, and M. Sznaier. Person re-identification using kernel-based metric learning methods. In European Conference on Computer Vision (ECCV), pages 1-16. Springer, 2014.
[29] H. Yan, J. Lu, W. Deng, and X. Zhou. Discriminative multimetric learning for kinship verification. IEEE Transactions on Information Forensics and Security, 9(7):1169-1178, 2014.
[30] H. Yan, J. Lu, and X. Zhou. Prototype-based discriminative feature learning for kinship verification. IEEE Transactions on Cybernetics, 45(11):2535-2545, 2015.
[31] X. Zhou, J. Hu, J. Lu, Y. Shang, and Y. Guan. Kinship verification from facial images under uncontrolled conditions. In Proc. ACM International Conference on Multimedia, pages 953-956. ACM, 2011.
[32] X. Zhou, J. Lu, J. Hu, and Y. Shang. Gabor-based gradient orientation pyramid for kinship verification under uncontrolled environments. In Proc. ACM International Conference on Multimedia, pages 725-728. ACM, 2012.