
1046 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 14, NO. 4, AUGUST 2012

Understanding Kin Relationships in a Photo
Siyu Xia, Ming Shao, Student Member, IEEE, Jiebo Luo, Fellow, IEEE, and Yun Fu, Senior Member, IEEE

Abstract—There is an urgent need to organize and manage images of people automatically due to the recent explosion of such data on the Web in general and in social media in particular. Beyond face detection and face recognition, which have been extensively studied over the past decade, perhaps the most interesting aspect of human-centered images is the relationship of the people in the image. In this work, we focus on a novel solution to the latter problem, in particular kin relationships. To this end, we constructed two databases: the first, named UB KinFace Ver2.0, consists of images of children, their young parents, and their old parents; the second is named FamilyFace. Next, we develop a transfer subspace learning based algorithm to reduce the significant differences in the appearance distributions between children's and old parents' facial images. Moreover, by exploring the semantic relevance of the associated metadata, we propose an algorithm to predict the most likely kin relationships embedded in an image. In addition, human subjects are used in a baseline study on both databases. Experimental results show that the proposed algorithms can effectively annotate the kin relationships among people in an image and that semantic context can further improve the accuracy.

Index Terms—Context, face recognition, kinship verification, semantics.

I. INTRODUCTION

WITH the development of technology in the modern multimedia society, image acquisition and storage with digital devices have never been easier than today. Storage units of gigabytes or even terabytes are no longer adequate for the images on the Internet. For example, Facebook, the most popular social network website in the world, already hosts over 20 billion images, with more than 2.5 billion new photos added each month [1]. However, successfully and automatically managing the substantial numbers of images captured by people is a real challenge, since it pushes the computer to its limit of image understanding: it requires both large-scale data analysis and high accuracy. In most cases, people are the focus of images taken by consumers, and managing or organizing such images essentially raises

Manuscript received June 26, 2011; revised December 05, 2011; accepted January 18, 2012. Date of publication February 10, 2012; date of current version July 13, 2012. This work was supported in part by the IC Postdoctoral Research Fellowship under Award 2011-11071400006, AFOSR Award FA9550-12-1-0201, and by Google Faculty Research Awards. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Qi Tian.

S. Xia is with the School of Automation, Southeast University, Nanjing 210096, China (e-mail: [email protected]).

J. Luo is with the Computer Science Department, University of Rochester, Rochester, NY 14627 USA (e-mail: [email protected]).

M. Shao and Y. Fu are with the Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260 USA (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMM.2012.2187436

two problems: 1) who these people are and 2) what their relationships are.

For the first problem, identities are what we are most concerned with, and intuitively faces are critical clues. Face recognition is therefore an important approach to this problem. Basically, face recognition can be categorized into two classes according to whether contextual information is used. Face recognition without context information, namely pairwise face recognition, has been extensively studied during the past decade through the following techniques: face detection and alignment [2], [3], subspace learning [4]–[10], invariant feature extraction [11]–[13], metric learning methods [14], attribute-based classifiers [15], and face synthesis and hallucination [16], [17]. Nevertheless, it remains an open problem due to several practical factors, e.g., illumination, pose, expression, and aging. The performance of face recognition algorithms degrades dramatically when a large-scale database is considered [18]. On the other hand, in practice, faces no longer appear alone, due to the rapid development of multimedia. What often accompanies faces is text, video, and many other forms of metadata. Recently, research attention has been shifting to the contextual information in people-centric images, including locations, capture times of images and patterns of cooccurrence and reoccurrence in images [19], observed social norms and conventional positioning [20], [21], text or other linked information [22], [23], and clothing [24].

The relationship of people in a photo also deserves research attention, in that social connection has been the essence of modern society. The possible relationships in consumer images include "kinship," "friends," "colleagues," etc. Statistically, such relationships are often the main themes of people's albums, and successfully parsing or tagging them leads to better understanding of the images. Among them, kinship is believed to be the most discriminative, since children naturally inherit genes from their parents. An intuitive way to annotate this relationship is by the identities of the individuals the images contain, and in theory this can be achieved using automatic face recognition, the first problem mentioned above. However, the relationship may remain uncertain even when the identities are known. Therefore, direct kinship verification is worth investigating. The pioneering work in [25] and [26] attempted to discriminate kinship based on selected inheritance-invariant features. When kin relationship, gender, and age are known, a family tree can be automatically created from a family image like Fig. 1. Moreover, kinship verification can be applied both to general computer vision problems, such as image retrieval or annotation, and to specific application scenarios, such as finding missing children.

In this paper, as an extension of our previous work [27], [28],

we attempt to tackle the kinship discrimination problem based on faces as well as the semantics embedded in context. This is reasonable because both appearance and semantics are valuable for relationship inference. Recent research [29]–[31] discovers that facial appearance is a cue for genetic similarity, as children resemble their parents more than other adults of the same gender, and that there is differential resemblance between the two parents, depending on the age and gender of the child. Analogously, a critical observation is that faces of parents captured while they were young resemble their children's more closely than images captured when they are old, as Fig. 2 illustrates. This promising observation inspires the following research. First, the UB KinFace Ver2.0 database [28] is set up by collecting child, young–parent, and old–parent face images from the Web. Through this database, the aforementioned genetics-based hypothesis is preliminarily verified. Second, a transfer subspace learning based method is proposed, aiming at reducing the large divergence between the distributions of children and old parents. The key idea is to utilize an intermediate data set close to both the source and target distributions, and the young–parent set is naturally suitable for this task. Third, to exploit the value of contextual information and semantics for kinship, a database called FamilyFace is built, whose images are drawn from social network websites such as Facebook and Flickr. We then propose a strategy to infer the most probable kin relationships embedded in the images. In addition, to conduct both objective and subjective evaluations, a human-based test is introduced as a baseline, and a more systematic comparison is performed as well. The differences between this paper and the previous works [25], [26] lie in the following: 1) we utilize

1520-9210/$31.00 © 2012 IEEE

Fig. 1. A family tree (bottom) can be extracted from a family photo (top) if kin relationships among people in the photo are known. The double horizontal line shown in the figure represents the couple relationship.

1UB KinFace Ver2.0 (for noncommercial purposes) is now available at: http://www.cse.buffalo.edu/~yunfu/research/Kinface/Kinface.htm

Fig. 2. Faces of parents captured while they were young (top of the blue triangle in online version) are more similar to their children's (lower right corner of the blue triangle in online version) than images captured when they are old (lower left corner of the blue triangle in online version). By leveraging the intermediate set, the parent–child verification problem becomes more tractable.

the young–parent set as an intermediate set to reduce the divergence between data distributions, and 2) we consider the context and semantics in family photos and treat kinship verification as a joint tagging problem, mining more knowledge than mere pairwise verification.

The rest of the paper is organized as follows. Several related works are reviewed in Section II. In Section III, we present the transfer subspace learning method for kinship verification. Both contextual and semantic information are utilized to further enhance the accuracy. Experimental evaluations are described in Section IV. Finally, we draw conclusions in Section V.

II. RELATED WORK

The first attempt at kinship verification was published in [25] and [26]. To utilize local information, they first localized key parts of faces so that facial features, such as skin color, gray value, histogram of gradient, and facial structure information, could be extracted. Then K-nearest-neighbor (KNN) classification with a Euclidean metric was adopted. Such a simple strategy works reasonably well for online collected data. Different from same-person verification, kinship verification cannot expect each feature pair to match exactly, because genetic variations often exist from parents to children. Moreover, according to genetics studies, similarities between kin are mostly located at the eyes, nose, mouth, etc. We therefore extract these local features inherited from parents rather than holistic ones. Even considering this intuitive information, kinship verification remains challenging due to remarkable differences between the query and gallery images. The most significant degrading factor, in terms of face recognition, is aging. Parent images used in kinship verification often contain elderly subjects, but queries are usually young males or females. Texture distributions of these faces are quite different due to the aging effect, let alone the structural variations of faces from different identities. Meanwhile, other uncontrollable factors are evident in the real world, as described in [18]. These factors together lead to a complex new problem in biometrics.

Moreover, a more significant problem has recently drawn considerable attention: the common assumption that training and test data come from the same feature space and distribution is not always valid. This is a natural situation for any new classification problem. Manual labeling is time-consuming, and people want to reuse knowledge that has already been extensively


studied. In such a case, knowledge transfer, or transfer learning, is highly desirable. In [32], several practical instances illustrate the role of transfer learning, e.g., web-document classification, WiFi data remodeling, and sentiment classification based on customer reviews. The basic problem is how to reuse the knowledge learned from other data or feature spaces. Specifically, transfer learning as mentioned here can be further categorized into two classes: inductive transfer learning [33]–[38] and transductive transfer learning [39]–[41]. For the former, the domains in which the two data sets are embedded are either the same or different, but the learning targets are always different. The latter tolerates different distributions between data sets, but the learning targets are identical. In our kinship verification problem, three distributions exist, i.e., children, young parents, and old parents. Different from [25] and [26], we introduce the young–parent data set as an intermediate distribution to facilitate transfer learning, and our approach clearly falls into the transductive transfer learning category.

Bregman divergence was adopted in [42] to minimize the distribution divergence by seeking a common subspace. This work has been applied to cross-domain age estimation [43] and improved the accuracy on one data set with knowledge transferred from the other. Sometimes, however, transfer learning may fail due to a large discrepancy between the source and target data sets [32], such as between children and old-parent images. We therefore introduce an intermediate data set as a link to bridge the divergence between the source and target domains. These intermediate samples, i.e., face images of young parents, are close to both children and old parents, and are appropriately designed for our formulation.

Understanding human characteristics in images by contextual information has been extensively investigated in [19]–[24], [44], and [45] to further assist image understanding. In [19], the time stamp, image location, reoccurrence and cooccurrence among image collections, and annotated individuals were leveraged to infer unknown people. This work relies heavily on the assumption that people in image collections may appear more than once, be tied to each other, and, with high likelihood, stand in the same location with the same people. Relations between people's positions in images and their genders/ages were revealed in [20]. This can be explained by the fact that people's positions in photos indicate social settings in the real world. Moreover, these factors together, e.g., position, age, and gender, can also help human identification in a family photo [21]. In addition, [22] and [23] addressed the problem of how to annotate faces by text-image cooccurrence, which is naturally contained in Web resources, when there are no labels or only poor labels in the training set. It was argued in [24] that identities of people are highly related to their clothing, and that clothing segmentation and face recognition are intertwined. Therefore, clothing features can be used to strengthen discriminative feature extraction. Face clustering has also been used in [44] to discover social relationships of subjects in personal photo collections. Cooccurrence of identified faces as well as inter-face distances (inferred from the in-image distance and typical human face size) were used to calculate the link strength between any two identified persons. Last but not least, Markov logic [45] was utilized to formulate rules of thumb as soft constraints when

a group of photos is considered collectively with the purpose of detecting relationships in consumer photo collections.

III. PROPOSED APPROACH

A. UB KinFace Database

To the best of our knowledge, the "UB KinFace Ver1.0" [27] is the first database containing images of both children and their parents at different ages. All images in the database are real-world collections of public figures (celebrities and politicians) from the Web. We use a person's name as the query for image search. The database used in this paper, named "UB KinFace Ver2.0" [28], is an extension of "UB KinFace Ver1.0" that considers more instances and the impact of ethnicity groups. A general view of this database is summarized in Fig. 3. Similar to "Labeled Faces in the Wild" [18], these unconstrained facial images show a large range of variations, including pose, lighting, expression, age, gender, background, race, color saturation, and image quality.

The key difference between our database and that in [25] lies

in our inclusion of young parents. Basically, our database comprises 600 images of 400 people, which can be separated into 200 groups. Each group is composed of child, young–parent, and old–parent images. The "UB KinFace Ver2.0" can be divided into two parts in terms of race, i.e., Asian and non-Asian, each of which consists of 100 groups, 200 people, and 300 images. Typically, there are four kin relations, i.e., "son–father," "son–mother," "daughter–father," and "daughter–mother," as shown in Fig. 3. In addition, Fig. 3 illustrates the statistics from the perspective of race and kin relations. As we can see, male celebrities, both Asian and non-Asian, either sons or fathers, are dominant in the UB KinFace database. When the four possible kin relations are considered, "fathers" naturally become the essential roles, accounting for 46.5% and 38.5% of all groups in the "son–father" and "daughter–father" relations, respectively. This phenomenon can be explained by the fact that there are statistically more notable males than females in current government, entertainment, and sports communities.

Contexts and semantics in family albums are other factors

that we can leverage to improve the accuracy. To this end, a new database called FamilyFace is built; it includes 214 images with 507 persons in total, collected from popular social network websites, e.g., Facebook and Flickr. Example images are shown in Fig. 4. In this database, there are kinship and other relationships, e.g., friendship and colleague, in each image, and they may coexist in the same image. In addition, not only Asian but also Western people are included in the database. Since we downloaded the images from the Web and they are not subject to any specific constraints, they reflect real-world conditions for practical evaluations.

B. Feature Extraction

Two typical features that can distinguish true child–parent pairs from false ones are explored in this section. One is appearance based and extracted by Gabor filters [11] (eight directions and five scales). In particular, we first partition each face into regions in five layers, as illustrated in Fig. 5. As we can see, the entire face is the first layer. The second layer includes the upper, lower, left, right, and center parts of the face. The


Fig. 3. KinFace database illustrations. Four groups of people and the corresponding four possible relationships: son–father, son–mother, daughter–father, and daughter–mother. Children's images are in the red ellipse; male parents' images are in the blue ellipse; female parents' images are in the green ellipse (in online version).

Fig. 4. Sample images (top) from the FamilyFace database. The cropped faces contained in these images are shown in the second row.

forehead, eyes, nose, mouth, and cheek areas constitute the third layer, and their finer sub-regions form the fourth layer. A group of sub-regions based on the four fiducial points finally forms the fifth layer. We then apply Gabor filters to each local region. A similar idea has been adopted in [46] to analyze facial expressions through local features. Intuitively, kinship verification is also a process on local regions. For instance, when people talk about kinship, they often compare regions of the faces of children and their parents and wonder whether they share similar eyes, noses, or mouths. The other feature is based on the anthropometric model [47], which essentially considers the structural information of faces. Based on the captured key points, we obtain 6-D structural features consisting of ratios of typical region distances, e.g., "eye–eye" distance versus "eye–nose" distance. Structural information is believed to be largely inherited from parents and therefore might be promising for kinship verification [25]. However, due to the aging process [47], old parents' face structures are deformed from the ones they had when young. We therefore use transfer subspace learning to mitigate this degrading factor.

Since images obtained from the Web are captured in arbitrary environments, we first take advantage of "total variation" to remove lighting effects. Total variation, first used in image denoising, has been successfully applied to illumination-free face recognition [48]. After removing the lighting effect, we partition faces according to key points, i.e., the locations of the left eye, right eye,

Fig. 5. Face partitions in different layers and face image illumination normalization. For simplicity, only four layers are illustrated instead of five. Red dots on faces in Layer 1 illustrate the four key points mentioned in this paper (in online version).

nose tip, and the center of the mouth, into regions with different widths and heights, as shown in Fig. 5.

To fully utilize all the features in the different regions, we propose a new feature extraction strategy based on the cumulative match characteristic (CMC) [49]. We add features over several rounds, and at each round we select the feature of the one region that maximizes the difference in child–old parent recognition performance between the current round and the last round. This process can be formulated as

$$\mathcal{R}^{(t)} = \arg\max_{r \notin \mathcal{R}^{(t-1)}} \left[ \mathrm{CMC}^{(t)}_{C,O}(K) - \mathrm{CMC}^{(t-1)}_{C,O}(K) \right] \qquad (1)$$

where $\mathrm{CMC}^{(t)}_{C,O}(K)$ denotes the sum of recognition rates from rank 1 to $K$ after round $t$, and $C$ and $O$ denote the child and old–parent groups,


TABLE I
SELECTED REGIONS IN DIFFERENT LAYERS

respectively. Intuitively, this strategy intends to choose the feature that best discriminates true child–parent pairs from false ones. In addition, to accelerate the feature selection process, we choose candidate features in the current layer until no more can be added to enhance the performance; we then repeat this process in the next layer and proceed iteratively. The final selected regions are listed in Table I. All these features are ultimately concatenated to form a long feature vector. Note that, in this paper, the samples we consider for verification are the absolute differences between two subjects rather than single images or their corresponding features. Specifically, the child–old parent and child–young parent pair samples can be written as $x_{co} = |f_{c} - f_{o}|$ and $x_{cy} = |f_{c} - f_{y}|$, where $f_{o}$, $f_{y}$, and $f_{c}$ are the features of the old parent, young parent, and child, respectively.
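As a concrete illustration of the two feature types described above, the sketch below computes region-wise Gabor responses and a 6-D set of anthropometric ratios, and forms a verification sample as an absolute feature difference. This is a minimal sketch rather than the authors' implementation: the filter parameters, region boxes, key-point layout, and the particular six ratios are all assumptions made for the example.

```python
import numpy as np

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lambd=8.0, gamma=0.5):
    """Real part of a Gabor filter for one scale/direction (assumed params)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) \
           * np.cos(2 * np.pi * xr / lambd)

def convolve_same(img, ker):
    """Plain 'same'-size 2-D convolution (NumPy only, for self-containment)."""
    kh, kw = ker.shape
    ph, pw = kh // 2, kw // 2
    pad = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = float((pad[i:i + kh, j:j + kw] * ker).sum())
    return out

def region_gabor_features(face, regions, n_orient=8, n_scale=5):
    """Mean absolute Gabor response per region (8 directions x 5 scales)."""
    feats = []
    for (r0, r1, c0, c1) in regions:           # assumed region boxes
        patch = face[r0:r1, c0:c1]
        for s in range(n_scale):
            for o in range(n_orient):
                k = gabor_kernel(sigma=1.5 + s, theta=o * np.pi / n_orient,
                                 lambd=3.0 * (s + 1))
                feats.append(np.abs(convolve_same(patch, k)).mean())
    return np.array(feats)

def anthropometric_ratios(pts):
    """6-D structural feature from four key points; the exact six ratios are
    an assumption (the paper only gives 'eye-eye' vs. 'eye-nose' as one)."""
    d = lambda a, b: np.linalg.norm(np.asarray(pts[a]) - np.asarray(pts[b]))
    eye = d("le", "re")                        # left eye <-> right eye
    return np.array([
        eye / d("le", "no"), eye / d("re", "no"),
        eye / d("le", "mo"), eye / d("re", "mo"),
        d("le", "no") / d("le", "mo"), d("no", "mo") / eye,
    ])

def pair_sample(f_child, f_parent):
    """Verification sample: absolute feature difference, as in the text."""
    return np.abs(f_child - f_parent)
```

A real pipeline would partition the face into the five-layer hierarchy of Fig. 5 and run the full filter bank per selected region; the helpers above keep the same structure at toy scale.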

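The round-by-round selection behind (1) can be sketched as a greedy forward search over regions. The CMC scoring below (nearest-neighbor matching of each child feature against the old-parent gallery) and the stopping rule are stand-ins, since the paper does not spell out the matcher.

```python
import numpy as np

def cmc_sum(gallery, probe, K=5):
    """Sum of rank-1..K recognition rates; probe i's true match is gallery i."""
    hits = np.zeros(K)
    for i, p in enumerate(probe):
        order = np.argsort(np.linalg.norm(gallery - p, axis=1))
        rank = int(np.where(order == i)[0][0])
        if rank < K:
            hits[rank:] += 1              # cumulative: counts rank <= k
    return float(hits.sum()) / (K * len(probe))

def greedy_region_selection(child_feats, parent_feats, K=5):
    """Per round, add the region that most improves the child vs. old-parent
    CMC score; stop when no remaining region helps (cf. Eq. (1))."""
    selected, best = [], -np.inf
    remaining = set(range(len(child_feats)))  # per-region feature arrays
    while remaining:
        scores = {}
        for r in remaining:
            idx = selected + [r]
            scores[r] = cmc_sum(np.hstack([parent_feats[i] for i in idx]),
                                np.hstack([child_feats[i] for i in idx]), K)
        r_best = max(scores, key=scores.get)
        if scores[r_best] <= best:        # no improvement: stop this search
            break
        best = scores[r_best]
        selected.append(r_best)
        remaining.remove(r_best)
    return selected
```

With one informative region and one noise region, the search keeps the informative one and stops, mirroring the layer-by-layer selection described in the text.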
C. Transfer Subspace Learning

In our framework, we attempt to find a subspace in which the distribution similarity of two different data sets, i.e., child–old parent and child–young parent pairs, is maximized while the pairs can still be well separated in terms of kinship verification. Essentially, our approach differs from [42] in that ours takes advantage of the intermediate set (young parents) and prevents the failure of transfer. Moreover, instead of directly modeling three distributions as in [27], we use two distributions formed by pairwise differences of the three data sets, which leads to a simple but efficient model.

In terms of transfer learning [32], the method in [27] considers children, young parents, and old parents as the source, intermediate, and target domains, respectively. However, the differences of features between children and parents are more discriminative when we conduct two-class classification. So, in this paper, we impose transfer learning on child–old parent and child–young parent pairs, which therefore become the source and target domains of the new problem. Our objective can be formulated as finding an appropriate subspace in which the low-dimensional representations of the child–young parent feature differences and the child–old parent feature differences are still discriminative and share the same distribution. Fig. 6 illustrates the idea of the transfer subspace learning proposed in this paper, which can be mathematically formulated as

$$\min_{W} \; F(W) + \lambda\, D_{W}(P_{s} \,\|\, P_{t}) \qquad (2)$$

Fig. 6. Transfer subspace learning for kinship verification. D1, D2, and D3 represent the target domain, source domain, and learned subspace. The source and target domains have different distributions before they are projected to the learned subspace. With transfer learning, features from both source and target domains now share a common distribution. Note that the first and second image pairs in D1, D2, and D3 represent true and false child–parent pairs, respectively.

where $F(W)$ is a general subspace learning objective function, such as PCA [4], LDA [6], LPP [7], NPE [8], MFA [9], or DLA [10], and $P_{s}$ and $P_{t}$ represent the distributions of the source and target samples, respectively. $D_{W}(P_{s} \,\|\, P_{t})$ is the Bregman divergence-based regularization that measures the distance between the two distributions in the projected subspace $W$, and $\lambda$ is the weight for the regularization. This optimization problem can be solved iteratively by the gradient descent method as follows:

    W_{t+1} = W_t - \eta \left( \frac{\partial F(W)}{\partial W} + \lambda \frac{\partial D_W(P_s \| P_t)}{\partial W} \right)    (3)

where $\eta$ is the learning rate, $y = W^{\top}x$ is the low-dimensional representation of a sample $x$, and $n_s$ and $n_t$ are the numbers of samples drawn from the distributions $P_s$ and $P_t$, respectively. In particular, the gradient of the subspace term $F(W)$ in (3) can be obtained from a discriminant subspace learning method, e.g., discriminative locality alignment (DLA) [10]

    \frac{\partial F(W)}{\partial W} = 2 X L X^{\top} W    (4)

where $L$ encapsulates the local geometry and the discriminative information, and $X$ is the matrix of input samples from all the training images. Moreover, the estimates of $P_s$ and $P_t$ of (2) in the projected subspace can be obtained by kernel density estimation (KDE) [42]. The projection matrix $W$ can therefore be updated iteratively until it is optimized. By introducing the intermediate set (young parents) and shrinking the discrepancy between the child–young parent feature differences and the child–old parent feature differences, the child–old parent verification problem finally becomes more tractable. The experimental results in Section IV will demonstrate that the verification accuracy with the intermediate data set is better than that of direct matching from children to old parents.
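The update loop above can be sketched as follows. This is a simplified illustration, not the authors' implementation: a plain variance-based objective stands in for DLA's F(W), scipy's Gaussian KDE estimates the projected source/target densities, and the gradient is taken numerically rather than in closed form.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
d, k = 6, 2
Xs = rng.normal(0.0, 1.0, (80, d))   # source: child-young parent differences
Xt = rng.normal(0.4, 1.3, (80, d))   # target: child-old parent differences

def divergence(W):
    """KDE-based estimate of the gap between projected source/target densities."""
    Ys, Yt = (Xs @ W).T, (Xt @ W).T          # (k, n) layout for gaussian_kde
    ps, pt = gaussian_kde(Ys), gaussian_kde(Yt)
    return float(np.mean(np.log(ps(Ys) + 1e-12) - np.log(pt(Ys) + 1e-12)))

def objective(W):
    """Eq. (2) with a stand-in subspace term F(W) (negative projected variance)."""
    Y = np.vstack([Xs, Xt]) @ W
    return -np.trace(Y.T @ Y) / len(Y) + lam * divergence(W)

def num_grad(f, W, eps=1e-4):
    # central finite differences over the entries of W
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W); E[idx] = eps
        G[idx] = (f(W + E) - f(W - E)) / (2 * eps)
    return G

lam, eta = 1.0, 0.05
W = rng.normal(size=(d, k)) * 0.1
for _ in range(5):                   # Eq. (3): gradient descent on W
    W = W - eta * num_grad(objective, W)
```

In the paper, the F(W) gradient comes from DLA in closed form as in (4) and the divergence gradient from the KDE estimate of [42]; the numerical gradient here only keeps the sketch short.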

D. Kinship Recognition With Context

In this section, we cast the task of kinship verification in a photo as a joint tagging problem [1] to further improve


the accuracy. In real-world people-centric images, relationships are no longer independent. For example, kin relationships often accompany couple relationships in the same image. In addition, labels such as the age and gender of two people are also helpful in determining their relations; for example, people with an age gap of around 20 years have a high probability of being a child–parent pair. Different from [1], which considers the labels of each node and the relations between labels simultaneously, we instead first predict the labels of each individual and then seek the possible relations. In this way, the complex constraints of a one-step formulation are decomposed into two separate steps, which improves computational efficiency. For simplicity, we assume that the image has already been parsed into a discrete set of face regions by an existing face detection algorithm. We further assume that each detected face is associated with a discrete set of allowable labels, i.e., gender and age. Our goal is to infer a joint tagging matrix $R$ from the entire feature space $X$ and the single label vector $l_i$ for each person $i$. This joint tagging process can be expressed as

    R^* = \arg\max_R S(R, X)    (5)

where $S(R, X)$ is the total score of this joint annotation task over all individuals. Suppose there are three relations we consider, i.e., kinship, couple, and others. Then the joint annotation function can be written as

    S(R, X) = \sum_{i,j} p(r_{ij}) \, f_{r_{ij}}(x_{ij})    (6)

where $r_{ij}$ is an element of the relation matrix $R$ and can take the value 1, 2, or 3, corresponding to the kin, couple, and other relationships, respectively. $p(r_{ij})$ is the prior for each of the three relations and can be readily computed from the training data. For each relation $r_{ij} \in \{1, 2, 3\}$, we use the following linear regression model:

    f_{r}(x_{ij}) = w_{r}^{\top} x_{ij}    (7)

where $w_r$ is the weight vector for the feature vector $x_{ij}$. For the three different relations, there are three different weight vectors, $w_1$, $w_2$, and $w_3$. The 4-D feature vector $x_{ij}$ comprises four parts, i.e., gender relation, age difference, relative distance, and kinship score, through which the contextual information embedded in individuals $i$ and $j$ can be utilized. The details of each dimension are defined as follows.
• Gender relation: Gender is highly related to the couple relationship. People of different genders in the same age group have a higher probability of being a couple. If individuals $i$ and $j$ have the same gender, then this dimension is 1; otherwise it is 0.

• Age difference: Age difference also affects the relationship between two people. Generally, parents and children have an age gap of 20 to 30 years with high probability. Therefore, we use the age difference between individuals $i$ and $j$ as this dimension.

• Relative distance: The distance between two people measures their intimacy. We use the center of the two eyes as the location of an individual and calculate the Euclidean distance between two individuals, normalized by the average over all pairwise distances of all individuals.

• Kinship score: The kinship score can be calculated by the proposed transfer subspace learning. For the K-NN method, we count the positive and negative samples among the $K$ neighbors and use the percentage of positive samples as the score.
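The four dimensions above can be assembled as follows; the function and argument names are illustrative, and the kinship score is assumed to be precomputed by the transfer subspace learning step:

```python
import numpy as np

def context_features(gender_i, gender_j, age_i, age_j,
                     eyes_i, eyes_j, mean_pair_dist, kin_score):
    """Hypothetical 4-D context vector x_ij for individuals i and j."""
    same_gender = 1.0 if gender_i == gender_j else 0.0         # gender relation
    age_diff = abs(age_i - age_j)                              # age difference
    dist = np.linalg.norm(np.subtract(eyes_i, eyes_j)) / mean_pair_dist
    return np.array([same_gender, age_diff, dist, kin_score])  # 4-D vector

# e.g., a 52-year-old man and a 25-year-old woman two face-widths apart
x_ij = context_features("m", "f", 52, 25, (100, 80), (220, 85), 150.0, 0.7)
```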

The solution of (5) is not trivial. We can traverse all the possible relationship combinations and select the one with the largest score, since in our experiments the number of people in one image is not prohibitively large.
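Because a photo rarely contains more than a handful of people, the exhaustive traversal can be sketched directly; the weights, priors, and pair features below are hypothetical stand-ins for the learned quantities:

```python
from itertools import combinations, product
import numpy as np

def best_relations(n_people, pair_feats, weights, priors):
    """Exhaustive search for the relation assignment maximizing the joint
    score of Eqs. (5)-(7); feasible because photos contain few people."""
    pairs = list(combinations(range(n_people), 2))
    best, best_score = None, -np.inf
    for labels in product(priors, repeat=len(pairs)):
        score = sum(priors[r] * float(weights[r] @ pair_feats[p])  # p(r) * w_r.x
                    for p, r in zip(pairs, labels))
        if score > best_score:
            best, best_score = dict(zip(pairs, labels)), score
    return best, best_score

# toy example with three people and random features/weights
priors = {"kin": 0.4, "couple": 0.2, "other": 0.4}
rng = np.random.default_rng(3)
weights = {r: rng.normal(size=4) for r in priors}
feats = {p: rng.normal(size=4) for p in combinations(range(3), 2)}
assignment, score = best_relations(3, feats, weights, priors)
```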

IV. EXPERIMENTS

Before the following experiments, we conduct face and fiducial point detection on all images in the UB KinFace and FamilyFace databases to obtain cropped face images with a fixed size of 127 × 100. We then register the faces with an affine transform according to the corresponding points detected in the last step. Several image preprocessing techniques, e.g., histogram equalization, are applied to mitigate image quality factors. After that, discriminative features are extracted with the method proposed in Section III. A series of experiments is specifically designed to verify the proposed assumption, the transfer subspace learning method, and the context-based verification. We also compare the performance of human judges with that of the method proposed in this paper to present a systematic evaluation. Since most young parent images were captured with outdated equipment, black-and-white images are common in this category. Therefore, for fair comparison with other approaches that do not involve young parent images, we run all experiments on gray-scale images. The experiments in this section were carried out on an Intel Core 2 Duo CPU E7300 at 2.66 GHz with 3 GB of memory, and all algorithms were implemented in Matlab 7.0.
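The alignment and equalization steps can be sketched with scipy, assuming at least three fiducial points per face are available; the least-squares affine estimate and the function name are illustrative, not the authors' code:

```python
import numpy as np
from scipy import ndimage

def align_and_equalize(img, src_pts, dst_pts, out_shape=(127, 100)):
    """Estimate an affine map from fiducial points, warp the face to a
    fixed 127x100 crop, then histogram-equalize the 8-bit result."""
    # solve dst = A @ [row, col, 1] in the least-squares sense (>= 3 points)
    src = np.hstack([src_pts, np.ones((len(src_pts), 1))])
    A, *_ = np.linalg.lstsq(src, dst_pts, rcond=None)          # shape (3, 2)
    # scipy maps output coords back to input coords, so invert the transform
    M = np.vstack([A.T, [0.0, 0.0, 1.0]])
    Minv = np.linalg.inv(M)
    warped = ndimage.affine_transform(img, Minv[:2, :2], offset=Minv[:2, 2],
                                      output_shape=out_shape)
    # histogram equalization via the normalized cumulative histogram
    w = np.clip(warped, 0, 255).astype(np.uint8)
    hist = np.bincount(w.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)
    return (cdf[w] * 255).astype(np.uint8)

# identity transform on a synthetic face, just to exercise the pipeline
rng = np.random.default_rng(0)
face = rng.random((200, 200)) * 255
pts = np.array([[40.0, 30.0], [40.0, 70.0], [110.0, 50.0]])
crop = align_and_equalize(face, pts, pts)
```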

A. Experiments on UB KinFace Database

1) Kinship Classification: First, a kinship classification experiment is performed. Here, in terms of biometrics, classification means finding the proper identity for a query. Specifically, in this section, our aim is, given a child's facial image, to seek and return his/her parent's image, either young or old. In this process, children's images are used as queries while young parent and old parent images are used as the gallery. The Euclidean distance metric and a nearest-neighbor classifier are adopted for this task, and the results are shown in Fig. 7(a), from which we can see that young parents are more similar to their children than old parents based on the local Gabor features proposed in this paper.
2) Kinship Verification: We conduct kinship verification to

further prove our hypothesis: given two face images, determine whether they form a true child–parent pair. In these experiments, rather than direct comparisons between children and their parents, the feature discrepancy that measures the difference between the child and the parent is used. For training and testing, we use 200 true child–old parent pairs and 200 false child–old parent pairs. Both the anthropometric model and the proposed method are evaluated. To classify the image pairs into true and false child–parent pairs, we use


Fig. 7. Left: Cumulative match characteristic (CMC) for kinship classification. “Rank” indicates the size (number) of the candidate list for comparison. Middle: Pairwise and transfer subspace learning over different subspace dimensions on kinship verification with the leave-one-out strategy. “Pairwise” represents direct matching with PCA, and “Baseline 1” and “Baseline 2” are the human performances mentioned in Section V. Right: Kinship verification by PCA, LPP, DLA, and the proposed transfer subspace learning (TSL) method over different subspace dimensions on the FamilyFace database.
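The CMC curve in Fig. 7 (left) plots, for each rank k, the fraction of child queries whose true parent appears among the k nearest gallery images. A numpy sketch:

```python
import numpy as np

def cmc(query_feats, gallery_feats, true_idx, max_rank=10):
    """Cumulative match characteristic: fraction of queries whose true
    match appears in the top-k Euclidean nearest gallery entries."""
    hits = np.zeros(max_rank)
    for q, t in zip(query_feats, true_idx):
        d = np.linalg.norm(gallery_feats - q, axis=1)
        rank = int(np.argsort(d).tolist().index(t))   # position of true match
        if rank < max_rank:
            hits[rank:] += 1                          # a hit counts from rank on
    return hits / len(query_feats)

# sanity check: each query identical to its own gallery entry -> all ones
rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 8))
curve = cmc(gallery, gallery, range(5), max_rank=3)
```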

TABLE II
VERIFICATION RESULTS BY FIVE-FOLD CROSS VALIDATION. “C VS. Y” AND “C VS. O” MEAN CHILD–YOUNG PARENT AND CHILD–OLD PARENT VERIFICATION, RESPECTIVELY. “RAW” AND “STRUCTURE” INDICATE THE RAW IMAGE FEATURE AND STRUCTURE FEATURE, RESPECTIVELY

the Euclidean distance and a KNN classifier with five-fold cross validation, where 40 positive sample pairs and 40 negative sample pairs are used as the test set in each round. In particular, the positive samples are true pairs containing children and their parents, and the negative ones pair children with randomly selected parents who are not their true parents. The results are shown in Table II.

The experimental results based on raw images and structure features with five-fold cross validation in Table II still support our hypothesis: the kinship verification rate based on young parents is about three percent higher than that based on old parents. Considering the poor quality of the Internet-collected “wild” images of young parents (it is impractical to acquire high-quality images since digital cameras were not popular back then), the improvement is significant enough to validate our hypothesis.
3) Kinship Verification With Transfer Learning: In the

training phase of the transfer subspace learning based verification, the inputs are 320 × 2 samples (for five-fold cross validation). Half of the inputs (320 samples) are child–old parent pairs, while the rest are child–young parent pairs. As DLA [10] is a supervised method, both the 320 child–old parent pairs and the 320 child–young parent pairs consist of 160 true kinship pairs and 160 false ones. For the subspace term in (3), we initialize $W$ with the projection generated only from the source, namely, the true and false child–young parent pairs. As to the divergence term in (3), we compute the distributions of the source (child–young parent pairs, both true and false) and the target (child–old parent pairs, both true and false) by KDE [42]. In the test phase, we use 80 child–old parent samples (40 positive and 40 negative, for five-fold cross validation) for the verification experiment. All image pairs, for both training and testing, are projected into the optimal subspace before KNN classification with

TABLE III
VERIFICATION COMPARISONS BETWEEN THE PAIRWISE AND THE PROPOSED TRANSFER SUBSPACE LEARNING (TSL) METHODS WITH DIFFERENT FEATURES

the Euclidean distance. Additionally, we compare the performance of the proposed feature extraction method with the anthropometric model. The configuration is the same as in the experiments above except for the inclusion of young parent samples. Fig. 7(b) shows the results of pairwise and transfer subspace learning produced by the leave-one-out strategy over different subspace dimensions. The average training times for pairwise and transfer subspace learning are 2.84 and 154 seconds, respectively, for a subspace dimension of 25. The average time for verification is 4.49 milliseconds. Table III lists several results of kinship verification with different methods and features under five-fold cross validation.
4) Kinship Verification by Human: Few works have been

done to evaluate human performance on child–parent kinship verification, though individual identification and sibling relationship verification have been reported in [15] and [30]. To extensively evaluate our method and probe the learning strategy by which human beings perceive kin relationships, we conduct two groups of human tests on kinship verification. The first group has 40 training samples (20 true child–old parent pairs and 20 false child–old parent pairs) and 40 test samples (20 true and 20 false child–old parent pairs, not overlapping with the training ones). In the second group, 20 extra corresponding true child–young parent pairs are added as a supplement to the training samples, and all other settings are identical to the former. The two experiments aim at simulating

the processes of pairwise and transfer subspace learning based verification. Twenty people participated in these experiments, and the average performances are 53.17% and 56.00% for the first and second experiments, respectively, illustrated in Fig. 7(b) as two baselines. Compared with the best performances of pairwise and transfer subspace learning based verification, humans perform better than most of the methods, with the exception


Fig. 8. Kinship verification on four kinds of kin relations (Left) and two kinds of races (Middle) based on the FamilyFace database. ROC curves of pairwise and context-based kinship verification (Right). Here we use random guess as the baseline. “TSL” is the proposed transfer subspace learning method, and labels such as “Age” and “Position” are added one by one. Note “ ” uses ground truth labels while “Age” uses labels predicted via the method in [20]. Specifically, we do not add gender to this experiment because it is only weakly relevant to direct kinship inference.

of the proposed method, as Fig. 7(b) illustrates. In fact, even without training samples, humans can perform well due to their extensive knowledge, which is mostly accumulated as people grow up and helps them perform recognition. Therefore, human beings are not as sensitive as machines to new training samples. Given more training data, e.g., child–young parent pairs, however, machines can learn and acquire more discriminative information.

B. Experiments on FamilyFace Database

There are two purposes for the experiments in this section. One is to investigate the impact of gender and race on kinship, and the other is to evaluate the role of contexts and semantics in family albums. For the first purpose, we use UB KinFace Ver2.0, FamilyFace, and images downloaded from the Web as the training and testing data. The entire UB KinFace is used for training, while 296 true and 296 false child–parent pairs from FamilyFace and the Web are used for testing. Given a family photo, we first detect faces and extract features. Then the difference of each pair of face features is projected into the subspace learned from the training set, followed by KNN classification. The experimental results are shown in Fig. 7(c). To further explore the impact of the four typical kin relations on verification accuracy, we select 120 samples (60 positive and 60 negative) with the same kin relation and repeat this for all four relations. As can be seen from Fig. 8(a), the verification rates on “daughter–father” and “son–mother” are higher than those on “daughter–mother” and “son–father”. This suggests that gender affects the determination of kinship to some extent. In addition, race is taken into consideration, since appearance diverges across races. Fig. 8(b) illustrates the verification results on two race groups, i.e., Asian and non-Asian.

In parallel with the machine-based experiments on the FamilyFace database, we conduct another experiment with human judges. In these experiments, 20 participants determine the authenticity of the kin relationships of 32 pairs randomly selected from the FamilyFace database, i.e., 8 pairs for each relation, half of which are positive samples. The results show that the average human accuracy is 58.75% on the “daughter–father” relationship, 56.25% on “daughter–mother”, 55.00% on “son–father”, and 57.50% on “son–mother”. We also separate the data set into Asian and non-Asian subsets and evaluate the human performance on kinship verification. The results are 49.41% and 60.59%, respectively. It is interesting that the verification accuracy for non-Asians is higher than that for Asians. This could be explained by the fact that most non-Asians have more distinctive facial features.

TABLE IV
“PROPOSED METHOD” MEANS THE METHOD BASED ON (6), “LINEAR MODEL” MEANS THE CONTEXT-BASED METHOD CONSIDERING ONLY “KINSHIP”, AND “TSL” MEANS THE TRANSFER SUBSPACE LEARNING METHOD ONLY

The context-based experiments use both the FamilyFace database (119 images) and 100 images downloaded from the Web, e.g., from Facebook, which contain friend and colleague relationships as well. The experiment in this part is divided into two steps. The first step uses only the regression model in (7) and considers only kinship, aiming at evaluating the effectiveness of “gender relation,” “age difference,” “relative distance,” and “kinship score.” The gender and age estimation methods are from [20]. Ten-fold cross validation is used, and the results are shown in Fig. 8(c). As we can see, the more contextual information (age, position) is added, the higher the ROC score obtained. Moreover, we show the impact of unreliable estimation in Fig. 8(c). Since we use the method in [20] to estimate age, estimation error is inevitable. In Fig. 8(c), the green dot-dash line and the green solid line represent the performance of kinship verification with the ground truth age label and the estimated value, respectively. The divergence between these two lines shows the impact of unreliable age estimation. The second step of the experiment incorporates the couple and other relations to constrain the relation function in (7), leading to an even better result. We use approximately half of the images for training and the other half for testing. Then, with (6), we obtain the results shown in Table IV. The average time for relationship recognition is 24.8 seconds. Note that the subspace dimension for the “TSL” method is fixed at 25 in Fig. 8 and Table IV.
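The first step above, fitting the kinship-only regression of (7) and scoring it by ROC, can be sketched on synthetic stand-ins for the 4-D context vectors; the least-squares fit and the rank-based AUC computation are illustrative, not the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic 4-D context vectors (Sec. III-D) with binary kinship labels,
# generated from a hidden weight vector plus noise
X = rng.normal(size=(200, 4))
y = (X @ np.array([0.2, -0.5, -0.8, 2.0]) + rng.normal(0, 0.3, 200) > 0)
y = y.astype(float)

# fit the relation-specific weight vector w_r of Eq. (7) by least squares
w, *_ = np.linalg.lstsq(X, 2 * y - 1, rcond=None)
scores = X @ w

# ROC AUC via the rank-sum identity: probability a random positive
# outscores a random negative (cf. the curves in Fig. 8(c))
pos, neg = scores[y == 1], scores[y == 0]
auc = (pos[:, None] > neg[None, :]).mean()
```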


C. Discussion

From the experimental results above, it can be concluded that the appearance similarity gap is so large that the task of child–parent classification is very challenging. Nevertheless, for child–parent verification, the proposed approach based on transfer subspace learning outperforms pairwise kinship verification, validating that our approach of narrowing the similarity gap between a child and his/her parent in terms of appearance is effective. In addition, we find that the proposed local feature selection strategy is more discriminative than anthropometric models, owing to a lack of discriminative information in the latter (six dimensions). This is consistent with the intuition that people tend to recognize kin relationships by comparing facial features, such as the eyes, nose, and mouth.

Compared with the results in [25], several factors prevent a better experimental result. First, the variations in pose, lighting, and aging of the faces in our data set are more significant. Second, color information is not exploited in our experiments due to the poor quality of the young parent images; as a matter of fact, family members often have similar hair, eye, and skin colors. In addition, the volume and structure of the two databases are different: the data set used in [25] has 150 groups of 300 images, while ours has 200 groups of 600 images and also covers different race distributions. Finally, though a direct comparison with [25] is not practical, we have shown that the transfer subspace learning based method is better than the mere pairwise verification utilized by [25] in the discriminant phase.

Moreover, contextual information is useful, and the results are significantly improved thanks to the additional semantic constraints. However, we should point out that both gender and age estimation are imperfect in practice. Improvements to these components will certainly benefit kinship verification, as illustrated in Fig. 8(c). Furthermore, in this paper, the family albums do not contain many people, and the relations we consider are only kinship, couple, and others. When there are more people in an image, more complex relationships should be considered, e.g., sibling and inferior/superior relationships. In addition, some differences exist between our testing results and those in [25], for instance, in the impacts of gender and race. This is due to the limited capacity of, and the different biases in, each database. In any case, it shows that both gender and race should be carefully considered for kinship verification.

V. CONCLUSION

In this paper, we investigated the problem of kinship verification through face images, as well as the impact of contexts and semantics. First, UB KinFace Ver2.0 was collected from the Web. Second, we proposed a transfer subspace learning method using young parents as an intermediate set whose distribution is close to both the source and target sets. Through this learning process, the large similarity gap between the distributions is reduced, and the child–old parent verification problem becomes more tractable. Third, we combined the proposed transfer learning approach with contextual information in family albums to further improve verification accuracy. Experimental results demonstrate that our hypothesis on the role of young parents is valid and that transfer learning can take advantage of it to enhance verification accuracy. In addition, we showed that contextual information can considerably improve kinship verification accuracy via tests on the FamilyFace database.

ACKNOWLEDGMENT

This work was performed at SUNY–Buffalo when S. Xia wasa Visiting Scholar there. S. Xia and M. Shao are equal contrib-utors to this paper.

REFERENCES
[1] Z. Stone, T. Zickler, and T. Darrell, “Toward large-scale face recognition using social network context,” Proc. IEEE, vol. 98, no. 8, pp. 1408–1415, Aug. 2010.
[2] P. Viola and M. Jones, “Robust real-time face detection,” Int. J. Comput. Vis., vol. 57, no. 2, pp. 137–154, 2004.
[3] G. Huang, V. Jain, and E. Learned-Miller, “Unsupervised joint alignment of complex images,” in Proc. IEEE Int. Conf. Comput. Vis., Rio de Janeiro, Brazil, 2007.
[4] M. Turk and A. Pentland, “Eigenfaces for recognition,” J. Cognitive Neurosci., vol. 3, no. 1, pp. 71–86, 1991.
[5] M. Bartlett, J. Movellan, and T. Sejnowski, “Face recognition by independent component analysis,” IEEE Trans. Neural Netw., vol. 13, no. 6, pp. 1450–1464, Nov. 2002.
[6] P. Belhumeur, J. Hespanha, and D. Kriegman, “Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 711–720, Jul. 1997.
[7] X. He and P. Niyogi, “Locality preserving projections,” in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2004.
[8] X. He, D. Cai, S. Yan, and H. Zhang, “Neighborhood preserving embedding,” in Proc. IEEE Int. Conf. Comput. Vis., 2005, pp. 1208–1213.
[9] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, “Graph embedding and extensions: A general framework for dimensionality reduction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40–51, Jan. 2007.
[10] T. Zhang, D. Tao, and J. Yang, “Discriminative locality alignment,” in Proc. Eur. Conf. Comput. Vis., 2008, pp. 725–738.
[11] Y. Adini, Y. Moses, and S. Ullman, “Face recognition: The problem of compensating for changes in illumination direction,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 721–732, Jul. 1997.
[12] T. Ahonen et al., “Face description with local binary patterns: Application to face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 12, pp. 2037–2041, Dec. 2006.
[13] Z. Cao, Q. Yin, X. Tang, and J. Sun, “Face recognition with learning-based descriptor,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2010, pp. 2707–2714.
[14] M. Guillaumin, J. Verbeek, and C. Schmid, “Is that you? Metric learning approaches for face identification,” in Proc. IEEE Int. Conf. Comput. Vis., 2009, pp. 498–505.
[15] N. Kumar, A. Berg, P. Belhumeur, and S. Nayar, “Attribute and simile classifiers for face verification,” in Proc. IEEE Int. Conf. Comput. Vis., 2009, pp. 365–372.
[16] K. Jia and S. Gong, “Generalized face super-resolution,” IEEE Trans. Image Process., vol. 17, no. 6, pp. 873–886, Jun. 2008.
[17] C. Liu, H. Shum, and W. Freeman, “Face hallucination: Theory and practice,” Int. J. Comput. Vis., vol. 75, no. 1, pp. 115–134, 2007.
[18] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, “Labeled faces in the wild: A database for studying face recognition in unconstrained environments,” Univ. Massachusetts, Amherst, MA, Tech. Rep. 07-49, 2007.
[19] M. Naaman, R. Yeh, H. Garcia-Molina, and A. Paepcke, “Leveraging context to resolve identity in photo albums,” in Proc. ACM/IEEE-CS Joint Conf. Digital Libraries, 2005, pp. 178–187.
[20] A. Gallagher and T. Chen, “Understanding images of groups of people,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 256–263.
[21] G. Wang, A. Gallagher, J. Luo, and D. Forsyth, “Seeing people in social context: Recognizing people and social relationships,” in Proc. Eur. Conf. Comput. Vis., 2010, pp. 169–182.
[22] J. Yagnik and A. Islam, “Learning people annotation from the web via consistency learning,” in Proc. Int. Workshop Multimedia Inf. Retrieval, 2007, pp. 285–290.
[23] T. L. Berg, E. C. Berg, J. Edwards, M. Maire, R. White, Y. Whye Teh, E. Learned-Miller, and D. A. Forsyth, “Names and faces in the news,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2004, pp. 848–854.
[24] A. Gallagher and T. Chen, “Clothing cosegmentation for recognizing people,” presented at the IEEE Conf. Comput. Vis. Pattern Recognit., Anchorage, AK, Jun. 2008.
[25] R. Fang, K. Tang, N. Snavely, and T. Chen, “Towards computational models of kinship verification,” in Proc. IEEE Int. Conf. Image Process., 2010, pp. 1577–1580.
[26] S. Ng, C. Lai, M. Tsai, R. Fang, and T. Chen, “Automatic kinship recognition using social contextual information,” presented at Bits on Our Minds, Ithaca, NY, 2011.
[27] S. Xia, M. Shao, and Y. Fu, “Kinship verification through transfer learning,” in Proc. Int. Joint Conf. Artificial Intell., 2011, pp. 2539–2544.
[28] M. Shao, S. Xia, and Y. Fu, “Genealogical face recognition based on UB KinFace database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (Workshop on Biometrics), 2011, pp. 65–70.
[29] A. Alvergne, C. Faurie, and M. Raymond, “Differential facial resemblance of young children to their parents: Who do children look like more?,” Evol. Human Behavior, vol. 28, no. 2, pp. 135–144, 2007.
[30] M. D. Martello and L. Maloney, “Where are kin recognition signals in the human face?,” J. Vis., vol. 6, no. 12, pp. 1356–1366, 2006.
[31] L. DeBruine, F. Smith, B. Jones, S. C. Roberts, M. Petrie, and T. Spector, “Kin recognition signals in adult faces,” Vis. Res., vol. 49, no. 1, pp. 38–43, 2009.
[32] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowledge Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[33] W. Dai, Q. Yang, G. Xue, and Y. Yu, “Boosting for transfer learning,” in Proc. Int. Conf. Machine Learning, 2007, pp. 193–200.
[34] R. Raina, A. Battle, H. Lee, B. Packer, and A. Ng, “Self-taught learning: Transfer learning from unlabeled data,” in Proc. Int. Conf. Machine Learning, 2007, pp. 759–766.
[35] J. Jiang and C. Zhai, “Instance weighting for domain adaptation in NLP,” in Proc. Annu. Meeting Assoc. Comput. Linguistics, 2007, vol. 45, pp. 264–264.
[36] A. Evgeniou and M. Pontil, “Multi-task feature learning,” in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2007, vol. 19.
[37] N. Lawrence and J. Platt, “Learning to learn with the informative vector machine,” in Proc. Int. Conf. Machine Learning, 2004, pp. 512–519.
[38] L. Mihalkova, T. Huynh, and R. Mooney, “Mapping and revising Markov logic networks for transfer learning,” in Proc. AAAI, 2007, vol. 22, no. 1, pp. 608–614.
[39] A. Arnold, R. Nallapati, and W. Cohen, “A comparative study of methods for transductive transfer learning,” in Proc. Int. Conf. Data Mining (Workshops), 2007, pp. 77–82.
[40] W. Dai, G.-R. Xue, Q. Yang, and Y. Yu, “Transferring naive Bayes classifiers for text classification,” in Proc. AAAI, 2007, pp. 540–545.
[41] W. Dai, G. Xue, Q. Yang, and Y. Yu, “Co-clustering based classification for out-of-domain documents,” in Proc. ACM SIGKDD Conf. Knowl. Discovery Data Mining, 2007, pp. 210–219.
[42] S. Si, D. Tao, and B. Geng, “Bregman divergence-based regularization for transfer subspace learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 7, pp. 929–942, Jul. 2010.
[43] Y. Su, Y. Fu, Q. Tian, and X. Gao, “Cross-database age estimation based on transfer learning,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., 2010, pp. 1270–1273.
[44] A. Gallagher and T. Chen, “Using group prior to identify people in consumer images,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2007, pp. 1–8.
[45] P. Singla, H. Kautz, J. Luo, and A. Gallagher, “Discovery of social relationships in consumer photo collections using Markov logic,” presented at the IEEE Conf. Comput. Vis. Pattern Recognit., Washington, DC, 2008.
[46] J. Wong and S. Cho, “A face emotion tree structure representation with probabilistic recursive neural network modeling,” Neural Comput. Appl., vol. 19, no. 1, pp. 33–54, 2010.
[47] N. Ramanathan and R. Chellappa, “Modeling age progression in young faces,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2006, pp. 387–394.
[48] T. Chen, W. Yin, X. Zhou, D. Comaniciu, and T. Huang, “Total variation models for variable lighting face recognition,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, pp. 1519–1524, Sep. 2006.
[49] A. Jain and S. Li, Handbook of Face Recognition. New York: Springer, 2005.

Siyu Xia received the B.E. and M.S. degrees in automation engineering from the Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2000 and 2003, respectively, and the Ph.D. degree in pattern recognition and intelligent systems from Southeast University, Nanjing, China, in 2006.
From 2010 to 2011, he was a Visiting Scholar in the Department of Computer Science and Engineering, State University of New York at Buffalo. He currently works at Southeast University, Nanjing, China. His research interests include object detection, applied machine learning, social media analysis, and intelligent vision systems.

Ming Shao received the B.E. degree in computer science, the B.S. degree in applied mathematics, and the M.E. degree in computer science and engineering from Beihang University, Beijing, China, in 2006, 2007, and 2009, respectively. He is currently working toward the Ph.D. degree in computer science and engineering at the State University of New York at Buffalo.
His current research interests include sparse coding, low-rank representation, graph theory, multi-label learning, and applied machine learning on social media.

Jiebo Luo (S’93–M’96–SM’99–F’09) received the B.S. and M.S. degrees in electrical engineering from the University of Science and Technology of China (USTC), Hefei, China, in 1989 and 1992, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Rochester, Rochester, NY, in 1995.
He was a Senior Principal Scientist with the Kodak Research Laboratories, Rochester, NY, before joining the Computer Science Department, University of Rochester, in 2011. His research interests include signal and image processing, machine learning, computer vision, social media data mining, and medical imaging. He has authored over 180 technical papers and holds over 60 U.S. patents.
Dr. Luo is a Fellow of the SPIE and IAPR. He has been actively involved in numerous technical conferences, including serving as the General Chair of ACM CIVR 2008; Program Co-Chair of IEEE CVPR 2012 and ACM Multimedia 2010; and Area Chair of IEEE ICASSP 2009–2012, ICIP 2008–2012, CVPR 2008, and ICCV 2011. He has served on the editorial boards of the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON MULTIMEDIA, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, Pattern Recognition, Machine Vision and Applications, and the Journal of Electronic Imaging.



Yun Fu (S’07–M’08–SM’11) received the B.Eng. degree in information engineering and the M.Eng. degree in pattern recognition and intelligent systems from Xi’an Jiaotong University (XJTU), China, in 2001 and 2004, respectively, and the M.S. degree in statistics and the Ph.D. degree in electrical and computer engineering from the University of Illinois at Urbana–Champaign (UIUC), in 2007 and 2008, respectively. He was a Research Intern with Mitsubishi Electric Research Laboratories, Cambridge, MA, in summer 2005, and with the Multimedia Research Lab of Motorola Labs, Schaumburg, IL, in summer 2006. He joined BBN Technologies, Cambridge, MA, as a Scientist in 2008. He held a part-time Lecturer position in the Department of Computer Science, Tufts University, Medford, MA, in the spring of 2009. He joined the Department of Computer Science and Engineering, SUNY at Buffalo as an Assistant Professor in 2010. His research interests include applied machine learning, social media analytics, human-centered computing, pattern recognition, and intelligent vision systems.

Dr. Fu is a life member of ACM, SPIE, AAAI, and the Institute of Mathematical Statistics (IMS), and a 2007–2008 Beckman Graduate Fellow. He is an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY (TCSVT). He is the recipient of the 2002 Rockwell Automation Master of Science Award, Edison Cups of the 2002 GE Fund Edison Cup Technology Innovation Competition, the 2003 Hewlett-Packard (HP) Silver Medal and Science Scholarship, the 2007 Chinese Government Award for Outstanding Self-Financed Students Abroad, the IEEE ICIP 2007 Best Paper Award, the 2007–2008 Beckman Graduate Fellowship, the 2008 M. E. Van Valkenburg Graduate Research Award, the ITESOFT Best Paper Award of the 2010 IAPR International Conference on the Frontiers of Handwriting Recognition (ICFHR), the 2010 Google Faculty Research Award, the Best Paper Award of the IEEE ICDM 2011 Workshop on Large Scale Visual Analytics, recognition as an IEEE ICME 2011 Quality Reviewer, and the 2011 IC Postdoctoral Research Fellowship Award.