Describable Visual Attributes for Face Verification and Image Search

Dept. of CSE, AMC Engineering College

    CHAPTER 1

    INTRODUCTION

Pedanius Dioscorides wrote perhaps the earliest known field guide for botanists, giving pictures and written descriptions of nearly 600 plant species and showing how each could be found and identified. This work laid out the rules of modern taxonomy. Works such as this have in common an effort to teach the reader how to identify a plant or animal by describable aspects of its visual appearance.

While the use of describable visual attributes for identification has been around since antiquity, it has not been the focus of work by researchers in computer vision and related disciplines. Existing methods work by extracting low-level features from images, such as pixel values, gradient directions, histograms of oriented gradients [6], SIFT [7], etc., which are then used to directly train classifiers for identification or detection.

In contrast, the method used in this paper first uses low-level image features to learn intermediate representations, in which images are labeled with an extensive list of descriptive visual attributes. Although these attributes could clearly be useful in a variety of domains (such as object recognition, species identification, architectural description, action recognition, etc.), we focus solely on faces in this paper. These face attributes can range from simple demographic information such as gender, age, or ethnicity; to physical characteristics of a face such as nose size, mouth shape, or eyebrow thickness; and even to environmental aspects such as lighting conditions, facial expression, or image quality.

The attributes are used to train classifiers, which can then identify attributes in new images. The classifier outputs can in turn be used to verify faces and to search through large image collections, and they also seem promising for many other tasks, such as image exploration or automatic description generation. The attributes can be combined to produce descriptions at multiple levels, including object categories, objects, or even instances of objects.


    CHAPTER 2

    IMAGE ATTRIBUTES

2.1 Why might one need these attributes?

• Visual attributes, much like words, can be composed, offering tremendous flexibility and efficiency. Attributes can be combined to produce descriptions at multiple levels, including object categories, objects, or even instances of objects. For example, one can describe "white male" at the category level (a set of people), or "white male, brown hair, green eyes, scar on forehead" at the object level (a specific person), or add "smiling, lit from above, seen from left" to the previous for an instance of the object (a particular image of a person). Refer to Fig. 1.1.

• Moreover, attributes are generalizable; one can learn a set of attributes from large image collections and then apply them in almost arbitrary combinations to novel images, objects, or categories. Perhaps most importantly, these attributes can be chosen to align with the domain-appropriate vocabulary that people have developed over time for describing different types of objects. For faces, this includes descriptions at the coarsest level (such as gender and age), more subtle aspects (such as expressions and the shape of face parts), and highly face-specific marks (such as moles and scars).

• While describable visual attributes are one of the most natural ways of describing faces, a person's appearance can also be described in terms of the similarity of a part of their face to the same part of another individual's. For example, someone's mouth might be like Angelina Jolie's, or their nose like Brad Pitt's. Dissimilarities also provide useful information, e.g., her eyes are not like Jennifer Aniston's. We call these descriptions similes.


2.2 Uses of trained classifiers

Two major uses of classifiers trained on describable visual attributes and similes are face verification and image search.

2.2.1 Face Verification

Face verification is the problem of determining whether two face images are of the same individual. What makes this problem difficult is the enormous variability in the manner in which an individual's face presents itself to a camera: not only might the pose differ, but so might the expression and hairstyle. Making matters worse, at least for researchers in biometrics, is that the illumination direction, camera type, focus, resolution, and image compression are all almost certain to vary as well. These manifold differences in images of the same person have confounded methods for automatic face recognition and verification, often limiting the reliability of automatic algorithms to the domain of more controlled settings with cooperative subjects. We approach the unconstrained face verification problem (with non-cooperative subjects) by comparing faces using our attribute and simile classifier outputs, instead of low-level features directly. Fig. 1.1 shows the outputs of various attribute classifiers for (a) two images of the same person and (b) images of two different people. Note that in (a), most attribute values are in strong agreement, despite the changes in pose, illumination, and expression, while in (b), the values are almost perfectly contrasting. By training a classifier that uses these labels as inputs for face verification, we achieve close to state-of-the-art performance on the Labeled Faces in the Wild (LFW) dataset [8], at 85.54% accuracy.

    2.2.2 Image Search

Another application of describable visual attributes is image search. The ability of current search engines to find images based on facial appearance is limited to images with text annotations. Yet, there are many problems with annotation-based image search:

• the manual labeling of images is time-consuming;
• the annotations are often incorrect or misleading, as they may refer to other content on a webpage;


• and finally, the vast majority of images are simply not annotated.

Figs. 1.2a and 1.2b show the results of the query "smiling Asian men with glasses" using a conventional image search engine (Yahoo Image Search, as of November 2010) and our search engine, respectively. The difference in the quality of the search results is clearly visible. Yahoo's reliance on text annotations causes it to find some images that have no relevance to the query, while our system returns only images that match the query. In addition, many of the correct results on Yahoo point to stock photography websites, which can afford to manually label their images with keywords, but only because their collections are of a limited size, and they label only the coarsest attributes. Clearly, this approach does not scale.

Fig. 1.1 An attribute classifier can be trained to recognize the presence or absence of a describable visual attribute. The responses of several such attribute classifiers are shown for (a) two images of the same person and (b) two images of different individuals. In (a), notice how most attribute values are in strong agreement, despite the changes in pose, illumination, expression, and image quality. Conversely, in (b), the values differ completely despite the similarity in these same environmental aspects. We train a verification classifier on these outputs to perform face verification, achieving 85.54% accuracy on the Labeled Faces in the Wild (LFW) benchmark [8], comparable to the state-of-the-art.


Fig. 1.2 Results for the query "smiling Asian men with glasses" using (a) the Yahoo image search engine (as of November 2010) and (b) our face search engine. Conventional image search engines rely on text annotations, such as file metadata, manual labels, or surrounding text, which are often incorrect, ambiguous, or missing. In contrast, we use attribute classifiers to automatically label images with faces in them, and store these labels in a database. At search time, only this database needs to be queried, and results are returned instantaneously. The attribute-based search results are much more relevant to the query.


    CHAPTER 3

OVERVIEW OF RELEVANT WORK

    3.1 Attribute Classification

Prior research on attribute classification has focused mostly on gender and ethnicity classification. Early works used neural networks to perform gender classification on small datasets. The Fisherfaces work [2] showed that linear discriminant analysis could be used for simple attribute classification, such as glasses/no glasses. Later, Moghaddam and Yang [3] used Support Vector Machines (SVMs) trained on small face-prints to classify the gender of a face, showing good results. The works of Shakhnarovich et al. and of Baluja and Rowley used Adaboost [4] to select a linear combination of weak classifiers, allowing for almost real-time classification of face attributes, with results in the latter case again demonstrated on the FERET database. These methods differ in their choice of weak classifiers: the former uses the Haar-like features of the Viola-Jones face detector [5], while the latter uses simple pixel comparison operators. In a more general setting, Ferrari and Zisserman [6] described a probabilistic approach for learning simple attributes such as colors and stripes.

In contrast to both of these approaches, which try to find relations across different categories, we concentrate on finding relations between objects in a single category: faces. Faces have many advantages compared to generic object categories:

• There is a well-established and consistent reference frame to use for aligning images;
• differentiating objects is conceptually simple (e.g., it is unclear whether two cars of the same model should be considered the same object or not, whereas no such difficulty exists for two faces);
• most attributes can be shared across all people (unlike, e.g., "4-legged", "gothic", or "dual-exhaust", which are applicable to animals, architecture, and automobiles, respectively, but not to each other).

All of these benefits make it possible for us to train more reliable and useful classifiers, and to demonstrate results comparable to the state-of-the-art.


    CHAPTER 4

    BUILDING THE IMAGE DATASET

Internet services have made collecting and labeling image data easy in the following ways:

• Large internet photo-sharing sites such as flickr.com and picasa.com are growing exponentially and host billions of public images, some with textual annotations and comments. In addition, search engines such as Google Images allow searching for images of particular people (albeit not perfectly).
• Efficient marketplaces for online labor, such as Amazon's Mechanical Turk (MTurk), make it possible to label thousands of images easily and with very low overhead.

By exploiting both of these trends, we could create a large dataset of real-world images with attribute and identity labels, as shown in Fig. 4.1 and described next.

Fig. 4.1 Creating labeled image datasets: Our system downloads images from the internet. These images span many sources of variability, including pose, illumination, expression, cameras, and environment. Next, faces and fiducial points are detected using a commercial detector and stored in the Face Database. A subset of these faces are submitted to the Amazon Mechanical Turk service, where they are labeled with attributes or identity, which are used to create the datasets.

4.1 Collecting Face Images

    4.1.1 Procedure

• Using a variety of online sources, including search engines such as Yahoo Images and photo-sharing websites such as flickr.com, we collect face images. Depending on the type of data needed, one can either search for particular people's names (to build a dataset


    labeled by identity) or for default image filenames assigned by digital cameras (to use for

    labeling with attributes).

• Next, we apply the OKAO face detector [40] to the downloaded images to extract faces. This detector also returns the pose angles of each face, as well as the locations of six fiducial points: the corners of both eyes and the corners of the mouth. These fiducial points are used to align faces to a canonical pose.

The 3.1 million aligned faces collected using this procedure comprise the Columbia Face Database.

4.1.2 Conclusions

The following empirical conclusions can be drawn from the dataset of images:

• From the statistics of the randomly-named images, it appears that a significant fraction of them contain faces (25.7%), and on average, each image contains 0.5 faces. Thus, it is clear that faces are ubiquitous and an important case to understand.
• The dataset is a truly real-world dataset, with completely uncontrolled lighting and environments, taken using unknown cameras, unlike existing image datasets.

4.2 Collecting attribute and identity labels

For labeling images in the Columbia Face Database, the Amazon Mechanical Turk (MTurk) service is used. This service matches workers to online jobs created by requesters, who can optionally set quality controls such as requiring confirmation of results by multiple workers, filters on minimum worker experience, etc. We submitted 110,000 attribute labeling jobs showing 30 images to 3 workers per job, presenting a total of over 10 million images to users. The jobs asked workers to select the face images which exhibited a specified attribute.


4.3 The FaceTracer Dataset

The FaceTracer dataset is a subset of the Columbia Face Database, and it includes attribute labels. Each of the 15,000 faces in the dataset has a variety of metadata and fiducial points marked. The attributes labeled include demographic information such as age and race, facial features like mustaches and hair color, and other attributes such as expression, environment, etc. There are 5,000 labels in all.

4.4 The PubFig Dataset

The PubFig dataset is a complement to the LFW dataset [8]. It consists of 58,797 images of 200 public figures. The larger number of images per person (as compared to LFW) allows one to construct subsets of the data across different poses, lighting conditions, and expressions for further study. Fig. 4.2(c) shows the variation present in all the images of a single individual. In addition, this dataset is well-suited for recognition experiments.

Fig. 4.2(a) PubFig Development set (60 individuals)


Fig. 4.2(b) PubFig Evaluation set (140 individuals)

Fig. 4.2(c) All 170 images of Steve Martin

Fig. 4.2 The PubFig dataset consists of 58,797 images of 200 public figures (celebrities and politicians), partitioned into (a) a development set of 60 individuals and (b) an evaluation set of 140 individuals. Below each thumbnail is shown the number of photos of that person. There is no overlap in either identity or image between the development set and any dataset that we evaluate on, including Labeled Faces in the Wild (LFW) [8]. The immense variability in appearance captured by PubFig can be seen in (c), which shows all 170 images of Steve Martin (a celebrity).


    CHAPTER 5

    TRAINING THE CLASSIFIER

Given a particular describable visual attribute, say gender, how can one train a classifier for the attribute? Attributes can be thought of as functions ai that map images I to real values ai(I). Large positive values of ai(I) indicate the presence or strength of the ith attribute, while negative values indicate its absence.

5.1 Training Architecture

An overview of the attribute training architecture is shown in Fig. 5.1. The key idea is to leverage the many efficient and effective low-level features that have been developed by the computer vision community, choosing amongst a large set of them to find the ones suited for learning a particular attribute. This process should ideally be done in a generic, application- and domain-independent way, but with the ability to take advantage of domain-specific knowledge where available.

Fig. 5.1 Overview of the attribute training architecture. Given a set of labeled positive and negative training images, low-level feature vectors are extracted using a large pool of low-level feature options. An automatic, iterative selection process then picks the best set of features for correctly classifying the input data. The outputs are the selected features and the trained attribute classifier.


For the domain of faces, this knowledge consists of an affine alignment procedure and the use of low-level features which have proven to be very useful in a number of leading vision techniques, especially for faces. The alignment takes advantage of the fact that all faces have a common structure (i.e., two eyes, a nose, a mouth, etc.) and that we have fiducial point detections available from a face detector [40].
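To make the alignment step concrete, the following is a minimal Python sketch (using OpenCV and NumPy, which the paper does not prescribe) that warps a face to a canonical coordinate system from three fiducial points, here taken as the two eye centers and the mouth center, which could be derived from the six detected corner points. The canonical coordinates, crop size, and function names are illustrative assumptions, not the system's actual values.

    import cv2
    import numpy as np

    # Assumed canonical positions (pixels) in the aligned output image for
    # left eye center, right eye center, and mouth center; illustrative only.
    CANONICAL_POINTS = np.float32([[60.0, 70.0], [140.0, 70.0], [100.0, 150.0]])
    ALIGNED_SIZE = (200, 200)  # (width, height) of the aligned face crop

    def align_face(image, fiducials):
        # fiducials: 3x2 array of detected point locations in the original
        # image (left eye, right eye, mouth), e.g. averaged from the six
        # corner points returned by the face detector.
        src = np.float32(fiducials)
        # Affine transform mapping detected points onto the canonical ones.
        M = cv2.getAffineTransform(src, CANONICAL_POINTS)
        return cv2.warpAffine(image, M, ALIGNED_SIZE)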

5.2 Low-Level Features

Face images are first aligned using an affine transformation. A set of k low-level feature extractors fj is applied to an aligned input image I to form a feature set F(I):

F(I) = {f1(I), ..., fk(I)}

We describe each extractor fj in terms of four choices:

• the region of the face to extract features from;
• the type of pixel data to use;
• the kind of normalization to apply to the data;
• the level of aggregation to use.

The complete set of our 10 regions is shown in Fig. 5.2. The regions correspond to functional parts of a face, such as the nose, mouth, etc. From each region, one can extract different types of information, as categorized in Table 5.1. The types of pixel data to extract include various color spaces (RGB, HSV) as well as edge magnitudes and orientations. To remove lighting effects and better generalize across a limited number of training images, one can optionally normalize these extracted values.

Finally, one can aggregate the normalized values over the region rather than simply concatenating them. This can be as simple as using only the mean and variance, or include more information by computing a histogram of values over the region. A complete feature type is created by choosing a region from Fig. 5.2 and one entry from each column of Table 5.1. (Of course, not all possible combinations are valid.)
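The pool of candidate feature types can thus be pictured as the cross product of these four choices. The sketch below enumerates such a pool in Python; the region names and the single validity rule are assumptions for illustration, not the exact options used in the system.

    from itertools import product

    # Option lists mirroring Fig. 5.2 and Table 5.1. The region names are
    # illustrative stand-ins for the ten manually defined face regions.
    REGIONS = ["whole_face", "hair", "forehead", "eyebrows", "eyes",
               "nose", "cheeks", "mouth", "chin", "ears"]
    PIXEL_TYPES = ["rgb", "hsv", "intensity", "edge_magnitude", "edge_orientation"]
    NORMALIZATIONS = ["none", "mean", "energy"]
    AGGREGATIONS = ["none", "histogram", "mean_variance"]

    def valid(pixel_type, normalization, aggregation):
        # Filter out combinations that make no sense. This particular rule
        # (edge orientations are angles, so mean/energy normalization does
        # not apply) is an assumed example, not the paper's validity test.
        if pixel_type == "edge_orientation" and normalization != "none":
            return False
        return True

    # Enumerate the pool of candidates that feature selection draws from.
    FEATURE_TYPES = [
        (r, p, n, a)
        for r, p, n, a in product(REGIONS, PIXEL_TYPES, NORMALIZATIONS, AGGREGATIONS)
        if valid(p, n, a)
    ]
    print(len(FEATURE_TYPES), "candidate region-feature combinations")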

  • 8/6/2019 Eshan Content

    14/29

    Describable Visual Attributes for Face Verification and Image Search

    Dept. Of CSE, AMC Engineering College 14

Fig. 5.2 The face regions used for automatic feature selection are shown here on an affine-aligned face image. There is (a) one region for the whole face, and (b) nine regions corresponding to functional parts of the face, such as the mouth, eyes, nose, etc. Regions are large enough to contain the face part across changes in pose, small errors in alignment, and differences between individuals. The regions are manually defined, once, in the affine-aligned coordinate system, and can then be used automatically for all aligned input faces.

Table 5.1 Feature type options. A complete feature type is constructed by first converting the pixels in a given region (see Fig. 5.2) to one of the pixel value types from the first column, then applying one of the normalizations from the second column, and finally aggregating these values into the output feature vector using one of the options from the last column.

Pixel Value Types     Normalizations          Aggregation
RGB                   None                    None
HSV                   Mean Normalization      Histogram
Image Intensity       Energy Normalization    Mean/Variance
Edge Magnitude
Edge Orientation

5.3 Attribute Classifiers

Attribute classifiers Ci are built using a supervised learning approach. Training requires a set of labeled positive and negative images for each attribute, examples of which are shown in Fig. 5.3. The goal is to build a classifier that best classifies this training data by choosing an appropriate



subset of the feature set F(I) described in the previous section. We do this iteratively, using forward feature selection. In each iteration, we first train several individual classifiers on the current set of features in the output set, concatenated with a single region-feature combination. Each classifier's performance is evaluated using cross-validation. The features used in the classifier with the highest cross-validation accuracy are added to the output set. We continue adding features until the accuracy stops improving. For computational reasons, we drop the lowest-scoring 70% of features at each round, but always keep at least 10 features.
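A minimal sketch of this greedy selection loop follows, using scikit-learn SVMs as a stand-in classifier and cross-validation accuracy as the score. The data layout (a feature_bank dict mapping each region-feature combination to a precomputed feature matrix) is an assumption for illustration.

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    def forward_select(feature_bank, labels, cv=5):
        # feature_bank: dict mapping a feature-type id to an (n_samples, dim)
        # array of extracted features; labels: +1/-1 attribute labels.
        remaining = list(feature_bank)
        selected, best_acc = [], 0.0
        while remaining:
            # Train one candidate per remaining feature type: the
            # already-selected features concatenated with that candidate.
            scores = {}
            for f in remaining:
                X = np.hstack([feature_bank[g] for g in selected + [f]])
                scores[f] = cross_val_score(SVC(kernel="rbf"), X, labels,
                                            cv=cv).mean()
            best = max(scores, key=scores.get)
            if scores[best] <= best_acc:
                break                      # accuracy stopped improving
            best_acc = scores[best]
            selected.append(best)
            remaining.remove(best)
            # Drop the lowest-scoring 70% of candidates, keeping at least 10.
            ranked = sorted(remaining, key=scores.get, reverse=True)
            remaining = ranked[:max(10, int(0.3 * len(ranked)))]
        return selected, best_acc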

While the design of the classifier architecture is flexible enough to handle a large variety of attributes, it is important to ensure that we have not sacrificed accuracy in the process. We therefore compare our approach to three previous state-of-the-art methods for attribute classification: full-face SVMs using brightness-normalized pixel values, Adaboost using Haar-like features, and Adaboost using pixel comparison features. Results are shown in Table 5.2.

Using the Columbia Face Database and the learning procedure just described, a total of 73 attribute classifiers have been trained. Their cross-validation accuracies are shown in Table 5.3, and typically range from 80% to 90% (random performance would be 50% for each attribute).

Fig. 5.3 Training data for the attribute classifiers consists of face images that match the given attribute label (positive examples) and those that don't (negative examples). Shown here are a few of the training images used for four different attributes. Final classifier accuracies for all 73 attributes are shown in Table 5.3.


Table 5.2 Comparison of attribute classification performance for the gender and smiling attributes.

Table 5.3 Cross-validation accuracies of the 73 attribute classifiers.

    5.4 Simile Classifiers


Simile classifiers measure the similarity of part of a person's face to the same part on a set of reference people. We use the 60 individuals from the development set of PubFig as the reference people. The left part of Fig. 5.4 shows examples of four regions selected from two reference people as positive examples. On the right are negative examples, which are simply the same region extracted from other individuals' images.

Fig. 5.4 Each simile classifier is trained using several images of a specific reference person, limited to a small face region such as the eyes, nose, or mouth. We show here three positive and three negative examples each, for four regions on two of the reference people used to train these classifiers.

Two points are worth noting:

• the individuals chosen as reference people do not appear in LFW or in other benchmarks on which we produce results;
• we train simile classifiers to recognize similarity to part of a reference person's face in many images, not similarity to a single image.

For each reference person, we train support vector machines to distinguish a region (e.g., eyebrows, eyes, nose, mouth) on their face from the same region on other faces.
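The following sketch shows what training one such simile classifier might look like with scikit-learn. The per-region feature extraction is assumed to happen elsewhere (e.g., using the region-based features of Chapter 5), and the RBF-kernel SVM and function names are illustrative choices.

    import numpy as np
    from sklearn.svm import SVC

    def train_simile_classifier(ref_region_feats, other_region_feats):
        # ref_region_feats: features from one region (e.g., the nose) across
        # many images of the reference person (positive examples).
        # other_region_feats: the same region extracted from other people's
        # images (negative examples).
        X = np.vstack([ref_region_feats, other_region_feats])
        y = np.hstack([np.ones(len(ref_region_feats)),
                       -np.ones(len(other_region_feats))])
        return SVC(kernel="rbf").fit(X, y)

    # One classifier per (reference person, region) pair, e.g. 60 reference
    # people x 4 regions:
    # similes = {(person, region): train_simile_classifier(pos[person][region],
    #                                                      neg[region])
    #            for person in reference_people
    #            for region in ("eyebrows", "eyes", "nose", "mouth")}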

    CHAPTER 6


    FACE VERIFICATION

Existing methods for face verification, when deciding whether two faces are of the same person, often make mistakes that would seem to be avoidable: men being confused for women, young people for old, Asians for Caucasians, etc. On the other hand, small changes in pose, expression, or lighting can cause two otherwise similar images of the same person to be misclassified by an algorithm as different. Based on this observation, it is hypothesized that attribute and simile classifiers could avoid such mistakes.

    6.1 Training a Verification Classifier

Fig. 6.1 illustrates how attribute-based face verification is performed on a new pair of input images. In order to decide whether two face images I1 and I2 show the same person, one can train a verification classifier V that compares the attribute vectors C(I1) and C(I2) and returns v(I1, I2), the verification decision. These vectors are constructed by concatenating the results of n different attribute and/or simile classifiers.

To build V, let us make some observations about the particular form of our classifiers:

• Values Ci(I1) and Ci(I2) from the ith classifier should be similar if the images are of the same individual, and different otherwise.
• Classifier values are raw outputs of binary classifiers, where the objective function tries to separate examples around 0.

Let ai = Ci(I1) and bi = Ci(I2) be the outputs of the ith trait classifier for each face (1 ≤ i ≤ n). One would like to combine these values in such a way that our second-stage verification classifier V can make sense of the data. This means creating values that are large (and positive) when the two inputs are of the same individual, and negative otherwise. From observation, we see that both the absolute difference |ai − bi| and the product ai·bi will yield the desired outputs. Putting both terms together yields the tuple pi:

pi = { |ai − bi| , ai·bi }    (2)

  • 8/6/2019 Eshan Content

    19/29

    Describable Visual Attributes for Face Verification and Image Search

    Dept. Of CSE, AMC Engineering College 19

The concatenation of these tuples for all n attribute/simile classifier outputs forms the input to the verification classifier V:

v(I1, I2) = V(p1, . . . , pn)    (3)

Training V requires pairs of positive examples (two images of the same person) and negative examples (images of two different people).
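A Python sketch of Eqs. (2) and (3) follows. The feature construction is exactly the tuple concatenation described above; the choice of an SVM for V, and the data layout, are illustrative assumptions.

    import numpy as np
    from sklearn.svm import SVC

    def verification_features(a, b):
        # a = C(I1), b = C(I2): the n attribute/simile classifier outputs
        # for the two faces. For each i, build p_i = (|a_i - b_i|, a_i * b_i)
        # and concatenate all tuples into one length-2n vector.
        a, b = np.asarray(a), np.asarray(b)
        return np.column_stack([np.abs(a - b), a * b]).ravel()

    def train_verifier(pairs, same):
        # pairs: list of (attr_vec1, attr_vec2) tuples;
        # same: 1 for same-person pairs, 0 for different-person pairs.
        X = np.vstack([verification_features(a, b) for a, b in pairs])
        return SVC(kernel="rbf").fit(X, same)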

Fig. 6.1 The face verification pipeline. A pair of input images are run through a face and fiducial detector [40], and the fiducials are then used to align both faces to a canonical coordinate system. The aligned face images are fed to each of our attribute and simile classifiers individually to obtain a set of attribute values. Finally, these values are compared using a verification classifier to make the output determination, which is returned along with the distance to the decision boundary. The entire process is fully automatic.

    6.2 Experimental Setup


Face verification experiments are performed on the Labeled Faces in the Wild (LFW) benchmark [8] and also on the PubFig benchmark. For each computational experiment, one set of pairs of face images is presented for training, and a second set of pairs is presented for testing.

    6.3 Attribute Classifier Results on LFW

Fig. 6.2 shows results on LFW for our attribute classifiers (red line), simile classifiers (blue line), and a hybrid of the two (green line), along with several other methods (dotted lines). The highest accuracy of 85.54% is comparable to the 86.83% accuracy of the current state-of-the-art method on LFW. The small bump in performance from combining the attribute and simile classifiers suggests that while they contain much of the same kind of information, there are still some interesting differences. This can be better seen in Fig. 6.2, where similes do better in the low-false-positive regime, but attributes do better in the high-detection-rate regime.

Fig. 6.2 Face verification performance on LFW of our attribute classifiers, simile classifiers, and a hybrid of the two, shown in solid red, blue, and green, respectively. Dashed lines are existing methods.


    6.4 Human Attribute Labels on LFW

It is interesting to consider how well attribute classifiers could potentially do. There are several reasons to believe that our results are only first steps towards this ultimate goal:

• We have currently trained 73 attribute classifiers. Adding more attributes, especially fine-scale ones such as the presence and location of highly discriminative facial features including moles, scars, and tattoos, should greatly improve performance.
• Of the 73 attributes, many are not discriminative for verification. For example, facial expression, scene illumination, and image quality are all unlikely to aid in verification. There is also a severe imbalance in LFW of many basic attributes such as gender and age, which reduces the expected benefit of using these attributes for verification.
• The attribute functions were trained as binary classifiers rather than as continuous regressors. While we use the distance to the separation boundary as a measure of the degree of the attribute, using regression may improve results.

Fig. 6.3 shows a comparison of face verification performance on LFW using either these human attribute labels (blue line) or our automatically-computed classifier outputs (red line), for increasing numbers of attributes. In both cases, the labels are fed to the verification classifier V and training proceeds identically, as described earlier. The set of attributes used for each corresponding point on the graphs was chosen manually (and was identical for both). Verification results using the human attribute labels reach 91.86% accuracy with 18 attributes, significantly outperforming our computed labels at 81.57% for the same 18 attributes. Moreover, the drop in error rates from computational to human labels actually increases with more attributes, suggesting that adding more attributes could further improve accuracies.


Fig. 6.3 Comparison of face verification performance on LFW using human attribute labels (blue line) vs. automatically-computed classifier outputs (red line).

    6.5 Human Verification on LFW

The high accuracies obtained in the previous section lead to a natural question: How well do people perform on the verification task itself? While many algorithms for automatic face verification have been designed and evaluated on LFW, there are no published results about how well people perform on this benchmark.

Results are shown in Fig. 6.4. Using the original LFW images (red curve), people have 99.20% accuracy, essentially perfect. The task was then made tougher by cropping the images, leaving only the face visible (including at least the eyes, nose, and mouth, and possibly parts of the hair, ears, and neck). This experiment measures how much people are helped by the context


(sports shot, interview, press conference, etc.), background (some images of individuals were taken with the same background), and hair (although sometimes it is partially visible). The results (blue curve) show that performance drops to 97.53%, a tripling of the error rate.

Fig. 6.4 Face verification performance on LFW by humans.

6.6 Attribute Classifier Results on PubFig

Face verification is performed on 20,000 pairs of images of 140 people, divided into 10 cross-validation folds with mutually disjoint sets of 14 people each. These people are separate from the 60 people in the development set of PubFig, which were used for training the simile classifiers. The performance of the attribute classifiers on this benchmark is shown in Fig. 6.5, and it is indeed much lower than on LFW, with an accuracy of 78.65%.


Fig. 6.5 Face verification results on the PubFig evaluation benchmark using the attribute classifiers.


    CHAPTER 7

    FACE SEARCH

Image search engines are currently dependent on textual metadata. This data can be in the form of filenames, manual annotations, or surrounding text. However, for the vast majority of images on the internet (and in people's private collections), this data is often ambiguous, incorrect, or simply not present. This presents a great opportunity to use attribute classifiers on images with faces, thereby making them searchable. To facilitate fast searches on a large collection of images, all images are labeled in an offline process using the attribute classifiers. The resulting attribute labels are stored for fast online searches using the FaceTracer engine.

The FaceTracer engine uses simple text-based queries as inputs, since these are both familiar and accessible to most internet users, and correspond well to describable visual attributes. Search results are ranked by confidence, so that the most relevant images are shown first. The engine uses the computed distance to the classifier decision boundary as a measure of confidence. For searches with multiple query terms, the confidences of the different attribute labels are combined such that the final ranking shows images in decreasing order of relevance to all search terms. To prevent high confidences for one attribute from dominating the search results, we first convert the confidences into probabilities by fitting Gaussian distributions to attribute scores computed on a held-out set of positive and negative examples, and then use the product of the probabilities as the sort criterion. This ensures that the images with high confidences for all attributes are shown first.
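A sketch of this ranking scheme is given below. The use of two fitted Gaussians (one for positive and one for negative held-out scores) combined as a likelihood ratio with equal priors is an assumed reading of the conversion step, and all function names are hypothetical.

    import numpy as np
    from scipy.stats import norm

    def fit_score_models(pos_scores, neg_scores):
        # Fit one Gaussian to classifier scores on held-out positive
        # examples and one to scores on held-out negatives.
        return norm(*norm.fit(pos_scores)), norm(*norm.fit(neg_scores))

    def attribute_probability(score, pos_model, neg_model):
        # P(attribute present | score) from the two fitted Gaussians,
        # assuming equal priors (an assumption of this sketch).
        p, q = pos_model.pdf(score), neg_model.pdf(score)
        return p / (p + q)

    def rank_images(image_scores, models):
        # image_scores[i][k]: image i's raw confidence for query attribute k;
        # models[k]: the (pos_model, neg_model) pair for attribute k.
        # Rank by the product of per-attribute probabilities, so no single
        # confident attribute dominates the results.
        relevance = [
            np.prod([attribute_probability(s, *models[k])
                     for k, s in enumerate(scores)])
            for scores in image_scores
        ]
        return np.argsort(relevance)[::-1]  # most relevant first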

Example queries on the search engine are shown in Figs. 7.1a and 7.1b. The returned results are all highly relevant. Fig. 7.1b additionally demonstrates two other interesting things:

• it was run on a personalized dataset of images from a single user, showing that this method can be applied to specialized image collections as well as general ones;
• it shows that we can learn useful things about an image using just the appearance of the faces within it, in this case determining whether the image was taken indoors or outdoors.


This attribute-based search engine can be used in many other applications, replacing or augmenting existing tools. In law enforcement, eyewitnesses to crimes could use this system to quickly narrow a list of possible suspects and then identify the actual criminal from the reduced list, saving time and increasing the chances of finding the right person. On the internet, our face search engine is a perfect match for social networking websites such as Facebook, which contain large numbers of images with people. Additionally, the community aspect of these websites would allow for collaborative creation of new attributes. Finally, people could use our system to more easily organize and manage their own personal photo collections. For example, searches for blurry or other poor-quality images can be used to find and remove all such images from a collection.


    CHAPTER 8

    CONCLUSION

These seem to be promising first steps in a new direction, and there are many avenues to explore. The experiments with human attribute labeling suggest that adding more attributes and improving the attribute training process could yield great benefits for face verification. Another direction to explore is how best to combine attribute and simile classifiers with low-level image cues. Finally, an open question is how attributes can be applied to domains other than faces. It seems that for reliable and accurate attribute training, analogues to the detection and alignment process must be found.

The set of attributes used in this work was chosen in an ad-hoc way; how to select attributes dynamically in a more principled manner is an interesting topic to consider. In particular, a system with a user in the loop could be used to suggest new attributes. Thanks to Amazon Mechanical Turk, such a system would be easy to set up and could operate autonomously.


    REFERENCES

    Books:

[1] Bahram Javidi, "Image Recognition and Classification", Marcel Dekker, 5(2), 2001.

[2] Takeo Kanade, Anil K. Jain, Nalini Kanta Ratha, "Audio- and Video-Based Biometric Person Authentication", International Association for Pattern Recognition, 2005.

[3] Shigeo Abe, "Support Vector Machines for Pattern Classification", Series in Machine Perception and Artificial Intelligence, Springer, pp. 891-895, 2005.

[4] De-Shuang Huang, "Advanced Intelligent Computing Theories and Applications", Springer, 2010.

[5] Cha Zhang, Zhengyou Zhang, "Face Detection and Adaptation", Morgan & Claypool Publishers, 2011.

    Links:

[6] Histogram of Oriented Gradients, http://portal.acm.org/citation.cfm?id=1069007

    [7] SIFT, http://en.wikipedia.org/wiki/Scale-invariant_feature_transform

    IEEE:

[8] Hieu V. Nguyen and Li Bai, "Cosine Similarity Metric Learning", Computer Vision - ACCV 2010: 10th Asian Conference on Computer Vision, 2010.

[9] Neeraj Kumar, Alexander C. Berg, Peter N. Belhumeur, and Shree K. Nayar, "Describable Visual Attributes for Face Verification and Image Search", IEEE Transactions on Pattern Analysis and Machine Intelligence.