
Open set source camera attribution and device linking



Pattern Recognition Letters 39 (2014) 92–101



0167-8655/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.patrec.2013.09.006

Filipe de O. Costa a, Ewerton Silva a, Michael Eckmann b, Walter J. Scheirer c,d, Anderson Rocha a,*

a Institute of Computing, University of Campinas (Unicamp), Av. Albert Einstein, 1251, Cidade Universitária ‘‘Zeferino Vaz’’, Campinas, SP CEP 13083-852, Brazil
b Dept. of Mathematics and Computer Science, Skidmore College, Saratoga Springs, NY 12866, USA
c School of Engineering and Applied Sciences, Harvard University, 29 Oxford Street, Cambridge, MA 02138, USA
d Dept. of Molecular and Cellular Biology, Center for Brain Science, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA

* Corresponding author. Tel.: +55 19 3521 5854; fax: +55 19 3521 5838. E-mail addresses: [email protected] (F. de O. Costa), [email protected] (E. Silva), [email protected] (M. Eckmann), [email protected] (W.J. Scheirer), [email protected] (A. Rocha).
¹ Public disclosure to the U.S. Securities and Exchange Commission by Facebook: http://goo.gl/cAU0O.

Article info

Article history: Available online 21 September 2013

Communicated by S. Sarkar

Keywords: Open set recognition; Camera attribution; Device linking; Decision boundary carving

Abstract

Camera attribution approaches in digital image forensics have most often been evaluated in a closed set context, whereby all devices are known during training and testing time. However, in a real investigation, we must assume that innocuous images from unknown devices will be recovered, which we would like to remove from the pool of evidence. In pattern recognition, this corresponds to what is known as the open set recognition problem. This article introduces new algorithms for open set modes of image source attribution (identifying whether or not an image was captured by a specific digital camera) and device linking (identifying whether or not a pair of images was acquired from the same digital camera without the need for physical access to the device). Both algorithms rely on a new multi-region feature generation strategy, which serves as a projection space for the class of interest and emphasizes its properties, and on decision boundary carving, a novel method that models the decision space of a trained SVM classifier by taking advantage of a few known cameras to adjust the decision boundaries to decrease false matches from unknown classes. Experiments including thousands of unconstrained images collected from the web show a significant advantage for our approaches over the most competitive prior work.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

With the rise of digital photography, a growing number of digital images have become associated with evidentiary pools for criminal and civil proceedings. This presents an often frustrating dilemma for those charged with verifying the integrity and authenticity of such images, since they are not always generated by known devices, and can be modified with ease (Rocha et al., 2011). Moreover, with an estimated 250 million images being added to Facebook every day¹ from an enormous set of unknown sources, looking for images from a particular camera of interest becomes a significant challenge. In this article, we investigate a fundamentally new approach for the specific problems of Image Source Attribution and Device Linking in the context of open set recognition, where not all cameras are known during training time (Fig. 1).

Similar to a ballistics exam in which bullet scratches allow forensic examiners to match a bullet to a particular gun (Li, 2002), image source attribution techniques look for artifacts left in an image by the source camera such as dust on the lens, the interaction between device components and the light, factory defects, and other effects (Swaminathan et al., 2009). Sensor attribution problems span a variety of devices such as cameras (Kurosawa et al., 1999; Dirik et al., 2008; Lukáš et al., 2006; Li, 2010), printers (Chiang et al., 2009; Kee and Farid, 2008), and scanners (Khanna et al., 2007; Khanna et al., 2009). Beyond a basic examination of the EXIF headers, which contain textual information about the digital camera type and the conditions under which the photograph was taken but can be easily tampered with or destroyed (Rocha et al., 2011), a class of methods exists that identifies the brand/model of the source camera (Popescu and Farid, 2005; Kharrazi et al., 2004) by directly considering the image data. These methods generally perform an analysis of color interpolation algorithms. However, many camera brands use components by only a few factories, and the color interpolation algorithm is the same (or very similar) among different models of the same brand of cameras (Rocha et al., 2011; Swaminathan et al., 2009).

Since fine-grained categorization is of more value to the field of digital image forensics, most source attribution approaches have the objective of identifying the specific camera that took a photograph instead of just the device's brand and model. There is some previous work that analyzes device defects for image source identification (Kurosawa et al., 1999; Geradts et al., 2001), as well as artifacts caused by dust on the lens at the time the image was taken (Dirik et al., 2008). The problem with such methods is that some current camera models do not contain any obvious defects, while others eliminate defective pixels by post-processing their images on-board. Further, some artifacts are strictly temporal by nature and can be easily destroyed (e.g., the lens may be cleaned or switched). In response, forensic experts have given special attention to methods based on sensor pattern noise (SPN) because they can identify specific instances of the same camera model by using the deterministic component of SPN (Lukáš et al., 2006; Li, 2010). This component is a robust fingerprint for identifying source cameras and verifying the integrity of images because it is the result of factors such as the variable sensitivity of each sensor element to light, the inhomogeneity of silicon wafers, and the uniqueness of manufacturing imperfections that even sensors of the same model possess (Rocha et al., 2011; Swaminathan et al., 2009; Lukáš et al., 2006).

Fig. 1. Image source camera attribution is the process of identifying whether or not an image was captured by a specific digital camera. Device linking is the process of identifying whether or not a pair of images comes from the same digital camera – without the need for physical access to the device. While much progress has been made in both areas, the most promising recent approaches (Lukáš et al., 2006; Li, 2010; Goljan and Fridrich, 2007) restrict evaluation to a closed set scenario, where all cameras are known during training and testing. For instance, closed set camera attribution considers only images from known cameras during training and testing (blue cases in (a)), while closed set device linking considers matched and non-matched pairs of images from known cameras (blue cases in (b)). A more realistic scenario for real world investigations is open set evaluation, where during testing (the operational scenario) we must consider images from unknown cameras. For camera attribution, images from unknown cameras should be rejected to avoid false attribution (e.g. the red arrow cases in (a)). Similarly, pairs of images containing images from unknown cameras should also be rejected to avoid false linking (e.g. the red/blue and blue/red pairs on the right of (b)). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

SPN is also useful for cases in which all a forensic examiner has is a set of photographs and the question is to determine whether or not the photographs were taken by the same camera. This challenge is known in the literature as device linking. With device linking methods, we can attest that a set of images was taken by a specific camera by comparing each image to another image that we know belongs to the specific camera – without needing physical access to it. This is a practical problem with potentially important implications in the age of social media. With the possibility of different photo albums spread across sites (Flickr, Facebook, Picasa, etc.), useful evidence can be isolated if an investigator knows that certain suspect images came from the same device, even if she only has access to the public images, and not the camera itself. Solutions to this problem also apply to the scenario of discovering whether or not illegal photos posted on the Internet were generated by a known stolen camera (when an investigator is in possession of a collection of reference images). Further, the commercial space has also expressed interest in the device linking problem: a premium service is already available for public and private investigators².

Nearly all of the prior work in image source attribution and device linking was evaluated in a closed set scenario, in which one assumes that an image under investigation was generated by one of n known cameras available during training. However, it is possible that the image may have been generated by an unknown device not available during training (i.e., in the set of suspect devices under investigation). Therefore, it is essential to model attribution problems as Open Set scenarios (Fig. 1), which resemble a realistic situation where we only have partial knowledge of the world we are modeling. In this case, we need a classification model for the few available classes (cameras under investigation), while trying to take the large unknown set of unavailable cameras into consideration.

² Quintel Intelligence: http://goo.gl/0pRN9.

In this article we describe a new feature generation approach for open set classification, as well as a new method for adjusting the decision boundary of an SVM classifier, based on the available knowledge of the world during training, called decision boundary carving (de Oliveira Costa et al., 2012). For image source attribution in an open set scenario, we obtain better results compared to state-of-the-art approaches for a very large dataset composed of 13,210 images from 400 different cameras, including ‘‘in the wild’’ images from 375 cameras taken from public Flickr albums. Similarly, we achieve higher accuracies for the device linking problem in an open set scenario for a dataset composed of 25,000 pairs of images sampled from the same set used for attribution. Our approach can be used by investigators to analyze images with different resolutions and acquisition circumstances, with good classification results across all conditions. In addition, the classification methods we propose are general enough to also be useful in a diverse set of classification problems outside of the realm of forensics.

Our contributions in this article, which is an extension of our recent conference paper (de Oliveira Costa et al., 2012), can be summarized as follows:

1. A review of the recent literature on camera attribution problems in the context of realistic open set recognition scenarios.

2. A new feature generation approach that addresses the open set classification problem in digital image forensics by serving as a projection space for the class of interest.

3. Algorithms for image source attribution and device linking incorporating decision boundary carving – a new approach for modeling the decision space of a trained SVM.

4. Large scale open set experimentation incorporating thousands of unconstrained images from the web, including an assessment of statistical significance for all algorithms considered.

2. Related work

To expand upon what we have touched on above, the problem of matching an image to the device that captured it is known in the forensics literature as image source attribution (Rocha et al., 2011). There are several features one can rely on for tackling this problem such as environment, noise, dust on the lens, hardware and component imperfections, and effects of operational conditions. These same features can also be used to solve the related problem of device linking, where the objective is to verify whether or not two images come from the same camera without the need for physical access to the actual device.

2.1. Image source attribution

Recent approaches have explored sensor pattern noise (SPN) for solving the image source attribution problem. SPN has drawn special attention from the forensics community because of its ability to identify a specific camera and not just the brand/model of the device. In general, one can consider two types of sensor noise patterns: Fixed Pattern Noise (FPN) and Photo Response Non-Uniformity Noise (PRNU).

FPN is caused by dark currents (the result of the accumulation of electrons in each sensor element of the device due to thermal action (Kurosawa et al., 1999)). Normally, it can be eliminated by some camera models on-the-fly. The PRNU, on the other hand, is divided into low-frequency defects noise (LFD) and pixel non-uniformity noise (PNU). LFD is mainly related to light refraction on particles near the camera, or by zoom configurations, and does not make a good candidate for forensics attribution because of its unstable nature. Conversely, PNU is more stable because it is caused by the interaction between the light and each element of the sensor array, revealing important clues for forensics.

Lukáš et al. (2006) have explored using PNU for the image source attribution problem. Given a set of images K generated by a camera C, they calculate the residual noise R_{I_j} for each image I_j \in K using a discrete wavelet transform based filter F

R_{I_j} = I_j - F(I_j).   (1)

Then, the method calculates a reference pattern SPN_C of the sensor pattern noise of the camera C by averaging all residual noises in the image set (Eq. (2)). The residual noise is used in this step to reduce the influence of scene detail.

SPN_C = \frac{1}{k} \sum_{i=1}^{k} R_{I_i}, \quad \text{where } k = |K|.   (2)

A correlation value \rho_C is calculated between the residual noise R_J of an image J under investigation and the SPN_C of a camera

corr = \rho_C(R_J, SPN_C) = \frac{(R_J - \bar{R}_J) \cdot (SPN_C - \bar{SPN}_C)}{\|R_J - \bar{R}_J\| \cdot \|SPN_C - \bar{SPN}_C\|},   (3)

where the mean value of the pixels is denoted by the bar above a symbol. For deciding a match, a threshold τ is calculated using the Neyman-Pearson approach to minimize the false rejection rate (FRR) while imposing a bound on the false acceptance rate (FAR). A match between an image and camera C exists if the value of \rho_C is higher than τ. High accuracy rates were reported by Lukáš et al. (2006) for a test of nine cameras, and the results were later confirmed by others (Goljan et al., 2009; Chen et al., 2008).
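To make the preceding steps concrete, a minimal Python sketch follows, computing the residual noise, the reference pattern, and the correlation of Eqs. (1)-(3). The wavelet-based filter F of Lukáš et al. is approximated here by a simple Wiener filter, and all helper names and parameters are illustrative assumptions rather than the authors' implementation.

# Sketch of Eqs. (1)-(3): residual noise, reference pattern, correlation.
import numpy as np
from scipy.signal import wiener

def residual_noise(img):
    """Eq. (1): R_I = I - F(I), with F approximated by a Wiener filter."""
    img = img.astype(np.float64)
    return img - wiener(img, (5, 5))

def reference_pattern(images):
    """Eq. (2): average the residuals of the k images of camera C."""
    return np.mean([residual_noise(i) for i in images], axis=0)

def correlation(r, spn):
    """Eq. (3): normalized correlation between a residual and a pattern."""
    a, b = r - r.mean(), spn - spn.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))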

Extending Lukáš et al.'s method (Lukáš et al., 2006), Li (2010) proposed a sensor pattern noise enhancement method to reduce the influence of the scene content in the noise component. Li argues that the high frequencies (e.g., object edges) in an image directly affect its PRNU component, which subsequently affects the camera identification results. In Li's method, given one image I_p \in K, after extracting its noise n = R_{I_p} according to Eq. (1), there is a normalization of each noise pixel n(x, y), generating the enhanced noise g_e(x, y). Eq. (4) represents the model with the best results (\alpha = 7).

g_e(x, y) = \begin{cases} e^{-0.5\, n^2(x,y)/\alpha^2}, & \text{if } 0 \le n(x,y), \\ -e^{-0.5\, n^2(x,y)/\alpha^2}, & \text{otherwise.} \end{cases}   (4)

Li reported lower false-positive rates than (Lukáš et al., 2006) for a scenario with six cameras and the center 512 × 512 region of interest of the image.
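A direct transcription of Eq. (4) could look as follows; here n is the residual-noise array from Eq. (1) and alpha = 7 is the value reported as best, with everything else an illustrative assumption.

# Sketch of Li's enhancement (Eq. 4): strong residual values, which are
# likely scene edges rather than sensor noise, receive weights near zero,
# while weak values receive weights near one; the sign is preserved.
import numpy as np

def enhance_noise(n, alpha=7.0):
    magnitude = np.exp(-0.5 * n ** 2 / alpha ** 2)
    return np.where(n >= 0, magnitude, -magnitude)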

In addition to Lukáš et al. (2006) and Li (2010), several other related methods have been proposed in the literature, such as those based on clustering image sets (Li, 2010; Caldelli et al., 2010), or approaches that combine image source attribution information and color filter interpolation features (Sutcu et al., 2007). To complicate things, there are also counter-forensic techniques that focus on discovering and exploiting inconsistencies in camera identification methods to foil the attribution process (Gloe et al., 2007; Caldelli et al., 2011).

2.2. Device linking

Parallel to the sensor attribution problem, Goljan and Fridrich (2007) proposed an approach for checking whether a pair of images was captured by the same acquisition device without having physical access to that device. The device linking algorithm proceeds as follows. Given a pair of images I_1 and I_2, the PNU component is extracted as in Lukáš et al. (2006) (Eq. (1)) and the images are directly compared by means of the Normalized Cross-Correlation (NCC)

NCC(u, v) = \frac{\sum_{i,j} (R_{I_1}[i,j] - \bar{R}_{I_1})(R_{I_2}[i+u, j+v] - \bar{R}_{I_2})}{\sqrt{\sum_{i,j} (R_{I_1}[i,j] - \bar{R}_{I_1})^2} \cdot \sqrt{\sum_{i,j} (R_{I_2}[i,j] - \bar{R}_{I_2})^2}},   (5)

where the bar above a symbol represents the mean.

The algorithm requires that both images be of the same size for comparison. For this reason, Goljan et al. recommend padding the images with zeros (when necessary) before calculating the NCC. Cropping two regions of the same size around the center of both images is also plausible (Goljan and Fridrich, 2007). For the decision step, Goljan et al. explore measures of peak sharpness on the NCC such as the ratio between the primary and the secondary peaks (PSR). A PSR that exceeds a threshold (established to minimize the false positive rate of misclassifying two images as coming from the same camera) is an indication that both images were captured by the same camera.
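A rough sketch of this decision is given below, assuming r1 and r2 are zero-padded residuals of equal shape; the FFT-based circular correlation, the global normalization, and the peak-exclusion window are simplifications of the original formulation.

# Sketch of the NCC/PSR decision (Eq. 5): correlate two residuals over
# all shifts and compare the primary peak to the secondary one.
import numpy as np

def peak_to_secondary_ratio(r1, r2, excl=5):
    a, b = r1 - r1.mean(), r2 - r2.mean()
    # Circular cross-correlation for every shift (u, v) via the FFT.
    cc = np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(b))).real
    ncc = cc / (np.linalg.norm(a) * np.linalg.norm(b))
    i, j = np.unravel_index(np.argmax(ncc), ncc.shape)
    masked = ncc.copy()  # blank a window around the primary peak
    masked[max(i - excl, 0):i + excl + 1, max(j - excl, 0):j + excl + 1] = -np.inf
    return ncc[i, j] / masked.max()  # same camera if above a threshold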

2.3. Limitations of the prior work in sensor attribution and device linking for open set problems

Although the approaches (Lukáš et al., 2006; Li, 2010; Goljan and Fridrich, 2007) we have reviewed are effective, it is important to understand that they have some deficiencies. To estimate the threshold τ responsible for matching an image to a camera or for deciding that two images come from the same camera, it is assumed that one has images from all possible cameras, and has subsequently labeled the entire space in binary fashion as either positive (generated by the camera under investigation) or negative (otherwise). More specifically, when creating an algorithm for deciding whether an image matches its acquisition camera, one needs to eliminate false matches to other possible cameras found in the wild that are unknown during training. This is known as open set classification.

To highlight the problem with current attribution methods, consider the following example: suppose that investigators seize and retain several cameras and hard disks containing child pornography. In a closed set camera source attribution analysis for this case, the investigators would assume that the images come from one of the seized cameras. However, this assumption is naïve and would easily lead to unexpected false matches. Thus, open set attribution is the mode they should operate in. If we are strict about what an open set means, we can say that Lukáš et al. (2006) and Li (2010) partially dealt with it by defining specific reference patterns for each image, which aims at ruling out unknown devices. In this article, we go beyond a simple characterization of the problem and explicitly deal with it in the context of a novel machine learning approach.

Fig. 2. An example of open set classification. The above diagram shows a known class of interest (‘‘orange’’), surrounded by other classes that are not of interest, which can be known (‘‘blue,’’ ‘‘gray,’’ ‘‘green’’), or unknown (‘‘?’’). The ‘‘orange’’ class may refer to images of a suspicious camera under investigation and the ‘‘blue,’’ ‘‘gray’’ and ‘‘green’’ classes may refer to images of three other known cameras. Since this is an open scenario, we can only use the information about the four known cameras for calculating a decision threshold. However, the real world is more complex, and there are countless cameras that cannot be considered as part of the negative class in training. Any algorithm used operationally must address this aspect of ‘‘unknowns’’ in the problem. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Although important, the open set recognition problem has received limited attention in the pattern recognition literature thus far. For instance, in a study of face recognition evaluation methods proposed by Phillips et al. (2005), the authors define a threshold τ such that all scores from an algorithm must necessarily exceed τ to be considered a match. However, being greater than τ is a necessary but not sufficient condition for avoiding false matches. Possible unknown impostors may exist (exceeding the threshold) since it is impossible to train the system with all possible impostors. Indeed, nearly all works that we could find in the literature claiming to address the open set problem do so by simply setting a threshold.

A more sophisticated ‘‘1-vs-Set Machine’’ algorithm based on the linear SVM (Cortes and Vapnik, 1995) is described by Scheirer et al. (2013). To improve the overall open set recognition error, the 1-vs-Set Machine balances the unknown classes by obtaining a core margin around the decision boundary from a base SVM, specializing the resulting half-space by adding another plane and then generalizing or specializing the two planes to optimize empirical and open space risk. The process uses the open set training data and the risk model to define a new ‘‘open set margin’’. Our work here can be viewed as a variation of this approach that considers a non-linear kernel and moves only a single plane.

In the forensics literature, Wang et al. (2009) perform open set camera model identification using features based on Color Filter Array (CFA) coefficients as proposed in Kharrazi et al. (2004) and Popescu and Farid (2005). The authors use a combination of binary SVM classification approaches: Two-class SVMs (TC-SVM) and One-class SVMs (OC-SVM) (Schölkopf et al., 2001). More specifically, the OC-SVM may be considered a solution for open set problems, given that it is not restricted to a defined sampling of negatives. The authors use only two out of 17 available cameras for training (one class of interest and one for outlier definition, which can be seen as a form of accounting for the unknown) and all 17 cameras for testing. The work reported a true positive rate of approximately 91%. The method has two limitations. First, considering CFA coefficients, one can identify only the brand/model of the camera that generated an image. Second, one-class solutions for open set problems tend to generalize poorly (Zhou and Huang, 2003).

3. A methodology for open set camera attribution and device linking

In this work, we do not assume that an image under investigation was generated by an available camera. The key difference from prior work is that for learning, we assume that we have access to some classes of interest (suspicious cameras under investigation), but a vastly undersampled representation of the space of negative classes (all other cameras in the world) when calculating the decision boundary from a rich source of features. Since the space of all negative classes is beyond our means to quantify, we solve for a more feasible objective during training. Fig. 2 depicts an example of this open set classification problem.

Our approaches for open set source camera attribution and device linking share two core elements:

1. Multi-Region Feature Characterization;
2. Open Set Classification.

3.1. Multi-region feature characterization

Before performing any training or classification, we need to characterize the images under investigation and represent them in a form that is more suitable for computation. The first step is to determine a region of interest for characterization in each image. Lukáš et al. (2006) proposed the use of the image's central region, while Li (2010) considered the whole image in some cases. Choosing a common region for all images (e.g., the central region) may be better for image source camera attribution and device linking when we have images with different resolutions.

However, according to Li and Satta (2011), different regions of the image can contain different information with respect to the acquisition process. Therefore, our method considers multiple regions of one image instead of just the central one. We calculate nine 512 × 512-pixel regions of interest (ROIs), as shown on the lefthand side of Fig. 3. Regions in the center of the image (1–5) assume coincidence with the principal axis of the lens and are expected to have a great deal of scene detail, as amateur photographers usually focus the object of interest in the center of the lens. Peripheral regions (6–9) are also important because some cameras suffer from vignetting, which is a radial falloff of intensity from the center of the image, causing a reduction of an image's brightness or saturation at the periphery (Li and Satta, 2011; Goldman and Chen, 2005). These are interesting and useful properties for attribution and linking analysis.

Different from Lukáš et al. (2006), who have considered only gray-scale images when calculating the PNU for camera attribution, we calculate the PNU (as defined in Eqs. (1) and (2)) for each region of interest by considering the color channels R (red), G (green), and B (blue), as well as the Y channel (luminance, from the YCbCr color space), which is a combination of the R, G and B channels (a gray-scale version of the image) (Wang and Weng, 2000). We end up with 36 reference noise patterns to represent each camera, giving us a larger space of information to consider during learning. The regions and color spaces can be thought of as a projection space for the class of interest, emphasizing its properties. This is essential for reaching the levels of discriminability required for open set scenarios where any number of unknown images can be encountered. Further, this type of region characterization allows us to compare images with different resolutions without color interpolation artifacts, and it is not necessary to do zero-padding, for instance, when comparing images of different sizes. The feature vector representing an image is created by calculating the correlation between each ROI of the image and the corresponding noise pattern for each camera, according to Eq. (3). The final result is a 36-dimensional feature vector. Fig. 3 depicts this process.

Fig. 3. Feature generation takes advantage of multiple image regions, giving us a rich pool of data to consider when learning classifiers that perform well in open set scenarios. For source camera attribution (shown above), we calculate the residual noise for each ROI considering the R, G, B and Y color channels. Next, we generate the feature vector with respect to the correlation between the noise in each ROI and the corresponding noise pattern for each camera. Performing this for all ROIs for one camera, we have 36 features for each image. Device linking proceeds much the same way, except that pairs of images are considered, and we only analyze the R, G, and B color channels, giving us 27 features for each image. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
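The sketch below illustrates the multi-region extraction. The exact placement of the nine ROIs follows Fig. 3, which is not reproduced here, so the layout used (the center, four regions overlapping it, and the four image corners) is an assumption for illustration.

# Sketch of multi-region extraction: nine 512x512 ROIs per image.
def extract_rois(img, size=512):
    h, w = img.shape[:2]
    cy, cx = (h - size) // 2, (w - size) // 2  # top-left of central ROI
    half = size // 2
    tops = [
        (cy, cx),                                             # 1: center
        (max(cy - half, 0), max(cx - half, 0)),               # 2-5:
        (max(cy - half, 0), min(cx + half, w - size)),        # regions
        (min(cy + half, h - size), max(cx - half, 0)),        # around the
        (min(cy + half, h - size), min(cx + half, w - size)), # center
        (0, 0), (0, w - size),                                # 6-9: image
        (h - size, 0), (h - size, w - size),                  # corners
    ]
    return [img[y:y + size, x:x + size] for y, x in tops]

Each ROI is then correlated, per color channel, against the camera's corresponding reference pattern to fill the 36-dimensional vector.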

For device linking, we collect feature vectors from the positive class of interest and from the known negative classes. In this case, a positive class of interest consists of actual examples of pairs of images coming from the same camera. A negative class consists of examples of pairs of images coming from different cameras. Since we do not have access to any physical device, we need to check if two images originate from the same camera based solely on their content and properties. We extract a feature vector for a pair of images using a 2-D correlation function (Eq. (6)) between the correlated noises of each ROI (c.f., Section 3.1) for a given pair of images I_1 and I_2.

corr = \frac{\sum_{i,j} (R_{I_1}[i,j] - \bar{R}_{I_1})(R_{I_2}[i,j] - \bar{R}_{I_2})}{\sqrt{\sum_{i,j} (R_{I_1}[i,j] - \bar{R}_{I_1})^2} \cdot \sqrt{\sum_{i,j} (R_{I_2}[i,j] - \bar{R}_{I_2})^2}}.   (6)

We considered the color channels R, G, and B in this task because experiments showed that using Y did not provide significant improvement for device linking. Thus, we end up with 27 features in total.
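Reusing the helper sketches above, the pairwise descriptor could be assembled as follows; extract_rois(), residual_noise() and correlation() are the illustrative helpers defined earlier, not the authors' code.

# Sketch of the 27-d device-linking descriptor: one correlation (Eq. 6)
# per ROI (9 of them) and per color channel (R, G, B).
import numpy as np

def pair_features(img1, img2):
    feats = []
    for roi1, roi2 in zip(extract_rois(img1), extract_rois(img2)):
        for c in range(3):  # R, G, B planes of each ROI
            r1 = residual_noise(roi1[..., c])
            r2 = residual_noise(roi2[..., c])
            feats.append(correlation(r1, r2))
    return np.array(feats)  # 27 features for the image pair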

3.2. Open set classification

Open set recognition is more than just setting a decision threshold (Lukáš et al., 2006; Li, 2010; Goljan and Fridrich, 2007). Our approach begins by learning a classifier from the training set consisting of the positive samples and the available negative samples. Given training data (x_i, y_i) for i = 1 ... N, with x_i \in R^d and y_i \in \{-1, 1\}, a classifier f is learned such that

f(x_i) = \begin{cases} \ge 0, & y_i = +1 \\ < 0, & y_i = -1. \end{cases}   (7)

Let X be a training data matrix in which the ith row of X denotes the row vector x_i. Consider that the positive training class consists of feature vectors P = \{x_1^+, x_2^+, ..., x_{n^+}^+\} and the negative class (which in itself can contain multiple negative classes) consists of N = \{x_1^-, x_2^-, ..., x_{n^-}^-\}, where N = n^+ + n^- is the total number of training examples.

We can find a maximum margin separation hyperplane \vec{w} \cdot x + b = 0 (linear case) or \vec{w} \cdot \phi(x) + b = 0 (non-linear case) by means of the classical support vector machine classification algorithm (Bishop, 2006; Cortes and Vapnik, 1995). The objective of SVM is to find a classifier that is able to separate the data points from P and N, where \vec{w} is the normal to the hyperplane, b is the bias of the hyperplane such that |b|/\|\vec{w}\| is the perpendicular distance from the origin to the hyperplane, and \phi is a mapping function from the original feature space to a higher dimensional space by means of the kernel trick (Bishop, 2006). After finding a maximum margin separation hyperplane (the solution f) from the training data points X, we have a situation where we have one class of interest as the positive data (e.g., consisting of data points from a suspicious camera) and one or more classes as the negative data (e.g., consisting of data points from other known cameras). According to this model, each data point x_i during training is at a distance d_i to the decision boundary given the SVM solution and can be classified as class +1 if \vec{w} \cdot x_i + b \ge 0 or as -1, otherwise.

SVM uses structural risk minimization³ (Bishop, 2006), allowing it to minimize the risk of misclassification based only on what it knows from the training data. In the open set case, however, many more classes can appear that are part of the overall negative class (e.g., other cameras in the world), which could adversely affect the operation of the classifier during testing. Therefore, given the sensor attribution or device linking classification problem, our objective in the open set scenario is to minimize the risk of the unknown by maximizing the training classification accuracy while minimizing the false positive rate (false matches of other cameras to the camera of interest). We achieve that by solving the following optimization problem:

\min \left( \frac{1}{A(X)} \right),   (8)

where A(X) is the normalized training accuracy given by

A(X) = \frac{1}{2} \left( \frac{\sum_{i=1}^{n^+} h(x_i^+)}{n^+} + \frac{\sum_{j=1}^{n^-} \omega(x_j^-)}{n^-} \right)   (9)

such that

h(x_i^+) = \begin{cases} 1, & \text{if } f(x_i^+) \ge \epsilon \\ 0, & \text{otherwise,} \end{cases}   (10)

\omega(x_j^-) = \begin{cases} 1, & \text{if } f(x_j^-) < \epsilon \\ 0, & \text{otherwise.} \end{cases}   (11)

In summary, Eq. (9) means that we analyze the classification values of all training samples to find the classification accuracy A(X).

With the calculated hyperplane in the initial training step, which represents the best SVM can do based on what it knows during training time, we move the decision hyperplane by a value ε inwards towards the positive class or outwards in the direction of the negative known class(es) in order to account for unknown classes and to minimize future false positive matches. By changing the hyperplane position, we can be more strict about what we know to be positive examples and therefore classify any other data point that is ‘‘too different’’ as negative, or we can be less strict about what we know with respect to the positive class and accept more distant data points as possible positive ones.

We consider ε to move in the interval given by the most positive example (farthest from the decision hyperplane in the positive direction) and the most negative example (farthest from the decision hyperplane in the negative direction). For simplification, we might constrain the interval, as we do in this paper, to be tighter such as ε ∈ [-1, 1], so as not to drastically change the initial hyperplane found by the SVM. The ε value represents a movement of the decision hyperplane \vec{w} \cdot x + b + \epsilon = 0 (the linear case) or \vec{w} \cdot \phi(x) + b + \epsilon = 0 (the non-linear case considered herein). We loosely call this process Decision Boundary Carving (DBC). An exhaustive search to minimize the training data error defines the value of ε, which we accomplish by solving Eq. (8). Given any data point z for testing, it is classified as a positive example if f(z) ≥ ε. Fig. 4 depicts the DBC process and Appendix A outlines a high-level algorithm for DBC.

³ Structural risk minimization refers to the inductive principle for model selection used for learning from finite training data sets and solving the problem of finding the maximum margin separation hyperplane.

Fig. 4. Source camera attribution and device linking using Decision Boundary Carving (DBC). (a) Calculated separation hyperplane, considering the orange and green data points as the known positive and known negative classes, respectively, and the white data points as the unknown classes. The bluish region represents the distance between the margins of the positive and negative support vectors. (b) DBC over the calculated hyperplane, represented by the gray region. The process of carving the decision boundary seeks to minimize the risk of the unknown by minimizing the data error, which is implemented as the normalized accuracy during training (1/A(X)). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
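As a concrete illustration, the following sketch performs the ε search on top of a trained SVM; scikit-learn stands in for the authors' LibSVM setup, and the grid resolution is an arbitrary choice.

# Sketch of decision boundary carving: shift the SVM decision threshold
# by the epsilon in [-1, 1] that maximizes A(X) (Eqs. 8-11).
import numpy as np
from sklearn.svm import SVC

def carve(clf, X, y, grid=np.linspace(-1.0, 1.0, 201)):
    """Return the epsilon maximizing normalized training accuracy; y in {-1, +1}."""
    f = clf.decision_function(X)             # signed scores f(x_i)
    best_eps, best_acc = 0.0, -1.0
    for eps in grid:
        acc_pos = np.mean(f[y == 1] >= eps)  # h(x+), Eq. (10)
        acc_neg = np.mean(f[y == -1] < eps)  # w(x-), Eq. (11)
        acc = 0.5 * (acc_pos + acc_neg)      # A(X), Eq. (9)
        if acc > best_acc:
            best_eps, best_acc = eps, acc
    return best_eps

# Usage: clf = SVC(kernel="rbf").fit(X, y); eps = carve(clf, X, y);
# a test point z is labeled positive iff clf.decision_function([z]) >= eps.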

3.3. Open set camera attribution and device linking

We can train a classifier for the open set camera attribution problem in a straightforward manner from the methodology just defined. The steps are: (1) collecting feature vectors according to Section 3.1 from the positive class of interest and from the known negative classes; (2) training a classifier; and (3) finally performing decision boundary carving for finding the most suitable position for the decision hyperplane according to Section 3.2. For solving the device linking problem while accounting for its open set nature, all we need to do is adapt the feature extraction procedure as explained in Section 3.1. Similar to the camera attribution problem, the final step consists of training a two-class SVM classifier and moving the decision hyperplane via DBC so as to avoid/diminish false matches.

4. Experiments and results

In this section, we present the experiments performed to validate the proposed method as well as to compare it with state-of-the-art solutions considering the image source attribution and device linking problems. We divide the experiments into two partitions according to the problems of interest.

4.1. Data sets and experimental protocol

We consider a master data pool of 13,210 unique images coming from 400 different digital cameras, with a large portion of the images consisting of unconstrained photos we downloaded from the web. As camera attribution and device linking are conceptually different, we organized this set of images into two datasets⁴:

• Dataset A consists of 13,210 images from 400 different cameras. We have physical access to 25 of the cameras and the other 375 cameras are represented by images collected in the wild (Flickr) to simulate what happens in a real world investigation when seizing devices and images. We further organize Dataset A into datasets A25 and AF. A25 includes images whose source is within the set of 25 cameras we have physical access to. AF includes images from Flickr and represents a large number of source cameras. Table 1 provides a comprehensive listing of camera details for the A25 dataset⁵. A25 contains 4,411 images with an average of 150 images per camera, while AF contains 8,799 images in total with a varied number of images per camera (reflecting the unconstrained acquisition).
In the camera attribution experiments, we analyze a scenario in which a ‘‘search, seize, and capture’’ has occurred and a set of cameras and suspicious images were apprehended. The question is then to check whether any of the images come from any of the cameras or from another device not in the apprehended set of cameras. This is an open set attribution problem because the apprehended images may or may not have come from one of the apprehended cameras. However, any solution dealing with such a problem needs to be trained using only the apprehended cameras.
For that, we analyze the open set image source attribution problem considering access to sets of 15, 10, 5 and 2 suspect cameras from the A25 set. However, we emphasize that in testing, the images can be generated by any of the cameras represented in A25 and AF. For a better analysis of the training variation, we divide the images from cameras 1–15 from A25 into five groups and perform a 5-fold cross validation during training. The cross-validation for set A25 is intended to analyze how the classifier learning and the decision boundary calculation is affected by variation in the training sets. In each round of the cross-validation, we train the algorithms with four folds and use the fifth one for testing. In addition, the testing set is always complemented by images from cameras 16–25 in A25 as well as by images from all cameras in AF. During training, we never have access to cameras 16–25, nor the cameras that generated the images in AF.

• Dataset L consists of 25,000 pairs of images from two cameras with both in A25, or one camera in A25 and another in AF. We consider 5,000 matched pairs and 5,000 non-matched pairs from cameras 1–15 in A25 (Set L1), 5,000 matched pairs and 5,000 non-matched pairs from cameras 16–25 in A25 (Set L2), and 5,000 non-matched pairs from one camera in the set of cameras 16–25 from A25 and another camera in AF (Set L3). The algorithms are always trained with half of the matched pairs and non-matched pairs in L1 and tested with the other half of L1 and all pairs of images in L2 and L3. The process is repeated once, switching the training half. The L2 and L3 testing represent the open set validation partition, since the matched pairs and non-matched pairs of images therein come from cameras never used during the training phase of the algorithms.

⁴ All the feature vectors for both datasets as well as the list of used images will be freely available upon acceptance at http://www.ic.unicamp.br/rocha/pub/communications.
⁵ The list of all cameras from Flickr is included as supplementary material.

Table 1
Cameras used for all experiments. Images or matched pairs/non-matched pairs of images from cameras 1–15 are the only ones that can be used for training in both the camera attribution and device linking problems. Images or matched pairs/non-matched pairs of images from cameras 16–25 and from cameras in Flickr are always used for testing, and represent the ‘‘open set’’.

Camera                            Native resolution   Camera                                Native resolution
1  Canon PowerShot SX1-LS         3840 × 2160         14 Nikon D40                          3008 × 2000
2  Kodak EasyShare c743           3072 × 2304         15 Olympus SP570UZ                    3968 × 2976
3  Sony Cybershot DSC-H55         4320 × 3240         16 Panasonic Lumix DMC-FZ35           4000 × 3000
4  Sony Cybershot DSC-S730        2592 × 1944         17 Sony Alpha DSLRA 500L              4272 × 2848
5  Sony Cybershot DSC-W50         2816 × 2112         18 Olympus Camedia D395               2048 × 1536
6  Sony Cybershot DSC-W125        3072 × 2304         19 Sony Cybershot DSC-W120            3072 × 2304
7  Samsung Omnia                  2560 × 1920         20 Nikon Coolpix S8100                4000 × 3000
8  Apple iPhone 4 (1)             2592 × 1936         21 Sony Cybershot DSC-W330            4320 × 3240
9  Kodak EasyShare M340           3664 × 2748         22 Apple iPhone 4 (2)                 2592 × 1936
10 Sony Cybershot DSC-H20         3648 × 2736         23 Canon Powershot A520               1600 × 1200
11 HP PhotoSmart R727             2048 × 2144         24 Apple iPhone 3                     1600 × 1200
12 Canon EOS 50d                  4752 × 3168         25 Samsung Star                       2048 × 1536
13 Kodak EasyShare Z981           4288 × 3216         Flickr: 375 different cameras (brands and models), various resolutions

Table 2
Results (AccF ± σ, in percentage) for 15, 10, 5, and 2 available cameras during training. An open set with 15 out of 400 cameras consists of training on 15 cameras but testing on images from any of the cameras already seen in training and also from either any of the cameras in the set 16–25 never used in training or any camera in AF (the wild). The number of testing examples changes for each scenario. For instance, for each positive testing camera (cameras 1–15 in A25), there are 1 × 30 positives and 14 × 30 negatives from the known classes, 10 × 30 negatives from cameras 16–25 in A25 and 8,799 negatives from AF, for a total of 30 positives and 9,519 negatives. Best results are highlighted in bold.

                                     Open set cameras – Results in %
                                     15             10             5              2
Lukáš et al. (2006)                  95.08 ± 2.40   94.70 ± 2.46   95.04 ± 2.34   94.46 ± 2.69
Li (2010)                            94.62 ± 2.56   94.06 ± 2.67   94.50 ± 2.49   93.23 ± 2.84
TC-SVM – T1 (only central ROI)       90.95 ± 3.14   91.17 ± 2.62   93.65 ± 2.63   94.29 ± 2.81
TC-SVM – T2 (central ROI + DBC)      95.95 ± 1.70   95.35 ± 1.95   95.84 ± 1.63   94.89 ± 2.01
TC-SVM – T3 (all ROIs without DBC)   95.75 ± 1.64   95.69 ± 1.83   96.88 ± 1.48   96.56 ± 1.65
TC-SVM – T4 (all ROIs + DBC)         97.18 ± 1.63   96.80 ± 1.63   97.34 ± 1.16   96.49 ± 1.60

For SVM classification in all experiments, we use the LibSVM library (Chang and Lin, 2011). We consider only the non-linear case herein with a radial basis function (RBF) kernel:

RBF(x_i^+, x_j^-) = \exp(-\gamma \cdot \|x_i^+ - x_j^-\|^2)   (12)

We find the best separation hyperplane during training through a grid search by changing the RBF kernel parameter γ (which represents the boundary smoothness between positive and negative samples) and the cost of misclassification, considering just the known samples (positive and known negative classes at training).
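A sketch of such a grid search using scikit-learn is shown below; the parameter grids are common defaults assumed for illustration, not the values used in the paper.

# Sketch of the training-time grid search over the RBF gamma (Eq. 12)
# and the misclassification cost C, using only the known samples.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "C": [2.0 ** k for k in range(-5, 16, 2)],
    "gamma": [2.0 ** k for k in range(-15, 4, 2)],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
# search.fit(X_train, y_train); clf = search.best_estimator_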

4.2. Image source attribution

To measure the effectiveness of the image source attribution analysis, we calculate the accuracy (in %) for each camera considering the relative classification accuracy Acc_R according to

Acc_R = \frac{Acc^+ + Acc^-}{2},   (13)

which is the average of the percentage of correct classifications during testing for positive (Acc^+) and negative (Acc^-) images for a given camera. The average accuracy Acc_M for each camera is calculated as

Acc_M = \frac{1}{z} \sum_{i=1}^{z} Acc_R^i,   (14)

where z = 5 divisions of the 5-fold cross-validation protocol used for training with the A25 set. The results we report correspond to the final accuracy Acc_F, calculated as the average over all cameras

Acc_F = \frac{1}{N_C} \sum_{i=1}^{N_C} Acc_M^i,   (15)

where N_C is the number of available cameras during training.

When we consider access to cameras 1–15, it means we train with cameras 1–15 as suspect cameras but the images under investigation can come from any of the 25 cameras in A25 as well as from any of the 375 cameras in AF. For the case with 10 known cameras during training, we performed two experiments (access to cameras 1–10 and separately access to cameras 6–15). For the experiments with five cameras, we considered three different combinations of five cameras (1–5, 6–10, 11–15). For the experiments with two available cameras, we considered seven different combinations (1–2, 3–4, and so forth).
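The reported numbers then reduce to nested averages, as in this small sketch of Eqs. (13)-(15); the array shapes are assumptions for illustration.

# Sketch of Eqs. (13)-(15): per-camera balanced accuracy, averaged over
# the z = 5 folds and then over the N_C available cameras.
import numpy as np

def final_accuracy(acc_pos, acc_neg):
    """acc_pos, acc_neg: arrays of shape (n_cameras, n_folds), in %."""
    acc_r = 0.5 * (np.asarray(acc_pos) + np.asarray(acc_neg))  # Eq. (13)
    acc_m = acc_r.mean(axis=1)                                 # Eq. (14)
    return acc_m.mean()                                        # Eq. (15)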

We validate the proposed method in four ways. We refer to our approach considering only ROI #1 (see Fig. 3) as T1, with ROI #1 plus the open set decision boundary carving (DBC) solution as T2, our approach considering all ROIs without DBC as T3, and the complete solution with all regions plus DBC as T4. For each case, the result is the average of the results for tests considering each combination of cameras. Table 2 shows the comparison of the proposed methods to Lukáš et al.'s (Lukáš et al., 2006) and Li's (Li, 2010) approaches in an open set scenario.

Specifically, for the case with 15 known suspect cameras during training and 400 cameras in testing, we can see that the open set feature characterization (T3) with no decision boundary carving slightly improves the classification accuracy when compared to the best baseline (Lukáš et al., 2006). T3 reduces the classification error by 13%⁶. When we add the decision boundary carving to further deal with the open set nature of the problem, the classification error is reduced by 42.7%. These results show that it is possible to reliably identify image sources in an open set scenario even with hundreds of unknown cameras.

The results also indicate that the DBC implementation does not help in the case of only two suspect cameras when considering all ROIs. However, we still see the improvement in results when we consider more ROIs for the identification, which makes the case for the open set feature treatment we devised in Section 3.1. The approach proposed by Li (2010) does not statistically improve the classification results of Lukáš et al. (2006) (considering the dataset and the open set evaluation scenario used in this work). We used the Wilcoxon Sign Rank (Wilcoxon, 1999) at the 99% confidence interval for statistical significance tests. Any statistical difference between the methods devised by Lukáš et al. (2006) and Li (2010) and our approaches (T3 and T4) is shown in Table 3.

Table 3
Significance tests between our solutions and the ones proposed by Lukáš et al. (2006) and Li (2010). A '•' means statistically significant differences; '–' marks a method compared against itself. Each column group reports the 15-, 10-, 5- and 2-camera setups.

Approach \ Number of cameras   Lukáš et al. (2006)   Li (2010)     TC-SVM – T3   TC-SVM + DBC – T4
                               15 10  5  2           15 10  5  2   15 10  5  2   15 10  5  2
Lukáš et al. (2006)            –  –  –  –            •  •          •  •  •  •    •  •  •  •
Li (2010)                      •  •                  –  –  –  –    •  •  •  •    •  •  •  •
TC-SVM – T3                    •  •  •  •            •  •  •  •    –  –  –  –    •  •  •  •
TC-SVM + DBC – T4              •  •  •  •            •  •  •  •    •  •  •  •    –  –  –  –

⁶ The error reduction is calculated by dividing the error of the proposed method by the error of the baseline and taking the complement. For instance, suppose a proposed method A has an error of 10% and the baseline B has an error of 50%. In this case, we can say A reduced the classification error compared to B by 1 − (0.1/0.5) = 80%.

Table 4 contains a breakdown for the case with known cameras 1–15 for training and 400 for testing (385 unknown). It shows the true positive rate as well as the true negative rate, with results in X% ± σ (standard deviation) together with the raw numbers, considering the average of a 5-fold cross validation protocol. Observe that the proposed method demonstrates higher performance than the approaches of Lukáš et al. (2006) and Li (2010): we reduce the risk of the unknown considerably. This is reflected in the high number of true negatives (and consequently low false positives) with a very low standard deviation and an increase in the true positives.

Table 4
Breakdown for the open set setup with cameras 1–15 from A25 for training and 400 cameras for testing from A25 and AF.

      Lukáš et al. (2006)   Li (2010)        TC-SVM + DBC – T4
TP    92.59% ± 5.15         91.62% ± 5.70    95.81% ± 3.46
      (27.77/30)            (27.48/30)       (28.74/30)
TN    97.56% ± 1.77         97.62% ± 1.55    98.54% ± 0.53
      (9,277/9,519)         (9,283/9,519)    (9,370/9,519)

4.3. Device linking

For device linking, we present the proposed methods without DBC (T3) and with DBC (T4) and compare them to the state-of-the-art method proposed by Goljan and Fridrich (2007) (Goljan Baseline). We also compared our methods to an extension of Goljan and Fridrich (2007) (Goljan ML Extended) in order to place it on a common machine learning basis with our approach. We carried out the experiments considering the original and the enhanced residual noise (Li, 2010) of the images.

The baseline proposed by Goljan and Fridrich (2007) essentially fails in this open set validation scenario. Regarding this method, a region of size 512 × 512 pixels around the center of each image was considered, following the original paper. We carried out two experiments in this case: in the first experiment, we examined how effective the threshold found by the authors was for the dataset used herein. In the second experiment, we calculated the best threshold considering the PSR values for our training set (images from the L1 set in dataset L), and assessed that value using our test set (containing images from L1, L2 and L3). We found accuracies of only about 50%. A possible reason for such results is that the authors perform zero-padding aiming at matching the resolution of the images, and we just considered the central region of the images. As the images have different native resolutions in our dataset, we could not strictly reproduce their validation scenario. Therefore, we extended their method to automatically find the threshold using the same classifier we use in this article: SVM. For that, the classifier is given as input a 3-d feature vector composed of the correlation values between the pair of images for each color channel. The performance improvement is remarkable: going from the original chance baseline to 75.6% (see the Goljan ML Extended method in Table 5).

Nonetheless, the proposed methods T3 and T4 yield an averageaccuracy of 87.4% for correctly classifying a random image pair as

Page 9: Open set source camera attribution and device linking

Table 5Summary results for device linking obtained with a 2-fold cross-validation protocol.

Exp. ID Info. Used Acc. Std.Dev.

Goljan Baseline (Goljan and Fridrich (2007)) Original PRNU 51% 2.63Goljan Baseline (Goljan and Fridrich (2007)) Enhanced PRNU 51.1% 2.14Goljan ML Extended Original PRNU 76.3 0.98Goljan ML Extended Enhanced PRNU 75.6% 0.97Proposed T3 Original PRNU 86.7 0.81Proposed T3 Enhanced PRNU 87.4% 1.62Proposed T4 Original PRNU 86.5 1.31Proposed T4 Enhanced PRNU 87.4% 2.51


This is an improvement of 11.7% over the Goljan ML Extended method and represents a reduction of 52% in the classification error. A Wilcoxon Sign Rank test (Wilcoxon, 1999) at the 99% confidence level shows that the Goljan ML Extended method is statistically better than the baseline, but both methods are statistically worse than T3 and T4. In addition, we can see that T3 and T4 perform very similarly, which means that the multi-region feature characterization was enough for open set device linking; the DBC procedure does not play a major role here. Table 5 summarizes the results of all methods.
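
For reference, the significance test used throughout this section can be run with SciPy as follows; the paired accuracies below are placeholders, not our measurements.

    from scipy.stats import wilcoxon

    # Paired per-round accuracies for two methods (placeholder values).
    acc_t3  = [0.868, 0.875, 0.861, 0.879, 0.872,
               0.870, 0.866, 0.881, 0.873, 0.869]
    acc_ext = [0.751, 0.760, 0.748, 0.762, 0.757,
               0.753, 0.749, 0.764, 0.758, 0.755]

    stat, p = wilcoxon(acc_t3, acc_ext)
    print("W = %s, p = %.4f" % (stat, p))  # reject equality at 1% if p < 0.01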

5. Discussion

Understanding the image source attribution problem in an open set context is an important step towards solving a real world problem. It is not difficult to imagine a situation where an investigator needs to answer the question of whether an apprehended photograph belongs to one out of a possible set of known cameras or to some other unknown camera. Just as important as open set camera attribution, the open set device linking problem presents the investigator with the task of determining if two images were generated by the same camera. Our results for both problems were encouraging, with remarkable reductions in error rates despite the presence of a large open set of unconstrained images from the wild. However, the method we propose is just a first contribution towards a comprehensive solution to this very challenging and poorly understood general problem of open set classification.

As with almost any work, ours can certainly be improved upon. We can extend this research to help combat counter-forensic approaches, as discussed by Goljan et al. (2011) and Gloe et al. (2007). Applying to other pattern recognition and computer vision problems the core idea of the proposed feature characterization technique, which is tailored to expanding the amount of information available to define a projection space for a unique class of interest, as well as the principles of the decision boundary carving technique, also looks promising.

Finally, it is important to bear in mind that the solutions presented herein are applicable to myriad pattern recognition and vision problems operating in the open set mode. According to Duin and Pekalska (2005), the way we approach a problem that contains classes that are ill-sampled, not sampled at all, or undefined is definitely an open issue. Therefore, we envision our open set solutions will represent possible starting points for approaches to other problems such as face recognition and verification, object recognition, and image categorization, all longstanding areas of prime importance in pattern recognition and computer vision.

Acknowledgments

We acknowledge the financial support of FAPESP (Grant #2010/05647-4), CNPq (Grant #304352/2012-8), Microsoft and Samsung. Part of the results presented in this paper were obtained through the project ‘‘Unicamp/Samsung’’, sponsored by Samsung Electronics da Amazônia Ltda., in the framework of law No. 8,248/91.

Appendix A. Decision boundary carving algorithm

Algorithm 1 shows the pseudo-code for the decision boundary carving process described in this paper. The refinement process in which we further separate the training data into the actual training set (for finding the SVM parameters w and b) and a validation set (for optimizing the open set threshold ε) is not shown for simplicity.

Algorithm 1. Decision boundary carving.

1:  Input: P, N                     ▷ Sets of elements of the positive class of interest and of the known negative class(es), respectively
2:  Output: Decision hyperplane parameters w and b, and the open set threshold ε
3:
4:  (w, b) ← SVM-Training(P, N)     ▷ Calculating the SVM hyperplane parameters
5:  C ← Classification(w, b, P, N)  ▷ Obtaining the decision scores
6:  min ← lowest-decision-score(C)
7:  max ← highest-decision-score(C)
8:  Δ ← +∞                          ▷ Setting the initial data error to a maximum value
9:  for ε′ ← min to max do          ▷ ε′ spans possible scores in C (increments of 10⁻⁴ herein)
10:   (A+, A−) ← 0
11:   for all x+ ∈ P do
12:     A+ ← A+ + h(x+, ε′)         ▷ True positives for this position of the hyperplane (Eq. (10))
13:   end for
14:   for all x− ∈ N do
15:     A− ← A− + ω(x−, ε′)         ▷ True negatives for this position of the hyperplane (Eq. (11))
16:   end for
17:   AN ← (1/2)(A+/|P| + A−/|N|)   ▷ Normalized averaged accuracy (Eq. (9))
18:   Δ′ ← 1 − AN
19:   if Δ′ < Δ then
20:     Δ ← Δ′
21:     ε ← ε′
22:   end if
23: end for
24: return (w, b, ε)



References

Bishop, C.M., 2006. Pattern Recognition and Machine Learning, 1st ed. Springer.
Caldelli, R., Amerini, I., Picchioni, F., Innocenti, M., 2010. Fast image clustering of unknown source images. In: IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1–5.
Caldelli, R., Amerini, I., Novi, A., 2011. An analysis on attacker actions in fingerprint-copy attack in source camera identification. In: IEEE International Workshop on Information Forensics and Security (WIFS), Foz do Iguaçu, pp. 1–6.
Chang, C., Lin, C., 2011. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2 (3), 27:1–27:27.
Chen, M., Fridrich, J., Goljan, M., Lukáš, J., 2008. Determining image origin and integrity using sensor noise. IEEE Transactions on Information Forensics and Security (TIFS) 3 (1), 74–90.
Chiang, P., Khanna, N., Mikkilineni, A.K., Segovia, M.V.O., Suh, S., Allebach, J.P., Chiu, G.T.C., Delp, E.J., 2009. Printer and scanner forensics. IEEE Signal Processing Magazine 26 (2), 72–83.
Cortes, C., Vapnik, V., 1995. Support-vector networks. Machine Learning 20 (3), 273–297.
de Oliveira Costa, F., Eckmann, M., Scheirer, W.J., Rocha, A., 2012. Open set source camera attribution. In: SIBGRAPI Conference on Graphics, Patterns and Images, pp. 1–8.
Dirik, A.E., Sencar, H.T., Memon, N., 2008. Digital single lens reflex camera identification from traces of sensor dust. IEEE Transactions on Information Forensics and Security (TIFS) 3 (3), 539–552.
Duin, R.P.W., Pekalska, E., 2005. Open issues in pattern recognition. In: International Conference on Computer Recognition Systems (CORES), pp. 27–42.
Geradts, Z.J., Bijhold, J., Kieft, M., Kurosawa, K., Kuroki, K., Saitoh, N., 2001. Methods for identification of images acquired with digital cameras. Enabling Technologies for Law Enforcement and Security 4232, 505–512.
Gloe, T., Kirchner, M., Winkler, A., Böhme, R., 2007. Can we trust digital image forensics? In: ACM Multimedia, pp. 78–86.
Goldman, D.B., Chen, J.H., 2005. Vignette and exposure calibration and compensation. In: International Conference on Computer Vision (ICCV), pp. 899–906.
Goljan, M., Fridrich, J., 2007. Identifying common source digital camera from image pairs. In: IEEE International Conference on Image Processing (ICIP), pp. 14–19.
Goljan, M., Fridrich, J., Filler, T., 2009. Large scale test of sensor fingerprint camera identification. Proceedings of SPIE 7254, 72540I-1–72540I-12.
Goljan, M., Fridrich, J., Chen, M., 2011. Defending against fingerprint-copy attack in sensor-based camera identification. IEEE Transactions on Information Forensics and Security (TIFS) 6 (1), 227–236.
Kee, E., Farid, H., 2008. Printer profiling for forensics and ballistics. ACM Workshop on Multimedia and Security 10, 3–10.
Khanna, N., Mikkilineni, A.K., Delp, E.J., 2009. Scanner identification using feature-based processing and analysis. IEEE Transactions on Information Forensics and Security (TIFS) 4 (1), 123–139.
Khanna, N., Mikkilineni, A.K., Chiu, G.T.C., Allebach, J.P., Delp, E.J., 2007. Scanner identification using sensor pattern noise. SPIE Security, Steganography and Watermarking of Multimedia Contents (SSWMC) 6505, 1K–1L.
Kharrazi, M., Sencar, H., Memon, N., 2004. Blind source camera identification. In: IEEE International Conference on Image Processing (ICIP), Singapore, pp. 709–712.
Kurosawa, K., Kuroki, K., Saitoh, N., 1999. CCD fingerprint method – identification of a video camera from videotaped images. In: IEEE International Conference on Image Processing (ICIP), pp. 537–540.
Li, D., 2002. Ballistics projectile image analysis for firearm identification. IEEE Transactions on Image Processing (TIP) 15 (10), 2857–2865.
Li, C.-T., 2010. Source camera identification using enhanced sensor pattern noise. IEEE Transactions on Information Forensics and Security (TIFS) 5 (2), 280–287.
Li, C.-T., 2010. Unsupervised classification of digital images using enhanced sensor pattern noise. In: IEEE International Symposium on Circuits and Systems (ISCAS), pp. 3429–3432.
Li, C.-T., Satta, R., 2011. On the location-dependent quality of the sensor pattern noise and its implication in multimedia forensics. In: IV International Conference on Imaging for Crime Detection and Prevention (ICDP), pp. 1–6.
Lukáš, J., Fridrich, J., Goljan, M., 2006. Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security (TIFS) 1 (2), 205–214.
Phillips, P., Grother, P., Micheals, R., 2005. Evaluation methods in face recognition. In: Handbook of Face Recognition. Springer, pp. 329–348 (Ch.).
Popescu, A., Farid, H., 2005. Exposing digital forgeries in color filter array interpolated images. IEEE Transactions on Signal Processing (TSP) 53 (10), 3948–3959.
Rocha, A., Scheirer, W., Boult, T.E., Goldenstein, S., 2011. Vision of the unseen: current trends and challenges in digital image and video forensics. ACM Computing Surveys (CSUR) 42 (26), 26:1–26:42.
Scheirer, W., Rocha, A., Sapkota, A., Boult, T.E., 2013. Towards open set recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 35 (7), 1757–1772.
Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J., Williamson, R.C., 2001. Estimating the support of a high-dimensional distribution. Neural Computation 13 (7), 1443–1471.
Sutcu, Y., Bayram, S., Sencar, H., Memon, N., 2007. Improvements on sensor noise based source camera identification. In: IEEE International Conference on Multimedia and Expo, pp. 24–27.
Swaminathan, A., Wu, M., Liu, K., 2009. Component forensics – theory, methodologies and applications. IEEE Signal Processing Magazine 26 (2), 38–48.
Wang, X., Weng, Z., 2000. Scene abrupt change detection. In: Canadian Conference on Electrical and Computer Engineering, pp. 880–883.
Wang, B., Kong, X., You, X., 2009. Source camera identification using support vector machines. In: Advances in Digital Forensics V. IFIP Advances in Information and Communication Technology, vol. 306. Springer, Boston, pp. 107–118.
Wilcoxon, F., 1999. Individual comparisons by ranking methods. Biometrics Bulletin 1 (6), 80–83.
Zhou, X., Huang, T., 2003. Relevance feedback in image retrieval: a comprehensive review. Multimedia Systems 8 (6), 536–544.