



SVD BASED IMAGE MANIPULATION DETECTION

Gokhan GUL1, Ismail AVCIBAS2, Fatih KURUGOLLU3

1. Interdisciplinary Centre for Security, Reliability and Trust, Luxembourg University, Luxembourg. ([email protected])

2. Electrical and Electronics Engineering Department, Baskent University, Turkey. ([email protected])

3. Sch. of Electronics, Electrical Eng. and Computer Sc., Queen's University, Belfast, UK. ([email protected])

ABSTRACT

In this paper we present a novel method based on singular value decomposition (SVD) for the forensic analysis of digital images. We show that image tampering distorts the linear dependencies of image rows/columns and that the derived features are accurate enough to detect image manipulations and digital forgeries. Extensive experiments show that the proposed approach can outperform its counterparts in the literature.

Index Terms — Digital Forensics, Image Authentication, Singular Value Decomposition, Classification.

1. INTRODUCTION

Over the last decades, research on image forensics has emerged as an active field due to the availability of sophisticated image editing tools and the lack of methodologies for validating the authenticity of digital images. With easy-to-use image-processing tools, non-existent scenes can be created and presented as evidence in a court of law. This diminishes the credibility of photographs as definitive proof of events. Digital image forensics, in this context, is concerned with ascertaining the source and the potential authenticity of an image. An image can be captured from the real world by a digital camera or a scanner, or it can be synthetically created by computer graphics (CG) software. Furthermore, an image can be authentic or manipulated.

Several techniques have been proposed in the literature to detect various forms of digital tampering. In the case that an image is captured from the real world by a digital camera and the camera is available to the analyst, it might be possible to use camera fingerprints, related to its sensor noise, Color Filter Array (CFA) interpolation and dynamic range, to discover the tampered areas. However, this is a very challenging task, and it is possible that both the tampered parts and the background are captured by the same camera, as in copy-move forgery. In the presence of the camera (or a number of images taken by the camera), Farid et al. proposed a method based on the assumption that it is often difficult to exactly match the lighting conditions when creating a forgery from multiple images [1]. In the absence of the camera, fragile watermarks have been proposed for content authentication and tamper detection [2], [3]. A major drawback of this approach is that the watermark must be embedded before the tampering occurs, which limits its application to cameras equipped with a watermarking module. Moreover, it is well known that the overwhelming majority of images circulating in the media or on the internet do not contain any watermark.

Besides the aforementioned techniques, which rely on a priori information about the availability of the camera or the integration of a watermarking process into the camera, there have been a number of approaches that do not need any a priori information. One observation is that CFA interpolation and/or the nature of re-sampling introduce pixel-wise correlations [4], [5]. In both cases, it is assumed that the existing correlations would be destroyed by any alteration. Later, Lukas et al. proposed a method to detect and localize tampering by analyzing inconsistencies in the sensor pattern noise extracted from the camera [6]. Another approach exploits the fact that an original image segment and a pasted copy of it exhibit a strong correlation, which can be used to detect copy-move forgery [7]. Motivated by the de-correlation between tampered and original image segments, Sutcu et al. investigated the detection of sharpness/blurriness adjustment using the regularity properties of wavelet transform coefficients [8]. In [9] and [10], a different approach to the image forensics problem is devised. Their assumption is that, without image-processing operations such as rotation, scaling, brightness adjustment, etc., image tampering is evident to the human eye; however, applying these manipulations destroys the natural statistics of the images. Therefore, classifiers informed about the type and the degree of manipulation can validate the authenticity of an image under consideration, provided that powerful features are incorporated into the classification process.

In this paper, we propose a linear dependency based approach to detect the image manipulations that are typically applied to keep tampering imperceptible. The rest of the paper is organized as follows. Section 2 briefly introduces the SVD. In Section 3, the SVD based features are presented. In Section 4, controlled and uncontrolled experiments are performed for forensic analysis. Finally, conclusions are drawn in Section 5.

1765    978-1-4244-7994-8/10/$26.00 ©2010 IEEE    ICIP 2010

Proceedings of 2010 IEEE 17th International Conference on Image Processing September 26-29, 2010, Hong Kong


2. SINGULAR VALUE DECOMPOSITION

SVD is an extremely useful tool in linear algebra. It decomposes a matrix A ∈ R^(M×N) into the product of two orthonormal matrices U ∈ R^(M×M), V ∈ R^(N×N) and a diagonal matrix S ∈ R^(M×N) as follows:

A = U S V^T.    (1)

The diagonal elements of matrix S are non-negative and sorted in decreasing order, where M and N are the dimensions of A. These elements produce a vector called the singular value vector:

Sv = Diag(S) = (s_1, s_2, ..., s_min(M,N)).    (2)

Singular values of a matrix indicate the soft relationship between image rows/columns in terms of linear dependency. More precisely, singular values tend towards zero as the image rows and/or columns become relatively linearly dependent [11]. Two rows/columns, c1 and c2, of a matrix are called linearly dependent if they can be written as c2 = K·c1, where K is an integer. Accordingly, we define the relative linear dependency between two rows/columns as the closeness of K to an integer.
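The connection between linear dependency and vanishing singular values can be checked numerically. The following is an illustrative sketch (not from the paper) using NumPy's SVD: a block whose rows are exact multiples of one another has all but one singular value collapse to zero, while a slightly perturbed block keeps them merely close to zero.

```python
import numpy as np

rng = np.random.default_rng(0)

row = rng.random(8)
dependent = np.vstack([row, 2.0 * row, 3.0 * row])        # rank-1 block
perturbed = dependent + 1e-6 * rng.random(dependent.shape)

# Singular value vectors Sv, as in Eq. (2), sorted in decreasing order.
sv_dep = np.linalg.svd(dependent, compute_uv=False)
sv_per = np.linalg.svd(perturbed, compute_uv=False)

# All but the first singular value of the rank-1 block are numerically zero.
assert np.allclose(sv_dep[1:], 0.0, atol=1e-12)
# The perturbed block keeps them small but generally non-zero.
assert np.all(sv_per[1:] < 1e-5)
```

By Weyl's inequality, perturbing a matrix shifts each singular value by at most the spectral norm of the perturbation, which is why the trailing singular values of the perturbed block stay on the order of the noise magnitude.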

3. SVD BASED FEATURES

A good model of the relative linear dependency of image rows/columns can be accurate enough to identify image manipulations. It can be expected that natural images introduce certain types of linear dependencies and that these dependencies would be destroyed by any alteration. Since the spatial domain exhibits a strong dependency between pixels in local neighborhoods, the features are extracted from sub-blocks representing locality in the spatial domain rather than from the whole image; the whole image provides poor statistics and does not represent local dependency at all. Therefore, we adopt a strategy in which the features are extracted from image sub-blocks and then merged together to represent the whole image. We consider image sub-blocks of sizes w×w (w=3,4,…,20). Obviously, the larger the sub-block size, the smaller the number of blocks to be evaluated and thus the fewer inter-block dependencies are taken into account. To alleviate this problem, the blocks are overlapped proportionally to the block size, so that correlations both within and among the image blocks are taken into account. The image sub-blocks are overlapped according to the following overlapping rules:

If 3 ≤ w < 10, no overlapping
If 10 ≤ w < 16, 50% overlapping
If 16 ≤ w < 21, 75% overlapping

Two different types of features are derived using this sub-block scheme. We obtain each feature for a fixed sub-block size and a fixed index of singular values. Let B be the number of overlapped blocks according to the block size w. The first type of SVD based features is defined as follows:

f1_sv(w,i) = (1/B) Σ_B δ(Sv(i)),  i=2,3,…,w,  w=3,4,…,20    (3)

where δ(·) indicates the unit impulse function. The index i=1 is out of consideration since f1_sv(w,1) is non-zero if and only if all elements of A are zero. The second type of features is the mean of the normalized singular values at index i in Sv over the blocks of size w×w:

f2_sv(w,i) = (1/B) Σ_B Sv(i)/C(B),  i=1,2,…,w,  w=3,4,…,20    (4)

where C(B) indicates the sum of singular values of the related sub-block B. Note that type 1 and type 2 features have previously been used in [11] for steganalysis purposes.
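The block-extraction scheme and the two feature types of Eqs. (3)-(4) can be sketched as follows. The helper names, the step sizes derived from the overlapping rules, and the exact-zero test standing in for the unit impulse are our assumptions, not the authors' implementation (in practice a small threshold on Sv(i) may be needed, since floating-point SVD rarely returns exact zeros):

```python
import numpy as np

def overlap_step(w):
    """Block step size implied by the overlapping rules of Section 3 (assumed bounds)."""
    if w < 10:        # no overlapping
        return w
    elif w < 16:      # 50% overlapping
        return w // 2
    else:             # 75% overlapping
        return max(1, w // 4)

def blocks(img, w):
    """Yield the overlapped w-by-w sub-blocks of a grayscale image."""
    step = overlap_step(w)
    h, wd = img.shape
    for r in range(0, h - w + 1, step):
        for c in range(0, wd - w + 1, step):
            yield img[r:r + w, c:c + w]

def svd_features(img, w):
    """Return (f1, f2) for block size w; f1 covers i=2..w, f2 covers i=1..w."""
    svs = [np.linalg.svd(b.astype(float), compute_uv=False)
           for b in blocks(img, w)]
    # Eq. (3): fraction of blocks whose i-th singular value is zero
    # (exact-zero test used here for simplicity).
    f1 = np.array([np.mean([sv[i] == 0.0 for sv in svs])
                   for i in range(1, w)])
    # Eq. (4): mean of the i-th singular value normalized by C(B),
    # the block's sum of singular values.
    f2 = np.array([np.mean([sv[i] / sv.sum() for sv in svs])
                   for i in range(w)])
    return f1, f2

img = np.arange(64, dtype=float).reshape(8, 8)  # toy stand-in for an image
f1, f2 = svd_features(img, 4)
```

Because each block's normalized singular values sum to one, the entries of f2 across i=1..w also sum to one, which is a quick sanity check on an implementation.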

4. EXPERIMENTAL RESULTS

4.1. Image Set

In order to provide comparable results, the 200 natural images used in [9] and [10] are used in the experiments. These images are subjected to several manipulations at various degrees, as given in Table I. In addition, two different image sets are used to show that the designed classifiers are also capable of detecting natural images. The first image set, Gr, contains 1800 JPEG images with a quality factor (Q) of 100 [12]. The second image set is provided by the authors of [13] and contains a mixture of 800 images from the authors' collection, Pc, (Q=75) and 800 images obtained via the Google image search facility, Gs, (Q=various). We create a photomontage image set, Ps, containing 100 images. This set consists of two subsets, Ps1 and Ps2. The first subset, Ps1, is created by selecting 80 images from the image set used in [9], with the criterion that the images should not be evidently manipulated. The other subset, Ps2, comprises 20 images professionally manipulated by the authors of [10]. All images are converted to gray level before feature extraction.

TABLE I
SELECTED IMAGE MANIPULATION TYPES AND DEGREES

                     S1     S2     S3     S4     S5
Scale-up (%)          1      5     10     25     50
Scale-down (%)        1      5     10     25     50
Brightness            1      5     15     25     50
Contrast              1      5     15     25     50
Rotation (°)          1      5     15     30     45
Blurring (radius)    0.1    0.2    0.3    0.5    1.0
Sharpening           Photoshop default
Equalization         Photoshop default

4.2. Image Manipulation Detection

Manipulated images cannot be distinguished from genuine ones by the human eye if the forgery is successfully concealed with image manipulation tools. However, matching the conditions of an image to those of the original is not trivial; it usually requires some combination of individual image modifications such as scaling, rotation and the adjustment of lighting conditions. The increase in the strength and the number of alterations makes the use of statistical methods possible in the digital forensics context. Therefore, pre-trained classifiers may provide a convenient way of detecting digital forgeries. Three different classifier modes are considered in this work: clairvoyant, semi-blind and blind.

In the clairvoyant mode, the classifier is aware of the type of the manipulation as well as of its strength; for instance, it knows that an image is rotated by 1°. This kind of detector is not realistic, since neither the type nor the strength of an attack can be foreseen, but it gives some insight into the discrimination ability of the features. In the semi-blind mode, the classifier is aware of the type of the manipulation but not of its strength; for example, an image is up-sampled by an unknown degree. In order to train such classifiers, all degrees of a manipulation must be considered. The blind mode, on the other hand, is the most realistic mode. Here the classifier is aware of neither the type nor the strength of the attack, which simulates the case in which one downloads a possibly altered image from the internet. Compared to the semi-blind mode, there is only one classifier in the blind mode, so less effort is required for the forensic analysis; however, the detection performance of the blind classifier is generally worse than that of a specific semi-blind classifier.

Beginning with the clairvoyant mode, we proceed to the semi-blind and blind modes in the following experiments. In order to make the results comparable with [9], the same linear regression classifier is used. Half of the original and manipulated images are used for training and the rest for testing. The Sequential Forward Floating Search (SFFS) algorithm [14] is fed with the SVD based features. According to [15], the number of selected features should be less than 10% of the number of training images in order to claim that the classifier generalizes rather than memorizes; therefore, the number of selected features is upper-bounded by 20. The accuracy rates of the clairvoyant classifiers are listed in Table II. Except for the lower rates for brightness and contrast, the detection is reliable. For scale-up, scale-down and rotation, the average number of features selected by the classifier is 4, 4 and 7, respectively. This is promising, since the detection accuracy is perfect for these manipulations.
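The selection step can be illustrated with a plain greedy forward search in the spirit of SFFS [14] (the "floating" backward step is omitted for brevity). The nearest-class-mean classifier and the synthetic data below are our assumptions, used only to keep the sketch self-contained; the paper itself pairs SFFS with a linear regression classifier.

```python
import numpy as np

rng = np.random.default_rng(1)

def accuracy(Xtr, ytr, Xte, yte, idx):
    """Nearest-class-mean accuracy using only the feature columns in idx."""
    idx = list(idx)
    m0 = Xtr[ytr == 0][:, idx].mean(axis=0)
    m1 = Xtr[ytr == 1][:, idx].mean(axis=0)
    d0 = np.linalg.norm(Xte[:, idx] - m0, axis=1)
    d1 = np.linalg.norm(Xte[:, idx] - m1, axis=1)
    return np.mean((d1 < d0) == (yte == 1))

def forward_select(Xtr, ytr, Xte, yte, k):
    """Greedily add the feature that most improves accuracy, k times."""
    chosen = []
    for _ in range(k):
        best = max((f for f in range(Xtr.shape[1]) if f not in chosen),
                   key=lambda f: accuracy(Xtr, ytr, Xte, yte, chosen + [f]))
        chosen.append(best)
    return chosen

# Synthetic two-class data: feature 0 is discriminative, the rest are noise.
X = rng.normal(size=(200, 6))
y = (rng.random(200) < 0.5).astype(int)
X[:, 0] += 3.0 * y                      # class-dependent shift on feature 0
sel = forward_select(X[:100], y[:100], X[100:], y[100:], k=2)
```

With the discriminative feature shifted by three standard deviations, the search reliably picks feature 0 among its selections; full SFFS additionally tries removing previously chosen features after each addition to escape nesting effects.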

In the next step, semi-blind classifiers are built by combining different strengths of specific alterations. We create training sets of 100 images, taking 20 images from each rate Si together with the corresponding original images for each manipulation type. Testing sets are created similarly from the remaining images. For blurring, we take 25 images from each manipulation degree, because Gaussian blur with radius 0.1 does not create any manipulated image when applied with Adobe Photoshop (denoted by N). The detection performances of the semi-blind classifiers are given in Table III. The results of the semi-blind classifiers are not directly comparable with [9], since we mix equally from each rate and [9] gives no explicit description of the semi-blind classifier construction except for scaling-up. In general, however, the proposed features are superior to the joint feature set used in [9]. This can be seen in the scatter plots of the best three features for the rotation and scale-up attacks, depicted in Fig. 1 and Fig. 2, which clearly show how well the classes are separated from each other.

Finally, for the design of the first blind classifier, f1, we follow the same structure as in [10], i.e., the same number of images from each manipulation type and degree is used to build the classifier. In order to make the results comparable with [9], another blind classifier, f2, is designed with all items included in Table I. More precisely, from each manipulation type and strength we take three images, from sharpening eight images and from equalization five images, resulting in 100 images in total for training. The testing set is formed from the remaining images. The SFFS algorithm selects the best feature vectors. The detection performances of the first (f1) and the second (f2) blind classifiers are given in Table IV.

TABLE II
DETECTION PERFORMANCES OF CLAIRVOYANT CLASSIFIERS

               S1      S2      S3      S4      S5
Scale-up      100%    100%    100%    100%    100%
Scale-down    100%    100%    100%    100%    100%
Brightness     66%     74%     85%    90.5%   97.5%
Contrast       67%    78.5%   92.5%    97%    99.5%
Rotation      100%    100%    100%    100%    100%
Blurring        N     77.5%   100%    100%    100%
Sharpening    99.5%
Equalization  97.5%

TABLE III
DETECTION PERFORMANCES OF SEMI-BLIND CLASSIFIERS

             False   Miss   Accuracy
Scale-up      0%      0%     100%
Scale-down    0%      0%     100%
Brightness   12%     30%      79%
Contrast      9%     29%      81%
Rotation      0%      0%     100%
Blurring      5%     14%     90.5%

TABLE IV
DETECTION PERFORMANCES OF THE BLIND CLASSIFIERS

      False   Miss   Accuracy
f1    11%      7%     92%
f2     6%     22%     86%

TABLE V
SELECTED FEATURE VECTORS BY THE SFFS FOR THE TWO BLIND CLASSIFIERS

f1: f2_sv(6,2), f2_sv(7,1), f2_sv(7,6), f2_sv(8,6), f1_sv(8,5),
    f1_sv(8,6), f2_sv(11,5), f1_sv(14,4), f1_sv(18,4), f2_sv(20,11)
f2: f2_sv(4,4), f2_sv(5,5), f1_sv(6,4), f1_sv(7,5), f1_sv(9,7),
    f1_sv(16,12), f1_sv(17,12)



As expected, the detection performance of the second blind classifier is poorer than that of the first one. However, both designs are superior to [9] and [10]. In Table V, we enumerate the feature vectors selected for the two blind classifiers.

4.3. Forensic Analysis

In this section, the pre-trained blind classifiers developed in the previous section using the 200 BMP image set with two different feature vectors are tested on the suspicious parts of the photomontage images. We assume that only one portion of an image can be suspicious and that this region can be determined manually, as in [9]. Moreover, to be consistent with the results obtained for the photomontage images, we also apply the tests to the original image sets: Pc, Gs and Gr.

The detection performances of the pre-trained blind classifiers on the original image sets are presented in Table VI, where f1 and f2 correspond to the cross tests between the two different blind classifiers. The same tests are performed for the photomontage image set and tabulated in Table VII. According to these results, we can conclude that the selected features are reliable. It can also be observed that the second blind classifier is superior to the first one. Although the tests for the original images are performed over three different image sets with a large number of images, compared to [9] and [10] the overall detection performance of f1 is superior to that of [9], whereas that of f2 is superior to that of [10].

TABLE VI
DETECTION PERFORMANCES OF BLIND CLASSIFIERS ON THE ORIGINAL IMAGE SETS

       f1       f2      Pc      Gs      Gr
f1      N     0.7550   0.989   0.916   0.696
f2    0.875     N      0.954   0.820   0.802

TABLE VII
DETECTION PERFORMANCES OF BLIND CLASSIFIERS ON THE PHOTOSHOPPED IMAGE SETS

       Ps1     Ps2    Detection
f1    0.825   1.000    0.860
f2    0.950   0.950    0.950

Fig. 1. Scatter plot of the best three features for the rotation attack. Legend: Original, 1°, 5°, 15°, 30°, 45°.

Fig. 2. Scatter plot of the best three features for the scale-up attack. Legend: Original, 1%, 5%, 10%, 25%, 50%.

5. CONCLUSION

SVD based features provide a way to model the correlation among image rows and columns through relative linear dependency. Using this ability, image manipulations which destroy this correlation can be detected. To this end, two types of SVD features are considered in different scenarios including the clairvoyant, semi-blind and blind modes. The results compared with [9] and [10] are very promising.

6. REFERENCES

[1] M. K. Johnson and H. Farid, "Exposing digital forgeries in complex lighting environments," IEEE Transactions on Information Forensics and Security, 2(3), 450-461, 2007.
[2] J. Fridrich, "Image watermarking for tamper detection," Proc. ICIP, International Conference on Image Processing, vol. 2, pp. 404-408, 1998.
[3] J. Watanabe, M. Hasegawa, and S. Kato, "A study on a watermarking method for both copyright protection and tamper detection," Proc. ICIP, International Conference on Image Processing, vol. 4, pp. 2155-2158, 2004.
[4] A. C. Popescu and H. Farid, "Exposing digital forgeries by detecting traces of resampling," IEEE Trans. Signal Process., 53(2), 758-767, 2005.
[5] A. C. Popescu and H. Farid, "Exposing digital forgeries in color filter array interpolated images," IEEE Trans. Signal Process., 53(10), 3948-3959, 2005.
[6] J. Lukas, J. Fridrich, and M. Goljan, "Detecting digital image forgeries using sensor pattern noise," Proc. of SPIE Electronic Imaging, Photonics West, January 2006.
[7] J. Fridrich, D. Soukal, and J. Lukas, "Detection of copy-move forgery in digital images," Proc. of the Digital Forensic Research Workshop, Cleveland, OH, 2003.
[8] Y. Sutcu, B. Coskun, H. T. Sencar, and N. Memon, "Tamper detection based on regularity of wavelet transform coefficients," Proc. of IEEE ICIP, 2007.
[9] S. Bayram, I. Avcibas, B. Sankur, and N. Memon, "Image Manipulation Detection," J. Electronic Imaging, 15(4), 041102-1-041102-17, 2006.
[10] S. Bayram, I. Avcibas, B. Sankur, and N. Memon, "Image Manipulation Detection with Binary Similarity Measures," 13th European Signal Processing Conference, vol. I, pp. 752-755, Antalya, 2005.
[11] G. Gul and F. Kurugollu, "Detection of Watermarking Methods using Steganalysis," Proc. of IEEE ICASSP, 2008.
[12] Image set is available at http://philip.greenspun.com.
[13] T.-T. Ng and S.-F. Chang, "An Online System for Classifying Computer Graphics Images from Natural Photographs," SPIE Electronic Imaging, San Jose, CA, January 2006.
[14] P. Pudil, J. Novovicova, and J. Kittler, "Floating search methods in feature selection," Pattern Recognition Letters, 15, pp. 1119-1125, 1994.
[15] A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern recognition: a review," IEEE Trans. on PAMI, vol. 22, no. 1, pp. 4-37, 2000.