D.WVL.22 Final Report on Forensic Tracking Techniques

ECRYPT

IST-2002-507932

ECRYPT

European Network of Excellence in Cryptology

Network of Excellence

Information Society Technologies

D.WVL.22

Final Report on Forensic Tracking Techniques

Due date of deliverable: 31. July 2008
Actual submission date: 21. July 2008

Start date of project: 1. February 2004
Duration: 4.5 years

Lead contractor: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. (FHG)

Revision 1.0

Project co-funded by the European Commission within the 6th Framework Programme

Dissemination Level

PU Public X

PP Restricted to other programme participants (including the Commission services)

RE Restricted to a group specified by the consortium (including the Commission services)

CO Confidential, only for members of the consortium (including the Commission services)

Editor
Alexander Opel (FHG)

Contributors
Nikos Nikolaidis, Simeon Nikitidis, Ioannis Pitas, Nikos Vretos, Elias Horness, Dimitrios Zotos (AUTH)
Di Wu (FHG)
Andreas Lang, Andreas Uhl (GAUSS)
Sviatoslav Voloshynovskiy, Oleksiy Koval (UNIGE)

The work described in this report has in part been supported by the Commission of the European Communities through the IST program under contract IST-2002-507932. The information in this document is provided as is, and no warranty is given or implied that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability.

Contents

1 Summary of Work done in WVL4
    1.1 Summary of AUTH
    1.2 Summary of FHG
    1.3 Summary of GAUSS-Salzburg
    1.4 Summary of GAUSS-Magdeburg
    1.5 Summary of UNIGE

2 Problems addressed by AUTH
    2.1 Image replica detection utilizing color / texture information, R-trees and LDA
        2.1.1 Addressed problem
        2.1.2 Motivation
        2.1.3 Technical approach
        2.1.4 Obtained results
    2.2 Semantic video Fingerprinting using face-related information
        2.2.1 Addressed problem
        2.2.2 Motivation
        2.2.3 Technical approach
        2.2.4 Obtained results
    2.3 A Latent Dirichlet Allocation Approach To Video Fingerprinting
        2.3.1 Addressed problem
        2.3.2 Motivations
        2.3.3 Technical approach
        2.3.4 Obtained results
    2.4 Video fingerprinting utilizing R-trees, SVM and a frame-based voting approach
        2.4.1 Addressed problem
        2.4.2 Motivation
        2.4.3 Technical approach
        2.4.4 Obtained results
    2.5 Video database for benchmarking of video fingerprinting / replica detection algorithms
        2.5.1 Addressed problem
        2.5.2 Motivation

3 Problems addressed by FHG
    3.1 Fingerprinting for passport photos
        3.1.1 Addressed problem and Motivation
        3.1.2 Technical approach
        3.1.3 Obtained results

4 Problems addressed by GAUSS
    4.1 Improving Security of JPEG2000-Based Robust Hashing using Key Dependent Wavelet Packet Subband Structures
        4.1.1 Addressed problem
        4.1.2 Motivation
        4.1.3 Technical approach
        4.1.4 Obtained results
    4.2 Key Dependent JPEG2000-Based Robust Hashing for Secure Image Authentication
        4.2.1 Addressed problem
        4.2.2 Motivation
        4.2.3 Technical approach
        4.2.4 Obtained results
    4.3 Attack Against a JPEG2000-based Robust Hash for Content Identification
        4.3.1 Addressed problem
        4.3.2 Motivation
        4.3.3 Technical approach
        4.3.4 Obtained results

5 Problems addressed by UNIGE
    5.1 Authentication of Biometric Identification Documents via Mobile Devices
        5.1.1 Addressed problem
        5.1.2 Motivation
        5.1.3 Technical approach
        5.1.4 Obtained results
    5.2 Decision-theoretic consideration of robust hashing: link to practical algorithms
        5.2.1 Addressed problem
        5.2.2 Motivation
        5.2.3 Technical approach
        5.2.4 Obtained results

6 Conclusion

Abstract

This is the last deliverable of WAVILA WP4. It provides an update on the ongoing research activities on perceptual hashing performed within the ECRYPT project.

Chapter 1 summarises the work in the area of perceptual hashing done by Aristotle University of Thessaloniki (AUTH), the Fraunhofer-Gesellschaft (FHG), the German AUStrian MultiMedia Security Channel (GAUSS) and the University of Geneva (UNIGE) in the last 4.5 years of ECRYPT.

Chapters 2 to 5 address the work of each partner in WVL4 in the last period of ECRYPT. Finally, this deliverable concludes with a short overview of the perceptual hashing work package and the dissemination of the achieved results.

Chapter 1

Summary of Work done in WVL4

1.1 Summary of AUTH

During the project, AUTH dealt with a number of research and integration issues within the scope of WVL.4:

• Evaluation of different color-based descriptors for image fingerprinting. Various color quantization methods, histograms calculated using color-only and/or spatial-color information, and similarity metrics were compared under the effect of various image manipulation operators. This work resulted in the publication of a journal paper.

• Image replica detection / fingerprinting utilizing R-trees. Two different variants were proposed. In both variants, a set of example (training) attacked images is used to choose the bounding boxes in an R-tree structure that holds the original images. In the first variant, features related to color, texture and gray-scale statistics are used. The set of candidate images returned by the R-tree is given as input to binary classifiers, each one trained to detect whether a query image is a replica of a specific image. The second variant utilizes color/texture-based descriptors, Linear Discriminant Analysis and a SIFT-based verification step. This work resulted in one published conference paper and one journal paper (under review).

• Video fingerprinting utilizing face-related information. The method characterizes a video segment using a signature composed of a superposition of temporal pulse series representing the existence (appearance / disappearance) of individuals in the video. The similarity of two video segments is determined using a novel, fast approach. A journal paper has been submitted for publication.

• Video fingerprinting utilizing a Latent Dirichlet Allocation (LDA) approach. The proposed method tries to extract latent aspects of a video and use them for robust video fingerprinting. SIFT descriptors from facial areas are utilized as features. A journal paper is under preparation.

• Video fingerprinting / replica detection using color-related information and a two-step, coarse-to-fine procedure. Two different variants were investigated with respect to the coarse step: one that utilizes an R-tree and another one that utilizes Support Vector Machines (SVM). A journal paper has been submitted.

• A book chapter and a conference paper that review the area of image/video fingerprinting and replica detection and describe solutions devised by AUTH within WVL.4 have also been published.

• A database of 2200 images with similar topics, suitable for benchmarking image fingerprinting methods, has been assembled and made available to ECRYPT partners.

• A database of approximately 1000 short videos (collected from the internet), suitable for benchmarking video fingerprinting methods, has been assembled and made available to ECRYPT partners.

1.2 Summary of FHG

The research of the Fraunhofer Institute for Computer Graphics Research IGD focused on the improvement of existing techniques, the development of new techniques, and applications in which perceptual hashing can make a valuable contribution.

During the ECRYPT project, several approaches were developed. A video perceptual hashing algorithm was developed which is based on the spatial and temporal variation of frames and uses different statistical characteristics such as mean, variance and colour. Additionally, a perceptual hashing method for the identification of music scores based on their graphical representation was developed. The developed perceptual hashing methods were adopted for the mutual observation of peers in filesharing networks. The proposed architecture was a framework for the legal distribution of commercial and non-commercial content via P2P networks. Furthermore, a histogram-based perceptual hashing algorithm was developed for minimally changing video sequences. Moreover, a benchmark system to test and evaluate different perceptual hashing algorithms was addressed.

1.3 Summary of GAUSS-Salzburg

The main focus of the Salzburg group in WVL4 was to employ key-dependent wavelet transforms in robust hashing schemes in order to increase their security. Key-dependency has been achieved by using either randomly chosen wavelet packet structures or randomly chosen NSMRA wavelet decompositions.

In a first stage, we identified wavelet-based proposals for robust visual hashing schemes which exhibit certain weaknesses against malicious attacks. We found weak schemes for both types of visual hashing schemes, i.e. those targeting authentication as well as those targeting content identification.

In the second stage, we incorporated key-dependent transforms and investigated the impact of this approach on (a) the properties of the hashing scheme (e.g. robustness, sensitivity) and on (b) the security of the scheme.

We found that authentication-targeted schemes may be enhanced by our approach in almost all cases (where care has to be taken with respect to potentially different robustness and sensitivity properties), while content identification schemes are too robust to gain sufficient key dependency by this approach.

1.4 Summary of GAUSS-Magdeburg

The main focus of the Magdeburg group in WVL4 was the design and implementation of the application profile perceptual hashing [Lan08]. This profile abstracts the evaluation process itself by simplifying it for end users with little inside knowledge and by normalizing the evaluation results to provide an objective comparison. It can be used in two ways. Firstly, it can be used to evaluate the robustness of a perceptual hashing algorithm against audio signal modifications caused by, for example, watermark embedding or attacks. Secondly, the perceptual hash can be used as a transparency measure [LKD08] [Lan08] to evaluate the embedding and/or attacking transparency of a digital watermark. First practical evaluation results are presented in [LD08] and [LDK07], where the embedding transparency of selected example digital audio watermarking schemes is evaluated within the application scenario of perceptual hashing. More detailed information can be found in [LKD08] [Lan08].

1.5 Summary of UNIGE

Within the reported period, UNIGE concentrated mostly on the following two problems.

First, we considered the problem of authentication of biometric identification documents via mobile devices. In order to overcome the existing weaknesses of conventional person identification documents, we proposed to use digital data hiding and perceptual hashing in order to cross-store the biometric data within the personal data and vice versa. We proposed a person identification system design that is optimal in the sense of the Shannon separation principle. Finally, we developed practical approaches for text data hiding in electronic and printed forms and for perceptual hashing as building blocks of the above system.

Secondly, we analyzed the robust perceptual hashing problem as a multiple composite hypothesis testing problem under partial ambiguity about the acquisition channel and source data statistics. We proposed an asymptotically universal test that approaches the performance of the classical ML test for which these statistics are perfectly known. Finally, we analyzed robust perceptual hashing jointly with respect to complexity, robustness, lack of priors and security. We analyzed a practical hash construction that replaces an M-ary formulation by a set of binary hypothesis tests and presented the details of its implementation for robust hashing of text documents.


Chapter 2

Problems addressed by AUTH

2.1 Image replica detection utilizing color / texture information, R-trees and LDA

2.1.1 Addressed problem

This work is a continuation of work performed by AUTH within ECRYPT on the same subject (reported in D.WVL.13). The general problem addressed is how to exploit information concerning the possible manipulations that an original image is likely to undergo, in order to optimize the performance of a classification scheme that performs perceptual hashing / replica detection. In the reporting period a number of improvements on the initial technique have been introduced.

2.1.2 Motivation

The motivation for the proposed improvements came from the need to enhance the performance of the system in terms of false positives. Moreover, the system has been tested on a larger image set in order to judge its performance in more realistic situations. Finally, the previously reported method did not deal with security issues. Such issues have been addressed in the enhanced version.

2.1.3 Technical approach

Each image is described by a feature vector. Then, a multidimensional indexing structure based on R-trees is implemented. For selecting the optimal hyper-bounding boxes that delimit the neighborhood of each image in the R-tree, an attack-oriented strategy is introduced that tries to model all potential attacks that the system is designed to encounter. When queried with an unknown image, the R-tree returns a set of candidate matching images. Subsequently, Linear Discriminant Analysis (LDA) is applied in order to reformulate the feature space and yield more discriminant image representations so that a final decision can be reached. During the reporting period the following research directions have been followed: a) Various MPEG-7 image descriptors have been tested. b) kd-trees were compared to R-trees. c) A comparison of the query image with the original (database) image returned by the system, using SIFT features, was incorporated. This new step aimed at reducing the percentage of false positives. d) The security of the system against intentional attacks has been investigated, and one method to increase security by introducing a user-defined key that affects the feature extraction procedure has been proposed.
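As an illustration of the attack-oriented selection of bounding boxes described above, the following minimal Python sketch derives an axis-aligned box for each database image from the feature vectors of its attacked versions and then returns every image whose box contains the query. It is a sketch under stated assumptions, not the AUTH implementation: the toy three-dimensional feature vectors, the two "attacks" (applied directly to feature vectors rather than to images) and the names `attack_bounding_box` and `candidates` are hypothetical, and a linear scan stands in for the R-tree query. The LDA and SIFT verification stages are not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def attack_bounding_box(original: Vector,
                        attacks: List[Callable[[Vector], Vector]]) -> Box:
    """Smallest axis-aligned box holding the original and its attacked versions."""
    samples = [list(original)] + [list(attack(original)) for attack in attacks]
    lower = [min(column) for column in zip(*samples)]
    upper = [max(column) for column in zip(*samples)]
    return lower, upper


def candidates(query: Vector, boxes: Dict[str, Box]) -> List[str]:
    """Linear scan standing in for the R-tree point query (assumption)."""
    return [
        name
        for name, (lower, upper) in boxes.items()
        if all(lo <= q <= hi for lo, q, hi in zip(lower, query, upper))
    ]


# Hypothetical attacks acting directly on a toy 3-dimensional feature vector.
attacks = [
    lambda v: [x * 0.9 for x in v],   # stands in for e.g. darkening
    lambda v: [x + 0.05 for x in v],  # stands in for e.g. an additive offset
]
database = {"img_a": [0.2, 0.5, 0.7], "img_b": [0.8, 0.1, 0.3]}
boxes = {name: attack_bounding_box(f, attacks) for name, f in database.items()}

print(candidates([0.21, 0.48, 0.69], boxes))  # query close to img_a
print(candidates([0.5, 0.5, 0.5], boxes))     # query unrelated to both images
```

The query close to `img_a` falls inside that image's box and is returned as a candidate, while the unrelated query falls inside no box and yields an empty candidate set.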

2.1.4 Obtained results

The improved method has been tested on two image datasets: a set consisting of 2300 images (set A) and a new one consisting of almost 10000 images (set B). An extensive series of attacks was applied to the query images. The MPEG-7 image descriptor that produced the best results was the Scalable Color Edge Histogram descriptor, which incorporates both color and texture information. kd-trees were experimentally found to be inferior to R-trees for the problem at hand. For set A, when the same attacks used during system training are applied to the query images, the final, optimized system involving the SIFT verification step achieved a false negative rate of 1.1% and a false positive rate of 0.58% (down from 1.1% when no SIFT features were used). For set B, the system achieved false positive and false negative rates of 6.5%. A journal paper has been authored and submitted to IEEE Transactions on Multimedia [NZNP08]. The method has also been described in a book chapter authored by AUTH in an edited book [NP08].

2.2 Semantic video Fingerprinting using face-related information

2.2.1 Addressed problem


An obvious solution to the problem of determining the origin of a video segment in the presence of attacks (i.e. fingerprinting or replica detection) is to use semantic features for video characterization. This is because such semantic features are expected to be robust to manipulations. Using the pattern of appearance of specific people in the video as a characteristic has been proposed as an idea before. However, no work had been done on formulating a fingerprinting method and evaluating its performance on video databases.

2.2.2 Motivation

Our first goal was to develop a metric that would determine whether a video segment is a replica of another, based on the faces appearing in it. Such a metric needed to be robust to errors in face detection and recognition, and also to arbitrary temporal limits of the video segments. Our second goal was to develop an efficient algorithm for finding whether a specific video segment is a replica of any segment in a video database. Our final goal was to test our fingerprinting method with respect to different performance rates of face detection and recognition.


2.2.3 Technical approach

We chose to characterize a video segment using a signature composed of a superposition of temporal pulse series. Each such pulse series represents the existence of an individual in the video, where a pulse represents a continuous appearance of a person's face. Then, the similarity of two video segments is determined using a novel, fast approach that bears similarities to convolution.


In order to efficiently locate possible copies of a specific video segment within a database, we first construct the appropriate signature for each video in the database. Then the pulses representing persons' appearances are indexed temporally and by identity. The properties of the signature are exploited to find the segments in the database that are most similar to the given video. In order to declare identity, the similarity of the best match must exceed a threshold. The major novelty of the algorithm is in the efficient computation of the signature similarity, specifically in finding the maxima of the similarity metric.
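The pulse-series signature and the search for the best temporal alignment can be illustrated with the following minimal Python sketch. It is not the fast algorithm proposed by AUTH: a brute-force scan over all shifts is swapped in for the efficient maximum search, the similarity is assumed to be the number of frames in which the same person is visible in both segments, and the names `Signature`, `similarity_at_shift` and `best_match`, as well as the identities and frame ranges, are hypothetical.

```python
from typing import Dict, List, Tuple

Pulse = Tuple[int, int]             # one continuous appearance, [start, end) in frames
Signature = Dict[str, List[Pulse]]  # person identity -> pulse series


def overlap(a: Pulse, b: Pulse) -> int:
    """Number of frames shared by two pulses."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def similarity_at_shift(query: Signature, ref: Signature, shift: int) -> int:
    """Frames in which the same person is visible in both segments (assumed metric)."""
    total = 0
    for person, pulses in query.items():
        for start, end in pulses:
            for ref_pulse in ref.get(person, []):
                total += overlap((start + shift, end + shift), ref_pulse)
    return total


def best_match(query: Signature, ref: Signature, max_shift: int) -> Tuple[int, int]:
    """Brute-force search for the shift with maximal similarity."""
    scores = {s: similarity_at_shift(query, ref, s) for s in range(max_shift + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]


reference = {"anna": [(100, 160), (300, 340)], "ben": [(150, 220)]}
# A copy of reference frames 100..250, re-indexed to start at frame 0.
query = {"anna": [(0, 60)], "ben": [(50, 120)]}

shift, score = best_match(query, reference, max_shift=400)
query_length = sum(end - start for pulses in query.values() for start, end in pulses)
is_replica = score / query_length >= 0.8  # hypothetical decision threshold
print(shift, score, is_replica)
```

In this toy case the best alignment is found at a shift of 100 frames, where every pulse of the query coincides with a pulse of the same person in the reference, so the normalized similarity reaches 1 and exceeds the threshold.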


2.2.4 Obtained results

In order to test our method with respect to parameters such as the effectiveness of face detection and recognition, and in order to be able to test its performance on very large databases, we devised a probabilistic model for the simulation of video signatures. This model was used to build artificial video databases which formed a controllable experimental corpus. This enabled us to evaluate the effect that face detection and recognition errors had on the performance of the proposed method. In scenarios with moderate detector and recognizer malfunctions, we registered a false rejection rate of 6% and negligible false acceptance. The system was also applied to a database of real videos with very good results. The computational complexity of the algorithm was found to be logarithmic with respect to database size, as had been predicted.

A journal paper has been authored. A version of the algorithm that operates within a video content-based retrieval framework is also described in this paper. The paper has been submitted to Signal Processing: Image Communication [CNNP08]. The method has also been described in a book chapter authored by AUTH in an edited book [NP08].

2.3 A Latent Dirichlet Allocation Approach To Video Fingerprinting

2.3.1 Addressed problem


We investigate the possibility of extracting latent aspects of a video in order to use them for video fingerprinting. A probabilistic model is utilized for this purpose, using face-related information as features. We use the bag-of-words model (in our case "bag-of-faces") for each video and thereby establish a topic model, where topics are modelled as a mixture of distributions of SIFT features extracted from faces. This framework has already been used in text modelling with good results.

2.3.2 Motivations

Our motivation stems from the fact that in a video there exist latent aspects of its content which are robust under standard attacks. These aspects can be extracted by using the Latent Dirichlet Allocation (LDA) generative model. This approach has already been applied in text modelling and provided good results. In [BNJL03] it is shown that LDA performs better than the probabilistic Latent Semantic Indexing (pLSI) algorithm in the context of text modelling, due to the known overfitting problems of pLSI. Moreover, recent research efforts have made use of this probabilistic generative process in the context of image processing [BDF+03], [FFFPZ05], [FFP05] and video processing [YLXH07], [CLZT07].



2.3.3 Technical approach

For feature extraction, a Viola and Jones [VJ02] face detector was applied to the videos and SIFT features were extracted from the detected faces. These features create the "words" that characterize the video and at the same time are appended to the universal vocabulary of the video database. Afterwards, the LDA framework is trained and the topic-level hyper-parameters α and β are found. Once the model is trained, we can test a new video by its co-occurrence matrix and infer which topic it belongs to. As mentioned above, this will result in a mixture of topics rather than a single topic. Taking the one with the highest probability will result in the main topic (i.e., video). As far as inference is concerned, many probabilistic approaches exist which provide good results. In our case, the variational inference approach described in [BNJL03] is used.
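The bag-of-faces representation and the selection of the most probable topic can be sketched in Python as follows. This is only an illustration under strong assumptions: descriptors are toy two-dimensional vectors rather than SIFT descriptors, each descriptor is mapped to its nearest vocabulary entry (the report does not state how words are matched), and each topic is scored by a simple single-topic log-likelihood, which is swapped in for the variational LDA inference of [BNJL03]. All names and numbers (`nearest_word`, `word_counts`, `dominant_topic`, the vocabulary and the topic-word probabilities) are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Descriptor = Sequence[float]


def nearest_word(descriptor: Descriptor, vocabulary: List[Descriptor]) -> int:
    """Index of the vocabulary entry closest to the descriptor (assumed matching rule)."""
    def squared_distance(a: Descriptor, b: Descriptor) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda i: squared_distance(descriptor, vocabulary[i]))


def word_counts(descriptors: List[Descriptor], vocabulary: List[Descriptor]) -> Counter:
    """Bag-of-words histogram of one video."""
    return Counter(nearest_word(d, vocabulary) for d in descriptors)


def dominant_topic(counts: Counter, topic_word_probs: Dict[str, List[float]]) -> str:
    """Topic with the highest log-likelihood (simplified stand-in for LDA inference)."""
    def log_likelihood(probs: List[float]) -> float:
        return sum(n * math.log(probs[word]) for word, n in counts.items())
    return max(topic_word_probs, key=lambda t: log_likelihood(topic_word_probs[t]))


vocabulary = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
query_descriptors = [[0.1, 0.0], [0.9, 1.2], [1.1, 0.8], [4.8, 5.1]]
# Hypothetical per-topic word distributions, as a trained model might provide.
topic_word_probs = {"video_A": [0.2, 0.6, 0.2], "video_B": [0.7, 0.1, 0.2]}

counts = word_counts(query_descriptors, vocabulary)
print(counts, dominant_topic(counts, topic_word_probs))
```

Here the query contains word 1 twice and words 0 and 2 once each, which is better explained by the word distribution of `video_A` than by that of `video_B`.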


2.3.4 Obtained results

In order to test the performance of our framework, we have conducted experiments based on two sets of videos. The first set consists of a number of low-quality videos (collected from the internet) with an average length of 2000 frames each, and the second set consists of movies with an average length of 1 hour 30 minutes (approx. 120,000 frames each). The preliminary results are very promising. A technical report has been authored [VNHP08]. A journal paper is currently under preparation.

2.4 Video fingerprinting utilizing R-trees, SVM and a frame-based voting approach

2.4.1 Addressed problem


The problem addressed in this work was that of video fingerprinting / replica detection using color-related information and a two-step, coarse-to-fine procedure. Two different variants were investigated with respect to the coarse step: one that utilizes R-trees and another one that utilizes Support Vector Machines (SVM).

2.4.2 Motivation

The variant that utilizes R-trees was motivated by the good performance of the image replica detection method devised by AUTH that makes use of R-trees and color-based information (see section 2.1). In a sense, it constitutes an extension of this method towards video data. The SVM variant was motivated by the very good performance of SVMs in two-class and multi-class classification problems.


2.4.3 Technical approach

Videos are represented by color histograms (one per frame) and then a multidimensional indexing structure based on R-trees is implemented. The efficiency of the indexing structure in producing accurate results depends directly on the selection of optimal hyper-bounding boxes for use in the R-tree. For selecting these bounding boxes, an attack-oriented training strategy is utilized that aims at modelling the potential attacks that the system is designed to encounter. When queried with an unknown video, the R-tree returns a set of videos that are candidates for being the original of the query. This is achieved by a voting scheme in which each frame of the query video is processed separately and casts a vote for a video in the database. Videos whose votes are above a threshold are selected. Subsequently, a refinement step that utilizes distances between frame color histograms is applied to the set of videos returned by the R-tree in order to reach the final decision.

The above description refers to the R-tree variant of the system. In the SVM variant, original videos as well as their attacked versions are used to train a multi-class SVM with one class per video in the database. When queried with a certain video, the SVM, along with the same per-frame voting procedure, returns a small set of videos that are candidates for being the original of the query. The same refinement procedure is then applied to this set.
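The coarse-to-fine procedure described above (per-frame voting followed by a histogram-distance refinement) can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the report: a nearest-frame search stands in for the R-tree or SVM coarse step, the L1 distance is used between histograms, the refinement compares frames at equal positions, and the thresholds, toy three-bin histograms and names (`coarse_candidates`, `refine`, `identify`) are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence

Histogram = Sequence[float]
Video = List[Histogram]  # one color histogram per frame


def l1(a: Histogram, b: Histogram) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def coarse_candidates(query: Video, database: Dict[str, Video],
                      vote_threshold: int) -> List[str]:
    """Each query frame votes for the video holding its nearest frame."""
    votes: Counter = Counter()
    for q in query:
        nearest = min(database,
                      key=lambda vid: min(l1(q, frame) for frame in database[vid]))
        votes[nearest] += 1
    return [vid for vid, n in votes.items() if n > vote_threshold]


def refine(query: Video, database: Dict[str, Video], candidates: List[str],
           max_mean_distance: float) -> Optional[str]:
    """Keep the candidate with the smallest mean frame distance, if small enough."""
    best_id, best_distance = None, float("inf")
    for vid in candidates:
        n = min(len(query), len(database[vid]))
        if n == 0:
            continue
        distance = sum(l1(query[i], database[vid][i]) for i in range(n)) / n
        if distance < best_distance:
            best_id, best_distance = vid, distance
    return best_id if best_distance <= max_mean_distance else None


def identify(query: Video, database: Dict[str, Video],
             vote_threshold: int = 1, max_mean_distance: float = 0.1) -> Optional[str]:
    candidates = coarse_candidates(query, database, vote_threshold)
    return refine(query, database, candidates, max_mean_distance)


database = {
    "clip_1": [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    "clip_2": [[0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.2, 0.6, 0.2]],
}
noisy_copy = [[0.78, 0.12, 0.10], [0.69, 0.20, 0.11], [0.61, 0.28, 0.11]]
unknown = [[0.3, 0.3, 0.4]] * 3

print(identify(noisy_copy, database))  # clip_1
print(identify(unknown, database))     # None
```

The voting step alone would still propose a candidate for the unknown video, because every frame has some nearest neighbour; it is the refinement threshold that rejects it, which is why the two steps are combined.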


2.4.4 Obtained results

The method has been applied to a database of 590 relatively low-quality short videos collected from the internet. The system was trained by applying Gaussian noise, MPEG compression and scaling attacks to the videos. When queried with videos from the database that were distorted by the same attacks (involving either the parameters used in training or different parameters), the system achieved 0% misclassification and 0% false rejection. A 2.5% false rejection rate (with 0% misclassification) was observed when the query videos were distorted with all three attacks simultaneously. When queried with videos that were not stored in the database, the false acceptance error was 3.3%. The above results correspond to the R-tree variant. Slightly better results were obtained when using the SVM variant, which was also found to be considerably faster. The method has been described in a journal paper that has been submitted to IEEE Transactions on Circuits and Systems for Video Technology [ZNNP08].

2.5 Video database for benchmarking of video fingerprinting / replica detection algorithms

2.5.1 Addressed problem


A database consisting of approximately 1000 short videos (average duration 4.5 minutes per video), collected from the Internet (more specifically from the popular YouTube website), has been assembled. The videos cover diverse topics and have relatively low resolution. The purpose of this database is to be used for the benchmarking of replica detection algorithms. The database is available to all ECRYPT partners.

2.5.2 Motivation

The motivation for the assembly of this database was the lack of a video corpus suitable for use in benchmarking of video replica detection algorithms.


Chapter 3

Problems addressed by FHG

3.1 Fingerprinting for passport photos

3.1.1 Addressed problem and Motivation

In the modern world, cross-regional and international movement of persons is very easy, which creates difficulties for border control, crime prevention and public security. Authentication of a person is based on his or her identity document, such as a passport or driver's licence. It is indispensable to distinguish legitimate ID documents from counterfeit or stolen ones. Anti-counterfeit printing methods may be obtained by forgers, and biometric techniques may intrude on privacy or require extra equipment. A more acceptable and trustworthy approach is therefore needed.

Image fingerprinting (perceptual hashing) has been regarded as a new and effective method for image identification and image authentication. It is feature-based and sufficiently sensitive to potential changes of the perceptual content. In addition, it can be designed to be key-dependent in order to achieve security.


3.1.2 Technical approach

We designed an image fingerprinting algorithm based on the Radon transform. The most interesting properties of the Radon transform are its rotation and scaling (retaining aspect ratio) invariance [SHKY04]. The steps of the algorithm are shown in Figure 3.1. Based on this algorithm, FIFA (Face Image Fingerprinting for Authentication) was developed to authenticate passports and other identity cards by means of the image fingerprint of the face image. In this system, the image fingerprint is extracted from the face photo in the document. After it is combined with the information of the holder, the authentication code is generated.


3.1.3 Obtained results

The experiment used 47 images of 25 persons (11 female and 14 male). The samples were collected by Oriana Yuridia Gonzalez Castillo for the project 2DFIQ [Cas06]. The attacks include luminance intensity adjustment (gamma correction), rotation (and rotation back), additive noise, blur and print-scan. The feature extraction algorithm is not bit-error-free under these attacks. However, the experimental results show that the FAR and FRR among different persons can both be 0 after attack when the threshold lies in a proper range.
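The statement that FAR and FRR can both be 0 for a proper threshold range can be made concrete with a small worked example in Python. It assumes that fingerprints are bit strings compared by Hamming distance and that a document is accepted when the distance does not exceed the threshold; neither assumption is stated in the report, and the 16-bit fingerprints and the function names are invented for illustration.

```python
from typing import List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def zero_error_threshold_range(genuine: List[int],
                               impostor: List[int]) -> Optional[Tuple[int, int]]:
    """Thresholds t (accept if distance <= t) with FAR = FRR = 0, or None.

    FRR = 0 needs every genuine distance <= t; FAR = 0 needs every
    impostor distance > t.
    """
    lowest, highest = max(genuine), min(impostor) - 1
    return (lowest, highest) if lowest <= highest else None


enrolled = "1011001110100101"      # fingerprint stored for person A (invented)
attacked = "1011001010100111"      # same photo after an attack: 2 bit errors
other_person = "0100111010011001"  # fingerprint of a different person

genuine_distances = [hamming(enrolled, attacked)]
impostor_distances = [hamming(enrolled, other_person)]
print(genuine_distances, impostor_distances)
print(zero_error_threshold_range(genuine_distances, impostor_distances))
```

In this toy case any threshold between 2 and 10 bits accepts the attacked copy and rejects the other person. If the largest genuine distance reached the smallest impostor distance, no such range would exist and the function would return `None`.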


Figure 3.1: Steps of image fingerprinting algorithm

Chapter 4

Problems addressed by GAUSS

4.1 Improving Security of JPEG2000-Based Robust Hashing using Key Dependent Wavelet Packet Subband Structures

4.1.1 Addressed problem


This work [LU07] is on robust content-based hash functions for visual data, where we focus on the security aspect of such schemes which aim at image authentication. In particular, we focus on a particular technique proposed in the literature which uses part of the JPEG2000 packet body data to create a hash string.

4.1.2 Motivation

Using a well-established multimedia standard for image hashing is promising due to the available hardware and software and the wide range of knowledge available. The concrete motivation for this work is a class of attacks discovered against the aforementioned JPEG2000-based hashing scheme, in which packet data extracted from different images can be mixed, producing almost arbitrary attacks against the original scheme.


4.1.3 Technical approach

We propose to use key-dependent wavelet transforms in order to introduce key-dependency into the hashing scheme. In particular, we employ randomly generated wavelet packet bases, which can be neatly integrated into JPEG2000 in the context of Part 2 of the standard. For each hash generated, a different wavelet packet structure is used in a key-dependent manner.
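How a key can select a wavelet packet structure may be sketched in Python as follows. This is a minimal illustration, not the generation procedure of [LU07]: the rule that each subband is split further with a fixed probability, the depth limit, and the name `packet_structure` are assumptions, and Python's `random.Random` (seeded from a SHA-256 digest of the key) stands in for the cryptographically secure generator a real scheme would need. No wavelet transform is computed; only the subband tree is derived.

```python
import hashlib
import random
from typing import List


def packet_structure(key: bytes, max_depth: int = 3,
                     split_prob: float = 0.5) -> List[str]:
    """Return the leaf subbands of a key-dependent quadtree decomposition.

    Each leaf is named by its path from the root, e.g. "LL/HL".
    """
    seed = int.from_bytes(hashlib.sha256(key).digest(), "big")
    rng = random.Random(seed)  # NOT cryptographically secure, illustration only
    leaves: List[str] = []

    def decompose(path: List[str], depth: int) -> None:
        # The image itself (depth 0) is always decomposed once; deeper
        # subbands are split with probability split_prob (assumed rule).
        if depth == 0 or (depth < max_depth and rng.random() < split_prob):
            for band in ("LL", "HL", "LH", "HH"):
                decompose(path + [band], depth + 1)
        else:
            leaves.append("/".join(path))

    decompose([], 0)
    return leaves


structure = packet_structure(b"secret key 1")
print(len(structure), structure[:4])
# The same key always reproduces the same structure.
assert structure == packet_structure(b"secret key 1")
```

Since each split replaces one subband by four, the number of leaves is always of the form 3k + 1. The observation reported for this approach, that bases with many high-frequency subbands can lose robustness, suggests that a practical generator might need to constrain which subbands may be split.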


4.1.4 Obtained results

We achieve satisfactory key-dependency of the hash with an extremely large key-space size, and the sensitivity of the scheme against intentional image alterations is maintained. However, the different wavelet packet bases exhibit extremely varying robustness against compression and various "common" image manipulations. For some extreme cases, the property of being a "robust" hashing scheme is entirely lost (this applies to wavelet packet bases with a large number of high-frequency subbands, the packet data of which appears "early" in the JPEG2000 bitstream).



4.2 Key Dependent JPEG2000-Based Robust Hashing for Secure Image Authentication


This work [LU08] addresses robust content-based hash functions for visual data, focusing on the security of such schemes when they are used for image authentication. In particular, we consider a technique proposed in the literature which uses part of the JPEG2000 packet body data to create a hash string.

4.2.2 Motivation



We propose to use key-dependent wavelet transforms in order to introduce key-dependency into the hashing scheme. In particular, we employ randomly generated non-stationary multi-resolution wavelet decompositions (NSMRA), which use different wavelet filters at each decomposition level (and, in the inhomogeneous variant, also in each direction within one level). The different filters used for decomposition are generated using a biorthogonal parametrization scheme which yields the common 9/7 wavelet for specific parameter settings. For each hash generated, a different NSMRA structure is used in a key-dependent manner. This approach can be neatly integrated into JPEG2000 in the context of Part 2 of the standard.


We achieve satisfactory key-dependency of the hash with a sufficiently large key space. The effectiveness of the proposed technique against the described attacks is shown, and we also estimate the unicity distance of the resulting key-dependent hashing scheme to be very high. The higher security achieved has to be paid for with problematic robustness and sensitivity properties, which vary from one NSMRA scheme to another, in some cases improving on the standard filter and in some cases significantly worsening it.

4.3 Attack Against a JPEG2000-based Robust Hash for Content Identification


This work [HULU08] addresses robust content-based hash functions for visual data, focusing on the security of such schemes when they are used for content search (CBIR). In particular, we consider a technique proposed in the literature which uses part of the JPEG2000 packet body data to create a hash string.


4.3.2 Motivation



We investigate the impact of using different measures to determine hash similarity in the discussed scheme. Since the JPEG2000 bitstream syntax is based on bytes, the original proposal essentially counts identical bytes to determine hash similarity. We show that this choice enables a “bit-flip” attack in which only a small fraction of a byte (i.e. one bit) needs to be changed, resulting in a different byte but hardly changed visual content.


We propose to use the classical Hamming distance for the JPEG2000-based hash as well, which prevents the bit-flip attack and also makes it possible to measure hash distances introduced by key-dependent NSMRA or wavelet packet decompositions.
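The difference between the two similarity measures can be seen in a small sketch. The hash bytes below are arbitrary toy values, not real JPEG2000 packet data, and the function names are hypothetical; the sketch only compares how the two measures react when a single bit is flipped in every byte.

```python
def byte_match_ratio(h1: bytes, h2: bytes) -> float:
    """Byte-counting similarity: fraction of positions holding identical bytes."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have equal length")
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


def normalized_hamming(h1: bytes, h2: bytes) -> float:
    """Fraction of differing bits between two equal-length hashes."""
    if len(h1) != len(h2):
        raise ValueError("hashes must have equal length")
    differing_bits = sum(bin(a ^ b).count("1") for a, b in zip(h1, h2))
    return differing_bits / (8 * len(h1))


original = bytes([0x3A, 0xC5, 0x7E, 0x91])
# Bit-flip attack: change only the least significant bit of every byte.
attacked = bytes(b ^ 0x01 for b in original)

print(byte_match_ratio(original, attacked))    # 0.0: no byte matches any more
print(normalized_hamming(original, attacked))  # 0.125: only 4 of 32 bits differ
```

Under byte counting the attacked hash appears completely unrelated to the original, whereas the Hamming distance still reports the two hashes as close.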


Chapter 5

Problems addressed by UNIGE

5.1 Authentication of Biometric Identification Documents via Mobile Devices


We consider the problem of authentication of biometric identification documents via mobile devices [VKV+08].

5.1.2 Motivation

The current research is motivated by the need to enhance the security of currently available identification documents.


The proposed solution makes use of digital data-hiding and perceptual hashing in order to cross-store the biometric data inside the personal data and vice versa.


The main results can be summarized as follows. We proposed a solution for enhancing the security of identification documents based on digital data-hiding and perceptual hashing. A theoretical framework was presented which enables the analysis of future authentication systems based on this approach and guides their design. We advocate the separation approach, which uses robust visual hashing techniques in order to match the information rates of biometric and personal data to the rates offered by current image and text data-hiding technologies. We also described practical schemes for robust perceptual hashing and digital data-hiding that can be used as building blocks for the proposed authentication system. The experimental results show that the proposed authentication system constitutes a viable and practical solution when the document acquisition is performed using a CCD scanner.



5.2 Decision-theoretic consideration of robust hashing: link to practical algorithms


We propose to consider the problem of robust perceptual hashing of multimedia data as composite hypothesis testing [KVBP07, VKBP07].

5.2.2 Motivation

Such a problem formulation is justified by the prior ambiguity about source statistics and channel parameters that is usual in many practical scenarios.


An asymptotically universal test is proposed that approaches the performance of the classical ML test performed under exact knowledge of the mentioned statistics, under specific constraints on the assumed source and geometric channel models.


The main results can be summarized as follows. First, we proposed a composite hypothesis testing formulation of the problem of hashing under source and channel ambiguity. Second, in order to link the theory with practice in a way that satisfies a number of conflicting requirements on complexity, robustness, lack of priors and security, we analyzed a practical hash construction that replaces the M-ary formulation by a set of binary hypothesis tests and presented the details of its implementation for robust hashing of text documents.
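One way to picture the replacement of an M-ary decision by a set of binary ones is sketched below. The construction is an assumption chosen for illustration (key-seeded random projections followed by a sign decision, with identification by minimum Hamming distance); the summary above does not specify the actual construction of [KVBP07, VKBP07], and all names, dimensions and noise levels are hypothetical.

```python
import random


def make_projections(n_bits, dim, seed):
    """Key-dependent random projection vectors (the seed plays the role of the key)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def binary_hash(features, projections):
    """One binary decision per projection: is the projected value non-negative?"""
    return [
        1 if sum(w * v for w, v in zip(row, features)) >= 0.0 else 0
        for row in projections
    ]


def hamming(h1, h2):
    """Number of positions in which two bit lists differ."""
    return sum(a != b for a, b in zip(h1, h2))


def identify(query_hash, database):
    """Return the enrolled name whose hash is closest in Hamming distance."""
    return min(database, key=lambda name: hamming(query_hash, database[name]))


rng = random.Random(1)
projections = make_projections(n_bits=64, dim=16, seed=42)
# Toy feature vectors standing in for enrolled documents (assumption).
documents = {
    name: [rng.gauss(0.0, 1.0) for _ in range(16)]
    for name in ("doc_a", "doc_b", "doc_c")
}
database = {name: binary_hash(vec, projections) for name, vec in documents.items()}

# A mildly distorted observation of doc_a.
noisy = [v + rng.gauss(0.0, 0.05) for v in documents["doc_a"]]
print(identify(binary_hash(noisy, projections), database))
```

Instead of testing the observation against each of the M enrolled items directly, the sketch makes 64 independent binary decisions and compares the resulting bit strings, which keeps the complexity low and requires no prior on the source statistics.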

Chapter 6

Conclusion

Massive unauthorized distribution and copying of digital content, enabled by lossless digital copying and fast digital distribution, requires additional tools, especially in the context of DRM. These tools either reduce the unauthorized activities or allow content owners to classify copies of their content as legal or not. The objective of WVL4 was to investigate technical issues of forensic tracking applications in order to enhance the available technology. To achieve this goal, WVL4 facilitates the efforts of European researchers in the exploration of perceptual hashing (fingerprinting). Results achieved in the first year will be taken into account, especially the fundamentals within WVL1.

Different technological issues were analysed in ECRYPT’s year one in the original WVL6 (“Forensic Tracking”, which was the predecessor of today’s WVL4 before the general restructuring in year two). For images, the content dependency of watermarks was investigated, in particular with respect to capacity and visibility constraints. Within this research, the combination of semantic and syntactical embedding paradigms was analysed for images. Additionally, robust feature extraction algorithms for images were investigated. For audio, year one focussed on bit-error tolerant perceptual audio content hashing. Special requirements for document images were studied. Besides feature extraction and processing, efficient search and database structures were considered. As described in D.WVL.7, legal issues were investigated within Belgium, Germany, and Switzerland. This investigation showed that there has been no jurisprudence at European courts. Only in the U.S. was digital watermarking described as a mechanism for tracing, although the proof of ownership was done by conventional methods.

Within year one, deviations from the project work programme were identified: the research activities within the WAVILA groups focus on basic research. Thus, the planned activities within WVL6 were not completely covered by the initial work plan for the first 12 months. The general restructuring of WAVILA at the beginning of year two took this development into account, resulting in a new virtual lab (VL) structure, the assignment of the topics from the “old” WVL6 to a newly founded WVL4, and a slightly modified set of objectives for this new WVL4.

In contrast to watermarking, where theoretical foundations have been transferred from communication theory, perceptual hashing is a relatively new area. Different methods have been proposed for the identification and authentication of multimedia content. While knowledge gained in related areas such as watermarking or biometrics can be transferred to the evaluation of perceptual hashing technologies, a basic theoretical framework for the development of perceptual hashing techniques is not available. Neither is a public tool available that supports the evaluation of existing perceptual hashing techniques. Similarly, the security of perceptual hashing techniques has not been considered adequately.

As a consequence, WVL4 addressed these missing issues in years two and three by collecting different requirements and evaluation criteria as a first step towards a general framework. Furthermore, existing methods were investigated and new methods were developed, including methods for new content types. An additional focus was on applying perceptual hashing techniques for the authentication and verification of content. Some of the results focusing on the benchmarking of perceptual hashing techniques were published in the WVL3 deliverable D.WVL.11 “Benchmarking Metrics and Concepts for Perceptual Hashing (fingerprinting)”.

In year three the research on video fingerprinting, copyright protection for 3D models and forensic image replica detection was continued, the latter being considered an efficient alternative to watermarking for the discovery of copyright infringement.

In year four the participating partners continued the research on video and image fingerprinting. Some quite interesting approaches were developed in the last year.

The results of year two were presented in Salzburg at the ECRYPT WAVILA Summer School organised by GAUSS. In year three the results of the research activities in WVL4 were presented and discussed with international experts from within and from outside the ECRYPT consortium at major events such as the 3rd IFIP Conference on Artificial Intelligence Applications & Innovations (AIAI 06) (where WVL4 members from AUTH co-organized a special session entitled “Digital Rights Management Techniques and Interoperability of Protection Tools”) and the IEEE International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS 2006). Further activities of WVL4 focused on the preparation of the call for papers for the 3rd WAVILA Challenge to be held in 2007. The highlights of WVL4 over the 4.5 years of ECRYPT were presented at events such as the 2nd ECRYPT Summer School on Multimedia Security (Thessaloniki, Greece, Sept. 2007) and ECRYPT’s “Challenges and Perspectives for Academia and Industry” in Antwerp, May 2008.

Bibliography

[BDF+03] K. Barnard, P. Duygulu, D. Forsyth, N. de Freitas, D.M. Blei, M.I. Jordan, J. Kandola, T. Hofmann, T. Poggio, and J. Shawe-Taylor. Matching Words and Pictures. Journal of Machine Learning Research, 3(6):1107–1135, 2003.

[BNJL03] D.M. Blei, A.Y. Ng, M.I. Jordan, and J. Lafferty. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3(4-5):993–1022, 2003.

[Cas06] Oriana Yuridia Gonzalez Castillo. Survey about facial image quality. Technical report, Fraunhofer IGD A8, 2006.

[CLZT07] J. Cao, J. Li, Y. Zhang, and S. Tang. LDA-Based Retrieval Framework for Semantic News Video Retrieval. Proceedings of the International Conference on Semantic Computing, pages 155–160, 2007.

[CNNP08] C. Cotsaces, N. Nikolaidis, S. Nikitidis, and I. Pitas. Semantic video fingerprinting and retrieval using face information. Signal Processing: Image Communication (submitted), 2008.

[FFFPZ05] R. Fergus, L. Fei-Fei, P. Perona, and A. Zisserman. Learning Object Categories from Google's Image Search. Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, 2, 2005.

[FFP05] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. Proc. CVPR, 5, 2005.

[HULU08] Jutta Hammerle-Uhl, G. Laimer, and A. Uhl. Attack against a JPEG2000-based robust hash for content identification. In Proceedings of the 5th IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP’08), Palma de Mallorca, Spain, September 2008. To appear.

[KVBP07] Oleksiy Koval, Sviatoslav Voloshynovskiy, Fokko Beekhof, and Thierry Pun. Decision-theoretic consideration of robust perceptual hashing: link to practical algorithms. In WaCha2007, Third WAVILA Challenge, Saint Malo, France, June 15th 2007.

[Lan08] Andreas Lang. Audio Watermarking Benchmarking – A Profile Based Approach. PhD dissertation, Otto-von-Guericke University of Magdeburg, 2008. ISBN 978-3-940961-22-8.



[LD08] Andreas Lang and Jana Dittmann. Digital audio watermarking evaluation within the application field of perceptual hashing. In SAC ’08: Proceedings of the 2008 ACM symposium on Applied computing, pages 1192–1196, New York, NY, USA, 2008. ACM.

[LDK07] Andreas Lang, Jana Dittmann, and Christian Kraetzer. Digital Watermarking and Perceptual Hashing of Audio Signals with Focus on their Evaluation. In Proceedings of the 3rd Wavila Challenge, 2007.

[LKD08] Andreas Lang, Christian Kraetzer, and Jana Dittmann. D.WVL.21 Final Report on Watermarking Benchmarking. Technical report, ECRYPT - European Network of Excellence in Cryptology, 2008. ISBN 978-3-940961-23-5.

[LU07] G. Laimer and A. Uhl. Improving security of JPEG2000-based robust hashing using key-dependent wavelet packet subband structures. In P. Dondon, V. Mladenov, S. Impedovo, and S. Cepisca, editors, Proceedings of the 7th WSEAS International Conference on Wavelet Analysis & Multirate Systems (WAMUS’07), pages 127–132, Arcachon, France, October 2007.

[LU08] Gerold Laimer and Andreas Uhl. Key dependent JPEG2000-based robust hashing for secure image authentication. EURASIP Journal on Information Security, Article ID 895174, doi:10.1155/2008/895174, 19 pages, 2008.

[NP08] N. Nikolaidis and I. Pitas. Digital rights management of images and videos using robust replica detection techniques. In L. Drossos, S. Sioutas, D. Tsolis, and T. Papatheodorou, editors, Digital Rights Management for E-Commerce Systems. Idea Group Publishing, October 2008.

[NZNP08] S. Nikolopoulos, S. Zafeiriou, N. Nikolaidis, and I. Pitas. Image replica detection system utilizing R-trees and linear discriminant analysis. IEEE Transactions on Multimedia (submitted), 2008.

[SHKY04] Jin S. Seo, Jaap Haitsma, Ton Kalker, and Chang D. Yoo. A robust image fingerprinting system using the Radon transform. Signal Processing: Image Communication, 19(4):325–339, 2004.

[VJ02] P. Viola and M. Jones. Robust real-time object detection. International Journal of Computer Vision, 57(2):137–154, 2002.

[VKBP07] Sviatoslav Voloshynovskiy, Oleksiy Koval, Fokko Beekhof, and Thierry Pun. Robust perceptual hashing as classification problem: decision-theoretic and practical considerations. In Proceedings of the IEEE 2007 International Workshop on Multimedia Signal Processing, Chania, Crete, Greece, October 1–3 2007.

[VKV+08] S. Voloshynovskiy, O. Koval, R. Villan, F. Beekhof, and T. Pun. Authentication of biometric identification documents via mobile devices. SPIE Journal of Electronic Imaging, 2008.

[VNHP08] N. Vretos, N. Nikolaidis, E. Horness, and I. Pitas. A Latent Dirichlet Allocation Approach to Video Fingerprinting. Technical Report, 2008.

[YLXH07] J. Yang, Y. Liu, E.P. Xing, and A.G. Hauptmann. Harmonium Models for Semantic Video Representation and Classification. SIAM Conf. Data Mining, 2007.

[ZNNP08] D. Zotos, N. Nikolaidis, S. Nikitidis, and I. Pitas. Video fingerprinting utilizing R-trees, SVM and a frame-based voting approach. IEEE Transactions on Circuits and Systems for Video Technology (submitted), 2008.