15
1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEE Transactions on Image Processing 1 Securing SIFT: Privacy-preserving Outsourcing Computation of Feature Extractions over Encrypted Image Data Shengshan Hu, Qian Wang, Member, IEEE, Jingjun Wang, Zhan Qin, and Kui Ren, Fellow, IEEE Abstract—Advances in cloud computing have greatly motivated data owners to outsource their huge amount of personal multime- dia data and/or computationally expensive tasks onto the cloud by leveraging its abundant resources for cost saving and flexibility. Despite the tremendous benefits, the outsourced multimedia data and its originated applications may reveal the data owner’s private information, such as the personal identity, locations or even financial profiles. This observation has recently aroused new research interest on privacy-preserving computations over outsourced multimedia data. In this paper, we propose an effec- tive and practical privacy-preserving computation outsourcing protocol for the prevailing scale-invariant feature transform (SIFT) over massive encrypted image data. We first show that previous solutions to this problem have either efficiency/security or practicality issues, and none can well preserve the important characteristics of the original SIFT in terms of distinctiveness and robustness. We then present a new scheme design that achieves efficiency and security requirements simultaneously with the preservation of its key characteristics, by randomly splitting the original image data, designing two novel efficient protocols for secure multiplication and comparison, and carefully distributing the feature extraction computations onto two independent cloud servers. We both carefully analyze and extensively evaluate the security and effectiveness of our design. The results show that our solution is practically secure, outperforms the state-of-the- art, and performs comparably to the original SIFT in terms of various characteristics, including rotation invariance, image scale invariance, robust matching across affine distortion, addition of noise and change in 3D viewpoint and illumination. Index Terms—Scale-invariant feature transform, privacy- preserving, security, homomorphic encryption, cloud computing. I. I NTRODUCTION W ITH the increasing popularity of cloud-based data services, data owners are highly motivated to store their huge amount of (potentially sensitive) personal multi- media data files and computationally expensive tasks onto remote cloud servers. While enjoying the abundant storage and computation resources for cost saving and flexibility, the outsourcing of data storage and computation to the cloud also raises great security and privacy concerns due to the different S. Hu, Q. Wang and J. Wang are are with The State Key Lab of Software Engineering, School of Computer Science, Wuhan University, Wuhan 430072, China. E-mail: {hushengshan, qianwang, jingjun}@whu.edu.cn. Q. Wang is the corresponding author. Z. Qin and K. Ren are with the Department of Computer Science and Engineering, The State University of New York at Buffalo, NY 14260, USA. {zhanqin, kuiren}@buffalo.edu A preliminary version [1] of this work has been accepted by the IEEE International Conference on Computer Communications (INFOCOM’16). trust domains the data owner and the cloud belong to [2], [3], [4], [5]. More specifically, popular social network providers like Facebook and Flickr commonly exploit the outsourced personal image data to conduct behavioral advertising and preference analytics for improving user experience. To do this, instead of directly working on the massive data sets, they extract image features as inputs to well-defined data mining models. On the other side, from the data owner’s perspective, to relieve the heavy computation workload in image feature extraction and utilization on the “Big Data” (e.g., massive satellite images) for building versatile user-defined applica- tions such as similarity search indexes, more and more users are inclined to outsource the computation of image feature extractions to the cloud. Obviously, the expose of original image data to the semi-trusted cloud service provider may inevitably reveal the data owner’s private information, such as the personal identity, locations or even financial profiles etc. To provide privacy guarantees for sensitive data, a straightforward approach is to encrypt the sensitive multimedia data locally before outsourcing. While providing strong end-to-end privacy, encryption also becomes a hindrance to data computation or utilization. Now the challenging problem is how to enable privacy-preserving image feature extractions over massive image data while apparently relieving the database owner of its high computation burden and relying on the cloud for providing fast and effective image feature extraction services. In the existing literature, efforts on privacy-preserving out- sourcing computation have been devoted to various mathe- matical problems including modular exponentiation [6], linear equations [7] and kNN search [8]. These works mainly focus on engineering computation problems over numerical data or text data. Only in recent years, privacy-preserving data search in the ciphertext domain has been extended to content-based multimedia retrieval [9], face recognition [10] and fingerprint identification [11]. The authors in [12], [13] explored how to enable secure image search in the data outsourcing environ- ment. Nevertheless, they all assume that the images have been pre-processed by some feature extraction algorithms to obtain their vector representations. Due to the importance of image feature extraction in multimedia data processing and its heavy operations on massive data, especially for satellite data for its tremendous size and large number of feature points, the extraction or detection of image features from the ciphertext domain has began to attract more and more research interest. To the best of our knowledge, Hsu et al. [14] was the first to investigate privacy-preserving SIFT in the encrypted

Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

1

Securing SIFT: Privacy-preserving OutsourcingComputation of Feature Extractions over Encrypted

Image DataShengshan Hu, Qian Wang, Member, IEEE, Jingjun Wang, Zhan Qin, and Kui Ren, Fellow, IEEE

Abstract—Advances in cloud computing have greatly motivateddata owners to outsource their huge amount of personal multime-dia data and/or computationally expensive tasks onto the cloud byleveraging its abundant resources for cost saving and flexibility.Despite the tremendous benefits, the outsourced multimedia dataand its originated applications may reveal the data owner’sprivate information, such as the personal identity, locations oreven financial profiles. This observation has recently arousednew research interest on privacy-preserving computations overoutsourced multimedia data. In this paper, we propose an effec-tive and practical privacy-preserving computation outsourcingprotocol for the prevailing scale-invariant feature transform(SIFT) over massive encrypted image data. We first show thatprevious solutions to this problem have either efficiency/securityor practicality issues, and none can well preserve the importantcharacteristics of the original SIFT in terms of distinctivenessand robustness. We then present a new scheme design thatachieves efficiency and security requirements simultaneously withthe preservation of its key characteristics, by randomly splittingthe original image data, designing two novel efficient protocols forsecure multiplication and comparison, and carefully distributingthe feature extraction computations onto two independent cloudservers. We both carefully analyze and extensively evaluate thesecurity and effectiveness of our design. The results show thatour solution is practically secure, outperforms the state-of-the-art, and performs comparably to the original SIFT in terms ofvarious characteristics, including rotation invariance, image scaleinvariance, robust matching across affine distortion, addition ofnoise and change in 3D viewpoint and illumination.

Index Terms—Scale-invariant feature transform, privacy-preserving, security, homomorphic encryption, cloud computing.

I. INTRODUCTION

W ITH the increasing popularity of cloud-based dataservices, data owners are highly motivated to store

their huge amount of (potentially sensitive) personal multi-media data files and computationally expensive tasks ontoremote cloud servers. While enjoying the abundant storageand computation resources for cost saving and flexibility, theoutsourcing of data storage and computation to the cloud alsoraises great security and privacy concerns due to the different

S. Hu, Q. Wang and J. Wang are are with The State Key Lab of SoftwareEngineering, School of Computer Science, Wuhan University, Wuhan 430072,China. E-mail: {hushengshan, qianwang, jingjun}@whu.edu.cn. Q. Wang isthe corresponding author.

Z. Qin and K. Ren are with the Department of Computer Science andEngineering, The State University of New York at Buffalo, NY 14260, USA.{zhanqin, kuiren}@buffalo.edu

A preliminary version [1] of this work has been accepted by the IEEEInternational Conference on Computer Communications (INFOCOM’16).

trust domains the data owner and the cloud belong to [2], [3],[4], [5]. More specifically, popular social network providerslike Facebook and Flickr commonly exploit the outsourcedpersonal image data to conduct behavioral advertising andpreference analytics for improving user experience. To do this,instead of directly working on the massive data sets, theyextract image features as inputs to well-defined data miningmodels. On the other side, from the data owner’s perspective,to relieve the heavy computation workload in image featureextraction and utilization on the “Big Data” (e.g., massivesatellite images) for building versatile user-defined applica-tions such as similarity search indexes, more and more usersare inclined to outsource the computation of image featureextractions to the cloud. Obviously, the expose of originalimage data to the semi-trusted cloud service provider mayinevitably reveal the data owner’s private information, such asthe personal identity, locations or even financial profiles etc. Toprovide privacy guarantees for sensitive data, a straightforwardapproach is to encrypt the sensitive multimedia data locallybefore outsourcing. While providing strong end-to-end privacy,encryption also becomes a hindrance to data computation orutilization. Now the challenging problem is how to enableprivacy-preserving image feature extractions over massiveimage data while apparently relieving the database owner ofits high computation burden and relying on the cloud forproviding fast and effective image feature extraction services.

In the existing literature, efforts on privacy-preserving out-sourcing computation have been devoted to various mathe-matical problems including modular exponentiation [6], linearequations [7] and kNN search [8]. These works mainly focuson engineering computation problems over numerical data ortext data. Only in recent years, privacy-preserving data searchin the ciphertext domain has been extended to content-basedmultimedia retrieval [9], face recognition [10] and fingerprintidentification [11]. The authors in [12], [13] explored how toenable secure image search in the data outsourcing environ-ment. Nevertheless, they all assume that the images have beenpre-processed by some feature extraction algorithms to obtaintheir vector representations. Due to the importance of imagefeature extraction in multimedia data processing and its heavyoperations on massive data, especially for satellite data forits tremendous size and large number of feature points, theextraction or detection of image features from the ciphertextdomain has began to attract more and more research interest.

To the best of our knowledge, Hsu et al. [14] was thefirst to investigate privacy-preserving SIFT in the encrypted

Page 2: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

2

domain by utilizing homomorphic encryption Paillier [15].However, their solution is either computationally intractable,or otherwise insecure from the privacy perspective [16]. As afollowing work, Qin et al. [17] presented an improved schemewith the aid of order preserving encryption [18] and randompermutation. In [19], Wang et al. considered the problemof secure and private outsourcing of shape-based featureextraction and proposed two approaches with different levelsof security by using homomorphic encryption and the garbledcircuit protocol [20], respectively. While putting great efforton the privacy or efficiency aspect, one common limitationof the previous solutions is that they all lack comprehensiveanalysis and evaluations with respect to the preservation ofthe key characteristics of the original image feature extractionalgorithm. Recently, we proposed a new scheme [1] to addressthis problem with the aid of garbled circuits [20]. Althoughthis solution has a good preservation of key characteristics ofSIFT, there is still room to further reduce its computation andcommunication costs. More importantly, it is still unable toeliminate edge responses such that the detected keypoints areunstable to small amount noise.

Motivated by the above observations, in this paper, wepropose an effective and practical privacy-preserving compu-tation outsourcing protocol for the prevailing scale-invariantfeature transform (SIFT) over massive encrypted image data.In our design, we first additively and randomly split theimage into two shares and distribute them to two semi-honestnon-colluding cloud servers. Then we design two secureinteractive protocols–batched secure multiplication protocol(BSMP) and batched secure comparison protocol (BSCP)based on somewhat homomorphic encryption (SHE) integratedwith the latest batching technique single-instruction multi-data(SIMD). The former one allows the two servers to securelycompute the products of multiple pairs of their private inputssimultaneously, while the latter one enables them to comparemultiple pairs of private inputs at one time with privacy preser-vation. Based on the two interactive protocols, we then furtherdevelop a new approach to let the two servers collaborativelydetect real locations of stable keypoints via the difference-of-Gaussian (DoG) scale space built on encrypted image, andmain orientations through computing the orientation rangeinstead of their specific values from the encrypted versionsof orientation histogram. Finally, by exploiting the additiveproperty of the encrypted image, the data owner can recoverthe real feature descriptors from the encrypted descriptorsgenerated by the two servers while preventing them fromseeing any information about the original image. Our maincontributions can be summarized as follows.

• We design two novel secure interactive protocols BSMPand BSCP that enable the two servers to compute theproducts and make comparisons of multiple pairs ofintegers simultaneously with privacy preservation by us-ing somewhat homomorphic encryption (SHE) and thebatching technique SIMD.

• We for the first time propose a new and effective privacy-preserving outsourcing protocol for SIFT with the preser-vation of its key characteristics, by randomly splitting

Server s Server s2

Data owner

Encrypted images ①

① Encrypted images③ Encrypted descriptors③

Encrypted

Descriptors

④ Descriptor recovery

② Secure interaction protocols

Fig. 1: System model.

the original image data, carefully distributing the featureextraction computations to two independent cloud serversbased on BSMP and BSCP.

• We carefully analyze our protocol to show that it canpreserve much well the important characteristics of theoriginal SIFT in terms of distinctiveness and robustnessas compared to the existing solutions. We also provideboth a detailed security analysis and an extensive privacyevaluation to demonstrate the privacy-preserving guaran-tee of our design.

• We evaluate our protocol comprehensively with real-world massive image datasets. The results show that oursolution is practically secure, outperforms the state-of-the-art and performs comparably to the original SIFTin terms of various characteristics, including rotation in-variance, image scale invariance, robust matching acrossaffine distortion, change in illumination, addition of noise,and change in 3D viewpoint.

II. PROBLEM STATEMENT

A. System Model

In our work, we consider a cloud-based image featureextraction outsourcing computation system involving threeparties: the data owner O, the cloud server S1, the cloudserver S2 (as illustrated in Fig. 1). In this application scenario,we assume that the data owner O holding a large volume ofsensitive image files (e.g., satellite images that are usuallyof extremely large size and each has plenty of keypoints)is resource-constrained. Thus, O would like to outsourceboth the image data set and the computation-intensive SIFTtask to the cloud by leveraging its abundant storage andcomputation resources. We assume S1 and S2 are independentwith each other and can be considered to belong to twoindependent cloud service providers like Amazon EC2 andMicrosoft Azure. To protect the privacy, O will first encrypteach image set and then distribute the ciphertexts to S1 andS2. After running SIFT algorithm in the ciphertext domainvia a sequence of secure interaction protocols, S1 and S2 willreturn the encrypted feature descriptors to the data owner, whocan eventually recover the real feature descriptors from theirencrypted versions.

B. Threat Model

As discussed above, S1 and S2 are assumed to be twoindependent or say non-colluding servers. Note that, as shown

Page 3: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

3

in [17], delegating all image feature extraction tasks to onlyone cloud entity will obviously lead to the leakage of pixelvalues of an image. Hence, it is necessary to have at least twoindependent entities for achieving privacy-preserving imagefeature extraction outsourcing. To this end, three independentservers are strictly required in [17] while our protocol isimplemented with only two ones, which is easier to deploy inpractice. We emphasize that this non-colluding assumption canbe promised in practice because the cloud service providershave to maintain their reputation and take their own financialinterests into account.

Following all existing privacy-preserving data/computationoutsourcing protocols [6], [14], [19], [17], we also assumethe cloud is semi-trusted (which can be seen as the adversaryin our application scenario), i.e., S1 and S2 will executethe protocol as specified but may try to learn additionalprivate information from the encrypted image data and all theintermediate results generated during the protocol execution.In addition, instead of knowing the pixels’ specific values, theadversary may try to deduce the image content by leveraging i)revealed relationships between pixels (e.g., keypoint locations)and/or ii) encrypted feature descriptors.

III. BACKGROUND

Before elaborating our solutions, we will first give a briefoverview of the Scale-invariant feature transform (SIFT) algo-rithm to make the paper self-contained. Then we will discussthe cryptographic tools used as building blocks to constructour privacy-preserving protocol for SIFT outsourcing.

A. Scale-invariant feature transform (SIFT)

In the area of computer vision and pattern recognition,scale-invariant feature transform (SIFT) is an important fea-ture extraction algorithm that has been widely used due toits distinctiveness and robust matching across a substantialrange of affine distortion, addition of noise, and change inillumination [21]. According to SIFT [21], image features areextracted by the following three stages:Keypoint localization. For an initial image I, a difference-of-Gaussian scale space is built by convolving I with adifference-of-Gaussian (DoG) function as

D(x, y, σ) =(G(x, y, kσ)− G(x, y, σ)) ∗ I(x, y)

=L(x, y, kσ)− L(x, y, σ),(1)

where L(x, y, σ) = G(x, y, σ)∗I(x, y) is the smoothed imageand ∗ is the convolution operation in x and y. In order todetect the local extrema of D(x, y, σ), each sample point iscompared to its 26 neighbors in the current and adjacent scales.If it is larger or smaller than all of them, it will be selectedas a candidate keypoint. Then each candidate keypoint will befiltered to eliminate edge responses such that the final detectedkeypoints are stable.Orientation assignment. One or more orientations will beassigned to each keypoint based on the local image gradi-ent direction. In this way, the keypoint can be representedrelative to its orientation and therefore achieve invariance

to image rotation. Specifically, for the Gaussian smoothedimage L(x, y) that a keypoint lies in, if we denote Diffx =L(x+ 1, y)−L(x− 1, y),Diffy = L(x, y + 1)−L(x, y − 1),the gradient magnitude m(x, y) and orientation θ are computed

as m(x, y) =√

Diff2x + Diff2y and θ(x, y) = tan−1DiffyDiffx

.Then, an orientation histogram H with 36 bins covering

360-degree range is formed from the gradient orientationsof sample points within a region around the keypoint. Eachsample added to the histogram is weighted by its gradientmagnitude and by a Gaussian-weighted circular window. Fi-nally, the highest peak in the histogram is detected and anyother local peaks that are within the 80% of the highest peakare also used to create a keypoint with that orientation.Keypoint descriptor. After being rotated relative to the keypointorientation, a keypoint descriptor is created by computingthe gradient magnitude and the orientation for each samplepoint in a region around the keypoint. These samples areweighted by a Gaussian window and then accumulated intoan orientation histogram with 8 bins covering 360-degreerange over 4×4 subregions. Finally, a 128-dimensional featuredescriptor is formed by using a vector to represent its 16histograms.

B. Cryptographic Tools

Homomorphic encryption is a public-key encryption schemethat supports meaningful operations over the encrypted data. Inparticular, somewhat homomorphic encryption (SHE) is sucha scheme that enables a limited number of both addition andmultiplication operations on the ciphertexts. For our privacy-preserving outsourcing computation system, we propose touse a ring-LWE-based homomorphic cryptosystem [22]. Thisscheme is parametrized by the ring Rq , Zq[x]/〈xn + 1〉,where n is a power of two, an odd prime number q, andan error parameter σ that defines a discrete Gaussian errordistribution χ = DZn,σ with standard deviation σ. Themessage space of the scheme is defined by a prime t asRt , Zt[x]/〈xn + 1〉. We use the tuple (SH.Gen, SH.Enc,SH.Dec) to represent the key generation, the encryption andthe decryption procedures in a SHE scheme, respectively.SH.Gen: Sample a ring element s← χ and define the secretkey sk , s. Sample a uniformly random ring element a1 ←Rq and an error e ← χ and compute the public key pk ,(a0 = −(a1s+ te), a1).SH.Enc(m, pk): Given pk = (a0, a1) and a message m ∈ Rt,the encryption algorithm samples u ← χ, and f, g ← χ, andcompute the ciphertext ct = (c0, c1) , (a0u+ tg +m, a1u+tf).SH.Dec(ct, sk): To decrypt ct = (c0, c1, . . . , cδ), computem =

∑δi=0 cis

i. Output the message as m mod t.The reader may refer to [22], [23] for more detailed param-

eter analysis and cryptosystem construction.One of the most appealing advantages of SHE is that the

batching technique: single-instruction multiple-data (SIMD)of [24] can be used to reduce the communication costand speed up the computation on ciphertexts when solvingproblems on a large scale [25]. SIMD enables one to packl different plaintext elements (where l is typically in the

Page 4: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

4

range of 500 ∼ 10000) in a single ciphertext and thusthe same operation on these elements (also called plaintextslots) can be computed in parallel. Formally, following [24]we let Γl denote the SIMD-type isomorphism mapping thatcan pack l integers into a single one, and let Γ−1l denotethe inverse mapping correspondingly. We also use ⊕ and ⊗to denote homomorphic addition and multiplication on theciphertexts respectively. Then for two l-dimensional integervectors x = (x1, x2, . . . , xl) and y = (y1, y2, . . . , yl), theSIMD-type homomorphic operations can be expressed as

SH.Enc(Γl(x1 + y1, x2 + y2, . . . , xn + yn))

= SH.Enc(Γl(x1, x2, . . . , xl))⊕ SH.Enc(Γl(y1, y2, . . . , yl));

SH.Enc(Γl(x1 × y1, x2 × y2, . . . , xn × yn))

= SH.Enc(Γl(x1, x2, . . . , xl))⊗ SH.Enc(Γl(y1, y2, . . . , yl)).

It clearly shows that a single encryption is sufficient to encodethe vector, i.e., a single homomorphic operation on the ci-phertexts applies the homomorphic operation componentwiseon the entire vector. Readers may refer to [25] for detailedconstructions and security proof of the SHE scheme and theSIMD mapping.

IV. PRIVACY-PRESERVING OUTSOURCING OF SIFT: ANEXAMINATION OF THE STATE-OF-THE-ART

In this section, we provide a detailed analysis of two mostclosely-related works, which reflects the most recent researchprogress and results in privacy-preserving outsourcing of SIFTalgorithm. A careful analysis of these solutions (in termsof security/efficiency and the effectiveness of outsourcing inpreservation of original SIFT’s key characteristics) motivatesus to seek a new solution.

1) HE-SIFT in [14]: Hsu et al. [14] was the first to inves-tigate the problem of secure SIFT outsourcing and proposeda solution that enables the cloud server to run SIFT algorithmin the encrypted domain. Their main idea is to encryptthe initial image by using Paillier cryptosystem [15], whichprovides additive homomorphism and plaintext multiplication.By leveraging the above two properties, all the operations oforiginal SIFT, including common addition, multiplication andcomparison, can be done over ciphertexts. However, a carefulanalysis of this solution shows that their scheme has two mainlimitations that make it inapplicable to real-world applicationsas first pointed out by [16].

First, it introduces a very large computation complexity onthe encrypted data comparison. In the comparison protocolof [14], each comparison needs to traverse half of the plaintextspace between any two thresholds (Note that the plaintexts andthresholds are transformed to or chosen as integers due to theuse of Paillier cryptosystem). Take the following applicationscenario as an example: assuming that 10 thresholds are cho-sen and the secret keys p and q are 1000-bit primes (to ensurethe security of the Paillier cryptosystem), the plaintext spaceZN (N = pq) has about 22000 elements, and thus it will require22000/(2× 10) > 21995 comparison operations. Apparently, itputs an intolerable computation effort on the cloud. Besides,

as the user needs to encrypt and decrypt each pixel in theimage, by using a local machine with an Intel Xeon CPUrunning at 2.0GHz and a memory of 4GB, each encryptionwill cost at least 0.07 seconds for a 1024-bit modulus of thePaillier cryptosystem. For an image of size 300× 300 pixels,the total encryption time will be about 105 minutes. So, it alsoputs huge intolerable workload on the data owner, who onlyhas limited computation resource. On the other hand, if thecomplexity of the comparison protocol is reduced by choosinga smaller plaintext domain, the security will be broken. Forexample, if the secret keys p and q are 16-bit primes, thenthe plaintext space will have 232 elements which makes thecomparison protocol computationally feasible. However, dueto the small sizes of p and q, it can factor N and break thewhole system by successfully launching a brute-force attack.

Second, the content of the image can be deduced by thecloud server. This is because the server has the ability tocompare two arbitrary ciphertexts by the proposed comparisonstrategy. Thus, it can order all the pixels and then traverse allthe possible values in the range [0, 255] (assuming that eachpixel has 8 bits) to recover the image content. For an imageof size n × n, it has n2 elements in total. A quick sortingalgorithm has time complexity O(n2 log n2) and the traversingstep is only O(n2). For the cloud server, the time complexityof launching this attack is polynomial-time.

2) Privacy-Preserving Outsourcing of SIFT in [17]: As afollowing work, Qin et al. [17] proposed a new and efficientsecure SIFT outsourcing protocol. The authors argued thatthe locations of keypoints are protected from revealing to thecloud. However, we show that this is not true and it violatestheir security statements. In [17], to protect the locations ofkeypoints, it generates a set of dummy keypoints and mixesthem with the real ones. To measure its security level, aprobability bound that the adversary can choose a real keypointfrom the mixed keypoint group is given, i.e., P = |r|

|r|+|d| ,where |r| denotes number of real keypoints and |d| denotesnumber of dummy keypoints. In practice, the maximum valueof |d| is about 100 times larger than the value of |r|, whichleads to P ≈ 1%. For an typical image of size 400 × 400pixels, about 2000 keypoints will be generated, but the numberof dummy keypoints is certainly less than the number of totalpixels in the image, say about 160000. Thus, the probabilityP is approximately 2000

2000+160000 ≈ 1.2%. Strictly speaking,however, this probability is non-negligible at all! That impliesthat the cloud still has a good probability of finding the reallocations of keypoints.

In addition to the above limitations, the most seriousproblem with both of the two schemes is that their protocoldesigns accidentally destroy the robustness and distinctivenessof original SIFT, whose original steps are intentionally mod-ified in order to operate over the ciphertexts. As described inSection III, in the original SIFT algorithm, each keypoint isassigned with the image location, the scale and the orientationto provide invariance to these parameters, and the computationof the keypoint descriptor within a local region of the imagemakes it highly distinctive. In fact, these properties explainwhy SIFT outperforms other feature extraction methods andprevails in object recognition. In [14] and [17], however, the

Page 5: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

5

authors have both removed the edge response eliminatingstep in stage 1 (Keypoint localization) and the orientationassignment in the stage 2 (Orientation assignment). Moveover,they created descriptors without sampling in a region in thestage 3 (Keypoint descriptor generation). These modificationsmake their systems unable to work well in image matching,as will be shown by our experimental results in Section IX.

V. PRIVACY-PRESERVING OUTSOURCING OF SIFT:SYSTEM OVERVIEW

In this work, our ultimate goal is to enable privacy-preserving SIFT outsourcing computation while preserving allits key characteristics as much as possbile. To this end, wehave the following specific design goals.Security Constraint: The privacy of the original image contentshould be protected as much as possible from the adversary.Effectiveness Constraint: The extracted feature vectors per-formed by outsourced SIFT algorithm should be close to theresults from original SIFT algorithm as much as possible.Meanwhile, these feature vectors should preserve importantproperties embedded in the original SIFT algorithm.Efficiency Constraint: The local computations done by thedata owner should be substantially reduced compared withperforming SIFT algorithm on his own. The computationburden on the cloud servers should be acceptable consideringtheir ability.

To give a better demonstration, we will first provide asystem overview of our proposed scheme as illustrated inFig. 2(a) while presenting the detail constructions for eachstage in section VII.Image Encryption: The data owner O, who holds a privateimage I, encrypts it and sends the encrypted sub-images I1and I2 to servers S1 and S2, respectively. Then, Si buildstheir own Gaussian scale spaces Li(x, y, σ) and difference-of-Gaussian spaces Di(x, y, σ) based on Ii (i = 1, 2).Keypoint Localization: S1 and S2 run a keypoint localizationprotocol based on BSCP to find the candidate keypoint loca-tions without compromising the privacy of their own inputs.Then they will eliminate edge responses to obtain stablekeypoints with the aid of BSMP. Finally, the locations arerevealed to the two servers.Orientation Assignment: S1 and S2 run an orientation as-signment protocol based on BSCP. By using the orientationsof each sample point around the keypoint (computed in Step1), they will further build their orientation histograms H1 andH2, respectively. Then, the peaks are found in the originalorientation histograms H by using H1 and H2, and eachpeak corresponds to a main orientation (computed in Step 2).Finally, the image data will be rotated relative to the mainorientations, and each rotation generates a feature descriptorfor a keypoint.Descriptor Generation: S1 and S2 build orientation his-tograms with the directions that have been computed inthe above steps without interacting with each other. Then,they respectively transform the orientation histograms intoencrypted feature descriptors V1 and V2 and send them tothe data owner. This phase is called Descriptor Generation

Algorithm 1 Batched Secure Multiplication Protocol

Require: S1 has private data x = (x1, x2, · · · , xN ) ∈ ZNp ;S2 has private data y = (y1, y2, · · · , yN ) ∈ ZNp .

Ensure: S1 obtains s = (s1, s2, · · · , sN ); S2 obtains r =(r1, r2, · · · , rN ).At S1:

1) Generate a private and public key pair (pk,sk), numberl that plaintexts can be packed and then compute d =Nl ;

2) for j = 0, 1, · · · , d− 1 docj = SH.Enc(Γl(xlj+1, xlj+2, · · · , xlj+l), pk);

end for3) Send ciphertexts (c0, c1, · · · , cd−1) and parameterspk, d, l to S2.

At S2:1) Randomly generate (r1, r2, · · · , rN ) ∈ ZNp ;2) Generate T ′ = SH.Enc(Γl(−1,−1, . . . ,−1), pk);3) for j = 0, 1, · · · , d− 1 do

tj = cj ⊗SH.Enc(Γl(ylj+1, ylj+2, · · · , ylj+l), pk);kj = T ′⊗SH.Enc(Γl(rlj+1, rlj+2, · · · , rlj+l), pk);c′j = tj ⊕ kj ;

end for4) Send (c′0, c

′1, · · · , c′d−1) to the S1.

At S1:1) for j = 0, 1, · · · , d− 1 do

(slj+1, slj+2, · · · , slj+l) = Γ−1l (SH.Dec(c′j));end for

return s = (s1, s2, · · · , sN ), r = (r1, r2, · · · , rN ).

Protocol. Finally, the data owner can recover the real featuredescriptors from V1 and V2.

In this work, our key idea is to employ two servers tosimulate SIFT algorithm by using I1 and I2, such that the finalresults equal as much as possible to performing SIFT on theoriginal image I, while the two servers learn nothing about it.Therefore, each interaction between S1 and S2 is particularlydesigned to ensure that the returned results well approximatethe outputs generated by the original SIFT. This is explicitlyshown in our effectiveness analysis in Section VIII-A.

VI. OUR PROPOSED BASIC SECURITY PRIMITIVES

Before elaborating our scheme, we first present two novelprotocols that will be used as sub-routines in our privacy-preserving outsourcing scheme for SIFT. The two newlyproposed protocols are designed based on SHE integrated withSIMD which are shown to outperform the existing methods inefficiency in terms of both computation and communicationoverhead. This will be explicitly illustrated in our experimentsin section IX.

A. Batched Secure Multiplication Protocol

We propose a new protocol for secure multiplication be-tween two parties, the goal of which is that each partyobtains a random share of the product of their private inputs.

Page 6: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

6

Data owner

Keypointlocations

Keypointlocations

Main orientations

Main orientations

Data owner

Orientationhistograms

Orientationhistograms

Keypoint localization

protocol

Orientation assignment

protocolStep 1

Orientation assignment

protocolStep 2

Descriptor generation

protocol

Image encryption

-

-

Orientation histogram

Orientation histogram

I1

I2

I

Scale (the next octave)

Scale (the next octave)

Scale (the first octave)

Scale (the first octave)

...

...L1(x,y,σ)

L2(x,y,σ)

D1(x,y,σ)

D2(x,y,σ)

H1

H2

S1

S2

O

O

......

......

......

......

(a) System Overview.

L1(x,y+1)-L1(x,y-1)

B

S

C

P

Orientation range

ΔDiff1

Fine‐grained

orientation range

Orientation

histogram H1

L2(x,y+1)-L2(x,y-1)

Orientation range

ΔDiff2

Fine‐grained

orientation range

Orientation

histogram H2

Peaks Peaks

Step 1:Orientation computation

Step 2:

Finding peaks

(b) A sketch of orientation assignment protocol.

Fig. 2: System and OSP overviews.

Specifically, for the two servers S1 and S2 who own privatek-bit length integer x1 and y1 respectively, where x1, y1 ∈ Zpand p is a pre-defined parameter that determines the plaintextspace, after executing the protocol, S1 receives a uniformlydistributed random value s1 ∈ Zp and S2 receives a depen-dent uniformly distributed random value r1 ∈ Zp such thats1+r1 = x1×y1 (mod p). During this protocol, no informationregarding x1 and y1 is revealed to the other party.

Different from the existing scheme proposed in [26] wherethe homomorphic cryptosystem Paillier [15] is adopted toencrypt private messages, our batched secure multiplicationprotocol (BSMP) is built on SHE as described above thatenables the two parties to securely compute the products ofmultiple pairs of private integers simultaneously, with com-putation and communication costs greatly reduced. Namely,for two N -dimensional integer vectors x = (x1, x2, . . . , xN )and y = (y1, y2, . . . , yN ) owned by S1 and S2 respectively,the outputs of BSMP will be s = (s1, s2, . . . , sN ) andr = (r1, r2, . . . , rN ) such that si + ri = xi × yi (mod p) for1 ≤ i ≤ N . Since we only consider 8-bit length inputs andp is unique and usually set to be more than 16bits in length,the bit length of intermediate results are much less than p.Thus we will drop this modulo p operations in the followingcontext for ease of exposition. At the beginning of BSMP, S1will first encrypt x by using SIMD in a packing manner. Herefor simplicity, we assume that N is a multiple of l. If not,we can append enough 0’s to x to achieve this goal. Then S1sends the encrypted version of x to S2, who will then multiplyit with y by utilizing the homomorphic properties. Finally, S2will blind the multiplication result with randomly-generatednumbers and send it to S1 for decryption. The explicit stepsof the protocol BSMP are depicted in Alg. 1.

B. Batched Secure Comparison Protocol

Here we propose a new protocol for secure comparison thatenables two servers to compare two integers without revealingthe privacy of their own inputs. Compared with the verygeneral secure comparison protocol garbled circuit [20], oursubsequent experiments will show that our proposed schemeperforms much better in efficiency.

The key idea of our comparison protocol is motivatedby [27], where a secure comparison protocol is proposed based

on the secure scalar product protocol (SPP) proposed in [26].However, different from their design, we use SHE combinedwith SIMD to construct a batched secure comparison protocol(BSCP) instead of using SPP. As a result, we are able tocompare multiple pairs of integers at one time with privacypreservation guarantee and substantial communication andcomputation overhead reduction.

Specifically, if we want to securely compare two k-bitlength integers x, y ∈ Zp, whose binary representationsare xkxk−1 · · ·x1 and ykyk−1 · · · y1, respectively. Then, fori = k, k − 1, · · · , 1 we compute ai and bi as follows• If i = k, then ai = xi × (1− yi), bi = yi × (1− xi);• If i < k, thenai = (1− bi+1)× (ai+1 + (1− ai+1)× xi × (1− yi)),bi = (1− ai+1)× (bi+1 + (1− ai+1)× yi × (1− xi)).

If the final result a1 (b1) equals 1, we claim that x is greater(smaller) than y. Based on this result, S1 will first conductbitwise encryption of x and send the ciphertexts to S2. Herewe also assume that the total number of integers N is amultiple of the batched number l for ease of representationas did in BSMP. Then S2 will compute the above equationsover ciphertexts based on the homomorphic properties. Afterk iterations, S2 obtains the final encrypted results, and sendsthem to S1 for decryption to get the final results in plaintextform. The explicit steps are presented in Alg. 2.

VII. PRIVACY-PRESERVING OUTSOURCING OF SIFT: OURCONSTRUCTION

In this section, we will describe each protocol explicitlyshown in the Fig. 2(a). Each step is carefully designed tosatisfy the three constraints mentioned above.

A. Image Encryption

For an original image I with size of n×n pixels, we use thematrix I(x, y) (0 6 x, y < n) with each entry of 8-bit length.The data owner randomly selects n2 integers from [0, 255] togenerate a random matrix I2(x, y) and then encrypts I(x, y)by computing I1(x, y) as: I1(x, y) = I(x, y) + I2(x, y). Theciphertexts I1 and I2 are sent to S1 and S2, respectively. Notethat each element of the matrix I1(x, y) will be representedwith 9 bits due to the potential overflow.

Page 7: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

7

Algorithm 2 Batched Secure Comparison Protocol

Require: S1 has private N -dimensional integer vector x =(x1, x2, . . . , xN ) ∈ ZNp , where xj’s binary representationis xkjx

k−1j · · ·x1j . S2 has private N -dimensional integer

vector y = (y1, y2, . . . , yN ) ∈ ZNp , where yj’s binaryrepresentation is ykj y

k−1j · · · y1j .

Ensure: The public binary vector z = (z1, z2, . . . , zN ), wherezj = 1 iff xj > yj .At S1 :

1) Generate a private key and public key pair (pk,sk) andthe number l. Compute d = N

l ;At S1 :

1) for i = k, k − 1, · · · , 1 dofor j = 0, 1, · · · , d− 1 docij = SH.Enc(Γl(x

ilj+1, x

ilj+2, . . . , x

ilj+l), pk);

end forend for

2) Send ciphertexts (ck0 , · · · , ckd−1, · · · , c10, · · · , c1d−1)and parameters pk, d, l to the server S2;

At S2 :

1) Randomly select d × k integers from Zp, that is(rk0 , · · · , rkd−1, · · · , r10, · · · , r1d−1);

2) Compute T = SH.Enc(Γl(1, 1, . . . , 1), pk) and T ′ =SH.Enc(Γl(−1,−1, . . . ,−1), pk);

3) for j = 0, 1, · · · , d− 1 dofor i = k, k − 1, · · · , 1 dowij = SH.Enc(Γl(y

ilj+1, y

ilj+2, . . . , y

ilj+l), pk);

if i = k thenaij = cij(T ⊕ T ′ ⊗ wij);bij = wij(T ⊕ T ′ ⊗ cij);

end ifif i < k thenaij = (T⊕T ′⊗bi+1

j )(ai+1j ⊕(T⊕T ′⊗ai+1

j )⊗cij ⊗ (T ⊕ T ′ ⊗ wij));bij = (T ⊕T ′⊗ai+1

j )(bi+1j ⊕(T ⊕T ′⊗bi+1

j )⊗wij ⊗ (T ⊕ T ′ ⊗ cij));

end ifend for

end for4) Send the binary vector (a10, a

11, . . . , a

1d−1) to the server

S2.At S1 :

1) for j = 0, 1, · · · , d− 1 do(zlj+1, zlj+2, . . . , zlj+l) = Γ−1l (SH.Dec(a1j ));

end for2) Send (z1, z2, · · · , zN ) back to S2.

return (z1, z2, · · · , zN ).

B. Keypoint Localization Protocol

1) Finding Candidate Keypoint: After receiving ciphertextsIi from O, Si creates Gaussian spaces Li(x, y, σ) (i ∈ {1, 2})by conducting convolution and downsampling operations on

it with the variable-scale Gaussian G(x, y, σ). Specifically, S1will compute

L1(x, y, σ) = G(x, y, σ) ∗ I1(x, y), (2)

where ∗ denotes the convolution operation in x and y, andG(x, y, σ) = 1

2πσ2 e−(x2+y2)/2σ2

. As shown in Fig. 2(a), foreach octave of the scale space, the original image is repeatedlyconvolved with G(x, y, σ) by Eq. (2) to produce the set ofimages in the scale space. After generating each octave, theGaussian image is down-sampled by a factor of 2, and theprocess repeats. Note that the number of octaves (whichdetermine the repetition times) and number of scales per octaveare pre-defined by the data owner.

Then, S1 further computes the difference-of-Gaussian func-tion D1(x, y, σ) from the substraction of adjacent Gaussianimages as

D1(x, y, σ) = L1(x, y, rσ)− L1(x, y, σ),

where r is a constant multiplicative factor pre-defined by thedata owner. In the meantime, S2 will do the same operationsfor I2(x, y) and obtain D2(x, y, σ). Note that both serversoperate in the same order to generate a homologous scalespace.

Next, in the process of extremum detection, each samplepoint is compared with its 26 neighbors in the current andadjacent scales in the original difference-of-Gaussian scalespace D(x, y, σ). If it is larger or smaller than all of them, itwill be selected as a candidate keypoint. To determine whichis larger between points D(x, y, σ) and D(x, y + 1, σ), S1and S2 will compute ∆D1 = D1(x, y, σ) − D1(x, y + 1, σ)and ∆D2 = D2(x, y, σ) − D2(x, y + 1, σ), respectively.Then, they will work together to compute which is largerbetween ∆D1 and ∆D2 by using BSCP. If ∆D1 > ∆D2,we will claim that D(x, y, σ) > D(x, y + 1, σ), otherwiseD(x, y, σ) < D(x, y + 1, σ).

2) Eliminating Edge Responses: After the candidate key-points are located, edge response eliminating will be conductedto exclude the points that are not stable. For a detectedcandidate keypoint D1(x, y, σ) on the server S1, its derivativesin x-direction, y-direction and xy-direction are first computedas follows to constitute its Hessian matrix

R1xx =D1(x+ 1, y, σ) +D1(x− 1, y, σ)− 2D1(x, y, σ)

R1yy =D1(x, y + 1, σ) +D1(x, y − 1, σ)− 2D1(x, y, σ)

R1xy =0.25(D1(x+ 1, y + 1, σ)−D1(x+ 1, y − 1, σ)

−D1(x− 1, y + 1, σ) +D1(x− 1, y − 1, σ)).

Let R2xx,R2yy and R2xy denote that computed on theserver S2. By using BSMP with R1xx and R2yy as thetwo private inputs, S1 is able to get a share of the productof R1xx and R2yy denoted by M1(R1xxR2yy), and S2is able to get the other share denoted by M2(R1xxR2yy)such that M1(R1xxR2yy) +M2(R1xxR2yy) = R1xxR2yy .Similarly, S1 can get the shares M1(R1yyR2xx) andM1(R1xyR2xy), and S2 can get the shares M2(R1yyR2xx)

Page 8: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

8

andM2(R1xyR2xy). Then S1 will compute the trace and thedeterminant of the Hessian matrix as

Tr1 =R1xx +R1yy

Det1 =R1xxR1yy −R21xy − (M1(R1xxR2yy)

+M1(R1yyR2xx)− 2M1(R1xyR2xy)).

Correspondingly, S2 computes the trace Tr2 and the determi-nant Det2. Finally, to determine whether the point D1(x, y, σ)is a stable one, S1 and S2 will first compare Det1 and−Det2. If Det1 6 −Det2, then the point should be re-jected immediately, otherwise they will further compute Z1 =rTr21 − 2rM1(Tr1Tr2) − (r + 1)2Det1 and Z2 = −(rTr22 −2rM2(Tr1Tr2)− (r+ 1)2Det2) respectively, where r is a pre-defined threshold andM1(Tr1Tr2) andM2(Tr1Tr2) denote thetwo shares of the product of Tr1 and Tr2 that are obtained byusing BSMP. And then they will involve in BSCP with Z1 andZ2 as the two private inputs. If Z1 < Z2, then we claim thatthe point D1(x, y, σ) is stable, otherwise eliminate this point.

Remarks. To reduce the rounds of interactions in the im-plementation of BSCP and BSMP, S1 can prepare all the datasimultaneously and transmit all the encrypted data in a one-time transmission. For example, in detecting the candidatekeypoints, S1 and S2 can compute ∆D1 and ∆D2 betweenall neighboring two points in the scale space simultaneously.Then S1 can transmit all encrypted ∆D1’s to S2 in a one-timetransmission. Consequently, the number of rounds of interac-tions required in detecting all the keypoints is a constant.

C. Orientation Assignment Protocol

To give a better illustration of this protocol, we first providea sketch that depicts the workflow in Fig. 2(b).

1) Orientation Computation: Once the location of a key-point is determined, the gradient magnitude and the orientationshould be computed according to pixel differences so asto build an orientation histogram. The scale of a keypointD(x, y, σ), associated with a parameter σ, is used to selectthe Gaussian smoothed image L that has the closest scale,so all the following computations are performed in a scale-invariant manner. Thus, for ease of expression we will omitparameter σ in the following discussions.

For the orientation computation on the keypoint L(x, y)(which is mapped back from D(x, y)), S1 and S2 willfirst determine which is larger between L(x, y + 1) andL(x, y − 1), and between L(x + 1, y) and L(x − 1, y),respectively. Here, we set two variables α and β as: α = 1if L(x, y + 1) > L(x, y − 1); otherwise α = −1, andβ = 1 if L(x + 1, y) > L(x − 1, y); otherwise β = −1.Secure comparisons of the two pairs can be achieved byusing the protocol BSCP with L1(x, y + 1) − L1(x, y − 1)and L2(x, y + 1) − L2(x, y − 1) being the two inputs fromS1 and S2 respectively, L1(x + 1, y) − L1(x − 1, y) andL2(x+ 1, y)−L2(x− 1, y) being the other pair of inputs. Asa result, which range the orientation lies in can be obtained,i.e., 0◦ ∼ 90◦, 90◦ ∼ 180◦, 180◦ ∼ 270◦, 270◦ ∼ 360◦. To geta more fine-grained direction range, S1 computes

∆Diff1 = Diff1y − kDiff1x= α(L1(x, y + 1)− L1(x, y − 1))

− kβ(L1(x+ 1, y)− L1(x− 1, y)).

On the other side, S2 will get ∆Diff2 = Diff2y−kDiff2x, wherek is a constant used to determine the orientation interval. Then,S1 and S2 will involve in BSCP computation with ∆Diff1and ∆Diff2 as the two inputs. If ∆Diff1 > ∆Diff2, we saythat the direction degree is larger than arctan k. Repeatedlyexecuting the above steps for a series of pre-defined k’s,a fine-grained direction degree range can be obtained. Inour system, we select k to be eight values respectively, i.e.tan 10◦, tan 20◦, . . . , tan 80◦. Therefore, a 360-degree rangeof orientations can be fully covered with each span of theinterval to be 10◦.

2) Gradient Magnitude Computation: In our outsourcingmodel, we define our gradient magnitude as

m(x, y) = |Diffx|+ |Diffy|, (3)

where Diffx = L(x + 1, y) − L(x − 1, y),Diffy = L(x, y +1) − L(x, y − 1). Accordingly, S1 and S2 compute gradientmagnitudes m1(x, y) = Diff1x + Diff1y and m2(x, y) =Diff2x + Diff2y , respectively. Then, they respectively accumu-late m1 and m2 according to their orientations, after beingweighted by the same Gaussian window, to build their ownencrypted orientation histograms H1 and H2.

3) Finding Peaks in the Orientation Histogram: To detectpeaks in the original orientation histogram H, all we need is tofind whose accumulated gradient magnitude is larger betweentwo bins. Let

∑10◦ m1 and

∑20◦ m1 denote the accumulated

gradient in H1 whose orientations lie in 0◦ ∼ 10◦ and 10◦ ∼20◦, respectively. Let

∑10◦ m2 and

∑20◦ m2 denote those

computed in H2. Then with the aid of BSCP with∑

10◦ m1−∑20◦ m1 and

∑10◦ m2 −

∑20◦ m2 being the two inputs, if

the former is larger, we say the bin of 0◦ ∼ 10◦ is higher;lower otherwise. After repeating this comparison between anytwo bins, we are able to find the highest peak and all the localpeaks. To detect any local peak within 80% of the highestpeak in H, S1 and S2 will compute which is larger between∑LP m1− 0.8

∑HP m1 and

∑LP m2− 0.8

∑HP m2 where∑

LP mi corresponds to the bin with local peak and∑HP mi

(i ∈ 1, 2) corresponds to the bin with the highest peak. As aresult, for locations with multiple peaks of similar magnitudes,there exists multiple keypoints created at the same location andscale, but with different orientations as the original SIFT does.Note that, in the above protocol, constant-round interactionscan also be achieved by encrypting all the data simultaneouslyand transmitting the encrypted data at one time.

D. Descriptor Generation Protocol

Once S1 and S2 have detected dominant orientations on akeypoint, they will first generate their respective descriptorsas follows.

First, the coordinates and the gradient orientations arerotated relative to a keypoint orientation. Note that the specificvalue of the gradient orientation is needed in the process of

Page 9: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

9

(a) The 3000 × 3000pixel original highlystructured image.

(b) 21277 keypointsfound in Lowe’s scheme.

(c) 31435 keypointsfound in ours.

(d) The 3000 × 3000pixel original poorlystructured image.

(e) 3688 keypoints foundin Lowe’s scheme.

(f) 10842 keypointsfound in ours.

(g) The 3000 × 3000pixel original texture im-age.

(h) 35299 keypointsfound in Lowe’s scheme.

(i) 45888 keypointsfound in ours.

Fig. 3: Comparison of the number of keypoints generated in Lowe’s schemeand ours for different types of images.

rotation, so we propose to use the median of the interval itsorientation lies in to replace the actual orientation value. Forexample, if the orientation of one point lies in the interval10◦ ∼ 20◦, we use 15◦ as its orientation to perform rotation.Our subsequent theoretical analysis and experiments showthat this approximate orientation substitution has a negligibleimpact on the final results.

Second, in the 4 × 4 subregions around the keypoint,gradient magnitudes of sample points (as shown in Eq. (3)) areweighted by a Gaussian window and accumulated into 4 × 4orientation histograms. Then, each histogram consists of 8 binsand the span of each bin is 45◦ so that a 360-degree range oforientations can be covered. To improve the system efficiency,all the gradients are pre-computed in the phase of orientationassignment.

Finally, a 128-dimensional feature vector is used to repre-sent 16 histograms for a keypoint. Let V1 and V2 denote thefeature vectors generated by S1 and S2, respectively. Theywill be sent to O, who can recover the actual feature vectorby computing V = V1−V2, which will be normalized to unitlength by O in order to achieve invariance to affine changesin illumination.

VIII. SYSTEM ANALYSIS

In this section, we will give a careful analysis of theproposed scheme in terms of effectiveness and security. Wewill show that the proposed scheme is carefully constructedto meet both the effectiveness and the security constraintsdescribed in our design goals in Section V.

A. Effectiveness Analysis

In this subsection, we will provide a theoretical examinationof the effectiveness of our scheme step by step in the sense thatthe extracted features are very close to the outputs generatedby the original SIFT.Keypoint Localization Protocol: In the keypoint localization,when comparing two neighbors to detect the extrema afterbuilding the difference-of-Gaussian space, we have

∆D1 −∆D2

= [D1(x, y, σ)−D1(x, y + 1, σ)]− [D2(x, y, σ)−D2(x, y + 1, σ)]

= [D1(x, y, σ)−D2(x, y, σ)]− [D1(x, y + 1, σ)−D2(x, y + 1, σ)]

= D(x, y, σ)−D(x, y + 1, σ)

which means that the comparison of ∆D1 and ∆D2 isequivalent to determine which is larger between D(x, y, σ) andD(x, y+ 1, σ). Thus, all candidate keypoints can be correctlylocalized by the cloud servers. Note that in the above equationswe use the property D(x, y, σ) = D1(x, y, σ) − D2(x, y, σ),which can be easily deduced from the equation L(x, y) =G ∗ I(x, y) = G ∗ (I1(x, y)−I2(x, y)) = L1(x, y)−L2(x, y)

Then in the process of edge response eliminating, whencomputing derivatives we have

Rxx =D(x+ 1, y, σ) +D(x− 1, y, σ)− 2D(x, y, σ)

=D1(x+ 1, y, σ) +D1(x− 1, y, σ)− 2D1(x, y, σ)

−[D2(x+ 1, y, σ) +D2(x− 1, y, σ)− 2D2(x, y, σ)]

=R1xx −R2xx

Similarly, we can have Ryy = R1yy − R2yy and Rxy =R1xy−R2xy , which will lead to the following equations whencomputing the Hessian’s determinants,

Det =RxxRyy −R2xy

=(R1xx −R2xx)(R1yy −R2yy)− (R1xy −R2xy)2

=[R1xxR1yy −R21xy − (M1(R1xxR2yy)

+M1(R1yyR2xx)− 2M1(R1xyR2xy))]

+ [R2xxR2yy −R22xy − (M2(R1xxR2yy)

+M2(R1yyR2xx)− 2M2(R1xyR2xy))]

=Det1 + Det2

Note that the above equations hold due to the propertyM1(R1xxR2yy +M2(R1xxR2yy) = R1xxR2yy . Based onthe above results, to determine whether the candidate keypointis a stable keypoint, if Det1 + Det2 6 0, we have Det 6 0which indicates that the candidate keypoint has differentsigns for curvatures and should be rejected immediately. IfDet1 + Det2 > 0, we have

Z1 −Z2

= rTr21 − 2rM1(Tr1Tr2)− (r + 1)2Det1− [−(rTr22 − 2rM2(Tr1Tr2)− (r + 1)2Det2)]

= r(Tr21 + Tr22 − 2M1(Tr1Tr2)− 2M2(Tr1Tr2))

− (r + 1)2(Det1 + Det2)

= r(Tr1 − Tr2)2 − (r + 1)2(Det1 + Det2)

= rTr2 − (r + 1)2Det

Note that the above equations hold based on M1(Tr1Tr2) +M2(Tr1Tr2) = Tr1Tr2 and Tr = Rxx + Ryy = Tr1 − Tr2.

Page 10: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

10

Then if Z1 < Z2, there has Tr2Det <

(r+1)2

r , which indicatesthat the candidate keypoint has a low principal curvature andis a stable one.

Note that in our scheme, we cannot remove some unstablekeypoints by fitting a 3D quadratic function as the originalSIFT did [21]. This is because this filtering step containsdivision and derivative operations that are difficult to beimplemented in the ciphertext domain. Therefore, as shownin Fig. 3, we compare the number of keypoints generated inLowe’s scheme [21] and ours. As the number of the keypointsvaries greatly with the images’ content, the average numberof keypoints cannot give a better illustration and thus we onlychoose three specific images with size of 3000 × 3000 forhighly structured image, poorly structured image and textureimage. Fig. 3 shows that the number of keypoints generatedin our scheme is about 1.5 times as much as that of Lowe’s.This is roughly equivalent to the results presented in [21].Orientation Assignment Protocol: In the computation of ori-entations, we have

∆Diff1 −∆Diff2= [Diff1y − kDiff1x]− [Diff2y − kDiff2x]

= [α(L1(x, y + 1)− L1(x, y − 1))− kβ(L1(x+ 1, y)− L1(x− 1, y))]

− [α(L2(x, y + 1)− L2(x, y − 1))− kβ(L2(x+ 1, y)− L2(x− 1, y))]

= α[(L1(x, y + 1)− L2(x, y + 1))− (L1(x, y − 1)− L2(x, y − 1))]

− kβ[(L1(x+ 1, y)− L2(x+ 1, y))− (L1(x− 1, y)− L2(x− 1, y))]

= α[L(x, y + 1)− L(x, y − 1)]− kβ[L(x+ 1, y)− L(x− 1, y)]

= |Diffy | − k|Diffx|(4)

When ∆Diff1 > ∆Diff2, we will have |Diffy| > k|Diffx|,which leads to |Diffy|

|Diffx| > k. Therefore, we can claim thatthe direction degree of this keypoint is larger than arctan k.As discussed in Section VII-C, by choosing a series of k’s,the orientation of each keypoint can be located in a intervalwhose span is 10◦. So it is enough to construct an orientationhistogram with 36 bins covering the 360-degree range as theoriginal SIFT does, though their specific values of orientationstill remain unknown.

In the process of finding peaks, our goal is to determinewhich is larger between two bins in the original histogramH. For ease of analysis, we assume

∑10◦ m1 = w1m1(x, y)

and∑

20◦ m1 = w2m1(x′, y′), where w1 and w2 denote thecorresponding Gaussian weight values. It means that the binsof 0◦ ∼ 10◦ and 10◦ ∼ 20◦ in the histogram built by S1consist of only one point. Accordingly, we can get

∑10◦ m2 =

w1m2(x, y) and∑

20◦ m2 = w2m2(x′, y′) computed on S2.Then, when comparing

∑10◦ m1−

∑20◦ m1 and

∑10◦ m2−∑

20◦ m2), we have

(∑10◦

m1 −∑20◦

m1)− (∑10◦

m2 −∑20◦

m2)

=w1[m1(x, y)−m2(x, y)]− w2[m1(x′, y′)−m2(x′, y′)]

=w1[(Diff1x + Diff1y)− (Diff2x + Diff2y)]

− w2[(Diff1x′ + Diff1y′ )− (Diff2x′ + Diff2y′ )]

=w1(|Diffx|+ |Diffy |)− w2(|Diff′x|+ |Diff′y |)=w1m(x, y)− w2m(x′, y′)

=∑10◦

m−∑20◦

m,

(5)

which explains that this comparison is able to find out whichbin has the largest accumulated gradient magnitude in thehistogram H. Similarly, local peaks within 80% of the highestpeak can also be detected. To sum up, the main orientationscan be correctly computed by the Orientation AssignmentProtocol.Descriptor Generation Protocol: First, we will theoreticallyshow that the approximate orientation substitution in the imagerotation has a very low error probability, which can be ignored.In the orientation assignment, each direction is located in aninterval, and we use the median to represent its value. Ifone point, for example, lies in the interval 0◦ ∼ 10◦, wewill assume its direction is 5◦. Since the rotation degrees are0◦, 10◦, 20◦, . . . , 350◦, and it needs to determine which newinterval (i.e. , 0◦ ∼ 45◦, 45◦ ∼ 90◦, . . . , 315◦ ∼ 360◦) therotated direction lies in, the error that the rotated direction isdistributed into a wrong interval will happen with probability12 only when the rotated interval contains 45◦, 135◦, 225◦

and 315◦, such as 40◦ ∼ 50◦, 130◦ ∼ 140◦, 220◦ ∼ 230◦

and 310◦ ∼ 320◦. Hence, the total error probability is436 ×

12 = 1

18 ≈ 6%. In order to further demonstrate thatthis probability is low enough to be ignored in practice, wedefine the error rate as:

ErrV =< V − V ′ >|V ′|

where V denotes the feature vector extracted by the ourscheme and V ′ denotes the feature descriptor extracted by theoriginal SIFT. As shown in Fig. 5(a), the maximum error rateof the orientation is lower than 5%, which is small enough tobe ignored in practice. In Section IX, our experimental resultswill further validate our theoretical analysis.

Finally, in the descriptor recovery process, for the entrycorresponding to the bin of 0◦ ∼ 10◦ in the descriptor, thedata owner O has∑

10◦m1 −

∑10◦

m2 = w1m1(x, y)− w1m2(x, y)

= w1m(x, y) =∑10◦

m.

Note that the above equation holds by Eq. (5), and it has thesame property for all the other entries in the descriptor. Thismeans that each entry O derives by computing V = V1 − V2is the accumulated gradient magnitude in that orientation ofthe original histogram. Therefore, O is able to recover the realdescriptor correctly in this way.

B. Security Analysis

1) Confidentiality of Pixel Values: In the Image Encryptionstep, each image I represented by a matrix is disguised byadding a random matrix. After splitting, each cloud serverreceives a random matrix. If the image has size of 300× 300and each pixel has 8 bits, the server has to use brute-forceapproach which needs 256300×300 operations to recover theimage, and it is computationally-infeasible in practice. As forthe Keypoint Localization Protocol, the two servers interactwith each other and may derive some additional information.Since our proposed interaction protocols BSMP and BSCP

Page 11: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

11

(a) (b) (c) (d)

(e) (f) (g) (h)

(i) (j) (k) (l)

Fig. 4: Recovered images by the cloud server.

are built on semantically-secure SHE and all the intermediateresults are generated in the encrypted form, the two serverscannot learn anything but the final encrypted results. Specifi-cally, in BSMP, S1 only observes a random encryption of thetwo private inputs’ product as s = xy− r, where r is randomand held by S2. Thus, both S1 and S2 can derive nothing at all.In BSCP, the servers only know large or small of pixel valuesbetween any two neighbors, thus they are unable to computethe specific value of each pixel. Similarly, in the OrientationAssignment Protocol, the servers can obtain nothing but somemore complex but unuseful relationships between pixels ina small region around keypoints. For example, they may getL(x, y+1) > L(x, y−1) and |Diffy| > k|Diffx| in orientationcomputation, and

∑10◦ m >

∑20◦ m in finding peaks. These

available inequalities are still insufficient to compute the exactvalue of L(x, y). Therefore, we claim that the confidential ofpixel values are well protected against the servers.

2) Privacy of Image Content: We next show that the serversare unable to deduce the original image content by whatthey have obtained. As illustrated above, the servers can onlyget some inequalities about the pixel values, so they haveto traverse all the possibilities for each pixel that satisfythese inequalities. We argue that in order to reconstruct thewhole image, the most useful information is the knowledgeof the relationship between any two neighboring pixels whileother inequality constraints have very limited contribution inrecovering pixels. Fig. 4 depicts the cloud server’s possiblereconstructions of twelve images that fulfill this relationship.These images are randomly chosen from twelve categories ofCaltech256 Dataset [28], and correspond to a bat, a clutter,a duck, a iguana, a butterfly, a touring-bike, a centipede,a chimp, a comet, a crab, a toad and a fighter-jet fromFig. 4(a) to Fig. 4(l). Obviously, it is almost impossible torecognize the original content in the recovered images, letalone the details. On the other hand, we show that there existsexponential number of possible reconstructions that satisfy thegiven constraints between any two neighboring pixels. For animage of size n×n with each pixel of 8-bit length, as shownin Fig. 5(b) with n = 10, the values of pixels in each squarehave

(2564

)different possibilities. The rest pixels are supposed

to have only one possibility so that all the neighbors can meetthe given constraints. Since there exist at least (n4 )2 squares

0 200 400 600 800 1000 1200 1400 1600 1800 2000Number of points

0

1

2

3

4

5

6

7

8

Avera

ge e

rror

rate

(%

)

(a) The average error. (b) Structure analysis for im-age recovery.

Fig. 5: The average error and structure analysis.

in the image, the total number of different possibilities is atleast (

(2564

))(

n4 )2 . Hence, there has an exponential number

of possible reconstructions of the original image, which isimpractical for the servers to launch an exhaustive search.

IX. EXPERIMENTAL EVALUATION

In this section, we conduct various experiments to evalu-ate the effectiveness and the efficiency of our secure SIFToutsourcing scheme using representative real-world imagedatasets. All our experiments are implemented by C++ pro-gramming language on Linux machines, each of which iswith an Intel Core CPU 3.1GHz processor and a 12GB RAMrunning Ubuntu 14.04 LTS.

A. Effectiveness Evaluation

1) Descriptor Evaluation: In the following experiments,we compare our scheme with Qin et al.’s scheme [17], Hsuet al.’s scheme [14] and the original SIFT proposed byLowe [21] at the same time. To have a better evaluationon the distinctiveness and robustness of feature vectors, wechoose to measure the performance on keypoint matchingexperiment as did in [29]. That is, given an interest point, wewill find all matches of that interest point in a dataset. Morespecifically, we will compute Euclidean distances betweenfeature vectors to find the nearest point and the second nearestone in the dataset. A match between the interest point andits nearest point occurs if the distance ratio between thesetwo points is below a threshold t. We conduct experimentson real image dataset: INRIA Graffiti [30], which containsimages of graffiti-covered walls taken from different cameraviewpoints changes. For each change, it contains 6 imageswith a gradual geometric or photometric transformation, andall images in this dataset are approximately in 800 × 600pixels. We then use well-acknowledged evaluation metricsrecall and 1-precision defined in [31] to quantify our re-sults: recall = number of correct-matches

total number of positives and 1-precision =number of false-matches

total number of matches , where a correct-match is a matchwhere the two keypoints correspond to the same physical lo-cation while a false-match is a match where the two keypointscome from different physical locations, and the total numberof positives is known as a priori for a given dataset.

In Fig. 6, we plot recall VS. 1-precision for the imagematching experiment by varying the distance ratio threshold t(0 6 t 6 1). In Fig. 6(a), the target images are rotated by 20◦

Page 12: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

12

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.71-precision

0.0

0.2

0.4

0.6

0.8

1.0

Reca

ll

Lowe' SchemeQin et al.' SchemeHsu et al.' SchemeOur Scheme

(a) Matching experiments where tar-get images are rotated by 20◦ andscaled by 20%.

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.71-precision

0.0

0.2

0.4

0.6

0.8

1.0

Reca

ll

Lowe' SchemeQin et al.' SchemeHsu et al.' SchemeOur Scheme

(b) Matching experiments where tar-get images have a 3D viewpointchange of 20◦.

0.0 0.2 0.4 0.6 0.8 1.01-precision

0.0

0.2

0.4

0.6

0.8

1.0

Reca

ll

Lowe' SchemeQin et al.' SchemeHsu et al.' SchemeOur Scheme

(c) Matching experiment where targetimages are blurred.

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.81-precision

0.0

0.2

0.4

0.6

0.8

1.0R

eca

ll

Lowe' SchemeQin et al.' SchemeHsu et al.' SchemeOur Scheme

(d) Matching experiment where targetimages have illumination change.

Fig. 6: Recall versus 1-precision.

10 15 20 25 30 35 40 45 50Rotation angle (o)

0.0

0.2

0.4

0.6

0.8

1.0

Pre

cisi

on

Lowe' SchemeQin et al.' SchemeHsu et al' SchemeOur Scheme

(a) Precision varies as a function ofrotation angle.

20 25 30 35 40 45 50Scale (%)

0.0

0.2

0.4

0.6

0.8

1.0

Pre

cisi

on

Lowe' SchemeQin et al.' SchemeHsu et al.' SchemeOur Scheme

(b) Precision varies as a function ofscale ratio.

Fig. 7: Rotation invariance and image scale invariance.

and scaled by 20%. It is shown that when t is rather small,our scheme and Lowe’s scheme both perform better in findingthe correct matches, i.e., achieving a high recall that is gettingclose to the maximum 0.5 and a low 1-precision that is muchsmaller than 0.1, while the recalls of Qin et al.’s scheme andHsu et al.’s scheme are low (i.e., smaller than 0.2) and their1-precision are high (i.e., larger than 0.1). Note that when tapproaches to 1, the 1-precision in our scheme and Lowe’sremain around 0.5 but that of Qin et al.’s scheme and Hsu etal.’s scheme have quickly increased to 0.7. The results indicatethat Qin et al.’s scheme and Hsu et al.’s scheme will generatemuch more false matches than our scheme and Lowe’s underthe same threshold. The reason why Qin et al.’s scheme andHsu et al.’s scheme have the same performance is that theyalways generate the same feature descriptors.

Fig. 6(b) is obtained when the target images have a 3Dviewpoint change of 20◦, it is shown that our scheme andLowe’s almost have the same performance and both of themoutperform Qin et al.’s scheme and Hsu et al.’s scheme.

Fig. 6(c) measures the performance for images with asignificant amount blur which is introduced by changing thecamera focus. The results show that all descriptors generatedby the three schemes are affected by this type of image

degradation. But Lowe’s and our scheme clearly dominate Qinet al.’s scheme and Hsu et al.’s scheme especially when thethreshold t is rather low. This is because [17] and [14] arenot able to eliminate unstable points that are poorly localizedalong an edge (and are therefore sensitive to noise).

Fig. 6(d) shows the results for illumination changes whichhave been obtained by changing the camera setting. It againshows that both our scheme and Lowe’s perform better thanthe competing approaches. However, [17] and [14] have a lowrecall due to the reason that they have computed their featuredescriptors without sampling in a region.

To further demonstrate the property of invariance to rotationand image scale, Fig. 7 depicts the precision as a function ofrotation angle and scale ratio respectively. It shows that bothour scheme and Lowe’s scheme maintain a stable and highprecision (i.e. , always larger than 0.9) no matter how theimage rotated. On the contrary, the precision of Qin et al’sscheme and Hsu et al.’s scheme fall sharply when the imageis only rotated by 20◦. When it comes to scale change, bothour scheme and Lowe’s decrease slowly in precision with theincrease of scale ratio, and they will stay above 0.4 at last,while the precision of Qin et al.’s scheme and Hsu et al.’sscheme stay lower than 0.2 all the time. The results indicatethat both our scheme and Lowe’s provide a good invarianceto rotation change and a certain degree of invariance to imagescale, but Qin et al’s scheme and Hsu et al.’s scheme provideneither of the two properties.

2) Keypoint Matching Experiment: In Fig. 8, we providea comparison of three scheme’s keypoint matching resultsvisualized for illustration purpose. The images are selectedfrom real-world scenes taken from different viewpoints. Fora better illustration, we manually set the thresholds to haveeach scheme return only 10 matches, which contain correctmatches and false positives. It clearly shows that our schemeapproaches Lowe’s original SIFT and outperforms Qin et al.’sscheme and Hsu et al.’s scheme in finding correct matches.

3) Image Retrieval Experiment: In Fig. 9, we conductimage retrieval experiments to evaluate the descriptors inspiredfrom [32]. We randomly select 11 categories from Oxforddatabase [33] which consists of 5062 images collected fromFlickr as the dataset by searching for particular Oxford land-marks. Given a query image, it returns a ranked list of imagesthat contain the same objects by using the voted based method.The number below each returned image represents how manyvotes it receives. It clearly shows that our scheme receivesmuch more votes than the original sift. We think this may becaused by much more keypoints generated in our scheme. Wecan also see that Lowe’s scheme and ours generate less falsepositives than Qin et al’s and Hsu et al’s.

B. Efficiency Evaluation

In Fig. 10, we evaluate the computation efficiency ofprivacy-preserving SIFT outsourcing with the increase ofimage size on the data owner side. The time cost and commu-nication cost are computed for one image with different sizes.As has been analyzed in Section IV-1, Hsu et al’scheme incursan intolerable computation burden on the data owner, so we

Page 13: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

13

Lowe s scheme

Cor

nmar

ket

Pitt

Riv

ers

Bod

leia

n K

eble

473 395 355 263

423 111 22 22

485 431 431 431

40 37 34 25

Ours

676 666 623 395

700 228 37 36

766 623 623 623

187 77 77 77

Qin et al. s scheme

785 747 570 523

884 275 54 45

817 688 688 688

106 105 98 89

Hsu et al. s scheme

785 747 570 523

884 275 54 45

817 688 688 688

106 105 98 89

Fig. 9: A comparison of four schemes for the image matching application. The leftmost column consists of four query images while the right column imagesare the corresponding top-k (k=4) matching results of each scheme. Note that in each image the yellow rectangle depicts the matched object, and the imageswithout it are the false positives.

Qin et al. s scheme: 3/10 correct

Ours: 7/10 correct

Lowe s scheme: 8/10 correct

Hsu et al. s scheme: 3/10 correct

Fig. 8: A performance comparison of the visualized matching results. Here,real-world images are selected from Graffiti dataset taken from differentviewpoints. The top ten matches are shown for each algorithm: solid whitelines represent correct matches while dotted black lines represent false ones.

200×200 400×400 600×600 800×800 1000×1000Image size (pixels)

0

2

4

6

8

10

12

14

Tim

e c

ost

on t

he d

ata

ow

ner

(s) Without outsourcing

Qin et al. s scheme: with SIFT outsourcingOurs: with SIFT outsourcing

(a)

200×200 400×400 600×600 800×800 1000×1000Image size(pixels)

0.0

0.5

1.0

1.5

2.0

Com

mu.

cost

on t

he d

ata

ow

ner(

MB

)

Qin et al.'s scheme

Ours

(b)Fig. 10: A comparison of computation and communicationcosts on the data owner without and with outsourcing in twodifferent schemes.

0×0 40×40 80×80 120×120 160×160 200×200Image size(pixels)

0

500

1000

1500

2000

Tim

e c

ost

on t

he s

erv

er(

min

ute

s)

The BSCP

The garbled circuit

The BSMP

The SPP

(a)

0×0 40×40 80×80 120×120 160×160 200×200Image size(pixels)

0

200

400

600

800

1000

1200

1400

1600

Com

mu.

cost

betw

een s

erv

ers

(MB

) The BSCP

The garbled circuit

The BSMP

The SPP

(b)Fig. 11: A comparison of computation and communicationcosts on the cloud servers between using BSCP and GarbledCircuit, BSMP and SPP.

only compare our scheme with Qin et al.’s scheme. Fig. 10(a)shows the time cost of performing SIFT algorithm locallyand that of outsourcing of SIFT scheme. As can be seen,our scheme has an extreme low time cost on the data owner,which is nearly a constant value. On the contrary, without

outsourcing the time cost on the data owner increases linearlywith the image size. This is because without outsourcing thedata owner needs to perform full extraction algorithm whilein our scheme it only needs to do some addition operations.With the increase of image size, the benefit of outsourcingwill become more apparent. The side effect of outsourcingis generating some communication overheads on the dataowner. In our scheme, however, the communication cost is lowenough to be acceptable by the data owner as shown below.

As for the communication cost on the data owner side asillustrated in Fig. 10(b), since the data owner only sends twosplit images to the two cloud servers and receives two splitdescriptors with fixed size, our scheme and Qin et al.’s schemehave the same communication overhead which grows linearlywith the size of an image.

In Fig. 11, we evaluate the time cost and the communicationcost of our scheme on the server side. Note that the serversand the data owner are all simulated on machines of thesame configuration without using multithreading or any otherparallel techniques. To make a better illustration, we alsocompare i) the secure product protocol (SPP) proposed in [26]that adopts the Paillier cryptosystem [15] with our BSMP, andii) a general secure comparison protocol-garbled circuit [20]with our BSCP. Since the protocols BSMP and BSCP take upmost of the time costs (because of encryption and decryptionoperations) and are the only source of communication cost, weonly evaluate these two protocols and ignore the other stepssuch as building gaussian scale spaces, computing descriptorsetc. In our experiments, we set the Paillier cryptosystem with a1024-bit modulus and implement SHE with the aid of SIMD.Garbled circuit is tested using the lastest optimizations [34].

Fig. 11(a) shows the time cost on the servers. With theincrease of image size, the time cost in garbled circuit growsmuch faster than that of BSCP, and SPP also consumes muchmore than BSMP. According to our experimental results,when the image size grows to 500× 500, garbled circuit andSPP will cost more than 10854 minutes and 3799 minutesrespectively, while the costs in BSCP and BSMP are about 235minutes and 2.2 minutes respectively. Fig. 11(b) evaluates thecommunication cost between two servers which is generatedby using BSCP and BSMP. It also shows that communicationcost of BSMP is much lower than that of SPP, and BSCP alsooutperforms the garbled circuit greatly. Particularly, when the

Page 14: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

14

image size grows to 500×500, garbled circuit costs more than8458MB while BSCP costs about 2968MB, and SPP needsabout 2226MB while BSMP costs about 240MB. The aboveresults indicate that the proposed protocols BSCP and BSMPare more suitable for practical applications. Note that when theimage size grows to 3000 × 3000, our scheme will consumeabout 711 minutes and 8875MB in total. Our future work mayfocus on further reducing the time and communication costsfor large-size images, by exploring some parallel techniquesand more packing methods.

As for the interactions between two servers, as we haveshown in section VII, a constant number of round canbe achieved by sending all the encrypted data in one-timetransmission. The specific number of interactions depends onthe parameters chosen in somewhat homomorphic enryptionscheme. In our experiment, it needs dozens of rounds tocomplete the entire extraction process.

Finally, for the memory cost on the servers, for a n × nimage it at most needs 12 × n × n bytes where 12 is a pre-defined parameter to decide the number of scales in scalespace. Thus when n = 3000 the memory cost is about 103MB,which is still acceptable for an off-the-shelf machine, let alonea powerful cloud server.

X. FUTURE RESEARCH DIRECTIONS

In this section, we will give some discussions on theextensions of our scheme.

First, SIFT is now widely computed in a dense manner.Dense SIFT computes a descriptor for every pixel in theimage instead of locating points that we are only interested in.The other steps such as orientation assignment and descriptorcomputation are exactly the same as SIFT. Therefore, weclaim that our scheme can be directly applied to dense SIFToutsourcing. This is because the two servers only need toproceed the other steps identically without performing thekeypoint localization protocol.

Second, one may want to aggregate SIFT descriptors intoa compact representation by using Fish vectors [35] orVLAD [36]. In fact, it is also possible to compute VLADby using a secure outsourcing protocol. This can be achievedby using techniques in [37] to outsource k-means algorithm tothe cloud, and then computing aggregated descriptors whichonly needs addition operations. As for Fisher vectors, sinceit contains many derivative and division operations which aredifficult to be implemented in the ciphertext domain, we arenot sure whether it is suitable for outsourcing yet. We willinvestigate this issue in our future work.

Finally, feature descriptors are usually used in content-basedretrieval such as k-nn search. As far as we know, secureoutsourcing of k-nn search has been widely studied and manysolutions have been proposed such as [8]. In [8], the dataowner only needs to encrypt descriptors and sends ciphertextsto the two cloud servers. The cloud servers will then workcooperatively to complete the k-nn search task and return anaccurate search result without obtaining anything private aboutthe descriptors. This method can be directly combined with ourscheme such that the data owner can relieve himself from theburden of computationally-intensive search jobs.

XI. CONCLUSION

Privacy-preserving outsourcing of image feature extractionoffers the promise of obtaining feature descriptors dependenton private data without exposing the data owner’s privacy,while enjoying the abundant cloud computation resources.However, previous solutions for secure SIFT outsourcing haveeither security or efficiency issues, and none can well preservethe important characteristics of the original SIFT in terms ofdistinctiveness and robustness. In this work, we presented anew and novel privacy-preserving SIFT outsourcing protocolbased on newly proposed secure interactive protocols BSMPand BSCP. We both carefully analyze and extensively evaluatethe security and effectiveness of our design. Our experimentalresults show that our protocol outperforms the state-of-the-artand performs comparably to the original SIFT and is practicalfor real-world applications.

ACKNOWLEDGMENT

Qian’s research is supported in part by National Natural Sci-ence Foundation of China under Grant No. 61373167, NationalBasic Research Program of China (973 Program) under GrantNo. 2014CB340600, National High Technology Research andDevelopment Program of China (863 Program) under GrantNo. 2015AA016004, Wuhan Science and Technology Bureauunder Grant No. 2015010101010020, and Fundamental Re-search Funds for the Central Universities under Grant No.2042016kf0137. Kui’s research is supported in part by USNational Science Foundation under grants CNS-1262277. QianWang is the corresponding author.

REFERENCES

[1] Q. Wang, S. Hu, K. Ren, J. Wang, Z. Wang, and M. Du, “Catch me inthe dark: Effective privacy-preserving outsourcing of feature extractionsover image data,” in Proc. of INFOCOM’16, Accepted to appear, 2016.

[2] K. Ren, C. Wang, and Q. Wang, “Security challenges for the publiccloud,” IEEE Internet Computing, vol. 16, no. 1, pp. 69–73, 2012.

[3] Z. Ren, L. Wang, Q. Wang, and M. Xu, “Dynamic proofs of retrievabilityfor coded cloud storage systems,” IEEE Transactions on ServicesComputing, 2015, DOI: 10.1109/TSC.2015.2481880.

[4] Z. Fu, K. Ren, J. Shu, X. Sun, and F. Huang, “Enabling personalizedsearch over encrypted outsourced data with efficiency improvement,”IEEE Transactions on Parallel and Distributed Systems, 2015, DOI:10.1109/TPDS.2015.2506573.

[5] Z. Xia, X. Wang, X. Sun, and Q. Wang, “A secure and dynamicmulti-keyword ranked search scheme over encrypted cloud data,” IEEETransactions on Parallel and Distributed Systems, vol. 27, no. 2, pp.340–352, 2015.

[6] S. Hohenberger and A. Lysyanskaya, “How to securely outsourcecryptographic computations,” in Proc. of TCC’05. Springer, 2005, pp.264–282.

[7] S. Salinas, C. Luo, X. Chen, and P. Li, “Efficient secure outsourcingof large-scale linear systems of equations,” in Proc. of INFOCOM’15.IEEE, 2015, pp. 1035–1043.

[8] Y. Elmehdwi, B. K. Samanthula, and W. Jiang, “Secure k-nearestneighbor query over encrypted data in outsourced environments,” inProc. of ICDE’14. IEEE, 2014, pp. 664–675.

[9] L. Weng, L. Amsaleg, A. Morton, and S. Marchand-Maillet, “Aprivacy-preserving framework for large-scale content-based informationretrieval,” IEEE Transactions on Information Forensics and Security,vol. 10, no. 1, pp. 152–167, 2015.

[10] M. Osadchy, B. Pinkas, A. Jarrous, and B. Moskovich, “Scifi-a systemfor secure face identification,” in Proc. of S&P’10. IEEE, 2010, pp.239–254.

[11] Q. Wang, S. Hu, K. Ren, M. He, M. Du, and Z. Wang, “Cloudbi:Practical privacy-preserving outsourcing of biometric identification inthe cloud,” in Proc. of ESORICS’15. Springer, 2015, pp. 186–205.

Page 15: Securing SIFT: Privacy-preserving Outsourcing Computation of … Papers/2016 .Net/LSD1636 - Securing … · new research interest on privacy-preserving computations over outsourced

1057-7149 (c) 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TIP.2016.2568460, IEEETransactions on Image Processing

15

[12] L. Zhang, T. Jung, C. Liu, X. Ding, X.-Y. Li, and Y. Liu, “Pop: Privacy-preserving outsourced photo sharing and searching for mobile devices,”in Proc. of ICDCS’15. IEEE, 2015, pp. 308–317.

[13] L. Zhang, T. Jung, P. Feng, K. Liu, X.-Y. Li, and Y. Liu, “Pic: Enablelarge-scale privacy preserving content-based image search on cloud,” inProc. of ICPP’15. IEEE, 2015, pp. 949–958.

[14] C.-Y. Hsu, C.-S. Lu, and S.-C. Pei, “Image feature extraction inencrypted domain with privacy-preserving sift,” IEEE Transactions onImage Processing, vol. 21, no. 11, pp. 4593–4607, 2012.

[15] P. Paillier and D. Pointcheval, “Efficient public-key cryptosystemsprovably secure against active adversaries,” in Proc. of ASIACRYPT’99,1999, pp. 165–179.

[16] M. Schneider and T. Schneider, “Notes on non-interactive securecomparison in image feature extraction in the encrypted domain withprivacy-preserving sift,” in Proc. of IH & MMSec’14. ACM, 2014, pp.135–140.

[17] Z. Qin, J. Yan, K. Ren, C. W. Chen, and C. Wang, “Towards efficientprivacy-preserving image feature extraction in cloud computing,” inProc. of ACM MM’14. ACM, 2014, pp. 497–506.

[18] A. Boldyreva, N. Chenette, Y. Lee, and A. Oneill, “Order-preservingsymmetric encryption,” in Proc. of EUROCRYPT’09. Springer, 2009,pp. 224–241.

[19] S. Wang, M. Nassar, M. Atallah, and Q. Malluhi, “Secure and privateoutsourcing of shape-based feature extraction,” in Proc. of ICICS’13.Springer, 2013, pp. 90–99.

[20] Y. Huang, D. Evans, J. Katz, and L. Malka, “Faster secure two-party computation using garbled circuits.” in Proc. of USENIX SecuritySymposium, vol. 201, no. 1, 2011.

[21] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,”International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110,2004.

[22] Z. Brakerski and V. Vaikuntanathan, “Fully homomorphic encryptionfrom ring-lwe and security for key dependent messages,” in Proc. ofCRYPTO’11. Springer, 2011, pp. 505–524.

[23] M. Naehrig, K. Lauter, and V. Vaikuntanathan, “Can homomorphicencryption be practical?” in Proc. of CCSW’11. ACM, 2011, pp. 113–124.

[24] N. P. Smart and F. Vercauteren, “Fully homomorphic simd operations,”Designs, codes and cryptography, vol. 71, no. 1, pp. 57–81, 2014.

[25] I. Damgard, V. Pastro, N. Smart, and S. Zakarias, “Multiparty computa-tion from somewhat homomorphic encryption,” in Proc. of CRYPTO’12.Springer, 2012, pp. 643–662.

[26] B. Goethals, S. Laur, H. Lipmaa, and T. Mielikainen, “On private scalarproduct computation for privacy-preserving data mining,” in Proc. ofICISC’04. Springer, 2005, pp. 104–120.

[27] Y. Qi and M. J. Atallah, “Efficient privacy-preserving k-nearest neighborsearch,” in Proc. of ICDCS’08. IEEE, 2008, pp. 311–319.

[28] G. Griffin, A. Holub, and P. Perona, “Caltech-256 object categorydataset,” 2007.

[29] K. Mikolajczyk and C. Schmid, “A performance evaluation of localdescriptors,” IEEE Transactions on Pattern Analysis and Machine Intel-ligence, vol. 27, no. 10, pp. 1615–1630, 2005.

[30] http://www.robots.ox.ac.uk/∼vgg/research/affine/ .[31] Y. Ke and R. Sukthankar, “Pca-sift: A more distinctive representation

for local image descriptors,” in Proc. of CVPR’04, vol. 2. IEEE, 2004,pp. 506–513.

[32] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman, “Objectretrieval with large vocabularies and fast spatial matching,” in Proc.of CVPR’07. IEEE, 2007, pp. 1–8.

[33] http://www.robots.ox.ac.uk/∼vgg/data/oxbuildings/.[34] http://www.mightbeevil.org/.[35] F. Perronnin, J. Sanchez, and T. Mensink, “Improving the fisher kernel

for large-scale image classification,” in Proc. of ECCV’10. Springer,2010, pp. 143–156.

[36] H. Jegou, M. Douze, C. Schmid, and P. Perez, “Aggregating localdescriptors into a compact image representation,” in Proc. of CVPR’10.IEEE, 2010, pp. 3304–3311.

[37] D. Liu, E. Bertino, and X. Yi, “Privacy of outsourced k-means cluster-ing,” in Proc. of Asiaccs’14. ACM, 2014, pp. 123–134.

Shengshan Hu received the B.E. degree fromWuhan University, China, in 2014, in ComputerScience and Technology. He is currently a PhDcandidate in the School of Computer Science inWuhan University. His research interest focuses onsecure outsourcing of computations.

Qian Wang is a Professor with the School of Com-puter Science, Wuhan University, China. He receivedthe B.S. degree from Wuhan University, China, in2003, the M.S. degree from Shanghai Institute ofMicrosystem and Information Technology, ChineseAcademy of Sciences, China, in 2006, and thePh.D. degree from Illinois Institute of Technology,USA, in 2012, all in Electrical Engineering. Hisresearch interests include wireless network securityand privacy, cloud computing security, and appliedcryptography. Qian is an expert under “1000 Young

Talents Program” of China. He is a co-recipient of the Best Paper Award fromIEEE ICNP 2011. He is a Member of the IEEE and a Member of the ACM.

Jingjun Wang received his B.E. degree in computerscience in 2014 from China University of Miningand Technology. He is currently a graduate student inthe School of Computer Science, Wuhan University,China. His research interests include crowdsourcingand cloud computing security.

Zhan Qin is currently working toward his Ph.D.degree at the director of the Ubiquitous Securityand Privacy Research Laboratory (UbiSeC) in theComputer Science and Engineering Department ofthe State University of New York at Buffalo. Hisresearch interests are in the areas of cloud computingand security, with focus on differential privacy datacollection and publication, cybersecurity in smartgrid. He is a student member of the IEEE, IEEECOMSOC, and ACM.

Kui Ren is an Associate Professor of computerscience at State University of New York at Buffalo.He received his PhD degree from Worcester Poly-technic Institute and both BE and ME degrees fromZhejiang University. Kui’s research interests includeCloud Security, Wireless Security, and Smartphone-enabled Crowdsourcing Systems. His research hasbeen supported by NSF, DoE, AFRL, MSR, andAmazon. He is a recipient of NSF CAREER Awardin 2011 and Sigma Xi Research Excellence Award in2012. Kui has published 135 peer-review journal and

conference papers. Kui received several Best Paper Awards including IEEEICNP 2011. Kui currently serves as an associate editor for IEEE Transactionson Information Forensics and Security, IEEE Wireless Communications, IEEEInternet of Things Journal, IEEE Transactions on Smart Grid, IEEE Commu-nications Surveys and Tutorials, Elsevier Pervasive and Mobile Computing,and Oxford The Computer Journal. Kui is a Fellow of IEEE, a member ofACM, a Distinguished Lecturer of IEEE Vehicular Technology Society, anda past board member of Internet Privacy Task Force, State of Illinois.