17
Research Article A Layered Searchable Encryption Scheme with Functional Components Independent of Encryption Methods Guangchun Luo, Ningduo Peng, Ke Qin, and Aiguo Chen School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China Correspondence should be addressed to Ningduo Peng; nindo [email protected] Received 18 October 2013; Accepted 8 January 2014; Published 25 February 2014 Academic Editors: H.-E. Tseng and G. Wei Copyright © 2014 Guangchun Luo et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Searchable encryption technique enables the users to securely store and search their documents over the remote semitrusted server, which is especially suitable for protecting sensitive data in the cloud. However, various settings (based on symmetric or asymmetric encryption) and functionalities (ranked keyword query, range query, phrase query, etc.) are oſten realized by different methods with different searchable structures that are generally not compatible with each other, which limits the scope of application and hinders the functional extensions. We prove that asymmetric searchable structure could be converted to symmetric structure, and functions could be modeled separately apart from the core searchable structure. Based on this observation, we propose a layered searchable encryption (LSE) scheme, which provides compatibility, flexibility, and security for various settings and functionalities. In this scheme, the outputs of the core searchable component based on either symmetric or asymmetric setting are converted to some uniform mappings, which are then transmitted to loosely coupled functional components to further filter the results. In such a way, all functional components could directly support both symmetric and asymmetric settings. Based on LSE, we propose two representative and novel constructions for ranked keyword query (previously only available in symmetric scheme) and range query (previously only available in asymmetric scheme). 1. Introduction Cloud storage provides an elastic, highly available, easily accessible, and cheap repository to users to store and use their data, and such a convenient way attracts more and more people. In many cases, the users require their sensitive data, such as business documents, to be secure against any adversary or even the cloud provider, and therefore all data must be encrypted before sending to the server [1]. However, traditional encryption schemes (e.g., DES) do not provide any functionality to the users such that searching for the desired documents by keywords, as a basic function for storage system, is quite impossible. e problem is that there is no way to know if there exists such keywords in an encrypted document without decryption, and apparently the server should not have the decryption key. Searchable encryption technique provides a solution to such problem. It enables the users to encrypt their sensitive data and store it to the remote server, while retaining the ability to search by keywords. While searching, the user sends to the server a secret token (a transformation of the queried keywords); then the server uses the token to search over the encrypted data and returns the matched documents. During the process, the server does not know what the queried keywords and the document contents are, and therefore the privacy is guaranteed. Many searchable encryption schemes have been proposed with various settings and functionalities. For symmetric searchable encryption schemes, the user encrypts, searches, and decrypts the documents using his/her private symmetric key. For asymmetric searchable encryption schemes, the data sender encrypts the documents using the user’s public key, Hindawi Publishing Corporation e Scientific World Journal Volume 2014, Article ID 153791, 16 pages http://dx.doi.org/10.1155/2014/153791

Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

Research ArticleA Layered Searchable Encryption Scheme with FunctionalComponents Independent of Encryption Methods

Guangchun Luo Ningduo Peng Ke Qin and Aiguo Chen

School of Computer Science and Engineering University of Electronic Science and Technology of China ChengduSichuan 611731 China

Correspondence should be addressed to Ningduo Peng nindo academia163com

Received 18 October 2013 Accepted 8 January 2014 Published 25 February 2014

Academic Editors H-E Tseng and G Wei

Copyright copy 2014 Guangchun Luo et alThis is an open access article distributed under theCreative CommonsAttribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Searchable encryption technique enables the users to securely store and search their documents over the remote semitrusted serverwhich is especially suitable for protecting sensitive data in the cloud However various settings (based on symmetric or asymmetricencryption) and functionalities (ranked keyword query range query phrase query etc) are often realized by different methodswith different searchable structures that are generally not compatible with each other which limits the scope of application andhinders the functional extensions We prove that asymmetric searchable structure could be converted to symmetric structure andfunctions could be modeled separately apart from the core searchable structure Based on this observation we propose a layeredsearchable encryption (LSE) scheme which provides compatibility flexibility and security for various settings and functionalitiesIn this scheme the outputs of the core searchable component based on either symmetric or asymmetric setting are converted tosome uniformmappings which are then transmitted to loosely coupled functional components to further filter the results In sucha way all functional components could directly support both symmetric and asymmetric settings Based on LSE we propose tworepresentative and novel constructions for ranked keyword query (previously only available in symmetric scheme) and range query(previously only available in asymmetric scheme)

1 Introduction

Cloud storage provides an elastic highly available easilyaccessible and cheap repository to users to store and usetheir data and such a convenient way attracts more andmore people In many cases the users require their sensitivedata such as business documents to be secure against anyadversary or even the cloud provider and therefore all datamust be encrypted before sending to the server [1] Howevertraditional encryption schemes (eg DES) do not provide anyfunctionality to the users such that searching for the desireddocuments by keywords as a basic function for storagesystem is quite impossible The problem is that there is noway to know if there exists such keywords in an encrypteddocument without decryption and apparently the servershould not have the decryption key

Searchable encryption technique provides a solution tosuch problem It enables the users to encrypt their sensitivedata and store it to the remote server while retaining theability to search by keywordsWhile searching the user sendsto the server a secret token (a transformation of the queriedkeywords) then the server uses the token to search over theencrypted data and returns the matched documents Duringthe process the server does not know what the queriedkeywords and the document contents are and therefore theprivacy is guaranteed

Many searchable encryption schemes have been proposedwith various settings and functionalities For symmetricsearchable encryption schemes the user encrypts searchesand decrypts the documents using hisher private symmetrickey For asymmetric searchable encryption schemes the datasender encrypts the documents using the userrsquos public key

Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 153791 16 pageshttpdxdoiorg1011552014153791

2 The Scientific World Journal

and the user searches and decrypts the documents usingthe private key Beyond the basic keyword matching manyfunctions are also added to either symmetric or asymmetricsetting such as range query phrase query and fuzzy keywordquery

However these functions are often realized by differ-ent methods with different searchable structures which aregenerally not compatible with each other For example theasymmetric encryption scheme introduced in [2] realizedconjunctive subset and range queries However it is difficultto figure out how to apply this method to symmetric settingEven for the same setting such as the fuzzy query schemeintroduced in [3] and the rank-ordered query scheme intro-duced in [4] it is difficult to figure out how to combine twomethods together since the functions are constructed basedon different indexing structures

Layered searchable encryption (LSE) scheme aims toprovide compatibility flexibility and security for varioussettings and functionalities In this new framework key-words are firstly transformed to tokens that are filtered bythe core searchable component (symmetric or asymmetricsetting) and then the tokens are dynamically converted touniform mappings which are transmitted to many stand-alone functional components (eg ranked keyword querycomponent fuzzy query component etc) to further filterthe results Since all functional components are independentof each other and the interfaces are common the functionsare compatible with each other and directly support bothsymmetric and asymmetric settings and adding or deleting afunction is quite simple since each function is loosely coupledwith the core searchable component Furthermore LSEsupports combined query For example the query ldquoSELECTlowast WHERE keywords = ldquocloud storage encryptionrdquo ANDldquosecurity classification gt 5rdquo ORDERED BY ldquokeywordcloudrdquordquo(to express the query we adopt the SQL-like format used indatabase) is a combination of three functional componentsbasic query range query and ranked keyword query (inthis paper we will present the concrete construction for thisexample)

Furthermore this framework is similar to the data streamprocessing architecture [5] where functional componentscould be treated as operator boxes and the whole schemecould be treated as a data-flow system by which all processesfollow the popular boxes and arrows paradigmTherefore incomparison to the previous searchable encryption schemesLSE is more suitable for distributed and parallel computingenvironment

In this paper our contributions are the following (1)Wepropose a novel framework for designing searchable encryp-tion scheme called layered searchable encryption (LSE)which enables combined query and provides compatibilityflexibility and security for various settings and function-alities The new framework consists of a core searchablecomponent with a symmetricasymmetric converter manyfunctional components and a common interface with newsecurity model (2) We propose a concrete construction forLSE that could theoretically combine all possible function-alities which are proposed in the recent years and proveits semantic security for the interface (3) As a complement

for the prior works we formally define two new securitymodels for ranked keyword query and range query calledsemantic security against chosen ranked keyword attack(CRKA) and chosen range attack (CRA) respectively whichprovide integral security models for cryptographic analysis(4) Based on LSE we propose two representative and novelconstructions for ranked keyword query component (previ-ously only available in symmetric scheme) and range querycomponent (previously only available in asymmetric scheme)and prove them semantically secure under the new securitymodels

The rest of the paper is organized as follows Section 2presents the related work Section 3 presents the notationsand preliminaries Section 4 presents the layered searchableencryption scheme and the concrete construction Section 5discusses how to realize various functionalities and presentsthe concrete constructions for ranked keyword query andrange query Section 6 concludes this paper

2 Related Work

Searchable encryption schemes are designed to help the usersto securely search over the encrypted data by keywordsThe first scheme was introduced in [6] by Song et al andlater on many index-based symmetric searchable encryption(SSE) schemes were proposed Goh introduced the firstsecure index in [7] and they also built the security modelfor searchable encryption called Adaptive Chosen KeywordAttack (IND-CKA) In [8] Curtmola et al introduced twoconstructions to realize symmetric searchable encryptionthe first construction (named SSE-1) is nonadaptive andthe second one (named SSE-2) is adaptive A generalizationfor symmetric searchable encryption was introduced in [9]and a representative SSE system designed by Microsoft wasintroduced in [10] Another type of searchable encryptionnamed asymmetric searchable encryption (ASE) is public-key based which allows the user to search over the dataencrypted by some data senders using the public key of theuser The first scheme was introduced in [11] by Boneh etal based on bilinear maps and the improved definition wasintroduced in [12]

There are many functional extensions for the search-able encryption schemes beyond the basic precise keywordmatching For symmetric setting the authors in [4 13 14]introduced ranked keyword search schemes based on order-preserving encryption technique or two-round protocolwhich allows the server to only return the top-119896 relevantresults to the user In [15] Golle et al introduced a schemesupporting conjunctive keyword searchwhich allows the userto searchmultiple keywords in a single query In [3 16 17] theauthors introduced fuzzy keyword search schemes based onwildcard technique which allows the user to submit only partof the precise keyword Similar to fuzzy keyword search butdifferent the authors in [18 19] introduced similarity searchschemes based on wildcard technique which allows theserver to return the results similar to the queried keyword In[20 21] the authors introduced phrase query schemes basedon trusted client-side server or binary search which allows

The Scientific World Journal 3

the user to query a phrase instead of multiple independentkeywords For asymmetric setting the authors in [22 23]introduced range query schemes In addition Boneh et alalso introduced conjunctive and subset query in [22] basedon bilinear maps

Note that most of these techniques are not compatiblewith each other due to specific data structure andmathemati-cal property However in the following sections wewill provethat functional structures and searchable structures could beseparately constructed and asymmetric structures could beconverted to symmetric structures such that a compatible all-in-one scheme is possible

3 Notations and Preliminaries

We write 119909larr119880119883 to denote sampling element 119909 uniformly

random from a set 119883 and write 119909 larr A to denote the outputof an algorithmAWewrite 119886 119887 to denote the concatenationof two strings 119886 and 119887 We write |119860| to denote its cardinality if119860 is a set and write |119886| to denote its bit length if 119886 is a stringA function 120583(119896) N rarr R is negligible if for every positivepolynomial 119901(sdot) there exists an inter 119873 gt 0 such that for all119896 gt 119873 |120583(119896)| lt 1119901(119896) We write poly(119896) and negl(119896) todenote polynomial and negligible functions in 119896 respectively

We write Δ = (1199081 119908

119899) to denote a dictionary of 119899

words in lexicographic order We assume that all words areof length polynomial in 119896 We write 119889 to refer to a documentthat contains poly(119896) words and write |119889| to denote the sizeof the document in bytes In some cases we also write 119889 todenote the document identifier that uniquely identifies thedocument such as amemory locationWewrite X to denote acomponent or a scheme and write Xfunc( ) to denote thecorresponding function for the component or an algorithmin the scheme

4 Layered Searchable Encryption Scheme

Layered searchable encryption scheme aims to combinesymmetric and asymmetric searchable encryption schemesto provide a uniformmodel for functional extensionsThere-fore we first revisit the basic symmetric and asymmetricsearchable encryption models and then build the layeredsearchable encryption model based on these two differentsettings After that we introduce the security model ofthe new framework and finally we present the concreteconstruction

41 Revisiting Searchable Encryption Weadopt the definitionintroduced by Curtmola et al in [8] as a representativemodel for symmetric searchable encryption scheme In thissetting the user who searches for the documents is also thedata sender who encrypts the documents Therefore someefficient searching techniques such as using a global indexare used and the searchable structure may be a single indexfile for all stored documents For consistency with otherdefinitions we make a little modification for the originaldefinition and define the scheme as follows

Definition 1 (symmetric searchable encryption) A symmet-ric searchable encryption (SSE) scheme is a collection offive polynomial-time algorithms SSE = (Gen Enc TokenSearch Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takes asinput a security parameter 119896 and outputs a secret key119870 It is run by the user and the key is kept secret(120574 119862) larr Enc(119870119863) is a probabilistic algorithmthat takes as input a secret key 119870 and a documentcollection 119863 = (119889

1 119889

119899) and outputs a searchable

structure 120574 and a sequence of encrypted documents119862 = (119888

1 119888

119899) It enables a user to query some key-

words and the server returns thematched documentsFor instance in an index-based symmetric searchableencryption scheme 120574 is the secure index It is run bythe user and (120574 119862) is sent to the server119905 larr Token(119870 119908) is a deterministic algorithm thattakes as input a secret key 119870 and a keyword 119908 andoutputs a search token 119905 (also named trapdoor orcapacity) It is run by the user1198621015840larr Enc(119862 120574 119905) is a deterministic algorithm that

takes as input the encrypted documents 119862 thesearchable structure 120574 and the search token 119905 andoutputs the matched documents (or identifiers) 1198621015840 =(1198881 119888

119898) It is run by the server and1198621015840 is sent to the

user119889 larr Dec(119870 119888) is a deterministic algorithm that takesas input a secret key119870 and the encrypted document 119888and outputs the recovered plaintext 119889 It is run by theuser

We adopt the definition introduced by Boneh et al in[11] as a representative model for asymmetric searchableencryption scheme In this setting the user generates thepublic key and the private key The data sender encrypts thedata using the public key and the user searches and decryptsthe data using the private key The original definition onlycontains the searchable part for consistency we add twoalgorithms and define the asymmetric searchable encryptionas follows

Definition 2 (asymmetric searchable encryption) An asym-metric searchable encryption (ASE) scheme is a collection ofseven polynomial-time algorithms ASE = (Gen PEKS EncToken Test Search Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs a pub-licprivate key pair 119870 = (119870pub 119870priv) It is run by theuser and only119870priv is kept secret119904 larr PEKS(119870pub 119908) is a probabilistic algorithm thattakes as input a public key 119870pub and a word 119908 andoutputs a searchable structure 119904 It is run by the datasender and 119904 is attached to the encryptedmessage andthe combination is sent to the server119888 larr Enc(119870pub 119889) is a probabilistic algorithm thattakes as input a public key 119870pub and a document

4 The Scientific World Journal

(message) and outputs the ciphertext 119888 It is run bythe data sender and 119888 (followed bymultiple searchablestructures) is sent to the server

119905 larr Token(119870priv 119908) is a deterministic algorithm thattakes as input a private key119870priv and a keyword119908 andoutputs a search token 119905 It is run by the user

119887 larr Test(119870pub 119904 119905) is a deterministic algorithmthat takes as input the public key 119870pub a searchablestructure 119904 larrPEKS(119870pub 119908

1015840) and a search token 119905 larr

Token(119870priv 119908) and outputs 119887 = 1 if 119908 = 1199081015840 or 119887 = 0otherwise It is run by the server

1198621015840larr Search(119870pub 119862 119878 119905) is a deterministic algo-

rithm that takes as input the public key 119870pub theencrypted documents 119862 = (119862

1 119862

119899) the corre-

sponding searchable structure set 119878 = (1198781 119878

119899)

(each 119878119894containsmultiple searchable structures corre-

sponding to the keywords of the document) and thesearch token 119905 and outputs the matched documents1198621015840= (1198881 119888

119898) (the documentsrsquo searchable struc-

tures satisfying 1 larr Test(119870pub 119904 119905)) It is run by theserver and 1198621015840 is sent to the user

119889 larr Dec(119870priv 119888) is a probabilistic algorithm thattakes as input a private key 119870priv and a ciphertext 119888and outputs the plaintext 119889 It is run by the user

Unlike symmetric setting the definition of asymmetricsetting only works on a single document For a documentcollection it does not make any difference since the usercould execute the encryption algorithm for each documentrespectively

By comparing the definitions of the two different settingsthere exists a common link between the queried keywordsand the matched documents the searchable structure whichis constructed using either symmetric key or the public keyNote that the structure is probabilistic in the asymmetricsetting or else the server could directly launch the chosenplaintext attack using the public key However we say that forsymmetric and asymmetric settings the searchable structuresare both run-time deterministic To prove this property wefirst introduce a lemma as follows

Lemma 3 For asymmetric setting if the token 119905 generatedusing the private key 119870priv is deterministic then the searchablestructure 119904 encrypted using the public key 119870pub is run-time de-terministic when the the algorithm ASETest outputs 1 even ifthe encryption is probabilistic

Proof Recall that the algorithm 119905 larr Token(119870priv 119908) is deter-ministic and 119904 larr PEKS(119870pub 119908) is probabilistic Howeverfor a single document there only exists a single 119904 that linksto 119908 When 1 larr Test(119870pub 119904 119905) it implies that 119905 matches119904 We replace 119905 with 119904 then the token 119905 = 119904 which could begenerated by the data sender who could generate the tokenusing the public key 119905 larr Token1015840(119870pub 119908)which is in fact thealgorithm PEKS(119870pub 119908) It seems that the data sender hasindirectly generated the token without having the private key

Keywords

Tokenization

Converter

Mapping

Func1

Documents

Decryption

Core (symmetric asymmetric)

Filter

Func 2

Global layer

Local layerFunc

Figure 1 Architecture for layered searchable encryption scheme

Therefore when the output of the test is 1 both the token andthe searchable structure map to 119905 which is deterministic

Based on the lemma above we introduce a theoremwhichguides us to construct the converter in the layered searchableencryption scheme

Theorem 4 (run-time invariance) For both symmetric andasymmetric settings if the search token 119905 is deterministic thenthe searchable structure is run-time deterministic

Proof As proved in Lemma 3 the searchable structure is run-time deterministic for asymmetric setting For symmetricsetting the searchable structure is encrypted using the sym-metric key 120574 larr Enc(119870119863) which is probabilistic The token119905 larr Token(119870 119908) is deterministic Similar to asymmetricsetting when executing the deterministic algorithm Searchthematched entries (probabilistic)map to 119905 and themappingis deterministic (here an entry is the encrypted data usingsymmetric encryption that contains the information aboutthe matched document such as the node in the invertedindex [8]) In other words the searchable structure is run-time deterministic because of the deterministicmapping

42 Scheme Definition Our primary goal is to separate thefunctionalities from the searchable structures therefore weconsider to construct the basic searchable structures andvarious functions in different layers as shown in Figure 1

(i) Global layer we name this layer ldquoglobalrdquo because alldocuments and all searchable structures are involvedIn this layer the basic searchable encryption scheme(symmetric or asymmetric) is executed and a globalindex could be constructed to improve search effi-ciency The server receives the search tokens (eachtoken is related to a keyword) executes the searchprocedure and outputs the matched documents Fur-thermore the server converts the tokens (symmet-ric or asymmetric) to the corresponding mappings(another type of secret token) with uniform formatand transfers the mappings with the matched docu-ments (or identifiers) to the local layer

(ii) Local layer we name this layer ldquolocalrdquo because func-tional structures are constructed for each documentindependently In this layer each matched documentis further filtered by all functions (eg phrase query

The Scientific World Journal 5

function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user

For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows

Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870

119890 119863) is a probabilistic algorithm

that takes as input an encryption key 119870119890(119870119890= 119870priv

for symmetric setting or 119870119890= 119870pub for asymmetric

setting) and a document collection 119863 = (1198891 119889

119899)

It outputs 119899 encrypted documents 119862 = (1198881 119888

119899)

a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866

1 119866

119899) corresponding to 119899 documents

and a sequence of local functional structures 119871 =

(1198711 119871

119899) corresponding to 119899 encrypted docu-

ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908

1 119908

119900) with functional instruc-

tions and outputs the corresponding search tokens119879 = (119905

1 119905

119900) with functional instructions It is run

by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic

algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888

1 119888

119898) It is run by the

server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user

Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query

For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908

1 119908

119900)

(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions

A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows

Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows

119871119889larr Build(119889 119881

119889) is an algorithm that takes as input

a document 119889 and the corresponding conversion 119881119889

and outputs a functional structure 119871119889 It is run by

the data sender and 119871119889is appended to the encrypted

document

1198621015840larr Filter(119862 119871 119881

119879) is an algorithm that takes as

input the encrypted documents 119862 = (1198621 119862

119909)

the corresponding functional structure set 119871 =

(1198711 119871

119909) and the converted search tokens 119881

119879=

(1198811 119881

119909) and outputs a subset of documents 1198621015840 It

is run by the server

43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models

The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else

6 The Scientific World Journal

the security of some components will be correlated such thatthe loose coupling property is lost

Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface

Definition 7 (plain trace) Let119863 = (1198891 119889

119899) be a document

collection Let 119873 = (1198731 119873

119899) (only for asymmetric

setting) be a keyword-counter set where119873119894is the number of

keywords in 119889119894 Let the query history119882 = (119908

1 119908

119901) be a

sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882

119894= 119882119895and 0 otherwiseThe plain

trace 120587(119863119882) = (|1198891| |119889

119899| 119873 120590(119882))

Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface

Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889

1 119889

119899) a sequence of query

119882 = (1199081 119908

119901) and receives (119862 119866) larr Enc(119870

119890 119863)

and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881

119879as the input for

the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-

ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast

119879as the input for the

functional component FinallyA returns a bit 119887 thatis output by the experiment

We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (1)

where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here

since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately

The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example

if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component

Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later

44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface

441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888

1 119888

119899 Each 119888

119894(1 le 119894 le 119899) is linked to a

global searchable structure 119866119894(if a global index is used then

only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871

119894 and only

the global searchable structure 119866(1198661 119866

119899) is used in this

step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881

119879and the matched 119909 encrypted documents

are transmitted to functional components FC1 FC

119890to

further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888

1 119888

119898are

returned to the user and the user decrypts them to obtainthe plaintexts 119889

1 119889

119898 In order to construct a functional

component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1

We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency

The Scientific World Journal 7

Input Token

User Search

Core (SSEASE) Converter

Outputm le x le n

Decrypt

Filter Convert

Filter

W = (w1 w2 w0) T = (t1 t2 t0)

c1 c2 cn

G(G1 G2 Gn)

L1 L2 Ln

d1 d2 dm

c1 c2 cm

c1 c2 cx

G(G1 G2 Gx)

L1 L2 Lx

VT = (V1 V2 Vx)

FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe

Figure 2 Search process of layered searchable encryption scheme

Build(119889 119881119889)

(1) input a document 119889 and the mappings of all words 119881119889in 119889

(2) specified according to the functionality(3) output a local functional structure 119871

119889

Filter(119862 119871 119881119879)

(1) input a set of encrypted document 119862 = (1198881 119888

119909) the corresponding local functional

structures 119871 = (1198711 119871

119909) and the mappings of the queried keywords 119881

119879= (1198811 119881

119909)

(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862

Algorithm 1 Template for functional component FC

rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms

442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905

1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo

and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as

ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)

Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings

Suppose there are 119899 documents 119863 = (1198891 119889

119899) 119899 cor-

responding ciphertexts 119862 = (1198881 119888

119899) 119890 functional compo-

nents FC= (1198651 119865

119890) and 119900 queried keywords119882 = (119908

1

119908119900) In addition we define a hash function as follows

119891ℎ 0 1

lowast997888rarr 0 1

119897 (3)

where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891

ℎ then 119897

is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4

443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890

1and 1198903map to the

word ldquodayrdquo)

ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)

Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1

Then we have

ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)

In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed

8 The Scientific World Journal

Input the encryption key 119870119890= 119870priv the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871

1 119871

119899)

Method(1) compute (120574 119862) larr SSEsdotEnc(119870

119890 119863) Here 119862 = (119888

1 119888

119899)

(2) for each document 119889119894isin 119863 and the corresponding 119888

119894(1 le 119894 le 119899) do

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) for each word 119908119896(1 le 119896 le 119903) in119882 do

(5) compute 119905119896larr SSEsdotToken(119870

119890 119908119896)

(6) compute V119894119896larr 119891ℎ(119905119896)

(7) end for(8) let 119881

119894= (V1198941 V

119894119903)

(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(11) end for(12) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(13) end for(14) let 119866 = 120574 and 119871 = (119871

1 119871

119899) and output (119862 119866 119871)

Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)

Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871

1 119871

119899)

(5) 119879 search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888

1 119888

119909)

(2) for each token 119905119896(1 le 119896 le 119900) compute V

119896larr 119891ℎ(119905119896)

(3) let 1198811= 1198812= sdot sdot sdot = 119881

119909= (V1 V

119900) and

(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(7) end for

Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)

Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 2

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889

Algorithm 4 LSE scheme symmetric part

The Scientific World Journal 9

ASETest

Convert

FCFilter

Gi searchable structure

middot middot middot

L1 L2 middot middot middot Le

Li functional structure

T = (t1 t0)

Kpub s1 s2 sa

c1

c2

ci

cn

Figure 3 Data structure and search process for asymmetric setting

Table 1 Searchable encryption schemes with various functionalities

Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash

Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash

Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash

Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes

to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument

There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components

Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7

Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1

444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently

Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure

Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881

119879) from equal-size ran-

dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-

security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879

lowast Forasymmetric setting119881

119879is the hash of the searchable structure

119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881

119879is indistinguishable from the hash value

119881lowast

119879

5 Realizing Various Functionalities

In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities

51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 2: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

2 The Scientific World Journal

and the user searches and decrypts the documents usingthe private key Beyond the basic keyword matching manyfunctions are also added to either symmetric or asymmetricsetting such as range query phrase query and fuzzy keywordquery

However these functions are often realized by differ-ent methods with different searchable structures which aregenerally not compatible with each other For example theasymmetric encryption scheme introduced in [2] realizedconjunctive subset and range queries However it is difficultto figure out how to apply this method to symmetric settingEven for the same setting such as the fuzzy query schemeintroduced in [3] and the rank-ordered query scheme intro-duced in [4] it is difficult to figure out how to combine twomethods together since the functions are constructed basedon different indexing structures

Layered searchable encryption (LSE) scheme aims toprovide compatibility flexibility and security for varioussettings and functionalities In this new framework key-words are firstly transformed to tokens that are filtered bythe core searchable component (symmetric or asymmetricsetting) and then the tokens are dynamically converted touniform mappings which are transmitted to many stand-alone functional components (eg ranked keyword querycomponent fuzzy query component etc) to further filterthe results Since all functional components are independentof each other and the interfaces are common the functionsare compatible with each other and directly support bothsymmetric and asymmetric settings and adding or deleting afunction is quite simple since each function is loosely coupledwith the core searchable component Furthermore LSEsupports combined query For example the query ldquoSELECTlowast WHERE keywords = ldquocloud storage encryptionrdquo ANDldquosecurity classification gt 5rdquo ORDERED BY ldquokeywordcloudrdquordquo(to express the query we adopt the SQL-like format used indatabase) is a combination of three functional componentsbasic query range query and ranked keyword query (inthis paper we will present the concrete construction for thisexample)

Furthermore this framework is similar to the data streamprocessing architecture [5] where functional componentscould be treated as operator boxes and the whole schemecould be treated as a data-flow system by which all processesfollow the popular boxes and arrows paradigmTherefore incomparison to the previous searchable encryption schemesLSE is more suitable for distributed and parallel computingenvironment

In this paper our contributions are the following (1)Wepropose a novel framework for designing searchable encryp-tion scheme called layered searchable encryption (LSE)which enables combined query and provides compatibilityflexibility and security for various settings and function-alities The new framework consists of a core searchablecomponent with a symmetricasymmetric converter manyfunctional components and a common interface with newsecurity model (2) We propose a concrete construction forLSE that could theoretically combine all possible function-alities which are proposed in the recent years and proveits semantic security for the interface (3) As a complement

for the prior works we formally define two new securitymodels for ranked keyword query and range query calledsemantic security against chosen ranked keyword attack(CRKA) and chosen range attack (CRA) respectively whichprovide integral security models for cryptographic analysis(4) Based on LSE we propose two representative and novelconstructions for ranked keyword query component (previ-ously only available in symmetric scheme) and range querycomponent (previously only available in asymmetric scheme)and prove them semantically secure under the new securitymodels

The rest of the paper is organized as follows Section 2presents the related work Section 3 presents the notationsand preliminaries Section 4 presents the layered searchableencryption scheme and the concrete construction Section 5discusses how to realize various functionalities and presentsthe concrete constructions for ranked keyword query andrange query Section 6 concludes this paper

2 Related Work

Searchable encryption schemes are designed to help the usersto securely search over the encrypted data by keywordsThe first scheme was introduced in [6] by Song et al andlater on many index-based symmetric searchable encryption(SSE) schemes were proposed Goh introduced the firstsecure index in [7] and they also built the security modelfor searchable encryption called Adaptive Chosen KeywordAttack (IND-CKA) In [8] Curtmola et al introduced twoconstructions to realize symmetric searchable encryptionthe first construction (named SSE-1) is nonadaptive andthe second one (named SSE-2) is adaptive A generalizationfor symmetric searchable encryption was introduced in [9]and a representative SSE system designed by Microsoft wasintroduced in [10] Another type of searchable encryptionnamed asymmetric searchable encryption (ASE) is public-key based which allows the user to search over the dataencrypted by some data senders using the public key of theuser The first scheme was introduced in [11] by Boneh etal based on bilinear maps and the improved definition wasintroduced in [12]

There are many functional extensions for the search-able encryption schemes beyond the basic precise keywordmatching For symmetric setting the authors in [4 13 14]introduced ranked keyword search schemes based on order-preserving encryption technique or two-round protocolwhich allows the server to only return the top-119896 relevantresults to the user In [15] Golle et al introduced a schemesupporting conjunctive keyword searchwhich allows the userto searchmultiple keywords in a single query In [3 16 17] theauthors introduced fuzzy keyword search schemes based onwildcard technique which allows the user to submit only partof the precise keyword Similar to fuzzy keyword search butdifferent the authors in [18 19] introduced similarity searchschemes based on wildcard technique which allows theserver to return the results similar to the queried keyword In[20 21] the authors introduced phrase query schemes basedon trusted client-side server or binary search which allows

The Scientific World Journal 3

the user to query a phrase instead of multiple independentkeywords For asymmetric setting the authors in [22 23]introduced range query schemes In addition Boneh et alalso introduced conjunctive and subset query in [22] basedon bilinear maps

Note that most of these techniques are not compatiblewith each other due to specific data structure andmathemati-cal property However in the following sections wewill provethat functional structures and searchable structures could beseparately constructed and asymmetric structures could beconverted to symmetric structures such that a compatible all-in-one scheme is possible

3 Notations and Preliminaries

We write 119909larr119880119883 to denote sampling element 119909 uniformly

random from a set 119883 and write 119909 larr A to denote the outputof an algorithmAWewrite 119886 119887 to denote the concatenationof two strings 119886 and 119887 We write |119860| to denote its cardinality if119860 is a set and write |119886| to denote its bit length if 119886 is a stringA function 120583(119896) N rarr R is negligible if for every positivepolynomial 119901(sdot) there exists an inter 119873 gt 0 such that for all119896 gt 119873 |120583(119896)| lt 1119901(119896) We write poly(119896) and negl(119896) todenote polynomial and negligible functions in 119896 respectively

We write Δ = (1199081 119908

119899) to denote a dictionary of 119899

words in lexicographic order We assume that all words areof length polynomial in 119896 We write 119889 to refer to a documentthat contains poly(119896) words and write |119889| to denote the sizeof the document in bytes In some cases we also write 119889 todenote the document identifier that uniquely identifies thedocument such as amemory locationWewrite X to denote acomponent or a scheme and write Xfunc( ) to denote thecorresponding function for the component or an algorithmin the scheme

4 Layered Searchable Encryption Scheme

Layered searchable encryption scheme aims to combinesymmetric and asymmetric searchable encryption schemesto provide a uniformmodel for functional extensionsThere-fore we first revisit the basic symmetric and asymmetricsearchable encryption models and then build the layeredsearchable encryption model based on these two differentsettings After that we introduce the security model ofthe new framework and finally we present the concreteconstruction

41 Revisiting Searchable Encryption Weadopt the definitionintroduced by Curtmola et al in [8] as a representativemodel for symmetric searchable encryption scheme In thissetting the user who searches for the documents is also thedata sender who encrypts the documents Therefore someefficient searching techniques such as using a global indexare used and the searchable structure may be a single indexfile for all stored documents For consistency with otherdefinitions we make a little modification for the originaldefinition and define the scheme as follows

Definition 1 (symmetric searchable encryption) A symmet-ric searchable encryption (SSE) scheme is a collection offive polynomial-time algorithms SSE = (Gen Enc TokenSearch Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takes asinput a security parameter 119896 and outputs a secret key119870 It is run by the user and the key is kept secret(120574 119862) larr Enc(119870119863) is a probabilistic algorithmthat takes as input a secret key 119870 and a documentcollection 119863 = (119889

1 119889

119899) and outputs a searchable

structure 120574 and a sequence of encrypted documents119862 = (119888

1 119888

119899) It enables a user to query some key-

words and the server returns thematched documentsFor instance in an index-based symmetric searchableencryption scheme 120574 is the secure index It is run bythe user and (120574 119862) is sent to the server119905 larr Token(119870 119908) is a deterministic algorithm thattakes as input a secret key 119870 and a keyword 119908 andoutputs a search token 119905 (also named trapdoor orcapacity) It is run by the user1198621015840larr Enc(119862 120574 119905) is a deterministic algorithm that

takes as input the encrypted documents 119862 thesearchable structure 120574 and the search token 119905 andoutputs the matched documents (or identifiers) 1198621015840 =(1198881 119888

119898) It is run by the server and1198621015840 is sent to the

user119889 larr Dec(119870 119888) is a deterministic algorithm that takesas input a secret key119870 and the encrypted document 119888and outputs the recovered plaintext 119889 It is run by theuser

We adopt the definition introduced by Boneh et al in[11] as a representative model for asymmetric searchableencryption scheme In this setting the user generates thepublic key and the private key The data sender encrypts thedata using the public key and the user searches and decryptsthe data using the private key The original definition onlycontains the searchable part for consistency we add twoalgorithms and define the asymmetric searchable encryptionas follows

Definition 2 (asymmetric searchable encryption) An asym-metric searchable encryption (ASE) scheme is a collection ofseven polynomial-time algorithms ASE = (Gen PEKS EncToken Test Search Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs a pub-licprivate key pair 119870 = (119870pub 119870priv) It is run by theuser and only119870priv is kept secret119904 larr PEKS(119870pub 119908) is a probabilistic algorithm thattakes as input a public key 119870pub and a word 119908 andoutputs a searchable structure 119904 It is run by the datasender and 119904 is attached to the encryptedmessage andthe combination is sent to the server119888 larr Enc(119870pub 119889) is a probabilistic algorithm thattakes as input a public key 119870pub and a document

4 The Scientific World Journal

(message) and outputs the ciphertext 119888 It is run bythe data sender and 119888 (followed bymultiple searchablestructures) is sent to the server

119905 larr Token(119870priv 119908) is a deterministic algorithm thattakes as input a private key119870priv and a keyword119908 andoutputs a search token 119905 It is run by the user

119887 larr Test(119870pub 119904 119905) is a deterministic algorithmthat takes as input the public key 119870pub a searchablestructure 119904 larrPEKS(119870pub 119908

1015840) and a search token 119905 larr

Token(119870priv 119908) and outputs 119887 = 1 if 119908 = 1199081015840 or 119887 = 0otherwise It is run by the server

1198621015840larr Search(119870pub 119862 119878 119905) is a deterministic algo-

rithm that takes as input the public key 119870pub theencrypted documents 119862 = (119862

1 119862

119899) the corre-

sponding searchable structure set 119878 = (1198781 119878

119899)

(each 119878119894containsmultiple searchable structures corre-

sponding to the keywords of the document) and thesearch token 119905 and outputs the matched documents1198621015840= (1198881 119888

119898) (the documentsrsquo searchable struc-

tures satisfying 1 larr Test(119870pub 119904 119905)) It is run by theserver and 1198621015840 is sent to the user

119889 larr Dec(119870priv 119888) is a probabilistic algorithm thattakes as input a private key 119870priv and a ciphertext 119888and outputs the plaintext 119889 It is run by the user

Unlike symmetric setting the definition of asymmetricsetting only works on a single document For a documentcollection it does not make any difference since the usercould execute the encryption algorithm for each documentrespectively

By comparing the definitions of the two different settingsthere exists a common link between the queried keywordsand the matched documents the searchable structure whichis constructed using either symmetric key or the public keyNote that the structure is probabilistic in the asymmetricsetting or else the server could directly launch the chosenplaintext attack using the public key However we say that forsymmetric and asymmetric settings the searchable structuresare both run-time deterministic To prove this property wefirst introduce a lemma as follows

Lemma 3 For asymmetric setting if the token 119905 generatedusing the private key 119870priv is deterministic then the searchablestructure 119904 encrypted using the public key 119870pub is run-time de-terministic when the the algorithm ASETest outputs 1 even ifthe encryption is probabilistic

Proof Recall that the algorithm 119905 larr Token(119870priv 119908) is deter-ministic and 119904 larr PEKS(119870pub 119908) is probabilistic Howeverfor a single document there only exists a single 119904 that linksto 119908 When 1 larr Test(119870pub 119904 119905) it implies that 119905 matches119904 We replace 119905 with 119904 then the token 119905 = 119904 which could begenerated by the data sender who could generate the tokenusing the public key 119905 larr Token1015840(119870pub 119908)which is in fact thealgorithm PEKS(119870pub 119908) It seems that the data sender hasindirectly generated the token without having the private key

Keywords

Tokenization

Converter

Mapping

Func1

Documents

Decryption

Core (symmetric asymmetric)

Filter

Func 2

Global layer

Local layerFunc

Figure 1 Architecture for layered searchable encryption scheme

Therefore when the output of the test is 1 both the token andthe searchable structure map to 119905 which is deterministic

Based on the lemma above we introduce a theoremwhichguides us to construct the converter in the layered searchableencryption scheme

Theorem 4 (run-time invariance) For both symmetric andasymmetric settings if the search token 119905 is deterministic thenthe searchable structure is run-time deterministic

Proof As proved in Lemma 3 the searchable structure is run-time deterministic for asymmetric setting For symmetricsetting the searchable structure is encrypted using the sym-metric key 120574 larr Enc(119870119863) which is probabilistic The token119905 larr Token(119870 119908) is deterministic Similar to asymmetricsetting when executing the deterministic algorithm Searchthematched entries (probabilistic)map to 119905 and themappingis deterministic (here an entry is the encrypted data usingsymmetric encryption that contains the information aboutthe matched document such as the node in the invertedindex [8]) In other words the searchable structure is run-time deterministic because of the deterministicmapping

42 Scheme Definition Our primary goal is to separate thefunctionalities from the searchable structures therefore weconsider to construct the basic searchable structures andvarious functions in different layers as shown in Figure 1

(i) Global layer we name this layer ldquoglobalrdquo because alldocuments and all searchable structures are involvedIn this layer the basic searchable encryption scheme(symmetric or asymmetric) is executed and a globalindex could be constructed to improve search effi-ciency The server receives the search tokens (eachtoken is related to a keyword) executes the searchprocedure and outputs the matched documents Fur-thermore the server converts the tokens (symmet-ric or asymmetric) to the corresponding mappings(another type of secret token) with uniform formatand transfers the mappings with the matched docu-ments (or identifiers) to the local layer

(ii) Local layer we name this layer ldquolocalrdquo because func-tional structures are constructed for each documentindependently In this layer each matched documentis further filtered by all functions (eg phrase query

The Scientific World Journal 5

function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user

For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows

Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870

119890 119863) is a probabilistic algorithm

that takes as input an encryption key 119870119890(119870119890= 119870priv

for symmetric setting or 119870119890= 119870pub for asymmetric

setting) and a document collection 119863 = (1198891 119889

119899)

It outputs 119899 encrypted documents 119862 = (1198881 119888

119899)

a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866

1 119866

119899) corresponding to 119899 documents

and a sequence of local functional structures 119871 =

(1198711 119871

119899) corresponding to 119899 encrypted docu-

ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908

1 119908

119900) with functional instruc-

tions and outputs the corresponding search tokens119879 = (119905

1 119905

119900) with functional instructions It is run

by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic

algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888

1 119888

119898) It is run by the

server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user

Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query

For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908

1 119908

119900)

(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions

A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows

Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows

119871119889larr Build(119889 119881

119889) is an algorithm that takes as input

a document 119889 and the corresponding conversion 119881119889

and outputs a functional structure 119871119889 It is run by

the data sender and 119871119889is appended to the encrypted

document

1198621015840larr Filter(119862 119871 119881

119879) is an algorithm that takes as

input the encrypted documents 119862 = (1198621 119862

119909)

the corresponding functional structure set 119871 =

(1198711 119871

119909) and the converted search tokens 119881

119879=

(1198811 119881

119909) and outputs a subset of documents 1198621015840 It

is run by the server

43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models

The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else

6 The Scientific World Journal

the security of some components will be correlated such thatthe loose coupling property is lost

Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface

Definition 7 (plain trace) Let119863 = (1198891 119889

119899) be a document

collection Let 119873 = (1198731 119873

119899) (only for asymmetric

setting) be a keyword-counter set where119873119894is the number of

keywords in 119889119894 Let the query history119882 = (119908

1 119908

119901) be a

sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882

119894= 119882119895and 0 otherwiseThe plain

trace 120587(119863119882) = (|1198891| |119889

119899| 119873 120590(119882))

Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface

Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889

1 119889

119899) a sequence of query

119882 = (1199081 119908

119901) and receives (119862 119866) larr Enc(119870

119890 119863)

and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881

119879as the input for

the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-

ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast

119879as the input for the

functional component FinallyA returns a bit 119887 thatis output by the experiment

We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (1)

where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here

since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately

The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example

if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component

Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later

44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface

441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888

1 119888

119899 Each 119888

119894(1 le 119894 le 119899) is linked to a

global searchable structure 119866119894(if a global index is used then

only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871

119894 and only

the global searchable structure 119866(1198661 119866

119899) is used in this

step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881

119879and the matched 119909 encrypted documents

are transmitted to functional components FC1 FC

119890to

further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888

1 119888

119898are

returned to the user and the user decrypts them to obtainthe plaintexts 119889

1 119889

119898 In order to construct a functional

component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1

We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency

The Scientific World Journal 7

Input Token

User Search

Core (SSEASE) Converter

Outputm le x le n

Decrypt

Filter Convert

Filter

W = (w1 w2 w0) T = (t1 t2 t0)

c1 c2 cn

G(G1 G2 Gn)

L1 L2 Ln

d1 d2 dm

c1 c2 cm

c1 c2 cx

G(G1 G2 Gx)

L1 L2 Lx

VT = (V1 V2 Vx)

FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe

Figure 2 Search process of layered searchable encryption scheme

Build(119889 119881119889)

(1) input a document 119889 and the mappings of all words 119881119889in 119889

(2) specified according to the functionality(3) output a local functional structure 119871

119889

Filter(119862 119871 119881119879)

(1) input a set of encrypted document 119862 = (1198881 119888

119909) the corresponding local functional

structures 119871 = (1198711 119871

119909) and the mappings of the queried keywords 119881

119879= (1198811 119881

119909)

(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862

Algorithm 1 Template for functional component FC

rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms

442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905

1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo

and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as

ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)

Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings

Suppose there are 119899 documents 119863 = (1198891 119889

119899) 119899 cor-

responding ciphertexts 119862 = (1198881 119888

119899) 119890 functional compo-

nents FC= (1198651 119865

119890) and 119900 queried keywords119882 = (119908

1

119908119900) In addition we define a hash function as follows

119891ℎ 0 1

lowast997888rarr 0 1

119897 (3)

where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891

ℎ then 119897

is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4

443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890

1and 1198903map to the

word ldquodayrdquo)

ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)

Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1

Then we have

ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)

In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed

8 The Scientific World Journal

Input the encryption key 119870119890= 119870priv the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871

1 119871

119899)

Method(1) compute (120574 119862) larr SSEsdotEnc(119870

119890 119863) Here 119862 = (119888

1 119888

119899)

(2) for each document 119889119894isin 119863 and the corresponding 119888

119894(1 le 119894 le 119899) do

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) for each word 119908119896(1 le 119896 le 119903) in119882 do

(5) compute 119905119896larr SSEsdotToken(119870

119890 119908119896)

(6) compute V119894119896larr 119891ℎ(119905119896)

(7) end for(8) let 119881

119894= (V1198941 V

119894119903)

(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(11) end for(12) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(13) end for(14) let 119866 = 120574 and 119871 = (119871

1 119871

119899) and output (119862 119866 119871)

Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)

Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871

1 119871

119899)

(5) 119879 search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888

1 119888

119909)

(2) for each token 119905119896(1 le 119896 le 119900) compute V

119896larr 119891ℎ(119905119896)

(3) let 1198811= 1198812= sdot sdot sdot = 119881

119909= (V1 V

119900) and

(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(7) end for

Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)

Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 2

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889

Algorithm 4 LSE scheme symmetric part

The Scientific World Journal 9

ASETest

Convert

FCFilter

Gi searchable structure

middot middot middot

L1 L2 middot middot middot Le

Li functional structure

T = (t1 t0)

Kpub s1 s2 sa

c1

c2

ci

cn

Figure 3 Data structure and search process for asymmetric setting

Table 1 Searchable encryption schemes with various functionalities

Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash

Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash

Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash

Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes

to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument

There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components

Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7

Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1

444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently

Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure

Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881

119879) from equal-size ran-

dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-

security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879

lowast Forasymmetric setting119881

119879is the hash of the searchable structure

119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881

119879is indistinguishable from the hash value

119881lowast

119879

5 Realizing Various Functionalities

In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities

51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 3: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

The Scientific World Journal 3

the user to query a phrase instead of multiple independentkeywords For asymmetric setting the authors in [22 23]introduced range query schemes In addition Boneh et alalso introduced conjunctive and subset query in [22] basedon bilinear maps

Note that most of these techniques are not compatiblewith each other due to specific data structure andmathemati-cal property However in the following sections wewill provethat functional structures and searchable structures could beseparately constructed and asymmetric structures could beconverted to symmetric structures such that a compatible all-in-one scheme is possible

3 Notations and Preliminaries

We write 119909larr119880119883 to denote sampling element 119909 uniformly

random from a set 119883 and write 119909 larr A to denote the outputof an algorithmAWewrite 119886 119887 to denote the concatenationof two strings 119886 and 119887 We write |119860| to denote its cardinality if119860 is a set and write |119886| to denote its bit length if 119886 is a stringA function 120583(119896) N rarr R is negligible if for every positivepolynomial 119901(sdot) there exists an inter 119873 gt 0 such that for all119896 gt 119873 |120583(119896)| lt 1119901(119896) We write poly(119896) and negl(119896) todenote polynomial and negligible functions in 119896 respectively

We write Δ = (1199081 119908

119899) to denote a dictionary of 119899

words in lexicographic order We assume that all words areof length polynomial in 119896 We write 119889 to refer to a documentthat contains poly(119896) words and write |119889| to denote the sizeof the document in bytes In some cases we also write 119889 todenote the document identifier that uniquely identifies thedocument such as amemory locationWewrite X to denote acomponent or a scheme and write Xfunc( ) to denote thecorresponding function for the component or an algorithmin the scheme

4 Layered Searchable Encryption Scheme

Layered searchable encryption scheme aims to combinesymmetric and asymmetric searchable encryption schemesto provide a uniformmodel for functional extensionsThere-fore we first revisit the basic symmetric and asymmetricsearchable encryption models and then build the layeredsearchable encryption model based on these two differentsettings After that we introduce the security model ofthe new framework and finally we present the concreteconstruction

41 Revisiting Searchable Encryption Weadopt the definitionintroduced by Curtmola et al in [8] as a representativemodel for symmetric searchable encryption scheme In thissetting the user who searches for the documents is also thedata sender who encrypts the documents Therefore someefficient searching techniques such as using a global indexare used and the searchable structure may be a single indexfile for all stored documents For consistency with otherdefinitions we make a little modification for the originaldefinition and define the scheme as follows

Definition 1 (symmetric searchable encryption) A symmet-ric searchable encryption (SSE) scheme is a collection offive polynomial-time algorithms SSE = (Gen Enc TokenSearch Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takes asinput a security parameter 119896 and outputs a secret key119870 It is run by the user and the key is kept secret(120574 119862) larr Enc(119870119863) is a probabilistic algorithmthat takes as input a secret key 119870 and a documentcollection 119863 = (119889

1 119889

119899) and outputs a searchable

structure 120574 and a sequence of encrypted documents119862 = (119888

1 119888

119899) It enables a user to query some key-

words and the server returns thematched documentsFor instance in an index-based symmetric searchableencryption scheme 120574 is the secure index It is run bythe user and (120574 119862) is sent to the server119905 larr Token(119870 119908) is a deterministic algorithm thattakes as input a secret key 119870 and a keyword 119908 andoutputs a search token 119905 (also named trapdoor orcapacity) It is run by the user1198621015840larr Enc(119862 120574 119905) is a deterministic algorithm that

takes as input the encrypted documents 119862 thesearchable structure 120574 and the search token 119905 andoutputs the matched documents (or identifiers) 1198621015840 =(1198881 119888

119898) It is run by the server and1198621015840 is sent to the

user119889 larr Dec(119870 119888) is a deterministic algorithm that takesas input a secret key119870 and the encrypted document 119888and outputs the recovered plaintext 119889 It is run by theuser

We adopt the definition introduced by Boneh et al in[11] as a representative model for asymmetric searchableencryption scheme In this setting the user generates thepublic key and the private key The data sender encrypts thedata using the public key and the user searches and decryptsthe data using the private key The original definition onlycontains the searchable part for consistency we add twoalgorithms and define the asymmetric searchable encryptionas follows

Definition 2 (asymmetric searchable encryption) An asym-metric searchable encryption (ASE) scheme is a collection ofseven polynomial-time algorithms ASE = (Gen PEKS EncToken Test Search Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs a pub-licprivate key pair 119870 = (119870pub 119870priv) It is run by theuser and only119870priv is kept secret119904 larr PEKS(119870pub 119908) is a probabilistic algorithm thattakes as input a public key 119870pub and a word 119908 andoutputs a searchable structure 119904 It is run by the datasender and 119904 is attached to the encryptedmessage andthe combination is sent to the server119888 larr Enc(119870pub 119889) is a probabilistic algorithm thattakes as input a public key 119870pub and a document

4 The Scientific World Journal

(message) and outputs the ciphertext 119888 It is run bythe data sender and 119888 (followed bymultiple searchablestructures) is sent to the server

119905 larr Token(119870priv 119908) is a deterministic algorithm thattakes as input a private key119870priv and a keyword119908 andoutputs a search token 119905 It is run by the user

119887 larr Test(119870pub 119904 119905) is a deterministic algorithmthat takes as input the public key 119870pub a searchablestructure 119904 larrPEKS(119870pub 119908

1015840) and a search token 119905 larr

Token(119870priv 119908) and outputs 119887 = 1 if 119908 = 1199081015840 or 119887 = 0otherwise It is run by the server

1198621015840larr Search(119870pub 119862 119878 119905) is a deterministic algo-

rithm that takes as input the public key 119870pub theencrypted documents 119862 = (119862

1 119862

119899) the corre-

sponding searchable structure set 119878 = (1198781 119878

119899)

(each 119878119894containsmultiple searchable structures corre-

sponding to the keywords of the document) and thesearch token 119905 and outputs the matched documents1198621015840= (1198881 119888

119898) (the documentsrsquo searchable struc-

tures satisfying 1 larr Test(119870pub 119904 119905)) It is run by theserver and 1198621015840 is sent to the user

119889 larr Dec(119870priv 119888) is a probabilistic algorithm thattakes as input a private key 119870priv and a ciphertext 119888and outputs the plaintext 119889 It is run by the user

Unlike symmetric setting the definition of asymmetricsetting only works on a single document For a documentcollection it does not make any difference since the usercould execute the encryption algorithm for each documentrespectively

By comparing the definitions of the two different settingsthere exists a common link between the queried keywordsand the matched documents the searchable structure whichis constructed using either symmetric key or the public keyNote that the structure is probabilistic in the asymmetricsetting or else the server could directly launch the chosenplaintext attack using the public key However we say that forsymmetric and asymmetric settings the searchable structuresare both run-time deterministic To prove this property wefirst introduce a lemma as follows

Lemma 3 For asymmetric setting if the token 119905 generatedusing the private key 119870priv is deterministic then the searchablestructure 119904 encrypted using the public key 119870pub is run-time de-terministic when the the algorithm ASETest outputs 1 even ifthe encryption is probabilistic

Proof Recall that the algorithm 119905 larr Token(119870priv 119908) is deter-ministic and 119904 larr PEKS(119870pub 119908) is probabilistic Howeverfor a single document there only exists a single 119904 that linksto 119908 When 1 larr Test(119870pub 119904 119905) it implies that 119905 matches119904 We replace 119905 with 119904 then the token 119905 = 119904 which could begenerated by the data sender who could generate the tokenusing the public key 119905 larr Token1015840(119870pub 119908)which is in fact thealgorithm PEKS(119870pub 119908) It seems that the data sender hasindirectly generated the token without having the private key

Keywords

Tokenization

Converter

Mapping

Func1

Documents

Decryption

Core (symmetric asymmetric)

Filter

Func 2

Global layer

Local layerFunc

Figure 1 Architecture for layered searchable encryption scheme

Therefore when the output of the test is 1 both the token andthe searchable structure map to 119905 which is deterministic

Based on the lemma above we introduce a theoremwhichguides us to construct the converter in the layered searchableencryption scheme

Theorem 4 (run-time invariance) For both symmetric andasymmetric settings if the search token 119905 is deterministic thenthe searchable structure is run-time deterministic

Proof As proved in Lemma 3 the searchable structure is run-time deterministic for asymmetric setting For symmetricsetting the searchable structure is encrypted using the sym-metric key 120574 larr Enc(119870119863) which is probabilistic The token119905 larr Token(119870 119908) is deterministic Similar to asymmetricsetting when executing the deterministic algorithm Searchthematched entries (probabilistic)map to 119905 and themappingis deterministic (here an entry is the encrypted data usingsymmetric encryption that contains the information aboutthe matched document such as the node in the invertedindex [8]) In other words the searchable structure is run-time deterministic because of the deterministicmapping

42 Scheme Definition Our primary goal is to separate thefunctionalities from the searchable structures therefore weconsider to construct the basic searchable structures andvarious functions in different layers as shown in Figure 1

(i) Global layer we name this layer ldquoglobalrdquo because alldocuments and all searchable structures are involvedIn this layer the basic searchable encryption scheme(symmetric or asymmetric) is executed and a globalindex could be constructed to improve search effi-ciency The server receives the search tokens (eachtoken is related to a keyword) executes the searchprocedure and outputs the matched documents Fur-thermore the server converts the tokens (symmet-ric or asymmetric) to the corresponding mappings(another type of secret token) with uniform formatand transfers the mappings with the matched docu-ments (or identifiers) to the local layer

(ii) Local layer we name this layer ldquolocalrdquo because func-tional structures are constructed for each documentindependently In this layer each matched documentis further filtered by all functions (eg phrase query

The Scientific World Journal 5

function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user

For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows

Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870

119890 119863) is a probabilistic algorithm

that takes as input an encryption key 119870119890(119870119890= 119870priv

for symmetric setting or 119870119890= 119870pub for asymmetric

setting) and a document collection 119863 = (1198891 119889

119899)

It outputs 119899 encrypted documents 119862 = (1198881 119888

119899)

a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866

1 119866

119899) corresponding to 119899 documents

and a sequence of local functional structures 119871 =

(1198711 119871

119899) corresponding to 119899 encrypted docu-

ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908

1 119908

119900) with functional instruc-

tions and outputs the corresponding search tokens119879 = (119905

1 119905

119900) with functional instructions It is run

by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic

algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888

1 119888

119898) It is run by the

server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user

Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query

For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908

1 119908

119900)

(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions

A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows

Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows

119871119889larr Build(119889 119881

119889) is an algorithm that takes as input

a document 119889 and the corresponding conversion 119881119889

and outputs a functional structure 119871119889 It is run by

the data sender and 119871119889is appended to the encrypted

document

1198621015840larr Filter(119862 119871 119881

119879) is an algorithm that takes as

input the encrypted documents 119862 = (1198621 119862

119909)

the corresponding functional structure set 119871 =

(1198711 119871

119909) and the converted search tokens 119881

119879=

(1198811 119881

119909) and outputs a subset of documents 1198621015840 It

is run by the server

43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models

The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else

6 The Scientific World Journal

the security of some components will be correlated such thatthe loose coupling property is lost

Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface

Definition 7 (plain trace) Let119863 = (1198891 119889

119899) be a document

collection Let 119873 = (1198731 119873

119899) (only for asymmetric

setting) be a keyword-counter set where119873119894is the number of

keywords in 119889119894 Let the query history119882 = (119908

1 119908

119901) be a

sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882

119894= 119882119895and 0 otherwiseThe plain

trace 120587(119863119882) = (|1198891| |119889

119899| 119873 120590(119882))

Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface

Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889

1 119889

119899) a sequence of query

119882 = (1199081 119908

119901) and receives (119862 119866) larr Enc(119870

119890 119863)

and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881

119879as the input for

the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-

ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast

119879as the input for the

functional component FinallyA returns a bit 119887 thatis output by the experiment

We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (1)

where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here

since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately

The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example

if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component

Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later

44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface

441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888

1 119888

119899 Each 119888

119894(1 le 119894 le 119899) is linked to a

global searchable structure 119866119894(if a global index is used then

only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871

119894 and only

the global searchable structure 119866(1198661 119866

119899) is used in this

step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881

119879and the matched 119909 encrypted documents

are transmitted to functional components FC1 FC

119890to

further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888

1 119888

119898are

returned to the user and the user decrypts them to obtainthe plaintexts 119889

1 119889

119898 In order to construct a functional

component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1

We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency

The Scientific World Journal 7

Input Token

User Search

Core (SSEASE) Converter

Outputm le x le n

Decrypt

Filter Convert

Filter

W = (w1 w2 w0) T = (t1 t2 t0)

c1 c2 cn

G(G1 G2 Gn)

L1 L2 Ln

d1 d2 dm

c1 c2 cm

c1 c2 cx

G(G1 G2 Gx)

L1 L2 Lx

VT = (V1 V2 Vx)

FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe

Figure 2 Search process of layered searchable encryption scheme

Build(119889 119881119889)

(1) input a document 119889 and the mappings of all words 119881119889in 119889

(2) specified according to the functionality(3) output a local functional structure 119871

119889

Filter(119862 119871 119881119879)

(1) input a set of encrypted document 119862 = (1198881 119888

119909) the corresponding local functional

structures 119871 = (1198711 119871

119909) and the mappings of the queried keywords 119881

119879= (1198811 119881

119909)

(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862

Algorithm 1 Template for functional component FC

rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms

442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905

1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo

and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as

ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)

Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings

Suppose there are 119899 documents 119863 = (1198891 119889

119899) 119899 cor-

responding ciphertexts 119862 = (1198881 119888

119899) 119890 functional compo-

nents FC= (1198651 119865

119890) and 119900 queried keywords119882 = (119908

1

119908119900) In addition we define a hash function as follows

119891ℎ 0 1

lowast997888rarr 0 1

119897 (3)

where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891

ℎ then 119897

is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4

443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890

1and 1198903map to the

word ldquodayrdquo)

ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)

Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1

Then we have

ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)

In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed

8 The Scientific World Journal

Input the encryption key 119870119890= 119870priv the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871

1 119871

119899)

Method(1) compute (120574 119862) larr SSEsdotEnc(119870

119890 119863) Here 119862 = (119888

1 119888

119899)

(2) for each document 119889119894isin 119863 and the corresponding 119888

119894(1 le 119894 le 119899) do

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) for each word 119908119896(1 le 119896 le 119903) in119882 do

(5) compute 119905119896larr SSEsdotToken(119870

119890 119908119896)

(6) compute V119894119896larr 119891ℎ(119905119896)

(7) end for(8) let 119881

119894= (V1198941 V

119894119903)

(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(11) end for(12) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(13) end for(14) let 119866 = 120574 and 119871 = (119871

1 119871

119899) and output (119862 119866 119871)

Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)

Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871

1 119871

119899)

(5) 119879 search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888

1 119888

119909)

(2) for each token 119905119896(1 le 119896 le 119900) compute V

119896larr 119891ℎ(119905119896)

(3) let 1198811= 1198812= sdot sdot sdot = 119881

119909= (V1 V

119900) and

(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(7) end for

Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)

Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 2

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889

Algorithm 4 LSE scheme symmetric part

The Scientific World Journal 9

ASETest

Convert

FCFilter

Gi searchable structure

middot middot middot

L1 L2 middot middot middot Le

Li functional structure

T = (t1 t0)

Kpub s1 s2 sa

c1

c2

ci

cn

Figure 3 Data structure and search process for asymmetric setting

Table 1 Searchable encryption schemes with various functionalities

Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash

Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash

Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash

Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes

to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument

There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components

Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7

Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1

444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently

Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure

Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881

119879) from equal-size ran-

dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-

security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879

lowast Forasymmetric setting119881

119879is the hash of the searchable structure

119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881

119879is indistinguishable from the hash value

119881lowast

119879

5 Realizing Various Functionalities

In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities

51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 4: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

4 The Scientific World Journal

(message) and outputs the ciphertext 119888 It is run bythe data sender and 119888 (followed bymultiple searchablestructures) is sent to the server

119905 larr Token(119870priv 119908) is a deterministic algorithm thattakes as input a private key119870priv and a keyword119908 andoutputs a search token 119905 It is run by the user

119887 larr Test(119870pub 119904 119905) is a deterministic algorithmthat takes as input the public key 119870pub a searchablestructure 119904 larrPEKS(119870pub 119908

1015840) and a search token 119905 larr

Token(119870priv 119908) and outputs 119887 = 1 if 119908 = 1199081015840 or 119887 = 0otherwise It is run by the server

1198621015840larr Search(119870pub 119862 119878 119905) is a deterministic algo-

rithm that takes as input the public key 119870pub theencrypted documents 119862 = (119862

1 119862

119899) the corre-

sponding searchable structure set 119878 = (1198781 119878

119899)

(each 119878119894containsmultiple searchable structures corre-

sponding to the keywords of the document) and thesearch token 119905 and outputs the matched documents1198621015840= (1198881 119888

119898) (the documentsrsquo searchable struc-

tures satisfying 1 larr Test(119870pub 119904 119905)) It is run by theserver and 1198621015840 is sent to the user

119889 larr Dec(119870priv 119888) is a probabilistic algorithm thattakes as input a private key 119870priv and a ciphertext 119888and outputs the plaintext 119889 It is run by the user

Unlike symmetric setting the definition of asymmetricsetting only works on a single document For a documentcollection it does not make any difference since the usercould execute the encryption algorithm for each documentrespectively

By comparing the definitions of the two different settingsthere exists a common link between the queried keywordsand the matched documents the searchable structure whichis constructed using either symmetric key or the public keyNote that the structure is probabilistic in the asymmetricsetting or else the server could directly launch the chosenplaintext attack using the public key However we say that forsymmetric and asymmetric settings the searchable structuresare both run-time deterministic To prove this property wefirst introduce a lemma as follows

Lemma 3 For asymmetric setting if the token 119905 generatedusing the private key 119870priv is deterministic then the searchablestructure 119904 encrypted using the public key 119870pub is run-time de-terministic when the the algorithm ASETest outputs 1 even ifthe encryption is probabilistic

Proof Recall that the algorithm 119905 larr Token(119870priv 119908) is deter-ministic and 119904 larr PEKS(119870pub 119908) is probabilistic Howeverfor a single document there only exists a single 119904 that linksto 119908 When 1 larr Test(119870pub 119904 119905) it implies that 119905 matches119904 We replace 119905 with 119904 then the token 119905 = 119904 which could begenerated by the data sender who could generate the tokenusing the public key 119905 larr Token1015840(119870pub 119908)which is in fact thealgorithm PEKS(119870pub 119908) It seems that the data sender hasindirectly generated the token without having the private key

Keywords

Tokenization

Converter

Mapping

Func1

Documents

Decryption

Core (symmetric asymmetric)

Filter

Func 2

Global layer

Local layerFunc

Figure 1 Architecture for layered searchable encryption scheme

Therefore when the output of the test is 1 both the token andthe searchable structure map to 119905 which is deterministic

Based on the lemma above we introduce a theoremwhichguides us to construct the converter in the layered searchableencryption scheme

Theorem 4 (run-time invariance) For both symmetric andasymmetric settings if the search token 119905 is deterministic thenthe searchable structure is run-time deterministic

Proof As proved in Lemma 3 the searchable structure is run-time deterministic for asymmetric setting For symmetricsetting the searchable structure is encrypted using the sym-metric key 120574 larr Enc(119870119863) which is probabilistic The token119905 larr Token(119870 119908) is deterministic Similar to asymmetricsetting when executing the deterministic algorithm Searchthematched entries (probabilistic)map to 119905 and themappingis deterministic (here an entry is the encrypted data usingsymmetric encryption that contains the information aboutthe matched document such as the node in the invertedindex [8]) In other words the searchable structure is run-time deterministic because of the deterministicmapping

42 Scheme Definition Our primary goal is to separate thefunctionalities from the searchable structures therefore weconsider to construct the basic searchable structures andvarious functions in different layers as shown in Figure 1

(i) Global layer we name this layer ldquoglobalrdquo because alldocuments and all searchable structures are involvedIn this layer the basic searchable encryption scheme(symmetric or asymmetric) is executed and a globalindex could be constructed to improve search effi-ciency The server receives the search tokens (eachtoken is related to a keyword) executes the searchprocedure and outputs the matched documents Fur-thermore the server converts the tokens (symmet-ric or asymmetric) to the corresponding mappings(another type of secret token) with uniform formatand transfers the mappings with the matched docu-ments (or identifiers) to the local layer

(ii) Local layer we name this layer ldquolocalrdquo because func-tional structures are constructed for each documentindependently In this layer each matched documentis further filtered by all functions (eg phrase query

The Scientific World Journal 5

function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user

For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows

Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870

119890 119863) is a probabilistic algorithm

that takes as input an encryption key 119870119890(119870119890= 119870priv

for symmetric setting or 119870119890= 119870pub for asymmetric

setting) and a document collection 119863 = (1198891 119889

119899)

It outputs 119899 encrypted documents 119862 = (1198881 119888

119899)

a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866

1 119866

119899) corresponding to 119899 documents

and a sequence of local functional structures 119871 =

(1198711 119871

119899) corresponding to 119899 encrypted docu-

ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908

1 119908

119900) with functional instruc-

tions and outputs the corresponding search tokens119879 = (119905

1 119905

119900) with functional instructions It is run

by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic

algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888

1 119888

119898) It is run by the

server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user

Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query

For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908

1 119908

119900)

(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions

A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows

Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows

119871119889larr Build(119889 119881

119889) is an algorithm that takes as input

a document 119889 and the corresponding conversion 119881119889

and outputs a functional structure 119871119889 It is run by

the data sender and 119871119889is appended to the encrypted

document

1198621015840larr Filter(119862 119871 119881

119879) is an algorithm that takes as

input the encrypted documents 119862 = (1198621 119862

119909)

the corresponding functional structure set 119871 =

(1198711 119871

119909) and the converted search tokens 119881

119879=

(1198811 119881

119909) and outputs a subset of documents 1198621015840 It

is run by the server

43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models

The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else

6 The Scientific World Journal

the security of some components will be correlated such thatthe loose coupling property is lost

Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface

Definition 7 (plain trace) Let119863 = (1198891 119889

119899) be a document

collection Let 119873 = (1198731 119873

119899) (only for asymmetric

setting) be a keyword-counter set where119873119894is the number of

keywords in 119889119894 Let the query history119882 = (119908

1 119908

119901) be a

sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882

119894= 119882119895and 0 otherwiseThe plain

trace 120587(119863119882) = (|1198891| |119889

119899| 119873 120590(119882))

Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface

Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889

1 119889

119899) a sequence of query

119882 = (1199081 119908

119901) and receives (119862 119866) larr Enc(119870

119890 119863)

and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881

119879as the input for

the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-

ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast

119879as the input for the

functional component FinallyA returns a bit 119887 thatis output by the experiment

We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (1)

where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here

since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately

The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example

if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component

Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later

44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface

441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888

1 119888

119899 Each 119888

119894(1 le 119894 le 119899) is linked to a

global searchable structure 119866119894(if a global index is used then

only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871

119894 and only

the global searchable structure 119866(1198661 119866

119899) is used in this

step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881

119879and the matched 119909 encrypted documents

are transmitted to functional components FC1 FC

119890to

further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888

1 119888

119898are

returned to the user and the user decrypts them to obtainthe plaintexts 119889

1 119889

119898 In order to construct a functional

component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1

We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency

The Scientific World Journal 7

Input Token

User Search

Core (SSEASE) Converter

Outputm le x le n

Decrypt

Filter Convert

Filter

W = (w1 w2 w0) T = (t1 t2 t0)

c1 c2 cn

G(G1 G2 Gn)

L1 L2 Ln

d1 d2 dm

c1 c2 cm

c1 c2 cx

G(G1 G2 Gx)

L1 L2 Lx

VT = (V1 V2 Vx)

FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe

Figure 2 Search process of layered searchable encryption scheme

Build(119889 119881119889)

(1) input a document 119889 and the mappings of all words 119881119889in 119889

(2) specified according to the functionality(3) output a local functional structure 119871

119889

Filter(119862 119871 119881119879)

(1) input a set of encrypted document 119862 = (1198881 119888

119909) the corresponding local functional

structures 119871 = (1198711 119871

119909) and the mappings of the queried keywords 119881

119879= (1198811 119881

119909)

(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862

Algorithm 1 Template for functional component FC

rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms

442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905

1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo

and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as

ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)

Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings

Suppose there are 119899 documents 119863 = (1198891 119889

119899) 119899 cor-

responding ciphertexts 119862 = (1198881 119888

119899) 119890 functional compo-

nents FC= (1198651 119865

119890) and 119900 queried keywords119882 = (119908

1

119908119900) In addition we define a hash function as follows

119891ℎ 0 1

lowast997888rarr 0 1

119897 (3)

where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891

ℎ then 119897

is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4

443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890

1and 1198903map to the

word ldquodayrdquo)

ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)

Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1

Then we have

ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)

In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed

8 The Scientific World Journal

Input the encryption key 119870119890= 119870priv the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871

1 119871

119899)

Method(1) compute (120574 119862) larr SSEsdotEnc(119870

119890 119863) Here 119862 = (119888

1 119888

119899)

(2) for each document 119889119894isin 119863 and the corresponding 119888

119894(1 le 119894 le 119899) do

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) for each word 119908119896(1 le 119896 le 119903) in119882 do

(5) compute 119905119896larr SSEsdotToken(119870

119890 119908119896)

(6) compute V119894119896larr 119891ℎ(119905119896)

(7) end for(8) let 119881

119894= (V1198941 V

119894119903)

(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(11) end for(12) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(13) end for(14) let 119866 = 120574 and 119871 = (119871

1 119871

119899) and output (119862 119866 119871)

Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)

Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871

1 119871

119899)

(5) 119879 search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888

1 119888

119909)

(2) for each token 119905119896(1 le 119896 le 119900) compute V

119896larr 119891ℎ(119905119896)

(3) let 1198811= 1198812= sdot sdot sdot = 119881

119909= (V1 V

119900) and

(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(7) end for

Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)

Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 2

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889

Algorithm 4 LSE scheme symmetric part

The Scientific World Journal 9

ASETest

Convert

FCFilter

Gi searchable structure

middot middot middot

L1 L2 middot middot middot Le

Li functional structure

T = (t1 t0)

Kpub s1 s2 sa

c1

c2

ci

cn

Figure 3 Data structure and search process for asymmetric setting

Table 1 Searchable encryption schemes with various functionalities

Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash

Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash

Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash

Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes

to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument

There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components

Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7

Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1

444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently

Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure

Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881

119879) from equal-size ran-

dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-

security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879

lowast Forasymmetric setting119881

119879is the hash of the searchable structure

119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881

119879is indistinguishable from the hash value

119881lowast

119879

5 Realizing Various Functionalities

In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities

51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 5: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

The Scientific World Journal 5

function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user

For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows

Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows

119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870

119890 119863) is a probabilistic algorithm

that takes as input an encryption key 119870119890(119870119890= 119870priv

for symmetric setting or 119870119890= 119870pub for asymmetric

setting) and a document collection 119863 = (1198891 119889

119899)

It outputs 119899 encrypted documents 119862 = (1198881 119888

119899)

a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866

1 119866

119899) corresponding to 119899 documents

and a sequence of local functional structures 119871 =

(1198711 119871

119899) corresponding to 119899 encrypted docu-

ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908

1 119908

119900) with functional instruc-

tions and outputs the corresponding search tokens119879 = (119905

1 119905

119900) with functional instructions It is run

by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic

algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888

1 119888

119898) It is run by the

server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user

Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query

For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908

1 119908

119900)

(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions

A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows

Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows

119871119889larr Build(119889 119881

119889) is an algorithm that takes as input

a document 119889 and the corresponding conversion 119881119889

and outputs a functional structure 119871119889 It is run by

the data sender and 119871119889is appended to the encrypted

document

1198621015840larr Filter(119862 119871 119881

119879) is an algorithm that takes as

input the encrypted documents 119862 = (1198621 119862

119909)

the corresponding functional structure set 119871 =

(1198711 119871

119909) and the converted search tokens 119881

119879=

(1198811 119881

119909) and outputs a subset of documents 1198621015840 It

is run by the server

43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models

The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else

6 The Scientific World Journal

the security of some components will be correlated such thatthe loose coupling property is lost

Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface

Definition 7 (plain trace) Let119863 = (1198891 119889

119899) be a document

collection Let 119873 = (1198731 119873

119899) (only for asymmetric

setting) be a keyword-counter set where119873119894is the number of

keywords in 119889119894 Let the query history119882 = (119908

1 119908

119901) be a

sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882

119894= 119882119895and 0 otherwiseThe plain

trace 120587(119863119882) = (|1198891| |119889

119899| 119873 120590(119882))

Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface

Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889

1 119889

119899) a sequence of query

119882 = (1199081 119908

119901) and receives (119862 119866) larr Enc(119870

119890 119863)

and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881

119879as the input for

the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-

ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast

119879as the input for the

functional component FinallyA returns a bit 119887 thatis output by the experiment

We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (1)

where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here

since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately

The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example

if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component

Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later

44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface

441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888

1 119888

119899 Each 119888

119894(1 le 119894 le 119899) is linked to a

global searchable structure 119866119894(if a global index is used then

only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871

119894 and only

the global searchable structure 119866(1198661 119866

119899) is used in this

step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881

119879and the matched 119909 encrypted documents

are transmitted to functional components FC1 FC

119890to

further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888

1 119888

119898are

returned to the user and the user decrypts them to obtainthe plaintexts 119889

1 119889

119898 In order to construct a functional

component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1

We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency

The Scientific World Journal 7

Input Token

User Search

Core (SSEASE) Converter

Outputm le x le n

Decrypt

Filter Convert

Filter

W = (w1 w2 w0) T = (t1 t2 t0)

c1 c2 cn

G(G1 G2 Gn)

L1 L2 Ln

d1 d2 dm

c1 c2 cm

c1 c2 cx

G(G1 G2 Gx)

L1 L2 Lx

VT = (V1 V2 Vx)

FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe

Figure 2 Search process of layered searchable encryption scheme

Build(119889 119881119889)

(1) input a document 119889 and the mappings of all words 119881119889in 119889

(2) specified according to the functionality(3) output a local functional structure 119871

119889

Filter(119862 119871 119881119879)

(1) input a set of encrypted document 119862 = (1198881 119888

119909) the corresponding local functional

structures 119871 = (1198711 119871

119909) and the mappings of the queried keywords 119881

119879= (1198811 119881

119909)

(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862

Algorithm 1 Template for functional component FC

rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms

442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905

1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo

and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as

ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)

Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings

Suppose there are 119899 documents 119863 = (1198891 119889

119899) 119899 cor-

responding ciphertexts 119862 = (1198881 119888

119899) 119890 functional compo-

nents FC= (1198651 119865

119890) and 119900 queried keywords119882 = (119908

1

119908119900) In addition we define a hash function as follows

119891ℎ 0 1

lowast997888rarr 0 1

119897 (3)

where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891

ℎ then 119897

is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4

443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890

1and 1198903map to the

word ldquodayrdquo)

ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)

Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1

Then we have

ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)

In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed

8 The Scientific World Journal

Input the encryption key 119870119890= 119870priv the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871

1 119871

119899)

Method(1) compute (120574 119862) larr SSEsdotEnc(119870

119890 119863) Here 119862 = (119888

1 119888

119899)

(2) for each document 119889119894isin 119863 and the corresponding 119888

119894(1 le 119894 le 119899) do

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) for each word 119908119896(1 le 119896 le 119903) in119882 do

(5) compute 119905119896larr SSEsdotToken(119870

119890 119908119896)

(6) compute V119894119896larr 119891ℎ(119905119896)

(7) end for(8) let 119881

119894= (V1198941 V

119894119903)

(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(11) end for(12) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(13) end for(14) let 119866 = 120574 and 119871 = (119871

1 119871

119899) and output (119862 119866 119871)

Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)

Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871

1 119871

119899)

(5) 119879 search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888

1 119888

119909)

(2) for each token 119905119896(1 le 119896 le 119900) compute V

119896larr 119891ℎ(119905119896)

(3) let 1198811= 1198812= sdot sdot sdot = 119881

119909= (V1 V

119900) and

(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(7) end for

Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)

Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 2

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889

Algorithm 4 LSE scheme symmetric part

The Scientific World Journal 9

ASETest

Convert

FCFilter

Gi searchable structure

middot middot middot

L1 L2 middot middot middot Le

Li functional structure

T = (t1 t0)

Kpub s1 s2 sa

c1

c2

ci

cn

Figure 3 Data structure and search process for asymmetric setting

Table 1 Searchable encryption schemes with various functionalities

Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash

Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash

Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash

Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes

to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument

There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components

Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7

Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1

444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently

Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure

Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881

119879) from equal-size ran-

dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-

security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879

lowast Forasymmetric setting119881

119879is the hash of the searchable structure

119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881

119879is indistinguishable from the hash value

119881lowast

119879

5 Realizing Various Functionalities

In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities

51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 6: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

6 The Scientific World Journal

the security of some components will be correlated such thatthe loose coupling property is lost

Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface

Definition 7 (plain trace) Let119863 = (1198891 119889

119899) be a document

collection Let 119873 = (1198731 119873

119899) (only for asymmetric

setting) be a keyword-counter set where119873119894is the number of

keywords in 119889119894 Let the query history119882 = (119908

1 119908

119901) be a

sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882

119894= 119882119895and 0 otherwiseThe plain

trace 120587(119863119882) = (|1198891| |119889

119899| 119873 120590(119882))

Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface

Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889

1 119889

119899) a sequence of query

119882 = (1199081 119908

119901) and receives (119862 119866) larr Enc(119870

119890 119863)

and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881

119879as the input for

the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-

ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast

119879as the input for the

functional component FinallyA returns a bit 119887 thatis output by the experiment

We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (1)

where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here

since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately

The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example

if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component

Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later

44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface

441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888

1 119888

119899 Each 119888

119894(1 le 119894 le 119899) is linked to a

global searchable structure 119866119894(if a global index is used then

only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871

119894 and only

the global searchable structure 119866(1198661 119866

119899) is used in this

step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881

119879and the matched 119909 encrypted documents

are transmitted to functional components FC1 FC

119890to

further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888

1 119888

119898are

returned to the user and the user decrypts them to obtainthe plaintexts 119889

1 119889

119898 In order to construct a functional

component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1

We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency

The Scientific World Journal 7

Input Token

User Search

Core (SSEASE) Converter

Outputm le x le n

Decrypt

Filter Convert

Filter

W = (w1 w2 w0) T = (t1 t2 t0)

c1 c2 cn

G(G1 G2 Gn)

L1 L2 Ln

d1 d2 dm

c1 c2 cm

c1 c2 cx

G(G1 G2 Gx)

L1 L2 Lx

VT = (V1 V2 Vx)

FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe

Figure 2 Search process of layered searchable encryption scheme

Build(119889 119881119889)

(1) input a document 119889 and the mappings of all words 119881119889in 119889

(2) specified according to the functionality(3) output a local functional structure 119871

119889

Filter(119862 119871 119881119879)

(1) input a set of encrypted document 119862 = (1198881 119888

119909) the corresponding local functional

structures 119871 = (1198711 119871

119909) and the mappings of the queried keywords 119881

119879= (1198811 119881

119909)

(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862

Algorithm 1 Template for functional component FC

rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms

442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905

1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo

and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as

ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)

Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings

Suppose there are 119899 documents 119863 = (1198891 119889

119899) 119899 cor-

responding ciphertexts 119862 = (1198881 119888

119899) 119890 functional compo-

nents FC= (1198651 119865

119890) and 119900 queried keywords119882 = (119908

1

119908119900) In addition we define a hash function as follows

119891ℎ 0 1

lowast997888rarr 0 1

119897 (3)

where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891

ℎ then 119897

is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4

443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890

1and 1198903map to the

word ldquodayrdquo)

ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)

Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1

Then we have

ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)

In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed

8 The Scientific World Journal

Input the encryption key 119870119890= 119870priv the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871

1 119871

119899)

Method(1) compute (120574 119862) larr SSEsdotEnc(119870

119890 119863) Here 119862 = (119888

1 119888

119899)

(2) for each document 119889119894isin 119863 and the corresponding 119888

119894(1 le 119894 le 119899) do

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) for each word 119908119896(1 le 119896 le 119903) in119882 do

(5) compute 119905119896larr SSEsdotToken(119870

119890 119908119896)

(6) compute V119894119896larr 119891ℎ(119905119896)

(7) end for(8) let 119881

119894= (V1198941 V

119894119903)

(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(11) end for(12) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(13) end for(14) let 119866 = 120574 and 119871 = (119871

1 119871

119899) and output (119862 119866 119871)

Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)

Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871

1 119871

119899)

(5) 119879 search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888

1 119888

119909)

(2) for each token 119905119896(1 le 119896 le 119900) compute V

119896larr 119891ℎ(119905119896)

(3) let 1198811= 1198812= sdot sdot sdot = 119881

119909= (V1 V

119900) and

(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(7) end for

Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)

Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 2

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889

Algorithm 4 LSE scheme symmetric part

The Scientific World Journal 9

ASETest

Convert

FCFilter

Gi searchable structure

middot middot middot

L1 L2 middot middot middot Le

Li functional structure

T = (t1 t0)

Kpub s1 s2 sa

c1

c2

ci

cn

Figure 3 Data structure and search process for asymmetric setting

Table 1 Searchable encryption schemes with various functionalities

Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash

Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash

Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash

Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes

to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument

There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components

Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7

Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1

444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently

Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure

Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881

119879) from equal-size ran-

dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-

security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879

lowast Forasymmetric setting119881

119879is the hash of the searchable structure

119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881

119879is indistinguishable from the hash value

119881lowast

119879

5 Realizing Various Functionalities

In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities

51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 7: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

The Scientific World Journal 7

Input Token

User Search

Core (SSEASE) Converter

Outputm le x le n

Decrypt

Filter Convert

Filter

W = (w1 w2 w0) T = (t1 t2 t0)

c1 c2 cn

G(G1 G2 Gn)

L1 L2 Ln

d1 d2 dm

c1 c2 cm

c1 c2 cx

G(G1 G2 Gx)

L1 L2 Lx

VT = (V1 V2 Vx)

FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe

Figure 2 Search process of layered searchable encryption scheme

Build(119889 119881119889)

(1) input a document 119889 and the mappings of all words 119881119889in 119889

(2) specified according to the functionality(3) output a local functional structure 119871

119889

Filter(119862 119871 119881119879)

(1) input a set of encrypted document 119862 = (1198881 119888

119909) the corresponding local functional

structures 119871 = (1198711 119871

119909) and the mappings of the queried keywords 119881

119879= (1198811 119881

119909)

(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862

Algorithm 1 Template for functional component FC

rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms

442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905

1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo

and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as

ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)

Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings

Suppose there are 119899 documents 119863 = (1198891 119889

119899) 119899 cor-

responding ciphertexts 119862 = (1198881 119888

119899) 119890 functional compo-

nents FC= (1198651 119865

119890) and 119900 queried keywords119882 = (119908

1

119908119900) In addition we define a hash function as follows

119891ℎ 0 1

lowast997888rarr 0 1

119897 (3)

where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891

ℎ then 119897

is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4

443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890

1and 1198903map to the

word ldquodayrdquo)

ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)

Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1

Then we have

ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)

In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed

8 The Scientific World Journal

Input the encryption key 119870119890= 119870priv the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871

1 119871

119899)

Method(1) compute (120574 119862) larr SSEsdotEnc(119870

119890 119863) Here 119862 = (119888

1 119888

119899)

(2) for each document 119889119894isin 119863 and the corresponding 119888

119894(1 le 119894 le 119899) do

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) for each word 119908119896(1 le 119896 le 119903) in119882 do

(5) compute 119905119896larr SSEsdotToken(119870

119890 119908119896)

(6) compute V119894119896larr 119891ℎ(119905119896)

(7) end for(8) let 119881

119894= (V1198941 V

119894119903)

(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(11) end for(12) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(13) end for(14) let 119866 = 120574 and 119871 = (119871

1 119871

119899) and output (119862 119866 119871)

Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)

Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871

1 119871

119899)

(5) 119879 search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888

1 119888

119909)

(2) for each token 119905119896(1 le 119896 le 119900) compute V

119896larr 119891ℎ(119905119896)

(3) let 1198811= 1198812= sdot sdot sdot = 119881

119909= (V1 V

119900) and

(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(7) end for

Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)

Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 2

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889

Algorithm 4 LSE scheme symmetric part

The Scientific World Journal 9

ASETest

Convert

FCFilter

Gi searchable structure

middot middot middot

L1 L2 middot middot middot Le

Li functional structure

T = (t1 t0)

Kpub s1 s2 sa

c1

c2

ci

cn

Figure 3 Data structure and search process for asymmetric setting

Table 1 Searchable encryption schemes with various functionalities

Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash

Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash

Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash

Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes

to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument

There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components

Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7

Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1

444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently

Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure

Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881

119879) from equal-size ran-

dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-

security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879

lowast Forasymmetric setting119881

119879is the hash of the searchable structure

119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881

119879is indistinguishable from the hash value

119881lowast

119879

5 Realizing Various Functionalities

In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities

51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 8: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

8 The Scientific World Journal

Input the encryption key 119870119890= 119870priv the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871

1 119871

119899)

Method(1) compute (120574 119862) larr SSEsdotEnc(119870

119890 119863) Here 119862 = (119888

1 119888

119899)

(2) for each document 119889119894isin 119863 and the corresponding 119888

119894(1 le 119894 le 119899) do

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) for each word 119908119896(1 le 119896 le 119903) in119882 do

(5) compute 119905119896larr SSEsdotToken(119870

119890 119908119896)

(6) compute V119894119896larr 119891ℎ(119905119896)

(7) end for(8) let 119881

119894= (V1198941 V

119894119903)

(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(11) end for(12) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(13) end for(14) let 119866 = 120574 and 119871 = (119871

1 119871

119899) and output (119862 119866 119871)

Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)

Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871

1 119871

119899)

(5) 119879 search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888

1 119888

119909)

(2) for each token 119905119896(1 le 119896 le 119900) compute V

119896larr 119891ℎ(119905119896)

(3) let 1198811= 1198812= sdot sdot sdot = 119881

119909= (V1 V

119900) and

(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(7) end for

Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)

Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 2

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889

Algorithm 4 LSE scheme symmetric part

The Scientific World Journal 9

ASETest

Convert

FCFilter

Gi searchable structure

middot middot middot

L1 L2 middot middot middot Le

Li functional structure

T = (t1 t0)

Kpub s1 s2 sa

c1

c2

ci

cn

Figure 3 Data structure and search process for asymmetric setting

Table 1 Searchable encryption schemes with various functionalities

Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash

Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash

Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash

Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes

to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument

There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components

Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7

Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1

444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently

Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure

Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881

119879) from equal-size ran-

dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-

security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879

lowast Forasymmetric setting119881

119879is the hash of the searchable structure

119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881

119879is indistinguishable from the hash value

119881lowast

119879

5 Realizing Various Functionalities

In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities

51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 9: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

The Scientific World Journal 9

ASETest

Convert

FCFilter

Gi searchable structure

middot middot middot

L1 L2 middot middot middot Le

Li functional structure

T = (t1 t0)

Kpub s1 s2 sa

c1

c2

ci

cn

Figure 3 Data structure and search process for asymmetric setting

Table 1 Searchable encryption schemes with various functionalities

Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash

Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash

Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash

Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes

to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument

There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components

Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7

Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1

444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently

Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure

Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881

119879) from equal-size ran-

dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-

security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879

lowast Forasymmetric setting119881

119879is the hash of the searchable structure

119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881

119879is indistinguishable from the hash value

119881lowast

119879

5 Realizing Various Functionalities

In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities

51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 10: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

10 The Scientific World Journal

Input encryption key 119870119890= 119870pub the documents119863 = (119889

1 119889

119899)

Output(1) 119862 encrypted documents 119862 = (119888

1 119888

119899)

(2) 119866 global searchable structures 119866 = (1198661 119866

119899)

(3) 119871 local functional structures 119871 = (1198711 119871

119899)

Method(1) for each document 119889

119894(1 le 119894 le 119899) in119863 do

(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)

(3) scan 119889119894for all 119903 words to form a word list119882 = (119908

1 119908

119903)

(4) extract 119886 distinct keywords1198821015840 = (1199081 119908

119886) from119882

(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do

(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)

(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)

(8) end for(9) let 119866

119894= (1199041198941 119904

119894119886)map to119867

119894= (ℎ1198941 ℎ

119894119886)map to1198821015840

(10) for each word 119908119910(1 le 119910 le 119903) in119882 do

(11) find the ℎ isin 119867119894that the corresponding word 119908

119910isin 1198821015840

(12) set V119894119910= ℎ

(13) end for(14) let 119881

119894= (V1198941 V

119894119903)

(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895

119894larr FCjsdotBuild(119889119894 119881119894)

(17) end for(18) append 119871

119894= (1198711

119894 119871

119890

119894) to 119888119894

(19) end for(20) output 119862 = (119888

1 119888

119899) 119866 = (119866

1 119866

119899) 119871 = (119871

1 119871

119899)

Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)

Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866

1 119866

119899)

(4) 119871 local functional structures 119871 = (1198711 119871

119899)

(5) 119879 the search tokens 119879 = (1199051 119905

119900)

Output matched documents 1198621015840 = (1198881 119888

119898)

Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862

1015840= (1198881 119888

119909)

(2) for each 119888119894isin 1198621015840 and the functional structure 119871

119894(1 le 119894 le 119909) do

(3) let 119866119894= (1199041198941 119904

119894119886) denote the searchable encryptions of 119888

119894

where 119886 is the number of keywords in 119888119894

(4) for each 119905119910(1 le 119910 le 119900) in 119879 do

(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904

(6) compute V119894119910larr 119891ℎ(119904)

(7) end for(8) let 119881

119894= (V1198941 V

119894119900) 119871119894= (1198711

119894 119871

119890

119894)

(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888

1 119888

119909) then the corresponding 1198711015840 = (119871

1 119871

119909)

and 119881119879= (1198811 119881

119909) Let 119871119895 = (119871119895

1 119871

119895

119909)

(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)

(13) end for

Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 11: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

The Scientific World Journal 11

Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870

119890 119863) described in Algorithm 5

Token(119870priv119882)(1) for each keyword 119908

119896(1 le 119896 le 119900) in119882 do

(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)

(3) end for(4) output 119879 = (119905

1 119905

119900)

Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889

Algorithm 7 LSE scheme asymmetric part

(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high

52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity

521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]

A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return

the associated value in constant look-up time and returnotherwise

522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary

We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889

1 119889

119899) For each document

119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the

relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900

119896) for

each keyword119908119896119894isin 119882 in 119889

119896 and record a 119900119896times3matrix for 119889

119896

with the 119894th line recording 119877119896119894= (119908119896

119894 119904119896

119894 119901119896

119894) where 119901119896

119894is the

position where the first 119908119896119894occurs For all documents setup

the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900

119899 numbers (1199041 119904

119873)

For each number 119904119895(1 le 119895 le 119873) the encryption is 119890

119895

Transform the previous matrix to an OPE table with the 119894thline recording 119877119896

119894= (119908119896

119894 119890119896

119894 119901119896

119894) where 119890119896

119894is the encryption of

119904119896

119894For a document it has at most |119889|2 + 1 keywords

(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8

523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863

1 1198632with equal size and |119863

1| = |119863

2| then the chal-

lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of

the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows

Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 12: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

12 The Scientific World Journal

Build(119889 119881119889)

(1) input a document 119889 and the mapping 119881119889= (V1 V

119903) for 119903 words

(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908

119900 119890119900 119901119900))

(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894

(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871

119889= 119860

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures

119871 = (1198711 119871

119899) and the mappings of the queried keywords

119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903

119899= 119871119899[V119899]

and select the top 119896 results 1198881 119888

119896corresponding to 119903

1 119903

119896

(3) output 1198621015840 = (1198881 119888

119896)

Algorithm 8 Ranked keyword query component

the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a documentcollection119863 = (119889

1 119889

119899) (the size of each document

is fixed) and receives the encrypted documents 119862 =

(1198881 119888

119899) and functional structures 119871 = (119871

1 119871

119899)

with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889

1 119908 isin 119889

119899and

receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the

size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment

We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (6)

where the probabilities are over the coins of Gen

Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure

Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast

1 119888

lowast

119899of size |119889|

As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast

119898with each has size |V|S generates an119898times 119899matrix

119864119898times119899

= (119890lowast

119894119895) where each element 119890lowast

119894119895is a random number

Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast

119894119895

(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast

We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption

key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V

119894isin 119881 or Vlowast = Vlowast

119894isin 119881lowast the adversaryA could

invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=

1198711[V119894] = 119890

1 119903

119899= 119871119899[V119894] = 119890

119899) or (119903lowast

1= 119871lowast

1[Vlowast119894] =

119890lowast

1 119903

lowast

119899= 119871lowast

119899[Vlowast119894] = 119890

lowast

119899) POPF-CCA security guarantees

that the set (1199031 119903

119899) is indistinguishable from (119903

lowast

1 119903

lowast

119899)

that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871

lowast

53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908

Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and

the functional structure could record all possible predicatesand provide predicate test using a bloom filter

531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904

1 119904

119899) The set 119878 is

coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ

1 ℎ

119903

where each ℎ119894 0 1

lowastrarr [1119898] for 1 le 119894 le 119903 For each

element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions

ℎ1(119904119896) ℎ

119903(119904119896) to 1 Note that a location could be set to 1

multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ

1(119904) ℎ

119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878

Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 13: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

The Scientific World Journal 13

Build(119889 119881119889)

(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873)

and the mapping 119881119889= (V1 V

2119873+1) Here 119889 is the transformed form for the label 119886 = 119886

119894

(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910

1= ℎ1(119889 V

119896) 119910

119903= ℎ119903(119889 V

119896)

(5) insert the codewords 1199101 119910

119903into the bloom filter 119861

(6) end for(7) output a local functional structure 119871

119889= 119861

Filter(119862 119871 119881119879)

(1) input 119899 ciphertexts 119862 = (1198881 119888

119899) the corresponding functional structures 119871 = (119871

1 119871

119899)

the mappings of the queried keywords 119881119879= (1198811 119881

119899) = (V

1 V

119899) (single keyword)

where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo

(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910

1= ℎ1(119889 V

119894) 119910

119903= ℎ119903(119889 V

119894)

(4) if all 119903 locations 1199101 119910

119903in bloom filter 119861

119894= 119871119894are 1 then add 119888

119894to 1198621015840

(5) end for(6) output 1198621015840

Algorithm 9 Range query component

In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910

1 119910

119899) to denote 119909 gt 119910

1 119909 gt 119910

119899for

simplicity

532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators

We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886

1 119886

119873) where 119886

1lt 1198862lt sdot sdot sdot lt 119886

119873

Then we set five shared virtual documents 11988910158401= (gt 119886

1 gt

1198862 gt 119886

119873minus1) 11988910158402= (ge 119886

1 ge 1198862 ge 119886

119873) 11988910158403= (lt 119886

2 lt

1198863 lt 119886

119873) 11988910158404= (le 119886

1 le 1198862 le 119886

119873) and 1198891015840

5= (= 119886

1 =

1198862 = 119886

119873) for all userrsquos documents The virtual document

could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886

119894rdquowhere 1 le 119894 le 119873minus1

there always exists a mapping V119894

Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886

119894minus1lt 119886 lt 119886

119894+1(or 119886 = 119886

119894) could

be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886

119894minus1

ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886

119873 le 119886119873) and

these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886

1lt 119908 lt 119886

119873 then the

query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for

simplicity) suppose we have two documents 1198881 1198882labeled

5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt

2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-

mits gt7 then only 1198612matches the query which is the same

result as direct comparisons since 5 gt 7 and 10 gt 7 and

then 1198882is returned Similarly the query ldquogt3rdquo will match both

documents and 1198881 1198882are returned

Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886

1 119886

119873) denote the domain of

the label and setup the bloom filter with 119903 independent hashfunctions ℎ

1 ℎ

119903 The identifier 119889 of a document is always

bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9

The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult

533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886

1 1198862 respectively

the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The

adversary is allowed to adaptively query 119901 keywords 119882 =

(1199081 119908

119901) where each 119908

119894isin 119882 that (119886

1 1198862) gt 119908119894 Note that

querying 119908119894that 119886

1gt 119908119894 1198862

gt 119908119894is not allowed since the

document is immediately distinguished (only the documentwith 119886

1gt 119908119894is matched and returned) We propose the

notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 14: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

14 The Scientific World Journal

Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator

RealΣA(119896) the challenger runs Gen(1119896) to generate

the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908

1 gt119908

119901) For each query ldquogt 119908

119894rdquo A re-

ceives a mapping V119894from the challenger Finally A

returns a bit 119887 that is output by the experiment

SimΣAS(119896) given the document size |119889| the cardi-

nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast

1 Vlowast

119901) and then

sends the results to A Finally A returns a bit 119887 thatis output by the experiment

We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat

1003816100381610038161003816Pr [Real

ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]

1003816100381610038161003816

le negl (119896) (7)

where the probabilities are over the coins of Gen

Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure

Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast

119901)

as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889

lowast

and 2119873+1 distinct andrandom strings 119879 = (119905

1 119905

2119873+1) For each 119905

119894isin 119879 S com-

putes 119903 codewords 119910lowast1= ℎ1(119889

lowast

||119905119894) 119910

lowast

119903= ℎ119903(119889

lowast

||119905119894) and

inserts the codewords 119910lowast1 119910

lowast

119903into a bloom filter 119861lowast Let

119871lowast= 119861lowast As to119881lowast for each Vlowast

119894isin 119881lowastS randomly selects a dis-

tinct 119905119895isin 119879maps to Vlowast

119894 such that Vlowast

119894= 119905119895 Note that if Vlowast

119909= Vlowast119910

for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could

distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V

119894isin 119881 is indistinguishable from

the random string Vlowast119894such that 119881 is indistinguishable from

119881lowast Therefore the locations (119910

1 119910

119903) of V119894in bloom filter

119861 is indistinguishable from the locations (119910lowast1 119910

lowast

119903) of Vlowast

119894

in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871

lowast

54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components

We briefly introduce how to realize some other functionalitiesbased on LSE as follows

Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)

Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords

Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme

Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same

55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 15: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

The Scientific World Journal 15

05 1 15 20

100

200

300

400

500

600

700

800

900

Number of documents (million)

Runn

ing

time (

mill

iseco

nds)

Ranked keyword queryRange query

Figure 4 Time costs of the filter algorithms (single server singlequery)

User

Documents

RankRange

Core

Rank Range

Range

Core

Searchable and functional structures

Tokens

Figure 5 Deploying functional components to multiple servers

component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4

Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further

6 Conclusions

Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)

References

[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010

[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007

[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010

[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007

[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003

[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000

[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003

[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006

[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 16: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

16 The Scientific World Journal

[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research

[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004

[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008

[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011

[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010

[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004

[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011

[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011

[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012

[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012

[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012

[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012

[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007

[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007

[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992

[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo

IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012

[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010

[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009

[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004

[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011

[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984

[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970

[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002

[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004

[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999

[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999

[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Page 17: Research Article A Layered Searchable Encryption Scheme ...downloads.hindawi.com/journals/tswj/2014/153791.pdf · A Layered Searchable Encryption Scheme with Functional Components

Submit your manuscripts athttpwwwhindawicom

Computer Games Technology

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Distributed Sensor Networks

International Journal of

Advances in

FuzzySystems

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014

International Journal of

ReconfigurableComputing

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Applied Computational Intelligence and Soft Computing

thinspAdvancesthinspinthinsp

Artificial Intelligence

HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014

Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Journal of

Computer Networks and Communications

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation

httpwwwhindawicom Volume 2014

Advances in

Multimedia

International Journal of

Biomedical Imaging

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

ArtificialNeural Systems

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Computational Intelligence and Neuroscience

Industrial EngineeringJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Human-ComputerInteraction

Advances in

Computer EngineeringAdvances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014