Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Research ArticleA Layered Searchable Encryption Scheme with FunctionalComponents Independent of Encryption Methods
Guangchun Luo Ningduo Peng Ke Qin and Aiguo Chen
School of Computer Science and Engineering University of Electronic Science and Technology of China ChengduSichuan 611731 China
Correspondence should be addressed to Ningduo Peng nindo academia163com
Received 18 October 2013 Accepted 8 January 2014 Published 25 February 2014
Academic Editors H-E Tseng and G Wei
Copyright copy 2014 Guangchun Luo et alThis is an open access article distributed under theCreative CommonsAttribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited
Searchable encryption technique enables the users to securely store and search their documents over the remote semitrusted serverwhich is especially suitable for protecting sensitive data in the cloud However various settings (based on symmetric or asymmetricencryption) and functionalities (ranked keyword query range query phrase query etc) are often realized by different methodswith different searchable structures that are generally not compatible with each other which limits the scope of application andhinders the functional extensions We prove that asymmetric searchable structure could be converted to symmetric structure andfunctions could be modeled separately apart from the core searchable structure Based on this observation we propose a layeredsearchable encryption (LSE) scheme which provides compatibility flexibility and security for various settings and functionalitiesIn this scheme the outputs of the core searchable component based on either symmetric or asymmetric setting are converted tosome uniformmappings which are then transmitted to loosely coupled functional components to further filter the results In sucha way all functional components could directly support both symmetric and asymmetric settings Based on LSE we propose tworepresentative and novel constructions for ranked keyword query (previously only available in symmetric scheme) and range query(previously only available in asymmetric scheme)
1 Introduction
Cloud storage provides an elastic highly available easilyaccessible and cheap repository to users to store and usetheir data and such a convenient way attracts more andmore people In many cases the users require their sensitivedata such as business documents to be secure against anyadversary or even the cloud provider and therefore all datamust be encrypted before sending to the server [1] Howevertraditional encryption schemes (eg DES) do not provide anyfunctionality to the users such that searching for the desireddocuments by keywords as a basic function for storagesystem is quite impossible The problem is that there is noway to know if there exists such keywords in an encrypteddocument without decryption and apparently the servershould not have the decryption key
Searchable encryption technique provides a solution tosuch problem It enables the users to encrypt their sensitivedata and store it to the remote server while retaining theability to search by keywordsWhile searching the user sendsto the server a secret token (a transformation of the queriedkeywords) then the server uses the token to search over theencrypted data and returns the matched documents Duringthe process the server does not know what the queriedkeywords and the document contents are and therefore theprivacy is guaranteed
Many searchable encryption schemes have been proposedwith various settings and functionalities For symmetricsearchable encryption schemes the user encrypts searchesand decrypts the documents using hisher private symmetrickey For asymmetric searchable encryption schemes the datasender encrypts the documents using the userrsquos public key
Hindawi Publishing Corporatione Scientific World JournalVolume 2014 Article ID 153791 16 pageshttpdxdoiorg1011552014153791
2 The Scientific World Journal
and the user searches and decrypts the documents usingthe private key Beyond the basic keyword matching manyfunctions are also added to either symmetric or asymmetricsetting such as range query phrase query and fuzzy keywordquery
However these functions are often realized by differ-ent methods with different searchable structures which aregenerally not compatible with each other For example theasymmetric encryption scheme introduced in [2] realizedconjunctive subset and range queries However it is difficultto figure out how to apply this method to symmetric settingEven for the same setting such as the fuzzy query schemeintroduced in [3] and the rank-ordered query scheme intro-duced in [4] it is difficult to figure out how to combine twomethods together since the functions are constructed basedon different indexing structures
Layered searchable encryption (LSE) scheme aims toprovide compatibility flexibility and security for varioussettings and functionalities In this new framework key-words are firstly transformed to tokens that are filtered bythe core searchable component (symmetric or asymmetricsetting) and then the tokens are dynamically converted touniform mappings which are transmitted to many stand-alone functional components (eg ranked keyword querycomponent fuzzy query component etc) to further filterthe results Since all functional components are independentof each other and the interfaces are common the functionsare compatible with each other and directly support bothsymmetric and asymmetric settings and adding or deleting afunction is quite simple since each function is loosely coupledwith the core searchable component Furthermore LSEsupports combined query For example the query ldquoSELECTlowast WHERE keywords = ldquocloud storage encryptionrdquo ANDldquosecurity classification gt 5rdquo ORDERED BY ldquokeywordcloudrdquordquo(to express the query we adopt the SQL-like format used indatabase) is a combination of three functional componentsbasic query range query and ranked keyword query (inthis paper we will present the concrete construction for thisexample)
Furthermore this framework is similar to the data streamprocessing architecture [5] where functional componentscould be treated as operator boxes and the whole schemecould be treated as a data-flow system by which all processesfollow the popular boxes and arrows paradigmTherefore incomparison to the previous searchable encryption schemesLSE is more suitable for distributed and parallel computingenvironment
In this paper our contributions are the following (1)Wepropose a novel framework for designing searchable encryp-tion scheme called layered searchable encryption (LSE)which enables combined query and provides compatibilityflexibility and security for various settings and function-alities The new framework consists of a core searchablecomponent with a symmetricasymmetric converter manyfunctional components and a common interface with newsecurity model (2) We propose a concrete construction forLSE that could theoretically combine all possible function-alities which are proposed in the recent years and proveits semantic security for the interface (3) As a complement
for the prior works we formally define two new securitymodels for ranked keyword query and range query calledsemantic security against chosen ranked keyword attack(CRKA) and chosen range attack (CRA) respectively whichprovide integral security models for cryptographic analysis(4) Based on LSE we propose two representative and novelconstructions for ranked keyword query component (previ-ously only available in symmetric scheme) and range querycomponent (previously only available in asymmetric scheme)and prove them semantically secure under the new securitymodels
The rest of the paper is organized as follows Section 2presents the related work Section 3 presents the notationsand preliminaries Section 4 presents the layered searchableencryption scheme and the concrete construction Section 5discusses how to realize various functionalities and presentsthe concrete constructions for ranked keyword query andrange query Section 6 concludes this paper
2 Related Work
Searchable encryption schemes are designed to help the usersto securely search over the encrypted data by keywordsThe first scheme was introduced in [6] by Song et al andlater on many index-based symmetric searchable encryption(SSE) schemes were proposed Goh introduced the firstsecure index in [7] and they also built the security modelfor searchable encryption called Adaptive Chosen KeywordAttack (IND-CKA) In [8] Curtmola et al introduced twoconstructions to realize symmetric searchable encryptionthe first construction (named SSE-1) is nonadaptive andthe second one (named SSE-2) is adaptive A generalizationfor symmetric searchable encryption was introduced in [9]and a representative SSE system designed by Microsoft wasintroduced in [10] Another type of searchable encryptionnamed asymmetric searchable encryption (ASE) is public-key based which allows the user to search over the dataencrypted by some data senders using the public key of theuser The first scheme was introduced in [11] by Boneh etal based on bilinear maps and the improved definition wasintroduced in [12]
There are many functional extensions for the search-able encryption schemes beyond the basic precise keywordmatching For symmetric setting the authors in [4 13 14]introduced ranked keyword search schemes based on order-preserving encryption technique or two-round protocolwhich allows the server to only return the top-119896 relevantresults to the user In [15] Golle et al introduced a schemesupporting conjunctive keyword searchwhich allows the userto searchmultiple keywords in a single query In [3 16 17] theauthors introduced fuzzy keyword search schemes based onwildcard technique which allows the user to submit only partof the precise keyword Similar to fuzzy keyword search butdifferent the authors in [18 19] introduced similarity searchschemes based on wildcard technique which allows theserver to return the results similar to the queried keyword In[20 21] the authors introduced phrase query schemes basedon trusted client-side server or binary search which allows
The Scientific World Journal 3
the user to query a phrase instead of multiple independentkeywords For asymmetric setting the authors in [22 23]introduced range query schemes In addition Boneh et alalso introduced conjunctive and subset query in [22] basedon bilinear maps
Note that most of these techniques are not compatiblewith each other due to specific data structure andmathemati-cal property However in the following sections wewill provethat functional structures and searchable structures could beseparately constructed and asymmetric structures could beconverted to symmetric structures such that a compatible all-in-one scheme is possible
3 Notations and Preliminaries
We write 119909larr119880119883 to denote sampling element 119909 uniformly
random from a set 119883 and write 119909 larr A to denote the outputof an algorithmAWewrite 119886 119887 to denote the concatenationof two strings 119886 and 119887 We write |119860| to denote its cardinality if119860 is a set and write |119886| to denote its bit length if 119886 is a stringA function 120583(119896) N rarr R is negligible if for every positivepolynomial 119901(sdot) there exists an inter 119873 gt 0 such that for all119896 gt 119873 |120583(119896)| lt 1119901(119896) We write poly(119896) and negl(119896) todenote polynomial and negligible functions in 119896 respectively
We write Δ = (1199081 119908
119899) to denote a dictionary of 119899
words in lexicographic order We assume that all words areof length polynomial in 119896 We write 119889 to refer to a documentthat contains poly(119896) words and write |119889| to denote the sizeof the document in bytes In some cases we also write 119889 todenote the document identifier that uniquely identifies thedocument such as amemory locationWewrite X to denote acomponent or a scheme and write Xfunc( ) to denote thecorresponding function for the component or an algorithmin the scheme
4 Layered Searchable Encryption Scheme
Layered searchable encryption scheme aims to combinesymmetric and asymmetric searchable encryption schemesto provide a uniformmodel for functional extensionsThere-fore we first revisit the basic symmetric and asymmetricsearchable encryption models and then build the layeredsearchable encryption model based on these two differentsettings After that we introduce the security model ofthe new framework and finally we present the concreteconstruction
41 Revisiting Searchable Encryption Weadopt the definitionintroduced by Curtmola et al in [8] as a representativemodel for symmetric searchable encryption scheme In thissetting the user who searches for the documents is also thedata sender who encrypts the documents Therefore someefficient searching techniques such as using a global indexare used and the searchable structure may be a single indexfile for all stored documents For consistency with otherdefinitions we make a little modification for the originaldefinition and define the scheme as follows
Definition 1 (symmetric searchable encryption) A symmet-ric searchable encryption (SSE) scheme is a collection offive polynomial-time algorithms SSE = (Gen Enc TokenSearch Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takes asinput a security parameter 119896 and outputs a secret key119870 It is run by the user and the key is kept secret(120574 119862) larr Enc(119870119863) is a probabilistic algorithmthat takes as input a secret key 119870 and a documentcollection 119863 = (119889
1 119889
119899) and outputs a searchable
structure 120574 and a sequence of encrypted documents119862 = (119888
1 119888
119899) It enables a user to query some key-
words and the server returns thematched documentsFor instance in an index-based symmetric searchableencryption scheme 120574 is the secure index It is run bythe user and (120574 119862) is sent to the server119905 larr Token(119870 119908) is a deterministic algorithm thattakes as input a secret key 119870 and a keyword 119908 andoutputs a search token 119905 (also named trapdoor orcapacity) It is run by the user1198621015840larr Enc(119862 120574 119905) is a deterministic algorithm that
takes as input the encrypted documents 119862 thesearchable structure 120574 and the search token 119905 andoutputs the matched documents (or identifiers) 1198621015840 =(1198881 119888
119898) It is run by the server and1198621015840 is sent to the
user119889 larr Dec(119870 119888) is a deterministic algorithm that takesas input a secret key119870 and the encrypted document 119888and outputs the recovered plaintext 119889 It is run by theuser
We adopt the definition introduced by Boneh et al in[11] as a representative model for asymmetric searchableencryption scheme In this setting the user generates thepublic key and the private key The data sender encrypts thedata using the public key and the user searches and decryptsthe data using the private key The original definition onlycontains the searchable part for consistency we add twoalgorithms and define the asymmetric searchable encryptionas follows
Definition 2 (asymmetric searchable encryption) An asym-metric searchable encryption (ASE) scheme is a collection ofseven polynomial-time algorithms ASE = (Gen PEKS EncToken Test Search Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs a pub-licprivate key pair 119870 = (119870pub 119870priv) It is run by theuser and only119870priv is kept secret119904 larr PEKS(119870pub 119908) is a probabilistic algorithm thattakes as input a public key 119870pub and a word 119908 andoutputs a searchable structure 119904 It is run by the datasender and 119904 is attached to the encryptedmessage andthe combination is sent to the server119888 larr Enc(119870pub 119889) is a probabilistic algorithm thattakes as input a public key 119870pub and a document
4 The Scientific World Journal
(message) and outputs the ciphertext 119888 It is run bythe data sender and 119888 (followed bymultiple searchablestructures) is sent to the server
119905 larr Token(119870priv 119908) is a deterministic algorithm thattakes as input a private key119870priv and a keyword119908 andoutputs a search token 119905 It is run by the user
119887 larr Test(119870pub 119904 119905) is a deterministic algorithmthat takes as input the public key 119870pub a searchablestructure 119904 larrPEKS(119870pub 119908
1015840) and a search token 119905 larr
Token(119870priv 119908) and outputs 119887 = 1 if 119908 = 1199081015840 or 119887 = 0otherwise It is run by the server
1198621015840larr Search(119870pub 119862 119878 119905) is a deterministic algo-
rithm that takes as input the public key 119870pub theencrypted documents 119862 = (119862
1 119862
119899) the corre-
sponding searchable structure set 119878 = (1198781 119878
119899)
(each 119878119894containsmultiple searchable structures corre-
sponding to the keywords of the document) and thesearch token 119905 and outputs the matched documents1198621015840= (1198881 119888
119898) (the documentsrsquo searchable struc-
tures satisfying 1 larr Test(119870pub 119904 119905)) It is run by theserver and 1198621015840 is sent to the user
119889 larr Dec(119870priv 119888) is a probabilistic algorithm thattakes as input a private key 119870priv and a ciphertext 119888and outputs the plaintext 119889 It is run by the user
Unlike symmetric setting the definition of asymmetricsetting only works on a single document For a documentcollection it does not make any difference since the usercould execute the encryption algorithm for each documentrespectively
By comparing the definitions of the two different settingsthere exists a common link between the queried keywordsand the matched documents the searchable structure whichis constructed using either symmetric key or the public keyNote that the structure is probabilistic in the asymmetricsetting or else the server could directly launch the chosenplaintext attack using the public key However we say that forsymmetric and asymmetric settings the searchable structuresare both run-time deterministic To prove this property wefirst introduce a lemma as follows
Lemma 3 For asymmetric setting if the token 119905 generatedusing the private key 119870priv is deterministic then the searchablestructure 119904 encrypted using the public key 119870pub is run-time de-terministic when the the algorithm ASETest outputs 1 even ifthe encryption is probabilistic
Proof Recall that the algorithm 119905 larr Token(119870priv 119908) is deter-ministic and 119904 larr PEKS(119870pub 119908) is probabilistic Howeverfor a single document there only exists a single 119904 that linksto 119908 When 1 larr Test(119870pub 119904 119905) it implies that 119905 matches119904 We replace 119905 with 119904 then the token 119905 = 119904 which could begenerated by the data sender who could generate the tokenusing the public key 119905 larr Token1015840(119870pub 119908)which is in fact thealgorithm PEKS(119870pub 119908) It seems that the data sender hasindirectly generated the token without having the private key
Keywords
Tokenization
Converter
Mapping
Func1
Documents
Decryption
Core (symmetric asymmetric)
Filter
Func 2
Global layer
Local layerFunc
Figure 1 Architecture for layered searchable encryption scheme
Therefore when the output of the test is 1 both the token andthe searchable structure map to 119905 which is deterministic
Based on the lemma above we introduce a theoremwhichguides us to construct the converter in the layered searchableencryption scheme
Theorem 4 (run-time invariance) For both symmetric andasymmetric settings if the search token 119905 is deterministic thenthe searchable structure is run-time deterministic
Proof As proved in Lemma 3 the searchable structure is run-time deterministic for asymmetric setting For symmetricsetting the searchable structure is encrypted using the sym-metric key 120574 larr Enc(119870119863) which is probabilistic The token119905 larr Token(119870 119908) is deterministic Similar to asymmetricsetting when executing the deterministic algorithm Searchthematched entries (probabilistic)map to 119905 and themappingis deterministic (here an entry is the encrypted data usingsymmetric encryption that contains the information aboutthe matched document such as the node in the invertedindex [8]) In other words the searchable structure is run-time deterministic because of the deterministicmapping
42 Scheme Definition Our primary goal is to separate thefunctionalities from the searchable structures therefore weconsider to construct the basic searchable structures andvarious functions in different layers as shown in Figure 1
(i) Global layer we name this layer ldquoglobalrdquo because alldocuments and all searchable structures are involvedIn this layer the basic searchable encryption scheme(symmetric or asymmetric) is executed and a globalindex could be constructed to improve search effi-ciency The server receives the search tokens (eachtoken is related to a keyword) executes the searchprocedure and outputs the matched documents Fur-thermore the server converts the tokens (symmet-ric or asymmetric) to the corresponding mappings(another type of secret token) with uniform formatand transfers the mappings with the matched docu-ments (or identifiers) to the local layer
(ii) Local layer we name this layer ldquolocalrdquo because func-tional structures are constructed for each documentindependently In this layer each matched documentis further filtered by all functions (eg phrase query
The Scientific World Journal 5
function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user
For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows
Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870
119890 119863) is a probabilistic algorithm
that takes as input an encryption key 119870119890(119870119890= 119870priv
for symmetric setting or 119870119890= 119870pub for asymmetric
setting) and a document collection 119863 = (1198891 119889
119899)
It outputs 119899 encrypted documents 119862 = (1198881 119888
119899)
a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866
1 119866
119899) corresponding to 119899 documents
and a sequence of local functional structures 119871 =
(1198711 119871
119899) corresponding to 119899 encrypted docu-
ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908
1 119908
119900) with functional instruc-
tions and outputs the corresponding search tokens119879 = (119905
1 119905
119900) with functional instructions It is run
by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic
algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888
1 119888
119898) It is run by the
server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user
Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query
For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908
1 119908
119900)
(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions
A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows
Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows
119871119889larr Build(119889 119881
119889) is an algorithm that takes as input
a document 119889 and the corresponding conversion 119881119889
and outputs a functional structure 119871119889 It is run by
the data sender and 119871119889is appended to the encrypted
document
1198621015840larr Filter(119862 119871 119881
119879) is an algorithm that takes as
input the encrypted documents 119862 = (1198621 119862
119909)
the corresponding functional structure set 119871 =
(1198711 119871
119909) and the converted search tokens 119881
119879=
(1198811 119881
119909) and outputs a subset of documents 1198621015840 It
is run by the server
43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models
The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else
6 The Scientific World Journal
the security of some components will be correlated such thatthe loose coupling property is lost
Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface
Definition 7 (plain trace) Let119863 = (1198891 119889
119899) be a document
collection Let 119873 = (1198731 119873
119899) (only for asymmetric
setting) be a keyword-counter set where119873119894is the number of
keywords in 119889119894 Let the query history119882 = (119908
1 119908
119901) be a
sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882
119894= 119882119895and 0 otherwiseThe plain
trace 120587(119863119882) = (|1198891| |119889
119899| 119873 120590(119882))
Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface
Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889
1 119889
119899) a sequence of query
119882 = (1199081 119908
119901) and receives (119862 119866) larr Enc(119870
119890 119863)
and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881
119879as the input for
the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-
ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast
119879as the input for the
functional component FinallyA returns a bit 119887 thatis output by the experiment
We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (1)
where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here
since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately
The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example
if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component
Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later
44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface
441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888
1 119888
119899 Each 119888
119894(1 le 119894 le 119899) is linked to a
global searchable structure 119866119894(if a global index is used then
only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871
119894 and only
the global searchable structure 119866(1198661 119866
119899) is used in this
step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881
119879and the matched 119909 encrypted documents
are transmitted to functional components FC1 FC
119890to
further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888
1 119888
119898are
returned to the user and the user decrypts them to obtainthe plaintexts 119889
1 119889
119898 In order to construct a functional
component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1
We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency
The Scientific World Journal 7
Input Token
User Search
Core (SSEASE) Converter
Outputm le x le n
Decrypt
Filter Convert
Filter
W = (w1 w2 w0) T = (t1 t2 t0)
c1 c2 cn
G(G1 G2 Gn)
L1 L2 Ln
d1 d2 dm
c1 c2 cm
c1 c2 cx
G(G1 G2 Gx)
L1 L2 Lx
VT = (V1 V2 Vx)
FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe
Figure 2 Search process of layered searchable encryption scheme
Build(119889 119881119889)
(1) input a document 119889 and the mappings of all words 119881119889in 119889
(2) specified according to the functionality(3) output a local functional structure 119871
119889
Filter(119862 119871 119881119879)
(1) input a set of encrypted document 119862 = (1198881 119888
119909) the corresponding local functional
structures 119871 = (1198711 119871
119909) and the mappings of the queried keywords 119881
119879= (1198811 119881
119909)
(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862
Algorithm 1 Template for functional component FC
rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms
442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905
1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo
and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as
ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)
Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings
Suppose there are 119899 documents 119863 = (1198891 119889
119899) 119899 cor-
responding ciphertexts 119862 = (1198881 119888
119899) 119890 functional compo-
nents FC= (1198651 119865
119890) and 119900 queried keywords119882 = (119908
1
119908119900) In addition we define a hash function as follows
119891ℎ 0 1
lowast997888rarr 0 1
119897 (3)
where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891
ℎ then 119897
is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4
443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890
1and 1198903map to the
word ldquodayrdquo)
ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)
Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1
Then we have
ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)
In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed
8 The Scientific World Journal
Input the encryption key 119870119890= 119870priv the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871
1 119871
119899)
Method(1) compute (120574 119862) larr SSEsdotEnc(119870
119890 119863) Here 119862 = (119888
1 119888
119899)
(2) for each document 119889119894isin 119863 and the corresponding 119888
119894(1 le 119894 le 119899) do
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) for each word 119908119896(1 le 119896 le 119903) in119882 do
(5) compute 119905119896larr SSEsdotToken(119870
119890 119908119896)
(6) compute V119894119896larr 119891ℎ(119905119896)
(7) end for(8) let 119881
119894= (V1198941 V
119894119903)
(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(11) end for(12) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(13) end for(14) let 119866 = 120574 and 119871 = (119871
1 119871
119899) and output (119862 119866 119871)
Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)
Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871
1 119871
119899)
(5) 119879 search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888
1 119888
119909)
(2) for each token 119905119896(1 le 119896 le 119900) compute V
119896larr 119891ℎ(119905119896)
(3) let 1198811= 1198812= sdot sdot sdot = 119881
119909= (V1 V
119900) and
(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(7) end for
Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)
Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 2
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889
Algorithm 4 LSE scheme symmetric part
The Scientific World Journal 9
ASETest
Convert
FCFilter
Gi searchable structure
middot middot middot
L1 L2 middot middot middot Le
Li functional structure
T = (t1 t0)
Kpub s1 s2 sa
c1
c2
ci
cn
Figure 3 Data structure and search process for asymmetric setting
Table 1 Searchable encryption schemes with various functionalities
Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash
Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash
Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash
Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes
to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument
There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components
Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7
Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1
444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently
Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure
Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881
119879) from equal-size ran-
dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-
security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879
lowast Forasymmetric setting119881
119879is the hash of the searchable structure
119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881
119879is indistinguishable from the hash value
119881lowast
119879
5 Realizing Various Functionalities
In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities
51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
2 The Scientific World Journal
and the user searches and decrypts the documents usingthe private key Beyond the basic keyword matching manyfunctions are also added to either symmetric or asymmetricsetting such as range query phrase query and fuzzy keywordquery
However these functions are often realized by differ-ent methods with different searchable structures which aregenerally not compatible with each other For example theasymmetric encryption scheme introduced in [2] realizedconjunctive subset and range queries However it is difficultto figure out how to apply this method to symmetric settingEven for the same setting such as the fuzzy query schemeintroduced in [3] and the rank-ordered query scheme intro-duced in [4] it is difficult to figure out how to combine twomethods together since the functions are constructed basedon different indexing structures
Layered searchable encryption (LSE) scheme aims toprovide compatibility flexibility and security for varioussettings and functionalities In this new framework key-words are firstly transformed to tokens that are filtered bythe core searchable component (symmetric or asymmetricsetting) and then the tokens are dynamically converted touniform mappings which are transmitted to many stand-alone functional components (eg ranked keyword querycomponent fuzzy query component etc) to further filterthe results Since all functional components are independentof each other and the interfaces are common the functionsare compatible with each other and directly support bothsymmetric and asymmetric settings and adding or deleting afunction is quite simple since each function is loosely coupledwith the core searchable component Furthermore LSEsupports combined query For example the query ldquoSELECTlowast WHERE keywords = ldquocloud storage encryptionrdquo ANDldquosecurity classification gt 5rdquo ORDERED BY ldquokeywordcloudrdquordquo(to express the query we adopt the SQL-like format used indatabase) is a combination of three functional componentsbasic query range query and ranked keyword query (inthis paper we will present the concrete construction for thisexample)
Furthermore this framework is similar to the data streamprocessing architecture [5] where functional componentscould be treated as operator boxes and the whole schemecould be treated as a data-flow system by which all processesfollow the popular boxes and arrows paradigmTherefore incomparison to the previous searchable encryption schemesLSE is more suitable for distributed and parallel computingenvironment
In this paper our contributions are the following (1)Wepropose a novel framework for designing searchable encryp-tion scheme called layered searchable encryption (LSE)which enables combined query and provides compatibilityflexibility and security for various settings and function-alities The new framework consists of a core searchablecomponent with a symmetricasymmetric converter manyfunctional components and a common interface with newsecurity model (2) We propose a concrete construction forLSE that could theoretically combine all possible function-alities which are proposed in the recent years and proveits semantic security for the interface (3) As a complement
for the prior works we formally define two new securitymodels for ranked keyword query and range query calledsemantic security against chosen ranked keyword attack(CRKA) and chosen range attack (CRA) respectively whichprovide integral security models for cryptographic analysis(4) Based on LSE we propose two representative and novelconstructions for ranked keyword query component (previ-ously only available in symmetric scheme) and range querycomponent (previously only available in asymmetric scheme)and prove them semantically secure under the new securitymodels
The rest of the paper is organized as follows Section 2presents the related work Section 3 presents the notationsand preliminaries Section 4 presents the layered searchableencryption scheme and the concrete construction Section 5discusses how to realize various functionalities and presentsthe concrete constructions for ranked keyword query andrange query Section 6 concludes this paper
2 Related Work
Searchable encryption schemes are designed to help the usersto securely search over the encrypted data by keywordsThe first scheme was introduced in [6] by Song et al andlater on many index-based symmetric searchable encryption(SSE) schemes were proposed Goh introduced the firstsecure index in [7] and they also built the security modelfor searchable encryption called Adaptive Chosen KeywordAttack (IND-CKA) In [8] Curtmola et al introduced twoconstructions to realize symmetric searchable encryptionthe first construction (named SSE-1) is nonadaptive andthe second one (named SSE-2) is adaptive A generalizationfor symmetric searchable encryption was introduced in [9]and a representative SSE system designed by Microsoft wasintroduced in [10] Another type of searchable encryptionnamed asymmetric searchable encryption (ASE) is public-key based which allows the user to search over the dataencrypted by some data senders using the public key of theuser The first scheme was introduced in [11] by Boneh etal based on bilinear maps and the improved definition wasintroduced in [12]
There are many functional extensions for the search-able encryption schemes beyond the basic precise keywordmatching For symmetric setting the authors in [4 13 14]introduced ranked keyword search schemes based on order-preserving encryption technique or two-round protocolwhich allows the server to only return the top-119896 relevantresults to the user In [15] Golle et al introduced a schemesupporting conjunctive keyword searchwhich allows the userto searchmultiple keywords in a single query In [3 16 17] theauthors introduced fuzzy keyword search schemes based onwildcard technique which allows the user to submit only partof the precise keyword Similar to fuzzy keyword search butdifferent the authors in [18 19] introduced similarity searchschemes based on wildcard technique which allows theserver to return the results similar to the queried keyword In[20 21] the authors introduced phrase query schemes basedon trusted client-side server or binary search which allows
The Scientific World Journal 3
the user to query a phrase instead of multiple independentkeywords For asymmetric setting the authors in [22 23]introduced range query schemes In addition Boneh et alalso introduced conjunctive and subset query in [22] basedon bilinear maps
Note that most of these techniques are not compatiblewith each other due to specific data structure andmathemati-cal property However in the following sections wewill provethat functional structures and searchable structures could beseparately constructed and asymmetric structures could beconverted to symmetric structures such that a compatible all-in-one scheme is possible
3 Notations and Preliminaries
We write 119909larr119880119883 to denote sampling element 119909 uniformly
random from a set 119883 and write 119909 larr A to denote the outputof an algorithmAWewrite 119886 119887 to denote the concatenationof two strings 119886 and 119887 We write |119860| to denote its cardinality if119860 is a set and write |119886| to denote its bit length if 119886 is a stringA function 120583(119896) N rarr R is negligible if for every positivepolynomial 119901(sdot) there exists an inter 119873 gt 0 such that for all119896 gt 119873 |120583(119896)| lt 1119901(119896) We write poly(119896) and negl(119896) todenote polynomial and negligible functions in 119896 respectively
We write Δ = (1199081 119908
119899) to denote a dictionary of 119899
words in lexicographic order We assume that all words areof length polynomial in 119896 We write 119889 to refer to a documentthat contains poly(119896) words and write |119889| to denote the sizeof the document in bytes In some cases we also write 119889 todenote the document identifier that uniquely identifies thedocument such as amemory locationWewrite X to denote acomponent or a scheme and write Xfunc( ) to denote thecorresponding function for the component or an algorithmin the scheme
4 Layered Searchable Encryption Scheme
Layered searchable encryption scheme aims to combinesymmetric and asymmetric searchable encryption schemesto provide a uniformmodel for functional extensionsThere-fore we first revisit the basic symmetric and asymmetricsearchable encryption models and then build the layeredsearchable encryption model based on these two differentsettings After that we introduce the security model ofthe new framework and finally we present the concreteconstruction
41 Revisiting Searchable Encryption Weadopt the definitionintroduced by Curtmola et al in [8] as a representativemodel for symmetric searchable encryption scheme In thissetting the user who searches for the documents is also thedata sender who encrypts the documents Therefore someefficient searching techniques such as using a global indexare used and the searchable structure may be a single indexfile for all stored documents For consistency with otherdefinitions we make a little modification for the originaldefinition and define the scheme as follows
Definition 1 (symmetric searchable encryption) A symmet-ric searchable encryption (SSE) scheme is a collection offive polynomial-time algorithms SSE = (Gen Enc TokenSearch Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takes asinput a security parameter 119896 and outputs a secret key119870 It is run by the user and the key is kept secret(120574 119862) larr Enc(119870119863) is a probabilistic algorithmthat takes as input a secret key 119870 and a documentcollection 119863 = (119889
1 119889
119899) and outputs a searchable
structure 120574 and a sequence of encrypted documents119862 = (119888
1 119888
119899) It enables a user to query some key-
words and the server returns thematched documentsFor instance in an index-based symmetric searchableencryption scheme 120574 is the secure index It is run bythe user and (120574 119862) is sent to the server119905 larr Token(119870 119908) is a deterministic algorithm thattakes as input a secret key 119870 and a keyword 119908 andoutputs a search token 119905 (also named trapdoor orcapacity) It is run by the user1198621015840larr Enc(119862 120574 119905) is a deterministic algorithm that
takes as input the encrypted documents 119862 thesearchable structure 120574 and the search token 119905 andoutputs the matched documents (or identifiers) 1198621015840 =(1198881 119888
119898) It is run by the server and1198621015840 is sent to the
user119889 larr Dec(119870 119888) is a deterministic algorithm that takesas input a secret key119870 and the encrypted document 119888and outputs the recovered plaintext 119889 It is run by theuser
We adopt the definition introduced by Boneh et al in[11] as a representative model for asymmetric searchableencryption scheme In this setting the user generates thepublic key and the private key The data sender encrypts thedata using the public key and the user searches and decryptsthe data using the private key The original definition onlycontains the searchable part for consistency we add twoalgorithms and define the asymmetric searchable encryptionas follows
Definition 2 (asymmetric searchable encryption) An asym-metric searchable encryption (ASE) scheme is a collection ofseven polynomial-time algorithms ASE = (Gen PEKS EncToken Test Search Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs a pub-licprivate key pair 119870 = (119870pub 119870priv) It is run by theuser and only119870priv is kept secret119904 larr PEKS(119870pub 119908) is a probabilistic algorithm thattakes as input a public key 119870pub and a word 119908 andoutputs a searchable structure 119904 It is run by the datasender and 119904 is attached to the encryptedmessage andthe combination is sent to the server119888 larr Enc(119870pub 119889) is a probabilistic algorithm thattakes as input a public key 119870pub and a document
4 The Scientific World Journal
(message) and outputs the ciphertext 119888 It is run bythe data sender and 119888 (followed bymultiple searchablestructures) is sent to the server
119905 larr Token(119870priv 119908) is a deterministic algorithm thattakes as input a private key119870priv and a keyword119908 andoutputs a search token 119905 It is run by the user
119887 larr Test(119870pub 119904 119905) is a deterministic algorithmthat takes as input the public key 119870pub a searchablestructure 119904 larrPEKS(119870pub 119908
1015840) and a search token 119905 larr
Token(119870priv 119908) and outputs 119887 = 1 if 119908 = 1199081015840 or 119887 = 0otherwise It is run by the server
1198621015840larr Search(119870pub 119862 119878 119905) is a deterministic algo-
rithm that takes as input the public key 119870pub theencrypted documents 119862 = (119862
1 119862
119899) the corre-
sponding searchable structure set 119878 = (1198781 119878
119899)
(each 119878119894containsmultiple searchable structures corre-
sponding to the keywords of the document) and thesearch token 119905 and outputs the matched documents1198621015840= (1198881 119888
119898) (the documentsrsquo searchable struc-
tures satisfying 1 larr Test(119870pub 119904 119905)) It is run by theserver and 1198621015840 is sent to the user
119889 larr Dec(119870priv 119888) is a probabilistic algorithm thattakes as input a private key 119870priv and a ciphertext 119888and outputs the plaintext 119889 It is run by the user
Unlike symmetric setting the definition of asymmetricsetting only works on a single document For a documentcollection it does not make any difference since the usercould execute the encryption algorithm for each documentrespectively
By comparing the definitions of the two different settingsthere exists a common link between the queried keywordsand the matched documents the searchable structure whichis constructed using either symmetric key or the public keyNote that the structure is probabilistic in the asymmetricsetting or else the server could directly launch the chosenplaintext attack using the public key However we say that forsymmetric and asymmetric settings the searchable structuresare both run-time deterministic To prove this property wefirst introduce a lemma as follows
Lemma 3 For asymmetric setting if the token 119905 generatedusing the private key 119870priv is deterministic then the searchablestructure 119904 encrypted using the public key 119870pub is run-time de-terministic when the the algorithm ASETest outputs 1 even ifthe encryption is probabilistic
Proof Recall that the algorithm 119905 larr Token(119870priv 119908) is deter-ministic and 119904 larr PEKS(119870pub 119908) is probabilistic Howeverfor a single document there only exists a single 119904 that linksto 119908 When 1 larr Test(119870pub 119904 119905) it implies that 119905 matches119904 We replace 119905 with 119904 then the token 119905 = 119904 which could begenerated by the data sender who could generate the tokenusing the public key 119905 larr Token1015840(119870pub 119908)which is in fact thealgorithm PEKS(119870pub 119908) It seems that the data sender hasindirectly generated the token without having the private key
Keywords
Tokenization
Converter
Mapping
Func1
Documents
Decryption
Core (symmetric asymmetric)
Filter
Func 2
Global layer
Local layerFunc
Figure 1 Architecture for layered searchable encryption scheme
Therefore when the output of the test is 1 both the token andthe searchable structure map to 119905 which is deterministic
Based on the lemma above we introduce a theoremwhichguides us to construct the converter in the layered searchableencryption scheme
Theorem 4 (run-time invariance) For both symmetric andasymmetric settings if the search token 119905 is deterministic thenthe searchable structure is run-time deterministic
Proof As proved in Lemma 3 the searchable structure is run-time deterministic for asymmetric setting For symmetricsetting the searchable structure is encrypted using the sym-metric key 120574 larr Enc(119870119863) which is probabilistic The token119905 larr Token(119870 119908) is deterministic Similar to asymmetricsetting when executing the deterministic algorithm Searchthematched entries (probabilistic)map to 119905 and themappingis deterministic (here an entry is the encrypted data usingsymmetric encryption that contains the information aboutthe matched document such as the node in the invertedindex [8]) In other words the searchable structure is run-time deterministic because of the deterministicmapping
42 Scheme Definition Our primary goal is to separate thefunctionalities from the searchable structures therefore weconsider to construct the basic searchable structures andvarious functions in different layers as shown in Figure 1
(i) Global layer we name this layer ldquoglobalrdquo because alldocuments and all searchable structures are involvedIn this layer the basic searchable encryption scheme(symmetric or asymmetric) is executed and a globalindex could be constructed to improve search effi-ciency The server receives the search tokens (eachtoken is related to a keyword) executes the searchprocedure and outputs the matched documents Fur-thermore the server converts the tokens (symmet-ric or asymmetric) to the corresponding mappings(another type of secret token) with uniform formatand transfers the mappings with the matched docu-ments (or identifiers) to the local layer
(ii) Local layer we name this layer ldquolocalrdquo because func-tional structures are constructed for each documentindependently In this layer each matched documentis further filtered by all functions (eg phrase query
The Scientific World Journal 5
function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user
For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows
Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870
119890 119863) is a probabilistic algorithm
that takes as input an encryption key 119870119890(119870119890= 119870priv
for symmetric setting or 119870119890= 119870pub for asymmetric
setting) and a document collection 119863 = (1198891 119889
119899)
It outputs 119899 encrypted documents 119862 = (1198881 119888
119899)
a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866
1 119866
119899) corresponding to 119899 documents
and a sequence of local functional structures 119871 =
(1198711 119871
119899) corresponding to 119899 encrypted docu-
ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908
1 119908
119900) with functional instruc-
tions and outputs the corresponding search tokens119879 = (119905
1 119905
119900) with functional instructions It is run
by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic
algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888
1 119888
119898) It is run by the
server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user
Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query
For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908
1 119908
119900)
(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions
A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows
Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows
119871119889larr Build(119889 119881
119889) is an algorithm that takes as input
a document 119889 and the corresponding conversion 119881119889
and outputs a functional structure 119871119889 It is run by
the data sender and 119871119889is appended to the encrypted
document
1198621015840larr Filter(119862 119871 119881
119879) is an algorithm that takes as
input the encrypted documents 119862 = (1198621 119862
119909)
the corresponding functional structure set 119871 =
(1198711 119871
119909) and the converted search tokens 119881
119879=
(1198811 119881
119909) and outputs a subset of documents 1198621015840 It
is run by the server
43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models
The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else
6 The Scientific World Journal
the security of some components will be correlated such thatthe loose coupling property is lost
Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface
Definition 7 (plain trace) Let119863 = (1198891 119889
119899) be a document
collection Let 119873 = (1198731 119873
119899) (only for asymmetric
setting) be a keyword-counter set where119873119894is the number of
keywords in 119889119894 Let the query history119882 = (119908
1 119908
119901) be a
sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882
119894= 119882119895and 0 otherwiseThe plain
trace 120587(119863119882) = (|1198891| |119889
119899| 119873 120590(119882))
Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface
Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889
1 119889
119899) a sequence of query
119882 = (1199081 119908
119901) and receives (119862 119866) larr Enc(119870
119890 119863)
and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881
119879as the input for
the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-
ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast
119879as the input for the
functional component FinallyA returns a bit 119887 thatis output by the experiment
We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (1)
where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here
since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately
The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example
if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component
Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later
44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface
441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888
1 119888
119899 Each 119888
119894(1 le 119894 le 119899) is linked to a
global searchable structure 119866119894(if a global index is used then
only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871
119894 and only
the global searchable structure 119866(1198661 119866
119899) is used in this
step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881
119879and the matched 119909 encrypted documents
are transmitted to functional components FC1 FC
119890to
further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888
1 119888
119898are
returned to the user and the user decrypts them to obtainthe plaintexts 119889
1 119889
119898 In order to construct a functional
component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1
We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency
The Scientific World Journal 7
Input Token
User Search
Core (SSEASE) Converter
Outputm le x le n
Decrypt
Filter Convert
Filter
W = (w1 w2 w0) T = (t1 t2 t0)
c1 c2 cn
G(G1 G2 Gn)
L1 L2 Ln
d1 d2 dm
c1 c2 cm
c1 c2 cx
G(G1 G2 Gx)
L1 L2 Lx
VT = (V1 V2 Vx)
FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe
Figure 2 Search process of layered searchable encryption scheme
Build(119889 119881119889)
(1) input a document 119889 and the mappings of all words 119881119889in 119889
(2) specified according to the functionality(3) output a local functional structure 119871
119889
Filter(119862 119871 119881119879)
(1) input a set of encrypted document 119862 = (1198881 119888
119909) the corresponding local functional
structures 119871 = (1198711 119871
119909) and the mappings of the queried keywords 119881
119879= (1198811 119881
119909)
(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862
Algorithm 1 Template for functional component FC
rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms
442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905
1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo
and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as
ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)
Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings
Suppose there are 119899 documents 119863 = (1198891 119889
119899) 119899 cor-
responding ciphertexts 119862 = (1198881 119888
119899) 119890 functional compo-
nents FC= (1198651 119865
119890) and 119900 queried keywords119882 = (119908
1
119908119900) In addition we define a hash function as follows
119891ℎ 0 1
lowast997888rarr 0 1
119897 (3)
where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891
ℎ then 119897
is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4
443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890
1and 1198903map to the
word ldquodayrdquo)
ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)
Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1
Then we have
ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)
In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed
8 The Scientific World Journal
Input the encryption key 119870119890= 119870priv the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871
1 119871
119899)
Method(1) compute (120574 119862) larr SSEsdotEnc(119870
119890 119863) Here 119862 = (119888
1 119888
119899)
(2) for each document 119889119894isin 119863 and the corresponding 119888
119894(1 le 119894 le 119899) do
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) for each word 119908119896(1 le 119896 le 119903) in119882 do
(5) compute 119905119896larr SSEsdotToken(119870
119890 119908119896)
(6) compute V119894119896larr 119891ℎ(119905119896)
(7) end for(8) let 119881
119894= (V1198941 V
119894119903)
(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(11) end for(12) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(13) end for(14) let 119866 = 120574 and 119871 = (119871
1 119871
119899) and output (119862 119866 119871)
Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)
Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871
1 119871
119899)
(5) 119879 search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888
1 119888
119909)
(2) for each token 119905119896(1 le 119896 le 119900) compute V
119896larr 119891ℎ(119905119896)
(3) let 1198811= 1198812= sdot sdot sdot = 119881
119909= (V1 V
119900) and
(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(7) end for
Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)
Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 2
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889
Algorithm 4 LSE scheme symmetric part
The Scientific World Journal 9
ASETest
Convert
FCFilter
Gi searchable structure
middot middot middot
L1 L2 middot middot middot Le
Li functional structure
T = (t1 t0)
Kpub s1 s2 sa
c1
c2
ci
cn
Figure 3 Data structure and search process for asymmetric setting
Table 1 Searchable encryption schemes with various functionalities
Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash
Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash
Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash
Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes
to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument
There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components
Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7
Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1
444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently
Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure
Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881
119879) from equal-size ran-
dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-
security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879
lowast Forasymmetric setting119881
119879is the hash of the searchable structure
119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881
119879is indistinguishable from the hash value
119881lowast
119879
5 Realizing Various Functionalities
In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities
51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 3
the user to query a phrase instead of multiple independentkeywords For asymmetric setting the authors in [22 23]introduced range query schemes In addition Boneh et alalso introduced conjunctive and subset query in [22] basedon bilinear maps
Note that most of these techniques are not compatiblewith each other due to specific data structure andmathemati-cal property However in the following sections wewill provethat functional structures and searchable structures could beseparately constructed and asymmetric structures could beconverted to symmetric structures such that a compatible all-in-one scheme is possible
3 Notations and Preliminaries
We write 119909larr119880119883 to denote sampling element 119909 uniformly
random from a set 119883 and write 119909 larr A to denote the outputof an algorithmAWewrite 119886 119887 to denote the concatenationof two strings 119886 and 119887 We write |119860| to denote its cardinality if119860 is a set and write |119886| to denote its bit length if 119886 is a stringA function 120583(119896) N rarr R is negligible if for every positivepolynomial 119901(sdot) there exists an inter 119873 gt 0 such that for all119896 gt 119873 |120583(119896)| lt 1119901(119896) We write poly(119896) and negl(119896) todenote polynomial and negligible functions in 119896 respectively
We write Δ = (1199081 119908
119899) to denote a dictionary of 119899
words in lexicographic order We assume that all words areof length polynomial in 119896 We write 119889 to refer to a documentthat contains poly(119896) words and write |119889| to denote the sizeof the document in bytes In some cases we also write 119889 todenote the document identifier that uniquely identifies thedocument such as amemory locationWewrite X to denote acomponent or a scheme and write Xfunc( ) to denote thecorresponding function for the component or an algorithmin the scheme
4 Layered Searchable Encryption Scheme
Layered searchable encryption scheme aims to combinesymmetric and asymmetric searchable encryption schemesto provide a uniformmodel for functional extensionsThere-fore we first revisit the basic symmetric and asymmetricsearchable encryption models and then build the layeredsearchable encryption model based on these two differentsettings After that we introduce the security model ofthe new framework and finally we present the concreteconstruction
41 Revisiting Searchable Encryption Weadopt the definitionintroduced by Curtmola et al in [8] as a representativemodel for symmetric searchable encryption scheme In thissetting the user who searches for the documents is also thedata sender who encrypts the documents Therefore someefficient searching techniques such as using a global indexare used and the searchable structure may be a single indexfile for all stored documents For consistency with otherdefinitions we make a little modification for the originaldefinition and define the scheme as follows
Definition 1 (symmetric searchable encryption) A symmet-ric searchable encryption (SSE) scheme is a collection offive polynomial-time algorithms SSE = (Gen Enc TokenSearch Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takes asinput a security parameter 119896 and outputs a secret key119870 It is run by the user and the key is kept secret(120574 119862) larr Enc(119870119863) is a probabilistic algorithmthat takes as input a secret key 119870 and a documentcollection 119863 = (119889
1 119889
119899) and outputs a searchable
structure 120574 and a sequence of encrypted documents119862 = (119888
1 119888
119899) It enables a user to query some key-
words and the server returns thematched documentsFor instance in an index-based symmetric searchableencryption scheme 120574 is the secure index It is run bythe user and (120574 119862) is sent to the server119905 larr Token(119870 119908) is a deterministic algorithm thattakes as input a secret key 119870 and a keyword 119908 andoutputs a search token 119905 (also named trapdoor orcapacity) It is run by the user1198621015840larr Enc(119862 120574 119905) is a deterministic algorithm that
takes as input the encrypted documents 119862 thesearchable structure 120574 and the search token 119905 andoutputs the matched documents (or identifiers) 1198621015840 =(1198881 119888
119898) It is run by the server and1198621015840 is sent to the
user119889 larr Dec(119870 119888) is a deterministic algorithm that takesas input a secret key119870 and the encrypted document 119888and outputs the recovered plaintext 119889 It is run by theuser
We adopt the definition introduced by Boneh et al in[11] as a representative model for asymmetric searchableencryption scheme In this setting the user generates thepublic key and the private key The data sender encrypts thedata using the public key and the user searches and decryptsthe data using the private key The original definition onlycontains the searchable part for consistency we add twoalgorithms and define the asymmetric searchable encryptionas follows
Definition 2 (asymmetric searchable encryption) An asym-metric searchable encryption (ASE) scheme is a collection ofseven polynomial-time algorithms ASE = (Gen PEKS EncToken Test Search Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs a pub-licprivate key pair 119870 = (119870pub 119870priv) It is run by theuser and only119870priv is kept secret119904 larr PEKS(119870pub 119908) is a probabilistic algorithm thattakes as input a public key 119870pub and a word 119908 andoutputs a searchable structure 119904 It is run by the datasender and 119904 is attached to the encryptedmessage andthe combination is sent to the server119888 larr Enc(119870pub 119889) is a probabilistic algorithm thattakes as input a public key 119870pub and a document
4 The Scientific World Journal
(message) and outputs the ciphertext 119888 It is run bythe data sender and 119888 (followed bymultiple searchablestructures) is sent to the server
119905 larr Token(119870priv 119908) is a deterministic algorithm thattakes as input a private key119870priv and a keyword119908 andoutputs a search token 119905 It is run by the user
119887 larr Test(119870pub 119904 119905) is a deterministic algorithmthat takes as input the public key 119870pub a searchablestructure 119904 larrPEKS(119870pub 119908
1015840) and a search token 119905 larr
Token(119870priv 119908) and outputs 119887 = 1 if 119908 = 1199081015840 or 119887 = 0otherwise It is run by the server
1198621015840larr Search(119870pub 119862 119878 119905) is a deterministic algo-
rithm that takes as input the public key 119870pub theencrypted documents 119862 = (119862
1 119862
119899) the corre-
sponding searchable structure set 119878 = (1198781 119878
119899)
(each 119878119894containsmultiple searchable structures corre-
sponding to the keywords of the document) and thesearch token 119905 and outputs the matched documents1198621015840= (1198881 119888
119898) (the documentsrsquo searchable struc-
tures satisfying 1 larr Test(119870pub 119904 119905)) It is run by theserver and 1198621015840 is sent to the user
119889 larr Dec(119870priv 119888) is a probabilistic algorithm thattakes as input a private key 119870priv and a ciphertext 119888and outputs the plaintext 119889 It is run by the user
Unlike symmetric setting the definition of asymmetricsetting only works on a single document For a documentcollection it does not make any difference since the usercould execute the encryption algorithm for each documentrespectively
By comparing the definitions of the two different settingsthere exists a common link between the queried keywordsand the matched documents the searchable structure whichis constructed using either symmetric key or the public keyNote that the structure is probabilistic in the asymmetricsetting or else the server could directly launch the chosenplaintext attack using the public key However we say that forsymmetric and asymmetric settings the searchable structuresare both run-time deterministic To prove this property wefirst introduce a lemma as follows
Lemma 3 For asymmetric setting if the token 119905 generatedusing the private key 119870priv is deterministic then the searchablestructure 119904 encrypted using the public key 119870pub is run-time de-terministic when the the algorithm ASETest outputs 1 even ifthe encryption is probabilistic
Proof Recall that the algorithm 119905 larr Token(119870priv 119908) is deter-ministic and 119904 larr PEKS(119870pub 119908) is probabilistic Howeverfor a single document there only exists a single 119904 that linksto 119908 When 1 larr Test(119870pub 119904 119905) it implies that 119905 matches119904 We replace 119905 with 119904 then the token 119905 = 119904 which could begenerated by the data sender who could generate the tokenusing the public key 119905 larr Token1015840(119870pub 119908)which is in fact thealgorithm PEKS(119870pub 119908) It seems that the data sender hasindirectly generated the token without having the private key
Keywords
Tokenization
Converter
Mapping
Func1
Documents
Decryption
Core (symmetric asymmetric)
Filter
Func 2
Global layer
Local layerFunc
Figure 1 Architecture for layered searchable encryption scheme
Therefore when the output of the test is 1 both the token andthe searchable structure map to 119905 which is deterministic
Based on the lemma above we introduce a theoremwhichguides us to construct the converter in the layered searchableencryption scheme
Theorem 4 (run-time invariance) For both symmetric andasymmetric settings if the search token 119905 is deterministic thenthe searchable structure is run-time deterministic
Proof As proved in Lemma 3 the searchable structure is run-time deterministic for asymmetric setting For symmetricsetting the searchable structure is encrypted using the sym-metric key 120574 larr Enc(119870119863) which is probabilistic The token119905 larr Token(119870 119908) is deterministic Similar to asymmetricsetting when executing the deterministic algorithm Searchthematched entries (probabilistic)map to 119905 and themappingis deterministic (here an entry is the encrypted data usingsymmetric encryption that contains the information aboutthe matched document such as the node in the invertedindex [8]) In other words the searchable structure is run-time deterministic because of the deterministicmapping
42 Scheme Definition Our primary goal is to separate thefunctionalities from the searchable structures therefore weconsider to construct the basic searchable structures andvarious functions in different layers as shown in Figure 1
(i) Global layer we name this layer ldquoglobalrdquo because alldocuments and all searchable structures are involvedIn this layer the basic searchable encryption scheme(symmetric or asymmetric) is executed and a globalindex could be constructed to improve search effi-ciency The server receives the search tokens (eachtoken is related to a keyword) executes the searchprocedure and outputs the matched documents Fur-thermore the server converts the tokens (symmet-ric or asymmetric) to the corresponding mappings(another type of secret token) with uniform formatand transfers the mappings with the matched docu-ments (or identifiers) to the local layer
(ii) Local layer we name this layer ldquolocalrdquo because func-tional structures are constructed for each documentindependently In this layer each matched documentis further filtered by all functions (eg phrase query
The Scientific World Journal 5
function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user
For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows
Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870
119890 119863) is a probabilistic algorithm
that takes as input an encryption key 119870119890(119870119890= 119870priv
for symmetric setting or 119870119890= 119870pub for asymmetric
setting) and a document collection 119863 = (1198891 119889
119899)
It outputs 119899 encrypted documents 119862 = (1198881 119888
119899)
a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866
1 119866
119899) corresponding to 119899 documents
and a sequence of local functional structures 119871 =
(1198711 119871
119899) corresponding to 119899 encrypted docu-
ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908
1 119908
119900) with functional instruc-
tions and outputs the corresponding search tokens119879 = (119905
1 119905
119900) with functional instructions It is run
by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic
algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888
1 119888
119898) It is run by the
server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user
Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query
For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908
1 119908
119900)
(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions
A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows
Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows
119871119889larr Build(119889 119881
119889) is an algorithm that takes as input
a document 119889 and the corresponding conversion 119881119889
and outputs a functional structure 119871119889 It is run by
the data sender and 119871119889is appended to the encrypted
document
1198621015840larr Filter(119862 119871 119881
119879) is an algorithm that takes as
input the encrypted documents 119862 = (1198621 119862
119909)
the corresponding functional structure set 119871 =
(1198711 119871
119909) and the converted search tokens 119881
119879=
(1198811 119881
119909) and outputs a subset of documents 1198621015840 It
is run by the server
43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models
The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else
6 The Scientific World Journal
the security of some components will be correlated such thatthe loose coupling property is lost
Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface
Definition 7 (plain trace) Let119863 = (1198891 119889
119899) be a document
collection Let 119873 = (1198731 119873
119899) (only for asymmetric
setting) be a keyword-counter set where119873119894is the number of
keywords in 119889119894 Let the query history119882 = (119908
1 119908
119901) be a
sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882
119894= 119882119895and 0 otherwiseThe plain
trace 120587(119863119882) = (|1198891| |119889
119899| 119873 120590(119882))
Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface
Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889
1 119889
119899) a sequence of query
119882 = (1199081 119908
119901) and receives (119862 119866) larr Enc(119870
119890 119863)
and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881
119879as the input for
the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-
ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast
119879as the input for the
functional component FinallyA returns a bit 119887 thatis output by the experiment
We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (1)
where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here
since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately
The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example
if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component
Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later
44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface
441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888
1 119888
119899 Each 119888
119894(1 le 119894 le 119899) is linked to a
global searchable structure 119866119894(if a global index is used then
only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871
119894 and only
the global searchable structure 119866(1198661 119866
119899) is used in this
step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881
119879and the matched 119909 encrypted documents
are transmitted to functional components FC1 FC
119890to
further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888
1 119888
119898are
returned to the user and the user decrypts them to obtainthe plaintexts 119889
1 119889
119898 In order to construct a functional
component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1
We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency
The Scientific World Journal 7
Input Token
User Search
Core (SSEASE) Converter
Outputm le x le n
Decrypt
Filter Convert
Filter
W = (w1 w2 w0) T = (t1 t2 t0)
c1 c2 cn
G(G1 G2 Gn)
L1 L2 Ln
d1 d2 dm
c1 c2 cm
c1 c2 cx
G(G1 G2 Gx)
L1 L2 Lx
VT = (V1 V2 Vx)
FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe
Figure 2 Search process of layered searchable encryption scheme
Build(119889 119881119889)
(1) input a document 119889 and the mappings of all words 119881119889in 119889
(2) specified according to the functionality(3) output a local functional structure 119871
119889
Filter(119862 119871 119881119879)
(1) input a set of encrypted document 119862 = (1198881 119888
119909) the corresponding local functional
structures 119871 = (1198711 119871
119909) and the mappings of the queried keywords 119881
119879= (1198811 119881
119909)
(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862
Algorithm 1 Template for functional component FC
rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms
442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905
1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo
and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as
ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)
Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings
Suppose there are 119899 documents 119863 = (1198891 119889
119899) 119899 cor-
responding ciphertexts 119862 = (1198881 119888
119899) 119890 functional compo-
nents FC= (1198651 119865
119890) and 119900 queried keywords119882 = (119908
1
119908119900) In addition we define a hash function as follows
119891ℎ 0 1
lowast997888rarr 0 1
119897 (3)
where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891
ℎ then 119897
is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4
443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890
1and 1198903map to the
word ldquodayrdquo)
ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)
Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1
Then we have
ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)
In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed
8 The Scientific World Journal
Input the encryption key 119870119890= 119870priv the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871
1 119871
119899)
Method(1) compute (120574 119862) larr SSEsdotEnc(119870
119890 119863) Here 119862 = (119888
1 119888
119899)
(2) for each document 119889119894isin 119863 and the corresponding 119888
119894(1 le 119894 le 119899) do
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) for each word 119908119896(1 le 119896 le 119903) in119882 do
(5) compute 119905119896larr SSEsdotToken(119870
119890 119908119896)
(6) compute V119894119896larr 119891ℎ(119905119896)
(7) end for(8) let 119881
119894= (V1198941 V
119894119903)
(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(11) end for(12) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(13) end for(14) let 119866 = 120574 and 119871 = (119871
1 119871
119899) and output (119862 119866 119871)
Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)
Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871
1 119871
119899)
(5) 119879 search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888
1 119888
119909)
(2) for each token 119905119896(1 le 119896 le 119900) compute V
119896larr 119891ℎ(119905119896)
(3) let 1198811= 1198812= sdot sdot sdot = 119881
119909= (V1 V
119900) and
(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(7) end for
Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)
Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 2
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889
Algorithm 4 LSE scheme symmetric part
The Scientific World Journal 9
ASETest
Convert
FCFilter
Gi searchable structure
middot middot middot
L1 L2 middot middot middot Le
Li functional structure
T = (t1 t0)
Kpub s1 s2 sa
c1
c2
ci
cn
Figure 3 Data structure and search process for asymmetric setting
Table 1 Searchable encryption schemes with various functionalities
Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash
Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash
Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash
Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes
to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument
There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components
Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7
Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1
444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently
Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure
Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881
119879) from equal-size ran-
dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-
security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879
lowast Forasymmetric setting119881
119879is the hash of the searchable structure
119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881
119879is indistinguishable from the hash value
119881lowast
119879
5 Realizing Various Functionalities
In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities
51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
4 The Scientific World Journal
(message) and outputs the ciphertext 119888 It is run bythe data sender and 119888 (followed bymultiple searchablestructures) is sent to the server
119905 larr Token(119870priv 119908) is a deterministic algorithm thattakes as input a private key119870priv and a keyword119908 andoutputs a search token 119905 It is run by the user
119887 larr Test(119870pub 119904 119905) is a deterministic algorithmthat takes as input the public key 119870pub a searchablestructure 119904 larrPEKS(119870pub 119908
1015840) and a search token 119905 larr
Token(119870priv 119908) and outputs 119887 = 1 if 119908 = 1199081015840 or 119887 = 0otherwise It is run by the server
1198621015840larr Search(119870pub 119862 119878 119905) is a deterministic algo-
rithm that takes as input the public key 119870pub theencrypted documents 119862 = (119862
1 119862
119899) the corre-
sponding searchable structure set 119878 = (1198781 119878
119899)
(each 119878119894containsmultiple searchable structures corre-
sponding to the keywords of the document) and thesearch token 119905 and outputs the matched documents1198621015840= (1198881 119888
119898) (the documentsrsquo searchable struc-
tures satisfying 1 larr Test(119870pub 119904 119905)) It is run by theserver and 1198621015840 is sent to the user
119889 larr Dec(119870priv 119888) is a probabilistic algorithm thattakes as input a private key 119870priv and a ciphertext 119888and outputs the plaintext 119889 It is run by the user
Unlike symmetric setting the definition of asymmetricsetting only works on a single document For a documentcollection it does not make any difference since the usercould execute the encryption algorithm for each documentrespectively
By comparing the definitions of the two different settingsthere exists a common link between the queried keywordsand the matched documents the searchable structure whichis constructed using either symmetric key or the public keyNote that the structure is probabilistic in the asymmetricsetting or else the server could directly launch the chosenplaintext attack using the public key However we say that forsymmetric and asymmetric settings the searchable structuresare both run-time deterministic To prove this property wefirst introduce a lemma as follows
Lemma 3 For asymmetric setting if the token 119905 generatedusing the private key 119870priv is deterministic then the searchablestructure 119904 encrypted using the public key 119870pub is run-time de-terministic when the the algorithm ASETest outputs 1 even ifthe encryption is probabilistic
Proof Recall that the algorithm 119905 larr Token(119870priv 119908) is deter-ministic and 119904 larr PEKS(119870pub 119908) is probabilistic Howeverfor a single document there only exists a single 119904 that linksto 119908 When 1 larr Test(119870pub 119904 119905) it implies that 119905 matches119904 We replace 119905 with 119904 then the token 119905 = 119904 which could begenerated by the data sender who could generate the tokenusing the public key 119905 larr Token1015840(119870pub 119908)which is in fact thealgorithm PEKS(119870pub 119908) It seems that the data sender hasindirectly generated the token without having the private key
Keywords
Tokenization
Converter
Mapping
Func1
Documents
Decryption
Core (symmetric asymmetric)
Filter
Func 2
Global layer
Local layerFunc
Figure 1 Architecture for layered searchable encryption scheme
Therefore when the output of the test is 1 both the token andthe searchable structure map to 119905 which is deterministic
Based on the lemma above we introduce a theoremwhichguides us to construct the converter in the layered searchableencryption scheme
Theorem 4 (run-time invariance) For both symmetric andasymmetric settings if the search token 119905 is deterministic thenthe searchable structure is run-time deterministic
Proof As proved in Lemma 3 the searchable structure is run-time deterministic for asymmetric setting For symmetricsetting the searchable structure is encrypted using the sym-metric key 120574 larr Enc(119870119863) which is probabilistic The token119905 larr Token(119870 119908) is deterministic Similar to asymmetricsetting when executing the deterministic algorithm Searchthematched entries (probabilistic)map to 119905 and themappingis deterministic (here an entry is the encrypted data usingsymmetric encryption that contains the information aboutthe matched document such as the node in the invertedindex [8]) In other words the searchable structure is run-time deterministic because of the deterministicmapping
42 Scheme Definition Our primary goal is to separate thefunctionalities from the searchable structures therefore weconsider to construct the basic searchable structures andvarious functions in different layers as shown in Figure 1
(i) Global layer we name this layer ldquoglobalrdquo because alldocuments and all searchable structures are involvedIn this layer the basic searchable encryption scheme(symmetric or asymmetric) is executed and a globalindex could be constructed to improve search effi-ciency The server receives the search tokens (eachtoken is related to a keyword) executes the searchprocedure and outputs the matched documents Fur-thermore the server converts the tokens (symmet-ric or asymmetric) to the corresponding mappings(another type of secret token) with uniform formatand transfers the mappings with the matched docu-ments (or identifiers) to the local layer
(ii) Local layer we name this layer ldquolocalrdquo because func-tional structures are constructed for each documentindependently In this layer each matched documentis further filtered by all functions (eg phrase query
The Scientific World Journal 5
function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user
For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows
Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870
119890 119863) is a probabilistic algorithm
that takes as input an encryption key 119870119890(119870119890= 119870priv
for symmetric setting or 119870119890= 119870pub for asymmetric
setting) and a document collection 119863 = (1198891 119889
119899)
It outputs 119899 encrypted documents 119862 = (1198881 119888
119899)
a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866
1 119866
119899) corresponding to 119899 documents
and a sequence of local functional structures 119871 =
(1198711 119871
119899) corresponding to 119899 encrypted docu-
ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908
1 119908
119900) with functional instruc-
tions and outputs the corresponding search tokens119879 = (119905
1 119905
119900) with functional instructions It is run
by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic
algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888
1 119888
119898) It is run by the
server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user
Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query
For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908
1 119908
119900)
(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions
A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows
Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows
119871119889larr Build(119889 119881
119889) is an algorithm that takes as input
a document 119889 and the corresponding conversion 119881119889
and outputs a functional structure 119871119889 It is run by
the data sender and 119871119889is appended to the encrypted
document
1198621015840larr Filter(119862 119871 119881
119879) is an algorithm that takes as
input the encrypted documents 119862 = (1198621 119862
119909)
the corresponding functional structure set 119871 =
(1198711 119871
119909) and the converted search tokens 119881
119879=
(1198811 119881
119909) and outputs a subset of documents 1198621015840 It
is run by the server
43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models
The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else
6 The Scientific World Journal
the security of some components will be correlated such thatthe loose coupling property is lost
Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface
Definition 7 (plain trace) Let119863 = (1198891 119889
119899) be a document
collection Let 119873 = (1198731 119873
119899) (only for asymmetric
setting) be a keyword-counter set where119873119894is the number of
keywords in 119889119894 Let the query history119882 = (119908
1 119908
119901) be a
sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882
119894= 119882119895and 0 otherwiseThe plain
trace 120587(119863119882) = (|1198891| |119889
119899| 119873 120590(119882))
Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface
Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889
1 119889
119899) a sequence of query
119882 = (1199081 119908
119901) and receives (119862 119866) larr Enc(119870
119890 119863)
and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881
119879as the input for
the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-
ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast
119879as the input for the
functional component FinallyA returns a bit 119887 thatis output by the experiment
We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (1)
where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here
since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately
The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example
if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component
Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later
44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface
441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888
1 119888
119899 Each 119888
119894(1 le 119894 le 119899) is linked to a
global searchable structure 119866119894(if a global index is used then
only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871
119894 and only
the global searchable structure 119866(1198661 119866
119899) is used in this
step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881
119879and the matched 119909 encrypted documents
are transmitted to functional components FC1 FC
119890to
further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888
1 119888
119898are
returned to the user and the user decrypts them to obtainthe plaintexts 119889
1 119889
119898 In order to construct a functional
component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1
We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency
The Scientific World Journal 7
Input Token
User Search
Core (SSEASE) Converter
Outputm le x le n
Decrypt
Filter Convert
Filter
W = (w1 w2 w0) T = (t1 t2 t0)
c1 c2 cn
G(G1 G2 Gn)
L1 L2 Ln
d1 d2 dm
c1 c2 cm
c1 c2 cx
G(G1 G2 Gx)
L1 L2 Lx
VT = (V1 V2 Vx)
FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe
Figure 2 Search process of layered searchable encryption scheme
Build(119889 119881119889)
(1) input a document 119889 and the mappings of all words 119881119889in 119889
(2) specified according to the functionality(3) output a local functional structure 119871
119889
Filter(119862 119871 119881119879)
(1) input a set of encrypted document 119862 = (1198881 119888
119909) the corresponding local functional
structures 119871 = (1198711 119871
119909) and the mappings of the queried keywords 119881
119879= (1198811 119881
119909)
(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862
Algorithm 1 Template for functional component FC
rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms
442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905
1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo
and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as
ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)
Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings
Suppose there are 119899 documents 119863 = (1198891 119889
119899) 119899 cor-
responding ciphertexts 119862 = (1198881 119888
119899) 119890 functional compo-
nents FC= (1198651 119865
119890) and 119900 queried keywords119882 = (119908
1
119908119900) In addition we define a hash function as follows
119891ℎ 0 1
lowast997888rarr 0 1
119897 (3)
where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891
ℎ then 119897
is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4
443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890
1and 1198903map to the
word ldquodayrdquo)
ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)
Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1
Then we have
ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)
In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed
8 The Scientific World Journal
Input the encryption key 119870119890= 119870priv the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871
1 119871
119899)
Method(1) compute (120574 119862) larr SSEsdotEnc(119870
119890 119863) Here 119862 = (119888
1 119888
119899)
(2) for each document 119889119894isin 119863 and the corresponding 119888
119894(1 le 119894 le 119899) do
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) for each word 119908119896(1 le 119896 le 119903) in119882 do
(5) compute 119905119896larr SSEsdotToken(119870
119890 119908119896)
(6) compute V119894119896larr 119891ℎ(119905119896)
(7) end for(8) let 119881
119894= (V1198941 V
119894119903)
(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(11) end for(12) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(13) end for(14) let 119866 = 120574 and 119871 = (119871
1 119871
119899) and output (119862 119866 119871)
Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)
Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871
1 119871
119899)
(5) 119879 search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888
1 119888
119909)
(2) for each token 119905119896(1 le 119896 le 119900) compute V
119896larr 119891ℎ(119905119896)
(3) let 1198811= 1198812= sdot sdot sdot = 119881
119909= (V1 V
119900) and
(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(7) end for
Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)
Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 2
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889
Algorithm 4 LSE scheme symmetric part
The Scientific World Journal 9
ASETest
Convert
FCFilter
Gi searchable structure
middot middot middot
L1 L2 middot middot middot Le
Li functional structure
T = (t1 t0)
Kpub s1 s2 sa
c1
c2
ci
cn
Figure 3 Data structure and search process for asymmetric setting
Table 1 Searchable encryption schemes with various functionalities
Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash
Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash
Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash
Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes
to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument
There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components
Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7
Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1
444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently
Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure
Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881
119879) from equal-size ran-
dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-
security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879
lowast Forasymmetric setting119881
119879is the hash of the searchable structure
119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881
119879is indistinguishable from the hash value
119881lowast
119879
5 Realizing Various Functionalities
In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities
51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 5
function) which execute separately Only the docu-ments that pass all filter tests are returned to the globallayer and finally return to the user
For both layers the framework consists of three differentcomponents the core symmetric and asymmetric searchablecomponents which provide basic keyword search one ormore functional components which provides various func-tionalities and a converter The converter is an algorithmthat provides a uniform interface for both symmetric andasymmetric settings and provides uniform inputs for allfunctions We note that all components in the two layersexecute the search algorithm on the server side and notrusted third-party is required Now we formally define thescheme as follows
Definition 5 (layered searchable encryption) A layeredsearchable encryption (LSE) scheme is a collection offive polynomial-time algorithms LSE = (Gen Enc TokenSearch Dec) as follows
119870 larr Gen(1119896) is a probabilistic algorithm that takesas input a security parameter 119896 and outputs eithera symmetric encryption key 119870 = 119870priv or anasymmetric encryption key pair 119870 = (119870pub 119870priv) Itis run by the user and only the public key 119870pub is notkept secret(119862 119866 119871) larr Enc(119870
119890 119863) is a probabilistic algorithm
that takes as input an encryption key 119870119890(119870119890= 119870priv
for symmetric setting or 119870119890= 119870pub for asymmetric
setting) and a document collection 119863 = (1198891 119889
119899)
It outputs 119899 encrypted documents 119862 = (1198881 119888
119899)
a single (index-based) global searchable structure119866 or a sequence of global searchable structures119866 = (119866
1 119866
119899) corresponding to 119899 documents
and a sequence of local functional structures 119871 =
(1198711 119871
119899) corresponding to 119899 encrypted docu-
ments It is run by the data sender and (119862 119866 119871) aresent to the server119879 larr Token(119870priv 119908) is a deterministic algorithmthat takes as input a secret key 119870priv and a set ofkeywords119882 = (119908
1 119908
119900) with functional instruc-
tions and outputs the corresponding search tokens119879 = (119905
1 119905
119900) with functional instructions It is run
by the user and 119879 is sent to the server1198621015840larr Search(119870pub 119862 119866 119871 119879) is a deterministic
algorithm that takes as input a public key 119870pub (onlyfor asymmetric setting) the encrypted documents 119862the global searchable structure119866 the local functionalstructure 119871 and the search token 119879 and outputs thematched documents 1198621015840 = (119888
1 119888
119898) It is run by the
server and 1198621015840 is sent to the user119889 larr Dec(119870priv 119888) is a deterministic algorithm thattakes as input a secret key 119870priv and an encrypteddocument 119888 and outputs the plaintext 119889 It is run bythe user
Functional instructions are separately specified by thefunctionalities and are written as a single SQL-like query
For example the query ldquoSELECT lowast WHERE keywords =ldquocloud storage encryptionrdquo AND ldquosecurity classification gt5rdquo ORDERED BY ldquokeywordcloudrdquordquo indicate that findingthe documents that satisfying containing the keywordsldquocloud storage encryptionrdquo the security classification of thedocuments gt 5 sorting the matched documents by relevancescore according to the keyword ldquocloudrdquo and return the top-119896 relevant documents Here we only write119882 = (119908
1 119908
119900)
(eg 119882 = ldquocloud storage encryptionrdquo) as a representationfor any instruction that contains the keywords Similarlythe tokens 119879 are just a representation for all functionalinstructions
A functional component (FC) is a module in LSE thatprovides a specific functionality It generates a local func-tional structure 119871 for each encrypted document and providesfilter service while searching FC is designed to be compatiblewith both symmetric and asymmetric settings Therefore aconversion for the document as well as the query is requiredWe formally define the FC as follows
Definition 6 (functional component) A functional compo-nent (FC) is a collection of two polynomial-time algorithmsFC = (Build Filter) as follows
119871119889larr Build(119889 119881
119889) is an algorithm that takes as input
a document 119889 and the corresponding conversion 119881119889
and outputs a functional structure 119871119889 It is run by
the data sender and 119871119889is appended to the encrypted
document
1198621015840larr Filter(119862 119871 119881
119879) is an algorithm that takes as
input the encrypted documents 119862 = (1198621 119862
119909)
the corresponding functional structure set 119871 =
(1198711 119871
119909) and the converted search tokens 119881
119879=
(1198811 119881
119909) and outputs a subset of documents 1198621015840 It
is run by the server
43 Security Model The security of LSE relies on thealgorithms used by the components For example if thesymmetric searchable encryption scheme introduced in [8]is used as the core searchable component then the coresearchable structure guarantees that it is semantic secureagainst chosen keyword attack (CKA-secure) Similarly thefunctional components have their individual security guar-antees Therefore the whole LSE scheme does not havea uniform security model and security models are builtseparately and each component could be analyzed indepen-dently However we could divide the security models intothree parts searchable component security interface securityand functional component security Searchable componentsecurity is guaranteed by the underlying core searchableencryption scheme Therefore we mainly discuss the othertwo security models
The interface is common and therefore the data that flowthrough the interface must be semantic secure Informallyspeaking it must guarantee that the adversary cannot dis-tinguish the input and the output of each component fromrandom strings Semantic security against chosen plaintextattack (CPA) is very important for the interface or else
6 The Scientific World Journal
the security of some components will be correlated such thatthe loose coupling property is lost
Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface
Definition 7 (plain trace) Let119863 = (1198891 119889
119899) be a document
collection Let 119873 = (1198731 119873
119899) (only for asymmetric
setting) be a keyword-counter set where119873119894is the number of
keywords in 119889119894 Let the query history119882 = (119908
1 119908
119901) be a
sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882
119894= 119882119895and 0 otherwiseThe plain
trace 120587(119863119882) = (|1198891| |119889
119899| 119873 120590(119882))
Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface
Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889
1 119889
119899) a sequence of query
119882 = (1199081 119908
119901) and receives (119862 119866) larr Enc(119870
119890 119863)
and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881
119879as the input for
the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-
ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast
119879as the input for the
functional component FinallyA returns a bit 119887 thatis output by the experiment
We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (1)
where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here
since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately
The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example
if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component
Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later
44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface
441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888
1 119888
119899 Each 119888
119894(1 le 119894 le 119899) is linked to a
global searchable structure 119866119894(if a global index is used then
only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871
119894 and only
the global searchable structure 119866(1198661 119866
119899) is used in this
step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881
119879and the matched 119909 encrypted documents
are transmitted to functional components FC1 FC
119890to
further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888
1 119888
119898are
returned to the user and the user decrypts them to obtainthe plaintexts 119889
1 119889
119898 In order to construct a functional
component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1
We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency
The Scientific World Journal 7
Input Token
User Search
Core (SSEASE) Converter
Outputm le x le n
Decrypt
Filter Convert
Filter
W = (w1 w2 w0) T = (t1 t2 t0)
c1 c2 cn
G(G1 G2 Gn)
L1 L2 Ln
d1 d2 dm
c1 c2 cm
c1 c2 cx
G(G1 G2 Gx)
L1 L2 Lx
VT = (V1 V2 Vx)
FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe
Figure 2 Search process of layered searchable encryption scheme
Build(119889 119881119889)
(1) input a document 119889 and the mappings of all words 119881119889in 119889
(2) specified according to the functionality(3) output a local functional structure 119871
119889
Filter(119862 119871 119881119879)
(1) input a set of encrypted document 119862 = (1198881 119888
119909) the corresponding local functional
structures 119871 = (1198711 119871
119909) and the mappings of the queried keywords 119881
119879= (1198811 119881
119909)
(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862
Algorithm 1 Template for functional component FC
rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms
442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905
1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo
and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as
ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)
Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings
Suppose there are 119899 documents 119863 = (1198891 119889
119899) 119899 cor-
responding ciphertexts 119862 = (1198881 119888
119899) 119890 functional compo-
nents FC= (1198651 119865
119890) and 119900 queried keywords119882 = (119908
1
119908119900) In addition we define a hash function as follows
119891ℎ 0 1
lowast997888rarr 0 1
119897 (3)
where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891
ℎ then 119897
is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4
443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890
1and 1198903map to the
word ldquodayrdquo)
ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)
Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1
Then we have
ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)
In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed
8 The Scientific World Journal
Input the encryption key 119870119890= 119870priv the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871
1 119871
119899)
Method(1) compute (120574 119862) larr SSEsdotEnc(119870
119890 119863) Here 119862 = (119888
1 119888
119899)
(2) for each document 119889119894isin 119863 and the corresponding 119888
119894(1 le 119894 le 119899) do
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) for each word 119908119896(1 le 119896 le 119903) in119882 do
(5) compute 119905119896larr SSEsdotToken(119870
119890 119908119896)
(6) compute V119894119896larr 119891ℎ(119905119896)
(7) end for(8) let 119881
119894= (V1198941 V
119894119903)
(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(11) end for(12) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(13) end for(14) let 119866 = 120574 and 119871 = (119871
1 119871
119899) and output (119862 119866 119871)
Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)
Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871
1 119871
119899)
(5) 119879 search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888
1 119888
119909)
(2) for each token 119905119896(1 le 119896 le 119900) compute V
119896larr 119891ℎ(119905119896)
(3) let 1198811= 1198812= sdot sdot sdot = 119881
119909= (V1 V
119900) and
(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(7) end for
Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)
Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 2
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889
Algorithm 4 LSE scheme symmetric part
The Scientific World Journal 9
ASETest
Convert
FCFilter
Gi searchable structure
middot middot middot
L1 L2 middot middot middot Le
Li functional structure
T = (t1 t0)
Kpub s1 s2 sa
c1
c2
ci
cn
Figure 3 Data structure and search process for asymmetric setting
Table 1 Searchable encryption schemes with various functionalities
Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash
Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash
Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash
Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes
to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument
There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components
Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7
Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1
444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently
Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure
Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881
119879) from equal-size ran-
dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-
security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879
lowast Forasymmetric setting119881
119879is the hash of the searchable structure
119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881
119879is indistinguishable from the hash value
119881lowast
119879
5 Realizing Various Functionalities
In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities
51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
6 The Scientific World Journal
the security of some components will be correlated such thatthe loose coupling property is lost
Wefirst define the notion of plain trace which is the directinformation that could be captured from the data that flowthrough the interface
Definition 7 (plain trace) Let119863 = (1198891 119889
119899) be a document
collection Let 119873 = (1198731 119873
119899) (only for asymmetric
setting) be a keyword-counter set where119873119894is the number of
keywords in 119889119894 Let the query history119882 = (119908
1 119908
119901) be a
sequence of queried keywords Let the search pattern 120590(119882)be a 119901 times 119901 binary matrix such that for 1 le 119894 119895 le 119901 the 119894throw and 119895th column is 1 if119882
119894= 119882119895and 0 otherwiseThe plain
trace 120587(119863119882) = (|1198891| |119889
119899| 119873 120590(119882))
Note that plain trace is different from the notion of traceintroduced in [8] which further captures the logic links Wewill explain the reason after the definition of the securitymodel We now present the security model for the interface
Definition 8 (interface security against chosen plaintextattack interface-CPA-secure) Let Σ be the layered search-able encryption scheme Let 119896 isin N be the security parameterWe consider the following probabilistic experiments whereAis an adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 = 119870priv (symmetric) or 119870 = (119870priv 119870pub)(asymmetric)The adversaryA generates a documentcollection 119863 = (119889
1 119889
119899) a sequence of query
119882 = (1199081 119908
119901) and receives (119862 119866) larr Enc(119870
119890 119863)
and search tokens 119879 larr Token(119870priv119882) from thechallengerA generates a mapping119881
119879as the input for
the functional component Finally A returns a bit 119887that is output by the experimentSimΣAS(119896) given the plain trace 120587(119863119882)S gener-
ates (119862lowast 119866lowast) and 119879lowast and then sends the results toA A generates a mapping 119881lowast
119879as the input for the
functional component FinallyA returns a bit 119887 thatis output by the experiment
We say that the interface of LSE is semantic secure againstchosen plaintext attack if for all PPT adversaries A thereexists a PPT simulator S such that
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (1)
where the probabilities are over the coins of GenNote that the functional structure 119871 is not included here
since the functional component is loosely coupled with thecore Therefore the security of the functional componentis separate from the framework and should be defined andanalyzed separately
The security model of the interface does not care aboutthe search algorithm and the number of queries (thereforeonly a single query sequence is presented) The reason is thatthe other information about the queried keywords and thedocuments are protected by the components For example
if some documents are returned by one token then theadversary could immediately infer that these documentshave a common keyword (even the tokens and documentsare indistinguishable from random in the interface) andsuch logic links could be hidden by generating multipledifferent tokens for one keyword (please refer to the adaptiveconstruction in [8]) and the protection is guaranteed in thecore searchable component
Therefore semantic security for the interface does notguarantee that the whole scheme is secure against chosenkeyword attack or each component is secure under someother security models However it provides the basic securityguarantee for the whole scheme and the independence foreach component and we will show such independence in theconstruction of the functional component later
44 Concrete Construction We first present the basic ideafor the search process and the converter then we presentthe template for constructing the functional componentFinally we present the constructions for LSE (symmetric andasymmetric) in detail and prove the security of the interface
441 Basic Idea As shown in Figure 2 the basic search pro-cess is as follows The user transforms his queried keywords119882 to tokens 119879 using the private key The server receives thetokens119879 and executes the search procedure over all encrypteddocuments 119888
1 119888
119899 Each 119888
119894(1 le 119894 le 119899) is linked to a
global searchable structure 119866119894(if a global index is used then
only a single searchable structure 119866 is used for all encrypteddocuments) and a local functional structure 119871
119894 and only
the global searchable structure 119866(1198661 119866
119899) is used in this
step Then the tokens 119879 are converted to the uniform tokens119881119879 and both 119881
119879and the matched 119909 encrypted documents
are transmitted to functional components FC1 FC
119890to
further filter the results (eg phrase query filter) Eachcomponent outputs a subset of the input documents andall components work serially since any document that doesnot pass the current filter will be unnecessary for the nextfilter Finally thematched encrypted documents 119888
1 119888
119898are
returned to the user and the user decrypts them to obtainthe plaintexts 119889
1 119889
119898 In order to construct a functional
component that supports both symmetric and asymmetricsettings a conversion is needed to transform the plaintext toa kind of ciphertext that is independent from the settingsWecall this independent ciphertext as a one-to-one ldquomappingrdquosince each word in the plaintext has a deterministic tokenin the ciphertext In addition in order to provide a uniformformat for the functional components a hash function isused and we will show the detailed construction in thenext section Now we present the template for the functionalcomponent (FC) in Algorithm 1
We note that in order to obtain the loose couplingproperty any specific parameter is not allowedTherefore theuniform mappings of the words become the ideal commonparameter Another advantage of the mapping is that themain information needed for any functionality is retainedthe difference of each word and the order of all words in thedocument Based on this information the word frequency
The Scientific World Journal 7
Input Token
User Search
Core (SSEASE) Converter
Outputm le x le n
Decrypt
Filter Convert
Filter
W = (w1 w2 w0) T = (t1 t2 t0)
c1 c2 cn
G(G1 G2 Gn)
L1 L2 Ln
d1 d2 dm
c1 c2 cm
c1 c2 cx
G(G1 G2 Gx)
L1 L2 Lx
VT = (V1 V2 Vx)
FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe
Figure 2 Search process of layered searchable encryption scheme
Build(119889 119881119889)
(1) input a document 119889 and the mappings of all words 119881119889in 119889
(2) specified according to the functionality(3) output a local functional structure 119871
119889
Filter(119862 119871 119881119879)
(1) input a set of encrypted document 119862 = (1198881 119888
119909) the corresponding local functional
structures 119871 = (1198711 119871
119909) and the mappings of the queried keywords 119881
119879= (1198811 119881
119909)
(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862
Algorithm 1 Template for functional component FC
rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms
442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905
1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo
and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as
ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)
Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings
Suppose there are 119899 documents 119863 = (1198891 119889
119899) 119899 cor-
responding ciphertexts 119862 = (1198881 119888
119899) 119890 functional compo-
nents FC= (1198651 119865
119890) and 119900 queried keywords119882 = (119908
1
119908119900) In addition we define a hash function as follows
119891ℎ 0 1
lowast997888rarr 0 1
119897 (3)
where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891
ℎ then 119897
is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4
443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890
1and 1198903map to the
word ldquodayrdquo)
ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)
Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1
Then we have
ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)
In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed
8 The Scientific World Journal
Input the encryption key 119870119890= 119870priv the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871
1 119871
119899)
Method(1) compute (120574 119862) larr SSEsdotEnc(119870
119890 119863) Here 119862 = (119888
1 119888
119899)
(2) for each document 119889119894isin 119863 and the corresponding 119888
119894(1 le 119894 le 119899) do
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) for each word 119908119896(1 le 119896 le 119903) in119882 do
(5) compute 119905119896larr SSEsdotToken(119870
119890 119908119896)
(6) compute V119894119896larr 119891ℎ(119905119896)
(7) end for(8) let 119881
119894= (V1198941 V
119894119903)
(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(11) end for(12) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(13) end for(14) let 119866 = 120574 and 119871 = (119871
1 119871
119899) and output (119862 119866 119871)
Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)
Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871
1 119871
119899)
(5) 119879 search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888
1 119888
119909)
(2) for each token 119905119896(1 le 119896 le 119900) compute V
119896larr 119891ℎ(119905119896)
(3) let 1198811= 1198812= sdot sdot sdot = 119881
119909= (V1 V
119900) and
(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(7) end for
Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)
Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 2
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889
Algorithm 4 LSE scheme symmetric part
The Scientific World Journal 9
ASETest
Convert
FCFilter
Gi searchable structure
middot middot middot
L1 L2 middot middot middot Le
Li functional structure
T = (t1 t0)
Kpub s1 s2 sa
c1
c2
ci
cn
Figure 3 Data structure and search process for asymmetric setting
Table 1 Searchable encryption schemes with various functionalities
Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash
Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash
Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash
Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes
to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument
There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components
Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7
Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1
444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently
Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure
Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881
119879) from equal-size ran-
dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-
security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879
lowast Forasymmetric setting119881
119879is the hash of the searchable structure
119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881
119879is indistinguishable from the hash value
119881lowast
119879
5 Realizing Various Functionalities
In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities
51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 7
Input Token
User Search
Core (SSEASE) Converter
Outputm le x le n
Decrypt
Filter Convert
Filter
W = (w1 w2 w0) T = (t1 t2 t0)
c1 c2 cn
G(G1 G2 Gn)
L1 L2 Ln
d1 d2 dm
c1 c2 cm
c1 c2 cx
G(G1 G2 Gx)
L1 L2 Lx
VT = (V1 V2 Vx)
FC1 minusrarr FC2 minusrarr middot middot middot minusrarr FCe
Figure 2 Search process of layered searchable encryption scheme
Build(119889 119881119889)
(1) input a document 119889 and the mappings of all words 119881119889in 119889
(2) specified according to the functionality(3) output a local functional structure 119871
119889
Filter(119862 119871 119881119879)
(1) input a set of encrypted document 119862 = (1198881 119888
119909) the corresponding local functional
structures 119871 = (1198711 119871
119909) and the mappings of the queried keywords 119881
119879= (1198811 119881
119909)
(2) specified according to the functionality(3) output a subset of the documents 1198621015840 sube 119862
Algorithm 1 Template for functional component FC
rank subset and so forth could also be inferred withoutthe plaintext which facilitates the designs of the Token andSearch algorithms
442 Constructing Symmetric Part For symmetric settingthe deterministic mapping of a document could be computedwith ease Let the tokens 119905
1 1199052 1199053map to the words ldquodayrdquo ldquobyrdquo
and ldquonightrdquo respectively Then the deterministic mapping ofa sentence could be written as
ldquoday by day night by nightrdquo 997904rArr 119905111990521199051119905311990521199053 (2)
Both the Enc and Token algorithms could generate thesemappings and the main process is as follows For eachdocument 119889 scan all words and compute the correspondingtokens which are further hashed to the fixed-size mappings
Suppose there are 119899 documents 119863 = (1198891 119889
119899) 119899 cor-
responding ciphertexts 119862 = (1198881 119888
119899) 119890 functional compo-
nents FC= (1198651 119865
119890) and 119900 queried keywords119882 = (119908
1
119908119900) In addition we define a hash function as follows
119891ℎ 0 1
lowast997888rarr 0 1
119897 (3)
where 119897 is the length of the mapping according to the hashfunction For example if we use MD5 [24] as 119891
ℎ then 119897
is 128 bit For clarity we present the encryption scheme inAlgorithm 2 and the search scheme inAlgorithm 3 andfinallypresent the complete scheme in Algorithm 4
443 Constructing Asymmetric Part For asymmetric set-ting the data sender does not have the private key thereforethe mapping will fail while searching since any encryptionusing the public key is probabilistic (CPA security) Forexample let 119890 represent an encryption of aword and the samesentence will become (note that both 119890
1and 1198903map to the
word ldquodayrdquo)
ldquoday by day night by nightrdquo 997904rArr 119890111989021198903119890411989051198906 (4)
Therefore we delay the construction for such mappingafter the construction of the searchable structure in algorithmEnc and use this searchable structure as an independenttoken for the corresponding word in algorithm SearchRecall that 119904 larr PEKS(119870pub 119908) then the tokens 1199051 1199052 1199053whichmap to the words ldquodayrdquo ldquobyrdquo and ldquonightrdquo will be transformedto 1199041 1199042 1199043when the test in the search algorithm outputs 1
Then we have
ldquoday by day night by nightrdquo 997904rArr 119904111990421199041119904311990421199043 (5)
In this way the data sender could construct the deter-ministic mapping for the document and indirectly obtain thedeterministic tokens just using the public key Similar to thesymmetric setting the process is as follows For each docu-ment119889 scan all words and compute the corresponding tokensaccording to searchable structures which are further hashed
8 The Scientific World Journal
Input the encryption key 119870119890= 119870priv the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871
1 119871
119899)
Method(1) compute (120574 119862) larr SSEsdotEnc(119870
119890 119863) Here 119862 = (119888
1 119888
119899)
(2) for each document 119889119894isin 119863 and the corresponding 119888
119894(1 le 119894 le 119899) do
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) for each word 119908119896(1 le 119896 le 119903) in119882 do
(5) compute 119905119896larr SSEsdotToken(119870
119890 119908119896)
(6) compute V119894119896larr 119891ℎ(119905119896)
(7) end for(8) let 119881
119894= (V1198941 V
119894119903)
(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(11) end for(12) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(13) end for(14) let 119866 = 120574 and 119871 = (119871
1 119871
119899) and output (119862 119866 119871)
Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)
Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871
1 119871
119899)
(5) 119879 search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888
1 119888
119909)
(2) for each token 119905119896(1 le 119896 le 119900) compute V
119896larr 119891ℎ(119905119896)
(3) let 1198811= 1198812= sdot sdot sdot = 119881
119909= (V1 V
119900) and
(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(7) end for
Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)
Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 2
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889
Algorithm 4 LSE scheme symmetric part
The Scientific World Journal 9
ASETest
Convert
FCFilter
Gi searchable structure
middot middot middot
L1 L2 middot middot middot Le
Li functional structure
T = (t1 t0)
Kpub s1 s2 sa
c1
c2
ci
cn
Figure 3 Data structure and search process for asymmetric setting
Table 1 Searchable encryption schemes with various functionalities
Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash
Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash
Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash
Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes
to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument
There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components
Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7
Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1
444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently
Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure
Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881
119879) from equal-size ran-
dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-
security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879
lowast Forasymmetric setting119881
119879is the hash of the searchable structure
119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881
119879is indistinguishable from the hash value
119881lowast
119879
5 Realizing Various Functionalities
In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities
51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
8 The Scientific World Journal
Input the encryption key 119870119890= 119870priv the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structure (index-based)(3) 119871 local functional structures 119871 = (119871
1 119871
119899)
Method(1) compute (120574 119862) larr SSEsdotEnc(119870
119890 119863) Here 119862 = (119888
1 119888
119899)
(2) for each document 119889119894isin 119863 and the corresponding 119888
119894(1 le 119894 le 119899) do
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) for each word 119908119896(1 le 119896 le 119903) in119882 do
(5) compute 119905119896larr SSEsdotToken(119870
119890 119908119896)
(6) compute V119894119896larr 119891ℎ(119905119896)
(7) end for(8) let 119881
119894= (V1198941 V
119894119903)
(9) for each functional component FCj (1 le 119895 le 119890) do(10) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(11) end for(12) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(13) end for(14) let 119866 = 120574 and 119871 = (119871
1 119871
119899) and output (119862 119866 119871)
Algorithm 2 Encryption (symmetric) Enc(119870119890 119863)
Input(1)119870pub the public key is not available here(2) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(3) 119866 global searchable structure (index-based)(4) 119871 local functional structures 119871 = (119871
1 119871
119899)
(5) 119879 search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larr SSEsdotSearch(119862 119866 119879) Here 1198621015840 = (119888
1 119888
119909)
(2) for each token 119905119896(1 le 119896 le 119900) compute V
119896larr 119891ℎ(119905119896)
(3) let 1198811= 1198812= sdot sdot sdot = 119881
119909= (V1 V
119900) and
(4) for each functional component FCj (1 le 119895 le 119890) do(5) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(6) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(7) end for
Algorithm 3 Search (symmetric) Search(119870pub 119862 119866 119871 119879)
Gen(1119896) compute 119870 = 119870priv larr SSEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 2
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larrSSEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 3Dec(119870priv 119888) compute 119889 larrSSEsdotDec(119870priv 119888) and output 119889
Algorithm 4 LSE scheme symmetric part
The Scientific World Journal 9
ASETest
Convert
FCFilter
Gi searchable structure
middot middot middot
L1 L2 middot middot middot Le
Li functional structure
T = (t1 t0)
Kpub s1 s2 sa
c1
c2
ci
cn
Figure 3 Data structure and search process for asymmetric setting
Table 1 Searchable encryption schemes with various functionalities
Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash
Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash
Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash
Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes
to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument
There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components
Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7
Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1
444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently
Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure
Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881
119879) from equal-size ran-
dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-
security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879
lowast Forasymmetric setting119881
119879is the hash of the searchable structure
119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881
119879is indistinguishable from the hash value
119881lowast
119879
5 Realizing Various Functionalities
In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities
51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 9
ASETest
Convert
FCFilter
Gi searchable structure
middot middot middot
L1 L2 middot middot middot Le
Li functional structure
T = (t1 t0)
Kpub s1 s2 sa
c1
c2
ci
cn
Figure 3 Data structure and search process for asymmetric setting
Table 1 Searchable encryption schemes with various functionalities
Symm Asymm Ranked keyword Range Phrase Fuzzy keyword Similarity SubsetRanked keywordquery [4 13 14 25] Yes mdash Yes mdash Possible Possible Possible mdash
Range query [22 23] mdash Yes mdash Yes mdash Possible Possible PossiblePhrase query [20 21] Yes mdash Possible mdash Yes Possible Possible mdashFuzzy keyword query[3 16 17] Yes mdash Possible mdash Possible Yes Possible mdash
Wildcard query [26] mdash Yes mdash Possible mdash Yes Possible PossibleSimilarity query[18 19] Yes mdash Possible mdash Possible Possible Yes mdash
Subset query [22] mdash Yes mdash Possible mdash Possible Possible YesThis paper Yes Yes Yes Yes Yes Yes Yes Yes
to the fixed-size mappings While searching the tokens aremapped to different searchable structures according to eachdocument
There are some differences from the symmetric counter-part as shown in Figure 3 First the searchable structuresare appended to each encrypted data such that the globalindex is not available Second a public key is involved forthe searchable structure However due to the conversion thepublic key is unnecessary for the functional components
Now we present the encryption scheme in Algorithm 5and the search scheme in Algorithm 6 and finally present thecomplete scheme in Algorithm 7
Wenote that the process of ldquofind srdquo at line 5 inAlgorithm 6could be simply done by directly using the intermediateresults from the algorithm ASESearch at line 1
444 Proof of Security As we encapsulate the basic symmet-ric and asymmetric searchable encryptions in the global layerthe core is semantic secure against chosen keyword attack(CKA) [8 9 11] The only thing we need is proving that theinterface is CPA secure and other functionalities are analyzedindependently
Theorem9 If the core symmetric or asymmetric component issemantic secure against chosen keyword attack (CKA-secure)then LSE is interface-CPA-secure
Proof We briefly prove this theorem since the proof isstraightforward We claim that no polynomial-size distin-guisher could distinguish (119862119866 119879 119881
119879) from equal-size ran-
dom strings (119862lowast 119866lowast 119879lowast 119881lowast119879) As proved in [8 11] the CKA-
security of the core component guarantees that (119862119866 119879) areindistinguishable from (119862lowast 119866lowast 119879lowast) For symmetric setting119881119879is the hash of 119879 which is indistinguishable from 119879
lowast Forasymmetric setting119881
119879is the hash of the searchable structure
119878 which is indistinguishable from random say 119878lowast Thereforethe hash value 119881
119879is indistinguishable from the hash value
119881lowast
119879
5 Realizing Various Functionalities
In this section we show how to realize various function-alities based on LSE We fist present the overview of thesearchable encryption schemes with various functionalitiesand then propose two representative constructions for rankedkeyword query and range query Finally we briefly discuss themethods for realizing the other functionalities
51 Overview As shown in Table 1 we present variousfunctionalities for searchable encryption schemes symmetricsetting (Symm) asymmetric setting (Asym) ranked keywordquery (Ranked keyword) range query (Range) phrase query(Phrase) fuzzy keyword query and wildcard query (Fuzzykeyword) similarity query (Similarity) and subset query
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
10 The Scientific World Journal
Input encryption key 119870119890= 119870pub the documents119863 = (119889
1 119889
119899)
Output(1) 119862 encrypted documents 119862 = (119888
1 119888
119899)
(2) 119866 global searchable structures 119866 = (1198661 119866
119899)
(3) 119871 local functional structures 119871 = (1198711 119871
119899)
Method(1) for each document 119889
119894(1 le 119894 le 119899) in119863 do
(2) compute 119888119894larr ASEsdotEnc(119870pub 119889119894)
(3) scan 119889119894for all 119903 words to form a word list119882 = (119908
1 119908
119903)
(4) extract 119886 distinct keywords1198821015840 = (1199081 119908
119886) from119882
(5) for each word 119908119909(1 le 119909 le 119886) in1198821015840 do
(6) compute 119904119894119909larr ASEsdotPEKS(119870pub 119908119909)
(7) compute ℎ119894119909larr 119891ℎ(119904119894119909)
(8) end for(9) let 119866
119894= (1199041198941 119904
119894119886)map to119867
119894= (ℎ1198941 ℎ
119894119886)map to1198821015840
(10) for each word 119908119910(1 le 119910 le 119903) in119882 do
(11) find the ℎ isin 119867119894that the corresponding word 119908
119910isin 1198821015840
(12) set V119894119910= ℎ
(13) end for(14) let 119881
119894= (V1198941 V
119894119903)
(15) for each functional component FCj (1 le 119895 le 119890) do(16) compute 119871119895
119894larr FCjsdotBuild(119889119894 119881119894)
(17) end for(18) append 119871
119894= (1198711
119894 119871
119890
119894) to 119888119894
(19) end for(20) output 119862 = (119888
1 119888
119899) 119866 = (119866
1 119866
119899) 119871 = (119871
1 119871
119899)
Algorithm 5 Encryption (asymmetric) Enc(119870119890 119863)
Input(1) 119870pub the userrsquos public key(2) 119862 encrypted documents(3) 119866 global searchable structures 119866 = (119866
1 119866
119899)
(4) 119871 local functional structures 119871 = (1198711 119871
119899)
(5) 119879 the search tokens 119879 = (1199051 119905
119900)
Output matched documents 1198621015840 = (1198881 119888
119898)
Method(1) compute 1198621015840 larrASEsdotSearch(119870pub 119862 119866 119879) Let 119862
1015840= (1198881 119888
119909)
(2) for each 119888119894isin 1198621015840 and the functional structure 119871
119894(1 le 119894 le 119909) do
(3) let 119866119894= (1199041198941 119904
119894119886) denote the searchable encryptions of 119888
119894
where 119886 is the number of keywords in 119888119894
(4) for each 119905119910(1 le 119910 le 119900) in 119879 do
(5) find 119904 isin 119866119894where ASEsdotTest(119870pub 119904 119905119910) == 119910119890119904
(6) compute V119894119910larr 119891ℎ(119904)
(7) end for(8) let 119881
119894= (V1198941 V
119894119900) 119871119894= (1198711
119894 119871
119890
119894)
(9) end for(10) for each functional component FCj (1 le 119895 le 119890) do(11) let 1198621015840 = (119888
1 119888
119909) then the corresponding 1198711015840 = (119871
1 119871
119909)
and 119881119879= (1198811 119881
119909) Let 119871119895 = (119871119895
1 119871
119895
119909)
(12) compute 1198621015840 larr FCjsdotFilter(1198621015840 119871119895 119881119879)
(13) end for
Algorithm 6 Search (asymmetric) Search(119870pub 119862 119866 119871 119879)
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 11
Gen(1119896) compute 119870 = (119870pub 119870priv) larr ASEsdotGen(1119896) and output 119870Enc(119870
119890 119863) described in Algorithm 5
Token(119870priv119882)(1) for each keyword 119908
119896(1 le 119896 le 119900) in119882 do
(2) compute 119905119896larr ASEsdotToken(119870priv 119908119896)
(3) end for(4) output 119879 = (119905
1 119905
119900)
Search(119870pub 119862 119866 119871 119879) described in Algorithm 6Dec(119870priv 119888) compute 119889 larr ASEsdotDec(119870priv 119888) and output 119889
Algorithm 7 LSE scheme asymmetric part
(Subset) ldquoYesrdquomeans that the corresponding scheme directlysupports such functionality ldquoPossiblerdquo means that the under-lying data structure is compatible and such functionalitycould be realized through minor modification of the originalscheme ldquomdashrdquo means that realizing such functionality is quitechallenging or the cost is relatively high
52 Ranked Keyword Query Ranked keyword query refersto a functionality that all matched documents are sortedaccording to some criteria and only the top-119896 relevantdocumentswill be returned to the userThe SQLquery formatis ldquoORDERED BY ldquokeywordrsquordquo In [14] the authors introducedthe computation for the relevance scores and proposed acomparing method over the encrypted scores based on orderpreserving symmetric encryption (OPSE) [27] By using thesame cryptographic primitive the functional structure couldrecord the encrypted relevance scores and setup an indexwith(token score) pairs in order to obtain the score with 119874(1)computation complexity
521 Preliminaries Order-preserving encryption (OPE)aims to encrypt the data in such a way that comparisonsover the ciphertexts are possible For 119860 119861 sube N a function119891 119860 rarr 119861 is order-preserving if for all 119894 119895 isin 119860 119891(119894) lt 119891(119895)if and only if 119894 lt 119895 We say an encryption scheme OPE =(Enc Dec) is order-preserving if Enc(119870 sdot) is an order-preserving function In [28] Agrawal et al proposed arepresentative OPE scheme that all numeric numbers areuniformly distributed In [27] Boldyreva et al introducedan order-preserving symmetric encryption scheme andproposed the security model The improved definitions areintroduced in [29] Informally speaking OPE is secure if theoracle access to OPEEnc is indistinguishable from accessingto a random order-preserving function (ROPF)The securitymodel is described as Pseudorandom Order-PreservingFunction against Chosen Ciphertext Attack (POPF-CCA)[27]
A sparse look-up table is often managed by indirectaddressing technique Indirect addressing is also called FKSdictionary [30] which is used in symmetric searchableencryption scheme [8] The addressing format is addressvalue where the address is a virtual address that could locatethe value field Given the address the algorithm will return
the associated value in constant look-up time and returnotherwise
522 Construction We build a sparse look-up table 119860 thatrecords the pair (keyword relevance score) with all dataencrypted When queried the server searches the relevancescores of all documents and finds the top-119896 relevant docu-ments Note that in order to security use OPE scheme toencrypt relevance scores a preprocessing is necessary
We build an OPE table to preprocess all plaintexts andstore the encrypted relevance scores as follows Given adocument collection 119863 = (119889
1 119889
119899) For each document
119889119896(1 le 119896 le 119899) scan it for 119900119896 keywords Compute the
relevance score (based on word frequency) 119904119896119894(1 le 119894 le 119900
119896) for
each keyword119908119896119894isin 119882 in 119889
119896 and record a 119900119896times3matrix for 119889
119896
with the 119894th line recording 119877119896119894= (119908119896
119894 119904119896
119894 119901119896
119894) where 119901119896
119894is the
position where the first 119908119896119894occurs For all documents setup
the OPE with 119873 = 1199001+ 1199002+ sdot sdot sdot + 119900
119899 numbers (1199041 119904
119873)
For each number 119904119895(1 le 119895 le 119873) the encryption is 119890
119895
Transform the previous matrix to an OPE table with the 119894thline recording 119877119896
119894= (119908119896
119894 119890119896
119894 119901119896
119894) where 119890119896
119894is the encryption of
119904119896
119894For a document it has at most |119889|2 + 1 keywords
(note that each keyword is followed by a separator such asa blank) The look-up table is padded to |119889|2 + 1 entriesin order to achieve semantic security Now we present theconcrete construction for ranked keyword query componentin Algorithm 8
523 Proof of Security Informally speaking the functionalcomponent must guarantee that given two documentsrsquo col-lection119863
1 1198632with equal size and |119863
1| = |119863
2| then the chal-
lenger flips a coin 119887 and encrypts 119863119887using LSE (the order of
the ciphertexts are randomized) The adversary could querya keyword and receive the ordered document collection buthe could not distinguishwhich one the challenger selected Bycombining the securitymodels defined in [8 27] we formallydefine the notion of non-adaptive chosen ranked keywordattack (CRKA) as follows
Definition 10 (semantic security against nonadaptive chosenranked keyword attack CRKA-secure) Let Σ be the func-tional component for ranked keyword query Let 119896 isin N be
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
12 The Scientific World Journal
Build(119889 119881119889)
(1) input a document 119889 and the mapping 119881119889= (V1 V
119903) for 119903 words
(2) let the entries of 119889 in OPE table be ((1199081 1198901 1199011) (119908
119900 119890119900 119901119900))
(3) for each 119894 isin [1 119900] build index 119860[V119901119894] = 119890119894
(4) padding the remaining |119889|2 + 1 minus 119900 entries with random strings(5) output a local functional structure 119871
119889= 119860
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures
119871 = (1198711 119871
119899) and the mappings of the queried keywords
119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
(2) for all 119899 functional structures compute 1199031= 1198711[V1] 119903
119899= 119871119899[V119899]
and select the top 119896 results 1198881 119888
119896corresponding to 119903
1 119903
119896
(3) output 1198621015840 = (1198881 119888
119896)
Algorithm 8 Ranked keyword query component
the security parameterWe consider the following probabilis-tic experiments whereA is an adversary andS is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a documentcollection119863 = (119889
1 119889
119899) (the size of each document
is fixed) and receives the encrypted documents 119862 =
(1198881 119888
119899) and functional structures 119871 = (119871
1 119871
119899)
with random order from the challengerA is allowedto query a keyword 119908 where 119908 isin 119889
1 119908 isin 119889
119899and
receives a mapping V from the challenger Finally Areturns a bit 119887 that is output by the experimentSimΣAS(119896) given the number of documents 119899 the
size of each document |119889| and the size of themapping|V| S generates 119862lowast 119871lowast and Vlowast and then sends theresults to A Finally A returns a bit 119887 that is outputby the experiment
We say that the functional component is CRKA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (6)
where the probabilities are over the coins of Gen
Theorem11 If LSE is interface-CPA-secure and the underlyingOPE is POPF-CCA secure then the ranked keyword querycomponent is CRKA-secure
Proof The simulator S generates 119862lowast 119871lowast and Vlowast as followsAs to 119862lowast S generates 119899 random strings 119888lowast
1 119888
lowast
119899of size |119889|
As to119871lowast let119898 = |119889|2+1S generates119898 random strings119881lowast =Vlowast1 Vlowast
119898with each has size |V|S generates an119898times 119899matrix
119864119898times119899
= (119890lowast
119894119895) where each element 119890lowast
119894119895is a random number
Then for each document S generates an index 119860lowast119895[Vlowast119894] = 119890lowast
119894119895
(1 le 119894 le 119898 1 le 119895 le 119899) As to Vlowast S randomly selects Vlowast =Vlowast119894isin 119881lowast
We claim that no polynomial-size distinguisher coulddistinguish (119862 119871 V) from (119862lowast 119871lowast Vlowast) Since the encryption
key 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119862lowast is indistinguishable from119862 It also guarantees that Vlowast is indistinguishable from V Uponreceiving V = V
119894isin 119881 or Vlowast = Vlowast
119894isin 119881lowast the adversaryA could
invoke Filter(119862 119871 V) or Filter(119862lowast 119871lowast Vlowast) to obtain (1199031=
1198711[V119894] = 119890
1 119903
119899= 119871119899[V119894] = 119890
119899) or (119903lowast
1= 119871lowast
1[Vlowast119894] =
119890lowast
1 119903
lowast
119899= 119871lowast
119899[Vlowast119894] = 119890
lowast
119899) POPF-CCA security guarantees
that the set (1199031 119903
119899) is indistinguishable from (119903
lowast
1 119903
lowast
119899)
that is the adversary is unable to distinguish the result ofOPE from the result of a random order-preserving functionTherefore 119871 is indistinguishable from 119871
lowast
53 Range Query Range query refers to a functionality thatthe server could test if the submitted keyword (integer) iswithin a rangeThe SQLquery format is ldquoWHERE lsquo119909 operator119910rsquordquo For example the user submits an integer 119908 and theserver could return the documents where the correspondingsearchable fields 119886 satisfying 119886 gt 119908
Although OPE could be applied here to support rangequery (similar to ranked keyword query) we propose anothersolution to demonstrate that how to apply the methods usedin asymmetric setting to LSE In [2] the authors introduceda construction based on bilinear map (asymmetric setting)which is not compatible with symmetric setting Howeverthe idea of transforming the comparison into a predicate (eg119875119886(119908) = 1 if 119886 gt 119908 where 119875 is a predicate) could be used and
the functional structure could record all possible predicatesand provide predicate test using a bloom filter
531 Preliminaries A bloom filter [31] is a space-efficientprobabilistic data structure that is used to test whether anelement 119904 is a member of a set 119878 = (119904
1 119904
119899) The set 119878 is
coded as an array 119861 of 119898 bits Initially all array bits are setto 0 The filter uses 119903 independent hash functions ℎ
1 ℎ
119903
where each ℎ119894 0 1
lowastrarr [1119898] for 1 le 119894 le 119903 For each
element 119904119896isin 119878 where 1 le 119896 le 119899 set the bits at positions
ℎ1(119904119896) ℎ
119903(119904119896) to 1 Note that a location could be set to 1
multiple times To determine if 119904 isin 119878 just check whether thepositions ℎ
1(119904) ℎ
119903(119904) in119861 are all 1 If any bit is 0 then 119904 notin 119878
Otherwise we say 119904 isin 119878with high probability (the probabilitycould be adjusted by parameters until acceptable)
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 13
Build(119889 119881119889)
(1) input a range document 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1 ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873)
and the mapping 119881119889= (V1 V
2119873+1) Here 119889 is the transformed form for the label 119886 = 119886
119894
(2) initialize a bloom filter 119861 with all bits set to 0(3) for (119896 = 1 119896 le 2119873 + 1 119896 + +) do(4) compute 119903 codewords 119910
1= ℎ1(119889 V
119896) 119910
119903= ℎ119903(119889 V
119896)
(5) insert the codewords 1199101 119910
119903into the bloom filter 119861
(6) end for(7) output a local functional structure 119871
119889= 119861
Filter(119862 119871 119881119879)
(1) input 119899 ciphertexts 119862 = (1198881 119888
119899) the corresponding functional structures 119871 = (119871
1 119871
119899)
the mappings of the queried keywords 119881119879= (1198811 119881
119899) = (V
1 V
119899) (single keyword)
where V119894(1 le 119894 le 119899) is the mapping of ldquogt 119908rdquo
(2) for (119894 = 1 119894 le 119899 119894 + +) do(3) compute 119903 codewords 119910
1= ℎ1(119889 V
119894) 119910
119903= ℎ119903(119889 V
119894)
(4) if all 119903 locations 1199101 119910
119903in bloom filter 119861
119894= 119871119894are 1 then add 119888
119894to 1198621015840
(5) end for(6) output 1198621015840
Algorithm 9 Range query component
In addition we write 119889 to denote the identifier of adocument 119889 such as the cryptographic hash of the pathnameand write 119909 gt (119910
1 119910
119899) to denote 119909 gt 119910
1 119909 gt 119910
119899for
simplicity
532 Construction For range query the document 119889 islabeled by some numbers Here we only consider a singlelabel 119886 Therefore the aim of range query is to enable theuser to submit a number 119908 to search for the documents thatsatisfying the SQL-like query such as ldquoWHERE ldquo119886 gt 119908rdquordquo Weconsider the five basic range query operators ldquogt ge lt le =rdquoThe other operators such as ldquoisinrdquo could be naturally derivedfrom the basic operators
We consider the whole range to be a sequence of 119873discrete numbers 119860 = (119886
1 119886
119873) where 119886
1lt 1198862lt sdot sdot sdot lt 119886
119873
Then we set five shared virtual documents 11988910158401= (gt 119886
1 gt
1198862 gt 119886
119873minus1) 11988910158402= (ge 119886
1 ge 1198862 ge 119886
119873) 11988910158403= (lt 119886
2 lt
1198863 lt 119886
119873) 11988910158404= (le 119886
1 le 1198862 le 119886
119873) and 1198891015840
5= (= 119886
1 =
1198862 = 119886
119873) for all userrsquos documents The virtual document
could be encrypted by LSErsquo core as a normal documentTherefore for any keyword such as ldquogt 119886
119894rdquowhere 1 le 119894 le 119873minus1
there always exists a mapping V119894
Based on the notion of virtual document a label 119886 isin 119860 fora userrsquos document satisfying 119886
119894minus1lt 119886 lt 119886
119894+1(or 119886 = 119886
119894) could
be represented as 2119873+1 keywords 119889 = (gt 1198861 ge 1198861 gt 119886
119894minus1
ge 119886119894minus1 ge 119886119894 = 119886119894 le 119886119894 lt 119886119894+1 le 119886119894+1 lt 119886
119873 le 119886119873) and
these keywords are stored in a bloom filter 119861 Suppose theuser queries a keyword ldquogt 119908rdquo where 119886
1lt 119908 lt 119886
119873 then the
query is transmitted to the bloom filter to test if ldquogt 119908rdquo isin 119861For example (we only consider the operator ldquogtrdquo here for
simplicity) suppose we have two documents 1198881 1198882labeled
5 10 respectivelyThen the transformed sets are 1198611= (gt 1 gt
2 gt 4) and 1198612= (gt 1 gt 2 gt 9) If the user sub-
mits gt7 then only 1198612matches the query which is the same
result as direct comparisons since 5 gt 7 and 10 gt 7 and
then 1198882is returned Similarly the query ldquogt3rdquo will match both
documents and 1198881 1198882are returned
Now we construct the secure version of the aforemen-tioned scheme Let 119860 = (119886
1 119886
119873) denote the domain of
the label and setup the bloom filter with 119903 independent hashfunctions ℎ
1 ℎ
119903 The identifier 119889 of a document is always
bound to the document 119889 or the ciphertext 119888 The concreteconstruction is presented in Algorithm 9
The size of the bloom filter could be dramatically reducedif the domain is bucketized [32] for example bucketizing thesubrange [10 20) as tag 10 and the subrange [20 30) as tag20 Then a query for ldquogt13rdquo could be mapped to the closestquery ldquogt10rdquo In other words the whole domain is dividedto multiple subranges that the queried range is transformedto the approximate range The optimization of the idea ofbucketizing the range is introduced in [33] In such way thenumber of the data stored in the bloom filter will becomesmaller However this will induce inaccuracy for the queryresult
533 Proof of Security For simplicity without loss of gener-ality we only consider the operator ldquogtrdquo here and the otheroperators are the same Informally speaking the functionalcomponent must guarantee that the adversary is unable toguess the queried range as well as the range in the ciphertextand the basic game works as follows Given two documents1198891 1198892that are labeled with two numbers 119886
1 1198862 respectively
the challenger flips a coin 119887 and encrypts (119889119887 119886119887) The
adversary is allowed to adaptively query 119901 keywords 119882 =
(1199081 119908
119901) where each 119908
119894isin 119882 that (119886
1 1198862) gt 119908119894 Note that
querying 119908119894that 119886
1gt 119908119894 1198862
gt 119908119894is not allowed since the
document is immediately distinguished (only the documentwith 119886
1gt 119908119894is matched and returned) We propose the
notion of chosen range attack (CRA) and formally define thesecurity model for semantic security as follows
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
14 The Scientific World Journal
Definition 12 (semantic security against chosen range attackCRA-secure) Let Σ be the functional component for rankedkeyword query Let 119896 isin N be the security parameter Weconsider the following probabilistic experiments whereA isan adversary and S is a simulator
RealΣA(119896) the challenger runs Gen(1119896) to generate
the key 119870 The adversary A generates a docu-ment 119889 and the labeled number 119886 and receives theencrypted document 119888 and the functional structure119871 A is allowed to adaptively query 119901 keywords119882 = (gt 119908
1 gt119908
119901) For each query ldquogt 119908
119894rdquo A re-
ceives a mapping V119894from the challenger Finally A
returns a bit 119887 that is output by the experiment
SimΣAS(119896) given the document size |119889| the cardi-
nality of the range119873 and the size of the mapping |V|S generates 119888lowast 119871lowast and 119881lowast = (Vlowast
1 Vlowast
119901) and then
sends the results to A Finally A returns a bit 119887 thatis output by the experiment
We say that the functional component is CRA-secure iffor all PPT adversariesA there exists a PPT simulatorS suchthat
1003816100381610038161003816Pr [Real
ΣA (119896) = 1] minus Pr [SimΣAS (119896) = 1]
1003816100381610038161003816
le negl (119896) (7)
where the probabilities are over the coins of Gen
Theorem 13 If LSE is interface-CPA-secure then the rankedkeyword query component is CRA-secure
Proof ThesimulatorS generates 119888lowast 119871lowast and119881lowast = (Vlowast1 Vlowast
119901)
as follows As to 119888lowastS generates a random string of size |119889| Asto 119871lowastS generates a random string 119889
lowast
and 2119873+1 distinct andrandom strings 119879 = (119905
1 119905
2119873+1) For each 119905
119894isin 119879 S com-
putes 119903 codewords 119910lowast1= ℎ1(119889
lowast
||119905119894) 119910
lowast
119903= ℎ119903(119889
lowast
||119905119894) and
inserts the codewords 119910lowast1 119910
lowast
119903into a bloom filter 119861lowast Let
119871lowast= 119861lowast As to119881lowast for each Vlowast
119894isin 119881lowastS randomly selects a dis-
tinct 119905119895isin 119879maps to Vlowast
119894 such that Vlowast
119894= 119905119895 Note that if Vlowast
119909= Vlowast119910
for some locations 1 le 119909 119910 le 119901 the mapping is the sameWe claim that no polynomial-size distinguisher could
distinguish (119888 119871 119881) from (119888lowast 119871lowast 119881lowast) Since the encryptionkey 119870 is kept secret from the adversary the interface-CPA-security directly guarantees that 119888lowast is indistinguishable from119888 It also guarantees that each V
119894isin 119881 is indistinguishable from
the random string Vlowast119894such that 119881 is indistinguishable from
119881lowast Therefore the locations (119910
1 119910
119903) of V119894in bloom filter
119861 is indistinguishable from the locations (119910lowast1 119910
lowast
119903) of Vlowast
119894
in bloom filter 119861lowast Therefore 119903 sdot (2119873 + 1) locations in 119861 areindistinguishable from 119903 sdot (2119873 + 1) locations in 119861lowast Thus 119871 isindistinguishable from 119871
lowast
54 Other Functionalities Due to space limitation we onlydiscuss the above two representative functional components
We briefly introduce how to realize some other functionalitiesbased on LSE as follows
Phrase Query It refers to a query with consecutive andordered multiple keywords For example searching withphrase ldquooperating systemrdquo requires that not only each key-word ldquooperatingrdquo and ldquosystemrdquo must exist in each returneddocument but also the order that ldquooperatingrdquo is followed byldquosystemrdquomust also be satisfied In [21] the authors introduceda solution based on Nextword Index [34] It allows theindex to record the keyword position for each documentand enables the user to query the consecutive keywordsbased on binary search over all positions However it has119874(log 119899) computation complexity for each document Basedon LSE this functionality could be realized using bloom filter(as demonstrated in range query scheme) which recordingbiword or more words based on Partial Phrase Indexes [35]As a result the scheme coude achieve approximately 119874(1)computation complexity (note that the index in the globallayer could reduce a large number of results for multiplekeywords)
Fuzzy Keyword Search It refers to a functionality that theuser submit a fragment of a keyword (or a keyword thatdoes not exist in all documents) and the server could searchfor the documents with all possible keywords that are closedto the fragment In [3] the authors introduced a wildcard-based construction that could handle fuzzy keyword searchwith arbitrary edit distance [36] By using the same methodthe functional structure could realize this functionality byrecording and indexing the fuzzy set of all mappings insteadof keywords
Similarity Query It refers to a functionality that the servercould return to the user some documents containing key-words which are similar to the queried keyword In both[18 19] the authors realized this functionality based onfuzzy setTherefore although different methods are used theconstruction of the fundamental component is similar to theconstruction of fuzzy keyword search scheme
Subset Query It refers to a functionality that the servercould test if the queried message is a subset of the values inthe searchable fields For example let 119878 be a set that con-tains multiple e-mail addresses If the user search for someencrypted mails containing Alicersquos e-mail 119886 then the servermust have the ability to test if 119886 isin 119878 without knowing anyother information A solution was also introduced in [2]Similar to the range query scheme this test could also beviewed as a predicate and therefore the solution is the same
55 Performance Analysis The algorithms of ranked key-word query component and range query component arecoded in C++ programming language and the server isa Pentium Dual-Core E5300 PC with 26GHz CPU Eachdocument is fixed to 10 KB with random words chosen froma dictionary and the query is also some random keywords(random numbers) For bloom filter used in range query
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
The Scientific World Journal 15
05 1 15 20
100
200
300
400
500
600
700
800
900
Number of documents (million)
Runn
ing
time (
mill
iseco
nds)
Ranked keyword queryRange query
Figure 4 Time costs of the filter algorithms (single server singlequery)
User
Documents
RankRange
Core
Rank Range
Range
Core
Searchable and functional structures
Tokens
Figure 5 Deploying functional components to multiple servers
component the number of hash functions is set to 8The timecosts of the filter algorithms are shown in Figure 4
Let 119899 denote the number of documents For ranked key-word query the main operations are retrieving the relevancescores from the secure table managed by indirect addressingtechnique (119874(119899) search complexity) and selecting the top-119896 scores (119874(119899) computation complexity) For range querythe main operation is computing 8 hash values (119874(119899) com-putation complexity) Note that the current document willbe passed if any position in bloom filter is 0 Therefore notall eight hash functions are executed all the time The figuredemonstrates that even for a single server the algorithms areboth efficient Note that since the functional components areloosely coupled with each other they could be deployed todifferent servers For example two core components (Core)two ranked keyword query components (Rank) and threerange query components (Range) could be executed as a data-flow boxes as shown in Figure 5 Each box could be deployedto any server The detailed methods are out of scope of thispaper and we will not discuss this further
6 Conclusions
Layered searchable encryption scheme provides a new wayof thinking the relationship among the searchable structurefunctionality and security It separates the functionalitiesapart from the core searchable structure without loss ofsecurity Therefore the loose coupling property providescompatibility for symmetric and asymmetric settings andit also provides flexibility for adding or deleting variousfunctionalities Furthermore following the popular boxesand arrows paradigm the loose coupling property makes thescheme more suitable for distributed and parallel computingenvironment
Conflict of Interests
The authors declare that there is no conflict of interestsregarding the publication of this paper
Acknowledgment
This work is supported by the Science and TechnologyDepartment of Sichuan Province (Grant no 2012GZ0088 andno 2012FZ0064)
References
[1] H Takabi J B D Joshi and G-J Ahn ldquoSecurity and privacychallenges in cloud computing environmentsrdquo IEEE Securityand Privacy vol 8 no 6 pp 24ndash31 2010
[2] D Boneh and B Waters ldquoConjunctive subset and range que-ries on encrypted datardquo inTheory of Cryptography pp 535ndash554Springer 2007
[3] J Li Q Wang C Wang N Cao K Ren and W Lou ldquoFuzzykeyword search over encrypted data in cloud computingrdquo inProceeding of the Fuzzy Keyword Search over Encrypted Data inCloud Computing (IEEE INFOCOM rsquo10) Institute of Electricaland Electronics Engineers March 2010
[4] A Swaminathan Y Mao G-M Su et al ldquoConfidentiality-pre-serving rank-ordered searchrdquo in Proceedings of the ACMWork-shop on Storage Security and Survivability (StorageSS rsquo07) pp7ndash12 Association for Computing Machinery October 2007
[5] D J Abadi D Carney U Cetintemel et al ldquoAurora a newmodel and architecture for data stream managementrdquo VLDBJournal vol 12 no 2 pp 120ndash139 2003
[6] D X Song D Wagner and A Perrig ldquoPractical techniquesfor searches on encrypted datardquo in Proceedings of the IEEESymposium on Security and Privacy pp 44ndash55 May 2000
[7] E J Goh ldquoSecure indexesrdquo Tech Rep IACR ePrint Cryptogra-phy Archive 2003
[8] R Curtmola J Garay S Kamara and R Ostrovsky ldquoSearch-able symmetric encryption improved definitions and efficientconstructionsrdquo in Proceedings of the 13th ACM Conference onComputer and Communications Security (CCS rsquo06) pp 79ndash88November 2006
[9] MChase and S Kamara ldquoStructured encryption and controlleddisclosurerdquo in Proceedings of the 16th International Conferenceon the Theory and Application of Cryptology and InformationSecurity (ASIACRYPT rsquo10) pp 577ndash594 Springer 2010
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
16 The Scientific World Journal
[10] S Kamara C Papamanthou and T Roeder ldquoCs2 a searchablecryptographic cloud storage systemrdquo Tech Rep MSR-TR-2011-58 Microsoft Research
[11] D BonehGDCrescenzo ROstrovsky andG Persiano ldquoPub-lic key encryption with keyword searchrdquo in Proceedings of theInternational Conference on theTheory andApplications of Cryp-tographic Techniques (EUROCRYPT rsquo04) pp 506ndash522 Springer2004
[12] M Abdalla M Bellare D Catalano et al ldquoSearchable encryp-tion revisited consistency properties relation to anonymousIBE and extensionsrdquo Journal of Cryptology vol 21 no 3 pp350ndash391 2008
[13] N Cao C Wang M Li K Ren and W Lou ldquoPrivacy-preserv-ing multi-keyword ranked search over encrypted cloud datardquoinProceedings of the IEEE International Conference onComputerCommunications (IEEE INFOCOM rsquo11) pp 829ndash837 Institute ofElectrical and Electronics Engineers 2011
[14] C Wang N Cao J Li K Ren and W Lou ldquoSecure rankedkeyword search over encrypted cloud datardquo inProceedings of the30th IEEE International Conference on Distributed ComputingSystems (ICDCS rsquo10) pp 253ndash262 June 2010
[15] P Golle J Staddon andBWaters ldquoSecure conjunctive keywordsearch over encrypted datardquo in Proceedings of the 2nd Interna-tional Conference (ACNS rsquo04) pp 31ndash45 Springer 2004
[16] C Bosch R Brinkman P Hartel and W Jonker ldquoConjunctivewildcard search over encrypted datardquo in Proceedings of the8th VLDB Workshop on Secure Data Management pp 114ndash127Springer 2011
[17] J Bringer and H Chabanne ldquoEmbedding edit distance to allowprivate keyword search in cloud computingrdquo in Proceedingsof the 8th FTRA International Conference on Secure and TrustComputing Data Management and Application pp 105ndash113Springer 2011
[18] W Cong R Kui Y Shucheng and K M R Urs ldquoAchievingusable and privacy-assured similarity search over outsourcedcloud datardquo in Proceedings of the IEEE Conference on ComputerCommunications (INFOCOM rsquo12) pp 451ndash459 2012
[19] M Kuzu M S Islam andM Kantarcioglu ldquoEfficient similaritysearch over encrypted datardquo in Proceedings of the IEEE 28thInternational Conference on Data Engineering (ICDE rsquo12) pp1156ndash1167 2012
[20] S Zittrower and C C Zou ldquoEncrypted phrase searching in thecloudrdquo in Proceedings of the IEEE Conference and ExhibitionGlobal Telecommunications Conference (GLOBECOM rsquo12) pp764ndash770 2012
[21] Y Tang D Gu N Ding and H Lu ldquoPhrase search overencrypted data with symmetric encryption schemerdquo in Pro-ceedings of the 32nd International Conference on DistributedComputing Systems Workshops (ICDCSW rsquo12) pp 471ndash4802012
[22] D Boneh E Kushilevitz R Ostrovsky and W E Skeith IIIldquoPublic key encryption that allows pir queriesrdquo in Proceeding ofthe 27th Annual International Cryptology Conference (CRYPTOrsquo07) pp 50ndash67 Springer 2007
[23] E Shi J Bethencourt T-H H Chan D Song and A PerrigldquoMulti-dimensional range query over encrypted datardquo in Pro-ceedings of the IEEE Symposium on Security and Privacy (SP rsquo07)pp 350ndash364 May 2007
[24] R RivestThe Md5 Message-Digest Algorithm Internet RequestFor Comments 1992
[25] C Wang N Cao K Ren and W Lou ldquoEnabling secure andefficient ranked keyword search over outsourced cloud datardquo
IEEE Transactions on Parallel and Distributed Systems vol 23no 8 pp 1467ndash1479 2012
[26] S Sedghi P Van Liesdonk S Nikova P Hartel and W JonkerldquoSearching keywords with wildcards on encrypted datardquo inSecurity and Cryptography For Networks pp 138ndash153 Springer2010
[27] A Boldyreva N Chenette Y Lee and A Oneill ldquoOrder-preserving symmetric encryptionrdquo in Advances in Cryptology(EUROCRYPT rsquo09) pp 224ndash241 Springer 2009
[28] R Agrawal J Kiernan R Srikant and Y Xu ldquoOrder preservingencryption for numeric datardquo in Proceedings of the ACMSIGMOD International Conference on Management of Data(SIGMOD rsquo04) pp 563ndash574 June 2004
[29] A Boldyreva N Chenette and A OrsquoNeill ldquoOrder-preservingencryption revisited improved security analysis and alterna-tive solutionsrdquo in Proceedings of the Advances in Cryptology(CRYPTO rsquo11) pp 578ndash595 Springer 2011
[30] M L Fredman E Szemeredi and J Komlos ldquoStoring a sparsetable with o(1) worst case access timerdquo Journal of the ACM vol31 no 3 pp 538ndash544 1984
[31] B H Bloom ldquoSpacetime trade-offs in hash coding withallowable errorsrdquoCommunications of the ACM vol 13 no 7 pp422ndash426 1970
[32] H Hacigumus B Iyer C Li and S Mehrotra ldquoExecuting SQLover encrypted data in the database-service-provider modelrdquoin Proceedings of the ACM SIGMOD International Conferenceon Managment of Data (ACM SIGMOD rsquo02) pp 216ndash227 June2002
[33] B Hore S Mehrotra and G Tsudik ldquoA privacy-preserving in-dex for range queriesrdquo in Proceedings of the Thirtieth interna-tional conference on Very large data bases pp 720ndash731 VLDBEndowment 2004
[34] H E Williams J Zobel and P Anderson ldquoWhatrsquos next indexstructures for efficient phrase queryingrdquo in Australasian Data-base Conference pp 141ndash152 1999
[35] C Gutwin G Paynter I Witten C Nevill-Manning and EFrank ldquoImproving browsing in digital libraries with keyphraseindexesrdquo Decision Support Systems vol 27 no 1 pp 81ndash1041999
[36] V Levenstein ldquoBinary codes capable of correcting spuriousinsertions and deletions of onesrdquo Problems of InformationTransmission vol 1 no 1 pp 8ndash17 1965
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Submit your manuscripts athttpwwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Journal of
Computer Networks and Communications
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
ArtificialNeural Systems
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014