UT Dallas Erik Jonsson School of Engineering & Computer Science
FEARLESS engineering
SGX IR: Secure Information Retrieval with Trusted Processors
Fahad Shaon, Murat Kantarcioglu
The University of Texas at Dallas
Problem - Secure Cloud-Based Information Retrieval
[Figure labels: Encrypted Data, Encrypted Search Query, Encrypted Result]
Build a secure information retrieval system
- The user stores encrypted files on a cloud server
- The user performs selective retrieval over those files
Building Block - Intel SGX
- We use Intel SGX (Software Guard Extensions)
- SGX is a new Intel instruction set
- It allows us to create a secure compartment inside the processor, called an enclave
- Privileged software, such as the OS and the hypervisor, cannot directly observe data and computation inside the enclave
Threat Model - Intel SGX
[Figure labels: Server containing Disk, Memory, and CPU with Enclave; Encrypted Data & Code; Encrypted Result]
The adversary can control the hypervisor, OS, memory, and disk of the server
State of the Art
- Relevant search or indexing systems that use SGX: HardIDX (Fuhry et al., 2017), Rearguard (Sun et al., 2018), Oblix (Mishra et al., 2018), Hardware-supported ORAM (Hoang et al., 2019)
- These works mainly focus on building efficient data structures for searching using SGX
- They assume the inverted index is already built and/or build the index on the client
- They do not look into ranked retrieval
Challenges - Access Pattern Leakage
Challenge: Access Pattern Leakage
- The adversary can observe memory accesses in SGX
- Memory access patterns reveal information about the encrypted data (Islam, Kuzu, and Kantarcioglu, 2012; Naveed, Kamara, and Wright, 2015)
Solution
- Data obliviousness: we build custom data-oblivious indexing algorithms
Data Obliviousness - Oblivious Select
- Data obliviousness: the program executes the same path for all inputs of the same size
- Example: x == y ? a : b
- The xor sets the zero flag exactly when x equals y; cmovz then copies a over b only if that flag is set, so no branch is taken
obliviousSelect(a, b, x, y):
...
mov %[x],%%eax
mov %[y],%%ebx
xor %%eax, %%ebx
...
mov %[a],%%ecx
mov %[b],%%edx
cmovz %%ecx, %%edx
...
mov %%edx, %[out]
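The same selection can be sketched in portable C. This is not the inline assembly from the slide: it swaps the cmovz instruction for bit masking, and the name oblivious_select is chosen here for illustration. A compiler is free to rewrite such code, which is presumably why the slide pins the instruction sequence in assembly, so treat this as a sketch of the idea only.

```c
#include <stdint.h>

/* Returns a when x == y, otherwise b, without a data-dependent branch.
 * Mask-based variant (an assumption of this sketch, not the slide's cmovz code). */
uint32_t oblivious_select(uint32_t a, uint32_t b, uint32_t x, uint32_t y) {
    uint32_t diff = x ^ y;                          /* 0 exactly when x == y */
    uint32_t nonzero = (diff | (0u - diff)) >> 31;  /* 1 when diff != 0, else 0 */
    uint32_t mask = nonzero - 1u;                   /* all ones when x == y, else 0 */
    return (a & mask) | (b & ~mask);                /* pick a or b by mask */
}
```

Both operands are always read and combined, so the instruction trace is the same whichever value is returned.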
Challenge - Memory Constraint
Challenge: Memory Constraint
- SGX (v1) offers only 90 MB of enclave memory
Solution
- Blocking: break large data into small blocks
- We utilize SGX-BigMatrix (Shaon et al., 2017) primitives
- BigMatrix handles the complexity of data blocking
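A minimal sketch of the blocking idea follows. It is not the SGX-BigMatrix API: blocked_sum, BLOCK_ELEMS, and the memcpy standing in for "load and decrypt one block" are all assumptions made for illustration. The point is only that the working buffer stays at a fixed size however large the input is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK_ELEMS = 4 };  /* tiny on purpose; a real block would be sized to fit the enclave */

/* Sums n values while holding at most BLOCK_ELEMS of them in the working buffer. */
int64_t blocked_sum(const int32_t *outside, size_t n) {
    assert(outside != NULL || n == 0);
    int32_t block[BLOCK_ELEMS];   /* stands in for memory inside the enclave */
    int64_t total = 0;
    for (size_t off = 0; off < n; off += BLOCK_ELEMS) {
        size_t len = (n - off < BLOCK_ELEMS) ? n - off : BLOCK_ELEMS;
        /* stand-in for loading and decrypting one block (hypothetical step) */
        memcpy(block, outside + off, len * sizeof block[0]);
        for (size_t i = 0; i < len; i++) {
            total += block[i];
        }
    }
    return total;
}
```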
Objectives - Summary
[Figure labels: Pre-Processing, Encrypted Intermediate Data, Encrypted Search Query, Encrypted Result, Final Processing]
- Very low client-side processing
- Build the index securely in the cloud using SGX
- Build data-oblivious algorithms
- Support ranked retrieval
SGX IR - Document and Query Types
- Text data
  - Ranked document retrieval using TF-IDF (Term Frequency and Inverse Document Frequency)
- Image data
  - Face recognition using Eigenface
Text Pre-Processing - Client
[Figure: pipeline Tokenization and Stemming -> Token ID Generation -> Encrypted BigMatrix Generation]
Example input: "Cryptography is the practice and study of techniques for secure communication ..."
After tokenization and stemming: cryptographi practic studi techniqu secur commun
Token IDs: cryptographi = 1, practic = 2, studi = 3, techniqu = 4, secur = 5
Output matrix columns: tok-id, doc-id, freq
- We tokenize and stem the input text files
- We build a matrix I with token id, document id, and frequency columns
- Finally, we encrypt I and upload it
- A single round of read and write is required
Text Indexing - Server
[Figure: Indexing takes the input matrix (tok-id, doc-id, freq) and produces a totals matrix (tok-id, count, sum) and a blocked matrix (tok-id with block number, doc-id, freq); # marks dummy entries]
- Given input I, we output two matrices
  - U′, containing the total frequencies of the tokens, for IDF calculation
  - T, containing equal-length blocks of token-to-document frequency mappings, for TF calculation
Text Indexing - IDF - Server
[Figure: Sort -> Count & Sum -> Sort and Adjust. The input (tok-id, doc-id, freq) is sorted by tok-id, scanned into a (tok-id, count, sum) matrix with dummy rows (#), then sorted so the real totals come first]
- I′ ← obliviously sort I on the token id column
- We generate U to keep the count and sum of frequencies
  - c ← I′[i].token_id ≠ I′[i−1].token_id
  - U[i].sum ← obliviousSelect(sum, #, 1, c)
  - sum ← obliviousSelect(0, sum, 1, c) + I′[i].frequency
- Finally, we sort this matrix so that the dummy entries go to the bottom
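The count-and-sum scan can be sketched as below. Several details are assumptions of this sketch rather than facts from the slides: DUMMY (the largest 32-bit value) stands in for the # marker, the last token's totals are written to one extra output row, the struct and function names are invented here, and oselect is a mask-based stand-in for obliviousSelect with the same argument order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DUMMY UINT32_MAX  /* stands in for '#'; sorts after every real token id */

typedef struct { uint32_t tok, doc, freq; } Row;      /* one row of I' */
typedef struct { uint32_t tok, count, sum; } Total;   /* one row of U  */

/* x == y ? a : b, computed with masks instead of a branch. */
static uint32_t oselect(uint32_t a, uint32_t b, uint32_t x, uint32_t y) {
    uint32_t diff = x ^ y;
    uint32_t mask = ((diff | (0u - diff)) >> 31) - 1u;  /* all ones when x == y */
    return (a & mask) | (b & ~mask);
}

/* rows must be sorted by tok; out must have room for n + 1 entries. */
void count_and_sum(const Row *rows, size_t n, Total *out) {
    assert(n > 0);
    uint32_t prev = rows[0].tok, count = 0, sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t c = oselect(0, 1, rows[i].tok, prev);  /* 1 at a token boundary */
        /* at a boundary emit the finished token's totals, otherwise a dummy row */
        out[i].tok   = oselect(prev,  DUMMY, 1, c);
        out[i].count = oselect(count, DUMMY, 1, c);
        out[i].sum   = oselect(sum,   DUMMY, 1, c);
        /* restart the running totals at a boundary, then add the current row */
        count = oselect(0, count, 1, c) + 1;
        sum   = oselect(0, sum,   1, c) + rows[i].freq;
        prev  = rows[i].tok;
    }
    out[n].tok = prev;  /* totals of the last token */
    out[n].count = count;
    out[n].sum = sum;
}
```

Every input row produces exactly one output row, so the number of writes does not depend on how the tokens are distributed. Sorting the output by tok in ascending order then moves the DUMMY rows to the bottom.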
Text Indexing - TF - Block Size Optimization
- We can read the document frequency of tokens from matrix I′
- This would reveal the number of documents containing a specific token
- So, we split I′ into equal-length blocks
- We optimize the block size b from the count column of U′ using the technique outlined in (Shaon and Kantarcioglu, 2016)
  - We assume the frequencies follow a Pareto distribution
  - We mathematically find the value that minimizes the padding
Text Indexing - TF - Padding Generation
- We regenerate the token ids with the bucket number function σ
[Figure: Regenerate TokenId. Rows (tok-id, doc-id, freq) are re-keyed by token id and bucket number, e.g. 1,0; 1,1; 2,0; 2,1; 3,0]
- We generate padding
[Figure: Generate Padding Rows. The totals matrix (tok-id, count, sum) is used to emit dummy rows (#) for partially filled blocks, e.g. 1,1; 2,1; 3,1]
- Finally, we merge and sort X and J to get the output matrix T
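The arithmetic behind re-keying and padding can be sketched as follows. The slides do not define σ, so this sketch assumes the bucket number of a row is its 0-based rank within its token divided by the block size b. The names block_key, padding_rows, and total_padding are invented here, and computing the rank obliviously is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (token id, bucket) pair that replaces the plain token id of a row. */
typedef struct { uint32_t tok; uint32_t bucket; } BlockKey;

/* Key for the rank-th row (0-based) of token tok when each block holds b rows. */
BlockKey block_key(uint32_t tok, uint32_t rank, uint32_t b) {
    assert(b > 0);
    BlockKey key = { tok, rank / b };
    return key;
}

/* Dummy rows needed to fill the last block of a token that has count rows. */
uint32_t padding_rows(uint32_t count, uint32_t b) {
    assert(b > 0);
    return (b - count % b) % b;
}

/* Total dummy rows over all tokens for a candidate block size b. */
uint64_t total_padding(const uint32_t *counts, size_t n, uint32_t b) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += padding_rows(counts[i], b);
    }
    return total;
}
```

Evaluating total_padding for several candidate values of b is a brute-force way to see the trade-off that the block size optimization resolves analytically.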
TF-IDF Calculation
- On T we run term frequency functions, such as log normalization: $1 + \log(tf_{t,d})$
- On U′ we run document frequency functions, such as IDF: $\log \frac{N}{df_t}$
- To compute the query result, we use T for TF and U′ for IDF
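As a worked instance of the two formulas, assume base-10 logarithms (the slides do not state the base), the usual TF-IDF product as the score, and made-up values $tf_{t,d} = 10$, $N = 1000$, $df_t = 10$:

```latex
\begin{align*}
1 + \log_{10}(tf_{t,d}) &= 1 + \log_{10}(10) = 2 \\
\log_{10}\frac{N}{df_t} &= \log_{10}\frac{1000}{10} = 2 \\
\mathrm{score}(t,d) &= 2 \times 2 = 4
\end{align*}
```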
Bitonic Sorting of Arbitrary Input Size
- Sorting is one of the most frequently used operations
- We use the arbitrary-length version of bitonic sort (Lang, 1998)
- However, the existing definition is recursive
- This is not suitable for memory-constrained environments like SGX
- So, we propose a non-recursive algorithm that does not use a stack
Bitonic Sort Non-Recursive Algorithm - Concept
Concept
- We can express a number as $N = 2^{x_m} + \dots + 2^{x_3} + 2^{x_2} + 2^{x_1}$
- A merge network can sort a descending block and an ascending block into one ascending block
- We sort, then merge, from the smallest to the biggest block
[Figure: Sort and Merge steps]
Bitonic Sort Non-Recursive Algorithm
for d = 0 to ⌈log2(N)⌉ do
    if ((N >> d) & 1) ≠ 0 then
        start ← (−1 << (d + 1)) & N
        size ← 1 << d
        dir ← (size & N & −N) ≠ 0
        bitonicSort2K(start, size, dir)
        if !dir then
            bitonicMerge(start, N − start, 1)
        end if
    end if
end for
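The loop above can be turned into a runnable C sketch. The slides give no bodies for bitonicSort2K and bitonicMerge, so both are assumptions here: the first is the textbook iterative bitonic sort for power-of-two sizes, and the second runs the merge network of the next power of two and skips every comparator whose partner index falls outside the range. The compare_exchange helper uses ordinary conditionals for readability; an oblivious build would need a constant-time swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Order a[i] and a[j]: ascending puts the smaller value at i, otherwise the larger. */
static void compare_exchange(int32_t *a, size_t i, size_t j, int ascending) {
    int32_t x = a[i], y = a[j];
    int swap = ascending ? (x > y) : (x < y);
    a[i] = swap ? y : x;
    a[j] = swap ? x : y;
}

/* Iterative bitonic sort of a[start .. start+size), where size is a power of two. */
static void bitonic_sort_2k(int32_t *a, size_t start, size_t size, int ascending) {
    for (size_t k = 2; k <= size; k <<= 1) {
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            for (size_t i = 0; i < size; i++) {
                size_t l = i ^ j;
                if (l > i) {
                    int up = ((i & k) == 0) ? ascending : !ascending;
                    compare_exchange(a, start + i, start + l, up);
                }
            }
        }
    }
}

/* Merge a[start .. start+n): a block sorted against `ascending`, followed by a
 * shorter tail sorted with it. n may be any length; comparators that would
 * reach past n are skipped. */
static void bitonic_merge(int32_t *a, size_t start, size_t n, int ascending) {
    size_t p = 1;
    while (p < n) p <<= 1;  /* smallest power of two >= n */
    for (size_t j = p >> 1; j > 0; j >>= 1) {
        for (size_t i = 0; i + j < n; i++) {
            if ((i & j) == 0) compare_exchange(a, start + i, start + i + j, ascending);
        }
    }
}

/* Non-recursive bitonic sort of n elements (ascending), following the loop above. */
void bitonic_sort_any(int32_t *a, size_t n) {
    assert(a != NULL || n == 0);
    for (unsigned d = 0; d < sizeof(size_t) * 8 && (n >> d) != 0; d++) {
        if (((n >> d) & 1) != 0) {
            size_t size = (size_t)1 << d;
            size_t start = n & ~((size << 1) - 1);  /* clear bits 0..d of n */
            int dir = (size & n & (~n + 1)) != 0;   /* 1 only for the lowest set bit */
            bitonic_sort_2k(a, start, size, dir);
            if (!dir) bitonic_merge(a, start, n - start, 1);
        }
    }
}
```

The sequence of compared index pairs depends only on n, never on the values, which is the property that makes this kind of sort attractive for oblivious processing.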
Face Recognition Indexing
- We adopt Eigenface
- Pre-processing and face matching are simple matrix operations
- The core problem to solve obliviously is eigenvector calculation
- We adopt the Jacobi method of eigenvector calculation
Eigenvector Calculation - Jacobi Method
[Figure: steps Oblivious Value Extract, Oblivious Column Extract, Rotate, Oblivious Column Assign, Oblivious Row Assign, Calculate & Reassign]
We find the max off-diagonal element at A_{k,l}, then rotate columns k and l. We repeat until A becomes diagonal. The diagonal values are then the eigenvalues.
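The slides name an Oblivious Value Extract step without giving details. One plausible reading, offered as an assumption and not as the authors' implementation, is to read every matrix element and keep only the bits of the wanted one, so the memory trace is the same for every (k, l). The function name and the row-major layout are choices made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns A[k][l] from a row-major n x n matrix while reading every element once. */
double oblivious_value_extract(const double *A, size_t n, size_t k, size_t l) {
    assert(k < n && l < n);
    uint64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            uint64_t bits;
            memcpy(&bits, &A[i * n + j], sizeof bits);       /* raw bits of the element */
            uint64_t hit = (uint64_t)((i == k) & (j == l));  /* 1 only at the target cell */
            acc |= bits & ((uint64_t)0 - hit);               /* keep the bits only on a hit */
        }
    }
    double out;
    memcpy(&out, &acc, sizeof out);
    return out;
}
```

The cost is a full pass over the matrix per extracted value, which is the usual price of hiding which cell was accessed.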
Experimental Evaluations
We implemented a prototype using Intel SGX SDK 2.6 for Linux.
Setup
- Processor: Intel Xeon E3-1270
- Memory: 64 GB
- OS: Ubuntu 18.04
- SGX SDK: version 2.6 for Linux
Experimental Results - Bitonic Sort and Text Indexing
[Figure: Bitonic sort. Sorting time (min) vs. number of rows; series: Sorting time, Sorting time next 2^k]
[Figure: Client end processing cost on Enron Dataset. Index preparation (sec) vs. data set size (MB); series: Encryption only, Incremental id, MD5 hash, SHA256 hash, Murmur hash]
Experimental Results - Text Indexing
[Figure: SGX index processing on Enron Dataset. Indexing time (min) vs. data set size (MB); series: Oblivious index building, Non-oblivious index building]
[Figure: NDCG results compared to Apache Lucene on Enron Dataset. NDCG score vs. data set size (MB)]
Experimental Result - Eigenvector Calculation
[Figure: Pre-processing overhead. Scaling and project time (sec) vs. number of faces]
[Figure: Eigenvector calculation time. Calculation time (hours) vs. matrix elements; series: Oblivious Eigenface calculation, Non-oblivious Eigenface calculation]
Thank you
Questions / Comments
- Fahad Shaon - [email protected]
- Murat Kantarcioglu - [email protected]
Acknowledgments: This research is supported in part by NIH award 1R01HG006844, NSF awards CICI-1547324, IIS-1633331, CNS-1837627, OAC-1828467, and ARO award W911NF-17-1-0356.
References
Fuhry, Benny et al. (2017). “HardIDX: Practical and secure index with SGX”. In: IFIP Annual Conference on Data and Applications Security and Privacy. Springer, pp. 386–408.
Hoang, Thang et al. (2019). “Hardware-supported ORAM in effect: Practical oblivious search and update on very large dataset”. In: Proceedings on Privacy Enhancing Technologies 2019.1, pp. 172–191.
Islam, Mohammad Saiful, Mehmet Kuzu, and Murat Kantarcioglu (2012). “Access Pattern disclosure on Searchable Encryption: Ramification, Attack and Mitigation”. In: NDSS. Vol. 20, p. 12.
Lang, H.W. (1998). Bitonic sorting network for n not a power of 2. Accessed 05/31/2020. URL: http://www.iti.fh-flensburg.de/lang/algorithmen/sortieren/bitonic/oddn.htm.
Mishra, Pratyush et al. (2018). “Oblix: An efficient oblivious search index”. In: 2018 IEEE Symposium on Security and Privacy (SP). IEEE, pp. 279–296.
Naveed, Muhammad, Seny Kamara, and Charles V Wright (2015). “Inference attacks on property-preserving encrypted databases”. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, pp. 644–655.
Shaon, Fahad and Murat Kantarcioglu (2016). “A practical framework for executing complex queries over encrypted multimedia data”. In: IFIP Annual Conference on Data and Applications Security and Privacy. Springer, pp. 179–195.
Shaon, Fahad et al. (2017). “SGX-BigMatrix: A Practical Encrypted Data Analytic Framework With Trusted Processors”. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. CCS '17. Dallas, Texas, USA: Association for Computing Machinery, pp. 1211–1228. ISBN: 9781450349468. DOI: 10.1145/3133956.3134095.
Sun, Wenhai et al. (2018). “REARGUARD: Secure keyword search using trusted hardware”. In: IEEE INFORM.