loraine-hines
Secure Database System
Introduction
• Demand for secure database systems
  – Cloud computing
    • Database-as-a-Service
• Current cloud database systems
  – Amazon RDS
  – Microsoft SQL Azure
• Advantages of cloud database systems
  – Economies of scale
  – Focus on own business
Security challenge
• Security concern
  – Data is handed to third-party service providers
  – The servers may be compromised
• To enforce security, encryption is necessary
• Challenge
  – How to compute queries on encrypted data
Single method approach
• A standalone encryption system is developed to address a particular query pattern
• Examples:
  – Order-preserving encryption scheme (OPES) supports comparison: E(x) > E(y) iff x > y
  – RSA supports multiplication: E(x)E(y) = E(xy)
Difficulty in building a generic query system
• Each method (e.g., OPES, RSA) has its own encryption mechanism, and values encrypted by different methods are not interoperable
  – There is no trivial method to translate a value encrypted by OPES into the corresponding value encrypted by RSA
  – The following query cannot be supported:
    • SELECT * WHERE price * quantity > 1000
    • price * quantity is supported by RSA; the comparison > 1000 is supported by OPES
    • The whole query cannot be done by OPES, RSA, or a composition of OPES and RSA
Building database system based on single method approach
• Example systems: NetDB2 (with encryption), CryptDB
• Limitations
  – Limited support for complex queries
    • A new encryption method must be developed to support each query pattern
  – Lowered security guarantee in order to support more query patterns at the same time with one method
Our approach
• How to develop a query system that supports generic querying?
• Relational algebra
  – A few primitives are enough to build any query
• Observation
  – Data interchangeability: the result of one primitive operator can be used as input by other primitive operators
To enforce data interchangeability
• There is only one encrypted data format
• All operations operate on this format
• A similar secure mechanism with data interchangeability
  – ShareMind
  – Uses secure multiparty computation (SMC) with secret sharing
    • Each data value is split into shares, which are distributed to multiple parties. A distributed algorithm is executed among all parties and gives the result in shared form.
Illustration of SMC + secret sharing
Shares before the computation
Party 1: x: 3, y: 8
Party 2: x: 2, y: 4
Party 3: x: 5, y: -7
Plain values: x = 10, y = 5
Note: 10 = 3 + 2 + 5 and 5 = 8 + 4 + (-7)
After some communications
Party 1: z: 13
Party 2: z: 6
Party 3: z: 6
Plain values: $z = (x - y)^2 = 25$ (= 13 + 6 + 6)
SMC algorithms
Secret sharing
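The numbers in the illustration can be checked with a few lines of Python. This is only a sketch of additive secret sharing: it reconstructs values by summing shares and shows that subtraction can be done locally by each party. The communication protocol that produces the shares of z (the squaring step) is not reproduced here, and the function names are illustrative.

```python
# Additive secret sharing, using the shares from the illustration.
x_shares = [3, 2, 5]    # shares of x held by parties 1..3
y_shares = [8, 4, -7]   # shares of y held by parties 1..3
z_shares = [13, 6, 6]   # shares of z after the (omitted) protocol

def reconstruct(shares):
    """Recover the plain value by adding all shares."""
    return sum(shares)

def local_subtract(a_shares, b_shares):
    """Each party subtracts its own two shares; no communication is needed."""
    return [a - b for a, b in zip(a_shares, b_shares)]

x = reconstruct(x_shares)
y = reconstruct(y_shares)
d_shares = local_subtract(x_shares, y_shares)  # shares of x - y
z = reconstruct(z_shares)
```

Multiplication of two shared values, unlike addition and subtraction, cannot be done share by share, which is why the illustration mentions communication between the parties.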
Generic operations in SMC
• Basic:
  – Addition
  – Multiplication
• Any operation that can be expressed as a circuit can be computed
  – Addition on binary data can be regarded as an XOR gate
  – Multiplication on binary data can be regarded as an AND gate
  – The two gates together form a universal gate set which can express any circuit
Using the idea of SMC + secret sharing on encrypted database?
• Multiple parties vs. client-server
• Same storage size (= original database size) for all parties
  – Secure share generation reduces the storage cost at user
[Figure: data owner / user and cloud server]
Development of new operators
• Why?
• Our goal:
  – To develop (i) a secure generator with (ii) its corresponding operators
SMC | Secure database system
Operations are done between multiple parties | Operations are done between user and service provider (SP)
No privileged party | User is privileged: can observe any plain data and should always have a low cost in any computation
Shares in secret sharing are materialized in each party | Shares at user are not materialized but can be generated
Attack model
• Security is defined w.r.t. an attack model
• The attack model in our case: chosen plaintext attack (CPA)
  – Formally: an attacker can observe the ciphertext of any chosen plaintext, but it is still computationally hard to recover the key
• Some remarks on CPA
  – CPA is also used in RSA
  – OPES cannot guard against CPA
System Scope
• First address integer-type data
• Focus on operations between columns in the same table
  – SELECT (PRICE * QUANTITY)
• Also support aggregate operations and limited join operations
DESCRIPTION OF OUR SOLUTION
Encryption procedure
• Secret sharing
  – Multiplicative secret sharing
  – Given a plain value v, the share at user is $v_k$ and the share at SP is $v'$
    • $v = v_k v' \bmod n$ (n is a parameter of the share generating function)
• The share at user is called the item key of the value v
  – The item key of each cell in the table is different
  – Each item key can be identified by the row ID and column ID
Encryption illustration
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Item keys at user
Row ID A B
1 8 9
2 16 11
Encrypted values at SP
Row ID A B
1 9 12
2 9 16
Number of item keys = number of values in the table
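The tables above can be reproduced with a small sketch of multiplicative sharing. It assumes Python 3.8 or later (for `pow(k, -1, n)`, the modular inverse) and uses the toy modulus n = 35 from the illustration; the function names are illustrative and not part of the original system.

```python
N = 35  # toy modulus from the illustration

def encrypt(value, item_key, n=N):
    """Return the SP share v' such that value = item_key * v' (mod n)."""
    return (pow(item_key, -1, n) * value) % n

def decrypt(encrypted, item_key, n=N):
    """Recombine the user share (item key) with the SP share."""
    return (item_key * encrypted) % n

# Cells are addressed by (row ID, column name).
plain = {(1, "A"): 2, (1, "B"): 3, (2, "A"): 4, (2, "B"): 1}
item_keys = {(1, "A"): 8, (1, "B"): 9, (2, "A"): 16, (2, "B"): 11}

encrypted = {cell: encrypt(v, item_keys[cell]) for cell, v in plain.items()}
recovered = {cell: decrypt(e, item_keys[cell]) for cell, e in encrypted.items()}
```

Every item key must be invertible modulo n for this to work, which holds for the keys in the example.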
Secure item key generator
• We extend RSA as our generator
• Each column has a column key <m, x> (private values)
• Each row has a row ID r (public value)
• Item key: $m x^r \bmod n$
  – n: the system parameter generated in RSA; n is a composite number with two big prime factors
    • n is public
  – m, x, r are non-zero random values < n
    • Note: n is at least a 1024-bit value
Example: the column keys A<4, 2> and B<1, 9> generate the item keys at user
Row ID A B
1 8 9
2 16 11
Actual storage
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP
Row ID A B
1 9 12
2 9 16
Conceptual item keys (generated on demand from the column keys)
Row ID A B
1 8 9
2 16 11
Note: User does not need to keep row IDs
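A minimal sketch of the item key generator described above, using the example column keys and the toy modulus n = 35. The function name `item_key` is illustrative.

```python
def item_key(column_key, row_id, n):
    """Item key of one cell: m * x^r mod n, for column key <m, x> and row ID r."""
    m, x = column_key
    return (m * pow(x, row_id, n)) % n

N = 35
column_keys = {"A": (4, 2), "B": (1, 9)}

# Conceptual item key table: one key per (row ID, column), never stored at the user.
conceptual = {
    (r, name): item_key(key, r, N)
    for name, key in column_keys.items()
    for r in (1, 2)
}
```

The user stores only one pair <m, x> per column, so the storage at the user is independent of the number of rows.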
Recovering values
• Example query: SELECT A
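n = 35
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP
Row ID A B
1 9 12
2 9 16
SP returns the encrypted column A (row IDs are passed to the user too)
Row ID A
1 9
2 9
The user generates the item keys of A from A<4, 2>
Row ID A
1 8
2 16
Plain data recovered by the user
A
2
4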
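The recovery step for SELECT A can be sketched as follows: the user regenerates each item key from the column key and the row ID sent back by SP, then multiplies it with the encrypted value. The function name is illustrative.

```python
def recover_column(column_key, rows, n):
    """rows: list of (row_id, encrypted_value) pairs as returned by SP."""
    m, x = column_key
    return [(r, (m * pow(x, r, n) * enc) % n) for r, enc in rows]

# Column key A<4, 2>, encrypted column A as stored at SP, n = 35.
answer = recover_column((4, 2), [(1, 9), (2, 9)], 35)
```

This is why the row IDs must accompany the encrypted values: without r the user cannot regenerate the item key.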
Security of our item key generator
• Our generating function extends the RSA function
  – Ours: $m x^r \bmod n$ (r, n are public; m, x are private)
  – RSA: $x^e \bmod n$ (e, n are public; x is private)
• With m = 1, the two functions are equivalent
PRIMITIVE OPERATORS
Overview
• Operations between columns
  – Multiplication (SELECT A * B)
  – Addition (SELECT A + B)
  – We will show that these two are enough to support generic function evaluation (for functions that can be expressed as a circuit and whose inputs are values in the same row)
• Note: the above operations assume both inputs are encrypted
  – We are also interested in encrypt-plain column-column operations (SELECT A * B; A is encrypted but B is not)
  – Special case: one of the operands is a constant
    • Column-constant operation (SELECT 10 * A)
Basic primitive operations
• Column-column multiplication
• Column-column addition
• Column-constant multiplication
• Column-constant addition
• Encrypt-plain column multiplication
• Encrypt-plain column addition
• Necessary operations
  – Power
  – Key regeneration
General Procedure
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP
Row ID A B
1 9 12
2 9 16
Each operation is an algorithm which may involve some communication between user and SP
The result is always a new column C: a column key C<m, x> at user and encrypted values (y and z in the figure) at SP
Security remark: The underlying item key generator is secure. To show that the entire system is secure, it suffices to show that the messages (if any) in each algorithm do not breach security w.r.t. CPA
Column-column multiplication
• C = AB (SELECT A*B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a' \bmod n$ ($a_k$: item key at user, $a'$: encrypted value of a)
  – $b = b_k b' \bmod n$ ($b_k$: item key at user, $b'$: encrypted value of b)
• $c = ab = (a_k b_k)(a' b') \bmod n$
  – $a' b'$ can be computed by SP
  – Item keys are not materialized at user; the user operates at the column key level
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP (C is computed by SP as $a' b' \bmod n$)
Row ID A B C
1 9 12 3
2 9 16 4
Column-column multiplication
Table schema and column keys at user: A<4, 2>, B<1, 9>
Item key table at row r
  – A: $4 \cdot 2^r \bmod 35$
  – B: $1 \cdot 9^r \bmod 35$
  – C: $(4 \cdot 1)(2 \cdot 9)^r \bmod 35$
Resulting column key: C<4, 18>
Column-column multiplication - Result
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>, C<4, 18>
Encrypted values at SP
Row ID A B C
1 9 12 3
2 9 16 4
Item keys of C generated by the user from C<4, 18>
Row ID C
1 2
2 1
Answer: C = AB
Row ID C
1 6
2 4
Security: No information about item keys of A and B is sent to SP
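The split of work between user and SP for column-column multiplication can be sketched as below, on the running example. The function names are illustrative; the user side touches only column keys and the SP side touches only encrypted values.

```python
N = 35

def multiply_keys(key_a, key_b, n=N):
    """User side: the key of C = AB is the component-wise product of the keys."""
    (ma, xa), (mb, xb) = key_a, key_b
    return ((ma * mb) % n, (xa * xb) % n)

def multiply_encrypted(col_a, col_b, n=N):
    """SP side: multiply the encrypted values row by row."""
    return {r: (col_a[r] * col_b[r]) % n for r in col_a}

def decrypt(key, enc, n=N):
    """User side: regenerate item keys m * x^r and recover the plain values."""
    m, x = key
    return {r: (m * pow(x, r, n) * v) % n for r, v in enc.items()}

key_c = multiply_keys((4, 2), (1, 9))
enc_c = multiply_encrypted({1: 9, 2: 9}, {1: 12, 2: 16})
plain_c = decrypt(key_c, enc_c)
```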
Column-constant multiplication
• C = kA (e.g., SELECT 5*A AS C)
  – k is a constant
• In some row r, the value of A is a
  – $a = a_k a' \bmod n$ ($a_k$: item key at user, $a'$: encrypted value of a)
• $c = 5a = (5 a_k)(a') \bmod n$
  – No action at SP: the encrypted values of C are those of A
  – At user: $(4 \cdot 2^r \bmod 35) \cdot 5 = 20 \cdot 2^r \bmod 35$, so the column key is C<20, 2>
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP
Row ID A B C
1 9 12 9
2 9 16 9
Column-constant multiplication - Result
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>, C<20, 2>
Encrypted values at SP
Row ID A B C
1 9 12 9
2 9 16 9
Item keys of C generated by the user from C<20, 2>
Row ID C
1 5
2 10
Answer: C = 5A
Row ID C
1 10
2 20
Security: No information about item keys of A is sent to SP
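A short sketch of column-constant multiplication on the example: only the m component of the column key changes, and SP reuses the encrypted values of A unchanged. Function names are illustrative.

```python
N = 35

def scale_key(key, k, n=N):
    """User side: the key of C = kA is <k*m, x>; SP does nothing."""
    m, x = key
    return ((k * m) % n, x)

def decrypt(key, enc, n=N):
    m, x = key
    return {r: (m * pow(x, r, n) * v) % n for r, v in enc.items()}

enc_a = {1: 9, 2: 9}            # encrypted column A at SP
key_c = scale_key((4, 2), 5)    # C = 5A
plain_c = decrypt(key_c, enc_a) # C shares A's encrypted values
```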
Power
• $C = A^k$ (e.g., SELECT A^2 AS C)
  – k is a constant
• In some row r, the value of A is a
  – $a = a_k a' \bmod n$ ($a_k$: item key at user, $a'$: encrypted value of a)
• $c = a^2 = (a_k)^2 (a')^2 \bmod n$
  – SP computes $(a')^2 \bmod n$
  – At user: $(4 \cdot 2^r \bmod 35)^2 = 16 \cdot 4^r \bmod 35$, so the column key is C<16, 4>
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP
Row ID A B C
1 9 12 11
2 9 16 11
Power - Result
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>, C<16, 4>
Encrypted values at SP
Row ID A B C
1 9 12 11
2 9 16 11
Item keys of C generated by the user from C<16, 4>
Row ID C
1 29
2 11
Answer: $C = A^2$
Row ID C
1 4
2 16
Security: No information about item keys of A is sent to SP
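The power operation can be sketched in the same style: the user raises both components of the column key to the power k, and SP raises each encrypted value to the power k. Function names are illustrative.

```python
N = 35

def power_key(key, k, n=N):
    """User side: (m * x^r)^k = m^k * (x^k)^r, so the new key is <m^k, x^k>."""
    m, x = key
    return (pow(m, k, n), pow(x, k, n))

def power_encrypted(col, k, n=N):
    """SP side: raise every encrypted value to the power k."""
    return {r: pow(v, k, n) for r, v in col.items()}

def decrypt(key, enc, n=N):
    m, x = key
    return {r: (m * pow(x, r, n) * v) % n for r, v in enc.items()}

key_c = power_key((4, 2), 2)
enc_c = power_encrypted({1: 9, 2: 9}, 2)
plain_c = decrypt(key_c, enc_c)
```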
Key regeneration
• Objective: set C = A, but with a column key for C that is different from A's
  – C's key appears to be random to SP
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP
Row ID A B
1 9 12
2 9 16
Goal: a column C with plain values 2 and 4, a new column key C<??, ??>, and new encrypted values ??, ??
Adding a constant column
• An artificial column K is added
• The value of K is the same for all rows. The value is randomly determined by the user at the beginning (CREATE TABLE). In the example, it is 4.
• K is encrypted like other columns
Plain data
Row ID A B K
1 2 3 4
2 4 1 4
… … … 4
… … … 4
Table schema and column keys at user: A<4, 2>, B<1, 9>, K<3, 3>; α = 4
Encrypted values at SP
Row ID A B K
1 9 12 16
2 9 16 17
Key regeneration
• Set $C = (\alpha^{-1})^p A K^p$
  – $\alpha^{-1}$ is the modular multiplicative inverse of α w.r.t. n
  – The multiplicative inverse of 4 is 9 w.r.t. n = 35
  – p is randomly determined each time
  – The value of each row of C = the value at A
Procedure
1. $C_1 = (\alpha^{-1})^p A$ (column-constant multiplication)
2. $C_2 = K^p$ (power)
3. $C = C_1 C_2$ (column-column multiplication)
Note: SP has no action in step 1
Key regeneration
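• $C = A = (\alpha^{-1})^p A K^p$
  – α = 4, p = 2, $\alpha^{-1} = 9$
  – $C = 9^2 A K^2 = 11 A K^2 \pmod{35}$
Plain data
Row ID A B K
1 2 3 4
2 4 1 4
Table schema and column keys at user: A<4, 2>, B<1, 9>, K<3, 3>; α = 4
Encrypted values at SP
Row ID A B K
1 9 12 16
2 9 16 17
Step 1: $C_1 = 11A$ has column key <9, 2>
Step 2: $C_2 = K^2$ has column key <9, 9>
Step 3: $C = C_1 C_2 = 11 A K^2$ has column key C<11, 18>
Encrypted values of C at SP
Row ID C
1 29
2 11
Item keys of C generated by the user from C<11, 18>
Row ID C
1 23
2 29
Plain values of C
Row ID C
1 2
2 4
Security: the only parameter sent to SP is p
Even if C's key is sent to SP, SP cannot get K's key. In the form $x^e$, e is known to SP but x is not, and it is hard to compute x (as in RSA). Thus, it is hard to get A's key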
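The three-step procedure can be traced in code on the running example (α = 4, p = 2, n = 35). This is a sketch with illustrative names; it assumes Python 3.8 or later for modular inverses via `pow`, and it fixes p instead of drawing it at random so the output matches the slide.

```python
N = 35

def decrypt(key, enc, n=N):
    m, x = key
    return {r: (m * pow(x, r, n) * v) % n for r, v in enc.items()}

def regenerate(key_a, enc_a, key_k, enc_k, alpha, p, n=N):
    """Return (key_c, enc_c) with C = (alpha^-1)^p * A * K^p, i.e. C = A."""
    c = pow(pow(alpha, -1, n), p, n)                      # (alpha^-1)^p
    key_c1 = ((c * key_a[0]) % n, key_a[1])               # step 1: user only
    key_c2 = (pow(key_k[0], p, n), pow(key_k[1], p, n))   # step 2: user part
    enc_c2 = {r: pow(v, p, n) for r, v in enc_k.items()}  # step 2: SP part
    key_c = ((key_c1[0] * key_c2[0]) % n, (key_c1[1] * key_c2[1]) % n)  # step 3: user
    enc_c = {r: (enc_a[r] * enc_c2[r]) % n for r in enc_a}              # step 3: SP
    return key_c, enc_c

key_c, enc_c = regenerate((4, 2), {1: 9, 2: 9}, (3, 3), {1: 16, 2: 17}, alpha=4, p=2)
plain_c = decrypt(key_c, enc_c)
```

The plain values of C equal those of A, while both the column key and the encrypted values differ from A's.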
Column-column addition
• C = A + B (SELECT A+B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a' \bmod n$ ($a_k$: item key at user, $a'$: encrypted value of a)
  – $b = b_k b' \bmod n$ ($b_k$: item key at user, $b'$: encrypted value of b)
• $c = a + b = a_k a' + b_k b' \bmod n$
• We must combine $a_k$ and $a'$ to compute the addition, but $a_k$ is not materialized (it is generated from A's key)
  – Send A's key to SP in a protected way
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP
Row ID A B
1 9 12
2 9 16
Column-column addition
• C = A + B (SELECT A+B AS C)
• In some row r, the values of A, B are a, b
  – $a = a_k a' \bmod n$ ($a_k$: item key at user, $a'$: encrypted value of a)
  – $b = b_k b' \bmod n$ ($b_k$: item key at user, $b'$: encrypted value of b)
• $c = a + b = a_k a' + b_k b' \bmod n$
• In the end, c should also be encrypted like other values, i.e., $c = c_k c' \bmod n$
• $c_k c' = a_k a' + b_k b' \bmod n$
• $c' = (c_k^{-1} a_k) a' + (c_k^{-1} b_k) b' \bmod n$
  – $c_k$ is determined by C's column key, which the user generates randomly
  – The remaining problem is to help SP compute $c'$
  – The user prepares the two parts $c_k^{-1} a_k$ and $c_k^{-1} b_k$. Item keys are not materialized, but the parts can be expressed at the column key level
• With C<$m_c$, $x_c$> and A<$m_a$, $x_a$>, at row r:
  – $c_k = m_c x_c^r \bmod n$
  – $c_k^{-1} = m_c^{-1} (x_c^{-1})^r \bmod n$
  – $a_k = m_a x_a^r \bmod n$
  – $c_k^{-1} a_k = m_c^{-1} m_a (x_c^{-1} x_a)^r \bmod n$
  – => the hint for A has the key <$m_c^{-1} m_a$, $x_c^{-1} x_a$>
Example
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP
Row ID A B
1 9 12
2 9 16
1. First, generate C's key: C<3, 27>
2. C's inverse: $C^{-1}$<12, 13>
3. Hints for A and B: Hint A<13, 26>, Hint B<12, 12>
4. SP materializes the hints for every row
Row ID | Hint for A | Hint for B
1 | 23 | 4
2 | 3 | 13
5. SP obtains the encrypted values of C
Row ID C
1 10
2 25
Item keys of C generated by the user from C<3, 27>
Row ID C
1 11
2 17
Plain values of C
Row ID C
1 5
2 5
We obtain the correct answers if we look at the plain values
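The five steps of the example can be traced with the sketch below. Names are illustrative, and the result key C<3, 27> is fixed instead of random so the numbers match the slide. The sketch sends the hint keys to SP as they are; the protection of the hints by key regeneration, discussed in the security slides, is not included.

```python
N = 35

def item_key(key, r, n=N):
    m, x = key
    return (m * pow(x, r, n)) % n

def invert_key(key, n=N):
    """Key whose item keys are the inverses of the item keys of `key`."""
    m, x = key
    return (pow(m, -1, n), pow(x, -1, n))

def multiply_keys(key_a, key_b, n=N):
    return ((key_a[0] * key_b[0]) % n, (key_a[1] * key_b[1]) % n)

def add_at_sp(hint_a, hint_b, enc_a, enc_b, n=N):
    """SP side: c' = hint_a(r) * a' + hint_b(r) * b' (mod n) for every row r."""
    return {r: (item_key(hint_a, r, n) * enc_a[r] + item_key(hint_b, r, n) * enc_b[r]) % n
            for r in enc_a}

key_a, key_b, key_c = (4, 2), (1, 9), (3, 27)   # step 1: choose C's key
inv_c = invert_key(key_c)                       # step 2
hint_a = multiply_keys(inv_c, key_a)            # step 3
hint_b = multiply_keys(inv_c, key_b)
enc_c = add_at_sp(hint_a, hint_b, {1: 9, 2: 9}, {1: 12, 2: 16})   # steps 4-5
plain_c = {r: (item_key(key_c, r) * v) % N for r, v in enc_c.items()}
```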
Security
Table schema and column keys at user: A<4, 2>, B<1, 9>, C<3, 27>
Hints sent to SP: Hint A<13, 26>, Hint B<12, 12>
These 4 values are what SP can observe
Security
Hints for A and B: Hint A<13, 26>, Hint B<12, 12>
These 4 values are what SP can observe
  – Hint A combines A<$m_a$, $x_a$> with $C^{-1}$<p, q>
  – Hint B combines B<$m_b$, $x_b$> with $C^{-1}$<p, q>
4 equations:
  – $p m_a \bmod 35 = 13$
  – $q x_a \bmod 35 = 26$
  – …
• $C^{-1}$'s key is different in each addition, but A's and B's keys are not
• In the long run, an attacker can gather enough information to breach security
• Each hint can be imagined as a column in the table. Before sending the key of this column, we do a key regeneration
Security
• Recap: even if the newly regenerated key is revealed to SP, SP cannot associate it with the old key
  – Because there is an exponentiation in the formula
Hints for A and B before key regeneration: Hint A<13, 26>, Hint B<12, 12>
After key regeneration: Hint A<3, 3>, Hint B<3, 12>
SP's view of the original hints: Hint A<?, ?>, Hint B<?, ?>
And so SP cannot know A's, B's, or C's key
Column-constant addition
• Trivial, as we have a constant column
Encrypt-plain operations
• C = AB (SELECT A*B AS C), but now B is not encrypted
• Encryption always incurs some overhead, e.g., decryption is needed
• Encryption is done only when the data is sensitive
• "B is not encrypted" is equivalent to "B has a key of <1, 1>". All operations are the same
• Encrypted columns and unencrypted columns are interoperable
n = 35
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 1>
Values at SP (A encrypted, B in plain)
Row ID A B
1 9 3
2 9 1
Generic column-column operations
• With addition and multiplication, we can compute any function that can be expressed as a circuit
• All data is in binary form
• It is sufficient to show that we can build a universal gate (e.g., a NAND gate) on top of binary data
Building NAND gate
• NAND(X, Y) = 1 - XY (multiplication and addition)
• Any circuit can be expressed
• Note: since we are using multiplicative sharing, we have poor protection of 0 values
  – Example: RSA has poor protection of 0 and 1 values
X Y Result
0 0 1
0 1 1
1 0 1
1 1 0
Switching to other values
• $X + Y - XY + 1$ (addition and multiplication on non-zero booleans; in the table below, 1 plays the role of false and 2 the role of true)
X Y Result
2 2 1
2 1 2
1 2 2
1 1 2
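The two truth tables can be checked directly. The sketch below reads the second table as the encoding 1 = false, 2 = true, which is an interpretation of the slide rather than something it states; NOT and AND are derived from NAND in the usual way to show that other gates follow.

```python
def nand_binary(x, y):
    """NAND on 0/1 values using one multiplication and one addition."""
    return 1 - x * y

def nand_nonzero(x, y):
    """NAND on the non-zero encoding (assumed: 1 = false, 2 = true)."""
    return x + y - x * y + 1

def not_nonzero(x):
    return nand_nonzero(x, x)

def and_nonzero(x, y):
    return not_nonzero(nand_nonzero(x, y))

binary_table = {(x, y): nand_binary(x, y) for x in (0, 1) for y in (0, 1)}
nonzero_table = {(x, y): nand_nonzero(x, y) for x in (1, 2) for y in (1, 2)}
```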
Side note: Addition revisit
• Note: we are protecting the column keys, but an attacker may observe some information in our operations
• $c' = (c_k^{-1} a_k) a' + (c_k^{-1} b_k) b' \bmod n$
• $c' = c_k^{-1} a + c_k^{-1} b \bmod n$
  – Both terms carry the same factor $c_k^{-1}$
• Dangerous in the binary case, as $c_k^{-1} a = c_k^{-1} b$ iff a = b
  – An attacker can identify whether the bits are the same
A more secure method
• $X + Y - XY + 1 = (X - p)(q - Y) + (1 - q)X + (1 - p)Y + (1 + pq)$
• RHS $= qX + pY - XY - pq + (1 - q)X + (1 - p)Y + (1 + pq) = X + Y - XY + 1$
• p, q are random numbers
• All parts are of different values
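The identity can be spot-checked numerically. The sketch below splits the expression into the four parts from the slide and verifies that they always add up to X + Y - XY + 1; the ranges for p and q and the function names are arbitrary choices for illustration.

```python
import random

def nand_parts(x, y, p, q):
    """The four parts of the decomposition, blinded by the random numbers p and q."""
    return [(x - p) * (q - y), (1 - q) * x, (1 - p) * y, 1 + p * q]

def nand_nonzero(x, y):
    return x + y - x * y + 1

def check(trials=100, seed=0):
    """Return True if the parts sum to the NAND expression for random p, q."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, q = rng.randint(2, 1000), rng.randint(2, 1000)
        for x in (1, 2):
            for y in (1, 2):
                if sum(nand_parts(x, y, p, q)) != nand_nonzero(x, y):
                    return False
    return True
```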
Note on circuit construction
• Not efficient if we use the generic gate construction for everything
  – Shortcut operations should be developed for common jobs (part of future work, e.g., on string data)
• There is still no comparison operation (branch)
  – We will discuss comparison in later slides
• The above generic gate construction is of theoretical interest only
Summary of our operations so far
• With addition and multiplication
  – Compute any arithmetic function (using addition, multiplication, power) on integer columns relatively efficiently (significantly smaller overhead at user than the baseline)
    • Baseline: the user downloads the encrypted database, decrypts it, and computes the query on its own
EXTENSION OPERATORS
Comparison
• Note: the objective is to let SP filter tuples
  – The result of the comparison should be revealed to SP
  – Thus, data interchangeability cannot be achieved by comparison
• Side note: if the comparison result must be hidden from SP as well, the overhead at user is significantly increased
  – Such a requirement will have a cost at user not less than the baseline
Comparison
• Only one operation is required
  – X > 0
  – Every other comparison can be transformed into the above format with 1 addition
• Equivalent operation
  – Check the sign bit of the data
Domain partitioning
• Modular arithmetic
  – $-3 \equiv 32 \pmod{35}$
  – $-10 \equiv 25 \pmod{35}$
• Domain: 0 ~ 1024-bit value
  – Positive if in the lower range (~ 1023 bits)
  – Negative if in the upper range
Comparison
• We will let SP observe the comparison result, to achieve efficient selections
• Goal
  – If the real value is positive, map it into the positive region
  – If the real value is negative, map it into the negative region
[Figure: the domain 0 ~ 1024-bit value, split into a positive range and a negative range]
Controlling the parameter
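• $a = a_k a' \Rightarrow a' = a_k^{-1} a$
  – Regenerate the key to make $a_k^{-1}$ a small constant
n = 35, n/2 = 17
Real value
Row ID A
1 2
2 4
Before key regeneration
  – User: A<4, 2>, item keys 8 and 16
  – SP (ID, A): (1, 9), (2, 9)
After key regeneration
  – User: A<12, 1>, item keys 12 and 12; inverse $A^{-1}$<3, 1>
  – SP (ID, A): (1, 6), (2, 12)
As long as there is no overflow, the result is correct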
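The sign test at SP can be sketched on the toy example, where the inverse item key is the small constant 3. The sketch assumes that the lower half of the domain is read as positive and the upper half as negative, which matches the modular-arithmetic examples (-3 = 32 mod 35); function names are illustrative.

```python
N = 35  # toy modulus; the slides assume a modulus of at least 1024 bits

def to_signed(v, n=N):
    """Read a residue as a signed value: lower half positive, upper half negative."""
    return v if v <= n // 2 else v - n

def sp_value(real_value, inv_item_key, n=N):
    """Value stored at SP: a' = a_k^-1 * a (mod n)."""
    return (inv_item_key * real_value) % n

def sp_is_positive(enc, n=N):
    """SP-side check of X > 0 by looking at the region of the stored value."""
    return 0 < enc <= n // 2

stored = [sp_value(a, 3) for a in (2, 4)]   # real values of A after key regeneration
negative = sp_value(-2, 3)                  # a negative real value
overflow = sp_value(6, 3)                   # 3 * 6 = 18 exceeds n/2 = 17
```

With n = 35 the positive region is tiny, so the value 6 already overflows and would be misread as negative; with a 1024-bit modulus the region is large enough for ordinary integer domains.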
Overflow?
• Each region is around $2^{1023}$
  – Should be more than enough for usual domains, e.g., a 4-byte int => $2^{32}$
• Security issues
  – Factoring attack
    • Each value has the same factor (e.g., 3 in the last example)
  – Order-preserving
    • A larger value will give a larger value at SP
Random column
• $X > 0 \Leftrightarrow f(R) X > 0$ for $f(R) > 0$
• Example of f(R)
  – $(R - p + 1)^2$: 160-bit value
    • p is random in every query
Real values
ID A B R
1 2 3 2
2 4 1 99
R is random in $2^{80}$ (positive domain, > 0)
Aggregate query
• Since they are usually the last operations, data interchangeability is not important
• COUNT
  – Same as selection: after SP filters the tuples, just count the qualifying ones
• SUM
  – Next slide
SUM
• SELECT SUM(A)
  – Now the addition operation is between rows
  – Uses the same logic as column-column addition
Plain data
Row ID A B
1 2 3
2 4 1
Table schema and column keys at user: A<4, 2>, B<1, 9>
Encrypted values at SP
Row ID A B
1 9 12
2 9 16
Generate the result item key $a_s$ for a result row r (only the row ID is needed)
• $s = a_{k1} a'_1 + a_{k2} a'_2$
• $a_s s' = a_{k1} a'_1 + a_{k2} a'_2$
• $s' = a_s^{-1} a_{k1} a'_1 + a_s^{-1} a_{k2} a'_2$
SUM
• SELECT SUM(A)
• $s' = a_s^{-1} a_{k1} a'_1 + a_s^{-1} a_{k2} a'_2$
• With A<m, x>:
  – $a_{k1} = m x^{r_1} \bmod n$
  – $a_s = m x^r \bmod n$
  – $a_s^{-1} = m^{-1} (x^{-1})^r \bmod n$
  – $a_s^{-1} a_{k1} = x^{r_1} (x^{-1})^r \bmod n$
• SP needs x and $(x^{-1})^r$ to compute the above part
  – x is part of the column key and cannot be sent to SP directly
  – Perform a key regeneration (not exactly)
Key regeneration
• Keep C = pA for a random p (not C = A)
  – Note that an attacker may know A but cannot know C, so there is no CPA attack on C
• A has key <m, x>; C = pA has key <m', x'>
• As we discussed, key regeneration does not let the attacker trace x from knowing m' and x'
  – Revealing this x' is safe
• The computed sum is multiplied by p
  – The user just multiplies by $p^{-1}$ to get the actual sum
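The SUM procedure can be sketched end to end on the example column A = (2, 4). Several things here are assumptions made for illustration: the blinding factor p = 3, the fresh key <2, 3> for C = pA, and the result row ID r = 5 are hypothetical values, and the key regeneration step is replaced by a stand-in that encrypts pA directly under the fresh key. Only x' is given to SP.

```python
N = 35

def item_key(key, r, n=N):
    m, x = key
    return (m * pow(x, r, n)) % n

def owner_encrypt(values, key, n=N):
    """Stand-in for key regeneration: encrypt the column directly under a fresh key."""
    return {r: (pow(item_key(key, r, n), -1, n) * v) % n for r, v in values.items()}

def sp_sum(enc, x_new, result_row, n=N):
    """SP side: s' = sum of x'^(r_i) * (x'^-1)^r * c'_i (mod n), using only x'."""
    x_inv_r = pow(pow(x_new, -1, n), result_row, n)
    return sum(pow(x_new, r, n) * x_inv_r * v for r, v in enc.items()) % n

def user_finish(s_enc, key_new, result_row, p, n=N):
    """User side: decrypt with the result item key, then remove the factor p."""
    return (pow(p, -1, n) * item_key(key_new, result_row, n) * s_enc) % n

a = {1: 2, 2: 4}               # plain column A
p, key_c, r = 3, (2, 3), 5     # hypothetical blinding factor, fresh key, result row ID
enc_c = owner_encrypt({row: (p * v) % N for row, v in a.items()}, key_c)
total = user_finish(sp_sum(enc_c, key_c[1], r), key_c, r, p)
```

The result is correct only while the true sum stays below the modulus, which is not a practical concern with a 1024-bit n.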
Indexing
• Processing each tuple by linear scan is feasible but slow
• Indexing is needed
• Note: the index itself is a compromise of security
  – If certain tuples are filtered without any processing, the attacker can obtain certain information about the data, e.g., a range for the data
An index option
• Make the data uncertain (domain partitioning)
User
Row ID A B
1 2 3
2 4 1
SP
Row ID A B
1 1-2 3-4
2 3-4 1-2
• Index on uncertain data
Index processing
• First process the index and filter out all disqualified tuples
• Then, use cryptographic operations to compute the actual answer
Integration with existing DBMS
[Figure: architecture. User side: Applications, SDB Client Layer, Query Execution Plan, Secure Operators. SP side: SDB Server Layer, Secure Operators, DBMS, Memory. Labels on the arrows: SQL, Query, Result]
Example
• SELECT C WHERE A * B + D > 20
Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35
Encrypted values at SP
Row ID A B C D
105 … … … …
278 … … … …
The condition is rewritten as A*B + D - 20 > 0
1. Column-column multiplication: E = AB, with key E<…>
2. Column-column addition: F = E + D - 20, with key F<…>
3. Comparison
Query execution plan done (with corresponding parameters)
Note: E, F can be thrown away, since they are not needed in the result
Example
• SELECT C WHERE A * B + D > 20
Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35
Encrypted values at SP
Row ID A B C D
105 … … … …
278 … … … …
SP receives the query plan, executes it, and finds the answers
Row ID Answers?
105 No
278 Yes
337 No
129 No
… …
Projection on C only
Row ID C
278 3
776 12
… …
The encrypted answer is sent back to the user. Row IDs must be there
Example
• SELECT C WHERE A * B + D > 20
Table schema and column keys at user: A<…>, B<…>, C<…>, D<…>; n = 35
Encrypted answers
Row ID C
278 3
776 12
… …
User computes own item keys
Row ID C
278 9
776 9
… …
Decrypt
C
27
3
…