©2015, David C. Roberts, all rights reserved 1 Translucent Databases CSCI 6442


Translucent Databases

CSCI 6442


Recommendation

Buy Translucent Databases, Second Edition, by Peter Wayner. It’ll be a valuable addition to your library.


Translucent Databases

This is a new technique for controlling access to database information.

It is being used in some state-of-the-art software products.

It’s not widely known and is not found in any database textbooks.

We will use it. You will learn to use it and teach others to use it.


One-Way Functions

• Translucency typically relies on the validity of one-way functions
• There are a number of functions used as one-way
• There are no proofs that these functions are one-way


One-Way Function

A one-way function h(x) is a function such that h(x) can be computed easily but it is infeasible in practice, given y, to find x such that h(x) = y.

Many uses of translucency are based on one-way functions.


One-Way Function

• Wikipedia definition: a function that is easy to compute but hard to invert, given the image of a random input.
• “Hard” may be hard enough for some commercial purposes, not hard enough for others


Which Are One-Way?

• Modulo
• Multiply by a prime
• Hashing


One-Way Functions

• Secure Hash Algorithm (SHA) from NIST: designed as a one-way function
• From a file of any length, SHA-1 produces a 160-bit value
• Arbitrary input size allows great flexibility: can be used as a message digest
• Builds on the design of the earlier MD5 work
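
As a quick illustration of the fixed-size output described above, the following sketch uses Python's standard hashlib module (the choice of Python and the sample message are assumptions, not part of the course material). It shows that SHA-1 yields 160 bits and MD5 yields 128 bits whatever the input length.

```python
import hashlib

def digest_bits(algorithm, data):
    """Return the digest length in bits for a hashlib algorithm name."""
    return hashlib.new(algorithm, data).digest_size * 8

# Sample input, made up for illustration; any byte string works.
message = b"any length of input, from one byte to a whole file"

sha1_hex = hashlib.sha1(message).hexdigest()
md5_hex = hashlib.md5(message).hexdigest()

# SHA-1: 40 hex characters (160 bits); MD5: 32 hex characters (128 bits).
print(len(sha1_hex), digest_bits("sha1", message))
print(len(md5_hex), digest_bits("md5", message))
```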


Locus of Access Control

• Where do you think access control should be located?

[Figure: Database, Database System, Application]


In the Application

• Access control may also be in the application
• Such control must be built into program logic
• It is hard to verify, hard to change, hard to completely test
• But it’s not so hard for a programmer to insert (and hide) a back door


By The Database System?

Usually, database access is controlled by built-in access controls that are administered through GRANT and REVOKE commands.


Grant

• Usual technique:
– GRANT CONNECT TO SMITH IDENTIFIED BY PASSWORD
– GRANT SELECT ON EMP TO SMITH
– GRANT UPDATE ON EMP TO SMITH
• But what if new users arrive all the time, even over the Internet?
– Not enough DBAs in the world
– New users may arrive 24x7


Translucency

• Translucent techniques allow privileges to be controlled by the DBMS but not using the DBMS’s relatively static controls
• Translucent access control can be made external to the application
• Audit of translucent access controls is relatively straightforward
• Typically, translucent techniques are used to allow users to see and change their own information, in a controlled fashion


Motivation

• Translucency exposes some parts of the database to the public and protects other parts.
• With translucency, the whole database content is never exposed to a single individual, and access control administration is not required.
• The database is designed to let out some information and keep other information protected.


Security

• Translucency provides protection that’s generally fine for privacy purposes
• It doesn’t replace more heavy-duty forms of protection
• For example, nuclear weapon launch codes need stronger security than translucency!


Early Translucency

UNIX password file: stored using an irreversible scrambling function. The password entered by a user seeking access is scrambled and compared to the stored scrambled password. The password entered by the user is never stored in its original form.


Passwords

[Figure: the user enters the user ID "johnsmith" and the password "password"; the system stores "johnsmith" together with the encrypted password "uejsgqkkd".]


Advantages

• Compromise of the password list won’t compromise control of access to the system
• Compromise of the encryption function won’t compromise access control if the encryption function is a one-way function


Attacking Translucency

• How would you attack the UNIX password system?


Attacks

• Most attacks are dictionary attacks, which try large numbers of likely values until one matches
• Counter such attacks in two ways:
1. Limit the number of attempts
2. Provide a large number of possible combinations


Examples of Uses for Translucency

Personal scheduling: personal schedules have considerable value to a malicious party.

Keep personal schedules for many users in a single table, but expose each user’s information to only that user

Users can come and go without administering accesses

Software as a Service (SaaS)—multi-user software offered over the Web


More Examples

Preference information: clothing sizes and preferences, food ordering history, and travel history all have potential malicious value.

They can be entered by the customer and then accessed later only by that customer.

They can be used for analysis without exposing identities to the analysts.


General Principles

• Translucency usually employs “stunt data”; that is, data that stands in for real data and behaves similarly but does not have the original value.
• Stunt data is usually computed by a one-way transformation from the original data.
• In the password example, the encrypted password is stunt data.


One-Way Tables

Translucency is added to tables by passing sensitive values through a one-way function before storing them.

Diary(HashedUserID, HashedPW, ID, Content)

MD5(UserID) and MD5(PW) are stored as HashedUserID and HashedPW. The user enters UserID and PW. The query to retrieve all of that user's entries is

SELECT MD5(UserID), MD5(PW), ID, Content
FROM Diary
WHERE HashedUserID = MD5(UserID)
  AND HashedPW = MD5(PW);
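
The Diary table above can be tried out with a small sketch. It assumes Python with the standard sqlite3 and hashlib modules standing in for MySQL and its MD5() function; the helper names (h, add_entry, read_entries) and the sample users are made up for illustration. MD5 is used only to mirror the slide.

```python
import hashlib
import sqlite3

def h(value):
    """Stand-in for MySQL's MD5(): 32-character hex digest of the text."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Diary (HashedUserID TEXT, HashedPW TEXT, ID INTEGER, Content TEXT)"
)

def add_entry(user_id, pw, entry_id, content):
    # Only the hashed forms of UserID and PW ever reach the table.
    conn.execute(
        "INSERT INTO Diary VALUES (?, ?, ?, ?)",
        (h(user_id), h(pw), entry_id, content),
    )

def read_entries(user_id, pw):
    # The caller must supply both values; they are hashed and matched.
    rows = conn.execute(
        "SELECT ID, Content FROM Diary "
        "WHERE HashedUserID = ? AND HashedPW = ? ORDER BY ID",
        (h(user_id), h(pw)),
    )
    return rows.fetchall()

add_entry("alice", "swordfish", 1, "Dentist at 3 pm")
add_entry("bob", "hunter2", 1, "Call the bank")

print(read_entries("alice", "swordfish"))  # only alice's rows come back
print(read_entries("alice", "wrong"))      # wrong password: no rows
```

Someone who reads the table directly sees only hashed identifiers next to the content, so no GRANT is needed to add a new user.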


Use of UserID and PW

• For translucency, we generally use UserID and PW
• Users can choose a UserID at their first login
• We can require UserID to be unique on its own, as long as we have PW

Question: why do we need a PW for UserID to be unique?


Vulnerability

Simple use of a one-way function is vulnerable to dictionary attack.

SELECT * FROM PURCHASES WHERE NAMEHASH = MD5('Fred Smith');

Can append a password to the vulnerable value and hash both of them together:

INSERT INTO PURCHASES VALUES (MD5('Fred Smith/swordfish'), …);

Now a dictionary attack becomes geometrically more difficult.
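
To make the vulnerability concrete, here is a minimal sketch in Python (an assumption; the slides use MySQL). The guess list and the function names are made up. An attacker holding a list of plausible names recovers a plain MD5(name) immediately, but the same list fails once a password is appended before hashing.

```python
import hashlib

def md5_hex(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def dictionary_attack(target_hash, candidates):
    """Return the first candidate whose hash equals target_hash, else None."""
    for candidate in candidates:
        if md5_hex(candidate) == target_hash:
            return candidate
    return None

# Hypothetical guess list an attacker might assemble from a phone book.
names = ["Alice Jones", "Fred Smith", "Mary Brown"]

plain = md5_hex("Fred Smith")
combined = md5_hex("Fred Smith/swordfish")

print(dictionary_attack(plain, names))     # Fred Smith
print(dictionary_attack(combined, names))  # None
```

To attack the combined value, the guess list would have to pair every name with every candidate password, which multiplies the work.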


Simplification

• What about small variations in how the user enters information? What if we want to not be sensitive to them?
• Can clean up one-way input: remove spaces, convert to upper case, remove punctuation, remove non-printing characters, even use Soundex.

What happens to security if we “clean up” the one-way input?
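
One possible clean-up step, sketched in Python under the assumption that only ASCII punctuation needs removing; the function name normalize is made up, and Soundex is left out. The cleaned value is what would be passed to the one-way function.

```python
import string

def normalize(value):
    """Drop whitespace, ASCII punctuation and non-printing characters,
    then convert what is left to upper case."""
    kept = [
        ch for ch in value
        if ch.isprintable() and not ch.isspace() and ch not in string.punctuation
    ]
    return "".join(kept).upper()

print(normalize(" Fred  Smith, Jr.\t"))  # FREDSMITHJR
print(normalize("fred smith jr"))        # FREDSMITHJR
```

Both spellings now hash to the same value, which is the convenience sought; it also means many distinct inputs collapse to one, so an attacker has fewer candidates to try.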


Security Trade-Offs

• Today’s strong hash is tomorrow’s broken protection
• A desktop machine can compute 1,000,000 MD5 hashes per second
• Difficulty of dictionary attacks can be estimated numerically, providing an estimate of the strength of a transformation
• Normalizing input increases vulnerability to dictionary attack
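
The numerical estimate mentioned above can be sketched as follows. The rate of 1,000,000 hashes per second comes from the slide; the eight-letter inputs and the function names are assumptions chosen for illustration.

```python
def search_space(alphabet_size, length):
    """Number of distinct strings of the given length over the alphabet."""
    return alphabet_size ** length

def worst_case_seconds(candidates, hashes_per_second):
    """Time to hash every candidate once at the given rate."""
    return candidates / hashes_per_second

RATE = 1_000_000  # MD5 hashes per second, the figure quoted on the slide

upper_only = search_space(26, 8)   # 8 letters after upper-casing
mixed_case = search_space(52, 8)   # 8 letters, case preserved

days = worst_case_seconds(upper_only, RATE) / 86_400
print(upper_only)                # 208827064576 candidates
print(round(days, 1))            # about 2.4 days to try them all
print(mixed_case // upper_only)  # 256: folding case made the attack 256x cheaper
```

The last line puts a number on the normalization trade-off: converting eight letters to upper case shrinks the search space by a factor of 2^8.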


Salting

• Data can be “salted” by a salt column of random numbers, appended to the value that is hashed, before it is transformed.
• Dictionary attack would now have to guess the salt string as well as UserID and PW.
• Salt can be unique per row, or the same salt can be used for a whole table

Does salting improve security by a little or a lot?
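
A minimal sketch of per-row salting, assuming Python's standard hashlib and secrets modules and a made-up row layout (a dict with "salt" and "hashed" keys). In this sketch the salt is stored beside the hash, which is one design choice among several.

```python
import hashlib
import secrets

def salted_hash(value, salt):
    """Hash the salt and the value together (MD5 only to mirror the slides)."""
    return hashlib.md5((salt + "/" + value).encode("utf-8")).hexdigest()

def new_row(value):
    # A fresh random salt per row: 8 random bytes as 16 hex characters.
    salt = secrets.token_hex(8)
    return {"salt": salt, "hashed": salted_hash(value, salt)}

def matches(row, value):
    """Recompute the hash with the row's salt and compare."""
    return salted_hash(value, row["salt"]) == row["hashed"]

row_a = new_row("Fred Smith")
row_b = new_row("Fred Smith")

print(matches(row_a, "Fred Smith"))         # True
print(row_a["hashed"] == row_b["hashed"])   # False: same value, different salts
```

With the salt stored in the row, an attacker who steals the table still has to redo the dictionary attack for every row; if the salt is kept elsewhere, it must be guessed as well.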


UserID and PW

• Is it better to concatenate UserID and PW and then hash, or to put their hashed values in separate columns?
• What are the tradeoffs?


More About UserID, PW

Should we keep the original form of UserID and PW in the database?

What are the tradeoffs?


One-Way Transformations

1. Pure one-way functions
2. Trapdoor functions
3. Symmetric encryption

Pure one-way functions cannot be reversed, so their effect cannot be undone, and once obscured, encoded information cannot be recovered. On the other hand, trapdoor functions and symmetric encryption functions allow some users to be given additional access, or they can allow “just a peek” when needed.


Pure One-Way Functions

Let h(x) stand for a one-way function. For a pure one-way function, it is easy to compute y=h(x) but infeasible in practice, given y, to find x such that y=h(x).

In general, there are no proofs of the irreversibility of one-way functions.

The MD5 hash function is implemented in MySQL. It is widely used in industry under an assumption that it is one-way; however, there is no proof that it is truly one-way.

One common use of the MD5 function is for elimination of file duplicates. An MD5 hash is computed for each file, and if a new file is encountered with an identical MD5 hash, that file is compared with the original.
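
The duplicate-elimination use just described can be sketched like this. The function name find_duplicates and the in-memory dictionary of file contents are assumptions made so the example is self-contained; real code would read files from disk.

```python
import hashlib

def find_duplicates(files):
    """files maps a name to its bytes. Returns (original, duplicate) pairs.

    The MD5 digest is only a cheap first filter: when two digests agree,
    the contents are compared byte for byte before declaring a duplicate.
    """
    seen = {}        # digest -> names of distinct files with that digest
    duplicates = []
    for name, content in files.items():
        digest = hashlib.md5(content).hexdigest()
        for earlier in seen.get(digest, []):
            if files[earlier] == content:
                duplicates.append((earlier, name))
                break
        else:
            # No earlier file had identical content: remember this one.
            seen.setdefault(digest, []).append(name)
    return duplicates

sample = {"a.txt": b"hello", "b.txt": b"world", "c.txt": b"hello"}
print(find_duplicates(sample))  # [('a.txt', 'c.txt')]
```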


Trapdoor Functions

Trapdoor functions: appear to be one-way functions, but there is another value called the key that can be used to reverse h(x).

Such functions can also be used for public-key encryption, where one key is used to encrypt and the other can be used to decrypt.


Symmetric Encryption

Symmetric encryption: the same key is used to encrypt and decrypt. Not truly one-way because the person who encrypts can also decrypt.

In a translucent database, the compromise of the key would open up all the protected contents.


MySQL Implementations

MD5('data') produces a 32-character string.

PASSWORD('password') produces a 16-character string in older MySQL versions (MySQL 4.1 and later return 41 characters). NOT the algorithm used by UNIX.

ENCODE('data', 'password') encodes data into a binary string. DECODE reverses the process.

DES_ENCRYPT and DES_DECRYPT use DES to encrypt and decrypt. Any user can encrypt, but only users with access to keys can decrypt.

Note: PASSWORD, ENCODE, DECODE, DES_ENCRYPT and DES_DECRYPT were removed in MySQL 8.0.


Inserting Redundancy

• Very secure translucent tables can hash several columns together, such as name/address/ssn/birthday to encode HR information
• Such a hash is difficult to attack
• However, retrieval won’t work if the user misspells just one of the entries
• Can design the table to match three of the four by constructing four hashed columns, one with each of the four values omitted.


Three Out of Four

Emp(HashedNameAddressSSN, HashedNameAddressBirthday, HashedNameSSNBirthday, HashedAddressSSNBirthday, … <protected information> … )
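
The three-out-of-four idea can be sketched as below, assuming Python and a record held as a dict; the field names, the "/" separator and the sample employee are all made up. Each of the four hashes leaves out one field, so a retrieval attempt with a single wrong field still matches one column.

```python
import hashlib
from itertools import combinations

FIELDS = ("name", "address", "ssn", "birthday")

def three_of_four_hashes(record):
    """Return one hash per 3-field subset, i.e. one per omitted field."""
    hashes = {}
    for subset in combinations(FIELDS, 3):
        joined = "/".join(record[field] for field in subset)
        hashes[subset] = hashlib.md5(joined.encode("utf-8")).hexdigest()
    return hashes

def matches(stored_hashes, attempt):
    """True if the attempt agrees with the stored row on any three fields."""
    candidate = three_of_four_hashes(attempt)
    return any(candidate[key] == stored_hashes[key] for key in stored_hashes)

employee = {"name": "Fred Smith", "address": "12 Elm St",
            "ssn": "000-00-0000", "birthday": "1970-01-01"}
stored = three_of_four_hashes(employee)

one_typo = dict(employee, address="12 Elm Street")
two_typos = dict(one_typo, name="Fred Smyth")

print(matches(stored, one_typo))   # True: name, ssn, birthday still agree
print(matches(stored, two_typos))  # False: only two fields agree
```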


Protecting Repetitions

• When a value is repeated, all the hashes of it will be identical
• This is true even if a password or other values are appended
• This behavior may be acceptable, or it might be a weakness: you may want to protect repetitions
• To protect repetitions, add a serial number to each entry for a given value
• For example, “001/Fred Smith”, “002/Fred Smith”, “003/Fred Smith”, etc.
• Decoding takes longer, since all Fred Smith values have to be decoded one at a time until there are no more
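
A small sketch of the serial-number scheme, assuming Python, a set standing in for the hashed column, and made-up function names. Storing picks the next unused serial; retrieval walks the serials upward until a hash is missing, which is why lookups take longer.

```python
import hashlib

def h(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def store(table, value):
    """Add the value under the next unused serial number; return its hash."""
    serial = 1
    while h(f"{serial:03d}/{value}") in table:
        serial += 1
    key = h(f"{serial:03d}/{value}")
    table.add(key)
    return key

def find_all(table, value):
    """Probe 001/value, 002/value, ... until one is not present."""
    found = []
    serial = 1
    while (key := h(f"{serial:03d}/{value}")) in table:
        found.append(key)
        serial += 1
    return found

hashed_column = set()
for _ in range(3):
    store(hashed_column, "Fred Smith")

print(len(hashed_column))                       # 3 distinct hashes, no visible repetition
print(len(find_all(hashed_column, "Fred Smith")))  # 3
```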


Coordinating Users

• Previous techniques show how to protect with a single mechanism
– Information is indexed with h(info): if you have h(info) then you get the whole row
– h(x) acts like a password
• Can use two different values, require both to get a row
• Example: bulletin board of communications between two people. Can append a password
• Can also use public-private keys


Bulletin Board

• CREATE TABLE bb (`FROM` CHAR(32), `TO` CHAR(32), MESSAGE BLOB);
• FROM and TO are reserved words in SQL, so MySQL needs them quoted with backticks when used as column names
• Put hashed from and to names into the first two columns
• Receiver logs on with TO userid, uses it to retrieve messages
• Note that userid must be validated for this to be secure


Prediction

• Party wants to make a choice, seal it into an envelope, and have it opened later and reliably read
• Let party choose string b as that choice. Party also chooses random strings r1 and r2
• Database stores hash(r1 r2 b). r1 and hash(r1 r2 b) are both published for those who want to review results
• When outcome is known, r2 and b (Party’s expected outcome) are released. If the published r1, along with r2 and b, produce the released hash result, then b is the prediction and there was no cheating.
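
The sealed-envelope steps above can be sketched as follows. SHA-256, the "|" separator between r1, r2 and b, and the function names are assumptions; the slide only says hash(r1 r2 b).

```python
import hashlib
import secrets

def commit(r1, r2, b):
    """Hash r1, r2 and the prediction b together."""
    return hashlib.sha256("|".join((r1, r2, b)).encode("utf-8")).hexdigest()

def verify(published_r1, published_hash, revealed_r2, revealed_b):
    """Anyone can recompute the hash once r2 and b are released."""
    return commit(published_r1, revealed_r2, revealed_b) == published_hash

# Sealing: the party picks b and two random strings (hex, so no "|" inside).
r1 = secrets.token_hex(16)
r2 = secrets.token_hex(16)
b = "Team A wins"
published = (r1, commit(r1, r2, b))   # made public immediately

# Opening: r2 and b are released after the outcome is known.
print(verify(published[0], published[1], r2, b))              # True
print(verify(published[0], published[1], r2, "Team B wins"))  # False
```

Because r2 stays secret until the reveal, observers cannot test guesses for b against the published hash in the meantime.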


Bidding

• Need to validate that the person bidding is the same person who made the last bid
• Can require a person to authenticate by presenting h^(n+1)(x) when h^n(x) was presented previously, where h^n(x) means h applied n times to x
• Or for more security go in the other direction
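
One reading of "the other direction" is a hash chain used in reverse: the bidder first presents h applied n times to a secret x, and next time presents h applied n-1 times, which only the holder of x can produce. The sketch below assumes that reading, SHA-256 as h, and made-up names.

```python
import hashlib

def h(data):
    return hashlib.sha256(data).digest()

def chain(secret, n):
    """Apply h to the secret n times."""
    value = secret
    for _ in range(n):
        value = h(value)
    return value

def same_bidder(previous_token, presented_token):
    """Accept if hashing the new token once gives the previous token."""
    return h(presented_token) == previous_token

secret = b"bidder's private value x"   # hypothetical secret
first_bid_token = chain(secret, 10)
second_bid_token = chain(secret, 9)

print(same_bidder(first_bid_token, second_bid_token))      # True
print(same_bidder(first_bid_token, chain(b"intruder", 9)))  # False
```

In the forward direction, anyone who saw the first token could compute the next one simply by hashing it, which is why the reverse direction gives more security.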


Access Control Lists

• Sometimes we need to establish an access control list
• Certain individuals are given access to certain rows
• Create a table with a hashed column from the row and the hashed userid of the user who has access
• Query joins the access control table and the table of data
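
A sketch of the join described above, assuming Python with sqlite3 in place of MySQL. The table names (Records, Acl), column names and sample data are made up for illustration.

```python
import hashlib
import sqlite3

def h(text):
    return hashlib.md5(text.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Records (HashedKey TEXT, Content TEXT)")
conn.execute("CREATE TABLE Acl (HashedKey TEXT, HashedUserID TEXT)")

def add_record(record_key, content):
    conn.execute("INSERT INTO Records VALUES (?, ?)", (h(record_key), content))

def grant(record_key, user_id):
    # The ACL row pairs the hashed row key with the hashed userid.
    conn.execute("INSERT INTO Acl VALUES (?, ?)", (h(record_key), h(user_id)))

def visible_to(user_id):
    rows = conn.execute(
        "SELECT r.Content FROM Records r "
        "JOIN Acl a ON a.HashedKey = r.HashedKey "
        "WHERE a.HashedUserID = ? ORDER BY r.Content",
        (h(user_id),),
    )
    return [content for (content,) in rows]

add_record("case-17", "Notes for case 17")
add_record("case-18", "Notes for case 18")
grant("case-17", "alice")
grant("case-17", "bob")
grant("case-18", "bob")

print(visible_to("alice"))  # ['Notes for case 17']
print(visible_to("bob"))    # ['Notes for case 17', 'Notes for case 18']
```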


Interesting Applications

• Babysitting Exchange: controls access to addresses of clients, schedules of sitters
• Blog site: only I can add to and delete from my own blog
• Store: only I can see my record of sizes that I have purchased
• SaaS: single application used by multiple customers


Quantization

• Another technique of translucency
• Quantization is rounding off
• Can reduce visibility into details by rounding the data

Example: remove home address, keep zip code


Quantization

• Has many variations
• Similar to adding small error to data
• Can project out some dimensions (i.e., exclude one or more columns)


Quantization Examples

• Military location as degrees of latitude and longitude, but not minutes and seconds
• $ amount as number of figures, but no figures given: really logarithmic rounding
• $ as number of commas: also logarithmic rounding
• Rounding creates quanta
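
The quantization examples above can be sketched in a few lines of Python. The function names and the sample coordinates and amounts are made up; whole-dollar integer amounts are assumed.

```python
import math

def degrees_only(latitude, longitude):
    """Keep whole degrees; drop the minutes and seconds (fractional part)."""
    return (math.trunc(latitude), math.trunc(longitude))

def digit_count(amount):
    """Number of figures in a whole-dollar amount."""
    return len(str(abs(int(amount))))

def comma_count(amount):
    """Number of thousands separators the amount would be written with."""
    return (digit_count(amount) - 1) // 3

print(degrees_only(38.8977, -77.0365))  # (38, -77)
print(digit_count(1_250_000))           # 7
print(comma_count(1_250_000))           # 2
```

Every amount from 1,000,000 to 9,999,999 reports the same digit count, so the stored value reveals the order of magnitude and nothing finer.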


Thank You