QUAKE: Quadruple Key and Encryption

Craig A. Mason, Shihfen Tu, Quansheng Song
University of Maine

Centers for Disease Control and Prevention Third Annual National Early Hearing Detection and Intervention Conference, Washington, DC, February 2004.

To Link or Not to Link…

• Data linkage provides a huge opportunity for public health research
  • Integrate large, complex, longitudinal datasets
  • Address questions that cannot be answered any other way
• This was impractical 10 or 15 years ago
• Leads to fears of “Big Brother”
  • Abuse of information
  • Has identifiable information been released by researchers?
• Individual rights versus public good
  • At what point does the public’s right to health trump my right to privacy? (assuming either of these exists)

Strategies for Addressing Concerns

• Legislative
• Procedural
• Educational
• Our focus: Technological
  • Review linkage strategies
  • Review encryption issues

Deterministic Linkage

• A series of common identifying fields is selected across two databases
• Records are matched across databases based on these fields
• Two records must have identical values across all of these fields in order to be linked
  • “John”, “Bartholomew”, “Szapocznik” and “Jon”, “Bartholomew”, “Szapocznik” would not be linked, because “John” and “Jon” differ
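A minimal sketch of deterministic linkage in Python, assuming each record is a dictionary and that the three example names are first, middle, and last name fields (the field names are hypothetical). Records are indexed by the exact tuple of linkage values, so a single differing character blocks the link:

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

def deterministic_link(a: List[Record], b: List[Record],
                       fields: List[str]) -> List[Tuple[Record, Record]]:
    """Return every (a, b) pair whose values are identical on all linkage fields."""
    index: Dict[Tuple[str, ...], List[Record]] = {}
    for rec in b:  # index database b by its exact tuple of field values
        index.setdefault(tuple(rec[f] for f in fields), []).append(rec)
    pairs = []
    for rec in a:
        for match in index.get(tuple(rec[f] for f in fields), []):
            pairs.append((rec, match))
    return pairs

fields = ["first", "middle", "last"]  # hypothetical field names
a = [{"first": "John", "middle": "Bartholomew", "last": "Szapocznik"}]
b = [{"first": "Jon", "middle": "Bartholomew", "last": "Szapocznik"}]
print(len(deterministic_link(a, b, fields)))  # 0: "John" != "Jon", so no link
print(len(deterministic_link(a, a, fields)))  # 1: identical on every field
```

Because the lookup key is the exact tuple, any nickname, typo, or spelling variation prevents a link, which is what motivates probabilistic linkage.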

Probabilistic Linkage

• Two records do not have to match across all fields in order to be linked
• For a possible pairing, a value is calculated that reflects the likelihood that the two records are (or are not) the same person
  • Based upon the frequencies of values and the quality of the data
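The slides do not name a specific method; one common way to compute such a value is a Fellegi–Sunter style match weight, sketched below. The `m` and `u` probabilities and the `birth_date` field are hypothetical, not estimates from any real data: `m` stands in for data quality and `u` for how often values agree by chance.

```python
import math
from typing import Dict, Tuple

# Hypothetical field parameters: (m, u)
#   m = P(field agrees | same person)       -> higher for more reliable fields
#   u = P(field agrees | different people)  -> higher for more common values
PARAMS: Dict[str, Tuple[float, float]] = {
    "first": (0.95, 0.01),
    "last": (0.97, 0.005),
    "birth_date": (0.99, 0.0003),
}

def match_weight(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Sum of log2 likelihood ratios across the fields in PARAMS."""
    total = 0.0
    for field, (m, u) in PARAMS.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement is evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement is evidence against
    return total

x = {"first": "John", "last": "Szapocznik", "birth_date": "1990-05-17"}
y = {"first": "Jon", "last": "Szapocznik", "birth_date": "1990-05-17"}
print(round(match_weight(x, y), 2), round(match_weight(x, x), 2))
```

Unlike the deterministic rule, the John/Jon pair still earns a high positive score. A pair is then typically declared a link, a non-link, or sent for clerical review by comparing the total with thresholds chosen by the analyst.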

Factors Influencing Probabilistic Linkage

• Reliability of data fields
  • Greater reliability results in increased odds of a correct match
  • If a field is pure noise, correct matches will be random
• Frequency of field values
  • The more common the value in a field, the greater the odds that the records will be erroneously matched
  • E.g., a match based on the name Szapocznik is more likely to reflect a correct match than is a match on the name Smith
• Number of matches
  • The greater the number of individuals in one database who also appear in the other database, the greater the probability of linkage across databases
  • If two databases have no individuals in common, the probability of a correct linkage across the databases must be zero
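To make the frequency point concrete, a common refinement (assumed here, not described on the slide) takes `u` for an agreeing value to be that value's relative frequency in the file. The surname counts and the default `m = 0.95` below are invented for illustration:

```python
import math
from collections import Counter
from typing import List

def agreement_weight(value: str, population: List[str], m: float = 0.95) -> float:
    """log2(m / u), taking u as the relative frequency of `value` in the file.
    The value must occur at least once, otherwise u would be zero."""
    u = Counter(population)[value] / len(population)
    return math.log2(m / u)

# Invented surname file: "Smith" is common, "Szapocznik" is rare.
surnames = ["Smith"] * 100 + ["Mason"] * 20 + ["Szapocznik"]
print(agreement_weight("Smith", surnames) < agreement_weight("Szapocznik", surnames))  # True
```

Agreement on the rare surname contributes far more evidence than agreement on the common one.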

Statisticians Anonymous

“I’m David, and I’m a bean-counter”

Encryption

• Ecretsay odecay (pig Latin for “secret code”)
• Information is coded so that true values are not obvious
• Ancient field
• Modern-era focus on electronic transmission of sensitive data
  • Notice the little yellow padlock in the bottom corner of your browser when shopping on eBay?

Encryption Techniques

• Asymmetric or public key
  • Different keys for encryption and decryption
  • Encryption key is public; decryption key is private
  • Decryption key cannot practically be derived from the encryption key
  • Provides security of data transmission
    • Anyone can use the public key to code a message; only I can decrypt it
  • Typically based on the product of large primes
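A toy illustration of public-key encryption, using textbook RSA with tiny primes (61 and 53). RSA is one well-known scheme built on a product of primes; the slide does not say which algorithm it has in mind, and numbers this small offer no security:

```python
import math

p, q = 61, 53                # private primes (toy size)
n = p * q                    # 3233: published as part of the public key
phi = (p - 1) * (q - 1)      # 3120: easy to compute only if you know p and q
e = 17                       # public exponent; must be coprime with phi
assert math.gcd(e, phi) == 1
d = pow(e, -1, phi)          # private exponent via modular inverse (Python 3.8+)

def encrypt(m: int) -> int:
    return pow(m, e, n)      # anyone holding (n, e) can do this

def decrypt(c: int) -> int:
    return pow(c, d, n)      # only the holder of d can do this

c = encrypt(65)
print(d, c, decrypt(c))      # 2753 2790 65
```

Deriving `d` requires `phi`, which in turn requires the factors of `n`; that is why the security rests on factoring being hard.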

Challenge of Factorization

• Factors are hard to find
• But once you know one, the other is easy to find

Public Key: 114,381,625,757,888,867,669,235,779,976,146,612,010,218,296,721,242,362,562,561,842,935,706,935,245,733,897,830,597,123,563,958,705,058,989,075,147,599,290,026,879,543,541

Private Key Based on Factors:

3,490,529,510,847,650,949,147,849,619,903,898,133,417,764,638,493,387,843,990,820,577

and

32,769,132,993,266,709,549,961,988,190,834,461,413,177,642,967,992,942,539,798,288,533
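The numbers on this slide appear to be the RSA-129 challenge modulus and its two prime factors, which were found in 1994. Python's arbitrary-precision integers make the slide's point easy to check: multiplying the factors reproduces the public key, and one division recovers the second factor from the first:

```python
# Digits copied from the slide; commas are stripped before conversion.
public_key = int("114,381,625,757,888,867,669,235,779,976,146,612,010,218,296,721,242,362,"
                 "562,561,842,935,706,935,245,733,897,830,597,123,563,958,705,058,989,075,"
                 "147,599,290,026,879,543,541".replace(",", ""))
factor_a = int("3,490,529,510,847,650,949,147,849,619,903,898,133,417,764,638,493,387,"
               "843,990,820,577".replace(",", ""))
factor_b = int("32,769,132,993,266,709,549,961,988,190,834,461,413,177,642,967,992,942,"
               "539,798,288,533".replace(",", ""))

# Multiplying the factors is instant ...
assert factor_a * factor_b == public_key
# ... and knowing one factor yields the other with a single division.
assert public_key % factor_a == 0
assert public_key // factor_a == factor_b

print(len(str(public_key)), len(str(factor_a)), len(str(factor_b)))  # 129 64 65
```

Going the other way, from the 129-digit product back to its factors, is the hard direction the slide refers to.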

Encryption Techniques

• Symmetric key
  • Same key for encryption and decryption
  • Key is not made public
  • Secret key: One Key to Rule Them All
  • More secure than an asymmetric key of the same length
    • Nothing suggesting a possible key is published
    • Asymmetric key must be 6 to 30 times longer than a symmetric key for equivalent security
  • Useful if you know in advance exactly who will want to encrypt a message to you
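A toy illustration of the symmetric idea: one secret key both encrypts and decrypts. Repeating-key XOR is used only because it fits in a few lines; it is easily broken and stands in for a real cipher such as AES. The key string is hypothetical:

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the repeating secret key; applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

key = b"one key to rule them all"      # hypothetical shared secret
ciphertext = xor_cipher(b"Craig A. Mason", key)
print(xor_cipher(ciphertext, key))    # b'Craig A. Mason'
```

Both parties must already hold the same key, which is why symmetric schemes suit the case where you know in advance who will be sending.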

Encryption Techniques

• Security often described in terms of bits
  • 128-bit encryption indicates $2^{128}$ possible keys
  • 340,282,366,920,938,463,463,374,607,431,768,211,456
  • A lot of possibilities…
• Widespread use of 1024- and 2048-bit encryption on the horizon
  • 128-bit symmetric = 2304-bit asymmetric (Cryptography, p. 166)
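The size of a 128-bit key space can be checked directly. The guessing rate of one billion keys per second is an assumption chosen only to give a sense of scale:

```python
keys = 2 ** 128
print(f"{keys:,}")  # 340,282,366,920,938,463,463,374,607,431,768,211,456

guesses_per_second = 1e9                      # hypothetical attacker speed
seconds_per_year = 365.25 * 24 * 3600
years = keys / guesses_per_second / seconds_per_year
print(f"{years:.2e} years to try every key")  # 1.08e+22 years to try every key
```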

A Dirty Little Secret…

• These big numbers hide the fact that the security is only as good as the algorithm
  • Think of the reliability of DNA testing
• Plaintext attack (and its variations)
  • If the only unique name in the data set is Szapocznik
  • And the only unique value in the encrypted data set is “X*GFfF825d=”…
  • The key can be resolved
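A minimal sketch of this attack, assuming a deliberately weak 1:1 cipher (a Caesar-style letter shift with a secret offset; the cipher and the name list are illustrative, not from the slides). A deterministic cipher preserves how often each value occurs, so the one unique plaintext pairs with the one unique ciphertext, and that single pair is enough to solve for the key:

```python
from collections import Counter
from typing import List

def shift_encrypt(name: str, key: int) -> str:
    """Toy 1:1 cipher: shift each letter A-Z forward by `key` positions (letters only)."""
    return "".join(chr((ord(ch) - ord("A") + key) % 26 + ord("A")) for ch in name.upper())

def only_unique(values: List[str]) -> str:
    """The single value that occurs exactly once (raises if there is not exactly one)."""
    [value] = [v for v, n in Counter(values).items() if n == 1]
    return value

plain_names = ["SMITH", "SMITH", "MASON", "MASON", "SZAPOCZNIK"]
secret_key = 11                                   # unknown to the attacker
encrypted = [shift_encrypt(n, secret_key) for n in plain_names]

# The attacker sees `encrypted` and knows which names are in the data set.
plain_u, cipher_u = only_unique(plain_names), only_unique(encrypted)
recovered_key = (ord(cipher_u[0]) - ord(plain_u[0])) % 26
print(recovered_key)                              # 11
```

Real ciphers do not give up their key from one pair this easily, but the first step, pairing a unique plaintext with a unique ciphertext, works against any 1:1 scheme.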

A Dirty Little Secret…

• Even without the key, you can determine my grade
• Need some computational or physical wall between decrypted and encrypted data

SCREENING DATA
Last Name   First Name
Mason       Craig
Mason       James
Smith       Craig

SCHOOL DATA (names encrypted)
Last Name   First Name   Grade
KLFIP       XCSEA        B+
FDDFO       UIQMB        A-
KLFIP       UIQMB        D-
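The inference on this slide can be reproduced mechanically, assuming the school file encrypts each name field with the same 1:1 cipher and that every value count is distinct, as it is in this small example. Matching values by how often they occur decodes the names without ever learning the key:

```python
from collections import Counter
from typing import Dict, List

screening = [("Mason", "Craig"), ("Mason", "James"), ("Smith", "Craig")]
school = [("KLFIP", "XCSEA", "B+"), ("FDDFO", "UIQMB", "A-"), ("KLFIP", "UIQMB", "D-")]

def match_by_frequency(plain: List[str], coded: List[str]) -> Dict[str, str]:
    """Map each coded value to the plaintext value with the same count.
    Assumes every count is distinct; ties would make the mapping ambiguous."""
    plain_by_count = {count: value for value, count in Counter(plain).items()}
    return {value: plain_by_count[count] for value, count in Counter(coded).items()}

last = match_by_frequency([s[0] for s in screening], [r[0] for r in school])
first = match_by_frequency([s[1] for s in screening], [r[1] for r in school])
decoded = {(last[ln], first[fn]): grade for ln, fn, grade in school}
print(decoded[("Mason", "Craig")])  # D-
```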

One-to-One Encryption

• Identifiers are encrypted into a unique value

Craig  →  Encryption Key: 93812….2431  →  H3~f9(-d

One-to-Many Encryption

• Identifiers are encrypted into one of multiple values
• Lack of uniqueness increases the challenge of decryption

Craig  →  Encryption Key: 93812….2431  →  H3~f9(-d or 9Dj1D[d or dfR1”d/G or …
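The difference between the two slides can be sketched with one reversible cipher used two ways. Python's standard library has no block cipher, so this sketch substitutes a simple stream construction (HMAC-SHA256 in counter mode as a keystream); it is an illustration, not the cipher the authors used. The only difference between the 1:1 and 1:many versions is whether the nonce is fixed or random. The key is hypothetical, and reusing a fixed nonce across records is itself a weakness, which is part of why 1:1 encryption invites the attacks shown earlier:

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16  # bytes of nonce stored in front of each ciphertext

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Pseudorandom bytes: HMAC-SHA256(key, nonce || counter), concatenated."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]

def _encrypt(key: bytes, plaintext: bytes, nonce: bytes) -> bytes:
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:NONCE_LEN], ciphertext[NONCE_LEN:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))

def encrypt_one_to_one(key: bytes, plaintext: bytes) -> bytes:
    # Fixed all-zero nonce: "Craig" always encrypts to the same value.
    return _encrypt(key, plaintext, bytes(NONCE_LEN))

def encrypt_one_to_many(key: bytes, plaintext: bytes) -> bytes:
    # Fresh random nonce: "Craig" encrypts to a different value every time.
    return _encrypt(key, plaintext, secrets.token_bytes(NONCE_LEN))

key = b"hypothetical shared secret"
first, second = encrypt_one_to_many(key, b"Craig"), encrypt_one_to_many(key, b"Craig")
print(first != second, decrypt(key, first), decrypt(key, second))
```

Both versions decrypt with the same key; only the 1:many version hides the fact that two records carry the same name.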

That’s nice, but how can this help with data linkage?

“All right. But apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, the fresh water system, and public health… what have the Romans ever done for us?”

--- Reg, spokesman for the People’s Front of Judea, Monty Python’s Life of Brian

(and Martin White, UC Berkeley)

The Politics of Linkage

• Two data systems contain information on the same individuals
• Would like to link data for public health research

Service Data: Craig A. Mason….   School Data: Craig A. Mason….

The Politics of Linkage

Service Data: Craig A. Mason….   School Data: Craig A. Mason….

• I may not want schools to know about health services I have received

The Politics of Linkage

Service Data: Craig A. Mason….   School Data: Craig A. Mason….

• What solution may allow data to be linked, yet prevent sources from seeing each other’s identifying data?

Quake

QUAdruple Key and Encryption

Service Data: Craig A. Mason…. School Data: Craig A. Mason….

Quake

• Requires algorithms to be reversible
  • You can “undo” a process to come back to the original value

$3 \times 5 = 15;\quad 15 \div 5 = 3$

$3 + 5 = 8;\quad 8 - 5 = 3$

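Quake

• Requires algorithms to be commutative
  • You get the same answer even if you do the problem backwards

$15 = 3 \times 5;\quad 15 = 5 \times 3$

$8 = 3 + 5;\quad 8 = 5 + 3$

Not every operation qualifies; matrix multiplication gives different answers in the two orders:

$$
\begin{pmatrix}4&3\\2&1\end{pmatrix}\begin{pmatrix}8&7\\6&5\end{pmatrix}=\begin{pmatrix}50&43\\22&19\end{pmatrix},
\qquad
\begin{pmatrix}8&7\\6&5\end{pmatrix}\begin{pmatrix}4&3\\2&1\end{pmatrix}=\begin{pmatrix}46&31\\34&23\end{pmatrix}
$$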
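The slides do not name the algorithm QUAKE uses. One well-known operation that is both reversible and commutative is modular exponentiation with a prime modulus (the Pohlig–Hellman exponentiation cipher), swapped in here as a sketch. The Mersenne prime 2^127 - 1 and the sample integer are assumptions chosen for brevity; a real deployment would need carefully chosen parameters and an agreed way of encoding identifiers as integers:

```python
import math
import secrets
from typing import Tuple

P = 2 ** 127 - 1  # a Mersenne prime; all arithmetic is modulo P (toy parameter)

def make_key() -> Tuple[int, int]:
    """Return (enc, dec) with enc * dec == 1 mod (P - 1), so x -> x**enc can be undone."""
    while True:
        enc = secrets.randbelow(P - 3) + 2
        if math.gcd(enc, P - 1) == 1:
            return enc, pow(enc, -1, P - 1)  # modular inverse needs Python 3.8+

def apply(exponent: int, x: int) -> int:
    return pow(x, exponent, P)

a_enc, a_dec = make_key()   # first party's key pair
b_enc, b_dec = make_key()   # second party's key pair
m = 123456789               # hypothetical identifier, already encoded as an integer

ab = apply(b_enc, apply(a_enc, m))  # A encrypts, then B
ba = apply(a_enc, apply(b_enc, m))  # B encrypts, then A
print(ab == ba)                                                            # True
print(apply(a_dec, apply(b_dec, ab)) == m, apply(b_dec, apply(a_dec, ab)) == m)  # True True
```

Both properties follow from $(m^a)^b = m^{ab} = (m^b)^a \bmod P$: the order of encryption does not matter, and the layers can be removed in either order.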

Quake

052385043…9471 757260024…2512

Each provider selects its own unique encryption key, which is used to encrypt identifiers prior to linkage

Service Data: Craig A. Mason…. School Data: Craig A. Mason….

Quake

850258434…3435

052385043…9471

420504763….8372

757260024…2512

Community members representing individuals in each dataset also select their own unique encryption keys

Service Data: Craig A. Mason…. School Data: Craig A. Mason….

Quake

850258434…3435

052385043…9471

420504763….8372

757260024…2512

Hidden Key: 342002330…2852 Hidden Key: 147742268…0042

The encryption keys for the community representatives and the providers are entered separately, and the combined keys are hidden from the users

Service Data: Craig A. Mason…. School Data: Craig A. Mason….

Quake

850258434…3435

052385043…9471

420504763….8372

757260024…2512

Service Data: *Bj&!33t…. School Data: yy#K66….

These combined encryption keys are used to encrypt identifiers in each file prior to linkage

Hidden Key: 342002330…2852 Hidden Key: 147742268…0042

Quake

850258434…3435

052385043…9471

420504763….8372

757260024…2512

Service Data: *Bj&!33t…. School Data: yy#K66….

Symmetric key with 1:many encryption

Hidden Key: 342002330…2852 Hidden Key: 147742268…0042

Quake

850258434…3435

052385043…9471

420504763….8372

757260024…2512

Service Data: *Bj&!33t…. School Data: yy#K66….

The combined encryption keys are not stored, so neither party can decrypt on its own

Hidden Key: 342002330…2852 Hidden Key: 147742268…0042

Illustration of Security

Rep Key: 3

Provider Key: 7

Hidden Combined Key: 21

To see why, consider the following simple keys:

• Service provider key: 7
• Community representative key: 3
• Combined key: 3 × 7 = 21

Simple message to encrypt: “A”

Simple encryption algorithm:

• Each letter has a value 1-26, repeating: “A” = 1, “Z” = 26, “A” = 27, …
• Multiply that value by the encryption key to obtain the new value

Illustration of Security

Rep Key: 3

Provider Key: 7

Hidden Combined Key: 21

Once encrypted, “A” becomes “U”: 1 × 21 = 21, and “U” is the letter with value 21

Original Message: A

Encrypted Message: U

Illustration of Security

Rep Key: 3

Provider Key: 7

Hidden Combined Key: 21

If the community representative applied their key to the encrypted message, they would see “G”

21 ÷ 3 = 7; “G” is the letter with value 7

Encrypted Message: U

Partially Decrypted Message: G

Illustration of Security

Rep Key: 3

Service Provider Key: 7

Hidden Combined Key: 21

If the service provider applied their key to the encrypted message, they would see “C”

21 ÷ 7 = 3; “C” is the letter with value 3

Encrypted Message: U

Partially Decrypted Message: C

Illustration of Security

Rep Key: 3

Service Provider Key: 7

Hidden Combined Key: 21

Only by working together can the message be decrypted

Encrypted Message: U

Fully Decrypted Message: A

Partially Decrypted Message: G
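The toy scheme from these slides can be written out directly. One assumption differs from the slides: instead of dividing, decryption multiplies by the key's modular inverse mod 26. This gives the same answers here and keeps working once values wrap past “Z”; it requires keys that share no factor with 26, as 3, 7, and 21 do:

```python
def letter_value(ch: str) -> int:
    """'A'=1 ... 'Z'=26, reduced mod 26 so that the values repeat ('Z' becomes 0)."""
    return (ord(ch) - ord("A") + 1) % 26

def value_letter(v: int) -> str:
    v %= 26
    return "Z" if v == 0 else chr(ord("A") + v - 1)

def encrypt(msg: str, key: int) -> str:
    return "".join(value_letter(letter_value(ch) * key) for ch in msg)

def decrypt(msg: str, key: int) -> str:
    inv = pow(key, -1, 26)  # stands in for the slide's division; needs gcd(key, 26) == 1
    return "".join(value_letter(letter_value(ch) * inv) for ch in msg)

rep_key, provider_key = 3, 7
combined = rep_key * provider_key        # 21, the hidden combined key
cipher = encrypt("A", combined)
print(cipher,
      decrypt(cipher, rep_key),                           # representative alone
      decrypt(cipher, provider_key),                      # provider alone
      decrypt(decrypt(cipher, rep_key), provider_key))    # both together
# U G C A
```

Each party alone recovers only a partially decrypted letter; applying both keys, in either order, recovers the original.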

Quake

850258434…3435

052385043…9471

420504763….8372

757260024…2512

Service Data: *Bj&!33t…. School Data: yy#K66….

Once each dataset is encrypted, there are several possible methods for linking

Hidden Key: 342002330…2852 Hidden Key: 147742268…0042

Linking Encrypted Files

• Simple approach
  • Bring both encrypted files together on an independent, non-networked machine
  • Each of the four parties enters its own key
  • Respective files are internally decrypted and linked
  • A new, de-identified linked file containing the fields of interest is created
  • Record of identifiers and keys is electronically or physically erased (DoD 5220.22-M protocol)

Linking Encrypted Files

• Benefits
  • Flexible linkage strategies (partial names, etc.)
  • Easiest to perform
  • Once completed, no identifiers remain to enable a plaintext attack
• Issues
  • Process of encryption/decryption can be computationally demanding
  • Potential record of encrypted data and all keys
    • Can be destroyed, but time-consuming

Variation of Quake

Key: 052385043…9471   Key: 757260024…2512

Service Data: Craig A. Mason   School Data: Craig A. Mason

• Each provider selects its own unique encryption key, used to encrypt identifiers prior to linkage

Variation

Key: 052385043…9471   Key: 757260024…2512

Service Data: *Bj&!33t….   School Data: yy#K66….

• Each provider’s identifiers are encrypted with a 1:1 symmetric key

Variation

Key: 052385043…9471   Key: 757260024…2512

Service Data: *Bj&!33t….   School Data: yy#K66….

• Parties then switch encrypted files
• If identifying fields in both files are all equal…
  • May be prone to variations of a plaintext attack
  • Inclusion of additional records whose identifiers contain random noise can nearly eliminate this risk
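A minimal sketch of padding a file with noise records before it is encrypted and handed over, assuming identifiers are uppercase strings; the helper name `add_decoys`, the decoy length, and the alphabet are hypothetical choices. Each decoy is a random, almost certainly unique value, so an attacker looking for “the only unique value” now finds many candidates:

```python
import secrets
import string
from collections import Counter
from typing import List

def add_decoys(identifiers: List[str], n_decoys: int, length: int = 10) -> List[str]:
    """Append random-noise identifiers and shuffle so decoys cannot be found by position."""
    decoys = ["".join(secrets.choice(string.ascii_uppercase) for _ in range(length))
              for _ in range(n_decoys)]
    padded = identifiers + decoys
    secrets.SystemRandom().shuffle(padded)
    return padded

names = ["SMITH", "SMITH", "MASON", "MASON", "SZAPOCZNIK"]
padded = add_decoys(names, n_decoys=50)
singletons = [v for v, n in Counter(padded).items() if n == 1]
print(len(padded), len(singletons))
```

Decoys should match nothing in the other file, so they drop out of the linked result on their own.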

Variation

Key: 052385043…9471   Key: 757260024…2512

Service Data: Jf*72Coo….   School Data: Jf*72Coo….

• Each party then applies its own key to the other party’s already-encrypted file
• Identifiers in each file will have the same value
• Cannot determine the key used by the other source
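A sketch of this double-encryption step, substituting commutative modular exponentiation (a Pohlig–Hellman style cipher) for whatever algorithm the authors used. The toy prime, the SHA-256 encoding of names into integers, and the sample names are all assumptions. Because the two encryptions commute, a person present in both files ends up with the same doubly encrypted value, while neither party ever holds the other's key:

```python
import hashlib
import math
import secrets
from typing import List

P = 2 ** 127 - 1  # toy prime modulus agreed on by both parties

def make_key() -> int:
    """Random exponent coprime with P - 1, so encryption is a permutation mod P."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k

def encode(identifier: str) -> int:
    """Hypothetical encoding of a normalized identifier as an integer mod P."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P

def encrypt_all(values: List[int], key: int) -> List[int]:
    return [pow(v, key, P) for v in values]

service_key, school_key = make_key(), make_key()  # each key stays with its owner

# Step 1: each party encrypts its own identifiers with its own key.
service_once = encrypt_all([encode(n) for n in ["CRAIG A MASON", "JAMES MASON"]], service_key)
school_once = encrypt_all([encode(n) for n in ["CRAIG A MASON", "CRAIG SMITH"]], school_key)

# Step 2: the parties swap files and each applies its key to the other's file.
service_file_twice = encrypt_all(service_once, school_key)
school_file_twice = encrypt_all(school_once, service_key)

# The person in both files now has the same doubly encrypted value.
common = set(service_file_twice) & set(school_file_twice)
print(len(common))  # 1
```

Because identifiers are hashed before encryption, the doubly encrypted values are meant only for matching, never for decryption back to names.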

Variation

Key: 052385043…9471   Key: 757260024…2512

Service Data: Jf*72Coo….   School Data: Jf*72Coo….

• If the files are brought together by one of the parties
  • They may be able to conduct a plaintext attack
  • They may then be able to determine the key used by the other party
• Both files are linked by a trusted third party

Variation

Key: 052385043…9471   Key: 757260024…2512

Service Data: Jf*72Coo….   School Data: Jf*72Coo….

• Again, may bring in community representatives

Linked Data: Jf*72Coo, Services, Grades

Final Linked Data: Services, Grades

Variation

• Link based upon the encrypted identifier fields
  • No need to decrypt files when linking
  • Apply deterministic and probabilistic algorithms to encrypted data
• No machine ever sees all keys
• Final file contains no identifiers and only a limited number of fields of interest
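A minimal sketch of the trusted third party's step, assuming each file arrives as a list of rows whose doubly encrypted identifier sits in a hypothetical `enc_id` column. Only exact (deterministic) matching is shown. The output keeps the fields of interest and drops the encrypted identifier, mirroring the move from “Linked Data: Jf*72Coo, Services, Grades” to “Final Linked Data: Services, Grades”. Apart from “Jf*72Coo”, the row values are invented:

```python
from typing import Dict, List

Row = Dict[str, str]

def third_party_link(service: List[Row], school: List[Row], keep: List[str]) -> List[Row]:
    """Exact-match join on the encrypted identifier, keeping only fields of interest."""
    school_by_id = {row["enc_id"]: row for row in school}  # assumes unique enc_id values
    final = []
    for row in service:
        other = school_by_id.get(row["enc_id"])
        if other is not None:
            linked = {**row, **other}                   # enc_id, services, grade
            final.append({f: linked[f] for f in keep})  # enc_id is dropped here
    return final

service = [{"enc_id": "Jf*72Coo", "services": "service record"}]
school = [{"enc_id": "Jf*72Coo", "grade": "B+"}, {"enc_id": "Qz!09aa", "grade": "A-"}]
print(third_party_link(service, school, ["services", "grade"]))
# [{'services': 'service record', 'grade': 'B+'}]
```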

Variation of Quake

• Issues
  • Requires a 1:1 encryption algorithm
    • Can be addressed, but adds a level of complexity
  • Cannot examine partial strings
    • Specific partial strings can be generated prior to encryption
      • Month of birth, day of birth
      • First letter of first name
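A sketch of generating partial strings before encryption, assuming a birth date in YYYY-MM-DD form. HMAC-SHA256 with one hypothetical key stands in for the 1:1 encryption; in the two-key variation each token would go through the double-encryption step instead. Even though the full names differ, the partial tokens still agree and can feed a probabilistic match. Note that low-variety tokens such as a first initial are especially exposed to frequency attacks:

```python
import hashlib
import hmac
from typing import Dict

def partial_tokens(first: str, last: str, birth_date: str) -> Dict[str, str]:
    """Derive partial fields before encryption; birth_date is assumed to be YYYY-MM-DD."""
    _, month, day = birth_date.split("-")
    return {
        "full_name": f"{first.upper()} {last.upper()}",
        "last_name": last.upper(),
        "first_initial": first[:1].upper(),
        "birth_month": month,
        "birth_day": day,
    }

def encode_tokens(tokens: Dict[str, str], key: bytes) -> Dict[str, str]:
    """1:1 keyed encoding of each token; the field name is mixed in so that
    equal strings in different fields (month "05" vs day "05") encode differently."""
    return {
        field: hmac.new(key, f"{field}:{value}".encode("utf-8"), hashlib.sha256).hexdigest()
        for field, value in tokens.items()
    }

key = b"hypothetical linkage key"
a = encode_tokens(partial_tokens("John", "Szapocznik", "1990-05-17"), key)
b = encode_tokens(partial_tokens("Jon", "Szapocznik", "1990-05-17"), key)
agree = [field for field in a if a[field] == b[field]]
print(agree)  # ['last_name', 'first_initial', 'birth_month', 'birth_day']
```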

Advanced Linkage Protocols for Addressing Confidentiality Concerns

Encrypted Linkage Protocols

• Unique encryption keys administered by each database administrator and community liaisons
• No one at any time sees the other person’s identifiers
• Person conducting the linkage never sees any identifiers
• Resulting linked set includes no decrypted identifiers
• Resulting file cannot be decoded, expanded, or relinked without the agreement and cooperation of all parties
• The community participates in the process

Technology that creates confidentiality concerns may provide means for reducing those concerns