On method-specific record linkage for risk assessment
Jordi Nin, Javier Herranz, Vicenç Torra

Page 1: On method-specific record linkage for risk assessment

On method-specific record linkage for risk assessment

Jordi Nin, Javier Herranz, Vicenç Torra

Page 2: On method-specific record linkage for risk assessment

Contents

• Disclosure Risk Scenario: How an intruder re-identifies an individual
• Preliminaries: Protection methods and record linkage
• Location record linkage: A new way to compute the disclosure risk
• Conclusions and future work

Page 3: On method-specific record linkage for risk assessment


Disclosure Risk Scenario

Preliminaries

Location Record Linkage

Conclusions and future work

Page 4: On method-specific record linkage for risk assessment

Disclosure Risk Scenario

[Figure: data matrix X with dimensions labelled n and a]

Attribute classification

• Identifiers: Passport number
• Quasi-identifiers: Age, postal code
• Confidential: Income

id    Sex    Marital status    Income
1     Male   Single            13.500
2     Male   Single            11.000
...   ...

Page 5: On method-specific record linkage for risk assessment

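Disclosure Risk Scenario

Re-identification scenario

X = id || X_nc || X_c
X' = X'_nc || X_c

Here id are the identifiers, X_nc the non-confidential (quasi-identifier) attributes and X_c the confidential attributes. Only X_nc is modified by the protection method.

• Privacy is ensured: quasi-identifiers are anonymized
• Data quality is preserved: confidential attributes are released unchanged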

Page 6: On method-specific record linkage for risk assessment

Disclosure Risk Scenario

Record Linkage

[Figure: Data set 1, with records over attributes X1 X2 X3 X4, and Data set 2, with records over attributes X'1 X'2 X'3 X'4]

Problem: find the correct mapping between the records of data set 1 and the records of data set 2.

Page 7: On method-specific record linkage for risk assessment

Disclosure Risk Scenario

Distance-based record linkage

• The nearest pairs of records are considered linked pairs
• Very easy to tune
• Results depend heavily on the parameters
• Moderate time cost

Probabilistic record linkage

• Linked pairs are computed using conditional probabilities
• Tuning is difficult
• Few parameters
• High time cost
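The distance-based variant can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the Euclidean distance and the standardization step are all assumptions of this sketch, and it presumes that record i of the protected file corresponds to record i of the original file.

```python
import numpy as np


def distance_based_linkage(original, protected):
    """For each protected record, return the index of the nearest original record."""
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    # Standardize both files with the original file's statistics so that no
    # attribute dominates the distance (an assumption of this sketch).
    mean = original.mean(axis=0)
    std = original.std(axis=0)
    std[std == 0] = 1.0
    o = (original - mean) / std
    p = (protected - mean) / std
    # Pairwise squared Euclidean distances, shape (n_protected, n_original).
    d2 = ((p[:, None, :] - o[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def count_correct_links(links):
    """Count links where protected record i points back to original record i."""
    links = np.asarray(links)
    return int((links == np.arange(len(links))).sum())
```

The number of correct links, divided by the number of records, gives the kind of re-identification rate used later as a disclosure risk component.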

Page 8: On method-specific record linkage for risk assessment


Disclosure Risk Scenario

Preliminaries

Location Record Linkage

Conclusions and future work

Page 9: On method-specific record linkage for risk assessment

Preliminaries

Rank swapping - p

Algorithm
  For each attribute attr_j, 1 ≤ j ≤ n
    Sort attr_j
    Swap each value x_ij with a value x_lj, where i < l ≤ i + p
    Undo the sorting of attr_j
  End for
End algorithm

• Simple
• Preserves µ and σ
• All original value combinations disappear
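The algorithm above can be sketched as follows. This is a hedged illustration: the function name `rank_swap`, the interpretation of p as a percentage of the number of records, the uniform random choice of the swap partner and the rule that a value is swapped at most once are assumptions of this sketch rather than details given on the slide.

```python
import numpy as np


def rank_swap(data, p, seed=None):
    """Rank swapping sketch; p is the swap window as a percentage of records."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_records, n_attrs = data.shape
    window = max(1, int(round(n_records * p / 100.0)))
    protected = data.copy()
    for j in range(n_attrs):
        order = np.argsort(data[:, j], kind="stable")  # sort attr_j
        values = data[order, j]
        swapped = np.zeros(n_records, dtype=bool)
        for i in range(n_records):
            if swapped[i]:
                continue
            # Candidate partners l with i < l <= i + window, not yet swapped.
            hi = min(i + window, n_records - 1)
            candidates = [l for l in range(i + 1, hi + 1) if not swapped[l]]
            if not candidates:
                continue
            l = int(rng.choice(candidates))
            values[i], values[l] = values[l], values[i]
            swapped[i] = swapped[l] = True
        protected[order, j] = values  # undo the sorting of attr_j
    return protected
```

Because values are only permuted within each attribute, every marginal distribution (and hence each mean and variance) is unchanged, while the link between attributes of the same record is broken.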

Page 10: On method-specific record linkage for risk assessment

Preliminaries

Rank swapping - p example

p = 20%

[Figure: a column of ten values (8, 6, 10, 7, 9, 2, 1, 4, 5, 3) shown alongside the sequence 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Page 11: On method-specific record linkage for risk assessment

Preliminaries

Microaggregation - k, a

• a = 1 (univariate): an optimal solution can be computed
• a > 1 (multivariate): NP-hard, so heuristic methods are used

[Figure: microaggregation example with k = 3]

Page 12: On method-specific record linkage for risk assessment

Preliminaries

Optimal univariate Microaggregation

Result 1. When the elements are sorted according to an attribute, for any optimal partition, the elements in each cluster are contiguous (non-overlapping clusters exist).

Result 2. All clusters of any optimal partition have between k and 2k-1 elements.

[Figure: example with elements x1, x2, x3, x4 and k = 2]

Clusters are built using the nodes of the path returned by the shortest-path algorithm.
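Results 1 and 2 reduce the problem to choosing cut points in the sorted values, with every cluster holding between k and 2k-1 elements. The sketch below solves that with dynamic programming over prefixes, which is equivalent to a shortest path in a graph whose edges are the admissible clusters. The function name, the use of within-cluster squared error as the edge cost and the replacement of each value by its cluster mean are assumptions of this sketch.

```python
import numpy as np


def optimal_univariate_microaggregation(values, k):
    """Replace each value by the mean of its cluster in a minimum-SSE partition."""
    values = np.asarray(values, dtype=float)
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    order = np.argsort(values, kind="stable")
    x = values[order]
    # Prefix sums give the within-cluster SSE of x[a:b] in constant time.
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def sse(a, b):
        return (s2[b] - s2[a]) - (s1[b] - s1[a]) ** 2 / (b - a)

    # best[b] = cost of the best partition of the first b sorted values.
    best = np.full(n + 1, np.inf)
    prev = np.zeros(n + 1, dtype=int)
    best[0] = 0.0
    for b in range(k, n + 1):
        # The last cluster x[a:b] must have between k and 2k-1 elements.
        for a in range(max(0, b - 2 * k + 1), b - k + 1):
            cost = best[a] + sse(a, b)
            if cost < best[b]:
                best[b], prev[b] = cost, a
    # Walk back along the shortest path to recover the clusters.
    protected = np.empty(n)
    b = n
    while b > 0:
        a = prev[b]
        protected[order[a:b]] = x[a:b].mean()
        b = a
    return protected
```

For example, with k = 2 the values 1, 2, 10, 11, 20, 21 are grouped into three pairs and each value is replaced by its pair's mean.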

Page 13: On method-specific record linkage for risk assessment

Preliminaries

MDAV Microaggregation

MDAV is a multivariate heuristic microaggregation method.

[Figure: original data set X and protected data set X' for k = 2]
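The slide does not spell out the MDAV steps. The sketch below follows the procedure as it is commonly described in the microaggregation literature (repeatedly build one cluster around the record farthest from the centroid and one around the record farthest from that one); treat it as an assumption, not as the authors' exact variant. The function name `mdav` and the use of squared Euclidean distance are choices of this sketch.

```python
import numpy as np


def mdav(data, k):
    """MDAV-style sketch: replace each record by the centroid of its cluster."""
    data = np.asarray(data, dtype=float)
    remaining = list(range(len(data)))
    clusters = []

    def farthest_from(point, idx):
        d = ((data[idx] - point) ** 2).sum(axis=1)
        return idx[int(d.argmax())]

    def take_cluster(center_idx, idx):
        # The k remaining records nearest to the chosen center record.
        d = ((data[idx] - data[center_idx]) ** 2).sum(axis=1)
        chosen = [idx[i] for i in np.argsort(d, kind="stable")[:k]]
        return chosen, [i for i in idx if i not in chosen]

    while len(remaining) >= 3 * k:
        centroid = data[remaining].mean(axis=0)
        r = farthest_from(centroid, remaining)
        s = farthest_from(data[r], remaining)
        cluster_r, remaining = take_cluster(r, remaining)
        clusters.append(cluster_r)
        cluster_s, remaining = take_cluster(s, remaining)
        clusters.append(cluster_s)
    if len(remaining) >= 2 * k:
        centroid = data[remaining].mean(axis=0)
        r = farthest_from(centroid, remaining)
        cluster_r, remaining = take_cluster(r, remaining)
        clusters.append(cluster_r)
    if remaining:
        clusters.append(remaining)  # the leftover records form the last cluster

    protected = data.copy()
    for c in clusters:
        protected[c] = data[c].mean(axis=0)
    return protected
```

Unlike the univariate case, all attributes of a record are aggregated together, so every protected record is a centroid of at least k original records.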

Page 14: On method-specific record linkage for risk assessment

Preliminaries

Score: Protection method evaluation

Score = 0.5 IL + 0.5 DR

IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)

• IL1 = mean absolute error
• IL2 = mean variation of the averages
• IL3 = mean variation of the variances
• IL4 = mean variation of the covariances
• IL5 = mean variation of the correlations

DR = 0.25 DLD + 0.25 PLD + 0.5 ID

• DLD = number of links using distance-based record linkage (DBRL)
• PLD = number of links using probabilistic record linkage (PRL)
• ID = interval disclosure (protected values close to the original ones)
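The score is a plain weighted sum, which the following sketch makes explicit. The function names are hypothetical, and the units are an assumption of this sketch: IL1 to IL5 are taken as fractions, while DLD, PLD and ID are taken as percentages so that IL and DR live on the same 0 to 100 scale.

```python
def information_loss(il1, il2, il3, il4, il5):
    """IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)."""
    return 100 * (0.2 * il1 + 0.2 * il2 + 0.2 * il3 + 0.2 * il4 + 0.2 * il5)


def disclosure_risk(dld, pld, interval_disclosure):
    """DR = 0.25 DLD + 0.25 PLD + 0.5 ID."""
    return 0.25 * dld + 0.25 * pld + 0.5 * interval_disclosure


def score(il, dr):
    """Score = 0.5 IL + 0.5 DR; lower values mean a better trade-off."""
    return 0.5 * il + 0.5 * dr
```

For instance, with all five IL components at 0.1 and DLD = 20, PLD = 10, ID = 40, IL is 10, DR is 27.5 and the score is 18.75.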

Page 15: On method-specific record linkage for risk assessment


Disclosure Risk Scenario

Preliminaries

Location Record Linkage

Conclusions and future work

Page 16: On method-specific record linkage for risk assessment

Location Problem Description

L-RL: Location Record Linkage

• Standard record linkage compares all records
• Rank swapping, univariate microaggregation and other methods build each protected value from only a limited set of original records
• It is therefore unnecessary to compare all the records

Page 17: On method-specific record linkage for risk assessment

Location record linkage

Method Description

[Figure: data sets Xext and X']

Page 18: On method-specific record linkage for risk assessment

Location record linkage

Example: Rank swapping

p = 20%

[Figure: rank swapping example showing the values 17, 6, 13, 14, 16, 19, 12, 5, 16 and a label "Distance"]
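One way to read the location idea for rank swapping is as a candidate-set intersection: a protected value can only have come from an original record whose rank in that attribute lies within the swap window, so the intruder keeps, per attribute, only those records and intersects the sets. The sketch below is an illustration under that reading, not the authors' published algorithm; the function name `rank_swapping_lrl`, the window computation and the assumption that the intruder knows p are all hypothetical.

```python
import numpy as np


def rank_swapping_lrl(original, protected, p):
    """Return, for each protected record, the set of compatible original records."""
    original = np.asarray(original, dtype=float)
    protected = np.asarray(protected, dtype=float)
    n, n_attrs = original.shape
    window = max(1, int(round(n * p / 100.0)))
    # order[t, j] = index of the original record at rank t of attribute j.
    order = np.argsort(original, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(original, order, axis=0)
    links = []
    for rec in protected:
        candidates = None
        for j in range(n_attrs):
            col = sorted_vals[:, j]
            # Range of ranks at which the protected value occurs (handles ties).
            lo = int(np.searchsorted(col, rec[j], side="left"))
            hi = int(np.searchsorted(col, rec[j], side="right")) - 1
            lo_w, hi_w = max(0, lo - window), min(n - 1, hi + window)
            cand_j = set(order[lo_w:hi_w + 1, j].tolist())
            candidates = cand_j if candidates is None else candidates & cand_j
        links.append(candidates)
    return links
```

A protected record whose candidate set contains a single original record can be counted as linked; larger sets still shrink the search space compared with comparing all records.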

Page 19: On method-specific record linkage for risk assessment

Location record linkage

Rank Swapping Experiments

Data sets:
• Census (1080 records, 13 attributes)
• EIA (4092 records, 10 attributes)

Rank swapping configurations: p = 2 … 20

Score modifications:
DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID, where LLD is the number of links using L-RL

Page 20: On method-specific record linkage for risk assessment

Location record linkage

L-RL: Rank Swapping Linkage Results

[Figure: rank swapping linkage results]

Page 21: On method-specific record linkage for risk assessment

Location record linkage

L-RL: Rank Swapping Score Results

[Figure: rank swapping score results]

Page 22: On method-specific record linkage for risk assessment

Location record linkage

Univariate Microaggregation Experiments

Data sets:
• Census (1080 records, 13 attributes)
• EIA (4092 records, 10 attributes)

Univariate microaggregation configurations: k = 10 … 50

Score modifications:
DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID, where LLD is the number of links using L-RL

Page 23: On method-specific record linkage for risk assessment

Location record linkage

L-RL: Univariate Microaggregation Linkage Results

[Figure: univariate microaggregation linkage results]

Page 24: On method-specific record linkage for risk assessment

Location record linkage

L-RL: Univariate Microaggregation Score Results

[Figure: univariate microaggregation score results]

Page 25: On method-specific record linkage for risk assessment

Location record linkage

MDAV Experiments

Data sets:
• Census (1080 records, 13 attributes)
• EIA (4092 records, 10 attributes)

MDAV configurations: k = 10 … 50

Score modifications:
DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID, where LLD is the number of links using L-RL

Page 26: On method-specific record linkage for risk assessment

Location record linkage

L-RL: MDAV Linkage Results

[Figure: MDAV linkage results]

Page 27: On method-specific record linkage for risk assessment

Location record linkage

L-RL: MDAV Score Results

[Figure: MDAV score results]

Page 28: On method-specific record linkage for risk assessment


Disclosure Risk Scenario

Preliminaries

Location Problem Description

Location Record Linkage

Conclusions and future work

Page 29: On method-specific record linkage for risk assessment

Conclusions and future work

Conclusions

• We have presented a new type of record linkage designed to exploit the limitations of some protection methods
• The L-RL method obtains a more accurate DR evaluation for rank swapping and univariate microaggregation
• MDAV is immune to the location problem

Future work

• We plan to study the DR of MDAV and other protection methods using other ad hoc methods
