On method-specific record linkage for risk assessment
Jordi Nin, Javier Herranz, Vicenç Torra
Contents
Disclosure Risk Scenario: How an intruder re-identifies an individual
Preliminaries: Protection methods and record linkage
Location record linkage: A new way to compute the disclosure risk
Conclusions and future work
Disclosure Risk Scenario

Attribute classification
[Figure: data matrix X, with dimensions labeled n and a]
Identifiers: passport number
Quasi-identifiers: age, postal code
Confidential: income

id     Sex     Marital status     Income
1      Male    Single             13.500
2      Male    Single             11.000
...    ...     ...                ...
Re-identification scenario
X = id || X_nc || X_c
X' = X'_nc || X_c
(X_nc: non-confidential quasi-identifier attributes; X_c: confidential attributes; X': the protected data set)
Privacy is ensured: quasi-identifiers are anonymized
Data quality is preserved: confidential attributes are kept unchanged
Record Linkage
Data set 1: records described by attributes X1, X2, X3, X4
Data set 2: records described by attributes X'1, X'2, X'3, X'4
Problem: find a correct mapping between the records of data file 1 and data file 2
Distance-based record linkage
• The nearest pairs of records are considered linked pairs
• It is very easy to tune
• Results depend strongly on the parameters
• Moderate time cost

Probabilistic record linkage
• Linked pairs are computed using conditional probabilities
• Tuning is difficult
• Few parameters
• High time cost
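The distance-based variant can be illustrated with a minimal Python sketch. It assumes Euclidean distance on unstandardized numeric attributes and assumes that record i of the protected file was derived from record i of the original file, so a link is "correct" when the indices match. The function names are illustrative and do not come from the slides.

```python
import math

def distance_based_linkage(original, protected):
    """Link each protected record to its nearest original record.

    Returns a list where entry t is the index of the original record
    closest (Euclidean distance) to protected record t.
    """
    links = []
    for p_rec in protected:
        nearest = min(range(len(original)),
                      key=lambda i: math.dist(original[i], p_rec))
        links.append(nearest)
    return links

def count_correct_links(links):
    """Count links that point back to the record with the same index.

    Assumption: protected record t was generated from original record t.
    """
    return sum(1 for t, i in enumerate(links) if t == i)

# Toy data, made up for illustration only.
original = [(1.0, 10.0), (5.0, 20.0), (9.0, 30.0)]
protected = [(1.5, 11.0), (4.0, 19.0), (8.0, 33.0)]

links = distance_based_linkage(original, protected)
print(links, count_correct_links(links))
```

Every protected record is compared with every original record, which is where the cost of the standard approach comes from.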
Preliminaries

Rank swapping - p
Algorithm
  For each attribute attr_j, where 1 ≤ j ≤ n
    Sort the values of attr_j
    Swap each value x_ij with a value x_lj, where i < l ≤ i + p
    Undo the sorting of attr_j
  End for
End algorithm
Properties:
• Simple
• Preserves µ and σ
• All combinations disappear
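A minimal Python sketch of the rank swapping steps for a single attribute follows. It assumes that p is a percentage of the number of records, that each value takes part in at most one swap, and that the swap partner is drawn uniformly at random from the allowed window. The name rank_swap and the rng parameter are illustrative, not taken from the slides.

```python
import random

def rank_swap(values, p, rng=None):
    """Rank-swap one attribute.

    values: list of numeric values (one per record).
    p: swap range as a percentage of the number of records (assumption).
    rng: optional random.Random instance, for reproducibility.
    """
    rng = rng or random.Random()
    n = len(values)
    window = max(1, int(p * n / 100))

    # Sort: order[r] is the record index holding the value of rank r.
    order = sorted(range(n), key=lambda i: values[i])
    ranked = [values[i] for i in order]

    # Swap each not-yet-swapped rank r with a random rank l, r < l <= r + window.
    swapped = [False] * n
    for r in range(n):
        if swapped[r]:
            continue
        candidates = [l for l in range(r + 1, min(n, r + window + 1))
                      if not swapped[l]]
        if not candidates:
            continue
        l = rng.choice(candidates)
        ranked[r], ranked[l] = ranked[l], ranked[r]
        swapped[r] = swapped[l] = True

    # Undo the sorting so each record gets its (possibly swapped) value back.
    result = [None] * n
    for r, i in enumerate(order):
        result[i] = ranked[r]
    return result

values = [8, 6, 10, 7, 9, 2, 1, 4, 5, 3]
protected = rank_swap(values, p=20, rng=random.Random(0))
print(protected)
```

Because the output is a permutation of the input, the mean and variance of the attribute are unchanged; applying the function to each attribute independently is what breaks the original combinations of values within a record.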
Rank swapping - p example
p = 20%
Attribute values: 8 6 10 7 9 2 1 4 5 3
Sorted values: 1 2 3 4 5 6 7 8 9 10
[Figure: swapping within the sorted values, image not available]
Microaggregation - ka
[Figure: records grouped into clusters of k records over a attributes, example with k = 3]
a = 1: optimal
a > 1: NP-hard, heuristic
Optimal univariate Microaggregation
Result 1. When the elements are sorted according to an attribute, the elements in each cluster of any optimal partition are contiguous (the clusters do not overlap).
Result 2. All clusters of any optimal partition have between k and 2k-1 elements.
[Figure: graph over the sorted elements x1, x2, x3, x4 with k = 2]
Clusters are built using the nodes of the shortest path algorithm
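Results 1 and 2 can be turned into a short Python sketch. Instead of building the graph explicitly, it uses a dynamic program over the sorted elements, which is equivalent to a shortest path in a directed acyclic graph whose arcs are the admissible clusters. The within-cluster sum of squared errors as the arc cost is an assumption, and the function name is illustrative.

```python
def optimal_univariate_microaggregation(values, k):
    """Replace each value by the mean of its cluster in an optimal partition.

    Clusters are contiguous in sorted order (Result 1) and have between
    k and 2k-1 elements (Result 2). Cost: sum of squared errors (assumption).
    """
    n = len(values)
    if n < k:
        raise ValueError("need at least k values")
    order = sorted(range(n), key=lambda i: values[i])
    xs = [values[i] for i in order]

    # Prefix sums give the cost of any contiguous cluster in O(1).
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def sse(i, j):
        """Sum of squared errors of xs[i:j] around its mean."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: minimal cost of partitioning xs[:j]
    prev = [-1] * (n + 1)    # prev[j]: start of the last cluster in that partition
    best[0] = 0.0
    for j in range(k, n + 1):
        for size in range(k, min(2 * k - 1, j) + 1):
            i = j - size
            if best[i] + sse(i, j) < best[j]:
                best[j] = best[i] + sse(i, j)
                prev[j] = i

    # Walk the shortest path backwards and assign cluster means.
    result = [None] * n
    j = n
    while j > 0:
        i = prev[j]
        mean = (s1[j] - s1[i]) / (j - i)
        for r in range(i, j):
            result[order[r]] = mean
        j = i
    return result

print(optimal_univariate_microaggregation([10, 1, 11, 2], k=2))
```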
MDAV Microaggregation
MDAV is a multivariate heuristic microaggregation method
[Figure: original data set X and protected data set X' for k = 2]
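The slides only state that MDAV is a multivariate heuristic. The Python sketch below follows the commonly described MDAV procedure: take the record farthest from the centroid, then the record farthest from that one, and group each with its nearest neighbors into a cluster of k records. It is not taken from the slides; Euclidean distance on unstandardized attributes and all function names are assumptions.

```python
import math

def centroid(records):
    """Component-wise mean of a non-empty list of numeric tuples."""
    n = len(records)
    return tuple(sum(col) / n for col in zip(*records))

def mdav(records, k):
    """Return (protected_records, clusters) for a list of numeric tuples."""
    remaining = list(range(len(records)))
    clusters = []

    def farthest_from(point):
        return max(remaining, key=lambda i: math.dist(records[i], point))

    def take_cluster(center_idx):
        # The k remaining records closest to the chosen record form a cluster.
        nearest = sorted(
            remaining,
            key=lambda i: math.dist(records[i], records[center_idx]))[:k]
        for i in nearest:
            remaining.remove(i)
        clusters.append(nearest)

    while len(remaining) >= 3 * k:
        c = centroid([records[i] for i in remaining])
        r = farthest_from(c)           # most distant record from the centroid
        s = farthest_from(records[r])  # most distant record from r
        take_cluster(r)
        take_cluster(s)
    if len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        take_cluster(farthest_from(c))
    if remaining:
        clusters.append(list(remaining))  # leftover records form the last cluster

    # Replace every record by the centroid of its cluster.
    protected = [None] * len(records)
    for cluster in clusters:
        c = centroid([records[i] for i in cluster])
        for i in cluster:
            protected[i] = c
    return protected, clusters

records = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5), (5, 6)]
protected, clusters = mdav(records, k=2)
print(clusters)
```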
Score: Protection method evaluation
Score = 0.5 IL + 0.5 DR
IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)
IL1 = mean absolute error
IL2 = mean variation of the averages
IL3 = mean variation of the variances
IL4 = mean variation of the covariances
IL5 = mean variation of the correlations
DR = 0.25 DLD + 0.25 PLD + 0.5 ID
DLD = number of links using distance-based record linkage (DBRL)
PLD = number of links using probabilistic record linkage (PRL)
ID = protected values near the original values
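The score formulas translate directly into code. The Python sketch below only combines the measures; computing IL1 to IL5, DLD, PLD and ID themselves is out of scope. It assumes that the IL components are fractions and the DR components are percentages, which the slides do not state, and the input numbers are made up for illustration.

```python
def information_loss(il1, il2, il3, il4, il5):
    """IL = 100 (0.2 IL1 + 0.2 IL2 + 0.2 IL3 + 0.2 IL4 + 0.2 IL5)."""
    return 100 * (0.2 * il1 + 0.2 * il2 + 0.2 * il3 + 0.2 * il4 + 0.2 * il5)

def disclosure_risk(dld, pld, id_):
    """DR = 0.25 DLD + 0.25 PLD + 0.5 ID."""
    return 0.25 * dld + 0.25 * pld + 0.5 * id_

def score(il, dr):
    """Score = 0.5 IL + 0.5 DR (lower is better for both components)."""
    return 0.5 * il + 0.5 * dr

# Hypothetical measurements, not results from the presentation.
il = information_loss(0.1, 0.2, 0.3, 0.2, 0.2)
dr = disclosure_risk(dld=10.0, pld=20.0, id_=40.0)
print(score(il, dr))
```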
Location Problem Description

L-RL: Location Record Linkage
Standard record linkage compares every record of one file with every record of the other
Rank swapping, univariate microaggregation and some other methods use only some of the original records to create the protected data set
It is therefore unnecessary to compare all the records
Location Record Linkage

Method Description
[Figure: linkage between Xext and X', image not available]
Example: Rank swapping
p = 20%
[Figure: example values 17, 6, 13, 14, 16, 19, 12, 5, 16 with an axis labeled "Distance", image not available]
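One possible reading of the location idea for rank swapping is sketched below in Python. It is an assumption, not the authors' implementation: an intruder who knows p keeps, for each protected record, only the original records whose rank lies within the swap window on every attribute, and then links to the nearest of those candidates. The sketch assumes distinct values per attribute and p given as a percentage of the number of records; all names are illustrative.

```python
import math

def ranks(column):
    """Rank of each value in its column (0 = smallest); assumes distinct values."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    rank = [0] * len(column)
    for r, i in enumerate(order):
        rank[i] = r
    return rank

def candidate_records(original, protected, p):
    """For each protected record, list the original records that could match it."""
    n, n_attrs = len(original), len(original[0])
    window = max(1, int(p * n / 100))
    orig_ranks = [ranks([rec[j] for rec in original]) for j in range(n_attrs)]
    prot_ranks = [ranks([rec[j] for rec in protected]) for j in range(n_attrs)]
    return [
        [i for i in range(n)
         if all(abs(orig_ranks[j][i] - prot_ranks[j][t]) <= window
                for j in range(n_attrs))]
        for t in range(n)
    ]

def location_linkage(original, protected, p):
    """Link each protected record to the nearest record among its candidates."""
    links = []
    for t, cands in enumerate(candidate_records(original, protected, p)):
        if cands:
            links.append(min(
                cands, key=lambda i: math.dist(original[i], protected[t])))
        else:
            links.append(None)
    return links

# Toy data: protected was built by hand, swapping values at most one rank apart.
original = [(1, 50), (2, 40), (3, 30), (4, 20), (5, 10)]
protected = [(2, 40), (1, 50), (4, 20), (3, 30), (5, 10)]

cands = candidate_records(original, protected, p=20)
print(sum(len(c) for c in cands), "comparisons instead of", len(original) ** 2)
```

The true source record always survives the filter, because rank swapping moves a value by at most the window size, while the number of distance computations drops below the full pairwise comparison.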
Rank Swapping Experiments
Data sets:
Census (1080 records, 13 attributes)
EIA (4092 records, 10 attributes)
Rank swapping configurations:
p = 2 … 20
Score modifications:
DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID
(LLD = number of links using L-RL, by analogy with DLD and PLD)
L-RL: Rank Swapping Linkage Results
[Figure: plot not available]
L-RL: Rank Swapping Score Results
[Figure: plot not available]
Univariate Microaggregation Experiments
Data sets:
Census (1080 records, 13 attributes)
EIA (4092 records, 10 attributes)
Univariate microaggregation configurations:
k = 10 … 50
Score modifications:
DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID
L-RL: Univariate Microaggregation Linkage Results
[Figure: plot not available]
L-RL: Univariate Microaggregation Score Results
[Figure: plot not available]
MDAV Experiments
Data sets:
Census (1080 records, 13 attributes)
EIA (4092 records, 10 attributes)
MDAV configurations:
k = 10 … 50
Score modifications:
DR = 0.166 LLD + 0.166 DLD + 0.166 PLD + 0.5 ID
L-RL: MDAV Linkage Results
[Figure: plot not available]
L-RL: MDAV Score Results
[Figure: plot not available]
Conclusions and future work

Conclusions
• We have presented a new type of record linkage designed to exploit the limitations of some protection methods
• The L-RL method obtains a more accurate DR evaluation for rank swapping and univariate microaggregation
• MDAV is immune to the location problem
Future work
• We plan to study the DR of MDAV and other protection methods using other ad hoc methods