An Experimental Study of Association Rule Hiding Techniques Emmanuel Pontikakis* [email protected] Dept. of Computer Engineering and Informatics

An Experimental Study of Association Rule Hiding Techniques

Emmanuel Pontikakis*
[email protected] (ceid.upatras.gr)
Dept. of Computer Engineering and Informatics
University of Patras
Patra, Greece

Vassilios Verykios*
[email protected] (cti.gr)
Dept. of Computer and Communication Engineering
University of Thessaly
Volos, Greece

*Computer Technology Institute
Research Unit 3
Athens, Greece

Outline

Introduction - Related Work
Distortion-based Techniques
Blocking-based Techniques
Comparison and Analysis
Conclusions

Introduction

[Diagram with the labels: Database, User, Data Mining, Association Rules, Hide Sensitive Rules, Changed Database]

Related Work

Association Rule Hiding:
Blocking-based technique (Saygin, Verykios, Clifton)
Distortion-based (sanitization) technique (Oliveira, Zaiane, Verykios, Dasseni)


Distortion-based Techniques

Sample Database

A B C D
1 1 1 0
1 0 1 1
0 0 0 1
1 1 1 0
1 0 1 1

Rule A→C has:
Support(A→C) = 80%
Confidence(A→C) = 100%

Distorted Database (after the distortion algorithm; C is changed from 1 to 0 in the second and fifth transactions)

A B C D
1 1 1 0
1 0 0 1
0 0 0 1
1 1 1 0
1 0 0 1

Rule A→C has now:
Support(A→C) = 40%
Confidence(A→C) = 50%
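The support and confidence figures quoted for the two databases can be checked with a few lines of Python. This is a minimal sketch: the helper names `support` and `confidence` and the dict-based transaction layout are illustrative choices, not part of the presentation.

```python
ITEMS = ["A", "B", "C", "D"]
ROWS = [(1, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)]
sample = [dict(zip(ITEMS, row)) for row in ROWS]

def support(db, items):
    """Fraction of transactions in which every item in `items` equals 1."""
    return sum(all(t[i] == 1 for i in items) for t in db) / len(db)

def confidence(db, lhs, rhs):
    """support(lhs and rhs together) divided by support(lhs)."""
    return support(db, lhs + rhs) / support(db, lhs)

# Distortion: change C from 1 to 0 in the second and fifth transactions.
distorted = [dict(t) for t in sample]
for idx in (1, 4):
    distorted[idx]["C"] = 0

print(support(sample, ["A", "C"]), confidence(sample, ["A"], ["C"]))        # 0.8 1.0
print(support(distorted, ["A", "C"]), confidence(distorted, ["A"], ["C"]))  # 0.4 0.5
```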

Side Effects

Before Hiding Process | After Hiding Process | Side Effect
Rule R_i had conf(R_i) > MCT | Rule R_i now has conf(R_i) < MCT | Rule Eliminated (Undesirable Side Effect)
Rule R_i had conf(R_i) < MCT | Rule R_i now has conf(R_i) > MCT | Ghost Rule (Undesirable Side Effect)
Large itemset I had sup(I) > MST | Itemset I now has sup(I) < MST | Itemset Eliminated (Undesirable Side Effect)
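The three cases can be written as two small predicates, where MCT is the minimum confidence threshold and MST the minimum support threshold. The function names and return strings are hypothetical, and values exactly equal to a threshold are treated as "none" because the table only uses strict inequalities.

```python
def rule_side_effect(conf_before, conf_after, mct):
    """Classify what the hiding process did to a non-sensitive rule."""
    if conf_before > mct and conf_after < mct:
        return "rule eliminated"
    if conf_before < mct and conf_after > mct:
        return "ghost rule"
    return "none"

def itemset_side_effect(sup_before, sup_after, mst):
    """Classify what the hiding process did to a large itemset."""
    if sup_before > mst and sup_after < mst:
        return "itemset eliminated"
    return "none"

# Illustrative thresholds (assumed, not taken from the experiments).
print(rule_side_effect(0.9, 0.5, mct=0.7))     # rule eliminated
print(rule_side_effect(0.6, 0.8, mct=0.7))     # ghost rule
print(itemset_side_effect(0.8, 0.4, mst=0.5))  # itemset eliminated
```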

Distortion-based Techniques

Challenges/Goals:

To minimize the undesirable side effects that the hiding process causes to non-sensitive rules.

To minimize the number of 1’s that must be deleted from the database.

The algorithms must run in time linear in the size of the database.

Our Proposal: Weight-based Sorting Distortion Algorithm (WSDA)

High Level Description:

Input:
Initial Database
Set of Sensitive Rules
Safety Margin (for example 10%)

Output:
Sanitized Database
Sensitive Rules no longer hold in the Database

WSDA Algorithm

High Level Description:

1st step:
Retrieve the set of transactions that support the sensitive rule R_S.
For each sensitive rule R_S, find the number N_1 of transactions in which one item that supports the rule will be deleted.

2nd step:
For each rule R_i in the database that has items in common with R_S, compute a weight w that denotes how strong R_i is.
For each transaction that supports R_S, compute a priority P_i that denotes how many strong rules this transaction supports.

3rd step:
Sort the transactions that support R_S in ascending order of their priority value P_i.

4th step:
For the first N_1 transactions, hide an item that is contained in R_S.

5th step:
Update the confidence and support values of the other rules in the database.

Experimental Results of WSDA algorithm

[Figure: Itemsets remaining unaffected in the database vs. safety margin (10%, 20%, 40%, 60%); series: 1.b, WSDA]

[Figure: Rules changed (%) in the database vs. safety margin (10%, 20%, 40%, 60%); series: 1.b, WSDA]

Experimental Results of WSDA algorithm

[Figure: Time in secs vs. database transactions (2500, 5000, 7500, 10000); series: 1.b, WSDA; average number of items per transaction: 13/50]

[Figure: Time in secs vs. database transactions (2500, 5000, 7500, 10000); series: 1.b, WSDA; average number of items per transaction: 20/50]


Quality of Data

Sometimes it is dangerous to delete items from the database (e.g. in medical databases), because the false data may create undesirable effects.

So, we have to hide the rules in the database by adding uncertainty without distorting the database.

Blocking-based Techniques

Initial Database

A B C D
1 1 1 0
1 0 1 1
0 0 0 1
1 1 1 0
1 0 1 1

New Database (after the blocking algorithm)

A B C D
1 1 1 0
1 0 ? 1
? 0 0 1
1 1 1 0
1 0 1 1

Support and confidence become marginal.

In the New Database: 60% ≤ conf(A → C) ≤ 100%

Modification of Association Rule Definition

The confidence and support of a rule A→B become marginal:

sup(A→B) ∈ [minsup(A→B), maxsup(A→B)]

conf(A→B) ∈ [minconf(A→B), maxconf(A→B)]

\[ \mathrm{minsup}(A \to B) = \frac{|(A=1) \wedge (B=1)|}{|D|} \]

\[ \mathrm{maxsup}(A \to B) = \frac{|(A=1) \wedge (B=1)| + |(A=1) \wedge (B=?)| + |(A=?) \wedge (B=1)| + |(A=?) \wedge (B=?)|}{|D|} \]

\[ \mathrm{minconf}(A \to B) = \frac{|(A=1) \wedge (B=1)|}{|A=1| + |(A=?) \wedge (B=0)| + |(A=?) \wedge (B=?)|} \]

\[ \mathrm{maxconf}(A \to B) = \frac{|(A=1) \wedge (B=1)| + |(A=1) \wedge (B=?)| + |(A=?) \wedge (B=1)| + |(A=?) \wedge (B=?)|}{|A=1| + |(A=?) \wedge (B=1)| + |(A=?) \wedge (B=?)|} \]
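As a check on these definitions, the interval bounds can be computed for the five-transaction blocking example, where the slides state 60% ≤ conf(A → C) ≤ 100%. This is a sketch for single items A and B rather than itemsets; the helper names and the use of the string "?" for blocked values are assumptions made for illustration.

```python
ITEMS = ["A", "B", "C", "D"]
ROWS = [(1, 1, 1, 0), (1, 0, "?", 1), ("?", 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)]
blocked = [dict(zip(ITEMS, row)) for row in ROWS]

def count(db, a, b, va, vb):
    """Number of transactions with item a equal to va and item b equal to vb."""
    return sum(t[a] == va and t[b] == vb for t in db)

def support_interval(db, a, b):
    """(minsup, maxsup) of the rule a -> b."""
    n11 = count(db, a, b, 1, 1)
    possible = n11 + count(db, a, b, 1, "?") + count(db, a, b, "?", 1) + count(db, a, b, "?", "?")
    return n11 / len(db), possible / len(db)

def confidence_interval(db, a, b):
    """(minconf, maxconf) of the rule a -> b."""
    n11 = count(db, a, b, 1, 1)
    a1 = sum(t[a] == 1 for t in db)
    min_den = a1 + count(db, a, b, "?", 0) + count(db, a, b, "?", "?")
    max_num = n11 + count(db, a, b, 1, "?") + count(db, a, b, "?", 1) + count(db, a, b, "?", "?")
    max_den = a1 + count(db, a, b, "?", 1) + count(db, a, b, "?", "?")
    lo = n11 / min_den if min_den else 0.0
    hi = max_num / max_den if max_den else 0.0
    return lo, hi

print(support_interval(blocked, "A", "C"))     # (0.6, 0.8)
print(confidence_interval(blocked, "A", "C"))  # (0.6, 1.0)
```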

Negative Border Rules Set (NBRS) Definition

When a rule R has either

sup(R) > MST AND conf(R) < MCT

OR

sup(R) < MST AND conf(R) > MCT,

then we say that R belongs to NBRS.
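The membership condition translates directly into a predicate. The function name `in_nbrs` and the threshold values in the example are hypothetical.

```python
def in_nbrs(sup, conf, mst, mct):
    """True if a rule with the given support and confidence lies in the NBRS:
    it passes exactly one of the two thresholds and fails the other."""
    return (sup > mst and conf < mct) or (sup < mst and conf > mct)

print(in_nbrs(sup=0.6, conf=0.5, mst=0.4, mct=0.7))  # True: frequent but not confident
print(in_nbrs(sup=0.3, conf=0.9, mst=0.4, mct=0.7))  # True: confident but not frequent
print(in_nbrs(sup=0.6, conf=0.9, mst=0.4, mct=0.7))  # False: the rule already holds
```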

Side Effects Definition Modification in Blocking-based Techniques

Before Hiding Process | After Hiding Process | Side Effect
Rule R_i had conf(R_i) > MCT | Rule R_i now has minconf(R_i) < MCT | Rule Eliminated (Undesirable Side Effect)
Rule R_i had conf(R_i) < MCT | Rule R_i now has maxconf(R_i) > MCT | Ghost Rule (Desirable Side Effect)
Large itemset I had sup(I) > MST | Itemset I now has minsup(I) < MST | Itemset Eliminated (Undesirable Side Effect)
Itemset I had sup(I) < MST | Itemset I now has maxsup(I) > MST | Ghost Itemset (Desirable Side Effect)
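Under blocking, the "after" side of each test uses an interval bound instead of a single value. A minimal sketch of the four cases follows; the function names, return strings, and the threshold of 0.7 are illustrative assumptions.

```python
def blocking_rule_effect(conf_before, minconf_after, maxconf_after, mct):
    """Side effect on a rule, using the interval [minconf, maxconf] after blocking."""
    if conf_before > mct and minconf_after < mct:
        return "rule eliminated (undesirable)"
    if conf_before < mct and maxconf_after > mct:
        return "ghost rule (desirable)"
    return "none"

def blocking_itemset_effect(sup_before, minsup_after, maxsup_after, mst):
    """Side effect on an itemset, using the interval [minsup, maxsup] after blocking."""
    if sup_before > mst and minsup_after < mst:
        return "itemset eliminated (undesirable)"
    if sup_before < mst and maxsup_after > mst:
        return "ghost itemset (desirable)"
    return "none"

# A rule whose confidence was 100% and whose interval is now [60%, 100%].
print(blocking_rule_effect(1.0, 0.6, 1.0, mct=0.7))      # rule eliminated (undesirable)
print(blocking_itemset_effect(0.3, 0.3, 0.5, mst=0.4))   # ghost itemset (desirable)
```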

Privacy Breaches Definitions

If an item i, some values of which are hidden by ?’s, is contained in a sensitive rule, a privacy breach occurs if the adversary can infer this with c% confidence.

For a rule R with maxconf(R) > MCT, a privacy breach occurs if it can be estimated, with c% confidence, that R is either a sensitive rule or a ghost rule.

For a blocked item i in a specific transaction T, a privacy breach occurs if the adversary can estimate with c% confidence whether its original value was 0 or 1.

Blocking-Based Techniques

Goals that an algorithm has to achieve:

To insert a relatively small number of ?’s and significantly reduce the confidence of sensitive rules.

To minimize the undesirable side effects (rules and itemsets lost) by selecting the items in the appropriate transactions to change, and to maximize the desirable side effects.

To modify the database in a way that an adversary cannot recover the original values of the database.

Our Proposal: Blocking Algorithm (BA)

High Level Description

1st step:
For each sensitive rule R_S (with left itemset I_L and right itemset I_R), compute how many 0’s and 1’s must be blocked in order to reduce the confidence of R_S.

2nd step:
Find the set of transactions T_R that support R_S and the set of transactions T_LpR’ that partially support R_S (they partially support the left itemset and do not support the right itemset).
For each transaction in T_R, find the rules R_common that have at least one item in common with I_R; for each transaction in T_LpR’, find the rules R’_common ∈ NBRS that have at least one item in common with I_L.
Assign a weight w to each R_common and a weight w’ to each R’_common.
Assign a priority value P_T to each transaction T in T_R such that P_T is large if T has many R_common rules with large w, and a priority value P_T’ to each transaction T’ in T_LpR’ such that P_T’ is small if T’ has many R’_common rules with large w’.

3rd step:
Sort the transactions T ∈ T_R starting from those with the lowest P_T, and sort the transactions T’ ∈ T_LpR’ starting from those with the highest P_T’.

4th step:
For the first N_1 sorted T ∈ T_R, block an item i ∈ I_R, and for the first N_0 sorted T’ ∈ T_LpR’, block an item i ∈ I_L.

5th step:
Update the values minconf(R_i) and minsup(R_i) for all other rules that have been affected.
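A heavily simplified sketch of steps 2 to 5 for a rule with a single-item antecedent and consequent is given below. Several things are assumptions, because the slides leave them open: N_1 and N_0 are passed in rather than computed, T_LpR’ is taken to be the transactions where both items are 0, and the priorities are supplied by the caller as plain functions instead of being derived from rule weights. The names `blocking_sketch`, `prio_r`, and `prio_l` are hypothetical.

```python
def min_conf(db, a, c):
    """minconf(a -> c) for single items; '?' marks a blocked value."""
    n11 = sum(t[a] == 1 and t[c] == 1 for t in db)
    den = sum(t[a] == 1 for t in db) + sum(t[a] == "?" and t[c] in (0, "?") for t in db)
    return n11 / den if den else 0.0

def blocking_sketch(db, a, c, n1, n0, prio_r=lambda t: 0, prio_l=lambda t: 0):
    """Block n1 ones of item c and n0 zeros of item a; returns a modified copy."""
    db = [dict(t) for t in db]
    t_r = [t for t in db if t[a] == 1 and t[c] == 1]  # transactions supporting a -> c
    t_l = [t for t in db if t[a] == 0 and t[c] == 0]  # ASSUMED stand-in for T_LpR'
    t_r.sort(key=prio_r)                 # step 3: lowest priority first
    t_l.sort(key=prio_l, reverse=True)   # step 3: highest priority first
    for t in t_r[:n1]:
        t[c] = "?"                       # step 4: block a 1 on the right-hand side
    for t in t_l[:n0]:
        t[a] = "?"                       # step 4: block a 0 on the left-hand side
    return db                            # step 5: recompute bounds with min_conf()

ITEMS = ["A", "B", "C", "D"]
ROWS = [(1, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)]
sample = [dict(zip(ITEMS, row)) for row in ROWS]

# Hypothetical priority: transactions containing B support more rules.
new_db = blocking_sketch(sample, "A", "C", n1=1, n0=1, prio_r=lambda t: t["B"])
print([(t["A"], t["C"]) for t in new_db])  # [(1, 1), (1, '?'), ('?', 0), (1, 1), (1, 1)]
print(min_conf(new_db, "A", "C"))          # 0.6
```

Blocking a 1 in the consequent lowers the numerator of minconf, while blocking a 0 in the antecedent raises its denominator, so both kinds of ? reduce the minimum confidence of the sensitive rule.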

Blocking-Based Techniques

Main Problems of the blocking technique:

1. The maximum confidence of a sensitive rule cannot be reduced.

2. If the blocking algorithm does not add enough uncertainty to the database, an adversary who applies a smart inference technique can infer the hidden values.

3. Both 0’s and 1’s must be hidden, because if only 1’s were hidden the adversary would simply replace all the ?’s with 1’s and easily restore the initial database.

4. Many ?’s must be inserted if we don’t want an adversary to infer the hidden data.
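Problem 3 can be demonstrated on the five-transaction sample database. In this sketch the helper names `block` and `naive_restore` and the choice of blocked cells are illustrative assumptions.

```python
ITEMS = ["A", "B", "C", "D"]
ROWS = [(1, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)]
original = [dict(zip(ITEMS, row)) for row in ROWS]

def block(db, cells):
    """Return a copy of db with the given (row index, item) cells replaced by '?'."""
    out = [dict(t) for t in db]
    for row, item in cells:
        out[row][item] = "?"
    return out

def naive_restore(db):
    """Adversary's guess: every '?' was originally a 1."""
    return [{k: (1 if v == "?" else v) for k, v in t.items()} for t in db]

only_ones = block(original, [(1, "C"), (4, "A")])  # both cells originally held 1
mixed = block(original, [(1, "C"), (2, "A")])      # cell (2, "A") originally held 0

print(naive_restore(only_ones) == original)  # True: the database is fully recovered
print(naive_restore(mixed) == original)      # False: the blocked 0 defeats the guess
```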

Experimental Results of Blocking Algorithm

[Figure: Large itemsets remaining after the hiding process; series: BA, CRA]

[Figure: Rules changed (%) after the process; series: BA, CRA]

Experimental Results of Blocking Algorithm (2)

[Figure: Time in secs vs. database transactions (2500, 5000, 7500, 10000); series: BA, CRA; databases with an average of 20 items per transaction]

[Figure: Time in secs vs. database transactions (2500, 5000, 7500, 10000); series: BA, CRA; databases with an average of 13 items per transaction]

Experimental Results of Blocking Algorithm (3)

[Figure: Rules changed when the proportion of 0’s to 1’s (0:1) is varied over 3:1, 2:1, 1:1, 1:2, 1:3]

[Figure: Decision tree experiments, misclassified items (%)]



Comparison and Analysis

Criterion | Distortion-based Techniques | Blocking-based Techniques
Privacy Breaches | No privacy breaches | Many kinds of privacy breaches
Simplicity of algorithms | Simpler | More complicated
Database Modification | Database contains false information | Many ?’s must be inserted in the Database



Conclusions

There are open research problems in the Blocking Technique:

A) What techniques must be used in order to reduce the privacy breaches?

B) In what other ways can we prevent an adversary from inferring the association rules in the database?

C) Applying a chi-square test to the final database may reveal some correlations between the items.
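For point C, a chi-square statistic for a pair of binary items can be computed from the 2x2 contingency table of their values. The sketch below uses the standard 2x2 formula; skipping transactions that contain a ? for either item is an assumption, as is the function name `chi_square_2x2`. It is run on the unmodified five-transaction sample database, since the small blocked example leaves too few fully known rows.

```python
def chi_square_2x2(db, a, b):
    """Chi-square statistic for items a and b, ignoring transactions with a '?'."""
    rows = [t for t in db if t[a] != "?" and t[b] != "?"]
    n = len(rows)
    n11 = sum(t[a] == 1 and t[b] == 1 for t in rows)
    n10 = sum(t[a] == 1 and t[b] == 0 for t in rows)
    n01 = sum(t[a] == 0 and t[b] == 1 for t in rows)
    n00 = sum(t[a] == 0 and t[b] == 0 for t in rows)
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0  # a row or column of the table is empty; the test is undefined
    return n * (n11 * n00 - n10 * n01) ** 2 / denom

ITEMS = ["A", "B", "C", "D"]
ROWS = [(1, 1, 1, 0), (1, 0, 1, 1), (0, 0, 0, 1), (1, 1, 1, 0), (1, 0, 1, 1)]
sample = [dict(zip(ITEMS, row)) for row in ROWS]

print(chi_square_2x2(sample, "A", "C"))  # 5.0
```

A and C always take the same value in the sample, so the statistic reaches its maximum for five transactions, which equals the number of transactions.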

References

Alexandre Evfimievski, Ramakrishnan Srikant, Rakesh Agrawal, and Johannes Gehrke, Privacy Preserving Mining of Association Rules, SIGKDD 2002, Edmonton, Alberta, Canada.

Murat Kantarcioglu and Chris Clifton, Privacy Preserving Distributed Mining of Association Rules on Horizontally Partitioned Data, In Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2002), 24–31.

Jaideep Vaidya and Chris Clifton, Privacy Preserving Association Rule Mining in Vertically Partitioned Data, In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2002), 639–644.

Stanley R. M. Oliveira and Osmar R. Zaïane, Algorithms for Balancing Privacy and Knowledge Discovery in Association Rule Mining, In Proc. of the Seventh International Database Engineering & Applications Symposium (IDEAS'03), pp. 54-63, Hong Kong, July 16-18, 2003.

Yucel Saygin, Vassilios Verykios, and Chris Clifton, Using Unknowns to Prevent Discovery of Association Rules, SIGMOD Record 30 (2001), no. 4, 45–54.

Vassilios S. Verykios, Ahmed K. Elmagarmid, Elisa Bertino, Yucel Saygin, and Elena Dasseni, Association Rule Hiding, IEEE Transactions on Knowledge and Data Engineering (2003).
