Privacy-Preserving Schema Reuse Nguyen Quoc Viet Hung, Do Son Thanh, Nguyen Thanh Tam, and Karl Aberer EPFL, Switzerland

Page 1: Privacy-Preserving Schema Reuse

Privacy-Preserving Schema Reuse

Nguyen Quoc Viet Hung, Do Son Thanh, Nguyen Thanh Tam, and Karl Aberer
EPFL, Switzerland

Page 2: Privacy-Preserving Schema Reuse

DASFAA | 04.2014 | DASFAA Security, privacy & trust

Schema Reuse

[Figure: two schema-reuse workflows over schema repositories (schema.org, factual.com), each with Query, Output, and Contribute steps.]

• Traditional approach: shows all original schemas
• Our approach: shows an anonymized (unified) schema

Page 3: Privacy-Preserving Schema Reuse

Motivation

• Schema reuse offers many benefits:
– Reduce development complexity:
  • New schemas require only small modifications → copy and adapt existing schemas
  • Large repositories exist: schema.org, freebase.com, factual.com, niem.gov
– Increase interoperability:
  • Share a common standard
• But privacy needs to be considered:
– Leaked schema information → potential attacks (e.g. SQL injection)
– Maintain competitiveness: some parts of schemas are the source of revenue and business strategy.

Page 4: Privacy-Preserving Schema Reuse

Challenges

• How to define privacy constraints?
• How to define an anonymized schema from multiple schemas?
• How to define a utility function for a certain anonymized schema?
• How to find an anonymized schema that satisfies privacy constraints and maximizes the utility function?

[Figure: our approach shows an anonymized (unified) schema. Labels: Contributors, Privacy constraints, Query, Anonymized Schema.]

Page 5: Privacy-Preserving Schema Reuse

Challenge 1 – Define privacy constraints

• Need to identify two elements:
– Sensitive information
  • Attributes
– Privacy requirement
  • Prevent leaking the provenance of sensitive attributes
• Use a presence constraint:

A presence constraint $\gamma$ is a triple $\langle s, D, \theta \rangle$, where $s$ is a schema, $D$ is a set of attributes, and $\theta$ is a specified threshold. An anonymized schema $\hat{S}$ satisfies the presence constraint $\gamma$ if $\Pr(D \in s \mid \hat{S}) \le \theta$.
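A presence constraint can be held in a small record type. The sketch below is only an illustration: `PresenceConstraint`, `satisfies`, and `toy_probability` are hypothetical names, the slides do not say how the presence probability is computed, and comparing it to the threshold with `<=` is an assumption about the direction of the constraint.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PresenceConstraint:
    schema: str                  # name of the contributing schema
    attributes: FrozenSet[str]   # sensitive attributes of that schema
    threshold: float             # maximum tolerated presence probability


def satisfies(anonymized, constraint, presence_probability):
    """Return True if the estimated probability stays within the threshold.

    `presence_probability` is a hypothetical callback; the slides do not
    specify how this probability is derived.
    """
    return presence_probability(anonymized, constraint) <= constraint.threshold


def toy_probability(anonymized, constraint):
    # Toy estimator (assumption, not the authors' definition): the fraction
    # of sensitive attributes visible anywhere in the anonymized schema.
    visible = set().union(*anonymized) if anonymized else set()
    return len(constraint.attributes & visible) / len(constraint.attributes)


constraint = PresenceConstraint("s1", frozenset({"CC", "Name"}), 0.5)
partial = [frozenset({"Name", "Holder"})]
full = [frozenset({"Name", "Holder"}), frozenset({"CC", "Num"})]
```

Passing the estimator as a parameter keeps the record type independent of whichever probability model is actually used.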

Page 6: Privacy-Preserving Schema Reuse

Challenge 2 – Define anonymized schema

• How to define an "anonymized schema" given a set of schemas?
– Enough information to understand, but not overwhelming
• An anonymized schema contains a set of "abstract" attributes
– An abstract attribute is a set of similar attributes

[Figure: original schemas with attributes Name, Num, CC, and Holder are merged into an anonymized schema with the abstract attributes {Name, Holder} and {CC, Num}.]
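The grouping of similar attributes into abstract attributes can be sketched with a small union-find. This is an illustrative sketch only: the function name `build_abstract_attributes` and the idea of passing attribute correspondences as pairs are assumptions, not something the slides specify.

```python
def build_abstract_attributes(attributes, correspondences):
    """Group attributes linked by correspondences into abstract attributes."""
    parent = {a: a for a in attributes}

    def find(a):
        # Follow parent links to the representative, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for left, right in correspondences:
        parent[find(left)] = find(right)

    groups = {}
    for a in attributes:
        groups.setdefault(find(a), set()).add(a)
    # Sort by member names so the result is deterministic.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


abstract = build_abstract_attributes(
    ["Name", "Holder", "CC", "Num"],
    [("Name", "Holder"), ("CC", "Num")],
)
```

With the attribute names from the figure, the two correspondences produce the abstract attributes {CC, Num} and {Name, Holder}.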

Page 7: Privacy-Preserving Schema Reuse

Challenge 3 – Define utility function

• How to define a utility function for a certain "anonymized schema"?
– Importance: sum of the popularity of attributes
  • A schema that contains more popular attributes is better
  • An attribute that appears in more schemas is more popular
– Completeness: number of abstract attributes
  • The more abstract attributes, the better

[Figure: original schemas S1, S2, S3 and candidate anonymized schemas with abstract attributes {Name, Holder}, {Holder}, {CC, Num} and {Holder}, {CC}; which candidate is better?]

Let Σ be the set of all possible anonymized schemas. The utility function $u: \Sigma \to \mathbb{R}$ measures the amount of information of each anonymized schema.

Utility function: combines Importance and Completeness.
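The two ingredients of the utility function are easy to compute on a toy example. In the sketch below, the schema contents for S1 to S3, the function names, and in particular the weighted sum with weight `w` are assumptions made for illustration; the slides name the two components but their exact combination is not recoverable here.

```python
from collections import Counter


def popularity(schemas):
    """Count in how many schemas each attribute name appears."""
    counts = Counter()
    for attrs in schemas.values():
        counts.update(set(attrs))
    return counts


def importance(anonymized, counts):
    # Sum of the popularity of every attribute kept in the anonymized schema.
    return sum(counts[a] for group in anonymized for a in group)


def completeness(anonymized):
    # Number of abstract attributes.
    return len(anonymized)


def utility(anonymized, counts, w=0.5):
    # Assumption: weighted sum with a hypothetical weight w in [0, 1].
    return w * importance(anonymized, counts) + (1 - w) * completeness(anonymized)


# Hypothetical schema contents, chosen only to exercise the functions.
schemas = {"S1": ["Name", "CC"], "S2": ["Holder", "CC"], "S3": ["Holder", "Num"]}
counts = popularity(schemas)
candidate_a = [{"Name", "Holder"}, {"CC", "Num"}]
candidate_b = [{"Holder"}, {"CC"}]
```

Under these assumptions `candidate_a` scores higher than `candidate_b`: both have two abstract attributes, but the first keeps more attributes and therefore more accumulated popularity.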

Page 8: Privacy-Preserving Schema Reuse

Challenge 4 – Optimization problem (1)

• NP-hard problem

Maximizing Anonymized Schema: Given a schema group and a set of privacy constraints $\Gamma$, construct an anonymized schema $\hat{S}^*$ such that $\hat{S}^*$ satisfies all constraints in $\Gamma$ and has the highest utility value.
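The problem can also be written as a constrained maximization. The notation is assumed for illustration rather than taken from the slides: $\Sigma$ is the set of all possible anonymized schemas, $u$ is the utility function, $\Gamma$ is the set of privacy constraints, and $\hat{S} \models \gamma$ is shorthand for "$\hat{S}$ satisfies constraint $\gamma$".

```latex
\hat{S}^{*} \;=\; \operatorname*{arg\,max}_{\hat{S} \in \Sigma} \; u(\hat{S})
\quad \text{subject to} \quad
\hat{S} \models \gamma \quad \forall \gamma \in \Gamma
```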

Page 9: Privacy-Preserving Schema Reuse

Challenge 4 – Optimization problem (2)

• Problem modeling
– Schema group: affinity matrix
– Anonymized schema: affinity instance
  • An affinity instance is an affinity matrix with some empty cells

Schemas: {a1, a2}, {b1, b2}, {c1, c2}

Affinity matrix:
a1 b1 c1
a2 b2 c2

Affinity instance = Anonymized schema ("_" marks an empty cell):
a1 b1 _  / a2 b2 c2 = {a1,b1}, {a2,b2,c2}
a1 b1 c1 / _  b2 _  = {a1,b1,c1}, {b2}

Need to find an affinity instance that satisfies the privacy constraints and has the highest utility value.
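The mapping from an affinity instance to an anonymized schema can be sketched in a few lines. The function names `make_instance` and `to_anonymized_schema` are hypothetical, and reading each column as one schema and each row as one group of corresponding attributes is an interpretation of the figure.

```python
# Affinity matrix from the slide: each column is one schema (a, b, c),
# each row lines up attributes that correspond to each other.
AFFINITY_MATRIX = [
    ["a1", "b1", "c1"],
    ["a2", "b2", "c2"],
]


def make_instance(matrix, hidden):
    """Copy the matrix, replacing the (row, column) cells in `hidden` with None."""
    return [
        [None if (r, c) in hidden else cell for c, cell in enumerate(row)]
        for r, row in enumerate(matrix)
    ]


def to_anonymized_schema(instance):
    """Turn every non-empty row into an abstract attribute (a set)."""
    schema = []
    for row in instance:
        group = {cell for cell in row if cell is not None}
        if group:
            schema.append(group)
    return schema


first = to_anonymized_schema(make_instance(AFFINITY_MATRIX, {(0, 2)}))
second = to_anonymized_schema(make_instance(AFFINITY_MATRIX, {(1, 0), (1, 2)}))
```

Hiding `c1` gives the first anonymized schema of the slide, and hiding `a2` and `c2` gives the second.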

Page 10: Privacy-Preserving Schema Reuse

Challenge 4 – Optimization problem (4)

• Overall solution:
– Meta-heuristic with 2 steps
  • Greedy algorithm: find a feasible solution
  • Randomized local search: find an optimal solution
– Improve performance
  • Divide and conquer: partition the set of constraints into independent sets → satisfy each set independently
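The two-step meta-heuristic can be sketched generically over affinity instances (matrices whose hidden cells are `None`). This is a simplified illustration, not the authors' algorithm: `greedy_feasible`, `local_search`, the move set (toggle one cell), the acceptance rule (keep only feasible, strictly better candidates), and the toy constraint and utility are all assumptions.

```python
import random


def greedy_feasible(matrix, is_feasible, utility):
    """Step 1: empty one cell at a time until the instance is feasible."""
    instance = [row[:] for row in matrix]
    while not is_feasible(instance):
        best = None
        for r, row in enumerate(instance):
            for c, cell in enumerate(row):
                if cell is None:
                    continue
                candidate = [x[:] for x in instance]
                candidate[r][c] = None
                # Prefer feasible candidates, then higher utility.
                score = (is_feasible(candidate), utility(candidate))
                if best is None or score > best[0]:
                    best = (score, candidate)
        if best is None:
            raise ValueError("infeasible even with every cell empty")
        instance = best[1]
    return instance


def local_search(matrix, instance, is_feasible, utility, steps=200, seed=0):
    """Step 2: randomly toggle cells, keeping feasible improvements only."""
    rng = random.Random(seed)
    current = [row[:] for row in instance]
    for _ in range(steps):
        r = rng.randrange(len(matrix))
        c = rng.randrange(len(matrix[r]))
        candidate = [row[:] for row in current]
        # Toggle: restore the original attribute or empty the cell.
        candidate[r][c] = matrix[r][c] if candidate[r][c] is None else None
        if is_feasible(candidate) and utility(candidate) > utility(current):
            current = candidate
    return current


# Toy setup (assumptions): utility counts visible cells, and the single
# constraint requires attribute c1 to stay hidden.
matrix = [["a1", "b1", "c1"], ["a2", "b2", "c2"]]
count_cells = lambda inst: sum(cell is not None for row in inst for cell in row)
hide_c1 = lambda inst: inst[0][2] is None

start = greedy_feasible(matrix, hide_c1, count_cells)
best = local_search(matrix, start, hide_c1, count_cells)
```

Because the feasibility check and the utility are passed in as functions, the same skeleton could be run once per independent set of constraints, in the spirit of the divide-and-conquer step.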

Page 11: Privacy-Preserving Schema Reuse

Experiments - Setting

Datasets:
• Real data: 117 schemas
• Synthetic data: vary the number of schemas and the number of attributes

Evaluation metrics:
– Utility loss: measures the utility reduction caused by the privacy constraints
  • $\Delta_u = \frac{u_\emptyset - u_\Gamma}{u_\emptyset}$, where $u_\emptyset$ is the utility without constraints and $u_\Gamma$ is the utility with a set of constraints $\Gamma$
– Privacy loss: measures the disagreement between actual privacy $P$ and expected privacy $\Theta$
  • $\Delta_p = KL(P \parallel \Theta) = \sum_i p_i \log \frac{p_i}{\theta_i}$
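Both metrics are straightforward to compute. The sketch below assumes that utility loss is the relative drop in utility once constraints are applied, and that privacy loss is a Kullback-Leibler divergence between the actual privacy values and the expected thresholds; these readings and the function names are assumptions for illustration.

```python
import math


def utility_loss(u_unconstrained, u_constrained):
    """Relative utility reduction caused by the privacy constraints."""
    if u_unconstrained <= 0:
        raise ValueError("utility without constraints must be positive")
    return (u_unconstrained - u_constrained) / u_unconstrained


def privacy_loss(actual, expected):
    """KL divergence between actual and expected privacy values."""
    if len(actual) != len(expected):
        raise ValueError("actual and expected must have the same length")
    total = 0.0
    for p, theta in zip(actual, expected):
        if p > 0:  # terms with p == 0 contribute nothing to the sum
            total += p * math.log(p / theta)
    return total
```

For example, `utility_loss(10, 8)` is the loss when constraints reduce the utility from 10 to 8, and `privacy_loss` is zero when the actual values match the expected ones exactly.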

Page 12: Privacy-Preserving Schema Reuse

Experiments – Computation Time

• 100 schemas, 50 attributes, 1500 constraints → running time is about 6 s

[Figure: computation time (log2 of msec.)]

Page 13: Privacy-Preserving Schema Reuse

Experiment – Privacy & Utility

• Validate the trade-off between privacy and utility
• Evaluation procedure
– Relax constraints: increase the privacy threshold $\theta$ to $(1 + r)\theta$, where $r$ is the relaxing ratio
• Observation
– The more privacy is enforced, the greater the utility loss.

Both utility loss and privacy loss are normalized to [0,1].

[Figure: plots relating utility loss and privacy loss.]

Page 14: Privacy-Preserving Schema Reuse

Conclusion

• Introduced schema reuse with privacy constraints
• Defined privacy constraints
• Defined an anonymized schema from multiple schemas
• Defined a utility function for a certain anonymized schema
• Constructed an anonymized schema that satisfies privacy constraints and maximizes the utility function

Page 15: Privacy-Preserving Schema Reuse

Thank you!

Questions