Authors: Nguyen Quoc Viet Hung, Do Son Thanh, Nguyen Thanh Tam, and Karl Aberer; EPFL, Switzerland
Privacy-Preserving Schema Reuse
DASFAA | 04.2014 | DASFAA Security, privacy & trust
Schema Reuse
[Figure: contributors add schemas to a repository (e.g. schema.org, factual.com) and users query it for output. Traditional approach: shows all original schemas. Our approach: shows an anonymized (unified) schema.]
Motivation
• Schema reuse offers many benefits:
– Reduces development complexity:
• New schemas require only small modifications: copy and adapt existing schemas
• Large repositories exist: schema.org, freebase.com, factual.com, niem.gov
– Increases interoperability:
• Share a common standard
• But privacy needs to be considered:
– Leaked schema information enables potential attacks (e.g. SQL injection)
– Maintaining competitiveness: some parts of schemas are a source of revenue and business strategy
Challenges
• How to define privacy constraints?
• How to define an anonymized schema from multiple schemas?
• How to define a utility function for a given anonymized schema?
• How to find an anonymized schema that satisfies the privacy constraints and maximizes the utility function?
[Figure: contributors supply schemas; under the privacy constraints, a query returns an anonymized (unified) schema.]
Challenge 1 – Define privacy constraints
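• Need to identify two elements:
– Sensitive information
• Attributes
– Privacy requirement
• Prevent leaking the provenance of sensitive attributes
• Use a presence constraint: a presence constraint is a triple $\gamma = \langle s, A, \theta \rangle$, where $s$ is a schema, $A$ is a set of attributes, and $\theta$ is a specified threshold. An anonymized schema $I$ satisfies the presence constraint if $\Pr(A \in s \mid I) \le \theta$, i.e. the probability of inferring from $I$ that the attributes $A$ come from $s$ is at most $\theta$.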
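A presence constraint and its satisfaction test might be sketched as follows. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, an anonymized schema is modeled as a list of abstract attributes (sets of attribute names), and the provenance probability compared against θ is replaced by a toy estimate (the share of A still visible in the anonymized schema, ignoring which schema contributed each attribute). The slides do not give the actual estimator.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Set

# Hypothetical model: an anonymized schema is a list of abstract attributes.
AnonymizedSchema = List[Set[str]]


@dataclass(frozen=True)
class PresenceConstraint:
    schema: str                 # s: identifier of the schema that owns A
    attributes: FrozenSet[str]  # A: sensitive attributes
    threshold: float            # theta: maximum tolerated probability


def visible_fraction(c: PresenceConstraint, anon: AnonymizedSchema) -> float:
    """Toy stand-in for the provenance probability: the share of A that is
    still visible in the anonymized schema. Not the paper's estimator."""
    visible: Set[str] = set().union(*anon)
    return len(c.attributes & visible) / len(c.attributes)


def satisfies(
    c: PresenceConstraint,
    anon: AnonymizedSchema,
    prob: Callable[[PresenceConstraint, AnonymizedSchema], float] = visible_fraction,
) -> bool:
    """The constraint holds when the estimated probability is at most theta."""
    return prob(c, anon) <= c.threshold


gamma = PresenceConstraint("S3", frozenset({"Holder", "CC"}), 0.5)
print(satisfies(gamma, [{"Name", "Holder"}, {"CC", "Num"}]))  # False
print(satisfies(gamma, [{"Name"}, {"CC", "Num"}]))            # True
```

Passing the estimator as a parameter keeps the constraint definition separate from the (unspecified) probability model, so a more faithful estimator can be swapped in.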
Challenge 2 – Define anonymized schema
• How to define an “anonymized schema” given a set of schemas?
– Enough information to understand, but not overwhelming
• An anonymized schema contains a set of “abstract” attributes
– An abstract attribute is a set of similar attributes
…
[Figure: original schemas with attributes such as Name, Holder, CC and Num are summarized by the anonymized schema {Name, Holder}, {CC, Num}; each set is one abstract attribute.]
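Grouping the attributes of several schemas into abstract attributes can be sketched as follows. The function name, the toy schemas, and the correspondence map `concept_of` (which attributes count as similar) are hypothetical; the slides do not say how similarity is determined, so here it is simply given as input, and `exposed` selects which attributes appear in the anonymized schema.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_anonymized_schema(
    schemas: Dict[str, Set[str]],
    concept_of: Dict[str, str],
    exposed: Set[str],
) -> List[Set[str]]:
    """Merge exposed attributes that share a (hypothetical) concept label
    into one abstract attribute each."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for attrs in schemas.values():
        for a in attrs:
            if a in exposed:
                groups[concept_of[a]].add(a)
    # Sort by concept label for a deterministic output order.
    return [groups[k] for k in sorted(groups)]


schemas = {"S1": {"Name", "Num"}, "S2": {"Name", "CC"}, "S3": {"Holder", "CC"}}
concept_of = {"Name": "holder", "Holder": "holder", "CC": "card", "Num": "card"}
print(build_anonymized_schema(schemas, concept_of, {"Name", "Holder", "CC", "Num"}))
```

With all four attributes exposed, this yields the two abstract attributes {CC, Num} and {Name, Holder} from the figure.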
Challenge 3 – Define utility function
• How to define a utility function for a given “anonymized schema”?
– Importance: sum of the popularity of its attributes
• A schema that contains more popular attributes is better
• An attribute that appears in more schemas is more popular
– Completeness: number of abstract attributes
• The more abstract attributes, the better
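[Figure: which anonymized schema is better, {Name, Holder}, {CC, Num} or {Holder}, {CC}? Original schemas S1, S2, S3, compared by importance and completeness.]
Let Σ be the set of all possible anonymized schemas. The utility function $u: \Sigma \to \mathbb{R}$ measures the amount of information in each anonymized schema.
Utility function: $u(I) = \mathit{Importance}(I) \times \mathit{Completeness}(I)$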
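The utility function can be sketched in Python as importance times completeness. This sketch assumes that popularity is the raw count of schemas containing an attribute name (the slides do not say whether it is normalized) and uses hypothetical toy schemas; the function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Set


def popularity(schemas: Dict[str, Set[str]]) -> Counter:
    """Number of schemas in which each attribute name appears."""
    counts: Counter = Counter()
    for attrs in schemas.values():
        counts.update(attrs)
    return counts


def utility(anon: List[Set[str]], schemas: Dict[str, Set[str]]) -> int:
    """u(I) = Importance(I) * Completeness(I)."""
    pop = popularity(schemas)
    importance = sum(pop[a] for abstract in anon for a in abstract)
    completeness = len(anon)  # number of abstract attributes
    return importance * completeness


schemas = {"S1": {"Name", "Num"}, "S2": {"Name", "CC"}, "S3": {"Holder", "CC"}}
print(utility([{"Name", "Holder"}, {"CC", "Num"}], schemas))  # 12
print(utility([{"Holder"}, {"CC"}], schemas))                 # 6
```

Under these assumptions, {Name, Holder}, {CC, Num} scores higher than {Holder}, {CC}: both have two abstract attributes, but the first keeps more popular attributes.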
Challenge 4 – Optimization problem (1)
• NP‐Hard problem
…
Maximizing Anonymized Schema: Given a schema group $\mathcal{S}$ and a set of privacy constraints $\Gamma$, construct an anonymized schema $I^*$ such that $I^*$ satisfies all constraints in $\Gamma$ and has the maximum utility value.
Challenge 4 – Optimization problem (2)
• Problem modeling
– Schema group: affinity matrix
– Anonymized schema: affinity instance
• An affinity instance is an affinity matrix with some empty cells
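[Figure: schemas {a1, a2}, {b1, b2}, {c1, c2} form an affinity matrix with rows (a1, b1, c1) and (a2, b2, c2). Emptying cell c1 gives the affinity instance with rows (a1, b1) and (a2, b2, c2), i.e. the anonymized schema {a1, b1}, {a2, b2, c2}; emptying a2 and c2 gives {a1, b1, c1}, {b2}; …]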
Goal: find an affinity instance that satisfies the privacy constraints and has the highest utility value.
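The mapping between affinity instances and anonymized schemas can be sketched as follows, following the a1/b1/c1 example: rows group corresponding attributes, columns are schemas, and `None` marks an empty cell. The encoding and function names are assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Set

# Hypothetical encoding of an affinity matrix / affinity instance.
AffinityMatrix = List[List[Optional[str]]]


def make_instance(matrix: AffinityMatrix, hidden: Iterable[str]) -> AffinityMatrix:
    """Copy the matrix, emptying the cells of the hidden attributes."""
    hidden_set = set(hidden)
    return [[None if cell in hidden_set else cell for cell in row] for row in matrix]


def to_anonymized_schema(instance: AffinityMatrix) -> List[Set[str]]:
    """Each non-empty row of an affinity instance becomes one abstract attribute."""
    schema: List[Set[str]] = []
    for row in instance:
        kept = {cell for cell in row if cell is not None}
        if kept:
            schema.append(kept)
    return schema


matrix = [["a1", "b1", "c1"], ["a2", "b2", "c2"]]
print(to_anonymized_schema(make_instance(matrix, {"c1"})))
print(to_anonymized_schema(make_instance(matrix, {"a2", "c2"})))
```

The two calls reproduce the figure's anonymized schemas {a1, b1}, {a2, b2, c2} and {a1, b1, c1}, {b2}; searching for an anonymized schema thus reduces to choosing which cells to empty.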
Challenge 4 – Optimization problem (4)
• Overall solution:
– Meta-heuristic with 2 steps
• Greedy algorithm: find a feasible solution
• Randomized local search: search from it for the optimal solution
– Improve performance
• Divide and conquer: partition the set of constraints into independent sets, then satisfy each set independently
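The two-step meta-heuristic with divide and conquer might look roughly like the sketch below, under several assumptions that are not from the slides. The affinity matrix is encoded as a list of rows with `None` for empty cells, and a solution is the set of hidden cells. The utility (visible cells times non-empty rows) and the constraint check (visible share of A at most θ) are hypothetical stand-ins. Two constraints are treated as dependent when they share attributes. All function names are illustrative.

```python
import random
from typing import FrozenSet, List, Optional, Set, Tuple

Matrix = List[List[Optional[str]]]
# Hypothetical reduced constraint: (sensitive attributes A, threshold theta).
Constraint = Tuple[FrozenSet[str], float]


def utility(matrix: Matrix, hidden: Set[str]) -> int:
    """Stand-in utility: (#visible cells) * (#non-empty rows)."""
    rows = [[c for c in row if c is not None and c not in hidden] for row in matrix]
    return sum(len(r) for r in rows) * sum(1 for r in rows if r)


def satisfied(con: Constraint, hidden: Set[str]) -> bool:
    """Toy check: the visible share of A must not exceed theta."""
    attrs, theta = con
    return len(attrs - hidden) / len(attrs) <= theta


def partition(constraints: List[Constraint]) -> List[List[Constraint]]:
    """Divide and conquer: constraints sharing attributes (transitively) form one group."""
    groups: List[Tuple[Set[str], List[Constraint]]] = []
    for con in constraints:
        attrs, members = set(con[0]), [con]
        overlapping = [g for g in groups if g[0] & attrs]
        groups = [g for g in groups if not (g[0] & attrs)]
        for g_attrs, g_members in overlapping:
            attrs |= g_attrs
            members.extend(g_members)
        groups.append((attrs, members))
    return [members for _, members in groups]


def greedy(matrix: Matrix, constraints: List[Constraint]) -> Set[str]:
    """Step 1: hide attributes until every independent group is satisfied."""
    hidden: Set[str] = set()
    for group in partition(constraints):
        while not all(satisfied(con, hidden) for con in group):
            candidates = {a for con in group if not satisfied(con, hidden)
                          for a in con[0]} - hidden
            # Hide the candidate whose removal keeps the utility highest.
            best = max(sorted(candidates), key=lambda a: utility(matrix, hidden | {a}))
            hidden.add(best)
    return hidden


def local_search(matrix: Matrix, constraints: List[Constraint], hidden: Set[str],
                 iterations: int = 200, seed: int = 0) -> Set[str]:
    """Step 2: toggle one random cell; keep the move if feasible and not worse."""
    rng = random.Random(seed)
    cells = sorted(cell for row in matrix for cell in row if cell is not None)
    best, best_u = set(hidden), utility(matrix, hidden)
    for _ in range(iterations):
        cand = best ^ {rng.choice(cells)}  # hide <-> show one attribute
        if all(satisfied(con, cand) for con in constraints):
            u = utility(matrix, cand)
            if u >= best_u:
                best, best_u = cand, u
    return best


def solve(matrix: Matrix, constraints: List[Constraint]) -> Set[str]:
    return local_search(matrix, constraints, greedy(matrix, constraints))


matrix = [["a1", "b1", "c1"], ["a2", "b2", "c2"]]
constraints = [(frozenset({"c1"}), 0.0), (frozenset({"a2", "c2"}), 0.5)]
hidden = solve(matrix, constraints)
print(sorted(hidden))  # ['a2', 'c1']
```

Because the local search starts from the greedy solution and only accepts feasible moves that do not lower the utility, its result is always feasible and at least as good as the greedy one; it does not, however, guarantee a global optimum.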
Experiments - Setting
Datasets:
• Real data: 117 schemas
• Synthetic data: varying number of schemas and number of attributes
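Evaluation Metrics:
– Utility loss: measures the amount of utility reduction caused by the privacy constraints
• $\Delta_u = \frac{u_\emptyset - u_\Gamma}{u_\emptyset}$, where $u_\emptyset$ is the utility without constraints and $u_\Gamma$ is the utility with a set of constraints $\Gamma$
– Privacy loss: measures the amount of disagreement between the actual privacy $\Pi$ and the expected privacy $\Theta$
• $\Delta_p = D_{KL}(\Pi \parallel \Theta) = \sum_{i} \pi_i \log \frac{\pi_i}{\theta_i}$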
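Both metrics can be computed directly from their definitions. This sketch assumes the actual and expected privacy values are given as aligned per-constraint lists, applies the convention 0·log 0 = 0, and does not perform the later normalization to [0, 1]; the function names are illustrative.

```python
import math
from typing import Sequence


def utility_loss(u_empty: float, u_gamma: float) -> float:
    """Relative utility reduction caused by the constraint set Gamma."""
    return (u_empty - u_gamma) / u_empty


def privacy_loss(actual: Sequence[float], expected: Sequence[float]) -> float:
    """KL-style divergence sum_i pi_i * log(pi_i / theta_i); 0 * log 0 = 0."""
    return sum(p * math.log(p / t) for p, t in zip(actual, expected) if p > 0)


print(utility_loss(12, 9))                  # 0.25
print(privacy_loss([0.5, 0.5], [0.5, 0.5]))  # 0.0
```

A privacy loss of 0 means the actual privacy matches the expected privacy exactly; larger values indicate more disagreement.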
Experiments – Computation Time
• With 100 schemas, 50 attributes, and 1500 constraints, the running time is about 6 s
[Figure: computation time (log2 of msec.)]
Experiment – Privacy & Utility
• Validate the trade-off between privacy and utility
• Evaluation procedure
– Relax the constraints: increase each privacy threshold θ to $(1+\alpha)\theta$, where α is the relaxing ratio
• Observation
– The stronger the privacy enforced, the larger the utility loss.
Both utility loss and privacy loss are normalized to [0, 1]
[Figure: utility loss Δu and privacy loss Δp]
Conclusion
• Introduced schema reuse with privacy constraints
• Defined privacy constraints
• Defined an anonymized schema from multiple schemas
• Defined a utility function for a given anonymized schema
• Constructed an anonymized schema that satisfies the privacy constraints and maximizes the utility function
Thank you!
Questions