Privacy Inference on Knowledge Graphs: Hardness and
Approximation
Jianwei Qian, Shaojie Tang, Huiqi Liu, Taeho Jung, Xiang-Yang Li
Illinois Tech, UT Dallas, USTC
Outline
Background & Related Work
Problem Formulation & Analysis
Heuristic Algorithm
Conclusion
Privacy leakage is real!
Tip of the iceberg!
Privacy inference
The attacker can use background knowledge to infer users' sensitive information. E.g.
– Higher education => higher salary
– Colleagues => same company
– Common hobbies => friends
Background knowledge (prior knowledge)
Previous privacy attacks
• Relational data
  – Data publishing
  – Statistical query
• Graph data
  – De-anonymization
  – Privacy learning
• Other data forms
  – Spatio-temporal data, genome data, multimedia data
Outline
Background & Related Work
Problem Formulation & Analysis
Heuristic Algorithm
Conclusion
Our goal
To model the privacy inference attack and reveal its essence from a general perspective
(in contrast to most previous work)
Challenges
• Privacy is difficult to define, and privacy leakage is hard to quantify
• It is challenging to apply one single model to various data forms
• Privacy inference is hard to model because of the large diversity of attack techniques
Overview of the solution
1. Model the attacker's knowledge by a knowledge graph
2. Abstract four base cases of privacy inference
3. Formulate privacy inference and prove its hardness
4. Design an approximation algorithm that reflects network evolution
Knowledge graph
A heterogeneous graph of entities and their relations in a specific domain or topic
– E.g. Freebase, Wikidata, DBpedia, YAGO, NELL, Google's Knowledge Graph, Facebook's Entities Graph
Our knowledge graph
[Figure: example knowledge graph. User nodes (Alice, Bob, Carlo, Dave, Evan) and attribute nodes (gender M/F, age 45, birth place USA/LA, education: high school, profession: lawyer, average salary: $133K) are connected by three edge types: user-user (U-U: colleague, friend, son-father, employee-employer), user-attribute (U-A: gender, age, birth place, education, profession), and attribute-attribute (A-A: belong to, average salary).]
Each edge is associated with a probability, indicating the attacker’s confidence
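As a minimal illustration (not the authors' implementation), such a probabilistic knowledge graph can be stored as a map from node pairs to the attacker's confidence; all names and values below are hypothetical:

```python
# Minimal sketch (not the authors' code): a probabilistic knowledge
# graph stored as a map from node pairs to the attacker's confidence
# that the relation holds. All names/values are hypothetical.
kg = {
    ("Alice", "Bob"): 0.9,       # U-U edge, e.g. colleague
    ("Alice", "Lawyer"): 0.7,    # U-A edge, e.g. profession
    ("Lawyer", "$133K"): 0.95,   # A-A edge, e.g. average salary
}

def confidence(u, v):
    """Attacker's confidence that edge (u, v) exists; 0 if unknown."""
    return kg.get((u, v), kg.get((v, u), 0.0))

print(confidence("Bob", "Alice"))   # symmetric lookup -> 0.9
print(confidence("Alice", "Dave"))  # unknown edge -> 0.0
```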
Privacy inference base cases
• Triangle inference
  – s and t share a common neighbor c
[Figure: four base cases (a)–(d) of triangle inference, involving nodes A1, A2 (or A) and a common neighbor c.]
For Fig (a), the inference probability is
Pr(e_{A1,A2}) = Pr(e_{A1,c}) Pr(e_{A2,c}) Pr(e_{A1,A2} | e_{A1,c}, e_{A2,c})
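The triangle inference formula above can be sketched directly; the probability values below are hypothetical placeholders:

```python
def triangle_inference(p_sc, p_tc, p_cond):
    """Triangle inference through one common neighbor c:
    Pr(e_st) = Pr(e_sc) * Pr(e_tc) * Pr(e_st | e_sc, e_tc)."""
    return p_sc * p_tc * p_cond

# Hypothetical values: both edges to the common neighbor are likely,
# and the conditional probability (set from statistics) is 0.8.
print(round(triangle_inference(0.9, 0.9, 0.8), 3))  # 0.648
```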
Multiple common neighbors
• If s and t have multiple common neighbors, the corresponding base cases are combined
• The inference probability Pr(e_st | N_st) is computed by aggregating over the set of common neighbors N_st
[Figure: inferring the relation between s and t through common neighbors v1, v2, ..., v_{i-1}, v_i.]
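The deck does not fix the aggregation rule; one plausible choice, shown purely as an assumption, is a noisy-or combination of the per-neighbor triangle probabilities:

```python
def aggregate_noisy_or(triangle_probs):
    """Combine per-common-neighbor triangle probabilities with noisy-or:
    the edge is inferred unless every base case fails independently.
    (This aggregation rule is an assumption; the deck leaves it open.)"""
    p_all_fail = 1.0
    for p in triangle_probs:
        p_all_fail *= (1.0 - p)
    return 1.0 - p_all_fail

# Two common neighbors, each giving a 0.5 triangle probability.
print(aggregate_noisy_or([0.5, 0.5]))  # 0.75
```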
Assumption about triangle inference
• To infer an unknown edge e_st:
  – If s, t have common neighbor(s), then e_st exists with a probability (the inference probability);
  – If there is no path connecting s, t, then e_st must not exist;
  – If s, t do not have a common neighbor but there is a path connecting them, the status of e_st is TBD.
Privacy inference
• The problem of computing p(e_st), the probability that an unknown edge e_st exists.
#P-hardness
The #P-hard s-t reliability problem can be reduced to our problem. Please see the detailed proof in our paper.
[Figure: reduction gadget connecting s to t through nodes v0, v1, v2, ..., v_{i-1}, v_i, v_{i+1}.]
Outline
Background & Related Work
Problem Formulation & Analysis
Heuristic Algorithm
Conclusion
Algorithm
• An iterative algorithm based on Monte Carlo simulation
  1. Generate a conjecture graph G0 from the knowledge graph G.
  2. Flip a coin for each candidate pair to decide whether to add an edge.
  3. Stop if e_st is inferred or no more edges can be added; otherwise redo Step 2.
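A heavily simplified sketch of the Monte Carlo loop above (one coin flip per candidate edge per round, rather than the full conjecture-graph evolution); all edge names and probabilities are hypothetical:

```python
import random

def monte_carlo_infer(candidates, target, rounds=10000, rng=random):
    """Estimate p(e_st) by repeated sampling: each round flips a biased
    coin for every candidate edge and records whether the target edge
    was added; the estimate is the fraction of rounds where it was.
    `candidates` maps candidate edges to their inference probabilities.
    (Simplified: a faithful version would re-derive candidate
    probabilities as the conjecture graph grows within each round.)"""
    hits = 0
    for _ in range(rounds):
        added = {e for e, p in candidates.items() if rng.random() < p}
        if target in added:
            hits += 1
    return hits / rounds

# Hypothetical candidate edges; a fixed seed makes the run repeatable.
est = monte_carlo_infer({("s", "t"): 0.3, ("s", "u"): 0.6},
                        ("s", "t"), rng=random.Random(7))
print(0.25 < est < 0.35)  # estimate concentrates near 0.3
```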
An example
[Figure: an example run showing the knowledge graph G, the conjecture graph G0, and two sampled graphs G1, G2, each containing the target pair s, t.]
Simulations
• Datasets
  – Relational: Adult
  – Social network: Pokec
• Knowledge graph construction
  – Edge probabilities are synthetic: uniform, Gaussian
  – Inference probabilities are set by statistics
• Metric: confidence gain
  Δp(e_st) = I(e_st) (p_1(e_st) − p_0(e_st)),
  where p_0 is the prior confidence, p_1 the confidence after inference, and I(e_st) = ±1 indicates whether e_st actually exists.
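Under the reading that I(e_st) is a ±1 indicator of whether the edge truly exists (an assumption recovered from the garbled slide notation), the metric can be computed as:

```python
def confidence_gain(edge_exists, p_before, p_after):
    """Confidence gain dp(e_st) = I(e_st) * (p_after - p_before),
    reading I(e_st) as +1 if e_st truly exists and -1 otherwise, so a
    positive gain means the attacker moved toward the ground truth.
    (The sign convention for I is an assumption.)"""
    indicator = 1 if edge_exists else -1
    return indicator * (p_after - p_before)

print(confidence_gain(True, 0.4, 0.9))   # correct inference rewarded: 0.5
print(confidence_gain(False, 0.4, 0.9))  # wrong inference penalized: -0.5
```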
Results
[Figure: CDFs of confidence gain Δp. Adult dataset: Workclass (Government, Private, Self-emp, Without-pay) and Marital status (Divorced, Married, Never-mrd, Separated, Widowed). Pokec dataset: uniform edge probabilities (Uni0.92, Uni0.98) on the region and weight attributes, for true and false edges.]
Outline
Background & Related Work
Problem Formulation & Analysis
Heuristic Algorithm
Conclusion
Summary of contributions
We have
– analyzed the nature of background knowledge and modeled it on a knowledge graph
– formulated privacy inference and proved its #P-hardness
– designed a heuristic algorithm to approximate it
– run simulations on real-world datasets to show the effectiveness of our algorithm
Thank you!