Privacy Inference on Knowledge Graphs: Hardness and Approximation

Jianwei Qian, Shaojie Tang, Huiqi Liu, Taeho Jung, Xiang-Yang Li

Illinois Tech, UT Dallas, USTC

Slides: mypages.iit.edu/~jqian15/slides/Infer-Privacy-MSN.pdf

Outline

Background & Related Work

Problem Formulation & Analysis

Heuristic Algorithm

Conclusion

Privacy leak is real!

Tip of the iceberg!

Privacy inference

The attacker can use background knowledge (prior knowledge) to infer users' sensitive information. E.g.

– Higher education => higher salary
– Colleagues => same company
– Common hobbies => friends

Previous privacy attacks

• Relational data
– Data publishing
– Statistical query
• Graph data
– De-anonymization
– Privacy learning
• Other data forms
– Spatio-temporal data, genome data, multimedia data

Our goal

To model the privacy inference attack and reveal its essence from a general view (this differs from most previous work)

Challenges

• Privacy is difficult to define, and privacy leakage is hard to quantify
• It is challenging to apply a single model to various data forms
• Privacy inference is hard to model because attack techniques are highly diverse

Overview of the solution

1. Model the attacker's knowledge by a knowledge graph
2. Abstract four base cases of privacy inference
3. Formulate privacy inference and prove its hardness
4. Design an approximation algorithm that reflects network evolution

Knowledge graph

A heterogeneous graph of all kinds of entities and their relations, related to a specific domain or topic

– E.g. Freebase, Wikidata, DBpedia, YAGO, NELL, Google's Knowledge Graph, Facebook's Entities Graph

Our knowledge graph

[Figure: example knowledge graph. User nodes: Alice, Bob, Carlo, Dave, Evan. Attribute nodes include M, F, 45, USA, LA, High school, Lawyer, $133K. Edge labels include Gender, Birth place, Age, Education, Profession, Colleague, Friend, Son-father, Employee-employer, Belong to, Average salary. Edge types: user-user (U-U), user-attribute (U-A), attribute-attribute (A-A).]

Each edge is associated with a probability, indicating the attacker’s confidence
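A minimal Python sketch of such a graph, assuming a simple dictionary-of-edges representation. The class name, field names, edge assignments, and all probabilities below are hypothetical illustrations, not the authors' data structure or data.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Undirected graph whose edges carry the attacker's confidence in [0, 1]."""
    users: set = field(default_factory=set)
    attributes: set = field(default_factory=set)
    # frozenset({x, y}) -> (label, probability)
    edges: dict = field(default_factory=dict)

    def add_edge(self, x, y, label, prob):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("edge probability must lie in [0, 1]")
        self.edges[frozenset((x, y))] = (label, prob)

    def edge_type(self, x, y):
        """Return 'U-U', 'U-A' or 'A-A' depending on the endpoint kinds."""
        kinds = sorted(("U" if n in self.users else "A" for n in (x, y)), reverse=True)
        return "-".join(kinds)


# Hypothetical edges and confidences, loosely following the node names in the figure.
kg = KnowledgeGraph(users={"Alice", "Bob"}, attributes={"Lawyer", "$133K"})
kg.add_edge("Alice", "Bob", "friend", 0.9)
kg.add_edge("Alice", "Lawyer", "profession", 0.7)
kg.add_edge("Lawyer", "$133K", "average salary", 1.0)

for pair, (label, prob) in kg.edges.items():
    x, y = sorted(pair)
    print(kg.edge_type(x, y), label, prob)
```

Keying edges by an unordered pair keeps the graph undirected, so a lookup does not depend on the order in which the endpoints are given.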

Privacy inference base cases

• Triangle inference
– A common neighbor

[Figure: the four base cases (a)-(d) of triangle inference, with node labels A, A1, A2.]

For Fig. (a), we have

$\Pr(e_{u_1 u_2}) = \Pr(e_{u_1 A})\,\Pr(e_{u_2 A})\,\Pr(e_{u_1 u_2} \mid e_{u_1 A}, e_{u_2 A})$

where the conditional term is the inference probability.
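As a worked example of the triangle formula for case (a), the unknown edge's probability is the product of the two known edge probabilities and the inference probability. The function name and the numbers below are made up for illustration.

```python
def triangle_inference(p_u1_a, p_u2_a, p_infer):
    """Probability of the unknown edge between two nodes that share neighbor A.

    p_u1_a, p_u2_a: attacker's confidence in the two known edges to A.
    p_infer: inference probability, i.e. the probability of the unknown edge
             given that both known edges exist.
    """
    for p in (p_u1_a, p_u2_a, p_infer):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_u1_a * p_u2_a * p_infer


# Hypothetical values: both known edges are fairly certain, the rule is a coin flip.
p = triangle_inference(0.9, 0.8, 0.5)
print(round(p, 3))  # 0.36
```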

Multiple common neighbors

• Multiple base cases are combined when s and t have multiple common neighbors
• The inference probability $\Pr(e_{st} \mid N_{st})$ is computed by aggregation

[Figure: nodes s and t with common neighbors v_1, v_2, ..., v_{i-1}, v_i. The goal is to infer the relation between s and t.]
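The slide does not say which aggregation is used. As one possible illustration only, the sketch below uses a noisy-OR combination, which treats each common neighbor as an independent chance to produce the edge. This choice, the function name, and the numbers are assumptions and may differ from the aggregation in the paper.

```python
from math import prod


def noisy_or(contributions):
    """Combine per-neighbor probabilities, assuming they act independently.

    The edge is absent only if every common neighbor fails to produce it.
    """
    return 1.0 - prod(1.0 - q for q in contributions)


# Hypothetical per-neighbor values, one per common neighbor of s and t.
q = [0.36, 0.2, 0.1]
p_st = noisy_or(q)
print(round(p_st, 4))  # 0.5392
```

With noisy-OR, adding a common neighbor can only raise the aggregated probability, which matches the intuition that more shared neighbors give the attacker more evidence.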

Assumption about triangle inference

• To infer an unknown edge $e_{st}$:
– If s, t have common neighbor(s), then $e_{st}$ exists with a probability (the inference probability);
– If there is no path connecting s, t, then $e_{st}$ must not exist;
– If s, t do not have a common neighbor but there is a path connecting them, the status of $e_{st}$ is to be determined (TBD).
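The three cases can be sketched as a small classifier over an adjacency map. The function name, the status strings, and the toy graph are hypothetical.

```python
from collections import deque


def edge_status(adj, s, t):
    """Classify the unknown edge (s, t) according to the three cases above.

    adj maps each node to the set of its current neighbors.
    """
    if adj.get(s, set()) & adj.get(t, set()):
        return "inferable"  # common neighbor: exists with the inference probability
    # Breadth-first search to test whether any path joins s and t.
    seen, queue = {s}, deque([s])
    while queue:
        node = queue.popleft()
        if node == t:
            return "undetermined"  # path but no common neighbor: status is TBD
        for nxt in adj.get(node, set()) - seen:
            seen.add(nxt)
            queue.append(nxt)
    return "absent"  # no path: the edge must not exist


# Toy graph: path s - a - b - t, a leaf u attached to a, and an isolated node x.
adj = {
    "s": {"a"},
    "a": {"s", "b", "u"},
    "b": {"a", "t"},
    "t": {"b"},
    "u": {"a"},
    "x": set(),
}
print(edge_status(adj, "s", "u"))
print(edge_status(adj, "s", "t"))
print(edge_status(adj, "s", "x"))
```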

Privacy inference

• The problem of computing $p(e_{st})$, the probability that an unknown edge $e_{st}$ exists.

#P-hardness

The #P-hard "s-t reliability" problem can be reduced to our problem. Please see the detailed proof in our paper.

[Figure: graph on nodes s, t and v_0, v_1, v_2, ..., v_{i-1}, v_i, v_{i+1}.]
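For intuition about the s-t reliability problem itself (not the reduction, which is in the paper), the sketch below computes the exact probability that s and t are connected when each edge exists independently with a given probability. It enumerates all 2^m edge subsets, so it is only usable on tiny graphs. The function name and the example graph are hypothetical.

```python
from itertools import product


def st_reliability(edges, s, t):
    """Exact probability that s and t are connected; enumerates 2^m edge subsets."""
    items = list(edges.items())  # [((u, v), prob), ...]
    total = 0.0
    for world in product([False, True], repeat=len(items)):
        weight, adj = 1.0, {}
        for present, ((u, v), p) in zip(world, items):
            weight *= p if present else 1.0 - p
            if present:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        # Depth-first search from s in this possible world.
        stack, seen = [s], {s}
        while stack:
            for nxt in adj.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if t in seen:
            total += weight
    return total


# Two routes from s to t: a direct edge, and a two-hop route through a.
edges = {("s", "a"): 0.5, ("a", "t"): 0.5, ("s", "t"): 0.5}
r = st_reliability(edges, "s", "t")
print(r)  # 0.625
```

The exponential loop over possible worlds is what motivates an approximation by sampling instead of an exact computation.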

Algorithm

• An iterative algorithm based on Monte Carlo simulation

1. Generate a conjecture graph G0 from the knowledge graph G.
2. Flip a coin for each candidate pair to decide whether to add an edge.
3. Stop if $e_{st}$ is inferred or no more edges can be added; otherwise redo Step 2.
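The three steps can be sketched as follows. Several details are assumptions not stated on the slide: G0 is sampled by keeping each known edge with its probability, a candidate pair is a non-adjacent pair with at least one common neighbor, every pair is flipped at most once per trial, and a single constant `p_infer` stands in for the inference probability. All names are hypothetical.

```python
import random
from itertools import combinations


def one_trial(nodes, edge_prob, p_infer, s, t, rng):
    """Run one simulation; return True if the edge (s, t) gets inferred."""
    # Step 1: sample a conjecture graph G0 from the knowledge graph.
    adj = {v: set() for v in nodes}
    for (u, v), p in edge_prob.items():
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    tried = set()
    while True:
        # Step 2: candidates are non-adjacent, share a neighbor, not flipped yet.
        candidates = [
            (u, v) for u, v in combinations(nodes, 2)
            if v not in adj[u] and (u, v) not in tried and adj[u] & adj[v]
        ]
        if not candidates:
            return False  # Step 3: no more edges can be added
        for u, v in candidates:
            tried.add((u, v))
            if rng.random() < p_infer:
                adj[u].add(v)
                adj[v].add(u)
        if t in adj[s]:
            return True  # Step 3: the target edge has been inferred


def estimate(nodes, edge_prob, p_infer, s, t, trials=10000, seed=0):
    """Estimate p(e_st) as the fraction of trials in which (s, t) is inferred."""
    rng = random.Random(seed)
    hits = sum(one_trial(nodes, edge_prob, p_infer, s, t, rng) for _ in range(trials))
    return hits / trials


# Hypothetical path s - a - b - t: the target edge needs two rounds of inference.
nodes = ["s", "a", "b", "t"]
edge_prob = {("s", "a"): 0.9, ("a", "b"): 0.9, ("b", "t"): 0.9}
print(estimate(nodes, edge_prob, 0.5, "s", "t", trials=2000))
```

On the path example, s and t share no neighbor at first. The edge can only appear after an earlier round adds (s, b) or (a, t), which is how the repeated rounds mimic network evolution.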

An example

[Figure: the knowledge graph G, the conjecture graph G0, and the subsequent graphs G1 and G2, each containing the target pair s and t.]

Simulations

• Datasets
– Relational: Adult
– Social network: Pokec
• Knowledge graph construction
– Edge probabilities are synthetic: uniform, Gaussian
• Inference probabilities are set by statistics
• Metric
– Confidence gain

$\Delta p(e_{st}) = I(e_{st})\,\bigl(p_{\text{after}}(e_{st}) - p_{\text{before}}(e_{st})\bigr)$
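The confidence gain can be sketched as a one-line function. Two readings are assumed here because the slide does not define the terms: the two p terms are taken to be the attacker's confidence in the edge after and before inference, and I(e_st) is taken to be +1 for an edge that truly exists and -1 otherwise. Under these assumptions a positive gain means the attacker moved closer to the truth.

```python
def confidence_gain(p_before, p_after, truly_exists):
    """Signed change in the attacker's confidence about one edge."""
    sign = 1.0 if truly_exists else -1.0  # assumed meaning of I(e_st)
    return sign * (p_after - p_before)


# Hypothetical confidences: the attacker's belief rises from 0.5 to 0.75.
print(confidence_gain(0.5, 0.75, True))   # 0.25
print(confidence_gain(0.5, 0.75, False))  # -0.25
```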

Results

[Figure: four CDF plots of the confidence gain Δp (x-axis: confidence gain Δp; y-axis: CDF, 0 to 1).
– Workclass: Government, Private, Self-emp, Without-pay
– Marital status: Divorced, Married, Never-mrd, Separated, Widowed
– Uniform, region: Uni0.92 true, Uni0.92 false, Uni0.98 true, Uni0.98 false
– Uniform, weight: Uni0.92 true, Uni0.92 false, Uni0.98 true, Uni0.98 false]

Summary of contributions

We have
– analyzed the nature of background knowledge and modeled it on a knowledge graph
– formulated privacy inference and proved its #P-hardness
– designed a heuristic algorithm to approximate it
– run simulations on real-world datasets to show the effectiveness of our algorithm

Thank you!