De-anonymizing Social Networks Presenter: Lijie Zhang Advisor: Weining Zhang


Page 1: De-anonymizing Social Networks

De-anonymizing Social Networks

Presenter: Lijie Zhang
Advisor: Weining Zhang

Page 2: De-anonymizing Social Networks

Outline

  • Motivation
  • Attack Model
  • De-anonymization Algorithm
  • Experiments
  • Conclusions

Page 3: De-anonymizing Social Networks

Motivation

  • Social network (SN) owners publish graph data for sharing:
      • Academic and government data mining: phone call networks
      • Advertising
      • Third-party applications: 550,000 Facebook applications
  • Private information on SNs:
      • Node attributes: e.g., node degree in a sexual network
      • Edge presence: e.g., a single call, a romantic relationship

Page 4: De-anonymizing Social Networks

Motivation

  • The SN owner publishes an anonymized graph: nodes have no identifying attributes.
  • Propose a model for identifying nodes in the anonymized graph:
      • Re-identification: learn the entity to which a node belongs.
      • Entity: an account, a real person, a group, or an organization.

Page 6: De-anonymizing Social Networks

Model – Social Network

Social network S:
  • A directed graph G = (V, E)
  • A set of node attributes X: e.g., name, telephone number
  • A set of edge attributes Y: e.g., type of relationship
  • Attribute values are treated as drawn from a discrete domain

Page 7: De-anonymizing Social Networks

Model – Data Release

  • The release is a sanitized subset of the nodes and edges in S.
  • Computation:
      • Vsan: subset of V
      • Xsan: subset of X, including sensitive attributes
      • Ysan: subset of Y, including sensitive attributes
      • Published attributes by themselves are insufficient for re-identification
      • Compute the induced subgraph on Vsan
      • Remove some edges and add fake edges
  • Released network:

    $S_{san} = (V_{san}, E_{san}, \{X(v) : v \in V_{san}, X \in X_{san}\}, \{Y(e) : e \in E_{san}, Y \in Y_{san}\})$
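The release steps on this slide (induce a subgraph on Vsan, remove some edges, add fake ones) can be sketched as follows. This is a minimal illustration, not the procedure used by any particular network owner: the function name `sanitize` and the parameters `drop_frac`, `n_fake`, and `seed` are assumptions introduced here, and attributes are left out.

```python
import random

def sanitize(nodes, edges, keep_nodes, drop_frac=0.1, n_fake=2, seed=0):
    """Release the subgraph induced on keep_nodes, then perturb its edges.

    nodes, keep_nodes : sets of node ids
    edges             : set of directed (u, v) pairs
    drop_frac, n_fake : hypothetical perturbation knobs (assumptions)
    """
    rng = random.Random(seed)
    v_san = set(keep_nodes) & set(nodes)
    # Induced subgraph: keep only edges with both endpoints in v_san.
    induced = sorted((u, v) for (u, v) in edges if u in v_san and v in v_san)
    # Remove a fraction of the real edges.
    n_drop = int(drop_frac * len(induced))
    e_san = set(induced) - set(rng.sample(induced, n_drop))
    # Add fake directed edges that do not exist in the original edge set.
    candidates = sorted((u, v) for u in v_san for v in v_san
                        if u != v and (u, v) not in edges)
    e_san |= set(rng.sample(candidates, min(n_fake, len(candidates))))
    return v_san, e_san
```

Calling `sanitize(nodes, edges, keep_nodes)` returns the released node set and the perturbed edge set; every released edge has both endpoints in the released node set.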

Page 8: De-anonymizing Social Networks

Model – Attacker

Purpose: extract sensitive information about specific individuals from anonymized SN graphs.

Attacker’s knowledge:
  • Aggregate auxiliary information
  • Individual auxiliary information

Page 9: De-anonymizing Social Networks

Aggregate auxiliary information

  • Large-scale information from other data sources and social networks whose membership overlaps with the target network Ssan
  • Auxiliary graph Gaux = (Vaux, Eaux)
  • AuxX and AuxY: probability distributions over the values of each node attribute in Vaux and each edge attribute in Eaux, respectively (the attacker's prior knowledge)

Page 10: De-anonymizing Social Networks

Individual auxiliary information

Identifiable details about a small number of individuals from the target network Ssan and possibly relationships between them

Page 11: De-anonymizing Social Networks

Model – Breaching Privacy

  • Extract sensitive information about specific individuals from Ssan.
  • Re-identify nodes of the target SN Ssan:
      • Re-identification: find a mapping μ between a node in Vaux and a node in Vsan
      • $\mu_G$: the ground-truth mapping
      • Re-identification of a node $v \in V_{aux}$ succeeds if $\mu(v) = \mu_G(v)$

Page 12: De-anonymizing Social Networks

Model – Breaching Privacy

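Re-identification algorithm:
  • Input: Ssan and Saux
  • Output: a probabilistic mapping

    $\tilde{\mu} : V_{san} \times (V_{aux} \cup \{\perp\}) \to [0, 1]$

    where $\tilde{\mu}(v_{aux}, v_{san})$ is the probability that $v_{aux}$ maps to $v_{san}$

Mapping adversary: given $\tilde{\mu}$, the adversary outputs the probability distributions

    $Adv[X, v_{aux}, x] = \frac{\sum_{v \in V_{san},\, X[v] = x} \tilde{\mu}(v_{aux}, v)}{\sum_{v \in V_{san},\, X[v] \neq \perp} \tilde{\mu}(v_{aux}, v)}$

    $Adv[Y, u_{aux}, v_{aux}, y] = \frac{\sum_{u, v \in V_{san},\, Y[u, v] = y} \tilde{\mu}(u_{aux}, u)\, \tilde{\mu}(v_{aux}, v)}{\sum_{u, v \in V_{san},\, Y[u, v] \neq \perp} \tilde{\mu}(u_{aux}, u)\, \tilde{\mu}(v_{aux}, v)}$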
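For the node-attribute case, the mapping adversary's output Adv[X, v_aux, x] is the mapping probability that v_aux lands on a sanitized node whose attribute X equals x, divided by the mapping probability over all sanitized nodes where X is published. A minimal sketch, assuming the probabilistic mapping is stored as a dictionary keyed by (v_aux, v_san) pairs and that an unpublished attribute is represented by `None` (both representations are assumptions made here):

```python
def adv_node_attribute(mu, attr, v_aux, x):
    """Adv[X, v_aux, x]: adversary's belief that attribute X of v_aux equals x.

    mu   : dict (v_aux, v_san) -> probability (the probabilistic mapping)
    attr : dict v_san -> published value of X, or None when unpublished
    x    : the attribute value of interest (must not be None)
    """
    num = sum(p for (a, v), p in mu.items()
              if a == v_aux and attr.get(v) == x)
    den = sum(p for (a, v), p in mu.items()
              if a == v_aux and attr.get(v) is not None)
    return num / den if den else 0.0

# Hypothetical example: "a" maps to s1, s2, s3 with the given probabilities.
mu = {("a", "s1"): 0.6, ("a", "s2"): 0.3, ("a", "s3"): 0.1}
city = {"s1": "NY", "s2": "LA", "s3": None}
belief_ny = adv_node_attribute(mu, city, "a", "NY")  # 0.6 / (0.6 + 0.3)
```

The edge-attribute case has the same shape, with products of two mapping probabilities summed over pairs of sanitized nodes.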

Page 13: De-anonymizing Social Networks

Model – Breaching Privacy

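Privacy breach: the privacy of $v_{san}$ is breached w.r.t. adversary Adv and privacy parameter $\delta$ if the adversary's belief in a sensitive value exceeds the prior Aux by more than $\delta$, that is, if

    $Adv[X, v_{aux}, x] - Aux[X, v_{aux}, x] > \delta$

or

    $Adv[Y, u_{aux}, v_{aux}, y] - Aux[Y, u_{aux}, v_{aux}, y] > \delta$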
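The breach condition compares the adversary's posterior belief with the prior from the aggregate auxiliary information. A small sketch with hypothetical numbers (the function name and the values are illustrative assumptions, not figures from the presentation):

```python
def privacy_breached(adv, aux, delta):
    """True if the posterior belief exceeds the prior by more than delta."""
    return adv - aux > delta

# Hypothetical values: prior 0.10, privacy parameter 0.20.
breached = privacy_breached(adv=0.67, aux=0.10, delta=0.20)      # gain 0.57
not_breached = privacy_breached(adv=0.25, aux=0.10, delta=0.20)  # gain 0.15
```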

Page 14: De-anonymizing Social Networks

Model – Measuring Success of an Attack

Let $V_{mapped} = \{v \in V_{aux} : \mu_G(v) \neq \perp\}$. The success rate of a de-anonymization algorithm outputting a probabilistic mapping $\tilde{\mu}$, w.r.t. a centrality measure $\nu$, is the probability that $\mu$ sampled from $\tilde{\mu}$ maps a node $v$ to $\mu_G(v)$ if $v$ is selected according to $\nu$:

    $\frac{\sum_{v \in V_{mapped}} PR[\mu(v) = \mu_G(v)]\, \nu(v)}{\sum_{v \in V_{mapped}} \nu(v)}$
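The success rate is a centrality-weighted average of per-node success probabilities over the mapped nodes. A minimal sketch, assuming the per-node probabilities have already been estimated and using degree as a stand-in centrality measure in the example (both are assumptions made for illustration):

```python
def success_rate(pr_correct, centrality):
    """Centrality-weighted success rate over the mapped nodes.

    pr_correct : dict v -> Pr[mu(v) == mu_G(v)], for v in V_mapped
    centrality : dict v -> nu(v), a non-negative weight
    """
    total = sum(centrality[v] for v in pr_correct)
    if total == 0:
        return 0.0
    weighted = sum(pr_correct[v] * centrality[v] for v in pr_correct)
    return weighted / total

# Hypothetical example: the high-centrality node is mapped correctly.
rate = success_rate({"a": 1.0, "b": 0.0}, {"a": 3, "b": 1})
```

With uniform weights the measure reduces to the plain fraction of correctly mapped nodes; weighting by centrality gives more credit for re-identifying well-connected nodes.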

Page 16: De-anonymizing Social Networks

De-anonymization Algorithm

Seed identification: apply individual auxiliary information

Propagation: apply aggregate auxiliary information

Page 17: De-anonymizing Social Networks

Algorithm - Seed Identification

Input:
  • The target graph
  • A clique of k nodes that are present in both the auxiliary and the target graphs
  • The degree values of the k nodes
  • $\binom{k}{2}$ pairs of common-neighbor counts
  • Error parameter ε

Output: a k-clique in the target graph with matching (within a factor of $1 \pm \varepsilon$) node degrees and common-neighbor counts.
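The seed search can be illustrated with a naive brute-force version: try every ordered k-tuple of target nodes and keep those that form a clique and whose degrees and pairwise common-neighbor counts fall within a factor of 1 ± ε of the attacker's values. This sketch assumes an undirected target graph stored as a dictionary of neighbor sets; the function names are hypothetical and the exhaustive enumeration is only practical for tiny graphs.

```python
from itertools import permutations

def close(observed, expected, eps):
    """True if observed is within a factor of (1 ± eps) of expected."""
    return (1 - eps) * expected <= observed <= (1 + eps) * expected

def find_seed_clique(adj, degrees, common, eps):
    """Return all ordered k-cliques in adj matching the auxiliary measurements.

    adj     : dict node -> set of neighbours (undirected target graph)
    degrees : list of k expected degrees, one per seed node
    common  : dict (i, j) with i < j -> expected common-neighbour count
    """
    k = len(degrees)
    matches = []
    for cand in permutations(adj, k):
        if not all(close(len(adj[c]), d, eps) for c, d in zip(cand, degrees)):
            continue
        ok = True
        for i in range(k):
            for j in range(i + 1, k):
                u, v = cand[i], cand[j]
                if v not in adj[u]:  # every pair must be connected
                    ok = False
                elif not close(len(adj[u] & adj[v]), common[(i, j)], eps):
                    ok = False
        if ok:
            matches.append(cand)
    return matches

# Hypothetical target graph: a triangle 0-1-2 with a few pendant nodes.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 5}, 2: {0, 1}, 3: {0}, 4: {0}, 5: {1}}
seeds = find_seed_clique(adj, [4, 3, 2], {(0, 1): 1, (0, 2): 1, (1, 2): 1}, 0.0)
```

A single returned tuple means the seeds are identified unambiguously; several tuples mean the auxiliary information is not yet specific enough.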

Page 18: De-anonymizing Social Networks

Algorithm - Propagation

  • Inputs: graphs G1 and G2, and a seed mapping $\mu_S$
  • Output: a mapping μ
  • Iteratively find new mappings using the topological structure of the network and the feedback from previously constructed mappings.

Page 19: De-anonymizing Social Networks

Algorithm - Propagation

function propagationStep(lgraph, rgraph, mapping)
    for lnode in lgraph.nodes:
        scores[lnode] = matchScores(lgraph, rgraph, mapping, lnode)
        if eccentricity(scores[lnode]) < theta: continue
        rnode = (pick node from rgraph.nodes where
                 scores[lnode][node] = max(scores[lnode]))

        scores[rnode] = matchScores(rgraph, lgraph, invert(mapping), rnode)
        if eccentricity(scores[rnode]) < theta: continue
        reverse_match = (pick node from lgraph.nodes where
                         scores[rnode][node] = max(scores[rnode]))

        if reverse_match != lnode: continue

        mapping[lnode] = rnode
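The pseudocode leaves `matchScores` and `eccentricity` undefined. The runnable sketch below fills them in under stated assumptions: graphs are undirected dictionaries of neighbor sets, the score of a candidate counts already-mapped neighbors and is divided by the square root of the candidate's degree, the standard deviation is the population one, and nodes that are already mapped are skipped. These choices are illustrative simplifications, not necessarily the exact scoring used in [1].

```python
from statistics import pstdev

def eccentricity(scores):
    """(max - second max) / standard deviation of the score values."""
    vals = sorted(scores.values(), reverse=True)
    if len(vals) < 2:
        return 0.0
    sd = pstdev(vals)
    return (vals[0] - vals[1]) / sd if sd > 0 else 0.0

def match_scores(lgraph, rgraph, mapping, lnode):
    """Score each unmapped rgraph node by mapped neighbours shared with lnode."""
    taken = set(mapping.values())
    scores = {r: 0.0 for r in rgraph if r not in taken}
    for lnbr in lgraph[lnode]:
        if lnbr not in mapping:
            continue
        for rnode in rgraph[mapping[lnbr]]:
            if rnode in scores:
                # Assumed normalization: down-weight high-degree candidates.
                scores[rnode] += 1.0 / len(rgraph[rnode]) ** 0.5
    return scores

def propagation_step(lgraph, rgraph, mapping, theta):
    """One pass over lgraph; extends mapping in place and returns it."""
    for lnode in lgraph:
        if lnode in mapping:
            continue
        scores = match_scores(lgraph, rgraph, mapping, lnode)
        if not scores or eccentricity(scores) < theta:
            continue
        rnode = max(scores, key=scores.get)
        # Reverse check: rnode's best match in lgraph must be lnode.
        inverse = {r: l for l, r in mapping.items()}
        back = match_scores(rgraph, lgraph, inverse, rnode)
        if not back or eccentricity(back) < theta:
            continue
        if max(back, key=back.get) != lnode:
            continue
        mapping[lnode] = rnode
    return mapping

# Hypothetical isomorphic toy graphs with two seed pairs.
lgraph = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b"}, "d": {"b"}}
rgraph = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2}}
result = propagation_step(lgraph, rgraph, {"a": 1, "b": 2}, theta=0.5)
```

In this toy run node "c" is matched to 3, while "d" is left unmapped because only one candidate remains and a single score has no eccentricity; the step errs on the side of rejecting a match.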

Page 20: De-anonymizing Social Networks

Algorithm - Propagation

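  • Eccentricity: measures how much the best-scoring node “stands out” from the rest of the nodes. For a set of mapping scores X:

    $\frac{\max(X) - \max_2(X)}{\sigma(X)}$

    where max and max2 are the highest and second-highest values in X and σ is the standard deviation.
  • The algorithm rejects the match if the eccentricity of the set of mapping scores is below a threshold.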
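A short numeric illustration of eccentricity, assuming the population standard deviation is used for σ (the slide does not say which variant is meant) and using made-up score lists:

```python
from statistics import pstdev

def eccentricity(values):
    """Gap between the two largest values, in units of standard deviation."""
    if len(values) < 2:
        return 0.0
    top = sorted(values, reverse=True)
    sd = pstdev(values)
    return (top[0] - top[1]) / sd if sd > 0 else 0.0

peaked = [10, 4, 3, 3]  # one candidate clearly ahead of the rest
flat = [5, 5, 4, 4]     # the top two candidates tie

ecc_peaked = eccentricity(peaked)  # 6 / sqrt(8.5), roughly 2.06
ecc_flat = eccentricity(flat)      # 0: no candidate stands out
```

With a threshold between the two values, the first score list would be accepted as a match and the second rejected.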

Page 21: De-anonymizing Social Networks

Algorithm - Propagation

  • Complexity: $O((|E_1| + |E_2|)\, d_1 d_2)$
  • $d_1$: a bound on the degree of the nodes in $V_1$; $d_2$: likewise for the nodes in $V_2$

Page 23: De-anonymizing Social Networks

Experiments – Data Sets

Twitter, Flickr, LiveJournal:

Page 24: De-anonymizing Social Networks

Experiments – Seed Identification

  • Evaluate the feasibility of seed identification by measuring how much auxiliary information is needed to identify a unique node in the target graph.
  • The LiveJournal graph serves as both the auxiliary and the target graph.
  • Construct 4-cliques, and treat a 4-clique in the target graph as a match as long as each degree and common-neighbor count matches within a factor of $1 \pm \varepsilon$.

Page 25: De-anonymizing Social Networks

Experiments – Seed Identification

Page 26: De-anonymizing Social Networks

Experiments – Propagation

  • Evaluate the robustness against perturbation and seed selection.
  • Pairs of subgraphs (V1, V2) of a real-world SN, each with over 100,000 nodes:
      • One serves as the auxiliary SN, the other as the target SN
      • Perturbation strategy: the two subgraphs overlap in 25% of their nodes and 50% of their edges

Page 27: De-anonymizing Social Networks

Evaluate the robustness against perturbation and seed selection

Page 28: De-anonymizing Social Networks

Experiments – Propagation

  • Mapping between two real-world social networks: Flickr and Twitter
  • Finding the ground truth $\mu_G$:
      • Exact matches in either the username or the name field
      • 27,000 mappings
      • Human inspection puts the ground-truth error rate under 5%

Page 29: De-anonymizing Social Networks

Mapping between two real-world social networks

  • Seeds: 150 pairs of nodes selected from $\mu_G$
  • Results:
      • 30.8% of the mappings were re-identified correctly, 12.1% were identified incorrectly, and 57% were not identified.
      • 41% of the incorrectly identified mappings (5% overall) were mapped to nodes which are at a distance 1 from the true mapping.
      • 55% of the incorrectly identified mappings (6.7% overall) were mapped to nodes where the same geographic location was reported.
      • The above two categories overlap; of all the incorrect mappings, only 27% (or 3.3% overall) fall into neither category and are completely erroneous.
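The "overall" percentages follow from multiplying each share of the incorrect mappings by the 12.1% incorrect rate. The sketch below reproduces them and derives two figures that are not on the slide but follow arithmetically from its numbers: the share of incorrect mappings falling into both categories (by inclusion-exclusion) and the fraction of attempted identifications that were correct.

```python
correct, incorrect = 30.8, 12.1  # percent of all ground-truth mappings

# Shares of the incorrect mappings, in percent, as reported on the slide.
shares = {"distance 1": 41, "same location": 55, "neither": 27}
overall = {k: round(incorrect * pct / 100, 1) for k, pct in shares.items()}

# Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|,
# where |A or B| = 100 - neither.
both = shares["distance 1"] + shares["same location"] - (100 - shares["neither"])

# Derived, not stated on the slide: correct share among identified mappings.
precision = round(100 * correct / (correct + incorrect), 1)
```

Under these numbers about 23% of the incorrect mappings are both one hop away from the true node and share its reported location, and roughly 72% of the mappings the algorithm did output were correct.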

Page 30: De-anonymizing Social Networks

Conclusions

  • Anonymity is not sufficient for privacy when dealing with social networks.
  • The work demonstrates the feasibility of successful re-identification based solely on the network topology, assuming that the target graph is completely anonymized.

Page 31: De-anonymizing Social Networks

Reference

[1] Arvind Narayanan and Vitaly Shmatikov, “De-anonymizing Social Networks,” IEEE Symposium on Security and Privacy, 2009.