Reconstruction from Randomized Graph via Low Rank Approximation
Leting Wu, Xiaowei Ying, Xintao Wu
Dept. of Software and Information Systems, Univ. of N.C. Charlotte
Slide 2
Outline
  Background & Motivation
  Low Rank Approximation on Graph Data
  Reconstruction from Randomized Graph
  Evaluation
  Privacy Issue
Slide 3
Background & Motivation
Slide 4
Background
When network data is published or outsourced for mining and analysis, pure anonymization is not enough to protect privacy because of topology-based attacks (active/passive attacks, subgraph attacks).
Graph randomization/perturbation:
  Random Add/Del edges (number of edges unchanged)
  Random Switch edges (node degrees unchanged)
  Feature-preserving randomization
    Spectrum-preserving randomization
    Feature preserving via Markov-chain-based graph generation
Clustering: grouping subgraphs into supernodes
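The two edge-based strategies listed on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `rand_add_del`, `rand_switch`, and `degrees` are hypothetical, and the graph is assumed to be undirected, simple, and stored as a set of `(u, v)` pairs with `u < v`.

```python
import random

def rand_add_del(edges, n, k, rng):
    """Rand Add/Del: delete k existing edges and add k absent ones (edge count unchanged)."""
    edges = set(edges)
    if k > len(edges) or k > n * (n - 1) // 2 - len(edges):
        raise ValueError("k is too large for this graph")
    kept = edges - set(rng.sample(sorted(edges), k))
    added = set()
    while len(added) < k:
        u, v = rng.sample(range(n), 2)
        e = (min(u, v), max(u, v))
        if e not in edges:          # only add edges absent from the original graph
            added.add(e)
    return kept | added

def rand_switch(edges, k, rng, max_tries=10000):
    """Rand Switch: rewire pairs of edges so that every node keeps its degree."""
    edges = set(edges)
    done = tries = 0
    while done < k and tries < max_tries:
        tries += 1
        (a, b), (c, d) = rng.sample(sorted(edges), 2)
        if rng.random() < 0.5:      # pick one of the two possible rewirings
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue                # the two edges share a node
        e1 = (min(a, d), max(a, d))
        e2 = (min(c, b), max(c, b))
        if e1 in edges or e2 in edges:
            continue                # rewiring would create a duplicate edge
        edges -= {(a, b), (min(c, d), max(c, d))}
        edges |= {e1, e2}
        done += 1
    return edges

def degrees(edges, n):
    """Degree of every node 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

# Toy usage on a ring of 8 nodes (an assumed example graph).
ring = {(min(i, (i + 1) % 8), max(i, (i + 1) % 8)) for i in range(8)}
rng = random.Random(0)
perturbed = rand_add_del(ring, 8, 3, rng)
switched = rand_switch(ring, 2, rng)
```

On the toy ring, `perturbed` has the same number of edges as the input, and `switched` has the same degree sequence as the input.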
Slide 5
Motivation
Our focus: can we reconstruct a graph from the randomized graph such that the reconstruction is closer to the original graph than the randomized graph is?
Slide 6
Low Rank Approximation on Graph Data
Slide 7
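Adjacency Matrix & Its Eigen-Decomposition
Matrix representation of a network: adjacency matrix A (symmetric)
Eigen-decomposition: $A = \sum_{i=1}^{n} \lambda_i x_i x_i^T$, where $\lambda_i$ is the i-th eigenvalue and $x_i$ the corresponding eigenvector
Question: How do the eigen-pairs relate to graph topology?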
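As a small numerical illustration of the eigen-decomposition of a symmetric adjacency matrix, the sketch below uses NumPy on an assumed toy graph (a path on 4 nodes). The helper name `eigen_decompose` is hypothetical.

```python
import numpy as np

def eigen_decompose(A):
    """Eigenvalues (in descending order) and matching eigenvectors of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(A)      # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Adjacency matrix of a path graph 0 - 1 - 2 - 3.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

vals, vecs = eigen_decompose(A)

# Rebuild A as the sum of lambda_i * x_i * x_i^T over all eigen-pairs.
A_rebuilt = (vecs * vals) @ vecs.T
```

Because the diagonal of an adjacency matrix is zero, the eigenvalues sum to zero, so a graph with at least one edge has both positive and negative eigenvalues.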
Slide 8
Leading Eigenpairs vs. Graph Topology
What are the roles of positive and negative eigen-pairs in graph topology?
Without loss of generality, we partition the node set into two groups, so the adjacency matrix can be partitioned as
$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{12}^T & A_{22} \end{pmatrix}$
where $A_{11}$ and $A_{22}$ represent the edges within the two groups and $A_{12}$ represents the edges between the groups.
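To make the two-group partition concrete, the sketch below compares two assumed extreme cases on 6 nodes: a graph with edges only within the groups (two disjoint triangles) and a graph with edges only between the groups (complete bipartite). The variable names are illustrative and the graphs are toy examples, not data from the talk.

```python
import numpy as np

J = np.ones((3, 3))
K3 = J - np.eye(3)          # triangle: all edges inside one group of 3 nodes
Z = np.zeros((3, 3))

# Only within-group blocks are non-zero.
within = np.block([[K3, Z], [Z, K3]])
# Only the between-group block is non-zero.
between = np.block([[Z, J], [J, Z]])

ev_within = np.sort(np.linalg.eigvalsh(within))[::-1]
ev_between = np.sort(np.linalg.eigvalsh(between))[::-1]

print(np.round(ev_within, 6))    # eigenvalues 2, 2, -1, -1, -1, -1
print(np.round(ev_between, 6))   # eigenvalues 3, 0, 0, 0, 0, -3
```

In these toy cases the within-group graph has two large positive eigenvalues, while the between-group graph has one large positive and one equally large negative eigenvalue.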
Slide 9
Leading Eigenpairs vs. Graph Topology
[Figure panels: Original, r = 1, r = 2]
Slide 10
Leading Eigenpairs vs. Graph Topology
[Figure panels: Original, r = 1, r = 2]
Slide 11
Leading Eigenpairs vs. Graph Topology
[Figure panels: Original, r = 1, r = 2, r = 4]
Slide 12
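Low Rank Approximation on Graph Data
Low rank approximation: $A_r = \sum_{i=1}^{r} \lambda_i x_i x_i^T$, where $(\lambda_i, x_i)$ are the leading eigen-pairs of A
This provides a best rank-r approximation to A.
To keep the structure of an adjacency matrix, the approximation is then discretized into a 0-1 matrix.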
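A minimal sketch of low rank approximation followed by discretization is given below. Several details are assumptions, since the formulas did not survive on this slide: eigen-pairs are ranked by absolute eigenvalue, entries are rounded with a threshold of 0.5, and the diagonal is forced to zero. The function name `low_rank_reconstruct` is hypothetical.

```python
import numpy as np

def low_rank_reconstruct(A, r, threshold=0.5):
    """Rank-r approximation of a symmetric matrix, then rounding to a 0-1 matrix."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(-np.abs(vals))[:r]          # ASSUMPTION: rank by |eigenvalue|
    A_r = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    A_r = (A_r + A_r.T) / 2                      # remove floating-point asymmetry
    A_hat = (A_r >= threshold).astype(int)       # ASSUMPTION: threshold of 0.5
    np.fill_diagonal(A_hat, 0)                   # ASSUMPTION: no self-loops
    return A_r, A_hat

# Toy graph: two disjoint cliques of 4 nodes each.
K4 = np.ones((4, 4)) - np.eye(4)
Z4 = np.zeros((4, 4))
A = np.block([[K4, Z4], [Z4, K4]])

A_r, A_hat = low_rank_reconstruct(A, r=2)
```

On this toy graph two eigen-pairs are enough: every within-clique entry of `A_r` equals 0.75 and every between-clique entry is 0, so rounding recovers the original adjacency matrix.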
Slide 13
Reconstruction from Randomized Graph
Slide 14
Reconstructed Features (Political Blogs, Rand Add/Del 40% of Edges)
Slide 15
Determine Number of Eigen-pairs
Question: How do we choose an optimal rank r for reconstruction?
Solution: Choose as the indicator a feature that is closely related to the other features and for which an explicit moment estimator exists. The estimator is expressed in terms of m, the number of edges, and k, the number of edges added/deleted.
Slide 16
Algorithm
Slide 17
Evaluation
Slide 18
Effect of Noise (Political Blogs)
The method works well up to a certain level of noise.
Even with a high level of noise, the reconstructed features are still closer to the original than the randomized ones.
Slide 19
Reconstructed Features on 3 Real Network Data Sets
Reconstruction quality: when the quality measure is positive, the reconstructed features are closer to the original ones than the randomized ones.
It is positive for all three data sets.
Slide 20
Privacy Issue
Slide 21
Privacy Issue
Question 1: Can this reconstruction be used by attackers?
Define the normalized Frobenius distance between A and the reconstructed graph.
[Figure: normalized F norm for Political Books, Enron, and Political Blogs]
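A distance of this kind could be computed as in the sketch below. The exact normalization used in the talk is not shown on the slide, so dividing by the Frobenius norm of A is an assumption, and the function name `normalized_frobenius_distance` is hypothetical.

```python
import numpy as np

def normalized_frobenius_distance(A, A_hat):
    """||A - A_hat||_F / ||A||_F for two adjacency matrices of the same size."""
    # ASSUMPTION: normalization by ||A||_F; the slide's own definition is not available.
    return np.linalg.norm(A - A_hat, "fro") / np.linalg.norm(A, "fro")

# Toy example: a triangle, and a copy with the edge (0, 1) missing.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)
A_hat = A.copy()
A_hat[0, 1] = A_hat[1, 0] = 0

d = normalized_frobenius_distance(A, A_hat)   # sqrt(2) / sqrt(6)
```

Under this assumed definition, a distance of 0 means the reconstruction matches the original exactly, so smaller values indicate a larger disclosure risk.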
Slide 22
Privacy Issue
Question 2: Which types of graphs would have their privacy breached?
For low rank graphs, the distance between the reconstructed graph and the original graph can be very small.
Slide 23
Synthetic Low Rank Graphs
For a set of synthetic low rank graphs generated from Political Blogs, the reconstruction works well in terms of both the distance and the features.
Slide 24
Conclusion
We show the relationship between graph topological structure and the eigen-pairs of the adjacency matrix.
We propose a reconstruction algorithm based on low rank approximation, with a novel solution for determining the optimal rank.
For most social networks, our algorithm does not incur further disclosure risks to individual privacy, except for networks with low rank or a small number of dominant eigenvalues.
Slide 25
Questions?
Acknowledgments: This work was supported in part by U.S. National Science Foundation grants IIS-0546027 and CNS-0831204.
Thank You!
Slide 26
Background
Publish/outsource data for mining/analysis
[Diagram: the data owner releases or publishes the original graph data to the public, third parties, or research institutions; the released data is under attack]
Privacy: protect sensitive data (identity, relationship, sensitive attributes)
Utility: preserve features/patterns/distributions of data
Slide 27
Background
Spectral filter for numerical data: derive an estimate of U from the perturbed data
  Calculate the covariance matrix of the perturbed data
  Apply spectral decomposition to the covariance matrix
  Derive the eigenvalue information from the covariance matrix of the noise V and choose a proper number of dimensions, r
  Use the r selected eigen components to obtain the estimated data set
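The steps above can be sketched as follows for numerical data. This is an illustration under stated assumptions rather than the exact procedure from the slide: the noise is assumed to be i.i.d. with known variance, r is taken as the number of covariance eigenvalues above `factor * noise_var` (a hypothetical rule), and the estimate is obtained by projecting the centered perturbed data onto the r kept eigenvectors. The name `spectral_filter` is also hypothetical.

```python
import numpy as np

def spectral_filter(U_p, noise_var, factor=2.0):
    """Estimate the original data from perturbed data U_p = U + V."""
    mean = U_p.mean(axis=0)
    X = U_p - mean
    C = np.cov(X, rowvar=False)              # covariance matrix of the perturbed data
    vals, vecs = np.linalg.eigh(C)           # spectral decomposition
    keep = vals > factor * noise_var         # ASSUMPTION: simple threshold rule for r
    Q_r = vecs[:, keep]
    U_hat = X @ Q_r @ Q_r.T + mean           # project onto the kept eigenvectors
    return U_hat, int(keep.sum())

# Synthetic data: 500 records lying in a 2-dimensional subspace of a 10-dimensional space.
rng = np.random.default_rng(0)
basis, _ = np.linalg.qr(rng.normal(size=(10, 2)))     # 10 x 2, orthonormal columns
U = 3.0 * rng.normal(size=(500, 2)) @ basis.T
V = rng.normal(scale=0.5, size=U.shape)               # additive noise, variance 0.25
U_p = U + V

U_hat, r = spectral_filter(U_p, noise_var=0.25)
err_perturbed = np.linalg.norm(U_p - U)
err_filtered = np.linalg.norm(U_hat - U)
```

Projection discards the noise energy that lies outside the kept eigenvectors, which is why the filtered data can be closer to U than the perturbed data.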
Slide 28
New Challenges
A is a 0-1 adjacency matrix, whereas U is a numerical matrix.
The covariance matrix is positive semi-definite and so has only non-negative eigenvalues, whereas A has both positive and negative eigenvalues.
A covariance matrix cannot be defined for graph data.
The strategy used to determine the number of eigen components for numerical data does not work for graph data, since the first eigenvalue of the noise matrix could be very large.
Slide 29
Data Sets
Political Blogs
  Based on incoming and outgoing links and posts around the time of the 2004 presidential election
  16714 links among 1222 US political blogs
Political Books
  Based on the political books sold by Amazon.com, where nodes represent books and edges represent co-purchasing of books
  105 nodes and 441 edges
Enron
  Based on the email corpus of a real organization covering a 3-year period, where an edge indicates that at least 5 emails were sent between two people
  151 nodes and 869 edges
Slide 30
Future Work
Study whether a similar LRA reconstruction can be derived for other edge-based perturbation strategies such as Rand Switch and K-Anonymity.
Reconstruction of distributions from networked data
  Distribution of networked data?
  Randomization mechanism
Privacy vs. utility (in general social networks and with background knowledge attacks)
Spectral analysis of graph topology (signed/weighted/directed graphs)