Leting Wu Xiaowei Ying, Xintao Wu Dept. Software and Information Systems Univ. of N.C. – Charlotte...
- Slide 1
- Reconstruction from Randomized Graph via Low Rank Approximation.
Leting Wu, Xiaowei Ying, Xintao Wu. Dept. of Software and
Information Systems, Univ. of N.C. at Charlotte.
- Slide 2
- Outline: Background & Motivation; Low Rank Approximation on
Graph Data; Reconstruction from Randomized Graph; Evaluation;
Privacy Issue
- Slide 3
- Background & Motivation
- Slide 4
- Background: In the process of publishing/outsourcing network
data for mining/analysis, pure anonymization is not enough to
protect privacy because of topology-based attacks (active/passive
attacks, subgraph attacks). Graph randomization/perturbation
approaches: Random Add/Del edges (number of edges unchanged);
Random Switch edges (node degrees unchanged); feature-preserving
randomization; spectrum-preserving randomization; feature
preservation via Markov-chain based graph generation; clustering
(grouping subgraphs into supernodes).
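To make the Random Add/Del scheme concrete, here is a minimal sketch, assuming an undirected graph without self-loops whose nodes are labeled 0..n-1. The function name `rand_add_del` and its parameters are illustrative, not taken from the slides. It deletes k existing edges and adds k pairs that were not edges, so the number of edges is unchanged.

```python
import random

def rand_add_del(edges, n, k, seed=None):
    """Delete k true edges and add k false edges (edge count unchanged).

    Assumes an undirected graph on nodes 0..n-1 with no self-loops,
    k <= number of edges, and k <= number of non-adjacent node pairs.
    """
    rng = random.Random(seed)
    # Store each undirected edge once as (smaller, larger).
    edge_set = {(min(u, v), max(u, v)) for u, v in edges}
    # Delete k existing edges chosen uniformly at random.
    removed = set(rng.sample(sorted(edge_set), k))
    kept = edge_set - removed
    # Add k node pairs that were not edges in the original graph.
    added = set()
    while len(added) < k:
        u, v = rng.sample(range(n), 2)
        pair = (min(u, v), max(u, v))
        if pair not in edge_set:
            added.add(pair)
    return kept | added
```

With k = 2 on a 5-edge path graph, the output still has 5 edges and shares exactly 3 of them with the original.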
- Slide 5
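- Motivation (Our Focus): We focus on whether we can reconstruct a
graph from the randomized graph such that its features are closer
to those of the original graph than the randomized graph's are.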
- Slide 6
- Low Rank Approximation on Graph Data
- Slide 7
- Adjacency Matrix & Its Eigen-Decomposition. Matrix
representation of a network: adjacency matrix A (symmetric).
Eigen-decomposition: $A = \sum_{i=1}^{n} \lambda_i x_i x_i^T$,
where $(\lambda_i, x_i)$ are the eigenpairs of A. Question: what
are their relations with graph topology?
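As a small numerical check of the eigen-decomposition of a symmetric adjacency matrix, the sketch below rebuilds A from its eigenpairs with NumPy. The 4-node graph is a made-up example, not one of the data sets in the slides.

```python
import numpy as np

# Hypothetical 4-node undirected graph: a triangle (0, 1, 2) plus the edge (2, 3).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# eigh is meant for symmetric matrices; eigenvalues come back in ascending
# order and the eigenvectors are the columns of eigvecs.
eigvals, eigvecs = np.linalg.eigh(A)

# Rebuild A as the sum of lambda_i * x_i * x_i^T over all eigenpairs.
recon = sum(lam * np.outer(x, x) for lam, x in zip(eigvals, eigvecs.T))
```

Because the diagonal of an adjacency matrix is zero, the eigenvalues sum to zero, so any graph with at least one edge has both positive and negative eigenvalues.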
- Slide 8
- Leading Eigenpairs vs. Graph Topology. What is the role of
positive and negative eigenpairs in graph topology? Without loss
of generality, we partition the node set into two groups, and the
adjacency matrix can be partitioned as
$A = \begin{pmatrix} A_1 & B \\ B^T & A_2 \end{pmatrix}$,
where $A_1$ and $A_2$ represent the edges within the two groups
and $B$ represents the edges between the groups.
- Slide 9
- Leading Eigenpairs vs. Graph Topology (figure panels: Original,
r = 1, r = 2)
- Slide 10
- Leading Eigenpairs vs. Graph Topology (figure panels: Original,
r = 1, r = 2)
- Slide 11
- Leading Eigenpairs vs. Graph Topology (figure panels: Original,
r = 1, r = 2, r = 4)
- Slide 12
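- Low Rank Approximation on Graph Data. Low rank approximation:
$A_r = \sum_{i=1}^{r} \lambda_i x_i x_i^T$, using the r leading
eigenpairs. This provides a best rank-r approximation to A. To
keep the structure of an adjacency matrix, the entries of the
approximation are then discretized to 0/1.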
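The rank-r approximation and the 0/1 discretization can be sketched as follows. Two choices here are assumptions rather than details from the slides: the "leading" eigenpairs are taken to be those with the largest absolute eigenvalues (which gives the best rank-r approximation in Frobenius norm by the Eckart-Young theorem), and entries are rounded with a cutoff of 0.5. The function name `low_rank_reconstruct` is illustrative.

```python
import numpy as np

def low_rank_reconstruct(A, r, threshold=0.5):
    """Return (A_r, A_hat): the rank-r approximation and its 0/1 discretization.

    Assumptions: leading eigenpairs are ranked by |lambda|, and the
    0.5 threshold is a placeholder for the discretization rule.
    """
    eigvals, eigvecs = np.linalg.eigh(A)
    # Indices of the r eigenvalues with the largest magnitude.
    idx = np.argsort(-np.abs(eigvals))[:r]
    # Sum of lambda_i * x_i * x_i^T over the selected eigenpairs.
    A_r = (eigvecs[:, idx] * eigvals[idx]) @ eigvecs[:, idx].T
    # Remove floating-point asymmetry before thresholding.
    A_r = (A_r + A_r.T) / 2
    A_hat = (A_r >= threshold).astype(int)
    # An adjacency matrix of a simple graph has no self-loops.
    np.fill_diagonal(A_hat, 0)
    return A_r, A_hat
```

Calling it with r equal to the number of nodes returns the original adjacency matrix, which is a convenient sanity check before trying smaller ranks.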
- Slide 13
- Reconstruction from Randomized Graph
- Slide 14
- Reconstructed Features (Political Blogs, Rand Add/Del 40% of
Edges)
- Slide 15
- Determine Number of Eigenpairs. Question: how to choose an
optimal rank r for reconstruction? Solution: choose one graph
feature as the indicator, since it is closely related to the other
features and there exists an explicit moment estimator for it,
where m is the number of edges and k is the number of edges
added/deleted.
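The idea of using an indicator feature to pick the rank can be sketched as a simple scan over candidate ranks. The text above does not name the indicator or give its estimator, so both are inputs here: `feature` is any function of a 0/1 adjacency matrix and `target` is the estimated value of that feature for the original graph. The triangle count is only a hypothetical stand-in, and the reconstruction helper assumes a 0.5 rounding threshold and eigenpairs ranked by absolute eigenvalue.

```python
import numpy as np

def reconstruct(A_tilde, r, threshold=0.5):
    """Rank-r approximation of A_tilde, discretized to 0/1 (assumed rule)."""
    eigvals, eigvecs = np.linalg.eigh(A_tilde)
    idx = np.argsort(-np.abs(eigvals))[:r]
    A_r = (eigvecs[:, idx] * eigvals[idx]) @ eigvecs[:, idx].T
    A_r = (A_r + A_r.T) / 2
    A_hat = (A_r >= threshold).astype(int)
    np.fill_diagonal(A_hat, 0)
    return A_hat

def triangle_count(A):
    """Hypothetical indicator: number of triangles, trace(A^3) / 6."""
    return int(np.trace(np.linalg.matrix_power(A, 3))) // 6

def choose_rank(A_tilde, feature, target, max_rank):
    """Return (r, gap) where r minimizes |feature(reconstruction) - target|."""
    best_r, best_gap = None, float("inf")
    for r in range(1, max_rank + 1):
        gap = abs(feature(reconstruct(A_tilde, r)) - target)
        if gap < best_gap:
            best_r, best_gap = r, gap
    return best_r, best_gap
```

Ties go to the smallest rank, which keeps the reconstruction as low-rank as the indicator allows.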
- Slide 16
- Algorithm
- Slide 17
- Evaluation
- Slide 18
- Effect of Noise (Political Blogs). The method works well up to a
certain level of noise. Even with a high level of noise, the
reconstructed features are still closer to the original than the
randomized ones.
- Slide 19
- Reconstructed Features on 3 real network data sets.
Reconstruction quality: when it is positive, the reconstructed
features are closer to the original ones than the randomized ones.
It is positive for all three data sets.
- Slide 20
- Privacy Issue
- Slide 21
- Privacy Issue. Question 1: Can this reconstruction be used by
attackers? Define the normalized Frobenius distance between A and
the reconstructed graph. (Figure: normalized F norm for Political
Books, Enron, and Political Blogs.)
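A normalized Frobenius distance between two adjacency matrices can be computed as below. The normalization used here, dividing by the Frobenius norm of A, is an assumption, since the exact definition is not given in the text above. The function name is illustrative.

```python
import numpy as np

def normalized_frobenius_distance(A, B):
    """||A - B||_F / ||A||_F (assumed normalization; A must have an edge)."""
    return np.linalg.norm(A - B, "fro") / np.linalg.norm(A, "fro")

# Hypothetical example: a 4-edge graph and a copy with edge (2, 3) removed.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
B = A.copy()
B[2, 3] = B[3, 2] = 0.0
d = normalized_frobenius_distance(A, B)
```

For 0/1 matrices the numerator is the square root of the number of differing entries, so a small value means the two graphs disagree on few edges.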
- Slide 22
- Privacy Issue. Question 2: Which type of graphs would have
privacy breached? For low rank graphs (those with a small number
of dominant eigenvalues), the distance between the reconstructed
graph and the original graph can be very small.
- Slide 23
- Synthetic Low Rank Graphs. For a set of synthetic low rank
graphs generated from Political Blogs, the reconstruction works
well on both the distance and the features.
- Slide 24
- Conclusion. We show the relationship between graph topological
structure and the eigenpairs of the adjacency matrix. We propose a
low rank approximation based reconstruction algorithm with a novel
solution to determine the optimal rank. For most social networks,
our algorithm does not incur further disclosure risks of
individual privacy, except for networks with low ranks or a small
number of dominant eigenvalues.
- Slide 25
- Questions? Acknowledgments: This work was supported in part by
U.S. National Science Foundation IIS-0546027 and CNS-0831204.
Thank You!
- Slide 26
- Background: Publish/outsource data for mining/analysis. The data
owner releases/publishes the original graph data to the public,
third parties, or research institutions, where it comes under
attack. Privacy: protect sensitive data (identity, relationship,
sensitive attributes). Utility: preserve
features/patterns/distributions of data.
- Slide 27
- Background: Spectral Filter for Numerical Data. Goal: derive an
estimation of U from the perturbed data. Steps: calculate the
covariance matrix of the perturbed data; apply spectral
decomposition to the covariance matrix; derive the eigenvalue
information from the covariance matrix of the noise V and choose a
proper number of dimensions, r; obtain the estimated data set by
projecting the perturbed data onto the first r eigenvectors.
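The spectral filter for numerical data can be sketched as a projection onto the leading eigenvectors of the covariance matrix of the perturbed data. This is a minimal sketch under stated assumptions: additive noise (perturbed data = U + V), mean-centering before the projection, and r supplied by the caller rather than derived from the noise eigenvalues. The names `spectral_filter` and `U_tilde` are illustrative.

```python
import numpy as np

def spectral_filter(U_tilde, r):
    """Estimate the original data from perturbed data U_tilde (rows = records).

    Projects the centered data onto the r eigenvectors of its covariance
    matrix with the largest eigenvalues, then restores the column means.
    """
    mean = U_tilde.mean(axis=0)
    centered = U_tilde - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Columns of Q_r are the eigenvectors of the r largest eigenvalues.
    Q_r = eigvecs[:, np.argsort(eigvals)[::-1][:r]]
    return centered @ Q_r @ Q_r.T + mean

# Synthetic check: 5-dimensional data driven by one latent factor, plus noise.
rng = np.random.default_rng(0)
w = np.array([1.0, 2.0, -1.0, 0.5, 1.5])
U = rng.normal(0.0, 3.0, size=(500, 1)) * w
V = rng.normal(0.0, 1.0, size=U.shape)
U_tilde = U + V
U_hat = spectral_filter(U_tilde, r=1)
```

In this synthetic setup the filtered data is closer to U than the perturbed data is, because the noise spread over the four discarded directions is removed.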
- Slide 28
- New Challenges. A is a 0-1 adjacency matrix, whereas U is a
numerical matrix. A covariance matrix is positive semidefinite and
has only non-negative eigenvalues, whereas A has both positive and
negative eigenvalues. We cannot define the covariance matrix for
graph data. The strategy for determining the number of eigen
components to use in numerical data does not work for graph data,
since the first eigenvalue of the noise matrix could be very
large.
- Slide 29
- Data Sets. Political Blogs: based on incoming and outgoing links
and posts around the time of the 2004 presidential election; 16714
links among 1222 US political blogs. Political Books: based on
political books sold by Amazon.com, where nodes represent the
books and edges represent co-purchasing of books; 105 nodes and
441 edges. Enron: based on the email corpus of a real organization
covering a 3-year period, where an edge means at least 5 emails
were sent between two people; 151 nodes and 869 edges.
- Slide 30
- Future Work. Study whether similar LRA reconstruction can be
derived for other edge-based perturbation strategies such as Rand
Switch and K-Anonymity. Reconstruction of distributions from
networked data: what is the distribution of networked data, and
what is the randomization mechanism? Privacy vs. utility (in
general social networks and with background knowledge attacks).
Spectral analysis of graph topology (signed/weighted/directed
graphs).