Automated Social Hierarchy Detection through Email Network Analysis (SNAKDD07)
Ryan Rowe, Germán Creamer, Shlomo Hershkop, Salvatore J Stolfo
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2008/12/11
Outline
• Introduction
• SNA algorithm
• Results and Discussion
• Conclusions and Future Work
Introduction
• The recent bankruptcy scandals at US companies such as Enron and WorldCom have increased the need to analyze electronic information
  – In order to define risk and identify any conflict of interest among the entities of a corporate household
• Identifying the relationships between entities, or the corporate hierarchy, is not a straightforward task
  – These relationships can be extracted by analyzing email communication data
SNA Algorithm
• For each mail user
  – Analyze and calculate several statistics for each feature of each user
• Construct an email network graph
  – Vertices represent accounts, edges represent communication between two accounts
  – Analyze cliques and other graph-theoretical properties
  – Combine the results into a social score
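The graph construction step can be sketched as follows. This is a minimal illustration, not the EMT implementation: the function name `build_email_graph` and the toy message log are hypothetical, and the graph is assumed to be undirected with one edge per pair of accounts that exchanged at least one message.

```python
from collections import defaultdict

def build_email_graph(messages):
    """Build an undirected email graph from (sender, recipient) pairs.

    Returns a dict mapping each account to the set of accounts it has
    exchanged mail with, plus a dict of message counts per edge.
    """
    neighbors = defaultdict(set)
    edge_counts = defaultdict(int)
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore self-addressed mail
        neighbors[sender].add(recipient)
        neighbors[recipient].add(sender)
        edge_counts[frozenset((sender, recipient))] += 1
    return dict(neighbors), dict(edge_counts)

# Hypothetical toy log: each tuple is (sender, recipient).
messages = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
    ("bob", "carol"),
]
graph, counts = build_email_graph(messages)
```

The adjacency-set representation keeps neighbor lookups cheap, which the clique and centrality statistics rely on.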
SNA Algorithm
• Two sets of statistics about a user’s “importance”
  – Average response time
    • The average time elapsed between a user sending an email to someone and later receiving an email from that same person
    • A received mail is considered a “response” if it follows a sent mail within three days
  – Cliques (maximal complete subgraphs)
    • Find all cliques in the graph
    • Assumption: users associated with a larger set and frequency of cliques will be ranked higher
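A rough sketch of the average response time statistic is given below. The three-day window comes from the slide; everything else is an assumption made for illustration: the function name, the `(timestamp, sender, recipient)` tuple layout, and the choice to pair each reply with the most recent unanswered message sent to that person.

```python
from datetime import datetime, timedelta

RESPONSE_WINDOW = timedelta(days=3)  # three-day cutoff stated on the slide

def average_response_time(user, messages):
    """Average delay between `user` mailing someone and hearing back.

    `messages` is an iterable of (timestamp, sender, recipient) tuples.
    A received message counts as a response only if it arrives within
    RESPONSE_WINDOW of the most recent unanswered message that `user`
    sent to that person. Returns a timedelta, or None if there were
    no responses.
    """
    pending = {}  # correspondent -> time of latest unanswered sent mail
    delays = []
    for ts, sender, recipient in sorted(messages):
        if sender == user and recipient != user:
            pending[recipient] = ts
        elif recipient == user and sender in pending:
            delay = ts - pending.pop(sender)
            if delay <= RESPONSE_WINDOW:
                delays.append(delay)
    if not delays:
        return None
    return sum(delays, timedelta()) / len(delays)

# Hypothetical toy log.
log = [
    (datetime(2001, 5, 1, 9, 0), "alice", "bob"),
    (datetime(2001, 5, 1, 11, 0), "bob", "alice"),    # reply after 2 hours
    (datetime(2001, 5, 2, 9, 0), "alice", "carol"),
    (datetime(2001, 5, 2, 13, 0), "carol", "alice"),  # reply after 4 hours
    (datetime(2001, 5, 3, 9, 0), "alice", "dave"),
    (datetime(2001, 5, 10, 9, 0), "dave", "alice"),   # 7 days: not a response
]
alice_avg = average_response_time("alice", log)
```

In this toy log only the two replies inside the window are averaged; the week-late message from dave is ignored.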
Cliques
Communication Networks
• Number of cliques
  – The number of cliques that the account is contained within
• Raw clique score
  – A score computed using the size of the clique set
• Weighted clique score
  – A score computed using the “importance” of the people in each clique
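The three clique statistics can be illustrated with a small sketch. Maximal cliques are enumerated with the basic Bron-Kerbosch recursion. The slides do not give the scoring formulas, so the ones used here are assumptions for illustration only: each clique of size n contributes 2^(n-1) to the raw score, and the weighted score multiplies that term by the summed importance of the clique's members. The `importance` values and the `min_size` parameter are hypothetical.

```python
def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    `graph` maps each vertex to the set of its neighbors.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))  # r cannot be extended further
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques

def clique_statistics(graph, importance, min_size=2):
    """Per-user clique count, raw score and weighted score.

    Assumed scoring (not taken from the slides): a clique of size n
    adds 2**(n - 1) to the raw score, and that same term times the
    summed importance of its members to the weighted score.
    """
    stats = {v: {"count": 0, "raw": 0, "weighted": 0.0} for v in graph}
    for clique in maximal_cliques(graph):
        n = len(clique)
        if n < min_size:
            continue  # skip isolated vertices
        size_term = 2 ** (n - 1)
        member_importance = sum(importance[u] for u in clique)
        for v in clique:
            stats[v]["count"] += 1
            stats[v]["raw"] += size_term
            stats[v]["weighted"] += size_term * member_importance
    return stats

# Hypothetical toy graph: a triangle plus one pendant account.
toy_graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
toy_importance = {"alice": 1.0, "bob": 0.5, "carol": 2.0, "dave": 0.25}
clique_stats = clique_statistics(toy_graph, toy_importance)
```

Here carol belongs to both maximal cliques, so she accumulates the highest count and scores, which matches the assumption that membership in more and larger cliques signals importance.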
Communication Networks
• Degree centrality
  – Deg(vi) = ∑j aij, where aij is an entry of the adjacency matrix A of G
• Clustering coefficient
  – How close the vertex and its neighbors are to being a clique
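Both measures are easy to compute from adjacency sets, as the sketch below shows. It assumes an unweighted, undirected graph, so the degree is simply the number of neighbors, and it uses the standard local clustering coefficient (the fraction of neighbor pairs that are themselves connected). The toy graph is hypothetical.

```python
from itertools import combinations

def degree(graph, v):
    """Degree of v: the row sum of a 0/1 adjacency matrix."""
    return len(graph[v])

def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbors that are directly connected.

    Returns 1.0 when v and its neighbors form a clique, and 0.0 when
    v has fewer than two neighbors.
    """
    nbrs = graph[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))

# Hypothetical toy graph: a triangle plus one pendant account.
cc_graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
carol_cc = clustering_coefficient(cc_graph, "carol")
```

In this example alice scores 1.0 because her two neighbors are connected, while carol scores lower because dave is not linked to alice or bob.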
Communication Networks
• Mean of shortest path length from a specific vertex to all vertices in the graph G
  – L(vi) = (1/n) ∑j dij, where dij ∈ D, D is the geodesic distance matrix of G, and n is the number of vertices in G
• Betweenness centrality
  – Proportion of all geodesic paths between all other vertices that include vertex vi
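The two path-based measures can be sketched with breadth-first search. The assumptions here are an unweighted, undirected, connected graph; a mean that divides by n, the total number of vertices; and an unnormalized betweenness computed with Brandes' accumulation scheme, which sums for each pair of other vertices the fraction of shortest paths passing through the vertex. The toy graph is hypothetical.

```python
from collections import deque

def mean_shortest_path(graph, source):
    """Mean geodesic distance from source to all n vertices (BFS).

    Assumes the graph is connected; unreachable vertices would
    simply be left out of the sum.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(dist.values()) / len(graph)

def betweenness(graph):
    """Unnormalized betweenness centrality via Brandes' algorithm."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                        # vertices in BFS visiting order
        preds = {v: [] for v in graph}    # predecessors on shortest paths
        sigma = {v: 0 for v in graph}     # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}   # dependency of s on each vertex
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical toy graph: a triangle plus one pendant account.
path_graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
bc = betweenness(path_graph)
```

In the toy graph every shortest path to dave runs through carol, so she is the only vertex with non-zero betweenness.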
Communication Networks
• “Hubs-and-authorities” importance
  – Calculates the “hubs-and-authorities” importance of each vertex
    • J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46, 1999.
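Kleinberg's hubs-and-authorities scores are usually computed by power iteration, sketched below. This is a generic illustration rather than the EMT or JUNG code: it assumes a directed graph in which an edge points from sender to recipient, a fixed iteration count, and L2 normalization after each round. The function names and toy data are hypothetical.

```python
import math

def _normalize(scores):
    """Scale scores to unit L2 norm (left unchanged if all are zero)."""
    norm = math.sqrt(sum(x * x for x in scores.values()))
    if norm == 0:
        return scores
    return {v: x / norm for v, x in scores.items()}

def hits(out_links, iterations=50):
    """Hub and authority scores by power iteration.

    `out_links` maps each sender to the set of accounts it mailed.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes |= set(targets)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority of w: sum of hub scores of the accounts pointing at w.
        new_auth = {v: 0.0 for v in nodes}
        for v, targets in out_links.items():
            for w in targets:
                new_auth[w] += hub[v]
        # Hub of v: sum of authority scores of the accounts v points at.
        new_hub = {
            v: sum(new_auth[w] for w in out_links.get(v, ()))
            for v in nodes
        }
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth

# Hypothetical toy data: sender -> recipients.
sent_to = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "dave": {"carol"},
}
hub_scores, auth_scores = hits(sent_to)
```

In this example carol, who receives mail from everyone, ends up with the top authority score, and alice, who mails the most recipients, ends up as the top hub.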
Social Score
• Social score
  – Rank users from most important to least important
  – Group users that have similar social scores and clique connectivity
  – Determine n different levels of social hierarchy within which to place all the users
Compute Social Score
• Scale and normalize each statistic
• Social score
  – A score between 0 and 100
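One way to combine the statistics into a 0 to 100 score is sketched below. The slides do not spell out the formula, so this is an assumed scheme: min-max scaling of each statistic to [0, 1], inversion of statistics where a lower value is better (such as response time), and a weighted average multiplied by 100. The statistic names, weights, and values are hypothetical.

```python
def min_max(values):
    """Scale a {user: value} dict linearly onto [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}  # no spread: nothing to rank on
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}

def social_scores(stats, weights, lower_is_better=()):
    """Weighted combination of normalized statistics, in [0, 100].

    `stats` maps a statistic name to {user: raw value}; `weights` maps
    a statistic name to its feature weight. Statistics listed in
    `lower_is_better` are inverted after scaling.
    """
    total_weight = sum(weights[name] for name in stats)
    users = list(next(iter(stats.values())))
    scores = {u: 0.0 for u in users}
    for name, values in stats.items():
        scaled = min_max(values)
        for u in users:
            x = 1.0 - scaled[u] if name in lower_is_better else scaled[u]
            scores[u] += weights[name] * x
    return {u: 100.0 * s / total_weight for u, s in scores.items()}

# Hypothetical per-user statistics and feature weights.
user_stats = {
    "degree": {"alice": 2, "bob": 2, "carol": 3, "dave": 1},
    "avg_response_hours": {"alice": 3.0, "bob": 12.0, "carol": 1.0, "dave": 23.0},
}
feature_weights = {"degree": 1.0, "avg_response_hours": 1.0}
scores = social_scores(user_stats, feature_weights,
                       lower_is_better=("avg_response_hours",))
ranking = sorted(scores, key=scores.get, reverse=True)
```

Changing the entries in `feature_weights` shifts the ranking toward whichever statistics are considered most indicative of importance.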
Results and Discussion
• Using EMT
  – Java-based email analysis engine built on a database back-end
  – The JUNG library is used for the degree and centrality measures
• Present the analysis of the North American West Power Traders division of Enron Corporation
Conclusions and Future Work
• The Enron dataset provides an excellent starting point of real-world data
• By varying the feature weights, it is possible to
  – Pick out the most important individuals
  – Group individuals with similar social qualities
  – Graphically draw an organization chart that approximates the real social hierarchy
Conclusions and Future Work
• The concept of average response time can be reworked by considering the order of responses
• Consider each user's common email usage times and adjust the received time of emails accordingly
• New grouping and division algorithms are being considered
• Graph edges should be taken into account when arranging users into different levels