MINFS544: Network-based Business Intelligence (BI)
Feb 19th, 2013
Daning Hu, Ph.D., Department of Informatics
University of Zurich
F Schweitzer et al. Science 2009
Stop Contagious Failures in Banking Systems
During the 2008 financial tsunami, which bank(s) should we inject capital into first to stop contagious failures in bank networks?
Utilize Peer Influence in Online Social Networks
Intelligent Advertising, Product Recommendation
Who are the most influential people?
What are the patterns of information diffusion?
Develop Strategies to Attack Terrorist Networks
A Global Salafi Jihad Terrorist Network (Hu et al. JHSEM 2009)
How can a terrorist network be broken down effectively?
Network-based Business Intelligence
Network-based (Modeling and Analysis)
Modeling and analyzing various real-world social and organizational networks to understand:
• the cognitive and economic behaviors of the network actors; and
• the dynamic processes behind the network evolution
Based on the above…
Business Intelligence (BI)
Design network-based BI algorithms and information systems to provide decision support in various application domains:
• Application domains: Financial Risk Management, Security Informatics, Knowledge Management, etc.
• Methods: Network Analysis, Simulation of Network Evolution, Data Mining, etc.
MINFS544: Network-based Business Intelligence
• Lecturer: Dr. Daning Hu; Teaching Assistant: Dr. Jiaqi Yan
• Email: [email protected] [email protected]
• Credits: 3 ECTS credits
• Class Meetings: Tue 14:00–15:45 or Thu 10:15–12:00 (please see the schedule)
• Language: English
• Audience: Master and doctoral students
• Office Hours: Tue 13:00–14:00, Room 2.A.12
• Grading: Course report (term paper) 70%, presentation 20%, participation 10%
Grading
• 1. A full research paper (70%). The format of this paper can be found at: http://icis2012.aisnet.org/index.php/submissions
– * If possible, get it published in ICIS 2013 and get it cited.
• This paper should include answers to the following questions:
– What is the problem?
– Why is it interesting and important?
– Why is it hard? Why have previous approaches failed?
– What are the key components of your approach?
– What 1) models, 2) data sets and 3) metrics will be used to validate the approach?
Grading
• 2. Oral presentation of the paper (using slides) + Q&A (20%)
• For presentations, please see the slides on “How to give a good research talk” at:
• http://research.microsoft.com/en-us/um/people/simonpj/papers/giving-a-talk/giving-a-talk.htm
• 3. Active participation and interaction (10%)
Course Schedule
Date                  Event                                         Deliverable
19.02.13              Course introduction, kick-off meeting
20.02.13 – 04.03.13   One-to-one meeting                            1-page summary due
05.03.13              Research method tutorial
06.03.13 – 18.03.13   One-to-one meeting on research progress       3-page literature review due
19.03.13              Lecture, feedback and discussion
20.03.13 – 10.04.13   Writing Research-in-Progress (RIP) paper      5–8 page RIP due
11.04.13              Lecture, feedback and discussion
12.04.13 – 02.05.13   Writing full paper                            8–12 page full paper due (first deadline)
25.04.13, 16.05.13    Presentation days, feedback and discussion
29.05.13                                                            Final paper due
A Brief History of Network Science
1736  Mathematical foundation – Graph Theory
1930  Social Network Analysis and Theories
      – Sociogram: Network visualization
      – Six degrees of separation
      – Structural hole: Source of innovation
1990  (Physicists) Complex Network Topologies
      – Small-world model (e.g., WWW)
      – Scale-free model (“Rich get richer”)
2000  Network Science
      – Economic networks (Agent modeling & simulation)
      – Dynamic network analysis
      – BI applications: product diffusion in social media, recommendation systems
2012  ?
Outline
Introduction
Dynamic Analysis of Dark Networks
A Global Salafi Jihad (GSJ) Terrorist Network
A Narcotic Criminal Network
A Network Approach to Managing Bank Systemic Risk
Ongoing Work
Conclusion
Dynamic Network Analysis (DNA)
Studying dynamic link formation processes behind network evolution (nodes forming links → network evolution).
• What: Model the changes in network evolution
– Temporal changes in network topological measures
– Dynamic network recovery on longitudinal data
• Why: Statistical analysis of determinants behind link formation
– Homophily
– Preferential attachment
– Shared affiliations
• How: Simulate the evolution of networks
– Agent-based Modeling and Simulation
– Examine network robustness
Research Testbed: A Global Terrorist Network
• The Global Salafi Jihad (GSJ) network data were compiled by Dr. Marc Sageman, a former CIA operations officer, and cover 366 terrorists:
– friendship, kinship, same religious leader, operational interactions, etc.
– geographical origins, socio-economic status, education, etc.
– when they joined and left the GSJ
• The goals of dynamic analysis:
– gain insights about the evolution of the GSJ network
– develop effective attack strategies to break down the GSJ network
Sample data of GSJ terrorists
Dynamic Network Analysis
Studying dynamic processes (i.e., link formation) behind network evolution (nodes’ behaviors → network evolution).
• What: Model the changes in network evolution
– Temporal changes in network topological measures
– Dynamic network recovery on longitudinal data
• Why: Statistical analysis of determinants behind link formation
– Homophily
– Preferential attachment
– Shared affiliations
• How: Simulate the evolution of networks
– Agent-based Modeling and Simulation
– Examine network robustness
Temporal Changes in Network-level Measures
Degree = number of links a node has
Fig. 1. Temporal changes in (a) the average degree <k> and (b), (c) the degree distribution
[Figure: (a) average degree <k> per year, 1989–2003; (b) probability of degree vs. degree for 1990, 1991 and 1993, with a Poisson reference curve; (c) probability of degree vs. degree for 1995, 1997 and 1999]
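The average degree in panel (a) can be recomputed from longitudinal data once each link has a formation year and each member has a join and leave year. The following is a minimal sketch under those assumptions; the function name, the input layout and the toy data are illustrative and are not taken from the GSJ dataset.

```python
def average_degree_by_year(links, active, years):
    """Average degree <k> of the network snapshot in each year.

    links:  list of (u, v, year_formed) tuples for undirected links (assumed layout)
    active: dict mapping node -> (join_year, leave_year), both inclusive (assumed layout)
    """
    result = {}
    for year in years:
        # Nodes that are members of the network in this year.
        nodes = {n for n, (joined, left) in active.items() if joined <= year <= left}
        # Links formed so far whose endpoints are both still present.
        edges = {
            frozenset((u, v))
            for u, v, formed in links
            if formed <= year and u != v and u in nodes and v in nodes
        }
        # Each undirected link adds one to the degree of both endpoints.
        result[year] = 2 * len(edges) / len(nodes) if nodes else 0.0
    return result


# Toy data, purely illustrative.
toy_active = {"a": (1989, 2003), "b": (1989, 1995), "c": (1990, 2003), "d": (1991, 2003)}
toy_links = [("a", "b", 1989), ("a", "c", 1990), ("b", "c", 1991), ("c", "d", 1991)]
avg_k = average_degree_by_year(toy_links, toy_active, range(1989, 1997))
for year in sorted(avg_k):
    print(year, round(avg_k[year], 2))
```

Plotting `avg_k` against the year gives a curve of the same kind as panel (a).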
Findings
The evolution of the GSJ network has three stages:
• 1989 – 1993 The emerging stage:
– The network grows in size
– Accelerated growth: the number of edges increases faster than the number of nodes
– Random network topology (Poisson degree distribution)
• 1994 – 2000 The mature stage:
– The size of the network reached its peak in 2000
– Scale-free topology (power-law degree distribution)
• 2001 – 2003 The disintegration stage:
– The network falls into small disconnected components after 9/11
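One simple way to check whether a snapshot looks like a random (Poisson) network or a hub-dominated one is to compare its empirical degree distribution with a Poisson distribution of the same mean. The sketch below uses the total variation distance for that comparison; this metric and the toy degree sequences are assumptions chosen for illustration, not the test used in the study.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical probability of each degree value."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def poisson_pmf(k, mean):
    """Probability of degree k under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def distance_from_poisson(degrees):
    """Total variation distance between the empirical degree distribution
    and a Poisson distribution with the same mean (0 = identical, 1 = disjoint)."""
    mean = sum(degrees) / len(degrees)
    empirical = degree_distribution(degrees)
    k_max = max(degrees)
    diff = sum(abs(empirical.get(k, 0.0) - poisson_pmf(k, mean)) for k in range(k_max + 1))
    # Poisson mass above k_max, where the empirical probability is zero.
    tail = max(0.0, 1.0 - sum(poisson_pmf(k, mean) for k in range(k_max + 1)))
    return 0.5 * (diff + tail)


# Toy degree sequences, purely illustrative.
homogeneous = [0, 1, 1, 2, 2, 2, 3, 3, 4]   # degrees clustered around the mean
hub_dominated = [1] * 20 + [20]             # one hub connected to 20 leaves
print(round(distance_from_poisson(homogeneous), 3))
print(round(distance_from_poisson(hub_dominated), 3))
```

A hub-dominated sequence sits much further from the Poisson reference than a homogeneous one, which is the qualitative contrast between the emerging and mature stages.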
Temporal Changes in Node Centrality Measures
Figure 2. Temporal changes in Degree and Betweenness centrality of Osama Bin Laden
[Figure: two curves over time, Degree (axis 0–60) and Betweenness (axis 0–5000)]
• Degree: number of links a node has
• Betweenness of a node i: number of shortest paths from all nodes to all others that pass through node i
– Measures i’s influence on the traffic (information, resources) flowing through it
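Both measures defined above can be computed directly from an adjacency structure. The sketch below uses Brandes' algorithm for unweighted, undirected graphs, which is a standard way to compute betweenness but is an assumption here rather than the implementation used for Figure 2. When several shortest paths connect a pair, each path receives fractional credit, and every unordered pair is counted once. The toy graph is illustrative.

```python
from collections import deque


def degree_centrality(adj):
    """Degree = number of links a node has. adj maps node -> set of neighbors."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def betweenness_centrality(adj):
    """Betweenness via Brandes' algorithm (unweighted, undirected graph)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s to v
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors of v on shortest paths from s
        order = []                    # nodes in order of non-decreasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


# Toy graph: a - b - c - d, with e attached to b.
toy = {"a": {"b"}, "b": {"a", "c", "e"}, "c": {"b", "d"}, "d": {"c"}, "e": {"b"}}
deg = degree_centrality(toy)
btw = betweenness_centrality(toy)
print(deg)
print(btw)
```

Running both functions on one snapshot per year gives time series of the kind plotted in Figure 2.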
Findings and Possible Explanations
• 1994 – 1996: A sharp decrease in Bin Laden’s betweenness
– 1994: Saudi Arabia revoked his citizenship and expelled him
– 1995: Went to Sudan and was expelled again under U.S. pressure
– 1996: Went to Afghanistan and established camps there
• 1998 – 1999: Another sharp decrease in his betweenness
– After the 1998 bombings of U.S. embassies, Bill Clinton ordered a freeze on assets linked to bin Laden (top 10 most wanted)
– August 1998: A failed U.S. assassination attempt on him
– 1999: The UN imposed sanctions against Afghanistan to force the Taliban to extradite him
Research Testbed: A Narcotic Criminal Network
• The COPLINK dataset contains 3 million police incident reports from the Tucson Police Department (1990 to 2006).
– 3 million incident reports and 1.44 million individuals
– Their personal and sociological information (age, ethnicity, etc.)
– Time information: when two individuals co-offend
• AZ inmate affiliation data: when and where an inmate was housed
• A Narcotic Criminal Network
– 19,608 individuals involved in organized narcotic crimes
– 29,704 co-offending pairs (links)

Table 1. Summary of the COPLINK dataset and the Arizona inmate dataset
                   COPLINK Narcotic Data   Arizona Inmate Data   Overlapped (identified by first name, last name and DOB)
Number of People   36,548                  165,540               19,608
Time Span          1990 – 2006             1985 – 2006           17 years
Statistical Analysis of Determinants for Link Formation
• Proportional hazards model (Cox regression analysis)
• Homophily in age (group) and race
• Shared affiliations:
– Mutual acquaintances (through crimes)
– Vehicle affiliation (same vehicle used by two individuals in different crimes)
Fig. 3. Results of multivariate survival (Cox regression) analysis of triadic closure (link formation).
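In a proportional hazards model, the hazard of a link forming at time t is h(t | x) = h0(t) · exp(β · x), where x is a vector of covariates for a pair of individuals. The sketch below shows how the determinants listed above could be turned into such a covariate vector. All field names and the toy records are hypothetical; the actual COPLINK schema is not given in these slides.

```python
def pair_covariates(a, b, acquaintances, vehicles):
    """Covariates for one pair of individuals (hypothetical field names).

    a, b:          dicts with "id", "age_group" and "race"
    acquaintances: dict id -> set of ids the person has co-offended with
    vehicles:      dict id -> set of vehicle identifiers linked to the person
    """
    return {
        # Homophily: do the two individuals share an age group / race?
        "same_age_group": int(a["age_group"] == b["age_group"]),
        "same_race": int(a["race"] == b["race"]),
        # Shared affiliations: common co-offenders and common vehicles.
        "mutual_acquaintances": len(acquaintances[a["id"]] & acquaintances[b["id"]]),
        "shared_vehicle": int(bool(vehicles[a["id"]] & vehicles[b["id"]])),
    }


# Toy records, purely illustrative.
p1 = {"id": 1, "age_group": "20-29", "race": "X"}
p2 = {"id": 2, "age_group": "20-29", "race": "Y"}
toy_acquaintances = {1: {3, 4, 5}, 2: {4, 5, 6}}
toy_vehicles = {1: {"V-001"}, 2: {"V-001", "V-002"}}
x = pair_covariates(p1, p2, toy_acquaintances, toy_vehicles)
print(x)
```

Covariate vectors like `x`, together with the time until each pair first co-offends, are the kind of input a Cox regression would be fitted on.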
BI Application: Co-offending Prediction in COPLINK
• IBM’s COPLINK is an intelligent police information system that aims to help speed up the crime detection process.
• COPLINK calculates a co-offending likelihood score based on the proportional hazards model.
• Output: a ranked list of individuals based on their predicted likelihood of co-offending with the suspect under investigation.
Fig. 4. Screenshots of the COPLINK system
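The ranking step can be illustrated with the relative hazard exp(β · x) from a proportional hazards model: the baseline hazard is shared by all candidates, so it does not affect the order. This is a minimal sketch, not COPLINK's implementation. The coefficients below are made-up placeholders, not fitted values, and the covariate names are hypothetical.

```python
import math

# Placeholder coefficients for illustration only; real values would come
# from fitting a Cox regression to co-offending data.
HYPOTHETICAL_BETA = {
    "same_age_group": 0.3,
    "same_race": 0.2,
    "mutual_acquaintances": 0.5,
    "shared_vehicle": 0.8,
}


def relative_hazard(x, beta):
    """exp(beta . x): hazard of co-offending relative to the baseline."""
    return math.exp(sum(coef * x.get(name, 0) for name, coef in beta.items()))


def rank_candidates(candidates, beta):
    """Return candidate names sorted from most to least likely co-offender."""
    scored = [(relative_hazard(x, beta), name) for name, x in candidates.items()]
    return [name for score, name in sorted(scored, key=lambda t: (-t[0], t[1]))]


# Toy candidates relative to one suspect, purely illustrative.
toy_candidates = {
    "P1": {"same_age_group": 1, "same_race": 1, "mutual_acquaintances": 3, "shared_vehicle": 0},
    "P2": {"same_age_group": 0, "same_race": 0, "mutual_acquaintances": 0, "shared_vehicle": 1},
    "P3": {"same_age_group": 1, "same_race": 0, "mutual_acquaintances": 0, "shared_vehicle": 0},
}
ranking = rank_candidates(toy_candidates, HYPOTHETICAL_BETA)
print(ranking)
```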
Simulate Attacks on Dark Networks
• Three attack (i.e., node removal) strategies:
– Attack on hubs (highest degree)
– Attack on bridges (highest betweenness)
– Real-world attack (attack order based on real-world data)
• Simulate two types of attacks to examine the robustness of the dark networks:
– Simultaneous attacks (the degree/betweenness of nodes is NOT updated after each removal) – Static
– Progressive attacks (the degree/betweenness of nodes is updated after each removal) – Dynamic
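The difference between simultaneous and progressive attacks can be sketched for the hub strategy as follows. Robustness is tracked here as the fraction of the original nodes that remain in the largest connected component; that choice of measure, the function names and the toy graph are assumptions for illustration.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def largest_component_fraction(adj, removed):
    """Size of the largest connected component among surviving nodes,
    as a fraction of the original number of nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best / len(adj)


def hub_attack(adj, n_remove, progressive):
    """Remove the highest-degree nodes one by one and record robustness.

    progressive=False: degrees are ranked once on the intact network (static).
    progressive=True:  degrees are recomputed after every removal (dynamic).
    """
    removed = set()
    static_order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    curve = [largest_component_fraction(adj, removed)]
    for step in range(n_remove):
        if progressive:
            remaining = [v for v in adj if v not in removed]
            target = min(remaining, key=lambda v: (-len(adj[v] - removed), v))
        else:
            target = static_order[step]
        removed.add(target)
        curve.append(largest_component_fraction(adj, removed))
    return curve


# Toy network: three interlinked hubs with leaves, plus a smaller hub "m".
toy_edges = [("h1", "h2"), ("h1", "h3"), ("h2", "h3"),
             ("h1", "l1"), ("h1", "l2"), ("h1", "l3"), ("h1", "l4"),
             ("h2", "l5"), ("h2", "l6"), ("h2", "l7"),
             ("h3", "l8"), ("h3", "l9"),
             ("m", "l8"), ("m", "x"), ("m", "y")]
toy_adj = build_adjacency(toy_edges)
static_curve = hub_attack(toy_adj, 3, progressive=False)
progressive_curve = hub_attack(toy_adj, 3, progressive=True)
print(static_curve)
print(progressive_curve)
```

In this toy network the two variants agree for the first two removals and diverge on the third, because removing h1 and h2 lowers the degree of h3 below that of m. A bridge attack follows the same loop with betweenness in place of degree.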
Hub vs. Bridge Attacks
• Both hub and bridge attacks are far more effective than real-world arrests – policy implications?
• Both dark networks are more vulnerable to bridge attacks than to hub attacks.
– Bridge (highest betweenness): field lieutenants, operational leaders, etc.
– Hub (highest degree): e.g., Bin Laden
[Figure: GSJ network; S and <s> vs. fraction of nodes removed, with curves for S (Hub attacks) and S (Bridge attacks)]
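The slides do not define S and <s>. A common convention in attack-tolerance studies, assumed here, is that S is the fraction of the original nodes in the largest connected component and <s> is the average size of the remaining, smaller components. A minimal sketch under that assumption, with an illustrative toy graph:

```python
def component_sizes(adj, removed):
    """Sizes of all connected components among nodes that were not removed."""
    seen = set(removed)
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        sizes.append(size)
    return sizes


def s_and_mean_s(adj, removed):
    """Return (S, <s>) under the assumed definitions:
    S   = largest component size / original number of nodes
    <s> = mean size of all components other than the largest (0.0 if none)."""
    sizes = sorted(component_sizes(adj, removed), reverse=True)
    if not sizes:
        return 0.0, 0.0
    rest = sizes[1:]
    mean_s = sum(rest) / len(rest) if rest else 0.0
    return sizes[0] / len(adj), mean_s


# Toy path graph a - b - c - d - e; removing c splits it into two pairs.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
intact = s_and_mean_s(path, set())
after_c = s_and_mean_s(path, {"c"})
print(intact, after_c)
```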
Summary and Contributions
• We developed a set of Dynamic Network Analysis (DNA) methods that are effective in:
– Linking network topological changes to analytical insights
– Systematically capturing the link formation processes
– Examining the determinants of link formation
• Dark networks are robust against real-world attacks but vulnerable to targeted bridge attacks.
• COPLINK provides real-time decision support for fighting crime.
Research Readings and Resources
• 1. Networks Overview:
– * Statistical mechanics of complex networks, Section III, VI: http://rmp.aps.org/abstract/RMP/v74/i1/p47_1
– * Networks, Crowds, and Markets: http://www.cs.cornell.edu/home/kleinber/networks-book/
• 2. Networks in Finance:
– * Financial Networks blog and research databases:
– WRDS database
– http://www.financialnetworkanalysis.com/research-database/
– http://www.stern.nyu.edu/networks/electron.html
– * Company Board Social Networks
Research Readings and Resources (cont.)
• 3. Networks in Marketing:
– * Sinan Aral’s research in networks and marketing (peer influence): http://web.mit.edu/sinana/www/
– * Social media based marketing: http://searchengineland.com/guide/what-is-social-media-marketing
• 4. Recommender Systems: http://www-cs-students.stanford.edu/~adityagp/recom.html
• 5. Word-of-Mouth Effects in Social Networks: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=393042&