Flow Processes and the Structural Importance of Nodes
Steve Borgatti
Boston College
[Figure: terrorist network diagram, Mohamed Atta labeled. Data courtesy of Valdis Krebs]
Attacking Terrorist Nets
• Find and eliminate structurally important nodes and lines
  – bridges, cut-points; minimum weight cutsets
  – measures of centrality
    • closeness, betweenness, eigenvector, etc.
Terrorist Network
[Figure: network diagram. Labeled nodes: Mohamed Atta, Djamal Beghal, Essid Sami Ben Khemais, Mamoun Darkazanli, Nawaf Alhazmi, Raed Hijazi, Usman Bandukra. Data courtesy of Valdis Krebs]
Many Problems
• Data not good enough
  – Mostly known after an event
  – Sensitive to error
• Benefits are short-term at best
  – Must address recruitment, training
  – It is precisely those organizations that make heavy use of suicide bombers that are organized as networks
Was al Qaeda incapacitated by removal of 19 hijackers?
[Figure: network diagram with nodes marked "Dead" or "Alive?". Data courtesy of Valdis Krebs]
One Additional Problem
• Centrality measures make certain assumptions about how things flow
  – and may produce poor estimates when misapplied
  – need to work that out before deciding which node to remove
Objective
• Enumerate kinds of flow processes
• Analyze properties
• Relate to structural importance of nodes
• Relate to existing measures of centrality
Types of Flow Processes
• Gift process
• Currency process
• Transport process
• Postal process
• Gossip process
• E-mail process
• Infection process
• Influence process
(several others)
Gift Process
• Canonical example:
  – passing along used paperback novel
• Single object in only one place at a time
• Doesn’t travel between same pair twice
• Could be received by the same person twice
• A--B--C--B--D--E--B--F--C ...
Currency Process
• Canonical example:
  – specific dollar bill moving through the economy
• Single object in only one place at a time
• Can travel between same pair more than once
• A--B--C--B--C--D--E--B--C--B--C ...
Gossip Process
• Example:
  – juicy story moving through informal network
• Multiple copies exist simultaneously
• Person tells only one person at a time*
• Doesn’t travel between same pair twice
• Can reach same person multiple times
* More generally, they tell a very limited number at a time.
E-Mail Process
• Example:
  – forwarded jokes and virus warnings
  – e-mail viruses themselves
• Multiple copies exist simultaneously
• All (or many) connected nodes told simultaneously (except the immediate source?)
Influence Process
• Example:
  – attitude formation
• Multiple “copies” exist simultaneously
• Multiple simultaneous transmission, even between the same pairs of nodes
Infection Process
• Example:
  – virus which activates effective immunological response
• Multiple copies may exist simultaneously
• Cannot revisit a node
• A--B--C--E--D--F...
Postal Process
• Example:
  – package delivered by postal service
• Single object at only one place at one time
• Map of network enables the intelligent object to select only the shortest paths to all destinations
Uncovering Flow Properties
• Take componential analysis approach
  – identify a set of flow processes
  – compare and contrast to discover minimum set of attributes (properties) that distinguish them from each other
  – view each distinct flow process as unique bundle of properties -- typology
Properties of Flow Processes
• Sequence type: path, trail, walk
  – path: can’t revisit node nor edge (tie)
  – trail: can revisit node but not edges
  – walk: can revisit edges & nodes
• Deterministic vs non-deterministic
  – blind vs guided
  – guided: always chooses best route; aware of map
• Combine into 4-way “pattern” property:
  – geodesics, paths, trails, walks
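The path / trail / walk distinction can be checked mechanically. The sketch below is illustrative only: the function name and the `directed` flag are assumptions, not part of the deck. One thing it makes visible is that the gift example A--B--C--B--D--E--B--F--C qualifies as a trail only if B→C and C→B are counted as different ties.

```python
def classify_sequence(nodes, directed=False):
    """Classify a node sequence as 'path', 'trail', or 'walk'.

    path  : no node and no tie is repeated
    trail : nodes may repeat, ties may not
    walk  : ties (and therefore nodes) may repeat
    """
    ties = list(zip(nodes, nodes[1:]))
    if not directed:
        # An undirected tie is the same in both directions.
        ties = [frozenset(t) for t in ties]
    if len(set(ties)) < len(ties):
        return "walk"
    if len(set(nodes)) < len(nodes):
        return "trail"
    return "path"


print(classify_sequence(list("ABCEDF")))                    # infection example
print(classify_sequence(list("ABCBDEBFC"), directed=True))  # gift example
print(classify_sequence(list("ABCBCDEBCBC")))               # currency example
```

The geodesic case is not covered here, since it depends on the whole network and not just on the sequence itself.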
Properties -- cont.
• Duplication vs transfer (copy vs move)
  – transfer/move: only one place at one time
  – duplication/copy: multiple copies exist
• Serial vs parallel duplication
  – serial: only one transmission at a time
  – parallel: broadcast to all surrounding nodes
• Combine into “method” 3-way property:
  – parallel dup., serial dup., transfer
Simplified Typology

            parallel duplication   serial duplication   transfer
geodesics                                               postal
paths       nameserver             virus                moocher
trails      e-mail                 gossip               gift
walks       influence                                   currency

[Annotations on the table: goods; information]
So What?
• The properties of a flow process (together w/ node position) determine which nodes are structurally important
  – a node that is important in one process is not important in another
  – off-the-shelf centrality measures implicitly assume certain flow properties and are only interpretable for certain flow processes (à la Friedkin)
Closeness Centrality
• A node’s centrality is sum of geodesic distances to all others.
  – Length of shortest paths
• Is index of expected time until arrival of that-which-flows for consistent processes:
  – non-deterministic (e.g., postal)
  – parallel duplication (e.g., e-mail, nameserver)
[Figure: example network with nodes S, L, Q, P, X, T, M]
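As a rough illustration of the definition above, the following sketch sums geodesic distances with a breadth-first search. The three-node line network and the function names are hypothetical, and the network is assumed to be connected and unweighted.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Breadth-first search: shortest-path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def freeman_closeness(adj):
    """Sum of geodesic distances from each node to all others (lower = more central)."""
    return {n: sum(geodesic_distances(adj, n).values()) for n in adj}


# Hypothetical toy network: a - b - c
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(freeman_closeness(adj))  # {'a': 3, 'b': 2, 'c': 3}
```

Note that in this raw form a smaller score means a more central node, since the score is a total distance.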
Closeness Centrality

            parallel duplication   serial duplication   transfer
geodesics   Freeman                Freeman              Freeman
paths       Freeman                NEW                  NO
trails      Freeman                NEW                  NO
walks       Freeman                ?                    Markov

How long does a token take to reach a node?
Calculating
Betweenness Centrality
• Count no. of geodesic paths from each node to every other node that pass through X
  – if there is more than one geodesic from S to T, count the proportion that pass through X
• Interpret as
  – how often node utilized by others
  – potential for control & synthesis
[Figure: example network with nodes S, L, Q, P, X, T, M]
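The counting rule above can be sketched as follows. This is a simple illustrative implementation rather than an optimized algorithm, and the function names and the star network are assumptions. It sums over ordered (S, T) pairs, so for an undirected network the scores are twice the usual unordered-pair convention.

```python
from collections import deque


def bfs_counts(adj, source):
    """Return (distance, number of geodesics) from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Every geodesic to u extends to a geodesic to v.
                sigma[v] += sigma[u]
    return dist, sigma


def freeman_betweenness(adj):
    """For each X, sum over ordered pairs (S, T) the proportion of S-T geodesics through X."""
    info = {n: bfs_counts(adj, n) for n in adj}
    score = {n: 0.0 for n in adj}
    for s in adj:
        dist_s, sigma_s = info[s]
        for t in adj:
            if t == s or t not in dist_s:
                continue
            for x in adj:
                if x in (s, t) or x not in dist_s:
                    continue
                dist_x, sigma_x = info[x]
                # X lies on an S-T geodesic when the distances add up exactly.
                if t in dist_x and dist_s[x] + dist_x[t] == dist_s[t]:
                    score[x] += sigma_s[x] * sigma_x[t] / sigma_s[t]
    return score


# Hypothetical star network: hub "x" with three leaves
star = {"x": ["p", "q", "r"], "p": ["x"], "q": ["x"], "r": ["x"]}
print(freeman_betweenness(star))  # hub scores 6.0, leaves 0.0
```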
Betweenness Flow Processes
• Consistent processes
  – postal process
• Nearly consistent
  – parallel processes (all routes at same time)
  – but ... needn’t choose between geodesics
• Implication
  – better for modeling transportation of goods than information
Betweenness Centrality

            parallel duplication   serial duplication   transfer
geodesics   NEW                    NEW                  Freeman
paths       NEW                    NEW                  NEW
trails      NEW                    NEW                  NEW
walks       Friedkin?              ?                    Friedkin?

How often does a token pass through a node?
Calculating
Eigenvector Centrality
• Eigenvector of adjacency matrix
  – in effect, counts number of walks of all lengths emanating from node, weighted inversely by length
• Interpreted as popularity or being in the thick of things
• Assumes flow can return to same nodes & lines
Row sums of $\sum_k \beta^k A^k = \beta^1 A^1 + \beta^2 A^2 + \beta^3 A^3 + \dots$
(the weight symbol did not survive extraction; $\beta$ is used here)
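A minimal sketch of how the principal eigenvector can be approximated by power iteration. The function name, the iteration count, and the star network are assumptions for illustration. The iteration multiplies by A + I instead of A, which leaves the eigenvectors unchanged but avoids oscillation on bipartite networks.

```python
def eigenvector_centrality(adj, iterations=200):
    """Power iteration on (A + I), rescaled so the largest score is 1."""
    x = {n: 1.0 for n in adj}
    for _ in range(iterations):
        # Each node's new score is its own score plus the scores of its neighbours.
        nxt = {n: x[n] + sum(x[m] for m in adj[n]) for n in adj}
        top = max(nxt.values())
        x = {n: v / top for n, v in nxt.items()}
    return x


# Hypothetical star network: hub "x" with three leaves
star = {"x": ["p", "q", "r"], "p": ["x"], "q": ["x"], "r": ["x"]}
scores = eigenvector_centrality(star)
print(scores)  # hub = 1.0, each leaf close to 1 / sqrt(3)
```

Because each multiplication extends every walk by one step, repeated multiplication tallies walks that are free to revisit nodes and lines, which is the flow assumption the slide points out.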
“Cross-Platform” Centrality
• How far off are these centrality measures when used with wrong flow process?
• How can we correctly measure closeness and betweenness concepts in different flow contexts?
• Simulation modeling
Realized Centrality
• Essence of closeness is the expected time until arrival of fluenda
  – realized closeness is an empirical measurement of the avg time until arrival
  – Freeman closeness is an estimator of this
    • model-based formula that should correspond to actual long-term values if the model fits
• Betweenness is expected number of times a fluendum passes through node
Simulation Procedure (for deterministic flow processes)
• For each of 10,000 trials* ...
  – For each node,
    • let token originate at the node & propagate according to flow process rules until it can go no further
    • record which nodes are visited along way and # of units of time needed to arrive at each node for first time
  – Cumulate realized closeness and realized betweenness
*NOTE: Parallel processes only require 1 trial -- no randomness
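The procedure above might look roughly like this for a gift process (blind transfer along trails). Everything here is an illustrative assumption: the function names, the treatment of ties as undirected, the averaging of arrival times only over runs in which the token actually arrives, and the counting of intermediate visits as "passes".

```python
import random


def gift_trial(adj, origin, rng):
    """One blind transfer along a trail: the token never reuses a tie."""
    used = set()
    arrival = {origin: 0}          # first-arrival time per node
    visited = [origin]             # full route, revisits included
    current, t = origin, 0
    while True:
        options = [v for v in adj[current] if frozenset((current, v)) not in used]
        if not options:
            break                  # token can go no further
        nxt = rng.choice(options)
        used.add(frozenset((current, nxt)))
        t += 1
        arrival.setdefault(nxt, t)
        visited.append(nxt)
        current = nxt
    return arrival, visited


def simulate(adj, trials=1000, seed=0):
    """Cumulate realized closeness (mean first-arrival time) and realized betweenness."""
    rng = random.Random(seed)
    time_sum = {n: 0 for n in adj}
    time_n = {n: 0 for n in adj}
    passes = {n: 0 for n in adj}
    for _ in range(trials):
        for origin in adj:
            arrival, visited = gift_trial(adj, origin, rng)
            for n, t in arrival.items():
                if n != origin:
                    time_sum[n] += t
                    time_n[n] += 1
            for n in visited[1:-1]:    # intermediate stops only
                passes[n] += 1
    closeness = {n: time_sum[n] / time_n[n] if time_n[n] else float("inf") for n in adj}
    betweenness = {n: passes[n] / trials for n in adj}
    return closeness, betweenness


# Hypothetical toy network: a - b - c
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
closeness, betweenness = simulate(adj, trials=200)
print(closeness, betweenness)
```

Runs in which the token gets stuck before reaching a node contribute nothing to that node's average, so the averages here are conditional on arrival.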
Simulation Procedure (for non-deterministic processes)
• For each of 10,000 trials ...
  – For each ordered pair of (source,target) nodes
    • let token originate at source node & propagate according to flow process rules until it either reaches target node or can go no further
    • record which nodes are visited and # of units of time needed to arrive at each node for first time
  – Cumulate realized closeness and realized betweenness
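A hedged sketch of the source-target variant, again using a trail-following transfer as the flow rule purely for illustration. The function names and the choice to count every visit to a node other than the source and target are assumptions, not the deck's stated method.

```python
import random
from itertools import permutations


def run_to_target(adj, source, target, rng):
    """Move a token along unused ties until it hits the target or gets stuck."""
    used = set()
    route = [source]
    current = source
    while current != target:
        options = [v for v in adj[current] if frozenset((current, v)) not in used]
        if not options:
            return route, False    # stuck before reaching the target
        nxt = rng.choice(options)
        used.add(frozenset((current, nxt)))
        route.append(nxt)
        current = nxt
    return route, True


def realized_betweenness(adj, trials=1000, seed=0):
    """Average number of visits per trial to each node as a non-endpoint."""
    rng = random.Random(seed)
    passes = {n: 0 for n in adj}
    for _ in range(trials):
        for s, t in permutations(adj, 2):     # ordered (source, target) pairs
            route, _reached = run_to_target(adj, s, t, rng)
            for n in route[1:]:
                if n not in (s, t):
                    passes[n] += 1
    return {n: passes[n] / trials for n in adj}


# Hypothetical toy network: a - b - c
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(realized_betweenness(adj, trials=200))
```

On this line network the middle node is passed through exactly twice per trial, while the end nodes pick up occasional visits when a token from the middle wanders the wrong way and gets stuck.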
Alternative Methods
• Can use non-deterministic procedure on all processes, for comparability to Freeman betweenness
  – numerical results quite different
  – but larger conclusions are the same
• But, logically, not sensible
  – Freeman’s dyadic method presupposes source & target
    • i.e., non-deterministic process
Empirical Results
• Compare realized closeness & betweenness with Freeman measures across different flow processes
• Dataset is known ties among terrorists compiled by Valdis Krebs
• Start with betweenness
Betweenness in Postal Proc.

Name                         FreeBet   RealBet
Mohamed Atta                 1106.9    1108.4
Essid Sami Ben Khemais       470.5     468.6
Zacarias Moussaoui           434.5     423.0
Nawaf Alhazmi                287.6     294.0
Hani Hanjour                 233.8     229.2
Djamal Beghal                195.7     192.9
Marwan Al-Shehhi             167.0     162.3
Satam Suqami                 137.4     136.7
Ramzi Bin al-Shibh           88.2      86.2
Abu Qatada                   78.9      86.4
Raed Hijazi                  62.6      62.4
Tarek Maaroufi               61.5      62.5
Mamoun Darkazanli            61.0      61.0
Imad Eddin Barakat Yarkas    54.7      65.0
Fayez Ahmed                  47.9      47.0
Abdul Aziz Al-Omari*         42.6      42.8
Hamza Alghamdi               40.9      44.9
Saeed Alghamdi*              32.2      32.3
Ziad Jarrah                  31.1      29.0
Ahmed Al Haznawi             28.3      28.2
Salem Alhazmi*               23.3      23.9
Lotfi Raissi                 21.7      21.2
Agus Budiman                 21.1      20.9
Ahmed Alghamdi               13.7      14.4
Ahmed Ressam                 13.1      13.4
Haydar Abu Doha              12.5      12.9
Kamel Daoudi                 11.8      8.4
Khalid Al-Mihdhar            10.3      9.4
Nabil al-Marabh              6.7       6.8
Mohamed Bensakhria           6.5       7.8
Wail Alshehri                4.5       5.0
Mustafa Ahmed al-Hisawi      4.5       5.0
Said Bahaji                  3.6       3.8
Jerome Courtaillier          2.9       3.0
Waleed Alshehri              1.6       1.5
Abu Walid                    1.6       2.0
Rayed Mohammed Abdullah      1.5       1.5
Mehdi Khammoun               1.0       1.0
Mohand Alshehri*             1.0       1.1
Nabil Almarabh               0.0       0.0
Abdussattar Shaikh           0.0       0.0

(all the rest are zeros on both measures)
Betweenness / Gossip Process
• Sequential duplication across trails: rumors
• Scores standardized to mean = 0, SD = 1

      scores             ranks
ID    Real*    Free*     rReal  rFree
6     3.843    6.384     1      1
11    3.370    0.649     2      7
1     2.088    1.056     3      5
16    1.409    -0.181    4      19
29    1.376    0.168     5      9
3     1.348    1.384     6      4
10    1.048    -0.111    7      16
12    0.988    -0.078    8      15
4     0.951    -0.228    9      21
9     0.921    0.468     10     8
37    0.908    2.500     11     2
21    0.850    2.281     12     3
25    0.501    -0.349    13     33
14    0.493    -0.121    14     17
8     0.479    -0.343    15     31
7     0.478    -0.361    16     35
41    0.399    0.005     17     12
5     0.377    -0.308    18     28
19    0.336    -0.174    19     18
46    0.237    0.824     20     6
53    -0.031   -0.242    21     23
24    -0.031   -0.238    22     22
36    -0.032   -0.371    23     43
13    -0.035   -0.287    24     24
47    -0.041   0.111     25     10
Betweenness / Gossip
[Figure: scatter plot. x-axis: Freeman Betweenness; y-axis: Realized Betweenness]
Betweenness in Gossip Proc.
[Figure: network diagram with nodes marked as over-estimated or under-estimated by betweenness centrality. Data courtesy of Valdis Krebs]
Token rarely gets to 46, so its realized betweenness cannot be as high as the Freeman measure estimates
Freeman measure is zero when contacts are connected
Blind vs Guided Flows
• Nodes embedded in dense regions are more important in blind processes than in nondeterministic processes.
  – It is in blind processes that we see the bottling-up phenomenon that Granovetter alludes to
[Diagram labels: Path redundancy; Individual performance; Type of flow]
Betweenness in Gift Process
• Physical transfer along trails: used paperback
• Scores standardized to mean = 0, SD = 1

ID    Real*    Free*     rReal  rFree
6     3.985    6.384     1      1
11    3.356    0.649     2      7
1     2.188    1.056     3      5
3     1.470    1.384     4      4
16    1.395    -0.181    5      19
29    1.328    0.168     6      9
10    1.155    -0.111    7      16
4     0.910    -0.228    8      21
12    0.906    -0.078    9      15
9     0.858    0.468     10     8
37    0.851    2.500     11     2
14    0.656    -0.121    12     17
25    0.585    -0.349    13     33
21    0.553    2.281     14     3
7     0.401    -0.361    15     35
8     0.398    -0.343    16     31
5     0.363    -0.308    17     28
19    0.350    -0.174    18     18
46    0.181    0.824     19     6
13    0.166    -0.287    20     24
24    0.166    -0.238    21     22
53    0.132    -0.242    22     23
36    0.120    -0.371    23     43
41    0.110    0.005     24     12
Betweenness / Gift
[Figure: scatter plot with fitted line y = 0.6902x + 0.0048, R² = 0.4764. Axes: Realized; Freeman Centrality]
Closeness in Gossip Process
• Sequential duplication across trails: rumors
• Scores* standardized to mean = 0, SD = 1
• Correlation is high -- much better than betweenness corr

      scores             ranks
ID    Real*    Free*     rReal  rFree
6     -2.493   -1.691    1      1
11    -1.612   -1.637    2      2
29    -1.286   -1.582    3      5
16    -1.156   -1.528    4      9
1     -1.384   -1.473    5      3
10    -1.189   -1.418    6      8
25    -0.863   -1.364    7      16
3     -1.384   -1.309    8      4
21    -1.286   -1.255    9      6
9     -1.058   -1.200    10     10
4     -0.406   -1.146    11     24
62    -0.993   -1.091    12     11
12    -0.993   -1.037    13     12
36    -0.765   -0.982    14     17
24    -0.928   -0.927    15     13
53    -0.765   -0.873    16     18
8     -0.928   -0.818    17     14
55    -0.732   -0.764    18     19
18    -0.895   -0.709    19     15
22    -0.732   -0.655    20     20
14    -0.374   -0.600    21     25
37    -1.254   -0.546    22     7
27    -0.732   -0.491    23     21
7     0.050    -0.164    29     29
13    -0.015   -0.327    26     26
Closeness / Gossip
[Figure: scatter plot. x-axis: Freeman Closeness; y-axis: Realized Closeness]
Closeness in Gossip Process
[Figure: network diagram with nodes marked as over-estimated or under-estimated by closeness centrality; colors based on average arrival times. Data courtesy of Valdis Krebs]
In gossip process, token gets bottled up by dense regions and takes a long time to escape to other groups. It is hard for a blind process to find its way out.
Closeness in Currency Process
[Figure: scatter plot. x-axis: Freeman Closeness]
Lack of Symmetry
• In many processes, avg distance to node does not equal distance from the node
  – even though network is symmetrical
• People who can reach others in few steps are NOT the same as people who can be reached by others in few steps
  – Freeman closeness uncorrelated w/ former
Asymmetry Due to Degree Variance
“Distance” Matrix

      1     2     3     4     5     6     7     8     9     10
1     -     5.7   5.2   4.6   5.6   4.6   13.2  11.3  10.6  5.3
2     2.4   -     2.3   2.4   2.4   6.3   14.4  12.7  11.8  7.1
3     4.2   4.8   -     4.4   4.2   4.9   14.5  12.6  11.4  8.1
4     3.3   4.0   3.7   -     3.9   5.7   12.7  11.2  10.3  3.9
5     3.0   3.2   2.6   3.1   -     6.6   15.0  13.3  12.3  7.8
6     6.9   20.8  8.5   11.1  20.6  -     13.0  7.8   7.5   7.1
7     7.6   20.0  11.0  8.9   19.8  4.1   -     3.1   3.1   3.0
8     7.4   19.8  10.7  8.9   19.6  3.0   3.1   -     3.1   3.0
9     7.7   19.4  10.1  9.2   19.1  3.6   3.8   3.8   -     3.6
10    4.2   16.7  9.4   4.3   16.5  3.7   4.1   4.2   4.0   -

[Axis labels: From; To]
Lack of Computability
• Closeness in Gift Process
  – Gift gets stuck in cul-de-sac, resulting in infinite time/distance
  – Can’t compute expected time til arrival

            parallel duplication   serial duplication   transfer
geodesics   Freeman                Freeman              Freeman
paths       Freeman                NEW                  NO
trails      Freeman                NEW                  NO
walks       Freeman                ?                    Markov
Correlations Among Centralities

           EgoDensity Eigenvec   FreeClos   FreeBet    BetGossip  BetEmail   BetGift    BetInfect  BetMoney   ReaGift    CloGossip  CloInfect  CloMoney
EgoDensity 1.000      -0.180     0.362      -0.437     -0.437     -0.142     -0.425     -0.660     -0.472     -0.449     0.268      0.411      0.346
Eigenvec   -0.180     1.000      -0.797     0.503      0.880      0.376      0.901      0.501      0.853      0.881      -0.763     -0.861     -0.777
FreeClos   0.362      -0.797     1.000      -0.532     -0.780     -0.480     -0.780     -0.606     -0.761     -0.825     0.900      0.881      0.816
FreeBet    -0.437     0.503      -0.532     1.000      0.683      0.059      0.685      0.348      0.757      0.610      -0.356     -0.501     -0.376
BetGossip  -0.437     0.880      -0.780     0.683      1.000      0.489      0.994      0.710      0.985      0.961      -0.707     -0.878     -0.825
BetEmail   -0.142     0.376      -0.480     0.059      0.489      1.000      0.477      0.728      0.423      0.599      -0.608     -0.631     -0.753
BetGift    -0.425     0.901      -0.780     0.685      0.994      0.477      1.000      0.689      0.987      0.968      -0.717     -0.876     -0.824
BetInfect  -0.660     0.501      -0.606     0.348      0.710      0.728      0.689      1.000      0.684      0.785      -0.632     -0.751     -0.840
BetMoney   -0.472     0.853      -0.761     0.757      0.985      0.423      0.987      0.684      1.000      0.948      -0.662     -0.839     -0.775
ReaGift    -0.449     0.881      -0.825     0.610      0.961      0.599      0.968      0.785      0.948      1.000      -0.814     -0.938     -0.923
CloGossip  0.268      -0.763     0.900      -0.356     -0.707     -0.608     -0.717     -0.632     -0.662     -0.814     1.000      0.901      0.893
CloInfect  0.411      -0.861     0.881      -0.501     -0.878     -0.631     -0.876     -0.751     -0.839     -0.938     0.901      1.000      0.956
CloMoney   0.346      -0.777     0.816      -0.376     -0.825     -0.753     -0.824     -0.840     -0.775     -0.923     0.893      0.956      1.000
MDS of Correlations Among Centrality Scores
[Figure: two-dimensional MDS plot. Plotted points: FreeClos, Eigenvec, FreeBet, BetGossip, BetEmail, BetGift, ReaGift, CloGossip, CloInfect, BetInfect, CloMoney, BetMoney]
Summary
• Variety of flow processes
  – Distinguished by a system of properties
• Key properties include
  – blind / guided
  – copy / move
  – serial / parallel
  – path / trail / walk
Summary -- cont.
• Properties combine to form set of rules that determine how things flow
• These rules interact with structural location to determine
  – who gets things earliest
  – who gets a lot of traffic
  – i.e., structural importance
Summary -- cont.
• Centrality measures make assumptions about the kinds of flow processes
• Freeman measures only consistent with a few flow processes
  – When applied to other flow processes they get the “wrong” answer -- i.e., are not interpretable in the obvious way
  – For other processes, need new measures (or use simulation) -- where computable
Assumptions
• We can separate concepts from measures
  – essence of closeness is time until arrival
• Betweenness makes sense in deterministic context
Tentative Conclusions
• Applied to deterministic (blind) flows, Freeman measures over-estimate importance of peripherals and under-estimate importance of core nodes
  – flows get bottled-up in dense areas with many redundant paths
[Diagram labels: Path redundancy; Individual performance; Type of flow]
To Do List
• Anova-like comparison of results across different boxes in typology
  – is the big difference deterministic vs non? Walk vs path/trail? Copy vs move?
• How does the shape of the network interact with the rules of flow to produce different structural importances for the nodes? -- core/periphery structures
  – Take loosely game-theoretic approach
To Do List -- cont.
• Construct analytic measures appropriate for each flow process
• Extend to directed graphs and probabilistic ties
Stop Talking.
Group Centrality
• These 3 nodes directly reach 42 people (2/3 of whole)
Data courtesy of Valdis Krebs
Group Centrality
• These 5 nodes directly reach 54 people (86% of whole)
Data courtesy of Valdis Krebs
Encourage Divisions
• Look for and encourage differences in the regions
Data courtesy of Valdis Krebs
Minimum Weight Cutsets
• Cutting just 2 nodes (or 5 ties) splits the network into two pieces
[Figure: network diagram; labeled nodes: Mohamed Atta, Ramzi Bin al-Shibh. Data courtesy of Valdis Krebs]
Note: data include persons now dead.
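For small networks, node cutsets of a given size can be found by brute force: remove each candidate set and test whether the rest still hangs together. The sketch below is illustrative only. The function names and the toy network are hypothetical, ties are unweighted, and the search is exponential in the cutset size, so it is not a substitute for a proper minimum-cut algorithm.

```python
from itertools import combinations


def is_connected(adj, removed=frozenset()):
    """True if the network stays in one piece after deleting the removed nodes."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def node_cutsets(adj, size):
    """All node sets of the given size whose removal disconnects the network."""
    return [set(c) for c in combinations(adj, size)
            if not is_connected(adj, frozenset(c))]


# Hypothetical toy network: two triangles joined by the tie c - d
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
print(node_cutsets(adj, 1))  # cut-points: [{'c'}, {'d'}]
```

With size 1 this returns the cut-points. Larger sizes also return sets that contain a smaller cutset, so a minimal cutset is one found at the smallest size that yields any result.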
Objectives
• Uncover assumptions behind measures of centrality
  – assumptions about how things flow in nets
• Create a “theory” of how properties of flows affect structural importance of nodes
  – enable appropriate measurement
Proposition
• Classical measures of centrality presuppose certain properties of flows
  – may be inappropriate for flows with different properties
• More theoretically: Flow properties determine the structural importance of nodes
Test Procedures
• In non-deterministic and in parallel duplication processes, realized closeness should match Freeman closeness calculation
• In postal processes, realized betweenness should match Freeman betweenness calculation
SNA Tactics “r” Short Term
• SNA helps decide which nodes & ties to prune, but ...
  – There are ties we don’t know about
  – New ties develop
  – Nodes are replaced
  – 11 Sept experiment
• Best used at cusp moments to break up a specific attack
Medium Term Tactics
• Targeted harassment to shape network (e.g., increase redundancy of paths)
  – reduces efficiency of communication?
  – information distortion can create confusion
• Adding fake nodes
  – spreading slightly changed information
    • create trust problem
  – place fake nodes in key network positions
Medium Term
• Push network into maladaptive shape
  – factions
  – brittle network with many cutpoints
  – dense networks with redundant pathways
Encourage Divisions
• Look for and encourage differences in the regions
Data courtesy of Valdis Krebs
Long-Term Tactics
• Dry up the money
• Eliminate recruitment
  – make other career paths more attractive
  – competing glamorous groups, some fake
  – whispering campaign to discredit members
  – minimize visible confrontation
Long-Term Tactics
• Eliminate al Qaeda training system
  – when communication is difficult, coordination is achieved via common training
  – creates bond that is activated later to execute an operation
Terrorist Networks?
• Long-term suggestions largely organizational in character
  – is there really an advantage to conceptualizing terrorist groups as networks?
• Is al Qaeda
  – any more of a network than corporations?
  – and any less of a formal organization?
Characteristics of Formal Organizations
– documented procedures / company manual
– occupational training / company orientation
– functional division of labor & specialization
– unity of command
– career management
– coordination based on authority & standardized work processes (professional training)
– targeted communication (unlike meandering gossip)
al Qaeda as Formal Org
– detailed manuals
  • technical info, management info, logistical -- what kind of house to rent
– lengthy & rigorous training courses
– functional division of cells
– movement of personnel
al Qaeda as Formal Org
– centralized decision-making
  • coordination of efforts across countries
    – attacks on US embassies in Tanzania & Kenya within 9 minutes of each other
  • “Those who were trained to fly didn’t know the others. One group of people did not know the other group.” - Usama bin Laden
– communication non-deterministic?
Conclusions
• Conventional network approach of using centrality measures to identify targets needs to be modified
  – calculate appropriate measures of structural importance
• Even so, eliminating nodes provides only short term relief
• Some network-informed medium term tactics
Conclusions -- cont.
• Long term, we must
  – address the finance, recruitment & training issues
• These are not fundamentally network issues
  – is al Qaeda really more like a social network than a formal organization?
  – Going along with the media hype
Conclusions -- cont.
• Org’l theory might be more fertile ground for developing defensive tactics
  – resource dependency.
    • Don’t cut off funds, control them.
  – contingency theory. Complicate environs
    • e.g., face other terrorist groups: Ath. Lib. Fr.
  – network governance.
    • Make “friends” with al Qaeda’s allies
  – institutional theory. Fight legitimacy.