Social Network Analysis via Matrix Decompositions: al Qaeda

D.B. Skillicorn
School of Computing
Queen’s University
skill@cs.queensu.ca

August 2004

Abstract

Social network analysis investigates the structure of human groups using pairwise links among their members. We show how matrix decompositions can be used to extend the standard repertoire of social network and link analysis tools to allow, for example, the inclusion of other information about individuals, and higher-order information about the relationships among them. We show how these extensions can be applied by analyzing the structure of al Qaeda and its related terrorist organizations. Much of the information about, for example, relative importance of al Qaeda members can be extracted from simple relational information.

1 Introduction

Social network analysis explores the structure of groups in human society by modelling individuals, places, and objects as nodes of a graph, and adding links between nodes to represent relations among them. For example, important subgroups can be identified as cliques in the graph; individuals in particular positions of control can be identified by their centrality (using a number of measures); and substructures of particular interest (for example, communication chains) can be discovered [9, 12, 19].

Social network analysis has been applied to both terrorist and criminal networks. For example, Baker and Faulkner [2] relate location in a criminal network to length of eventual sentence; Sageman uses SNA to validate his division of al Qaeda members into four classes: leadership, core Arab, South-East Asian, and Maghreb [16].

1.1 The challenge

Transnational terrorism groups present a new problem for the countries against whom their actions are directed, usually characterized as asymmetric or 4th generation warfare [17]. Unlike the case of military conflicts between nations, such terrorist groups have a membership that is hard to define, few visible fixed targets, the ability to operate across borders relatively freely, and independent sources of funding, removing indirect ways to pressure them via sponsors. The resources that must be expended by both sides differ by several orders of magnitude: around 25 men and expenditures estimated to be only ∼$500,000 were sufficient for al Qaeda to attack the World Trade Center. In contrast, U.S. spending in response is at least $100 billion [3, 5].

Counterterrorism efforts face corresponding difficulties in attempting to detect and preempt attacks. A metaphor suggested for al Qaeda is that it is a venture capitalist for terror [6]; proposals for attacks are brought to the leadership and those that are approved receive support in the form of training and financing. This means that attackers may have only minimal contact with the main part of the organization until quite close to the time an attack is mounted. This suggests that every contact with known al Qaeda members, no matter how fleeting, needs to be treated as significant. The transnational nature of al Qaeda also makes it clear that a profile of a “typical” al Qaeda member does not exist – and there is some evidence that the group is trying to recruit members who appear even less like a hypothetical Salafist terrorist.

Al Qaeda is only the most visible of a number of movements whose grievance extends beyond a single geographical region, whose aim is not simple visibility for their cause, and who have discovered that nation-states are vulnerable to asymmetric warfare attacks. Counterterrorism technologies will, sadly, be of use even when al Qaeda has been defeated.

1.2 Link analysis technology

The techniques of social network analysis have some limitations as tools to explore the graphs that model social groups. First, it is not straightforward to extract ‘higher-order’ information, that is information that is associated not with a connected pair of objects but with a larger set. Second, it is not easy to introduce and use information that is not naturally associated with edges of the graph, for example demographic information. Third, social network analysis typically depends on the precise connection structure of the graph, so that small changes in the graph may produce large changes in its properties. This is a problem because information about terrorist groups necessarily misses some data, and it is also likely that some data is wrong.

In this chapter, we show how to use the machinery of matrix decompositions to extract more information from a graph that models a social group. We use three such decompositions:

1. Singular value decomposition (SVD). Although SVD is commonly used for dimension reduction, we use it both as a graph partitioning tool (an approach known as spectral graph partitioning) and as a way to detect the most anomalous, and hence most interesting, nodes in a graph. SVD transforms data based on correlation, and so can extract structure that is incomplete; it does not require prespecification of the structures of interest.

2. Semidiscrete decomposition (SDD). SDD partitions data into subsets with similar attribute values, in the process creating an unsupervised hierarchical classification tree. Hence it is a clustering tool that works in a different way to both SVD and metric-based clusterers such as k-means that are, in any case, unreliable in high dimension.

3. Independent component analysis (ICA). ICA partitions data into the least Gaussian components possible. In a graph context, this amounts to selecting components that are as much like cliques as possible.

These techniques largely avoid the weaknesses of conventional link analysis: they include higher-order correlation information, they can use extra information associated with both edges and nodes, and they are robust in the presence of missing values (because these are often implied indirectly by other values) and wrong values (because correlation rather than equality is the basic comparator).

We illustrate the application of these methods on a dataset containing information about 366 members of al Qaeda (current as of the beginning of 2004). The dataset contains typical relationship information, such as members who are related, who are friends, or who have encountered one another since joining the organization. However, it also contains demographic information (age, countries of origin and joining the group, education and marital status, etc.) and we are able to include this information in our analysis.

2 Matrix decompositions

We begin with a dataset containing information about n objects (people in our context), with m attributes about each one. Some of these attributes might be categorical (they come from a fixed set of choices such as marital status), numeric (such as age), or representative of connections among the objects. In this last case, we will use an adjacency representation: the friendships among the 366 people will be represented by 366 different attributes, with a 0 value at position ij indicating that persons i and j are not friends and a 1 value indicating that they are. Of course, this region of the dataset will typically be sparse (i.e. mostly 0s).
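The adjacency representation described above can be sketched in a few lines. This is an illustration only: the function name `adjacency_matrix`, the toy link list, and the assumption that friendship is symmetric are ours, not part of the dataset.

```python
def adjacency_matrix(n, links):
    """Return an n x n 0/1 matrix from an iterable of (i, j) index pairs."""
    A = [[0] * n for _ in range(n)]
    for i, j in links:
        A[i][j] = 1
        A[j][i] = 1  # assumption: a friendship link is symmetric
    return A


# Hypothetical toy data: 5 people and 4 friendship links.
friend_links = [(0, 1), (0, 2), (1, 2), (3, 4)]
A = adjacency_matrix(5, friend_links)

# Fraction of non-zero entries; small values indicate a sparse region.
density = sum(map(sum, A)) / (5 * 5)
```

Each row of `A` then supplies n extra attributes for one person, which can sit alongside demographic columns in the full data matrix.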

Such a dataset is naturally viewed as a matrix, A, with n rows and m columns. A matrix decomposition expresses the matrix A as a product of other matrices in a way that reveals A’s structure. Hence a typical matrix decomposition can be expressed as a matrix equation:

A = C S F

where C is n×m, S is an m×m diagonal matrix (off-diagonal entries are all 0), and F is m×m. Typically, the sizes of the matrices on the right hand side are restricted to force the decomposition to represent the data more compactly, so that the decomposition, truncated to k, becomes:

A ≈ C_{n×k} S_{k×k} F_{k×m}

Matrix decompositions are related to Expectation-Maximisation, where each different decomposition imposes extra conditions on the way in which the partitioning is done.

There are several different ways to interpret a matrix decomposition, and each sheds different light on the underlying data. In the factor interpretation, the rows of F are interpreted as underlying or latent factors and the entries of C as ways to mix these factors to produce the observed data. The diagonal entries of S are weights, whose magnitude gives the relative importance of each factor. This view is commonplace and productive in the social sciences as factor analysis. Sometimes the factors can be regarded as axes in some space, in which case the entries of C are coordinates of points in this space.

The (outer) product of the ith column of C, the ith entry on the diagonal of S, and the ith row of F is a matrix of the same shape as A, and in fact A can be expressed as the sum of all of these matrices. This allows a layer interpretation of the decomposition. A is obtained by sandwiching all of the outer-product matrices together, and so each of them can be regarded as making some contribution to all of the values of A. Once again, the magnitude of the diagonal element provides information about how important each layer is to the total dataset.
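The layer interpretation can be checked numerically. The sketch below uses NumPy's SVD to play the role of C, S and F (any decomposition of the form A = C S F would do); the random 6 × 4 matrix is a made-up stand-in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))  # hypothetical data: 6 objects, 4 attributes

# Here U, diag(s) and Vt stand in for C, S and F respectively.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Layer i: (column i of C) x (ith diagonal entry of S) x (row i of F).
layers = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]

reconstruction = sum(layers)     # all layers together give back A
rank2 = layers[0] + layers[1]    # truncation to k = 2 keeps two layers
```

Because the singular values are non-increasing, the earliest layers carry the largest contributions, which is why keeping only the first k of them is a reasonable compact approximation.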

2.1 Singular Value Decomposition

SVD transforms data in a way that converts correlation to proximity [4, 18]. In the decomposition:

A = U S V′

the matrices U and V are orthogonal (the superscript dash indicates transposition), and the diagonal entries of S, called the singular values, are non-increasing.

Because V is orthogonal, a geometric interpretation is natural. The rows of U can be understood as the coordinates of points corresponding to the objects. The axes of the transformed space are such that the greatest variation in the original data lies along the direction of the first axis (the first row of V′), the greatest remaining variation along the second axis, and so on. Hence truncating at some k gives a representation in a lower-dimensional space that captures the correlative structure as accurately as possible.

It is conventional to scale the data so that the relative magnitudes of each attribute are the same, and also to subtract the mean from each column of attributes. If this latter is not done, the first singular vector represents the average magnitude of the data and is typically of less interest. However, when the data represents, for example, the adjacency matrix of a graph, it may not be sensible to normalize the entries.
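One common way to carry out this scaling is to convert each column to zero mean and unit standard deviation. The sketch below assumes that choice (other scalings are possible) and uses invented age and children counts; `standardize_columns` is a hypothetical helper name.

```python
import numpy as np


def standardize_columns(A):
    """Subtract each column's mean and divide by its standard deviation."""
    A = np.asarray(A, dtype=float)
    centered = A - A.mean(axis=0)
    scale = centered.std(axis=0)
    scale[scale == 0] = 1.0  # constant columns stay at zero instead of dividing by 0
    return centered / scale


# Hypothetical numeric attributes: age and number of children.
numeric = np.array([[25.0, 0.0], [31.0, 2.0], [40.0, 4.0], [28.0, 2.0]])
Z = standardize_columns(numeric)
```

For a 0/1 adjacency block this step would normally be skipped, as noted above.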

The SVD is completely symmetric with respect to rows and columns of the original matrix, so that all of the analysis that can be done for objects can trivially be repeated for the attributes as well.

SVD can be used in a number of ways to analyze a dataset:

• Dimensionality reduction. This is the most common use of SVD in data analysis since it provides a way to reduce high-dimensional data (i.e. with many attributes) to lower dimension, losing as little information as possible in the process. When the original data contains noise, this dimensionality reduction can be regarded as denoising as well.

One of the benefits of dimensionality reduction is that choosing k = 2 or 3 allows the rows of U to be plotted. This often makes it possible to understand at least the most significant structure of a dataset by visual inspection.

• Clustering. In a transformed and truncated space, the relationships among the points have been clarified and consequently clustering might be expected to work more effectively. There are, broadly, two approaches, although each contains many competing variants. The first is to use metric-based clustering, for example k-means, in the new space. The second is to use the properties of the SVD directly in an approach called spectral clustering [10]. For example, those points which lie in the cone around the first axis (those whose dot product with the axis is less than 1/2) are placed in one cluster; those with the same property with respect to the second axis in the second cluster, and so on. This produces k clusters, of which the last one is the ‘everything else’ cluster. In some settings, it is obviously correct to include in each cluster the points that lie within the cone corresponding to the negative direction of each axis as well. These points are negatively correlated with the others with which they are being lumped, but they are correlated nevertheless. In low dimensions, no formal clustering algorithm is required because the clusters can usually be seen.

When the matrix represents the adjacency matrix of a graph, the clustering produced by SVD is often similar to the clique structure of the graph.

• Ranking objects by their interestingness. We have already explained that each row of U can be identified with a point in a k-dimensional space. Suppose that an arrow is drawn from the origin of the space to each of these points. Then the angles between these vectors reveal the correlation among the points. Two points that are strongly positively correlated will have vectors that are close together. Their dot products, which correspond to the cosine of the angle between the vectors, will be large and positive. Two points that are strongly negatively correlated will point in almost opposite directions, and will have a dot product that is large and negative. Two points that are uncorrelated should have a dot product that is close to zero, and it is here that a problem arises. One way in which such a dot product can arise is that the two vectors are almost at right angles to each other. However, typically the number of available dimensions (k) is much smaller than the number of uncorrelated points (which could be n). There is another way in which the dot product can be close to zero and that is that the point itself is close to the origin. Hence points that are uncorrelated with most of the other points will tend to be placed near the origin. For similar reasons, a point that is correlated with almost all of the other points will also tend to be placed near the origin.

Hence in the transformed space, points that are located far from the origin correspond to objects that are interesting in the sense that their correlations with the other objects are unusual. Conversely, points that are close to the origin correspond to objects that are less interesting, either because they are randomly correlated with other objects, or correlated similarly with all of them. Ranking the objects in order of the distance of their points from the origin allows the most interesting objects to be selected.

Because SVD is symmetric with respect to objects and attributes, exactly the same idea can be used to discover the relative interestingness of the attributes.
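The ranking idea can be sketched as follows. We assume, as in the text, that the coordinates of each object are the first k entries of its row of U (some authors scale these by the singular values; that variant is not shown). The function name and the small star-shaped graph are illustrative only.

```python
import numpy as np


def rank_by_interestingness(A, k):
    """Return object indices sorted by decreasing distance from the origin
    in the k-dimensional space given by the first k columns of U."""
    U, s, Vt = np.linalg.svd(np.asarray(A, dtype=float), full_matrices=False)
    coords = U[:, :k]                      # one row of coordinates per object
    dist = np.linalg.norm(coords, axis=1)  # distance of each point from the origin
    order = np.argsort(-dist)              # most distant (most interesting) first
    return order, dist


# Hypothetical graph: node 0 is linked to nodes 1, 2, 3; node 4 has no links.
A = np.zeros((5, 5))
for leaf in (1, 2, 3):
    A[0, leaf] = A[leaf, 0] = 1

order, dist = rank_by_interestingness(A, k=2)
# The hub (node 0) is ranked first; the unconnected node 4 sits at the origin.
```

Applying the same function to the transpose of A ranks the attributes instead of the objects.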

2.2 Semidiscrete Decomposition

SDD [11, 15] decomposes a matrix A as:

A = X D Y

where the entries of X and Y are from {−1, 0, +1} and D is a diagonal matrix with non-increasing entries (a variant of the original SDD as described in [13]).

The natural interpretation of SDD is the layered one based on the outer product matrices. The product of the ith column of X and the ith row of Y is a matrix which contains rectilinearly aligned patterns of −1s and +1s against a background of 0s. The non-zero values can be regarded as a stencil of locations within A where a set of values of similar magnitude (d_i) can be found. The locations where there is a +1 correspond to positive values of this magnitude and those where there is a −1 correspond to negative values of this magnitude.

Hence, whereas SVD analyzes the data in a geometric space, SDD analyzes the data within the matrix itself, decomposing it into sets of hills and valleys, such that the sum of all of the sets recreates the original data.

Furthermore, the values in the X matrix provide an unsupervised hierarchical classification of the objects. At the top level, those objects whose entry in the first column of X is +1 are in one branch, while those whose entries are −1 are in an opposite branch. Those objects whose entries are 0 are in yet a third branch, so that the classification tree is ternary. The tree is hierarchical because the clusters with the largest value of d_i appear first.
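To make the ternary classification concrete, the sketch below groups objects by the leading entries of their rows of X. It does not compute an SDD; the matrix `X` is a hypothetical output of some SDD routine, and `sdd_branches` is an illustrative helper name.

```python
from collections import defaultdict


def sdd_branches(X, depth):
    """Group object indices by the first `depth` entries of their row of X."""
    groups = defaultdict(list)
    for obj, row in enumerate(X):
        groups[tuple(row[:depth])].append(obj)
    return dict(groups)


# Hypothetical X with entries in {-1, 0, +1}: one row per object,
# one column per SDD layer, ordered by decreasing d_i.
X = [
    [+1, +1],
    [+1, 0],
    [0, +1],
    [-1, -1],
    [+1, +1],
]

top_level = sdd_branches(X, 1)   # three possible branches: +1, 0, -1
two_levels = sdd_branches(X, 2)  # each branch split again by the second column
```

Increasing `depth` walks further down the tree, giving up to 3^depth leaves.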

Although SDD was originally developed as a storage-efficient analogue of SVD, there is no necessary link between the classifications each produces. When the data naturally clusters into many small, well-separated clusters, SDD and SVD tend to agree. It also often happens that the top-level classification from SDD is aligned with the first axis of SVD, so that the +1 points are at one extremity and the −1 points at the other – but this does not necessarily happen.

Short name                     Year joined the jihad
Full name                      Age joined the jihad
Date of birth                  Place joined the jihad
Place of birth                 Country joined the jihad
Youth national status          Acquaintance links
Family socioeconomic status    Friend links
Religious background           Nuclear family links
Educational achievement        Relative links
Type of education              Religious leader
Occupation                     Ties not in sample
Marital status                 Role in organization
Children                       Operation(s) involved
Social background              Fate
Links after joining

Figure 1: Dataset attributes.

2.3 Independent Component Analysis

ICA [1, 7, 8, 14] decomposes the data matrix, A, into components that are as statistically independent as possible (in contrast to SVD which decomposes the data into components that are linearly independent). We use the FastICA algorithm for convenience.

The ICA of a matrix A is:

A = W H

(note that there is no ‘weight’ matrix in this case, and hence no natural ordering on the components). H represents the statistically independent factors and W the way in which these factors must be mixed to recreate A.

3 al Qaeda Dataset

We will illustrate the power of these matrix decomposition techniques by using them to investigate the structures and relationships within al Qaeda, to the extent that they are publicly known. We use a dataset collected by Marc Sageman from a wide variety of public sources. The dataset contains information about 366 members of al Qaeda as of the beginning of 2004. The available attributes are shown in the table in Figure 1.

Many of these attributes are demographic in nature, but several describe the links among al Qaeda members under various categories. Of course, there are many missing values because not all information is publicly available.

Name          BL centrality    Name        BL centrality
bin Laden     298              Jarrah      234
Zawahiri      240              Shehhi      235
Banshiri      226              Mihdhar     220
M Atef        254              Hada        227
Sheikh Omar   222              Harithi     227
Islambuli     230              Ayiri       232
Zubaydah      260              Aktas       222
Makkawi       242              Sungkar     229
Hawsawi       227              Hambali     253
Taha          230              Faruq       233
KSM           250              Ramda       251
Zarqawi       221              Melouk      220
Qatada        221              Doha        225
Hage          221              Trabelsi    244
Khadr         222              Moussaoui   235
Ghayth        224              Bahaiah     229
Khallad       241              Khabab      228
Shah          232              Khalifah    227
Atta          246              Tabarak     222
Shibh         260

Figure 2: Al Qaeda members with high Bavelas-Leavitt centrality.

We also use a subset of the dataset, a link or adjacency matrix containing all of the links among members, whether as family, relatives, friends, or members of the group. The graph of these relationships has 366 nodes (of course) and 2171 edges. The maximum degree of the graph is 44 (but of course this number is probably higher in practice), and the mean degree is 6.44. (This value for the mean degree is interesting because it matches the rule of thumb that members of a group must have connections to about 6 others if they are to remain in the group.) The diameter of the graph is 11.

Figure 2 gives the Bavelas-Leavitt centrality values that exceed 220. For each node, this measure is the ratio of the sum of the lengths of all of the shortest paths in the entire dataset to the sum of the lengths of the shortest paths to and from that node, so that larger values indicate more central nodes. Accordingly, it measures how close the node is to the center of the graph of links in some notional space.
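As a rough illustration, the centrality can be computed with breadth-first search. The sketch assumes an undirected, connected, unweighted graph, and takes the measure to be the total of all shortest-path lengths divided by the total of those to and from the node (so that larger values are more central, as in Figure 2). The function names and the three-node example are ours.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def bavelas_leavitt(adj):
    """adj maps each node to the set of its neighbours (assumed connected)."""
    per_node = {u: sum(shortest_path_lengths(adj, u).values()) for u in adj}
    total = sum(per_node.values())  # every ordered pair of nodes counted once
    # "To and from" a node counts each of its distances in both directions.
    return {u: total / (2 * per_node[u]) for u in adj}


# Hypothetical chain a - b - c: the middle node is the most central.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
centrality = bavelas_leavitt(chain)
```

On a graph of several hundred nodes this all-pairs search is cheap, but it says nothing about why a node is central, which is the limitation discussed next.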

Many of the members with high scores are the leadership of al Qaeda as expected. However, there are several surprises: Hada, Harithi, Ayiri, Aktas, Faruq, Ramda, Melouk, Trabelsi and Bahaiah. Examination of the data suggests that these members get such high centrality scores because they have links to Osama bin Laden and several others of the top leadership. In the absence of other knowledge, this complicates the use of a centrality measure as an analysis device because it does not distinguish well between the important leadership and those with little importance but who are directly connected to the leadership. (Of course, this is further complicated by the fact that such people may be hangers-on, but may also be eminences grises.)

The University of Arizona group have analyzed this dataset and used multidimensional scaling to produce a picture of the group’s connectivity (Jie Xu, personal communication, 2004). This shows that the dataset is naturally clustered into 13 almost-cliques, with about 60 members not allocated to a single clique.

A graph of the links within al Qaeda is maintained by Intelcenter and can be viewed on their web site (www.intelcenter.com/linkanalysis.html). While the graph is compendious, it is hard to extract actionable information from it.

4 Analysis using matrix decompositions

4.1 Using the links between individuals

In this section we consider only the results of enhanced link analysis, that is we consider the graph of relationships among al Qaeda members. The base dataset is a 366 × 366 adjacency matrix for the graph that includes: acquaintances, family, friends, relations, and contacts after joining.

[Figure: three-dimensional scatter plot with axes U1, U2, and U3.]

Figure 3: SVD plot of al Qaeda members using only relationship attributes.

Figure 3 shows a 3-dimensional (truncated) view of the relationships among al Qaeda members extracted from their links. The most obvious fact is that there is a clear division into three (perhaps four) clusters. This radial pattern is typical: those points at the extremities represent individuals with the most interesting connections to the rest of the group. Many members are either connected in limited ways, or little is known about them. All such members resemble each other, and so tend to be located close to the origin.

The structure is made clearer by adding name labels (we follow Sageman’s usage) and removing points (and so individuals) that are located close to the origin. Figure 4 shows those points that are more than 1.5 times the median distance from the origin, while Figure 5 removes even more points. It now becomes possible to identify the visible structure.
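The selection rule used for the labelled plots can be sketched as below. The coordinates are invented stand-ins for rows of the truncated U matrix, and `far_from_origin` is a hypothetical helper name.

```python
import numpy as np


def far_from_origin(coords, factor):
    """Indices of points whose distance from the origin exceeds
    `factor` times the median distance."""
    dist = np.linalg.norm(coords, axis=1)
    return np.flatnonzero(dist > factor * np.median(dist))


# Hypothetical 3-dimensional coordinates for five members.
coords = np.array([
    [0.01, 0.00, 0.00],
    [0.02, 0.01, 0.00],
    [0.00, 0.02, 0.00],
    [0.30, 0.10, 0.00],
    [0.00, 0.00, 0.40],
])

kept = far_from_origin(coords, 1.5)  # the factor used for the first labelled plot
```

Raising `factor` (for example to 4) keeps only the most extreme points, at the cost of hiding more of the cluster structure.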

[Figure: three-dimensional scatter plot with axes U1, U2, and U3; each point is labelled with the member’s short name.]

Figure 4: SVD plot of 143 interesting members (greater than 1.5 times the median distance from the origin) labelled with short identifiers.

There are three clusters in these figures: a group of Algerians arranged vertically in the figures; a group of South East Asian members stretching to the right; and a group of leaders and some core Arabs towards the front. It is clear from these figures that Hambali plays a pivotal connecting role between the SE Asian group and the leadership group; but further, the separation into two parallel lines of the leadership group is entirely due to whether or not they have a link to Hambali. The fact that Hambali is well-connected is obvious from the raw data – but it is not so obvious how integral these connections are to holding al Qaeda together. The strong presence of the Algerian cluster is slightly surprising; while these members have been active over a long period, they are not obviously the most important members of al Qaeda’s European operations in the raw data.

Each of the clusters arranges the more important members farthest from the origin as expected. Notice that bin Laden is not the most extremal member of the leadership cluster – this appears to be partly due to good tradecraft (he is not directly involved in operations) and to relative inactivity over the past few years. Note that Figure 5 selects the highest profile al Qaeda members well.

[Figure: three-dimensional scatter plot with axes U1, U2, and U3; each point is labelled with the member’s short name.]

Figure 5: SVD plot of 18 interesting members (greater than 4 times the median distance from the origin) labelled with short identifiers.

Figure 6 shows the 143 interesting members, but using dimensions 4 to 6 of the SVD (in other words, relationships in less important dimensions). Here again there are 3 clusters, although they do not bear the same close relationship to attack teams (although the group to the right are in fact the September 2001 attackers and their support group). It is clear, once again, that the most important members of the group are placed far from the origin.

Figure 7 overlays the SVD plot in dimensions 1–3 with information about which cultural group each member comes from. This figure shows the strong, separated, groups from the Middle East and from South East Asia. Although the vertical group are different from everyone else, most of the Maghreb/Algerian/French members resemble core Arabs.

Figures 8 and 9 are the same plots, but with the color and shape labelling derived from the SDD classification of the points. The top-level division in Figure 8 is coded by color: on one side, the important leaders, core Arabs, and SE Asians (red), in the middle the majority of the members (green), and on the other side, the Algerians (blue). The subsequent two levels are indicated by the symbol shape as shown in the following table:

+1 +1  dot        0 +1  +         −1 +1  diamond
+1  0  circle     0  0  star      −1  0  triangle down
+1 −1  cross      0 −1  square    −1 −1  triangle up

[Figure: three-dimensional scatter plot with axes U4, U5, and U6; each point is labelled with the member’s short name.]

Figure 6: SVD plot in dimensions 4–6 of 143 interesting members (greater than 1.5 times the median distance from the origin) labelled with short identifiers.

Here the similarities are between the Algerian group and the SE Asians (indicated by diamonds). In all of the clusters, a few marginal members can be discerned, indicated by pluses. Although the SDD classification generally agrees with that of SVD, the benefit of the extra information is that it provides substructure: a better view of boundaries and more detail within clusters.

Independent component analysis is naturally interpreted in a layered way: each component describes some aspect of the dataset. In this case, ICA works as an effective clique detector. It finds small groups of individuals who are much more closely linked than usual.

Recall that we compute

A = WH

From this, we can compute the set of outer product matrices formed by multiplying the ith column of W with the ith row of H (giving a matrix with the same shape as A).

Figure 10 shows one example of such an outer product matrix. Because the rows of the original dataset tend to have been organized in rough groups, the clique of connected individuals discovered by ICA happens to be located almost contiguously in the dataset, but there is no necessary connection.
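The construction of the outer product matrices from W and H can be illustrated with a minimal sketch. The factor values below are hypothetical toy numbers, not taken from the al Qaeda dataset, and the function name is chosen only for illustration.

```python
import numpy as np

def outer_product_layers(W, H):
    """Return one matrix per component: column i of W times row i of H."""
    return [np.outer(W[:, i], H[i, :]) for i in range(W.shape[1])]

# Toy factors (hypothetical values, chosen only to illustrate the shapes).
W = np.array([[1.0, 0.0],
              [2.0, 0.5],
              [0.0, 3.0]])
H = np.array([[0.5, 1.0, 0.0],
              [0.0, 2.0, 1.0]])

A = W @ H
layers = outer_product_layers(W, H)

# Each layer has the same shape as A, and the layers sum back to A.
assert all(layer.shape == A.shape for layer in layers)
assert np.allclose(sum(layers), A)
```

Because the layers sum to A, each one can be read as the contribution of a single component to the whole dataset.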

We extract the individuals associated with each outer product by applying a threshold function to each such matrix (in this case, 0.2 of the maximum value) to produce a 0-1 matrix. We then list those objects (individuals) with a 1 anywhere in their row.

Figure 7: SVD plot showing cultural group membership (squares – leadership, stars – core Arabs, circles – Maghreb, dots – S.E. Asian). Axes: U1, U2, U3.

Recall that ICA does not impose an importance ordering on components (at least not directly).

Some clusters that arise from the link dataset are:

• Christophe Caze, Lionel Dumont, Rachid Souimdi, Saad el Aihar, Amar Djouina, Mouloud Bougelane, Hassan Zemiri, Hocine Bendaoui, Seddick Benbahlouli, Laifa Khabou, Fateh Kamal (Groupe Roubaix, France 1994).

• Rachid Ramda, Ali Touchent, Boulem Bensaid, Safe Bourada, Smain Ait Ali Belkacem, Mohamed Drici, Ali ben Fatoum, David Vallat, Khaled Kelkal, Karim Koussa, Adelkader Maameri, Abdelkader Bouhadjar, Nasserdine Slimani, Farid Melouk, Ahmed Zaoui (France 1995).

• Osama bin Laden, Mohammed Atef, Mustafa Ahmed al-Hawsawi, Khalid Sheikh Mohammed, Waleed Tawfiq bin Attash, Mohamadou Ould Slahi, Mamoun Darkazanli, Mohammad bin Nasser Belfas, Mounir al-Motassadeq, Abdal Ghani Mzoudi, Said Bahaji, Mohammed Atta, Ramzi bin al-Shibh, Ziad Jarrah, Marwan el-Shehhi, Zakarya Essabar, Hani Hanjour, Nawaf al-Hazmi, Khalid al-Mihdar, Saleem al-Hazmi, Fayez Ahmad el-Shehri, Ahmed al-Nami, Christian Ganczarski, Encep Nurjaman (Hambali) (largely organizers and participants of the World Trade Center attack, 2001).


Figure 8: SVD plot of members with SDD color and shape labelling, showing extra boundary information. Axes: U1, U2, U3.

• Mohamed Zinedine, Abdelilah Ziyad, Abdelkrim Afkir, Hamel Marzoug, Abdeslam Garoise, Radouane Hammadi, Stephane Ait Iddir, Mohamed Azil, Abdelaziz Rhouzlane, Abderrahmane Boujedli, Kamel Benakcha, Rachid Falah, Tarek Falah, El Moustapha ben Haddou, Farid Zarouali, Abderrazak Mountassir (same group identified by SVD).

• Abu Bakar Baasyir, Abdullah Sungkar, Encep Nurjaman (Hambali), Ali Ghufron bin Nurhasyim (Mukhlas), Yassin Syawal, Rahman al-Ghozi, Abdul Aziz (Samudra), Enjang Bastaman (Jabir), Amrozi bin Nurhasyim, Ali Imron bin Nurhasyim, Hutomo Pamungkus (Mobarok), Faiz bin Abu Bakar Bafana, Hasyim bin Abbas, Mohammed Nasir bin Abbas (Sulaeman), Abdul Rahim Ayub, Azahari bin Husin, Aris Sumarsomo (Zulkarnaen), Suranto Abdul Ghoni, Noordin Mohammad Top, Jhoni Hendrawan (Idris), Pranata Yudha (Mustofa), Wan Min bin Wan Mat, Umar Dul Matin, Abbas Edy Setiono, Thoriqudin (Rusdan), Mustaquim, Muhajir (JI members, SE Asian attacks).

• Osama bin Laden, Zain al-Abidin Mohammed Hussein (Zubaydah), Omar ibn Mahmoud Omar Othman (Qatada), Mohamed Heidar Zammar, Mamoun Darkazanli, Amar Makhlulif (Doha), Mohamed Bensakhria, Essid Sami ben Khemais, Tarek Maaroufi, Imad Eddin Barakat Yarkas, Anwar Adnan Mohammad Salah, Mohammed Galeb Zouaydi, Tayssir Alluni, Oussama Dara, Mohammed Bahaiah, Jose Luis Galan Gonzalez, Abdelaziz Benyaich, Salaheddine Benyaich, Said Chedadi, Driss Chebli, Najib Chaid Mohamed, Mohamed Fizazi (North African and European attacks).

Figure 9: SVD plot of members with SDD color and shape labelling and short identifiers. Axes: U1, U2, U3.

• Zain al-Abidin Mohammed Hussein (Zubaydah), Safe Bourada, Laifa Khabou, Fateh Kamel, Abdellah Ouzgar, Zoheir Choulah, Said Atmani, Abderraouf Hannachi, Ahmed Ressam, Mustapha Labsi, Mourad Ikhlef, Adel Boumezbeur, Samit Ait Mohamed, Abdel Majit Dahoumane, Mokhtar Haouari, Amar Makhlulif (Doha), Yacine Akhnouche, Omar Chaabani (Jaafar), Rabah Kadri, Slimane Khalfaoui, Hassan Zemiri, Adil Charkaoui (Los Angeles millennium attack).

Other groups include those involved with early attacks in Egypt and the early leadership of al Qaeda, and those involved in the Casablanca attack in 2003.

The interesting things about these groups are:

• Although they are based purely on link data, they correspond well to patterns of terrorist attacks. This shows that al Qaeda’s functional structure (who plans, leads and carries out an attack) is heavily derived from existing familial and relationship connections among its members. (Although some group link structure is present in the raw data, it is by no means enough to determine attack groups.)

• Several people appear in multiple groups, thus revealing their role as the glue that binds disparate groups together. Notice that many groups with close geographical and relationship ties still include one or two members of the al Qaeda leadership, showing how long-distance relationships maintain overall group cohesiveness.

Figure 10: An example of an outer product matrix from the ICA of the relationship matrix. The presence of a small, connected group is easily visible.

• There are some individuals who ought, on the face of it, to appear as members of a group but do not. This may happen simply because not enough is known about them so they appear to be relatively unconnected generally. However, it is also possible that such people are deliberately trying to lie low, so it may be useful to apply extra scrutiny to them.

The choice of threshold affects the tightness of the boundary of each cluster: increasing the threshold reduces the membership of each cluster and removes some apparently anomalous individuals.
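The effect of the threshold can be seen in a small sketch of the extraction step (threshold an outer product matrix at a fraction of its maximum value, then list the rows containing a 1). The matrix values are hypothetical, the function name is illustrative, and the comparison assumes raw (non-negative) entries with a greater-than-or-equal test; the text does not specify these details.

```python
import numpy as np

def members_above_threshold(layer, fraction=0.2):
    """Return row indices with at least one entry >= fraction * max(layer).

    `fraction` plays the role of the threshold in the text (0.2 of the
    maximum value); the name and signature are illustrative assumptions.
    """
    mask = layer >= fraction * layer.max()   # the 0-1 matrix
    return np.flatnonzero(mask.any(axis=1))  # rows with a 1 anywhere

# Hypothetical outer product matrix for one component.
layer = np.array([[0.9, 0.8, 0.1, 0.0],
                  [0.8, 1.0, 0.3, 0.0],
                  [0.1, 0.3, 0.1, 0.0],
                  [0.0, 0.0, 0.0, 0.05]])

loose = members_above_threshold(layer, fraction=0.2)  # rows 0, 1, 2
tight = members_above_threshold(layer, fraction=0.5)  # rows 0, 1
print(loose, tight)
```

Raising the fraction from 0.2 to 0.5 drops the weakly connected third row, which mirrors the tightening of cluster boundaries described above.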

A sense of the overall ICA can be obtained by examining an image of the W and H matrices. Each column of the W matrix corresponds to one component. The presence of high values in this column indicates objects that are associated with this component (recall that the organization of the data tends to place similar people in adjacent rows already, which is why the clusters are so obvious). For example, column 1 reveals the cluster at rows ∼230–250, and column 3 reveals the cluster at rows ∼60–80 but with some weaker connections to the leadership (early rows).

Similarly, each row of the H matrix corresponds to a component, and indicates which attributes play a role in the selection of that component. The matrices are shown in Figures 11 and 12 respectively.

Figure 11: Image of the W matrix. Each column corresponds to one component, and the distinctive coloring to the members of that component.

4.2 Using demographic and relational information

We now add demographic information including: year of birth (dob), country of birth (birthplace), cultural group (clump), national status (natstatus), socioeconomic status (fses), religious background (religbgnd), type of school attended (school), education level attained (educ), type of education (edtype), occupation (occup), marital status (married), number of children (kids), possession of a criminal background (crimbgnd), year of joining al Qaeda (yrjoin), age at joining al Qaeda (agejoin), place at which member joined (placejoin), country in which member joined (countryjoin), fate, and year left the group (yrleft), usually by death. The number of demographic attributes and the amount of variation they show produce plots with much less clustering.

Figure 13 shows the basic clustering among al Qaeda members based on SVD. It is clear that the group is fairly homogeneous, except for a distinct cluster towards the bottom of the figure. As we shall see, this cluster represents a subgroup of members who have a stronger religious background and religious education than the majority.

Analysis of the attributes, shown in Figure 14, shows that dimension 1 captures the variation in educational attainment, dimension 2 captures variations in locations such as where members were born and joined the organization, and dimension 3 captures differences in religious background and schooling.

Figure 12: Image of the H matrix. Note that it is the transpose of the W matrix because the data matrix is (almost) symmetric.

These relationships among the dimensions in the attribute space must be consistent with the relationships among members in the object space. Figures 15 and 16 show the most unusual members, projected in different dimensions. We can see that, for example, al-Zawahiri and bin Laden are well-educated while Omar Khadr and Abdul Karim Khadr are not (separation along axis U1). The second dimension captures differences in country of birth and country in which the member joined the jihad. Since the countries are coded alphabetically, this reveals no absolute information about the structure of al Qaeda, although it may reveal some relative information. For example, Abdallah ibn Mohammad al-Rashoud was born and joined the jihad in Saudi Arabia, while Chellali Benchellali was born in Algeria and joined in France and Wadih el-Hage was born in Lebanon and joined in Afghanistan.

Figure 16 shows the relationship between education and religious background. Now the vertical dimension represents degree of religious background, with Hage a non-Muslim with a secular education, and Mukhlas and Zulkarnaen from a religious background and pupils of a Madrassa. Note the small cluster in the lower left-hand corner of members who are both religious and highly educated: Sheikh Omar Abdel Rahman, who has a doctorate, and Abu Bakar Baasyir and Abdullah Sungkar, who both have Masters degrees.


Figure 13: SVD plot of al Qaeda members using both demographic and relationship attributes. Axes: U1, U2, U3.

The sizes of the singular values (the diagonal of S) give some indication of the relative importance of the factors corresponding to each dimension. These values are 13.5, 11.9 and 9.6, indicating that education is about 40% more important as an explanation for variation among al Qaeda members than religious background (13.5/9.6 ≈ 1.4).
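The arithmetic behind this comparison is simply a ratio of singular values. A short sketch using the three values quoted in the text (the dimension labels follow the interpretation given for Figure 14; the helper name is an illustrative assumption):

```python
# Singular values quoted in the text for dimensions 1, 2 and 3.
singular_values = [13.5, 11.9, 9.6]
labels = ["education", "location", "religious background"]

def relative_importance(values, i, j):
    """Ratio of singular value i to singular value j (0-based indices)."""
    return values[i] / values[j]

ratio = relative_importance(singular_values, 0, 2)
# A ratio of about 1.4 corresponds to "about 40% more important".
print(f"{labels[0]} vs {labels[2]}: ratio = {ratio:.3f}")
```

The same helper gives the ratio between any other pair of dimensions, for example education against location.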

Figure 17 shows the effect of cultural background. It is clear that the leadership (squares) are slightly different from the remainder of the members, but this is not surprising because of the group’s history. Its leaders come from similar backgrounds and are of a similar age. There is very little difference between the characteristics of the other groups, although the Maghreb members (circles) show some systematic differences from the core Arab (stars) members.

Figure 18 shows an SVD overlaid with information obtained from SDD. In this case, we have used a more powerful combination of the two called the JSS methodology: SVD is applied to the data matrix, A, the decomposition is truncated at some k, and the component matrices are multiplied to give a modified version of A. SDD is then applied to the correlation matrix obtained from the modified version of A. This correlation matrix captures higher-order correlation information and tends to provide a clearer picture of complex data than using SDD directly on A.
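The first two steps of the JSS methodology (truncating the SVD at k and forming a correlation matrix from the result) can be sketched as follows. This is an illustration under stated assumptions: the correlation matrix is taken to be the product of the modified matrix with its transpose, which the text does not spell out, the data are random toy values, and the final SDD step is not implemented here.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Truncate the SVD of A at k and multiply the pieces back together."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def object_correlation(A_k):
    """Correlation among objects (rows), assumed here to be A_k A_k^T."""
    return A_k @ A_k.T

# Toy data matrix: 6 objects by 4 attributes (random, for illustration only).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))

A_k = rank_k_approximation(A, k=2)  # the "modified version of A"
C = object_correlation(A_k)         # the matrix SDD would then be applied to

print(A_k.shape, C.shape)
```

The truncation discards the smaller singular values, so C reflects only the dominant structure in A.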

It is clear from the figure that the extra information agrees with the clustering given by SVD. Note that the group of well-educated, religious members is captured as a subgroup. It is also noticeable that the well-educated cluster displays more variability than the matching cluster of less educated members. There are substantial overlaps between the well-educated cluster and the group’s leadership, providing further evidence that the stereotype of terrorists as ignorant, brainwashed, or psychotic does not apply to al Qaeda.

Figure 14: SVD plot of dataset attributes, showing the relationships among them. The large cluster in the center consists of the attributes associated with explicit relationships. Axes: V1, V2, V3.

Independent Component Analysis is not useful on the demographic data because it tends to select small groups who resemble each other on the basis of a few demographic attributes, which is both misleading and obvious from the raw data.

5 Discussion

5.1 Methodology

We can see from these results that the major benefit of SVD is its ability to select and order objects (in this case al Qaeda members) from most to least interesting. This is partly because al Qaeda is a fairly homogeneous organization, so that there are few significant demographic clusters within it. Even the clustering visible in the relationship data is important only for the more unusual/important members; most of the rank and file are quite similar. SDD allows more detailed and discriminative analysis, as it is able to provide boundaries between subgroups more precisely.
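The selection rule used in the figure captions (members more than 1.5 times the median distance from the origin) can be sketched as below. The sketch assumes that distances are measured on the rows of U restricted to the first few dimensions, without scaling by the singular values; the coordinates are hypothetical and the function name is illustrative.

```python
import numpy as np

def interesting_rows(U, dims=3, factor=1.5):
    """Rank rows farther than factor * median distance from the origin.

    Returns the selected row indices, ordered from most to least distant,
    together with the distance of every row.
    """
    dist = np.linalg.norm(U[:, :dims], axis=1)
    keep = np.flatnonzero(dist > factor * np.median(dist))
    order = keep[np.argsort(-dist[keep])]
    return order, dist

# Hypothetical coordinates for five members in three dimensions.
U = np.array([[0.1, 0.0, 0.0],
              [0.0, 0.1, 0.0],
              [0.5, 0.5, 0.0],
              [0.1, 0.1, 0.0],
              [0.0, 0.0, 0.9]])

order, dist = interesting_rows(U)
print(order)  # rows 4 and 2 lie well outside the median distance
```

Sorting by distance gives the ordering from most to least interesting, and the factor controls how many members are labelled in a plot.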

The major benefit of ICA is its ability to find and select closely coupled groups of individuals. Unlike a traditional clique-discovery algorithm, ICA allows an individual to participate in several groupings, which is both more flexible and more realistic.

Figure 15: SVD plot of 16 interesting members (greater than 1.5 times the median distance from the origin) in dimensions 1 and 2. Dimension 1 represents variation in education; dimension 2 variation in place of origin. Axes: U1, U2.

A number of parameter choices were made in these algorithms: the number of dimensions at which to truncate the SVD for visualization and for preparation for SDD, the scaling of the array entry magnitudes for SDD, the boundaries for considering objects interesting, the number of components used for ICA, and the thresholds used for using ICA components to select groups of members. Sensible values for all of these were chosen, but other structures might conceivably be revealed by other parameter choices. At present, no principled ways to choose these parameters are known.

A major advantage of matrix decompositions over typical social network and link analysis tools is complexity. The matrix decompositions used here have complexities that are typically cubic in n, the number of people being considered. However, when the data is sparse, as relationship data usually is, this can be reduced to linear, which remains feasible even for much, much larger datasets. In contrast, measures such as centrality have complexities that are at least cubic in n and often worse (because they often consider all paths in a graph). Furthermore, the software tools used for link analysis often assume quite small networks and so are not optimized for datasets even of this size, let alone larger ones.

Link analysis also has a number of other drawbacks. First, such analyses are at the mercy of their graph-drawing algorithms, which may create a misleading impression of the importance of an individual through an accident of placement. Second, the individuals to which attention is drawn are those with many connections. This is useful, but is easily extracted from the raw data, and fails to show either centrality measures or higher-order connections. Third, the graphs quickly become large, so that only small pieces can be seen at a time, which makes it hard to extract global information or see large-scale patterns. Hence, although visualization via link analysis plays to human strengths in seeing patterns, the size and complexity of the graphs involved tend to make this difficult to achieve in practice.

Figure 16: SVD plot of 16 interesting members (greater than 1.5 times the median distance from the origin) in dimensions 1 and 3. Dimension 1 represents variation in education; dimension 3 variation in religious background. Axes: U1, U3.

5.2 al Qaeda

It is clear from this analysis that al Qaeda is better regarded as a loose confederation of groups with related aims than as a hierarchically-controlled, functionally-organized single group. Repeatedly, the structure that emerges from considering relationships among members matches the structure related to groups that have carried out attacks. In al Qaeda, it is who you know, not what you know, that determines your role in the organization. This observation has also been made by Sageman [16], Gunaratna [6], and others.


Figure 17: SVD plot based on both relationship and demographic data, showing cultural group membership (squares – leadership, stars – core Arabs, circles – Maghreb, dots – S.E. Asian). Axes: U1, U2.

The difference among the three cultural groupings (core Arabs, South East Asian Muslims, and European/Maghreb Muslims) is also strong, as is the fact that the leadership is not only made up of core Arabs, but is also much more tightly bound to this group than to the others. In fact, it is surprising that there are not more people who play the role of Hambali in connecting groups together; this is surprising enough that it raises the question of whether there are in fact such people, either not captured at all in this dataset, or about whom not enough is known to elicit this role.

It is also clear that al Qaeda is an egalitarian organization in the sense that there is no particular profile to its members. Although education level is the most important variable among al Qaeda members, there is no clear division across the spectrum from most to least educated. The only variable that separates the group into two subclusters is religious background and schooling; and, contrary to widespread expectation, it is the more religious cluster that is the smaller.

The use of SVD as a technique for identifying the most interesting members of a group is also quite successful at identifying either group leaders or those with an important technical role. For example, Figure 4 identifies both bin Laden and Ramzi Mohammad Abdullah bin al-Shibh (who handled money transfers for the September 2001 attacks).

Other possible analyses using matrix decompositions would be to restrict the dataset to those still alive and examine the relationships within the organization; and also to examine how the relationship structures have changed over time. Of course, the addition of classified information to the data used here would also provide a more reliable picture. Together these three approaches might make prediction possible: previous attacks have often been marked by quite strong changes in connectivity and apparent demographics over a short time period by the individuals carrying out the attacks and their support cells. Such changes would certainly be visible in the matrix decomposition plots.

Figure 18: SVD plot with SDD color and shape labelling, using both demographic and relationship data. Axes: U1, U3.

The dataset here does not consider connectivity as a function of communication, but clearly the addition of links based on email or telephone conversation (for example, from Echelon) would provide a better and more timely picture of connections within the organization.

6 Conclusion

Transnational terrorist groups such as al Qaeda present new challenges for counterterrorism. Like all terrorist groups, their members and actions are hard to see against the background of innocent groups and activities. Al Qaeda has demonstrated that groups formed as very loose federations can still be effective, creating a new requirement to detect sporadic or short-lived links to create a workable defense.

We have shown how matrix decomposition techniques can be used to enhance link and social network analysis. These techniques are more revealing than clique detection and centrality measures, less reliant on exact data, and much more efficient to compute.

While we discover nothing particularly new about al Qaeda, much existing knowledge is replicated from much less data. In particular, the techniques we have used are able to detect and rank the importance of members of the group solely based on their relationships. This is a powerful addition to the arsenal of counterterrorism data analysis techniques.

Acknowledgement: I am deeply grateful to Marc Sageman for making the al Qaeda dataset available.

References

[1] F.R. Bach and M.I. Jordan. Finding clusters in Independent Component Analysis. TechnicalReport UCB/CSD-02-1209, Computer Science Division, University of California, Berkeley,2002.

[2] W.E. Baker and R.B. Faulkner. The social organization of conspiracy: Illegal networks in the heavy electrical equipment industry. American Sociological Review, 58:837–860, December 1993.

[3] J. Corbin. Al-Qaeda: In Search of the Terror Network that Threatens the World. Thunder’s Mouth Press, 2002.

[4] G.H. Golub and C.F. van Loan. Matrix Computations. Johns Hopkins University Press, 3rd edition, 1996.

[5] United States Government. Final Report of the National Commission on Terrorist Attacks Upon the United States. 2004.

[6] R. Gunaratna. Inside al Qaeda. Berkley Publishing Group, 3rd edition, 2003.

[7] A. Hyvarinen. Survey on independent component analysis. Neural Computing Surveys, 2:94–128, 1999.

[8] A. Hyvarinen and E. Oja. Independent component analysis: Algorithms and applications. Neural Networks, 13(4–5):411–430, 2000.

[9] D. Jensen and J. Neville. Data mining in social networks. Invited presentation to the National Academy of Sciences Workshop on Dynamic Social Network Modeling and Analysis, November 2003.

[10] R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. In Proceedings of the 41st Foundations of Computer Science (FOCS ’00), page 367, 2000.

[11] T.G. Kolda and D.P. O’Leary. A semi-discrete matrix decomposition for latent semantic indexing in information retrieval. ACM Transactions on Information Systems, 16:322–346, 1998.

[12] V.E. Krebs. Mapping networks of terrorist cells. Connections, 24(3):43–52, 2002.

[13] S. McConnell and D.B. Skillicorn. Semidiscrete decomposition: A bump hunting technique. In Australasian Data Mining Workshop, pages 75–82, December 2002.

[14] A.Y. Ng, A.X. Zheng, and M.I. Jordan. Link analysis, eigenvectors and stability. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-01), pages 903–910, 2001.

[15] D.P. O’Leary and S. Peleg. Digital image compression by outer product expansion. IEEE Transactions on Communications, 31:441–444, 1983.

[16] M. Sageman. Understanding Terror Networks. University of Pennsylvania Press, 2004.

[17] R.H. Shultz and A. Vogt. The real intelligence failure on 9/11 and the case for a doctrine of striking first. In R.D. Howard and R.L. Sawyer, editors, Terrorism and Counterterrorism: Understanding the New Security Environment, pages 405–428. McGraw-Hill Dushkin, 2004.

[18] G.W. Stewart. On the early history of the Singular Value Decomposition. Technical Report TR-2855, University of Maryland, Department of Computer Science, March 1992.

[19] K.M. van Meter. Terrorists/liberators: Researching and dealing with adversary social networks. Connections, 24(3):66–78, 2002.
