Uncovering the Dark Web: A Case Study of Jihad on the Web

Hsinchun Chen
Artificial Intelligence Lab, Department of Management Information Systems, The University of Arizona, Tucson, AZ 85721, USA. E-mail: [email protected]

Wingyan Chung
Department of Operations and Management Information Systems, Leavey School of Business, Santa Clara University, Santa Clara, CA 95053, USA. E-mail: [email protected]

Jialun Qin
Management Department, College of Management, University of Massachusetts Lowell, Lowell, MA 01854, USA. E-mail: [email protected]

Edna Reid
Department of Library Science, Clarion University, Clarion, PA 16214, USA. E-mail: [email protected]

Marc Sageman
The Solomon Asch Center for Study of Ethnopolitical Conflict, University of Pennsylvania, Philadelphia, PA 19104, USA. E-mail: [email protected]

Gabriel Weimann
Department of Communication, University of Haifa, Haifa 31905, Israel. E-mail: [email protected]

While the Web has become a worldwide platform for communication, terrorists share their ideology and communicate with members on the "Dark Web," the reverse side of the Web used by terrorists. Currently, information overload and the difficulty of obtaining a comprehensive picture of terrorist activities hinder effective and efficient analysis of terrorist information on the Web. To improve understanding of terrorist activities, we have developed a novel methodology for collecting and analyzing Dark Web information. The methodology incorporates information collection, analysis, and visualization techniques, and exploits various Web information sources. We applied it to collecting and analyzing information from 39 Jihad Web sites and developed visualizations of their site contents, relationships, and activity levels. An expert evaluation showed that the methodology is very useful and promising, having a high potential to assist in investigation and understanding of terrorist activities by producing results that could potentially help guide both policymaking and intelligence research.

Received September 20, 2006; revised June 29, 2007; accepted January 4, 2008

© 2008 ASIS&T. Published online 7 April 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.20838

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 59(8):1347–1359, 2008

1. Introduction

The Internet has evolved to become a global platform through which anyone can conveniently disseminate, share, and communicate ideas. Despite its many advantages, however, misuse of the Internet has become ever more serious. Terrorist organizations, extremist groups, hate groups, and racial supremacy groups are using the Web to promote their ideology, to facilitate internal communications, to attack their enemies, and to conduct criminal activities. Warnings have been made that terrorists may launch attacks on such critical infrastructure as major e-commerce sites and governmental networks (Gellman, 2002). Insurgents in Iraq have posted Web messages asking for munitions, financial support, and volunteers (Blakemore, 2004). It therefore has become important to obtain from the Web intelligence that permits better understanding and analysis of terrorist and extremist groups. We define this reverse side of the Web as a "Dark Web," the portion of the World Wide Web used to help achieve the sinister objectives of terrorists and extremists.

Currently, intelligence from the Dark Web is scattered in diverse information repositories through which investigators need to browse manually to be aware of their content. Much


of the information stored in search engine databases could be properly collected and analyzed for transformation into intelligence and knowledge that would enhance understanding of terrorists' activities. However, search engines often overwhelm users by producing laundry lists of irrelevant results and creating information overload problems. Related but unfocused information makes it difficult to obtain a comprehensive description of a terrorist group or a terrorism topic. Many Web resources contain information about terrorism, but a relatively small proportion comes from terrorist groups themselves, and data on the Web often are not persistent and may be misleading. Many terrorist Web sites do not use English, so investigators who do not read the languages those sites use may be unable to understand a site's content.

In this article, we have addressed the aforementioned problems by proposing and implementing a semiautomated methodology for collecting and analyzing Dark Web information. Leveraging human preciseness and machine efficiency, the methodology consists of various steps including collection, filtering, analysis, and visualization of Dark Web information. We used this comprehensive methodology to collect and analyze data from 39 Arabic terrorist Web sites and conducted an evaluation of the results. This research aimed to study to what extent the methodology can assist terrorism analysts in collecting and analyzing Dark Web information. From a broader perspective, this research contributes to the development of the new science of Intelligence and Security Informatics (ISI), the study of the use and development of advanced information technologies, systems, algorithms, and databases for national security-related applications through an integrated technological, organizational, and policy-based approach (Chen, 2005; Strickland & Hunt, 2005). We believe that many existing computer and information systems techniques need to be reexamined and adapted for this unique domain to create new insights and innovations.

The rest of this paper is structured as follows. The second section presents a review of terrorists' use of information technologies to facilitate terrorism, information services for studying terrorism, and advanced techniques for collecting and analyzing terrorism information. The third section describes a methodology for collecting and analyzing Dark Web information. The fourth section illustrates the use of the methodology in a case study of Jihad on the Web (where "Jihad" is an Islamic term referring to a holy war waged against enemies) and discusses the evaluation results. The last section concludes the study and discusses future directions.

2. Literature Review

2.1. Terrorists' Use of the Web

Recent studies have shown how terrorists use the Web to facilitate their activities. Tsfati and Weimann used the names of terrorist organizations to search six search engines and found 16 relevant sites in 1998 and 29 such sites in 2002 (Tsfati & Weimann, 2002). Their analysis of site content revealed heavy use of the Web by terrorist organizations to share ideology, to provide news, and to justify use of violence. Relying on open source information (e.g., court testimony, reports, Web sites), researchers at the Institute for Security Technology Studies identified five categories of terrorist use of the Web (Technical Analysis Group, 2004): propaganda (to disseminate radical messages); recruitment and training (to encourage people to join the Jihad and get online training); fundraising (to transfer funds, conduct credit card fraud and other money laundering activities); communications (to provide instruction, resources, and support via email, digital photographs, and chat sessions); and targeting (to conduct online surveillance and identify vulnerabilities of potential targets such as airports). Among these, using the Web as a propaganda tool has been widely observed.

Identified by the U.S. Government as a terrorist site, Alneda.com called itself the "Center for Islamic Studies and Research," a bogus name, and provided information for Al Qaeda (Thomas, 2003). To group members (insiders), terrorists use the Web to share motivational stories and descriptions of operations. To mass media and non-members (outsiders), they provide analysis and commentaries of recent events on their Web sites. For example, Azzam.com urged Muslims to travel to Pakistan and Afghanistan to fight the "Jewish-backed American Crusaders." Qassam.net appealed for donations to purchase AK-47 rifles (Kelley, 2002). Al Qaeda and some humanitarian relief agencies used the same bank accounts via www.explizit-islam.de (Thomas, 2003).

Terrorists also share ideologies on the Web that provide religious commentaries to legitimize their actions. Based on a study of 172 members participating in the global Salafi Jihad, Sageman concluded that the Internet has created a concrete bond between individuals and a virtual religious community (Sageman, 2004). His study reveals that the Web appeals to isolated individuals by easing loneliness through connections to people sharing some commonality. Such a virtual community offers a number of advantages to terrorists. It is no longer tied to any nation, fostering a priority of fighting against the "far enemy" (e.g., the United States) rather than the "near enemy." Internet chat rooms tend to encourage extreme, abstract, but simplistic solutions, thus attracting most potential Jihad recruits who are not Islamic scholars. The anonymity of Internet cafés also protects the identity of terrorists. However, Sageman does not consider the Internet to be a direct contact with Jihad, because devotion to Jihad must be fostered by an intense period of face-to-face interaction. In addition, existing studies of terrorists' use of the Web mostly rely on a manual approach to analyze voluminous data. Such an approach does not scale up to the rapid growth of the Web and the frequent change of terrorists' identities on the Web.

2.2. Information Services for Studying Terrorism

Despite the public nature of the Web, terrorists often try to prevent authorities from tracing their Web addresses and activities, which has prompted several information services to monitor the Web sites of militant Islamic groups and to provide access to translated versions of information posted there. The Jihad and Terrorism Project was developed by the Middle East Media Research Institute to bridge the language gap between the West and the Middle East by providing timely translations of Arabic, Farsi, and Hebrew documents (Middle East Media Research Institute, 2004). The Project for the Research of Islamist Movements (www.e-prism.org) studies radical Islam and Islamist movements, focusing primarily on Arabic sources. These projects provide access to an array of information such as translated news stories, transcripts, video clips, and training documents produced by terrorists but fall short of supporting analysis and visualization of terrorist data from the Dark Web (Project for the Research of Islamist Movements, 2004).

2.3. Advanced Information Technologies for Combating Terrorism

Since the 9/11 attacks, there has been increased interest in using information technologies to counter terrorism. A study conducted by the U.S. Defense Advanced Research Projects Agency shows that their collaboration, modeling, and analysis tools speeded analysis (Popp, Armour, Senator, & Numrych, 2004), but these tools were not tailored to collecting and analyzing Web information. Although new approaches to terrorist network analysis have been called for (Carley, Lee, & Krackhardt, 2001), existing efforts have remained mostly small scale; they have used manual analysis of a specific terrorist organization and did not include resources generated by terrorists in their native languages. For instance, Krebs manually collected data from English news releases after the 9/11 attacks and studied the network surrounding the 19 hijackers (Krebs, 2001). Although automated social network analysis techniques have been proposed to analyze and portray criminal networks, it is not clear whether the techniques are applicable to the mostly unstructured data in terrorist Web sites that contain textual and multimedia data (Xu & Chen, 2005). Their use of structured data in a police department database also does not help understand terrorist Web sites. Other advanced information technologies having potential to help analyze terrorist data on the Web include information visualization and Web mining.

Information visualization technologies have been used in many domains (Zhu & Chen, 2005) such as criminal analysis (Chung, Chen, Chaboya, O'Toole, & Atabakhsh, 2005) and business stakeholder analysis (Chung, 2007). For example, multidimensional scaling (MDS) algorithms consist of a family of techniques that portray a data structure in a spatial fashion, where the coordinates of data points are calculated by a dimensionality reduction procedure (Young, 1987). MDS has been used in many different applications. Chung and his colleagues developed a new browsing method based on MDS to depict the competitive landscape of businesses on the Web (Chung, Chen, & Nunamaker, 2005). He and Hui applied MDS to displaying author cluster maps in their author co-citation analysis (He & Hui, 2002). Eom and Farris applied MDS to author co-citation in decision support systems (DSS) literature over 1971 through 1990 in order to find contributing fields to DSS (Eom & Farris, 1996). Kealy applied MDS to studying changes in knowledge maps of groups over time to determine the influence of a computer-based collaborative learning environment on conceptual understanding (Kealy, 2001). Although much has been done in different domains to visualize relationships of objects using MDS, no attempts to apply it to discovering terrorists' use of the Web have been found.

Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services (Chen & Chau, 2004; Etzioni, 1996). Chen and his colleagues (Chen, Fan, Chau, & Zeng, 2001) showed that the approach of integrating meta-searching with textual clustering tools achieved high precision in searching the Web. Web page classification, a process of automatically assigning Web pages into predefined categories, can be used to assign pages into meaningful classes (Mladenic, 1998). Web page clustering, a process of identifying naturally occurring subgroups among a set of Web pages, can be used to discover trends and patterns within a large number of pages (Chen, Schuffels, & Orwig, 1996). Although a number of Web mining technologies exist (e.g., Chen & Chau, 2004; Last, Markov, & Kandel, 2006), there has not yet been a comprehensive methodology to address problems of collecting and analyzing terrorist data on the Web. Unfortunately, existing frameworks using data and text mining techniques (e.g., Nasukawa & Nagano, 2001; Trybula, 1999) do not address issues specific to the Dark Web.

To our knowledge, few studies have used advanced Web and data mining technologies to collect and analyze terrorist information on the Web, though these technologies have been widely applied in such other domains as business and scientific research (e.g., Chung et al., 2004; Marshall, McDonald, Chen, & Chung, 2004). New approaches to collecting and analyzing terrorist information on the Web are needed.

3. A Methodology for Collecting and Analyzing Dark Web Information

3.1. The Methodology

To address threats from the wide range of information sources that terrorists and extremists use to spread their ideas and to conduct destructive activities, we have proposed a semiautomated methodology integrating various information collection and analysis techniques and human domain knowledge. Figure 1 shows the methodology, which aims to assist human investigators effectively in obtaining Dark Web intelligence using information sources, collection methods, filtering, and analysis.

Information sources consist of a wide range of providers of terrorist or terrorism information on the Web. Some of these are readily accessible (e.g., search engines), while others, such as terrorism incident databases and Web sites developed and maintained by terrorists and their supporters, can be reached only with the help of domain experts.

[Figure 1: flow diagram. Labels recoverable from the figure: Information Sources (the Web and the Dark Web, covering hate groups, racial supremacy, suicidal attackers, activists/extremists, anti-government, and others; domestic terrorism; international terrorism; terrorist group Web sites; search engines; propaganda Web sites; publications on terrorism). Collection Methods, Section 4.1.1 (domain spidering, back-link search, group/personal profile search, meta-searching, downloading from Internet archives and forums; searching, browsing, spidering). Filtering, Section 4.1.2 (domain knowledge, linguistic knowledge). Analysis, Section 4.1.3 (indexing, extraction, classification, clustering, visualization; verification, group profiling, showing relationships, analyzing dynamics).]

FIG. 1. A methodology for collecting and analyzing Dark Web information.

Collection methods make possible automatic searching, browsing, and harvesting of information from identified sources. Domain spidering starts with a set of relevant seed URLs and relies on an automatic Web page collection program, often called a "spider" or "crawler," to harvest Web pages linked to the seed URLs. Back-link search, supported by some search engines such as Google (www.google.com) and AltaVista (www.altavista.com, acquired by Overture, which was in turn acquired by Yahoo! in 2003), allows searching of Web pages that have hyperlinks pointing to a target Web domain or page. It helps investigators trace activities of terrorist supporters and sympathizers, whose Web pages often reference terrorist sites (e.g., glorify martyrs' actions, show a concurrence of terrorist attacks). Group/personal profile search, exemplified by major Web portals such as Yahoo! (members.yahoo.com) and MSN (groups.msn.com), reveals the profiles of groups or individuals who share the same interests. Terrorists and their supporters may put hot links in their profiles, which allow investigators to discover hidden linkages. Meta-searching uses related keywords as input to query multiple search engines, from which investigators or automated programs can collate top-ranked results and filter out duplicates to obtain highly pertinent URLs of terrorist Web sites. With careful formulation of search terms and appropriate linguistic knowledge, they can obtain highly relevant results. For example, searching the Arabic name of Usama Bin Laden in multiple search engines returns mixed results about terrorist news articles and terrorist Web sites, while augmenting "Usama Bin Laden" with the keyword "Sheikh" (the head of a tribe, or leader, in Arabic), which is frequently used by Al Qaeda to refer to Bin Laden, can give more relevant terrorist and supporter Web sites. Downloading from Internet archives and forums exploits the temporal dimension of Web information. For instance, the Internet Archive (www.archive.org) offers access to historical snapshots of Web sites. Usenet discussion forums provide a wealth of textual communication that can be mined for hidden patterns over time.
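The domain spidering step can be illustrated with a minimal sketch. It is not the authors' crawler: the link graph, the URLs, and the names `LINK_GRAPH` and `spider` are hypothetical, and an in-memory dictionary stands in for fetching real pages. The sketch only shows the breadth-first idea of starting from seed URLs and following links up to a depth limit.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for fetched Web pages:
# each key is a URL and each value lists the URLs that page links to.
LINK_GRAPH = {
    "seed.example/": ["seed.example/news", "other.example/"],
    "seed.example/news": ["seed.example/", "seed.example/archive"],
    "seed.example/archive": ["far.example/"],
    "other.example/": [],
    "far.example/": ["farther.example/"],
}


def spider(seed_urls, get_links, max_depth):
    """Breadth-first crawl from the seeds; returns {url: depth reached}."""
    depth_of = {url: 0 for url in seed_urls}
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if depth_of[url] >= max_depth:
            continue  # pages at the depth limit are kept but not expanded
        for target in get_links(url):
            if target not in depth_of:  # skip pages already scheduled
                depth_of[target] = depth_of[url] + 1
                queue.append(target)
    return depth_of


pages = spider(["seed.example/"], lambda u: LINK_GRAPH.get(u, []), max_depth=2)
```

With a depth limit of 2, the crawl reaches the archive page but never follows its outgoing link, so `far.example/` is not collected. A real spider would replace the lambda with a function that downloads a page and extracts its hyperlinks.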

Filtering involves sifting through collected information and removing irrelevant results, but performing this task requires domain knowledge and linguistic knowledge. Domain knowledge refers to knowledge about terrorist groups, their relationships with other terrorist and supporter groups, their presence on and usage of the Web, as well as their histories, activities, and missions. Linguistic knowledge deals with terms, slogans, and other textual and symbolic clues in the native languages of the terrorist groups. Filtering can be automatic or manual, depending on requirements for efficiency of process and precision of the results. Typically, manual filtering achieves high precision, but it is less efficient and relies on domain experts who have had years of experience in the field. Automatic filtering is very efficient as it often uses computers and machine learning to process large amounts of data, but the results are less precise. Investigators can obtain high-quality data for analysis from filtered repositories.
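One way to picture the trade-off between automatic and manual filtering is a simple triage, sketched below under stated assumptions. The keyword list, the thresholds `keep_at` and `drop_below`, and the function name `triage` are illustrative inventions, not part of the methodology as published: clear-cut pages are handled automatically, and borderline pages are left for a human expert.

```python
def triage(pages, keywords, keep_at=3, drop_below=1):
    """Split pages into kept, dropped, and needs-manual-review lists.

    pages maps a URL to its text; the score is the number of keyword
    occurrences. Thresholds are arbitrary values chosen for illustration.
    """
    kept, dropped, review = [], [], []
    for url, text in pages.items():
        words = text.lower().split()
        score = sum(words.count(k) for k in keywords)
        if score >= keep_at:
            kept.append(url)        # confidently relevant
        elif score < drop_below:
            dropped.append(url)     # confidently irrelevant
        else:
            review.append(url)      # borderline: leave to a domain expert
    return kept, dropped, review


sample = {
    "a.example": "slogan slogan leader slogan",
    "b.example": "weather report for today",
    "c.example": "interview with a leader",
}
kept, dropped, review = triage(sample, ["slogan", "leader"])
```

Raising `keep_at` or lowering `drop_below` sends more pages to manual review, trading efficiency for precision in the way the paragraph above describes.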

Analysis provides insights into data and helps investigators identify trends and verify conjectures. Several functions support these analytical tasks. Indexing relates textual terms to individual Web pages, thereby supporting precise searching of the pages. Extraction identifies meaningful entities such as terrorist names, frequently used slogans, and suspicious terms. Classification finds common properties among entities and assigns them to predefined categories to help investigators predict trends of terrorist activities. Clustering organizes entities into naturally occurring groups and helps to identify similar terrorist groups and their supporters. Visualization presents voluminous data in a format perceivable by human eyes, so investigators can picture the relationships within a network organization of terrorist groups and can recognize their underlying structure.
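Of these functions, indexing is the easiest to show concretely. The following is a minimal sketch of an inverted index, assuming whitespace tokenization and made-up page names; the function names `build_index` and `search` are hypothetical and the paper does not describe its indexing implementation.

```python
from collections import defaultdict


def build_index(pages):
    """Map each term to the set of pages (URLs) whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Return the pages that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {"p1": "forum news archive", "p2": "forum archive", "p3": "news"}
idx = build_index(docs)
hits = search(idx, "forum archive")
```

Because each term points directly at the pages containing it, a multi-term query reduces to intersecting a few sets rather than rescanning every page.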

3.2. Discussion of the Methodology

Although the Internet has been publicly available since the 1990s, the Dark Web emerged only in recent years. A lack of useful methodology designed for Dark Web data collection and analysis has limited the capability to fight against terrorism. As discussed above, the proposed methodology has incorporated various data and Web mining technologies while still allowing human domain knowledge to guide their application. Its semiautomated nature combines machine efficiency with the advantages of human precision, a useful complement to computers that usually fail to detect deception and ambiguity on the Dark Web. Its coverage of wide varieties of data sources and techniques ensures a comprehensive Dark Web data collection, a challenge often faced by terrorism and intelligence analysts. Therefore, the methodology and its integration and application of data and Web mining technologies to Dark Web analysis are novel contributions to ISI research.

4. Jihad on the Web: A Case Study

To demonstrate the value and usability of our methodology, we have applied it to collecting and analyzing the use of the Web for Jihad, an Islamic term referring to a holy war waged against enemies as a religious duty. Believers contend that those who die in Jihad become martyrs and are guaranteed a place in paradise. In recent decades, the concept of Jihad has been used as an ideological weapon to combat Western influences and secular governments and to establish an ideal Islamic society (Encyclopedia Britannica Online, 2007). Jihad supporters are closely related to terrorist groups while maintaining anonymity using the Web. For example, prior to the 9/11 attacks, Al-Qaeda members sent each other thousands of messages in a password-protected section of an extreme Islamic Web site (Anti-Defamation League, 2002). Terrorist groups such as Hamas, Hizbollah, and Palestinian Islamic Jihad also use Web sites as propaganda tools. We describe the steps of applying the methodology as follows (see Figure 1). The data described below were collected in 2004.

4.1. Application of the Methodology

4.1.1. Collection. To collect data, we first identified four suspicious URLs through Web searching, referencing published terrorism reports, and performing personal profile searches on Yahoo. (For example, we searched "hizbollah" in Google, where we found its URL among the top-ranked results.) These URLs are Palestinian Islamic Jihad (PIJ; www.qudsway.com), Hizbollah (www.hizbollah.org), the military wing of Hamas (www.ezzedeen.net), and an Arabic Web site with a pro-Jihad forum (www.al-imam.net). A 2003 U.S. Department of State report confirmed PIJ, Hizbollah, and Hamas to be terrorist or terrorist-affiliated groups (Department of State, 2003). Though Al-Imam.net is not classified as a terrorist organization, it contains pro-Jihad forums in which messages and links to terrorist Web sites are posted. We then used the back-link search function of Google to obtain several hundred URLs that point to the four suspicious URLs. As Dark Web information can be scattered in many different sources and can change quickly over time, the several methods used to identify the four initial URLs enabled us to cover a broader scope and more timely content than relying only on published reports (e.g., the U.S. Department of State's annual report). While different initial URLs and different times of data collection could affect the content of the data collected, we believe that the four chosen URLs are representative of the Dark Web. It would be an interesting future direction to study the extent to which data collection affects the quality of analysis results.

4.1.2. Filtering. We conducted two rounds of filtering. First, we manually filtered out unrelated sites, such as news or governmental Web sites that report or discuss only terrorist activities, religious Web sites with no reference to Jihad or violence, and political Web sites where there is no mention or approval of terrorist activities. We retained Web sites of terrorist organizations, those of terrorist leaders, and those that praise terrorists or their actions. Forty-six sites remained after this round of filtering.

Second, with the help of a native Arabic speaker (who is not a terrorism expert), we manually added 14 terrorist and supporter sites identified by querying Google with the keywords (in Arabic) that we had found in the terrorist and supporter sites. Such keywords included the leaders' and organizations' names in Arabic (mojahedin iran, markaz dawa, etc.). To limit the scope of analysis, we considered only the top 50 results returned from the search engine in each query. In addition, we manually removed 21 sites from the set of all sites obtained, based on their relevance to the domain. This round of filtering and refining resulted in 39 Arabic Web sites: 24 terrorist sites and 15 supporter sites.
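As a quick consistency check on the counts reported for the two rounds (a restatement of the figures above, not additional data):

\[
\underbrace{46}_{\text{after round 1}} + \underbrace{14}_{\text{added}} - \underbrace{21}_{\text{removed}} = 39 = \underbrace{24}_{\text{terrorist sites}} + \underbrace{15}_{\text{supporter sites}}
\]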

4.1.3. Analysis. We performed clustering, classification, and visualization on the 94,326 Web pages collected by crawling the 39 terrorist and supporter sites using an exhaustive breadth-first search spidering program (with a maximum depth of 10 levels). The first analysis task we performed was clustering, in which we considered as input the 46 Web sites identified from the first round of filtering (see paragraph 1 of Section 4.1.2). The clustering involves calculating a similarity between each pair of Web sites in our collection to uncover hidden Web communities. We define similarity to be a real-valued multivariable function of the number of hyperlinks in one Web site (A) pointing to another Web site (B), and the number of hyperlinks in the latter site (B) pointing to the former site (A). In addition, a hyperlink is weighted according to how deep it appears in the Web site hierarchy, with the weight decreasing as depth increases. For instance, a hyperlink appearing on the homepage of a Web site is given a higher weight than hyperlinks appearing at a deeper level. Specifically, the similarity between Web sites A and B is calculated as follows:

    $$\mathrm{Similarity}(A, B) = \sum_{\text{all links } L \text{ between } A \text{ and } B} \frac{1}{1 + lv(L)}$$

    where lv(L) is the level of link L in the Web site hierarchy, with the homepage as level 0 and the level increased by 1 with each level down in the hierarchy. Using these heuristics, a computer program automatically extracted hyperlinks on Web pages and calculated the similarities between sites.
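    As an illustration, the similarity measure can be sketched in a few lines of Python. This is a minimal sketch, not the authors' program: the input format (a list of `(source, target, level)` triples for hyperlinks already extracted from crawled pages) and the names `link_weight` and `site_similarities` are assumptions made for the example.

```python
from collections import defaultdict


def link_weight(level):
    """Weight of one hyperlink found at the given depth (homepage = level 0)."""
    return 1.0 / (1.0 + level)


def site_similarities(links):
    """Sum link weights over all inter-site links, in both directions.

    `links` is an iterable of (source_site, target_site, level) triples;
    this input format is an assumption of the sketch.
    Returns a dict keyed by the sorted pair of site names.
    """
    sims = defaultdict(float)
    for source, target, level in links:
        if source == target:
            continue  # links inside one site say nothing about site pairs
        pair = tuple(sorted((source, target)))
        sims[pair] += link_weight(level)
    return dict(sims)


# A homepage link A -> B (level 0) plus a link B -> A one level down (level 1).
example = [("A", "B", 0), ("B", "A", 1)]
print(site_similarities(example))  # {('A', 'B'): 1.5}
```

    Storing each pair under a sorted key keeps the measure symmetric, which matches the formula's sum over all links between A and B regardless of direction.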

    In the second analysis task, we classified the sites by their affiliations with terrorist groups, ideologies, and religions, and by their Web site attributes. Our native Arabic speaker manually identified the affiliations of all the Web sites according to their site content. Although we relied on the Arabic speaker, the components of the methodology are generic enough to be applicable to other domains. The choice of this Arabic speaker (who, again, is not a terrorism expert) also would not affect the results. Table 1 shows the details of the Web sites and their affiliations.

    In addition to using affiliations, we classified the sites by how terrorists and their supporters use the Web to facilitate their activities. From our literature review, we identified six types of terrorist use of the Web and 27 unique Web site attributes. Table 2 presents these attributes categorized under the six types. Following this coding scheme, the Arabic speaker manually read through all the subject Web pages to record terrorist uses of the Web. Similar to the coding used in studying the openness of government Web sites (La Porte, Jong, & Demchak, 1999), our coding involved determining whether an attribute existed on the Web sites (i.e., binary scoring). Manual coding of each Web site required 45 minutes to 1 hour.

    To reveal patterns of terrorist Web site existence and the degree of a site's activities, we performed two types of visualization in the third analysis task: multidimensional scaling and snowflake visualizations.

    Multidimensional scaling visualization provided a high-level picture of all the terrorist groups and their relationships. We used multidimensional scaling (MDS) to transform a high-dimensional similarity matrix into a set of two-dimensional coordinates (Young, 1987). While other visualization techniques might have been applicable, we chose MDS because it suits the current data structure and provides a vivid picture summarizing terrorist groups' relationships. Figure 2 shows these relationships: the sites appear as nodes, and lines connect pairs of sites that have at least one hyperlink pointing from one site to the other. Using the similarity matrix as input, the MDS algorithm calculated the coordinates of each site and placed the sites in a two-dimensional space where proximity reflects similarity. Upon closer examination of the figure, seven clusters of sites emerge. (The numbers in parentheses refer to the sites in Table 1. The sites listed by URL were filtered out in the second round of filtering but appeared in the collection after the first round.)

    (1) Hizballah Cluster (# 7, 11, 12, hizbollah.org, and intiqad.org) contains the Web site of the Hizballah group (www.hizbollah.org) and its affiliated sites such as Hizbollah E-magazine (www.intiqad.org), Hizbollah Support Association (#11), and the site of Sayyed Hassan Nasrollah (#12), a major leader of Hizbollah.

    (2) Palestinian Cluster (# 4, 5, 6, 9, 13, 14, 15, 36, and h4palestine.com) includes militant groups fighting against Israel (e.g., Al-Aqsa Martyrs Brigade, Hamas). There are links between sites of the same group (e.g., # 4 and 14) and links between sites of different groups (e.g., # 9 and 6).

    (3) Al Qaeda Cluster (# 26, 28, 31, 35, 37, and sahwah.com) includes Salafi groups' supporters' Web sites that often are linked to each other in their "Other friendly Web sites" section. They use their Web sites heavily to propagate their ideology. For example, Al-ansar.biz posted a video of the beheading of Nicholas Berg, one of the first civilians killed by terrorists (Newman, 2004). Alsakifah.org provides an online discussion forum.

    (4) Caucasian Cluster (# 10, 34, kavkazcenter.com, kavkaz.tv, kavkazcenter.net, and kavkazcenter.info) consists of Web sites that link to Chechen rebels and provide news updates from Chechen areas. For example, Qoqaz.com has documented operations against the Russian military.

    (5) Jihad Supporters (# 29, 30, 32, 33, clearguidance.blogspot.com, and ummanews.com) consist of Web sites providing news and general information on the global Jihad movement. These sites rarely are linked to each other and often play a propaganda role that targets outsiders.

    (6) Hizb-Ut-Tahrir (# 27, hizb-ut-tahrir.org, expliciet.nl, khilafah.com, and hilafet.com) contains a non-terrorist political group, Hizb-Ut-Tahrir, dedicated to the restoration of Islamic law and Khilafah (global leadership of Muslims). It has a presence in many Arab countries (e.g., Lebanon, Jordan) and some European countries. For instance, Expliciet.nl is a Dutch Web site based in the Netherlands.

    (7) Tanzeem-e-Islami Cluster (tanzeem.org) consists of a single site representing the Pakistani Tanzeem-e-Islami party with no clear ties to terrorism.
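    The MDS step behind the map from which these clusters were read can be sketched as follows. The article does not state which MDS variant was used or how similarities were converted into dissimilarities, so this sketch assumes classical (Torgerson) MDS and the mapping d = 1 / (1 + s); both choices, and all function names, are assumptions. It uses only the Python standard library and finds the leading eigenvectors by power iteration.

```python
import math


def similarity_to_distance(sim):
    """Map a symmetric similarity matrix to dissimilarities (assumed mapping)."""
    n = len(sim)
    return [[0.0 if i == j else 1.0 / (1.0 + sim[i][j]) for j in range(n)]
            for i in range(n)]


def _top_eigenpair(mat, iterations=500):
    """Power iteration: eigenvalue of largest magnitude and its unit eigenvector."""
    n = len(mat)
    vec = [float(i + 1) for i in range(n)]  # fixed start keeps runs reproducible
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, [0.0] * n
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    value = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n))
                for i in range(n))
    return value, vec


def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (row means equal column means).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        value, vec = _top_eigenpair(b)
        scale = math.sqrt(max(value, 0.0))  # negative eigenvalues contribute nothing
        for i in range(n):
            coords[i][d] = vec[i] * scale
        # Deflate so the next pass finds the next eigenpair.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)]
             for i in range(n)]
    return coords


# Toy input: sites 0 and 1 are strongly linked, site 2 is linked to neither.
sim = [[0.0, 3.0, 0.0], [3.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for point in classical_mds(similarity_to_distance(sim)):
    print(point)
```

    On real link data the dissimilarities are generally not exactly Euclidean, so the two-dimensional layout is an approximation in which strongly linked sites land close together.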

    Snowflake visualization supports analysis of different dimensions (or categories) of activities of a Web site cluster. It originates from the star plot, which has been widely used to display multivariate data (Chambers, Cleveland, Kleiner, & Tukey, 1983). Each snowflake in Figure 3 represents a terrorist site cluster: the figure shows five snowflake diagrams, each representing the degree of activity of terrorist/supporter groups in one of the five terrorist clusters (Clusters 1-5) described above. (Clusters 6 and 7 are not included because they do not contain terrorist sites.) The six sides of a snowflake represent the six dimensions of terrorist use of the Web, as shown in Table 2 and explained above. Each of these six dimensions is a normalized scale between 0 and 1 (the activity index), showing the degree of activity on that dimension.


    TABLE 1. Analysis of Jihad terrorist groups and their supporters' sites.

    No | Name | URL (a) | Description (b) | Terrorist group (c) | Religion

    Terrorist Groups' Web Sites (total: 24)
    1 | Special Force | www.specialforce.net | Provides computer game replicating the fighting scenes between Lebanese resistance and Israeli occupiers | Hizballah | Shia Muslim
    2 | Palestine Info in Urdu | palestine-info-urdu.com | Hamas news Web site in Urdu | Hamas | Sunni Muslim
    3 | Al-Manar | web.manartv.org | The Web site of Al-Manar, the TV channel of Lebanese Hizballah | Hizballah | Shia Muslim
    4 | Abrarway | www.abrarway.com | News Web site of Islamic Jihad of Palestine Guerrilla group | Palestinian Islamic Jihad | Sunni Muslim
    5 | Islamic Jihad Mail | www.jimail.com | News Web site of Islamic Jihad of Palestine Guerrilla group | Palestinian Islamic Jihad | Sunni Muslim
    6 | Ezz-al-dine Al-Qassam | www.ezzedeen.net | A general portal of Izz-Edeen Al-Qasam | Hamas | Sunni Muslim
    7 | Hizbollah | www.hizbollah.tv | The official Web site of Hizballah Organization | Hizballah | Shia Muslim
    8 | Info Palestina | www.infopalestina.com | Hamas information and news Web site in Malay | Hamas | Sunni Muslim
    9 | Kataeb Al Aqsa | www.kataebalaqsa.com | The official Web site of Al Aqsa Martyrs Brigades | Al-Aqsa Martyrs Brigade | Secular
    10 | Kavkaz | www.kavkaz.org.uk | The news Web site of Chechen guerrilla fighters | Islamic International Brigade, Special Purpose Islamic Regiment, Riyadus-Salikhin Reconnaissance and Sabotage Battalion of Chechen Martyrs | Sunni Muslim
    11 | Moqawama | www.moqawama.tv | Web site of the Hizballah's support group | Hizballah | Shia Muslim
    12 | Nasrollah | www.nasrollah.org | Hizballah leader's site (Sheikh Hassan Nasrollah) | Hizballah | Shia Muslim
    13 | Alshohada | www.b-alshohda.com | Web site of Hamas and Islamic Jihad dedicated to martyrs | Hamas, Palestinian Islamic Jihad | Sunni Muslim
    14 | Quds Way | www.qudsway.com | Provides general news of Islamic Jihad of Palestine | Palestinian Islamic Jihad | Sunni Muslim
    15 | Rantisi | www.rantisi.net | Web site of Abdel Aziz Al Rantisi, a Hamas leader | Hamas | Sunni Muslim
    16 | People's Mojahedin of Iran | www.iran.mojahedin.org | Web site posting statements by the People's Mojahedin Organization | Mujahedin-e Khalq Organization | Secular
    17 | National Council of Resistance of Iran | www.iranncrfac.org | Official Web site of the Foreign Affairs Committee of the National Council of Resistance of Iran | Mujahedin-e Khalq Organization | Secular
    18 | Iranian People's Fadaee Guerrillas | www.siahkal.com | The memorial Web site of the Iranian People's Fadaee Guerrillas | Mujahedin-e Khalq Organization | Secular
    19 | The Organization of Iranian People's Fedaian | www.fadai.org | The Organization of Iranian People's Fedaian (Majority) official Web site | Mujahedin-e Khalq Organization | Secular
    20 | Organization of Iranian People's Fedayee Guerrillas | www.fadaian.org | Organization of Iranian People's Fedayee Guerrillas memorial Web site | Mujahedin-e Khalq Organization | Secular
    21 | The Union of People's Fedaian of Iran | www.etehadefedaian.org | News and information Web site of the Union of People's Fedaian of Iran | Mujahedin-e Khalq Organization | Secular
    22 | Revolutionary People's Liberation Front | www.dhkc.net | Revolutionary People's Liberation Front official Web site. Provides news and statements of the organization | Revolutionary People's Liberation Army/Front | Secular
    23 | DHKC International | www.dhkc.info | Web site of DHKC in Turkish | Revolutionary People's Liberation Army/Front | Secular
    24 | Crusade Begins | jorgevinhedo.sites.uol.com.br | The Brazil-based Web site links to Lashkar-e-Taiba, a terrorist organization based in Pakistan | Lashkar-e Tayyiba | Sunni Muslim

    Supporters' Web Sites (total: 15)
    25 | Al Ansar | www.al-ansar.biz | Provides support to Al Qaeda organization, as well as articles about the Salafi Sunni Ideology | Al Qaeda | Sunni Muslim
    26 | Alokab | www.alokab.com | Provides articles about the Salafi Sunni Ideology and the Jihadist movement | Al Qaeda | Sunni Muslim
    27 | Alsakifah Forum | www.alsakifah.org | Provides educational services and a forum dedicated to the discussion of the Salafi Ideology | Al Qaeda | Sunni Muslim
    28 | Cihad | www.cihad.net | A general Jihad Web site providing information about all Jihad activities around the world | Al Qaeda | Sunni Muslim
    29 | Clear Guidance Forum | www.clearguidance.com | Forum of Jihad supporters | Al Qaeda | Sunni Muslim
    30 | Sheikh Hamid Bin Abdallah Al Ali | www.h-alali.net | Salafi Educational Web site with some Jihad ideas | Al Qaeda | Sunni Muslim
    31 | Jihadunspun | www.jihadunspun.com | Pro-Jihad news Web site | Al Qaeda | Sunni Muslim
    32 | Maktab-Al-Jihad | www.maktab-al-jihad.com | Pro-Jihad news Web site | Al Qaeda | Sunni Muslim
    33 | Qoqaz | www.qoqaz.com | Jihad news from the Caucasus | Islamic International Brigade, Special Purpose Islamic Regiment, Riyadus-Salikhin Reconnaissance and Sabotage Battalion of Chechen Martyrs | Sunni Muslim
    34 | Supporters of Shareeah | www.shareeah.org | A general portal dedicated to the Jihadist movement | Al Qaeda | Sunni Muslim
    35 | Moltaqa | www.almoltaqa.org | Hamas Forum | Hamas | Sunni Muslim
    36 | Saraya | www.saraya.com | Pro-Jihad Web site | Al Qaeda | Sunni Muslim
    37 | Osama Bin Laden | 1osamabinladen.5u.com | A Web site dedicated to Osama Bin Laden | Al Qaeda | Sunni Muslim
    38 | Tawhed | www.tawhed.ws | Pro-Jihad Web site | Al Qaeda | Sunni Muslim
    39 | The Right Word | www.rightword.net | Pro-Al Qaeda Web Portal | Al Qaeda | Sunni Muslim

    (a) Some of the URLs and sites may have been changed at the time of reading due to the rapid change of the Dark Web.
    (b) The descriptions are obtained from the Web sites.
    (c) Descriptions of these terrorist groups appear in the U.S. Department of State Report Pattern of Global Terrorism, 2002.


    TABLE 2. Categories of terrorist use of the Web and Web site attributes.

    Category | Attribute | Description

    Communications | E-mail | Any listed e-mail address or feedback form.
    Communications | Telephone (including Web phone) | Telephone numbers of organization officials.
    Communications | Multimedia tools | Video clips of bombings and other activities. Video, sound recording & game (e.g., leaders' messages and instructions).
    Communications | Online feedback form | Allow the user to give feedback or ask questions to the Web site owners and maintainers.
    Communications | Documentation | Report, book, letter, memo and other resources provided (e.g., in pdf, Word, and Excel formats).

    Fundraising | External aid mentioned | Other groups or governments supporting the organization.
    Fundraising | Fund transfer | Fund transfer methods.
    Fundraising | Donation | Donations under the form of direct bank deposits.
    Fundraising | Charity | Donations to religious welfare organizations associated with terrorist organization.
    Fundraising | Support groups | Suborganizational structures charged with the fundraising program.
    Fundraising | Others | Other attributes belonging to this category.

    Sharing ideology | Mission | The major goals of the organization (e.g., destruction of an enemy state, liberation of occupied territories).
    Sharing ideology | Doctrine | The beliefs of the group (e.g., religious, communist, extreme right).
    Sharing ideology | Justification of the use of violence | Ideology condones the use of violence to accomplish goals (e.g., suicide bombing).
    Sharing ideology | Pinpointing enemies | Classifies others as either enemies or friends (e.g., U.S. is enemy, Taliban regime is friendly).

    Propaganda (insiders) | Slogans | Short phrases with religious or ideological connotations.
    Propaganda (insiders) | Dates | Mentions dates in the history of the terrorist group, such as the date of a major attack.
    Propaganda (insiders) | Martyrs description | Lists the names of members who died in terrorism related operations or descriptions of the circumstances.
    Propaganda (insiders) | Leaders' name(s) | Terrorist group's leader(s) name as claimed by the Web site.
    Propaganda (insiders) | Banner and seal | Banner depicting representative figures, graphical symbols, or seals of the organization.
    Propaganda (insiders) | Narratives of operations and events | Provides narratives of the operations and attacks of the group.
    Propaganda (insiders) | Others | Other attributes belonging to this category.

    Propaganda (outsiders) | Reference to media coverage | For example, the Web site criticizes Western media coverage of events with explicit mention of outlets of events such as CNN, CBS.
    Propaganda (outsiders) | News reporting | Group's own interpretation of events.

    Virtual community | Listserv | Automatic mailing list server that broadcasts to everyone on the list.
    Virtual community | Text chat room | Virtual room where a chat session takes place. Text messaging chat session such as ICQ.
    Virtual community | Message board | Allows members to post and read messages online.
    Virtual community | Web ring | A series of Web sites linked together in a ring that by clicking through all of the sites in the ring the visitor will eventually come back to the originating site.


    [Figure 2: two-dimensional MDS map in which numbered nodes (site numbers from Table 1) are grouped into labeled clusters: 1. Hizballah Cluster; 2. Palestinian Cluster; 3. Al-Qaeda Cluster; 4. Caucasian Cluster; 5. Jihad Supporters; 6. Hizb-Ut-Tahrir Cluster.]

    FIG. 2. Clustering and visualization of terrorist Web sites (The numbers refer to those appearing in Table 1)*.

    The activity index of Cluster c on dimension d was calculated by the following formula:

    $$\text{Activity Index}(c, d) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} w_{i,j}}{m \times n}$$

    where

    $$w_{i,j} = \begin{cases} 1 & \text{if attribute } i \text{ occurs in Web site } j \\ 0 & \text{otherwise} \end{cases}$$

    n = total number of attributes in the specified dimension d; m = total number of Web sites belonging to the specified Cluster c.

    The closer the activity index is to 1, the more active a cluster is on that dimension. This index reveals in what areas the terrorist groups are active and hence provides investigators and analysts with clues about how to devise strategies to combat a group.
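    A short sketch may make the activity index concrete. It assumes the binary coding is stored as a mapping from each site to the set of attributes found on it; that representation, the function name `activity_index`, and the toy site and attribute names are all hypothetical and chosen only for illustration.

```python
def activity_index(coding, cluster_sites, dimension_attributes):
    """Fraction of (attribute, site) pairs in which the attribute was observed.

    coding: dict mapping site -> set of attributes coded as present (assumed format)
    cluster_sites: the m sites belonging to cluster c
    dimension_attributes: the n attributes belonging to dimension d
    """
    m = len(cluster_sites)
    n = len(dimension_attributes)
    if m == 0 or n == 0:
        raise ValueError("cluster and dimension must both be non-empty")
    hits = sum(
        1
        for site in cluster_sites
        for attribute in dimension_attributes
        if attribute in coding.get(site, set())  # w[i][j] = 1 when present
    )
    return hits / (m * n)


# Toy coding for two hypothetical sites and a two-attribute dimension.
coding = {"site1": {"e-mail", "message board"}, "site2": {"e-mail"}}
print(activity_index(coding, ["site1", "site2"], ["e-mail", "telephone"]))  # 0.5
```

    Dividing by m times n normalizes the count so that clusters of different sizes and dimensions with different numbers of attributes share the same 0 to 1 scale.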

    4.2. Results and Discussions

    Our preliminary observations show that the methodology yielded promising results. For example, it identified Web sites affiliated with 10 of the 26 groups classified as Jihad terrorist organizations in the U.S. State Department report on terrorism. Al-Ansar.biz (# 26), the site that posted the beheading video of Nicholas Berg, posted messages from Al Qaeda leaders such as Osama Bin Laden, Ayman Al-Zawahiri, and Al-Zarqawi, praising their attacks on enemies. Another site, Tawhed.com (site 39), posted a poem praising the 9/11 attacks. The rhetoric of the poem commonly appears in many Al Qaeda affiliated Web sites, referring to the Americans as "crusaders." Words like "Sunna" and "Jamah" reflect the branch of Islam to which the Salafi groups belong.

    From the snowflake diagrams (Figure 3), we found that terrorists and supporters use the Web heavily to share ideology and to propagate ideas, especially to their members. For example, the Palestinian cluster (Cluster 2) actively shares its ideology and heavily uses the Web as a propaganda tool for members. The Web sites in this cluster support liberation of Palestine, pinpoint and criticize their enemies, and describe details of operations and rationales supported by Quran verses. In contrast, Jihad supporters (Cluster 5) rarely use the Web for propaganda but share ideology and communicate there. The Hizbollah cluster (Cluster 1) resembles the Palestinian cluster in heavy use of the Web for sharing ideology and insider propaganda. For example, the sites in this cluster glorify martyrs and leaders; they also were used moderately for outsider propaganda and communications. In all five clusters, we found little evidence of using the Web for fundraising or building a virtual community. Probably such uses have gone underground or do not appear on the Web.

    FIG. 3. Snowflake visualization of five terrorist site clusters. Activity index of each cluster on the six dimensions:

    Cluster | Communications | Fundraising | Sharing ideology | Propaganda (insiders) | Propaganda (outsiders) | Virtual community
    Cluster 1: Hizballah Cluster | 0.53 | 0.20 | 0.92 | 0.72 | 0.50 | 0.13
    Cluster 2: Palestinian Cluster | 0.43 | 0.10 | 0.81 | 0.81 | 0.44 | 0.35
    Cluster 3: Al-Qaeda Cluster | 0.52 | 0.12 | 0.85 | 0.30 | 0.30 | 0.32
    Cluster 4: Caucasian Cluster | 0.60 | 0.10 | 0.50 | 0.50 | 0.50 | 0.40
    Cluster 5: Jihad Supporters | 0.40 | 0.05 | 0.50 | 0.21 | 0.38 | 0.20

    4.3. Expert Evaluation and Results

    Based on the above results, we invited a terrorism expert to evaluate the methodology. A senior fellow of the U.S. Institute of Peace in Washington, D.C., the expert is a professor of communication at a major research university in Israel. Having expertise in modern terrorism and the Internet, he has published more than 80 refereed journal articles and books and is a frequent speaker at international conferences on counterterrorism. This expert also leads a team of about 16 research assistants who regularly monitor 4,300 sites on the Dark Web for terrorist activities. The approach he and his team use to collect and analyze terrorists' use of the Web is largely manual, relying on laborious human browsing and monitoring of selected Web sites. His experience in manual analysis served as a contrast to our methodology, which automated part of the Dark Web data collection and analysis. We decided to use expert validation instead of other evaluation methods for two reasons: (1) a lab experiment is not suitable because typical experimental subjects do not have much knowledge of the Dark Web, and (2) it is not feasible to invite terrorists to participate in an interview or empirical evaluation. The expert was not involved in writing this article.

    The evaluation was conducted using an unbiased structured questionnaire and a formal procedure. We showed the results to our expert and asked him to provide detailed comments on the categorization of Web sites and attributes, the visualization and clustering of terrorist groups, and the usability of the snowflake visualization. In general, he deemed the results to be very promising and the methodology design to be excellent. He believed that this was the start of a very important line of research that will result in a useful database and a reliable methodology to update and maintain the database.

    The expert was greatly impressed by the visualization and clustering capabilities of the methodology, and he provided valuable comments on our work. However, he said that the 39 Web sites shown in Table 1 do not represent the entire population of terrorist Web sites, the number of which he estimated to be over four thousand. Because we focused only on Middle Eastern terrorist groups (rather than all terrorist groups in the world), we believe that our methodology has yielded representative results and has automated much of the manual work of identifying and analyzing terrorist Web sites. He suggested adding qualitative measures such as persuasive appeals, rhetoric, and attribution of guilt to the Web site attributes shown in Table 2. We believe that these important attributes are difficult to incorporate into the automated processing of our methodology because of their qualitative nature. He considered the clustering and visualization shown in Figure 2 to be very important because of its usefulness to the investigation of terrorist activities on the Web. He called the snowflake visualization very accurate and very useful to the investigation of terrorist Web sites but criticized the way we created linkages among Web sites. He suggested considering textual citations and other references in addition to hyperlinks.
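    The expert's suggestion about linkages can be illustrated with a minimal sketch that counts, for each pair of sites, both hyperlinks and plain-text mentions of the other site's name. This is not the implementation used in our methodology. The function name, the site identifiers, the host names, and the hand-curated name variants are all hypothetical, and matching names by exact lowercase substring is a simplifying assumption.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkAndTextParser(HTMLParser):
    """Collects href targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        # Simplification: this also captures anchor text and script/style text.
        self.text_parts.append(data)


def count_references(pages, site_hosts, site_names):
    """Count hyperlinks and textual mentions between sites.

    pages:      {source_site_id: [html_string, ...]}
    site_hosts: {site_id: hostname}            (hypothetical identifiers)
    site_names: {site_id: [name variants]}     (hypothetical, hand-curated)
    Returns two Counters keyed by (source_id, target_id).
    """
    host_to_site = {host.lower(): sid for sid, host in site_hosts.items()}
    hyperlinks, mentions = Counter(), Counter()
    for source, html_pages in pages.items():
        for html in html_pages:
            parser = LinkAndTextParser()
            parser.feed(html)
            for href in parser.hrefs:
                try:
                    host = (urlparse(href).hostname or "").lower()
                except ValueError:
                    continue  # skip malformed URLs
                target = host_to_site.get(host)
                if target is not None and target != source:
                    hyperlinks[(source, target)] += 1
            text = " ".join(parser.text_parts).lower()
            for target, names in site_names.items():
                if target == source:
                    continue
                hits = sum(text.count(name.lower()) for name in names)
                if hits:
                    mentions[(source, target)] += hits
    return hyperlinks, mentions
```

    The two counts are kept separate so that an analyst can decide how to weight a textual mention against a hyperlink before feeding the combined link strength into clustering or visualization.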

    Overall, the expert agreed that the results were very promising because they offer useful investigation leads and would be very helpful in improving understanding of terrorist activities on the Web. Because of this expert's strong qualifications and relevant experience, we believe that the evaluation results accurately reflect the effectiveness of the methodology. These results also contribute to advancing the ISI discipline by showing the applicability of the methodology to Dark Web data collection and analysis.

    5. Conclusions and Future Directions

    Collecting and analyzing Dark Web information has challenged investigators and researchers because terrorists can easily hide their identities and remove traces of their activities on the Web. The abundance of Web information has made it difficult to obtain a comprehensive picture of terrorists' activities. In this article, we have proposed a methodology to address these problems. Using advanced Web mining, content analysis, visualization techniques, and human domain knowledge, the methodology exploited various information sources to identify and analyze 39 Jihad Web sites. Information visualization was used to help identify terrorist clusters and to understand terrorist use of the Web. Our expert evaluation showed that the methodology yielded promising results that would be very useful in assisting the investigation of terrorism. The expert considered the visualization results very useful, having potential to guide policymaking and intelligence research. Therefore, this research has contributed to developing a useful methodology for collecting and analyzing Dark Web information, applying the methodology to studying and analyzing 39 Jihad Web sites, and providing formal evaluation results of the usability of the methodology.

    We are pursuing a number of directions to further our research. As terrorists often change their Web sites to remove traces of their activities, we plan to archive Dark Web content digitally and apply our methodology to tracing terrorist activities over time. We will develop scalable techniques to collect such volatile yet valuable content, to visualize large volumes of Dark Web data, and to extract meaningful entities from terrorist Web sites. These efforts will help investigators trace and prevent terrorist attacks.

    6. Acknowledgments

    This research was partly supported by funding from the U.S. Government Department of Homeland Security and Corporation for National Research Initiatives and by Santa Clara University. We thank contributing members of the University of Arizona Artificial Intelligence Lab for their support and assistance.

    References

    Anti-Defamation League. (2002). Jihad Online: Islamic Terrorists and the Internet. Retrieved March 26, 2008 from http://www.adl.org/internet/jihad_online.pdf.

    Blakemore, B. (2004, November 23). Web posting may provide insight into Iraq insurgency. ABC News. Retrieved March 26, 2008 from http://abcnews.go.com/WNT/story?id=277421.

    Carley, K. M., Lee, J.-S., & Krackhardt, D. (2001). Destabilizing networks. Connections, 24(3), 31–34.

    Chambers, J., Cleveland, W., Kleiner, B., & Tukey, P. (1983). Graphical methods for data analysis. Wadsworth International Group (Belmont, CA) and Duxbury Press (Boston, MA).

    Chen, H. (2005). Introduction to the special topic issue: Intelligence and security informatics. Journal of the American Society for Information Science and Technology, 56(3), 217–220.

    Chen, H., & Chau, M. (2004). Web mining: Machine learning for Web applications. In M. E. Williams (Ed.), Annual review of information science and technology (Vol. 38, pp. 289–329). Medford, NJ: Information Today, Inc.

    Chen, H., Fan, H., Chau, M., & Zeng, D. (2001). MetaSpider: Meta-searching and categorization on the Web. Journal of the American Society for Information Science and Technology, 52(13), 1134–1147.

    Chen, H., Schuffels, C., & Orwig, R. (1996). Internet categorization and search: A self-organizing approach. Journal of Visual Communication and Image Representation, 7(1), 88–102.

    Chung, W. (2008). Visualizing E-Business Stakeholders on the Web: A Methodology and Experimental Results. International Journal of Electronic Business, 6(1), 25–46.

    Chung, W., Chen, H., Chaboya, L.G., O'Toole, C., & Atabakhsh, H. (2005). Evaluating event visualization: A usability study of COPLINK Spatio-Temporal Visualizer. International Journal of Human-Computer Studies, 62(1), 127–157.

    Chung, W., Chen, H., & Nunamaker, J.F. (2005). A visual framework for knowledge discovery on the Web: An empirical study on business intelligence exploration. Journal of Management Information Systems, 21(4), 57–84.

    Chung, W., Zhang, Y., Huang, Z., Wang, G., Ong, T.-H., & Chen, H. (2004). Internet searching and browsing in a multilingual world: An experiment on the Chinese business intelligence portal (CBizPort). Journal of the American Society for Information Science and Technology, 55(9), 818–831.

    Department of State. (2003). Patterns of Global Terrorism 2002: The United States Government. Retrieved March 26, 2008 from http://www.state.gov/s/ct/rls/crt/2002/.

    Encyclopedia Britannica Online. (2007). Jihad. Britannica Concise Encyclopedia. Retrieved March 26, 2008 from http://www.britannica.com/ebc/article-9368558.

    Eom, S.B., & Farris, R.S. (1996). The contributions of organizational science to the development of decision support systems research subspecialties. Journal of the American Society for Information Science, 47(12), 941–952.

    Etzioni, O. (1996). The World Wide Web: Quagmire or gold mine? Communications of the ACM, 39(11), 65–68.

    Gellman, B. (2002, June 27). Cyber-attacks by Al Qaeda feared. Washington Post, p. A01. Retrieved March 26, 2008 from http://www.washingtonpost.com/ac2/wp-dyn/A50765-2002Jun26.

    He, Y., & Hui, S.C. (2002). Mining a Web citation database for author co-citation analysis. Information Processing and Management, 38(4), 491–508.


    Kealy, W.A. (2001). Knowledge maps and their use in computer-based collaborative learning. Journal of Educational Computing Research, 25(4), 325–349.

    Kelley, J. (2002, July 10). Militants Wire Web With Links to Jihad. USA Today. Retrieved March 26, 2008 from http://www.usatoday.com/news/world/2002/07/10/web-terror-cover.htm.

    Krebs, V.E. (2001). Mapping network of terrorist cells. Connections, 24(3), 43–52.

    La Porte, T. M., Jong, M. d., & Demchak, C. C. (1999). Public Organizations on the World Wide Web: Empirical Correlates of Administrative Openness. Paper presented at the Proceedings of the 5th National Public Management Research Conference, College Station, TX.

    Last, M., Markov, A., & Kandel, A. (2006). Multi-Lingual Detection of Terrorist Content on the Web. Paper presented at the Proceedings of the PAKDD'06 International Workshop on Intelligence and Security Informatics, Singapore, Springer, Berlin / Heidelberg, pp. 16–30.

    Marshall, B., McDonald, D., Chen, H., & Chung, W. (2004). EBizPort: Collecting and analyzing business intelligence information. Journal of the American Society for Information Science and Technology, 55(10), 873–891.

    Middle East Media Research Institute. (2004). Jihad and Terrorism Studies Project. Retrieved March 2004, retrieved March 26, 2008 from http://www.memri.org/jihad.html.

    Mladenic, D. (1998). Turning Yahoo into an automatic web page classifier. Paper presented at the Proceedings of the 13th European Conference on Artificial Intelligence, Brighton, UK.

    Nasukawa, T., & Nagano, T. (2001). Text analysis and knowledge mining system. IBM Systems Journal, 40(4), 967–984.

    Newman, M. (2004, May 11). Video appears to show beheading of American civilian. The New York Times.

    Popp, R., Armour, T., Senator, T., & Numrych, K. (2004). Countering terrorism through information technology. Communications of the ACM, 47(3), 36–43.

    Project for the Research of Islamist Movements. (2004). PRISM. Retrieved March 26, 2008 from http://www.e-prism.org.

    Sageman, M. (2004). Understanding terror networks. Philadelphia, PA: University of Pennsylvania Press.

    Strickland, L.S., & Hunt, L.E. (2005). Technology, security, and individual privacy: New tools, new threats, and new public perceptions. Journal of the American Society for Information Science and Technology, 56(3), 221–234.

    Technical Analysis Group. (2004). Examining the cyber capabilities of Islamic terrorist groups. Hanover, NH: Institute for Security Technology Studies at Dartmouth College.

    Thomas, T.L. (2003, Spring). Al Qaeda and the Internet: The danger of cyberplanning. Parameters, 112–123.

    Trybula, W.J. (1999). Text mining. In M.E. Williams (Ed.), Annual review of information science and technology (Vol. 34, pp. 385–419). Medford, NJ: Information Today, Inc.

    Tsfati, Y., & Weimann, G. (2002). Terror on the Internet. Studies in Conflict & Terrorism, 25, 317–332. Retrieved March 26, 2008 from http://www.terrorism.com/.

    Xu, J., & Chen, H. (2005). Criminal network analysis and visualization. Communications of the ACM, 48(6), 100–107.

    Young, F.W. (1987). Multidimensional scaling: History, theory, and applications. Hillsdale, NJ: Lawrence Erlbaum Associates.

    Zhu, B., & Chen, H. (2005). Chapter 4: Information Visualization. In B. Cronin (Ed.), Annual Review of Information Science and Technology (Vol. 39, pp. 139–177). Medford, NJ: Information Today, Inc.
