15
Women in ISIS Propaganda: A Natural Language Processing Analysis of Topics and Emotions in a Comparison with Mainstream Religious Group Mojtaba Heidarysafa 1 , Kamran Kowsari 1 , Tolu Odukoya 2 , Philip Potter 3 , Laura E. Barnes 1,3 , and Donald E. Brown 1,3 1 Department of Systems & Information Engineering, University of Virginia, Charlottesville, VA, USA 2 School of Data Science, University of Virginia, Charlottesville, VA, USA 3 Department of Politics, University of Virginia, Charlottesville, VA, USA * Co-corresponding authors: [email protected] Abstract. Online propaganda is central to the recruitment strategies of extremist groups and in recent years these efforts have increasingly ex- tended to women. To investigate ISIS’ approach to targeting women in their online propaganda and uncover implications for counterterrorism, we rely on text mining and natural language processing (NLP). Specifi- cally, we extract articles published in Dabiq and Rumiyah (ISIS’s online English language publications) to identify prominent topics. To iden- tify similarities or differences between these texts and those produced by non-violent religious groups, we extend the analysis to articles from a Catholic forum dedicated to women. We also perform an emotional analysis of both of these resources to better understand the emotional components of propaganda. We rely on Depechemood (a lexical-base emotion analysis method) to detect emotions most likely to be evoked in readers of these materials. The findings indicate that the emotional appeal of ISIS and Catholic materials are similar. Keywords: ISIS propaganda, Topic modeling, Emotion detection, Nat- ural language processing 1 Introduction Since its rise in 2013, the Islamic State of Iraq and Syria (ISIS) has utilized the Internet to spread its ideology, radicalize individuals, and recruit them to their cause. In comparison to other Islamic extremist groups, ISIS’ use of technology was more sophisticated, voluminous, and targeted. For example, during ISIS’ advance toward Mosul, ISIS related accounts tweeted some 40,000 tweets in one day [6].However, this heavy engagement forced social media platforms to institute policies to prevent unchecked dissemination of terrorist propaganda to their users, forcing ISIS to adapt to other means to reach their target audience. arXiv:1912.03804v1 [cs.CL] 9 Dec 2019

Women in ISIS Propaganda: A Natural Language Processing

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Women in ISIS Propaganda: A Natural Language Processing

Women in ISIS Propaganda: A NaturalLanguage Processing Analysis of Topics andEmotions in a Comparison with Mainstream

Religious Group

Mojtaba Heidarysafa1, Kamran Kowsari1, Tolu Odukoya2, Philip Potter3,Laura E. Barnes1,3, and Donald E. Brown1,3

1 Department of Systems & Information Engineering, University of Virginia,Charlottesville, VA, USA

2 School of Data Science, University of Virginia, Charlottesville, VA, USA3 Department of Politics, University of Virginia, Charlottesville, VA, USA

∗ Co-corresponding authors: [email protected]

Abstract. Online propaganda is central to the recruitment strategies ofextremist groups and in recent years these efforts have increasingly ex-tended to women. To investigate ISIS’ approach to targeting women intheir online propaganda and uncover implications for counterterrorism,we rely on text mining and natural language processing (NLP). Specifi-cally, we extract articles published in Dabiq and Rumiyah (ISIS’s onlineEnglish language publications) to identify prominent topics. To iden-tify similarities or differences between these texts and those producedby non-violent religious groups, we extend the analysis to articles froma Catholic forum dedicated to women. We also perform an emotionalanalysis of both of these resources to better understand the emotionalcomponents of propaganda. We rely on Depechemood (a lexical-baseemotion analysis method) to detect emotions most likely to be evokedin readers of these materials. The findings indicate that the emotionalappeal of ISIS and Catholic materials are similar.

Keywords: ISIS propaganda, Topic modeling, Emotion detection, Nat-ural language processing

1 Introduction

Since its rise in 2013, the Islamic State of Iraq and Syria (ISIS) has utilized theInternet to spread its ideology, radicalize individuals, and recruit them to theircause. In comparison to other Islamic extremist groups, ISIS’ use of technologywas more sophisticated, voluminous, and targeted. For example, during ISIS’advance toward Mosul, ISIS related accounts tweeted some 40,000 tweets inone day [6].However, this heavy engagement forced social media platforms toinstitute policies to prevent unchecked dissemination of terrorist propaganda totheir users, forcing ISIS to adapt to other means to reach their target audience.

arX

iv:1

912.

0380

4v1

[cs

.CL

] 9

Dec

201

9

Page 2: Women in ISIS Propaganda: A Natural Language Processing

2 M. Heidarysafa et al.

One such approach was the publication of online magazines in different lan-guages including English. Although discontinued now, these online resources pro-vided a window into ISIS ideology, recruitment, and how they wanted the worldto perceive them. For example, after predominantly recruiting men, ISIS beganto also include articles in their magazines that specifically addressed women.ISIS encouraged women to join the group by either traveling to the caliphate orby carrying out domestic attacks on behalf of ISIS in their respective countries.This tactical change concerned both practitioners and researchers in the coun-terterrorism community. New advancements in data science can shed light onexactly how the targeting of women in extremist propaganda works and whetherit differs significantly from mainstream religious rhetoric.

We utilize natural language processing methods to answer three questions:

– What are the main topics in women-related articles in ISIS’ online maga-zines?

– What similarities and/or differences do these topics have with non-violent,non-Islamic religious material addressed specifically to women?

– What kind of emotions do these articles evoke in their readers and are theresimilarities in the emotions evoked from both ISIS and non-violent religiousmaterials?

As these questions suggest, to understand what, if anything, makes extremist ap-peals distinctive, we need a point of comparison in terms of the outreach effortsto women from a mainstream, non-violent religious group. For this purpose, werely on an online Catholic women’s forum. Comparison between Catholic mate-rial and the content of ISIS’ online magazines allows for novel insight into thedistinctiveness of extremist rhetoric when targeted towards the female popula-tion. To accomplish this task, we employ topic modeling and an unsupervisedemotion detection method.

The rest of the paper is organized as follows: in Section 2, we review relatedworks on ISIS propaganda and applications of natural language methods. Sec-tion 3 describes data collection and pre-processing. Section 4 describes in detailthe approach. Section 5 reports the results, and finally, Section 6 presents theconclusion.

2 Related Work

Soon after ISIS emerged and declared its caliphate, counterterrorism practi-tioners and political science researchers started to turn their attention towardsunderstanding how the group operated. Researchers investigated the origins ofISIS, its leadership, funding, and how they rose became a globally dominant non-state actor [12]. This interest in the organization’s distinctiveness immediatelyled to inquiries into ISIS’ rhetoric, particularly their use of social media andonline resources in recruitment and ideological dissemination. For example, Al-Tamimi examines how ISIS differentiated itself from other jihadist movementsby using social media with unprecedented efficiency to improve its image with

Page 3: Women in ISIS Propaganda: A Natural Language Processing

Women in ISIS Propaganda: A Natural Language Processing Analysis 3

locals [2]. One of ISIS’ most impressive applications of its online prowess wasin the recruitment process. The organization has used a variety of materials,especially videos, to recruit both foreign and local fighters. Research shows thatISIS propaganda is designed to portray the organization as a provider of justice,governance, and development in a fashion that resonates with young western-ers [7]. This propaganda machine has become a significant area of research, withscholars such as Winter identifying key themes in it such as brutality, mercy,victimhood, war, belonging and utopianism. [26]. However, there has been in-sufficient attention focused on how these approaches have particularly targetedand impacted women. This is significant given that scholars have identified thedistinctiveness of this population when it comes to nearly all facets of terrorism.

ISIS used different types of media to propagate its messages, such as videos,images, texts, and even music. Twitter was particularly effective and the ArabicTwitter app allowed ISIS to tweet extensively without triggering spam-detectionmechanisms the platform uses [6]. Scholars followed the resulting trove of dataand this became the preeminent way in which they assess ISIS messages. Forexample, in [4] they use both lexical analysis of tweets as well as social networkanalysis to examine ISIS support or opposition on Twitter. Other researchersused data mining techniques to detect pro-ISIS user divergence behavior at vari-ous points in time [18]. By looking at these works, the impact of using text min-ing and lexical analysis to address important questions becomes obvious. Properusage of these tools allows the research community to analyze big chunks of un-structured data. This approach, however, became less productive as the socialmedia networks began cracking down and ISIS recruiters moved off of them.

With their ability to operate freely on social media now curtailed, ISIS re-cruiters and propagandists increased their attentiveness to another longstandingtool–English language online magazines targeting western audiences. Al Hayat,the media wing of ISIS, published multiple online magazines in different lan-guages including English. The English online magazine of ISIS was named Dabiqand first appeared on the dark web on July 2014 and continued publishing for15 issues. This publication was followed by Rumiyah which produced 13 Englishlanguage issues through September 2017. The content of these magazines pro-vides a valuable but underutilized resource for understanding ISIS strategies andhow they appeal to recruits, specifically English-speaking audiences. They alsoprovide a way to compare ISIS’ approach with other radical groups. Ingram com-pared Dabiq contents with Inspire (Al Qaeda publication) and suggested that AlQaeda heavily emphasized identity-choice, while ISIS’ messages were more bal-anced between identity-choice and rational-choice [8]. In another research paper,Wignell et al. [25] compared Dabiq and Rumiah by examining their style andwhat both magazine messages emphasized. Despite the volume of research onthese magazines, only a few researchers used lexical analysis and mostly reliedon experts’ opinions. [23] is one exception to this approach where they used wordfrequency on 11 issues of Dabiq publications and compared attributes such asanger, anxiety, power, motive, etc.

Page 4: Women in ISIS Propaganda: A Natural Language Processing

4 M. Heidarysafa et al.

This paper seeks to establish how ISIS specifically tailored propaganda tar-geting western women, who became a particular target for the organization asthe “caliphate” expanded. Although the number of recruits is unknown, in 2015it was estimated that around 10 percent of all western recruits were female [15].Some researchers have attempted to understand how ISIS propaganda targetswomen. Kneip, for example, analyzed women’s desire to join as a form of eman-cipation [9]. We extend that line of inquiry by leveraging technology to answerkey outstanding questions about the targeting of women in ISIS propaganda.

To further assess how ISIS propaganda might affect women, we used emotiondetection methods on these texts. Emotion detection techniques are mostly di-vided into lexicon-base or machine learning-base methods. Lexicon-base methodsrely on several lexicons while machine learning (ML) methods use algorithm todetect the elation of texts as inputs and emotions as the target, usually trained ona large corpus. Unsupervised methods usually use Non-negative matrix factoriza-tion (NMF) and Latent Semantic Analysis (LSA) [5] approaches. An importantdistinction that should be made when using text for emotion detection is thatemotion detected in the text and the emotion evoked in the reader of that textmight differ. In the case of propaganda, it is more desirable to detect possibleemotions that will be evoked in a hypothetical reader. In the next section, wedescribe methods to analyze content and technique to find evoked emotions ina potential reader using available natural language processing tools.

3 Data Collection & Pre-Processing

3.1 Data collection

Finding useful collections of texts where ISIS targets women is a challenging task.Most of the available material are not reflecting ISIS’ official point of view orthey do not talk specifically about women. However, ISIS’ online magazines arevaluable resources for understanding how the organization attempts to appealto western audiences, particularly women. Looking through both Dabiq andRumiyah, many issues of the magazines contain articles specifically addressingwomen, usually with “ to our sisters ” incorporated into the title. Seven out offifteen Dabiq issues and all thirteen issues of Rumiyah contain articles targetingwomen, clearly suggesting an increase in attention to women over time.

We converted all the ISIS magazines to texts using pdf readers and all ar-ticles that addressed women in both magazines (20 articles) were selected forour analysis. To facilitate comparison with a mainstream, non-violent religiousgroup, we collected articles from catholicwomensforum.org, an online resourcecatering to Catholic women. We scrapped 132 articles from this domain. Whilethis number is large, the articles themselves are much shorter than those pub-lished by ISIS. These texts were pre-processed by tokenizing the sentences andeliminating non-word tokens and punctuation marks. Also, all words turned intolower case and numbers and English stop words such as “our, is, did, can, etc.” have been removed from the produced tokens. For the emotion analysis part,we used a spacy library as part of speech tagging to identify the exact role of

Page 5: Women in ISIS Propaganda: A Natural Language Processing

Women in ISIS Propaganda: A Natural Language Processing Analysis 5

words in the sentence. A word and its role have been used to look for emotionalvalues of that word in the same role in the sentence.

3.2 Pre-Processing

Text Cleaning and Pre-processing Most text and document datasets con-tain many unnecessary words such as stopwords, misspelling, slang, etc. In manyalgorithms, especially statistical and probabilistic learning algorithms, noise andunnecessary features can have adverse effects on system performance. In thissection, we briefly explain some techniques and methods for text cleaning andpre-processing text datasets [10].

Tokenization Tokenization is a pre-processing method which breaks a streamof text into words, phrases, symbols, or other meaningful elements called to-kens [24]. The main goal of this step is to investigate the words in a sentence [24].Both text classification and text mining requires a parser which processes thetokenization of the documents; for example:sentence [1] :

After sleeping for four hours, he decided to sleep for another four.In this case, the tokens are as follows:

{“After” “sleeping” “for” “four” “hours” “he” “decided” “to” “sleep” “for”“another” “four”}.

Stop words Text and document classification includes many words which donot hold important significance to be used in classification algorithms such as{“a”, “about”, “above”, “across”, “after”, “afterwards”, “again”,. . .}. The mostcommon technique to deal with these words is to remove them from the textsand documents [19].

Term Frequency-Inverse Document Frequency K Sparck Jones [20] pro-posed inverse document frequency (IDF) as a method to be used in conjunctionwith term frequency in order to lessen the effect of implicitly common words inthe corpus. IDF assigns a higher weight to words with either high frequency orlow frequency term in the document. This combination of TF and IDF is wellknown as term frequency-inverse document frequency (tf-idf). The mathemat-ical representation of the weight of a term in a document by tf-idf is given inEquation 1.

W (d, t) = TF (d, t) ∗ log( N

df(t) ) (1)

Here N is the number of documents and df(t) is the number of documentscontaining the term t in the corpus. The first term in Equation 1 improves therecall while the second term improves the precision of the word embedding [22].Although tf-idf tries to overcome the problem of common terms in the document,

Page 6: Women in ISIS Propaganda: A Natural Language Processing

6 M. Heidarysafa et al.

it still suffers from some other descriptive limitations. Namely, tf-idf cannotaccount for the similarity between the words in the document since each wordis independently presented as an index. However, with the development of morecomplex models in recent years, new methods, such as word embedding, havebeen presented that can incorporate concepts such as similarity of words andpart of speech tagging.

4 Method

In this section, we describe our methods used for comparing topics and evokedemotions in both ISIS and non-violent religious materials.

4.1 Content Analysis

The key task in comparing ISIS material with that of a non-violent group involvesanalyzing the content of these two corpora to identify the topics. For our analysis,we considered a simple uni-gram model where each word is considered as a singleunit. Understanding what words appear most frequently provides a simple metricfor comparison. To do so we normalized the count of words with the numberof words in each corpora to account for the size of each corpus. It should benoted, however, that a drawback of word frequencies is that there might be somedominant words that will overcome all the other contents without conveyingmuch information.

Topic modeling methods are the more powerful technique for understandingthe contents of a corpus. These methods try to discover abstract topics in acorpus and reveal hidden semantic structures in a collection of documents. Themost popular topic modeling methods use probabilistic approaches such as prob-abilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA).LDA is a generalization of pLSA where documents are considered as a mixtureof topics and the distribution of topics is governed by a Dirichlet prior (α). Fig-ure 1 shows plate notation of general LDA structure where β represents prior ofword distribution per topic and θ refers to topics distribution for documents [3].Since LDA is among the most widely utilized algorithms for topic modeling, weapplied it to our data. However, the coherence of the topics produced by LDAis poorer than expected.

To address this lack of coherence, we applied non-negative matrix factoriza-tion (NMF). This method decomposes the term-document matrix into two non-negative matrices as shown in Figure 2. The resulting non-negative matrices aresuch that their product closely approximate the original data. Mathematicallyspeaking, given an input matrix of document-terms V , NMF finds two matricesby solving the following equation [13]:

minW,H‖V −WH‖F s.t H ≥ 0,W ≥ 0.

Where W is topic-word matrix and H represents topic-document matrix.

Page 7: Women in ISIS Propaganda: A Natural Language Processing

Women in ISIS Propaganda: A Natural Language Processing Analysis 7

Fig. 1: Plate notation of LDA model

NMF appears to provide more coherent topic on specific corpora. O’Callaghanet al. compared LDA with NMF and concluded that NMF performs better incorporas with specific and non-mainstream areas [14]. Our findings align withthis assessment and thus our comparison of topics is based on NMF.

4.2 Emotion detection

Propaganda effectiveness hinges on the emotions that it elicits. But detectingemotion in text requires that two essential challenges are overcome.

First, emotions are generally complex and emotional representation mod-els are correspondingly contested. Despite this, some models proposed by psy-chologists have gained wide-spread usage that extends to text-emotion analysis.Robert Plutchik presented a model that arranged emotions from basic to com-plex in a circumplex as shown in Figure 3. The model categorizes emotions into8 main subsets and with addition of intensity and interactions it will classifyemotions into 24 classes [17]. Other models have been developed to capture allemotions by defining a 3-dimensional model of pleasure, arousal, and dominance.

Fig. 2: NMF decomposition of document-term matrix [11]

Page 8: Women in ISIS Propaganda: A Natural Language Processing

8 M. Heidarysafa et al.

The second challenge lies in using text for detecting emotion evoked in apotential reader. Common approaches use either lexicon-base methods (such askeyword-based or ontology-based model) or machine learning-base models (usu-ally using large corpus with labeled emotions) [5]. These methods are suited toaddressing the emotion that exist in the text, but in the case of propaganda weare more interested in emotions that are elicited in the reader of such materi-als. The closest analogy to this problem can be found in research that seek tomodel feelings of people after reading a news article. One solution for this typeof problem is to use an approach called Depechemood.

Depechemood is a lexicon-based emotion detection method gathered fromcrowd-annotated news [21]. Drawing on approximately 23.5K documents withaverage of 500 words per document from rappler.com, researchers asked sub-jects to report their emotions after reading each article. They then multipliedthe document-emotion matrix and word-document matrix to derive emotion-word matrix for these words. Due to limitations of their experiment setup, theemotion categories that they present does not exactly match the emotions fromthe Plutchik wheel categories. However, they still provide a good sense of thegeneral feeling of an individual after reading an article. The emotion categories ofDepechemood are: AFRAID, AMUSED, ANGRY, ANNOYED, DON’T CARE,HAPPY, INSPIRED, SAD. Depechemood simply creates dictionaries of wordswhere each word has scores between 0 and 1 for all of these 8 emotion categories.We present our finding using this approach in the result section.

Fig. 3: 2D representation of Plutchik wheel of emotions [16]

Page 9: Women in ISIS Propaganda: A Natural Language Processing

Women in ISIS Propaganda: A Natural Language Processing Analysis 9

Table 1: NMF Topics of women in ISIS

earl

yis

lam

wom

enIs

lam

/kh

ilafa

hm

arri

age

isla

mic

pray

ing

wom

en’s

life

hijr

ahis

lam

icdi

vorc

em

othe

rhoo

dsp

ousa

lre

lati

onsh

ipT

opic

0T

opic

1T

opic

2T

opic

3T

opic

4T

opic

5T

opic

6T

opic

7T

opic

8T

opic

91

said

wa

husban

dmasjid

worldly

islamic

wala

mou

rning

said

spou

ses

2kh

adija

hhijrah

wom

anprayer

spen

dstate

said

idda

hibn

said

3mun

afiqin

salla

msharah

said

regards

khila

fah

bara

widow

jihad

husban

d4

iman

alay

hisaid

masajid

wearin

gab

usake

home

repo

rted

backbitin

g5

abu

salla

llhu

wives

home

mat

brothe

rsen

mity

husban

dchild

ren

listening

6ibn

khilfah

wife

going

men

struation

arriv

edka

bibn

charity

wife

7du

nya

abmarry

wom

anprop

het

knew

salam

said

child

abu

8mothe

rsisters

ibn

ibn

life

mujah

idin

remaine

dpe

rfum

eab

udivorce

9pe

ople

radiyallh

umarrie

dprop

het

clothing

previous

prayer

wom

anwealth

word

10asma

ibn

jihd

hadith

small

soon

shariah

away

flock

prob

lems

11hu

sban

dqu

rndu

nyrepo

rted

lived

child

ren

muslim

snigh

tmuslim

instead

12ba

krislamic

say

prevent

time

later

affectio

nwear

cause

backbite

13be

lievers

praise

perm

itted

men

family

inform

edislam

widow

salbu

khari

relatio

nship

14he

arts

state

lawful

default

aishah

hijrah

anger

house

prop

het

tong

ue15

kille

dsaid

proh

ibite

dmuslim

garm

ent

shariah

aqidah

sleep

policy

wom

an16

know

slavegirl

lord

leav

ing

despite

dua

good

marrie

dshep

herd

person

17steadfast

land

islam

abu

follo

wjourne

yrelig

ion

clothing

soul

brothe

r18

sumay

yah

firm

fear

stay

concern

returned

return

abwaging

home

19sisters

peop

leman

wives

living

umm

state

pregna

ntlord

hind

20spread

ing

land

ssister

shariah

food

leave

relativ

eshu

sban

dsmothe

rna

rrated

Page 10: Women in ISIS Propaganda: A Natural Language Processing

10 M. Heidarysafa et al.

Table 2: NMF Topics of women in catholic forum

fem

inis

mla

wge

nder

iden

tity

mar

rage

/di

vorc

ech

urch

mot

herh

ood

birt

hco

ntro

llif

ese

xual

ity

pare

ntin

g

Top

ic0

Top

ic1

Top

ic2

Top

ic3

Top

ic4

Top

ic5

Top

ic6

Top

ic7

Top

ic8

Top

ic9

1wom

enab

ortio

ngend

ermarria

gechurch

mom

human

aehu

man

sex

parents

2ab

ortio

ncourt

lgbt

divo

rce

catholic

god

pill

social

sexu

alscho

ol3

pro

constit

utiona

lidentit

yfamily

bishop

smary

vitae

ecology

men

child

ren

4men

circuit

tran

sgen

der

child

ren

pope

chris

tcontraception

work

regn

erus

scho

ols

5life

fede

ral

ideology

marita

lsyno

dmothe

rcontrol

revo

lutio

ncheap

fede

ralist

6feminist

decision

tran

smarria

ges

priests

jesus

birth

peop

leconsent

education

7march

abortio

nssex

divorced

fran

cis

like

wom

enfamily

wom

enstud

ents

8feminists

state

sexu

alspou

ses

wom

enba

bycontraceptives

moral

desire

tran

sgen

der

9wom

anka

nsas

male

annu

lment

rome

love

health

political

porn

child

10feminism

suprem

ereality

families

letter

mothe

rhoo

deff

ects

politics

weinstein

read

ing

11equa

lity

law

female

spou

sechris

tchild

man

date

man

market

kids

12female

med

icaid

activ

ists

marrie

dmccarric

kworld

fertility

dign

itymarria

geyouth

13choice

roe

catholic

aban

done

dvatic

anlife

iud

person

mating

girl

14male

constit

ution

biological

love

bishop

charlie

catholic

ecological

power

family

15movem

ent

judg

edy

spho

riacatholics

holy

child

ren

contraceptive

society

pornograph

ygend

er16

vulnerab

lecase

person

church

god

light

sang

erna

ture

flyconfused

17sex

plan

ned

agen

dafid

elity

faith

ful

son

med

ical

world

good

continue

18toda

ypa

rentho

odpe

ople

olde

rfaith

little

prevent

liberty

like

boy

19tim

erig

htorientation

situations

priestho

odgot

sexu

alself

risk

public

20hu

man

government

repo

rthu

sban

dau

thority

day

free

middle

kind

news

Page 11: Women in ISIS Propaganda: A Natural Language Processing

Women in ISIS Propaganda: A Natural Language Processing Analysis 11

Fig. 4: Word frequency of most common words in catholic corpora

Fig. 5: Word frequency of most common words in dabiq corpora

5 Results

In this section, we present the results of our analysis based on the contents of ISISpropaganda materials as compared to articles from the Catholic women forum.We then present the results of emotion analysis conducted on both corpora.

5.1 Content Analysis

After pre-processing the text, both corpora were analyzed for word frequencies.These word frequencies have been normalized by the number of words in eachcorpus. Figure 5 shows the most common words in each of these corpora.

A comparison of common words suggests that those related to marital rela-tionships ( husband, wife, etc.) appear in both corpora, but the religious themeof ISIS material appears to be stronger. A stronger comparison can be made

Page 12: Women in ISIS Propaganda: A Natural Language Processing

12 M. Heidarysafa et al.

using topic modeling techniques to discover main topics of these documents. Al-though we used LDA, our results by using NMF outperform LDA topics, dueto the nature of these corpora. Also, fewer numbers of ISIS documents mightcontribute to the comparatively worse performance. Therefore, we present onlyNMF results. Based on their coherence, we selected 10 topics for analyzing withinboth corporas. Table 1 and Table 2 show the most important words in each topicwith a general label that we assigned to the topic manually. Based on the NMFoutput, ISIS articles that address women include topics mainly about Islam,women’s role in early Islam, hijrah (moving to another land), spousal relations,marriage, and motherhood.

The topics generated from the Catholic women forum are clearly quite differ-ent. Some, however, exist in both contexts. More specifically, marriage/divorce,motherhood, and to some extent spousal relations appeared in both generatedtopics. This suggests that when addressing women in a religious context, thesemay be very broadly effective and appeal to the feminine audience. More impor-tantly, suitable topic modeling methods will be able to identify these similaritiesno matter the size of the corpus we are working with. Although, finding the sim-ilarities/differences between topics in these two groups of articles might providesome new insights, we turn to emotional analysis to also compare the emotionsevoked in the audience.

5.2 Emotion Analysis

We rely on Depechemood dictionaries to analyze emotions in both corpora. Thesedictionaries are freely available and come in multiple arrangements. We used aversion that includes words with their part of speech (POS) tags. Only words thatexist in the Depechemood dictionary with the same POS tag are considered forour analysis. We aggregated the score for each word and normalized each articleby emotions. To better compare the result, we added a baseline of 100 randomarticles from a Reuters news dataset as a non-religious general resource whichis available in an NLTK python library. Figure 6 shows the aggregated score fordifferent feelings in our corpora.

Both Catholic and ISIS related materials score the highest in “inspired” cat-egory. Furthermore, in both cases, being afraid has the lowest score. However,this is not the case for random news material such as the Reuters corpus, whichare not that inspiring and, according to this method, seems to cause more fearin their audience. We investigate these results further by looking at the mostinspiring words detected in these two corpora. Table 3 presents 10 words thatare among the most inspiring in both corpora. The comparison of the two listsindicate that the method picks very different words in each corpus to reach tothe same conclusion. Also, we looked at separate articles in each of the issues ofISIS material addressing women. Figure 7 shows emotion scores in each of the20 issues of ISIS propaganda. As demonstrated, in every separate article, thismethod gives the highest score to evoking inspirations in the reader. Also, inmost of these issues the method scored “being afraid” as the lowest score in eachissue.

Page 13: Women in ISIS Propaganda: A Natural Language Processing

Women in ISIS Propaganda: A Natural Language Processing Analysis 13

Fig. 6: Comparison of emotions of our both corpora along with Reuters news

6 Conclusion and Future Work

In this paper, we have applied natural language processing methods to ISISpropaganda materials in an attempt to understand these materials using avail-able technologies. We also compared these texts with a non-violent religiousgroups’ (both focusing on women related articles) to examine possible similari-ties or differences in their approaches. To compare the contents, we used wordfrequency and topic modeling with NMF. Also, our results showed that NMFoutperforms LDA due to the niche domain and relatively small number of doc-uments.

The results suggest that certain topics play a particularly important roles inISIS propaganda targeting women. These relate to the role of women in earlyIslam, Islamic ideology, marriage/divorce, motherhood, spousal relationships,and hijrah (moving to a new land).

Comparing these topics with those that appeared on a Catholic women fo-rum, it seems that both ISIS and non-violent groups use topics about moth-erhood, spousal relationship, and marriage/divorce when they address women.Moreover, we used Depechemood methods to analyze the emotions that thesematerials are likely to elicit in readers. The result of our emotion analysis sug-gests that both corpuses used words that aim to inspire readers while avoidingfear. However, the actual words that lead to these effects are very different inthe two contexts. Overall, our findings indicate that, using proper methods, au-tomated analysis of large bodies of textual data can provide novel insight insightinto extremist propaganda that can assist the counterterrorism community.

Page 14: Women in ISIS Propaganda: A Natural Language Processing

14 M. Heidarysafa et al.

Fig. 7: Feeling detected in ISIS magazines (first 7 issues belong to Dabiq and last13 belong to Rumiyah)

Table 3: Words with highest inspiring scores

Words

GroupCatholic ISIS

W ord1 avarice uprightnessW ord2 perceptive memorizationW ord3 educationally mercifulW ord4 stereotypically afflictionW ord5 distrustful gentlenessW ord6 reverence masjidW ord7 unbounded verilyW ord8 antichrist sublimityW ord9 loneliness recompenseW ord10 feelings fierceness

References

1. Aggarwal, C.C.: Machine learning for text. Springer (2018)2. Al-Tamimi, A.J.: The dawn of the islamic state of iraq and ash-sham. Current

Trends in Islamist Ideology 16, 5 (2014)3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. Journal of machine

Learning research 3(Jan), 993–1022 (2003)4. Bodine-Baron, E., Helmus, T.C., Magnuson, M., Winkelman, Z.: Examining isis

support and opposition networks on twitter. Tech. rep., RAND Corporation SantaMonica United States (2016)

5. Canales, L., Martínez-Barco, P.: Emotion detection from text: A survey. In: Pro-ceedings of the Workshop on Natural Language Processing in the 5th InformationSystems Research Working Days (JISIC), pp. 37–43 (2014)

6. Farwell, J.P.: The media strategy of isis. Survival 56(6), 49–55 (2014)

Page 15: Women in ISIS Propaganda: A Natural Language Processing

Women in ISIS Propaganda: A Natural Language Processing Analysis 15

7. Gates, S., Podder, S.: Social media, recruitment, allegiance and the islamic state.Perspectives on Terrorism 9(4) (2015)

8. Ingram, H.J.: An analysis of inspire and dabiq: Lessons from aqap and islamicstate’s propaganda war. Studies in Conflict & Terrorism 40(5), 357–375 (2017)

9. Kneip, K.: Female jihad–women in the isis. politikon 29 (2016)10. Kowsari, K., Jafari Meimandi, K., Heidarysafa, M., Mendu, S., Barnes, L., Brown,

D.: Text classification algorithms: A survey. Information 10(4), 150 (2019)11. Kuang, D., Brantingham, P.J., Bertozzi, A.L.: Crime topic modeling. Crime Sci-

ence 6(1), 12 (2017)12. Laub, Z., Masters, J.: Islamic state in iraq and greater syria. The Council on

Foreign Relations. June 12 (2014)13. Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix fac-

torization. Nature 401(6755), 788 (1999)14. O’callaghan, D., Greene, D., Carthy, J., Cunningham, P.: An analysis of the coher-

ence of descriptors in topic modeling. Expert Systems with Applications 42(13),5645–5657 (2015)

15. Perešin, A.: Fatal attraction: Western muslimas and isis. Perspectives on Terrorism9(3) (2015)

16. Plutchik, R.: A general psychoevolutionary theory of emotion. In: Theories ofemotion, pp. 3–33. Elsevier (1980)

17. Plutchik, R.: The nature of emotions: Human emotions have deep evolutionaryroots, a fact that may explain their complexity and provide tools for clinical prac-tice. American scientist 89(4), 344–350 (2001)

18. Rowe, M., Saif, H.: Mining pro-isis radicalisation signals from social media users.In: Tenth International AAAI Conference on Web and Social Media (2016)

19. Saif, H., Fernández, M., He, Y., Alani, H.: On stopwords, filtering and data sparsityfor sentiment analysis of twitter (2014)

20. Sparck Jones, K.: A statistical interpretation of term specificity and its applicationin retrieval. Journal of documentation 28(1), 11–21 (1972)

21. Staiano, J., Guerini, M.: Depechemood: a lexicon for emotion analysis from crowd-annotated news. arXiv preprint arXiv:1405.1605 (2014)

22. Tokunaga, T., Makoto, I.: Text categorization based on weighted inverse documentfrequency. In: Special Interest Groups and Information Process Society of Japan(SIG-IPSJ. Citeseer (1994)

23. Vergani, M., Bliuc, A.M.: The evolution of the isis’language: a quantitative anal-ysis of the language of the first year of dabiq magazine. Sicurezza, Terrorismo eSocietà= Security, Terrorism and Society 2(2), 7–20 (2015)

24. Verma, T., Renu, R., Gaur, D.: Tokenization and filtering process in rapidminer.International Journal of Applied Information Systems 7(2), 16–18 (2014)

25. Wignell, P., Tan, S., O’Halloran, K., Lange, R.: A mixed methods empirical ex-amination of changes in emphasis and style in the extremist magazines dabiq andrumiyah. Perspectives on Terrorism 11(2), 2–20 (2017)

26. Winter, C.: The Virtual’Caliphate’: Understanding Islamic State’s PropagandaStrategy, vol. 25. Quilliam London (2015)