Discourse Network Analyzer Manual

Philip Leifeld, Johannes Gruber and Felix Rolf Bossner

Last update: DNA 2.0 beta 20 with rDNA 2.0.4 on February 5, 2018.

Contents

1 Introduction
2 DNA algorithms (Philip Leifeld)
2.1 Congruence
2.2 Conflict
2.3 Subtract
2.4 Ignore
2.5 Normalization
2.6 Affiliation networks
2.7 Normalization for affiliation networks
3 Installation of DNA and rDNA (Johannes Gruber)
3.1 Windows
3.2 macOS
3.3 Linux
3.4 Installing the programs themselves
4 Using DNA: Preparation of your DNA Workspace (Felix Rolf Bossner and Johannes Gruber)
4.1 Creating a new DNA database
4.1.1 Creating a local DNA file
4.1.2 Creating and using a remote database (MySQL)
4.2 User Management: Multiple Coders and Permissions
4.3 Statement Types and Variables
4.3.1 Adjusting the variables of interest
4.3.2 Adjusting the statement types
4.4 Final step: Approving your workspace and creating the DNA file
5 Using DNA: Importing and Organizing your Raw Data (Felix Rolf Bossner)
5.1 Opening an existing DNA database
5.2 Importing Documents (Raw Data)
5.2.1 Importing single Documents manually via Copy and Paste
5.2.2 Importing multiple Documents semi-automatically from text files
5.2.3 Importing Documents from other DNA databases
5.3 Organizing documents (Raw Data)
5.3.1 Deleting and navigating through documents
5.3.2 Editing the documents’ meta data (author, time, etc.)
6 Using DNA: Coding the Data
7 Using DNA: Exporting the coded Data
8 rDNA: Using DNA from R (Philip Leifeld)
8.1 Getting started with rDNA
8.2 Retrieving networks and attributes

Chapter 1

Introduction

This manual demonstrates how to install, set up and use the open-source standalone software Discourse Network Analyzer (DNA) and its companion R package rDNA (Leifeld 2018), which are designed for researchers using the method of discourse network analysis.¹ By combining content analysis and dynamic network analysis, this method can reveal the structure and dynamics of debates, such as political discourses. For the user, the method comprises three basic steps: first, annotating the statements of actors in unstructured (text) sources; second, creating networks from the resulting structured data; and finally, analysing and interpreting the results with the toolbox of network analysis. The results can take a number of different forms, such as congruence or conflict networks at the actor or concept level, affiliation networks of actors and concept stances, and longitudinal versions of these networks (see Section 2 or Leifeld (2016a) for a comprehensive overview of the method).

In recent years, discourse network analysis has been employed by a growing number of scholars in a wide range of policy sectors, such as pension politics (Leifeld 2013, 2016b), climate politics (Fisher et al. 2013a,b; Broadbent and Vaughter 2014; Gkiouzepas and Botetzagias 2015; Manfredo et al. 2014; Schneider and K. 2014; Stoddart and Tindall 2015; Wagner and Payne 2017; Yun et al. 2014), software patents and property rights (Leifeld and Haunss 2012; Herweg 2013), Internet policy (Breindl 2013), infrastructure projects (Nagel 2016), energy policy (Brutschin 2013; Haunss et al. 2017; Imbert 2017; Rinscheid 2015; Rinscheid et al. 2015), shooting rampages (Hurka and Nebel 2013), abortion (Muller 2014a,b, 2015), outdoor sports (Stoddart et al. 2015), water politics (Brandenberger et al. 2015; Cisneros 2015), deforestation (Rantala and Di Gregorio 2014), genetically modified organisms (Tosun et al. 2015), higher education (Nägler 2015), international financial politics (Werner 2015), and online deception (Wu and Zhou 2015). Many of these studies already used DNA, and as academic interest in discourse network analysis has grown steadily in recent years, both the method and the software are likely to become more widespread.

The benefit of the Java software DNA is that it is specifically designed to aid the user in the first two basic steps of discourse network analysis. It is mainly designed for qualitative content analysis, which in this case means annotating the statements or, in other words, structuring the text data. The program can also create different kinds of network matrices and export them to other programs for further analysis and plotting. DNA is furthermore designed to support a comprehensive workflow, from importing raw text, through managing multiple users and different types of documents and supporting the coding process with a regular expression highlighter, to basic analysis options during export. While the software is primarily designed for discourse network analysis, it is also very flexible, as the statement types can be adjusted (see Section 4.3) to accommodate other content analysis tasks, such as framing analysis. There are numerous alternative software packages for qualitative content analysis, but very few of them were developed specifically with discourse network analysis in mind, and most of them therefore lack the functionality necessary for exporting network data.

¹ This manual is a work in progress and will be continuously updated during the year 2018. See https://github.com/leifeld/dna/blob/master/manual/dna-manual.pdf for the most recent version.

The companion R package to DNA, rDNA, additionally helps with the remaining basic step mentioned above: the analysis of the coded statements. rDNA connects the results of the coding performed in DNA with the statistical computing environment R, where the coded material can be analysed in more depth. While data can also be exported to other software such as Ucinet, visone, NetMiner, Gephi and others, R is the preferred choice: it enables reproducible research, is free and open source, and has a large community of users and developers who work on all kinds of data analysis tasks. As a result, R offers several excellent packages developed specifically for network analysis, such as statnet (Handcock et al. 2008a), igraph (Csardi and Nepusz 2006a), sna (Butts 2016), network (Butts 2008a), tidygraph (Pedersen 2017b) and ggraph (Pedersen 2017a), and many more that can aid content and network analysis even though they were written with other tasks in mind. Most of these packages work seamlessly with data processed by rDNA and therefore add a myriad of possibilities to the native functions of our own R package.

The outline of this manual is as follows. Section 2 is a concise and fairly technical description of the types of networks DNA can export. Section 3 explains how to install DNA and rDNA, which both rely on a correctly set up Java Runtime Environment. While installing the programs themselves would hardly be worth its own section, installing Java on some operating systems can be surprisingly tough, at least without our explanations. Four sections then describe the usage of DNA in detail: Section 4 describes how to set up a project in DNA, including creating a database, adding and managing users, and setting up or editing statement types and variables. Section 5 explains how you can import and organise your raw data (i.e., documents). Section 6 and Section 7 will explain (once they are completed) how material is coded in DNA and how coded data can be exported to other programs for further analysis. Section 8 is an introductory tutorial on using the rDNA package to perform additional analysis and plotting tasks with the infrastructure provided by R.

Both DNA and rDNA can be downloaded from GitHub at https://github.com/leifeld/dna (see Section 3). Questions and bug reports can be posted in the issue tracker on GitHub.


Chapter 2

DNA algorithms

Philip Leifeld

This section summarizes the main algorithms implemented in DNA in a technical way.

$X$ is a three-dimensional array representing statement counts. $x_{ijk}$ is a specific count value in this array, with the first index $i$ denoting an instance of the first variable (e.g., organization $i$), the second index $j$ denoting an instance of the second variable (e.g., concept $j$), and the third index $k$ denoting a level on the qualifier variable (e.g., agreement = 1). For example, $x_{ijk} = 5$ could mean that organization $i$ mentions concept $j$ with intensity $k$ five times.

Where the qualifier variable is binary, false values are represented as 0 and true values as 1 on the $k$ index, i.e., $K_{\text{binary}} = \{0, 1\}$. Where the qualifier variable is integer, the respective integer value is used as the level. This implies that $k$ can take positive or negative values or 0, i.e., $K_{\text{integer}} \subseteq \mathbb{Z}$. Note that all levels $k$ of the scale are included in $K$, not just those values that are empirically observed.
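A minimal sketch of how such an array could be stored in R, assuming a plain base-R `array` (the object names `X` and `K` and the toy counts are illustrative, not objects from DNA or rDNA). Two organizations assess two concepts on an integer scale $K = \{-1, 0, 1\}$, and the third dimension is indexed by the qualifier levels, including levels nobody used:

```r
# Illustrative statement-count array X: organizations x concepts x levels
K <- c(-1, 0, 1)  # all levels of the integer scale, observed or not
X <- array(0, dim = c(2, 2, length(K)),
           dimnames = list(org = c("A", "B"),
                           concept = c("c1", "c2"),
                           k = as.character(K)))
X["A", "c1", "1"] <- 2   # A supports c1 (k = 1) twice
X["B", "c1", "0"] <- 1   # B mentions c1 neutrally (k = 0) once
X["B", "c1", "1"] <- 1   # B supports c1 once
X["A", "c2", "-1"] <- 1  # A rejects c2 (k = -1) once
X["B", "c2", "-1"] <- 1  # B rejects c2 once
X["A", "c1", "1"]        # the count x_ijk for i = A, j = c1, k = 1
## [1] 2
```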

Indices with a prime denote a second instance of an element, e.g., $i'$ may denote another organization. $Y$ denotes the output matrix obtained by applying a transformation to $X$. The following transformations are possible:

2.1 Congruence

In a congruence network, the edge weight between nodes $i$ and $i'$ represents the number of times they co-support or co-reject second-variable nodes (if a binary qualifier is used) or the cumulative similarity between $i$ and $i'$ over their assessments of second-variable nodes (in the case of an integer qualifier variable).

In the integer case:

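$$
y^{\text{congruence}}_{ii'} = \Phi_{ii'}\left( \sum_{j=1}^{n} \sum_{k} \sum_{k'} x_{ijk} \, x_{i'jk'} \left( 1 - \frac{|k - k'|}{|K| - 1} \right) \right) \tag{2.1}
$$

where $\Phi_{ii'}(\cdot)$ denotes a normalization function (to be specified below).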
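To make Eq. (2.1) concrete, here is a minimal R sketch of the unnormalized congruence weight, i.e., with $\Phi_{ii'}$ set to the identity. The function `congruence_raw()` and the toy data are hypothetical illustrations, not rDNA's implementation. The sketch assumes that `K` lists the levels in the same order as the third dimension of `X` and forms a consecutive integer scale, so that $|K| - 1$ is the largest possible distance between two levels:

```r
# Hypothetical helper (not rDNA's implementation): unnormalized congruence
# between first-variable nodes i and i2 following Eq. (2.1).
# X: count array with dim c(m, n, length(K)); K: qualifier levels in the
# same order as the third dimension of X.
congruence_raw <- function(X, K, i, i2) {
  # S[a, b] = 1 - |k - k'| / (|K| - 1): similarity of levels K[a] and K[b]
  S <- 1 - abs(outer(K, K, "-")) / (length(K) - 1)
  per_concept <- vapply(seq_len(dim(X)[2]), function(j) {
    # outer() forms all products x_ijk * x_i'jk' for this concept j
    sum(outer(X[i, j, ], X[i2, j, ]) * S)
  }, numeric(1))
  sum(per_concept)  # sum over all second-variable nodes j
}

# Toy data: 2 organizations x 2 concepts, integer scale K = {-1, 0, 1}
K <- c(-1, 0, 1)
X <- array(0, dim = c(2, 2, 3))
X[1, 1, 3] <- 2                   # org 1 agrees (k = 1) with concept 1 twice
X[2, 1, 2] <- 1; X[2, 1, 3] <- 1  # org 2: neutral once, agrees once
X[1, 2, 1] <- 1; X[2, 2, 1] <- 1  # both reject (k = -1) concept 2 once
congruence_raw(X, K, 1, 2)
## [1] 4
```

Concept 1 contributes $2 \cdot 1 \cdot 0.5 + 2 \cdot 1 \cdot 1 = 3$ (a neutral statement is half as similar to an agreement as another agreement is), and concept 2 contributes $1 \cdot 1 \cdot 1 = 1$.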


In the binary case, i.e., $|K| = 2$, this reduces to

$$
y^{\text{congruence binary}}_{ii'} = \Phi_{ii'}\left( \sum_{j=1}^{n} \sum_{k} x_{ijk} \, x_{i'jk} + (1 - x_{ijk})(1 - x_{i'jk}) \right). \tag{2.2}
$$

2.2 Conflict

Binary case:

$$
y^{\text{conflict binary}}_{ii'} = \Phi_{ii'}\left( \sum_{j=1}^{n} \sum_{k} (1 - x_{ijk}) \, x_{i'jk} + x_{ijk} \, (1 - x_{i'jk}) \right). \tag{2.3}
$$

More generally, in the integer case:

$$
y^{\text{conflict}}_{ii'} = \Phi_{ii'}\left( \sum_{j=1}^{n} \sum_{k} \sum_{k'} x_{ijk} \, x_{i'jk'} \, \frac{|k - k'|}{|K| - 1} \right) \tag{2.4}
$$

2.3 Subtract

$$
y^{\text{subtract}}_{ii'} = y^{\text{congruence}}_{ii'} - y^{\text{conflict}}_{ii'} \tag{2.5}
$$
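A minimal R sketch of Eqs. (2.4) and (2.5) with identity normalization. The helper `pair_sum()` is hypothetical: it evaluates the triple sum over $j$, $k$ and $k'$ with a caller-supplied weight matrix, so the conflict weights $|k - k'| / (|K| - 1)$ and the congruence weights $1 - |k - k'| / (|K| - 1)$ can share one implementation. The toy data are illustrative only:

```r
# Hypothetical helper: sum over j, k and k' of x_ijk * x_i'jk' * W[k, k'],
# where the matrix W assigns a weight to every pair of qualifier levels.
pair_sum <- function(X, i, i2, W) {
  sum(vapply(seq_len(dim(X)[2]),
             function(j) sum(outer(X[i, j, ], X[i2, j, ]) * W),
             numeric(1)))
}

# Toy data: 2 organizations x 2 concepts, integer scale K = {-1, 0, 1}
K <- c(-1, 0, 1)
X <- array(0, dim = c(2, 2, 3))
X[1, 1, 3] <- 2; X[2, 1, 2] <- 1; X[2, 1, 3] <- 1
X[1, 2, 1] <- 1; X[2, 2, 1] <- 1

D <- abs(outer(K, K, "-")) / (length(K) - 1)  # |k - k'| / (|K| - 1)
conflict   <- pair_sum(X, 1, 2, D)      # Eq. (2.4), identity normalization
congruence <- pair_sum(X, 1, 2, 1 - D)  # Eq. (2.1), identity normalization
subtract   <- congruence - conflict     # Eq. (2.5)
```

For the toy data this yields a congruence of 4, a conflict of 1 and a subtract value of 3. With a normalization function other than the identity, each of the two terms in Eq. (2.5) would be normalized before the subtraction.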

2.4 Ignore

$$
y^{\text{ignore}}_{ii'} = \Phi_{ii'}\left( \sum_{j=1}^{n} \left( \sum_{k} x_{ijk} \right) \left( \sum_{k} x_{i'jk} \right) \right) \tag{2.6}
$$

2.5 Normalization

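In the simplest case, normalization can be switched off, in which case $\Phi^{\text{no}}_{ii'}(\omega) = \omega$.

Alternatively, edge weights can be divided by the average activity of nodes $i$ and $i'$:

$$
\Phi^{\text{avg}}_{ii'}(\omega) = \frac{\omega}{\frac{1}{2} \left( \sum_{j=1}^{n} \sum_{k} x_{ijk} + \sum_{j=1}^{n} \sum_{k} x_{i'jk} \right)}. \tag{2.7}
$$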

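With Jaccard normalization, we do not just count $i$'s and $i'$'s activity and sum them up independently; instead, we add up both their independent activities and their joint activity, i.e., both non-matches and matches (the bracket $[\cdot]$ equals 1 if the condition inside it holds and 0 otherwise):

$$
\Phi^{\text{Jaccard}}_{ii'}(\omega) = \frac{\omega}{\sum_{j=1}^{n} \sum_{k} x_{ijk} \, [x_{i'jk} = 0] + \sum_{j=1}^{n} \sum_{k} x_{i'jk} \, [x_{ijk} = 0] + \sum_{j=1}^{n} \sum_{k} x_{ijk} \, x_{i'jk}}. \tag{2.8}
$$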


With cosine normalization, we take the product in the denominator:

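$$
\Phi^{\text{cosine}}_{ii'}(\omega) = \frac{\omega}{\sqrt{\left( \sum_{j=1}^{n} \sum_{k} x_{ijk} \right)^2} \, \sqrt{\left( \sum_{j=1}^{n} \sum_{k} x_{i'jk} \right)^2}}. \tag{2.9}
$$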

2.6 Affiliation networks

Ignoring the qualifier variable:

$$
y^{\text{affiliation ignore}}_{ij} = \Phi_{ij}\left( \sum_{k} x_{ijk} \right) \tag{2.10}
$$

Subtracting negative from positive ties (integer case):

$$
y^{\text{affiliation subtract}}_{ij} = \Phi_{ij}\left( \sum_{k} k \cdot x_{ijk} \right) \tag{2.11}
$$

Subtracting negative from positive ties (binary case):

$$
y^{\text{affiliation subtract binary}}_{ij} = \Phi_{ij}\left( \sum_{k} \left( k \cdot x_{ijk} - (1 - k) \cdot x_{ijk} \right) \right) \tag{2.12}
$$

Note that here, the binary case is not merely a special case of the integer (weighted) affiliation network.
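A minimal R sketch of Eqs. (2.10) to (2.12) with identity normalization, using hypothetical toy data with a binary qualifier $K = \{0, 1\}$; `apply()` over the first two dimensions returns one value per $(i, j)$ cell. Comparing the last two results illustrates the note above: with $K = \{0, 1\}$, Eq. (2.11) only counts agreements, whereas Eq. (2.12) subtracts disagreements from them:

```r
K <- c(0, 1)                     # binary qualifier: 0 = disagree, 1 = agree
X <- array(0, dim = c(2, 2, 2))  # 2 organizations x 2 concepts x 2 levels
X[1, 1, 2] <- 3  # organization 1 agrees with concept 1 three times
X[1, 1, 1] <- 1  # ... and disagrees with it once
X[2, 1, 2] <- 1  # organization 2 agrees with concept 1 once
X[2, 2, 1] <- 2  # organization 2 disagrees with concept 2 twice

# Eq. (2.10): ignore the qualifier, i.e. sum the counts over k
aff_ignore <- apply(X, c(1, 2), sum)
# Eq. (2.11): weight each count by its level k
aff_subtract <- apply(X, c(1, 2), function(x) sum(K * x))
# Eq. (2.12): agreements (k = 1) minus disagreements (k = 0)
aff_subtract_binary <- apply(X, c(1, 2), function(x) sum(K * x - (1 - K) * x))

aff_subtract_binary
##      [,1] [,2]
## [1,]    2    0
## [2,]    1   -2
```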

2.7 Normalization for affiliation networks

With activity normalization, ties from active nodes receive lower weights:

$$
\Phi^{\text{activity}}_{ij}(\omega) = \frac{\omega}{\sum_{j=1}^{n} \sum_{k} x_{ijk}} \tag{2.13}
$$

With prominence normalization, ties to prominent nodes receive lower weights:

$$
\Phi^{\text{prominence}}_{ij}(\omega) = \frac{\omega}{\sum_{i=1}^{m} \sum_{k} x_{ijk}} \tag{2.14}
$$


Chapter 3

Installation of DNA and rDNA

Johannes Gruber

This section explains how DNA and rDNA can be installed on common desktop operating systems. As DNA is written in Java, both DNA and rDNA rely on Java to work properly on your computer. This section turned out to be longer than we initially expected, as we noticed that installing Java on different operating systems can be a bit challenging; luckily for you, we figured it out and describe all the steps. Installing and configuring a valid Java Runtime Environment on your machine is thus the first and only complicated step of the installation. However, if you follow the simple steps below, you should not run into problems while setting up Java. The advantage of the Java programming language for academic software is that it runs on different operating systems without altering the source code (once the Runtime Environment is set up) and that it is, for the most part, open source. Apart from setting up the Java Runtime Environment, the installation of DNA and rDNA is identical on all operating systems. If you are confident that Java is already correctly set up on your computer, you can therefore skip to Section 3.4. Otherwise, please continue to the section for the operating system you wish to install DNA and rDNA on: Windows, macOS or Linux.

For more experienced users, here is a short version of the steps described below:

1. (On Mac: install Apple’s legacy version of Java, even though we will never use it.)

2. Install the Java Runtime Environment (JRE), version 8, on your computer.

3. (On Windows and Mac: set “JAVA_HOME” to the installation path of your JRE.)

4. Download the newest executable JAR from github.com/leifeld/dna/releases.

5. (On Linux: make the JAR file executable.) (On Mac: allow executing apps from an unidentified developer.)

6. You can now run the standalone DNA or continue to install rDNA as well.

7. Download and install R (and RStudio).

8. In R: install the necessary R packages rJava and devtools.

9. In R: install rDNA via

devtools::install_github("leifeld/dna/rDNA", args = "--no-multiarch")

Figure 3.1: Downloading JRE from Oracle

3.1 Windows

To install the necessary Java Runtime Environment on your Windows computer, go to java.com/en/download/manual.jsp, scroll down, and download “Windows Offline (64-bit)” (see Figure 3.1; download “Windows Offline” instead if you are using a 32-bit version of Windows). During the installation, you can accept all the default options, including the installation path.

Next, you should set “JAVA_HOME” in your environment variables to tell your Windows PC where your Java installation lives. This step is optional, but it can prevent many of the issues people have had with Java in the past. To set “JAVA_HOME”, you need to navigate to the menu “Edit the system environment variables”. The easiest way to get there is to press the Windows key on your keyboard and enter “environment”. Windows will then search for programs and settings menus that include this title and should usually display the menu we are looking for at the top.¹ In this menu, find the button “Environment Variables...”. Clicking this button should open the window shown in Figure 3.2.

Figure 3.2: Edit JAVA_HOME to tell Windows where your Java lives

Under User Variables, click New.² Enter the variable name “JAVA_HOME” and the path to your Java installation in the field “Variable value”. If you have not altered the default install location, you should find Java in "C:\Program Files\Java\jre1.8.0_151" or, if you chose to install a 32-bit version of Java, in "C:\Program Files (x86)\Java\jre1.8.0_151" (a 32-bit Java will, however, cause problems if you try to use it with a 64-bit version of R). The version number in the folder name depends on the Java update you installed.³

Windows should now recognise Java and be able to run Java commands. To test this, open the Command Prompt (press the Windows key, enter “cmd” and hit “Enter”) and type a Java command, e.g., “java -version”. If the installation was successful, the output should display information about the Java version and build, as depicted in Figure 3.3.

After installing Java, you are ready to use DNA and can skip to Section 3.4 if you are not interested in installing rDNA as well. If you do want to use rDNA, the rest of this section explains how to install R and RStudio, a recommended integrated development environment (IDE) that makes working with R a lot easier and also looks a lot better than R’s default interface.

¹ On older versions of Windows, this might not work. On Windows 7, you can alternatively right-click on “My Computer” and select “Properties → Advanced”. On Windows 8, go to “Control Panel → System → Advanced System Settings”.

² This sets “JAVA_HOME” just for the current user. If you want to make Java available for all users on the computer you are working on, you can create a System Variable instead.

³ Note that you have to repeat this procedure whenever the installation path of Java changes, for example, whenever Java is updated.


Figure 3.3: Testing Java installation in Windows Command Prompt

Install R on Windows

1. First, you need to download R from cran.r-project.org/bin/windows/base/.

2. At the top of the page, click on Download R 3.4.3 for Windows (or a newer version if available).

3. Install the downloaded file, e.g., “R-3.4.3-win.exe”. Usually, it is fine to leave all default settings in the installation options.

4. Go to rstudio.com/products/rstudio/download/.

5. At the bottom of the page, under “Installers for Supported Platforms”, click on the link RStudio 1.1.383 - Windows Vista/7/8/10 (or a newer version if available). Again, the default installation options are fine in most cases and can be accepted unchanged.

6. After installation, you can use R by opening RStudio.

Traditionally, the first test you perform in a new programming language is to write a “Hello, World!” program. To do this in R, simply type print("Hello World!") in the “Console” (the window that covers the left half of RStudio). Alternatively, you can make R perform a simple mathematical operation. If everything is set up correctly, the output should look like this:

print("Hello World!")

## [1] "Hello World!"

# You can also use R as a calculator
2 * 3

## [1] 6

The chunk of code above marks the first time we are using R commands in this manual. It is worth explaining what this means for users who are not familiar with documents that contain R code. Whenever code is shown in this manual, it is decorated with a light grey background. Comments in R code (i.e., text explaining to the user what is happening in a specific line) are marked with a # and formatted in italic font and in dark grey. The output generated by running a command is marked by two # and formatted in black. This means that every line that does not start with ## contains R code (or a comment) which you can copy and paste into the Console in RStudio and run. Alternatively, you can copy the code into an R script and execute it either by clicking the “Run” button in the upper right corner of the script editor in RStudio or by using the shortcut “Ctrl+Enter”. Either way, the highlighted code, or the line in which the cursor is currently flashing, is sent to the console and executed. If this works, you can continue to Section 3.4, which describes installing the programs themselves.

3.2 macOS

On macOS, you have to install two versions of Java for rDNA to work properly. The reasons behind this are too complicated to cover here, but basically, Apple built its own version of Java, which needs to be on your machine even though it is outdated. Therefore, we first need to install the legacy Java 6 (which we will never use) before installing the correct Java Runtime Environment, version 8.⁴

First, please download the file support.apple.com/downloads/DL1572/en_US/javaforosx.dmg and install it, accepting all defaults. After this has finished, we can proceed to get the new version of the Java Runtime Environment. Go to java.com/en/download/manual.jsp and scroll down to download “Mac OS X (10.7.3 version and above)” (see Figure 3.4). Again, install the program accepting all defaults.

After installing Java, you are ready to use DNA and can skip to Section 3.4 if you are not interested in installing rDNA as well. If you do want to use rDNA, the rest of this section explains how to install R and RStudio, a recommended integrated development environment (IDE) that makes working with R a lot easier and also looks a lot better than R’s default interface.

Install R on Mac

1. First, you need to download R from cran.r-project.org/bin/macosx/.

2. On the top of the page click on R-3.4.3.pkg (or a newer version if available).

3. Install the downloaded file. Usually, it is fine to leave all default settings in the installation options.

4. Go to rstudio.com/products/rstudio/download/.

5. At the bottom of the page, under “Installers for Supported Platforms”, click on the link RStudio 1.1.383 - Mac OS X 10.6+ (64-bit) (or a newer version if available). Again, the default installation options are fine in most cases and can be accepted unchanged.

⁴ If you never intend to use rDNA or any other R package that relies on Java, you might not need both versions and can just download the newest Java Runtime Environment. However, installing Java version 8 before the legacy Java will cause problems if you ever change your mind.

Figure 3.4: Downloading JRE from Oracle

6. Then install the program “Xcode” from the App Store.

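7. After installation, you can use R by opening RStudio.

To test the installation, type the following into the RStudio Console. If everything is set up correctly, the output should look like this:

print("Hello World!")

## [1] "Hello World!"

# You can also use R as a calculator
2 * 3

## [1] 6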

The chunk of code above marks the first time we are using R commands in this manual. It is worth explaining what this means for users who are not familiar with documents that contain R code. Whenever code is shown in this manual, it is decorated with a light grey background. Comments in R code (i.e., text explaining to the user what is happening in a specific line) are marked with a # and formatted in italic font and in dark grey. The output generated by running a command is marked by two # and formatted in black. This means that every line that does not start with either # or ## contains R code which you can copy and paste into the Console in RStudio and run. Alternatively, you can copy the code into an R script and execute it either by clicking the “Run” button in the upper right corner of the script editor in RStudio or by using the shortcut “Cmd+Enter” (“Ctrl+Enter” on Windows and Linux). Either way, the highlighted code, or the line in which the cursor is currently flashing, is sent to the console and executed.

Unfortunately, working with Java from within R on a Mac is a bit messy. Apple’s own version of Java, although it has to be installed, does not work in combination with R. That is why we have to tell your system which version of Java to use by default. To do this, we have to enter a few system commands, which you can run either in the Terminal app or directly from within R using the system function:

# list the Java virtual machines installed on this Mac
system("/usr/libexec/java_home -V")

## Matching Java Virtual Machines (3):
##     1.8.0_60, x86_64: "Java SE 8" /Library/Java/JavaVirtualMachines/jdk1.8.0...
##     1.6.0_65-b14-468, x86_64: "Java SE 6" /Library/Java/JavaVirtualMachines/...
##     1.6.0_65-b14-468, i386: "Java SE 6" /Library/Java/JavaVirtualMachines/1....

# see the default version of Java
system("java -version")

## java version "1.8.0_60"
## Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
## Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)

If your output looks like the above, you are ready to install rJava. If the first command does not show 1.8.0_60, x86_64 (or any other version starting with 1.8.), you need to install Java version 8 again (see above) and possibly reboot your computer. If the second command shows java version "1.6.0_65", but version 1.8 is listed in the output of the first command, you can set the default by executing the following command:

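# Set JAVA_HOME for this R session (system("export ...") would only
# affect the subshell started by system() and have no lasting effect)
Sys.setenv(JAVA_HOME = system("/usr/libexec/java_home -v 1.8", intern = TRUE))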
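To double-check that JAVA_HOME points to a usable Java installation, a small helper can look for the java executable in its bin folder. This is a hypothetical convenience function, not part of rJava or rDNA; it assumes the common layout in which the java binary lives in the bin subfolder of the Java home directory, and it only checks that the file exists, not which Java version it is:

```r
# Hypothetical helper (not part of rJava or rDNA): TRUE if `path` contains
# the java executable in its bin folder, as a Java home directory usually does.
looks_like_java_home <- function(path = Sys.getenv("JAVA_HOME")) {
  exe <- if (.Platform$OS.type == "windows") "java.exe" else "java"
  nzchar(path) && file.exists(file.path(path, "bin", exe))
}

looks_like_java_home()  # checks the current value of JAVA_HOME
```

If this returns FALSE, compare the value of Sys.getenv("JAVA_HOME") with the paths listed by /usr/libexec/java_home -V above.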

After that, you should be able to continue to Section 3.4, which describes installing the programs themselves.

3.3 Linux

Since you are using Linux, we assume that you are sufficiently comfortable with using the command line. Therefore, we only provide the necessary steps for installing Java as commands.

First check if Java might already be installed:

$ java -version

If not, install it, e.g., via APT:

$ sudo apt-get install default-jre

Optional: You can also install the Java Development Kit at this point, which is sometimes recommended for working with R and Java.

$ sudo apt-get install default-jdk

After installing Java, you are ready to use DNA and can skip to Section 3.4 if you are not interested in installing rDNA as well. If you do want to use rDNA, the rest of this section explains how to install R and RStudio, a recommended integrated development environment (IDE) that makes working with R a lot easier and also looks a lot better than R’s default GUI.

Install R on Linux

1. Since the version of R in the default repositories tends to be fairly outdated, we add the repository of the Comprehensive R Archive Network (CRAN) to our sources.list:

$ sudo add-apt-repository "deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu artful/"

Note that you need to replace /ubuntu artful/ with your flavour and version of Linux. Visit CRAN (https://cran.rstudio.com/bin/linux/) to see which ones are available. cran.rstudio.com is also just one of several CRAN mirrors (https://cran.r-project.org/mirrors.html), so you could replace it with a different one if you prefer.

2. Next, you need to add the signing key of the CRAN repository to your keyring. In Ubuntu, you can do this as follows:

$ gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
$ gpg -a --export E084DAB9 | sudo apt-key add -

Or:

$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9

3. Update apt and install R (and r-base-dev if you wish to compile packages from source):


$ sudo apt-get update
$ sudo apt-get install r-base r-base-dev

4. Now install RStudio via gdebi (and install gdebi first if you do not already have it):⁵

$ sudo apt-get install gdebi-core
$ wget https://download1.rstudio.org/rstudio-1.1.419-amd64.deb
$ sudo gdebi -n rstudio-1.1.419-amd64.deb
$ rm rstudio-1.1.419-amd64.deb

5. (Up until version 1.1.423, RStudio depends on an outdated version of libgstreamer. You thus either need to use the new version, which is currently just a preview, or install the old version of libgstreamer using the method explained in this blog post: https://mikewilliamson.wordpress.com/2016/11/14/installing-r-studio-on-ubuntu-16-10/.)

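6. After the installation has finished, you can use R by opening RStudio.

To test the installation, type the following into the RStudio Console. If everything is set up correctly, the output should look like this:

print("Hello World!")

## [1] "Hello World!"

# You can also use R as a calculator
2 * 3

## [1] 6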

The chunk of code above marks the first time we are using R commands in this manual. Since it looks similar to the terminal commands we used above, you will probably have no problem reading it. Just in case, here is what you see: whenever code is shown in this manual, it is decorated with a light grey background. Comments in R code (i.e., text explaining to the user what is happening in a specific line) are marked with a # and formatted in italic font and in dark grey. The output generated by running a command is marked by two # and formatted in black. This means that every line that does not start with ## contains R code (or a comment) which you can copy and paste into the Console in RStudio and run. Alternatively, you can copy the code into an R script and execute it either by clicking the “Run” button in the upper right corner of the script editor in RStudio or by using the shortcut “Ctrl+Enter”. Either way, the highlighted code, or the line in which the cursor is currently flashing, is sent to the console and executed.

⁵ Alternatively, you can download an installation file from rstudio.com/products/rstudio/download/.



Now, before we can actually run rDNA, we need to associate Java with R. To do this, you can either go back to the terminal or invoke a system command directly from within R using the system function:

$ sudo R CMD javareconf

Or:

system("sudo R CMD javareconf")

Once this has finished, you are ready to install DNA and the rDNA package themselves.

3.4 Installing the programs themselves

Once Java is set up correctly, you can simply download the latest version of DNA as a JAR file from github.com/leifeld/dna/releases (see Figure 3.5). JAR (.jar) files are technically archive files which usually contain a computer program written in Java, along with all the pictures and libraries necessary to run the program. Once the download is finished, you can start the program by double-clicking on the downloaded file. However, on Linux, it is sometimes necessary to make the file executable first (e.g., via $ chmod +x /path/to/your/dna.jar or using a GUI method). On newer versions of macOS, a program from an "unidentified developer" (i.e., a program that has not been registered with Apple) needs to be made a security exception before you can run it. To do so for DNA, control-click the program’s icon, then choose “Open” from the shortcut menu. If clicking on the file does not open the program on a Windows machine, right-click on the .jar file → “Open with” → “Use another app” and then navigate to the file "C:\Program Files\Java\jre1.8.0_151\bin\javaw.exe".

If you are not interested in using rDNA, you can now skip to the next section.

At this point, we assume that you have installed R and have at least a minimal understanding of how the program works. If that is not the case, you might want to go back to the relevant instructions: Install R on Windows, Install R on Mac or Install R on Linux. If you have already done this, we can go ahead and install rDNA from within R. First, we need to install the package rJava (Urbanek 2016), which is the most important dependency of rDNA:

install.packages("rJava")

To see if this worked, or to troubleshoot potential problems, we can run a couple of Java commands from within R:

library("rJava")
# 1. initialize the JVM
.jinit()
# 2. retrieve the Java version
.jcall("java/lang/System", "S", "getProperty", "java.version")

## [1] "1.8.0_151"

# 3. retrieve the JAVA_HOME location
.jcall("java/lang/System", "S", "getProperty", "java.home")

## [1] "/usr/lib/jvm/java-8-openjdk-amd64/jre"

# 4. retrieve the Java architecture
.jcall("java/lang/System", "S", "getProperty", "sun.arch.data.model")

## [1] "64"

# 5. retrieve the architecture of the OS (this should contain 64 if step 4
# displays "64")
.jcall("java/lang/System", "S", "getProperty", "os.arch")

## [1] "amd64"

# 6. retrieve the architecture of R as well (this should again contain 64 if
# steps 4 and 5 display 64)
R.Version()$arch

## [1] "x86_64"

Figure 3.5: Download DNA jar file from GitHub releases page


If something is not working correctly with rJava, check whether the architectures of Java, your operating system and your version of R match (see comments 4, 5 and 6 in the code chunk above).
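The comparison described above can be automated. The sketch below is our own helper (the name archs_match() is hypothetical and not an rJava or rDNA function). It treats an architecture string as 64-bit if it contains "64", a simplifying assumption that holds for the example values shown above.

```r
# Hypothetical helper: TRUE if Java, the OS and R agree on being 64-bit
# (or all agree on not being 64-bit). Assumes that 64-bit architecture
# strings contain "64", e.g. "amd64" or "x86_64".
archs_match <- function(java_bits, os_arch, r_arch) {
  is64 <- c(
    java = identical(java_bits, "64"),
    os   = grepl("64", os_arch, fixed = TRUE),
    r    = grepl("64", r_arch, fixed = TRUE)
  )
  all(is64) || !any(is64)
}

# Apply it to the values reported by rJava (runs only if rJava is installed).
if (requireNamespace("rJava", quietly = TRUE)) {
  rJava::.jinit()
  prop <- function(p) rJava::.jcall("java/lang/System", "S", "getProperty", p)
  print(archs_match(prop("sun.arch.data.model"), prop("os.arch"),
                    R.Version()$arch))
}
```

If this prints FALSE, the usual fix is to install the Java version whose architecture matches your R installation.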

Once this is done, you should install the package devtools (Wickham and Chang 2016), which permits installing R packages from GitHub.

install.packages("devtools")

Since we only need one function from the package devtools at this point, it is not necessary to load the whole package with the library command. Instead, you can write “devtools::” followed by the function you want to use.6

devtools::install_github("leifeld/dna/rDNA", args = "--no-multiarch")

After this is done as well, the final step of the installation is to test whether rDNA can be loaded into R correctly and to perform a basic operation with it: opening DNA from within R. In order to do so, you first need to download DNA, which can also be done in R with the download.file command (see Section 8 for more information about this code chunk).

# download the DNA jar file necessary to test rDNA
download.file(
  "https://github.com/leifeld/dna/raw/master/manual/dna-2.0-beta20.jar",
  destfile = "dna-2.0-beta20.jar", mode = "wb"
)

# load the library
library("rDNA")

# initialise the file you just downloaded
dna_init("dna-2.0-beta20.jar")

# start up DNA from R with the sample file to see if everything worked
dna_gui(infile = dna_sample())

If these commands can be executed correctly, you are all set to use both DNA and rDNA. How you can do so is described in the rest of this manual.

6The option args = "--no-multiarch" should normally not be necessary, but it prevents errors on some operating systems. Without it, devtools tries to build and test both the 32-bit and the 64-bit version of a package during installation, which fails if only one architecture of Java is available.


Chapter 4

Using DNA: Preparation of your DNA Workspace

Felix Rolf Bossner and Johannes Gruber

After installing the program (see Section 3), you can now create your first DNA database for your own research project. How you set up a DNA database will mainly depend on the needs of your personal research design, which should usually be clear before you start analysing data. Therefore, DNA can be customised during the creation of a new database in accordance with how you are planning to use the tool.

4.1 Creating a new DNA database

In order to create a new DNA database file, click on the index tab “File” (in the upper left corner of your DNA program window) and select the option “New DNA database” (see Figure 4.1). As a result, a new window will open (see Figure 4.2) with a menu that guides you step by step through the configuration of your personal DNA database.

Clicking on the first tab in the sidebar of this menu, “Database” (see Figure 4.2), opens a menu which allows you to choose the file name and storage location of your database. For this first step of your set-up, DNA offers two options for the type of database in which your data is stored. Which of these options best fits your research project depends on the circumstances of your coding process:

The preset option “Local .dna file” means that the dataset is stored in a local file1 on your PC or device. This file, with the file extension .dna, can be moved on your machine, sent via email, uploaded and shared via a cloud file hosting service such as Dropbox, and can generally be treated in the same way as any other file. A local .dna file will be sufficient in most user scenarios, for example, if you employ a single coder working on a single computer,

1Technically an SQLite file.


Figure 4.1: Starting a new Database

Figure 4.2: Choose if database will be stored locally or remotely


if multiple coders work on a single dataset at non-overlapping intervals, or when multiple coders work at the same time on different datasets which you merge after the coding process (see Section 5.2.3). For most users, this simpler option will be adequate, and it is not necessary to be familiar with setting up and managing an SQLite or MySQL database. If you think the scenarios described above cover your intended use of DNA, you can now jump to the next section and start Creating a local DNA file.

However, for more experienced users or research projects in which several coders want to work on the same database at the same time, DNA includes a second option, “Remote database on a server”. This stores your data in a MySQL database, which can be hosted on a private server (such as a network-attached storage, NAS) or on an online cloud server. Hosting it locally on your own machine is possible as well, but would defeat the purpose. You should select this option if you employ a single coder working on multiple devices or multiple coders working on a single dataset at the same time. The preconditions for using this type of storage are that all coders have a stable connection to the database during the coding process (e. g. via the internet) and that you set up the MySQL database in advance. If this is how you want to proceed, you can now jump directly to the section which describes the necessary steps for Creating and using a remote database (MySQL).
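Since a local .dna file is technically an SQLite file (see footnote 1), you can check whether a .dna file you received, for example by email or from a cloud folder, is an intact SQLite database before opening it. This minimal base-R sketch relies on the documented SQLite file header, the 16 bytes "SQLite format 3" followed by a zero byte. The helper name is_sqlite_file() is hypothetical, and the check does not validate DNA’s own tables.

```r
# Hypothetical helper: TRUE if the file starts with the SQLite 3 header
# "SQLite format 3\0" (16 bytes), FALSE otherwise or if it does not exist.
is_sqlite_file <- function(path) {
  if (!file.exists(path)) return(FALSE)
  con <- file(path, open = "rb")
  on.exit(close(con))
  header <- readBin(con, what = "raw", n = 16)
  magic <- c(charToRaw("SQLite format 3"), as.raw(0))
  length(header) == 16 && all(header == magic)
}
```

For example, is_sqlite_file("my-project.dna") should return TRUE for a database created by DNA. To look inside such a file, packages such as DBI and RSQLite can open it; DNA’s table layout is not described here.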

4.1.1 Creating a local DNA file

1. Click on the button “Browse” (see Figure 4.2). A pop-up window similar to the one shown in Figure 4.3 should now open.

2. In this pop-up window, you can choose the storage location of your database on your local device from the “Save in” drop-down menu. Enter the name of your database in the field “File Name” and confirm your choices by pressing the “Save” button (see Figure 4.3). The pop-up window will then close.

3. Next, it is important that you confirm your choices again by pressing the “Apply” button (see Figure 4.4). If you forget to press this button, you cannot create the database in the final step, because the program will report “No database selected” (see Figure 4.14).

If you just employ a single coder and don’t want to change or supplement the preset standard research variables (“person”, “organization”, “concept”, “agreement”) or types of codeable statements (“Statement”, “Annotation”), you can now proceed directly to the final step. If you use this manual as a beginner’s tutorial for working with DNA, however, it would be helpful to follow the steps outlined in Sections 4.2 and 4.3 in order to gain a better understanding of DNA’s potential uses and its functions.



Figure 4.3: Choose location of database window

Figure 4.4: Apply database choice


Figure 4.5: Create MySQL database

4.1.2 Creating and using a remote database (MySQL)

Before you can configure DNA to work with a remote MySQL database, it is necessary to perform at least three basic operations in MySQL (see Figure 4.5).2

1. You have to create a database on your MySQL server (usually with the command CREATE DATABASE DatabaseName;).

2. As you probably don’t want to give all coders access to all other databases stored on your MySQL server, you should create distinct user profile(s) for the coding process of your DNA project. Even though DNA itself allows managing multiple coder roles, we recommend creating a separate MySQL user profile for each individual coder, especially if they edit the content of your database simultaneously. It is also advisable to protect access to your database with passwords, not only for security reasons, but also because DNA sometimes has problems signing in users without a password. Accordingly, you would use the command CREATE USER 'Username'@'%' IDENTIFIED BY 'Password';. Note that in this step you could also restrict the respective user’s access to your database to a specific machine by replacing '%' with a particular host address, if necessary.

3. Finally, you have to give the users the necessary rights to edit your database. In MySQL, simply use GRANT ALL PRIVILEGES ON DatabaseName.* TO 'Username'@'%';. Finer-grained MySQL privileges are not needed, as it makes more sense to specify distinct user roles and rights directly in DNA (see Section 4.2), where the options are tailored to discourse network coding.
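If you set up several projects or coders, it can help to generate the three statements from the steps above in one go. The following R sketch only builds the SQL text; the helper name mysql_setup_sql() and the example names are hypothetical. It quotes the database name with backticks, as MySQL allows for identifiers, and it does not escape quotes inside the password, so avoid single quotes there.

```r
# Hypothetical helper: returns the three MySQL statements described above
# as a character vector. 'host' defaults to '%' (any machine); replace it
# with a specific address to restrict access. Quotes inside the arguments
# are NOT escaped.
mysql_setup_sql <- function(db, user, password, host = "%") {
  c(
    sprintf("CREATE DATABASE `%s`;", db),
    sprintf("CREATE USER '%s'@'%s' IDENTIFIED BY '%s';", user, host, password),
    sprintf("GRANT ALL PRIVILEGES ON `%s`.* TO '%s'@'%s';", db, user, host)
  )
}

# Print the statements so they can be pasted into a MySQL client.
cat(mysql_setup_sql("dna_project", "coder1", "s3cret"), sep = "\n")
```

You would run the printed statements as a MySQL user with administrative rights, for example in the mysql command-line client. They could also be sent from R with DBI::dbExecute() over an administrative connection.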

Once the MySQL database is set up, you only have to select the option “Remote database on a server” in the first tab of the sidebar menu “Database” in DNA (see Creating a new

2For a detailed introduction to database management with MySQL see dev.mysql.com/doc/mysql-getting-started.




Figure 4.6: Connecting to local MySQL database

DNA database) and enter the username and password created in the previous steps in the fields “User” and “Password”, as well as the server address of the database you want to connect to in the field “mysql://”. If you want to access the database remotely from another device, you have to indicate the URL or IP address of your host server, the port (3306 by default, but it can be configured manually) and the name of your database in the format “Hostserveraddress:Port/Databasename”. If you use DNA on the device hosting the database, you can instead use the configuration shown in Figure 4.6 (“localhost/Databasename”). By clicking the button “Check”, you can test whether DNA is able to connect to your database. If this is successful, you will receive the message “Ok. Tables will be created” (see Figure 4.6); if not, DNA will report “Error: Connection could not be established”. In the latter case, you should check your server address, username and password and, if necessary, repeat the steps outlined above (for details on connecting to a MySQL server, see dev.mysql.com/doc/refman/5.5/en/connecting.html). Note that, for security reasons, MySQL in most cases does not allow remote access with the “root” superuser profile. As with a local .dna file, it is finally important that you confirm your choices by pressing the “Apply” button (see Figure 4.6). If you forget to press this button, you cannot create the database in the final step, because the program will report “No database selected” (see Figure 4.14).
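To avoid typos in the “mysql://” field, the address can be assembled from its parts. This is a minimal sketch with a hypothetical helper name; it only follows the format described above and does not contact any server.

```r
# Hypothetical helper: builds the text for DNA's "mysql://" field in the
# format "Hostserveraddress:Port/Databasename". 3306 is MySQL's default port;
# for a database on the same machine with the default port, the short form
# "localhost/Databasename" (as in Figure 4.6) is returned instead.
dna_mysql_address <- function(host, db, port = 3306) {
  if (host == "localhost" && port == 3306) {
    return(paste0(host, "/", db))
  }
  paste0(host, ":", port, "/", db)
}

dna_mysql_address("db.example.org", "dna_project")
dna_mysql_address("localhost", "dna_project")
```

If the “Check” button keeps failing, the same credentials can be tested outside DNA, for example with DBI::dbConnect() and the RMariaDB driver, to find out whether the problem lies with the server or with DNA.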

4.2 User Management: Multiple Coders and Permissions

This second step of preparing your DNA workspace allows you to create multiple user identities with different sets of rights for different coders. Thus, you can specify for each coder which parts of the dataset she/he can see or edit, and thereby pre-structure your coding and research process. In order to do so, click on the second tab “Coder” in the sidebar of the “Create new database” menu (see Figure 4.7).

In the main window (see Figure 4.7) you can now see a list with all coders and how many of



Figure 4.7: Adding a second coder to the database

Figure 4.8: Configuring coder permissions

the 12 possible actions they are permitted to perform. Now you can either add a new user profile by clicking the “Add” button (see Figure 4.7) or adjust the user rights of an existing coder by selecting the coder and clicking the “Edit” button (see Figure 4.10). Both options will open the pop-up menu shown in Figure 4.8.

This pop-up menu allows you to configure an individual profile for each coder in three simple steps:

1. You can choose the colour for the coder (see Figure 4.8, step 1). It is recommended to choose clearly distinguishable colours for each coder, because this lets you see at first glance which user coded which statement, as every coded statement is marked in the colour of its respective coder (see middle column of Figure 4.9).

2. You can enter the preferred name of each coder in the field “Name”. If possible with


Figure 4.9: Change coder identity

respect to data protection rules, it is recommended to use the real names of the coders. This makes it easier for them to select their profile (in the upper left of the main program window) the first time they start the program (see Figure 4.9).

3. The final step allows you to configure the permissions of each coder individually by (de)selecting the respective rights with a click (see Figure 4.8, step 3). By default, each new user has all 12 configurable permissions. Which parts of the dataset an individual coder should be able to see or edit depends on your coding process. For orientation, the practical implications of the 12 configurable permissions are listed in Table 4.1. Please keep in mind that every user can see and switch to other user identities, either accidentally or because of non-compliance, as s/he has to select her/his role the first time s/he starts the program and can change her/his role at any time (see above and Figure 4.9).

Finally, approve your choices by clicking the “OK” button (see Figure 4.8, step 4). You can change the settings later, either in the “new database” menu by selecting the respective user and clicking the “Edit” button (see Figure 4.10) or via the coder settings in the main menu.

Table 4.1: User permissions explained

Permission: Practical implication

add documents: The user can add new documents (i. e., raw data) to the database manually (via copy and paste or retyping) ⇒ the user has (also) a research function.

import documents: The user can import new documents from other sources like .txt or other .dna files into the database, or recode the metadata of multiple documents ⇒ the user has (also) a research function.

delete documents: The user can delete documents from the database or dataset. This option requires at least the permission “view others’ documents” if the user has an organizing or editing function (structuring the database for coding by other users), or the permissions “add documents” and “add statements” if the coder determines her/his own codes and organizes her/his own set of data.

edit documents: The user can edit her/his own documents (i. e., raw data), but not necessarily the codings in these documents that were made by other users (which would require the permission “edit others’ statements”) or the documents uploaded by other users (which requires the permission “edit others’ documents”). This option requires at least the permission “add documents” or “import documents” and should be selected if the user determines her/his own codes and organizes her/his own set of data or acts as a researcher for the other coders.

view others’ documents: The user can view the documents uploaded by other users. This option is necessary for a collaborative coding process in which only some of the users select and upload the raw data (i. e., documents) for all other users. The option should not be selected if each coder comes up with her/his own codes and organizes her/his own set of data.

edit others’ documents: The user can edit the documents uploaded by other users. This option requires at least the permission “view others’ documents” and should be selected if a user organizes or edits the raw data provided by other users.

add statements: The coder actually codes the data by creating and editing statements. If only some of the users select and upload the raw data, this option requires the additional permission “view others’ documents”. If the coder suggests her/his own codes and organizes her/his own set of data, this option requires either the additional permission “add documents” or “import documents”.

view others’ statements: The coder can view the statements coded by other users. For example, the coder “DNA User” would not see the yellow statement of the coder “Admin” in Figure 4.9 if this option was deselected for her/his user role. This option should be deselected if you want to establish a blind coding process.

edit others’ statements: The coder can edit or correct the statements coded by other users. This option requires at least the permission “view others’ statements” and should only be selected for a few users with an organizing, controlling or editing function.

add coders: The user can add new coders (see Section 4.2). This option should only be selected for a few users with an organizing function.

edit statement types: The user can change or complement the variables of interest (see Section 4.3). This option should only be selected for very few users or the researchers themselves, because adjusting these variables is usually only necessary when the research design and/or research questions change fundamentally.

edit regex settings: The user can specify keywords which are highlighted in the text, along with a text color. For example, in Figure 4.9 the word “colors” is highlighted in the raw data text (middle column), because it was specified as a keyword in the regex highlighter sidebar in the bottom left of the DNA window. If a user does not have the right to edit the regex settings, the buttons “Add” and “Remove” in this highlighter are hidden, but the keyword is nevertheless highlighted in the text and listed in the regex highlighter sidebar. Thus, if you specify a distinct set of theory-based keywords in advance in order to make the coding procedure semi-automatic, you should not enable this option, or enable it only for a few users, as the respective coder could change the keywords. However, if you don’t have a theoretically relevant set of keywords in advance or just specify them as an aid for your coders, you can allow them to formulate such keywords themselves.
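When planning coder profiles, the unconditional dependencies stated in Table 4.1 can be checked before you configure them in DNA. The sketch below is a hypothetical planning aid (the names permission_requirements and missing_requirements() are ours). It is not how DNA validates permissions, and it leaves out the role-dependent requirements of “delete documents” and “add statements”.

```r
# Unconditional requirements from Table 4.1: each permission on the left
# needs at least one of the permissions listed on the right.
permission_requirements <- list(
  "edit documents"          = c("add documents", "import documents"),
  "edit others' documents"  = "view others' documents",
  "edit others' statements" = "view others' statements"
)

# Returns the granted permissions whose requirements are not met.
missing_requirements <- function(granted) {
  rules <- permission_requirements[names(permission_requirements) %in% granted]
  unmet <- Filter(function(req) !any(req %in% granted), rules)
  as.character(names(unmet))
}

# Example: a coder who may edit others' statements but not view them.
missing_requirements(c("add statements", "edit others' statements"))
```

An empty result means that none of the encoded dependencies is violated for the planned profile.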

4.3 Statement Types and Variables

Clicking on the third tab, “Statement Types”, in the sidebar of the “Create new database” menu (see Figure 4.11) opens a menu which allows you to adjust or supplement either the variables or the types of statements that your coders derive from the raw data.

4.3.1 Adjusting the variables of interest

The statement type “DNA Statement” represents a portion of your raw data in which an actor reveals her/his opinion/belief/etc. about an issue. Thus, the main task of your coder(s) is to identify such text portions and capture the relevant data about the actor and her/his opinion/belief/etc. Your research question or theory should dictate not only what kind of information should be coded as statements, but also which variables of this information the coders should capture. As you can see in the “Statement Types” menu, DNA’s default configuration captures four variables. Selecting “DNA Statement” and clicking on the button “Edit” (see Figure 4.11) opens a pop-up window (see Figure 4.12), which reveals the nature of these four preconfigured variables along whose lines the coders can collect information:

• the person who makes the statement.


Figure 4.10: Edit coder details

Figure 4.11: Edit Statement Types


Figure 4.12: Edit Statement Type details

• the organization the speaker is affiliated with.

• the concept (opinion/belief/etc.) which is raised by the actor.

• a dummy variable indicating whether the actor agrees with the concept or not.

Furthermore, the pop-up window depicted in Figure 4.12 shows that each variable is assigned a specific data type: while “person”, “organization” and “concept” are nominal variables and are therefore coded as short text, “agreement” is a dichotomous variable and is therefore coded as a boolean data type, which only allows two values (agreement or non-agreement). Neither the data type nor the name of a variable can be changed directly. However, by selecting a variable and clicking on the trash symbol (to the right of the “Add Variable” button, see Figure 4.12, step 4), you can delete a variable and subsequently replace it with a new one. Creating a new variable, either to replace one of the preconfigured variables or because you are interested in an additional or different set of variables, is possible in five simple steps:

1. You have to select an existing variable in order to activate the variable menu (see Figure 4.12, step 1).

2. Now you can enter the name of the new variable in the text field at the bottom of the pop-up window (see Figure 4.12, step 2). For example, in Figure 4.12 we are interested in collecting the age of the person who makes the statement. Please note that DNA does



not allow spaces in variable names. Putting a space in the variable name will disable the “Add Variable” button needed for step 4.

3. Now you can choose the data type of your variable by clicking on one of the four options. In our example, we choose the option “integer”, as the age of a person is neither a nominal nor a dichotomous variable, but an integer number (see Figure 4.12, step 3).

4. Click on the “Add Variable” button, which has the form of a green plus symbol (see Figure 4.12, step 4). If this button is disabled, you probably did not select an existing variable (step 1) or have a space in your variable name (step 2).

5. Click the “OK” button to confirm your choices (see Figure 4.12, step 5).
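If you plan many new variables, you can pre-check their names against the rule stated in step 2. This is a minimal sketch with a hypothetical helper name. Besides spaces, it also rejects tabs, other whitespace and empty names, which is our own assumption, and DNA may apply further rules that are not described here.

```r
# Hypothetical helper: TRUE for names that contain no whitespace and are
# not empty, mirroring the "no spaces" rule for DNA variable names.
valid_variable_name <- function(name) {
  nzchar(name) & !grepl("[[:space:]]", name)
}

valid_variable_name(c("age", "age of person", "party_id"))
```

Names that fail the check can usually be fixed by replacing spaces with underscores, as in “party_id”.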

Please note that for the statement type “DNA Statement” you should only specify variables in which you have an actual research interest and which accordingly have to be coded for all statements by all coders. If you are interested in additional, optional information about some statements, you can specify it as variables of the other preconfigured statement type, “Annotation”.

4.3.2 Adjusting the statement types

There are very few research scenarios in which it is necessary to complement the two existing statement types with further ones or to adjust the type “DNA Statement”. One of them would be if you study two parallel yet different research questions which employ the same dataset and the same coders at the same time. In this case, you could first rename the statement type “DNA Statement” by selecting it from the statement type menu, clicking the “Edit” button (see Figure 4.11), entering the new name (in this case: “Statement for Research Project 1”) in the text field at the top of the pop-up window (see Figure 4.12) and pressing the “OK” button (see Figure 4.12, step 5). Subsequently, you would open a new pop-up window by clicking on the “Add” button in the statement type menu (left button in Figure 4.11). Then name the new statement type (in this case: “Statement for Research Project 2”) in the text field at the top of the pop-up window and choose a colour (different from the other type) by clicking on the coloured button next to this text field. You then also need to specify the relevant variables following the procedure described in Section 4.3.1. However, please evaluate carefully whether your second research interest really requires a second statement type, or whether it would be possible to either conceptualize it as a variable of the existing statement type or study it sequentially or with a different set of coders (and therefore in a different DNA dataset). More than two statement types (besides “Statement” and “Annotation”) can confuse the coders and therefore compromise the validity of the coding procedure.



Figure 4.13: Summary of your about to be created DNA database

4.4 Final step: Approving your workspace and creating the DNA file

Finally, clicking on the “Summary” tab in the sidebar of the “Create new database” menu gives you a summary of the configuration you chose for your coding process (see Figure 4.13). After checking each of the three settings, you can create your database by clicking on the “Create database” button. If this button is disabled and you get the error “No database selected” (see Figure 4.14), you probably forgot to click the “Apply” button after specifying your database (see Section 4.1.1, step 3). After it has been created, the new database will open in the main DNA window (see Figure 4.1) and you can proceed to loading and organizing the raw data.


Figure 4.14: No database selected (e. g. if choice was not applied)


Chapter 5

Using DNA: Importing and Organizing your Raw Data

Felix Rolf Bossner

This section describes how to upload and organize your research project’s raw data, i. e. the text files (newspaper articles, press releases etc.) containing the uncoded statements, in DNA. First, we lay out how to open an existing database, either locally or from a remote location. Then you will learn how to import new documents into DNA, either one document at a time or by selecting multiple documents for import. Finally, we show how you can organize the documents in your database and how you can change your documents’ metadata.

5.1 Opening an existing DNA database

First of all, you have to choose into which DNA database you want to upload and process your data. To open a DNA database, simply follow the steps depicted in Figure 5.1: First, click on the index tab “File” and select the option “Open DNA database” (see Figure 5.1, step 1). As a result, a pop-up window will appear, which allows you to choose between opening a “Local .dna file” or a “Remote database on a server”. If your database is stored on a remote server, you should choose the second option and repeat the procedure outlined in Creating and using a remote database (MySQL). If your dataset is stored in a folder on your local PC or device, you can proceed with the preset option and click on the button “Browse” (see Figure 5.1, step 2), which will open a further pop-up window. There, you can find your database by choosing its storage location from the “Save in” drop-down menu (see step 3), selecting the respective database (see step 4) and clicking on the button “Open” both in the pop-up and in the “Open existing database...” window (see steps 5 and 6).


Figure 5.1: Open DNA database

5.2 Importing Documents (Raw Data)

There are four different, partly semi-automatic, ways to upload your raw data and related descriptive information (title, date, author, source, section and type of document) into DNA: importing single documents manually via copy and paste, importing multiple documents semi-automatically from text files, importing documents from other DNA databases, and using rDNA to import data which is already available in R (work in progress). All four are explained in detail in this section.

5.2.1 Importing single Documents manually via Copy and Paste

The most basic way to import data into DNA requires you to manually copy and paste the content and the descriptive information of each of your documents into the text fields of a pop-up window, which you open by clicking on the index tab “Documents” and selecting the option “Add new document” (see Figure 5.2). This window has eight text boxes in which you can enter information from and about your source data (see Figure 5.2):

• The field “title” is mandatory and may include any kind of information, for instance aunique ID if you plan to collect additional information about the articles in a separatedatabase. Duplicate article titles are not allowed.

• The field “date” is also mandatory and preset to the current date and time. You can change it either by clicking on the year, month, day or time and adjusting the respective


Figure 5.2: Open DNA-database

value via the arrows on the right, or by manually entering the date in the format “YYYY-MM-DD hh:mm:ss”. Please make sure you enter the date correctly, because otherwise the algorithms for longitudinal data will not work properly.

• The fields “author”, “source”, “section” and “type” are optional, but this additional information can help you to organize your data efficiently and ensure the reproducibility, transparency and future usability of your research project. You can enter this information either manually or select an author, source, section or type you specified for a previously added document from the drop-down menu, which appears when you click on the downward arrow button on the left of the respective field.

• To insert the content of your document, copy your article from a website or any other text source and paste it into the text field (the largest field at the bottom of the pop-up window). Single line breaks are automatically removed, while double line breaks (paragraph breaks) are preserved. Some escape sequences and special characters are automatically removed when text is inserted.

• If you want to add further meta information to your document which does not fit the preset categories, you can use the field “notes”.

Finally, after checking your specifications, you can import the document into DNA by clicking the “Add” button.
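When many dates are entered by hand, a quick check of the format can catch mistakes before they distort longitudinal analyses. The sketch below is our own, with a hypothetical helper name; it only checks the text format “YYYY-MM-DD hh:mm:ss” described above and does not interact with DNA.

```r
# Hypothetical helper: TRUE if a string has the layout "YYYY-MM-DD hh:mm:ss"
# and also parses as a real date and time (e.g. month 13 is rejected).
valid_dna_date <- function(x) {
  layout_ok <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}$", x)
  parsed <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  layout_ok & !is.na(parsed)
}

valid_dna_date(c("2014-03-31 12:00:00", "31.03.2014", "2014-13-01 00:00:00"))
```

The time zone is fixed to UTC here only so that the parsing step does not depend on daylight-saving rules of the local machine.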


Figure 5.3: Downloading files from the LexisNexis newspaper archive

5.2.2 Importing multiple Documents semi-automatically from text files

If you want to analyze a greater number of articles, it quickly becomes tedious to copy and paste each document and its meta data manually. This is why DNA also offers a semi-automatic way to upload multiple documents and their relevant meta data (author, date, source, type) at the same time.

Downloading and Preparing your Raw Data. This way of importing raw data into DNA requires that you save all documents as separate “.txt” files (one file for each article) in a common folder. Please note that you have to use the “.txt” format for saving your data, as DNA cannot import “.doc” or “.pdf” files.1 In case you use the newspaper database of LexisNexis, which is available through many university libraries, for finding and retrieving your raw data, please make sure that you download all documents separately (by selecting the individual document before clicking the download button, see Figure 5.3, steps 1-2) and choose the document format “Text” (under “Format Options” in the Download pop-up menu, see Figure 5.3, steps 3-4) before downloading the data (see Figure 5.3, step 5).2

If you want to use the preset regex configuration (instead of adjusting it) for automatically detecting and uploading the meta data of your documents, you should use a file name in the format “DD.MM.YYYY - Author - Source - TYPE.txt” with blanks before and after the hyphens, where “DD.MM.YYYY” is the date on which the article was published. While “Author” and “Source” do not require a special format or length (e. g. you can use the first and/or last name of the author), the type of the document must always be written in capital letters. For example, the file name of the article spon.de/aeclD, which

1You can, however, save Word documents as .txt files or use an online converter to transform PDFs into .txt files. Note that in both cases you need to make sure that the .txt file is saved with UTF-8 encoding.

2If you use rDNA, it will soon also be possible to import LexisNexis data into DNA via rDNA and a new R package called LexisNexisTools (github.com/JBGruber/LexisNexisTools).




Figure 5.4: Import text files

is used as an example here, would have the format “31.03.2014 - Ralf Neukirch - SPON International - DIGITALRESOURCE.txt”. Please note that plain text files are sometimes saved as “.TXT” instead of “.txt” files. While this is technically the same, it can cause problems when importing multiple text files. If this is the case, you have to either change the preset regex configuration or correct the “.txt” suffix manually in the file name(s). Otherwise, the automatic detection of your documents’ meta data will not work.
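If your documents’ meta data is already available in R (for example from a spreadsheet), you can generate file names in the preset format, and check existing ones, with a few lines of base R. This is a minimal sketch: the helper names are hypothetical, and the regular expression is our own illustration of the naming scheme, not necessarily identical to DNA’s preset regex configuration.

```r
# Hypothetical helper: builds "DD.MM.YYYY - Author - Source - TYPE.txt"
# from a date ("YYYY-MM-DD" or Date), an author, a source and a type.
dna_file_name <- function(date, author, source, type) {
  sprintf("%s - %s - %s - %s.txt",
          format(as.Date(date), "%d.%m.%Y"), author, source, toupper(type))
}

# Hypothetical helper: splits such a file name back into its parts;
# returns NULL if the name does not follow the format (e.g. ".TXT").
parse_dna_file_name <- function(file) {
  pattern <- "^([0-9]{2}\\.[0-9]{2}\\.[0-9]{4}) - (.+) - (.+) - ([A-Z]+)\\.txt$"
  parts <- regmatches(file, regexec(pattern, file))[[1]]
  if (length(parts) == 0) return(NULL)
  list(date = as.Date(parts[2], format = "%d.%m.%Y"),
       author = parts[3], source = parts[4], type = parts[5])
}

file_name <- dna_file_name("2014-03-31", "Ralf Neukirch", "SPON International",
                           "DigitalResource")
file_name
parse_dna_file_name(file_name)$author
```

Because the pattern requires a lowercase “.txt” suffix, a file ending in “.TXT” is rejected, which mirrors the problem described above.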

Importing your Raw Data into DNA. If you prepared your data adequately, you can retrieve the documents and the relevant additional information in four simple steps (see Figure 5.4):

1. Click on the index tab “Documents” and select the option “Import text files” (see Figure 5.4, step 1). A new window will open, in which you press the button “Select folder” (see step 2). This opens a further pop-up menu. Here, select the folder in which you saved the text files of your raw data from the “Look in” drop-down menu (see step 3) and click the button “Open” (see step 4).

2. Now all documents stored in the respective folder should be listed in the main window of the “Import text files...” pop-up (see Figure 5.5). If this is not the case, please check whether your documents are saved in the right file format (.txt). To check whether DNA is able to automatically identify your documents’ meta data, select one of the documents and click on the “Refresh” button (see Figure 5.5). If you specified the file names correctly, you can now see the respective meta data of the selected document in the fields “Title”, “Author”, “Source”, “Type” and “Date” of the “Preview” section at the bottom right of the “Import text files...” window (see Figure 5.5).

3. If you want to adjust or amend the meta data manually, just select the document, uncheck the “Regex” box of the field you want to edit and enter the new or additional information in the field on the left. Then click on the “Refresh” button again to check whether your changes were accepted.

4. Finally, click on the button “Import files” to import all documents of the respective folder into your DNA database (you do not need to select each document for import).
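If you prepare many files, the checks described above can also be scripted before you open the import dialog. The following base R sketch renames “.TXT” suffixes to “.txt” and lists the files whose names do not follow the recommended scheme. The helper prepare_import_folder and the example files are hypothetical, and DNA does not require this step; it merely reduces surprises in the “Import text files...” window.

```r
# Hypothetical pre-import check for a folder of raw text files.
prepare_import_folder <- function(folder) {
  # 1. Rename "*.TXT" files to "*.txt" so that the suffix is uniform.
  upper <- list.files(folder, pattern = "\\.TXT$")
  if (length(upper) > 0) {
    file.rename(file.path(folder, upper),
                file.path(folder, sub("\\.TXT$", ".txt", upper)))
  }
  # 2. Check all .txt names against "DD.MM.YYYY - Author - Source - TYPE.txt".
  txt <- list.files(folder, pattern = "\\.txt$")
  scheme <- "^\\d{2}\\.\\d{2}\\.\\d{4} - .+? - .+? - [A-Z]+\\.txt$"
  ok <- grepl(scheme, txt, perl = TRUE)
  list(renamed = upper, valid = txt[ok], invalid = txt[!ok])
}

# Example folder with two hypothetical files.
folder <- tempfile("raw_data")
dir.create(folder)
writeLines("Article text ...", file.path(folder,
  "31.03.2014 - Ralf Neukirch - SPON International - DIGITALRESOURCE.TXT"))
writeLines("Another article ...", file.path(folder, "article2.txt"))

res <- prepare_import_folder(folder)
res$invalid  # files that need renaming or an adjusted regex configuration
```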

Adjusting the Regex Configuration for automatic identification of meta data. The previous steps assumed that you use the preset configuration of DNA to detect and import the meta data (Title, Author, Source, Type, Date) of your documents automatically into your database. However, if you are interested in automatically importing additional information about your source data (in the fields “Section” or “Notes”), or if your file names depart from the naming system laid out here (but nevertheless contain all relevant information in a systematic order), DNA allows you to change, adjust or amend the pattern through which the meta data about your documents are derived from the file names. The rules on which this “translation” of file names into meta data is based are formulated in the regular expressions (in short: regex) syntax and can be edited for each kind of information (Title, Author, Source, Section, Type, Notes, Date) in the field “Pattern” at the bottom left of the “Import text files...” window (see Figure 5.5). If you want to amend or adjust these settings, it is recommended to use a regex cheat sheet (see e. g. cheatography.com or this


https://en.wikipedia.org/wiki/Regular_expression


https://www.cheatography.com/davechild/cheat-sheets/regular-expressions/

http://www.txt2re.com/index-perl.php3?s=31.03.2014%20-%20Ralf%20Neukirch%20-%20SPON%20International%20-%20DIGITALRESOURCE.txt&-94&-102&80&77&75&81&-95&79&76&78&82&13&14&5&3



regex “translator”). As further support, Figure 5.6 explains the preset regular expressions of the DNA “Import text files...” option.
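As an illustration of an adjusted configuration, assume a hypothetical naming scheme “YYYY-MM-DD_Source_Section_Author.txt”. The base R sketch below defines one pattern per field, mirroring the idea of one “Pattern” per kind of information, and takes the first match of each pattern as the field value. DNA runs on Java's regex engine, whereas the sketch uses R's PCRE engine (perl = TRUE); the constructs used here (lookahead, fixed-length lookbehind, character classes) behave the same in both, but the patterns you actually enter should still be tested with the “Refresh” button in the “Import text files...” window.

```r
# Hypothetical naming scheme: "YYYY-MM-DD_Source_Section_Author.txt".
# One pattern per meta data field; the first match is the field value.
patterns <- c(
  date    = "^\\d{4}-\\d{2}-\\d{2}",         # leading ISO date
  source  = "(?<=_)[^_]+",                   # first segment after "_"
  section = "(?<=_)[^_]+(?=_[^_]+\\.txt$)",  # second-to-last segment
  author  = "(?<=_)[^_]+(?=\\.txt$)"         # last segment before suffix
)

extract_fields <- function(filename, patterns) {
  vapply(patterns, function(p) {
    m <- regmatches(filename, regexpr(p, filename, perl = TRUE))
    if (length(m) == 0) NA_character_ else m  # NA if the pattern fails
  }, character(1))
}

extract_fields("2014-03-31_SPON_Politics_Ralf Neukirch.txt", patterns)
```

Fields whose pattern does not match come back as NA, which makes it easy to spot files that break the scheme.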

5.2.3 Importing Documents from other DNA databases

You can also import documents from other DNA databases. This function is particularly relevant in two scenarios. First, if you want to use not only the raw data but also the coded statements of an already finished research project, this function allows you to import both. Second, it is useful if more than one person works on the same project at the same time and you did not use multiple user roles (see Section 4.2) to let your coders work on the same remote database. In this scenario, you should use this function to prepare your datasets or merge the codings, as merging the files manually later on is usually difficult. The function also helps you avoid trouble with duplicate statement IDs and article names, because DNA takes care of duplicates automatically.

Make sure that you know which version of DNA (DNA 2.0 or older) was used to create and




Figure 5.7: Import a DNA 2.0-database

edit the database from which you want to import data before using the “Import from DNA” function. If you use this manual as a beginner’s tutorial for working with DNA, please download the file “sample.dna” from the DNA releases page on GitHub (github.com/leifeld/dna/releases). This file contains a small selection of documents and statements from a larger project about congressional hearings on climate change, described in Fisher et al. (2013a,b).

To import documents (and the coded statements they contain), click on the index tab “Documents” and select the option “Import from DNA 2.0 file” if DNA 2.0 was used to create and edit the database. As the internal structure of .dna files has changed significantly since version 1.31, databases created with an older version of DNA need to be imported using the separate option “Import from DNA 1.31 file” (see Figure 5.7, step 1). Either option opens a further pop-up menu (see Figure 5.7). In this window, select the folder in which you saved the .dna file from the “Look in” drop-down menu (see step 2) and select the respective .dna file (see step 3). Then click the button “Open” (see step 4) to open the menu depicted in Figure 5.8.

In this menu, you can select which documents (and thus which coded statements) from the original DNA database you want to import into your database, either by manually checking or unchecking the boxes to the left of the document titles or by using the “Keyword filter” function. This function is particularly helpful if you only want to import a few documents with a specific common characteristic (author, topic) from a very large dataset. Clicking on the button “Keyword filter...” (see left button in Figure 5.8) opens a new pop-up window in which you can enter a specific search term. For example, if you downloaded and opened the “sample.dna” file, you can select all congressional hearings of NGO representatives by entering the keyword “NGO” in the text field and pressing the button “OK” in the “Keyword filter”



https://github.com/leifeld/dna/raw/master/manual/sample.dna

Figure 5.8: Import Statements menu


pop-up window (see Figure 5.8). Now only the boxes of the three documents containing the hearings of NGO representatives Kateri Callahan, David Hamilton and Navin Nayak should be checked, while the other boxes are unchecked. The “Keyword filter...” function is based on the same regex syntax described in Adjusting the Regex Configuration for automatic identification of meta data. This means you can also use more specific regular expressions (see Figure 5.6 or a regex cheat sheet) to select certain articles. For example, if you enter “^N” in the “Keyword filter”, DNA will select all articles starting with a capital N. If you want to undo your selections, you can select or unselect all articles at once by pressing the button “(Un)select all” in the middle of the “Import statements” window (see Figure 5.8). Pressing the right button “Import selected” in the same window imports all documents with a checked box (and the respective coded statements) into your DNA database (see Figure 5.8). If you use this manual as a beginner’s tutorial for working with DNA, you should try importing all documents and the respective statements from the file “sample.dna” into your database.
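The difference between a plain keyword and a regular expression can be tried out in R before using the “Keyword filter”. The sketch below assumes that the filter is applied to document titles; the first title is taken from the sample data used in this manual, while the other two are hypothetical.

```r
titles <- c("109-1: Callahan, Kateri-NGO-Y",   # from sample.dna
            "109-2: Doe, Jane-BUS-N",           # hypothetical
            "National energy hearing-GOV-Y")    # hypothetical

# Plain keyword: every title that contains "NGO" anywhere.
ngo <- titles[grepl("NGO", titles)]

# Regular expression: "^" anchors the match at the start of the title.
starts_with_n <- titles[grepl("^N", titles)]

ngo
starts_with_n
```

Note that the keyword “N” without the anchor would select all three titles, because each contains a capital N somewhere.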

5.3 Organizing documents (Raw Data)

5.3.1 Deleting and navigating through documents

All your imported documents are listed in the upper middle table of the DNA main window. If you click on an article, its text (i. e. the speech) is displayed in the text area below the document table. By clicking, for example, on the entry “109-1: Callahan, Kateri-NGO-Y”, you open the speech of Kateri Callahan, a representative of the Alliance to Save Energy. You can adjust the size of the document table (by clicking on the bar above the text area and dragging it vertically) or of its columns (by clicking on the edge of a column and dragging it horizontally). You can also customize which meta information is displayed in the document table: just right-click on any document and use the context menu that appears to check or uncheck the boxes of the information you want or do not want to be displayed (see Figure 5.9, step 1). A structured (and customised) overview of your raw data is essential for detecting missing information and thus for efficiently controlling, organizing and coding your data. For example, if you display the meta information “Type” (by checking the respective box in the context menu), you can see that no type is specified for any of the documents from the sample.dna file.

The same context menu can be used to delete documents from your database: select the documents you want to delete (press and hold the “Ctrl” key to select multiple documents), open the context menu with a right click and choose the option “Delete selected documents”.

5.3.2 Editing the documents’ meta data (author, time, etc.)

DNA allows you to edit, delete or complement the descriptive information related to your raw data (title, date, author, source, section and type of document). Similar to the procedures outlined in Section 5.2, there is a manual as well as a semi-automatic way to adjust the meta data of your documents.




Figure 5.9: Import Statements menu

Editing the documents’ meta data manually. The most basic way to edit your documents’ meta data is to select the document whose information you want to edit (by left-clicking on it) and to adjust the values in the “Document properties” submenu on the middle left of the DNA main window (see Figure 5.9, step 2), either by manually typing in the relevant information or by selecting an already specified author, source, section or type from the drop-down menu to the right of the respective meta field. For example, in Figure 5.9 (step 2) Kateri Callahan’s speech was selected, and the value “NGO” (for Non-Governmental Organisation) was manually specified as the type of document by entering it in the field “Type” of the “Document properties” submenu. Do not forget to press the button “Save” in the submenu (see Figure 5.9, step 2) to confirm your edits.

Please note that you can only edit the meta data of one document at a time manually. If you select multiple documents, the “Document properties” submenu disappears and displays “(No document or permission)” instead.

Editing the documents’ meta data semi-automatically. If you want to adjust the meta data of a greater number of articles, however, it quickly becomes tedious to edit the information about each document manually. This is why DNA also offers a semi-automatic way to edit, delete or complement the descriptive information related to your documents. To do so, click on the index tab “Documents” and select the option “Batch-recode meta-data” (see Figure 5.9, step 3). A pop-up window similar to Figure 5.10 will open. In the upper half of this window you find nine fields, which can be configured in order to adjust the meta data for multiple documents at once:


Figure 5.10: Meta information recode window

• The field “Target field:” specifies which kind of meta information (i. e. title, author, source, section, type, notes) should be adjusted. Choose the respective meta data category from the drop-down menu (which you open by clicking the arrow on the right of the target field).

• The field “Source field:” specifies where the data you want to use for adjusting the target field is stored. For example, if you simply want to delete or correct (e. g. misspelled) title, author, source, section, type or notes meta data, you usually choose the same field as source field as you have chosen as target field, since you want to adjust the data already stored in this field. However, if you want to add new data to a (maybe empty or incomplete) target field, you have to choose as source field the part of the meta information from which you want to derive the new data. As the document title should contain all relevant meta information, “Title” is usually used as source field in the latter case.

• The field “Matching on target regex” allows you to automatically delimit the documents you want to adjust, based on the information stored in the document’s target field. As with all regex implementations in DNA, you can use either search terms or regular expressions to filter the documents. If you, for instance, sometimes misspelled the author “Ralf Neukirch” as “Ralf Neunkirch”, you can correct all misspellings by simply selecting “Author” as “Target field”, entering “Ralf Neunkirch” in the field “Matching on target regex:” and the correct version (“Ralf Neukirch”) in the field “New target field”. As “Matching on target regex” automatically excludes all non-matching cases (here: all documents that do not have “Ralf Neunkirch” specified as their author), the meta information (here: “Author”) remains the same for all other documents.


• The field “Matching on source regex” similarly allows you to automatically filter the documents whose meta data you want to alter, based on the information stored in the document’s source field. For example, if you realise that Ralf Neukirch does not write for “SPON International” (as you erroneously specified) but for “THE GUARDIAN”, you can correct all your misspecifications by first selecting “Source” as the “Target field” and “Author” as the “Source field”, then entering “Ralf Neukirch” in the field “Matching on source regex” and finally specifying “THE GUARDIAN” as “New target field”.

• The field “%target regular expression” allows you to specify (match) a part of the target field that you want to use as the new information in the same field. For example, if the field “Author” somehow contains the full document titles, you can reduce the information in the field “Author” to just the name of the respective author by entering the regular expression “(?<=.+?–-).+?(?= -)” (see Figure 5.6 or a regex cheat sheet) in the field “%target regular expression” and entering “%target” in the field “New target field”. Please note that if you do not use this function, you should not change the preset value “.+” in this field; otherwise, your recoding might not produce the expected results.

• The field “%target replacement” defines a new value for the information in the target field, similarly to the fields “New target field” and “%source replacement”. If you use “%target” as “New target field”, you have to specify the new, additional, corrected or reduced information in this field.

• The field “%source regular expression” allows you to specify (match) a part of the source field that you want to use as the new information in the target field. For example, if your source field is “Title” and the titles of your documents have the recommended format (i. e. “DD.MM.YYYY - Author - Source - TYPE.txt” with blanks before and after the minuses; see Section 5.2.2), you can automatically specify the meta information for the field “Author” by (1) choosing “Author” as the “Target field” and “Title” as the “Source field”, (2) entering the regular expression “(?<=.+?–-).+?(?= -)” (see Figure 5.6 or a regex cheat sheet) in the field “%source regular expression” and (3) entering “%source” in the field “New target field”. Please note that if you do not use this function, you should not change the preset value “.+” in this field; otherwise, your recoding might not produce the expected results.

• The field “%source replacement”, similarly to the fields “New target field” and “%target replacement”, defines a new value for the information in the target field. If you use “%source” as “New target field”, you have to specify the new, additional, corrected or reduced information in this field.

• The field “New target field” defines the new, corrected, reduced or additional data that is entered in your target field (see examples above). Please note that this field has to be set to “%source” (preset value) if you use the functions “%source regular expression” or “%source replacement”, and to “%target” if you use the functions “%target regular expression” or “%target replacement”. Otherwise, the respective functions will not work.
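The interplay of these fields can be approximated in a few lines of R, which can help when planning a larger recoding. The following sketch is an assumption about the logic, not DNA's implementation: a document is recoded if its target field matches the target pattern and its source field matches the source pattern, and the new value is either a fixed string (like a value typed into “New target field”) or the part of the source field matched by a pattern (like “%source regular expression” combined with “%source”). The function recode_preview and the document table are hypothetical; the returned table imitates the preview columns of the recode window.

```r
# Hypothetical document table; names and sources are illustrative only.
docs <- data.frame(
  id     = 1:3,
  title  = c("31.03.2014 - Ralf Neunkirch - SPON International - DIGITALRESOURCE",
             "01.04.2014 - Ralf Neukirch - SPON International - DIGITALRESOURCE",
             "02.04.2014 - Jane Doe - Other Paper - PRINT"),
  author = c("Ralf Neunkirch", "Ralf Neukirch", ""),
  stringsAsFactors = FALSE
)

# Sketch of one batch-recode step (an approximation, not DNA's code).
recode_preview <- function(df, target, source, target_match = ".*",
                           source_match = ".*", new_value = NULL,
                           source_extract = NULL) {
  stopifnot(!is.null(new_value) || !is.null(source_extract))
  hit <- grepl(target_match, df[[target]], perl = TRUE) &
    grepl(source_match, df[[source]], perl = TRUE)
  if (!is.null(source_extract)) {
    # like "%source": take the part of the source field matched by the pattern
    m <- regexpr(source_extract, df[[source]], perl = TRUE)
    value <- substring(df[[source]], m, m + attr(m, "match.length") - 1)
    hit <- hit & as.vector(m) > 0  # rows without a match stay unchanged
  } else {
    value <- rep(new_value, nrow(df))
  }
  data.frame(ID = df$id, source_field = df[[source]],
             old_target = df[[target]],
             new_target = ifelse(hit, value, df[[target]]),
             stringsAsFactors = FALSE)
}

# Correct a misspelled author (target and source field are both "author").
recode_preview(docs, "author", "author",
               target_match = "Ralf Neunkirch", new_value = "Ralf Neukirch")

# Fill empty author fields with the text between the first two " - " of the title.
recode_preview(docs, "author", "title", target_match = "^$",
               source_extract = "(?<= - )[^-]+?(?= - )")
```

As in DNA's preview table, nothing is changed in docs itself; the sketch only shows what a recoding would do.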




The lower half of the “Recode document meta-data” pop-up window (see Figure 5.10) displays a table with four columns and one row for each of your documents, which helps you to preview, control and trace back your changes to the meta data:

• The column “ID” contains the individual ID of each document. This column can be particularly helpful if you specify a recoding procedure for a certain set of documents. If you know the IDs of a few exemplary documents from this set, you can quickly trace back and understand the consequences of your recoding specifications by scrolling down to the respective IDs and looking at the other columns of these documents.

• The column “Source field” displays the field from which you take the meta data for recoding the target field. It is particularly helpful for understanding the sequence of information in the source field if you want to specify a “%source regular expression” or use “Matching on source regex” (for example, if only some source fields contain the relevant information).

• The column “Old target field” shows the meta data in the target fields prior to your adjustments. It is particularly helpful if you want to use “%target regular expression” or “Matching on target regex” (for example, if you only want to change the value of a certain set of target fields).

• The column “New target field” displays the consequences of your adjustments. It is particularly helpful for checking whether your recoding will be successful or whether some recoding outcomes are actually undesired (for example, if the target field already contained the relevant information but is recoded nevertheless).

Your recodings are only applied if you press the button “Recode” (at the lower right of the “Recode document meta data” window, see Figure 5.10). Once applied, recodings cannot be undone! So please check the consequences of your recodings in the table in the lower half of the window. Before pressing the “Recode” button, however, you can revert all adjustments by pressing the button “Revert changes”, which allows you to experiment with the meta data (regex) specifications.

As noted previously, none of the documents in the file “sample.dna” specifies any meta data concerning the document type. Figure 5.10 and Figure 5.11 illustrate two exemplary semi-automatic procedures for adding this information based on the information stored in the document title (here: the kind of organisation to which the respective speaker belongs). Thus, in both examples, “Type” is selected as “Target field”, while “Title” is selected as “Source field”.

The example in Figure 5.10 uses manual search terms to specify the meta information for the document type. By entering “NGO” in the field “Matching on source regex”, the adjustments are limited to the documents that contain “NGO” in the document title. By entering “NGO” in the field “New target field”, the new value for “Type” is specified for the selected documents. As you can see in the table in the lower half of the “Recode meta-data” window, this very simple procedure is successful insofar as only the target fields of documents containing hearings of NGO representatives are changed, and the target fields of all other documents (including those


Figure 5.11: Meta information recode window (regex explained)

with already correct “Type” information) remain unchanged. However, this procedure would have to be repeated for each kind of organisation in the sample (NGO, GOV, BUS).

The more elegant way of semi-automatically specifying meta information is depicted in Figure 5.11, which uses the regex syntax. Here, by entering ^? in the field “Matching on target regex”, only those documents that do not yet contain any information about the document type are selected for amendment (thereby excluding the documents with already correct “Type” information). By specifying (?<=.+?-)[A-Z]+ as “%source regular expression” (and accordingly “%source” as “New target field”), DNA is instructed to extract the first string of upper-case characters that follows a minus in the document title and set it as the new value for “Type”. Thus you can recode the document type for all documents at once while ensuring that already specified values are not overwritten, as is evident from the table in the lower half of the window.
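The regex-based procedure can be reproduced in R to check the pattern before pressing “Recode”. In this sketch, empty “Type” fields are identified with the pattern ^$, and the lookbehind (?<=-) stands in for DNA's (?<=.+?-), because R's PCRE engine does not accept an unbounded lookbehind such as (?<=.+?-); the only difference is that DNA's version additionally requires at least one character before the minus. The first title comes from the sample data; the other titles and the existing type value are hypothetical.

```r
titles <- c("109-1: Callahan, Kateri-NGO-Y",   # from sample.dna
            "109-2: Doe, Jane-BUS-N",           # hypothetical
            "109-3: Roe, Richard-GOV-Y")        # hypothetical
type <- c("", "", "Government")                 # third type already set

# "Matching on target regex": only empty Type fields are recoded.
empty <- grepl("^$", type)

# "%source regular expression": first run of capitals right after a minus.
m <- regexpr("(?<=-)[A-Z]+", titles, perl = TRUE)
found <- as.vector(m) > 0
extracted <- substring(titles, m, m + attr(m, "match.length") - 1)

new_type <- ifelse(empty & found, extracted, type)
new_type
```

The minus in “109-1” is skipped because it is followed by a digit, so the first match in each title is the organisation type.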


Chapter 6

Using DNA: Coding the Data

Coming soon...


Chapter 7

Using DNA: Exporting the coded Data

Coming soon...


Chapter 8

rDNA: Using DNA from R

Philip Leifeld

DNA can be connected to the statistical computing environment R (R Core Team 2014) through the rDNA package (Leifeld 2018). There are two advantages to working with R on DNA data.

The first advantage is replicability. The network export function of DNA has many options, and remembering which options were used in an analysis can be difficult. If the analysis is executed in R, commands rather than mouse clicks are used to extract networks or attributes from DNA. These commands are saved in an R script file, which can be re-used many times. For example, after discovering an incorrect code somewhere in the DNA database, it is sufficient to fix the problem in the DNA file and re-run the R script instead of manually setting all the options again. This reduces the probability of making errors and increases replicability.

The second advantage is the immense flexibility of R in terms of statistical modelling. Analysing DNA data in R permits many forms of data analysis beyond simple visualization of the resulting networks. Examples include cluster analysis or community detection, scaling and application of data reduction techniques, centrality analysis, and even statistical modelling of network data. R is also flexible in terms of combining and matching the data from DNA with other data sources.

8.1 Getting started with rDNA

The first step is to install R. Installing additional R packages for network analysis and clustering, such as statnet (Goodreau et al. 2008; Handcock et al. 2008b, 2016), xergm (Leifeld et al. 2017a,b), igraph (Csardi and Nepusz 2006b), and cluster (Maechler et al. 2017), is recommended. Moreover, it is necessary to install the rJava package (Urbanek 2016), on which the rDNA package depends, and the devtools package (Wickham and Chang 2016), which permits installing R packages from GitHub (see Section 3.4).


install.packages("statnet")
install.packages("xergm")
install.packages("igraph")
install.packages("cluster")
install.packages("rJava")
install.packages("devtools")

After installing these supplementary packages, the rDNA package can be installed from GitHub. The devtools package contains a function that permits easy installation of R packages from GitHub and can be used as follows to install rDNA:

library("devtools")
install_github("leifeld/dna/rDNA")

Once installed, the rDNA package must be attached to the workspace:

library("rDNA")

To ensure that the following results can be reproduced exactly, we should set the random seed in R:

set.seed(12345)

Now we are able to use the package. The first step is to initialize DNA. Out of the box, rDNA does not know where the DNA .jar file is located, so we need to register DNA with rDNA to use them together. To do that, save the DNA .jar file to the working directory of the current R session and then initialize DNA as follows (with dna-2.0-beta20.jar in this example):

dna_init("dna-2.0-beta20.jar")

After initializing DNA, we can open the DNA graphical user interface from the R command line:

dna_gui()

Alternatively, we can provide the file name of a local DNA database as an argument, and the database will be opened in DNA. For example, we could open the sample.dna database that is provided for download on GitHub under Releases:

dna_gui("sample.dna")

For this to work, the database file has to be saved in the working directory of the R session,or the path needs to be provided along with the file name.




In addition to opening the GUI, we will want to retrieve networks and attributes from DNA. For this to happen, a connection with a DNA database must first be established using the dna_connection function:

conn <- dna_connection("sample.dna")

The dna_connection function accepts a file name of the database including full or relative path (or, alternatively, a connection string to a remote MySQL database) and, optionally, the login and password for the database (in case a remote MySQL database is used). Details about the connection can be displayed by printing the resulting object, conn.

After initializing DNA and establishing a connection to a database, we can now retrieve data from DNA. We will start with a simple example of a two-mode network from the sample database. To compute the network matrix, the connection that we just established must be supplied to the dna_network function:

nw <- dna_network(conn)

The resulting matrix can be plotted using visualization functions from the statnet suite of packages:

library("statnet")
gplot(nw)

It is also easy to retrieve the attributes of a variable, for example the colours and types of organizations, using the dna_attributes function:

at <- dna_attributes(conn)

The result is a data frame with organizations in the rows and one column per organizational attribute. The next section provides usage examples of both the dna_network and the dna_attributes functions.

8.2 Retrieving networks and attributes

This section explores the dna_network function and the facilities for retrieving attributes in more detail. The dna_network function has a number of arguments, which resemble the export options in the DNA export window. The help page for the dna_network function provides details on these arguments. It can be opened using the command

help("dna_network")

We will start with a simple example: a one-mode congruence network of organizations in a policy debate. The sample.dna database is a small excerpt from a larger empirical research project that maps the ideological debates around climate politics in the U.S. Congress over time. Details about the dataset from which this excerpt is taken are provided by Fisher et al. (2013a,b). Here, it suffices to say that the sample.dna file contains speeches from hearings in the U.S. Congress in which interest groups and legislators make statements about their views on climate politics. Accordingly, one should expect to find a polarized debate with environmental groups on one side and industrial interest groups on the other side. To compute a one-mode congruence network, the following code can be used:

congruence <- dna_network(conn,
                          networkType = "onemode",
                          statementType = "DNA Statement",
                          variable1 = "organization",
                          variable2 = "concept",
                          qualifier = "agreement",
                          qualifierAggregation = "congruence",
                          duplicates = "document")

The result is an organization × organization matrix in which each cell counts on how many concepts the two actors (i. e., the row organization and the column organization) had the same issue stance, as indicated by the values of the qualifier variable agreement.

In the code chunk above, statementType = "DNA Statement" indicates which statement type should be used for the network export. In this case, the statement type DNA Statement contains the variables organization, concept, and agreement. The argument qualifierAggregation = "congruence" causes rDNA to count how often the unique elements of variable1 have an identical value on the qualifier variable (here: agreement) when they refer to a concept (variable2). As noted above, the arguments mirror the options in the DNA export window, and details on all of them can be obtained from the help page (?dna_network).

If the algorithm finds duplicate statements within a document, i. e., statements containing the same organization, concept, and agreement pattern, only one of them is retained for the analysis (duplicates = "document").
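The congruence count can be made concrete with a small base R sketch on hypothetical data. For each value of the qualifier, it builds a binary organization × concept matrix and multiplies it with its transpose, so that each cell counts the concepts on which two organizations share a stance. This is a simplified illustration of the definition given above, not the rDNA implementation: because it only records whether a stance was expressed at all, repeated statements do not change the counts, and normalization is ignored. The function congruence_sketch is hypothetical.

```r
# Hypothetical toy statements (not taken from sample.dna).
st <- data.frame(
  organization = c("A", "B", "B", "C", "C", "A"),
  concept      = c("x", "x", "y", "y", "x", "y"),
  agreement    = c(1, 1, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# For each pair of organizations, count the concepts on which both
# expressed the same agreement value (summed over agreement values).
congruence_sketch <- function(st) {
  orgs <- sort(unique(st$organization))
  res <- matrix(0, length(orgs), length(orgs), dimnames = list(orgs, orgs))
  for (q in unique(st$agreement)) {
    s <- st[st$agreement == q, ]
    # binary organization x concept matrix for this agreement value
    inc <- 1 * (unclass(table(factor(s$organization, levels = orgs),
                              s$concept)) > 0)
    res <- res + inc %*% t(inc)
  }
  diag(res) <- 0  # no self-ties
  res
}

congruence_sketch(st)
```

Here A and B agree on x and both reject y, so their tie has weight 2, while C shares only the rejection of y with each of them.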

The resulting matrix can be converted to a network object and plotted as follows:

nw <- network(congruence)
plot(nw,
     edge.lwd = congruence^2,
     displaylabels = TRUE,
     label.cex = 0.5,
     usearrows = FALSE,
     edge.col = "gray")

Plot: congruence network of the seven organizations (Alliance to Save Energy; Energy and Environmental Analysis, Inc.; Environmental Protection Agency; National Petrochemical & Refiners Association; Senate; Sierra Club; U.S. Public Interest Research Group).

Here, we used the edge.lwd argument of the plot.network function to make the line width proportional to the strength of congruence between actors. We squared the edge weights to emphasize the difference between low and high edge weights. We also displayed the node labels at half the normal size, suppressed arrow heads, and changed the colour of the edges to grey. More information about the visualization capabilities of the network and sna packages is provided by Butts (2008b,c, 2015).

The network is not particularly polarized. We can suspect that some of the concepts are not very contested. If they are supported by all actors, this may mask the extent of polarization with regard to the other concepts. From our experience with the dataset, we can tell in this particular case that the concept “There should be legislation to regulate emissions.” is in fact very consensual. If everybody agrees with this concept, it obscures the real structure of the network. Therefore, we should exclude it from the congruence network. To do that, we need to export and plot the congruence network again, this time using the excludeValues argument:

congruence <- dna_network(conn,
                          networkType = "onemode",
                          statementType = "DNA Statement",
                          variable1 = "organization",
                          variable2 = "concept",
                          qualifier = "agreement",
                          qualifierAggregation = "congruence",
                          duplicates = "document",
                          excludeValues = list("concept" =
                            "There should be legislation to regulate emissions."))
nw <- network(congruence)
plot(nw,
     edge.lwd = congruence^2,
     displaylabels = TRUE,
     label.cex = 0.5,
     usearrows = FALSE,
     edge.col = "gray")







This reveals the structure of the actor congruence network. There are two camps, revolving around environmental groups on the right and industrial interest groups and state actors on the left, with Energy and Environmental Analysis, Inc. taking a bridging position. The strongest belief congruence ties can be found within, rather than between, the coalitions.

Next, we can tweak the congruence network further by changing the appearance of the nodes. We can apply the colours for the organization types saved in the database to the nodes in the network, and we can make the size of each node proportional to its activity. The dna_attributes function serves to retrieve these additional data from DNA. The result is a data frame with the relevant data for each organization in the color and frequency columns:

at <- dna_attributes(conn,
                     statementType = "DNA Statement",
                     variable = "organization")

at

##   id                                         value   color       type alias note
## 1 16                       Alliance to Save Energy #00CC00        NGO
## 2  7       Energy and Environmental Analysis, Inc. #FF9900   Business
## 3 14               Environmental Protection Agency #000000 Government
## 4 25 National Petrochemical & Refiners Association #FF9900   Business
## 5 11                                        Senate #000000 Government
## 6 19                                   Sierra Club #00CC00        NGO
## 7 22           U.S. Public Interest Research Group #00CC00        NGO
##   frequency in dataset in network
## 1         2       TRUE       TRUE
## 2         3       TRUE       TRUE
## 3         1       TRUE       TRUE
## 4         1       TRUE       TRUE
## 5         2       TRUE       TRUE
## 6         4       TRUE       TRUE
## 7         5       TRUE       TRUE

To use these data in the congruence network visualization, we can use the plotting facilities of the plot.network function:

plot(nw,
     edge.lwd = congruence^2,
     displaylabels = TRUE,
     label.cex = 0.5,
     usearrows = FALSE,
     edge.col = "gray",
     vertex.col = at$color,
     vertex.cex = at$frequency)



This yields a clear visualization of the actor congruence network, simultaneously displaying the network structure including its coalitions, the actors’ activity in the debate, and actor types.
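The plot call passes at$color and at$frequency in the order in which dna_attributes returns them, which presumes that this order matches the order of the vertices in nw. In the output above both appear to be alphabetical, but if the network is filtered or sorted differently, the attributes could silently end up on the wrong nodes. A defensive option is to align the attributes by name with base R's match(). The sketch below uses small stand-in objects (a hypothetical three-organization matrix and attribute table, with colours and frequencies copied from the output above) rather than a live DNA connection.

```r
# Stand-in for the congruence matrix returned by dna_network(); the rows are
# deliberately in a different order from the attribute table below.
orgs <- c("Sierra Club", "Senate", "Alliance to Save Energy")
congruence <- matrix(0, 3, 3, dimnames = list(orgs, orgs))

# Stand-in for the data frame returned by dna_attributes(); colours and
# frequencies are copied from the example output above.
at <- data.frame(
  value     = c("Alliance to Save Energy", "Senate", "Sierra Club"),
  color     = c("#00CC00", "#000000", "#00CC00"),
  frequency = c(2, 2, 4),
  stringsAsFactors = FALSE
)

# Position of each vertex name in at$value, so attributes follow vertex order
idx <- match(rownames(congruence), at$value)
vertex_colors <- at$color[idx]      # "#00CC00" "#000000" "#00CC00"
vertex_sizes  <- at$frequency[idx]  # 4 2 2
```

With the real objects, the same idea could be written as vertex.col = at$color[match(network.vertex.names(nw), at$value)], where network.vertex.names() comes from the network package and should return the matrix row names used when nw was created.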

Another way to visualize a discourse network is a two-mode network visualization. To compute a two-mode network of organizations and concepts, the following code can be used:

affil <- dna_network(conn,
                     networkType = "twomode",
                     statementType = "DNA Statement",
                     variable1 = "organization",
                     variable2 = "concept",
                     qualifier = "agreement",
                     qualifierAggregation = "combine",
                     duplicates = "document",
                     verbose = FALSE)

This creates a 7×6 matrix of organizations and their relations to concepts. The argument networkType = "twomode" is necessary because the rows and columns of the affil matrix should contain different variables. The arguments variable1 = "organization" and variable2 = "concept" define which variables should be used for the rows and columns, respectively. The arguments qualifier = "agreement" and qualifierAggregation = "combine" define how the cells of the matrix should be populated: agreement is a binary variable, and the combine option causes a cell to have a value of 0 if the organization never refers to the concept, 1 if the organization refers to the respective concept exclusively in a positive way, 2 if the organization refers to the concept exclusively in a negative way, and 3 if there are both positive and negative statements by the organization about the concept. rDNA reports on the R console what each combination means.
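The combine rule can be made concrete with a short base-R sketch that builds a small two-mode matrix from statement-level data. The statements, the organizations A to C, the concepts x and y, and the helper combine_code() are all hypothetical; the sketch only restates the coding scheme described above (0 = no statement, 1 = only positive, 2 = only negative, 3 = both) and is not rDNA's implementation.

```r
# Hypothetical toy statements (not the manual's data); agreement 1 = yes, 0 = no
statements <- data.frame(
  organization = c("A", "A", "B", "B", "C"),
  concept      = c("x", "x", "x", "y", "y"),
  agreement    = c(1, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: encode one cell following the "combine" rule
combine_code <- function(agree) {
  pos <- any(agree == 1)
  neg <- any(agree == 0)
  if (pos && neg) 3L else if (pos) 1L else if (neg) 2L else 0L
}

orgs     <- sort(unique(statements$organization))
concepts <- sort(unique(statements$concept))
affil_toy <- matrix(0L, length(orgs), length(concepts),
                    dimnames = list(orgs, concepts))

# Fill each organization-by-concept cell from the matching statements;
# an empty selection yields 0 (no statement)
for (i in seq_along(orgs)) {
  for (j in seq_along(concepts)) {
    sel <- statements$organization == orgs[i] & statements$concept == concepts[j]
    affil_toy[i, j] <- combine_code(statements$agreement[sel])
  }
}
affil_toy  # A: x = 3, y = 0; B: x = 1, y = 2; C: x = 0, y = 1
```

Organization A mentions x both positively and negatively and therefore receives a 3, the code that appears as a blue edge in the visualization of the real data.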

As in the previous example, the resulting network matrix can be converted to a network object (as defined in the network package). The colours of the edges can be stored as an edge attribute, and the resulting object can be plotted with different colours representing positive, negative, and ambivalent mentions.

nw <- network(affil, bipartite = TRUE)
colors <- as.character(t(affil))
colors[colors == "3"] <- "blue"
colors[colors == "2"] <- "red"
colors[colors == "1"] <- "green"
colors <- colors[colors != "0"]
set.edge.attribute(nw, "color", colors)
plot(nw,
     edge.col = get.edge.attribute(nw, "color"),
     vertex.col = c(rep("white", nrow(affil)),
                    rep("black", ncol(affil))),
     displaylabels = TRUE,
     label.cex = 0.5)






In this example, we first converted the two-mode matrix to a bipartite network object, then created a vector of colours for the edges (excluding zeros), and inserted this vector into the network object as an edge attribute. It was necessary to work with the transposed affil matrix (using the t function) because the set.edge.attribute function expects the edge attributes in the row-wise order of the affil matrix, whereas the as.character function flattens a matrix column by column. Finally, we plotted the network object with edge colours and labels. In the visualization, we used white nodes for organizations and black nodes for concepts.
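The difference between the two flattening orders is easy to check on a small matrix. The sketch below uses a hypothetical 2-by-3 matrix of combine codes; it shows that as.character() reads a matrix column by column and that transposing first yields row-by-row order. It also maps the codes to colours with a named lookup vector instead of three separate assignments, which is a stylistic alternative rather than something the manual prescribes. Only base R is used, so the sketch does not touch the network package itself.

```r
# Hypothetical 2-by-3 two-mode matrix of combine codes (rows = organizations)
affil_toy <- matrix(c(1L, 0L, 3L,
                      2L, 1L, 0L),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(c("org1", "org2"), c("c1", "c2", "c3")))

col_order <- as.character(affil_toy)     # column by column: "1" "2" "0" "1" "3" "0"
row_order <- as.character(t(affil_toy))  # row by row:       "1" "0" "3" "2" "1" "0"

# Named lookup vector mapping combine codes to edge colours
code_colors <- c("1" = "green", "2" = "red", "3" = "blue")

# Drop zero cells (no edge), then translate the remaining codes
edge_colors <- unname(code_colors[row_order[row_order != "0"]])
edge_colors  # "green" "blue" "red" "green"
```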

We can now see the opinions of all actors on the various concepts. The blue edge indicates that Energy and Environmental Analysis, Inc. has both positive and negative things to say about the concept “Emissions legislation should regulate CO2”. This is why this organization acts as a bridge between the two camps in the congruence network. Furthermore, we can now see more clearly that the concept we omitted from the congruence network, “There should be legislation to regulate emissions”, is viewed positively by four organizations but still receives a negative mention by one actor. The green edges span both camps, which caused additional connections between the two groups. The negative tie is ignored in the construction of the congruence network because conflicts are not counted and there is no second red tie to that concept.
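The arithmetic behind the last point can be spelled out. Assuming each organization makes statements on the concept with a single agreement value, a concept contributes one congruence tie for every pair of organizations sharing the same value, that is, choose(positive, 2) + choose(negative, 2) ties. With four positive mentions and one negative mention this gives 6 + 0 = 6 ties, none of which involve the lone critic; a second negative mention would add exactly one more tie. The helper below is a hypothetical illustration, not an rDNA function.

```r
# Hypothetical helper: congruence ties contributed by one concept, assuming each
# organization mentions it with a single agreement value
ties_from_concept <- function(n_positive, n_negative) {
  choose(n_positive, 2) + choose(n_negative, 2)
}

ties_from_concept(4, 1)  # 6: all six ties connect the four supporters
ties_from_concept(4, 2)  # 7: a second critic would create one negative-negative tie
```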


Bibliography

Brandenberger, L. M., Schläpfer, I., Leifeld, P., and Fischer, M. (2015). Overlapping subsystems: Swiss water policy across media and parliament.

Breindl, Y. (2013). Discourse networks on state-mandated access blocking in Germany and France. info, 15(6):42–62.

Broadbent, J. and Vaughter, P. (2014). Inter-disciplinary analysis of climate change and society: A network approach. In Manfredo, M. J., Vaske, J. J., Rechkemmer, A., and Duke, E. A., editors, Understanding Society and Natural Resources, pages 203–228. Springer Netherlands, Dordrecht.

Brutschin, E. (2013). Dynamics in EU policy-making: Market: The liberalization of the European gas market.

Butts, C. T. (2008a). network: A package for managing relational data in R. Journal of Statistical Software, 24(2).

Butts, C. T. (2008b). Social network analysis with sna. Journal of Statistical Software, 24(6):1–51.

Butts, C. T. (2008c). network: A package for managing relational data in R. Journal of Statistical Software, 24(2):1–36.

Butts, C. T. (2015). network: Classes for Relational Data. The Statnet Project (http://statnet.org). R package version 1.13.0.

Butts, C. T. (2016). sna: Tools for Social Network Analysis. R package version 2.4.

Cisneros, P. (2015). Subsystem interconnectedness as part of coalition strategies for policy change: Mining and water management in Ecuador between 1991 and 2010.

Csardi, G. and Nepusz, T. (2006a). The igraph software package for complex network research. InterJournal, Complex Systems:1695.

Csardi, G. and Nepusz, T. (2006b). The igraph software package for complex network research. InterJournal, Complex Systems, 1695(5):1–9.

Fisher, D. R., Leifeld, P., and Iwaki, Y. (2013a). Mapping the ideological networks of American climate politics. Climatic Change, 116(3):523–545.


Fisher, D. R., Waggle, J., and Leifeld, P. (2013b). Where does political polarization come from? Locating polarization within the U.S. climate change debate. American Behavioral Scientist, 57(1):70–92.

Gkiouzepas, G. and Botetzagias, I. (2015). Climate change coverage in Greek newspapers: 2001–2008. Environmental Communication, 11(4):490–514.

Goodreau, S. M., Handcock, M. S., Hunter, D. R., Butts, C. T., and Morris, M. (2008). A statnet tutorial. Journal of Statistical Software, 24(9):1–26.

Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., Krivitsky, P. N., Bender-deMoll, S., and Morris, M. (2016). statnet: Software Tools for the Statistical Analysis of Network Data. The Statnet Project (http://www.statnet.org). R package version 2016.9.

Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., and Morris, M. (2008a). statnet: Software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software, 24(1):1–11.

Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., and Morris, M. (2008b). statnet: Software tools for the representation, visualization, analysis and simulation of network data. Journal of Statistical Software, 24(1):1–11.

Haunss, S., Dietz, M., and Nullmeier, F. (2017). Der Ausstieg aus der Atomenergie: Diskursnetzwerkanalyse als Beitrag zur Erklärung einer radikalen Politikwende. Zeitschrift für Diskursforschung, (3):288–315.

Herweg, S. (2013). Politische Diskursnetzwerke und der Konflikt um das Antipiraterie-Abkommen ACTA.

Hurka, S. and Nebel, K. (2013). Framing and policy change after shooting rampages: A comparative analysis of discourse networks. Journal of European Public Policy, 20(3):390–406.

Imbert, I. (2017). An inquiry into the material and ideational dimensions of policymaking: A case study of fuel poverty in Germany.

Leifeld, P. (2013). Reconceptualizing major policy change in the advocacy coalition framework: A discourse network analysis of German pension politics. Policy Studies Journal, 41(1):169–198.

Leifeld, P. (2016a). Discourse Network Analysis: Policy Debates as Dynamic Networks.

Leifeld, P. (2016b). Policy Debates as Dynamic Networks: German Pension Politics and Privatization Discourse, volume 29 of Schriften des Zentrums für Sozialpolitik Bremen. Campus, Frankfurt, 1. Aufl., neue Ausg.

Leifeld, P. (2018). rDNA. A Package to Control Discourse Network Analyzer from R. University of Glasgow, School of Social and Political Sciences, Glasgow. R package version 2.0.4.


Leifeld, P., Cranmer, S. J., and Desmarais, B. A. (2017a). Temporal Exponential Random Graph Models with btergm: Estimation and Bootstrap Confidence Intervals. Forthcoming.

Leifeld, P., Cranmer, S. J., and Desmarais, B. A. (2017b). xergm: Extensions of Exponential Random Graph Models. R package version 1.8.2.

Leifeld, P. and Haunss, S. (2012). Political discourse networks and the conflict over software patents in Europe. European Journal of Political Research, 51(3):382–409.

Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M., and Hornik, K. (2017). cluster: Cluster Analysis Basics and Extensions. R package version 2.0.6.

Manfredo, M. J., Vaske, J. J., Rechkemmer, A., and Duke, E. A., editors (2014). Understanding Society and Natural Resources. Springer Netherlands, Dordrecht.

Muller, A. (2014a). Het meten van beleidscontroverse en polarisatie met discoursnetwerkanalyse: De case van het abortusdebat in de Belgische Kamer. Sociologos. Tijdschrift voor Sociologie, 35(3):159–184.

Muller, A. (2014b). Het meten van discourscoalities met discoursnetwerkanalyse: Naar een formele analyse van het politieke vertoog. Res Publica, 56(3):337–364.

Muller, A. (2015). Using discourse network analysis to measure discourse coalitions: Towards a formal analysis of political discourse. World Political Science, 11(2):17.

Nagel, M. (2016). Polarisierung im politischen Diskurs: Eine Netzwerkanalyse zum Konflikt um Stuttgart 21. Springer VS, Wiesbaden.

Nägler, R. (2015). With|out a partner. The idea of cooperation in higher education discourses. SSRN Electronic Journal.

Pedersen, T. L. (2017a). ggraph: An Implementation of Grammar of Graphics for Graphs and Networks. R package version 1.0.0.

Pedersen, T. L. (2017b). tidygraph: A Tidy API for Graph Manipulation. R package version 1.0.0.

Rantala, S. and Di Gregorio, M. (2014). Multistakeholder environmental governance in action: REDD+ discourse coalitions in Tanzania. Ecology and Society, 19(2).

R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Rinscheid, A. (2015). Crisis, policy discourse, and major policy change: Exploring the role of subsystem polarization in nuclear energy policymaking. European Policy Analysis, 1(2).

Rinscheid, A., Eberlein, B., and Schneider, V. (2015). Complex policy trajectories in risk domains: Nuclear power in Canada, Germany and Japan. In International Conference on Public Policy (ICPP).


Schneider, V. and K., J. (2014). Punctuations and displacements in policy discourse: The climate change issue in Germany 2007–2010. In Silvern, S. and Young, S., editors, Environmental change and sustainability. InTech, Rijeka, Croatia.

Stoddart, M. C., Ramos, H., and Tindall, D. B. (2015). Environmentalists’ mediawork for Jumbo Pass and the Tobeatic Wilderness, Canada: Combining text-centred and activist-centred approaches to news media and social movements. Social Movement Studies, 14(1):75–91.

Stoddart, M. C. and Tindall, D. B. (2015). Canadian news media and the cultural dynamics of multilevel climate governance. Environmental Politics, 24(3):401–422.

Tosun, J. and Schaub, S. (2015). To mobilize or not: Political attention and the regulation of GMOs. In Seventh International Conference on Coexistence between Genetically Modified (GM) and non-GM based Agricultural Supply Chains, Amsterdam.

Urbanek, S. (2016). rJava: Low-Level R to Java Interface. R package version 0.9-8.

Wagner, P. and Payne, D. (2017). Trends, frames and discourse networks: Analysing the coverage of climate change in Irish newspapers. Irish Journal of Sociology, 25(1):5–28.

Werner, C. (2015). The relevance of left and right in EU affairs: A case study of German parliament debates on the Greek crisis. In Conference "Democracy: A Citizen Perspective".

Wickham, H. and Chang, W. (2016). devtools: Tools to Make Developing R Packages Easier. R package version 1.12.0.

Wu, J. and Zhou, L. (2015). Dobnet: Exploiting the discourse of deception behaviour to uncover online deception strategies. Behaviour & Information Technology, 34(9):936–948.

Yun, S.-J., Ku, D., Park, N.-B., and Han, J. (2014). Framing climate change as an economic opportunity in South Korean newspapers. Development and Society, 43(2):219–238.
