
INFOGRAPHICS: DATA AND INFORMATION VISUALISATION AND ITS USE IN JOURNALISM - A CASE STUDY ON GUARDIAN’S DATA STORE

A study submitted in partial fulfillment of the requirements for the degree of MSc in Digital Library Management

at

THE UNIVERSITY OF SHEFFIELD

by

Charalampia Boula

September 2013


I am deeply grateful to my parents and my sister Daphne for their love and support.

~~~~~~~~~~~~

A big Thank You to Dr. Farida Vis for her supervision and guidance.

~~~~~~~~~~~~

I would also like to express my gratitude to Mrs Lisa Evans, Mr. Jacopo Ottaviani and Mr. Paul Bradshaw for agreeing to be interviewed and for providing me with great insight into the subject.

~~~~~~~~~~~~

A special Thank You to Dr. Andrew Cox and to Dr. George F. Turner for their advice and support throughout the year.


Abstract

Background. Information and data visualisation show a significant increase in use and importance, especially in the media. New creative and scientific tools for data processing and visualisation have led to more effective and creative visualisations, but also to more complex ones. As primary providers of information to the public, the media have turned their spotlight on data-driven journalism, with the ultimate aim of attracting the attention of readers and users and increasing the credibility of their publications.

Aims. The study primarily aims to examine the role of information and data visualisation in journalism by examining the case of the biggest data-journalism portfolio in the UK, The Guardian's "Data Store".

Methods. An inductive methodological approach is followed, employing both qualitative and quantitative research methods. The qualitative approach consists of interviews with professionals in the field of data journalism and a thematic analysis of those interviews. The quantitative approach is based on a Systematic Content Analysis of 295 articles published on The Guardian Data Store.

Findings. Quantitative research showed that 50% of the articles provided at least one type of visualisation, with an average of two visualisations. It also revealed some tendencies in the use of specific visualisation tools for particular visualisation types, and of particular visualisation types in particular subject categories. Interviews revealed the creative workflow in data journalism and visualisation, the challenge of connecting numerical data to human stories and the important role of data journalism and visualisation in transparency.

Conclusions. The study met its objectives to a good degree and concluded that data journalism and visualisation will continue to grow in use and importance as data processing and visualisation tools keep advancing and as more people of different backgrounds combine their knowledge and skills in the field, bringing more effectiveness and creativity.


Table of Contents

Abstract
Table of Contents
Index of Tables and Charts (in main dissertation body)
  Tables
  Charts

1. Introduction
  1.1 Context and significance of the topic
  1.2 Controversies
  1.3 Brief synopsis of literature review
  1.4 Rationale behind the choice of topic
  1.5 About this research
    1.5.1 Research aims, questions and objectives
      Aim:
      Research Questions:
      Objectives:
    1.5.2 Methodology
  1.6 Dissertation Structure
  1.7 The Guardian Data Store

2. Literature Review
  2.1. Terminology
  2.2 Brief summary of the History of Visualisation
  2.3 Data and Information Visualisation
    2.3.1 Principles of Good and Effective Data Visualisation
    2.3.2 Visualisation Types
    2.3.3 Data Processing and Data Visualisation Tools
    2.3.4 New Tendencies in Data Visualisation
    2.3.5 Challenges and Controversies in Data Visualisation
      2.3.5.1 Raw versus Aggregate Data
      2.3.5.2 Avoiding nonsense
      2.3.5.3 Strange Visualisations: How much is too much and what is considered as Bad Visualisation?
      2.3.5.4. Cultural Bias in Data Visualisation and Objectivity
  2.4. Data Journalism
    2.4.1 Data and its Challenges
    2.4.2 Open Data and Crowdsourcing
    2.4.3 Big Data
  2.5 Data Visualisation in Data Journalism
    2.5.1 Workflow in Data Journalism
  2.6 The Guardian Data Store

3. Methodology
  3.1 Ethical Approval
  3.2 Qualitative Research

    3.2.1 Design and execution of interviews
      3.2.1.1 Profile of Interviewees
        Jacopo Ottaviani:
        Lisa Evans:
        Paul Bradshaw:
      3.2.1.2 Interviews' Preparation and Conducting
      3.2.1.3 Data Collection and Processing
      3.2.1.4 Limitations and disadvantages of interviewing
  3.3 Quantitative Research:
    3.3.1 Design and Implementation of Systematic Content Analysis
      3.3.1.1 Limitations in Coding
      3.3.1.2 Data Processing
    3.3.2 Inter-Coder Reliability Testing
      3.3.2.1 Inter-Coder Reliability Test Results

4. Findings and Discussion
  4.1 Research Question 1:
      4.1.1.1 Most Important Findings and Parallel Discussion:
        Visualisations per Article
        Provision of Data Summary and Data Sets (or links to data source)
        Authors by Number of Publications and Year (in descending order)
        Articles Per Subject per Year
        Visualisation Types
        Visualisation Tools
        Frequencies of Use of Tools, Types and Frequencies of Subjects of Authors - The case of Simon Rogers
        Visualisation Types per Subject and Subjects per Visualisation Types:
        Visualisation Tools and Visualisation Types
  4.2 Research Question 2:
    4.2.2 Theme 1: Data Sources, Data Gathering and Processing, Data Visualisation: Workflow, Tools and Decision Making
      4.2.2.1 Findings:
      4.2.2.2 Discussion:
  4.3 Research Question 3:
    4.3.2 Theme 2: Data Journalism and Data Visualisation: Importance, Reasons for Increased interest, Impact in Journalism, Required Professional Skills
  4.4 Research Question 4:
    4.4.2 Theme 3: Weaknesses, Limitations, Negative Aspects and Dangers of Data Journalism and Data Visualisation
      4.4.2.1 Findings:
      4.4.2.2. Discussion
    4.4.3 Theme 4: Future Prospective and Challenges of Data Journalism and Data Visualisation

5. Conclusion
  Meeting Objectives:
  Evaluation of Methodology Approach
  Key Findings:
  Future Research Suggestions and Recommendations

Bibliography

Appendices
  Appendix 1: Ethical (Application, Consent Form, Approval)
  Appendix 2: Qualitative Research Methodology - Interviews' Questionnaire & Transcripts
    2.1 Indicative Interviews' Questionnaire
    2.2 Transcript of Interview with Jacopo Ottaviani
    2.3 Transcript of Interview with Lisa Evans
    2.4 Transcript of Interview with Paul Bradshaw
  Appendix 3 - Content Analysis Methodology
    3.1 Code Frame, Limitations, Clarifications (Tables A-E)
  Appendix 4 - Quantitative Research Findings
    4.1 - Visualisations per Article (Table 1, Chart 1)
    4.2 - Provision of Data Summary and Data Sets (or links to data source) (Table 2, Chart 2)
    4.3 - Authors by Number of Publications and Year (in descending order) (Tables 3-5, Charts 3-4)
    4.4 - Articles Per Subject per Year (Tables 6-7, Charts 5-12)
    4.5 - Visualisation Types (Table 8, Charts 13-14)
    4.6 - Visualisation Tools (Tables 9-11, Charts 15-16)
    4.7 - Frequencies of Use of Tools, Types and Frequencies of Subjects per Author (in descending Order)
      Author: Simon Rogers (Tables 12-14, Charts 17-19)
      Author: Ami Sedghi (Tables 15-17, Charts 20-22)
      Author: Mona Chalabi (Tables 18-20, Charts 23-25)
      Author: John Burn-Murdoch (Tables 21-23, Charts 26-28)
      Author: Lisa Evans (Tables 24-26, Charts 29-31)
      Author: James Ball (Tables 27-29, Charts 32-34)
      Author: Claire Provost (Tables 30-32, Charts 35-37)
      Author: Katy Stoddard (Tables 33-35, Charts 38-40)
      Author: Nick Evershed (Tables 36-38, Charts 41-43)
      Author: Randeep Ramesh (Tables 39-41, Charts 44-46)
      Author: Sarah Hartley (Tables 42-44, Charts 47-49)
      Author: Kevin Anderson (Tables 45-47, Charts 50-52)
    4.8 Types of Visualisations per Subject and Subjects of Visualisations per Type
      Visualisation Types per Subject: (Table 48, Charts 53-68)
      Subjects per Visualisation Types: (Table 49, Charts 69-83)
    4.9 Most Used Visualisation Types per Most Used Visualisation Tools and Vice Versa
      Most Used Visualisation Types per Most Used Visualisation Tools (Table 50, Charts 84-93)
      Most Used Visualisation Tools per Most Used Visualisation Types (Table 51, Charts 94-101)


Index of Tables and Charts (in main dissertation body)

Tables

Table No  Title

1  Variables’ Coding Scheme

2  Intercoder Reliability Test Results

3  Main Authors (Publications per year, Percentage of total publications)

Charts

Chart No  Title

1  Number of 1st, 2nd & 3rd Visualisation, Total Number of Visualisations

2  Provision of Data Summary and Data Sets (or links to data source) % of total

3  Main Authors (percentage of total publications)

4  Main Authors (Publications per year, Percentage of total publications)

5  Articles per Subject (Percentages) in total (all years)

6  Articles per Subject per Year (Frequencies)

7  Types of 1st, 2nd and 3rd Visualisation (Percentages)

8  Types of Visualisations (Percentages of total use)

9  Main Visualisation Tools' Use Per Year (Frequencies)

10  Total Use of Main Visualisation Tools (Percentage) in descending order

11  Simon Rogers' Use of Visualisation Tools (Frequencies)

12  Simon Rogers' Use of Visualisation Types (Frequencies)

13  Subjects' frequency in Simon Rogers' articles

14  Visualisation Type: 1. Interactive, per Subject (Frequencies)

15  Visualisation Type: 10. Map, per Subject (Frequencies)

16  Subject 1. Politics / Government / Public Administration, per Visualisation Type (Frequencies)

17  Subject 7. Society, per Visualisation Type (Frequencies)

18  Subject 3. Culture, per Visualisation Type (Frequencies)

19  Subject 2. Sports, per Visualisation Type (Frequencies)

20  Type 1. Interactive, per most important Tools (Frequency)

21  Type 7. Bar Chart, per most important Tools (Frequency)

22  Type 10. Map, per most important Tools (Frequency)

23  Tool 1. Tableau, per most important Visualisation Types (Frequency)

24  Tool 4. Google Fusion, per most important Visualisation Types (Frequency)

25  Tool 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian, per most important Visualisation Types (Frequency)

26  Tool 11. Graphic from External Source, per most important Visualisation Types (Frequency)

27  Tool 12. Datawrapper from External Source, per most important Visualisation Types (Frequency)


1. Introduction

1.1 Context and significance of the topic

Data-driven journalism and data visualisation are constantly growing in importance, in their use by the media and in the number of people who specialise in them professionally. Most major media and news networks now feature on their webpages a variety of articles whose stories are based on, proved by or reinforced by the analysis of relevant data. This new era of journalism was significantly shaped by the move of governments and organisations to release data files publicly. Journalists now had access to new sources of information that they could use to investigate various topics of public interest.

1.2 Controversies

Data visualisation in data journalism is constantly evolving with the release of new visualisation and data processing tools and the combination of programming and coding languages with such tools. The generated results can vary from very simple to very complex, not only in terms of statistical calculations but also in terms of aesthetics. However, there is still much controversy over where to draw the fine line that separates an effective and aesthetically pleasing visualisation from a very impressive one that could be incomprehensible to the reader. Another dilemma that data journalists often face is whether they should simply present the data to the readers in an understandable format, most frequently visualised, allowing them to interpret it in their own way, or whether they should clearly present the conclusions they reached, thereby influencing the readers' perception (McGhee, 2010).


1.3 Brief synopsis of literature review

The literature review provides definitions of the main key terms and a brief historical background of visualisation, followed by an examination of its various types and some of the main tools used in the creation process. It then progresses to an analysis of the role of data, more specifically that of open data and its sources, along with the new challenges that big data brings to public information and journalism. The final parts consist of literature on data journalism, the use of data visualisation in data journalism and specifically on The Guardian Data Store.

1.4 Rationale behind the choice of topic

Recent studies such as that of Segel & Heer (2010) provide a broad and detailed description of how and why data and information visualisation are used to such an extent by the media. They describe the basic styles and tools and, sometimes, the production process of the data visualisation behind an article and its story. However, there is a lack of studies that, through the systematic examination of case studies, provide results about the different variables that affect the final visualisation, for example: the subject category of articles, the tools used for each visualisation type, and any tendency to use specific visualisation types and tools in particular subject categories. This study attempts to do that on a small scale, laying the foundation for more extensive similar studies in the future.

1.5 About this research

The reader should bear in mind that this research examines a small sample of the portfolio of data journalism articles published by one main source, The Guardian Data Store. Nevertheless, this study could serve as a pilot for future studies of entire collections from multiple sources, not only from that of the current case study. Such research, combined with the consideration of other factors (such as people's perception of such articles and their visualisations), could help not only to find the balance between an effective visualisation and an impressive one, but also to identify creative patterns of current use and possible tendencies for the future.

1.5.1 Research aims, questions and objectives

Aim: This study primarily aims to examine the role of information and data visualisation in journalism by examining the case of the biggest data-journalism portfolio in the UK, The Guardian’s Data Store.

Research Questions: As this is a study that implements inductive methodology, there is no initial hypothesis to prove or disprove but rather an effort to explore and try to answer some central questions on the topic. The main questions that this study wishes to answer are:

§ Which types of visualisation and which tools can be recognised in the case-study portfolio? Can any norms or patterns be identified among them?

§ What is the creative process behind an infographic created and/or hosted by The Guardian Data Store? In more detail:

o What is the creative process, step by step, and who are the decision makers?

o Which data types are the most broadly used and how is data selected and gathered?

o Which are the most important tools used in the process, either for data processing and analysis or for visualisation?

§ How do journalism professionals perceive data and information visualisation in terms of value and effectiveness?

§ What are the possible weaknesses, limitations and negative aspects of data and information visualisation?


Objectives: Answering those research questions will help the study meet its objectives, which are to identify:

§ How data and information visualisation is used in journalism and why its use is constantly increasing

§ The skills and knowledge required in order to work on data visualisation at a professional level

§ The importance of data and data visualisation, as perceived by professionals

§ The possible limitations, weaknesses and negative aspects or impact of data journalism and information visualisation

§ The various tools used either in data analysis (and possibly formulation / editing) or in visualisation, and more specifically by The Guardian

§ Possible tendencies, norms and correlations in The Guardian’s portfolio, mainly regarding subject, visualisation type and tools

1.5.2 Methodology

This study employs both qualitative and quantitative research methods in order to answer the research questions effectively, as choosing only one of the two approaches would lead to inconclusive or unclear results. The qualitative method is a thematic analysis of interviews with professionals in the field, and the quantitative method is a systematic content analysis of a sample of the articles published on The Guardian Data Store.
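To give a rough idea of what the quantitative side of such a content analysis involves, the following minimal Python sketch tallies a handful of coded articles. The records, field names and subject/type/tool pairings are hypothetical and invented purely for illustration; they are not the study's actual code frame or data.

```python
from collections import Counter, defaultdict

# Hypothetical coded articles (illustrative assumption, not the study's data).
# Each visualisation is recorded as a (visualisation type, tool) pair.
coded_articles = [
    {"subject": "Politics", "visualisations": [("Map", "Google Fusion")]},
    {"subject": "Sports", "visualisations": [("Interactive", "Tableau"),
                                             ("Bar Chart", "Datawrapper")]},
    {"subject": "Society", "visualisations": []},
    {"subject": "Politics", "visualisations": [("Map", "Google Fusion"),
                                               ("Bar Chart", "Datawrapper")]},
]


def share_with_visualisation(articles):
    """Fraction of articles coded with at least one visualisation."""
    with_vis = sum(1 for article in articles if article["visualisations"])
    return with_vis / len(articles)


def cross_tabulate(articles, row_key):
    """Count visualisation types per row; row_key chooses the row label."""
    table = defaultdict(Counter)
    for article in articles:
        for vis_type, tool in article["visualisations"]:
            table[row_key(article, tool)][vis_type] += 1
    return table


# Visualisation types per subject, and visualisation types per tool.
types_per_subject = cross_tabulate(coded_articles, lambda article, tool: article["subject"])
types_per_tool = cross_tabulate(coded_articles, lambda article, tool: tool)

print(f"{share_with_visualisation(coded_articles):.0%} of articles include a visualisation")
for subject, counts in types_per_subject.items():
    print(subject, dict(counts))
```

Frequency tables of this kind are what make it possible to look for tendencies between subject, visualisation type and tool.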

1.6 Dissertation Structure

This dissertation is divided into five chapters: the first (and current) one serves as an introduction, the second consists of the literature review, the third analyses the methodology behind the study, the fourth presents the findings of the study and examines them in relation to the literature review, and the fifth serves as a conclusion. The appendices present the ethical approval of the study and the analysis and results of both research methods applied: the transcripts of the interviews, and the coding scheme and statistical results of the systematic content analysis.

1.7 The Guardian Data Store

Since its first official publication on 14 January 2009, under the supervision of Simon Rogers, The Guardian Data Store has published more than 3000 data-driven articles on its Data Blog, the largest portfolio of its kind in the UK. The articles were created both by journalists of the organisation and by freelance data journalists. The majority of those articles provide at least one element of visualised data, created either by The Guardian Graphics' team, by the author of the article with the use of various visualisation tools and applications, by a freelance designer for The Guardian or by other external sources whose creations were hosted in articles on the Data Blog.

2. Literature Review

2.1. Terminology

Terms such as "infographics", "data visualisation" or "information visualisation" are steadily becoming more and more popular in the media. In literature one can find various definitions for each of those terms. It was decided, though, not to adopt specific definitions but rather to compose new ones, deriving from the overall literature study.

Information graphics, or infographics, could be defined as the visual representation of data, information or knowledge. They combine the use of graphics and text, aiming to present the available information and data in the clearest, most understandable and memorable way. This is why, most of the time, they visually present selected important parts or summaries of the available data sets or selected pieces of information, with the ultimate goal of delivering the story hidden in the data. The creative process of information graphics is called Information Visualisation.

Data visualisation is the process of gathering, filtering, analysing and visualising data to provide a final outcome for the target group (Kramer de Oliveira Barros & Araujo Bertoti, 2012). It is a narrower term than information visualisation, as the object being visualised is usually one or more specific data sets. In data-driven journalism the ultimate goal is for this outcome to support or present the story behind the article.
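The four steps named in this definition (gathering, filtering, analysing, visualising) can be sketched as a small Python pipeline. This is only an illustrative sketch under stated assumptions: the data set, the function names and the text-based bar chart are all hypothetical and stand in for whatever sources and tools a data journalist would actually use.

```python
# Hypothetical raw data set: (region, reported value). Values are invented
# for illustration only.
RAW_ROWS = [
    ("North", 40),
    ("South", 25),
    ("North", 20),
    ("East", None),   # missing value, dropped in the filtering step
    ("South", 5),
]


def gather():
    """Step 1: collect the raw rows (here simply a copy of the in-memory list)."""
    return list(RAW_ROWS)


def filter_rows(rows):
    """Step 2: discard rows whose value is missing."""
    return [(region, value) for region, value in rows if value is not None]


def analyse(rows):
    """Step 3: aggregate the values into one total per region."""
    totals = {}
    for region, value in rows:
        totals[region] = totals.get(region, 0) + value
    return totals


def visualise(totals, width=30):
    """Step 4: render a text bar chart, largest total first, scaled to `width`."""
    largest = max(totals.values())
    lines = []
    for region, total in sorted(totals.items(), key=lambda item: item[1], reverse=True):
        bar = "#" * round(width * total / largest)
        lines.append(f"{region:<6} {bar} {total}")
    return "\n".join(lines)


totals = analyse(filter_rows(gather()))
print(visualise(totals))
```

Keeping each step as a separate function mirrors the definition: the analysis can be checked on its own before any decision about the visual form is made.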

2.2 Brief summary of the History of Visualisation

Humans have expressed their need to tell a story or to visualise information since the early years of human presence. Cave paintings in the Paleolithic era are considered people's first effort to tell stories (Mol, 2011) and show the way they hunted or their perception of the spiritual world. Even before 1000 BC, ancient civilisations such as the Greeks, Babylonians, Egyptians and Chinese tried to present planetary movements visually, created the first maps that served as navigation guides and made the first regional planning drafts (“A Quick Illustrated History of Visualisation,” n.d.).

Philosopher Ramon Llull (1232-1315) was the creator of the first knowledge trees, which portray the relationships between terms or concepts in the form of a diagram. Nicole Oresme, in 1350, conceptualised the first bar chart, and Abraham Ortelius changed the course of cartography forever when, in 1570, he created the first modern Atlas (Friendly & Denis, 2001).

Mathematician J.H. Lambert (1728-1777) and politician William Playfair (1759-1823) are the two people who established the era of modern visualisation. They were the first to publish time series graphs that visualised economic data in graphs rather than tables, which was the usual practice until then. In this way the reader could see the shape of the data and more easily compare its values at different times. They also introduced the first bar charts, pie charts and histograms in the form that is known today. French engineer Charles Joseph Minard introduced the concept of narrative graphics of space and time, combining a time scale and a data map to portray the continuous losses during Napoleon's campaign (Tufte, 2001).

Today, computers and specialised software allow people to create very advanced and complex graphs, either static or interactive, in a relatively small amount of time and with great precision. The advances in software and visualisation tools, along with the increasing use of Open Data (Simonite, 2012) and the easy sharing options of social media, have definitely contributed to the increasing popularity of infographics, especially among major media networks.

2.3 Data and Information Visualisation

2.3.1 Principles of Good and Effective Data Visualisation

Tufte (2001) defines Graphical Excellence as "A well-designed presentation of

interesting data, a matter of substance, statistics and of design...It consists of

complex ideas communicated with clarity, precision and efficiency... It provides the

viewer with the greatest number of ideas in the shortest time...It is nearly always

multivariate... and requires telling the truth about the data".

The power of data visualisation is that it allows the viewers to see "insights" that

would not have been visible if they were only provided with numbers (Smiciklas,

2012). Data is definitely the key, and the essence of data visualisation is the story

that it represents. However, for some, data visualisation is also considered an art

(Landman, 2013). The aesthetic aspect is undoubtedly important; however, it should in no

case take priority over good data analysis. Good

data analysis is the starting point of an effective and understandable representation. The

main elements of data visualisation should first be "structure, precision, integrity,

depth and functionality" and only then "decoration", if that is necessary (Cairo,

2012). Simplicity, however, is the key. Colours, patterns and font variations should be

used mainly to "convey information and not for decoration" (Wong, 2010).

2.3.2 Visualisation Types

There are many ways to classify the different types of visualisation. One of

the most significant is the distinction between static and interactive visualisations.

Static visualisations are printed ones, or online ones that would look the

same or almost the same if they were printed. The reader is not required to

take any action in order to see the final result of the visualised data.

Interactive visualisations, by contrast, usually involve motion or the active

engagement of the reader/user who can, for example, select specific fields to filter

the data results or can actively choose the depth of the information they wish to

receive. This more focused and detailed data representation, also

known as drilling down (Murray, 2013), usually captures more of the

reader's attention.
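
As an illustration only, the difference between a fixed overview and a drill-down can be sketched in a few lines of Python. The records, field names and figures below are hypothetical and were invented for this sketch; they are not taken from any data set discussed in this study.

```python
from collections import defaultdict

# Hypothetical records: regions, cities and figures are invented for illustration.
RECORDS = [
    {"region": "North", "city": "Leeds", "spend": 120},
    {"region": "North", "city": "Sheffield", "spend": 80},
    {"region": "South", "city": "London", "spend": 300},
    {"region": "South", "city": "Brighton", "spend": 50},
]


def overview(records):
    """Top level: total spend per region, as a static chart would show it."""
    totals = defaultdict(int)
    for row in records:
        totals[row["region"]] += row["spend"]
    return dict(totals)


def drill_down(records, region):
    """Second level: the per-city detail behind one region chosen by the reader."""
    return {row["city"]: row["spend"] for row in records if row["region"] == region}


print(overview(RECORDS))
print(drill_down(RECORDS, "North"))
```

In an interactive visualisation the call to `drill_down` would be triggered by the reader's selection, whereas a static graphic would show only one pre-chosen level of detail.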

Although defining an infographic as static or interactive is essential, it is only

a basic description of it. There are many other types of categorisation and

subcategorisation depending on the infographic's morphology and the purpose it

serves. For example, according to Bounford (2000), graphics can be classified into

those used for i) illustration and storytelling, and ii) statistical

representation. The first category usually includes graphics such as symbols,

pictorials, relational diagrams, time diagrams (timelines) and organisational

diagrams. In the second category, the types most frequently used are tables, line

graphs, scatter graphs, bar charts, area charts, volume charts and combined charts.

2.3.3 Data Processing and Data Visualisation Tools

There is a great variety of data processing and data visualisation tools for anyone

who is interested in the field. The available options can vary from totally free

downloadable or online applications to very expensive creative platforms or

database and content management system plugins used by relatively big

organisations.

Some of the most popular tools are (“Entry-level tools Online visualisations,” 2012;

“Top Ten Tools for Data Journalism,” 2013; Halevy & McGregor, 2012; Barkai, 2013;

Rogers, 2011):

1. Tableau and its free version Tableau Public: One of the most popular and

advanced data visualisation platforms; it allows multiple layering of data, a

quality that makes it very effective for interactive visualisations (“Tableau

Software,” n.d.).

2. Many Eyes: One of the first free experimental web applications, created by

IBM, that produces advanced visualisations, static and interactive, which are

then hosted on its site. The users can thus browse and see archives of

visualisations created by others. It was the inspiration for many other tools

later developed, such as Tableau and Google Fusion. Unfortunately, this

application has not been substantially updated and has started losing ground

(“ManyEyes Visualisation Experiment,” n.d.).

3. Google Refine: Google Refine was a tool for cleaning and restructuring data,

powered by Google. It is now called OpenRefine (“Google Refine,” n.d.)

4. Google Fusion: A web-based experimental Google application for

processing spreadsheets and creating graphs and maps, including

interactive ones. One of the most preferred tools, especially by data

journalists (“Google Fusion Tables Experimental Application,” n.d.)

5. Datawrapper: Easy, simple and effective free data visualisation tool for the

creation of charts that can be either hosted by the service or self-hosted on

the user’s website (“Datawrapper Software,” n.d.)

6. CartoDB: An online application for the analysis and interactive visualisation

of Geospatial Data, offering multi-layer data editing and display, along

with advanced CSS editing, HTML coding, database connection and query

execution (“CartoDB: Geospatial Data Visualisation,” n.d.)

7. ScraperWiki: A free web based tool, frequently used to clean, refine and

analyse data, although it additionally offers visualisation and extra coding

options (“ScraperWiki,” n.d.)

8. Wordle: Free web based text processing application used for the creation of

word clouds (“Wordle,” n.d.)

9. Adobe Creative Cloud (Suite): Adobe's popular programs for illustrations,

photo editing, animation, video and interactive applications (“Adobe Creative

Cloud,” n.d.)

10. Prezi: An online tool for creating visual presentations that can be used to create

storyboards for animated story telling or information presentation (“Prezi

Virtual Presentation Whiteboard,” n.d.)

11. BatchGeo: A cloud based map making application, easy and simple to use

(“BatchGeo,” n.d.)

12. Other visualisation and data refining-processing tools such as Tabula

(Bounegru, 2013), Crystal, Geotime, Dreamweaver (Ostergren, Hemsley,

Belarde-Lewis, Walker, & Hall, 2011), Circos, Timeline, Protovis,

DataWrangler (“Data Visualisation - Selected tools,” 2013) and Visual.ly.

It is important to mention that, despite the great variety of available tools, the

use of coding languages and scripts, such as JavaScript and Python, is unavoidable,

especially in cases of complex data sets or data that gets updated constantly.

Coding with libraries such as D3.js allows more specific and

customised visualisations, according to the exact needs of the project and the

wishes of the creators (Murray, 2013).

2.3.4 New Tendencies in Data Visualisation

Despite this classification, and due to the creativity of graphics teams and the

advances in design, data processing and visualisation software, many of

the graph types which are more often used for illustrating and storytelling can

also be used for data representation and statistics, and vice versa. There are no

limitations to the possible combinations, and many new types of

infographics have recently emerged. Designers, statisticians, data experts and

researchers have cooperated in the design and creation of

innovative software applications and tools that process data through advanced

algorithms and produce functional visualisations of high aesthetic standards,

using rich colour palettes, shapes and patterns, and attractive symbols and fonts.

Word clouds (visual quantitative representations of words), advanced

networks, arc diagrams, area groupings, centralised bursts and rings, circled globes,

circular ties, elliptical implosions, flow charts, radial convergence graphs, radial

implosion graphs, ramifications and scaling circles are some of the newer design

tendencies in visualisation (some of them are shown in the accompanying image) and they

usually portray relations, connections and hierarchies of data and information elements

and values. All the above diagram types can be either static or interactive and can

represent static data sets or dynamic data that is being constantly updated, resulting

in a continuous change of the graphs (Lima, 2011).

Image Source: Lima, 2011, p. 158.

New tendencies in information visualisation have broken the barriers of visualising data or

text. A new type of graph-free visualisation, "direct visualisation", based on already

visualised material (images, videos), has emerged. While many do not consider this

to be information visualisation, this new creative type is likely to find much use in the fields

of education, research and the humanities, where displaying full detail rather than

graphs is crucial (Manovich, 2011).

Additionally, new emerging scientific fields such as bioinformatics, or social media

analytics, have employed visualisation to portray their research findings and

statistical calculations and have engaged more designers and researchers to create

more creative representations and more effective tools (Heer, Bostock, &

Ogievetsky, 2010).

2.3.5 Challenges and Controversies in Data Visualisation

2.3.5.1 Raw versus Aggregate Data

Building a data set from scratch through data collection, data scraping

and mining can be very challenging and relatively time consuming. Its advantage,

compared to aggregate data sets, is that it is built from the start in the most

convenient form for the data analyst prior to the data processing. Additionally, the

need for data cleaning is minimal. Aggregate data sets usually need cleaning and

reformatting before any visualisation work can begin, which, in the case of a

poor original data set, can be very time consuming as well. Furthermore, it is

essential to ensure the credibility of the source and to be certain that the population

or sample are sufficient for the desired analysis (Ward, Grinstein, & Keim, 2010) and

that the cost of obtaining the data, if not provided free, is within the limits of the

project budget (Hox & R., 2005).

2.3.5.2 Avoiding nonsense

A creator of data visualisation will very frequently need to combine data sets from

different sources, which most probably come in different formats. The first essential

step is to create a unified data set whose variables, values and scales have a

unified structure and format, so that any visualised comparisons and relations will

make sense to the reader. The second important step is to decide which

comparisons and relations are logical, actually portray something meaningful

and interesting for the reader, and do not lead to wrong assumptions (Ward

et al., 2010).
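
The first step, bringing differently formatted sources onto one structure, can be sketched as follows. This is a minimal Python illustration under stated assumptions: both sources, their field names and their figures are hypothetical, and a real project would need a mapping written for each actual source.

```python
# Hypothetical sources: field names, units and figures are invented for illustration.
SOURCE_A = [{"Country": "france", "pop_thousands": "65,000"}]   # text, in thousands
SOURCE_B = [{"country_name": "Italy ", "population": 60000000}]  # number, in persons


def from_source_a(row):
    """Map source A onto the unified schema (population counted in persons)."""
    thousands = int(row["pop_thousands"].replace(",", ""))
    return {"country": row["Country"].strip().title(), "population": thousands * 1000}


def from_source_b(row):
    """Map source B onto the same schema; only the labels need tidying."""
    return {"country": row["country_name"].strip().title(),
            "population": int(row["population"])}


def unify(rows_a, rows_b):
    """Return one data set with a single structure, scale and naming style."""
    unified = [from_source_a(r) for r in rows_a] + [from_source_b(r) for r in rows_b]
    return sorted(unified, key=lambda r: r["country"])


print(unify(SOURCE_A, SOURCE_B))
```

Only after this step are the two population figures expressed on the same scale, so that a chart comparing them would make sense to the reader.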

2.3.5.3 Strange Visualisations: How much is too much and what is considered as Bad Visualisation?

There has been much debate regarding the acceptable level of complexity of a

data visualisation. In an interview for Professor McGhee's project "Journalism in the Age of Data"

(McGhee, 2010), Alberto Cairo mentions: "Unfortunately informatics is

something that is usually dominated by fashion. The fashion that is winning now is

strange visualisations". There is no clear line or definition of what a bad

visualisation is, and opinions vary to a great extent. Apart from the obvious reasons

that could make a visualisation bad and potentially misleading, such as scale

distortion, unclear lines and colours, it is generally agreed that very complex or

strange visualisations fail to communicate the story, as they tend to be

incomprehensible. They violate a basic principle of effective visualisation, which is

simplicity (Ward et al., 2010), despite the fact that they might initially capture the

viewers' attention.

2.3.5.4 Cultural Bias in Data Visualisation and Objectivity

Data visualisation creators, especially when their readers are international, need to

take into consideration that many visual elements, such as colours, text or symbols,

may have different significance in different cultures, a fact that may distort

people's perception of the visualisation. It is advisable to review and slightly

customise, if necessary, each visualisation according to the cultural background of

the target group in question (Schaap, 2012).

Furthermore, there is no such thing as "objective" or neutral data visualisation (Hohl,

2011) due to human intervention in each step of the process. Therefore,

according to Ball (2013), creators need to achieve a balance between analysis and

presentation, so that their readers feel that the infographic makes sense and that they can

trust it.

2.4 Data Journalism

Data-driven journalism started taking its current form in the mid-2000s, when the

most important newspapers and other independent news organisations, especially

in the US and the UK, such as The New York Times, The Guardian and ProPublica, set up

in-house teams of journalists with knowledge of data and computing. Those

teams create interactive maps and other visualisations and presentations using

computer applications that "collect, process, analyse and visualise data sets"

(Parasie & Dagiral, 2012).

Nevertheless, until recently, journalists lacked the ability to work with data. This

was the main obstacle that prevented them from working on data-related projects

(Aitamurto, Sirkkunen, & Lehtonen, 2011). The recent focus on data journalism and

its significance and potential is clear in the following statement: "Data-driven

journalism is the future", by Sir Tim Berners-Lee, inventor of the World Wide Web.

That is because the possibilities and the available options in data processing,

visualisation techniques, programming languages and data, especially open data

and open government data, are endless (Arthur, 2010).

The aim of Data Journalism is not just to provide the data and the statistics but also

to tell a story through them, focusing on people. "Stories are told about people and to

people" mentions Paul Bradshaw (as quoted in Marshall, 2012). The most significant

quality of Data Journalism, though, is that it enables journalism, especially

investigative journalism, to reach deeper, according to the investigative reporter

Dana Priest (McGhee, 2010). In its very essence it is a matter of democracy, as it

can be used as one of the main "weapons" with which people and journalists can

hold politicians and governments accountable (Cohen, Hamilton, & Turner, 2011).

Despite all the advantages of data journalism, there is one potential risk that

journalists should bear in mind: they still need to

search for the human side of the story and must not get lost in data. With the increasing

interest in data in all its forms, inevitably many more people, bloggers and, most

importantly, reporters will turn to it and, after obtaining certain skills, may be able to

manage data very well and arrive at useful findings. It is necessary for them though,

especially for reporters, not to forget that it is the story that matters first and that they

will still need to synthesise various pieces of information without finding themselves

overwhelmed by data (Oliver, 2010). Additionally, even if they become very good at

data management and analysis, there will be times when data sets are so

complicated and large, as for example in the case of the WikiLeaks War Logs

(Rogers, 2010), that they should be managed and analysed by or with the help of

experts in order to be reshaped into a format that is useful and more understandable

for both readers and reporters.

2.4.1 Data and its Challenges

The greatest challenges for a data journalist include obtaining the data, dealing with its original

format and meeting the cost of obtaining or collecting it (Aitamurto et al., 2011). Data,

nowadays, can be found in various forms and from various sources. Data journalists

can either gather primary data or find or acquire secondary data. Data can be

scraped from the Internet with the use of coding and programming (Cohen et al.,

2011), or it can be gathered through crowdsourcing, through subscriptions or by

conducting surveys.

A journalist may have many sources of data and structured information; however,

that does not mean that the data will be "ready to use". Often, refining,

filtering and rearranging are essential in order for the "dirty data" (Halevy &

McGregor, 2012) to be reliable enough to use and analyse.
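
What refining, filtering and rearranging might involve in practice can be sketched in Python. The rows below are hypothetical and the three cleaning rules are assumptions chosen for illustration (tidying names, dropping unparseable values and repeated names, sorting the result); real data sets usually require rules specific to their source.

```python
# Hypothetical "dirty" rows of (name, value); invented for illustration.
DIRTY = [
    ("  john   smith", "1,200"),
    ("John Smith", "1200"),
    ("ann lee", "n/a"),
    ("Bob Ray", 75),
]


def clean(rows):
    """Refine, filter and rearrange rows of (name, value)."""
    seen = set()
    cleaned = []
    for name, value in rows:
        name = " ".join(name.split()).title()      # refine: tidy spacing and case
        try:
            number = float(str(value).replace(",", ""))
        except ValueError:
            continue                                # filter: value is not a number
        if name in seen:
            continue                                # filter: keep first occurrence only
        seen.add(name)
        cleaned.append((name, number))
    return sorted(cleaned)                          # rearrange: consistent order


print(clean(DIRTY))
```

Dedicated tools such as those listed in section 2.3.3 perform comparable operations through a graphical interface; the sketch only makes the individual decisions visible.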

2.4.2 Open Data and Crowdsourcing

Open Data changed the landscape of data and information management, but also of

journalism, politics and communication. It has changed the landscape for

citizens as well. In a quest for transparency, in 2006, The Guardian launched the

"Free Our Data" campaign (“Free Our Data Campaign,” n.d.). In 2010, David

Cameron announced the publication of a variety of data sets by both the

government and local authorities (Oliver, 2010).

Open Government Data are available online for free and in various formats, so that

everyone is able to have access, and under licences that allow re-use (Davies,

2010; Joel, 2011). For journalists, though, "using open data means republishing it in

a different, consolidated or curated format, or in a way which makes it easier to

explore and make sense of" (Leimdorfer & Thereaux, 2012). Journalists re-use,

reshape and combine different data sets that they then provide to the public along

with the relevant visualisation that usually completes their articles. This process of

disseminating data to the public is one of the four reasons for which journalists use

open data. The other three reasons are: "i) To discover newsworthy facts or stories,

ii) To discover trends hidden in large datasets, and iii) to create data visualisations"

(Kronenburg, 2011).

Nevertheless, it is not only data journalism that benefits from open data. The

benefit is mutual, since open data also gains from

Data Journalism in two ways: i) its value increases through visualisation, and ii) in

various cases journalists participate in the creation of open data sets (Kronenburg,

2011).

Crowdsourcing has the advantage of saving time, as many people participate and

collaborate in quickly collecting data that would otherwise take one

researcher much longer to gather. For example, people's comments on The Guardian's

released MP expenses data set led journalists to further investigations and to the

creation of more related stories (Flew, Spurgeon, Daniel, & Swift, 2012).

2.4.3 Big Data

There are various opinions as to what Big Data is. The common element in all of

them is that Big Data is a very large and constantly growing amount of information and data.

This information and data is collected with the use of

advanced algorithms. Algorithms are often programmed to extract, process and

transform data and information that do not come in traditional forms, such as photos,

text, video and audio files. According to Dan Gardner, though, Big Data is much

more than its size: "It is the ability to extract meaning: to sort through masses of

numbers and find the hidden pattern, the unexpected correlation, the surprising

connection" (Smolan & Erwitt, 2012).

The speed and volume at which data is generated by humanity are so high that

they are difficult for the human mind to conceptualise. The data produced by social

media alone on a daily basis is enormous and very complicated to process, as to a great

degree this would mean processing and analysing online human behaviour and

expression (Mahrt & Scharkow, 2013). Similar challenges are faced with data

produced by big digitisation projects, and they can be tackled quite successfully up to

a certain degree with the help of crowdsourcing (Smolan & Erwitt, 2012). Yet, as the

amount of data constantly increases, data management and processing systems

and tools are also becoming more effective and more advanced in order to cope

with processing data that may even reach exabytes in size (“Big data needn’t

be a big headache: How to tackle mind-blowing amounts of information,” 2012).

However, even if big data is processed successfully in terms of statistical analysis,

this does not mean that the numbers will definitely be right. One of the great

challenges of big data is that, like any data set created by humans, it cannot be

totally objective and should be examined, evaluated and considered only if

it is seen in the greater sociological context of the people and the place(s) that it

was generated from (Crawford, 2013). Another challenge is the constant need for

more advanced data management tools and systems which will also be cost

effective (Buhl, Röglinger, Moser, & Heidemann, 2013).

Big data can be a very important source of information, especially in financial

journalism, as it can help reporters monitor companies, organisations and the

government for legal or ethical violations. However, here too data should be used as

a tool and not as the aim. It is important to question whether the results of the data analysis

are right and to check the facts with the help of specialists and sources that can

provide insights to the story (Marshall, 2013).

2.5 Data Visualisation in Data Journalism

The media use data and information visualisation in order to provide their

users/audiences with a visual representation of the information and/or data that they

believe supports a good story. Apart from printed or TV-broadcast

infographics, almost all leading media organisations host entire portfolios of

infographics on their websites, either in simple, static, forms or in interactive form.

"Data can be both the source of data journalism, but it can also be the tool that the

story is told" (Bradshaw, 2012, cited in Grey, Chambers, & Bounegru, 2012).

Therefore, a journalist can either find during data analysis a story that is worth

telling, or can back up existing or emerging stories with valid arguments that derive

from data analysis. The "evidence" might be there but without a convincing analysis

to support it, it may remain vague and unnoticed. Good data visualisation is the way

to prevent the story from going unnoticed, as "an infographic should provoke

thought" according to Steve Duenes (Losowsky et al., 2011).

In the project "Journalism in the Age of Data" (McGhee, 2010) several professional

graphic designers and data journalists from leading media organisations were interviewed

about the challenges, the basic principles and the required skills of data journalists,

at individual and team level. Regarding the principles of visualisation and data

journalism, John Grimwade stressed the importance of telling a story clearly, in line with

the principles of graphics, rather than just presenting numbers.

Regarding the required professional skills, most interviewees agreed on the

importance of collaboration between different specialties, since it is almost

impossible for one person to do it all. Therefore, in line with Weber and

Rall's (2012) "need of speaking the same language", an understanding of statistics,

coding and graphic design, if not working knowledge of them, is essential for data

journalists, so that collaboration with the specialists in each field can run smoothly.

Professor Michael Stoll added the need for some basic knowledge in social

sciences.

Among the challenges mentioned in the project, Paul Steiger from ProPublica was

sceptical as to how far accessibility and openness can go, a concern shared

by Stolte (2012), who mentions that legal and ethical data collection (especially

through internet scraping) and reproduction/distribution that respect personal

privacy need to be a prerequisite of good data journalism.

Another important challenge is the time restrictions and deadlines of newsrooms,

especially during breaking news. Accuracy, integrity and credibility should always

come before speed and visual aesthetics (Weber & Rall, 2012). Steve Duenes,

graphics director of The New York Times (“The New York Times: Multimedia,” n.d.),

believes that it is crucial "to have people physically close to the story" regardless of how

good the graphics can be. However, budget limitations and "shortsightedness" in

some newsrooms about the role of visual journalists do not usually make that

possible (Losowsky et al., 2011).

2.5.1 Workflow in Data Journalism

The usual steps in the creative process of a data driven article are (Aitamurto et al.,

2011):

1. Identifying the potential of a story and how data could contribute to it

2. Finding and gathering the appropriate data sets for the research

3. Cleaning, correcting and reformatting data if necessary

4. Analysing and combining data sets

5. Writing the story and creating the relevant visualisations

6. Publishing the relevant data sets together with the story and the visualisations

7. Inviting and challenging the readers to reuse the data and share the stories with

others through social media

In the case study content analysis of Giardina & Medina (2012) on the workflow of

the infographics department of The New York Times, it was discovered that the

"available graphic tools" and the "adopted reporting processes" are two of the main

factors that influence this workflow.

2.6 The Guardian Data Store

The Guardian uses data visualisation to illustrate some of its main articles. While

The New York Times is famous for the very high aesthetic value of its

visualisations, The Guardian usually uses simpler types of visualisation. However,

The Guardian Data Store has been one of the richest in story variety and most well

respected data journalism portfolios in the world since its launch in 2009.

Simon Rogers (2013), founder of The Guardian Data Store, highlights the

following principles of good data journalism:

§ It's "all about the story"

§ Provide the key data people need

§ "Make it personal"

§ "Engage": always put the data file of the visualisation on the article and make all

data accessible when possible

§ Simplify and share with readers complicated and big data sets

§ Continuous promotion of the "Open data movement"

§ Anyone can do it if they concentrate on what they can do best and

delegate, if necessary, the rest to other specialists in their fields

3. Methodology

An inductive research approach of mixed qualitative and quantitative methods (Salmons, 2010) was chosen as the most suitable for this study. Although there were other options, the combination of semi-structured interviews with a thematic analysis of those (qualitative approach) and a systematic content analysis of a sample of articles of The Guardian Data Store (quantitative approach) was considered the most appropriate, since the joint power and complementary nature of the advantages of each method (Dawson, 2009) would help answer the research questions.

There are various alternative research approaches a researcher could take with such a great source of material as the articles of The Guardian Data Store. Some of those can be found as future research suggestions in the Conclusion chapter. It is essential, though, to mention that the choice of research methods depends on the kind of questions the researcher wishes to answer. For example, if the researcher would like to examine how readers perceive those articles and the degree of understanding they have of them, and particularly of their visualisations, then a quantitative research method using questionnaires on a number of readers, in combination with a qualitative research method such as a focused observation group of readers, possibly along with interviews, would be a good approach.

3.1 Ethical Approval

The proposal of this study was examined by The Information School Research

Ethics Panel and was evaluated as 'Low Risk'. The study was ethically approved by the panel as it was found to be in accordance with the University of Sheffield’s

policies and procedures.

3.2 Qualitative Research

One of the main qualities and, in most cases, advantages of qualitative research is that it examines issues and phenomena, within an inductive research approach, in a broader way, trying to describe, understand and in some cases explain them from an internal point of view in various ways:

§ By focusing on the opinions and experiences of individuals, either personal or professional, on case studies and examining their knowledge (Burns, 2000)

§ By observing and/or testing actions, interactions and communications while they take place, and then analysing the data selected from this process (Kvale, 2007)

§ By examining items, archives and material such as images, videos or documents that could contain useful information of such nature (Kvale, 2007)

Qualitative research, in the majority of cases, is not carried out against a background of pre-defined concepts and hypotheses. On the contrary, hypotheses are usually absent from this method and, in the rare cases where they are used, they are formed and structured during the procedure along with various other concepts (Kvale, 2007). This is exactly how this inductive study was designed, as there was no pre-defined hypothesis to verify. For this research, gathering the experience, opinions and knowledge of professionals in data journalism and data visualisation and understanding their perspectives (Burns, 2000) was considered highly critical in helping to answer the following research question: What is the creative process behind an infographic created and/or hosted by The Guardian Data Store? In more detail:

§ What is the creative process step by step and who are the decision makers?

§ Which data types are the most broadly used and how is data selected and gathered?

§ Which are the most important tools used in the process either for data processing and analysis or for visualisation?

§ How do journalism professionals perceive data and information visualisation in terms of value and effectiveness?

§ What are the possible weaknesses, limitations and negative aspects of data & information visualisation?

It was decided that in-depth interviews with professionals who currently work, have worked in the past or have published on The Guardian Data Store on a freelance basis were the most effective way to gather such data. In-depth interviewing is a qualitative research method where the researcher tries to collect from the interviewee, in the form of a conversation, information and data on their insight, their point of view on various issues, and their personal experience and/or feelings on different topics. The approach is not to "put things to someone's mind" (Hannabuss, 1996) but rather to let them unfold their perspective. More specifically, interviews were considered highly effective in providing deeper understanding of specific aspects of the study. The collected data from the interviews would be used to understand the subject of the research better, to help answer the research questions (Salmons, 2010), and to shed light on unclear or complicated issues such as:

§ How is data and information visualisation used in journalism and why is its use constantly increasing?

§ What skills and knowledge are required in order to work on data visualisation at a professional level?

§ What is the importance of data and data visualisation as perceived by the professionals?

§ What are the possible limitations, weaknesses and negative aspects or impact of data journalism and information visualisation?

Providing answers to the issues mentioned above would significantly bring the research closer to meeting some of its main objectives.

3.2.1 Design and execution of interviews

3.2.1.1 Profile of Interviewees¹

Jacopo Ottaviani: Freelance data journalist with a strong technical background in programming. His data journalism work is often featured on the popular Italian news site "Il Fatto Quotidiano".

Lisa Evans: Former data researcher for The Guardian with a special interest in statistics, who has written or co-written 139 articles. She is currently working for the Open Knowledge Foundation.

Paul Bradshaw: Award-winning online journalist, author of the "Online Journalism Handbook", Course Leader for the MA in Online Journalism at Birmingham City University and visiting professor at City University, London.

¹ Links to the profiles of each interviewee at The Guardian Data Store and to personal web pages or blogs are provided in the References.

3.2.1.2 Interviews' Preparation and Conducting

In order for the research to be ethically reliable (Salmons, 2010), a consent form was created that described how the interviews would be conducted, how the data would be recorded and who would have access to it. The interviewees were asked to read it and to confirm, prior to the interview, that they agreed to its terms. All three interviewees agreed to the terms of the consent form, either by signing it digitally or in person, or by replying to the email that contained the consent form to confirm that they had read it and agreed to its terms.

The interview with Mr Ottaviani was held through a Skype video call on July 15th, 2013. The interview with Mr Bradshaw took place in person at his office at Birmingham City University on July 15th, 2013, and the interview with Mrs Evans was conducted through a Skype audio call on 2nd August 2013. All interviews, with the permission of the interviewees, were audio recorded on a digital recorder. The audio files were stored on a personal computer, with no access given to third parties. Additionally, all interviewees agreed not to be anonymised and did not wish any part of the interviews to be omitted from the research.

The interview questions were designed in a semi-structured form because this type of interview allows flexibility but also helps maintain better control of the procedure. The questions were a combination of open and closed formats, which required a well-balanced set of questions in order to allow the respondents to express their opinion without the replies becoming too time-consuming (Walliman, 2011). At the beginning of the interviews a converging-question approach was followed, where the respondents were asked more general questions (Thomas, 2003). Although the interviews were designed in a semi-structured form, at some points they were conducted as open-ended when that was possible, mainly when the informants were providing an insight into and a description of their experience of specific cases. In such moments the researcher hands the reins of the interview to the interviewees, allowing them to express themselves with greater freedom and more naturally (Burns, 2000). Looser forms of interviewing, with semi-structured or open-ended questions, provide a good environment for a response-guided approach, where the interviewer can instantly create follow-up questions based on the replies given by the informants to the initial questions. This enables the researcher to focus in detail on the respondents' opinion on issues that were related to or derived from the initial question (Thomas, 2003).

3.2.1.3 Data Collection and Processing

All interviews were transcribed verbatim (Kvale, 2007) (transcripts available in Appendix 2) and checked. Careful notes were then taken for each one, and their data was processed with the method of Thematic Analysis. Thematic analysis is the identification of patterns and key themes through the careful examination and basic coding of the extracted data. Key themes provide a strong connection to the research questions and are broader than codes, which primarily identify connections between various data elements. It is a method that allows flexibility and is relatively easy to implement for inexperienced researchers (Braun & Clarke, 2006).

3.2.1.4 Limitations and disadvantages of interviewing

The interviews were designed to be conducted either face-to-face or through a Skype audio or video call, depending on the respondents' preference and availability, but also on other constraints, such as a great distance between the researcher and the interviewee or scheduling and budget limitations, where a trip to conduct a face-to-face interview would either take too much time or cost too much. However, where an interview was conducted through a Skype video call, the result was very similar to that of a face-to-face interview. A further limitation of this method is that many of the interview requests sent to the selected contacts can be, and in this particular study were, ignored, despite repeated efforts at communication. One of the disadvantages of interviews is that transcribing them can be very time-consuming. Additionally, it can sometimes be difficult for the researcher to maintain objectivity (McNeil & Chapman, 2005) and carry out a bias-free interview. Therefore, careful questionnaire preparation and testing was carried out prior to each of the interviews.

3.3 Quantitative Research:

Although qualitative research helps provide answers to a number of the research questions set, it is not sufficient to answer all of them. The Guardian Data Store now has more than 3000 articles published on its Data Blog since its first publication on January 14th, 2009. All those articles contain raw data that can only be gathered, refined and processed through a quantitative research method, more precisely with the method of Systematic Content Analysis. The exact number of articles collected and examined in the study and the time frame they cover are described in detail in part 3.3.1.

Systematic Content Analysis can vary from very basic to extremely complex. With the continuously increasing number, size and types of available data sources, especially those in electronic and digital format, a great number of research techniques have arisen and more effective tools have been built. Additionally, it is now more common for data sets to be created or processed through the collaboration of more than one researcher (K. Krippendorff, 2004). However, although these techniques and tools help handle larger amounts of data than before and help shorten the process, Systematic Content Analysis is considered very time-consuming and its results can still be distorted by defective source material (Devi, 2009). For this research, though, this method is the main way in which a researcher could find answers to the following basic research questions:

§ Which various types of visualisation and tools used can be recognised in the portfolio-case study? Are any norms or patterns of them identified?

The findings of the content analysis can also complement the findings of the qualitative methods specifically for the case study of The Guardian Data Store. The aim of the quantitative research methods is mainly to help meet some of the main objectives of this research, which are to identify:

§ The various tools used either in data analysis (and possibly formulation / editing) or in visualisation, and more specifically by The Guardian.

§ Possible tendencies, norms, co-relations on The Guardian’s portfolio, mainly regarding subject, visualisation type and tools

3.3.1 Design and Implementation of Systematic Content Analysis

On the A-Z section of The Guardian Data Blog one can find and download, in the form of a spreadsheet, a complete index of all the published data sets and articles. More specifically, from the first article published on 14th January, 2009, until 30th July, 2013, when the articles were collected for the study, this spreadsheet consisted of 2959 articles. The spreadsheet contains details such as the hyperlink to each article, the date and time of its publication and its title. Those 2959 articles were defined as the original population of the quantitative research. Of those 2959 articles, a sample of approximately 10%, 295 articles, was selected through the method of Systematic Sampling. Systematic sampling is one of the methods most frequently used in statistics to select a specific number of members or items as a sample from a much larger original population. A random starting point was set, that of the 10th article in the order of the original spreadsheet, with a pre-defined, fixed periodic interval of ten articles. Therefore, the articles selected were the 10th, 20th, 30th, etc., up to the 2950th, which was the final one.

After the sample was selected, it was noticed that some of the dates of the articles were either missing from the table or were in the wrong format. They were corrected after examining each of those articles (about 10 in number), and then the basic sample spreadsheet was ready. Each of the sample articles was classified according to a Code Frame, based on Vis (2012) and Lotan, et al (2011). A clear definition of variables, objective coding procedures and categories is essential (Mayring, 2000) for a scientific research method, as they help increase its level of objectivity (Prasad, 2008).
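The selection rule described above (start at the 10th article, then take every tenth) can be illustrated with a minimal sketch. It assumes the index is available as a simple ordered list; the names `systematic_sample` and `index_rows` are hypothetical and the list of integers merely stands in for the 2959 spreadsheet rows. The study itself worked directly in the spreadsheet, so this is an illustration of the technique rather than the procedure actually used.

```python
def systematic_sample(population, start, interval):
    """Return every `interval`-th item, beginning with the `start`-th (1-based)."""
    if start < 1 or interval < 1:
        raise ValueError("start and interval must be positive integers")
    # Convert the 1-based starting position to a 0-based slice offset.
    return population[start - 1::interval]


# Hypothetical stand-in for the 2959 rows of the A-Z index spreadsheet:
# each integer represents the position of an article in the original order.
index_rows = list(range(1, 2960))

sample = systematic_sample(index_rows, start=10, interval=10)

print(len(sample))   # number of sampled articles
print(sample[:3])    # first selected positions
print(sample[-1])    # last selected position
```

With a population of 2959 and an interval of ten, the slice yields 295 positions, from the 10th to the 2950th, which matches the sample size reported in the text.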

The research focused on the following 15 variables, according to which each article was classified:

Table 1. Variables' Coding Scheme

Variable Code Name    Variable Description
Var1                  Year of Publication
Var2                  Number of Visualisations
Var3                  Author of article
Var4                  Subject Category
Var5                  Existence of Visualisation Number 1
Var8                  Type of Visualisation Number 1
Var11                 Tool for Visualisation Number 1
Var14                 Existence of Data Summary
Var15                 Existence of Data Set

Although The Guardian provided the date and time of each publication on the spreadsheet, in order to facilitate the research, an additional column that indicated only the year of publication of each of the articles was created. After the creation of this category, the articles were ranked in ascending order by the Year of Publication. The subject categories' classification was mainly based on the category tagging of each article by its author or creating team.

3.3.1.1 Limitations in Coding

The classification by subject category faced one of the most severe limitations and difficulties of this research. An article could only be classified in one subject category, although it often referred to issues that belonged to more than one category.

The classification of the types of visualisation was based on the book "Digital Diagrams" by Trevor Bunford (2000). Further limitations were faced in this part of the research. There were times when an image of data visualisation contained more than one type of visualisation. These were treated as separate items and not as one. Furthermore, there was the complex issue of static and interactive visualisations. Where the interactive visualisation was based on the basic types of static visualisation and where, for example, the user could click on a bar chart and see more numbers or select a different variable from the menu, those visualisations were treated as static and the type of visualisation was stated. Motion or animated graphics, complicated interactive networks and clouds, or combinations of multiple interactive types which the user had to actively explore were classified simply as interactive, in order to avoid the confusion and blurred boundaries of such a complicated multiple classification. Graphics shown in videos were classified under the category type of videos.

In an effort to avoid the confusion and mistakes created by such limitations, some basic classification rules were created for all variables, in order to help decide how to classify each article in a single category. Those rules and the code frame also help eliminate possible bias of the researcher (Prasad, 2008). The entire classification code frame, with its specifications, rules and assumptions, can be found in Appendix 3.1.

3.3.1.2 Data Processing

After the primary data was gathered, it was processed and tabulated (Walliman, 2011) using Excel and, with the help of basic descriptive statistics (Rugg, 2007), a set of results such as frequencies (United States General Accounting Office. GAO, 1989) and percentages was generated. The generated results could help answer some of the research questions. More specifically, the main focus was on examining:

§ The main authors of the articles by number of publications

§ The number of articles and the percentage of the sample that contained visualisation

§ The average number of visualisations contained in each article

§ The number and percentage of articles that provided the relevant data set

§ The number and percentage of articles that provided the relevant data summary

§ The number and percentage of articles in each subject category

§ The visualisation tools used each year and how the use of each of the selected tools progressed throughout the years

§ The frequencies of use of tools and types and the frequencies of subjects per author, in order to identify possible tendencies

§ The types of visualisations per subject and vice versa, in order to identify possible tendencies

§ The most used visualisation types per most used visualisation tools and vice versa, in order to identify possible tendencies
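The kinds of descriptive statistics listed above (frequencies, percentages, an average and a simple cross-tabulation) can be sketched as follows. The study produced them in Excel; this Python version is only an illustration, and the rows in `articles`, the field names and the helper `frequency_table` are hypothetical and do not come from the study's data set.

```python
from collections import Counter

# Hypothetical coded rows (NOT the study's data): one dict per sampled article.
articles = [
    {"subject": "Society", "n_vis": 2, "first_type": "Map"},
    {"subject": "Sports", "n_vis": 0, "first_type": None},
    {"subject": "Society", "n_vis": 1, "first_type": "Bar Chart"},
    {"subject": "Culture", "n_vis": 3, "first_type": "Map"},
]


def frequency_table(values):
    """Map each distinct value to (count, percentage of all values)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {
        value: (count, round(100 * count / total, 2))
        for value, count in counts.most_common()
    }


# Number and percentage of articles in each subject category.
subject_table = frequency_table(a["subject"] for a in articles)

# Average number of visualisations per article.
mean_vis = sum(a["n_vis"] for a in articles) / len(articles)

# Cross-tabulation: how often each (subject, visualisation type) pair occurs.
crosstab = Counter(
    (a["subject"], a["first_type"]) for a in articles if a["first_type"]
)

print(subject_table)
print(mean_vis)
print(crosstab)
```

A cross-tabulation of this kind is what underlies the "types per subject" and "types per tool" comparisons: the same counter can be keyed on any pair of coded variables.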

3.3.2 Inter-Coder Reliability Testing

Although generating some basic results is the main goal of the researcher, it is very important that the method of research is reliable (McNeil & Chapman, 1985). This means that if at least a second person were trained in how the research is conducted and had the code scheme, its rules and its clearly defined procedures explained to them (Graziano & Raulin, 2012), then this second person would create very similar data to that created by the first researcher (Krippendorff, 2003). This method of measuring the reliability of the research is called Inter-Coder Reliability. Although it does not in itself ensure that the results are valid, Inter-Coder Reliability can give greater assurance that the data interpretations are valid. It is also a very helpful way to evaluate the code frame and, when necessary, edit it in order to make it more effective.

In this research, a secondary sample of 10% of the articles of the original sample was provided to a second coder, who was trained on the code scheme and the set rules and was asked to complete the data spreadsheet for this secondary sample of 29 articles. The Inter-Coder Reliability was tested at the nominal level, for two coders, with the online tool ReCal. It was calculated using Percent Agreement and Scott's Pi. Although Percent Agreement is easier to calculate, Scott's Pi is an index that shows the level of reliability after "taking into consideration in its calculations the agreement by chance" (Freelon, 2010). Therefore, Scott's Pi is considered a more objective index. For this study, a minimum of 0.8 is the required result for Scott's Pi.
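The two indices can be sketched for a single nominal variable coded by two coders. This is a minimal illustration of the standard definitions (Scott's Pi as observed agreement corrected by the agreement expected from the pooled category proportions), not a reproduction of ReCal's implementation; the function names and the example codings are hypothetical, and returning 1.0 when only one category is ever used is a convention chosen for this sketch.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units (in percent) on which both coders chose the same category."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)


def scotts_pi(coder_a, coder_b):
    """Scott's Pi for two coders and one nominal variable."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of units")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement uses category proportions pooled over both coders.
    pooled = Counter(coder_a) + Counter(coder_b)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if expected == 1:
        # Only one category was ever used; treated as perfect agreement here
        # (an assumption of this sketch, the index is otherwise undefined).
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codings of ten articles for a yes/no variable.
coder_1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
coder_2 = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "no"]

print(percent_agreement(coder_1, coder_2))
print(scotts_pi(coder_1, coder_2))
```

In this hypothetical example the coders agree on nine of ten units, yet Scott's Pi comes out at roughly 0.80, noticeably lower than the 90% raw agreement, because part of that agreement would be expected by chance.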

3.3.2.1 Inter-Coder Reliability Test Results

The table below shows the Inter-Coder Reliability test results, taken from the CSV file exported by ReCal: the Percent Agreement and the result for Scott's Pi for each variable. The test showed at least 93% agreement between the two coders for each variable, while the result of Scott's Pi for variables 1-14 was at least 0.9 and for variable 15 was 0.847. This indicates that the reliability of the coding scheme meets the minimum requirements. A screenshot taken when the results were produced (since ReCal does not generate a reference number for each test) is provided after the table.

Table 2: Inter-Coder Reliability Test Results.

Filename            Inter-Coder1.csv
Filesize            2028 bytes
n columns           30
n variables         15
n coders per var    2

Variable                       Percent Agreement    Scott's Pi
Variable 1 (cols 1 & 2)        100                  1
Variable 2 (cols 3 & 4)        100                  1
Variable 3 (cols 5 & 6)        100                  1
Variable 4 (cols 7 & 8)        96.55172414          0.962214984
Variable 5 (cols 9 & 10)       96.55172414          0.900854701
Variable 6 (cols 11 & 12)      96.55172414          0.916786227
Variable 7 (cols 13 & 14)      100                  1
Variable 8 (cols 15 & 16)      100                  1
Variable 9 (cols 17 & 18)      96.55172414          0.932322054
Variable 10 (cols 19 & 20)     96.55172414          0.916305916
Variable 11 (cols 21 & 22)     93.10344828          0.910493827
Variable 12 (cols 23 & 24)     100                  1
Variable 13 (cols 25 & 26)     100                  1
Variable 14 (cols 27 & 28)     100                  1
Variable 15 (cols 29 & 30)     93.10344828          0.847368421
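As a reading aid for the Percent Agreement column, the short sketch below works through the arithmetic for a secondary sample of 29 articles. It only illustrates how whole numbers of agreeing articles translate into the percentages shown; the variable names are hypothetical.

```python
SECONDARY_SAMPLE = 29  # articles coded by both coders

# Percent agreement when the coders agree on 29, 28 or 27 of the 29 articles.
agreement = {
    agreeing: round(100 * agreeing / SECONDARY_SAMPLE, 8)
    for agreeing in (29, 28, 27)
}

for agreeing, percent in agreement.items():
    print(agreeing, percent)
```

The three distinct values in the table therefore correspond to zero, one or two disagreements out of 29 articles for the variable in question.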

Screenshot of the results output of Inter-Coder Reliability Test in ReCal:

4. Findings and Discussion

This chapter is divided into four parts, each part corresponding to a research question and the equivalent objective(s). The section for the first research question presents the results of the quantitative research and a brief discussion of them. The other three sections, one for each of the other three research questions, are analysed under four themes recognised in the thematic analysis of the qualitative research, the interviews. The findings for each theme are provided with the relevant discussion.

4.1 Research Question 1:

§ Which various types of visualisation and tools used can be recognised in the portfolio-case study? Are any norms or patterns of them identified?

4.1.1 Objective: To identify:

§ The various tools used either in data analysis (and possibly formulation / editing) or in visualisation, by The Guardian.

§ Possible tendencies, norms, co-relations on The Guardian's portfolio, mainly regarding subject, visualisation type and tools

4.1.1.1 Most Important Findings and Parallel Discussion²:

Visualisations per Article

Of the 295 articles examined, more than half included at least one type of visualisation. The average number of visualisations per article was 1.97, almost two visualisations per article.

² Please note that the numerical order of the charts and tables in this chapter is different from that of Appendix 4, where more charts and tables are included for each category (see main table of contents for those). Additionally, all numbers and percentages for year 2013 refer to publications until 30th July.

Chart 1: Number of 1st, 2nd & 3rd Visualisation, Total Number of Visualisations

Provision of Data Summary and Data Sets (or links to data source)

Chart 2. Provision of Data Summary and Data Sets (or links to data source) % of total

Additionally, of the total number of articles, about 73% included at least a data set and approximately 36% included a data summary. 35% of the articles included both.

[Chart 1 panel: Number of Visualisations per Article. Surviving legend entries: Articles without Visualisation; Articles with 1 Visualisation; Articles with 2 Visualisations. Values as extracted: 12%, 20%, 6%, 5%, 7%, 50%.]

[Chart 2 panel: Provision of Data Summary and Data Set. Categories: Articles with Data Summary; Articles with Data Set; Articles with only Data Summary; Articles with only Data Set; Articles with Both Data Summary and Data Set. Values as extracted: 35.59%, 72.54%, 0.68%, 37.97%, 34.92%.]

Authors by Number of Publications and Year (in descending order)

As Simon Rogers is the creator of The Guardian Data Store, it was expected that he would be the author with the most publications, something that Chart 3 confirms. On Table 3 and Chart 4 of the next page, it is interesting to notice that Mona Chalabi, the third author by number of publications, has published the vast majority of her articles only very recently, as she became a member of The Guardian Data team in late 2012. It is also important to mention that one of the professionals interviewed for this study, Lisa Evans, is the 5th most published author on the blog, as she was a member of The Guardian Data team for three years. Finally, about one fifth of the total articles examined have as first authors people with fewer than three publications on the blog, mainly writing for it on a freelance basis.

Chart 3. Main Authors (percentage of total publications)

[Chart 3 panel: Authors. Values as extracted: 41.02%, 9.15%, 7.46%, 5.42%, 3.39%, 2.71%, 2.03%, 2.03%, 1.69%, 1.02%, 1.02%, 1.02%, 18.31%.]

Chart 4. Main Authors (Publications per year, Percentage of total publications)

Table 3. Main Authors (Publications per year, Percentage of total publications)

                        Number of Articles in            Total Number    Total
Author Name             2009  2010  2011  2012  2013     of Articles     Percentage
Simon Rogers              21    24    31    30    15     121             41.02%
Ami Sedghi                 0     4     7     7     9      27              9.15%
Mona Chalabi               0     0     1     0    21      22              7.46%
John Burn-Murdoch          0     0     1    12     3      16              5.42%
Lisa Evans                 0     2     3     5     0      10              3.39%
James Ball                 0     0     3     2     3       8              2.71%
Claire Provost             0     0     2     3     1       6              2.03%
Katy Stoddard              1     4     1     0     0       6              2.03%
Nick Evershed              0     0     0     0     5       5              1.69%
Randeep Ramesh             0     0     0     3     0       3              1.02%
Sarah Hartley              0     2     0     1     0       3              1.02%
Kevin Anderson             3     0     0     0     0       3              1.02%
Others                     9     8    14    25     9      65             18.31%

Articles Per Subject per Year

Chart 5: Articles per Subject (Percentages) in total (all years)

About 19% of the articles were about "Politics, Government and Public Administration", followed by articles about social issues and then by articles about culture. On Chart 6, one can see the number of articles per subject category, per year.

Total Percentage Per Subject (Chart 5 data, labels and values paired in the order extracted)

Subject                                          Percentage
Politics / Government / Public Administration    18.64%
Sports                                            7.12%
Culture                                           8.14%
Health                                            4.41%
Military / War                                    4.75%
Education                                         6.44%
Society                                          14.92%
Crime / Terrorism                                 1.69%
World News                                        5.76%
Global Development                                4.07%
Environment / Weather / Nature                    5.42%
Media / Journalism                                5.76%
Transportation                                    3.73%
Technology / Science                              4.07%
Economy / Business                                4.41%

Chart 6. Articles per Subject per Year (Frequencies)

Visualisation Types

The most frequently used visualisation type was the bar chart, used in almost 19% of the total number of visualisations, followed by maps with 16% and interactive visualisations with almost 15% (Chart 8). On Chart 7, one can see that maps and interactive visualisations were mostly used as the 1st visualisation of the articles, while bar charts and line graphs were more frequently used as the 2nd and tables as the 3rd. This is consistent with the observation that maps and interactives attract and engage the audience more (Murray, 2013), so they would be used first in order to take advantage of this quality. Simpler types of visualisation, such as bar charts, line graphs and tables, are used more to provide quick and brief insights into the data and the article, and mostly appear later in it, after or within the actual text.

Chart 7. Types of 1st, 2nd and 3rd Visualisation (Percentages)

Chart 8. Types of Visualisations (Percentages of total use)

[Chart 8 panel: Total Percentage of Use Per Type of Visualisation. Values as extracted: 14.90%, 2.78%, 0.25%, 3.28%, 2.53%, 9.60%, 18.94%, 8.33%, 6.82%, 15.91%, 2.27%, 7.58%, 4.04%, 1.01%, 0.76%, 1.01%.]

Visualisation Tools

As one can see in Chart 10 on the next page, most of the visualisations on The Guardian Data Blog were hosted visualisations from external sources, such as graphs from reports of official organisations. Second in frequency were graphs created by The Guardian's graphics team or by external freelance graphic designers working for The Guardian. Third in frequency were visualisations created with Datawrapper, fourth those created with Google Fusion, followed by those created with Tableau and Many Eyes.

In Chart 9 below, we notice that although the creations of The Guardian's graphics team remain relatively steady through the years, visualisations from external sources were hosted mostly in 2012. The use of Datawrapper increased dramatically in the last year, making it the most used tool for 2013, while the use of Many Eyes has dropped to almost zero since 2012, probably because it has not been updated sufficiently (Rogers, 2011).

Chart 9. Main Visualisation Tools' Use Per Year (Frequencies)

Chart 10. Total Use of Main Visualisation Tools (Percentage) in descending order.

Frequencies of Use of Tools, Types and Frequencies of Subjects of Authors - The case of Simon Rogers

As Charts 11, 12 and 13 reveal, in his articles Simon Rogers mostly hosted visualisations created by external sources and by The Guardian's graphics team, as well as visualisations created with Google Fusion, Many Eyes (at earlier stages of the Blog) and Datawrapper. The majority of the visualisations were interactive visualisations, maps or bar charts, and his articles were mostly about Politics / Government / Public Administration, Society, World News and Culture. Similar tables and charts for the top 12 authors can be found in Appendix 4.

Chart 11. Simon Rogers' Use of Visualisation Tools (Frequencies)

Author: Simon Rogers, Tools Used (labels and values paired in the order extracted)

Tool                                                                                              Times of Use
11. Graphic from External Source                                                                  49
9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian    29
4. Google Fusion                                                                                  22
3. Many Eyes                                                                                      13
12. Datawrapper                                                                                   11
6. Google Docs / Drive                                                                             9
2. Wordle.net                                                                                      7
1. Tableau                                                                                         4
8. Infomous                                                                                        3
14. Prezi                                                                                          2
5. Zoom.it                                                                                         1
17. CartoDB                                                                                        1

Chart 12. Simon Rogers' Use of Visualisation Types (Frequencies)

[Chart 12 panel: Author: Simon Rogers, Types Used. Values as extracted: 36, 35, 19, 13, 12, 12, 9, 9, 8, 5, 5, 4, 1, 1.]

Chart 13. Subjects' frequency in Simon Rogers' articles

Visualisation Types per Subject and Subjects per Visualisation Types:

Chart 14 shows that most interactive visualisations were used in articles whose subject was Society, Sports, Politics / Government / Public Administration or Culture. This is to be expected since interactive visualisations, as Charts 16-19 show, were the second most used type in articles about Society, the first in articles about Culture and Sports (let's not forget London 2012) and among the most used types for articles on Politics / Government / Public Administration. Chart 15 shows that most maps were used in articles whose subject was Society or Politics / Government / Public Administration, which is consistent with what Charts 16 and 17 show, where maps were the third most used type in both categories.

[Chart 13 panel: Author: Simon Rogers, Subjects. Values as extracted: 25, 18, 10, 9, 9, 7, 7, 7, 6, 5, 5, 4, 4, 3, 2.]

Chart 14. Visualisation Type: 1. Interactive, per Subject (Frequencies)

Type: Interactive, Per Subject (labels and values paired in the order extracted)

Subject                                          Frequency
Society                                          10
Sports                                            9
Politics / Government / Public Administration     6
Culture                                           6
Education                                         4
World News                                        4
Media / Journalism                                4
Military / War                                    3
Crime / Terrorism                                 3
Environment / Weather / Nature                    3
Global Development                                2
Transportation                                    2
Technology / Science                              1
Health                                            0
Economy / Business                                0

Chart 15. Visualisation Type: 10. Map, per Subject (Frequencies)

Type: Map, per Subject (labels and values paired in the order extracted)

Subject                                          Frequency
Society                                           9
Politics / Government / Public Administration     8
Health                                            7
Transportation                                    7
Education                                         6
Global Development                                5
Culture                                           3
World News                                        3
Economy / Business                                3
Sports                                            2
Military / War                                    2
Crime / Terrorism                                 2
Environment / Weather / Nature                    2
Media / Journalism                                2
Technology / Science                              1

Chart 16. Subject 1. Politics / Government / Public Administration, per Visualisation Type (Frequencies)

[Chart 16 panel: Subject: Politics / Government / Public Administration, per Type. Values as extracted: 12, 12, 8, 7, 6, 6, 6, 5, 3, 2, 1, 1, 0, 0, 0, 0.]

Chart 17. Subject 7. Society, per Visualisation Type (Frequencies)

[Chart 17 panel: Subject: Society, per Type. Values as extracted: 16, 10, 9, 6, 6, 3, 3, 2, 2, 1, 1, 0.]

Chart 18. Subject 3. Culture, per Visualisation Type (Frequencies)

[Chart 18 panel: Subject: Culture, per Type. Values as extracted: 6, 6, 4, 4, 4, 4, 3, 2, 2, 2, 1, 1, 1, 0.]

Chart 19. Subject 2. Sports, per Visualisation Type (Frequencies)

[Chart 19 panel: Subject: Sports, per Type. Values as extracted: 9, 7, 6, 4, 4, 2, 2, 2, 1, 1, 0.]

Visualisation Tools and Visualisation Types

Types per tools:

Charts 20 to 22 show that most interactive visualisations were created by external sources and with the tool Many Eyes. Additionally, most bar charts were created with Datawrapper and most maps with Google Fusion.

Tools per types:

Charts 23 to 27 show that Tableau was used mostly for bar charts and Google Fusion for maps, while the Guardian team created mostly tables, area charts and maps. External sources were mainly used as a source of interactive visualisations, combination graphs, maps and bar charts, while line graphs were the type of visualisation most created with Datawrapper as a tool.

Chart 20. Type 1. Interactive, per most important Tools (Frequency)

Type: Interactive, per Tool (labels and values paired in the order extracted)

Tool                                          Frequency
11. Graphic from External Source              23
3. Many Eyes                                  18
Not Known / Not available                      6
9. Guardian Graphics' Team / Guardian Data     5
Other                                          4
1. Tableau                                     2
12. Datawrapper                                0
4. Google Fusion                               0
6. Google Docs / Drive                         0
2. Wordle.net                                  0

Chart 21. Type 7. Bar Chart, per most important Tools (Frequency)

[Chart 21 panel: Type: Bar Chart, per Tool. Values as extracted: 40, 11, 9, 6, 6, 3, 0, 0, 0, 0. Surviving tool labels, in the order extracted: 12. Datawrapper; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The; 1. Tableau; 4. Google Fusion; 3. Many Eyes; 2. Wordle.net; Other.]

Chart 22. Type 10. Map, per most important Tools (Frequency)

[Chart 22 panel: Type: Map, per Tool. Values as extracted: 28, 15, 9, 6, 4, 2, 0, 0, 0, 0. Surviving tool labels, in the order extracted: 4. Google Fusion; Other; 1. Tableau; 12. Datawrapper; 3. Many Eyes; 2. Wordle.net.]

Chart 23. Tool 1. Tableau, per most important Visualisation Types (Frequency)

[Chart 23 panel: Tool: Tableau, per Type. Values as extracted: 6, 2, 2, 2, 1, 1, 1, 0, 0, 0.]

Chart 24. Tool 4. Google Fusion, per most important Visualisation Types (Frequency)

[Chart 24 panel: Tool: Google Fusion, per Type. Values as extracted: 28, 3, 2, 0, 0, 0, 0, 0, 0, 0.]

Chart 25. Tool 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian, per most important Visualisation Types (Frequency)

[Chart 25 panel: Tool: Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian, per Type. Values as extracted: 22, 14, 9, 6, 5, 4, 3, 2, 0, 0.]

Chart 26. Tool 11. Graphic from External Source, per most important Visualisation Types (Frequency)

[Chart 26 panel: Tool: Graphic from External Source, per Type. Values as extracted: 23, 22, 15, 12, 9, 7, 5, 4, 0, 0.]

Chart 27. Tool 12. Datawrapper, per most important Visualisation Types (Frequency)

Summarising:

The results of the content analysis revealed that, on average, two visualisations are found in the articles of The Guardian Data Store, while almost three quarters of the articles offered at least the data set. One can notice that specific tools are used more frequently for certain visualisation types, while some visualisation types are more often chosen for articles of specific subject categories. While the sample is too small to reveal correlations, it can show some tendencies in the preference for types and tools, not only by specific authors, but also for specific subjects. More than 50 tables and 100 charts of more detailed analysis can be found in Appendix 4, while the entire Excel file with the spreadsheet of the systematic content analysis and all charts and tables in larger size can be downloaded from https://copy.com/xOsRJcSR1wwL .

[Chart 27 panel: Tool: Datawrapper, per Type. Values as extracted: 40, 13, 2, 0, 0, 0, 0, 0, 0, 0.]

4.2 Research Question 2:

§ Which is the creative process behind an infographic created and/or hosted by The Guardian Data Store? In more detail:

o Which is the creative process step by step and who are the decision makers?

o Which data types are the most broadly used and how is data selected and gathered?

o Which are the most important tools used in the process either for data processing and analysis or for visualisation?


4.2.1 Objective: To identify:

§ The various tools used either in data analysis (and possibly formulation / editing) or in visualisation

4.2.2 Theme 1: Data Sources, Data Gathering and Processing, Data Visualisation: Workflow, Tools and Decision Making

4.2.2.1 Findings:

The interviewees mentioned that they usually try to find data from official sources, such as data issued by governments, National Statistics Institutes, organisations and sometimes agencies, as well as through scraping or personal communication. The use of freely available data is important, as the work might need to be reproduced and it reinforces the "open data movement". Mr Ottaviani chooses the data he uses by keeping the data that answers the research question he has set, while for Mrs Evans the process is more intuitive. Mr Bradshaw tries to strip the story down to its core details and its background, applies basic journalism rules and is disciplined in the data analysis.

All three interviewees mentioned that it is usually the topic, the current news headlines or a hypothesis that leads to the search for available relevant data, if possible from multiple sources, which could give a story worth telling. However, there can be cases where the data comes first and, through a quick examination of it, the journalist sees a story in it. For example, for his article "Data journalism in Italy: how did 1,000 prisoners die?" (2012), Mr Ottaviani first decided to work on the topic because he felt morally obliged to let people know about what was happening in Italian prisons, and then set the research questions: who dies, where and how. He gathered the relevant data and visualised it on an interactive map that showed the deaths per prison all over Italy.

The usual workflow for producing a data-driven publication is:

1. Identify the topic or the hypothesis

2. Search for available data sources, primarily official ones but sometimes through scraping as well, depending on the topic

3. Evaluate the data sets and pick the most relevant

4. Clean, combine or merge them to produce a simpler data set that reveals a story and can be visualised

5. Find additional sources of information that can focus the story on how it affects people

6. Decide on the relevant visualisation

7. Write the final story and provide a data set

8. Publish the story
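Step 4 of the workflow above (cleaning data sets and combining or merging them into a simpler one) can be illustrated with a minimal Python sketch. It is not taken from the interviewees' own practice: the two tables, the region names, all the numbers and the helper names (`read_rows`, `clean_key`, `merge_sources`) are invented for illustration, and only the Python standard library is assumed.

```python
import csv
import io

# Two tiny, invented "source" tables standing in for downloaded data sets.
# Note the inconsistent spelling and spacing of the region names.
COUNTS_CSV = """region,incidents
 North ,12
south,30
East,7
"""

POPULATION_CSV = """region,population
North,400000
South,1500000
West,250000
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_key(name):
    """Normalise a region name so both sources use the same spelling."""
    return name.strip().title()


def merge_sources(counts_rows, population_rows):
    """Join the two tables on region and derive a rate per 100,000 people.

    Regions missing from either source are dropped, so the result only
    contains rows that can actually be compared.
    """
    population = {
        clean_key(r["region"]): int(r["population"]) for r in population_rows
    }
    merged = []
    for row in counts_rows:
        region = clean_key(row["region"])
        if region not in population:
            continue
        incidents = int(row["incidents"])
        rate = round(incidents / population[region] * 100_000, 1)
        merged.append(
            {"region": region, "incidents": incidents, "rate_per_100k": rate}
        )
    # Sort so that the highest rate comes first.
    return sorted(merged, key=lambda r: r["rate_per_100k"], reverse=True)


simplified = merge_sources(read_rows(COUNTS_CSV), read_rows(POPULATION_CSV))
for row in simplified:
    print(row)
```

The resulting list is the kind of "simpler data set" that could then be passed to a charting or mapping tool and published alongside the story.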

The decision on the visualisation type to use depends, for Mr Ottaviani, on the nature of the story; for Mrs Evans it comes from cooperation with the graphics team and from applying basic charting principles; and Mr Bradshaw usually bases his decision on the approach suggested by the A. Abela3 (2009) Chart Chooser. Mr Ottaviani mentions that he prefers interactive visualisation because, if it is simple to use, it involves the users more. Mrs Evans' decision on using static or dynamic visualisation depends on the story, while Mr Bradshaw tends to prefer static visualisation, although he says that this also depends on the level of interactivity that could be applied in each case.

The data scraping, cleaning, analysis and visualisation tools most preferred by the interviewees are: Scraperwiki, Google Refine, Google Fusion Tables, Google Docs, Excel, Datawrapper, BatchGeo, Leaflet, Tableau, Adobe's Photoshop and Illustrator, and scripting and programming languages such as JavaScript and Python. Lisa Evans mentioned that in the Data Store, where she had worked for three years, they preferred the various Google tools, Datawrapper, and the Adobe products for designing various other illustrations.

3 http://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf


Finally, the interviewees gave their opinion on what Big Data is. Mr Ottaviani defined it as "Data that cannot be elaborated by a single computer but by multiple parallel computers and with the use of algorithms". Mr Bradshaw did not provide a concrete definition, as he believes that it is not a practically useful term. He mentioned that very big data sets existed before and that what has changed, in his view, for this term to emerge is that data is now seen in a qualitatively different way.

4.2.2.2 Discussion:

Mr Ottaviani, Mrs Evans and Mr Bradshaw gather their data from open government data and open data publications of various official institutes and organisations, but also through scraping and crowdsourcing. In line with the description by Leimdorfer & Thereaux (2012) of what using open data means for journalists, the interviewees re-use, clean, combine and reshape the available data sets in order to see if there is a story worth telling and to provide their readers with a clearer and simpler data set. The creative workflow they revealed is to a very great degree similar to that of Aitamurto et al. (2011), which can be found in the literature review.

The research showed that the majority of the tools used by the three interviewees for visualisation and for data scraping and analysis are commonly found in the recent bibliography (for example Barkai, 2013) as suggested tools. Additionally, many of them, such as Google Fusion Tables, Tableau and Datawrapper, are among the ones most frequently used by The Guardian Data Store, as the findings of the quantitative study reveal.

As Murray (2013) stresses, interactive visualisation tools are used to attract and engage users more, but also to portray many levels of data and information at once, something that static visualisation types fail to do. The examination of The Guardian's Data Store portfolio revealed that the use of interactive visualisation is high and that in the majority of the articles it is the first visualisation provided, which strengthens its role in attracting and engaging readers.


4.3 Research Question 3:

§ How do journalism professionals perceive data and information visualisation in terms of value and effectiveness?


§ How data and information visualisation is used in journalism and why is its

use constantly increasing

§ Which are the required skills and knowledge in order to work on data

visualisation on a professional level

§ Which is the importance of data journalism and data visualisation as

perceived by the professionals

4.3.2 Theme 2: Data Journalism and Data Visualisation: Importance, Reasons for Increased Interest, Impact in Journalism, Required Professional Skills

4.3.2.1 Findings:

Data-driven journalism and data visualisation are growing in importance, and interest in them is increasing, for a variety of reasons. The first reason, mentioned by all three interviewees, is the need for government transparency and accountability. Journalists' role is partly to hold power accountable, and data and information are forms of power used in financial and political decision-making. A second reason is that technological improvements, which have increased the use of technology in people's everyday lives, have led to a growing amount of data and information circulating online. From this data it is possible to extract interesting stories for the people, stories based on fact-checked data and statistics. The final reason is the advantage of speed that online information offers to the reader.

Data and information visualisation, in particular, enhances the advantage of speed. Data visualisation communicates quickly, simply and intuitively ideas and concepts that would otherwise be difficult to explain or could have been ignored. It also attracts attention in a way similar to that of a headline. Another advantage of visualisation, according to Mr Bradshaw, is that it broadens the range of people that a story can have an impact on, as it appeals to non-textual people.


How well readers understand the visualisations in articles and the message they communicate depends on the quality of the visualisation, as Mr Ottaviani stresses. If a visualisation is well made and, for instance, compares scales and sizes to elements that people are familiar with, it is better understood. Additionally, a good visualisation can reveal patterns that would otherwise not be obvious, and it is always interesting to see how people find stories in a visualisation. Mrs Evans mentions that a very good source of feedback about the way a visualisation was perceived by people is their comments under the article. In general, Mrs Evans believes that people like visualisation because it is less time-consuming, they can explore it themselves, they like getting the bigger picture on data released by governments, and they enjoy a story that is nicely put together. Mr Bradshaw has no feedback on people's level of understanding of visualisation; he highlights that there is no evidence that articles with visualisation are better understood by readers, and that what can be measured instead is the impact of the story and the scale of its reproduction by other media.

The impact of data journalism and data visualisation on journalism, according to Mr Ottaviani, is that they brought the question of fact-checking to the centre. It is a matter of truth to base opinion on something scientifically provable. They enable journalists to push governments for transparency and to open their archives, which hold a lot of interesting data that governments might nevertheless hesitate to release because it would put them in a bad position and cause controversy. For Mrs Evans, data journalism has had an impact on the profession and on the role of data journalists in the broader team of journalists. It has resulted in data journalists gradually becoming more respected as journalists in general and no longer being considered just a part of the graphics team.

Mr Bradshaw believes that all the changes that led to the flourishing of data journalism and visualisation are consequences of a rapidly changing information environment in general. It is the way advertising is measured that changes the information environment. Journalists are challenged to increase traffic to the media they work for or cooperate with, and there is pressure on them to publish more. Moreover, specialised media and bloggers, who might have a deep knowledge of a subject, can easily identify mistakes in a story, and that brings more pressure for factual accuracy.

A data journalist has to combine a minimum set of skills and to have a basic knowledge, or at least an understanding, of many fields. Mr Ottaviani specifically suggests some knowledge of web technologies and programming, initially HTML and CSS and then JavaScript or Python, along with basic statistics, design, social media, advanced Excel (macros and pivot tables), some database management such as MySQL and, of course, the basic principles of journalism, such as ethics. Mr Bradshaw stresses the importance of having "an eye for a story" that might be hidden in data, and of being able to analyse data and communicate results effectively, in a way that connects them to human stories. For Mrs Evans it is very important to ask experts in a specific field, as data journalists bridge the gap between specialised fields rather than being experts in all of them.

At team level, an ideal data journalism and visualisation team would include about four people with different specialities: a programmer, who would do the data scraping; a good graphic designer who also has knowledge of interactive visualisation; one or two journalists; and ideally a statistician or someone with a background in mathematics. All those people, however, would need to know a bit of every field in order to understand each other and exchange ideas. Mr Ottaviani finds this overlapping of different skills very interesting.

4.3.2.2 Discussion:

Smiciklas (2012) mentioned that the power of data visualisation is to allow viewers to see "insights" that would not have been visible to them if they had only been provided with the numerical data. The interviewees similarly explain that data visualisation communicates concepts and messages that would otherwise be difficult to explain or could have been ignored. Tufte (2001), in his description of Graphical Excellence, set as one of its prerequisites the communication of the message in the shortest time. All interviewees in the research mentioned speed as another very important quality of data visualisation.

In the literature review, Cohen, Hamilton and Turner (2011) referred to data journalism as the "weapon" that people and journalists can use to hold politicians and governments accountable. According to the interviewees, this is the main reason that the importance and use of data journalism and data visualisation are constantly growing. As Diana Priest (McGhee, 2010) pointed out, data journalism is a branch of the broader field of investigative journalism, and facts and the truth are what all investigators look for. With data journalism, as Mr Ottaviani said, fact-checking is at the centre of attention.

In the project "Journalism in the Age of Data" (McGhee, 2010), several professionals mentioned as a fundamental skill in data journalism the ability to cooperate with people of different expertise and skills, by having a basic understanding of the other fields. This overlapping of fields was highlighted by the interviewees as very important and inevitable, since one person cannot do it all and would eventually need to seek the help of an expert.

4.4 Research Question 4:

§ Which are the possible weaknesses, limitations and the negative aspects of data and information visualisation?


§ Which are the possible limitations, weaknesses and negative aspects or

impact of data journalism and information visualisation

4.4.2 Theme 3: Weaknesses, Limitations, Negative Aspects and Dangers of Data Journalism and Data Visualisation

4.4.2.1 Findings:

The drawbacks of data visualisation, according to Mr Bradshaw, are that it can oversimplify, or lose subtleties or complexities of the story. For this reason it should always be accompanied by other information. Like any form of communication it can be misleading, and the way to avoid this is to apply the ethical considerations that accompany top journalism: to strive to be accurate, to provide context and not to misrepresent4.

Mrs Evans says that the best way to avoid this is to always publish the data set with

it, considering that if something is wrong or an important aspect is ignored, people

will comment on it. As she notes, it is very easy for something to go wrong when an

infographic is created, since there are too many decisions to be taken, even in terms

of design. She believes that the most difficult visualisation type for people to

understand is complex networks.

In Mr Ottaviani's opinion, the risk with data journalism in general and, consequently, with data visualisation, is that journalists report news to people in a quantitative form and might fail to give them something that creates an emotional response. Mr Ottaviani's philosophy is to "give numbers an identity". Additionally, journalists need to overcome possible prejudices they might have and clearly present the facts and the context, even if these contradict what they had believed until then. Finally, he believes that data visualisation can be misleading and provided a link5 to a webpage that highlights examples of bad visualisation.

Other important issues that need to be considered in data journalism are the need to respect copyright and database rights and the need not to publish something that would break the law or violate people's privacy, as Mr Bradshaw comments. Mr Ottaviani agrees that, especially after the example of Wikileaks, journalists need to be careful not to expose people to danger or harmful situations by publishing their personal information. Mrs Evans stresses the importance of being straightforward with people and corporations, when they give their data, about its reproduction and publication, so that they know the potential consequences.

4 When Mr Bradshaw was asked for examples of bad visualisation, he provided the following link with bookmarks of bad visualisation: http://pinboard.in/u:paulbradshaw/t:badvis

5 When Mr Ottaviani was asked for examples of bad visualisation, he provided the following link with examples of bad visualisation: http://flowingdata.com/category/statistics/mistaken-data/


4.4.2.2 Discussion:

The danger of misleading and inaccurate visualisation is stressed in the literature review by Ward et al. (2010). A visualisation can be misleading when it is inaccurate or when design aspects of it, such as scales and lines, are wrong, disproportionate, unclear or too complicated. Mrs Evans believes that what leads to a misleading visualisation is, apart from what could fail in the design, generally the wrong way in which the data is approached.

The great risk of data journalism, though, is that journalists just provide the numerical data to the people without telling a story that connects with them. They must not forget that they need to search for the human side of the story (Oliver, 2010). Likewise, Mr Ottaviani's philosophy is to "give an identity to the numbers".

Paul Steiger from ProPublica expressed his concern about how far accessibility and openness can go (McGhee, 2010). After the Wikileaks incident, in which people's identities and other private information were leaked, Mr Ottaviani stresses that journalists must be very careful about what they publish or reproduce.

4.4.3 Theme 4: Future Prospects and Challenges of Data Journalism and Data Visualisation

4.4.3.1 Findings:

Three different perspectives were given by the interviewees about the future prospects and challenges of data journalism and data visualisation. Mr Ottaviani believes that they will spread further, since digital media offer opportunities that print media do not. Since, in his view, paper will mostly disappear in a few decades, online media offer the possibility to expand. Additionally, stories are easily shared online and readers are more involved. They can interact with other readers, they can fact-check and they can even participate in building data sets and creating stories. This is a very interesting side of data journalism and data visualisation to keep an eye on.


Mrs Evans believes that the future will bring better and more sophisticated tools and hopes that more people will come to data journalism, especially people skilled both in statistics and in understanding what is useful for readers.

Mr Bradshaw foresees a conflict about the kind of information journalists will be

seeking and that people in power will not want to make available. He also believes

that there will be fights around Freedom Of Information laws.

According to Mr Bradshaw, journalists will become better at collecting data through scraping or leaks, as some data might not be available anywhere but online. More online data will give opportunities for the personalisation of stories, especially with the use of social media such as Facebook, since it will be easier to connect stories to specific people and to analyse human networks and connections, which historically was really hard to do.

4.4.3.2 Discussion:

Sir Tim Berners-Lee (Arthur, 2010) mentioned that "Data-driven journalism is the future" because of the endless possibilities and options in data processing, visualisation techniques, programming languages and especially open data and open government data. Mrs Evans believes that the tools in the future will be more advanced and that more skilled people will want to work in the field. People with advanced programming knowledge in particular will be needed more for data scraping in the future, as Mr Bradshaw mentions, because people and journalists might not be able to find the data set they wish for in the traditional sources.


5. Conclusion

Data journalism and data visualisation are constantly growing in importance and use. This dissertation aimed to investigate the use of data visualisation in journalism, by examining as a case study one of the most respected providers of data journalism publications, The Guardian Data Store.

Meeting Objectives: The study's objectives were to identify:

1. Possible tendencies, norms, co-relations on The Guardian’s portfolio, mainly


2. The various tools used either in data analysis (and possibly formulation /

editing) or in visualisation, and more specifically by The Guardian.

3. How data and information visualisation is used in journalism and why is its

use constantly increasing

4. Which are the required skills and knowledge in order to work on data

visualisation on a professional level

5. Which is the importance of data journalism and data visualisation as

perceived by the professionals

6. Which are the possible limitations, weaknesses and negative aspects or

impact of data journalism and information visualisation

Evaluation of Methodology Approach

The first methodological approach followed in the study was a Systematic Content Analysis of a sample of data-driven articles published by The Guardian Data Store. This quantitative method was the most suitable for meeting the first and part of the second objective: the identification of possible tendencies, norms or co-relations between the subject category of the articles, the different visualisation tools and the various visualisation types. The sample, though, was relatively small compared to the total population of published data-driven articles. Additionally, there were technical limitations in the coding options of some variables, such as the author, where only the first author of each article was taken into consideration. Another limitation was the need to classify each article into only one subject category, when it was clear that some could be classified under multiple categories. To address such restrictions, a detailed coding scheme for each variable was created, with all the specifications on the classification and other relevant restrictions. The reliability of the coding scheme and the content analysis was tested online with the assistance of a second coder, whose coding was compared to that of the researcher. The reliability test showed high agreement.

The second methodological approach was the conduct of semi-structured interviews with the data-driven journalists Jacopo Ottaviani, Paul Bradshaw and Lisa Evans, who have either published on The Guardian Data Store on a freelance basis or had worked for it. The data gathered in the interviews was analysed using the method of thematic analysis. This qualitative method was considered the most adequate to help meet objectives three to six and partly objective two (mostly through the interview with Mrs Evans, who worked on The Guardian Data Blog).
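The intercoder reliability check mentioned in the evaluation of the content analysis can be illustrated with a short Python sketch. This section does not state which coefficient was reported, so the choice of simple percent agreement and Cohen's kappa here is an assumption; the two coders' labels are invented and the function names are hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of units to which both coders assigned the same category."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must code the same non-empty set of units")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: agreement between two coders corrected for chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    # Chance agreement: probability that both coders pick the same category
    # if each assigned categories independently at their own observed rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented example: visualisation type coded for six articles by two coders.
researcher = ["bar chart", "map", "map", "table", "bar chart", "map"]
second_coder = ["bar chart", "map", "table", "table", "bar chart", "map"]

print(percent_agreement(researcher, second_coder))
print(cohens_kappa(researcher, second_coder))
```

Percent agreement is easy to read but ignores agreement that would occur by chance, which is why a chance-corrected coefficient is usually reported alongside it.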

Key Findings: The quantitative research showed that the average number of visualisations per article in The Guardian Data Store was two, while more than half of the articles featured at least one type of visualisation. A data set (or a link to it) and/or a data summary was provided in most of the articles. The main authors by number of articles published were identified, the most frequent being Simon Rogers, who created the blog in 2009 and only recently left The Guardian. Other notable authors were Ami Sedghi, Mona Chalabi, John Burn-Murdoch, James Ball and Lisa Evans, who was also one of the professionals interviewed for the current research. The main subject categories in the sample were Politics/Government/Local Administration, Society and culture.

The visualisations were created for The Guardian by its graphics team or by freelance graphic designers, by external sources in the cases where the blog was just featuring the visualisation of another source, and, after these, with the tools Datawrapper, Google Fusion Tables and Tableau. The main visualisation types were bar charts, maps and interactives. The research showed a tendency to use specific tools for specific visualisation types, for example Datawrapper for most bar charts. It also showed how various tools were substituted by others over the years and how, for some subject categories, a specific type of visualisation was used more often.

The qualitative method revealed that some of the visualisation tools preferred by The Guardian Data Store, such as Datawrapper, Google Fusion Tables and Tableau, were also among those preferred by the interviewees. The creative workflow of a publication, from the original conception of the topic to the final publication, was described, with special insight into decision-making, the main data sources (such as open government data or open data publications) and data gathering, processing and visualisation. The reasons for which data journalism and data visualisation are growing in use and importance were examined. For data journalism, the main reasons are the pressure for more transparency and government accountability. For data visualisation, the main reason is its ability to communicate quickly the story that the data represents, often through interactive visualisations, which engage users more. The research highlighted the possible dangers and risks of data journalism and visualisation, such as the misleading representation of facts and figures or the failure to bring into the story a human dimension that would interest people. Finally, it explored possible future prospects and challenges, such as the need for more data scraping, as the most interesting data, which would create a more unique story, may not be found in conventional data sets, or the possibility of personalising stories with the use of social media as a tool.


Future Research Suggestions

Many different further studies could be conducted on data visualisation and data journalism. A first alternative, or an extension of the current research, could examine how readers perceive those articles and how well they understand them, particularly their visualisations. Another option could focus on the reception of The Guardian Data Store's articles by readers active on social media. A focus on the rate at which the articles are shared and spread on social media, and on the readers' feedback on those articles through comments and replies on Facebook or Twitter, for example, would be very interesting.

Word Count: 14893.


Bibliography

1. A Quick Illustrated History of Visualisation. (n.d.). Data Art. Retrieved July 10, 2013, from http://www.data-art.net/resources/history_of_vis.php

2. A-Z section of The Guardian Data Blog: The complete Index of Data Sets. (n.d.). Retrieved July 30, 2013, from http://www.theguardian.com/technology/page/2009/jun/17/1

3. Abela, A. (2009). Chart Suggestions—A Thought-Starter. Retrieved from http://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf

4. Adobe Creative Cloud. (n.d.). Adobe. Retrieved July 20, 2013, from http://www.adobe.com

5. Aitamurto, T., Sirkkunen, E., & Lehtonen, P. (2011). Trends In Data Journalism.

6. Arthur, C. (2010, November 22). Analysing data is the future for journalists, says Tim Berners-Lee. The Guardian. Retrieved April 30, 2013, from http://www.guardian.co.uk/media/2010/nov/22/data-analysis-tim-berners-lee

7. Ball, J. (2013). Can you trust an infographic? The Guardian. Retrieved from http://www.guardian.co.uk/media/shortcuts/2013/jan/09/can-you-trust-an-infographic

8. Barkai, M. (2013). Data Visualisation Tools and Trends to Watch: An Interview with Datavisualisation.ch. Data Driven Journalism.

9. BatchGeo. (n.d.). BatchGeo LLC. Retrieved July 10, 2013, from http://www.batchgeo.com

10. Big data needn’t be a big headache: How to tackle mind-blowing amounts of information. (2012). Strategic Direction, 28(8), 22–24. doi:10.1108/02580541211249583

11. Bounegru, L. (2013). Slides, Tools and Other Resources From the School of Data Journalism 2013. Data Driven Journalism. Retrieved April 30, 2013, from http://datadrivenjournalism.net/news_and_analysis/slides_tools_and_other_resources_from_the_school_of_data_journalism_2013

12. Bounford, T. (2000). Digital Diagrams. (P. Leek, Ed.) (1st ed., pp. 38–107, 118–119). London, UK: Cassel & Co.

13. Bradshaw, P. (n.d.). Online Journalism Blog. Retrieved from http://onlinejournalismblog.com

14. Bradshaw, P. (2010). How to be a data journalist. The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/news/datablog/2010/oct/01/data-journalism-how-to-guide

15. Bradshaw, P. (2012a). Olympic torch relay places - How were they allocated? Get the data. The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/sport/datablog/2012/jul/26/olympic-torch-relay-places


16. Bradshaw, P. (2012b). 2012 Olympics investigation: The story behind the olympic sponsors. The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/news/datablog/2012/jun/06/olympics-2012-investigation

17. Bradshaw, P. (2012c). Who are the mystery Olympic torchbearers? Get the data. The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/sport/datablog/2012/jul/11/2012-olympic-torch-relay-torchbearers-sponsors-data

18. Bradshaw, P. (2013). Council spending on the Olympic torch relay: Where did the money go? The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/news/datablog/2013/mar/06/council-spending-olympic-torch-relay-where-did-money-go

19. Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. Retrieved from http://dx.doi.org/10.1191/1478088706qp063oa

20. Buhl, H. U., Röglinger, M., Moser, F., & Heidemann, J. (2013). Big Data: A Fashionable Topic with(out) Sustainable Relevance for Research and Practice? Business & Information Systems Engineering, 5(2), 65–69. doi:10.1007/s12599-013-0249-5

21. Burns, R. B. (2000). Introduction to Research Methods (4th ed., pp. 391–392, 423–435). London, UK: Sage Publications Ltd.

22. Cairo, A. (2012). Infographics and Visualizations as Tools For the Mind. Visual.ly Blog. Retrieved from http://blog.visual.ly/infographics-and-visualizations-as-tools-for-the-mind/

23. CartoDB: Geospatial Data Visualisation. (n.d.). CartoDB. Retrieved June 20, 2013, from http://cartodb.com

24. Chabot, C. (2009). Graphically Speaking Demystifying Visual Analytics. IEEE Computer Graphics and Applications, 29(2), 84–87. Retrieved from http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4797520

25. Cohen, S., Hamilton, J. T., & Turner, F. (2011). Computational journalism. Communications of the ACM, 54(10), 66–71. doi:10.1145/2001269.2001288

26. Crawford, K. (2013). The Hidden Biases in Big Data. Harvard Business Review. Retrieved June 25, 2013, from http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html

27. D3.js - Data Driven Documents. (n.d.). D3.js. Retrieved August 10, 2013, from http://d3js.org

28. Data Visualisation - Selected tools. (2013). Retrieved May 16, 2013, from http://selection.datavisualization.ch

29. Datawrapper Software. (n.d.). Retrieved June 12, 2013, from http://datawrapper.de


30. Davies, T. (2010). Open data, democracy and public sector. Retrieved from

http://www.academia.edu/988533/Open_Data_Democracy_and_Public_Sector_Reform

31. Dawson, C. (2009). Introduction to Research Methods: A practical guide for anyone

underatking a research project (Fourth., pp. 115–116). Oxford, UK: How To Content. Retrieved from https://www.dawsonera.com/abstract/9781848033429

32. Devi, N. B. (2009). Qualitative and Quantitative Methods in Libraries, International

Conference. In Understanding the Qualitative and Quantitative Methods in The Context of Content Analysis (pp. 1–10). Chania Crete, Greece.

33. Domokos, J., & Evans, L. (2011). Jobcentres “tricking” people out of benefits to cut

costs, says whistleblower. The Guardian. Retrieved August 03, 2013, from

http://www.theguardian.com/politics/2011/apr/01/jobcentres-tricking-people-benefit-sanctions

34. Entry-level tools Online visualisations. (2012). NetMagazine.com. Retrieved July 05,

2013, from http://www.netmagazine.com/features/top-20-data-visualisation-tools 35. Flew, T., Spurgeon, C., Daniel, A., & Swift, A. (2012). The Promise of Computational

Journalism. Journalism Practice, 6(2), 157–171. doi:10.1080/17512786.2011.616655

36. Fogg, A. (2013). Immigration, crime, benefits: Everything you know about the state of the nation is wrong. The Independent. Retrieved July 15, 2013, from http://www.independent.co.uk/voices/comment/immigration-crime-benefits-everything-you-know-about-the-state-of-the-nation-is-wrong-8697574.html

37. Free Our Data Campaign. (n.d.). The Guardian. Retrieved July 20, 2013, from http://www.freeourdata.org.uk

38. Freelon, D. (n.d.). ReCal: reliability calculation for the masses. Retrieved August 18, 2013, from http://dfreelon.org/utils/recalfront/

39. Freelon, D. G. (2010). ReCal: Intercoder Reliability Calculation as a Web Service. International Journal of Internet Science, 5(1), 20–33. Retrieved from

http://www.ijis.net/ijis5_1/ijis5_1_freelon.pdf

40. Friendly, M., & Denis, D. J. (2001). Milestones in the history of thematic cartography, statistical graphics, and data visualization. Retrieved from

http://www.datavis.ca/milestones/

41. Giardina, M., & Medina, P. (2012). Information Graphics Design Challenges and Workflow Management. In International Conference on Communication, Media, Technology and Design (pp. 246–252). Istanbul, Turkey. Retrieved from http://www.cmdconf.net/2012/makale/46.pdf

42. Google Fusion Tables Experimental Application. (n.d.). Google Research. Retrieved June 12, 2013, from https://support.google.com/fusiontables/?hl=en#topic=1652595

43. Google Refine. (n.d.). Google. Retrieved June 12, 2013, from http://code.google.com/p/google-refine/

44. Graziano, A. M., & Raulin, M. L. (2012). Research Methods: A process of Inquiry (8th ed., pp. 136, 320). New Jersey: Pearson Academic Computing.

45. Grey, J., Chambers, L., & Bounegru, L. (2012). The data journalism handbook: How Journalists Can Use Data to Improve the News (p. 242). O’Reilly Media. Retrieved from http://datajournalismhandbook.org

46. Halevy, A., & McGregor, S. (2012). Data Management for Journalism. Retrieved from ftp://ftp.research.microsoft.com/pub/debull/A12sept/journal.pdf

47. Hannabuss, S. (1996). Feature article Research interviews. New Library World, 97(1129), 22–30. doi:10.1108/03074809610122881

48. Heer, J., Bostock, M., & Ogievetsky, V. (2010). A tour through the Visualization zoo. Communications of the ACM, 53(6), 59–67. doi:10.1145/1743546

49. Hohl, M. (2011). From abstract to actual: art and designer-like enquiries into data visualisation. Kybernetes, 40(7/8), 1038–1044. doi:10.1108/03684921111160278

50. Hox, J. J., & Boeije, H. R. (2005). Data collection: Primary vs. Secondary. In Encyclopedia of Social Measurement. Elsevier Inc. Retrieved from http://igitur-archive.library.uu.nl/fss/2007-1113-200953/hox_05_data collection,primary versus secondary.pdf

51. Jacopo Ottaviani’s Blog, at “Il Fatto Quotidiano.” (n.d.). Il Fatto Quotidiano. Retrieved July 08, 2013, from http://www.ilfattoquotidiano.it/blog/jottaviani/

52. Jacopo Ottaviani’s Profile at The Guardian. (n.d.). The Guardian. Retrieved July 08, 2013, from http://www.theguardian.com/profile/jacopo-ottaviani

53. Joel, G. (2011). #ijf11: The key term in open data? It’s “re-use”, says Jonathan Gray. Journalism.co.uk. Retrieved June 10, 2013, from http://blogs.journalism.co.uk/2011/04/18/ijf11-the-key-term-in-open-data-its-re-use-says-jonathan-gray/

54. Kramer de Oliveira Barros, R., & Araujo Bertoti, G. (2012). An Information Visualization Tool for Data Journalism. In IHC 2012 Companion Proceedings (pp.

41–42). Cuiaba, Brazil. Retrieved from http://dl.acm.org/citation.cfm?id=2400094

55. Krippendorff, K. (2004). Reliability in Content Analysis: Some Common Misconceptions and Recommendations. Human Communication Research, 30(3),

411–433. doi:10.1093/hcr/30.3.411

56. Krippendorff, K. (2003). Content Analysis: An Introduction to Its Methodology (2nd ed., pp. 18–43, 81–96). London, UK: Sage Publications Inc.

57. Kronenburg, T. (2011). Data Journalism Fuelling PSI Re-use, Topic Report No.2011/2. Retrieved from http://epsiplatform.eu/sites/default/files/Topic Report Data Journalism.pdf

58. Kvale, S. (2007). Doing Interviews (pp. x, xi, 46–47, 84–109). London, UK: Sage Publications Ltd.

59. Landman, C. (2013). Data | Visualization | Art? MastersOfMedia.hum.uva.nl. Retrieved May 10, 2013, from http://mastersofmedia.hum.uva.nl/2013/03/13/data-visualization-art/

60. Leimdorfer, A., & Thereaux, O. (2012). How open data is redefining the roles of the journalist, audience and publisher. In Using Open Data: Policy modeling, citizen empowerment, data journalism. Brussels. Retrieved from http://www.w3.org/2012/06/pmod/pmod2012_submission_9.pdf

61. Lima, M. (2011). Visual Complexity: Mapping Patterns of Information (pp. 158–219). New York: Princeton Architectural Press.

62. Lisa Evans’ Personal Web Page. (n.d.). Retrieved July 15, 2013, from http://objectgroup.org/

63. Lisa Evans’ Profile at The Guardian. (n.d.). The Guardian. Retrieved July 15, 2013, from http://www.theguardian.com/profile/lisaevans

64. Losowsky, A., Duenes, S., Corbineau, A., Kleiner, C., Grundy, P., Schwochow, J., & Franchi, F. (2011). Visual Storytelling: Inspiring a New Visual Language (pp. 24–31). Berlin: Gestalten.

65. Lotan, G., Ananny, M., Gaffney, D., & Boyd, D. (2011). The Revolutions Were Tweeted: Information Flows During the 2011 Tunisian and Egyptian Revolutions. International Journal of Communication, 5, 1375–1405. Retrieved from http://ijoc.org/ojs/index.php/ijoc/article/view/1246/643

66. Mahrt, M., & Scharkow, M. (2013). The Value of Big Data in Digital Media Research. Journal of Broadcasting & Electronic Media, 57(1), 20–33.

doi:10.1080/08838151.2012.761700

67. Manovich, L. (2011). What is visualisation? Visual Studies, 26(1), 36–49. doi:10.1080/1472586X.2011.548488

68. ManyEyes Visualisation Experiment. (n.d.). IBM. Retrieved June 10, 2013, from http://www-958.ibm.com/software/analytics/manyeyes/

69. Marshall, S. (2012). PPAdigital: Paul Bradshaw’s five principles of data management. Journalism.co.uk. Retrieved July 10, 2013, from http://blogs.journalism.co.uk/2012/09/26/ppadigital-paul-bradshaws-five-principles-of-data-management/

70. Marshall, S. (2013). How big data is changing financial journalism. Journalism.co.uk. Retrieved from http://www.journalism.co.uk/news/-hhldn-how-big-data-is-changing-financial-journalism/s2/a551791/

71. Mayring, P. (2000). Qualitative Content Analysis Basic Ideas of Content Analysis. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 1(2). Retrieved from http://www.utsc.utoronto.ca/~kmacd/IDSC10/Readings/text analysis/CA.pdf

72. McGhee, G. (Writer, Producer). (2010, September 23). "Journalism in the Age of Data" [Web Video]. Retrieved from http://t.co/7ViPzDAywj

73. McNeil, P., & Chapman, S. (2005). Research Methods (3rd ed., pp. 9–24, 59–67, 161–154). New York: Routledge.

74. Mol, L. (2011). The Potential Role for Infographics in Science Communication. Vrije Universiteit Amsterdam. Retrieved from http://www.sg.uu.nl/academie/infographics/Laura Mol Master Thesis SC Final-small.pdf

75. Murray, S. (2013). Interactive Data Visualisation for the web. (M. Blanchette, Ed.) (p. 2). O’Reilly Media, Inc.

76. Oliver, L. (2010). UK government’s open data plans will benefit local and national journalists. Journalism.co.uk. Retrieved July 12, 2013.

77. Open Refine. (n.d.). GitHub. Retrieved July 25, 2013, from http://openrefine.org

78. Ostergren, M., Hemsley, J., Belarde-lewis, M., Walker, S., & Hall, M. G. (2011). A

vision for Information Visualization in Information Science. In iConference ’11 Proceedings of the 2011 (pp. 531–537). doi:10.1145/1940761.1940834

79. Ottaviani, J. (2012). Data journalism in Italy: how did 1,000 prisoners die? The

Guardian. Retrieved July 15, 2013, from

http://www.theguardian.com/news/datablog/2012/may/23/italian-prisoners-deaths

80. Parasie, S., & Dagiral, E. (2012). Data-driven journalism and the public good: “Computer-assisted-reporters” and “programmer-journalists” in Chicago. New

Media & Society. doi:10.1177/1461444812463345

81. Paul Bradshaw’s Collection of Bad Visualisation Examples. (n.d.). Retrieved August 17, 2013, from http://pinboard.in/u:paulbradshaw/t:badvis

82. Paul Bradshaw’s Profile at The Guardian. (n.d.). The Guardian. Retrieved July 15, 2013, from http://www.theguardian.com/profile/paul-bradshaw

83. Prasad, B. D. (2008). Content Analysis: A method in Social Science Research. In Research Methods for Social Work (pp. 174–193). New Delhi: Rawat Publications. Retrieved from http://www.css.ac.in/download/deviprasad/content analysis. a method of social science research.pdf

84. Prezi Virtual Presentation Whiteboard. (n.d.). Prezi. Retrieved June 20, 2013, from http://prezi.com

85. Rogers, S. (2010). “One hell of a spreadsheet”: turning 90,000 rows of WikiLeaks data into a story. Journalism.co.uk. Retrieved July 15, 2013, from http://www.journalism.co.uk/news-features/-039-one-hell-of-a-spreadsheet-039--turning-90-000-rows-of-wikileaks-data-into-a-story/s5/a540109/

86. Rogers, S. (2011). Data visualisation: in defence of bad graphics. The Guardian Data

Blog. Retrieved June 10, 2013, from

http://www.theguardian.com/news/datablog/2011/oct/17/data-visualisation-visualization

87. Rogers, S. (2013). Facts Are Sacred: The Power of Data (1st ed., p. 309). London, UK: Faber and Faber Limited, Guardian Books.

88. Rugg, G. (2007). Using Statistics: A Gentle Introduction (pp. 25–52). New York: Open University Press, McGraw-Hill Education.

89. Salmons, J. (2010). Online Interviews in Real Time (pp. 38–71). London, UK: Sage Publications Inc.

90. Schaap, J. (2012). Cultural Bias in Data Visualization. Masters of Media, New Media & Digital Culture M.A., University of Amsterdam. Retrieved July 10, 2013, from http://mastersofmedia.hum.uva.nl/2012/03/28/cultural-bias-in-data-visualization/

91. ScraperWiki. (n.d.). ScraperWiki. Retrieved June 10, 2013, from https://scraperwiki.com

92. Segel, E., & Heer, J. (2010). Narrative visualization: telling stories with data. IEEE Transactions on Visualization and Computer Graphics, 16(6), 1139–1148. doi:10.1109/TVCG.2010.179

93. Smiciklas, M. (2012). The Power of Infographics (pp. 21–34). U.S.A.: Que.

94. Smolan, R., & Erwitt, J. (2012). The Human Face of Big Data (pp. 14–15, 136–157).

Sausalito, California: Against All Odds Productions.

95. Stolte, Y. (2012). Journalism and Access to Data: The Phone Hacking Scandal, WikiLeaks. Datenschutz und Datensicherheit, 5, 354–358. Retrieved from

http://link.springer.com/content/pdf/10.1007%2Fs11623-012-0134-2.pdf

96. Tableau Software. (n.d.). Tableau Software. Retrieved June 10, 2013, from http://www.tableausoftware.com

97. The New York Times: Multimedia. (n.d.). The New York Times. Retrieved August 10, 2013, from http://www.nytimes.com/pages/multimedia/

98. Thomas, R. M. (2003). Data-Collection Processes and Instruments. In Blending Qualitative & Quantitative Research Methods in Theses and Dissertations (pp. 57–75). Sage Publications, Inc. doi:10.4135/9781412983525

99. Top Ten Tools for Data Journalism. (2013). Interhacktives.com. Retrieved June 10,

2013, from http://www.interhacktives.com/2013/05/10/top-ten-tools-for-data-journalism/

100. Tufte, E. R. (2001). The Visual Display of Quantitative Information (pp. 13–77). Cheshire, Connecticut: Graphics Press LLC.

101. United States General Accounting Office. GAO. (1989). Content Analysis: A Methodology for Structuring and Analyzing Written Material (pp. 1–31). Retrieved from http://archive.gao.gov/d48t13/138426.pdf

102. Vis, F. (2012). Actor Types code frame. Retrieved from

http://researchingsocialmedia.files.wordpress.com/2012/01/actor-types-code-frame3.pdf

103. Walliman, N. S. R. (2011). Research Methods: The Basics (pp. 15–29, 63–113).

London, UK: Taylor & Francis Routledge. Retrieved from https://www.dawsonera.com/abstract/9780203836071

104. Ward, M., Grinstein, G., & Keim, D. (2010). Interactive Data Visualisation:

Foundations, Techniques, and Applications (pp. 130–148, 365–374). Natick, MA: A K Peters, Ltd.

105. Weber, W., & Rall, H. (2012). Data Visualization in Online Journalism and Its Implications for the Production Process. In 2012 16th International Conference on Information Visualisation (pp. 349–356). IEEE. doi:10.1109/IV.2012.65

106. Wong, D. M. (2010). The Wall Street Journal: Guide to Information Graphics (p.

143). New York: W. W. Norton & Company.

107. Wordle. (n.d.). Wordle. Retrieved July 12, 2013, from http://www.wordle.net

108. Yau, N. (n.d.). Mistaken Data. Flowing Data. Retrieved July 15, 2013, from http://flowingdata.com/category/statistics/mistaken-data/

Appendices

Appendix 1: Ethical (Application, Consent Form, Approval)

The University of Sheffield. Proposal for Information School Research Ethics Review

Students Staff This proposal submitted by: This proposal is for: Undergraduate Specific research project X Postgraduate (Taught) – PGT Generic research project Postgraduate (Research) – PGR This project is funded by:

Project Title: "Infographics: Data and Information Visualization and its use in Journalism - A Case Study on Guardian's Data Store".

Start Date: 08/07/2013
End Date: 02/09/2013

Principal Investigator (PI): (student for supervised UG/PGT/PGR research) Charalampia Boula
Email: [email protected]

Supervisor: (if PI is a student) Farida Vis
Email: [email protected]

Indicate if the research: (put an X in front of all that apply)

§ Involves adults with mental incapacity or mental illness, or those unable to make a personal decision

§ Involves prisoners or others in custodial care (e.g. young offenders)

§ Involves children or young people aged under 18 years of age

§ Involves highly sensitive topics such as ‘race’ or ethnicity; political opinion; religious, spiritual or other beliefs; physical or mental health conditions; sexuality; abuse (child, adult); nudity and the body; criminal activities; political asylum; conflict situations; and personal violence.

Please indicate by inserting an “X” in the left hand box that you are conversant with the University’s policy on the handling of human participants and their data. X

We confirm that we have read the current version of the University of Sheffield Ethics Policy Governing Research Involving Human Participants, Personal Data and Human Tissue, as shown on the University’s research ethics website at: www.sheffield.ac.uk/ris/other/gov-ethics/ethicspolicy

Part B. Summary of the Research

B1. Briefly summarise the project’s aims and objectives: (This must be in language comprehensible to a layperson and should take no more than one-half page. Provide enough information so that the reviewer can understand the intent of the research) Summary:

Aim:

This study primarily aims to examine the role of information and data visualisation in journalism, based on an analysis of the biggest journalistic Infographics portfolio in the UK, The Guardian's "Data Store".

Objectives:

To identify:

§ How data and information visualisation is used in journalism.

§ Why its use is constantly increasing.

§ The various tools used either in data analysis (and possibly formulation / editing) or in

visualisation, more specifically by The Guardian.

§ Required skills and knowledge in order to work on data visualisation

§ Its importance, as perceived by the professionals.

§ Possible tendencies, norms and correlations in The Guardian's portfolio, mainly regarding subject, visualisation type and tools

§ Limitations, weaknesses and possible negative aspects or impact of data and information

visualisation.

B2. Methodology: Provide a broad overview of the methodology in no more than one-half page.

In-depth interviews with professionals, conducted during a visit to the offices of The Guardian and in meetings or Skype conversations with freelance professionals who have worked on some of the Infographics featured on The Guardian Data Store. The interviewed professionals may be editors, journalists, graphic designers, data analysts and/or other members of the visualisation team, who either decide on the concept and the data used or participate in the design and creation of the visualisations. The interviews will be semi-structured, because this type of interview allows flexibility while helping to maintain control of the procedure. The questions will combine open and closed formats, with a possible focused discussion and analysis of articles selected from the portfolio.

If more than one method, e.g., survey, interview, etc. is used, please respond to the questions in Section C for each method. That is, if you are using both a survey and interviews, duplicate the page and answer the questions for each method; you need not duplicate the information, and may simply indicate, “see previous section.”

C1. Briefly describe how each method will be applied

Method (e.g., survey, interview, observation, experiment): Interviews

Description – how will you apply the method? The interviews will take place either at the offices of The Guardian in London or through Skype, for participants who live abroad, work on a freelance basis or are unavailable to meet in person.

About your Participants

C2. Who will be potential participants? Among others: Simon Rogers, James Ball, Lisa Evans, Jacopo Ottaviani, Paul Bradshaw.

C3. How will the potential participants be identified and recruited? Suggested by supervisor, contacted by email.

C4. What is the potential for physical and/or psychological harm / distress to participants? None

C5. Will informed consent be obtained from the participants?

X Yes No

If Yes, please explain how informed consent will be obtained? I will obtain hand-signed consent forms from the participants I will meet in person.

If No, please explain why you need to do this, and how the participants will be de-briefed? In case the interviews are held through Skype, I will send the consent form to the interviewees by email and ask them to reply to the email stating that they have read the consent form and that, by replying, they agree to its terms and therefore give their consent to participate in the interview. Alternatively, participants who have an electronic signature can sign the form with it and send it back to me by email.

C6. Will financial / in kind payments (other than reasonable expenses and compensation for time) be offered to participants? (Indicate how much and on what basis this has been decided) No

About the Data

C7. What data will be collected? (Tick all that apply)

Print Digital Participant observation Audio recording (of face-to-face or Skype interviews)

X

Video recording (Screen recording of Skype Interviews if participants agree)

X

Computer logs Questionnaires/Surveys Other: Skype Chat or Email with questions (In case Skype interview fails)

X

Other:

C8. What measures will be put in place to ensure confidentiality of personal data, where appropriate? Both audio and/or video files of the interviews will be stored securely and no third parties will have access to the data. All interviews will be transcribed. All interviewees will be asked whether they agree to their name being mentioned in the dissertation or whether they prefer to remain anonymous and be referred to as Interviewee 1, Interviewee 2, etc. They will also be asked at the end of the interview whether they wish anything they mentioned to be omitted from the transcript.

C9. How/Where will the data be stored? The data will be stored safely in digital format on a personal computer and a personal external hard disk, and no third parties will have access to it.

C10. Will the data be stored for future re-use? If so, please explain. The data may be re-used in the future for further analysis of the subject in a possible future article publication.

About the Procedure

C11. Does your research raise any issues of personal safety for you or other researchers involved in the project (especially if taking place outside working hours or off University premises)? If so, please explain how it will be managed. The research does not raise any issues of personal safety for the researchers or me.

The University of Sheffield. Research Ethics Review
Information School

Declaration

Title of Research Project: "Infographics: Data and Information Visualization and its use in Journalism - A Case Study on Guardian's Data Store".

We confirm our responsibility to deliver the research project in accordance with the University of Sheffield’s policies and procedures, which include the University’s ‘Financial Regulations’, ‘Good Research Practice Standards’ and the ‘Ethics Policy Governing Research Involving Human Participants, Personal Data and Human Tissue’ (Ethics Policy) and, where externally funded, with the terms and conditions of the research funder.

In submitting this research ethics application form I am also confirming that:

§ The form is accurate to the best of our knowledge and belief.

§ The project will abide by the University’s Ethics Policy.

§ There is no potential material interest that may, or may appear to, impair the independence and objectivity of researchers conducting this project.

§ Subject to the research being approved, we undertake to adhere to the project protocol without unagreed deviation and to comply with any conditions set out in the letter from the University ethics reviewers notifying me of this.

§ We undertake to inform the ethics reviewers of significant changes to the protocol (by contacting our academic department’s Ethics Coordinator in the first instance).

§ We are aware of our responsibility to be up to date and comply with the requirements of the law and relevant guidelines relating to security and confidentiality of personal data, including the need to register when necessary with the appropriate Data Protection Officer (within the University the Data Protection Officer is based in CiCS).

§ We understand that the project, including research records and data, may be subject to inspection for audit purposes, if required in future.

§ We understand that personal data about us as researchers in this form will be held by those involved in the ethics review procedure (e.g. the Ethics Administrator and/or ethics reviewers) and that this will be managed according to Data Protection Act principles.

§ If this is an application for a ‘generic’ project all the individual projects that fit under the generic project are compatible with this application.

§ We understand that this project cannot be submitted for ethics approval in more than one department, and that if I wish to appeal against the decision made, this must be done through the original department.

Name of the Student (if applicable): Charalampia Boula
Name of Principal Investigator (or the Supervisor): Farida Vis
Date: 05/07/2013

The University of Sheffield. Information School

"Infographics: Data and Information Visualization and its use in Journalism - A Case Study on Guardian's Data Store".

Researchers: Charalampia Boula
Supervisor: Farida Vis

Purpose of the research
This study primarily aims to examine the role of information and data visualisation in journalism, based on an analysis of the biggest journalistic Infographics portfolio in the UK, The Guardian's "Data Store". The study's main objectives, among others, are to: 1) Investigate how data and information visualisation is used in journalism and what its importance is as perceived by the professionals, 2) Examine the various tools used either in data analysis (and possibly formulation / editing) or in visualisation, and more specifically by The Guardian, 3) Identify the skills and knowledge required to work on data visualisation, 4) Discover limitations, weaknesses and possible negative aspects or impact of data and information visualisation.

Who will be participating?
Professionals who either work for The Guardian Data Store or have in the past cooperated and published with them.

What will you be asked to do?
You will be asked to participate in an interview of approximately 30 minutes, and answer open and closed format questions.

What are the potential risks of participating?
The risks of participating are the same as those experienced in everyday life.

What data will we collect?
The interviews will be audio recorded, whether held face-to-face or through Skype, and might also be video recorded with screen recording software (if participants agree) where the interviews are conducted through Skype. All data (audio/video files) will be stored securely on a personal computer and an external hard drive, and no third parties will have access to the data.

What will we do with the data?
The data will be mainly used for the purposes of the dissertation, but it may be re-used in the future for further analysis of the subject for a possible future article publication. If a participant wishes the data from their interview to be used only for the purposes of the dissertation, they should mention this to the researcher, and the data will be deleted after the dissertation is complete.

Will my participation be confidential?
All data (audio and/or video files) of the interviews will be stored securely and no third parties will have access to the data. All interviewees will be asked whether they agree to their name being mentioned in the dissertation or whether they prefer to remain anonymous and be referred to as Interviewee 1, Interviewee 2, etc. They will also be asked at the end of the interview whether they wish anything they mentioned to be omitted from the transcript.

What will happen to the results of the research project?
The results of this study will be included in my master’s dissertation, which will be publicly available. Please contact the School in six months.

I confirm that I have read and understand the description of the research project, and that I have had an opportunity to ask questions about the project.

I understand that my participation is voluntary and that I am free to withdraw at any time without any negative consequences.

I understand that I may decline to answer any particular question or questions, or to do any of the activities. If I stop participating at any time, all of my data will be purged.

I understand that my responses will be kept strictly confidential, that my name or identity will not be linked to any research materials, and that I will not be identified or identifiable in any report or reports that result from the research.

I give permission for the research team members to have access to my anonymised responses.

I give permission for the research team to re-use my data for future research as specified above.

I agree to take part in the research project as described above.

Participant Name (Please print) Participant Signature

Researcher Name (Please print) Researcher Signature Date

Note: If you have any difficulties with, or wish to voice concern about, any aspect of your participation in this study, please contact Dr. Angela Lin, Research Ethics Coordinator, Information School, The University of Sheffield ([email protected]), or the University Registrar and Secretary.

Information School Research Ethics Panel

Letter of Approval

Date: 8th July 2013

TO: Charalampia Boula

The Information School Research Ethics Panel has examined the following application:

Title: Infographics: Data and Information Visualization and its use in Journalism - A Case Study on Guardian's Data Store.

Submitted by: Charalampia Boula

And found the proposed research involving human participants to be in accordance with the University of Sheffield’s policies and procedures, which include the University’s ‘Financial Regulations’, ‘Good Research Practice Standards’ and the ‘Ethics Policy Governing Research Involving Human Participants, Personal Data and Human Tissue’ (Ethics Policy).

This letter is the official record of ethics approval by the School, and should accompany any formal requests for evidence of research ethics approval.

Effective Date: 8th July 2013

Dr Angela Lin
Research Ethics Coordinator

Appendix 2: Qualitative Research Methodology - Interviews' Questionnaire & Transcripts

2.1 Indicative Interviews' Questionnaire

Do you agree to be recorded? If you want anything to be omitted from the transcript or to remain anonymous please let me know...

START RECORDING: My name is Charalampia Boula and I am having an interview with .........................

We will start with some general questions....

1) Why are Data Driven Journalism and Data Visualisation constantly growing in importance and use in your opinion?

2) Do you use Data Visualisation in your work and why?

3) What are the main advantages of using Data Visualisation and what are the disadvantages?

4) Do you believe that Data/Info Visualisation can be misleading? If yes, what is the best way to avoid this?

5) How do you choose what to keep and what to omit from the available data?

6) Which are the main data sources that you prefer to use?

7) Is it important that the data you use is freely available?

8) Could you give an example of when it would not be possible to give access to the data? Under what circumstances would something like that happen? (We know that The Guardian Data Store of course always publishes the data.)

9) In case the data is released by the government or another official organisation, could you say a few things about what tends to be your standard procedure? In case the data is collected by you, what tends to be your standard procedure until the final publication?

10) Big data is now a very hot topic. Could you please say what big data is for you?

11) Which tools do you prefer to use to gather/refine/organise data, and which to visualise it, and why?

12) How do you decide which type of visualisation is the best for your story?

13) Do you prefer static or dynamic visualisation? Could you say something about the strengths and weaknesses of each one?

14) I am now going to give you two potential statements and I would like you to comment on them: The first one is: Do you search for available data on a specific story and try to prove/disprove a hunch or identify the basic information hidden in it? The second one: Do you examine various data sets in general to see if there is a possible underlying story? Do you prefer either of the two statements, and which one occurs more often?

15) Now I would like to turn to your own work on The Guardian Data Store and ask you to comment on it, and particularly to elaborate a little on the.... (procedure - if not explained, which tools were used and why, what made them investigate the topic, how they decided which variables to focus on, how much time it took and who cooperated on it...)

16) Do you have any insight/feedback on how articles with data visualisation are perceived by the readers?

§ Do you believe that the readers understand them better because they are in the form of data journalism?

§ Do you believe for example that your stories worked better because they had this visualisation?

§ Do you think that most people find these data visualisations easy to understand? Could you give an example of a data visualisation or a type of data visualisation that you believe is hard for the average reader to understand?

17) According to you, which do you think are the required skills/knowledge for someone to work on data driven journalism and data visualisation on a professional level?

18) If you were building a data journalistic team for a national newspaper could you say something about the kinds of skills that team needs and how big the team needs to be? How many people?

19) How do you think that data journalism and data visualisation have changed journalism? How do you see data journalism and data visualisation develop in the future?

2.2 Transcript of Interview with Jacopo Ottaviani

Charalampia Boula: Hello. Jacopo Ottaviani: Hello. C.B.: Hi, how are you? Thank you for agreeing to help me with my dissertation. J.O.: You’re welcome, you’re welcome. C.B.: I had asked you on the email and you said it is okay for me to rec ord you. J.O.: Yes, yes. C.B.: I am only recording the sound. Okay? J.O.: Yeah, yeah. Sure, okay. C.B.: If after we finish you wish for me to remove something from my transcript I can do that and if you also wish to remain anonymous I can do that as well, I can refer to you with a code name or ‘interviewee number 1’, for example. Whatever you wish, if you don’t have any objections I can use your name. Just let me know. J.O.: Okay, then, that’s okay. C.B.: Okay. So, I’m having an interview with Mr Ottaviani and I will start with some general questions. And, the first question is why are data driven journalism and data visualisation constantly growing in importance and use, in your opinion. J.O.: You mean in Italy, or in the UK? C.B.: In general. J.O.: In general?

C.B.: In general, yes. J.O.: Okay, in general I would say that it’s growing as fast as the technology. I mean, the information is growing, the quantity of the information circulated online is growing so from those data that goes around on the internet it’s possible to extract stories that can be useful or interesting for the people, for the readers. So, one of the major causes I would say is the explosion of data online. And also, another reason for this could be the growing sensibility of governments and of institutions to be more transparent. For example, probably you know the Open Data. C.B.: Yeah, I have questions about that. J.O.: You know that? So, first of all I would say technology, information technology that it’s getting more common and more present in people’s lives so there is a lot of data circulated for example in the social media, on Twitter, on Facebook, everywhere. And that growing sensibility about open data and transparency of the

governments and then yeah, of course, this one, the third reason I am saying now it’s connected to the first probably but as you probably know the online media are taking, are replacing slowly, slowly and fastly, depends on where, but classic traditional media like papers, newspapers or magazines, everything is going online, so... data journalism is something that uses a lot, or data journalism and media visualisation uses a lot of interactives that are... that can’t be easily put on the paper but should be online so the readers can click around, they can write and interact. So okay, I think I cannot find another reason. Errm, oh yeah, data and statistics generally are easier to be fact-checking in my opinion, than people’s opinion or other sources of information so when you have a data set then you can use the scientific methods to check the quality of the data and the quality of the statistics. So, it’s more reliable somehow and the discussion on a piece of data journalism can have more methods to check the quality, that it’s you can read for example the methodology behind a data set and you can discuss about the quality of it. So you know, fact checking is another key point. And, yeah I think if I have more ideas after... C.B.: We will discuss, I have more questions so in the course and a lot will be answered to the other questions so it’s fine so far. My second question is, I’ve seen that you’ve used data visualisation and, do you use it in all your stories, in all your articles and why, or some of them, and why do you do that, why do you prefer to use it? J.O.: Okay, well personally, errm, I usually do a lot of maps, I make maps and I really believe in this mean, in this form of communication because it gives you an overview on some phenomenon and it’s very useful to have a map because it gives you an idea on how a phenomenon is distributed on spatial data, on some country, on some continent or even some city.
So, it gives you a geographic view on some phenomenon and data visualisation in general are not only maps but they can be

also new forms of communication, of visual communication, they can be charts, they can be a combination of charts, they can be also a sort of... C.B.: Graphs and... J.O.: Yeah, graphs, networks, and they are very useful, in my opinion, because they are a very fast, a very quick view on something that could not be read in any other way. So, for example, you have some data sets on some issue in the UK and you want to report on it, you want to explain that to the readers, and to make it easier, to make it faster, you can use design, you can use maps, you can use the other

data visualisation, and if you think it’s faster to read something from a chart, something from a, an infographic, than from the data itself or from a text, how it was 10 or 20 years ago, it was mostly text. Okay, there were already [sound break] it was not so developed like today and so now we are using more images and interactive to be faster with the readers, in my opinion it’s a matter of speed because today we have so much information and we have to read more information in less time, to make it easier to be received. C.B.: I will jump in another question that I had about that so I will go there. Do you have any feedback on how the readers, how they perceive the articles based on data journalism and they have visualisation, for example, do you believe that they understand them better because of that? For example, if the same article was just written with words and didn’t have any visualisation do you believe that the readers understand them better with visualisation, does it depend on the type of visualisation perhaps? J.O.: Yeah, in my opinion, it depends a lot on the quality of the data visualisation or the data journalism piece but in general I think if it’s well done, for example if the info graphic is well done then the readers can really understand better a phenomenon because it uses a lot of, it uses for example our perception of space if you want to show a proportion. Probably you saw some info graphics where they show how many soccer fields of forest is in one area or how many soccer fields of forests are cut down in an area. I saw something about that lately and it’s very intuitive to understand, more than text. If you say the number of square kilometres that are covered by forest in an area it’s not very easy to understand how much it is. 
But if you compare it, if you transform the quantity in an image then everybody knows, then it’s easier to understand the information and with maps also the yeah, everybody knows how it is, what’s the shape of each country and from the map you can see where the problems are and you can see patterns from the map that

you wouldn’t see from a text, unless the journalist already made in a study. But it’s very interesting to see how readers can find stories by themselves from some kind of map, from some kind of visualisation. Mostly, sometimes can happen that you make a map and then the readers they take stories from it because they play with it, they click around and they notice that near their place something is going on and they go deeper on it, so it’s a win-win situation.

C.B.: Yes. Do you have an example of a data visualisation that wasn’t, that you think that it was very difficult for people to understand? Have you seen something that... J.O.: Yes I have. If you wish I can send it to you later6 C.B.: That would be great J.O.: I'll just make a note to send it to you. You want visualisation that doesn't work? C.B.: Yes, something that you believe that it would be very difficult for an average reader to understand. J.O.: Okay, I will look for it and I will send it to you. C.B.: Okay, thank you. So, through the discussion you’ve answered already my next question was which are the main advantages on using data visualisation and do you believe there are any disadvantages on that? J.O.: Disadvantages... Let me think, disadvantages. Oh yeah, in my opinion there is one risk on doing data journalism in general but also data visualisation. And the risk is that you’re reporting stories without giving the quality that you view to them, you are just reporting something quantitatively and this can result in coldness, I don’t know if I express it very well. C.B.: Yes, I understand. J.O.: When I say coldness I mean that you’re not really giving something that creates an emotional reaction on people. When you write a story or for example when you interview somebody then you can go deeper inside the psychology of the other and it’s very difficult to do it with data journalism because data journalism is based on numbers, on statistics and statistics aggregate a huge amount of stories so I think it’s a challenge to connect the quantity and the aggregated data to the single stories and if you can create a bridge between these two methodologies then really you can have an efficient way to communicate the stories. That’s

actually my philosophy; my philosophy is to give numbers an identity. So, for example, I made that map on deaths in prisons, I don’t know if you saw it. C.B.: Yes I’ve seen it. I’ve seen both in the Data Store J.O.: Yeah the idea is that each of those markers on the map are the important stories so if you can combine the map with single stories then you can really have, you can really reach a better result in terms of communication, in terms of reaction also in the readers.

6The provided link was: http://flowingdata.com/category/statistics/mistaken-data/

C.B.: Do you believe that it’s especially when it affects people or it has to do with stories, like the ones on prisons, the deaths when you mention the names of the younger people. Do you believe that this is more important when it has to do with people? J.O.: Yeah, I mean that depends actually. It could be people, it could be environment but everything is connected to people. So, for example, if you map for example environmental problems or pollution over, I don’t know, over the UK or over London or over a neighbourhood and then the people discover that something is going wrong near their place then you can raise a reaction, you can create a reaction then yeah. But yeah, of course, it has to be connected to people. C.B.: How do you choose what to keep and what to omit from the available data? J.O.: Can you repeat please? C.B.: How do you choose what to use and what to leave on the side from the data you have? J.O.: Actually, I based all my problems, all my researches on my data journalism researches on research questions. So, usually I have some research questions and I want to answer to them and so, for example, in the prisons I wanted to show who died in prisons and that’s why I reported all the data that were connected to that. The questions were who died, where and how. So, for example, I didn’t write, I didn’t include some informations regarding I don’t know, the administration on the prisons. But usually I start by making questions, with making research questions that in my opinion can be relevant for the readers. Then I answered them. C.B.: Okay, that’s clear. And which are your main data sources that you prefer to use? J.O.: Well, usually if it’s available I use government data and if it’s open it’s better, because I don’t have to transform it into an open format so usually it’s easier to use for example the Excels that are made available by the governments. If not, I use

‘Scrapers’. So I make the data sets myself and in particular I use Scraperwiki, I can send you the link of that. C.B.: They have the link on your articles in the Data Store so I’ve seen it. J.O.: Yeah yeah, it’s a UK based business tool to create scrapers so you can also visit it, it’s in Liverpool and they have some servers that help journalists or developers to scrape data. So, scraping probably you know what it is, it’s to create a dataset from a free text, from something that’s not yet a data set you just take data from web pages and then you just transpose it to a dataset in Excel

format or CSV for example. So yeah I use government data and if I need it I just take the data from other sources using the scraping technologies. C.B.: Do you believe that it’s important to use data that it’s freely available, is it better not just for practical reasons for you but for the reader. Do you believe that for example if they see the original governmental Excel file, for example, it is for them easier, or do they have more trust for example in what they read? J.O.: Well yeah, usually of course, if the data comes from statistics use from the government there should be more work behind it of course but that depends on the country and it depends on the institute. But sometimes of course now if we think of the UK or if we think of America or if we think of the first world countries then probably the datasets are reliable but for example there could be governments that don’t release data or if they release it they don’t release it completely so that depends from the country but yeah if we stick with the UK and the Europe governmental data or mostly I would say for example Institutes of statistics like the Royal Institute of Statistics in the UK or every country has one of those, those are quite good usually. In Italy I use that too. And they’re moving all their work online now yeah, so... C.B.: Do you have any example why you are writing a story but you cannot, erm, that it’s not possible to give access to the data to the reader? For example, in the Guardian there is always the data file to download for the reader. Have you seen or have you worked on an article where you couldn’t release the data where you had just to give some facts and nothing more and for some reason you couldn’t make it available to the reader? J.O.: You mean if I found some source... C.B.: It could be a confidential source or something that it’s not allowed; it could be for safety reasons that it shouldn’t be published.
J.O.: Uhm, well there is the most important case of Wikileaks probably you know it

and it was very controversial because when Wikileaks published all the data people said it wasn’t checked and whether it included lots of names of people who could risk their lives afterwards. That’s probably the most important case. And for example they included the names of people who worked for America or also local sources of information and they could risk their lives after that. But yeah that’s the most important example in my mind now. But of course privacy is very important and we, data journalists, we have to be careful to expose all the information that is necessary and to be careful with the personal data that we are going to publish.

C.B.: Okay, that’s good. And in the case that the data is released by the government or another official organisation, which is the standard procedure that you tend to use in order to reach to the final publication of the article? J.O.: Well, first I choose the topic. For example, if I want to report on some environmental issue then I start looking for all the statistics published by the local institute of statistics then I look for the data released by the Ministry of the Environmental Protection and then I try to see if there are any independent observatories that release datasets, that happens usually with important issues. And that’s also very interested to see the differences between the data released by the governments and the data released by the independent institutes or by the independent associations. And then when I have identified the most relevant datasets I try to combine them if possible or to take what I need from each of them and to merge the tables if it’s needed or even to show the different scenarios that emerge from the different datasets. So it can be that one visualisation is based on one dataset and another one is based on another one. So, I try to get the most out of everything. But usually I would suggest to use more datasets coming from different institutions because this is a way to see how every institution can be biased. Or... that’s it yeah. C.B.: Do you also do the visualisation yourself? J.O.: Well, usually I do maps myself yes and then usually I also make some charts and I do them myself also and the scrapers yeah I do them myself and yeah, although it would be better to have a team. You have to consider that in Italy, like in all the Southern countries in Europe data journalism is not that developed actually. So I’m one of the first persons who made it in Italy and it’s still very difficult to tell the editors that it’s very important to have a data journalism unit in the newspaper. They don’t understand the real value yet of it.
And this in Italy or also in Spain and Portugal, I suppose also in Greece.

C.B.: Oh yes it’s probably the same. It’s not that popular yet or if it’s used it’s used in a more premature way. J.O.: Yeah yeah, I agree. C.B.: And in case that you collect the data yourself, I mean, do you work purely on raw data? J.O.: No, no. Usually I use datasets.

C.B.: Okay, so then I’m moving onto another area that is quite a very hot topic at the moment; big data. What is big data for you? Because every person has a different perception of what big data is. J.O.: Yeah, big data in my opinion is that data that cannot be handled and elaborated by a single computer. So, also I can tell you that, also from my computer scientist’s background, because I have bachelor in computer science. So probably my definition of big data is more scientific and it’s not the same of what is represented everywhere nowadays. In my opinion big data is that data that can be elaborated with parallel computers, with multiple calculators that can handle exabytes of data and you can imagine big data can be the DNA that comes from the human body or the huge amount of, without using the example of biology, you can think of the whole amounts of information that is going around on Twitter, that’s big data. It’s really "unhandle-able" and unreadable with simple computers, with the computers we use at home. You need special computers and special algorithms to handle it and to get meaning out of it, yeah. C.B.: Do you believe it’s only the dimension of quantity that defines its peak or could it be duration, for example, if there is a data that goes on for a very long time that they try to collect? J.O.: Well, actually big data itself it might be that it’s a matter of quantity yes, but elaborating more at least you can say that from big data then you can extract little datasets that come from big data originally but then they can be handled by simple computers or by simple developers but yeah you can still consider it like a result of big data, a subclass of the big data or a subset yeah. C.B.: Okay. Which tools apart from Scraperwiki that you already mentioned, which other tools, free or paid programmes, you use to either gather, refine and organise data and to visualise?
J.O.: Okay, well there are plenty of tools but my favourite tools are Scraperwiki as

you said then Google Refine, then, let me think, I can also see here, and in my works I use Batchgeo, I don’t know if you know it, it’s to make maps. C.B.: Yes I’ve seen that yes. J.O.: It’s a very simple tool to make maps, it’s called Batchgeo then Google Maps is useful but nowadays I’m changing to OpenStreetMap and Mapbox and Leaflet that are more difficult to use but they give you more freedom when you want to make maps than Google Maps. Google Fusion Tables for example is very easy to use but it doesn’t give you a lot of freedom. So I prefer Mapbox and

Leaflet, that’s another one. If you want I can also send you the links of this but it’s very easy to Google them. C.B.: I’ll Google them and in case I cannot find any one I’ll ask you on an email. J.O.: Yeah sure. And, let me think of some other tools. Ah another good one is Datawrapper. C.B.: Yes, I’ve heard of that as well. It’s very used in the Data Store as well, I’ve seen it. J.O.: Yes, Datawrapper is very good and it lets you do charts very easily and interactive charts. C.B.: And do you prefer static or dynamic visualisation? By dynamic I mean interactive. J.O.: I prefer dynamic yeah. C.B.: And which are the strengths and weaknesses of each one, the dynamic and the static? J.O.: You mean the main differences between the dynamic and the static? C.B.: Which one is better in what. Mostly which one is weaker, it’s not that good to do something. J.O.: Yeah, well first of all it depends if you’re publishing online or on paper. If you publish on paper you have to do all static because it’s obvious, but if you publish online it’s always better if the users can interact with what they have in front of them. And so if you have a dynamic visualisation on which the readers can click and they can click with it I would say it’s suggestible because it involves the readers on a major degree. But yeah in some cases you have to be careful when making interactives because if they’re too complex then the users don’t know how to use it. So in that case probably you have to simplify it and in some cases if it’s not possible to simplify the dynamic one you can even do the static one. But it’s always a balance between complexity and efficiency and also beauty sometimes,

yeah. C.B.: How do you decide which type of visualisation is better for your story? And does it depend on the topic or on the data you have? J.O.: It depends on both. And actually sometimes you would like to make some stories but the data doesn’t allow you to do it. Or for example you would like to map a phenomenon but to do that you need geolocated data and that’s not always available. So it depends on what you want to represent but when you found out what you want to represent or expose in your stories you have to also find the data

that allows you to do that. So, I would say that the data influences the choice of the data visualisation, yeah. C.B.: Now I’m going to give you 2 statements because I separated in 2 different statements rather than a very big question that would be confusing and I would like your comments on those. The first one is: ‘Do you search for available data on a specific story and try to prove or disprove a hunch you have about something, about a topic and try to see if the data proves or disproves what you had in your mind before writing the story or finding any other information about that?’ J.O.: That’s an interesting question. Sometimes we have prejudices and of course we have to deal with them. I don’t know if you read the article on ‘The Independent’7, I can send it to you in which it’s written out the perception of reality it’s really far from the reality itself. And they ask people to answer simple questions about statistics on general issues in the country. For example: ‘Is crime increasing or decreasing?’. And all the people said it’s increasing but then if you check this statement with the data you discover that it’s not. So sometimes we are sure about something but when we read the data and when we analyse the data we discover that it’s not like that. And that is one of the most important aims of data journalism; to check what is written around by journalists by opinionists, to double-check it and give a reliable opinion on it based on data, based on statistics.
So I would say that everybody has prejudices but if you want to be a good data journalist you have to be open to review all your prejudices and to disprove what you had in your mind and yeah that can be critical can be also controversial because it seems that if you want to review some issue using the data it seems that you want to carry the message that that issue it’s not so important because okay it’s decreasing for example the phenomenon of domestic violence it seems that you want to say okay we are worrying too much but no, it’s never too much. So okay, sometimes you have to tell the truth but you have also to underline that this can be still a problem

even if it’s smaller, it’s important to act on it. C.B.: I understand. Do you sometimes try to see if there is a story without having anything in your mind when a dataset is available online or is released by the government? J.O.: Yeah, sometimes it happens that I have a dataset I find it just by chance, I start reading inside it very quickly and if I see something interesting that can be

7 Link to the article on The Independent: http://www.independent.co.uk/voices/comment/immigration-crime-benefits-everything-you-know-about-the-state-of-the-nation-is-wrong-8697574.html

geolocated and/or can be represented with data visualisation then I keep it here in my computer and I try to make something out of it. C.B.: But what happens more often is that you usually have the story in your mind and you try to find the available data on that? J.O.: Uhm, yeah, more than the story I have the topic. Because the story should come afterwards when you have the data. But yeah, if I have the topic in my mind then I start looking for the datasets or the possible data sources and when I find them then I continue, I keep on going with the research and with the refinement. It’s a long process usually, it’s never very quick. Anyway I would say that data journalism is a branch of investigative journalism somehow, so it takes time, it takes long time to do it. C.B.: It takes time. For example, the one about the Italian prisons that I read, your piece on the Data Store. How long did it take for you to reach to the final? J.O.: Well yeah. It took a month, exactly one month and I have to say that it was the first time I was doing something like that so I have to learn a lot from the technical point of view, but for example that was my first Scraper I made with Scraperwiki and that’s something related to programmation and although I was already keen on programming and I was already familiar with programming I had to learn how to use that tool, that particular tool. And that took a long time but apart from that also managing all the issues related to data journalism anyway takes long times. You can reduce the part related to developing because the more you’re familiar with a tool the faster the experience with it but usually it takes a while. 
And of course if you have a team working with you it takes shorter, if you have somebody that makes all the software development part then you can concentrate on the data investigation and you can make things parallel so you save time But if you’re doing all by yourself like a one man band then it takes time so the prison work took one month.

C.B.: And what made you investigate that specific subject; of the many deaths in the Italian prisons. J.O.: The importance of the issue in my opinion that it’s a big problem in Italy. The prisons are overcrowded there is no space for prisoners and a lot of them are committing suicide and it’s all kept in silence, nobody knows about them. So, I was morally involved in that I felt that okay that’s something that should be underlined, should be put in the centre of the attention, of the public opinion. So I would say

that the relevance of the topic pushed me to do more and to look for the datasets and then to visualise them with a map, to map them and to visualise them. C.B.: Do you have time for 2 or 3 questions; my final ones? Because I don’t know if you need to leave. J.O.: No no you can, yes. Please. C.B.: According to you which are the required skills or knowledge for someone to work on the data journalism and data visualisation field on a professional level? J.O.: I would say that the first thing to study is programmation. So some programming, some statistics, some design and then journalism like ethics, all what is already studied in all the journalism schools but you have to add the programming, you have to add design well okay social media, how to use social media but that’s probably the easiest of the ones I’ve mentioned. Statistics is important also, you have to study how it works; statistics. C.B.: Specifically for programming do you have any...? J.O.: Yeah I have some suggestions. I’d say if you don’t know anything you have to start with HTML, then CSS, then JavaScript and then could be useful to add some Python and it could be useful to study for example how to use Excel but in a deeper level so use the macros for example tables how to use Pivot tables and Google Refine also requires some programming. Oh and another cool thing would be regular expressions (?) and yeah I would start with this. Oh okay you can also study some MySQL, you know that? That’s also important how to handle databases. C.B.: Yes okay. So my next question is if you were building a data journalistic team for a national newspaper uhm which... J.O.: Yeah okay I would like to, I would like to but in Italy... I think something is moving now but it’s still necessary sometime, it’s not yet concrete. C.B.: So if you were the first one to build a team like that, for example if a

newspaper, a major newspaper, asked you to do that how big that team will be, how many people and what types of skills would they have? J.O.: I would say that 3 or 4 people are enough. So I would prefer a small team with highly educated highly skilled people and I would include one programmer, one designer and one or two journalists and okay if you really want my dream team I will also include a statistician. Yeah that would be very interested to have one statistician who can really give you suggestions on statistics on a technical level you can make big things with that. But yeah so it’s all about combining different

backgrounds together. And would be also interesting that every member of the team knows a bit of every skill. Of course everybody has to be specialised but the people should exchange ideas also on what it’s not their real field. It’s interesting to see how their different skills overlap. C.B.: My final question is how do you feel that data journalism and data visualisation have changed, journalism in general, and what do you see in the future? J.O.: Journalism in general well it brings the question of statistics and numbers in the centre and for example what I was saying before, fact checking, data journalism helps to put in the centre the question of fact checking. That is, is what I’m writing right or not? It’s a matter of truth I mean because if you write your opinion without basing your opinion on numbers on statistics on something that is scientifically provable the risk is that you’re just giving an opinion, which doesn’t represent the reality. So data journalism is helping to put the truth in the centre and also it helps to, for example, push or foster the governments’ transparency but all this movement of data journalism is always asking the governments to open up their archives and their data because governments have a lot of data and from that data you can write very interesting stories but, of course, not all the governments wish to release their data because often it’s controversial and puts them in a bad position but data journalism is helping to involve governments in this process. C.B.: I understand. And what do you see in the future for data journalism and data visualisation? J.O.: I think it will get more popular more spread and because the online media offer a lot of opportunities and since the paper is going to disappear quite soon, not so soon, but I would say in some decades or even less what the online media offers gives data journalism a possibility to extend, to expand, to get more popular. C.B.: It is easier to share also, isn’t it?

J.O.: Yeah, it’s also easier to share and it involves the readers on a bigger level, on higher level and people can comment single instances can interact between themselves, can also add to fact-check or can also add to build datasets actually with crowdsourcing. That’s another really interesting reality and methodology. I would mention it between the most interesting ones because it involves the readers to build a dataset and to contribute at a new story for example. C.B.: Okay, thank you very much for your help. J.O.: You’re welcome, I hope it was helpful.

C.B.: It was really helpful, there were some things that have never crossed my mind and now I have a mini perception. [...A quick conversation about the Master's programme in Sheffield follows, which is not relevant to the research] C.B.: Is it ok if I use your name? J.O.: Yeah, I think I haven't said anything bad, right? C.B.: No, you haven't. J.O.: So, yeah, you can use my name. C.B.: Thank you.

2.3 Transcript of Interview with Lisa Evans

C.B.: So, for the record, I’m here with Miss Lisa Evans and I’m having an interview for my dissertation. Miss Evans I would like to start with some general questions. Why do you believe data journalism and data visualisation are constantly growing

in importance and in use in your opinion? L.E.: In Europe? C.B.: Aha. L.E.: So was the question...? Could you just? C.B.: To repeat? Yes, I’m sorry my voice it’s a little bit... L.E.: No, it’s not that it’s just that I think it cuts out a little bit. C.B.: Why do you believe that data journalism and data visualisation are constantly growing in importance and in use in your opinion? L.E.: I think it’s partly because the technology has become available more easily for people to use so like the barrier for access to making charts really quickly has lowered quite a lot in the last couple of years, in the last few years and that’s because, partly because there’s been more funding for those kind of projects and the bigger companies like Google and so forth have made visualisation projects and then it kind of makes sense doesn’t it? With news being online rather than just the paper through the door there is two things coinciding quite closely cause we previously had charts that took a long time, well not a long time but a lot of skills to make and were really star lighted for the newspaper and now we put them online and everything is a kind of quicker so I think improved technology and a move online for all kinds of communication has brought about more data journalism and also obviously more governments are releasing data more, this so-called transparency

movement has taken place where there’s some accountability, you gain accountability through being open about how you contact your business. So yeah 3 things come together with data journalism. C.B.: Okay. I’ve seen that you use data visualisation in your work, why do you use it which are the benefits from this? L.E.: You can communicate very quickly and kind of intuitively some fairly difficult ideas, difficult to explain ideas otherwise in words, so yeah it’s kind of a really powerful medium to be able to visualise things, not that I do it brilliantly well, just there’s much better people but with being at the team of the Guardian there’s great skills to work together and make visualisations that really are effective. C.B.: Which do you believe, if there are any disadvantages on visualisation? L.E.: Uhm, yeah I think, oh I should add to the previous question you’ve just asked. I think people, we notice, well I notice, that there are 2 sorts of visualisation, ones that are just like one quick instant message and ones where people can take time and explore them like the really detailed maps that we make. So, with the last, they really explored, the things that people take time and explore that’s quite a good way to engage with your audience because by spending more time, by thinking more they’re usually adding more comments or going and looking at other parts of the websites then you’re able to establish more of a relationship with them, with certain types of infographic. And then the disadvantages of using an infographic, well it’s very easy to get it wrong and there’s so much involved in it as well, I mean like just a simple bar-chart there are so many decisions that you have to make when you’re creating it, if you’re doing it all from scratch like the graphics team do it at the Guardian. 
And so they have their own conventions such as do you put the number on the scale on the line, below the line, above the line; do you put the last, like, when do you cut off the axes, what level do you have the data going above, if you use the greatest value you’ve got on it and so forth and yeah so there’s a lot of

decisions to be made and obviously you can do that beforehand, you can have that as a convention that you just work to, but other people come to it with different eyes and then yeah sometimes you worry that you’re giving the wrong impression cause you do always have to make decisions about what you want to emphasise on the graphic even from a huge dataset that’s got lots of different stories in it. Sometimes you might be emphasising something and you’re missing the bit, the best point or something like that so yeah they’re more complicated to deal with.

C.B.: Do you believe that data visualisation can be misleading and if you believe that this could happens which is the best way to avoid it? L.E.: Yeah they can definitely be misleading and the best way to avoid it is publish all the data so that your audience can go away and look through and write in the comments ‘Hey you’ve missed this point that’s much more important than the ones you’ve emphasised’ or whatever. C.B.: Yes, I understand. How do you choose what to keep and what to omit from the available data? L.E.: It’s, all comes down to judgement. After a while it becomes kind of more instinctive like when you’ve engaged with, when you’ve read all the comments and things that people respond to and also there’s applying test and things like what’s the biggest, the classic thing is what’s the biggest percentage change or what’s the biggest change in this data and that might often be the emphasis than you want to give them but other than, but then there’s like the, about the meaning of that data so there might be a small change or something that really matters a lot to people so yeah all comes down to judgement and experience, and having a good team around you who’s been experienced in writing for the newspaper and working for the Guardian for a long time, was really empowering (?). C.B.: Which are the main data sources that you prefer to use? L.E.: Usually we go for official types of sources so things from agencies that, and organisations, that have got a good methodology and they’re well respected so for example with the situation in Syria we, to start with because there weren’t official figures, we just gathered data from other newspapers and our own newspaper to count a number of people who lost their lives since the uprising and then the UN published their own figures so you automatically go to the more official source and use that, but we can still use the references to the newspaper articles that quoted deaths and things.

C.B.: Is it important that the data you use is freely available? L.E.: Yeah, yeah we always have to share it pretty much so that’s kind of the way that the Data Blog is set up instead of having this very fixed format where you’ve got headline, brief summary of the article, then a description of data and maybe some infographics in that and then always at the bottom the data source so yeah the data has to be freely available otherwise we break that format. C.B.: Have you ever been in a situation where you couldn’t give access to the data to the people? You couldn’t provide the data online?

L.E.: Yeah, sometimes, we have to learn to be really really plain with people when they share their data that we would want to republish it and that sometimes cause trouble so I think especially with private companies cause they’re data soft and commercially sensitive and if one person in their team gets afraid that they’re giving away something that might be combined with some other data somewhere and they will reveal something that’s usual (?) to their business they might call out say that’s, so you just have to be very clear with people at the start like we are the data blog and we publish data that’s kind of what we, that’s the format that we’ve got so and then that kind of avoids that situation. C.B.: In case that data is released by the government or another official organisation could you say a few things about what tends to be your standard procedure? L.E.: So, when the data is released, like the workflow, kind of? C.B.: Yes, exactly. L.E.: So, normally we look through the data and then and read kind of some summary of what data means and then I would call someone if that doesn’t make sense call the press officer or some contact that’s been given and then make a decision about whether it’s worth running as a story that day and then we’ll start either cleaning up the data which means that we put it usually in a Google doc and get rid of all no-cells (?) 
maybe think about what’s really essential in this data set and then if there’s something that doesn’t make any, doesn’t add any information then we clean that up and we’ll always link to the official source anyway so people can get their full dataset and then but often it’s just like a few columns in which case then we’ll just keep all of that but sometimes it’s sheets and sheets of someone’s spreadsheet or like we need to tidy it up cause we don’t want it to like a complete mess when readers look at it and then we think about what can we do visually with it, sometimes there isn’t anything that it’s obvious to do, the data

stands by itself and it’s interesting enough then we think oh well what can we do, what percentage changes make any difference to this think about what we can do just in terms of making it more meaningful to the readers like, if it’s say it’s something like the number of and we always try and bring things down to the personal level if we can so like how it affects individuals so if we’ve got like a total amount of spending for a whole country we might think well maybe we can put this in terms of amount per person so do a calculation like that and then write up what we’ve done and what we think of the data and get any quotes that are relevant and

then start building the Blog post around that and the we just pass it onto the editor and give it a good look over it see if we’ve made any obvious mistakes with other people in the team and then publish it but it’s all quite a quick process it usually happens within a few hours for some of the posts other ???? of the press will take a lot more work but if you’re gonna keep relevant and up to date with the news then we really need to be able to publish things every few hours. C.B.: In case data is collected by you, it’s not released by an organisation or a government is the procedure different? L.E.: Yes, that would take a lot longer or at least, well for example when it was the London riots a few years ago then we were gathering reports of rioting and mapping them as they happened so that was like we had the map live really quickly and a few reports mapped on it and then we just kept adding to that throughout the day so that was like a continuous process that we made, that story continuous and there’s a few cases of that and then a similar thing was with Syria when we looked at reports of shootings and things but that wasn’t released, that took a while and it we didn’t release until we’d got enough data so yeah there’s different types of stories. C.B.: Big data is now a very hot topic; could you say what big data is for you? L.E.: Uhm yeah I think big data is really it’s kind of a different skill-set really to what we did at the data blog, it’s a lot more statistic heavy and a lot more requires a lot more a different set of skills for managing that data so you need to be able to use Unix and so forth as far as I’m aware there aren’t any tools that are equivalent of Google Charts and Google Spreadsheets for big data so you really got to be a programmer or at least now a command line tools that we just feed through lots and lots of data and pick out some things but yeah I think it’s gonna be interesting what happens with big data. 
C.B.: Which are the tools that you prefer to use to gather, refine and organise data

and which to visualise and why do you prefer those? L.E.: So things that we use most frequently in that job were Google Docs and Google Maps and then for bigger datasets we used Fusion Tables, Google Fusion Tables and then it kind of with the things that we all there are like high core things that everyone used and then other people develop skills like John got really good at Tableau and yeah part of that job was working with the graphics team a lot so we were really hired by the graphics team to be researchers for them and then data blog was kind of an outcome of the research that we did for them that we could put

online in lots of ways so it was always kind of like half doing data blog stuff and half doing graphics research and working with them really closely which was great because and they use all the Adobe chores like Photoshop things and Illustrator and things so yeah we had a little bit of use of those things too, we had access to Photoshop and illustrator and occasionally use this but yeah and then we did play with some things and then found that they weren’t that useful for our particular job but were incredibly useful towards in general so we used Refine a little bit but mostly we found that with the data size that we had we could mostly do it by hand anyway like we weren’t often dealing with huge datasets that needed Refining anyway yeah so the Google suite of tools were our main ones and then Excel was useful, all the cleaning up happened on Excel and then we just imported it yeah. C.B.: How do you decide which type of visualisation is the best for your story? L.E.: Working with the graphics team really and then if we didn’t work with them what’s available already we’re fairly limited on the Google charts so yeah uhm it wasn’t that tricky but yeah I guess you apply the principles like if you’ve got a complete dataset and then you want to set, and then you’ve got it divided into different pieces then like if you’ve got total spending and then you break it up into pieces then that’s something you could use in a pie chart or but otherwise you wouldn’t use a pie chart just kind of applying those basics, charting principles and yeah. C.B.: Do you prefer static or dynamic visualisation? 
L.E.: It depends. Sometimes static visualisation is a complete picture but sometimes dynamic is usually more difficult to do but it often gives you that kind of relationship with the readers where they can properly explore for themselves and find things with in comments and stick around for a bit longer so yeah finding a really good topic and making an interactive for it it’s really kind of the best thing for the relationship with the readers but then like forcing something into an infographic,

into an interactive it’s not gonna work either so it really depends on the topic. C.B.: About the static visualisation which do you believe are its advantages? L.E.: I think just in terms of like initial visual impact it’s the best and then yeah that’s I think that’s its main advantage it’s just convenient, very focused on what it’s doing C.B.: I am now going to give you two potential statements and I would like you to comment on them. The first one is ‘Do you search for available data on a specific

story and then try to prove or disprove a hunch or identify basic information hidden in it?’ L.E.: Uhm so which one...? C.B.: This one is the first one. The second one is if you examine various datasets in general to see if there’s a possible underlying story. L.E.: It’s usually the first that we sometimes there’s like kind of investigation but often with this it’s just obvious that something is really viable to the news that are giving vine (?) so like if housing is in the news then finding a relevant dataset that will explain that story more deeply, it’s usually driven by what’s at the news at the moment, what’s on people’s minds, what issue needs explaining more, and then we find a dataset and then we look into it maybe that there’s something that comes up from that but more often is just a deeper understanding of a story that’s already running at the time so that applies with the riots that for example like there’s no other stories to do and there are riots the next (?) and then a couple of summers ago so you just you look for anything you can on that issue to make it clearer to so in a sense you’re digging more into a story uhm that’s already present but you’re not like taking that data and then that might happen at a latest date which it did with the riots’ stuff where we took the locations of people who convicted of rioting and looked at where they lived and then layered a deprivation index on top of that so you could see that relatively clearly that the people who were involved in the riots were more likely to be from a poorer background so yeah often it’s driven by the news and then sometimes you go deeper into investigation. 
C.B.: I would like now to carry to your own work on the Data Store and comment a little bit on a specific story a specific article story line that perhaps has a great significance or has an interesting story behind a if you could say a little bit of how, which story it is because you have published a lot in The Guardian Data Store, and what made you investigate the topic, how you decided which variables to focus on.

I will let you select which one because you have published a lot and I would like to know which one for you has the most interesting story behind. L.E.: Yeah so we had this guy from a guy who’s been investigating a story which was all about people who claimed disability who claimed job seekers allowance, have their benefits cut so there was uhm he’d found someone who had worked at a job centre and they were being biased to reduce people’s job seekers allowance claim and put them onto something called put them like put their claim on hold while they were being investigated so they were "sanctioned" that’s what they’re

called and so he’d kind of got a story but he’d want it to back it up with some figures and so we went to, we used a database that’s on the Department for Work and Pensions which has the numbers of people who’ve had their benefits cut and been sanctioned, it’s like a temporary cut in their benefits and yeah so it wasn’t completely obvious the figures were really the naming of the figures was all a bit kind of cryptic so we had to do a lot of phoning with the Press Officer who obviously didn’t want us to run the story and so it was kind of finding all kinds of confusing things to distract us so but we did eventually get the data that we needed that we could feel confident with and then we did some analysis to look at how that the number of sanctions had changed overtime and there was this big search also how the sanctions had changed over the country overtime and we put it all together in a map and it really reinforced that story and gave it some real weight and then they were able to run it as a front page story and so yeah I think that was quite a good example of using data to back up something that we knew was an issue. C.B.: How much time did it take to produce the final publication, till the final publication? L.E.: It took about a week and they wanted to run it at the end of the week anyway cause it was one of the longer running stories, like it was a longer time frame to prepare for it because it was always gonna be true (?) Yeah. C.B.: Approximately how many people did work in this story? L.E.: So there was the guy who got the whistle blow and made the video out of it and then there was me and Simon helped out at the end with the calculations when I was thinking ‘Oh Gosh we haven’t got a story!’ and he was like ‘No no I think we really have!’. So three of us but, yeah... C.B.: Okay do you have any insight or feedback on how articles with data visualisation are perceived by the readers?

L.E.: Well there’s all the comments at the bottom which you really really, we always, I always read them anyway cause you’ve got to kind of take it with a balanced view cause sometimes people just hate pie charts and that’s their thing and they and they’re not kind of justified in complaining about use of pie charts cause they are perfectly reasonable thing to use on a particular dataset and then other times they have got a real point about certain things and so in general people like it I mean the popularity of the Data Blog and the time the three years (?) that I was there it just grew like to this kind of huge extent where people were actually, it became the

place where people sent their visualisations. And yeah I think people enjoy cause it’s just nice and it seems that it’s less time consuming in lots of ways and you also get the chance to explore it for yourself and it’s just a moth a kind of moth, light, fun way of learning about what’s happening in the news but in more detail and just this thing is happening and also like to get into this huge datasets that the government was releasing like what’s the bigger picture of these huge databases that they’re releasing. So yeah we got I think that we are going to find without a doubt that there’s a huge interest in visualisation, people get hugely enthusiastic about it and really passionate about it. C.B.: So you believe that the stories, your stories were better because they had visualisation? L.E.: Yeah I think people enjoy a story with something that they can look at as well quite often it’s when it’s really nicely put together and a lot of thought has gone into it and if there’s add to the story sometimes we did stories where the graphic was the thing that the article talked around so when we did for example the superpowers of China and the US and that shows lots of different factors the infographic was the huge, the biggest part of the page and then the article talked around the, it was built?? around the infographic. C.B.: Do you believe that readers understand visualisation? L.E.: I think generally they do yeah, I think if they don't like any feature, in the comments you'll be understanding, I mean that you’ll definitely be told if it doesn’t make sense. C.B.: And do you have an example of bad visualisation? L.E.: I noticed the other day they put up the ten worst visualisations. But yeah I don’t really I can’t really think of anything. I think the thing that bothers me the most in terms of visualisation it’s not the visualisation itself is the approach to the data analysis that’s wrong and then the visualisation follows on from that so, for

example, if you’re looking at mortality rates in hospitals and some hospitals have one or two patients with that particular like mortality rates from heart disease in different hospitals and some hospitals only have two or three patients who have heart disease but yeah and other hospitals have thousands of patients. Then to do a percentage check, a percentage of mortality rates on those hospitals and then put them side by side and basically turn that into a league table is really bad analysis and that can be much better to do statistical analysis on that where you look at what you would expect from a random uhm like if you ask the question that

was more like are these results more than you’d expect if there was a random number of deaths in that area so is it more than something that would just happen by chance? Is a much better question to ask in that dataset than which has got the worst percentage mortality rate because you’re not really comparing like to like (?) and then so you would need a different visualisation than say to visualise that question about chance are the death rates more than, greater than you would expect by chance? Then you have to visualise that differently to if you’re just asking about mortality rates which you might just got as a bar chart or a league table or something and then that’s really misleading it can panic people who live in an area where they’ve got the highest percentage mortality rate even though they’ve only got a couple of patients. So yeah there’s a kind of issues and I think they’re the worst mistakes that people can make just not applying the right analysis. C.B.: Is there a specific type of visualisation that is hard for people to understand? L.E.: Yeah there’s, I think network type things are really hard and people in the comments often say that doesn’t mean anything so yeah network diagrams seem to not work so well and I think that’s also because of the analysis too like you need to really apply some kind of network analysis to the data and then it might be that that’s a visualisation and need it if you come up with some pretty solid analysis on it. So yeah I think that’s... C.B.: The network, I’ve been told by another interviewee about this specific type as well. Do you believe that in general the readers understand the articles better because they are visualised? L.E.: I think so yeah, I think it really, really helps to give people a way into big datasets or to a big issue. C.B.: According to you which are the acquired skills or knowledge for someone to work on data driven journalism and data visualisation in a professional level? 
L.E.: I think it’s actually not as high as I would have thought before I entered it

because a lot of it you don’t need a deep statistical background which I thought you’d probably would, actually it’s quite good to, I mean it would help enormously but it’s actually okay to be really sensitive and ask experts in that area so for the Olympics last year a couple of statisticians I think three statisticians were hired to look at the data and they did a really nice job and so I think it’s okay to work with statisticians and not be a statistician and that doesn’t devalue what you’re doing because if you’re very sensitive to both on the one hand like the statistics but you’re not completely ‘I’ve spent years studying’ and then on the other hand to

what your audience would be interested in and what would benefit people in general to know or what they’d respond to, the way to put things. Then I think it’s much more of a bridging gap, bridging that gap than there’s being an expert at statistics AND being an expert at writing and being engaging to people and stuff. So yeah I think being very sensitive to most of those things. C.B.: I have a couple of more questions and then I will be fine. If you were building a data journalistic team for a national newspaper how many people the ideal would include and of which speciality? L.E.: Yeah so l’d definitely choose someone who’s a really good graphic designer cause the graphics team at the Guardian and the interactives team they were just really, really excellent and experienced and they were a lot more, they weren’t just designers they were data analysts too in lots of ways so they wouldn’t just draw a picture of anything unless they understood it and they were like a check on that so I’d pick someone just like them at least one or two people like them and then the interactives’ team would just the same so maybe a couple, okay one person who’s an interactive designer who’s as good as the people that were working there at the time and then I think someone who’s really experienced in these, like Simon, and then someone who’s fairly kind of young but maybe got some kind of Maths background yeah cause I think you do need someone who’s gonna question that but not to the extent that they won’t run any stories like they have to work really well with the person who’s, cause the person who’s, who knows the news really well and knows how to, the kind of level that you need to communicate at. C.B.: Great. And how do you think that data journalism and data visualisation have changed journalism in general? 
L.E.: Yeah I think they’ve, well at The Guardian the data journalism team was really well respected and part of the whole, people from all different specialities and news would come and talk with us and kind of, and work with us rather than treat as like

a service, do you see what I mean? So they would treat us, not like we were fact finders for them but more like we were some, one team (?) that could work together, we could do a data spin on a data heavy story on their topic that they’d been investigating that would add topic to their piece if it had facts behind it and so I think that wasn’t really initially always the case when I first started we were kind of part of the graphics team and over the two years that I was there we became much more kind of respected members of the journalism team as well as the graphics’ team.

C.B.: What do you see in the future for data journalism and data visualisation? L.E.: That’s a good question. I think we definitely see better tools, I saw the other day a really nice CartoDB, It’s brought out all these lovely pictures that would have taken lots of time to make by hand or would have needed an expert and now you can connect up and layer two datasets on a map and that’s really good. And then so the tools are definitely getting better, they’re just gonna improve and yeah I hope more people will come onto it who’ve got both really good stats skills and also really understand what’s useful to people. C.B.: Okay, thank you very much. I would like to ask you only if you agree for me to use your name on the dissertation. L.E.: Yeah sure. C.B.: Is there something that you’d like me to omit or to delete or not to include, something you’ve said? L.E.: It's ok. C.B.: Thank you very much, you’ve been really helpful and I really appreciate your help, thank you! L.E.: No worries. Good luck with it. C.B.: Have a nice day and a nice weekend! L.E.: You too! C.B.: Goodbye! L.E.: Bye bye!

2.4 Transcript of Interview with Paul Bradshaw

C.B.: I’m here with Mr Bradshaw, my name is Charalampia Boula and I’m having an interview. I would like to start with some general questions. Why are data driven journalism and data visualisation constantly growing in importance and in use in your opinion? P.B.: Well that seems that do you think that? Well I think data, journalism around data is more important partly because data is becoming more important and journalism’s role is partly to hold power to account and data is a form of power at the moment information is power and data is used to make a number of financial and political decisions. So it’s very important from that point of view journalism is also about communication and translation and with large amounts of data visualisation for example is a way of communicating clearly what might actually be

loads and loads of numbers and would otherwise be less interesting and fewer people would find that data. C.B.: I’ve seen that you’ve used data visualisation in your work so which do you believe are the main advantages of using visualisation and if there are any disadvantages, which could they be? P.B.: I think visualisation is very good for grabbing someone’s attention it can almost be used like a headline or like a quote would be traditionally used in text-only journalism, it can be a good way of demonstrating a complex concept or complex story more simply. And also it’s good for people who are not textual people, a lot of people are very visual they search visually they communicate visually people so it broadens the range of people that a story might have an impact with. It does have drawbacks like anything, I think it can oversimplify, it can, you can lose the subtleties and complexities of a story and so I think it’s important often to use visualisation in partnership with other information attached with video or whatever. C.B.: I’ve read in one of your articles that you said that sometimes the infographics travel on their own... P.B.: Yeah yeah. C.B.: And it could be a disadvantage because people... P.B.: Well, I think it’s important when in an infographic to include a link in or some sort of URL that people can follow to other context, yes. C.B.: Do you believe that data and information visualisation can be misleading? And if yes, which is the best way to avoid that? P.B.: I think any form of communication can be misleading and the way to avoid that would be with the usual, I guess ethical considerations that accompany any up top journalism, which is that you strive to be accurate, you strive to put into context and not misrepresent so yes exactly the same processes.

C.B.: How do you choose what to keep and what to omit from the available data? P.B.: Again I think it’s the same as most journalism processes; you will take information out of a story if it is not pertinent, relevant... You know you want to strip back a story to the core details, the core facts, the key quotes (?), the relevant background and so you apply the same rules to a visualisation, are we telling a story that all 400 you know local authorities or politicians are or are we just telling a story about five? Can we tell it more clearly by focusing on what our story is about and quite often we did, there are a lot of stories to tell so it’s quite, you need to be

more disciplined than text journalism? Sometimes being ruthless into what you take out, doing ??? the way this represents. C.B.: Which are your main data sources and what do you prefer to use? P.B.: It depends on the story. I mean on a regular basis I get updates from data.co.uk, the Office of National Statistics, F.O.I. requests on what do they know, how the particular key word in them that they might be interested in, so the particular sources like that but a lot of the time it will be something, for example, I’m working on some, on housing infographic at the moment so that’s a case of seeking guides’ data on housing not in fact is on data.co.uk, it’s on the gov.uk?? website it’s the Guardian, it’s non-profit organisations. There’s no scraping but something else I’m doing for something else involves scraping information from a series of job websites so it really depends, I do like scrapping because it’s a way of getting data that no one else has, so I get back something that I personally prefer but I get data from all sorts of sources. C.B.: Is it important that the data you use is freely available? P.B.: Yes, is the short answer. I think there’s a distinction to be made in terms of, you could possibly say that as a journalist I’m more likely to do something with data if I’m the only person who’s managed to gather it. As a citizen, I prefer, you know on principle, and as a journalist on principle, generally I think data should be more freely available and particularly available in format to make it easier to combine, so data should be clean as well as freely available yes. C.B.: We know that the Guardian data always publishes data but could you give an example that when it was not possible for you to give access to the data readers and under what circumstances something like that could happen. P.B.: The obvious reason would be copyright or database rights. 
So if I have script data of, for example a job’s website copy, if I was to publish all that data I’d be breaking the law and so that’s something I’m not going to do unless I feel that

there’s a public interest argument is so strong that it would be unlikely that I would be sued or if I was sued that an organisation would back me in defending that so there are legal reasons there may be cases where data is personal and reveals people’s identities or I have reason to believe that the data is chronically inaccurate or that there are inaccuracies that it would not be ethical to publish, those would be the main reasons why I would not publish data in front (?).


C.B.: In case the data is released by the government or another official organisation could you say a few things about what tends to be the standard procedure from the collection to the publication, the final publication?

P.B.: Do you mean the collection by the agency or by me?

C.B.: By you.

P.B.: So if an organisation publishes data what do I do. I mean I would, first of all I would try and identify what it is in the data that I’m interested in because quite often there would be a lot of possible avenues and you could waste a lot of time cleaning up data or mixing data together that you don’t need to clean or mix up so I’d identify the particular aspect or for example I’ve been working on data this week that’s around housing costs and the stuff about temporary accommodation, about private and it’s about bed and breakfast so if I say I’m interested in bed and breakfast I would take that particular piece of data and clean that up so that’s in the format I could visualise or combine with other data or sort or aggregate or do something with so quite often there would be empty rows in the data and I would take the data into Google Refine and strip out the empty rows, the headings might be in multiple rows and again Google Refine would allow me to combine those into a single header. So once I’ve cleaned it up I might want to duplicate entries as well, things like that then I would try to analyse that somewhere so if there are multiple rows for different items of spending for example I might use a pivot table to add them all together I might combine different years to give a view of a time, I might add extra context or I might need the populations for each region or things like that over years is a good example of that and then I would probably pick up the phone to ask any questions about it. So, for example, in the bed and breakfast example some local authorities are actually making money they’ve got far more income than expenditure, which seems odd so pick up the phone to some of those authorities and say how does that work and try to understand what the calculations are and what the money is that’s coming in and going out and then I guess at the end of all that you might do some sorts of chart and strip out details in the chart or you might write a text with plot (?) and quotes get at this studies you might combine it to it.

C.B.: In case you collect the data from scratch yourself which is the equivalent procedure, I mean, where do they differentiate from the…?

P.B.: There’s probably no difference apart from the fact that if I’m collecting it myself the cleaning, I will have prevented some of the cleaning problems, there wouldn’t be empty rows, there wouldn’t be multi-row headings the yeah I’d have more of an understanding I might be able to strip out duplicate entries at the point of collection so some of the cleaning might be done as part of the gathering, but I guess in that sense you’re still pass the process you’re still part of the gathering.

C.B.: Big data is a very hot topic now, could you please say what is big data for you because there are different opinions?

P.B.: I think like a lot of neologisms like data journalism, like citizen journalism, like visualisation they… there’s no, different people understand different things by it it’s a rebranding of something old but there’s a reason for it to exist something which is that something has changed my understanding of big data is that there’s a way of signalling the quantities of data being gathered or, that we can work with has changed in a way that affects what we do qualitatively. So I don’t think you can say above a certain amount of data I think it’s more of a cultural court (?) to say, to signal that something is unusually large in the context of someone’s experience but I don’t think there’s a hard and fast definition of what big data is. Does that make sense?

C.B.: Yes, yes I understand, I understand yes. It’s very difficult to define and some people only focus on the quantity or others combine it with the qualitative aspect.

P.B.: I don’t think it’s a practically useful term, I don’t think it relates to anything concrete, I think it’s a socially useful term, culturally useful term in talking very generally.

C.B.: When you said that it’s a rebranding of something old it obviously existed before but people perhaps hadn’t realised that it could stand as something on its own as a science of its own, no science exactly…

P.B.: I think first of all was more of it, there have been very big data sets in the past but we have needed enormous computers the same process imply (?) but much bigger computers much fewer computers to do things with it. So that, you know data journalism existed in the form of computer-assisted reporting, but has broadened to take in some of the other things we need a new term to essentially recognise that and talk about computer-assisted reporting in a new way so I think it’s the same things we’re still talking about data but we’re signalling some way that we’re talking about data in a qualitatively different way not specifically but just generally this is qualitatively different.

C.B.: Which are the tools that you prefer to use to cover, refine, organise data and then to visualise and why?


P.B.: I use ??? to scrape it, to scrape relatively easy to datasets. I use ScraperWiki and Python to scrape basic more complex data I use Excel to analyse and sometimes Google Drive, I use Google Refine to clean it up and sometimes Excel, and sometimes Google Docs, and sometimes Python even and sometimes command line, commands, terminal commands like to combine spreadsheet and I use Fusion Tables to combine datasets, I might use Excel to combine it, I might use Python again, visualisation I tend to go for Datawrapper at first because it’s nice and quick and let lumited (?) I use BatchGeo, I’ll use Fusion Tables again, I’ll use Tableau sometimes because I don’t have a PC at home and it doesn’t work on Mac I don’t use it that often. And I don’t know, a bit of JavaScript libraries that I’d like to do more with but I don’t that much and I’ve probably forgotten stuff but yeah, it depends on the particular problem you know I’ll turn to Google, if I’ve got a problem I will turn to search for solutions to that problem.

C.B.: How do you decide which type of visualisation is the best for your story?

P.B.: I do have quite often use, I used to use a chart chooser by someone called A. Abela which kind of...

C.B.: Is it the one I found a little one of the…?

P.B.: It’s probably, I’ve probably mentioned it yeah. But I’ve kind of absorbed that in mind and I will do it in my head so I’ll decide is it a story about comparison or is it a story about the composition of something, is it about distribution, is it about relationships and then you know if it’s about the composition of something as a snapshot I’ll use a pie chart or a tree map, if it’s composition of a time then bar chart if it’s… so I’ll kind of decide on what the story is and then pick the chart that tells that story.

C.B.: Do you prefer static or dynamic visualisation? By dynamic I mean interactive, can you say something?

P.B.: I tend to use static ones, relatively static ones because of speed and because I tend to use visualisation to just tell, to give an overview of something accompanying the text story I know Caroline Beavon will use more interactivity with Tableau for example, to give different views on the same data but it’s just the nature of the stuff that I work with tends to be, I’m more text based, Caroline is more visually based in her work so we will… so we will probably work differently on that sense. It depends on the tool I mean Datawrapper has interactivity in terms of that you can select from drop down menus and stuff like that and Fusion Tables has interactivity you can click on different locations and get an information and… so I’ll tend to use those of the tools that I use most and I’ll use the interactive that comes with those but it’s really about telling a story simpler, at some point I did use Leaflet, which is a JavaScript library, to do a map because it allowed me to do more interactivity and it can be used on a mobile phone to centralise the user’s location and I’d like to use this functionality more but it depends on having a story that requires that functionality.

C.B.: I have two statements that I will say to you so I would like your comment in each one of these; the first is ‘Do you search for data on a specific story and then try to prove or disprove a hands or identify a basic information that could derive from that data?’ And the second one is ‘Do you examine various datasets in general to see if there’s a possible underlying story?’

P.B.: Both.

C.B.: Both. Is there something that occurs more often? One of those two that…?

P.B.: Because I’m not a, because I’m not a kind of an employed journalist as such I tend to do more with a hypothesis and look for the data that surrounds that… but it depends really if I was, you know if I needed to fill spares I needed to get content out regularly I’d be doing more starting off the data, looking for the stories and that, and that’s the stuff that I’m trying to do with my students on the Birmingham Data Blog because it’s simpler to do but I’m more interested in finding data based on an idea.

C.B.: I would like to, if you would like to choose one of the articles you’ve posted in the data store and tell me a little bit about it, what made you choose…?

P.B.: Okay, let’s go with that then… ‘So the Olympic torch relay places have been allocated’. There’s… I mean I guess there’s a bit of history to this and the others arrived a bit (?) straightforward, with this I was asked to write a story/post and that’s probably the end of the story with how to be a data journalist with a story behind the sponsors, I mean these are all relatively similar, all the Olympic ones are similar. We were working on investigation I spoke to James Ball or Simon Rogers depending on the article and I can’t remember who for which, I spoke to them said ‘we’ve got this data do you want me to do a piece of the data blog about X’ so in this case we had compiled information on what happened to all the places in the torch relay places, how they had been allocated, by whom and to whom. And I’d been done for the book and I said ‘Do you want me to do a post about this particular aspect of the book?’ and I’d asked Caroline ‘Would you do a visualisation for this?’ So Caroline did the visualisation and I did an overview of the story. And I guess part of the reason for doing that is because it broadens the… exposes the book and the story to more people and profit from the book go to charity so I’d like more people to buy the book, you can download it for free but I’d like to raise money for the charity, it raises the profile of Caroline, and it’s nice to be able to, for Caroline to have The Guardian on her CV, it’s nice for the other contributors to the investigation to be able to feel that it’s getting that sort of exposure, it improves, you know, the search engine optimisation of Help Me Investigate. So there are a number of benefits really but broadly speaking it’s a story that needs to be told I think, it’s… yeah.

C.B.: How did you decide which variables to focus on?

P.B.: Which?

C.B.: Variables to search and focus on, I mean…

P.B.: In this case we’d been investigating the torch relay for a few weeks and individual instances of corporate executives being given torch relay places and that story being reported and we could keep on doing that and find more and more executives but what we wanted, what I wanted to do I guess, was move from the individual and anecdotes (?) to something that was about assistance and find out what went wrong, rather than here’s something that’s shameful, why did that happen, who’s responsible, was it Adidas or was it ...? Was a promise actually broken and to what degree was it broken and so on. So, in order to do that I knew we needed to take VIPs and places and work out where they went and we knew that 2012 had gone to this particular campaign, a public campaign, a public cause publicity but, and we knew that some had been given to corporate partners to distribute themselves and they were supposed to do that publicly so was really about saying we’ve got this figure but, you know, we were on the phone to Lloyds saying ‘How much of this went to external and how much went to internal?’ And eventually they said ‘Right 50% went to internal’.
And likewise we collected various evidence of different campaigns by Samsung and different campaigns by Coca Cola. So it was, that was really a lot of document-based research, there wasn’t, there weren’t any datasets involved here other than, essentially we created a dataset out of public records you know out of here’s a statement from ...? and 2012 places, here’s a press release from Samsung about campaign that was for ten places, and so on. And there’s a dataset that has which organisation, how many places, which campaign and so on and that’s about, so that’s the methodology and it’s really about showing a system of work and trying to compare that with the promise that was made which was 95% would be made available to the public.

C.B.: How long did it take to create it although it was…?

P.B.: Well, that particular thing probably took a week from start to finish, well from deciding to do that. But a lot of the information, a lot of the documents that formed part had already been collected along the way so it’s a case of looking back and kind of putting that into a spreadsheet, so I already had bookmarked press releases and things like that and I was going back on all of that. So it was kind of personal archive research and then identified gaps like Lloyds and say ‘Right we need to phone them up cause we have no documents about Lloyds’ and phone up Coca Cola and pressure them and try and get The Guardian so the Guardian tried to get figures out of Coca Cola and that, which is another reason for working with them, you know you’re more likely to get access to things if you’re at The Guardian.

C.B.: And apart from Caroline did anyone else cooperate at that?

P.B.: Carol Maiers did a lot of work, I mean she was the one who was on the phone to Lloyds. With this particular one there might have been one or two of those but it was mainly me and Carol doing the digging, and I think James Ball at The Guardian, to an extent. The book is a whole of contributions from all sorts of people.

C.B.: Thank you. And do you have any insight or feedback on how the articles with data visualisation are perceived by the readers?

P.B.: Uhm… no.

C.B.: Do you believe that the readers understand them better because they, of the form they are visualised?

P.B.: Uhm… not particularly, I think the only thing I would say is that we did very early on in the investigation I did a map with Fusion Tables of Nottingham born torchbearers and Nottingham sport website took it into gymnastic ...? use ...? and write an article about it. So it had an impact in that sense and then that was, and actually was repeated, someone in ...? did one, someone in Wales and so on. So clearly had an impact in some sense, that’s about it really it’s difficult to get an idea of any of the feedback and… yeah.

C.B.: In general about, not for your article specifically, do you believe that in general articles with visualisation are better understood by the people, by the readers, the average reader?

P.B.: I don’t know, I mean I don’t have any evidence to base a judgement. What I would say is that I think images generally are more effective at bringing someone into an article. And charts raise a question I think that the article then helps answer. So, in theory you are likely to get people more engaged in the article from the start because of a chart, regardless if the chart itself helps them understand a bit better, it gives them more motivation to, I would say, because there is more motivation to read, is almost a promise that this will be explained.

C.B.: You can’t know if, do you know if your stories worked better because, for example this one that had the visualisation, do you believe that it worked better because of that?

P.B.: Yes.

C.B.: Yes. And do you think that most people find data visualisation as easy to understand?

P.B.: I don’t really have any evidence on that so what I think doesn’t necessarily carry any weight. As I said, there’s evidence of people that communicate visually that they, so I think it aids, what they/I do.

C.B.: Do you have an example of data visualisation that was used badly that it was really hard for people to understand, for an average reader?

P.B.: Yeah I keep my links, I’ve started saving bad visualisation, a simple one would be delicious.com-???? It’s not my work but there you go. Let’s try this [typing on PC to find the correct link that was: http://pinboard.in/u:paulbradshaw/t:badvis ]

C.B.: Okay, thank you very much. I’ve read your article on how to be a data journalist. Which are the skills that someone needs to have in order to work on data driven journalism and data visualisation on a professional level?

P.B.: What skills do you need?

C.B.: Aha.

P.B.: I think you need an eye for a story, you need to be able to see what stories exist, that might exist in data, you need to be able to analyse data and to find that, and then you need to be able to communicate the results effectively so that quite often means writing about data in a way that isn’t bumped up/down with numbers that might human stories in them, so that’s about kind of leaving the data behind and speaking to people.

C.B.: If you were building a data journalistic team for a national newspaper, could you say something about the kinds of people that you would recruit, I mean how many people of which specialisation, the idea, the minimum, perhaps…

P.B.: I think ideally you want someone with a subject specialism. There’s a lot of ifs, there’s a lot of things, there’s a lot factors that come into account. So, for example, if you’re in a team within a newspaper that has expertise more broadly then you don’t necessarily need a team (?) but you need to be able to say to the health reporter or an education specialist what do I need to know about issues in education or whatever. So you need access to either within or outside the team to subject expertise because that’s what leads to the kind of hypothesis about checking particular claims or looking for the impact of things or seeing if policies worked and then the same sorts of skills again you need people with a nose for news, you need people who can do basic spreadsheet work and cleaning and analysis and I think increasingly you need developers you know programmatic skills of being able to write a scraper or being able to create systems about streamline process ever more if it is easy to work with public data but increasingly that will not be enough and also I think the more useful data might not be the public data you need people that have got the F.O.I., you need people who’ve got the/a scraper pen you need people who have contacts who can leak/link (?) data. So people who have access to data that others don’t and that involves developers and F.O.I. expertise, contacts.

C.B.: And my final set of questions is how do you think that journalism data visualisation have changed journalism and what do you see in the future for…?

P.B.: How have they changed journalism uhm… Well I think, I don’t think it’s about data journalism but I think it’s an increased pressure on journalists to be factually accurate, partly, not just because of data journalism but because other people are able to publish now and say we know this subject inside out and you have made a mistake so there’s more pressure in terms of bloggers. There’s, it’s easier to access factual information and check claims and things like that so that’s had an impact.
I think we’re telling stories in different ways so visual storytelling is becoming more important and that’s having an impact and I think it’s having an impact in terms of we can see that it’s selling papers like it can lead to a number of stories like MPs’ expenses, Wikileaks, we can see that it leads to higher engagement in terms of traffic and stickiness and commercially, so that’s leading to a pressure to do more, but I don’t think it’s data journalism that’s causing those pressures I think it’s other things so I think it’s changes in how advertising is measured it’s changes in the information environment, in the information availability, it’s changes in how data is used by politicians you need there’s a lot more abuse of data by politicians but perhaps we're used to it because there is more availability of data so journalists are probably reacting more to the information environment more generally than to data journalism as such… yeah.

C.B.: And in the future what do you see?

P.B.: In the future I think as I said I think there’s always gonna be a conflict between the information that a journalist is seeking and the information that powerful people want to make available and that plays out in a number of ways; the F.O.I. laws are under pressure because of what’s been done with them so I can see there being fights around F.O.I. in two major ways; one there’s a fight to roll it back to put more than it’s on it but also there’s a fight to extend it to private companies for example than dividing public services so those two fights are taking place I think there’s more press release data going to be made available so there’s gonna be more spin on data that journalists are gonna have to unspin and I think journalists are gonna become better at getting data that isn’t available through those means, so either being leaked by sources or being obtained through scraping because more of it it’s gonna be available online. So you kind of got to balance and that’s where I see most of it playing out. There’s gonna be more use of data as well so there’s gonna be more opportunities for personalisation, for stories to be told in ways that relate throughout to a person, so you plug in through Facebook and that story is told in terms of your area, your friends, your skills, your health conditions either. And that’s I think an area that’s going to grow a lot, network analysis connection between people, that’s been historically hard to do and is becoming easier so relationships ...? and things like that. So, that probably sums a lot.

C.B.: Okay.

P.B.: Alright?

C.B.: Yes, thank you. Is it okay if I use your name?

P.B.: Yeah, yeah, yeah.

C.B.: Thank you and if there’s something that you would like me to omit?

P.B.: No.

C.B.: Okay, thank you.
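Editorial illustration (not part of the interview): the cleaning steps the interviewee describes for published data (stripping empty rows, combining multi-row headings into a single header, removing duplicate entries, then adding items together as a pivot table would) can be sketched in plain Python. This is only a minimal sketch under stated assumptions: the interviewee names Google Refine and Excel, not this code, and the function names, column names and figures below are hypothetical.

```python
from collections import defaultdict


def clean_rows(raw_rows, header_rows=2):
    """Drop empty rows, merge multi-row headings, remove exact duplicates."""
    # 1. Strip rows in which every cell is blank.
    rows = [r for r in raw_rows if any(cell.strip() for cell in r)]
    # 2. Combine the first `header_rows` rows into a single header per column.
    header = [
        " ".join(part.strip() for part in column if part.strip())
        for column in zip(*rows[:header_rows])
    ]
    # 3. Remove duplicate records while keeping the original order.
    seen, records = set(), []
    for r in rows[header_rows:]:
        key = tuple(cell.strip() for cell in r)
        if key not in seen:
            seen.add(key)
            records.append(dict(zip(header, key)))
    return records


def pivot_sum(records, group_by, value):
    """Add up `value` for each distinct `group_by` entry, like a pivot table."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[group_by]] += float(rec[value])
    return dict(totals)


# Hypothetical spending data with a two-row heading, an empty row and a duplicate.
raw = [
    ["Authority", "Spending", "Spending"],
    ["", "Item", "Amount"],
    ["", "", ""],
    ["Anytown", "B&B", "100"],
    ["Anytown", "B&B", "100"],
    ["Anytown", "Hostel", "50"],
    ["Otherville", "B&B", "70"],
]
records = clean_rows(raw)
totals = pivot_sum(records, "Authority", "Spending Amount")
```

Treating identical rows as duplicates is itself an assumption; in real spending data two identical payments can both be genuine, which is why the interviewee's step of phoning the source to ask questions still matters.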


Appendix 3 - Content Analysis Methodology

3.1 Code Frame, Limitations, Clarifications (Tables A-E)

Table A: List of Variables:

Variable Code Name    Variable Description
Var1                  Year of Publication
Var2                  Number of visualisations
Var3                  Author of article
Var4                  Subject Category
Var5                  Existence of Visualisation Number 1
Var6                  Existence of Visualisation Number 2
Var7                  Existence of Visualisation Number 3
Var8                  Type of Visualisation Number 1
Var9                  Type of Visualisation Number 2
Var10                 Type of Visualisation Number 3
Var11                 Tool for Visualisation Number 1
Var12                 Tool for Visualisation Number 2
Var13                 Tool for Visualisation Number 3
Var14                 Existence of Data Summary
Var15                 Existence of Data Set

Coding description:

1. Var1: A numerical value indicating the year of publication, ranging from 2009 to 2013.

2. Var2: A numerical value indicating the number of visualisations in the article.

3. Var3: A numerical value corresponding to a specific name for each author. In the cases of multiple authors only the first one is considered. Value 0 indicates articles with broken links or empty content.

4. Var4: A numerical value corresponding to the subject category to which the article belongs. Table of the code of each subject category to follow. Value 0 indicates articles with broken links or empty content.

5. Var5: Numerical value 1 indicates the existence of 1st visualisation, value 0 indicates non-existence.

6. Var6: Numerical value 1 indicates the existence of 2nd visualisation, value 0 indicates non-existence.

7. Var7: Numerical value 1 indicates the existence of 3rd visualisation, value 0 indicates non-existence.

8. Var8: Numerical value corresponding to a specific type of visualisation. Table of the code of each type to follow. Value 0 corresponds to non-existing visualisations, or articles with broken links or empty content.

9. Var9: Numerical value corresponding to a specific type of visualisation. Table of the code of each type to follow. Value 0 corresponds to non-existing visualisations, or articles with broken links or empty content.

10. Var10: Numerical value corresponding to a specific type of visualisation. Table of the code of each type to follow. Value 0 corresponds to non-existing visualisations, or articles with broken links or empty content.

11. Var11: Numerical value corresponding to a specific tool used for the creation of visualisation. Table of the code of each tool to follow. Value 0 corresponds to non-existing visualisations or to the cases where a tool cannot be identified.⁸

12. Var12: Numerical value corresponding to a specific tool used for the creation of visualisation. Table of the code of each tool to follow. Value 0 corresponds to non-existing visualisations or to the cases where a tool cannot be identified.⁹

13. Var13: Numerical value corresponding to a specific tool used for the creation of visualisation. Table of the code of each tool to follow. Value 0 corresponds to non-existing visualisations or to the cases where a tool cannot be identified.¹⁰

14. Var14: Numerical value 1 indicates the provision of a Data Summary of the Data Set on which the article was based. Value 0 indicates non-provision or articles with broken links. Tables in articles that are not clearly identified as 'Data Summary' at the end of the article are considered general data tables.

15. Var15: Numerical value 1 indicates the provision of the full Data Set or a link to the source of the Data Set, on which the article was based (either on a downloadable spreadsheet or through a link to the source of the data). Value 0 indicates non-provision or articles with broken links.

⁸ With the exception of the visualisation type of 'Tables', please see clarifications under Table E.
⁹ With the exception of the visualisation type of 'Tables', please see clarifications under Table E.
¹⁰ With the exception of the visualisation type of 'Tables', please see clarifications under Table E.


Please note that if an image of data visualisation contained more than one type of visualisation, each type was treated as a separate item and not as one.
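As an illustration of how the code frame above constrains a coded article, the fifteen variables can be held in a single record and checked for consistency. The sketch below is not the procedure used in the study; the class name, field names and the specific checks are assumptions derived from the coding description (year between 2009 and 2013, existence flags of 0 or 1, and value 0 for the type and tool of a visualisation that does not exist). The upper bounds 66, 15, 16 and 17 are taken from the highest codes listed in Tables B to E.

```python
from dataclasses import dataclass


@dataclass
class CodedArticle:
    """One coded article; field names are hypothetical, codes follow Table A."""
    year: int           # Var1
    n_vis: int          # Var2
    author: int         # Var3 (0 = broken link / empty content)
    subject: int        # Var4 (0 = broken link / empty content)
    vis_exists: tuple   # Var5-Var7, each 0 or 1
    vis_type: tuple     # Var8-Var10, codes from Table D
    vis_tool: tuple     # Var11-Var13, codes from Table E
    data_summary: int   # Var14, 0 or 1
    data_set: int       # Var15, 0 or 1


def validate(article):
    """Return a list of consistency problems (empty if none are found)."""
    problems = []
    if not 2009 <= article.year <= 2013:
        problems.append("Var1: year outside 2009-2013")
    if article.n_vis < 0:
        problems.append("Var2: negative number of visualisations")
    if not 0 <= article.author <= 66:
        problems.append("Var3: unknown author code")
    if not 0 <= article.subject <= 15:
        problems.append("Var4: unknown subject code")
    for name, flag in (("Var14", article.data_summary), ("Var15", article.data_set)):
        if flag not in (0, 1):
            problems.append(f"{name}: must be 0 or 1")
    slots = zip(article.vis_exists, article.vis_type, article.vis_tool)
    for i, (exists, vtype, tool) in enumerate(slots, start=1):
        if exists not in (0, 1):
            problems.append(f"visualisation {i}: existence must be 0 or 1")
        if not 0 <= vtype <= 16:
            problems.append(f"visualisation {i}: unknown type code")
        if not 0 <= tool <= 17:
            problems.append(f"visualisation {i}: unknown tool code")
        # A visualisation coded as non-existent should have type and tool 0.
        if exists == 0 and (vtype != 0 or tool != 0):
            problems.append(f"visualisation {i}: coded but marked as non-existent")
    return problems


# Hypothetical records: one consistent, one with two problems.
good = CodedArticle(2012, 1, 19, 2, (1, 0, 0), (10, 0, 0), (4, 0, 0), 0, 1)
bad = CodedArticle(2014, 1, 1, 1, (0, 0, 0), (7, 0, 0), (0, 0, 0), 0, 1)
```

Here `validate(good)` returns an empty list, while `validate(bad)` reports the out-of-range year and the bar chart coded for a visualisation marked as non-existent.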

Table B: Var3: Author, Coding Scheme

Code    Author's Name                                  Code    Author's Name         Code    Author's Name
1       Simon Rogers                                   24      Ersa Turk             47      Simon Day
2       Ami Sedghi                                     25      Rebecca Ratcliffe     48      Anna Powell-Smith
3       Theresa Malone                                 26      Mona Chalabi          49      Chris Hanretty
4       James Ball                                     27      Nick Evershed         50      Nick Mead
5       Graham Snowdon                                 28      Peter Walker          51      Jennifer Jones
6       Lisa Evans                                     29      Julia Kollewe         52      David McGillivray
7       Claire Provost                                 30      Simon Choppin         53      Christine Oliver
8       Nathan Yau                                     31      Felicity Brown        54      Danny Dorling
9       Jeevan Vasagar                                 32      Severin Carrell       55      Gary Blight
10      Alice Woolley                                  33      Elena Moya            56      Nona Buckley-Irvine
11      Jonathan Grey                                  34      Denis Campbell        57      Jake Porway
12      Randeep Ramesh                                 35      Charles Arthur        58      Kathry Torney
13      Pete Robbins                                   36      David Henke           59      Harry Enten
14      Katy Stoddard                                  37      Premesagar Rose       60      Lisa O'Carroll
15      Nigel Shadbolt                                 38      Larry Elliott         61      Chris Fenn
16      Jessica Shepherd                               39      David McCandless      62      Andrew Sparrow
17      Sarah Hartley                                  40      SA Mathieson          63      George Arnett
18      Jonathan Glennie                               41      Adam Vaughan          64      Andrés Monroy-Hernández
19      Paul Bradshaw                                  42      John Burn-Murdoch     65      Sam Weaver
20      Jonathan Grey                                  43      Tom MacInnes          66      Margot Huysman
21      Kevin Anderson                                 44      Nathan Green
22      The Guardian (no specific author mentioned)    45      Alasdair Rae
23      Juliette Garside                               46      Antonia Kanczucla


Clarification: Author 3, Theresa Malone, was assigned as author by mistake. However, the mistake was detected at an early stage of the research and her code was not replaced by another name, in an effort not to miscalculate the coding results (the coding for the relevant article was corrected).

Clarifications - Limitations about Var3 Coding:

§ Due to coding limitations, in order to be compatible with ReCal (Intercoder Reliability Testing Online), if an article has multiple authors, only the first one is considered.
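To illustrate what an intercoder reliability check on such numeric codes involves, the sketch below computes simple percent agreement and Cohen's kappa for two coders who coded the same articles. It is a minimal sketch under stated assumptions: this section does not say which coefficient was taken from ReCal, and the two lists of codes are invented for the example (read as hypothetical Var4 subject codes).

```python
from collections import Counter


def percent_agreement(coder_1, coder_2):
    """Share of units on which both coders assigned the same code."""
    matches = sum(a == b for a, b in zip(coder_1, coder_2))
    return matches / len(coder_1)


def cohens_kappa(coder_1, coder_2):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_1)
    observed = percent_agreement(coder_1, coder_2)
    freq_1, freq_2 = Counter(coder_1), Counter(coder_2)
    # Chance agreement: product of each coder's marginal proportions per code.
    expected = sum(
        freq_1[code] * freq_2[code] for code in set(coder_1) | set(coder_2)
    ) / (n * n)
    # Undefined (division by zero) if both coders used one identical code throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same eight articles.
coder_a = [1, 2, 2, 4, 7, 7, 1, 15]
coder_b = [1, 2, 3, 4, 7, 1, 1, 15]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
```

Both functions compare one code per article per coder, which is why a variable such as Var3 needs a single value per article and only the first author can be coded.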

Table C: Var4: Subject Category, Coding Scheme

Code    Category                                          Code    Category
1       Politics / Government / Public Administration     9       World News
2       Sports                                            10      Global Development
3       Culture                                           11      Environment / Weather / Nature
4       Health                                            12      Media / Journalism
5       Military / War                                    13      Transportation
6       Education                                         14      Technology / Science
7       Society                                           15      Economy / Business
8       Crime / Terrorism

Clarifications - Limitations on some categories of Var4 Coding:

§ Politics / Government / Public Administration: Under this category are classified articles about politics, government, local government, public administration

§ Sports: Under this category are classified articles about athletic events, sports, athletes

§ Culture: Under this category are classified articles about music, books, theatre, TV-shows, radio, cinema, museums, libraries, events, history, archaeology, food (when not related to health), customs, awards

§ Health: Under this category are classified articles about health, nutrition, cosmetic surgery, diseases, epidemics

§ Military / War: Under this category are classified articles about military, war, warzones, refugees, victims

§ Education: Under this category are classified articles about education, schools, universities, literacy (unless referring to Governmental reforms & policies of those, then they are classified under category 1)

§ Society: Under this category are classified articles about unemployment, employment, poverty, immigration (unless referring to immigration policies, then classified under category 1), social media, demographics (population, income), drugs and alcohol consumption, travelling, life-style

§ Crime / Terrorism: Under this category are classified articles about crime, terrorism (victims, attacks, rates)

§ World News: Under this category are classified articles about specific news or topics about a country other than the UK, or articles that compare countries, unless the subject is very clear

§ Global Development: Under this category are classified articles about aid, poverty, global development

§ Environment / Weather / Nature: Under this category are classified articles about environment, weather, nature, natural resources, energy consumption, natural disasters

§ Media / Journalism: Under this category are classified articles about media stations or organisations, journalism, journalism conferences

§ Technology / Science: Under this category are classified articles about technology, science, data and open data (unless referring to Government data & open data, then they are classified under category 1), data visualisation

§ Economy / Business: Under this category are classified articles about economics, economy, business (unless referring to Governmental reforms & policies of those, then they are classified under category 1)

134

Table D: Var8, Var9 & Var10: Types of Visualisation, Coding Scheme based on (Bounford, 2000).

Code Number   Type
0             Not Available / Not shown / Broken Link / Not functioning
1             Interactive
2             Word Cloud
3             Pie Chart
4             Spreadsheet
5             Video
6             Line Graph
7             Bar Chart
8             Table
9             Area Chart
10            Map
11            Symbol
12            Combination of types
13            Relational Diagram
14            Network Map
15            Timeline
16            Scatter Graph

Clarifications - Limitations on Var8, Var9 & Var10 Coding:

§ Visualisations are considered Interactive when they are animated and usually show a combination of graph types that require the active participation of the user in order to be explored further.

§ If an interactive graph or map has a clear graph type (for example, a bar chart that looks interactive because the user can click on it to see some data, or a map that reveals more data when clicked), it is classified under the type of chart it clearly shows (bar chart, line graph, etc.). Therefore, maps created with Google Fusion, for example, or bar/line charts created with Datawrapper, although they have a small degree of interactivity, are classified as map, bar chart, line graph, etc., accordingly.
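The two clarifications above can be read as a small decision rule. The following is a minimal sketch of that rule in Python, given for illustration only: the function name `code_visualisation` and its parameters are hypothetical and are not part of the original coding procedure, which was carried out by hand. The type codes are those of Table D.

```python
from typing import Optional

# Type codes copied from Table D.
TYPE_CODES = {
    "Interactive": 1, "Word Cloud": 2, "Pie Chart": 3, "Spreadsheet": 4,
    "Video": 5, "Line Graph": 6, "Bar Chart": 7, "Table": 8, "Area Chart": 9,
    "Map": 10, "Symbol": 11, "Combination of types": 12,
    "Relational Diagram": 13, "Network Map": 14, "Timeline": 15,
    "Scatter Graph": 16,
}
NOT_AVAILABLE = 0  # Not Available / Not shown / Broken Link / Not functioning


def code_visualisation(clear_type: Optional[str], is_interactive: bool,
                       available: bool = True) -> int:
    """Return the Table D code for one visualisation (hypothetical helper)."""
    if not available:
        return NOT_AVAILABLE
    # A clearly recognisable chart type wins, even if the chart is clickable.
    if clear_type is not None:
        return TYPE_CODES[clear_type]
    # Only visualisations with no single clear type are coded as Interactive.
    if is_interactive:
        return TYPE_CODES["Interactive"]
    raise ValueError("a visualisation needs a clear type or must be interactive")


# A clickable Datawrapper bar chart is still coded as a bar chart.
print(code_visualisation("Bar Chart", is_interactive=True))
# An animated visualisation combining several graph types is coded as Interactive.
print(code_visualisation(None, is_interactive=True))
```

The order of the checks mirrors the second clarification: the clear type is tested before interactivity, so a small degree of interactivity never overrides a recognisable chart type.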


Table E: Var11, Var12 & Var13: Visualisation Tools, Coding Scheme

Code Number   Tool
0             Not Available / Not shown
1             Tableau
2             Wordle.net
3             Many Eyes
4             Google Fusion
5             Zoom.it
6             Google Docs / Drive
7             Opta
8             Infomous
9             Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian
10            Compete
11            Graphic from External Source
12            Datawrapper
13            Timetric
14            Prezi
15            ZeeMaps
16            BatchGeo
17            Cartödb

Clarifications - Limitations on Var11, Var12 & Var13 Coding:

§ Clarification about Tool 7. Opta was assigned as a tool by mistake. The mistake was detected at an early stage of the research, but its code was not reassigned to another tool, so as not to miscalculate the coding results (the coding for the relevant article was corrected).

§ Clarification about Tool 9. "Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian": This category covers visualisations that were created by The Guardian Graphics' Team or by an external freelance graphist, with the indication 'For The Guardian'. Additionally, many articles contain tables (usually in grey colour palettes) that were created either by the author of the article or by another member of The Guardian Data Journalism Team. However, it is not possible to know which tool was used to create these tables. They could have been created in a word-processing program, in Excel, or with a database language such as SQL. In all those cases, the visualisations are treated as created by the data journalism team and are therefore classified under Tool 9. They are not assigned the value 0.
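The clarification about Tool 9 can likewise be sketched as a rule. The snippet below is an illustrative assumption about how the rule could be expressed, not the procedure actually used (the coding was manual); `code_tool` and its parameters are hypothetical names, and only a few Table E codes are listed for brevity.

```python
from typing import Optional

# A few tool codes copied from Table E (excerpt).
TOOL_CODES = {
    "Tableau": 1, "Many Eyes": 3, "Google Fusion": 4,
    "Graphic from External Source": 11, "Datawrapper": 12,
}
GUARDIAN_TEAM = 9   # Guardian Graphics' Team / Data Team / freelance graphist
NOT_AVAILABLE = 0   # Not Available / Not shown


def code_tool(identified_tool: Optional[str], made_for_guardian: bool) -> int:
    """Return the Table E code for one visualisation (hypothetical helper)."""
    if identified_tool is not None:
        return TOOL_CODES[identified_tool]
    # Unknown tool, but produced in-house (e.g. a grey table by the author):
    # coded as Tool 9 rather than 0.
    if made_for_guardian:
        return GUARDIAN_TEAM
    return NOT_AVAILABLE


print(code_tool("Datawrapper", made_for_guardian=True))
print(code_tool(None, made_for_guardian=True))
```

The point of the sketch is the middle branch: an in-house table whose tool cannot be identified still counts towards Tool 9, which explains why value 0 is reserved for visualisations that are unavailable or not shown.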


Appendix 4 - Quantitative Research Findings

All spreadsheets containing the coding for all variables of all articles, the links to the articles, their titles, and all the statistical tables and charts can be found and downloaded at: https://copy.com/xOsRJcSR1wwL Due to limitations on page margins, some tables and charts had to be inserted as images. The link above provides the entire Excel file with all tables and charts at a larger size. Note also that all data for 2013 refer to articles published until 30 July (7 of the 12 months of the year).

4.1 - Visualisations per Article (Table 1, Chart 1)

Table 1. Number of 1st, 2nd & 3rd Visualisation, Total Number of Visualisations

                                            Number of Articles   Percentage of Articles
Articles without Visualisation                      71                  24.07%
Articles with 1 Visualisation                      118                  40.00%
Articles with 2 Visualisations                      39                  13.22%
Articles with 3 Visualisations                      28                   9.49%
Articles with more than 3 Visualisations            39                  13.22%
Total                                              295                 100.00%

Chart 1. Number of 1st, 2nd & 3rd Visualisation, Total Number of Visualisations

[Chart 1: pie chart "Number of Visualisations per Article"; slice labels in extracted order: 12%, 20%, 6%, 5%, 7%, 50%]

Average number of visualisations per article: 1.97
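As a check on the arithmetic, the percentages of Table 1 can be recomputed from the article counts. The short sketch below assumes only the counts printed in Table 1; the function name `shares` is illustrative.

```python
# Article counts copied from Table 1.
counts = {
    "Articles without Visualisation": 71,
    "Articles with 1 Visualisation": 118,
    "Articles with 2 Visualisations": 39,
    "Articles with 3 Visualisations": 28,
    "Articles with more than 3 Visualisations": 39,
}


def shares(counts):
    """Percentage of the total for each category, rounded to two decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


table1 = shares(counts)
for label, pct in table1.items():
    print(f"{label}: {counts[label]} articles, {pct:.2f}%")
```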


4.2 - Provision of Data Summary and Data Sets (or links to data source) (Table 2, Chart 2)

Table 2. Provision of Data Summary and Data Sets (or links to data source) % of total

                                                 Number   Percentage
Articles with Data Summary                         105      35.59%
Articles with Data Set                             214      72.54%
Articles with only Data Summary                      2       0.68%
Articles with only Data Set                        112      37.97%
Articles with Both Data Summary and Data Set       103      34.92%

Chart 2. Provision of Data Summary and Data Sets (or links to data source) % of total

[Chart 2: "Provision of Data Summary and Data Set"; percentages as in Table 2]

4.3 - Authors by Number of Publications and Year (in descending order) (Tables 3-5, Charts 3-4)

Table 3, part 1. Authors by Number of Publications and Year (in descending order)

Table 3, part 2. Authors by Number of Publications and Year (in descending order)

Table 4. Main Authors (percentage of total publications)

Author Name          Total Percentage
Simon Rogers              41.02%
Ami Sedghi                 9.15%
Mona Chalabi               7.46%
John Burn-Murdoch          5.42%
Lisa Evans                 3.39%
James Ball                 2.71%
Claire Provost             2.03%
Katy Stoddard              2.03%
Nick Evershed              1.69%
Randeep Ramesh             1.02%
Sarah Hartley              1.02%
Kevin Anderson             1.02%
Others                    18.31%

Chart 3. Main Authors (percentage of total publications)

Table 5. Main Authors (Publications per year, Percentage of total publications)

Author Name          2009  2010  2011  2012  2013  Total  Percentage
Simon Rogers           21    24    31    30    15    121     41.02%
Ami Sedghi              0     4     7     7     9     27      9.15%
Mona Chalabi            0     0     1     0    21     22      7.46%
John Burn-Murdoch       0     0     1    12     3     16      5.42%
Lisa Evans              0     2     3     5     0     10      3.39%
James Ball              0     0     3     2     3      8      2.71%
Claire Provost          0     0     2     3     1      6      2.03%
Katy Stoddard           1     4     1     0     0      6      2.03%
Nick Evershed           0     0     0     0     5      5      1.69%
Randeep Ramesh          0     0     0     3     0      3      1.02%
Sarah Hartley           0     2     0     1     0      3      1.02%
Kevin Anderson          3     0     0     0     0      3      1.02%
Others                  9     8    14    25     9     65     18.31%

[Chart 3: "Authors"; percentages as in Table 4]

Chart 4. Main Authors (Publications per year, Percentage of total publications)

4.4 - Articles Per Subject per Year (Tables 6-7, Charts 5-12)

Table 6. Articles per Subject per Year (Frequencies)

Subject
Category  Subject Name                                    2009  2010  2011  2012  2013  Total
1         Politics / Government / Public Administration      6    12    13    12    12     55
2         Sports                                             0     1     6     9     5     21
3         Culture                                            2     5     8     5     4     24
4         Health                                             3     1     3     4     2     13
5         Military / War                                     3     3     1     1     6     14
6         Education                                          0     4     7     6     2     19
7         Society                                            4     7     8    14    11     44
8         Crime / Terrorism                                  1     0     2     1     1      5
9         World News                                         4     0     4     4     5     17
10        Global Development                                 0     2     3     5     2     12
11        Environment / Weather / Nature                     4     2     2     5     3     16
12        Media / Journalism                                 1     3     2     9     2     17
13        Transportation                                     1     1     1     4     4     11
14        Technology / Science                               1     1     1     4     5     12
15        Economy / Business                                 4     2     2     4     1     13
          Total                                             34    44    63    87    65

Chart 5. Articles per Subject Total (Frequencies)


Chart 6. Articles per Subject per Year (Frequencies)

Table 7. Articles per Subject per Year (Percentages)

                                                Percentage per Subject                      Total
Subject Name                                    2009    2010    2011    2012    2013    Percentage
Politics / Government / Public Administration   17.65%  27.27%  20.63%  13.79%  18.46%    18.64%
Sports                                           0.00%   2.27%   9.52%  10.34%   7.69%     7.12%
Culture                                          5.88%  11.36%  12.70%   5.75%   6.15%     8.14%
Health                                           8.82%   2.27%   4.76%   4.60%   3.08%     4.41%
Military / War                                   8.82%   6.82%   1.59%   1.15%   9.23%     4.75%
Education                                        0.00%   9.09%  11.11%   6.90%   3.08%     6.44%
Society                                         11.76%  15.91%  12.70%  16.09%  16.92%    14.92%
Crime / Terrorism                                2.94%   0.00%   3.17%   1.15%   1.54%     1.69%
World News                                      11.76%   0.00%   6.35%   4.60%   7.69%     5.76%
Global Development                               0.00%   4.55%   4.76%   5.75%   3.08%     4.07%
Environment / Weather / Nature                  11.76%   4.55%   3.17%   5.75%   4.62%     5.42%
Media / Journalism                               2.94%   6.82%   3.17%  10.34%   3.08%     5.76%
Transportation                                   2.94%   2.27%   1.59%   4.60%   6.15%     3.73%
Technology / Science                             2.94%   2.27%   1.59%   4.60%   7.69%     4.07%
Economy / Business                              11.76%   4.55%   3.17%   4.60%   1.54%     4.41%

Please note that 2 articles had no content displayed therefore their subject could not be identified.
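The percentages of Table 7 can be derived from the frequencies of Table 6. The sketch below reproduces three rows as an illustration. It rests on an assumption inferred from the printed figures rather than stated in the text: the yearly columns are divided by the yearly totals of classified articles (last row of Table 6), while the Total Percentage column is divided by all 295 coded articles, including the 2 whose subject could not be identified.

```python
# Yearly article counts per subject, copied from Table 6 (three rows only).
YEARS = [2009, 2010, 2011, 2012, 2013]
counts = {
    "Politics / Government / Public Administration": [6, 12, 13, 12, 12],
    "Sports": [0, 1, 6, 9, 5],
    "Society": [4, 7, 8, 14, 11],
}
# Yearly totals over all 15 subjects (last row of Table 6).
year_totals = [34, 44, 63, 87, 65]
# Assumed denominator for the Total Percentage column: all coded articles.
ALL_ARTICLES = 295


def percentage(part, whole):
    return round(100 * part / whole, 2)


def percentage_row(yearly_counts):
    """Yearly percentages followed by the overall percentage for one subject."""
    per_year = [percentage(n, t) for n, t in zip(yearly_counts, year_totals)]
    overall = percentage(sum(yearly_counts), ALL_ARTICLES)
    return per_year + [overall]


table7 = {subject: percentage_row(row) for subject, row in counts.items()}
for subject, row in table7.items():
    print(subject, row)
```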

Chart 7. Articles per Subject (Percentages) in total (all years)

[Chart 7: "Total Percentage Per Subject"; percentages as in the Total Percentage column of Table 7]

Chart 8. Articles per Subject (Percentages) in 2009

[Chart 8: percentages as in the 2009 column of Table 7]

[Charts 9-12: Articles per Subject (Percentages) in 2010, 2011, 2012 and 2013; percentages as in the corresponding columns of Table 7]

4.5 - Visualisation Types (Table 8, Charts 13-14)

Table 8. Types of 1st, 2nd and 3rd Visualisation (Frequencies and Percentages)

Chart 13. Types of 1st, 2nd and 3rd Visualisation (Percentages)

Chart 14. Types of Visualisations (Percentages of total use)

[Chart 14: "Total Percentage of Use Per Type of Visualisation"; bar labels in extracted order: 14.90%, 2.78%, 0.25%, 3.28%, 2.53%, 9.60%, 18.94%, 8.33%, 6.82%, 15.91%, 2.27%, 7.58%, 4.04%, 1.01%, 0.76%, 1.01%]

4.6 - Visualisation Tools (Tables 9-11, Charts 15-16)

Table 9. Visualisation Tools' Use Per Year (Frequencies) and Total Use (Frequencies and Percentage)

Var11,
Var12,                                   Number of Use in            Total Number   % of Use Compared
Var13  Tool Name                         2009  2010  2011  2012  2013    of Use       to Other tools
1      Tableau                              0     0     3    13     4      20             5.80%
2      Wordle.net                           0     7     3     0     0      10             2.90%
3      Many Eyes                            0     5    12     1     0      18             5.22%
4      Google Fusion                        0     3    13    10     7      33             9.57%
5      Zoom.it                              0     0     1     0     0       1             0.29%
6      Google Docs / Drive                  0     9     3     2     0      14             4.06%
7      Opta                                 0     0     0     0     0       0             0.00%
8      Infomous                             0     0     3     0     0       3             0.87%
9      Guardian Graphics' Team /            9    16    11    18    15      69            20.00%
       Guardian Data Team/ External
       Freelance Graphist for The
       Guardian
10     Compete                              0     0     2     0     0       2             0.58%
11     Graphic from External Source        14     7     6    61    25     113            32.75%
12     Datawrapper                          0     0     3     8    45      56            16.23%
13     Timetric                             0     0     0     0     0       0             0.00%
14     Prezi                                0     0     1     1     0       2             0.58%
15     ZeeMaps                              0     0     0     1     0       1             0.29%
16     BatchGeo                             0     0     0     0     1       1             0.29%
17     Cartödb                              0     0     0     0     2       2             0.58%
       Total                                                              345

Table 10. Main Visualisation Tools' Use Per Year (Frequencies)

                                         Number of Use in
Main Visualisation Tool                  2009  2010  2011  2012  2013
Graphic from External Source               14     7     6    61    25
Guardian Graphics' Team /                   9    16    11    18    15
  Guardian Data Team/ External
  Freelance Graphist for The Guardian
Datawrapper                                 0     0     3     8    45
Google Fusion                               0     3    13    10     7
Tableau                                     0     0     3    13     4
Many Eyes                                   0     5    12     1     0
Google Docs / Drive                         0     9     3     2     0
Wordle.net                                  0     7     3     0     0
Others                                      0     0     7     2     3

Chart 15. Main Visualisation Tools' Use Per Year (Frequencies)

Table 11. Total Use of Main Visualisation Tools (Percentage) in descending order.

Tool Name                                  % of Use Compared to Other tools
Graphic from External Source                        32.75%
Guardian Graphics' Team /                           20.00%
  Guardian Data Team/ External
  Freelance Graphist for The Guardian
Datawrapper                                         16.23%
Google Fusion                                        9.57%
Tableau                                              5.80%
Many Eyes                                            5.22%
Google Docs / Drive                                  4.06%
Wordle.net                                           2.90%
Others                                               3.48%

Chart 16. Total Use of Main Visualisation Tools (Percentage) in descending order.


4.7 - Frequencies of Use of Tools, Types and Frequencies of Subjects per Author (in descending Order)

Author: Simon Rogers (Tables 12-14, Charts 17-19)

Table 12. Total Use of Visualisation Tools (Frequencies)

Author Code / Name Tool Code/ Name Times of Use

1. Simon Rogers 11. Graphic from External Source 49

9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 29

4. Google Fusion 22

3. Many Eyes 13

12. Datawrapper 11

6. Google Docs / Drive 9

2. Wordle.net 7

1. Tableau 4

8. Infomous 3

14. Prezi 2

5. Zoom.it 1

17. Cartödb 1

10. Compete 0

13. Timetric 0

15. Zee Maps 0

16. Batchgeo 0

Chart 17. Total Use of Visualisation Tools (Frequencies)

[Chart 17: "Author: Simon Rogers, Tools Used" (Times of Use); values as in Table 12]

Table 13. Total Use of Visualisation Types (Frequencies)

Author Code / Name Type of Visualisation Times of Use

1. Simon Rogers 1. Interactive 36

10. Map 35

7. Bar Chart 19

12. Combination of types 13

9. Area Chart 12

13. Relational Diagram 12

4. Spreadsheet 9

6. Line Graph 9

2. Word Cloud 8

8. Table 5

11. Symbol 5

5. Video 4

3. Pie Chart 1

15. Timeline 1

14. Network Map 0

16. Scatter Graph 0

Chart 18. Total Use of Visualisation Types (Frequencies)

[Chart 18: "Author: Simon Rogers, Types Used"; values as in Table 13]

Table 14. Total Frequencies of Subjects

Author Code / Name Subject code / Name

Number of Articles

1. Simon Rogers

1. Politics / Government / Public Administration 25

7. Society 18

9. World News 10

3. Culture 9

5. Military / War 9

2. Sports 7

4. Health 7

12. Media / Journalism 7

6. Education 6

11. Environment / Weather / Nature 5

13. Transportation 5

8. Crime / Terrorism 4

15. Economy / Business 4

14. Technology / Science 3

10. Global Development 2

Chart 19. Total Frequencies of Subjects

[Chart 19: "Author: Simon Rogers, Subjects"; values as in Table 14]

Author: Ami Sedghi (Tables 15-17, Charts 20-22)

Table 15. Total Use of Visualisation Tools (Frequencies)

Author Code / Name Tool Code/ Name Times of Use

2. Ami Sedghi 13. Timetric 10

12. Datawrapper 8

10. Compete 6

1. Tableau 2

3. Many Eyes 1

4. Google Fusion 1


2. Wordle.net 0

5. Zoom.it 0

8. Infomous 0


11. Graphic from External Source 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0

17. Cartödb 0

Chart 20. Total Frequencies of Used Visualisation Tools

[Chart 20: "Author: Ami Sedghi, Tools Used"; bar values in extracted order: 10, 8, 6, 2, 1, 1, 1]



2. Ami Sedghi 7. Bar Chart 12

1. Interactive 5

6. Line Graph 3

10. Map 3


11. Symbol 2

4. Spreadsheet 1

8. Table 1

9. Area Chart 1

15. Timeline 1

2. Word Cloud 0

3. Pie Chart 0

5. Video 0


14. Network Map 0

16. Scatter Graph 0

Chart 21. Total Frequencies of Used Visualisation Types

[Chart 21: "Author: Ami Sedghi, Types Used"; bar values in extracted order: 12, 5, 3, 3, 3, 2, 1, 1, 1, 1]



Number of Articles

2. Ami Sedghi 2. Sports 7

3. Culture 6

6. Education 4

7. Society 4


5. Military / War 1

9. World News 1


4. Health 0








[Chart: "Author: Ami Sedghi, Subjects"; bar values in extracted order: 7, 6, 4, 4, 1, 1, 1, 1]

Author: Mona Chalabi (Tables 18-20, Charts 23-25)

Table 18. Total Use of Visualisation Tools (Frequencies)


26. Mona Chalabi 12. Datawrapper 17



4. Google Fusion 1

17. Cartödb 1

1. Tableau 0

2. Wordle.net 0

3. Many Eyes 0

5. Zoom.it 0


8. Infomous 0

10. Compete 0

13. Timetric 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0


[Chart: "Author: Mona Chalabi, Tools Used"; bars in extracted order: 12. Datawrapper (17), 11. Graphic from External Source (10), 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian (6), 4. Google Fusion (1), 17. Cartödb (1)]


Author Code / Name Type of Visualisation Times of Use

26. Mona Chalabi 7. Bar Chart 11

6. Line Graph 8

1. Interactive 6

8. Table 4

10. Map 2

5. Video 1


14. Network Map 1

16. Scatter Graph 1

2. Word Cloud 0

3. Pie Chart 0

4. Spreadsheet 0

9. Area Chart 0

11. Symbol 0


15. Timeline 0


[Chart: "Author: Mona Chalabi, Types Used"; bar values in extracted order: 11, 8, 6, 4, 2, 1, 1, 1, 1]



Number of Articles

26. Mona Chalabi 7. Society 7


5. Military / War 3

9. World News 3





2. Sports 0

3. Culture 0

4. Health 0

6. Education 0





[Chart: "Author: Mona Chalabi, Subjects"; bar values in extracted order: 7, 3, 3, 3, 2, 2, 1, 1]

Author: John Burn-Murdoch (Tables 21-23, Charts 26-28)

Table 21. Total Use of Visualisation Tools (Frequencies)

Author Code / Name Tool Code/ Name Times of Use

42. John Burn-Murdoch


1. Tableau 9


4. Google Fusion 1

2. Wordle.net 0

3. Many Eyes 0

5. Zoom.it 0


8. Infomous 0

10. Compete 0

12. Datawrapper 0

13. Timetric 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0

17. Cartödb 0


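[Chart: "Author: John Burn-Murdoch, Tools Used"; bar values in extracted order: 15, 9, 3, 1; labels recovered: 1. Tableau; 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian; 4. Google Fusion]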



[Chart: "Author: John Burn-Murdoch, Types Used"; bar values in extracted order: 4, 4, 4, 3, 3, 3, 2, 2, 2, 1]

42. John Burn-Murdoch 1. Interactive 4

7. Bar Chart 4

10. Map 4

6. Line Graph 3


16. Scatter Graph 3

9. Area Chart 2

11. Symbol 2

14. Network Map 2

8. Table 1

2. Word Cloud 0

3. Pie Chart 0

4. Spreadsheet 0

5. Video 0


15. Timeline 0



Author Code / Name Subject code / Name Number of Articles 42. John Burn-Murdoch

7. Society 4




2. Sports 1

3. Culture 1

6. Education 1



4. Health 0

5. Military / War 0


9. World News 0




[Chart: "Author: John Burn-Murdoch, Subjects"; bar values in extracted order: 4, 3, 2, 2, 1, 1, 1, 1, 1]

Author: Lisa Evans (Tables 24-26, Charts 29-31)

Table 24. Total Use of Visualisation Tools (Frequencies)


6. Lisa Evans 11. Graphic from External Source 4


3. Many Eyes 1

1. Tableau 0

2. Wordle.net 0

4. Google Fusion 0

5. Zoom.it 0


8. Infomous 0

10. Compete 0

12. Datawrapper 0

13. Timetric 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0

17. Cartödb 0


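[Chart: "Author: Lisa Evans, Tools Used"; bar values in extracted order: 4, 3, 1; labels recovered: 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian; 3. Many Eyes]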



6. Lisa Evans 1. Interactive 2

7. Bar Chart 2

8. Table 2


10. Map 1

2. Word Cloud 0

3. Pie Chart 0

4. Spreadsheet 0

5. Video 0

6. Line Graph 0

9. Area Chart 0

11. Symbol 0


14. Network Map 0

15. Timeline 0

16. Scatter Graph 0


[Chart: "Author: Lisa Evans, Types Used"; bars in extracted order: 1. Interactive (2), 7. Bar Chart (2), 8. Table (2), 12. Combination of types (2), 10. Map (1)]



Number of Articles

6. Lisa Evans 1. Politics / Government / Public Administration 4

3. Culture 1

7. Society 1

9. World News 1




2. Sports 0

4. Health 0

5. Military / War 0

6. Education 0






[Chart: "Author: Lisa Evans, Subjects"; bar values in extracted order: 4, 1, 1, 1, 1, 1, 1]

Author: James Ball (Tables 27-29, Charts 32-34)

Table 27. Total Use of Visualisation Tools (Frequencies)


4. James Ball 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 3

3. Many Eyes 1


1. Tableau 0

2. Wordle.net 0

4. Google Fusion 0

5. Zoom.it 0


8. Infomous 0

10. Compete 0

12. Datawrapper 0

13. Timetric 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0

17. Cartödb 0


[Chart: "Author: James Ball, Tools Used"; bars in extracted order: 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian (3), 3. Many Eyes (1), 11. Graphic from External Source (1)]



4. James Ball 8. Table 2

1. Interactive 1

5. Video 1

6. Line Graph 1

10. Map 1

2. Word Cloud 0

3. Pie Chart 0

4. Spreadsheet 0

7. Bar Chart 0

9. Area Chart 0

11. Symbol 0



14. Network Map 0

15. Timeline 0


[Chart: "Author: James Ball, Types Used"; bars in extracted order: 8. Table (2), 1. Interactive (1), 5. Video (1), 6. Line Graph (1), 10. Map (1)]



Number of Articles

4. James Ball 1. Politics / Government / Public Administration 3

4. Health 2

6. Education 1

7. Society 1


2. Sports 0

3. Culture 0

5. Military / War 0


9. World News 0







[Chart: "Author: James Ball, Subjects"; bars in extracted order: 1. Politics / Government / Public Administration (3), 4. Health (2), 6. Education (1), 7. Society (1), 15. Economy / Business (1)]

Author: Claire Provost (Tables 30-32, Charts 35-37)

Table 30. Total Use of Visualisation Tools (Frequencies)


7. Claire Provost 11. Graphic from External Source 3


1. Tableau 0

2. Wordle.net 0

3. Many Eyes 0

4. Google Fusion 0

5. Zoom.it 0


8. Infomous 0

10. Compete 0

12. Datawrapper 0

13. Timetric 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0

17. Cartödb 0


[Chart: "Author: Claire Provost, Tools Used"; bars in extracted order: 11. Graphic from External Source (3), 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian (2)]


Author Code / Name Type of Visualisation Times of Use

7. Claire Provost 10. Map 3

6. Line Graph 2

8. Table 2

1. Interactive 1


2. Word Cloud 0

3. Pie Chart 0

4. Spreadsheet 0

5. Video 0

7. Bar Chart 0

9. Area Chart 0

11. Symbol 0


14. Network Map 0

15. Timeline 0

16. Scatter Graph 0


[Chart: "Author: Claire Provost, Types Used"; bars in extracted order: 10. Map (3), 6. Line Graph (2), 8. Table (2), 1. Interactive (1), 12. Combination of types (1)]



Number of Articles

7. Claire Provost 10. Global Development 6


2. Sports 0

3. Culture 0

4. Health 0

5. Military / War 0

6. Education 0

7. Society 0


9. World News 0







[Chart: "Author: Claire Provost, Subjects"; bars in extracted order: 10. Global Development (6), Others (0)]

Author: Katy Stoddard (Tables 33-35, Charts 38-40)

Table 33. Total Use of Visualisation Tools (Frequencies)


14. Katy Stoddard 2. Wordle.net 3


1. Tableau 0

3. Many Eyes 0

4. Google Fusion 0

5. Zoom.it 0

8. Infomous 0


10. Compete 0


12. Datawrapper 0

13. Timetric 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0

17. Cartödb 0


[Chart: "Author: Katy Stoddard, Tools Used"; bars in extracted order: 2. Wordle.net (3), 6. Google Docs / Drive (1)]


Author Code / Name Type of Visualisation Times of Use

14. Katy Stoddard 2. Word Cloud 3

4. Spreadsheet 1

1. Interactive 0

3. Pie Chart 0

5. Video 0

6. Line Graph 0

7. Bar Chart 0

8. Table 0

9. Area Chart 0

10. Map 0

11. Symbol 0



14. Network Map 0

15. Timeline 0

16. Scatter Graph 0


[Chart: "Author: Katy Stoddard, Types Used"; bars in extracted order: 2. Word Cloud (3), 4. Spreadsheet (1)]



Number of Articles

14. Katy Stoddard 3. Culture 3


7. Society 1


2. Sports 0

4. Health 0

5. Military / War 0

6. Education 0


9. World News 0







[Chart: "Author: Katy Stoddard, Subjects"; bars in extracted order: 3. Culture (3), 1. Politics / Government / Public Administration (1), 7. Society (1), 13. Transportation (1)]

Author: Nick Evershed (Tables 36-38, Charts 41-43)

Table 36. Total Use of Visualisation Tools (Frequencies)


27. Nick Evershed 12. Datawrapper 10



1. Tableau 0

2. Wordle.net 0

3. Many Eyes 0

4. Google Fusion 0

5. Zoom.it 0


8. Infomous 0

10. Compete 0

13. Timetric 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0

17. Cartödb 0


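[Chart: "Author: Nick Evershed, Tools Used"; bar values in extracted order: 10, 1, 1; labels recovered: 12. Datawrapper; 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian]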


Author Code / Name Type of Visualisation Times of Use

27. Nick Evershed 8. Table 2

5. Video 1

6. Line Graph 1

7. Bar Chart 1

10. Map 1

1. Interactive 0

2. Word Cloud 0

3. Pie Chart 0

4. Spreadsheet 0

9. Area Chart 0

11. Symbol 0



14. Network Map 0

15. Timeline 0

16. Scatter Graph 0


[Chart: "Author: Nick Evershed, Types Used"; bars in extracted order: 8. Table (2), 5. Video (1), 6. Line Graph (1), 7. Bar Chart (1), 10. Map (1)]



Number of Articles

27. Nick Evershed 9. World News 2





2. Sports 0

3. Culture 0

4. Health 0

5. Military / War 0

6. Education 0

7. Society 0






[Chart: "Author: Nick Evershed, Subjects"; bars in extracted order: 9. World News (2), 11. Environment / Weather / Nature (1), 13. Transportation (1), 14. Technology / Science (1)]

Author: Randeep Ramesh (Tables 39-41, Charts 44-46)

Table 39. Total Use of Visualisation Tools (Frequencies)


12. Randeep Ramesh 4. Google Fusion 1

9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 1

12. Datawrapper 1

1. Tableau 0

2. Wordle.net 0

3. Many Eyes 0

5. Zoom.it 0


8. Infomous 0

10. Compete 0


13. Timetric 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0

17. Cartödb 0


[Chart: "Author: Randeep Ramesh, Tools Used"; bars in extracted order: 4. Google Fusion (1), 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian (1), 12. Datawrapper (1)]


Author Code / Name Type of Visualisation Times of Use

12. Randeep Ramesh 9. Area Chart 2

6. Line Graph 1

7. Bar Chart 1

8. Table 1

10. Map 1

1. Interactive 0

2. Word Cloud 0

3. Pie Chart 0

4. Spreadsheet 0

5. Video 0

11. Symbol 0



14. Network Map 0

15. Timeline 0

16. Scatter Graph 0


[Chart: "Author: Randeep Ramesh, Types Used"; bars in extracted order: 9. Area Chart (2), 6. Line Graph (1), 7. Bar Chart (1), 8. Table (1), 10. Map (1)]



Number of Articles

12. Randeep Ramesh 7. Society 2


2. Sports 0

3. Culture 0

4. Health 0

5. Military / War 0

6. Education 0


9. World News 0








[Chart: "Author: Randeep Ramesh, Subjects"; bars in extracted order: 7. Society (2), 1. Politics / Government / Public Administration (1)]

Author: Sarah Hartley (Tables 42-44, Charts 47-49)

Table 42. Total Use of Visualisation Tools (Frequencies)


17. Sarah Hartley 3. Many Eyes 1

4. Google Fusion 1


15. Zee Maps 1

1. Tableau 0

2. Wordle.net 0

5. Zoom.it 0


8. Infomous 0

10. Compete 0


12. Datawrapper 0

13. Timetric 0

14. Prezi 0

16. Batchgeo 0

17. Cartödb 0


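[Chart: "Author: Sarah Hartley, Tools Used"; bars in extracted order: 3. Many Eyes (1), 4. Google Fusion (1), 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian (1), 15. Zee Maps (1)]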


Author Code / Name Type of Visualisation Times of Use

17. Sarah Hartley 10. Map 2

1. Interactive 1

9. Area Chart 1

2. Word Cloud 0

3. Pie Chart 0

4. Spreadsheet 0

5. Video 0

6. Line Graph 0

7. Bar Chart 0

8. Table 0

11. Symbol 0



14. Network Map 0

15. Timeline 0

16. Scatter Graph 0


[Chart: "Author: Sarah Hartley, Types Used"; bars in extracted order: 10. Map (2), 1. Interactive (1), 9. Area Chart (1)]



Number of Articles

17. Sarah Hartley 1. Politics / Government / Public Administration 1



2. Sports 0

3. Culture 0

4. Health 0

5. Military / War 0

6. Education 0

7. Society 0


9. World News 0






[Chart: "Author: Sarah Hartley, Subjects"; bars in extracted order: 1. Politics / Government / Public Administration (1), 12. Media / Journalism (1), 13. Transportation (1)]

Author: Kevin Anderson (Tables 45-47, Charts 50-52)

Table 45. Total Use of Visualisation Tools (Frequencies)


21. Kevin Anderson 11. Graphic from External Source 5

1. Tableau 0

2. Wordle.net 0

3. Many Eyes 0

4. Google Fusion 0

5. Zoom.it 0


8. Infomous 0


10. Compete 0

12. Datawrapper 0

13. Timetric 0

14. Prezi 0

15. Zee Maps 0

16. Batchgeo 0

17. Cartödb 0


[Chart: "Author: Kevin Anderson, Tools Used"; bars in extracted order: 11. Graphic from External Source (5), Others (0)]



21. Kevin Anderson 13. Relational Diagram 3

5. Video 1

10. Map 1

1. Interactive 0

2. Word Cloud 0

3. Pie Chart 0

4. Spreadsheet 0

6. Line Graph 0

7. Bar Chart 0

8. Table 0

9. Area Chart 0

11. Symbol 0


14. Network Map 0

15. Timeline 0

16. Scatter Graph 0


[Chart: "Author: Kevin Anderson, Types Used"; bars in extracted order: 13. Relational Diagram (3), 5. Video (1), 10. Map (1)]



Number of Articles

21. Kevin Anderson




2. Sports 0

3. Culture 0

4. Health 0

5. Military / War 0

6. Education 0

7. Society 0


9. World News 0






[Chart: "Author: Kevin Anderson, Subjects"; bars in extracted order: 1. Politics / Government / Public Administration (1), 12. Media / Journalism (1), 15. Economy / Business (1)]

4.8 - Types of Visualisations per Subject and Subjects of Visualisations per Type

Visualisation Types per Subject: (Table 48, Charts 53-68)

Table 48. Part 1: Number of Graphs per Visualisation Type per Subject

Type columns: 1. Interactive, 2. Word Cloud, 3. Pie Chart, 4. Spreadsheet, 5. Video, 6. Line Graph, 7. Bar Chart, 8. Table

Subject Code / Name                                  1   2   3   4   5   6   7   8
1. Politics / Government / Public Administration     6   7   0   3   2   6  12   6
2. Sports                                            9   0   0   2   0   4   6   4
3. Culture                                           6   4   1   2   0   4   6   4
4. Health                                            0   0   0   0   0   0   3   4
5. Military / War                                    3   0   0   1   1   1   1   1
6. Education                                         4   0   0   0   0   2   8   0
7. Society                                          10   0   0   3   2   6  16   2
8. Crime / Terrorism                                 3   0   0   0   0   4   1   1
9. World News                                        4   0   0   0   2   0   6   3
10. Global Development                               2   0   0   0   0   3   4   3
11. Environment / Weather / Nature                   3   0   0   2   0   3   1   3
12. Media / Journalism                               4   0   0   0   2   0   1   0
13. Transportation                                   2   0   0   0   0   0   3   8
14. Technology / Science                             1   0   0   0   0   0   6   0
15. Economy / Business                               0   0   0   0   0   3   1   2

Table 48. Part 2: Number of Graphs per Visualisation Type per Subject

Type columns: 9. Area Chart, 10. Map, 11. Symbol, 12. Combination of types, 13. Relational Diagram, 14. Network Map, 15. Timeline, 16. Scatter Graph

Subject Code / Name                                  9  10  11  12  13  14  15  16
1. Politics / Government / Public Administration    12   8   0   5   1   0   0   1
2. Sports                                            1   2   2   7   0   0   1   0
3. Culture                                           2   3   4   1   2   0   1   0
4. Health                                            0   7   1   1   3   0   0   0
5. Military / War                                    2   2   0   3   1   0   0   0
6. Education                                         0   6   0   0   0   0   0   0
7. Society                                           6   9   1   1   0   0   0   3
8. Crime / Terrorism                                 4   2   0   0   0   0   0   0
9. World News                                        0   3   0   0   4   0   0   0
10. Global Development                               0   5   0   1   0   0   0   0
11. Environment / Weather / Nature                   2   2   0   6   1   0   0   0
12. Media / Journalism                               2   2   1   1   3   1   1   0
13. Transportation                                   0   7   0   0   0   0   0   0
14. Technology / Science                             0   1   0   3   0   1   0   0
15. Economy / Business                               1   3   0   1   1   2   0   0

Chart 53. Visualisation Type: 1. Interactive, per Subject (Frequencies)

[Chart 53: "Type: Interactive, Per Subject"; values as in Table 48]

Chart 54. Visualisation Type: 2. Word Cloud, per Subject (Frequencies)

[Chart 54: "Type: Word Cloud, Per Subject"; values as in Table 48]

Chart 55. Visualisation Type: 3. Pie Chart, per Subject (Frequencies)

[Chart 55: "Type: Pie Chart, per Subject"; values as in Table 48]

Chart 56. Visualisation Type: 4. Spreadsheet, per Subject (Frequencies)

[Chart 56: "Type: Spreadsheet, per Subject"; values as in Table 48]

Chart 57. Visualisation Type: 5. Video, per Subject (Frequencies)

[Chart 57: "Type: Video, per Subject"; values as in Table 48]

Chart 58. Visualisation Type: 6. Line Graph, per Subject (Frequencies)

[Chart 58: "Type: Line Graph, per Subject"; values as in Table 48]

Chart 59. Visualisation Type: 7. Bar Chart, per Subject (Frequencies)

[Chart 59: "Type: Bar Chart, Per Subject"; values as in Table 48]

Chart 60. Visualisation Type: 8. Table, per Subject (Frequencies)

[Chart 60: "Type: Table, Per Subject"; values as in Table 48]

Chart 61. Visualisation Type: 9. Area Chart, per Subject (Frequencies)

[Chart 61: "Type: Area Chart, per Subject"; values as in Table 48]

Chart 62. Visualisation Type: 10. Map, per Subject (Frequencies)

[Chart 62: "Type: Map, per Subject"; values as in Table 48]

Chart 63. Visualisation Type: 11. Symbol, per Subject (Frequencies)

[Chart 63: "Type: Symbol, per Subject"; values as in Table 48]

Chart 64. Visualisation Type: 12. Combination, per Subject (Frequencies)

[Chart 64: "Type: Combination, per Subject"; values as in Table 48]

Chart 65. Visualisation Type: 13. Relational Diagram, per Subject (Frequencies)

[Chart 65: "Type: Relational Diagram, per Subject"; values as in Table 48]

Chart 66. Visualisation Type: 14. Network Map, per Subject (Frequencies)

[Chart 66: "Type: Network Map, per Subject"; values as in Table 48]

Chart 67. Visualisation Type: 15. Timeline, per Subject (Frequencies)

[Chart 67: "Type: Timeline, per Subject"; values as in Table 48]

Chart 68. Visualisation Type: 16. Scatter Graph, per Subject (Frequencies)

[Chart 68: "Type: Scatter Graph, per Subject"; values as in Table 48]

Subjects per Visualisation Types: (Table 49, Charts 69-83)

Table 49. Part 1: Number of Graphs per Subject per Visualisation Type

Subject Code / Name: 1. Politics / Government / Public Administration; 2. Sports; 3. Culture; 4. Health; 5. Military / War; 6. Education; 7. Society; 8. Crime / Terrorism

Type                    1   2   3   4   5   6   7   8
Interactive             6   9   6   0   3   4  10   3
Word Cloud              7   0   4   0   0   0   0   0
Pie Chart               0   0   1   0   0   0   0   0
Spreadsheet             3   2   2   0   1   0   3   0
Video                   2   0   0   0   1   0   2   0
Line Graph              6   4   4   0   1   2   6   4
Bar Chart              12   6   6   3   1   8  16   1
Table                   6   4   4   4   1   0   2   1
Area Chart             12   1   2   0   2   0   6   0
Map                     8   2   3   7   0   6   9   2
Symbol                  0   2   4   1   0   0   1   0
Combination of types    5   7   1   1   0   0   1   0
Relational Diagram      1   0   2   3   0   0   0   0
Network Map             0   0   0   0   0   0   0   0
Timeline                0   1   1   0   0   0   0   0
Scatter Graph           1   0   0   0   0   0   3   0

Table 49. Part 2: Number of Graphs per Subject per Visualisation Type

Subject Code / Name: 9. World News; 10. Global Development; 11. Environment / Weather / Nature; 12. Media / Journalism; 13. Transportation; 14. Technology / Science; 15. Economy / Business

Type                    9  10  11  12  13  14  15
Interactive             4   2   3   4   2   1   0
Word Cloud              0   0   0   0   0   0   0
Pie Chart               0   0   0   0   0   0   0
Spreadsheet             0   0   2   0   0   0   0
Video                   2   0   0   2   0   0   1
Line Graph              0   3   3   0   0   0   3
Bar Chart               6   4   1   1   3   6   1
Table                   3   3   3   0   8   0   2
Area Chart              4   0   0   2   0   0   1
Map                     3   5   2   2   7   1   3
Symbol                  0   0   0   1   0   0   0
Combination of types    0   1   6   1   0   3   1
Relational Diagram      4   0   1   3   0   0   1
Network Map             0   0   0   1   0   1   2
Timeline                0   0   0   1   0   0   0
Scatter Graph           0   0   0   0   0   0   0

Chart 69. Subject 1. Politics / Government / Public Administration, per Visualisation Type (Frequencies)

[Chart titled "Subject: Politics / Government / Public Administration, per Type": frequency by visualisation type]

Chart 70. Subject 2. Sports, per Visualisation Type (Frequencies)

[Chart titled "Subject: Sports, per Type": frequency by visualisation type]

Chart 71. Subject 3. Culture, per Visualisation Type (Frequencies)

[Chart titled "Subject: Culture, per Type": frequency by visualisation type]

Chart 72. Subject 4. Health, per Visualisation Type (Frequencies)

[Chart titled "Subject: Health, per Type": frequency by visualisation type]

Chart 73. Subject 5. Military / War, per Visualisation Type (Frequencies)

[Chart titled "Subject: Military / War, per Type": frequency by visualisation type]

Chart 74. Subject 6. Education, per Visualisation Type (Frequencies)

[Chart titled "Subject: Education, per Type": frequency by visualisation type]

Chart 75. Subject 7. Society, per Visualisation Type (Frequencies)

[Chart titled "Subject: Society, per Type": frequency by visualisation type]

Chart 76. Subject 8. Crime / Terrorism, per Visualisation Type (Frequencies)

[Chart titled "Subject: Crime / Terrorism, per Type": frequency by visualisation type]

Chart 77. Subject 9. World News, per Visualisation Type (Frequencies)

[Chart titled "Subject: World News, per Type": frequency by visualisation type]

Chart 78. Subject 10. Global Development, per Visualisation Type (Frequencies)

[Chart titled "Subject: Global Development, per Type": frequency by visualisation type]

Chart 79. Subject 11. Environment / Weather / Nature, per Visualisation Type (Frequencies)

[Chart titled "Subject: Environment / Weather / Nature, per Type": frequency by visualisation type]

Chart 80. Subject 12. Media / Journalism, per Visualisation Type (Frequencies)

[Chart titled "Subject: Media / Journalism, per Type": frequency by visualisation type]

Chart 81. Subject 13. Transportation, per Visualisation Type (Frequencies)

[Chart titled "Subject: Transportation, per Type": frequency by visualisation type]

Chart 82. Subject 14. Technology / Science, per Visualisation Type (Frequencies)

[Chart titled "Subject: Technology / Science, per Type": frequency by visualisation type]

Chart 83. Subject 15. Economy / Business, per Visualisation Type (Frequencies)

[Chart titled "Subject: Economy / Business, per Type": frequency by visualisation type]

4.9 Most Used Visualisation Types per Most Used Visualisation Tools and Vice Versa

Most Used Visualisation Types per Most Used Visualisation Tools (Table 50, Charts 84-93)

Table 50. Most Used Types per Most Used Tools (Frequencies)

Type Name / Code: 7. Bar Chart; 10. Map; 1. Interactive; 6. Line Graph; 8. Table; 12. Combination of types; 9. Area Chart; 13. Relational Diagram; 4. Spreadsheet; 2. Word Cloud

Tool 9 in full: Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian

Tool Name / Code                    7  10   1   6   8  12   9  13   4   2
11. Graphic from External Source    9  15  23   7   4  22   5  12   0   0
9. Guardian Graphics' Team          6   9   5   2  22   4  14   3   0   0
12. Datawrapper                    40   0   0  13   2   0   0   0   0   0
4. Google Fusion                    3  28   0   0   0   0   2   0   0   0
1. Tableau                          6   2   2   1   0   2   1   1   0   0
3. Many Eyes                        0   0  18   0   0   0   0   0   0   0
6. Google Docs / Drive              0   0   0   0   0   0   0   0  13   0
2. Wordle.net                       0   0   0   0   0   0   0   0   0  10
Other                               0   4   4   2   0   0   0   0   0   0
Not Known / Not available          11   6   6  12   0   2   5   0   0   1

Chart 84. Type 1. Interactive, per most important Tools (Frequency)

[Chart titled "Type: Interactive, per Tool": frequency by tool]

Chart 85. Type 2. Word Cloud, per most important Tools (Frequency)

[Chart titled "Type: Word Cloud, per Tool": frequency by tool]

Chart 86. Type 4. Spreadsheet, per most important Tools (Frequency)

[Chart titled "Type: Spreadsheet, per Tool": frequency by tool]

Chart 87. Type 6. Line Graph, per most important Tools (Frequency)

[Chart titled "Type: Line Graph, per Tool": frequency by tool]

Chart 88. Type 7. Bar Chart, per most important Tools (Frequency)

[Chart titled "Type: Bar Chart, per Tool": frequency by tool]

Chart 89. Type 8. Table, per most important Tools (Frequency)

[Chart titled "Type: Table, per Tool": frequency by tool]

Chart 90. Type 9. Area Chart, per most important Tools (Frequency)

[Chart titled "Type: Area Chart, per Tool": frequency by tool]

Chart 91. Type 10. Map, per most important Tools (Frequency)

[Chart titled "Type: Map, per Tool": frequency by tool]

Chart 92. Type 12. Combination, per most important Tools (Frequency)

[Chart titled "Type: Combination, per Tool": frequency by tool]

Chart 93. Type 13. Relational Diagram, per most important Tools (Frequency)

[Chart titled "Type: Relational Diagram, per Tool": frequency by tool]

Most Used Visualisation Tools per Most Used Visualisation Types (Table 51, Charts 94-101)

Table 51. Most Used Tools per Most Used Types (Frequencies)

Tool Name / Code: 11. Graphic from External Source; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian; 12. Datawrapper; 4. Google Fusion; 1. Tableau; 3. Many Eyes; 6. Google Docs / Drive; 2. Wordle.net; Other; N/K. Not Known / Not available

Type Name / Code            11   9  12   4   1   3   6   2  Other  N/K
7. Bar Chart                 9   6  40   3   6   0   0   0      0   11
10. Map                     15   9   0  28   2   0   0   0      4    6
1. Interactive              23   5   0   0   2  18   0   0      4    6
6. Line Graph                7   2  13   0   1   0   0   0      2   12
8. Table                     4  22   2   0   0   0   0   0      0    0
12. Combination of types    22   4   0   0   2   0   0   0      0    2
9. Area Chart                5  14   0   2   1   0   0   0      0    5
13. Relational Diagram      12   3   0   0   1   0   0   0      0    0
4. Spreadsheet               0   0   0   0   0   0  13   0      0    0
2. Word Cloud                0   0   0   0   0   0   0  10      0    1

Chart 94. Tool 1. Tableau, per most important Visualisation Types (Frequency)

[Chart titled "Tool: Tableau, per Type": frequency by visualisation type]

Chart 95. Tool 2. Wordle.net, per most important Visualisation Types (Frequency)

[Chart titled "Tool: Wordle.net, per Type": frequency by visualisation type]

Chart 96. Tool 3. Many Eyes, per most important Visualisation Types (Frequency)

[Chart titled "Tool: Many Eyes, per Type": frequency by visualisation type]

Chart 97. Tool 4. Google Fusion, per most important Visualisation Types (Frequency)

[Chart titled "Tool: Google Fusion, per Type": frequency by visualisation type]

Chart 98. Tool 6. Google Docs / Drive, per most important Visualisation Types (Frequency)

[Chart titled "Tool: Google Docs / Drive, per Type": frequency by visualisation type]

Chart 99. Tool 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian, per most important Visualisation Types (Frequency)

[Chart titled "Tool: Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian, per Type": frequency by visualisation type]

Chart 100. Tool 11. Graphic from External Source, per most important Visualisation Types (Frequency)

[Chart titled "Tool: Graphic from External Source, per Type": frequency by visualisation type]

Chart 101. Tool 12. Datawrapper, per most important Visualisation Types (Frequency)

[Chart titled "Tool: Datawrapper, per Type": frequency by visualisation type]

Information School.

Access to Dissertation

A Dissertation submitted to the University may be held by the Department (or School) within which the Dissertation was undertaken and made available for borrowing or consultation in accordance with University Regulations. Requests for the loan of dissertations may be received from libraries in the UK and overseas. The Department may also receive requests from other organisations, as well as individuals. The conservation of the original dissertation is better assured if the Department and/or Library can fulfill such requests by sending a copy. The Department may also make your dissertation available via its web pages.

In certain cases where confidentiality of information is concerned, if either the author or the supervisor so requests, the Department will withhold the dissertation from loan or consultation for the period specified below. Where no such restriction is in force, the Department may also deposit the Dissertation in the University of Sheffield Library.

To be completed by the Author – Select (a) or (b) by placing a tick in the appropriate box

If you are willing to give permission for the Information School to make your dissertation available in these ways, please complete the following:

✓ (a) Subject to the General Regulation on Intellectual Property, I, the author, agree to this dissertation being made immediately available through the Department and/or University Library for consultation, and for the Department and/or Library to reproduce this dissertation in whole or part in order to supply single copies for the purpose of research or private study.

(b) Subject to the General Regulation on Intellectual Property, I, the author, request that this dissertation be withheld from loan, consultation or reproduction for a period of [ ] years from the date of its submission. Subsequent to this period, I agree to this dissertation being made available through the Department and/or University Library for consultation, and for the Department and/or Library to reproduce this dissertation in whole or part in order to supply single copies for the purpose of research or private study.

Name: CHARALAMPIA BOULA

Department MSc in DIGITAL LIBRARY MANAGEMENT

Signed

Date 01/09/2013

To be completed by the Supervisor – Select (a) or (b) by placing a tick in the appropriate box

(a) I, the supervisor, agree to this dissertation being made immediately available through the Department and/or University Library for loan or consultation, subject to any special restrictions (*) agreed with external organisations as part of a collaborative project.

*Special restrictions

(b) I, the supervisor, request that this dissertation be withheld from loan, consultation or reproduction for a period of [ ] years from the date of its submission. Subsequent to this period, I agree to this dissertation being made available through the Department and/or University Library for loan or consultation, subject to any special restrictions (*) agreed with external organisations as part of a collaborative project.

Name:

Department

Signed

Date

THIS SHEET MUST BE SUBMITTED WITH DISSERTATIONS IN ACCORDANCE WITH DEPARTMENTAL REQUIREMENTS.