INFOGRAPHICS: DATA AND INFORMATION VISUALISATION AND ITS USE IN JOURNALISM - A CASE STUDY ON GUARDIAN’S DATA STORE
A study submitted in partial fulfillment of the requirements for the degree of MSc in Digital Library Management
at
THE UNIVERSITY OF SHEFFIELD
by
Charalampia Boula
September 2013
I am deeply grateful to my
parents and my sister Daphne
for their love and support.
~~~~~~~~~~~~
A big Thank You to
Dr. Farida Vis for her
supervision and
guidance.
~~~~~~~~~~~~
I would also like to express my
gratitude to Mrs Lisa Evans,
Mr. Jacopo Ottaviani and Mr.
Paul Bradshaw for agreeing to
be interviewed and for
providing me with a great
insight on the subject.
~~~~~~~~~~~~
A special Thank you to
Dr. Andrew Cox and to
Dr. George F. Turner for
their advice and support
throughout the year.
Abstract
Background. Information and data visualisation are growing significantly in use and importance, especially in the media. New creative and scientific tools for data processing and visualisation have led to more effective and creative visualisations, but also to more complex ones. As primary providers of information to the public, the media have turned their attention to data-driven journalism, with the ultimate aim of attracting the attention of readers and users and increasing the credibility of their publications.
Aims. The study primarily aims to examine the role of information and data visualisation in journalism by examining the case of the biggest data-journalism portfolio in the UK, The Guardian's "Data Store".
Methods. An inductive methodological approach is followed, employing both qualitative and quantitative research methods. The qualitative approach consists of interviews with professionals in the field of data journalism and a thematic analysis of those interviews. The quantitative approach is based on a Systematic Content Analysis of 295 articles published on The Guardian Data Store.
Findings. Quantitative research showed that 50% of the articles provided at least one type of visualisation, with an average of two visualisations. It also revealed some tendencies in the use of specific visualisation tools for particular visualisation types, and of particular visualisation types in particular subject categories. Interviews revealed the creative workflow in data journalism and visualisation, the challenge of connecting numerical data to human stories and the important role of data journalism and visualisation in transparency.
Conclusions. The study met its objectives to a good degree and concluded that data journalism and visualisation will continue to grow in use and importance as data processing and visualisation tools keep advancing and as more people from different backgrounds combine their knowledge and skills in the field, bringing more effectiveness and creativity.
Table of Contents
Abstract
Table of Contents
Index of Tables and Charts (in main dissertation body)
  Tables
  Charts
1. Introduction
  1.1 Context and significance of the topic
  1.2 Controversies
  1.3 Brief synopsis of literature review
  1.4 Rationale behind the choice of topic
  1.5 About this research
    1.5.1 Research aims, questions and objectives
      Aim:
      Research Questions:
      Objectives:
    1.5.2 Methodology
  1.6 Dissertation Structure
  1.7 The Guardian Data Store
2. Literature Review
  2.1. Terminology
  2.2 Brief summary of the History of Visualisation
  2.3 Data and Information Visualisation
    2.3.1 Principles of Good and Effective Data Visualisation
    2.3.2 Visualisation Types
    2.3.3 Data Processing and Data Visualisation Tools
    2.3.4 New Tendencies in Data Visualisation
    2.3.5 Challenges and Controversies in Data Visualisation
      2.3.5.1 Raw versus Aggregate Data
      2.3.5.2 Avoiding nonsense
      2.3.5.3 Strange Visualisations: How much is too much and what is considered as Bad Visualisation?
      2.3.5.4. Cultural Bias in Data Visualisation and Objectivity
  2.4. Data Journalism
    2.4.1 Data and its Challenges
    2.4.2 Open Data and Crowdsourcing
    2.4.3 Big Data
  2.5 Data Visualisation in Data Journalism
    2.5.1 Workflow in Data Journalism
  2.6 The Guardian Data Store
3. Methodology
  3.1 Ethical Approval
  3.2 Qualitative Research
    3.2.1 Design and execution of interviews
      3.2.1.1 Profile of Interviewees
        Jacopo Ottaviani:
        Lisa Evans:
        Paul Bradshaw:
      3.2.1.2 Interviews' Preparation and Conducting
      3.2.1.3 Data Collection and Processing
      3.2.1.4 Limitations and disadvantages of interviewing
  3.3 Quantitative Research:
    3.3.1 Design and Implementation of Systematic Content Analysis
      3.3.1.1 Limitations in Coding
      3.3.1.2 Data Processing
    3.3.2 Inter-Coder Reliability Testing
      3.3.2.1 Inter-Coder Reliability Test Results
4. Findings and Discussion
  4.1 Research Question 1:
      4.1.1.1 Most Important Findings and Parallel Discussion:
        Visualisations per Article
        Provision of Data Summary and Data Sets (or links to data source)
        Authors by Number of Publications and Year (in descending order)
        Articles Per Subject per Year
        Visualisation Types
        Visualisation Tools
        Frequencies of Use of Tools, Types and Frequencies of Subjects of Authors - The case of Simon Rogers
        Visualisation Types per Subject and Subjects per Visualisation Types:
        Visualisation Tools and Visualisation Types
  4.2 Research Question 2:
    4.2.2 Theme 1: Data Sources, Data Gathering and Processing, Data Visualisation: Workflow, Tools and Decision Making
      4.2.2.1 Findings:
      4.2.2.2 Discussion:
  4.3 Research Question 3:
    4.3.2 Theme 2: Data Journalism and Data Visualisation: Importance, Reasons for Increased interest, Impact in Journalism Required Professional Skills
      4.3.2.1 Findings:
      4.3.2.2 Discussion:
  4.4 Research Question 4:
    4.4.2 Theme 3: Weaknesses, Limitations, Negative Aspects and Dangers of Data Journalism and Data Visualisation
      4.4.2.1 Findings:
      4.4.2.2. Discussion
    4.4.3 Theme 4: Future Prospective and Challenges of Data Journalism and Data Visualisation
      4.4.3.1 Findings:
      4.4.3.2 Discussion:
5. Conclusion
  Meeting Objectives:
  Evaluation of Methodology Approach
  Key Findings:
  Future Research Suggestions and Recommendations
Bibliography
Appendices
  Appendix 1: Ethical (Application, Consent Form, Approval)
  Appendix 2: Qualitative Research Methodology - Interviews' Questionnaire & Transcripts
    2.1 Indicative Interviews' Questionnaire
    2.2 Transcript of Interview with Jacopo Ottaviani
    2.3 Transcript of Interview with Lisa Evans
    2.4 Transcript of Interview with Paul Bradshaw
  Appendix 3 - Content Analysis Methodology
    3.1 Code Frame, Limitations, Clarifications (Tables A-E)
  Appendix 4 - Quantitative Research Findings
    4.1 - Visualisations per Article (Table 1, Chart 1)
    4.2 - Provision of Data Summary and Data Sets (or links to data source) (Table 2, Chart 2)
    4.3 - Authors by Number of Publications and Year (in descending order) (Tables 3-5, Charts 3-4)
    4.4 - Articles Per Subject per Year (Tables 6-7, Charts 5-12)
    4.5 - Visualisation Types (Table 8, Charts 13-14)
    4.6 - Visualisation Tools (Tables 9-11, Charts 15-16)
    4.7 - Frequencies of Use of Tools, Types and Frequencies of Subjects per Author (in descending order)
      Author: Simon Rogers (Tables 12-14, Charts 17-19)
      Author: Ami Sedghi (Tables 15-17, Charts 20-22)
      Author: Mona Chalabi (Tables 18-20, Charts 23-25)
      Author: John Burn-Murdoch (Tables 21-23, Charts 26-28)
      Author: Lisa Evans (Tables 24-26, Charts 29-31)
      Author: James Ball (Tables 27-29, Charts 32-34)
      Author: Claire Provost (Tables 30-32, Charts 35-37)
      Author: Katy Stoddard (Tables 33-35, Charts 38-40)
      Author: Nick Evershed (Tables 36-38, Charts 41-43)
      Author: Randeep Ramesh (Tables 39-41, Charts 44-46)
      Author: Sarah Hartley (Tables 42-44, Charts 47-49)
      Author: Kevin Anderson (Tables 45-47, Charts 50-52)
    4.8 Types of Visualisations per Subject and Subjects of Visualisations per Type
      Visualisation Types per Subject: (Table 48, Charts 53-68)
      Subjects per Visualisation Types: (Table 49, Charts 69-83)
    4.9 Most Used Visualisation Types per Most Used Visualisation Tools and Vice Versa
      Most Used Visualisation Types per Most Used Visualisation Tools (Table 50, Charts 84-93)
      Most Used Visualisation Tools per Most Used Visualisation Types (Table 51, Charts 94-101)
Index of Tables and Charts (in main dissertation body)
Tables
Table 1: Variables’ Coding Scheme
Table 2: Intercoder Reliability Test Results
Table 3: Main Authors (Publications per year, Percentage of total publications)
Charts
Chart 1: Number of 1st, 2nd & 3rd Visualisation, Total Number of Visualisations
Chart 2: Provision of Data Summary and Data Sets (or links to data source) % of total
Chart 3: Main Authors (percentage of total publications)
Chart 4: Main Authors (Publications per year, Percentage of total publications)
Chart 5: Articles per Subject (Percentages) in total (all years)
Chart 6: Articles per Subject per Year (Frequencies)
Chart 7: Types of 1st, 2nd and 3rd Visualisation (Percentages)
Chart 8: Types of Visualisations (Percentages of total use)
Chart 9: Main Visualisation Tools' Use Per Year (Frequencies)
Chart 10: Total Use of Main Visualisation Tools (Percentage) in descending order
Chart 11: Simon Rogers's Use of Visualisation Tools (Frequencies)
Chart 12: Simon Rogers's Use of Visualisation Types (Frequencies)
Chart 13: Subjects' frequency in Simon Rogers's articles
Chart 14: Visualisation Type: 1. Interactive, per Subject (Frequencies)
Chart 15: Visualisation Type: 10. Map, per Subject (Frequencies)
Chart 16: Subject 1. Politics / Government / Public Administration, per Visualisation Type (Frequencies)
Chart 17: Subject 7. Society, per Visualisation Type (Frequencies)
Chart 18: Subject 3. Culture, per Visualisation Type (Frequencies)
Chart 19: Subject 2. Sports, per Visualisation Type (Frequencies)
Chart 20: Type 1. Interactive, per most important Tools (Frequency)
Chart 21: Type 7. Bar Chart, per most important Tools (Frequency)
Chart 22: Type 10. Map, per most important Tools (Frequency)
Chart 23: Tool 1. Tableau, per most important Visualisation Types (Frequency)
Chart 24: Tool 4. Google Fusion, per most important Visualisation Types (Frequency)
Chart 25: Tool 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian, per most important Visualisation Types (Frequency)
Chart 26: Tool 11. Graphic from External Source, per most important Visualisation Types (Frequency)
Chart 27: Tool 12. Datawrapper from External Source, per most important Visualisation Types (Frequency)
1. Introduction
1.1 Context and significance of the topic
Data-driven journalism and data visualisation are constantly growing in importance, in their use by the media and in the number of people who specialise in them professionally. Most major media and news networks now feature on their webpages a variety of articles whose stories are based on, proved or reinforced by the analysis of relevant data. This new era of journalism was significantly affected by the move of governments and organisations to publicly release data files. Journalists now had access to new sources of information that they could use to investigate various topics of public interest.
1.2 Controversies
Data visualisation in data journalism is constantly evolving with the release of new visualisation and data processing tools and the combination of programming and coding languages with such tools. The results can range from very simple to very complex, not only in terms of statistical calculations but also in terms of aesthetics. However, there is still much controversy over where the fine line lies between an effective, aesthetically pleasing visualisation and a very impressive one that could be incomprehensible to the reader. Another dilemma that data journalists often face is whether they should simply present the data to readers in an understandable format, most frequently visualised, allowing them to interpret it in their own way, or whether they should clearly present the conclusions they reached, thereby influencing the readers' perception (McGhee, 2010).
1.3 Brief synopsis of literature review
The literature review provides definitions of the key terms and a brief historical background of visualisation, followed by an examination of its various types and some of the main tools used in the creation process. It then progresses to an analysis of the role of data, more specifically that of open data and its sources, along with the new challenges that big data brings to public information and journalism. The final parts consist of literature on data journalism, on the use of data visualisation in data journalism and, specifically, on The Guardian Data Store.
1.4 Rationale behind the choice of topic
Recent studies such as that of Segel & Heer (2010) provide a broad and detailed description of how and why data and information visualisation are used to such an extent by the media. They describe the basic styles and tools and, sometimes, the production process of the data visualisation behind an article and its story. However, there is a lack of studies that, through the systematic examination of case studies, provide results about the different variables that affect the final visualisation, for example: the subject category of articles, which tools are used for each visualisation type, and whether there is a tendency to use specific visualisation types and tools in some subject categories. This study attempts to do that on a small scale, setting the foundation for more extensive similar studies in the future.
1.5 About this research
The reader should bear in mind that this research examines a small sample of the portfolio of data journalism articles published by one main source, The Guardian Data Store. Nevertheless, this study could serve as a pilot for future studies of entire collections from multiple sources, not only that of the current case study. Such research, combined with other factors (such as people's perception of such articles and their visualisations), could help not only to find the balance between effectiveness and visual impact discussed in Section 1.2, but also to identify creative patterns of current use and possible tendencies for the future.
1.5.1 Research aims, questions and objectives
Aim: This study primarily aims to examine the role of information and data visualisation in journalism by examining the case of the biggest data-journalism portfolio in the UK, The Guardian’s Data Store.
Research Questions: As this study follows an inductive methodology, there is no initial hypothesis to prove or disprove but rather an effort to explore and try to answer some central questions on the topic. The main questions that this study wishes to answer are:
• Which types of visualisation and which tools can be identified in the case-study portfolio? Do any norms or patterns emerge?
• What is the creative process behind an infographic created and/or hosted by The Guardian Data Store? In more detail:
  o What is the creative process, step by step, and who are the decision makers?
  o Which data types are the most broadly used, and how is data selected and gathered?
  o Which are the most important tools used in the process, either for data processing and analysis or for visualisation?
• How do journalism professionals perceive data and information visualisation in terms of value and effectiveness?
• What are the possible weaknesses, limitations and negative aspects of data and information visualisation?
Objectives: Answering those research questions will help the study meet its objectives, which are to identify:
• How data and information visualisation is used in journalism and why its use is constantly increasing
• Which skills and knowledge are required in order to work on data visualisation at a professional level
• The importance of data and data visualisation, as perceived by professionals
• The possible limitations, weaknesses and negative aspects or impact of data journalism and information visualisation
• The various tools used either in data analysis (and possibly formulation / editing) or in visualisation, more specifically by The Guardian
• Possible tendencies, norms and correlations in The Guardian’s portfolio, mainly regarding subject, visualisation type and tools
1.5.2 Methodology
This study employs both qualitative and quantitative research methods in order to answer the research questions effectively, as choosing only one of the two approaches would lead to inconclusive or unclear results. The qualitative method is a thematic analysis of interviews with professionals in the field, and the quantitative method is a systematic content analysis of a sample of the articles published on The Guardian Data Store.
1.6 Dissertation Structure
This dissertation is divided into five chapters: the first (and current) one is the introduction; the second consists of the literature review; the third sets out the methodology behind the study; the fourth presents the findings of the study and examines them in relation to the literature review; and the fifth serves as a conclusion. The appendices present the ethical approval of the study and the analysis and results of both research methods applied: the interview transcripts, and the coding scheme and statistical results of the systematic content analysis.
1.7 The Guardian Data Store
Since its first official publication on 14 January 2009, under the supervision of Simon Rogers, The Guardian Data Store has published on its Data Blog more than 3000 data-driven articles, the largest portfolio of its kind in the UK. The articles were created both by journalists of the organisation and by freelance data journalists. Most of those articles provide at least one element of visualised data, created by The Guardian Graphics' team, by the author of the article using various visualisation tools and applications, by a freelance designer for The Guardian, or by external sources whose creations were hosted in articles on the Data Blog.
2. Literature Review
2.1. Terminology
Terms such as "infographics", "data visualisation" or "information visualisation" are
steadily becoming more and more popular in the media. In literature one can find
various definitions for each of those terms. It was decided, though, not to adopt
specific definitions but rather to compose new ones, deriving from the overall
literature study.
Information graphics, or infographics, could be defined as the visual representation
of data, information or knowledge. They combine the use of graphics and text aiming
to present the available information and data in the clearest, most understandable
and memorable way. For this reason, they most often visually
present selected important parts or summaries of the available data sets, or selected
pieces of information, with the ultimate goal of delivering the story hidden in the
data. The creative process of information graphics is called Information
Visualisation.
Data visualisation is the process of gathering, filtering, analysing and visualising
data to provide a final outcome for the target group (Kramer de Oliveira Barros &
Araujo Bertoti, 2012). It is a narrower term than information visualisation,
as the object being visualised is usually one or more specific data sets. In data-driven
journalism the ultimate goal is for this outcome to support or present the story
behind the article.
2.2 Brief summary of the History of Visualisation
Humans have expressed their need to tell a story or to visualise information since
the earliest years of human presence. Cave paintings of the Paleolithic era are
considered humanity's first effort to tell stories (Mol, 2011) and show the way people
hunted or their perception of the spiritual world. Even before 1000 BC, ancient
civilisations such as the Greeks, Babylonians, Egyptians and Chinese tried to visually
present planetary movements, created the first maps that served as navigation
guides and made the first regional planning drafts (“A Quick Illustrated History of
Visualisation,” n.d.).
Philosopher Ramon Llull (1232-1315) was the creator of the first knowledge trees to
portray in the form of a diagram the relationships between terms or concepts. Nicole
Oresme, in 1350, conceptualised the first bar chart, and Abraham Ortelius changed
the course of cartography forever when, in 1570, he created the first modern atlas
(Friendly & Denis, 2001).
Mathematician J.H. Lambert (1728-1777) and politician William Playfair (1759-1823)
are the two people who established the era of modern visualisation. They were the
first to publish time series graphs that visualised economic data in graphs rather
than tables, which was the usual practice until then. In this way the reader could see
the shape of the data and more easily compare its values at different times. They also
introduced the first bar charts, pie charts and histograms in the form that is known
today. French engineer Charles Joseph Minard introduced the concept of narrative
graphics of space and time, combining a time scale and a data map to
portray the continuous losses during Napoleon's campaign (Tufte, 2001).
Today, computers and specialised software allow people to create very advanced
and complex graphs, either static or interactive, in a relatively small amount of time
and with great precision. Advances in software and visualisation tools, along with
the increasing use of Open Data (Simonite, 2012) and social media's easy sharing
options, have contributed to the growing popularity of infographics,
especially among major media networks.
2.3 Data and Information Visualisation
2.3.1 Principles of Good and Effective Data Visualisation
Tufte (2001) defines Graphical Excellence as "A well-designed presentation of
interesting data, a matter of substance, statistics and of design...It consists of
complex ideas communicated with clarity, precision and efficiency... It provides the
viewer with the greatest number of ideas in the shortest time...It is nearly always
multivariate... and requires telling the truth about the data".
The power of data visualisation is that it allows the viewers to see "insights" that
would not have been visible if they were only provided with numbers (Smiciklas,
2012). Data is definitely the key, and the essence of data visualisation is the story
that it represents. However, for some, data visualisation is also considered an art
(Landman, 2013). The aesthetic aspect is undoubtedly important; however, it should
by no means take priority over a good data analysis. A good
data analysis is the starting point of an effective and understandable representation. The
main elements of data visualisation should first be "structure, precision, integrity,
depth and functionality" and only secondly "decoration", if that is necessary (Cairo,
2012). Simplicity, however, is the key. Colours, patterns and font variations should be
used mainly to "convey information and not for decoration" (Wong, 2010).
2.3.2 Visualisation Types
There are many ways to classify the different types of visualisation. One of
the most significant is the distinction between static and interactive visualisations.
Static visualisations are printed ones, or online ones that would look the
same or almost the same if they were printed. The reader is not required to
take any action in order to see the final result of the visualised data.
Interactive visualisations, by contrast, usually involve motion or the active
engagement of the reader/user, who can, for example, select specific fields to filter
the data results or choose the depth of the information they wish to
receive. This more focused and detailed exploration of the data, also
known as drilling down (Murray, 2013), usually captures more of the
reader's attention.
Although defining an infographic as static or interactive is essential, it is only
a preliminary description of it. There are many other types of categorisation and
subcategorisation depending on the infographic's morphology and the purpose it
serves. For example, according to Bounford (2000), graphics can be classified into
those used for i) illustrating and storytelling, and ii) statistical
representation. The first category usually includes graphics such as symbols,
pictorials, relational diagrams, time diagrams (timelines) and organisational
diagrams. In the second category, the types most frequently used are tables, line
graphs, scatter graphs, bar charts, area charts, volume charts and combined charts.
2.3.3 Data Processing and Data Visualisation Tools
There is a great variety of data processing and data visualisation tools for anyone
who is interested in the field. The available options can vary from totally free
downloadable or online applications to very expensive creative platforms or
database and content management system plugins used by relatively big
organisations.
Some of the most popular tools are (“Entry-level tools Online visualisations,” 2012;
“Top Ten Tools for Data Journalism,” 2013; Halevy & McGregor, 2012; Barkai, 2013;
Rogers, 2011):
1. Tableau and its free version Tableau Public: One of the most popular and
advanced data visualisation platforms, allows multiple layering of data, a
quality that makes it very effective for interactive visualisations (“Tableau
Software,” n.d.).
2. Many Eyes: One of the first free experimental web applications, created by
IBM, that produces advanced visualisations, static and interactive, which are
then hosted on its site. The users can thus browse and see archives of
visualisations created by others. It was the inspiration for many other tools
later developed, such as Tableau and Google Fusion. Unfortunately, this
application has not been substantially updated and has started losing ground
(“ManyEyes Visualisation Experiment,” n.d.).
3. Google Refine: A data refining and restructuring tool, formerly
powered by Google. It is now called OpenRefine (“Google Refine,” n.d.)
4. Google Fusion: An experimental web-based application from Google for
processing spreadsheets and creating graphs and maps, including
interactive ones. One of the most preferred tools, especially by data
journalists. (“Google Fusion Tables Experimental Application,” n.d.)
5. Datawrapper: An easy, simple and effective free data visualisation tool for the
creation of charts that can be either hosted by the service or self-hosted on
the user’s website (“Datawrapper Software,” n.d.)
6. CartoDB: An online application for the analysis and interactive visualisation
of geospatial data, offering editing and display of multiple data layers, along
with advanced CSS editing, HTML coding, database connection and query
execution (“CartoDB: Geospatial Data Visualisation,” n.d.)
7. ScraperWiki: A free web based tool, frequently used to clean, refine and
analyse data, although it additionally offers visualisation and extra coding
options (“ScraperWiki,” n.d.)
8. Wordle: Free web based text processing application used for the creation of
word clouds (“Wordle,” n.d.)
9. Adobe Creative Cloud (Suite): Adobe's popular programs for illustrations,
photo editing, animation, video and interactive applications (“Adobe Creative
Cloud,” n.d.)
10. Prezi: An online tool for creating visual presentations that can be used to create
storyboards for animated storytelling or information presentation (“Prezi
Virtual Presentation Whiteboard,” n.d.)
11. BatchGeo: A cloud based map making application, easy and simple to use
(“BatchGeo,” n.d.)
12. Other visualisation and data refining and processing tools such as Tabula
(Bounegru, 2013), Crystal, Geotime, Dreamweaver (Ostergren, Hemsley,
Belarde-Lewis, Walker, & Hall, 2011), Circos, Timeline, Protovis,
DataWrangler (“Data Visualisation - Selected tools,” 2013) and Visual.ly.
It is very important to mention that, despite the great variety of tools available, the
use of coding languages and scripts, such as JavaScript and Python, is often unavoidable,
especially in cases of complex data sets or data that gets updated constantly.
Code libraries such as D3.js allow more specific and
customised visualisations, according to the exact needs of the project and the
wishes of the creators (Murray, 2013).
2.3.4 New Tendencies in Data Visualisation
Despite this classification, and due to the creativity of graphics teams and the
advances in design, data processing and visualisation software, many of
the graph types which are more often used for illustrating and storytelling can
also be used for data representation and statistics, and vice versa. There are no
limitations to the possible combinations, and many new types of
infographics have also recently emerged. Designers, statisticians, data experts and
researchers have cooperated in the design and creation of
innovative software applications and tools that process data through advanced
algorithms and produce functional visualisations of high aesthetic standards,
using rich colour palettes, shapes and patterns, and attractive symbols and fonts.
Visual quantitative representations of words in the form of word clouds, advanced
networks, arc diagrams, area groupings, centralised bursts and rings, circled globes,
circular ties, elliptical implosions, flow charts, radial convergence graphs, radial
implosion graphs, ramifications and scaling circles are some of the newer design
tendencies in visualisation (the form of some of them is shown in the accompanying figure). They
usually portray relations, connections and hierarchies of data and information elements
and values. All the above diagram types can be either static or interactive and can
represent static data sets or dynamic data that is constantly updated, resulting
in a continuous change of the graphs (Lima, 2011).
Image Source: Lima, 2011, p. 158.
New tendencies in information visualisation have gone beyond visualising only data or
text. A new type of graph-free visualisation, "direct visualisation", based on already
visual material (images, videos), has emerged. While many do not consider it
to be information visualisation, this new creative type is likely to find much use in the fields
of education, research and the humanities, where displaying full detail rather than
graphs is crucial (Manovich, 2011).
Additionally, new emerging scientific fields such as bioinformatics, or social media
analytics, have employed visualisation to portray their research findings and
statistical calculations and have engaged more designers and researchers to create
more creative representations and more effective tools (Heer, Bostock, &
Ogievetsky, 2010).
2.3.5 Challenges and Controversies in Data Visualisation
2.3.5.1 Raw versus Aggregate Data
Building a data set from scratch through data collection, data scraping
and mining can be very challenging and relatively time consuming. Its advantage,
compared to aggregate data sets, is that it is built from the start in the most
convenient form for the data analyst prior to the data processing. Additionally, the
need for data cleaning is minimal. Aggregate data sets usually need cleaning and
reformatting before progressing to any visualisation, which, in the case of a
poor original data set, can also be very time consuming. Furthermore, it is
essential to ensure the credibility of the source and to be certain that the population
or sample is sufficient for the desired analysis (Ward, Grinstein, & Keim, 2010) and
that the cost of obtaining the data, if not provided free, is within the limits of the
project budget (Hox & R., 2005).
2.3.5.2 Avoiding nonsense
A creator of data visualisation will very frequently need to combine data sets from
different sources, which most probably come in different formats. The first essential
step is to create a unified data set whose variables, values and scales will have
unified structure and format so that any visualised comparisons and relations will
make sense to the reader. The second important step is to decide which
comparisons and relations are logical and actually portray something meaningful
and interesting for the reader and do not lead to wrong assumptions (Ward
et al., 2010).
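The first step described above, creating a unified data set, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the two sources, their column names, the delimiters and the fact that one reports population in thousands while the other reports absolute counts are all hypothetical.

```python
import csv
import io

# Hypothetical source A: comma-separated, population in thousands, upper-case names.
SOURCE_A = """Region,Pop_thousands
NORTH,1200
SOUTH,850
"""

# Hypothetical source B: semicolon-separated, population as absolute counts.
SOURCE_B = """area;population
East;430000
West;1010000
"""


def load_source_a(text):
    """Map source A rows to the unified schema: region (title case), population (persons)."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"region": r["Region"].strip().title(),
         "population": int(r["Pop_thousands"]) * 1000}  # rescale thousands to persons
        for r in rows
    ]


def load_source_b(text):
    """Map source B rows to the same unified schema."""
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    return [
        {"region": r["area"].strip().title(),
         "population": int(r["population"])}
        for r in rows
    ]


# One data set with a single structure, scale and naming convention.
unified = load_source_a(SOURCE_A) + load_source_b(SOURCE_B)
for row in unified:
    print(row["region"], row["population"])
```

Only after such a conversion do comparisons between, for example, North and East refer to the same variable on the same scale. Deciding which of those comparisons are meaningful remains an editorial judgement that code cannot make.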
2.3.5.3 Strange Visualisations: How much is too much and what is considered
as Bad Visualisation?
There has been a lot of debate regarding the acceptable level of complexity of a
data visualisation. In an interview for the project "Journalism in the Age of Data" by
Professor McGhee (2010), Alberto Cairo mentions: "Unfortunately informatics is
something that is usually dominated by fashion. The fashion that is winning now is
strange visualisations". There is no clear line or definition of what is a bad
visualisation and opinions vary to a great extent. Apart from the obvious reasons
that could make a visualisation bad and potentially misleading, such as scale
distortion, unclear lines and colours, it is generally agreed that very complex or
strange visualisations fail to communicate the story, as they tend to be
incomprehensible. They violate a basic principle of effective visualisation, which is
simplicity (Ward et al., 2010), despite the fact that they might initially capture the
viewers' attention.
2.3.5.4 Cultural Bias in Data Visualisation and Objectivity
Data visualisation creators, especially when their readers are international, need to
take into consideration that many visual elements, such as colours, text or symbols,
may have different significance in different cultures, a fact that may jeopardise
people's perception of the visualisation. It is advisable to review and slightly
customise, if necessary, each visualisation according to the cultural background of
the target group in question (Schaap, 2012).
Furthermore, there is no such thing as "objective" or neutral data visualisation (Hohl,
2011) due to the human interference in each step of the process. Therefore,
according to Ball (2013), it is necessary to achieve balance between analysis and
presentation, so that readers feel that the infographic makes sense and that they can
trust it.
2.4 Data Journalism
Data-driven journalism started taking its current form in the mid-2000s, when the
most important newspapers and other independent news organisations, especially
in the U.S. and the U.K., like The New York Times, The Guardian and ProPublica, created
in-house teams of journalists with knowledge of data and computing. Those
teams create interactive maps and other visualisations and presentations using
computer applications that "collect, process, analyse and visualise data sets"
(Parasie & Dagiral, 2012).
Until recently, however, journalists lacked the ability to work with data. This
was the main obstacle that prevented them from working on data-related projects
(Aitamurto, Sirkkunen, & Lehtonen, 2011). The recent focus on data journalism and
its significance and potential is clear in the following statement: "Data-driven
journalism is the future", by Sir Tim Berners-Lee, inventor of the World Wide Web.
That is because the possibilities and the available options in data processing,
visualisation techniques, programming languages and data, especially open data
and open government data, are endless (Arthur, 2010).
The aim of Data Journalism is not just to provide the data and the statistics but also
to tell a story through them, focusing on people. "Stories are told about people and to
people", notes Paul Bradshaw (as quoted in Marshall, 2012). The most significant
quality of Data Journalism, though, is that it enables journalism, especially
investigative journalism, to reach deeper, according to the investigative reporter
Dana Priest (McGhee, 2010). In its very essence it is a matter of democracy, as it
can be one of the main "weapons" that people and journalists use to
hold politicians and governments accountable (Cohen, Hamilton, & Turner, 2011).
Despite all the advantages of data journalism, there is one potential risk that
journalists should bear in mind. They must not forget that they still need to
search for the human side of the story and not get lost in data. With the increasing
interest in data in all its forms, inevitably many more people, bloggers and most
importantly reporters will turn to it and, after acquiring certain skills, may be able to
manage data very well and arrive at useful findings. It is necessary for them though,
especially for reporters, not to forget that it is the story that matters first and that they
will still need to synthesise various pieces of information without finding themselves
overwhelmed by data (Oliver, 2010). Additionally, even if they become very good at
data management and analysis, there will be times when data sets are so
complicated and large, as for example in the case of the Wikileaks War Logs
(Rogers, 2010), that they should be managed and analysed by or with the help of
experts in order to be reshaped into a format that is useful and more understandable
for both readers and reporters.
2.4.1 Data and its Challenges
Among the greatest challenges for a data journalist are obtaining the data, dealing with its original
format and the cost of obtaining or collecting it (Aitamurto et al., 2011). Data,
nowadays, can be found in various forms and from various sources. Data journalists
can either gather primary data or find or acquire secondary data. It can be
scraped from the Internet with the use of coding and programming (Cohen et al.,
2011), or it can be gathered through crowdsourcing, through subscriptions or by
conducting surveys.
Data and structured information sources for a journalist may be plentiful; however,
that does not mean that the data will be "ready to use". Often, refining,
filtering and rearranging are essential in order for the "dirty data" (Halevy &
McGregor, 2012) to be reliable enough to use and analyse.
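As a hedged illustration of what refining, filtering and rearranging can mean in practice, the following Python sketch cleans a small invented list of records. The place names, the values and the specific rules (trimming whitespace, removing thousands separators, dropping blank, non-numeric and duplicate rows) are assumptions chosen for the example, not a description of any particular tool or newsroom practice.

```python
def clean_records(raw_rows):
    """Return (name, value) pairs with names normalised, values parsed as integers,
    and incomplete, non-numeric or duplicate rows dropped."""
    seen = set()
    cleaned = []
    for name, value in raw_rows:
        name = " ".join(name.split()).title()   # refine: collapse whitespace, unify case
        value = value.replace(",", "").strip()  # refine: remove thousands separators
        if not name or not value.isdigit():     # filter: missing or non-numeric entries
            continue
        if name in seen:                        # filter: keep only the first occurrence
            continue
        seen.add(name)
        cleaned.append((name, int(value)))
    return sorted(cleaned)                      # rearrange: predictable alphabetical order


# Invented "dirty" input for illustration.
raw = [
    ("  leeds ", "1,200"),
    ("Sheffield", "950"),
    ("LEEDS", "1200"),
    ("York", "n/a"),
    ("", "300"),
]
print(clean_records(raw))
```

In this sketch a duplicate is silently discarded in favour of its first occurrence. In real reporting each dropped row would deserve a check against the source, since a "duplicate" or an "n/a" may itself be part of the story.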
2.4.2 Open Data and Crowdsourcing
Open Data has changed the landscape of data and information management, but also of
journalism, politics and communication. It has changed the landscape for
citizens as well. In a quest for transparency, in 2006, The Guardian launched the
"Free Our Data" campaign (“Free Our Data Campaign,” n.d.). In 2010, David
Cameron announced the publication of a variety of data sets by both the
government and the local authorities (Oliver, 2010).
Open Government Data are available online for free and in various formats so that
everyone is able to have access, and under licences that allow re-use (Davies,
2010; Joel, 2011). For journalists though "using open data means republishing it in
a different, consolidated or curated format, or in a way which makes it easier to
explore and make sense of" (Leimdorfer & Thereaux, 2012). Journalists re-use,
reshape and combine different data sets that they then provide to the public along
with the relevant visualisation that usually completes their articles. This process of
disseminating data to the public is one of the four reasons for which journalists use
open data. The other three reasons are: "i) To discover newsworthy facts or stories,
ii) To discover trends hidden in large datasets, and iii) to create data visualisations"
(Kronenburg, 2011).
Nevertheless, it is not only data journalism that benefits from open data. The
benefit is mutual, since open data also gains from
Data Journalism in two ways: i) its value increases through visualisation, and ii) in
various cases journalists participate in the creation of open data sets (Kronenburg,
2011).
Crowdsourcing has the advantage of saving time, as many people participate and
collaborate in data collection that would otherwise take one
researcher much longer to complete. Readers' comments on the MPs' expenses
data set released by The Guardian led journalists to further investigations and to the
creation of more related stories (Flew, Spurgeon, Daniel, & Swift, 2012).
2.4.3 Big Data
There are various opinions as to what Big Data is. The common dimension in all of
them is that Big Data is a great and constantly growing amount of information and
data, collected with the use of
advanced algorithms. Algorithms are often programmed to extract, process and
transform data and information that do not come in traditional forms, such as photos,
text, video and audio files. According to Dan Gardner though, Big Data is much
more than its size; "It is the ability to extract meaning: to sort through masses of
numbers and find the hidden pattern, the unexpected correlation, the surprising
connection" (Smolan & Erwitt, 2012).
The speed and scale at which data is generated by humanity are so high that
they are difficult for the human mind to conceptualise. The data produced by social
media on a daily basis alone is enormous and very complicated to process, since to a great
degree this would mean processing and analysing online human behaviour and
expression (Mahrt & Scharkow, 2013). Similar challenges are faced with data
produced by big digitisation projects, and they can be tackled quite successfully up to
a certain degree with the help of crowdsourcing (Smolan & Erwitt, 2012). Yet, as the
amount of data constantly increases, data management and processing systems
and tools are also becoming more effective and more advanced in order to cope
with the processing of data that could even reach the size of exabytes (“Big data needn’t
be a big headache: How to tackle mind-blowing amounts of information,” 2012).
However, even if big data is processed successfully in terms of statistical analysis,
this does not mean that the numbers will definitely be right. One of the great
challenges of big data is that, like any data set created by humans, it cannot be
totally objective and it should be examined, evaluated and considered only
in the greater sociological context of the people and the place(s) it
was generated from (Crawford, 2013). Another challenge is the constant need for
more advanced data management tools and systems which will also be cost
effective (Buhl, Röglinger, Moser, & Heidemann, 2013).
Big data can be a very important source of information, especially in financial
journalism, as it can help reporters monitor companies, organisations and the
government for legal or ethical violations. However, here too data should be used as
a tool and not as the aim. It is important to question whether the results of the data analysis
are right and to check the facts with the help of specialists and sources that can
provide insights into the story (Marshall, 2013).
2.5 Data Visualisation in Data Journalism
The media use data and information visualisation in order to provide their
users/audiences with a visual representation of the information and/or data that they
believe supports a good story. Apart from printed or TV-broadcast
infographics, almost all leading media organisations host entire portfolios of
infographics on their websites, either in simple, static form or in interactive form.
"Data can be both the source of data journalism, but it can also be the tool that the
story is told" (Bradshaw, 2012, cited in Gray, Chambers, & Bounegru, 2012).
Therefore, a journalist can either find during data analysis a story that is worth
telling, or can back up existing or emerging stories with valid arguments that derive
from data analysis. The "evidence" might be there but without a convincing analysis
to support it, it may remain vague and unnoticed. Good data visualisation is the way
to prevent the story from being unnoticed, as "an infographic should provoke
thought" according to Steve Duenes (Losowsky et al., 2011).
In the project "Journalism in the Age of Data" (McGhee, 2010) several professional
graphic designers and data journalists from leading media organisations were interviewed
about the challenges, the basic principles and the required skills of data journalists,
at individual and team level. Regarding the principles of visualisation and data
journalism, John Grimwade stressed the importance of clearly telling a story, provided
that the principles of graphics are applied, and not just spinning off numbers.
Referring to the required professional skills, most interviewees agreed on the
importance of collaboration between different specialties, since it is almost
impossible for one person to be able to do it all. Therefore, in line with Weber and
Rall (2012) on the "need of speaking the same language", an understanding, if not
knowledge, of statistics, coding and graphic design is essential for data
journalists, so that collaboration with the specialists in each field can run smoothly.
Professor Michael Stoll added the need for some basic knowledge in social
sciences.
Of the challenges mentioned in the project, Paul Steiger from ProPublica was
sceptical as to how far accessibility and openness can go, a concern shared
by Stolte (2012), who notes that legal and ethical data collection (especially
through internet scraping) and reproduction/distribution that respect personal
privacy need to be a prerequisite of good data journalism.
Another important challenge is the time restrictions and deadlines of newsrooms,
especially during breaking news. Accuracy, integrity and credibility should always
come before speed and visual aesthetics (Weber & Rall, 2012). Steve Duenes,
graphics director of The New York Times (“The New York Times: Multimedia,” n.d.),
believes that it is crucial "to have people physically close to the story" regardless of how
good the graphics can be. However, budget limitations and "shortsightedness" in
some newsrooms about the role of visual journalists do not usually make that
possible (Losowsky et al., 2011).
2.5.1 Workflow in Data Journalism
The usual steps in the creative process of a data driven article are (Aitamurto et al.,
2011):
1. Identifying the potential of a story and how data could contribute to it
2. Finding and gathering the appropriate data sets for the research
3. Cleaning, correcting and reformatting data if necessary
4. Analysing and combining data sets
5. Writing the story and creating the relevant visualisations
6. Publishing the relevant data sets together with the story and the visualisations
7. Inviting and challenging the readers to reuse the data and share the stories with
others through social media
In the case study content analysis of Giardina & Medina (2012) on the workflow of
the infographics department of The New York Times, it was discovered that the
"available graphic tools" and the "adopted reporting processes" are two of the main
factors that influence this workflow.
2.6 The Guardian Data Store
The Guardian uses data visualisation to illustrate some of its main articles. While
The New York Times is famous for the very high aesthetic value of its
visualisations, The Guardian usually uses simpler types of visualisation. However,
The Guardian Data Store has been one of the world's richest in story variety and most
respected data journalism portfolios since its launch in 2009.
Simon Rogers (2013), founder of The Guardian Data Store, highlights the
following principles of good data journalism:
§ It's "all about the story"
§ Provide the key data people need
§ "Make it personal"
§ "Engage": always attach the data file of the visualisation to the article and make all
data accessible when possible
§ Simplify complicated and big data sets and share them with readers
§ Continuous promotion of the "Open data movement"
§ Anyone can do it if they concentrate on what they can do best and
delegate, if necessary, the rest to other specialists in their fields
3. Methodology
An inductive research approach using mixed qualitative and quantitative methods (Salmons, 2010) was chosen as the most suitable for this study. Although there were other options, the combination of semi-structured interviews with a thematic analysis of those interviews (qualitative approach) and a systematic content analysis of a sample of articles from The Guardian Data Store (quantitative approach) was considered the most appropriate, as the joint power and complementary advantages of each method (Dawson, 2009) would help answer the research questions.
There are various alternative research approaches a researcher could take with such a rich source of material as the articles of The Guardian Data Store. Some of those can be found as future research suggestions in the Conclusion chapter. It is essential, though, to mention that the choice of research method would depend on the kind of questions the researcher wishes to answer. For example, if the researcher wished to examine how readers perceive those articles and the degree of understanding they have of them and particularly of
their visualisations, then a quantitative research method applied to a number of readers with the use of questionnaires, in combination with a qualitative research method such as a focused observation group of readers, possibly along with interviews, would be a good approach. Other possible studies are suggested in the Conclusions chapter.
3.1 Ethical Approval
The proposal of this study was examined by The Information School Research
Ethics Panel and was evaluated as 'Low Risk'. The study was ethically approved by the panel as it was found to be in accordance with the University of Sheffield’s
policies and procedures.
3.2 Qualitative Research
One of the main qualities and, in most cases, advantages of qualitative research is that it examines issues and phenomena, within an inductive research approach, in a broader way, trying to describe, understand and in some cases explain them from an internal point of view in various ways:
§ By focusing on the opinions and experiences of individuals, either personal or professional, on case studies and examining their knowledge (Burns, 2000)
§ By observing and/or testing actions, interactions and communications while they take place, and then analysing the data collected from this process (Kvale, 2007)
§ By examining items, archives and material such as images, videos or documents that could contain useful information of such nature (Kvale, 2007)
Qualitative research, in the majority of cases, is not carried out against a background of pre-defined concepts and hypotheses. On the contrary, hypotheses are usually absent from this method and, in the rare case they are used, they are formed and structured during the procedure along with other various concepts
nature (Kvale, 2007). This is exactly what this inductive study was designed to do, as there was no pre-defined hypothesis to verify. For this research, gathering the experience, opinions and knowledge of professionals in data journalism and data visualisation and understanding their perspectives (Burns, 2000) was considered critical in helping to answer the following research question: What is the creative process behind an infographic created and/or hosted by The Guardian Data Store? In more detail:
§ What is the creative process step by step and who are the decision makers?
§ Which data types are the most broadly used and how is data selected and gathered?
§ Which are the most important tools used in the process either for data processing and analysis or for visualisation?
§ How do journalism professionals perceive data and information visualisation in terms of value and effectiveness?
§ What are the possible weaknesses, limitations and negative aspects of data & information visualisation?
It was decided that in-depth interviews with professionals who currently work or have worked in the past for The Guardian Data Store, or who have published on it on a freelance basis, were the most effective way to gather such data. In-depth interviewing is a qualitative research method where the researcher tries to collect from the interviewee, in the form of a conversation, information and data on their insight, their point of view on various issues, and their personal experience and/or feelings on different topics. The approach is not to "put things to someone's mind"
(Hannabuss, 1996) but rather to let them unfold their perspective. More specifically, interviews were considered highly effective in providing a deeper understanding of specific aspects of the study. The data collected from the interviews would be used to better understand the subject of the research, to help answer the research questions (Salmons, 2010), and to shed light on unclear or complicated issues such as:
§ How is data and information visualisation used in journalism and why is its use constantly increasing?
§ What are the required skills and knowledge in order to work on data visualisation at a professional level?
§ What is the importance of data and data visualisation as perceived by the professionals?
§ What are the possible limitations, weaknesses and negative aspects or impact of data journalism and information visualisation?
Providing answers to the issues mentioned above would bring the research significantly closer to meeting some of its main objectives.
3.2.1 Design and execution of interviews
3.2.1.1 Profile of Interviewees1
Jacopo Ottaviani: Freelance data journalist with a strong technical background in programming. His data journalism work is often featured on the popular Italian news site "Il Fatto Quotidiano".
Lisa Evans: Former data researcher for The Guardian with a special interest in statistics, has written or co-written 139 articles. She is currently working for the Open Knowledge Foundation.
Paul Bradshaw: Award-winning online journalist, author of the "Online Journalism Handbook", Course Leader for the MA in Online Journalism at Birmingham City University and visiting professor at City University, London.
1 Links to the profiles of each interviewee at The Guardian Data Store and to personal web pages or blogs are provided in the References.
3.2.1.2 Interviews' Preparation and Conducting
In order for the research to be ethically reliable (Salmons, 2010), a consent form was created that described how the interviews would be conducted, how the data would be recorded and who would have access to it. The interviewees were asked to read it and confirm, prior to the interview, that they agreed to its terms. All three interviewees agreed to the terms of the consent form, either by signing it digitally or in person, or by replying to the email that contained the form to confirm that they had read it and agreed to its terms.
The interview with Mr. Ottaviani was held through a Skype video call on July 15th, 2013. The interview with Mr Bradshaw (in person) took place at his office in Birmingham City University on July 15th, 2013 and the interview with Mrs Evans was conducted through a Skype audio call on 2nd August 2013. All interviews, with the permission of the interviewees, were audio recorded on a digital recorder. The audio files were stored on a personal computer, with no access for third parties. Additionally, all interviewees agreed not to be anonymised and did not wish any part of the interviews to be omitted from the research.
The interview questions were designed in a semi-structured form because this type of interview allows flexibility but also helps maintain better control of the procedure. The questions were a combination of open and closed formats, which required a well-balanced set of questions in order to allow the respondents to express their opinion without the replies becoming too time-consuming (Walliman, 2011). At the beginning of the interviews a converging-question approach was followed, where the respondents were first asked more general questions (Thomas, 2003). Although the interviews were designed in semi-structured form, they were at some points conducted as open-ended when that was possible, mainly when the informants were providing an insight into and a description of their experience of specific cases. In such moments the researcher hands the reins of the interview to the interviewees, allowing them to express themselves with greater freedom and more naturally (Burns, 2000). Looser forms of interviewing, with semi-structured or open-ended questions, provide a good environment for a response-guided approach, where the interviewer can instantly create follow-up questions based on the replies given by the informants to the initial questions. This enables the researcher to focus in detail on the respondents' opinion on issues that were related to or derived from the initial question (Thomas, 2003).
3.2.1.3 Data Collection and Processing
All interviews were transcribed verbatim (Kvale, 2007) (transcripts available in Appendix 2) and checked. Careful notes were then taken for each one and their data was processed with the method of Thematic Analysis. Thematic analysis is the identification of patterns and key themes through the careful examination and basic coding of the extracted data. Key themes provide a strong connection to the research questions and are broader than codes, which primarily identify connections between various data elements. It is a method that allows flexibility and is relatively easy for inexperienced researchers to implement (Braun & Clarke, 2006).
3.2.1.4 Limitations and disadvantages of interviewing
The interviews were designed to be conducted either face-to-face or through a Skype audio or video call, depending on the respondents' preference and availability, but also on other limitations, such as a great distance between the researcher and the interviewee or scheduling and budget constraints, where a trip to conduct an interview face-to-face would either consume too much time or be too costly. However, where an interview was conducted through a Skype video call, the result was very similar to that of a face-to-face interview. A final limitation of this method is that many of the interview requests sent to the selected contacts can be ignored and, in this particular study, were ignored despite repeated efforts at communication.
One of the disadvantages of interviews is that transcribing them can be very time-consuming. Additionally, it can sometimes be difficult for the researcher to maintain objectivity (McNeil & Chapman, 2005) and carry out a bias-free interview. Therefore, the questions were carefully prepared and tested prior to each of the interviews.
3.3 Quantitative Research:
Although qualitative research helps provide answers to a number of the research questions set, it is insufficient to provide all the answers. The Guardian Data Store now has more than 3000 articles published on its Data Blog since its first publication on January 14th, 2009. All those articles contain raw data that can only be gathered, refined and processed through a quantitative research method, more precisely with the method of Systematic Content Analysis. The exact number of articles collected and examined in the study and the time frame they cover are described in detail in part 3.3.1.
Systematic Content Analysis can vary from very basic to extremely complex. With the continuously increasing number, size and types of available data sources, especially those in electronic and digital format, a great number of research techniques have arisen and more effective tools have been built. Additionally, it is now more common for data sets to be created or processed through the collaboration of more than one researcher (Krippendorff, 2004). However, although these techniques and tools help handle larger amounts of data than before and help shorten the process, Systematic Content Analysis is still considered very time-consuming and its results can still be distorted by defective source material (Devi, 2009). For this research, though, this method is the main way in which a researcher could find answers to the following basic research questions:
§ Which various types of visualisation and tools used can be recognised in the portfolio-case study? Are any norms or patterns of them identified?
The findings of the content analysis can also complement the findings of the qualitative methods specifically for the case study of The Guardian Data Store. The aim of the quantitative research methods is mainly to help meet some of the main objectives of this research, which are to identify:
§ The various tools used either in data analysis (and possibly formulation / editing) or in visualisation, and more specifically by The Guardian.
§ Possible tendencies, norms and correlations in The Guardian’s portfolio, mainly regarding subject, visualisation type and tools
3.3.1 Design and Implementation of Systematic Content Analysis
On the A-Z section of The Guardian Data Blog one can find and download, in the form of a spreadsheet, a complete index of all the published data sets and articles. More specifically, from the first article published on 14th January, 2009, until 30th July, 2013, when the articles were collected for the study, this spreadsheet consisted of 2959 articles. The spreadsheet contains details such as the hyperlink to each article, the date and time of its publication and its title. Those 2959 articles were defined as the original population of the quantitative research. Of those 2959 articles a sample of approximately 10%, 295 articles, was selected through the method of Systematic Sampling. Systematic sampling is one of the methods most frequently used in statistics to select a specific number of members or items as a sample from a much larger original population. The starting point was set at the 10th article, in order, of the original spreadsheet, with a pre-defined, fixed periodic interval of ten articles. Therefore, the articles selected were the 10th, 20th, 30th, etc., up to the 2950th, which was the final one.
After the sample was selected, it was noticed that some of the dates of the articles were either missing from the table or were in the wrong format. They were corrected after examining each of those articles (about 10 in number) and then the basic sample spreadsheet was ready. Each of the sample articles was classified according to a Code Frame, based on Vis (2012) and Lotan et al. (2011). A clear definition of variables, categories and objective coding procedures is essential (Mayring, 2000) for a scientific research method, as it helps increase its level of objectivity (Prasad, 2008).
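The sampling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `systematic_sample` and the list of row numbers standing in for the spreadsheet are hypothetical, and the study itself selected the articles from the downloaded spreadsheet rather than with code.

```python
def systematic_sample(population, start, interval):
    """Return every `interval`-th item, beginning with the `start`-th item (1-based)."""
    if start < 1 or interval < 1:
        raise ValueError("start and interval must be positive integers")
    # Convert the 1-based starting position to a 0-based index, then step by the interval.
    return population[start - 1::interval]


# Hypothetical stand-in for the 2959 spreadsheet rows, numbered 1..2959.
articles = list(range(1, 2960))

# Start at the 10th article and take every 10th article after it.
sample = systematic_sample(articles, start=10, interval=10)

print(len(sample), sample[:3], sample[-1])
```

With a population of 2959 rows, a start of 10 and an interval of 10, the selection stops at the 2950th row, which gives the 295 articles reported in the study.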
The research focused on the following 15 variables, which were classified for each article:

Table 1. Variables’ Coding Scheme

Variable Code Name    Variable Description
Var1                  Year of Publication
Var2                  Number of Visualisations
Var3                  Author of Article
Var4                  Subject Category
Var5                  Existence of Visualisation Number 1
Var6                  Existence of Visualisation Number 2
Var7                  Existence of Visualisation Number 3
Var8                  Type of Visualisation Number 1
Var9                  Type of Visualisation Number 2
Var10                 Type of Visualisation Number 3
Var11                 Tool for Visualisation Number 1
Var12                 Tool for Visualisation Number 2
Var13                 Tool for Visualisation Number 3
Var14                 Existence of Data Summary
Var15                 Existence of Data Set
Although The Guardian provided the date and time of each publication on the spreadsheet, an additional column that indicated only the year of publication of each article was created in order to facilitate the research. After the creation of this category, the articles were ranked in ascending order by Year of Publication. The classification of the subject categories was mainly based on the category tags assigned to each article by its author or creating team.
3.3.1.1 Limitations in Coding
The classification of the subject category faced one of the most severe limitations and difficulties of this research. An article could only be classified in one subject category, although it often referred to issues that belonged to more than one category.
The classification of the types of visualisation was based on the book "Digital Diagrams" by Trevor Bounford (2000). Further limitations were faced in this part of the research. There were times when an image of data visualisation contained more than one type of visualisation. These were treated as separate items and not as one. Furthermore, there was the complex issue of static and interactive visualisations. Where the interactive visualisation was based on one of the basic types of static visualisation and where, for example, the user could click on a bar chart and see more numbers or select a different variable from the menu, those visualisations were treated as static and the type of visualisation was stated. Motion or animated graphics, complicated interactive networks and clouds, or combinations of multiple interactive types which the user had to actively explore were classified simply as interactive, in order to avoid the confusion and blurred boundaries of such a complicated multiple classification. Graphics portrayed in videos were classified under the category type of videos.
In an effort to avoid the confusion and mistakes created by such limitations, some basic classification rules were created for all variables, in order to help decide how to classify each article in a single category. Those rules and the code frame also help eliminate possible bias of the researcher (Prasad, 2008). The entire classification code frame and its specifications, rules and assumptions can be found in Appendix 3.1.
3.3.1.2 Data Processing
After the primary data was gathered, it was processed and tabulated (Walliman, 2011) using Excel and, with the help of basic descriptive statistics (Rugg, 2007), a set of results such as frequencies (United States General Accounting Office. GAO, 1989) and percentages was generated. The generated results could help answer some of the research questions. More specifically, the main focus was on examining:
§ The main authors of the articles by number of publications
§ The number of articles and the percentage of the sample that contained visualisation
§ The average number of visualisations contained in each article
§ The number and percentage of articles that provided the relevant data set
§ The number and percentage of articles that provided the relevant data summary
§ The number and percentage of articles in each subject category
§ The visualisation tools used each year and how the use of each of the selected tools progressed throughout the years
§ The frequencies of use of tools and types and the frequencies of subjects per author, in order to identify possible tendencies
§ The types of visualisations per subject and vice versa, in order to identify possible tendencies
§ The most used visualisation types per most used visualisation tools and vice versa, in order to identify possible tendencies
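The frequencies, percentages and "per subject" counts listed above can be illustrated with a minimal Python sketch. The study produced these figures in Excel; the rows in `coded_articles` and the helper name `frequencies` are hypothetical and serve only to show the calculation.

```python
from collections import Counter

# Hypothetical coded rows: (subject category, type of first visualisation).
coded_articles = [
    ("Society", "Map"),
    ("Society", "Interactive"),
    ("Sports", "Interactive"),
    ("Politics", "Bar Chart"),
    ("Politics", "Map"),
    ("Society", "Map"),
]


def frequencies(values):
    """Return {value: (count, percentage of total)} for a list of coded values."""
    counts = Counter(values)
    total = len(values)
    return {value: (n, round(100 * n / total, 2)) for value, n in counts.items()}


# Frequency and percentage of articles per subject category.
subject_freq = frequencies([subject for subject, _ in coded_articles])

# Cross-tabulation: how often each visualisation type appears within each subject.
crosstab = Counter(coded_articles)

for subject, (count, pct) in subject_freq.items():
    print(f"{subject}: {count} articles ({pct}%)")
print("Maps in Society articles:", crosstab[("Society", "Map")])
```

Counting pairs of codes, as `crosstab` does, is the same operation that underlies the "types per subject" and "tools per type" charts in the findings.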
3.3.2 Inter-Coder Reliability Testing
Although generating some basic results is the main goal of the researcher, it is very important that the method of research is reliable (McNeil & Chapman, 1985). This means that if a second person were trained in how the research is conducted and had the code scheme, its rules and its clearly defined procedures explained to them (Graziano & Raulin, 2012), then this second person would produce data very similar to that produced by the first researcher (Krippendorff, 2003). This method of measuring the reliability of the research is called Inter-Coder Reliability. Although it does not by itself ensure that the results are valid, Inter-Coder Reliability gives greater assurance that the data interpretations are valid. It is also a very helpful way to evaluate the code frame and, when necessary, edit it to make it more effective.
In this research a secondary sample of 10% of the articles of the original sample was provided to a second coder, who was trained on the code scheme and the set rules and was asked to complete the data spreadsheet for this secondary sample of 29 articles. The Inter-Coder Reliability was tested at the nominal level for two coders with the online tool ReCal. The Inter-Coder Reliability was calculated with Percent Agreement and Scott's Pi. Although Percent Agreement is easier to calculate, Scott's Pi is an index that shows the level of reliability after "taking into consideration in its calculations the agreement by chance" (Freelon, 2010). Therefore, Scott's Pi is considered a more objective index. For this study, a minimum of 0.8 is the required result for Scott's Pi.
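To make the two indices concrete, the sketch below computes them for two coders using the standard textbook definitions: Percent Agreement is the share of items coded identically, and Scott's Pi is (Po - Pe) / (1 - Pe), where Po is the observed agreement and Pe is the agreement expected by chance, obtained from the category proportions pooled over both coders. This is not ReCal's code; the function names and the example codes are hypothetical, and returning 1.0 when both coders use a single category throughout is an assumption made here to avoid a division by zero.

```python
from collections import Counter


def percent_agreement(coder1, coder2):
    """Share of items (0..1) to which both coders assigned the same code."""
    if len(coder1) != len(coder2) or not coder1:
        raise ValueError("both coders must code the same, non-empty set of items")
    matches = sum(a == b for a, b in zip(coder1, coder2))
    return matches / len(coder1)


def scotts_pi(coder1, coder2):
    """Scott's Pi for two coders and nominal codes: (Po - Pe) / (1 - Pe)."""
    observed = percent_agreement(coder1, coder2)
    n = len(coder1)
    # Pool the codes of both coders to estimate each category's overall proportion.
    pooled = Counter(coder1) + Counter(coder2)
    expected = sum((count / (2 * n)) ** 2 for count in pooled.values())
    if expected == 1.0:
        # Assumption: only one category was ever used, so treat agreement as perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes for four articles on one variable.
coder_a = ["yes", "yes", "no", "no"]
coder_b = ["yes", "yes", "no", "yes"]

print(percent_agreement(coder_a, coder_b))  # 3 of 4 items match
print(scotts_pi(coder_a, coder_b))
```

In the hypothetical example the coders agree on three of four items, yet Scott's Pi is well below that figure because part of the agreement would be expected by chance. For a sample of 29 articles, 28 matching codes correspond to a Percent Agreement of 96.55% and 27 matching codes to 93.10%.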
3.3.2.1 Inter-Coder Reliability Test Results
The table below shows the Inter-Coder Reliability test results from the exported CSV file: the Percent Agreement and the result for Scott's Pi for each variable. The test showed at least 93% agreement between the two coders for each variable, while the result of Scott's Pi for variables 1-14 was at least 0.9 and for variable 15 was approximately 0.85. This indicates that the reliability of the coding scheme meets the minimum requirements. A screenshot taken when the results were produced (since ReCal does not generate a reference number for each test) is provided after the table.
Table 2: Inter-Coder Reliability Test Results.

File name: Inter-Coder1.csv
File size: 2028 bytes
N columns: 30
N variables: 15
N coders per variable: 2

Variable (columns)             Percent Agreement    Scott's Pi
Variable 1 (cols 1 & 2)        100                  1
Variable 2 (cols 3 & 4)        100                  1
Variable 3 (cols 5 & 6)        100                  1
Variable 4 (cols 7 & 8)        96.55172414          0.962214984
Variable 5 (cols 9 & 10)       96.55172414          0.900854701
Variable 6 (cols 11 & 12)      96.55172414          0.916786227
Variable 7 (cols 13 & 14)      100                  1
Variable 8 (cols 15 & 16)      100                  1
Variable 9 (cols 17 & 18)      96.55172414          0.932322054
Variable 10 (cols 19 & 20)     96.55172414          0.916305916
Variable 11 (cols 21 & 22)     93.10344828          0.910493827
Variable 12 (cols 23 & 24)     100                  1
Variable 13 (cols 25 & 26)     100                  1
Variable 14 (cols 27 & 28)     100                  1
Variable 15 (cols 29 & 30)     93.10344828          0.847368421
Screenshot of the results output of Inter-Coder Reliability Test in ReCal:
4. Findings and Discussion
This chapter is divided into four parts, each part corresponding to a research question and the equivalent objective(s). The section for the first research question presents the results of the quantitative research and a brief discussion of them. The other three sections, one for each of the other three research questions, are analysed under four themes identified in the thematic analysis of the qualitative research, the interviews. The findings for each theme are provided with the relevant discussion.
4.1 Research Question 1:
§ Which various types of visualisation and tools used can be recognised in the portfolio-case study? Are any norms or patterns of them identified?
4.1.1 Objective: To identify:
§ The various tools used either in data analysis (and possibly formulation / editing) or in visualisation, by The Guardian.
§ Possible tendencies, norms and correlations in The Guardian’s portfolio, mainly regarding subject, visualisation type and tools
4.1.1.1 Most Important Findings and Parallel Discussion2:
Visualisations per Article
Of the 295 articles examined, more than half included at least one type of visualisation. The average number of visualisations per article was 1.97, almost two visualisations per article.
2 Please note that the numerical order of the charts and tables in this chapter is different
than that of Appendix 4, where more charts and tables are included for each category (see
main table of contents for those). Additionally, all numbers and percentages for year 2013
refer to publications until 30th July.
Chart 1: Number of 1st, 2nd & 3rd Visualisation, Total Number of Visualisations
[Chart: Number of Visualisations per Article. Categories: Articles without Visualisation, Articles with 1 Visualisation, Articles with 2 Visualisations, Articles with 3 Visualisations]

Provision of Data Summary and Data Sets (or links to data source)
Additionally, of the total number of articles, about 73% included at least a data set and approximately 36% included a data summary. 35% of the articles included both.

Chart 2. Provision of Data Summary and Data Sets (or links to data source) % of total

Provision of Data Summary and Data Set       % of total
Articles with Data Summary                   35.59%
Articles with Data Set                       72.54%
Articles with only Data Summary              0.68%
Articles with only Data Set                  37.97%
Articles with Both Data Summary and Data Set 34.92%
Authors by Number of Publications and Year (in descending order)
As Simon Rogers is the creator of The Guardian Data Store, it was expected that he would be the author with the most publications, something that Chart 3 confirms. In Table 3 and Chart 4, it is interesting to notice that Mona Chalabi, the third author by number of publications, published the vast majority of her articles only very recently, as she became a member of The Guardian Data team in late 2012. It is also important to mention that one of the professionals interviewed for this study, Lisa Evans, is the 5th most published author on the blog, as she was a member of The Guardian Data team for three years. Finally, about one fifth of the articles examined have as first authors people with fewer than three publications on the blog, who mainly write for it on a freelance basis.
Chart 3. Main Authors (percentage of total publications)
[Chart: Authors. The percentage per author is listed in Table 3]
Chart 4. Main Authors (Publications per year, Percentage of total publications)
Table 3. Main Authors (Publications per year, Percentage of total publications)

Author Name          2009   2010   2011   2012   2013   Total Number of Articles   Total Percentage
Simon Rogers         21     24     31     30     15     121                        41.02%
Ami Sedghi           0      4      7      7      9      27                         9.15%
Mona Chalabi         0      0      1      0      21     22                         7.46%
John Burn-Murdoch    0      0      1      12     3      16                         5.42%
Lisa Evans           0      2      3      5      0      10                         3.39%
James Ball           0      0      3      2      3      8                          2.71%
Claire Provost       0      0      2      3      1      6                          2.03%
Katy Stoddard        1      4      1      0      0      6                          2.03%
Nick Evershed        0      0      0      0      5      5                          1.69%
Randeep Ramesh       0      0      0      3      0      3                          1.02%
Sarah Hartley        0      2      0      1      0      3                          1.02%
Kevin Anderson       3      0      0      0      0      3                          1.02%
Others               9      8      14     25     9      65                         18.31%

(The year columns give the number of articles published in that year.)
Articles Per Subject per Year
Chart 5: Articles per Subject (Percentages) in total (all years)
About 19% of the articles were about "Politics, Government and Public Administration", followed by articles about social issues and then by articles about culture. On Chart 6, one can see the number of articles per subject category, per year.
Total Percentage Per Subject
Politics / Government / Public Administration    18.64%
Sports                                           7.12%
Culture                                          8.14%
Health                                           4.41%
Military / War                                   4.75%
Education                                        6.44%
Society                                          14.92%
Crime / Terrorism                                1.69%
World News                                       5.76%
Global Development                               4.07%
Environment / Weather / Nature                   5.42%
Media / Journalism                               5.76%
Transportation                                   3.73%
Technology / Science                             4.07%
Economy / Business                               4.41%
Chart 6. Articles per Subject per Year (Frequencies)
Visualisation Types
The most frequently used visualisation type was the bar chart, accounting for almost 19% of the total number of visualisations, followed by maps with 16% and interactive visualisations with almost 15% (Chart 8). On Chart 7, one can see that maps and interactive visualisations were mostly used as the 1st visualisation of the articles, while bar charts and line graphs were more frequently used as the 2nd and tables as the 3rd. This is consistent with the fact that maps and interactives attract and engage the audience more (Murray, 2013), so they would be used first in order to take advantage of this quality. Simpler types of visualisation, such as bar charts, line graphs and tables, are used more to provide quick and brief insights into the data and the article, and mostly appear later in it, after or within the actual text.
Chart 7. Types of 1st, 2nd and 3rd Visualisation (Percentages)
Chart 8. Types of Visualisations (Percentages of total use)
[Chart: Total Percentage of Use Per Type of Visualisation]
Visualisation Tools
As one can see in Chart 10, most of the visualisations on The Guardian Data Blog were hosted from external sources, such as graphs from the reports of official organisations. Second in frequency were graphs created by The Guardian's Graphics' team or by external freelance graphic designers for The Guardian. Third in frequency were visualisations created with Datawrapper, fourth those created with Google Fusion, followed by those created with Tableau and Many Eyes.
In Chart 9, we notice that although the creations of The Guardian's Graphics' team remain relatively steady through the years, visualisations from external sources were hosted mostly in 2012. The use of Datawrapper increased dramatically last year, making it the most used tool for 2013, while the use of Many Eyes has dropped to almost zero since 2012, probably because it has not been updated sufficiently (Rogers, 2011).
Chart 9. Main Visualisation Tools' Use Per Year (Frequencies)
Chart 10. Total Use of Main Visualisation Tools (Percentage) in descending order.
Frequencies of Use of Tools, Types and Frequencies of Subjects of Authors - The case of Simon Rogers
As Charts 11, 12 and 13 reveal, in his articles Simon Rogers mostly hosted visualisations created by external sources and by The Guardian's Graphics' team, as well as visualisations created with Google Fusion, Many Eyes (at earlier stages of the Blog) and Datawrapper. The majority of the visualisations were interactive visualisations, maps or bar charts, and his articles were about Politics/Government/Public Administration, Society, World News and Culture. Similar tables and charts for the top 12 authors can be found in Appendix 4.
Chart 11. Simon Rogers' Use of Visualisation Tools (Frequencies)

Author: Simon Rogers, Tools Used                                                                 Times of Use
11. Graphic from External Source                                                                 49
9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian   29
4. Google Fusion                                                                                 22
3. Many Eyes                                                                                     13
12. Datawrapper                                                                                  11
6. Google Docs / Drive                                                                           9
2. Wordle.net                                                                                    7
1. Tableau                                                                                       4
8. Infomous                                                                                      3
14. Prezi                                                                                        2
5. Zoom.it                                                                                       1
17. CartoDB                                                                                      1

Chart 12. Simon Rogers' Use of Visualisation Types (Frequencies)
[Chart: Author: Simon Rogers, Types Used]
Chart 13. Subjects' frequency in Simon Rogers' articles
Visualisation Types per Subject and Subjects per Visualisation Types:
Chart 14 shows that most interactive visualisations were used in articles whose subject was Society, Sports, Politics/Government/Public Administration and Culture. This is to be expected since interactive visualisations, as Charts 16-19 show, were the second most used type in articles about Society, the first in articles about Culture and Sports (the London 2012 Olympics being a likely factor) and among the top five types for articles on Politics/Government/Public Administration. Chart 15 shows that most maps were used in articles whose subject was Society and Politics/Government/Public Administration, which is consistent with what Charts 16 and 17 show, where maps were the third most used type in both categories.
[Chart 13: Author: Simon Rogers, Subjects]
Chart 14. Visualisation Type: 1. Interactive, per Subject (Frequencies)

Type: Interactive, Per Subject                   Frequency
Society                                          10
Sports                                           9
Politics / Government / Public Administration    6
Culture                                          6
Education                                        4
World News                                       4
Media / Journalism                               4
Military / War                                   3
Crime / Terrorism                                3
Environment / Weather / Nature                   3
Global Development                               2
Transportation                                   2
Technology / Science                             1
Health                                           0
Economy / Business                               0

Chart 15. Visualisation Type: 10. Map, per Subject (Frequencies)

Type: Map, per Subject                           Frequency
Society                                          9
Politics / Government / Public Administration    8
Health                                           7
Transportation                                   7
Education                                        6
Global Development                               5
Culture                                          3
World News                                       3
Economy / Business                               3
Sports                                           2
Military / War                                   2
Crime / Terrorism                                2
Environment / Weather / Nature                   2
Media / Journalism                               2
Technology / Science                             1
Chart 16. Subject 1. Politics / Government / Public Administration, per Visualisation Type (Frequencies)
[Chart: Subject: Politics / Government / Public Administration, per Type]

Chart 17. Subject 7. Society, per Visualisation Type (Frequencies)
[Chart: Subject: Society, per Type]

Chart 18. Subject 3. Culture, per Visualisation Type (Frequencies)
[Chart: Subject: Culture, per Type]

Chart 19. Subject 2. Sports, per Visualisation Type (Frequencies)
[Chart: Subject: Sports, per Type]
Visualisation Tools and Visualisation Types
Types per tools:
Charts 20 to 22 show that most interactive visualisations were created by external sources and with the tool Many Eyes. Additionally, most bar charts were created with Datawrapper and most maps with Google Fusion.
Tools per types:
Charts 23 to 27 show that Tableau was used mostly for bar charts and Google Fusion for maps, while the Guardian team created mostly tables, area charts and maps. External sources were mainly used as a source of interactive visualisations, combination graphs, maps and bar charts, while line graphs were most often created with Datawrapper.
Chart 20. Type 1. Interactive, per most important Tools (Frequency)
Type: Interactive, per Tool                                                                      Frequency
11. Graphic from External Source                                                                 23
3. Many Eyes                                                                                     18
Not Known / Not available                                                                        6
9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian   5
Other                                                                                            4
1. Tableau                                                                                       2
12. Datawrapper                                                                                  0
4. Google Fusion                                                                                 0
6. Google Docs / Drive                                                                           0
2. Wordle.net                                                                                    0

Chart 21. Type 7. Bar Chart, per most important Tools (Frequency)

Type: Bar Chart, per Tool                                                                        Frequency
12. Datawrapper                                                                                  40
Not Known / Not available                                                                        11
11. Graphic from External Source                                                                 9
9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian   6
1. Tableau                                                                                       6
4. Google Fusion                                                                                 3
3. Many Eyes                                                                                     0
6. Google Docs / Drive                                                                           0
2. Wordle.net                                                                                    0
Other                                                                                            0

Chart 22. Type 10. Map, per most important Tools (Frequency)

Type: Map, per Tool                                                                              Frequency
4. Google Fusion                                                                                 28
11. Graphic from External Source                                                                 15
9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian   9
Not Known / Not available                                                                        6
Other                                                                                            4
1. Tableau                                                                                       2
12. Datawrapper                                                                                  0
3. Many Eyes                                                                                     0
6. Google Docs / Drive                                                                           0
2. Wordle.net                                                                                    0
Chart 23. Tool 1. Tableau, per most important Visualisation Types (Frequency)
[Chart: Tool: Tableau, per Type]

Chart 24. Tool 4. Google Fusion, per most important Visualisation Types (Frequency)
[Chart: Tool: Google Fusion, per Type]

Chart 25. Tool 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian, per most important Visualisation Types (Frequency)
[Chart: Tool: Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian, per Type]

Chart 26. Tool 11. Graphic from External Source, per most important Visualisation Types (Frequency)
[Chart: Tool: Graphic from External Source, per Type]
Chart 27. Tool 12. Datawrapper, per most important Visualisation Types (Frequency)
Summarising:
The results of the content analysis revealed that, on average, two visualisations are found in the articles of The Guardian Data Store, while almost three quarters of the articles offered at least the data set. One can notice that specific tools are used more frequently for certain visualisation types, while some visualisation types are more often chosen for articles of specific subject categories. While the sample is too small to reveal correlations, it can show some tendencies in the preference for types and tools, not only by specific authors but also for specific subjects. More than 50 tables and 100 charts of more detailed analysis can be found in Appendix 4, while the entire Excel file with the spreadsheet of the systematic content analysis and all charts and tables in larger size can be downloaded from https://copy.com/xOsRJcSR1wwL .
[Chart 27: Tool: Datawrapper, per Type]
4.2 Research Question 2:
§ Which is the creative process behind an infographic created and/or hosted by The Guardian Data Store? In more detail:
o Which is the creative process step by step and who are the decision makers?
o Which data types are the most broadly used and how is data selected and gathered?
o Which are the most important tools used in the process either for data processing and analysis or for visualisation?
4.2.1 Objective: To identify:
§ The various tools used either in data analysis (and possibly formulation / editing) or in visualisation
4.2.2 Theme 1: Data Sources, Data Gathering and Processing, Data Visualisation: Workflow, Tools and Decision Making
4.2.2.1 Findings:
The interviewees mentioned that they usually try to find data from official sources, such as data issued by governments, national statistical institutes, organisations and sometimes agencies, as well as through scraping or personal communication. The use of freely available data is important, as the work might need to be reproduced and it reinforces the "open data movement". Mr Ottaviani chooses the data he uses by keeping the data that answers the research question he has set, while for Mrs Evans the process is more intuitive. Mr Bradshaw tries to strip the story down to its core details and its background, applies basic journalism rules and is disciplined in the data analysis.
All three interviewees mentioned that it is usually the topic, the current news
headlines or a hypothesis that leads to a search for available relevant data, if
possible from multiple sources, which could give a story worth telling. However,
there could be cases where the data comes first and, through a quick examination of
it, the journalist sees a story in it. As an example of the topic coming first, for
his article "Data journalism in Italy: how did 1,000 prisoners die?" (2012) Mr
Ottaviani first decided to work on the topic because he felt morally obliged to let
people know what was happening in Italian prisons, and then set the research
questions: who dies, where and how. He gathered the relevant data and visualised it
on an interactive map that showed the deaths per prison all over Italy.
The usual workflow for producing a data-driven publication is:
1. Identifying the topic or the hypothesis
2. Searching for available data sources, primarily official but sometimes through
scraping as well, depending on the topic
3. Evaluating the data sets and picking the most relevant
4. Cleaning, combining or merging them to produce a simpler data set that
reveals a story and can be visualised
5. Finding additional sources of information that can make the story more focused
on how it affects people
6. Deciding on the relevant visualisation
7. Writing the final story and providing a data set
8. Publishing the story
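Steps 3 and 4 of the workflow above (picking the relevant records, then cleaning and combining them into a simpler data set) can be sketched in Python. This is only an illustration under stated assumptions: the two sources, their column names, the records and the helpers `clean_region` and `merge_totals` are all hypothetical, and none of the interviewees described their process in code.

```python
import csv
import io

# Two hypothetical sources with inconsistent formatting.
# The rows are illustrative only and are NOT data from the study.
OFFICIAL_CSV = """region,amount
 North ,10
South,5
"""
SCRAPED_CSV = """region,amount
north,3
SOUTH,n/a
,7
"""

def clean_region(raw):
    """Normalise a region name: trim whitespace, use title case."""
    return raw.strip().title()

def merge_totals(*csv_texts):
    """Combine several CSV sources into one total per region.

    Rows with a missing region or a non-numeric amount are dropped.
    """
    totals = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            region = clean_region(row["region"])
            try:
                amount = int(row["amount"])
            except ValueError:
                continue  # unusable value such as "n/a"
            if not region:
                continue  # record cannot be attributed to a region
            totals[region] = totals.get(region, 0) + amount
    return totals

totals = merge_totals(OFFICIAL_CSV, SCRAPED_CSV)
print(totals)
```

The merged totals form the kind of simpler data set that could then be passed to a charting or mapping tool in step 6. Silently dropping unusable rows is a design choice made for brevity; in practice a journalist would probably want to log and inspect them.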
The decision on the visualisation type depends, for Mr Ottaviani, on the nature of
the story; for Mrs Evans, it comes from cooperation with the graphics team and from
applying the basic charting principles; and Mr Bradshaw usually bases his decision
on the approach suggested by A. Abela's (2009) Chart Chooser³. Mr Ottaviani
mentions that he prefers interactive visualisation because, if it is simple to use,
it involves the users more. Mrs Evans' decision on using static or dynamic
visualisation depends on the story, while Mr Bradshaw tends to prefer static
visualisation, although he says that this also depends on the level of interactivity
that could be applied in each case.
The most preferred data scraping, cleaning, analysis and data visualisation tools
among the interviewees are: Scraperwiki, Google Refine, Google Fusion Tables,
Google Docs, Excel, Datawrapper, BatchGeo, Leaflet, Tableau, Adobe's Photoshop
and Illustrator, and scripting and programming languages such as JavaScript and
Python. Lisa Evans mentioned that in the Data Store, where she had worked for three
years, they preferred the various Google tools, Datawrapper, and the Adobe products
for designing various other illustrations.
³ http://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf
Finally, the interviewees gave their opinion on what Big Data is. Mr Ottaviani defined
it as "Data that cannot be elaborated by a single computer but by multiple parallel
computers and with the use of algorithms". Mr Bradshaw did not provide a concrete
definition, as he believes that it is not a practically useful term. He mentioned
that very big data sets existed before and that what he believes has changed,
allowing the term to emerge, is that data is now seen in a qualitatively different way.
4.2.2.2 Discussion:
Mr Ottaviani, Mrs Evans and Mr Bradshaw gather their data from open government
data and open data publications of various official institutes and organisations, but
also through scraping and crowdsourcing. In line with the description by Leimdorfer
& Thereaux (2012) of what using open data means for journalists, the interviewees
re-use, clean, combine and reshape the available data sets in order to see if there
is a story worth telling to the people and to provide their readers with a clearer
and simpler data set. The creative workflow they follow is to a very great degree
similar to that of Aitamurto et al. (2011), which can be found in the literature review.
The research showed that the majority of the tools used by the three interviewees
for visualisation and for data scraping and analysis are commonly found in the
recent literature (for example Barkai, 2013) as suggested tools. Additionally, many
of them, such as Google Fusion Tables, Tableau and Datawrapper, are among the ones
most frequently used by The Guardian Data Store, as the findings of the quantitative
study reveal.
As Murray (2013) stresses, interactive visualisation tools are used to attract and
engage users more, but also to portray many levels of data and information at once,
which static visualisation types fail to do. The examination of The Guardian's
Data Store portfolio revealed that the use of interactive visualisation is high and
that in the majority of the articles it is the first visualisation provided,
something which strengthens its role in attracting and engaging readers.
4.3 Research Question 3:
§ How do journalism professionals perceive data and information visualisation
in terms of value and effectiveness?
4.3.1 Objective: To identify:
§ How data and information visualisation is used in journalism and why its
use is constantly increasing
§ Which are the required skills and knowledge in order to work on data
visualisation on a professional level
§ Which is the importance of data journalism and data visualisation as
perceived by the professionals
4.3.2 Theme 2: Data Journalism and Data Visualisation: Importance, Reasons for Increased Interest, Impact on Journalism, Required Professional Skills
4.3.2.1 Findings:
Data-driven journalism and data visualisation are growing in importance, and
interest in them is increasing, for a variety of reasons. The first reason, mentioned
by all three interviewees, is the need for government transparency and
accountability. Journalists' role is partly to hold power accountable, and data and
information are forms of power used in financial and political decision-making. A
second reason is that technological improvements, which have increased the use of
technology in people's everyday lives, have led to a growing amount of data and
information circulating online. From this data it is possible to extract interesting
stories for the people, stories that are based on fact-checked data and statistics.
The final reason is the advantage of speed that online information offers the reader.
Data and information visualisation, in particular, enhances the advantage of speed.
Data visualisation manages to communicate quickly, simply and intuitively ideas and
concepts that would otherwise be difficult to explain or could have been ignored. It
also attracts attention in a way similar to that of a headline. Another advantage of
visualisation, according to Mr Bradshaw, is that it broadens the range of people
that a story can have an impact on, as it appeals to non-textual people.
How well people understand the visualisations in articles and the message they
communicate depends on the quality of the visualisation, as Mr Ottaviani stresses.
If a visualisation is well made and provides, for instance, comparisons of scales
and sizes with elements that people are familiar with, it is better understood.
Additionally, a good visualisation can reveal patterns that would otherwise not be
obvious, and it is always interesting to see how people find stories in a
visualisation. Mrs Evans mentions that a very good source of feedback about the way
a visualisation was perceived by people is their comments under the article. In
general, Mrs Evans believes that people like visualisation because it is less time
consuming, they can explore it themselves, they like getting the bigger picture on
data released by governments and they enjoy a story that was nicely put together.
Mr Bradshaw has no feedback on people's level of understanding of visualisation,
highlighting that there is no evidence that articles with visualisation are better
understood by readers and that what can be measured instead is the impact of the
story and the scale of its reproduction by other media.
The impact of data journalism and data visualisation on journalism, according to Mr
Ottaviani, is that they brought the question of fact-checking to the centre. It is a
matter of truth to base opinion on something scientifically provable. They enable
journalists to push governments for transparency and to open their archives, which
hold a lot of interesting data that governments might, however, hesitate to release
because it would put them in a bad position and cause controversy. For Mrs Evans,
data journalism has had an impact on the profession and on the role of data
journalists in the broader team of journalists. It has resulted in data journalists
gradually becoming more respected as journalists in general and no longer being
considered just a part of the graphics team.
Mr Bradshaw believes that all the changes that led to the flourishing of data
journalism and visualisation are consequences of a rapidly changing information
environment in general. It is the way advertising is measured that changes the
environment of information. Journalists are challenged to increase traffic on the
media they work for or cooperate with, and there is pressure on them to publish
more. Moreover, specialised media and bloggers, who might have a deep knowledge of
a subject, can easily identify mistakes in a story, and that brings more pressure
for factual accuracy.
A data journalist has to combine a minimum set of skills and to have a basic
knowledge, or at least an understanding, of many fields. Mr Ottaviani specifically
suggests some knowledge of web technologies and programming, HTML and CSS initially
and then JavaScript or Python, as well as basic statistics, design, social media,
advanced Excel (macros and pivot tables), some database management like MySQL and,
of course, the basic principles of journalism, like ethics. Mr Bradshaw stresses the
importance of having "an eye for a story" that might be hidden in data and of being
able to analyse data and communicate results effectively, in a way that connects
them to human stories. For Mrs Evans it is very important to ask experts in a
specific field, as data journalists bridge the gap between specialised fields rather
than being experts in all of them.
At team level, an ideal data journalism and visualisation team would include about
four people of different specialties: a programmer, who would do the data scraping;
a good graphic designer with knowledge of interactive visualisation as well; one or
two journalists; and ideally a statistician or someone with a background in
mathematics. All those people, however, would need to know a bit of all fields in
order to understand each other and exchange ideas. The overlapping of different
skills is very interesting for Mr Ottaviani.
4.3.2.2 Discussion:
Smiciklas (2012) mentioned that the power of data visualisation is to allow viewers to see "insights" that would not have been visible to them if they were only provided with the numerical data. The interviewees similarly explain that data visualisation communicates concepts and messages that would otherwise be difficult to explain or could have been ignored. Tufte (2001), in his description of Graphical Excellence, set as one of its prerequisites the communication of the message in the shortest time. All interviewees in the research mentioned speed as another very important quality of data visualisation.
In the literature review, Cohen, Hamilton and Turner (2011) referred to data
journalism as the "weapon" that people and journalists can use to hold politicians
and governments accountable. According to the interviewees, this is the main
reason that the importance and use of data journalism and data visualisation are
constantly growing. As Diana Priest (McGhee, 2010) pointed out, data journalism is
a branch of the broader field of investigative journalism, and facts and the truth
are what all investigators look for. With data journalism, as Mr Ottaviani said,
fact-checking is at the centre of attention.
In the project "Journalism in the Age of Data" (McGhee, 2010), several
professionals mentioned as a fundamental skill in data journalism the ability to
cooperate with people of different expertise and skills, by having a basic
understanding of the other fields. This overlapping of fields was highlighted by the
interviewees as very important and inevitable, since one person cannot do it all and
would eventually need to seek the help of an expert.
4.4 Research Question 4:
§ Which are the possible weaknesses, limitations and the negative aspects of
data and information visualisation?
4.4.1 Objective: To identify:
§ Which are the possible limitations, weaknesses and negative aspects or
impact of data journalism and information visualisation
4.4.2 Theme 3: Weaknesses, Limitations, Negative Aspects and Dangers of Data Journalism and Data Visualisation
4.4.2.1 Findings:
The drawbacks of data visualisation, according to Mr Bradshaw, are that it can
oversimplify, or lose subtleties or complexities of the story. For this reason it
should always be in partnership with other information. Like any form of
communication it can be misleading, and the way to avoid such a situation is through
the ethical considerations that accompany top journalism: to strive to be accurate,
to put things into context and not to misrepresent⁴.
Mrs Evans says that the best way to avoid this is to always publish the data set
with the visualisation, considering that if something is wrong or an important
aspect is ignored, people will comment on it. As she notes, it is very easy for
something to go wrong when an infographic is created, since there are so many
decisions to be taken, even in terms of design. She believes that the most difficult
visualisation type for people to understand is complex networks.
In Mr Ottaviani's opinion, the risk with data journalism in general and,
consequently, with data visualisation is that journalists report news to people in a
quantitative form and might fail to give them something that creates an emotional
response. Mr Ottaviani's philosophy is to "give numbers an identity". Additionally,
journalists need to overcome possible prejudices they might have and clearly present
the facts and the context, even if they contradict what they have believed so far.
Finally, he believes that data visualisation can be misleading and provided a link⁵
to a webpage that highlights examples of bad visualisation.
Another important issue that needs to be considered in data journalism is the need
to respect copyright and database rights and not to publish something that would
break the law or violate people's privacy, as Mr Bradshaw comments. Mr Ottaviani
agrees that, especially after the example of Wikileaks, journalists need to be
careful not to expose people to danger or harmful situations by publishing their
personal information. Mrs Evans stresses the importance of being straightforward
with people and corporations, when they give their data, about how it will be
reproduced and published, so that they know the potential consequences.
⁴ When Mr Bradshaw was asked for examples of bad visualisation, he provided the following link with bookmarks of bad visualisation: http://pinboard.in/u:paulbradshaw/t:badvis
⁵ When Mr Ottaviani was asked for examples of bad visualisation, he provided the following link with examples of bad visualisation: http://flowingdata.com/category/statistics/mistaken-data/
4.4.2.2 Discussion:
The danger of misleading and inaccurate visualisation is stressed in the literature
review by Ward et al. (2010). A visualisation can be misleading when it is
inaccurate or when design aspects of it, like scales and lines, are wrong,
disproportional, unclear or too complicated. Mrs Evans believes that what leads to a
misleading visualisation is, apart from what could fail in the design, generally the
wrong way in which the data is approached.
The great risk of data journalism, though, is just providing the numerical data to
the people without telling a story that connects with them. Journalists must not
forget that they need to search for the human side of the story (Oliver, 2010).
Likewise, Mr Ottaviani's philosophy is to "give an identity to the numbers".
Paul Steiger from ProPublica expressed his concern about how far accessibility
and openness can go (McGhee, 2010). After the incident of Wikileaks, where people’s
identities and other private information were leaked, Mr Ottaviani stresses the
necessity for journalists to be very careful about what they publish or reproduce.
4.4.3 Theme 4: Future Prospects and Challenges of Data Journalism and Data Visualisation
4.4.3.1 Findings:
Three different perspectives were given by the interviewees on the future prospects
and challenges of data journalism and data visualisation. Mr Ottaviani believes
that they will spread further, since digital media offer opportunities that print
media do not. In his view, paper will mostly disappear in a few decades, while
online media offer the possibility to expand. Additionally, stories are easily
shared online and readers are more involved: they can interact with other readers,
they can fact-check and they can even participate in building data sets and creating
stories. This is a very interesting side of data journalism and data visualisation
to keep an eye on.
Mrs Evans believes that the future will bring better and more sophisticated tools and
hopes that more people will come to data journalism, especially people with skills in
both statistics and understanding what is useful for the readers.
Mr Bradshaw foresees a conflict over the kind of information that journalists will
be seeking and that people in power will not want to make available. He also
believes that there will be fights around Freedom of Information laws.
According to Mr Bradshaw, journalists will become better at collecting data through
scraping or leaks, as some data might not be available anywhere but online. More
online data will give opportunities for the personalisation of stories, especially
with the use of social media like Facebook, since it will be easier to connect
stories to specific people and to analyse human networks and connections, which
historically was really hard to do.
4.4.3.2 Discussion:
Sir Tim Berners-Lee (Arthur, 2010) mentioned that "Data-driven journalism is the future" because of the endless possibilities and options in data processing, visualisation techniques, programming languages and especially open data and open government data. Mrs Evans believes that the tools in the future will be more advanced and more skilled people will want to work in the field. People with advanced programming knowledge, especially, will be needed more for data scraping in the future, as Mr Bradshaw mentions, because people and journalists might not be able to find the data set they wish for in the traditional sources.
5. Conclusion
Data journalism and data visualisation are constantly growing in importance and use. This dissertation aimed to investigate the use of data visualisation in journalism by examining, as a case study, one of the most respected providers of data journalism publications, The Guardian Data Store.
Meeting Objectives:
The study's objectives were to identify:
1. Possible tendencies, norms and correlations in The Guardian’s portfolio, mainly
regarding subject, visualisation type and tools
2. The various tools used either in data analysis (and possibly formulation /
editing) or in visualisation, and more specifically by The Guardian.
3. How data and information visualisation is used in journalism and why its
use is constantly increasing
4. Which are the required skills and knowledge in order to work on data
visualisation on a professional level
5. Which is the importance of data journalism and data visualisation as
perceived by the professionals
6. Which are the possible limitations, weaknesses and negative aspects or
impact of data journalism and information visualisation
Evaluation of Methodology Approach
The first methodological approach followed in the study was a Systematic Content Analysis of a sample of data-driven articles published by The Guardian Data Store. This quantitative method was the most suitable for meeting the first and part of the second objective: the identification of possible tendencies, norms or correlations between the subject category of the articles, the different visualisation tools and the various visualisation types. The sample, though, was relatively small compared to the total population of published data-driven articles. Additionally, there were technical limitations in the coding options of some variables, such as the author, where only the first authors of the articles were taken into consideration. Another limitation was the need to classify each article into only one subject category, when it was clear that some could be classified under multiple categories. In order to address such restrictions, a detailed coding scheme for each variable was created, with all the specifications on the classification and other relevant restrictions. The reliability of the coding scheme and the content analysis was tested online with the assistance of a second coder, whose coding was compared to that of the researcher. The reliability test showed high agreement results.
The second methodological approach was conducting semi-structured interviews with the data-driven journalists Jacopo Ottaviani, Paul Bradshaw and Lisa Evans, who have either published on The Guardian Data Store on a freelance basis or had worked for it. The data gathered in the interviews was analysed using the method of thematic analysis. This qualitative method was considered the most adequate to help meet objectives three to six and partly objective two (mostly through the interview with Mrs Evans, who worked on The Guardian Data Blog).
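The intercoder reliability test mentioned in the evaluation above compares the codes assigned by two coders to the same articles. The text does not state here which coefficient was reported, so the following Python sketch assumes two common measures, simple percent agreement and Cohen's kappa, and uses hypothetical codes; the function names and the sample lists are illustrative and are not the study's data or tooling.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of units to which both coders assigned the same code."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    categories = freq_a.keys() | freq_b.keys()
    # Chance agreement: product of each coder's marginal proportions
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical visualisation-type codes for six articles (illustrative only)
researcher = ["map", "bar chart", "map", "table", "bar chart", "map"]
second_coder = ["map", "bar chart", "table", "table", "bar chart", "map"]

agreement = percent_agreement(researcher, second_coder)
kappa = cohens_kappa(researcher, second_coder)
print(round(agreement, 3), round(kappa, 3))
```

Kappa is lower than raw agreement because it discounts the matches that two coders would produce by chance alone, which matters when a few categories dominate the coding scheme.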
Key Findings:
The quantitative research showed that the average number of visualisations per article in The Guardian Data Store was two, while more than half of the articles featured at least one type of visualisation. A data set (or a link to it) and/or a data summary was provided in most of the articles. The main authors by number of articles published were identified, the most frequent being Simon Rogers, who created the blog in 2009 and only recently left The Guardian. Other notable authors were: Ami Sedghi, Mona Chalabi, John Burn-Murdoch, James Ball and Lisa Evans, who was also one of the professionals interviewed for the current research. The main subject categories of the research were Politics/Government/Local Administration, Society and culture.
The visualisations were created for The Guardian by its graphics team or by freelance graphic designers, by external sources in the cases where the blog was just featuring the visualisation of another source, and then with the tools Datawrapper, Google Fusion Tables and Tableau. The main visualisation types were bar charts, maps and interactives. The research showed a tendency to use specific tools for specific visualisation types, for example Datawrapper for most bar charts. It also showed how various tools were replaced by others over the years and how, for some subject categories, a specific type of visualisation was used more often.
The qualitative method revealed that some of the visualisation tools preferred by The Guardian Data Store, like Datawrapper, Google Fusion Tables and Tableau, were also among those preferred by the interviewees. The creative workflow of a publication, from the original conception of the topic to the final publication, was described, with special insight into decision-making, the main data sources, such as open government data or open data publications, and data gathering, processing and visualisation. The reasons for which data journalism and data visualisation are growing in use and importance were examined; for data journalism, the main reasons are the pressure for more transparency and government accountability. For data visualisation, the main reason is its capacity to communicate quickly the story that the data represents, often through interactive visualisations, which engage users more. The research highlighted the possible dangers and risks of data journalism and visualisation, like the misleading representation of facts and figures or failing to bring a human dimension into the story that would interest people. Finally, it explored possible future challenges and prospects, like the need for more data scraping, as the most interesting data, which would create a more distinctive story, may not be found in conventional data sets, or the possibility of personalising stories with the use of social media as a tool.
Future Research Suggestions
There are many possible studies that could be conducted on data visualisation and data journalism. A first alternative study, or an extension of the current research, could examine how readers perceive those articles and how well they understand them, particularly their visualisations. Another option could focus on the reception of The Guardian Data Store's articles by readers active on social media. A focus on the rate of sharing and spreading of the articles on social media and on the readers' feedback to those articles through comments and replies on Facebook or Twitter, for example, would be very interesting.
Word Count: 14893.
Bibliography
1. A Quick Illustrated History of Visualisation. (n.d.). Data Art. Retrieved July 10, 2013, from http://www.data-art.net/resources/history_of_vis.php
2. A-Z section of The Guardian Data Blog: The complete Index of Data Sets. (n.d.). Retrieved July 30, 2013, from http://www.theguardian.com/technology/page/2009/jun/17/1
3. Abela, A. (2009). Chart Suggestions—A Thought-Starter. Retrieved from http://extremepresentation.typepad.com/files/choosing-a-good-chart-09.pdf
4. Adobe Creative Cloud. (n.d.). Adobe. Retrieved July 20, 2013, from http://www.adobe.com
5. Aitamurto, T., Sirkkunen, E., & Lehtonen, P. (2011). Trends In Data Journalism.
6. Arthur, C. (2010, November 22). Analysing data is the future for journalists, says Tim Berners-Lee. The Guardian. Retrieved April 30, 2013, from http://www.guardian.co.uk/media/2010/nov/22/data-analysis-tim-berners-lee
7. Ball, J. (2013). Can you trust an infographic? The Guardian. Retrieved from http://www.guardian.co.uk/media/shortcuts/2013/jan/09/can-you-trust-an-infographic
8. Barkai, M. (2013). Data Visualisation Tools and Trends to Watch: An Interview with Datavisualisation.ch. Data Driven Journalism.
9. BatchGeo. (n.d.). BatchGeo LLC. Retrieved July 10, 2013, from http://www.batchgeo.com
10. Big data needn’t be a big headache: How to tackle mind-blowing amounts of information. (2012). Strategic Direction, 28(8), 22–24. doi:10.1108/02580541211249583
11. Bounegru, L. (2013). Slides, Tools and Other Resources From the School of Data Journalism 2013. Data Driven Journalism. Retrieved April 30, 2013, from http://datadrivenjournalism.net/news_and_analysis/slides_tools_and_other_resources_from_the_school_of_data_journalism_2013
12. Bounford, T. (2000). Digital Diagrams. (P. Leek, Ed.) (1st ed., pp. 38–107, 118–119). London, UK: Cassell & Co.
13. Bradshaw, P. (n.d.). Online Journalism Blog. Retrieved from http://onlinejournalismblog.com
14. Bradshaw, P. (2010). How to be a data journalist. The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/news/datablog/2010/oct/01/data-journalism-how-to-guide
15. Bradshaw, P. (2012a). Olympic torch relay places - How were they allocated? Get the data. The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/sport/datablog/2012/jul/26/olympic-torch-relay-places
16. Bradshaw, P. (2012b). 2012 Olympics investigation: The story behind the olympic sponsors. The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/news/datablog/2012/jun/06/olympics-2012-investigation
17. Bradshaw, P. (2012c). Who are the mystery Olympic torchbearers? Get the data. The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/sport/datablog/2012/jul/11/2012-olympic-torch-relay-torchbearers-sponsors-data
18. Bradshaw, P. (2013). Council spending on the Olympic torch relay: Where did the money go? The Guardian. Retrieved July 22, 2013, from http://www.theguardian.com/news/datablog/2013/mar/06/council-spending-olympic-torch-relay-where-did-money-go
19. Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101. Retrieved from http://dx.doi.org/10.1191/1478088706qp063oa
20. Buhl, H. U., Röglinger, M., Moser, F., & Heidemann, J. (2013). Big Data: A Fashionable Topic with(out) Sustainable Relevance for Research and Practice? Business & Information Systems Engineering, 5(2), 65–69. doi:10.1007/s12599-013-0249-5
21. Burns, R. B. (2000). Introduction to Research Methods (4th ed., pp. 391–392, 423–435). London, UK: Sage Publications Ltd.
22. Cairo, A. (2012). Infographics and Visualizations as Tools For the Mind. Visual.ly Blog. Retrieved from http://blog.visual.ly/infographics-and-visualizations-as-tools-for-the-mind/
23. CartoDB: Geospatial Data Visualisation. (n.d.). CartoDB. Retrieved June 20, 2013, from http://cartodb.com
24. Chabot, C. (2009). Graphically Speaking Demystifying Visual Analytics. IEEE Computer Graphics and Applications, 29(2), 84–87. Retrieved from http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4797520
25. Cohen, S., Hamilton, J. T., & Turner, F. (2011). Computational journalism. Communications of the ACM, 54(10), 66–71. doi:10.1145/2001269.2001288
26. Crawford, K. (2013). The Hidden Biases in Big Data. Harvard Business Review. Retrieved June 25, 2013, from http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html
27. D3.js - Data Driven Documents. (n.d.). D3.js. Retrieved August 10, 2013, from http://d3js.org
28. Data Visualisation - Selected tools. (2013). Retrieved May 16, 2013, from http://selection.datavisualization.ch
29. Datawrapper Software. (n.d.). Retrieved June 12, 2013, from http://datawrapper.de
30. Davies, T. (2010). Open data, democracy and public sector. Retrieved from
http://www.academia.edu/988533/Open_Data_Democracy_and_Public_Sector_Reform
31. Dawson, C. (2009). Introduction to Research Methods: A practical guide for anyone
underatking a research project (Fourth., pp. 115–116). Oxford, UK: How To Content. Retrieved from https://www.dawsonera.com/abstract/9781848033429
32. Devi, N. B. (2009). Qualitative and Quantitative Methods in Libraries, International
Conference. In Understanding the Qualitative and Quantitative Methods in The Context of Content Analysis (pp. 1–10). Chania Crete, Greece.
33. Domokos, J., & Evans, L. (2011). Jobcentres “tricking” people out of benefits to cut
costs, says whistleblower. The Guardian. Retrieved August 03, 2013, from
http://www.theguardian.com/politics/2011/apr/01/jobcentres-tricking-people-benefit-sanctions
34. Entry-level tools Online visualisations. (2012). NetMagazine.com. Retrieved July 05,
2013, from http://www.netmagazine.com/features/top-20-data-visualisation-tools 35. Flew, T., Spurgeon, C., Daniel, A., & Swift, A. (2012). The Promise of Computational
Journalism. Journalism Practice, 6(2), 157–171. doi:10.1080/17512786.2011.616655
36. Fogg, A. (2013). Immigration, crime, benefits: Everything you know about the state of the nation is wrong. The Independent. Retrieved July 15, 2013, from
http://www.independent.co.uk/voices/comment/immigration-crime-benefits-
everything-you-know-about-the-state-of-the-nation-is-wrong-8697574.html 37. Free Our Data Campaign. (n.d.). The Guardian. Retrieved July 20, 2013, from
http://www.freeourdata.org.uk
38. Freelon, D. (n.d.). ReCal: reliability calculation for the masses. Retrieved August 18, 2013, from http://dfreelon.org/utils/recalfront/
39. Freelon, D. G. (2010). ReCal: Intercoder Reliability Calculation as a Web Service. International Journal of Internet Science, 5(1), 20–33. Retrieved from
http://www.ijis.net/ijis5_1/ijis5_1_freelon.pdf
40. Friendly, M., & Denis, D. J. (2001). Milestones in the history of thematic cartography, statistical graphics, and data visualization. Retrieved from
http://www.datavis.ca/milestones/
41. Giardina, M., & Medina, P. (2012). Information Graphics Design Challenges and Workflow Management. In International Conference on Communication, Media, Technology and Design (pp. 246–252). Istanbul, Turkey. Retrieved from
http://www.cmdconf.net/2012/makale/46.pdf
42. Google Fusion Tables Experimental Application. (n.d.). Google Research. Retrieved June 12, 2013, from
https://support.google.com/fusiontables/?hl=en#topic=1652595
43. Google Refine. (n.d.). Google. Retrieved June 12, 2013, from
http://code.google.com/p/google-refine/
44. Graziano, A. M., & Raulin, M. L. (2012). Research Methods: A process of Inquiry (8th ed., pp. 136, 320). New Jersey: Pearson Academic Computing.
45. Grey, J., Chambers, L., & Bounegru, L. (2012). The data journalism handbook: How Journalists Can Use Data to Improve the News (p. 242). O’Reilly Media. Retrieved from http://datajournalismhandbook.org
46. Halevy, A., & McGregor, S. (2012). Data Management for Journalism. Retrieved from ftp://ftp.research.microsoft.com/pub/debull/A12sept/journal.pdf
47. Hannabuss, S. (1996). Feature article Research interviews. New Library World, 97(1129), 22–30. doi:10.1108/03074809610122881
48. Heer, J., Bostock, M., & Ogievetsky, V. (2010). A tour through the Visualization zoo. Communications of the ACM, 53(6), 59–67. doi:10.1145/1743546
49. Hohl, M. (2011). From abstract to actual: art and designer-like enquiries into data visualisation. Kybernetes, 40(7/8), 1038–1044. doi:10.1108/03684921111160278
50. Hox, J. J., & Boeije, H. R. (2005). Data collection: Primary vs. Secondary. In Encyclopedia of Social Measurement. Elsevier Inc. Retrieved from
http://igitur-archive.library.uu.nl/fss/2007-1113-200953/hox_05_data collection,primary versus secondary.pdf
51. Jacopo Ottaviani’s Blog, at “Il Fatto Quotidiano.” (n.d.). Il Fatto Quotidiano. Retrieved July 08, 2013, from http://www.ilfattoquotidiano.it/blog/jottaviani/
52. Jacopo Ottaviani’s Profile at The Guardian. (n.d.). The Guardian. Retrieved July 08, 2013, from http://www.theguardian.com/profile/jacopo-ottaviani
53. Joel, G. (2011). #ijf11: The key term in open data? It’s “re-use”, says Jonathan Gray. Journalism.co.uk. Retrieved June 10, 2013, from
http://blogs.journalism.co.uk/2011/04/18/ijf11-the-key-term-in-open-data-its-re-use-says-jonathan-gray/
54. Kramer de Oliveira Barros, R., & Araujo Bertoti, G. (2012). An Information Visualization Tool for Data Journalism. In IHC 2012 Companion Proceedings (pp.
41–42). Cuiaba, Brazil. Retrieved from http://dl.acm.org/citation.cfm?id=2400094
55. Krippendorff, K. (2004). Reliability in Content Analysis: Some Common Misconceptions and Recommendations. Human Communication Research, 30(3),
411–433. doi:10.1093/hcr/30.3.411
56. Krippendorff, K. (2003). Content Analysis: An Introduction to Its Methodology (2nd ed., pp. 18–43, 81–96). London, UK: Sage Publications Inc.
57. Kronenburg, T. (2011). Data Journalism Fuelling PSI Re-use, Topic Report No.2011/2. Retrieved from http://epsiplatform.eu/sites/default/files/Topic Report Data Journalism.pdf
58. Kvale, S. (2007). Doing Interviews (pp. x, xi, 46–47, 84–109). London, UK: Sage Publications Ltd.
59. Landman, C. (2013). Data | Visualization | Art? MastersOfMedia.hum.uva.nl. Retrieved May 10, 2013, from http://mastersofmedia.hum.uva.nl/2013/03/13/data-visualization-art/
60. Leimdorfer, A., & Thereaux, O. (2012). How open data is redefining the roles of the journalist, audience and publisher. In Using Open Data: Policy modeling, citizen empowerment, data journalism. Brussels. Retrieved from
http://www.w3.org/2012/06/pmod/pmod2012_submission_9.pdf
61. Lima, M. (2011). Visual Complexity: Mapping Patterns of Information (pp. 158–219). New York: Princeton Architectural Press.
62. Lisa Evans’ Personal Web Page. (n.d.). Retrieved July 15, 2013, from http://objectgroup.org/
63. Lisa Evans’ Profile at The Guardian. (n.d.). The Guardian. Retrieved July 15, 2013, from http://www.theguardian.com/profile/lisaevans
64. Losowsky, A., Duenes, S., Corbineau, A., Kleiner, C., Grundy, P., Schwochow, J., & Franchi, F. (2011). Visual Storytelling: Inspiring a New Visual Language (pp. 24–31). Berlin: Gestalten.
65. Lotan, G., Ananny, M., Gaffney, D., & Boyd, D. (2011). The Revolutions Were Tweeted: Information Flows During the 2011 Tunisian and Egyptian Revolutions. International Journal of Communication, 5, 1375–1405. Retrieved from
http://ijoc.org/ojs/index.php/ijoc/article/view/1246/643
66. Mahrt, M., & Scharkow, M. (2013). The Value of Big Data in Digital Media Research. Journal of Broadcasting & Electronic Media, 57(1), 20–33. doi:10.1080/08838151.2012.761700
67. Manovich, L. (2011). What is visualisation? Visual Studies, 26(1), 36–49. doi:10.1080/1472586X.2011.548488
68. ManyEyes Visualisation Experiment. (n.d.). IBM. Retrieved June 10, 2013, from
http://www-958.ibm.com/software/analytics/manyeyes/
69. Marshall, S. (2012). PPAdigital: Paul Bradshaw’s five principles of data management. Journalism.co.uk. Retrieved July 10, 2013, from
http://blogs.journalism.co.uk/2012/09/26/ppadigital-paul-bradshaws-five-principles-of-data-management/
70. Marshall, S. (2013). How big data is changing financial journalism. Journalism.co.uk. Retrieved from http://www.journalism.co.uk/news/-hhldn-how-big-data-is-changing-financial-journalism/s2/a551791/
71. Mayring, P. (2000). Qualitative Content Analysis Basic Ideas of Content Analysis. Forum Qualitative Sozialforschung / Forum: Qualitative Social Research, 1(2). Retrieved from http://www.utsc.utoronto.ca/~kmacd/IDSC10/Readings/text analysis/CA.pdf
72. McGhee, G. (Writer, Producer). (2010, September 23). "Journalism in the Age of Data" [Web Video]. Retrieved from http://t.co/7ViPzDAywj
73. McNeil, P., & Chapman, S. (2005). Research Methods (3rd ed., pp. 9–24, 59–67, 161–154). New York: Routledge.
74. Mol, L. (2011). The Potential Role for Infographics in Science Communication. Vrije Universiteit Amsterdam. Retrieved from http://www.sg.uu.nl/academie/infographics/Laura Mol Master Thesis SC Final-small.pdf
75. Murray, S. (2013). Interactive Data Visualisation for the web. (M. Blanchette, Ed.) (p. 2). O’Reilly Media, Inc.
76. Oliver, L. (2010). UK government’s open data plans will benefit local and national journalists. Journalism.co.uk. Retrieved from 12/07/2013
77. Open Refine. (n.d.). GitHub. Retrieved July 25, 2013, from http://openrefine.org
78. Ostergren, M., Hemsley, J., Belarde-Lewis, M., Walker, S., & Hall, M. G. (2011). A
vision for Information Visualization in Information Science. In iConference ’11: Proceedings of the 2011 iConference (pp. 531–537). doi:10.1145/1940761.1940834
79. Ottaviani, J. (2012). Data journalism in Italy: how did 1,000 prisoners die? The
Guardian. Retrieved July 15, 2013, from
http://www.theguardian.com/news/datablog/2012/may/23/italian-prisoners-deaths
80. Parasie, S., & Dagiral, E. (2012). Data-driven journalism and the public good: “Computer-assisted-reporters” and “programmer-journalists” in Chicago. New
Media & Society. doi:10.1177/1461444812463345
81. Paul Bradshaw’s Collection of Bad Visualisation Examples. (n.d.). Retrieved August 17, 2013, from http://pinboard.in/u:paulbradshaw/t:badvis
82. Paul Bradshaw’s Profile at The Guardian. (n.d.). The Guardian. Retrieved July 15, 2013, from http://www.theguardian.com/profile/paul-bradshaw
83. Prasad, B. D. (2008). Content Analysis: A method in Social Science Research. In Research Methods for Social Work (pp. 174–193). New Delhi: Rawat Publications. Retrieved from http://www.css.ac.in/download/deviprasad/content analysis. a method of social science research.pdf
84. Prezi Virtual Presentation Whiteboard. (n.d.). Prezi. Retrieved June 20, 2013, from http://prezi.com
85. Rogers, S. (2010). “One hell of a spreadsheet”: turning 90,000 rows of WikiLeaks data into a story. Journalism.co.uk. Retrieved July 15, 2013, from
http://www.journalism.co.uk/news-features/-039-one-hell-of-a-spreadsheet-039--turning-90-000-rows-of-wikileaks-data-into-a-story/s5/a540109/
86. Rogers, S. (2011). Data visualisation: in defence of bad graphics. The Guardian Data
Blog. Retrieved June 10, 2013, from
http://www.theguardian.com/news/datablog/2011/oct/17/data-visualisation-visualization
87. Rogers, S. (2013). Facts Are Sacred: The Power of Data (1st ed., p. 309). London, UK: Faber and Faber Limited, Guardian Books.
88. Rugg, G. (2007). Using Statistics: A Gentle Introduction (pp. 25–52). New York: Open University Press, McGraw-Hill Education.
89. Salmons, J. (2010). Online Interviews in Real Time (pp. 38–71). London, UK: Sage Publications Inc.
90. Schaap, J. (2012). Cultural Bias in Data Visualization. Masters of Media, New Media & Digital Culture M.A., University of Amsterdam. Retrieved July 10, 2013, from
http://mastersofmedia.hum.uva.nl/2012/03/28/cultural-bias-in-data-visualization/
91. ScraperWiki. (n.d.). ScraperWiki. Retrieved June 10, 2013, from
https://scraperwiki.com
92. Segel, E., & Heer, J. (2010). Narrative visualization: telling stories with data. IEEE Transactions on Visualization and Computer Graphics, 16(6), 1139–1148. doi:10.1109/TVCG.2010.179
93. Smiciklas, M. (2012). The Power of Infographics (pp. 21–34). U.S.A.: Que.
94. Smolan, R., & Erwitt, J. (2012). The Human Face of Big Data (pp. 14–15, 136–157).
Sausalito, California: Against All Odds Productions.
95. Stolte, Y. (2012). Journalism and Access to Data The Phone Hacking Scandal, WikiLeaks. Datenschutz und Datensicherheit, 5, 354–358. Retrieved from
http://link.springer.com/content/pdf/10.1007%2Fs11623-012-0134-2.pdf
96. Tableau Software. (n.d.). Tableau Software. Retrieved June 10, 2013, from http://www.tableausoftware.com
97. The New York Times: Multimedia. (n.d.). The New York Times. Retrieved August 10, 2013, from http://www.nytimes.com/pages/multimedia/
98. Thomas, R. M. (2003). Data-Collection Processes and Instruments. In Blending Qualitative & Quantitative Research Methods in Theses and Dissertations (pp. 57–75). Sage Publications, Inc. doi:10.4135/9781412983525
99. Top Ten Tools for Data Journalism. (2013). Interhacktives.com. Retrieved June 10,
2013, from http://www.interhacktives.com/2013/05/10/top-ten-tools-for-data-journalism/
100. Tufte, E. R. (2001). The Visual Display of Quantitative Information (pp. 13–77). Cheshire, Connecticut: Graphics Press LLC.
101. United States General Accounting Office. GAO. (1989). Content Analysis: A Methodology for Structuring and Analyzing Written Material (pp. 1–31). Retrieved from http://archive.gao.gov/d48t13/138426.pdf
102. Vis, F. (2012). Actor Types code frame. Retrieved from
http://researchingsocialmedia.files.wordpress.com/2012/01/actor-types-code-frame3.pdf
103. Walliman, N. S. R. (2011). Research Methods: The Basics (pp. 15–29, 63–113).
London, UK: Taylor & Francis Routledge. Retrieved from https://www.dawsonera.com/abstract/9780203836071
104. Ward, M., Grinstein, G., & Keim, D. (2010). Interactive Data Visualisation:
Foundations, Techniques, and Applications (pp. 130–148, 365–374). Natick, MA: A K Peters, Ltd.
105. Weber, W., & Rall, H. (2012). Data Visualization in Online Journalism and Its
Implications for the Production Process. In 2012 16th International Conference on Information Visualisation (pp. 349–356). IEEE. doi:10.1109/IV.2012.65
106. Wong, D. M. (2010). The Wall Street Journal: Guide to Information Graphics (p.
143). New York: W. W. Norton & Company.
107. Wordle. (n.d.). Wordle. Retrieved July 12, 2013, from http://www.wordle.net
108. Yau, N. (n.d.). Mistaken Data. Flowing Data. Retrieved July 15, 2013, from http://flowingdata.com/category/statistics/mistaken-data/
Appendices
Appendix 1: Ethical (Application, Consent Form, Approval)
The University of Sheffield. Proposal for Information School Research Ethics Review
Students Staff
This proposal submitted by: This proposal is for:
Undergraduate Specific research project
X Postgraduate (Taught) – PGT Generic research project
Postgraduate (Research) – PGR This project is funded by:
Project Title: "Infographics: Data and Information Visualization and its use in Journalism - A
Case Study on Guardian's Data Store".
Start Date: 08/07/2013
End Date: 02/09/2013
Principal Investigator (PI): (student for supervised UG/PGT/PGR research)
Charalampia Boula
Email: [email protected]
Supervisor: (if PI is a student)
Farida Vis
Email: [email protected]
Indicate if the research: (put an X in front of all that apply)
Involves adults with mental incapacity or mental illness, or those unable to make a personal decision
Involves prisoners or others in custodial care (e.g. young offenders)
Involves children or young people aged under 18 years of age
Involves highly sensitive topics such as ‘race’ or ethnicity; political opinion; religious, spiritual or other beliefs; physical or mental health conditions; sexuality; abuse (child, adult); nudity and the body; criminal activities; political asylum; conflict situations; and personal violence.
Please indicate by inserting an “X” in the left hand box that you are conversant with the University’s policy on the handling of human participants and their data. X
We confirm that we have read the current version of the University of Sheffield Ethics Policy Governing Research Involving Human Participants, Personal Data and Human Tissue, as shown on the University’s research ethics website at: www.sheffield.ac.uk/ris/other/gov-ethics/ethicspolicy
Part B. Summary of the Research
B1. Briefly summarise the project’s aims and objectives: (This must be in language comprehensible to a layperson and should take no more than one-half page. Provide enough information so that the reviewer can understand the intent of the research) Summary:
Aim:
This study primarily aims to examine the role of information and data visualisation in journalism,
based on an analysis of the biggest journalistic Infographics portfolio in the UK, The Guardian's "Data
Store".
Objectives:
To identify:
§ How data and information visualisation is used in journalism.
§ Why its use is constantly increasing.
§ The various tools used either in data analysis (and possibly formulation / editing) or in
visualisation, more specifically by The Guardian.
§ The skills and knowledge required to work on data visualisation.
§ Its importance, as perceived by the professionals.
§ Possible tendencies, norms and correlations in The Guardian's portfolio, mainly regarding subject,
visualisation type and tools.
§ Limitations, weaknesses and possible negative aspects or impact of data and information
visualisation.
B2. Methodology: Provide a broad overview of the methodology in no more than one-half page.
In-depth interviews with professionals, conducted during a visit to the offices of The Guardian and in meetings
or Skype conversations with other freelance professionals who have worked on the creation of some
of the Infographics featured on The Guardian Data Store. The interviewed professionals may be
editors, journalists, graphic designers, data analysts and/or other member(s) of the visualisation team,
who either decide on the concept and the data used or participate in the design and creation of the
visualisations. The interviews will be semi-structured because this type of interview allows flexibility
while also helping to maintain better control of the procedure. The questions will be a combination of open
and closed format, with a possible focused discussion and analysis of articles selected from the portfolio.
If more than one method, e.g., survey, interview, etc. is used, please respond to the questions in Section C for each method. That is, if you are using both a survey and interviews, duplicate the page and answer the questions for each method; you need not duplicate the information, and may simply indicate, “see previous section.”
C1. Briefly describe how each method will be applied
Method (e.g., survey, interview, observation, experiment): Interviews
Description – how will you apply the method? The interviews will take place either at the offices of The Guardian in London, or will be conducted through Skype, for the participants that live abroad, work on a freelance basis or are unavailable to meet in person.
About your Participants
C2. Who will be potential participants? Among others: Simon Rogers, James Ball, Lisa Evans, Jacopo Ottaviani, Paul Bradshaw.
C3. How will the potential participants be identified and recruited? Suggested by supervisor, contacted by email.
C4. What is the potential for physical and/or psychological harm / distress to participants? None
C5. Will informed consent be obtained from the participants?
X Yes No
If Yes, please explain how informed consent will be obtained? I will obtain hand-signed consent forms from the participants I will meet in person.
If No, please explain why you need to do this, and how the participants will be de-briefed? In case the interviews are held through Skype, I will send the consent form to the interviewees by email and I will ask them to reply to the email confirming that they have read the consent form and that by replying they agree to its terms, thereby giving their consent to participate in the interview. Alternatively, if some participants have an electronic signature, they can sign the form with that and send it back to me by email.
C6. Will financial / in kind payments (other than reasonable expenses and compensation for time) be offered to participants? (Indicate how much and on what basis this has been decided) No
About the Data
C7. What data will be collected? (Tick all that apply)
Print
Digital
Participant observation
Audio recording (of face-to-face or Skype interviews): X
Video recording (screen recording of Skype interviews if participants agree): X
Computer logs
Questionnaires/Surveys
Other: Skype chat or email with questions (in case the Skype interview fails): X
Other:
C8. What measures will be put in place to ensure confidentiality of personal data, where appropriate? Both audio and/or video files of the interviews will be stored securely and no third parties will have access to the data. All interviews will be transcribed. All interviewees will be asked if they agree for their name to be mentioned in the dissertation or if they prefer to retain anonymity and to be referred to as Interviewee 1, Interviewee 2, etc. They will also be asked at the end of the interview if they wish for something they mentioned to be omitted from the transcript.
C9. How/Where will the data be stored? The data will be stored safely in digital format on a personal computer and a personal external hard disk, and no third parties will have access to it.
C10. Will the data be stored for future re-use? If so, please explain. The data may be re-used in the future for further analysis of the subject in a possible future article publication.
About the Procedure
C11. Does your research raise any issues of personal safety for you or other researchers involved in the project (especially if taking place outside working hours or off University premises)? If so, please explain how it will be managed. The research does not raise any issues of personal safety for the researchers or me.
The University of Sheffield. Research Ethics Review Information School
Declaration
Title of Research Project: "Infographics: Data and Information Visualization and its use in Journalism - A Case Study on Guardian's Data Store".
We confirm our responsibility to deliver the research project in accordance with the University of Sheffield’s policies and procedures, which include the University’s ‘Financial Regulations’, ‘Good Research Practice Standards’ and the ‘Ethics Policy Governing Research Involving Human Participants, Personal Data and Human Tissue’ (Ethics Policy) and, where externally funded, with the terms and conditions of the research funder. In submitting this research ethics application form I am also confirming that:
§ The form is accurate to the best of our knowledge and belief.
§ The project will abide by the University’s Ethics Policy.
§ There is no potential material interest that may, or may appear to, impair the independence and objectivity of researchers conducting this project.
§ Subject to the research being approved, we undertake to adhere to the project protocol without unagreed deviation and to comply with any conditions set out in the letter from the University ethics reviewers notifying me of this.
§ We undertake to inform the ethics reviewers of significant changes to the protocol (by contacting our academic department’s Ethics Coordinator in the first instance).
§ We are aware of our responsibility to be up to date and comply with the requirements of the law and relevant guidelines relating to security and confidentiality of personal data, including the need to register when necessary with the appropriate Data Protection Officer (within the University the Data Protection Officer is based in CiCS).
§ We understand that the project, including research records and data, may be subject to inspection for audit purposes, if required in future.
§ We understand that personal data about us as researchers in this form will be held by those involved in the ethics review procedure (e.g. the Ethics Administrator and/or ethics reviewers) and that this will be managed according to Data Protection Act principles.
§ If this is an application for a ‘generic’ project all the individual projects that fit under the generic project are compatible with this application.
§ We understand that this project cannot be submitted for ethics approval in more than one department, and that if I wish to appeal against the decision made, this must be done through the original department.
Name of the Student (if applicable): Charalampia Boula
Name of Principal Investigator (or the Supervisor): Farida Vis
Date: 05/07/2013
The University of Sheffield. Information School
"Infographics: Data and Information Visualization and its use in
Journalism - A Case Study on Guardian's Data Store".
Researchers
Charalampia Boula
Supervisor: Farida Vis
Purpose of the research
This study primarily aims to examine the role of information and data visualisation in journalism, based on an analysis of the biggest journalistic Infographics portfolio in the UK, The Guardian's "Data Store". The study's main objectives, among others, are to: 1) Investigate how data and information visualisation is used in journalism and what its importance is as perceived by the professionals, 2) Examine the various tools used either in data analysis (and possibly formulation / editing) or in visualisation, and more specifically by The Guardian, 3) Identify the skills and knowledge required to work on data visualisation, 4) Discover limitations, weaknesses and possible negative aspects or impact of data and information visualisation.
Who will be participating?
Professionals who either work for The Guardian Data Store or have in the past cooperated and published with them.
What will you be asked to do?
You will be asked to participate in an interview of approximately 30 minutes, and answer open and closed format questions.
What are the potential risks of participating?
The risks of participating are the same as those experienced in everyday life.
What data will we collect?
The interviews will be audio recorded whether held face-to-face or through Skype, and might also be video recorded with screen recording software (if participants agree) where the interviews are conducted through Skype. All data (audio/video files) will be stored securely in a file on a personal computer and an external hard drive, and no third parties will have access to the data.
What will we do with the data?
The data will be mainly used for the purposes of the dissertation, but it may be re-used in the future for further analysis of the subject for a possible future article publication.
If a participant wishes for the data from their interview to be used only for the purposes of the dissertation, then it should be mentioned to the researcher and the data will be deleted after the dissertation is complete.
Will my participation be confidential?
All data (audio and/or video files) of the interviews will be stored securely and no third parties will have access to the data. All interviewees will be asked if they agree for their name to be mentioned in the dissertation or if they prefer to retain anonymity and to be referred to as Interviewee 1, Interviewee 2, etc. They will also be asked at the end of the interview if they wish for something they mentioned to be omitted from the transcript.
What will happen to the results of the research project?
The results of this study will be included in my master’s dissertation, which will be publicly available. Please contact the School in six months.
I confirm that I have read and understand the description of the research project, and that I have had an opportunity to ask questions about the project.
I understand that my participation is voluntary and that I am free to withdraw at any time without any negative consequences.
I understand that I may decline to answer any particular question or questions, or to do any of the activities. If I stop participating at any time, all of my data will be purged.
I understand that my responses will be kept strictly confidential, that my name or identity will not be linked to any research materials, and that I will not be identified or identifiable in any report or reports that result from the research.
I give permission for the research team members to have access to my anonymised responses.
I give permission for the research team to re-use my data for future research as specified above.
I agree to take part in the research project as described above.
Participant Name (Please print) Participant Signature
Researcher Name (Please print) Researcher Signature Date
Note: If you have any difficulties with, or wish to voice concern about, any aspect of your participation in this study, please contact Dr. Angela Lin, Research Ethics Coordinator, Information School, The University of Sheffield ([email protected]), or the University Registrar and Secretary.
Information School Research Ethics Panel
Letter of Approval
Date: 8th July 2013
TO: Charalampia Boula
The Information School Research Ethics Panel has examined the following application:
Title: Infographics: Data and Information Visualization and its use in Journalism - A Case Study on Guardian's Data Store.
Submitted by: Charalampia Boula
And found the proposed research involving human participants to be in accordance with the University of Sheffield’s policies and procedures, which include the University’s ‘Financial Regulations’, ‘Good Research Practice Standards’ and the ‘Ethics Policy Governing Research Involving Human Participants, Personal Data and Human Tissue’ (Ethics Policy).
This letter is the official record of ethics approval by the School, and should accompany any formal requests for evidence of research ethics approval.
Effective Date: 8th July 2013
Dr Angela Lin Research Ethics Coordinator
Appendix 2: Qualitative Research Methodology - Interviews' Questionnaire & Transcripts
2.1 Indicative Interviews' Questionnaire
Do you agree to be recorded? If you want anything to be omitted from the transcript or to remain anonymous please let me know...
START RECORDING: My name is Charalampia Boula and I am having an interview with ......................... We will start with some general questions....
1) Why are Data Driven Journalism and Data Visualisation constantly growing in importance and use, in your opinion?
2) Do you use Data Visualisation in your work and why?
3) What are the main advantages of using Data Visualisation and what are the disadvantages?
4) Do you believe that Data/Info Visualisation can be misleading? If yes, what is the best way to avoid this?
5) How do you choose what to keep and what to omit from the available data?
6) Which are the main data sources that you prefer to use?
7) Is it important that the data you use is freely available?
8) Could you give an example of when it would not be possible to give access to the data? Under what circumstances would something like that happen? (We know that The Guardian Data Store of course always publishes the data.)
9) In case the data is released by the government or another official organisation, could you say a few things about what tends to be your standard procedure? In case the data is collected by you, what tends to be your standard procedure until the final publication?
10) Big data is now a very hot topic. Could you please say what big data is for you?
11) Which tools do you prefer to use to gather/refine/organise data and which to visualise it, and why?
12) How do you decide which type of visualisation is the best for your story?
13) Do you prefer static or dynamic visualisation? Could you say something about the strengths and weaknesses of each one?
14) I am now going to give you two potential statements and I would like you to comment on them: The first one is: Do you search for available data on a specific story and try to prove/disprove a hunch or identify the basic information hidden in it? The second one: Do you examine various data sets in general to see if there is a possible underlying story? Do you prefer either of the two, and which one occurs more often?
15) Now I would like to turn to your own work on The Guardian Data Store and ask you to comment a little on it, and particularly to elaborate on the.... (procedure - if not explained, which tools were used and why, what made them investigate the topic, how they decided which variables to focus on, how much time it took and who cooperated on it...)
16) Do you have any insight/feedback on how articles with data visualisation are perceived by the readers?
§ Do you believe that the readers understand them better because they are in the form of data journalism?
§ Do you believe for example that your stories worked better because they had this visualisation?
§ Do you think that most people find these data visualisations easy to understand? Could you give an example of a data visualisation or a type of data visualisation that you believe is hard for the average reader to understand?
17) In your view, what are the skills/knowledge required for someone to work on data driven journalism and data visualisation at a professional level?
18) If you were building a data journalism team for a national newspaper, could you say something about the kinds of skills that team needs and how big the team needs to be? How many people?
19) How do you think that data journalism and data visualisation have changed journalism? How do you see data journalism and data visualisation developing in the future?
2.2 Transcript of Interview with Jacopo Ottaviani
Charalampia Boula: Hello. Jacopo Ottaviani: Hello. C.B.: Hi, how are you? Thank you for agreeing to help me with my dissertation. J.O.: You’re welcome, you’re welcome. C.B.: I had asked you on the email and you said it is okay for me to rec ord you. J.O.: Yes, yes. C.B.: I am only recording the sound. Okay? J.O.: Yeah, yeah. Sure, okay. C.B.: If after we finish you wish for me to remove something from my transcript I can do that and if you also wish to remain anonymous I can do that as well, I can refer to you with a code name or ‘interviewee number 1’, for example. Whatever you wish, if you don’t have any objections I can use your name. Just let me know. J.O.: Okay, then, that’s okay. C.B.: Okay. So, I’m having an interview with Mr Ottaviani and I will start with some general questions. And, the first question is why are data driven journalism and data visualisation constantly growing in importance and use, in your opinion. J.O.: You mean in Italy, or in the UK? C.B.: In general. J.O.: In general?
C.B.: In general, yes.
J.O.: Okay, in general I would say that it’s growing as fast as the technology. I mean, the information is growing, the quantity of the information circulated online is growing so from those data that goes around on the internet it’s possible to extract stories that can be useful or interesting for the people, for the readers. So, one of the major causes I would say is the explosion of data online. And also, another reason for this could be the growing sensibility of governments and of institutions to be more transparent. For example, probably you know the Open Data.
C.B.: Yeah, I have questions about that.
J.O.: You know that? So, first of all I would say technology, information technology that it’s getting more common and more present in people’s lives so there is a lot of data circulated for example in the social media, on Twitter, on Facebook, everywhere. And that growing sensibility about open data and transparency of the
governments and then yeah, of course, this one, the third reason I am saying now it’s connected to the first probably but as you probably know the online media are taking, are replacing slowly, slowly and fastly, depends on where, but classic traditional media like papers, newspapers or magazines, everything is going online, so... data journalism is something that uses a lot, or data journalism and media visualisation uses a lot of interactives that are... that can’t be easily put on the paper but should be online so the readers can click around, they can write and interact. So okay, I think I cannot find another reason. Errm, oh yeah, data and statistics generally are easier to fact-check, in my opinion, than people’s opinions or other sources of information so when you have a data set then you can use the scientific methods to check the quality of the data and the quality of the statistics. So, it’s more reliable somehow and the discussion on a piece of data journalism can have more methods to check the quality, that is, you can read for example the methodology behind a data set and you can discuss about the quality of it. So you know, fact checking is another key point. And, yeah I think if I have more ideas after...
C.B.: We will discuss, I have more questions so in the course and a lot will be answered to the other questions so it’s fine so far. My second question is, I’ve seen that you’ve used data visualisation and, do you use it in all your stories, in all your articles and why, or some of them, and why do you do that, why do you prefer to use it?
J.O.: Okay, well personally, errm, I usually do a lot of maps, I make maps and I really believe in this means, in this form of communication because it gives you an overview on some phenomenon and it’s very useful to have a map because it gives you an idea on how a phenomenon is distributed on spatial data, on some country, on some continent or even some city.
So, it gives you a geographic view on some phenomenon and data visualisation in general are not only maps but they can be
also new forms of communication, of visual communication, they can be charts, they can be a combination of charts, they can be also a sort of...
C.B.: Graphs and...
J.O.: Yeah, graphs, networks, and they are very useful, in my opinion, because they are a very fast, a very quick view on something that could not be read in any other way. So, for example, you have some data sets on some issue in the UK and you want to report on it, you want to explain that to the readers, and to make it easier, to make it faster, you can use design, you can use maps, you can use the other
data visualisation, and if you think it’s faster to read something from a chart, something from a, an infographic, than from the data itself or from a text, how it was 10 or 20 years ago, it was mostly text. Okay, there were already [sound break] it was not so developed like today and so now we are using more images and interactives to be faster with the readers, in my opinion it’s a matter of speed because today we have so much information and we have to read more information in less time, to make it easier to be received.
C.B.: I will jump in another question that I had about that so I will go there. Do you have any feedback on how the readers, how they perceive the articles based on data journalism and they have visualisation, for example, do you believe that they understand them better because of that? For example, if the same article was just written with words and didn’t have any visualisation do you believe that the readers understand them better with visualisation, does it depend on the type of visualisation perhaps?
J.O.: Yeah, in my opinion, it depends a lot on the quality of the data visualisation or the data journalism piece but in general I think if it’s well done, for example if the infographic is well done then the readers can really understand better a phenomenon because it uses a lot of, it uses for example our perception of space if you want to show a proportion. Probably you saw some infographics where they show how many soccer fields of forest is in one area or how many soccer fields of forests are cut down in an area. I saw something about that lately and it’s very intuitive to understand, more than text. If you say the number of square kilometres that are covered by forest in an area it’s not very easy to understand how much it is.
But if you compare it, if you transform the quantity in an image then everybody knows, then it’s easier to understand the information and with maps also the yeah, everybody knows how it is, what’s the shape of each country and from the map you can see where the problems are and you can see patterns from the map that
you wouldn’t see from a text, unless the journalist already made in a study. But it’s very interesting to see how readers can find stories by themselves from some kind of map, from some kind of visualisation. Mostly, sometimes can happen that you make a map and then the readers they take stories from it because they play with it, they click around and they notice that near their place something is going on and they go deeper on it, so it’s a win-win situation.
C.B.: Yes. Do you have an example of a data visualisation that wasn’t, that you think that it was very difficult for people to understand? Have you seen something that...
J.O.: Yes I have. If you wish I can send it to you later.6
C.B.: That would be great.
J.O.: I'll just make a note to send it to you. You want visualisation that doesn't work?
C.B.: Yes, something that you believe that it would be very difficult for an average reader to understand.
J.O.: Okay, I will look for it and I will send it to you.
C.B.: Okay, thank you. So, through the discussion you’ve answered already my next question was which are the main advantages on using data visualisation and do you believe there are any disadvantages on that?
J.O.: Disadvantages... Let me think, disadvantages. Oh yeah, in my opinion there is one risk on doing data journalism in general but also data visualisation. And the risk is that you’re reporting stories without giving the quality that you view to them, you are just reporting something quantitatively and this can result in coldness, I don’t know if I express it very well.
C.B.: Yes, I understand.
J.O.: When I say coldness I mean that you’re not really giving something that creates an emotional reaction on people. When you write a story or for example when you interview somebody then you can go deeper inside the psychology of the other and it’s very difficult to do it with data journalism because data journalism is based on numbers, on statistics and statistics aggregate a huge amount of stories so I think it’s a challenge to connect the quantity and the aggregated data to the single stories and if you can create a bridge between these two methodologies then really you can have an efficient way to communicate the stories. That’s
actually my philosophy; my philosophy is to give numbers an identity. So, for example, I made that map on deaths in prisons, I don’t know if you saw it.
C.B.: Yes I’ve seen it. I’ve seen both in the Data Store.
J.O.: Yeah the idea is that each of those markers on the map are the important stories so if you can combine the map with single stories then you can really have, you can really reach a better result in terms of communication, in terms of reaction also in the readers.
6 The provided link was: http://flowingdata.com/category/statistics/mistaken-data/
C.B.: Do you believe that it’s especially when it affects people or it has to do with stories, like the ones on prisons, the deaths when you mention the names of the younger people. Do you believe that this is more important when it has to do with people?
J.O.: Yeah, I mean that depends actually. It could be people, it could be environment but everything is connected to people. So, for example, if you map for example environmental problems or pollution over, I don’t know, over the UK or over London or over a neighbourhood and then the people discover that something is going wrong near their place then you can raise a reaction, you can create a reaction then yeah. But yeah, of course, it has to be connected to people.
C.B.: How do you choose what to keep and what to omit from the available data?
J.O.: Can you repeat please?
C.B.: How do you choose what to use and what to leave on the side from the data you have?
J.O.: Actually, I based all my problems, all my researches on my data journalism researches on research questions. So, usually I have some research questions and I want to answer to them and so, for example, in the prisons I wanted to show who died in prisons and that’s why I reported all the data that were connected to that. The questions were who died, where and how. So, for example, I didn’t write, I didn’t include some informations regarding I don’t know, the administration on the prisons. But usually I start by making questions, with making research questions that in my opinion can be relevant for the readers. Then I answered them.
C.B.: Okay, that’s clear. And which are your main data sources that you prefer to use?
J.O.: Well, usually if it’s available I use government data and if it’s open it’s better, because I don’t have to transform it into an open format so usually it’s easier to use for example the Excels that are made available by the governments. If not, I use
‘scrapers’. So I make the data sets myself and in particular I use Scraperwiki, I can send you the link of that.
C.B.: They have the link on your articles in the data store so I’ve seen it.
J.O.: Yeah yeah, it’s a UK based business tool to create scrapers so you can also visit it, it’s in Liverpool and they have some servers that help journalists or developers to scrape data. So, scraping probably you know what it is, it’s to create a dataset from a free text, from something that’s not yet a data set, you just take data from web pages and then you just transpose it to a dataset in Excel
format or CSV for example. So yeah I use government data and if I need it I just take the data from other sources using the scraping technologies.
C.B.: Do you believe that it’s important to use data that it’s freely available, is it better not just for practical reasons for you but for the reader. Do you believe that for example if they see the original governmental Excel file, for example, it is for them easier, or do they have more trust for example in what they read?
J.O.: Well yeah, usually of course, if the data comes from statistics use from the government there should be more work behind it of course but that depends on the country and it depends on the institute. But sometimes of course now if we think of the UK or if we think of America or if we think of the first world countries then probably the datasets are reliable but for example there could be governments that don’t release data or if they release it they don’t release it completely so that depends from the country but yeah if we stick with the UK and the Europe governmental data or mostly I would say for example Institutes of statistics like the Royal Institute of Statistics in the UK or every country has one of those, those are quite good usually. In Italy I use that too. And they’re moving all their work online now yeah, so...
C.B.: Do you have any example why you are writing a story but you cannot, erm, that it’s not possible to give access to the data to the reader? For example, in the Guardian there is always the data file to download for the reader. Have you seen or have you worked on an article where you couldn’t release the data where you had just to give some facts and nothing more and for some reason you couldn’t make it available to the reader?
J.O.: You mean if I found some source...
C.B.: It could be a confidential source or something that it’s not allowed; it could be for safety reasons that it shouldn’t be published.
J.O.: Uhm, well there is the most important case of Wikileaks probably you know it
and it was very controversial because when Wikileaks published all the data people said it wasn’t checked and whether it included lots of names of people who could risk their lives afterwards. That’s probably the most important case. And for example they included the names of people who worked for America or also local sources of information and they could risk their lives after that. But yeah that’s the most important example in my mind now. But of course privacy is very important and we, data journalists, we have to be careful to expose all the information that is necessary and to be careful with the personal data that we are going to publish.
C.B.: Okay, that’s good. And in the case that the data is released by the government or another official organisation, which is the standard procedure that you tend to use in order to reach to the final publication of the article?
J.O.: Well, first I choose the topic. For example, if I want to report on some environmental issue then I start looking for all the statistics published by the local institute of statistics then I look for the data released by the Ministry of the Environmental Protection and then I try to see if there are any independent observatories that release datasets, that happens usually with important issues. And that’s also very interested to see the differences between the data released by the governments and the data released by the independent institutes or by the independent associations. And then when I have identified the most relevant datasets I try to combine them if possible or to take what I need from each of them and to merge the tables if it’s needed or even to show the different scenarios that emerge from the different datasets. So it can be that one visualisation is based on one dataset and another one is based on another one. So, I try to get the most out of everything. But usually I would suggest to use more datasets coming from different institutions because this is a way to see how every institution can be biased. Or... that’s it yeah.
C.B.: Do you also do the visualisation yourself?
J.O.: Well, usually I do maps myself yes and then usually I also make some charts and I do them myself also and the scrapers yeah I do them myself and yeah, although it would be better to have a team. You have to consider that in Italy, like in all the Southern countries in Europe data journalism is not that developed actually. So I’m one of the first persons who made it in Italy and it’s still very difficult to tell the editors that it’s very important to have a data journalism unit in the newspaper. They don’t understand the real value yet of it.
And this in Italy or also in Spain and Portugal, I suppose also in Greece.
C.B.: Oh yes it’s probably the same. It’s not that popular yet or if it’s used it’s used in a more premature way.
J.O.: Yeah yeah, I agree.
C.B.: And in case that you collect the data yourself, I mean, do you work purely on raw data?
J.O.: No, no. Usually I use datasets.
C.B.: Okay, so then I’m moving onto another area that is quite a very hot topic at the moment; big data. What is big data for you? Because every person has a different perception of what big data is.
J.O.: Yeah, big data in my opinion is that data that cannot be handled and elaborated by a single computer. So, also I can tell you that, also from my computer scientist’s background, because I have bachelor in computer science. So probably my definition of big data is more scientific and it’s not the same of what is represented everywhere nowadays. In my opinion big data is that data that can be elaborated with parallel computers, with multiple calculators that can handle exabytes of data and you can imagine big data can be the DNA that comes from the human body or the huge amount of, without using the example of biology, you can think of the whole amounts of information that is going around on Twitter, that’s big data. It’s really "unhandle-able" and unreadable with simple computers, with the computers we use at home. You need special computers and special algorithms to handle it and to get meaning out of it, yeah.
C.B.: Do you believe it’s only the dimension of quantity that defines its peak or could it be duration, for example, if there is a data that goes on for a very long time that they try to collect?
J.O.: Well, actually big data itself it might be that it’s a matter of quantity yes, but elaborating more at least you can say that from big data then you can extract little datasets that come from big data originally but then they can be handled by simple computers or by simple developers but yeah you can still consider it like a result of big data, a subclass of the big data or a subset yeah.
C.B.: Okay. Which tools apart from Scraperwiki that you already mentioned, which other tools, free or paid programmes, you use to either gather, refine and organise data and to visualise?
J.O.: Okay, well there are plenty of tools but my favourite tools are Scraperwiki as
you said then Google Refine, then, let me think, I can also see here, and in my works I use Batchgeo, I don’t know if you know it, it’s to make maps.
C.B.: Yes I’ve seen that yes.
J.O.: It’s a very simple tool to make maps, it’s called Batchgeo then Google Maps is useful but nowadays I’m changing to OpenStreetMap and Mapbox and Leaflet that are more difficult to use but they give you more freedom when you want to make maps than Google Maps. Google Fusion Tables for example is very easy to use but it doesn’t give you a lot of freedom. So I prefer Mapbox and
Leaflet, that’s another one. If you want I can also send you the links of this but it’s very easy to Google them.
C.B.: I’ll Google them and in case I cannot find any one I’ll ask you on an email.
J.O.: Yeah sure. And, let me think of some other tools. Ah another good one is Datawrapper.
C.B.: Yes, I’ve heard of that as well. It’s very used in the Data Store as well, I’ve seen it.
J.O.: Yes, Datawrapper is very good and it lets you do charts very easily and interactive charts.
C.B.: And do you prefer static or dynamic visualisation? By dynamic I mean interactive.
J.O.: I prefer dynamic yeah.
C.B.: And which are the strengths and weaknesses of each one, the dynamic and the static?
J.O.: You mean the main differences between the dynamic and the static?
C.B.: Which one is better in what. Mostly which one is weaker, it’s not that good to do something.
J.O.: Yeah, well first of all it depends if you’re publishing online or on paper. If you publish on paper you have to do all static because it’s obvious, but if you publish online it’s always better if the users can interact with what they have in front of them. And so if you have a dynamic visualisation on which the readers can click and they can click with it I would say it’s suggestible because it involves the readers on a major degree. But yeah in some cases you have to be careful when making interactives because if they’re too complex then the users don’t know how to use it. So in that case probably you have to simplify it and in some cases if it’s not possible to simplify the dynamic one you can even do the static one. But it’s always a balance between complexity and efficiency and also beauty sometimes,
yeah.
C.B.: How do you decide which type of visualisation is better for your story? And does it depend on the topic or on the data you have?
J.O.: It depends on both. And actually sometimes you would like to make some stories but the data doesn’t allow you to do it. Or for example you would like to map a phenomenon but to do that you need geolocated data and that’s not always available. So it depends on what you want to represent but when you found out what you want to represent or expose in your stories you have to also find the data
that allows you to do that. So, I would say that the data influences the choice of the data visualisation, yeah.
C.B.: Now I’m going to give you 2 statements because I separated in 2 different statements rather than a very big question that would be confusing and I would like your comments on those. The first one is: ‘Do you search for available data on a specific story and try to prove or disprove a hunch you have about something, about a topic and try to see if the data proves or disproves what you had in your mind before writing the story or finding any other information about that?’
J.O.: That’s an interesting question. Sometimes we have prejudices and of course we have to deal with them. I don’t know if you read the article on ‘The Independent’7, I can send it to you in which it’s written out the perception of reality it’s really far from the reality itself. And they ask people to answer simple questions about statistics on general issues in the country. For example: ‘Is crime increasing or decreasing?’. And all the people said it’s increasing but then if you check this statement with the data you discover that it’s not. So sometimes we are sure about something but when we read the data and when we analyse the data we discover that it’s not like that. And that is one of the most important aims of data journalism; to check what is written around by journalists by opinionists, to double-check it and give a reliable opinion on it based on data, based on statistics.
So I would say that everybody has prejudices but if you want to be a good data journalist you have to be open to review all your prejudices and to disprove what you had in your mind and yeah that can be critical can be also controversial because it seems that if you want to review some issue using the data it seems that you want to carry the message that that issue it’s not so important because okay it’s decreasing for example the phenomenon of domestic violence it seems that you want to say okay we are worrying too much but no, it’s never too much. So okay, sometimes you have to tell the truth but you have also to underline that this can be still a problem
even if it’s smaller, it’s important to act on it.
7 Link to the article on The Independent: http://www.independent.co.uk/voices/comment/immigration-crime-benefits-everything-you-know-about-the-state-of-the-nation-is-wrong-8697574.html
C.B.: I understand. Do you sometimes try to see if there is a story without having anything in your mind when a dataset is available online or is released by the government?
J.O.: Yeah, sometimes it happens that I have a dataset I find it just by chance, I start reading inside it very quickly and if I see something interesting that can be
geolocated and/or can be represented with data visualisation then I keep it here in my computer and I try to make something out of it.
C.B.: But what happens more often is that you usually have the story in your mind and you try to find the available data on that?
J.O.: Uhm, yeah, more than the story I have the topic. Because the story should come afterwards when you have the data. But yeah, if I have the topic in my mind then I start looking for the datasets or the possible data sources and when I find them then I continue, I keep on going with the research and with the refinement. It’s a long process usually, it’s never very quick. Anyway I would say that data journalism is a branch of investigative journalism somehow, so it takes time, it takes long time to do it.
C.B.: It takes time. For example, the one about the Italian prisons that I read, your piece on the Data Store. How long did it take for you to reach to the final?
J.O.: Well yeah. It took a month, exactly one month and I have to say that it was the first time I was doing something like that so I have to learn a lot from the technical point of view, but for example that was my first Scraper I made with Scraperwiki and that’s something related to programmation and although I was already keen on programming and I was already familiar with programming I had to learn how to use that tool, that particular tool. And that took a long time but apart from that also managing all the issues related to data journalism anyway takes long times. You can reduce the part related to developing because the more you’re familiar with a tool the faster the experience with it but usually it takes a while.
And of course if you have a team working with you it takes shorter, if you have somebody that makes all the software development part then you can concentrate on the data investigation and you can make things parallel so you save time. But if you’re doing all by yourself like a one man band then it takes time so the prison work took one month.
C.B.: And what made you investigate that specific subject, of the many deaths in the Italian prisons?
J.O.: The importance of the issue in my opinion that it’s a big problem in Italy. The prisons are overcrowded there is no space for prisoners and a lot of them are committing suicide and it’s all kept in silence, nobody knows about them. So, I was morally involved in that I felt that okay that’s something that should be underlined, should be put in the centre of the attention, of the public opinion. So I would say
that the relevance of the topic pushed me to do more and to look for the datasets and then to visualise them with a map, to map them and to visualise them.
C.B.: Do you have time for 2 or 3 questions; my final ones? Because I don’t know if you need to leave.
J.O.: No no you can, yes. Please.
C.B.: According to you which are the required skills or knowledge for someone to work on the data journalism and data visualisation field on a professional level?
J.O.: I would say that the first thing to study is programmation. So some programming, some statistics, some design and then journalism like ethics, all what is already studied in all the journalism schools but you have to add the programming, you have to add design well okay social media, how to use social media but that’s probably the easiest of the ones I’ve mentioned. Statistics is important also, you have to study how it works; statistics.
C.B.: Specifically for programming do you have any...?
J.O.: Yeah I have some suggestions. I’d say if you don’t know anything you have to start with HTML, then CSS, then JavaScript and then could be useful to add some Python and it could be useful to study for example how to use Excel but in a deeper level so use the macros for example tables how to use Pivot tables and Google Refine also requires some programming. Oh and another cool thing would be regular expressions (?) and yeah I would start with this. Oh okay you can also study some MySQL, you know that? That’s also important how to handle databases.
C.B.: Yes okay. So my next question is if you were building data journalistic team for a national newspaper uhm which...
J.O.: Yeah okay I would like to, I would like to but in Italy... I think something is moving now but it’s still necessary sometime, it’s not yet concrete.
C.B.: So if you were the first one to build a team like that, for example if a
newspaper, a major newspaper, asked you to do that how big that team will be how many people and what types of skills would they have?
J.O.: I would say that 3 or 4 people are enough. So I would prefer a small team with highly educated highly skilled people and I would include one programmer, one designer and one or two journalists and okay if you really want my dream team I will also include a statistician. Yeah that would be very interested to have one statistician who can really give you suggestions on statistics on a technical level you can make big things with that. But yeah so it’s all about combining different
backgrounds together. And would be also interesting that every member of the team knows a bit of every skill. Of course everybody has to be specialised but the people should exchange ideas also on what it’s not their real field. It’s interesting to see how their different skills overlap.
C.B.: My final question is how do you feel that data journalism and data visualisation have changed journalism in general, and what do you see in the future?
J.O.: Journalism in general well it brings the question of statistics and numbers in the centre and for example what I was saying before, fact checking, data journalism helps to put in the centre the question of fact checking. That is, is what I’m writing right or not? It’s a matter of truth I mean because if you write your opinion without basing your opinion on numbers on statistics on something that is scientifically provable the risk is that you’re just giving an opinion, which doesn’t represent the reality. So data journalism is helping to put the truth in the centre and also it helps to, for example, push or foster the governments’ transparency but all this movement of data journalism is always asking the governments to open up their archives and their data because governments have a lot of data and from that data you can write very interesting stories but, of course, not all the governments wish to release their data because often it’s controversial and puts them in a bad position but data journalism is helping to involve governments in this process.
C.B.: I understand. And what do you see in the future for data journalism and data visualisation?
J.O.: I think it will get more popular more spread and because the online media offer a lot of opportunities and since the paper is going to disappear quite soon, not so soon, but I would say in some decades or even less what the online media offers gives data journalism a possibility to extend, to expand, to get more popular.
C.B.: It is easier to share also, isn’t it?
J.O.: Yeah, it’s also easier to share and it involves the readers on a bigger level, on higher level and people can comment single instances can interact between themselves, can also add to fact-check or can also add to build datasets actually with crowdsourcing. That’s another really interesting reality and methodology. I would mention it between the most interesting ones because it involves the readers to build a dataset and to contribute at a new story for example.
C.B.: Okay, thank you very much for your help.
J.O.: You’re welcome, I hope it was helpful.
C.B.: It was really helpful, there were some things that have never crossed my mind and now I have a mini perception.
[...A quick conversation about the Master's programme in Sheffield follows, which is not relevant to the research]
C.B.: Is it ok if I use your name?
J.O.: Yeah, I think I haven't said anything bad, right?
C.B.: No, you haven't.
J.O.: So, yeah, you can use my name.
C.B.: Thank you.
2.3 Transcript of Interview with Lisa Evans
C.B.: So, for the record, I’m here with Miss Lisa Evans and I’m having an interview for my dissertation. Miss Evans I would like to start with some general questions. Why do you believe data journalism and data visualisation are constantly growing
in importance and in use in your opinion?
L.E.: In Europe?
C.B.: Aha.
L.E.: So was the question...? Could you just?
C.B.: To repeat? Yes, I’m sorry my voice it’s a little bit...
L.E.: No, it’s not that it’s just that I think it cuts out a little bit.
C.B.: Why do you believe that data journalism and data visualisation are constantly growing in importance and in use in your opinion?
L.E.: I think it’s partly because the technology has become available more easily for people to use so like the barrier for access to making charts really quickly has lowered quite a lot in the last couple of years, in the last few years and that’s because, partly because there’s been more funding for those kind of projects and the bigger companies like Google and so forth have made visualisation projects and then it kind of makes sense doesn’t it? With news being online rather than just the paper through the door there is two things coinciding quite closely cause we previously had charts that took a long time, well not a long time but a lot of skills to make and were really star lighted for the newspaper and now we put them online and everything is a kind of quicker so I think improved technology and a move online for all kinds of communication has brought about more data journalism and also obviously more governments are releasing data, more, this so-called transparency
movement has taken place where there’s some accountability, you gain accountability through being open about how you conduct your business. So yeah 3 things come together with data journalism. C.B.: Okay. I’ve seen that you use data visualisation in your work, why do you use it which are the benefits from this? L.E.: You can communicate very quickly and kind of intuitively some fairly difficult ideas, difficult to explain ideas otherwise in words, so yeah it’s kind of a really powerful medium to be able to visualise things, not that I do it brilliantly well, just there’s much better people but with being at the team of the Guardian there’s great skills to work together and make visualisations that really are effective. C.B.: Which do you believe, if there are any disadvantages on visualisation? L.E.: Uhm, yeah I think, oh I should add to the previous question you’ve just asked. I think people, we notice, well I notice, that there are 2 sorts of visualisation, ones that are just like one quick instant message and ones where people can take time and explore them like the really detailed maps that we make. So, with the last, they really explored, the things that people take time and explore that’s quite a good way to engage with your audience because by spending more time, by thinking more they’re usually adding more comments or going and looking at other parts of the websites then you’re able to establish more of a relationship with them, with certain types of infographic. And then the disadvantages of using an infographic, well it’s very easy to get it wrong and there’s so much involved in it as well, I mean like just a simple bar-chart there are so many decisions that you have to make when you’re creating it, if you’re doing it all from scratch like the graphics team do it at the Guardian. 
And so they have their own conventions such as do you put the number on the scale on the line, below the line, above the line; do you put the last, like, when do you cut off the axes, what level do you have the data going above, if you use the greatest value you’ve got on it and so forth and yeah so there’s a lot of decisions to be made and obviously you can do that beforehand, you can have that as a convention that you just work to, but other people come to it with different eyes and then yeah sometimes you worry that you’re giving the wrong impression cause you do always have to make decisions about what you want to emphasise on the graphic even from a huge dataset that’s got lots of different stories in it. Sometimes you might be emphasising something and you’re missing the bit, the best point or something like that so yeah they’re more complicated to deal with.
C.B.: Do you believe that data visualisation can be misleading and if you believe that this could happens which is the best way to avoid it? L.E.: Yeah they can definitely be misleading and the best way to avoid it is publish all the data so that your audience can go away and look through and write in the comments ‘Hey you’ve missed this point that’s much more important than the ones you’ve emphasised’ or whatever. C.B.: Yes, I understand. How do you choose what to keep and what to omit from the available data? L.E.: It’s, all comes down to judgement. After a while it becomes kind of more instinctive like when you’ve engaged with, when you’ve read all the comments and things that people respond to and also there’s applying test and things like what’s the biggest, the classic thing is what’s the biggest percentage change or what’s the biggest change in this data and that might often be the emphasis than you want to give them but other than, but then there’s like the, about the meaning of that data so there might be a small change or something that really matters a lot to people so yeah all comes down to judgement and experience, and having a good team around you who’s been experienced in writing for the newspaper and working for the Guardian for a long time, was really empowering (?). C.B.: Which are the main data sources that you prefer to use? L.E.: Usually we go for official types of sources so things from agencies that, and organisations, that have got a good methodology and they’re well respected so for example with the situation in Syria we, to start with because there weren’t official figures, we just gathered data from other newspapers and our own newspaper to count a number of people who lost their lives since the uprising and then the UN published their own figures so you automatically go to the more official source and use that, but we can still use the references to the newspaper articles that quoted deaths and things.
C.B.: Is it important that the data you use is freely available? L.E.: Yeah, yeah we always have to share it pretty much so that’s kind of the way that the Data Blog is set up instead of having this very fixed format where you’ve got headline, brief summary of the article, then a description of data and maybe some infographics in that and then always at the bottom the data source so yeah the data has to be freely available otherwise we break that format. C.B.: Have you ever been in a situation where you couldn’t give access to the data to the people? You couldn’t provide the data online?
L.E.: Yeah, sometimes, we have to learn to be really really plain with people when they share their data that we would want to republish it and that sometimes cause trouble so I think especially with private companies cause they’re data soft and commercially sensitive and if one person in their team gets afraid that they’re giving away something that might be combined with some other data somewhere and they will reveal something that’s usual (?) to their business they might call out say that’s, so you just have to be very clear with people at the start like we are the data blog and we publish data that’s kind of what we, that’s the format that we’ve got so and then that kind of avoids that situation. C.B.: In case that data is released by the government or another official organisation could you say a few things about what tends to be your standard procedure? L.E.: So, when the data is released, like the workflow, kind of? C.B.: Yes, exactly. L.E.: So, normally we look through the data and then and read kind of some summary of what data means and then I would call someone if that doesn’t make sense call the press officer or some contact that’s been given and then make a decision about whether it’s worth running as a story that day and then we’ll start either cleaning up the data which means that we put it usually in a Google doc and get rid of all no-cells (?) 
maybe think about what’s really essential in this data set and then if there’s something that doesn’t make any, doesn’t add any information then we clean that up and we’ll always link to the official source anyway so people can get their full dataset and then but often it’s just like a few columns in which case then we’ll just keep all of that but sometimes it’s sheets and sheets of someone’s spreadsheet or like we need to tidy it up cause we don’t want it to like a complete mess when readers look at it and then we think about what can we do visually with it, sometimes there isn’t anything that it’s obvious to do, the data
stands by itself and it’s interesting enough then we think oh well what can we do, what percentages’ changes make any difference to this think about what we can do just in terms of making it more meaningful to the readers like, if it’s say it’s something like the number of and we always try an bring things down to the personal level if we can so like how it affects individual so if we’ve got like a total amount of spending for a whole country we might think well maybe we can put this in terms of amount per person so do a calculation like that and then write up what we’ve done and what we think of the data and get any quotes that are relevant and
then start building the Blog post around that and the we just pass it onto the editor and give it a good look over it see if we’ve made any obvious mistakes with other people in the team and then publish it but it’s all quite a quick process it usually happens within a few hours for some of the posts other ???? of the press will take a lot more work but if you’re gonna keep relevant and up to date with the news then we really need to be able to publish things every few hours. C.B.: In case data is collected by you, it’s not released by an organisation or a government is the procedure different? L.E.: Yes, that would take a lot longer or at least, well for example when it was the London riots a few years ago then we were gathering reports of rioting and mapping them as they happened so that was like we had the map live really quickly and a few reports mapped on it and then we just kept adding to that throughout the day so that was like a continuous process that we made, that story continuous and there’s a few cases of that and then a similar thing was with Syria when we looked at reports of shootings and things but that wasn’t released, that took a while and it we didn’t release until we’d got enough data so yeah there’s different types of stories. C.B.: Big data is now a very hot topic; could you say what big data is for you? L.E.: Uhm yeah I think big data is really it’s kind of a different skill-set really to what we did at the data blog, it’s a lot more statistic heavy and a lot more requires a lot more a different set of skills for managing that data so you need to be able to use Unix and so forth as far as I’m aware there aren’t any tools that are equivalent of Google Charts and Google Spreadsheets for big data so you really got to be a programmer or at least now a command line tools that we just feed through lots and lots of data and pick out some things but yeah I think it’s gonna be interesting what happens with big data. 
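[Editorial illustration, not part of the interview.] The two calculations L.E. mentions in her workflow answer, looking for the biggest percentage change and converting a national total into an amount per person, can be sketched in a few lines of Python. This is only a minimal sketch: the function names, categories and figures below are hypothetical and are not taken from any Data Blog article.

```python
def percent_change(old, new):
    """Percentage change from old to new; None when old is zero."""
    if old == 0:
        return None
    return (new - old) * 100.0 / old


def per_person(total, population):
    """Express a total as an amount per head of population."""
    return total / population


def biggest_change(rows):
    """Return (label, change) with the largest absolute percentage change.

    rows is a list of (label, old_value, new_value) tuples; rows whose
    old value is zero are skipped because the change is undefined.
    """
    changes = [(label, percent_change(old, new)) for label, old, new in rows]
    changes = [(label, c) for label, c in changes if c is not None]
    return max(changes, key=lambda item: abs(item[1]))


# Hypothetical spending figures (in millions) for two consecutive years.
rows = [
    ("Housing", 200.0, 260.0),
    ("Transport", 400.0, 380.0),
    ("Health", 1000.0, 1100.0),
]
top_label, top_change = biggest_change(rows)

# Hypothetical national total and population, turned into a per-person amount.
total_spending = 1_500_000_000.0
population = 60_000_000
amount_each = per_person(total_spending, population)

print(top_label, top_change, amount_each)
```

As L.E. notes, the largest percentage change is only a starting point: a small change can still matter more to readers, so the ranking supports editorial judgement rather than replacing it.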
C.B.: Which are the tools that you prefer to use to gather, refine and organise data
and which to visualise and why do prefer those? L.E.: So things that we use most frequently in that job were Google Docs and Google Maps and then for bigger datasets we used Fusion Tables, Google Fusion Tables and then it kind of with the things that we all there are like high core things that everyone used and then other people develop skills like John got really good at Tableau and yeah part of that job was working with the graphics team a lot so we were really hired by the graphics team to be researchers for them and then data blog was kind of an outcome of the research that we did for them that we could put
online in lots of ways so it was always kind of like half doing data blog stuff and half doing graphics research and working with them really closely which was great because and they use all the Adobe chores like Photoshop things and Illustrator and things so yeah we had a little bit of use of those things too, we had access to Photoshop and illustrator and occasionally use this but yeah and then we did play with some things and then found that they weren’t that useful for our particular job but were incredibly useful towards in general so we used Refine a little bit but mostly we found that with the data size that we had we could mostly do it by hand anyway like we weren’t often dealing with huge datasets that needed Refining anyway yeah so the Google suite of tools were our main ones and then Excel was useful, all the cleaning up happened on Excel and then we just imported it yeah. C.B.: How do you decide which type of visualisation is the best for your story? L.E.: Working with the graphics team really and then if we didn’t work with them what’s available already we’re fairly limited on the Google charts so yeah uhm it wasn’t that tricky but yeah I guess you apply the principles like if you’ve got a complete dataset and then you want to set, and then you’ve got it divided into different pieces then like if you’ve got total spending and then you break it up into pieces then that’s something you could use in a pie chart or but otherwise you wouldn’t use a pie chart just kind of applying those basics, charting principles and yeah. C.B.: Do you prefer static or dynamic visualisation? 
L.E.: It depends sometimes static visualisation is a complete picture but sometimes dynamic is usually more difficult to do but it often gives you that kind of relationship with the readers where they can properly explore for themselves and find things with in comments and stick around for a bit longer so yeah finding a really good topic and making an interactive for it it’s really kind of the best thing for the relationship with the readers but then like forcing something into an infographic,
into an interactive it’s not gonna work either so it really depends on the topic. C.B.: About the static visualisation which do you believe are its advantages? L.E.: I think just in terms of like initial visual impact it’s the best and then yeah that’s I think that’s its main advantage it’s just convenient, very focused on what it’s doing C.B.: I am now going to give you two potential statements and I would like you to comment on them. The first one is ‘Do you search for available data on a specific
story and then try to prove or disprove a hunch or identify basic information hidden in it?’ L.E.: Uhm so which one...? C.B.: This one is the first one. The second one is if you examine various datasets in general to see if there’s a possible underlying story. L.E.: It’s usually the first that we sometimes there’s like kind of investigation but often with this it’s just obvious that something is really viable to the news that are giving vine (?) so like if housing is in the news then finding a relevant dataset that will explain that story more deeply, it’s usually driven by what’s at the news at the moment, what’s on people’s minds, what issue needs explaining more, and then we find a dataset and then we look into it maybe that there’s something that comes up from that but more often is just a deeper understanding of a story that’s already running at the time so that applies with the riots that for example like there’s no other stories to do and there are riots the next (?) and then a couple of summers ago so you just you look for anything you can on that issue to make it clearer to so in a sense you’re digging more into a story uhm that’s already present but you’re not like taking that data and then that might happen at a latest date which it did with the riots’ stuff where we took the locations of people who convicted of rioting and looked at where they lived and then layered a deprivation index on top of that so you could see that relatively clearly that the people who were involved in the riots were more likely to be from a poorer background so yeah often it’s driven by the news and then sometimes you go deeper into investigation. 
C.B.: I would like now to carry to your own work on the Data Store and comment a little bit on a specific story a specific article story line that perhaps has a great significance or has an interesting story behind a if you could say a little bit of how, which story it is because you have published a lot in The Guardian Data Store, and what made you investigate the topic, how you decided which variables to focus on.
I will let you select which one because you have published a lot and I would like to know which one for you has the most interesting story behind. L.E.: Yeah so we had this guy from a guy who’s been investigating a story which was all about people who claimed disability who claimed job seekers allowance, have their benefits cut so there was uhm he’d found someone who had worked at a job centre and they were being biased to reduce people’s job seekers allowance claim and put them onto something called put them like put their claim on hold while they were being investigated so they were "sanctioned" that’s what they’re
called and so he’d kind of got a story but he’d want it to back it up with some figures and so we went to, we used a database that’s on the Department for Work and Pensions which has the numbers of people who’ve had their benefits cut and been sanctioned, it’s like a temporary cut in their benefits and yeah so it wasn’t completely obvious the figures were really the naming of the figures was all a bit kind of cryptic so we had to do a lot of phoning with the Press Officer who obviously didn’t want us to run the story and so it was kind of finding all kinds of confusing things to distract us so but we did eventually get the data that we needed that we could feel confident with and then we did some analysis to look at how that the number of sanctions had changed overtime and there was this big search also how the sanctions had changed over the country overtime and we put it all together in a map and it really reinforced that story and gave it some real weight and then they were able to run it as a front page story and so yeah I think that was quite a good example of using data to back up something that we knew was an issue. C.B.: How much time did it take to produce the final publication, till the final publication? L.E.: It took about a week and they wanted to run it at the end of the week anyway cause it was one of the longer running stories, like it was a longer time frame to prepare for it because it was always gonna be true (?) Yeah. C.B.: Approximately how many people did work in this story? L.E.: So there was the guy who got the whistle blow and made the video out of it and then there was me and Simon helped out at the end with the calculations when I was thinking ‘Oh Gosh we haven’t got a story!’ and he was like ‘No no I think we really have!’. So three of us but, yeah... C.B.: Okay do you have any insight or feedback on how articles with data visualisation are perceived by the readers?
L.E.: Well there’s all the comments at the bottom which you really really, we always, I always read them anyway cause you’ve got to kind of take it with a balanced view cause sometimes people just hate pie charts and that’s their thing and they and they’re not kind of justified in complaining about use of pie charts cause they are perfectly reasonable thing to use on a particular dataset and then other times they have got a real point about certain things and so in general people like it I mean the popularity of the Data Blog and the time the three years (?) that I was there it just grew like to this kind of huge extent where people were actually, it became the
place where people sent their visualisations. And yeah I think people enjoy cause it’s just nice and it seems that it’s less time consuming in lots of ways and you also get the chance to explore it for yourself and it’s just a moth a kind of moth, light, fun way of learning about what’s happening in the news but in more detail and just this thing is happening and also like to get into this huge datasets that the government was releasing like what’s the bigger picture of these huge databases that they’re releasing. So yeah we got I think that we are going to find without a doubt that there’s a huge interest in visualisation, people get hugely enthusiastic about it and really passionate about it. C.B.: So you believe that the stories, your stories were better because they had visualisation? L.E.: Yeah I think people enjoy a story with something that they can look at as well quite often it’s when it’s really nicely put together and a lot of thought has gone into it and if there’s add to the story sometimes we did stories where the graphic was the thing that the article talked around so when we did for example the superpowers of China and the US and that shows lots of different factors the infographic was the huge, the biggest part of the page and then the article talked around the, it was built?? around the infographic. C.B.: Do you believe that readers understand visualisation? L.E.: I think generally they do yeah, I think if they don't like any feature, in the comments you'll be understanding, I mean that you’ll definitely be told if it doesn’t make sense. C.B.: And do you have an example of bad visualisation? L.E.: I noticed the other day they put up the ten worst visualisations. But yeah I don’t really I can’t really think of anything. I think the thing that bothers me the most in terms of visualisation it’s not the visualisation itself is the approach to the data analysis that’s wrong and then the visualisation follows on from that so, for
example, if you’re looking at mortality rates in hospitals and some hospitals have one or two patients with that particular like mortality rates from heart disease in different hospitals and some hospitals only have two or three patients who have heart disease but yeah and other hospitals have thousands of patients. Then to do a percentage check, a percentage of mortality rates on those hospitals and then put them side by side and basically turn that into a league table is really bad analysis and that can be much better to do statistical analysis on that where you look at what you would expect from a random uhm like if you ask the question that
was more like are these results more than you’d expect if there was a random number of deaths in that area so is it more than something that would just happen by chance? Is a much better question to ask in that dataset than which has got the worst percentage mortality rate because you’re not really comparing like to like (?) and then so you would need a different visualisation than say to visualise that question about chance are the death rates more than, greater than you would expect by chance? Then you have to visualise that differently to if you’re just asking about mortality rates which you might just got as a bar chart or a league table or something and then that’s really misleading it can panic people who live in an area where they’ve got the highest percentage mortality rate even though they’ve only got a couple of patients. So yeah there’s a kind of issues and I think they’re the worst mistakes that people can make just not applying the right analysis. C.B.: Is there a specific type of visualisation that is hard for people to understand? L.E.: Yeah there’s, I think network type things are really hard and people in the comments often say that doesn’t mean anything so yeah network diagrams seem to not work so well and I think that’s also because of the analysis too like you need to really apply some kind of network analysis to the data and then it might be that that’s a visualisation and need it if you come up with some pretty solid analysis on it. So yeah I think that’s... C.B.: The network, I’ve been told by another interviewee about this specific type as well. Do you believe that in general the readers understand the articles better because they are visualised? L.E.: I think so yeah, I think it really, really helps to give people a way into big datasets or to a big issue. C.B.: According to you which are the acquired skills or knowledge for someone to work on data driven journalism and data visualisation in a professional level? 
L.E.: I think it’s actually not as high as I would have thought before I entered it
because a lot of it you don’t need a deep statistical background which I thought you’d probably would, actually it’s quite good to, I mean it would help enormously but it’s actually okay to be really sensitive and ask experts in that area so for the Olympics last year a couple of statisticians I think three statisticians were hired to look at the data and they did a really nice job and so I think it’s okay to work with statisticians and not be a statistician and that doesn’t devalue what you’re doing because if you’re very sensitive to both on the one hand like the statistics but you’re not completely ‘I’ve spent years studying’ and then on the other hand to
what your audience would be interested in and what would benefit people in general to know or what they’d respond to, the way to put things. Then I think it’s much more of a bridging gap, bridging that gap than there’s being an expert at statistics AND being an expert at writing and being engaging to people and stuff. So yeah I think being very sensitive to most of those things. C.B.: I have a couple of more questions and then I will be fine. If you were building a data journalistic team for a national newspaper how many people the ideal would include and of which speciality? L.E.: Yeah so l’d definitely choose someone who’s a really good graphic designer cause the graphics team at the Guardian and the interactives team they were just really, really excellent and experienced and they were a lot more, they weren’t just designers they were data analysts too in lots of ways so they wouldn’t just draw a picture of anything unless they understood it and they were like a check on that so I’d pick someone just like them at least one or two people like them and then the interactives’ team would just the same so maybe a couple, okay one person who’s an interactive designer who’s as good as the people that were working there at the time and then I think someone who’s really experienced in these, like Simon, and then someone who’s fairly kind of young but maybe got some kind of Maths background yeah cause I think you do need someone who’s gonna question that but not to the extent that they won’t run any stories like they have to work really well with the person who’s, cause the person who’s, who knows the news really well and knows how to, the kind of level that you need to communicate at. C.B.: Great. And how do you think that data journalism and data visualisation have changed journalism in general? 
L.E.: Yeah I think they’ve, well at The Guardian the data journalism team was really well respected and part of the whole, people from all different specialities and news would come and talk with us and kind of, and work with us rather than treat as like
a service, do you see what I mean? So they would treat us, not like we were fact finders for them but more like we were some, one team (?) that could work together, we could do a data spin on a data heavy story on their topic that they’d been investigating that would add topic to their piece if it had facts behind it and so I think that wasn’t really initially always the case when I first started we were kind of part of the graphics team and over the two years that I was there we became much more kind of respected members of the journalism team as well as the graphics’ team.
C.B.: What do you see in the future for data journalism and data visualisation? L.E.: That’s a good question. I think we definitely see better tools, I saw the other day a really nice CartoDB, It’s brought out all these lovely pictures that would have taken lots of time to make by hand or would have needed an expert and now you can connect up and layer two datasets on a map and that’s really good. And then so the tools are definitely getting better, they’re just gonna improve and yeah I hope more people will come onto it who’ve got both really good stats skills and also really understand what’s useful to people. C.B.: Okay, thank you very much. I would like to ask you only if you agree for me to use your name on the dissertation. L.E.: Yeah sure. C.B.: Is there something that you’d like me to omit or to delete or not to include, something you’ve said? L.E.: It's ok. C.B.: Thank you very much, you’ve been really helpful and I really appreciate your help, thank you! L.E.: No worries. Good luck with it. C.B.: Have a nice day and a nice weekend! L.E.: You too! C.B.: Goodbye! L.E.: Bye bye!
2.4 Transcript of Interview with Paul Bradshaw
C.B.: I’m here with Mr Bradshaw, my name is Charalampia Boula and I’m having an interview. I would like to start with some general questions. Why are data driven journalism and data visualisation constantly growing in importance and in use in your opinion? P.B.: Well that seems that do you think that? Well I think data, journalism around data is more important partly because data is becoming more important and journalism’s role is partly to hold power to an account and data is a form of power at the moment information is power and data is used to make a number of financial and political decisions. So it’s very important from that point of view journalism is also about communication and translation and with large amounts of data visualisation for example is a way of communicating clearly what might actually be
loads and loads of numbers and would have otherwise be less interesting and fewer people would find that data. C.B.: I’ve seen that you’ve used data visualisation on your work so which do you believe are the main advantages of using visualisation and if there are any disadvantages, which could they be? P.B.: I think visualisation is very good for grabbing someone’s attention it can almost be used like a headline or like a quote would be traditionally used in text-only journalism, it can be a good way of demonstrating a complex concept or complex story more simply. And also it’s good for people who are not textual people, a lot of people are very visual they search visually they communicate visually people so it broadens the range of people that a story might have an impact with. It does have drawbacks like anything, I think it can oversimplify, it can, you can lose the subtleties and complexities of a story and so I think it’s important often to use visualisation in partnership with other information attached with video or whatever. C.B.: I’ve read in one of your articles that you said that sometimes the infographics travel on their own... P.B.: Yeah yeah. C.B.: And it could be a disadvantage because people... P.B.: Well, I think it’s important when in an infographic to include a link in or some sort of URL that people can follow to other context, yes. C.B.: Do you believe that data and information visualisation can be misleading? And if yes, which is the best way to avoid that? P.B.: I think any form of communication can be misleading and the way to avoid that would be with the usual, I guess ethical considerations that accompany any up top journalism, which is that you strive to be accurate, you strive to put into context and not misrepresent so yes exactly the same processes.
C.B.: How do you choose what to keep and what to omit from the available data? P.B.: Again I think it’s the same as most journalism processes; you will take information out of a story if it is not pertinent, relevant... You know you want to strip back a story to the core details, the core facts, the key courts (?), the relevant background and so you reply the same rules to a visualisation, are we telling a story that all 400 you know law authorities or politicians are or are we just telling a story about five? Can we tell it more clearly by focusing on what our story is about and quite often we did, there are a lot of stories to tell so it’s quite, you need to be
more disciplined than text journalism? Sometimes being ruthless into what you take out, doing ??? the way this represents. C.B.: Which are your main data sources and what do you prefer to use? P.B.: It depends on the story. I mean on a regular basis I get updates from data.co.uk, the Office of National Statistics, F.O.I. requests on WhatDoTheyKnow, how the particular key word in them that they might be interested in, so the particular sources like that but a lot of the time it will be something, for example, I’m working on some, on housing infographic at the moment so that’s a case of seeking guides’ data on housing not in fact is on data.co.uk, it’s on the gov.uk?? website it’s the Guardian, it’s non-profit organisations. There’s no scraping but something else I’m doing for something else involves scraping information from a series of job websites so it really depends, I do like scraping because it’s a way of getting data that no one else has, so I get back something that I personally prefer but I get data from all sorts of sources. C.B.: Is it important that the data you use is freely available? P.B.: Yes, is the short answer. I think there’s a distinction to be made in terms of, you could possibly say that as a journalist I’m more likely to do something with data if I’m the only person who’s managed to gather it. As a citizen, I prefer, you know on principle, and as a journalist on principle, generally I think data should be more freely available and particularly available in format to make it easier to combine, so data should be clean as well as freely available yes. C.B.: We know that the Guardian data always publishes data but could you give an example that when it was not possible for you to give access to the data readers and under what circumstances something like that could happen. P.B.: The obvious reason would be copyright or database rights. 
So if I have scraped data of, for example, a jobs website copy, if I was to publish all that data I’d be breaking the law and so that’s something I’m not going to do unless I feel that
the public interest argument is so strong that it would be unlikely that I would be sued, or if I was sued that an organisation would back me in defending that. So there are legal reasons; there may be cases where data is personal and reveals people’s identities, or I have reason to believe that the data is chronically inaccurate or that there are inaccuracies such that it would not be ethical to publish. Those would be the main reasons why I would not publish data in front (?).
C.B.: In case the data is released by the government or another official organisation could you say a few things about what tends to be the standard procedure from the collection to the publication, the final publication?
P.B.: Do you mean the collection by the agency or by me?
C.B.: By you.
P.B.: So if an organisation publishes data what do I do. I mean I would, first of all I would try and identify what it is in the data that I’m interested in because quite often there would be a lot of possible avenues and you could waste a lot of time cleaning up data or mixing data together that you don’t need to clean or mix up. So I’d identify the particular aspect, for example I’ve been working on data this week that’s around housing costs and the stuff about temporary accommodation, about private and it’s about bed and breakfast, so if I say I’m interested in bed and breakfast I would take that particular piece of data and clean that up so that it’s in a format I could visualise or combine with other data or sort or aggregate or do something with. So quite often there would be empty rows in the data and I would take the data into Google Refine and strip out the empty rows; the headings might be in multiple rows and again Google Refine would allow me to combine those into a single header. So once I’ve cleaned it up, I might want to de-duplicate entries as well, things like that, then I would try to analyse that somewhere, so if there are multiple rows for different items of spending for example I might use a pivot table to add them all together, I might combine different years to give a view over time, I might add extra context or I might need the populations for each region or things like that over years is a good example of that, and then I would probably pick up the phone to ask any questions about it.
So, for example, in the bed and breakfast example some local authorities are actually making money, they’ve got far more income than expenditure, which seems odd, so pick up the phone to some of those authorities and say how does that work and try to understand what the calculations
are and what the money is that’s coming in and going out, and then I guess at the end of all that you might do some sort of chart and strip out details in the chart or you might write a text with plot (?) and quotes get at this studies you might combine it to it.
C.B.: In case you collect the data from scratch yourself which is the equivalent procedure, I mean, where do they differentiate from the…?
P.B.: There’s probably no difference apart from the fact that if I’m collecting it myself the cleaning, I will have prevented some of the cleaning problems, there
wouldn’t be empty rows, there wouldn’t be multi-row headings, the, yeah, I’d have more of an understanding, I might be able to strip out duplicate entries at the point of collection so some of the cleaning might be done as part of the gathering, but I guess in that sense you’re still pass the process you’re still part of the gathering.
C.B.: Big data is a very hot topic now, could you please say what is big data for you because there are different opinions?
P.B.: I think like a lot of neologisms, like data journalism, like citizen journalism, like visualisation they… there’s no, different people understand different things by it. It’s a rebranding of something old but there’s a reason for it to exist, which is that something has changed. My understanding of big data is that it’s a way of signalling that the quantities of data being gathered, or that we can work with, have changed in a way that affects what we do qualitatively. So I don’t think you can say above a certain amount of data; I think it’s more of a cultural court (?) to say, to signal that something is unusually large in the context of someone’s experience, but I don’t think there’s a hard and fast definition of what big data is. Does that make sense?
C.B.: Yes, yes I understand, I understand yes. It’s very difficult to define and some people only focus on the quantity or others combine it with the qualitative aspect.
P.B.: I don’t think it’s a practically useful term, I don’t think it relates to anything concrete, I think it’s a socially useful term, culturally useful term in talking very generally.
C.B.: When you said that it’s a rebranding of something old it obviously existed before but people perhaps hadn’t realised that it could stand as something on its own as a science of its own, no science exactly…
P.B.: I think first of all there was more of it; there have been very big data sets in the past but we have needed enormous computers, the same process imply (?)
but much bigger computers, much fewer computers to do things with it. So that, you know,
data journalism existed in the form of computer-assisted reporting, but has broadened to take in some of the other things; we need a new term to essentially recognise that and talk about computer-assisted reporting in a new way. So I think it’s the same things, we’re still talking about data but we’re signalling some way that we’re talking about data in a qualitatively different way, not specifically but just generally this is qualitatively different.
C.B.: Which are the tools that you prefer to use to cover, refine, organise data and then to visualise and why?
P.B.: I use ??? to scrape it, to scrape relatively easy to datasets. I use ScraperWiki and Python to scrape basic more complex data, I use Excel to analyse and sometimes Google Drive, I use Google Refine to clean it up and sometimes Excel, and sometimes Google Docs, and sometimes Python even and sometimes command line, commands, terminal commands like to combine spreadsheets, and I use Fusion Tables to combine datasets, I might use Excel to combine it, I might use Python again. Visualisation, I tend to go for Datawrapper at first because it’s nice and quick and let lumited (?), I use BatchGeo, I’ll use Fusion Tables again, I’ll use Tableau sometimes, but because I don’t have a PC at home and it doesn’t work on Mac I don’t use it that often. And I don’t know, a bit of JavaScript libraries that I’d like to do more with but I don’t that much, and I’ve probably forgotten stuff but yeah, it depends on the particular problem, you know I’ll turn to Google, if I’ve got a problem I will turn to search for solutions to that problem.
C.B.: How do you decide which type of visualisation is the best for your story?
P.B.: I do have quite often use, I used to use a chart chooser by someone called A. Abela which kind of...
C.B.: Is it the one I found a little one of the…?
P.B.: It’s probably, I’ve probably mentioned it yeah. But I’ve kind of absorbed that in mind and I will do it in my head, so I’ll decide is it a story about comparison or is it a story about the composition of something, is it about distribution, is it about relationships, and then you know if it’s about the composition of something as a snapshot I’ll use a pie chart or a tree map, if it’s composition over time then bar chart, if it’s… so I’ll kind of decide on what the story is and then pick the chart that tells that story.
C.B.: Do you prefer static or dynamic visualisation? By dynamic I mean interactive, can you say something?
P.B.: I tend to use static ones, relatively static ones because of speed and
because I tend to use visualisation to just tell, to give an overview of something accompanying the text story. I know Caroline Beavon will use more interactivity with Tableau for example, to give different views on the same data, but it’s just the nature of the stuff that I work with; I’m more text based, Caroline is more visually based in her work so we will… so we will probably work differently in that sense. It depends on the tool; I mean Datawrapper has interactivity in terms of that you can select from drop-down menus and stuff like that, and Fusion Tables has interactivity, you can click on different locations and get information and… so I’ll
tend to use those of the tools that I use most and I’ll use the interactivity that comes with those, but it’s really about telling a story simpler. At some point I did use Leaflet, which is a JavaScript library, to do a map because it allowed me to do more interactivity and it can be used on a mobile phone to centralise the user’s location, and I’d like to use this functionality more but it depends on having a story that requires that functionality.
C.B.: I have two statements that I will say to you so I would like your comment on each one of these; the first is ‘Do you search for data on a specific story and then try to prove or disprove a hunch or identify basic information that could derive from that data?’ And the second one is ‘Do you examine various datasets in general to see if there’s a possible underlying story?’
P.B.: Both.
C.B.: Both. Is there something that occurs more often? One of those two that…?
P.B.: Because I’m not a, because I’m not a kind of an employed journalist as such I tend to do more with a hypothesis and look for the data that surrounds that… but it depends really. If I was, you know, if I needed to fill space, I needed to get content out regularly, I’d be doing more starting off the data, looking for the stories in that, and that’s the stuff that I’m trying to do with my students on the Birmingham Data Blog because it’s simpler to do, but I’m more interested in finding data based on an idea.
C.B.: I would like to, if you would like to choose one of the articles you’ve posted in the data store and tell me a little bit about it, what made you choose…?
P.B.: Okay, let’s go with that then… ‘So the Olympic torch relay places have been allocated’. There’s… I mean I guess there’s a bit of history to this and the others arrived a bit (?)
straightforward, with this I was asked to write a story/post and that’s probably the end of the story with how to be a data journalist with a story behind the sponsors, I mean these are all relatively similar, all the Olympic ones are
similar. We were working on an investigation; I spoke to James Ball or Simon Rogers depending on the article, and I can’t remember who for which, I spoke to them and said ‘we’ve got this data, do you want me to do a piece for the data blog about X?’ So in this case we had compiled information on what happened to all the places in the torch relay, how they had been allocated, by whom and to whom. And that had been done for the book and I said ‘Do you want me to do a post about this particular aspect of the book?’ and I’d asked Caroline ‘Would you do a visualisation for this?’ So Caroline did the visualisation and I did an overview of the
story. And I guess part of the reason for doing that is because it broadens the… exposes the book and the story to more people, and profits from the book go to charity so I’d like more people to buy the book; you can download it for free but I’d like to raise money for the charity. It raises the profile of Caroline, and it’s nice to be able to, for Caroline to have The Guardian on her CV, it’s nice for the other contributors to the investigation to be able to feel that it’s getting that sort of exposure, it improves, you know, the search engine optimisation of Help Me Investigate. So there are a number of benefits really but broadly speaking it’s a story that needs to be told I think, it’s… yeah.
C.B.: How did you decide which variables to focus on?
P.B.: Which?
C.B.: Variables to search and focus on, I mean…
P.B.: In this case we’d been investigating the torch relay for a few weeks and individual instances of corporate executives being given torch relay places and that story being reported, and we could keep on doing that and find more and more executives, but what we wanted, what I wanted to do I guess, was move from the individual and anecdotes (?) to something that was about systems and find out what went wrong, rather than here’s something that’s shameful: why did that happen, who’s responsible, was it Adidas or was it ...? Was a promise actually broken and to what degree was it broken and so on. So, in order to do that I knew we needed to take VIPs and places and work out where they went, and we knew that 2012 had gone to this particular campaign, a public campaign, a public cause publicity but, and we knew that some had been given to corporate partners to distribute themselves and they were supposed to do that publicly, so it was really about saying we’ve got this figure but, you know, we were on the phone to Lloyds saying ‘How much of this went to external and how much went to internal?’ And eventually they said ‘Right, 50% went to internal’.
And likewise we collected
various evidence of different campaigns by Samsung and different campaigns by Coca Cola. So it was, that was really a lot of document-based research, there wasn’t, there weren’t any datasets involved here other than, essentially we created a dataset out of public records you know out of here’s a statement from ...? and 2012 places, here’s a press release from Samsung about campaign that was for ten places, and so on. And there’s a dataset that has which organisation, how many places, which campaign and so on and that’s about, so that’s the methodology and it’s really about showing a system of work and trying to compare
that with the promise that was made, which was that 95% would be made available to the public.
C.B.: How long did it take to create it although it was…?
P.B.: Well, that particular thing probably took a week from start to finish, well from deciding to do that. But a lot of the information, a lot of the documents that formed part had already been collected along the way so it’s a case of looking back and kind of putting that into a spreadsheet, so I already had bookmarked press releases and things like that and I was going back on all of that. So it was kind of personal archive research and then identifying gaps like Lloyds and saying ‘Right, we need to phone them up because we have no documents about Lloyds, and phone up Coca Cola and pressure them’, and try and get The Guardian, so the Guardian tried to get figures out of Coca Cola and that, which is another reason for working with them; you know you’re more likely to get access to things if you’re at The Guardian.
C.B.: And apart from Caroline did anyone else cooperate at that?
P.B.: Carol Maiers did a lot of work, I mean she was the one who was on the phone to Lloyds. With this particular one there might have been one or two of those but it was mainly me and Carol doing the digging, and I think James Ball at The Guardian, to an extent. The book is a whole of contributions from all sorts of people.
C.B.: Thank you. And do you have any insight or feedback on how the articles with data visualisation are perceived by the readers?
P.B.: Uhm… no.
C.B.: Do you believe that the readers understand them better because they, of the form they are visualised?
P.B.: Uhm… not particularly, I think the only thing I would say is that we did, very early on in the investigation I did a map with Fusion Tables of Nottingham-born torchbearers and a Nottingham sport website took it into gymnastic ...? use ...? and wrote an article about it. So it had an impact in that sense and then that was, and
actually was repeated, someone in ...? did one, someone in Wales and so on. So it clearly had an impact in some sense, that’s about it really, it’s difficult to get an idea of any of the feedback and… yeah.
C.B.: In general about, not for your article specifically, do you believe that in general articles with visualisation are better understood by the people, by the readers, the average reader?
P.B.: I don’t know, I mean I don’t have any evidence to base a judgement on. What I would say is that I think images generally are more effective at bringing someone
into an article. And charts raise a question, I think, that the article then helps answer. So, in theory you are likely to get people more engaged in the article from the start because of a chart; regardless of whether the chart itself helps them understand a bit better, it gives them more motivation to, I would say, because there is more motivation to read, it is almost a promise that this will be explained.
C.B.: You can’t know if, do you know if your stories worked better because, for example this one that had the visualisation, do you believe that it worked better because of that?
P.B.: Yes.
C.B.: Yes. And do you think that most people find data visualisation easy to understand?
P.B.: I don’t really have any evidence on that so what I think doesn’t necessarily carry any weight. As I said, there’s evidence of people that communicate visually that they, so I think it aids, what they/I do.
C.B.: Do you have an example of data visualisation that was used badly, that was really hard for people to understand, for an average reader?
P.B.: Yeah I keep my links, I’ve started saving bad visualisation, a simple one would be delicious.com-???? It’s not my work but there you go. Let’s try this [typing on PC to find the correct link that was: http://pinboard.in/u:paulbradshaw/t:badvis ]
C.B.: Okay, thank you very much. I’ve read your article on how to be a data journalist. Which are the skills that someone needs to have in order to work on data driven journalism and data visualisation on a professional level?
P.B.: What skills do you need?
C.B.: Aha.
P.B.: I think you need an eye for a story, you need to be able to see what stories exist, that might exist in data, you need to be able to analyse data to find that, and then you need to be able to communicate the results effectively so that quite
often means writing about data in a way that isn’t bogged down with numbers, that might have human stories in them, so that’s about kind of leaving the data behind and speaking to people.
C.B.: If you were building a data journalistic team for a national newspaper, could you say something about the kinds of people that you would recruit, I mean how many people of which specialisation, the ideal, the minimum, perhaps…
P.B.: I think ideally you want someone with a subject specialism. There’s a lot of ifs, there’s a lot of things, there’s a lot of factors that come into account. So, for example,
if you’re in a team within a newspaper that has expertise more broadly then you don’t necessarily need a team (?) but you need to be able to say to the health reporter or an education specialist ‘what do I need to know about issues in education’ or whatever. So you need access, either within or outside the team, to subject expertise because that’s what leads to the kind of hypothesis about checking particular claims or looking for the impact of things or seeing if policies worked, and then the same sorts of skills again: you need people with a nose for news, you need people who can do basic spreadsheet work and cleaning and analysis, and I think increasingly you need developers, you know, programmatic skills of being able to write a scraper or being able to create systems that streamline processes. Ever more it is easy to work with public data but increasingly that will not be enough, and also I think the more useful data might not be the public data; you need people that have got the F.O.I., you need people who’ve got the/a scraper pen, you need people who have contacts who can leak/link (?) data. So people who have access to data that others don’t, and that involves developers and F.O.I. expertise, contacts.
C.B.: And my final set of questions is how do you think that data journalism and data visualisation have changed journalism and what do you see in the future for…?
P.B.: How have they changed journalism, uhm… Well I think, I don’t think it’s about data journalism but I think there’s an increased pressure on journalists to be factually accurate, partly, not just because of data journalism but because other people are able to publish now and say ‘we know this subject inside out and you have made a mistake’, so there’s more pressure in terms of bloggers. There’s, it’s easier to access factual information and check claims and things like that so that’s had an impact.
I think we’re telling stories in different ways so visual storytelling is becoming more important and that’s having an impact and I think it’s having an impact in terms of we can see that it’s selling papers like it can lead to a number of
stories like MPs’ expenses, Wikileaks, we can see that it leads to higher engagement in terms of traffic and stickiness and commercially, so that’s leading to a pressure to do more, but I don’t think it’s data journalism that’s causing those pressures, I think it’s other things: I think it’s changes in how advertising is measured, it’s changes in the information environment, in the information availability, it’s changes in how data is used by politicians, you need there’s a lot more abuse of data by politicians but perhaps we're used to it because there is
more availability of data, so journalists are probably reacting more to the information environment more generally than to data journalism as such… yeah.
C.B.: And in the future what do you see?
P.B.: In the future I think, as I said, I think there’s always gonna be a conflict between the information that a journalist is seeking and the information that powerful people want to make available, and that plays out in a number of ways; the F.O.I. laws are under pressure because of what’s been done with them so I can see there being fights around F.O.I. in two major ways; one, there’s a fight to roll it back, to put more limits on it, but also there’s a fight to extend it to private companies, for example, that are providing public services, so those two fights are taking place. I think there’s more press release data going to be made available so there’s gonna be more spin on data that journalists are gonna have to unspin, and I think journalists are gonna become better at getting data that isn’t available through those means, so either being leaked by sources or being obtained through scraping because more of it is gonna be available online. So you’ve kind of got a balance and that’s where I see most of it playing out. There’s gonna be more use of data as well so there’s gonna be more opportunities for personalisation, for stories to be told in ways that relate throughout to a person, so you plug in through Facebook and that story is told in terms of your area, your friends, your skills, your health conditions either. And that’s I think an area that’s going to grow a lot, network analysis, connection between people, that’s been historically hard to do and is becoming easier, so relationships ...? and things like that. So, that probably sums a lot.
C.B.: Okay.
P.B.: Alright?
C.B.: Yes, thank you. Is it okay if I use your name?
P.B.: Yeah, yeah, yeah.
C.B.: Thank you and if there’s something that you would like me to omit?
P.B.: No.
C.B.: Okay, thank you.
Appendix 3 - Content Analysis Methodology
3.1 Code Frame, Limitations, Clarifications (Tables A-E)
Table A: List of Variables:
Variable Code Name    Variable Description
Var1                  Year of Publication
Var2                  Number of visualisations
Var3                  Author of article
Var4                  Subject Category
Var5                  Existence of Visualisation Number 1
Var6                  Existence of Visualisation Number 2
Var7                  Existence of Visualisation Number 3
Var8                  Type of Visualisation Number 1
Var9                  Type of Visualisation Number 2
Var10                 Type of Visualisation Number 3
Var11                 Tool for Visualisation Number 1
Var12                 Tool for Visualisation Number 2
Var13                 Tool for Visualisation Number 3
Var14                 Existence of Data Summary
Var15                 Existence of Data Set
Coding description:
1. Var1: A numerical value indicating the year of publication, ranging from 2009 to 2013
2. Var2: A numerical value indicating the number of visualisations in the article
3. Var3: A numerical value corresponding to a specific name for each author. In cases of multiple authors only the first one is considered. Value 0 indicates articles with broken links or empty content
4. Var4: A numerical value corresponding to the subject category to which the article belongs. Table of the code of each subject category to follow. Value 0 indicates articles with broken links or empty content
5. Var5: Numerical value 1 indicates the existence of a 1st visualisation, value 0 indicates non-existence
6. Var6: Numerical value 1 indicates the existence of a 2nd visualisation, value 0 indicates non-existence
7. Var7: Numerical value 1 indicates the existence of a 3rd visualisation, value 0 indicates non-existence
8. Var8: Numerical value corresponding to a specific type of visualisation. Table of the code of each type to follow. Value 0 corresponds to non-existing visualisations, or articles with broken links or empty content
9. Var9: Numerical value corresponding to a specific type of visualisation. Table of the code of each type to follow. Value 0 corresponds to non-existing visualisations, or articles with broken links or empty content
10. Var10: Numerical value corresponding to a specific type of visualisation. Table of the code of each type to follow. Value 0 corresponds to non-existing visualisations, or articles with broken links or empty content
11. Var11: Numerical value corresponding to a specific tool used for the creation of the visualisation. Table of the code of each tool to follow. Value 0 corresponds to non-existing visualisations or to cases where a tool cannot be identified⁸
12. Var12: Numerical value corresponding to a specific tool used for the creation of the visualisation. Table of the code of each tool to follow. Value 0 corresponds to non-existing visualisations or to cases where a tool cannot be identified⁹
13. Var13: Numerical value corresponding to a specific tool used for the creation of the visualisation. Table of the code of each tool to follow. Value 0 corresponds to non-existing visualisations or to cases where a tool cannot be identified¹⁰
14. Var14: Numerical value 1 indicates the provision of a Data Summary of the Data Set on which the article was based. Value 0 indicates non-provision or articles with broken links. Tables in articles that are not clearly identified as 'Data Summary' at the end of the article are considered general data tables
15. Var15: Numerical value 1 indicates the provision of the full Data Set on which the article was based, either as a downloadable spreadsheet or through a link to the source of the data. Value 0 indicates non-provision or articles with broken links
8 With the exception of the visualisation type of 'Tables', please see clarifications under Table E.
9 With the exception of the visualisation type of 'Tables', please see clarifications under Table E.
10 With the exception of the visualisation type of 'Tables', please see clarifications under Table E.
Please note that if an image of a data visualisation contained more than one type of visualisation, those were treated as separate items and not as one.
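As an illustration only (it is not part of the original study), the code frame above could be checked mechanically before analysis. The Python sketch below assumes each coded article is stored as a dictionary keyed by the variable names of Table A. The function name `validate_record`, the example record, and the rule that a non-existent visualisation must carry type 0 and tool 0 are assumptions read off the coding description, not a documented procedure of the study. The upper bounds for Var3, Var4, Var8-Var10 and Var11-Var13 follow the highest codes listed in Tables B to E.

```python
# Hypothetical validation sketch for one coded article (names are illustrative).
BINARY = (0, 1)

# Allowed codes per variable, taken from Tables A-E of the code frame.
ALLOWED = {
    "Var1": range(2009, 2014),   # year of publication, 2009 to 2013
    "Var3": range(0, 67),        # author codes 1-66, 0 = broken link / empty content
    "Var4": range(0, 16),        # subject categories 1-15, 0 = broken link / empty content
    "Var5": BINARY, "Var6": BINARY, "Var7": BINARY,                        # existence flags
    "Var8": range(0, 17), "Var9": range(0, 17), "Var10": range(0, 17),     # type codes
    "Var11": range(0, 18), "Var12": range(0, 18), "Var13": range(0, 18),   # tool codes
    "Var14": BINARY, "Var15": BINARY,                                      # data summary / data set
}

# (existence flag, type, tool) for visualisation slots 1 to 3.
SLOTS = [("Var5", "Var8", "Var11"), ("Var6", "Var9", "Var12"), ("Var7", "Var10", "Var13")]


def validate_record(record):
    """Return a list of human-readable problems; an empty list means the record fits the frame."""
    problems = []
    for name, allowed in ALLOWED.items():
        value = record.get(name)
        if value not in allowed:
            problems.append(f"{name}: value {value!r} is outside the code frame")
    count = record.get("Var2")
    if not isinstance(count, int) or count < 0:
        problems.append(f"Var2: value {count!r} is not a non-negative count")
    # Assumed consistency rule: no visualisation means type 0 and tool 0.
    for exists, vis_type, tool in SLOTS:
        if record.get(exists) == 0 and (record.get(vis_type) != 0 or record.get(tool) != 0):
            problems.append(f"{exists} is 0 but {vis_type} or {tool} is not 0")
    return problems


# Invented example: a 2012 article with a bar chart and a map.
example = {"Var1": 2012, "Var2": 2, "Var3": 1, "Var4": 15, "Var5": 1, "Var6": 1,
           "Var7": 0, "Var8": 7, "Var9": 10, "Var10": 0, "Var11": 12, "Var12": 4,
           "Var13": 0, "Var14": 1, "Var15": 1}
print(validate_record(example))
```

A check of this kind would only flag values that fall outside the tables; it cannot tell whether a valid code was the right one for a given article.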
Table B: Var3: Author, Coding Scheme
Code Number    Author's Name
1     Simon Rogers
2     Ami Sedghi
3     Theresa Malone
4     James Ball
5     Graham Snowdon
6     Lisa Evans
7     Claire Provost
8     Nathan Yau
9     Jeevan Vasagar
10    Alice Woolley
11    Jonathan Grey
12    Randeep Ramesh
13    Pete Robbins
14    Katy Stoddard
15    Nigel Shadbolt
16    Jessica Shepherd
17    Sarah Hartley
18    Jonathan Glennie
19    Paul Bradshaw
20    Jonathan Grey
21    Kevin Anderson
22    The Guardian (no specific author mentioned)
23    Juliette Garside
24    Ersa Turk
25    Rebecca Ratcliffe
26    Mona Chalabi
27    Nick Evershed
28    Peter Walker
29    Julia Kollewe
30    Simon Choppin
31    Felicity Brown
32    Severin Carrell
33    Elena Moya
34    Denis Campbell
35    Charles Arthur
36    David Henke
37    Premesagar Rose
38    Larry Elliott
39    David McCandless
40    SA Mathieson
41    Adam Vaughan
42    John Burn-Murdoch
43    Tom MacInnes
44    Nathan Green
45    Alasdair Rae
46    Antonia Kanczucla
47    Simon Day
48    Anna Powell-Smith
49    Chris Hanretty
50    Nick Mead
51    Jennifer Jones
52    David McGillivray
53    Christine Oliver
54    Danny Dorling
55    Gary Blight
56    Nona Buckley-Irvine
57    Jake Porway
58    Kathry Torney
59    Harry Enten
60    Lisa O'Carroll
61    Chris Fenn
62    Andrew Sparrow
63    George Arnett
64    Andrés Monroy-Hernández
65    Sam Weaver
66    Margot Huysman
Clarification: Author 3, Theresa Malone, was assigned as an author by mistake. However, the mistake was detected at an early stage of the research and her code was not reassigned to another name, so as not to miscalculate the coding results (the coding for the relevant article was corrected).
Clarifications - Limitations about Var3 Coding:
§ Due to coding limitations, in order to be compatible with ReCal (Intercoder Reliability Testing Online), if an article has multiple authors, only the first one is considered
Table C: Var4: Subject Category, Coding Scheme
Code Number    Category
1     Politics / Government / Public Administration
2     Sports
3     Culture
4     Health
5     Military / War
6     Education
7     Society
8     Crime / Terrorism
9     World News
10    Global Development
11    Environment / Weather / Nature
12    Media / Journalism
13    Transportation
14    Technology / Science
15    Economy / Business
Clarifications - Limitations on some categories of Var4 Coding:
§ Politics / Government / Public Administration: Under this category are classified articles about politics, government, local government, public administration
§ Sports: Under this category are classified articles about athletic events, sports, athletes
§ Culture: Under this category are classified articles about music, books, theatre, TV-shows, radio, cinema, museums, libraries, events, history, archaeology, food (when not related to health), customs, awards
§ Health: Under this category are classified articles about health, nutrition, cosmetic surgery, diseases, epidemics
§ Military / War: Under this category are classified articles about military, war, warzones, refugees, victims
§ Education: Under this category are classified articles about education, schools, universities, literacy (unless referring to Governmental reforms & policies of those, then they are classified under category 1)
§ Society: Under this category are classified articles about unemployment, employment, poverty, immigration (unless referring to immigration policies, then classified under category 1), social media, demographics (population, income), drugs and alcohol consumption, travelling, life-style
§ Crime / Terrorism: Under this category are classified articles about crime, terrorism (victims, attacks, rates)
§ World News: Under this category are classified articles that cover specific news or topics about a country other than the UK, or that compare countries, unless the subject is very clear
§ Global Development: Under this category are classified articles about aid, poverty, global development
§ Environment / Weather / Nature: Under this category are classified articles about environment, weather, nature, natural resources, energy consumption, natural disasters
§ Media / Journalism: Under this category are classified articles about media stations or organisations, journalism, journalism conferences
§ Technology / Science: Under this category are classified articles about technology, science, data and open data (unless referring to Government data & open data, then they are classified under category 1), data visualisation
§ Economy / Business: Under this category are classified articles about economics, economy, business (unless referring to Governmental reforms & policies of those, then they are classified under category 1)
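The "unless ..., then category 1" exceptions in the list above amount to a simple precedence rule, which the following Python sketch tries to make explicit. It is a simplification under stated assumptions: the function `subject_code` and the single flag `about_government_policy` are hypothetical, and the flag stands in for the more specific wording of each exception (Governmental reforms and policies, immigration policies, Government data and open data). The codes themselves are those of Table C.

```python
# Subject category codes as listed in Table C.
SUBJECT_CODES = {
    "Politics / Government / Public Administration": 1,
    "Sports": 2,
    "Culture": 3,
    "Health": 4,
    "Military / War": 5,
    "Education": 6,
    "Society": 7,
    "Crime / Terrorism": 8,
    "World News": 9,
    "Global Development": 10,
    "Environment / Weather / Nature": 11,
    "Media / Journalism": 12,
    "Transportation": 13,
    "Technology / Science": 14,
    "Economy / Business": 15,
}

# Categories whose clarification sends government-related articles to category 1.
GOVERNMENT_OVERRIDE = {"Education", "Society", "Technology / Science", "Economy / Business"}


def subject_code(category, about_government_policy=False):
    """Return the Var4 code, applying the category 1 exception where the clarifications state one."""
    if about_government_policy and category in GOVERNMENT_OVERRIDE:
        return SUBJECT_CODES["Politics / Government / Public Administration"]
    return SUBJECT_CODES[category]


print(subject_code("Education"))                                # plain education article
print(subject_code("Education", about_government_policy=True))  # article on education reforms
```

In practice the decision of whether an article is "about" government policy was a coder's judgement; the sketch only records what happens to the code once that judgement has been made.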
Table D: Var8, Var9 & Var10: Types of Visualisation, Coding Scheme based on (Bounford, 2000).
Code Number | Type
1 | Interactive
2 | Word Cloud
3 | Pie Chart
4 | Spreadsheet
5 | Video
6 | Line Graph
7 | Bar Chart
8 | Table
9 | Area Chart
10 | Map
11 | Symbol
12 | Combination of types
13 | Relational Diagram
14 | Network Map
15 | Timeline
16 | Scatter Graph
0 | Not Available / Not shown / Broken Link / Not functioning
Clarifications - Limitations on Var8, Var9 & Var10 Coding:
§ Visualisations are considered Interactive if they are animated and usually show a combination of graph types that requires the active participation of the user in order to be explored further.
§ If an interactive graph or map has a clear type of graph (for example, a bar chart that looks interactive because the user can click on it and see some data, or a map that shows more data when clicked), then the visualisation is classified under the type of chart it clearly shows (bar chart, line graph, etc.). Therefore, maps created with Google Fusion, for example, or bar/line charts created with Datawrapper, although they have a small degree of interactivity, are classified as map, bar chart, line graph, etc., accordingly.
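The precedence rule in the two clarifications above can be sketched in a few lines of Python. This is only an illustration of the rule, not the procedure used in the study: the function name `code_visualisation` and its two inputs are hypothetical, and both inputs stand for the coder's own judgement about a visualisation. The codes themselves are taken from Table D.

```python
from typing import Optional

# Type codes copied from Table D (only a few are needed for this sketch).
TYPE_CODES = {
    "Interactive": 1,
    "Line Graph": 6,
    "Bar Chart": 7,
    "Map": 10,
}


def code_visualisation(clear_type: Optional[str], is_interactive: bool) -> int:
    """Return the Table D code for one visualisation (hypothetical helper).

    clear_type     -- the chart type the visualisation clearly shows, or None
    is_interactive -- whether the user can click on or otherwise explore it
    """
    if clear_type is not None:
        # A clearly recognisable chart type takes precedence over interactivity.
        return TYPE_CODES[clear_type]
    if is_interactive:
        # No single clear type: use the generic Interactive code.
        return TYPE_CODES["Interactive"]
    raise ValueError("no clear type and not interactive: needs manual coding")


# A clickable Google Fusion map is coded as Map, not as Interactive.
print(code_visualisation("Map", is_interactive=True))   # 10
# An animated combination of graph types with no single clear type.
print(code_visualisation(None, is_interactive=True))    # 1
```

The order of the two checks is the point of the sketch: interactivity is only used as the code when no clear chart type is present.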
Table E: Var11, Var12 & Var13: Visualisation Tools, Coding Scheme
Code Number | Tool
1 | Tableau
2 | Wordle.net
3 | Many Eyes
4 | Google Fusion
5 | Zoom.it
6 | Google Docs / Drive
7 | Opta
8 | Infomous
9 | Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian
10 | Compete
11 | Graphic from External Source
12 | Datawrapper
13 | Timetric
14 | Prezi
15 | ZeeMaps
16 | BatchGeo
17 | Cartödb
0 | Not Available / Not shown
Clarifications - Limitations on Var11, Var12 & Var13 Coding:
§ Clarification about Tool 7: Opta was assigned as a tool by mistake. The mistake was detected at an early stage of the research, and the code was not reassigned to another tool, so as not to miscalculate the coding results (the coding for the relevant article was corrected).
§ Clarification about Tool 9, "Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian": Under this category are assigned visualisations that were created by The Guardian Graphics' Team or by an external freelance graphist with the indication 'For The Guardian'. Additionally, in many articles one can see tables (usually pictured with grey colour palettes) that were created either by the author of the article or by another member of The Guardian Data Journalism Team. However, it is not possible to know which tool was used to create these tables: they could have been created in a word-processing program, in Excel, or with a database language such as SQL. In all those cases the visualisations are classified as created by the data journalism team and are therefore classified under Tool 9. They are not assigned the value 0.
Appendix 4 - Quantitative Research Findings
All spreadsheets, which include the coding for all variables of all articles, links to the articles, their titles and all the statistical tables and charts, can be found and downloaded at: https://copy.com/xOsRJcSR1wwL
Due to limitations on page margins, some tables and charts had to be inserted as images. At the link provided above one can find the entire Excel file with all tables and charts in larger size. Also note that all data for 2013 refer to articles published up to 30 July (7 of the 12 months of the year).
4.1 - Visualisations per Article (Table 1, Chart 1)
Table 1. Number of 1st, 2nd & 3rd Visualisation, Total Number of Visualisations
Category | Number of Articles | Percentage of Articles
Articles without Visualisation | 71 | 24.07%
Articles with 1 Visualisation | 118 | 40.00%
Articles with 2 Visualisations | 39 | 13.22%
Articles with 3 Visualisations | 28 | 9.49%
Articles with more than 3 Visualisations | 39 | 13.22%
Total | 295 | 100.00%
Average number of visualisations per article: 1.97
Chart 1. Number of 1st, 2nd & 3rd Visualisation, Total Number of Visualisations
[Chart: Number of Visualisations per Article. Legend: Articles without Visualisation, Articles with 1 Visualisation, Articles with 2 Visualisations, Articles with 3 Visualisations, Articles with more than 3 Visualisations. Data labels as extracted: 12%, 20%, 6%, 5%, 7%, 50%]
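As a minimal sketch, the percentages in Table 1 can be reproduced from the article counts as follows. The counts are copied from Table 1; the variable names are illustrative only. The average of 1.97 is not recomputed here, because the exact counts behind "more than 3" are not listed in the table.

```python
# Article counts copied from Table 1.
counts = {
    "Articles without Visualisation": 71,
    "Articles with 1 Visualisation": 118,
    "Articles with 2 Visualisations": 39,
    "Articles with 3 Visualisations": 28,
    "Articles with more than 3 Visualisations": 39,
}

total = sum(counts.values())

# Share of each category as a percentage of all coded articles.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

# Articles that provide at least one visualisation.
with_visualisation = total - counts["Articles without Visualisation"]

for label, n in counts.items():
    print(f"{label}: {n} ({shares[label]:.2f}%)")
print(f"Total: {total}")
print(f"At least one visualisation: {with_visualisation} "
      f"({100 * with_visualisation / total:.2f}%)")
```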
4.2 - Provision of Data Summary and Data Sets (or links to data source) (Table 2, Chart 2)
Table 2. Provision of Data Summary and Data Sets (or links to data source), % of total
Category | Number | Percentage
Articles with Data Summary | 105 | 35.59%
Articles with Data Set | 214 | 72.54%
Articles with only Data Summary | 2 | 0.68%
Articles with only Data Set | 112 | 37.97%
Articles with Both Data Summary and Data Set | 103 | 34.92%
Chart 2. Provision of Data Summary and Data Sets (or links to data source), % of total
[Chart: Provision of Data Summary and Data Set]
4.3 - Authors by Number of Publications and Year (in descending order) (Tables 3-5, Charts 3-4)
Table 3, part 1. Authors by Number of Publications and Year (in descending order)
Table 3, part 2. Authors by Number of Publications and Year (in descending order)
Table 4. Main Authors (percentage of total publications)
Author Name | Total Percentage
Simon Rogers | 41.02%
Ami Sedghi | 9.15%
Mona Chalabi | 7.46%
John Burn-Murdoch | 5.42%
Lisa Evans | 3.39%
James Ball | 2.71%
Claire Provost | 2.03%
Katy Stoddard | 2.03%
Nick Evershed | 1.69%
Randeep Ramesh | 1.02%
Sarah Hartley | 1.02%
Kevin Anderson | 1.02%
Others | 18.31%
Chart 3. Main Authors (percentage of total publications)
[Chart: Authors]
Table 5. Main Authors (Publications per year, Percentage of total publications)
Author Name | Number of Articles in 2009 | Number of Articles in 2010 | Number of Articles in 2011 | Number of Articles in 2012 | Number of Articles in 2013 | Total Number of Articles | Total Percentage
Simon Rogers | 21 | 24 | 31 | 30 | 15 | 121 | 41.02%
Ami Sedghi | 0 | 4 | 7 | 7 | 9 | 27 | 9.15%
Mona Chalabi | 0 | 0 | 1 | 0 | 21 | 22 | 7.46%
John Burn-Murdoch | 0 | 0 | 1 | 12 | 3 | 16 | 5.42%
Lisa Evans | 0 | 2 | 3 | 5 | 0 | 10 | 3.39%
James Ball | 0 | 0 | 3 | 2 | 3 | 8 | 2.71%
Claire Provost | 0 | 0 | 2 | 3 | 1 | 6 | 2.03%
Katy Stoddard | 1 | 4 | 1 | 0 | 0 | 6 | 2.03%
Nick Evershed | 0 | 0 | 0 | 0 | 5 | 5 | 1.69%
Randeep Ramesh | 0 | 0 | 0 | 3 | 0 | 3 | 1.02%
Sarah Hartley | 0 | 2 | 0 | 1 | 0 | 3 | 1.02%
Kevin Anderson | 3 | 0 | 0 | 0 | 0 | 3 | 1.02%
Others | 9 | 8 | 14 | 25 | 9 | 65 | 18.31%
Chart 4. Main Authors (Publications per year, Percentage of total publications)
4.4 - Articles Per Subject per Year (Tables 6-7, Charts 5-12)
Table 6. Articles per Subject per Year (Frequencies)
Subject Category | Subject Name | Number of Articles in 2009 | Number of Articles in 2010 | Number of Articles in 2011 | Number of Articles in 2012 | Number of Articles in 2013 | Total Number of Articles
1 | Politics / Government / Public Administration | 6 | 12 | 13 | 12 | 12 | 55
2 | Sports | 0 | 1 | 6 | 9 | 5 | 21
3 | Culture | 2 | 5 | 8 | 5 | 4 | 24
4 | Health | 3 | 1 | 3 | 4 | 2 | 13
5 | Military / War | 3 | 3 | 1 | 1 | 6 | 14
6 | Education | 0 | 4 | 7 | 6 | 2 | 19
7 | Society | 4 | 7 | 8 | 14 | 11 | 44
8 | Crime / Terrorism | 1 | 0 | 2 | 1 | 1 | 5
9 | World News | 4 | 0 | 4 | 4 | 5 | 17
10 | Global Development | 0 | 2 | 3 | 5 | 2 | 12
11 | Environment / Weather / Nature | 4 | 2 | 2 | 5 | 3 | 16
12 | Media / Journalism | 1 | 3 | 2 | 9 | 2 | 17
13 | Transportation | 1 | 1 | 1 | 4 | 4 | 11
14 | Technology / Science | 1 | 1 | 1 | 4 | 5 | 12
15 | Economy / Business | 4 | 2 | 2 | 4 | 1 | 13
Total | | 34 | 44 | 63 | 87 | 65 |
Chart 5. Articles per Subject Total (Frequencies)
Chart 6. Articles per Subject per Year (Frequencies)
Table 7. Articles per Subject per Year (Percentages)
Subject Name | Percentage per Subject 2009 | Percentage per Subject 2010 | Percentage per Subject 2011 | Percentage per Subject 2012 | Percentage per Subject 2013 | Total Percentage
Politics / Government / Public Administration | 17.65% | 27.27% | 20.63% | 13.79% | 18.46% | 18.64%
Sports | 0.00% | 2.27% | 9.52% | 10.34% | 7.69% | 7.12%
Culture | 5.88% | 11.36% | 12.70% | 5.75% | 6.15% | 8.14%
Health | 8.82% | 2.27% | 4.76% | 4.60% | 3.08% | 4.41%
Military / War | 8.82% | 6.82% | 1.59% | 1.15% | 9.23% | 4.75%
Education | 0.00% | 9.09% | 11.11% | 6.90% | 3.08% | 6.44%
Society | 11.76% | 15.91% | 12.70% | 16.09% | 16.92% | 14.92%
Crime / Terrorism | 2.94% | 0.00% | 3.17% | 1.15% | 1.54% | 1.69%
World News | 11.76% | 0.00% | 6.35% | 4.60% | 7.69% | 5.76%
Global Development | 0.00% | 4.55% | 4.76% | 5.75% | 3.08% | 4.07%
Environment / Weather / Nature | 11.76% | 4.55% | 3.17% | 5.75% | 4.62% | 5.42%
Media / Journalism | 2.94% | 6.82% | 3.17% | 10.34% | 3.08% | 5.76%
Transportation | 2.94% | 2.27% | 1.59% | 4.60% | 6.15% | 3.73%
Technology / Science | 2.94% | 2.27% | 1.59% | 4.60% | 7.69% | 4.07%
Economy / Business | 11.76% | 4.55% | 3.17% | 4.60% | 1.54% | 4.41%
Please note that 2 articles had no content displayed therefore their subject could not be identified.
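A short sketch of how the per-year percentages in Table 7 follow from the frequencies in Table 6: each count is divided by the total number of articles with an identified subject in that year. The frequencies are copied from Table 6; the variable names are illustrative only. The "Total Percentage" column is not recomputed here.

```python
YEARS = [2009, 2010, 2011, 2012, 2013]

# Frequencies copied from Table 6 (articles per subject per year).
table6 = {
    "Politics / Government / Public Administration": [6, 12, 13, 12, 12],
    "Sports": [0, 1, 6, 9, 5],
    "Culture": [2, 5, 8, 5, 4],
    "Health": [3, 1, 3, 4, 2],
    "Military / War": [3, 3, 1, 1, 6],
    "Education": [0, 4, 7, 6, 2],
    "Society": [4, 7, 8, 14, 11],
    "Crime / Terrorism": [1, 0, 2, 1, 1],
    "World News": [4, 0, 4, 4, 5],
    "Global Development": [0, 2, 3, 5, 2],
    "Environment / Weather / Nature": [4, 2, 2, 5, 3],
    "Media / Journalism": [1, 3, 2, 9, 2],
    "Transportation": [1, 1, 1, 4, 4],
    "Technology / Science": [1, 1, 1, 4, 5],
    "Economy / Business": [4, 2, 2, 4, 1],
}

# Column sums: number of articles with an identified subject in each year.
year_totals = [sum(column) for column in zip(*table6.values())]

# Each cell as a percentage of its year's total, rounded to two decimals.
percentages = {
    subject: [round(100 * n / t, 2) for n, t in zip(counts, year_totals)]
    for subject, counts in table6.items()
}

print(dict(zip(YEARS, year_totals)))
print(percentages["Politics / Government / Public Administration"])
```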
Chart 7. Articles per Subject (Percentages) in total (all years)
[Chart: Total Percentage Per Subject]
Chart 8. Articles per Subject (Percentages) in 2009
[Chart: Percentage per Subject 2009]
Chart 9. Articles per Subject (Percentages) in 2010
[Chart: Percentage per Subject 2010]
Chart 10. Articles per Subject (Percentages) in 2011
[Chart: Percentage per Subject 2011]
Chart 11. Articles per Subject (Percentages) in 2012
[Chart: Percentage per Subject 2012]
Chart 12. Articles per Subject (Percentages) in 2013
[Chart: Percentage per Subject 2013]
4.5 - Visualisation Types (Table 8, Charts 13-14)
Table 8. Types of 1st, 2nd and 3rd Visualisation (Frequencies and Percentages)
Chart 13. Types of 1st, 2nd and 3rd Visualisation (Percentages)
Chart 14. Types of Visualisations (Percentages of total use)
[Chart: Total Percentage of Use Per Type of Visualisation. Data labels as extracted: 14.90%, 2.78%, 0.25%, 3.28%, 2.53%, 9.60%, 18.94%, 8.33%, 6.82%, 15.91%, 2.27%, 7.58%, 4.04%, 1.01%, 0.76%, 1.01%]
4.6 - Visualisation Tools (Tables 9-11, Charts 15-16)
Table 9. Visualisation Tools' Use Per Year (Frequencies) and Total Use (Frequencies and Percentage)
Var11, Var12, Var13 | Tool Name | Number of Use in 2009 | Number of Use in 2010 | Number of Use in 2011 | Number of Use in 2012 | Number of Use in 2013 | Total Number of Use | % of Use Compared to Other Tools
1 | Tableau | 0 | 0 | 3 | 13 | 4 | 20 | 5.80%
2 | Wordle.net | 0 | 7 | 3 | 0 | 0 | 10 | 2.90%
3 | Many Eyes | 0 | 5 | 12 | 1 | 0 | 18 | 5.22%
4 | Google Fusion | 0 | 3 | 13 | 10 | 7 | 33 | 9.57%
5 | Zoom.it | 0 | 0 | 1 | 0 | 0 | 1 | 0.29%
6 | Google Docs / Drive | 0 | 9 | 3 | 2 | 0 | 14 | 4.06%
7 | Opta | 0 | 0 | 0 | 0 | 0 | 0 | 0.00%
8 | Infomous | 0 | 0 | 3 | 0 | 0 | 3 | 0.87%
9 | Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian | 9 | 16 | 11 | 18 | 15 | 69 | 20.00%
10 | Compete | 0 | 0 | 2 | 0 | 0 | 2 | 0.58%
11 | Graphic from External Source | 14 | 7 | 6 | 61 | 25 | 113 | 32.75%
12 | Datawrapper | 0 | 0 | 3 | 8 | 45 | 56 | 16.23%
13 | Timetric | 0 | 0 | 0 | 0 | 0 | 0 | 0.00%
14 | Prezi | 0 | 0 | 1 | 1 | 0 | 2 | 0.58%
15 | ZeeMaps | 0 | 0 | 0 | 1 | 0 | 1 | 0.29%
16 | BatchGeo | 0 | 0 | 0 | 0 | 1 | 1 | 0.29%
17 | Cartödb | 0 | 0 | 0 | 0 | 2 | 2 | 0.58%
Total | | | | | | | 345 |
Table 10. Main Visualisation Tools' Use Per Year (Frequencies)
Main Visualisation Tools per Year (Frequencies) | Graphic from External Source | Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian | Datawrapper | Google Fusion | Tableau | Many Eyes | Google Docs / Drive | Wordle.net | Others
Number of Use in 2009 | 14 | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Number of Use in 2010 | 7 | 16 | 0 | 3 | 0 | 5 | 9 | 7 | 0
Number of Use in 2011 | 6 | 11 | 3 | 13 | 3 | 12 | 3 | 3 | 7
Number of Use in 2012 | 61 | 18 | 8 | 10 | 13 | 1 | 2 | 0 | 2
Number of Use in 2013 | 25 | 15 | 45 | 7 | 4 | 0 | 0 | 0 | 3
Chart 15. Main Visualisation Tools' Use Per Year (Frequencies)
Table 11. Total Use of Main Visualisation Tools (Percentage) in descending order.
Tool Name | % of Use Compared to Other Tools
Graphic from External Source | 32.75%
Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian | 20.00%
Datawrapper | 16.23%
Google Fusion | 9.57%
Tableau | 5.80%
Many Eyes | 5.22%
Google Docs / Drive | 4.06%
Wordle.net | 2.90%
Others | 3.48%
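The grouping of the less-used tools into "Others" can be sketched as follows, using the total frequencies from Table 9. The cut-off `MIN_USES = 10` is an assumption made for this illustration: the criterion for a "main" tool is not stated in the text, and this threshold simply happens to reproduce the split shown in Table 11.

```python
# Total number of uses per tool, copied from Table 9.
tool_uses = {
    "Tableau": 20,
    "Wordle.net": 10,
    "Many Eyes": 18,
    "Google Fusion": 33,
    "Zoom.it": 1,
    "Google Docs / Drive": 14,
    "Opta": 0,
    "Infomous": 3,
    "Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist": 69,
    "Compete": 2,
    "Graphic from External Source": 113,
    "Datawrapper": 56,
    "Timetric": 0,
    "Prezi": 2,
    "ZeeMaps": 1,
    "BatchGeo": 1,
    "Cartödb": 2,
}

MIN_USES = 10  # assumed cut-off for a "main" tool (not stated in the source)

total = sum(tool_uses.values())

# Keep the main tools, sorted by use in descending order.
main = sorted(
    ((tool, n) for tool, n in tool_uses.items() if n >= MIN_USES),
    key=lambda item: item[1],
    reverse=True,
)

# Everything below the cut-off is pooled into a single "Others" row.
others = total - sum(n for _, n in main)
rows = main + [("Others", others)]

for tool, n in rows:
    print(f"{tool}: {100 * n / total:.2f}%")
```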
Chart 16. Total Use of Main Visualisation Tools (Percentage) in descending order.
4.7 - Frequencies of Use of Tools, Types and Frequencies of Subjects per Author (in descending order)
Author: Simon Rogers (Tables 12-14, Charts 17-19)
Table 12. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
1. Simon Rogers 11. Graphic from External Source 49
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 29
4. Google Fusion 22
3. Many Eyes 13
12. Datawrapper 11
6. Google Docs / Drive 9
2. Wordle.net 7
1. Tableau 4
8. Infomous 3
14. Prezi 2
5. Zoom.it 1
17. Cartödb 1
10. Compete 0
13. Timetric 0
15. Zee Maps 0
16. Batchgeo 0
Chart 17. Total Use of Visualisation Tools (Frequencies)
[Chart: Author: Simon Rogers, Tools Used]
Table 13. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
1. Simon Rogers 1. Interactive 36
10. Map 35
7. Bar Chart 19
12. Combination of types 13
9. Area Chart 12
13. Relational Diagram 12
4. Spreadsheet 9
6. Line Graph 9
2. Word Cloud 8
8. Table 5
11. Symbol 5
5. Video 4
3. Pie Chart 1
15. Timeline 1
14. Network Map 0
16. Scatter Graph 0
Chart 18. Total Use of Visualisation Types (Frequencies)
[Chart: Author: Simon Rogers, Types Used]
Table 14. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
1. Simon Rogers 1. Politics / Government / Public Administration 25
7. Society 18
9. World News 10
3. Culture 9
5. Military / War 9
2. Sports 7
4. Health 7
12. Media / Journalism 7
6. Education 6
11. Environment / Weather / Nature 5
13. Transportation 5
8. Crime / Terrorism 4
15. Economy / Business 4
14. Technology / Science 3
10. Global Development 2
Chart 19. Total Frequencies of Subjects
[Chart: Author: Simon Rogers, Subjects]
Author: Ami Sedghi (Tables 15-17, Charts 20-22)
Table 15. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
2. Ami Sedghi 13. Timetric 10
12. Datawrapper 8
10. Compete 6
1. Tableau 2
3. Many Eyes 1
4. Google Fusion 1
6. Google Docs / Drive 1
2. Wordle.net 0
5. Zoom.it 0
8. Infomous 0
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 0
11. Graphic from External Source 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
17. Cartödb 0
Chart 20. Total Frequencies of Used Visualisation Tools
[Chart: Author: Ami Sedghi, Tools Used]
Table 16. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
2. Ami Sedghi 7. Bar Chart 12
1. Interactive 5
6. Line Graph 3
10. Map 3
12. Combination of types 3
11. Symbol 2
4. Spreadsheet 1
8. Table 1
9. Area Chart 1
15. Timeline 1
2. Word Cloud 0
3. Pie Chart 0
5. Video 0
13. Relational Diagram 0
14. Network Map 0
16. Scatter Graph 0
Chart 21. Total Frequencies of Used Visualisation Types
[Chart: Author: Ami Sedghi, Types Used]
Table 17. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
2. Ami Sedghi 2. Sports 7
3. Culture 6
6. Education 4
7. Society 4
1. Politics / Government / Public Administration 1
5. Military / War 1
9. World News 1
12. Media / Journalism 1
4. Health 0
8. Crime / Terrorism 0
10. Global Development 0
11. Environment / Weather / Nature 0
13. Transportation 0
14. Technology / Science 0
15. Economy / Business 0
Chart 22. Total Frequencies of Subjects
[Chart: Author: Ami Sedghi, Subjects]
Author: Mona Chalabi (Tables 18-20, Charts 23-25)
Table 18. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
26. Mona Chalabi 12. Datawrapper 17
11. Graphic from External Source 10
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 6
4. Google Fusion 1
17. Cartödb 1
1. Tableau 0
2. Wordle.net 0
3. Many Eyes 0
5. Zoom.it 0
6. Google Docs / Drive 0
8. Infomous 0
10. Compete 0
13. Timetric 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
Chart 23. Total Frequencies of Used Visualisation Tools
[Chart: Author: Mona Chalabi, Tools Used]
Table 19. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
26. Mona Chalabi 7. Bar Chart 11
6. Line Graph 8
1. Interactive 6
8. Table 4
10. Map 2
5. Video 1
12. Combination of types 1
14. Network Map 1
16. Scatter Graph 1
2. Word Cloud 0
3. Pie Chart 0
4. Spreadsheet 0
9. Area Chart 0
11. Symbol 0
13. Relational Diagram 0
15. Timeline 0
Chart 24. Total Frequencies of Used Visualisation Types
[Chart: Author: Mona Chalabi, Types Used]
Table 20. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
26. Mona Chalabi 7. Society 7
1. Politics / Government / Public Administration 3
5. Military / War 3
9. World News 3
11. Environment / Weather / Nature 2
14. Technology / Science 2
8. Crime / Terrorism 1
10. Global Development 1
2. Sports 0
3. Culture 0
4. Health 0
6. Education 0
12. Media / Journalism 0
13. Transportation 0
15. Economy / Business 0
Chart 25. Total Frequencies of Subjects
[Chart: Author: Mona Chalabi, Subjects]
Author: John Burn-Murdoch (Tables 21-23, Charts 26-28)
Table 21. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
42. John Burn-Murdoch 11. Graphic from External Source 15
1. Tableau 9
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 3
4. Google Fusion 1
2. Wordle.net 0
3. Many Eyes 0
5. Zoom.it 0
6. Google Docs / Drive 0
8. Infomous 0
10. Compete 0
12. Datawrapper 0
13. Timetric 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
17. Cartödb 0
Chart 26. Total Frequencies of Used Visualisation Tools
[Chart: Author: John Burn-Murdoch, Tools Used]
Table 22. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
42. John Burn-Murdoch 1. Interactive 4
7. Bar Chart 4
10. Map 4
6. Line Graph 3
12. Combination of types 3
16. Scatter Graph 3
9. Area Chart 2
11. Symbol 2
14. Network Map 2
8. Table 1
2. Word Cloud 0
3. Pie Chart 0
4. Spreadsheet 0
5. Video 0
13. Relational Diagram 0
15. Timeline 0
Chart 27. Total Frequencies of Used Visualisation Types
[Chart: Author: John Burn-Murdoch, Types Used]
Table 23. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
42. John Burn-Murdoch 7. Society 4
15. Economy / Business 3
1. Politics / Government / Public Administration 2
14. Technology / Science 2
2. Sports 1
3. Culture 1
6. Education 1
11. Environment / Weather / Nature 1
13. Transportation 1
4. Health 0
5. Military / War 0
8. Crime / Terrorism 0
9. World News 0
10. Global Development 0
12. Media / Journalism 0
Chart 28. Total Frequencies of Subjects
[Chart: Author: John Burn-Murdoch, Subjects]
Author: Lisa Evans (Tables 24-26, Charts 29-31)
Table 24. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
6. Lisa Evans 11. Graphic from External Source 4
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 3
3. Many Eyes 1
1. Tableau 0
2. Wordle.net 0
4. Google Fusion 0
5. Zoom.it 0
6. Google Docs / Drive 0
8. Infomous 0
10. Compete 0
12. Datawrapper 0
13. Timetric 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
17. Cartödb 0
Chart 29. Total Frequencies of Used Visualisation Tools
[Chart: Author: Lisa Evans, Tools Used]
Table 25. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
6. Lisa Evans 1. Interactive 2
7. Bar Chart 2
8. Table 2
12. Combination of types 2
10. Map 1
2. Word Cloud 0
3. Pie Chart 0
4. Spreadsheet 0
5. Video 0
6. Line Graph 0
9. Area Chart 0
11. Symbol 0
13. Relational Diagram 0
14. Network Map 0
15. Timeline 0
16. Scatter Graph 0
Chart 30. Total Frequencies of Used Visualisation Types
[Chart: Author: Lisa Evans, Types Used]
Table 26. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
6. Lisa Evans 1. Politics / Government / Public Administration 4
3. Culture 1
7. Society 1
9. World News 1
11. Environment / Weather / Nature 1
13. Transportation 1
15. Economy / Business 1
2. Sports 0
4. Health 0
5. Military / War 0
6. Education 0
8. Crime / Terrorism 0
10. Global Development 0
12. Media / Journalism 0
14. Technology / Science 0
Chart 31. Total Frequencies of Subjects
[Chart: Author: Lisa Evans, Subjects]
Author: James Ball (Tables 27-29, Charts 32-34)
Table 27. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
4. James Ball 9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 3
3. Many Eyes 1
11. Graphic from External Source 1
1. Tableau 0
2. Wordle.net 0
4. Google Fusion 0
5. Zoom.it 0
6. Google Docs / Drive 0
8. Infomous 0
10. Compete 0
12. Datawrapper 0
13. Timetric 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
17. Cartödb 0
Chart 32. Total Frequencies of Used Visualisation Tools
[Chart: Author: James Ball, Tools Used]
Table 28. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
4. James Ball 8. Table 2
1. Interactive 1
5. Video 1
6. Line Graph 1
10. Map 1
2. Word Cloud 0
3. Pie Chart 0
4. Spreadsheet 0
7. Bar Chart 0
9. Area Chart 0
11. Symbol 0
12. Combination of types 0
13. Relational Diagram 0
14. Network Map 0
15. Timeline 0
Chart 33. Total Frequencies of Used Visualisation Types
[Chart: Author: James Ball, Types Used]
Table 29. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
4. James Ball 1. Politics / Government / Public Administration 3
4. Health 2
6. Education 1
7. Society 1
15. Economy / Business 1
2. Sports 0
3. Culture 0
5. Military / War 0
8. Crime / Terrorism 0
9. World News 0
10. Global Development 0
11. Environment / Weather / Nature 0
12. Media / Journalism 0
13. Transportation 0
14. Technology / Science 0
Chart 34. Total Frequencies of Subjects
[Chart: Author: James Ball, Subjects]
Author: Claire Provost (Tables 30-32, Charts 35-37)
Table 30. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
7. Claire Provost 11. Graphic from External Source 3
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 2
1. Tableau 0
2. Wordle.net 0
3. Many Eyes 0
4. Google Fusion 0
5. Zoom.it 0
6. Google Docs / Drive 0
8. Infomous 0
10. Compete 0
12. Datawrapper 0
13. Timetric 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
17. Cartödb 0
Chart 35. Total Frequencies of Used Visualisation Tools
[Chart: Author: Claire Provost, Tools Used]
Table 31. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
7. Claire Provost 10. Map 3
6. Line Graph 2
8. Table 2
1. Interactive 1
12. Combination of types 1
2. Word Cloud 0
3. Pie Chart 0
4. Spreadsheet 0
5. Video 0
7. Bar Chart 0
9. Area Chart 0
11. Symbol 0
13. Relational Diagram 0
14. Network Map 0
15. Timeline 0
16. Scatter Graph 0
Chart 36. Total Frequencies of Used Visualisation Types
[Chart: Author: Claire Provost, Types Used]
Table 32. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
7. Claire Provost 10. Global Development 6
1. Politics / Government / Public Administration 0
2. Sports 0
3. Culture 0
4. Health 0
5. Military / War 0
6. Education 0
7. Society 0
8. Crime / Terrorism 0
9. World News 0
11. Environment / Weather / Nature 0
12. Media / Journalism 0
13. Transportation 0
14. Technology / Science 0
15. Economy / Business 0
Chart 37. Total Frequencies of Subjects
[Chart: Author: Claire Provost, Subjects]
Author: Katy Stoddard (Tables 33-35, Charts 38-40)
Table 33. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
14. Katy Stoddard 2. Wordle.net 3
6. Google Docs / Drive 1
1. Tableau 0
3. Many Eyes 0
4. Google Fusion 0
5. Zoom.it 0
8. Infomous 0
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 0
10. Compete 0
11. Graphic from External Source 0
12. Datawrapper 0
13. Timetric 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
17. Cartödb 0
Chart 38. Total Frequencies of Used Visualisation Tools
[Chart: Author: Katy Stoddard, Tools Used]
Table 34. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
14. Katy Stoddard 2. Word Cloud 3
4. Spreadsheet 1
1. Interactive 0
3. Pie Chart 0
5. Video 0
6. Line Graph 0
7. Bar Chart 0
8. Table 0
9. Area Chart 0
10. Map 0
11. Symbol 0
12. Combination of types 0
13. Relational Diagram 0
14. Network Map 0
15. Timeline 0
16. Scatter Graph 0
Chart 39. Total Frequencies of Used Visualisation Types
[Chart: Author: Katy Stoddard, Types Used]
Table 35. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
14. Katy Stoddard 3. Culture 3
1. Politics / Government / Public Administration 1
7. Society 1
13. Transportation 1
2. Sports 0
4. Health 0
5. Military / War 0
6. Education 0
8. Crime / Terrorism 0
9. World News 0
10. Global Development 0
11. Environment / Weather / Nature 0
12. Media / Journalism 0
14. Technology / Science 0
15. Economy / Business 0
Chart 40. Total Frequencies of Subjects
[Chart: Author: Katy Stoddard, Subjects]
Author: Nick Evershed (Tables 36-38, Charts 41-43)
Table 36. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
27. Nick Evershed 12. Datawrapper 10
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 1
11. Graphic from External Source 1
1. Tableau 0
2. Wordle.net 0
3. Many Eyes 0
4. Google Fusion 0
5. Zoom.it 0
6. Google Docs / Drive 0
8. Infomous 0
10. Compete 0
13. Timetric 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
17. Cartödb 0
Chart 41. Total Frequencies of Used Visualisation Tools
[Chart: Author: Nick Evershed, Tools Used]
Table 37. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
27. Nick Evershed 8. Table 2
5. Video 1
6. Line Graph 1
7. Bar Chart 1
10. Map 1
1. Interactive 0
2. Word Cloud 0
3. Pie Chart 0
4. Spreadsheet 0
9. Area Chart 0
11. Symbol 0
12. Combination of types 0
13. Relational Diagram 0
14. Network Map 0
15. Timeline 0
16. Scatter Graph 0
Chart 42. Total Frequencies of Used Visualisation Types
[Chart: Author: Nick Evershed, Types Used]
Table 38. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
27. Nick Evershed 9. World News 2
11. Environment / Weather / Nature 1
13. Transportation 1
14. Technology / Science 1
1. Politics / Government / Public Administration 0
2. Sports 0
3. Culture 0
4. Health 0
5. Military / War 0
6. Education 0
7. Society 0
8. Crime / Terrorism 0
10. Global Development 0
12. Media / Journalism 0
15. Economy / Business 0
Chart 43. Total Frequencies of Subjects
[Chart: Author: Nick Evershed, Subjects]
Author: Randeep Ramesh (Tables 39-41, Charts 44-46)
Table 39. Total Use of Visualisation Tools (Frequencies)
Author Code / Name Tool Code/ Name Times of Use
12. Randeep Ramesh 4. Google Fusion 1
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 1
12. Datawrapper 1
1. Tableau 0
2. Wordle.net 0
3. Many Eyes 0
5. Zoom.it 0
6. Google Docs / Drive 0
8. Infomous 0
10. Compete 0
11. Graphic from External Source 0
13. Timetric 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
17. Cartödb 0
Chart 44. Total Frequencies of Used Visualisation Tools
[Chart: Author: Randeep Ramesh, Tools Used]
Table 40. Total Use of Visualisation Types (Frequencies)
Author Code / Name Type of Visualisation Times of Use
12. Randeep Ramesh 9. Area Chart 2
6. Line Graph 1
7. Bar Chart 1
8. Table 1
10. Map 1
1. Interactive 0
2. Word Cloud 0
3. Pie Chart 0
4. Spreadsheet 0
5. Video 0
11. Symbol 0
12. Combination of types 0
13. Relational Diagram 0
14. Network Map 0
15. Timeline 0
16. Scatter Graph 0
Chart 45. Total Frequencies of Used Visualisation Types
[Chart: Author: Randeep Ramesh, Types Used]
Table 41. Total Frequencies of Subjects
Author Code / Name Subject code / Name Number of Articles
12. Randeep Ramesh 7. Society 2
1. Politics / Government / Public Administration 1
2. Sports 0
3. Culture 0
4. Health 0
5. Military / War 0
6. Education 0
8. Crime / Terrorism 0
9. World News 0
10. Global Development 0
11. Environment / Weather / Nature 0
12. Media / Journalism 0
13. Transportation 0
14. Technology / Science 0
15. Economy / Business 0
Chart 46. Total Frequencies of Subjects (Author: Randeep Ramesh). Values: 7. Society 2; 1. Politics / Government / Public Administration 1.
Author: Sarah Hartley (Tables 42-44, Charts 47-49)
Table 42. Total Use of Visualisation Tools (Frequencies)
Author Code / Name: 17. Sarah Hartley
Tool Code / Name, followed by Times of Use:
3. Many Eyes 1
4. Google Fusion 1
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 1
15. Zee Maps 1
1. Tableau 0
2. Wordle.net 0
5. Zoom.it 0
6. Google Docs / Drive 0
8. Infomous 0
10. Compete 0
11. Graphic from External Source 0
12. Datawrapper 0
13. Timetric 0
14. Prezi 0
16. Batchgeo 0
17. CartoDB 0
Chart 47. Total Frequencies of Used Visualisation Tools (Author: Sarah Hartley). Values: 3. Many Eyes 1; 4. Google Fusion 1; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 1; 15. Zee Maps 1.
Table 43. Total Use of Visualisation Types (Frequencies)
Author Code / Name: 17. Sarah Hartley
Type of Visualisation, followed by Times of Use:
10. Map 2
1. Interactive 1
9. Area Chart 1
2. Word Cloud 0
3. Pie Chart 0
4. Spreadsheet 0
5. Video 0
6. Line Graph 0
7. Bar Chart 0
8. Table 0
11. Symbol 0
12. Combination of types 0
13. Relational Diagram 0
14. Network Map 0
15. Timeline 0
16. Scatter Graph 0
Chart 48. Total Frequencies of Used Visualisation Types (Author: Sarah Hartley). Values: 10. Map 2; 1. Interactive 1; 9. Area Chart 1.
Table 44. Total Frequencies of Subjects
Author Code / Name: 17. Sarah Hartley
Subject Code / Name, followed by Number of Articles:
1. Politics / Government / Public Administration 1
12. Media / Journalism 1
13. Transportation 1
2. Sports 0
3. Culture 0
4. Health 0
5. Military / War 0
6. Education 0
7. Society 0
8. Crime / Terrorism 0
9. World News 0
10. Global Development 0
11. Environment / Weather / Nature 0
14. Technology / Science 0
15. Economy / Business 0
Chart 49. Total Frequencies of Subjects (Author: Sarah Hartley). Values: 1. Politics / Government / Public Administration 1; 12. Media / Journalism 1; 13. Transportation 1.
Author: Kevin Anderson (Tables 45-47, Charts 50-52)
Table 45. Total Use of Visualisation Tools (Frequencies)
Author Code / Name: 21. Kevin Anderson
Tool Code / Name, followed by Times of Use:
11. Graphic from External Source 5
1. Tableau 0
2. Wordle.net 0
3. Many Eyes 0
4. Google Fusion 0
5. Zoom.it 0
6. Google Docs / Drive 0
8. Infomous 0
9. Guardian Graphics' Team / Guardian Data Team/ External Freelance Graphist for The Guardian 0
10. Compete 0
12. Datawrapper 0
13. Timetric 0
14. Prezi 0
15. Zee Maps 0
16. Batchgeo 0
17. CartoDB 0
Chart 50. Total Frequencies of Used Visualisation Tools (Author: Kevin Anderson). Values: 11. Graphic from External Source 5; Others 0.
Table 46. Total Use of Visualisation Types (Frequencies)
Author Code / Name: 21. Kevin Anderson
Type of Visualisation, followed by Times of Use:
13. Relational Diagram 3
5. Video 1
10. Map 1
1. Interactive 0
2. Word Cloud 0
3. Pie Chart 0
4. Spreadsheet 0
6. Line Graph 0
7. Bar Chart 0
8. Table 0
9. Area Chart 0
11. Symbol 0
12. Combination of types 0
14. Network Map 0
15. Timeline 0
16. Scatter Graph 0
Chart 51. Total Frequencies of Used Visualisation Types (Author: Kevin Anderson). Values: 13. Relational Diagram 3; 5. Video 1; 10. Map 1.
Table 47. Total Frequencies of Subjects
Author Code / Name: 21. Kevin Anderson
Subject Code / Name, followed by Number of Articles:
1. Politics / Government / Public Administration 1
12. Media / Journalism 1
15. Economy / Business 1
2. Sports 0
3. Culture 0
4. Health 0
5. Military / War 0
6. Education 0
7. Society 0
8. Crime / Terrorism 0
9. World News 0
10. Global Development 0
11. Environment / Weather / Nature 0
13. Transportation 0
14. Technology / Science 0
Chart 52. Total Frequencies of Subjects (Author: Kevin Anderson). Values: 1. Politics / Government / Public Administration 1; 12. Media / Journalism 1; 15. Economy / Business 1.
4.8 Types of Visualisations per Subject and Subjects of Visualisations per Type
Visualisation Types per Subject: (Table 48, Charts 53-68)
Table 48. Part 1: Number of Graphs per Visualisation Type per Subject
| Subject Code / Name | 1. Interactive | 2. Word Cloud | 3. Pie Chart | 4. Spreadsheet | 5. Video | 6. Line Graph | 7. Bar Chart | 8. Table |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Politics / Government / Public Administration | 6 | 7 | 0 | 3 | 2 | 6 | 12 | 6 |
| 2. Sports | 9 | 0 | 0 | 2 | 0 | 4 | 6 | 4 |
| 3. Culture | 6 | 4 | 1 | 2 | 0 | 4 | 6 | 4 |
| 4. Health | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 4 |
| 5. Military / War | 3 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
| 6. Education | 4 | 0 | 0 | 0 | 0 | 2 | 8 | 0 |
| 7. Society | 10 | 0 | 0 | 3 | 2 | 6 | 16 | 2 |
| 8. Crime / Terrorism | 3 | 0 | 0 | 0 | 0 | 4 | 1 | 1 |
| 9. World News | 4 | 0 | 0 | 0 | 2 | 0 | 6 | 3 |
| 10. Global Development | 2 | 0 | 0 | 0 | 0 | 3 | 4 | 3 |
| 11. Environment / Weather / Nature | 3 | 0 | 0 | 2 | 0 | 3 | 1 | 3 |
| 12. Media / Journalism | 4 | 0 | 0 | 0 | 2 | 0 | 1 | 0 |
| 13. Transportation | 2 | 0 | 0 | 0 | 0 | 0 | 3 | 8 |
| 14. Technology / Science | 1 | 0 | 0 | 0 | 0 | 0 | 6 | 0 |
| 15. Economy / Business | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 2 |
Table 48. Part 2: Number of Graphs per Visualisation Type per Subject
| Subject Code / Name | 9. Area Chart | 10. Map | 11. Symbol | 12. Combination of types | 13. Relational Diagram | 14. Network Map | 15. Timeline | 16. Scatter Graph |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1. Politics / Government / Public Administration | 12 | 8 | 0 | 5 | 1 | 0 | 0 | 1 |
| 2. Sports | 1 | 2 | 2 | 7 | 0 | 0 | 1 | 0 |
| 3. Culture | 2 | 3 | 4 | 1 | 2 | 0 | 1 | 0 |
| 4. Health | 0 | 7 | 1 | 1 | 3 | 0 | 0 | 0 |
| 5. Military / War | 2 | 2 | 0 | 3 | 1 | 0 | 0 | 0 |
| 6. Education | 0 | 6 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7. Society | 6 | 9 | 1 | 1 | 0 | 0 | 0 | 3 |
| 8. Crime / Terrorism | 4 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9. World News | 0 | 3 | 0 | 0 | 4 | 0 | 0 | 0 |
| 10. Global Development | 0 | 5 | 0 | 1 | 0 | 0 | 0 | 0 |
| 11. Environment / Weather / Nature | 2 | 2 | 0 | 6 | 1 | 0 | 0 | 0 |
| 12. Media / Journalism | 2 | 2 | 1 | 1 | 3 | 1 | 1 | 0 |
| 13. Transportation | 0 | 7 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14. Technology / Science | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 0 |
| 15. Economy / Business | 1 | 3 | 0 | 1 | 1 | 2 | 0 | 0 |
Chart 53. Visualisation Type: 1. Interactive, per Subject (Frequencies). Values: Society 10; Sports 9; Politics / Government / Public Administration 6; Culture 6; Education 4; World News 4; Media / Journalism 4; Military / War 3; Crime / Terrorism 3; Environment / Weather / Nature 3; Global Development 2; Transportation 2; Technology / Science 1; Health 0; Economy / Business 0.
Chart 54. Visualisation Type: 2. Word Cloud, per Subject (Frequencies). Values: Politics / Government / Public Administration 7; Culture 4; Others 0.
Chart 55. Visualisation Type: 3. Pie Chart, per Subject (Frequencies). Values: Culture 1; Others 0.
Chart 56. Visualisation Type: 4. Spreadsheet, per Subject (Frequencies). Values in descending order: 3, 3, 2, 2, 2, 1, 0.
Chart 57. Visualisation Type: 5. Video, per Subject (Frequencies). Values in descending order: 2, 2, 2, 2, 1, 0.
Chart 58. Visualisation Type: 6. Line Graph, per Subject (Frequencies). Values: Politics / Government / Public Administration 6; Society 6; Sports 4; Culture 4; Crime / Terrorism 4; Global Development 3; Environment / Weather / Nature 3; Economy / Business 3; Education 2; Military / War 1; Others 0.
Chart 59. Visualisation Type: 7. Bar Chart, per Subject (Frequencies). Values: Society 16; Politics / Government / Public Administration 12; Education 8; Sports 6; Culture 6; World News 6; Technology / Science 6; Global Development 4; Health 3; Transportation 3; Military / War 1; Crime / Terrorism 1; Environment / Weather / Nature 1; Media / Journalism 1; Economy / Business 1.
Chart 60. Visualisation Type: 8. Table, per Subject (Frequencies). Values: Transportation 8; Politics / Government / Public Administration 6; Sports 4; Culture 4; Health 4; World News 3; Global Development 3; Environment / Weather / Nature 3; Society 2; Economy / Business 2; Military / War 1; Crime / Terrorism 1; Others 0.
Chart 61. Visualisation Type: 9. Area Chart, per Subject (Frequencies). Values: Politics / Government / Public Administration 12; Society 6; Crime / Terrorism 4; Culture 2; Military / War 2; Environment / Weather / Nature 2; Media / Journalism 2; Sports 1; Economy / Business 1.
Chart 62. Visualisation Type: 10. Map, per Subject (Frequencies). Values: Society 9; Politics / Government / Public Administration 8; Health 7; Transportation 7; Education 6; Global Development 5; Culture 3; World News 3; Economy / Business 3; Sports 2; Military / War 2; Crime / Terrorism 2; Environment / Weather / Nature 2; Media / Journalism 2; Technology / Science 1.
Chart 63. Visualisation Type: 11. Symbol, per Subject (Frequencies). Values: Culture 4; Sports 2; Health 1; Society 1; Media / Journalism 1; Others 0.
Chart 64. Visualisation Type: 12. Combination, per Subject (Frequencies). Values: Sports 7; Environment / Weather / Nature 6; Politics / Government / Public Administration 5; Military / War 3; Technology / Science 3; Culture 1; Health 1; Society 1; Global Development 1; Media / Journalism 1; Economy / Business 1; Others 0.
Chart 65. Visualisation Type: 13. Relational Diagram, per Subject (Frequencies). Values: World News 4; Health 3; Media / Journalism 3; Culture 2; Politics / Government / Public Administration 1; Military / War 1; Environment / Weather / Nature 1; Economy / Business 1; Others 0.
Chart 66. Visualisation Type: 14. Network Map, per Subject (Frequencies). Values: Economy / Business 2; Media / Journalism 1; Technology / Science 1; Others 0.
Chart 67. Visualisation Type: 15. Timeline, per Subject (Frequencies). Values: Sports 1; Culture 1; Media / Journalism 1; Others 0.
Chart 68. Visualisation Type: 16. Scatter Graph, per Subject (Frequencies). Values: Society 3; Politics / Government / Public Administration 1; Others 0.
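Subjects per Visualisation Types: (Table 49, Charts 69-83)
Table 49. Part 1: Number of Graphs per Subject per Visualisation Type
| Type | 1. Politics / Government / Public Administration | 2. Sports | 3. Culture | 4. Health | 5. Military / War | 6. Education | 7. Society | 8. Crime / Terrorism |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Interactive | 6 | 9 | 6 | 0 | 3 | 4 | 10 | 3 |
| Word Cloud | 7 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| Pie Chart | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| Spreadsheet | 3 | 2 | 2 | 0 | 1 | 0 | 3 | 0 |
| Video | 2 | 0 | 0 | 0 | 1 | 0 | 2 | 0 |
| Line Graph | 6 | 4 | 4 | 0 | 1 | 2 | 6 | 4 |
| Bar Chart | 12 | 6 | 6 | 3 | 1 | 8 | 16 | 1 |
| Table | 6 | 4 | 4 | 4 | 1 | 0 | 2 | 1 |
| Area Chart | 12 | 1 | 2 | 0 | 2 | 0 | 6 | 0 |
| Map | 8 | 2 | 3 | 7 | 0 | 6 | 9 | 2 |
| Symbol | 0 | 2 | 4 | 1 | 0 | 0 | 1 | 0 |
| Combination of types | 5 | 7 | 1 | 1 | 0 | 0 | 1 | 0 |
| Relational Diagram | 1 | 0 | 2 | 3 | 0 | 0 | 0 | 0 |
| Network Map | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Timeline | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| Scatter Graph | 1 | 0 | 0 | 0 | 0 | 0 | 3 | 0 |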
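Table 49. Part 2: Number of Graphs per Subject per Visualisation Type
| Type | 9. World News | 10. Global Development | 11. Environment / Weather / Nature | 12. Media / Journalism | 13. Transportation | 14. Technology / Science | 15. Economy / Business |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Interactive | 4 | 2 | 3 | 4 | 2 | 1 | 0 |
| Word Cloud | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Pie Chart | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Spreadsheet | 0 | 0 | 2 | 0 | 0 | 0 | 0 |
| Video | 2 | 0 | 0 | 2 | 0 | 0 | 1 |
| Line Graph | 0 | 3 | 3 | 0 | 0 | 0 | 3 |
| Bar Chart | 6 | 4 | 1 | 1 | 3 | 6 | 1 |
| Table | 3 | 3 | 3 | 0 | 8 | 0 | 2 |
| Area Chart | 4 | 0 | 0 | 2 | 0 | 0 | 1 |
| Map | 3 | 5 | 2 | 2 | 7 | 1 | 3 |
| Symbol | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Combination of types | 0 | 1 | 6 | 1 | 0 | 3 | 1 |
| Relational Diagram | 4 | 0 | 1 | 3 | 0 | 0 | 1 |
| Network Map | 0 | 0 | 0 | 1 | 0 | 1 | 2 |
| Timeline | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| Scatter Graph | 0 | 0 | 0 | 0 | 0 | 0 | 0 |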
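Tables 48 and 49 both cross-tabulate visualisation type against subject, with the axes swapped. As a minimal sketch of how such a pair of tables could be derived from coded records, the Python below counts (subject, type) pairs and then transposes the result. The `records` list, the function names and the sample values are hypothetical and purely illustrative; they are not the study's data or the author's actual procedure.

```python
from collections import Counter

# Hypothetical coded records: one (subject, visualisation type) pair per graph.
# Illustrative values only; not taken from the study's data set.
records = [
    ("Society", "Bar Chart"),
    ("Society", "Bar Chart"),
    ("Society", "Map"),
    ("Sports", "Interactive"),
    ("Sports", "Bar Chart"),
]

def crosstab(pairs):
    """Count how often each (row, column) pair occurs; absent pairs count as 0."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

def transpose(table):
    """Swap the row and column keys of a nested-dict table."""
    cols = sorted({c for row in table.values() for c in row})
    return {c: {r: table[r][c] for r in table} for c in cols}

# Types per subject (layout of Table 48) and subjects per type (layout of Table 49).
types_per_subject = crosstab(records)
subjects_per_type = transpose(types_per_subject)
```

Because both layouts come from the same counts, transposing one should reproduce the other cell for cell, which gives a simple consistency check between two such tables.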
Chart 69. Subject 1. Politics / Government / Public Administration, per Visualisation Type (Frequencies). Values in descending order: 12, 12, 8, 7, 6, 6, 6, 5, 3, 2, 1, 1, 0, 0, 0, 0.
Chart 70. Subject 2. Sports, per Visualisation Type (Frequencies). Values in descending order: 9, 7, 6, 4, 4, 2, 2, 2, 1, 1, 0.
Chart 71. Subject 3. Culture, per Visualisation Type (Frequencies). Values in descending order: 6, 6, 4, 4, 4, 4, 3, 2, 2, 2, 1, 1, 1, 0.
Chart 72. Subject 4. Health, per Visualisation Type (Frequencies). Values: Map 7; Table 4; Bar Chart 3; Relational Diagram 3; Symbol 1; Combination of types 1; Others 0.
Chart 73. Subject 5. Military / War, per Visualisation Type (Frequencies). Values in descending order: 3, 2, 1, 1, 1, 1, 1, 0.
Chart 74. Subject 6. Education, per Visualisation Type (Frequencies). Values: Bar Chart 8; Map 6; Interactive 4; Line Graph 2; Others 0.
Chart 75. Subject 7. Society, per Visualisation Type (Frequencies). Values in descending order: 16, 10, 9, 6, 6, 3, 3, 2, 2, 1, 1, 0.
Chart 76. Subject 8. Crime / Terrorism, per Visualisation Type (Frequencies). Values: Line Graph 4; Interactive 3; Map 2; Bar Chart 1; Table 1; Others 0.
Chart 77. Subject 9. World News, per Visualisation Type (Frequencies). Values: Bar Chart 6; Interactive 4; Area Chart 4; Relational Diagram 4; Table 3; Map 3; Video 2; Others 0.
Chart 78. Subject 10. Global Development, per Visualisation Type (Frequencies). Values in descending order: 5, 4, 3, 3, 2, 1, 0.
Chart 79. Subject 11. Environment / Weather / Nature, per Visualisation Type (Frequencies). Values in descending order: 6, 3, 3, 3, 2, 2, 1, 1, 0.
Chart 80. Subject 12. Media / Journalism, per Visualisation Type (Frequencies). Values in descending order: 4, 3, 2, 2, 2, 1, 1, 1, 1, 1, 0.
Chart 81. Subject 13. Transportation, per Visualisation Type (Frequencies). Values: Table 8; Map 7; Bar Chart 3; Interactive 2; Others 0.
Chart 82. Subject 14. Technology / Science, per Visualisation Type (Frequencies). Values: Bar Chart 6; Combination of types 3; Interactive 1; Map 1; Network Map 1; Others 0.
Chart 83. Subject 15. Economy / Business, per Visualisation Type (Frequencies). Values in descending order: 3, 3, 2, 2, 1, 1, 1, 1, 1, 0.
4.9 Most Used Visualisation Types per Most Used Visualisation Tools and Vice Versa
Most Used Visualisation Types per Most Used Visualisation Tools (Table 50, Charts 84-93)
Table 50. Most Used Types per Most Used Tools (Frequencies)
| Tool Name / Code | 7. Bar Chart | 10. Map | 1. Interactive | 6. Line Graph | 8. Table | 12. Combination of types | 9. Area Chart | 13. Relational Diagram | 4. Spreadsheet | 2. Word Cloud |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 11. Graphic from External Source | 9 | 15 | 23 | 7 | 4 | 22 | 5 | 12 | 0 | 0 |
| 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian | 6 | 9 | 5 | 2 | 22 | 4 | 14 | 3 | 0 | 0 |
| 12. Datawrapper | 40 | 0 | 0 | 13 | 2 | 0 | 0 | 0 | 0 | 0 |
| 4. Google Fusion | 3 | 28 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
| 1. Tableau | 6 | 2 | 2 | 1 | 0 | 2 | 1 | 1 | 0 | 0 |
| 3. Many Eyes | 0 | 0 | 18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6. Google Docs / Drive | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 |
| 2. Wordle.net | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 |
| Other | 0 | 4 | 4 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| Not Known / Not available | 11 | 6 | 6 | 12 | 0 | 2 | 5 | 0 | 0 | 1 |
Chart 84. Type 1. Interactive, per most important Tools (Frequency). Values: 11. Graphic from External Source 23; 3. Many Eyes 18; Not Known / Not available 6; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 5; Other 4; 1. Tableau 2; 12. Datawrapper 0; 4. Google Fusion 0; 6. Google Docs / Drive 0; 2. Wordle.net 0.
Chart 85. Type 2. Word Cloud, per most important Tools (Frequency). Values: 2. Wordle.net 10; Not Known / Not available 1; 11. Graphic from External Source 0; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 0; 12. Datawrapper 0; 4. Google Fusion 0; 1. Tableau 0; 3. Many Eyes 0; 6. Google Docs / Drive 0; Other 0.
Chart 86. Type 4. Spreadsheet, per most important Tools (Frequency). Values: 6. Google Docs / Drive 13; 11. Graphic from External Source 0; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 0; 12. Datawrapper 0; 4. Google Fusion 0; 1. Tableau 0; 3. Many Eyes 0; 2. Wordle.net 0; Other 0; Not Known / Not available 0.
Chart 87. Type 6. Line Graph, per most important Tools (Frequency). Values: 12. Datawrapper 13; Not Known / Not available 12; 11. Graphic from External Source 7; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 2; Other 2; 1. Tableau 1; 4. Google Fusion 0; 3. Many Eyes 0; 6. Google Docs / Drive 0; 2. Wordle.net 0.
Chart 88. Type 7. Bar Chart, per most important Tools (Frequency). Values: 12. Datawrapper 40; Not Known / Not available 11; 11. Graphic from External Source 9; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 6; 1. Tableau 6; 4. Google Fusion 3; 3. Many Eyes 0; 6. Google Docs / Drive 0; 2. Wordle.net 0; Other 0.
Chart 89. Type 8. Table, per most important Tools (Frequency). Values: 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 22; 11. Graphic from External Source 4; 12. Datawrapper 2; 4. Google Fusion 0; 1. Tableau 0; 3. Many Eyes 0; 6. Google Docs / Drive 0; 2. Wordle.net 0; Other 0; Not Known / Not available 0.
Chart 90. Type 9. Area Chart, per most important Tools (Frequency). Values: 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 14; 11. Graphic from External Source 5; Not Known / Not available 5; 4. Google Fusion 2; 1. Tableau 1; 12. Datawrapper 0; 3. Many Eyes 0; 6. Google Docs / Drive 0; 2. Wordle.net 0; Other 0.
Chart 91. Type 10. Map, per most important Tools (Frequency). Values: 4. Google Fusion 28; 11. Graphic from External Source 15; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 9; Not Known / Not available 6; Other 4; 1. Tableau 2; 12. Datawrapper 0; 3. Many Eyes 0; 6. Google Docs / Drive 0; 2. Wordle.net 0.
Chart 92. Type 12. Combination, per most important Tools (Frequency). Values: 11. Graphic from External Source 22; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 4; 1. Tableau 2; Not Known / Not available 2; 12. Datawrapper 0; 4. Google Fusion 0; 3. Many Eyes 0; 6. Google Docs / Drive 0; 2. Wordle.net 0; Other 0.
Chart 93. Type 13. Relational Diagram, per most important Tools (Frequency). Values: 11. Graphic from External Source 12; 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian 3; 1. Tableau 1; 12. Datawrapper 0; 4. Google Fusion 0; 3. Many Eyes 0; 6. Google Docs / Drive 0; 2. Wordle.net 0; Other 0; Not Known / Not available 0.
Most Used Visualisation Tools per Most Used Visualisation Types (Table 51, Charts 94-101)
Table 51. Most Used Tools per Most Used Types (Frequencies)
| Type Name / Code | 11. Graphic from External Source | 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian | 12. Datawrapper | 4. Google Fusion | 1. Tableau | 3. Many Eyes | 6. Google Docs / Drive | 2. Wordle.net | Other | Not Known / Not available |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 7. Bar Chart | 9 | 6 | 40 | 3 | 6 | 0 | 0 | 0 | 0 | 11 |
| 10. Map | 15 | 9 | 0 | 28 | 2 | 0 | 0 | 0 | 4 | 6 |
| 1. Interactive | 23 | 5 | 0 | 0 | 2 | 18 | 0 | 0 | 4 | 6 |
| 6. Line Graph | 7 | 2 | 13 | 0 | 1 | 0 | 0 | 0 | 2 | 12 |
| 8. Table | 4 | 22 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12. Combination of types | 22 | 4 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 2 |
| 9. Area Chart | 5 | 14 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 5 |
| 13. Relational Diagram | 12 | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4. Spreadsheet | 0 | 0 | 0 | 0 | 0 | 0 | 13 | 0 | 0 | 0 |
| 2. Word Cloud | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 1 |
Chart 94. Tool 1. Tableau, per most important Visualisation Types (Frequency). Values in descending order: 6, 2, 2, 2, 1, 1, 1, 0, 0, 0.
Chart 95. Tool 2. Wordle.net, per most important Visualisation Types (Frequency). Values in descending order: 10, 0, 0, 0, 0, 0, 0, 0, 0, 0.
Chart 96. Tool 3. Many Eyes, per most important Visualisation Types (Frequency). Values in descending order: 18, 0, 0, 0, 0, 0, 0, 0, 0, 0.
Chart 97. Tool 4. Google Fusion, per most important Visualisation Types (Frequency). Values in descending order: 28, 3, 2, 0, 0, 0, 0, 0, 0, 0.
Chart 98. Tool 6. Google Docs / Drive, per most important Visualisation Types (Frequency). Values in descending order: 13, 0, 0, 0, 0, 0, 0, 0, 0, 0.
Chart 99. Tool 9. Guardian Graphics' Team / Guardian Data Team / External Freelance Graphist for The Guardian, per most important Visualisation Types (Frequency). Values in descending order: 22, 14, 9, 6, 5, 4, 3, 2, 0, 0.
Chart 100. Tool 11. Graphic from External Source, per most important Visualisation Types (Frequency). Values in descending order: 23, 22, 15, 12, 9, 7, 5, 4, 0, 0.
Chart 101. Tool 12. Datawrapper, per most important Visualisation Types (Frequency). Values in descending order: 40, 13, 2, 0, 0, 0, 0, 0, 0, 0.
Information School.
Access to Dissertation
A Dissertation submitted to the University may be held by the Department (or School) within which the Dissertation was undertaken and made available for borrowing or consultation in accordance with University Regulations. Requests for the loan of dissertations may be received from libraries in the UK and overseas. The Department may also receive requests from other organisations, as well as individuals. The conservation of the original dissertation is better assured if the Department and/or Library can fulfill such requests by sending a copy. The Department may also make your dissertation available via its web pages.
In certain cases where confidentiality of information is concerned, if either the author or the supervisor so requests, the Department will withhold the dissertation from loan or consultation for the period specified below. Where no such restriction is in force, the Department may also deposit the Dissertation in the University of Sheffield Library.
To be completed by the Author – Select (a) or (b) by placing a tick in the appropriate box
If you are willing to give permission for the Information School to make your dissertation available in these ways, please complete the following:
✓ (a) Subject to the General Regulation on Intellectual Property, I, the author, agree to this dissertation being made immediately available through the Department and/or University Library for consultation, and for the Department and/or Library to reproduce this dissertation in whole or part in order to supply single copies for the purpose of research or private study.
(b) Subject to the General Regulation on Intellectual Property, I, the author, request that this dissertation be withheld from loan, consultation or reproduction for a period of [ ] years from the date of its submission. Subsequent to this period, I agree to this dissertation being made available through the Department and/or University Library for consultation, and for the Department and/or Library to reproduce this dissertation in whole or part in order to supply single copies for the purpose of research or private study.
Name: CHARALAMPIA BOULA
Department: MSc in DIGITAL LIBRARY MANAGEMENT
Signed:
Date: 01/09/2013
To be completed by the Supervisor – Select (a) or (b) by placing a tick in the appropriate box
(a) I, the supervisor, agree to this dissertation being made immediately available through the Department and/or University Library for loan or consultation, subject to any special restrictions (*) agreed with external organisations as part of a collaborative project.
*Special restrictions
(b) I, the supervisor, request that this dissertation be withheld from loan, consultation or reproduction for a period of [ ] years from the date of its submission. Subsequent to this period, I agree to this dissertation being made available through the Department and/or University Library for loan or consultation, subject to any special restrictions (*) agreed with external organisations as part of a collaborative project.
Name:
Department:
Signed:
Date:
THIS SHEET MUST BE SUBMITTED WITH DISSERTATIONS IN ACCORDANCE WITH DEPARTMENTAL REQUIREMENTS.