Lecture 3 - Data Visualization
Module 2
A. Data is the raw material
used to create information.
B. Data collection gathers
information measurements.
C. Knowledge is measured by
the number of data
collected.
A. It employs a lot of people
B. It produces data visuals
C. It gathers data
measurements for further
analysis
A. True
B. False
A. True
B. False
A. True
B. False
A. Satellite images
B. Monitoring networks
C. Sampling
A. Data from outer space
B. Data with coordinates
C. Data referring to
environmental resources
5 Words about Climate Change by SCI103 Community – 2015/2016 (http://www.wordle.net/)
“Having the data is not enough. I have to show it in ways people both enjoy and understand.”
(Prof. Hans Rosling)
• Raw data does not provide much insight unless it is processed and presented.
• The way data is presented has a large impact on how meaningfully it can be analysed and
interpreted.
• The human brain retains information contained in visuals better than
information conveyed by written or spoken words.
Visual tools are crucial for effective data communication.
https://youtu.be/AdSZJzb-aX8
Some of the worst data visualizations are the ones you have to stare at for several minutes before you even comprehend what they’re trying to say.
Turn data into something more engaging. There should be a swift “aha!” moment within seconds after someone sees your data.
Goal of data visualization: communicate information clearly and effectively using graphics.
Graphics: visual images presented on a surface such as paper or a computer screen.
• Data visualization: effective techniques used to communicate data or information by
encoding it as visual objects able to synthesize large amounts of data.
Visuals need to combine functionality and design to convey information intuitively.
HTTPS://YOUTU.BE/MKEXX7SDXAI
Charts and tables: synthesize and display data (qualitative and quantitative)
Tables
Source: Goulburn Murray Water
Charts
Conceptual diagrams and infographics – synthesis, visualisation and context
Source: Bureau of Meteorology
Conceptual diagram
http://images.wookmark.com/96600_post1_new.jpg
Infographics
Satellite images and maps - geographic context
Source: Bureau of Meteorology
Map
Satellite image
Photographs – provide unique information
Video clips – can help to tell a story by capturing motion, perspective and sound
Source: L Huzzey
Photograph
Video
• Graphic visual representation of
complex data/information, quickly
and clearly
• Include “traditional” data visuals
such as charts and tables
• Right combination of design and
content to be effective
https://doms.csu.edu.au/csu/thumbs/2aa657f5-7f7a-45a6-85b4-3610fd378eba/1/ad6a21bc-9c86-4640-b42a-dd470fa45d1a
http://images.sixrevisions.com/2009/05/09-03_coffee_drinks.jpg http://graphs.net/wp-content/uploads/2012/11/Worlds-10-Popular-Books-Sold-in-Last-50-Years.jpg
"Excellence in statistical graphics consists of complex ideas communicated with
clarity, precision and efficiency" (Prof. Edward Tufte)
• Graphical excellence is of extreme relevance when data is used to perform
analytical tasks such as making comparisons or determining causality.
• “Bad” visuals can distort the data, making it harder to understand or compare;
ineffective and poorly presented information can lead to misinformation.
• Understand the data you are trying to visualize including its size and cardinality
(the uniqueness of data values in a column)
• Determine what you are trying to visualize and what kind of information you want
to communicate
• Know your audience and understand how it processes visual information
• Use a visual that conveys the information in the best and simplest form for your
audience.
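As a quick check of the size and cardinality mentioned above, the following minimal Python sketch counts the rows and the distinct values in one column. The function name `cardinality` and the land-use labels are hypothetical and invented for illustration only.

```python
def cardinality(values):
    """Return (number of rows, number of distinct values) for one column."""
    values = list(values)
    return len(values), len(set(values))


# Hypothetical column of land-use labels, invented for illustration.
land_use = ["forest", "crop", "crop", "urban", "forest", "crop"]

size, distinct = cardinality(land_use)
print(f"{size} rows, {distinct} distinct values")
```

A column with few distinct values (low cardinality) usually suits a bar chart, whereas a column where almost every value is unique may need grouping first.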
• Guideline 1: create the simplest graph that
conveys the information you want to convey
• Guideline 2: consider the type of encoding
object (points, lines, and bars) and attribute
(point position, line length, color) used to
create a plot
• Guideline 3: focus on visualizing patterns or
on visualizing details, depending on the
purpose of the plot
• Guideline 4: select meaningful axis ranges
• Guideline 5: data transformations and
carefully chosen graph aspect ratios can be
used to emphasize rates of change for time-
series data
• Guideline 6: plot overlapping points in a way
that density differences become apparent in
scatter plots
• Guideline 7: use lines when connecting
sequential data in time-series plots
• Guideline 8: aggregate larger datasets in
meaningful ways
• Guideline 9: keep axis ranges as similar as
possible to compare variables
• Guideline 10: select an appropriate color
scheme based on the type of data
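Guideline 8 (aggregation) can be sketched in a few lines of Python. This is only one possible approach, assuming daily records stored as (ISO date, value) pairs; the function name `monthly_means` and the numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def monthly_means(records):
    """Group (iso_date, value) pairs by YYYY-MM and average each group."""
    groups = defaultdict(list)
    for iso_date, value in records:
        groups[iso_date[:7]].append(value)  # "2005-09-01" -> "2005-09"
    return {month: mean(vals) for month, vals in sorted(groups.items())}


# Invented daily values, for illustration only.
daily = [
    ("2005-09-01", 2.0),
    ("2005-09-02", 4.0),
    ("2005-10-01", 1.0),
    ("2005-10-02", 3.0),
    ("2005-10-03", 5.0),
]
print(monthly_means(daily))
```

Whether a mean, a sum or a maximum is the meaningful summary depends on the variable (for example, rainfall is usually summed rather than averaged).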
“… summarize and describe vast amounts of information in a compact, efficient and
eye-catching way” (Ducklan & Martin, 2002)
Useful for data analysis, visualization & communication:
• present raw data
• present results of fairly complex analyses,
• summarise information,
• expose unanticipated characteristics of data,
• suggest hypotheses which may be further investigated.
Bar charts
Histograms
Pie charts
Graphs
http://www.statmethods.net/graphs/images/pie2.jpg
http://www.statmethods.net/graphs/images/barplot3.jpg
• Display categorical data, where each category is
independent of the others
• Used to compare a variable across a number of
different groups, showing the size of each
group (the length of each bar is proportional to
the value it represents)
• Two main types of bar charts: horizontal and
vertical. Horizontal bars usually represent a
single period of time whereas column bars may
represent similar items at different times
• Can also be displayed as subdivided bar charts
where different variables are represented in
the same column
X-axis: discrete data
Y-axis: continuous data (count, value or percent)
Gaps between bars
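The idea that the length of each bar is proportional to the value it represents can be illustrated with a plain-text chart. This is a minimal sketch that assumes positive values; the function name `text_bar_chart` and the site counts are hypothetical.

```python
def text_bar_chart(data, width=40):
    """Return one text line per category; bar length is proportional to value."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        # Scale so that the largest value fills the full width.
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<12}{bar} {value}")
    return lines


# Invented counts for three independent categories.
visitors = {"Site A": 10, "Site B": 5, "Site C": 20}
for line in text_bar_chart(visitors, width=20):
    print(line)
```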
• A type of bar chart: data is displayed using bars/columns, but the bars are
placed directly next to each other
• Used to display frequency values, or the number of values that fall within the same
category or interval (represented in the x-axis). In statistics, histograms are a graphical
representation of the distribution of data
• To construct a histogram:
divide the entire range of values into a series of categories;
count how many values fall into each category;
draw a rectangle with height proportional to the count and width equal to the
category size.
X-axis: interval data
Y-axis: continuous data (frequency values)
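The three construction steps above can be sketched as follows. This is a minimal illustration that assumes equal-width categories and at least two different values; the function name `histogram_counts` and the sample values are hypothetical.

```python
def histogram_counts(values, n_bins):
    """Split the data range into n_bins equal-width intervals and count values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # step 1: divide the range into categories
    counts = [0] * n_bins
    for v in values:
        # step 2: find the category; the maximum value goes into the last one
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Invented sample values, for illustration only.
sample = [1, 2, 2, 3, 5, 7, 8, 9, 9, 10]
edges, counts = histogram_counts(sample, n_bins=3)

# step 3: draw one bar per category, with height proportional to the count
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f} | {'#' * n}")
```

Changing `n_bins` changes the apparent shape of the distribution, so it is worth trying more than one value.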
• Mainly used to compare proportions
• To construct a pie chart:
calculate the relative proportion of data in each
category
the divisions of a pie (proportions or segments)
add to a whole (100%)
• It is generally recommended to avoid pie charts
for data analysis and visualization:
humans process differences in line length more
easily than differences in surface area: it is more
effective to use a bar chart (which uses line length
to show comparison) than a pie chart (which uses
surface area to show comparison)
pie charts usually encode only a handful of
numbers and a table is usually a much more
efficient way to present such information
Proportions are ratio data
Labels are nominal or ordinal data
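The pie chart construction described above (relative proportions that add to a whole) can be sketched as below. The function name `pie_segments` and the category counts are hypothetical; the slice angle is simply the same proportion applied to 360 degrees.

```python
def pie_segments(counts):
    """Return {label: (percent of total, slice angle in degrees)}."""
    total = sum(counts.values())
    return {
        label: (100 * n / total, 360 * n / total)
        for label, n in counts.items()
    }


# Invented category counts, for illustration only.
counts = {"A": 50, "B": 30, "C": 20}
segments = pie_segments(counts)
for label, (percent, angle) in segments.items():
    print(f"{label}: {percent:.0f}% ({angle:.0f} degrees)")
```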
• A line showing the relationship between two or more variables (line graphs and
scatterplots)
• Advantages:
Display of high information density, sometimes with no loss of data
Rapid assimilation of the overall result
Clear display of complex relationships among multivariate data
• Graph interpretation:
Height of the line (or series of lines)
Patterns (seasonal pattern, trend or a combination of both)
http://www.statmethods.net/graphs/images/linechart1.png
Line graph: shows a trend
X-axis: discrete and continuous data
Y-axis: discrete and continuous data (scale should start at zero)
Scatterplot: shows the relationship between two variables by plotting their (x,y) positions
X-axis: continuous data
Y-axis: continuous data
• Columns and rows filled with data
• Used for summarising results and data comparison
Tables vs charts
Tables are usually a better option than a chart when only a few data points need to
be displayed.
If exact numeric values are required, a table is best since it can be hard to
represent the exact values in a chart axis.
In a thesis or research report, the detail and precision of tables may be more
important since they are a repository of information.
(Swires-Hennessy, 2014)
Swires-Hennessy, E. (2014). How to Communicate your Message Effectively.
Hay, I. (2012). Communicating in geography and the environmental sciences.
Thomas, J. E., Saxby, T. A., Jones, A. B., Carruthers, T. J. B., Abal, E. G. & Dennison, W. C. (2006) Communicating science effectively: a practical handbook for integrating visual elements.
Kelly, D., Jasperse, J. & Westbrooke, I. (2005). Designing science graphs for data analysis and presentation: the bad, the good and the better.
Schwabish, J.A. (2014). An Economist’s Guide to Visualizing Data.
• Show the data
• Reveal content
• Avoid distorting data
• Present many numbers in a small space
• Encourage comparison of datasets
• Reveal data at several levels of detail
• Serve a clear purpose
• Be integrated with verbal and statistical descriptions of data
• Consistent style and format
• Concise and Comprehensible
(present only the information that is relevant and required to support the content –
“Who?”,“What?”,“How?”)
• Independent
(someone who has not read the document associated with the graphic should be
able to look at it and understand what it means)
• Referenced
http://abacus.bates.edu/~ganderso/biology/resources/writing/graphparts2003.gif http://abacus.bates.edu/~ganderso/biology/resources/writing/population_variation_table_gif.gif
Axis labels legible and easy to find and easily associated with the axis/object depicted
Labels on the graph should be clearly offset from the data or outside the axes
When appropriate the units of measurement should be displayed
Start the y-axis at zero when graphing numbers
http://www.owlnet.rice.edu/~labgroup/pdf/excelplot.pdf
Chart axis
Range of axis scale allows the full range of data to be included.
Data points properly spaced
Tick marks placed at sufficiently frequent intervals for a reader to work out accurately the value of each data point
Time shown on the x-axis, progressing from left to right, with equal time intervals
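One way to meet the axis points above (a range that includes all the data, a y-axis that starts at zero, and evenly spaced tick marks) is sketched below. It assumes non-negative values and a positive tick step chosen by the author; the function name `axis_ticks` is hypothetical.

```python
import math


def axis_ticks(values, step):
    """Return evenly spaced ticks from zero up to the first multiple of
    step that is at or above the largest value."""
    top = math.ceil(max(values) / step) * step
    n_intervals = round(top / step)
    return [i * step for i in range(n_intervals + 1)]


# Invented data values, for illustration only.
print(axis_ticks([3, 12, 27], step=5))
```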
http://www.owlnet.rice.edu/~labgroup/pdf/excelplot.pdf
Chart axis
Two or more datasets must be easily distinguished from one another
No more than 4 simultaneous symbols, values or lines, and each line or symbol should be sufficiently different from the others
Overlapping symbols or lines must be visually separable
Use vertical axes on the left and right sides of the graph to depict different scales when comparing datasets with different measurements
Chart type
O.P. Yakutina, T.V. Nechaeva, N.V. Smirnova, Consequences of snowmelt erosion: Soil fertility, productivity and quality of wheat on Greyzemic Phaeozem in the south of West Siberia, Agriculture, Ecosystems & Environment,
Volume 200, 1 February 2015, Pages 88-93, ISSN 0167-8809, http://dx.doi.org/10.1016/j.agee.2014.10.021.
Table number (unique number for each table to be easily identified)
Table title (self-explanatory, above the table)
Column headings (explain meaning of the data, including units of measurement)
Table notes (supplementary information, below the table)
Table source (references)
Excel table with Wagga Wagga climate data (September 2005 to August 2006) (Bureau of Meteorology)
Formatting the table to make it effective
Highlighted rows show the days in September 2005
where the rainfall was greater than evaporation
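The highlighting rule above (rainfall greater than evaporation) is a simple row filter. The sketch below uses invented numbers, not the Bureau of Meteorology data, and the field names `rainfall_mm` and `evaporation_mm` are assumptions made for illustration.

```python
def rows_to_highlight(rows):
    """Return the dates of rows where rainfall exceeds evaporation."""
    return [r["date"] for r in rows if r["rainfall_mm"] > r["evaporation_mm"]]


# Invented values for illustration; NOT the actual Wagga Wagga records.
rows = [
    {"date": "2005-09-01", "rainfall_mm": 0.0, "evaporation_mm": 3.2},
    {"date": "2005-09-02", "rainfall_mm": 8.4, "evaporation_mm": 2.0},
    {"date": "2005-09-03", "rainfall_mm": 1.0, "evaporation_mm": 1.0},
]
print(rows_to_highlight(rows))
```

In a spreadsheet the same rule would normally be applied with conditional formatting rather than code.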
• Figures and tables are numbered separately (e.g. Table 1, Table 2, Figure 1, Table
3, Figure 2)
• Figures and tables must always be properly referred to and captioned in the text.
This means that a figure or table must be mentioned in the text before the figure
or table appears
• Figure and table captions should be informative without being too long; if the
data is sourced from elsewhere, the source should be referenced in the caption.
Thomas et al., 2006
Figure captions go below the figure
Table captions go above the table
Graham S. Leonard, Carol Stewart, Thomas M. Wilson, Jonathan N. Procter, Bradley J. Scott, Harry J. Keys, Gill E. Jolly, Johnny B. Wardman, Shane J. Cronin, Sara K. McBride, Integrating multidisciplinary science, modelling and
impact data into evolving, syn-event volcanic hazard mapping and communication: A case study from the 2012 Tongariro eruption crisis, New Zealand, Journal of Volcanology and Geothermal Research, Volume 286, 1 October
2014, Pages 208-232, ISSN 0377-0273, http://dx.doi.org/10.1016/j.jvolgeores.2014.08.018.
• A title is not always needed when the chart is used as a figure in the text, because
the title is included as part of the caption (a stand-alone chart in a presentation
would normally have a title)
• Choose the right typeface style
• Avoid using different typefaces: use bold, italics, capitals, small caps or
contrasting colours to create contrast and emphasis
• If possible, avoid putting values on charts (if numbers are needed use tables
instead)
• Eliminate all redundant terms
• For tables, ensure that all values for the same variable have the same number of
decimal places and that the decimal points are aligned
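The table advice above (same number of decimals, aligned decimal points) can be sketched with Python string formatting. The function name `format_column` is hypothetical, and the alignment only holds in a monospaced font.

```python
def format_column(values, decimals=1):
    """Format numbers with a fixed number of decimals, right-aligned so
    that the decimal points line up."""
    texts = [f"{v:.{decimals}f}" for v in values]
    width = max(len(t) for t in texts)
    return [t.rjust(width) for t in texts]


# Invented values, for illustration only.
for cell in format_column([3.14159, 12.5, 100], decimals=2):
    print(cell)
```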
Use only 2D charts for 2D data (i.e. to plot two variables)
A 3D visual appearance distorts the data and prevents a clear interpretation
Use colour, but use it with caution; try to design your chart so that it does not rely on colour
If it reproduces well in black and white it will be able to be reproduced in any medium
Black and white design: make patterns in columns as contrasting as possible
Don’t make shading too gradual
Use the right resolution for your graphics
Resolution measured as number of dots per inch (dpi) or number of pixels (given as the width
and height of the image or as the total number of pixels in the image).
More pixels = higher resolution = larger file size
Different figure formats (*.jpg, *.tif, *.gif, *.png, *.eps): resolution-dependent (image quality
changes with compression) and resolution-independent (same quality even when we change
size)
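The relationship between print size, dpi and pixel count described above is simple arithmetic: pixels along one side = inches × dpi. A minimal sketch follows; the function name `pixels_needed` and the 6 × 4 inch, 300 dpi example are assumptions chosen for illustration.

```python
def pixels_needed(width_in, height_in, dpi):
    """Return (width in pixels, height in pixels, total pixels) for an
    image printed at the given size in inches and resolution in dpi."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px


# A hypothetical 6 x 4 inch figure printed at 300 dpi.
print(pixels_needed(6, 4, 300))
```

Doubling the dpi quadruples the total number of pixels, which is why file size grows quickly with resolution.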
Multivariate data refers to data that is measured for more than 2 variables (bivariate data refers to 2 variables)
Multivariate charts:
Scatterplot matrices (continuous data)
Mosaic plots (categorical data)
Example multivariate data
Displays the relationship among two or more categorical variables
Used for representing frequency tables (i.e. the number of times a data value occurs)
Example: Mortality rates aboard the Titanic vary for males and females. Among females, 67% survived (coded as 1) and 33% died (coded as 0).
https://medschool.vanderbilt.edu/cqs/files/cqs/media/DrTsai2_0.pdf
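A mosaic plot is drawn from a frequency table of two categorical variables. The sketch below builds such a table and the within-row percentages from a list of records. The records are invented toy data (not the Titanic data), and the function names are hypothetical.

```python
from collections import Counter


def frequency_table(records):
    """Count how often each (row category, column category) pair occurs."""
    return Counter(records)


def row_percentages(table):
    """Express each count as a percentage of its row-category total."""
    totals = Counter()
    for (row, _col), n in table.items():
        totals[row] += n
    return {(row, col): 100 * n / totals[row] for (row, col), n in table.items()}


# Invented toy records: (group, outcome). NOT real survival data.
records = (
    [("group A", "yes")] * 3 + [("group A", "no")] * 1
    + [("group B", "yes")] * 1 + [("group B", "no")] * 3
)
table = frequency_table(records)
for (group, outcome), pct in row_percentages(table).items():
    print(f"{group} / {outcome}: {table[(group, outcome)]} records, {pct:.0f}% of group")
```

In a mosaic plot, the row totals would set the width of each column of tiles and the within-row percentages would set the tile heights.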
• Multidimensional data refers to measurements of variables in more than 2
dimensions (2D normally refers to the Cartesian plot with x and y axes)
• Common multidimensional charts: 3D plots where time (t) and depth (z) are
frequently chosen as the third dimension
Next lecture:
What is spatial data?
Spatial data collection (GPS)
Go through Module 2 in
your Learning Modules
The information presented
here is important for
Assessment 2a) and 2b)