Survey Design II
James Neill, 2011
Visualisation of quantitative information
Introduction to research ComEdu Hons/Masters
Semester 1, 2011, University of Canberra, ACT, Australia
James T. Neill
http://en.wikiversity.org/wikiVisualisation_of_quantitative_information
Image source: http://commons.wikimedia.org/wiki/File:3D_Bar_Graph_Meeting.jpg
Image author: lumaxart, http://www.flickr.com/photos/lumaxart/2136954043/
Image license: Creative Commons Attribution Share Alike 2.0 unported, http://creativecommons.org/licenses/by-sa/2.0/deed.en
Description: Overviews levels of measurement and graphical approaches to analysis of univariate data.
Overview
Visualisation
Approaching data
Levels of measurement
Principles of graphing
Univariate graphs
Graphical integrity
Visualisation
Visualization is any technique
for creating images, diagrams, or animations to communicate a
message. - Wikipedia
Image source: http://en.wikipedia.org/wiki/File:FAE_visualization.jpg
License: Public domain
Is Pivot a turning point for web exploration?
(Gary Flake)
(TED talk - 6 min.)
For more examples of biased questions, see Nardi (2006), pp. 66-67.
Image source: http://commons.wikimedia.org/wiki/File:Parodyfilm.png
Image author: FRacco, http://commons.wikimedia.org/wiki/User:FRacco
Image license: Creative Commons Attribution 3.0 unported, http://creativecommons.org/licenses/by-sa/3.0/deed.en
Approaching data
Entering & screening
Exploring, describing, & graphing
Hypothesis testing
You are adding tools to your toolkit
Image: Clipart
Describing & graphing data
THE CHALLENGE:
to find a meaningful, accurate
way to depict the true story of the data
Get your fingers dirty with data
Image source: http://www.flickr.com/photos/analytik/1356366068/
By analytic http://www.flickr.com/photos/analytik/
License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
Get intimate with your data
Image source: http://www.flickr.com/photos/elmoalves/2932572231/
By Elmo Alves http://www.flickr.com/photos/elmoalves/
License: CC-A 2.0 http://creativecommons.org/licenses/by/2.0/deed.en
Clearly report the data's main features
Image source: http://www.flickr.com/photos/lloydm/2429991235/
By fakelvis http://www.flickr.com/photos/lloydm/
License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
Levels of Measurement = Type of Data
Stevens (1946)
Image source: Unknown.
Levels of measurement
Nominal / Categorical
Ordinal
Interval
Ratio
Nominal/Category - measures identify categories e.g., sex, ethnicity.
Ordinal - relative ordering of responses e.g., rankings in an exam
Interval - scores stand in a quantitative relationship to one another; adjacent scores are separated by an equal interval
Ratio - like interval but with a true zero value e.g., height, speed
Discrete vs. continuous
Discrete - - - - - - - - - -
Continuous ___________
Discrete data: finite options (e.g., labels)
Continuous data: infinite options (e.g., cms)
Discrete data is generally only whole numbers, whilst continuous data can have many decimals
Discrete: nominal, ordinal, interval
Continuous: ratio
Each level has the properties of the preceding
levels, plus something more!
Image source: Unknown.
Categorical / nominal
Conveys a category label
(Arbitrary) assignment of #s to categories
e.g. Gender
No useful information, except as labels
Ordinal / ranked scale
Conveys order, but not distance
e.g. in a race, 1st, 2nd, 3rd, etc. or ranking of favourites or preferences
Image: Cropped version of http://www.flickr.com/photos/beatkueng/1350250361/?addedcomment=1#comment72157605326099631
CC-by-A by Beat - http://www.flickr.com/photos/beatkueng/
Ordinal / ranked example:
Ranked importance
Rank the following aspects of the university according to what is most important to you (1 = most important through to 5 = least important)
__ Quality of the teaching and education
__ Quality of the social life
__ Quality of the campus
__ Quality of the administration
__ Quality of the university's reputation
Image source: L.N Fowler & Co. c. 1870.
Interval scale
Conveys order & distance
0 is arbitrary
e.g., temperature (degrees C)
Usually treat as continuous for > 5 intervals
Interval example:
8 point Likert scale
Image source: L.N Fowler & Co. c. 1870.
Ratio scale
Conveys order & distance
Continuous, with a meaningful 0 point
e.g. height, age, weight, time, number of times an event has occurred
Ratio statements can be made
e.g. X is twice as old (or high or heavy) as Y
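Why ratio statements need a meaningful zero can be sketched in a few lines of Python. The two temperatures are invented for illustration; the only outside fact used is the standard Celsius-to-kelvin offset of 273.15.

```python
# Illustrative values only: two temperatures measured on an interval scale.
cold_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference_c = warm_c - cold_c

# A naive ratio on the Celsius scale suggests "twice as hot" ...
naive_ratio = warm_c / cold_c

# ... but the Celsius zero is arbitrary. Kelvin has a true zero,
# so only the ratio of kelvin values supports a ratio statement.
def to_kelvin(celsius):
    return celsius + 273.15

true_ratio = to_kelvin(warm_c) / to_kelvin(cold_c)

print(f"Difference: {difference_c} degrees")
print(f"Naive Celsius ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio: {true_ratio:.3f}")
```

The difference of 10 degrees holds on either scale, but the "twice as hot" claim disappears once the zero point is meaningful.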
Ratio scale:
Time
Image source: L.N Fowler & Co. c. 1870.
Why do levels of measurement matter?
Different analytical procedures
are used for different
levels of data.
More powerful statistics can be applied to higher levels
Image source: Unknown.
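As a rough illustration of how the level of measurement limits the available statistics, the sketch below uses Python's standard `statistics` module. The variable names and values are hypothetical, and the pairing of statistic to level is one common convention rather than a fixed rule.

```python
import statistics

# Hypothetical survey responses, one list per level of measurement.
eye_colour = ["brown", "blue", "brown", "green", "brown"]   # nominal
finish_place = [1, 2, 3, 4, 5, 6, 7]                         # ordinal
temperature_c = [18.0, 21.0, 19.5, 22.0, 20.5]               # interval
height_cm = [160.0, 172.0, 168.0, 181.0, 175.0]              # ratio

# Nominal: only counting makes sense, so report the mode.
print("Mode:", statistics.mode(eye_colour))

# Ordinal: order is meaningful, so the median is also available.
print("Median rank:", statistics.median(finish_place))

# Interval: equal distances allow a mean and standard deviation.
print("Mean temperature:", statistics.mean(temperature_c))
print("SD temperature:", round(statistics.stdev(temperature_c), 2))

# Ratio: a true zero additionally allows ratio statements.
print("Tallest / shortest:", round(max(height_cm) / min(height_cm), 2))
```

Each level can also use the statistics listed for the levels before it, which is the "plus something more" idea in practice.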
Principles of graphing
Image source: http://www.flickr.com/photos/pagedooley/2121472112/
By Kevin Dooley http://www.flickr.com/photos/pagedooley/
License: CC-by-A 2.0 http://creativecommons.org/licenses/by/2.0/deed.en
Graphs
(Edward Tufte)
Visualise data
Reveal data
Describe
Explore
Tabulate
Decorate
Communicate complex ideas with clarity, precision, and efficiency
Tufte's graphing guidelines
Show the data
Avoid distortion
Focus on substance rather than method
Present many numbers in a small space
Make large data sets coherent
Tufte's graphing guidelines
Maximise the information-to-ink ratio
Encourage the eye to make comparisons
Reveal data at several levels/layers
Closely integrate with statistical and verbal descriptions
Graphing steps
Identify the purpose of the graph
Select which type of graph to use
Draw a graph
Modify the graph to be clear, non-distorting, and well-labelled.
Disseminate the graph (e.g., include it in a report)
Software for
data visualisation (graphing)
Statistical packages
e.g., SPSS
Spreadsheet packages
e.g., MS Excel
Word-processors
e.g., MS Word: Insert > Object > Microsoft Graph Chart
Univariate graphs
Univariate graphs
Bar graph
Pie chart
Data plot
Error bar
Stem & leaf plot
Box plot (Box & whisker)
Histogram
Bar chart (Bar graph)
Examine comparative heights of bars
X-axis: Collapse if too many categories
Y-axis: Count or % or mean?
Consider whether to use data labels
Pie chart
Use a bar chart instead
Hard to read
Does not show small differences
Rotation / position influences perception
Image source: Unknown
Data plot & error bar
Data plot
Error bar
Image source: Unknown.
This is a univariate precursor to a scatterplot (a plot of a ratio by ratio variable).
It works if there is a small amount of data; otherwise use a histogram to indicate the frequency within equal interval ranges.
From: http://www.physics.csbsju.edu/stats/display.distribution.html
Image source: Unknown.
Karl Pearson, in his 1893 letter to Nature, suggested that the moments about the mean could be used to measure the deviations of empirical distributions from the normal distribution.
Moments around the mean: http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
Histogram: At what age do you think you will die?
There is an outlier near zero which is minimising the positive skew; the data is also quite strongly leptokurtic.
Stem & leaf plot
Alternative to histogram
Use for ordinal, interval and ratio data
May look confusing to unfamiliar reader
Image source: Unknown.
A bit of a plug and plea for stem & leaf plots: they are underused. They are powerful because they are:
Efficient: e.g., they contain all the data succinctly, so others could use the data in a stem & leaf plot to do further analysis.
Visual and mathematical: As well as containing all the data, the stem & leaf plot presents a powerful, recognizable visual of the data, akin to a bar graph. Turning a stem & leaf plot 90 degrees counter-clockwise is recommended; this makes the visual display more conventional and easier to recognise, and the numbers are less obvious, hence emphasizing the visual histogram shape.
Contains actual data
Collapses tails
Stem & leaf plot
Frequency   Stem & Leaf
   7.00     1 . &
 192.00     1 . 22223333333
 541.00     1 . 444444444444444455555555555555
 610.00     1 . 6666666666666677777777777777777777
 849.00     1 . 88888888888888888888888888899999999999999999999
 614.00     2 . 0000000000000000111111111111111111
 602.00     2 . 222222222222222233333333333333333
 447.00     2 . 4444444444444455555555555
 291.00     2 . 66666666677777777
 240.00     2 . 88888889999999
 167.00     3 . 000001111
 146.00     3 . 22223333
 153.00     3 . 44445555
 118.00     3 . 666777
  99.00     3 . 888999
 106.00     4 . 000111
  54.00     4 . 222
 339.00     Extremes (>=43)
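The construction of a stem & leaf plot can be sketched in Python as follows. This is a simplified version, not the routine behind the output above: it assumes non-negative whole numbers, uses the tens as the stem and the units as the leaf, and prints one leaf per case. The function name and the ages are invented for the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative whole numbers.

    The stem is the tens part and the leaf is the units digit.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(str(leaf))
    # Include empty stems so gaps in the distribution stay visible.
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(rows.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines

# Hypothetical ages, invented for this example.
ages = [18, 19, 19, 20, 21, 21, 22, 24, 25, 27, 31, 33, 45]
for line in stem_and_leaf(ages):
    print(line)
```

Because every leaf is an actual digit from the data, the original values can be read back from the plot, which is the "contains actual data" advantage.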
Box plot
(Box & whisker)
Useful for interval and ratio data
Represents min., max, median, quartiles, & outliers
Alternative to histogram
Useful for screening
Useful for comparing variables
Can get messy - too much info
Confusing to unfamiliar reader
Box plot
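The numbers behind a box plot can be computed with Python's standard `statistics` module, as in the sketch below. Two choices here are assumptions rather than anything stated in the slides: the "inclusive" quartile method, and the common 1.5 x IQR rule for flagging outliers. Statistical packages may use slightly different definitions. The scores are hypothetical.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot is drawn from."""
    data = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }

# Hypothetical scores, with one unusually high value.
scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]
summary = box_plot_summary(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Running the same function on several variables gives a quick way to compare them, which is where box plots are most useful for screening.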
Histogram
For continuous data
X-axis needs a happy medium for # of categories
Y-axis matters (can exaggerate)
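The "happy medium" for the number of categories can be explored with a small text histogram. The sketch below is illustrative only: the heights are invented, the function names are hypothetical, and bins with no cases are simply omitted rather than drawn as empty.

```python
def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each equal-width bin."""
    counts = {}
    for value in values:
        # Lower edge of the bin that contains this value.
        lower = start + ((value - start) // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))

def print_histogram(values, bin_width):
    # Each row is labelled lower-upper; the upper edge is exclusive.
    for lower, count in histogram_counts(values, bin_width).items():
        upper = lower + bin_width
        print(f"{lower:>3}-{upper:<3} {'#' * count}")

# Hypothetical heights in cm, invented for this example.
heights = [152, 158, 161, 163, 165, 166, 168, 170, 171, 173, 175, 178, 181, 188]

print("Bin width 5:")
print_histogram(heights, 5)
print("Bin width 20:")
print_histogram(heights, 20)
```

With a width of 5 the shape of the distribution is visible; with a width of 20 nearly everything lands in one bar and the shape is lost.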
Histogram of male & female heights
Image source: Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.
DV = height (ratio)
IV = Gender (categorical)
Non-normal distributions
Image source: Unknown.
The significance tests for skewness / kurtosis are sensitive to sample size, so with a small sample they are less likely to be significant than with a large sample.
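Skewness and kurtosis can be computed from moments about the mean, as sketched below. These are the simple population formulas (dividing by n); statistical packages typically apply small-sample adjustments, so their values will differ slightly. The function names and data are hypothetical.

```python
def central_moment(values, order):
    """Return the moment of the given order about the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    """Third standardised moment: 0 for a symmetric distribution."""
    m2 = central_moment(values, 2)
    return central_moment(values, 3) / m2 ** 1.5

def excess_kurtosis(values):
    """Fourth standardised moment minus 3: 0 for a normal distribution."""
    m2 = central_moment(values, 2)
    return central_moment(values, 4) / m2 ** 2 - 3

# Hypothetical data: symmetric versus positively skewed (long right tail).
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print("Symmetric skew:", round(skewness(symmetric), 3))
print("Right-tailed skew:", round(skewness(right_tailed), 3))
print("Right-tailed excess kurtosis:", round(excess_kurtosis(right_tailed), 3))
```

A positive skewness value indicates a longer right tail; a positive excess kurtosis indicates a more peaked, heavier-tailed (leptokurtic) shape than the normal distribution.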
Histogram of weight
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
Roughly normal, with positive skew
Histogram of daily calorie intake
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
Bimodal
Histogram of fertility
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
Bimodal, with positive skew
Example normal distribution 1
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
At what age do you think you will die?
There is an outlier near zero which is minimising the positive skew; the data is also leptokurtic.
Example normal distribution 2
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
This distribution is bi-modal. It should not be treated as normal.
In fact, if one looks more closely, it would make sense to break down the distribution by gender.
From the Quick Fun Survey data in Tutorial 1.
Example normal distribution 2
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
This is the distribution for males; it has a ceiling effect, with "Very feminine" not being selected at all (and so not shown on the graph, although it should be).
It is negatively skewed and probably leptokurtic. Note, though, that because "Very feminine" has no cases and is not shown, the population data would probably be even more skewed than this sample indicates.
Effects of skew on measures of central tendency
Image source: Unknown.
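The way skew separates the mean, median and mode can be checked with a small example. The data below are hypothetical and deliberately positively skewed; the ordering shown is the typical pattern for such data, not a guarantee for every data set.

```python
import statistics

# Hypothetical, positively skewed data: most values are low,
# with a few high values stretching the right tail.
incomes = [20, 20, 20, 25, 25, 30, 35, 40, 60, 125]

mode = statistics.mode(incomes)
median = statistics.median(incomes)
mean = statistics.mean(incomes)

print(f"mode={mode}, median={median}, mean={mean}")

# In a positively skewed distribution the tail pulls the mean furthest,
# the median less, and the mode not at all.
print("mode < median < mean:", mode < median < mean)
```

For negatively skewed data the same reasoning runs in the other direction, with the mean pulled below the median.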
Line graph
Alternative to histogram
Implies continuity e.g., time
Can show multiple lines
Summary: Graphs & levels of measurement
(N = nominal, O = ordinal, I = interval, R = ratio)
Bar chart & pie chart: NOI
Histogram: IR
Stem & leaf: IR
Data plot & box plot: IR
Error-bar: IR
Line graph: IR
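The summary can also be written as a small lookup, sketched below in Python. The level codes are transcribed from the summary (N = nominal, O = ordinal, I = interval, R = ratio); the dictionary and function names are hypothetical.

```python
# Levels of measurement each graph type suits, transcribed from the summary.
# N = nominal, O = ordinal, I = interval, R = ratio.
GRAPH_LEVELS = {
    "Bar chart": "NOI",
    "Pie chart": "NOI",
    "Histogram": "IR",
    "Stem & leaf": "IR",
    "Data plot": "IR",
    "Box plot": "IR",
    "Error-bar": "IR",
    "Line graph": "IR",
}

def graphs_for(level):
    """Return the graph types listed for a level code: N, O, I, or R."""
    level = level.upper()
    if len(level) != 1 or level not in "NOIR":
        raise ValueError("level must be one of N, O, I, R")
    return [name for name, levels in GRAPH_LEVELS.items() if level in levels]

print("Ordinal:", graphs_for("O"))
print("Ratio:", graphs_for("R"))
```

Treat the result as a starting list of candidates; the purpose of the graph still decides which one to draw.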
Graphical integrity
(part of academic integrity)
Image source: Unknown.
Graphs can be like a bikini. What they reveal is suggestive,
but what they conceal is vital.
(after Aaron Levenstein)
Image source: http://www.flickr.com/photos/alosojos/350530627/By FranUlloa, http://www.flickr.com/people/alosojos/License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
Graphical integrity
"Like good writing, good graphical displays of data communicate
ideas with clarity, precision, and efficiency. Like poor writing,
bad graphical displays distort or obscure the data, make it harder
to understand or compare, or otherwise thwart the communicative
effect which the graph should convey."
Michael Friendly, Gallery of Data Visualisation
Tufte, Edward R., The Visual Display of Quantitative Information, 1983
Cleveland's hierarchy
Image source: http://www.processtrends.com/TOC_data_visualization.htm
License: Unknown
Cleveland (1984) conducted experiments to measure people's accuracy in interpreting graphs, with findings as follows (Robbins):
Cleveland's hierarchy:
Best to worst
Position along a common scale
Position along identical, non-aligned scales
Length
Angle-slope
Area
Volume
Color hue - color saturation - density
Image source: Cleveland, William S., Elements of Graphing Data, 1985
Tufte's graphical integrity
Some lapses intentional, some not
Lie Factor = (size of effect in graph) / (size of effect in data)
Misleading uses of area
Misleading uses of perspective
Leaving out important context
Lack of taste and aesthetics
Tufte, Edward R., The Visual Display of Quantitative Information, 1983
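The Lie Factor can be worked through in a few lines of Python. The sketch assumes "size of effect" means the relative change between two values, and the chart measurements are invented for illustration; the function names are hypothetical.

```python
def relative_change(first, second):
    """Size of effect: change from first to second, relative to first."""
    return (second - first) / first

def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return (relative_change(graphic_first, graphic_second)
            / relative_change(data_first, data_second))

# Hypothetical chart: sales rise from 100 to 150 units (a 50% increase),
# but the bars are drawn 2 cm and 6 cm tall (a 200% increase).
factor = lie_factor(graphic_first=2.0, graphic_second=6.0,
                    data_first=100.0, data_second=150.0)
print(f"Lie factor: {factor:.1f}")
```

A value close to 1 means the graphic represents the data faithfully; the further above 1, the more the graphic exaggerates the effect.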
Review questions
If a survey question produces a floor effect, where will the mean, median and mode lie in relation to one another?
Over the last century, the performance of the best baseball hitters has declined. Does this imply that the overall performance of baseball batters has decreased?
OVERHEAD p.84 Bryman & Duncan (1997)
Can you complete this table?
Level | Properties | Examples | Descriptive Statistics | Graphs
Nominal / Categorical | | | |
Ordinal / Rank | | | |
Interval | | | |
Ratio | | | |
Answers: http://wilderdom.com/research/Summary_Levels_Measurement.html
Links
Presenting Data Statistics Glossary v1.1 - http://www.cas.lancs.ac.uk/glossary_v1.1/presdata.html
A Periodic Table of Visualisation Methods - http://www.visual-literacy.org/periodic_table/periodic_table.html
Gallery of Data Visualization - http://www.math.yorku.ca/SCS/Gallery/
Univariate Data Analysis The Best & Worst of Statistical Graphs - http://www.csulb.edu/~msaintg/ppa696/696uni.htm
Pitfalls of Data Analysis - http://www.vims.edu/~david/pitfalls/pitfalls.htm
Statistics for the Life Sciences - http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/Handouts.html
Visualizing Quantitative Data, Tufte E. R., Graphics Press, 2001
Graphical Methods for Data Analysis, Chambers J., Cleveland, B. Kleiner, and P. Tukey, Duxbury Press, Boston, 1983
Exploratory Data Analysis, Tukey J., Addison-Wesley Pub Co., 1977
References
Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth.
Jones, G. E. (2006). How to lie with charts. Santa Monica, CA: LaPuerta.
Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Open Office Impress
This presentation was made using Open Office Impress.
Free and open source software.
http://www.openoffice.org/product/impress.html