ENV 2006 2.1 Envisioning Information Lecture 2 Simple Graphs and Charts Ken Brodlie School of Computing University of Leeds

ENV 2006 2.1

Envisioning Information

Lecture 2

Simple Graphs and Charts

Ken Brodlie
School of Computing
University of Leeds

ENV 2006 2.2

Lecture Outline

• Preliminaries
  – Definitions
  – Datatypes

• Simple Data Presentation
  – Graphs and charts

ENV 2006 2.3

Fundamentals

• Basic Datatypes correspond to different levels of measurement

• Data can be:
  – Categorical – labels
  – Numerical – numbers

• Categorical
  – Nominal
    • No sense of order
    • Apples, oranges, …
  – Ordinal
    • Ordered in sequence
    • January, February, …

• Numerical
  – Continuous
    • Real numbers
    • Height of students in class
  – Discrete
    • Typically whole numbers
    • Marks in an exam
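
The four datatypes above can be mirrored in code. The following is a minimal sketch in Python, not part of the original lecture; the class names, the helper `can_order` and all values are invented for illustration. It shows the practical difference between nominal and ordinal data: only ordinal labels support "less than".

```python
from enum import Enum, IntEnum


class Fruit(Enum):
    """Nominal: labels with no sense of order."""
    APPLE = "apple"
    ORANGE = "orange"


class Month(IntEnum):
    """Ordinal: labels that are ordered in sequence."""
    JANUARY = 1
    FEBRUARY = 2
    MARCH = 3


# Numerical data (hypothetical values).
heights_m = [1.62, 1.75, 1.81]   # continuous: real numbers
exam_marks = [55, 67, 72]        # discrete: whole numbers


def can_order(a, b):
    """Return True if the comparison a < b is defined for these two values."""
    try:
        a < b
        return True
    except TypeError:
        return False


print(can_order(Month.JANUARY, Month.FEBRUARY))  # ordinal labels can be ranked
print(can_order(Fruit.APPLE, Fruit.ORANGE))      # nominal labels cannot
```

A plain `Enum` deliberately defines no ordering, which is why it is used here for the nominal case.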

ENV 2006 2.4

Question

• Give an example for each class in which numbers are involved…

• Categorical - nominal

• Categorical - ordinal

• Numerical – continuous

• Numerical - discrete

ENV 2006 2.5

Exploratory Data Analysis

• Pioneering figure is John Tukey

• New approach to data analysis, heavily based on visualization, as an alternative to classical data analysis

• See Wikipedia

• Two-stage process:
  – Exploratory: search for evidence using all tools available
  – Confirmatory: evaluate strength of evidence using classical data analysis

ENV 2006 2.6

Simple Data Presentation

ENV 2006 2.7

Simple Data Presentation

• Simple data tables are often presented as line graphs, bar graphs, pie charts, dot graphs, histograms…

• Which should we use and when?

ENV 2006 2.8

Line Graph

• Fundamental technique of data presentation

• Used to compare two variables

– X-axis is often the control variable

– Y-axis is the response variable

• Good at:
  – Showing specific values
  – Trends
  – Trends in groups (using multiple line graphs)

Students participating in sporting activities

Mobile phone use

Note: graph labelling is fundamental. Any critical comments here?

ENV 2006 2.9

Simple Representations – Bar Graph

• Bar graph
  – Presents categorical variables
  – Height of bar indicates value
  – Double bar graph allows comparison
  – Note spacing between bars
  – Can be horizontal (when would you use this?)

Internet use at a school

Number of police officers

Note more space for labels

ENV 2006 2.10

Dot Graph

• Very simple but effective…

• Horizontal to give more space for labelling
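
The point about horizontal layout can be seen even in plain text. The sketch below is an illustration only, not taken from the lecture: the function name `dot_graph` and the sample values are hypothetical. It draws one row per category, so each label gets as much horizontal room as it needs.

```python
def dot_graph(data, width=40):
    """Return the lines of a horizontal text dot graph for {label: value}.

    Assumes every value is positive.
    """
    label_width = max(len(label) for label in data)
    top = max(data.values())
    lines = []
    for label, value in data.items():
        # Place the dot along a track of `width` characters, scaled to the largest value.
        pos = round(value / top * (width - 1))
        track = "." * pos + "o"
        lines.append(f"{label:<{label_width}} |{track} {value}")
    return lines


# Hypothetical values, invented purely for illustration.
sample = {"Walking to school": 12, "Cycling": 30, "Bus": 24}
for line in dot_graph(sample, width=20):
    print(line)
```

Because the labels sit to the left of the track, long category names do not collide, which is harder to achieve under a vertical bar.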

ENV 2006 2.11

Pie Chart

• Pie chart summarises a set of categorical/nominal data

• But use with care…

• … with too many segments, the values are harder to compare than in a bar chart

Should we have a long lecture?

Favourite movie genres
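
To see why many segments are hard to compare, it may help to compute the angle each category receives. This is a minimal sketch with invented survey counts (they are not the data behind the slide's figure), and `pie_angles` is a hypothetical helper name.

```python
def pie_angles(counts):
    """Convert {label: count} into {label: angle in degrees} for a pie chart."""
    total = sum(counts.values())
    return {label: 360.0 * n / total for label, n in counts.items()}


# Hypothetical counts, invented for illustration.
genres = {"Comedy": 4, "Action": 5, "Romance": 6, "Drama": 1, "SciFi": 4}
angles = pie_angles(genres)

for label, angle in angles.items():
    print(f"{label:<8} {angle:6.1f} degrees")
```

With these counts, a difference of one vote changes a segment by only 18 degrees, and the reader has to judge that from angles rather than from bar heights on a common baseline.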

ENV 2006 2.12

Histograms

• Histograms summarise discrete or continuous data that are measured on an interval scale

• No gaps if variable is continuous

Distribution of salaries in a company
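
The counting behind a histogram can be sketched in a few lines. The example below assumes equal-width bins over a chosen range; the function name `histogram` and the salary figures are invented for illustration and are not the data in the slide's figure.

```python
def histogram(values, low, high, bins):
    """Count values into `bins` equal-width intervals covering [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        if not low <= v <= high:
            continue  # ignore values outside the chosen range
        # The top edge is assigned to the last bin, so the bins tile the range without gaps.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return counts


# Hypothetical salaries in thousands, invented for illustration.
salaries = [18, 22, 25, 27, 31, 34, 38, 45, 52, 70]
counts = histogram(salaries, low=10, high=70, bins=3)

for i, n in enumerate(counts):
    start = 10 + i * 20
    print(f"{start}-{start + 20}: {'#' * n}")
```

Each bin is a half-open interval that meets the next one exactly, which is the reason a histogram of a continuous variable is drawn with no gaps between the bars.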

ENV 2006 2.13

Scatter Plot

• Used to present measurements of two variables

• Effective if a relationship exists between the two variables

Car ownership by household income

Example taken from NIST Handbook – evidence of strong positive correlation
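
The strength of a linear relationship seen in a scatter plot is commonly summarised by the Pearson correlation coefficient. The sketch below computes it from first principles; the data are invented for illustration and are not the NIST or car-ownership data shown on the slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: household income (thousands) and cars per household.
income = [10, 20, 30, 40, 50]
cars = [0.5, 0.9, 1.4, 1.6, 2.1]

r = pearson_r(income, cars)
print(f"r = {r:.3f}")  # values near +1 indicate a strong positive correlation
```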

ENV 2006 2.14

Scatter Plots in Excel

• The scatter plot is a fundamental tool in Excel

• Chart type XY (Scatter) and subtype Unconnected Points

http://www2.ncsu.edu:8010/ncsu/chemistry/resource/excel/excel.html

ENV 2006 2.15

Regression Line

• Excel allows you to add a linear regression line (trend line)

Remember: correlation does not imply causality… i.e. a relationship exists, but one variable is not necessarily causing the other – there may be a third factor.
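
A linear trend line is usually an ordinary least-squares fit. The sketch below shows that calculation on invented data; it is an illustration of the standard method, not a description of Excel's internals, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
# Residuals: vertical distances between each data point and the fitted line.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

print(f"y = {slope:.2f} * x + {intercept:.2f}")
```

The residuals of a least-squares line with an intercept always sum to zero (up to rounding), so it is their pattern, not their total, that shows whether a straight line is a reasonable model.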

ENV 2006 2.16

Tukey Sum-Difference Plot

Better understanding of residuals …
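
The slide does not spell out the construction, so the following is an assumption: in the usual form of a Tukey sum-difference (or mean-difference) plot, each pair of values (a, b) is replotted as the point (a + b, b − a). This amounts to rotating the scatter plot by 45 degrees, so departures from the line b = a appear as vertical distances from zero. A minimal sketch with invented numbers and a hypothetical helper name:

```python
def sum_difference(a, b):
    """Map paired values (a_i, b_i) to (sum, difference) = (a_i + b_i, b_i - a_i)."""
    return [(x + y, y - x) for x, y in zip(a, b)]


# Hypothetical fitted and observed values, invented for illustration.
fitted = [2, 4, 6, 8, 10]
observed = [3, 4, 5, 9, 10]

for total, diff in sum_difference(fitted, observed):
    print(f"sum = {total:>2}  difference = {diff:+d}")
```

Plotting the second coordinate against the first puts the differences on their own axis, where small residuals are not dwarfed by the overall range of the data.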

ENV 2006 2.17

Box Plots

• In some situations we have, not a single data value at a point, but a number of data values, or even a probability distribution

• When might this occur?

• Tukey proposed the idea of a boxplot to visualize the distribution of values

• For explanation and some history, see:

http://mathworld.wolfram.com/Box-and-WhiskerPlot.html

http://en.wikipedia.org/wiki/Box_plot

M – median
Q1, Q3 – quartiles
Whiskers – extend to the most extreme values within 1.5 × interquartile range of the quartiles
Dots – outliers

http://www.upscale.utoronto.ca/GeneralInterest/Harrison/Visualisation/Visualisation.html

Darwin’s plant study
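
The quantities drawn in a box plot can be computed directly. The sketch below follows the 1.5 × interquartile range convention given above; the data are invented (not Darwin's measurements), the function name is hypothetical, and the quartile method is an assumption, since different tools use slightly different quartile definitions.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Median, quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    # "inclusive" treats the data as the whole population; other conventions exist.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data values that lie inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical measurements, invented for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
stats = box_plot_stats(data)
for name, value in stats.items():
    print(f"{name}: {value}")
```

Note that the whiskers end at actual data values rather than at the fences themselves, and anything beyond the fences is drawn as a separate dot.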

ENV 2006 2.18

Acknowledgement

• Thanks to Statistics Canada – an excellent web site for simple data presentation
  – http://www.statcan.ca/english/edu/power/toc/contents.htm

ENV 2006 2.19

Exercise for next week

• Understand a bit more about the merits of pie charts and bar graphs

• Create a dataset with roughly equal numbers in each class

• Which is best if the task is to discriminate?

ENV 2006 2.20

Exercise for next week

• Over the next week look for examples of basic graphs
  – In newspapers, magazines or other print media
  – On news web sites or other electronic media

• Analyse two examples
  – One should be an example where you think the use of graphics is good
  – One should be bad

• Be ready next week to present these results to the class…

ENV 2006 2.21

Envisioning Information : Practical Work

Gnuplot

R

Excel