Hadley Wickham Stat405 Graphical theory & critique Tuesday, 19 October 2010

Page 1: 16 critique

Hadley Wickham

Stat405: Graphical theory & critique

Tuesday, 19 October 2010

Page 2: 16 critique

Project

• Generally excellent

• Common problems: lack of proofreading, lack of flow

• If you’re going to throw away 85% of the data, I want to know how that differs from the data that you kept
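
One way to answer that question is to flag the rows you keep and compare the two groups on variables the filter did not use. The sketch below is only an illustration: the `diamonds` data and the `carat >= 1.2` cut-off are assumptions chosen for the example, not the filter from any particular project.

```r
library(ggplot2)  # used here for the diamonds data set and the plot

# Hypothetical filter: keep only the larger stones
d <- as.data.frame(diamonds)
d$kept <- d$carat >= 1.2

# What share of the data survives the filter?
share_kept <- mean(d$kept)
print(share_kept)

# Compare kept and discarded rows on variables the filter did not use
comparison <- aggregate(cbind(price, depth) ~ kept, data = d, FUN = median)
print(comparison)

# The same comparison as a picture; print(p) draws it
p <- ggplot(d, aes(depth, colour = kept)) + geom_density()
```

If the two groups look alike on the other variables, discarding data is easier to defend; if they differ, the write-up should say how.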

Page 3: 16 critique

Project

• Don’t forget to set up a meeting time with me this week

• (I’ll be travelling from Saturday until when the project is due, so if you can’t meet with me this week, email Garrett to set up a time)

Page 5: 16 critique

Exploratory graphics

Are for you (not others). Need to be able to create rapidly because your first attempt will never be the most revealing.

Iteration is crucial for developing the best display of your data.

Gives rise to two key questions:

Page 6: 16 critique

What should I plot?

How can I plot it?

Page 7: 16 critique

Two general tools

Plot critique toolkit: “graphics are like pumpkin pie”

Theory behind ggplot2: “A layered grammar of graphics”

plus lots of practice...

Page 8: 16 critique

Graphics are like pumpkin pie

The four C’s of critiquing a graphic

Page 9: 16 critique

Content

Page 10: 16 critique

Construction

Page 11: 16 critique

Context

Page 12: 16 critique

Consumption

Page 13: 16 critique

Content

What data (variables) does the graph display?

What non-data is present?

What is pumpkin (essence of the graphic) vs what is spice (useful additional info)?

Page 14: 16 critique

Your turn

Identify the data and non-data on “Napoleon's march” and “Building an electoral victory”. Which features are the most important? Which are just useful background information?

Page 15: 16 critique

Results

Minard’s march: (top) latitude, longitude, number of troops, direction, branch, city name; (bottom) longitude, temperature, date

Building an electoral victory: state, number of electoral college votes, winner, margin of victory

Page 16: 16 critique

Construction

How many layers are on the plot?

What data does each layer display?

What sort of geometric object does it use?

Is it a summary of the raw data?

How are variables mapped to aesthetics?
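
For a plot that already exists as a ggplot2 object, some of these questions can be answered by inspecting it. The sketch below assumes a reasonably recent ggplot2 (for `layer_data()`) and uses the `mpg` data purely as an example: it counts the layers and compares the number of rows each layer draws with the number of rows in the raw data.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm")

# How many layers are on the plot?
n_layers <- length(p$layers)
print(n_layers)

# Is each layer the raw data or a summary of it?
# A layer that draws one row per observation is showing raw data.
rows_per_layer <- sapply(seq_len(n_layers), function(i) nrow(layer_data(p, i)))
print(rows_per_layer)
print(nrow(mpg))
```

The point layer draws one row per car, while the smooth layer draws a much shorter table of fitted values, which marks it as a statistical summary.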

Page 17: 16 critique

Perceptual mapping

Best

1. Position along a common scale
2. Position along nonaligned scale
3. Length
4. Angle/slope
5. Area
6. Volume
7. Colour

Worst

For continuous variables only!
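
To get a feel for the ranking, it can help to draw the same continuous variable three ways. The sketch below is an illustration under assumed choices (mean highway mileage per class from `mpg`): it maps the value to position, to area, and to colour, so the three encodings can be compared side by side.

```r
library(ggplot2)

# One continuous value per class: mean highway mileage
d <- aggregate(hwy ~ class, data = mpg, FUN = mean)

# 1. Position along a common scale (top of the ranking)
by_position <- ggplot(d, aes(class, hwy)) + geom_point()

# 5. Area: point area proportional to the value
by_area <- ggplot(d, aes(class, 1, size = hwy)) +
  geom_point() +
  scale_size_area()

# 7. Colour (bottom of the ranking)
by_colour <- ggplot(d, aes(class, 1, colour = hwy)) + geom_point(size = 5)

# print(by_position), print(by_area) and print(by_colour) draw the plots
```

Small differences between classes should be easy to read from the first plot and progressively harder to judge in the second and third.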

Page 18: 16 critique

Your turn

Answer the following questions for “Napoleon's march” and “Flight delays”:

How many layers are on the plot?

What data does each layer display? How does it display it?

Page 19: 16 critique

Results

Napoleon’s march: (top) (1) path plot with width mapped to number of troops, colour to direction, separate group for each branch, (2) labels giving city names; (bottom) (1) line plot with longitude on x-axis and temperature on y-axis, (2) text labels giving dates

Flight delays: (1) white circles showing 100% cancellation, (2) outline of states, (3) points with size proportional to percent cancellations at each airport.

Page 20: 16 critique

We can explain the composition of a graphic in words, but how do we create it?

Page 21: 16 critique

“If any number of magnitudes are each the same multiple of the same number of other magnitudes, then the sum is that multiple of the sum.” Euclid, ~300 BC

Page 22: 16 critique

In modern notation, the same statement is:

m(Σx) = Σ(mx)

Page 23: 16 critique

The grammar of graphics

An abstraction which makes thinking about, reasoning about and communicating graphics easier.

Developed by Leland Wilkinson, particularly in “The Grammar of Graphics” 1999/2005

You’ve been using it in ggplot2 without knowing it! But to do more, you need to learn more about the theory.

Page 24: 16 critique

What is a layer?

• Data

• Mappings from variables to aesthetics (aes)

• A geometric object (geom)

• A statistical transformation (stat)

• A position adjustment (position)
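
The five components can be read back from a layer object. This is a rough sketch that relies on how current ggplot2 versions store layers (fields such as `$geom` and `$stat` are an implementation detail and an assumption here, not part of the grammar itself).

```r
library(ggplot2)

l <- geom_point(aes(displ, hwy), data = mpg)

# Data: the data frame attached to the layer
n_rows <- nrow(l$data)

# Mappings from variables to aesthetics
aesthetics <- names(l$mapping)

# Geom, stat and position adjustment
components <- c(
  geom     = class(l$geom)[1],
  stat     = class(l$stat)[1],
  position = class(l$position)[1]
)

print(n_rows)
print(aesthetics)
print(components)
```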

Page 25: 16 critique

layer(geom, stat, position, data, mapping, ...)

layer(
  data = mpg,
  mapping = aes(x = displ, y = hwy),
  geom = "point",
  stat = "identity",
  position = "identity"
)

layer(
  data = diamonds,
  mapping = aes(x = carat),
  geom = "bar",
  stat = "bin",
  position = "stack"
)

Page 26: 16 critique

# A lot of typing!

layer(
  data = mpg,
  mapping = aes(x = displ, y = hwy),
  geom = "point",
  stat = "identity",
  position = "identity"
)

# Every geom has an associated default statistic
# (and vice versa), and position adjustment.

geom_point(aes(displ, hwy), data = mpg)

geom_histogram(aes(displ), data = mpg)
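
A quick way to convince yourself that the shortcut fills in the same defaults is to build the layer both ways and compare. The sketch below assumes the current `layer()` signature and the internal layer fields of recent ggplot2 versions; the helper `describe()` is made up for this example.

```r
library(ggplot2)

# Shortcut: stat and position are filled in by default
shorthand <- geom_histogram(aes(displ), data = mpg)

# The same layer written out in full
explicit <- layer(
  data = mpg,
  mapping = aes(x = displ),
  geom = "bar",
  stat = "bin",
  position = "stack"
)

# Hypothetical helper: name the geom, stat and position of a layer
describe <- function(l) {
  c(geom = class(l$geom)[1],
    stat = class(l$stat)[1],
    position = class(l$position)[1])
}

same_components <- identical(describe(shorthand), describe(explicit))
print(describe(shorthand))
print(same_components)
```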

Page 27: 16 critique

# To actually create the plot
ggplot() +
  geom_point(aes(displ, hwy), data = mpg)

ggplot() +
  geom_histogram(aes(displ), data = mpg)

Page 28: 16 critique

# Multiple layers
ggplot() +
  geom_point(aes(displ, hwy), data = mpg) +
  geom_smooth(aes(displ, hwy), data = mpg)

# Avoid redundancy:
ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth()
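
The two spellings should describe the same plot, because layers without their own data and mapping inherit them from `ggplot()`. As a rough check, assuming `layer_data()` from a recent ggplot2, the data computed for the point layer can be compared directly:

```r
library(ggplot2)

# Data and mapping repeated in the layer
explicit <- ggplot() +
  geom_point(aes(displ, hwy), data = mpg)

# Data and mapping given once and inherited by the layer
inherited <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Compare what each version would actually draw
same_data <- isTRUE(all.equal(layer_data(explicit, 1), layer_data(inherited, 1)))
print(same_data)
```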

Page 29: 16 critique

# Different layers can have different aesthetics
ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth()

ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(aes(group = class), method = "lm", se = FALSE)
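
To see what the layer-specific aesthetics do, it can help to look at the data behind the smooth layer. In this sketch (assuming `layer_data()` from a recent ggplot2), `group = class` should produce one fitted line per class, while the lines share a single colour because colour is mapped only in the point layer.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(aes(group = class), method = "lm", se = FALSE)

smooth_data <- layer_data(p, 2)

# One fitted line per class
n_lines <- length(unique(smooth_data$group))

# Colour was not mapped in this layer, so every line uses the default
n_colours <- length(unique(smooth_data$colour))

print(n_lines)
print(n_colours)
```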

Page 30: 16 critique

Your turn

For each of the following plots created with qplot, write the equivalent ggplot code.

qplot(price, carat, data = diamonds)

qplot(hwy, cty, data = mpg, geom = "jitter")

qplot(reorder(class, hwy), hwy, data = mpg, geom = c("jitter", "boxplot"))

qplot(log10(price), log10(carat), data = diamonds, colour = color) + geom_smooth(method = "lm")

Page 31: 16 critique

ggplot(diamonds, aes(price, carat)) + geom_point()

ggplot(mpg, aes(hwy, cty)) + geom_jitter()

ggplot(mpg, aes(reorder(class, hwy), hwy)) + geom_jitter() + geom_boxplot()

ggplot(diamonds, aes(log10(price), log10(carat), colour = color)) + geom_point() + geom_smooth(method = "lm")
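
A translation can be checked by comparing the two plot objects rather than eyeballing them. The sketch below does this for the first exercise only; it assumes a ggplot2 version in which `qplot()` is still available (it is deprecated in recent releases, hence the `suppressWarnings()`).

```r
library(ggplot2)

# qplot version (deprecated in recent ggplot2, so silence the warning)
qp <- suppressWarnings(qplot(price, carat, data = diamonds))

# ggplot version
gp <- ggplot(diamonds, aes(price, carat)) + geom_point()

# Same geom, and the same number of rows drawn?
same_geom <- identical(class(qp$layers[[1]]$geom)[1], class(gp$layers[[1]]$geom)[1])
same_rows <- nrow(layer_data(qp, 1)) == nrow(layer_data(gp, 1))

print(same_geom)
print(same_rows)
```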

Page 32: 16 critique

More geoms & stats

See http://had.co.nz/ggplot2 for complete list with helpful icons:

Geoms: (0d) point, (1d) line, path, (2d) boxplot, bar, tile, text, polygon

Stats: bin, summary, sum
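
As one example of pairing a stat with a geom, the summary stat can overlay group means on the raw points. This is a minimal sketch assuming ggplot2 3.3.0 or later, where `stat_summary()` takes a `fun` argument; the choice of `mpg`, `class` and `hwy` is only for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) +
  # Layer 1: raw data
  geom_point(colour = "grey60") +
  # Layer 2: one summary point per class (the mean of hwy)
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)

# The summary layer holds one row per class
means <- layer_data(p, 2)
print(means[, c("x", "y")])
```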

Page 33: 16 critique

Your turn

Go back to the descriptions of “Minard’s march” and “Flight delays” that you created before. Start converting your textual description to ggplot2 code.
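
As a starting point for the top panel of Minard’s march, here is one possible sketch, not the course solution. It assumes the HistData package is installed and that its `Minard.troops` and `Minard.cities` data sets have the columns used below; it also assumes ggplot2 3.4.0 or later for the `linewidth` aesthetic (older versions used `size` for line width).

```r
library(ggplot2)
library(HistData)  # assumption: provides Minard.troops and Minard.cities

minard <- ggplot(Minard.troops, aes(long, lat)) +
  # Layer 1: path with width mapped to troop numbers, colour to direction,
  # and a separate group for each branch of the army
  geom_path(
    aes(linewidth = survivors, colour = direction, group = group),
    lineend = "round"
  ) +
  # Layer 2: text labels giving city names
  geom_text(aes(label = city), data = Minard.cities, size = 3)

# print(minard) draws the plot
```

Each line of the textual description becomes one layer, with the words "mapped to" turning into entries in `aes()`.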