03 extensions

  • View
    1.120

  • Download
    0

Embed Size (px)

Text of 03 extensions

  • 1.Stat405Graphical extensions & missing valuesHadley Wickham Tuesday, 31 August 2010

2. 1. Graphical extensions (1d & 2d) 2. Subsetting Tuesday, 31 August 2010 3. 1dextensions Tuesday, 31 August 2010 4. FairGood Very Good 6000 5000 4000 3000 2000 10000countPremium Ideal 6000 5000 4000 3000 2000 10000 0 500010000 15000 0 5000 10000 15000 0 5000 10000 15000 price qplot(price, data = diamonds, binwidth = 500) + facet_wrap(~ cut) Tuesday, 31 August 2010 5. FairGood Very Good 6000 5000 4000 3000 2000 10000countPremium Ideal 6000What makes it5000difcult tocompare the4000distributions?3000 2000Brainstorm for 1minute.10000 0 500010000 15000 0 5000 10000 15000 0 5000 10000 15000 price qplot(price, data = diamonds, binwidth = 500) + facet_wrap(~ cut) Tuesday, 31 August 2010 6. ProblemsEach histogram far away from the others, but we know stacking is hard to read use another way of displaying densities Varying relative abundance makes comparisons difcult rescale to ensure constant areaTuesday, 31 August 2010 7. # Large distances make comparisons hardqplot(price, data = diamonds, binwidth = 500) +facet_wrap(~ cut) # Stacked heights hard to compareqplot(price, data = diamonds, binwidth = 500, fill = cut) # Much better - but still have differing relative abundanceqplot(price, data = diamonds, binwidth = 500,geom = "freqpoly", colour = cut) # Instead of displaying count on y-axis, display density# .. indicates that variable isn't in original dataqplot(price, ..density.., data = diamonds, binwidth = 500,geom = "freqpoly", colour = cut) # To use with histogram, you need to be explicitqplot(price, ..density.., data = diamonds, binwidth = 500,geom = "histogram") + facet_wrap(~ cut)Tuesday, 31 August 2010 8. Your turn Use this technique to explore the relationship between price and clarity, and carat and clarity. Tuesday, 31 August 2010 9. 2dextensions Tuesday, 31 August 2010 10. Idea ggplotSmall points shape = I(".") Transparencyalpha = I(1/50)Jitteringgeom = "jitter"Smooth curve geom = "smooth"geom = "bin2d" or 2d bins geom = "hex" Density contours geom = "density2d" Tuesday, 31 August 2010 11. # There are two ways to add additional geoms# 1) A vector of geom names:qplot(price, carat, data = diamonds,geom = c("point", "smooth")) # 2) Add on extra geomsqplot(price, carat, data = diamonds) + geom_smooth() # This how you get help about a specific geom:?geom_smooth# or go to http://had.co.nz/ggplot2/geom_smooth.htmlTuesday, 31 August 2010 12. # To set aesthetics to a particular value, you need# to wrap that value in I() qplot(price, carat, data = diamonds, colour = "blue")qplot(price, carat, data = diamonds, colour = I("blue")) # Practical application: varying alphaqplot(price, carat, data = diamonds, alpha = I(1/10))qplot(price, carat, data = diamonds, alpha = I(1/50))qplot(price, carat, data = diamonds, alpha = I(1/100))qplot(price, carat, data = diamonds, alpha = I(1/250)) Tuesday, 31 August 2010 13. Your turnExplore the relationship between carat, price and clarity, using these techniques. Which did you nd most useful? Tuesday, 31 August 2010 14. SubsettingTuesday, 31 August 2010 15. MotivationLook at histograms and scatterplots of x, y, z from the diamonds dataset Which values are clearly incorrect? Which values might we be able to correct? (Remember measurements are in millimetres, 1 inch = 25 mm) Tuesday, 31 August 2010 16. Plotsqplot(x, data = diamonds, binwidth = 0.1) qplot(y, data = diamonds, binwidth = 0.1) qplot(z, data = diamonds, binwidth = 0.1) qplot(x, y, data = diamonds) qplot(x, z, data = diamonds) qplot(y, z, data = diamonds) Tuesday, 31 August 2010 17. Modifying data To modify, must rst know how to extract, or subset. Many different methods available in R. Well start with most explicit then learn some shortcuts next time. Basic structure: df$varname df[row index, column index] Tuesday, 31 August 2010 18. $Remember str(diamonds) ? That hints at how to extract individual variables: diamonds$carat diamonds$priceTuesday, 31 August 2010 19. blank include all integer +ve: include -ve: excludelogical include TRUEs character lookup by name Tuesday, 31 August 2010 20. Integer subsettingTuesday, 31 August 2010 21. # Nothingstr(diamonds[, ]) # Positive integers & nothingdiamonds[1:6, ] # same as head(diamonds)diamonds[, 1:4] # watch out! # Two positive integers in rows & columnsdiamonds[1:10, 1:4] # Repeating input repeats outputdiamonds[c(1,1,1,2,2), 1:4] # Negative integers drop valuesdiamonds[-(1:53900), -1] Tuesday, 31 August 2010 22. # Useful technique: Order by one or more columnsdiamonds