03 extensions

Hadley Wickham

Stat405Graphical extensions & missing values

Tuesday, 31 August 2010

1. Graphical extensions (1d & 2d)

2. Subsetting

1d extensions

Premium

0 5000 10000 15000

Very Good

0 5000 10000 15000

qplot(price, data = diamonds, binwidth = 500) + facet_wrap(~ cut)Tuesday, 31 August 2010

Premium

0 5000 10000 15000

Very Good

0 5000 10000 15000

qplot(price, data = diamonds, binwidth = 500) + facet_wrap(~ cut)

What makes it difficult to compare the distributions?

Brainstorm for 1 minute.

Problems

Each histogram far away from the others, but we know stacking is hard to read → use another way of displaying densities

Varying relative abundance makes comparisons difficult → rescale to ensure constant area

# Large distances make comparisons hardqplot(price, data = diamonds, binwidth = 500) + facet_wrap(~ cut)

# Stacked heights hard to compareqplot(price, data = diamonds, binwidth = 500, fill = cut)

# Much better - but still have differing relative abundanceqplot(price, data = diamonds, binwidth = 500, geom = "freqpoly", colour = cut)

# Instead of displaying count on y-axis, display density# .. indicates that variable isn't in original dataqplot(price, ..density.., data = diamonds, binwidth = 500, geom = "freqpoly", colour = cut)

# To use with histogram, you need to be explicitqplot(price, ..density.., data = diamonds, binwidth = 500, geom = "histogram") + facet_wrap(~ cut)

Use this technique to explore the relationship between price and clarity, and carat and clarity.

Your turn

2d extensions

Idea ggplot

Small points shape = I(".")

Transparency alpha = I(1/50)

Jittering geom = "jitter"

Smooth curve geom = "smooth"

2d bins geom = "bin2d" or geom = "hex"

Density contours geom = "density2d"

# There are two ways to add additional geoms# 1) A vector of geom names:qplot(price, carat, data = diamonds, geom = c("point", "smooth"))

# 2) Add on extra geomsqplot(price, carat, data = diamonds) + geom_smooth()

# This how you get help about a specific geom:?geom_smooth# or go to http://had.co.nz/ggplot2/geom_smooth.html

# To set aesthetics to a particular value, you need# to wrap that value in I()

qplot(price, carat, data = diamonds, colour = "blue")qplot(price, carat, data = diamonds, colour = I("blue"))

# Practical application: varying alphaqplot(price, carat, data = diamonds, alpha = I(1/10))qplot(price, carat, data = diamonds, alpha = I(1/50))qplot(price, carat, data = diamonds, alpha = I(1/100))qplot(price, carat, data = diamonds, alpha = I(1/250))

Your turn

Explore the relationship between carat, price and clarity, using these techniques.

Which did you find most useful?

Subsetting

Motivation

Look at histograms and scatterplots of x, y, z from the diamonds dataset

Which values are clearly incorrect? Which values might we be able to correct? (Remember measurements are in millimetres, 1 inch = 25 mm)

qplot(x, data = diamonds, binwidth = 0.1)qplot(y, data = diamonds, binwidth = 0.1)qplot(z, data = diamonds, binwidth = 0.1)qplot(x, y, data = diamonds)qplot(x, z, data = diamonds)qplot(y, z, data = diamonds)

Modifying data

To modify, must first know how to extract, or subset. Many different methods available in R. We’ll start with most explicit then learn some shortcuts next time.

Basic structure: df$varnamedf[row index, column index]

Remember str(diamonds) ?

That hints at how to extract individual variables:

diamonds$carat

diamonds$price

logical

integer

character

+ve: include-ve: exclude

lookup by name

include TRUEs

include all

Integer subsetting

# Nothingstr(diamonds[, ])

# Positive integers & nothingdiamonds[1:6, ] # same as head(diamonds)diamonds[, 1:4] # watch out!

# Two positive integers in rows & columnsdiamonds[1:10, 1:4]

# Repeating input repeats outputdiamonds[c(1,1,1,2,2), 1:4]

# Negative integers drop valuesdiamonds[-(1:53900), -1]

# Useful technique: Order by one or more columnsdiamonds <- diamonds[order(diamonds$price), ]

# Useful technique: Combine two tablescarats <- data.frame(table(carat = diamonds$carat))mtch <- match(diamonds$carat, carats$carat)diamonds$carat_count <- carats$Freq[mtch]

Logical subsetting

# The most complicated to understand, but # the most powerful. Lets you extract a # subset defined by some characteristic of # the datax_big <- diamonds$x > 10

head(x_big)sum(x_big)mean(x_big)table(x_big)

diamonds$x[x_big]diamonds[x_big, ]

small <- diamonds[diamonds$carat < 1, ]lowqual <- diamonds[diamonds$clarity %in% c("I1", "SI2", "SI1"), ]

# Comparison functions:# < > <= >= != == %in%

# Boolean operators: & | !small <- diamonds$carat < 1 & diamonds$price > 500lowqual <- diamonds$colour == "D" | diamonds$cut == "Fair"

a & !b

xor(a, b)

Useful functions for

logical vectors

table(zeros)sum(zeros)mean(zeros)

TRUE = 1; FALSE = 0

Select the diamonds that have:

Equal x and y dimensions.

Depth between 55 and 70.

Carat smaller than the mean.

Cost more than $10,000 per carat.

Are of good quality or better.

Your turn

Saving results

# Prints to screen

diamonds[diamonds$x > 10, ]

# Saves to new data frame

big <- diamonds[diamonds$x > 10, ]

# Overwrites existing data frame. Dangerous!

diamonds <- diamonds[diamonds$x < 10,]

diamonds <- diamonds[1, 1]diamonds

# Uh oh!

rm(diamonds)str(diamonds)

# Phew!

Your turn

Create a logical vector that selects diamonds with equal x & y. Create a new dataset that only contains these values.

Create a logical vector that selects diamonds with incorrect/unusual x, y, or z values. Create a new dataset that omits these values. (Hint: do this one variable at a time)

equal_dim <- diamonds$x == diamonds$yequal <- diamonds[equal_dim, ]

y_big <- diamonds$y > 10z_big <- diamonds$z > 6

x_zero <- diamonds$x == 0 y_zero <- diamonds$y == 0z_zero <- diamonds$z == 0zeros <- x_zero | y_zero | z_zero

bad <- y_big | z_big | zerosgood <- diamonds[!bad, ]

03 extensions

Documents

Terrace houses: 3m extensions versus 6m extensions - Permitted ... · Part 1 of the GPDO - Terrace houses: 3m extensions versus 6m extensions Last updated: December 2012 Introduction:

Agenda Item : 2/03...detached garage and undertake the following extensions’ alterations to the existing dwellinghouse: 2.2 Four storey rear corner extensions 3.70m wide and 1.25m

Hair Extensions from Lavadene Hair Extensions, Sydney

Amazing PPC Tactics PPC Tactics 062811.pdf · Types of Ad Extensions •Sitelinks •Product Extensions •Call Extensions •Location Extensions •Seller Ratings •Alpha’s &

Contract Data Extensions & Re-extensions...Contract Data – Extensions & Re-extensions, Continued Procedures, continued Step Action 2 Enter the Empl ID, be sure to check the Include

Browser Extensions

EasyPact or Compact NS branch extensions layoutelectroshops.ro/blog/wp-content/uploads/2012/12/... · · 2013-03-06EasyPact or Compact NS branch extensions layout ... Catalogue

Plugins( Extensions( add1ons) · 2 Extensions(for(privacy(protection • Extensions“ ad1blocking” Blockvisible+ advertisements • Extensions“ tracking1blocking” Blockinvisible+

Brand Extensions

Applying Google Apps, Extensions, · 2016-12-03 · More tools - manage extensions. Google Classroom ★Manage Classes and Clubs ... Formative assessment. Assignments Variety of ways

Performance Analysis of Instruction Set Architecture Extensions for Multimediaslingn/publications/mm... · 2002-01-03 · 1 Performance Analysis of Instruction Set Architecture Extensions

Extensions - new and changed applications · Description 03/2015 e.g. replaced pn. / miscellaneous Extensions - new and changed applications - according to applications CITROEN JUMPER

Multimedia Extensions, My Roadmap to the Future, My · PDF file · 2014-03-11Multimedia Extensions, My Roadmap to the Future, My Success ... States: Georgia Standards Subjects: Health

Land Procedure: Extensions to Private Holdings · 2015-03-11 · Land Procedure: Extensions to Private Holdings EFFECTIVE DATE: June 1, 2011 FILE: 11240-00 AMENDMENT: Page 1 1. PURPOSE

File Extensions

Specification of Timing Extensions - autosar.org · Specification of Timing Extensions AUTOSAR CP Release 4.3.1 2014-03-31 4.1.3 AUTOSAR Release Management Revised the entire contents

Tyabb Preschool Inc. · AAA Grade Remy Tape Hair Weave-In Extensions Feather Extensions Clip In Extensions Keratin Hair Straightening Cuts/ Colour/ Foils 100% Human Hair Extensions

Some Practical Extensions to Beverton and Holt's Relative Yield …legacy.seaaroundus.s3.amazonaws.com › doc › Researcher... · 2016-03-03 · Some Practical Extensions to Beverton

Oracle Configurator Extensions and Interface Object …...Oracle® Configurator Extensions and Interface Object Developer’s Guide Release 11i Part No. B13607-03 May 2005 This document

BRICS · BRICS DS-03-9 M. J. Jurik: Extensions to the Paillier Cryptosystem with Applications to Cryptological Protocols BRICS Basic Research in Computer Science Extensions to the