14 case-study

Hadley Wickham

Stat405: ddply case study (2)

Thursday, 7 October 2010

1. Recap
   1. Focus on smaller subset
2. More ddply
   1. Develop summary statistic
   2. Classify names
   3. Apply to full data

Questions

For names that are used for both boys and girls, how has usage changed?

Can we use names that clearly have the incorrect sex to estimate error rates over time?

Getting started

options(stringsAsFactors = FALSE)
library(plyr)
library(ggplot2)

both <- read.csv("both.csv")

Interesting subset

both_sum <- ddply(both, "name", summarise,
  years = length(name),
  avg_usage = mean(boy + girl) / 2)

both_sum <- subset(both_sum, years > 1)
qplot(years, avg_usage, data = both_sum)

selected_names <- subset(both_sum, years > 50 & avg_usage > 0.0005)$name
selected <- subset(both, name %in% selected_names)

Patterns

selected$lratio <- with(selected, log10(boy / girl))
qplot(lratio, name, data = selected)
qplot(lratio, reorder(name, lratio), data = selected)
qplot(abs(lratio), reorder(name, lratio), data = selected)

[Figure: abs(lratio) on the x-axis (ticks 0.5 to 2.5) against reorder(name, lratio) on the y-axis, with one row of points per selected name, running from Mary and Helen at one end to William and Richard at the other.]


What characteristics separate sex-errors from dual-sex names?

Your turn

Compute the mean and range of lratio for each name.

Plot and come up with cutoffs that you think separate the two groups.

rng <- ddply(selected, "name", summarise,
  diff = diff(range(lratio, na.rm = TRUE)),
  mean = mean(lratio, na.rm = TRUE))

qplot(diff, abs(mean), data = rng)
qplot(diff, abs(mean), data = rng, geom = "text", label = name)

rng$dual <- abs(rng$mean) < 2
arrange(rng, mean, dual)

selected <- join(selected, rng[c("name", "dual")])
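The classification step can be tried on a toy data set. The sketch below is an illustration only: the proportions are invented, and base R's split() and sapply() stand in for the ddply() call so that the block runs without plyr. Because lratio is a base-10 log, the cutoff abs(mean) < 2 keeps names whose boy-to-girl ratio is, on (geometric) average, less extreme than 100 to 1.

```r
# Toy data: proportion of boys and girls given each name in three years.
# All numbers are invented for illustration.
toy <- data.frame(
  name = rep(c("Mary", "Leslie", "John"), each = 3),
  year = rep(c(1900, 1950, 2000), times = 3),
  boy  = c(1e-4, 5e-5, 1e-5,  1e-3, 5e-4, 1e-4,  5e-2, 4e-2, 1e-2),
  girl = c(5e-2, 3e-2, 5e-3,  1e-4, 5e-4, 1e-3,  2e-4, 1e-4, 2e-5)
)
toy$lratio <- log10(toy$boy / toy$girl)

# Per-name range and mean of lratio (base R stand-in for ddply + summarise)
by_name <- split(toy$lratio, toy$name)
rng <- data.frame(
  name = names(by_name),
  diff = sapply(by_name, function(x) diff(range(x, na.rm = TRUE))),
  mean = sapply(by_name, mean, na.rm = TRUE)
)

# Same cutoff as on the slide: dual-sex if the average ratio is under 100:1
rng$dual <- abs(rng$mean) < 2
rng
```

In this toy data only Leslie, whose ratio swings from 10:1 in favour of boys to 10:1 in favour of girls, is flagged as dual-sex.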

[Figure: abs(mean) on the y-axis (ticks 0.5 to 2.0) against diff on the x-axis (ticks 0.5 to 2.5), with each point drawn as the name it represents.]


Why does this pattern give us confidence that those dual-sex names are errors?

qplot(year, lratio, data = selected, geom = "line", group = name) + facet_wrap(~ dual)

qplot(year, lratio, data = subset(selected, dual), geom = "line") + facet_wrap(~ name)

qplot(year, boy / (boy + girl), data = subset(selected, dual), geom = "line") + facet_wrap(~ name)

Your turn

Apply this threshold to all names, not just the few we focussed in on. Does it still seem like a good classification?

What can you say about trends in errors over time?

both$lratio <- with(both, log10(boy / girl))
rng <- ddply(both, "name", summarise,
  diff = diff(range(lratio, na.rm = TRUE)),
  mean = mean(lratio, na.rm = TRUE))
rng$dual <- abs(rng$mean) < 2
arrange(rng, mean, dual)
both <- join(both, rng[c("name", "dual")])

qplot(year, lratio, data = subset(both, !dual))
qplot(year, abs(lratio), data = subset(both, !dual), colour = factor(boy > girl)) +
  geom_smooth(size = 3)

Math on the computer

Your turn

Perform the following calculations in R. Are the answers what you expect?

seq(0.1, 0.9, by = 0.1) - 1:9 / 10

sqrt(2)^2 - 2

What is the property of these numbers that might cause the problem?

# Each number must be stored in a finite amount of space

# => each number can only have a finite number of digits

# => floating point math does not work like normal math

(1e-16 + 1) == 1

(1e-16 + 1) * 10 == 1e-16 * 10 + 1 * 10

1e9 + 2 - 0.1 - 1e9

1e10 + 2 - 0.1 - 1e10

1e11 + 2 - 0.1 - 1e11

1e12 + 2 - 0.1 - 1e12

1e13 + 2 - 0.1 - 1e13

1e14 + 2 - 0.1 - 1e14
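One way to see why the error grows down that list is to look at the spacing between adjacent representable numbers. The sketch below assumes IEEE 754 doubles with a 53-bit significand, which is what R uses on common platforms; ulp() is a helper defined here for illustration, not a base R function.

```r
# Gap between x and the next larger double, assuming a 53-bit significand.
# `ulp` is a hypothetical helper, not part of base R.
ulp <- function(x) 2^(floor(log2(abs(x))) - 52)

eps <- .Machine$double.eps   # the gap just above 1
eps == ulp(1)                # TRUE
(1 + eps) == 1               # FALSE: 1 + eps is the next double after 1
(1 + 1e-16) == 1             # TRUE: 1e-16 is less than half the gap

# The gap grows with magnitude, so the 1.9 that survives the
# subtraction is resolved more and more coarsely
big <- c(1e9, 1e10, 1e11, 1e12, 1e13, 1e14)
data.frame(
  big     = big,
  spacing = ulp(big),
  result  = big + 2 - 0.1 - big
)
```

Each result differs from 1.9 by no more than the spacing on its row, which is why the answers drift further from 1.9 as the large term grows.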

a⋅(b + c) = a⋅b + a⋅c

a + (b + c) = (a + b) + c

a + b - b = a
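These identities hold for real numbers but can fail once every intermediate result is rounded. A minimal sketch, assuming R's usual double-precision arithmetic, with one small counterexample per identity:

```r
# Associativity: the grouping changes which intermediate result is rounded
(0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)    # FALSE

# Distributivity: on the left, 1e-16 is lost before the multiplication
(1e-16 + 1) * 10 == 1e-16 * 10 + 1 * 10   # FALSE

# a + b - b = a: adding 1 discards 1e-16 entirely, so nothing comes back
a <- 1e-16
b <- 1
a + b - b == a   # FALSE
a + b - b        # 0
```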

# By default R only shows 7 significant digits

# Trailing zeros are dropped, so a value that is not exactly round can still print as a round number

(1 / 237)

(1 / 237) * 237

(1 / 237) * 237 - 1

seq(0.1, 0.9, by = 0.1)

seq(0.1, 0.9, by = 0.1) - 1:9 / 10

# Tricky to get to print exactly:

formatC((1 / 237) * 237, digits = 20)

formatC(seq(0.1, 0.9, by = 0.1), digits = 20)
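As an alternative to formatC(), sprintf() with a "%.17g" format can expose the hidden digits. This is a sketch rather than the only way to do it; it relies on 17 significant digits being enough to tell any two distinct doubles apart.

```r
x <- seq(0.1, 0.9, by = 0.1)
y <- 1:9 / 10

# At the default 7 significant digits the third elements look the same
format(x[3]) == format(y[3])

# With 17 significant digits the two values print differently
sprintf("%.17g", x[3])
sprintf("%.17g", y[3])

# which agrees with what == reports
x[3] == y[3]
```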

# When working with floating point numbers (numeric)

# (but not integers, this is the one place where the

# difference is important) never test for equality with ==

a <- seq(0.1, 0.9, by = 0.1)

b <- 1:9 / 10

all(a == b)

all.equal(a, b)

all(abs(a - b) < 1e-6)

# Similarly, need to be careful with < and > etc
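In practice the comparison usually ends up inside if() or a subset, where two details matter: all.equal() returns a character description rather than FALSE when the values differ, and element-wise matching needs an explicit tolerance. A minimal sketch; near() is a helper defined here for illustration, not a base R function, and the tolerance of 1e-6 is simply the one used on the slide.

```r
a <- seq(0.1, 0.9, by = 0.1)
b <- 1:9 / 10

# all.equal() returns TRUE or a character description of the difference,
# so wrap it in isTRUE() before using it as a condition
isTRUE(all.equal(a, b))
all.equal(1, 1.1)

# `near` is a hypothetical helper: element-wise comparison with a tolerance
near <- function(x, y, tol = 1e-6) abs(x - y) < tol
all(near(a, b))

# Picks out the element of a that is meant to be 0.3
a[near(a, 0.3)]
```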

# Places where this matters:
#
# * sums
# * calculating the standard deviation
# * inverting a matrix (condition)
# * linear models!
# * maximum likelihood estimation
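The standard deviation item can be illustrated with a small example. The sketch below compares the one-pass "textbook" variance formula with a two-pass version on four invented values that share a large offset; it illustrates the cancellation problem and is not a description of how var() is implemented.

```r
# Four values with a large common offset; the true sample variance is 30
x <- 1e9 + c(4, 7, 13, 16)
n <- length(x)

# One-pass formula: subtracts two numbers near 4e18 whose true
# difference is only 90, far below what doubles can resolve there
naive <- (sum(x^2) - n * mean(x)^2) / (n - 1)

# Two-pass formula: centre first, then square the small deviations
two_pass <- sum((x - mean(x))^2) / (n - 1)

c(naive = naive, two_pass = two_pass, var = var(x))
```

The two-pass result and var() agree with the true value, while the one-pass result does not.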

Homework

Reflection on project teams.

Plyr drills.

Plyr practice with basketball data (what we’ll be using for the next project).
