R Handouts 2018-19 ACTIVITY: 1970 Lottery
…\R lab Fall 108 Draft Lottery.docx Page 1 of 10
Introduction to R and R-Studio 2018-19
In-Class Lab Activity The 1970 Draft Lottery
Summary The goal of this activity is to give you practice with R Markdown for saving your work. It’s also a fun bit of sleuthing with real data, namely the lottery that determined who would be drafted in 1970 and sent to Viet Nam.
R Datasets Used in This Illustration
Download the following 2 datasets from the course website page, THIS WEEK:
- lottery1970.Rdata
- lottery1970monthly.Rdata
Packages Used in This Illustration
- knitr
- ggplot2
To install these packages (one time), at the console window, type:
    install.packages("knitr")
    install.packages("ggplot2")
Reminders:
1) install.packages has a period between install and packages.
2) The package name must be enclosed in straight quotes, with no extra spaces inside the quotes.
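The one-time install can be made safe to re-run by first checking which packages are still missing. This is a minimal sketch, not part of the original handout; `missing_packages` is a hypothetical helper name, and the actual install line is left commented out so that nothing is downloaded by accident.

```r
# Hypothetical helper: return the packages in `pkgs` that cannot be loaded,
# which usually means they are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- missing_packages(c("knitr", "ggplot2"))
print(needed)

# Install only what is missing (uncomment to actually install):
# if (length(needed) > 0) install.packages(needed)
```

If `needed` prints as an empty character vector, both packages are already available and there is nothing to install.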
Background – The 1970 Lottery in the US was NOT Random
Source: http://ww2.amstat.org/publications/jse/v5n2/datasets.starr.html#fienberg1
“This lottery was a source of considerable discussion before being held on December 1, 1969. Soon afterwards a pattern of unfairness in the results led to further publicity: those with birthdates later in the year seemed to have had more than their share of low lottery numbers and hence were more likely to be drafted. On January 4, 1970, the New York Times ran a long article, "Statisticians Charge Draft Lottery Was Not Random," illustrated with a bar chart of the monthly averages (Rosenbaum 1970a). It described the way the lottery was carried out, and with hindsight one can see how the attempt at randomization broke down. The capsules were put in a box month by month, January through December, and subsequent mixing efforts were insufficient to overcome this sequencing. The details of the procedure are quoted in Fienberg (1971a) and the first three editions of Moore (1979, 1985, 1991).”
Details of the 1970 Lottery Randomization Procedures Source: http://science.sciencemag.org/content/sci/171/3968/255.full.pdf
1. Begin Your R-Studio Session by Opening a New R Markdown File
Step 1. Launch R Studio
Step 2. From the top menu bar: FILE > NEW FILE > R Markdown
You should see something like the following (note – yours won’t say Carol Bigelow of course):
• At top right, at title: Type in a title of your choosing
• Just below, under Default Output Format: choose your output format
  o HTML – This is the default selection. It’s fine to choose this.
• Click OK
Example –
REVIEW - Recall what you are looking at above:
* A brand new R Markdown file comes with a bunch of stuff (helpful to read but not necessary to keep).
* Each gray shaded area is called a chunk. A chunk is a set of R commands with a “beginning” and an “end”.
CHUNK BEGINNING: Each “chunk” begins with ```{r} or with ```{r SOMETHING YOU CHOOSE HERE}
- If you choose ```{r include=FALSE} THEN the code still runs, but the code and its output will NOT BE SHOWN (I do not recommend this)
- If you choose ```{r echo=FALSE} THEN the code will NOT BE SHOWN, although its output still is (I do not recommend this either)
- Personally, I recommend sticking with beginning each “chunk” using ```{r }
CHUNK END: Each “chunk” ends with ```
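To see what the chunk options actually do, you can knit a tiny one-chunk document from the console and look at what comes back. This is a rough sketch that assumes the knitr package is installed; `knit_chunk` is a hypothetical helper name, not a knitr function.

```r
library(knitr)

# Hypothetical helper: knit a one-chunk document given as text
# and return the resulting markdown as a character string.
knit_chunk <- function(header) {
  src <- c(header, "1 + 1", "```")
  knit(text = src, quiet = TRUE)
}

plain   <- knit_chunk("```{r}")                # code and output both shown
no_echo <- knit_chunk("```{r echo=FALSE}")     # output shown, code hidden
hidden  <- knit_chunk("```{r include=FALSE}")  # nothing shown; the code still runs

cat(plain, no_echo, hidden, sep = "\n-----\n")
```

Comparing the three pieces of printed output side by side shows why the plain chunk header is the easiest choice while you are learning: nothing is hidden from you.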
Step 3. Clear your brand new R Markdown so that it is empty of everything except your header:
- Place your cursor at line 7 of the “shell” R Markdown. Drag to highlight and select all below
- Click delete
You should now see something like the following (with your name, not mine, obviously):
REVIEW - We will work “chunk by chunk”: writing code, fixing code, executing code
1st – We open a new blank chunk (to do a specific task that we want to do).
2nd – We type some commands into this chunk and then we run it.
3rd – As is typically the case, we EDIT the commands in this chunk until we get what we like, and then re-run it.
4th – Once we’re happy with the current chunk and the current task, we move on to the next chunk/next task!
1st – How to Open a New Blank Chunk
Click on the little green “insert a chunk” icon at top (on the right). From the drop down menu, choose R
You should see the following.
- You should see the gray chunk start ```{r}
- You should see your cursor placed inside
- You should see the chunk end ```
- TIP - NEVER delete the chunk start or end!
Key:
BLACK - commands (you type these)
BROWN - comments (optional, you type these)
BLUE - output
1. Read in R dataset lottery1970.Rdata
    # input rdataset lottery1970.Rdata. Check.
    # replace the path below with the folder where you saved the datasets
    setwd("/Users/cbigelow/Desktop")
    load(file="lottery1970.Rdata")
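It helps to check what load() actually brought in, since the later steps refer to a data frame named lotterydata that is presumably restored by this command. The sketch below is self-contained: it uses a made-up toy data frame saved to a temporary file (not the real lottery data) to show that load() returns the names of the objects it restores.

```r
# Toy stand-in (made-up values, NOT the real lottery data), saved to a temporary file
toy <- data.frame(day = 1:3, rank = c(3, 1, 2))
path <- file.path(tempdir(), "toy.Rdata")
save(toy, file = path)
rm(toy)  # remove it so we can see load() bring it back

# load() restores the saved objects and returns their names
loaded_names <- load(file = path)
print(loaded_names)
str(toy)
```

With the real file, the same idea would be to assign the result of load(file="lottery1970.Rdata") to a name, print it, and then run str() on whichever object it reports.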
2. Produce basic plot (no frills). No special package required.
    # command is plot(dataframe$xvar, dataframe$yvar)
    plot(lotterydata$day,lotterydata$rank)
What do you think?
3. Produce fancy scatterplot of raw scatter using package ggplot2
    # command is ggplot(dataframe, aes(x=xvar, y=yvar)) + geom_point( )
    library(ggplot2)
    p <- ggplot(lotterydata, aes(x=day,y=rank))
    p <- p + geom_point()
    p <- p + xlab("Birth Date")
    p <- p + ylab("Selective Service Lottery Number")
    p <- p + ggtitle("Scatterplot of 1970 Draft Lottery")
    p <- p + theme_bw()
    p
## Warning: Removed 1 rows containing missing values (geom_point).
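The warning means that one row had a missing value in a plotted variable and was left out of the picture. A quick way to find such rows is complete.cases(). The sketch below uses a made-up toy data frame (not the real lottery data); only the column names day and rank are taken from the handout.

```r
# Toy stand-in for lotterydata (made-up values, NOT the real lottery numbers)
toy <- data.frame(day = 1:5, rank = c(4, NA, 2, 5, 1))

# Rows where day or rank is missing are the ones a scatterplot cannot draw
incomplete <- !complete.cases(toy[, c("day", "rank")])
print(sum(incomplete))    # how many rows have a missing value
print(toy[incomplete, ])  # which rows they are
```

Running the same two lines on lotterydata should point to the single row behind the warning.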
4. Obtain average lottery number, by month. No special package required.
    # command is aggregate()
    aggregate(lotterydata$rank,list(month=lotterydata$month),mean)
Your output should match the following:
    ##    month        x
    ## 1      1 201.1613
    ## 2      2 202.9655
    ## 3      3 225.8065
    ## 4      4 203.6667
    ## 5      5 207.9677
    ## 6      6 195.7333
    ## 7      7 181.5484
    ## 8      8 173.4516
    ## 9      9 157.3000
    ## 10    10 182.4516
    ## 11    11 148.7333
    ## 12    12 121.5484
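aggregate() can also be written with a formula, which keeps the variable name in the result instead of calling the column x. This is a small sketch on made-up toy values (not the lottery data), shown only to compare the two ways of writing the same group-by-month average.

```r
# Toy data: made-up values, chosen so the group means are easy to check by hand
toy <- data.frame(
  month = c(1, 1, 2, 2, 2),
  rank  = c(10, 20, 30, 40, 50)
)

# Style used in this handout: the result column is named x
by_list <- aggregate(toy$rank, list(month = toy$month), mean)

# Formula style: the result column keeps the name rank
by_formula <- aggregate(rank ~ month, data = toy, FUN = mean)

print(by_list)
print(by_formula)
```

One difference to keep in mind: the formula version drops rows with missing values by default, so the two styles can disagree when the data contain an NA.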
5. So now let’s work with the monthly means. Next, load lottery1970monthly.Rdata
    load(file="lottery1970monthly.Rdata")
6. Produce fancy scatterplot, with overlay linear regression, of monthly means using package ggplot2
    # command is ggplot(dataframe, aes(x=xvar, y=yvar)) + geom_point( ) + geom_smooth( )
    library(ggplot2)
    p <- ggplot(monthlydata, aes(x=xmonth,y=yave_rank))
    p <- p + geom_point()
    p <- p + geom_smooth(method=lm, se=FALSE)
    p <- p + xlab("Month")
    p <- p + ylab("Average Selective Service Lottery Number")
    p <- p + ggtitle("1970 Draft Lottery - Monthly Average")
    p <- p + theme_bw()
    p
Now what do you think?
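The straight line that geom_smooth(method=lm) draws is an ordinary least-squares fit, so you can get its slope as a number with lm(). The sketch below types in the monthly means from the aggregate() output in step 4. It assumes that monthlydata holds these same values in columns named xmonth and yave_rank; the data frame name monthly is just a stand-in for this illustration.

```r
# Monthly means copied from the aggregate() output in step 4.
# Assumption: monthlydata holds these same values in columns xmonth and yave_rank.
monthly <- data.frame(
  xmonth = 1:12,
  yave_rank = c(201.1613, 202.9655, 225.8065, 203.6667, 207.9677, 195.7333,
                181.5484, 173.4516, 157.3000, 182.4516, 148.7333, 121.5484)
)

# Least-squares line of average lottery number on month
fit <- lm(yave_rank ~ xmonth, data = monthly)
slope <- unname(coef(fit)["xmonth"])
print(coef(fit))
```

A negative slope means the average lottery number falls as the birth month gets later, which is the pattern described in the background quote.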
All done? Save (archive) your work. This is nifty either as a record of your work or for re-use later!
Knitting runs your chunks and saves your code and its output together in one document. How to knit:
- At top, click on the drop down menu for the knit icon
- From the drop down menu, I recommend that you choose KNIT TO WORD (Why? Answer – so that you can open this file later in Word and perhaps fancy it up a bit)
- Tip: Take care to choose a destination folder that you’ll remember (I always choose DESKTOP)