Loops, dplyr, maps
Stat 480
Heike Hofmann
Outline
• Loops
• review of dplyr
• Maps
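• Want to run the same block of code multiple times:
• Loop or iteration
mba <- rep(NA, n)    # reserve space for the output before the loop
for (i in 1:n) {     # iterations: i takes the values 1, 2, ..., n
  # block of commands, run once for each player
  seasons <- subset(baseball, id == players[i])
  mba[i] <- with(seasons, mean(h/ab))   # output: one value per player
}
# assumes the baseball data (e.g. from the plyr package), a vector of
# player ids called players, and n <- length(players) are already defined
Contents of mba while the loop runs:
before the loop: NA NA NA NA NA NA NA NA NA NA NA
after i = 1: 0.301 NA NA NA NA NA NA NA NA NA NA
after i = 2: 0.301 0.182 NA NA NA NA NA NA NA NA NA
... and so on ...
after the last iteration: 0.301 0.182 0.236 0.210 0.238 0.275 0.089 0.152 0.112 0.249 0.158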
Your Turn
• Run the iteration to get (a) the lifetime batting average for each player and (b) the lifetime number of times each player was at bat.
• Make a dataset player.stats from mba, nab and players (use data.frame and cbind)
• Plot nab versus mba.
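One possible solution sketch, run on a small hypothetical data set baseball_toy that only mimics the id, h (hits) and ab (at bats) columns of baseball; all numbers are made up. It computes the lifetime batting average as total hits divided by total at bats, sum(h)/sum(ab), which is the usual definition and differs from the mean of season averages, mean(h/ab), whenever seasons have different numbers of at bats.

```r
# Hypothetical toy data mimicking the id, h (hits) and ab (at bats)
# columns of baseball; values are made up for illustration
baseball_toy <- data.frame(
  id = c("a", "a", "b", "b", "b", "c"),
  h  = c(30, 10, 10, 15, 5, 40),
  ab = c(100, 50, 50, 50, 50, 100)
)
players <- unique(baseball_toy$id)
n <- length(players)

mba <- rep(NA, n)   # lifetime batting average: total hits / total at bats
nab <- rep(NA, n)   # lifetime number of at bats
for (i in seq_len(n)) {   # seq_len(n) is 1:n, but also safe when n is 0
  seasons <- subset(baseball_toy, id == players[i])
  mba[i] <- with(seasons, sum(h) / sum(ab))
  nab[i] <- sum(seasons$ab)
}

# data.frame() keeps mba and nab numeric; cbind(players, mba, nab) on its
# own would build a character matrix because players is a character vector
player.stats <- data.frame(players, mba, nab)
plot(mba ~ nab, data = player.stats)
```

For player a, the mean of season averages would be (0.3 + 0.2)/2 = 0.25, while the lifetime average is 40/150 ≈ 0.267.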
Other loops
• while (condition) {
    block of commands
  }
• repeat {
    block of commands
    if (condition) break
  }
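A small illustration of both forms, using a made-up task: find the smallest k for which 1 + 2 + ... + k reaches 10. while tests its condition before each pass (so its block may never run), whereas repeat always runs its block at least once and stops only at an explicit break.

```r
# while: the condition is checked before every pass
total <- 0
k <- 0
while (total < 10) {
  k <- k + 1
  total <- total + k      # add 1, 2, 3, ... until the sum reaches 10
}
k                         # 4, because 1 + 2 + 3 + 4 = 10

# repeat: runs until an explicit break; without it, it never stops
total2 <- 0
k2 <- 0
repeat {
  k2 <- k2 + 1
  total2 <- total2 + k2
  if (total2 >= 10) break
}
k2                        # also 4
```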
Good Practice
• Indent (with tabs or spaces) to show the structure of blocks of statements
• Build complex blocks of code step by step, i.e. try a single case first, then generalize
• Write comments (# ...)!
Why should we not use loops?
• Loops often reveal a user's inexperience with R: most loops can be replaced by R's vectorized operations, which are shorter and usually faster
• The dplyr alternative takes care of all the housekeeping chores (such as reserving space for the result vector beforehand and binding the vectors into a data frame afterwards)
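A sketch of the loop-free alternatives on hypothetical toy data (baseball_toy mimics only the id, h and ab columns of baseball; the numbers are made up). tapply() from base R applies mean() to each player's values in one call, and dplyr's group_by() plus summarise() returns the per-player results directly as a data frame, with no pre-allocated vector and no final binding step.

```r
library(dplyr)

# Hypothetical toy data mimicking the id, h and ab columns of baseball
baseball_toy <- data.frame(
  id = c("a", "a", "b", "b", "b", "c"),
  h  = c(30, 20, 10, 15, 5, 40),
  ab = c(100, 100, 50, 50, 50, 100)
)

# Base R, vectorized: season averages split by player, then averaged
mba_vec <- tapply(baseball_toy$h / baseball_toy$ab, baseball_toy$id, mean)

# dplyr: one row per player, already a data frame
player.stats <- baseball_toy %>%
  group_by(id) %>%
  summarise(mba = mean(h / ab),   # same quantity as the loop computes
            nab = sum(ab))        # number of at bats per player
player.stats
```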
Some Social Issues
• Do you know how many people admit to driving while intoxicated?
• How many people do not use their seat belts?
• How many people did not work out for a single minute in the last month?
• … the BRFSS (Behavioral Risk Factor Surveillance System) tries to answer these kinds of questions …
Data set: Behavioral Risk Factor Surveillance System (BRFSS)
• largest telephone survey to track health risks: http://www.cdc.gov/brfss/
• For an overview, go to: http://apps.nccd.cdc.gov/brfss/
• Visit the above website and try to answer one of the previous questions.
• Report on this, or on another surprising finding.
What did you find?
• … the online tool is good, but we can do much better in R …
Report back
Using the Codebook
• Open the codebook in a text editor (any text editor, just double click the file once you have downloaded it from the website)
• Use the ‘Search’ function to navigate in the document …
• What does variable QLREST2 encode?
Review of data aggregation with dplyr
group_by, summarise
Recognize .variable
• Use dplyr to compute mean QLREST2 values by state.
• Summarize each of the variables GENHLTH, AVEDRNK2, and DRNKDRI2 by gender (SEX).
• What is the average weight in the population by state, gender and educational level? What is the standard deviation?
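A sketch of a grouped summary with several grouping variables and two statistics, on a hypothetical toy data frame survey. The columns state, sex and weight are placeholders; the actual BRFSS variable names and missing-value codes have to be looked up in the codebook. na.rm = TRUE drops missing answers before computing the statistics.

```r
library(dplyr)

# Hypothetical toy survey data; real BRFSS names and codes differ
survey <- data.frame(
  state  = c("IA", "IA", "IA", "IA", "NE", "NE", "NE"),
  sex    = c(1, 1, 2, 2, 1, 2, 2),
  weight = c(180, 200, 140, NA, 190, 150, 160)
)

weight_stats <- survey %>%
  group_by(state, sex) %>%
  summarise(
    n           = n(),                           # group size
    mean_weight = mean(weight, na.rm = TRUE),    # ignore missing answers
    sd_weight   = sd(weight, na.rm = TRUE),
    .groups     = "drop"                         # return an ungrouped result
  )
weight_stats
```

A group with only one non-missing value gets NA as its standard deviation, so the group size n is worth reporting alongside the statistics.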
Maps
What is a map?
[Figure: lat (y-axis) against long (x-axis)]
Set of points specifying latitude and longitude
[Figure: lat (y-axis) against long (x-axis)]
Polygon: connect dots in correct order
[Figure: lat (y-axis) against long (x-axis)]
[Figure: lat (y-axis) against long (x-axis)]
Polygon: connect only the correct dots
Grouping
• Use the parameter group to connect the “right” dots (sometimes the grouping variable has to be created first)
[Figures: maps of lat (y-axis) against long (x-axis) produced by the commands below]
library(ggplot2)
states <- map_data("state")   # assumed source of states; map_data() needs the maps package
ggplot(states, aes(long, lat)) + geom_point()
ggplot(states, aes(long, lat, group = group)) + geom_path()
ggplot(states, aes(long, lat, group = group, fill = region)) + geom_polygon()
ggplot(states, aes(long, lat, group = group, fill = lat)) + geom_polygon()
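A minimal sketch of why group matters, using two hypothetical squares stored in one data frame with long, lat and group columns (the same layout as states). Without group, geom_polygon() treats all eight points as a single polygon and connects the two squares to each other; with group, each square is closed on its own.

```r
library(ggplot2)

# Hypothetical coordinates: two separate squares in one data frame
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = c(1, 1, 1, 1,  2, 2, 2, 2)
)

# No group: all points are joined into one (self-crossing) polygon
p_wrong <- ggplot(squares, aes(long, lat)) + geom_polygon()

# With group: one polygon per value of group
p_right <- ggplot(squares, aes(long, lat, group = group)) + geom_polygon()
print(p_right)
```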
Merging Files
• merge(x, y, ...)
• help(merge)
• need to specify the key variable(s) in x and y along which records are aligned (argument by, or by.x and by.y when the key names differ)
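A sketch with two hypothetical toy tables: map_pts holds several points per state and state_stats one summary value per state, both keyed by region (the key column name used by map_data()). merge() copies each state's value onto all of its points. Because merge() sorts the result by the key by default, the original point order is restored afterwards, which matters when the points are drawn as polygons.

```r
# Hypothetical toy tables; coordinates and values are made up
map_pts <- data.frame(
  region = c("nebraska", "nebraska", "nebraska", "iowa", "iowa", "iowa"),
  long   = c(0, 1, 0.5, 2, 3, 2.5),
  lat    = c(0, 0, 1, 0, 0, 1),
  order  = 1:6                      # drawing order of the points
)
state_stats <- data.frame(
  region   = c("iowa", "nebraska"),
  pct_none = c(8.0, 12.5)
)

# Align records by the key variable region
merged <- merge(map_pts, state_stats, by = "region")

# merge() sorted the rows by region, so restore the drawing order
merged <- merged[order(merged$order), ]
merged
```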
Your Turn
• Draw a choropleth map of the states showing the percentage of households without healthcare coverage (HLTHPLAN == 2).
• Are the elderly more affected? Draw choropleth maps of the states showing the percentage of households without healthcare coverage (HLTHPLAN) by age group (AGE10, defined earlier). What is the size of each group?
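One possible sketch for the second question, on hypothetical toy respondents: the columns region and age_grp and the square "states" in shapes are placeholders, not the real BRFSS or map data. The percentage without coverage per state and age group is 100 * mean(HLTHPLAN == 2), n() gives the group size, and facet_wrap() draws one map per age group. With real data, map_data("state") identifies states by lowercase full names in region, so the survey's state codes would need to be translated first; HLTHPLAN may also contain codes for answers such as "don't know", which should be checked in the codebook and handled before taking the mean.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy respondents (placeholder columns and values)
resp <- data.frame(
  region   = c("iowa", "iowa", "iowa", "iowa",
               "nebraska", "nebraska", "nebraska", "nebraska"),
  age_grp  = c("young", "young", "old", "old", "young", "young", "old", "old"),
  HLTHPLAN = c(1, 2, 1, 1, 2, 2, 1, 2)
)

uninsured <- resp %>%
  group_by(region, age_grp) %>%
  summarise(
    n   = n(),                          # group size
    pct = 100 * mean(HLTHPLAN == 2),    # percent without coverage
    .groups = "drop"
  )

# Hypothetical square outlines standing in for map_data("state")
shapes <- data.frame(
  region = rep(c("iowa", "nebraska"), each = 4),
  long   = c(1, 2, 2, 1,  0, 1, 1, 0),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1),
  order  = 1:8
)

# Each outline point gets one row per age group of its state
map_df <- merge(shapes, uninsured, by = "region")
map_df <- map_df[order(map_df$order), ]   # restore the drawing order

p <- ggplot(map_df, aes(long, lat, group = region, fill = pct)) +
  geom_polygon() +
  facet_wrap(~ age_grp)
print(p)
```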