99
Good Habits in R Programming STAT 133 Gaston Sanchez Department of Statistics, UC–Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133

Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Good Habits in R ProgrammingSTAT 133

Gaston Sanchez

Department of Statistics, UC–Berkeley

gastonsanchez.com

github.com/gastonstat/stat133

Course web: gastonsanchez.com/stat133

Page 2: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Good Coding Habits

2

Page 3: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Code Habits

Now that you’ve worked with various R scripts, written somefunctions, and done some data manipulation, it’s time to lookat some good coding practices.

3

Page 4: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Code Habits

Popular style guides among useR’s

I https://google-styleguide.googlecode.com/svn/

trunk/Rguide.xml

I http://adv-r.had.co.nz/Style.html

4

Page 5: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

5

Page 6: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

6

Page 7: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Editor

Text Editor

I Text editor 6= word processor

I Use a good text editor

I e.g. vim, sublime text, text wrangler, notepad, etc

I With syntax highlighting

I Or use an Integrated Development Environment (IDE) likeRStudio

7

Page 8: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Without Syntax Highlighting

a <- 2

x <- 3

y <- log(sqrt(x))

3*x^7 - pi * x / (y - a)

"some strings"

dat <- read.table(file = 'data.csv', header = TRUE)

8

Page 9: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Syntax Highlighting

a <- 2

x <- 3

y <- log(sqrt(x))

3*x^7 - pi * x / (y - a)

"some strings"

dat <- read.table(file = 'data.csv', header = TRUE)

9

Page 10: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Syntax Highlight

Without highlighting it’s harder to detect syntax errors:

numbers <- c("one", "two, "three")

if (x > 0) {

3 * x + 19

} esle {

2 * x - 20

}

10

Page 11: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Syntax Highlight

With highlighting it’s easier to detect syntax errors:

numbers <- c("one", "two, "three")

if (x > 0) {3 * x + 19

} esle {2 * x - 20

}

11

Page 12: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Your Turn

Which instruction is free of errors

A) mean(numbers, na.mr = TRUE)

B) read.table(~/Documents/rawdata.txt, sep = '\t')

C) barplot(x, horiz = TURE)

D) matrix(1:12, nrow = 3, ncol = 4)

12

Page 13: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Use an IDE

I Syntax highlighting

I Syntax awareI Able to evaluate R code

– by line– by selection– entire file

I Command completion

13

Page 14: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Use an IDE

Use an IDE with autocompletion

14

Page 15: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Use an IDE

Use an IDE that provides helpful documentation

15

Page 16: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Good Source Code

16

Page 17: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Literate Programming

Think about programs/scripts/code as works of literature

17

Page 18: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Important Aspects

I Indentation of lines

I Use of spaces

I Use of comments

I Naming style

I Use of white space

I Consistency

18

Page 19: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Literate Programming

Good source code

I Well readable by humans

I As much self-explaining as possible

19

Page 20: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Literate Programming

“Let us change our traditional attitude to theconstruction of programs: Instead of imagining thatour main task is to instruct a computer what to do,let us concentrate rather on explaining to humanbeings what we want a computer to do”

Donald Knuth. “Literate Programming (1984)”

20

Page 21: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Literate Programming

I Choose the names of variables carefully

I Explain what each variable means

I Strive for a program that is comprehensible

I Introduce concepts in an order that is best for humanunderstanding

(From Donald Knuth’s: Literate Programming, 1984)

21

Page 22: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Literate Programming

Instructing a computer what to do

# good for computers (not much for humans)

if (is.numeric(x) & x > 0 & x %% 1 == 0) TRUE else FALSE

Explaining a human being what we want a computer to do

# good for humans

is_positive_integer(x)

22

Page 23: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Literate Programming

Instructing a computer what to do

# good for computers (not much for humans)

if (is.numeric(x) & x > 0 & x %% 1 == 0) TRUE else FALSE

Explaining a human being what we want a computer to do

# good for humans

is_positive_integer(x)

22

Page 24: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Literate Programming

# example

is_positive_integer <- function(x) {(is.numeric(x) & x > 0 & x %% 1 == 0)

}

is_positive_integer(2)

## [1] TRUE

is_positive_integer(2.1)

## [1] FALSE

23

Page 25: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Indentation

I Keep your indentation style consistent

I There is more than one way of indenting code

I There is no “best” style that everyone should be following

I You can indent using spaces or tabs (but don’t mix them)

I Can help in detecting errors in your code because it canexpose lack of symmetry

I Do this systematically (RStudio editor helps a lot)

24

Page 26: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Indentation

# Don't do this!

if(!is.vector(x)) {stop('x must be a vector')

} else {if(any(is.na(x))){x <- x[!is.na(x)]

}total <- length(x)

x_sum <- 0

for (i in seq_along(x)) {x_sum <- x_sum + x[i]

}x_sum / total

}

25

Page 27: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Indentation

# better with indentation

if(!is.vector(x)) {stop('x must be a vector')

} else {if(any(is.na(x))) {x <- x[!is.na(x)]

}total <- length(x)

x_sum <- 0

for (i in seq_along(x)) {x_sum <- x_sum + x[i]

}x_sum / total

}

26

Page 28: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Indenting Styles

# style 1

find_roots <- function(a = 1, b = 1, c = 0)

{if (b^2 - 4*a*c < 0)

{return("No real roots")

} else

{return(quadratic(a = a, b = b, c = c))

}}

27

Page 29: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Indenting Styles

# style 2

find_roots <- function(a = 1, b = 1, c = 0) {if (b^2 - 4*a*c < 0) {return("No real roots")

} else {return(quadratic(a = a, b = b, c = c))

}}

28

Page 30: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Indentation

Benefits of code indentation:I Easier to read

I Easier to understand

I Easier to modify

I Easier to maintain

I Easier to enhance

29

Page 31: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Reformat Code in RStudio

I RStudio provides code reformatting (use it!)

I Click Code on the menu bar

I Then click Reformat Code

30

Page 32: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

31

Page 33: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Reformat Code in RStudio

# unformatted code

quadratic<-function(a=1,b=1,c=0){root<-sqrt(b^2-4*a*c)

x1<-(-b+root)/2*a

x2<-(-b-root)/2*a

list(sol1=x1,sol2=x2)

}

# reformatted code

quadratic <- function(a = 1,b = 1,c = 0) {root <- sqrt(b ^ 2 - 4 * a * c)

x1 <- (-b + root) / 2 * a

x2 <- (-b - root) / 2 * a

list(sol1 = x1,sol2 = x2)

}

32

Page 34: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Reformat Code in RStudio

# unformatted code

quadratic<-function(a=1,b=1,c=0){root<-sqrt(b^2-4*a*c)

x1<-(-b+root)/2*a

x2<-(-b-root)/2*a

list(sol1=x1,sol2=x2)

}

# reformatted code

quadratic <- function(a = 1,b = 1,c = 0) {root <- sqrt(b ^ 2 - 4 * a * c)

x1 <- (-b + root) / 2 * a

x2 <- (-b - root) / 2 * a

list(sol1 = x1,sol2 = x2)

}

32

Page 35: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Meaningful Names

33

Page 36: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Naming Style

Choose a consistent naming style for objects and functions

I someObject (lowerCamelCase)

I SomeObject (UpperCamelCase)

I some object (underscore separation)

I some.object (dot separation)

34

Page 37: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Naming Style

Avoid using names of standard R objects

I vector

I mean

I list

I data

I c

I colors

I etc

35

Page 38: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Naming Style

If you’re thinking about using names of R objects, prefersomething like this

I xvector

I xmean

I xlist

I xdata

I xc

I xcolors

I etc

36

Page 39: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Naming Style

Better to add meaning like this

I mean salary

I input vector

I data list

I data table

I first last

I some colors

I etc

37

Page 40: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Naming Style

# what does getThem() do?

getThem <- function(values, y) {list1 <- c()

for (i in values) {if (values[i] == y)

list1 <- c(list1, x)

}return(list1)

}

38

Page 41: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Naming Style

# this is more meaningful

getFlaggedCells <- function(gameBoard, flagged) {flaggedCells <- c()

for (cell in gameBoard) {if (gameBoard[cell] == flagged)

flaggedCells <- c(flaggedCells, x)

}return(flaggedCells)

}

39

Page 42: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Meaningful Distinctions

# argument names 'a1' and 'a2'?

move_strings <- function(a1, a2) {for (i in seq_along(a1)) {

a1[i] <- toupper(substr(a1, 1, 3))

}a2

}

# argument names

move_strings <- function(origin, destination) {for (i in seq_along(origin)) {destination[i] <- toupper(substr(origin, 1, 3))

}destination

}

40

Page 43: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Meaningful Distinctions

# argument names 'a1' and 'a2'?

move_strings <- function(a1, a2) {for (i in seq_along(a1)) {

a1[i] <- toupper(substr(a1, 1, 3))

}a2

}

# argument names

move_strings <- function(origin, destination) {for (i in seq_along(origin)) {destination[i] <- toupper(substr(origin, 1, 3))

}destination

}

40

Page 44: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Pronounceable Names

# cryptic abbreviations

DtaRcrd102 <- list(

nm = 'John Doe',

bdg = 'Valley Life Sciences Building',

rm = 2060

)

# pronounceable names

Customer <- list(

name = 'John Doe',

building = 'Valley Life Sciences Building',

room = 2060

)

41

Page 45: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Pronounceable Names

# cryptic abbreviations

DtaRcrd102 <- list(

nm = 'John Doe',

bdg = 'Valley Life Sciences Building',

rm = 2060

)

# pronounceable names

Customer <- list(

name = 'John Doe',

building = 'Valley Life Sciences Building',

room = 2060

)

41

Page 46: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Your Turn

Which of the following is NOT a valid name:

I A) x12345

I B) data

I C) oBjEcT

I D) 5ummary

I E) data.frame

42

Page 47: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Syntax

White SpacesI Use a lot of it

I around operators (assignment and arithmetic)

I between function arguments and list elements

I between matrix/array indices, in particular for missingindices

I Split long lines at meaningful places

43

Page 48: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

Avoid this

a<-2

x<-3

y<-log(sqrt(x))

3*x^7-pi*x/(y-a)

Much Better

a <- 2

x <- 3

y <- log(sqrt(x))

3*x^7 - pi * x / (y - a)

44

Page 49: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

# Avoid this

plot(x,y,col=rgb(0.5,0.7,0.4),pch='+',cex=5)

# OK

plot(x, y, col = rgb(0.5, 0.7, 0.4), pch = '+', cex = 5)

45

Page 50: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

# Avoid this

plot(x,y,col=rgb(0.5,0.7,0.4),pch='+',cex=5)

# OK

plot(x, y, col = rgb(0.5, 0.7, 0.4), pch = '+', cex = 5)

45

Page 51: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Readability

Lines should be broken/wrapped around so that they are lessthan 80 columns wide

# lines too long

histogram <- function(data){hist(data, col = 'gray90', xlab = 'x', ylab = 'Frequency', main= 'Histogram of x')

abline(v = c(min(data), max(data), median(data), mean(data)),

col = c('gray30', 'gray30', 'orange', 'tomato'), lty = c(2,2,1,1), lwd = 3)

}

46

Page 52: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Readability

Lines should be broken/wrapped aroung so that they are lessthan 80 columns wide

# lines too long

histogram <- function(data) {hist(data, col = 'gray90', xlab = 'x', ylab = 'Frequency',

main = 'Histogram of x')

abline(v = c(min(data), max(data), median(data), mean(data)),

col = c('gray30', 'gray30', 'orange', 'tomato'),

lty = c(2,2,1,1), lwd = 3)

}

47

Page 53: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

I Spacing forms the second important part in codeindentation and formatting.

I Spacing makes the code more readable

I Follow proper spacing through out your coding

I Use spacing consistently

48

Page 54: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

# this can be improved

stats <- c(min(x), max(x), max(x)-min(x),

quantile(x, probs=0.25), quantile(x, probs=0.75), IQR(x),

median(x), mean(x), sd(x)

)

49

Page 55: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

Don’t be afraid of splitting one long line into individual pieces:

# much better

stats <- c(

min(x),

max(x),

max(x) - min(x),

quantile(x, probs = 0.25),

quantile(x, probs = 0.75),

IQR(x),

median(x),

mean(x),

sd(x)

)

50

Page 56: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

You can even do this:

# also OK

stats <- c(

min = min(x),

max = max(x),

range = max(x) - min(x),

q1 = quantile(x, probs = 0.25),

q3 = quantile(x, probs = 0.75),

iqr = IQR(x),

median = median(x),

mean = mean(x),

stdev = sd(x)

)

51

Page 57: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

I All commas and semicolons must be followed by singlewhitespace

I All binary operators should maintain a space on either sideof the operator

I Left parenthesis should start immediately after a functionname

I All keywords like if, while, for, repeat should befollowed by a single space.

52

Page 58: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

All binary operators should maintain a space on either side ofthe operator

# NOT Recommended

a=b-c

a = b-c

a=b - c;

# Recommended

a = b - c

53

Page 59: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

All binary operators should maintain a space on either side ofthe operator

# Not really recommended

z <- 6*x + 9*y

# Recommended (option 1)

z <- 6 * x + 9 * y

# Recommended (option 2)

z <- (7 * x) + (9 * y)

54

Page 60: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

Left parenthesis should start immediately after a function name

# NOT Recommended

read.table ('data.csv', header = TRUE, row.names = 1)

# Recommended

read.table('data.csv', header = TRUE, row.names = 1)

55

Page 61: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

White spaces

All keywords like if, while, for, repeat should be followedby a single space.

# not bad

if(is.numeric(object)) {mean(object)

}

# much better

if (is.numeric(object)) {mean(object)

}

56

Page 62: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Syntax: Parentheses

Use parentheses for clarity even if not needed for order ofoperations.

a <- 2

x <- 3

y <- 4

a/y*x

## [1] 1.5

# better

(a / y) * x

## [1] 1.5

57

Page 63: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Use Parentheses

# confusing

1:3^2

## [1] 1 2 3 4 5 6 7 8 9

# better

1:(3^2)

## [1] 1 2 3 4 5 6 7 8 9

58

Page 64: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

# Comments

59

Page 65: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Comments

Comment your codeI Add lots of comments

I But don’t belabor the obvious

I Use blank lines to separate blocks of code and commentsto say what the block does

I Remember that in a few months, you may not follow yourown code any better than a stranger

I Some key things to document:– summarizing a block of code– explaining a very complicated piece of code– explaining arbitrary constant values

60

Page 66: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Line spaces and Comments

MV <- get_manifests(Data, blocks)

check_MV <- test_manifest_scaling(MV, specs$scaling)

gens <- get_generals(MV, path_matrix)

names(blocks) <- gens$lvs_names

block_sizes <- lengths(blocks)

blockinds <- indexify(blocks)

61

Page 67: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Line spaces and Comments

# ==================================================

# Preparing data and blocks indexification

# ==================================================

# building data matrix 'MV'

MV <- get_manifests(Data, blocks)

check_MV <- test_manifest_scaling(MV, specs$scaling)

# generals about obs, mvs, lvs

gens <- get_generals(MV, path_matrix)

# indexing blocks

names(blocks) <- gens$lvs_names

block_sizes <- lengths(blocks)

blockinds <- indexify(blocks)

62

Page 68: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Line spaces and Comments

Different line styles:

####################################################

# ==================================================

# **************************************************

# --------------------------------------------------

63

Page 69: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Line spaces and Comments

# ==================================================

# Preparing data and blocks indexification

# ==================================================

# building data matrix 'MV'

MV <- get_manifests(Data, blocks)

check_MV <- test_manifest_scaling(MV, specs$scaling)

# ---- Preparing data and blocks indexification ----

# building data matrix 'MV'

MV <- get_manifests(Data, blocks)

check_MV <- test_manifest_scaling(MV, specs$scaling)

64

Page 70: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Line spaces and Comments

# ==================================================

# Preparing data and blocks indexification

# ==================================================

# building data matrix 'MV'

MV <- get_manifests(Data, blocks)

check_MV <- test_manifest_scaling(MV, specs$scaling)

# ---- Preparing data and blocks indexification ----

# building data matrix 'MV'

MV <- get_manifests(Data, blocks)

check_MV <- test_manifest_scaling(MV, specs$scaling)

64

Page 71: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Comments

Include comments to say what a block does, or what a block isintended for

# =====================================================

# Data: liga2015

# =====================================================

# For this session we'll be using the dataset that

# comes in the file 'liga2015.csv' (see github repo)

# This dataset contains basic statistics from the

# Spanish soccer league during the season 2014-2015

65

Page 72: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Comments

x <- matrix(1:10, nrow = 2, ncol = 5)

# mean vectors by rows and columns

xmean1 <- apply(x, 1, mean)

xmean2 <- apply(x, 2, mean)

# Subtract off the mean of each row/column

y <- sweep(x, 1, xmean1)

z <- sweep(x, 2, xmean2)

# Multiply by the mean of each column (for some reason)

w <- sweep(x, 2, xmean1, FUN = "*")

66

Page 73: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

About Comments

Be careful with your comments (you never know who will endup looking at your code, or where you’ll be in the future)

# F***ing piece of code that drives me bananas

# wtf function

# best for loop ever

67

Page 74: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Good coding practices: Syntax

Code FilesI Break code into separate files (¡2000-3000 lines per file)

I Give files meaningful names

I Group related functions within a file

68

Page 75: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

R Scripts

Include Header information such asI Who wrote / programmed it

I When was it done

I What is it all about

I How the code might fit within a larger program

69

Page 76: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

R Scripts

Header example:

# ===================================================

# Some Title

# Author(s): First Last

# Date: month-day-year

# Description: what this code is about

# Data: perhaps is designed for a specific data set

# ===================================================

70

Page 77: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

R Scripts

If you need to load R packages, do so at the beginning of yourscript, after the header:

# ===================================================

# Some Title

# Author(s): First Last

# Date: month-day-year

# Description: what this code is about

# Data: perhaps is designed for a specific data set

# ===================================================

library(stringr)

library(ggplot2)

library(MASS)

71

Page 78: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Functions

72

Page 79: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Functions

I Functions are tools and operations

I Functions form the building blocks for larger tasks

I Functions allow us to reuse blocks of code easily for lateruse

I Use functions whenever possible

I Try to write functions rather than carry out your workusing blocks of code

73

Page 80: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Length of Functions

Some rules of thumb (in order of preference)

I Ideal length between 2 and 4 lines of code

I No more than 10 lines

I No more than 20 lines

I Should not exceed the size of the text editor window

74

Page 81: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Length of Functions

I Don’t write long functions

I Rewrite long functions by converting collections of relatedexpression into separate functions

I Smaller functions are easier to debug, easier to understand,and can be combined in a modular fashion

I Functions shouldn’t be longer than one visible screen (withreasonable font)

75

Page 82: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Length of Functions

I Separate small functions

I are easier to reason about and manage

I are easier to test and verify they are correct

I are more likely to be reusable

76

Page 83: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Some considerations

I Think about different scenarios and contexts in which afunction might be used

I Can you generalize it?

I Who will use it?

I Who is going to maintain the code?

77

Page 84: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Naming Functions

I Use descriptive names

I Readers (including you) should infer the operation bylooking at the call of the function

78

Page 85: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Functions

Functions should:I be modular (having a single task)

I have meaningful name

I have a comment describing their purpose, inputs andoutputs

79

Page 86: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Functions: Example

# find mean of Y on the data z, Y in last column, and predict at xnew

meany <- function(predpt,nearxy) {ycol <- ncol(nearxy)

mean(nearxy[,ycol])

}

# find variance of Y in the neighborhood of predpt

vary <- function(predpt,nearxy) {ycol <- ncol(nearxy)

var(nearxy[,ycol])

}

# fit linear model to the data z, Y in last column, and predict at xnew

loclin <- function(predpt,nearxy) {ycol <- ncol(nearxy)

bhat <- coef(lm(nearxy[,ycol] ~ nearxy[,-ycol]))

c(1,predpt) %*% bhat

}

source: Norm Matloff

80

Page 87: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Functions and Global Variables

I Functions should not modify global variables

I except connections or environments

I should not change global par() settings

81

Page 88: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

DRY

82

Page 89: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Don’t Repeat Yourself

83

Page 90: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Every piece of knowledge must have a single,

unambiguous, authoritative representation

within a system.

84

Page 91: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

DRY

# avoid repetition

plot(x, y, type = 'n')

points(x[size == 'xsmall'], y[size == 'xsmall'], col = 'purple')

points(x[size == 'small'], y[size == 'small'], col = 'blue')

points(x[size == 'medium'], y[size == 'medium'], col = 'green')

points(x[size == 'large'], y[size == 'large'], col = 'orange')

points(x[size == 'xlarge'], y[size == 'xlarge'], col = 'red')

# avoid repetition

size_colors <- c('purple', 'blue', 'green', 'orange', 'red')

plot(x, y, type = 'n')

for (i in seq_along(levels(size))) {points(x[size == i], y[size == i], col = size_colors[i])

}

85

Page 92: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

DRY

# avoid repetition

plot(x, y, type = 'n')

points(x[size == 'xsmall'], y[size == 'xsmall'], col = 'purple')

points(x[size == 'small'], y[size == 'small'], col = 'blue')

points(x[size == 'medium'], y[size == 'medium'], col = 'green')

points(x[size == 'large'], y[size == 'large'], col = 'orange')

points(x[size == 'xlarge'], y[size == 'xlarge'], col = 'red')

# avoid repetition

size_colors <- c('purple', 'blue', 'green', 'orange', 'red')

plot(x, y, type = 'n')

for (i in seq_along(levels(size))) {points(x[size == i], y[size == i], col = size_colors[i])

}

85

Page 93: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

RStudio Code Tools

86

Page 94: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

RStudio Shortcuts

87

Page 95: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Strongly Recommended

Look at other people’s codeI https://github.com/hadley

I https://github.com/yihui

I https://github.com/karthik

I https://github.com/kbroman

I https://github.com/cboettig

I https://github.com/garrettgman

88

Page 96: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Your Own Style

I It takes time to develop a personal style

I Try different styles and see which one best fits you

I Sometimes you have to adapt to a company’s style

I There is no one single best style

89

Page 97: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Test Yourself

What’s wrong with this function?

average <- function(x) {l <- length(x)

for(i in l) {y[i] <- x[i]/l

z <- sum(y[1:l])

return(as.numeric(z))

}}

90

Page 98: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Test Yourself

What’s wrong with this function?

freq_table <- function(x) {table <- table(x)

'category' <- levels(x)

'count' <- print(table)

'prop' <- table/length(x)

'cumcount' <- print(table)

'cumprop' <- table/length(x)

if(is.factor(x)) {return(data.frame(rownames=c('category', 'count','prop',

'cumcount','cumprop')))

} else {stop('Not a factor')

}}

91

Page 99: Good Habits in R Programming - STAT 133 - Gaston Sanchez · Literate Programming I Choose the names of variables carefully I Explain what each variable means I Strive for a program

Discussion

I What other suggestions do you have?

I How could we restructure the code, to make it easier toread?

I Grab a buddy and practice “code review”. We do it formethods and papers, why not code?

I Our code is a major scientific product and the result of alot of hard work!

92