Writing Faster Code in R
R meetup
Arelia T. Werner
May 20th 2015
Tectoria, Victoria, BC



Background

• Different skill levels with R in this group
• Me: easy to understand versus runs faster
• I work with ‘big’ data, so faster code is useful
• Also: faster code assists with debugging
• I have a tendency to write for loops (I think this comes from learning from people who previously programmed in Fortran)


Example Loop versus Function Speed

> system.time(for (i in 1:1000) {rnorm(100)})
   user  system elapsed
  0.016   0.000   0.014

> system.time(replicate(1000, rnorm(100)))
   user  system elapsed
   0.02    0.00    0.02
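The loop above discards each draw, while replicate() collects its results into a matrix, so the two timings do not measure quite the same work. A minimal sketch of a like-for-like comparison follows, assuming the goal is a 100 x 1000 matrix of draws. The names n_rep, n_draw, res_loop, res_rep, and res_vec are illustrative and do not come from the slides, and timings will differ from machine to machine.

```r
n_rep  <- 1000   # number of repetitions (as on the slide)
n_draw <- 100    # draws per repetition (as on the slide)

# 1. for loop that keeps its results in a preallocated matrix
loop_time <- system.time({
  res_loop <- matrix(NA_real_, nrow = n_draw, ncol = n_rep)
  for (i in seq_len(n_rep)) {
    res_loop[, i] <- rnorm(n_draw)
  }
})

# 2. replicate(), which simplifies its results to a matrix
rep_time <- system.time(res_rep <- replicate(n_rep, rnorm(n_draw)))

# 3. one vectorized call to rnorm(), reshaped into a matrix
vec_time <- system.time(
  res_vec <- matrix(rnorm(n_draw * n_rep), nrow = n_draw)
)

# Elapsed seconds for each approach
elapsed <- c(loop       = loop_time[["elapsed"]],
             replicate  = rep_time[["elapsed"]],
             vectorized = vec_time[["elapsed"]])
print(elapsed)
```

At this problem size all three runs are very short, so single system.time() readings are noisy; repeating the comparison several times gives a fairer picture.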


Rules of thumb with Loops

• Avoid nested loops at all costs
• Use a counter with while loops
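Both rules can be sketched in a few lines. The example below is illustrative only: it assumes "counter" means an iteration count that also caps the number of passes through a while loop, and the data (x, y) and the Newton iteration for the square root of 2 are made up for the purpose.

```r
x <- 1:5
y <- 1:4

# Nested loops: fill a matrix of products one cell at a time
prod_loop <- matrix(0, nrow = length(x), ncol = length(y))
for (i in seq_along(x)) {
  for (j in seq_along(y)) {
    prod_loop[i, j] <- x[i] * y[j]
  }
}

# Same result without explicit loops: outer() computes all products at once
prod_outer <- outer(x, y)
stopifnot(isTRUE(all.equal(prod_loop, prod_outer)))

# while loop with a counter: stop on convergence OR after max_iter passes,
# so a condition that never becomes FALSE cannot hang the session
max_iter <- 100
iter     <- 0
estimate <- 1
while (abs(estimate^2 - 2) > 1e-10 && iter < max_iter) {
  estimate <- (estimate + 2 / estimate) / 2   # Newton step for sqrt(2)
  iter     <- iter + 1
}
print(c(estimate = estimate, iterations = iter))
```

The counter costs almost nothing and also records how many iterations were needed, which helps when debugging slow code.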


Avoid loops with “apply”

> system.time( for (i in 1:ncol(worldbank)) {
+ tmp <- is.na(worldbank[[i]])
+ mv[i] <- sum(tmp)
+ })
   user  system elapsed
  0.004   0.000   0.000
> mv
[1] 0 0 0 0 0 0 0 0

> system.time(apply(worldbank, 2, function(x) sum(is.na(x))))
   user  system elapsed
   0.02    0.00    0.02
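One likely reason apply() is not faster here is that it first converts the data frame to a matrix. A minimal sketch of two alternatives for counting missing values per column follows. It uses a small made-up data frame (toy) as a stand-in for worldbank, and the names mv_loop, mv_vapply, and mv_colsums are illustrative rather than taken from the slides.

```r
# Hypothetical stand-in for the worldbank data frame
toy <- data.frame(
  gdp  = c(1.2, NA, 3.4, 5.6),
  pop  = c(10, 20, NA, NA),
  area = c(100, 200, 300, 400)
)

# Loop version with the result vector preallocated before the loop
mv_loop <- integer(ncol(toy))
for (i in seq_along(toy)) {
  mv_loop[i] <- sum(is.na(toy[[i]]))
}
names(mv_loop) <- names(toy)

# vapply() walks the columns directly and checks each result is one integer
mv_vapply <- vapply(toy, function(col) sum(is.na(col)), integer(1))

# colSums() on the logical matrix returned by is.na() does the counting in one call
mv_colsums <- colSums(is.na(toy))

print(rbind(loop = mv_loop, vapply = mv_vapply, colSums = mv_colsums))
```

All three give the same counts; which is fastest depends on the size and shape of the data, so it is worth timing them on the real data set.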


http://adv-r.had.co.nz/Performance.html

• The best tool for microbenchmarking in R is the microbenchmark package. It provides very precise timings, making it possible to compare operations that only take a tiny amount of time. For example, the following code compares the speed of two ways of computing a square root.

• Instead of using microbenchmark(), you could use the built-in function system.time(). But system.time() is much less precise, so you’ll need to repeat each operation many times with a loop, and then divide to find the average time of each operation, as in the code below.

• Alex will talk about this more.
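The code that the quoted passages refer to is not reproduced on the slide. A minimal sketch of both approaches is given here, comparing sqrt(x) with x^0.5 as the text describes. The repetition count n and the variable names are assumptions, and the microbenchmark call runs only if that package is installed.

```r
x <- runif(100)

# Option 1: microbenchmark, skipped when the package is not available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(sqrt(x), x^0.5))
}

# Option 2: system.time() around a loop of n repetitions,
# then divide by n to get the average time of one operation
n <- 1e5
t_sqrt <- system.time(for (i in seq_len(n)) sqrt(x))[["elapsed"]] / n
t_pow  <- system.time(for (i in seq_len(n)) x^0.5)[["elapsed"]] / n

# Average time per call, converted from seconds to microseconds
per_call_us <- c(sqrt = t_sqrt, power = t_pow) * 1e6
print(per_call_us)
```

The system.time() figures include the overhead of the for loop itself, which is one reason the microbenchmark results are the more trustworthy of the two.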


worldbank <- read.table("http://www.olivialau.org/teaching/sample_data/worldbank1.csv", sep=":", header=TRUE)
worldbank <- worldbank[c(1,4,7,10,13,16,19,22)]