Writing Better R Code

Hui Zhang, Research Analytics

Outline

• Approaches for improving the performance of R code
  – Some previous knowledge of R is recommended
  – Some familiarity with C/C++ is also recommended

• Topics
  – Loops
  – Ply Functions
  – Vectorization
  – Loops, Plys, and Vectorization
  – Interfacing to Compiled Code

Loops

• for
• while
• No goto or do-while constructs
• Loops are really slow in R, because every iteration runs through the interpreter

• Best Practices
  – Avoid loops where you reasonably can
  – Evaluate the practicality of a rewrite (plys, vectorization, compiled code)
  – Always preallocate (when you can):
    » Vectors: numeric(n), integer(n), character(n)
    » Lists: vector(mode = "list", length = n)
    » Data frames: data.frame(col1 = numeric(n), …)
  – If you can't, try something other than an array/list.
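As a rough illustration of the preallocation advice, the sketch below fills a vector and a list whose lengths are known in advance, then shows the grow-as-you-go pattern for contrast. The names `n`, `squares`, `results`, and `grown` are placeholders chosen for this example and do not come from the slides.

```r
n <- 10

# Preallocate a numeric vector, then fill it in place.
squares <- numeric(n)
for (i in seq_len(n)) {
  squares[i] <- i^2
}

# Preallocate a list the same way.
results <- vector(mode = "list", length = n)
for (i in seq_len(n)) {
  results[[i]] <- letters[seq_len(i)]
}

# For contrast: growing with c() builds a new, longer vector on every
# iteration, so the work grows roughly quadratically with n. Avoid this.
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Both approaches give the same values; only the cost differs.
identical(squares, grown)
```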

Ply Functions

• R has functions that apply other functions to data
• In a nutshell: loop sugar
• Typical *ply's
  – apply(): apply a function over matrix "margin(s)"
  – lapply(): apply a function over a list/vector
  – mapply(): apply a function over multiple lists/vectors
  – sapply(): same as lapply(), but (possibly) nicer output
  – Plus some others that are mostly irrelevant here
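A small sketch of the four functions listed above may help fix the differences. The inputs (`m` and the two short lists) are made up for illustration.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3 matrix, filled column by column

# apply(): margin 1 means rows, margin 2 means columns.
row_sums <- apply(m, 1, sum)
col_sums <- apply(m, 2, sum)

# lapply() always returns a list, one element per input element.
lens_list <- lapply(list(a = 1:3, b = 1:5), length)

# sapply() does the same work but simplifies the result to a vector when it can.
lens_vec <- sapply(list(a = 1:3, b = 1:5), length)

# mapply() walks several vectors in parallel, passing one element of each.
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)
```

Because sapply() may or may not simplify, depending on what the function returns, lapply() is the more predictable choice inside other functions.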

Transforming Loops into Ply’s
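The original slide content for this step is not available, so the following is only a minimal sketch of the idea: the body of the loop becomes the function handed to a ply. The list `samples` and the range computation are invented for illustration.

```r
samples <- list(a = c(3, 1, 2), b = c(10, 20), c = 5)

# Loop version: preallocate, then fill one element per iteration.
ranges_loop <- numeric(length(samples))
for (i in seq_along(samples)) {
  ranges_loop[i] <- max(samples[[i]]) - min(samples[[i]])
}

# Ply version: the loop body moves into an anonymous function,
# and sapply() handles the iteration and the result vector.
ranges_ply <- sapply(samples, function(v) max(v) - min(v))
```

The ply version also carries the list names through to the result, which the loop version would have to do by hand.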

• Most plys are just shorthand (a higher-level expression) for loops
• Generally not much faster than a loop (if at all), especially with the byte-code compiler
• Thinking in terms of lapply() can be useful, however …

Vectorization

• x + y
• X[, 1] <- 0
• rnorm(1000)
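Each of the three expressions above operates on a whole object in a single call. A runnable version, with small inputs assumed purely for illustration, might look like this:

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)
z <- x + y             # element-wise addition, no explicit loop

X <- matrix(1, nrow = 3, ncol = 2)
X[, 1] <- 0            # assigns to the entire first column at once

set.seed(1)            # arbitrary seed, only so the draws are reproducible
draws <- rnorm(1000)   # 1000 standard normal draws from one call
```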

• Same idea in R as in other high-level languages (Matlab, Python, …)

• Idea: use pre-existing compiled kernels to avoid interpreter overhead

• Much faster than loops and plys

• Best Practices
  – Vectorize if at all possible
  – Note that vectorized code can consume a lot of memory, since whole intermediate vectors are materialized
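One way to picture the memory trade-off: a fully vectorized expression materializes a temporary as long as its input, whereas processing in chunks keeps the temporaries small. The sketch below is an assumption-laden illustration (the chunk size and variable names are arbitrary), not a recommendation from the slides.

```r
set.seed(1)   # arbitrary seed for reproducibility
n <- 1e6
x <- runif(n)

# Fully vectorized: x^2 creates a temporary vector of length n before summing.
total_vec <- sum(x^2)

# Chunked: still vectorized within each chunk, but each temporary
# holds at most chunk_size elements.
chunk_size <- 1e5
total_chunked <- 0
for (start in seq(1, n, by = chunk_size)) {
  end <- min(start + chunk_size - 1, n)
  total_chunked <- total_chunked + sum(x[start:end]^2)
}

# The two totals agree up to floating-point rounding.
all.equal(total_vec, total_chunked)
```

For vectors that fit comfortably in memory, the plain vectorized form is simpler and is usually the better choice.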

Putting It All Together

• Loops are slow
• apply() is just a for loop (written in R)
• lapply(), sapply(), mapply() are not R-level for loops (their iteration happens in C)
• Ply functions are not vectorized
• Vectorization is fastest, but often needs a lot of memory

• Example: compute the squares of the numbers 1 to 100000, using
  – a for loop without preallocation
  – a for loop with preallocation
  – sapply()
  – vectorization
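The slide that showed the code and timings for this comparison did not survive extraction, so the following is a reconstruction under assumptions rather than the author's code. The function names are made up, and no timings are quoted because they depend on the machine and R version.

```r
n <- 100000

# 1. for loop without preallocation: the result is rebuilt on every iteration.
loop_grow <- function(n) {
  out <- c()
  for (i in 1:n) out <- c(out, i^2)
  out
}

# 2. for loop with preallocation: the result is allocated once and filled in place.
loop_prealloc <- function(n) {
  out <- numeric(n)
  for (i in 1:n) out[i] <- i^2
  out
}

# 3. sapply(): the loop body becomes an anonymous function.
ply_square <- function(n) sapply(1:n, function(i) i^2)

# 4. Vectorization: one call on the whole vector.
vec_square <- function(n) (1:n)^2

# system.time() reports the elapsed time for each approach.
print(system.time(r1 <- loop_grow(n)))
print(system.time(r2 <- loop_prealloc(n)))
print(system.time(r3 <- ply_square(n)))
print(system.time(r4 <- vec_square(n)))

# All four approaches should produce the same values.
stopifnot(identical(r1, r2), identical(r2, r3), identical(r3, r4))
```

Expect the first version to be noticeably slower than the rest, since it copies an ever-growing vector on each pass.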

Rcpp

• R is mostly a C program
• R extensions are mostly R programs

• Rcpp is:
  – An R interface to compiled code
  – A package ecosystem (Rcpp, RcppArmadillo, RcppEigen, …)
  – Utilities to make writing C++ more convenient for R users
  – A tool that requires C++ knowledge to use effectively
  – GPL licensed (like R)

• Rcpp is not:
  – Magic
  – An automatic R-to-C++ converter
  – A way around having to learn C++
  – A tool to make existing R functionality faster (unless you rewrite it)
  – As easy to use as R

• Rcpp's advantages
  – Compiled code is fast
  – Easy to install
  – Easy to use (comparatively)
  – Better documented than alternatives
  – Large, friendly, helpful community
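To give a feel for what "interfacing to compiled code" looks like from the R side, here is a minimal sketch using `Rcpp::cppFunction()`, which compiles a C++ function from a string and exposes it to R. The function names `sum_squares_r` and `sum_squares_cpp` are hypothetical, and the compiled part only runs if the Rcpp package is installed; a working C++ toolchain is also assumed.

```r
# Pure R reference: an explicit loop, written the way one might write it in C.
sum_squares_r <- function(x) {
  total <- 0
  for (xi in x) total <- total + xi * xi
  total
}

# Compiled version, attempted only when Rcpp is available.
if (requireNamespace("Rcpp", quietly = TRUE)) {
  Rcpp::cppFunction('
    double sum_squares_cpp(NumericVector x) {
      int n = x.size();
      double total = 0.0;
      for (int i = 0; i < n; ++i) {
        total += x[i] * x[i];
      }
      return total;
    }
  ')

  set.seed(1)   # arbitrary seed for reproducibility
  x <- runif(1e6)
  # Both versions should agree up to floating-point tolerance.
  stopifnot(isTRUE(all.equal(sum_squares_r(x), sum_squares_cpp(x))))
}
```

Note that the same loop that is discouraged in R is perfectly reasonable in C++, which is the point of moving it there.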

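• Example: Monte Carlo simulation to estimate π
  – Sample N uniform observations (x_i, y_i) in the unit square [0,1] × [0,1]. Then
    $\pi \approx 4 \cdot \frac{\#\{i : x_i^2 + y_i^2 \le 1\}}{N}$,
    that is, four times the fraction of points that fall inside the quarter circle of radius 1.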
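The slides that carried the code for this example were lost in extraction. As a stand-in, here is a plain R sketch of the estimator in two forms, a loop and a vectorized version; the function names `mc_pi_loop` and `mc_pi_vec`, the seed, and the sample size are assumptions made for illustration.

```r
# Loop version: draw one point at a time and count the hits.
mc_pi_loop <- function(n) {
  inside <- 0L
  for (i in seq_len(n)) {
    x <- runif(1)
    y <- runif(1)
    if (x^2 + y^2 <= 1) inside <- inside + 1L
  }
  4 * inside / n
}

# Vectorized version: draw all points at once and count with sum().
mc_pi_vec <- function(n) {
  x <- runif(n)
  y <- runif(n)
  4 * sum(x^2 + y^2 <= 1) / n
}

set.seed(42)   # arbitrary seed for reproducibility
est_loop <- mc_pi_loop(1e5)

set.seed(42)
est_vec <- mc_pi_vec(1e5)

# Compare both estimates with R's built-in constant pi.
c(loop = est_loop, vectorized = est_vec, truth = pi)
```

The loop version is the natural candidate for a C++ rewrite, since its structure carries over almost line for line.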

Summary

• Bad R often looks like good C/C++
• Vectorize your code as much as you can
• Interfacing with compiled code helps