SAS to R Migration

Richard Pugh

Commercial Director

rpugh@mango-solutions.com

Agenda

• What is SAS?
• Why Migrate from SAS to R?
• Case Study: Major Financial Company
• How to Migrate from SAS to R?
• Questions

> fortune("SUV")

When talking about user friendliness of computer software I like the analogy of cars vs. busses: [...]

Using this analogy programs like SPSS are busses, easy to use for the standard things, but very frustrating if you want to do something that is not already preprogrammed.

R is a 4-wheel drive SUV (though environmentally friendly) with a bike on the back, a kayak on top, good walking and running shoes in the passenger seat, and mountain climbing and spelunking gear in the back. R can take you anywhere you want to go if you take time to learn how to use the equipment, but that is going to take longer than learning where the bus stops are in SPSS. -- Greg Snow R-help (May 2006)

Why Migrate to R?

Why NOT migrate?

Case Study: Major Financial Firm

Background

• SAS User Base
• Mature SAS Processes
• Leverage Oracle Investment

Can Oracle R Enterprise replace SAS?

Initial PoC

• Key SAS process: Credit Risk
• 1,625 lines of SAS Code
  • 79 “data steps”
  • 66 “procedure” calls
  • 29 macros
• Passionate SAS User Community

Initial PoC

Theme / Question

Capabilities
• Does R/ORE provide all the SAS capabilities required?
• What gaps, if any, exist between R and SAS?
• Where, and why, do results differ between R and SAS?

Workflow
• How does the “style” of coding differ between R and SAS?
• How easy, or not, was it to implement the existing SAS workflow?

Skills
• How much learning is required in order to become proficient in R?
• How much learning is required to take on and manage the R implementation of the modelling macros?

Extensions
• What areas of “value add” arise from using R for these tasks?
• What areas of “value add” outside of the current scope are enabled by using R?

Oracle R Enterprise

• In-database implementation of R
• Very appealing: take R to the database
• Features of ORE
  • ROracle Implementation
  • Transparency Layer
  • Publish functions to database (access via R, SQL)
• Learn more at www.oracle.com/goto/R

How to Migrate from SAS to R?

What to Migrate?

• Doesn’t happen overnight!
• Choose a key first step
  • Functional Area
  • Capability (e.g. graphics, time series)

Step 1: Analyse SAS Code

SAS Code Analysis

%macro Macro1;
  data a; set b; run;
%mend;

1,500 lines of code

%macro Macro2;
  data c; set a; run;
%mend;

• Can be complex
• Scoping rules in particular can be a challenge
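
In the schematic above, the second macro reads data set a, which the first macro created 1,500 lines earlier: a SAS data set written inside a macro stays visible to everything that runs afterwards. R functions behave differently, because names assigned inside a function are local to that call. A minimal sketch of the contrast, with hypothetical function and variable names chosen only to mirror the slide:

```r
# Hypothetical names throughout; this only illustrates R's function scoping.
b <- data.frame(id = 1:3)

macro1 <- function(input) {
  a <- input            # this 'a' exists only inside the call
  a$step <- "macro1"
  a                     # the result has to be returned explicitly
}

macro2 <- function(input) {
  c_out <- input        # local working copy
  c_out$step <- "macro2"
  c_out
}

a <- macro1(b)          # the dependency on macro1's output is now explicit
result <- macro2(a)

exists("c_out")         # FALSE: macro2's local variable did not leak out
```

Making each dependency an explicit argument is what turns a hidden link across 1,500 lines into something a reader (or a tool) can see.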

Use R to Analyse SAS Code
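
A rough idea of what such an analysis can look like: the sketch below counts data steps, procedure calls and macro definitions in SAS source held as a character vector. The function name countSasConstructs and the regular expressions are assumptions made for illustration, not the tooling used in the case study, and a regex-based count will be fooled by comments and string literals.

```r
# Count SAS constructs in a character vector of code lines (rough, regex-based).
countSasConstructs <- function(sasLines) {
  code <- tolower(paste(sasLines, collapse = "\n"))
  countMatches <- function(pattern) {
    hits <- gregexpr(pattern, code, perl = TRUE)[[1]]
    if (hits[1] == -1) 0L else length(hits)
  }
  c(
    # "data <name>" starts a data step; "data=<name>" is a proc option and is skipped
    dataSteps = countMatches("\\bdata\\s+[\\w.&]+"),
    procCalls = countMatches("\\bproc\\s+\\w+"),
    macros    = countMatches("%macro\\s+\\w+")
  )
}

sasCode <- c(
  "%macro sampler(DS=);",
  "  data random; set datalib.&DS.;",
  "    xxx=ranuni(54321);",
  "    origorder + 1;",
  "  run;",
  "  proc sort data=random ; by xxx; run;",
  "  data datalib.&DS.;",
  "    set random nobs=numg;",
  "    if _n_ le &DEVPERC.*numg then Holdout=0;",
  "    else Holdout=1;",
  "  run;",
  "  proc sort data=datalib.&DS.; by origorder; run;",
  "  proc freq data= datalib.&DS.;",
  "    tables Holdout /missing;",
  "    weight weight;",
  "  run;",
  "%mend sampler;"
)

counts <- countSasConstructs(sasCode)
print(counts)   # 2 data steps, 3 proc calls, 1 macro
```

Run over a whole code base, counts of this kind give the sort of inventory quoted for the PoC (data steps, procedure calls, macros).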

SAS Dependencies with functionMap
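
The idea behind a dependency map can be sketched in base R: find each macro definition, then record which other defined macros it calls. This is not the functionMap API; macroEdges and the patterns below are hypothetical and handle only simple, non-nested %macro ... %mend blocks.

```r
# Build a "from calls to" edge list between SAS macros (illustrative sketch).
macroEdges <- function(sasLines) {
  code <- tolower(paste(sasLines, collapse = "\n"))
  # Each definition runs from "%macro name" to the next "%mend" ((?s): dot matches newline)
  defs <- regmatches(code, gregexpr("(?s)%macro\\s+\\w+.*?%mend", code, perl = TRUE))[[1]]
  defNames <- sub("(?s)^%macro\\s+(\\w+).*$", "\\1", defs, perl = TRUE)
  edges <- lapply(seq_along(defs), function(i) {
    tokens <- regmatches(defs[i], gregexpr("%\\w+", defs[i], perl = TRUE))[[1]]
    # Keep only %tokens that name a macro defined in this code base
    called <- intersect(sub("^%", "", tokens), defNames)
    data.frame(from = rep(defNames[i], length(called)), to = called,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, edges)
}

sasCode <- c(
  "%macro prep(ds=);",
  "  data work.clean; set &ds.; run;",
  "%mend prep;",
  "%macro model(ds=);",
  "  %prep(ds=&ds.);",
  "  proc freq data=work.clean; run;",
  "%mend model;"
)

edges <- macroEdges(sasCode)
print(edges)   # one edge: model -> prep
```

An edge list like this can be passed to any graph-drawing tool to show which macros have to be translated together.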

Step 2: Tame the SAS Code

Tame your SAS Code

• Version Control
• Unit Tests
• Continuous Integration

Step 3: Translate the Code

Translate the Code

• Translate the Unit Tests first
• Then, translate macros one at a time
• Translation of procs can be partially automated, but care must be taken
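
Translating the tests first gives every translated macro a target to hit. A minimal sketch in base R, assuming the SAS unit test compared a frequency table against known values: sasReference stands in for output exported from a SAS run, and freqTable and testFreqTable are hypothetical names. The reference numbers are constructed to match the test input, not taken from the case study.

```r
# Reference values as they might be exported from a SAS run (illustrative fixture).
sasReference <- data.frame(Holdout = c(0, 1),
                           Frequency = c(8, 2),
                           Percent = c(80, 20))

# Candidate R translation of the frequency table.
freqTable <- function(flag) {
  counts <- table(factor(flag, levels = c(0, 1)))
  data.frame(Holdout = c(0, 1),
             Frequency = as.numeric(counts),
             Percent = as.numeric(100 * counts / length(flag)))
}

# The translated unit test: R output must match the SAS reference within a tolerance.
testFreqTable <- function() {
  rOut <- freqTable(c(0, 0, 0, 0, 1, 0, 0, 1, 0, 0))
  stopifnot(isTRUE(all.equal(rOut, sasReference, tolerance = 1e-8)))
  invisible(TRUE)
}

testPassed <- testFreqTable()
```

Comparing with a tolerance rather than exact equality leaves room for harmless floating-point differences between the two systems while still catching real discrepancies.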

%macro sampler(DS=);
   data random; set datalib.&DS.;
     xxx=ranuni(54321);
     origorder + 1;
   run;
   proc sort data=random ; by xxx; run;
   data datalib.&DS.;
     set random nobs=numg;
     if _n_ le &DEVPERC.*numg  then Holdout=0;
     else Holdout=1;
   run;
   proc sort data=datalib.&DS.; by origorder; run;
   proc freq data= datalib.&DS.;
     tables Holdout  /missing;
     weight weight;
   run;
%mend sampler;

sampler <- function(ds, DEVPERC = .8, hCol = "HOLDOUT") {
  N <- nrow(ds)
  holdTest <- runif(N) > DEVPERC        # TRUE (hold out) with probability 1 - DEVPERC
  ds[[hCol]] <- as.numeric(holdTest)    # 0 = development, 1 = holdout
  outDf <- aggregate(list(Freq = ds[[hCol]]), ds[hCol], length)  # count rows per flag value
  print(transform(outDf, Percent = round(100 * Freq / N, 2)))
  invisible(ds)                         # return the flagged data without printing it
}

17 SAS Lines > 8 R Lines
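
The two versions are not strictly equivalent. The SAS macro sorts on a random key and flags exactly the first DEVPERC × numg rows as development, while the R function draws each row independently, so its holdout share only averages 1 − DEVPERC; the SAS frequency table is also weighted and the R one is not. Where the exact split matters, a variant along the following lines could be used. The name samplerExact and the seed argument are assumptions for illustration, and an R seed will not reproduce the ranuni(54321) stream.

```r
# Exact-proportion variant of the holdout sampler (illustrative sketch).
samplerExact <- function(ds, DEVPERC = 0.8, hCol = "HOLDOUT", seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  N <- nrow(ds)
  randomRank <- sample.int(N)   # random permutation of 1..N, one rank per row
  # Mirrors "if _n_ le &DEVPERC.*numg then Holdout=0; else Holdout=1;"
  ds[[hCol]] <- as.numeric(randomRank > DEVPERC * N)
  ds                            # rows stay in their original order
}

out <- samplerExact(data.frame(id = 1:10), seed = 54321)
table(out$HOLDOUT)              # 8 development rows, 2 holdout rows
```

Because the flag is assigned in place, no second sort by origorder is needed to restore the original row order.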

Step 4: Use Oracle R Enterprise

Oracle R Enterprise

• Remove code to import/export from database
• Replace with links to the database
• Look for other opportunities (e.g. using in-database GLM vs standard)

Oracle R Enterprise 

library(ORE)                       # Load the library
ore.connect(…)                     # Make the connection

…

ore.create(newData, table = "X")   # Create new db table
X[1:5, ]                           # Simple Command

# Define function to run
theFun <- function(x, F, ...)
  step(ore.glm(F, data = x, family = "binomial"), direction = "both")

# Run the model ("DV ~ ." uses every other column as a predictor)
stepOut <- ore.tableApply(X, theFun, F = as.formula("DV ~ ."))

…

ore.disconnect()
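
For comparison, the "standard" (in-memory) counterpart of the stepwise model can be sketched with base R's glm() and step(). The built-in mtcars data set and the chosen variables are placeholders for illustration only and have nothing to do with the credit risk data.

```r
# In-memory stepwise logistic regression with base R (placeholder data).
fullModel <- glm(am ~ hp + wt, data = mtcars, family = binomial())

# trace = 0 suppresses the step-by-step log
stepOut <- step(fullModel, direction = "both", trace = 0)

formula(stepOut)   # the model retained after stepwise selection
```

The point of the in-database version is that the same modelling call runs where the data lives, so the import/export code around it can be removed.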

Review

[Diagram: Analysis Code and Unit Tests (each shown twice), Oracle R Enterprise, SQL Interface]

Findings

• A formal migration process allows for a clear and accurate transition

• SAS code conversion to R at a rate of ~200 lines per day

• Code base reduces by ~55%

Challenges

• SAS’s more relaxed scoping rules
• Differences in statistical algorithms
• The danger of migrating poor code flows
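
Two small examples of how default algorithms can make results differ, shown from the R side. SAS procedures have their own percentile definitions (the PCTLDEF= option), so which R type matches depends on the SAS settings in use and is worth checking rather than assuming. The helper roundHalfUp is a hypothetical name.

```r
x <- c(1, 2, 3, 4)

# R's default quantile definition (type 7) interpolates between order statistics.
q7 <- quantile(x, probs = 0.25, type = 7, names = FALSE)   # 1.75

# Type 2 averages at discontinuities of the empirical distribution function.
q2 <- quantile(x, probs = 0.25, type = 2, names = FALSE)   # 1.5

# Rounding: R rounds exact halves to the even digit ...
round(2.5)          # 2

# ... whereas a round-half-away-from-zero rule, as in SAS's ROUND function, gives 3.
roundHalfUp <- function(v, digits = 0) {
  sign(v) * floor(abs(v) * 10^digits + 0.5) / 10^digits
}
roundHalfUp(2.5)    # 3
```

Differences of this kind are small per value but can move counts across thresholds, which is why the translated unit tests need explicit tolerances and documented definitions.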

Code Migration isn’t just technical …

SAS Migration is more about people …

Why are these business users so defensive?  It’s just a computer language!!

Taking away SAS means taking away my ability to do analysis!!

Convincing People to move to R

• Concede some ground …
• Show quick wins
• Teach the basic data structures early
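
As a sketch of what "the basic data structures" might cover for a SAS user, the following walks from a vector to a data frame, the closest analogue of a SAS data set. All names and values are made up for illustration.

```r
# A vector: values of a single type, like one column of a data set.
age <- c(34, 51, 27)

# A list: a container whose elements may differ in type and length.
customer <- list(id = "C001", age = 34, products = c("loan", "card"))

# A data frame: a list of equal-length vectors, comparable to a SAS data set.
customers <- data.frame(id = c("C001", "C002", "C003"), age = age,
                        stringsAsFactors = FALSE)

# Data-step style derivation: the comparison is applied to the whole column at once.
customers$senior <- as.numeric(customers$age > 50)

# Row filter, comparable to a WHERE clause.
seniors <- customers[customers$senior == 1, ]
```

Once the data frame is understood as "a data set", most data-step logic maps onto column assignments and row filters like the last two statements.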

SAS to R Brain Dump …

Questions?