1 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.

In-Database Analytics: Statistics and Advanced Analytics with R - Oracle R Enterprise

Charlie Berger
Sr. Director Product Management, Data Mining and Advanced Analytics
Oracle Corporation
[email protected]
www.twitter.com/CharlieDataMine

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remain at the sole discretion of Oracle.

Agenda

• Big Data & Big Data Analytics
• R Open Source Project
  – Challenges limiting enterprise adoption of R
• Oracle R Enterprise
  – Features, benefits and advantages
• Big Data Appliance
  – Open source distribution of R
• Oracle R Enterprise Beta Program
• Q & A

What Makes it Big Data?

VOLUME, VELOCITY, VARIETY, VALUE

Example sources shown on slide: SOCIAL, BLOG, SMART METER

Make Better Decisions Using Big Data

Big Data in Action

Cycle shown on slide: ACQUIRE, ORGANIZE, ANALYZE, DECIDE

Announcing Oracle R Enterprise

R Statistical Programming Language

• Open source language and environment
• Used for statistical computing and graphics
• Strength in easily producing publication-quality plots
• Highly extensible with open source community R packages

Open Source

Driven in part by the rise of big data, business intelligence (BI) is a rapidly growing market that has seen increasingly strong enterprise adoption rates. Concurrent with the growth of BI has been increased investment in predictive analytics; R is not only the tool of choice but the ideal environment for advanced analysis. R is designed to be extensible and to integrate within BI suites to incorporate advanced analytics into reports.

http://www.gartner.com/technology/core/products/research/topics/businessIntelligence.jsp

“Hype Cycle for Analytic Applications, 2011”, 30 August 2011

The number of web site links that point to the main web site of each software package on March 19, 2011. http://www.r4stats.com/popularity

Growing Popularity

• R’s rapid adoption over several years has earned its reputation as a new statistical software standard

– Rival to SAS and SPSS

“While it is difficult to calculate exactly how many people use R, those most familiar with the software estimate that close to 250,000 people work with it regularly.”
“Data Analysts Captivated by R’s Power”, New York Times, Jan 6, 2009

http://www.r-project.org/

Typical R Approach

Statistical and advanced analyses are run and stored on the user’s laptop

What Are R’s Challenges?

1. R is memory constrained
   – R processing is single threaded and does not exploit available compute infrastructure
   – R lacks industrial strength for enterprise use cases
2. R has lacked mindshare in the Enterprise market
   – R is still met with caution by the long established SAS and IBM/SPSS statistical community
     • However, major university (e.g. Yale) Statistics courses are now taught in R
     • The FDA has recently shown indications for approval of new drugs for which the submission’s data analysis was performed using R

Oracle R Enterprise Approach

• Data and statistical analysis are stored and run in-database
• Same R user experience & same R clients
• Embed in operational systems
• Complements Oracle Data Mining

What is Oracle R Enterprise?

• Oracle R Enterprise brings R’s statistical functionality closer to the Oracle Database
1. Eliminates R’s memory constraint by enabling R to work directly & transparently on database objects
   – Allows R to run on very large data sets
2. Architected for Enterprise production infrastructure
   – Automatically exploits database parallelism without requiring parallel R programming
   – Build and immediately deploy
3. Oracle R leverages the latest R algorithms and packages
   – R is an embedded component of the DBMS server

Architecture and Performance

• Transparently function-ships R constructs to database via R SQL translation
  – Data structures
  – Functions
    • Data manipulation functions (select, project, join)
    • Basic statistical functions (avg, sum, summary)
    • Advanced statistical functions (gamma, beta)
• Performs data-heavy computations in database
  – R for summary analysis and graphics
• Transparent implementation enables using wide range of R “packages” from open source community

(Chart on slide: timings in seconds)

Oracle R Enterprise Architecture

Diagram stages: Development, Production, Consumption

Diagram components:
• R workspace console
• Oracle statistics engine
• OBIEE, Web Services
• Function push-down: data transformation & statistics

Use Case: Using ONTIME airline data, of the 36 busiest airports, run a box-plot analysis of the best/worst airports for arrival delay. (Box plot on slide runs from Best to Worst.)
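
A minimal local sketch of the kind of box-plot ranking the use case describes. It runs on simulated data in plain R, not on the ONTIME table or through Oracle R Enterprise. The airport codes, the delay values, and the column name DEST are assumptions made for illustration; ARRDELAY is the column name used elsewhere in this deck.

```r
set.seed(42)
# Simulated stand-in for the ONTIME data; airport codes and delays are made up.
airports <- c("AAA", "BBB", "CCC", "DDD")
flights <- data.frame(
  DEST = rep(airports, each = 200),
  ARRDELAY = c(rnorm(200, 2, 10), rnorm(200, 15, 20),
               rnorm(200, 8, 12), rnorm(200, 25, 30))
)

# Median arrival delay per airport, used to order the boxes.
med <- tapply(flights$ARRDELAY, flights$DEST, median)

# Reorder the factor so boxes run from best (lowest median) to worst.
flights$DEST <- factor(flights$DEST, levels = names(sort(med)))

boxplot(ARRDELAY ~ DEST, data = flights,
        xlab = "Airport (best to worst)",
        ylab = "Arrival delay (minutes)")
```

Ordering the factor levels by median before plotting is one simple way to get a best-to-worst layout; other summary statistics could be used for the ranking.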

Oracle R Enterprise for Statistical Development

• Oracle R OCI package makes all Oracle tables/views visible to R
• R commands, graphics, and packages are identical for the user in both R and Oracle R Enterprise
• Data can be in R data frames or Oracle Tables/Views

Benefits

• Oracle R Enterprise enables you to:
  – Run R to interactively explore and analyze data inside the Database
  – Develop R scripts on big data stored as tables and views inside the Oracle database and then deploy them within the enterprise, without requiring code changes
  – Leverage R’s familiar console and open source R GUIs and IDEs to explore and analyze data either in the database or stored as R data frames
  – Meet the statistical and advanced analytical requirements of the enterprise
  – Exploit an information technology platform designed to support analytically-driven applications
  – Leverage 30+ years of experience of ever advancing Oracle Database technology

"R for the Enterprise"

How Oracle R Enterprise Works

• Oracle R Enterprise tightly integrates R with the database and fully manages the data operated upon by R code.
  – The database is always involved in serving up data to the R code.
  – Oracle R Enterprise runs in the Oracle Database.
• Oracle R Enterprise eliminates data movement and duplication, maintains security and minimizes latency time from raw data to new information.
• Three ORE Computation Engines
  – Oracle R Enterprise provides three different interfaces between the open-source R engine and the Oracle database:
    1. Oracle R Enterprise (ORE) Transparency Layer
    2. In-Database Statistics Engine
    3. Embedded R

How Oracle R Enterprise Works

1. Oracle R Enterprise (ORE) Transparency Layer
   – Traps all R commands and scripts prior to execution and looks for opportunities to function ship them to the database for native execution
   – ORE transparency layer converts R commands/scripts into SQL equivalents and thereby leverages the database as a compute engine.
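
The idea of converting an R call into a SQL equivalent can be illustrated with a toy translator. This is only a sketch of the concept and is not how Oracle R Enterprise is implemented; the function name r_to_sql_aggregate and its lookup table are hypothetical. The table and column names ONTIME_S and ARRDELAY are taken from the ARIMA script later in this deck.

```r
# Toy illustration only: map a few R aggregate names to SQL aggregates
# and build the query text that a database could execute natively.
r_to_sql_aggregate <- function(fn_name, column, table) {
  # Hypothetical lookup of R function names with a SQL counterpart.
  sql_names <- c(mean = "AVG", sum = "SUM", min = "MIN", max = "MAX")
  if (!fn_name %in% names(sql_names)) {
    # No counterpart registered: such a call could not be shipped this way.
    stop("no SQL counterpart registered for: ", fn_name)
  }
  sprintf("SELECT %s(%s) FROM %s", sql_names[[fn_name]], column, table)
}

query <- r_to_sql_aggregate("mean", "ARRDELAY", "ONTIME_S")
print(query)
```

Calls without a registered counterpart raise an error here; in the deck's description, such functions are the ones handled by the embedded R engine instead.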

How Oracle R Enterprise Works

2. In-Database Statistics Engine
   – Significantly extends the Oracle Database’s library of statistical functions and advanced analytical computations
   – Provides support for the complete R language and statistical functions found in Base R and selected R packages based on customer usage
     • Open source packages written entirely in the R language, using only functions for which we have implemented SQL counterparts, can be translated to execute in database.
   – Without anything visibly different to the R users, their R commands and scripts are oftentimes accelerated by a factor of 10-100x
   – Base SAS and most common SAS PROC "knock-offs"

Functions covered: All Base R functions, R Multiple Regression, …. (driven by customers)

Base SAS PROCS
• PROC FREQ
• PROC MEANS
• PROC RANK
• PROC STANDARD
• PROC SUMMARY
• PROC UNIVARIATE
• PROC APPEND
• PROC SORT
• PROC TRANSPOSE
• PROC SQL
• PROC CORR

How Oracle R Enterprise Works

3. Embedded R
   – For R functions not able to be mapped to native in-database functions, Oracle R Enterprise makes “extproc” remote procedure calls to multiple R engines running on multiple database servers/nodes
   – This Oracle R Enterprise embedded layer uses the database as a data provider providing data level parallelism to R code
   – The interfaces, called embedded-layer RQ functions, pass streams of data to one or more instances of R for (parallel) row by row processing (scoring), groups of rows processing (building a model one per group) and table of rows processing (building a model
   – These functions are used for “operationalizing” R code to run in production
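
The three processing patterns named above (row by row, groups of rows, whole table) can be sketched locally in base R. This is an analogue for illustration only: it does not use the embedded-layer RQ functions, and the built-in mtcars data set, the model formula, and the chunk count are assumptions chosen to keep the example self-contained.

```r
# Table-of-rows processing: one model built from all rows.
model_all <- lm(mpg ~ wt, data = mtcars)

# Groups-of-rows processing: one model per group (here, per cylinder count).
models_by_cyl <- lapply(split(mtcars, mtcars$cyl),
                        function(d) lm(mpg ~ wt, data = d))

# Row processing (scoring): rows are scored in independent chunks,
# so each chunk could in principle be handed to a separate R engine.
chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))
scores <- unlist(lapply(chunks,
                        function(d) predict(model_all, newdata = d)))
```

Because each chunk and each group is processed independently, the same functions could be applied in parallel, which is the data-level parallelism the slide attributes to the embedded layer.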

Oracle R Enterprise Example Illustrates Use of all 3 Engines from within 1 R Script

Oracle—Hardware and Software Engineered to Work Together

• Oracle is the world's most complete, open, and integrated business software and hardware systems company
• Data Warehousing, VLDB and ILM
• Oracle R Enterprise
  • R for the Enterprise
• Oracle Data Mining Option
  • 12- in-DB data mining algorithms

Oracle has taught the Database how to do Advanced Math/Statistics/Data Mining, and more…

Two Separate Worlds…

• DBA
  – Security
  – Control
  – Scalability
  – Performance
• Line of Business
  – Ad hoc
  – Exploratory data analysis
  – Interactive graphics
  – Problem-solving

Oracle R Enterprise Brings Them Together

• DBA
  – Security
  – Control
  – Scalability
  – Performance

+

• Line of Business
  – Ad hoc
  – Exploratory data analysis
  – Interactive graphics
  – Problem-solving

Better Business Intelligence

Enrich BI Dashboards with Statistics, Data Mining and Adv. Analytics
• Ad hoc
• Exploratory data analysis
• Interactive graphics
• Problem-solving

Oracle R Enterprise's and ODM's results become a data feed for OBIEE

ORACLE R ENTERPRISE FUNDAMENTALS

Starting up Oracle R Enterprise

When you start up Oracle R Enterprise, it loads several packages and automatically connects to an Oracle database.

R Data and Summary Statistics: cars

R> head(cars)
• Prints top rows of data set

R> summary(cars)
• Provides summary statistics for data set

R Arithmetic & Basics: GRADE "Built-in" Dataset

R> attach(GRADE)
R> head(GRADE)
...
R> max(FINALGRADE)
[1] 97
R> min(FINALGRADE)
[1] 71
R> max(FINALGRADE) - min(FINALGRADE)
[1] 26
R> mean(FINALGRADE)
[1] 83
R> sd(FINALGRADE)
[1] 9.237604

R Histogram: cars

R> hist(cars$acceleration)

Annotations on plot: "Fast cars!", "Slow cars"

R Graphics

R> plot(cars$weight, cars$mpg)

Annotation on plot: "Heavy cars"

R Graphics

R> abline(coef(lm(acceleration ~ weight, cars)), col = "red")

Faster cars are heavier?

R Graphics

R> boxplot(split(cars$weight, cars$cylinders), col = "blue")

Heavier cars have 8 cylinders

R Graphics

R> boxplot(split(cars$mpg, cars$model.year), col = "green")

MPG increases over time…

R Graphics

R> boxplot(split(cars$acceleration, cars$model.year), col = "red")

If you want a FAST car, buy an 8 cylinder '70 model car

R Graphics

R> plot(cars)

• Supports Exploratory Data Analysis for Oracle data

R Graphics

R> plot(data.frame(cars$acceleration, cars$mpg, cars$weight, cars$cylinders), col = "purple")

• Supports Exploratory Data Analysis for Oracle data

R Graphics Using Add-in Package

R> install.packages("scatterplot3d")

--- Please select a CRAN mirror for use in this session ---

trying URL 'http://cran.case.edu/bin/windows/contrib/2.12/scatterplot3d_0.3-33.zip'

Content type 'application/zip' length 605876 bytes (591 Kb)

opened URL

downloaded 591 Kb

package 'scatterplot3d' successfully unpacked and MD5 sums checked

The downloaded packages are in

C:\Documents and Settings\chberger\Local Settings\Temp\RtmpAEe7NC\downloaded_packages

R> library(scatterplot3d)

Warning message:

package 'scatterplot3d' was built under R version 2.12.2

R> scatterplot3d(cars)

Linear Models

## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2,10,20, labels=c("Ctl","Trt"))
weight <- c(ctl, trt)
anova(lm.D9 <- lm(weight ~ group))
summary(lm.D90 <- lm(weight ~ group - 1)) # omitting intercept
opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))
plot(lm.D9, las = 1) # Residuals, Fitted, ...
par(opar)
## model frame :
stopifnot(identical(lm(weight ~ group, method = "model.frame"), model.frame(lm.D9)))
### less simple examples in "See Also" above

Example from R Help(lm) http://127.0.0.1:19161/library/stats/html/lm.html

Oracle R Enterprise ARIMA Forecasting Script

year200801 <- ONTIME_S[(ONTIME_S$YEAR==2008) & (ONTIME_S$MONTH==1),]
y <- ore.pull(year200801)
gc()
delays <- tapply(y$ARRDELAY, y$DAYOFMONTH, mean, na.rm=TRUE)
delays <- ts(delays, start=1, end=31, frequency=1)
# Create a Kalman filter with the first 5 delays and predict the rest
preds <- c()
ses <- c()
# 1 step predictions
for (i in 5:length(delays)) {
  fit <- arima(delays[1:i], c(1,2,1))
  # predict 1 step into the future.
  pred <- predict(fit)
  preds <- c(preds, pred$pred)
  ses <- c(ses, pred$se)
}
plot(5:length(delays), preds, type='l', col='green',
     ylim=range(c(preds+2*ses, preds-2*ses)),
     xlab="Day of month",
     ylab="Predicted average delay (in minutes)",
     main="Average delays by day for January 2008")
lines(5:length(delays), preds+2*ses, col='red')
lines(5:length(delays), preds-2*ses, col='red')
points(5:length(delays), as.vector(delays[5:length(delays)]))
legend(23, -8, c("Delay", "Predicted delay", "2 se confidence"),
       col=c(1, 3, 8), lty=c(0, 1, 1), pch=c(1, -1, -1), merge=TRUE)

Statistical Quality Control R Applications

• The qcc package for R:
  – Plots Shewhart quality control charts for continuous, attribute and count data;
  – Plots Cusum and EWMA charts for continuous data;
  – Performs process capability analyses;
  – Creates Pareto charts and cause-and-effect diagrams

http://www.stat.unipg.it/luca/Rnews_2004-1-pag11-17.pdf
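
To make the Shewhart chart idea concrete, here is a minimal base R sketch of an x-bar chart. It deliberately does not call the qcc package, so that it runs without extra installs; the process data are simulated, and the subgroup size of 5 is an assumption. The constant A2 = 0.577 is the standard tabulated control-chart factor for subgroups of size 5.

```r
set.seed(1)
# 20 subgroups of 5 simulated measurements each (made-up process data).
n <- 5
samples <- matrix(rnorm(20 * n, mean = 10, sd = 0.5), ncol = n)

xbar <- rowMeans(samples)                                    # subgroup means
rbar <- mean(apply(samples, 1, function(r) diff(range(r))))  # mean subgroup range
A2 <- 0.577                                                  # tabulated factor for n = 5

center <- mean(xbar)            # center line: grand mean
ucl <- center + A2 * rbar       # upper control limit
lcl <- center - A2 * rbar       # lower control limit

plot(xbar, type = "b", ylim = range(c(xbar, ucl, lcl)),
     xlab = "Subgroup", ylab = "Subgroup mean")
abline(h = c(lcl, center, ucl), lty = c(2, 1, 2))

# Subgroups whose mean falls outside the control limits.
out_of_control <- which(xbar > ucl | xbar < lcl)
```

The limits are estimated from the mean subgroup range, which is the conventional approach for small subgroups; a package such as qcc adds further rules and chart types on top of this.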

Statistical Quality Control R Applications

• Process capability studies to characterize and understand the behavior of a "process"
• Pareto (80/20 rule) analysis to understand which few factors contribute most

http://www.stat.unipg.it/luca/Rnews_2004-1-pag11-17.pdf
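
A Pareto analysis boils down to sorting categories by frequency and accumulating their share of the total. The sketch below does this in base R on made-up defect counts; the category names and numbers are illustrative assumptions, and the qcc package is not used.

```r
# Made-up defect counts by cause.
defects <- c(scratch = 42, dent = 27, misalign = 12, stain = 9, other = 10)

# Sort causes from most to least frequent and accumulate their share.
sorted_counts <- sort(defects, decreasing = TRUE)
cum_pct <- cumsum(sorted_counts) / sum(sorted_counts) * 100

barplot(sorted_counts, ylab = "Count", las = 2)

# Smallest set of leading causes that accounts for at least 80% of defects.
vital_few <- names(sorted_counts)[seq_len(which(cum_pct >= 80)[1])]
print(vital_few)
```

With these counts the first three causes (scratch, dent, misalign) cover 81% of all defects, which is the "few factors contribute most" pattern the slide refers to.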

You Can Think of It Like This…

Traditional SQL
• “Human-driven” queries
• Domain expertise
• Lacks necessary statistical/adv. analytical functionality
• SQL Queries
  • SELECT
  • DISTINCT
  • AGGREGATE
  • WHERE
  • AND OR
  • GROUP BY
  • ORDER BY
  • RANK

+

In-Database Stats/Adv. Analytics
• Wide range of Oracle R Enterprise statistical functions
• Ability to develop and deploy R scripts within the enterprise
• ORE Statistics/Adv. Analytics
  • SUMMARY
  • CORR
  • Regression
  • Shewhart
  • ARIMA (Time Series)
  • R packages

Summary

"R for the Enterprise"
• Enables DBAs and LOB users to readily integrate R models into production
• Enables R models to be integrated into BI dashboards
• Enables R programmers/statisticians to work against database data without knowing SQL
• Reduces the number of LOB help requests for SQL queries to obtain data
• Removes the need to manage data outside Oracle Database

Save money on SA$!
• Use Oracle R Enterprise instead of Base SAS and reduce SA$ Annual Usage Fees
• Private analytical sandboxes for LOB/data analyst to work directly on database data in-database

Oracle in-Database Analytics for Big Data
• Over 100 built-in statistical functions that are compatible with Base SAS
• High performance in-database linear algebra
• Data parallelism for open source R packages executing in-database
• Develop your own algorithms for execution closer to the data, and leverage database parallelism

Announcing Open source distribution of R

Big Data Appliance & Software

Integrated Big Data Platform

• Oracle Distribution of Apache Hadoop
• Oracle NoSQL Database EE
• Oracle Data Integrator Application Adapter for Hadoop
• Oracle Loader for Hadoop
• Oracle Hadoop Tools
• Open source distribution of R
• Oracle Linux and JVM

Big Data Appliance + R For Compute Intensive Operations Using R

Massively parallel computations

logreg <- function(input, iterations, dims, alpha){
  plane = rep(0, dims)
  g = function(z) 1/(1 + exp(-z))
  for (i in 1:iterations) {
    z = hdfs.get(hadoop.run(
      input,
      export = c(plane, g),
      map = logisticRegressionMapper,
      reduce = logisticRegressionReducer))
    gradient = c(z$val[1], z$val[2])
    plane = plane + alpha * gradient
  }
  plane
}

x = hdfs.push(WEBSESSIONS)
logreg(x, 10, 2, 0.05)

Diagram components: Function push-down (data transformation & statistics); R workspace console; Oracle statistics engine; OBIEE, Web Services
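
The slide's script leaves logisticRegressionMapper and logisticRegressionReducer undefined. The sketch below imitates the same iterate, map, reduce, update loop locally in base R so the structure can be run without Hadoop. It is an assumption-laden stand-in, not the Big Data Appliance API: the data are simulated, the helper names chunk_gradient and logreg_local are hypothetical, and the gradient is divided by the row count (which the slide's code does not show) to keep the step size stable.

```r
set.seed(7)
# Simulated data: two features and a binary label (all made up).
n <- 400
X <- cbind(rnorm(n), rnorm(n))
y <- as.numeric(X[, 1] - X[, 2] + rnorm(n, sd = 0.5) > 0)

g <- function(z) 1 / (1 + exp(-z))   # logistic function, as on the slide

# "Map" step: partial log-likelihood gradient computed on one chunk of rows.
chunk_gradient <- function(rows, plane) {
  Xc <- X[rows, , drop = FALSE]
  as.vector(t(Xc) %*% (y[rows] - g(Xc %*% plane)))
}

logreg_local <- function(iterations, dims, alpha, n_chunks = 4) {
  plane <- rep(0, dims)
  # Partition row indices into chunks, standing in for distributed blocks.
  chunks <- split(seq_len(n), rep(seq_len(n_chunks), length.out = n))
  for (i in seq_len(iterations)) {
    partials <- lapply(chunks, chunk_gradient, plane = plane)  # map
    gradient <- Reduce(`+`, partials) / n                      # reduce
    plane <- plane + alpha * gradient                          # ascent step
  }
  plane
}

plane <- logreg_local(iterations = 200, dims = 2, alpha = 0.5)
accuracy <- mean((g(X %*% plane) > 0.5) == y)
```

Because the gradient is a sum over rows, the per-chunk partial gradients can be computed independently and then added, which is what makes the computation a natural fit for a map and reduce pair.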

Summary

• Oracle R Enterprise
  – Sign up for Oracle R Enterprise Beta Program
• Big Data Appliance
  – Open source distribution of R, coming soon!

MARK YOUR CALENDARS!

BIWA Summit @ COLLABORATE 12
April 22-26, 2012
Mandalay Bay Convention Center, Las Vegas, Nevada
http://events.ioug.org/p/cm/ld/fid=15

Q&A
