Microsoft R Server Ing. Eduardo Castro, PhD [email protected] Data Science Specialization Microsoft Data Platform MVP PASS Regional Mentor PASS Board Advisor


Page 1: Microsoft R Server

Microsoft R Server

Ing. Eduardo Castro, PhD

[email protected]

Data Science Specialization
Microsoft Data Platform MVP
PASS Regional Mentor
PASS Board Advisor

Page 2: Microsoft R Server

Organized by

http://tinyurl.com/ComunidadWindows

Page 3: Microsoft R Server

SQL Saturday sponsors

Platinum Sponsor

Diamond Sponsor

Bronze Sponsor

Page 4: Microsoft R Server

Sources consulted

This presentation includes slides taken from the following sources:

Revolution R Enterprise. Hong Ooi.

Data Science with Azure Machine Learning, SQL Server and R. Lukawiecki

Tutorials and demos: https://msdn.microsoft.com/en-us/library/mt590536.aspx

Page 5: Microsoft R Server

Data science

The scientific method of reasoning applied to data-driven decision making

Hypotheses, experiments, facts, logical reasoning + data engineering.

Page 6: Microsoft R Server

Data wrangling (munging), retrieval + storage

Data mining & machine learning

Statistics

Big data

Data science

Page 7: Microsoft R Server

How?

Data, models, business need

Page 8: Microsoft R Server

Machine learning ≣ data science

Explores data

Finds patterns

Predicts (scoring)

Page 9: Microsoft R Server

Available tools

Page 10: Microsoft R Server

Tools

Chart from "2014 Data Science Salary Survey" (ISBN 978-1-491-91842-5) © 2015 O'Reilly Media, used with permission. Arrows mine. For more info, and great titles on data science, visit oreilly.com

Data science tool #1: SQL

Microsoft SQL Server!

R language

SAS

Sometimes

On the rise

Page 11: Microsoft R Server

Suggested methodology

SSAS Data Mining: easy, visual, intuitive, Excel, it just works

R: descriptive statistics, getting a "feel" for your data, more algorithms

Azure ML: advanced algorithms, auto-tuning, web services, cloud!

Page 12: Microsoft R Server

Other Microsoft data science tools

HDInsight
• Hadoop in the cloud
• + Storm (real-time analytics)
• + HBase (NoSQL)
• + Mahout (ML!)

Azure Stream Analytics
• Streaming data from the cloud
• Based on HDInsight / Hadoop

Also useful:
• Power BI: Power Query, Power View, and Dashboards
• Excel
• Azure Data Factory (ETL in the cloud)
• Analytics Platform System (SQL Server on steroids + Hadoop + hardware)

Page 13: Microsoft R Server

What is R?

• Interpreted language with a poor built-in IDE
• 5000+ statistical software packages
• Best IDE: RStudio, http://www.rstudio.com/
• Rattle and OnePageR make it even easier
• Open source, free, cross-platform
• R Core, the purest version: http://cran.r-project.org/
• Revolution Analytics, parallelism and performance: http://www.revolutionanalytics.com/
• Azure ML: built-in

Page 14: Microsoft R Server

Limitations of open source R

• R needs the data to be in memory
• R runs on a single thread
• R requires specialized skills to build a cluster
• R Open is supported by the community

Revolution R Enterprise offers a solution to these!

Page 15: Microsoft R Server

Revolution Analytics users

Page 16: Microsoft R Server

Revolution roadmap with Microsoft

Continued support for these platforms:
• Windows
• Linux
• Hadoop
• Teradata

Integration with new platforms:
• Azure Marketplace
• Azure ML
• Azure HDInsight
• SQL Server 2016
• Azure SQL
• Front-end tooling / BI integration

Page 17: Microsoft R Server

Revolution R vs open source R

No RAM limits
• Open source R fills up memory and fails
• RRE scales linearly even when the data exceeds the RAM limit

Faster algorithms
• RRE is optimized for large volumes of data

File name    Compressed file size (MB)   No. rows       Open source R (secs)   Revolution R (secs)
Tiny         0.3                         1,235          0.00                   0.05
V. Small     0.4                         12,353         0.21                   0.05
Small        1.3                         123,534        0.03                   0.03
Medium       10.7                        1,235,349      1.94                   0.08
Large        104.5                       12,353,496     60.69                  0.42
Big (full)   12,960.0                    123,534,969    Memory!                4.89
V. Big       25,919.7                    247,069,938    Memory!                9.49
Huge         51,840.2                    494,139,876    Memory!                18.92

Public US flight data. Linear regression on the Arrival Delay field. Run on a 4-core laptop with 16 GB RAM and a 500 GB SSD.
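One way to check the linear-scaling claim against the benchmark figures on this slide is to divide each Revolution R timing by its row count. The base R sketch below does that for the three largest files. It only reuses numbers from the table and is not part of the original benchmark; the data frame and column names are made up for this illustration.

```r
# Rows and Revolution R timings for the three largest files in the table.
# (Names such as `bench` and `rre_secs` are illustrative, not from the deck.)
bench <- data.frame(
  file     = c("Big (full)", "V. Big", "Huge"),
  rows     = c(123534969, 247069938, 494139876),
  rre_secs = c(4.89, 9.49, 18.92)
)

# Seconds per million rows; a roughly constant value suggests linear scaling.
bench$secs_per_million_rows <- bench$rre_secs / (bench$rows / 1e6)
print(bench)
```

The three ratios come out close to one another, just under 0.04 seconds per million rows, which is consistent with the "scales linearly" bullet.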

Page 18: Microsoft R Server

Revolution R vs SAS

• Tests run by independent consultants on 5 x 4-core machines running CentOS
• SAS 9.4: Base SAS, SAS/STAT, Grid Mgr
• Revolution R Enterprise ScaleR, with IBM Platform LSF, Platform MPI Release 9
• Data set: 591 columns and 5,000,000 rows

Page 19: Microsoft R Server

Built-in advanced analytics: in-database analytics

Data scientists
Interact directly with data

Built into SQL Server

Data developer / DBA
Manage data and analytics in the same engine

Example solutions
• Fraud detection
• Sales forecasting
• Inventory efficiency
• Predictive maintenance

[Diagram labels: relational data, analytic library, T-SQL interface, extensibility, R integration, new R scripts, Microsoft Azure Machine Learning Marketplace]

Page 20: Microsoft R Server

Integration with R scripts

Source: https://visualstudiomagazine.com/articles/2015/05/04/sql-server-2016-preview.aspx

Page 21: Microsoft R Server

Revolution R vs SAS

• Tests run by an independent company on 5 x 4-core machines running CentOS
• SAS 9.4: Base SAS, SAS/STAT, Grid Mgr
• Revolution R Enterprise ScaleR, IBM Platform LSF, Platform MPI Release 9
• Data set: 591 columns with 5,000,000 rows

Page 22: Microsoft R Server

Revo product suite

Revolution R Open
• Free, open source distribution of R
• Enhanced and distributed by Revolution Analytics

Revolution R Enterprise
• A secure, scalable, supported distribution of R
• Includes proprietary components created by Revolution Analytics

Page 23: Microsoft R Server

Revolution R Enterprise (RRE)

Open source distribution of R:
• Connectivity to big-data objects
• Big-data advanced analytics
• Multi-platform support
• In-Hadoop and in-Teradata predictive analytics
• Support for development and production environments
• Technical support and training services

[Diagram labels: R+CRAN, Revolution R Open, DistributedR, DeployR, DevelopR, ScaleR, ConnectR]

Page 24: Microsoft R Server

The RRE platform

[Diagram labels: RevoR, DevelopR, DeployR, R+CRAN, DistributedR, ScaleR, ConnectR]

ConnectR
• High-speed, direct connectors
Available for:
• High-performance XDF
• SAS, SPSS, delimited and fixed-format text files
• Hadoop HDFS (text & XDF)
• Teradata Database & Aster
• EDWs and ADWs
• ODBC

ScaleR
• Ready-to-use, high-performance features for big data big analytics
• Fully parallelized analytic processing
• Descriptive statistics & statistical tests
• Additional predictive analytics functions
• Tools for distributing R algorithms across nodes
• Support for wide data: thousands of variables

DistributedR
• Distributed computing framework
• Cross-platform portability
Available on:
• Windows Servers
• Red Hat and SuSE Linux Servers
• Teradata Database
• Cloudera Hadoop
• Hortonworks Hadoop
• MapR Hadoop

R+CRAN
• Open source R interpreter
• R 3.2.2
• Large number of free algorithms
• Algorithms used by RevoR
• Embeddable R scripts
• 100% compatible with R scripts, functions and packages

RevoR
• R interpreter with improved performance
• Based on open source R
• Adds a high-performance math library to speed up linear algebra functions

Page 25: Microsoft R Server

Integrating R into SQL Server 2016

exec sp_configure 'external scripts enabled', 1; reconfigure;

"C:\Program files\RRO\RRO-3.2.2-for-RRE-7.5.0\R-3.2.2\library\RevoScaleR\rxLibs\x64\registerRext.exe" /install

Page 26: Microsoft R Server

Integrating R into SQL Server 2016

USE <target database name>
GO
CREATE LOGIN [<login name>] WITH PASSWORD = '<password>', CHECK_EXPIRATION = OFF, CHECK_POLICY = OFF;
CREATE USER [<user name>] FOR LOGIN [<login name>] WITH DEFAULT_SCHEMA = [db_datareader];
ALTER ROLE [db_datareader] ADD MEMBER [<user name>];

Page 27: Microsoft R Server

Integrating R into SQL Server 2016

USE [master]
GO
CREATE USER [<user name>] FOR LOGIN [<login name>] WITH DEFAULT_SCHEMA = [db_rrerole];
ALTER ROLE [db_rrerole] ADD MEMBER [<user name>];

Page 28: Microsoft R Server

Demo

Installing R Server and integrating it with SQL Server 2016

Page 29: Microsoft R Server

Possible client tools

Page 30: Microsoft R Server

RRE: scaling to large data volumes

Data "chunking" relieves memory limits

Volume is limited only by storage capacity
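The chunking idea can be illustrated in base R without RevoScaleR. The sketch below is an illustration under stated assumptions, not RRE's actual implementation: it fits a linear regression on the built-in mtcars data (whose column names match the ones shown on this slide) by processing a few rows at a time and keeping only the running cross-products in memory. The chunk size and the model mpg ~ wt + hp are arbitrary choices made for the example.

```r
# Fit mpg ~ wt + hp on mtcars one chunk at a time, keeping only the
# running cross-products X'X and X'y in memory (never the full data).
chunk_size <- 8                      # arbitrary block size for the example
n <- nrow(mtcars)
starts <- seq(1, n, by = chunk_size)

xtx <- matrix(0, 3, 3)               # accumulates X'X
xty <- matrix(0, 3, 1)               # accumulates X'y

for (s in starts) {
  rows  <- s:min(s + chunk_size - 1, n)
  chunk <- mtcars[rows, ]            # stands in for one block read from disk
  X <- cbind(1, chunk$wt, chunk$hp)  # intercept column plus two predictors
  y <- chunk$mpg
  xtx <- xtx + crossprod(X)
  xty <- xty + crossprod(X, y)
}

# Solve the normal equations once all chunks have been seen.
beta_chunked <- drop(solve(xtx, xty))
names(beta_chunked) <- c("(Intercept)", "wt", "hp")

# Compare with the ordinary in-memory fit.
beta_full <- coef(lm(mpg ~ wt + hp, data = mtcars))
print(all.equal(beta_chunked, beta_full, tolerance = 1e-6))
```

Because each block contributes only a small 3 x 3 matrix and a 3 x 1 vector, the memory needed does not grow with the number of rows, which is the point the slide makes about volume being limited by storage rather than RAM.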

[Figure: a data set with columns mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, split into blocks of rows]

Page 31: Microsoft R Server

RRE: scaling to large data volumes

[Figure: blocks of rows with columns mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb]

In a (local) XDF file

Page 32: Microsoft R Server

RRE: scaling to large data volumes

[Figure: blocks of rows with columns mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, stored in Teradata]

[Diagram labels: Desktops & Servers, Revolution R Enterprise, ODBC, Teradata Database, Parse Engine, External Stored Procedure, Table Operator (x4), VAMPs, Data Segments, Database Nodes, Hybrid Storage]

Page 33: Microsoft R Server

RRE: scaling to large data volumes

[Figure: blocks of rows with columns mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, distributed across a Hadoop cluster]

[Diagram labels: Hadoop, Master node (Job tracker), three Slave nodes (each with a Task tracker)]

Page 34: Microsoft R Server

RRE: distributed computing

• No data movement
• Setting the compute context determines where the processing takes place

[Diagram labels: Desktops & Servers, Revolution R Enterprise, ODBC, Teradata Database, Parse Engine, External Stored Procedure, Table Operator (x4), VAMPs, Data Segments, Database Nodes, Hybrid Storage]

Page 35: Microsoft R Server

Local compute context

### LOCAL COMPUTE CONTEXT ###
rxSetComputeContext("local")

### CREATE DIRECTORY AND FILE OBJECTS ###
AirlineDatabase <- file.path("datasets", "AirlineDemoSmall")
AirlineDataSet <- RxXdfData(file.path(AirlineDatabase, "AirlineDemoSmall.xdf"))

### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ArrDelay+DayOfWeek, data = AirlineDataSet, reportProgress = 1)

### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)

### Linear Model and plot
arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(arrLateLinMod$coefficients)
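For readers without RevoScaleR installed, the same three steps can be approximated in base R. The sketch below is only a rough analogue: it uses a small synthetic data frame (the values are random and purely illustrative, not the AirlineDemoSmall data), with summary(), tapply() and lm() standing in for rxSummary, rxCrossTabs and rxLinMod.

```r
# Synthetic stand-in for the airline data (random values, illustration only).
set.seed(1)
days <- c("Monday", "Tuesday", "Wednesday", "Thursday",
          "Friday", "Saturday", "Sunday")
airline <- data.frame(
  ArrDelay  = round(rnorm(700, mean = 10, sd = 30)),
  DayOfWeek = factor(rep(days, each = 100), levels = days)
)

# Rough analogue of rxSummary(~ArrDelay+DayOfWeek, ...)
print(summary(airline))

# Rough analogue of rxCrossTabs(..., means = TRUE): mean delay per day
day_means <- tapply(airline$ArrDelay, airline$DayOfWeek, mean)
print(day_means)

# Same formula as the slide: "+ 0" drops the intercept, so each
# coefficient is the mean arrival delay for that day of the week.
fit <- lm(ArrDelay ~ DayOfWeek + 0, data = airline)
print(all.equal(unname(coef(fit)), as.vector(day_means)))
```

The last line shows why the slide's formula ends in "+ 0": without an intercept and with a single factor as predictor, the fitted coefficients are simply the per-day means, which makes the coefficient plot easy to read.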

Page 36: Microsoft R Server

Remote Compute: Teradata

### SETUP TERADATA ENVIRONMENT VARIABLES ###
dbConnStr <- "Driver=Teradata; Server=dbHostName; Database=RevoDb; Uid=xxxx; pwd=xxxx"
myTeradataCC <- RxInTeradata(connectionString = dbConnStr, shareDir = "/tmp",
    remoteShareDir = "/tmp/revoJobs", revoPath = "/usr/lib64/Revo-7.0/R-3.0.2/lib64/R")

### TERADATA COMPUTE CONTEXT ###
rxSetComputeContext(myTeradataCC)

### CREATE TERADATA DATA SOURCE ###
AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;"
AirlineDataSet <- RxTeradata(connectionString = dbConnStr, sqlQuery = AirlineDemoQuery)

### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ArrDelay+DayOfWeek, data = AirlineDataSet, reportProgress = 1)

### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)

### Linear Model and plot
arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(arrLateLinMod$coefficients)

Page 37: Microsoft R Server

Remote compute: Hadoop

### SETUP HADOOP ENVIRONMENT VARIABLES ###
myNameNode <- "master"
myUser <- "root"
myPort <- 8020
myHadoopCluster <- RxHadoopMR(sshUsername = myUser, sshHostname = myNameNode, port = myPort)

### HADOOP COMPUTE CONTEXT USING HDFS ###
rxSetComputeContext(myHadoopCluster)

### CREATE HDFS, DIRECTORY AND FILE OBJECTS ###
hdfsFS <- RxHdfsFileSystem(hostName = myNameNode, port = myPort)
AirlineDatabase <- file.path("datasets", "AirlineDemoSmall")
AirlineDataSet <- RxXdfData(file.path(AirlineDatabase, "AirlineDemoSmall.xdf"), fileSystem = hdfsFS)

### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ArrDelay+DayOfWeek, data = AirlineDataSet, reportProgress = 1)

### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)

### Linear Model and plot
arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(arrLateLinMod$coefficients)

Page 38: Microsoft R Server

Remote context: SQL Server *

### SETUP SQL SERVER ENVIRONMENT VARIABLES ###
dbConnStr <- "Driver=SQL Server; Server=dbHostName; Database=RevoDb; Uid=xxxx; pwd=xxxx"
mySqlServerCC <- RxInSqlServer(connectionString = dbConnStr, consoleOutput = TRUE)

### SQL SERVER COMPUTE CONTEXT ###
rxSetComputeContext(mySqlServerCC)

### CREATE SQL SERVER DATA SOURCE ###
AirlineDemoQuery <- "SELECT * FROM AirlineDemoSmall;"
AirlineDataSet <- RxSqlServer(connectionString = dbConnStr, sqlQuery = AirlineDemoQuery)

### ANALYTICAL PROCESSING ###
### Statistical Summary of the data
rxSummary(~ArrDelay+DayOfWeek, data = AirlineDataSet, reportProgress = 1)

### CrossTab the data
rxCrossTabs(ArrDelay ~ DayOfWeek, data = AirlineDataSet, means = TRUE)

### Linear Model and plot
arrLateLinMod <- rxLinMod(ArrDelay ~ DayOfWeek + 0, data = AirlineDataSet)
plot(arrLateLinMod$coefficients)

* In SQL Server 2016

Page 39: Microsoft R Server

ScaleR functions and algorithms

Data step
• Data import: delimited, fixed, SAS, SPSS, ODBC
• Variable creation & transformation
• Recode variables
• Factor variables
• Missing value handling
• Sort, merge, split
• Aggregate by category (means, sums)

Descriptive statistics
• Min / Max, Mean, Median (approx.)
• Quantiles (approx.)
• Standard deviation
• Variance
• Correlation
• Covariance
• Sum of squares (cross product matrix for set variables)
• Pairwise cross tabs
• Risk ratio & odds ratio
• Crosstabulation of data (standard tables & long form)
• Marginal summaries of crosstabulations

Statistical tests
• Chi square test
• Kendall rank correlation
• Fisher's exact test
• Student's t-test

Sampling
• Subsample (observations & variables)
• Random sampling

Predictive models
• Sum of squares (cross product matrix for set variables)
• Multiple linear regression
• Generalized linear models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
• Covariance & correlation matrices
• Logistic regression
• Classification & regression trees
• Predictions/scoring for models
• Residuals for all models

Variable selection
• Stepwise regression

Simulation
• Simulation (e.g. Monte Carlo)
• Parallel random number generation

Cluster analysis
• K-means clustering

Classification
• Decision forests (random forests)
• Decision trees
• Gradient boosted decision trees
• Naïve Bayes

Combination
• PEMA API
• rxDataStep
• rxExec

Page 40: Microsoft R Server

DeployR

R-as-a-service framework for BI / web applications

Page 41: Microsoft R Server

DeployR architecture