R Lightning Talks @ BURN (2014-01-15)


  • 7/22/2019 R Lightning Talks @ BURN (2014-01-15)

    January 15, 2014

    01 Arató Bence: Who likes R?
    02 Bodó László: R and C++
    03 Burger Csaba: Analyzing financial decisions with lme4
    04 Daróczi Gergely: Producing text reports
    05 Horváth Gergely: R in official statistics
    06 Kocsis Imre: RHadoop: MapReduce in R
    07 Köleséri László: MED Watch Dashboard
    08 Nádudvari György: R & Python collaboration
    09 Ottucsák György: Online Forecasting Application
    10 Salánki Ágnes: Anomaly detection with R
    11 Tóth Gergely: R as a GIS tool
    12

    Arató Bence: Who likes R?

    BI Consulting

    Bodó László:

    R and C++


    Burg Analytics

    Burger Csaba:

    Analyzing financial decisions with the lme4 package

    Burger Csaba, PhD

    Budapest Users of R Network Meetup, January 15, 2014

    Expected state pension as a percentage of final-year salary

    Male employee on average earnings, uninterrupted career

    [Bar chart by country; the countries shown include Germany (before reform), Slovak Republic, Denmark, Ireland, Sweden, United Kingdom, Japan, Switzerland, Canada, United States, Germany (after reform), Belgium, Korea, Norway, Czech Republic, Portugal, Finland, Italy, Austria]

    How much do you put aside for retirement?

    It depends on: how old you are; whether you are a woman or a man; whether you want to save (not employer-financ ...

    ... and these relationships are influenced by where you live

    Average savings amounts

    per year

    Explaining the savings amount with a plain regression

    $\ln(\mathrm{Savings}) = \alpha + \sum_k \beta_k X_k + \varepsilon$

    Age; woman or man

    Form of financing

    Unstandardized residuals
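As a minimal sketch of the regression on this slide, the R code below fits log savings on age, gender, and form of financing with `lm()` and extracts the unstandardized residuals. The data are simulated, and the column names (`age`, `gender`, `financing`, `log_savings`) and the coefficient values are assumptions made for illustration, not the speaker's data.

```r
# Simulated data; all names and effect sizes are hypothetical
set.seed(42)
n <- 500
d <- data.frame(
  age       = sample(18:54, n, replace = TRUE),
  gender    = factor(sample(c("female", "male"), n, replace = TRUE)),
  financing = factor(sample(c("employer", "own"), n, replace = TRUE))
)
d$log_savings <- 5 + 0.02 * d$age +
  0.10 * (d$gender == "male") +
  0.30 * (d$financing == "own") +
  rnorm(n, sd = 0.5)

# ln(Savings) explained by the covariates X_k
fit <- lm(log_savings ~ age + gender + financing, data = d)

coef(fit)                      # estimated intercept and slopes
unstd_resid <- residuals(fit)  # unstandardized residuals
```

Plotting `unstd_resid` against a grouping variable that was left out of the model, such as place of residence, is one way to see whether a mixed-effects model might be worth trying.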

    The savings amount appears to move together with age and place of residence

    [Chart: average savings per year, on a scale from 0 to 1,400, by age group: under 20, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54]

    Gender gap: the difference between men's and women's savings depends on place of residence and age

    [Chart: savings difference per year, men minus women, shown separately for East and West; values range from -42 to 155]

    The lme4 package: mixed-effects regression, where ...

    ... function, crossed effects can also be ...

    library(lme4)

    fm
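The model call on this slide is cut off in the source. As a hedged illustration of what a mixed-effects fit with lme4 can look like, the sketch below uses simulated data with hypothetical column names (`state`, `age_group`, `gender`, `log_savings`). It fits a random intercept and a random gender slope per state, crossed with a random intercept per age group. It is not the speaker's model.

```r
library(lme4)

set.seed(1)
n <- 2000
states     <- paste0("state_", 1:13)  # hypothetical grouping levels
age_groups <- paste0("age_", 1:8)

d <- data.frame(
  state     = factor(sample(states, n, replace = TRUE)),
  age_group = factor(sample(age_groups, n, replace = TRUE)),
  gender    = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Simulated "true" effects: each state has its own level and its own gender gap
state_level <- setNames(rnorm(length(states), sd = 0.3), states)
state_gap   <- setNames(rnorm(length(states), sd = 0.2), states)
age_effect  <- setNames(rnorm(length(age_groups), sd = 0.4), age_groups)

is_male <- as.numeric(d$gender == "male")
d$log_savings <- unname(
  6 + state_level[as.character(d$state)] +
    (0.1 + state_gap[as.character(d$state)]) * is_male +
    age_effect[as.character(d$age_group)] +
    rnorm(n, sd = 0.5)
)

# Random intercept and gender slope by state, crossed with age-group intercepts
fm <- lmer(log_savings ~ gender + (1 + gender | state) + (1 | age_group),
           data = d)

fixef(fm)        # overall intercept and average gender gap
ranef(fm)$state  # per-state deviations in level and in gender gap
```

The per-state `gendermale` deviations from `ranef()` are the kind of quantity a state-by-state gender-gap chart would be built from.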

    Gender and Bundesland interaction

    Fixed-effects values for Bundesland and gender

    [Chart: estimates between -0.3 and 0.4 by Bundesland (RP, SL, BW, NI, HB, NW, HH, HE, BE, SH, BY, BB, among others)]

    Estimated savings gender gap, for a 40-year-old insured person

    [Chart: values from 38 to 77 for Nordrhein-Westfalen, Niedersachsen, Sachsen-Anhalt, Hessen, Schleswig-Holstein, Hamburg, Thüringen, Bremen, Sachsen, Bayern, Baden-Württemberg, Rheinland-Pfalz, Saarland]

    Random effect for gender and their difference: ... division

    What are your plans for your retirement years?

    rapporter.net

    Daróczi Gergely:

    pander

    KSH

    Horváth Gergely:

    R in official statistics

    BURN lightning talk

    R in official statistics

    Outlook and experiences

    What does statistics do?

    But what can the R community give to official statistics?

    CRAN official statistics task view

    Complex samples: sampling; weighting, calibration; estimation (error calculation)

    Data processing

    Editing (editrules): error detection and consistency checking

    Missing data: analysis (VIM) and imputation

    Everyone has a story

    Why R?

    Excel; SAS, SPSS; database management systems

    (Why not?)

    Why would it be good?

    Data visualization

    > library(RODBC)

    Simple data access: data, metadata

    Taming SQL: using SQL programs in R

    > library(tables)

    Easy to customize

    # Sample 9: age, gender
    tab_9 <- ...

    Very usable, in some cases better than the big ones

    For any task

    +1 RStudio

    Research room

    Let researchers do research: lightly protected data, more and more accessible files

    R will be available. How?

    Data protection checks: we do not just poke around casually, there is output checking

    Reproducibility

    Let's understand R

    ... and for a little bit more, too

    BME MIT

    Kocsis Imre:

    RHadoop: MapReduce in R

    Budapest University of Technology and Economics, Department of Measurement and Information Systems

    RHadoop: MapReduce in R

    Kocsis Imre

    BURN Meetup, 2014-01-15

    A/The Big Data problem

    Distributed storage

    Computation to data

    At-rest Big Data: no updates; we analyze everything

    "Not true, but a very, very good lie!" (T. Pratchett, Night Watch)

    MapReduce

    [Diagram: key-value pairs stored in the distributed file system pass through Map, SHUFFLE, and Reduce; the shuffle groups the values by key]

    Word count
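The word-count flow in the diagram (map, shuffle, reduce) can be imitated in a few lines of base R. This is only a single-machine sketch of the idea, not RHadoop code: `map_fn`, `reduce_fn`, and the sample input are hypothetical names and data chosen for illustration.

```r
# Input: one record per line of text
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map: emit a (word, 1) pair for every word in a record
map_fn <- function(line) {
  words <- strsplit(line, " +")[[1]]
  lapply(words, function(w) list(key = w, value = 1L))
}

# Reduce: sum all values that share a key
reduce_fn <- function(key, values) list(key = key, value = sum(values))

# Map phase: flatten the per-record outputs into one list of pairs
mapped <- unlist(lapply(lines, map_fn), recursive = FALSE)

# Shuffle phase: group the values by key
keys     <- vapply(mapped, function(kv) kv$key, character(1))
vals     <- vapply(mapped, function(kv) kv$value, integer(1))
shuffled <- split(vals, keys)

# Reduce phase: one call per key
reduced <- Map(reduce_fn, names(shuffled), shuffled)
counts  <- vapply(reduced, function(kv) kv$value, integer(1))
counts
```

On a cluster the map and reduce calls run on different machines and the shuffle moves data between them. Here everything runs in one R session, so only the structure carries over.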

    What can be organized in MapReduce style

    Whatever is embarrassingly parallel

    Statistical Query Model: Locally Weighted Linear Regression, Naive Bayes, Gaussian Discriminative Analysis, k-means, Logistic Regression, Neural Network, PCA, ICA, EM, SVM, ...

    Generalized Iterative Matrix-Vector multiplication: PageRank, graph diameter, connected components, ...

    RHadoop = Hadoop + R

    [Diagram: Hadoop (HDFS, Map, SHUFFLE, Reduce) with the RHadoop functions map(k,v), reduce(k,vv), and mapreduce(...) on top]

    github.com/RevolutionAnalytics/RHadoop/

    "The most mature [...] project for R and Hadoop is RHadoop." (O'Reilly, R in a Nutshell, 2012)

    rmr: mapreduce; rhdfs: HDFS file management; rhbase, plyrmr

    rmr: mapreduce

    Local backend

    rmr.options(backend="local")

    Local file system; sequential execution

    Debug!

    Input/output is the file system here as well

    Input/output format

    text, json, csv; native (R serialization); sequence.typedbytes (Hadoop); pig.hive; hbase

    Advantages

    Map and Reduce are written in R: packages! Prototyping of MR algorithms

    + the control flow too: convenience

    A Hadoop job is a single function call! E.g., iterative MapReduce entirely in R

    Map and Reduce run ~in the calling environment

    How can I get one?

    Local backend, sandbox VMs: Cloudera, Hortonworks

    Own Hadoop cluster

    Amazon Elastic MapReduce (EMR): a rentable Hadoop cluster

    Own cloud solution

    RHadoop in the Apache Virtual Computing Lab

    Advantages and disadvantages

    Disadvantages?

    Cumbersome debugging

    +1 tuning layer; Mahout separate; many Hadoop func...

    Few examples

    Categorizing rare events with RHadoop

    Infrastructure data; Salánki Ágnes

    It works. Quite a few gotchas. But still better than in Java

    Planiméter

    Köleséri László:

    MED Watch Dashboard

    2014.01.15 Budapest Users of R Network

    Data Source (FDA)

    2012 Q4 Safety Alerts for Human Medical Products (Drugs, Biologics, Medical Devices, Special Nutritionals, and Cosmetics)

    The alerts contain actionable information that may impact both treatment and diagnostic choices for healthcare professionals and patients.

    MED Watch Dashboard Viewers: http://medwatch.co.nf

    December of 2013

    Data Clean - 1

    The raw reported data have been cleaned according to the International Conference on Harmonisation (ICH) of Technical Requirements for Registration of Pharmaceuticals for Human Use.

    The verbatim reactions/indications have been coded into the system organ class (SOC) using the Medical Dictionary for Regulatory Activities (MedDRA version 13.1) for coding of diseases/medical conditions.

    Data Clean - 2

    The raw reported datasets have been transferred into CDISC SDTM datasets, and also into CDISC ADaM datasets, which are the basis for producing statistical graphs in R with packages including: Shiny, vcd (the conditional density plot, Hofmann and Theus 2005), hexbin (basic hexagon binning functions), rworldmap (joinCountryData2Map, mapCountryData), ggplot2.

    LIE Factor - Edward Tufte

    Convey the maximum number of ideas to the audience in the shortest time, with the least "ink" and the smallest optimal representation.

    Tell the truth about the data.
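Tufte defines the lie factor as the size of the effect shown in the graphic divided by the size of the effect in the data, so a truthful graphic has a value close to 1. The helper below is a small sketch of that ratio. The function name and argument names are hypothetical, and the numbers are the ones commonly quoted from Tufte's fuel-economy example (a line growing from 0.6 to 5.3 inches for a standard rising from 18 to 27.5 miles per gallon).

```r
# Lie factor = relative change shown in the graphic / relative change in the data
lie_factor <- function(graphic_from, graphic_to, data_from, data_to) {
  effect_in_graphic <- (graphic_to - graphic_from) / graphic_from
  effect_in_data    <- (data_to - data_from) / data_from
  effect_in_graphic / effect_in_data
}

# Graphic: 0.6 in -> 5.3 in; data: 18 mpg -> 27.5 mpg
lf <- lie_factor(0.6, 5.3, 18, 27.5)
round(lf, 1)
```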

    Quanopt

    Nádudvari György:

    R & Python collaboration

    Budapest University of Technology and Economics, Department of Measurement and Information Systems

    The snake coils, or options for connecting R and Python

    Nádudvari György

    January 15, 2014

    My relationship with R and Python

    Hello World! on 2002-12-26; for me, THE language: web, system administration, convenience functions

    We first met in 2012; little experience: log analysis, visual data analysis

    Strengths and weaknesses

    General purpose; simple

    Statistical language; cumbersome for me

    Source of the plots: http://ghalib.me/blog/a-superficial-comparison-of

    Strengths and strengths

    Efficient, rapid development; community

    +

    Plotting; statistical packages; community

    = Profit

    Options

    R → Python: rPython, RSPython

    R → Python: system()

    Python → R: rpy2

    Source: rpy2 documentation

    Python → R: rpy2 levels, packages

    Python → R: rpy2 R sessions

    Python → R: rpy2, an example:

    Application in one of my own projects

    Reasons: deadline; more experience with Python; an existing R code base

    Advantages: faster development; simpler integration; minimal modification of ready-made R functions; polished charts

    Summary: let's dare to step outside the R world! Let's use the tool that is most effective for us! The possibility is there

    Thank you for your attention!

    @reedcourty

    Ottucsák György:

    Online Forecasting Application

    Using R in Production

    Case Study: Online Forecasting

    Gyuri Ottucsák

    Background

    US startup

    My first big R project

    Early 2010

    Started as a research pilot (plan: prototype in R, then Java :)

    Sequential Sales Forecasting

    US grocery stores

    Input data: previous day's and earlier days' sales data; historical & future prices (promotion calendars); other inputs

    Forecast horizon: next 7 days

    Some numbers

    2010: ~200 forecasts/day, 1 developer, laptop

    2013: ~100,000 forecasts/day, 3 developers + 1 data ops, server (8 cores)

    ~2 MB of R code

    Challenges

    Speed: a time window of a couple of hours to process the data and generate the forecast (should be submitted before 6 am Eastern time)

    More CPU cores

    Training/back testing; file mutex
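The slide lists a file mutex among the challenges of using more CPU cores but gives no implementation. One common approach, sketched below as an assumption rather than the speaker's solution, uses a lock directory: `dir.create()` reports failure when the directory already exists, so only one R process at a time can take the lock. The helper names (`acquire_lock`, `release_lock`, `with_lock`) are hypothetical.

```r
# Try to take the lock; TRUE if this process created the lock directory
acquire_lock <- function(lock_dir) {
  isTRUE(dir.create(lock_dir, showWarnings = FALSE))
}

# Give the lock back by removing the directory
release_lock <- function(lock_dir) {
  unlink(lock_dir, recursive = TRUE)
  invisible(NULL)
}

# Run `code` (a function) while holding the lock, retrying for a while if busy
with_lock <- function(lock_dir, code, retries = 50, wait = 0.1) {
  for (i in seq_len(retries)) {
    if (acquire_lock(lock_dir)) {
      on.exit(release_lock(lock_dir))  # released even if `code` fails
      return(code())
    }
    Sys.sleep(wait)
  }
  stop("could not acquire lock: ", lock_dir)
}

# Usage: serialize writes to a shared results file
lock   <- file.path(tempdir(), "forecast.lock")
result <- with_lock(lock, function() {
  # ... write to the shared file here ...
  "done"
})
```

A crashed process can leave a stale lock directory behind, so a production version would also need some timeout or cleanup policy.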

    Challenges II

    Maintenance

    Requirements: high availability; fault tolerance (e.g. tornado)

    Maintenance

    Early months: manual operation. Manually move the data and start the process; check the data quality

    Put together a doc and hand it over to ops

    Maintenance 2.0

    First idea: crontab, etc.

    Second: Hudson-CI, http://hudson-ci.org/ Overkill?

    Hudson

    Monitoring executions of externally-run jobs

    Nice web UI

    Cron + procmail and a lot more

    Tons of plugins

    Conclusion: R in production

    Pros: quick prototyping; no issue with speed

    Cons: code maintenance is hard after several thousand lines

    Thank you

    BME MIT

    Salánki Ágnes:

    Anomaly detection with R

    Budapest University of Technology and Economics, Department of Measurement and Information Systems

    Salánki Ágnes

    2014-01-15

    A motivating example (1949)

    Hadlum vs. Hadlum

    Source: http://www.siam.org/meetings/sdm10/tutorial3.pdf

    Mean: 280 days (40 weeks)

    Mrs. Hadlum: 349

    Definition of an anomaly? The generating process is different

    exception, anomaly, surprise, rare event, novelty, outlier, aberration, peculiarity, discordant observations

    Categorization

    Distance-based: enclosing hull: depth; MVE, MCD: MASS; BACON: robustX; DB: fields

    Density-based: LOF: DMwR; NNDB

    Distance?

    Enclosing hull: 1D: min, max (innermost: median); 2D: enclosing polygon; 3D:

    Enclosing hull: depth::depth

    MVE: Minimum Volume Ellipsoid, by exhaustive search

    MVE: MASS::cov.rob

    BACON: if it is related, it is related

    BACON: robustX::mvBacon

    DB (distance-based approach): being in the center is no help if we have no neighbors

    DB: fields::fields.rdist.near
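The distance-based idea can be sketched in base R without the fields package: count, for every point, how many other points lie within a given radius, and flag the points with too few neighbors. The function name `db_outliers`, its parameters, and the ring-shaped test data are assumptions made for illustration. The example mirrors the slide's remark that a point can sit exactly in the center of the data and still be an outlier.

```r
# Flag points that have fewer than `min_neighbors` other points within `radius`
db_outliers <- function(x, radius, min_neighbors = 1) {
  d <- as.matrix(dist(x))                # pairwise Euclidean distances
  neighbors <- rowSums(d <= radius) - 1  # subtract 1 to exclude the point itself
  neighbors < min_neighbors
}

# 40 points on a circle of radius 5, plus one point in the empty center
angles <- seq(0, 2 * pi, length.out = 41)[-41]
ring   <- cbind(5 * cos(angles), 5 * sin(angles))
x      <- rbind(ring, c(0, 0))

flagged <- which(db_outliers(x, radius = 1))
flagged  # index of the flagged point(s)
```

For large data sets the full distance matrix becomes expensive, which is one reason to use a function that returns only the near pairs.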

    LOF motivation: not 2 either, or 1 as well?

    LOF: Local outlier factor

    If my neighbors are lonely too, there is no big problem

    LOF: DMwR::lofactor
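To make the "lonely neighbors" intuition concrete, here is a compact base-R sketch of the local outlier factor following its usual definition (k-distance, reachability distance, local reachability density). It is an illustrative implementation under simplifying assumptions (ties in distances are ignored, no duplicate points), not the code behind `DMwR::lofactor`. The name `lof_scores` and the simulated data are hypothetical.

```r
lof_scores <- function(x, k) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  diag(d) <- Inf  # a point is not its own neighbor

  # Row i holds the indices of the k nearest neighbors of point i
  nn <- matrix(apply(d, 1, function(row) order(row)[seq_len(k)]),
               nrow = n, byrow = TRUE)

  # k-distance: distance to the k-th nearest neighbor
  kdist <- vapply(seq_len(n), function(i) d[i, nn[i, k]], numeric(1))

  # Local reachability density: inverse of the mean reachability distance
  lrd <- vapply(seq_len(n), function(i) {
    reach <- pmax(kdist[nn[i, ]], d[i, nn[i, ]])
    1 / mean(reach)
  }, numeric(1))

  # LOF: average density of the neighbors relative to the point's own density
  vapply(seq_len(n), function(i) mean(lrd[nn[i, ]]) / lrd[i], numeric(1))
}

set.seed(7)
cluster <- matrix(rnorm(100), ncol = 2)  # 50 points around the origin
x       <- rbind(cluster, c(10, 10))     # point 51 is far from everything
scores  <- lof_scores(x, k = 5)
which.max(scores)
```

Scores near 1 mean a point is about as dense as its neighbors; a point whose neighbors are much denser than itself gets a score well above 1.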

    NNDB: where are the big changes?

    What we use it for: performance man...

    Distance- or density-based? BACON: it is far enough away; NNDB: but the density is homogeneous

    rapporter.net

    Tóth Gergely:

    R as a GIS tool

    Tóth Gergely, Rapporter - Easystats/PERIPATO

    library(sp)        # Basics: handling spatial data (classes)
    library(maptools)  # Handling ESRI standards
    library(rgeos)     # Manipulating spatial objects (GEOS)
    library(raster)    # grid, raster
    library(rasterVis) # raster visualization
    # lots and lots of dependencies
    library(dismo)     # Google Maps calls (originally: gmap + requires rgdal)

    mymap <- ...

    library(RgoogleMaps)

    PlotOnStaticMap(lat = c(36.3, 35.8, 36.4), lon = c(-5.5, -5.6, -5.8), zoom = 10,
                    cex = 4, pch = 19, col = "red", FUN = points, add = FALSE)
    # Save to file:
    ujterkep <- ...

    # Sample data: bay laurel
    library(dismo)
    laurus <- ...

    On a Google map:
    locs.sp.coords <- ...

    # rworldmap
    library(rworldmap)
    data(coastsCoarse)
    data(countriesLow)
    plot(locs.sp, pch = 20, cex = 2, col = "steelblue")
    title("Bay laurel ... (in Spain)")
    plot(coastsCoarse, add = TRUE)
    plot(countriesLow, add = TRUE)

    library(googleVis)
    data(Exports) # sample database
    # 'data.frame': 10 obs. of 3 variables:
    # $ Country: Factor w/ 10 levels "Brazil","France",..: 3 1 10 2 4 6 5 7 8 9
    # $ Profit : num 3 4 5 4 3 2 1 4 5 1
    # $ Online : logi TRUE FALSE TRUE TRUE FALSE TRUE ..
    Geo <- ...

    # Hurricane Andrew:
    data(Andrew)
    M1 <- ...

    library(rworldmap)
    data("countryExData", envir = environment(), package = "rworldmap")
    sPDF <- ...

    Contour maps; spatial autocorrelation

    Francisco Rodriguez-Sanchez: Spatial data in R: Using R as a GIS, http://pakillo.github.io/R-GIS-tutorial/

    CRAN Task View: Analysis of Spatial Data, http://cran.r-project.org/web/views/Spatial.html

    Making Maps with R, http://www.molecularecologist.com/2012/09/making-maps-with-r/

    Package documentation
    Quanopt Ltd.

    From the MonetDB.R package

    Urbanics Gábor

    Lightning talk, R meetup Budapest

    Problem statement

    R is good, but everything has to be in memory, which can quickly run out

    The simplest solutions to this:

    o Use file-backed packages, e.g. the bigmemory package family or the ff package
    o Use a database as much as possible: storage + processing

    MonetDB

    Relational

    Column-oriented (vs. row-wise)

    ID  Day     Discount
    10  4/4/98  0.195
    11  9/4/98  0.065
    12  1/2/98  0.175
    13  7/2/98  0
    14  1/2/99  0.065

    OID  ID      OID  Day        OID  Discount
    100  10      100  4/4/98     100  0.195
    101  11      101  9/4/98     101  0.065
    102  12      102  1/2/98     102  0.175
    103  13      103  7/2/98     103  0
    104  14      104  1/2/99     104  0.065

    3 separate files on disk
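The decomposition shown in the tables can be mimicked in base R to see why column-wise storage helps column-wise queries. This is a conceptual sketch only: it says nothing about MonetDB's actual on-disk format, and the object names (`sales`, `columns`, `oid`) are hypothetical.

```r
# Row-oriented view: one record per row
sales <- data.frame(
  ID       = 10:14,
  Day      = c("4/4/98", "9/4/98", "1/2/98", "7/2/98", "1/2/99"),
  Discount = c(0.195, 0.065, 0.175, 0, 0.065),
  stringsAsFactors = FALSE
)

# Column-oriented view: one (OID, value) table per column
oid <- 100 + seq_len(nrow(sales)) - 1  # 100, 101, ..., 104
columns <- lapply(sales, function(v) {
  data.frame(OID = oid, value = v, stringsAsFactors = FALSE)
})

# A query on one column only needs to read that column's table
mean_discount <- mean(columns$Discount$value)

# Reassembling a full record means looking up the same OID in every column
row_102 <- lapply(columns, function(col) col$value[col$OID == 102])
```

Aggregating `Discount` touches one of the three tables, while rebuilding a complete row touches all of them. That is the trade-off between column-oriented and row-wise storage.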

    MonetDB advantages

    Efficient for queries that touch entire column(s) (better IO access)

    A column is easy to cache
    o Memory-mapped file

    Data compress better column by column
    o less storage space
    o we pay with CPU for less IO

    Can also be organized into a cluster

    MonetDB disadvantages

    Not a silver bullet: it will not be good for everything

    Recommended for read-mostly access

    Here too, the code is the most reliable documentation

    MonetDB and R integration

    Accessible through RODBC
    o We have seen this kind of thing before; it would not help much

    MonetDB.R package
    o MonetDB-specific functionality from R
    o Basic DB management (start/stop)
    o Data access (naturally)

    MonetDB.R package: DBI access

    DBI driver
    o dbConnect
    o dbSendUpdate, ...

    # using DBI with MonetDB
    > conn <- ...
    > result <- ...
    > print(result)
            L1
    1 29722533
    > str(result)
    'data.frame': 1 obs. of 1 variable:
     $ L1: num 29722533

    MonetDB.R package: monet.frame

    A data.frame-like class
    o Proxy object for a database table

    Simple operations on the database side
    o Rewrites R calls into SQL queries

    > mframe <- ...
    > str(mframe)
    MonetDB-backed data.frame surrogate
    3 columns, 12500 rows
    Query: SELECT * FROM demotable
    Columns: col1 (numeric), col2 (numeric), col3 (numeric)

    > res <- ...
    QQ: 'SELECT col3 FROM demotable'
    QQ: 'SELECT 10*(col3) FROM demotable'
    QQ: 'SELECT COS(10*(col3)) FROM demotable'
    QQ: 'SELECT SQRT(COS(10*(col3))) FROM demotable'
    QQ: 'SELECT ((col1)+(col2))/(SQRT(COS(10*(col3)))) FROM demotable'

    It rewrites the queries, but does not execute them

    > head(res)
    QQ: 'SELECT ((col1)+(col2))/(SQRT(COS(10*(col3)))) FROM demotable LIMIT 6 OFFSET 0'
    II: 'Re-Initializing column info.'
    EX: 'SELECT ((col1)+(col2))/(SQRT(COS(10*(col3)))) FROM demotable LIMIT 6 OFFSET 0'
      sql_div_sql_add_col1
    1             4.661327
    2                   NA
    3                   NA
    4             6.260829
    5             6.640108
    6           -21.836528

    The package contains SQL-side implementations of numerous R functions and operators:
    o Subset, [ operator
    o Arithmetic operations: +, -, *, /, ^, %%, %/%
    o Logical operators: &, |, !
    o min, max, mean, sd, var, median, quantile, tabulate
    o abs, sign, sqrt, floor, ceiling, trunc, round, signif
    o exp, log, expm1, log1p, cos, sin, tan, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh
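The rewriting of R calls into SQL text can be illustrated with a toy S3 class in base R. This is a sketch of the general technique (operator overloading that builds a query string lazily), not the MonetDB.R implementation: the class `lazy_col` and the helper `as_sql` are hypothetical, only binary operators and `Math` group functions are covered, and nothing is sent to a database.

```r
# A lazy column: an SQL expression plus the table it belongs to
lazy_col <- function(expr, table) {
  structure(list(expr = expr, table = table), class = "lazy_col")
}

# Binary operators (+, -, *, /, ...) build a longer expression instead of computing
Ops.lazy_col <- function(e1, e2) {
  side <- function(e) {
    if (inherits(e, "lazy_col")) paste0("(", e$expr, ")") else as.character(e)
  }
  table <- if (inherits(e1, "lazy_col")) e1$table else e2$table
  lazy_col(paste0(side(e1), .Generic, side(e2)), table)
}

# Math functions (sqrt, cos, ...) become upper-case SQL function calls
Math.lazy_col <- function(x, ...) {
  lazy_col(paste0(toupper(.Generic), "(", x$expr, ")"), x$table)
}

# Only here is the final query text produced; running it is left to a DB driver
as_sql <- function(x, limit = NULL) {
  sql <- paste0("SELECT ", x$expr, " FROM ", x$table)
  if (!is.null(limit)) sql <- paste0(sql, " LIMIT ", limit, " OFFSET 0")
  sql
}

col1 <- lazy_col("col1", "demotable")
col2 <- lazy_col("col2", "demotable")
col3 <- lazy_col("col3", "demotable")

res <- (col1 + col2) / sqrt(cos(10 * col3))  # no query is run here
as_sql(res)
as_sql(res, limit = 6)                       # what a head()-like call could send
```

Each intermediate step only extends the expression string, so the database would see a single query when the result is finally requested.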

    MonetDB.R package: disadvantages

    The idea (rewriting R into SQL) is good, but
    o the implementation does have bugs (e.g. do not use a variable named limit)
    o the set of supported operations is limited

    It will not be fully transparent

    We will not be able to use arbitrary existing functions

    > ggplot(data=mframe, aes(x=col1)) + geom_histogram()
    Error: ggplot2 doesn't know how to deal with data of class monet.frame

    Of course, we can write our own function that supports it

    MonetDB.R: similar solutions

    Oracle R Enterprise
    o Executing R functions on Oracle databases

    IBM Netezza R packages
    o nzR, nzA, nzMatrix

    Teradata
    o teradataR package
