7/22/2019 R Lightning Talks @ BURN (2014-01-15)
15 January 2014
01 Arató Bence: Who loves R?
02 Bodó László: R and C++
03 Burger Csaba: Examining financial decisions with lme4
04 Daróczi Gergely: Generating text reports
05 Horváth Gergely: R in official statistics
06 Kocsis Imre: RHadoop: MapReduce in R
07 Köleséri László: MED Watch Dashboard
08 Nádudvari György: R & Python collaboration
09 Ottucsák György: Online Forecasting Application
10 Salánki Ágnes: Anomaly detection with R
11 Tóth Gergely: R as a GIS tool
12
Arató Bence: Who loves R?
BI Consulting
Bodó László: R and C++
Burg Analytics
Burger Csaba: Examining financial decisions with lme4
Examining financial decisions with the lme4 package
Burger Csaba, PhD
Budapest Users of R Network Meetup, 15 January 2014
[Chart: expected state pension as a percentage of final-year salary, for a male employee earning the average wage with an uninterrupted career. Countries: Germany (before and after the reform), Slovak Republic, Denmark, Ireland, Sweden, United Kingdom, Japan, Switzerland, Canada, United States, Belgium, Korea, Norway, Czech Republic, Portugal, Finland, Italy, Austria]
How much do you put aside for retirement?
It depends on how old you are, whether you are a woman or a man, and whether you want to save yourself (rather than the employer financing it ...
These relationships are influenced by where you live
[Chart: average savings amounts, p.a.]
Explaining the savings amount with a simple regression
$$\ln(\text{Savings}) = \alpha + \sum_k \beta_k X_k + \varepsilon$$
Explanatory variables $X_k$: age, sex (woman or man), form of financing
[Plot: non-standardized residuals of the regression]
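The regression on the slide can be fitted in base R with `lm()`. The sketch below uses simulated data only: the column names (`age`, `sex`, `financing`), the factor levels and the coefficients used to generate the data are assumptions for illustration, not the talk's dataset or results.

```r
# Minimal sketch of the log-linear savings regression on SIMULATED data.
# Column names, levels and true coefficients are hypothetical.
set.seed(42)
n <- 500
policyholders <- data.frame(
  age       = sample(20:54, n, replace = TRUE),
  sex       = factor(sample(c("female", "male"), n, replace = TRUE)),
  financing = factor(sample(c("employer", "self"), n, replace = TRUE))
)

# Generate ln(Savings) = alpha + sum_k beta_k * X_k + eps with known betas
log_savings <- 5 +
  0.02 * policyholders$age +
  0.10 * (policyholders$sex == "male") +
  0.30 * (policyholders$financing == "self") +
  rnorm(n, sd = 0.2)
policyholders$savings <- exp(log_savings)

# Fit the model; factors are dummy-coded against their first level
fit <- lm(log(savings) ~ age + sex + financing, data = policyholders)
round(coef(fit), 3)

# Non-standardized (raw) residuals, the kind shown on the slide
raw_resid <- resid(fit)
summary(raw_resid)
```

Because the response is `log(savings)`, a coefficient is roughly a proportional effect: 0.10 corresponds to about 10.5% higher savings (`exp(0.10) - 1`).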
The savings amount appears to move together with age and place of residence
[Chart: savings amount (p.a.) by age band, from 20-24 to 50-54; y-axis from 0 to 1,400]
Gender gap: the difference between men's and women's savings depends on place of residence and age
[Chart: savings difference, men minus women (p.a.), by age band, shown separately for East and West]
The lme4 package: mixed-effects regression, where crossed effects can also be included in the model function
library(lme4)
fm
Interaction of sex and Bundesland
[Chart: fixed-effect estimates for Bundesland and sex (axis from -0.3 to 0.4), by federal state, abbreviated RP, BW, NI, HB, NW, HH, HE, BE, SH, BY, BB and others]
[Chart: estimated savings gender gap for a 40-year-old policyholder, by federal state: Nordrhein-Westfalen, Niedersachsen, Sachsen-Anhalt, Hessen, Schleswig-Holstein, Hamburg, Thüringen, Bremen, Sachsen, Bayern, Baden-Württemberg, Rheinland-Pfalz, Saarland; values range from 38 to 77]
Random effect for sex and its differences: sz... divide
What are your plans for retirement?
rapporter.net
Daróczi Gergely: Pander
KSH
Horváth Gergely: R in official statistics
R in official statistics
Overview and experiences
What does statistics do?
But what can the R community offer official statistics?
CRAN Official Statistics task view
Complex samples: sampling; weighting, calibration; estimates (error computation)
Data processing
Editing (editrules): error detection and checking of relationships between variables
Analysis (VIM) and imputation of missing data
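The edit-then-impute workflow on this slide can be sketched in plain base R. This is not the editrules or VIM API; the records, the two rules and the median imputation are hypothetical stand-ins chosen to show the three steps: check the rules, blank out failing values, fill the gaps.

```r
# Hypothetical survey records; NA marks a missing answer
records <- data.frame(
  age      = c(34, 17, 52, NA, 41),
  income   = c(3200, 1500, NA, 2800, -100),
  employed = c(TRUE, TRUE, TRUE, FALSE, TRUE)
)

# Edit rules as named logical expressions (hypothetical rules)
rules <- list(
  adult_if_employed = quote((!employed) | age >= 18),
  income_nonneg     = quote(income >= 0)
)

# 1) Evaluate each rule on each record:
#    TRUE = passes, FALSE = violates, NA = cannot be decided (missing input)
checks <- sapply(rules, function(rule) eval(rule, records))
checks

# 2) Blank out values that violate a single-variable rule
records$income[which(!checks[, "income_nonneg"])] <- NA

# 3) Simple median imputation of the remaining gaps
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
num_cols <- c("age", "income")
records[num_cols] <- lapply(records[num_cols], impute_median)
records
```

Median imputation is the crudest option; the VIM package is aimed at visualizing missing-value patterns and offers more refined imputation methods.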
Everyone has a story
Why R?
Excel, SAS, SPSS, database managers
(Why not?)
Why would it be good?
Data visualization
> library(RODBC)
Easy data access: data and metadata
Taming SQL: using SQL programs from R
> library(tables)
Highly customizable
# Example 9: age, sex
tab_9
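The slide builds its tables with the tables package. As a package-free sketch of the same kind of age-by-sex breakdown, base R's `xtabs()`, `prop.table()` and `addmargins()` can be used; the respondent data are invented, and this is not how the tables package itself is called.

```r
# Hypothetical microdata: age group and sex of seven respondents
resp <- data.frame(
  age_group = factor(c("15-29", "30-44", "30-44", "45-64", "15-29", "45-64", "30-44")),
  sex       = factor(c("F", "M", "F", "F", "M", "M", "M"))
)

# Contingency table of counts, age group by sex
tab <- xtabs(~ age_group + sex, data = resp)

# Row percentages, and the counts with marginal totals
tab_pct <- round(100 * prop.table(tab, margin = 1), 1)
addmargins(tab)
tab_pct
```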
Works well, sometimes even better than the big ones!
For any task
+1 RStudio
Research room
Let researchers research! Less protected data, more and more accessible datasets
R will be available. How?
Research room B
Data-protection checks: we don't just poke around freely; output checking
Reproducibility
Let's understand R
And even a bit more!
BME MIT
Kocsis Imre: RHadoop: MapReduce in R
Budapest University of Technology and Economics, Department of Measurement and Information Systems
RHadoop: MapReduce in R
Kocsis [email protected]
BURN Meetup, 2014.01.15.
A/The Big Data problem
Distributed storage
Computation to data
Big Data at rest: no updates; we analyze everything
"Not true, but a very, very good lie!" (T. Pratchett, Night Watch)
MapReduce
[Diagram: word count with MapReduce. Input splits on a distributed file system; Map emits [key, value] pairs; SHUFFLE groups them into [key, [value, value, ...]]; Reduce emits one [key, value] pair per key]
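The word-count diagram can be mimicked in a single R session to make the three phases explicit. This is a minimal in-memory sketch, not the rmr2 API: `map_fn` and `reduce_fn` are hypothetical names, and the shuffle is done with `split()` instead of Hadoop's distributed sort.

```r
docs <- c("the cat sat", "the dog sat", "the end")

# Map: one document -> list of (key = word, value = 1L) pairs
map_fn <- function(doc) {
  words <- strsplit(doc, " ", fixed = TRUE)[[1]]
  lapply(words, function(w) list(key = w, value = 1L))
}
pairs <- unlist(lapply(docs, map_fn), recursive = FALSE)

# Shuffle: group all values by key (Hadoop does this between the phases)
keys   <- vapply(pairs, `[[`, character(1), "key")
values <- vapply(pairs, `[[`, integer(1), "value")
groups <- split(values, keys)

# Reduce: one call per key with all of that key's values
reduce_fn <- function(key, vv) sum(vv)
counts <- mapply(reduce_fn, names(groups), groups)
counts
```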
What can be organized in MapReduce style:
Anything embarrassingly parallel
Statistical Query Model: Locally Weighted Linear Regression, Naive Bayes, Gaussian Discriminative Analysis, k-means, Logistic Regression, Neural Network, PCA, ICA, EM, SVM, ...
Generalized iterative matrix-vector multiplication: PageRank, graph diameter, connected components, ...
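Many of the Statistical Query Model algorithms fit MapReduce because they only need sums over the data. A minimal sketch of that idea for the mean and the variance: each block (a stand-in for an HDFS block) emits partial sums, and the reduce step just adds them. The block count and the data are arbitrary assumptions.

```r
set.seed(7)
x <- rnorm(1000, mean = 10, sd = 2)

# Pretend the data is stored in 4 blocks, one per mapper
blocks <- split(x, rep(1:4, length.out = length(x)))

# Map: each block emits partial sums (count, sum, sum of squares)
partials <- lapply(blocks, function(b) c(n = length(b), s = sum(b), ss = sum(b^2)))

# Reduce: partial sums simply add up, so the reducer is a vector sum
tot <- Reduce(`+`, partials)

# Final step on the aggregated statistics
mr_mean <- tot[["s"]] / tot[["n"]]
mr_var  <- (tot[["ss"]] - tot[["n"]] * mr_mean^2) / (tot[["n"]] - 1)
c(mean = mr_mean, var = mr_var)
```

The sum-of-squares formula is easy to distribute but can lose precision when the mean is large relative to the spread; merging per-block means and centred sums of squares is the numerically safer variant.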
RHadoop = Hadoop + R
[Diagram: the Hadoop side (HDFS, Map, SHUFFLE, Reduce) next to its R counterparts map(k, v), reduce(k, vv) and mapreduce(...)]
RHadoop
github.com/RevolutionAnalytics/RHadoop/
"The most mature [...] project for R and Hadoop is RHadoop." (O'Reilly, R in a Nutshell, 2012)
rmr: mapreduce; rhdfs: HDFS file management
rhbase, plyrmr
rmr: mapreduce
Local backend
rmr.options(backend="local")
Local file system; sequential execution
Debug!
Input/output: here, too, through the file system
Input/output format
text, json, csv
native (R serialization), sequence.typedbytes (Hadoop), pig.hive
hbase
Advantages
Map and Reduce: in R
Packages! MR algorithm prototyping
+ the control flow too: convenience
Hadoop job: a single function call! E.g. iterative MapReduce entirely in R
Map and Reduce: ~in the calling environment
How can I get one?
Local backend, sandbox VMs: Cloudera, Hortonworks
Your own Hadoop cluster
Amazon Elastic MapReduce (EMR): a rentable Hadoop cluster
Your own cloud solution
RHadoop in the Apache Virtual Computing Lab
Advantages and disadvantages
Disadvantages?
Cumbersome debugging
+1 tuning layer; Mahout separately; many Hadoop functions
Few examples
Categorizing rare events with RHadoop
Infrastructure data (Salánki Ágnes)
It works. Quite a few gotchas. But still better than in Java.
Planimeter
Köleséri László: MED Watch Dashboard
2014.01.15 Budapest Users of R Network
Data Source (FDA)
2012 Q4 Safety Alerts for Human Medical Products (Drugs, Biologics, Medical Devices, Special Nutritionals, and Cosmetics)
The alerts contain actionable information that may affect both treatment and diagnostic choices for healthcare professionals and patients.
MED Watch Dashboard Viewers: http://medwatch.co.nf
December 2013
Data Clean - 1
The raw reported data have been cleaned according to the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use (ICH).
The verbatim reactions/indications have been coded into system organ classes (SOC) using the Medical Dictionary for Regulatory Activities (MedDRA version 13.1) for coding of diseases/medical conditions.
Data Clean - 2
The raw reported datasets have been transferred into CDISC SDTM datasets, and also into CDISC ADaM datasets, which are the basis for producing statistical graphs in R with packages including: Shiny; vcd (the conditional density plot, Hofmann and Theus 2005); hexbin (basic hexagon binning functions); rworldmap (joinCountryData2Map, mapCountryData); ggplot2.
Lie Factor (Edward Tufte)
Give the audience the greatest number of ideas in the shortest time.
Minimize the amount of "ink": use the smallest optimal representation.
Tell the truth about the data.
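Tufte's lie factor compares the size of an effect shown in the graphic with its size in the data, where the size of an effect is the relative change |second - first| / first. A small sketch; the example values (a bar chart whose axis starts at 90) are hypothetical.

```r
# Size of effect = relative change between two values
effect_size <- function(first, second) abs(second - first) / first

# Lie factor = effect shown in the graphic / effect in the data
lie_factor <- function(data_first, data_second, shown_first, shown_second) {
  effect_size(shown_first, shown_second) / effect_size(data_first, data_second)
}

# Values grow from 100 to 110 (+10%), but with the axis starting at 90
# the drawn bars are 10 and 20 units tall (+100%)
lie_factor(data_first = 100, data_second = 110, shown_first = 10, shown_second = 20)
```

A value of 1 means the graphic is proportional to the data; here the truncated axis exaggerates a 10% change tenfold.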
Quanopt
Nádudvari György: R & Python collaboration
Budapest University of Technology and Economics, Department of Measurement and Information Systems
TekeRedik a kígyó ("The snake is coiling"), or: ways of connecting R and Python
Nádudvari György [email protected]
15 January 2014
My relationship with R and Python
Python: Hello World! on 26 December 2002; for me, THE language; web, system administration, convenience functions
R: we first met in 2012; little experience; log analysis, visual data analysis
Strengths and weaknesses
Python: general-purpose, simple
R: a statistical language, cumbersome for me
Source of the plots: http://ghalib.me/blog/a-superficial-comparison-of
Strengths and strengths
Python: efficient, fast development; community
R: plotting; statistical packages; community
Python + R = Profit
Options
R → Python: rPython, RSPython
R → Python: system()
Python → R: rpy2
Source: rpy2 documentation
Python → R: rpy2 levels and packages
Python → R: rpy2 R sessions
Python → R: rpy2, an example:
Use in a project of my own
Reasons: deadline; more experience with Python; an existing R code base
Advantages: faster development; simpler integration; minimal changes to existing R functions; polished charts
Summary: Let's dare to step outside the R world! Let's use the tool that is most efficient for us! The possibility is there.
Thank you for your attention!
@reedcourty
Ottucsák György: Online Forecasting Application
Using R in Production
Case Study:Online Forecasting
Gyuri [email protected]
Background
US startup
My first big R project
Early 2010
Started as a research pilot (plan: prototype in R, then Java :)
Sequential Sales Forecasting
US grocery stores
Input data: sales data from the previous days and earlier; historical and future prices (promotion calendars); other inputs
Forecast horizon: the next 7 days
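The talk does not show its forecasting model. As a hedged illustration of the input/horizon setup only, here is a seasonal-average baseline that forecasts each of the next 7 days from the same weekday in the last few weeks; the sales series, the weekly pattern and `k = 4` are assumptions, and prices or promotion calendars are deliberately left out.

```r
set.seed(3)
# Hypothetical daily sales of one item in one store: 8 weeks, weekly pattern + noise
weekly_pattern <- c(80, 70, 75, 90, 120, 150, 110)
sales <- rep(weekly_pattern, times = 8) + rnorm(56, sd = 5)

# Seasonal-average baseline: each of the next `period` days is forecast as the
# mean of the same weekday over the last k full weeks
forecast_next_week <- function(y, period = 7, k = 4) {
  stopifnot(length(y) >= period * k)
  recent <- tail(y, period * k)
  # column = one week, row = position within the week; period * k days is a
  # whole number of weeks, so row 1 is the weekday right after the last observation
  rowMeans(matrix(recent, nrow = period))
}

fc <- forecast_next_week(sales)
round(fc, 1)
```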
Some numbers
2010: ~200 forecasts/day, 1 developer, a laptop
2013: ~100,000 forecasts/day, 3 developers + 1 data ops, a server (8 cores)
~2 MB of R code
Challenges
Speed: a window of a couple of hours to process the data and generate the forecast (it must be submitted before 6 am Eastern time)
More CPU cores
Training/back-testing: file mutex
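The "file mutex" item refers to keeping parallel training/back-testing runs from clashing. One common way to do this from R is a lock directory, because `dir.create()` fails when the directory already exists. The helper names (`acquire_lock`, `release_lock`, `with_lock`) are hypothetical; this is a sketch assuming a local file system, not the approach used in the project.

```r
# Lock = a directory; dir.create() returns FALSE if it already exists,
# so at most one process holds the lock at a time
acquire_lock <- function(lock_dir, timeout = 10, poll = 0.1) {
  deadline <- Sys.time() + timeout
  while (!suppressWarnings(dir.create(lock_dir))) {
    if (Sys.time() > deadline) return(FALSE)
    Sys.sleep(poll)
  }
  TRUE
}

release_lock <- function(lock_dir) unlink(lock_dir, recursive = TRUE)

# Run `expr` while holding the lock; the lock is released even on error
with_lock <- function(lock_dir, expr, timeout = 10) {
  if (!acquire_lock(lock_dir, timeout)) stop("could not acquire lock: ", lock_dir)
  on.exit(release_lock(lock_dir), add = TRUE)
  expr  # lazily evaluated here, i.e. only after the lock was acquired
}

lock <- file.path(tempdir(), "backtest.lock")
result <- with_lock(lock, {
  Sys.sleep(0.1)  # stands in for a training / back-testing step
  "back-test finished"
})
result
```

A lock left behind by a killed process has to be removed by hand (or aged out), which this sketch does not handle.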
Challenges II
Maintenance
Requirements: high availability, fault tolerance (e.g. Tornado)
Maintenance
Early months: manual operation: manually move the data and start the process; check the data quality
Put together a doc and hand it over to ops
Maintenance 2.0
First idea: crontab, etc.
Second idea: Hudson CI (http://hudson-ci.org/). Overkill?
Hudson
Monitoring executions of externally-run jobs
Nice web UI
Cron + procmail and a lot more
Tons of plugins
Conclusion: R in production
Pros: quick prototyping; no issues with speed
Cons: code maintenance gets hard after several thousand lines
Thank you
BME MIT
Salánki Ágnes: Anomaly detection with R
Budapest University of Technology and Economics, Department of Measurement and Information Systems
Anomaly detection with R
Salánki Ágnes
2014.01.15.
A motivating example (1949)
Hadlum vs. Hadlum
Source: http://www.siam.org/meetings/sdm10/tutorial3.pdf
Average: 280 days (40 weeks)
Mrs. Hadlum: 349 days
A different generating process
Anomaly definition?
exception, anomaly, surprise, rare event, novelty, outlier, aberration, peculiarity, discordant observations
Grouping of methods
Distance-based:
Convex hull: depth
MVE, MCD: MASS
BACON: robustX
DB: fields
Density-based:
LOF: DMwR
NNDB
Distance?
Convex hull
1D: min, max (innermost: median)
2D: enclosing polygon
3D:
Convex hull: depth::depth
MVE: Minimum Volume Ellipsoid
By exhaustive search
MVE: MASS::cov.rob
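A minimal sketch of MVE-based outlier flagging with `MASS::cov.rob()`: robust centre and scatter from the Minimum Volume Ellipsoid, then squared Mahalanobis distances compared with a chi-squared quantile. The simulated data, the two planted outliers and the 0.999 cutoff are assumptions for illustration.

```r
library(MASS)  # recommended package shipped with R: cov.rob(), mvrnorm()

set.seed(10)
inliers <- mvrnorm(100, mu = c(0, 0), Sigma = diag(2))
x <- rbind(inliers, c(6, 6), c(-5, 7))  # rows 101 and 102 are planted outliers

# Robust location/scatter from the Minimum Volume Ellipsoid. cov.rob() searches
# subsets of the data; beyond tiny data it samples them (see its nsamp
# argument), so this approximates the exact MVE.
mve <- cov.rob(x, method = "mve")

# Squared robust Mahalanobis distances vs. a chi-squared cutoff
d2 <- mahalanobis(x, center = mve$center, cov = mve$cov)
cutoff <- qchisq(0.999, df = ncol(x))
which(d2 > cutoff)
```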
BACON
If it's connected, it's connected
BACON: robustX::mvBacon
DB
We are anomalous even at the centre if we have no neighbours
Distance-based approach
DB: fields::fields.rdist.near
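The distance-based idea (a point is suspicious if it has too few neighbours within a radius) can be sketched with base R's `dist()`. The function `db_outliers`, the radius `r` and `min_pts` are hypothetical choices. The talk's `fields::fields.rdist.near` is meant for near-neighbour distance queries, whereas this sketch builds the full O(n²) distance matrix and only suits small data.

```r
# Distance-based (DB) outlier rule: flag a point if fewer than `min_pts`
# other points lie within radius `r` of it
db_outliers <- function(x, r, min_pts) {
  d <- as.matrix(dist(x))              # all pairwise Euclidean distances
  n_neighbours <- rowSums(d <= r) - 1  # minus 1: each point is within r of itself
  unname(which(n_neighbours < min_pts))
}

set.seed(11)
pts <- rbind(matrix(rnorm(200), ncol = 2),  # 100 points around the origin
             c(5, 5))                       # row 101: an isolated point
db_outliers(pts, r = 1, min_pts = 3)
```

Points in the sparse tails of the cloud may be flagged too; the result depends directly on how `r` and `min_pts` are chosen.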
LOF motivation
Neither of the 2, or 1 after all?
LOF: Local Outlier Factor
If my neighbours are lonely too, there is no big problem
LOF: DMwR::lofactor
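LOF compares a point's local density with that of its neighbours, which is why a point surrounded by equally lonely neighbours is not flagged. Below is a minimal base R implementation of the standard definition (k-distance, reachability distance, local reachability density); it is not DMwR's `lofactor()`, it ignores ties at the k-th distance, and the data and `k = 5` are arbitrary.

```r
lof_scores <- function(x, k = 5) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  stopifnot(k >= 2, k < n)
  diag(d) <- Inf                                     # a point is not its own neighbour
  nn     <- t(apply(d, 1, function(row) order(row)[seq_len(k)]))  # n x k neighbour ids
  k_dist <- apply(d, 1, function(row) sort(row)[k])  # distance to the k-th neighbour

  # Local reachability density: 1 / mean reach-dist to the k neighbours,
  # where reach-dist(p, o) = max(k_dist(o), d(p, o))
  lrd <- vapply(seq_len(n), function(i) {
    o <- nn[i, ]
    1 / mean(pmax(k_dist[o], d[i, o]))
  }, numeric(1))

  # LOF(p) = mean lrd of p's neighbours / lrd(p); values near 1 are typical
  vapply(seq_len(n), function(i) mean(lrd[nn[i, ]]) / lrd[i], numeric(1))
}

set.seed(12)
cluster <- matrix(rnorm(100, sd = 0.1), ncol = 2)  # 50 dense points
x <- rbind(cluster, c(1, 1))                       # row 51 sits far from the cluster
scores <- lof_scores(x, k = 5)
which.max(scores)
```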
NNDB
Where are the big jumps?
What we use it for: performance management
Distance- or density-based?
BACON: it is far enough away
NNDB: but with homogeneous density
rapporter.net
Tóth Gergely: R as a GIS tool
Tóth Gergely, Rapporter - Easystats/PERIPATO
library(sp)          # basics: handling spatial data (classes)
library(maptools)    # handling ESRI standards
library(rgeos)       # manipulating spatial objects (GEOS)
library(raster)      # grid, raster
library(rasterVis)   # raster visualization
# lots and lots of dependencies
library(dismo)       # Google Maps calls (originally: gmap, which also requires rgdal)
mymap
library(RgoogleMaps)
PlotOnStaticMap(lat = c(36.3, 35.8, 36.4), lon = c(-5.5, -5.6, -5.8), zoom = 10,
                cex = 4, pch = 19, col = "red", FUN = points, add = FALSE)
# Save to file: ujterkep
# Sample data: bay laurel (Laurus nobilis)
library(dismo)
laurus
On a Google map: locs.sp.coords
# rworldmap
library(rworldmap)
data(coastsCoarse)
data(countriesLow)
plot(locs.sp, pch = 20, cex = 2, col = "steelblue")
title("Bay laurel (in Spain)")
plot(coastsCoarse, add = TRUE)
plot(countriesLow, add = TRUE)
library(googleVis)
data(Exports)   # sample dataset
# 'data.frame': 10 obs. of 3 variables:
#  $ Country: Factor w/ 10 levels "Brazil","France",..: 3 1 10 2 4 6 5 7 8 9
#  $ Profit : num 3 4 5 4 3 2 1 4 5 1
#  $ Online : logi TRUE FALSE TRUE TRUE FALSE TRUE ...
Geo
# Hurricane Andrew:
data(Andrew)
M1
library(rworldmap)
data("countryExData", envir = environment(), package = "rworldmap")
sPDF
Contour maps; spatial autocorrelation
Francisco Rodriguez-Sanchez: Spatial data in R: Using R as a GIS (http://pakillo.github.io/R-GIS-tutorial/)
CRAN Task View: Analysis of Spatial Data (http://cran.r-project.org/web/views/Spatial.html)
Making Maps with R (http://www.molecularecologist.com/2012/09/making-maps-with-r/)
Package documentation
Quanopt
Quanopt Ltd.
About the MonetDB.R package
Urbanics Gábor <[email protected]>
Lightning talk, R meetup Budapest
Problem statement
R is good, but everything has to fit in memory, which can quickly run out
The simplest solutions for this:
use file-backed packages, e.g. the bigmemory package family or the ff package
use a database as much as possible
Storage + processing
MonetDB
Relational
Column-oriented vs. row-oriented storage
ID  Day     Discount
10  4/4/98  0.195
11  9/4/98  0.065
12  1/2/98  0.175
13  7/2/98  0
OID  ID    OID  Day     OID  Discount
100  10    100  4/4/98  100  0.195
101  11    101  9/4/98  101  0.065
102  12    102  1/2/98  102  0.175
103  13    103  7/2/98  103  0
104  14    104  1/2/99  104  0.065
3 separate files on disk
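The one-file-per-column layout can be imitated in a few lines of R to show why a single-column query touches less data. This toy stores each column with `saveRDS()`; it only illustrates the layout, not MonetDB's storage format, and the directory name is an assumption.

```r
# The table from the slide, one row per record
tbl <- data.frame(
  ID       = 10:14,
  Day      = c("4/4/98", "9/4/98", "1/2/98", "7/2/98", "1/2/99"),  # kept as text
  Discount = c(0.195, 0.065, 0.175, 0, 0.065)
)

# Column-oriented layout: every column goes to its own file
store_dir <- file.path(tempdir(), "colstore")
dir.create(store_dir, showWarnings = FALSE)
for (col in names(tbl)) {
  saveRDS(tbl[[col]], file.path(store_dir, paste0(col, ".rds")))
}

# A query that only touches Discount reads a single file
read_column <- function(col) readRDS(file.path(store_dir, paste0(col, ".rds")))
mean_discount <- mean(read_column("Discount"))
list.files(store_dir)
mean_discount
```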
MonetDB advantages
Efficient for queries that touch whole column(s) (better IO access)
A column only needs to be cached once
Memory-mapped files
Data compress better column by column
less storage space
we pay in CPU for less IO
Can also be organized into a cluster
MonetDB disadvantages
Not a silver bullet: it won't be good for everything
Recommended for read-mostly access
Here, too, the code is the most reliable documentation
MonetDB and R integration
Accessible through RODBC
We have seen this before; it would not help much
The MonetDB.R package
MonetDB-specific functionality from R
Basic DB management (start/stop)
Data access (naturally)
The MonetDB.R package: DBI access
DBI driver: dbConnect
dbSendUpdate
# using DBI with MonetDB
> conn
> result
> print(result)
        L1
1 29722533
> str(result)
'data.frame':   1 obs. of 1 variable:
 $ L1: num 29722533
The MonetDB.R package: monet.frame
A data.frame-like class
Proxy object for a database table
Simple operations are done on the database side
R calls are rewritten into SQL queries
The MonetDB.R package: monet.frame
> mframe
> str(mframe)
MonetDB-backed data.frame surrogate
3 columns, 12500 rows
Query: SELECT * FROM demotable
Columns: col1 (numeric), col2 (numeric), col3 (numeric)
... the queries!
The MonetDB.R package: monet.frame
> res
QQ: 'SELECT col3 FROM demotable'
QQ: 'SELECT 10*(col3) FROM demotable'
QQ: 'SELECT COS(10*(col3)) FROM demotable'
QQ: 'SELECT SQRT(COS(10*(col3))) FROM demotable'
QQ: 'SELECT ((col1)+(col2))/(SQRT(COS(10*(col3)))) FROM demotable'
... builds the queries, but does not execute them
The MonetDB.R package: monet.frame
> head(res)
QQ: 'SELECT ((col1)+(col2))/(SQRT(COS(10*(col3)))) FROM demotable LIMIT 6 OFFSET 0'
II: 'Re-Initializing column info.'
EX: 'SELECT ((col1)+(col2))/(SQRT(COS(10*(col3)))) FROM demotable LIMIT 6 OFFSET 0'
  sql_div_sql_add_col1
1             4.661327
2                   NA
3                   NA
4             6.260829
5             6.640108
6           -21.836528
The package contains SQL-side implementations of many R functions and operators
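The query log above shows monet.frame building SQL text step by step and running it only when results are requested. The idea can be sketched with S3 group generics: a toy proxy class whose `Ops` and `Math` methods return new SQL text instead of numbers. The class `sql_col` and the helpers `to_sql` and `as_query` are hypothetical; this is not MonetDB.R's implementation, only a few operators are mapped to SQL spellings, and the rest pass through unchanged.

```r
# Toy proxy column: holds SQL text instead of data
sql_col <- function(sql) structure(list(sql = sql), class = "sql_col")
to_sql  <- function(x) if (inherits(x, "sql_col")) x$sql else format(x)

# Operators build SQL text; a few get SQL spellings, others pass through
Ops.sql_col <- function(e1, e2) {
  op <- switch(.Generic, "==" = "=", "!=" = "<>", "&" = " AND ", "|" = " OR ",
               "!" = "NOT ", .Generic)
  if (missing(e2)) return(sql_col(sprintf("%s(%s)", op, to_sql(e1))))  # unary
  sql_col(sprintf("(%s)%s(%s)", to_sql(e1), op, to_sql(e2)))
}

# Math group functions (cos, sqrt, abs, ...) become SQL function calls
Math.sql_col <- function(x, ...) sql_col(sprintf("%s(%s)", toupper(.Generic), to_sql(x)))

# Nothing is executed: query text is only produced on request
as_query <- function(x, table) sprintf("SELECT %s FROM %s", to_sql(x), table)

col1 <- sql_col("col1"); col2 <- sql_col("col2"); col3 <- sql_col("col3")
res <- (col1 + col2) / sqrt(cos(10 * col3))
as_query(res, "demotable")
```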
The MonetDB.R package: monet.frame
subsetting, the [ operator
arithmetic operators: +, -, *, /, ^, %%, %/%
logical operators: &, |, !
min, max, mean, sd, var, median, quantile, tabulate
abs, sign, sqrt, floor, ceiling, trunc, round, signif
exp, log, expm1, log1p, cos, sin, tan, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh
The MonetDB.R package: drawbacks
The idea (R → SQL translation) is good, but:
the implementation still has some bugs (e.g. don't use a variable named limit)
the set of supported operations is limited
It won't be fully transparent
We won't be able to use arbitrary existing functions
> ggplot(data = mframe, aes(x = col1)) + geom_histogram()
Error: ggplot2 doesn't know how to deal with data of class monet.frame
Of course, we can write our own functions that support it
MonetDB.R: similar solutions
Oracle R Enterprise: running R functions on Oracle databases
IBM Netezza: R packages nzR, nzA, nzMatrix
Teradata: the teradataR package