
Los Angeles R users group - Nov 17 2010 - Part 1

Page 1: Los Angeles R users group - Nov 17 2010 - Part 1

Szilard Pafka – Los Angeles area R users group meeting – November 17, 2010

Software tools for data analysis: (size related to surveyed usage)

C C++ Fortran Java + libraries...

Perl Python Ruby Unix shell

Lisp Clojure

R Matlab Octave Maple Mathematica

SPSS Stata Statistica SAS JMP

Excel

SAS EM SPSS Clementine RapidMiner Weka Mahout

MySQL SQL Server NoSQL stores

Hadoop CUDA

support: editors code versioning cloud computing

Possible talks:

December:
1. C, interfaces with R (both ways) / something else?
2. SAS: performance, R interface ready?
3. RExcel

January:
1. Python & R – a comparison
2. numpy, scipy
3. Python vs Unix shell / NLTK / networkX

Other talks (March-):
1. data storage (SQL and some noSQL), access from R
2. data mining platforms
3. Hadoop
4. gpu
5. Java
6. Clojure
...

Page 2: Los Angeles R users group - Nov 17 2010 - Part 1

Criteria for talks:

usefulness (for data analysis!), and how the tool compares with R

paradigm/philosophy, main usage domain, performance, ease of learning, speed of programming, libraries

break down by:
- part of the data analysis process (pre-processing, exploration (e.g. visualization), modeling etc.)
- nature of data (e.g. numeric, categorical, unstructured text, networks/links etc.)
- size of data

stuff that increases functionality: libraries, 3rd party extensions...

does tool X have an R-to-X and/or X-to-R interface?

how these tools can be combined to support the whole process of data analysis