BIG DATA ANALYTICS TO THE MASSES JOSE LUIS LÓPEZ PINO DATA ENGINEER GETYOURGUIDE

Analytics to the masses by Jose Luis Lopez at Big Data Spain 2014

BIG DATA ANALYTICS TO THE MASSES

JOSE LUIS LÓPEZ PINO, DATA ENGINEER, GETYOURGUIDE

Big Data Analytics to the masses

Why it has failed and how we can fix it

Jose Luis Lopez Pino

Who am I?

BI Consultant

Large-Scale & Distributed

Founding

Data Engineer

Big Data is like Tourism

It seems easy to do

But if you aren’t an expert, you can’t make the most of it

Struggle to analyze Big Data

Harlan Harris, Sean Murphy, and Marck Vaisman. Analyzing the Analyzers: An Introspective Survey of Data Scientists and Their Work. O’Reilly Media, Inc., 2013

Also: Sean Kandel, Andreas Paepcke, Joseph M Hellerstein, and Jeffrey Heer. Enterprise data analysis and visualization: An interview study. Visualization and Computer Graphics, IEEE Transactions

Tools

Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. Proceedings of the VLDB Endowment, 7(13), 2014

Tools (October 2014)

Original: Volker Markl. Breaking the chains: On declarative data analysis and data independence in the big data era. Proceedings of the VLDB Endowment, 7(13), 2014

Deep analytics

Libraries!

We need libraries...

Query languages

Write your own MR/RDD/Transformations
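To make "write your own MR" concrete, here is a minimal sketch in plain Python of a per-group mean written by hand as map, shuffle, and reduce steps. It is not Hadoop, Spark, or Flink code and does not come from the talk; the record layout and all function names are hypothetical.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records):
    """Emit (key, (value, 1)) pairs, one per input record."""
    for key, value in records:
        yield key, (value, 1)


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the partial (total, count) pairs per key and derive the mean."""
    means = {}
    for key, partials in groups.items():
        total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials)
        means[key] = total / count
    return means


def mean_by_key(records):
    """Hypothetical entry point chaining the three hand-written phases."""
    return reduce_phase(shuffle(map_phase(records)))
```

On a single machine in R, a grouped mean is typically one call such as `tapply(values, keys, mean)`, which is the gap a library is meant to close.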

… comprehensive ones!

Say it with memes!

When you do deep analytics in small data using R and CRAN packages

When you do deep analytics in BIG data using R and CRAN packages

When you try to program it using MapReduce

When you try to program it using Apache Spark / Apache Flink

When you try to use a library scalable to large data sets

Can’t we do it better?

- Make it similar to normal R programs.
- Hide complexity.
- Make file manipulation easier.
- Part of the computing in the cluster and part of the computing in the client.
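One way to read these goals is as a thin wrapper that keeps the user-facing call unchanged while splitting the work between cluster and client. The sketch below is an illustration under that assumption, not the implementation presented in the talk. `DistributedVector` and `mean` are hypothetical names, and the cluster is simulated by a plain list of partitions.

```python
class DistributedVector:
    """Hypothetical handle to data that lives in partitions on a cluster.

    The cluster is simulated here by a plain list of lists.
    """

    def __init__(self, partitions):
        self._partitions = partitions

    def _cluster_partials(self):
        # Cluster side: each partition is reduced to a small (sum, count) pair.
        return [(sum(p), len(p)) for p in self._partitions]

    def mean(self):
        # Client side: only the small partial results are combined locally.
        partials = self._cluster_partials()
        total = sum(s for s, _ in partials)
        count = sum(n for _, n in partials)
        return total / count


def mean(data):
    """Same call for local and distributed data, so user code stays unchanged."""
    if isinstance(data, DistributedVector):
        return data.mean()
    return sum(data) / len(data)
```

The caller writes `mean(x)` in both cases. Only the small per-partition summaries move from cluster to client, so the raw data never has to be transferred.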

Our approach

Our approach

Behind the scenes: Before

Behind the scenes: After

Without writing significantly different code

Competitive with, or even faster than, native R code on small data

And it scales

Some relevant findings

- Transmission time was not significant.
- Stratosphere/Flink was competitive in highly iterative programs.
- We were not able to do it keeping the code 100% the same.
- Ensemble scenarios are the most exciting ones.

4 Takeaways from this talk

- We still need to bring Big Data to the right people in the right place.
- We need comprehensive libraries.
- We need to move data back and forth.
- Use a syntax that the users are familiar with.

That’s all!

- Have you found this talk interesting?
- Follow me: @jllopezpino
- Interested in a job as SEM Data Analyst (Berlin)?
- Ask me for the details:
- Are you interested in Data + Energy?
- Keep in touch:

17th ~ 18th NOV 2014, MADRID (SPAIN)