SPARKING PANDAS: AN EXPERIMENT PyConOtto - Florence '17 Francesco Bruni brunifrancesco

SPARKING PANDAS: AN EXPERIMENT

PyConOtto - Florence '17

Francesco Bruni

brunifrancesco

WHO I AM

MSc in Telecommunication Engineering

Functional pythonista

Currently working with geo data

OUTLINE

Why Sparking Pandas

Functional data processing pipelines

A real world application

Conclusions

WHY SPARKING PANDAS

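What if your data don't fit into memory? pandas holds an entire DataFrame in RAM, so a dataset larger than the available memory cannot be loaded in one go.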
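One way around the memory limit, and the basic idea Spark scales out across machines, is to stream the data in bounded chunks and keep only partial aggregates. A minimal standard-library sketch, assuming a CSV with a numeric `value` column; the column name, the sample data and the helpers `read_chunks` and `streaming_mean` are hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator, List


def read_chunks(rows: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` rows; only one chunk is materialized at a time."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def streaming_mean(rows: Iterable[dict], column: str, chunk_size: int = 1000) -> float:
    """Mean of `column`, keeping only a running (count, total) pair in memory."""
    count, total = 0, 0.0
    for chunk in read_chunks(rows, chunk_size):
        count += len(chunk)
        total += sum(float(row[column]) for row in chunk)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


# Hypothetical sample standing in for a file too large for RAM; for a real file,
# pass csv.DictReader(open(path, newline="")) instead.
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n5,50\n")
print(streaming_mean(csv.DictReader(sample), "value", chunk_size=2))  # 30.0
```

pandas supports the same pattern directly: `pd.read_csv(path, chunksize=...)` returns an iterator of DataFrames. Spark pushes the idea further by spreading the chunks (partitions) over several machines.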

APACHE SPARK: THE COMPONENTS

APACHE SPARK: THE ARCHITECTURE
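In Spark, a driver program splits a job into tasks, one per partition of the data, and schedules them on executors, processes that the cluster manager launches on worker nodes; results flow back to the driver. The single-process analogy below is only an illustration, not Spark code: a `ThreadPoolExecutor` plays the executors, a word count plays the job, and `partition`, `count_words` and `run_job` are hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence


def partition(data: Sequence[str], n: int) -> List[List[str]]:
    """Driver side: split the input into n partitions (round-robin)."""
    return [list(data[i::n]) for i in range(n)]


def count_words(lines: List[str]) -> Counter:
    """Task: the work one executor performs on a single partition."""
    return Counter(word for line in lines for word in line.split())


def run_job(data: Sequence[str], num_partitions: int = 3, num_executors: int = 2) -> Counter:
    parts = partition(data, num_partitions)
    # The thread pool stands in for the executors; one task is submitted per partition.
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partials = list(executors.map(count_words, parts))
    # Back on the "driver": merge the partial results.
    return reduce(lambda a, b: a + b, partials, Counter())


lines = ["spark runs tasks", "pandas runs in memory", "spark and pandas"]
print(run_job(lines)["spark"])  # 2
```

In PySpark the same job is roughly `sc.parallelize(lines, 3).flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(operator.add)`, with Spark rather than a thread pool handling the scheduling.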

FUNCTIONAL DATA PROCESSING PIPELINES

Higher-order functions

Immutable data

Lazy evaluation
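The three ideas can be shown together in plain Python. This is a minimal sketch with hypothetical names (`Reading`, `pipeline`, `trace`, `keep_positive`, `scale`): records are immutable `NamedTuple`s, `pipeline` is a higher-order function that composes steps, and generators make every step lazy.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator, List, NamedTuple


class Reading(NamedTuple):
    """Immutable record: fields cannot be reassigned once created."""
    sensor: str
    value: float


Step = Callable[[Iterable[Reading]], Iterable[Reading]]


def pipeline(*steps: Step) -> Step:
    """Higher-order function: takes steps, returns their left-to-right composition."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


evaluated: List[str] = []  # instrumentation only, to observe when records are pulled


def trace(readings: Iterable[Reading]) -> Iterator[Reading]:
    for r in readings:
        evaluated.append(r.sensor)
        yield r


def keep_positive(readings: Iterable[Reading]) -> Iterator[Reading]:
    return (r for r in readings if r.value > 0)


def scale(readings: Iterable[Reading]) -> Iterator[Reading]:
    # _replace builds a new Reading; the input record is left untouched.
    return (r._replace(value=r.value * 10) for r in readings)


process = pipeline(trace, keep_positive, scale)

data = (Reading("a", 1.5), Reading("b", -2.0), Reading("c", 3.0))
result = process(data)  # builds the chain of generators; no record is read yet
print(evaluated)        # []
print(list(result))     # [Reading(sensor='a', value=15.0), Reading(sensor='c', value=30.0)]
print(evaluated)        # ['a', 'b', 'c']
print(data[0])          # Reading(sensor='a', value=1.5)
```

Spark transformations behave the same way: `map` or `filter` only record the plan, and an action such as `collect()` or `count()` triggers the computation.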

THE EXPERIMENT

The scenario

Containerized application

THE SCENARIO

CONTAINERIZED APPLICATION

Containerized components

Memory-constrained nodes

Ecosystem orchestrated with Docker Compose

HANDS-ON CODE

Apache Spark basics

Linear regression

Near-real-time processing with Apache Kafka
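For the linear regression item: a least-squares line fit spreads well across partitions because it only needs five sums, which can be computed independently per partition and then added. A minimal pure-Python sketch for a single feature, with hypothetical names (`partial_stats`, `combine`, `fit`); it illustrates the map/reduce shape and is not Spark MLlib's `LinearRegression`.

```python
from functools import reduce
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, float]
Stats = Tuple[int, float, float, float, float]  # (n, sum x, sum y, sum xy, sum x^2)


def partial_stats(points: Iterable[Point]) -> Stats:
    """Map step: summarize one partition into additive sums."""
    n, sx, sy, sxy, sxx = 0, 0.0, 0.0, 0.0, 0.0
    for x, y in points:
        n += 1
        sx += x
        sy += y
        sxy += x * y
        sxx += x * x
    return n, sx, sy, sxy, sxx


def combine(a: Stats, b: Stats) -> Stats:
    """Reduce step: sums from different partitions just add up."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3], a[4] + b[4])


def fit(partitions: Sequence[List[Point]]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x, solved in closed form."""
    n, sx, sy, sxy, sxx = reduce(combine, map(partial_stats, partitions))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (n * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Two hypothetical partitions of points lying exactly on y = 1 + 2x.
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
print(fit(partitions))  # (1.0, 2.0)
```

On an RDD of `(x, y)` pairs the same shape could be written as `rdd.mapPartitions(lambda it: [partial_stats(it)]).reduce(combine)`; for real workloads, `pyspark.ml.regression.LinearRegression` is the library route.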

CONCLUSIONS

Complex structure

Worth the effort when there is a lot of data

Worker nodes should be distributed

Keep exploring :)

QUESTIONS?

brunifrancesco

https://github.com/brunifrancesco/docker-spark