How to manage large amounts of data with Iteratee - ScalaDays Berlin 2014


DESCRIPTION

How do you handle a large file over HTTP? Is it possible to process a data stream while an HTTP POST request is still in progress? How can you stay relaxed when managing a data stream (i.e. without a while(true) hack or blocking I/O)? This talk is about Play! iteratees and how to use them on a real project: what is an iteratee? Why is it useful? How do you use it? How do you manage it? Is it clean? Is your code readable or not? http://scaladays.org/#schedule/How-to-manage-large-amounts-of-data-with-Iteratee


HOW TO MANAGE LARGE AMOUNTS OF DATA WITH ITERATEE

BY @WAXZCE – QUENTIN ADAM

SCALA DAYS 2014 BERLIN

MY DAY TO DAY WORK : CLEVER CLOUD, MAKE YOUR APP RUN ALL THE TIME

And learn a lot of things about your code, apps, and good/bad design…

KEEP YOUR APPS ONLINE. MADE WITH NODE.JS, SCALA, JAVA, RUBY, PHP, PYTHON, GO…


WHY STREAMS ARE SO TRENDY THESE DAYS?

2 MAIN REASONS

DATA IS GETTING BIGGER AND BIGGER

WE NEED TO ACT WHILE THE DATA IS STILL BEING TRANSFERRED

&

WE ARE NOW USING LOTS OF PERMANENT CONNECTIONS

EXAMPLE

WHAT IS INSIDE AN HTTP REQUEST ?

Verb • The action

Resource • The object of the action

Headers • The context of the action

Body • Optional • The data

IN MANY CASES THE WHOLE REQUEST IS MANIPULATED IN MEMORY

UPLOADING A 20 GB FILE IN A POST REQUEST AND STORING IT

EXAMPLE

IT’S A BAD IDEA TO PUT THE BODY PART IN MEMORY

CREATE A TEMP FILE TO STORE THE DATA

Client uploads the file

Create a temp file

Send the file to a backend

Then answer to the client

The file is transferred twice

NOT SO GOOD

Client uploads the file

Directly stream it to the backend

Then answer to the client

LET’S ACT ON STREAM!

CLASSIC JAVA STREAM MANAGEMENT

• Buffers

• Buffer management

• Buffer exception

• Buffer overflow

• Low performance if not buffered

• Not modular

• Thread blocking

• Code is ugly

• No back pressure

• Error handling is bad

• I/O management and business code are mixed
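For comparison, here is a minimal sketch of the classic blocking style, written in Scala against plain `java.io`. The `BlockingCopy` object and its `copy` helper are illustrative names, not code from the talk. It shows several of the points above in a few lines: a manually managed buffer, a thread stuck in `read()`, and processing code living inside the I/O loop.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream, OutputStream}

object BlockingCopy {
  // Classic blocking copy: the calling thread waits inside read() until data
  // arrives, and buffer handling and processing share the same loop.
  def copy(in: InputStream, out: OutputStream, bufferSize: Int = 8192): Long = {
    val buffer = new Array[Byte](bufferSize) // manual buffer management
    var total = 0L
    var read = in.read(buffer)               // blocks the thread
    while (read != -1) {                     // -1 signals end of stream
      out.write(buffer, 0, read)             // "business" code inside the I/O loop
      total += read
      read = in.read(buffer)
    }
    total
  }

  def main(args: Array[String]): Unit = {
    val in = new ByteArrayInputStream("hello stream".getBytes("UTF-8"))
    val out = new ByteArrayOutputStream()
    println(copy(in, out)) // 12
  }
}
```

Nothing here lets the consumer slow the producer down, and any exception thrown by either side surfaces in the middle of the loop.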

I CAN’T SLEEP AT NIGHT

WE HAVE TO FIND A BETTER WAY

THAT’S WHY WE NEED ITERATEE

DATA STRUCTURES TO EXPRESS DATA STREAM MANAGEMENT

Like a recipe

ITERATEE : HOW TO MANAGE A STREAM

Consume the data

ENUMERATOR : DATA STREAM

Produce the data

ENUMERATEE

Set of tools to do cool things with Iteratee and Enumerator

SIMPLE EXAMPLE TO START

AN ENUMERATOR

AN ITERATEE

INPUTS

• EOF

• Empty

• El

ITERATEE STATES

• Done

• Error

• Cont
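The three inputs and three states listed above can be modelled in a few lines of plain Scala. This is a simplified, synchronous sketch and not Play's actual API (Play's iteratees are asynchronous and built on `Future`). The names `IterateeSketch`, `sum`, `enumerate` and `run` are assumptions made for illustration.

```scala
object IterateeSketch {
  // The three kinds of input an iteratee can receive.
  sealed trait Input[+E]
  final case class El[+E](e: E) extends Input[E] // one element of data
  case object Empty extends Input[Nothing]       // nothing available right now
  case object EOF extends Input[Nothing]         // the stream is finished

  // The three states an iteratee can be in.
  sealed trait Step[E, A]
  final case class Done[E, A](result: A, remaining: Input[E]) extends Step[E, A]
  final case class Error[E, A](msg: String, input: Input[E]) extends Step[E, A]
  final case class Cont[E, A](k: Input[E] => Step[E, A]) extends Step[E, A]

  // An iteratee that sums integers until it sees EOF.
  def sum(acc: Int = 0): Step[Int, Int] = Cont[Int, Int] {
    case El(n) => sum(acc + n)
    case Empty => sum(acc)
    case EOF   => Done[Int, Int](acc, EOF)
  }

  // A tiny "enumerator": pushes elements while the iteratee still wants input.
  def enumerate[E, A](data: List[E], step: Step[E, A]): Step[E, A] =
    (data, step) match {
      case (head :: tail, Cont(k)) => enumerate[E, A](tail, k(El(head)))
      case _                       => step
    }

  // Signals EOF if needed and extracts the result.
  def run[E, A](step: Step[E, A]): Either[String, A] = step match {
    case Done(a, _)    => Right(a)
    case Error(msg, _) => Left(msg)
    case Cont(k) => k(EOF) match {
      case Done(a, _)    => Right(a)
      case Error(msg, _) => Left(msg)
      case Cont(_)       => Left("iteratee did not finish after EOF")
    }
  }

  def main(args: Array[String]): Unit =
    println(run(enumerate(List(1, 2, 3, 4), sum()))) // Right(10)
}
```

The iteratee never pulls data itself. It only describes what to do with the next input, which is what lets the producer decide when and how fast to feed it.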


AN ITERATEE TO MANAGE THE STREAM


ITERATEE COMPOSITION

PURE FUNCTIONAL

TYPE SAFE

COMPOSITION
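To make the composition idea concrete, here is a reduced sketch (again plain Scala, not Play's API) where two small iteratees are sequenced with `flatMap` and `map`, so a `for` comprehension can combine them. It is deliberately simplified: no `Empty` input, no `Error` state and no leftover input. All names (`CompositionSketch`, `first`, `sum`, `run`) are hypothetical.

```scala
object CompositionSketch {
  sealed trait Input[+E]
  final case class El[+E](e: E) extends Input[E]
  case object EOF extends Input[Nothing]

  sealed trait Step[E, A] {
    // Sequencing: once this iteratee is done, continue with the one built from its result.
    def flatMap[B](f: A => Step[E, B]): Step[E, B] = this match {
      case Done(a) => f(a)
      case Cont(k) => Cont[E, B](in => k(in).flatMap(f))
    }
    def map[B](f: A => B): Step[E, B] = flatMap(a => Done[E, B](f(a)))
  }
  final case class Done[E, A](result: A) extends Step[E, A]
  final case class Cont[E, A](k: Input[E] => Step[E, A]) extends Step[E, A]

  // Consumes exactly one element (or none if the stream is already finished).
  def first[E]: Step[E, Option[E]] = Cont[E, Option[E]] {
    case El(e) => Done[E, Option[E]](Some(e))
    case EOF   => Done[E, Option[E]](None)
  }

  // Sums every remaining element until EOF.
  def sum(acc: Int): Step[Int, Int] = Cont[Int, Int] {
    case El(n) => sum(acc + n)
    case EOF   => Done[Int, Int](acc)
  }

  // Composition: read a leading element, then sum the rest.
  val headerThenSum: Step[Int, (Option[Int], Int)] =
    for { h <- first[Int]; s <- sum(0) } yield (h, s)

  // Feeds the data, then signals EOF once.
  def run[E, A](data: List[E], step: Step[E, A]): Option[A] = {
    val afterData = data.foldLeft(step) {
      case (Cont(k), e) => k(El(e))
      case (done, _)    => done
    }
    afterData match {
      case Done(a) => Some(a)
      case Cont(k) => k(EOF) match {
        case Done(a) => Some(a)
        case Cont(_) => None
      }
    }
  }

  def main(args: Array[String]): Unit =
    println(run(List(7, 1, 2, 3), headerThenSum)) // Some((Some(7),6))
}
```

The composed value is itself just another iteratee, and the compiler checks that the element and result types line up.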

SO WE WILL JUST MANAGE THE BODY STREAM

JUST DO NOT REWRITE HTTP PARSER

CREATE TEMP FILE

Built in on play with


BODY PARSERS

REQUEST HEADERS -> ITERATEE[ARRAY[BYTE], EITHER[SIMPLERESULT, ?]]

GET FILE AND CALCULATE HASH FROM CHUNK
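The core of a chunk-by-chunk hash can be sketched with the JDK's `MessageDigest` alone. The demo code is not part of this text, so the following is an assumption-laden illustration: SHA-1 is picked arbitrarily, an `Iterator` stands in for the request body stream, and `ChunkHash.sha1Hex` is a hypothetical name. In Play, the same update step would typically be the function passed to something like `Iteratee.fold`.

```scala
import java.security.MessageDigest

object ChunkHash {
  // Updates the digest as each chunk arrives, so the whole file
  // never has to be held in memory at once.
  def sha1Hex(chunks: Iterator[Array[Byte]]): String = {
    val digest = MessageDigest.getInstance("SHA-1") // algorithm is an assumption
    chunks.foreach(chunk => digest.update(chunk))
    digest.digest().map(b => f"${b & 0xff}%02x").mkString
  }

  def main(args: Array[String]): Unit = {
    // Two chunks that together spell "abc".
    val chunks = Iterator("a".getBytes("UTF-8"), "bc".getBytes("UTF-8"))
    println(sha1Hex(chunks)) // a9993e364706816aba3e25717850c26c9cd0d89d
  }
}
```

The result is the same as hashing the full content in one go, regardless of where the chunk boundaries fall.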

DEMO

I WANT TO USE FORMS

SO PARTHANDLER ALLOWS YOU TO COMPOSE A BODYPARSER

ITERATEE <3 COMPOSITION

NOW WE CAN ACCESS STREAMS

DATA STORAGE

I HATE FILE SYSTEM


DO NOT USE THE FILE SYSTEM AS A DATASTORE

File systems are POSIX compliant

• POSIX is ACID

• POSIX is powerful but is a bottleneck

• File System is the nightmare of ops

• File System creates coupling (host provider/OS/language)

• SPOF-free multi-tenant File System is a unicorn

STORE FILES IN CHUNKS INSIDE A K/V STORE

• Key = hash of the chunk

STORE THE LIST OF CHUNKS AS METADATA

PROCESS

File is posted by the browser

Stream is normalized into chunks

Each chunk is stored asynchronously in the K/V store

The list of chunk hashes is stored with the file ID

No overhead for the data store
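The storage process above can be sketched with two in-memory maps standing in for the K/V store. This is a hypothetical model, not the demo's code: it is synchronous, it assumes the chunks already arrive normalized, it uses SHA-1 as the chunk key, and the names `ChunkStoreSketch`, `store` and `load` are made up for illustration. A real setup would replace the maps with asynchronous calls to a store such as Riak.

```scala
import java.security.MessageDigest
import scala.collection.mutable

object ChunkStoreSketch {
  private def sha1Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-1").digest(bytes).map(b => f"${b & 0xff}%02x").mkString

  // In-memory stand-ins for two K/V buckets (hypothetical).
  val chunks = mutable.Map.empty[String, Array[Byte]]    // chunk hash -> chunk bytes
  val files  = mutable.Map.empty[String, Vector[String]] // file id -> ordered chunk hashes

  // Stores each chunk under its hash, then the ordered hash list under the file id.
  def store(fileId: String, data: Iterator[Array[Byte]]): Unit = {
    val hashes = data.map { chunk =>
      val key = sha1Hex(chunk)
      chunks.getOrElseUpdate(key, chunk) // identical chunks are stored only once
      key
    }.toVector
    files(fileId) = hashes
  }

  // Rebuilds a file by looking up its chunks in order.
  def load(fileId: String): Option[Array[Byte]] =
    files.get(fileId).map(hashes => Array.concat[Byte](hashes.map(h => chunks(h)): _*))

  def main(args: Array[String]): Unit = {
    store("f1", Iterator("ab", "cd", "ab").map(_.getBytes("UTF-8")))
    println(chunks.size)                                          // 2
    println(load("f1").map(bytes => new String(bytes, "UTF-8"))) // Some(abcdab)
  }
}
```

Because the key is the hash of the content, a repeated chunk costs one extra entry in the metadata list and no extra space in the chunk bucket.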

NEW DEMO

• Play 2 + iteratee

• Datastore is Riak

ARE YOU CONVINCED TO USE STREAMS?

DATA STREAMING PROGRAMMING IS NOW TRENDING

THX TO @CLEMENTD – CLEMENT DELAFARGUE

I’m @waxzce on twitter

I’m the CEO of

A PaaS provider, give it a try ;-)

THX FOR LISTENING & QUESTIONS TIME

Coupon for Clever Cloud trial :

berlin_scala_days
