Composing and Scaling Data Platforms Rahul Kumar

Composing and Scaling Data Platforms-2015



Talk Highlights

Data Representation

Architecture

Parallelism

As software engineers we are inevitably affected by the tools we surround ourselves with. Languages, frameworks and even processes all act to shape the software we build.

Likewise databases, which have trodden a very specific path, inevitably affect the way we treat mutability and share state in our applications.


Today’s data platforms range greatly in complexity, from simple caching layers or polyglot persistence right through to wholly integrated data pipelines.

There are many paths. They go to many different places.

So the aim for this talk is to explain how and why some of these popular approaches work.

This talk is based on Ben Stopford’s presentation:

http://www.benstopford.com/2015/04/28/elements-of-scale-composing-and-scaling-data-platforms/


Computers work best with sequential workloads

When we’re dealing with data, we’re really just arranging locality.

Locality to the CPU. Locality to the other data we need.


Accessing data sequentially is an important component of this.

Computers are simply good at sequential operations, because sequential operations can be predicted.


Random vs Sequential Addressing

If you’re reading data from disk sequentially, it will be pre-fetched into the disk buffer, the page cache and the different levels of CPU caching.

But it does little to help the addressing of data at random, be it in main memory, on disk or over the network. In fact pre-fetching actually hinders random workloads as the various caches and frontside bus fill with data which is unlikely to be used.
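The two access patterns can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 8-byte record layout, the file size and the function names (write_records, sum_sequential, sum_random, demo) are made up for the example and do not come from the talk. Both functions read exactly the same records; only the order differs, which is the thing pre-fetching is sensitive to. Any timing difference depends heavily on hardware, file size and whether the file is already in the page cache, so no numbers are claimed here.

```python
import os
import random
import struct
import tempfile

# Assumed layout for the example: one 8-byte signed integer per record.
RECORD = struct.Struct("<q")


def write_records(path, count):
    """Write the integers 0..count-1 as fixed-size records."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i))


def sum_sequential(path):
    """Read the file front to back in large chunks (the prefetch-friendly pattern)."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size * 4096)
            if not chunk:
                break
            for (value,) in RECORD.iter_unpack(chunk):
                total += value
    return total


def sum_random(path, count, seed=0):
    """Visit every record once, in shuffled order, seeking before each read."""
    order = list(range(count))
    random.Random(seed).shuffle(order)
    total = 0
    with open(path, "rb") as f:
        for index in order:
            f.seek(index * RECORD.size)
            (value,) = RECORD.unpack(f.read(RECORD.size))
            total += value
    return total


def demo(count=10_000):
    """Build a temporary file and return both sums, which should be equal."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, count)
        return sum_sequential(path), sum_random(path, count)
    finally:
        os.remove(path)
```

Wrapping the two calls in a timer such as time.perf_counter is one way to compare them on a given machine; a file much larger than the page cache is more likely to show the effect the slide describes.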


Streaming data sequentially from disk can actually outperform randomly addressed main memory. So disk may not always be quite the tortoise we think it is, at least not if we can arrange sequential access.


We want to keep writes and reads sequential, as this works well with the hardware.

We can append writes to the end of the file efficiently. We can read by scanning the file in its entirety.

Any processing we wish to do can happen as the data streams through the CPU.

We might filter, aggregate or even do something more complex.
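A minimal sketch of this append-and-scan pattern, assuming a JSON-lines file as the log format. The event fields (user, amount) and the function names (append_event, scan, total_by_key, demo) are hypothetical and chosen only for illustration. Writes go to the end of the file, reads scan it from the start, and the filter and the aggregation both happen in a single pass as each record streams by.

```python
import json
import os
import tempfile


def append_event(path, event):
    """Append one event as a JSON line; existing bytes are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def scan(path):
    """Yield events in the order they were written, one line at a time."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


def total_by_key(path, key, predicate):
    """Filter and aggregate in one pass while the events stream through."""
    totals = {}
    for event in scan(path):
        if predicate(event):
            totals[event[key]] = totals.get(event[key], 0) + event["amount"]
    return totals


def demo():
    """Write four sample events, then total the amounts of 5 or more per user."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    try:
        for user, amount in [("ann", 5), ("bob", 7), ("ann", 20), ("bob", 1)]:
            append_event(path, {"user": user, "amount": amount})
        return total_by_key(path, "user", lambda e: e["amount"] >= 5)
    finally:
        os.remove(path)
```

Because scan is a generator, only one record is held in memory at a time, so the same code works on a log far larger than RAM. The cost is that every query reads the whole file.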


Parallelism


Architecture


Thank You