Anatomy of in-memory processing in Spark
A deep-dive into custom memory management in Spark
https://github.com/shashankgowdal/introduction_to_dataset
● Shashank L
● Big data consultant and trainer at datamantra.io
● www.shashankgowda.com
Agenda
● Era of in-memory processing
● Big data frameworks on JVM
● JVM memory model
● Custom memory management
● Allocation
● Serialization
● Processing
● Benefits of these on Spark
Era of in-memory processing
● After Spark, in-memory processing has become a de facto standard for big data workloads
● Advancements in hardware are pushing more frameworks in that direction
● Using memory efficiently in a big data workload is a challenging task
● Memory management is coupled with, and depends upon, the runtime of the framework
Why the JVM is a prominent runtime for big data workloads
● Managed runtime
● Portable
● Hadoop was on JVM
● Rich ecosystem
Big data frameworks on JVM
● Many frameworks today run on the JVM
○ Spark
○ Flink
○ Hadoop
○ etc.
● Organising data in memory
○ In-memory processing
○ In-memory caching of intermediate results
● Memory management influences
○ Resource efficiency
○ Performance
Straight-forward approach
● JVM memory model approach
● Store a collection of objects and perform any processing on the collection
● Advantages
○ Eases the development cycle
○ Built-in safety checks before modifying any memory
○ Reduces complexity
○ JVM built-in GC
JVM memory model - Disadvantages
● Predicting memory consumption is hard
○ If it fails, an OutOfMemory error kills the JVM
● High garbage collection overhead
○ Easily 50% of the time spent in GC
● Objects have space overhead
○ JVM objects don't take the amount of memory we think they do
Java object overhead
● Consider the string “abcd” as a JVM object. At first glance it should take up 4 bytes (one per character) of memory.
● In reality, the String object header, the reference to the backing character array, that array's own header and length field, and alignment padding push the footprint to roughly 48 bytes on a typical 64-bit JVM.
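The gap between payload and footprint is easy to see from the payload side; a minimal sketch (the overhead figures in the comments are typical for a 64-bit HotSpot JVM, not guaranteed):

```java
import java.nio.charset.StandardCharsets;

public class StringOverhead {
    public static void main(String[] args) {
        // The raw payload of "abcd" is just 4 bytes.
        byte[] payload = "abcd".getBytes(StandardCharsets.US_ASCII);
        System.out.println("payload bytes: " + payload.length); // payload bytes: 4
        // As a JVM object, the same string additionally carries:
        //   - an object header on the String itself (12-16 bytes)
        //   - a reference to the backing character array
        //   - that array's own header plus length field
        //   - padding up to an 8-byte boundary
        // so the in-memory footprint is several times the payload size.
    }
}
```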
Garbage collection challenges
● Many big data workloads create objects in a way that is unfriendly to regular Java GC
● Young-generation garbage collection is frequent
● Objects created in big data workloads tend to live and die in the young generation itself because they are used only a few times
Generality has a cost, so semantics and schema should be used to exploit specificity instead
Custom memory management
● Allocation - Allocate a fixed number of memory segments upfront
● Serialization - Data objects are serialized into memory segments
● Processing - Implement algorithms on the binary representation
Allocation
Managing memory on our own
● sun.misc.Unsafe
● Directly manipulates memory without safety checks (hence, it's unsafe)
● This API is used to build off-heap data structures in Spark
sun.misc.Unsafe
● Unsafe is one of the gateways to low-level programming in Java
● Exposes C-style memory access
○ Explicit allocation, deallocation, pointer arithmetic
● Unsafe methods are JIT intrinsics
Hands-on
● DirectIntArray
● MemoryCorruption
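A hypothetical sketch of what the DirectIntArray exercise might look like: an int array backed by off-heap memory through sun.misc.Unsafe. The class name matches the hands-on item above, but the implementation here is illustrative, not the actual exercise code:

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class DirectIntArray implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();
    private final long address;

    public DirectIntArray(long size) {
        // Explicit allocation: no bounds checks, invisible to the GC.
        this.address = UNSAFE.allocateMemory(size * Integer.BYTES);
    }

    public void set(long index, int value) {
        // Pointer arithmetic: base address + index * element size.
        UNSAFE.putInt(address + index * Integer.BYTES, value);
    }

    public int get(long index) {
        return UNSAFE.getInt(address + index * Integer.BYTES);
    }

    @Override
    public void close() {
        // Explicit deallocation -- forgetting this leaks memory.
        UNSAFE.freeMemory(address);
    }

    private static Unsafe loadUnsafe() {
        try {
            // Unsafe.getUnsafe() rejects application code, so the usual
            // workaround is reflection on the singleton field.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        try (DirectIntArray arr = new DirectIntArray(4)) {
            arr.set(0, 42);
            arr.set(3, 7);
            System.out.println(arr.get(0) + " " + arr.get(3)); // 42 7
        }
    }
}
```

Passing an index past the allocated size here would silently corrupt memory or crash the JVM, which is exactly what the MemoryCorruption exercise demonstrates.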
Custom memory management in Spark
On heap
● Stores data inside an array of type long
● Capable of storing up to 16GB at once (2^31 longs × 8 bytes)
● Bytes are encoded into longs and stored here
Off heap
● Allocates memory assigned to the JVM process outside the heap
● Uses the Unsafe API
● Stores bytes directly
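The on-heap mode above encodes bytes into a long[]; Spark actually does this through Unsafe with a base offset into the array, so the explicit bit-twiddling below is only a simplified illustration of the same byte-in-long encoding:

```java
public class LongArrayEncoding {
    // A long[] can hold up to ~2^31 elements of 8 bytes each,
    // i.e. about 16GB in a single on-heap allocation -- hence
    // the 16GB limit mentioned above.
    static void putByte(long[] data, int byteIndex, byte value) {
        int word = byteIndex >>> 3;        // which long holds the byte
        int shift = (byteIndex & 7) << 3;  // which byte within that long
        long mask = 0xFFL << shift;
        data[word] = (data[word] & ~mask) | ((value & 0xFFL) << shift);
    }

    static byte getByte(long[] data, int byteIndex) {
        int word = byteIndex >>> 3;
        int shift = (byteIndex & 7) << 3;
        return (byte) (data[word] >>> shift);
    }

    public static void main(String[] args) {
        long[] segment = new long[2]; // a tiny 16-byte "memory segment"
        putByte(segment, 0, (byte) 'd');
        putByte(segment, 9, (byte) 'm');
        System.out.println((char) getByte(segment, 0)); // d
        System.out.println((char) getByte(segment, 9)); // m
    }
}
```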
Encoding memory addresses
● Off heap: addresses are raw memory pointers
● On heap: addresses are (base object, offset) pairs
● Spark uses its own page table abstraction to enable a more compact encoding of on-heap addresses
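Spark's TaskMemoryManager packs a page number and an in-page offset into a single 64-bit address: 13 bits of page number (up to 8192 pages) and 51 bits of offset. A sketch of that encoding:

```java
public class PageAddressEncoding {
    // 13-bit page number + 51-bit in-page offset, as in Spark's
    // TaskMemoryManager page table.
    static final int PAGE_NUMBER_BITS = 13;
    static final int OFFSET_BITS = 64 - PAGE_NUMBER_BITS; // 51
    static final long MASK_OFFSET = (1L << OFFSET_BITS) - 1;

    static long encode(int pageNumber, long offsetInPage) {
        return ((long) pageNumber << OFFSET_BITS) | (offsetInPage & MASK_OFFSET);
    }

    static int decodePageNumber(long address) {
        return (int) (address >>> OFFSET_BITS);
    }

    static long decodeOffset(long address) {
        return address & MASK_OFFSET;
    }

    public static void main(String[] args) {
        long addr = encode(5, 1024);
        System.out.println(decodePageNumber(addr)); // 5
        System.out.println(decodeOffset(addr));     // 1024
    }
}
```

The page table maps the 13-bit page number back to the base object on heap (or the raw pointer off heap), so the same 64-bit address works in both modes.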
Serialization
Data Structures prominently used in Big data
● Sequence
● Key-Value pair
Java object-based row notation
● 3 fields of type (int, string, string) with values (123, “data”, “mantra”)
➔ 5+ objects
➔ High space overhead
➔ Expensive hashCode()
Tungsten’s unsafe row format
● Bitset for tracking null values
● Every column appears in the fixed-length value region
○ Fixed-length values are inlined
○ For variable-length values, we store a relative offset into the variable-length data section
● Rows are always 8-byte aligned
● Equality comparison can be done on the raw bytes
Example of unsafe row
null tracking bitmap
(123, “data”, “mantra”)
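A simplified sketch of this layout for (123, “data”, “mantra”), built with a plain ByteBuffer rather than Unsafe. The field order and the (offset, length) packing follow the UnsafeRow description above, but this is not byte-for-byte identical to Spark's implementation:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Layout: [ 8-byte null bitmap | 3 x 8-byte fixed slots | variable data ]
public class UnsafeRowSketch {
    public static void main(String[] args) {
        byte[] s1 = "data".getBytes(StandardCharsets.UTF_8);   // 4 bytes
        byte[] s2 = "mantra".getBytes(StandardCharsets.UTF_8); // 6 bytes
        int fixedLen = 8 + 3 * 8;                  // bitmap + 3 slots
        int varLen = align8(s1.length) + align8(s2.length);
        ByteBuffer row = ByteBuffer.allocate(fixedLen + varLen)
                                   .order(ByteOrder.LITTLE_ENDIAN);

        row.putLong(0, 0L);      // null bitmap: no field is null
        row.putLong(8, 123L);    // field 0: int inlined in its slot

        int off = fixedLen;      // field 1: "data" -> (offset, length)
        row.putLong(16, ((long) off << 32) | s1.length);
        put(row, off, s1);

        off += align8(s1.length); // field 2: "mantra"
        row.putLong(24, ((long) off << 32) | s2.length);
        put(row, off, s2);

        // Reading field 1 back purely from the raw bytes:
        long meta = row.getLong(16);
        int o = (int) (meta >>> 32), len = (int) meta;
        byte[] out = new byte[len];
        row.position(o);
        row.get(out);
        System.out.println(new String(out, StandardCharsets.UTF_8)); // data
    }

    // Rows and variable-length values are kept 8-byte aligned.
    static int align8(int n) { return (n + 7) & ~7; }

    static void put(ByteBuffer buf, int pos, byte[] bytes) {
        for (int i = 0; i < bytes.length; i++) buf.put(pos + i, bytes[i]);
    }
}
```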
Hands-on
● UnsafeRowCreator
java.util.HashMap
● Huge object overheads
● Poor memory locality
● Size estimation is hard
Tungsten BytesToBytesMap
● Low overheads
● Good memory locality, especially for scans
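A toy illustration of the idea (not Spark's actual BytesToBytesMap, which stores raw bytes in managed memory pages): records live in one contiguous byte region and a flat offset table is probed by hash, so lookups compare raw bytes and scans walk sequential memory:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BytesToBytesSketch {
    private final byte[] data = new byte[1 << 16]; // contiguous records
    private int dataEnd = 0;
    private final int[] table; // record offset + 1; 0 means empty slot

    public BytesToBytesSketch(int capacity) {
        this.table = new int[capacity];
    }

    public void put(byte[] key, byte[] value) {
        int slot = findSlot(key);
        if (table[slot] == 0) {          // new key: append one record
            table[slot] = dataEnd + 1;
            // Record layout: [keyLen][key bytes][valLen][value bytes]
            appendInt(key.length);
            append(key);
            appendInt(value.length);
            append(value);
        }
        // (the real map updates fixed-size values in place; omitted here)
    }

    public byte[] get(byte[] key) {
        int slot = findSlot(key);
        if (table[slot] == 0) return null;
        int off = table[slot] - 1;
        int klen = readInt(off);
        int vlen = readInt(off + 4 + klen);
        return Arrays.copyOfRange(data, off + 8 + klen, off + 8 + klen + vlen);
    }

    // Linear probing; key equality is a raw byte comparison.
    private int findSlot(byte[] key) {
        int slot = Math.floorMod(Arrays.hashCode(key), table.length);
        while (table[slot] != 0 && !matches(table[slot] - 1, key)) {
            slot = (slot + 1) % table.length;
        }
        return slot;
    }

    private boolean matches(int off, byte[] key) {
        if (readInt(off) != key.length) return false;
        for (int i = 0; i < key.length; i++)
            if (data[off + 4 + i] != key[i]) return false;
        return true;
    }

    private void append(byte[] bytes) {
        System.arraycopy(bytes, 0, data, dataEnd, bytes.length);
        dataEnd += bytes.length;
    }

    private void appendInt(int v) {
        for (int i = 0; i < 4; i++) data[dataEnd++] = (byte) (v >>> (8 * i));
    }

    private int readInt(int off) {
        int v = 0;
        for (int i = 0; i < 4; i++) v |= (data[off + i] & 0xFF) << (8 * i);
        return v;
    }

    public static void main(String[] args) {
        BytesToBytesSketch map = new BytesToBytesSketch(64);
        map.put("data".getBytes(StandardCharsets.UTF_8),
                "mantra".getBytes(StandardCharsets.UTF_8));
        byte[] v = map.get("data".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(v, StandardCharsets.UTF_8)); // mantra
    }
}
```

Compare this with java.util.HashMap: no per-entry Entry objects, no boxed keys, and the byte region's size is known exactly rather than estimated.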
Processing
Many big data workloads are now compute-bound
● Network optimizations can only reduce job completion time by a median of at most 2%
● Optimizing or eliminating disk accesses can only reduce job completion time by a median of at most 19%
[1]
Hardware trends
Why is CPU the new bottleneck?
● Hardware has improved
○ 1Gbps to 10Gbps network links
○ High-bandwidth SSDs or striped HDD arrays
● Spark IO has been optimized
○ Many workloads now avoid significant disk IO by pruning data that is not needed in a given job
○ New shuffle and network layer implementations
● Data formats have improved
○ Parquet and other binary data formats
● Serialization and hashing are the CPU-bound bottlenecks
Code generation
● Generic evaluation of expression logic is very expensive on the JVM
○ Virtual function calls
○ Branches based on expression type
○ Object creation due to primitive boxing
○ Memory consumption by boxed primitive objects
● Instead, generate code that directly applies the expression logic to the serialized data
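A contrived illustration of the difference (not actual Catalyst-generated code): the generic path boxes primitives and dispatches through an interface on every evaluation, while the specialized path is the tight primitive loop that generated bytecode effectively becomes:

```java
import java.util.function.BiFunction;

public class CodegenIllustration {
    // Generic interpreter-style evaluation: Object boxing + virtual call
    // on every invocation -- the costs listed in the slide above.
    static Object evalGeneric(BiFunction<Object, Object, Object> op,
                              Object left, Object right) {
        return op.apply(left, right);
    }

    public static void main(String[] args) {
        int n = 1000;
        // Generic path: each iteration boxes two Integers and the result.
        long genericSum = 0;
        for (int i = 0; i < n; i++) {
            genericSum += (Integer) evalGeneric(
                (a, b) -> (Integer) a + (Integer) b, i, 1);
        }
        // Specialized path: no boxing, no virtual dispatch, no branches
        // on expression type -- the shape of runtime-generated code.
        long generatedSum = 0;
        for (int i = 0; i < n; i++) {
            generatedSum += i + 1;
        }
        System.out.println(genericSum == generatedSum); // true
    }
}
```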
Which Spark APIs benefit?
Spark DataFrames
Spark SQL
RDD
Why do only DataFrames benefit?
[Diagram: Python DF, Java/Scala DF, and R DF all compile to a logical plan, which the Catalyst optimizer turns into physical execution; Spark SQL goes through the same Catalyst path, while the RDD API maps directly to physical execution]
Runtime bytecode generation
[Diagram: DataFrame code is translated into Catalyst expressions, which code generation compiles into low-level bytecode]
Aggregation optimization in DataFrame
[Diagram: input rows are scanned, the grouping key is projected and converted into an UnsafeRow, which is used to probe the BytesToBytesMap; the matching aggregation buffer is updated in place, and results are produced by iterating over the map]
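The aggregation flow described here (scan, project the grouping key, probe, update in place, iterate) can be sketched in plain Java, with a HashMap standing in for the BytesToBytesMap (a hypothetical stand-in; the real map keeps raw bytes in managed memory):

```java
import java.util.HashMap;
import java.util.Map;

public class HashAggregationSketch {
    public static void main(String[] args) {
        String[][] rows = { {"a", "1"}, {"b", "2"}, {"a", "3"} };
        // long[1] is the mutable aggregation buffer: it is updated in
        // place on each matching row, never re-inserted into the map.
        Map<String, long[]> map = new HashMap<>();
        for (String[] row : rows) {                // Scan input rows
            String key = row[0];                   // Project grouping key
            long[] buf = map.computeIfAbsent(key, k -> new long[1]); // Probe
            buf[0] += Long.parseLong(row[1]);      // Update in place
        }
        // Iterate over the map to produce the aggregated output.
        map.forEach((k, v) -> System.out.println(k + " -> " + v[0]));
        // prints "a -> 4" and "b -> 2" (iteration order unspecified)
    }
}
```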
Performance results with optimizations (Run time)
Performance results with optimizations (GC Time)
References
● [1] https://www.eecs.berkeley.edu/~keo/publications/nsdi15-final147.pdf
● https://databricks.com/blog/2015/04/28/project-tungsten-bringing-spark-closer-to-bare-metal.html
● https://spark-summit.org/2015/events/deep-dive-into-project-tungsten-bringing-spark-closer-to-bare-metal/
● https://gist.github.com/raphw/7935844
● http://www.bigsynapse.com/addressing-big-data-performance