In-Memory Computing
Srinath Perera, Director, Research
WSO2 Inc.
Performance Numbers (based on Jeff Dean's numbers)

Cost in memory operations, with time rescaled so that one memory access takes one second:
• L1 cache reference: 0.05 (1/20th of a second)
• Main memory reference: 1 (1 second)
• Send 2 KB over a 1 Gbps network: 200 (3 minutes)
• Read 1 MB sequentially from memory: 2,500 (41 minutes)
• Disk seek: 1*10^5 (27 hours)
• Read 1 MB sequentially from disk: 2*10^5 (2 days)
• Send packet CA -> Netherlands -> CA: 1.5*10^6 (17 days)

Operation speeds (MB/sec):
• Hadoop select: 3
• Terasort benchmark: 18
• Complex Hadoop query: 0.2
• CEP: 60
• Complex CEP: 2.5
• SSD: 300-500
• Disk: 50-100
Most Big Data Apps are Latency-bound!
Often, your application wastes CPU cycles waiting for data to arrive.
Latency Lags Bandwidth
• Observation from Prof. Patterson's keynote in 2004
• Bandwidth improves, but latency does not
• The same holds today, and the gap is widening with new systems
Handling Speed Differences in Memory Hierarchy
1. Caching – e.g. processor caches, file cache, disk cache, permission cache
2. Replication – e.g. RAID, Content Distribution Networks (CDNs), web caches
3. Prefetching – predict what data will be needed and prefetch it; trades bandwidth for latency – e.g. disk caches, Google Earth
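As a concrete illustration of technique 1, here is a minimal LRU cache in Java (an illustrative sketch, not any particular product's API): repeated reads are served from memory, and the least recently used entry is evicted once capacity is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: a LinkedHashMap in access order evicts the least
// recently used entry once capacity is exceeded. As noted above, this
// helps only while the working set fits within `capacity`.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // true = access order (LRU), not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

The same access-order eviction idea underlies processor caches and file caches, just implemented in hardware or the OS rather than a library map.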
The three techniques above do not always work
• Limitations
– Caching works only if the working set is small
– Prefetching works only when access patterns are predictable
– Replication is expensive and limited by the capacity of the receiving machines
• Let us assume you are reading and filtering 10 GB of data (at about 6 bytes per record, roughly 1.7 billion records)
– About 3 minutes to read the data from disk
– 35 ms to filter 10 million records on my laptop => about 6 seconds to process all the data
– Keeping the data in memory can therefore be about 30X faster
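The arithmetic above can be checked with a short calculation. The inputs are the slide's assumptions: a sequential disk read rate of roughly 57 MB/s (10 GB in about 3 minutes) and 35 ms to filter 10 million records in memory.

```java
// Back-of-the-envelope check of the 10 GB filtering example.
// Assumed inputs: ~57 MB/s sequential disk read and 35 ms to filter
// 10 million records in memory.
class FilterEstimate {
    // Seconds to read `gigabytes` sequentially at `mbPerSec`.
    static double diskReadSeconds(double gigabytes, double mbPerSec) {
        return gigabytes * 1024.0 / mbPerSec;
    }

    // Seconds to filter `records` when 10 million records take `msPer10M` ms.
    static double filterSeconds(double records, double msPer10M) {
        return (records / 10_000_000.0) * (msPer10M / 1000.0);
    }
}
```

With 10 GB and 1.7 billion records, the disk read takes about 180 seconds while the in-memory filter takes about 6, giving the roughly 30X figure cited above.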
Data Access Patterns in Big Data Applications
• Read from disk, process once (basic analytics)
– Data can be prefetched; batch loading is only about 100 times faster
– OK if processing time > data read time
• Read from disk, process iteratively (machine learning algorithms, e.g. KMeans)
– Need to load the data from disk once and process it repeatedly (e.g. Spark supports this)
• Interactive (OLAP)
– Queries are random, and data may be scattered; once a query starts, the data can be loaded into memory and processed
• Random access (e.g. graph processing)
– Very hard to optimize
• Realtime access
– Process data as it arrives
In-Memory Computing
Four Myths
• Myths
– Too expensive: a 1 TB RAM cluster costs about $20-40K (roughly $20-40 per GB)
– It is not durable
– Flash is fast enough
– It is only about in-memory databases
• From Nikita Ivanov's post
– http://gridgaintech.wordpress.com/2013/09/18/four-myths-of-in-memory-computing/
Let us look at each Big data access pattern and where In-Memory
Computing can make a difference
Access Pattern 1: Read from Disk, Process Once
• If Tp = 35 ms vs. Td = 1.2 sec for 60 MB chunks, keeping all data in memory gives about a 30X speedup
• However, this benefit shrinks if the computation is more complex (e.g. sort)
Access Pattern 2: Read from Disk, iteratively Process
• Very common pattern for machine learning algorithms (e.g. KMeans)
• In this case, the advantages are greater
– If we cannot hold the data fully in memory, we must offload it and then read it again
– The cost of repeatedly loading and processing is then very high, and in-memory computing is much faster
• Spark lets you load the data fully into memory and process it there
Spark
• New programming model built on functional programming concepts
• Can be much faster for iterative usecases
• Has a complete stack of products

file = spark.textFile("hdfs://...")
file.flatMap(line => line.split(" "))
    .map(word => (word, 1))
    .reduceByKey(_ + _)
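The same flatMap -> map -> reduceByKey pipeline can be tried locally with Java streams. This is a plain-Java sketch of the functional model only; no Spark cluster or RDD is involved.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Word count in the same functional style as the Spark snippet above,
// but over an in-memory Java stream instead of an RDD.
class WordCount {
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))   // flatMap
                .collect(Collectors.groupingBy(w -> w,             // group by word
                        Collectors.counting()));                   // reduceByKey(_ + _)
    }
}
```

Spark applies the same pipeline, but partitions the stream across a cluster and can keep the intermediate data in memory between iterations.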
Access Pattern 3: Interactive Queries
• Need to be responsive: < 10 sec
• Harder to predict what data is needed
• Queries tend to be simpler
• Can be made faster by a RAM cloud
– SAP HANA
– VoltDB
• With smaller queries, disk may still be OK; Apache Drill is an alternative
VoltDB Story
• The VoltDB team (Michael Stonebraker et al.) observed that 92% of the work in a database is related to disk
• By building a completely in-memory database cluster, they made it 20X faster!
Distributed Cache (e.g. Hazelcast)
• Stores the data partitioned and replicated across many machines
• Used as a cache that spans multiple machines
• Key-value access
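How such a cache spreads data can be sketched in a few lines: hash each key to one of N members, so total capacity is the sum of all members' memory. This is an illustrative toy, not Hazelcast's API; plain in-process maps stand in for cluster nodes, and replication is omitted.

```java
import java.util.HashMap;
import java.util.Map;

// Toy partitioned key-value store: each key hashes to one of N "nodes",
// modeled here as plain in-process maps. A real distributed cache adds
// networking, replication, and rebalancing on top of this idea.
class PartitionedCache {
    private final Map<String, String>[] nodes;

    @SuppressWarnings("unchecked")
    PartitionedCache(int nodeCount) {
        nodes = new Map[nodeCount];
        for (int i = 0; i < nodeCount; i++) {
            nodes[i] = new HashMap<>();
        }
    }

    // Pick the member responsible for this key.
    private Map<String, String> nodeFor(String key) {
        return nodes[Math.floorMod(key.hashCode(), nodes.length)];
    }

    void put(String key, String value) { nodeFor(key).put(key, value); }
    String get(String key) { return nodeFor(key).get(key); }
}
```

Because any member can compute `nodeFor` itself, a client can route each get or put directly to the owning node without a central lookup.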
Access Pattern 4: Random Access
• E.g. graph traversal
• This is the hardest usecase
• In easy cases there is a small working set, and a cache suffices (e.g. checking users against a blacklist); that is not the case for most graph operations, such as traversal
• In the hard cases, in-memory computing is the only real solution
• Can be 1000X faster or more
Access Pattern 5: Realtime Processing
• This is already an in-memory technology, built on tools like complex event processing (e.g. WSO2 CEP) or stream processing (e.g. Apache Storm)
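A toy version of such an operator: a sliding-window average that updates in memory as each event arrives. This is illustrative only; real CEP engines provide far richer windows and query languages.

```java
import java.util.ArrayDeque;

// Sliding-window average: keeps only the last `size` events in memory
// and emits an updated average on every arrival, never touching disk.
class SlidingAverage {
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private final int size;
    private double sum = 0.0;

    SlidingAverage(int size) { this.size = size; }

    // Process one incoming event and return the current window average.
    double onEvent(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > size) {
            sum -= window.removeFirst();
        }
        return sum / window.size();
    }
}
```

Because only the window (not the full stream) is ever held, this style of processing can sustain very high event rates with latencies in milliseconds.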
Faster Access to Data
• In-memory databases (e.g. VoltDB, MemSQL)
– Provide the same SQL interface
– Can be thought of as a fast database
– VoltDB has been shown to be about 20X faster than MySQL
• Distributed cache
– Can be integrated as a large cache
Load Data Set to Memory and Analyze
• Used with interactive and random-access usecases
• Can be up to 1000X faster for some usecases
• Tools
– Spark
– Hazelcast
– SAP HANA
Realtime Processing
• Realtime analytics tools
– CEP (e.g. WSO2 CEP)
– Stream processing (e.g. Apache Storm)
• Can generate results within a few milliseconds to seconds
• Can process tens of thousands to millions of events per second
• Not all algorithms can be implemented
In Memory Computing with WSO2 Platform
Thank You