In Memory Data Grids, Demystified!

Uri Cohen, Head of Product @ [email protected]/uric


Agenda

• Why IMDG?
• Brief History
• How It Works
  – Data model & placement
  – HA and fault tolerance
  – Consistency
  – Internals

Why IMDG?

Today, more than ever, there are many choices when it comes to storing your data

® Copyright 2011 Gigaspaces Ltd. All Rights Reserved


But There Are Many Solutions


Just A Few Years Back

So Why Indeed??

The Need for Speed, In Real Time…

Some Facts

Memory will always be faster than disk (usually by orders of magnitude)

Recent Survey

67%

The share of IT managers who think that real-time analysis is the biggest challenge for big data implementations

40%

• Plan to use in-memory technologies for big data projects
• Only 32% mentioned Hadoop

Stream Processing

Hell, Even Gartner Thinks So

“In memory computing (IMC) … provides transformational opportunities. The execution of certain types of hours-long batch processes can be squeezed into minutes or even seconds … Millions of events can be scanned in a matter of a few tens of millisecond to detect correlations and patterns pointing at emerging opportunities and threats ‘as things happen.’”

And nowadays

HW and SW just make it a whole lot cheaper

Some Common Use Cases

Fast, Transactional Data Access

• Inventory management
• Financial reference data
• Real-time transactional data

Real Time Stream Processing

• Fraud detection
• Click stream analysis
• Real-time analytics
• Continuous calculation

Heavyweight Offline Calculations

• Trade reconciliation
• Pattern analysis and detection
• Number crunching

Caching

• Database offloading
• Content-heavy websites

The Evolution of Data Grids

First There Were Local Caches

Cache: in-process caching of a Key->Value data structure

Distributed Cache: partitioned cache nodes

IMDG: partitioned system of record

IMDG.next()

Good for repetitive-data reads

Limited in capacity

Doesn’t handle write-heavy scenarios

Reads are only part of the latency path

Then Came Distributed Caches


Increased Capacity

Still no support for write-heavy scenarios

Limited to ID-based reads

Reads are only part of the latency path


In Memory Data Grids


Increased capacity

Write scalability

Can serve as system of record with querying & transaction semantics

Still limited in capacity

Latency can come from other parts of your app


How It Works

Data Models


Data Placement – Fixed Hashing

hash(key) % #nodes
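The formula above can be sketched in a few lines. This is a minimal illustration, not any product's implementation: `stable_hash`, `node_for`, and the `order-N` keys are hypothetical names, and MD5 is used only because Python's built-in `hash()` for strings changes between runs.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (assumption: any uniform hash would do)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def node_for(key: str, num_nodes: int) -> int:
    """Fixed hashing: hash(key) % #nodes gives the index of the owning node."""
    return stable_hash(key) % num_nodes

# Place ten hypothetical keys on a three-node grid.
keys = [f"order-{i}" for i in range(10)]
placement = {key: node_for(key, 3) for key in keys}
for key, node in placement.items():
    print(f"{key} -> node {node}")
```

Every client that knows the node count computes the same owner for a key, so no lookup directory is needed.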


Fixed Hashing - HA

hash(key) % #nodes


Fixed Hashing – Scaling

Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
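To see why scaling is the weak spot of fixed hashing, the sketch below counts how many keys change owner when a grid grows from 4 to 5 nodes. The helper names, key names, and node counts are assumptions chosen for illustration.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (assumption: any uniform hash would do)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def moved_fraction(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owner changes under hash(key) % #nodes."""
    moved = sum(
        1 for key in keys
        if stable_hash(key) % old_nodes != stable_hash(key) % new_nodes
    )
    return moved / len(keys)

keys = [f"order-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_nodes=4, new_nodes=5)
print(f"{fraction:.0%} of keys change nodes when growing from 4 to 5 nodes")
```

With a uniform hash, a key keeps its node only when `hash % 4 == hash % 5`, which holds for 4 out of every 20 hash values, so roughly 80% of the data has to be relocated to add a single node.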


Data Placement – Consistent Hashing
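A minimal sketch of a consistent hash ring, under stated assumptions: the `HashRing` class, the node names, and the use of 100 virtual nodes per physical node are illustrative choices, not taken from any specific product. Each node is hashed to many points on a ring, and a key belongs to the first node point at or after its own hash, wrapping around at the end.

```python
import bisect
import hashlib

def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (assumption: any uniform hash would do)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes   # virtual points per physical node
        self._points = []      # sorted ring positions
        self._owner = {}       # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = stable_hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_left(self._points, stable_hash(key))
        if idx == len(self._points):   # wrap around the ring
            idx = 0
        return self._owner[self._points[idx]]

keys = [f"order-{i}" for i in range(2_000)]
ring = HashRing(["A", "B", "C", "D"])
before = {key: ring.node_for(key) for key in keys}
ring.add_node("E")
after = {key: ring.node_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all to the new node")
```

Only the keys that now fall just before one of the new node's points change owner, and they all move to the new node; everything else stays where it was.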











Data Consistency

Since we’re dealing with distributed data, consistency cannot be taken for granted:
• Read after write
• Read after read
• Write-write consistency

Solution 1: Single Master

Solution 2: Read/Write Quorums
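A toy sketch of the quorum idea, assuming the usual rule that with N replicas, a write quorum W and a read quorum R, every read overlaps the latest write when R + W > N. The `QuorumStore` class and the single global version counter are simplifications for illustration; real systems typically use timestamps or vector clocks and pick replicas dynamically. To show the worst case, writes go to the first W replicas and reads come from the last R.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    store: dict = field(default_factory=dict)   # key -> (version, value)

class QuorumStore:
    def __init__(self, n: int, w: int, r: int):
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0   # assumption: one global counter orders all writes

    def is_strict(self) -> bool:
        """True when any read set must intersect any write set."""
        return self.r + self.w > len(self.replicas)

    def write(self, key, value) -> None:
        self.version += 1
        for replica in self.replicas[: self.w]:      # first W replicas
            replica.store[key] = (self.version, value)

    def read(self, key):
        answers = [
            replica.store.get(key, (0, None))
            for replica in self.replicas[-self.r:]   # last R replicas
        ]
        return max(answers, key=lambda answer: answer[0])[1]

strict = QuorumStore(n=3, w=2, r=2)   # 2 + 2 > 3: read and write sets overlap
strict.write("k", "v1")
strict.write("k", "v2")
latest = strict.read("k")

weak = QuorumStore(n=3, w=2, r=1)     # 1 + 2 = 3: no overlap guaranteed
weak.write("k", "v1")
stale = weak.read("k")
print(latest, stale)
```

In the strict configuration the middle replica is in both sets, so the read returns the newest value; in the weak one the single replica that is read never received the write.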

Some More Concerns

• Transactions
• Querying
• Failure detection
• Leader election
• Persistency
• Interoperability

IMDG.next()

Using IMDG for messaging and business logic

IMDG.next()

SSD FTW!

Thank You!

docs.gigaspaces.com
