The principles and foundations of in memory data grids
Uri Cohen, Head of Product @ [email protected]/uric
In-Memory Data Grids, Demystified
Agenda
• Why IMDG?
• Brief History
• How It Works
– Data model & placement
– HA and fault tolerance
– Consistency
– Internals
Why IMDG?
Today, more than ever, there are many choices when it comes to storing your data
® Copyright 2011 Gigaspaces Ltd. All Rights Reserved
But There Are Many Solutions
Just A Few Years Back
So Why Indeed??
The Need for Speed, In Real Time…
Some Facts
Memory will always be faster than disk (usually by orders of magnitude)
Recent Survey
• 67% of IT managers think that real-time analysis is the biggest challenge for big data implementations
• 40% plan to use in-memory technologies for big data projects
• Only 32% mentioned Hadoop
Stream Processing
Hell, Even Gartner Thinks So
“In memory computing (IMC) … provides transformational opportunities. The execution of certain types of hours-long batch processes can be squeezed into minutes or even seconds … Millions of events can be scanned in a matter of a few tens of milliseconds to detect correlations and patterns pointing at emerging opportunities and threats ‘as things happen.’”
And nowadays, HW and SW just make it a whole lot cheaper
Some Common Use Cases
Fast, Transactional Data Access
• Inventory management
• Financial reference data
• Real time transactional data
Real Time Stream Processing
• Fraud Detection
• Click Stream Analysis
• Real time analytics
• Continuous calculation
Heavyweight Offline Calculations
• Trade Reconciliation
• Pattern analysis and detection
• Number crunching
Caching
• Database offloading
• Content heavy websites
The Evolution of Data Grids
First There Were Local Caches
Cache: in-process caching of a Key->Value data structure
Distributed Cache: partitioned cache nodes
IMDG: partitioned system of record
IMDG.next()
Good for repetitive-data reads
Limited in capacity
Doesn’t handle write-heavy scenarios
Reads are only part of the latency path
Then Came Distributed Caches
Increased Capacity
Still no support for write-heavy scenarios
Limited to ID-based reads
Reads are only part of the latency path
In Memory Data Grids
Increased capacity
Write scalability
Can serve as system of record with querying & transaction semantics
Still limited in capacity
Latency can come from other parts of your app
How It Works
Data Models
Data Placement – Fixed Hashing
hash(key) % #nodes
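A minimal Python sketch of the fixed-hashing rule above. The names `stable_hash` and `node_for` are illustrative and not taken from any product. SHA-256 is an assumption, chosen only because Python's built-in `hash()` is randomized per process for strings, so it would place the same key on different nodes in different clients.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Hash a key to a 64-bit integer that is identical across processes and runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def node_for(key: str, num_nodes: int) -> int:
    """Fixed hashing: hash(key) % #nodes gives the index of the owning node."""
    return stable_hash(key) % num_nodes


# Every client computes the same owner for a key without a lookup table.
for key in ("order-1", "order-2", "order-3"):
    print(key, "-> node", node_for(key, 4))
```

The appeal of this scheme is that routing needs no shared state beyond the node count.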
Fixed Hashing - HA
hash(key) % #nodes
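One way high availability can be layered on fixed hashing is sketched below in Python. The slide's diagram is not recoverable, so the layout here is an assumption: each partition keeps one primary copy and one synchronously written backup copy, and the backup is promoted when the primary is lost. `Partition`, `Grid`, and `fail_over` are hypothetical names.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Process-independent 64-bit hash of a key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Partition:
    """One partition with a primary copy and a single backup copy (assumed layout)."""

    def __init__(self) -> None:
        self.primary: dict = {}
        self.backup: dict = {}

    def write(self, key, value) -> None:
        # Synchronous replication: the write lands on both copies.
        self.primary[key] = value
        self.backup[key] = value

    def read(self, key):
        return self.primary[key]

    def fail_over(self) -> None:
        # Simulate losing the primary: promote the backup, then re-seed
        # a fresh backup from the promoted copy.
        self.primary = self.backup
        self.backup = dict(self.primary)


class Grid:
    """Routes each key to a partition with hash(key) % #partitions."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [Partition() for _ in range(num_partitions)]

    def partition_for(self, key: str) -> Partition:
        return self.partitions[stable_hash(key) % len(self.partitions)]

    def write(self, key: str, value) -> None:
        self.partition_for(key).write(key, value)

    def read(self, key: str):
        return self.partition_for(key).read(key)


grid = Grid(num_partitions=4)
grid.write("order-1", "pending")
grid.partition_for("order-1").fail_over()  # the primary copy is lost
print(grid.read("order-1"))                # still readable from the promoted backup
```

Because the number of partitions does not change on failover, the routing rule stays valid and clients only need to learn which copy is now the primary.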
Fixed Hashing – Scaling
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
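The scaling problem with fixed hashing can be measured with a short Python sketch. It counts how many keys change owner when the node count grows from 4 to 5. The key names and the SHA-256 based `stable_hash` helper are assumptions made for illustration.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Process-independent 64-bit hash of a key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def moved_fraction(keys, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owner changes under hash(key) % #nodes."""
    moved = sum(
        1 for k in keys
        if stable_hash(k) % old_nodes != stable_hash(k) % new_nodes
    )
    return moved / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_nodes=4, new_nodes=5)
print(f"4 -> 5 nodes: {fraction:.0%} of keys change owner")
```

With uniformly distributed hashes, a key keeps its owner only when `h % 4 == h % 5`, which holds for 4 out of every 20 hash values, so roughly 80% of the data has to move to add a single node.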
Data Placement – Consistent Hashing
Source: http://www.griddynamics.com/distributed-algorithms-in-nosql-databases/
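A minimal Python sketch of a consistent-hash ring, for comparison with fixed hashing. Nodes and keys are hashed onto the same ring, and a key belongs to the first node point at or after its own hash. The `HashRing` class, the node names, and the use of 100 virtual points per node are assumptions; virtual points are a common refinement that the slides do not mention.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Process-independent 64-bit hash used for both nodes and keys."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, node) positions on the ring
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node is placed at several positions to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._points, (stable_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no nodes")
        # First point clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect_left(self._points, (stable_hash(key), ""))
        return self._points[idx % len(self._points)][1]


keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-e")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all to the new node")
```

Only keys that fall just before one of the new node's points change owner, and they all move to the new node. The rest of the data stays where it was.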
Data Consistency
Since we’re dealing with distributed data, consistency cannot be taken for granted
• Read after write
• Read after read
• Write-write consistency
Solution 1: Single Master
Solution 2: Read/Write Quorums
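The quorum idea can be sketched in Python as follows. With N replicas, a write goes to W of them and a read consults R of them. If R + W > N, every read set overlaps the latest write set, so the read sees the newest version. `QuorumStore` is a hypothetical class, and the single in-process version counter is a simplifying assumption; a real grid needs a distributed versioning scheme.

```python
import random


class QuorumStore:
    def __init__(self, n: int, w: int, r: int) -> None:
        if r + w <= n:
            raise ValueError("need R + W > N so read and write sets overlap")
        self.n, self.w, self.r = n, w, r
        self.replicas = [{} for _ in range(n)]  # each maps key -> (version, value)
        self.version = 0  # simplification: one global version counter

    def write(self, key, value) -> None:
        self.version += 1
        # Only W replicas, chosen arbitrarily, receive the write.
        for i in random.sample(range(self.n), self.w):
            self.replicas[i][key] = (self.version, value)

    def read(self, key):
        # Ask R arbitrary replicas and keep the answer with the highest version.
        answers = [
            self.replicas[i][key]
            for i in random.sample(range(self.n), self.r)
            if key in self.replicas[i]
        ]
        if not answers:
            raise KeyError(key)
        return max(answers, key=lambda answer: answer[0])[1]


store = QuorumStore(n=3, w=2, r=2)
store.write("price", 100)
store.write("price", 101)
print(store.read("price"))  # the overlap guarantees the latest write is seen
```

Choosing W and R is a trade-off: a larger W makes writes slower and reads cheaper, and a larger R does the opposite.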
Some More Concerns
• Transactions
• Querying
• Failure detection
• Leader election
• Persistency
• Interoperability
IMDG.next()
Using IMDG for messaging, business logic (BL)
IMDG.next()
SSD FTW!
Thank You!
docs.gigaspaces.com