In Memory Data Grid in Action with Oracle Coherence for Paris NoSQL User Group
Cyrille Le Clerc
Transactions chapter will be presented during another session
Wednesday, May 25, 2011
Speaker
Cyrille Le Clerc
@cyrilleleclerc
blog.xebia.fr
Open Source (Apache CXF, ...)
In Memory Data Grid
Large Scale
“you build it, you run it”
Once upon a time...
- Released Coherence in 2001
- Started as a distributed cache
- Released Gigaspaces XAP in 2001
- Started as a data grid
On the Financial side
Needs within the financial market:
• Very low latency
• Rich queries & transactions
• Scalability
• Data consistency
Let’s define an In Memory Data Grid ...
Let’s define an In Memory Data Grid
eXtreme Scale
This is an In Memory Data Grid
Let’s define an In Memory Data Grid
This is Network Attached Memory
Let’s define an In Memory Data Grid
Similarities with document-oriented NoSQL: partitioned, distributed hashtable, schema-less, value is not opaque, scale-out scalability
Very fast: in memory (persistence coming), business logic inside the data
Consistent and available: transactional, redundant
Written in Java, data are POJOs (not necessary)
Clients in Java, Microsoft, etc.
Use cases for this presentation
Train Booking System
trains, stations, seats, booking and passengers
eCommerce Web Site
Warehouse & customers' shopping carts
Warehouse stocks:
{ "name": "Barbie Computer", "stock": 637, "weight": 200 }
Customers' shopping carts:
canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
ipad: 1, iphone: 1
barbie: 1, iphone: 1, cabbage-doll: 1
In Memory Data Grids Key Principles
Store Everything in a Mainframe!
3 TB of RAM
80 x 5.2 GHz cores
Much more than $1,000,000
IBM z11 (http://ibm.com/)
Spread on Inexpensive Servers
Mainframe (http://ibm.com/)
Cheap servers! (http://1userverrack.net/)
Partition Data
Mainframe
Small servers
Partition gamma
Partition beta
Partition alpha
Partition for scalability
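The partitioning idea can be sketched in a few lines of Java. This is only an illustration under simple assumptions: the class name `PartitionedStore` is hypothetical, the number of partitions is fixed, and each partition is a plain in-process map, whereas a real grid spreads partitions over several servers and can move them at runtime.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: spreads keys over a fixed number of partitions. */
class PartitionedStore {
    private final List<Map<String, String>> partitions = new ArrayList<>();

    PartitionedStore(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** The key alone decides the partition, so any client can locate the data. */
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    void put(String key, String value) {
        partitions.get(partitionOf(key)).put(key, value);
    }

    String get(String key) {
        return partitions.get(partitionOf(key)).get(key);
    }

    int sizeOfPartition(int index) {
        return partitions.get(index).size();
    }
}
```

Adding servers then means hosting fewer partitions per server, which is where the scalability comes from.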
Duplicate Data
Duplicate data for high availability
Partition alpha: Master -> (sync synchronization) -> Standby Backup
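A minimal sketch of the master / standby backup idea, assuming both copies live in the same process for the sake of the example (in a real grid they sit on different servers). `ReplicatedPartition` and `failMaster` are hypothetical names, not a product API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of one partition with a master copy and a standby backup. */
class ReplicatedPartition {
    private Map<String, Integer> master = new HashMap<>();
    private Map<String, Integer> backup = new HashMap<>();

    /** Synchronous replication: the write returns only once both copies hold the value. */
    void put(String key, int value) {
        master.put(key, value);
        backup.put(key, value);
    }

    Integer get(String key) {
        return master.get(key);
    }

    /** Simulates losing the master: the backup is promoted and a new backup is rebuilt from it. */
    void failMaster() {
        master = backup;
        backup = new HashMap<>(master);
    }
}
```

Because the backup is updated before the write returns, no acknowledged write is lost when the master disappears.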
Data Access Patterns
Data Access Patterns
This is not traditional Java EE coding style !
Can apply very complex business logic inside the data
Stored Procedures Style
Change management challenge !
Pattern : Targeted Operation
Pattern: Targeted Operation
Book Train Tickets
{ "train-id": "tgv-3071-20110512", "time": "2011/05/12 12:15", "departure": "Paris", "arrival": "Marseille", "seats": 3 }
[Figure: partitions alpha, beta and gamma, each labeled Search Trains]
“train-id” is indexed
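A targeted operation can be sketched as follows: the train id selects the owning partition and the booking logic runs there, next to the data. This is a hedged, in-process illustration. `TrainGrid`, its methods and the modulo routing are assumptions made for the example, not the Coherence or eXtreme Scale API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical grid: each partition is a map of train-id -> free seats. */
class TrainGrid {
    private final List<Map<String, Integer>> partitions = new ArrayList<>();
    private int partitionsVisited = 0;

    TrainGrid(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    private Map<String, Integer> ownerOf(String trainId) {
        return partitions.get(Math.floorMod(trainId.hashCode(), partitions.size()));
    }

    void addTrain(String trainId, int freeSeats) {
        ownerOf(trainId).put(trainId, freeSeats);
    }

    /** Runs the booking logic only in the partition that owns the train. */
    boolean book(String trainId, int seats) {
        Map<String, Integer> owner = ownerOf(trainId);
        partitionsVisited++;
        Integer free = owner.get(trainId);
        if (free == null || free < seats) {
            return false; // unknown train or not enough seats left
        }
        owner.put(trainId, free - seats);
        return true;
    }

    int freeSeats(String trainId) {
        return ownerOf(trainId).get(trainId);
    }

    int partitionsVisited() {
        return partitionsVisited;
    }
}
```

Each booking touches exactly one partition, whatever the size of the grid.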
Pattern : Map Reduce Style Operation
Pattern: Map Reduce
Distributed “Search Train Ticket”
{ "departure": "Paris", "arrival": "Marseille", "time": "2011/05/12 12:00", "seats": 3 }
[Figure: partitions alpha, beta and gamma, each labeled Search Trains]
Pattern: Map Reduce
Distributed “Search Train Ticket”
[Figure: partitions alpha, beta and gamma, each labeled Search Trains]
Partial results:
{ "Paris -> Marseille : 12:15", "Paris -> Marseille : 13:15" }
{ #NONE# }
{ "Paris -> Lyon -> Marseille : 12:40" }
Pattern: Map Reduce
Distributed “Search Train Ticket”
[Figure: partitions alpha, beta and gamma, each labeled Search Trains]
Merged result:
{ "Paris -> Marseille : 12:15", "Paris -> Lyon -> Marseille : 12:40", "Paris -> Marseille : 13:15" }
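The map reduce style search can be sketched like this: every partition scans its own entries, then the caller merges and sorts the partial results. This is an illustration only. `DistributedSearch` is a hypothetical class, trips are encoded as plain strings of the form "route : HH:mm" as on the slides, and the partitions are queried in a simple loop rather than in parallel over the network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: every partition searches its own trains, the caller merges. */
class DistributedSearch {

    /** "Map" step: runs inside one partition and scans only local entries. */
    static List<String> searchPartition(List<String> localTrips, String from, String to) {
        List<String> matches = new ArrayList<>();
        for (String trip : localTrips) {
            String route = trip.substring(0, trip.indexOf(" : "));
            if (route.startsWith(from + " -> ") && route.endsWith(" -> " + to)) {
                matches.add(trip);
            }
        }
        return matches;
    }

    /** "Reduce" step: merges the partial results and sorts them by departure time. */
    static List<String> search(List<List<String>> partitions, String from, String to) {
        List<String> merged = new ArrayList<>();
        for (List<String> partition : partitions) {
            merged.addAll(searchPartition(partition, from, to));
        }
        Collections.sort(merged, (a, b) -> timeOf(a).compareTo(timeOf(b)));
        return merged;
    }

    private static String timeOf(String trip) {
        return trip.substring(trip.indexOf(" : ") + 3);
    }
}
```

Note that every partition is visited for every query, which is why this style behaves like a distributed table scan.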
Data Access Patterns
This is not traditional Java EE coding style
Don’t forget “Map Reduce” = “Distributed Table Scan”
Use Indexes
Change management
CAP Theorem & In Memory Data Grids
CAP Theorem and In Memory Data Grid
Consistency
Availability
Partition Tolerance
Only 2 of these 3 properties can be achieved at any given moment in time
Brewer’s Conjecture
http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf
Data Grids
Cross Data Center Data Consistency
Tokyo, New York, London
Worldwide replication for financial market
Cross Data Center Data Consistency
Warehouse stocks
West Coast / East Coast
{ "name": "Barbie Computer", "stock": 147, "weight": 200 }
{ "name": "Barbie Computer", "stock": 147, "weight": 200 }
Cross Data Center Data Consistency
West Coast / East Coast
{ "name": "Barbie Computer", "stock": 147, "weight": 200 } <- set stock to 146
{ "name": "Barbie Computer", "stock": 147, "weight": 200 }
Propagation delay!
Cross Data Center Data Consistency
West Coast / East Coast
{ "name": "Barbie Computer", "stock": 147, "weight": 200 } <- set stock to 146
{ "name": "Barbie Computer", "stock": 147, "weight": 200 } <- set weight 175
Reconciliation API needed!
Cross Data Center Data Consistency
West Coast / East Coast
{ "name": "Barbie Computer", "stock": 147, "weight": 200 } <- set stock to 146
{ "name": "Barbie Computer", "stock": 147, "weight": 200 } <- set weight 175
Network partitioning
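One possible shape for such a reconciliation step is a field-level merge against the last version both data centers agreed on. The sketch below is an assumption for illustration, not the API of any product: `Reconciler` is a hypothetical class, both replicas are assumed to carry the same fields as the base version, and a field changed on both sides to different values is reported as a conflict rather than resolved.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical field-level reconciliation of two replicas against their last common version. */
class Reconciler {

    static Map<String, Object> merge(Map<String, Object> base,
                                     Map<String, Object> replicaA,
                                     Map<String, Object> replicaB) {
        Map<String, Object> merged = new HashMap<>(base);
        for (String field : base.keySet()) {
            Object original = base.get(field);
            Object a = replicaA.get(field);
            Object b = replicaB.get(field);
            boolean aChanged = !original.equals(a);
            boolean bChanged = !original.equals(b);
            if (aChanged && bChanged && !a.equals(b)) {
                // Both data centers changed the same field: business rules must decide.
                throw new IllegalStateException("Conflict on field " + field);
            }
            merged.put(field, aChanged ? a : b);
        }
        return merged;
    }
}
```

With the slide's example, one side setting the stock to 146 and the other setting the weight to 175 merge cleanly. Two concurrent stock decrements would not, and need a domain-specific rule.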
Data Modeling
Data Modeling
Dominant Question Driven Design: opposite to relational, which is Domain Driven Design
Constrained Tree Schema (CTS): because RPC matters
Denormalized: due to dominant questions and CTS
Data Modeling
Typical relational data model
TrainStop: date
TrainStation: code, name
Train: code, type
Seat: number, price
Booking: reduction
Passenger: name
Data Modeling
Find the root entity and denormalize
Root entity (partitioning-ready entities tree):
Train: code, type
TrainStop: date
Seat: number, price
Booking: reduction
Passenger: name
Reference data (duplicated in each grid node):
TrainStation: code, name
Data Modeling
Remove unused data
Partitioned:
Train: code, type
TrainStop: date
Seat: number, price, booked
Booking: reduction
Passenger: name
Replicated:
TrainStation: code, name
Data Modeling
Data Grid Ready data structure
Partitioned:
Train: code, type
TrainStop: date
Seat: number, price, booked
Replicated:
TrainStation: code, name
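In plain Java, this data grid ready structure could look like the sketch below. The field names follow the slides. The `stationCode` reference from a stop to its station and the `freeSeats` helper are assumptions added for the example.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

/** Reference data: small and rarely changing, so a copy can live in every grid node. */
class TrainStation {
    final String code;
    final String name;

    TrainStation(String code, String name) {
        this.code = code;
        this.name = name;
    }
}

class TrainStop {
    final Date date;
    final String stationCode; // assumption: points to the replicated reference data by code

    TrainStop(Date date, String stationCode) {
        this.date = date;
        this.stationCode = stationCode;
    }
}

class Seat {
    final int number;
    final double price;
    boolean booked; // the only booking information kept in the tree

    Seat(int number, double price) {
        this.number = number;
        this.price = price;
    }
}

/** Root entity: the whole tree is stored under the train code and lives in one partition. */
class Train {
    enum Type { HIGH_SPEED, NORMAL }

    final String code;
    final Type type;
    final List<TrainStop> trainStops = new ArrayList<>();
    final List<Seat> seats = new ArrayList<>();

    Train(String code, Type type) {
        this.code = code;
        this.type = type;
    }

    /** Answered from the tree alone, without reading any other partition. */
    long freeSeats() {
        long free = 0;
        for (Seat seat : seats) {
            if (!seat.booked) {
                free++;
            }
        }
        return free;
    }
}
```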
Data Modeling is Hard !
Data Modeling is Hard!
Two root entities for the same MoneyTransfer!
Account: number (from)
Account: number (to)
MoneyTransfer: id, date, amount
CashWithdrawal: date, amount (one per Account)
Data Modeling is Hard!
Split MoneyTransfer
Account: number
Account: number
CashWithdrawal: date, amount
CashWithdrawal: date, amount
MoneyTransferIn: id, date, amount
MoneyTransferOut: id, date, amount
Data Modeling is Hard!
Split MoneyTransfer
Account: number
CashWithdrawal: date, amount
MoneyTransferOut: id, date, amount
Account: number
CashWithdrawal: date, amount
MoneyTransferIn: id, date, amount
Data Modeling is Hard!
Data Grid Ready data structure
Account: number
CashWithdrawal: date, amount
MoneyTransferOut: id, date, amount
MoneyTransferIn: id, date, amount
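The split can be sketched as follows: one logical transfer becomes a MoneyTransferOut under the debited account and a MoneyTransferIn under the credited one, both sharing the same id. This is a simplified illustration. The date field and CashWithdrawal are left out, amounts are plain `long` values, and the `Transfers` helper and `netTransfers` method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class MoneyTransferOut {
    final String id;
    final long amount;

    MoneyTransferOut(String id, long amount) {
        this.id = id;
        this.amount = amount;
    }
}

class MoneyTransferIn {
    final String id;
    final long amount;

    MoneyTransferIn(String id, long amount) {
        this.id = id;
        this.amount = amount;
    }
}

/** Root entity: each account owns its own half of every transfer. */
class Account {
    final String number;
    final List<MoneyTransferOut> transfersOut = new ArrayList<>();
    final List<MoneyTransferIn> transfersIn = new ArrayList<>();

    Account(String number) {
        this.number = number;
    }

    /** Computed from this account's tree only, no cross-partition read. */
    long netTransfers() {
        long net = 0;
        for (MoneyTransferIn in : transfersIn) {
            net += in.amount;
        }
        for (MoneyTransferOut out : transfersOut) {
            net -= out.amount;
        }
        return net;
    }
}

class Transfers {
    /** Records one logical transfer as two entries sharing the same id. */
    static void transfer(String id, Account from, Account to, long amount) {
        from.transfersOut.add(new MoneyTransferOut(id, amount));
        to.transfersIn.add(new MoneyTransferIn(id, amount));
    }
}
```

The two writes may land in two different partitions, so keeping them consistent is a transaction concern.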
Grid Internals
Data Serialization
Used for data transfer and byte oriented storage
Hot topic, like Apache Thrift, Apache Avro, Google Protocol Buffers
Must support evolvable data structure
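One common way to support evolvable data structures is to write a version number first and let older readers keep, untouched, the bytes they do not understand. The sketch below illustrates that idea with `java.io` streams only. It is not the POF format, and `TrainV1`, `TrainV2Writer` and the field layout are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical old reader: knows only "code" but keeps bytes written by newer versions. */
class TrainV1 {
    int dataVersion = 1;
    String code;
    byte[] futureData = new byte[0]; // fields unknown to this version, kept untouched

    static TrainV1 read(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            TrainV1 train = new TrainV1();
            train.dataVersion = in.readInt();
            train.code = in.readUTF();
            train.futureData = new byte[in.available()];
            in.readFully(train.futureData);
            return train;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] write() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(dataVersion);
            out.writeUTF(code);
            out.write(futureData); // newer fields survive a round trip through old code
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical newer writer: version 2 appends a "name" field after the version 1 fields. */
class TrainV2Writer {
    static byte[] write(String code, String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(2);
            out.writeUTF(code);
            out.writeUTF(name);
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A node still running the old class can then read, modify and rewrite an entry without destroying the fields added by a newer version.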
Data Storage
Store Java Beans in the grid
- No need to unmarshall for in-process operations
- Beware of garbage collector!
Store byte arrays in the grid
- Pay unmarshalling at each read and write
- Slightly more garbage collector friendly
- Low-level / byte-oriented APIs to read data
Communication Protocols
UDP Multi Cast (Coherence, Gigaspaces)
TCP/IP (Websphere eXtreme Scale)
Topology
Partitions made of shards (1 primary + 0..* backups)
Dynamic shard location (changes at runtime and at restart)
Can use dedicated “directory servers” or embed the directory in the “data nodes”
JVM and Memory
Many vendors recommend tiny 1.4 GB JVMs!
More than ten JVMs per server
Garbage collector hell
Management hell
More and more IMDG support large heaps
APIs
Raw Java Mapping with Oracle Coherence
Hand-coded serialization
JUnit is your friend!
Train: code, type
TrainStop: date
Seat: number, price, booked

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.tangosol.io.AbstractEvolvable;
import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.io.pof.PortableObject;

public class Train extends AbstractEvolvable implements PortableObject {

    enum Type { HIGH_SPEED, NORMAL }

    /** Key of the Cache */
    String code;

    /** Indexed */
    String name;

    Type type;

    List<Seat> seats = new ArrayList<Seat>();

    int version;

    List<TrainStop> trainStops = new ArrayList<TrainStop>();

    @Override
    public int getImplVersion() {
        return 1;
    }

    // Each field is read at the same POF index it was written with
    @Override
    public void readExternal(PofReader pofReader) throws IOException {
        this.code = pofReader.readString(0);
        this.name = pofReader.readString(1);
        this.type = (Type) pofReader.readObject(2);
        pofReader.readCollection(3, this.seats);
        pofReader.readCollection(4, this.trainStops);
        this.version = pofReader.readInt(5);
    }

    @Override
    public void writeExternal(PofWriter pofWriter) throws IOException {
        pofWriter.writeString(0, this.code);
        pofWriter.writeString(1, this.name);
        pofWriter.writeObject(2, this.type);
        pofWriter.writeCollection(3, this.seats, Seat.class);
        pofWriter.writeCollection(4, this.trainStops, TrainStop.class);
        pofWriter.writeInt(5, this.version);
    }
}
JPA Style Mapping with Websphere eXtreme Scale
Sub entities can have cross relations
Train: code, type
TrainStop: date
Seat: number, price, booked

@Entity(schemaRoot=true)
public class Train {

    @Id
    String code;

    @Index
    @Basic
    String name;

    @OneToMany(cascade=CascadeType.ALL)
    List<Seat> seats = new ArrayList<Seat>();

    @Version
    int version;

    ...
}
Map API with Oracle Coherence
Map API

NamedCache trainCache = CacheFactory.getCache("train-cache");

/** Save */
void persist(Train train) {
    trainCache.put(train.getCode(), train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) trainCache.get(code);
}

/** Find by Query Language */
Train findByTrainName(String name) {
    Filter filter = QueryHelper.createFilter("name = :name",
            Collections.singletonMap("name", name));
    Set<Map.Entry<String, Train>> trainEntrySet = trainCache.entrySet(filter);
    if (trainEntrySet.isEmpty()) {
        return null;
    } else {
        return trainEntrySet.iterator().next().getValue();
    }
}
JPA Style with Websphere eXtreme Scale
JPA Style Entity Manager

/** Save */
void persist(Train train) {
    entityManager.persist(train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) entityManager.find(Train.class, code);
}

/** Query Language */
Train findByTrainName(String name) {
    Query q = entityManager.createQuery("select t from Train t where t.name=:name");
    q.setParameter("name", name);
    return (Train) q.getSingleResult();
}
Creating Indexes
Map reduce (without index) = Distributed Table Scan !
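The difference can be illustrated inside a single partition. In the sketch below, `IndexedPartition` is a hypothetical class holding train code -> train name entries, with a secondary map acting as the index on name. Updates and removals are not handled, so it only shows the read path.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical local store of one partition: train code -> train name, plus an index on name. */
class IndexedPartition {
    private final Map<String, String> nameByCode = new HashMap<>();
    private final Map<String, List<String>> codesByName = new HashMap<>();
    int entriesScanned = 0;

    void put(String code, String name) {
        nameByCode.put(code, name);
        // Index maintenance is paid on every write
        codesByName.computeIfAbsent(name, n -> new ArrayList<>()).add(code);
    }

    /** Without an index: every entry is visited (table scan). */
    List<String> findByNameScan(String name) {
        List<String> codes = new ArrayList<>();
        for (Map.Entry<String, String> entry : nameByCode.entrySet()) {
            entriesScanned++;
            if (entry.getValue().equals(name)) {
                codes.add(entry.getKey());
            }
        }
        return codes;
    }

    /** With an index: a single lookup, whatever the number of entries. */
    List<String> findByNameIndexed(String name) {
        return codesByName.getOrDefault(name, new ArrayList<>());
    }
}
```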
Indexes with Oracle Coherence

class Train {
    String name;

    Collection<String> getTrainStationsCodes() {
        return Collections2.transform(trainStops, ...);
    }

    ...
}

{
    NamedCache trainCache = CacheFactory.getCache("train-cache");

    trainCache.addIndex(new ReflectionExtractor("getName"), false, null);
    trainCache.addIndex(new ReflectionExtractor("getTrainStationsCodes"), false, null);
}
Indexes with Websphere eXtreme Scale

@Entity(schemaRoot=true)
class Train {
    @Index
    @Basic
    String name;

    @Index
    Collection<String> getTrainStationsCodes() {
        return Collections2.transform(trainStops, ...);
    }

    ...
}

Query query = em.createQuery("select t from Train t where t.name=:name");
query.getPlan();

This is an execution plan (eXtreme Scale):
for q2 in Train ObjectMap using INDEX on name = ( ?name)
  filter ( q2.c[0] = ?name )
  returning new Tuple( q2 )
More APIs
Another Java EE versus Spring battle ? JSR 347 Data Grids vs. Spring Data
Unified API on top of NoSQL stores ?
Serialization / Object to Tuple Mapping API ?
Data Grid <-> Relational Database Interactions
Data Grid <-> Relational Database
Data Grids are “In Memory” -> we need to persist data on disk !
Data Grid <-> Relational Database
update / insert / delete
“select directly modified in DB”
Data Grid <-> Relational Database
Data Grid -> Relational Database
backend DB
Highly available write-behind queues + SQL batched statements
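The write-behind idea can be sketched as a queue that absorbs grid updates and is flushed to the database in batches. This is a simplified, single-node illustration. `WriteBehindQueue` is a hypothetical class, the `product` table is an assumption, the batch is represented as a list of SQL strings instead of a JDBC batch, and the queue itself is not replicated, so it is not highly available.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical write-behind queue: grid writes return immediately, the database is updated later. */
class WriteBehindQueue {
    private final Map<String, Integer> pending = new LinkedHashMap<>();
    final List<List<String>> executedBatches = new ArrayList<>();

    /** Called on every grid update; several updates of the same key coalesce into one pending write. */
    synchronized void enqueue(String name, int stock) {
        pending.put(name, stock);
    }

    /** Called by a scheduler; stands in for one batched SQL execution. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> batch = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : pending.entrySet()) {
            // Real code would bind parameters on a prepared statement instead of concatenating
            batch.add("update product set stock=" + entry.getValue()
                    + " where name='" + entry.getKey() + "'");
        }
        executedBatches.add(batch);
        pending.clear();
    }
}
```

Coalescing means the database sees only the latest value of a hot entry, which is part of why the grid can absorb more writes than the database behind it.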
Data Grid <-> Relational Database
Data Grid -> Relational Database
Constrained Tree Schema <-> Relational Impedance Mismatch
TrainStop: date
TrainStation: code, name
Seat: number, price, booked
Train: code, type
Data Grid <-> Relational Database
DB writes MUST succeed!
Align the database on the Data Grid model!
- Denormalize the database
- Remove the foreign keys, use same PKs in DB and data grid
- Support unordered SQL statements
Prefer raw SQL rather than reused business logic
Data Grid <-> Relational Database
Relational Database -> Data Grid
backend DB
Data Grid Originated Scheduled Refresh (Oracle System Change Number, etc.)
select * from train where last_modif > ?
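A scheduled refresh of this kind boils down to remembering a high-water mark and pulling only the rows modified after it. The sketch below replaces the database with an in-memory list so that it stays self-contained. `TrainRow`, `ScheduledRefresher` and the numeric `lastModif` column are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical row of the "train" table. */
class TrainRow {
    final String code;
    final String name;
    final long lastModif;

    TrainRow(String code, String name, long lastModif) {
        this.code = code;
        this.name = name;
        this.lastModif = lastModif;
    }
}

/** Hypothetical refresher: pulls rows changed since the last poll into the grid. */
class ScheduledRefresher {
    final Map<String, String> grid = new HashMap<>();
    private long highWaterMark = 0;

    /** Stands in for "select * from train where last_modif > ?". */
    private static List<TrainRow> selectModifiedSince(List<TrainRow> table, long since) {
        List<TrainRow> rows = new ArrayList<>();
        for (TrainRow row : table) {
            if (row.lastModif > since) {
                rows.add(row);
            }
        }
        return rows;
    }

    /** One scheduled run; returns the number of refreshed entries. */
    int refresh(List<TrainRow> table) {
        List<TrainRow> changed = selectModifiedSince(table, highWaterMark);
        for (TrainRow row : changed) {
            grid.put(row.code, row.name);
            highWaterMark = Math.max(highWaterMark, row.lastModif);
        }
        return changed.size();
    }
}
```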
Data Grid <-> Relational Database
Relational Database -> Data Grid
backend DB
Database Originated Push (Oracle Database Change Notification, etc.)
JMS = durable subscription
Data Grid <-> Relational Database
In Memory -> prepare for reloading after maintenance operations !
Prepare consistency checkers
Need for “graceful shutdown with disk persistence”
Transactions
We didn’t have the time to talk about transactions.
Another session is planned at the Paris NoSQL User Group for this.
Let’s go live !
Data Grids and Operations
Standard packaging? Do It Yourself (layout, scripts, etc.)
Limited management: Do It Yourself (stop/start, detecting data loss, etc.)
Limited debugging tools: Do It Yourself (debugging consoles, troubleshooting agents)
JVM pandemic: dozens of JVMs to manage!
Data Grids and Operations
Dev / Ops collaboration is required
Experts only !
The right tool for the right job
The right tool for the right job
Incredibly fast ! Even with transactions !
Scalable
Good at data replication (when it implements it)
Very geeky on both dev and ops side
“Quite” expensive
Not an enterprise grade data store
Reconciliation api, etc
Requires very skilled people + change management
If you solve the data loading issue
?
Questions / Answers