Paris NoSQL User Group - In Memory Data Grids in Action (without transactions chapter)

In Memory Data Grid in Action, with Oracle Coherence, for the Paris NoSQL User Group

Cyrille Le Clerc

The transactions chapter will be presented in another session.

Wednesday, May 25, 2011

Speaker


Cyrille Le Clerc

@cyrilleleclerc

blog.xebia.fr

Open Source (Apache CXF, ...)

In Memory Data Grid

Large Scale

“you build it, you run it”



Once upon a time...



- Coherence released in 2001, started as a distributed cache

- GigaSpaces XAP released in 2001, started as a data grid

On the Financial side

Needs within financial markets:

• Very low latency

• Rich queries & transactions

• Scalability

• Data consistency

Let’s define an In Memory Data Grid ...



This is an In Memory Data Grid




This is Network Attached Memory


Similarities with document-oriented NoSQL: partitioned, distributed hashtable, schema-less, value is not opaque, scale-out scalability

Very fast: in memory (persistence coming), business logic inside the data

Consistent and available: transactional, redundant

Written in Java, data are POJOs (not necessarily)

Clients in Java, Microsoft .NET, etc.

Use cases for this presentation


Train Booking System


trains, stations, seats, booking and passengers


eCommerce Web Site


Diagram: warehouse stocks and customers' shopping carts

Shopping carts: { canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ... }, { ipad: 1, iphone: 1 }, { barbie: 1, iphone: 1, cabbage-doll: 1 }

Stock item: { "name": "Barbie Computer", "stock": 637, "weight": 200 }



In Memory Data Grids Key Principles


Store Everything in a Mainframe!

IBM z11 (http://ibm.com/): 3 TB of RAM, 80 x 5.2 GHz cores, much more than $1,000,000

Spread on Inexpensive Servers

From a mainframe to cheap servers! (http://1userverrack.net/, http://ibm.com/)

Partition Data

Diagram: the mainframe's data is split into Partition alpha, Partition beta and Partition gamma, spread over small servers

Partition for scalability
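To make "partition for scalability" concrete, here is a minimal Java sketch of key-based partitioning. It assumes a fixed partition count and a round-robin assignment of partitions to servers; `PartitionedGrid`, `partitionOf` and `ownerOf` are hypothetical names, not the API of Coherence or of any other grid.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keys are hashed into a fixed number of partitions spread over servers. */
class PartitionedGrid {
    private final int partitionCount;
    private final List<String> servers;

    PartitionedGrid(int partitionCount, List<String> servers) {
        this.partitionCount = partitionCount;
        this.servers = new ArrayList<>(servers);
    }

    /** Same key, same partition: floorMod keeps the result in [0, partitionCount). */
    int partitionOf(Object key) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    /** Round-robin assignment of partitions to servers (a real grid moves them at runtime). */
    String ownerOf(Object key) {
        return servers.get(partitionOf(key) % servers.size());
    }
}
```

`Math.floorMod` is used instead of `Math.abs(hashCode) % n` because `Math.abs(Integer.MIN_VALUE)` is still negative.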


Duplicate Data

Diagram: Partition alpha is made of a Master and a Standby Backup, kept synchronized (sync)

Duplicate data for high availability
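The master / standby pairing can be sketched as follows, assuming synchronous backups: a write completes only once both copies hold it, so losing the master loses no completed write. `ReplicatedPartition` is a hypothetical single-JVM illustration; in a real grid the two copies live in different JVMs, ideally on different machines.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of one partition with a master copy and a synchronous standby backup. */
class ReplicatedPartition<K, V> {
    private Map<K, V> master = new HashMap<>();
    private Map<K, V> backup = new HashMap<>();

    /** Synchronous backup: the write completes only after both copies are updated. */
    synchronized void put(K key, V value) {
        master.put(key, value);
        backup.put(key, value);
    }

    synchronized V get(K key) {
        return master.get(key);
    }

    /** Simulates the loss of the master: the backup is promoted, then a new backup is rebuilt from it. */
    synchronized void failMaster() {
        master = backup;
        backup = new HashMap<>(master);
    }
}
```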



Data Access Patterns



Stored Procedures Style

This is not traditional Java EE coding style!

Can apply very complex business logic inside the data

Change management challenge!


Pattern: Targeted Operation

Diagram: the "Book Train Tickets" request below is sent only to the partition (alpha, beta or gamma) that holds the train

{ "train-id": "tgv-3071-20110512", "time": "2011/05/12 12:15", "departure": "Paris", "arrival": "Marseille", "seats": 3 }

“train-id” is indexed
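A targeted operation ships the business logic to the one partition that owns the key, instead of pulling the data to the client. The sketch below assumes an in-process grid made of `HashMap` partitions with one lock per partition; `TargetedGrid` and `SeatCounter` are hypothetical names. In Coherence, the comparable mechanism is an `InvocableMap.EntryProcessor` passed to `NamedCache.invoke`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Value stored in the grid: remaining seats of one train (hypothetical). */
class SeatCounter {
    int freeSeats;

    SeatCounter(int freeSeats) {
        this.freeSeats = freeSeats;
    }
}

/** Hypothetical sketch: the logic travels to the single partition owning the key. */
class TargetedGrid<K, V> {
    private final List<Map<K, V>> partitions = new ArrayList<>();

    TargetedGrid(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    private Map<K, V> partitionFor(K key) {
        return partitions.get(Math.floorMod(key.hashCode(), partitions.size()));
    }

    void put(K key, V value) {
        Map<K, V> partition = partitionFor(key);
        synchronized (partition) {
            partition.put(key, value);
        }
    }

    /** Runs the processor next to the data, under the owning partition's lock only. */
    <R> R invoke(K key, Function<V, R> processor) {
        Map<K, V> partition = partitionFor(key);
        synchronized (partition) {
            return processor.apply(partition.get(key));
        }
    }
}
```

Booking three seats on tgv-3071-20110512 then touches a single partition: the "enough seats?" check and the decrement run together next to the data, with no distributed lock.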


Pattern: Map Reduce Style Operation


Distributed “Search Train Ticket” over Partition alpha, Partition beta and Partition gamma

Map: the search { "departure": "Paris", "arrival": "Marseille", "time": "2011/05/12 12:00", "seats": 3 } is sent to every partition, and each partition searches its own trains

Partial results: { "Paris -> Marseille : 12:15", "Paris -> Marseille : 13:15" }, { #NONE# }, { "Paris -> Lyon -> Marseille : 12:40" }

Reduce: the partial results are merged into { "Paris -> Marseille : 12:15", "Paris -> Lyon -> Marseille : 12:40", "Paris -> Marseille : 13:15" }
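The map reduce steps above can be sketched with plain `CompletableFuture`s: the filter runs on every partition in parallel (map), then the partial lists are merged and sorted (reduce). `ScatterGather` is a hypothetical name and partitions are simple in-memory lists; a real grid sends the filter over the network to each node.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch of a map/reduce style query over partitioned data. */
class ScatterGather {

    static <T> List<T> query(List<List<T>> partitions, Predicate<T> filter, Comparator<T> order) {
        // Map: every partition scans its own entries in parallel.
        List<CompletableFuture<List<T>>> partials = new ArrayList<>();
        for (List<T> partition : partitions) {
            partials.add(CompletableFuture.supplyAsync(() ->
                    partition.stream().filter(filter).collect(Collectors.toList())));
        }
        // Reduce: wait for every partial result, merge and sort.
        List<T> merged = new ArrayList<>();
        for (CompletableFuture<List<T>> partial : partials) {
            merged.addAll(partial.join());
        }
        merged.sort(order);
        return merged;
    }
}
```

Each partition reads all of its entries: this is exactly the "distributed table scan" that indexes are meant to avoid.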



This is not traditional Java EE coding style

Don’t forget “Map Reduce” = “Distributed Table Scan”


Use Indexes

Change management



CAP Theorem & In Memory Data Grids


Consistency, Availability, Partition Tolerance: only 2 of these 3 properties can be achieved at any given moment in time

Brewer’s Conjecture (http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf)

Diagram: Data Grids placed on the Consistency / Availability / Partition Tolerance triangle




Cross Data Center Data Consistency

Tokyo, New York, London

Worldwide replication for financial markets



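Diagram: warehouse stocks replicated between a West Coast and an East Coast data center

Propagation delay! One data center sets the stock to 146; the other data center sees this update only later.

Meanwhile, the other data center sets the weight to 175: the same item has been modified on both coasts, so a reconciliation API is needed!

Network partitioning: the same conflicting updates (stock set to 146 on one side, weight set to 175 on the other) happen while the two data centers cannot reach each other.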
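The slides call for a reconciliation API without prescribing one. As one possible strategy (an assumption, not taken from this deck), each field can carry a version number and the two copies can be merged field by field, keeping the highest version. `VersionedItem` and `reconcile` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical replicated item: every field carries the version of its last update. */
class VersionedItem {
    final Map<String, Object> values = new HashMap<>();
    final Map<String, Long> versions = new HashMap<>();

    void set(String field, Object value, long version) {
        values.put(field, value);
        versions.put(field, version);
    }

    /**
     * Field-level "highest version wins" merge of two copies of the same item.
     * On equal versions the first argument wins; a real system needs a deterministic
     * tie-breaker (for instance a data center id) so that both sides compute the same result.
     */
    static VersionedItem reconcile(VersionedItem first, VersionedItem second) {
        VersionedItem merged = new VersionedItem();
        for (VersionedItem copy : new VersionedItem[] {first, second}) {
            for (Map.Entry<String, Long> field : copy.versions.entrySet()) {
                Long current = merged.versions.get(field.getKey());
                if (current == null || field.getValue() > current) {
                    merged.set(field.getKey(), copy.values.get(field.getKey()), field.getValue());
                }
            }
        }
        return merged;
    }
}
```

When one data center changed the stock and the other changed the weight, both updates survive the merge. If both had changed the same field, one update would silently win, which is why real reconciliation rules are usually business specific.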



Data Modeling


• Dominant Question Driven Design: the opposite of relational modeling, which is Domain Driven Design

• Constrained Tree Schema: because RPC matters

• Denormalized: due to dominant questions and CTS


Typical relational data model: Train (code, type), TrainStop (date), TrainStation (code, name), Seat (number, price), Booking (reduction), Passenger (name)


Find the root entity and denormalize

Root entity: Train (code, type), heading a partitioning-ready tree of entities: TrainStop (date), Seat (number, price), Booking (reduction), Passenger (name)

Reference data: duplicated in each grid node


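Remove unused data: Seat gains a “booked” attribute in place of the Booking (reduction) and Passenger (name) details

Partitioned: the Train (code, type) tree, with TrainStop (date) and Seat (number, price, booked)

Replicated: reference data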


Data Grid Ready data structure: Train (code, type) with TrainStop (date) and Seat (number, price, booked)

Partitioned: the Train tree; Replicated: reference data
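The data grid ready structure can be written as plain Java classes. This sketch assumes, as the diagrams suggest, that Train is the root entity and cache key, that TrainStop and Seat are nested inside it, and that stations are referenced by code as replicated reference data. The `Grid` prefix marks the classes as hypothetical, and `priceInCents` is an assumed representation of the price.

```java
import java.util.ArrayList;
import java.util.List;

/** Root entity and cache key: the whole tree is stored in the same partition (hypothetical). */
class GridTrain {
    final String code;
    final String type;
    final List<GridTrainStop> stops = new ArrayList<>();
    final List<GridSeat> seats = new ArrayList<>();

    GridTrain(String code, String type) {
        this.code = code;
        this.type = type;
    }

    /** Answers the dominant question "book a seat on this train" inside a single object. */
    GridSeat bookFirstFreeSeat() {
        for (GridSeat seat : seats) {
            if (!seat.booked) {
                seat.booked = true;
                return seat;
            }
        }
        return null; // the train is full
    }
}

/** Nested entity; the station is referenced by code and looked up in replicated reference data. */
class GridTrainStop {
    final String stationCode;
    final String date;

    GridTrainStop(String stationCode, String date) {
        this.stationCode = stationCode;
        this.date = date;
    }
}

/** Nested entity; "booked" stands in for the Booking and Passenger entities. */
class GridSeat {
    final int number;
    final int priceInCents;
    boolean booked;

    GridSeat(int number, int priceInCents) {
        this.number = number;
        this.priceInCents = priceInCents;
    }
}
```

Booking a seat then reads and writes one object in one partition, with no join.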


Data Modeling is Hard!



Two root entities for the same MoneyTransfer!

Entities: Account (number), MoneyTransfer (id, date, amount) with a “from” Account and a “to” Account, CashWithdrawal (date, amount)



Split MoneyTransfer: MoneyTransferOut (id, date, amount) and MoneyTransferIn (id, date, amount), each attached to its own Account (number)






Data Grid Ready data structure: Account (number) is the only root entity
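The split can be sketched as follows: one logical transfer becomes a `MoneyTransferOut` on the source account and a `MoneyTransferIn` on the target account, so each write touches a single root entity. `Movement`, `Account`, `Bank` and the map standing in for the cache are hypothetical. The two writes are not atomic here; that is the topic of the transactions chapter, not covered in this session.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Common shape of the movements owned by an account (hypothetical). */
abstract class Movement {
    final String id;
    final String date;
    final long amount;

    Movement(String id, String date, long amount) {
        this.id = id;
        this.date = date;
        this.amount = amount;
    }

    abstract long signedAmount();
}

class MoneyTransferOut extends Movement {
    MoneyTransferOut(String id, String date, long amount) { super(id, date, amount); }

    @Override
    long signedAmount() { return -amount; }
}

class MoneyTransferIn extends Movement {
    MoneyTransferIn(String id, String date, long amount) { super(id, date, amount); }

    @Override
    long signedAmount() { return amount; }
}

/** Single root entity: everything about one account lives in the account's partition. */
class Account {
    final String number;
    final List<Movement> movements = new ArrayList<>();

    Account(String number) { this.number = number; }

    long balance() {
        long total = 0;
        for (Movement movement : movements) {
            total += movement.signedAmount();
        }
        return total;
    }
}

class Bank {
    final Map<String, Account> accounts = new HashMap<>(); // stands in for a cache keyed by account number

    /** One transfer, two single-root writes sharing the same id (not atomic in this sketch). */
    void transfer(String id, String date, long amount, String from, String to) {
        accounts.get(from).movements.add(new MoneyTransferOut(id, date, amount));
        accounts.get(to).movements.add(new MoneyTransferIn(id, date, amount));
    }
}
```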



Grid Internals


Data Serialization

Used for data transfer and byte-oriented storage

Hot topic: Apache Thrift, Apache Avro, Google Protocol Buffers

Must support evolvable data structures
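Evolvability usually means that an older reader keeps the fields it does not understand and writes them back unchanged. The sketch below illustrates the idea with `DataOutputStream` rather than POF, Thrift, Avro or Protocol Buffers; the byte layout and the `TrainV1` / `TrainV2Codec` names are assumptions. Coherence's `Evolvable` interface follows the same idea with its "future data".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Version 1 of a hypothetical Train payload: only knows "code" and "name". */
class TrainV1 {
    static final int IMPL_VERSION = 1;
    int dataVersion;                  // version of the writer that produced the bytes
    String code;
    String name;
    byte[] futureData = new byte[0];  // fields added by newer versions, kept as opaque bytes

    static TrainV1 read(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            TrainV1 train = new TrainV1();
            train.dataVersion = in.readInt();
            train.code = in.readUTF();
            train.name = in.readUTF();
            train.futureData = in.readAllBytes(); // Java 9+; empty for version 1 data
            return train;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    byte[] write() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(Math.max(IMPL_VERSION, dataVersion)); // never downgrade the data version
            out.writeUTF(code);
            out.writeUTF(name);
            out.write(futureData);                             // unknown fields written back untouched
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}

/** Version 2 appends a "type" field after the version 1 fields. */
class TrainV2Codec {
    static byte[] write(String code, String name, String type) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(2);
            out.writeUTF(code);
            out.writeUTF(name);
            out.writeUTF(type);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String readType(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            in.readInt();
            in.readUTF();
            in.readUTF();
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A version 1 node can thus rename a train written by a version 2 node without erasing its "type".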


Data Storage

Store Java Beans in the grid

• No need to unmarshal for in-process operations

• Beware of the garbage collector!

Store byte arrays in the grid

• Pay unmarshalling at each read and write

• Slightly more garbage collector friendly

• Low-level / byte-oriented APIs to read data
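Storing byte arrays pays off when a field can be read or updated in place without unmarshalling the whole object. This sketch assumes a hypothetical fixed layout (stock, weight, name length, UTF-8 name) and uses `java.nio.ByteBuffer` absolute reads and writes; `BinaryItem` is not a grid API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical layout: stock (int) | weight (int) | name length (int) | name (UTF-8 bytes). */
class BinaryItem {
    static final int STOCK_OFFSET = 0;
    static final int WEIGHT_OFFSET = 4;
    static final int NAME_LENGTH_OFFSET = 8;
    static final int NAME_OFFSET = 12;

    static byte[] serialize(String name, int stock, int weight) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(NAME_OFFSET + nameBytes.length);
        buffer.putInt(stock).putInt(weight).putInt(nameBytes.length).put(nameBytes);
        return buffer.array();
    }

    /** Reads one field: no object is created, nothing else is decoded. */
    static int readStock(byte[] stored) {
        return ByteBuffer.wrap(stored).getInt(STOCK_OFFSET);
    }

    static int readWeight(byte[] stored) {
        return ByteBuffer.wrap(stored).getInt(WEIGHT_OFFSET);
    }

    /** Updates the stock directly inside the stored bytes. */
    static void writeStock(byte[] stored, int newStock) {
        ByteBuffer.wrap(stored).putInt(STOCK_OFFSET, newStock);
    }

    /** The variable-length name is decoded only when it is really needed. */
    static String readName(byte[] stored) {
        int length = ByteBuffer.wrap(stored).getInt(NAME_LENGTH_OFFSET);
        return new String(stored, NAME_OFFSET, length, StandardCharsets.UTF_8);
    }
}
```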


Communication Protocols

UDP multicast (Coherence, GigaSpaces)

TCP/IP (WebSphere eXtreme Scale)

Topology

Partitions made of shards: 1 primary + 0..* backups

Dynamic shard location (changes at runtime and at restart)

Can use dedicated “directory servers” or embed the directory in the “data nodes”
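Dynamic shard location can be illustrated with a partition table recording, for each partition, the member holding the primary and the member holding the backup. When a member leaves, its primaries are replaced by their backups and the lost backups are reassigned. `PartitionTable` is a hypothetical single-backup sketch; it does not handle a primary and its backup leaving at the same time.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical partition table: for each partition, the member holding the primary and the backup. */
class PartitionTable {
    final String[] primary;
    final String[] backup;
    private final List<String> members;

    PartitionTable(int partitionCount, List<String> initialMembers) {
        if (initialMembers.size() < 2) {
            throw new IllegalArgumentException("a backup needs at least two members");
        }
        members = new ArrayList<>(initialMembers);
        primary = new String[partitionCount];
        backup = new String[partitionCount];
        for (int p = 0; p < partitionCount; p++) {
            primary[p] = members.get(p % members.size());
            backup[p] = members.get((p + 1) % members.size()); // never the primary's member
        }
    }

    /** A member leaves: its primaries fail over to their backups, missing backups are reassigned. */
    void memberLeft(String member) {
        members.remove(member);
        for (int p = 0; p < primary.length; p++) {
            if (member.equals(primary[p])) {
                primary[p] = backup[p]; // promote the backup (assumed to be still alive)
                backup[p] = null;
            } else if (member.equals(backup[p])) {
                backup[p] = null;
            }
            if (backup[p] == null) {
                // pick a surviving member other than the primary, spreading choices by partition id
                for (int i = 0; i < members.size(); i++) {
                    String candidate = members.get((p + i) % members.size());
                    if (!candidate.equals(primary[p])) {
                        backup[p] = candidate;
                        break;
                    }
                }
            }
        }
    }
}
```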


JVM and Memory

Many vendors recommend tiny 1.4 GB JVMs!

More than ten JVMs per server

Garbage collector hell

Management hell

More and more IMDGs support large heaps



APIs


Raw Java Mapping with Oracle Coherence

Hand-coded serialization: JUnit is your friend!

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.tangosol.io.AbstractEvolvable;
import com.tangosol.io.pof.EvolvablePortableObject;
import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofWriter;

public class Train extends AbstractEvolvable implements EvolvablePortableObject {
    enum Type { HIGH_SPEED, NORMAL }

    /** Key of the Cache */
    String code;
    /** Indexed */
    String name;
    Type type;
    List<Seat> seats = new ArrayList<Seat>();
    int version;
    List<TrainStop> trainStops = new ArrayList<TrainStop>();

    /** Version of this class layout, used by POF evolvability */
    @Override
    public int getImplVersion() { return 1; }

    /** POF indexes (0 to 5) must be identical in readExternal() and writeExternal() */
    @Override
    public void readExternal(PofReader pofReader) throws IOException {
        this.code = pofReader.readString(0);
        this.name = pofReader.readString(1);
        this.type = (Type) pofReader.readObject(2);
        pofReader.readCollection(3, this.seats);
        pofReader.readCollection(4, this.trainStops);
        this.version = pofReader.readInt(5);
    }

    @Override
    public void writeExternal(PofWriter pofWriter) throws IOException {
        pofWriter.writeString(0, this.code);
        pofWriter.writeString(1, this.name);
        pofWriter.writeObject(2, this.type);
        pofWriter.writeCollection(3, this.seats, Seat.class);
        pofWriter.writeCollection(4, this.trainStops, TrainStop.class);
        pofWriter.writeInt(5, this.version);
    }
}


JPA Style Mapping with Websphere eXtreme Scale

Sub-entities can have cross relations

@Entity(schemaRoot=true)
public class Train {
    @Id String code;
    @Index @Basic String name;
    @OneToMany(cascade=CascadeType.ALL) List<Seat> seats = new ArrayList<Seat>();
    @Version int version;

    ...
}


Map API with Oracle Coherence

NamedCache trainCache = CacheFactory.getCache("train-cache");

/** Save */
void persist(Train train) {
    trainCache.put(train.getCode(), train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) trainCache.get(code);
}

/** Find by Query Language */
Train findByTrainName(String name) {
    Filter filter = QueryHelper.createFilter("name = :name", Collections.singletonMap("name", name));
    Set<Map.Entry<String, Train>> trainEntrySet = trainCache.entrySet(filter);
    if (trainEntrySet.isEmpty()) {
        return null;
    } else {
        return trainEntrySet.iterator().next().getValue();
    }
}


JPA Style with Websphere eXtreme Scale

/** Save */
void persist(Train train) {
    entityManager.persist(train);
}

/** Find by key */
Train findByCode(String code) {
    return (Train) entityManager.find(Train.class, code);
}

/** Query Language */
Train findByTrainName(String name) {
    Query q = entityManager.createQuery("select t from Train t where t.name=:name");
    q.setParameter("name", name);
    return (Train) q.getSingleResult();
}


Creating Indexes

Map reduce (without index) = Distributed Table Scan!
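An index avoids the table scan by keeping, per partition, a map from attribute value to keys, updated on every write. The sketch below compares an indexed lookup with a full scan; `IndexedPartition` is a hypothetical name, and its `Function` extractor plays roughly the role that a `ReflectionExtractor` plays in Coherence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical partition with one index: extracted attribute value -> keys holding it. */
class IndexedPartition<K, V, A> {
    private final Map<K, V> data = new HashMap<>();
    private final Map<A, Set<K>> index = new HashMap<>();
    private final Function<V, A> extractor;

    IndexedPartition(Function<V, A> extractor) {
        this.extractor = extractor;
    }

    /** Every write also maintains the index: the price paid for fast reads. */
    void put(K key, V value) {
        V old = data.put(key, value);
        if (old != null) {
            A oldAttribute = extractor.apply(old);
            Set<K> keys = index.get(oldAttribute);
            if (keys != null) {
                keys.remove(key);
                if (keys.isEmpty()) {
                    index.remove(oldAttribute);
                }
            }
        }
        index.computeIfAbsent(extractor.apply(value), a -> new HashSet<>()).add(key);
    }

    /** Indexed query: only the matching entries are touched. */
    List<V> findByIndex(A attribute) {
        List<V> result = new ArrayList<>();
        for (K key : index.getOrDefault(attribute, Collections.emptySet())) {
            result.add(data.get(key));
        }
        return result;
    }

    /** Unindexed query: every entry is read (a table scan, distributed once run on each partition). */
    List<V> findByScan(A attribute) {
        List<V> result = new ArrayList<>();
        for (V value : data.values()) {
            if (Objects.equals(extractor.apply(value), attribute)) {
                result.add(value);
            }
        }
        return result;
    }
}
```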


Indexes with Oracle Coherence

class Train {
    String name;

    Collection<String> getTrainStationsCodes() {
        return Collections2.transform(trainStops, ...);
    }

    ...
}

{
    NamedCache trainCache = CacheFactory.getCache("train-cache");

    // addIndex(extractor, ordered, comparator): unordered indexes, no comparator
    trainCache.addIndex(new ReflectionExtractor("getName"), false, null);
    trainCache.addIndex(new ReflectionExtractor("getTrainStationsCodes"), false, null);
}


Indexes with Websphere eXtreme Scale

@Entity(schemaRoot=true)
class Train {
    @Index @Basic String name;

    @Index
    Collection<String> getTrainStationsCodes() {
        return Collections2.transform(trainStops, ...);
    }

    ...
}

Query query = em.createQuery("select t from Train t where t.name=:name");
query.getPlan();

for q2 in Train ObjectMap using INDEX on name = ( ?name) filter ( q2.c[0] = ?name ) returning new Tuple( q2 )

This is an execution plan


More APIs

Another Java EE versus Spring battle? JSR 347 Data Grids vs. Spring Data

Unified API on top of NoSQL stores?

Serialization / Object to Tuple Mapping API?



Data Grid <-> Relational Database Interactions


Data Grids are “In Memory” -> we need to persist data on disk!

Diagram: the data grid sends update / insert / delete statements to the database, and data modified directly in the DB must be selected back into the grid (“select directly modified in DB”)



Data Grid -> Relational Database

Highly available write-behind queues + batched SQL statements to the backend DB
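A write-behind queue lets the grid acknowledge writes immediately and flush them to the database later, in batches. The sketch assumes one queue per partition, coalesces successive updates of the same key, and re-queues a failed batch because DB writes must eventually succeed. `WriteBehindQueue` is hypothetical, and the high availability of the queue itself (its backup copy) is not modeled.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical write-behind queue: the grid is updated now, the database later, in batches. */
class WriteBehindQueue<K, V> {
    private final LinkedHashMap<K, V> pending = new LinkedHashMap<>(); // one pending value per key
    private final int batchSize;

    WriteBehindQueue(int batchSize) {
        this.batchSize = batchSize;
    }

    /** Called on every grid write; a newer value for the same key replaces the older one. */
    synchronized void enqueue(K key, V value) {
        pending.put(key, value);
    }

    /** Drains up to batchSize entries, to be sent as one batched SQL statement. */
    synchronized List<Map.Entry<K, V>> nextBatch() {
        List<Map.Entry<K, V>> batch = new ArrayList<>();
        Iterator<Map.Entry<K, V>> it = pending.entrySet().iterator();
        while (it.hasNext() && batch.size() < batchSize) {
            Map.Entry<K, V> entry = it.next();
            batch.add(new AbstractMap.SimpleImmutableEntry<>(entry.getKey(), entry.getValue()));
            it.remove();
        }
        return batch;
    }

    /** A failed DB write goes back to the queue, unless a newer value arrived in the meantime. */
    synchronized void requeue(List<Map.Entry<K, V>> failedBatch) {
        for (Map.Entry<K, V> entry : failedBatch) {
            pending.putIfAbsent(entry.getKey(), entry.getValue());
        }
    }
}
```

Coalescing and re-queuing change the order in which rows reach the database, one more reason to support unordered SQL statements.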



Data Grid -> Relational Database

Constrained Tree Schema <-> Relational Impedance Mismatch



DB writes MUST succeed!

Align the database on the Data Grid model!

• Denormalize the database

• Remove the foreign keys, use the same PKs in the DB and in the data grid

• Support unordered SQL statements

Prefer raw SQL over reused business logic



Relational Database -> Data Grid

Data Grid originated scheduled refresh from the backend DB (Oracle System Change Number, etc.):

select * from train where last_modif > ?
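The scheduled refresh can be sketched as a loop that remembers the highest `last_modif` seen so far and only asks for newer rows. To run without a database, the SQL query is hidden behind a hypothetical `ChangeSource` interface; `TrainRow` and `ScheduledRefresher` are hypothetical too.

```java
import java.util.List;
import java.util.Map;

/** One row of "select * from train where last_modif > ?" (hypothetical shape). */
class TrainRow {
    final String code;
    final String name;
    final long lastModif;

    TrainRow(String code, String name, long lastModif) {
        this.code = code;
        this.name = name;
        this.lastModif = lastModif;
    }
}

/** Hypothetical stand-in for the SQL query, so that the sketch runs without a database. */
interface ChangeSource {
    List<TrainRow> modifiedAfter(long mark);
}

/** Grid-originated scheduled refresh driven by a high-water mark. */
class ScheduledRefresher {
    private final ChangeSource source;
    private final Map<String, TrainRow> trainCache; // stands in for the grid cache
    private long highWaterMark;

    ScheduledRefresher(ChangeSource source, Map<String, TrainRow> trainCache, long initialMark) {
        this.source = source;
        this.trainCache = trainCache;
        this.highWaterMark = initialMark;
    }

    /** One refresh cycle; meant to be triggered periodically, e.g. by a ScheduledExecutorService. */
    int refreshOnce() {
        List<TrainRow> rows = source.modifiedAfter(highWaterMark);
        for (TrainRow row : rows) {
            trainCache.put(row.code, row);
            highWaterMark = Math.max(highWaterMark, row.lastModif);
        }
        return rows.size();
    }
}
```

A timestamp high-water mark can miss rows committed late with an older timestamp, one reason to prefer a database-provided marker such as the Oracle System Change Number.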



Relational Database -> Data Grid

Database originated push from the backend DB: JMS = durable subscription (Oracle Database Change Notification, etc.)



In Memory -> prepare for reloading after maintenance operations!

Prepare consistency checkers

Need for “graceful shutdown with disk persistence”



Transactions


We didn’t have time to talk about transactions.

Another session is planned at the Paris NoSQL User Group for this.


Let’s go live!


Data Grids and Operations

• Standard packaging? Do It Yourself (layout, scripts, etc.)

• Limited management: Do It Yourself (stop/start, detecting data loss, etc.)

• Limited debugging tools: Do It Yourself (debugging consoles, troubleshooting agents)

• JVM pandemic: dozens of JVMs to manage!

Dev / Ops collaboration is required

Experts only!


The right tool for the right job

Incredibly fast! Even with transactions!

Scalable

Good at data replication (when it implements it)

Very geeky on both the dev and ops side

“Quite” expensive

Not an enterprise grade data store

Reconciliation API, etc.

Requires very skilled people + change management

If you solve the data loading issue



Questions / Answers