NoSQL & DataGrids from a Developer Perspective Cyrille Le Clerc - Michaël Figuière

GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

GeeCon 2011 : NoSQL and In Memory Data Grids from a developer perspective by Cyrille Le Clerc and Michael Figuière - Xebia

Page 1: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

NoSQL & DataGrids from a Developer Perspective

Cyrille Le Clerc - Michaël Figuière

Page 2: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Speaker

Cyrille Le Clerc

@cyrilleleclerc

blog.xebia.fr

Apache CXF, DataGrids, Large Scale

Page 3: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Speaker

Michaël Figuière

@mfiguiere

blog.xebia.fr

Search Engines, NoSQL, Distributed Systems

Page 4: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

About NoSQL

No SQL

Page 5: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

About NoSQL

No SQL

Not Only

Page 6: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

About NoSQL

No SQL

Relational

Not Only

Page 7: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Once upon a time...

Page 8: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

On the Web side

• Huge amount of data

• High availability

• Fault tolerance

• Scalability on commodity hardware

Similar needs for Web giants:

Amazon
- Created Dynamo
- < 40 min of unavailability per year

Google
- Created BigTable & MapReduce
- Stores every web page of the Internet

Page 9: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Amazon : the birth of Dynamo

Fill cart Checkout Payment Process order Prepare Send

Requires high availability, key-value store is enough

Requires complex requests, temporary unavailability is acceptable

Page 10: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

On the Financial side

• Very low latency

• Rich queries & transactions

• Scalability

• Data consistency

Needs within the financial market:

- Released Coherence in 2001
- Started as a distributed cache

- Released GigaSpaces XAP in 2001
- Routes the request inside the data

Page 11: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Partitioning and Replication

Page 12: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Use Case : Train Ticketing System

With trains, stations, seats, booking and passengers

Page 13: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Store everything in a Mainframe !

Up to 3 TB of RAM ! More than $1,000,000

IBM z11

Page 14: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Partitioning

Split data for scalability

Mainframe

Small servers

Partition gamma

Partition beta

Partition alpha
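A minimal sketch of how a key could be routed to one of the partitions above. The class name `PartitionRouter`, the train codes and the CRC32-modulo scheme are assumptions made for illustration and are not taken from the talk. Production stores usually rely on more elaborate schemes such as consistent hashing, so that adding a server does not move most keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical router: maps a root-entity key to one of the partitions. */
final class PartitionRouter {
    private final List<String> partitions;

    PartitionRouter(List<String> partitions) {
        if (partitions.isEmpty()) {
            throw new IllegalArgumentException("at least one partition is required");
        }
        this.partitions = List.copyOf(partitions);
    }

    /** The same key always lands on the same partition (stable hash modulo partition count). */
    String partitionFor(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        int index = (int) (crc.getValue() % partitions.size()); // getValue() is never negative
        return partitions.get(index);
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(List.of("alpha", "beta", "gamma"));
        // Illustrative train codes, not real data
        for (String trainCode : List.of("TGV-6201", "TGV-6203", "ICE-9554")) {
            System.out.println(trainCode + " -> partition " + router.partitionFor(trainCode));
        }
    }
}
```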

Page 15: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Replication

synchro

Duplicate data for high availability and scalability

Partition alpha

Node 1

Node 2

Node 3

Page 16: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Partitioned Data Modeling

Page 17: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Partitioned Data Modeling

TrainStop: date

TrainStation: code, name

Train: code, type

Seat: number, price

Booking: reduction

Passenger: name

Typical relational data model

Page 18: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Partitioned Data Modeling

TrainStop: date

Seat: number, price

Booking: reduction

Passenger: name

Reference data, duplicated in each partition:

TrainStation: code, name

Root entity of the partitioning-ready entity tree:

Train: code, type

Find the root entity and denormalize

Page 19: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Partitioned Data Modeling

Remove unused data

TrainStop: date

Seat: number, price

Booking: reduction

Passenger: name

booked

TrainStation: code, name

Train: code, type

Page 20: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Partitioned Data Modeling

TrainStop: date

TrainStation: code, name

Seat: number, price, booked

Train: code, type

Sharding ready data structure

Page 21: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Consistency, Availability and Partition Tolerance

Page 22: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Consistency with replicas

Node 1

write to all

read from one

{ "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}

Node 2

Node 3

Node 1

Node 2

Node 3

Page 23: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Consistency with replicas

{ "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}

write to one

read from all

Node 1

Node 2

Node 3

Node 1

Node 2

Node 3

Page 24: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Consistency with replicas

• You can adjust the balance between number of writes and number of reads

• See Eventual Consistency
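The balance between writes and reads is commonly expressed with three numbers: N replicas, W acknowledgements required per write and R replicas consulted per read. A minimal sketch, assuming the usual quorum rule R + W > N and ignoring failure handling; the class `QuorumConfig` is hypothetical and does not belong to any product shown in this talk.

```java
/** Replica settings: n copies, w acknowledgements per write, r replicas consulted per read. */
final class QuorumConfig {
    final int n;
    final int w;
    final int r;

    QuorumConfig(int n, int w, int r) {
        if (n < 1 || w < 1 || r < 1 || w > n || r > n) {
            throw new IllegalArgumentException("expected 1 <= w <= n and 1 <= r <= n");
        }
        this.n = n;
        this.w = w;
        this.r = r;
    }

    /** When r + w > n, every read set overlaps every write set, so a read meets the latest acknowledged write. */
    boolean readsSeeLatestWrite() {
        return r + w > n;
    }

    public static void main(String[] args) {
        System.out.println(new QuorumConfig(3, 3, 1).readsSeeLatestWrite()); // write to all, read from one
        System.out.println(new QuorumConfig(3, 1, 3).readsSeeLatestWrite()); // write to one, read from all
        System.out.println(new QuorumConfig(3, 1, 1).readsSeeLatestWrite()); // fastest, only eventually consistent
    }
}
```

The first two configurations match the "write to all, read from one" and "write to one, read from all" slides; the third trades consistency for latency.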

Page 25: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Consistency with Multiple Data Centers

West Coast

East Coast

{ "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}

{ "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}

Page 26: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Consistency with Multiple Data Centers

{ "name": "Barbie Computer", "price": 20.00, "tags" : [ "doll", "barbie" ]}

{ "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie" ]}

set price to $ 20.00

propagation delay !

West Coast

East Coast

Page 27: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Consistency with Multiple Data Centers

{ "name": "Barbie Computer", "price": 20.00, "tags" : [ "doll", "barbie" ]}

{ "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie", "girl" ]}

set price to $ 20.00

add tag "girl"

reconciliation API needed !

West Coast

East Coast
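One possible shape for such a reconciliation step is a field-by-field three-way merge against the last version both data centers agreed on. This is only a sketch: `FieldReconciler` is a hypothetical helper, documents are modeled as plain maps, and which coast performed which update is assumed for the example. Real stores expose their own conflict-resolution hooks.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

final class FieldReconciler {

    /** Three-way merge: keeps each side's changes relative to the common ancestor 'base'. */
    static Map<String, Object> reconcile(Map<String, Object> base,
                                         Map<String, Object> west,
                                         Map<String, Object> east) {
        Set<String> fields = new HashSet<>(base.keySet());
        fields.addAll(west.keySet());
        fields.addAll(east.keySet());

        Map<String, Object> merged = new HashMap<>();
        for (String field : fields) {
            Object b = base.get(field);
            Object w = west.get(field);
            Object e = east.get(field);
            Object winner;
            if (Objects.equals(w, e) || Objects.equals(e, b)) {
                winner = w; // both sides agree, or only West changed the field
            } else if (Objects.equals(w, b)) {
                winner = e; // only East changed the field
            } else {
                // Both sides changed the same field differently: the application must decide
                throw new IllegalStateException("conflicting updates on field " + field);
            }
            if (winner != null) {
                merged.put(field, winner); // a null winner means the field was removed
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> base = Map.of("name", "Barbie Computer", "price", 15.50,
                                          "tags", List.of("doll", "barbie"));
        Map<String, Object> west = new HashMap<>(base);
        west.put("price", 20.00);                            // set price to $ 20.00
        Map<String, Object> east = new HashMap<>(base);
        east.put("tags", List.of("doll", "barbie", "girl")); // add tag "girl"
        System.out.println(reconcile(base, west, east));
    }
}
```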

Page 28: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Consistency with Multiple Data Centers

{ "name": "Barbie Computer", "price": 20.00, "tags" : [ "doll", "barbie" ]}

{ "name": "Barbie Computer", "price": 15.50, "tags" : [ "doll", "barbie", "girl" ]}

set price to $ 20.00

add tag "girl"

Network partitioning

West Coast

East Coast

Page 29: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data Consistency with Multiple Data Centers

Tokyo

New York

London

Worldwide replication for the financial market

Page 30: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

CAP Theorem

Consistency

Availability

Partition Tolerance

Only 2 of these 3 properties can be achieved at the same time in a distributed storage system

Page 31: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

CAP Theorem

Impossible

Relational DB

NoSQL DB

Consistency

Availability

Partition Tolerance

Page 32: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Data models & APIs

Page 33: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Request Driven Data Modeling

• Relational data modeling is business driven

• With partitioning, data modeling had to be adapted for requests

• NoSQL & DataGrids data modeling is request driven

Adaptation to requests comes with tuning

Because network latency matters

Serving two different requests may require storing the same data twice

Page 34: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Key-Value Store

In memory

Persistent

In memory with async persistence

Page 35: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Example with a user profile

johndoe User profile as byte[]

Similar to a Java HashMap
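The HashMap analogy can be made concrete with a few lines. A minimal in-memory sketch, assuming opaque `byte[]` values as on the slide; `InMemoryKeyValueStore` is a hypothetical class, not the API of Riak or of any other store mentioned here, and it has no partitioning, replication or persistence.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single-node store: the value is an opaque byte[] the store never inspects. */
final class InMemoryKeyValueStore {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    void put(String key, byte[] value) {
        data.put(key, value.clone()); // defensive copy so callers cannot mutate stored bytes
    }

    Optional<byte[]> get(String key) {
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    public static void main(String[] args) {
        InMemoryKeyValueStore store = new InMemoryKeyValueStore();
        // Stand-in for a serialized user profile
        byte[] profile = "{\"login\":\"johndoe\"}".getBytes(StandardCharsets.UTF_8);
        store.put("johndoe", profile);
        store.get("johndoe")
             .ifPresent(bytes -> System.out.println(new String(bytes, StandardCharsets.UTF_8)));
    }
}
```

Because the store cannot look inside the value, any query other than "get by key" has to be handled by the application.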

Page 36: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Write Example with Riak

RiakClient riak = new RiakClient("http://server1:8098/riak");

RiakObject userProfileObj = new RiakObject("bucket", "johndoe", serializer.serialize(userProfile));

riak.store(userProfileObj);

Inserts a user profile into Riak

Page 37: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Read Example with Riak

FetchResponse response = riak.fetch("bucket", "johndoe");

if (response.hasObject()) {

userProfileObj = response.getObject();

}

Fetch a user profile using its key in Riak

Page 38: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Column Families Store

Page 39: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Column Families Store

Relational DB Column families DB

For each Row ID we have a list of key-value pairs

Key-value pairs are sorted by keys

Page 40: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Example with a shopping cart

johndoe: 17:21 Iphone | 17:32 DVD Player | 17:44 MacBook

willsmith: 6:10 Camera | 8:29 Ipad

pitdavis: 14:45 PlayStation | 15:01 Asus EEE | 15:03 Iphone
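The shopping cart rows above can be pictured as a map of sorted maps: one sorted list of columns per row ID. The sketch below is an in-memory illustration only; `ColumnFamily` is a hypothetical class and not the Cassandra or Hector API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory column family: row ID -> columns sorted by column name. */
final class ColumnFamily {
    private final Map<String, TreeMap<String, String>> rows = new TreeMap<>();

    void insert(String rowId, String columnName, String value) {
        rows.computeIfAbsent(rowId, id -> new TreeMap<>()).put(columnName, value);
    }

    /** Returns up to 'count' columns of the row, in column-name order, starting at 'start' (inclusive). */
    List<Map.Entry<String, String>> slice(String rowId, String start, int count) {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        TreeMap<String, String> columns = rows.get(rowId);
        if (columns == null) {
            return result;
        }
        for (Map.Entry<String, String> column : columns.tailMap(start, true).entrySet()) {
            if (result.size() == count) {
                break;
            }
            result.add(Map.entry(column.getKey(), column.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        ColumnFamily shoppingCart = new ColumnFamily();
        shoppingCart.insert("johndoe", "17:21", "Iphone");
        shoppingCart.insert("johndoe", "17:32", "DVD Player");
        shoppingCart.insert("johndoe", "17:44", "MacBook");
        // Everything johndoe added from 17:30 onwards
        System.out.println(shoppingCart.slice("johndoe", "17:30", 10));
    }
}
```

Column names are compared as strings here, so times would need zero padding ("06:10" rather than "6:10") to sort chronologically.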

Page 41: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Write Example with Cassandra

Cluster cluster = HFactory.getOrCreateCluster("cluster", new CassandraHostConfigurator("server1:9160"));

Keyspace keyspace = HFactory.createKeyspace("EcommerceKeyspace", cluster);

Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);

mutator.insert("johndoe", "ShoppingCartColumnFamily", HFactory.createStringColumn("14:21", "Iphone"));

Inserts a column into the ShoppingCartColumnFamily

Page 42: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Read Example with Cassandra

SliceQuery<String, String, String> query = HFactory.createSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);

query.setColumnFamily("ShoppingCartColumnFamily") .setKey("johndoe") .setRange("", "", false, 10);

QueryResult<ColumnSlice<String, String>> result = query.execute();

Reads a slice of 10 columns from ShoppingCartColumnFamily

Page 43: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Document Store

Page 44: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Example with an item of a catalog

{ "name": "Iphone", "price": 559.0, "vendor": "Apple", "rating": 4.6, "tags": [ "phone", "touch" ]}

item_1

The database is aware of the document's fields and can offer complex queries

Page 45: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Write Example with MongoDB

Mongo mongo = new Mongo("mongos_1", 27017);
DB db = mongo.getDB("Ecommerce");
DBCollection catalog = db.getCollection("Catalog");

BasicDBObject doc = new BasicDBObject();
doc.put("name", "Iphone");
doc.put("price", 559.0);

catalog.insert(doc);

Inserts an item document into MongoDB

Page 46: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Read Example with MongoDB

BasicDBObject query = new BasicDBObject();
query.put("price", new BasicDBObject("$lt", 600));
DBCursor cursor = catalog.find(query);

while (cursor.hasNext()) {
    System.out.println(cursor.next());
}

Queries for all items with a price lower than 600

Page 47: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

In Memory Data Grids

eXtreme Scale

Page 48: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Example with train booking with IBM eXtremeScale

With Data Grids, sub entities can have cross relations

@Entity(schemaRoot=true)
public class Train {
    @Id String code;
    @Index @Basic String name;
    @OneToMany(cascade=CascadeType.ALL) List<Seat> seats = new ArrayList<Seat>();
    @Version int version;
    ...
}

TrainStop: date

Seat: number, price, booked

Train: code, type

Page 49: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Write Example with IBM eXtreme Scale

void persist(Train train) {
    entityManager.persist(train);
}

Inserts a train into eXtreme Scale

eXtreme Scale provides a JPA Style API

Page 50: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Read Example with IBM eXtreme Scale

/** Find by key */
Train findById(String id) {
    return (Train) entityManager.find(Train.class, id);
}

/** Query Language */
Train findByTrain(String code) {
    Query q = entityManager.createQuery("select t from Train t where t.code=:code");
    q.setParameter("code", code);
    return (Train) q.getSingleResult();
}

Simple and complex queries with eXtreme Scale

Page 51: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

More APIs

• Another Java EE versus Spring battle ? JSR 347 Data Grids vs. Spring Data

Unified API on top of relational, document, column, key-value ?

Object to tuple projection API

Page 52: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions

Page 53: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions

• NoSQL usually means NO transactions

• Except when it means eXtreme Transactions !

Page 54: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions Concurrency

warehouse stocks: 231, 264, 2, 637, 121, 311, 12

Shopping carts:
- canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
- ipad: 1, iphone: 1
- barbie: 1, iphone: 1, cabbage-doll: 1

concurrency on iphone

Place order

cancel order if one product is missing

Page 55: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

SQL Transactions

warehouse stocks: 231, 264, 2, 637, 121, 311, 12

Shopping carts:
- canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
- ipad: 1, iphone: 1
- barbie: 1, iphone: 1, cabbage-doll: 1

lock duration = f(shoppingcart.length)

if too many locks on the rows, then lock table !

begin
for each shoppingCart.product
    select for update ...
    update ...
commit

Place order

Page 56: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

SQL Transactions

warehouse stocks: 231, 264, 2, 637, 121, 311, 12

Shopping carts:
- canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...
- ipad: 1, iphone: 1
- barbie: 1, iphone: 1, cabbage-doll: 1

lock duration = f(shoppingcart.length)

if too many locks on the rows, then lock table !

select for update ...

Place order

Page 58: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions with Manual Compensation

warehouse stocks: 231, 264, 2, 637, 121, 311, 12

code “do”, “undo” and the chain

For each product of the order (-1 on its stock):

DO
if (stock - quantity >= 0) {
    stock = stock - quantity;
} else {
    throw new IllegalStateException("out of stock");
}

UNDO
stock = stock + quantity;

Order: canon-eos: 1, ipod: 1, headphone: 1, iphone: 1, ...

Place order

Page 59: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions with Manual Compensation

warehouse stocks: 231, 264, 0, 637, 121, 311, 12

For each product of the order (-1 on its stock):

DO
if (stock - quantity >= 0) {
    stock = stock - quantity;
} else {
    throw new IllegalStateException("out of stock");
}

UNDO
stock = stock + quantity;

Order: barbie: 1, iphone: 1, cabbage-doll: 1

Place order

Page 60: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions with Manual Compensation

warehouse stocks: 231, 264, 0, 636, 121, 311, 12

For each product of the order (-1 on its stock):

DO
if (stock - quantity >= 0) {
    stock = stock - quantity;
} else {
    throw new IllegalStateException("out of stock");
}

UNDO
stock = stock + quantity;

Order: barbie: 1, iphone: 1, cabbage-doll: 1

Place order

Page 61: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions with Manual Compensation

warehouse stocks: 231, 264, 0, 636, 121, 311, 12

For each product of the order (-1 on its stock):

DO
if (stock - quantity >= 0) {
    stock = stock - quantity;
} else {
    throw new IllegalStateException("out of stock");
}

UNDO
stock = stock + quantity;

no more iphone !

Order: barbie: 1, iphone: 1, cabbage-doll: 1

Place order

Page 62: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions with Manual Compensation

warehouse stocks: 231, 264, 0, 636, 121, 311, 12

For each product of the order (-1 on its stock):

DO
if (stock - quantity >= 0) {
    stock = stock - quantity;
} else {
    throw new IllegalStateException("out of stock");
}

UNDO
stock = stock + quantity;

interrupted

Order: barbie: 1, iphone: 1, cabbage-doll: 1

cancelled

Place order

Page 63: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions with Manual Compensation

warehouse stocks: 231, 264, 0, 636 +1, 121, 311, 12

For each product of the order (-1 on its stock):

DO
if (stock - quantity >= 0) {
    stock = stock - quantity;
} else {
    throw new IllegalStateException("out of stock");
}

UNDO
stock = stock + quantity;

interrupted

Order: barbie: 1, iphone: 1, cabbage-doll: 1

cancelled

undo

Place order

Page 64: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions with Manual Compensation

• Code “do” & “undo” & chain execution

• What about interrupted chain execution ? Data corruption ?

Page 65: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Transactions with Manual Compensation

• Code “do” & “undo” & chain execution

• What about interrupted chain execution ? Data corruption ?

data store managed transaction chain execution
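The "do", "undo" and chain idea can be sketched in a few lines of Java. This is an in-memory illustration under stated assumptions: `CompensatingOrder` is a hypothetical class, the stock values are illustrative, and a plain map stands in for the data store. It does not survive a crash in the middle of the chain, which is exactly the risk raised above and the reason to prefer a chain execution managed by the data store.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class CompensatingOrder {

    /** Decrements the stock of every product in the cart, or restores all stocks and returns false. */
    static boolean placeOrder(Map<String, Integer> stocks, Map<String, Integer> cart) {
        Deque<Runnable> undoChain = new ArrayDeque<>();
        try {
            for (Map.Entry<String, Integer> line : cart.entrySet()) {
                String product = line.getKey();
                int quantity = line.getValue();
                int stock = stocks.getOrDefault(product, 0);
                if (stock - quantity < 0) {
                    throw new IllegalStateException("no more " + product);
                }
                stocks.put(product, stock - quantity);                               // DO
                undoChain.push(() -> stocks.merge(product, quantity, Integer::sum)); // remember UNDO
            }
            return true;
        } catch (IllegalStateException outOfStock) {
            while (!undoChain.isEmpty()) {
                undoChain.pop().run(); // compensate, most recent step first
            }
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> stocks = new HashMap<>();
        stocks.put("barbie", 637);
        stocks.put("iphone", 0);
        stocks.put("cabbage-doll", 12);

        Map<String, Integer> cart = new LinkedHashMap<>(); // keeps the order of the cart lines
        cart.put("barbie", 1);
        cart.put("iphone", 1);
        cart.put("cabbage-doll", 1);

        System.out.println("order placed: " + placeOrder(stocks, cart));
        System.out.println("barbie stock after undo: " + stocks.get("barbie"));
    }
}
```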

Page 66: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Which solution to choose?

Page 67: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Key-Value Store

• Get and Set by key

• Riak and Voldemort provide great scalability

• Memcached and Redis offer low overhead and latency

Simple but enough for a lot of use cases

Great to persist continuously growing datasets

Great for cache and live data

Page 68: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Column Families Store

• Get and Set by key of a list of columns

• Queries are simple, but fetching a slice of columns is possible

• The data model is too low level for many complex modeling needs

Makes it possible to fetch and update partial data

Great for pagination

Should typically be used for the largest scalability needs

Page 69: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Document Store

• Schemaless

• Complex queries are available

• Scalability may be limited if not querying using partition key

Great for continuously updated schemas

Necessary for filtering and search

Can be handled using multiple storage systems and limited queries

Page 70: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

In Memory Data Grid

• Very Low Latency & eXtreme Transaction Processing (XTP)

• In Memory - No Persistence

• High budget and Developer skills required

Investment banking, booking & inventory systems

Most of the time backed with a database

Some Open Source alternatives are appearing

Page 71: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Polyglot storage for eCommerce

Application

Solr

MongoDB

Cassandra

Coherence

Products search

Warehouse inventory

Product catalog

User account and Shopping cart

Page 72: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Why do NoSQL & DataGrids matter ?

• Polyglot Storage: databases that fit the needs of every type of data

• Linear Scalability: being able to handle any further business requirements

• High Availability: multi-server and multi-datacenter

• Elasticity: natural integration with Cloud Computing philosophy

• Some new use cases now available

Page 73: GeeCon 2011 - NoSQL and In Memory Data Grids from a developer perspective

Questions / Answers

?