DATA ACCESS FOR MODERN APPLICATIONS

James Williams, VMware
Costin Leau (@costinl), VMware

© 2011 SpringOne 2GX. All rights reserved. Do not distribute without permission.

Agenda

•  GemFire overview

•  GemFire common patterns and usage

•  GemFire demo

•  GemFire and Spring eco-system

•  The Big Picture – NOSQL & Spring Data

About Costin

•  Spring committer since 2006

•  Involved in:
   –  Spring Framework (cache abstraction, JPA, @Bean, etc.)
   –  Pitchfork (Spring-based EJB 3 support in WebLogic)
   –  Spring OSGi
   –  Spring Data
   –  Spring GemFire
   –  Spring Hadoop

About James

•  Ex-JBoss SE (AKA Open Source Mercenary)

•  Left JBoss to build a product based on Spring/Tomcat

•  Two glorious years at VMware

Numbers everyone should know

Source: Jeff Dean

Challenge #1 – Transaction Processing

•  Colocation
   –  Store reference data alongside operational data
   –  Most distributed transactions can be localized
   –  In-process data access is fast and reliable

•  Data Partitions
   –  Partition operational data across many servers
   –  Clients should be partition-unaware
   –  Distributed transactions are evil
   –  Compensating transactions are a necessary evil

The Enterprise Data Fabric Way

GemFire helps customers access data at in-memory speed without compromising consistency and availability while providing an acceptable level of partition tolerance.

The CAP Dilemma

[Figure: the CAP triangle (C, A, P). Labels: highly consistent and available; fast, scalable access to data; handle network splits; relaxed partition tolerance.]

Reference Architecture – Bird’s Eye

[Figure: a distributed system with a client, two locators, a primary server, backup servers, a database, and big data stores.]

Reference Architecture – Client

[Figure: client-side components. Labels: servlet container, WAR, Spring, GemFire client, cache, connection pool, region.]

Reference Architecture – Server

[Figure: server-side components. Labels: servlet container, WAR, Spring, GemFire server, cache, region, data queue.]

Reference Architecture – Locator

[Figure, shown in three builds. Labels: JVM, locator, servers, coordinator, client.]

Reference Architecture – Subscriptions

[Figure. Labels: queues, JVM, locator, servers, coordinator, client.]

Reference Architecture – Two Hop

[Figure. Labels: JVM, locator, servers, coordinator, client.]

Scale Without Compromise

[Figure: three servers, each with its own disk (share-nothing persistence). Labels: Customers (partition), Products (replicate), local transaction.]

Scale in action – More Customers!

[Figure, shown in two builds: more servers join the cluster. Labels: Customers (partition), Products (replicate).]

Scale in action – Rebalance Partitions

[Figure, shown in several builds: partitions are rebalanced across the servers. Labels: Customers (partition), Products (replicate).]

Need for Speed

•  Eliminate object-to-relational impedance for the application
•  Store data in memory
•  Simplify data access patterns for the application
•  Provide auto-update functionality to the application
•  Execute data-intensive operations in situ

[Figure: the distributed system (client, locators, primary server, backup servers, database), annotated with the points below.]

•  No ORM overhead
•  Proactive updates
•  Automatic load balancing

•  Memory storage, disk recovery/overflow
•  Cluster balances data in buckets
•  Support for large JVM heaps
•  Execute business logic in the grid

•  Write-behind to the system of record

Highly Consistent and Available

•  Consistency
   –  within the distributed system
   –  with the system of record
   –  localized transactions

•  Availability
   –  the system of record can go down
   –  any node can fail
   –  replication between distributed systems can fail

[Figure: the distributed system (client, locators, primary server, backup servers, database), annotated with the points below.]

•  Detects servers that go down
•  Detects new servers
•  Re-routes client connections

•  All writes handled by the primary
•  Sync to backups
•  Support for JTA

•  Write-through to the system of record

•  Cluster unaware
•  Declarative transactions
•  Proactive updates

Acceptable Partition Tolerance

[Figure, shown in two builds: a distributed system with a client, two locators, and a database, whose six servers are split into a group of three "Winners" and a group of three "Losers".]

•  Re-establishes backups
•  Detects split brain
•  Re-routes clients to survivors

•  Multiple locators
•  Cluster unaware

•  Queue write-behind

Long Distance Challenge

[Figure: two distributed systems, London and New York, each with two locators, a primary server, and backup servers.]

•  Both ends can be active*

•  Pass the book is common

•  Optimized for slow WAN

•  Highly available

•  Resilient

Use Case – Wall Street

•  Monte Carlo simulations
•  Regulations
•  Scalability

Use Case – Online Travel

•  Hotel, airline and vacation package data
•  Direct correlation between data latency and revenue
•  Support for both C# and Java

http://www.flickr.com/photos/72213316@N00/5972487340/sizes/m/in/photostream/

Two Data Points

•  Upper bound: 9.4 TB
•  Typical: 500 GB

Sizing Guidelines

•  How much data?
•  How many backups?
•  How many clients?

Reality Check

•  Focus on hot data
•  Denormalize
•  Synchronize

Demo

GemFire HOWTO

Spring GemFire

Top-Level Spring project
•  Build Spring-powered, highly available / highly scalable applications using GemFire as a distributed data management platform

Full access to the GemFire API
•  Easy declarative DI-style configuration (with Spring-backed wiring and namespace support)
•  Cache lifecycle and instance support
•  Exception translation to Spring’s portable DataAccessException hierarchy
•  Template and callback support
•  Transaction management support

Example: Building an Online Store...

Product Data Region

GemFire cache.xml:
<region name="Products">
  <region-attributes />
</region>

Spring GemFire namespace:
<gfe:region name="products"/>

Replicated Product Data Region

GemFire cache.xml:
<region name="Products">
  <region-attributes data-policy="replicate"/>
</region>

Spring GemFire namespace:
<gfe:replicated-region name="products"/>

Let's start by replicating the product data, which we need everywhere and which is limited in size.

[Figure: three GemFire servers.]

Growing in size - partitioning

GemFire cache.xml:
<region name="Orders" refid="PARTITION" />

Spring GemFire namespace:
<gfe:partitioned-region name="orders"/>

[Figure: four GemFire servers.]
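The partitioning idea can be sketched in plain Java. This is only an illustration under stated assumptions: the class name `BucketRouter`, the bucket count, and the round-robin bucket-to-server assignment are hypothetical and are not the GemFire API or its actual balancing algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative and not part of the GemFire API.
final class BucketRouter {
    private final int bucketCount;
    private final List<String> servers;

    BucketRouter(int bucketCount, List<String> servers) {
        if (bucketCount <= 0 || servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one bucket and one server");
        }
        this.bucketCount = bucketCount;
        this.servers = new ArrayList<>(servers);
    }

    // Map a key to a bucket; floorMod keeps the result non-negative.
    int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    // Assign buckets to servers round-robin (an assumption; a real cluster balances by load).
    String serverFor(Object key) {
        return servers.get(bucketFor(key) % servers.size());
    }
}

final class BucketRouterDemo {
    public static void main(String[] args) {
        BucketRouter router =
                new BucketRouter(113, List.of("server-1", "server-2", "server-3"));
        for (String orderId : List.of("order-1", "order-2", "order-3")) {
            System.out.println(orderId + " -> bucket " + router.bucketFor(orderId)
                    + " on " + router.serverFor(orderId));
        }
    }
}
```

In this sketch a key's bucket never depends on the server list, so adding servers only changes which server hosts a bucket, not which bucket a key belongs to.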

Partitioning with replicas (HA)

GemFire cache.xml:
<region name="Orders">
  <region-attributes>
    <partition-attributes redundant-copies="1" />
  </region-attributes>
</region>

Spring GemFire namespace:
<gfe:partitioned-region name="orders" copies="1" />

[Figure: two GemFire servers.]

Colocating Data

GemFire cache.xml:
<region name="Customers" refid="PARTITION">
  <region-attributes>
    <partition-attributes colocated-with="Orders" redundant-copies="1" />
  </region-attributes>
</region>

Spring GemFire namespace:
<gfe:partitioned-region name="Customers" copies="1" colocated-with="Orders"/>

Customer objects are colocated with their corresponding Order objects.

[Figure: three GemFire servers.]
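Why colocation keeps a customer and its orders together can be sketched as follows. The sketch assumes that both regions choose a bucket from the same routing value (the customer ID); the types `Customer` and `Order`, the bucket count, and all method names are hypothetical and not the GemFire API.

```java
// Hypothetical sketch: names are illustrative and not part of the GemFire API.
final class ColocationSketch {
    static final int BUCKET_COUNT = 16;

    static final class Customer {
        final String customerId;

        Customer(String customerId) {
            this.customerId = customerId;
        }
    }

    static final class Order {
        final String orderId;
        final String customerId;

        Order(String orderId, String customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }
    }

    // Both regions hash the same routing value (the customer ID), not their own keys.
    static int bucketForRoutingValue(String customerId) {
        return Math.floorMod(customerId.hashCode(), BUCKET_COUNT);
    }

    static int bucketFor(Customer customer) {
        return bucketForRoutingValue(customer.customerId);
    }

    static int bucketFor(Order order) {
        return bucketForRoutingValue(order.customerId);
    }

    public static void main(String[] args) {
        Customer customer = new Customer("c-42");
        Order first = new Order("o-1", "c-42");
        Order second = new Order("o-2", "c-42");
        System.out.println("customer bucket: " + bucketFor(customer));
        System.out.println("order buckets:   " + bucketFor(first) + ", " + bucketFor(second));
    }
}
```

Because every order of a customer lands in the customer's bucket, a transaction touching that customer and its orders can stay on one server.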

Moving the code (not the data) around

§  Execute the logic where the data lives
   –  Avoids network traffic and inconsistencies
   –  Reduces data fragmentation
   –  Increases data colocation (stickiness)

[Figure: a GemFire client calls FunctionService.onServers(…).execute(…); each of two GemFire servers runs execute() and returns a result, which the client collects with ResultCollector.getResult().]
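The scatter/gather shape of function execution can be sketched without GemFire. In this illustration, which is an assumption-laden stand-in rather than the FunctionService API, each inner list plays the role of the data held by one server, the function runs against each list separately, and only the small per-server results travel back to the caller. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: each inner list stands in for the data held by one server.
final class ScatterGatherSketch {

    // Scatter: run the function against every partition. Gather: one result per partition.
    static <R> List<R> executeOnAll(List<List<Double>> partitions,
                                    Function<List<Double>, R> function) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (List<Double> partition : partitions) {
                Callable<R> task = () -> function.apply(partition);
                pending.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while gathering results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition-local function failed", e);
        } finally {
            pool.shutdown();
        }
    }

    // Sum order prices per partition, then combine the partial sums on the caller side.
    static double totalOrderValue(List<List<Double>> partitions) {
        List<Double> partialSums = executeOnAll(partitions,
                prices -> prices.stream().mapToDouble(Double::doubleValue).sum());
        return partialSums.stream().mapToDouble(Double::doubleValue).sum();
    }

    public static void main(String[] args) {
        List<List<Double>> partitions = List.of(
                List.of(120.00, 80.00),
                List.of(45.50),
                List.of(200.00, 10.00));
        System.out.println("total order value: " + totalOrderValue(partitions));
    }
}
```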

Tracking data changes – Continuous Queries

§  Excellent for real-time data querying
§  CQs are registered on primary and secondary servers, and server failover is performed without any interruption to CQ messaging
§  Durable CQs are possible

Registering the query:

// Classes are from com.gemstone.gemfire.cache.query
CqAttributesFactory cqf = new CqAttributesFactory();
cqf.addCqListener(new OrderEventListener());
CqAttributes cqa = cqf.create();
String cqName = "orderTracker";
String queryStr = "SELECT * FROM /Orders o WHERE o.price > 100.00";
CqQuery orderTracker = queryService.newCq(cqName, queryStr, cqa);
orderTracker.execute();

Handling events in the listener:

public void onEvent(CqEvent cqEvent) {
    Object key = cqEvent.getKey();
    Order order = (Order) cqEvent.getNewValue();
}

[Figure: a GemFire client performing put("Foo") on a partitioned region.]
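The behaviour of a continuous query can be sketched in plain Java: a predicate is registered once, and every later write that matches it is pushed to a listener. This is a simplified, single-process stand-in with hypothetical names, not the GemFire CQ API; it ignores updates that stop matching, removals, and failover.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical sketch of the continuous-query idea; not the GemFire API.
final class ContinuousQuerySketch<K, V> {

    // One registered query: a predicate plus the listener to notify on a match.
    private static final class Registration<K, V> {
        final Predicate<V> predicate;
        final BiConsumer<K, V> listener;

        Registration(Predicate<V> predicate, BiConsumer<K, V> listener) {
            this.predicate = predicate;
            this.listener = listener;
        }
    }

    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final List<Registration<K, V>> registrations = new CopyOnWriteArrayList<>();

    void register(Predicate<V> predicate, BiConsumer<K, V> listener) {
        registrations.add(new Registration<>(predicate, listener));
    }

    // Store the value, then push it to every listener whose predicate matches.
    void put(K key, V value) {
        entries.put(key, value);
        for (Registration<K, V> registration : registrations) {
            if (registration.predicate.test(value)) {
                registration.listener.accept(key, value);
            }
        }
    }

    public static void main(String[] args) {
        ContinuousQuerySketch<String, Double> orders = new ContinuousQuerySketch<>();
        orders.register(price -> price > 100.00,
                (key, price) -> System.out.println("match: " + key + " at " + price));
        orders.put("o-1", 120.00);
        orders.put("o-2", 80.00);
    }
}
```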

CQs – Spring version

<gfe:cq-listener-container>
  <gfe:listener ref="listener" query="SELECT * FROM /Orders o WHERE o.price > 100.00" method="match"/>
</gfe:cq-listener-container>

<bean id="listener" class="com.foo.PriceMatcher"/>

class PriceMatcher {
    void match(Object price) { … }
}

Data Modeling in GemFire

Top Down
•  Develop a Java object model
•  Use JSR 303 based constraints to maintain integrity
•  Support for any level of object graph depth

Bottom Up
•  Reverse engineer the DB schema via Spring Roo
•  Object-to-relational map can be highly denormalized

Data Format
•  Serialized, GemFire's own, or anything you like(!)
•  Supports Java, C#, C++

Data Access

•  Multiple ways to access data
   –  JDBC (“direct” access)
   –  ORM (JPA, JDO)
   –  Cache/Key-Value

•  Each app has its own “best” way
   –  Set-based – JDBC
   –  Mix of set and identity – ORM
   –  Identity – Cache

•  Using the wrong approach kills performance (and likely the app)
   –  N+1 problem

•  Don’t be afraid to mix and match

Data Granularity

•  Important to figure out what to cache

•  Related to your data access pattern
   –  JDBC
      •  Set-based (Result Set)
   –  ORM-based
      •  OO (data model)
   –  RDBMS
      •  Table-like (normalized)

•  Pay attention to the identity
   –  Commonly used to “break” down objects
   –  Enables lazy loading

GemFire DA – Identity Based

§  customerRegion.put(key, customer)
§  customerRegion.get(key)

§  customerRegion.create(key, customer)
§  customerRegion.replace(key, customer)
§  customerRegion.remove(key)

§  customerRegion.query("WHERE …")
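The difference between creating, updating, and overwriting an entry by key can be sketched with a `ConcurrentMap` standing in for the customer region. This is an assumption made for illustration: the helper names `create` and `update` are hypothetical, and the real region operations may report an existing or missing entry differently (for example by throwing an exception).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: a ConcurrentMap stands in for a region keyed by customer ID.
final class IdentityAccessSketch {

    // Create only if the key is absent; returns true when the entry was created.
    static <K, V> boolean create(ConcurrentMap<K, V> region, K key, V value) {
        return region.putIfAbsent(key, value) == null;
    }

    // Update only if the key is present; returns true when the entry was replaced.
    static <K, V> boolean update(ConcurrentMap<K, V> region, K key, V value) {
        return region.replace(key, value) != null;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> customerRegion = new ConcurrentHashMap<>();
        System.out.println("create c-1: " + create(customerRegion, "c-1", "Alice"));
        System.out.println("create c-1 again: " + create(customerRegion, "c-1", "Bob"));
        System.out.println("update c-1: " + update(customerRegion, "c-1", "Alice Smith"));
        System.out.println("update c-2: " + update(customerRegion, "c-2", "Carol"));
        customerRegion.put("c-2", "Carol");   // put creates or overwrites
        customerRegion.remove("c-1");         // remove by identity
        System.out.println("region now: " + customerRegion);
    }
}
```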

GemFire DA – Set/ORM

•  Full support for OQL
•  Run scatter/gather queries across partitions
•  Indexes
•  Continuous Queries

Spring 3.1 cache abstraction

<gfe:cache id="gemfire-cache" />

<bean id="cacheManager"
      class="org.springframework.data.gemfire.support.GemfireCacheManager"
      p:cache-ref="gemfire-cache"/>

@Cacheable("books")
public Book findBook(ISBN isbn)

@Cacheable(value="book", key="#isbn.rawNumber")
public Book findBook(ISBN isbn, boolean includeUsed)

@Cacheable(value="book", key="T(someType).hash(#isbn)")
public Book findBook(ISBN isbn)
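What a cacheable method amounts to can be sketched without Spring: the first call for a key runs the expensive lookup, and later calls with the same key are served from the cache. The class `CachingFinder` and its loader function are hypothetical; the sketch does not show how Spring implements the annotation, and it leaves out eviction and key expressions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of cache-on-read; it does not use or mirror Spring's internals.
final class CachingFinder<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    CachingFinder(Function<K, V> loader) {
        this.loader = loader;
    }

    // Return the cached value, invoking the loader only when the key is not cached yet.
    V find(K key) {
        return cache.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        CachingFinder<String, String> books = new CachingFinder<>(isbn -> {
            System.out.println("loading " + isbn + " from the slow store");
            return "Book " + isbn;
        });
        System.out.println(books.find("978-0"));
        System.out.println(books.find("978-0")); // served from the cache, no second load
    }
}
```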

The Big Picture

New demands on data access

Challenge #1 – Scale Horizontally

•  Why?
   –  Data volumes are increasing 60% each year
   –  Data use varies widely
      •  Mobile
      •  Browser
      •  Data exchange via messaging / SOA

•  Database under duress
   –  Horizontal sharding of data is external to the RDBMS
   –  Traditional RDBMS scaling is vertical, not horizontal
   –  Database replication is expensive and difficult

Challenge #2 – Heterogeneous Data Access

•  Business needs have changed
   –  ACID semantics are not needed for all use cases
   –  BASE semantics are a viable option
      •  Online banking = ACID
      •  Facebook updates = BASE

•  Data has changed
   –  We store a lot more than text data
   –  Distributed applications mean distributed data
   –  Speed is king, scale is queen
   –  Consistency is relative

NOSQL?

NoSQL offers several data store categories

•  Column
•  Key-Value
•  Document
•  Graph

Data Model

•  Key-Value
   –  Memcached, Membase, Redis, Riak, Voldemort
   –  Some are ‘Amazon Dynamo inspired’

•  Column-Family
   –  HBase, Cassandra
   –  Persistent multidimensional sorted map
   –  Google ‘Bigtable’ inspired

•  Document
   –  MongoDB, CouchDB, Riak
   –  Collections containing semi-structured data (JSON/BSON/XML?)

•  Graph
   –  Neo4j, Sones, InfiniteGraph
   –  Edges and nodes with properties

•  OO-DB, XML-DB

Spring Data

•  Challenge
   –  Proliferation of data
   –  Complexity of data
   –  Won’t all go into relational databases

•  NOSQL = Not Only SQL
•  Opportunity for Spring to provide solutions
•  Spring Data support for new data stores
•  Builds upon existing features in Spring
   –  MVC Framework, Type Conversion, Caching, Portable Data Access Exceptions
   –  Spring Batch, Spring Integration

Spring Framework built-in DA support

•  Transaction abstractions
•  Common data access exception hierarchy
•  JDBC - JdbcTemplate
•  ORM - Hibernate, JPA support
•  OXM - Object to XML mapping
•  Serializer/Deserializer strategies (Spring 3.0)
•  Cache support (Spring 3.1)

Break-down Big Data

•  Leverage existing infrastructure
   –  Spring Integration
   –  Spring Batch

•  Easy ETL between environments
   –  Watch incoming data
   –  Trigger/schedule jobs
   –  Process flat, CSV, XML, ZIP files
   –  Chunk/Partition/Retry
   –  QoS/Monitoring/Audit

Spring Data Projects

§ Data Commons
   §  Polyglot persistence
§ Data Key-Value
   §  Redis, Riak
§ Data Document
   §  MongoDB, CouchDB
§ Data Graph
   §  Neo4j
§ BigData (Hadoop/Hive)
§ Data Repository
   §  JPA, Mapping

§ Planned
   §  Guidance Docs
      §  The big picture
   §  Data Column
      §  Cassandra, HBase
   §  Blob storage
      §  Amazon, Atmos, Azure
   §  SQL - Generic DAOs
   §  Grails/Roo support

Finding Spring Data

• GitHub: https://github.com/SpringSource
• Web page: http://www.springsource.org/spring-data
• Forum: http://forum.springsource.org/forumdisplay.php?f=80

Thank you! http://blog.springsource.org

twitter: @costinl
