GigaSpaces Data Caching / Data Grid overview August 2009

Giga Spaces Data Grid / Data Caching Overview


DESCRIPTION

An overview of the Data Grid and Data Caching capabilities of GigaSpaces XAP, including topologies and patterns of use.


Page 1: Giga Spaces Data Grid / Data Caching Overview

GigaSpaces Data Caching / Data Grid overview

August 2009

Page 2: Giga Spaces Data Grid / Data Caching Overview

Scaling Up Your Database by Adding a Data Grid

Page 3: Giga Spaces Data Grid / Data Caching Overview

Scaling Up Your Database by Adding a Data Grid

• To scale up your database, use the IMDG directly

• On the backend, the IMDG persists the data to your database using your existing Hibernate O/R mapping

• Hibernate is used by the IMDG

• The application uses the native IMDG API, an object/SQL API very similar to Hibernate

• Gain the full power of the IMDG

• Good for both write and read scenarios

Page 4: Giga Spaces Data Grid / Data Caching Overview

Benefits of using GigaSpaces as the system of record

• Decreasing database load through partitioning and data distribution - enables higher data volumes and higher throughput with low latency

• Better decoupling between your application and the database - no need to hard-wire Hibernate and database concepts into your code and runtime environment

• Event-driven model enables notifications when data is modified

• Database access can be synchronous or asynchronous - the GigaSpaces Mirror Service allows data to be persisted to the database asynchronously, without a performance penalty

Page 5: Giga Spaces Data Grid / Data Caching Overview

IMDG Access support

• Main Features:

– Direct persistency (Write/Read Through)

– Asynchronous Reliable persistency (Write Behind)

– Fast Data load once IMDG started

– Lazy load in case of a cache miss

– Delegating IMDG SQL Queries to database

– Advanced Hibernate and nHibernate integration

– Java, C++ and .NET object persistency

– Custom persistency support
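The read-through and lazy-load features listed above can be pictured with a small plain-Java sketch. It does not use the GigaSpaces API: `ReadThroughCache` and its `databaseLoader` function are hypothetical names that stand in for the IMDG and for a database lookup, and the sketch assumes a miss is resolved by loading the single missing entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache: illustrates the pattern only, not the GigaSpaces API.
class ReadThroughCache<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final Function<K, V> databaseLoader; // stands in for a database lookup
    private int misses = 0;

    ReadThroughCache(Function<K, V> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    // Serve from memory; on a miss, load from the backing store and keep the result.
    synchronized V read(K key) {
        V value = memory.get(key);
        if (value == null) {
            misses++;
            value = databaseLoader.apply(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }

    // Number of reads that had to go to the backing store.
    synchronized int misses() {
        return misses;
    }
}
```

Only the first read of a key reaches the loader; later reads of the same key are served from memory.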

Page 6: Giga Spaces Data Grid / Data Caching Overview

Step 2: Access data via IMDG SQL Queries

• Supported Options and Queries

– Operations: =, <>, <, >, >=, <=, [NOT] like, is [NOT] null, IN

– GROUP BY – performs DISTINCT on the POJO properties

– ORDER BY (ASC | DESC)

    SQLQuery rquery = new SQLQuery(MyPojo.class, "firstName rlike '(a|c).*' or ago > 0 and lastName rlike '(d|k).*'");
    Object[] result = space.readMultiple(rquery);

• Dynamic Query Support

    SQLQuery query = new SQLQuery(MyClass.class, "firstName = ? or lastName = ? and ago > ?");
    query.setParameters("david", "lee", 50);

• Supported Options via JDBC API

– COUNT, MAX, MIN, SUM, AVG, DISTINCT, Blob and Clob, rownum, sysdate, table aliases

– Join with 2 tables

• Not Supported

– HAVING, VIEW, TRIGGERS, EXISTS, BETWEEN, NOT, CREATE USER, GRANT, REVOKE, SET PASSWORD, CONNECT USER, ON

– NOT NULL, IDENTITY, UNIQUE, PRIMARY KEY, Foreign Key/REFERENCES, NO ACTION, CASCADE, SET NULL, SET DEFAULT, CHECK

– UNION, MINUS, UNION ALL

– STDEV, STDEVP, VAR, VARP, FIRST, LAST

– LEFT, RIGHT [INNER] or [OUTER] JOIN
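The dynamic query above mixes `or` and `and` without parentheses. As a sketch only, and assuming the IMDG follows the usual SQL rule that `and` binds more tightly than `or`, the bound query would behave like the plain-Java predicate below. `PersonEntry` and `DynamicQuerySketch` are hypothetical names and are not part of the GigaSpaces API.

```java
import java.util.function.Predicate;

// Hypothetical POJO with the three properties used by the query on the slide.
class PersonEntry {
    final String firstName;
    final String lastName;
    final int ago;

    PersonEntry(String firstName, String lastName, int ago) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.ago = ago;
    }
}

class DynamicQuerySketch {
    // Mirrors "firstName = ? or lastName = ? and ago > ?",
    // read as firstName = ? OR (lastName = ? AND ago > ?).
    static Predicate<PersonEntry> bind(String firstName, String lastName, int ago) {
        Predicate<PersonEntry> first = p -> p.firstName.equals(firstName);
        Predicate<PersonEntry> last = p -> p.lastName.equals(lastName);
        Predicate<PersonEntry> older = p -> p.ago > ago;
        return first.or(last.and(older));
    }
}
```

With the parameters `("david", "lee", 50)`, an entry named david matches regardless of `ago`, while an entry with last name lee matches only when `ago` exceeds 50. If a different grouping is intended, explicit parentheses in the query string remove the ambiguity.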

Page 7: Giga Spaces Data Grid / Data Caching Overview

GigaSpaces In-Memory-Data-Grid

Page 8: Giga Spaces Data Grid / Data Caching Overview

The IMDG – Runtime Modes – Embedded

• An IMDG (space) instance that runs within the application memory address space

• Accessed by reference without going through network or serialization calls

• Most efficient configuration mode

• Used as the primary space configuration setup

[Diagram labels: Virtual Machine, Client Application, C++]

Page 9: Giga Spaces Data Grid / Data Caching Overview

The IMDG – Runtime Modes – Remote

• Accessing a remote space involves network calls and serialization/de-serialization of the cached objects between the client and the space process

• Used only in cases where:

– The client application cannot run an embedded space (due to memory capacity limitations, etc.)

– A large number of concurrent updates are made to the same cached object by different remote processes

[Diagram labels: three Virtual Machines, two Client Applications, two C++ clients]

Page 10: Giga Spaces Data Grid / Data Caching Overview

The IMDG – Runtime Modes – Master-Local Cache

• A local ‘cache’

– Embedded with a client

– Set of cached objects is a snapshot

– No additional objects get added to the local space unless new queries are made

– Writes should be made on the master only

• Use when

– Many distributed clients

– Accessing the same space

– Read-mostly

[Diagram labels: Master Space, Client, Client Application]

Page 11: Giga Spaces Data Grid / Data Caching Overview

The IMDG – Runtime Modes – Master-Local View

• A local 'View'

– Embedded with a client

– Contains updated and changing results based on a client specified query

• Used when

– Clients want to get a streaming view of a subset of the 'main' space

• Writes can be made to the view

– Contains a proxy to the master

[Diagram labels: Master Space, Client, Client Application]

Page 12: Giga Spaces Data Grid / Data Caching Overview

The IMDG – Runtime Modes – Persistent

• Stores data both in memory and on disk in a relational database

• Can use a custom mapping or the built-in Hibernate/nHibernate plug-in

[Diagram labels: Virtual Machine, Database]

Page 13: Giga Spaces Data Grid / Data Caching Overview

Asynchronous Reliable Persistency - Write Behind

• The most common architecture

• Database is out of the critical path of the transaction

• IMDG operations and data are delegated to the database in a reliable, consistent manner

• Supports read and write scenarios

[Diagram labels: Feeder, Primary, Backup, Sync Replication, Mirror, Async-Replication, Bulk, Initial Load, Hibernate, Virtual Machine]
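A minimal plain-Java sketch of the write-behind idea: writes update memory and are queued, and a separate step applies the queued changes to the database in bulk. `WriteBehindStore` is a hypothetical class, the `database` map stands in for a real database, and the sketch deliberately leaves out the replication that makes the real mechanism reliable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical write-behind store: illustrates the pattern only, not the Mirror Service.
class WriteBehindStore {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<String[]> pending = new LinkedBlockingQueue<>();
    private final Map<String, String> database; // stands in for the real database

    WriteBehindStore(Map<String, String> database) {
        this.database = database;
    }

    // Fast path: update memory, queue the change, return without touching the database.
    void write(String key, String value) {
        memory.put(key, value);
        pending.add(new String[] {key, value});
    }

    String read(String key) {
        return memory.get(key);
    }

    // Meant to be called by a background thread: apply queued changes in order, as one bulk.
    int flush() {
        List<String[]> batch = new ArrayList<>();
        pending.drainTo(batch);
        for (String[] change : batch) {
            database.put(change[0], change[1]);
        }
        return batch.size();
    }
}
```

Until `flush()` runs, the database lags behind memory, which is the trade-off that keeps it out of the critical path of each write.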

Page 14: Giga Spaces Data Grid / Data Caching Overview

The Initial Load – Fast Data load from the Database

[Diagram: three partitions, each with a primary and a backup linked by replication and hosted in GSCs; the initial loads use Query 1, Query 2 and Query 3]
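One way to picture a per-partition initial load is that each primary loads only the rows that route to it. The sketch below is plain Java under stated assumptions: the routing rule (hash of the key modulo the partition count) and the class `PartitionedInitialLoad` are illustrative, not taken from the slides, and a real loader would push the filter into its own database query instead of scanning every row.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionedInitialLoad {
    // Routing rule assumed for this sketch: non-negative hash of the key modulo partition count.
    static int partitionOf(int routingKey, int partitionCount) {
        return Math.floorMod(Integer.hashCode(routingKey), partitionCount);
    }

    // Each primary keeps only the rows routed to it.
    static List<Integer> load(List<Integer> databaseRows, int partitionId, int partitionCount) {
        List<Integer> mine = new ArrayList<>();
        for (int row : databaseRows) {
            if (partitionOf(row, partitionCount) == partitionId) {
                mine.add(row);
            }
        }
        return mine;
    }
}
```

Because every row routes to exactly one partition, the partitions can load in parallel without overlapping.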

Page 15: Giga Spaces Data Grid / Data Caching Overview

IMDG Deployment Topologies

Page 16: Giga Spaces Data Grid / Data Caching Overview

IMDG Basic Deployment Topologies

[Diagram: three topologies, each instance in its own virtual machine]

• Primary-Backup – a primary and a backup linked by replication

• Partitioned – feeders write to the partitions

• Partitioned + Backup – Primary 1/Backup 1 and Primary 2/Backup 2, each pair linked by replication, written to by a feeder
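The Partitioned + Backup topology can be modelled in a few lines of plain Java. This is a sketch only: `PartitionedGridSketch` is a hypothetical class, the routing rule (key hash modulo partition count) is an assumption, and seeding a replacement backup after a failure is a simplification of whatever the real cluster does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "Partitioned + Backup"; not the GigaSpaces clustering implementation.
class PartitionedGridSketch {
    private final List<Map<String, String>> primaries = new ArrayList<>();
    private final List<Map<String, String>> backups = new ArrayList<>();

    PartitionedGridSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            primaries.add(new HashMap<>());
            backups.add(new HashMap<>());
        }
    }

    // Assumed routing rule: hash of the key modulo the number of partitions.
    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), primaries.size());
    }

    // Synchronous replication: the write completes only after the backup copy is updated.
    void write(String key, String value) {
        int p = partitionOf(key);
        primaries.get(p).put(key, value);
        backups.get(p).put(key, value);
    }

    String read(String key) {
        return primaries.get(partitionOf(key)).get(key);
    }

    // Simulated primary failure: promote the backup, then seed a replacement backup from it.
    void failPrimary(int partition) {
        Map<String, String> promoted = backups.get(partition);
        primaries.set(partition, promoted);
        backups.set(partition, new HashMap<>(promoted));
    }

    int sizeOfPrimary(int partition) {
        return primaries.get(partition).size();
    }
}
```

Partitioning spreads the data and the load, while the backup of each partition keeps its data readable after the primary is lost.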

Page 17: Giga Spaces Data Grid / Data Caching Overview

IMDG Operations

Page 18: Giga Spaces Data Grid / Data Caching Overview

IMDG Basic Operations

[Diagram: each operation is shown between an Application and the Space: Write, WriteMultiple, Read, ReadMultiple, Take, TakeMultiple, Execute, Notify]
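The difference between the read and take families is easiest to see in a toy model. The plain-Java sketch below is not the GigaSpaces API: `MiniSpace` is a hypothetical class, and a `Predicate` stands in for a template or query.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical in-memory "space": a sketch of the semantics, not the GigaSpaces API.
class MiniSpace<T> {
    private final List<T> entries = new ArrayList<>();

    synchronized void write(T entry) {
        entries.add(entry);
    }

    // read: return the first match and leave it in the space.
    synchronized T read(Predicate<T> template) {
        for (T entry : entries) {
            if (template.test(entry)) {
                return entry;
            }
        }
        return null;
    }

    // take: return the first match and remove it from the space.
    synchronized T take(Predicate<T> template) {
        Iterator<T> it = entries.iterator();
        while (it.hasNext()) {
            T entry = it.next();
            if (template.test(entry)) {
                it.remove();
                return entry;
            }
        }
        return null;
    }

    // readMultiple: return every match, leaving all of them in place.
    synchronized List<T> readMultiple(Predicate<T> template) {
        List<T> matches = new ArrayList<>();
        for (T entry : entries) {
            if (template.test(entry)) {
                matches.add(entry);
            }
        }
        return matches;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Because take removes the entry it returns, two consumers taking with the same template never receive the same entry, which is what lets a space double as a work queue.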

Page 19: Giga Spaces Data Grid / Data Caching Overview

Move into SBA

Page 20: Giga Spaces Data Grid / Data Caching Overview

What is Space Based Architecture (SBA)

Space-Based Architecture (SBA) is a software architecture pattern for achieving linear scalability of stateful, high-performance applications, based on Yale’s Tuple-Space Model (source: Wikipedia)

What is a Processing Unit:

• Bundle of services, data, messaging

• Collocation into single VM

• Unified Messaging & Data

• In-Memory

Cloud of Processing Units

• Scale through Partitioning

• Virtualized middleware

What is a Space:

• Elegant – 4 API

• Solves:

• Data sharing

• Messaging

• Workflow

• Parallel processing

[Diagram labels: Application, Space, Write POJO]

Page 21: Giga Spaces Data Grid / Data Caching Overview

Move into SBA

• Deploy Application components as Processing Units

– Form a composite SOA Application

• Distributed Data Processing

– Use GigaSpaces event driven and data processing components to process incoming data in real time

• Collocate business logic and Data

– Scale these as one entity to allow true linear scalability

Page 22: Giga Spaces Data Grid / Data Caching Overview

Space Based Architecture – Business logic and data collocated

[Diagram: Processing Units hosting Service Beans alongside Primary 1, Primary 2 and Primary 3, with replication to Backup 1, Backup 2 and Backup 3; a Processing Unit with a Feeder Service Bean and a Processing Unit with a Collector Service Bean]

• Pushing data into the backend system

• In-Memory-Data-Grid and collocated Processing Units

• Collects results / reporting service

Page 23: Giga Spaces Data Grid / Data Caching Overview

Map-Reduce Approach to perform Parallel Query

• How do the GigaSpaces Task Executors work?

– Phase 1 - Sending the Task to be executed

Page 24: Giga Spaces Data Grid / Data Caching Overview

Map-Reduce Approach to perform Parallel Query

• How do Task Executors work?

– Phase 2 - Getting the results back to be reduced

– The Task itself queries the IMDG instance and performs whatever calculations are needed
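The two phases can be imitated in plain Java with an `ExecutorService`. This is only an analogy: each worker thread stands in for a partition that runs the task against its own data, the class `ParallelQuerySketch` is hypothetical, and none of it uses the GigaSpaces executor API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelQuerySketch {
    // Counts values above the threshold across all partitions (the list must not be empty).
    static long countMatching(List<List<Integer>> partitions, int threshold) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            // Phase 1: send the same task to every partition; each one queries only its own data.
            List<Future<Long>> partials = new ArrayList<>();
            for (List<Integer> partition : partitions) {
                Callable<Long> task = () -> partition.stream().filter(v -> v > threshold).count();
                partials.add(pool.submit(task));
            }
            // Phase 2: collect the partial results on the caller side and reduce them.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel query failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Only the small partial counts travel back to the caller; the data itself stays where it is stored.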

Page 25: Giga Spaces Data Grid / Data Caching Overview

Distributed Task Example

    import java.util.List;
    import com.gigaspaces.async.AsyncFuture;
    import com.gigaspaces.async.AsyncResult;
    import org.openspaces.core.executor.DistributedTask;

    public class MyDistTask implements DistributedTask<Integer, Long> {

        // The Task execute implementation: runs on the space side
        public Integer execute() throws Exception {
            return 1;
        }

        // The Task reducer implementation: runs on the client side
        public Long reduce(List<AsyncResult<Integer>> results) throws Exception {
            long sum = 0;
            for (AsyncResult<Integer> result : results) {
                if (result.getException() != null) {
                    throw result.getException();
                }
                sum += result.getResult();
            }
            return sum;
        }
    }

    // The Task execution: called from the client side
    AsyncFuture<Long> future = gigaSpace.execute(new MyDistTask());
    long result = future.get(); // result will be the number of primary spaces

Page 26: Giga Spaces Data Grid / Data Caching Overview

SBA Fundamentals

Page 27: Giga Spaces Data Grid / Data Caching Overview

Using SBA to Virtualize the Middleware = GigaSpaces XAP

• Steps to virtualize the middleware:

1. Decouple the application from the deployment environment

2. Use partitioning to split the load and the data

3. Move manual processes to SLA-driven deployment

4. Inject dynamic scaling and self-healing

• The result: a scale-out application server providing:

– End-to-end scale-out middleware for Web data, messaging and business logic

– In-memory clustering

– Unique database scalability

– Automatic self-healing

• Enterprise-grade and OEM-ready:

– Supports open-source and standard development frameworks

– Supports Java, .NET, C++ and scripting languages

Page 28: Giga Spaces Data Grid / Data Caching Overview

The Service Grid

• Continuous application availability to achieve "five nines" (99.999%)

– Automatically provision additional resources after failures

• Maintain optimal application performance

– Dynamically scale (or shrink) system resources based upon business demand

• Dramatic reduction in enterprise server utilization rates

– Dynamic provisioning eliminates the need to design for peak loads

• Significant reduction in IT operations and system management costs

An automated, SLA-based application provisioning & management engine

Page 29: Giga Spaces Data Grid / Data Caching Overview

Typical Web Application Architecture

[Diagram labels: Web Browser; Load Balancer (Apache); Grid Service Manager; Grid Service Containers hosting Processing Units with Web Containers; Grid Service Containers hosting Processing Units with Service Beans; Primary 1, Backup 1, Primary 2, Backup 2 linked by Replication; Admin Application]

• Dynamic LB configuration

• Managed Jetty web containers, HTTP session on top of the Space

• Business logic and data on top of the Data Grid

• Interact with BL and data via Space API, events, remoting or task executors

• Partitioning and collocation for best performance and scalability

• Async. persistency

• Proactive administration