Gig as Paces DataGrid OGF Oct07


  • 7/31/2019 Gig as Paces DataGrid OGF Oct07

    1/46

    Data-Awareness and Low-Latency on the Enterprise Grid

    Getting the Most out of Your Grid with Enterprise IMDG

    Shay Hassidim

    Deputy CTO

    Oct 2007


    Overall Presentation Goal

    Understand the Space-Based Architecture model and its 4 verbs.

    Understand the data contention challenge and the latency challenge with Enterprise Grid based applications.

    Understand why a typical In-Memory Data Grid can't solve the above problems and why the Enterprise IMDG can.


    About Myself: Shay Hassidim

    B.Sc. in Electrical, Computer & Telecommunications Engineering, with a focus on neural networks and artificial intelligence, Ben-Gurion University, graduated 1994

    Object and multi-dimensional DBMS expert

    Extensive knowledge of object-oriented and distributed systems

    Consultant for telecom, healthcare, defense and finance projects

    Technical skills: MATLAB, C, C++, .Net, PowerBuilder, Visual Basic, Java, XML, CORBA, J2EE, ODMG, JDO, Hibernate, SQL, JMS, JMX, IDE, GUI, Jini, ODBMS, RDBMS, JavaSpaces

    In the past:

    Sirius Technologies Israel: VMDB Applications & Tools team leader

    Versant Corp US: Tools Lead Architect, R&D

    Since 2003: GigaSpaces VP Product Management (based in Israel)

    Since 2007: GigaSpaces Deputy CTO (based in NY)


    GigaSpaces Technical Overview


    The Basics of a Data Grid: Caching Topologies

    Partitioned Cache

    Replicated Cache

    Master / Local Cache


    So... What is Space-Based Architecture?

    Utilizing a single logical/virtual resource to share objects:

    Data (data provisioning)

    Logic (logic processing)

    Events (event propagation)

    Services:

    Interact with each other through the space

    Can be co-located with data/events for faster results

    Are deployed and managed in an adaptive and fail-safe way


    Space-Based SOA Using 4 Simple Verbs

    The four verbs: Write, Read, Take, Notify

    Write + Read = IMDG (Caching)

    Write + Notify = Messaging

    Write + Take = Parallel Processing
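    The four verbs can be illustrated with a toy, single-JVM space. This is only a sketch: `TinySpace` and its method names (`write`, `read`, `take`, `notifyOn`) are hypothetical stand-ins chosen for this example, templates are modelled as plain predicates, and none of it is the GigaSpaces or JavaSpaces API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Toy in-memory "space"; illustrative only, not the GigaSpaces API. */
class TinySpace<T> {
    private final List<T> entries = new ArrayList<>();
    private final List<Predicate<T>> templates = new ArrayList<>();
    private final List<Consumer<T>> listeners = new ArrayList<>();

    /** Write: store an entry and fire every notification whose template matches. */
    synchronized void write(T entry) {
        entries.add(entry);
        for (int i = 0; i < templates.size(); i++) {
            if (templates.get(i).test(entry)) {
                listeners.get(i).accept(entry);
            }
        }
    }

    /** Read: return a matching entry and leave it in the space (caching). */
    synchronized Optional<T> read(Predicate<T> template) {
        return entries.stream().filter(template).findFirst();
    }

    /** Take: return a matching entry and remove it, so only one taker gets it. */
    synchronized Optional<T> take(Predicate<T> template) {
        Optional<T> match = read(template);
        match.ifPresent(e -> entries.remove(e));
        return match;
    }

    /** Notify: register a callback for future writes that match the template. */
    synchronized void notifyOn(Predicate<T> template, Consumer<T> listener) {
        templates.add(template);
        listeners.add(listener);
    }

    public static void main(String[] args) {
        TinySpace<String> space = new TinySpace<>();
        List<String> inbox = new ArrayList<>();

        // Write + Notify = messaging
        space.notifyOn(e -> e.startsWith("order"), e -> inbox.add(e));
        space.write("order-1");
        System.out.println("notified: " + inbox);

        // Write + Read = caching: the entry stays available
        System.out.println("read: " + space.read(e -> e.equals("order-1")).isPresent());

        // Write + Take = parallel processing: the entry is consumed exactly once
        System.out.println("first take: " + space.take(e -> e.equals("order-1")).isPresent());
        System.out.println("second take: " + space.take(e -> e.equals("order-1")).isPresent());
    }
}
```

    The point of the sketch is that caching, messaging and work distribution all fall out of the same four operations on one shared store.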


    IMDG: Distributed In-Memory Query Support

    Enables aggregation of data transparently

    Supports SQL query semantics

    Continuous query via notifications

    Local view: a client-side cache

    Diagram: a space proxy runs a parallel query (read) against the partitioned clustered space; the local view is updated using continuous query.
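    A local view kept fresh by a continuous query can be sketched as follows. The class names `MasterSpace` and `LocalView` are hypothetical, the query is a plain predicate rather than SQL, and everything runs in one JVM, so this only shows the idea and not the product API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class LocalViewSketch {
    /** Stand-in for the remote clustered space: pushes every write to subscribers. */
    static class MasterSpace {
        private final List<Consumer<Integer>> subscribers = new ArrayList<>();

        void subscribe(Consumer<Integer> subscriber) {
            subscribers.add(subscriber);
        }

        void write(Integer price) {
            subscribers.forEach(s -> s.accept(price));
        }
    }

    /** Client-side cache that holds only the entries matching the query. */
    static class LocalView {
        final List<Integer> cached = new ArrayList<>();

        LocalView(MasterSpace space, Predicate<Integer> query) {
            // Continuous query: every later write is filtered and cached locally.
            space.subscribe(price -> {
                if (query.test(price)) {
                    cached.add(price);
                }
            });
        }
    }

    public static void main(String[] args) {
        MasterSpace space = new MasterSpace();
        LocalView expensive = new LocalView(space, price -> price > 100);
        space.write(50);
        space.write(150);
        space.write(300);
        System.out.println(expensive.cached); // [150, 300]
    }
}
```

    Reads against `cached` never leave the client process, which is where the latency benefit of a local view comes from.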


    Data Virtualization: IMDG Accessed by All Popular APIs and Programming Languages

    Provides a true data grid that supports a variety of standards-based data APIs

    The API becomes just a view

    The same data can be accessed via multiple APIs

    Combines the benefits of the relational model with the OO model

    Diagram: applications access the clustered space through the JDBC, Map/JCache, Space and CPP/.Net APIs.


    Integration with External Database: 2 Basic Models

    Write/read-through and write-behind: enables lazy load of data from the DB to the cache and async persistency

    Complete mirroring of cache data into the DB

    Also supports black-box persistency into an RDBMS and an index file (light embedded ODBMS), sync/async

    Hibernate cache plug-in provides a 2nd-level cache for Hibernate-based applications


    Seamless Integration with External Data Sources

    The Mirror Service ensures reliable synchronization with minimal performance overhead.

    Data is propagated seamlessly from the IMDG to the external data source and vice versa through the CacheStore.

    Diagram: the IMDG instances load from and store to the external data source; the Mirror Service uses reliable async replication.
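    The load/store flow can be sketched as a read-through, write-behind cache. Everything here is an assumption made for illustration: `WriteBehindSketch`, `put`, `get` and `flush` are hypothetical names, a plain `Map` stands in for the external data source, and `flush` is called explicitly where a real system would run it asynchronously in the background.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

class WriteBehindSketch {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final Queue<String> dirtyKeys = new ConcurrentLinkedQueue<>();
    private final Map<String, String> database; // stand-in for the external data source

    WriteBehindSketch(Map<String, String> database) {
        this.database = database;
    }

    /** Write: update memory immediately and remember the key for a later store. */
    void put(String key, String value) {
        memory.put(key, value);
        dirtyKeys.add(key);
    }

    /** Read-through: on a cache miss, lazily load the value from the database. */
    String get(String key) {
        return memory.computeIfAbsent(key, database::get);
    }

    /** Write-behind: store all queued updates; returns how many were stored. */
    int flush() {
        int stored = 0;
        String key;
        while ((key = dirtyKeys.poll()) != null) {
            database.put(key, memory.get(key));
            stored++;
        }
        return stored;
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        db.put("a", "1");
        WriteBehindSketch cache = new WriteBehindSketch(db);

        System.out.println(cache.get("a"));      // 1, loaded lazily from the database
        cache.put("b", "2");
        System.out.println(db.containsKey("b")); // false, not persisted yet
        System.out.println(cache.flush());       // 1
        System.out.println(db.get("b"));         // 2
    }
}
```

    The writer never waits for the database, which is the reason write-behind adds little overhead to the in-memory path.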


    SBA: Real-Time SOA for Stateful Services

    Services can be Java, C++, .Net

    Content-based routing

    Shared state to enable stateful services


    Enterprise Data Grid: Unique Features

    Feature: Extended and standard query based on SQL, and the ability to connect to the IMDG using a standard JDBC connector.
    Benefits: Makes the IMDG accessible to standard reporting tools. Makes accessing the IMDG just like accessing a JDBC-compatible database, reducing the learning curve.

    Feature: SQL-based continuous query support.
    Benefits: Brings relevant data close to the local memory of the relevant application instance.

    Feature: Central management, monitoring and control.
    Benefits: Allows the entire IMDG to be controlled and viewed from an administrator's console.

    Feature: Mirror Service: transparent persistence of data from the entire IMDG to a legacy database or other data source.
    Benefits: Allows seamless integration with existing reporting and back-office systems.

    Feature: Real-time event notification: application instances can selectively subscribe to specific events.
    Benefits: Provides capabilities usually provided by messaging systems, including slow-consumer support, FIFO, batching, pub/sub and content-based routing.


    GigaSpaces Solution for the Enterprise Grid


    Great, but...

    How can I bring front-office applications to the grid? The latency challenge.

    What about stateful applications? The data contention challenge.


    The Data Contention Challenge

    Only stateless applications can scale up freely on the Grid.

    Any application that needs to:

    a. Share state between more than one instance (service/process)

    b. Store state using a central database

    cannot scale easily!

    This includes:

    Partial analysis results: checkpoints to enable recovery.

    Managing a workflow involving more than one process.

    Common data that needs to be shared between processes.


    The Latency Challenge

    The Enterprise Grid is designed for batch applications:

    Each client request is submitted as a job.

    Hardware resources are allocated.

    Relevant software instances (service/process) are scheduled to run on the resources and perform the work.

    Impractical in low-latency environments!

    Why?

    An interactive application receives thousands of client requests per second, each of which needs to be fulfilled within milliseconds.

    It is impossible to respond fast enough with a job-based approach.

    Throughput would be severely limited due to the need to schedule and launch large numbers of application instances.


    A Three-Stage Approach to the Solution

    1. In-Memory Data Grid (IMDG)

    2. Data-aware Grid using SLA-driven containers

    3. Adding front-office applications to the Grid using Declarative Space-Based Architecture (SBA)


    In-Memory Data Grid (IMDG)

    Stage 1

    Data is stored in the memory of numerous physical machines instead of, or alongside, a database.

    Eliminates I/O, network and CPU load.

    Partitions the data and moves it closer to the application.

    However, an IMDG in an Enterprise distributed environment is only a partial solution!


    Data-Aware Grid Using SLA-Driven Containers

    Stage 2

    Common wisdom holds that it is much easier to bring the business logic to the data than to bring the data to the business logic.

    But not all IMDGs support data & business logic co-locality!

    This results in:

    Unnecessary overhead caused by remote calls from business logic to IMDG instances.

    Data duplication, because business logic elements that use the same data are not necessarily concentrated around the relevant IMDG instance.

    And worst of all, data contention, because several business logic elements might access the same IMDG instance, leading to exactly the problem the IMDG was meant to solve!

    Requirements for a Data-Aware Grid

    The Enterprise Grid must know which data is stored on which IMDG instances.

    There must be a way to guarantee data affinity: tasks must always be executed with the relevant data coupled to them.


    Enterprise IMDG Deployment Requirements

    Stage 2

    Deploying a shared IMDG, rather than a specific IMDG per application, is required for:

    Improved resource utilization

    With the IMDG as a shared resource, memory and CPUs available to the IMDG instances can be shared between different applications, depending on their current data loads.

    It is also much easier to scale the IMDG to respond to changing data needs.

    Lower total cost of ownership

    Installation, testing, configuration, maintenance and administration of the IMDG are performed centrally for all the applications on the Grid.


    Enterprise IMDG Requirements for Grid Environments

    Stage 2

    Sensitivity to demand for data vs. available resources

    Free (memory) resources when there is no need for them

    Multi-tenancy

    Continuous high availability

    Hot fail-over

    Versioning: it should be possible to upgrade or update the IMDG instances without affecting the data or interrupting access.

    Configuration changes: it should be possible to change configuration without affecting availability of the IMDG instances.

    Schema evolution: changing the data structure (i.e. adding or modifying classes) should not affect the existing data and should not require downtime.

    Isolation (groups, instances, data)

    Content-based security

    Explicit control over IMDG instance locations (manual relocation while the system is running)

    Integration with existing systems


    Strategies for Adding Data Awareness to the Grid

    Stage 2, Stage 3

    Scenario: IMDG instances deployed directly by the Enterprise Grid (without SLA-Driven Containers).
    Method of providing data awareness: Integration using affinity keys. The Enterprise Grid and users submitting tasks share special keys that identify the data relevant to each task. In this way the Enterprise Grid can execute tasks on the same machine as the relevant data.

    Scenario: SLA-Driven Containers are launched by the Enterprise Grid (each container launches relevant IMDG instances).
    Method of providing data awareness: Provides data awareness implicitly. Data-intensive procedures can run in the SLA-Driven Container, together (co-located) with the IMDG instances. Because the container itself is data aware, data affinity can be guaranteed, without making the Enterprise Grid itself data aware.
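    The affinity-key strategy can be sketched as a lookup the scheduler performs before placing a task. `AffinityScheduler`, `dataPlaced` and `machineFor` are hypothetical names invented for this example; no real Enterprise Grid scheduler API is implied.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scheduler that places tasks next to their data via affinity keys. */
class AffinityScheduler {
    private final Map<String, String> machineByKey = new HashMap<>();

    /** Called when an IMDG instance holding the data for this key is deployed. */
    void dataPlaced(String affinityKey, String machine) {
        machineByKey.put(affinityKey, machine);
    }

    /** Picks the machine for a task: where its data lives, else any free machine. */
    String machineFor(String affinityKey, String anyFreeMachine) {
        return machineByKey.getOrDefault(affinityKey, anyFreeMachine);
    }

    public static void main(String[] args) {
        AffinityScheduler grid = new AffinityScheduler();
        grid.dataPlaced("portfolio-7", "machine-A");

        // The task shares the key with its data, so it runs where the data is.
        System.out.println(grid.machineFor("portfolio-7", "machine-B")); // machine-A
        // Unknown key: no affinity information, fall back to a free machine.
        System.out.println(grid.machineFor("portfolio-9", "machine-B")); // machine-B
    }
}
```

    With SLA-driven containers this bookkeeping moves inside the container, so the grid scheduler itself no longer needs the map.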


    Adding Front-Office to the Grid Using Declarative SBA

    Stage 3

    All services are co-located on the same machine (the processing unit)

    Transparent data affinity via content-based routing (i.e. hash-based load balancing)

    Sharing can be done in local memory => the lowest possible latency.
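    Hash-based routing can be sketched in a few lines. The formula below (hash code modulo the number of partitions) is a common choice assumed here for illustration; the slides do not state the exact function GigaSpaces uses, and `HashRouter` and `partitionFor` are hypothetical names.

```java
class HashRouter {
    /** Maps a routing value to a partition index in the range [0, partitions). */
    static int partitionFor(Object routingValue, int partitions) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(routingValue.hashCode(), partitions);
    }

    public static void main(String[] args) {
        int partitions = 3;
        long[] orderTypes = {1L, 2L, 3L, 4L, 1L};
        for (long type : orderTypes) {
            // Equal routing values always land on the same partition.
            System.out.println("type " + type + " -> partition "
                    + partitionFor(type, partitions));
        }
    }
}
```

    Because the partition depends only on the routing value, every entry and every request carrying the same value meets in the same processing unit, which is what makes the affinity transparent to the caller.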


    Declarative SBA (cont.)

    Stage 3

    So what is this processing unit?

    A mini-application which can perform the entire business process.

    It accepts a user request, performs all steps of the transaction on its own, and provides a result.

    Removes the need for sharing of state and partial results between different components of the application running on different physical machines.


    SLA-Driven Application Service Container

    Stage 3

    Provides built-in support for deployment of Spring-based applications

    Virtualizes the network and physical resources from the application

    Handles fail-over, scaling and relocation policies using SLA-based definitions

    Provides distributed dependency injection to handle partial failure and deployment dependency

    Provides a single point of access for monitoring and management


    SLA-Driven Deployment

    Stage 3

    SLA: failover policy, scaling policy, system requirements, space cluster topology

    PU services beans definition


    Continuous High Availability

    Stage 3

    Diagram: a failure followed by fail-over.


    Dynamic Partitioning = Dynamic Capacity Growth

    Diagram: data items A to F are spread over Partition 1, Partition 2 and Partition 3, each with a primary (P) and a backup (B). The partitions run in GSCs on VM 1, VM 2 and VM 3 (2G each) and later on VM 4 and VM 5 (4G each). Max capacity grows from 2G to 4G to 6G.

    At some point VM 1 free memory drops below 20%. It is about time to increase the capacity: let's move Partition 1 to another GSC and recover the data from the running backup!

    Later, Partition 2 needs to move. After the move, data is recovered from the backup.


    A Closer Look at OpenSpaces and Declarative SBA Development


    Declarative Spring-SBA: How It Works

    Step 1: Implement the POJO domain model

    Step 2: Implement the POJO services

    Step 3: Wire the services through Spring

    Step 4: Packaging

    Deploy to the Grid (scale-out)


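    The POJO-Based Data Domain Model

    import com.gigaspaces.annotation.pojo.SpaceClass;
    import com.gigaspaces.annotation.pojo.SpaceId;
    import com.gigaspaces.annotation.pojo.SpaceRouting;

    @SpaceClass
    public class Data {
        private String id;
        private Long type;
        private boolean processed;

        // Key of the entry; generated by the space because autoGenerate is true.
        @SpaceId(autoGenerate = true)
        public String getId() {
            return id;
        }

        // Routing value: decides the partition this entry is routed to.
        @SpaceRouting
        public Long getType() {
            return type;
        }

        public void setProcessed(boolean processed) {
            this.processed = processed;
        }
    }

    @SpaceClass indicates that this class is a space entry. It includes class-level attributes such as FIFO and persistent.

    @SpaceId defines the key for the entry.

    @SpaceRouting sets the data affinity, i.e. defines the partition to which the entry is routed.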


    Order Processor Service Bean

    import org.openspaces.events.adapter.SpaceDataEvent;

    public class DataProcessor implements IDataProcessor {

        // Invoked with the Data entry that triggered the event.
        @SpaceDataEvent
        public Data processData(Data data) {
            data.setProcessed(true);
            data.setData("PROCESSED : " + data.getRawData());
            // reset the id as we use auto generate true
            data.setId(null);
            System.out.println(" ------ PROCESSED : " + data);
            return data;
        }
    }

    The @SpaceDataEvent annotation marks the processData method as the one that needs to be called when an event is triggered.


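    Diagram: the Data Loader (a direct data loader client using a space proxy) writes orders to the Space BUS. A polling event container takes each order and passes it to the Order Processor service bean, and the processed order is written back. A notify event container notifies the Routing service bean about processed orders.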
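    The take, process, write cycle of a polling container can be sketched with two in-memory queues standing in for the space. This is not the OpenSpaces polling container: `PollingContainerSketch` and `processAll` are hypothetical names, and the loop is non-blocking and single-threaded to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.UnaryOperator;

class PollingContainerSketch {
    /** Takes pending entries one by one, runs the service, writes each result back. */
    static <T> int processAll(Queue<T> pending, Queue<T> processed,
                              UnaryOperator<T> service) {
        int handled = 0;
        T entry;
        while ((entry = pending.poll()) != null) {  // Take (returns null when empty)
            processed.add(service.apply(entry));    // Write the processed entry
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        Queue<String> pending = new ArrayDeque<>();
        Queue<String> processed = new ArrayDeque<>();
        pending.add("order-1");   // the data loader writes orders
        pending.add("order-2");

        int handled = processAll(pending, processed, o -> "PROCESSED : " + o);
        System.out.println(handled + " handled: " + processed);
    }
}
```

    Because each entry is removed when it is taken, several such loops could run in parallel on different partitions without processing the same order twice.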


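    Space-Based Remoting

    Diagram: the client calls the Order Processor through an order proxy (SpaceServiceProxyFactoryBean). The invocation is written to the Space BUS as SpaceInvokeData. On the service side, the SpaceServiceExporter takes the SpaceInvokeData, the OrderProcessor delegator invokes processData on the Order Processor service bean, and the result is written back to the space.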


    Space-Based Remoting: Inherent Scalability/Reliability

    Diagram: the same remoting flow (client, order proxy, SpaceInvokeData written to the Space BUS, SpaceServiceExporter, Order Processor service bean, result written back).
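    The remoting flow can be sketched with two blocking queues playing the role of the space: one for invocation entries and one for results. The names `SpaceRemotingSketch`, `export` and `proxy` are hypothetical, and the sketch assumes a single client, so it skips the correlation between an invocation and its result that a real implementation needs.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SpaceRemotingSketch {
    interface OrderProcessor {
        String processData(String order);
    }

    // Two queues stand in for invocation and result entries on the space bus.
    static final BlockingQueue<String> invocations = new LinkedBlockingQueue<>();
    static final BlockingQueue<String> results = new LinkedBlockingQueue<>();

    /** Service side: take an invocation, call the bean, write the result. */
    static Thread export(OrderProcessor bean) {
        Thread exporter = new Thread(() -> {
            try {
                while (true) {
                    results.put(bean.processData(invocations.take()));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop serving
            }
        });
        exporter.setDaemon(true); // do not keep the JVM alive
        exporter.start();
        return exporter;
    }

    /** Client side: a proxy that writes the invocation and waits for the result. */
    static OrderProcessor proxy() {
        return order -> {
            try {
                invocations.put(order);
                return results.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        };
    }

    public static void main(String[] args) {
        export(order -> "PROCESSED : " + order);
        System.out.println(proxy().processData("order-42")); // PROCESSED : order-42
    }
}
```

    The client never addresses a specific server, so adding more exporters that take from the same queue is one way to picture the scalability and reliability the slide title refers to.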


    Looking into the Future: Many Enhancements!

    Enhance performance

    Built-in InfiniBand support: Voltaire, Cisco

    Enhance database integration

    Enhance the Space Mirror support (async persistency)

    Enhance partnership and integration with grid vendors

    DataSynapse, Platform Computing, Sun Grid Engine, Microsoft Compute Cluster Server

    Enhance CPP and .Net support

    Performance optimization: the first goal is the same performance as Java

    Support for complex object mapping


    Conclusions and Summary

    A typical IMDG won't help you

    You need a data-aware Enterprise IMDG to solve the data contention and latency challenges.

    Data affinity needs its twin: data & business logic co-locality

    The Enterprise IMDG co-locates the data with the business logic

    Using a self-sufficient, autonomic processing unit deployed into an SLA-based container that scales via the Enterprise Grid

    The Enterprise IMDG brings the front office into the grid

    Makes the grid a utility model for a wide spectrum of applications across the organization


    Case Studies


    A Dynamically Scalable Architecture for Data-Intensive Trading Analysis Applications

    Most financial organizations today use Excel or reporting databases as the main trading analysis tools. These are very difficult to scale.

    The solution is to create a shared In-Memory Data Grid (IMDG) which stores the trading data in a shared pool of machines. Common data calculation and analysis run on that pool as well, leveraging the available memory and CPU resources.

    JavaSpaces is a powerful model for distributed persistence. GigaSpaces is a JavaSpaces vendor providing Enterprise features.

    Spring hides the details of the JavaSpaces model and allows effort to be focused on requirements rather than frameworks.

    Using a shared data grid for all users

    Running analytics close to the data to improve performance and leverage the available resources


    Reconciliation Calculation


    Questions?


    Thank you! [email protected]