An Introduction To Space Based Architecture

Amin Abbaspour

MAGFA IT Development Center

twitter.com/abbaspour

Agenda

Title (slides)                             Time (min)
Scalability, why and how (15)              10
Space Based Architecture (6)               5
Java Spaces (4)                            5
GigaSpaces (5)                             10
Migrating Spring Apps to GigaSpaces (6)    5
Case Study (2)                             10
Conclusion (2)                             1
Q/A                                        -
Total: 40 slides                           45 min

Innovation Comes From The Need

Application workload increases every day. This is inevitable.

We expect fast and reliable software even under increasing workload.

Speed and reliability can mean life or death for a business.

But Why So Much Workload?

Today's software is not limited to operators and a small community of users. It interacts directly with millions of people and thousands of other software systems.

● Large scale community sites, like Facebook, hi5, Twitter

● Prepaid Telecoms

● Banking/XTP

● Online Gaming

● Online Fraud/Risk Management

Need For Speed

A brokerage can lose up to $4 million per millisecond of latency.

- The Tabb Group

An additional 500 ms latency resulted in -20% traffic.

- Google

An additional 100 ms in latency resulted in -1% sales.

- Amazon

Cost of Downtime

According to a 2004 Forrester survey of 235 companies the hourly cost of downtime was:

Percent of Companies    Hourly Cost
33%                     $10K-100K
25%                     $100K-500K
13%                     $500K-1M
4%                      >$1M
25%                     Didn't Know

Unpredictable Workload

● How do you design and build applications that cost-effectively scale in such conditions?

● Without compromising reliability, performance and time-to-market?

Scalability

The solution is to build scalable software. With scalability we gain speed and reliability.

– Vertical scalability: more powerful machines lead to faster software.

– Horizontal scalability: more boxes lead to faster and more reliable software.

– Linear scalability: the overall throughput = (number of processing units) * (throughput per unit).

– Dynamic scalability: scale on demand (usually using some sort of provisioning and monitoring capabilities).

We usually mean horizontal scalability, since it's more widely applicable and more cost effective. Budget is a great excuse.

Amdahl's Law

If, for example, only 10% of a given function in your program is synchronized, then:

if the throughput of that function on a single CPU is 100 messages per second,

to increase performance by a factor of close to 10 (toward 1,000 msg/sec)

we would need to increase our CPU resources by a factor of about 100.

That is roughly 10 times more than would be required if the application had no synchronized blocks in its code. Strictly, with a 10% serial fraction Amdahl's Law caps the speedup at 10x, so 1,000 msg/sec is a limit that is approached but never reached.
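The arithmetic behind this example can be checked with the standard form of Amdahl's Law, speedup(N) = 1 / (s + (1 - s) / N), where s is the serial (synchronized) fraction and N the number of CPUs. The sketch below is only an illustration: the class name `AmdahlSketch` and the chosen CPU counts are assumptions, while the 10% and 100 msg/sec figures come from the slide. Under this formula, 100 CPUs give a speedup of about 9.17x, and 10x is the ceiling for a 10% serial fraction.

```java
// Amdahl's Law: speedup(N) = 1 / (s + (1 - s) / N), where s is the serial fraction.
final class AmdahlSketch {
    /** Theoretical speedup on `cpus` processors when `serialFraction` of the work cannot run in parallel. */
    static double speedup(double serialFraction, int cpus) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / cpus);
    }

    public static void main(String[] args) {
        double serial = 0.10;          // 10% synchronized, as in the slide's example
        double baseThroughput = 100.0; // msg/sec on a single CPU, as in the slide's example
        for (int cpus : new int[] {1, 10, 100, 1000}) {
            double s = speedup(serial, cpus);
            System.out.printf("%4d CPUs -> speedup %.2fx, about %.0f msg/sec%n",
                    cpus, s, baseThroughput * s);
        }
    }
}
```

Setting `serial` to 0.0 makes the speedup equal to the CPU count, which is the linear scalability case described earlier.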

Scalability Wall

Non-scalable applications are expensive and risky.

At some point the application will hit a wall:

● Application crashes

● Re-architecting the application every few months/years

Server cost: 20,000

Server throughput: 1,000 tx/sec

Contention: 15%

Amdahl's Law Consequences

To build scalable software, we should eliminate synchronized blocks. This means eliminating bottlenecks and contention.

Do We Build Scalable Software?

Order Management Example

Need Availability? Things Get Worse

Tier Based Car-wash

● Total CPH (cars per hour) is the minimum CPH of the individual warehouses.

● A failure in any one warehouse makes the whole business fail.

● To increase performance, all three warehouses need more budget.

● Personnel have specialized capabilities.

All in One Car-wash

● To increase CPH, simply add new warehouses.

● Better resource utilization.

● Each warehouse is independent.

● Fewer steps.
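The difference between the two car-wash layouts comes down to simple arithmetic: a pipeline of tiers runs at the pace of its slowest tier, while independent all-in-one units add their capacities. The following sketch illustrates this with made-up numbers; the class name, method names and the CPH values are all hypothetical and not taken from the slides.

```java
import java.util.Arrays;

final class CarWashThroughput {
    /** Tier-based: every car passes through every station, so the slowest station sets the pace. */
    static int tierBasedCph(int... stationCph) {
        return Arrays.stream(stationCph).min().orElse(0);
    }

    /** All-in-one: each warehouse handles whole cars independently, so capacities add up. */
    static int allInOneCph(int warehouses, int cphPerWarehouse) {
        return warehouses * cphPerWarehouse;
    }

    public static void main(String[] args) {
        // Hypothetical numbers, chosen only for illustration.
        System.out.println("Tier based: " + tierBasedCph(30, 12, 20) + " CPH");
        System.out.println("All in one: " + allInOneCph(3, 10) + " CPH");
        System.out.println("All in one, one more warehouse: " + allInOneCph(4, 10) + " CPH");
    }
}
```

In the tier-based case, speeding up any station other than the slowest one changes nothing, which is why the whole chain has to be budgeted together.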

Scaling Made Simple – Process Unit Design

Space-Based Architecture

● Based on Object Space Computational Model.

● Processing Unit

– Self sufficient unit of scale

– Combination of Data, Processing and Messaging

● Principles of Partitioning

● Content Based Routing

● Interaction Model Abstractions
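Partitioning and content based routing can be pictured as follows: a routing key is read from the content of each message, and a hash of that key selects the processing unit that owns the data. The sketch below is a generic illustration under that assumption, not GigaSpaces' actual routing code; `ContentBasedRouter`, `partitionFor` and the sample orders are hypothetical names and data.

```java
import java.util.ArrayList;
import java.util.List;

final class ContentBasedRouter {
    private final int partitions;

    ContentBasedRouter(int partitions) {
        if (partitions <= 0) throw new IllegalArgumentException("partitions must be positive");
        this.partitions = partitions;
    }

    /** Maps a routing key taken from the message content to a partition index in [0, partitions). */
    int partitionFor(Object routingKey) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(routingKey.hashCode(), partitions);
    }

    public static void main(String[] args) {
        ContentBasedRouter router = new ContentBasedRouter(3);
        List<List<String>> units = new ArrayList<>();
        for (int i = 0; i < 3; i++) units.add(new ArrayList<>());

        // Hypothetical orders, routed by account id so one account always lands on one unit.
        String[][] orders = {{"acct-1", "buy"}, {"acct-2", "sell"}, {"acct-1", "cancel"}};
        for (String[] order : orders) {
            units.get(router.partitionFor(order[0])).add(order[0] + ":" + order[1]);
        }
        for (int i = 0; i < units.size(); i++) {
            System.out.println("PU " + i + " -> " + units.get(i));
        }
    }
}
```

Because all messages with the same key reach the same unit, that unit can process them against its local data without coordinating with the others.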

Inside A Processing Unit

Closer Look at PU

Parallel PUs – Bring Linear Scalability

The Ideal Scenario - “Write Once Scale Anywhere”

● Scale-out to get more processing power when volume increases.

● Through caching

● Parallelization of transactions (TX)

● Low-cost commodity resources

● Better utilization

Space Based Architecture – Theory Basics

● Object Spaces is a paradigm for development of distributed computing applications.

● Spaces can be used to achieve scalability through parallel processing.

● Objects, when deposited in an Object Space are passive, i.e., their methods cannot be invoked while the objects are in the Object Space.

● This paradigm inherently provides mutual exclusion.

● The Linda coordination language was developed at Yale.

● An Object Space is often called a Tuple Space, since it consists of tuples that are unrelated to each other.
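To make the model concrete, here is a minimal in-memory sketch of a tuple space. It is not the JavaSpaces API: `TinyTupleSpace` and its methods are hypothetical, and a `Predicate` stands in for template matching. It shows the two properties listed above: stored objects are passive data, and `take` removes a tuple atomically, so only one consumer can get it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class TinyTupleSpace<T> {
    private final List<T> tuples = new ArrayList<>();

    /** Deposits a tuple; it stays passive until someone reads or takes it. */
    synchronized void write(T tuple) {
        tuples.add(tuple);
        notifyAll(); // wake up any consumer waiting in take()
    }

    /** Returns a matching tuple without removing it, or null if none matches right now. */
    synchronized T readIfExists(Predicate<T> template) {
        for (T t : tuples) {
            if (template.test(t)) return t;
        }
        return null;
    }

    /** Removes and returns a matching tuple, waiting up to timeoutMillis; null on timeout or interrupt. */
    synchronized T take(Predicate<T> template, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            for (int i = 0; i < tuples.size(); i++) {
                if (template.test(tuples.get(i))) return tuples.remove(i);
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return null;
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }

    public static void main(String[] args) {
        TinyTupleSpace<String> space = new TinyTupleSpace<>();
        space.write("task:resize-image");
        // Two consecutive takes stand in for two workers competing for one task.
        String first = space.take(t -> t.startsWith("task:"), 10);
        String second = space.take(t -> t.startsWith("task:"), 10);
        System.out.println("first=" + first + ", second=" + second);
    }
}
```

Workers that pull tasks this way never share a tuple, which is how a space can spread work across parallel consumers without explicit locks in application code.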

SBA Paradigm in Java; Sun Didn't (Re)invent The Wheel

● Linda is a language and platform built on tuple spaces.

● The space model called for a plug-and-play infrastructure.

– Jini was there

● So JavaSpaces was invented, based on the tuple space paradigm and on top of the Jini platform.

● By the way, Java is not the only language to adopt the concept. Tuple spaces have been ported to many other languages and platforms, such as Python, Ruby, Scala, C, .NET and others.

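Tuple/Java Spaces Basic Operations

// An Entry class (each public class goes in its own source file)
import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

public class SpaceEntry implements Entry {
    // Entry fields must be public object types
    public Integer count = 0;

    public String toString() {
        return "Count: " + count;
    }
}

public class Server {
    public static void main(String[] args) throws Exception {
        SpaceEntry entry = new SpaceEntry();
        // space() is the slide's lookup helper; its definition is not shown
        JavaSpace space = (JavaSpace) space();

        // Register and write the Entry into the Space
        space.write(entry, null, Lease.FOREVER);

        // Retrieve the Entry and check its state; read() returns Entry, so cast it
        SpaceEntry e = (SpaceEntry) space.read(new SpaceEntry(), null, Long.MAX_VALUE);
    }
}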

JavaSpaces Standard API

Java Spaces Implementations

● Sun RI (now River Project)

● Orbitz (running orbitz.com)

● Blitz (open source)

● Openwings (?)

● Semispaces

● TSpaces (IBM's implementation)

● GigaSpaces

About GigaSpaces Technologies

● Provides an application platform product (XAP) for applications characterized by:

– High-volume transaction processing

– Very low latency requirements

– Large data volumes

● Scaled-Out Application Server – GigaSpaces XAP

– In-Memory Data Grid

– Service Grid

– Java, .NET and C++

● Customer Base – Financial Services, Retail, Banking, Gaming

XAP – eXtreme Application Platform

XAP (pronounced "zap") is a new class of application server focused on cloud computing and scale-out architectures.

Used for two main domains:

● Data intensive/EDG (write-behind cache)

● Compute intensive
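The write-behind idea mentioned for the data intensive case can be sketched generically: writes go to memory first and are persisted to the slower store later, off the request path. The code below is an illustration under that assumption and does not use any GigaSpaces API; `WriteBehindCache`, its methods, and the map standing in for a database are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class WriteBehindCache {
    private final Map<String, String> memory = new ConcurrentHashMap<>();
    private final BlockingQueue<String> dirtyKeys = new LinkedBlockingQueue<>();
    private final Map<String, String> database; // stands in for a slow backing store

    WriteBehindCache(Map<String, String> database) {
        this.database = database;
    }

    /** Fast path: update memory and remember the key; the database is not touched here. */
    void put(String key, String value) {
        memory.put(key, value);
        dirtyKeys.add(key);
    }

    String get(String key) {
        return memory.get(key);
    }

    /** Slow path, normally run by a background thread: persist every key queued so far. */
    int flush() {
        int written = 0;
        String key;
        while ((key = dirtyKeys.poll()) != null) {
            database.put(key, memory.get(key)); // always writes the latest in-memory value
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> db = new ConcurrentHashMap<>();
        WriteBehindCache cache = new WriteBehindCache(db);
        cache.put("order-1", "NEW");
        cache.put("order-1", "FILLED");
        System.out.println("in memory: " + cache.get("order-1")
                + ", in db before flush: " + db.get("order-1"));
        cache.flush();
        System.out.println("in db after flush: " + db.get("order-1"));
    }
}
```

The trade-off is that data written since the last flush exists only in memory, which is why real data grids pair this pattern with replication.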

GigaSpaces Architecture – Sub Systems

GigaSpaces Architecture - Runtime

GigaSpaces Architecture - SLA

How Can GigaSpaces Help Me

● Too much data and the DB is slow. My application has too many interactions with the database.

● Application does not scale well. We have (strong) hardware but throughput does not increase anymore (a symptom of tier-based architecture).

● We develop an XTP platform, e.g. billing, banking, finance.

● Not pleased with my HTTP session clustering solution.

● Want a scalable SOA/ESB platform.

● Need in-memory indexing and searching.

● Want to deploy my application in the cloud (pay-as-you-go).

● Want CBR over my MQ.

● We don't use Java. Want to stay in C++ or .NET.

● Want SLA in my application/data partition.

Expectations From Application Server

● Data access

● Messaging / Event Processing

● Remoting

● TX management

● Web

Migration to GigaSpaces

● Messaging / Event processing

– Replace MDBs with GigaSpaces event listeners

● Remoting

– Replace SLSBs with GigaSpaces SVF (Remoting/Executors)

● Data access

– Use GigaSpaces as the 2nd level cache for Hibernate

– Convert your DAOs to use GigaSpaces, use mirror to persist

● TX management

– Use Spring…

● Web

– Use GigaSpaces web processing unit

– Use GS HTTP session replication

Migration in Practice

Converted Layer   Code change                          Config change   Effort (3 is biggest)
Messaging         Minor to none                        Yes             1
Remoting          No                                   Yes             1
Data Access       ORM 2nd level cache: No; DAO: Yes    Yes             2-3
HTTP Session      No                                   No              1

XAP and Integration to Other (JEE) Platforms

● Spring

● ORM

● Lucene/Compass

● Mule ESB

● Groovy, JRuby, and hopefully Scala

● C++

● .NET

XAP Alternatives

● Data Grid

– GemFire, Coherence

● Shared Memory

– Terracotta/NAM, Memcached, Tokyo Cabinet, Infinispan

● Computation Grids

– IBM Extreme Scale Platform

● Cloud/Grid

– Google AppEngine, GridGain

● Map Reduce Engines

– Hadoop, Disco, Skynet

Case Study – MAGFA SMPP Gateway

Case Study – Let's See it in Action

Other Useful Results

● Use everything in its place. Memcached helped us a lot as a fast, simple and centralized key/value store.

● BASE-like transactions in favor of full XA. Memcached as a transaction memory.

● Tried on both Linux and Solaris. No tangible difference.

● Don't design for a specific platform. Use platforms as tools.

– Easily switched to ActiveMQ, RabbitMQ, Coherence

● Spring greatly helps in applying the above rule.

● Love immutable objects. They prevent bugs before they happen.

● Reduce contention as much as possible.
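As an illustration of the immutability point, the sketch below shows a message class whose state cannot change after construction. The class `SmsMessage`, its fields and the sample values are hypothetical and are not taken from the MAGFA gateway code. Because instances never change, they can be shared between threads without locks, which also helps with contention.

```java
final class SmsMessage {
    private final String destination;
    private final String text;
    private final int retries;

    SmsMessage(String destination, String text, int retries) {
        this.destination = destination;
        this.text = text;
        this.retries = retries;
    }

    String destination() { return destination; }
    String text() { return text; }
    int retries() { return retries; }

    /** Returns a new instance instead of mutating this one, so shared references never change. */
    SmsMessage withRetry() {
        return new SmsMessage(destination, text, retries + 1);
    }

    public static void main(String[] args) {
        SmsMessage original = new SmsMessage("+10000000000", "hello", 0);
        SmsMessage retried = original.withRetry();
        System.out.println(original.retries() + " -> " + retried.retries());
    }
}
```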

Thanks for Your Attention

Q/A
