Team Members
- Hwi Cheong (Paul) [email protected]
- Mohammad [email protected]
- Joohoon [email protected]
- SukChan [email protected]
Baseline Application
Blackjack game application. Users can create tables and play Blackjack, and can create/retrieve profiles.
Configuration
- Operating System: Linux
- Middleware: Enterprise JavaBeans (EJB)
- Application Development Language: Java
- Database: MySQL
- Server: JBoss (J2EE 1.4)
Baseline Architecture
- Three-tier system
- Servers completely stateless
- Server name hard-coded into the clients
- Every client talks to a HostBean (session bean)
Fault-Tolerant Design
Passive replication
- Completely stateless servers: no need to transfer state from primary to backup; all state is stored in the database.
- Only one instance of HostBean (session bean) is needed to handle multiple client invocations, which is efficient on the server side.
- Degree of replication depends on the number of available machines.

Sacred machines
- Replication Manager (chess)
- MySQL database (mahjongg)
- Clients
Replication Manager
Responsible for server availability notification and recovery.
Server availability notification:
- Each server notifies the Replication Manager during boot.
- The Replication Manager pings each available server periodically.
Server recovery:
- Process fault: pinging fails; reboot the server by sending a script to the machine.
- Machine fault (crash fault): pinging fails and sending a script does nothing; the machine has to be rebooted and the server launched manually.
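The periodic ping check described above can be sketched as follows. This is a minimal illustration under our own naming, not the project's actual code: the server map and the ping actions are hypothetical stand-ins for the Replication Manager's real server registry and RMI pings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of the Replication Manager's availability check: each
// registered server is pinged; servers whose ping fails (or throws)
// are returned so a recovery script can be sent to their machine.
public class ReplicationManagerSketch {
    // Maps server name to a hypothetical ping action (true = alive).
    public static List<String> detectFailedServers(Map<String, Supplier<Boolean>> servers) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Supplier<Boolean>> e : servers.entrySet()) {
            boolean alive;
            try {
                alive = e.getValue().get();   // ping the server
            } catch (RuntimeException ex) {
                alive = false;                // the ping itself failed
            }
            if (!alive) {
                failed.add(e.getKey());       // candidate for the reboot script
            }
        }
        return failed;
    }
}
```

A process fault shows up here as a failed ping that the reboot script can fix; a machine fault looks identical to the ping loop, which is why manual recovery is still needed in that case.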
Replication Manager (cont’d)
Client-RM communication:
- The client contacts the Replication Manager each time it fails over.
- The client quits when the Replication Manager returns no server or cannot be reached.
Failover Mechanism
1. Server process is killed.
2. Client receives a RemoteException.
3. Client contacts the Replication Manager and asks for a new server.
4. The Replication Manager gives the client a new server.
5. Client remakes the invocation to the new server.
6. The Replication Manager sends a script to recover the crashed server.
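The client side of this sequence can be sketched as below. The `GameServer` and `ReplicationManager` interfaces are hypothetical stand-ins for the project's actual EJB/RMI interfaces; only the retry logic is the point.

```java
import java.rmi.RemoteException;

// Sketch of the client-side failover loop: on a RemoteException the
// client asks the Replication Manager for a new server and retries.
public class FailoverClient {
    public interface GameServer {
        String invoke(String request) throws RemoteException;
    }
    public interface ReplicationManager {
        GameServer getNewServer();   // null when no server is available
    }

    // Retries the invocation on a fresh server after each RemoteException;
    // gives up when the Replication Manager has no server left.
    public static String invokeWithFailover(ReplicationManager rm,
                                            GameServer server,
                                            String request) {
        while (server != null) {
            try {
                return server.invoke(request);
            } catch (RemoteException e) {
                server = rm.getNewServer();   // fail over to a new server
            }
        }
        throw new IllegalStateException("no server available; client quits");
    }
}
```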
Failover Experiment Setup
- 3 servers initially available
- Replication Manager on chess
- 30 fault injections
- Client keeps making invocations until 30 failovers are complete.
- 4 probes on the server, 3 probes on the client to calculate latency
Failover Experiment Result
[Plot: End-to-End Roundtrip Latency vs # Invocations. X-axis: Invocation #; Y-axis: Latency (ms).]
Failover Experiment Results
- Maximum jitter: ~700 ms
- Minimum jitter: ~300 ms
- Average failover time: ~404 ms
Failover Pie-chart (failure induced)
[Pie chart: Normal run time 97%; Detection of failure + reconnecting to new server 2%; Query naming service 1%]
Most of the failover latency comes from receiving the exception from the failed server and connecting to the new server.
Real-Time Fault-Tolerant Baseline Architecture Improvements
Failover time improvements:
- Saving the list of servers in the client reduces time spent communicating with the Replication Manager.
- Pre-creating host beans: the client creates host beans on all servers as soon as it receives the list from the Replication Manager.
Runtime improvements:
- Caching on the server side
Client-RM and Client-Server Improvements
Client-RM and Client-Server communication:
- The client contacts the Replication Manager each time it runs out of servers, to receive a list of available servers.
- The client connects to all servers in the list and creates a host bean in each, then starts the application with one server.
- During each failover, the client connects to the next server in the list.
- No looping inside the list.
- The client quits when the Replication Manager returns an empty list of servers or cannot be reached.
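The cached-list scheme can be sketched as follows (names are illustrative, not the project's actual code): the client walks its cached list exactly once, never looping back to the start, and only re-contacts the Replication Manager when the list is exhausted.

```java
import java.rmi.RemoteException;
import java.util.Iterator;
import java.util.List;

// Sketch of the improved failover: the client keeps the server list
// received from the Replication Manager and tries the next entry on
// each RemoteException, instead of contacting the manager every time.
public class ListFailoverClient {
    public interface GameServer {
        String invoke(String request) throws RemoteException;
    }

    // Walks the cached list once; when it is exhausted, the caller must
    // go back to the Replication Manager for a fresh list (or quit).
    public static String invokeWithCachedList(List<GameServer> servers, String request) {
        Iterator<GameServer> it = servers.iterator();
        while (it.hasNext()) {
            try {
                return it.next().invoke(request);
            } catch (RemoteException e) {
                // fail over to the next server in the list
            }
        }
        throw new IllegalStateException("server list exhausted; ask the Replication Manager again");
    }
}
```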
Real-Time Server
Caching in the server:
- Saves commonly accessed database data in the server.
- Uses a HashMap to map each query to its previously retrieved data: O(1) average-case lookup.
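A minimal sketch of this kind of HashMap-based query cache, assuming a query-string key; the database lookup function is a stand-in for the real JDBC call:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the server-side cache: database query results are memoized
// in a HashMap keyed by the query string, so repeat queries are served
// with one O(1) average-case map probe instead of a database round trip.
public class QueryCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> database;  // stand-in for JDBC
    public int databaseHits = 0;                      // instrumentation only

    public QueryCacheSketch(Function<String, String> database) {
        this.database = database;
    }

    public String query(String sql) {
        // computeIfAbsent: single map probe; falls back to the DB on a miss
        return cache.computeIfAbsent(sql, q -> {
            databaseHits++;
            return database.apply(q);
        });
    }
}
```

This also illustrates the later conclusion that useful state can live even in a "stateless" bean, as long as it is a cache shared by all clients rather than per-client session state.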
Real-Time Failover Experiment Setup
- 3 servers initially available
- Replication Manager on chess
- 30 fault injections
- Client keeps making invocations until 30 failovers are complete.
- 4 probes on the server, 5 probes on the client to calculate latency and naming service time.
Client probes:
- Around getPlayerName() and getTableName()
- Around getHost() (for failover)
Server probes:
- Record the source of the invocation (the method name)
- Record invocation arrival and result return times
Real-Time Failover Experiment Results
- Average failover time: 217 ms, half the latency without the improvements (404 ms)
- Non-failover RTT is visibly lower (shown on the graphs below)
Before Real-Time Implementation
[Plot: End-to-End Roundtrip Latency vs # Invocations]
After Real-Time Implementation
Open Issues
- Blackjack game GUI
- Load-balancing using the Replication Manager
- Multiple clients per table (JMS)
- Profiling on JBoss to help improve performance
- Generating a more realistic workload
- TimeoutException
Conclusions
What we have accomplished:
- A fault-tolerant system with automatic server failure detection and recovery.
- Our real-time implementations proved successful in improving failover time as well as general performance.
What we have learned:
- Merging code can be a pain.
- A stateless bean is accessed by multiple clients.
- State can exist even in stateless beans, and it is useful if accessed by all clients: cache!
What we would do differently:
- Start evaluation earlier.
- Put more effort and time into implementing timeouts to enable bounded detection of server failure.