Team Members
- Hwi Cheong (Paul) [email protected]
- Mohammad [email protected]
- Joohoon [email protected]
- SukChan [email protected]
Baseline Application
Blackjack game application. Users can create tables and play Blackjack, and can create/retrieve profiles.
Configuration
- Operating System: Linux
- Middleware: Enterprise JavaBeans (EJB)
- Application Development Language: Java
- Database: MySQL
- Server: JBoss (J2EE 1.4)
Baseline Architecture
- Three-tier system
- Servers completely stateless
- Server name hard-coded into the clients
- Every client talks to a HostBean (session bean)
Fault-Tolerant Design
Passive replication
- Completely stateless servers: no need to transfer state from primary to backup; all state is stored in the database.
- Only one instance of HostBean (session bean) is needed to handle multiple client invocations, which is efficient on the server side.
- Degree of replication depends on the number of available machines.

Sacred machines
- Replication Manager (chess)
- MySQL database (mahjongg)
- Clients
Replication Manager
Responsible for server availability notification and recovery.
Server availability notification:
- Each server notifies the Replication Manager during boot.
- The Replication Manager pings each available server periodically.
Server recovery:
- Process fault: pinging fails; reboot the server by sending a script to the machine.
- Machine fault (crash fault): pinging fails and sending a script does nothing; the machine has to be rebooted and the server launched manually.
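The periodic ping check described above can be sketched as follows. This is a minimal illustration under our own naming, not the project's actual code: the server map and the ping actions are hypothetical stand-ins for the Replication Manager's real server registry and RMI pings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of the Replication Manager's availability check: each
// registered server is pinged; servers whose ping fails (or throws)
// are returned so a recovery script can be sent to their machine.
public class ReplicationManagerSketch {
    // Maps server name to a hypothetical ping action (true = alive).
    public static List<String> detectFailedServers(Map<String, Supplier<Boolean>> servers) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Supplier<Boolean>> e : servers.entrySet()) {
            boolean alive;
            try {
                alive = e.getValue().get();   // ping the server
            } catch (RuntimeException ex) {
                alive = false;                // the ping itself failed
            }
            if (!alive) {
                failed.add(e.getKey());       // candidate for the reboot script
            }
        }
        return failed;
    }
}
```

A process fault shows up here as a failed ping that the reboot script can fix; a machine fault looks identical to the ping loop, which is why manual recovery is still needed in that case.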
Replication Manager (cont’d)
Client-RM communication:
- The client contacts the Replication Manager each time it fails over.
- The client quits when the Replication Manager returns no server or cannot be reached.
Failover Mechanism
1. Server process is killed.
2. Client receives a RemoteException.
3. Client contacts the Replication Manager and asks for a new server.
4. The Replication Manager gives the client a new server.
5. Client remakes the invocation to the new server.
6. The Replication Manager sends a script to recover the crashed server.
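The client side of this sequence can be sketched as below. The `GameServer` and `ReplicationManager` interfaces are hypothetical stand-ins for the project's actual EJB/RMI interfaces; only the retry logic is the point.

```java
import java.rmi.RemoteException;

// Sketch of the client-side failover loop: on a RemoteException the
// client asks the Replication Manager for a new server and retries.
public class FailoverClient {
    public interface GameServer {
        String invoke(String request) throws RemoteException;
    }
    public interface ReplicationManager {
        GameServer getNewServer();   // null when no server is available
    }

    // Retries the invocation on a fresh server after each RemoteException;
    // gives up when the Replication Manager has no server left.
    public static String invokeWithFailover(ReplicationManager rm,
                                            GameServer server,
                                            String request) {
        while (server != null) {
            try {
                return server.invoke(request);
            } catch (RemoteException e) {
                server = rm.getNewServer();   // fail over to a new server
            }
        }
        throw new IllegalStateException("no server available; client quits");
    }
}
```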
Failover Experiment Setup
- 3 servers initially available
- Replication Manager on chess
- 30 fault injections
- Client keeps making invocations until 30 failovers are complete.
- 4 probes on the server, 3 probes on the client to calculate latency
Failover Experiment Result
[Plot: End-to-End Roundtrip Latency vs # Invocations. X-axis: Invocation #; Y-axis: Latency (ms).]
Failover Experiment Results
- Maximum jitter: ~700 ms
- Minimum jitter: ~300 ms
- Average failover time: ~404 ms
Failover Pie-chart (failure induced)
[Pie chart: Normal run time 97%; Detection of failure + reconnecting to new server 2%; Query naming service 1%]
Most of the failover latency comes from receiving the exception from the failed server and connecting to the new server.
Real-Time Fault-Tolerant Baseline Architecture Improvements
Failover time improvements:
- Saving the list of servers in the client reduces time spent communicating with the Replication Manager.
- Pre-creating host beans: the client creates host beans on all servers as soon as it receives the list from the Replication Manager.
Runtime improvements:
- Caching on the server side
Client-RM and Client-Server Improvements
Client-RM and Client-Server communication:
- The client contacts the Replication Manager each time it runs out of servers, to receive a list of available servers.
- The client connects to all servers in the list and creates a host bean in each, then starts the application with one server.
- During each failover, the client connects to the next server in the list.
- No looping inside the list.
- The client quits when the Replication Manager returns an empty list of servers or cannot be reached.
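The cached-list scheme can be sketched as follows (names are illustrative, not the project's actual code): the client walks its cached list exactly once, never looping back to the start, and only re-contacts the Replication Manager when the list is exhausted.

```java
import java.rmi.RemoteException;
import java.util.Iterator;
import java.util.List;

// Sketch of the improved failover: the client keeps the server list
// received from the Replication Manager and tries the next entry on
// each RemoteException, instead of contacting the manager every time.
public class ListFailoverClient {
    public interface GameServer {
        String invoke(String request) throws RemoteException;
    }

    // Walks the cached list once; when it is exhausted, the caller must
    // go back to the Replication Manager for a fresh list (or quit).
    public static String invokeWithCachedList(List<GameServer> servers, String request) {
        Iterator<GameServer> it = servers.iterator();
        while (it.hasNext()) {
            try {
                return it.next().invoke(request);
            } catch (RemoteException e) {
                // fail over to the next server in the list
            }
        }
        throw new IllegalStateException("server list exhausted; ask the Replication Manager again");
    }
}
```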
Real-Time Server
Caching in the server:
- Saves commonly accessed database data in the server.
- Uses a HashMap to map each query to its previously retrieved data: O(1) average-case lookup.
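A minimal sketch of this kind of HashMap-based query cache, assuming a query-string key; the database lookup function is a stand-in for the real JDBC call:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the server-side cache: database query results are memoized
// in a HashMap keyed by the query string, so repeat queries are served
// with one O(1) average-case map probe instead of a database round trip.
public class QueryCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> database;  // stand-in for JDBC
    public int databaseHits = 0;                      // instrumentation only

    public QueryCacheSketch(Function<String, String> database) {
        this.database = database;
    }

    public String query(String sql) {
        // computeIfAbsent: single map probe; falls back to the DB on a miss
        return cache.computeIfAbsent(sql, q -> {
            databaseHits++;
            return database.apply(q);
        });
    }
}
```

This also illustrates the later conclusion that useful state can live even in a "stateless" bean, as long as it is a cache shared by all clients rather than per-client session state.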
Real-Time Failover Experiment Setup
- 3 servers initially available
- Replication Manager on chess
- 30 fault injections
- Client keeps making invocations until 30 failovers are complete.
- 4 probes on the server, 5 probes on the client to calculate latency and naming service time.
Client probes:
- Around getPlayerName() and getTableName()
- Around getHost() (for failover)
Server probes:
- Record the source of the invocation (the method name)
- Record invocation arrival and result return times
Real-Time Failover Experiment Results
- Average failover time: 217 ms, half the latency without the improvements (404 ms)
- Non-failover RTT is visibly lower (shown on the graphs below)
Before Real-Time Implementation
[Plot: End-to-End Roundtrip Latency vs # Invocations]
After Real-Time Implementation
Open Issues
- Blackjack game GUI
- Load-balancing using the Replication Manager
- Multiple clients per table (JMS)
- Profiling on JBoss to help improve performance
- Generating a more realistic workload
- TimeoutException
Conclusions
What we have accomplished:
- A fault-tolerant system with automatic server failure detection and recovery.
- Our real-time implementations proved successful in improving failover time as well as general performance.
What we have learned:
- Merging code can be a pain.
- A stateless bean is accessed by multiple clients.
- State can exist even in stateless beans, and it is useful if accessed by all clients: cache!
What we would do differently:
- Start evaluation earlier.
- Put more effort and time into implementing timeouts to enable bounded detection of server failure.