Request Distribution in Server Clusters

Krithi Ramamritham

Indian Institute of Technology Bombay

Web site infrastructure

Clustered, multi-tiered architectures

[Figure: Web switch, web server cluster, and application server cluster]

e-Shopping

• Open the portal home page

• Login

• View items, prices, availability

• Select an item type

• Specify the no. of items

• Confirm by entering the credit card number

• Logout

WS vs. AS

• Web servers: do well-defined and quantifiable local work

– e.g., processing HTTP headers, serving static content

• Application servers: run multi-layer programs

– e.g., scripts involving calls to backends

ReDal

In clustered, multi-tiered architectures, there are two request distribution points:

– Web Server Request Distribution (WSRD): the Web switch distributes requests to the web server cluster

– Application Server Request Distribution (ASRD): the web server distributes requests requiring business logic to the application server cluster

[Figure: Web switch, web server cluster, and application server cluster]

ReDal: Request Distribution for the Application Layer

An approach for efficient distribution of requests across a cluster of application servers

Web Server Request Distribution

Many policies: Random, Round Robin (RR), Weighted Round Robin (WRR), Least Connections

– Several of these policies are implemented in commercial products (e.g., Cisco’s LocalDirector and F5’s BIG-IP)

Two improvements:

1. Session Affinity

2. Locality-Aware Request Distribution (LARD)

• attempts to exploit locality of working sets on different servers

• not applicable to dynamically generated content

Session Affinity:

Consecutive requests in a given user session will be served faster if they are handled by the same server

Application Server Request Distribution

Dynamic scheduling techniques usually presuppose some knowledge of the task (e.g., duration, weight) and/or the resource (e.g., queue sizes, service times)

– In ASRD, both tasks and resources are highly dynamic

So, techniques are adaptations of WSRD techniques

Most common technique: combination of RR and Session Affinity

– Requests starting new sessions are dispatched according to RR

– Subsequent requests in a session are routed to the server where the session’s previous request was served, i.e., where the session object resides

=> frequently results in load imbalances
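
The RR plus Session Affinity policy can be sketched as follows. This is a minimal illustration, not code from the talk: the class name, the server names, and the session identifiers are all hypothetical.

```python
from itertools import cycle


class RoundRobinAffinityDispatcher:
    """Round robin for new sessions, session affinity for later requests."""

    def __init__(self, servers):
        self._next_server = cycle(servers)  # endless round-robin iterator
        self._affinity = {}                 # session id -> server name

    def route(self, session_id):
        # The first request of a session is placed by RR; every later
        # request follows the server that holds the session object.
        if session_id not in self._affinity:
            self._affinity[session_id] = next(self._next_server)
        return self._affinity[session_id]


dispatcher = RoundRobinAffinityDispatcher(["A1", "A2"])
first = [dispatcher.route(s) for s in ["s1", "s2", "s3", "s4"]]
repeat = [dispatcher.route(s) for s in ["s1", "s3", "s1"]]
```

Note that the policy never looks at load: if s1 and s3 happen to be long sessions, all of their later requests land on A1 regardless of how busy it is.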

ReDal: Motivation

Request distribution combining RR and Session Affinity

Short and long sessions arrive at one-minute intervals

Arrival order (S = short, L = long): S S L S S L S L L S

[Figure: number of active sessions versus time (minutes) on application servers A1 and A2, showing load imbalances]

ReDAL Objective

Distribute requests across a cluster of application servers such that:

• Load on each application server is kept below a certain threshold

• Session affinity is preserved where possible

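[Figure: transactions per second versus number of users; the throughput peak occurs at the peak load, which separates the lightly loaded region from the heavily loaded region]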

ReDAL Components

Application Analyzer

• Characterizes the behavior of the application server

• Runs in an offline phase to record peak throughput/load values, which are used at runtime by the Request Dispatcher

Request Dispatcher

• Routes requests to a set of application servers

• Monitors expected and actual load on each application server

• Routes a given request to the affined server if it is lightly loaded, else to the application server having the lowest expected load
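
One way to picture the Application Analyzer's offline output is as the point of maximum throughput on a measured load curve. The sketch below assumes the measurements are (number of users, transactions per second) pairs; the function names and the sample values are hypothetical and not taken from the talk.

```python
def find_peak(samples):
    """Return (peak_load, peak_throughput) from offline measurements.

    samples: list of (number_of_users, transactions_per_second) pairs.
    The peak load is the user count at which throughput is highest.
    """
    peak_load, peak_throughput = max(samples, key=lambda s: s[1])
    return peak_load, peak_throughput


def is_lightly_loaded(actual_load, peak_load):
    # A server counts as lightly loaded while it is below its peak load.
    return actual_load < peak_load


# Hypothetical offline run: throughput rises, peaks, then degrades.
samples = [(10, 12.0), (20, 23.5), (30, 31.0), (40, 28.0)]
peak_load, peak_throughput = find_peak(samples)
```

The dispatcher would then only need `peak_load` at runtime to decide whether the affined server can still take a request.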

ReDAL Algorithm

Based on a key observation: think time (view time) on a page is predictable from past behavior

Jeffrey Heer and Ed H. Chi (Xerox Palo Alto Research Center), “Mining the Structure of User Activity using Cluster Stability”, Proceedings of the Web Analytics Workshop, SIAM Conference on Data Mining (2002)
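
The slides do not say how think time is predicted. As one plausible stand-in, the sketch below keeps an exponentially weighted moving average of observed think times per page; the class, the `smoothing` and `default` parameters, and the page path are assumptions made for illustration only.

```python
class ThinkTimePredictor:
    """Per-page think-time estimate from past observations (seconds)."""

    def __init__(self, smoothing=0.5, default=10.0):
        self.smoothing = smoothing  # weight given to the newest observation
        self.default = default      # estimate used for pages never seen
        self._estimate = {}         # page -> current estimate

    def observe(self, page, think_time):
        old = self._estimate.get(page)
        if old is None:
            self._estimate[page] = think_time
        else:
            # Blend the new observation with the running estimate.
            self._estimate[page] = (
                self.smoothing * think_time + (1 - self.smoothing) * old
            )

    def predict(self, page):
        return self._estimate.get(page, self.default)


predictor = ThinkTimePredictor(smoothing=0.5, default=10.0)
unseen = predictor.predict("/items")
predictor.observe("/items", 20.0)
predictor.observe("/items", 10.0)
learned = predictor.predict("/items")
```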

ReDal: Capacity Reservation

• Consider a finite lookahead period partitioned into discrete time periods or slices

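[Figure: timeline starting at the current time and divided into time slices (Slice 0, Slice 1, Slice 2); request r1 arrives at time t1 and, after the think time, request r2 is expected at time t2]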

Load metrics:

• Actual Load = number of requests in a time slice

• Expected Load = number of requests expected in a time slice based on think time, i.e., the time between subsequent requests in a session

– e.g., capacity is reserved for request r2 on this application server during time slice 2

• Modified Load = Actual Load + α × Expected Load (0 ≤ α ≤ 1)

– α accounts for prediction errors
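
The three load metrics can be sketched as per-slice counters for one application server. This is an illustrative reading of the definitions above, not the authors' implementation: the class and method names are hypothetical, and it assumes the next request is expected at arrival time plus think time.

```python
import math
from collections import defaultdict


class SliceLoad:
    """Load bookkeeping for one server over discrete time slices."""

    def __init__(self, slice_duration, alpha):
        self.slice_duration = slice_duration  # seconds per time slice
        self.alpha = alpha                    # weight of expected load, 0..1
        self.actual = defaultdict(int)        # slice index -> requests seen
        self.expected = defaultdict(int)      # slice index -> reservations

    def slice_of(self, t):
        return math.floor(t / self.slice_duration)

    def record_request(self, arrival_time, think_time):
        # Count the request in its own slice ...
        self.actual[self.slice_of(arrival_time)] += 1
        # ... and reserve capacity in the slice where the session's
        # next request is expected (assumption: arrival + think time).
        self.expected[self.slice_of(arrival_time + think_time)] += 1

    def modified_load(self, slice_index):
        return self.actual[slice_index] + self.alpha * self.expected[slice_index]


load = SliceLoad(slice_duration=10.0, alpha=0.9)
load.record_request(arrival_time=3.0, think_time=20.0)  # r1 in slice 0
```

Here r1 falls in slice 0 and the follow-up request is expected at time 23.0, so the reservation lands in slice 2 and contributes 0.9 to that slice's modified load.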

ReDal: Algorithm Overview

Inputs:

Request in a session, think time, time slice duration, α

Output:

Assignment of request to application server A

A = NULL
A = SessionAffinity()
If A is NULL
    A = LeastLoaded()
UpdateLoadMetrics()
AdvanceTimeSlice()
Return A

SessionAffinity
    If ActualLoad() < PeakLoad()
        Return AffinedServer()
    Else
        Return NULL

LeastLoaded
    If request is part of new session
        A = LeastLoaded(modified)
    Else
        A = LeastLoaded(actual)
    Return A
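
The overview above can be sketched in runnable form as follows. This is a hedged illustration: the data layout (a dict of per-server `actual` and `modified` loads for the current time slice), the function names, and the sample values are assumptions, and the UpdateLoadMetrics and AdvanceTimeSlice steps are left out.

```python
def session_affinity(session_id, servers, affinity, peak_load):
    """Return the affined server if it is below peak load, else None."""
    name = affinity.get(session_id)
    if name is not None and servers[name]["actual"] < peak_load:
        return name
    return None


def least_loaded(servers, metric):
    """Return the server name with the smallest value of `metric`."""
    return min(servers, key=lambda name: servers[name][metric])


def dispatch(session_id, servers, affinity, peak_load):
    is_new_session = session_id not in affinity
    chosen = session_affinity(session_id, servers, affinity, peak_load)
    if chosen is None:
        # New sessions compare modified load, existing ones actual load.
        metric = "modified" if is_new_session else "actual"
        chosen = least_loaded(servers, metric)
    affinity[session_id] = chosen  # the session now lives on `chosen`
    return chosen


# Hypothetical snapshot of the current time slice.
servers = {
    "A1": {"actual": 5, "modified": 5.9},
    "A2": {"actual": 2, "modified": 4.7},
    "A3": {"actual": 3, "modified": 3.0},
}
affinity = {"s1": "A1", "s2": "A2"}

kept = dispatch("s2", servers, affinity, peak_load=5)   # affined server is fine
moved = dispatch("s1", servers, affinity, peak_load=5)  # A1 is at peak load
fresh = dispatch("s9", servers, affinity, peak_load=5)  # new session
```

In this snapshot s2 stays on A2, s1 is moved off the saturated A1 to the server with the lowest actual load, and the new session s9 goes to the server with the lowest modified load.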

Consistent global view of metadata

• Multicasting of changed load info by WS request dispatcher

• Session objects virtualized in a shared db

• Web server records time of response in a cookie – useful for estimating think times in web server clusters

[Figure: Web switch and web server cluster]
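
The cookie-based think-time estimate could look like the sketch below, which uses Python's standard `http.cookies` module. The cookie name `last_response_ts` and both helper functions are hypothetical; the slides only state that the response time is recorded in a cookie.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "last_response_ts"  # hypothetical cookie name


def response_cookie(response_ts):
    """Build the cookie string the web server attaches to a response."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = str(response_ts)
    return cookie[COOKIE_NAME].OutputString()


def think_time_from_cookie(cookie_header, request_arrival_ts):
    """Think time = arrival of this request - time of previous response.

    Returns None when the cookie is absent (first request of a session).
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if COOKIE_NAME not in cookie:
        return None
    return request_arrival_ts - float(cookie[COOKIE_NAME].value)


header = "JSESSIONID=abc; last_response_ts=100.0"
think_time = think_time_from_cookie(header, request_arrival_ts=112.5)
```

Because the timestamp travels with the client, any web server in the cluster can compute the think time without consulting the server that sent the previous response.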


ReDal: Evaluation

• ReDal, RR, and HJ implemented as Apache Web Server plug-ins

• Load generator simulates a varying number of simultaneous user sessions, each session submitting a stream of requests

• Each request is chosen uniformly at random from the high-load and low-load transaction requests

• Load generator (LoadRunner 6), Web server (Apache), 10 application server instances (WebLogic 7.1), and session repository (Oracle 8), each running on separate hardware

• Machine configuration: single-CPU (900 MHz), 1GB RAM, 20 GB disk, running Windows 2000 Advanced Server (SP3)

HJ (Hwang and Jung, 2002) uses a “least-active-requests” routing policy, not applicable to stateful applications

ReDal: Experimental Results

Performance Metrics:

• Average Throughput per Application Server (ATAS): average number of transactions per second an application server in the cluster provides

• Average Response Time (ART): average response time provided by the application servers, measured from the end user perspective

• Web Server CPU Utilization (WSCU): percentage CPU utilization on the web server, measured by OS utilities

• Peak % CPU on the Application Servers: peak percentage CPU usage among a cluster of application servers measured by OS utilities.

• Scaling with Application Servers: percentage CPU usage on the web server for varying numbers of application servers in the application server cluster.
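
As a small worked example of the first two metrics, the sketch below computes ATAS and ART from raw measurements. The function names, argument shapes, and numbers are illustrative assumptions, not data from the experiments.

```python
from statistics import mean


def atas(completed_per_server, duration_seconds):
    """Average Throughput per Application Server, in transactions/second.

    completed_per_server: transactions completed by each server
    during a measurement window of `duration_seconds`.
    """
    return mean(count / duration_seconds for count in completed_per_server)


def art(response_times_ms):
    """Average Response Time in milliseconds, as seen by end users."""
    return mean(response_times_ms)


# Hypothetical 60-second window on a two-server cluster.
throughput = atas([600, 300], duration_seconds=60.0)
response = art([100.0, 200.0, 300.0])
```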

Throughput Performance

[Figure: ATAS versus number of simultaneous sessions for ReDAL (0.9), ReDAL (0.5), HJ, and RR]

• ReDAL (0.9) is the ReDAL algorithm with α = 0.9

• ReDAL (0.5) is the ReDAL algorithm with α = 0.5

ReDAL with α = 0.9 has the highest throughput

Response Time Performance

[Figure: ART (ms) versus number of simultaneous sessions for ReDAL (0.9), ReDAL (0.5), HJ, and RR]

ReDAL with α = 0.9 has the best response time

CPU Overhead on the Web Server

[Figure: WSCU (%) versus number of simultaneous sessions for HJ, RR, and ReDAL (0.9)]

Additional overhead of the ReDal algorithm is 1.5% or less

Peak CPU Utilization on Application Servers

[Figure: peak % CPU on the application servers versus number of simultaneous sessions for ReDAL (α = 0.9), ReDAL (α = 0.5), HJ, and RR]

Highest in the RR case and lowest in the ReDAL (α = 0.9) case

Scaling with Application Servers

Overhead of the ReDAL algorithm is at or below 15% for 100 concurrent sessions

[Figure: WSCU (%) versus number of simultaneous sessions for 5, 10, and 20 application servers]

Real World Evaluation

Online credit card application

• 30 WebLogic application servers on Red Hat Linux 9.0

• Apache Web Server on Red Hat Linux 9.0

• Machine hardware configuration: 1 GB RAM, 2.2 GHz dual processors

• Load was simulated by replaying web logs collected at various times over a day

At a peak load of 1000 simultaneous sessions, ReDAL improved the response time of RR by 100%.

[Figure: ART (ms) versus number of simultaneous sessions (0 to 1000) for ReDal-0.8, HJ, and RR]

Summary

[Figure: Web switch and web server cluster]


ReDal: Application server load Distribution

Maximizes affinity

Exploits application characteristics

Practical and scalable
