Request Distribution in Server Clusters
Krithi Ramamritham
Indian Institute of Technology Bombay
Web site infrastructure
Clustered, multi-tiered architectures
[Figure: clustered, multi-tiered site with a web switch, a web server cluster, and an application server cluster]
e-Shopping
• Open the portal home page
• Login
• View items, prices, availability
• Select an item type
• Specify the no. of items
• Confirm by entering the credit card number
• Logout
WS vs. AS
• Web servers
  – Do well-defined and quantifiable local work
  – e.g., processing HTTP headers, serving static content
• Application servers
  – Run multi-layer programs
  – e.g., scripts involving calls to backends
ReDAL
In clustered, multi-tiered architectures, there are two request distribution points:
– Web Server Request Distribution (WSRD): the web switch distributes requests to the web server cluster
– Application Server Request Distribution (ASRD): the web server distributes requests requiring business logic to the application server cluster
ReDAL: Request Distribution for the Application Layer
An approach for efficient distribution of requests across a cluster of application servers
Web Server Request Distribution
Many policies: Random, Round Robin (RR), Weighted Round Robin (WRR), Least Connections
– Several of these policies are commercially implemented (e.g., Cisco’s LocalDirector and F5’s BIG-IP)
Two improvements:
1. Session Affinity
2. Locality-Aware Request Distribution (LARD)
• Attempts to exploit locality of working sets on different servers; not applicable to dynamically generated content
Session Affinity:
Consecutive requests in a given user session will be served faster if they are handled by the same server
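The selection policies named on this slide can be illustrated in a few lines of Python. This is a minimal sketch only, not the behavior of any commercial switch; the server names, weights, and helper functions are hypothetical.

```python
import itertools

def round_robin(servers):
    """Yield servers in a fixed cyclic order."""
    return itertools.cycle(servers)

def weighted_round_robin(weights):
    """Yield servers cyclically, each repeated in proportion to its integer weight."""
    expanded = [s for s, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(active):
    """Return the server with the fewest open connections (ties: first listed)."""
    return min(active, key=active.get)

# Hypothetical three-server web cluster.
rr = round_robin(["ws1", "ws2", "ws3"])
rr_picks = [next(rr) for _ in range(4)]

# ws1 is assumed to have twice the capacity of ws2.
wrr = weighted_round_robin({"ws1": 2, "ws2": 1})
wrr_picks = [next(wrr) for _ in range(6)]

# Connection counts are made-up values for illustration.
lc_pick = least_connections({"ws1": 7, "ws2": 3, "ws3": 5})
```

None of these policies looks at which session a request belongs to, which is the gap Session Affinity addresses.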
Application Server Request Distribution
Dynamic scheduling techniques usually presuppose some knowledge of the task (e.g., duration, weight) and/or the resource (e.g., queue sizes, service times)
– In ASRD, both tasks and resources are highly dynamic
So, ASRD techniques are adaptations of WSRD techniques
Most common technique: combination of RR and Session Affinity
– Requests starting new sessions are dispatched according to RR
– Subsequent requests in a session are routed to the server where the session’s previous request was served, i.e., where the session object resides
=> frequently results in load imbalances
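The RR plus Session Affinity combination described above can be sketched as follows. The class name, session identifiers, and server names are hypothetical, and a real web server plug-in would also need to expire finished sessions.

```python
class AffinityRoundRobin:
    """Round Robin for new sessions; sticky routing for known sessions."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.next_index = 0
        self.session_to_server = {}   # session id -> server holding the session object

    def route(self, session_id):
        server = self.session_to_server.get(session_id)
        if server is None:
            # New session: take the next server in Round Robin order.
            server = self.servers[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.servers)
            self.session_to_server[session_id] = server
        # Known session: always return the affined server, whatever its load.
        return server

    def end_session(self, session_id):
        self.session_to_server.pop(session_id, None)

dispatcher = AffinityRoundRobin(["A1", "A2"])
routes = [dispatcher.route(s) for s in ["u1", "u2", "u1", "u3", "u2"]]
```

Because `route` never consults server load, a server that happens to receive several long sessions keeps all of their requests, which is one way the imbalance arises.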
ReDal: Motivation
Request distribution combining RR and Session Affinity
Short and long sessions arrive at one-minute intervals
Arrival order (S = short session, L = long session): S S L S S L S L L S
[Figure: number of active sessions vs. time (minutes) on application servers A1 and A2, showing load imbalances]
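The imbalance can be reproduced with a small simulation of the arrival order S S L S S L S L L S on two servers. The session durations (2 minutes for short, 8 minutes for long) are assumptions chosen for illustration; the slide does not state them.

```python
def active_sessions(arrivals, durations, num_servers, horizon):
    """Count active sessions per server per minute under RR with session affinity.

    Session i arrives at minute i + 1 and stays on the server chosen at arrival.
    """
    counts = [[0] * (horizon + 1) for _ in range(num_servers)]
    for i, kind in enumerate(arrivals):
        server = i % num_servers          # Round Robin for each new session
        start = i + 1
        end = start + durations[kind]     # exclusive end minute
        for minute in range(start, min(end, horizon + 1)):
            counts[server][minute] += 1
    return counts

ARRIVALS = "SSLSSLSLLS"
DURATIONS = {"S": 2, "L": 8}   # assumed session lengths in minutes

counts = active_sessions(ARRIVALS, DURATIONS, num_servers=2, horizon=12)
for minute in range(1, 13):
    print(minute, "A1:", counts[0][minute], "A2:", counts[1][minute])
```

Under these assumed durations, minute 11 has one active session on A1 and three on A2, even though each server received five sessions.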
ReDAL Objective
Distribute requests across a cluster of application servers such that:
• Load on each application server is kept below a certain threshold
• Session affinity is preserved where possible
[Figure: throughput (transactions per second) vs. number of users, marking the lightly loaded region, the throughput peak at peak load, and the heavily loaded region]
ReDAL Components
Application Analyzer
• Characterizes the behavior of the application server
• Runs in an offline phase to record peak throughput/load values, which are used at runtime by the Request Dispatcher
Request Dispatcher
• Routes requests to a set of application servers
• Monitors expected and actual load on each application server
• Routes a given request to the affined server if that server is lightly loaded, else to the application server with the lowest expected load
ReDAL Algorithm
Based on a key observation: think time (view time) on a page is predictable from past behavior
Jeffrey Heer and Ed H. Chi (Xerox Palo Alto Research Center), “Mining the Structure of User Activity using Cluster Stability”, Proceedings of the Web Analytics Workshop, SIAM Conference on Data Mining (2002)
ReDAL: Capacity Reservation
• Consider a finite lookahead period partitioned into discrete time periods, or slices
[Figure: timeline starting at the current time and divided into Slice 0, Slice 1, and Slice 2; requests r1 and r2 occur at times t1 and t2, separated by the think time]
Load metrics:
• Actual Load = number of requests in a time slice
• Expected Load = number of requests expected in a time slice based on think time, i.e., the time between subsequent requests in a session
– e.g., capacity is reserved for request r2 on this application server during time slice 2
• Modified Load = Actual Load + α × Expected Load (0 ≤ α ≤ 1)
– α accounts for prediction errors
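The bookkeeping behind these load metrics might look like the sketch below. It assumes that the weight referred to as alpha in the evaluation (0.5, 0.9) scales the expected load in the modified-load formula; the class, method names, and numbers are hypothetical, and releasing a reservation when the predicted request actually arrives is left out.

```python
from collections import defaultdict

class SliceLoad:
    """Per-server load bookkeeping over discrete time slices."""

    def __init__(self, slice_duration, alpha):
        self.slice_duration = slice_duration
        self.alpha = alpha                  # assumed weight on expected load
        self.actual = defaultdict(int)      # slice index -> requests seen
        self.expected = defaultdict(int)    # slice index -> requests reserved

    def slice_of(self, t):
        return int(t // self.slice_duration)

    def record_request(self, arrival_time, think_time):
        """Count the request now and reserve capacity for the session's next one."""
        self.actual[self.slice_of(arrival_time)] += 1
        next_slice = self.slice_of(arrival_time + think_time)
        self.expected[next_slice] += 1
        return next_slice

    def modified_load(self, slice_index):
        return self.actual[slice_index] + self.alpha * self.expected[slice_index]

load = SliceLoad(slice_duration=10.0, alpha=0.5)
# r1 arrives in slice 0; with a 22 s think time, r2 is expected in slice 2.
reserved = load.record_request(arrival_time=3.0, think_time=22.0)
# Another session's request lands in slice 2 and also expects a follow-up there.
load.record_request(arrival_time=21.0, think_time=5.0)
```

With these inputs, slice 2 holds one actual request and two reservations, so its modified load is 1 + 0.5 × 2 = 2.0.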
ReDAL: Algorithm Overview
Inputs:
    Request in a session, think time, time slice duration, α
Output:
    Assignment of request to application server A

A = NULL
A = SessionAffinity()
If A is NULL
    A = LeastLoaded()
UpdateLoadMetrics()
AdvanceTimeSlice()
Return A

SessionAffinity
    If ActualLoad() < PeakLoad()
        Return AffinedServer()
    Return NULL

LeastLoaded
    If request is part of a new session
        A = LeastLoaded(modified)
    Else
        A = LeastLoaded(actual)
    Return A
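The pseudocode can be turned into a small runnable sketch. This is one reading of the slide, not the authors' implementation: the `Server` fields and the `dispatch` signature are hypothetical, time-slice advancement and capacity reservation are omitted, and re-pointing affinity to the new server after an overflow is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Server:
    name: str
    peak_load: int        # recorded offline by the Application Analyzer
    actual: int = 0       # requests in the current time slice
    expected: int = 0     # requests reserved for the current time slice

    def modified(self, alpha):
        # Assumed form: actual load plus alpha-weighted expected load.
        return self.actual + alpha * self.expected

def dispatch(session_id, servers, affinity, alpha):
    """Pick a server for one request, preferring the affined server."""
    affined = affinity.get(session_id)
    chosen = None
    # SessionAffinity: keep the affined server while it is below peak load.
    if affined is not None and affined.actual < affined.peak_load:
        chosen = affined
    # LeastLoaded: new sessions compare modified load, existing ones actual load.
    if chosen is None:
        if affined is None:
            chosen = min(servers, key=lambda s: s.modified(alpha))
        else:
            chosen = min(servers, key=lambda s: s.actual)
    # UpdateLoadMetrics (simplified to the actual-load counter).
    chosen.actual += 1
    affinity[session_id] = chosen
    return chosen

servers = [Server("A1", peak_load=2), Server("A2", peak_load=2)]
affinity = {}
names = [dispatch("u1", servers, affinity, alpha=0.9).name for _ in range(3)]
```

Here the third request of session `u1` overflows from A1, which has reached its peak load, to the less loaded A2.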
Consistent global view of metadata
• Multicasting of changed load info by the WS request dispatcher
• Session objects virtualized in a shared db
• Web server records time of response in a cookie
  – useful for estimating think times in web server clusters
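One way the cookie timestamp could feed a think-time estimate is sketched below. The running per-page mean is an assumed predictor chosen for simplicity, and the function, class, and page names are hypothetical.

```python
def observed_think_time(request_arrival, last_response_time):
    """Gap in seconds between the previous response (from the cookie) and this request."""
    return max(0.0, request_arrival - last_response_time)

class ThinkTimeEstimator:
    """Running mean of observed think times per page (an assumed predictor)."""

    def __init__(self):
        self.totals = {}
        self.counts = {}

    def observe(self, page, think_time):
        self.totals[page] = self.totals.get(page, 0.0) + think_time
        self.counts[page] = self.counts.get(page, 0) + 1

    def predict(self, page, default=0.0):
        if page not in self.counts:
            return default
        return self.totals[page] / self.counts[page]

estimator = ThinkTimeEstimator()
# Timestamps are made-up values in seconds.
estimator.observe("/items", observed_think_time(130.0, 100.0))
estimator.observe("/items", observed_think_time(250.0, 240.0))
predicted = estimator.predict("/items")
```

Because the timestamp travels in the cookie, any web server in the cluster can compute the gap without sharing state with the server that sent the previous response.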
ReDAL: Evaluation
• ReDAL, RR, and HJ implemented as Apache Web Server plug-ins
• Load generator simulates a varying number of simultaneous user sessions, each session submitting a stream of requests
• Each request chosen from a uniform distribution across the high and low load transaction requests
• Load generator (LoadRunner 6), Web server (Apache), 10 application server instances (WebLogic 7.1), and session repository (Oracle 8), each running on separate hardware
• Machine configuration: single-CPU (900 MHz), 1GB RAM, 20 GB disk, running Windows 2000 Advanced Server (SP3)
HJ (Hwang and Jung, 2002) uses a “least-active-requests” routing policy, which is not applicable to stateful applications
ReDAL: Experimental Results
Performance Metrics:
• Average Throughput per Application Server (ATAS): average number of transactions per second an application server in the cluster provides
• Average Response Time (ART): average response time provided by the application servers, measured from the end user perspective
• Web Server CPU Utilization (WSCU): percentage CPU utilization on the web server, measured by OS utilities
• Peak % CPU on the Application Servers: peak percentage CPU usage among a cluster of application servers, measured by OS utilities
• Scaling with Application Servers: percentage CPU usage on the web server for various numbers of application servers in the application server cluster
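The first two metrics reduce to simple averages. The sketch below shows one way to compute them from raw measurements; the function names and sample numbers are hypothetical and are not results from the experiments.

```python
def atas(completed_per_server, duration_seconds):
    """Average Throughput per Application Server, in transactions per second."""
    rates = [n / duration_seconds for n in completed_per_server.values()]
    return sum(rates) / len(rates)

def art(response_times_ms):
    """Average Response Time in milliseconds, as seen by the end user."""
    return sum(response_times_ms) / len(response_times_ms)

# Made-up sample: two servers observed for 30 seconds.
sample_atas = atas({"A1": 600, "A2": 300}, duration_seconds=30)
sample_art = art([100, 200, 300])
```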
Throughput Performance
[Figure: ATAS vs. number of simultaneous sessions (0 to 100) for ReDAL (0.9), ReDAL (0.5), HJ, and RR]
• ReDAL (0.9) is the ReDAL algorithm with α = 0.9
• ReDAL (0.5) is the ReDAL algorithm with α = 0.5
ReDAL with α = 0.9 has the highest throughput
Response Time Performance
[Figure: ART (ms) vs. number of simultaneous sessions (0 to 100) for ReDAL (0.9), ReDAL (0.5), HJ, and RR]
ReDAL with α = 0.9 has the best response time
CPU Overhead on the Web Server
[Figure: WSCU (%) vs. number of simultaneous sessions (0 to 100) for HJ, RR, and ReDAL (0.9)]
Additional overhead of the ReDAL algorithm is 1.5% or less
Peak CPU Utilization on Application Servers
[Figure: peak % CPU on the application servers vs. number of simultaneous sessions (0 to 100) for ReDAL (α = 0.9), ReDAL (α = 0.5), HJ, and RR]
Highest in the RR case and lowest in the ReDAL (α = 0.9) case
Scaling with Application Servers
Overhead of the ReDAL algorithm is at or below 15% for 100 concurrent sessions
[Figure: WSCU (%) vs. number of simultaneous sessions (0 to 100) for 5, 10, and 20 application servers]
Real World Evaluation
Online credit card application
• 30 WebLogic application servers on Red Hat Linux 9.0
• Apache Web Server on Red Hat Linux 9.0
• Machine hardware configuration: 1 GB RAM, 2.2 GHz dual processors
• Load was simulated by replaying web logs collected at various times over a day
At a peak load of 1000 simultaneous sessions, ReDAL improved the response time of RR by 100%.
[Figure: ART (ms) vs. number of simultaneous sessions (0 to 1000) for ReDAL (0.8), HJ, and RR]
Summary
ReDAL: Application Server Load Distribution
Maximizes affinity
Exploits application characteristics
Practical and scalable