Transcript

eQoS: Provisioning of Client-Perceived End-to-End QoS Guarantees in Web Servers

Jianbin Wei, Student Member, IEEE, and Cheng-Zhong Xu, Senior Member, IEEE

Abstract—It is important to guarantee client-perceived end-to-end quality of service (QoS) under heavy load conditions. Existing work focuses on network transfer time or server-side request processing time. In this paper, we propose a novel framework, eQoS, to monitor and control client-perceived response time in heavily loaded Web servers. The response time is measured with respect to Web pages that contain multiple embedded objects. Within the framework, we propose an adaptive fuzzy controller, STFC, to allocate server resources. The controller assumes no knowledge of the pageview traffic model. It deals with the effect of process delay in resource allocation by its two-level self-tuning capabilities. We also prove the stability of the STFC. We implement a prototype of eQoS in Linux and conduct comprehensive experiments across a wide range of server workload conditions on PlanetLab and simulated networks. Experimental results demonstrate the effectiveness of the framework: It controls the deviation of client-perceived pageview response time to be within 20 percent of a predefined target with both synthetic and real Web traffic. We also compare the STFC with other controllers, including static fuzzy, linear proportional integral (PI), and adaptive PI controllers. Experimental results show that, although the STFC works slightly worse than the static fuzzy controller in the environment where the static fuzzy controller is best tuned, because of its self-tuning capabilities it has better performance in all other test cases by around 25 percent on average in terms of the deviation from the target response time. In addition, due to its model independence, the STFC outperforms the linear PI and adaptive PI controllers by 50 percent and 75 percent on average, respectively.

Index Terms—Client-perceived end-to-end, pageview response time, quality of service, self-tuning fuzzy control, Web servers.


1 INTRODUCTION

IN the past decade, we have seen an increasing demand for provisioning of quality of service (QoS) guarantees to various network applications and clients. There exist many works on the provisioning of QoS guarantees. Most of them focus on Web servers without considering network delays [1], [2], [7], [8], [10], [27], [28], [38], [45], [49], on individual network routers [9], [20], [16], or on clients with assumptions of QoS support in networks [19]. Some recent work focused on end-to-end QoS guarantees in network cores [24], [43]. They aimed at guaranteeing QoS measured from server-side network edges to client-side network edges without considering delays incurred in servers.

In practice, client-perceived QoS is measured as the retrieval time of a Web page, including all of its embedded objects. It consists of not only network transfer time but also server-side latency, including request queuing delays and processing times. The objective of this paper is to guarantee client-perceived end-to-end QoS in heavily loaded Web servers, which is both useful and necessary. For example, in [13], the authors showed that, for a nonheavily loaded Web server comprised of static Web pages, as many as 16.4 percent of page accesses were aborted due to large response times. Furthermore, more than 60 percent of them were aborted within 5 seconds and the average response time for aborted page accesses was 9.21 seconds.

To provide client-perceived QoS guarantees, service quality must be accurately measured in real time so that server resources can be allocated promptly. Most recently, the ksniffer approach presented in [31] realized such an online real-time measurement for HTTP services, and the sMonitor approach presented in [47] further realized such a measurement for secured services for the first time. They make provisioning of client-perceived end-to-end QoS guarantees possible.

The first contribution of this paper is a novel eQoS framework to monitor and control client-perceived QoS in Web servers. To the best of our knowledge, eQoS is the first framework to guarantee client-perceived end-to-end QoS. Moreover, the guaranteed QoS is measured with respect to requests for whole Web pages, which usually consist of a base HTML file and several embedded objects, instead of requests for a single object [1], [2] or TCP connections in Web servers [27], [28], [38]. Pageview response time becomes an increasingly important performance metric because more than 50 percent of Web pages have one or more embedded objects [21]. Notice that it is feasible to provide end-to-end QoS guarantees in a heavily loaded Web server. In [5], the authors pointed out that server latency was responsible for more than 80 percent of the retrieval time of a small object in a heavily loaded Web server. In addition, small objects are dominant in the Web [3].

The second contribution of this paper is a two-level self-tuning fuzzy controller (STFC) for resource allocation within the eQoS framework. Traditional linear feedback control has been applied as an analytic method for QoS guarantees in Web servers because of its self-correcting and self-stabilizing behavior. It adjusts the allocated resource of a client class according to the difference between the target QoS and the achieved one in previous scheduling epochs. Examples include [1], [27], [37]. In these approaches, the nonlinear relationship between the allocated resource of a class and its received service quality is linearized at a fixed operating point. It is well known that linear approximation of a nonlinear system is accurate only within the neighborhood of the point where it is linearized. In Web servers with continuously changing workload conditions, the operating point changes dynamically and such simple linearization is thus inappropriate.

. J. Wei is with the Mathematics and Computer Science Department, South Dakota School of Mines & Technology, 501 East Saint Joseph Street, Rapid City, SD 57701. E-mail: [email protected].

. C.-Z. Xu is with the Department of Electrical and Computer Engineering, Wayne State University, 5050 Anthony Wayne Drive, Detroit, MI 48202. E-mail: [email protected].

Manuscript received 6 Apr. 2005; revised 16 Mar. 2006; accepted 21 Mar. 2006; published online 20 Oct. 2006. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TC-0098-0405.

In Web servers, resource allocation must be based on the accurately measured effect of previous resource allocation on the client-perceived pageview response time. Upon receiving a request for the base HTML file of a Web page, the server needs to schedule the request according to its resource allocation. At this point, it is impossible to measure the client-perceived response time of the whole Web page because the server needs to process the request and the response needs to be transmitted over the networks. An accurate measurement of the resource-allocation effect on response time is thus delayed. Consequently, the resource allocation is significantly complicated because it has to be based on an inaccurate measurement. We refer to the latency between allocating server resources and accurately measuring the effect of the resource allocation on provided service quality as process delay.

In [28], [38], the authors addressed the effect of the process delay in resource allocation by using a queuing-model-based server load predictor. They integrated the predictor into a linear feedback controller to react to an incoming performance degradation according to predicted server workloads. The integrated feedback control approach works well for QoS guarantees with respect to requests for individual objects. It is hardly applicable to the control of the response time of Web pages due to the lack of an accurate input traffic model for pageview requests.

The STFC overcomes the limitations of existing approaches with its two-level self-tuning capability. On the first level is a resource controller that takes advantage of fuzzy control theory to address the lack of an accurate server model due to the dynamics and unpredictability of pageview request traffic. On the second level is a scaling-factor controller. It aims to compensate for the effect of the process delay by adjusting the resource controller's output scaling factor according to transient server behaviors. We note that fuzzy control theory was recently used by others for QoS guarantees as well [25], [34]. Although their approaches can be tuned for certain working conditions, they are unable to guarantee client-perceived pageview response time in Web servers with continuously changing workload conditions. In [22], [23], the authors applied adaptive control theory to dynamically adjust the parameters of controllers in adaptation to workload variations. Their approaches deliver better services than those with fixed control parameters in most test cases. To provide client-perceived end-to-end QoS guarantees in addition to handling workload variations, the adaptive control also needs to take the process delay into account.

To evaluate the performance of the eQoS framework, we implement a prototype in Linux. We conduct experiments across a wide range of server workload conditions on the PlanetLab testbed [36]. We also evaluate the effect of network delays on its performance using simulated networks. The experimental results with Web traffic generated synthetically and from World Cup Soccer 98 server logs demonstrate that the provisioning of client-perceived QoS guarantees is feasible and that eQoS is effective in such provisioning: It controls the deviation of client-perceived average pageview response time to be within 20 percent of a predefined target.

For comparison, we also implement three other controllers within the eQoS framework: a static fuzzy controller, a linear PI controller, and an adaptive PI controller. The basic ideas of the static fuzzy controller are similar to the one in [25]. The adaptive PI controller bears a resemblance to the approach in [22]. Experimental results show that, although the STFC works slightly worse than the nonadaptive fuzzy controller in the environment where the nonadaptive fuzzy controller is best tuned, because of its self-tuning capabilities it has better performance in all other test cases by around 25 percent in terms of the deviation from the target response time. In addition, due to its model independence, the STFC outperforms the linear PI and adaptive PI controllers by around 50 percent and 75 percent on average, respectively.

In summary, this paper makes three contributions:

1. Proposing a novel framework, eQoS, to provide client-perceived end-to-end QoS guarantees.

2. Proposing a two-level self-tuning fuzzy controller with unknown model parameters for resource allocation in Web servers, with stability analysis.

3. Implementing the eQoS framework with the STFC and other feedback controllers, evaluating it on PlanetLab and simulated networks, and demonstrating the effectiveness of the eQoS/STFC approach to guarantee client-perceived end-to-end QoS in terms of pageview response time.

The structure of the paper is as follows: Section 2 presents the structure of the eQoS framework and discusses the design and implementation issues. Section 3 presents the two-level STFC and analyzes its stability. Section 4 evaluates the performance of eQoS on real-world and simulated networks and compares the performance of different controllers. Section 5 reviews related work in the provisioning of QoS guarantees in Web servers and Section 6 concludes the paper.

2 THE eQoS FRAMEWORK

The eQoS framework is designed to provide client-perceived end-to-end QoS guarantees in Web servers. The client-perceived pageview response time is the time interval that starts when a client sends the first request for the Web page to the server and ends when the client receives the last object of the Web page. In this work, we use the Apache Web server with support of HTTP/1.1. We assume that all objects reside in the same server so that we can control the processing of the whole Web page. Fig. 1 shows the interactions between clients and Web servers during the retrieval of http://www.foo.com/index.html.

1. The client first obtains the IP address of www.foo.com by making inquiries to domain name servers or from its own cache if the Web site has been accessed in the past and the address is cached.

2. The client sends a connection request to the Web server corresponding to the IP address and establishes a TCP connection via three-way handshake.

3. After establishing the TCP connection, the client sends an HTTP request for index.html. Note that, in practice, the client normally sends the first HTTP request immediately (within 0.5 ms in our experiments) after the last step of the three-way handshake.

4. The server determines when the established TCP connection should be passed to the Apache Web server based on its resource allocation.

5. A child process of the Apache Web server processes the request and sends index.html back to the client.

6. The client sends an individual request for each embedded object to the server.

7. The server sends all embedded objects back to the client.

8. The server waits for possible requests from the same connection for a certain period (the default is 15 seconds in the Apache Web server) and then terminates the connection.

In this work, we only consider the latency incurred in Step 2 through Step 7. The latency incurred in Step 1 is only measurable from the client side and is difficult to obtain from Web servers. More importantly, this latency is normally negligible in comparison with pageview response time, which is normally on the order of seconds. As shown in [33], 70 percent of name-lookup requests have response times less than 10 ms and 90 percent of them are less than 100 ms for all of their examined domain name servers except one.

In eQoS, we aim to guarantee that the average pageview response time perceived by premium clients, W(k), stays close to a predefined reference value D(k). By achieving this, we guarantee the QoS of premium clients and simultaneously provide as good as possible services to basic clients. Because the server load can grow arbitrarily high, it is impossible to guarantee the QoS of all clients under heavy load conditions.

2.1 Design of the eQoS

The eQoS framework consists of four components: a Web server, a QoS controller, a resource manager, and a QoS monitor. Fig. 2 illustrates the components and their interactions. The QoS monitor measures client-perceived pageview response time using ideas similar to those presented in [13], [31], [47]. The QoS controller determines the amount of resources allocated to each class. It can be any controller designed for the provisioning of QoS guarantees. In addition to the STFC, the current implementation includes three other controllers for comparison: a nonadaptive fuzzy controller, a PI controller, and an adaptive PI controller.

The resource manager classifies and manages client requests and realizes resource allocation between classes. It consists of a classifier, several waiting queues, and a processing-rate allocator. The classifier categorizes a request's class according to rules defined by service providers. For simplicity, in eQoS, the rules are based on the request's header information (e.g., IP address and port number). It can also be extended to use application-level information [46] if clients of different classes are from the same proxy. Without eQoS, a single waiting queue is created in the kernel to store all client requests for each socket. In eQoS, requests are stored in their corresponding waiting queues in the resource manager. The requests from the same class are served in a first-come-first-served manner. The processing-rate allocator realizes resource allocation between different classes. Since every child process in the Apache Web server is identical, we realize the processing-rate allocation by controlling the number of child processes that a class is allocated. In addition, when a Web server becomes overloaded, admission control mechanisms [12], [45], [49] can be integrated into the resource manager to ensure the server's aggregate performance. A minimal sketch of this per-class queuing is given below.
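The following sketch illustrates how a resource manager of this kind might classify accepted connections into per-class FIFO queues by source address. It is illustrative only: the class count, the premium-subnet rule, and all helper names are our own assumptions, not code from the eQoS prototype.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>

#define NCLASS 2             /* assumed: premium (0) and basic (1) */

struct conn {                /* one accepted connection */
    int fd;                  /* socket descriptor */
    struct sockaddr_in addr; /* peer address used for classification */
    struct conn *next;
};

struct queue {               /* FIFO accept queue of one class */
    struct conn *head, *tail;
};

static struct queue queues[NCLASS];

/* Classify by header information; here, a hypothetical premium /8 subnet. */
static int classify(const struct sockaddr_in *addr)
{
    const uint32_t PREMIUM_NET = 0x0a000000;  /* 10.0.0.0/8, assumed rule */
    uint32_t ip = ntohl(addr->sin_addr.s_addr);
    return ((ip & 0xff000000) == PREMIUM_NET) ? 0 : 1;
}

/* Append an accepted connection to its class queue (FCFS within class). */
static void enqueue(struct conn *c)
{
    struct queue *q = &queues[classify(&c->addr)];
    c->next = NULL;
    if (q->tail) q->tail->next = c; else q->head = c;
    q->tail = c;
}
```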

The client-perceived response time consists of network transmission time and server latency. In this work, we assume the network is not highly congested. This assumption is justified since network utilization is normally low, as observed in [14]. In the case where the network is congested, the network transmission time may become too large to provide QoS guarantees even in lightly loaded Web servers. For example, assuming the RTT becomes as high as 500 ms when the network is congested, as shown in Fig. 1, the retrieval of a Web page with 10 embedded objects via one TCP connection takes at least 6 seconds: 0.75 seconds for establishing the TCP connection and 0.5 seconds for retrieving the base HTML file and every embedded object with serialized requests.
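To spell out the arithmetic behind this example (our reading: the handshake costs 1.5 RTTs and each serialized request-response costs one RTT of 0.5 s):

$\underbrace{1.5 \times 0.5\,\text{s}}_{\text{TCP handshake}} + \underbrace{(1 + 10) \times 0.5\,\text{s}}_{\text{base file + 10 objects}} = 0.75\,\text{s} + 5.5\,\text{s} = 6.25\,\text{s} \geq 6\,\text{s}.$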


Fig. 1. The interactions between the client and the Web server during the retrieval of http://www.foo.com/index.html.

Fig. 2. The structure of the eQoS framework.

2.2 Implementation of the eQoS

To evaluate the performance of eQoS, we have implemented a prototype in Linux. In our implementation, we take four issues into account. First, in a heavily loaded Web server, a new connection request may be dropped by the operating system due to an overflowing accept queue in the kernel, and the exponential back-off mechanism of client-side TCP will be triggered. To minimize such a possibility and so reduce retransmission delays, our implementation of the resource manager drains the kernel's accept queue in a tight loop. The accepted connections are stored in their corresponding accept queues, which have unlimited size within the resource manager. Because each connection needs a file descriptor, we need to prevent eQoS from running out of available file descriptors. In Linux 2.4 and later versions, the default maximum number of file descriptors is 26,213, which is large enough for the Apache Web server and eQoS. In some systems, such as SunOS 5.8, the default value is 1,024, which may need to be increased. On the other hand, as we will discuss in Section 4.7, if the load of a server becomes too high, admission control mechanisms need to be deployed to drop requests to ensure QoS guarantees. This also reduces the number of backlogged connection requests and the needed file descriptors.
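A minimal sketch of such a draining loop is shown below. It assumes a nonblocking listening socket, and reuses the hypothetical struct conn and enqueue() from the earlier sketch; error handling is reduced to the essentials.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <stdlib.h>

/* Drain the kernel accept queue so the kernel backlog never overflows.
 * listen_fd is assumed to be O_NONBLOCK; enqueue() stores the connection
 * in an application-level, unbounded per-class queue. */
void drain_accept_queue(int listen_fd)
{
    for (;;) {
        struct conn *c = malloc(sizeof(*c));
        socklen_t len = sizeof(c->addr);

        if (!c)
            break;                      /* out of memory: stop draining */
        c->fd = accept(listen_fd, (struct sockaddr *)&c->addr, &len);
        if (c->fd < 0) {
            free(c);
            break;                      /* EAGAIN: kernel queue drained */
        }
        enqueue(c);                     /* hand off to resource manager */
    }
}
```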

Second, to ensure that the aggregate performance of a Web server is not affected by supporting QoS guarantees, the processing-rate allocator realizes the resource allocation using work-conserving weighted fair queuing algorithms [32]. In the implementation, the Apache Web server is modified to accept requests from the resource manager through a unix domain socket. When a child process in the Apache Web server calls accept() on the unix domain socket, a signal is sent to the processing-rate allocator. Upon receiving the signal, the allocator determines which class should be served and dispatches a request from that class through the unix domain socket to the child process. In the case where all accept queues are empty, a flag is set to indicate that there exists an idle child process. A newly arrived request will be passed to the child process immediately if the flag is set.
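Handing an accepted connection from the resource manager to an Apache child over a unix domain socket requires descriptor passing. The paper does not show this code; the sketch below uses the standard POSIX SCM_RIGHTS mechanism that an implementation like this would most likely rely on, with a function name of our own choosing.

```c
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Send connection descriptor conn_fd to a child process over the
 * unix domain socket uds_fd using SCM_RIGHTS ancillary data. */
int pass_fd(int uds_fd, int conn_fd)
{
    char dummy = 'F';                  /* must send at least one data byte */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;          /* ensures proper alignment */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;      /* kernel duplicates the fd */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &conn_fd, sizeof(int));

    return sendmsg(uds_fd, &msg, 0) < 0 ? -1 : 0;
}
```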

Third, in eQoS, we assume that the capacity of a Web server is limited by its CPU (processing rate). With the popularity of dynamic Web content, such as that in e-commerce Web servers, the processing rate becomes an easier bottleneck resource than others. Similar assumptions have also been adopted in previous work [2], [27].

Fourth, to prevent the requests of the basic class from being starved, we set a configurable upper bound on the server resources that the other classes can be allocated. For example, in the case of two classes, this upper bound is useful when too many requests arrive from the premium class.

3 THE SELF-TUNING FUZZY CONTROLLER

To guarantee client-perceived QoS effectively, the QoS controller must address the issue of the process delay in resource allocation without any assumption about the pageview request traffic model. Within the eQoS framework, we propose a two-level STFC to control the resource allocation in Web servers. Fig. 3 presents the structure of the STFC. The resource controller on the first level takes advantage of fuzzy control theory to address the issue of lacking accurate server models. The scaling-factor controller compensates for the effect of process delay by adjusting the resource controller's output scaling factor according to transient server behaviors.

3.1 The Structure of the Resource Controller

In the resource controller, the resource allocated to the premium class in sampling period k+1, denoted by u(k+1), is adjusted according to its error e(k) (i.e., the difference between the reference value and the achieved one) and change of error Δe(k) in the previous sampling period k, using a set of control rules encoding heuristic control knowledge. In the controller, e(k) and Δe(k) are calculated using the reference value r(k) and the achieved value y(k). Based on these, the controller calculates the resource adjustment Δu(k) for the next sampling period, which is then fed into the resource manager component.

As shown in Fig. 3, the resource controller consists of four components. The rule base contains a set of If-Then rules of quantified control knowledge about how to adjust the resource allocated to the premium class according to e(k) and Δe(k) for QoS guarantees. The fuzzification interface converts controller inputs into certainties in numeric values of the input membership functions. The inference mechanism activates and applies rules according to fuzzified inputs and generates fuzzy conclusions for the defuzzification interface. The defuzzification interface converts fuzzy conclusions into the change of resource of the premium class in numeric value.

The resource controller presented in Fig. 3 also contains three scaling factors: input factors K_e and K_Δe and output factor αK_Δu. They are used to tune the controller's performance. The actual inputs of the controller are K_e e(k) and K_Δe Δe(k). In the output factor, α is adjusted by the scaling-factor controller. Thus, the resource allocated to the premium class during the (k+1)th sampling period is

$u(k+1) = u(k) + \alpha K_{\Delta u}\,\Delta u(k) = \int \alpha K_{\Delta u}\,\Delta u(k)\,dk.$ (1)

Note that these scaling factors are positive in order to ensure the stability of the control system.

The parameters of the control loop as shown in Fig. 3 are defined as follows: The reference input for the kth sampling period, r(k), is D(k). The output of the loop is the achieved response time W(k). The error e(k) and the change of error Δe(k) are defined as D(k) − W(k) and e(k) − e(k−1), respectively.

Fig. 3. The structure of the STFC.
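In discrete form, one control step reduces to a few lines. The sketch below is our illustration of (1), with variable names of our own choosing; the saturation bound du_max anticipates the constraint (10) derived in Section 3.3, and fuzzy_du() is the resource controller sketched in Section 3.1.3.

```c
double fuzzy_du(double e, double de);  /* resource controller; see the
                                        * Section 3.1.3 sketch */

/* One STFC sampling step: compute error terms, evaluate the fuzzy
 * controller, and integrate the scaled output into the allocation.
 * alpha comes from the scaling-factor controller of Section 3.2. */
double stfc_step(double D_k, double W_k, double *e_prev,
                 double *u, double Ke, double Kde,
                 double Kdu, double alpha, double du_max)
{
    double e  = D_k - W_k;          /* e(k)  = D(k) - W(k)     */
    double de = e - *e_prev;        /* de(k) = e(k) - e(k-1)   */
    double du = fuzzy_du(Ke * e, Kde * de);

    /* Saturate |du| to keep the Lyapunov difference (8) negative. */
    if (du >  du_max) du =  du_max;
    if (du < -du_max) du = -du_max;

    *u += alpha * Kdu * du;         /* u(k+1) = u(k) + a*Kdu*du(k) */
    *e_prev = e;
    return *u;
}
```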

3.1.1 Design of the Rule Base

It is well known that the bottleneck resource plays an important role in determining the service quality a class receives. Thus, by adjusting the bottleneck resource a class is allocated, we are able to control its QoS: The more resources it receives, the smaller the response time it experiences. The key challenge in designing the resource controller is how to translate such heuristic control knowledge into a set of control rules so that it is able to provide QoS guarantees without an accurate model of continuously changing Web servers.

In the resource controller, we define the control rules using linguistic variables. For brevity, the linguistic variables "e(k)," "Δe(k)," and "Δu(k)" are used to describe e(k), Δe(k), and Δu(k), respectively. The linguistic variables assume the linguistic values NL, NM, NS, ZE, PS, PM, and PL (negative large through zero to positive large). Their meanings are shown in Fig. 4a. Note that they indicate the sign and the size in relation to the other linguistic values. This gives the STFC more flexibility than other control-theoretic approaches.

We next analyze the effect of the controller on the provided services as shown in Fig. 5a. In this figure, five zones with different characteristics can be identified. Zones 1 and 3 are characterized by opposite signs of e(k) and Δe(k). That is, in zone 1, e(k) is positive and Δe(k) is negative; in zone 3, e(k) is negative and Δe(k) is positive. In these two zones, it can be observed that the error is self-correcting and the achieved value is moving toward the reference value. Thus, Δu(k) needs to be set either to speed up or to slow down the current trend.

Zones 2 and 4 are characterized by the same signs of e(k) and Δe(k). That is, in zone 2, e(k) is negative and Δe(k) is negative; in zone 4, e(k) is positive and Δe(k) is positive. Differently from zones 1 and 3, in these two zones, the error is not self-correcting and the achieved value is moving away from the reference value. Therefore, Δu(k) should be set to reverse the current trend.

Zone 5 is characterized by rather small magnitudes of e(k) and Δe(k). Therefore, the system is at a steady state and Δu(k) should be set to maintain the current state and correct small deviations from the reference value.

By identifying these five zones, we are able to design the fuzzy control rules. Let U(k) denote the equilibrium resource value of the premium class at which the reference value can be achieved. Let ũ(k) denote the difference between the equilibrium resource value and the current one. It follows that

$\tilde{u}(k) = U(k) - u(k).$ (2)

In order to provide QoS guarantees, the resource allocation should converge to its equilibrium value. Based on the characteristics of the five zones shown in Fig. 5a, in linguistics, we have

$“\tilde{u}(k)” = -[“e(k)” + “\Delta e(k)”]$ (3)

and "Δu(k)" is set as

$“\Delta u(k)” = “\tilde{u}(k)”.$ (4)

Notice that we are using linguistic values of the controller's inputs and output, and NL and PL are their lower bound and upper bound, respectively. Therefore, when "e(k)" and "Δe(k)" are NL, "Δu(k)" is set to PL.

“�eðkÞ” are NL, “�uðkÞ” is set to PL.The resulting control rules are summarized in Fig. 5b. A

general linguistic form of these rules is read as: If premiseThen consequent. Let ruleðm;nÞ, where m and n assume

linguistic values, denote the rule of the ðm;nÞ position in

WEI AND XU: EQOS: PROVISIONING OF CLIENT-PERCEIVED END-TO-END QOS GUARANTEES IN WEB SERVERS 1547

Fig. 4. The membership functions of the STFC. (a) The membership functions of e, �e, and �u. (b) The membership functions of �.

Fig. 5. Fuzzy control rules in the resource controller. (a) Illustration of the control effect. (b) The rule base of the resource controller.

Fig. 5b. As an example, ruleðPS; PMÞ ¼ NL reads that: If

the error is positive small and the change of error is positive

medium Then the change of resource is negative large.

Remark. Note that the control rules are designed based on the analysis of the effect of resource allocation on achieved response time. This avoids the need for an accurate model of Web servers. Therefore, the STFC is model independent.

3.1.2 Fuzzy Quantification of the Rule Base

In the resource controller, the meaning of the linguistic values is quantified using "triangle" membership functions, which are the most widely used in practice, as shown in Fig. 4a. We have also examined membership functions with different shapes, such as "Gaussian" and "trapezoid," and found no significant performance differences between them. For brevity, in this paper, we only present the results due to the "triangle" membership functions. In Fig. 4a, the x-axis can be e(k), Δe(k), or Δu(k). The mth membership function quantifies the certainty (between 0 and 1) that an input can be classified as linguistic value m.

The fuzzification component translates the inputs into the corresponding certainties in numeric values of the membership functions. Let μ_m(e(k)) denote the certainty of e(k) in the mth membership function and μ_n(Δe(k)) the certainty of Δe(k) in the nth membership function. When e(k) = 1/12 and Δe(k) = 1/3, according to the membership functions of e(k) and Δe(k), all membership functions yield 0 except that

$\mu_{ZE}(e(k)) = 0.75,\quad \mu_{PS}(e(k)) = 0.25,\quad \text{and}\quad \mu_{PS}(\Delta e(k)) = 1.$
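A triangular membership evaluation consistent with these numbers might look like the sketch below. We assume seven evenly spaced triangles peaking at m/3 on a normalized [−1, 1] axis; this geometry is our inference from the worked example, since it yields μ_ZE(1/12) = 0.75 and μ_PS(1/12) = 0.25.

```c
#include <math.h>

#define WIDTH (1.0 / 3.0)   /* distance between peaks of adjacent triangles */

/* Certainty that normalized input x belongs to linguistic value m
 * (m in NL=-3 .. PL=+3), using triangles peaking at m/3. */
static double member(int m, double x)
{
    double d = fabs(x - m / 3.0);
    return d >= WIDTH ? 0.0 : 1.0 - d / WIDTH;
}
```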

3.1.3 Inference Mechanism and Defuzzification

The inference mechanism determines which rules should be activated and what conclusions can be reached. Let μ(m, n) denote the premise certainty of rule(m, n). The and operation in the premise is calculated via the minimum operation. Suppose e(k) = 1/12 and Δe(k) = 1/3; then only two rules, rule(ZE, PS) and rule(PS, PS), are activated because the premise certainties of all other rules are 0 according to the example presented in Section 3.1.2. Consequently,

$\mu(ZE, PS) = \min\{0.75, 1\} = 0.75 \quad \text{and} \quad \mu(PS, PS) = \min\{0.25, 1\} = 0.25.$

Based on the outputs of the inference mechanism, the defuzzification component calculates the fuzzy controller output, which is a combination of multiple control rules, using the "center average" method. Let b(m, n) denote the center of the membership function of the consequent of rule(m, n); in this case, it is where the membership function reaches its peak. For example, the centers of the NS and NM membership functions of Δu(k) are −1/3 and −2/3, respectively. The fuzzy control output is

$\Delta u(k) = \frac{\sum_{m,n} b(m,n) \cdot \mu(m,n)}{\sum_{m,n} \mu(m,n)}.$ (5)

In the example, therefore, the fuzzy controller output Δu(k) is −5/12.
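Putting fuzzification, min-based inference, and center-average defuzzification together gives the fuzzy_du() referenced by the control-step sketch in Section 3.1. Again a sketch under our assumptions (normalized inputs, the reconstructed rule table, centers at m/3); on the worked example e(k) = 1/12, Δe(k) = 1/3 it returns −5/12.

```c
/* Resource-controller output du(k) for normalized inputs e and de.
 * Uses member() and rule() from the previous sketches. At most two
 * membership functions are nonzero per input, so at most four rules
 * fire; we simply scan all 49 for clarity. */
double fuzzy_du(double e, double de)
{
    double num = 0.0, den = 0.0;
    for (int m = NL; m <= PL; m++) {
        for (int n = NL; n <= PL; n++) {
            double p = member(m, e);
            double q = member(n, de);
            double mu = p < q ? p : q;       /* premise: min(p, q)      */
            num += (rule(m, n) / 3.0) * mu;  /* center b(m,n) = value/3 */
            den += mu;
        }
    }
    return den > 0.0 ? num / den : 0.0;      /* center average, per (5) */
}
```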

3.2 The Scaling-Factor Controller

To successfully design the resource controller discussed in Section 3.1, we must compensate for the effect of the process delay. To that end, we design a scaling-factor controller to adaptively adjust αK_Δu according to the transient behaviors of a Web server, in a way similar to [29]. The output scaling factor αK_Δu is selected because of its global effect on the control performance.

The scaling-factor controller consists of the same components as the resource controller. The membership functions of "α" (the corresponding linguistic variable of α) also have a "triangle" shape, as shown in Fig. 4b. It also uses the "center average" method to calculate the fuzzy controller output for α. Because α needs to be positive to ensure the stability of the control system, "α" assumes different linguistic values from "e(k)" and "Δe(k)." Fig. 4b also shows the linguistic values and their meanings.

The control rules of the scaling-factor controller are designed to compensate for the effect of the process delay on the performance of the STFC. They are shown in Fig. 6 with the following five zones:

1. When e(k) is large but Δe(k) and e(k) have the same signs, the client-perceived response time is not only far away from the reference value but is also moving farther away. Thus, α should be set large to prevent the situation from worsening further.

2. When e(k) is large and Δe(k) and e(k) have opposite signs, α should be set to a small value to ensure a small overshoot and to reduce the settling time, but not at the cost of responsiveness. Moreover, the small value is important to compensate for the effect of the process delay in resource allocation.

3. When e(k) is small, α should be set according to current server states to avoid a large overshoot or undershoot. When Δe(k) is negative large, the average response time of the premium class has just reached the reference value and is moving away upward. A large α is needed to prevent the upward motion from becoming more severe and results in a small overshoot. Similarly, when e(k) is positive small and Δe(k) is negative small, α should be very small. The large variation of α is important to prevent excessive oscillation and to increase the convergence rate of the achieved service quality. Such a large variation also justifies the need for the dynamic output scaling factor.

4. Note that the workload of a Web server is highly dynamic and has disturbances. The scaling-factor controller also provides regulation against these disturbances. When a workload disturbance happens, e(k) is small and Δe(k) is normally large with the same sign as e(k). To compensate for such a workload disturbance, α is set large.

5. When both e(k) and Δe(k) are very small, α should be around zero to avoid the chattering problem around the reference value.

Fig. 6. The rule base of the scaling-factor controller.

The operation of the STFC has two steps. First, we tune K_e, K_Δe, and K_Δu through trial and error. In this step, the scaling-factor controller is turned off and α is set to 1. Notice that tuning of the control factors, automatically or manually, is required for all practical controllers because no fixed values fit all situations [18]. In the second step, the STFC is turned on to control resource allocation in running Web servers. The scaling-factor controller is on to tune α adaptively. In addition, as suggested in [29], K_Δu is set to three times the value obtained in the previous step to maintain the responsiveness of the STFC during workload disturbances. K_e and K_Δe are kept unchanged.

Finally, we remark that the STFC has small overhead. For any inputs, only two membership functions lead to nonzero values in the resource controller. Therefore, at most four rules are on at any time in the resource controller. Similarly, at most four rules are on in the scaling-factor controller. It is demonstrated in Section 4.7 that there is negligible performance difference with the STFC on and off. Furthermore, the implementation complexity is small: Our implementation of the STFC is less than 100 lines of C code.

3.3 Stability Analysis

Stability is one of the most important concerns of a control system because, if the system is unstable, there is no chance that any other performance specifications will hold. In this section, we prove the stability of the fuzzy control system in a way similar to [15].

Theorem 1. In the fuzzy control system, the allocated resource of a class will converge asymptotically to a small ε neighborhood of its equilibrium point.

Proof. By (1) and (2), we have

$\tilde{u}_i(k+1) = \tilde{u}_i(k) - \alpha K_{\Delta u}\,\Delta u_i(k).$ (6)

We use the Lyapunov direct method to prove that the system converges to the equilibrium point. Choose

$V(\tilde{u}_i(k)) = \tilde{u}_i^2(k)$ (7)

as the "Lyapunov function." It characterizes the distance of the allocated resource of class i from its equilibrium rate. To prove that the resource of class i will converge to the equilibrium point, the difference

$V(\tilde{u}_i(k+1)) - V(\tilde{u}_i(k)) = \tilde{u}_i^2(k+1) - \tilde{u}_i^2(k) = [\tilde{u}_i(k) - \alpha K_{\Delta u}\,\Delta u_i(k)]^2 - \tilde{u}_i^2(k) = \alpha K_{\Delta u}\,\Delta u_i(k)\,[\alpha K_{\Delta u}\,\Delta u_i(k) - 2\tilde{u}_i(k)]$ (8)

should always be negative.

By (4), we know ũ_i(k) and Δu_i(k) always have the same sign. Recall that αK_Δu is positive. Therefore, by introducing a constraint

$|\Delta u_i(k)| < \frac{2}{\alpha K_{\Delta u}}\,|\tilde{u}_i(k)|,$ (9)

we can ensure that (8) is always negative. Therefore, u_i(k) converges to its equilibrium point. Note that it is difficult to obtain the equilibrium point U_i(k) for ũ_i(k). Instead, letting ε be a small positive constant, we use a fixed constraint for Δu_i(k), that is,

$|\Delta u_i(k)| < \frac{2\epsilon}{\alpha K_{\Delta u}}.$ (10)

Therefore, for |ũ_i(k)| > ε, (10) implies the validity of (9). It indicates that u_i(k) will converge asymptotically to an ε neighborhood of its equilibrium point. This concludes the proof. □

Remark. According to the definition of the equilibrium point, by ensuring that the allocated resource of a class converges to its equilibrium point, the STFC can provide QoS guarantees.

4 PERFORMANCE EVALUATIONS

We define a metric, relative deviation R(e), based on the root-mean-square error, to measure the performance of eQoS:

$R(e) = \frac{\sqrt{\sum_{k=1}^{n} \left(D(k) - W(k)\right)^2 / n}}{D(k)} = \frac{\sqrt{\sum_{k=1}^{n} e(k)^2 / n}}{D(k)}.$ (11)

R(e) reflects the transient characteristics of a control system because large deviations contribute heavily to it. The smaller the R(e), the more the achieved average response time concentrates near the target value and the better the controller's performance. In addition, it is always larger than the standard deviation when the mean of the sampled response times is different from the target D(k).
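For a constant target D, (11) reduces to a one-pass computation; a minimal sketch:

```c
#include <math.h>

/* Relative deviation R(e) of n sampled response times W[0..n-1]
 * against a constant target D, per (11). */
double relative_deviation(const double *W, int n, double D)
{
    double sum = 0.0;
    for (int k = 0; k < n; k++) {
        double e = D - W[k];      /* e(k) = D(k) - W(k) */
        sum += e * e;
    }
    return sqrt(sum / n) / D;
}
```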

4.1 Experimental Environments and Configurations

We have conducted experiments on the PlanetLab testbed to evaluate the performance of eQoS in a real-world environment. The clients reside on nine geographically diverse nodes in Cambridge, Massachusetts; San Diego, California; and Cambridge, United Kingdom. We assume that premium and basic clients come from all these nodes for fairness between clients with different network connections. The Web server is a Dell PowerEdge 2450 configured with dual processors (1 GHz Pentium III) and 512 MB main memory and is located in Detroit, Michigan. The server was connected to the Internet via a 100 Mbps network card. During the experiments, the RTTs between the server and the clients were around 45 ms (Cambridge), 70 ms (San Diego), and 130 ms (United Kingdom).

The client requests were generated by SURGE with its default settings [4]. The emulated Web objects comprised 2,000 unique objects. The object sizes were heavy-tailed, where the body and the tail followed Lognormal and Pareto distributions, respectively. The minimal and the maximal object sizes were 77 bytes and 3,047 Kbytes, respectively. The total size of all objects was 56 Mbytes. In our experiments, we observed that all objects were loaded into the memory and, thus, the disk was not a bottleneck. The maximum number of embedded objects in a given page was 150 and the percentages of base, embedded, and loner objects were 30 percent, 38 percent, and 32 percent, respectively. The interarrival times between requests followed a Weibull distribution if they were less than 1 second; otherwise, they followed a Pareto distribution.

The server load was controlled by adjusting the number of concurrent user equivalents (UEs) in SURGE. Notice that a fixed number of UEs does not affect the representativeness (i.e., self-similarity and high dynamics) of the generated Web traffic [4]. Similarly to the approach presented in [37], a request is blocked for a time duration proportional to its requested object size. This blocking model simulates a busy Web server whose capacity is limited by its CPU. SURGE was set up without pipelining because pipelining is not supported or is disabled by default in most browsers. Meanwhile, in all experiments, the arrival ratio of the premium class to the basic class was adjusted randomly for every 100 pages of requests.

The Apache Web server with the support of HTTP/1.1 was used to provide Web services. The maximum number of concurrent child processes was set to 128. Since QoS guarantees are most desirable when the server is heavily loaded, we set up the environment such that the ratio between the number of UEs and the number of child processes could drive the server to be heavily loaded. Notice that, although large Web servers such as e-commerce servers usually have more child processes than we configured, they also tend to have many more clients than we simulated. Therefore, our configuration can be viewed as an emulation of real-world heavy-load scenarios at a small scale [27].

In the experiments with two classes, we aimed to keep the average response time of the premium class at around 5 seconds. In the experiments with three classes, we assumed the reference values of class 1 and class 2 were 5 and 11 seconds, respectively. As pointed out in [6], the service quality of Web servers is rated as "good" and "average" when the average response time of Web pages is less than 5 seconds and 11 seconds, respectively. Notice that it is possible to provide response-time guarantees with targets smaller than 5 and 11 seconds. On the other hand, with 5 and 11 seconds as the targets, we are able to provide services as good as possible to other clients. In e-commerce environments, such a configuration helps to turn occasional buyers into loyal customers [48]. In addition, the number of classes in real environments is usually limited. As recommended in [30], a service provider needs to support only two or three different levels of services.

We aimed to provide guaranteed service only when the server is heavily loaded. In our experiment configurations, the number of UEs was between 500 and 800. When the number of UEs is less than 500, the average response time of all Web pages is around 5 seconds. When the number of UEs is larger than 800, we observed refused connections using an unmodified Apache Web server. In such a case, the server becomes overloaded and admission control mechanisms, such as those presented in [12], [45], [49], should be used to ensure the aggregate performance of the Web server.

Meanwhile, as discussed in Section 2.2, to prevent requests of the basic class from being starved, we need to limit the resources that other classes can use. For example, in the case of two classes, the limit for the premium class was set to 95 percent of CPU capacity in our experiments.

To investigate the effect of network latency on the performance of eQoS, we implemented a network-delay simulator. It emulates wide-area network delay in a way similar to [41]. In the simulated environment, two machines are used as clients and one as the network simulator. They have the same hardware configurations as the server and are connected by a 100 Mbps Ethernet. We changed the network routing in the server and client machines so that the packets between them were sent to the simulator. Upon receiving a packet, the simulator routes the packet to an "ethertap" device. A small user-space program reads the packet from the "ethertap" device, delays it, and writes it back to the device. The packet is then routed to the Ethernet. Thus, we can control the RTT between the server and clients. The simulator was shown to be effective in emulating wide-area network delay. For example, with the RTT set to 180 ms, ping reported round trips of around 182 ms.

In the experiments on the simulated networks, the RTT between clients and servers was set to 40, 80, or 180 ms. These values represent the transmission latency within the continental US, the latency between the east and west coasts of the US, and the latency between the US and Europe, respectively [39].

4.2 Effectiveness of the eQoS

To evaluate the effectiveness of eQoS in providing client-perceived QoS guarantees, we have conducted experiments under different workloads and network delays with two and three client classes. In the experiments, the system was first warmed up for 60 seconds. After that, the controller was turned on. In our experiments, we found that a sampling period of 4 seconds was a good setting. Fig. 7 presents the experimental results.

Fig. 7a shows the relative deviations of the premium class relative to the reference value (5 seconds). From the figure, we observe that all the relative deviations of average response times are smaller than 35 percent. Meanwhile, most of them are around 20 percent. This means the size of the deviations is normally around 1.0 second.

Fig. 7b presents the results with three classes. Because we observe no qualitative differences between the results with different RTTs in the simulated networks, we only present the results where the RTT was set to 180 ms for brevity. From the figure, we see that most of the relative deviations are between 15 percent and 30 percent. Because the reference values of class 1 and class 2 are 5.0 and 11.0 seconds, the results indicate that the deviation sizes are between 0.75 and 1.5 seconds for class 1 and between 1.65 and 3.3 seconds for class 2. Note that the results shown in Fig. 7 are due to experiments conducted under different network conditions and server workloads. We conclude from the results that eQoS is able to guarantee client-perceived QoS effectively in various environments.

To further investigate the effectiveness of eQoS, we conducted experiments using traces from World Cup Soccer 98 server logs [3]. To generate the objects, we used the scripts from [50] to analyze the logs. The total number of objects was 3,458. The minimal and the maximal object sizes were 4 bytes and 2,825 Kbytes, respectively. The total size of all objects was 81 Mbytes. In the experiments, we also observed that all objects were loaded into server memory, as in the experiments with SURGE. The object sizes were also heavy-tailed [3]. The logs were then replayed by clients from PlanetLab to access the generated objects.

Fig. 8 plots the experimental results. From the figure, we observe that, although there were some small spikes, the response time of the premium class was kept around the target of 5 seconds. Meanwhile, the clients from the basic class experienced much larger delays (i.e., around 15 seconds). In addition, those delays changed frequently and abruptly. We conclude that eQoS is effective in providing QoS guarantees even with real Web traffic.

In eQoS, server resources are allocated based on the client-perceived pageview response time measured in Web servers. It is important to ensure that the measured response time is close to that experienced by remote clients. We carried out experiments to examine the accuracy of the server-side measurement. We modified SURGE so that it also recorded the pageview response time along with that of each individual request. We found that the differences between the client-side and server-side measurements are within 7 percent. The differences may be caused by RTT variances due to network dynamics. This demonstrates that the framework is effective in providing client-perceived QoS guarantees.

4.3 Robustness of the eQoS

In this section, we investigate the robustness of eQoS in changing environments. We first examined the performance of eQoS with a changing number of clients on PlanetLab. In the experiments, the number of UEs was set to 600 initially. It was then increased to 700 at the 200th second and to 800 after another 200 seconds to further overload the server. Fig. 9a shows the experimental results. From the figure, we observe that, with the increase of clients, the server load and the response time of the basic class increase. However, eQoS always keeps the response time perceived by the premium class close to the target. This suggests that the performance of eQoS is not affected by the increase of server load.

To investigate the robustness of eQoS under different network conditions, we conducted experiments with changing RTTs on the simulated networks. We were unable to conduct such experiments on PlanetLab because it was impossible to control the RTTs between the clients and the server. Initially, the RTT was set to 40 ms. After 200 seconds, the RTT was increased to 80 ms. At the 400th second, the RTT was further increased to 180 ms. In this experiment, the number of UEs was set to 700. Fig. 9b depicts the response times of the premium class and the basic class. From the figure, we observe that the response time of the premium class experienced only small fluctuations. With each increase of the RTT, the response time first increases and then decreases, because the resource allocated to the premium class is adjusted according to its current response time.

4.4 Feasibility of Client-Perceived QoS Guarantees

In this section, we investigate why it is feasible to guarantee client-perceived QoS from the server side under heavy load conditions. The client-perceived response time consists of waiting time, processing time, and transmission time. The waiting time is the latency incurred in Step 4, as shown in Fig. 1. It is the time interval that starts when a connection is accepted by the server operating system and ends when the connection is passed to the Apache Web server to be processed. The processing time is the time that the Web server spends processing the requests for the whole Web page, including the base HTML file and its embedded objects. The transmission time includes the complete transfer time of client requests and all server responses over the networks. We instrumented the Apache Web server to record the processing time. We conducted experiments on PlanetLab. We also carried out similar experiments with different workloads and network delays and the results were similar. For simplicity, Fig. 10 shows the breakdown of client-perceived response time on PlanetLab.

Fig. 7. The performance of the eQoS with (a) two and (b) three classes.

Fig. 8. The results of the eQoS with World Cup Soccer 98 server logs on PlanetLab.

From Fig. 10, we observe that, when the server is heavily loaded (the number of UEs is larger than 400), the server-side waiting time is the dominant part of client-perceived response time. This finding is consistent with others, such as those in [5], [27], [37]. It is because, when the server is heavily loaded, the child processes of the Apache Web server are busy processing accepted client requests, so newly incoming client requests have to wait. Furthermore, we also observe that the transmission time is only a small part of the response time when the server workload is high. It indicates that, although service providers have no control over network transmissions, they are still able to control the client-perceived response time by controlling the server-side waiting time and processing time. Server-side QoS guarantees in heavily loaded Web servers thus are feasible in practice.

4.5 Comparison with Other Controllers

Within the eQoS framework, we also implement three other controllers: a fuzzy controller without self-tuning, a traditional PI controller, and an adaptive PI controller using the basic idea of [22]. Proportional, integral, and derivative controls are the most popular control design techniques [18]. Derivative control takes into account the change of errors and has good responsiveness. Its drawback is that it is sensitive to measurement noises, which are common in the provisioning of client-perceived end-to-end QoS guarantees. The advantage of proportional control lies in its simplicity and good agility. It, however, cannot eliminate the steady-state error introduced during the control, which causes the target response time to be unachievable. Such steady-state errors are then eliminated using integral control. Therefore, PI controllers have been widely used in providing QoS guarantees, including [22], [27]. Furthermore, the STFC is a PI-like controller with nonlinear operating functions. For a traditional PI controller, its equation in discrete form is

$u(k) = K_{\Delta e}\,e(k) + K_e \sum e(k).$

By taking the difference, it becomes

$\Delta u(k) = K_{\Delta e}\,\Delta e(k) + K_e\,e(k),$

while the inputs of the resource controller in the STFC are K_Δe Δe(k) and K_e e(k) and its output is Δu(k). Therefore, the comparison between these controllers is fair.
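For concreteness, the incremental (velocity-form) PI update used as a baseline can be written in the same shape as the STFC step; this is a sketch, and the variable names are ours.

```c
/* Incremental PI step: du(k) = Kde*de(k) + Ke*e(k).
 * Shares its inputs and output form with the STFC resource controller. */
double pi_step(double D_k, double W_k, double *e_prev, double *u,
               double Ke, double Kde)
{
    double e  = D_k - W_k;
    double de = e - *e_prev;
    *u += Kde * de + Ke * e;    /* u(k+1) = u(k) + du(k) */
    *e_prev = e;
    return *u;
}
```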

In [22], the parameters of the adaptive PI controller are adjusted according to the admission probability of a class. In our implementation, the parameters are adjusted according to the processing rate of a class in a similar manner.

We specifically tuned the fuzzy controller to achieve its best performance in an environment where the number of UEs was set to 700 and the RTT was assumed to be 180 ms on the simulated networks. We also tuned the PI controller in the same environment under the guideline of the Ziegler-Nichols method [18]. After that, we conducted experiments with different server workloads and network conditions on PlanetLab and the simulated networks.

To easily compare the performance of different controllers, we take the performance of the STFC as a baseline and define the performance difference between the STFC and another controller as

$\mathrm{PerDiff} = \frac{R(e)_{\mathrm{other}} - R(e)_{\mathrm{STFC}}}{R(e)_{\mathrm{STFC}}},$ (12)

where R(e)_other and R(e)_STFC are the relative deviations of the other controller and the STFC, respectively. If PerDiff is positive, the STFC has better performance than the other controller and vice versa. Fig. 11 presents the performance differences due to the other controllers. Because of space limitations, we only present the results where the RTT was 180 ms.

From Fig. 11b, we observe that, although the nonadaptive fuzzy controller provides slightly better service than the STFC when the number of UEs is 700 and the RTT is 180 ms, the STFC outperforms it by around 25 percent under all other conditions, which can also be observed from Fig. 11a. As mentioned before, the STFC further adjusts the output scaling factor of the fuzzy controller adaptively according to the transient behaviors of the Web server. Such tuning is important to compensate for the effect of the process delay in resource allocation. Thus, the STFC is able to provide services with a smaller relative deviation than the fuzzy controller. The observation also demonstrates that our analysis and design of the self-tuning scaling-factor controller are correct.

The observation that the STFC provides no better service than the nonadaptive fuzzy controller under a certain environment is expected. It is because an adaptive controller cannot provide better performance than a nonadaptive one in the environment where the nonadaptive one has been specifically tuned. Even under such an environment, the performance difference between the STFC and the fuzzy controller is negligible, that is, only −6 percent.

In comparison with the PI controller, the STFC achieves better performance even when the PI controller operates under its specifically tuned environment, as can be observed in Fig. 11b. When the number of UEs is 700 and the RTT is 180 ms, their performance difference is 28 percent. From Fig. 11, we observe that all performance differences of the PI controller are larger than 60 percent and the average is around 75 percent. The poor performance of the PI controller is due to its inaccurate underlying model. In the PI controller, we follow the approach in [22] and model the server as an M/GI/1 processor sharing system. It is known that the exponential interarrival distribution is unable to characterize the Web server [35]. Thus, the model is inaccurate. Similarly, although the adaptive PI controller improves upon the nonadaptive PI controller, it still has worse performance than the STFC and the fuzzy controller. Its average performance difference in relation to the STFC is around 50 percent. Moreover, the PI and adaptive PI controllers provide no means to compensate for the effect of the process delay in resource allocation.

Fig. 9. The results of eQoS under a changing number of (a) UEs and (b) RTTs.

Fig. 10. The breakdown of pageview response time on PlanetLab.

We also compared the transient behaviors of the STFC with those of the other controllers. We found that the STFC caused fewer spikes than all other controllers while it kept the average response time of premium users around the target.

4.6 Analysis of the Process Delay

As mentioned before, there exists process delay in resource allocation. We have conducted experiments to quantify it and analyze its effects on the provisioning of client-perceived end-to-end QoS guarantees. Fig. 12a shows the percentage of requests finished within different numbers of sampling periods after being admitted. In the experiments, the number of UEs was 700 and the RTT was 180 ms. Fig. 12b depicts the corresponding cumulative distribution function of the service time, which is the latency incurred in Steps 5 through 7, as shown in Fig. 1. Because the latency incurred in Steps 2 through 4 can be measured at the end of Step 4, it does not affect the accuracy of the measured resource-allocation effect. Comparing Fig. 12a and Fig. 12b, we observe that, although over 95 percent of the requests finish within 8 seconds of being admitted, only 77.8 percent of them are processed within the same sampling period when it is set to 8 seconds. This indicates that 22.2 percent of the measured response times are affected by resource allocations performed several sampling periods earlier. Consequently, the resource-allocation effect cannot be accurately measured promptly.

To provide QoS guarantees, however, the resourceallocation in Web servers should be based on an accuratelymeasured effect of previous resource allocation on client-perceived QoS. It in turn controls the order in which clientrequests are scheduled in Step 4 in Fig. 1. The process delay


Fig. 11. The performance comparison between the STFC and other controllers in PlanetLab and simulated networks. (a) PlanetLab. (b) RTT = 180 ms.

Fig. 12. The process delay in resource allocation. (a) The percentage of finished requests within a different number of sampling periods with different sampling period sizes. (b) The cumulative distribution function of requests’ service time.

The process delay has been recognized as one of the most difficult dynamic elements naturally occurring in physical systems to deal

with [40]. It sets a fundamental limit on how well a

controller can fulfill design specifications because it limits

how fast a controller can react to disturbances. In the STFC,

we address the issue of the process delay by adaptively

adjusting the output scaling factor using the scaling-factor

controller.
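As an illustration only, the scaling-factor controller's role can be sketched as the crisp heuristic below; the STFC's actual adjustment is made by a two-level fuzzy rule base, and every constant here is a hypothetical assumption. The intuition: when the error is already decaying, the delayed effect of earlier allocations is still arriving, so the output gain is damped to avoid over-correction; when the error grows, the gain is raised.

/*
 * Illustrative stand-in for the scaling-factor controller (NOT the
 * paper's fuzzy rule base). error = target - measured response time;
 * trend is its change since the last sampling period.
 */
double tune_output_gain(double gain, double error, double prev_error)
{
    double trend = error - prev_error;

    if (error * trend < 0.0) {
        /* |error| shrinking: earlier allocations are taking effect,
         * so damp the output to avoid overshoot */
        gain *= 0.8;
    } else if (error * trend > 0.0) {
        /* |error| growing: react more aggressively */
        gain *= 1.1;
    }
    /* keep the gain within assumed bounds */
    if (gain < 0.1) gain = 0.1;
    if (gain > 2.0) gain = 2.0;
    return gain;
}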

4.7 Effect of the eQoS on Server Performance

In this section, we investigate the effect of the eQoS on the

performance of Web servers. We conducted experiments with the eQoS (both STFC-on and STFC-off) and without the eQoS, with different numbers of

UEs for 10 minutes on PlanetLab. To obtain an accurate

average response time, the reported results are the average

without the beginning and the ending periods, that is, the

response time is averaged from the second to the ninth

minute. Fig. 13 presents the average response time

measured from the client side using the instrumented SURGE.

From Fig. 13, we observe that the performance difference

between the STFC-off and the STFC-on is negligible. It

indicates that the overhead of the STFC itself is small

because the controller only needs to adjust resource

allocation once per sampling period. It also indicates that

the aggregate server performance is not affected by the

work-conserving processing-rate allocation.

The overhead comes from our application-level imple-

mentation. In the eQoS, the resource manager first needs to

copy the established connections from the kernel space to

the application space. The classifier then needs to determine

the class type for every request and store it in the

corresponding accept queue. Meanwhile, instead of using

system call accept(), the Apache Web server accepts an

established connection from the eQoS using a signal

through the Unix domain socket. When the workload is

moderate, such overhead is small in relation to the page-

view response time. When the server becomes busy, the cost

of extra data copy and process scheduling, the delay

incurred during the signal delivery, and the latency of the

Unix domain socket become significant [42]. Part of our

future work is to implement the eQoS in Linux kernel to

make it more efficient.
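The descriptor hand-off itself is the standard SCM_RIGHTS mechanism over a Unix domain socket described in [42]. Below is a minimal sketch of the sender side, with hypothetical function and variable names; the prototype's actual interface may differ.

/*
 * Sketch of passing an accepted connection's descriptor over a Unix
 * domain socket with SCM_RIGHTS, as in Stevens [42]. Names are ours,
 * not the eQoS prototype's.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send connection descriptor connfd to the Web server process over
 * the Unix domain socket uds. Returns 0 on success, -1 on failure. */
int send_connfd(int uds, int connfd)
{
    char dummy = 'c';                     /* one byte of real payload */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {                               /* aligned ancillary buffer */
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
    cm->cmsg_level = SOL_SOCKET;
    cm->cmsg_type = SCM_RIGHTS;           /* pass a file descriptor */
    cm->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cm), &connfd, sizeof(int));

    return sendmsg(uds, &msg, 0) == 1 ? 0 : -1;
}

The receiving server extracts the descriptor from the matching ancillary data with recvmsg(); each such round trip adds a data copy and a context switch, which is the per-request cost the paragraph above attributes to the application-level design.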

From Fig. 13, we also observe that the average response time is around 4 seconds when there are 100 UEs in our

server settings. In this case, although the server is lightly

loaded, the network delays are large. This agrees with the

findings in [13] that the response time of a static Web page

with two embedded objects was 3.9 seconds in a lightly loaded server.

Fig. 13. The effect of the eQoS on the performance of Web servers.

5 RELATED WORK

Provisioning of QoS guarantees has been an active research topic. In practice, the client-perceived service quality is mainly determined by both networks and Web servers. Moreover, the service quality is normally measured with respect to whole Web pages that contain embedded objects. The existing work, however, measured service quality with respect to a single packet in networks [9], [11], [16], [20] or an individual request [1], [7], [10], [22], [37] or connection [28], [38], [45] in Web servers. For example, in [16], the authors aimed to provide different QoS levels between multiple aggregated traffic classes within a network router. In [1], the authors aimed to bound the server-side latency of individual client requests. In comparison, the proposed framework eQoS aims to guarantee the pageview QoS from the perspective of end clients.

Early work focused on providing differentiated services to different client classes using priority-based scheduling [2], [17]. It aimed to provide better services to the premium class than to the basic class by adjusting the number or the priority of the allocated processes between the classes on either the user level or the kernel level. These approaches, however, were unable to guarantee the QoS a class received.

To provide QoS guarantees, queuing-theoretic approaches were proposed. It is well known that the delay upper bound in a G/G/1 system is determined by the system load and the variances of the requests' interarrival-time and service-time distributions. In the approach in [45], the load of a class was adjusted by controlling its resource allocation so that the target delay equals the upper bound. Its performance highly depends on the parameter estimation, such as the variance, which is difficult to measure accurately in the presence of unpredictable workload variations.
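The bound referred to here is presumably Kingman's classical upper bound on the mean queueing delay in a G/G/1 system: with arrival rate $\lambda$, load $\rho$, and interarrival- and service-time variances $\sigma_a^2$ and $\sigma_s^2$,

$$E[W_q] \;\le\; \frac{\lambda\,(\sigma_a^2 + \sigma_s^2)}{2\,(1-\rho)},$$

which grows with both the load and the two variances, hence the sensitivity to variance estimates noted above.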

Traditional linear feedback control was also applied to control the resource allocation in Web servers [1], [27], [37]. Because the behavior of a Web server changes continuously, the performance of linear feedback control is limited. In comparison, our approach is model independent: it takes advantage of fuzzy control theory to manage the server-resource allocation based on an analysis of the effect of resource allocation on the achieved response time.

Recent work applied adaptive control [22], [23] and machine learning [44] to address the lack of an accurate server model. In [22], [23], the parameters of the controllers were dynamically adjusted in adaptation to workload variations. In [44], the resource allocation was controlled based on observations of the effect of past allocations on delivered service quality. Although these approaches provide better performance than nonadaptive linear feedback control under workload disturbances, to guarantee client-perceived end-to-end QoS, the inherent process delay also needs to be considered.

Fuzzy control theory was also applied in providing QoS guarantees. In [25], the authors presented a fuzzy control model to address the nonlinear QoS requirements of different multimedia applications under different resource constraints. Similarly, in [26], because of the nonlinearity of Web servers, fuzzy control theory was used to determine an optimal number of concurrent child processes to improve the aggregate server performance. In [34], within the differentiated services model, the authors proposed a set of control rules to dynamically adjust the target delay ratios between different traffic flows to reduce the effect of bursty traffic in the basic class on the delay of the premium class. The objective of the STFC is different in that its focus is on providing client-perceived end-to-end QoS guarantees with consideration of the inherent process delay.

6 CONCLUSIONS

In this paper, we have proposed a novel framework, eQoS, to provide client-perceived end-to-end pageview response-time guarantees in heavily loaded Web servers. Within the framework, we have proposed a two-level self-tuning fuzzy controller with unknown model parameters to explicitly address the process delay in resource allocation. On the first level is the resource controller, which takes advantage of fuzzy control theory to address the lack of accurate server models. The second-level scaling-factor controller compensates for the effect of the process delay by adjusting the resource controller's output scaling factor according to transient server behaviors. We have also proved the stability of the STFC.

We have implemented the framework in Linux and carried out comprehensive experiments under different workload conditions on the PlanetLab testbed and simulated networks. The experimental results have shown that it is feasible to provide client-perceived QoS guarantees in heavily loaded Web servers. They also demonstrated that the eQoS is effective in providing such guarantees. The nonadaptive fuzzy controller provides slightly better services than the STFC in its best-tuned conditions. In all other conditions, on average, the STFC outperforms the nonadaptive fuzzy controller by 25 percent in terms of deviations because of its self-tuning capabilities in continuously changing Web servers. Furthermore, because of its model independence, the STFC outperforms the linear PI controller and the adaptive PI controller by around 75 percent and 50 percent on average, respectively.

In this work, we focus on single-tiered Web servers. Because of the popularity of multitiered e-commerce Web sites, our future work will investigate how to incorporate the eQoS into a multitiered environment and its performance on dynamic content.

ACKNOWLEDGMENTS

The authors would like to thank the anonymous reviewers for their numerous helpful comments. This work was supported in part by US National Science Foundation grants ACI-0203592, CCF-0611750, and MCS-0624849, and NASA 03-OBPR-01-0049.

REFERENCES

[1] T.F. Abdelzaher, K.G. Shin, and N. Bhatti, “Performance Guarantees for Web Server End-Systems: A Control-Theoretical Approach,” IEEE Trans. Parallel and Distributed Systems, vol. 13, no. 1, pp. 80-96, Jan. 2002.
[2] J. Almeida, M. Dabu, A. Manikutty, and P. Cao, “Providing Differentiated Levels of Service in Web Content Hosting,” Proc. ACM SIGMETRICS Workshop Internet Server Performance, pp. 91-102, 1998.
[3] M. Arlitt and T. Jin, “A Workload Characterization Study of the 1998 World Cup Web Site,” IEEE Network, vol. 14, no. 3, pp. 30-37, May-June 2000.
[4] P. Barford and M. Crovella, “Generating Representative Web Workloads for Network and Server Performance Evaluation,” Proc. ACM Sigmetrics, pp. 151-160, 1998.
[5] P. Barford and M. Crovella, “Critical Path Analysis of TCP Transactions,” IEEE/ACM Trans. Networking, vol. 9, no. 3, pp. 238-248, 2001.
[6] N. Bhatti, A. Bouch, and A. Kuchinsky, “Integrating User-Perceived Quality into Web Server Design,” Proc. Int’l World Wide Web Conf., pp. 1-16, 2000.
[7] N. Bhatti and R. Friedrich, “Web Server Support for Tiered Services,” IEEE Network, vol. 13, no. 5, pp. 64-71, 1999.
[8] P. Bhoj, S. Ramanathan, and S. Singhal, “Web2K: Bringing QoS to Web Servers,” Technical Report HPL-2000-61, HP Laboratories, May 2000.
[9] S. Blake, D. Black, M. Carlson, E. Davies, Z. Wang, and W. Weiss, “An Architecture for Differentiated Services,” IETF RFC 2475, Dec. 1998.
[10] J.M. Blanquer, A. Batchelli, K. Schauser, and R. Wolski, “Quorum: Flexible Quality of Service for Internet Services,” Proc. Symp. Networked Systems Design and Implementation, 2005.
[11] R. Braden, D. Clark, and S. Shenker, “Integrated Services in the Internet Architecture: An Overview,” RFC 1633, June 1994.
[12] X. Chen and P. Mohapatra, “Performance Evaluation of Service Differentiating Internet Servers,” IEEE Trans. Computers, vol. 51, no. 11, pp. 1368-1375, Nov. 2002.
[13] L. Cherkasova, Y. Fu, W. Tang, and A. Vahdat, “Measuring and Characterizing End-to-End Internet Service Performance,” ACM Trans. Internet Technology, vol. 3, no. 4, pp. 347-391, 2003.
[14] B.-Y. Choi, S. Moon, Z.-L. Zhang, K. Papagiannaki, and C. Diot, “Analysis of Point-to-Point Packet Delay in an Operational Network,” Proc. IEEE Infocom, 2004.
[15] Y. Diao, J.L. Hellerstein, and S. Parekh, “Using Fuzzy Control to Maximize Profits in Service Level Management,” IBM Systems J., vol. 41, no. 3, pp. 403-420, 2002.
[16] C. Dovrolis, D. Stiliadis, and P. Ramanathan, “Proportional Differentiated Services: Delay Differentiation and Packet Scheduling,” IEEE/ACM Trans. Networking, vol. 10, no. 1, pp. 12-26, 2002.
[17] L. Eggert and J. Heidemann, “Application-Level Differentiated Services for Web Servers,” World Wide Web J., vol. 2, no. 3, pp. 133-142, 1999.
[18] G.F. Franklin, J.D. Powell, and A. Emami-Naeini, Feedback Control of Dynamic Systems, fourth ed. Prentice Hall, 2002.
[19] M.E. Gendy, A. Bose, S.-T. Park, and K.G. Shin, “Paving the First Mile for QoS-Dependent Applications and Appliances,” Proc. Int’l Workshop Quality of Service, pp. 245-254, 2004.
[20] D. Grossman, “New Terminology and Clarifications for Diffserv,” RFC 3260, Apr. 2002.
[21] F. Hernandez-Campos, K. Jeffay, and F.D. Smith, “Tracking the Evolution of Web Traffic: 1995-2003,” Proc. Int’l Symp. Modeling, Analysis, and Simulation of Computer and Telecomm. Systems, pp. 16-25, 2003.
[22] A. Kamra, V. Misra, and E. Nahum, “Yaksha: A Self-Tuning Controller for Managing the Performance of 3-Tiered Websites,” Proc. Int’l Workshop Quality of Service, pp. 47-56, 2004.
[23] M. Karlsson, C. Karamanolis, and X. Zhu, “Triage: Performance Isolation and Differentiation for Storage Systems,” Proc. Int’l Workshop Quality of Service, pp. 67-76, 2004.
[24] J. Kaur and H. Vin, “Providing Deterministic End-to-End Fairness Guarantees in Core-Stateless Networks,” Proc. Int’l Workshop Quality of Service, pp. 401-421, 2003.
[25] B. Li and K. Nahrstedt, “A Control-Based Middleware Framework for Quality of Service Adaptations,” IEEE J. Selected Areas in Comm., vol. 17, no. 9, pp. 1632-1650, Sept. 1999.
[26] X. Liu, L. Sha, Y. Diao, J.L. Hellerstein, and S. Parekh, “Online Response Time Optimization of Apache Web Server,” Proc. Int’l Workshop Quality of Service, pp. 461-478, 2003.
[27] C. Lu, T.F. Abdelzaher, J.A. Stankovic, and S.H. Son, “A Feedback Control Approach for Guaranteeing Relative Delays in Web Servers,” Proc. IEEE Real-Time and Embedded Technology and Applications Symp., 2001.


[28] Y. Lu, T.F. Abdelzaher, C. Lu, L. Sha, and X. Liu, “Feedback Control with Queueing-Theoretic Prediction for Relative Delay Guarantees in Web Servers,” Proc. IEEE Real-Time and Embedded Technology and Applications Symp., pp. 208-217, May 2003.
[29] R.K. Mudi and N.R. Pal, “A Robust Self-Tuning Scheme for PI- and PD-Type Fuzzy Controllers,” IEEE Trans. Fuzzy Systems, vol. 7, no. 1, pp. 2-16, Feb. 1999.
[30] K. Nichols, V. Jacobson, and L. Zhang, “A Two-Bit Differentiated Services Architecture for the Internet,” RFC 2638, July 1999.
[31] D.P. Olshefski, J. Nieh, and E. Nahum, “ksniffer: Determining the Remote Client Perceived Response Time from Live Packet Streams,” Proc. USENIX Operating Systems Design and Implementation, 2004.
[32] A.K. Parekh and R.G. Gallager, “A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: The Single-Node Case,” IEEE/ACM Trans. Networking, vol. 1, no. 3, pp. 344-357, 1993.
[33] K. Park, V.S. Pai, L. Peterson, and Z. Wang, “CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups,” Proc. USENIX Operating Systems Design and Implementation, pp. 199-214, 2004.
[34] S. Patchararungruang, S.K. Halgamuge, and N. Shenoy, “Optimized Rule-Based Delay Proportion Adjustment for Proportional Differentiated Services,” IEEE J. Selected Areas in Comm., vol. 23, no. 2, pp. 261-276, Feb. 2005.
[35] V. Paxson and S. Floyd, “Wide Area Traffic: The Failure of Poisson Modeling,” IEEE/ACM Trans. Networking, vol. 3, no. 3, pp. 226-244, June 1995.
[36] L. Peterson, T. Anderson, D. Culler, and T. Roscoe, “A Blueprint for Introducing Disruptive Technology into the Internet,” Proc. ACM Workshop Hot Topics in Networking, 2002.
[37] P. Pradhan, R. Tewari, S. Sahu, A. Chandra, and P. Shenoy, “An Observation-Based Approach towards Self-Managing Web Servers,” Proc. Int’l Workshop Quality of Service, 2002.
[38] L. Sha, X. Liu, Y. Lu, and T.F. Abdelzaher, “Queueing Model Based Network Server Performance Control,” Proc. IEEE Real-Time Systems Symp., pp. 81-90, 2002.
[39] S. Shakkottai, R. Srikant, N. Brownlee, A. Broido, and K. Claffy, “The RTT Distribution of TCP Flows in the Internet and Its Impact on TCP-Based Flow Control,” technical report, Cooperative Assoc. for Internet Data Analysis (CAIDA), 2004.
[40] F.G. Shinskey, Process Control Systems: Application, Design, and Tuning, fourth ed. McGraw-Hill, 1996.
[41] J. Slottow, A. Shahriari, M. Stein, X. Chen, C. Thomas, and P.B. Ender, “Instrumenting and Tuning Dataview—A Networked Application for Navigating through Large Scientific Datasets,” Software Practice and Experience, vol. 32, no. 2, pp. 165-190, Nov. 2002.
[42] W.R. Stevens, Advanced Programming in the UNIX Environment. Addison-Wesley, 1993.
[43] W. Sun and K.G. Shin, “Coordinated Aggregate Scheduling for Improving End-to-End Delay Performance,” Proc. Int’l Workshop Quality of Service, 2004.
[44] V. Sundaram and P. Shenoy, “A Practical Learning-Based Approach for Dynamic Storage Bandwidth Allocation,” Proc. Int’l Workshop Quality of Service, 2003.
[45] B. Urgaonkar and P. Shenoy, “Cataclysm: Handling Extreme Overloads in Internet Applications,” Proc. Int’l World Wide Web Conf., May 2005.
[46] T. Voigt, R. Tewari, D. Freimuth, and A. Mehra, “Kernel Mechanisms for Service Differentiation in Overloaded Web Servers,” Proc. USENIX Ann. Technical Conf., June 2001.
[47] J. Wei and C. Xu, “sMonitor: A Non-Intrusive Client-Perceived End-to-End Performance Monitor of Secured Internet Services,” Proc. USENIX Ann. Technical Conf., 2006.
[48] J. Wei, X. Zhou, and C.-Z. Xu, “Robust Processing Rate Allocation for Proportional Slowdown Differentiation on Internet Servers,” IEEE Trans. Computers, vol. 54, no. 8, pp. 964-977, Aug. 2005.
[49] M. Welsh and D. Culler, “Adaptive Overload Control for Busy Internet Servers,” Proc. USENIX Symp. Internet Technologies and Systems, 2003.
[50] R. Zhang, Real-Time Web Log Replayer, http://www.cs.virginia.edu/~rz5b/software/software.htm, 2006.

Jianbin Wei received the BS degree in computer science from Huazhong University of Science and Technology, China, in 1997 and the MS and PhD degrees in computer engineering from Wayne State University in 2003 and 2006, respectively. His research interests are in distributed and Internet computing systems. He is a student member of the IEEE.

Cheng-Zhong Xu received the BS and MS degrees in computer science from Nanjing University in 1986 and 1989, respectively, and the PhD degree in computer science from the University of Hong Kong in 1993. He is an associate professor in the Department of Electrical and Computer Engineering of Wayne State University. His research interests are in distributed and parallel systems, particularly in resource management for high-performance cluster and grid computing and scalable and secure Internet services. He has published more than 100 peer-reviewed articles in journals and conference proceedings in these areas. He is the author of the book Scalable and Secure Internet Services and Architecture (CRC Press, 2005) and the lead coauthor of the book Load Balancing in Parallel Computers: Theory and Practice (Kluwer Academic, 1997). He serves on the editorial boards of three international journals, including the Journal of Parallel and Distributed Computing, and as a guest editor for several other journals and transactions. He has also served as the program chair or a member of program committees of numerous conferences. His research has been supported in part by the US National Science Foundation, NASA, and Cray Research. He is a recipient of the Faculty Research Award of Wayne State University in 2000, the President’s Award for Excellence in Teaching in 2002, and the Career Development Chair Award in 2003. He is a senior member of the IEEE.



