Flexible and Adaptable Buffer Management Techniques for Database Management Systems

Christos Faloutsos
Department of Computer Science and Institute for Systems Research (ISR)
University of Maryland, College Park, MD 20742

Raymond Ng
Department of Computer Science
University of British Columbia, Vancouver, B.C., Canada V6T 1Z2

Timos Sellis
Department of Electrical and Computer Engineering, Computer Science Division
National Technical University of Athens, Zographou 157 73, Athens, Greece

Abstract

The problem of buffer management in database management systems is concerned with the efficient main memory allocation and management for answering database queries. Previous works on buffer allocation are based either exclusively on the availability of buffers at runtime or on the access patterns of queries. In this paper, we first propose a unified approach for buffer allocation in which both of these considerations are taken into account. Our approach is based on the notion of marginal gains, which specify the expected reduction in page faults obtained by allocating extra buffers to a query. Then, we extend this approach to support adaptable buffer allocation. An adaptable buffer allocation algorithm automatically optimizes itself for the specific query workload. To achieve this adaptability, we propose using run-time information, such as the load of the system, in buffer allocation decisions. Our approach is to use a simple queueing model to predict whether a buffer allocation will improve the performance of the system. Thus, this paper provides a more theoretical basis for buffer allocation. Simulation results show that our methods based on marginal gains and our predictive methods consistently outperform existing allocation strategies.
In addition, the predictive methods have the added advantage of adjusting their allocation to changing workloads.

Keywords: buffer management, performance analysis, relational databases

This research was partially sponsored by the National Science Foundation under Grants DCR-86-16833, IRI-8719458, IRI-8958546 and IRI-9057573, by DEC, IBM and Bellcore, by NASA under Grant NAS5-31351, by NSERC under Grants STR0134419 and OGPO138055, and by the University of Maryland Institute for Advanced Computer Studies (UMIACS). Timos Sellis's work was performed while the author was with the Dept. of Computer Science, Univ. of Maryland.

1 Introduction

In relational database management systems, the buffer manager is responsible for all the operations on buffers, including load control. That is, when buffers become available, the manager needs to decide whether to activate a query from the waiting queue and how many buffers to allocate to that query. Figure 1 outlines the major components involved in this issue of buffer allocation. The buffer pool area is a common resource, and all queries (queries currently running and queries in the waiting queue) compete for the buffers. As in any competitive environment, the principle of supply and demand, as well as protection against starvation and unfairness, must be employed. Hence, in principle, the number of buffers assigned to a query should be determined based on the following factors:

1. the demand factor: the space requirement of the query as determined by the access pattern of the query (shown as path (1) in Figure 1);

2. the buffer availability factor: the number of available buffers at runtime (shown as path (2) in Figure 1); and

3. the dynamic load factor: the characteristics of the queries currently in the system (shown as path (3) in Figure 1).

Based on these factors, previous proposals on buffer allocation can be classified into the following groups, as summarized in Table 1.

Allocation algorithms in the first group consider only the buffer availability factor. They include variations of First-In-First-Out (FIFO), Random, Least-Recently-Used (LRU), Clock, and Working-Set [6, 10, 15]. However, as they focus on adapting memory management techniques used in operating systems to database systems, they fail to take advantage of the specific access patterns exhibited by relational database queries, and their performance is not satisfactory [3].

Allocation strategies in the second group consider exclusively the demand factor, or more specifically the access patterns of queries.
They include the proposal by Kaplan [8] on the implementation of INGRES [16], the Hot-Set model designed by Sacco and Schkolnick [13, 14], and the strategy used by Cornell and Yu [5] in the integration of buffer management with query optimization. This approach to buffer allocation culminates in the work of Chou and DeWitt [3]. They introduce the

[Figure 1: Buffer Manager and Related Components. Queries wait in a queue at the buffer manager, which allocates buffers from the shared buffer pool and passes active queries on to the CPU and disk; paths (1), (2) and (3) correspond to the demand, buffer availability and dynamic load factors above.]

                                          access patterns of   availability of      dynamic
                                          queries (demand)     buffers at runtime   workload
    FIFO, Random, LRU, etc.                      -                    yes              -
    Hot-Set, DBMIN                              yes                    -               -
    Flexible algorithms proposed here           yes                   yes              -
    Adaptable algorithms proposed here          yes                   yes             yes

Table 1: Classification of Buffer Allocation Algorithms

notion of a locality set of a query, i.e., the number of buffers needed by a query without causing many page faults. They propose the DBMIN algorithm, which makes the allocation equal to the size of the locality set. DBMIN also allows different local replacement policies. Simulation results in [2, 3] show that DBMIN outperforms the Hot-Set strategy and the algorithms referred to in the first group.

While the strength of DBMIN and the other algorithms in the second group lies in their consideration of the access patterns of queries, their weakness arises from their obliviousness to runtime conditions, such as the availability of buffers. This imposes heavy penalties on the performance of the whole system. This deficiency leads us to study and propose a unified approach to buffer allocation which simultaneously takes into account the access patterns of queries and the availability of buffers at runtime. The objective is to provide the best possible use of buffers so as to maximize the number of page hits. The basis of this approach is the notion of marginal gains, which specify the expected number of additional page hits that would be obtained by allocating extra buffers to a query. As we shall see later, simulation results show that allocation algorithms based on marginal gains give better performance than DBMIN.

However, one characteristic common to all the above algorithms is that they are static in nature, and cannot adapt to changes in system loads and the mix of queries using the system. To rectify the situation, in the second half of this paper, we propose a new family of buffer management techniques that are adaptable to the workload of the system.
The basic idea of our approach is to use predictors to predict the effect a buffer allocation decision will have on the performance of the system. These predictions are based not only on the availability of buffers at runtime and the characteristics of the particular query, but also on the dynamic workload of the system. Two predictors are considered in this paper: throughput and effective disk utilization. Simulation results show that buffer allocation algorithms based on these two predictors perform better than existing ones.

In Section 2 we present mathematical models and derive formulas for computing the expected number of page faults for different types of database references. Then we introduce in Section 3 the notion of marginal gains, and present flexible buffer allocation algorithms based on marginal gains. In Section 4 we introduce the predictors and present the policies for adaptable allocation algorithms. Finally, we present in Section 5 simulation results that compare the performance of

our algorithms with DBMIN.

2 Mathematical Models for Relational Database References

In this section we first review the taxonomy proposed by Chou and DeWitt [2, 3] for classifying reference patterns exhibited by relational database queries. We analyze in detail the major types of references, and present mathematical models and formulas for calculating the expected number of page faults using a given number of buffers. These models provide the formulas for computing marginal gains and predictive estimates in Sections 3 and 4.

2.1 Types of Reference Patterns

In [2, 3] Chou and DeWitt show how page references of relational database queries can be decomposed into sequences of simple and regular access patterns. Here we focus on three major types of references: random, sequential and looping. A random reference consists of a sequence of random page accesses. A selection using a non-clustered index is one example. The following definitions formalize this type of reference.

Definition 1: A reference Ref of length k to a relation is a sequence <P1, P2, ..., Pk> of pages of the relation to be read in the given order.

Definition 2: A random reference Rk,N of length k to a relation of size N is a reference <P1, ..., Pk> such that for all 1 ≤ i, j ≤ k, Pi is uniformly distributed over the set of all pages of the accessed relation, and Pi is independent of Pj for i ≠ j.

In a sequential reference, such as in a selection using a clustered index, pages are referenced and processed one after another without repetition.

Definition 3: A sequential reference Sk,N of length k to a relation of size N is a reference <P1, ..., Pk> such that k ≤ N and Pi ≠ Pj for all 1 ≤ i < j ≤ k.

When a sequential reference is performed repeatedly, such as in a nested loop join, the reference is called a looping reference.

Definition 4: A looping reference Lk,t of length k is a reference <P1, ..., Pk> such that for some t < k, (i) Pi ≠ Pj for all 1 ≤ i < j ≤ t, and (ii) P(i+t) = Pi for 1 ≤ i ≤ k − t.
The subsequence <P1, ..., Pt> is called the loop, and t is called the length of the loop.

In the following, for these three types of references, we give formulas for computing the expected number of page faults using a given number of buffers s. Table 2 summarizes the symbols used in this section.

Definition 5: Let Ef(Ref, s) denote the expected number of page faults caused by a reference Ref using s buffers, where Ref can be Lk,t, Rk,N or Sk,N.
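For simulation or sanity-checking purposes, reference traces matching Definitions 2 to 4 can be generated directly. A minimal sketch in Python (the function names are ours, not from the paper; pages are numbered 0 to N−1):

```python
import random

def random_reference(k, N):
    """R_{k,N}: k pages drawn uniformly and independently from N pages."""
    return [random.randrange(N) for _ in range(k)]

def sequential_reference(k, N):
    """S_{k,N}: k distinct pages read one after another (requires k <= N)."""
    assert k <= N
    return list(range(k))

def looping_reference(k, t):
    """L_{k,t}: a loop of t distinct pages, repeated until k accesses are made."""
    loop = list(range(t))
    return [loop[i % t] for i in range(k)]
```

For instance, `looping_reference(25, 5)` satisfies condition (ii) of Definition 4: every access t positions apart touches the same page.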

    Symbol        Definition
    k             length of a reference
    s             number of buffers
    f             number of page faults
    N             number of pages in the accessed relation
    t             length of loop in a looping reference
    Lk,t          a looping reference of length k and loop length t
    Rk,N          a random reference of length k and relation size N
    Sk,N          a sequential reference of length k and relation size N
    Ef(Ref, s)    expected number of faults for reference Ref with s buffers

Table 2: Summary of Symbols and Definitions

2.2 Random References

Throughout this section, we use P(f, k, s, N) to denote the probability that there are f faults in k accesses to a relation of size N using s buffers, where s ≥ 1 and 0 ≤ f ≤ k. Thus for a random reference, the expected number of page faults is given by:

    Ef(Rk,N, s) = Σ_{f=1}^{k} f · P(f, k, s, N)    (1)

To model a random reference, we set up a Markov chain in the following way. A state in the Markov chain is of the form [f, k], indicating that there are f faults in k accesses, for f ≤ k. In setting up the transitions from state to state, there are two cases to deal with. In the first case, the number f of faults does not exceed the number s of allocated buffers. Thus, there must be f distinct pages kept in the buffers after f faults. Now consider a state [f, k] in the chain. There are two possibilities to have f faults in k accesses. If the last access does not cause a page fault, which happens with probability f/N, then there must be f faults in (k − 1) accesses. In other words, there is an arc from state [f, k − 1] to state [f, k] with a transition probability of f/N. The other arc to state [f, k] is from state [f − 1, k − 1], with a transition probability of (N − f + 1)/N. This corresponds to the case when there are (f − 1) faults in (k − 1) accesses, and the last page accessed is not one of the (f − 1) pages being kept in the buffers.
Hence, the case f ≤ s is summarized by the following recurrence equation:

    P(f, k, s, N) = f/N · P(f, k − 1, s, N) + (N − f + 1)/N · P(f − 1, k − 1, s, N),   f ≤ s    (2)

In the second case, the number f of faults exceeds the number s of allocated buffers. Local replacement must have taken place, and there are always s pages kept in the buffers. Note however that since the reference is random, the choice of local replacement policy is irrelevant. The analysis for the case f > s is almost identical to the case f ≤ s, except that the transition probabilities change to the following: s/N for accessing a page already in the buffers, and (N − s)/N otherwise. Hence, the situation for f > s is summarized by the following recurrence equation:

    P(f, k, s, N) = s/N · P(f, k − 1, s, N) + (N − s)/N · P(f − 1, k − 1, s, N),   f > s    (3)

In addition to the recurrence Equations 2 and 3, the base case is P(0, 0, s, N) = 1 for all s ≥ 1. Then the expected number of page faults Ef(Rk,N, s) can be computed according to Equation 1.

Except for the case s = 1, we do not have a simple closed-form formula for Ef(Rk,N, s). Fortunately, the formula below gives very close approximations to the actual values:

    Ef(Rk,N, s) ≈  N · [1 − (1 − 1/N)^k],        if k < k0
                   s + (k − k0) · (1 − s/N),     otherwise        (4)

    where k0 = ln(1 − s/N) / ln(1 − 1/N).

Intuitively, k0 is the expected number of page accesses that fill all s buffers. Thus, the first case of the formula corresponds to the situation where none of the buffers that have been filled needs to be replaced. This case uses Cardenas' formula [1], which calculates the expected number of distinct pages accessed after k random pages have been selected out of N possible ones with replacement. More accurate results may be obtained with Yao's formula [18], which assumes no replacement. All these formulas make the uniformity assumption; its effects are discussed in [4]. The second case corresponds to the situation where local replacement has occurred. Then s faults have been generated to fill the s buffers (which takes k0 page accesses on the average); for each of the remaining (k − k0) requests, the chance of finding the page in the buffer pool is s/N.

2.3 Sequential References

Recall from Definition 3 that each page in a sequential reference Sk,N is accessed only once. Thus, the probability of a page being re-referenced is 0. Hence, a sequential reference can be viewed as a degenerate random reference, and the following formula is obvious:

    Ef(Sk,N, s) = k    (5)

2.4 Looping References

Recall from condition (i) of Definition 4 that within a loop, a looping reference Lk,t is strictly sequential. Thus, based on Equation 5, t page faults are generated in the first iteration of the loop. Then there are two cases.
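Returning to random references for a moment: Equations 1 to 3 translate directly into a small dynamic program over the Markov states, and Equation 4 gives the closed-form approximation. A sketch (our code, not the authors'; we add a guard for s ≥ N, where replacement never occurs):

```python
import math

def ef_random(k, N, s):
    """Exact Ef(R_{k,N}, s): dynamic program over the Markov states,
    using Equations 2 and 3 for the transitions and Equation 1 for the sum."""
    P = [0.0] * (k + 1)          # P[f] = P(f, j, s, N) for the current j
    P[0] = 1.0                   # base case: P(0, 0, s, N) = 1
    for j in range(1, k + 1):
        nxt = [0.0] * (k + 1)
        for f in range(1, j + 1):
            if f <= s:           # Equation 2: no replacement has occurred yet
                nxt[f] = (f / N) * P[f] + ((N - f + 1) / N) * P[f - 1]
            else:                # Equation 3: s pages stay resident
                nxt[f] = (s / N) * P[f] + ((N - s) / N) * P[f - 1]
        P = nxt
    return sum(f * p for f, p in enumerate(P))      # Equation 1

def ef_random_approx(k, N, s):
    """Closed-form approximation of Equation 4."""
    if s >= N:                   # whole relation fits: Cardenas' formula applies
        return N * (1 - (1 - 1 / N) ** k)
    k0 = math.log(1 - s / N) / math.log(1 - 1 / N)  # accesses to fill s buffers
    if k < k0:
        return N * (1 - (1 - 1 / N) ** k)
    return s + (k - k0) * (1 - s / N)
```

As a quick check, for s = 1 the chain reduces to 1 + (k − 1)(1 − 1/N) expected faults, which the dynamic program reproduces exactly.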
First, if the number s of allocated buffers is not less than the length t of the loop, all pages in the loop are retained in the buffers, and no more page faults are generated in the remainder of the reference. The choice of a local replacement policy is irrelevant in this case. In the second case, if the number s of allocated buffers is less than the length t of the loop, the local replacement policy plays a major role in determining the number of page faults generated by a looping reference. Among all local replacement policies, it is not difficult to see that for a looping reference Lk,t, MRU replacement generates the fewest faults. The key observation is that for a looping reference, MRU is identical to the policy which looks ahead and keeps the pages that

will be used in the most immediate future (cf. the table in the example below). Then a well-known result by Mattson et al. [11] for optimal page replacement in operating systems can be applied to show the optimality of MRU. Thus, in this paper we only present the analysis for MRU, which is best explained by an example.

Example 1: Consider a looping reference with the loop <a, b, c, d, e>. Suppose s = 3 buffers are available for the reference. The following table summarizes the situation under MRU.

    access:   1  2  3  4  5 | 6  7  8  9 |10 11 12 13 |14 15 16 17 |18 19 20 21 |22 23 24 25
    page:     a  b  c  d  e | a  b  c  d | e  a  b  c | d  e  a  b | c  d  e  a | b  c  d  e
    hit:                    | *  *       | *  *       | *  *       | *  *       | *  *
    buffers:  a  b  c  d  e | a  b  c  d | e  a  b  c | d  e  a  b | c  d  e  a | b  c  d  e
                 a  b  b  b | e  a  a  a | d  e  e  e | c  d  d  d | b  c  c  c | a  b  b  b
                    a  a  a | b  e  e  e | a  d  d  d | e  c  c  c | d  b  b  b | c  a  a  a

The first row of the table indicates the number of the page access. The second row shows the order in which the pages are accessed for five iterations of the loop. If a page hit occurs, the access is marked with an asterisk. The last three rows of the table indicate the pages kept in the buffers after that page access, with the most recently used page in the top row.

This example demonstrates a few important properties of MRU. First note that there are five "mini-cycles" of length four, which may not align with the iterations of the loop. They are separated by vertical lines in the table above. These mini-cycles also follow a cyclic pattern; namely, the twenty-sixth access of the table will be exactly the same as the sixth access, and so on. Furthermore, within each mini-cycle, there are two "resident" pages: those that are not swapped out in that mini-cycle. For instance, for the first mini-cycle, the resident pages are a and e. Note that these resident pages are the pages that begin the next mini-cycle, avoiding page faults for those accesses; this property is exactly the reason why MRU is optimal.

In general, given a loop of length t, the mini-cycles are of length (t − 1).
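The fault counts of Example 1 are easy to verify with a short MRU simulation. A sketch (our code, not the authors'):

```python
def mru_faults(loop, iterations, s):
    """Count page faults when a looping reference (the given loop repeated
    `iterations` times) runs under MRU replacement with s buffers."""
    stack = []        # stack[0] is the most recently used page
    faults = 0
    for _ in range(iterations):
        for p in loop:
            if p in stack:
                stack.remove(p)            # page hit: p moves to the top
            else:
                faults += 1                # page fault
                if len(stack) == s:
                    stack.pop(0)           # evict the most recently used page
            stack.insert(0, p)             # p becomes most recently used
    return faults
```

For the loop <a, b, c, d, e> with s = 3 and five iterations, this yields 15 faults, i.e. 10 hits out of 25 accesses, matching the asterisks in the table of Example 1.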
In other words, in (t − 1) iterations of the loop, there are t different mini-cycles. Furthermore, these mini-cycles recur every (t − 1) iterations of the loop. In each mini-cycle, there are (s − 1) resident pages; thus each mini-cycle, being (t − 1) accesses long, incurs (t − 1) − (s − 1) = (t − s) faults. Hence, on the average, there are (t − s) · t / (t − 1) faults in each iteration of the loop. The equation below follows immediately:

    Ef(Lk,t, s) = t + (t − s) · t · (k/t − 1) / (t − 1),   s ≤ t    (6)

3 Marginal Gains and Flexible Allocation Methods: MG-x-y

In this section we first review DBMIN. Then we introduce the notion of marginal gains. Finally, we propose flexible buffer allocation algorithms, MG-x-y, that are designed to maximize total marginal gains and the utilization of buffers.

3.1 Generic Load Control and DBMIN

In order to classify and study various allocation methods, we break the problem of load control into two decisions. That is, during load control, a buffer manager determines whether a waiting

reference can be activated, and decides how many buffers to allocate to this reference. Throughout this paper, we use the term admission policy to refer to the first decision and the term allocation policy to refer to the second one. Once the admission and allocation policies are chosen, a buffer allocation algorithm adopting the First-Come-First-Serve policy can be outlined as follows.

Algorithm 1 (Generic): Whenever buffers are released by a newly completed query, or whenever a query enters an empty queue, perform the following:

1. Use the given admission policy to determine whether the query Q at the head of the waiting queue can be activated.

2. If this is feasible, use the allocation policy to decide the number s of buffers that Q should have. Notice that only Q can write on these buffers, which are returned to the buffer pool after the termination of Q. Then activate Q and go back to Step 1.

3. Otherwise, halt; all queries must wait for more buffers to be released.

Note that for all the allocation algorithms considered in this paper, DBMIN and our proposed methods alike, if a query consists of more than one reference, it is given a number of buffers equal to the sum of the buffers allocated to each relation accessed by the query. The allocation to each relation is determined by the reference pattern as described in the previous section, and each relation uses its own allocated buffers throughout. See [2] for a more detailed discussion. In ongoing work, we study how to allocate buffers on a per-query basis. Before we describe DBMIN using the general framework outlined in Algorithm 1, let us define a few symbols that are used throughout the rest of this paper.
We use A to denote the number of available buffers, and smin and smax to denote respectively the minimum and maximum numbers of buffers that a buffer allocation algorithm is willing to assign to a reference.

For DBMIN, the admission policy is simply to activate a query whenever the specified number of buffers is available, that is, smin ≤ A. As for the allocation policy, it depends on the type of the reference. For a looping reference, the locality set size is the total number of pages in the loop [2, pp. 52]. Since DBMIN requires that the entire locality set be allocated [2, pp. 50], we have smin = smax = t, where t is the length of the loop.(1) As for a random reference, it is proposed in [2, 3] that a random reference may be allocated 1 or b_yao buffers, where b_yao is the Yao estimate of the average number of pages referenced in a series of random record accesses [18]. In practice, the Yao estimates are usually too high for allocation. For example, for a blocking factor of 5, the Yao estimate for accessing 100 records of a 1000-record relation is 82 pages. Thus, DBMIN almost always allocates 1 buffer to a random reference, i.e., smin = smax = 1. As a preview, some of our algorithms may also make use of the Yao estimate. But a very important difference is that unlike DBMIN, which allocates either 1 or 82 buffers in this example, our algorithms may allocate any buffer within the

(1) In [2], Chou remarks that MRU is the best replacement policy for a looping reference under sub-optimal allocation. However, as far as we know, no method is proposed in [2, 3] to allocate sub-optimally.

range from 1 to 82, depending on conditions such as buffer availability and dynamic workload. Finally, for a sequential reference, DBMIN specifies smin = smax = 1.

Note that while DBMIN improves on traditional algorithms like Working-Set, LRU, etc., it is not flexible enough to make full use of the available buffers. This inflexibility is illustrated by the fact that the range [smin, smax] degenerates to a point. In other words, DBMIN does not allow sub-optimal allocations to looping references, and does not allow random references the luxury of being allocated many buffers even when those buffers are available. These problems lead us to the development of the notion of marginal gains and the flexible buffer allocation algorithms MG-x-y, discussed next.

3.2 Marginal Gains

The concepts of marginal gain and marginal utility have been widely used in economics since the 18th century [9]. Here we apply the approach to database buffer allocation.

Definition 6: For s ≥ 2, the marginal gain of a reference Ref using s buffers is defined as:

    mg(Ref, s) = Ef(Ref, s − 1) − Ef(Ref, s),

where Ref can be Lk,t, Rk,N or Sk,N.

For a given reference Ref, the marginal gain value mg(Ref, s) specifies the expected number of extra page hits that would be obtained by increasing the number of allocated buffers from (s − 1) to s. Note that these values take into account the reference patterns and the availability of buffers simultaneously. In essence, the marginal gain values specify quantitatively how efficiently a reference uses its buffers. Moreover, this quantification is at a granularity finer than the locality set sizes used in DBMIN. Thus, while DBMIN can only allocate on a per-locality-set-size basis, allocation algorithms based on marginal gains can be more flexible and allocate on a per-buffer basis. Below we analyze how the marginal gain values for different types of references vary with the number of buffers.
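For looping references, Definition 6 applied to Equation 6 gives the marginal gains in closed form. A sketch (our code, not the authors'):

```python
def ef_looping(k, t, s):
    """Ef(L_{k,t}, s): Equation 6 for s <= t; for s >= t the loop is fully
    buffered and only the t faults of the first iteration occur."""
    if s >= t:
        return float(t)
    return t + (t - s) * t * (k / t - 1) / (t - 1)

def mg_looping(k, t, s):
    """Marginal gain of the s-th buffer (Definition 6), for s >= 2."""
    return ef_looping(k, t, s - 1) - ef_looping(k, t, s)
```

For a reference that goes through a loop of 50 pages 20 times (k = 1000, t = 50), mg_looping is about 19.4 for every s from 2 to 50, and drops to 0 beyond s = t.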
This analysis is crucial in designing the flexible algorithms to be presented.

For a looping reference Lk,t, Equation 6 dictates that for any allocation s < t, extra page hits are obtained by allocating more and more buffers to the reference, until the loop can be fully accommodated in the buffers. The allocation s = t is the optimal allocation that generates the fewest page faults. Furthermore, any allocation s > t is certainly wasteful, as the extra buffers are not used. The graph for looping references in Figure 2 summarizes the situation. The typical marginal gain values of looping references are of the order of magnitude O(10) or O(10^2). For example, if a reference goes through a loop of 50 pages 20 times, the marginal gain value for all buffer allocations s ≤ 50 is 19.4.

Similarly, based on Equations 2, 3 and 4, it is easy to check that the marginal gain values of random references are positive, but strictly decreasing as the number of allocated buffers s increases, as shown in Figure 2. Eventually, the marginal gain value becomes zero, when the allocation exceeds the number of accesses or the number of pages in the accessed relation. Note that, unlike DBMIN, a buffer allocation algorithm based on marginal gains may allocate the idle

[Figure 2: Typical Curves of Marginal Gain Values. Three plots of mg against s: for a looping reference Lk,t, mg is of order O(10^2) up to s = t and zero beyond; for a random reference Rk,N, mg is of order O(10^-1), strictly decreasing, reaching zero at s = min(k, N); for a sequential reference Sk,N, mg is identically zero.]

buffers to the random reference, as long as the marginal gain values of the reference indicate that there are benefits in allocating more buffers to it. In fact, even if the number of idle buffers exceeds the Yao estimate, it may still be beneficial to allocate beyond the Yao estimate. It is, however, worth pointing out that the marginal gain values of a random reference are normally lower than those of a looping reference. The highest marginal gain value of a random reference is typically of the order of magnitude O(1) or O(10^-1). For example, for the random reference discussed earlier (i.e., accessing 100 records from 200 pages), the highest marginal gain value is about 0.5.

Finally, as shown in Equation 5, the marginal gain values of sequential references are always zero, indicating that there is no benefit in allocating more than one buffer to such references (cf. Figure 2).

3.3 MG-x-y

As we have shown above, the marginal gain values of a reference quantify the benefits of allocating extra buffers to the reference. Thus, in a system where queries compete for a fixed number of buffers, the marginal gain values provide a basis for a buffer manager to decide which queries should get more buffers than others. Ideally, given N free buffers, the best allocation is the one that does not exceed N and that maximizes the total marginal gain values of the queries in the waiting queue. However, such an optimization would be too expensive and complicated for buffer allocation purposes. Furthermore, to ensure fairness, we favor buffer allocation on a First-Come-First-Serve basis. In the following we present a class of allocation algorithms, MG-x-y, that achieve high marginal gain values, maximize buffer utilization, and are fair and easy to compute. MG-x-y follows the generic framework outlined in Algorithm 1.
Like DBMIN, the allocation policy of MG-x-y presented below allocates on a per-reference basis.

Allocation Policy 1 (MG-x-y): Let R be the reference at the head of the waiting queue, and A > 0 be the number of available buffers. Moreover, let x and y be the parameters of MG-x-y, to be explained in detail shortly.

Case 1: R is a looping reference Lk,t.

1. If the number A of available buffers exceeds the length t of the loop (i.e., A > t), allocate t buffers to the reference.

2. Otherwise, if the number of available buffers is too low (i.e., A < x% · t), allocate no buffers to this reference.

3. Otherwise (i.e., A ≥ x% · t), give all A buffers to the reference R.

Case 2: R is a random reference Rk,N.

1. As long as the marginal gain values of R are positive, allocate to R as many buffers as possible, but not exceeding the number A of available buffers or y (i.e., allocation ≤ min(A, y)).

Case 3: R is a sequential reference Sk,N.

1. Allocate 1 buffer.

MG-x-y has two parameters, x and y. The x parameter is used to determine allocations for looping references. As described in Case 1 above, MG-x-y first checks whether the number of available buffers exceeds the length of the loop of the looping reference. Recall from the previous section and Figure 2 that the allocation which accommodates the whole loop minimizes page faults and corresponds to the highest total marginal gain value for the reference. Thus, if there are enough buffers, then like DBMIN, MG-x-y gives the optimal allocation. However, if there are not enough buffers, MG-x-y checks whether a sub-optimal allocation is beneficial, via the parameter x.

In general, the response time of a query has two components: the waiting time and the processing time, where the former is the time from the arrival of the query to the time the query is activated, and the latter is the time from activation to completion. The processing time is minimized by the optimal allocation. But to obtain the optimal allocation, the waiting time may become too long. On the other hand, while a sub-optimal allocation may result in a longer processing time, it may in the end give a response time shorter than the optimal allocation, if the reduction in waiting time more than offsets the increase in processing time. Hence, in trying to achieve this fine balance
Hence, in trying to achieve this �ne balancebetween waiting time and processing time, MG-x-y uses the heuristic that a sub-optimal allocationis only allowed if the total marginal gain values of that allocation is not too \far" away from theoptimal. This requirement translates to the condition shown in Case 1 that a sub-optimal allocationmust be at least x% of the optimal one.In constrast to DBMIN, MG-x-y may allocate extra bu�ers to a random reference, as long asthose extra bu�ers are justi�ed by the marginal gain values of the reference. However, there is apitfall simply considering only the marginal gain values of the random reference. As an example,suppose a random reference is followed by a looping reference in the waiting queue. In situationswhere bu�ers are scarce, giving one more bu�er to the random reference implies that there is onefewer bu�er to give to the looping reference. But since the marginal gain values of a loopingreference are usually higher than those of a random reference, it is desirable to save the bu�er from11

                          allocation policy
    allocation       looping          random           sequential     admission
    algorithm      smin    smax     smin    smax      smin   smax     policy
    DBMIN           t       t        1       1         1      1       smin ≤ A
    MG-x-y        x% · t    t        1       y         1      1       smin ≤ A
    predictive    f(load)   t      f(load)  b_yao      1      1       smin ≤ A
    methods

Table 3: Characteristics of Buffer Allocation Algorithms

the random reference and to allocate the buffer to the looping reference instead. Since MG-x-y operates on a First-Come-First-Serve basis, MG-x-y uses the heuristic of imposing a maximum on the number of buffers allocated to a random reference. This is the purpose of the y parameter in MG-x-y.

The first two rows of Table 3 summarize the similarities and differences between DBMIN and MG-x-y. Recall from the previous section that smin and smax denote respectively the minimum and maximum numbers of buffers that a buffer allocation algorithm is willing to assign to a reference. In fact, it is easy to see that MG-x-y generalizes DBMIN, in that MG-100-1 (i.e., x = 100%, y = 1) is the same as DBMIN. As we shall see in Section 6, as we allow more flexible values for x and y than DBMIN, MG-x-y performs considerably better.

Note that to obtain the best performance, the x and y parameters need to be determined according to the mix of queries using the system. This may involve experimenting with different combinations of values of x and y.(2) Clearly, this kind of experimentation is expensive. Moreover, these optimal values are vulnerable to changes in the mix of queries. Thus, in the next section, we explore further the idea of flexible buffer allocation, and we develop adaptable allocation algorithms that dynamically choose the smin and smax values using run-time information. The basis of our approach is to use a queueing model to give predictions about the performance of the system, and to make the smin and smax parameters vary according to the state of the queueing model.
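Before moving on, Allocation Policy 1 can be sketched in code. This is our illustrative sketch, not the authors' implementation; in particular, `max_useful` (the largest allocation for which a random reference's marginal gain is still positive, capped in practice by the Yao estimate) is our name for an input the text leaves implicit:

```python
def mg_x_y_allocate(ref, A, x, y):
    """MG-x-y allocation for the reference at the head of the queue.
    ref: dict describing the reference; A: available buffers;
    x, y: the MG-x-y parameters.  Returns the number of buffers to
    allocate, where 0 means the reference keeps waiting."""
    if ref["type"] == "sequential":                  # Case 3
        return 1
    if ref["type"] == "looping":                     # Case 1
        t = ref["t"]
        if A > t:
            return t           # optimal: buffer the whole loop
        if A < (x / 100) * t:
            return 0           # sub-optimal allocation too far from optimal
        return A               # acceptable sub-optimal allocation
    if ref["type"] == "random":                      # Case 2
        return min(A, y, ref["max_useful"])
    raise ValueError("unknown reference type")
```

With x = 100 and y = 1 this reproduces DBMIN: a looping reference is admitted only when all t buffers are available, and a random reference gets exactly one buffer.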
In the next section, we describe the proposed queueing model, as well as the ways the model can be used to perform buffer allocation in a fair (FCFS), robust and adaptable way.

4 Adaptable Buffer Allocation

4.1 Predictive Load Control

As described in the previous section, both DBMIN and MG-x-y are static in nature, and their admission policy is simply smin ≤ A, where smin is a pre-defined constant for each type of reference. Here we propose adaptable methods that use dynamic information, so that smin is now a function of the workload, denoted by f(load) in Table 3. Thus, in considering admissions, these methods not only consider the characteristics of the reference and the number of available buffers,

(2) A good starting point is x = 50 and y = b_yao, from our experience.

Symbols    Definitions
A          number of available buffers
smin       minimum number of buffers assigned to a reference
smax       maximum number of buffers assigned to a reference
sopt       maximum number of buffers usable by a reference (usually = smax)
TP         throughput
n          multiprogramming level
mpl (= n)  number of active queries
ncq        number of concurrent queries (active + waiting for buffers)
T_C,i      CPU load of Ref_i
T_D,i      disk load of Ref_i
t_D        time for one disk access
t_C        time to process one page in main memory
T_C        (harmonic or geometric) average of CPU loads
T_D        (harmonic or geometric) average of disk loads
ρ          relative load (disk vs CPU)
U_D        disk utilization
U_D,i      disk utilization due to Ref_i
EDU        effective disk utilization
s_i        number of buffers assigned to Ref_i
w_i        portion of "avoidable" ("wasted") page faults of Ref_i

Table 4: Summary of Symbols and Definitions for the queueing model

but they also take into account the dynamic workload of the system. More specifically, a waiting reference is activated with s buffers if this admission is predicted to improve the performance of the current state of the system. In more precise notation, suppose Pf denotes a performance measure (e.g. throughput), Refs_cur = (Ref_1, ..., Ref_n) are the references (i.e. queries) currently in the system, with allocations s_cur = (s_1, ..., s_n) buffers respectively, and Ref is the reference under consideration for admission. Then smin is the smallest s that will improve, or at least maintain, the Pf predictor:

Pf(Refs_new, s_new) ≥ Pf(Refs_cur, s_cur)

where Refs_new = Refs_cur ∪ {Ref}, s_new = (s_1, ..., s_n, s), and Pf(R, s) denotes the performance of the system with active references R and buffer allocations s. Thus, the reference Ref is admitted only if it will not degrade the performance of the system.^3

In this paper we consider two performance measures, or predictors: throughput TP and effective disk utilization EDU. Before we analyze these predictors and discuss the motivation behind our choices, we outline the queueing model that forms their basis.
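The admission criterion above can be sketched as a small search for the smallest qualifying s. The sketch below assumes a predictor callable `pf(refs, allocs)` standing in for TP or EDU; the linear scan over s, and all names, are our simplifications.

```python
def predictive_smin(pf, refs_cur, s_cur, ref, smax, available):
    """Smallest s (up to min(smax, A)) whose admission does not degrade
    the predicted performance, i.e. Pf(new) >= Pf(cur); returns None if
    no such s exists and the reference must keep waiting.
    """
    baseline = pf(refs_cur, s_cur)
    for s in range(1, min(smax, available) + 1):
        if pf(refs_cur + [ref], s_cur + [s]) >= baseline:
            return s
    return None
```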
At the end of this section, we discuss how these predictors can be combined with various allocation policies to give different adaptable buffer allocation algorithms. In Section 5 we present simulation results comparing the performance of these adaptable algorithms with MG-x-y and DBMIN.

4.2 Queueing Model

We assume a closed queueing system with two servers: one CPU and one disk. Figure 3 shows the system, and Table 4 summarizes the symbols used for the queueing model. Within the system, there are n references (jobs) Ref_1, ..., Ref_n whose CPU and disk loads are T_C,i and T_D,i respectively for

^3 There is, however, one exception; see Section 4.4 for a discussion.

[Figure 3: Queueing system (CPU and disk servers, with a queue for buffers)]

i = 1, ..., n. Furthermore, Ref_i has been allocated s_i buffers. Therefore, if every disk access costs t_D (e.g. 30 msec), and the processing of a page after it has been brought in core costs t_C (e.g. 2 msec), we have the following equations:

T_D,i = t_D · Ef(Ref_i, s_i)    (7)
T_C,i = t_C · k_i    (8)

where k_i is the number of pages accessed by Ref_i, and Ef(Ref_i, s_i) can be computed using the formulas listed in Section 3.

The general solution to such a network can be calculated; see for example [17, pp. 451-452]. It involves an n-class model with each job in a class of its own. But while it gives accurate performance measures such as throughput and utilizations, this solution is expensive to compute, since it requires time exponential in the number of classes. As ease of computation is essential in load control, we approximate it with a single-class model. We assume that all the jobs come from one class, with the overall CPU load T_C and the overall disk load T_D being the averages of the respective loads of the individual references. T_C and T_D may be the harmonic or the geometric means, depending on the predictor, as introduced below.

Before we proceed to propose the two performance predictors, note that in this paper we focus on a single-disk system, mainly to show the effectiveness of the proposed buffer allocation schemes. A multiple-disk system would introduce the issue of data placement; once this has been decided, we could extend our queueing model to have multiple disks. Queueing systems with multiple servers are studied in [17].

4.3 Predictor TP

Since our ultimate performance measure is the throughput of the system, a natural predictor is to estimate the throughput directly. In general, there are two ways to increase the throughput of a system: increase the multiprogramming level mpl, or decrease the disk load of the jobs by allocating more buffers to them.
However, these two requirements normally conflict with each other, as the total number of buffers in a system is fixed. Hence, for our first predictor TP, we propose the following admission policy:

Admission Policy 1 (TP) Activate the reference if the maximal allocation is possible; otherwise, activate only if the reference will increase the throughput. □

In the policy described above, a maximal allocation is one which assigns to the reference as many buffers as it needs, bounded by the number of buffers that are available. To implement the policy, we provide formulas to compute the throughput. The solution to the single-class model is given in [17]:

TP = U_D / T_D    (9)

where U_D is the utilization of the disk, given by:

U_D = ρ · (ρ^n − 1) / (ρ^(n+1) − 1)    (10)

and ρ is the ratio of the disk load versus the CPU load:

ρ = T_D / T_C    (11)

To derive the average loads T_C and T_D, we use the harmonic means of the respective loads. The reason is that the equations of the queueing system are based on the concept of "service rate", which is the inverse of the load. Thus, using the harmonic means of the loads is equivalent to using the arithmetic means of the rates, i.e. 1/T_C = (1/n) · Σ_{i=1}^{n} 1/T_C,i and 1/T_D = (1/n) · Σ_{i=1}^{n} 1/T_D,i. Notice that the calculation of the throughput requires O(1) operations, if the buffer manager keeps track of the values T_D and T_C.

4.4 Predictor EDU

Although very intuitive, using the estimated throughput as the criterion for admission may lead to some anomalies. Consider the situation when a long sequential reference is at the head of the waiting queue, while some short, maximally allocated random references are currently running in the system. Admitting the sequential reference may decrease the throughput, as it increases the average disk load per job. However, as the optimal allocation for a sequential reference is only one buffer, activating it is reasonable. Exactly for this reason, Admission Policy 1 is "patched up" to admit a reference with smax buffers, even if this admission decreases the throughput.

This anomaly of the throughput as a predictor leads us to the development of our second predictor, Effective Disk Utilization (EDU).
Consider the following view of the problem: there is a queue of jobs (i.e. references), a system with one CPU and one disk, and a buffer pool that can help decrease the page faults of the jobs. Assuming that the disk is the bottleneck (which is the case in all our experiments, and is usually the case in practice), a reasonable objective is to make the disk work as efficiently as possible. There are two sources of inefficient use of the disk: (1) the disk sits idle because there are very few jobs, or (2) the disk works on page requests that could have been avoided if enough buffers had been given to the references causing the page faults. The following concept captures these observations.

[Figure 4: Effective disk utilization]

Definition 7 The effective disk utilization EDU is the portion of time that the disk is engaged in page faults that could not be avoided even if each reference were assigned its optimal number of buffers (infinite or, equivalently, sopt, the maximum number of buffers usable by a reference). □

Hence, for our second predictor EDU, we use the following admission policy:

Admission Policy 2 (EDU) Activate the reference if it will increase the effective disk utilization. □

Mathematically, the effective disk utilization is expressed by:

EDU = (Σ_{i=1}^{n} U_D,i) − (Σ_{i=1}^{n} U_D,i · w_i)    (12)

where U_D,i represents the disk utilization due to Ref_i, and w_i is the portion of "avoidable" (or "wasted") page faults caused by Ref_i:

w_i = (Ef(Ref_i, s_i) − Ef(Ref_i, ∞)) / Ef(Ref_i, s_i)    (13)

For practical calculations, we use sopt instead of ∞; clearly, sopt is 1, t and b_yao for sequential, looping and random references respectively. Note that the above equation relates the notion of EDU to the marginal gain values introduced in the previous section. The term Ef(Ref_i, s_i) − Ef(Ref_i, sopt) can be rewritten as Σ_{j=s_i+1}^{sopt} mg(Ref_i, j), by Definition 6. Thus w_i, while intuitively representing the portion of avoidable page faults, can also be regarded as a form of normalized marginal gain.

Informally, Equation 12 specifies that in every unit of time the disk serves Ref_i for U_D,i units of time, of which Ref_i wastes w_i · U_D,i units; summing over all jobs gives Equation 12. Figure 4 illustrates the concept of effective disk utilization: the horizontal line corresponds to 100% disk utilization; the dotted portion stands for the idle time of the disk, the dashed parts correspond to the "wasted" disk accesses, and the sum of the solid parts corresponds to the effective disk utilization.
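Both predictors reduce to a handful of arithmetic operations. The following is a minimal sketch of Equations 9-13, assuming the buffer manager supplies the per-reference loads, the per-reference disk utilizations, and the expected fault counts Ef; all function and variable names here are ours.

```python
def disk_utilization(rho, n):
    """Equation 10: U_D = rho * (rho^n - 1) / (rho^(n+1) - 1)."""
    if abs(rho - 1.0) < 1e-12:
        return n / (n + 1.0)  # limit of the expression as rho -> 1
    return rho * (rho ** n - 1.0) / (rho ** (n + 1) - 1.0)


def predict_tp(tc_loads, td_loads):
    """Equations 9 and 11: TP = U_D / T_D, with harmonic-mean loads."""
    n = len(td_loads)
    t_c = n / sum(1.0 / t for t in tc_loads)
    t_d = n / sum(1.0 / t for t in td_loads)
    return disk_utilization(t_d / t_c, n) / t_d


def wasted_fraction(ef_cur, ef_opt):
    """Equation 13: w_i = (Ef(s_i) - Ef(s_opt)) / Ef(s_i)."""
    return (ef_cur - ef_opt) / ef_cur


def predict_edu(ud_per_ref, wasted):
    """Equation 12: EDU = sum(U_D,i) - sum(U_D,i * w_i)."""
    return sum(ud_per_ref) - sum(u * w for u, w in zip(ud_per_ref, wasted))
```

The guard for rho near 1 handles the removable singularity of Equation 10: the expression is undefined at rho = 1 but approaches n/(n+1).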

Note that, for I/O-bound jobs, every job has approximately an equal share of the total disk utilization U_D, even though the jobs may have different disk loads. Thus we have U_D,i = U_D / n, which simplifies Equation 12 to:

EDU = U_D − (U_D / n) · Σ_{i=1}^{n} w_i    (14)

Notice that we have not yet used a single-class approximation; we only need this approximation to calculate the disk utilization U_D. Using the exact n-class model [17], we find that the geometric averages give a better approximation to the disk utilization. Thus, the average CPU and disk loads are given by T_C = (Π_{i=1}^{n} T_C,i)^{1/n} and T_D = (Π_{i=1}^{n} T_D,i)^{1/n}. Based on these averages, the disk utilization U_D can be computed according to Equations 10 and 11. As with the TP predictor, the calculation of EDU requires O(1) steps, if the buffer manager keeps track of the loads T_C, T_D and the total "wasted" disk accesses Σ_{i=1}^{n} w_i.

4.5 Adaptable Buffer Allocation Algorithms

Thus far we have introduced two predictors, TP and EDU, presented the admission policies based on them, and provided formulas for computing their predictions. To complete the design of adaptable buffer allocation algorithms, we propose three allocation policies, which are rules to determine the number of buffers s to allocate to a reference once the reference has passed the admission criterion.

Allocation Policy 2 (Optimistic) Give as many buffers as possible, i.e. s = min(A, smax). □

Allocation Policy 3 (Pessimistic) Allocate as few buffers as necessary to random references (i.e. smin), but as many as possible to sequential and looping references. □

The optimistic policy tends to give allocations as close to optimal as possible. However, it may allocate too many buffers to random references, even though these extra buffers might otherwise be useful to other references in the waiting queue. The pessimistic policy is designed to deal with this problem. But a weakness of this policy is that it unfairly penalizes random references.
In particular, if there are abundant buffers available, there is no reason to let the buffers sit idle rather than allocating them to the random references.

Allocation Policy 4 (2-Pass) Tentatively assign buffers to the first m references in the waiting queue, following the pessimistic policy, until either the end of the waiting queue is reached or the (m+1)-th reference in the waiting queue cannot be admitted. Then perform a second pass and distribute the remaining buffers equally among the random references admitted during the first pass. □

In essence, when the 2-Pass policy makes allocation decisions, not only does it consider the reference at the head of the waiting queue, but it also takes into account as many references as possible in the rest of the queue.
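The 2-Pass policy is procedural enough to sketch directly. The representation of waiting references and the admission callback below are our assumptions, and the even split of leftovers uses integer division, which the text leaves unspecified.

```python
def two_pass(queue, available, admit):
    """Sketch of Allocation Policy 4 (2-Pass).

    queue: waiting references, each a dict with 'kind' ('sequential',
    'random' or 'looping'), 'smin' and 'smax'.  admit(ref, s) stands in
    for the predictor-based admission test of Section 4.1.  Returns
    (allocations, leftover), with allocations as (queue index, buffers).
    """
    allocs, randoms = [], []
    for i, ref in enumerate(queue):
        # First pass (pessimistic): the minimum to random references,
        # as many buffers as possible to sequential and looping ones.
        s = ref['smin'] if ref['kind'] == 'random' else min(ref['smax'], available)
        if s < ref['smin'] or s > available or not admit(ref, s):
            break  # stop at the first reference that cannot be admitted
        allocs.append([i, s])
        if ref['kind'] == 'random':
            randoms.append(len(allocs) - 1)
        available -= s
    # Second pass: share the leftover buffers equally among the random
    # references admitted in the first pass, capped by their smax.
    if randoms and available > 0:
        share = available // len(randoms)
        for k in randoms:
            i, s = allocs[k]
            extra = min(share, queue[i]['smax'] - s)
            allocs[k][1] = s + extra
            available -= extra
    return [tuple(a) for a in allocs], available
```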

query  query       selec-  access path          join         access path               reference type
type   operators   tivity  of selection         method       of join                   (data pages only)
I      select(A)   10%     clustered index      -            -                         S50,500
II     select(B)   10%     non-clustered index  -            -                         R30,15
III    select(C)   1%      non-clustered index  -            -                         R30,150
IV     select(A) ⋈ B  1%   sequential scan      index join   non-clustered index on B  R100,15
V      select(B) ⋈ C  10%  sequential scan      index join   non-clustered index on B  R30,150
VI     select(A) ⋈ B  4%   clustered index      nested loop  sequential scan on B      L300,15

Table 5: Summary of Query Types

relation A    10,000 tuples
relation B    300 tuples
relation C    3,000 tuples
tuple size    182 bytes
page size     4K

Table 6: Details of Relations

Following the generic framework described in Algorithm 1, the three allocation policies can be used in conjunction with both TP and EDU, giving rise to six potential adaptable buffer allocation algorithms. As a naming convention, each algorithm is denoted by the pair "predictor-allocation", where "predictor" is either TP or EDU, and "allocation" is one of o, p, 2, representing the optimistic, pessimistic and 2-Pass allocation policies respectively. For instance, EDU-o stands for the algorithm adopting the EDU admission policy and the optimistic allocation policy.

5 Simulation Results

In this section we present simulation results on the performance of MG-x-y and the adaptable methods in a multiuser environment. Since Chou and DeWitt have shown [2, 3] that DBMIN performs better than the Hot-Set algorithm, First-In-First-Out, Clock, Least-Recently-Used and Working-Set, we only compare our algorithms with DBMIN.

5.1 Details of Simulation

In order to make direct comparisons with DBMIN, we use the simulation program Chou and DeWitt used for DBMIN, and we experiment with the same types of queries. Table 5 summarizes the details of the queries, which were chosen to represent varying degrees of demand on CPU, disk and memory [2, 3]. Tables 6 and 7 show respectively the details of the relations and the query mixes we used. In the simulation, the number of concurrent queries varies from 2 to 16 or 24. Each concurrent query is generated by a query source which cannot generate a new query until the last query from the same source is completed. Thus, the simulation program simulates a closed
In the simulation, the number of concurrent queries varies from 2 to 16 or 24. Each of theseconcurrent queries is generated by a query source which cannot generate a new query until thelast query from the same source is completed. Thus, the simulation program simulates a closed18

        I        II      III      IV       V        VI
        S50,500  R30,15  R30,150  R100,15  R30,150  L300,15
mix 1   10%      10%     -        10%      -        70%
mix 2   10%      45%     -        45%      -        -
mix 3   10%      30%     -        -        30%      30%
mix 4   -        -       50%      -        50%      -

Table 7: Summary of Query Mixes


[Figure 5: Relative Throughput: Mix 1 (mainly looping references), no Data Sharing; curves for (IDEAL), MG-50-12, TP-o/EDU-o, EDU-2, MG-50-15, MG-50-6, MG-50-1 and DBMIN]

system.^4 See [2, 3] for more details.

5.2 Effectiveness of Allocations to Looping References

The first mix of queries consists of 70% queries of type VI (looping references) and 10% each of queries of types I, II and IV (sequential, random and random references respectively). The purpose of this mix is to evaluate the performance of MG-x-y and the adaptable algorithms in situations where there are many looping references to be executed. The x parameter of MG-x-y is set to one of the following: 100, 85, 70 and 50. The y parameter is one of 1, 6, 12 and 15. Figure 5 shows the throughputs of DBMIN, MG-100-12, the MG-50-y's and the adaptable algorithms running with

^4 Besides buffer management, concurrency control and transaction management is another important factor affecting the performance of the whole database system. While the simulation package does not consider transaction management, see [2] for a discussion of how the transaction and lock manager can be integrated with a buffer manager using DBMIN. Since our algorithms differ from DBMIN only in load control, the integration proposed there also applies to a buffer manager using our algorithms.




[Figure 6: Average Waiting Time and Buffer Utilization: Mix 1]

different numbers of concurrent queries using 35 buffers. The results for the MG-70-y's and MG-85-y's are similar to those for the MG-50-y's, and they are omitted for brevity. The results for the pessimistic approach are typically only slightly better than those for DBMIN, and these performance figures are likewise not plotted. The major reason why the pessimistic approach gives poor performance is that it is too aggressive in admitting queries into the system. Note that to obtain the throughput values, we ran our simulation package repeatedly until the values stabilized; [2] discusses how the simulation package can be used to obtain results within a specified confidence interval. Figure 5 also includes the throughput of the "ideal" algorithm, which has infinitely many buffers and can therefore support any number of concurrent queries requiring any number of buffers. Furthermore, to highlight the increase or decrease relative to DBMIN, the values are normalized by the values of DBMIN, effectively showing the ratio in throughput.

Let us focus our attention on the MG-x-y algorithms first. All four MG-50-y algorithms show considerable improvement when compared with DBMIN. In particular, since the allocations for random and sequential references are the same for both MG-50-1 and MG-100-1 (i.e. DBMIN), the improvement exhibited by MG-50-1 relative to MG-100-1 is due solely to the effectiveness of allocating buffers sub-optimally to looping references whenever necessary. As the y value increases, the throughput increases gradually until y reaches 15. The increase in throughput can be attributed to the fact that the random queries benefit from the allocation of more buffers. But when too many buffers (e.g. y = 15) are allocated to a random query, some of the buffers are not used efficiently. Thus, the throughput of MG-50-15 is lower than that of MG-50-12.
Finally, the adaptable algorithms TP-o, EDU-o and EDU-2 perform comparably to the best MG-x-y scheme, which is MG-50-12 in this case.

Note that, to a certain extent, the algorithm MG-100-12 represents the algorithm that allocates


[Figure 7: Relative Throughput vs Total Buffers: Mix 1 (ncq = 8)]

buffers to minimize the number of page faults. However, such "optimal" allocations may induce high waiting times^5 for queries and low buffer utilization and throughput for the system. The two graphs in Figure 6 demonstrate the situation: the graph on the left shows the average waiting time of queries, with values again normalized by those of DBMIN, and the graph on the right shows the average percentage of buffers utilized.

Thus far, we have seen how the performance of MG-x-y varies with different values of x and y. Figure 7 shows how the relative throughput varies with the total number of buffers when running this mix of queries with 8 concurrent queries. The graphs for other multiprogramming levels exhibit similar patterns. Figure 7 shows the situations when sub-optimal allocations are allowed by MG-50-12, MG-70-12 and MG-85-12. For instance, when the total number of buffers reaches 30, MG-50-12 allows sub-optimal allocations to looping references, and the throughput of the system increases significantly compared with the other algorithms. As the total number of buffers increases, MG-70-12 and MG-85-12 follow MG-50-12 and perform better than DBMIN. This discrepancy can be explained by considering a looping reference at the head of the waiting queue. Because DBMIN insists on giving the optimal allocation to this reference (18 in this case), the reference blocks other queries from using the buffers. When this reference finally manages to get the optimal number of buffers (i.e. when the total number of buffers becomes 36), DBMIN performs not much worse than the others; in this case, the difference in throughput is due to the effective allocations to random references by the MG-x-12 algorithms. If the graph extended to higher numbers of total buffers, we expect that a similar pattern of divergence in throughput

^5 The waiting time of a query is the time from arrival to activation.


[Figure 8: Relative Throughput: Mix 2 (mainly random references), no Data Sharing]

appears before every multiple of 18, though the magnitude will probably decrease.

5.3 Effectiveness of Allocations to Random References

The second mix of queries consists of 45% queries of type II, 45% queries of type IV (both random references), and 10% queries of type I (sequential references). The purpose of this mix is to evaluate the effectiveness of MG-x-y and the adaptable schemes in allocating buffers to random references. Since there are no looping references in this mix, the x parameter of MG-x-y is irrelevant and is simply set to 100. The y parameter is one of the following: 1, 8, 13 and 15. Figure 8 shows the ratio of throughputs of DBMIN, the MG-100-y's and the adaptable algorithms running with different numbers of concurrent queries using 35 buffers. As before, the results for the pessimistic policies are not explicitly included in the figure; for this mix of queries, algorithms adopting the pessimistic policies behave exactly like DBMIN (i.e. MG-100-1), allocating one buffer to each random reference.

Let us focus our attention on the MG-x-y algorithms first. Compared with DBMIN (i.e. MG-100-1), all three other MG-100-y algorithms show significant increases in throughput. As the y value increases, the throughput increases gradually until y reaches 15. The increase in throughput can be attributed to the fact that the random queries benefit from the allocation of more buffers. But as explained in the previous section, when y becomes 15, some of the buffers allocated to random queries are no longer used efficiently. Thus, the throughput of MG-100-15 drops below that of MG-100-13, and even below that of MG-100-8.

As for the adaptable algorithms, EDU-o and TP-o perform comparably to MG-100-13 and the


[Figure 9: Relative Throughput: Mix 1, full Data Sharing]

"ideal" algorithm. EDU-2, though better than DBMIN, does not perform as well as the others. This is because, during the first pass of allocations (cf. Allocation Policy 4), EDU-2 tends to activate many random references. As a result, the number of buffers per random reference allocated by EDU-2 is lower than that allocated by the other algorithms, causing more page faults and degrading overall performance.

5.4 Effect of Data Sharing

In the simulations carried out so far, every query can only access data in its own buffers. However, our algorithms can support sharing of data among queries in exactly the same way as DBMIN. More specifically, when a page is requested by a query, the algorithm first checks whether the page is already in the buffers owned by the query. If not, and if data is allowed to be shared by the system, the algorithm then tries to find the page in the buffers the query is allowed to share. If the page is found, it is given to the query without changing its original ownership. See [2, 3] for more details.

To examine the effect of data sharing on the performance of our algorithms relative to DBMIN, we also ran simulations with varying degrees of data sharing. Figure 9 shows the relative throughputs of DBMIN, the MG-50-y's and the adaptable algorithms running the first mix of queries with 35 buffers, when each query has read access to the buffers of all the other queries, i.e. full data sharing.

Compared with Figure 5 for the case of no data sharing, Figure 9 indicates that data sharing favors our algorithms. The same behaviour occurs for the other query mixes we have used. In fact,



[Figure 10: Switching Mixes: (a) Stage 1 - Mix 4, (b) Stage 2 - Mix 3]

this phenomenon is not surprising, because sub-optimal allocations to looping references give even better results if data sharing is allowed. It is obvious that with data sharing, the higher the buffer utilization, the higher the throughput is likely to be. In other words, the inflexibility of DBMIN in buffer allocation becomes even more costly than in the case of no data sharing.

5.5 Comparisons with MG-x-y: Adaptability

In all the simulations we have shown thus far, the adaptable allocation algorithms TP-o, EDU-o and EDU-2 perform comparably to the best of MG-x-y. The reason is that we have a fixed mix of queries, with few types of queries, and we have carefully selected the x and y parameters best suited for this specific mix. But in the simulations described below, we shall see that having one set of statically chosen values for x and y creates problems for MG-x-y.

The first problem of MG-x-y is due to the fact that each MG-x-y scheme has only one x and one y value for all kinds of looping and random references. Consider the situation where there are two kinds of random references: one with a low Yao estimate and high selectivity, and another with a high Yao estimate and low selectivity; consider, for example, Query Types II and V respectively. Query Type II (R30,15) has a Yao estimate of 12, making 30 random accesses on 15 pages. On the other hand, Query Type V (R30,150) has a Yao estimate of 27, making 30 random accesses on 150 pages. For a query of the first type, it is beneficial to allocate buffers as close to the Yao estimate as possible. But for a query of the second type, it is not worthwhile to allocate many buffers. Thus, for any MG-x-y scheme, one y value is not sufficient to handle the diversity of queries.
This problem is demonstrated by running a simulation on the third query mix, which consists of the two kinds of random references mentioned above (Query Types II and V). Figure 10(b) shows the relative throughput


[Figure 11: Mix 4 to Mix 3: Instantaneous Throughput before and after Switching]

of running this mix of queries with 30 buffers. When compared with the best result of MG-x-y (MG-50-16 in this case), the adaptable algorithms perform better, handling the diversity of queries more effectively.

The second weakness of MG-x-y is its inability to adjust to changes in query mixes. Figure 10 shows the result of running a simulation that consists of two stages. In the first stage, the query mix (mix 4) consists of random references only. As shown in Figure 10(a), the best result of MG-x-y (MG-50-18 in this case) performs comparably to the adaptable algorithms. But when the second stage comes and the query mix changes from mix 4 to mix 3, MG-50-18 cannot adapt to the change, as illustrated by Figure 10(b). In contrast, the adaptable algorithms adjust appropriately.

Figure 11 shows how the instantaneous throughputs of DBMIN, MG-50-18 and TP-o fluctuate before and after switching the mixes. The instantaneous throughput values are obtained by calculating the average throughputs within 10-second windows. The thin line in each graph plots the fluctuation of the instantaneous throughput, the solid line represents the (overall) average throughput of the mix, and a marker indicates the moment of switching mixes. The figure shows that, at the time of switching, the instantaneous throughputs of DBMIN fluctuate greatly, eventually tapering off to a lower average throughput. For MG-50-18, the fluctuation after switching the mixes is greater than before. As for TP-o and the other adaptable schemes, since they are designed to be sensitive to the characteristics of the queries currently running in the system, fluctuation is expected.

5.6 Summary

Our simulation results show that the MG-x-y algorithms are effective in allocating buffers flexibly to queries. Compared with DBMIN, the MG-x-y algorithms give higher throughput, higher buffer utilization and lower waiting times for queries.
The increase in performance is even higher when data sharing is allowed.

Our simulation results also indicate that the adaptable allocation algorithms are more effective and more flexible than DBMIN, with or without data sharing. They are capable of making allocation

allocation    average time taken in    average response    ratio of load control
algorithms    load control (msec)      time (sec)          time to response time
DBMIN         0.01                     1.39                O(10^-5)
MG-100-13     0.01                     0.94                O(10^-5)
TP-o          0.73                     0.96                O(10^-3)
EDU-o         0.77                     0.95                O(10^-3)
EDU-2         2.98                     1.12                O(10^-3)

Table 8: Costs of Running the Algorithms (Mix 2, ncq = 4, no data sharing)

decisions based on the characteristics of queries, the runtime availability of buffers, and the dynamic workload. When compared with the MG-x-y algorithms, they are more adaptable to changes, while behaving as flexibly as the MG-x-y schemes. Moreover, no sensitivity analysis is needed for the adaptable methods.

The advantages of the adaptable schemes listed above seem to indicate that they should be used in all situations. The only concern is the amount of time they take to make load control decisions. Table 8 lists the average time a query spends in load control and the average response time of a query, running query mix 2 with 4 concurrent queries (cf. Figure 8). These figures were obtained by running our simulation package in a UNIX environment on a DEC-2100 workstation. It is easy to see that the MG-x-y algorithms take much less time to execute than the adaptable ones. Thus, in situations where query mixes are not expected to change often, and where sensitivity analysis can be performed inexpensively to find good values for the x and y parameters, it is beneficial to use the MG-x-y algorithms instead of the adaptable ones. In any other case, the adaptable algorithms are more desirable. Even though these algorithms take much longer to compute than the static ones, the extra time is worthwhile: 3 milliseconds (the worst case, for EDU-2) can be more than offset by saving one disk read, and 3 milliseconds constitute less than 1% of the total response time of a query.

As for the two predictors TP and EDU, both perform quite well.
While EDU is probably more accurate for a single-disk system, TP is more readily extendible to multi-disk systems, and is slightly easier to compute (cf. Table 8). As for the allocation policies, the winners are the 2-Pass approach and the optimistic one. The pessimistic approach generally gives poor results. The 2-Pass approach, on the other hand, performs well in most situations, with the exception of heavy workloads consisting primarily of random references; in this case, the 2-Pass policy degenerates into the pessimistic one, because there are normally no buffers left over to be distributed in the second pass. Another practical disadvantage of the 2-Pass policy is that it cannot activate queries instantaneously, because queries admitted in the first pass may have to wait for the second pass for additional buffers. Thus, it is slower than the algorithms that require only one pass. Finally, the optimistic allocation policy performs very well in most situations. In addition, the optimistic policy is simple, easy to implement

and, unlike the 2-Pass approach, is capable of making instantaneous decisions.

6 Conclusions

The principal contributions reported in this paper are summarized in the following list.

1. We have proposed and studied flexible buffer allocation.

   - It is a unified approach for buffer allocation in which both the access patterns of queries and the availability of buffers at runtime are taken into consideration. This is achieved through the notion of marginal gains, which give an effective quantification of how buffers can be used efficiently.

   - The MG-x-y allocation algorithms are designed to achieve high total marginal gains and maximize buffer utilization. Generalizing DBMIN, which is the same as MG-100-1, they can allocate buffers more flexibly.

   - Simulation results show that flexible buffer allocation is effective and promising, and the MG-x-y algorithms give higher throughput, higher buffer utilization and lower waiting time for queries than DBMIN.

2. We have proposed and studied adaptable buffer allocation.

   - Extending the flexible buffer allocation approach, it incorporates runtime information in buffer allocation. Based on a simple but accurate single-class queueing model, it predicts the impact of each buffer allocation decision.

   - Two performance predictors -- TP and EDU -- are proposed. In general, a waiting query is only activated if its activation does not degrade the performance of the system, as estimated by the predictors. In addition, three different allocation policies are studied: optimistic, pessimistic and 2-pass. Combined with the two predictors, six different adaptable buffer allocation algorithms are considered.

   - Simulation results indicate that the adaptable algorithms are more effective and flexible than DBMIN. When compared with the flexible algorithms MG-x-y, the adaptable ones are capable of adapting to changing workloads, while performing as flexibly as MG-x-y. Though more costly to compute, the extra time is well paid off.
     Finally, simulation results show that both performance predictors TP and EDU perform equally well, and that the optimistic and 2-pass allocation policies are effective. Taking implementation complexity into account, TP-o seems to be the best choice.

3. We have set up mathematical models to analyze relational database references. These models provide formulas to compute marginal gains and the performance predictions based on TP and EDU.
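As a rough illustration of the marginal-gain idea summarized in item 1, buffers can be handed out greedily, one at a time, to the query whose next buffer promises the largest expected reduction in page faults. This is an invented sketch, not the simulators' code: the marginal-gain values below are made up, and the x and y bounds that distinguish the MG-x-y variants are omitted.

```python
import heapq

def allocate_by_marginal_gain(mg, total_buffers):
    """Greedy core of marginal-gain allocation (sketch).

    mg[q][b] is the assumed, precomputed expected reduction in page
    faults when query q receives its (b+1)-th buffer.  Each free buffer
    goes to the query with the highest remaining marginal gain; we stop
    when buffers run out or all remaining gains are zero."""
    alloc = {q: 0 for q in mg}
    # Max-heap of (-gain, query), seeded with each query's first gain.
    heap = [(-gains[0], q) for q, gains in mg.items() if gains]
    heapq.heapify(heap)
    for _ in range(total_buffers):
        if not heap:
            break
        neg_gain, q = heapq.heappop(heap)
        if -neg_gain <= 0:      # no further benefit from extra buffers
            break
        alloc[q] += 1
        nxt = alloc[q]
        if nxt < len(mg[q]):    # re-insert q with its next marginal gain
            heapq.heappush(heap, (-mg[q][nxt], q))
    return alloc

# Two hypothetical queries: Q1's gains flatten quickly (looping-like
# reference), Q2's stay constant.  With 4 buffers, Q1 takes the first
# two, then Q2's steady 0.5 gains win the rest:
mg = {"Q1": [0.9, 0.8, 0.1, 0.0], "Q2": [0.5, 0.5, 0.5, 0.5]}
print(allocate_by_marginal_gain(mg, 4))   # {'Q1': 2, 'Q2': 2}
```

The early-exit on zero gains mirrors the goal of maximizing buffer utilization: buffers that no query can profit from are simply not allocated.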

In ongoing research, we are investigating how to extend our predictors to systems with multiple disks, and how to set up analytic models for references with data sharing. We are also studying whether the flexible and predictive approaches can be incorporated into the framework proposed by Cornell and Yu [5], in order to improve the quality of query plans generated by a query optimizer. Finally, we are interested in deriving formulas for computing marginal gains of more complex queries, like sort-merge joins.

Acknowledgements. We would like to thank H. Chou and D. DeWitt for allowing us to use their simulation program for DBMIN so that a direct comparison could be made. We would also like to thank the anonymous referees for many valuable suggestions and comments.

References

[1] A.F. Cardenas. Analysis and Performance of Inverted Data Base Structures, Communications of the ACM (18) 5 (1975).

[2] H. Chou. Buffer Management of Database Systems, Computer Sciences Technical Report 597, University of Wisconsin, Madison (1985).

[3] H. Chou and D. DeWitt. An Evaluation of Buffer Management Strategies for Relational Database Systems, Proc. of the 11th Intern. Conference on Very Large Data Bases (1985).

[4] S. Christodoulakis. Implication of Certain Assumptions in Data Base Performance Evaluation, ACM Transactions on Database Systems (9) 2 (1984).

[5] D. Cornell and P. Yu. Integration of Buffer Management and Query Optimization in Relational Database Environment, Proc. 15th Intern. Conference on Very Large Data Bases (1989).

[6] W. Effelsberg and T. Haerder. Principles of Database Buffer Management, ACM Transactions on Database Systems (9) 4 (1984).

[7] C. Faloutsos, R. Ng and T. Sellis. Predictive Load Control for Flexible Buffer Allocation, Proc. 17th Intern. Conference on Very Large Data Bases, pp. 265-274 (1991).

[8] J. Kaplan. Buffer Management Policies in a Database Environment, Master Thesis, University of California, Berkeley (1980).

[9] E. Kauder.
History of Marginal Utility Theory, Princeton University Press (1965).

[10] T. Lang, C. Wood and E. Fernandez. Database Buffer Paging in Virtual Storage Systems, ACM Transactions on Database Systems (2) 4 (1977).

[11] R. Mattson, J. Gecsei, D. Slutz and I. Traiger. Evaluation Techniques for Storage Hierarchies, IBM Systems Journal (9) 2 (1970).

[12] R. Ng, C. Faloutsos and T. Sellis. Flexible Buffer Allocation based on Marginal Gains, Proc. ACM SIGMOD International Conference on Management of Data, pp. 387-396 (1991).

[13] G. Sacca and M. Schkolnick. A Mechanism for Managing the Buffer Pool in a Relational Database System using the Hot Set Model, Proc. 8th Intern. Conference on Very Large Data Bases (1982).

[14] G. Sacca and M. Schkolnick. Buffer Management in Relational Database Systems, ACM Transactions on Database Systems (11) 4 (1986).

[15] S. Sherman and R. Brice. Performance of a Database Manager in a Virtual Memory System, ACM Transactions on Database Systems (1) 4 (1976).

[16] M. Stonebraker, E. Wong and P. Kreps. The Design and Implementation of INGRES, ACM Transactions on Database Systems (1) 3 (1976).

[17] K.S. Trivedi. Probability and Statistics with Reliability, Queuing and Computer Science Applications, Prentice Hall, Inc., Englewood Cliffs, NJ (1982).

[18] S. Yao. Approximating Block Accesses in Database Organizations, Communications of the ACM (20) 4 (1977).


List of Figures

1  Buffer Manager and Related Components
2  Typical Curves of Marginal Gain Values
3  Queueing system
4  Effective disk utilization
5  Relative Throughput: Mix 1 (mainly looping references), no Data Sharing
6  Average Waiting Time and Buffer Utilization: Mix 1
7  Relative Throughput vs Total Buffers: Mix 1 (ncq = 8)
8  Relative Throughput: Mix 2 (mainly random references), no Data Sharing
9  Relative Throughput: Mix 1, full Data Sharing
10 Switching Mixes: (a) Stage 1 -- Mix 4, (b) Stage 2 -- Mix 3
11 Mix 4 to Mix 3: Instantaneous Throughput before and after Switching


List of Tables

1 Classification of Buffer Allocation Algorithms
2 Summary of Symbols and Definitions
3 Characteristics of Buffer Allocation Algorithms
4 Summary of Symbols and Definitions for queueing model
5 Summary of Query Types
6 Details of Relations
7 Summary of Query Mixes
8 Costs of Running the Algorithms (Mix 2, ncq = 4, no data sharing)
