Algorithmica (2012) 63:347–362
DOI 10.1007/s00453-011-9534-1

Improved Approximation Algorithms for Data Migration

Samir Khuller · Yoo-Ah Kim · Azarakhsh Malekian

Received: 14 December 2009 / Accepted: 23 May 2011 / Published online: 6 July 2011
© Springer Science+Business Media, LLC 2011

Abstract Our work is motivated by the need to manage data items on a collection of storage devices to handle dynamically changing demand. As demand for data items changes, for performance reasons, the system needs to automatically respond to changes in demand for different data items. The problem of computing a migration plan among the storage devices is called the data migration problem. This problem was shown to be NP-hard, and an approximation algorithm achieving an approximation factor of 9.5 was presented for the half-duplex communication model in Khuller, Kim and Wan (Algorithms for data migration with cloning. SIAM J. Comput. 33(2):448–461, 2004). In this paper we develop an improved approximation algorithm that gives a bound of 6.5 + o(1) using new ideas. In addition, we develop better algorithms using external disks and get an approximation factor of 4.5 using external disks. We also consider the full duplex communication model and develop an improved bound of 4 + o(1) for this model, with no external disks.

Keywords Data migration · Edge coloring · Approximation algorithms

A preliminary version of the paper was presented at the 2006 APPROX conference.
Research of S. Khuller was supported by NSF CCF 0728839 and a Google Research Award.

S. Khuller · A. Malekian
Department of Computer Science, University of Maryland, College Park, MD 20742, USA
e-mail: [email protected]

S. Khuller
e-mail: [email protected]

Y.-A. Kim
Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
e-mail: [email protected]



1 Introduction

To handle high demand, especially for multimedia data, a common approach is to replicate data items within the storage system. Typically, a large storage server consists of several disks connected using a dedicated network, called a Storage Area Network. Disks typically have constraints on storage as well as the number of clients that can access data items from a single disk simultaneously. These systems are getting increasing attention since TV channels are moving to systems where TV programs will be available for users to watch with full video functionality (pause, fast forward, rewind etc.). Such programs will require large amounts of storage, in addition to bandwidth capacity to handle high demand.

Approximation algorithms have been developed [6, 11, 17, 18] to map known demand for data items to a specific data layout pattern to maximize utilization, where the utilization is the total number of clients that can be assigned to a disk that contains the data item they want. In the layout, we compute not only how many copies of each item we need, but also a layout pattern that specifies the precise subset of items on each disk. The problem is NP-hard, but there are polynomial-time approximation schemes [6, 11, 17, 18]. Given the relative demand for data items, the algorithm computes an almost optimal layout. Note that this problem is slightly different from the data placement problem considered in [3, 9, 16]: since all the disks are in the same location, it does not matter which disk a client is assigned to. Even in this special case, the problem is NP-hard [6].

Over time, as the demand for data changes, the system needs to create new data layouts. The problem we are interested in is that of computing a data migration plan for the set of disks to convert an initial layout to a target layout. We assume that data objects have the same size (these could be data blocks, or files) and that it takes the same amount of time to migrate any data item from one disk to another disk. In this work we consider two models. In the first model (half-duplex) the crucial constraint is that each disk can participate in the transfer of only one item, either as a sender or as a receiver, in each round. In other words, the communication pattern in each round forms a matching. Our goal is to find a migration schedule that minimizes the time taken to complete the migration (makespan). To handle high demand for popular objects, new copies will have to be dynamically created and stored on different disks. All previous work on this problem deals with the half-duplex model. We also consider the full-duplex model, where each disk can act as a sender and a receiver in each round for a single item. Previously we did not consider this natural extension of the half-duplex model since we did not completely understand how to utilize its power to prove interesting approximation guarantees.

The formal description of the data migration problem is as follows: data item i resides in a specified (source) subset $S_i$ of disks, and needs to be moved to a (destination) subset $D_i$. In other words, each data item that initially belongs to a subset of disks needs to be moved to another subset of disks. (We might need to create new copies of this data item and store it on an additional set of disks.) See Fig. 1 for an example. If each disk had exactly one data item, and needs to copy this data item to every other disk, then it is exactly the problem of gossiping. The data migration problem in this form was first studied by Khuller, Kim and Wan [14], and it was shown to be NP-hard. In addition, a polynomial-time 9.5-approximation algorithm was developed for the half-duplex communication model.

Fig. 1 An initial and target layout (left). For example, disk 1 initially has items {2,4,5} and in the target layout has items {1,3,4}. The corresponding $S_i$'s and $D_i$'s are shown on the right. Data item i resides in a subset $S_i$ of disks initially, and needs to be moved to $D_i$ during migration. For example, item 1 is located in disks 2 and 3 in the initial layout and, in the target layout, we need to create a copy of the item in disk 1

A slightly different formulation was considered by Hall et al. [10] in which a particular transfer graph was specified. In their transfer graph, vertices represent the storage devices and edges represent data transfers. The migration at a vertex is complete when all the edges (transfers) incident on it complete. While they are able to solve the problem very well, this approach is limited in the sense that it does not allow (a) cloning (creation of several new copies) or (b) optimization over the space of transfer graphs. In [14] it was shown that a more general problem formulation is the one with source and destination subsets specified for each data item. However, the main focus in [10] is to do the transfers without violating space constraints. Another formulation has been considered recently where one can optimize over the space of possible target layouts [12]. The resulting problems are also NP-hard, and no significant progress on developing approximation algorithms was made on this problem. The authors of [12] presented a simple flow-based heuristic for the problem, which was demonstrated to be effective in finding good target layouts.

Job migration has also been considered in the scheduling context recently [2], where a fixed number of jobs can be migrated to reduce the makespan by as much as possible. There is a lot of work on data migration for minimizing completion time for a fixed transfer graph as well (see [5, 15] for references). The transfer graph definition in their model is similar to [14]. In [15], Kim proved that the problem is NP-hard when edge lengths are the same and showed that Graham's list scheduling algorithm [8], when guided by an optimal solution to a linear programming relaxation, gives an approximation ratio of 3. For the objective of minimizing the average vertex completion time, Gandhi and Mestre [5] gave a primal-dual 3-approximation for unit processing times and a 5.83-approximation for arbitrary processing times. For minimizing the average edge completion time, they present a $\sqrt{2}$-approximation for bipartite graphs.

1.1 Communication Model

Different communication models can be considered based on how the disks are connected. In this paper we consider two models. The first model is the same model as in the work by Hall et al. [1, 10, 13, 14] where the disks may communicate on any matching; in other words, the underlying communication graph allows for communication between any pair of devices via a matching (a switched storage network with unbounded backplane bandwidth). Moreover, to model the limited switching capacity of the network connecting the disks, one could allow for choosing any matching of bounded size as the set of transfers that can be done in each round. We call this the bounded-size matching model. It was shown in [14] that an algorithm for the bounded matching model can be obtained by a simple simulation of the algorithm for the unbounded matching model with excellent performance guarantees.

In addition we consider the full duplex model where each disk may act as a sender and a receiver for an item in each round. Note that we do not require the communication pattern to be a matching any more. For example, we may have cycles, with disk 1 sending an item to disk 2, disk 2 to disk 3 and disk 3 to disk 1. In earlier work we did not discuss this model as we were unable to utilize the power of this model to prove non-trivial approximation guarantees.
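The per-round constraints of the two models can be made concrete with a small checker. This is a minimal sketch and not part of the paper: the function name `is_valid_round` and the representation of a round as a list of (sender, receiver) disk pairs are assumptions made for illustration.

```python
from collections import Counter

def is_valid_round(transfers, full_duplex=False):
    """Check one round of transfers, given as (sender, receiver) disk pairs.

    Half-duplex: every disk takes part in at most one transfer (a matching).
    Full-duplex: every disk sends at most one item and receives at most one.
    """
    if any(s == r for s, r in transfers):
        return False  # a disk never transfers an item to itself
    sends = Counter(s for s, _ in transfers)
    recvs = Counter(r for _, r in transfers)
    if full_duplex:
        return (all(c <= 1 for c in sends.values())
                and all(c <= 1 for c in recvs.values()))
    # Half-duplex: count every appearance of a disk, as sender or receiver.
    touched = sends + recvs
    return all(c <= 1 for c in touched.values())
```

Under this sketch, the cycle 1 → 2 → 3 → 1 from the text is accepted only when `full_duplex=True`, since each disk both sends and receives in the same round.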

1.2 Our Results

Our approach is based on the approach initially developed by Khuller et al. [14]. Various new ideas enable a reduction of the approximation factor to 6.5 + o(1). The main technical difficulty is simply that of "putting it all together" and making the analysis work.

In addition we show two more results. If we are allowed to use "external disks" (called bypass disks in [10]), we can improve the approximation guarantee further to $3 + \frac{1}{2}\max(3, \gamma)$. (We assume that each external disk can hold γ items.) This can be achieved by using at most $\lceil \Delta/\gamma \rceil$ external disks, where Δ is the number of items that need to be migrated. This gives an approximation factor of 4.5 by setting γ = 3.

Finally, we also consider the full-duplex model where each disk can be the source or destination of a transfer in each round. In this model we show that an approximation guarantee of 4 + o(1) can be achieved.

The algorithm developed in [14] has been implemented, and we performed an extensive set of experiments comparing its performance with the performance of other heuristics [7]. Even though the worst case approximation factor is 9.5, the algorithm performed very well in practice, giving approximation ratios within twice the lower bounds computed by the algorithm in most cases.

In Sect. 2 we first give a brief overview of our 6.5 + o(1)-approximation algorithm along with some theorems and lower bounds that we will utilize for the analysis. The full details are described in Sect. 3. The algorithms for external disks and full duplex models are discussed in Sects. 4 and 5, respectively. Section 6 concludes the paper.

2 The Data Migration Algorithm

We start this section by describing some theorems from the edge coloring and scheduling literature, as well as the lower bounds that we will use in the following sections for the analysis. In the second part, we present our data migration algorithm.

2.1 Preliminaries

Our algorithms make use of known results on edge coloring of multigraphs. Given a graph G with max degree $\Delta_G$ and multiplicity μ the following results are known (see Bondy-Murty [4] for example). Let $\chi'$ be the edge chromatic number of G. Note that when G is bipartite, $\chi' = \Delta_G$ and such an edge coloring can be obtained in polynomial time [4].

Theorem 1 (Vizing [21]) If G has no self-loops then $\chi' \le \Delta_G + \mu$.

Theorem 2 (Shannon [19]) If G has no self-loops then $\chi' \le \lfloor \frac{3}{2}\Delta_G \rfloor$.
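Both bounds depend only on the maximum degree and the multiplicity, so they are easy to evaluate for any transfer graph. The following is a minimal sketch, assuming the multigraph is given as a list of undirected edges; the helper name `coloring_bounds` is hypothetical and not from the paper.

```python
from collections import Counter

def coloring_bounds(edges):
    """Return (Vizing bound, Shannon bound) on the edge chromatic number.

    `edges` is a list of (u, v) pairs; parallel edges are simply repeated.
    Both theorems assume that the multigraph has no self-loops.
    """
    degree = Counter()
    multiplicity = Counter()
    for u, v in edges:
        if u == v:
            raise ValueError("self-loops are not allowed")
        degree[u] += 1
        degree[v] += 1
        multiplicity[frozenset((u, v))] += 1
    max_degree = max(degree.values())
    mu = max(multiplicity.values())
    return max_degree + mu, (3 * max_degree) // 2
```

For a triangle in which every edge is doubled, the maximum degree is 4 and the multiplicity is 2, so both bounds evaluate to 6; all six edges pairwise share an endpoint there, so six colors are also necessary.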

Another result that we will use (related to scheduling) is the following theorem by Shmoys and Tardos [20]:

Theorem 3 We are given a collection of jobs J, each of which is to be assigned to exactly one machine among the set M; if job $j \in J$ is assigned to machine $i \in M$, then it requires $p_{ij}$ units of processing time, and incurs a cost of $c_{ij}$. Suppose that there exists a fractional solution (that is, a job can be assigned fractionally to machines) with makespan P and total cost C. Then in polynomial time we can find a schedule with makespan $P + \max p_{ij}$ and total cost C.

We use two main lower bounds for our analysis. As in [14], let $\beta_j$ be $|\{i \mid j \in D_i\}|$, i.e., the number of different sets $D_i$ to which a disk j belongs. We define β as $\max_{j=1,\ldots,N} \beta_j$. In other words, β is an upper bound on the number of items a disk may need. Note that β is a lower bound on the optimal number of rounds, since the disk j that attains the maximum needs at least β rounds to receive all the items i such that $j \in D_i$, as it can receive at most one item in each round. Another lower bound that we will use in the analysis is α, which is defined as follows: for each item i decide a primary source $s_i \in S_i$ so that $\alpha = \max_{j=1,\ldots,N}(|\{i \mid j = s_i\}| + \beta_j)$ is minimized. In other words, α is the maximum number of items for which a disk may be a primary source ($s_i$) or destination. As one can see, α is also a lower bound on the optimal number of rounds. We will describe how to compute α optimally in polynomial time using network flow algorithms.
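The lower bound β follows directly from the destination sets. A minimal sketch, assuming the instance is stored as a dictionary from items to destination sets (the name `beta_lower_bound` and this representation are illustrative assumptions, not notation from the paper):

```python
def beta_lower_bound(dest_sets, disks):
    """Compute beta_j for every disk j and beta = max_j beta_j.

    dest_sets maps each item i to its destination set D_i (a set of disks);
    disks lists all N disks, including those that need no item.
    """
    beta_j = {j: 0 for j in disks}
    for d_i in dest_sets.values():
        for j in d_i:
            beta_j[j] += 1  # disk j has to receive one more item
    return beta_j, max(beta_j.values())
```

In a hypothetical instance with $D_a = \{1, 2\}$, $D_b = \{2, 3\}$ and $D_c = \{2\}$, disk 2 must receive three items, so at least β = 3 rounds are needed.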

Moreover, we may assume that $D_i \neq \emptyset$ and $D_i \cap S_i = \emptyset$. This is because we can define the destination set $D_i$ as the set of disks that need item i and do not currently have it.

Next, we present a high level description of our data migration algorithm.

2.2 Data Migration Algorithm: High Level Idea

The high level description of the algorithm is as follows:

Algorithm 1 (Data Migration Algorithm)

1. For each item i, find a disk (call it the primary source) $s_i \in S_i$ such that $\max_{j=1,\ldots,N}(|\{i \mid j = s_i\}| + \beta_j)$ is minimized. Later we show how we can do this step in polynomial time.
2. For each item i, we define two different subgroups $G_i \subseteq D_i$ and $R_i \subseteq D_i$ with the following properties:
   – The $G_i$'s are disjoint from each other. At first, we send item i to these disks.
   – The $R_i$ sets are not disjoint from each other but each disk belongs to only a bounded number of different $R_i$ sets. In our algorithm, we send data items from $G_i$ to $R_i$ and then from $R_i$ to the rest of the disks in $D_i$.
3. The $R_i$ sets are selected as follows:
   (a) First partition $D_i$ into subgroups $D_{i,k}$, $k = 0, \ldots, \lfloor |D_i|/q \rfloor$, of size at most q (q is a parameter that will be specified later). That is, we partition $D_i$ into $\lfloor |D_i|/q \rfloor$ subgroups of size q and possibly one subgroup of size less than q (if $|D_i|$ is not a multiple of q).
   (b) Now select $R_i \subseteq D_i$ and assign each $D_{i,k}$ to a disk in $R_i$ such that for each disk in $R_i$ the total size of subgroups (the total number of disks) assigned to the disk is at most β + q. (We will see later that it is always possible to select $R_i$ with this property.) Let $r_i$ be the disk in $R_i$ to which the small subgroup (a subgroup with size strictly less than q) is assigned. Note that if $|D_i|$ is a multiple of q, there is no disk $r_i$. We define $\bar{R}_i$ to be $R_i \setminus r_i$. For the analysis purposes we need this classification.
4. Compute $G_i \subseteq D_i$ such that $|G_i| = \lfloor |D_i|/\beta \rfloor$ and they are mutually disjoint.
5. For each item i for which $G_i = \emptyset$ but $\bar{R}_i \neq \emptyset$, we select a disk $g_i$. Let $G'_i = G_i$ if $G_i$ is not empty and $G'_i = \{g_i\}$ otherwise. Note that a disk $g_i$ exists iff $q < |D_i| < \beta$.
6. Send data item i from the primary source $s_i$ to $G'_i$.
7. Send item i from $G'_i$ to $\bar{R}_i \setminus G'_i$ by setting up a transfer graph and using an edge coloring to schedule the transfer. Here, $\bar{R}_i$ is defined to be $R_i \setminus r_i$ where $r_i$ is the disk in $R_i$ to which the small subgroup (a subgroup with size strictly less than q) is assigned.
8. Send item i from $s_i$ to $r_i$ if $r_i$ has not received item i.
9. Finally set up a transfer graph from $R_i$ to $D_i \setminus R_i$. We find an edge coloring of the transfer graph and the number of colors used is an upper bound on the number of rounds required to ensure that each disk in $D_i$ gets item i. In Lemma 7 we derive an upper bound on the number of required colors.

In the algorithm given in [14], the migration is performed in two stages. In the first stage, each item i is migrated to disjoint subsets of the $D_i$'s (called $G_i$ there) and in the second stage, the items are migrated from the $G_i$'s to the rest of the disks in $D_i$. Choosing disjoint sets makes broadcasting inside the subsets faster and easier, but it also limits the size of these subsets and, as a result, increases the number of rounds required to complete the second stage. In our algorithm we add an extra intermediate stage. As in the previous method, the items are first migrated to disjoint sets $G_i$. In the intermediate stage, the data items are sent to a specific subset of disks in $D_i$ (we call them $R_i$) which are not necessarily disjoint from each other; however, the overlap is limited. Finally we migrate items from $R_i$ to the rest of the disks in $D_i$. A high level presentation of the algorithm can be found in Fig. 2. In the following sections, we will describe the details of Algorithm 1.

Fig. 2 An overall picture of the data migration algorithm

3 Details of Steps

In this section, we discuss some of the steps of the algorithms in more detail.

Description of Step 1: Selecting the Primary Source for Each Item. This is exactly the same as Lemma 3.1 described in [14].

Lemma 1 [14] We can find a source $s_i \in S_i$ for each item i so that $\max_{j=1,\ldots,N}(|\{i \mid j = s_i\}| + \beta_j)$ is minimized, using a flow network.

Proof We create a flow network with a source s and a sink t as shown in Fig. 3. We have two sets of nodes corresponding to disks and items respectively. Add directed edges from s to nodes for items and also directed edges from item i to disk j if $j \in S_i$. The capacity of all those edges is one. Finally we add an edge from the node corresponding to disk j, to t, with capacity $\alpha - \beta_j$. We want to find the minimum α so that the maximum flow of the network is Δ. We can do this by checking if there is a flow of value Δ with α starting from β and increasing by one until it is satisfied. If there is outgoing flow from item i to disk j, then we set j as $s_i$. □
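The search for the minimum α can be sketched in a few lines. Because every item-side edge has unit capacity, the flow problem in the proof is a bipartite assignment with disk capacities $\alpha - \beta_j$, so this sketch swaps in simple augmenting paths for a general max-flow routine. The function name `primary_sources` and the dictionary-based input format are assumptions made for illustration, and every $S_i$ is assumed to be non-empty.

```python
def primary_sources(source_sets, beta_j):
    """Find the minimum alpha and a primary source s_i for every item.

    source_sets maps item i to S_i; beta_j maps every disk j to beta_j.
    Starting at alpha = beta, alpha is increased by one until all items
    can be assigned while disk j serves at most alpha - beta_j items.
    """
    alpha = max(beta_j.values())
    while True:
        cap = {j: alpha - beta_j[j] for j in beta_j}
        load = {j: [] for j in beta_j}   # disk -> items it is the source of
        assigned = {}                    # item -> chosen primary source

        def augment(i, seen):
            # Place item i on a disk, relocating other items if necessary.
            for j in source_sets[i]:
                if j in seen:
                    continue
                seen.add(j)
                if len(load[j]) < cap[j]:
                    load[j].append(i)
                    assigned[i] = j
                    return True
                for other in list(load[j]):
                    if augment(other, seen):   # `other` moved elsewhere
                        load[j].remove(other)
                        load[j].append(i)
                        assigned[i] = j
                        return True
            return False

        if all(augment(i, set()) for i in source_sets):
            return alpha, assigned
        alpha += 1
```

The loop mirrors the proof: each value of α corresponds to one feasibility test for a flow of value Δ, and the first feasible value is returned together with the chosen sources.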

Description of Step 3: Selecting $R_i$ for Each Item i. Let $D_{ik}$ ($k = 1, \ldots, \lceil |D_i|/q \rceil$) be the kth subgroup of $D_i$. The size of $D_{ik}$ is q for $k = 1, \ldots, \lfloor |D_i|/q \rfloor$, and $D_{ik}$ with $k = \lfloor |D_i|/q \rfloor + 1$ contains the remaining $|D_i| - q \cdot \lfloor |D_i|/q \rfloor$ disks in $D_i$ (it could possibly be empty). To show how we assign $D_{ik}$ to $R_i$ we use Theorem 3. In our problem, we can think of each subgroup $D_{ik}$ as a job and each disk as a machine. If disk j belongs to $D_i$, then we can assign job $D_{ik}$ to disk j with zero cost. The processing time is the size of $D_{ik}$, which is at most q. If disk j does not belong to $D_i$, then the cost to assign $D_{ik}$ to j is ∞ (disk j cannot be in $R_i$). First we can show that:

Lemma 2 There exists a fractional assignment such that the max load of each disk is at most β.

Proof We can assign a $\frac{1}{|D_i|}$ fraction of subgroup $D_{ik}$ to each disk $j \in D_i$. It is easy to check that every subgroup $D_{ik}$ is completely assigned. The load on disk j is given by

$$\sum_{i : j \in D_i} \sum_{k} \frac{|D_{ik}|}{|D_i|} = \sum_{i : j \in D_i} \frac{1}{|D_i|} \sum_{k} |D_{ik}| = \sum_{i : j \in D_i} 1 \le \beta.$$ □
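The uniform fractional assignment used in this proof can be checked numerically. The sketch below is an illustration under assumed names (`fractional_loads`, a dictionary of destination sets); it splits each $D_i$ into subgroups of size at most q, spreads every subgroup evenly over $D_i$, and accumulates exact loads with `fractions.Fraction`.

```python
from fractions import Fraction

def fractional_loads(dest_sets, q):
    """Load on each disk under the uniform fractional assignment.

    Every subgroup D_ik of D_i contributes |D_ik| / |D_i| to each disk
    of D_i, so a disk ends up with exactly beta_j units of load.
    """
    loads = {}
    for d_i in dest_sets.values():
        disks = sorted(d_i)
        # Consecutive chunks of size q; the last chunk may be smaller.
        subgroups = [disks[k:k + q] for k in range(0, len(disks), q)]
        for group in subgroups:
            share = Fraction(len(group), len(disks))
            for j in disks:
                loads[j] = loads.get(j, 0) + share
    return loads
```

The resulting load on a disk equals the number of destination sets that contain it, independently of q, which is the $\sum_{i : j \in D_i} 1 \le \beta$ step of the derivation.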

Now we can show that:

Lemma 3 There is a way to choose $R_i$ sets for each $i = 1, \ldots, \Delta$ and assign subgroups $D_{ik}$ such that for each disk in $R_i$ the total size of subgroups $D_{ik}$ assigned to the disk is at most β + q.

Proof By Theorem 3, we can convert the fractional solution obtained in Lemma 2 to an integral solution such that each subgroup is completely assigned to one disk, and the maximum load on a disk is at most β + q (since the maximum size of $D_{ik}$ is q). □

Considering this assignment, we can directly conclude that:

Fact 1 For each disk j, at most β/q + 1 different large subgroups $D_{ik}$ (of size exactly q) can be assigned to the disk j. In other words, a disk can belong to at most β/q + 1 different $\bar{R}_i$ sets.

The reason is that the total number of disks assigned to a disk in $R_i$ is at most β + q and the size of each large subgroup is q. We will use this fact later.

Description of Step 4: Select $G'_i \subseteq D_i$. We can find disjoint sets $G_i \subseteq D_i$ using the same algorithm as in [14]. For completeness we include their method here:

Lemma 4 There is a way to choose disjoint sets $G_i$ for each $i = 1, \ldots, \Delta$, such that $|G_i| = \lfloor |D_i|/\beta \rfloor$ and $G_i \subseteq D_i$.

Proof First note that the total size of the sets $G_i$ is at most N.

$$\sum_{i=1}^{\Delta} |G_i| \le \sum_{i=1}^{\Delta} \frac{|D_i|}{\beta} = \frac{1}{\beta} \sum_{i=1}^{\Delta} |D_i|$$

Note that $\sum_{i=1}^{\Delta} |D_i|$ is at most βN by definition of β. This proves the upper bound of N on the total size of all sets $G_i$. We now show how to find the sets $G_i$. As shown in Fig. 3, we create a flow network with a source s and sink t. In addition we have two sets of vertices U and W. The first set U has Δ nodes, each corresponding to an item. The set W has N nodes, each corresponding to a disk in the system. We add directed edges from s to each node in U such that the edge (s, i) has capacity $\lfloor |D_i|/\beta \rfloor$. We also add directed edges with unit capacity from node $i \in U$ to $j \in W$ if $j \in D_i$. We add unit capacity edges from nodes in W to t. We find a max-flow from s to t in this network. The min-cut in this network is obtained by simply selecting the outgoing edges from s. To see this, note that we can find a fractional flow of this value as follows: saturate all the outgoing edges from s. From each node i there are $|D_i|$ edges to nodes in W. Suppose $\lambda_i = \lfloor |D_i|/\beta \rfloor$. Send $\frac{1}{\beta}$ units of flow along $\lambda_i \beta$ outgoing edges from i. Note that since $\lambda_i \beta \le |D_i|$ this can be done. Observe that the total incoming flow to a vertex in W is at most 1 since there are at most β incoming edges, each carrying at most $\frac{1}{\beta}$ units of flow. An integral max-flow in this network will correspond to $|G_i|$ units of flow going from s to i and from i to a subset of vertices in $D_i$ before reaching t. The vertices to which i has nonzero flow will form the set $G_i$. □

Fig. 3 Computing $G_i$ sets
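The construction in this proof can be sketched without a general max-flow solver. Since all item-to-disk and disk-to-sink edges have unit capacity, the sketch below replaces the flow computation with bipartite matching by augmenting paths, treating item i as $\lfloor |D_i|/\beta \rfloor$ identical "slots". The function name `disjoint_groups` and the input format are assumptions made for illustration.

```python
def disjoint_groups(dest_sets, beta):
    """Choose disjoint G_i within D_i with |G_i| = floor(|D_i| / beta).

    dest_sets maps item i to D_i. Each item gets floor(|D_i| / beta)
    slots, and every slot is matched to a distinct disk of D_i.
    """
    slots = [(i, c) for i, d_i in dest_sets.items()
             for c in range(len(d_i) // beta)]
    owner = {}  # disk -> slot currently matched to it

    def try_slot(slot, seen):
        for j in dest_sets[slot[0]]:
            if j in seen:
                continue
            seen.add(j)
            # Take a free disk, or evict a slot that can move elsewhere.
            if j not in owner or try_slot(owner[j], seen):
                owner[j] = slot
                return True
        return False

    for slot in slots:
        if not try_slot(slot, set()):
            raise ValueError("no disjoint selection exists for this beta")
    groups = {i: set() for i in dest_sets}
    for j, (i, _) in owner.items():
        groups[i].add(j)
    return groups
```

When `beta` is the true maximum of the $\beta_j$ values, the lemma guarantees that every slot can be matched, so the `ValueError` branch is only reached on inconsistent input.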

Description of Step 5: Selecting $g_i$ for Each Item i. The above approach can help us find the $G_i$ sets. However, if $G_i = \emptyset$ but $\bar{R}_i \neq \emptyset$, we need to select another disk $g_i$ as well. Note that if $|G_i| = 0$ then $|D_i| < \beta$, and therefore $|\bar{R}_i| < \frac{\beta}{q}$. We define $G'_i$ to be $G_i$ if $G_i \neq \emptyset$ and $G'_i = \{g_i\}$ otherwise. Next, we show how to select $g_i$ as well.

Lemma 5 For each item i for which $G_i = \emptyset$ but $\bar{R}_i \neq \emptyset$, we can find $g_i$ so that for a disk j, $\sum_{i : j = g_i} |\bar{R}_i| \le \frac{2\beta}{q} + 1$. In other words, each $g_i$ is responsible for at most β/q disks, but a disk can be $g_i$ for multiple items. So a disk may be responsible for 2β/q + 1 disks.

Proof We again use Theorem 3. Reduce the problem to the following scheduling problem: consider each disk as a machine. For each item such that $|G_i| = 0$, create a job of size $|\bar{R}_i|$. The cost of assigning job i to machine j is 1 iff $j \in \bar{R}_i$, otherwise it is infinite. Note that there is a fractional assignment such that the load on each machine is at most $\frac{\beta}{q} + 1$. The way to show it is by assigning a $\frac{1}{|\bar{R}_i|}$ fraction of each job to each machine in its $\bar{R}_i$ set. The load due to this job on the machine is 1. Since a disk is in at most $\frac{\beta}{q} + 1$ different $\bar{R}_i$ sets (based on Fact 1), the fractional load on each machine is at most $\frac{\beta}{q} + 1$. By applying the Shmoys-Tardos [20] scheduling algorithm (Theorem 3), we can find an assignment of jobs (items) to machines (disks) such that the total cost is at most the number of items and the load on each machine (disk) is at most $\frac{2\beta}{q} + 1$. Note that the size of each job is at most $\frac{\beta}{q}$. The disk (machine) that item i is assigned to will be $g_i$. □

Description of Step 6: Sending Items from $S_i$ to $G'_i$. First we show how to send data items from $S_i$ to $G'_i$ and also give the number of rounds these transfers take. We claim that this can be done in 2OPT + O(β/q) rounds. We develop a lower bound on the optimal solution by solving the following linear program L(m) for a given m.

$$L(m): \quad \sum_{j} \sum_{k=1}^{m} n_{ijk} x_{ijk} \ge |G'_i| \quad \text{for all } i \tag{1}$$
$$\sum_{i} x_{ijk} \le 1 \quad \text{for all } j, k \tag{2}$$
$$0 \le x_{ijk} \le 1 \tag{3}$$

where $n_{ijk} = \min(2^{m-k}, |G'_i|)$ if disk j belongs to $S_i$ and $n_{ijk} = 0$ otherwise. Intuitively, $x_{ijk}$ indicates that at time k, disk j sends item i to some disk in $G'_i$. Let M be the minimum m such that L(m) has a feasible solution. Note that M is a lower bound for the optimal solution: given any feasible migration that takes m rounds, set $x_{ijk}$ based on the schedule as defined above; one can easily verify that this gives a feasible solution for the linear program L(m). Since M is the smallest m for which L(m) has a feasible solution, M is at most the number of rounds of any feasible migration. Now, we show that:

Lemma 6 We can perform migrations from $S_i$ to $G'_i$ in 2 · M + O(β/q) rounds.

Proof Given a fractional solution $x^*$ to L(M), we can obtain an integral solution $x^{**}$ such that for all i, $\sum_j \sum_k x^{**}_{ijk} \ge \lfloor \sum_j \sum_k x^{*}_{ijk} \rfloor$. (Using Lemma 3.4 from [14].) For each item i, we arbitrarily select $\min(\sum_j \sum_k x^{**}_{ijk}, |G'_i|)$ disks from $G'_i$. Let $H_i$ denote this subset. We create the following transfer graph from $S_i$ to $H_i$: create an edge from a disk $j \in S_i$ to every disk in $H_i$ if $x^{**}_{ijk} = 1$. (Make sure every disk in $H_i$ has an incoming edge from a disk in $S_i$.) Note the indegree of a disk in this transfer graph is at most $2 + \frac{\beta}{q}$ since a disk can belong to $H_i$ for at most 2 + β/q different i's (a disk can be $g_i$ for at most β/q + 1 different items because, in the worst case, a node is responsible for β/q sets of size 1 and one set of size β/q, and it may also belong to one $G_i$). The outdegree is M (because of Constraints (2) and the fact that k ≤ M) and the multiplicity of the transfer graph is 2β/q + 4 (again because a disk can belong to $H_i$ for at most β/q different i's and each can be a source or destination). Therefore, we can perform the migration from $S_i$ to $H_i$ in M + O(β/q) rounds by Theorem 1. For i with $|G'_i| = 1$, the transfer is complete. For the rest of the items, since the sets $G'_i$ (= $G_i$) are disjoint, we can double the number of copies in each round until the number of copies becomes $|G_i|$. After M rounds, the number of copies we can make for item i is at least

$$
\begin{aligned}
2^M |H_i| &= 2^M \min\Big(\sum_j \sum_k x^{**}_{ijk},\ |G_i|\Big) \\
&\ge \min\Big(2^{M-1} \cdot 2 \sum_j \sum_k x^{**}_{ijk},\ |G_i|\Big) \\
&\ge \min\Big(2^{M-1} \cdot \Big(\sum_j \sum_k x^{**}_{ijk} + 1\Big),\ |G_i|\Big) \\
&\ge \min\Big(2^{M-1} \sum_j \sum_k x^{*}_{ijk},\ |G_i|\Big) \\
&\ge \min\Big(\sum_j \sum_k n_{ijk} x^{*}_{ijk},\ |G_i|\Big) \ge |G_i|.
\end{aligned}
$$

The second inequality comes from the fact that $\sum_j \sum_k x^{**}_{ijk} \ge 1$. Therefore we can finish the whole transfer from $S_i$ to $G'_i$ in 2 · M + O(β/q) rounds by Theorem 1. □
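The doubling phase inside a disjoint set $G_i$ is easy to simulate: in each half-duplex round every disk that already holds the item sends it to one disk that does not. A minimal sketch, with the hypothetical helper name `doubling_rounds`, counts the rounds needed to grow from the copies placed in $H_i$ to all of $G_i$:

```python
def doubling_rounds(initial_copies, target_copies):
    """Rounds needed to reach target_copies when copies double each round.

    Assumes initial_copies >= 1 and that the disks involved are not used
    by any other item, which holds because the sets G_i are disjoint.
    """
    rounds, copies = 0, initial_copies
    while copies < target_copies:
        # Each holder sends to one new disk, capped by the disks remaining.
        copies = min(2 * copies, target_copies)
        rounds += 1
    return rounds
```

Starting from a single copy, t rounds yield up to $2^t$ copies, which is the source of the $2^M |H_i|$ term in the chain of inequalities.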

Description of Step 7: Sending Item i from $G'_i$ to $\bar{R}_i$. We now focus on sending item i from the disks in $G'_i$ to disks in $\bar{R}_i$. We construct a transfer graph to send items from $G'_i$ to $\bar{R}_i$ so that each disk in $\bar{R}_i \setminus G'_i$ receives item i from one disk in $G'_i$. We create the transfer graph as follows: first, add directed edges from disks in $G_i$ to disks in $\bar{R}_i$. Recall that $|G_i| = \lfloor |D_i|/\beta \rfloor$ and $|\bar{R}_i| = \lfloor |D_i|/q \rfloor$. Since the $G_i$ sets are disjoint, there is a transfer graph in which each disk in $G_i$ has at most $\Theta(\beta/q)$ outgoing edges. For items with $G_i = \emptyset$, we put edges from $g_i$ to all disks in $\bar{R}_i$. The outdegree of each disk can be increased by at most $\frac{2\beta}{q} + 1$. The indegree of a disk in $\bar{R}_i$ is at most $\frac{\beta}{q} + 1$ and the multiplicity is $\frac{2\beta}{q} + 2$. Therefore, this step can be done in O(β/q) rounds.

Description of Step 8: Sending Item i from $s_i$ to $r_i$. Again we create a transfer graph in which there is an edge from $s_i$ to $r_i$ if $r_i$ has not received item i in the previous steps. The indegree of a disk j is at most $\beta_j$ since a disk j is selected as $r_i$ only if $j \in D_i$, and the outdegree of disk j is at most $\alpha - \beta_j$. Using Theorem 2, this step can be done in $\frac{3\alpha}{2}$ rounds.

Description of Step 9: Sending Item i from $R_i$ to $D_i \setminus (R_i \cup G'_i)$. We now create a transfer graph from $R_i$ to $D_i \setminus (R_i \cup G'_i)$ such that there is an edge from disk $a \in R_i$ to disk b if the subgroup that b belongs to is assigned to a in Lemma 3. We find an edge coloring of the transfer graph. The following lemma gives an upper bound on the number of rounds required to ensure that each disk in $D_i$ gets item i.

Lemma 7 The number of colors we need to color the transfer graph is at most 3β + q.

Proof First, we compute the maximum indegree and outdegree of each node. The outdegree of a node is at most β + q due to the way we choose $R_i$ (see Lemma 3). The indegree of each node is at most β since in the transfer graph we send items only to the disks in their corresponding destination sets. The multiplicity of the graph is also at most β since we send item i from disk j to disk k (or vice versa) only if both disk j and k belong to $D_i$. By Theorem 1, we see that the maximum number of colors needed is at most 3β + q. □
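The arithmetic behind the 3β + q bound can be spelled out as a short worked step. This only restates the degree and multiplicity bounds from the proof and plugs them into Theorem 1; the symbols $\Delta_G$, $\mu$ and $\chi'$ are those of Sect. 2.1.

```latex
\begin{align*}
\Delta_G &\le \underbrace{(\beta + q)}_{\text{outdegree}}
            + \underbrace{\beta}_{\text{indegree}} = 2\beta + q,
  \qquad \mu \le \beta, \\
\chi' &\le \Delta_G + \mu \le (2\beta + q) + \beta = 3\beta + q .
\end{align*}
```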

To wrap up, in the next theorem we show that the total number of rounds in thisalgorithm is bounded by 6.5 + o(1) times the optimal solution.

Theorem 4 The total number of rounds required for the data migration is at most6.5 + o(1) times OPT .

Proof The total number of rounds we need is 2M + 3α/2 + 3β + O(β/q) + q. Since M, α, and β are lower bounds on the optimal solution, choosing q = Θ(√β) gives the desired result. □
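The arithmetic behind Theorem 4 can be spelled out as follows. This is a sketch that uses only what the proof states: M, α and β are each at most OPT, and q = Θ(√β), so that O(β/q) + q = O(√β). The o(1) term is understood as OPT grows.

```latex
\begin{align*}
\text{rounds} &= 2M + \tfrac{3}{2}\alpha + 3\beta + O(\beta/q) + q \\
 &= 2M + \tfrac{3}{2}\alpha + 3\beta + O(\sqrt{\beta})
    && \text{taking } q = \Theta(\sqrt{\beta}) \\
 &\le \left(2 + \tfrac{3}{2} + 3\right)\mathrm{OPT} + O(\sqrt{\mathrm{OPT}})
    && \text{since } M, \alpha, \beta \le \mathrm{OPT} \\
 &= (6.5 + o(1))\,\mathrm{OPT}.
\end{align*}
```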

4 External Disks

Until now we assumed that we had N disks, that the source and destination sets were chosen from this set of disks, and that only essential transfers were performed. In other words, if an item i is sent to disk j, then it must be that j ∈ Di (disk j was in the destination set for item i), hence the total number of transfers done is the least possible. In several situations, we may have access to idle disks with available storage that we can use as temporary devices to enable a faster completion of the transfers we are trying to schedule. In addition, we exploit the fact that by performing a small number of non-essential transfers (this was also used in [10, 13]), we can further reduce the total number of rounds required. We show that such techniques can considerably reduce the total number of rounds required for performing the transfers from Si sets to Di sets.

We assume that each external disk has enough space to pack γ items. If we are allowed to use ⌈Δ/γ⌉ external disks, the approximation ratio can be improved to 3 + max(1.5, γ/2). For example, choosing γ = 3 gives a bound of 4.5.

Define β̄ = (∑_{i=1}^{Δ} |Di|)/N. We can see that 2β̄ is a lower bound on the optimal number of rounds since in each round at most ⌊N/2⌋ data items can be transferred. The high level description of the algorithm is as follows:

1. Assign γ items to each external disk. Send items to their assigned external disks.
2. For each item i, choose disjoint Gi sets of size ⌊|Di|/β̄⌋.
3. Send item i to all disks in the Gi set.
4. Send item i from the Gi set to all the disks in Di. We will also make use of the copy of item i on the external disk.

We now discuss these steps in detail.


The first step can be done in at most max(α, γ) rounds by sending the items from their primary sources to the external disks. For this step the primary sources are chosen by the same method that we used to compute α, as discussed in Sect. 3. Once α is computed, we make a bipartite graph with two sets of vertices. The first set (call it the primary set) contains the primary sources and the second set (call it the external set) contains the external disks. For each item, we put an edge between its primary source and the corresponding external disk. By construction, the outdegree of the disks in the primary set is at most α and the indegree of the vertices in the external set is at most γ. Since the graph is bipartite, we can find an edge coloring (and hence a transfer schedule) with max(α, γ) colors in polynomial time.
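As an illustration of the coloring step above, the following Python sketch colors a bipartite multigraph with exactly as many colors as its maximum degree, using the classical alternating-path argument. It is not taken from the paper: the function name `bipartite_edge_coloring` and the edge-list input format are assumptions made for this example. Each color class is a matching and would correspond to one round of transfers.

```python
from collections import defaultdict


def bipartite_edge_coloring(edges):
    """Color a bipartite multigraph with max-degree many colors.

    edges: list of (primary_source, external_disk) pairs; parallel
    edges are allowed. Returns colors, where colors[k] is the color
    (round) assigned to edges[k].
    """
    def ends(k):
        # Tag the two sides so equal names on both sides stay distinct.
        return ("L", edges[k][0]), ("R", edges[k][1])

    degree = defaultdict(int)
    for k in range(len(edges)):
        for x in ends(k):
            degree[x] += 1
    num_colors = max(degree.values(), default=0)

    used = defaultdict(dict)      # used[x][c] = edge with color c at x
    colors = [None] * len(edges)

    def free_color(x):
        return next(c for c in range(num_colors) if c not in used[x])

    for k in range(len(edges)):
        x, y = ends(k)
        a, b = free_color(x), free_color(y)
        if a != b:
            # Collect the path from y whose edges alternate a, b, a, ...
            # In a bipartite graph this path never reaches x.
            path, node, c = [], y, a
            while c in used[node]:
                e = used[node][c]
                path.append(e)
                left, right = ends(e)
                node = left if node == right else right
                c = b if c == a else a
            # Swap the two colors along the path so a becomes free at y.
            for e in path:
                for z in ends(e):
                    del used[z][colors[e]]
            for e in path:
                colors[e] = b if colors[e] == a else a
                for z in ends(e):
                    used[z][colors[e]] = e
        colors[k] = a
        used[x][a] = k
        used[y][a] = k
    return colors
```

With outdegree at most α on the primary side and indegree at most γ on the external side, the number of colors used by this sketch is at most max(α, γ), matching the bound in the text.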

We can easily choose disjoint sets Gi as we are allowed to perform non-essential transfers (i.e., a disk j can belong to Gi even if j is not in Di). Hence we can use a simple greedy method to choose Gi. Broadcasting items inside Gi can be done in 2M rounds as described in Sect. 2.
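One way such a greedy choice might look is sketched below in Python. The function name `choose_disjoint_groups`, the dictionary input format, and the tie-breaking rule (prefer disks that already want the item) are assumptions of this example, not details given in the paper. The sketch relies on the fact that the group sizes ⌊|Di|/β̄⌋ sum to at most N, so enough free disks always remain.

```python
def choose_disjoint_groups(demand_sets, num_disks):
    """Greedily pick disjoint sets G_i of size floor(|D_i| / beta_bar).

    demand_sets: dict item -> set of disk ids drawn from range(num_disks).
    Since beta_bar = (sum_i |D_i|) / N, the size floor(|D_i| / beta_bar)
    equals |D_i| * N // sum_i |D_i| in integer arithmetic.
    """
    total = sum(len(dests) for dests in demand_sets.values())
    free = set(range(num_disks))
    groups = {}
    for item, dests in demand_sets.items():
        size = len(dests) * num_disks // total if total else 0
        # Assumption of this sketch: prefer disks that want the item
        # anyway, then fall back to any free disk (a non-essential
        # transfer in the paper's terminology).
        candidates = sorted(free & dests) + sorted(free - dests)
        groups[item] = set(candidates[:size])
        free -= groups[item]
    return groups


# Hypothetical instance: 6 disks, 3 items, beta_bar = 12 / 6 = 2.
example = choose_disjoint_groups(
    {"a": {0, 1, 2, 3}, "b": {2, 3}, "c": {0, 1, 2, 3, 4, 5}}, 6
)
```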

The next step is to send the item to all the remaining disks in the Di sets. We make a transfer graph as follows: assign to each disk in Gi at most β̄ disks in Di so that each disk in Di is assigned to at most one disk in the Gi set. The number of unassigned disks from each Di set is at most β̄. Assign all of the remaining disks from Di to the external disk containing that item. The outdegree of the internal disks is at most β̄ since each disk belongs to at most one Gi set. The indegree of each internal disk is at most β since a disk will receive an item only if it is in its demand set. The multiplicity between two internal disks is at most 2 (since each disk can belong to at most one Gi set). So the total degree of each internal disk is at most β + β̄. Each external disk has at most γ items and the number of remaining disks for each item is at most β̄. So the outdegree of each external disk is at most γβ̄ ≤ (γ/2) OPT.

As a summary, the first step can be done in max(α, γ) rounds. Step 3 can be completed in 2M rounds and step 4 can be done in max(β + β̄, γβ̄) ≤ (1/2) max(3, γ) OPT + max(2, γ) rounds, since β̄ ≤ OPT/2. So the total number of steps to do the whole transfer is at most α + 2M + 3 + (1/2) max(3, γ) OPT + max(2, γ) ≤ (3 + (1/2) max(3, γ)) OPT + 2γ + O(1).

Theorem 5 There is a (3 + (1/2) max(3, γ)) OPT + 2γ + O(1) approximation algorithm for data migration when there exist ⌈Δ/γ⌉ external disks.

5 Full Duplex Model

In this section we consider the full duplex communication model. In this model, we assume that each disk can send and receive at most one item in each round. In the half-duplex model, we assumed that at each round, a disk can either send or receive one item (but not both at the same time). In the full duplex model the communication pattern does not have to induce a matching since directed cycles are allowed (the direction indicates the data item transfer direction).

We develop a 4 + o(1) approximation algorithm for this model. In this model, given a transfer graph G, we find an optimal migration schedule for G as follows: Construct a bipartite graph by putting one copy of each disk in each partition. We call the copy of vertex u in the first partition uA, and in the other partition uB. We add an edge from uA to vB in the bipartite graph if and only if there is a directed edge in the transfer graph from u to v. The bipartite graph can be colored optimally in polynomial time and the number of colors is equal to the maximum degree of the bipartite graph.

Fig. 4 Computing α′
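The bipartite construction described above can be sketched in a few lines of Python. The helper names `double_cover` and `full_duplex_rounds` are hypothetical and chosen for this example only. The sketch computes just the number of rounds, that is, the maximum degree of the bipartite graph, and does not produce the schedule itself.

```python
from collections import Counter


def double_cover(transfers):
    """Map each directed transfer (u, v) to the bipartite edge (u_A, v_B)."""
    return [((u, "A"), (v, "B")) for u, v in transfers]


def full_duplex_rounds(transfers):
    """Maximum degree of the bipartite graph built from the transfer graph.

    By the bipartite edge-coloring argument in the text, this is the
    optimal number of rounds for the given transfer graph.
    """
    degree = Counter()
    for u_a, v_b in double_cover(transfers):
        degree[u_a] += 1   # counts the outdegree of u in the transfer graph
        degree[v_b] += 1   # counts the indegree of v in the transfer graph
    return max(degree.values(), default=0)


# A directed 3-cycle: every disk sends one item and receives one item,
# so a single full-duplex round suffices.
cycle_rounds = full_duplex_rounds([(0, 1), (1, 2), (2, 0)])
```

The directed cycle is the case where the two models differ: in the half-duplex model the same three transfers cannot all happen in one round, because a round must be a matching.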

Note that β and M are still lower bounds on the optimal solution in the full-duplex model. The algorithm is the same as in Sect. 2 except the procedure to select primary sources si.

– For each item i, decide a primary source si so that α′ = max_{j=1,...,N} (max(|{i | j = si}|, βj)) is minimized. Note that α′ is also a lower bound for the optimal solution. We can find these primary sources as shown in Lemma 8 by adapting the method used in [14].

We show how to find the primary sources si .

Lemma 8 By using network flow we can choose primary sources to minimize max_{j=1,...,N} (max(|{i | j = si}|, βj)).

Proof Create two vertices s and t. (See Fig. 4 for an example.) Make two sets of nodes, one for the items and one for the disks. Add a unit-capacity edge from s to each node corresponding to an item. Add a directed edge of infinite capacity from item j to disk i if i ∈ Sj. Add edges of capacity α′ from each node in the set of disks to t. Find the minimum α′ (initially α′ = β) for which there is a feasible flow of value Δ. For each item j, choose as its primary source sj the disk to which it sends one unit of flow. □
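The flow computation in the proof can be illustrated with the Python sketch below. Because every item edge has unit capacity, the flow problem is a capacitated bipartite assignment, so this example swaps in a simple augmenting-path search for a general max-flow routine; that substitution, the function name `choose_primary_sources`, and the dictionary inputs are assumptions of this sketch rather than the paper's implementation. As in the proof, the capacity starts at β and is raised until every item can be assigned.

```python
def choose_primary_sources(source_sets, beta):
    """Pick one primary source per item, minimizing alpha'.

    source_sets: dict item -> list of disks holding the item (S_i).
    beta: dict disk -> number of items the disk must receive (beta_j).
    Returns (alpha_prime, primary) with primary[item] in source_sets[item].
    """
    items = list(source_sets)
    if any(len(source_sets[item]) == 0 for item in items):
        raise ValueError("every item needs at least one source disk")

    def assign_with_cap(cap):
        held = {}      # disk -> items using it as their primary source
        primary = {}   # item -> chosen disk

        def augment(item, seen):
            for disk in source_sets[item]:
                if disk in seen:
                    continue
                seen.add(disk)
                here = held.setdefault(disk, [])
                if len(here) < cap:
                    here.append(item)
                    primary[item] = disk
                    return True
                # The disk is full: try to move one of its items elsewhere.
                for other in list(here):
                    if augment(other, seen):
                        here.remove(other)
                        here.append(item)
                        primary[item] = disk
                        return True
            return False

        for item in items:
            if not augment(item, set()):
                return None
        return primary

    # alpha' can never be smaller than the largest beta_j.
    cap = max(beta.values(), default=0)
    while True:
        primary = assign_with_cap(cap)
        if primary is not None:
            return cap, primary
        cap += 1
```

The linear search over the capacity mirrors the proof's "find the minimum α′"; a binary search would also work because feasibility is monotone in the capacity.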

Theorem 6 There is a 4 + o(1) approximation algorithm for data migration in the full duplex model.

Proof Sending data items from Si to G′i (step 6) and from G′i to Ri (step 7) still takes 2M + O(β/q) rounds and O(β/q) rounds, respectively. For step 8, if we construct a bipartite graph, then the max degree is at most max(α′, β), which is the number of rounds required for this step. For step 9, the maximum degree of the bipartite graph is β + q. Therefore, the total number of rounds we need is 2M + max(α′, β) + β + O(β/q) + q. By choosing q = Θ(√β), we can obtain a 4 + o(1)-approximation algorithm. □
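For reference, the bound can be expanded as below. The sketch uses only the facts stated in this section: M, β and α′ are lower bounds on OPT, and q = Θ(√β), so that O(β/q) + q = O(√β).

```latex
\begin{align*}
\text{rounds} &= 2M + \max(\alpha', \beta) + \beta + O(\beta/q) + q \\
 &= 2M + \max(\alpha', \beta) + \beta + O(\sqrt{\beta})
    && \text{taking } q = \Theta(\sqrt{\beta}) \\
 &\le (2 + 1 + 1)\,\mathrm{OPT} + O(\sqrt{\mathrm{OPT}})
    && \text{since } M, \alpha', \beta \le \mathrm{OPT} \\
 &= (4 + o(1))\,\mathrm{OPT}.
\end{align*}
```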

6 Conclusion

In this paper, we studied the data migration problem, in which the objective is to find a migration plan among the storage devices. We developed an improved approximation algorithm that gives a bound of 6.5 + o(1). The improvements mainly came from using additional intermediate representative sets Ri, which are not necessarily disjoint but overlap only a limited number of times. We also utilized existing copies of items more efficiently for a tighter analysis.

We also developed better algorithms using external disks, obtaining an approximation factor of 4.5. In addition, we considered the full duplex communication model and developed an improved bound of 4 + o(1) for this model, with no external disks.

References

1. Anderson, E., Hall, J., Hartline, J., Hobbes, M., Karlin, A., Saia, J., Swaminathan, R., Wilkes, J.: An experimental study of data migration algorithms. In: Workshop on Algorithm Engineering, London, UK, 2001, pp. 145–158. Springer, Berlin (2001)
2. Aggarwal, G., Motwani, R., Zhu, A.: The load rebalancing problem. In: Symposium on Parallel Algorithms and Architectures, pp. 258–265 (2003)
3. Baev, I.D., Rajaraman, R.: Approximation algorithms for data placement in arbitrary networks. In: Proc. of ACM-SIAM Symposium on Discrete Algorithms, pp. 661–670 (2001)
4. Bondy, J.A., Murty, U.S.R.: Graph Theory with Applications. American Elsevier, New York (1977)
5. Gandhi, R., Mestre, J.: Combinatorial algorithms for data migration to minimize average completion time. Algorithmica 54(1), 54–71 (2009)
6. Golubchik, L., Khanna, S., Khuller, S., Thurimella, R., Zhu, A.: Approximation algorithms for data placement on parallel disks. In: Proc. of ACM-SIAM Symposium on Discrete Algorithms, Washington, D.C., USA, 2000, pp. 661–670. Society for Industrial and Applied Mathematics, Philadelphia (2000)
7. Golubchik, L., Khuller, S., Kim, Y., Shargorodskaya, S., Wan, Y.: Data migration on parallel disks: algorithms and evaluation. Algorithmica 45(1), 137–158 (2006)
8. Graham, R.L.: Bounds on multiprocessing timing anomalies. SIAM J. Appl. Math. 17, 416–429 (1969)
9. Guha, S., Munagala, K.: Improved algorithms for the data placement problem. In: Proc. of ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, USA, 2002, pp. 106–107. Society for Industrial and Applied Mathematics, Philadelphia (2002)
10. Hall, J., Hartline, J., Karlin, A., Saia, J., Wilkes, J.: On algorithms for efficient data migration. In: Proc. of ACM-SIAM Symposium on Discrete Algorithms, pp. 620–629 (2001)
11. Kashyap, S., Khuller, S.: Algorithms for non-uniform size data placement on parallel disks. J. Algorithms 60(2), 144–167 (2006)
12. Kashyap, S., Khuller, S., Wan, Y.C., Golubchik, L.: Fast reconfiguration of data placement in parallel disks. In: 2006 ALENEX Conference, Jan. 2006
13. Khuller, S., Kim, Y., Wan, Y.C.: On generalized gossiping and broadcasting. In: European Symposia on Algorithms, Budapest, Hungary, 2003, pp. 373–384. Springer, Berlin (2003)
14. Khuller, S., Kim, Y.A., Wan, Y.C.: Algorithms for data migration with cloning. SIAM J. Comput. 33(2), 448–461 (2004)
15. Kim, Y.: Data migration to minimize the average completion time. In: Proc. of ACM-SIAM Symposium on Discrete Algorithms, pp. 97–98 (2003)
16. Meyerson, A., Munagala, K., Plotkin, S.A.: Web caching using access statistics. In: Symposium on Discrete Algorithms, pp. 354–363 (2001)
17. Shachnai, H., Tamir, T.: Polynomial time approximation schemes for class-constrained packing problems. In: Workshop on Approximation Algorithms. LNCS, vol. 1913, pp. 238–249 (2000)
18. Shachnai, H., Tamir, T.: On two class-constrained versions of the multiple knapsack problem. Algorithmica 29, 442–467 (2001)
19. Shannon, C.E.: A theorem on colouring lines of a network. J. Math. Phys. 28, 148–151 (1949)
20. Shmoys, D.B., Tardos, E.: An approximation algorithm for the generalized assignment problem. Math. Program., Ser. A 62, 461–474 (1993)
21. Vizing, V.G.: On an estimate of the chromatic class of a p-graph. Diskretn. Anal. 3, 25–30 (1964) (Russian)