
Journal of Algorithms 36, 1–33 (2000)
doi:10.1006/jagm.2000.1076, available online at http://www.idealibrary.com

The Loading Time Scheduling Problem¹

Randeep Bhatia²

Bell Labs, 600 Mountain Avenue, Room 2A-244, Murray Hill, New Jersey 07974

Samir Khuller³

Computer Science Department and Institute for Advanced Computer Studies, University of Maryland, College Park, Maryland 20742

and

Joseph Seffi Naor⁴

Department of Computer Science, Technion, Haifa 32000, Israel

Received June 14, 1996

In this paper we study precedence constrained scheduling problems, where the tasks can only be executed on a specified subset of the set of machines. Each machine has a loading time that is incurred only for the first task that is scheduled on the machine in a particular run. This basic scheduling problem arises in the context of machining on numerically controlled machines, query optimization in databases, and other artificial intelligence applications. We give the first nontrivial approximation algorithm for this problem. We also prove nontrivial lower bounds on the best possible approximation ratios for these problems. These improve on the nonapproximability results that are implied by the nonapproximability results for the shortest common supersequence problem. We use the same algorithmic technique to obtain approximation algorithms for a problem arising in the context of code generation for parallel machines and for the weighted shortest common supersequence problem. © 2000 Academic Press

¹ An extended abstract of this paper appeared in the Proceedings of the 36th IEEE Conference on Foundations of Computer Science, Milwaukee, Wisconsin, pp. 72–81, 1995.

2 E-mail: [email protected] Research supported by NSF Research Initiation Award CCR-9307462 and NSF CAREER

Award CCR-9501355. E-mail: [email protected] Research supported in part by Grant No. 92-00225 from the United States]Israel

Ž .Binational Science Foundation BSF , Jerusalem. Research also supported by the TechnionV.P.R. Fund 120-882, Israel. E-mail: [email protected].



1. INTRODUCTION

In this paper we study precedence-constrained scheduling problems. The tasks are denoted by the vertex set of an acyclic graph. Precedence constraints are denoted by directed edges in the usual way; an edge from i to j indicates that task i should be completed before task j can be started. Each task needs to be scheduled on one of a specified subset of machines (for example, the machines may have different capabilities).

The cost of executing a task can be decomposed into two components. One component is the inherent execution time of the task itself. The other component is the loading time, which is the setup time of the machine we choose to perform the task on. When we perform a set of tasks consecutively on a particular machine, we incur the loading time only for the first task performed on the machine.

A schedule, which is a feasible sequence of tasks, represents the order of execution of the tasks. At any time, at most one task is executed, and the schedule specifies the machine on which each task is executed. Every time the next task in the sequence is executed on a machine other than the current one, the loading time of the new machine is incurred.

We call this basic problem the loading time scheduling problem (LTSP). A special case of this problem was first mentioned by Hayes [7] in the context of machining metal parts. The objective is to start with a block of metal and to use a numerically controlled machining center to cut a variety of features into the block. Each geometric feature is a task, and there are precedence constraints on the order in which certain tasks can be performed. Different methods may be used to perform the tasks. Each method can be performed on the machining center, which can accomplish a variety of different operations (drilling, end-milling, etc.), but can only perform one operation at a time. When we are able to overlap the machining operations, we do not incur the loading time delay for the machine repeatedly. (For example, when we perform two drilling operations consecutively, we only have to load the block of metal on the drilling machine once.) According to Hayes [7], this setup time is a large fraction of the time for each operation; sometimes as much as 90% of the time is spent in setting up for one machining operation. All other times are relatively small compared to the setup time.

A second motivation given by Hayes [7] is shown in Fig. 1. (We are not solving the same problem, but this explains some of the intuition behind the loading time scheduling problem.) Suppose we have to run a few errands. The time to do each errand can be decomposed into the time to get to the place where the errand is to be done, together with the time to actually do the task. The time for performing the errand depends on whether we need to go to the location where the errand is to be performed


FIG. 1. Motivation for the problem.

or whether we are already there. The optimal solution is to first go home, then go to the grocery store, and finally go to the post office. This is a more general problem where there is a switching cost between machines. We handle the special case when the switching cost is the sum of the loading and unloading costs.

An extensive survey of operator overlap problems in artificial intelligence appears in the work by Foulser et al. [4]. In particular, they discuss a variety of heuristics, with an average case analysis for them, as well as empirical results. Other applications of overlapping operators arise in databases when we try to do multiple query optimization [15].

A problem related to the loading time scheduling problem is the shortest common supersequence (SCS) problem [5, Problem SR8]. Here, a collection of sequences over a fixed alphabet is given, and the goal is to find a shortest common supersequence such that all given sequences appear as a subsequence of the common supersequence. A sequence A is a subsequence of a sequence B if all the characters of A appear in the same order in B. Note that we do not require that adjacent characters in A be adjacent in B. In other words, A need not be a substring of B.

It is easy to obtain an r-approximation algorithm for the SCS problem, where r is the size of the alphabet. Jiang and Li [8] have shown hardness results for approximating the SCS problem. Let n denote the number of sequences. Specifically, Jiang and Li [8] show that (i) SCS does not have a polynomial time constant factor approximation algorithm, unless P = NP; (ii) there exists a constant δ such that if SCS has a polynomial time approximation algorithm with ratio log^δ n, then NP ⊆ DTIME(2^{polylog n}). They also give algorithms that produce solutions close to the optimal when the supersequences are random (see [8] for more details).

A generalization is the weighted shortest common supersequence (WSCS) problem, where each letter of the alphabet has a weight, and the weight of a sequence is the sum of the weights of its constituent letters. Given a collection of such weighted sequences, the objective is to find a supersequence of minimum weight. The WSCS problem is closely related to the LTSP, by viewing the alphabet letters as machines, the loading times as weights on the letters, and each sequence as defining precedence constraints between tasks (the precedence graph is a path). The objective functions of the two problems differ in how they treat the case when the same letter appears consecutively in a sequence. As an application of our results we show that we can obtain an r-approximation algorithm for the WSCS problem.

The literature concerning scheduling problems is very extensive (see, e.g., [9]). However, it appears that the specific constraints of the loading time scheduling problem are very different from the kinds of problems that have been previously considered in the scheduling literature.

A different motivation for our work stems from the design of compilers for multiprocessor architectures that use the fork–join model of parallelism. In a fork–join model, the only operations available for expressing parallelism are fork (spawn a node's execution as a new thread of control) and join (wait for all previously forked threads to complete). The input is a DAG whose vertices represent tasks and whose edges reflect all the control and data dependencies among these instructions. The objective is to generate fork–join parallel code for the DAG, with minimum overall execution time. It is assumed that the parallel code would be executed on a machine with an unbounded number of processors and no communication overhead.

Sarkar [14] investigated this problem of generating maximally parallel code using only fork and join operations to correctly satisfy all the control and data dependences in the program. This problem is of interest when compiling for multiprocessor architectures and runtime systems where fork–join is the only mechanism, or the most efficient mechanism, available for satisfying dependences. In Section 1.2 we provide a detailed description of the problem and its connection to the loading time scheduling problem.

1.1. The Loading Time Scheduling Problem

Let the tasks be denoted by the vertex set of a directed acyclic graph G = (V, E), where V = {1, 2, ..., n}. The precedence constraints are denoted in the usual way by directed edges; if there is an edge from i to j then i needs to be done before j.

Suppose there are r machines m_1, ..., m_r. Each task i can be performed only on a subset M(i) of the machines (the machines have different capabilities). Each machine m_j has a loading time l(m_j). Any task i that can be performed on m_j (i.e., m_j ∈ M(i)), and which satisfies the precedence constraints, may be scheduled with an execution time of e(i). When we perform a set of tasks consecutively on a particular machine, we pay the loading time only once. With these constraints, we wish to minimize the total makespan. The execution times for all tasks are fixed (the choice of machine does not affect the execution time), so we can assume that the execution times are zero and concentrate on minimizing the load time. More formally:

1. The problem is to partition V into k subsets V_1 ∪ V_2 ∪ ⋯ ∪ V_k such that

$$\forall p = 1, \dots, k: \quad M(V_p) \neq \emptyset,$$

where $M(V_p) = \bigcap_{i \in V_p} M(i)$. In other words, all tasks assigned to set V_p share at least one machine in common.

2. For each edge x → y in E, if x ∈ V_i and y ∈ V_j, we require that i ≤ j.

3. The goal is to minimize $\sum_{p=1}^{k} l(V_p)$, where

$$l(V_p) = \min_{m_k \in M(V_p)} l(m_k).$$

In many manufacturing applications [7, 2], typically |M(i)| = 1. The tasks, for example, could be drilling, end-milling, etc. Let the term job denote the block of metal mentioned earlier. The following simple heuristic is commonly used in such applications. After constructing the task graph, load the job on a machine and perform the set of tasks that can be done on this machine such that they have no unfinished prerequisites. When there is a choice of machine, pick the machine on which the largest set of tasks can be performed consecutively. Stop when the tasks that are ready to be performed can no longer be done on the machine the job is currently on. Now move the job to a different machine (this incurs a loading time) and continue. Notice that, in general, the job could be loaded on the same machine many times.

1.2. Fork–Join Parallelism Problem

Let G = (V, E) be a DAG representing the fork–join model. There is a nonnegative cost function, w, denoting the execution time associated with each vertex. Let W denote the ratio between the maximum and minimum costs of vertices in V. The cost of a set of vertices B, denoted by w(B), is defined as the maximum cost of a vertex belonging to B.


The problem is defined as follows: Partition the vertices of the DAG into a set of blocks B_1, B_2, ..., B_k such that

• If i → j is an edge, and if i ∈ B_{i′} and j ∈ B_{j′}, then i′ < j′.

• The objective is to minimize $\sum_{i=1}^{k} w(B_i)$.

An antichain in a DAG is a set of incomparable elements; i.e., there is no directed path between any pair of elements in an antichain. In the context of the fork–join model, an antichain denotes a set of fork operations followed by a join operation. Essentially, we are asking for a partitioning of the DAG into a set of antichains (each block is an antichain), such that we minimize the sum of the costs of the antichains. We also require that for each edge i → j, the block that i belongs to comes before the block that j belongs to (in other words, the antichains cannot cross).

A natural algorithm for this problem is a greedy algorithm, which defines the next block B_k to be the set of all vertices all of whose predecessors are in an already picked block B_j, j < k. This greedy algorithm may perform very poorly for the fork–join problem, since it may spread high cost vertices between several blocks, instead of grouping them in the same one; e.g., if the DAG is just a collection of n chains where the ith vertex in chain i has the highest cost, then it pays to put all the high cost vertices of the chains in one block. The greedy algorithm, on the other hand, will put each high cost vertex in a different block.

1.3. Our Results

It is easy to show that the loading time scheduling problem is NP-complete for arbitrary M(i), even when there are no precedence constraints, by a reduction from the set-cover problem [5]. (The elements correspond to tasks, and each subset corresponds to a machine. A task can be done on a machine if the corresponding element belongs to the set corresponding to the machine.) We show that even when |M(i)| = 1 and r ≥ 4, the problem is NP-complete, by a reduction from the shortest common supersequence problem [5, Problem SR8]. Moreover, the reduction proves the hardness results even for the case of unit loading times. A recent result of [12] implies that LTSP is NP-complete for r = 3.

Hardness results: We show that for any constant δ, no polynomial time approximation algorithm with a factor of log^δ n is possible unless NP ⊆ DTIME(n^{O(log log n)}).

Greedy algorithm: First we show that the greedy algorithm performs poorly for LTSP. The greedy algorithm is a very natural algorithm for this problem: at each step it schedules the machine on which the maximum number of tasks can be performed. We give an example where the approximation ratio of the greedy algorithm is Ω(√n), even when r = 4. When r = log n, the approximation ratio can be as bad as Ω(n/log n).

Approximation algorithm: Our main contribution is an approximation algorithm for LTSP. This algorithm achieves a worst case approximation factor of r, where r is the total number of machines. The idea is to compute for each task i a function which is a lower bound on the total loading time needed to schedule i. We sort the tasks according to this function, and then schedule the tasks in a greedy manner. We also show that the algorithm can be implemented in O(n log n + re) time and O(rn + e) space, where n and e are the number of vertices and number of edges, respectively, in the graph.

From the practical point of view, an approximation factor of r is much better than an approximation factor that is a function of n, since r is typically very small (4 or 5) compared to the size of the task graph, which can have, for example, over 1000 features for an engine block [6].

We also give a second approach that gives a simpler r-approximation forLTSP. However, we believe that the previous method will produce bettersolutions in practice. In any case, both approaches are described as theyuse very different techniques.

Finally, we discuss a natural linear programming approach to LTSP and show that the integrality gap of the linear relaxation of the integer program we derive is at least (r − 1)/4, meaning that this approach is not useful for obtaining significantly improved approximation factors.

Fork–join problem: Sarkar [14] presents a heuristic for solving the fork–join problem, with no analysis, and conjectures that his heuristic has a constant worst case guarantee. We have been able to construct an example for which Sarkar's algorithm has a performance of Ω(log n) times the optimal cost, disproving his conjecture. However, no proof is known that this is also an upper bound on the algorithm's performance. It should be noted that Sarkar reports that the heuristic works very well in practice. We show that an instance of the fork–join problem can be mapped to an instance of LTSP. We use the same technique to obtain an algorithm with an approximation ratio of O(min(log W, log n)) for this problem. This is the first worst case approximation algorithm for this problem.

Weighted SCS problem: For the SCS problem where each letter of thealphabet has an arbitrary weight, we are able to obtain an algorithm withan approximation factor of r. Here, r is the size of the alphabet.


2. GREEDY ALGORITHM

The most obvious algorithm for the loading time scheduling problem is a greedy algorithm. At each step this algorithm schedules the machine on which the largest number of tasks can be performed in a single run. We show that this algorithm can perform very poorly; i.e., it may achieve an approximation factor of Ω(n/log n).

We first show an example where the approximation factor is Ω(√n).

(See Fig. 2.) There are four machines, each with unit loading time. The first row of the DAG contains √n tasks. All these tasks can be performed on machine 3. They can also be performed on machines 1 and 2 (alternately). The second row contains n tasks, all of which can be performed on machine 4, as well as on either 1 or 2 (depending on the machine their parent can be done on). Clearly, the maximum number of tasks that can be done on any single machine initially is 1 + √n on machine 1, √n on machine 3, and 0 on machines 2 and 4. The greedy algorithm will schedule machine 1. After performing 1 + √n tasks, the greedy algorithm schedules machine 2 and performs another 1 + √n tasks. Since there are n + √n tasks in all, the greedy algorithm will obtain a solution of length √n that alternates between the 1's and the 2's. The optimum solution is to schedule all the tasks on machines 3 and 4: first do the √n tasks on machine 3, then do the remaining n tasks on machine 4.

We now show how to modify this instance so as to obtain an instance for which the performance ratio of the greedy algorithm can be as bad as Ω(n/log n). The idea is to extend the instance depicted in Fig. 2 to an instance containing k (to be defined later) replicas of this instance, connected in "levels."

Our basic building block in Fig. 3 has the same structure as the instance depicted in Fig. 2. It has two layers of nodes. These nodes are linked by two kinds of arcs: those that are incident on nodes in the same layer and those that are incident on nodes in different layers. The arcs of the former kind form a chain in each layer. The first layer contains r (to be defined later) tasks, and each task in the first layer has two successors in the

FIG. 2. Bad example for the greedy algorithm.


FIG. 3. Basic building block for r = 4.

second layer. Let u and v be two tasks in the first layer: the sets induced by the successors of u and v in the second layer are disjoint.

We will now connect k instances of our basic building block in levels. Let the building blocks be denoted by B_1, ..., B_k, where the indices are ordered according to levels. For 1 ≤ i ≤ k, in block B_i, the value of r is equal to 2^i. For 1 ≤ i < k, there is a single outgoing edge from the jth task in the second layer of block B_i to the jth task in the first layer of block B_{i+1}, for 1 ≤ j ≤ 2^{i+1}.

In block B_i, for 1 ≤ i ≤ k:

1. All tasks in the first layer can be performed on machine l_i, and all tasks in the second layer can be performed on machine l′_i.

2. In the first layer: (i) all tasks whose distance from the beginning of the layer is odd, together with their successors in the second layer, can be performed on the same machine, denoted by o_i; (ii) all tasks whose distance from the beginning of the layer is even, together with their successors in the second layer, can be performed on the same machine, denoted by e_i.

All the machines in this construction have unit loading time.

This construction can be thought of as a threaded tree, where all tree edges are directed toward the leaves, and in addition to the tree edges there are paths that connect vertices in the same level. The optimal algorithm will traverse the graph in a BFS fashion. In other words, it will perform the tasks block by block: perform the tasks in block B_1, then perform the tasks in block B_2, and so on. The tasks in block B_i are performed in two steps as follows: first do all the tasks in the first layer on machine l_i and next do all the tasks in the second layer on machine l′_i. The cost of each block is therefore 2 (machines have unit loading time). Thus, the cost of an optimal algorithm is at most 2k. In contrast, our greedy algorithm could traverse the tree edges in a DFS fashion. This follows since, at each step, there is a greedy choice that complies with a DFS traversal of the tree edges. For example, for the construction in Fig. 4 the greedy algorithm will first do the leftmost node in the first layer of block B_1 and its two successors on machine o_1. Next it will do the leftmost node in the first layer of block B_2 and its two successors on machine o_2. After this it will do the second from left node in the first layer of block B_2 and


FIG. 4. The construction for k = 2.

its two successors on machine e_2. Next it will do the second from left node in the first layer of block B_1 and its two successors on machine e_1. Finally it will do the third and fourth nodes in the first layer of block B_2 and their two successors on machines o_2 and e_2, respectively, in that order.

Note that the greedy algorithm performs at most one task from the first layer of at most one block for each machine on which it schedules a task. The total number of tasks in the first layer of all the blocks is Σ_{i=1}^k 2^i. Therefore the cost of the greedy algorithm is Σ_{i=1}^k 2^i, yielding that the performance ratio of the greedy algorithm is Σ_{i=1}^k 2^i / (2k). Let n denote the number of tasks that need to be processed. We choose k = Θ(log n), yielding that the approximation factor of the greedy algorithm can be as bad as Ω(n/log n).

3. APPROXIMATION ALGORITHMS

In this section we present two approximation algorithms, as well as anapproach based on linear programming for LTSP.

The main idea behind the first approximation algorithm is to compute a function T*(i) that represents a lower bound on the total loading time incurred in scheduling task i on any machine. We will use the function T* in building the schedule as well as in the proof of the approximation factor of r. In fact, we will show that the total loading time incurred by any machine in our schedule (possibly over several loadings of this machine) does not exceed OPT. This immediately gives an approximation factor of r, the number of machines. For each task i, let Pred(i) denote the set of predecessors of i in G.

We first compute a function T(i, j), which is a lower bound on the time incurred if task i is scheduled on machine m_j. T*(i) is the time for scheduling task i as quickly as possible on any machine m_j ∈ M(i):

$$T^*(i) = \min_{m_j \in M(i)} T(i, j),$$

and m*(i) = m_j is a machine such that T(i, j) = T*(i).

We now define T(i, j). The first two cases follow easily. The idea in the third case is to use the lower bound that we have computed for the tasks i_p ∈ Pred(i); it has two cases, depending on whether task i_p is scheduled on m_j or not:

$$T(i, j) = \begin{cases} \infty & \text{if } m_j \notin M(i), \\ l(m_j) & \text{if } \mathrm{Pred}(i) = \emptyset, \\ \max_{i_p \in \mathrm{Pred}(i)} \min\{\, T(i_p, j),\; T^*(i_p) + l(m_j) \,\} & \text{otherwise.} \end{cases}$$

The following two propositions are immediate.

PROPOSITION 1. In the computation of T(i, j), i_p may be restricted to the set of immediate predecessors of i.

PROPOSITION 2. If i_p ∈ Pred(i) then T*(i_p) ≤ T*(i).

Let OPT be the cost of the optimal solution (with minimum loading time to complete all the tasks).

LEMMA 3. For any i, OPT ≥ T*(i).

Proof. We show that T(i, j) is a lower bound on the time elapsed if task i is scheduled on machine m_j. In other words, we show that in any feasible solution in which task i is scheduled on machine m_j, the sum of the loading times of all the machines on which tasks are scheduled before or at the same time as task i is at least T(i, j). Note that this implies, for any i, OPT ≥ T*(i).

The proof is by induction on the levels in the DAG. For tasks at level 0 (defined by Pred(i) = ∅), it is obviously true, since T(i, j) = l(m_j) if m_j ∈ M(i). Assume it is true for tasks in the first k levels. Consider now a task i at level k + 1. Let m_j ∈ M(i), and let i_p be a predecessor of i. In any feasible solution, i_p is scheduled before, or at the same time as, i. Assume that i_p is scheduled on machine m_{j_p}.

By the induction hypothesis, the sum of the loading times of all the machines which are scheduled before or with task i_p is at least T(i_p, j_p).


There are two cases to be considered:

(a) j = j_p: then, at best, task i can be scheduled with task i_p, and hence the sum of the loading times of all the machines which are scheduled before or with i is at least T(i_p, j).

(b) j ≠ j_p: in this case, i is scheduled strictly later than i_p, and hence the sum of the loading times of all the machines which are scheduled before or with i is at least T(i_p, j_p) + l(m_j) ≥ T*(i_p) + l(m_j).

The above implies that the sum of the loading times of all the machines which are scheduled before or with i is at least

$$\max_{i_p \in \mathrm{Pred}(i)} \min\{\, T(i_p, j),\; T^*(i_p) + l(m_j) \,\} = T(i, j).$$

For each task i we create a vertical interval of length l(m*(i)), with the lower end of the interval at distance T*(i) from the x-axis (see Fig. 5). Two intervals are said to overlap if there is a horizontal line that cuts both intervals.

We now give a high level description of the algorithm. Assume that S is the set of tasks that still needs to be scheduled (initially S is the entire set of tasks). The algorithm sweeps a horizontal line from top to bottom. When the sweep-line crosses the lower end of a vertical interval, we schedule the task associated with that interval. Assume that task x is the

FIG. 5. Example to show two nonoverlapping intervals.


first task to be scheduled on m*(x). At this point we also schedule other tasks that can be done on m*(x) (R_x will denote the set of these tasks).

ALGORITHM

Step 1. The set S is sorted by increasing T*(i) value (the lower end of the intervals). If T*(i) = T*(j) and i ∈ Pred(j), then i occurs before j in S.

Step 2. Pick the first task from S and call it x.

Step 3. Pick as many tasks from S as can be performed on m*(x). Formally, let R_x = { y | (m*(x) ∈ M(y)) ∧ [∀z ∈ S ∩ Pred(y), m*(x) ∈ M(z)] }.

Step 4. Schedule R_x on m*(x).

Step 5. Remove R_x from S and return to Step 2.
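As a minimal illustration of these steps, here is a Python sketch (our code, not the paper's; it assumes T*, m*, the full predecessor sets, and a topological tie-breaking rank have been precomputed, and uses simple set scans in place of the efficient data structures of Section 3.1):

```python
def sweep_schedule(tasks, preds_all, machines_of, Tstar, mstar, topo_rank):
    """Sweep-line algorithm: S sorted by T*(i), predecessors first on
    ties.  preds_all[y] is the set of ALL predecessors of y in the DAG."""
    S = sorted(tasks, key=lambda i: (Tstar[i], topo_rank[i]))
    schedule = []                     # list of (machine, batch) runs
    done = set()
    while S:
        x = S[0]
        m = mstar[x]
        # Step 3: R_x = tasks runnable on m all of whose still-unscheduled
        # predecessors can also run on m (so they join the same batch).
        Rx = [y for y in S
              if m in machines_of[y]
              and all(m in machines_of[z]
                      for z in preds_all[y] if z not in done)]
        schedule.append((m, Rx))      # Step 4
        done.update(Rx)
        S = [y for y in S if y not in done]   # Step 5
    return schedule
```

Within each batch the tasks can be executed in topological order; feasibility follows from the observation after the algorithm that predecessors always precede their successors in S.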

Notice that for any two tasks i and i′, if i′ ∈ Pred(i), then T*(i′) ≤ T*(i), and i′ precedes i in the set S. Hence, the solution produced by the above algorithm is feasible. Let OPT be the cost of the optimal solution (with minimum loading time to complete all the tasks).

LEMMA 4. Let x be a task picked in Step 2, and let y ∈ S be such that m*(x) = m*(y). If the intervals of x and y overlap, then y ∈ R_x.

Proof. Suppose that there exists a task z ∈ S ∩ Pred(y), with m*(x) ∉ M(z) and T*(x) ≤ T*(z) ≤ T*(y), that prevents us from putting y in R_x. This implies that T(z, m*(x)) = ∞ and T(z, m*(y)) = ∞. Since z ∈ Pred(y),

$$T^*(y) = T(y, m^*(y)) = T(y, m^*(x)) \ge \min\{\, T(z, m^*(y)),\; T^*(z) + l(m^*(y)) \,\}.$$

This proves that T*(y) ≥ T*(z) + l(m*(y)) ≥ T*(x) + l(m*(y)), contradicting the assumption that the intervals of x and y overlap.

THEOREM 5. The total loading time incurred for any particular machine m_j in a schedule output by the algorithm is at most max_i T*(i).

Proof. Consider any particular machine m_j. This machine may occur in the schedule many times. Each time, the loading time is incurred for the first task (in particular, the task picked in Step 2). We charge this task for the loading time. By Lemma 4 the intervals for the tasks that get charged for loading machine m_j do not overlap. Hence, the total charge on the loading times for machine m_j is T*(L(j)), where L(j) is the last task to be charged for machine m_j.

We conclude with the following theorem.

THEOREM 6. If the total number of machines is r, the total loading time is at most r · max_i T*(i) ≤ r · OPT.


3.1. Implementation

THEOREM 7. The algorithm can be implemented in O(n log n + re) time and O(rn + e) space, where n and e are the number of vertices and number of edges, respectively, in the DAG.

Proof. Note that the computation of T(i, j) and T*(i) can be done in O(re) time, by topologically sorting the DAG and then doing a local computation at every task, according to the order in which the tasks appear in the topological sort. As mentioned before, the value of T(i, j) can be computed by just looking at the immediate predecessors of task i, and therefore the local computation for task i takes O(r · indeg(i)) time, where indeg(i) is the in-degree of task i. For each task we need to store r values, so the total space requirement is O(rn).

The rest of the algorithm is implemented as follows. For each task we maintain its in-degree in the current DAG (the tasks that are already scheduled are not in the current DAG). For each of the r machines, we store a doubly linked list of pointers to tasks in the current DAG with the following property. Let m and i be a machine and a task, respectively, with m ∈ M(i). If the in-degree of i in the current DAG is 0, then the linked list for m contains task i. The sorted list S of tasks is implemented as a doubly linked list. Note that there are r + 1 doubly linked lists and hence all this requires O(rn) space. The use of doubly linked lists facilitates deletion of tasks from the lists in constant time per list.

Call the following one phase of the algorithm. Let x be the first task in the current set S. Using the linked list for m*(x), find the set R_x. Remove all tasks in R_x from the DAG and update all the doubly linked lists and the in-degrees of the tasks.

We will now show that a phase of the algorithm can be implemented in O(r · Σ_{t ∈ R_x} outdeg(t)) time, where outdeg(t) is the out-degree of task t in the original DAG. Since every node is in an R set exactly once, the time bound follows. We use an additional queue of size O(n) to do this.

Put all the tasks in the list for machine m*(x) in the queue. Repeatedly do the following until the queue is empty. Remove a task j from the queue; delete j from S, the DAG, and the linked lists for all the machines; and add j to R_x. For each immediate successor i of j, decrease indeg(i) by 1. If indeg(i) becomes 0, then:

(a) if m*(x) ∈ M(i), add i to the queue;

(b) if m*(x) ∉ M(i), add i to the linked list of every machine m such that m ∈ M(i).

At the end of a phase of the algorithm the tasks in the set R_x are no longer in any of the linked lists or the DAG. The linked lists for the machines contain only those tasks with in-degree 0 in the new DAG. Note that in a phase of the algorithm, every task i that is added to the queue ends up in R_x. Since i is added at most once to the queue, and for i we do O(r · outdeg(i)) work, the bound follows.
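A rough sketch of one phase in Python follows; it is our illustration only, with Python sets standing in for the doubly linked lists (sets give the same constant-expected-time deletions that make the phase bound work):

```python
from collections import deque

def run_phase(m_star, ready, indeg, succ, machines_of, remaining):
    """One phase: gather R_x for machine m_star using a queue.
    ready[m] is a set of ready (in-degree 0) tasks runnable on m."""
    Rx, queue = [], deque(ready[m_star])
    while queue:
        j = queue.popleft()
        remaining.discard(j)
        Rx.append(j)
        for m in machines_of[j]:          # delete j from every ready list
            ready[m].discard(j)
        for i in succ[j]:
            indeg[i] -= 1
            if indeg[i] == 0:
                if m_star in machines_of[i]:
                    queue.append(i)       # case (a): joins the current run
                else:
                    for m in machines_of[i]:
                        ready[m].add(i)   # case (b): ready for later runs
    return Rx
```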

3.2. Universal Sequences

A universal sequence is an infinite list of machines such that if the machines are scheduled according to the list, then any set of tasks can be performed on them, independent of the DAG. This concept was recently suggested by Mansour [11] in the context of randomized approximation algorithms, which can be shown to achieve an expected approximation factor of r − 1. The difficulty with this approach is that it may generate a universal sequence of exponential length. Azar [1] suggested a way to make it deterministic, but this method increases the approximation ratio by a factor of 2. Here, we show how to get an r-approximation in polynomial time via a deterministic algorithm. We describe a deterministic r-approximation algorithm that also generates a universal sequence. The idea is to generate a sequence such that for each prefix of the sequence, the number of times a machine appears in it is inversely proportional to its loading time.

Recall that l(m_j) is the loading time of machine m_j. Let I_j = { k · l(m_j) | k = 1, 2, ... } be an infinite set. Define I = { (a, b) | a ∈ I_b }.

Let the comparison operator on the tuples (a, b) be defined as:

(a, b) > (c, d) if a > c, or if a = c and b > d.

Sort the elements of the set I in increasing order; if this sorted sequence is (a_1, b_1), (a_2, b_2), ..., then the universal sequence is the machine sequence m_{b_1}, m_{b_2}, ....

EXAMPLE. Suppose we have three machines with loading times of 2, 3, and 7, respectively. Then

I_1 = {2, 4, 6, 8, ...}, I_2 = {3, 6, 9, 12, ...}, and I_3 = {7, 14, 21, 28, ...}.

If we sort I in increasing order, the sequence that we obtain is:

(2, 1), (3, 2), (4, 1), (6, 1), (6, 2), (7, 3), (8, 1), (9, 2), (10, 1), (12, 1), (12, 2), (14, 1), (14, 3), ....

The corresponding sequence of machines (the universal sequence) is

m_1, m_2, m_1, m_1, m_2, m_3, m_1, m_2, m_1, m_1, m_2, m_1, m_3, ....
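Any prefix of this sequence can be generated by merging the streams I_j with a priority queue; for a budget of l this produces exactly the prefix U^l used in the proof of Theorem 8 below. The Python sketch is our construction, not code from the paper:

```python
import heapq

def universal_prefix(load, budget):
    """Merge I_j = { k * l(m_j) : k = 1, 2, ... } by increasing value,
    ties broken by machine index; stop once values exceed `budget`."""
    heap = [(l, j) for j, l in enumerate(load)]  # (next multiple, machine)
    heapq.heapify(heap)
    seq = []
    while heap and heap[0][0] <= budget:
        a, j = heapq.heappop(heap)
        seq.append(j)                            # output machine m_{j+1}
        heapq.heappush(heap, (a + load[j], j))
    return seq

# universal_prefix([2, 3, 7], 14) yields [0, 1, 0, 0, 1, 2, ...],
# i.e., m1 m2 m1 m1 m2 m3 m1 m2 m1 m1 m2 m1 m3, as in the example above.
```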

To obtain the actual schedule of machines, we will scan the universal sequence from left to right and output a machine if it can do one or more ready jobs from the DAG. The completed jobs will then be deleted from the DAG. A potential problem with this scheme is that the portion of the universal sequence that we need to generate may be exponential in length. One way to address this is to keep track of the set of ready machines at each step by examining the DAG, i.e., the machines that can do one or more ready jobs; we then need to output only the next such machine from the universal sequence. This can be done very efficiently as follows. Call a job ready if it has in-degree 0 in the current DAG. Call a machine m ready if m ∈ M(i) for some ready job i. Let M^R be the set of ready machines. Let (a_p, b_p) ∈ I be the tuple corresponding to the last ready machine output by the algorithm, and let (a_j, b_j) ∈ I be the first tuple after (a_p, b_p) in the sorted sequence for I (scanning right) such that m_{b_j} ∈ M^R; then the next ready machine in the universal sequence is m_{b_j}. For each machine m_k, let

$$d_k = \begin{cases} \lceil a_p / l(m_k) \rceil \, l(m_k) & \text{if } \lceil a_p / l(m_k) \rceil \, l(m_k) > a_p \text{ or } k > b_p, \\ a_p + l(m_k) & \text{otherwise.} \end{cases}$$

Then m_{b_j} is the machine in M^R with the smallest value of d_k (ties broken in favor of the machine with smaller index), and can therefore be computed efficiently. Set a_j = d_{b_j}.

We now prove that the total loading time of the machines in the prefix of the universal sequence that was generated does not exceed r · OPT. Let the loading time of a sequence of machines be the total loading time of all the machines in the sequence.

THEOREM 8. Any sequence of machines over m_1, m_2, ..., m_r with total loading time l occurs as a subsequence of a prefix of the universal sequence of total loading time at most r · l.

Proof. First of all, note that if, in the construction of the universal sequence, we restrict the largest element in the set I to be l (call this finite set I^l), then we get a prefix U^l of the universal sequence of total loading time at most r · l. This follows from the fact that in the set I^l there are at most ⌊l / l(m_k)⌋ tuples of the form (y, k), and for each of these tuples exactly one instance of machine m_k is output in U^l. Therefore the total loading time incurred for machine m_k in U^l is ⌊l / l(m_k)⌋ · l(m_k) ≤ l. Since there are r machines, the total loading time of U^l is at most r · l.

Let M = m_{a_1}, m_{a_2}, m_{a_3}, ..., m_{a_k} be any sequence of machines of total loading time Σ_{j=1}^k l(m_{a_j}) = l. We now show that M is a subsequence of U^l, thus proving the theorem. Let f be a function that assigns numbers to the k machines in the sequence M, defined as:

(a) f(m_{a_1}) = l(m_{a_1});

(b) f(m_{a_i}) is the smallest integer in I_{a_i} that is > f(m_{a_{i−1}}), if a_i ≤ a_{i−1}, i > 1;

(c) f(m_{a_i}) is the smallest integer in I_{a_i} that is ≥ f(m_{a_{i−1}}), if a_i > a_{i−1}, i > 1.


We will establish the following two facts:

$$(f(m_{a_i}), a_i) \in I^l \quad \text{for } 1 \le i \le k;$$

$$(f(m_{a_i}), a_i) > (f(m_{a_j}), a_j) \quad \text{for } i > j.$$

Note that these facts are sufficient to establish that M occurs as a subsequence of U^l. This is because if (f(m_{a_i}), a_i) is the b_i th tuple in the sorted sequence I^l and if U^l = u_1, u_2, u_3, ..., then M = u_{b_1}, u_{b_2}, u_{b_3}, .... Since the facts imply that b_i > b_j whenever i > j, M occurs as a subsequence of U^l.

Now we establish the two facts, thus proving the theorem. Note that f(m_{a_i}) − f(m_{a_{i−1}}) ≤ l(m_{a_i}) for i > 1. This is because I_{a_i} = { j · l(m_{a_i}) | j = 1, 2, ... }, and therefore the smallest integer in I_{a_i} which is greater than f(m_{a_{i−1}}) is at most f(m_{a_{i−1}}) + l(m_{a_i}). The result now follows from conditions (b) and (c). From condition (a) we have f(m_{a_1}) = l(m_{a_1}). Adding these up, we get f(m_{a_k}) ≤ Σ_{j=1}^k l(m_{a_j}) = l. Also note that the three conditions (a), (b), and (c) ensure that the value f(m_{a_i}) assigned to any machine m_{a_i} in the sequence M is from the set I_{a_i}. Therefore (f(m_{a_i}), a_i) ∈ I^l for 1 ≤ i ≤ k.

The conditions (b) and (c) ensure that the numbers assigned by f to the machines in the sequence M form a nondecreasing sequence, with ties only when the earlier machine has a lower index. Therefore (f(m_{a_i}), a_i) > (f(m_{a_j}), a_j) for i > j.

THEOREM 9. The universal sequence algorithm yields an r-approximation for the loading time scheduling problem.

Proof. Let I be a given instance of the LTSP. Let M be the sequence of machines corresponding to an optimal solution for the instance I, and let the total loading time of the machine sequence M be OPT. By Theorem 8, M occurs as a subsequence of a prefix of the universal sequence of total loading time at most r · OPT. Therefore the algorithm that outputs ready machines by scanning the universal sequence from left to right is guaranteed to find, in polynomial time, a feasible solution of the LTSP instance I of total cost at most r · OPT.

3.3. Integrality Gap

In this section we discuss a natural linear integer programming approach to the loading time scheduling problem. The integrality gap of an integer program is the ratio of the best integral solution to the best fractional solution of a linear relaxation of the program. We prove that the integrality gap for the integer program that we construct is at least (r − 1)/4, meaning that this approach cannot yield improved approximation factors (by more than a constant factor). In fact, the integrality gap holds for the shortest common supersequence problem as well. Recall that the shortest common supersequence problem is a special case of the loading time scheduling problem, obtained by viewing alphabet letters as machines and each sequence as defining precedence constraints between tasks. For example, a sequence a_1, ..., a_k corresponds to tasks t_1, ..., t_k such that: (i) tasks t_1, ..., t_{i−1} must be executed before task t_i, for all 2 ≤ i ≤ k; (ii) task t_i can only be executed on the machine corresponding to letter a_i, for all 1 ≤ i ≤ k.

Our instance of the loading time scheduling problem is a DAG consisting of chains, where each task can be performed on a single machine. Suppose there are r machines, m_1, ..., m_r. In our instance, the machines on which the tasks can be performed in each chain induce a permutation of 1, ..., r, and there are r! chains altogether. All machines have unit loading time.

A fractional schedule is defined as follows. At each time slot we allow a fraction of a machine to be scheduled, with the following constraints: (i) the sum of the fractions of the machines scheduled together at a time slot cannot exceed 1, and (ii) a fraction α of a task can be performed in time slot t only if an α fraction of all of its predecessors has already been completed up to some time slot t′, where t′ < t. We omit the details of defining a fractional solution for arbitrary instances of the loading time scheduling problem, since our goal here is proving a lower bound.

More formally, we model the above mentioned instances of the loading time scheduling problem by the following integer program. Let T be a large number such that the cost of the optimal solution is at most T (for example, T can be chosen to be n, where n is the number of tasks). We assume there are T time slots, each of unit size. For each machine i and time slot j we define a binary variable x_{i,j}, where x_{i,j} = 1 iff machine i is scheduled at time slot j. For each task i and time slot j we define a binary variable y_{i,j}, where y_{i,j} = 1 iff task i is scheduled at time slot j. We relax the constraints of x_{i,j} and y_{i,j} being binary variables by the constraint that they lie between 0 and 1. The relaxation is as follows (M(i) is the unique machine on which task i can be scheduled):

$$\text{Minimize} \quad \sum_{j=1}^{T} \sum_{i=1}^{r} x_{i,j}$$

subject to

$$\sum_{i=1}^{r} x_{i,j} \le 1, \qquad \forall j = 1 \dots T,$$

$$\sum_{j=1}^{k} y_{i,j} \le \sum_{j=1}^{k-1} y_{i_p,j}, \qquad \forall i, i_p \text{ such that } i_p \in \mathrm{Pred}(i), \text{ for any time slot } k,$$

$$y_{i,j} \le x_{M(i),j}, \qquad \forall i = 1 \dots n, \; \forall j = 1 \dots T,$$

$$\sum_{j=1}^{T} y_{i,j} = 1, \qquad \forall i = 1 \dots n,$$

$$0 \le x_{i,j} \le 1, \qquad \forall i = 1 \dots r, \; \forall j = 1 \dots T,$$

$$0 \le y_{i,j} \le 1, \qquad \forall i = 1 \dots n, \; \forall j = 1 \dots T.$$
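For concreteness, this relaxation can be written down with an off-the-shelf LP modeler. The sketch below uses PuLP, which is our choice of tool and not part of the paper; `machine_of[i]` gives the unique machine M(i) and `preds[i]` the predecessors of task i.

```python
import pulp

def build_lp(n_tasks, r, T, machine_of, preds):
    """LP relaxation of the LTSP instances above (unit loading times,
    |M(i)| = 1); a sketch, not the paper's implementation."""
    prob = pulp.LpProblem("ltsp_relaxation", pulp.LpMinimize)
    x = pulp.LpVariable.dicts("x", (range(r), range(T)), 0, 1)
    y = pulp.LpVariable.dicts("y", (range(n_tasks), range(T)), 0, 1)
    prob += pulp.lpSum(x[i][j] for i in range(r) for j in range(T))
    for j in range(T):                     # machine capacity per slot
        prob += pulp.lpSum(x[i][j] for i in range(r)) <= 1
    for i in range(n_tasks):
        for j in range(T):                 # a task needs its machine
            prob += y[i][j] <= x[machine_of[i]][j]
        prob += pulp.lpSum(y[i][j] for j in range(T)) == 1
        for p in preds[i]:                 # precedence, prefix by prefix
            for k in range(T):
                prob += (pulp.lpSum(y[i][j] for j in range(k + 1))
                         <= pulp.lpSum(y[p][j] for j in range(k)))
    return prob
```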

We exhibit a fractional solution to this LP relaxation of the above mentioned LTSP instance: take T = 2r − 1 and x_{i,j} = 1/r for 1 ≤ j ≤ T and 1 ≤ i ≤ r. The cost of this solution is T = 2r − 1, and hence the cost of the optimal fractional solution is at most 2r − 1.

We have to show that all r! chains of tasks can be performed using the schedule which has 2r − 1 time slots and in which, at each time slot, a fraction 1/r of each machine is scheduled. Note that in the tth time slot, for t ≤ r, a fraction 1/r of each of the first t tasks in each chain can be scheduled. In other words, for each of these tasks i we can set y_{i,t} = 1/r. This follows from the fact that in the first time slot a fraction 1/r of the first task in each chain can be scheduled. This means that in the next time slot a fraction 1/r of the second task in each chain is ready to be scheduled. Therefore in the second time slot a fraction 1/r of both the first and the second task in each chain can be scheduled, and so on. Note that after the rth time slot a fraction (r − i + 1)/r of the ith task (1 ≤ i ≤ r) in each chain has been scheduled. Also note that after time slot r a fraction 1/r of each task can be scheduled in every time slot. Hence the last task (and therefore every task) in each chain can be completely scheduled (Σ_{j=1}^T y_{i,j} = 1, ∀i) after r − 1 additional time slots. Hence in 2r − 1 slots all tasks can be completely scheduled.

In contrast, we claim that the length (number of time slots) of any integral solution S is at least Ω(r²). This is proved as follows. Scan S from left to right, and stop when all machines are recorded. Without loss of generality, suppose that m_r is the last machine recorded. Let j_r be the place in the list where m_r appears. Clearly, j_r ≥ r. Resume the scanning of the list and stop again when all machines m_1, ..., m_{r−1} are recorded. Without loss of generality, suppose that m_{r−1} is the last machine recorded. Let j_{r−1} be the place in the list where m_{r−1} appears. Clearly, j_{r−1} ≥ 2r − 1. Repeat the scanning, and let j_{r−2}, ..., j_1 be defined similarly. There are two cases. If the list S is exhausted before j_i is defined, for some i ≥ 1, then the chain of tasks that corresponds to the permutation m_r, m_{r−1}, ..., m_1 cannot be performed by S. Otherwise, the length of S is at least j_1, which is at least r(r − 1)/2, yielding that the integrality gap is at least (r − 1)/4.


4. HARDNESS RESULTS

We prove the following theorem regarding the loading time scheduling problem. This theorem holds even for the restricted case when each job can be done only on a single machine.

THEOREM 10. For any constant δ, there does not exist a polynomial time algorithm that has an approximation factor of (log n)^δ, unless NP ⊆ DTIME(n^{O(log log n)}).

For the following proof we use a restricted version of the SCS problem, where there are no consecutive runs of the same letter in the sequences. In other words, there are no adjacent repeated characters in any sequence. The main idea is to take an instance X of the restricted version of the SCS problem and to convert it into a large instance of the LTSP problem. Toward this end we define a power operator on restricted instances X of the SCS problem, such that for any k ≥ 1, X^k defines an instance of the LTSP problem in which a solution of cost l for the instance X of the SCS problem can be used to construct a solution of cost l^k for the LTSP instance X^k, and vice versa. Using a (log n)^δ-approximation algorithm for the LTSP problem, we are then able to obtain a c-approximation algorithm for the restricted SCS problem, for any c. We then use the fact that the restricted SCS problem is MAX SNP-hard, so a c-approximation algorithm would imply the existence of an algorithm that finds the optimal solution with the same running time. A similar technique is used by [8] to establish the hardness of approximating shortest common supersequences and is the motivation for this proof.

4.1. Preliminaries

DEFINITION 1. An LDAG is an acyclic digraph in which each vertex is labeled by a single letter from a given alphabet.

DEFINITION 2. A minimal supersequence z of an LDAG is defined as follows:

• If the LDAG is empty, then z = ∅.

• Let a be the first letter of z, i.e., z = a · z_1; then some in-degree 0 node in the LDAG is labeled with a, and z_1 is a minimal supersequence of the LDAG obtained by deleting all in-degree 0 nodes that have label a.

DEFINITION 3. A supersequence of an LDAG is any sequence that contains a minimal supersequence of the LDAG as a subsequence.

DEFINITION 4. Let X_1, X_2, ..., X_k be a collection of LDAGs. Let X = X_1 · X_2 ⋯ X_k denote the LDAG that is obtained by connecting each X_i to X_{i+1} by a set of directed edges that go from each vertex of out-degree 0 in X_i to each vertex of in-degree 0 in X_{i+1}.

The following definitions from [8] are extended to LDAGs.

DEFINITION 5. Let Σ and Σ′ be two alphabets. Let a ∈ Σ and b ∈ Σ′ be two letters. The product a × b is the composite letter (a, b) ∈ Σ × Σ′.

DEFINITION 6. The product of an LDAG X and a letter b is the LDAG (X × b) obtained by taking the product of the label of each vertex of X with b. (The structure of the LDAG stays the same; only the labels change.)

DEFINITION 7. The product of an LDAG X and a sequence y = b_1 ... b_k is the LDAG (X × y) = (X × b_1) · (X × b_2) ⋯ (X × b_k).

DEFINITION 8. The product of an LDAG X with a set Y = {y_1, ..., y_n} of sequences is denoted by X × Y = ⋃_{i=1}^n (X × y_i).

DEFINITION 9. Let the LDAG X be a union of disjoint sequences, where each sequence is also viewed as a chain. Then we define the LDAG X^k = X^{k−1} × X, where k > 1.

The previous definitions are illustrated with an example for the construction of X² in Fig. 6.

The number of vertices in the LDAG X^k is n^k, where n is the number of vertices in the LDAG X (the number of vertices in a chain is the length of the chain). The alphabet size of the labels of the LDAG X^k is m^k, where m is the alphabet size of the labels of the LDAG X.

DEFINITION 10. Let z be a sequence. By z[a ... b] we denote the substring of z from position a to position b.

The following propositions are immediate from Definitions 2 and 4.

FIG. 6. An example.


PROPOSITION 11. Let X = X_1 · X_2 ⋯ X_k, where each X_i is an LDAG. Let z_i be a minimal supersequence for X_i. Then z = z_1 · z_2 ⋯ z_k is a minimal supersequence for X.

PROPOSITION 12. Let X = X_1 · X_2 ⋯ X_k, where each X_i is an LDAG. Let z be a minimal supersequence for X. We can uniquely decompose z = z_1 · z_2 · z_3 ⋯, where each z_i is a minimal supersequence for X_i, and this decomposition of z can be obtained in time O(|z|) by scanning z.

DEFINITION 11. Let X be an LDAG and z be a supersequence of X. Let the length of the smallest prefix z′ of z which is a supersequence of X be denoted by the function g(X, z) = |z′|. Let z″ be a subsequence of z′ obtained as follows. Let a_1 be the first letter (when scanning left to right) of z′ such that some in-degree 0 node in the LDAG X is labeled with a_1. Delete all in-degree 0 nodes labeled a_1 in the LDAG X and find the next letter a_2 of z′ such that some in-degree 0 node in the new LDAG X is labeled with a_2. Repeat this procedure until there are no more in-degree 0 nodes in X. Let z″ = a_1 a_2 ..., and let the function h(X, z) = z″.

PROPOSITION 13. In the above definition of the functions g(X, z) and h(X, z), the prefix z′ can be computed in polynomial time by scanning z and X, and the sequence z″ is unique for a fixed (X, z) pair and can be computed and extracted from z in polynomial time. In addition, the last letter of z″ is the same as the last letter of z′.

4.2. Main Lemmas

Let the LDAG X be a disjoint union of sequences. In this section we establish that there is a supersequence of length l for the LDAG X iff there is a supersequence of length l^k for the LDAG X^k. In addition, we show that this proof is constructive: given one supersequence, we can construct the other in polynomial time. Note that if z is a supersequence of the LDAG X then z^k is a supersequence of the LDAG X^k. Hence we only have to establish that, given a supersequence z of the LDAG X^k, we can construct a supersequence of the LDAG X of length at most |z|^{1/k}. This is established by the following lemmas. Later in this section we present an example illustrating the proof of these lemmas.

LEMMA 14. Let the LDAG X be a disjoint union of sequences, and let z be a supersequence of X^k. We can find k minimal supersequences z_1, z_2, ..., z_k of X such that the product of the lengths of these sequences is at most the length of z, i.e., |z_1| · |z_2| ⋯ |z_k| ≤ |z|. Hence we can find a supersequence of X of size at most |z|^{1/k}. This can be done in time which is polynomial in |X^k|.

Proof. The proof is by induction on k. The lemma is clearly true for k = 1. We prove the induction step separately, in Lemma 15.


LEMMA 15. Let the LDAG X be a disjoint union of sequences and let z be a supersequence for X^k. In time polynomial in |X^k| we can find a sequence z* such that |z*| ≤ |z|, together with a decomposition z* = (z_1 × x_1) · (z_2 × x_2) ⋯ (z_r × x_r), where each z_i is a minimal supersequence of X^{k−1}, each x_i is a letter of the alphabet of X, and x = x_1 · x_2 ⋯ x_r is a minimal supersequence of X. This implies that for some j, |z_j| · |x| ≤ |z*| ≤ |z|.

Proof. We give a constructive proof. Note that X^k is a disjoint union of LDAGs of the form (X^{k−1} × a_1) · (X^{k−1} × a_2) ⋯ (X^{k−1} × a_ℓ), where the sequence a_1 a_2 ... a_ℓ is in X. We will rearrange the sequence z into a new supersequence z″ · z′ of the LDAG X^k, where z″ is a minimal supersequence of X^{k−1} × x_p for some letter x_p. Note that z′ is therefore a supersequence of the LDAG obtained from X^k by replacing every LDAG (X^{k−1} × a_1) · (X^{k−1} × a_2) ⋯ (X^{k−1} × a_ℓ) in X^k by (X^{k−1} × a_2) ⋯ (X^{k−1} × a_ℓ), whenever a_1 = x_p. We then iteratively apply this procedure to the new LDAG and its supersequence z′.

More formally, let S = { x_i | x_i appears as the first letter in some sequence of X } and let x_p be a letter in S for which min_{x_i ∈ S} g((X^{k−1} × x_i), z) is attained. Delete z″ = h((X^{k−1} × x_p), z) from z, yielding the sequence z′. Output z″, which is a minimal supersequence for X^{k−1} × x_p. Replace X^k by X^{k−1} × X_1, where X_1 is obtained by replacing every sequence of the form x_p · Y in X with Y.

Note that |z′| + |z″| = |z|. We claim that z″ · z′ is a supersequence for X^k for the following reasons.

vk ky1Ž .For an LDAG in X of the form R s X = x ? T we have byp

Ž . Ythe previous propositions that h R, z s z ? t, where t is a minimalw ŽŽ ky1 . .supersequence of T and is a subsequence of z g X = x , z qp

< <x X Y X1 . . . z . Hence t is a subsequence of z . Thus z ? z contains a minimalsupersequence of R.

• For any LDAG in X^k of the form R = (X^{k-1} × x_i) · T, where x_i ≠ x_p, we have by the propositions and by the fact that g((X^{k-1} × x_i), z) ≥ g((X^{k-1} × x_p), z) that h(R, z) = h(R, z″ · z′). Thus z″ · z′ contains a minimal supersequence of R.

Therefore z′ is a supersequence for the modified X^k. We iteratively apply this procedure to z′ with the modified X^k given by X^{k-1} × X_1 until X_1 is empty. It is easy to see that if this algorithm outputs the sequences z_1, z_2, ..., z_r in that order, then each z_i is a minimal supersequence of X^{k-1} × x_i for some letter x_i in the alphabet of X, and x_1 · x_2 ... x_r is a minimal supersequence of X. The running time of the algorithm is polynomial in |X^k|.

We now illustrate the proof of Lemma 15 with the example in Fig. 6. In this example we assume k = 2.


Let z, the given supersequence for the LDAG X^2, be

z = (a,a)(a,b)(b,a)(b,b)(a,a)(a,b)(b,a)(a,a)(b,b)(b,a)(b,b)(a,b)(a,a).

In the first iteration S = {a, b} and x_p = a:

z″ = (a,a)(b,a)(a,a),
z′ = (a,b)(b,b)(a,b)(b,a)(a,a)(b,b)(b,a)(b,b)(a,b)(a,a).

In the second iteration the modified X^2 is X_1^2, as shown in Fig. 7, S = {b}, and x_p = b:

z″ = (a,b)(b,b)(a,b),
z′ = (b,a)(a,a)(b,b)(b,a)(b,b)(a,b)(a,a).

In the third iteration the modified X^2 is X_2^2, as shown in Fig. 7, S = {a}, and x_p = a:

z″ = (a,a)(b,a)(a,a),
z′ = (b,a)(b,b)(b,b)(a,b).

Therefore, in the statement of Lemma 15,

z* = (a,a)(b,a)(a,a)(a,b)(b,b)(a,b)(a,a)(b,a)(a,a).

Note that aba is a minimal supersequence of X.
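The decomposition can be checked mechanically. The snippet below (ours) only restates the data of this example; the assertions verify that each extracted block z″ has a constant second coordinate x_i, that its first coordinates spell the minimal supersequence aba, and that |z_j| · |x| = 9 ≤ |z| = 13, as Lemma 15 promises.

```python
# The three blocks z'' extracted in the three iterations above, written as
# lists of (letter of X^{k-1}, letter of X) pairs.
blocks = [
    [('a', 'a'), ('b', 'a'), ('a', 'a')],   # first iteration,  x_p = a
    [('a', 'b'), ('b', 'b'), ('a', 'b')],   # second iteration, x_p = b
    [('a', 'a'), ('b', 'a'), ('a', 'a')],   # third iteration,  x_p = a
]
z_star = [pair for block in blocks for pair in block]
assert len(z_star) == 9                     # |z*| = 9 <= |z| = 13

xs = []
for block in blocks:
    (x_i,) = {pair[1] for pair in block}    # the second coordinate is constant
    z_i = ''.join(pair[0] for pair in block)
    assert z_i == 'aba'                     # each z_i is a minimal supersequence of X
    xs.append(x_i)
assert ''.join(xs) == 'aba'                 # x = x_1 x_2 x_3 is again minimal
```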

FIG. 7. Example for the proof.

LEMMA 16. Given a supersequence z of X^k, we can compute a supersequence z′ of X of size |z′| ≤ |z|^{1/k} in time polynomial in |X^k|. Hence, given a polynomial time approximation algorithm that achieves an approximation factor of f(N) (where N is the input size) whose instance is X^k, we can construct an f(N)^{1/k}-approximation algorithm for the problem whose instance is X that runs in time polynomial in |X^k|.

Proof. Let OPT_k be the size of the optimal solution for the problem whose instance is X^k. The optimal solution for the problem whose instance is X is exactly OPT = OPT_k^{1/k}. This follows from the fact that X has a supersequence of size OPT_k^{1/k} (by Lemma 14) and the fact that if Y is a supersequence for X then Y^k is a supersequence for X^k. Assume there exists a polynomial time f(N)-approximation algorithm for the LTSP problem. This means that in polynomial time we can find a supersequence for the instance X^k which is of size at most f(N) · OPT_k. But this implies the size of the solution for the problem whose instance is X is at most (f(N) · OPT_k)^{1/k} = f(N)^{1/k} · OPT.

4.3. Proof of Theorem 10

In the following proof we use a restricted version of the SCS problem, where there are no consecutive runs of the same letter in the sequences. In other words, there are no adjacent repeated characters in any sequence. Note that an instance of this restricted version of the SCS problem is also an instance of LTSP, with unit loading times. This problem is MAX SNP-hard (see Section 6). This implies that there is a constant c such that if there exists a polynomial time c-approximation algorithm for the restricted SCS problem, then P = NP. In other words, there is a constant c such that if there exists a DTIME(n^{O(log log n)}) time c-approximation algorithm for the restricted SCS problem, then NP ⊆ DTIME(n^{O(log log n)}). Let X be an instance of size n of the restricted SCS problem.

We now prove Theorem 10.

Proof. Let c be the constant such that for the problem with instance X, there does not exist a DTIME(n^{O(log log n)}) time c-approximation algorithm, unless NP ⊆ DTIME(n^{O(log log n)}). We will show that if there is a polynomial time (log n)^d approximation algorithm for LTSP (for any d), then we can construct a c-approximation algorithm for the restricted SCS problem that runs in DTIME(n^{O(log log n)}) time. This would imply that NP ⊆ DTIME(n^{O(log log n)}).

Suppose we are given a polynomial time approximation algorithm that achieves an approximation factor of f(N) (where N is the input size) for LTSP. Since X, and hence X^k, is an instance of LTSP, by Lemma 16 applying this algorithm to instance X^k would imply an approximation factor of f(n^k)^{1/k} for instance X. We would like to choose k such that f(n^k)^{1/k} ≤ c. Hence, we require f(n^k) < c^k. Let f(n) = log^d n, and thus we require


(k log n)^d < c^k. We now pick

k = 2 log_c(log^d n) + 2d log_c d,

and this yields a c-approximation algorithm for X that runs in time O(poly(N)), where N = n^k is n^{O(log log n)}.

5. FORK–JOIN PROBLEMS

Given an instance of the fork–join problem, we show how to create an instance of the LTSP such that the ratio of the costs of their optimal solutions is a constant. More specifically, if W is the highest execution time of an instruction (assuming that the lowest execution time is 1) in the fork–join problem, then for the instance of the LTSP, r ≤ ⌈log W⌉ + 1, and the cost of the optimal solution of the LTSP instance is no more than twice the cost of the optimal solution for the fork–join instance. We then use our approximation algorithm to find a solution to the instance of LTSP whose cost is within a factor r of the cost of its optimal solution. We finally show that any solution for the LTSP instance can be mapped back to a solution for the fork–join problem instance without any cost increase. The above implies that the algorithm obtains a solution for the fork–join problem instance of cost at most 2r times the cost of its optimal solution. This therefore yields an algorithm with approximation ratio O(log W) for the fork–join problem. A slight modification of this technique yields an algorithm with approximation ratio O(log n), where n is the number of nodes.

5.1. Mapping a Fork–Join Instance to an LTSP Instance

The instance of the fork–join problem is given as a DAG (V, U), where V is the set of nodes and U is the set of edges. Every node is labeled with an execution cost. Assume all the execution costs in the fork–join problem are between 1 and W; this can be achieved by appropriate scaling. Increase any cost that lies in the range (2^i, 2^{i+1}] to 2^{i+1}. Note that now we have at most ⌈log W⌉ distinct costs, and the cost of any solution of the original problem increases by at most a factor of 2.

We now create a new DAG (V′, U′) by augmenting the graph with |U| new nodes, each with zero execution cost, and by subdividing the ith edge (x, y) into two edges (x, r_i) and (r_i, y), where r_i is the ith new node. Note that this changes neither the set of feasible solutions nor the cost associated with them, but it increases the number of distinct costs.


We map this new instance of the fork–join problem to an instance of the LTSP as follows. The underlying DAG for the LTSP is the same; i.e., V′ is mapped to the set of tasks and U′ is the set of edges. For every distinct execution cost w_i we introduce a machine m_i with loading cost w_i; thus w_0 = 0 and w_i = 2^i for i ≥ 1, and r ≤ ⌈log W⌉ + 1. For every node j ∈ V, M(j) = {m_i | w_i ≥ execution cost of node j}, and for every node j ∈ V′ \ V (these are the new |U| nodes introduced for edge subdivisions, all of which have cost zero), M(j) = {m_0}.
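The reduction is easy to implement. The sketch below is ours (machines are identified by their loading cost, with 0 standing for m_0, and the subdivision nodes are tagged tuples); it performs the cost rounding, the edge subdivision, and the assignment of the machine sets M(j) described above.

```python
import math

def fork_join_to_ltsp(cost, edges):
    """cost: dict node -> execution time in [1, W]; edges: iterable of (u, v)."""
    # Round every cost in (2^i, 2^{i+1}] up to 2^{i+1} (at most a factor-2 loss).
    rounded = {v: 2 ** math.ceil(math.log2(c)) for v, c in cost.items()}
    distinct = sorted(set(rounded.values()))        # at most ceil(log W) costs

    # Subdivide the ith edge (u, v) with a fresh zero-cost node r_i.
    tasks, new_edges = dict(rounded), []
    for i, (u, v) in enumerate(edges):
        r = ('sub', i)
        tasks[r] = 0
        new_edges += [(u, r), (r, v)]

    # M(j): every machine whose loading cost covers j's execution cost;
    # the subdivision nodes may only use the zero-cost machine m_0.
    M = {j: ({0} if c == 0 else {w for w in distinct if w >= c})
         for j, c in tasks.items()}
    return tasks, new_edges, M
```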

LEMMA 17. If OPT_FJ is the size of an optimal solution for an instance of the fork–join problem, then the size of the optimal solution for the LTSP problem derived by the reduction described above is at most 2 · OPT_FJ.

Proof. Let OPT′_FJ be the value of the optimal solution for the instance of the fork–join problem created by our mapping scheme. We have already argued that OPT′_FJ ≤ 2 · OPT_FJ. We now show that OPT_LTSP ≤ OPT′_FJ, thus completing the proof. Given an optimal solution FJ′ for the instance of the fork–join problem created by our mapping scheme, we show how to construct a solution L for the LTSP instance without any increase in cost. Let FJ′ be represented by a partition of the vertices of the DAG into blocks B_1, B_2, ..., B_k. Note that the cost of B_i, denoted by w(B_i), is the maximum execution cost of the vertices in B_i. Also note that OPT′_FJ = Σ_{i=1}^{k} w(B_i). By construction w(B_i) = w_{c(i)} for some 1 ≤ c(i) ≤ ⌈log W⌉, and the tasks corresponding to the vertices in block B_i can all be scheduled on machine m_{c(i)}. Hence a solution to the LTSP instance L is obtained by loading the machines in the sequence m_{c(1)}, m_0, m_{c(2)}, m_0, ..., m_0, m_{c(k)}. Note that the cost of this solution to the LTSP instance L is at most Σ_{i=1}^{k} w(B_i) = OPT′_FJ.
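The loading sequence used in this proof is trivial to emit; the helper below (ours) makes the accounting explicit.

```python
def blocks_to_loading_sequence(block_costs):
    """Given the block costs w(B_1), ..., w(B_k) of a fork-join solution,
    emit the loading sequence m_{c(1)}, m_0, m_{c(2)}, m_0, ..., m_0, m_{c(k)};
    machines are identified by loading cost, with 0 standing for m_0."""
    seq = []
    for w in block_costs:
        if seq:
            seq.append(0)   # the zero-cost machine runs the subdivision tasks
        seq.append(w)
    return seq

# blocks_to_loading_sequence([4, 2, 8]) -> [4, 0, 2, 0, 8]; the total loading
# time is the sum of the block costs, i.e., OPT'_FJ.
```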

PROPOSITION 18. Any feasible solution for the LTSP instance yields, by mapping tasks back to instructions, a feasible solution for the instance of the fork–join problem with the same cost.

THEOREM 19. An algorithm with an approximation ratio of O(min(log W, log n)) can be designed for the fork–join problem.

Proof. Note that the cost of the optimal solution of the given fork–join instance is at least W. Let us change the execution costs of the tasks so that they all lie in the range [W/n, W]. This is done by increasing the execution cost of any task which has execution cost less than W/n to W/n. Note that this transformation can increase the cost of any feasible solution by an additive factor of at most W: there are at most n tasks, and each execution cost increases by at most W/n. Therefore the cost of the optimal solution of the new instance of the fork–join problem is at most twice the cost of the optimal solution of the original instance of


the fork–join problem. Note that when we map the new instance of the fork–join problem to an instance of the LTSP, the LTSP instance will have at most O(log n) distinct machines. We have already shown how to map the fork–join instance to an instance of the LTSP with at most O(log W) machines. Therefore we can map the fork–join instance to an instance of the LTSP with at most r = O(min(log W, log n)) machines. By using an r-approximation algorithm for the LTSP we can find a solution of this LTSP instance whose cost is within a factor r of the cost of its optimal solution. From Lemma 17 and Proposition 18 it follows that this feasible solution of the LTSP instance is a feasible solution of the original fork–join instance of cost at most O(min(log W, log n)) times the cost of its optimal solution. This therefore yields an O(min(log W, log n))-approximation algorithm for the fork–join problem.

5.2. Sarkar's Algorithm

We briefly review Sarkar's algorithm for the fork–join problem. Let pred(v) be the cost of the highest weight path from some node of in-degree 0 to v, and let succ(v) be the cost of the highest weight path from v to some node of out-degree 0. Note that these do not include the cost of v itself. Sarkar calls pred(v) the earliest starting time and DAGCP - succ(v) the latest completion time of node v, where DAGCP is the weight of the heaviest weight path in the DAG.

Initially all nodes are unmarked. The algorithm works in phases. Eachphase has five steps:

(1) Set k to the unmarked node with the largest execution cost. The set CurBlock is initialized to {k}.

(2) Compute pred(v) and succ(v) for every node v. Let T_PRED = pred(k) and T_SUCC = succ(k).

(3) Compute ParallelSet, the set of nodes that can execute concurrently with k.

(4) Repeatedly do: choose a node j from ParallelSet so as to minimize

max(T_PRED, pred(j)) + max(T_SUCC, succ(j)).

Put j in CurBlock. ParallelSet is updated to contain the nodes which can be done in parallel with all nodes in CurBlock, and T_PRED = max(T_PRED, pred(j)), T_SUCC = max(T_SUCC, succ(j)).

(5) If ParallelSet = ∅ then collapse all nodes in CurBlock to node k; i.e., all the nodes in CurBlock will be done with node k in the final schedule. Go back to step (1) with the reduced DAG, and mark k as a vertex that cannot be picked by step (1). (A sketch of the whole procedure in code follows.)
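To make the five steps concrete, here is a compact Python sketch. It is ours, not Sarkar's code: the graph representation, the recomputation of ParallelSet by reachability, and the collapse bookkeeping are our assumptions about details the description above leaves open.

```python
def topo_order(nodes, succs):
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for v in succs[u]:
            indeg[v] += 1
    order, stack = [], [v for v in nodes if indeg[v] == 0]
    while stack:
        u = stack.pop()
        order.append(u)
        for v in succs[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                stack.append(v)
    return order

def longest_paths(nodes, succs, cost):
    # pred[v] / succ[v]: heaviest path cost into / out of v, excluding v itself
    order = topo_order(nodes, succs)
    pred = {v: 0 for v in nodes}
    for u in order:
        for v in succs[u]:
            pred[v] = max(pred[v], pred[u] + cost[u])
    succ = {v: 0 for v in nodes}
    for u in reversed(order):
        for v in succs[u]:
            succ[u] = max(succ[u], succ[v] + cost[v])
    return pred, succ

def reachable(v, succs):
    seen, stack = set(), [v]
    while stack:
        for w in succs[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def sarkar(cost, succs):
    cost = dict(cost)
    succs = {v: set(succs.get(v, ())) for v in cost}
    blocks = {v: {v} for v in cost}        # representative -> merged nodes
    marked = set()
    while set(cost) - marked:
        nodes = set(cost)
        k = max(nodes - marked, key=lambda v: cost[v])            # step (1)
        pred, succ = longest_paths(nodes, succs, cost)            # step (2)
        rev = {v: set() for v in nodes}
        for u in nodes:
            for v in succs[u]:
                rev[v].add(u)
        parallel = nodes - reachable(k, succs) - reachable(k, rev) - {k}  # (3)
        cur, t_pred, t_succ = {k}, pred[k], succ[k]
        while parallel:                                           # step (4)
            j = min(parallel, key=lambda v:
                    max(t_pred, pred[v]) + max(t_succ, succ[v]))
            cur.add(j)
            t_pred, t_succ = max(t_pred, pred[j]), max(t_succ, succ[j])
            parallel -= reachable(j, succs) | reachable(j, rev) | {j}
        merged = cur - {k}                                        # step (5)
        for j in merged:
            blocks[k] |= blocks.pop(j)
            cost[k] = max(cost[k], cost.pop(j))
            succs[k] |= succs.pop(j)
        for v in succs:                    # redirect edges into merged nodes
            if succs[v] & merged:
                succs[v] = (succs[v] - merged) | {k}
        marked.add(k)
    return list(blocks.values())

# Example: a diamond a -> {b, c} -> d; b and c are parallel and get merged.
# sarkar({'a': 2, 'b': 1, 'c': 1, 'd': 2},
#        {'a': {'b', 'c'}, 'b': {'d'}, 'c': {'d'}})
```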

5.3. Bad Example for Sarkar's Algorithm

Let G = (V, E) be a DAG representing the fork–join model, where there is a nonnegative cost function w, denoting the execution time, associated with each vertex. We will denote each vertex in V by a rectangle whose vertical length is proportional to its cost. Consider an instance of the fork–join problem where the DAG is a collection of log n chains, each of unit total cost. Vertices in each chain have the same cost. The first chain has one vertex, the second chain has two vertices, the third chain has four vertices, the fourth chain has eight vertices, etc. The last chain has n/2 vertices. We claim that any solution to this fork–join instance has cost at least (1/2) log n. Consider any solution for this instance; it is a set of blocks B_1, B_2, ..., B_k as defined before. Note that there must be one block of cost 1, one other block of cost at least 1/2 (because there are two vertices of cost 1/2 which cannot both be in the same block), two other blocks of cost 1/4 (because there are four vertices of cost 1/4 and only two of them can be in the previous blocks), four other blocks of cost 1/8, and so on. So the total cost of the blocks must be at least 1 + 1/2 + 2 · (1/4) + 4 · (1/8) + ... + (n/4) · (2/n) = 1/2 + (1/2) log n.

We now construct a bad example for Sarkar's algorithm. Let X denote the chain constructed by concatenating the chains of the previous example in order. That is, chain i + 1 is placed directly after chain i in X. Between chains i and i + 1 a vertex of unit cost is inserted. The first vertex in chain X is a vertex of unit cost. We call these unit-cost vertices in chain X separator vertices. We use chain X to construct the log n chains (which form the bad example for Sarkar's algorithm) as follows: first concatenate m copies of X to obtain a new chain S. From S we create log n chains S_1, S_2, ..., where chain S_i is obtained from chain S by removing all the vertices strictly before the ith (from the beginning) "separator" vertex. An example for n = 8, m = 2 is shown in Fig. 8. Note that the vertices in the chains, shown in the figure, go from top to bottom.
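The construction is easy to reproduce; the sketch below (ours) builds the chains S_1, ..., S_{log n} for n a power of two, and summing the costs of S_1 = S recovers the optimal cost 2m log n computed next.

```python
def bad_example(n, m):
    """Return the chains S_1, ..., S_{log n} as lists of vertex costs, top to
    bottom.  X = separator + chain_1 + separator + chain_2 + ..., where
    chain_i has 2^{i-1} vertices of cost 2^{-(i-1)} and separators cost 1."""
    logn = n.bit_length() - 1
    X = []                                  # list of (cost, is_separator)
    for i in range(logn):
        X.append((1.0, True))               # unit-cost separator vertex
        X.extend([(2.0 ** -i, False)] * (2 ** i))
    S = X * m                               # m concatenated copies of X
    seps = [p for p, (c, is_sep) in enumerate(S) if is_sep]
    return [[c for c, _ in S[seps[i]:]] for i in range(logn)]

chains = bad_example(8, 2)                  # the n = 8, m = 2 instance of Fig. 8
print(sum(chains[0]))                       # cost of S = 2·m·log n = 12.0
```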

Note that since every chain S_i is constructed by deleting some vertices of chain S, S_i is a subchain of S. Therefore the optimal solution of the fork–join instance S_1, S_2, ..., S_{log n} is S = S_1, and the cost of the optimal solution is the cost of S, which is m times the cost of X. Since the cost of X is 2 log n, the cost of the optimal solution is 2m log n.

FIG. 8. Bad example for Sarkar's algorithm.

Sarkar's algorithm, when applied to our example, does the following. In the ith phase it merges the ith (from the top in Fig. 8) separator vertices of the chains, for the first Θ(m log n) phases. This is because each separator vertex has unit cost (the largest cost of any vertex), and the merging leads to no increase in T_PRED or T_SUCC if the first vertex to be picked (vertex k in the algorithm) for the phase is the "separator" vertex in the longest chain. Note also that the vertices between the ith and (i + 1)th merged "separator" vertices form an instance of the fork–join problem mentioned at the beginning of this section, for which the cost of any solution is at least (1/2) log n. So the solution produced by Sarkar's algorithm has cost at least Θ(m log² n), but as shown earlier the cost of the optimal solution is just the length of the longest chain, which is 2m log n. Therefore (for large m) Sarkar's algorithm gives an Ω(log n) approximation.

6. NP-COMPLETENESS PROOF

We prove that the loading time scheduling problem is NP-complete, even for the case of a constant number of machines and |M(i)| = 1, by a polynomial time reduction from the shortest common supersequence problem [5, problem SR8]. The proof also shows that the LTSP is Max SNP-hard.

Shortest common supersequence. Given a finite alphabet Σ, a finite set R of sequences from Σ*, and a positive integer K: is there a string X ∈ Σ* with |X| ≤ K such that each sequence S^i ∈ R is a subsequence of X, i.e., X = x_0 s^i_1 x_1 s^i_2 x_2 ... s^i_p x_p, where each x_j ∈ Σ* and S^i = s^i_1 s^i_2 ... s^i_p?

This problem is known to be NP-complete even when |Σ| = 2 [13, 10]. Recently, it has been shown to be Max SNP-hard [8] for an arbitrary alphabet size.

THEOREM 20. LTSP is NP-complete (for r ≥ 4) and Max SNP-hard.

Proof. It is easy to see that the problem is in NP since we can verify a given partitioning of V easily. We will prove that it is NP-hard by a reduction from the shortest common supersequence problem (SCS). Assume that R contains sequences S^1, ..., S^ℓ and that Σ_{i=1}^{ℓ} |S^i| = n. We will prove the problem NP-complete (for r ≥ 4) and Max SNP-hard even for the case when each task can be done only on a single machine, i.e., |M(i)| = 1.

Note that a sequence x_1 x_2 ... simply denotes a set of tasks, such that

task i can be done on machine x_i and that x_i needs to be done before x_{i+1}. For the reduction we would like to have that a shortest supersequence of the SCS instance is an optimal solution of the corresponding LTSP instance and vice versa. Note that this may not be possible if we directly view the instance of the SCS problem as an instance of the LTSP. This is because if two adjacent tasks i and j have a common machine (M(i) = M(j)), then in the optimal solution they may be scheduled on the same machine for a unit cost, but two adjacent characters in a sequence must map to different characters of the shortest common supersequence. In fact the cost of the optimal solution of the LTSP instance may be arbitrarily smaller than the length of the shortest common supersequence. Therefore in our reduction we need to introduce dummy tasks between two adjacent tasks (adjacent characters of a sequence) to ensure that if i and j are two adjacent tasks then M(i) ≠ M(j). We also need to introduce some dummy machines for this purpose. The set of machines is Σ ∪ Σ′, where Σ′ = {a′ | a ∈ Σ}. We are assuming that for every a ∈ Σ, a′ is a new letter not already in Σ. So the number of machines is twice the alphabet size. The loading time for all the machines is the same (unit loading time).

For the reduction, we create a set R′ of chains C^1, ..., C^ℓ, R′ = {s_1 s′_1 s_2 s′_2 s_3 s′_3 ... s_p s′_p | s_1 ... s_p ∈ R}. In other words, we replace every letter s_j by s_j s′_j.


We claim that the optimal schedule for the LTSP has loading time 2K if and only if the shortest supersequence of the SCS is of length K.

We first prove that if the shortest supersequence is of size K then there is a schedule with loading time 2K. Let the shortest supersequence be X = X_1 X_2 ... X_K, where X_i ∈ Σ. We leave it to the reader to verify that X_1 X′_1 X_2 X′_2 ... X_K X′_K is a valid schedule for all the chains.

We now prove that if the shortest supersequence is of length K, then any schedule must have length ≥ 2K. Let the schedule have length L. We can view this schedule as a sequence X from the alphabet Σ ∪ Σ′ of length L. From this sequence we obtain two sequences X′ and X″. X′ is obtained from X by removing all the letters from Σ′. X″ is obtained from X by removing all the letters from Σ and then substituting every letter by its corresponding letter in Σ (i.e., replacing s′_j by s_j). Note that both X′ and X″ are supersequences for the SCS instance. Therefore |X′| ≥ K and |X″| ≥ K. Thus, L = |X′| + |X″| ≥ 2K.
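The two projections used in this argument are again mechanical; a small helper (ours, using the same primed encoding as above) makes the length accounting L = |X′| + |X″| explicit.

```python
def split_schedule(X):
    """Split a schedule X over Sigma ∪ Sigma' into X' (the unprimed letters)
    and X'' (the primed letters, with their primes removed)."""
    Xp  = [s for s in X if not s.endswith("'")]
    Xpp = [s[:-1] for s in X if s.endswith("'")]
    return Xp, Xpp                          # len(X) == len(Xp) + len(Xpp)

# split_schedule(['a', "a'", 'b', "b'", 'a', "a'"])
#   -> (['a', 'b', 'a'], ['a', 'b', 'a']); both cover R = ["ab", "ba"].
```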

The following theorem is obvious.

THEOREM 21. The loading time scheduling problem can be solved in polynomial time for r = 2.

A recent result by Middendorf [12] implies that LTSP is NP-complete for r = 3.

7. WEIGHTED SHORTEST COMMON SUPERSEQUENCE PROBLEM

As an application of our result we give an r-approximation algorithm for the WSCS problem. WSCS is a generalization of the SCS problem in which the letters of the alphabet have weights associated with them and we want to compute a common supersequence of minimum weight. Note that if with every letter of the alphabet we associate a machine with loading time equal to the weight of the letter, then the sequence of machines that corresponds to the optimal solution of the WSCS instance occurs as a subsequence of a prefix of the universal sequence of total loading time at most r · OPT. Here, OPT is the weight of the optimal solution of the WSCS instance. This follows from the fact that Theorem 8 holds even for those schedules where consecutive schedulings of the same machine are allowed. So the algorithm presented in Subsection 3.2 gives an r-approximation for the WSCS problem.


ACKNOWLEDGMENTS

We are extremely grateful to Satyandra Gupta for telling us about this scheduling problem. We thank Yossi Azar and Yishay Mansour for letting us include their suggestions regarding universal sequences (Subsection 3.2). We thank Amos Fiat and Uzi Vishkin for helpful discussions. We thank Tao Jiang and Ming Li for a pointer to Ref. [13].

REFERENCES

1. Y. Azar, personal communication, 1995.
2. D. Das, S. Gupta, and D. Nau, Reducing setup cost by automated generation of redesign suggestions, in "Proc. ASME Computers in Engineering Conference," pp. 159–170, 1994.
3. J. Ferrante, K. Ottenstein, and J. Warren, The program dependence graph and its use in optimization, ACM Trans. Programming Languages and Systems (1987), 319–349.
4. D. Foulser, M. Li, and Q. Yang, Theory and algorithms for plan merging, Artificial Intelligence 57 (1992), 143–181.
5. M. R. Garey and D. S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness," Freeman, San Francisco, 1979.
6. S. Gupta, personal communication, 1994.
7. C. C. Hayes, A model of planning for plan efficiency: Taking advantage of operator overlap, in "Proc. of the 11th International Joint Conference on Artificial Intelligence," pp. 949–953, 1989.
8. T. Jiang and M. Li, On the approximation of shortest common supersequences and longest common subsequences, SIAM J. Comput. 24 (1995), 1122–1139.
9. E. Lawler, J. Lenstra, A. Rinnooy Kan, and D. Shmoys, Sequencing and scheduling: Algorithms and complexity, in "Handbooks in Operations Research and Management Science, Vol. 4: Logistics of Production and Inventory" (S. C. Graves, A. H. G. Rinnooy Kan, and P. Zipkin, Eds.), North-Holland, Amsterdam, 1993.
10. D. Maier, The complexity of some problems on subsequences and supersequences, J. Assoc. Comput. Mach. 25 (1978), 322–336.
11. Y. Mansour, personal communication, 1995.
12. M. Middendorf, Supersequences, runs, and CD grammar systems, in "Developments in Theoretical Computer Science" (J. Dassow and A. Kelemenova, Eds.), Topics in Computer Science, Vol. 6, pp. 101–114, Gordon and Breach, Amsterdam, 1994.
13. K.-J. Räihä and E. Ukkonen, The shortest common supersequence problem over a binary alphabet is NP-complete, Theoret. Comput. Sci. 16 (1981), 187–198.
14. V. Sarkar, Instruction reordering for fork–join parallelism, in "Proc. of the ACM SIGPLAN PLDI Conference," pp. 322–336, 1990.
15. T. Sellis, Multiple-query optimization, ACM Trans. Database Systems 13 (1988), 23–52.