
An Optimal Algorithm for the Longest Common Subsequence Problem *

Hua Lin Mi Lu

Texas A & M University

Jesse Fang Hewlett-Packard Laboratories

Abstract

The longest common subsequence problem is to find a longest common subsequence of two given strings. The complexity of this problem on the decision tree model is known to be mn, where m and n are the lengths of the two strings, respectively, and m ≤ n. We present a parallel algorithm for this problem on the CREW PRAM model, which takes O(log^2 m loglog m) time with mn/(log^2 m loglog m) processors when log^2 m loglog m > log n, or otherwise O(log n) time with mn/log n processors.

1 Introduction

An alphabet is a finite set of symbols. A string over an alphabet is a sequence of symbols from the alphabet. Given a string, a subsequence of the string can be obtained from the string by deleting none or some of its symbols (not necessarily consecutive ones). If string C is a subsequence of both string A and string B, then C is a common subsequence (CS) of A and B. String C is the longest common subsequence (LCS) of strings A and B if C is a common subsequence of both and is as long as any other common subsequence. For example, string "tcagg" is the longest common subsequence of strings "tcaggatt" and "gatttatgcagg". Given two strings A and B with lengths m and n, m ≤ n, respectively, the LCS problem is to identify the longest common subsequence of A and B.
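For reference, the classic sequential dynamic program realizes this definition directly with Θ(mn) symbol comparisons, which is exactly the cost that the lower bound discussed below shows to be necessary. The following minimal Python sketch is an illustration only, not part of the algorithm developed in this paper:

```python
def lcs(a, b):
    """Classic O(m*n) dynamic program for a longest common subsequence."""
    m, n = len(a), len(b)
    # length[i][j] = LCS length of the prefixes a[:i] and b[:j]
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    # backtrack from the sink to recover one LCS
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("tcaggatt", "gatttatgcagg"))  # a length-5 LCS such as "tcagg"
```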

Aho et al in [AHU76] showed that under the decision tree model, in which all decisions are whether or not two positions have the same symbol, mn comparisons are necessary for solving this problem, unless the alphabet size is fixed. Some sequential algorithms reach this lower time bound [Hirschberg77][HD84].

In recent years several parallel algorithms have been designed [AP88][AALM90][Lu90][LLA91]. Among them, Aggarwal and Park in [AP88], and Apostolico et al in [AALM90], have independently shown that this problem (and the more general problem called the string-editing problem) can be solved in O(log m log n) time using mn/log m processors on the CREW-PRAM. Their approach relates the string-editing problem to the problem of recognizing the shortest path from source to sink on an (m+1) × (n+1) grid directed graph, and uses a divide-and-conquer scheme to compute the "distance matrix," which records the shortest distance from every vertex on the left (or top) boundary of the grid directed graph to every vertex on the bottom (or right) boundary. Following the same approach, but concentrating more on exploiting the nice properties of the LCS problem, Lin et al in [LLA91] have defined a different distance matrix, which has a more compact form than the previous one, and shown that, on the CREW-PRAM model, O(log^2 m + log n) time using mn/log m processors suffices to solve the LCS problem. However, in terms of the product of the time bound and the number of processors used, none of these achieves the lower bound mn.

In this paper, we propose an optimal algorithm for the LCS problem in the sense that its time × processors product matches the sequential lower bound of the problem. The computation model we use is the concurrent-read exclusive-write (CREW) parallel random access machine (PRAM). A PRAM employs synchronous processors all having access to a common memory. A CREW-PRAM allows simultaneous access by more than one processor to the same memory location for read but not for write purposes. Our algorithm takes O(log^2 m loglog m) time with mn/(log^2 m loglog m) processors when log^2 m loglog m > log n, or otherwise takes O(log n) time with mn/log n processors.

* This research was partially supported by the National Science Foundation under grant no. MIP-8809328.




Figure 1: The grid DAG for strings "tcaggatt" and "gatttatgcagg"

This paper is a continuation of paper [LLA91]. The improvement made in this paper is based on the following observation. In [LLA91], as in [AP88] and [AALM90], at each of the log m stages of "conquer" we compute distance matrices. However, for a sub-grid directed graph of size (mi + 1) × (n + 1), the size of the corresponding distance matrix is mi × n. So suppose that at the i-th "conquer stage" the (m+1) × (n+1) grid directed graph has been divided into O(m/2^i) sub-grid directed graphs of size 2^i × (n+1); we then have to compute O(mn) entries of these matrices. In total, over all stages, we have to compute O(mn log m) entries. This suggests that any attempt to compute and record the distance matrices directly will destroy the hope of achieving an optimal algorithm. By using the nice properties of the distance matrix exploited in [LLA91], we invent an efficient data structure to represent the distance matrix such that any entry of the distance matrix can be obtained from the data structure very fast, and the size of the data structure representing an n × m distance matrix is much smaller than that of an n × m 2-dimensional array representing the same distance matrix.

The remainder of the paper is organized as follows. In Section 2, we show how the LCS problem can be viewed as a longest path problem on the grid directed graph, establish the main structure of our algorithm, and introduce some basic ideas used in the "conquer" stage. Section 3 concentrates on exploiting the properties of the distance matrix. In Section 4, these properties are applied to achieve an optimal algorithm.

2 Solving the LCS problem through the grid DAG

2.1 The grid directed acyclic graph

An l1 × l2 grid DAG is a directed acyclic graph whose vertices are the l1 × l2 grid points of an l1 × l2 grid. The only edges from grid point (i,j), referred to as vertex (i,j), are to vertices (i, j+1), (i+1, j) and (i+1, j+1). Sometimes we refer to them as horizontal, vertical and diagonal edges, respectively. Vertex (1,1) is the source, and vertex (l1, l2) is the sink. Given two strings A = a1, a2, ..., am and B = b1, b2, ..., bn, the grid DAG, G, associated with strings A and B is an (m+1) × (n+1) grid DAG such that each edge of G is associated with cost 1 if it is a diagonal edge from vertex (i,j) to vertex (i+1, j+1) and symbols ai and bj are identical, and otherwise with cost 0. The length of a path on G is defined as the sum of the costs along the path. Throughout, we presume that m, the length of A, is a power of 2. As an example, Figure 1 shows the grid DAG associated with strings "tcaggatt" and "gatttatgcagg".

Figure 2: D_GU and D_GL corresponding to G in Figure 1

OBSERVATION 2.1: Any path of length l on the grid DAG G associated with strings A and B corresponds to a CS of A and B with length l. In particular, the longest path between the source and the sink corresponds to the LCS of A and B.

So, given strings A and B with lengths m and n, respectively, to find the LCS of A and B we only need to find the longest path beginning at the source and ending at the sink on the grid DAG G associated with A and B. Consider the longest paths from a vertex on the top row of G, say vertex (1, i), to vertices on the bottom row. A vertex w on the bottom row is the j-th breakout vertex with respect to (w.r.t.) vertex (1, i) if w is the leftmost vertex on the bottom row such that there is a path from vertex (1, i) to w of length j. Sometimes we simply call it the j-th breakout vertex w.r.t. vertex (1, i), or the j-th breakout vertex of (1, i) for short. In Figure 1, vertices (9,2), (9,3), (9,5) and (9,13) are the 1-st, 2-nd, 3-rd and 4-th breakout vertices w.r.t. the source.

The n × m distance matrix associated with G, D_G, is defined as follows: for 1 ≤ i ≤ n and 1 ≤ j ≤ m, D_G(i,j) = k if vertex (m+1, k) is the j-th breakout vertex w.r.t. vertex (1, i), or D_G(i,j) = "∞" if vertex (1, i) does not have a j-th breakout vertex. Throughout, by D_G^i we denote the i-th row of D_G, and by D_GU and D_GL we denote the distance matrices associated with G_U, the upper half of G, and G_L, the lower half of G, respectively. In Figure 2, D_GU and D_GL correspond to G in Figure 1.
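To make these definitions concrete, the following Python sketch (ours, not the paper's parallel procedure) sweeps the grid DAG one row at a time and extracts the row D_G^i of breakout-vertex columns for a given top-row vertex (1, i); it uses the fact that a path of length exactly j exists wherever the longest path has length at least j:

```python
def breakout_row(a, b, i):
    """Row D_G^i of the distance matrix: entry j (1-indexed) is the column
    of the j-th breakout vertex on the bottom row w.r.t. vertex (1, i),
    or None (standing for "infinity") if no such vertex exists."""
    m, n = len(a), len(b)
    UNREACH = -1
    # best[c] = longest-path length from (1, i) to the current row's column c
    best = [UNREACH] * (n + 2)          # grid columns are 1 .. n+1
    for c in range(i, n + 2):           # on row 1 only columns >= i are reachable
        best[c] = 0
    for r in range(1, m + 1):           # push paths from row r down to row r+1
        nxt = [UNREACH] * (n + 2)
        for c in range(1, n + 2):
            val = best[c]                                # vertical edge, cost 0
            if c > 1:
                val = max(val, nxt[c - 1])               # horizontal edge, cost 0
                if best[c - 1] != UNREACH:               # diagonal edge, cost 0/1
                    val = max(val, best[c - 1] + (a[r - 1] == b[c - 2]))
            nxt[c] = val
        best = nxt
    # D_G(i, j) = leftmost bottom column whose longest-path length is >= j
    return [next((c for c in range(1, n + 2) if best[c] >= j), None)
            for j in range(1, m + 1)]

print(breakout_row("tcaggatt", "gatttatgcagg", 1))  # the row D_G^1
```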

2.2 The main structure of the algorithm

Basically the algorithm consists of the following four steps:

1. compute D_Gi, where G_i is a 2 × (n+1) grid DAG consisting of the i-th and (i+1)-th rows of G, for 1 ≤ i ≤ m;

2. recursively compute D_G from D_GU and D_GL;

3. identify vertices on the longest path between the source and the sink of G with the help of the distance matrices;



4. identify the subsequence of A that corresponds to the longest path.

In these four steps, Step 1 and Step 4 are the easy ones, so we give a short description of their implementations in the next sub-section. Steps 2 and 3 (especially Step 2) are quite complicated; we will only provide the basic ideas of their implementations in this section and leave the details to Section 4.

2.3 The implementations of Step 1 and Step 4

Let G_h be a 2 × (n+1) grid DAG consisting of the h-th and (h+1)-th rows of G, and let D_Gh be the corresponding distance matrix, with size n × 1. Let b_{i1}, ..., b_{is}, the i1-th, ..., is-th symbols of B, be all the symbols identical to a_h, the h-th symbol of A. It should be apparent that, by the definition of G_h, the values of the entries from D_Gh(1,1) to D_Gh(i1,1) are i1 + 1, the values of the entries from D_Gh(i1+1,1) to D_Gh(i2,1) are i2 + 1, and so on; when is < i ≤ n, the values of the entries D_Gh(i,1) are "∞". For example, consider the sub-grid DAG G_1 of G shown in Figure 1; we have D_G1 = (4,4,4,5,6,8,8,∞,∞,∞,∞,∞)^T. A parallel algorithm for generating D_Gh, which takes O(log n) time by using n/log n processors, should be trivial.

Now we turn to Step 4. Let p = <v_1, v_2, ..., v_l> be the longest path obtained in Step 3. In Step 4, a symbol a_i of A is marked if some edge e = (v_k, v_{k+1}) on p has cost 1 and v_k is vertex (i, j) for some j. The LCS of A and B corresponding to p can then be obtained simply by ranking the marked symbols. Since the length of p is bounded by 2n, and checking the cost of an edge can be done in constant time, marking the symbols of A can be done in constant time with n processors, or in O(log n) time with n/log n processors. The ranking job can be done in O(log n) time with n processors by using the standard technique [GR88]. Thus O(log n) time and n processors suffice for Step 4.

2.4 Basic ideas for Step 2

Given distance matrices D_GU and D_GL, each with size n × m/2, it was shown in [LLA91] that D_G can be obtained from them as follows:

LEMMA 2.1: For 1 ≤ i ≤ n and 1 ≤ j ≤ m,

D_G(i,j) = min_{0 ≤ k ≤ j} D_GL(D_GU(i,k), j−k),

where D_GU(i,0) is taken to be i and D_GL(x,0) to be x, and where we define D_GU(i,j) = D_GL(i,j) = "∞" if j > m/2, and D_GL(D_GU(i,k), j−k) = "∞" if D_GU(i,k) = "∞".

Lemma 2.1 suggests that computing entry D_G(i,j) of D_G from D_GU and D_GL is nothing more than identifying the minimum among O(m) entries. This can trivially be done in O(log m) time with m/log m processors, which implies that O(log m) time suffices to compute D_G from D_GU and D_GL with nm^2/log m processors. To achieve an efficient method to compute D_G, a more organized form of Lemma 2.1 is necessary.
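The following Python fragment evaluates Lemma 2.1 literally for one entry. The boundary conventions D_GU(i,0) = i and D_GL(x,0) = x are our reading of the degenerate splits (all breakouts in the lower or upper half, respectively); the "∞" conventions are those stated in the lemma:

```python
INF = float("inf")

def combine_entry(DGU, DGL, i, j):
    """Naive O(m) evaluation of D_G(i, j) from the half-grid distance
    matrices DGU and DGL (0-indexed Python lists of n rows, m/2 columns),
    minimizing over the split point k of Lemma 2.1."""
    half = len(DGU[0])

    def dgu(i, k):
        if k == 0:
            return i                  # assumed convention: enter G_L at column i
        return DGU[i - 1][k - 1] if k <= half else INF

    def dgl(x, k):
        if x == INF or x > len(DGL):  # "infinity" convention from Lemma 2.1
            return INF
        if k == 0:
            return x                  # assumed convention: straight down, cost 0
        return DGL[x - 1][k - 1] if k <= half else INF

    return min(dgl(dgu(i, k), j - k) for k in range(0, j + 1))
```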

Figure 3: the copy CY[k, x] of row-vector x

Figure 4: M[D_GU^i]

For a row-vector x of size m, the copy of it is a row-vector of size 2m, denoted CY[k, x] for some k between 1 and m, such that the entries of CY[k, x] from CY[k, x](k) to CY[k, x](k+m−1) are copies of the entries of x from x(1) to x(m), while all other entries hold "∞". See Figure 3. Given D_GU and D_GL, the n × m/2 distance matrices associated with G_U and G_L, we define n matrices M[D_GU^i], for 1 ≤ i ≤ n, as follows (see also Figure 4):

1. the size of M[D_GU^i] is l × m, where l is the number of breakout vertices w.r.t. vertex (1, i) on G_U;

2. the j-th row of M[D_GU^i] is CY[j+1, D_GL^{D_GU(i,j)}], for 1 ≤ j ≤ l.

As an example, let D_GU and D_GL be the matrices shown in Figure 2. Figure 5 shows the matrices M[D_GU^1] and M[D_GU^2], respectively.
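A literal, dense construction of CY[k, x] and M[D_GU^i] in Python, purely for illustration; avoiding exactly this materialization is the point of Section 4:

```python
INF = float("inf")

def cy(k, x):
    """CY[k, x]: a row of size 2m holding x at positions k .. k+m-1
    (1-indexed) and "infinity" everywhere else."""
    m = len(x)
    out = [INF] * (2 * m)
    out[k - 1:k - 1 + m] = list(x)
    return out

def build_M(DGU, DGL, i):
    """M[D_GU^i] as a dense l x m array: row j is CY[j+1, D_GL^{D_GU(i,j)}],
    where l is the number of finite breakout vertices of (1, i) on G_U."""
    rows = []
    for j, target in enumerate(DGU[i - 1], start=1):
        if target == INF:
            break                     # rows exist only for the l finite entries
        rows.append(cy(j + 1, DGL[target - 1]))
    return rows
```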

It is not hard to check that, by using matrix M[D_GU^i], Lemma 2.1 can be rewritten as follows.

LEMMA 2.2: For 1 ≤ i ≤ n and 1 ≤ j ≤ m,

D_G(i,j) = min_{1 ≤ k ≤ l} M[D_GU^i](k, j),

where l is the number of breakout vertices w.r.t. vertex (1, i) on G_U.

Thus, the computation of D_G from D_GU and D_GL can simply be viewed as the identification of the column minima of the n matrices M[D_GU^i], denoted Cmin[M[D_GU^i]], for 1 ≤ i ≤ n. There is a lot of redundant information in D_G; by using an efficient data structure to represent D_G, we can remove the redundancy and hence simplify the computation greatly.



Figure 5: M[D_GU^1] and M[D_GU^2]

3 Properties of D_G and M[D_GU^i]

Most of these properties were exploited in [LLA91].

3.1 Properties of D_G

We first state some simple facts about D_G.

PROPOSITION 3.1:

1. D_G(i, j1) < D_G(i, j2) if j1 < j2 and D_G(i, j1) ≠ "∞";

2. D_G(i1, j) ≤ D_G(i2, j) if i1 < i2;

3. D_G(i+1, j) ≤ D_G(i, j+1);

4. if D_G(i1, j1) = k and k ≠ "∞", then for any i2, i1 < i2 ≤ k, there exists j2 such that D_G(i2, j2) = k;

5. if D_G(i1, j1) = D_G(i2, j2) = k1, then for any i, i1 < i < i2, there exists j such that D_G(i, j) = k1.

Propositions 3.1(4) and 3.1(5) suggest that many rows of D_G may be very "similar" to each other. Given a row-vector, a sub-row-vector of it is obtained by deleting none or some of its entries (not necessarily consecutive ones). If a row-vector is a sub-row-vector of c_1, c_2, ..., c_l, then it is a common sub-row-vector (without causing misunderstanding, we simply call it a common row-vector) of them. Row-vectors t_1, t_2, ..., t_l, each with size m, are k-variant if there exists a common row-vector of them whose size is no less than m − k. Obviously, for example, any two consecutive row-vectors of D_GU and D_GL shown in Figure 2 are 1-variant.

THEOREM 3.1 [LLA91]:

1. D_G^i and D_G^{i+1}, the i-th and (i+1)-th rows of D_G, are 1-variant, for 1 ≤ i < n;

2. any k+1 consecutive rows of D_G are k-variant.

Consider k+1 consecutive row-vectors of D_G, say D_G^i for i1 ≤ i ≤ i1+k. A common row-vector L of them does not necessarily correspond to consecutive entries of them. We partition L into groups, L1, L2, ..., Lr, such that elements in the same group are consecutive entries in every D_G^i. The remnant of D_G^i w.r.t. L (the remnant of D_G^i for short) is a vector R[D_G^i] = (R_1^i, R_2^i, ..., R_{r+1}^i) such that D_G^i = (R_1^i, L1, R_2^i, ..., Lr, R_{r+1}^i). The size of R_j^i may be zero. The size of R[D_G^i] is defined as the sum of the sizes of the R_j^i, for 1 ≤ j ≤ r+1.

For example, consider the 1-st and 2-nd rows of D_GU in Figure 2. Their common row-vector is L = (7, 9, 12) and their remnants are R[D_GU^1] = (2) and R[D_GU^2] = (3), respectively. Now we state some facts about common row-vectors and remnants in the following two propositions.

PROPOSITION 3.2: For k+1 consecutive row-vectors D_G^i of D_G, i1 ≤ i ≤ i1+k,

1. the common row-vector L of D_G^{i1} and D_G^{i1+k} is the common row-vector of them all; moreover,

2. L can be partitioned into at most 2k groups;

3. R_j^{i1} is a sub-vector of R_j^{i2} if i1 < i2, for 2 ≤ j ≤ r+1.

Proof: Propositions 3.2(1) and 3.2(3) can be derived directly from Proposition 3.1(5). As for Proposition 3.2(2), we observe that, since there are at least m−k entries in L, L can be partitioned into at most 2k groups such that elements in the same group correspond to consecutive entries in both D_G^{i1} and D_G^{i1+k}. Moreover, Proposition 3.1(4) suggests that if k1, k2, ..., kh are consecutive entries in both D_G^{i1} and D_G^{i1+k}, then they are also consecutive in any D_G^i, for i1 < i < i1+k. □

Theorem 3.1 and Proposition 3.2 suggest the following conclusion.

COROLLARY 3.1: Each row-vector D_G^i of any k+1 consecutive rows of D_G can be represented by their common row-vector, consisting of at most 2k groups, together with the remnant R[D_G^i], whose size is at most k.

3.2 Properties of M[D_GU^i]

Recall that M[D_GU^i] is produced from D_GU^i and some related rows of D_GL. Therefore, the information redundancy in D_G suggests information redundancy in M[D_GU^i]. Let

X = (X^1, ..., X^m) and Y = (Y^1, ..., Y^m)

be two matrices, where X^i and Y^i are the i-th columns of X and Y, respectively. X is obtained from Y by k-shift, denoted X = S[k, Y], if there is a number k such that

X = (Y^{m−k+1}, ..., Y^m, Y^1, ..., Y^{m−k−1}, Y^{m−k}).

Matrix MC = (M_1, ..., M_{r1}), where the M_j are matrices each with m columns, is a common matrix of A_1, A_2, ..., A_l, each with m columns, if there exist K(i,j), for 1 ≤ i ≤ l and 1 ≤ j ≤ r1, such that

(S[K(i,1), M_1], S[K(i,2), M_2], ..., S[K(i,r1), M_{r1}])

is a sub-matrix of A_i and all rows of S[K(i,j), M_j] are consecutive ones in A_i. For example, M[D_GU^1] and M[D_GU^2] in Figure 5 have a common matrix formed from the rows they share.

We call matrix M_j the j-th group of MC. A_1, A_2, ..., A_l are k-variant if there exists a common matrix MC of them such that for each A_i there are at most k rows of it which are not in MC. As an example, the matrices M[D_GU^1] and M[D_GU^2] in Figure 5 are obviously 1-variant.

Intuitively, the similarity among the M[D_GU^i] is due to the similarity among the D_GU^i, which is described by the following statement from [LLA91].

THEOREM 3.2: If D_GU^{i1} and D_GU^{i2} are k-variant, then M[D_GU^{i1}] and M[D_GU^{i2}] are k-variant. Moreover, if L = (L1, L2, ..., Lr) is the common row-vector of D_GU^{i1} and D_GU^{i2}, where Lj is the j-th group of L, then MC = (M_1, M_2, ..., M_r) is the common matrix of M[D_GU^{i1}] and M[D_GU^{i2}], where M_j is constructed as follows: the i-th row of M_j is CY[i, D_GL^{Lj(i)}], where Lj(i) ≠ "∞".

COROLLARY 3.2: Any k+1 consecutive matrices M[D_GU^i], M[D_GU^{i+1}], ..., M[D_GU^{i+k}], for 1 ≤ i ≤ n−k, are k-variant; moreover, each of them can be represented by their common matrix, consisting of at most 2k groups, and at most k rows which are not in the common matrix.

Another nice property of M[D_GU^i], which plays an important role in our algorithm, is called totally monotone [AP88]. A 2-dimensional matrix Z is monotone if the minimum value in its i-th column lies below or to the right of the minimum value in its (i−1)-th column. (If more than one entry has the minimal value in a column, then we take the upmost one.) In particular, Z is totally monotone if every 2 × 2 sub-matrix of it is monotone.

If we distinguish "∞" entries in M[D_GU^i] in such a way that 1) of two "∞" entries in the same column, the one having a smaller index is larger than the one having a larger index if both of them are at the right side of some "finite" entries in the same rows, and 2) otherwise the one having a smaller index is smaller than the one having a larger index, then we can prove the following result.

THEOREM 3.3 [LLA91]: M[D_GU^i] is totally monotone, for 1 ≤ i ≤ n.
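For intuition, monotonicity alone already allows the column minima to be found with far fewer than all comparisons. The following sequential Python sketch is ours; the parallel CREW-PRAM bound used later (Observation 4.1 in Section 4.3) is due to [AP88]:

```python
def column_minima(Z):
    """Row indices of the column minima of a (totally) monotone matrix Z.
    Monotonicity means these indices are nondecreasing left to right, so
    solving the middle column restricts the row range of both halves."""
    rows, cols = len(Z), len(Z[0])
    ans = [0] * cols

    def solve(c_lo, c_hi, r_lo, r_hi):
        if c_lo > c_hi:
            return
        mid = (c_lo + c_hi) // 2
        # scan only the admissible row window; ties keep the upmost row
        best = min(range(r_lo, r_hi + 1), key=lambda r: Z[r][mid])
        ans[mid] = best
        solve(c_lo, mid - 1, r_lo, best)   # left minima lie at or above 'best'
        solve(mid + 1, c_hi, best, r_hi)   # right minima lie at or below 'best'

    solve(0, cols - 1, 0, rows - 1)
    return ans
```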

4 The optimal algorithm for the LCS problem

The main steps of the algorithm are described in Section 2.2. In this section we concentrate on the implementations of Step 2 and Step 3. We first establish the data structure for the distance matrix, then give the implementations of Step 2 and Step 3, and finally discuss the complexity of our algorithm.

4.1 The data structure for representing D_G

We use common row-vectors and remnants to represent the distance matrix D_G. More precisely, for each k+1 consecutive rows, say D_G^i for 1 ≤ i ≤ k+1, of the n × m distance matrix D_G, we use a 1-dimensional array to represent their common row-vector L and k+1 1-dimensional arrays to represent the k+1 remnants R[D_G^i]. Besides, in order to access any entry of D_G in this data structure very fast, we also keep some auxiliary information. The position of group Li in the row-vector D_G^i is the index of the entry of D_G^i identical to the first element of Li. We denote it by Pos[D_G^i, Li], and we denote by P[D_G^i, L] the function for which P[D_G^i, L](i) = Pos[D_G^i, Li], for 1 ≤ i ≤ r. Similarly, we define the Pos[D_G^i, Rj] and P[D_G^i, R[D_G^i]]. With the help of the position functions, we can access any entry of D_G in this data structure by using binary search. Therefore, it should be apparent that any read or write operation on D_G can be executed in O(log k) sequential time on this data structure.

As an example, if we choose k = 2, then the first three row-vectors of D_G,

D_G^1 = (2, 3, 4, 5, 13, ∞, ∞, ∞)
D_G^2 = (3, 4, 5, 11, 13, ∞, ∞, ∞)
D_G^3 = (4, 5, 6, 11, 13, ∞, ∞, ∞),

will be represented by the following data structure:

L = (4, 5, 13, ∞, ∞, ∞)

R[D_G^1] = (2, 3)    R[D_G^2] = (3, 11)    R[D_G^3] = (6, 11)

P[D_G^1, L] = (3, 5)    P[D_G^1, R[D_G^1]] = (1)
P[D_G^2, L] = (2, 5)    P[D_G^2, R[D_G^2]] = (1, 4)
P[D_G^3, L] = (1, 5)    P[D_G^3, R[D_G^3]] = (3)
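A minimal Python sketch of this representation with O(log k) entry access follows; the class layout and names are ours, and only the components (L, the remnants, and the position functions) come from the paper:

```python
from bisect import bisect_right

INF = float("inf")

class PackedRows:
    """k+1 consecutive rows of D_G stored as the shared common vector L
    (split into groups), one remnant per row, and the position functions:
    the 1-indexed start of every L-group / remnant piece inside each row."""

    def __init__(self, L_groups, remnants, pos_L, pos_R):
        self.rows = []
        for rem, pl, pr in zip(remnants, pos_L, pos_R):
            segs = sorted([(p, g) for p, g in zip(pl, L_groups)] +
                          [(p, g) for p, g in zip(pr, rem) if g])
            self.rows.append(([p for p, _ in segs],    # segment starts
                              [g for _, g in segs]))   # segment contents

    def entry(self, i, j):
        """D_G(i, j), 1-indexed, in O(log k): find the segment holding j."""
        starts, groups = self.rows[i - 1]
        s = bisect_right(starts, j) - 1
        return groups[s][j - starts[s]]

# the k = 2 example above: rows D_G^1, D_G^2, D_G^3
L = [(4, 5), (13, INF, INF, INF)]
R = [[(2, 3)], [(3,), (11,)], [(6, 11)]]
PL = [(3, 5), (2, 5), (1, 5)]
PR = [(1,), (1, 4), (3,)]
rows = PackedRows(L, R, PL, PR)
assert rows.entry(2, 4) == 11 and rows.entry(3, 1) == 4
```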

Two simple facts about the position function, which will be used later, are stated as follows.

PROPOSITION 4.1: Let Lh be the h-th group of L, the common row-vector of the D_G^i, for i1 ≤ i ≤ i1+k. Then

1. Pos[D_G^{i1+k}, Lh] ≤ Pos[D_G^i, Lh] ≤ Pos[D_G^{i1}, Lh];

2. Pos[D_G^{i1+k}, Lh] ≥ Pos[D_G^{i1}, Lh] − k.

Proof: Let Pos[D_G^{i1}, Lh] = j, i.e., D_G(i1, j) = Lh(1). For any i, i1 ≤ i, by Proposition 3.1(2) we have

D_G(i, j) ≥ D_G(i1, j) = Lh(1).

From this, by applying Proposition 3.1(1), we have

Pos[D_G^i, Lh] ≤ j,

that is, Pos[D_G^i, Lh] ≤ Pos[D_G^{i1}, Lh]. Similarly we can show that Pos[D_G^i, Lh] ≥ Pos[D_G^{i1+k}, Lh].



Now we show Proposition 4.1(2). By Proposition 3.1(3) we have

D_G(i1+1, j−1) ≤ D_G(i1, j) = Lh(1),

that is, Pos[D_G^{i1+1}, Lh] ≥ j − 1. By using Proposition 3.1(3) repeatedly, we have Pos[D_G^{i1+k}, Lh] ≥ j − k; thus Pos[D_G^{i1+k}, Lh] ≥ Pos[D_G^{i1}, Lh] − k. □

From now on, we presume that any distance matrix is represented by the above data structure. So, by computing D_G we mean computing the common row-vectors, the corresponding remnants and the position functions. Now we make a short comment on why this representation of D_G helps us achieve our objective. Note that, by Corollary 3.1, k+1 consecutive rows of the n × m distance matrix D_G can be represented by their common vector L, with size at most m, and k+1 remnants, each with size at most k; hence O(mn/k + nk) space suffices to store the distance matrix D_G. With this representation, not only has the redundant information in D_G been removed, saving space; more importantly, the number of entries to be computed has been greatly reduced. Thus it becomes possible to compute D_G within the required number of operations.

4.2 The implementation of Step 2 in the main structure

We first establish the time bound of the algorithm for computing D_G from D_GU and D_GL, and provide the proof in the next sub-section.

THEOREM 4.1: Given D_GU and D_GL, O(log m loglog m) time with m log m processors suffices to compute log^4 m consecutive row-vectors of the n × m distance matrix D_G.

COROLLARY 4.1: Given D_GU and D_GL, O(log m loglog m) time with mn/log^3 m processors suffices to obtain the n × m distance matrix D_G.

The basic approach applied in the algorithm is to partition D_G, with size n × m, into n/log^4 m sub-matrices, each consisting of log^4 m rows of D_G, and to compute these sub-matrices independently. We assign m log m processors to each sub-matrix and require the sub-matrix to be computed in O(log m loglog m) time. Without loss of generality, we concentrate on computing the sub-matrix consisting of the first log^4 m rows of D_G. The algorithm for computing these row-vectors consists of the following three steps:

1. compute the common row-vector L of the D_G^i, for 1 ≤ i ≤ log^4 m, as follows:

(a) compute D_G^1 and D_G^{log^4 m};

(b) compute the common row-vector L of D_G^1 and D_G^{log^4 m};

(c) compute the corresponding position functions P[D_G^1, L] and P[D_G^{log^4 m}, L];

2. compute the remnants R[D_G^i], for 1 ≤ i ≤ log^4 m;

3. compute the position functions P[D_G^i, L] and P[D_G^i, R[D_G^i]], for 1 ≤ i ≤ log^4 m.

To understand these steps, we only need to remember that the common row-vector L of D_G^1 and D_G^{log^4 m} is, in fact, the common row-vector of all log^4 m rows (from Proposition 3.2(1)).

4.3 The proof of Theorem 4.1

We now provide the details of implementing each of these three steps. Let us start with Step 2.1(a). By Lemma 2.2, D_G^1 and D_G^{log^4 m} can be obtained basically by identifying the column minima Cmin[M[D_GU^1]] and Cmin[M[D_GU^{log^4 m}]] of the matrices M[D_GU^1] and M[D_GU^{log^4 m}], respectively. Remember that here Cmin[Z] is a 1-dimensional array in which Cmin[Z](i) is the minimum of the i-th column of matrix Z. The computation benefits from the totally monotone property of M[D_GU^1] and M[D_GU^{log^4 m}].

OBSERVATION 4.1 [AP88]: The column minima of an m × m totally monotone array Z can be computed in O(log m) time with m log m processors on the CREW-PRAM model.

Observation 4.1 suggests that O(log m) time with m log m processors suffices to compute Cmin[M[D_GU^1]] and Cmin[M[D_GU^{log^4 m}]] if D_GU and D_GL are implemented as two 2-dimensional arrays. Since the number of groups in L is bounded by O(log^4 m) (by Proposition 3.2(2)), with the help of the position functions any read/write operation on M[D_GU^1] and M[D_GU^{log^4 m}], whose sizes are bounded by m × m, can be done on our data structure in O(loglog m) time; hence Step 2.1(a) can be done in O(log m loglog m) time with m log m processors.

The algorithm for computing the common row-vector L of D_G^1 and D_G^{log^4 m} takes advantage of the fact that the values of the entries in any row-vector of the distance matrix increase monotonically (by Proposition 3.1(1)). For each entry v of D_G^1, we assign a processor Pv to it. Processor Pv executes a binary search on D_G^{log^4 m} and marks v if there is an entry w in D_G^{log^4 m} identical to v. L can then be obtained simply by ranking the marked entries of D_G^1. In order to partition L into groups L1, L2, ..., Lr, we identify and mark every pair of consecutive elements of L, say l1 and l2, such that they are not consecutive in D_G^{log^4 m}. Since the elements in the same group must be consecutive in D_G^{log^4 m}, l1 and l2 must belong to two different groups; moreover, l1 must be the last element of some group Lj whilst l2 is the first element of group L_{j+1}. Thus we partition L into groups. Therefore, on our data structure, Step 2.1(b) can trivially be done in O(log m loglog m) time using m/2 processors.

To compute the position functions P[D_G^1, L] (or P[D_G^{log^4 m}, L]) we just identify, for each Lj, the entry of D_G^1 identical to the first element of Lj.



Since r ≤ log^4 m (by Corollary 3.1), by using binary search this can be done in O(log m) time with log^4 m processors.
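Steps 2.1(b)-(c) in sequential Python form (the parallel version assigns each entry its own processor and replaces the loop by independent binary searches; the tie-breaking among duplicate "∞" entries from Section 3 is elided here):

```python
from bisect import bisect_left

def common_row_vector(row_a, row_b):
    """Compute L and the position function P[row_a, L] for two nondecreasing
    rows (Steps 2.1(b)-(c)): keep the entries of row_a found in row_b by
    binary search, and start a new group whenever the kept entries stop
    being adjacent in both rows."""
    hits = []                       # (index in row_a, index in row_b)
    for ia, v in enumerate(row_a):
        ib = bisect_left(row_b, v)
        if ib < len(row_b) and row_b[ib] == v:
            hits.append((ia, ib))
    groups, pos = [], []
    for ia, ib in hits:
        if groups and (ia, ib) == (prev_a + 1, prev_b + 1):
            groups[-1].append(row_a[ia])    # still consecutive in both rows
        else:
            groups.append([row_a[ia]])      # a new group L_{j+1} begins
            pos.append(ia + 1)              # 1-indexed start: Pos[row_a, L_j]
        prev_a, prev_b = ia, ib
    return groups, pos
```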

Now we discuss Step 2.2, which is the most complicated step of this algorithm. Let R[D_G^a] = (R_1^a, ..., R_{r+1}^a) be the remnant of D_G^a. We are going to show how to compute the b-th group R_b^a of R[D_G^a], where 1 ≤ a ≤ log^4 m and 1 ≤ b ≤ r. We assume that L contains at least one element which is not "∞"; the case where L consists only of "∞" is easy to deal with. Since any "finite" element of R_b^a, b ≥ 2, is contained in R_b^{log^4 m} (by Proposition 3.2(3)), and all "∞" entries appear after the "finite" entries in D_G^{log^4 m}, removing the "∞" elements from R_b^{log^4 m} will not affect the computation generating the "finite" elements of R_b^a. Therefore, without loss of generality, we also assume that there is no "∞" element in R_b^{log^4 m}.

Note that, since D_G^1, D_G^{log^4 m} and L have already been computed, there should be no difficulty in computing R[D_G^1] and R[D_G^{log^4 m}] in the required time. The other remnants will be computed from them. There are two cases to be considered: b = 1 and b ≥ 2. By Proposition 3.2(3), when b ≥ 2, R_b^a is a sub-vector of R_b^{log^4 m}, which, together with the monotonicity of the values of the entries of D_G^a, implies that R_b^a is the largest common sub-vector of D_G^a and R_b^{log^4 m}. Therefore, if D_G^a were given, then R_b^a could be obtained by identifying the largest common sub-vector of D_G^a and R_b^{log^4 m}. However, this approach does not work because D_G^a is unknown. So, instead of using D_G^a, we are going to define a sub-row SD[a,b] of D_G^a such that R_b^a is a sub-vector of SD[a,b]. Obviously, R_b^a is the largest common sub-vector of SD[a,b] and R_b^{log^4 m}. Remember that the size of R[D_G^{log^4 m}] is bounded by log^4 m, so the size of R_b^{log^4 m} is at most log^4 m. An approach similar to the one used in Step 2.1(a) suggests the following lemma.

LEMMA 4.1: Suppose SD[a,b] is given, where b ≥ 2, and moreover the size of SD[a,b] is bounded by log^4 m; then R_b^a can be computed in O(loglog m) time with log^4 m processors.

When b = 1, it can easily be shown, by Proposition 3.1(3), that the size of R_1^a is bounded by log^4 m, so we define SD[a,1] by simply taking the first log^4 m entries of D_G^a. It should be clear that R_1^a consists of the first |R_1^a| entries of SD[a,1], and that the (|R_1^a| + 1)-th entry of SD[a,1] is identical to L1(1). So, we have

LEMMA 4.2: Suppose that SD[a,1] and L(1) are given; then log^4 m processors suffice to identify R_1^a in constant time.

By Lemma 4.1 and Lemma 4.2, together with the fact that r, the number of groups of L, is bounded by log^4 m, we have

COROLLARY 4.2: Suppose that SD[i,j] and L(1) are given, and moreover the size of SD[i,j] is bounded by log^4 m; then R_j^i, for 1 ≤ i ≤ log^4 m and 1 ≤ j ≤ r, can be obtained in O(loglog m) time by using log^16 m processors.
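The core of Lemma 4.1 is the intersection of two increasing vectors, e.g. SD[a,b] and R_b^{log^4 m}. Sequentially, a two-pointer merge suffices, as sketched below; the paper instead uses log^4 m processors doing binary searches to reach O(loglog m) parallel time:

```python
def largest_common_subvector(sd, r_big):
    """Largest common sub-vector of two strictly increasing vectors:
    a standard sorted-merge intersection."""
    out, x, y = [], 0, 0
    while x < len(sd) and y < len(r_big):
        if sd[x] == r_big[y]:
            out.append(sd[x]); x += 1; y += 1
        elif sd[x] < r_big[y]:
            x += 1
        else:
            y += 1
    return out
```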

Since L1(1) has already been computed in Step 2.1(b), we concentrate on the crucial problem of how to obtain SD[a,b], b ≥ 2, such that R_b^a is a sub-vector of it and its size is bounded by log^4 m. We first state the result.

LEMMA 4.3: SD[a,b] can be generated in O(log m loglog m) time with a polylogarithmic number of processors.

In order to apply Lemma 2.2 to generate SD[a,b], we first have to decide which entries of D_G^a are to be included in SD[a,b]. By Ind[SD[a,b]] we establish the relationship between SD[a,b] and D_G^a, i.e.,

Ind[SD[a,b]](i) = l if SD[a,b](i) = D_G^a(l).

Before giving the method for choosing Ind[SD[a,b]], we first take a look at the relationship between Pos[D_G^a, R_b^a] and Pos[D_G^{log^4 m}, R_b^{log^4 m}]. Since Pos[D_G^a, Lb] is between Pos[D_G^{log^4 m}, Lb] and Pos[D_G^{log^4 m}, Lb] + log^4 m (from Proposition 4.1), Pos[D_G^a, R_b^a] should be between l and l + log^4 m, where l = Pos[D_G^{log^4 m}, R_b^{log^4 m}]. Recall that the size of R[D_G^a] is bounded by log^4 m, and so is the group R_b^a. Thus we conclude that if SD[a,b] consists of the 2log^4 m entries of D_G^a from entry D_G(a, l) to entry D_G(a, l + 2log^4 m − 1), then R_b^a is a sub-vector of it as required. Therefore, we choose Ind[SD[a,b]] with size 2log^4 m as follows: for 1 ≤ i ≤ 2log^4 m,

Ind[SD[a,b]](i) = l + i − 1,   (1)

where l = Pos[D_G^{log^4 m}, R_b^{log^4 m}].

Having Ind[SD[a,b]], by Lemma 2.2, to compute SD[a,b] we only need to identify the column minima of the sub-matrix M[SD[a,b]] of M[D_G^a] such that the i-th column of M[SD[a,b]] is the (Ind[SD[a,b]](i))-th column of M[D_G^a]. It should be clear that M[SD[a,b]] is totally monotone and its size is bounded by m/2 × 2log^4 m. Even though the size of M[SD[a,b]] is very small compared with that of M[D_G^a], it is still not small enough for us to be able to identify its column minima by applying Lemma 2.2 in the required time with the required number of processors. For this reason, we introduce another matrix M'[SD[a,b]] such that M'[SD[a,b]] is much smaller than M[SD[a,b]], and is equivalent to M[SD[a,b]] in the sense that the column minima of M'[SD[a,b]] are the same as those of M[SD[a,b]].

Before we give the formal definition of M'[SD[a,b]], we need some notation. Recall that M[D_GU^a] comes from D_GU and D_GL. Let D_GU^i, for 1 ≤ i ≤ log^4 m, be the first log^4 m rows of D_GU, and let L' = (L'_1, ..., L'_{r1}) be the common row-vector of them. Consider the matrix MC = (M_1, ..., M_{r1}), where the i-th row of M_j is CY[i, D_GL^{L'_j(i)}]. By Theorem 3.2, MC is the common matrix of the M[D_GU^i], for 1 ≤ i ≤ log^4 m. Remember that X[i,j] is the sub-matrix, in M[D_GU^i], corresponding to M_j, i.e., X[i,j] = S[K(i,j), M_j]. By SM[SD[i,j], k] we refer to the sub-matrix of both M[SD[i,j]] and X[i,k] that has the maximum number of columns and rows. See Figure 6. Matrix M'[SD[a,b]] is obtained from M[SD[a,b]] by replacing the matrices SM[SD[a,b], k] with Cmin[SM[SD[a,b], k]], for 1 ≤ k ≤ r1.

Figure 6: matrix SM[SD[i,j], k]

The following facts about M'[SD[a,b]] are obvious.

PROPOSITION 4.2:

1. Cmin[M'[SD[a,b]]] = Cmin[M[SD[a,b]]];

2. the size of M'[SD[a,b]] is bounded by (2log^4 m) × (2log^4 m).

According to these facts, it is not hard to see that if we have already obtained M'[SD[i,j]], for 1 ≤ i ≤ log^4 m and 1 ≤ j ≤ r, and they are represented by arrays, then Cmin[M[SD[i,j]]], for 1 ≤ i ≤ log^4 m and 1 ≤ j ≤ r, can be computed in O(loglog m) time with a polylogarithmic number of processors.

Once we have obtained Cmin[M[SD[a,b]]], SD[a,b] can easily be obtained from it by performing some comparisons (by Lemma 2.2), which needs only constant time and a polylogarithmic number of processors; so to show Lemma 4.3 we only need to show that M'[SD[i,j]], for 1 ≤ i ≤ log^4 m and 1 ≤ j ≤ r, can be generated in the required time. Note that the numbers of both rows and columns of M'[SD[a,b]] are bounded by a polylogarithm, and each entry, except those belonging to the Cmin[SM[SD[a,b], k]], can be obtained from D_GU and D_GL in O(loglog m) time. Therefore,

OBSERVATION 4.2: Suppose Cmin[SM[SD[a,b], k]], for 1 ≤ k ≤ r1, are given; then Cmin[M[SD[a,b]]] can be obtained in the required time with a polylogarithmic number of processors.

Now we discuss how to compute the Cmin[SM[SD[a,b], k]] in the required time. It is not hard to see that we cannot afford to compute the Cmin[SM[SD[i,j], k]] by identifying the column minima of each SM[SD[i,j], k] independently. The alternative strategy is based on the fact that every Cmin[SM[SD[i,j], k]] is a sub-vector of Cmin[X[i,k]], and moreover,

OBSERVATION 4.3: The column minima Cmin[X[i,k]] can be obtained from Cmin[M_k], for 1 ≤ k ≤ r1, where M_k is the k-th group of the common matrix MC.

This is just because X[i,k] is obtained from M_k by K(i,k)-shift, for 1 ≤ k ≤ r1.

Observation 4.3 suggests that instead of generating all the Cmin[SM[SD[i,j], k]] independently, we only need to generate all the Cmin[X[i,k]]; moreover, instead of generating all the Cmin[X[i,k]] independently, we only need to generate all the Cmin[M_k]. Though the number of operations is greatly reduced by this approach, we still cannot afford to apply it directly. This is because, typically, there are O(log^4 m) matrices M_k, each with size (m/log^4 m) × m; it is impossible to compute all of their column minima in O(log m loglog m) time by using just m log m processors. However, notice that not all entries of Cmin[M_k] will be used for generating M'[SD[i,j]]. Similarly as before, we can define a sub-matrix SM_k of M_k such that all the entries of Cmin[M_k] which are used to generate the Cmin[SM[SD[i,j], k]], for 1 ≤ i ≤ log^4 m and 1 ≤ j ≤ r1, are included in Cmin[SM_k]. Also, by Ind[SM_k] we establish the concrete relationship between SM_k and M_k: the h-th column of SM_k is the Ind[SM_k](h)-th column of M_k.

The procedure which takes the Ind[SD[i,j]], for 1 ≤ i ≤ log^4 m and 1 ≤ j ≤ r1, as inputs and generates Ind[SM_k] is described as follows.

procedure Ind[SM_k]
{ Suppose that K(i,k) is the coefficient such that X[i,k] = S[K(i,k), M_k]. }

1. for each of the elements of Ind[SD[i,j]], for 1 ≤ i ≤ log^4 m and 1 ≤ j ≤ r1, assign a processor to it; each processor P_{i,j,h} assigned to element Ind[SD[i,j]](h) of Ind[SD[i,j]] computes an index y such that if Ind[SD[i,j]](h) − K(i,k) ≥ 1 then y = Ind[SD[i,j]](h) − K(i,k); otherwise, y = m − K(i,k) + Ind[SD[i,j]](h);

2. sort all the indices generated in the first step, and mark an index if it is the leftmost one among those which have the same value;

3. Ind[SM_k] is obtained by ranking those marked indices.

By the definition of the common matrix, it is not hard to see that the y-th column of M_k is the same column as the (Ind[SD[i,j]](h))-th column of X[i,k]. So, through Step 1 of this procedure, all columns of M_k which may be required to generate SD[i,j] are identified. The intention of Steps 2 and 3 of this procedure is trivial.
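A sequential Python rendering of this procedure; the dictionary-based bookkeeping is ours, and sorting plus keeping one representative of equal values plays the role of Steps 2 and 3:

```python
def ind_smk(ind_sd, K, k, m):
    """Generate Ind[SM_k]. ind_sd[(i, j)] is the list Ind[SD[i, j]];
    K[(i, k)] is the shift with X[i, k] = S[K(i, k), M_k]. Every column of
    M_k that some SD[i, j] may touch is mapped back through the shift."""
    cols = set()
    for (i, j), indices in ind_sd.items():
        shift = K[(i, k)]
        for h in indices:
            # undo the K(i, k)-shift: column h of X[i, k] is column y of M_k
            y = h - shift if h - shift >= 1 else m - shift + h
            cols.add(y)
    return sorted(cols)              # ranking the marked (distinct) indices
```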



Notice that both r1 and the size of the Ind[SD[i,j]] are bounded by O(log^4 m), so O(log^12 m) processors suffice for this procedure. Obviously, Step 1 takes constant time while Steps 2 and 3 take O(log m) time. Hence, to compute the Ind[SM_k], for 1 ≤ k ≤ r1, O(log m) time suffices with the required number of processors.

Since M_k is totally monotone, so is SM_k. In order to give the time bound for applying the search technique on monotone arrays to compute Cmin[SM_k], we consider the size of SM_k. As mentioned before, the number of columns of SM_k is bounded by O(log^12 m). There is no doubt that the total number of rows in the SM_k, for 1 ≤ k ≤ r1, is bounded by m/2, and the number of groups, r1, is at most log^4 m. So, without loss of generality, we assume that there are at most (m/2)/log^4 m rows in each SM_k; otherwise we just divide a larger SM_k into several smaller ones, each consisting of at most (m/2)/log^4 m rows, and the total number of SM_k's will at most double. Therefore, according to Observation 4.1, m log m processors suffice to compute Cmin[SM_k], for 1 ≤ k ≤ r1, in O(log m loglog m) time.

It is worth noticing that the number of columns of SM_k is bounded by O(log^12 m). So, if Cmin[SM_k] has already been generated, then any entry of Cmin[SM[SD[i,j], k]] can be obtained from Cmin[SM_k] in O(loglog m) time. That is because the h-th entry of Cmin[SM[SD[i,j], k]] is the (Ind[SD[i,j]](h))-th entry of Cmin[X[i,k]], which is the (Ind[SD[i,j]](h) − K(i,k))-th entry of Cmin[M_k] if Ind[SD[i,j]](h) − K(i,k) ≥ 1, or otherwise the (m − K(i,k) + Ind[SD[i,j]](h))-th entry of Cmin[M_k]. Since the entry we are looking for is included in Cmin[SM_k], with the help of Ind[SM_k] we can find it in Cmin[SM_k] through a binary search.

Once the Cmin[SM[SD[a,b], k]] have been obtained, as mentioned in Observation 4.2, Cmin[M[SD[a,b]]] can be obtained in the required time. Thus we complete the proof of Lemma 4.3. Corollary 4.2, together with Lemma 4.3, shows that Step 2.2 can be done as required.

Since L and the R[D_G^i], for 1 ≤ i ≤ log^4 m, have been computed, there is no difficulty in computing P[D_G^i, L] and P[D_G^i, R[D_G^i]], for 1 ≤ i ≤ log^4 m, because the numbers of groups of L and of R[D_G^i] are bounded by a polylogarithm. Thus we complete the proof of Theorem 4.1.

4.4 The implementation of Step 3 in the main structure

The objective of this sub-section is to describe an approach to identify the vertices on the longest path between the source and the sink of G.

In general, the longest path p on the (m+1) × (n+1) grid DAG G from the source to the sink can be represented by p = <v_1, v_2, ..., v_l>, where v_1 = (1,1) is the source whilst v_l = (m+1, n+1) is the sink. Since we presume m ≤ n, there may be more than one vertex of p lying on the same row of G. A vertex v_i of p is a cross vertex if v_i is the leftmost one among the vertices of p which lie on the same row of G. To distinguish cross vertices from the other vertices of p, we denote the cross vertex on the j-th row of G by v_i[j] (sometimes simply v[j]). In particular, v_1 = v_1[1].

Step 3 in the main structure is implemented in two stages:

1. identify the cross vertices v[i] of p, for 1 ≤ i ≤ m+1;

2. identify the other vertices of p.

We first state the result, then provide the proof in the remainder of this sub-section.

THEOREM 4.2: Suppose that all the distance matrices are given; then Step 3.1 and Step 3.2 can be done in O(log^2 m) time and O(log n) time, respectively, with n processors.

The result about Step 3.1 will be shown by presenting its implementation. Step 3.1 is implemented by a procedure called CrossVertex(v[i1], v[i2]) which, with the help of the distance matrices obtained from Step 2 in the main structure, returns all cross vertices of p between v[i1] and v[i2]. CrossVertex(v[i1], v[i2]) is a simple recursive procedure:

procedure CrossVertex(v[i1], v[i2])

1. v[(i1+i2)/2] ← Θ(v[i1], v[i2]);

2. call CrossVertex(v[i1], v[(i1+i2)/2]) and CrossVertex(v[(i1+i2)/2], v[i2]) if i1 ≠ i2.
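The recursion in executable form, where theta stands for the operator Θ above, assumed given, which locates the middle cross vertex from the distance matrices (Lemma 4.4); unlike the pseudocode, this sketch stops explicitly once the two rows are adjacent:

```python
def cross_vertices(i1, i2, v, theta):
    """Fill v[mid] for every row between i1 and i2 (Step 3.1). v maps a row
    index to its cross vertex; the two recursive calls are independent and
    run in parallel in the paper, giving the O(log^2 m) bound below."""
    if i2 - i1 <= 1:
        return
    mid = (i1 + i2) // 2
    v[mid] = theta(v[i1], v[i2])     # O(log m) time with m/2 processors
    cross_vertices(i1, mid, v, theta)
    cross_vertices(mid, i2, v, theta)
```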

Initially, we run CrossVertex(v[1], v[m+1]) (remember we have presumed that m is a power of 2). In order to analyze the complexity of Θ(v[i1], v[i2]), without loss of generality we consider Θ(v[1], v[m+1]), where v[1] = (1,1) and v[m+1] = (m+1, n+1). We are going to assign m/2 processors to compute it.

LEMMA 4.4: O(log m) time suffices to compute Θ(v[1], v[m+1]) (i.e., to identify the cross vertex v[m/2 + 1]).

Proof: Suppose that the length of the longest path between v[1] and v[m+1] is j. Considering the fact that any breakout vertex of v[1] on the boundary between G_U and G_L may be v[m/2 + 1], we assign a processor to each breakout vertex of v[1]. The behavior of processor P_i assigned to the i-th breakout vertex is as follows. With the help of the position functions, taking O(loglog m) time, processor P_i obtains entry D_GU(1, i) from the distance matrix. Then, taking another O(loglog m) time, P_i checks whether vertex (m+1, D_GL(D_GU(1,i), j−i)) is identical to vertex v[m+1], and marks vertex (m/2+1, D_GU(1,i)) on G_U if it is. Remember that the longest path we are interested in is the "lowest" one among all the longest paths, so vertex v[m/2+1] is the leftmost one among the marked vertices, which can be identified from them in O(log m) time. □

Now we turn back to CrossVertex. Let T(m) be the time bound for CrossVertex(v[1], v[m+1]). Lemma 4.4 suggests that T(m) = O(log^2 m), which can be found from the following recurrence:

T(k) ≤ T(k/2) + c1 log k, for 2 < k ≤ m.
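Unrolling the recurrence confirms the claimed bound:

```latex
T(m) \le \sum_{i=0}^{\log m - 1} c_1 \log\frac{m}{2^i}
     = c_1 \sum_{i=1}^{\log m} i
     = c_1 \, \frac{\log m \,(\log m + 1)}{2}
     = O(\log^2 m).
```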



Obviously, the number of processors needed is bounded by n.

Step 3.2 is quite easy. Let v[i] be vertex (i, j1) and v[i+1] be vertex (i+1, j2) of G. It should be clear that all vertices (i, j), for j1 ≤ j < j2, are vertices of p between v[i] and v[i+1]. So, once all cross vertices have been identified through Step 3.1, there is no difficulty in printing out all vertices of p in O(log n) time with n processors. Thus we have completed the proof of Theorem 4.2.

4.5 The complexity of the algorithm

We first establish the result.

THEOREM 4.3: The LCS problem can be solved by using mn/(log^2 m loglog m) processors in O(log^2 m loglog m) time when log^2 m loglog m > log n, or otherwise by using mn/log n processors in O(log n) time.

To prove this, we just count the complexity of each of the steps in the main structure. Here we only discuss the complexity of Step 1 and Step 2; the discussion of Step 1 is applicable to Step 3 and Step 4.

We have proved, in Section 2.3, that Step 1 can be done in O(log n) time with mn/log n processors. To be consistent with Theorem 4.3, we just point out that, when log^2 m loglog m > log n, instead of using mn/log n processors we can use only mn/(log^2 m loglog m) processors, and the same algorithm for Step 1 can be simulated in O(log^2 m loglog m) time by those processors. This is due to Brent's principle.

THEOREM 4.4 [Brent74]: Let S be a given algorithm with a parallel computation time of t. Suppose that S involves a total number of m computational operations. Then S can be implemented using p processors in O(m/p + t) parallel time.
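For instance, Step 1 consists of O(mn) operations performed in t = O(log n) time; with p = mn/(log^2 m loglog m) processors, Theorem 4.4 gives

```latex
O\!\left(\frac{mn}{p} + t\right)
  = O\!\left(\log^2 m \log\log m + \log n\right)
  = O\!\left(\log^2 m \log\log m\right),
```

since log^2 m loglog m > log n in this case.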

Now we turn to Step 2. We have to show that it can be done in O(log^2 m loglog m) time with mn/(log^2 m loglog m) processors and that, if we have only mn/log n processors, where mn/log n < mn/(log^2 m loglog m), then O(log n) time suffices. Here we only provide the details for the first result; the second is handled by applying Brent's principle.

Consider the i-th stage of the O(log m) stages of Step 2. We are dealing with g_i, where g_i = O(m/2^i), grid DAGs, each with size O(2^i) × (n+1). For any of them, by Corollary 4.1, the corresponding distance matrix can be obtained in t_i time by using p_i processors, where t_i = O(log(2^i) loglog(2^i)) and p_i = 2^i n / log^3(2^i). Since in total P processors, where P = mn/(log^2 m loglog m), are available, we can compute s_i distance matrices simultaneously, where s_i = P/p_i. Remember that there are in total g_i distance matrices to be computed; therefore, when g_i > s_i, i.e., i < (log^2 m loglog m)^{1/3}, O((g_i/s_i) t_i) time suffices for this stage, while when g_i ≤ s_i, t_i time is enough. This discussion, together with the fact that there are O(log m) stages in Step 2, results in the time bound T for Step 2, which can be found as follows:

T = Σ_{i=1}^{i1} (g_i/s_i) t_i + Σ_{i=i1+1}^{log m} t_i
  ≤ c2 log^2 m loglog m Σ_{i=1}^{i1} (log i)/i^2 + c3 loglog m Σ_{i=i1+1}^{log m} i
  = O(log^2 m loglog m),

where i1 = (log^2 m loglog m)^{1/3}. Thus we have proved that O(log^2 m loglog m) time with mn/(log^2 m loglog m) processors suffices for Step 2.

Hence, we have completed the proof of Theorem 4.3.

5 Conclusion

We gave an optimal algorithm on the CREW-PRAM model for the LCS problem, based upon exploiting nice properties of the distance matrix.

References

[AHU76] A. V. Aho, D. S. Hirschberg and J. D. Ullman, "Bounds on the complexity of the longest common subsequence problem," J. Assoc. Comput. Mach. 23(1) (Jan. 1976), pp. 1-12.

[AP88] A. Aggarwal and J. Park, "Notes on searching in multidimensional monotone arrays," Proc. 29th Annual IEEE Symposium on Foundations of Computer Science, IEEE Computer Society, Washington, DC, 1988, pp. 497-512.

[AALM90] A. Apostolico, M. J. Atallah, L. L. Larmore, and S. McFaddin, "Efficient parallel algorithms for string editing and related problems," SIAM J. on Computing, Vol. 19, No. 5, pp. 968-988, Oct. 1990.

[Brent74] R. P. Brent, "The parallel evaluation of general arithmetic expressions," J. Assoc. Comput. Mach. 21(1974), pp. 201-206.

[GR88] A. Gibbons and W. Rytter, Efficient Parallel Algorithms, Cambridge University Press, 1988.

[Hirschberg77] D. S. Hirschberg, "Algorithms for the longest common subsequence problem," J. ACM, 24(1977), pp. 664-675.

[HD84] W. J. Hsu and M. W. Du, "New algorithms for the LCS problem," J. Comput. Sys. Sci., 29(1984), pp. 133-152.

[LLA91] H. Lin, M. Lu and J. Abello, "An efficient parallel algorithm for the longest common subsequence problem," accepted by Parallel Computing 91, Sept. 1991, London, U.K.

[Lu90] M. Lu, "A parallel algorithm for longest-common-subsequence computing," Proc. of International Conference on Computing and Information, Ontario, Canada, May 1990, pp. 372-377.
