Direct and implicit optical matrix-vector algorithms

David Casasent and Anjan Ghosh

New direct and implicit algorithms for optical matrix-vector and systolic array processors are considered. Direct rather than indirect algorithms to solve linear systems and implicit rather than explicit solutions to solve second-order partial differential equations are discussed. In many cases, such approaches more properly utilize the advantageous features of optical systolic array processors. The matrix-decomposition operation (rather than solution of the simplified matrix-vector equation that results) is recognized as the computationally burdensome aspect of such problems that should be computed on an optical system. The Householder QR matrix-decomposition algorithm is considered as a specific example of a direct solution. Extensions to eigenvalue computation and formation of matrices of special structure are also noted.

I. Introduction

Optical matrix-vector processors represent a major new class of general-purpose optical system. The original matrix-vector processors^{1-3}, most recently described in Ref. 4, were recently extended to iterative processors^{5,6} and to systolic array architectures.^{7-9} The newest^9 optical systolic array processor architecture (Fig. 1) uses a frequency-multiplexed AO cell to realize matrix-matrix operations. In the system of Fig. 1, the outputs from a linear array of LEDs are imaged through separate sections of an acoustooptic (AO) cell, and the Fourier transform of the light distribution leaving the AO cell is formed on a linear output photodetector array. To realize matrix-matrix multiplications on this system, we time and space multiplex the rows and columns of a matrix A and feed one column or row of matrix data in parallel to the LEDs at each unit of time. The second matrix is entered into the AO cell sequentially, one column or row at a time in parallel, by frequency-multiplexing. Details of this system are provided in Ref. 9. Depending upon the multiplexing scheme, this system can produce the matrix-matrix products C = A B or B A, with one row (or column) of C produced on the output detector array in parallel at each bit time T_B. In this paper, we shall consider several new algorithms for use on such a processor.
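As a modern illustration of this data flow, the following minimal numpy sketch emulates the row-at-a-time product just described. The function name and loop structure are ours, and all optical timing and multiplexing details are abstracted away.

```python
import numpy as np

def systolic_matmul(A, B):
    """Numerical emulation of the Fig. 1 data flow: B is resident
    (frequency-multiplexed) in the AO cell, one row of A drives the
    LED array at each bit time TB, and the detector array reads out
    one row of C = A B in parallel.  Illustrative sketch only."""
    C = np.zeros((A.shape[0], B.shape[1]))
    for t in range(A.shape[0]):      # one bit time TB per pass
        C[t, :] = A[t, :] @ B        # N inner products formed in parallel
    return C

A, B = np.random.rand(4, 4), np.random.rand(4, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```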

Prior uses of matrix-vector and systolic array optical processors have concentrated on iterative or indirect solutions x to the general matrix-vector equation A x = b (Refs. 5 and 6) and iterative solutions to partial differential equations.^8 (We will denote matrices by capital letters and vectors by lower case letters. We will use underbars to distinguish matrices and vectors from their elements.) In this paper, we advance direct (rather than indirect) solutions to A x = b on such processors as well as implicit (rather than explicit) solutions to partial differential equations on such systems. In Sec. II, we discuss implicit and explicit solutions to partial differential equations, and we briefly survey various direct solution techniques for linear systems. We also show that direct techniques are appropriate for many implicit problem solutions. In Sec. III, we detail the steps required in the Householder QR matrix-decomposition algorithm we selected for our direct solution case study. In Sec. IV, we discuss the realization and the time required to achieve this matrix-decomposition algorithm on the optical systolic array processor of Fig. 1 with attention to pipelining of data and operations. The Householder QR decomposition algorithm is very general, as we discuss in Sec. V, where we note its use in producing matrices with special structures (e.g., Hessenberg, tridiagonal, and pentadiagonal) and in computing the eigenvalues of a matrix.

II. Implicit and Direct Solutions

Let us consider solution of a second-order partial differential equation such as the diffusion equation:

$$\frac{\partial u(x,t)}{\partial t} = c^2\,\frac{\partial^2 u(x,t)}{\partial x^2} \qquad (1)$$

The authors are with Carnegie-Mellon University, Department of Electrical Engineering, Pittsburgh, Pennsylvania 15213.

Received 4 May 1983.

0003-6935/83/223572-07$01.00/0.

© 1983 Optical Society of America.

for u(x,t). Our general approach applies to elliptic and hyperbolic partial differential equations, as well as the parabolic equation (1). In the time-dependent problem in Eq. (1), we specify boundary conditions at t = 0 and at x = 0 and x = L:


Fig. 1. Schematic diagram of a frequency-multiplexed matrix-matrix systolic array processor. [Figure: LDs/LEDs, shifted AO cell containing B = {b_nm} = b(x,t), FT lens, and output detector array producing C = {c_mn} = c(x,t) = A B.]

u(x,0) = f(x), for 0 ≤ x ≤ L, (2a)

u(0,t) = u(L,t) = g(t), for 0 ≤ t ≤ T. (2b)

As our time and space increments, we use Δt and Δx = L/(J + 1), where jΔx and nΔt represent points in our 2-D space and time grid. Applying single-differencing in time and space to both sides of Eq. (1), we obtain

$$\frac{u_j^{n+1} - u_j^n}{\Delta t} = c^2\,\frac{u_{j+1}^n - 2u_j^n + u_{j-1}^n}{(\Delta x)^2}, \qquad (3)$$

where superscripts denote time increments and subscripts denote space increments. Rearranging Eq. (3), we obtain

u_j^{n+1} = λu_{j+1}^n + (1 - 2λ)u_j^n + λu_{j-1}^n, for n ≥ 0, 1 ≤ j ≤ J, (4)

where λ = c²Δt/(Δx)² ≤ 0.5 is required for stability.

To solve Eq. (1) in the form in Eq. (4), we calculate u_j^1 (at t = Δt) for all 1 ≤ j ≤ J. We then calculate u_j^2 for all j, etc. As formulated, this algorithm is easily realized digitally, as it does not require the solution of matrix-vector equations. For this reason, it is referred to as an explicit solution. It is, of course, possible to formulate Eq. (4) as a matrix-vector equation. However, this is not necessary. Moreover, the finite-differencing used in Eq. (3) represents a poor approximation to Eq. (1) unless Δx is very small. From our stability requirement, we see that Δt must be much less than (Δx/c)². Thus a good approximation with this algorithm will require a large number of calculations and cycles, and, more important, it will require increased precision in the representation of the data. This latter issue is of concern in an optical processor. The stability of the solution in Eq. (4) is thus of major concern. Thus, in general, such explicit solutions to partial differential equations are not attractive.
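A minimal sketch of the explicit update of Eq. (4) follows; the grid sizes, c, and the initial profile f(x) are illustrative assumptions, with g(t) = 0.

```python
import numpy as np

# Explicit (single-differenced) update of Eq. (4) for the diffusion
# equation (1).  Parameters and f(x) are illustrative; g(t) = 0.
c, L, J = 1.0, 1.0, 49
dx = L / (J + 1)
lam = 0.4                          # lambda = c^2*dt/dx^2 <= 0.5 for stability
dt = lam * dx**2 / c**2
x = np.linspace(0.0, L, J + 2)
u = np.sin(np.pi * x)              # u(x,0) = f(x)
for n in range(200):               # march 200 time steps
    u[1:-1] = lam * u[2:] + (1 - 2 * lam) * u[1:-1] + lam * u[:-2]  # Eq. (4)
    u[0] = u[-1] = 0.0             # boundary values g(t) = 0
```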

We thus consider and advance the use of an implicit solution to Eq. (1) on an optical processor. In such a solution, we perform a double-differencing of the right-hand side of Eq. (1) and obtain

$$\frac{u_j^{n+1} - u_j^n}{\Delta t} = \frac{c^2}{2(\Delta x)^2}\left[\left(u_{j+1}^{n+1} - 2u_j^{n+1} + u_{j-1}^{n+1}\right) + \left(u_{j+1}^n - 2u_j^n + u_{j-1}^n\right)\right]. \qquad (5)$$

Rearranging Eq. (5), we obtain

$$(1 + 2\lambda)u_j^{n+1} - \lambda u_{j-1}^{n+1} - \lambda u_{j+1}^{n+1} = \lambda u_{j+1}^n + (1 - 2\lambda)u_j^n + \lambda u_{j-1}^n, \qquad (6)$$

where λ = c²Δt/[2(Δx)²]. This Crank-Nicolson algorithm can be formulated as the matrix-vector equation

A u^{n+1} = b^n, (7)

where A is tridiagonal with elements -λ, 1 + 2λ, and -λ along the three diagonals, where b^n is known from the values at prior time steps (and boundary conditions), and u^{n+1} is to be calculated for all space coordinates at each time step n. The implicit solution is unconditionally stable and has second-order accuracy, and from this consideration alone an implicit solution is preferable.
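A sketch of one Crank-Nicolson step follows, building the tridiagonal A of Eq. (7) and the right-hand side of Eq. (6) with zero boundary values; the sizes and λ are illustrative, and numpy's dense solver stands in for whatever simplified solver is actually used.

```python
import numpy as np

# One implicit (Crank-Nicolson) step: solve A u^{n+1} = b^n, Eq. (7),
# with A tridiagonal (-lam, 1 + 2*lam, -lam) and lam = c^2*dt/(2*dx^2).
J, lam = 49, 2.0                      # lam may exceed 0.5: unconditionally stable
A = ((1 + 2 * lam) * np.eye(J)
     - lam * np.eye(J, k=1)
     - lam * np.eye(J, k=-1))
u = np.sin(np.pi * np.linspace(0, 1, J + 2))[1:-1]   # interior values u^n
b = np.zeros(J)                       # right-hand side of Eq. (6)
b[1:-1] = lam * u[2:] + (1 - 2 * lam) * u[1:-1] + lam * u[:-2]
b[0]  = (1 - 2 * lam) * u[0]  + lam * u[1]           # boundary u_0^n = 0
b[-1] = (1 - 2 * lam) * u[-1] + lam * u[-2]          # boundary u_{J+1}^n = 0
u_next = np.linalg.solve(A, b)        # digital stand-in for the simplified solve
```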

In the case of an inhomogeneous medium (or when A is very large), an iterative solution to Eq. (7) by techniques such as the Richardson algorithm^{10} may be preferable. However, when the coefficient c in Eq. (1) is a constant (or slowly varying with time), so is the matrix A in Eq. (7). In such cases, a matrix-decomposition or direct solution is generally preferable. Such a technique is attractive because we need only perform a matrix-decomposition of A once and thereafter solve a simplified matrix-vector equation for all space coordinates at each time step. Having shown that in such cases direct solutions are appropriate, we now consider such solutions to Eq. (7) or to the more general matrix-vector equation:

Ax = b. (8)

The direct solutions we discuss are also applicable to general linear system problems as described by Eq. (8), whether they arise as implicit solutions to partial differential equations or from other engineering and data processing problems. The A matrix in our case study is tridiagonal. This does not exploit the full power of our optical system for matrix-decomposition, but extensions to more general matrix-vector problems are obvious.


All direct solutions to the matrix-vector problem in Eqs. (7) or (8) involve the decomposition of the matrix A into two matrices of simpler structure. Consider a QR decomposition of A into A = Q R, where Q is orthogonal (Q^T = Q^{-1}) and R is upper triangular. In this case, Eq. (8) becomes Q R x = b, or

R x = Q^T b = b'. (9)

Once the matrix-decomposition has been performed, the solution of Eq. (9) by back substitution algorithms^{11} is trivial on a dedicated digital processor (since R is triangular and b' is known). Thus, in this paper, we consider and advance the use of optical systolic array processors to realize the matrix-decomposition, with the solution of the simplified matrix-vector problem such as Eq. (9) performed on a dedicated digital system. The solution of the simplified problem in Eq. (9) is also possible on the optical system (should the situation merit it).
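A sketch of the back-substitution step that would run on the dedicated digital processor is shown below; the function name is ours, and numpy's solver is used only to check the result.

```python
import numpy as np

def back_substitute(R, b):
    """Solve R x = b' for upper triangular R: the trivial digital step
    that follows the optical QR decomposition.  Illustrative helper."""
    N = R.shape[0]
    x = np.zeros(N)
    for i in range(N - 1, -1, -1):             # last unknown first
        x[i] = (b[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

R = np.triu(np.random.rand(5, 5)) + np.eye(5)  # nonzero diagonal
b = np.random.rand(5)
assert np.allclose(back_substitute(R, b), np.linalg.solve(R, b))
```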

Several reasons for considering such direct solutions to matrix-vector equations on an optical processor are now advanced. Direct solutions require a fixed number of iterations (N - 1 steps to decompose an N × N matrix). This is often preferable to the larger number of cycles necessary in an indirect iterative solution (and the fact that the number of cycles required in an indirect solution is not easily estimated in advance). In applications when the same matrix is used many times, direct solutions are often preferable, because the matrix-decomposition need be performed only once, and thereafter a much simpler matrix-vector problem need be solved many times with different input vectors only. Since the final solution to the simplified matrix problem is performed digitally at high accuracy, the accuracy of the final solution may be better than when a finite-precision optical processor is used for the entire problem solution. The importance and magnitude of each of these possible advantages must be weighed and considered for each specific application.

We now briefly review various direct algorithms^{12,13} for matrix-decomposition with attention to those algorithms that are most attractive for realization on an optical systolic array processor. There are two general classes of such algorithms. In the first class, the matrix A is decomposed into an upper U and lower L triangular matrix. The algorithms that achieve this decomposition are variations of Gaussian elimination and are known as LU decomposition algorithms. We do not presently consider such algorithms, since they are typically all formulated and described as serial algorithms in which one element of A is operated upon sequentially at each step. If the formulation of LU decomposition is redone using a parallel algorithm, such a technique is appropriate for use on a parallel optical systolic array processor. For our present case study, we consider the second class of matrix-decomposition algorithms, in which the matrix is decomposed into an orthogonal matrix Q and an upper triangular matrix R as in Eq. (9). We refer to these as QR matrix-decomposition algorithms. The Givens and Householder algorithms are the most popular techniques by which such a decomposition is achieved. QR factorization by the Givens algorithm is numerically stable, and a digital systolic architecture for its implementation has been described by Gentleman and Kung. However, this algorithm appears to be less attractive for optical systolic realization because of the local memory storage that appears to be required. Thus, we concentrate our present case study on direct optical solutions to matrix-decomposition by the Householder QR algorithm. This algorithm is most attractive for optical realization, since it is inherently parallel (operating in parallel on one column of the matrix A at a time) in its conventional digital formulations and descriptions.^{12} For these same reasons, this algorithm is not necessarily popular for digital realization.

III. Householder QR Matrix-Decomposition Algorithm

We now discuss the general procedure used in all direct algorithms for matrix-decomposition, and we detail the steps required in a Householder QR matrix-decomposition for the simplification of A x = b. For an N × N matrix, N - 1 steps are required for matrix-decomposition. At step 1, all direct algorithms operate on the original equation in the form A_0 x = b_0, where A_0 = A and b_0 = b. In the first step, both sides of this equation are multiplied by the decomposition matrix P_1 to form P_1 A_0 x = A_1 x = P_1 b_0 = b_1. In the next step, both sides of the new A_1 x = b_1 equation are multiplied by P_2 to form yet another new equation P_2 A_1 x = A_2 x = P_2 b_1 = b_2. This process is repeated N - 1 times, until the resultant matrix (P_{N-1} ⋯ P_1)A = P A = R is upper triangular. As the final result, this matrix-decomposition procedure reduces A to a matrix R that is upper triangular, and it provides us with a new known vector P b = b'. We thus need only solve the simplified equation in Eq. (9) to obtain the solution to Eq. (8).

At successive steps in this matrix-decomposition, we obtain a new equivalent set of equations given (at the mth step) by

A_m x = b_m. (10)

We note that using unitary transformations, |det(A_m)| = |det(A)|, the condition number of A_m is no worse than that of A, the solutions x to Eqs. (8) and (10) are the same, and the set of equations in Eq. (10) is equivalent to that in Eq. (8).

We now consider the Householder algorithm for QR decomposition to provide the new R and P b = b' matrix and vector we require. At step m of this algorithm, we require that P_m A_{m-1} = A_m be structured as

$$A_m = \begin{bmatrix} R_m & V_m \\ 0 & W_{N-m} \end{bmatrix}, \qquad (11)$$

where R_m is an upper triangular m × m matrix, the matrices V_m and R_m are unchanged at step m and in subsequent steps, and W_{N-m} denotes a matrix of order N - m. From this, we see that P_{m+1} affects only the matrix elements W_{N-m} of A_m. To calculate the decomposition matrix P_{m+1}, we require only the elements of the first column of W_{N-m} in Eq. (11). This will prove to be very useful in our suggested optical realization.


We represent this column of the matrix W_{N-m} by the column vector w_{m+1} with elements

w_{m+1} = (w_{m+1,m+1}, ..., w_{N,m+1})^T. (12)

The form of P_{m+1} is^{12}

$$P_{m+1} = \begin{bmatrix} I_m & 0 \\ 0 & I - k_m\,u\,u^T \end{bmatrix}, \qquad (13)$$

where I denotes the N - m identity matrix (and I_m the m × m identity), and normalization requires

k_m ‖u‖² = 2. (14a)

The first element of u is

u_1 = w_{m+1,m+1} + μ_m sign(w_{m+1,m+1}), (14b)

and the remaining N - m - 1 elements of u are the same as those of the first column of W_{N-m}, i.e.,

u_i = w_{m+i,m+1}, for 2 ≤ i ≤ N - m. (14c)

The constant μ_m in Eq. (14b) is the norm of the first column of W_{N-m} (the square root of its inner product with itself), i.e.,

$$\mu_m^2 = \sum_{i=1}^{N-m} w_{m+i,m+1}^2, \qquad (14d)$$

and the constant k_m in Eq. (13) is

k_m = (μ_m² + μ_m |w_{m+1,m+1}|)^{-1}. (14e)

The Householder algorithm^{12} in Eq. (13), with the terms as defined in Eqs. (14), describes a most attractive parallel technique to calculate the necessary decomposition matrices P_m by which to multiply the matrix-vector equation in Eq. (8) to achieve the desired simplified matrix-vector equation form in Eq. (9).
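A numerical sketch of one Householder step follows, forming u, k_m, and P' from Eqs. (14) and (16) and applying P' to the active block; it assumes the leading entry w_{m+1,m+1} is nonzero, and the helper name is ours.

```python
import numpy as np

def householder_step(W):
    """One Householder step: build u, k_m, and P' from Eqs. (14), (16)
    for the active block W (= W_{N-m}) and return P'W, whose first
    column is zero below its first entry.  Assumes W[0, 0] != 0."""
    w = W[:, 0]
    mu = np.sqrt(w @ w)                        # Eq. (14d): mu^2 = sum of w_i^2
    u = w.astype(float).copy()
    u[0] = w[0] + mu * np.sign(w[0])           # Eq. (14b)
    k = 1.0 / (mu**2 + mu * abs(w[0]))         # Eq. (14e); note k*||u||^2 = 2
    P = np.eye(len(w)) - k * np.outer(u, u)    # Eq. (16): P' = I - k u u^T
    return P @ W

W = np.random.rand(4, 4) + 0.1
print(np.round(householder_step(W)[:, 0], 10)) # (-mu*sign(w_11), 0, 0, 0)
```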

IV. Optical Realization of the Householder QR Matrix-Decomposition

There are many techniques by which one can realize QR decomposition in the system of Fig. 1. The final choice depends upon the specific application and the electronic and memory support available. The final output we require is R and the new P b = b'. To obtain these, we require the intermediate P_m, A_m, and b_m. At step m + 1 of the Householder algorithm, we must compute A_{m+1} and b_{m+1}. This requires a matrix-matrix multiplication to obtain

P_{m+1} A_m = A_{m+1}, (15a)

and a matrix-vector multiplication to provide

P_{m+1} b_m = b_{m+1}. (15b)

Each of these operations in Eqs. (15) is required at each of the N - 1 cycles or steps of the Householder algorithm. At each cycle or step, we also require calculation of the new matrix

P'_{m+1} = I - k_m u_{m+1} u_{m+1}^T (16)

in Eqs. (13) and (14). In practice, we will not compute the entire new A_{m+1} and b_{m+1} matrix and vector in Eqs. (15) at each step. Rather, we will form P'_{m+1} W_{N-m} and thus compute only the new elements W_{N-m-1} of A_{m+1} at each step.

Thus, at each step of the Householder QR decomposition, we must compute the new P'_{m+1} in Eq. (16) and the new elements of A_{m+1} and b_{m+1} in Eqs. (15). The evaluation of Eq. (16) requires computation of the vector outer product u_{m+1} u_{m+1}^T and then P'_{m+1}. The evaluation of Eqs. (15) requires a matrix-matrix and a matrix-vector multiplication (by the same matrix). At each successive step or cycle, we store one new row and column of R and one new element of b'. The order of the problem in Eqs. (15) and (16) to be solved thus decreases by 1 at each step.

We now consider one optical approach to realize Eqs. (15) and (16) at each of the N - 1 steps, with attention to pipelining of data and operations. To compute the new P'_{m+1}, we note that u_{m+1} depends on only one column of A_m (or W_{N-m}). We can thus form the vector outer product u_{m+1} u_{m+1}^T in Eq. (16) by feeding one column of A_m (or w_{m+1} of W_{N-m}) to the AO cell frequency-multiplexed. At N - m successive times T_B, we pulse on a different LED in sequence (first the bottom LED at T_B, then the next one at 2T_B, etc.). This technique produces successively one row or column of the symmetric matrix u_{m+1} u_{m+1}^T at each successive T_B. This realization of the vector outer product on the system of Fig. 1 is a new operation described for such a system. After (N - m)T_B, this vector outer product is produced. In the next T_B, we assemble P'_{m+1} in Eq. (16) in dedicated analog hardware, and we have thus completed the first operation required. For an N × N matrix, this requires (N + 1)T_B of time.
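The outer-product mode just described can be emulated numerically as follows; pulsing one LED per bit time contributes one row of the outer product (a sketch, with the optical details abstracted away).

```python
import numpy as np

# Emulation of the outer-product mode: the column u is resident
# (frequency-multiplexed) in the AO cell, and pulsing LED i alone at
# bit time (i+1)*TB reads out row i of u u^T on the detector array.
u = np.random.rand(4)
rows = [u[i] * u for i in range(len(u))]       # one row per bit time TB
assert np.allclose(np.vstack(rows), np.outer(u, u))
```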

We now consider the second operation, calculation of Eqs. (15), required at each step. Recall that we entered the first column w_{m+1} of W_{N-m} into the AO cell at T_B. If we enter the other columns of W_{N-m} (one at a time at subsequent T_B), then when we have finished computing Eq. (16), the full W_{N-m} matrix (the new elements of A_m) will all be present (frequency and time or space multiplexed) in the cell. Thus, immediately after calculation of Eq. (16), we can feed the columns of P'_{m+1} to the LEDs and at successive T_B obtain successive columns of A_{m+1} in Eq. (15a). We can compute Eq. (15b) in parallel with Eq. (15a) by adding an extra (N + 1)th frequency and feeding the elements of b_m to the AO cell one at a time (in parallel with the columns of A_m). Thus, (N - m)T_B after the calculation of Eq. (16), we have evaluated Eqs. (15).

For an N × N matrix A or W, we thus require NT_B of time for calculation of Eqs. (15) and (2N + 1)T_B for one total step of the Householder QR decomposition, i.e., evaluation of Eqs. (15) and (16) for the present A_m matrix and b_m vector. The optical system thus requires (2N + 1)T_B for step 1, [2(N - 1) + 1]T_B for step 2, etc., or (N² + 2N - 3)T_B of time for the N - 1 steps. The digital realization of this algorithm requires 2N³/3 multiplications. Assuming T_B to equal a digital multiplication time and assuming N to be large, we see that the optical realization requires N²T_B of time, whereas the time required digitally^{12,13} is proportional to N³T_B. The optical realization is thus a factor of N faster. This occurs since the optical system performs N vector inner products in parallel in each T_B.
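For reference, the per-step times quoted above sum to the stated total: step k operates on a block of order N - k + 1, so

$$\sum_{k=1}^{N-1} \bigl[2(N-k+1) + 1\bigr]\, T_B = \sum_{j=2}^{N} (2j+1)\, T_B = (N^2 + 2N - 3)\, T_B.$$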

The flow of data in the optical system is also quite significant. As soon as a column and element of A_m and b_m are computed, they are immediately fed back to the AO cell (which effectively stores them^9 in the proper format until they are needed). The architecture ensures that all the AO cell data (the prior A_m and b_m data) are processed before they reach the end of the AO cell. The digital realization of a QR decomposition can often encounter storage and memory-access problems that can appreciably increase the required time per step above 2N³T_B/3. The optical realization is seen to have excellent pipelining and flow of data and operations and to be a factor of N faster than the equivalent serial digital system.

V. Extensions

Our choice of the Householder QR decomposition algorithm as our case study was also made because of the rather general nature of this algorithm, i.e., its ability to provide other structured matrices (besides just the upper triangular matrix R) and its use in more general problems (e.g., computation of the eigenvalues of a matrix). In this section, we briefly discuss how simple extensions of our basic technique can achieve such operations. The formulation we advance is quite new, although it follows directly from standard matrix algebra.^{12,13} We thus do not elaborate on the details of these extensions on our optical processor of Fig. 1.

The general philosophy associated with such extensions is to decompose a given matrix A into a simpler structure (other than an upper triangular matrix as in QR decomposition). The choice depends upon the problem and application and upon the simplified matrix problem that the given dedicated digital processor can most easily solve. The digital computation of the eigenvalues and eigenvectors of a matrix is the major application for these special decompositions. Thus, we will now consider optical implementations for such applications. If we can reduce A to a tridiagonal matrix, Sturm sequence and bisection techniques^{14} as well as others can be used (in a digital processor) to compute the eigenvalues of A quite easily and efficiently (obvious optical solutions also exist). If A is full, the simplest form to which it can be reduced by similarity transforms is a Hessenberg matrix (an upper triangular matrix with one additional nonzero diagonal of elements below the main diagonal). It is often much simpler (computationally and in terms of memory storage and memory management) to first transform A into a Hessenberg matrix H or a tridiagonal matrix and to then compute the eigenvalues of A.^{14} These remarks on computational efficiency and memory requirements parallel those that motivated our QR decomposition (Sec. III) to solve partial differential equations and linear systems of equations in Sec. II.

A. Similarity Transformations

The general procedure used in similarity transformations is to transform A_{m-1} into A_m using transformation matrices T_m:

A_m = T_m^{-1} A_{m-1} T_m, (17)

where A_0 = A and m = 1, ..., M. The successive application of Eq. (17) transforms A into a matrix B, where

B = A_M = T^{-1} A T, (18)

and T = T_1 T_2 ⋯ T_M.

To demonstrate the usefulness of such a transformation and the definition of a similarity transformation, we consider calculation of the eigenvalues λ and eigenvectors φ of B (i.e., B φ = λφ) and how they relate to the eigenvalues and eigenvectors of A. We define x = T φ. Then

A x = (T B T^{-1}) T φ = T B φ = T λφ = λ T φ = λ x, (19)

where the first equality follows from Eq. (18) (i.e., A = T B T^{-1}), the second equality follows from matrix multiplication and T^{-1}T = I, the third equality is the definition of λ and φ, and the last equality follows from our definition of x = T φ. From Eq. (19), we see that the eigenvalues λ of A are also the eigenvalues of B and that x = T φ is the eigenvector associated with the eigenvalue λ. By a similarity transformation as in Eq. (18), we mean that the eigenvalues of B and A are equal.
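A quick numerical check of Eq. (19), under illustrative assumptions (T is an arbitrary invertible matrix, and A is taken symmetric so its eigenvalues are real):

```python
import numpy as np

# Check of Eq. (19): B = T^{-1} A T has the eigenvalues of A, and
# x = T phi is an eigenvector of A.
A0 = np.random.rand(4, 4); A = A0 + A0.T
T = np.random.rand(4, 4) + 4 * np.eye(4)       # safely invertible
B = np.linalg.inv(T) @ A @ T
lam, Phi = np.linalg.eig(B)
x = T @ Phi[:, 0]                              # x = T phi
assert np.allclose(np.sort(lam.real), np.linalg.eigvalsh(A), atol=1e-8)
assert np.allclose(A @ x, lam[0] * x)
```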

The structure of B should be chosen so that (a) determination of the eigenvalues of B is as simple as possible and (b) the eigenvalue problem B φ = λφ is no worse-conditioned than the original problem. By (b), we mean that small changes or errors in the elements of B should not affect the computed λ and φ any more than similar errors in the elements of A would.

The simplified structures possible for B depend on the properties of A. For a general full matrix A with no structural property, the simplest structure for B is a Hessenberg matrix. If A is Hermitian (symmetric), one can show^{15} that B can be reduced to a Hermitian Hessenberg matrix (i.e., a tridiagonal matrix).

B. Hessenberg and Tridiagonal Decompositions

We now discuss how one can optically reduce A to a Hessenberg form in N - 2 steps. We use a modified Householder algorithm similar to the one used in Sec. IV to reduce A to upper triangular form in N - 1 steps. In this algorithm, we multiply A from the left and right by Householder decomposition matrices P_m. At step m, we calculate

A_m = P_m^{-1} A_{m-1} P_m = P_m A_{m-1} P_m, (20)

where P_m is an orthogonal Householder decomposition matrix as before. Since the P_m are symmetric and orthogonal, P_m = P_m^{-1}. It follows from Eq. (20) and Sec. IV that calculation of P_m requires calculations analogous to Eq. (16), and calculation of A_m in Eq. (20) requires a matrix-matrix-matrix multiplication. In Sec. IV, we showed how the calculation of Eq. (16) or P_m could be achieved. In Ref. 9, we showed how the required matrix-matrix-matrix multiplication can be performed on the system of Fig. 1 and that it easily pipelines. Thus the Hessenberg matrix-decomposition is realized on the system of Fig. 1 by a simple extension of our technique in Sec. IV.

If A is symmetric, the above procedure using the modified Householder algorithm will yield a tridiagonal matrix. (The eigenvalues can then be obtained by Sturm or other techniques.^{14}) One can also compute the eigenvalues and eigenvectors of a matrix by power methods (i.e., calculation of A², A³, etc.). Thus far, optical techniques for eigenvalue computation^{16,17} have considered only such algorithms. If A is first reduced to Hessenberg form, computation of its powers is much simpler digitally. (There is not necessarily a significant improvement in the optical computation of the powers of a Hessenberg matrix.) Use of Hessenberg matrices is beneficial in various computations required in control theory.^{18}
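A sketch of the two-sided Householder reduction of Eq. (20) follows; the helper is ours, and it produces Hessenberg form for general A and tridiagonal form for symmetric A.

```python
import numpy as np

def hessenberg(A):
    """Reduce A to upper Hessenberg form by N - 2 two-sided Householder
    similarity transforms, as in Eq. (20): A_m = P_m A_{m-1} P_m.
    For symmetric A the result is tridiagonal.  Illustrative sketch."""
    A = A.astype(float).copy()
    N = A.shape[0]
    for m in range(N - 2):
        w = A[m + 1:, m]                        # column below the diagonal
        mu = np.linalg.norm(w)
        u = w.copy()
        u[0] += mu if w[0] >= 0 else -mu        # Householder vector
        P = np.eye(N)
        P[m + 1:, m + 1:] -= 2.0 * np.outer(u, u) / (u @ u)
        A = P @ A @ P                           # P is symmetric and orthogonal
    return A

A = np.random.rand(5, 5)
H = hessenberg(A)                # ~zero below the first subdiagonal
T = hessenberg(A + A.T)          # tridiagonal for symmetric input
```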

C. Eigenvalue Computation

As our last extension, we consider a general technique for calculation of the eigenvalues of A and its optical realization. This technique involves reducing A to upper triangular form by successive orthogonal similarity transformations as in Eq. (20). This is known as the QR algorithm for eigenvalue computation.^{12} At each step, this algorithm requires the Householder QR decomposition. However, since this decomposition is not a similarity transformation, an additional matrix-matrix multiplication will be necessary. It is well known^{12} that any matrix can be decomposed by similarity transformations into an upper triangular matrix B = T^{-1} A T whose diagonal elements are the eigenvalues of A. If A is Hermitian, B will be diagonal, with its elements being the eigenvalues of A.^{19} Many possible similarity transformation matrices exist. Since we will utilize a QR technique, we consider unitary matrices T (i.e., T^H T = I or T^{-1} = T^H, where H denotes the Hermitian transpose). The QR algorithm will determine one such unitary similarity transformation matrix T. If A is real and its eigenvalues are real, T is orthogonal. (This is the case we will consider.) This approach is often preferable^{12} to other optical methods^{16,17} for eigenvalue computation by the power method.

We consider the case of a full matrix A. In this case, each step in the QR algorithm is computationally intensive, and thus optical techniques are attractive. The algorithm must compute B = T^{-1} A T whose diagonal elements are the eigenvalues of A. We utilize a QR algorithm, and thus T = Q is orthogonal. In our optical algorithm, we first perform a QR decomposition of A = A_0 as A_0 = Q_1 R_1 (by the optical Householder QR algorithm of Secs. III and IV). Next, we form A_1 = R_1 Q_1. We then perform a QR decomposition of A_1 (i.e., A_1 = Q_2 R_2), form A_2 = R_2 Q_2, etc. This process is continued until, at step M, A_M is upper triangular. From A_0 = Q_1 R_1 with Q_1 orthogonal, we know R_1 = Q_1^T A_0. Substituting R_1 into A_1 = R_1 Q_1, we find A_1 = Q_1^T A_0 Q_1. After M iterations of the QR algorithm (A_{m-1} = Q_m R_m and formation of A_m = R_m Q_m), we obtain the new matrix B = Q^T A Q (where Q = Q_1 ⋯ Q_M). B is upper triangular, and its diagonal elements are the eigenvalues of A (by the nature of the similarity transformation used). In Ref. 12, the convergence of such an algorithm (i.e., to B an upper triangular matrix) is proven (for a real and symmetric matrix), together with the fact that this algorithm arranges the N eigenvalues of A in descending order (with the largest eigenvalue λ_1 appearing in the top left corner of B and the smallest eigenvalue in the bottom right corner of B).

Convergence of this QR algorithm (and others) can be greatly accelerated (thus reducing the number of cycles M needed) by incorporating shifts into each cycle. For example, at step m, we compute A_{m-1} - Δ_m I = Q_m R_m and then form A_m = R_m Q_m + Δ_m I, where Δ_m is the shift for the mth iteration. Either of the most popular shift techniques [Rayleigh or Wilkinson (Ref. 12, pp. 524-9)] in numerical analysis can be implemented on our system, since each requires access to at most the lower 2 × 2 submatrix.
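A sketch of the shifted QR iteration just described is given below; numpy's QR factorization stands in for the optical Householder decomposition, the simple corner shift shown is the Rayleigh choice, and the iteration count is illustrative.

```python
import numpy as np

# Shifted QR iteration for eigenvalues: A_{m-1} - d_m I = Q_m R_m,
# then A_m = R_m Q_m + d_m I (a similarity transform of A).
A0 = np.random.rand(5, 5); A = A0 + A0.T       # symmetric: real eigenvalues
Am, I = A.copy(), np.eye(5)
for m in range(100):
    d = Am[-1, -1]                             # Rayleigh shift from the corner
    Q, R = np.linalg.qr(Am - d * I)
    Am = R @ Q + d * I
print(np.sort(np.diag(Am)))                    # approximates...
print(np.linalg.eigvalsh(A))                   # ...the true eigenvalues
```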

The optical realization of this new optical eigenvalue computation algorithm follows directly from Sec. IV. We realize the QR decomposition optically as before (Sec. IV) and form A_0 = Q_1 R_1. We then form A_1 = R_1 Q_1 optically (by a matrix-matrix multiplication). We then repeat our optical QR decomposition to calculate A_1 = Q_2 R_2 and the optical multiplication to form A_2 = R_2 Q_2, etc. Each iteration of the QR algorithm requires [(N² + 2N - 3) + (2N - 1)]T_B = (N² + 4N - 4)T_B of time. After M iterations of the QR decomposition, the matrix B and hence the eigenvalues of A are obtained in a total time M(N² + 4N - 4)T_B. The equivalent number of digital multiplications and the required time (assuming a multiplication time T_B) is M(5N³/3)T_B. Thus the optical system is a factor of N faster and again offers good pipelining and flow of data and operations (thus reducing memory storage and data management problems).

VI. Discussion and Conclusion

In this paper, we have advanced two alternative techniques for optical systolic array processors: an implicit or Crank-Nicolson solution to partial differential equations (rather than an explicit solution) and a direct or matrix-decomposition solution to matrix-vector equations (rather than an indirect or iterative solution using the Richardson or similar algorithm). In cases when the elements of the matrix are fixed and the matrix is not prohibitively large, a direct solution may be preferable. Such solutions are also quite appropriate in applications in which the same matrix must be operated upon many times (e.g., using different external vectors for different cases). As a representative example of a direct solution, we considered the Householder QR matrix-decomposition algorithm. We noted that the matrix-decomposition is the computationally burdensome operation. We thus proposed a direct solution in which the matrix-decomposition is performed optically and the solution to the simplified matrix-vector equation that results is performed in dedicated digital hardware by back substitution.

The importance of data flow on an optical processor was also noted and discussed for our specific case study of the QR matrix-decomposition. Using our new frequency-multiplexed optical systolic array architecture, we considered the time required to achieve a matrix-decomposition optically (compared to digitally) and found that the time required in the optical system was proportional to N² time units, whereas a digital system requires approximately N³ multiplications. The speed improvement obtained optically results from the parallel nature of the optical systolic array processor, which performs N vector inner products in parallel every T_B.

The Householder QR algorithm we used as our case study is of considerably more general use than other matrix-decomposition algorithms. With small changes in the algorithm and the optical system, we can produce various specific matrix structures such as Hessenberg or tridiagonal matrices. We can also use QR decomposition techniques to obtain the eigenvalues and eigenvectors of a matrix. Such extensions and applications were briefly described and detailed.

The support of this research by NASA Lewis Research Center (grant NAG 3-5) and the Air Force Office of Scientific Research (grant AFOSR 79-0091) is gratefully acknowledged, as are helpful discussions with C. P. Neuman of the Department of Electrical Engineering at Carnegie-Mellon University. We also sincerely acknowledge Jeffrey Speiser of NOSC for the motivation he provided us to consider orthogonal transformations for matrix reduction.

References

1. A. Edison and M. Noble, "Optical Analog Matrix Processors," AD646060 (Nov. 1966).
2. P. Mengert et al., U.S. Patent 3,525,856 (6 Oct. 1966).
3. M. A. Monahan et al., Proc. IEEE 65, 121 (Jan. 1977).
4. J. Goodman et al., Opt. Lett. 2, 1 (1978).
5. D. Psaltis et al., Opt. Lett. 4, 348 (1979).
6. M. Carlotto and D. Casasent, Appl. Opt. 21, 147 (1982).
7. H. J. Caulfield et al., Opt. Commun. 40, 86 (1981).
8. D. Casasent, Appl. Opt. 21, 1859 (1982).
9. D. Casasent, J. Jackson, and C. Newman, Appl. Opt. 22, 115 (1983).
10. L. Richardson, Philos. Trans. R. Soc. London, Ser. A 210, 307 (1910).
11. R. K. Montoye and D. H. Lawrie, IEEE Trans. Comput. C-31, 1076 (1982).
12. J. H. Wilkinson, The Algebraic Eigenvalue Problem (Clarendon, Oxford, 1965).
13. E. Isaacson and H. B. Keller, Analysis of Numerical Methods (Wiley, New York, 1966).
14. A. R. Gourlay and G. A. Watson, Computational Methods for Matrix Eigenproblems (Wiley, London, 1973).
15. J. Stoer and R. Bulirsch, Introduction to Numerical Analysis (Springer, New York, 1980).
16. H. J. Caulfield, D. Dvore, J. W. Goodman, and W. Rhodes, Appl. Opt. 20, 2263 (1981).
17. B. V. K. V. Kumar and D. Casasent, Appl. Opt. 20, 3707 (1981).
18. C. Van Loan, Math. Prog. Study 18, 102 (May 1982).
19. L. Meirovitch, Computational Methods in Structural Dynamics (Sijthoff and Noordhoff, Alphen aan den Rijn, The Netherlands, 1980).
