
Nested Reduction Algorithms for Generating a Rank-Minimized H2-Matrix from FMM for Electrically Large Analysis

Chang Yang, Student Member, IEEE, and Dan Jiao, Fellow, IEEE

Abstract—In this work, we develop efficient algorithms to generate a rank-minimized H2-matrix to represent electrically large surface integral operators for a prescribed accuracy. We first generate an H2-matrix using the fast multipole method (FMM), and hence the complexity of the H2-construction is as low as O(N log N) for solving electrically large surface integral equations. We then develop fast algorithms to convert the FMM-based H2-matrix, whose rank is asymptotically full, to a new H2-representation whose rank is minimized based on accuracy. The proposed algorithms cost O(k^3) time per cluster in cluster basis generation and O(k^2) memory, where k is the minimal rank of the cluster basis required by accuracy. When the rank of the H2-matrix is a constant, the complexity of the proposed algorithms is O(N) in both time and memory. When the rank is a variable dependent on electrical size, the total complexity can be evaluated based on the rank's behavior. The resultant rank-minimized H2-matrix has been employed to accelerate both iterative and direct solutions. Numerical experiments on large-scale surface integral equation based scattering analysis have demonstrated its accuracy and efficiency.

Index Terms—Surface integral equations, electric field integral equations, electrically large analysis, fast multipole method, H2-matrix.

I. INTRODUCTION

The H2-matrix [1] is a general mathematical framework for the compact storage and efficient computation of dense matrices, from which fast solvers can be developed to speed up a computational method. Methods exist for generating an H2-matrix to represent electrically large surface integral operators (IEs). However, they are generally computationally expensive. For example, in an interpolation-based method [1], [2], the Green's function is directly interpolated to obtain a nested H2-representation. The resultant representation is of full rank for electrically large kernels, and the transfer matrix and coupling matrix are both dense. As a result, the complexity of using an interpolation-based method to generate the H2-matrix of an electrically large IE can be as high as O(N^3) in computation time. The ACA (adaptive cross approximation) [3], [4], in conjunction with a reduced SVD, has also been employed to generate an H2-representation of IE operators [5], [6]. However, the complexity remains high for electrically large analysis, scaling as O(N^2) for electrically large SIEs once the rank's growth with electrical size is taken into account. The methods reported in [7]-[10] show a good complexity for electrically small or moderate problems. Nevertheless, their computational complexity increases for electrically large analysis and remains to be studied in detail.

The authors are with the School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA (e-mail: [email protected]).

Manuscript received May 25, 2020; revised August 28, 2020.

In this work, we propose to start from a fast multipole method (FMM) [11]-[13] based H2-representation of an electrically large IE operator and convert it to a new H2-representation whose rank is minimized based on accuracy. In this way, the initial H2-matrix obtained from the FMM has a low construction complexity of O(N log N) for electrically large surface IEs. The FMM and the multilevel fast multipole algorithm (MLFMA) have been widely used for solving electrically large electromagnetic problems [14]. There exists research on compressing the MLFMA to further reduce the matrix-vector multiplication time and storage [15], the cost of which can be high due to the initially high rank of the MLFMA. The matrix structure resulting from an FMM is an H2-matrix. Furthermore, the FMM-based H2-matrix is special in the sense that its coupling matrix is diagonal and its transfer matrix is sparse [16]. However, the resultant rank is asymptotically full for electrically large analysis, since it increases quadratically with the electrical size of a cluster in a surface IE. Sometimes, the rank of a cluster in an FMM-based representation can even be larger than the size of the cluster. As a consequence, there is room to further improve the efficiency of the FMM-based representation. In [17], a linear-complexity algorithm is developed to convert a constant-rank H2-matrix, whose rank is not minimized for accuracy, to a new rank-minimized H2-matrix. This algorithm is used in [18] to generate an H2-matrix from the FMM. Even so, the resultant complexity is high due to the large rank of the FMM in electrically large analysis. In [19], a more efficient conversion algorithm is developed, the complexity of which is O(k^4) for each cluster, where k is the minimal rank of the cluster basis required by accuracy.

In this work, we develop fast nested reduction algorithms with a further reduced complexity of O(k^3) per cluster to convert the FMM-based representation to a new H2-matrix whose rank is minimized based on accuracy. Different from the method reported in [19], the proposed new algorithms are much simpler and take much less CPU run time while retaining the same accuracy. Taking into account the rank's growth with electrical size, the overall time complexity of the proposed algorithms is O(N^1.5) for electrically large SIEs. The resultant rank-minimized H2-matrix can then be used to accelerate both iterative and direct solutions. Comparisons with analytical Mie series solutions and with reference solutions from a commercial tool have validated the accuracy and efficiency of the proposed method for solving electrically large surface IEs.

The rest of this paper is organized as follows. In Section II, we review the background of this paper. In Section III, we present the approach to generating an H2-matrix from the FMM. In Section IV, we detail the proposed nested reduction algorithms, which efficiently convert an FMM-based H2-matrix to a rank-minimized H2-matrix. In Section V, we present another algorithm that further reduces the memory complexity of the nested reduction algorithm. In Section VI, accuracy and complexity are analyzed. In Section VII, a number of numerical results are presented to validate the accuracy and complexity of the proposed algorithms. In Section VIII, conclusions are drawn.

II. BACKGROUND

A. Surface Electric Field Integral Equations (S-EFIEs)

The Method of Moments based discretization of a surface electric field integral equation results in a dense system matrix Z, the mn-th entry of which can be written as

Z_{m,n} = \int_{S_m}\int_{S_n}\Big(\vec{f}_m\cdot\vec{f}_n - \frac{1}{k_0^2}\,\nabla_s\cdot\vec{f}_m\,\nabla_s\cdot\vec{f}_n\Big)\,G\,dS'\,dS,   (1)

where G = \frac{e^{-jk_0 R}}{4\pi R} is the Green's function, k_0 is the wavenumber, R = |r - r'| is the distance between a source point r' and an observer point r, and \vec{f}_m and \vec{f}_n are the vector bases on triangular patches S_m and S_n, respectively. Here, triangular elements are used to discretize the surface, and RWG bases [20] are employed to expand the unknown currents.

The Z can be expressed as the sum of the electric scalar potential and magnetic vector potential based components as follows:

Z = -Z_{\phi} + Z_{A_x} + Z_{A_y} + Z_{A_z},   (2)

where,

Z_{\phi,mn} = \frac{1}{k_0^2}\int_{S_m}\int_{S_n}\big(\nabla_s\cdot\vec{f}_m\,\nabla_s\cdot\vec{f}_n\big)\,G\,dS'\,dS,   (3)

and

Z_{A\xi,mn} = \int_{S_m}\int_{S_n}\big(f_{m\xi}\,f_{n\xi}\big)\,G\,dS'\,dS, \quad \xi = \{x, y, z\}.   (4)

B. H2-matrix

Fig. 1: Illustration of an H2-matrix.

An H2-matrix [1] represents the interaction between two binary trees, an example of which is illustrated in Fig. 1. We call each binary tree a cluster tree, since each node in the tree corresponds to a cluster of unknowns. All the source basis functions form a column tree, whereas all the observer basis functions form a row tree. When the testing functions are chosen to be the same as the basis functions, the resultant matrix is symmetric, and its row and column trees are identical. In an H2-matrix, by checking the admissibility condition level by level between the row cluster tree and the column cluster tree, the original matrix is partitioned into multilevel admissible and inadmissible blocks. In Fig. 1, an admissible block is shown as a green submatrix, and an inadmissible one is shown in red. Physically, an admissible block represents the interaction between well-separated sources (column cluster) and observers (row cluster), which satisfies the following admissibility condition:

d < \eta D,   (5)

where d denotes the maximal diameter of the row and column clusters, D is the distance between the two clusters, and \eta is a positive parameter that can be used to control the matrix partition. An inadmissible block is stored in its original full-matrix format. An admissible block has a compact storage. Take an admissible block formed between a row cluster t and a column cluster s as an example. It is stored as V^t S^{t,s} (V^s)^H, where S is called a coupling matrix and V is called a cluster basis. V is nested in the sense that for a cluster t whose children clusters are t1 and t2, their cluster bases have the following relationship:

V^t = \begin{bmatrix} V^{t_1} & \\ & V^{t_2} \end{bmatrix} \begin{bmatrix} T^{t_1} \\ T^{t_2} \end{bmatrix},   (6)

in which the T matrices are called transfer matrices.
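As a concrete illustration of (6), the short NumPy sketch below assembles a parent cluster basis from its children's bases and transfer matrices. The function name and arguments are illustrative only; in an actual H2 code this product is never formed explicitly, and only the small transfer matrices are stored.

import numpy as np

def parent_basis(V_t1, V_t2, T_t1, T_t2):
    # V^t = diag(V^{t1}, V^{t2}) [T^{t1}; T^{t2}] = [V^{t1} T^{t1}; V^{t2} T^{t2}]
    return np.vstack([V_t1 @ T_t1, V_t2 @ T_t2])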

III. METHOD FOR GENERATING AN INITIAL H2-MATRIX FROM FMM

According to the addition theorem, the Green's function G can be written as

\frac{e^{-jk_0 R}}{4\pi R} \approx \sum_p \omega_p\, e^{-jk_0 \vec{S}_p\cdot\vec{d}}\, T_{L,\vec{X}}(\vec{S}_p),   (7)

where p refers to the index of a sampling point defined on the unit spherical surface; \vec{S}_p and \omega_p are the position vector and the quadrature weight of each sampling point, respectively; L is the truncation parameter used in the addition theorem; \vec{d} = \vec{R} - \vec{X} = r - r' - \vec{X} and \vec{X} = \vec{O}_1 - \vec{O}_2, with \vec{O}_1 being the center of an observer group whose points are denoted by r, and \vec{O}_2 the center of a source group whose points are r'; and

T_{L,\vec{X}}(\vec{S}_p) = \frac{-jk_0}{4\pi} \sum_{l=0}^{L} (-j)^l\, \frac{2l+1}{4\pi}\, h_l^{(1)*}(k_0 X)\, P_l(\vec{S}_p\cdot\hat{X}),   (8)


where h_l^{(1)} stands for the spherical Hankel function of the first kind, the superscript * denotes the complex conjugate, P_l are Legendre polynomials, and \hat{X} denotes the unit vector along \vec{X}. Substituting (7) into (3), we obtain

Z_{\phi,ij} = V_{i,p}\, S_{p,p}\, V_{j,p}^H,   (9)

in which

V_{i,p} = \sum_q \frac{w_{i,q}}{k_0}\, e^{-jk_0\vec{S}_p\cdot(\vec{r}_{i,q}-\vec{O}_1)}\, \nabla_s\cdot\vec{f}_i(\vec{r}_{i,q}),   (10)

S_{p,p} = \mathrm{diag}\big(\omega_p\, T_{L,\vec{O}_1-\vec{O}_2}(\vec{S}_p)\big),   (11)

V_{j,p} = \sum_q \frac{w_{j,q}}{k_0}\, e^{-jk_0\vec{S}_p\cdot(\vec{r}_{j,q}-\vec{O}_2)}\, \nabla_s\cdot\vec{f}_j(\vec{r}_{j,q}),   (12)

where w_q are the weighting coefficients used for numerical surface integration on the source and field triangular patches, respectively. It is clear that the number of sampling points, i.e., the number of p indices, is the rank of the cluster basis V. In the θ direction, the sampling points are chosen as Gauss-Legendre points, while in the φ direction, a uniform sampling is used.
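For illustration, a minimal sketch of the diagonal translation operator in (8) is given below, assuming the reconstructed form of (8) and SciPy's spherical Bessel and Legendre routines. The function name and the layout of the sampling directions are assumptions.

import numpy as np
from scipy.special import spherical_jn, spherical_yn, eval_legendre

def translation_operator(L, k0, X_vec, S_p):
    # Evaluate T_{L,X}(S_p) of (8) at the quadrature directions S_p (array of shape P x 3).
    X = np.linalg.norm(X_vec)
    cos_gamma = S_p @ (X_vec / X)              # S_p . X_hat
    T = np.zeros(S_p.shape[0], dtype=complex)
    for l in range(L + 1):
        h1 = spherical_jn(l, k0 * X) + 1j * spherical_yn(l, k0 * X)   # spherical Hankel h_l^{(1)}
        T += (-1j) ** l * (2 * l + 1) / (4 * np.pi) * np.conj(h1) * eval_legendre(l, cos_gamma)
    return (-1j * k0 / (4 * np.pi)) * T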

Consider a cluster i whose parent cluster is i'. The V^{i'} is related to V^i by V^{i'} = V^i T, where T is the transfer matrix given by

T_{pp'} = \sum_{m,\,l\le K} Y_{ml}(\theta'_{p'},\phi'_{p'})\, Y^*_{ml}(\theta_p,\phi_p)\, \omega_p\, e^{-jk_0\vec{S}'_{p'}\cdot(\vec{O}_1-\vec{O}'_1)},

where p' is the index of the sampling points for V^{i'}, K stands for the number of quadrature points in the θ direction of V^i, and Y_{ml} are spherical harmonics. For a prescribed accuracy, T is sparse, and its number of entries per column (row) is a constant regardless of matrix size. This number can be further reduced using a filter [13]. The H2-representation of the magnetic vector potential term (4) can be generated in a similar way as (9).

IV. PROPOSED NESTED REDUCTION ALGORITHM

For an arbitrary cluster t in the cluster tree, let its original cluster basis obtained from the FMM be V^t. Such a V^t is of size #t by k_F, where #t denotes the number of unknowns contained in cluster t, and k_F is the rank resulting from the FMM. Since k_F is large, the algorithm presented here generates a new cluster basis \tilde V^t whose rank k is the minimal one required by accuracy, and hence much smaller than k_F. Certainly, such a rank reduction algorithm must retain the original nested property of the cluster bases across the cluster tree, while keeping the computational complexity low. In addition, for an electrically large analysis, both k and k_F are dependent on electrical size, and hence on tree level. For simplicity of notation, we do not attach a tree-level index to k and k_F, but it should be noted that they differ from level to level. The k_F scales quadratically with the electrical size, whereas k scales linearly with the electrical size [21].

We start from the leaf level l = L (the root level is l = 0) of the cluster tree. For a leaf cluster t, we perform an SVD on V^t. Based on the prescribed accuracy ε, we keep the singular vectors whose singular values, normalized by the maximum one, are no less than ε. These singular vectors make up the new leaf cluster basis \tilde V^t. Thus, we obtain

V^t_{\#t\times k_F} \;\overset{\varepsilon}{\approx}\; \tilde V^t_{\#t\times k}\,(U^t)^H_{k\times k_F},   (13)

where the subscripts denote the matrix dimensions, and (U^t)^H is the other factor resulting from the SVD. Since #t is bounded by leafsize, which is a constant, the cost of (13) is constant and small.

We then proceed to the non-leaf levels. For each non-leaf cluster t, its cluster basis is related to its children's bases as shown in (6). Since the children's cluster bases have been changed, the transfer matrices of the non-leaf cluster t must be updated accordingly. To see how to update them, we substitute (13) into (6), obtaining

V^t = \begin{bmatrix} \tilde V^{t_1} & \\ & \tilde V^{t_2} \end{bmatrix} \begin{bmatrix} (U^{t_1})^H T^{t_1} \\ (U^{t_2})^H T^{t_2} \end{bmatrix},   (14)

from which it can be seen that

T^t = \begin{bmatrix} (U^{t_1})^H T^{t_1} \\ (U^{t_2})^H T^{t_2} \end{bmatrix}   (15)

makes the updated transfer matrix of cluster t. However, its rank may not be the minimal one required by accuracy. Therefore, we perform another SVD on (15) and truncate the singular vectors based on the accuracy ε to obtain the new transfer matrix \tilde T^t. As a result, we have

T^t_{(k_1+k_2)\times k_F} \;\overset{\varepsilon}{\approx}\; \tilde T^t_{(k_1+k_2)\times k}\,(U^t)^H_{k\times k_F},   (16)

where the rank k_1 + k_2, which is the sum of the ranks of the two children's new cluster bases, is further reduced to k based on the prescribed accuracy.
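The corresponding non-leaf step, (15)-(16), can be sketched in the same way, again with illustrative names: stack the children's contributions and truncate by SVD.

import numpy as np

def update_transfer(Uh1, T1, Uh2, T2, eps):
    # Stack [(U^{t1})^H T^{t1}; (U^{t2})^H T^{t2}] as in (15), then truncate as in (16)
    T_stack = np.vstack([Uh1 @ T1, Uh2 @ T2])     # (k1 + k2) x kF
    Q, s, Wh = np.linalg.svd(T_stack, full_matrices=False)
    k = max(1, int(np.count_nonzero(s >= eps * s[0])))
    return Q[:, :k], s[:k, None] * Wh[:k, :]      # new transfer matrix, (U^t)^H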

The aforementioned procedure at a non-leaf level is then repeated level by level upward, until we reach the minimal level that has admissible blocks. At that level, we finish generating the new cluster bases. After that, we update the original FMM-based coupling matrix S^{t,s} of each admissible block as follows:

\tilde S^{t,s} = (U^t)^H S^{t,s}\, U^s.   (17)

The overall procedure is a bottom-up tree traversal, summarized as follows, which is termed the Nested Reduction Algorithm (NRA).

At each tree level of the cluster tree, we do the following computation:

1) For a leaf cluster t, perform (13) to obtain the new cluster basis \tilde V^t based on the required accuracy ε.

2) For a non-leaf cluster t, compute (15) to obtain T^t, then factorize it as in (16) to obtain the new transfer matrix \tilde T^t as well as (U^t)^H, based on the accuracy ε.

If the column cluster bases differ from the row cluster bases, as in an unsymmetrical matrix, Steps 1 and 2 are repeated for the column cluster bases. After the cluster basis update, for each admissible block, we update the coupling matrix based on (17).

At a non-leaf level, the cost of computing (15) is only O(k k_F), which is O(k^3), since the transfer matrix T is sparse. However, the subsequent step of computing the SVD of T^t can be costly: it is O(k^2 k_F), and hence scales as O(k^4).


To circumvent this cost, we propose the following fast algorithm. For a non-leaf cluster t, in (15), we randomly choose O(k_1 + k_2) columns to compute the low-rank factorization (16), instead of operating on the full k_F columns. This can be done because the rank of T^t is no greater than its row rank, which is k_1 + k_2. We then compute a full cross approximation (FCA) [1] with prescribed accuracy on the randomly selected O(k_1 + k_2) columns of T^t, obtaining row pivots τ and column pivots σ. As a result, T^t is factorized as

T^t \;\overset{\varepsilon}{\approx}\; T^t_{:,\sigma}\,(T^t_{\tau,\sigma})^{-1}\,T^t_{\tau,:},   (18)

whose new rank is k. Since the full cross approximation is performed on only O(k_1 + k_2) columns of T^t, the entire computational cost of obtaining (18) is reduced to O(k^3), as compared to a brute-force SVD-based low-rank compression of T^t. In (18), T^t_{:,\sigma} denotes the selected k columns of T^t, whose indices are contained in σ. The T^t_{:,\sigma} multiplied by the small k-by-k matrix (T^t_{\tau,\sigma})^{-1} is nothing but the new transfer matrix of t; thus

\tilde T^t = T^t_{:,\sigma}\,(T^t_{\tau,\sigma})^{-1},   (19)

whereas

(U^t)^H = T^t_{\tau,:}   (20)

is the set of k rows of T^t whose row indices are contained in τ. In this way, the (U^t)^H can be obtained at O(k^3) cost, because it involves k rows of the children's (U)^H factors multiplied by sparse transfer matrices, as can be seen from (15).

It is worth mentioning that the full cross approximation, instead of the adaptive cross approximation, is employed here because the accuracy of the former is better than that of the latter. Furthermore, for a low-rank matrix, the accuracy of an FCA is guaranteed. Certainly, an SVD could be used directly on the selected O(k) columns to obtain the new bases, which also has a reduced cost of O(k^3). However, if we did that, the subsequent step of obtaining (U^t)^H would cost more than O(k^3).
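The following sketch illustrates a full cross approximation with full pivoting on the randomly selected columns T_w, followed by the factorization (18)-(20). The variable names and the stopping rule are assumptions for illustration.

import numpy as np

def fca_pivots(Tw, eps):
    # Full cross approximation: at every step, pick the largest residual entry as the
    # row/column pivot; stop when the residual drops below eps times the initial maximum.
    R = np.array(Tw, dtype=complex)
    tol = eps * np.max(np.abs(R))
    tau, sigma = [], []
    for _ in range(min(R.shape)):
        if np.max(np.abs(R)) <= tol:
            break
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        tau.append(i)
        sigma.append(j)
        R = R - np.outer(R[:, j], R[i, :]) / R[i, j]
    return tau, sigma

# With the pivots found on Tw (sigma then mapped back to the selected columns of T^t),
# the factors of (19) and (20) are assembled from T^t:
#   T_new = T[:, sigma] @ np.linalg.inv(T[np.ix_(tau, sigma)])   # (19)
#   Uh    = T[tau, :]                                            # (20)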

The pseudo-code of the aforementioned fast NRA is shown in Algorithm 1. In line 2 of this algorithm, Algorithm 2 is called starting from the leaf level of the cluster tree; it factorizes V = \tilde V (U)^H for each leaf cluster and T = \tilde T (U)^H for each non-leaf cluster. Algorithm 2 yields the new rank-minimized cluster bases based on accuracy. In line 3, we apply Algorithm 3 to the root of the block cluster tree of the H2-matrix, which updates the coupling matrix of each admissible block.

Algorithm 1 Nested Reduction Algorithm

1: procedure NESTED REDUCTION ALGORITHM(t, b)
2:     update cluster basis(t)
3:     update coupling matrix(b)
4: end procedure

From the aforementioned cost analysis given along with the description of the algorithm, it can be seen that the time cost of obtaining the new cluster bases \tilde V^t is low: O(k^3) for each cluster. Here, notice that k is the minimal rank required by accuracy instead of the original FMM rank.

Algorithm 2 Update Cluster Basis
This algorithm efficiently obtains new nested cluster bases with rank minimized based on accuracy.

1: procedure RANDOMIZED UPDATE CLUSTER BASIS(t)
2:     if t is a non-leaf cluster then
3:         Compute T^t = [(U^{t1})^H T^{t1}; (U^{t2})^H T^{t2}]
4:         Randomly select |c_t| columns from T^t to form T_w
5:         Do FCA based on ε on T_w to get τ and σ
6:         Obtain (U^t)^H = T^t_{τ,:} and \tilde T^t = T^t_{:,σ}(T^t_{τ,σ})^{-1}
7:     else
8:         Do SVD on V^t such that V^t = \tilde V^t (U^t)^H
9:     end if
10: end procedure

Algorithm 3 Update Coupling Matrix
This algorithm updates coupling matrices.

1: procedure UPDATE COUPLING MATRIX(b)
2:     if b is an admissible block then
3:         \tilde S^{t,s} = (U^t)^H S^{t,s} U^s
4:     else, for each child b_i of b
5:         update coupling matrix(b_i)
6:     end if
7: end procedure

However, the storage of U^t would cost O(k k_F) units for each cluster, and thus O(k^3). In the next section, we propose another NRA to reduce the memory cost.

V. NESTED REDUCTION ALGORITHM WITH REDUCED MEMORY COMPLEXITY

In this new algorithm, there are also two steps: one generates new cluster bases whose ranks are minimized, and the other updates the coupling matrices, similar to the algorithm presented in the previous section. However, we do not explicitly compute or store the U matrix for each cluster. As can be seen from (13) and (16), if the new cluster basis \tilde V^t is made unitary, then (U^t)^H is nothing but the projection of the original basis onto the new basis; thus

(U^t)^H = (\tilde V^t)^H V^t.   (21)

Hence, whenever (U^t)^H is involved in a computation, we can utilize the nested property of both the original basis V^t and the new basis \tilde V^t to compute it efficiently. Storage-wise, we only need to store the new cluster basis \tilde V^t, whose rank is minimized, and the original cluster basis V^t, which is sparse, thus bypassing the storage of (U^t)^H. Next, we elaborate on this algorithm.

At the leaf level, the computation is the same as that in the previous algorithm, where an SVD is used to obtain \tilde V^t so that each new leaf cluster basis is unitary. At a non-leaf level, for each cluster t, we randomly select O(k_1 + k_2) columns of T^t to compute in (15). Let this set be c_t. Computing the c_t columns of T^t is the same as computing the c_t columns of (\tilde V^t_{ch})^H V^t, where \tilde V^t_{ch} is a block-diagonal matrix containing t's two children's new cluster bases. To see this more clearly, we can rewrite (15) as


T^t = \begin{bmatrix} (\tilde V^{t_1})^H & \\ & (\tilde V^{t_2})^H \end{bmatrix} \begin{bmatrix} V^{t_1} T^{t_1} \\ V^{t_2} T^{t_2} \end{bmatrix},   (22)

which is nothing but

T^t = (\tilde V^t_{ch})^H\, V^t.   (23)

For each index c in the set c_t, we form a cardinal vector e_c, which has only one non-zero element, at the c-th entry. Multiplying (\tilde V^t_{ch})^H V^t by e_c is the same as computing V^t e_c first and then multiplying the resultant vector by (\tilde V^t_{ch})^H, each of which costs O(n log n) using the nested property of both bases, where n is the size of cluster t. After obtaining the c_t columns of T^t, we perform an SVD on them based on the prescribed accuracy ε to obtain the new transfer matrix \tilde T^t, whose rank is reduced to k. The cost of this step is O(k^3). In addition, such a new transfer matrix is unitary. The aforementioned procedure of computing the new transfer matrices at the non-leaf levels continues level by level upward, until the minimal level having admissible blocks is reached.

The pseudo-code of the new algorithm is shown in Algorithm 4, in which the fast matrix-vector multiplication algorithms for V^t and (\tilde V^t)^H are given in Algorithm 5 and Algorithm 6, respectively.

Algorithm 4 new update cluster basis
This algorithm is for new cluster basis generation.

1: procedure NEW UPDATE CLUSTER BASIS(t)
2:     if t is a non-leaf cluster then
3:         Randomly select c_t pivots from the k_F columns of t's original transfer matrix T^t
4:         for c ∈ c_t do
5:             ω_t = e_c
6:             recursively multi old trans(t, c)
7:             recursively multi new trans(t, c)
8:         end for
9:         Use the resultant c_t columns of T^t to form T^t_w
10:        Do SVD on T^t_w based on ε to get \tilde T^t
11:    else
12:        Do SVD on V^t to obtain \tilde V^t
13:    end if
14: end procedure

After generating the new cluster bases, we next update the coupling matrix of each admissible block. Similarly, utilizing (21), (17) becomes the computation of

\tilde S^{t,s} = (\tilde V^t)^H V^t\, S^{t,s}\, (V^s)^H \tilde V^s.   (24)

Since the \tilde V basis is of rank k, \tilde V^s has only k columns. The computation of (V^s)^H \tilde V^s therefore amounts to k matrix-vector multiplications of (V^s)^H with the k columns of \tilde V^s. This can be done using Algorithm 6 with the tilde on top of the symbols removed. This step costs O(n log n) for one vector; hence the total cost of (V^s)^H \tilde V^s is O(k n log n), which is O(k^3 log k) since n scales as k^2. Next, we multiply S^{t,s} by the computed (V^s)^H \tilde V^s. Since S^{t,s} is diagonal, the cost of this step is O(k_F k), which is O(k^3). After that, we multiply (\tilde V^t)^H V^t by the k vectors resulting from S^{t,s}(V^s)^H \tilde V^s, which again can be computed by k matrix-vector multiplications using V^t, followed by another k matrix-vector multiplications with (\tilde V^t)^H. The cost of this step is also O(k n log n), utilizing the nested property of the cluster bases.

Algorithm 5 recursively multi old trans
This algorithm performs a top-down tree traversal to compute Vω.

1: procedure RECURSIVELY MULTI OLD TRANS(t, ω)
2:     if t is a non-leaf cluster then
3:         for each child i of t do
4:             ω_i = T^i ω
5:             recursively multi old trans(t_i, ω_i)
6:         end for
7:     else
8:         ω_t = V^t ω_t
9:     end if
10: end procedure

Algorithm 6 recursively multi new trans
This algorithm performs a bottom-up tree traversal to compute \tilde V^H ω.

1: procedure RECURSIVELY MULTI NEW TRANS(t, ω)
2:     if t is a non-leaf cluster then
3:         for each child i of t do
4:             recursively multi new trans(t_i, ω_i)
5:         end for
6:         ω_t ← [(\tilde T^{t1})^H ω_1; (\tilde T^{t2})^H ω_2]
7:     else
8:         ω_t ← (\tilde V^t)^H ω_t
9:     end if
10: end procedure


The memory requirement of the new algorithm is O(k^2) for each cluster. This is because we do not store U for each cluster; instead, we store V and \tilde V. At the leaf level, the storage is a constant for each cluster. At a non-leaf level, the storage for \tilde V is a new transfer matrix of size k × k for each cluster. As for the original cluster basis V, the storage is a transfer matrix of size k_F × k_F for each cluster. However, this transfer matrix is sparse, thus also costing only O(k_F) ≈ O(k^2) units to store.

VI. ACCURACY AND COMPLEXITY

A. Accuracy

In the algorithms proposed in this work, only the SVD and FCA steps involve approximations. However, they are performed subject to a prescribed accuracy. Hence, the overall procedure is error controlled. In the FCA part, we randomly select O(k_1 + k_2) = c(k_1 + k_2) columns to perform the FCA, where c is a constant coefficient greater than 1. Since the matrix upon which the FCA is performed is known to be bounded by k_1 + k_2 in rank, and c is chosen to be greater than 1, the resultant cross approximation yields an accurate rank-k representation. As can be seen from [10], the choice of k columns is not unique in an ACA or FCA algorithm for approximating a rank-k matrix. As long as the k columns are linearly independent for a prescribed accuracy, they produce an accurate rank-k model. Different from the ACA, in an FCA, at every step, the maximum entry in the residual matrix is identified, and its row and column pivots are chosen to generate a rank-1 model. Hence, the accuracy of the FCA is guaranteed [1]. If it happens that the randomly selected c(k_1 + k_2) columns do not contain k linearly independent columns, this can be identified in the FCA process, and more columns can then be selected.

B. Time and Memory Complexity

The complexity analysis depends on the rank's behavior. Consider a rank that grows linearly with the electrical size; this is also shown to be the minimal rank required by accuracy in an electrically large IE analysis [21]. Let n be the size of a cluster; then, in an SIE, the rank k scales as

k = O(√n). (25)

As for the rank of the FMM, it scales quadratically with the electrical size; thus

kF = O(n). (26)

In the proposed fast NRA algorithm, every cluster basis needs O(k^3 + k_F k) ≈ O(k^3) operations to be computed. Based on (25) and (26), the total time complexity for new cluster basis generation can be computed as

C_t = \sum_{l=0}^{L} 2^l\, O(k k_F + k^3) = \sum_{l=0}^{L} 2^l\, O\!\left(\sqrt{\frac{N}{2^l}}\,\frac{N}{2^l} + \left(\sqrt{\frac{N}{2^l}}\right)^{3}\right) = O(N^{1.5}).   (27)

The time complexity for the coupling matrix updates can be computed as

C_{t,S} = \sum_{l=0}^{L} 2^l\, C_{sp}\, O(k^2 k_F) = \sum_{l=0}^{L} 2^l\, C_{sp}\, O\!\left(\left(\sqrt{\frac{N}{2^l}}\right)^{2}\frac{N}{2^l}\right) = O(N^2),   (28)

where C_{sp} denotes the maximal number of blocks formed by a single cluster, which is a constant [1]. The total memory complexity can be computed by adding the memory cost of each cluster basis to that of each admissible block, as follows:

C_m = \sum_{l=0}^{L} \left(2^l O(k^3) + C_{sp}\, 2^l O(k^2)\right) \approx \sum_{l=0}^{L} 2^l\, O\!\left(\left(\frac{N}{2^l}\right)^{1.5}\right) = O(N^{1.5}).   (29)

Hence, O(N^1.5) memory is required during the rank reduction. After the rank reduction is finished, the memory for storing the new rank-minimized H2-matrix is just O(N log N) for an SIE, since only O(k^2) is required for storing each cluster basis and each admissible block.

As for the new algorithm shown in Section V, the complexity for every cluster is O(k n log n) in time and O(k^2) in memory. Adding the cost of every cluster across all tree levels, we obtain the total time cost as

C_t = \sum_{l=0}^{L} 2^l\, O(k\, n \log n) = \sum_{l=0}^{L} 2^l\, O\!\left(\sqrt{\frac{N}{2^l}}\,\frac{N}{2^l}\, l\right) = O(N^{1.5}).   (30)

Similarly, updating all the admissible blocks also has O(N^1.5) complexity, since at each tree level there are 2^l O(C_{sp}) admissible blocks, each of which costs O(k n log n) operations to update. The total memory consumption, including the memory required for both cluster basis generation and coupling matrix updates, scales as

C_m = \sum_{l=0}^{L}\left(2^l O(k^2) + C_{sp}\, 2^l O(k^2)\right) = \sum_{l=0}^{L} 2^l\, O\!\left(\frac{N}{2^l}\right) = O(N\log N)   (31)

for electrically large SIE analyses.

It is worth mentioning that although the new algorithm of Section V has the same time complexity for cluster basis generation as that of Section IV, the constant in front of N^1.5 is larger. As for the coupling matrix update, the new algorithm in Section V has a reduced complexity of O(N^1.5), but the constant is also larger. We notice that the absolute run time of the algorithm in Section V can exceed that of the fast NRA algorithm of Section IV when simulating medium-sized problems. However, the memory and its scaling rate are reduced. In addition, from the aforementioned complexity analysis, it can be seen that for applications where the rank is a constant, the total complexity of the proposed algorithms is O(N).

C. Further Rank Reduction

In the proposed NRA algorithms, the original FMM-based cluster bases are reduced in rank based on accuracy. To further explore the redundancy in the matrix content, we employ the algorithm in [16] on top of the new H2-matrix generated from the proposed NRA algorithms to further reduce its rank. The algorithm in [16] can be used to convert an H2-matrix whose rank is not minimized to a new one whose rank is minimized based on accuracy. However, if the algorithm in [16] were directly applied to an FMM-based matrix, the conversion cost would be too high, since the starting rank is high. In contrast, when the H2-matrix generated from the proposed NRA algorithms is further compressed using [16], the cost is as low as O(k^3) in time for each cluster and admissible block, and O(k^2) in memory, thus not increasing the complexity of the proposed NRA algorithms.


TABLE I: Accuracy Comparison between NRA and Fast NRA

N                      4608      18432     73728     294912
err0 (ε = 10^-2)       7.44e-3   1.07e-2   1.66e-2   2.07e-2
err1 (ε = 10^-2)       6.92e-3   1.18e-2   1.83e-2   2.41e-2
err0 (ε = 10^-3)       1.03e-3   1.13e-3   1.79e-3   2.28e-3
err1 (ε = 10^-3)       9.42e-4   1.17e-3   1.95e-3   2.71e-3

VII. NUMERICAL RESULTS

A common set of simulation parameters is used for all examples simulated in this section. Specifically, in the FMM, we set the truncation criterion of the addition theorem as L = k_0 d + 1.8 d_0^{2/3} (k_0 d)^{1/3}, where d_0 = \log_{10}(1/\varepsilon_F) with \varepsilon_F = 10^{-2}, k_0 is the wavenumber, and d is the diameter of the targeted cluster. When generating an H2-representation of the FMM, we use 6 points along each of the θ and φ directions in the Lagrange polynomial based interpolation to obtain the transfer matrices. In addition, we choose η = 0.8 in the admissibility condition and the leafsize to be 40. The choice of η and leafsize follows the same rule discussed in [2]. The only simulation parameter in the proposed NRA algorithms is the accuracy parameter ε, which is user defined. When randomly choosing #c_t = c(k_1 + k_2) columns from T^t in the fast NRA, we choose c to be 4. The c can be chosen as an arbitrary constant, but a larger choice of c would increase the CPU run time. In the cases where the direct solver of [22] is used to solve the H2-matrix generated from the proposed algorithms, its accuracy is set to 10^{-3}.

A. Accuracy

We first validate the accuracy of the proposed algorithms before examining their complexity in time and memory.

1) Accuracy Comparison Between the Proposed Fast NRA Algorithm and the Proposed NRA: In Algorithm 2, a fast algorithm is developed to bypass the cost of the SVD in the proposed NRA algorithm. In this algorithm, O(k) columns are randomly selected to perform the FCA. Here, we examine the accuracy of this approach as compared to the brute-force NRA. We take a random vector x and perform a matrix-vector multiplication using the new H2-matrix generated by the brute-force NRA to obtain Z_{H2} x. We also use the fast NRA to generate an H2-matrix Z_{fastH2} and compute Z_{fastH2} x. The error is then assessed by comparing each result with the original FMM-based matrix-vector multiplication, Z_{FMM} x. The results for a sphere example, whose diameter ranges from 2.21 wavelengths to 17.68 wavelengths, are shown in Table I for ε = 10^{-2} and ε = 10^{-3}, respectively. In this table, err0 = ‖Z_{H2} x − Z_{FMM} x‖/‖Z_{FMM} x‖ represents the relative error of the new H2-matrix generated by the brute-force NRA, and err1 = ‖Z_{fastH2} x − Z_{FMM} x‖/‖Z_{FMM} x‖ represents that of the fast NRA. As can be seen, the fast NRA is accurate, and its accuracy is also controllable, like the NRA. In this example, we also simulate a larger sphere whose diameter is 35.36 m and whose number of unknowns is 1,179,648. With ε = 10^{-2}, we find the accuracy of the proposed NRA algorithm to be 2.42e-2, whereas that of the fast NRA is 4.51e-2.
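The error metric used in Table I can be reproduced with a few lines, assuming matrix-vector product routines for the two representations are available; the callable names below are placeholders.

import numpy as np

def relative_matvec_error(apply_Z_new, apply_Z_fmm, N, seed=0):
    # err = ||Z_new x - Z_FMM x|| / ||Z_FMM x|| for a random complex test vector x
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    ref = apply_Z_fmm(x)
    return np.linalg.norm(apply_Z_new(x) - ref) / np.linalg.norm(ref)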

Fig. 2: RCS of a conducting cube simulated using two different algorithms. (a) Fast NRA. (b) NRA in Section V.

2) Scattering from a Conducting Cube: In this example, we compute the bistatic RCS of a conducting cube of size 12.8λ × 12.8λ × 12.8λ at 300 MHz, which has 294,912 unknowns. The new H2-matrix generated from the proposed method is solved using the direct solver of [22]. The resultant bistatic RCS is compared with that from HFSS and shows very good agreement, as can be seen from Fig. 2(a). In this example, the accuracy criterion used in the fast NRA is set to ε = 10^{-3}. We also use the nested reduction algorithm with reduced memory complexity, presented in Section V, to simulate the same example. As can be seen from Fig. 2(b), this algorithm shows good accuracy as well.

3) Scattering from a Conducting Plate: In this example, we compute the bistatic RCS of a conducting plate of size 60.8λ × 60.8λ at 300 MHz, which has 1,107,776 unknowns. A BiCGStab iterative solver with a diagonal preconditioner is employed to solve the new H2-matrix generated from the proposed fast NRA. The result is compared with HFSS in Fig. 3(a). Again, very good agreement is observed, which validates the accuracy of the proposed algorithm. The simulation parameters are chosen to be the same as in the previous cube example. For comparison, we also plot the RCS obtained by using the FMM with GMRES and ε = 5 × 10^{-3} in Fig. 3(b). As can be seen, the slight discrepancy between the proposed method and HFSS is not due to the proposed method.

Fig. 3: (a) RCS of a conducting plate simulated using the new H2-matrix generated from fast NRA. (b) RCS simulated using FMM.

4) Scattering from an Array of Spheres: In this example, we simulate an array of 4 × 4 × 4 spheres, each of which has a diameter of 0.5525 m and is discretized with 288 unknowns at 300 MHz. The distance between two adjacent spheres is 1.3820 m. The bistatic RCS is computed with a direct solution of the new H2-matrix by the direct solver in [22] and compared with HFSS's result. Good agreement is observed, as can be seen from Fig. 4(a). The accuracy criterion used in the fast NRA is ε = 10^{-3}.

We also simulate another array having 6 × 6 × 6 spheres, each of which has a diameter of 0.8288 m and is discretized with 648 unknowns, yielding 139,968 unknowns in total at 300 MHz. The distance between two adjacent spheres is 2.0730 m. Again, the new H2-matrix generated from the fast NRA is directly solved using the solver in [22], and the bistatic RCS is extracted and compared with HFSS. The comparison is shown in Fig. 4(b), which further validates the accuracy of the proposed algorithm. The simulation parameters are the same as those in the previous example.

Fig. 4: Simulated RCS of an array of conducting spheres using the new H2-matrix generated from fast NRA, solved by a direct solver. (a) 4 × 4 × 4 array. (b) 6 × 6 × 6 array.

For comparison, we use the FMM with GMRES to iteratively solve the 6 × 6 × 6 sphere array, the result of which is shown in Fig. 5(a). Similarly, we use the new H2-matrix generated by the fast NRA with GMRES to iteratively solve the same example. As can be seen from Fig. 5(b), a similar agreement is observed in the RCS result.

Fig. 5: RCS of the 6 × 6 × 6 sphere array simulated using (a) the FMM with a GMRES iterative solver and (b) the new H2-matrix generated from the fast NRA with a GMRES solver.

5) Scattering from More Complicated Structures: We also simulate a coil, whose shape is shown in Fig. 6(a). This coil has a diameter of 14.156 m and is illuminated by an incident field at 300 MHz. After discretization, the number of unknowns is 121,914. The new H2-matrix generated from the fast NRA is then solved using the fast direct solver of [22]. The resultant bistatic RCS is compared with HFSS's result in Fig. 6(b). As can be seen, the two agree well with each other.

Another example we simulated is shown in Fig. 7(a). The structure has a diameter of 17.302 m and is discretized into 172,077 unknowns at 300 MHz. The new H2-matrix generated from the fast NRA is again solved using the fast direct solver of [22]. The resultant bistatic RCS is compared with HFSS's result in Fig. 7(b). Good agreement can be observed.

B. Time and Memory Complexity

With the accuracy of the proposed algorithms validated, we next examine their complexity in generating a rank-minimized H2-matrix for electrically large analysis.

Fig. 6: (a) Geometry and mesh of a coil. (b) Bistatic RCS.

1) The Growth Rate of the Rank: First, we examine the growth rate of the rank with electrical size, since it is one of the key parameters in the complexity analysis. We use a conducting sphere as an example and find its rank level by level using the proposed NRA algorithm with ε = 10^{-3}. The results are listed in Table II. In this table, es denotes the largest electrical size of all clusters at a tree level, n is the largest number of unknowns among all clusters, k_F is the FMM's rank (note that in the FMM, all cluster bases at the same tree level share the same rank), k is the rank of the new H2-matrix of the entire SIE obtained from the proposed reduction algorithm, and k_φ is the rank of the new H2-matrix for the Z_φ part only. We also plot the new rank as a function of electrical size in Fig. 8. From this figure, it can be seen clearly that the rank scales linearly with electrical size, which agrees with the scaling used in our complexity analysis.

2) Complexity Analysis: A suite of conducting spheres of various diameters at 300 MHz is then simulated to examine the time and memory complexity of the proposed fast NRA algorithm. First, as a sanity check of the accuracy, we simulate one case, a sphere of 17.68 m diameter, whose number of unknowns is 294,912. The accuracy criterion is chosen as ε = 10^{-3} in the fast NRA algorithm. We then use the direct solver in [22] to solve the H2-matrix generated from the proposed algorithm. In Fig. 9, the simulated bistatic RCS is plotted as a function of θ, revealing excellent agreement with the Mie series solution [23].


Fig. 7: (a) Geometry and mesh of a joint. (b) Bistatic RCS.

TABLE II: Rank versus tree level using NRA with ε = 10^-3

tree level    es      n       kF      k     kφ
3             20.61   49152   42050   967   954
4             15.74   24576   25538   721   552
5             10.26   12288   11552   509   323
6             8.29    6144    7938    324   204
7             5.26    3072    3698    207   125
8             4.48    1536    2738    134   86
9             2.79    768     1250    90    57
10            2.46    384     1058    63    43
11            1.50    192     512     44    31
12            1.39    96      450     32    25
13            0.88    48      242     25    18
14            0.84    24      242     18    15


We then vary the size of the sphere and examine the time and memory scaling of the proposed fast NRA algorithm. The scaling data are listed in Table III. In this table, N is the number of unknowns, D is the diameter of the conducting sphere, e_MV denotes the relative error between the FMM-based matrix multiplied by a vector and the new H2-matrix multiplied by the same vector, t_C,φ is the time for converting the φ part of the Z matrix, t_C is the time for converting the whole Z matrix, and t_FMM(s) is the assembly time of the FMM. The k_F,φ is the rank of the FMM H2-matrix's φ part, k_φ is the rank of the new H2-matrix's φ part, t_F,MV(s) is the time of the FMM H2-matrix multiplied by a vector, t_MV(s) is the time of the new H2-matrix multiplied by the same vector, m_F(Mb) is the memory for storing the FMM-based H2-matrix, m_gF(Mb) is the memory used to assemble the FMM-based matrix, m(Mb) is the memory for storing the new H2-matrix, and m_C(Mb) is the maximal memory used in the converting process. It can be seen clearly that the new H2-matrix generated from the proposed work has a much reduced rank, memory, and matrix-vector multiplication time as compared to the FMM-based representation. We also plot the time and the maximal memory usage of the minimal-rank H2-generation in log scale in Fig. 10. They are shown to agree very well with our theoretical complexity analysis.

Fig. 8: New rank's growth rate with electrical size in a sphere example.

Fig. 9: Simulated RCS of a conducting sphere using the new H2-matrix generated from fast NRA in comparison with the Mie series solution.

TABLE III: Time, rank, and memory scaling of the proposed fast NRA with ε = 10^-2 and comparison with the FMM-based representation.

N            4608      18432     73728     294912     1179648
D            2.21λ     4.42λ     8.84λ     17.68λ     35.38λ
eMV          6.92e-3   1.18e-2   1.83e-2   2.41e-2    4.51e-2
tC (s)       114.75    1180.56   7570.54   46893.61   295287.38
tC,φ (s)     30.29     287.69    1874.19   10269.44   70441.52
tFMM (s)     72.96     283.85    1188.37   4798.54    18731.23
kF,φ         288       1152      3698      11858      42050
kφ           21        50        115       302        898
tF,MV (s)    5.53      29.00     75.26     209.48     719.94
tMV (s)      1.27      1.89      5.61      19.90      65.12
mF (Mb)      278.36    1429.50   6633.92   27883.57   115991.14
mgF (Mb)     318.25    1622.83   7481.95   31392.16   130411.31
m (Mb)       176.05    776.78    3416.73   15623.0    80965.49
mC (Mb)      312.78    1793.66   8844.36   42692.75   231295.43



Fig. 10: Complexity of the proposed fast NRA. (a) Time. (b) Memory.


In addition, we plot the memory required to store the FMM matrix and the new H2-matrix generated by the fast NRA in Fig. 11(a), and the CPU time of one matrix-vector multiplication for the two methods in Fig. 11(b). It is obvious that, owing to its minimal-rank representation, the new H2-matrix generated by the proposed algorithms has a much reduced storage and CPU run time, while scaling with N in the same way.

Fig. 11: Comparison between the FMM and the new H2-matrix obtained from the proposed fast NRA. (a) Memory. (b) Time.

VIII. CONCLUSION

We present new algorithms to generate a rank-minimized H2-matrix representing electrically large surface IE operators. First, the FMM is leveraged to obtain an initial H2-matrix in low complexity. Fast nested reduction algorithms are then developed to convert the FMM-based H2-representation to a new H2-matrix whose rank is minimized based on accuracy. The resultant new H2-matrix is found to have a much reduced rank without sacrificing the prescribed accuracy, which accelerates both iterative and direct solutions. The proposed work has been applied to solve electrically large SIEs for scattering analysis, and its accuracy and efficiency are demonstrated by numerical experiments. In addition to surface IEs, it is also applicable to volume IEs and other IE operators.

REFERENCES

[1] S. Borm, L. Grasedyck, and W. Hackbusch, "Hierarchical matrices," Lecture notes, vol. 21, 2003.

[2] W. Chai and D. Jiao, "An H2-matrix-based integral-equation solver of reduced complexity and controlled accuracy for solving electrodynamic problems," IEEE Trans. Antennas and Propagation, vol. 57, no. 10, pp. 3147–3159, 2009.

[3] M. Bebendorf, "Approximation of boundary element matrices," Numerische Math., vol. 86, no. 4, pp. 565–589, 2000.

[4] K. Zhao, M. N. Vouvakis, and J. F. Lee, "The adaptive cross approximation algorithm for accelerated method of moments computations of EMC problems," IEEE Trans. Electromagn. Compat., vol. 47, no. 4, pp. 763–773, 2005.

[5] W. Chai and D. Jiao, "A complexity-reduced H-matrix based direct integral equation solver with prescribed accuracy for large-scale electrodynamic analysis," in IEEE International Symp. Antennas and Propagation. IEEE, 2010.

[6] S. Omar, M. Ma, and D. Jiao, "Low-complexity direct and iterative volume integral equation solvers with a minimal-rank H2-representation for large-scale 3-D electrodynamic analysis," IEEE Journal on Multiscale and Multiphysics Computational Techniques, vol. 2, pp. 210–223, 2017.

[7] M. Bebendorf and R. Venn, "Constructing nested bases approximations from the entries of non-local operators," Numerische Mathematik, vol. 121, no. 4, pp. 609–635, 2012.

[8] M. Li, M. A. Francavilla, F. Vipiana, G. Vecchi, and R. Chen, "Nested equivalent source approximation for the modeling of multiscale structures," IEEE Trans. Antennas Propag., vol. 62, no. 7, pp. 3664–3678, July 2014.

[9] M. A. E. Bautista, M. A. Francavilla, P. G. Martinsson, and F. Vipiana, "O(N) nested skeletonization scheme for the analysis of multiscale structures using the method of moments," IEEE Journal on Multiscale and Multiphysics Computational Techniques, vol. 1, pp. 139–150, 2016.

[10] Y. Zhao, D. Jiao, and J. Mao, "Fast nested cross approximation algorithm for solving large-scale electromagnetic problems," IEEE Transactions on Microwave Theory and Techniques, vol. 67, no. 8, pp. 3271–3283, 2019.

[11] J. M. Song and W. C. Chew, "Multilevel fast-multipole algorithm for solving combined field integral equations of electromagnetic scattering," Microwave and Optical Technology Letters, vol. 10, no. 1, pp. 14–19, 1995.

[12] R. Coifman, V. Rokhlin, and S. Wandzura, "The fast multipole method for the wave equation: A pedestrian prescription," IEEE Antennas and Propagation Magazine, vol. 35, no. 3, pp. 7–12, 1993.

[13] E. Darve, "The fast multipole method: numerical implementation," Journal of Computational Physics, vol. 160, no. 1, pp. 195–240, 2000.

[14] W. C. Chew, E. Michielssen, J. Song, and J.-M. Jin, Fast and Efficient Algorithms in Computational Electromagnetics. Artech House, Inc., 2001.

[15] Z. N. Jiang, Y. Xu, R. S. Chen, Z. H. Fan, and D. Z. Ding, "Efficient matrix filling of multilevel simply sparse method via multilevel fast multipole algorithm," Radio Science, vol. 46, no. 5, 2011.

[16] D. Jiao and S. Omar, "Minimal-rank H2-matrix-based iterative and direct volume integral equation solvers for large-scale scattering analysis," in 2015 IEEE International Symp. Antennas and Propagation. IEEE, 2015, pp. 740–741.

[17] W. Chai and D. Jiao, "Linear-complexity direct and iterative integral equation solvers accelerated by a new rank-minimized H2-representation for large-scale 3-D interconnect extraction," IEEE Trans. Microwave Theory and Techniques, vol. 61, no. 8, pp. 2792–2805, 2013.

[18] C. Yang and D. Jiao, "Method for generating a minimal-rank H2-matrix from FMM for electrically large analysis," in IEEE International Symp. on Antennas and Propagation. IEEE, 2018, pp. 2503–2504.

[19] C. Yang, M. Ma, and D. Jiao, "Fast algorithms for converting an FMM-based representation of electrically large integral operators to a minimal-rank H2-matrix," in IEEE International Symp. on Antennas and Propagation. IEEE, 2019.

[20] S. Rao, D. Wilton, and A. Glisson, "Electromagnetic scattering by surfaces of arbitrary shape," IEEE Trans. Antennas and Propagation, vol. 30, no. 3, pp. 409–418, 1982.

[21] W. Chai and D. Jiao, "Theoretical study on the rank of integral operators for broadband electromagnetic modeling from static to electrodynamic frequencies," IEEE Trans. on Components, Packaging and Manufacturing Technology, vol. 3, no. 12, pp. 2113–2126, 2013.

[22] M. Ma and D. Jiao, "Accuracy directly controlled fast direct solution of general H2-matrices and its application to solving electrodynamic volume integral equations," IEEE Trans. Microwave Theory and Techniques, vol. 66, no. 1, pp. 35–48, 2017.

[23] C. Balanis, Advanced Engineering Electromagnetics, ser. CourseSmart Series. Wiley, 2012. [Online]. Available: https://books.google.com/books?id=cRkTuQAACAAJ


Chang Yang (GS'20) received the B.S. degree in Electronic Science and Technology from Xi'an Jiaotong University, Xi'an, China, in 2016. He is currently pursuing the Ph.D. degree at Purdue University, West Lafayette, IN, USA. He is also a member of the On-Chip Electromagnetics Group, School of Electrical and Computer Engineering, Purdue University. His current research interests include computational electromagnetics, fast and high-capacity numerical methods, and scattering analysis.

Mr. Yang was a recipient of the Honorable Mention Award in the 2020 IEEE Antennas and Propagation Symposium (AP-S) Student Paper Competition.


Dan Jiao (M'02, SM'06, F'16) received her Ph.D. degree in electrical engineering from the University of Illinois at Urbana-Champaign in 2001. She then worked at the Technology Computer-Aided Design (CAD) Division, Intel Corporation, until September 2005, as a Senior CAD Engineer, Staff Engineer, and Senior Staff Engineer. In September 2005, she joined Purdue University, West Lafayette, IN, as an Assistant Professor with the School of Electrical and Computer Engineering, where she is now a Professor. She has authored three book chapters and over 300 papers in refereed journals and international conferences. Her current research interests include computational electromagnetics; high-frequency digital, analog, mixed-signal, and RF integrated circuit (IC) design and analysis; high-performance VLSI CAD; modeling of microscale and nanoscale circuits; applied electromagnetics; fast and high-capacity numerical methods; fast time-domain analysis; scattering and antenna analysis; RF, microwave, and millimeter-wave circuits; wireless communication; and bio-electromagnetics.

Dr. Jiao has served as a reviewer for many IEEE journals and conferences. She is an Associate Editor of the IEEE Trans. on Components, Packaging, and Manufacturing Technology, and an Associate Editor of the IEEE Journal on Multiscale and Multiphysics Computational Techniques. She served as the General Chair of the 2019 IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization (NEMO), Boston, USA. She was selected as an IEEE MTT-Society Distinguished Microwave Lecturer in 2020. She was a recipient of Intel's 2019 Outstanding Researcher Award. She received the 2013 S. A. Schelkunoff Prize Paper Award of the IEEE Antennas and Propagation Society, which recognizes the best paper published in the IEEE Transactions on Antennas and Propagation during the previous year. She was among the 21 women faculty selected across the country as a 2014-2015 Fellow of ELATE (Executive Leadership in Academic Technology and Engineering) at Drexel, a national leadership program for women in the academic STEM fields. She has been named a University Faculty Scholar by Purdue University since 2013. She was among the 85 engineers selected throughout the nation for the National Academy of Engineering's 2011 US Frontiers of Engineering Symposium. She was the recipient of the 2010 Ruth and Joel Spira Outstanding Teaching Award, the 2008 National Science Foundation (NSF) CAREER Award, the 2006 Jack and Cathie Kozik Faculty Start up Award (which recognizes an outstanding new faculty member of the School of Electrical and Computer Engineering, Purdue University), a 2006 Office of Naval Research (ONR) Award under the Young Investigator Program, the 2004 Best Paper Award presented at the Intel Corporation's annual corporate-wide technology conference (Design and Test Technology Conference) for her work on a generic broadband model of high-speed circuits, the 2003 Intel Corporation's Logic Technology Development (LTD) Divisional Achievement Award, the Intel Corporation's Technology CAD Divisional Achievement Award, the 2002 Intel Corporation's Components Research Intel Hero Award (Intel-wide she was the tenth recipient), the Intel Corporation's LTD Team Quality Award, and the 2000 Raj Mittra Outstanding Research Award presented by the University of Illinois at Urbana-Champaign.
