Journal of Computational Physics - web.stanford.edulexing/empaper.pdf · Received 16 September 2010 Received in revised form 1 February 2011 Accepted 13 February 2011 Available online

Journal of Computational Physics 230 (2011) 5471–5487

Contents lists available at ScienceDirect

Journal of Computational Physics

journal homepage: www.elsevier .com/locate / jcp

A fast directional algorithm for high-frequency electromagnetic scattering

Paul Tsuji a,⇑, Lexing Ying b

a ICES, University of Texas at Austin, Austin, TX 78712, USAb Department of Mathematics and ICES, University of Texas at Austin, TX 78712, USA

a r t i c l e i n f o a b s t r a c t

Article history:Received 16 September 2010Received in revised form 1 February 2011Accepted 13 February 2011Available online 27 March 2011

Keywords:Electromagnetic scatteringBoundary integral equationsFast algorithmsFast multipole methodsSparse Fourier transforms

0021-9991/$ - see front matter � 2011 Elsevier Incdoi:10.1016/j.jcp.2011.02.013

⇑ Corresponding author.E-mail addresses: [email protected] (P. Tsuji), le

This paper is concerned with the fast solution of high-frequency electromagnetic scatteringproblems using the boundary integral formulation. We extend the O(N logN) directionalmultilevel algorithm previously proposed for the acoustic scattering case to the vectorelectromagnetic case. We also detail how to incorporate the curl operator of the magneticfield integral equation into the algorithm. When combined with a standard iterativemethod, this results in an almost linear complexity solver for the combined field integralequations. In addition, the butterfly algorithm is utilized to compute the far field patternand radar cross section with O(N logN) complexity.

� 2011 Elsevier Inc. All rights reserved.

1. Introduction

Electromagnetic scattering from perfectly conducting objects is a well-studied problem, having applications in antennamodeling, radar engineering, and remote sensing. The most popular computational technique for solving the time-harmonicexterior scattering problem is the boundary element method, or ‘‘method of moments’’ as it is known in the electrical engi-neering community. In this method, the electric and magnetic field integral equations can be derived using the boundaryconditions of a perfectly conducting object, with the unknown quantity being the fictitious current induced on the surface.In the discretization process, an N-dimensional finite element approximation is used to represent the current, while theequations can be tested using the same basis (Galerkin) or by pointwise collocation. This results in an N � N linear systemof equations. Once the coefficients of the basis functions are determined, relevant quantities such as the far-field pattern andscattered field can be computed by the appropriate radiation formulas.

In many of these applications, it is often found that the operating wavelength is much smaller in scale compared to thesize of the target object; that is, the diameter of the scatterer is orders of magnitude larger than k. For the boundary elementmethod to be reasonably accurate, each element must be of size 0.2k or smaller; thus, for high-frequency problems, the num-ber of unknowns N can become very large, and solving the resulting N � N linear system in a reasonable amount of time be-comes a difficult task. Standard O(N3) methods such as Gaussian elimination or Cholesky factorization do not suffice. In thepast few decades, there has been a great deal of effort put forth in developing fast algorithms to remedy this problem.

There are two main types of fast solvers for the system matrix: direct methods and iterative methods. Direct methods aimat computing an approximate inverse to the method-of-moments matrix and applying it to the right hand side; the benefitshere are that the computational time is usually unaffected by the condition number, and multiple right hand sides can behandled with ease once the approximate inverse is generated. In the realm of fast direct solvers for integral equations of

. All rights reserved.

[email protected] (L. Ying).

http://dx.doi.org/10.1016/j.jcp.2011.02.013

mailto:[email protected]

mailto:[email protected]

http://dx.doi.org/10.1016/j.jcp.2011.02.013

http://www.sciencedirect.com/science/journal/00219991

http://www.elsevier.com/locate/jcp

5472 P. Tsuji, L. Ying / Journal of Computational Physics 230 (2011) 5471–5487

scattering theory, recent works presented include the matrix compression algorithms by Michielssen, Boag, and Chew [19],as well as Martinsson and Rokhlin [18]. For the two-dimensional case of elongated structures, these algorithms have com-putational complexities which scale as O(N log2N) and O(N), respectively. Canning et al. [6] and Heldring et al. [16] have pre-sented direct solvers for the three-dimensional case which scale as O(N2) and O(N log3N) in the low-frequency regime; thesecomplexity estimates break down, however, in the high-frequency regime. To the author’s knowledge, there currently are noO(N) or O(N logN)-type direct solvers for the integral equations of high-frequency scattering in three dimensions.

On the other hand, iterative-type fast solvers combine a basic iterative method with a fast algorithm to accelerate thematrix–vector multiplication necessary at each iteration. Standard iterative methods alone have an O(N2) memory require-ment, with a complexity of O(nitN

2); here, nit is the number of iterations necessary for convergence to a desirable error tol-erance. This is easily seen, as there are N2 matrix entries needed to be stored, and direct matrix–vector multiplications takeO(N2) operations. The goal of the fast algorithm is to reduce the complexity of the matrix–vector operation down toO(N logN) or O(N), thus resulting in a total complexity down to O(nitN logN) or O(nitN), respectively. In the computationalelectromagnetics community, a plethora of algorithms have been utilized in this setting; some of these popular methodsare based on the Fast Fourier Transform (FFT). The first method of this type was the adaptive integral method [3] proposedby Bleszynski et al., which reduces the complexity of surface problems down to O(N1.5logN). An improvement by Bruno andKunyansky in [4] uses non-uniform Cartesian grids to reduce the computational cost, but still has superlinear dependence onN. The precorrected-FFT method [21], presented by Phillips and White, can reduce the complexities of electrostatic and low-frequency problems down to O(N logN), but for higher frequency problems, this estimate returns to O(N1.5logN); Lee’s IE-FFTmethod [25] has also been shown to work on PEC surface problems with similar complexity.

The more commonly used algorithms are the variations of the fast multipole method (FMM) and other low-rank factor-ization methods. Many of these methods start by rewriting the matrix–vector multiplication as an N-body potential prob-lem; due to properties of the Green’s function in the far field, the interactions between well-separated sets can becomputed with low-rank approximations. For low-frequency problems, a wide range of low-rank factorization algorithmshave been introduced, most notably Bebendorf’s adaptive cross approximation [2], Lee’s IE-QR algorithm [24], Kapur andLong’s IES3 method [17], and Canning’s simply sparse method [32]. For high-frequency problems, however, the fast multipolemethod has become the norm. The original FMM for the Laplace equation was adapted by Rokhlin to solve the Helmholtzequation for high frequencies in [23]. In the paper, he states that the two-level method reduces the complexity to O(N3/2),the three-level method reduces it further to O(N4/3), and so on. With a multilevel approach, he argues that the complexitycan be reduced to O(N logN). This multilevel approach was realized in the well-celebrated multilevel fast multipole algo-rithm (MLFMA) by Song, Lu, and Chew [27].

The focus of this paper is to adapt two recently developed algorithms to address high-frequency electromagnetic scatter-ing problems: the directional multilevel algorithm [10,11] for the Helmholtz kernel, and the butterfly algorithm [30] for fastradar cross section calculations. In the high-frequency case, the fast directional algorithm constructs low-rank approxima-tions between well-separated groups; unlike other low-rank factorization methods, however, these approximations arebased on direction. This allows for the solution of high-frequency problems in O(N logN) time. The advantages of the direc-tional multilevel algorithm in the electromagnetic case are the following: first, it does not use complicated analytical expres-sions for the far-field approximations; it only uses the directional low-rank property of the kernel function and kernelevaluations for the translation operators. This results in a simpler implementation and ease of use. Secondly, the directionalmultilevel algorithm can handle both the Green’s function of the Helmholtz equation and its derivatives, as well as linearoperations such as the curl operator in the magnetic field integral equation, with minor modifications. In this sense, it ismore user-friendly than other descendants of the FMM.

The rest of the paper is as follows: in Section 2, we formulate the boundary element method for electromagnetics, includ-ing the relevant radiation formulas needed in the post-processing stage. In Section 3, we discuss how to apply the directionalmultilevel algorithm and the butterfly algorithm to the vector Maxwell equations. In Section 4, we present numerical resultswith canonical examples, and show that the complexity of our method scales as O(N logN). Finally, we conclude with someremarks in Section 5.

2. Boundary integral formulation and discretization

2.1. Surface integral equations

We begin with a brief review of surface integral equations for scattering from perfectly conducting objects. The discussionhere mostly follows the method-of-moments formulation in [20]. Before we proceed, however, we must comment that theirformulation assumes the continuity equation with the opposite sign, i.e. r �~J ¼ ıxq. This results in a sign change within theelectric field integral equation. For consistency, we have assumed the time convention J

!ð~r; tÞ ¼ J

!ð~rÞeıxt , resulting in the con-

tinuity equationr � J!¼ �ıxq. All other fields follow the same convention, i.e. E

!ð~r; tÞ ¼ E

!ð~rÞeıxt;H

!ð~r; tÞ ¼ H

!ð~rÞeıxt , and so on.

Consider the scattering domain D with boundary C = @ D; note that D does not necessarily have to be connected, so therecan be more than one scattering object. Let us denote the incident fields ðEi

!;Hi

!Þ, the scattered fields ðEs

!;Hs

!Þ, and the total fields

ð~E; ~HÞ ¼ ðEi

!þ Es

!;Hi

!þHs

!Þ. The time-harmonic Maxwell equations for the scattered fields in the exterior of the scatterers are

P. Tsuji, L. Ying / Journal of Computational Physics 230 (2011) 5471–5487 5473

r� Es

!¼ �ıxlHs

!

r� Hs

!¼ ıxe Es

!þ J!

r � Es

!¼ q

e

r � Hs

!¼ 0

limj~rj!1ðHs

!�~r � j~rj Es

!Þ ¼ 0

limj~rj!1ðEs

!�~r þ j~rjHs

!Þ ¼ 0; ð2:1Þ

where ı ¼ffiffiffiffiffiffiffi�1p

; e is the permittivity of the surrounding medium, l is the permeability, x is the angular frequency, and J!

isthe fictitious induced current on the surface of the scatterer which satisfies the continuity equation r � J

!¼ �ıxq. In this

paper, we take the surrounding medium to be free space, i.e. l = 4p � 10�7 and e � 8.854 � 10�12. The last two conditionsare known as the Silver–Müller radiation conditions; these equations guarantee uniqueness of the solution to the exteriorproblem. The boundary conditions enforced on C for the total fields are

nð~rÞ � E!ð~rÞ ¼ 0; ð2:2Þ

nð~rÞ � H!ð~rÞ ¼ J

!ð~rÞ ~r 2 C;

where n is the outward unit normal vector on C. The wavenumber k is equal to x ffiffiffiffiffiffilepand the wavelength k is 2p/k. In this

paper, we scale the geometry so that the wavelength k = 1 (and equivalently k = 2p). The diameter of the scaled object is thendenoted by K; that is, the object is centered at the origin and contained inside the box � K

2 ;K2

� �3. For a problem in the high-frequency regime, K is typically quite large.

Using the standard machinery for Green’s functions of the vector Helmholtz equation, we can formulate the exterior scat-tered fields in terms of the current distribution J

!ð~rÞ for~r 2 C, i.e. the radiation formulas are

Es

!ð~rÞ ¼ �ıxl

ZC

Gð~r;~r0Þ J!ð~r0Þ þ 1

k2r0r0 � J

!ð~r0Þ

� �d~r0;Hs

!ð~rÞ ¼ r�

ZC

Gð~r;~r0Þ J!ð~r0Þd~r0; ð2:3Þ

where Gð~r;~r0Þ is the Green’s function for the scalar Helmholtz equation

Gð~r;~r0Þ ¼ e�ıkj~r�~r0 j

4pj~r �~r0j : ð2:4Þ

Using the boundary condition J!¼ n� H

!and taking the limit as~r approaches the surface, one can derive the electric field

integral equation (EFIE) and magnetic field integral equation (MFIE), which respectively are

nð~rÞ � Ei

!ð~rÞ ¼ nð~rÞ � ıxl

ZC

Gð~r;~r0Þ J!ð~r0Þd~r0 þ 1

k2rZ

CGð~r;~r0Þr0 � J

!ð~r0Þd~r0

� �; ð2:5Þ

nð~rÞ � Hi

!ð~rÞ ¼ J

!ð~rÞ2� nð~rÞ �

ZCnf~rg

Jð~r0Þ � r0Gð~r;~r0Þd~r0: ð2:6Þ

Here, the differential operators are redistributed for convenience in the discretization process. In order to get a better con-ditioned system of equations, one typically takes a convex linear combination of the EFIE and MFIE to get the combined fieldintegral equation (CFIE). For a constant a between 0 and 1, the CFIE is

CFIE ¼ aEFIEþ ð1� aÞgMFIE; ð2:7Þ

where g ¼ffiffiffile

qis the intrinsic impedance.

2.2. Method-of-moments discretization

For any given scatterer, we first discretize its boundary surface C with a triangular mesh that has a constant number oftriangles per wavelength k; the typical edge length h of the triangles is on the order of k but significantly smaller. Let us de-note the triangles of the mesh by fTigNt

i¼1 where Nt is the number of triangles and C ¼SNt

i¼1Ti. The discretization of (2.7) is thenperformed using the standard RWG basis functions presented in [22]. Consider the RWG basis functions f~f ngN

n¼1 defined onthe triangular mesh of C, where N = O(Nt) is the number of total edges of the mesh for a closed surface (for an open surface,basis functions are only defined on interior edges). Since the scatterer surface is discretized with a near-constant number oftriangles per unit area and the surface area is of order O(K2), N = O(K2). The support Cn of each~f n is supported on a pair oftriangles (Tn,1,Tn,2), i.e. Cn = Tn,1 [ Tn,2. The current J

!ð~rÞ is then expanded in terms of N basis functions,


J!ð~rÞ ¼

XN

n¼1

jn~f nð~rÞ; ð2:8Þ

where fjngNn¼1 2 C are the coefficients to be determined. Seeking a solution of this form, we plug expansion (2.8) into (2.5)

and (2.6) with constants xl = kg to get the discretized integral equations

nð~rÞ � Ei

!ð~rÞ ¼ nð~rÞ �

XN

n¼1

jn ıkgZ

CGð~r;~r0Þ~f nð~r0Þd~r0 �

gıkrZ

CGð~r;~r0Þr0 �~f nð~r0Þd~r0

� �;

nð~rÞ � Hi

!ð~rÞ ¼ 1

2

XN

n¼1

jn~f nð~rÞ � nð~rÞ �

XN

n¼1

jn

ZCnf~rg

~f nð~r0Þ � r0Gð~r;~r0Þd~r0:

The next step is to test the integral equations by using the Galerkin method or another basis to obtain a linear system ofequations. Since the emphasis of this paper is on the application of fast multilevel algorithms and not on discretization tech-niques, we choose the simplest testing functions for efficient implementation; that is, razor-blade testing functions for theEFIE and point-matching for the MFIE. For the details, please refer to Chapter 10 of [20]. With this formulation, the N � Nlinear system of equations obtained is

ZI ¼ V ; ð2:9Þ

where the entries of Z, I, and V are

Zmn ¼ aAmn þ ð1� aÞgDtmBmn;

Amn ¼ ıkgZ

Cm

ZCn

Gð~r;~r0Þ~f nð~r0Þd~r0 � d~r�

gık

ZCm

rZ

Cn

Gð~r;~r0Þr0 �~f nð~r0Þd~r0� �

� d~r;

Bmn ¼ em �Z

Cn

~f nð~r0Þ � rGð~r;~r0Þd~r0~r¼~rm

; m–n

Bmm ¼2p�Xm

2pIn ¼ jn;

Vm ¼ aZ

Cm

E!

ið~rÞ � d~r þ ð1� aÞgDtmem � H

!

ið~rmÞ: ð2:10Þ

Here, Cm is the path of the razor-blade function along the surface of the mth RWG, em is the unit vector in the direction of theadjoining edge of the mth RWG (oriented so that n� em points in the direction of the basis function, for either triangle),~rm isthe center of the same adjoining edge, and Xm is the interior angle created by the two triangles (Tm,1,Tm,2) (see Fig. 1 for anillustration of these notations).

For the outer integrals in the matrix entries of A, we use the formulas
ZCm
r/ð~rÞ � d~r ¼ /ð~cm;2Þ � /ð~cm;1Þ;Z

Cm

~Fð~rÞ � d~r �~Fð~cm;1Þ �~tm;1 þ~Fð~cm;2Þ �~tm;2; ð2:11Þ

where~cm;1 and~cm;2 are the centroids of Tm,1 and Tm,2, respectively, and~tm;1;~tm;2 are the vectors prescribed by the razor-bladefunction at the centroids. If we set /ð~rÞ ¼

RCn

Gð~r;~r0Þr0 �~f nð~r0Þd~r0 and ~Fð~rÞ ¼RCn

Gð~r;~r0Þ~f nð~r0Þd~r0, the matrix entries of A nowbecome

Fig. 1. Illustration of a razor-blade function defined on the mth RWG basis function.


Amn ¼ ıkg ~tm;1 �Z

Cn

Gð~cm;1;~r0Þ~f nð~r0Þd~r0 þ~tm;2 �Z

Cn

Gð~cm;2;~r0Þ~f nð~r0Þd~r0 �

� gık

ZCn

Gð~cm;2;~r0Þr0 �~f nð~r0Þd~r0 �Z

Cn

Gð~cm;1;~r0Þr0 �~f nð~r0Þd~r0 �

:

It is clear that the computation of the matrix entries above first involve the evaluation of potential-type integrals at~cm;1;~cm;2,and~rm for m = 1,2, . . . ,N. This observation will become important in the very next section.

2.3. Iterative solution

To solve the linear system (2.9) efficiently, we must use an iterative method such as GMRES. Thus, in order to constructthe Krylov subspace, at each iteration a density vector I is given and a summation of the form

XN

n¼1

ZmnIn ð2:12Þ

for m = 1,2, . . . ,N must be computed. If the matrix entries are done explicitly and each summation is performed directly, thisobviously yields a complexity of O(N2), with an O(N2) memory requirement. Instead, we will modify the approach by con-sidering the evaluation of the potential integrals at the test points first, before applying the test vectors. That is, let us definethe following integral operators

U½ J!�ð~rÞ ¼

XNt

i¼1

ZTi

Gð~r;~r0Þ J!ð~r0Þd~r0;

V ½ J!�ð~rÞ ¼

XNt

i¼1

ZTi

Gð~r;~r0Þr0 � J!ð~r0Þd~r0;

W½ J!�ð~rÞ ¼

XNt

i¼1

ZTi

J!ð~r0Þ � rGð~r;~r0Þd~r0; ð2:13Þ

where the integrals are now over each triangle instead of each RWG. Using these operators and (2.10), the summation in(2.12) is simply

XN

n¼1

ZmnIn ¼ a ıkg ~tm;1 � U½ J!�ð~cm;1Þ þ~tm;2 � U½ J

!�ð~cm;2Þ

�� g

ıkV ½ J!�ð~cm;2Þ � V ½ J

!�ð~cm;1Þ

��

þ ð1� aÞgDtm em �W½ J!�ð~rmÞ þ

2p�Xm

2pjm

� �: ð2:14Þ

At each iteration, we can use expansion (2.8) to interpolate the current on every triangle given the input fjngNn¼1. It can be

observed from the matrix entries that our task is to evaluate U½ J!�ð~rÞ and V ½ J

!�ð~rÞ at points f~c‘gNt

‘¼1 and W½ J!�ð~rÞ at points

f~rmgNm¼1. Since these integrals cannot be analytically evaluated in most cases, it is necessary to introduce a numerical quad-

rature scheme. First, consider the evaluation of U½ J!�ð~c‘Þ and V ½ J

!�ð~c‘Þ. If~c‘ R Ti, then symmetric Gaussian quadrature in [28]

can be used over Ti; if~c‘ 2 Ti, however, we must use a quadrature rule especially constructed to handle the 1r singularity in the

integral. To handle this, we subdivide the triangle Ti into three new triangles which share~c‘ as a vertex, and use Duffy quad-rature proposed in [9] over each of the new triangles (see Fig. 2).

For a triangle Ti, let f~pi;qgQq¼1 and fai;qgQ

q¼1 be the nodes and weights of the Gaussian quadrature rule and let f~ti;sgSs¼1 and

fbi;sgSs¼1 be the unions of the three sets of Duffy quadrature nodes and weights (one for each of the three new triangles). It is

important to note here that the weights take into account the area when integrating over Ti, i.e. that the Jacobian is alreadybuilt into each set of weights. We can then approximate the integrals for each~c‘ as

U½ J!�ð~c‘Þ �

XNt

i¼1i–‘

XQ

q¼1

Gð~c‘;~pi;qÞ J!ð~pi;qÞai;q þ

XS

s¼1

Gð~c‘;~t‘;sÞ J!ð~t‘;sÞb‘;s;

V ½ J!�ð~c‘Þ �

XNt

i¼1i–‘

XQ

q¼1

Gð~c‘;~pi;qÞr � J!ð~pi;qÞai;q þ

XS

s¼1

Gð~c‘;~t‘;sÞr � J!ð~t‘;sÞb‘;s:

The main computation is the first double sum in each formula, since for each~c‘ one needs to sum over O(q Nt) terms. It is noteasy to apply fast summation algorithms directly to this sum, since the summation is performed over a set that depends on~c‘, i.e., the constraint i – ‘. In order to facilitate the fast summation, we rewrite the formulas as

0 0.5 1 1.50

0.51

1.50

0.2

0.4

0.6

0.8

1

Fig. 2. Integrating over an RWG basis function with a singularity at the centroid of one triangle. The triangle is subdivided into 3 new triangles. The �’smark the location of the Duffy quadrature nodes.


U½ J!�ð~c‘Þ �

XNt

i¼1

Xq:~c‘–~pi;q

Gð~c‘;~pi;qÞ J!ð~pi;qÞai;q �

Xq:~c‘–~p‘;q

Gð~c‘;~p‘;qÞ J!ð~p‘;qÞa‘;q þ

XS

s¼1


V ½ J!�ð~c‘Þ �

XNt

i¼1

Xq:~c‘–~pi;q

Gð~c‘;~pi;qÞr � J!ð~pi;qÞai;q �

Xq:~c‘–~p‘;q

Gð~c‘;~p‘;qÞr � J!ð~p‘;qÞa‘;q

XS

s¼1


The advantage of this form is that now the summation is essentially over all possible pairs (i,q), except the case~c‘ ¼~pi;q,which can be handled easily in fast summation algorithms.

For the evaluation of W½ J!�ð~rmÞ, we again use Gaussian quadrature nodes when~rm R Ti. For~rm 2 Ti, since rGð~rm;~rÞ lies in

the plane of Ti;~f nð~rÞ � rGð~rm;~rÞ is perpendicular to Ti; thus, when dotted with em, the contribution from Ti is zero. There is no

need for singularity correction quadrature, and we simply have

W½ J!�ð~rmÞ �

XNt

i¼1

Xq

ai;q J!ð~pi;qÞ � rGð~rm;~pi;qÞ:

The matrix–vector multiply can be summarized in the following process:

1. For ‘ = 1,2, . . . ,Nt and m = 1,2, . . . ,N, compute the sums

U1½ J!�ð~c‘Þ ¼

XNt

i¼1

Xq:~c‘–~pi;q

Gð~c‘;~pi;qÞ J!ð~pi;qÞai;q;

V1½ J!�ð~c‘Þ ¼

XNt

i¼1

Xq:~c‘–~pi;q

Gð~c‘;~pi;qÞr � J!ð~pi;qÞai;q;

W½ J!�ð~rmÞ ¼

XNt

i¼1

Xq

ai;q J!ð~pi;qÞ � rGð~rm;~pi;qÞ:

ð2:15Þ

2. For ‘ = 1,2, . . . ,Nt, compute the near-field terms of Gaussian quadrature, i.e.

U2½ J!�ð~c‘Þ ¼

Xq:~c‘–~p‘;q

Gð~c‘;~p‘;qÞ J!ð~p‘;qÞa‘;q;

V2½ J!�ð~c‘Þ ¼

Xq:~c‘–~p‘;q

Gð~c‘;~p‘;qÞr � J!ð~p‘;qÞa‘;q:

3. For ‘ = 1,2, . . . ,Nt, compute the near-field terms of Duffy quadrature, i.e.


U3½ J!�ð~c‘Þ ¼

Xs


V3½ J!�ð~c‘Þ ¼

Xs


4. For ‘ = 1,2, . . . ,Nt, compute the corrected sums

U½ J!�ð~c‘Þ ¼ U1½ J

!�ð~c‘Þ � U2½ J

!�ð~c‘Þ þ U3½ J

!�ð~c‘Þ;

V ½ J!�ð~c‘Þ ¼ V1½ J

!�ð~c‘Þ � V2½ J

!�ð~c‘Þ þ V3½ J

!�ð~c‘Þ:

5. For m = 1,2, . . . ,N, calculate the sums in (2.12) through (2.14).

Here, steps 2, 3, and 4 are of O(Nt) = O(N) complexity, since the number of quadrature nodes over each triangle remainsconstant and the number of triangles is always less than the number of basis functions. It is clear that step 5 also takes O(N)operations. The only step that requires fast summation is step 1, and in Section 3.1 we show how to reduce its cost down toO(N logN) by applying the directional multilevel algorithm.

2.4. Far-field pattern and RCS

Once the coefficients of the RWG basis functions are found, the scattered field in R3 can be computed by reinserting theexpansion (2.8) into (2.3). If~r is in the far field, i.e. j~rj � k, then the integral can be approximated by the leading order as

Es

!ð~rÞ � �ıxl e�ıkj~rj

4pj~rj

ZC

J!ð~r0Þeıkr�~r0d~r0;

where r ¼ ~rj~rj. The radar cross section (RCS) in the direction r is defined by

rðrÞ ¼ limq!1

4pq2 j Es

!ðqrÞj2

j Ei

!ðqrÞj2

¼ ðxlÞ2

4p

ZC

J!ð~r0Þeıkr�~r0d~r0

2

; ð2:16Þ

when j Ei

!ð~rÞj ¼ 1 for the incoming plane wave. Since the integral in (2.16) is non-singular, we can use Gaussian quadrature

over each triangle; the task of calculating the scattered far field is now doing the summation
ZC
J!ð~r0Þeıkr�~r0d~r0 �

XNt

i¼1

Xq

ai;q J!ð~pi;qÞeıkr�~pi;q : ð2:17Þ

Typically, in order to resolve the radar cross section, one samples the unit sphere with a discrete set of directions frsgNrs¼1,

where Nr is on the same order as N = O(K2). Therefore, calculation of the RCS for these directions is equivalent to evaluating

~AðrsÞ :¼XNt

i¼1

XQ

q¼1

ai;q J!ð~pi;qÞeıkrs �~pi;q : ð2:18Þ

Since both the number of directions Nr and the number of quadrature points QNt is O(N), the direct computation of (2.18) willbe of O(N2) complexity. Once again, we will propose an application of the butterfly algorithm for sparse Fourier transforms inSection 3.2; this will reduce the complexity down to O(N logN).

3. Fast summation algorithms

In this section, we adapt two algorithms proposed recently in [10,30] to speed up the computation of (2.15) and (2.18).

3.1. Directional multilevel algorithm

The tool for speeding up the computation in (2.15) is based on the directional multilevel algorithm for the high-frequencyHelmholtz kernel proposed in [10]. The general problem considered in [10] is the evaluation of the potentials fuigNz

i¼1 definedby

ui ¼XNz

j¼1j–i

Gð~zi;~zjÞfj; ð3:1Þ

where ffigNzi¼1 are the point sources located at f~zigNz

i¼1 � ½�K=2;K=2�3 and G(x,y) is the Green’s function of the scalar Helmholtzequation in (2.4). The directional multilevel algorithm in [10] provides an efficient and accurate method for evaluating allfuigNz

i¼1 in O(NzlogNz) time. A brief outline of this algorithm is contained in Appendix A.


Let us now discuss how to adapt (2.15) to the form of (3.1). We first observe that in (3.1) there is only one set of pointsthat serve both as ‘‘sources’’ and ‘‘targets,’’ while in (2.15) the source points and target points are different. To address thisdifference, we can combine the source points and target points into a single set of points, assign zero weights to the targetpoints, apply the fast summation algorithm to the combined set of points, and extract the potentials only at the target points.Clearly, this does not change the complexity of the algorithm; from now on, we can safely assume that the source and targetpoints can be different.

If we denote ~u‘ ¼ U1½ J!�ð~c‘Þ; jði; qÞ ¼ ði� 1ÞQ þ q;Ny ¼ NtQ ;~yj ¼~pi;q, and~f j ¼ ai;q J

!ð~pi;qÞ, then we can rewrite the first sum-

mation of (2.15) as

~u‘ ¼XNy

j¼1~c‘–~yj

Gð~c‘;~yjÞ~f j;

where~u‘ ¼ ðu‘;1;u‘;2;u‘;3Þ and~f j ¼ ðfj;1; fj;2; fj;3Þ. Similarly, if we define v‘ ¼ V1½ J!�ð~c‘Þ and gj ¼ ai;qr � J

!ð~pi;qÞ, the second summa-

tion of (2.15) is in the form

v‘ ¼XNy

j¼1~c‘–~yj

Gð~c‘;~yjÞgj:

Thus, we now have the forms in which we are able to apply the directional multilevel algorithm. The summation for v‘ is inthe exact same form as (3.1), i.e. the product of an Nt � Ny matrix and an Ny � 1 vector. The summation for ~u‘ is also of thesame form, only now there are three components to sum over; instead of a matrix–vector operation, it is a multiplication ofan Nt � Ny matrix with an Ny � 3 matrix. In practice, since the discretization points are the same for both sums, we can aggre-gate the computations by combining the three components from ~f j with gj. By defining ~a‘ ¼ ðu‘;1;u‘;2; u‘;3;v ‘Þ and~bj ¼ ðfj;1; fj;2; fj;3; gjÞ, the summations are of the form

~a‘ ¼XNy

j¼1~c‘–~yj

Gð~c‘;~yjÞ~bj:

Here, instead of running the directional multilevel algorithm four times (once for each component), we run the algorithmonce but vectorize all of the translation operators to handle multiple columns. This process is fairly simple, since the equiv-alent source and potential locations remain the same; the only difference is that the sources and potentials take on vectorvalues. When computing the equivalent sources or any of the translation operators, matrix–matrix products are used insteadof matrix–vector products. The resulting complexity of this computation is O(NylogNy) where we recall Ny = NtQ. Since Q isconstant and the number of triangles Nt is always less than the number of unknowns, this complexity is O(N logN).

Using the same notation for j;~yj;Ny, and~f j, the third summation of (2.15) is of the form

~wm ¼XNy

j¼1

rGð~rm;~yjÞ �~f j;

where ~wm ¼W½ J!�ð~rmÞ ¼ ðwm;1;wm;2;wm;3Þ and the gradient operator r = (@x,@y,@z) acts on the first argument of G. Rewriting

the cross product as a matrix–vector operation, we see that

wm;1

wm;2

wm;3

0B@

1CA ¼XNy

j¼1

0 �Gzð~rm;~yjÞ Gyð~rm;~yjÞGzð~rm;~yjÞ 0 �Gxð~rm;~yjÞ�Gyð~rm;~yjÞ Gxð~rm;~yjÞ 0

0B@

1CA

fj;1

fj;2

fj;3

0B@

1CA: ð3:2Þ

At first, it may seem as if the directional multilevel algorithm cannot be applied to this situation. If we make a minor mod-ification, however, it will become apparent that the algorithm can work for (3.2). More specifically, let us look at the firstcomponent,

wm;1 ¼XNy

j¼1

ð�Gzð~rm;~yjÞfj;2 þ Gyð~rm;~yjÞfj;3Þ: ð3:3Þ

The essential step of the directional multilevel algorithm is the construction of the equivalent sources. For a leaf level box Bin the low-frequency regime, (A.4) states that the potential field produced by an arbitrary set of sources inside B can be wellapproximated by a small set of equivalent sources in B, when observing the far field of B. As Gz and Gy are derivatives of theGreen’s function, the sources in (3.3) are essentially dipoles in the z and y directions. Since the field generated by a dipole canbe well approximated by a group or a distribution of monopoles (i.e., sources determined by the Green’s function) in itsvicinity, the field generated by points in B with the kernels of (3.3) can also be approximated by a small set of equivalentsources, when observing the far field of B. Moreover, notice that in (A.4) the construction of the equivalent sources for a


box B requires only the potentials at xBp

n o; clearly, these potentials can be evaluated directly using the formulas of the ker-

nels Gzð�;~yjÞ and Gyð�;~yjÞ. We would like to emphasize two points here: first, the equivalent sources are always computedusing the Green’s function even though the kernels of (3.3) are the derivatives; second, once the true sources of (3.3) aretransformed into equivalent sources at the leaf level, the computation in the high-frequency regime only involves theGreen’s function G(x,y), and hence requires no modification at all. Compared with the algorithm for the Green’s functionG(x,y) in Appendix A, the algorithm for the kernels of (3.3) requires only the following modifications:

For the computation of the equivalent sources for the leaf boxes in the low-frequency regime, use the kernels of (3.3) to

compute the potentials at xBp

n o.

Use the kernels of (3.3) for direct near-field calculations in the low-frequency regime.

The computation of the second and third components wm,2 and wm,3 can be handled in essentially the same way. In fact,since the algorithm for wm,1, wm,2, and wm,3 above the leaf level is exactly the same, we again aggregate the computations byvectorizing the translation operations for all three components, allowing us to run the algorithm only once. The resultingalgorithm for the MFIE kernel has O(NylogNy) = O(N logN) complexity.

3.2. Butterfly algorithm for RCS calculations

The tool for accelerating the computation in (2.15) is the butterfly algorithm proposed in [30]. Recall that a sparse Fouriertransform is a computation of potentials in the form

ui ¼X

j

e2pı~xi �~kj=Mfj; ð3:4Þ

where f~kjg is a set of O(M2) points sampled from a smooth surface in the Fourier domain ½�M=2;M=2�3; f~xig � ½�M=2;M=2�3

is a set of O(M2) points sampled from another smooth surface in the spatial domain [�M/2,M/2]3, and {fj} are the sources atf~kjg. The butterfly algorithm proposed in [30] first constructs adaptive octrees TX and TK for the sets f~xig and f~kjg respectively.The octree TX takes [�M/2,M/2]3 as the top level box. Each box is partitioned recursively into eight identical child boxes untilall leaf boxes are of unit size, and only the boxes that contain points in f~xig are kept. The octree TK is constructed in the sameway with [�M/2,M/2]3 as the top level box and f~kjg as the point set. Since the sets f~xig and f~kjg are both of order O(M2), theoverall cost of this butterfly algorithm is O(M2logM), which is essentially linear. We refer to [30] for the detailed complexityanalysis, and [5] for the error analysis.

Let us discuss now how to turn the far-field integral in (2.18) into the form of (3.4). If we set as beforejði; qÞ ¼ ði� 1ÞQ þ q;~yj ¼~pi;q;

~f j ¼ ai;q J!ð~pi;qÞ, and Ny = NtQ, then the summation in (2.18) can be written as

A!ðrsÞ ¼

XNy

j¼1

e2pırs �~yj~f j: ð3:5Þ

In order to get the above sum in the same form as the sparse Fourier transform, we introduce a simple trick. Setting M = 2K,we multiply the size of our scattering object by 2; since~yj 2 � K

2 ;K2

� �3, we have that 2~yj 2 ½�K;K�3 ¼ �M2 ;

M2

� �3. In addition, in-stead of evaluating the sum on the unit sphere (where jrj ¼ 1), we will evaluate on a sphere of radius K. Now, we haveKrs 2 ½�K;K�3 ¼ �M

2 ;M2

� �3; thus, both point sets fall into the octree structure of the butterfly algorithm. With these modifi-cations, we can rewrite (3.5) as

XNy

j¼1

e2pıðKrsÞ�ð2~yjÞ=M~f j;

which is in the form of the sparse Fourier transform (3.4). Since Nr ¼ OðNÞ ¼ OðM2Þ and Ny = O(N) = O(M2), the computationalcost of the Butterfly algorithm is O(M2logM) = O(N logN). One minor difference is that~f j has three components instead of one.Instead of running the butterfly algorithm three times (once for each component), we run the algorithm once but with vec-torized operations, much like we did with the directional multilevel algorithm.

4. Numerical results

The general setup of our numerical tests is as follows. The triangular meshes we use have a refinement of 5 elements perwavelength (h = 0.2) to 10 elements per wavelength (h = 0.1); since the goal of this paper is to develop fast algorithms morethan to show the convergence of the boundary element method, we feel that this is sufficient enough. For numerical inte-gration over each triangle, we use Gaussian quadrature of order 5 for non-singular integrals and Duffy quadrature of order 5around singularities. All of the code has been implemented serially in C++. To show the complexity of our methods, weproduced the run times on a computer with a 2.13 GHz CPU benchmarked at 1.38 GFLOPS using serial LINPACK.

Table 1Computational times and relative error 2-norms for the directional algorithm with discretization points on the surface of a sphere.

(K,e) Np Te(sec) Tm(sec) ee em

(12,1e�4) 4.669e+5 1.080e+2 1.250e+2 3.771e�4 6.428e�4(24,1e�4) 1.867e+6 4.830e+2 5.470e+2 4.027e�4 6.594e�4(48,1e�4) 7.471e+6 2.150e+3 2.384e+3 3.760e�4 6.991e�4(12,1e�6) 4.669e+5 4.140e+2 3.690e+2 1.500e�6 3.639e�6(24,1e�6) 1.867e+6 1.790e+3 1.601e+3 1.646e�6 3.794e�6(48,1e�6) 7.471e+6 7.665e+3 6.773e+3 2.227e�6 3.472e�6(12,1e�8) 4.669e+5 1.217e+3 1.000e+3 8.648e�9 6.371e�8(24,1e�8) 1.867e+6 5.148e+3 4.198e+3 1.637e�8 6.625e�8(48,1e�8) 7.471e+6 2.165e+4 1.742e+4 1.349e�8 5.284e�8

Table 2Computational times and relative error 2-norms for the directional algorithm with discretization points on the surface of the NASA almond.

(K,e) Np Te(sec) Tm(sec) ee em

(32,1e�4) 3.164e+5 8.400e+1 8.800e+1 4.083e�4 5.291e�4(64,1e�4) 1.265e+6 3.860e+2 3.860e+2 3.463e�4 5.042e�4(128,1e�4) 5.059e+6 1.785e+3 1.745e+3 4.350e�4 4.824e�4(32,1e�6) 3.164e+5 3.130e+2 2.700e+2 2.386e�6 2.860e�6(64,1e�6) 1.265e+6 1.372e+3 1.171e+3 1.363e�6 2.814e�6(128,1e�6) 5.059e+6 6.082e+3 5.177e+3 1.977e�6 2.579e�6(32,1e�8) 3.164e+5 9.310e+2 7.460e+2 1.396e�8 3.300e�8(64,1e�8) 1.265e+6 3.978e+3 3.226e+3 1.361e�8 3.139e�8(128,1e�8) 5.059e+6 1.719e+4 1.363e+4 1.573e�8 3.099e�8


4.1. Tests of the directional multilevel algorithm

Before solving electromagnetic scattering problems by the CFIE, we first test the directional multilevel algorithm on thesummations (2.15) to show the scaling properties and attainable accuracy. The tests are performed on two commonly-usedexamples: the sphere and the NASA almond (see Fig. 4). Here, we choose to use meshes using 5 elements per wavelength. Inthese tables, K is the diameter of the object in terms of wavelength, e is the prescribed order of accuracy, Te is the time of theEFIE summation, Tm is the time of the MFIE summation, ee is the relative error for the EFIE summation, and em is the relativeerror for the MFIE summation. Np is the total number of discretization points in the directional multilevel algorithm, includ-ing the Gaussian quadrature nodes over each triangle, the centroid of each triangle, and the center of each edge. To estimatethe relative error between direct calculation and the algorithm, we compute the true values of the summations at a subset Rof 200 points sampled from the total number of points. The relative error is computed via

103 104 105 106101

102

103

104

number of unknowns (RWG basis functions)

CPU

tim

e pe

r mat

rix−v

ecto

r mul

tiplic

atio

n

Complexity of the Fast Directional Algorithm

FDAO(N log N)O(N)

Fig. 3. CPU time per matrix–vector multiplication vs. the number of unknowns for the sphere.

Table 3Top: Bistatic RCS of a 12k radius sphere (K = 24). Bottom: Solver times and RCS relative error 2-norms for the boundary element code with the fast directionalalgorithm for the sphere.

(K,e) N Tsolve(sec) nit RCS error

(6,1e�4) 7.373e+4 3.439e+3 18 1.659e�02(12,1e�4) 2.949e+5 1.856e+4 23 1.136e�02(24,1e�4) 1.180e+6 9.602e+4 28 8.851e�03

Table 4Top: Bistatic RCS of a 15k cube. On the left are figures for our boundary element code using the fast directional algorithmand butterfly algorithm. On the right are figures done by Song and Chew in [26] using the MLFMA. Bottom: Solver timesfor the boundary element code with the fast directional algorithm for the cube.

(K,e) N Tsolve(sec) nit

(3.75,1e�4) 1.843e+4 9.620e+2 14(7.5,1e�4) 7.373e+4 2.887e+3 16(15,1e�4) 2.949e+5 1.466e+4 18


−5−4

−3−2

−10

12

34

5

−10

1

−0.50

0.5

Fig. 4. The NASA almond geometry.

−5−4

−3−2

−10

12

34

5

−1

0

1

−1

−0.5

0

0.5

1

Fig. 5. The ogive geometry.


ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiPi2R

~ui �~uci

2Pi2Rj~uij2

vuut ; ð4:1Þ

where ~ui is the value by direct computation and ~uci is the value by the directional algorithm.

Table 1 shows data for the sphere, while Table 2 shows data for the NASA almond. It is observed that the computationaltime grows roughly by a factor of 3 or 4 when increasing the accuracy by two digits. We note that the algorithm for the MFIEsummation is slightly less accurate; nevertheless, the same order of accuracy is retained. To illustrate the O(N logN) scaling ofthe algorithm, we plot computational time versus the number of unknowns for the sphere in Fig. 3. Here, the size of thesphere ranges from K = 3 to K = 48, and the error tolerance is 1e�4.

4.2. Electromagnetic scattering problems

For scattering examples, we choose a few test geometries that have been well established in the electromagnetics com-munity. In these tests, we set the residual tolerance for GMRES iteration at 1e�3, while the error tolerance of the directionalalgorithm and butterfly algorithm are set at 1e�4. The constant a, which determines the balance between the EFIE and MFIE,is set at 0.3. Since the goal is to produce sufficiently accurate RCS plots, we choose to use refined meshes with 10 elementsper wavelength. To show the accuracy of the butterfly algorithm, we compare our radar cross section computations to eitheranalytical results or measured experiments. For analytical results, we measure the relative error of the radar cross section;


that is, if rðrÞ is the analytical solution of the RCS and rcðrÞ is the numerical result, given the point set frsgNrs¼1 on the unit

sphere we compute

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiPNrs¼1jrðrsÞ � rcðrsÞj2

qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiPNr

s¼1jrðrsÞj2q : ð4:2Þ

We perform tests on three commonly-used surfaces. The first example is a conducting sphere, which has well-documentedanalytical solutions [15]. In our setup, the incident plane wave is propagating in the �z direction, with the E-field in the þxdirection and H-field in the �y direction. For each simulation, the bistatic RCS was calculated along h for / = 0 and / ¼ p

2;here, h and / take on their usual definitions in spherical coordinates. Table 3(top) shows the bistatic RCS of a 12k-radiussphere. The Mie series solution is compared to the boundary element method using the directional multilevel algorithmin Table 3(bottom) for diameters K = 6, 12, and 24. We list both the solver times and number of unknowns for each exampleto show the O(N logN) scaling, where N is the number of RWG basis functions. Here, Tsolve is the total run-time of the GMRESiterative solver, and nit is the number of iterations necessary for convergence. It is important to note that since we are notusing any preconditioners, the condition number of our problem grows with respect to the number of unknowns, leading toan increased number of iterations.

The second example is a conducting cube with the same incoming field as the previous example. Table 4(top) shows thebistatic RCS of a cube with sides which are 15k long; although we do not have an analytical solution for the RCS or measureddata on the cube, we present two figures from the MLFMA paper by Song and Chew [26] as a comparison. The running timeand iteration numbers are reported in Table 4(bottom). We note that the RCS for the / = 0 cut is not exactly symmetric, butby no fault of the butterfly algorithm; we believe this is a result of using a crude testing scheme and not employing accuratenear-field techniques in the regions of corners and edges.

Table 5Top: Monostatic RCS of the ogive at 9.0 GHz. On the left are figures for our boundary element code using the fastdirectional algorithm and butterfly algorithm. On the right are figures done by Gibson in [12] using the MLFMA. Bottom:Solver times for the boundary element code with the fast directional algorithm for the ogive.

(K,e) N Tsolve(sec) nit

(16,1e�4) 3.246e+4 1.012e+3 14(32,1e�4) 1.297e+5 5.183e+3 16(64,1e�4) 5.188e+5 2.529e+4 18

Table 6Computational times per iteration for the sphere, using the MLFMA; numbers are determined based on the tables in [12]and [8]. Here, K is the diameter in terms of wavelength and Titer is time per iteration.

K N CPU speed Titer (sec)

5 3.072e+4 1.5 GFLOPS 8510 1.229e+5 1.5 GFLOPS 365120 9.633e+6 11 GFLOPS 1088

Table 7Computational times per iteration for the sphere, using the fast directional multilevel algorithm. Here, K is thediameter in terms of wavelength and Titer is time per iteration.

K N CPU speed Titer (sec)

5 3.175e+4 1.38 GFLOPS 8010 1.210e+5 1.38 GFLOPS 325120 9.618e+6 1.38 GFLOPS 8108


The third example, the ogive, is a popular benchmark target for testing electromagnetics codes [29] (see Fig. 5). Here, wetake the example of the 10 inch ogive at a frequency of 9.0 GHz. Since we take our wavelength to be 1 at all times, we scalethe geometry appropriately so that the object is still of the same diameter K in terms of wavelength. Table 5(top) shows themonostatic RCS about the observation angle /. In these plots, VV polarization signifies the E-field oriented in the þz directionand the H-field directed in the x � y plane; HH polarization signifies the E-field in the x � y plane and the H-field oriented inthe �z direction. In the comparison plots provided by Gibson in [12], the RCS curves are not normalized by wavelength; thus,in order to scale our RCS calculations down to their actual levels, we must compute 10log10(RCS � k2), where k is the free-space wavelength at the chosen frequency. Table 5(bottom) illustrates the complexity in solving larger problems; that is,for K = 16, 32, and 64.

4.3. Comparison to the MLFMA

To compare the performance of the fast directional algorithm with current methods, we refer to table 8.2 on page 252 in[12]. Here, the MLFMA is used to solve a variety of scattering problems; the simulations are done with Gibson’s Serenity codeon a comparable 2.0 GHz processor. Since Gibson uses an ILUT preconditioner to reduce the number of iterations, we observeonly the approximate iteration time. For the 2-meter sphere at 0.75 and 1.5 GHz, the solution times (which include FMMsetup and RCS calculations) are 11 min and 68 min, respectively. Given that convergence is achieved in 10 iterations or less,this corresponds to iteration times of approximately 85 s and 365 s. At these frequencies, the 2-meter sphere is 5k and 10k indiameter; we scale our geometry appropriately with a similar number of unknowns. For the 5k and 10k spheres, the fastdirectional algorithm completes one iteration in 80 s and 325 s, respectively. The results are presented in Tables 6 and 7.

To compare algorithms in a truly high-frequency setting, we take the example of a 120k diameter sphere in [8]. For thisexample, we use one quadrature point per face, instead of the regular fifth order Gaussian quadrature. The machine used bythe authors in [8] is an SGI Origin2000 utilizing 32 processors, which is benchmarked at 11 GFLOPS; therefore, the compu-tational times presented by them should be about 8 times faster. At 13 h for 43 iterations, their computational time is 1088 sper iteration; our code completes one iteration in 8108 s. Thus, we can confidently say that our method is competitive withthe MLFMA.

Fig. 6. 2D illustrations for components of the 3D algorithm. (a) B and WB, ‘ follow the directional parabolic separation condition. (b) A cross-section of theoctree of a kite-shaped scatterer.


5. Conclusion

In this paper, we adapted the fast directional multilevel algorithm proposed in [10] for electromagnetic scattering fromperfectly conducting structures; most notably, we extended the algorithm to handle the curl operator in the magnetic fieldintegral equation. This extension allows the use of combined field integral equations, which lead to better conditioned linearsystems after discretization and a fewer number of GMRES iterations necessary for convergence. From an implementationstandpoint, we have aggregated the computations such that at each iteration, only two matrix–vector products are per-formed (one for the EFIE and one for the MFIE). The result is an integral equation solver of O(N logN) complexity, whereN is the number of unknowns; it is possible to solve 3D scattering problems which are hundreds of wavelengths long in avery reasonable time.

A few remarks can be made about the versatility of the directional algorithm here. First, it is important to note that thesame directional algorithm can be used to speed up the solution of scattering problems with homogeneous dielectric bodies.Since the Green’s function in an infinite homogeneous medium has the same form as the free space Green’s function (theonly difference is the wavenumber k), the only modifications necessary are to the integral equation and the scaling of thegeometry; no changes to the directional multilevel code are needed. Secondly, although we only consider the standard CFIEformulation, the method can also be applied to a variety of other formulations, including the Calderon multiplicative precon-ditioner presented in [1]. Finally, we note that the directional algorithm could have be used to compute the radar crosssection in the post-processing stage. That is, if ~r is chosen to be large enough, the scattered field can be computed usingthe radiation formulas (2.3) instead of the far-field approximation. However, this would require another setup phase forthe extra evaluation points, and the scattered field would have to be computed at the source point locations as well; thisis a result of treating the source and target points the same, as discussed in paragraph 2 of Section 3.1. For the butterfly algo-rithm, the source and target points are independent of each other; thus, it is more efficient to use the butterfly algorithm inour case.

Appendix A. Directional multilevel algorithm

The main idea behind the directional multilevel algorithm proposed in [10] is the directional low rank property of theGreen’s function. That is, consider a box B of width wk with w P 1 and a wedge WB,‘ as illustrated in Fig. 6 (a); if WB,‘ spansan angle of size O(1/w) and the distance between B and WB,‘ is at least O(w2k), we say that WB,‘ and B satisfy the directionalparabolic separation condition. In [10], it was proven that for any arbitrary accuracy e, there exists a te-term separated approx-

imation of G(x,y) for any y 2 B and x 2WB,‘. In practice, we find sets yB;‘q

n ote

q¼1; xB;‘

p

n ote

p¼1and a matrix DB;‘ ¼ dB;‘

qp

� 16p;q6te

suchthat

Gðx; yÞ �Xte

q¼1

G x; yB;‘q

� Xte

p¼1

dB;‘qp G xB;‘

p ; y�

6 e ðA:1Þ

for y 2 B and x 2WB,‘. We call this the directional separated approximation of B in direction ‘. The locations yB;‘q

n o; xB;‘

p

n o, and

the matrix D can be computed easily using the low-rank factorization scheme detailed in [11]; it is important to emphasizethat the separation rank te is independent of the size of B.

Consider the sources {fi} located at {zi} in B. The directional separated approximation allows us to compute the potentialsin WB,‘ generated by {fi} with a smaller set of te sources. More precisely, after applying (A.1) to each zi and summing them upover all the sources, we have

Xzi2B

Gðx; ziÞfi �Xte

q¼1

G x; yB;‘q

� Xte

p¼1

dB;‘qp

Xzi2B

G xB;‘p ; zi

� fi

! ¼ OðeÞ: ðA:2Þ

This states that we can place a set of sources

f B;‘q :¼

Xte

p¼1

dB;‘qp

Xzi2B

G xB;‘p ; zi

� fi

( )ðA:3Þ

at points yB;‘q

n oin order to reproduce the potential at x 2WB,‘ generated by the sources {fi} located at points {zi} in B. These

sources are called the directional equivalent sources of B in direction ‘ and they play the role of the multipole expansions in

the original FMM [13,14]. It is obvious from (A.2) that the computation of f B;‘q

n oessentially requires only the potentialsP

zi2BG xB;‘p ; zi

� fi

n oat xB;‘

p

n o.

Since G(x,y) = G(y,x), we can reverse the role of the source and target to get a similar result for the analogous local expan-sions. Consider now the sources {fi} located at points {zi} in WB,‘. Applying (A.1) to each zi and summing over all the sourcesgives


Xzi2WB;‘

Gðy; ziÞfi �Xte

p¼1

G y; xB;‘p

� Xte

q¼1

dB;‘qp

Xzi2WB;‘

G yB;‘q ; zi

� fi

0@

1A

¼ OðeÞ:

This means that from the potentials uB;‘q :¼

Pzi2WB;‘G yB;‘

q ; zi

� fi

n o, we can reproduce the potential at any y 2 B generated by

{fi} at {zi} �WB,‘. These potentials are called the directional check potentials of B in direction ‘; they play the role of the local

expansions in the original FMM.For a box B of width wk with w < 1, the wedges are replaced with a single annulus WB with an outer radius that extends to

infinity. The equivalent sources, the check potentials, the point sets xBp

n oand yB

q

n o, and the transform matrices can be de-

fined similarly, but they are non-directional now (see [31] for details). For example, the equivalent sources for B can be com-puted by

f Bq :¼

Xte

p¼1

dBqp

Xzi2B

G xBp; zi

� fi

( )ðA:4Þ

and it is clear that the computation again requires only the potentialsP

zi2BG xBp; zi

� fi at xB.

The directional multilevel algorithm begins by constructing an octree that contains the entire scatterer (see Fig. 6(b)).Similarly to [7], we split the tree into two regimes: the low-frequency regime and high-frequency regime. A box B of widthwk is considered to be in the low frequency regime if w < 1 and in the high-frequency regime if w P 1. In the low-frequencyregime, a box B is partitioned as long as the number of points in B is greater than a fixed constant P; in the high-frequencyregime, the domain is partitioned uniformly without any adaptivity, with empty boxes being discarded. The algorithm nowuses the translation operators introduced in [31] in the low-frequency regime. In order to use the directional separatedapproximation in (A.1), we define the far field FB of a box B to be the region that is at least w2k away in the high-frequencyregime; conversely, the near field NB is defined as the union of boxes which are less than w2k away. A box A is said to be in theinteraction list of B if A is in B’s far field but not in the far field of B’s parent. FB is further partitioned into a group of directionalwedges {WB,‘}, where each wedge is contained in a cone with spanning angle O(1/w).

A crucial point is that the wedges of the parent box and the child box are nested, so that we are able to construct M2M,M2L, and L2L translations of O(1) complexity as in the original FMM algorithm [14]. However, these translations in the highfrequency regime are now directional. Combining these parts gives us the following high-frequency directional multilevelalgorithm.

1. Construct the octree. In the high-frequency regime, the boxes are partitioned uniformly. In the low-frequency regime, theboxes are partitioned adaptively until each leaf level box contains at most P points.

2. Travel up the low-frequency regime. For each box B, compute the non-directional equivalent sources following [31].3. Travel up the high-frequency regime. For each box B and each WB,‘, compute f B;‘

q

n ousing the directional M2M translation.

We skip the boxes with width greater thanffiffiffiffiKp

k since their interaction lists are empty.4. Travel down the high-frequency regime. For each box B and each WB,‘

(a) Transform f A;‘q

n oof the boxes {A} in B’s interaction list and in direction ‘ using the directional M2L translation. Next,

add the result to uB;‘q

n o.

(b) Transform uB;‘q

n ointo the directional check potentials for B’s children using the directional L2L translation.

5. Travel down the low-frequency regime. For each box B:(a) Transform the non-directional equivalent sources of the boxes {A} in B’s interaction list using the non-directional M2L

translation. Next, add the result to the non-directional check potentials.(b) Perform the non-directional L2L translation. If B is a leaf box, add the result to the potentials at the original points

inside B. If not, then add the result to the non-directional check potentials of B’s children.6. Compute the near-field interactions. For each leaf box B, add contributions from sources in NB directly to the potentials at

points in B.

For a point set fzigNzi¼1 obtained from discretizing the surface of a scatterer in [� K/2,K/2]3, Nz = O(K2). It is shown in [10,11]

that the overall complexity of this algorithm is O(NzlogNz) = O(K2logK).The directional multilevel algorithm is also efficient memory-wise; the main memory requirement comes from storing

the translation operators. Since the kernel function G(x,y) is known explicitly, one only needs to store the DB,‘ matricesfor each translation operator. Due to the translation invariance of the kernel, these matrices only depend on the width ofB and the wedge direction ‘; therefore, the actual number of matrices is very small. The Green’s function terms in the trans-lation operators and near-field interactions are all computed on the fly for each iteration.

References

[1] H. Bagci, F. Andriulli, K. Cools, F. Olyslager, E. Michielssen, A Calderon multiplicative preconditioner for the combined field integral equation, IEEETrans. Antennas Propag. 57 (10) (2009) 3387–3392.


[2] M. Bebendorf, S. Kunis, Recompression techniques for adaptive cross approximation, J. Integral Equat. Appl. 21 (3) (2009) 331–357.[3] E. Bleszynski, M. Bleszynski, T. Jaroszewicz, AIM: adaptive integral method for solving large-scale electromagnetic scattering and radiation problems,

Radio Sci. 31 (5) (1996) 1225–1251.[4] O.P. Bruno, L.A. Kunyansky, A fast, high-order algorithm for the solution of surface scattering problems: basic implementation, tests, and applications, J.

Comput. Phys. 169 (1) (2001) 80–110.[5] E. Candes, L. Demanet, L. Ying, A fast butterfly algorithm for the computation of Fourier integral operators, Multiscale Model. Simul. 7 (4) (2009) 1678–

1694.[6] F.X. Canning, K. Rogovin, Fast direct solution of standard moment-method matrices, IEEE Trans. Antennas Propag. 40 (3) (1998) 15–26.[7] H. Cheng, W.Y. Crutchfield, Z. Gimbutas, L.F. Greengard, J.F. Ethridge, J. Huang, V. Rokhlin, N. Yarvin, J. Zhao, A wideband fast multipole method for the

Helmholtz equation in three dimensions, J. Comput. Phys. 216 (1) (2006) 300–325.[8] W. Chew, J. Jin, E. Michielssen, J. Song, Fast and Efficient Algorithms in Computational Electromagnetics, Artech House, 2001.[9] M. Duffy, Quadrature over a pyramid or cube of integrands with a singularity at a vertex, SIAM J. Numer. Anal. 19 (1982) 1260–1262.

[10] B. Engquist, L. Ying, Fast directional multilevel algorithms for oscillatory kernels, SIAM J. Sci. Comput. 29 (4) (2008) 1710–1737.[11] B. Engquist, L. Ying, A fast directional algorithm for high frequency acoustic scattering in two dimensions, Commun. Math. Sci. 7 (2) (2009) 327–345.[12] W. Gibson, The Method of Moments in Electromagnetics, Chapman and Hall, 2007.[13] L. Greengard, The rapid evaluation of potential fields in particle systems, ACM Distinguished Dissertations, MIT Press, Cambridge, MA, 1988.[14] L. Greengard, V. Rokhlin, A fast algorithm for particle simulations, J. Comput. Phys. 73 (2) (1987) 25–348.[15] R. Harrington, Time-harmonic Electromagnetic Fields, McGraw-Hill, 1961.[16] A. Heldring, J.M. Tamayo, J.M. Rius, Fast direct solution of the combined field integral equation, URSI Int. Symp. Electromagn. Theory (2010) 524–527.[17] S. Kapur, D. Long, IES3: efficient electrostatic and electromagnetic simulation, IEEE Comput. Sci. Eng. 5 (4) (1998) 60–67.[18] P.G. Martinsson, V. Rokhlin, A fast direct solver for scattering problems involving elongated structures, J. Comput. Phys. 221 (1) (2007) 288–302.[19] E. Michielssen, A. Boag, W.C. Chew, Scattering from elongated objects: direct solution in O(N log2 N) operations, IEE Proc. Microw. Antennas Propag.

143 (4) (1996) 277–283.[20] A. Peterson, S. Ray, R. Mittra, Computational Methods for Electromagnetics, Wiley-IEEE Press, 2001.[21] J. Phillips, J. White, A precorrected-FFT method for electrostatic analysis of complicated 3-D structures, IEEE Trans. Comput. Aided Des. Integr. Circuits

Syst. 16 (10) (1997) 1059–1072.[22] S. Rao, D. Wilton, A. Glisson, Electromagnetic scattering by surfaces of arbitrary shape, IEEE Trans. Antennas Propag. 30 (3) (1982) 409–418.[23] V. Rokhlin, Rapid solution of integral equations of scattering theory in two dimensions, J. Comput. Phys. 86 (2) (1990) 414–439.[24] S.M. Seo, J.F. Lee, A single-level low rank IE-QR algorithm for PEC scattering problems using EFIE formulation, IEEE Trans. Antennas Propag. 52 (8)

(2004) 2141–2146.[25] S.M. Seo, J.F. Lee, A fast IE-FFT algorithm for solving PEC scattering problems, IEEE Trans. Magn. 41 (5) (2005) 1476–1479.[26] J. Song, W. Chew, Multilevel fast multipole algorithm for solving combined field integral equations of electromagnetic scattering, Microwave Optical

Technol. Lett. 10 (1) (1995) 14–19.[27] J. Song, C. Lu, W. Chew, Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects, IEEE Trans. Antennas Propag. 45

(10) (1997) 1488–1493.[28] S. Wandzura, H. Xiao, Symmetric quadrature rules on a triangle, Comput. Math. Appl. 45 (12) (2003) 1829–1840.[29] A. Woo, H. Wang, M. Schuh, M. Sanders, Benchmark radar targets for the validation of computational electromagnetics programs, IEEE Antennas

Propag. Mag. 35 (1) (1993) 84–89.[30] L. Ying, Sparse Fourier transform via butterfly algorithm, SIAM J. Sci. Comput. 31 (3) (2009) 1678–1694.[31] L. Ying, G. Biros, D. Zorin, A kernel-independent adaptive fast multipole algorithm in two and three dimensions, J. Comput. Phys. 196 (2) (2004) 591–

626.[32] A. Zhu, R. Adams, F. Canning, Modified simply sparse method for electromagnetic scattering by PEC, IEEE Antennas Propag. Soc. Int. Symp. 4A (2005)

427–430.

Documents

Journal of Computational Physics - web.stanford.edulexing/empaper.pdf · Received 16 September 2010 Received in revised form 1 February 2011 Accepted 13 February 2011 Available online