Hausdorff Distance under Translation for Points and Ballsweb.cse.ohio-state.edu/~wang.1016/papers/haus-dcg.pdf · 2006. 11. 29. · Hausdorff Distance under Translation for Points

Hausdorff Distance under Translation for Points and Balls�

Pankaj K. Agarwal�

Sariel Har-Peled�

Micha Sharir�

Yusu Wang�

February 4, 2006

Abstract

We study the shape matching problem under the Hausdorff distance and its variants. In the first partof the paper, we consider two sets �� of balls in �� , �� , and wish to find a translation � thatminimizes the Hausdorff distance between �� , the set of all balls in � shifted by � , and � . We considerseveral variants of this problem. First, we extend the notion of Hausdorff distance from sets of points tosets of balls, so that each ball has to be matched with the nearest ball in the other set. We also considerthe problem in the standard setting, by computing the Hausdorff distance between the unions of the twosets (as point sets). Second, we consider either all possible translations � (as is the standard approach),or consider only translations that keep the balls of �� disjoint from those of � . We propose severalexact and approximation algorithms for these problems. In the second part of the paper, we note that theHausdorff distance is sensitive to outliers, and thus consider two more robust variants—the root-mean-square (rms) and the summed Hausdorff distance. We propose efficient approximation algorithms forcomputing the minimum rms and the minimum summed Hausdorff distances under translation, betweentwo point sets in �� . In order to obtain a fast algorithm for the summed Hausdorff distance, we proposea deterministic efficient dynamic data structure for maintaining an � -approximation of the � -median of aset of points in �� , under insertions and deletions.

1 Introduction

The problem of shape matching in two and three dimensions arises in a variety of applications, includ-ing computer graphics, computer vision, pattern recognition, computer aided design, and molecular biol-ogy [7, 17, 27]. For example, proteins with similar shapes are likely to have similar functionalities, there-fore classifying proteins (or their fragments) based on their shapes is an important problem in computationalbiology. Similarly, the proclivity of two proteins binding with each other also depends on their shapes, soshape matching is central to the so-called docking problem in molecular biology [17].�P.A. is supported by by NSF grants EIA-98-70724, EIA-99-72879, ITR-333-1050, CCR-97-32787, and CCR-00-86013, and

by a grant from the U.S.-Israeli Binational Science Foundation (jointly with M.S.). S.H. is supported by NSF CAREER awardCCR-01-32901. M.S. is also supported by NSF Grants CCR-97-32101 and CCR-00-98246, by a grant from the Israel ScienceFund (for a Center of Excellence in Geometric Computing), and by the Hermann Minkowski–MINERVA Center for Geometry atTel Aviv University. Y.W. is supported by NSF grants ITR-333-1050 and CCR-02-04118.Part of this work was done while the last author was at Duke University.A preliminary version of this paper appeared in Proc. 19th Annual Sympos. Comput. Geom., 2003, pp. 282–291.�

Department of Computer Science, Duke University, Durham, NC 27708-0129, U.S.A. Email: [email protected].�Department of Computer Science, University of Illinois, Urbana, IL 61801, U.S.A. Email: [email protected]. School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel; and Courant Institute of Mathematical Sciences,

New York University, New York, NY 10012, USA. Email: [email protected].!Department of Computer Science and Engineering, Ohio State University, 2015 Neil Avenue, Columbus, OH 43210, USA.

Email: [email protected]

1

Informally, the shape-matching problem can be described as follows: Given a distance measure be-tween two sets of objects in �� or �� , determine a transformation, from an allowed set, that minimizesthe distance between the sets. In many applications, the allowed transformations are all possible rigid mo-tions. However, in certain applications there are constraints on the allowed transformations. For example, inmatching the pieces of a jigsaw puzzle, it is important that no two pieces overlap each other in their matchedpositions. Another example is the aforementioned docking problem, where two molecules bind togetherto form a compound, and, clearly, at this docking position the molecules should occupy disjoint portionsof space [17]. Moreover, because of efficiency considerations, one sometimes restricts further the set ofallowed transformations, most typically to translations only.

Several distance measures between objects have been proposed, varying with the kind of input objectsand the application. One common distance measure is the Hausdorff distance [7], originally proposed forpoint sets. In this paper we adopt this measure, extend it to sets of non-point objects (mainly, disks andballs), and apply it to several variants of the shape matching problem, with and without constraints on theallowed transformations. In many applications (e.g., molecular biology), shapes can be approximated by afinite union of balls [8], which is therefore the main type of input assumed in the first part of this paper.

1.1 Problem statement

Let � and � be two (possibly infinite) sets of geometric objects (e.g., points, balls, simplices) in �� , and let� � �� be a distance function between objects in � and in � . For �� , we define�� ! #"%$'&)( �� +*,� . Similarly, we define

�� *-�.�/�0� �! 1"�23&54 �� +*,� , for *6�7� . The directional Hausdorff distancebetween � and � is defined as 8 � �9��:�<;�=#>2,&,4 ?� �?��@�and the Hausdorff distance between � and � is defined as

A � �9��0�CBEDGF?H 8 � �9��@� 8 � �I�.�/�+JLK(It is important to note that in this definition each object in � or in � is considered as a single entity, and notas the set of its points.) In order to measure similarity between � and � , we compute the minimum valueof the Hausdorff distance over all translates of � within a given set MONP� � of allowed translation vectors.Namely, we define Q � �9��SRTM��0� �U #"V &-W A � �YX[Z@��@�where �YX\Z]�^H_�`X\Z6ab�c�d�eJ . In our applications, M will either be the entire �f� or the set of collision-free translates of � at which none of its objects intersects any object of � . The collision-free matchingbetween objects is useful for applications (such as those mentioned above) in which the goal is to locate atransformation where the collective shape of one set of objects best complements that of the other set. Wewill use

Q � �9�� to denoteQ � �9��IR%� � � .

As already mentioned, our definition of (directional) Hausdorff distance is slightly different from theone typically used in the literature [7], in which one considers the two unions gh� , gi� as two (possiblyinfinite) point sets, and computes the standard Hausdorff distance

A � gh�9�@gi��CB/DGF�H 8 � gj�9�@gi��@� 8 � gi�I�@gh�/�+JL�where 8 � gh�9�@gk��0�l;�=#>m &)n-4 ?�po �@gk��:�l;�=#>m &)n-4 �! 1"q &)nr( ?�po ��sL�@K

2

We will denote gh� (resp., gk� ) as � 4 (resp., � ( ), and use the notation8 � � �9�� to denote

8 � � 4 �� ( � .Analogous meanings hold for the notations

A � � �9�� andQ � � �9��SRTM�� .

A drawback of the directional Hausdorff distance (and thus of the Hausdorff distance) is its sensitivityto outliers in the given data. One possible approach to circumvent this problem is to use “partial matching”[14], but then one has to determine how many (and which) of the objects in � should be matched to � .Another possible approach is to use the root-mean-square (rms, for brevity) Hausdorff distance between �and � , defined by

8 � � �9�� 4 � � �� 4 � �� D �

A � � �9�� BEDGF�H 8 � � �9��@� 8 � � �S�.�/�+JL�with an appropriate definition of integration (usually, summation over a finite set or the Lebesgue integrationover infinite point sets). Define

Q � � �9��IRTM6� � �! #" V &_W A � � �^X Z �� . Finally, as in [23], we define thesummed Hausdorff distance to be 8�� 9�� 4 ?� �?�� 4 � �and similarly define

A �and

Q��. Informally,

8 � �9�� can be regarded as an �� -distance over the sets ofobjects � and � . The two new definitions replace � � by � � and � � , respectively.

1.2 Previous results

It is beyond the scope of this paper to discuss all the results on shape matching. We refer the reader to [7,17, 27] and references therein for a sample of known results. Here we summarize known results on shapematching using the Hausdorff distance measure.

Most of the early work on computing Hausdorff distance focused on finite point sets. Let � and � betwo families of � and � points, respectively, in � � . In the plane,

A � �9�� can be computed in � �%� � X� �� time using Voronoi diagrams [5]. In � � , it can be computed in time � �%� � X�� "!$# � , where%'&)( is an arbitrarily small constant, using the data structure of Agarwal and Matousek [2]. Huttenlocheret al. [22] showed that

Q � �9�� can be computed in time � � �� X*� � + � �� ,�� in � � , and in time� �%� �� '� � � X-� � � !$# � in �� , for any %.&/( . Chew et al. [14] presented an � �%� � X-� ��0p� � �212! � �� -timealgorithm to compute

Q � �9�� in �0� for any4365

. The minimum Hausdorff distance between � and �under rigid motion in �:� can be computed in � �%� ��X7� � 89�� time [21].

Faster approximation algorithms to computeQ � �9�� were first proposed by Goodrich et al. [16]. Aich-

holzer et al. proposed a framework of approximation algorithms using reference points [4]. In � � , theiralgorithm approximates the optimal Hausdorff distance within a constant factor, in � �%� � X:� ��;��<�� timeover all translations, and in � � ��=�� ;�� >$�� time over rigid motions. The reference point approachcan be extended to higher dimensions. However, it neither approximates the directional Hausdorff distanceover a set of transformations, nor can it cope with the partial-matching problem.

Indyk et al. [23] study the partial matching problem, i.e., given a query ? &/( , compute a rigid transform@ so that the number of pointso �E� for which

�� @ �po �@��BAC? is maximized. They present algorithms for % -approximating the maximum-size partial matching over the set of rigid motions in � � ��ED'F � ? % � � >G�H��IJ�� LKNM#�O �%�time in �0� , and in � � ��ED/�PF � ? � % �_� >$�H�;I�� KNM#2O �%� time in �� , where D is the maximum of the spreads ofthe two point sets.1 Their algorithm can be extended to approximate the minimum summed Hausdorffdistance over rigid motions. Similar results were independently achieved in [13] using a different technique.

1The spread of a set of points is the ratio of its diameter to the closest-pair distance.

3

Algorithms for computingA � � �9�� and

Q � � �9�� , where � and � are sets of segments in the plane,or sets of simplices in higher dimensions are presented in [3, 5, 6]. Atallah [12] presents an algorithm forcomputing

A � � �9�� for two convex polygons in � � . Agarwal et al. [3] provide an algorithm for computingQ � � �9�� , where � and � are two sets of � and � segments in �:� , respectively, in time � �%� �� %�9�� .If rigid motions are allowed, the minimum Hausdorff distance between two sets of points in the plane canbe computed in time � �%� �� %� (Chew et al. [15]). Aichholzer et al. [4] present algorithms forapproximating the minimum Hausdorff distance under different families of transformations for sets of pointsor of segments in � � , and for sets of triangles in � � , using reference points. Other than that, little is knownabout computing

Q � � �9�� orQ � �9�� where � and � are sets of simplices or other geometric shapes in

higher dimensions.

1.3 Our results

In this paper, we develop efficient algorithms for computingQ � �9��SRTM�� and

Q � � �9��IRTM6� for sets of balls,and for approximating

Q � � �9��@� Q$� � �9�� for sets of points in � � . Consequently, the paper consists ofthree parts, where the first two deal with the two variants of Hausdorff distances for balls, and the third partstudies the rms and summed Hausdorff-distance problems for point sets.

Let � �� ?r� denote the ball in � � of radius ? centered at�. Let �O�YH�� ,K,K,K_��J and � �YH�� ,K,K,K_�� K J

be two families of balls in �:� , where ��0�� '� ��.� and ��S�� *�� ?��G� , for each � and � . Let � be theset of all translation vectors Z�� so that no ball of � X\Z intersects any ball of � . We note, though, thatballs of the same family can intersect each other, as is typically the case, e.g., in modeling molecules ascollections of balls.

Section 2 considers the problem of computing the Hausdorff distance between two sets � and � of ballsunder the collision-free constraint, where the distance between two disjoint balls ��6�[� and ��c�P� isdefined as

�� ?� � �+* � �� ? � . We can regard this distance as an additively weighted Euclideandistance between the centers of �� and �� , and it is a common way of measuring distance between atomsin molecular biology [17]. In Section 2 we describe algorithms for computing

Q � �9��IR��]� in two and threedimensions. The running time is � � �� X � �� in � � , and � � � � � � � � X � �� in � � .The approach can be extended to solve the (collision-free) partial-matching problem under this variant ofHausdorff distance in the same asymptotic time complexity.

Section 3 considers the problem of computingQ � � �9�� and

Q � � �9��IR��f� , i.e., of computing the Haus-dorff distance between the union of � and the union of � , minimized over all translates of � in � � or in � .We first describe an � � �� X � ��;�� -time algorithm for computing

Q � � �9�� andQ � � �9��SR��f� in�� , which relies on several geometric properties of the medial axis of the union of disks. A straightforward

extension of our algorithm to � � is harder to analyze, and does not yield efficient bounds on its runningtime, mainly because little is known about the complexity of the medial axis of the union of balls in �� [8].We therefore consider approximation algorithms. In particular, given a parameter %*& ( , we compute atranslation Z , in time � �%�%� � X7� � F % � �� in � � and in time � �%�%� � � X � � � F % � �� in � � , suchthat

A � � ��X9Z ��A �� X % � Q � � �9�� . We also present a “pseudo-approximation” algorithm for computingQ � � �9��IR��f� : Given an % & ( , the algorithm computes a region � N �� that serves as an % -approximationof � (in a sense defined formally in Section 3). It then returns a placement Z]�� such that

A � � � X[Z ��SR ��BA �� X % � Q � � �9��SR ��@�in time � �%�%� � ��X)�j�3� F % �G�� in �� . This variant of approximation makes sense in applicationswhere the data is noisy and shallow penetrations between objects are allowed, as is the case in the dockingproblem [17].

4

Finally, we turn, in Section 4, to the two variants of rms and summed Hausdorff distances. Given two setsof points � and � in � � of size � and � , respectively, Section 4 describes an � �%� �� F % � ��;�� F % ��;�� F % �%� -time algorithm for computing an % -approximation of

Q � � �9�� . It also provides a data structure of size� �%� �� F % � �� F % �%� so that, for a query vector Z��d� � , an % -approximation ofA � � �<X�Z �� can be com-

puted in � � �� F % �%� time. In fact, we solve a more general problem, which is interesting in its own right.Given a family � � �,K,K,K3�� of point sets in �� , with a total of � points, we construct a decomposition of �f�into � �%� �'F % � �� F % �%� cells, which is an % -approximation of each of the Voronoi diagrams of � � �,K,K,K_�� ,in the sense defined in [10, 18]. We can preprocess this decomposition in � �%� �'F % �G�� F % �� 'F % �%�time into a data structure of size � �%� �'F % � �� F % �%� , so that for a query point s , an % -approximate nearestneighbor of s in every � can be computed in a total time of � � �;�� 'F % � X��_� . Moreover, given a semigroupoperation X , an % -approximation of ��

� � s �� .� can be computed in � � �� 'F % �%� time. We also extendthe approach to obtain an algorithm that % -approximates

Q � � �9�� in � �%� �� F % � � � >G�H��IJ�� k� � F % �%� time.This result relies on an efficient dynamic data structure for maintaining an % -coreset of � for

�-median. That

is, we show that we can maintain a weighted subset �ON � of size � �%�� F % � �� F % �%� so that the weightedsum of distances from any point �\�[�:� to the points of � is an % -approximation of the sum of distancesfrom � to the points of � . � can be updated efficiently as the points in � are inserted or deleted. Usingsimilar ideas, an algorithm for computing an % -coreset of � for � -medians has been proposed in [20].

Indyk et al. [23] outline a near-linear-time approximation algorithm for computingQ � � �9�� providing

only sketchy details. Furthermore, the running time of their algorithm depends on the spread of the pointset. We believe our technique have other applications and is of independent interest, and it has the advantageof supporting fast queries.

2 Collision-Free Hausdorff Distance between Sets of Balls

Let � � H�� ,K,K,K_�� J and � � H�� ,K,K,K_�� K J be two sets of balls in � � , � 5 �� . For two disjoint balls� � � � � � � ��E� and � � � � � * � � ? � �]�d� , we define?� ��T�� G�� '�+*��-� � � ��4?�� namely, the (minimum) distance between � and �� as point sets. Let � be the set of placements Z of � suchthat no ball in � X Z intersects any ball of � . In this section, we describe an exact algorithm for computingQ � �9��IR��]� , and show that it can be extended to partial matching.

2.1 Computing �� in � and �As is common in geometric optimization, we first present an algorithm for the decision problem, namely,given a parameter ! & ( , we wish to determine whether

Q � �9��IR��]� A"! . We then use the parametric-searching technique [3, 25] to compute

Q � �9��IR��f� . Given ! & ( , for� A �BA�� , let # 0N � � be the set of

vectors Z]�7� � such that

(V) (%$ B �! � & � & K ?� ��X[Z �� G��A'! .(In particular, � 1X[Z does not intersect the interior of any �� .)

Let �)( � � � � * � � � � � X ? � � and � ! � � � � * � �P� � � X ? � X*!G� . Then � ! �,+ � & K � ! � is theset of vectors that satisfy B �! � & � & K ?� ��0X Z@��G� A-! , and the interior of � ( � + � & K � ( � violates(%$ B �! � & � & K ?� ��1X Z@��-� . Hence, # j�.P� � � !'/ � ( � . Let

# � �9�� 0� & & �

# j�1.P� 232 0

� !541/ 276 � (8494 K5

��

(a) (b)

Figure 1: (a) Inner, middle and outer disks are � � � � , � ( � , and � ! � , respectively; (b) an example of # (dark region), which is the difference between � ! (the whole union) and � ( (inner light region).

See Figure 1 for an illustration. By definition, # � �9�� N � is the set of vectors ZI� � such that8 � � XZ ��BA'! . Similarly, we define # � �I�.�/�fN �[�YH3Zf� � a 8 � �S�.�YX[Z%�BA'! JLK

ThusQ � �9��SR��f��A'! if and only if # � �9�� # � �I�.�/�� bK

Lemma 2.1 The combinatorial complexity of # � �9�� in �� is � � � � � � .Proof: If an edge of � # � �9�� is not adjacent to any vertex, then it is the entire circle bounding a disk of� ! � or � ( � . There are � � �� such disks, so it suffices to bound the number of vertices in # � �9�� .

Let � be a vertex of # � �9�� ; � is either a vertex of #� , for some� A ��A/� , or an intersection point of

an edge in # and an edge in #�� , for some� A ��AC� . In the latter case,

�9� # #�� ! � !� � / � � ( g � (� �@KIn other words, a vertex of # � �9�� is a vertex of � ! � !� , � ! / � (� , � !� / � ( , or � ( g/� (� , for� A �� A � . Observe that a vertex of � ! � !� (resp., of � ! / � (� ) that lies on both �9� ! and �9� !�(resp., � �3(� ) is also a vertex of � ! g � !� (resp., � ! g �9(� ). Therefore, every vertex in # � �9�� is a vertexof � ! g � !� , � ! g � (� , � !� g � ( , or � ( g � (� , for some

� A �� A/� . Since each � ! �� ( is the unionof a set of � disks, each of � ( g � !� , � ! g � (� , � !� g � ( , � ( g � (� is the union of a set of

5 � disks andthus has � � � � vertices [24]. Hence, # � �9�� has � � � � � � vertices.

Lemma 2.2 The combinatorial complexity of # � �9�� in � � is � � � � � � � .Proof: The number of faces or edges of # � �9�� that do not contain any vertex is � � ��L�d�_� since they aredefined by at most two balls in a family of

5 �� balls. We therefore focus on the number of vertices in# � �9�� . As in the proof of Lemma 2.1, any vertex # � �9�� satisfies:

�e� # # � #��6� � � ! � !� � !� � / � � ( g � (� g � (� �@�for some

� A � A � A � A � . Again, such a vertex is also a vertex of � g�� g�� , where � is � ! or� ( , and similarly for � � , �� . Since the union of ? balls in � � has � � ? � � vertices, � g � �jg �� has � � � � �vertices, thereby implying that # � �9�� has � � � �L�j�3� vertices.

Similarly, we can prove that the complexity of # � �I�.�/� is � � � � �c� in � � and � � � � � � � in � � . Extend-ing the preceding arguments a little, we obtain the following.

Lemma 2.3 # � �9�� # � �S�.�/� has a combinatorial complexity of � � �� cX � �%� in �� , and � � �d�L�j� � �cX� �%� in � � .6

Remark. The above argument in fact bounds the complexity of the arrangement of � � H # � � # � �,K,K,K_� # � J .For example, in �:� , any intersection point of � # and � #�� lies on the boundary of � � # #��)� , and we haveargued that # #�� has � � � � vertices. Hence, the entire arrangement has � � � � � � vertices in � � .

We exploit a divide-and-conquer approach, combined with a plane-sweep, to compute # � �9�� , # � �I�.�/� ,and their intersections in � � . For example, to compute # � �9�� , we compute #��]�� K �� # and #�� f�� K K �"! � # recursively, and merge # � �9�� # � # � � by a plane-sweep method. The overall running

time is � �%� � X � ��=��<�� .To decide whether # � �9�� # � �S�.�/�:�� in �:� , it suffices to check whether

�� # � �9�� # � �I�.�/� � �is empty for all balls � �PH�� ( � �� ! � a � A �=A �� A � A �iJ . Using the fact that the various � ( � �� ! �meet any � � in a collection of spherical caps, we can compute

��in time � � �� X � ��;��,�� , by the

same divide-and-conquer approach as computing # � �9�� # � �I�.�/� in �]� . Therefore we can determinein � � � � � � � � X*� �� time whether

Q � �9��IR��]� A ! in � � .Finally, the optimization problem can be solved by the parametric search technique [3]. In order to

apply the parametric search technique, we need a parallel version of the above procedure. However, thisdivide-and-conquer paradigm uses plane-sweep during the conquer stage, which is not easy to parallelize.Instead, we use the algorithm of [3] to compute the union/intersection of two planar or spherical regions. Ityields an overall parallel algorithm for determining whether # � �9�� # � �I�.�/� is empty in � � �� time using � � �� X � �� processors in �0� , and � � � �L�j� � � X*� �� processors in �:� . Thestandard technique of parametric searching then implies the following result.

Theorem 2.4 Given two sets � and � of � and � disks (or balls), we can computeQ � �9��IR��f� in time� � �� X � �� in �� , and in time � � � ��j� � � X*� �� in �0� .

2.2 Partial matching

Extending the definition of partial matching in [23], we define the partial collision-free Hausdorff distanceproblem as follows.

Given an integer � , let8� � �9�� denote the � V� largest value in the set H ?� �?��Pa�� O�eJ ; note

that8 � �9�� 8

�� 9�� . We define

8� � �S�.�/� in a fully symmetric manner, and then define

A � � �9�� ,Q� � �9��IRTM6� as above. The preceding algorithm can be extended to compute

Q� � �9��IR��]� in the same asymp-

totic time complexity. We briefly illustrate the two-dimensional case. Let �^� H # � � # � �,K,K,K_� # � J be asdefined above, and let � � �:� be the arrangement of � . For each cell D � � � �:� , let � � DE� be the number of# ’s that fully contain D . Note that for any point Z in a cell D with � � DE� & � � �)�1� , 8 � � �\X Z �� A'! , andvice versa. Hence, we compute � � �:� and � � DE� for each cell D �� :� , and then discard all the cells D forwhich � � De� A � � � �1� . The remaining cells form the set M � ��H3Zà 8 � � � X\Z ��A*!LJ . By the Remarkfollowing Lemma 2.2, � has � � �c�� vertices, and it can be computed in � � � ��=�;��<�� time. Therefore,M � can be computed in � � � �L�=�� time. Similarly, we can compute M � �^H3Z�a 8 � � �I�.� X Z�� A !LJ in� � �� time, and we can determine in � � �� ^X � ��,�� time whether M �

M � �� . Similararguments can solve the partial matching problem in �f� , by computing the sets M � �TM � , and by checking fortheir intersection along the boundary of each of the balls � ! � , � ( � . Putting everything together, we obtainthe following.

Theorem 2.5 Let � and � be two families of � and � balls, respectively, and let � 3 ( be an integer, wecan compute

Q� � �9��IR��f� in � � �� X � �� time in � � , and in � � � � � � � � X4� �� time in�� .

7

3 Hausdorff Distance between Unions of Balls

In Section 3.1 we describe an algorithm for computingQ � � �9�� in �� . The same approach can be ex-

tended to computeQ � � �9��IR��f� within the same asymptotic time complexity. In Section 3.2, we present

approximation algorithms for the same problem in �]� and �k� .3.1 The exact 2D algorithm

Let � � H�� ,K,K,K3�� `J and � � H�� ,K,K,K_�� K J be two sets of disks in the plane. Write, as above, � �� '� ��.� , for �`� � �,K,K,K-� � , and ��7� � � *��L� ?��-� , for �� ,K,K,K-� � . Let � 4 (resp., � ( ) be the union ofthe disks in � (resp., � ). As in Section 2, we focus on the decision problem for a given distance parameter! &/( .

For any pointo

, we have

?�po �� ( �l� B �! q & �� po ��sL�� B �U � & � & K��po ��

� B �! � & � & K B/DGF�H?�po �+* �)� � ?��L� ( JLK

This value is greater than ! if and only if

B �! � & � & K� ?�po �+* � � � � ? � X !)� � &/( K

In other words,8 � � � XPZ �� & ! if and only if there exists a point

o � � 4 such thato X Z:F� � ( � !)��+ K

� � � �� !G� , where �� !G�:� � � *��L� ?��:X !)� is the disk �� expanded by ! .

Let M � �YH3Z a 8 � � � X[Z �� A'!LJLRM � is the set of all translations Z such that � 4 XcZ]N � (f� !)� . Our decision procedure computes the set M � andthe analogously defined set M � �YH3Z a 8 � � �I�.� X[Z��A'!LJL�and then tests whether M �

M � �� . To understand the structure of M � , we first study the case in which� consists of just one disk � , with center � and radius � . For simplicity of notation, we denote � ( � !)�temporarily by � . Let � denote the set of vertices of � � , and

�the set of (relatively open) edges of � � ; we

have a �EaJA a � a�A � � � � 5[24].

Consider the Voronoi diagram � �� Cg � � of the boundary features of � , clipped to within � . This isa decomposition of � into cells, so that, for each �d� �Cg � , the cell # � �r� of � is the set of points �[� �such that

�� r�BA �� , for all � � � ��g � . The diagram is closely related to the medial axis of �9� . SeeFigure 2(a). For each � � � , let � �h� denote the circular sector spanned by � within the disk � � � !)� whoseboundary is � , and let � � � � / +� & � � �j� . The diagram has the following structure. (A slight variant ofthe following lemma was observed in [8].)

Lemma 3.1 (a) For each �� , we have # � �h�� j� .(b) For each � � � , we have # � �L�0� � � # � � �L� , where #�� r� is the Voronoi cell of � in the Voronoi diagram� �� S� of � . Moreover, # � �r� is a convex polygon.

Lemma 3.1 implies that � �� g � � yields a convex decomposition of � of linear size. The medial axisof � � consists of all the edges of those cells # � � �L� , for �/�)� , that are also edges of � �� g � � (the dashedcells of Figure 2(a)).

8

�� (a) (b) (c)

Figure 2: (a) The medial axis (dotted segments) of the union of four disks centered at the solid points:The Voronoi diagram of the boundary decomposes the union into � cells; (b) shrinking by � the Voronoicell # � �r� of each boundary element � of the union; (c) The boundary of the lighter-colored disk contains aconvex arc, and the boundary of the darker-colored disk contains a concave arc.

Returning to the study of the structure of M � , we have, by definition, �IX�Z]N � if and only if�� ?X`Z@��r� 3

� , where � is the feature of �dg � whose cell contains �iXeZ . This implies that the set M �� of all translationsZ of � for which � X Z�N � is given by

M �� 6�0�

�� 6� &��jn � #�� r�� ?�where # � � �r��YH ��)# � �r� a ?� � ��L� 3 �1JLKFor �<� � , # � � �h� is the sector obtained from � �h� by shrinking it by distance � towards its center. For� � � , #�� r�� # � �r� / � � �b� �#� . See Figure 2(b) for an illustration.

Now return to the original case in which � consists of � disks; we obtain

M � ��0 �M �� .��

�0 �

6� &��jn � � #�� r� �� .�?KNote that each M �

� � � is bounded by � � � � circular arcs, some of which are convex (those bounding shrunksectors), and some are concave (those bounding shrunk Voronoi cells of vertices). Convex arcs are boundedby disks � � * � � � � ? � X�! �� , for some

� A �:AC� , while concave arcs are bounded by disks � � � � � � ��.�for � � � . Furthermore, since M �

� �� is obtained by removing all points �O� � such that the nearestdistance from � to � � is smaller than � , we have that: (i) � � * ��\� T� ? ��X!�� 9N M �

� �� ; and (ii)� � � �� M �

� � � / � � M � � � �%�� . See Figure 2 (c) for an illustration.

Lemma 3.2 For any pair of disks � �� e� , the complexity of M �� M �

� � � � is � � � � .Proof: Clearly, M �

� �� M �� G� is bounded by circular arcs, whose endpoints are either vertices ofM �

� � � or M �� G� , or intersection points between an arc of �1M �

� ��.� and an arc of �1M �� -� . It suffices to

estimate the number of vertices of the latter kind.Consider the set � � � of the

5 �9X 5 a �Ea disks

H�� * � �� ? � X ! � � �@�� * �� ? �]X ! � � � �+J � & � & K6H�� @�� +J � &�� K

We claim that any intersection point between two arcs, one from �1M �� and one from �1M �

� � �G� , lies on� � gi� � � � . Indeed, assume that � is such an intersection point that does not lie on � � gk� � � � . Then it has to lie

9

in the interior of gi� � � . That is, there is a disk � �� that contains � . There are two possibilities for thechoice of � .

(i) � � � � * � �\� '� ? �6X ! � �� (resp., �l� � � * � �\� � � ? ��X! � � �_� ), for some� A � A � . The

boundary of such a disk contains some convex arc on �#M �� (resp., �1M �

� � �-� ), and � NOM �� .�

(resp., � N M �� -� ). As such, � cannot appear on the boundary of �#M �

� � � (resp., �1M �� -� ), contrary

to assumption.

(ii) � � � � � � � T� �� (resp., � � � � � � � �)� � �G� ), for some �7� � . Recall that � is the set of verticeson the boundary of � � . Therefore, by definition, � rX�� (resp., � � X � ) contains � in its interior, so itcannot be fully contained in � , implying that � F� M �

� � .� (resp., � F� M �� G� ), again a contradiction.

These contradictions imply the claim. It then follows, using the bound of [24], that the number of intersec-tions under consideration is at most

�� 5 � X 5 a �Ea � � � 5 � � � � � .Each vertex of M � is also a vertex of some M �

� � .� M �� G� . Applying the preceding lemma to all the� � �d�3� pairs ��'�� , we obtain the following.

Lemma 3.3 The complexity of M � is � � � � � � , and it can be computed in � � �c��=��<�� time.

Similarly, the set M � has complexity � � �� and can be computed in time � � �� <�� . Finally,we can determine whether M �

M � �� , by plane sweep, in time � � �� X � ��,�� . Using parametricsearch, as in [3],

Q � � �9�� can be computed in � � �� X*� ��;�� time.To compute

Q � � �9��IR��f� , we follow the same approach as computingQ � �9��IR��f� in the preceding sec-

tion. Specifically, we need to modify the definition of M � and of M � , to require also that no disk of � X Zintersect any disk of � . This amounts, in the case of M � , to redefine each M �

� �6� to consist of all Z]�7�0� suchthat � X[Z]N � and

� �\X[Z�� ( �� . The latter is equivalent to requiring that Z=F� � (f� �#� �� . Hence

M � ��0 �M �� .��

�0 �

6� &��jn � � #�� r� �� / �6 �� (f� ��.� �� .�@K

It is now easy to modify the arguments in the proof of Lemma 3.2, to conclude that the complexity of M �(and, symmetrically of M � ) remains asymptotically the same, which then implies the following result.

Theorem 3.4 Given two families � and � of � and � disks in � � , we can compute bothQ � � �9�� andQ � � �9��IR��f� in time � � �� X7� �� .

3.2 Approximation algorithms

No good bounds are known for the complexity of the Voronoi diagram of the boundary of the union of �balls in � � , or, more precisely, for the complexity of the portion of the diagram inside the union [8]. Thebest known bound is � � � � � . Hence, a naıve extension of the preceding exact algorithm to � � yields analgorithm whose running time is hard to calibrate, and only rather weak upper bounds can be derived. Wetherefore resort to approximation algorithms.

ApproximatingQ � � �9�� in � � and � � . Given a parameter % & ( , we wish to compute a translation Z

of � such thatA � � �YX Z@�� A �� X % � Q � � �9�� , i.e.,

A � � �YX Z@�� is an % -approximation ofQ � � �9�� .

Our approximation algorithm forQ � � �9�� follows the same approach as the one used in [4, 5]. That is, let? � �/� (resp., ? � �� ) denote the point with smallest coordinates, called the reference point, of the axis-parallel

bounding box of � 4 (resp., � ( ). Set @ � ? � �� ? � �/� . It is shown in [5] that in � � ,Q � � �9��BA A � � � X @ ��BA �� X � � Q � � �9��@�10

and that the optimal translation lies in a disk of radiusA � � �OX @ �� centered at @ . Computing @ takes� � � X � � time. We compute

A � � � X @ �� using the parametric search technique [3], which is based onthe following simple implementation of the decision procedure:

Fix a parameter ! & ( , and put � 4 � !)�� + � � ��'� ��X !)� and � ( � !G�� + � � � *�� ?��SX !G� . Weobserve that

A � � �OX Z �� A ! if and only if � 4 X ZeN � (]� !)� and � ( N � 4�� !)��X Z . To test whether� 4 XcZ]N � (f� !)� , we compute� � 4 XcZ�� g'� (f� !)� , the union of the balls in � XcZ and of the ! -expanded balls

in � , and check whether any ball of � appears on its boundary. If not, then � 4 X Z]N � (f� !)� . Similarly, wetest whether � ( N � 4�� !G� X Z . The total time spent is proportional to the time needed to compute the unionof � XC� balls, which is � �%� � X/� ��;�� X�� %� in �0� , and � �%� � X/� �T�3� in �� , as it can be reduced tocomputing a convex hull of � X7� points in �:� and � � , respectively [28]. Plugging this into the parametricsearching technique, the time spent in computing

A � � �PX Z �� in �k� and �� is � �%� � X � �� %� and� �%� �d�0X �j�,��;�� %� , respectively.In order to compute an % -approximation of

Q � � �9�� from this constant-factor approximation, we usethe standard trick [4] of placing a grid of cell size #� !��

� A � � �CX @ �� in the disk of radiusA � � �CX�Z ��

centered at @ , and returning the smallestA � � �<X�Z �� , where Z ranges over the grid points. We thus obtain

the following result.

Theorem 3.5 Given two sets of balls, � and � , of size � and � , respectively, and % &�( , an % -approximationofQ � � �9�� can be computed in � �%�%� � X � � F % � �� time in � � , and in � �%�%� � � X � � � F % � ��

time in �k� .Pseudo-approximation for

Q � � �9��IR��f� . Currently, we do not have an efficient algorithm to % -approximateQ � � �9��IR��f� in �� . Instead, we present a “pseudo-approximation” algorithm, in the following sense.The set � � � (�� 4 � , where

�denotes the Minkowski sum, is the set of all placements of � at

which � 4 intersects � ( ; we have �Y� + �� *�� e��'� ��_X�?��_� , and �[� ��%� �� / �`� . For a parameter % 3 ( ,let

� � % �:� 6� �� * � �� % � � ��?X ?��_�%�@�

and � � % �k� ��%� �0� / � � % �%� . We call a region � N �:� % -free if � N � N � � % � .This notion of approximating � is motivated by some applications in which the data is noisy, and/or

shallow penetration is allowed. For example, each atom in a protein is best modeled as a “fuzzy” ball ratherthan a hard ball [17]. We can model this fuzziness by allowing any atom � � *-� ?r� to be intersected by otheratoms, but only within the shell � � *-� ?r� / � � *-� �� % ��?L� for some % &/( . In this way, the atoms of two dockingmolecules may penetrate a little at the desired placement. Although � can have large complexity, namely,up to � � � � � � � in � � , we present a technique for constructing an % -free region � of considerably smallercomplexity. We thus compute � and a placement Z > �� such that

A � � � X�Z > �� A �� X % � Q � � �9��IR �� .We refer to such an approximation

A � � � X[Z > �� as a pseudo- % -approximation forQ � � �9��IR��f� .

Lemma 3.6 In �� , an % -free region � of size � � �� F % �_� can be computed in time � �%� �� F % �_�� F % �%� .Proof: Let � � H�� 6� � � *�� bX ? �_� a � A ��AC�� A �'AC�iJLK We insert each ball � �� into anoct-tree M . Let �� denote the cube associated with a node � of M . In order to insert � � , we visit M in a top-down manner. Suppose we are at a node � . If � N � � , we mark � as black and stop. If � � � �� andthe size of �� is at least % � � _X ?��_� F 5 , then we recursively visit the children of � . Otherwise, we stop, leaving� unmarked. After we insert all balls from � , if all eight children of a node � are marked black, we mark� as black too. Let # � H � � � � � �,K,K,K-� � �rJ be the set of highest marked nodes, i.e., each � is marked black

11

but none of its ancestors is black. It is easy to verify that each � � marks at most � �� F % � � nodes as black,because the nodes a fixed � � marks are disjoint and of size at least % � � �X/?��-� F 5 ; thus a #ea�� F % �_� .The whole construction takes � �%� �� F % � �� F % �%� time, and obviously � � % �dN + � &�� N � K Set� �1.P� � �� / + � &�� G� ; it is an % -free region, as claimed.

Furthermore, let ? � ��@� ? � �/� , and @ �6? � �� ? � �/� be as defined earlier in this section. We prove thefollowing result.

Lemma 3.7 Let Z > �� be the closest point of @ in � . Then

A � � � X[Z > �� A �� X 5 � �r� Q � � �9��SR ��@KProof: Let

�!�� Q � � �9��IR �� and�Z]�� the placement so that

A � � � X �Z@�� ! . Then

� �Z � @ � � � �Z � ? � ��jX*? � �/� � � ?� ? � �/�jX �Z � ? � ��%�@KA result in [4] implies that

�� ? � �/�jX �Z@� ? � ��%� A � � �! . On the other hand,

A�� X Z > �� A �!fX � �Z ��Z > � A �!fX � �Z � @ � X � @ ��Z > �A �!fX 5 � @ � �Z � A �!fX 5 � � �!�� X 5 � �L� Q � � �9��IR ��@KThe point Z > � � closest to @ can be computed as follows. Recall that in Lemma 3.6, � � .P� � � � /+ � &�� G� . Set �� .P� � � � / ��f� + � &�� LR�� consists of a set of openly disjoint cubes. We first check

whether @ � �� by a point-location operation. If the answer is no, then @ � � , and we return Z > � @ .Otherwise, Z > is a point on � � � � �� that is closest to @ . In that case, Z > is either a vertex of a cube in # ,or lies in the interior of an edge or of a face of a cube in # . For each node �� # and for each boundaryfeature �� , that is, a face, an edge, or a vertex of � , we compute the point in � closest to @ . Let � �be the resulting set of closest points. We then check, for each s �� , whether s � �� , by testing whetherat least one neighboring cube is unmarked; there are � �� neighboring cubes of s . This can be achieved byperforming point-location operations in M . Finally, from among those points of � � that lie on �� (thus on� � ), we return the one that is closest to @ . There are � � �� F % �_� cubes, and each has constant number ofboundary features. Furthermore, at most a constant number of nodes in # contain a given point, and eachpoint-location operation takes � � �� F % �%� time. Hence, Z > can be computed in � �%� �� F % � �� F % �%�time.

We can computeA � � � X Z > �� in � �%� � � X � � �� time, as described in Section 3.2, so we

can approximateQ � � �9��SR �� , up to a constant factor, in � �%� � �fX �d�_�� time. We then draw an

appropriate grid around Z > and use it to compute an % -approximation ofQ � � �9��IR �� , as in Section 3.2, with

the difference that we only test those grid points that lie in � . We thus obtain the following result.

Theorem 3.8 Given � , � in �:� and % &/( , we can compute in � �%�%� � � X �d�3� F % �G�� time, an % -freeregion � N � � and a placement Z]�� of � , such that

A � � � X[Z �� A �� X % � Q � � �9��IR ��@K

12

4 RMS and Summed Hausdorff Distance between Points

We first establish a result on simultaneous approximation of the Voronoi diagrams of several point sets,which we believe to be of independent interest, and then we apply this result to approximate

Q � � �9�� andQ�� 9�� for point sets �^� H_� � �,K,K,K-�� J and � �YH-* � �,K,K,K3�+* K J in any dimension.

4.1 Simultaneous approximation of Voronoi diagrams

Given a family H � � �,K,K,K3��5J of point sets in � � , with a total of � points, and a parameter %'& ( , we wishto construct a subdivision of �:� , so that, for any �P� �0� , we can quickly compute points

o �� , for all� A � A � , with the property that?� � � o � A �� X % � ?� � �� , where

?� � �� ]�YB �! q &�� ?� � ��sL� . Arya andMalamatos [10] proposed a data structure that can answer an % -approximate nearest-neighbor query in time� � �;�� F % �%� using � �%� � F % � �� F % �%� space and � �%� � F % � �� F % �� F % �%� preprocessing. Construct-ing this data structure for each � separately, one can answer the above query in time � � �9�� F % �%� using� �%� �'F % � �� F % �%� space and � �%� �'F % � ��;�� 'F % ��;�� F % �%� preprocessing. We adapt this data structureso that the query time can be improved to � � �� 'F % �LX��3� at the cost of increasing the space by a �� 'F % �factor. This data structure also constructs a subdivision of �f� of size � �%� �'F % � �� F % �%� , which is an% -approximate Voronoi diagram of each � . Besides being interesting in its own right, the modified datastructure will be used in the subsequent subsections.

We begin by describing how we adapt the data structure by Arya and Malamatos [10]. Let�

be a setof � points in � � , let %4& ( be a parameter, and let ��

be a hypercube so that any point of�

is atleast

�1� D B � � � F % away from �� . Any point of�

is an % -approximate nearest neighbor for a point outside� . A quad-tree box of � is a hypercube that can be obtained by recursively dividing each side of � intotwo equal parts. A useful property of quad-tree boxes (of � ) is that any two of them are either disjointor one of them is contained in the other. Arya and Malamatos choose a set � � � � of � �%� � F % � �� F % �%�quad-tree boxes of � that cover � ; each box D � � � � � is associated with a point

o M � � . � � � � hasthe following crucial property: For a point � �� , let D �\� � � � be the smallest box containing � ; then?� � � o M � A �� X % � ?� � � � � . In order to find the smallest box containing � , they store � � � � in a BBD-tree,proposed by Arya et al. [11]. Instead, we store them in a compressed quad tree (see e.g. [19]), as follows.We first construct a quad tree � on � � � � . Let

A � be the hypercube associated with the node � of � ; the rootis associated with � . A box D � � � � � is stored at a node �9�� if

A �� D . If an interior node �9�� doesnot store a box of � and if the degree of both � and

o � �#� , the parent of � , is one, we compress � , i.e., wedelete � and the child of � becomes the child of

o � �#� . We repeat this step until there is no such node in � .The size of � is � � a � � � �,a � . For a node � �� , if a box in � � � � contains

A � , then we set � � � � � o M whereD is the smallest such box; if there is no such box, then � � � � is undefined. The compressed quadtree � canbe constructed in time � � a � � � �,a2��à � � � �,a �� %� � F % �)�� F % �� F % �%� [19].

We call a node � � � exposed if its degree is at most one. We associate a region � � with each exposednode � . If � is a leaf, then � � � A � , and if � has one child � , then � � � A � / A�� . For a point �[�� ,� is the lowest node in � such that �[� A � . The regions � � form a partition of � , and this partition is an% -approximate Voronoi diagram of

�. It can be checked that � � �#� is defined for every exposed node � . By

construction, ?� � � � �BA �� %� A �� X % � ?� � � � � � �c�� LKThe depth of � is linear in the worst case. In order to quickly find the node � for which � � contains

a query point � , we construct another tree � , of depth � � �� F % �%� , on the nodes of � , as follows. Weidentify in � � a��:a � time a node � in � so that the removal of � decomposes � into at most �6A 5 ��X �connected components, each of size at most a��:a F 5 . Let �� r� be the subtree that contains the ancestors of� , and let � �

� �r�@� � � � �� ( � � �L� be the subtrees of � rooted at the children of � . We recursively construct

13

a tree � � �L� on each � � �L� , for ( A � $ � , and attach it as a subtree of � . � can be constructed in time� � a��:a2�;��a��:a �� %� � F % � �� F % �� F % �%� . Given a point � �� , we visit a path in � , starting fromits root. Suppose, we are at a node � . If � is exposed and � � � � , we return � � �r� and stop. If � �� A � ,we recursively visit the subtree � � � �L� ; otherwise, we visit the child � so that �C� A�� . The query time isproportional to the depth of � , which is � � �� F % �%� .

Returning to the problem of computing % -approximate nearest neighbors of � � �,K,K,K_�� , let be a hy-percube containing � � + � � � so that any point of � is at least

�1� D B � �S� F % away from � . For each� A�=A*� , we first construct the family � � � .� of quad-tree boxes of , using the algorithm by Arya andMalamatos [10], and then construct the trees � and � on + � � � .� , as described above. For a node �c� � ,let � � � �0� o M � , where D is the smallest box of � � � � that contains

A � ; if there is no such box, then � � �#�is undefined. By definition, for any point �c� � � and for

� A �,A � ,�� A ?� � �� #�%�BA �� X % � �� @KThe regions � � form a subdivision of , which is an % -approximate Voronoi diagram of each � .

For a node � , let � � � � be a list of � items whose � th item is � � � � if it is defined and NULL otherwise.We store in an implicit manner the set of � � �#� for all nodes �d� � , using a persistent data structure [26] ofsize � �%� �'F % � �� F % �%� so that � � �#� for a node � can be reported in � � �3� time.

Note that � � �#� �� po � �#�%� if a box D � � � � � is stored at � , i.e.,A � �[� � � � , and � � �#�� o M in

this case. Let � � � �� H �à A �e�[� � � .�+J . Then � � � � can be constructed from � �po � � �%� by updating � � �#�for �]�� #� . We therefore perform an in-order traversal of � and maintain the lists � � � � in a linked list � ,using persistence, so that � � �h� for the nodes � that have been visited so far can be retrieved in � � �_� time. Ifwe are currently visiting a node � , then the “current version” of � contains � � �6� . Moreover, we maintainan array that stores pointers to each item in the current version of � so that the � th item, for any ��A � , canbe accessed in � �� time. When we arrive at a node � � � for the first time, for each � �� , we do thefollowing. Let D � � � � � be a box stored at � . Then we set � � � �� o M and update the � th item of � . Westore at � a pointer � �#� to the root of the current version of � . We also store the values of � �po � � �%� , for�]�� , in a stack at � . After we have processed the subtree rooted at � , the current version of � contains� � �#� . We delete the values stored in the stack at � and restore them in � so that its current version contains� �po � �#�%� . Hence, we perform

5 a�� ,a updates on � while processing � . Sarnak and Tarjan [26] showed thatthe amortized time and space of performing an update in a persistent linked list is � �� , therefore the totaltime spent in this step is � � & � � � a�� #�,a �:� � �%� � F % �)�� F % �%� .

Let s be a query point. Following the procedure described above, we first find in � � �� 'F % �%� time thenode � � � such that s � � � . Next, following the pointer � �r� , we access the version of � that stores � � �r�and report � � �r� for all

� A �,A � in � � �_� time. Hence, we conclude the following.

Theorem 4.1 Given a family H � � �,K,K,K_��5J of point sets in �0� , with a total of � points, and a parameter%:&)( , we can compute in � �%� �'F % � �� 'F % �� F % �%� time a subdivision of � � and a data structure ofsize � �%� �'F % �G��;�� F % �%� so that, for any point se� �:� , one can % -approximate

�� s �� .� , for all� A � A � ,

in � � �� 'F % � X �_� time.

4.2 Approximating � � �� For

� A � A/� , let � � � �� H-*�� :a � A �:A��iJ , and for � $ ��A�� X*� , let � � *� ( � � �^�H-*� ( � � � ��a � A �7A6��J . We construct the compressed quad-tree � for � � �,K,K,K_�� ! K , with the givenparameter % ; a��:ar� � �%� �� F % �-�� F % �%� . Define

� � Z%�k� � � Z �� .�� B �! � & � & K � � Z �+*�� @� � A ��A ��

B �! � & � & � � � Z �+*� ( � �� -�@� � $ ��AC� X �kK

14

Let �4 � Z%� � 8 � � � � X[Z ��

� � � � � X[Z ��

��

� � Z%��(f� Z%� � 8 � � � �I�.� X[Z��k� �� K�

� � � � � X[Z@�+* �k�

�� ! K� � ! �

� � Z��@KFor each exposed node �9�� , � � �#� is defined for all

� A �,AC� X*� . Let�� 4 � � � Z%�k� ��

� � Z �� #�%� and�� ( � � � Z�� K�

� � � Z �� ! � � �%�@K

By construction, for any Z�� ,�4�� Z�� A �� 4 � � � Z��

� � � Z �� #�%�BA ��

�� X % � � � � � Z@�� A �� X % � �

�4`� Z%�@�

implying that � �� 4 � � � Z�� A �� X % � 8 � � � X[Z ��@KSimilarly,

� �� ( � � � Z%� A �� X % � 8 � � �I�.� XPZ%�@K Hence, it suffices to store�� 4 � � � Z%�@� �� ( � � � Z�� at each exposed

node �c� � . Since they are quadratic functions in Z ��]� , they can be stored using � �� space (where theconstant depends on

) and updated in � �� time for each change in � � � � .

If we compute�� 4 � � for each exposed node �9�� independently, the total time spent is � �%� � � � F % �-�� F % �%� .

We therefore proceed as in the previous section. We perform an in-order traversal of � . For each node �e�� ,let � > � �#�0� H �fa � � �#� is defined J , and let�� 4 � � � Z%��

&�� & � � � Z@�� %� and�� ( � � � Z��

&� �� Z �� %�@KIf � is an exposed node then � > � � �]�� K K ��X �� , therefore the above definition of

�� 4 � � � �� ( � � is consistentwith that for exposed nodes defined earlier. Following the same idea as in the previous subsection,�� 4 � � � Z%� � �� m � �� Z��jX ��

&� � � �� & � � � Z �� %� � � � Z@�� po � �#�%�%�� ( � � � Z%� � �� m � �� Z��hX �� &� �� Z �� #�%� � � � Z@�� po � � �%�%�� K

Hence,�� 4 � � � �� ( � � can be computed in � � a�� > � �#�,a � time and the total time spent in the traversal is � �%� � F % � �� F % �%� .

Finally, for each exposed node �e�� , we compute

Z � � D �"� B �! V & �� B/DGF�� Z��@� � �� Z��and return B �! � A � � �YX Z �L�� A �� X % � Q � � �9��

15

where the minimum is taken over all exposed nodes � of � .If we wish to compute an % -approximate value of

A � � � XPZ �� , for a given Z�� , we construct thetree � as described in the previous section. Instead of storing the lists � � � � at each node, we now store�� 4 � � � �� ( � � at each node � � � . The total storage needed is � �%� �� F % � �� F % �%� . Hence, we obtain thefollowing.

Theorem 4.2 Given two sets � and � of � and � points in � � and a parameter %.&/( , we can:

i. compute a vector Z > � � � in � �%� �� F % � ��;�� F % ��;�� F % �%� time, so that

A � � � X[Z >-�� A �� X % � Q � � �9��@Rii. construct a data structure of size � �%� �� F % � �� F % �%� , in time � �%� �� F % � �� F % �� F % �%� ,

so that for any query vector Z/� �� , we can compute an % -approximate value ofA � � �^X Z �� in� � �� F % �%� time.

4.3 Approximating � � � � �� Modifying the above scheme, we approximate

QE� � �9�� as follows. Let � and � be the same as definedabove. We define �

4�� Z%�l��

?� Z �� k� 8 � � � X[Z@��@��( � Z%�l�

�� K��

?� Z �� ! �)�� 8 � � �S�.�YX[Z%�@KFor each exposed node �9�� , let�� 4 � � � Z��l�

��

�� Z@�� %��A �� X % � 8 � � � X[Z@�� ( � � � Z��l�� K��

�� Z@�� ! � � �#�%� A �� X % � 8 � � �S�.�YX[Z%�Z � � D �"�6B �! V & � � B/DGF � �� 4 � � � Z%�@� �� ( � � � Z��SK

Since�� 4 � � and

�� ( � � are not simple algebraic functions, we do not know how to compute, store, and updatethem efficiently. Nevertheless, we can compute an % -approximation for

�� 4 � � (resp.,�� ( � � ) that is easier to

handle. More precisely, for a given set � of points in � � , define the�-median function

B�� Z�� a �9a �m &�� po �%Z%�@K

For any �Y� � , let � 4 � � �d�lH�� #��a � A �� c� � > � � �+J , � ( � � � � H�� #��a � & �� > � � �+J ,�� 4 � � � Z%�k�CB�� Z%� , and�� ( � � � Z��k�CB� �� Z%� . In Section 4.4, we describe a dynamic data structure

that, given a point set � of size � , maintains an % -approximation of the function B� � � � � � as a functionB� � � � � � defined by set of � �%�� F % � �� F % �%� weighted points. A point can be inserted into or deleted from� in � �%�� F % �G�� %� �� F % �� ! � � � time. Furthermore, given two point sets � and � in �� , this can beused to maintain an % -approximation of D �"�:B �! V B/DGF�H_B� � � � Z��@��B� � � � Z��+J within the same time bound.

16

Using this data structure, we can traverse all cells of � , as in Section 4.2, and compute an� % F �L� -

approximation of�� 4 � � and

�� ( � � (thus an % -approximation of

�4

and

�(

) for each node � � � . However,we now spend � � a��E> � � �,a � �� F % � � >$�H�;I�� k� � F % �%�time to compute an

� % F �L� -approximation of�� 4 � � from that of

�� 4 � m � �� . Putting everything together, weconclude the following.

Theorem 4.3 Given two sets � and � of � and � points in �� and a parameter (1$ % A �, we can

compute:

i. a vector Z > �7� � , in � �%� �� F % � � � >G�H��I��;�� k� � F % �%� time, so that

A � � � X[Z >-�� A �� X % � Q�� 9��@Rii. a data structure of � �%� �� F % � � � >G�H��IJ�� k� � F % �%� size in time � �%� �� F % � � � >G�H��I��;�� k� � F % �%� , so

that for any query vector Z]�7�0� , we can % -approximateA � � �\X�Z@�� in time � � >G�H��IJ�� k� � F % �%� .

4.4 Maintaining the 1-median function

Let � be a set of � points in �:� . In this subsection, we describe an algorithm for maintaining an % -approximation of the

�-median of a point set � as points are inserted into or deleted from � . Using the

ideas in [1, 20], we construct a coreset, a weighted subset of � , whose�-median approximates that of � ,

and argue that it can be updated efficiently as the set � changes.Let

� �f� �� be a weighted point set in �:� with the weight function � �Y� � ! . For a subset �YN'� , let� � �� m & � � �po � . If the weight function is not important or obvious from the context, we will use � to

denote� �f� �� . For a point � �7�� , we define � �%� �f� ��@� � �k� � m &�� po � � o � � as the price of the

�-median

placed at � . Furthermore, let �� %� �f� �6�%��CB �U �� &�� %� �f� �6�@� � � denote the price of the optimal�-median

for� � � �6� .

Definition 4.4 (Coreset) Let� �f� �� be weighted point set in � � . A weighted set

�� k� with N*� is an% -coreset of

� �f� �� for�-median if

�� % �� f� ��@� �j�BA�� %�� i�@� �j� A �� X % �� %� �f� ��@� � � � �c�7� � K (1)

The following lemma describes an algorithm for computing a small % -coreset for�-median.

Lemma 4.5 Let� �f� �6� be a weighted set of � points in � � , and let ( $ % A � F 5 be a parameter. An % -

coreset�� k� for

� �f� �6� of size � �%�� F % �G�� F % �%� for�-median can be computed in time � � �=�� F % ��X�� F % � �� F % �%� .

Proof: Let � �[� � be the point realizing �� S� . Set ?7�� S� F � � �S� , � �� F % �-�� , where�

�3�� ( is a constant, and �I� 5�� ? � �

� ? F % � . Let � � � � �#� be the axis-parallel hypercube of side length

� centered at � . Let � � � � � �j� ?L� , and let � `� �"!��h� 5 ?$# / �"!%�h� 5 ( � ?�# for� A � A&� . Next, we

partition each � into � �� F % � � hypercubes of side length % 5 ? F �� by drawing a uniform grid, where� � is

a sufficiently large constant. This forms an exponential grid

�that covers the hypercube � �'� � �j� �3� ; see

Figure 3. Set �)( �1� � and �)*�� / �+( .For every grid cell ,O�

�, we pick a “representative” point

o � � , , add it to

and set its weight� �po �� ,f� . Finally, let �P� � be the point farthest away from � in � . We add � to

and set its

17

��

��

�

�

(a) (b)

Figure 3: (a) An exponential grid with 3 layers. (b) The larger (resp., smaller) box is � (resp., ), and theset of hollow circles is � * .

weight � � �h� � � � �)*�� F � � � � . By construction, a a�� %�� F % �G�� F % �%� . We claim that�� i� is a% -coreset of

� �f��:� for�-median, i.e., it satisfies (1).

For any � � �0� , we claim that

a � � �+( � � � � � �� ( � � �,aJA � % F � �� f� �?�� % F � �� S�@K (2)

Indeed, every point in a cell of � � can be interpreted as “traveling” to its representative point in that cell.The total (weighted) distance traveled by all the points inside � to the coreset is smaller than

� % F � �� S� .As such, the error contributed by the points of � ( is smaller than

� % F � ��$�� S� . This implies that we onlyneed to consider the error contributed by points in � * .

In particular, we need to bound the difference in the price of � � � *�� j� and � �� *�� j� . (Of course, if�)* � , the lemma trivially holds and the following is unnecessary.) Let (�� and

* �^H �jJ . Let9� � � �j� � � ?HF

� % �]N � ; see Figure 3. We consider two cases.

(i) �c� : By the triangle inequality, for any � ��7�7�:� ,

� � � � � � � � � A � � � � A � � � � X � � � � KTherefore, � � �)*�� )*0� � � � � A � � �+*�� h� A�� )*�� ?� X�� +*�� K (3)

Since � 3 �

� ?HF % � and the side length of is at most

�

� ? F� % ,

� o � � 3 � � � � � F % , for allo � � * , and

thus (3) becomes �� % F � �� )*�� A � � �+*�� h� A �� X % F � �� )*�� ?K (4)

Similarly, we have

�� % F � �� )*�� A�� *�� h� A �� X % F � �� +*�� K (5)

Putting (4) and (5) together, we have

a � � �+*�� h� � � �� *�� _a�A � % F 5 �� +*�� ?K (6)

18

Using (2) and (6), we conclude

a � � � � �h� � � �� j�,a A a � � � ( � �h� � � �� ()� �h�3a,X a � � �)*�� h� � � �� *�� 3aA � % F � �� jX � % F 5 �� )*�� A % � � �f� �h�?K(ii) �c�d� � / : We first claim that

� � �)*0��A % � � � �S� F � � K (7)

Indeed,

�� I�� m & � �

�po � � o � � 3 �m &��

�po � � 3 � � �+*0� � � � �� S�% � � � �I� �which implies the claim. Next,

� � �I� � � � � 3 � � �S� � � ?� % 3 �

� �� S� F � % (8)

Using (7) and (8), we obtain

� � � ( � �h� 3 � � �+(-� � � � � � � � �+( � ��3 ! � � % � F � � # �� I� � � � � � �� S�3 ! � � % � F � � # �� I� � � � � � � � % F � � � � � �S�

� � � �3 �� % F��L� � � �S� � � � � K (9)

The last inequality follows since�

�3 � ( . A similar argument shows that

� � �+*�� h� A � � �)*�� X � � �)*�� A % �� I� � � � � X �� S�A %� � � �S� � � � � (using (8)) K (10)

Plugging (9) into (10), we obtain

� � � * � �h� A %� � � �+()� �j�� % F�� A %� � � �f� �h�?KSimilarly, we can argue that � �� *�� h� A � % F � �� f� �h��K (11)

Thus, using (2), (10), (11), we obtain

a � � � � �h� � � �� j�,a A a � � � ( � �h� � � �� ()� �h�3a,X a � � �)*�� h� � � �� *�� 3aA � % F � �� jX � % F 5 �� A % � � �f� �h�?KThis completes the proof that

�� i� is an % -coreset of� �f��:� . The above argument works even if� � �f� � ��A � �� I� for some constant

�3 �, provided that

�

� �� are chosen appropriately.

In order to compute , we first compute in � � � � time the centroid �

� � � � . It is well known that� � �f� �� .A 5 �� S� . We then compute the �� ’s and the exponential grid in time � �%�� F % � �� F % �%� and

19

find the grid cell that contains each point of � in a total time of � � �=�� F % �%� . Hence, the total time spentin computing

is � � �=�� F % �jX �� F % � �� F % �%� .

Let� � � ��:� and

� � � ��:� be two weighted point sets so that � � � � �� , and let

�� '� �i� be an % -coreset of� � '��:� for�-median. Then

�� g � � �k� is a % -coreset of

� � � g3� � ��:� . Moreover, if�� is an % � -coreset

of��

� � � � � , and��

� � � � � is an % � -coreset of� �f��:� , then

�� is a5#� % � X % � � -coreset of

� �f��:� . Usingthese observations and plugging Lemma 4.5 into the dynamic data structure by Agarwal et al. [1], one canmaintain an % -coreset of

� �f��:� of size � �%�� F % � �� F % �%� for�-median of

� �f��:� efficiently. Omitting allthe details, we conclude the following.

Theorem 4.6 Let � be a set of � points in � � , and let % &/( be a parameter. One can maintain a % -coresetof � of size � �%�� F % �G�� F % �%� under insertions and deletions of points in � so that each update takes� �%�� F % � �� F % �� ! � � � time.

4.5 A randomized algorithm

We briefly describe below a simple randomized algorithm to approximateQ � � �9�� . The algorithm for

approximatingQG� � �9�� is similar. Let Z > be the optimal translation, i.e.,

A � � � X Z > �� Q � � �9�� .Lemma 4.7 For a random point � � from � ,

?� � �iX Z > �� A 5 Q � � �9�� , with probability greater than� F 5 .

The same claim holds forQ9� � �9�� .

Proof: Let � � be a random point from � , where each point of � is chosen with equal probability. Let�

bethe random variable

� � ?� ��fX[Z > �� . Then

� � � � ��

?� � X Z > �� A � � � X Z > �� Q � � �9��@KThe lemma now follows immediately from Markov’s inequality.

Choose a random point ��\�C� . Let Z �� *�� and ! �� A � � � X Z�� , for� A � A � . It

then follows from Lemma 4.7 and the same argument as in Lemma 3.7, that B �! � ! � is a constant-factorapproximation of

Q � � �9�� , with probability greater than� F 5 . Computing ! � exactly is expensive in �:� ,

therefore we compute an approximate value of ! � , for� A �'AC� , in time � �%� � X.� �� , by performing

approximate nearest-neighbor queries [10]. We can improve this constant-factor approximation algorithmto compute a

�� X % � -approximation ofQ � � �9�� using the same technique as in Section 3. We thus obtain

the following result.

Theorem 4.8 Given two sets � and � of � and � points, respectively, in � � , and a parameter % & ( ,we can compute, in randomized expected time � �%� �� F % � ��;��,�� , two translation vectors Z � and Z � , suchthat, with probability greater than

� F 5 ,

A � � � X Z � �� A �� X % � Q � � �9�� andA � � � X[Z � ��BA �� X % � Q�� 9��@K

5 Conclusions

We provide in this paper some initial study of various problems related to minimizing Hausdorff distancebetween sets of points, disks, and balls. One natural question following our study is to compute exactlyor approximately the smallest Hausdorff distance over all possible rigid motions in � � and � � . Given twosets of points � and � of size � and � , respectively, let D be the maximum of the diameters of � and

20

� . We believe that there is a randomized algorithm with roughly �� D expected time, that approximatesthe optimal summed-Hausdorff distance (or rms-Hausdorff distance) under rigid motions in the plane. Thealgorithm that we envisage combines our randomized approach from Section 4.5, a framework to convertthe original problem to a pattern matching problem [23], and a result by Amir et al. on string matching [9].However, this approach does not extend to families of balls. We leave the problem of computing the small-est Hausdorff distance between sets of points or balls under rigid motions as an open question for furtherresearch. Another question is to approximate efficiently the best Hausdorff distance under certain transfor-mations when partial matching is allowed. The traditional approaches using reference points break downwith partial matching.

References

[1] P. K. Agarwal, S. Har-Peled, and K. R. Varadarajan, Approximating extent measures of points, J. ACM,51 (2004), 606–635.

[2] P. K. Agarwal and J. Matousek, Ray shooting and parametric search, SIAM J. Comput., 22 (1993),540–570.

[3] P. K. Agarwal, M. Sharir, and S. Toledo, Applications of parametric searching in geometric optimiza-tion, J. Algorithms, 17 (1994), 292–318.

[4] O. Aichholzer, H. Alt, and G. Rote, Matching shapes with a reference point, Intl. J. Comput. Geom.Appl., 7 (1997), 349–363.

[5] H. Alt, B. Behrends, and J. Blomer, Approximate matching of polygonal shapes, Ann. Math. Artif.Intell., 13 (1995), 251–266.

[6] H. Alt, P. Brass, M. Godau, C. Knauer, and C. Wenk, Computing the Hausdorff distance of geometricpatterns and shapes, in: Discrete and Computational Geometry — The Goodman-Pollack Festchrift(B. Aronov, S. Basu, J. Pach, and M. Sharir, eds.), Springer-Verlag, Heidelberg, 2003, pp. 65–76.

[7] H. Alt and L. J. Guibas, Discrete geometric shapes: Matching, interpolation, and approximation, in:Handbook of Computational Geometry (J.-R. Sack and J. Urrutia, eds.), Elsevier Science PublishersB.V. North-Holland, Amsterdam, 2000, pp. 121–153.

[8] N. Amenta and R. Kolluri, The medial axis of a union of balls, Comput. Geom. Theory Appl., 20 (2001),25–37.

[9] A. Amir, E. Porat, and M. Lewenstein, Approximate subset matching with Don’t Cares, Proc. 12thACM-SIAM Sympos. Discrete Algorithms, 2001, pp. 305–306.

[10] S. Arya and T. Malamatos, Linear-size approximate Voronoi diagrams, Proc. 13th ACM-SIAM Sympos.on Discrete Algorithms, 2002, pp. 147–155.

[11] S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu, An optimal algorithm forapproximate nearest neighbor searching fixed dimensions, J. ACM, 45 (1998), 891–923.

[12] M. J. Atallah, A linear time algorithm for the Hausdorff distance between convex polygons, Inform.Process. Lett., 17 (1983), 207–209.

21

[13] D. Cardoze and L. Schulman, Pattern matching for spatial point sets, Proc. 39th Annu. IEEE Sympos.Found. Comput. Sci., 1998, pp. 156–165.

[14] L. P. Chew, D. Dor, A. Efrat, and K. Kedem, Geometric pattern matching in-dimensional space,

Discrete Comput. Geom., 21 (1999), 257–274.

[15] L. P. Chew, M. T. Goodrich, D. P. Huttenlocher, K. Kedem, J. M. Kleinberg, and D. Kravets, Geometricpattern matching under euclidean motion, Comput. Geom. Theory Appl., 7 (1997), 113–124.

[16] M. T. Goodrich, J. S. B. Mitchell, and M. W. Orletsky, Practical methods for approximate geometricpattern matching under rigid motion, Proc. 10th Annu. Sympos. Comput. Geom., 1994, pp. 103–112.

[17] I. Halperin, B. Ma, H. Wolfson, and R. Nussinov, Principles of docking: An overview of search al-gorithms and a guide to scoring functions, Proteins: Structure, Function, and Genetics, 47 (2002),409–443.

[18] S. Har-Peled, A replacement for Voronoi diagrams of near linear size, Proc. 42nd Annu. IEEE Sympos.Found. Comput. Sci., 2001, pp. 94–103.

[19] S. Har-Peled, Class notes on compressed quadtrees, , 2005.

[20] S. Har-Peled and S. Mazumdar, Coresets for � -means and � -median clustering and their applications,Proc. 36th ACM Sympos. Theory Comput., 2004, pp. 291–300.

[21] D. P. Huttenlocher, K. Kedem, and J. M. Kleinberg, On dynamic Voronoi diagrams and the mini-mum Hausdorff distance for point sets under Euclidean motion in the plane, Proc. 8th Annu. Sympos.Comput. Geom., 1992, pp. 110–120.

[22] D. P. Huttenlocher, K. Kedem, and M. Sharir, The upper envelope of Voronoi surfaces and its applica-tions, Discrete Comput. Geom., 9 (1993), 267–291.

[23] P. Indyk, R. Motwani, and S. Venkatasubramanian, Geometric matching under noise: Combinatorialbounds and algorithms, Proc. 10th ACM-SIAM Sympos. Discrete Algorithms, 1999, pp. 457–465.

[24] K. Kedem, R. Livne, J. Pach, and M. Sharir, On the union of Jordan regions and collision-free transla-tional motion amidst polygonal obstacles, Discrete Comput. Geom., 1 (1986), 59–71.

[25] N. Megiddo, Applying parallel computation algorithms in the design of serial algorithms, J. ACM,30 (1983), 852–865.

[26] N. Sarnak and R. E. Tarjan, Planar point location using persistent search trees, Commun. ACM,29 (1986), 669–679.

[27] S. Seeger and X. Laboureux, Feature extraction and registration: An overview, Principles of 3D ImageAnalysis and Synthesis, Kluwer Academic Publishers, 2002, pp. 153–166.

[28] M. Sharir and P. K. Agarwal, Davenport-Schinzel Sequences and Their Geometric Applications, Cam-bridge University Press, New York, 1995.

22

Documents

Hausdorff Distance under Translation for Points and Ballsweb.cse.ohio-state.edu/~wang.1016/papers/haus-dcg.pdf · 2006. 11. 29. · Hausdorff Distance under Translation for Points