
Computers & Geosciences Vol. 18, No. 4, pp. 419--426, 1992 0098-3004/92 $5.00 + 0.00 Printed in Great Britain. All rights reserved Copyright © 1992 Pergamon Press Ltd

RASTER GIS: MODELS OF RASTER ENCODING

FRED HOLROYD¹ and SARAH B. M. BELL²

¹Faculty of Mathematics, The Open University, Milton Keynes MK7 6AA, and ²NERC Unit for Thematic Information Systems, Department of Geography, University of Reading, Whiteknights, P.O. Box 227, Reading RG6 2AB, England

Abstract. A fundamental problem for Geographical Information Systems (GIS) is the need to interrelate spatial and nonspatial data in a system that can handle both spatially oriented and object-oriented types of query. It is natural to structure the data primarily with respect either to the spatial or to the nonspatial information; the former choice leads to encodings of the raster type. There are potentially infinitely many addressing schemes for individual locations, and any of these may be used as the basis for a simple raster encoding, a run-length encoding, or (intermediate between these) an encoding analogous to the linear quadtree. The paper presents a unified view of such schemes, considers questions of storage and image-processing efficiency, and concludes with a brief look at the problem of integrating raster, vector, and object-oriented data types in GIS.

Key Words: Data processing, Mathematics, Geographic Information Systems.

INTRODUCTION

This paper will concentrate on raster encoding, but it is difficult to remain exclusively within this subject. Recent developments seem to indicate that it is desirable to think in terms of an integration of raster, vector, and object-oriented approaches to spatial data handling.

The need to take an integrated view of GIS data structures, of course, has been observed before. In chapter 2 of Burrough (1986), a variety of data structures for GIS are discussed, involving data of both raster and vector types. Burrough notes that "... In recent years it has become clear that what until recently was seen as an important conceptual problem [of vector-raster interconversion] is, in fact, a technological problem. When punched cards were the main medium of data storage, it was clearly an enormous task... Today, it is commonplace for a good graphics colour screen to have an addressable raster array of 1024 × 1024 pixels within an area of some 300 × 300 mm..." He goes on to observe that "In the late 1970s, several workers... showed that many of the algorithms that had been developed for vector data structures not only had raster alternatives, but that in some cases the raster alternatives were more efficient." Finally, he notes the existence of efficient raster-to-vector and vector-to-raster algorithms, and concludes that "...The problem of raster or vector disappears once it is realized that both are valid methods for representing spatial data, and that both structures are interconvertible."

Then again, Anthony and Corr (1987) give the following as the first two of a list of seven requirements of a spatial data structure for a GIS:

(a) The ability to handle vector and raster data without loss of information present in either format.

(b) Vector and raster data should be handled in similar formats for commonality of software and manipulation techniques.

In another contribution towards the integrated view, Smith and others (1987) clarify the nature of the problem. They point out that "one may categorize the majority of queries typically handled by GIS into two classes. The first class of query requests the locations of some class of spatial objects within a given spatial window, while the second class of query requests the identities of objects found within a given spatial window and belonging to some sub-class of objects." (Where can I locate what I want? vs What can I locate if I look hereabouts?) They note that, correspondingly, there are two basic ways to construct a spatial data model:

(a) Objects may be represented with each object having spatial locations as an essential property;

(b) Locations may be represented with each location being characterized by a set of object properties.

They observe that these alternatives have resulted in the vector and tessellation (i.e. raster) models respectively, the basic logical unit in a vector model being a line whereas that in a tessellation model is a unit of space (a pixel).

This view clearly impinges on the concept of object-oriented GIS, as model (a) is just as much a characterization of object-oriented as of vector GIS. The reason why the vector and object-oriented data structures are descriptively similar is as follows. In a vector data structure the vector coordinates are recorded explicitly but the storage order is determined by considerations other than spatial position (e.g. it might be the order in which somebody decides to digitize the information on a map). On the other hand, in a raster data structure it is the attributes at a point in space that are explicitly recorded, the position of that point being inferred from the order of storage. Thus vector data are inherently in an "object-oriented" form.

A well-known logical consequence of the different natures of vector and raster data is that, whereas vector data can record position to any degree of accuracy, raster data have a built-in level of positional accuracy. Mathematically, one might put this by saying that vector positional data are real-valued whereas raster positional data are integer-valued. Franklin (1986) claims (in the context of graphics algorithms) that "raster is harder than vector", on the grounds that existence problems that are easy to state but seemingly impossible to solve (such as Goldbach's conjecture or Fermat's last theorem) can be constructed more easily for the integers than for the reals. This particular observation probably is not of immediate application to GIS problems, but the fact that maps and visual displays produced from vector data tend to be much sharper and clearer than those produced from raster data is, of course, significant. However, this is really a problem of data volume as much as anything else. In mathematical theory a real number is defined to arbitrarily high accuracy, but the vertices of a polygon in a vector data set have in reality a limited accuracy, which may not be as good as the clean-looking lines on the display would suggest (see Goodchild, 1987).
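To make the "real versus integer" distinction concrete, here is a minimal Python sketch. It assumes a north-up raster with a known lower-left origin (x0, y0) and square cells of side `cell`; these parameters and the function names `to_pixel` and `pixel_centre` are illustrative assumptions, not part of the original text. Any two positions that fall in the same cell receive the same address, which is the built-in positional accuracy described above.

```python
import math


def to_pixel(x: float, y: float, x0: float, y0: float, cell: float) -> tuple[int, int]:
    """Quantize a real-valued (vector) position to an integer (raster) pixel address.

    (x0, y0) is the assumed lower-left corner of the raster and `cell` the
    assumed pixel size, both in the same ground units as x and y.
    """
    # floor (not int) so that positions left of or below the origin still
    # fall into the correct, negatively indexed cell
    return math.floor((x - x0) / cell), math.floor((y - y0) / cell)


def pixel_centre(col: int, row: int, x0: float, y0: float, cell: float) -> tuple[float, float]:
    """Best real-valued position recoverable from a pixel address: its centre."""
    return x0 + (col + 0.5) * cell, y0 + (row + 0.5) * cell


# Two points about 4 m apart fall in the same 10 m pixel, so the raster
# cannot tell them apart.
a = to_pixel(103.2, 57.9, 0.0, 0.0, 10.0)
b = to_pixel(106.1, 55.0, 0.0, 0.0, 10.0)
print(a, b, pixel_centre(*a, 0.0, 0.0, 10.0))  # (10, 5) (10, 5) (105.0, 55.0)
```

The round trip through `pixel_centre` can be off by up to half a cell in each axis, however precise the original coordinates were.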

Be that as it may, it is clear that the fundamental problem of GIS is that spatial and nonspatial information must be integrated into a structure in which both the "where can I locate this?" and the "what will I locate here?" types of query can be dealt with efficiently, and that the basic approaches use one of these types of information to index into the other. The distinguishing feature of raster data structures is thus that nonspatial attribute information is indexed by spatial location, which makes them particularly efficient at answering the second type of query. Perhaps a good general term for this type of structure is a spatially oriented data structure. This structure type will now be examined in more detail, and the paper will touch briefly on integrated structures.

SPATIALLY ORIENTED DATA STRUCTURES: WHAT WILL I FIND HERE?

What is "here"?

Theoretically, "here" is any point in the geographical region under consideration, or any defined neighborhood thereof. In order to ask the question which is the subject of this section, the user therefore must define a location. As noted in the Introduction, this location is defined only up to a certain resolution, which is modeled by tessellating the appropriate portion of 2-D space into pixels (or voxels in the 3-D analogs). Each pixel is then allocated an address code, the result being a pixel addressing system. Alternatively, one can regard the address codes as referring to points on a geometrically regular grid, surrounded by notional polygons (perhaps, but not necessarily, the corresponding Thiessen polygons). There are in fact subtle differences between these interpretations. If one is tessellating, then in principle the attribute value(s) for a given address should be a pixel average, whereas in the point interpretation this problem does not arise. Then again, what do we imply by "geometric regularity"? If we imply that each pixel or point should fit into the geometric structure in the same way as all the others, then the number of topologically distinct pixel patterns is 11, though if symmetry structure is taken into account the number rises to 81 (see Grünbaum and Shephard, 1977, 1987). For point patterns there is no obvious interpretation of the concept of topological structure, and counting by symmetry structure the number of possibilities is 30 (Grünbaum and Shephard, 1981, 1987). However, if one is allowed to compare the pixels or points with each other only by translation (rather than by rotation or reflection), then the possibilities are reduced. For pixels, one has the possibility of the square or the regular hexagonal tessellation. As far as topological structure goes these are the only ones, although one then has the option of altering the symmetry properties by applying a linear transformation to the structure as a whole, thus converting the squares into rectangles, rhombs, or general parallelograms, and the regular hexagons into analogously general hexagonal shapes. For point grids, one is likewise forced to have the points at the centroids of such structures.

Smith and others (1987), in surveying data models, consider only the pixel interpretation. They observe that the hexagonal tessellation has a disadvantage compared with the square tessellation: it is not infinitely recursively decomposable (as the square tessellation clearly is). However, this disadvantage does not really apply if one thinks in terms of a regular hexagonal point grid (that is, the grid of centroids of a hexagonal tessellation); recursive embedding into similar but finer grids is in fact possible. To see this, consider the recursive amalgamation of the hexagonal tessellation described by Gibson and Lucas (1982) and Bell and others (1983) (see Fig. 1).

Associated with this amalgamation is a linear transformation which maps the centroids of the original hexagons onto those of the amalgamated shapes comprising seven hexagons each, and it is clear that the inverse of this transformation will map the original point grid to a finer grid and that this inverse transformation can be applied recursively. Note that in this case the process has the disadvantage that each recursive level is rotated by an irrational angle with respect to the previous one. There are hexagonal tessellation possibilities that avoid this, for example the so-called HoR quadtree (see Bell, Diaz, and Holroyd, 1989). This is in essence the square tessellation subjected to a linear transformation that maps the squares to rhombs with angles of 60° and 120°; the shapes at the lowest level of the hierarchy can then be interpreted either as rhombs (the corresponding grid points being at the vertices) or as regular hexagons (the grid points being at their centroids) (see Fig. 2).
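The effect of such a linear transformation can be checked numerically. The sketch below assumes unit spacing and one convenient shear, (i, j) → (i + j/2, j·√3/2); the text does not fix a particular matrix, and the function names are illustrative. Under this map the images of the unit square's edges meet at 60°, and six of each point's eight square-grid neighbours end up at equal distance, as in a regular hexagonal point grid.

```python
import math

SQRT3_OVER_2 = math.sqrt(3) / 2

# The six square-grid offsets that become nearest neighbours after the shear.
HEX_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def rhombic_point(i: int, j: int) -> tuple[float, float]:
    """Apply the linear map (i, j) -> (i + j/2, j*sqrt(3)/2) to a square-grid index."""
    return i + j / 2, j * SQRT3_OVER_2


def neighbour_distances(i: int, j: int) -> list[float]:
    """Distances from the image of (i, j) to the images of its six hexagonal neighbours."""
    p = rhombic_point(i, j)
    return [math.dist(p, rhombic_point(i + di, j + dj)) for di, dj in HEX_OFFSETS]


def rhomb_angle_degrees() -> float:
    """Angle between the images of the two unit axis vectors (the rhomb's acute angle)."""
    ux, uy = rhombic_point(1, 0), rhombic_point(0, 1)
    cos_angle = (ux[0] * uy[0] + ux[1] * uy[1]) / (math.hypot(*ux) * math.hypot(*uy))
    return math.degrees(math.acos(cos_angle))


print([round(d, 12) for d in neighbour_distances(3, -2)])  # six values of 1.0
print(round(rhomb_angle_degrees(), 9))                      # 60.0
```

Because the map is linear, halving the spacing of the square grid halves the spacing of the transformed grid without introducing any rotation. This is why a hierarchy built this way avoids the level-to-level rotation noted above.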

In any case, the tessellation interpretation rather than the point-grid interpretation seems to be the more usual one for geographers, so we shall stay with it.

Pixel addressing systems

An addressing system is a system of codes, or symbols, representing pixels or points unambiguously. When human beings use such systems, their nature can be varied, employing a wide range of mathematical and other symbols. In a computer, however, such systems are ultimately implemented as bit patterns. It is perfectly possible for two different mathematical representations to have bit-pattern realizations that are identical or closely related. Conversely, it is possible for a given symbolic representation to be implemented in the computer in a variety of ways that differ significantly in speed and other measures of efficiency.

Addressing systems using mathematical symbolism are discussed fairly fully in, for example, Holroyd (1985). From the standpoint of functionality, however, it is the corresponding computer bit-pattern representations that are of primary interest. In this paper, therefore, a pixel addressing system is understood to be a method of assigning a one-to-one correspondence between a set of bit patterns and a set of pixels (or voxels, if the context is three-dimensional).

Figure 1. Recursive amalgamation of hexagonal tessellation.

Figure 2. Two interpretations of lowest-level shapes of HoR quadtree.

It is natural to represent spatial position in terms of Cartesian coordinates, so a pixel addressing system may be described as of Cartesian type if the bit positions can be partitioned into two subsets X and Y such that the X-bits uniquely describe the x-coordinate and the Y-bits the y-coordinate of the pixel, for some pair of (not necessarily perpendicular) axes. (Throughout this paper, please make the obvious adjustments for the 3-D context.) Usually the X and the Y bits will be stored in separate bytes or words, but there is no necessity for this.

During the early 1980s considerable effort went into developing the theory of tesseral addressing systems, in which space is subdivided hierarchically and addressed by multidigit symbols analogous to the conventional decimal numeral system, successive digits from left to right corresponding to positions in successive subdivisions (see in particular Bell and others, 1983; Bell and Mason, 1990). Much of this work is presented as a monograph of collected papers published by the Natural Environment Research Council (see Diaz and Bell, 1986). Accordingly, we shall describe a pixel addressing system as of tesseral type if the bit positions can be partitioned into a system of sets such that the bits in each set describe a particular digit in a tesseral addressing system.

The sets in question need not be sets of consecutive bit positions, and the best-known tesseral addressing system, discovered independently by Morton (1966), Gargantini (1982), and Oliver and Wiseman (1983), is just the familiar Cartesian system with the x and y coordinate bits interleaved. The least significant x-bit, in conjunction with the least significant y-bit, represents the least significant tesseral digit (a quaternary digit) in the tesseral addressing system; the next x-bit and the next y-bit then represent the next tesseral digit; and so on. The ordering of the pixels implied by interpreting these bit patterns as conventional binary integers is known as the Morton ordering. Thus a system that is of tesseral type can be permutation-equivalent to one of Cartesian type, although not all tesseral-type systems have such an equivalence. For example, the system investigated by Gibson and Lucas (1982), known as Generalized Balanced Ternary and based on the hexagonal tessellation hierarchy in Figure 1, is not permutation-equivalent to a Cartesian system.
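A minimal Python sketch of the Morton (bit-interleaved) system for a 2ⁿ × 2ⁿ image follows. The text does not say which coordinate supplies the lower bit of each pair. Here the x-bit is assumed to be the lower one, so each quaternary digit equals 2·(y-bit) + (x-bit). The function names are illustrative.

```python
def morton_encode(x: int, y: int, n: int) -> int:
    """Interleave the low n bits of x and y into a Morton (tesseral) code.

    Assumed convention: within each bit pair the x-bit is the lower one, so
    each quaternary digit equals 2*y_bit + x_bit.
    """
    code = 0
    for b in range(n):
        code |= ((x >> b) & 1) << (2 * b)      # x-bit b -> position 2b
        code |= ((y >> b) & 1) << (2 * b + 1)  # y-bit b -> position 2b+1
    return code


def morton_decode(code: int, n: int) -> tuple[int, int]:
    """Inverse of morton_encode: recover (x, y) from the interleaved code."""
    x = y = 0
    for b in range(n):
        x |= ((code >> (2 * b)) & 1) << b
        y |= ((code >> (2 * b + 1)) & 1) << b
    return x, y


def quaternary_digits(code: int, n: int) -> str:
    """Write a Morton code as n tesseral (base-4) digits, most significant first."""
    return "".join(str((code >> (2 * b)) & 3) for b in reversed(range(n)))


c = morton_encode(5, 3, 3)  # x = 101b, y = 011b
print(c, quaternary_digits(c, 3), morton_decode(c, 3))  # 27 123 (5, 3)
```

Sorting pixels by this code gives the Morton ordering. Reading the digits from left to right descends the quadrant hierarchy one level per digit.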

The work on tesseral addressing systems mentioned above revealed that there are theoretically infinitely many such systems available, and Wingate (1988) and Bell and Holroyd (1991) give algorithms for generating them. The general theory is of some mathematical interest and gives rise to many aesthetically attractive tessellation hierarchies. The efficiency of these is discussed in Bell and Holroyd (1991); we shall return to this question later.

If a pixel addressing system can be regarded either as of Cartesian type or of tesseral type (depending on how the bits in the bit pattern are conceived as being grouped together), one might say that it is of Cartesio-tesseral type. Now a Cartesian system in which coordinates are expressed in fixed-point or integer format will always yield a tesseral addressing system upon bit-interleaving (in any number of dimensions, not just two), whereas most tesseral systems cannot be converted to Cartesian ones by bit permutation. Because positional coordinates are clearly always relevant in geographical applications, there seems to be a prima facie case for concentrating on Cartesio-tesseral systems. In practice this means Cartesian systems whose coordinates can be bit-interleaved wherever tesseral applications and methods are desirable.
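Because bit-interleaving works in any number of dimensions, the 2-D case generalizes directly. The sketch below interleaves d coordinates into a single code whose digits are base 2ᵈ, for example octal digits for the voxels of an oct-tree. It uses illustrative function names and assumes that coordinate 0 supplies the lowest bit of each digit. The operation only permutes bit positions, so it is invertible, which is the sense in which such a Cartesian system is also tesseral.

```python
def interleave(coords: list[int], n: int) -> int:
    """Bit-interleave d integer coordinates of n bits each into one tesseral code.

    Bit b of coordinate k goes to position b*d + k, so each group of d bits
    forms one base-2**d tesseral digit (quaternary in 2-D, octal in 3-D).
    """
    d = len(coords)
    code = 0
    for b in range(n):
        for k, c in enumerate(coords):
            code |= ((c >> b) & 1) << (b * d + k)
    return code


def deinterleave(code: int, d: int, n: int) -> list[int]:
    """Undo interleave(): the mapping is a bit permutation, hence invertible."""
    coords = [0] * d
    for b in range(n):
        for k in range(d):
            coords[k] |= ((code >> (b * d + k)) & 1) << b
    return coords


# A voxel at (x, y, z) = (3, 1, 2) in a 4 x 4 x 4 volume (n = 2 bits per axis).
v = interleave([3, 1, 2], 2)
print(v, oct(v), deinterleave(v, 3, 2))  # 43 0o53 [3, 1, 2]
```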

Quam (1980) considers a rather different range of possible addressing systems for multidimensional raster arrays. He treats the problem as one of mapping the set of all (integer) array positions $(i, j, \ldots)$ in a given multidimensional array to a linear integer array; thus the addresses of the "pixels" are regarded as integers rather than explicitly as bit patterns. He considers the possible addressing systems given by mappings of the form:

$$\text{array element } (i, j, \ldots) \;\mapsto\; f_1(i) + f_2(j) + \cdots$$

Different addressing systems then correspond to different selections of the functions $f_1, f_2, \ldots$. Quam shows that in two dimensions both the row ordering and the Morton ordering can be realized by suitable specification of the functions $f_1$ and $f_2$.

As Quam's functions are linked to Cartesian coordinates, it is clear that all Cartesian-type addressing systems can be realized in this way, and it seems that his proposal is in effect a generalization of Cartesian-type addressing systems.
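Quam's separable form can be illustrated for an 8 × 8 array, with i as the row index and j as the column index. The functions below are one illustrative choice and not a reproduction of Quam's own formulation. Row ordering uses f₁(i) = i·W and f₂(j) = j. For Morton ordering, each index has its bits spread into disjoint positions, so the sum never carries and equals the bit-interleaved address.

```python
N_BITS = 3
WIDTH = 1 << N_BITS  # an 8 x 8 array; i = row, j = column


def spread_bits(v: int, n: int = N_BITS) -> int:
    """Move bit b of v to bit position 2b, leaving the odd positions empty."""
    out = 0
    for b in range(n):
        out |= ((v >> b) & 1) << (2 * b)
    return out


def quam_address(i: int, j: int, f1, f2) -> int:
    """Separable mapping of the form (i, j) -> f1(i) + f2(j)."""
    return f1(i) + f2(j)


# Row ordering.
def row_f1(i: int) -> int:
    return i * WIDTH


def row_f2(j: int) -> int:
    return j


# Morton ordering: f1 fills only odd bit positions and f2 only even ones,
# so their sum equals the bit-interleaved address.
def morton_f1(i: int) -> int:
    return spread_bits(i) << 1


def morton_f2(j: int) -> int:
    return spread_bits(j)


print(quam_address(2, 5, row_f1, row_f2))        # 21
print(quam_address(3, 5, morton_f1, morton_f2))  # 27
```

Both choices map the 64 array positions one-to-one onto the addresses 0 to 63. Only the order in which pixels are laid out differs.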

Bit-interleaving by hardware

A bit-interleaving operation in a conventional computing language is normally expensive in processor time. Therefore, if the conclusion that both the Cartesian and the tesseral forms of pixel addresses are needed is correct, an efficient interconversion process is desirable. A software table-driven solution has been described in appendix 1 of Holroyd and Mason (1990).
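One common software approach uses a lookup table that spreads the bits of each byte. A minimal sketch follows, assuming 16-bit coordinates and the x-bit-lower convention. The table and function names are illustrative, and no claim is made that this matches the scheme in appendix 1 of Holroyd and Mason (1990) in detail. Each coordinate then costs two table lookups, two shifts and a few ORs, instead of a loop over all 16 bits.

```python
def _spread_byte(v: int) -> int:
    """Spread the 8 bits of v to the even bit positions of a 16-bit value."""
    s = 0
    for b in range(8):
        s |= ((v >> b) & 1) << (2 * b)
    return s


# Precomputed once: 256 entries, one per possible byte value.
SPREAD_TABLE = [_spread_byte(v) for v in range(256)]


def interleave_16bit(x: int, y: int) -> int:
    """Interleave two 16-bit coordinates into a 32-bit Morton code.

    Each coordinate is handled a byte at a time by table lookup; the x-bits
    land in the even positions and the y-bits in the odd positions.
    """
    return (SPREAD_TABLE[x & 0xFF]
            | SPREAD_TABLE[(x >> 8) & 0xFF] << 16
            | SPREAD_TABLE[y & 0xFF] << 1
            | SPREAD_TABLE[(y >> 8) & 0xFF] << 17)


print(interleave_16bit(5, 3))            # 27
print(hex(interleave_16bit(0xFFFF, 0)))  # 0x55555555
```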

It is not unusual for complex problems to be solved by suitable hardware modifications. Adams and others (1984), for example, describe a successful hardware solution to the problem of carrying out "rubber sheet" transforms on raster-displayed data, and Wise (1988) describes the use of content-addressable filestore in conjunction with specially designed hardware for rapid access to large data sets.

We note here that a hardware solution to the bit-interleaving problem is easy to prescribe: all that is needed is a pair of registers linked together in such a way that the contents of one are always the bit-interleaved form of the contents of the other. A single bit-interleaving operation can then be accomplished by reading into one register and out of the other, and a sequence of operations on tesseral addresses could be prefaced by a machine-codable register exchange.

Image-encoding systems

An image of resolution $n$ is a $2^n \times 2^n$ pixel array. The simplest type of raster encoding system for such an image is one in which the attributes of every pixel are explicitly recorded. This, of course, leads to considerable data-storage problems for large values of $n$, although advancing technology is lessening these. Nevertheless, data compression is clearly advantageous, especially as many image-processing algorithms are faster with certain types of compressed data than with simple raster data.

Data compression involves both encoding a raster image into a compressed form and decoding the compressed format so that a "what will I locate here?" question can be efficiently answered. To some extent, there is a trade-off between the degree of compression and the efficiency of encoding and decoding. For example, Barnsley (1988) claims a remarkably high degree of data compression for a type of encoding of binary images known as an iterated function system (IFS). Such a system is simply a finite set of contractive affine transformations, and the set of black pixels which it defines is the unique set which is the union of its self-images under the transformations. It is remarkably easy to prescribe extremely intricate images using a small IFS, and the decoding technique is simple and not enormously slower than the decoding of a simple raster. Unfortunately, the process of encoding an image as an IFS currently seems to be more of an art than a science, and there seems to be a lack of theoretical results giving bounds for the number of affine transformations one might need for an arbitrary raster image.
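A toy illustration of IFS decoding follows. The three maps halve the picture and shift the copy into one of three quadrants of a 2ⁿ × 2ⁿ grid, so their attractor is a digital Sierpinski triangle. These maps are chosen for illustration and are not taken from Barnsley (1988). Integer division makes them only discretised versions of affine contractions. Decoding is just repeated application of the union of the maps until the pixel set stops changing.

```python
def ifs_decode(n: int) -> set[tuple[int, int]]:
    """Decode a toy discretised IFS on a 2**n x 2**n pixel grid (n >= 1).

    Each map halves the picture (integer division) and shifts the copy into one
    of three quadrants; the decoded image is the fixed point of their union.
    """
    half = 1 << (n - 1)
    maps = [
        lambda x, y: (x // 2, y // 2),
        lambda x, y: (x // 2 + half, y // 2),
        lambda x, y: (x // 2, y // 2 + half),
    ]
    black = {(x, y) for x in range(1 << n) for y in range(1 << n)}  # start: all pixels
    while True:
        image = {w(x, y) for (x, y) in black for w in maps}
        if image == black:  # the set equals the union of its self-images: done
            return black
        black = image


attractor = ifs_decode(3)
print(len(attractor))  # 27
for y in reversed(range(8)):
    print("".join("#" if (x, y) in attractor else "." for x in range(8)))
```

With these particular maps, a pixel (x, y) ends up black exactly when x and y share no set bits. The whole image is specified by three maps rather than by 64 pixel values.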

Leaving IFSs aside, raster data-compression techniques rely on the empirical fact that points close together tend statistically to have similar or identical attributes, which can be "blocked together". The oldest such technique is run-length encoding (Rutovitz, 1968), in which the image is taken row by row and each run of similar pixels is encoded by a record containing the start position of the run and the common attribute value. (The length of the run may also be included if speed is more important than memory space, though it is theoretically redundant.) A generalization of this method is one in which the set of pixels with a particular attribute value (or vector of values) is given as the (not necessarily disjoint) union of rectangular blocks. This is known as the medial axis transformation (Blum, 1967; Rosenfeld and Pfaltz, 1966). It has the advantage of good compression, but the encoding is not unique and the process of determining optimal rectangles may be time-consuming.
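The following minimal sketch shows row-by-row run-length encoding and how the "what will I find here?" query can be answered against it. The record layout and the function names are assumptions made for illustration: a linear start index (row × width + column) plus a value, with no stored length. Because runs are kept sorted by start position, a lookup is a binary search.

```python
from bisect import bisect_right


def rle_encode(image):
    """Row-by-row run-length encoding of a list of equal-length rows.

    Each run is recorded only by its start position (as a linear index
    row * width + col) and its attribute value; run lengths are implied by the
    next start, so they are not stored. A new run always starts at column 0.
    """
    width = len(image[0])
    starts, values = [], []
    for r, row in enumerate(image):
        for c, v in enumerate(row):
            if c == 0 or v != row[c - 1]:
                starts.append(r * width + c)
                values.append(v)
    return width, starts, values


def rle_lookup(encoding, r: int, c: int):
    """'What will I find here?': binary-search the run containing pixel (r, c)."""
    width, starts, values = encoding
    k = bisect_right(starts, r * width + c) - 1
    return values[k]


img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [2, 2, 2, 2]]
enc = rle_encode(img)
print(enc[1], enc[2])         # [0, 2, 4, 6, 8] [0, 1, 0, 1, 2]
print(rle_lookup(enc, 1, 3))  # 1
```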

A more popular compression method is the quadtree, on which there is a large literature. A comprehensive treatment of the quadtree (and its 3-D analog, the oct-tree) is given by Samet (1990a, 1990b); see also Samet (1984), who reviews early work in this area. The fundamental idea is to decompose the square picture area by recursively dividing squares into four, stopping whenever a square of uniform attribute value is obtained. Originally, each nonleaf node of a quadtree had explicit pointers to its offspring (and sometimes to its siblings also). Gargantini (1982) and Oliver and Wiseman (1983) effectively discovered tesseral addressing by realizing that these pointers were unnecessary if the leaf nodes were given codes that were in fact quaternary tesseral address codes, with a fifth symbol, X, representing a "wild" digit. Thus the code 21XX (for example) refers to the block of pixels whose tesseral addresses are 2100, 2101, 2102, 2103, 2110, ..., 2133, that is, all addresses whose most significant two digits are 21. Storing a list of such codes along with the corresponding attribute values effectively stores the quadtree information. Gargantini named such a quadtree a linear quadtree.
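A minimal sketch of building a linear quadtree with wild-digit codes of the 21XX kind follows. The image is indexed as image[y][x], and the quadrant digit is assumed to be 2·(y-bit) + (x-bit). Both conventions and the function names are illustrative assumptions. Leaves are emitted in Morton order, and the lookup below is a plain linear scan; a practical implementation would binary-search the sorted codes instead.

```python
def linear_quadtree(image):
    """Encode a 2**n x 2**n image (list of rows, image[y][x]) as a linear quadtree.

    Returns (code, value) leaves in Morton order. Each code has n quaternary
    digits, most significant first, with trailing 'X' as the wild digits.
    Assumed digit convention: digit = 2*y_bit + x_bit.
    """
    size = len(image)
    n = size.bit_length() - 1
    leaves = []

    def visit(x0, y0, side, prefix):
        first = image[y0][x0]
        if all(image[y][x] == first
               for y in range(y0, y0 + side) for x in range(x0, x0 + side)):
            leaves.append((prefix + "X" * (n - len(prefix)), first))
            return
        half = side // 2
        for digit in range(4):  # visit the four children in Morton order
            dx, dy = digit & 1, digit >> 1
            visit(x0 + dx * half, y0 + dy * half, half, prefix + str(digit))

    visit(0, 0, size, "")
    return leaves


def lq_lookup(leaves, address: str):
    """Value at a pixel, given its full n-digit tesseral address (linear scan)."""
    for code, value in leaves:
        if address.startswith(code.rstrip("X")):
            return value
    raise KeyError(address)


img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 1],
       [0, 0, 0, 0]]
tree = linear_quadtree(img)
print(tree)  # [('0X', 1), ('1X', 0), ('2X', 0), ('30', 0), ('31', 1), ('32', 0), ('33', 0)]
print(lq_lookup(tree, "31"))  # 1
```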

Any tesseral addressing system gives rise to an analogous image encoding system. In particular, Gibson and Lucas (1982) described such a system based on the hexagonal tessellation and an addressing system using septenary (base-7) digits, and Tamminen (1981) and Lusby-Taylor (1986) have described tesseral systems using binary tesseral digits. All of these systems, with the exception of that described by Gibson and Lucas, are Cartesio-tesseral in the sense described earlier. [Image-encoding systems also can be derived from hierarchical addressing systems that do not have any derived tesseral arithmetic (Dutton, 1990; Goodchild and Shiren, 1990).]

Normally, quadtree and run-length encoding systems are thought of as separate answers to the data-compression problem. However, an important link is made by Lauzon and others (1985). These authors describe two-dimensional run encoding, that is, run-length encoding based on runs of tesseral addresses rather than runs along x-scan lines (i.e. runs in the row-order raster encoding).

In Holroyd (1987) it is observed that, for any pixel addressing system whatever, one can have a run-length encoding consisting of a list of maximal runs of similarly attributed pixels. A quadtree type of encoding is then obtained by partitioning these runs into maximal subruns with binary addresses from P00...0 to P11...1, where P is some binary string which is constant for the subrun and an even number of least significant bits is allowed to range through all possible values. If the addressing system is the tesseral one of Morton and others described previously (hereafter referred to as the Morton system) and the number of bits allowed to differ is $2k$, this is a quadtree leaf of level $k$, and corresponds to a $2^k \times 2^k$ square of pixels. On the other hand, if the addressing system were the usual Cartesian one, then such a subrun would be a line of similarly attributed pixels with a constant y-coordinate and with x-coordinates running from P00...0 to P11...1.

If the number of least significant bits allowed to change is not restricted to be even, then in each case we obtain the leaves of a binary tree rather than of a quadtree. In the example of the Morton addressing system these leaves have two possible geometric shapes: a square, or a rectangle whose width is twice its height. This is essentially the system described by Tamminen (1981).
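The partitioning of a maximal run into subruns of the form P00...0 to P11...1 can be sketched as a greedy split into aligned blocks. The function name and the `bits_per_level` parameter are illustrative: 2 gives quadtree leaves, and 1 gives binary-tree leaves. At each step the sketch takes the largest block that starts on a block boundary and stays inside the run.

```python
def partition_run(start: int, end: int, bits_per_level: int = 2) -> list[tuple[int, int]]:
    """Split the run of addresses start..end (inclusive) into maximal aligned subruns.

    Each subrun has the form P00...0 to P11...1, where the number of free
    low-order bits is a multiple of bits_per_level: 2 gives quadtree leaves,
    1 gives binary-tree leaves. Returns (first_address, free_bits) pairs.
    """
    pieces = []
    while start <= end:
        free = 0
        while True:
            trial = free + bits_per_level
            block = 1 << trial
            # the subrun must start on a block boundary and stay inside the run
            if start % block != 0 or start + block - 1 > end:
                break
            free = trial
        pieces.append((start, free))
        start += 1 << free
    return pieces


print(partition_run(1, 7))                    # [(1, 0), (2, 0), (3, 0), (4, 2)]
print(partition_run(1, 7, bits_per_level=1))  # [(1, 0), (2, 1), (4, 2)]
```

In the Morton system, a piece with 2k free bits is a 2ᵏ × 2ᵏ square. With `bits_per_level=1`, an odd number of free bits gives the rectangles whose width is twice their height, because the x-bit is taken as the lower bit of each pair. In the example, the single run 1..7 becomes three binary-tree leaves or four quadtree leaves.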

We thus have the following taxonomy of encoding systems. First, the system can be classified according to the underlying pixel addressing system. Then, in the situation of compression obtained by recording runs of similarly attributed pixels, one can ask: "Are the runs maximal? If not, on what scheme are maximal runs partitioned into subruns?"

Efficiency comparisons

In Holroyd (1987), some experiments are described which attempt to estimate the efficiency of the encoding systems described above, as regards both storage and the performance of certain standard image-processing algorithms. Clearly, for a given addressing system the run-length encoding is necessarily more efficient in storage than the binary tree encoding, which in turn is more efficient than the quadtree encoding, since each quadtree leaf lies within a single binary tree leaf and each binary tree leaf within a single maximal run. However, when the Morton and row orderings are compared for storage efficiency, a slight but definite tendency is found: row ordering is more efficient than Morton ordering for run-length encodings, whereas Morton ordering is more efficient than row ordering for binary and quadtree encodings. The storage efficiency of binary tree encoding is confirmed by Bell and Holroyd (1991), where the tree forms of sixteen tesseral addressing systems are compared. The system giving the greatest storage efficiency is a binary system (in fact, the system described by Lusby-Taylor, 1986). However, the binary trees based on row and Morton ordering are not discussed in that paper.
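The text does not give the test images used in these experiments, so the sketch below implies no results. It only shows how such a storage comparison can be made for a single image, by counting maximal runs along the row ordering and along the Morton ordering. Runs are counted along the whole linear order, so a run may continue from one row into the next, unlike an encoding that restarts at each row. The x-bit-lower Morton convention and the function names are illustrative assumptions.

```python
def count_runs(seq) -> int:
    """Number of maximal runs of equal values in a sequence."""
    return sum(1 for k, v in enumerate(seq) if k == 0 or v != seq[k - 1])


def morton_key(x: int, y: int, n: int) -> int:
    """Bit-interleaved (Morton) address, x-bit lower in each pair."""
    code = 0
    for b in range(n):
        code |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return code


def runs_by_ordering(image) -> tuple[int, int]:
    """(row-order runs, Morton-order runs) for a 2**n x 2**n image given as image[y][x]."""
    size = len(image)
    n = size.bit_length() - 1
    row_seq = [v for row in image for v in row]
    cells = sorted(((x, y) for y in range(size) for x in range(size)),
                   key=lambda p: morton_key(p[0], p[1], n))
    morton_seq = [image[y][x] for x, y in cells]
    return count_runs(row_seq), count_runs(morton_seq)


left_right = [[0, 0, 1, 1]] * 4                  # left half 0, right half 1
top_bottom = [[0] * 4] * 2 + [[1] * 4] * 2       # top half 0, bottom half 1
print(runs_by_ordering(left_right))  # (8, 4)
print(runs_by_ordering(top_bottom))  # (2, 2)
```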

For the image operations of Boolean overlay and windowing, the row-order run-length encoding seems to support the most efficient algorithms. For the dilation algorithm (i.e. line thickening), the analysis is more difficult but tentatively seems once more to support the row-order run-length encoding. For geometric transformations, the conclusion is that such operations seem to be particularly resistant to data compression. However, a recent algorithm based on the linear quadtree encoding, due to van Lierop (1986), may prove faster than one based on the simple raster encoding, provided that the tree encoding has a sufficiently high compression factor relative to the simple raster.

Lauzon and others (1985) describe algorithms for interconversion between Morton-order run-length and quadtree encodings; Holroyd (1987) describes similar algorithms that also work for the binary tree encodings. In particular, the execution time for obtaining the tree encoding from the run encoding was found to be proportional to the product of the number of runs and the resolution of the picture. When this observation is set alongside the findings given here, there seems to be a case for saying that row-ordered run-length encoding is worth reconsidering. It has the best storage efficiency, and it leads to efficient image-processing operations as long as geometric transformations are not to be performed. On the other hand, if geometric transformations are envisaged, then the Morton-order run-length encoding seems attractive, as it is easy to convert to a quadtree encoding when necessary.
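In the tree-to-run direction, the conversion can be a single pass: consecutive leaves in Morton order are merged whenever they are contiguous and carry the same attribute. The sketch below assumes leaves given as (Morton start address, length, value) triples sorted by address; the function name is hypothetical, and the code illustrates the idea rather than reproducing the algorithms of Lauzon and others (1985) or Holroyd (1987). Its cost is linear in the number of leaves.

```python
from typing import List, Tuple

Leaf = Tuple[int, int, int]  # (Morton start address, length, value)
Run = Tuple[int, int, int]   # (start, end, value), half-open [start, end)


def leaves_to_runs(leaves: List[Leaf]) -> List[Run]:
    """Merge address-sorted linear-quadtree leaves into maximal runs."""
    runs: List[Run] = []
    for start, length, value in leaves:
        if runs and runs[-1][1] == start and runs[-1][2] == value:
            # Contiguous with the previous run and same attribute: extend it.
            runs[-1] = (runs[-1][0], start + length, value)
        else:
            runs.append((start, start + length, value))
    return runs


# Leaves of a 4 x 4 image: two uniform quadrants of 1s, one of 0s,
# and a last quadrant split into single pixels.
leaves = [(0, 4, 1), (4, 4, 1), (8, 4, 0), (12, 1, 0),
          (13, 1, 1), (14, 1, 1), (15, 1, 1)]
print(leaves_to_runs(leaves))  # [(0, 8, 1), (8, 13, 0), (13, 16, 1)]
```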

These considerations refer to situations where the entire image code is resident in core memory. The question of paging costs when a virtual memory system must be used is a separate one. Quam (1980) considers this question in the context of his general mapping scheme described earlier, and concludes (in essence) that, for simple raster data, the Morton ordering scheme will be superior to the row-ordering scheme in this respect. He conjectures that the multidimensional analog of Morton ordering is in fact optimal among the mappings of the type he considers. Denham, Holroyd, and Johnson (1986) tentatively come to the same conclusion for Morton vs row ordering in two dimensions. These conclusions await experimental verification.
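The paging question turns on how many pages a typical access touches. The toy count below is not Quam's model. It assumes a 64 × 64 image stored in address order, pages holding 64 pixels each, and counts the distinct pages read when a rectangular window is accessed; the helper names and the page size are hypothetical.

```python
from typing import Callable


def morton(x: int, y: int, bits: int) -> int:
    """Interleave bits: x into even positions, y into odd positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def row(x: int, y: int, bits: int) -> int:
    """Row-order address of pixel (x, y) in a 2**bits-wide image."""
    return (y << bits) | x


def pages_touched(order: Callable[[int, int, int], int], x0: int, y0: int,
                  w: int, h: int, bits: int, page_size: int) -> int:
    """Distinct pages holding the w-by-h window whose top-left is (x0, y0)."""
    return len({order(x, y, bits) // page_size
                for y in range(y0, y0 + h)
                for x in range(x0, x0 + w)})


BITS, PAGE = 6, 64  # a 64 x 64 image, 64 pixels per page (assumed values)
for name, order in (("row", row), ("Morton", morton)):
    square = pages_touched(order, 0, 0, 8, 8, BITS, PAGE)
    strip = pages_touched(order, 0, 0, 64, 1, BITS, PAGE)
    print(name, square, strip)
# row 8 1
# Morton 1 8
```

An 8 × 8 square touches one page in Morton order but eight in row order, while a full-width strip one pixel high reverses the comparison. Which ordering wins therefore depends on the access pattern, which is consistent with the need for experimental verification.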

INTEGRATED SYSTEMS

The problem of integrating raster and vector or object-oriented data formats has received considerable attention, particularly where the raster data are in quadtree form. The crucial observation here is that the quadtree organization of the data is independent of the exact nature of the data stored at the leaves (and at the nodes, in the case of quadtrees with specific node storage). Samet (1984) describes point quadtrees, in which precise point coordinates are stored in the appropriate leaves of a quadtree, and several types of solution to the problem of storing information on boundaries (and lines generally) in quadtree form. In particular, the PM3 quadtree (Samet and Webber, 1983) stores at each leaf the coordinates of at most one point, and also information characterizing each line which ends at that point or passes through that leaf. This structure is proving useful for quadtree storage of vector data in GIS (see, for example, Callen and others, 1986; Ibbs and Stevens, 1988). There are other possibilities, however; see Gahegan (1989), who uses a linear quadtree in which four bits of each node indicate the presence or absence of boundary lines on the north, east, south, and west edges of the corresponding square.
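The four boundary bits can be pictured as a small flag set per node. The sketch below is hypothetical: it assumes square quadtree blocks given by top-left corner and side length on a raster of region labels, with y increasing southwards, and sets a bit when any pixel just across that edge carries a different label; cells outside the image are ignored. How Gahegan (1989) actually derives or stores the bits is not reproduced here.

```python
from enum import IntFlag
from typing import Iterable, List, Tuple


class Edge(IntFlag):
    NORTH = 1
    EAST = 2
    SOUTH = 4
    WEST = 8


def edge_flags(labels: List[List[int]], x: int, y: int, size: int) -> Edge:
    """Boundary bits for the uniform block with top-left (x, y) and side size."""
    h, w = len(labels), len(labels[0])
    value = labels[y][x]

    def differs(cells: Iterable[Tuple[int, int]]) -> bool:
        # True if any in-image neighbour across the edge has another label.
        return any(0 <= cy < h and 0 <= cx < w and labels[cy][cx] != value
                   for cx, cy in cells)

    flags = Edge(0)
    if differs((x + i, y - 1) for i in range(size)):
        flags |= Edge.NORTH
    if differs((x + size, y + i) for i in range(size)):
        flags |= Edge.EAST
    if differs((x + i, y + size) for i in range(size)):
        flags |= Edge.SOUTH
    if differs((x - 1, y + i) for i in range(size)):
        flags |= Edge.WEST
    return flags


labels = [[1, 1, 2, 2],
          [1, 1, 2, 2],
          [3, 3, 3, 3],
          [3, 3, 3, 3]]
print(int(edge_flags(labels, 0, 0, 2)))  # EAST | SOUTH -> 6
print(int(edge_flags(labels, 2, 0, 2)))  # SOUTH | WEST -> 12
```

Because the flags fit in four bits, they can sit alongside the address and attribute of each node without a separate vector structure.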

A more fundamental problem lies in the orthogonal nature of the two types of query noted in the Introduction of this paper (object-oriented vs spatially oriented). In this context, Gahegan and Roberts (1988) propose a GIS in which a spatial system and an object system coexist and are loosely coupled. The object system passes requests for spatial data to the spatial system, which constructs "derived objects" that are passed back to the object system.

SUMMARY

The reason for the existence of raster, vector, and object-oriented types of geographical data structures lies in the fact that spatial and nonspatial data have to be interrelated, and it is natural to select one type of information with respect to which to carry out the fundamental structuring of the data. If this structuring is by spatial position, we obtain encodings of the raster type. There are potentially infinitely many addressing schemes for individual locations; the two most widely used (row ordering and Morton ordering) are related to each other by a bit-interleaving permutation. Based on any point (or pixel) addressing system, the data can be compressed by using a tree structure or by run-length encoding. Row-order run-length encoding seems to give the best data compression and to support a wide variety of efficient image-processing algorithms. However, geometric transformations may in some situations be performed more efficiently if the data are in quadtree format. Interconversion between row-order and Morton-order addresses could be performed efficiently using suitable hardware, and there are efficient interconversion algorithms between run-length and the corresponding tree encodings.
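The bit-interleaving permutation between the two orderings is short to state in code. The sketch below assumes a 2**bits × 2**bits image, the row-order address y·2**bits + x, and the convention that x bits occupy the even positions of the Morton address (other conventions swap x and y). The function names are hypothetical, and a per-bit loop is used for clarity rather than speed.

```python
def row_to_morton(addr: int, bits: int) -> int:
    """Row-order address -> Morton address (x bits to even positions)."""
    y, x = addr >> bits, addr & ((1 << bits) - 1)
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def morton_to_row(code: int, bits: int) -> int:
    """Morton address -> row-order address (inverse of row_to_morton)."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return (y << bits) | x


# In a 4 x 4 image, pixel (x=3, y=0) has row address 3 and Morton address 5.
print(row_to_morton(3, 2), morton_to_row(5, 2))  # 5 3
```

Since each function only moves bits, the mapping is a permutation of the addresses 0 to 4**bits - 1, and the two functions are mutual inverses.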

Vector data can be encoded into quadtrees using PM3 and other similar quadtree structures, but the overall problem of object-oriented vs spatially oriented analysis and queries may well be best tackled by coupling an object system and a spatial system.

REFERENCES

Adams, J., Patton, C., Reader, C., and Zamora, D., 1984, Fast hardware for geometric warping, in Proc. Third Australian Remote Sensing Conference, Queensland, Australia, unpaginated.

Anthony, S., and Corr, D., 1987, GIS integration study: Final report: Systems Designers Scientific, Pembroke House, Pembroke Broadway, Camberley, Surrey, 63 p.

Barnsley, M., 1988, Fractals everywhere: Academic Press, San Diego, California, 394 p.

Bell, S. B. M., Diaz, B. M., Holroyd, F. C., and Jackson, M., 1983, Spatially referenced methods of processing raster and vector data: Image and Vision Computing, v. 1, no. 4, p. 211-220.

Bell, S. B. M., and Diaz, B. M., 1986, eds., Spatial data processing using tesseral methods (collected papers from Tesseral Workshops 1 and 2): Natural Environment Research Council, Polaris House, Swindon, 425 p.

Bell, S. B. M., Diaz, B. M., and Holroyd, F. C., 1989, The HoR quadtree: an optimal structure based on a non-square 4-shape, in Brooks, S. R., ed., Mathematics in remote sensing: Inst. Mathematics and its Applications Conference Series, No. 21, Clarendon Press, Oxford, p. 315-343.

Bell, S. B. M., and Holroyd, F. C., 1991, Tesseral amalgamators and hierarchical tessellations: Image and Vision Computing, v. 9, no. 5, p. 313-328.

Bell, S. B. M., and Mason, D. C., 1990, Tesseral quaternions for the octtree: The Computer Jour., v. 33, no. 5, p. 386-397.

Blum, H., 1967, A transformation for extracting new descriptors of shape, in Wathen-Dunn, W., ed., Models for the perception of speech and visual form: MIT Press, Cambridge, Massachusetts, p. 362-380.

Burrough, P. A., 1986, Principles of geographical information systems for land resource assessment: Clarendon Press, Oxford, 193 p.

Callen, M., James, I., Mason, D. C., and Quarmby, N., 1986, A test-bed for experiments on hierarchical data models in integrated geographic information systems: Natural Environment Research Council, Polaris House, Swindon, p. 193-212.

Denham, C. M., Holroyd, F. C., and Johnson, J. H., 1986, Tessellation and pixel addressing systems for image processing: Natural Environment Research Council, Polaris House, Swindon, p. 87-98.

Dutton, G., 1990, Locational properties of triangular meshes: Proc. Fourth Intern. Symposium on Spatial Data Handling, v. 2, Zurich, p. 901-910.

Franklin, W. R., 1986, Problems with raster graphics algorithms, in Kessner, L. R. A., Peters, F. J., and van Lierop, M. L. P., eds., Data structures for raster graphics: Springer-Verlag, Berlin, p. 1-7.

Gahegan, M. N., and Roberts, S. A., 1988, An intelligent, object-oriented geographical information system: Intern. Jour. Geographical Information Systems, v. 2, no. 2, p. 101-110.

Gahegan, M. N., 1989, An efficient use of quadtrees in a geographical information system: Intern. Jour. Geographical Information Systems, v. 3, no. 3, p. 201-214.

Gargantini, I., 1982, An effective way to represent quadtrees: Communications, Association for Computing Machinery, v. 25, no. 12, p. 905-910.

Gibson, L., and Lucas, D., 1982, Vectorisation of raster images using hierarchical methods: Computer Graphics and Image Processing, v. 20, no. 1, p. 82-89.

Goodchild, M. F., 1987, A spatial analytical perspective on geographical information systems: Intern. Jour. Geographical Information Systems, v. 1, no. 4, p. 327-334.

Goodchild, M. F., and Shiren, Y., A hierarchical spatial data structure for global geographic information systems: Proc. Fourth Intern. Symposium on Spatial Data Handling, v. 2, Zurich, p. 911-918.

Grünbaum, B., and Shephard, G. C., 1977, The eighty-one types of isohedral tilings in the plane: Mathematical Proc. Cambridge Philosophical Society, v. 82, no. 2, p. 177-196.

Grünbaum, B., and Shephard, G. C., 1981, A hierarchy of classification methods for patterns: Zeitschrift der Krystallographie, v. 154, no. 3-4, p. 163-187.

Grünbaum, B., and Shephard, G. C., 1987, Tilings and patterns: W. H. Freeman, New York, 700 p.

Holroyd, F. C., 1985, Addressing systems for digital pictures (NERC remote sensing project, discussion paper no. 2): Centre for Configurational Studies, The Open University, Walton Hall, Milton Keynes, 74 p.

Holroyd, F. C., 1987, Image encoding systems and image processing algorithms (NERC remote sensing project, discussion paper no. 4): Centre for Configurational Studies, The Open University, Walton Hall, Milton Keynes, 48 p.

Holroyd, F. C., and Mason, D. C., 1990, Efficient linear quadtree construction algorithm: Image and Vision Computing, v. 8, no. 3, p. 218-224.

Ibbs, T. J., and Stevens, A., 1988, Quadtree storage of vector data: Intern. Jour. Geographical Information Systems, v. 2, no. 1, p. 43-56.

Lauzon, J. P., Mark, D. M., Kikuchi, L., and Armando Guevera, J., 1985, Two-dimensional run-encoding for quadtree representation: Computer Vision, Graphics and Image Processing, v. 30, no. 1, p. 56-69.

van Lierop, M. L. P., 1986, Geometrical transformations on pictures represented by leaf codes: Computer Vision, Graphics and Image Processing, v. 33, no. 1, p. 81-98.

Lusby-Taylor, C., 1986, A rectangular tessellation with computational and database advantages: Natural Environment Research Council, Polaris House, Swindon, p. 391-402.

Morton, G. M., 1966, A computer oriented geodetic data base and a new technique in file sequencing: IBM Canada Ltd., internal rept., not seen.

Oliver, M. A., and Wiseman, N. E., 1983, Operations on quadtree encoded images: The Computer Journal, v. 26, no. 1, p. 83-91.

Quam, L. H., 1980, A storage representation for efficient access to large multidimensional arrays: Tech. Note 220, SRI project 1009, SRI International, Menlo Park, California, 21 p.

Rosenfeld, A., and Pfaltz, J. L., 1966, Sequential operations in digital picture processing: Jour. Association for Computing Machinery, v. 13, no. 4, p. 471-494.

Rutovitz, D., 1968, Data structures for operations on digital images, in Cheng, G. C. and others, eds., Pictorial pattern recognition: Thompson Book Co., Washington, DC, p. 105-133.

Samet, H., and Webber, R. E., 1983, Using quadtrees to represent polygonal maps: Proc. Computer Vision and Pattern Recognition, No. 83, Washington, DC, p. 127-132.

Samet, H., 1984, The quadtree and related hierarchical data structures: Association for Computing Machinery Surveys, v. 16, no. 2, p. 187-260.

Samet, H., 1990a, The design and analysis of spatial data structures: Addison-Wesley, Reading, Massachusetts, 493 p.

Samet, H., 1990b, Applications of spatial data structures: computer graphics, image processing and geographical information systems: Addison-Wesley, Reading, Massachusetts, 507 p.

Smith, T. R., Menon, S., Star, J. L., and Estes, J. E., 1987, Requirements and principles for the implementation and construction of large-scale geographic information systems: Intern. Jour. Geographical Information Systems, v. 1, no. 1, p. 13-31.

Tamminen, M., 1981, The EXCELL method for efficient geometric access to data: Acta Polytechnica Scandinavica, Mathematics and Computer Science Series, No. 34, 26 p.

Wingate, W. J. G., 1988, Tilings and amalgamations: unpubl. masters thesis, The Open University, Walton Hall, Milton Keynes, 133 p.

Wise, S., 1988, Using contents addressable filestore for rapid access to a large cartographic data set: Intern. Jour. Geographical Information Systems, v. 2, no. 2, p. 111-120.