
Chapter 2 Geospatial Grids

Abstract A geospatial grid is a uniform 2D grid mapped to the earth’s surface. Because the earth is a lumpy 3D object, any 2D grid involves approximating the earth (such as to an ellipsoid) and results in distortion. A variety of map projections are available, and pointers are given on choosing the appropriate map projection to handle trade-offs in the type of distortion associated with each projection. To analyze multiple geospatial grids, it is necessary to remap them to a common 2D grid. This process, illustrated for the Lambert to cylindrical equidistant case, typically involves bilinear interpolation of input grid values. Many image processing operations, like bilinear interpolation, assume that the grid values are locally linear. This has to be verified, either informally using a perceptual color map or formally by testing the root mean square of leave-one-out linear interpolation at different distances. Often geospatial grids have to be created from nonuniform 2D arrays such as from an instrument, from vector graphics such as lines or polygons or by interpolating between point observations. Techniques to handle these cases are described.

2.1 Representation

In this book, a spatial grid will be a uniformly spaced two-dimensional array of numbers where each number corresponds to a “pixel” or grid point on the earth’s surface. The grid itself occupies an area on the earth’s surface, although we should be careful about calling it a rectangular area because the earth’s surface is curved. An example of a spatial grid – the population density in 2000 in the area around New York City – is depicted in Fig. 2.1. Note that the actual data values have been mapped to a color scale for display purposes. Although we will focus on 2D grids, many of the techniques we will consider can be easily extended to three dimensions by considering a 3D grid as a stack of 2D grids. Nonuniform grids, i.e., grids whose resolution varies throughout the domain, are harder to handle: we will assume that you will subsample or supersample the data to create a uniformly spaced grid, and so this book focuses exclusively on uniform spatial grids.

Fig. 2.1 (a) A spatial grid is a uniformly spaced two-dimensional array of numbers that covers an area of the earth’s surface. (b) Each number in the grid corresponds to a “pixel” or grid point that itself has a definite area

V. Lakshmanan, Automating the Analysis of Spatial Grids, Geotechnologies and the Environment 6, DOI 10.1007/978-94-007-4075-4_2, © Springer Science+Business Media Dordrecht 2012

In our computer implementations of automated analysis techniques (in the Java programming language), we will store geospatial data as a LatLonGrid¹:

    1  public class LatLonGrid {
    2    private int[][] data;
    3    private LatLon nwCorner;
    4    private double latRes;
    5    private double lonRes;
    6    private int missing;
    7    // etc.
    8  }

¹ LatLonGrid.java in the package edu.ou.asgbook.core

Fig. 2.2 The coordinate system used to index the pixels in a spatial grid

One thing to note (line 2) is that the data values are stored as a two-dimensional array of integers. Why integers and not doubles or floats? The reason is that we will need to do comparisons of pixel values extensively, and comparing floating-point numbers is problematic because of round-off errors. It is safer to work with integers. If necessary, floating-point numbers should be scaled (multiplied by an appropriate number and rounded off) to make them integers. At the end of analysis, if necessary, they can be unscaled to their real values. The two-dimensional array implementation feels natural because these are two-dimensional grids. However, it is not necessarily the best choice. Two-dimensional arrays in many programming languages are stored as arrays of arrays and, as such, a single spatial grid’s values could end up being stored in noncontiguous areas of a computer’s memory. This will lead to a slowdown in performance as we traverse the grid pixel by pixel. Thus, a more efficient implementation, albeit an unnatural one, would be to keep the data as a one-dimensional array and look up the data at a particular row and column not as data[row][col] but as data[row*numcols+col]. On present-day computers, this multiplication is typically faster than memory access, but you should measure this on your hardware just to be sure.
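The one-dimensional layout and the integer scaling described above can be sketched together as follows. This is only an illustration: the class name FlatIntGrid, its methods and the scale-factor convention are hypothetical and are not part of the LatLonGrid class used in this book.

```java
/** Hypothetical sketch: a grid kept in one contiguous, row-major int array. */
final class FlatIntGrid {
    private final int[] data;    // row-major: all of row 0, then all of row 1, ...
    private final int numRows;
    private final int numCols;
    private final double scale;  // assumed convention: stored integer = round(real value * scale)

    FlatIntGrid(int numRows, int numCols, double scale) {
        this.numRows = numRows;
        this.numCols = numCols;
        this.scale = scale;
        this.data = new int[numRows * numCols];
    }

    /** Position of (row, col) in the flat array, i.e. data[row*numcols+col]. */
    int index(int row, int col) {
        return row * numCols + col;
    }

    /** Scales a real value and rounds it to the nearest integer before storing it. */
    void setReal(int row, int col, double value) {
        data[index(row, col)] = (int) Math.round(value * scale);
    }

    /** The scaled integer, suitable for exact comparisons between pixels. */
    int getStored(int row, int col) {
        return data[index(row, col)];
    }

    /** Unscales the stored integer back to an approximate real value. */
    double getReal(int row, int col) {
        return data[index(row, col)] / scale;
    }

    int getNumRows() { return numRows; }
    int getNumCols() { return numCols; }
}
```

With a scale of 10, a value of 12.34 is stored as the integer 123, so one decimal place survives the round trip; the choice of scale therefore fixes the precision of all later comparisons.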

In Java and other programming languages derived from C, the first element in the two-dimensional array is (0,0). The first index is the row number and the second index is the column number. This leads to the somewhat nonintuitive coordinate system shown in Fig. 2.2. The first “axis” is not the “horizontal” axis, but the vertical axis, and the vertical axis goes down (southwards), not up. This is a right-handed coordinate system that will be familiar from matrix algebra as well. We will sometimes refer to the first coordinate as x and the second coordinate as y. Keep in mind that this is not the traditional (x, y) of a graph, but that of this matrix-like coordinate system.

The grid itself is located on the earth’s surface. Line 3 of the code listing above specifies the northwest corner of the grid. The northwest corner is the corner of the grid, not the center of its first pixel (recall that each pixel itself occupies a definite area, so this difference can be significant). The location is simply stored as a latitude-longitude pair. The latitude ranges from 90° at the north pole to -90° at the south pole, with 0° being the equator. The longitude ranges from -180° to 180°, with 0° being the Greenwich meridian and 180°/-180° representing the longitude exactly halfway around the world from the Greenwich meridian.

Fig. 2.3 Many real-world datasets have data missing in parts of the domain. (a) Global population density in interior Africa. There was a civil war raging in the area where data are missing, making it dangerous to conduct a census. (b) The terminator line on the satellite visible channel happens because the sun has already set in East Africa. (c) Ground-based radar beams can be blocked by terrain. (d) Parts of the earth’s surface beneath clouds are not sensed by Landsat

Our definition of a spatial grid as a uniform 2D grid mapped to the earth’s surface is vague. The earth is not a perfect sphere, or even a perfect ellipsoid. Instead, it is rather lumpy. So, how is the spatial grid uniform? Do the pixels all have the same lengths (in kilometers)? A quick look at Fig. 2.1 makes it clear that this is not the case – the LatLonGrid depicted is clearly wider at the bottom than at the top. Our representation is in a reference system called cylindrical equidistant or Plate Carrée (we will look at map projections shortly), a very simple geographic reference system whereby the uniform spatial grid has pixels that all subtend the same fraction of latitude and longitude. In the population density grid, for example, the pixels are all of size 0.0417° × 0.0417°. The simplicity of the cylindrical equidistant reference system has led to its being commonly used in geospatial data dissemination. Many freely available GIS tools such as Google Earth and NASA World Wind natively support reading images each of whose pixels has the same size in latitude and longitude. Lines 4 and 5 of the representation are, therefore, the latitude and longitude resolution, i.e., the size of the pixels in the latitude and longitude directions. The drawback of this coordinate system is also quite obvious – the east-west length of a pixel in kilometers is nonuniform, decreasing as one moves closer to the poles. As long as our grids are reasonably small, this may not pose a problem. But when dealing with global datasets, one must be careful. We will point out these situations as they arise.
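To get a feel for how quickly the pixel size changes, the following rough sketch estimates the size of a pixel in kilometers. It assumes a spherical earth with a mean radius of 6,371 km, which is a simplification; the class and method names are hypothetical.

```java
/** Hypothetical sketch: approximate pixel dimensions on a spherical earth. */
final class PixelSize {
    static final double EARTH_RADIUS_KM = 6371.0; // spherical approximation (assumption)

    /** East-west extent of a pixel whose center is at the given latitude. */
    static double widthKm(double latDeg, double lonResDeg) {
        return EARTH_RADIUS_KM * Math.cos(Math.toRadians(latDeg)) * Math.toRadians(lonResDeg);
    }

    /** North-south extent of a pixel; on a sphere this does not depend on latitude. */
    static double heightKm(double latResDeg) {
        return EARTH_RADIUS_KM * Math.toRadians(latResDeg);
    }
}
```

Because the width scales with the cosine of the latitude, a pixel at 60° latitude is half as wide as the same pixel at the equator, while its north-south extent is unchanged.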

Line 6 of the code listing for the LatLonGrid is for a special sentinel value that pertains to missing data. This is often needed for real-world datasets. For example, the population density spatial grid was created using survey data, and where surveys or census information were not available (see Fig. 2.3), the pixel value is encoded with a special value. This is usually an integer value that is physically unlikely, such as a negative number (-999) for population data. When doing local image processing, one should be careful not to treat this special sentinel value as real data. The data could be missing because it was not collected (as in the case of global population), because it was not possible to sense part of the domain (as in the case of the satellite terminator line in Fig. 2.3b), or because of obstruction (as in Fig. 2.3c, d). In some cases, the data itself may be coded with missing data, whereas in others, automated processing is required to detect where data would have been missing. In remotely sensed fields, missing data is often a function of the instrument’s interaction with the environment. For example, using the satellite navigation system and time of day, one can determine the location of the terminator line. Similarly, using radar siting, terrain heights and assumptions about atmospheric refraction, one can determine areas of beam blockage due to terrain. The case in Fig. 2.3d is probably the most difficult, since rather sophisticated automated processing may be needed to mask out clouds from the Landsat images.
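As an illustration of keeping the sentinel out of local computations, here is a minimal sketch of a 3 × 3 neighborhood mean that skips missing pixels. The class and method names are hypothetical, and the choice to return the sentinel when no valid neighbor exists is an assumption rather than a rule from this book.

```java
/** Hypothetical sketch: a local mean that never treats the sentinel as data. */
final class MissingAwareMean {
    /**
     * Mean of the non-missing values in the 3x3 neighborhood of (row, col).
     * Returns the sentinel itself if every pixel in the neighborhood is missing.
     */
    static int localMean(int[][] data, int row, int col, int missing) {
        long sum = 0;
        int count = 0;
        for (int r = row - 1; r <= row + 1; ++r) {
            for (int c = col - 1; c <= col + 1; ++c) {
                boolean inside = r >= 0 && r < data.length && c >= 0 && c < data[r].length;
                if (inside && data[r][c] != missing) {
                    sum += data[r][c];
                    ++count;
                }
            }
        }
        return (count == 0) ? missing : (int) Math.round((double) sum / count);
    }
}
```

Averaging the sentinel along with the real values would drag the result far from any physically plausible number, which is why the count of valid pixels, not the neighborhood size, is used as the divisor.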

Given that a pixel occupies some area, and that the northwest corner of the spatial grid is the corner of the first pixel, the location of the pixel will be defined as the location of the pixel’s center, so that:

    public LatLon getLocation(int row, int col) {
      // latitude decreases, longitude increases
      return new LatLon(nwCorner.getLat() - (row + 0.5) * latRes,
                        nwCorner.getLon() + (col + 0.5) * lonRes);
    }

Note that because the latitude decreases as the row number increases (it is 90° at the North Pole and 0° at the equator: see Fig. 2.2), the latitude of any pixel in the grid will be smaller than the latitude of the northwest corner, while the longitude of the pixel’s center will be greater than the longitude of the northwest corner. This explains why the latitude offset is subtracted, whereas the longitude offset is added. The 0.5 accounts for the difference between the corner and the center of a pixel.

There is one caveat to keep in mind about the spatial grid: the grid wraps around. In other words, the right edge of a global grid (longitude = 180) and the left edge of the grid (longitude = -180) are identical. Therefore, if we are considering a global spatial grid, we may need to explicitly handle this problem.
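One way to handle the wraparound is sketched below. The helper names are hypothetical, and the column wrap is valid only under the assumption that the grid spans the full 360° of longitude.

```java
/** Hypothetical sketch: helpers for grids that wrap around in longitude. */
final class LonWrap {
    /** Maps any longitude in degrees into the range [-180, 180). */
    static double normalizeLon(double lonDeg) {
        double shifted = (lonDeg + 180.0) % 360.0; // Java's % keeps the sign of the dividend
        if (shifted < 0) {
            shifted += 360.0;
        }
        return shifted - 180.0;
    }

    /** Wraps a column index of a global grid: column -1 becomes the last column, and so on. */
    static int wrapCol(int col, int numCols) {
        return Math.floorMod(col, numCols);
    }
}
```

A neighborhood operation on a global grid could call wrapCol on every column index it touches, so that pixels on the left edge see their true neighbors on the right edge instead of being treated as boundary pixels.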

2.1.1 Georeference

Pretty much every conceptual model you have of the earth is an approximation. It is not a sphere – it is somewhat flattened, with the radius from the center of the earth to the equator greater than the distance to the poles by about 21 km (roughly 6,378 km versus 6,357 km; see Fig. 2.4). When treated as a sphere, one works with the mean of these radii, using a value of about 6,371 km. A better approximation to the earth’s surface would be to treat the earth as an ellipsoid, to account for the flattening. It is possible to use a regional ellipsoid to get greater accuracy if one is working on a specific region of the earth, but for the earth as a whole, the World Geodetic System 1984 (WGS84; NIMA 2009) is the usual choice. It is the ellipsoid used by the Global Positioning System (GPS), for example.

Fig. 2.4 The earth can be approximated by an ellipsoid

There are two ways of measuring latitude. Geocentric latitude is the angle, measured at the center of the earth, between the equatorial plane and the line to the point, as one would do on a perfect sphere. Geodetic latitude is the angle between the equatorial plane and the normal to the ellipsoid at the point. Because most latitudes are reported by GPS units, it is usually safe to assume that, unless explicitly stated otherwise, latitudes are geodetic latitudes.

Given that we are using WGS84 and geodetic latitudes, the distance in kilometers between a LatLon point on the earth’s surface and another LatLon point should be computed using the ellipsoidal approximation. This can be done by finding the mean latitude of the two points, finding the effective earth radius at the mean latitude and using trigonometry to find the length of the arc connecting the two points (Snyder 1987):

    public double distanceInKms(LatLon other) {
      double lat1 = Math.toRadians(this.lat);
      double lat2 = Math.toRadians(other.lat);
      double lon1 = Math.toRadians(this.lon);
      double lon2 = Math.toRadians(other.lon);

      double lat0 = (lat2 + lat1) / 2; // mean latitude of the two points
      double a = 6378.137; // WGS-84 semi-major axis, in km
      double f = 1.0 / 298.257223563; // WGS-84 flattening
      double esq = f * (2 - f); // eccentricity squared
      // R is the effective earth radius at the mean latitude; sq(x) returns x*x.
      // The sine is squared before being scaled by esq, so that R is the same
      // in both hemispheres.
      double R = a * (1 - esq) / Math.pow(1 - esq * sq(Math.sin(lat0)), 1.5);

      double dlon = lon2 - lon1;
      double dlat = lat2 - lat1;
      double term = sq(Math.sin(dlat / 2)) +
          Math.cos(lat1) * Math.cos(lat2) * sq(Math.sin(dlon / 2));
      return (2 * R * Math.asin(Math.min(1, Math.sqrt(term))));
    }

This formula is also often called the great circle distance – the shortest path between two points on the earth’s surface is an arc that connects the two points, not necessarily one that is parallel to the latitude lines.


It should be noted that this distance does not take into account topography. The WGS84 ellipsoid was fitted to the mean sea level (MSL), which itself is not constant throughout the globe but varies due to gravity variations. If height is important to you, adapt this equation to make the location a 3D location (latitude, longitude, height above MSL) and use the Euclidean distance that takes into account the 2D distance as computed above and the distance based on differences in MSL + terrain height between the two points. The differences in MSL are within 110 m for the WGS84 geoid, so if a difference of this magnitude is not critical in your application, you could ignore the difference in MSL. In this book, we will assume (for simplicity) that neither difference – in MSL or in topography – matters.
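The distance computation and its height-aware adaptation can be tried out with the self-contained sketch below. It takes plain degrees instead of LatLon objects, uses the meridional radius of curvature at the mean latitude as the effective radius, and treats the height adjustment as a simple right-triangle combination; the class and method names are hypothetical.

```java
/** Hypothetical sketch: ellipsoid-aware surface distance plus an optional height term. */
final class EllipsoidDistance {
    static final double A_KM = 6378.137;          // WGS84 semi-major axis
    static final double F = 1.0 / 298.257223563;  // WGS84 flattening
    static final double ESQ = F * (2 - F);        // eccentricity squared

    static double sq(double x) {
        return x * x;
    }

    /** Haversine arc length in km, using the effective radius at the mean latitude. */
    static double distanceInKms(double latDeg1, double lonDeg1, double latDeg2, double lonDeg2) {
        double lat1 = Math.toRadians(latDeg1);
        double lat2 = Math.toRadians(latDeg2);
        double lon1 = Math.toRadians(lonDeg1);
        double lon2 = Math.toRadians(lonDeg2);
        double lat0 = (lat1 + lat2) / 2;
        double r = A_KM * (1 - ESQ) / Math.pow(1 - ESQ * sq(Math.sin(lat0)), 1.5);
        double term = sq(Math.sin((lat2 - lat1) / 2))
                + Math.cos(lat1) * Math.cos(lat2) * sq(Math.sin((lon2 - lon1) / 2));
        return 2 * r * Math.asin(Math.min(1, Math.sqrt(term)));
    }

    /** Euclidean combination of a surface distance and a height difference, all in km. */
    static double distance3dKms(double surfaceKm, double heightKm1, double heightKm2) {
        return Math.hypot(surfaceKm, heightKm2 - heightKm1);
    }
}
```

For two points a few kilometers apart on steep terrain the height term can matter; for points hundreds of kilometers apart it is negligible next to the surface distance.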

2.1.2 Map Projections

In order to carry out automated analysis on multiple datasets, it is necessary for all of them to have the same coordinate system. Because the earth is lumpy and only approximately an ellipsoid, any two-dimensional coordinate system we choose comes with trade-offs.

The equal latitude-longitude (also known as “cylindrical equidistant,” “equirectangular” or “Plate Carrée”) representation introduced in the previous section has the advantage of simplicity (see Fig. 2.5). Most point or vector data are provided in the form of latitude and longitude, making the registering of such data to a raster grid straightforward. The disadvantage of the cylindrical equidistant representation is that (see Fig. 2.1) pixels of a uniform grid in this coordinate system do not have the same size. In terms of the area of the earth’s surface that is covered, pixels closer to the poles are smaller than pixels closer to the equator. Distance calculations are also complicated, since one has to compute a great circle distance – it is not the case that a straight line connecting two pixels on the grid is the shortest path between them.

Consequently, we might want to carry out our processing in a coordinate system where the axes are lengths, rather than angles. Such coordinate systems, called Cartesian coordinate systems or planar coordinate systems, have the advantage of following Euclidean geometry – lines are the shortest path between points, and length and area measures are accurate. In order to obtain a grid on a flat, two-dimensional surface, it is necessary to project the three-dimensional surface of the earth on to a two-dimensional plane. This transformation of a 3D surface to a 2D one is called a map projection.

Mapping a 3D surface to a 2D one, regardless of the technique used, results in distortion. Different transformations result in different distortions, and the underlying trade-offs might lead you to choose one method over another. Some projections (e.g., Mercator) preserve direction and are commonly used for navigational maps where it is important that a ship’s bearing is right. Other projections (e.g., Lambert Conformal Conic) preserve angles locally, but distort direction, length and area. The distortion increases away from standard parallels (see Fig. 2.6). Hence, conformal projections are used mainly for grids covering small countries. Still other projections (e.g., Albers Equal-Area) preserve area at the cost of distorting angles, directions and distances. These can be used for continent-scale maps where it is essential that relative areas are preserved across large distances. If minimizing the distortion in direction, shape (angle) or area is critical to your application, you should define your spatial grid in the appropriate projection rather than in the equal-latitude-longitude reference system as we did in the previous section.

Fig. 2.5 We use the Plate Carrée representation pretty much for the reasons mentioned in this cartoon by Randall Munroe (The complete cartoon may be found at xkcd.com)

In addition to the cylindrical equidistant reference system, there is another widely used reference system that is not quite a map projection. This is called Universal Transverse Mercator or UTM. UTM is defined on the basis of the transverse Mercator projection, which preserves angles and minimizes distortion of area, distance and direction around a central meridian. Thus, the UTM is a set of zones and offsets where the central meridian is defined for each zone. This reference system has the advantage over the cylindrical equidistant in that the coordinate system is in meters rather than in degrees. However, UTM maps are also quite small and are useful only if your analysis domain is quite small, on the order of 100 km on each side.

Fig. 2.6 Distortion in the Lambert conformal projection is minimal near the standard parallels (Adapted from Snyder (1987))

Table 2.1 Suggested projections (Adapted from Snyder (1987))

    Domain aspect   Domain location   Preserve   Suggested projection
    East-west       Equator           Shape      Mercator
    East-west       Midlatitudes      Area       Cylindrical equal area
    North-south     any               Shape      Transverse Mercator
    North-south     any               Area       Transverse cylindrical equal area
    Square          Poles             Shape      Polar stereographic
    Square          Poles             Area       Polar Lambert Azimuthal equal area
    Square          Equator           Shape      Equatorial stereographic
    Square          Equator           Area       Equatorial Lambert Azimuthal equal area
    Square          Midlatitudes      Shape      Oblique stereographic
    Square          Midlatitudes      Area       Oblique Lambert Azimuthal equal area

Use the cylindrical equidistant for global datasets where it is not essential to minimize distortion of length, area, angle or direction. For global datasets where distortion needs to be minimized, the Robinson projection (long used by the National Geographic Society) or the Miller Cylindrical are good choices. However, if the usage is for navigation, Mercator is a good choice. If the usage is to illustrate great circle routes (such as for aircraft tracking), a gnomonic projection is best. For datasets that cover small areas (such as a single urban area), use the UTM reference system. For continental scale studies, choose the projection based on several factors: whether the domain is predominantly east-west or north-south, whether or not the domain is close to the equator and whether you seek to minimize distortion in area or in shape. Projections suggested by Snyder (1987) in each of these situations are shown in Table 2.1. See also Fig. 2.7.

Fig. 2.7 Different projections involve mapping the earth’s surface to different 2D surfaces. Cylindrical projections have low distortion near the equator; conical projections have low distortion at midlatitudes and planar or stereographic projections are used near the poles

For simplicity, we will describe and illustrate various techniques on a cylindrical equidistant grid in this book. Therefore, you will very likely have to adapt the implementation to work on the projection that best fits your application.

2.1.3 Going from One Projection to Another

Having a comprehensive knowledge of map projections is not critical to working successfully with geographic data. It is enough to realize that all projections are mathematical transformations and result in distortion. Because the transformations are mathematical in nature, it is possible to invert them successfully, although in some cases a closed-form formula does not exist, and you have to use numerical methods to converge on a solution. The canonical reference for such transformations is a USGS technical document by Snyder (1987).

An example of converting map projections is illustrative of the general process. Continental scale numerical weather forecasts are typically disseminated in “Lambert2SP,” Lambert Conformal Conic with two standard parallels. Consider that the description of surface albedo from a weather forecast provides the following information

    ELLIPSOID WGS-84
    PROJECTION LAMBERT2SP
    TRUELAT1 30.
    TRUELAT2 60.
    CEN_LAT 38.00001
    CEN_LON -92.5
    DELTA_EW 4000.
    DELTA_NS 4000.
    NROWS 749
    NCOLS 979

and you wish to convert the spatial grid into a cylindrical equidistant coordinate system. In this case, the two standard parallels are at 30 and 60 (degrees latitude) while the pixels are 4,000 × 4,000 m. The grid is centered at (38.00001, -92.5). How does one remap this spatial grid into a LatLonGrid?

Fig. 2.8 When mapping points from one grid to another, the mapping has to be carried out using inverse formulas: for every output grid point, find the corresponding point(s) in the input grid

The necessary mathematical formulas to compute the location (latitude, longitude) of any (x, y) in the Lambert coordinate system can be obtained from Snyder (1987), but what we actually require is the inverse. Given the (lat, lon), we would like to determine the (x, y). The reason is that even though we are given data on a spatial grid in the (x, y) coordinate system, we would like to fill out a raster grid that is uniform in latitude and longitude. This is a key point – for every point of the output grid, one needs to find the appropriate value from the input grid. Performing the mapping using forward formulas, i.e., pushing each (x, y) input point to the (lat, lon) grid point it lands on, will leave some (lat, lon) grid points unfilled, and may send several input points to the same output grid point. This is shown pictorially in Fig. 2.8. Note that there are four output grid points that get their value from the one input grid point. If we had carried out a forward mapping, three of the output grid points would have been unfilled, resulting in holes in the output grid.
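The difference between the two directions can be seen in a toy setting that has nothing to do with Lambert grids: an output grid with twice the resolution of the input grid. The sketch below is purely illustrative; the class name, the HOLE marker and the factor of two are all assumptions made for the example.

```java
/** Hypothetical sketch: forward versus inverse mapping onto a grid that is twice as fine. */
final class MappingDirection {
    static final int HOLE = -1; // marks output pixels that were never assigned a value

    /** Forward mapping: push each input pixel to the one output pixel it lands on. */
    static int[][] forward(int[][] in) {
        int[][] out = new int[in.length * 2][in[0].length * 2];
        for (int[] row : out) {
            java.util.Arrays.fill(row, HOLE);
        }
        for (int r = 0; r < in.length; ++r) {
            for (int c = 0; c < in[r].length; ++c) {
                out[2 * r][2 * c] = in[r][c];
            }
        }
        return out;
    }

    /** Inverse mapping: for each output pixel, pull the value of the input pixel it falls in. */
    static int[][] inverse(int[][] in) {
        int[][] out = new int[in.length * 2][in[0].length * 2];
        for (int i = 0; i < out.length; ++i) {
            for (int j = 0; j < out[i].length; ++j) {
                out[i][j] = in[i / 2][j / 2];
            }
        }
        return out;
    }

    static int countHoles(int[][] grid) {
        int holes = 0;
        for (int[] row : grid) {
            for (int v : row) {
                if (v == HOLE) {
                    ++holes;
                }
            }
        }
        return holes;
    }
}
```

For a 2 × 2 input, the forward version fills only 4 of the 16 output pixels, leaving three holes for every input pixel, whereas the inverse version fills every output pixel by construction.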

Consequently, we need the inverse mapping formulas, which are also conveniently provided by Snyder (1987, p. 109) and involve solving an equation numerically, i.e., using an iterative process. Obtain Snyder (1987) and follow along as you read the code listing below. While your grids may not be Lambert Conformal, the process to map your data to a LatLonGrid (or map other grids to your choice of projection) will be similar.

The first step is to code up the inverse mapping formulas. In the case of a Lambert grid, each grid point has a coordinate that is given by:

    public class LambertConformal2SP {
      public static class Coord {
        public final double northing, easting;
        public Coord(double northing, double easting) {
          this.northing = northing;
          this.easting = easting;
        }
      }
    }

The northing and easting are the coordinates and are in meters.

Given the description of the input grid, it is possible to precompute the properties of the projection:

    public LambertConformal2SP(Ellipsoid ellipsoid, LatLon falseOriginLl,
        double lat1, double lat2, Coord falseOriginLam) {
      this.ellipsoid = ellipsoid;
      this.falseorigin_ll = falseOriginLl;
      this.lat1 = lat1;
      this.lat2 = lat2;
      this.falseorigin_lam = falseOriginLam;

      this.e = Math.sqrt(ellipsoid.eccsq);
      double phi1 = Math.toRadians(this.lat1);
      double phi2 = Math.toRadians(this.lat2);
      double t1 = compute_t(e, phi1);
      double t2 = compute_t(e, phi2);
      double m1 = compute_m(e, phi1);
      double m2 = compute_m(e, phi2);

      this.n = (Math.log(m1) - Math.log(m2)) / (Math.log(t1) - Math.log(t2));
      this.F = m1 / (n * Math.pow(t1, n));

      double phiF = Math.toRadians(falseorigin_ll.getLat());
      double tF = compute_t(e, phiF);
      this.rF = ellipsoid.eqr * F * Math.pow(tF, n);
    }
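The constructor relies on two helpers, compute_t and compute_m, that are not listed here. The sketch below shows one plausible form for them, based on the standard ellipsoidal Lambert Conformal Conic definitions of t and m in Snyder (1987). The class name, the camel-case method names and the coneConstant convenience method are hypothetical stand-ins, not the book's actual code.

```java
/** Hypothetical sketch: the auxiliary quantities t and m of the ellipsoidal Lambert projection. */
final class LambertHelpers {
    /** t = tan(pi/4 - phi/2) / ((1 - e sin phi) / (1 + e sin phi))^(e/2), with phi in radians. */
    static double computeT(double e, double phi) {
        double esin = e * Math.sin(phi);
        return Math.tan(Math.PI / 4 - phi / 2) / Math.pow((1 - esin) / (1 + esin), e / 2);
    }

    /** m = cos phi / sqrt(1 - e^2 sin^2 phi), with phi in radians. */
    static double computeM(double e, double phi) {
        double esin = e * Math.sin(phi);
        return Math.cos(phi) / Math.sqrt(1 - esin * esin);
    }

    /** Cone constant n for two standard parallels in degrees, as in the constructor above. */
    static double coneConstant(double e, double lat1Deg, double lat2Deg) {
        double phi1 = Math.toRadians(lat1Deg);
        double phi2 = Math.toRadians(lat2Deg);
        return (Math.log(computeM(e, phi1)) - Math.log(computeM(e, phi2)))
                / (Math.log(computeT(e, phi1)) - Math.log(computeT(e, phi2)));
    }
}
```

A quick sanity check is that both t and m equal 1 at the equator, and that the cone constant for standard parallels at 30° and 60° lies between sin 30° and sin 60°.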

Then, given any location in latitude and longitude, it is possible to obtain the location in the Lambert projection:

    public Coord getLambert(LatLon in) {
      double phi = Math.toRadians(in.getLat());
      double t = compute_t(e, phi);
      double r = ellipsoid.eqr * F * Math.pow(t, n);
      double lambda = Math.toRadians(in.getLon());
      double lambdaF = Math.toRadians(falseorigin_ll.getLon());
      double theta = n * (lambda - lambdaF);

      double easting = falseorigin_lam.easting + r * Math.sin(theta);
      double northing = falseorigin_lam.northing + rF - r * Math.cos(theta);
      return new Coord(northing, easting);
    }

This provides the Lambert coordinate of a LatLon point. Now, we can simply find the closest input grid point to this Lambert coordinate and assign its value to the LatLonGrid. Do this for every LatLon grid point and one has a 2D array of values that can be used to populate a LatLonGrid:

    int[][] lamdata = ...; // input data in lambert projection
    LambertConformal2SP proj = ...; // projection
    int[][] lldata = new int[outrows][outcols]; // output latlon grid data
    for (int i = 0; i < outrows; ++i) {
      double lat = maxlat - i * latres;
      for (int j = 0; j < outcols; ++j) {
        double lon = minlon + j * lonres;
        LambertConformal2SP.Coord lam = proj.getLambert(new LatLon(lat, lon));
        double rowno = (0 - lam.northing) / northres;
        double colno = (lam.easting - 0) / eastres;
        lldata[i][j] = Remapper.nearestNeighbor(rowno, colno, lamdata, missing);
      }
    }

Finding the closest grid point involves rounding off to obtain the row and col values. This is called a “nearest neighbor” mapping and might suffice in many instances.

    public static int nearestNeighbor(double rowno, double colno,
        int[][] input, int missing) {
      final int row = (int) Math.round(rowno);
      final int col = (int) Math.round(colno);
      final int nrows = input.length;
      final int ncols = (nrows > 0) ? input[0].length : 0;
      if (row >= 0 && col >= 0 && row < nrows && col < ncols) {
        return input[row][col];
      } else {
        return missing;
      }
    }

However, where the input and output resolutions are drastically different, this results in a pixelated grid. In such cases, it is preferable to interpolate the input grid when remapping it to a new coordinate system. The basic idea with bilinear interpolation is to find the four grid points that bracket a given Lambert coordinate. If, for example, the rowno is 6.2, then the values of the grid points at row = 6 (the “floor”) and row = 7 (the “ceil”) should be interpolated with weights of 0.8 and 0.2, respectively (see Fig. 2.9).

Fig. 2.9 Bilinear interpolation involves interpolating between the four input grid points that bracket the mapped output point

Care should be taken to account for the situation where the point is exactly at theLambert row or column. Also, only nonmissing points should be interpolated.

    public static int bilinearInterpolation(double rowno, double colno, int[][] input, int missing) {
        final int row0 = (int) Math.floor(rowno);
        final int col0 = (int) Math.floor(colno);
        final int row1 = (int) Math.ceil(rowno);
        final int col1 = (int) Math.ceil(colno);
        final int nrows = input.length;
        final int ncols = (nrows > 0) ? input[0].length : 0;

        int npts = 0;
        double totwt = 0;
        double totval = 0;
        for (int row = row0; row <= row1; ++row) {
            for (int col = col0; col <= col1; ++col) {
                if (row >= 0 && col >= 0 && row < nrows && col < ncols
                        && input[row][col] != missing) {
                    double rowwt = 1 - Math.abs(rowno - row);
                    double colwt = 1 - Math.abs(colno - col);
                    double wt = rowwt * colwt;
                    npts++;
                    totwt += wt;
                    totval += wt * input[row][col];
                }
            }
        }

        // weighted average
        if (npts == 0) {
            return missing;
        } else {
            return (int) Math.round(totval / totwt);
        }
    }
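As a quick check of the weighting scheme, the following sketch applies the same floor/ceil weighting to a tiny 2 × 2 grid. The class name BilinearDemo, the sample values and the missing flag of -999 are assumptions made for illustration only, and the sketch returns a double rather than a rounded int so that the weights are easier to inspect.

```java
final class BilinearDemo {
    /** Weighted average of the (up to four) nonmissing grid points that bracket (rowno, colno). */
    static double interpolate(double rowno, double colno, int[][] input, int missing) {
        int row0 = (int) Math.floor(rowno);
        int row1 = (int) Math.ceil(rowno);
        int col0 = (int) Math.floor(colno);
        int col1 = (int) Math.ceil(colno);
        int nrows = input.length;
        int ncols = (nrows > 0) ? input[0].length : 0;
        double totwt = 0;
        double totval = 0;
        for (int row = row0; row <= row1; ++row) {
            for (int col = col0; col <= col1; ++col) {
                if (row >= 0 && col >= 0 && row < nrows && col < ncols
                        && input[row][col] != missing) {
                    // weight falls off linearly with distance in each direction
                    double wt = (1 - Math.abs(rowno - row)) * (1 - Math.abs(colno - col));
                    totwt += wt;
                    totval += wt * input[row][col];
                }
            }
        }
        return (totwt > 0) ? totval / totwt : missing;
    }

    public static void main(String[] args) {
        int[][] grid = { {0, 10}, {20, 30} };
        // halfway between all four points: each point gets a weight of 0.25
        System.out.println(interpolate(0.5, 0.5, grid, -999));
        // exactly on column 0, a fifth of the way from row 0 to row 1
        System.out.println(interpolate(0.2, 0.0, grid, -999));
    }
}
```

When one of the four bracketing points is missing, the remaining weights are renormalized by their sum, so the result is still a weighted average of the valid neighbors.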

One issue with bilinear interpolation is that it provides only a piecewise linear output grid. If your data are smoother, a higher-order interpolation technique such as a spline deals better with extreme values, especially if the input sampling happened to miss them. A Catmull-Rom spline, which uses only four values and passes exactly through the grid points, is a particularly good choice (see Sect. 2.5.3 for details).
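To give a feel for the difference, here is a minimal one-dimensional sketch of the standard uniform Catmull-Rom formula. It is not the implementation of Sect. 2.5.3; the class and method names are hypothetical, and the four samples are assumed to be equally spaced.

```java
final class CatmullRomDemo {
    /**
     * Interpolates between p1 (t = 0) and p2 (t = 1) using the
     * neighboring samples p0 and p3 to shape the curve.
     */
    static double interpolate(double p0, double p1, double p2, double p3, double t) {
        double t2 = t * t;
        double t3 = t2 * t;
        return 0.5 * ((2 * p1)
                + (-p0 + p2) * t
                + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t2
                + (-p0 + 3 * p1 - 3 * p2 + p3) * t3);
    }

    public static void main(String[] args) {
        // samples 0, 10, 10, 0 straddle a peak that the sampling missed
        System.out.println(interpolate(0, 10, 10, 0, 0.5));
    }
}
```

For the samples 0, 10, 10, 0, linear interpolation halfway between the two middle samples can only return 10, whereas the spline returns a value above 10, which is the sense in which it can recover an extreme that the sampling missed. At t = 0 and t = 1 the curve passes exactly through p1 and p2.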

2.2 Linearity of Data Values

Interpolating data values when remapping data from one coordinate system to another implicitly assumes that the data are locally linearly varying. This is an assumption that underlies nearly all of the techniques that we will consider in this book. Smoothing, clustering, etc. all implicitly assume that the data are locally linear.

It is a good idea, when starting to work with a new or unfamiliar dataset, to check whether this common assumption, which underlies many image processing algorithms, actually holds for your data. How can you verify linearity? We will consider two ways: one that is rather informal but works quite well in practice, and another that is formal and allows you to compare various transformations of the data in a quantitative manner.

The simplest thing to do is to explore how the dataset is displayed when it is used interactively. In order to display 2D grids, a color map needs to be chosen to map the data values to colors. In many instances, the choices made for this mapping can illuminate the nature of the data. Take, for example, the population density dataset of Fig. 1.6a. Looking at the poster created from this dataset, it is clear from the legend of the figure that the interesting ranges are 0, 1–4, 5–24, 25–249, 250–999 and 1,000+. At least at a global scale, then, the data values are definitely not linear. Because the ranges increase by half an order of magnitude from one level to the next, the data are most likely logarithmic. Therefore, instead of reading the population density grid and using the pixel values directly, we should probably take the logarithm of the population density values. Then, we will be able to carry out most of the image processing techniques that we will talk about in this book. Carrying out the image processing techniques directly on the pixel data is likely to be suboptimal. This population density dataset is one that we will use extensively in this book to illustrate a variety of spatial analysis techniques.²
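Taking the logarithm of a grid is a one-pass operation. The sketch below is one possible way to do it; the class name, the use of NaN to mark missing pixels in the output and the clamping of values below 1 to a logarithm of 0 are all assumptions made for illustration.

```java
final class LogTransformDemo {
    /** Returns log10 of every nonmissing pixel; missing pixels become NaN. */
    static double[][] logTransform(int[][] data, int missing) {
        double[][] out = new double[data.length][];
        for (int i = 0; i < data.length; ++i) {
            out[i] = new double[data[i].length];
            for (int j = 0; j < data[i].length; ++j) {
                int value = data[i][j];
                if (value == missing) {
                    out[i][j] = Double.NaN;
                } else {
                    // densities of 0 (or below 1) are clamped so that the log is 0
                    out[i][j] = Math.log10(Math.max(value, 1));
                }
            }
        }
        return out;
    }
}
```

With this transformation, the legend ranges quoted above (1–4, 5–24, 25–249, 250–999) become intervals of roughly comparable width, which is what the later image processing steps implicitly expect.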

² The data set is included along with the sample code with the kind permission of CIESIN and Columbia University.


Fig. 2.10 A uniformly varying grid is shown using three different color maps. (a) The rainbow color map obscures midlevel values. (b) The human eye is much more sensitive to variations in low intensities. (c) A “perceptual” color map is specially designed with the human visual system in mind

It is a good strategy to look at color maps that are in use in a domain that heavily uses the data. Thus, if we know that human intelligence analysts look at satellite imagery to search for new nuclear power plants, we should ask them about the color maps used to “enhance” the data so that they can readily identify these units. The range and spacing of such a color map are likely to provide useful information about how to transform the raw satellite data into a spatial field suitable for automated pattern recognition of nuclear power plants. The reason that this works is that most custom display software makes it easy to configure color maps, and color maps are one of the first things that power users of data tweak. Hence, color maps that have withstood years of use are usually well tuned to the task at hand.

However, using a color map meant for human visualization of the data is only an informal method of ensuring that the data are suitable for image processing. We do need to test the linearity of the data in a more formal manner. This is because color maps are tuned to the human visual system, and the human visual system differs from mostly mathematical operations in some crucial ways. The human eye is much more sensitive to variations in low intensities. In the Red-Green-Blue color scale commonly used in computers, the lowest intensity, black, is represented by the hexadecimal number 000000, whereas the highest intensity, white, is represented by the hexadecimal number ffffff. Thus, it is likely that a color map meant for humans will have most of the interesting data closer to the dark end of the scale, which corresponds to small numbers. Most image processing operations, on the other hand, are sensitive to variations in large numbers. Variations at low intensities are usually considered “noise”; it is variations at high intensities that image processing operations pick up on. Therefore, what the human eye observes will be difficult for a computer algorithm to pick up on. Conversely, the computer algorithm will pick up all sorts of things that appear spurious to a human.

A similar problem of mismatch between the human eye and computers exists with the ubiquitous “rainbow” color map. The rainbow color map is hard for the human eye to process. Differences in color are hard to see. The same set of numbers is uniformly mapped using three different color maps in Fig. 2.10. In which of these is the range of the data most obvious?


2.2.1 Perceptual Color Maps

The color map used in Fig. 2.10c is a perceptual color map. A perceptual color map is designed so that equal variations in data values are perceived (by the human eye) as equal steps in the data representation (Moreland 2009). The gray-scale color map and the rainbow color map emphatically fail in this regard (Borland and Taylor 2007). The following code creates the 256-level perceptual “cool-to-warm” color map shown in Fig. 2.10. It is based on a color map created for the visualization program Paraview, which itself is an adaptation of work by Cindy Brewer (see http://www.colorbrewer.org/). Scale your data to a linear range of 0–255 and use this color map to verify that your data are, indeed, linear and that the features of interest do stand out.

    // needs: import java.awt.image.IndexColorModel;
    public IndexColorModel createCoolToWarmColormap() {
        byte[] red = new byte[256];
        byte[] green = new byte[red.length];
        byte[] blue = new byte[red.length];
        byte[] alpha = new byte[red.length];

        interpolate(red, green, blue, 0, 25, 0.0196078, 0.188235, 0.380392, 0.129412, 0.4, 0.67451);
        interpolate(red, green, blue, 25, 51, 0.129412, 0.4, 0.67451, 0.262745, 0.576471, 0.764706);
        interpolate(red, green, blue, 51, 76, 0.262745, 0.576471, 0.764706, 0.572549, 0.772549, 0.870588);
        interpolate(red, green, blue, 76, 102, 0.572549, 0.772549, 0.870588, 0.819608, 0.898039, 0.941176);
        interpolate(red, green, blue, 102, 127, 0.819608, 0.898039, 0.941176, 0.968627, 0.968627, 0.968627);
        interpolate(red, green, blue, 127, 153, 0.968627, 0.968627, 0.968627, 0.992157, 0.858824, 0.780392);
        interpolate(red, green, blue, 153, 178, 0.992157, 0.858824, 0.780392, 0.956863, 0.647059, 0.509804);
        interpolate(red, green, blue, 178, 204, 0.956863, 0.647059, 0.509804, 0.839216, 0.376471, 0.301961);
        interpolate(red, green, blue, 204, 229, 0.839216, 0.376471, 0.301961, 0.698039, 0.0941176, 0.168627);
        interpolate(red, green, blue, 229, 256, 0.698039, 0.0941176, 0.168627, 0.403922, 0, 0.121569);

        // index 0 is fully transparent; all other levels are mostly opaque
        alpha[0] = 0;
        for (int i = 1; i < alpha.length; ++i) {
            alpha[i] = (byte) 0xc8; // explicit cast: 0xc8 does not fit in a signed byte
        }

        IndexColorModel colormap = new IndexColorModel(16, red.length, red, green, blue, alpha);
        return colormap;
    }


Fig. 2.11 (a) Visualizing the population density image using a rainbow color map might lull you into believing that extracting city boundaries will be easy. (b) Visualizing the same image using a perceptual color map shows why an automated algorithm will require a bit of effort

The interpolate function interpolates between the designed colors in the appropriate range:

    // The array parameters are ordered red, green, blue to match the calls
    // in createCoolToWarmColormap().
    void interpolate(byte[] red, byte[] green, byte[] blue, int start, int end,
            double r1, double g1, double b1, double r2, double g2, double b2) {
        for (int i = start; i < end; ++i) {
            double frac = (i - start) / ((double) (end - start));
            long r = Math.round(255 * (r1 + frac * (r2 - r1)));
            long g = Math.round(255 * (g1 + frac * (g2 - g1)));
            long b = Math.round(255 * (b1 + frac * (b2 - b1)));
            if (r < 0) r = 0;
            if (g < 0) g = 0;
            if (b < 0) b = 0;
            if (r > 255) r = 255;
            if (g > 255) g = 255;
            if (b > 255) b = 255;
            red[i] = (byte) r;
            green[i] = (byte) g;
            blue[i] = (byte) b;
        }
    }

It is highly recommended that you use perceptual color maps to look at your data when planning a set of analysis steps. In an image displayed with a perceptual color map, the noise level is a pretty good indication of the problems that you will face when applying automated techniques to the data. For example, if we were to apply a rainbow color map to a portion of the population density data covering the Eastern seaboard of the United States, we would see only one city (see Fig. 2.11). This might lull us into believing that it is easy to extract the city boundaries from the image. On the other hand, applying a perceptual color map to the raw data quickly shows that extracting city boundaries will not be that easy a task.
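Before a 256-level color map can be applied, the data values have to be turned into color indices. The following sketch shows one way to auto-scale a grid linearly. The class and method names are hypothetical, and reserving index 0 for missing pixels is an assumption suggested by the transparent first entry of the color map listing, not something the text prescribes.

```java
final class ColorIndexDemo {
    /** Linearly maps nonmissing values to indices 1..255; missing pixels map to index 0. */
    static int[][] toColorIndex(int[][] data, int missing) {
        // first pass: find the range of the valid data
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int[] row : data) {
            for (int value : row) {
                if (value != missing) {
                    min = Math.min(min, value);
                    max = Math.max(max, value);
                }
            }
        }
        // second pass: scale each valid value into 1..255
        int[][] index = new int[data.length][];
        for (int i = 0; i < data.length; ++i) {
            index[i] = new int[data[i].length];
            for (int j = 0; j < data[i].length; ++j) {
                int value = data[i][j];
                if (value == missing) {
                    index[i][j] = 0;
                } else if (max == min) {
                    index[i][j] = 1; // constant grid: avoid dividing by zero
                } else {
                    double frac = ((double) value - min) / ((double) max - min);
                    index[i][j] = 1 + (int) Math.round(254 * frac);
                }
            }
        }
        return index;
    }
}
```

Each index can then be looked up in the color model, for instance with the getRGB(int) method of java.awt.image.IndexColorModel, to obtain the display color of the pixel.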


Fig. 2.12 Carrying out a log transform of the data brings out details that were obscured in the raw data. (a) Raw data. (b) Log transformed

Fig. 2.13 Local linearity can be verified by selecting valid triads and checking whether the value linearly interpolated from the bracketing points is close to the value of the center pixel

Conversely, if you do not see enough detail in the image when you apply an auto-scaled perceptual color map to the data, then it is an indication that you need to transform your data first, preferably using a transformation function that approximates the color map that enables you to see something. In the case of the population density data set, this would lead us to realize that a logarithmic transformation of the data is needed (see Fig. 2.12). This may or may not be a good thing: perhaps you do not want this much detail, and your analysis requires only the amount of detail that is obvious in the raw data.

2.2.2 Verifying Linearity

While visualizing the data with a perceptual color map can help guard against inappropriate automated analysis of the data, it is preferable to formally verify that the data are locally linear. This can be done following the technique described in Lakshmanan (2012).

The idea is to take triads of pixels. Each triad consists of a center pixel and two pixels that bracket it (see Fig. 2.13) at distance D. The two bracketing pixel values ($x_{-D}$ and $x_{D}$) are linearly interpolated. The difference between the interpolated value and the actual value at the center ($x_0$) is the error of the interpolation.

If we wish to determine whether taking the logarithm of the raw population density makes the data more linear, we find out whether the interpolation error is lower when the raw values are interpolated or when the logarithms of the values are interpolated. Of course, in either case, the error should be measured in the same units, so we could decide to measure the error in the raw data units. Mathematically, we seek a transformation $f(x)$ such that

$$e_{MSE} = \sqrt{\sum \left( f^{-1}\!\left( \frac{f(x_{-D}) + f(x_{D})}{2} \right) - x_0 \right)^2} \qquad (2.1)$$

is small. In the above formula, $f^{-1}(x)$ is the inverse function. For example, if $f(x)$ is $x^2$, then $f^{-1}(x)$ would be $\sqrt{x}$. The summation is carried out over all possible triads, which we can find by marching through the image row-by-row and column-by-column. There are many potential choices for the transformation function $f(x)$, and this technique does not provide a way to find the best one. What it does is allow you to compare two possible transformations by finding out, on a sample data set, which transformation results in the lower error.
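A single triad makes the comparison concrete. In the sketch below, the class name and the sample values (10, 100, 1000) are assumptions chosen for illustration; the two methods compute the term inside the summation for the identity transformation and for a base-10 logarithm, with the error expressed in raw data units in both cases.

```java
final class TriadErrorDemo {
    /** Error when the bracketing values are averaged directly (f is the identity). */
    static double rawError(double xMinusD, double x0, double xPlusD) {
        return (xMinusD + xPlusD) / 2 - x0;
    }

    /** Error when the bracketing values are averaged in log10 space and mapped back. */
    static double logError(double xMinusD, double x0, double xPlusD) {
        double meanOfLogs = (Math.log10(xMinusD) + Math.log10(xPlusD)) / 2;
        return Math.pow(10, meanOfLogs) - x0;
    }

    public static void main(String[] args) {
        // values that grow by a factor of 10 per step are linear in log space
        System.out.println(rawError(10, 100, 1000));
        System.out.println(logError(10, 100, 1000));
    }
}
```

For this triad the raw interpolation is off by several hundred units, whereas the logarithmic interpolation recovers the center value almost exactly. Real data will rarely be this clear-cut, which is why the errors are accumulated over all valid triads.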

In order to verify linearity, we can march along rows and columns of the image finding potential triads:

    public static ScalarStatistic verify(int[][] data, DataSelector selector,
            DataTransform transform, int D) {
        // setup
        ScalarStatistic errorstat = new ScalarStatistic();
        int nrows = data.length;
        if (nrows == 0) {
            return errorstat;
        }
        int ncols = data[0].length;
        if (ncols == 0) {
            return errorstat;
        }

        // find the error in every triad interpolating along rows
        for (int col = 0; col < ncols; ++col) {
            for (int row = D; row < nrows - D; ++row) {
                if (selector.shouldSelect(data[row][col], data[row - D][col], data[row + D][col])) {
                    int actualValue = data[row][col];
                    double trans0 = transform.transform(data[row - D][col]);
                    double trans1 = transform.transform(data[row + D][col]);
                    double transinterp = (trans0 + trans1) / 2;
                    double interpValue = transform.inverse(transinterp);
                    double error = (interpValue - actualValue);
                    errorstat.update(error * error);
                }
            }
        }

        // repeat for columns
        for (int row = 0; row < nrows; ++row) {
            for (int col = D; col < ncols - D; ++col) {
                if (selector.shouldSelect(data[row][col], data[row][col - D], data[row][col + D])) {
                    int actualValue = data[row][col];
                    double trans0 = transform.transform(data[row][col - D]);
                    double trans1 = transform.transform(data[row][col + D]);
                    double transinterp = (trans0 + trans1) / 2;
                    double interpValue = transform.inverse(transinterp);
                    double error = (interpValue - actualValue);
                    errorstat.update(error * error);
                }
            }
        }

        return errorstat;
    }

The ScalarStatistic in the above listing allows statistics such as the mean, variance and standard deviation to be computed on data that are provided online, i.e., one value at a time.³
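The book's ScalarStatistic class is not reproduced here. As a rough idea of how such online statistics can be kept without storing the data, the following sketch uses Welford's running update. The class name OnlineStat and its method names are hypothetical and are not the API of ScalarStatistic.

```java
final class OnlineStat {
    private long count;
    private double mean;
    private double sumSqDev; // running sum of squared deviations from the mean

    /** Folds one more value into the running statistics. */
    void update(double value) {
        count++;
        double delta = value - mean;
        mean += delta / count;
        sumSqDev += delta * (value - mean);
    }

    long getCount() {
        return count;
    }

    double getMean() {
        return mean;
    }

    /** Sample variance (divides by n - 1); 0 when fewer than two values were seen. */
    double getVariance() {
        return (count > 1) ? sumSqDev / (count - 1) : 0;
    }

    double getStdDev() {
        return Math.sqrt(getVariance());
    }
}
```

Because the verification listing passes the squared error to update(), the RMSE would simply be the square root of the running mean once all triads have been processed.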

The DataSelector allows us to determine whether a triad is valid. Its implementation could be as simple as checking that none of the three values is missing:

    public class NotMissing implements DataSelector {
        protected int missing;

        public NotMissing(int missing) {
            this.missing = missing;
        }

        public boolean shouldSelect(int centerval, int vala, int valb) {
            return centerval != missing && vala != missing && valb != missing;
        }
    }

The DataTransform provides the candidate transformation function $f(x)$ and its inverse $f^{-1}(x)$ for the purposes of calculating the mean square error. For logarithmic scaling with negative values not allowed, we might have:

    public class LogScaling extends DataTransform {
        private double scale;

        /** Multiply log(input) values by this amount, i.e., it is multiplier * log(value) */
        public LogScaling(double multiplier) {
            this.scale = multiplier;
        }

        @Override
        public double transform(double value) {
            if (value > 1) {
                return (scale * Math.log10(value));
            } else {
                return 0;
            }
        }

        @Override
        public double inverse(double value) {
            if (value == 0) {
                return 1;
            } else {
                return Math.pow(10, value / scale);
            }
        }
    }

³ See ScalarStatistic.java in the package edu.ou.asgbook.core

Table 2.2 Comparing the RMSE of linear interpolation carried out on raw data versus the RMSE when carried out on log-transformed data

     D           N    RMSE (raw)    RMSE (log)
     1   1,947,580       1,652.4       1,752.8
     3   1,870,488       3,083.3       2,965.5
     5   1,814,335       3,534.9       3,269.8
    11   1,686,834       4,184.9       3,527.2
    21   1,534,294       4,341.7       3,564.4
    31   1,422,723       4,445.9       3,615.5
    41   1,335,652       4,309.5       3,552.4

N is the number of triads that the RMSE was computed from

Taking the North American tile of the population density dataset, we can compute the Root Mean Square Error (RMSE) for two candidate transformations: the raw data and logarithmic scaling.⁴ Results are shown in Table 2.2.

For D = 1, the raw data are somewhat better, but for larger values of D, it is clear that the RMSE of linear interpolation on the log-transformed spatial grid is lower than the RMSE of linear interpolation on the raw data.⁵ Thus, if our image processing of the global density data will be limited to 3 × 3 neighborhoods (so that D = 1), then the raw data can be used as is. Otherwise, we should carry out image processing of the data only after taking its logarithm.

⁴ See LinearityVerifier.java in edu.ou.asgbook.linearity
⁵ As one would expect, the RMSE increases as D is increased. However, what matters for us is whether the RMSE (raw) is greater than or less than the RMSE (log).


One question that you ought to also address is whether the results change depending on the data values that you are interested in. For example, if you are only interested in high-density (urban) areas, are the raw data more likely to be linear than the logarithmic data? When carrying out such an analysis, it is important that you apply the range test to one of the bracketing points (either one) but not to the center value. This is because, in the interpolation test, the center value is treated as unknown. Using a priori knowledge of the center value is not allowed. Thus, for example, the DataSelector used could test only triads where the first bracketing value is in a certain range:

    public class InRange extends NotMissing {
        private int thresh0, thresh1;

        public InRange(int thresh0, int thresh1, int missing) {
            super(missing);
            this.thresh0 = thresh0;
            this.thresh1 = thresh1;
        }

        public boolean shouldSelect(int centerval, int vala, int valb) {
            return super.shouldSelect(centerval, vala, valb)
                    && vala >= thresh0 && vala < thresh1;
        }
    }

Using this selection criterion, the results do not change, as seen in Table 2.3. Except for D = 1, the logarithmic transformation makes the data more linear spatially. Thus, image processing of the population density dataset should be carried out on a log-transformed grid, not on the raw values. In this case, it was pleasant to observe that all three methods of verifying linearity (checking the color map used for human visualization, displaying the raw and log-transformed datasets using a perceptual color map, and testing linear interpolation within triads) yielded the same result. They all suggest log-transforming the raw values before carrying out any further operations.

However, the fact that at D = 1, the raw value is more spatially linear suggests that projecting the population density to other map projections should be carried out on raw values, not on the log-transformed values. This is because in bilinear interpolation (at approximately the same scale as the input image), the neighborhood size being interpolated over will be less than 1. Of course, if the map projection involves downsampling the image (i.e., reducing the resolution of the image), then the projection should be carried out in log-space as D will be greater than 1 in such cases.

2.3 Instrument Geometry

Sometimes, the spatial grid that we have to work with is not in a map projection or other geographic system. Instead, it has been collected by an instrument. While the data are spatial and gridded, they are not uniformly spaced in a georeferenced coordinate system.


Table 2.3 Except for D = 1, the log-transformed data is more spatially linear regardless of the data range considered

     D   Range                     N    RMSE (raw)    RMSE (log)
     1   1–500,000         1,947,580       1,652.4       1,752.8
     1   1–50              1,030,800         119.3         146.5
     1   50–500              607,430         431.8         472.6
     1   500–5,000           267,884       1,468.0       1,516.4
     1   5,000–50,000         39,686       7,308.4       7,789.7
     1   50,000–500,000        1,780      37,422.9      39,674.3
     3   1–500,000         1,870,488       3,083.3       2,965.5
     3   1–50                986,757         475.8         461.3
     3   50–500              589,571       1,342.4       1,251.4
     3   500–5,000           256,011       3,885.7       3,756.2
     3   5,000–50,000         36,603      14,613.7      14,428.3
     3   50,000–500,000        1,546      55,800.1      51,399.4
     5   1–500,000         1,814,335       3,534.9       3,269.8
     5   1–50                956,415         649.4         630.4
     5   50–500              575,439       1,772.0       1,595.0
     5   500–5,000           246,541       4,893.7       4,445.1
     5   5,000–50,000         34,562      16,457.4      16,058.0
     5   50,000–500,000        1,378      61,403.5      52,312.0
    11   1–500,000         1,686,834       4,184.9       3,527.2
    11   1–50                892,001         926.6         738.9
    11   50–500              539,521       2,864.3       2,444.0
    11   500–5,000           223,636       7,198.9       6,279.6
    11   5,000–50,000         30,507      17,433.7      16,208.2
    11   50,000–500,000        1,169      54,623.3      19,475.5
    21   1–500,000         1,534,294       4,341.7       3,564.4
    21   1–50                822,682       1,351.4       1,054.8
    21   50–500              492,404       3,665.4       3,035.5
    21   500–5,000           191,484       8,651.9       7,913.3
    21   5,000–50,000         26,670      11,022.6       8,531.4
    21   50,000–500,000        1,054      55,352.8      10,180.8
    31   1–500,000         1,422,723       4,445.9       3,615.5
    31   1–50                778,524       1,845.4       1,570.3
    31   50–500              454,589       4,517.8       4,047.4
    31   500–5,000           164,696       7,540.2       6,446.8
    31   5,000–50,000         23,941      12,002.3       9,547.3
    31   50,000–500,000          973      58,927.1      14,479.5
    41   1–500,000         1,335,652       4,309.5       3,552.4
    41   1–50                748,344       1,884.6       1,468.2
    41   50–500              424,312       4,553.4       3,838.8
    41   500–5,000           141,693       8,492.2       7,710.7
    41   5,000–50,000         20,615       8,821.9       4,997.6
    41   50,000–500,000          688      47,107.5       8,579.1


Fig. 2.14 Satellites in orbit around the earth are in motion and image different parts of the earth’s surface at different times. (a) MODIS image at 04Z on Sept 22, 2011. (b) MODIS image at 06Z

Because satellites are in orbit, their position relative to the earth is constantly changing (except for geostationary satellites, which maintain a lock on the same point on the earth’s surface). Thus, for example, the MODIS satellite images in Fig. 2.14, taken 2 h apart, are of different positions on the earth’s surface. Georeferencing the image has to take the satellite’s location and orbital plane into account. The satellite position is typically described in terms of the satellite’s apogee (the point at which it is farthest from the earth), its perigee (the point at which it is closest to the earth) and the angular distance between the satellite’s current location and the line that connects the apogee and perigee (see Fig. 2.15). The “anomaly” is the angle between this line and the line that connects the satellite to the center of the earth (see Fig. 2.15b).

Satellite images are usually already corrected for the satellite position and anomaly when they are georeferenced and placed into a geodesic coordinate system. This is why the MODIS images in Fig. 2.14 have black bands: they represent the parts of the LatLonGrid that have not been observed, but are nevertheless included so that the grid can be uniform. However, if you are directly receiving the raw data,⁶ then you need to apply the satellite position and anomaly corrections to the data. It is essential when you do so to ensure that the data are linear, and that you can interpolate nearby values. If you are not sure, it might be safer to simply pick the nearest neighbor.

Surface-based instruments are easier in that they typically do not move. Even if they move, their positions in time are given in latitude and longitude. Unlike with a satellite, we do not need to geometrically locate the instrument beyond this.

⁶ This is not as esoteric as it sounds: the MODIS data stream, for example, is freely available and unencrypted, so that you can receive it in real time if you have a receiver.


Fig. 2.15 When georeferencing the raw satellite image, we have to account for the satellite’s position and anomaly in the view angle. (a) MODIS Terra’s track on Sept 22, 2011. (b) Satellite navigation

However, the geometry of the image could be difficult because the instrument is not mobile. In other words, although we get a grid, the grid is not uniformly spaced in a geodesic coordinate system.

Consider Fig. 2.16, which schematically shows data being collected by a surface-based weather radar. The radar is stationary but mechanically rotates to collect data all around it. Because the aim is to observe phenomena (weather, aircraft, etc.) aloft, the radar tilts slightly upward as it scans (slightly because the interesting part of the atmosphere is only about 20 km thick, whereas a radar’s range could be as large as 500 km). However, because of refraction and the earth’s curvature, the beam gets bent and its height above the earth’s surface keeps increasing until, beyond a certain point, the beam is so high above the earth’s surface that there are no interesting data anymore. The beam also spreads as it goes away from the radar, so that the polar pixels (called “gates”) closer to the radar are smaller than the gates farther away from the radar. While image processing can be carried out in polar coordinates, this would involve dealing with beam broadening and distance calculations that vary throughout the image. It is often much more convenient to georeference the data and place them on a uniform geographic coordinate system first. However, mapping the data to any Cartesian coordinate system will involve downsampling⁷ close to the radar (where the resolution of the input data is much higher than that of the output Cartesian grid) and sub-pixel estimation farther away from the radar, as shown in Fig. 2.16d. Operations like noise removal have to be carried out with care because far away from the radar, a single noisy gate may affect multiple pixels in the output grid.

Fig. 2.16 (a) Due to the earth’s curvature and refraction, radar beams become higher and higher from the ground with distance. (b) A typical scanning pattern of a US weather radar. Note that the beam broadens with range, and that there are parts of the atmosphere even within the radar range that are unobserved. (c) Weather over Cuba observed by a surface radar in Florida. Notice the degradation of spatial resolution with range. The storms over Havana are clearly resolved, but the echoes in the southwest corner of the image are not. (d) Mapping polar data to a Cartesian grid involves downsampling close to the radar and sub-pixel estimation far away from the radar
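The lookup from a Cartesian pixel back to a polar gate can be sketched as follows. This is a deliberately simplified, nearest-neighbor illustration: it assumes a flat earth, ignores beam height and refraction, and assumes a hypothetical layout in which polar[a][g] holds gate g of azimuth sector a, with sector 0 starting at north and sectors increasing clockwise. None of these names or conventions come from the text.

```java
final class PolarToCartesianDemo {
    /**
     * Returns the polar value nearest to a point that lies eastKm east and
     * northKm north of the radar, or missing if the point is out of range.
     */
    static int lookup(int[][] polar, double eastKm, double northKm,
            double gateLengthKm, int missing) {
        int numAzimuths = polar.length;
        if (numAzimuths == 0) {
            return missing;
        }
        double rangeKm = Math.hypot(eastKm, northKm);
        // atan2(east, north) gives the angle clockwise from north
        double azimuthDeg = Math.toDegrees(Math.atan2(eastKm, northKm));
        if (azimuthDeg < 0) {
            azimuthDeg += 360;
        }
        int gate = (int) Math.floor(rangeKm / gateLengthKm);
        int azimuth = (int) Math.floor(azimuthDeg * numAzimuths / 360.0) % numAzimuths;
        if (gate >= polar[azimuth].length) {
            return missing;
        }
        return polar[azimuth][gate];
    }
}
```

Close to the radar, many gates fall inside one Cartesian pixel and this lookup silently discards all but one of them; far from the radar, neighboring pixels map to the same gate. A production implementation would therefore average or interpolate rather than pick a single gate.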

There are more complex remote sensing instruments. For example, mobile radars combine the beam-spreading problems of a stationary surface-based radar with the motion-related problems of satellites. When using raw data collected by an instrument, therefore, it is important to understand the geometry of the instrument and how georeferencing the data affects the quality of the mapped data. You may also have to correct for instrument movement and instrument errors before applying any of the techniques described in later chapters.

⁷ In signal processing terminology, downsampling or subsampling involves degrading the resolution of a signal.

Fig. 2.17 Regional and global datasets consisting of mosaiced and georeferenced data from multiple instruments are often readily available. (a) Composite of MODIS runs. (b) Composite of US weather radar. (c) Composite of European weather radar. (d) Eumetsat composite (Images courtesy (a) NASA, (b) NOAA, (c) EU Opera and (d) Eumetsat)

Consider again the MODIS images in Fig. 2.14. If you wish to analyze MODIS images taken a year apart to detect changes (perhaps the construction of new buildings or changes in crop types), working with the individual images will be extremely difficult. Obviously, you cannot simply take pairs of images and analyze them because the tracks and viewpoints will not be the same. The simplest approach is to first merge the images into a global dataset. Merging of this sort is quite common, and in many cases, much research has gone into creating optimal merging strategies tuned to the strengths and deficiencies of the data-collecting instrument. Thus, for example, regional and global satellite datasets are available, as are country-wide radar coverages (see Fig. 2.17). If possible, start your analyses with these.

Building composites like those in Fig. 2.17 involves answering a few questions:

1. If the same scene is observed at the same time by two different instruments, how are the two data values blended?

2. If there is a gap in the domain such that none of the instruments observes it, what value is assigned to this gap?

3. If the same scene is observed at slightly different times, how is the time difference (and the fact that the thing being observed could have moved in the meantime) accounted for?

We will look at ways to address the first two issues, blending observations and accounting for gaps, in the next section on gridding point observations. Although we will consider the methods from the standpoint of interpolating between point observations, they also apply to the problem of blending observations if we treat each observation at a pixel as a point observation with a weight given by its distance from the remote sensing instrument. Similarly, the method extends to the problem of gap filling if we treat pixels with valid data as the point observations from which the value at the unfilled grid point needs to be estimated. We will postpone consideration of the third issue, temporal alignment, to Sect. 7, where we will look at ways to determine motion (and hence correct for it).

2.4 Gridding Point Observations

Suppose you have a set of measurements at points on the earth’s surface, and need to create a spatial grid from these measurements. Why would you want to do that? Often, it is because you wish to compare these measurements to something that is already a spatial grid. For example, you may have sales at different stores in a domain and you want to correlate them with population density. In order to do this, it may be convenient to take the sales at each store and spread them within the store’s neighborhood, so that there is a sales value associated with every pixel. Of course, the alternative would be to take the population density in the neighborhood of each store and assign it to that store. But if the problem is to estimate sales of different items by different stores, then it may be more convenient to work with spatial grids. This process of remapping point observations to a spatial grid is what we will call “gridding.” Gridding may involve spreading out each observation, as with sales data, or may involve interpolating between observations. Interpolation would be the framework of choice if the observations are of temperature and we wish to determine a likely temperature at a location somewhere in between the points. In meteorology, interpolation of surface observations into spatial grids is called “objective analysis.”

2.4.1 Objective Analysis

Given a set of point observations at $(x_k, y_k)$ of values $f(x_k, y_k)$, we wish to create a spatial grid consisting of points $z_{ij}$ (see Fig. 2.18). For simplicity, we consider only linear combinations of the input observations:

$$ z_{ij} = \sum_k w_k f(x_k, y_k), \tag{2.2} $$

where $z_{ij}$ is the value at the pixel $(i, j)$ of the spatial grid and the $f(x_k, y_k)$ are the point observation values. Note that such a linear combination of point observations implicitly assumes that the data are spatially linear; in Sect. 2.2.2, we considered how to test for the linearity of a spatial grid. It is possible, using a similar leave-one-out analysis, to verify whether this linearity assumption holds for the point observations. If the assumption does not hold, the data values should be transformed using an appropriate function (we used the logarithm for the population density data, for example).

Fig. 2.18 Gridding a set of point observations

How should the weights $w_k$ be chosen? This is often heuristic. The common approach is to try different weights and choose what “looks” best. The weight should depend inversely on distance, so that close-by points are weighted higher.

Many functions have been proposed for computing the weight of a point observation at a pixel. For spreading out point observations, such as sales data, within a neighborhood, a Gaussian function is a good choice:

$$ w_k = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{r^2}{2\sigma^2}} \tag{2.3} $$

where $r$ is the Euclidean distance between $z_{ij}$ and $(x_k, y_k)$. The parameter $\sigma$ allows you to scale the Gaussian. The larger the $\sigma$, the more pixels a point observation affects. Because 99% of a Gaussian’s full value is within $3\sigma$ of its center, you can use the rule of thumb of setting $\sigma$ to be a third of the maximum range of an observation. For example, if pretty much no one will ever drive more than 100 km to get to a store, then $\sigma$ could be set at 33 km (see Fig. 2.19). In the above Gaussian function, both the latitude and longitude directions have the same $\sigma$. You could choose a weighting function where the $\sigma$ is different in the two directions:

$$ w_k = \frac{1}{\sqrt{2\pi}\,\sigma_x \sigma_y}\, e^{-\frac{1}{2}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2}\right)} \tag{2.4} $$

Fig. 2.19 Gauss and Cressman weighting functions

2.4.2 Cressman

Cressman (1959) suggested the following weighting function:

$$ w_k = \frac{R^2 - r^2}{R^2 + r^2}, \quad \forall\, r < R, \tag{2.5} $$


where $R$ is termed the radius of influence: a pixel’s value is affected only by observations within a distance $R$ (see Fig. 2.19). Obviously, the sum of the weights at a pixel depends on the number of observations surrounding that pixel. Therefore, the $w_k$’s do not necessarily sum up to 1, whereas when interpolating, we do need them to sum up to 1. Hence, we use:

$$ z_{ij} = \frac{\sum_k w_k f(x_k, y_k)}{\sum_k w_k} \tag{2.6} $$

In general, this definition is better because it remains valid when the data value at one of the points is missing: the missing point simply drops out of both sums. Because the data from field instruments could frequently be missing or of poor quality, it is also a good idea in practice to insist on a minimum number of points within the neighborhood of a pixel (see the minPoints check in the listing):

```java
public LatLonGrid analyze(PointObservations data) {
    LatLonGrid grid =
        ObjectiveAnalysisUtils.createBoundingGrid(data, latres, lonres);
    PointObservations.ObservationPoint[] points = data.getPoints();
    for (int i = 0; i < grid.getNumLat(); ++i) {
        for (int j = 0; j < grid.getNumLon(); ++j) {
            LatLon gridpt = grid.getLocation(i, j);
            double sum = 0;
            double sumwt = 0;
            int n = 0;
            for (int k = 0; k < points.length; ++k) {
                // skip observations whose value is missing
                if (points[k].getValue() != data.getMissing()) {
                    double wt = wtFunc.computeWt(
                        points[k].getLat() - gridpt.getLat(),
                        points[k].getLon() - gridpt.getLon());
                    if (wt > 0) {
                        sum += wt * points[k].getValue();
                        sumwt += wt;
                        ++n;
                    }
                }
            }
            // insist on a minimum number of points; the sumwt test guards
            // against dividing by zero when minPoints is set to 0
            if (n >= minPoints && sumwt > 0) {
                grid.setValue(i, j, (int) Math.round(sum / sumwt));
            } else {
                grid.setValue(i, j, grid.getMissing());
            }
        }
    }
    return grid;
}
```
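The listing relies on a wtFunc object whose computeWt method is not shown in this section. As a rough sketch of what such a weight function might look like, the code below implements the Gaussian of Eq. 2.3 and a Cressman weight. The interface shape, the class names, the treatment of the two offsets as planar distances, the 3σ cut-off of the Gaussian, and the use of the commonly quoted Cressman form (R² − r²)/(R² + r²) are all assumptions made for illustration, not the book's actual classes.

```java
/** Hypothetical interface: weight of an observation offset by (latdist, londist). */
interface WeightFunction {
    double computeWt(double latdist, double londist);
}

/** Isotropic Gaussian of Eq. 2.3, reported as negative beyond 3 sigma (assumed cut-off). */
final class GaussianWeight implements WeightFunction {
    private final double sigma;

    GaussianWeight(double sigma) {
        this.sigma = sigma;
    }

    @Override
    public double computeWt(double latdist, double londist) {
        double r2 = latdist * latdist + londist * londist;
        if (r2 > 9 * sigma * sigma) {
            return -1; // outside the neighborhood
        }
        return Math.exp(-r2 / (2 * sigma * sigma)) / (sigma * Math.sqrt(2 * Math.PI));
    }
}

/** Cressman weight (R^2 - r^2) / (R^2 + r^2); it turns negative for r > R. */
final class CressmanWeight implements WeightFunction {
    private final double radius2;

    CressmanWeight(double radius) {
        this.radius2 = radius * radius;
    }

    @Override
    public double computeWt(double latdist, double londist) {
        double r2 = latdist * latdist + londist * londist;
        return (radius2 - r2) / (radius2 + r2);
    }
}

final class WeightDemo {
    /** Normalized weighted average of Eq. 2.6; NaN if no observation has a positive weight. */
    static double weightedAverage(WeightFunction f, double[] dlat, double[] dlon,
                                  double[] values) {
        double sum = 0;
        double sumwt = 0;
        for (int k = 0; k < values.length; ++k) {
            double wt = f.computeWt(dlat[k], dlon[k]);
            if (wt > 0) {
                sum += wt * values[k];
                sumwt += wt;
            }
        }
        return sumwt > 0 ? sum / sumwt : Double.NaN;
    }

    public static void main(String[] args) {
        double[] dlat = {0.5, 1.0, 3.0};
        double[] dlon = {0.0, 1.0, 0.0};
        double[] values = {10, 20, 1000};
        // the third observation lies outside R = 2 and is ignored
        System.out.println(weightedAverage(new CressmanWeight(2.0), dlat, dlon, values));
    }
}
```

Returning a non-positive weight outside the neighborhood is what lets the calling code decide, with a single wt > 0 test, whether an observation counts toward a pixel.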


Given a set of point observations, how do you select $R$? A rule of thumb here is to find the distance from each point observation to its nearest observation:

$$ d_k = \min_{j,\, j \neq k} \mathrm{distance}\big((x_k, y_k),\, (x_j, y_j)\big) \tag{2.7} $$

and then to use twice the average of this distance over all the points (see footnote 8).
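A minimal sketch of this rule of thumb follows. It assumes planar Euclidean distances between coordinate arrays and uses hypothetical names (RadiusOfInfluence, suggestRadius); it is not the book's own implementation, which may compute distances differently.

```java
final class RadiusOfInfluence {
    /** Mean over all k of the distance d_k from point k to its nearest neighbor (Eq. 2.7). */
    static double meanNearestNeighborDistance(double[] x, double[] y) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two points");
        }
        double total = 0;
        for (int k = 0; k < n; ++k) {
            double nearest = Double.POSITIVE_INFINITY;
            for (int j = 0; j < n; ++j) {
                if (j != k) {
                    nearest = Math.min(nearest, Math.hypot(x[k] - x[j], y[k] - y[j]));
                }
            }
            total += nearest;
        }
        return total / n;
    }

    /** Rule of thumb: twice the mean nearest-neighbor distance. */
    static double suggestRadius(double[] x, double[] y) {
        return 2 * meanNearestNeighborDistance(x, y);
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 4};
        double[] y = {0, 0, 0};
        System.out.println("Suggested R = " + suggestRadius(x, y));
    }
}
```

The double loop is quadratic in the number of observations, which is usually acceptable because it runs once per dataset rather than once per pixel.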

2.4.3 Optimization

If $R$ is quite small and the number of points is large, the implementation shown above is very inefficient because it tests every point at every pixel, whereas only a handful of points are relevant. It is useful to precompute a neighborhood of weights for each point observation:

```java
double[][] computeWeightKernel(WeightFunction wtFunc, double latres, double lonres) {
    // find size of kernel: step outwards until the weight is no longer
    // positive (<= 0 rather than < 0, so that weight functions that return
    // exactly zero outside the neighborhood also terminate the loop)
    int halfrows, halfcols;
    for (halfrows = 0; ; ++halfrows) {
        double wt = wtFunc.computeWt(latres * halfrows, 0);
        if (wt <= 0) {
            break;
        }
    }
    for (halfcols = 0; ; ++halfcols) {
        double wt = wtFunc.computeWt(0, lonres * halfcols);
        if (wt <= 0) {
            break;
        }
    }
    // form kernel and compute weights
    double[][] kernel = new double[2 * halfrows + 1][2 * halfcols + 1];
    for (int i = 0; i < kernel.length; ++i) {
        for (int j = 0; j < kernel[0].length; ++j) {
            double latdist = latres * (i - halfrows);
            double londist = lonres * (j - halfcols);
            kernel[i][j] = wtFunc.computeWt(latdist, londist);
        }
    }
    return kernel;
}
```

Then, the interpolated grid is obtained by placing a displaced weight kernel over each of the point observations:

```java
public LatLonGrid analyze(PointObservations data) {
    LatLonGrid grid =
        ObjectiveAnalysisUtils.createBoundingGrid(data, latres, lonres);
    double[][] sum = new double[grid.getNumLat()][grid.getNumLon()];
    double[][] sumwt = new double[grid.getNumLat()][grid.getNumLon()];
    int[][] numpts = new int[grid.getNumLat()][grid.getNumLon()];
    PointObservations.ObservationPoint[] points = data.getPoints();

    final int halfrows = wtKernel.length / 2;
    final int halfcols = wtKernel[0].length / 2; // number of columns, not rows
    for (int k = 0; k < points.length; ++k) {
        final int row = grid.getRow(points[k]);
        final int col = grid.getCol(points[k]);
        if (points[k].getValue() != data.getMissing()) {
            // place the kernel so that it is centered on the observation
            for (int m = -halfrows; m <= halfrows; ++m) {
                for (int n = -halfcols; n <= halfcols; ++n) {
                    final int i = row + m;
                    final int j = col + n;
                    final double wt = wtKernel[m + halfrows][n + halfcols];
                    if (wt > 0 && grid.isValid(i, j)) {
                        sum[i][j] += points[k].getValue() * wt;
                        sumwt[i][j] += wt;
                        numpts[i][j]++;
                    }
                }
            }
        }
    }

    for (int i = 0; i < grid.getNumLat(); ++i) {
        for (int j = 0; j < grid.getNumLon(); ++j) {
            if (numpts[i][j] >= minPoints && sumwt[i][j] > 0) {
                grid.setValue(i, j, (int) Math.round(sum[i][j] / sumwt[i][j]));
            } else {
                grid.setValue(i, j, grid.getMissing());
            }
        }
    }
    return grid;
}
```

8 See computeMeanDistance() in the class ObjectiveAnalysisUtils in the package edu.ou.asgbook.oban

Note the assumption here that the weight kernel is the same across the domain of the entire grid. This will be true for grids that cover small areas of the earth or for grids in projections like the Lambert Conformal. For continental scale domains in the cylindrical equidistant coordinate system, it will not be true because distances are distorted.
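To get a feel for the size of this distortion, the following sketch computes the ground distance spanned by one degree of longitude at different latitudes and the number of kernel columns needed to cover a fixed radius. It assumes a spherical earth of radius 6371 km; the class and method names are made up for illustration.

```java
final class LonSpacing {
    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical-earth assumption

    /** Ground distance (km) spanned by one degree of longitude at the given latitude. */
    static double kmPerDegreeLon(double latDeg) {
        return Math.toRadians(1.0) * EARTH_RADIUS_KM * Math.cos(Math.toRadians(latDeg));
    }

    /** Number of columns on either side of a point needed to cover radiusKm east-west. */
    static int halfColsFor(double radiusKm, double lonresDeg, double latDeg) {
        return (int) Math.ceil(radiusKm / (lonresDeg * kmPerDegreeLon(latDeg)));
    }

    public static void main(String[] args) {
        for (double lat : new double[] {0, 30, 60}) {
            System.out.println("lat " + lat + ": " + kmPerDegreeLon(lat)
                + " km per degree, half-width " + halfColsFor(100, 0.1, lat) + " columns");
        }
    }
}
```

On a 0.1 degree grid, a 100 km radius needs about twice as many columns at 60 degrees latitude as at the equator, so a single precomputed kernel cannot be right for both.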


Fig. 2.20 Multiple passes of an objective analysis technique to sharpen an interpolated grid. (a) First pass. (b) Second pass. (c) Third pass. (d) Tenth pass

2.4.4 Successive Iteration

There is one issue with interpolating point observations into a spatial grid using the technique shown above. If a pixel in the spatial grid coincides with a point observation, the value of the grid at that pixel should be as close as possible to the point observation (the two do not need to be equal, because presumably, there is some error associated with the observations themselves). However, it is easy to see that in the weighted average formulation above, this is not the case. The value of the spatial grid at a pixel is given by a weighted average of all close-by point observations, not just the point observation that happens to be collocated with the pixel. Changing the value at the pixel alone would lead to abrupt discontinuities in the field. One approach is to create an error field by interpolating the errors at each of the points and adding the error field to the interpolated field. This has the effect of “sharpening” the field. Multiple passes of this method serve to make the field more and more correct at the point observations at the cost of making the field less and less smooth (see Fig. 2.20).

```java
public LatLonGrid analyze(PointObservations data, int numPasses,
                          int physicalMin, int physicalMax) {
    LatLonGrid result = analyze(data); // pass #1
    final PointObservations.ObservationPoint[] points = data.getPoints();
    for (int pass = 1; pass < numPasses; ++pass) {
        // find error at each point
        PointObservations.ObservationPoint[] errors =
            new PointObservations.ObservationPoint[points.length];
        for (int k = 0; k < points.length; ++k) {
            int a = points[k].getValue();
            int b = result.getValue(points[k]);
            int error = 0;
            if (a != data.getMissing() && b != result.getMissing()) {
                error = a - b;
            }
            errors[k] = new PointObservations.ObservationPoint(
                points[k].getLat(), points[k].getLon(), error);
        }
        // create a grid of errors and add this to the original grid
        LatLonGrid errGrid =
            analyze(new PointObservations(errors, data.getMissing()));
        add(result, errGrid, physicalMin, physicalMax);
    }
    return result;
}
```

A shortcoming of empirical approaches like Gaussian or Cressman weighting is that it can be difficult to find out what weighting function to use and how many iterations to carry out. It is possible to compare two different candidate weighting schemes, however. Leaving out one of the observations, carry out the interpolation and determine the value of the spatial grid at the observation that was left out. Rotate amongst the observations to determine the mean error when using that weighting function. Repeat for the other candidate weighting schemes and choose the one that yields the lowest error. Of course, this doesn’t mean that your set of weights is optimal, merely that among the weighting functions you considered, this is the one that performs best for this dataset.
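The leave-one-out comparison can be sketched as follows. To keep the example self-contained, it estimates the value directly at each left-out observation instead of building a full grid, measures error as the mean absolute difference, and represents a weighting scheme as a function of distance; these choices and all names are assumptions for illustration.

```java
import java.util.function.DoubleUnaryOperator;

final class LeaveOneOut {
    /** Weighted average (Eq. 2.6) at (px, py) from all observations except index skip. */
    static double estimate(double px, double py, double[] x, double[] y, double[] v,
                           int skip, DoubleUnaryOperator weightOfDistance) {
        double sum = 0;
        double sumwt = 0;
        for (int k = 0; k < x.length; ++k) {
            if (k == skip) {
                continue;
            }
            double wt = weightOfDistance.applyAsDouble(Math.hypot(x[k] - px, y[k] - py));
            if (wt > 0) {
                sum += wt * v[k];
                sumwt += wt;
            }
        }
        return sumwt > 0 ? sum / sumwt : Double.NaN;
    }

    /** Mean absolute leave-one-out error; points that cannot be estimated are skipped. */
    static double meanAbsError(double[] x, double[] y, double[] v,
                               DoubleUnaryOperator weightOfDistance) {
        double total = 0;
        int n = 0;
        for (int k = 0; k < x.length; ++k) {
            double est = estimate(x[k], y[k], x, y, v, k, weightOfDistance);
            if (!Double.isNaN(est)) {
                total += Math.abs(est - v[k]);
                ++n;
            }
        }
        return n > 0 ? total / n : Double.NaN;
    }

    /** Cressman-style weight as a function of distance for radius of influence R. */
    static DoubleUnaryOperator cressman(double R) {
        return r -> (R * R - r * r) / (R * R + r * r);
    }

    public static void main(String[] args) {
        double[] x = {0, 1, 2, 3};
        double[] y = {0, 0, 0, 0};
        double[] v = {0, 1, 2, 3};
        System.out.println("R = 1.5: " + meanAbsError(x, y, v, cressman(1.5)));
        System.out.println("R = 5.0: " + meanAbsError(x, y, v, cressman(5.0)));
    }
}
```

The candidate with the smaller error would be preferred for this dataset, with the caveat from the text that it is only the best among the candidates tried.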

2.4.5 Kriging

Is there a way to obtain an optimal weighting function based on the data themselves and to ensure minimal error at the sampling points? Oliver and Webster (1990) suggest picking the weights $w_k$ to minimize the variance of the approximated field. If you have enough observations at the points over a long time, then you can compute the covariance matrix of the interdependence of the point observations as follows:

$$
C = \begin{pmatrix}
E[(x_0 - \mu_0)(x_0 - \mu_0)] & \cdots & E[(x_0 - \mu_0)(x_k - \mu_k)] & \cdots \\
\cdots & \cdots & E[(x_j - \mu_j)(x_k - \mu_k)] & \cdots \\
E[(x_{n-1} - \mu_{n-1})(x_0 - \mu_0)] & \cdots & E[(x_{n-1} - \mu_{n-1})(x_k - \mu_k)] & \cdots
\end{pmatrix}, \tag{2.8}
$$

where $\mu$ is used to represent the mean value and $E[f(x)]$ is the expected value of $f(x)$, i.e., the average value of $f(x)$ over a large enough dataset:

$$ E[f(x)] = \frac{\sum_x f(x)}{N_x} \tag{2.9} $$

The next step is to compute the variogram, which is the variation of the covariance with distance. Sort the point observations by distance and fit a curve, $\gamma$, that maps $h$ to $C(x, x + h)$, where $h$ is the distance and $C$ the covariance matrix above. This curve is called a “variogram.” Given the variogram, one can compute the weights at a point $x_{ij}$ as:

$$ W = C^{-1} \boldsymbol{\gamma}(x_{ij}), \tag{2.10} $$

where $\boldsymbol{\gamma}(x_{ij})$ is read out of the variogram using the distances between each of the point observations and the pixel at which the estimation is being carried out:

$$
\boldsymbol{\gamma}(x_{ij}) = \begin{pmatrix}
\gamma(\mathrm{dist}(x_0, x_{ij})) \\
\gamma(\mathrm{dist}(x_1, x_{ij})) \\
\vdots \\
\gamma(\mathrm{dist}(x_{n-1}, x_{ij}))
\end{pmatrix} \tag{2.11}
$$

While Kriging cannot be done for one-off datasets (unless the covariance matrix and variogram are somehow known), it is an excellent approach to interpolate observations that are collected routinely. Collect enough data so as to compute the covariance matrix and variogram. Once these have been estimated using historical data, the weights at each grid point can be computed. Given any new data, the grid can be created quite easily as a weighted average of the point observations.
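As a sketch of the weight computation in Eq. 2.10, the code below solves the linear system C W = γ by Gaussian elimination instead of forming the inverse explicitly. It assumes that the covariance matrix and the variogram values for the pixel have already been estimated, follows Eq. 2.10 literally (it does not add the sum-to-one constraint used in ordinary kriging), and uses hypothetical names throughout.

```java
final class KrigingWeights {
    /** Solves c * w = gamma by Gaussian elimination with partial pivoting. */
    static double[] solve(double[][] c, double[] gamma) {
        int n = gamma.length;
        // augmented matrix [c | gamma], so the inputs are left untouched
        double[][] a = new double[n][n + 1];
        for (int i = 0; i < n; ++i) {
            System.arraycopy(c[i], 0, a[i], 0, n);
            a[i][n] = gamma[i];
        }
        for (int col = 0; col < n; ++col) {
            int pivot = col;
            for (int r = col + 1; r < n; ++r) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new ArithmeticException("covariance matrix is singular");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < n; ++r) {
                double factor = a[r][col] / a[col][col];
                for (int k = col; k <= n; ++k) {
                    a[r][k] -= factor * a[col][k];
                }
            }
        }
        // back substitution
        double[] w = new double[n];
        for (int i = n - 1; i >= 0; --i) {
            double s = a[i][n];
            for (int k = i + 1; k < n; ++k) {
                s -= a[i][k] * w[k];
            }
            w[i] = s / a[i][i];
        }
        return w;
    }

    /** Value at the pixel: weighted sum of the point observations. */
    static double estimate(double[] w, double[] observations) {
        double sum = 0;
        for (int k = 0; k < w.length; ++k) {
            sum += w[k] * observations[k];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[][] c = {{2, 1}, {1, 2}};
        double[] gamma = {1.2, 0.9};
        double[] w = solve(c, gamma);
        System.out.println(estimate(w, new double[] {10, 20}));
    }
}
```

Because C depends only on the observation locations, in a routine setting it could be factored once and reused for every pixel; the sketch re-solves each time for brevity.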

2.5 Rasterization

How do you take vector data (roads, rivers, polygons, etc.) and put them on a spatial grid? Computer graphics has a couple of useful techniques that are good to know.

2.5.1 Points

Suppose you have a set of observations that are truly point events. Maybe you have a dataset consisting of cloud-to-ground lightning strikes. These happen at a specific point on the earth’s surface. We wish to create a spatial grid from these points so as to compare these lightning strike events to some other data set that is naturally a spatial grid. Note that this is different from the gridding situation discussed in Sect. 2.4; there, we had point observations of a phenomenon that could be expected to exist over the entire domain and we interpolated the observations over the entire grid. Here, we have point phenomena that need to be placed on a grid.

This is a rather straightforward problem to solve because each pixel of the spatial grid occupies a definite area. We just need to find the pixel that contains the location of the lightning strike and update its value. Because the spatial grid is uniform, the pixel that contains the location of the lightning strike can be obtained by simply rounding off:

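$$
\begin{aligned}
\mathrm{row} &= \mathrm{round}\big((\mathrm{nwlat} - \mathrm{lat})/\mathrm{latres} - 0.5\big) \\
\mathrm{col} &= \mathrm{round}\big((\mathrm{lon} - \mathrm{nwlon})/\mathrm{lonres} - 0.5\big)
\end{aligned} \tag{2.12}
$$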

where nwlat and nwlon refer to the northwest corner of the spatial grid and latres and lonres to the resolution of the grid. The different signs are explained by the fact that row numbers increase to the south, where latitude decreases, while column numbers increase to the east, where longitude increases. Because row and col cannot be negative for a point inside the grid, the rounding off can be replaced by simple truncation:

```java
public Pixel getPositionIn(double lat, double lon, LatLonGrid grid) {
    LatLon nwCorner = grid.getNwCorner();
    // floor is the same as truncation when row, col are non-negative, and it
    // keeps points just north or west of the grid from landing in row/col 0
    int row = (int) Math.floor((nwCorner.getLat() - lat) / grid.getLatRes());
    int col = (int) Math.floor((lon - nwCorner.getLon()) / grid.getLonRes());
    if (grid.isValid(row, col)) {
        return new Pixel(row, col, grid.getValue(row, col));
    }
    // location is outside the grid
    return new Pixel(-1, -1, grid.getMissing());
}
```
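To turn a list of point events into a grid, the pixel lookup can be applied to every event while a counter is kept per pixel. The sketch below uses plain arrays in place of the LatLonGrid class so that it stands alone; the class name, the parameter names and the choice of counting events (as opposed to, say, flagging pixels) are illustrative assumptions.

```java
final class StrikeCounter {
    /** Counts events per pixel; events that fall outside the grid are ignored. */
    static int[][] countPerPixel(double[] lats, double[] lons,
                                 double nwLat, double nwLon,
                                 double latres, double lonres,
                                 int nrows, int ncols) {
        int[][] counts = new int[nrows][ncols];
        for (int k = 0; k < lats.length; ++k) {
            // rows increase southwards, columns increase eastwards
            int row = (int) Math.floor((nwLat - lats[k]) / latres);
            int col = (int) Math.floor((lons[k] - nwLon) / lonres);
            if (row >= 0 && row < nrows && col >= 0 && col < ncols) {
                ++counts[row][col];
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] lats = {39.5, 39.2, 38.5};
        double[] lons = {-99.5, -99.9, -98.5};
        int[][] counts = countPerPixel(lats, lons, 40, -100, 1.0, 1.0, 2, 2);
        System.out.println(java.util.Arrays.deepToString(counts));
    }
}
```

Using floor instead of an integer cast matters for events slightly north or west of the grid: a cast would truncate a small negative index to 0 and wrongly count the event in the first row or column.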

2.5.2 Lines

What if you have a dataset consisting of lines, such as aircraft track information that needs to be placed on a grid? In other words, how do you find the pixels in a spatial grid that are covered by a line given the starting and ending latitudes and longitudes of the line?

The first step is to find the direction in which to draw the line. For lines that are longer in the x-direction, we find the y for every x. For lines that are longer in the y-direction, we find the x for every y (see Fig. 2.21). This is important because, otherwise, the lines will exhibit jumps, i.e., the spatial grid will remain unfilled even at points where it ought to be.

```java
// requires java.util.List and java.util.ArrayList
public List<Pixel> getPositionIn(LatLonGrid grid) {
    List<Pixel> result = new ArrayList<Pixel>();
    Pixel p0 = new Point(lat0, lon0).getPositionIn(grid);
    Pixel p1 = new Point(lat1, lon1).getPositionIn(grid);
    System.out.println("Line from " + p0 + " to " + p1);
    int rowlen = Math.abs(p0.getRow() - p1.getRow());
    int collen = Math.abs(p0.getCol() - p1.getCol());
    // avoid divide by zero in slope calculations below
    if (rowlen == 0 && collen == 0) {
        result.add(p0);
        return result;
    }
    if (rowlen > collen) {
        // increment in row
        int startrow = Math.min(p0.getRow(), p1.getRow());
        int endrow = Math.max(p0.getRow(), p1.getRow());
        double slope = (p1.getCol() - p0.getCol())
            / ((double) (p1.getRow() - p0.getRow()));
        for (int row = startrow; row <= endrow; ++row) {
            int col = (int) Math.round(slope * (row - p0.getRow()) + p0.getCol());
            if (grid.isValid(row, col)) {
                result.add(new Pixel(row, col, grid.getValue(row, col)));
            }
        }
    } else {
        // increment in column
        int startcol = Math.min(p0.getCol(), p1.getCol());
        int endcol = Math.max(p0.getCol(), p1.getCol());
        double slope = (p1.getRow() - p0.getRow())
            / ((double) (p1.getCol() - p0.getCol()));
        for (int col = startcol; col <= endcol; ++col) {
            int row = (int) Math.round(slope * (col - p0.getCol()) + p0.getRow());
            if (grid.isValid(row, col)) {
                result.add(new Pixel(row, col, grid.getValue(row, col)));
            }
        }
    }
    return result;
}
```

Fig. 2.21 The slope of the line is important when rasterizing lines


Fig. 2.22 Catmull-Rom cubic splines pass exactly through four points and allow for interpolation between the second and third point

2.5.3 Splines

What if you have a set of points that need to be connected? One method would be to connect successive points using lines and drawing those lines. However, such a piece-wise linear method is not smooth, since the slope of the line before a point can be very different from the slope after the point. A better approach is to use a spline. The most commonly used splines, introduced by Catmull and Raphael (1974), are cubic splines. Given four points, it is possible to create a cubic polynomial that passes exactly through those points. Having thus fit the polynomial, it is possible to obtain the interpolated values for points in between. The idea behind Catmull and Raphael (1974) is to take four points that bracket a point (see Fig. 2.22), so that we take successive quads of points as we march through the track.

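Between the second and third point, the interpolated value $f(t)$, with $t$ running from 0 at the second point to 1 at the third, is given by:

$$ f(t) = \frac{1}{2}\left(2a_1 + (a_2 - a_0)\,t + (-a_3 + 4a_2 - 5a_1 + 2a_0)\,t^2 + (a_3 - 3a_2 + 3a_1 - a_0)\,t^3\right) \tag{2.13} $$

where $a_0, \ldots, a_3$ are the values at the four points. By sorting the values in a track and selecting four points at a time, it is then possible to find intermediate values: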

```java
/**
 * Determines the y coordinates for the given x2 by interpolating
 * the spline control points (x1, y1). The control points need to
 * be sorted in x.
 */
public static double[] interpolate(double[] x1, double[] y1, double[] x2) {
    // result: initialize at lower-bound value
    if (x1.length == 0) return new double[x2.length];
    double[] y2 = new double[x2.length];
    for (int i = 0; i < y2.length; ++i) {
        y2[i] = y1[0];
    }

    // every interval is p1 <= p2 <= p3 where p2 is resampling position
    for (int i = 0; i < x2.length; ++i) {
        // find interval which contains p2
        double p2 = x2[i];
        if (p2 <= x1[0]) { y2[i] = y1[0]; continue; }
        if (p2 >= back(x1)) { y2[i] = back(y1); continue; }
        int j = 0;
        while (j < x1.length && p2 > x1[j]) { ++j; }
        --j;

        double p1 = x1[j];
        double p3 = x1[j + 1];

        // j and j+1 will be in bounds but j-1 and j+2 may not be
        int j1 = j - 1; if (j1 < 0) j1 = 0;
        int j2 = j + 2; if (j2 > (x1.length - 1)) j2 = x1.length - 1;

        // spline
        double dx = 1.0 / (p3 - p1);
        double dx1 = 1.0 / (p3 - x1[j1]);
        double dx2 = 1.0 / (x1[j2] - p1);
        double dy = (y1[j + 1] - y1[j]) * dx;
        double yd1 = (y1[j + 1] - y1[j1]) * dx1;
        double yd2 = (y1[j2] - y1[j]) * dx2;
        double a0y = y1[j];
        double a1y = yd1;
        double a2y = dx * (3 * dy - 2 * yd1 - yd2);
        double a3y = dx * dx * (-2 * dy + yd1 + yd2);

        // cubic polynomial
        double x = p2 - p1;
        y2[i] = ((a3y * x + a2y) * x + a1y) * x + a0y;
    }
    return y2;
}

/** Last element of a non-empty array. */
private static double back(double[] a) {
    return a[a.length - 1];
}
```
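For uniformly spaced samples, the cubic can also be evaluated directly. The sketch below uses the standard Catmull-Rom coefficients, in which the linear term is (a2 − a0)t, and adds a small helper that inserts interpolated values between successive samples of a track. The class name, the densify helper and the choice of repeating the end values where a neighbor is missing are assumptions for illustration.

```java
final class CatmullRom {
    /** Value between a1 (t = 0) and a2 (t = 1), given the neighbors a0 and a3. */
    static double interpolate(double a0, double a1, double a2, double a3, double t) {
        double t2 = t * t;
        double t3 = t2 * t;
        return 0.5 * (2 * a1
                + (a2 - a0) * t
                + (2 * a0 - 5 * a1 + 4 * a2 - a3) * t2
                + (-a0 + 3 * a1 - 3 * a2 + a3) * t3);
    }

    /** Splits every interval of v into `steps` parts using successive quads of points. */
    static double[] densify(double[] v, int steps) {
        if (v.length < 2 || steps < 1) {
            return v.clone();
        }
        int n = v.length;
        double[] out = new double[(n - 1) * steps + 1];
        for (int i = 0; i < n - 1; ++i) {
            // repeat the end values where the quad would run off the track
            double a0 = v[Math.max(i - 1, 0)];
            double a3 = v[Math.min(i + 2, n - 1)];
            for (int s = 0; s < steps; ++s) {
                out[i * steps + s] = interpolate(a0, v[i], v[i + 1], a3, s / (double) steps);
            }
        }
        out[out.length - 1] = v[n - 1];
        return out;
    }

    public static void main(String[] args) {
        double[] track = {0, 1, 4, 9, 16};
        System.out.println(java.util.Arrays.toString(densify(track, 4)));
    }
}
```

At t = 0 the polynomial returns a1 and at t = 1 it returns a2, so consecutive segments join at the original samples.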

2.5.4 Polygons

The final vector shape that we will consider rasterizing is a polygon. You may find it necessary to treat all points within a polygon similarly. For example, given countries’ boundaries, you might want to create a spatial grid of states with and without standing armies.

Rasterizing a polygon boils down to a simple question: how do you determine whether a grid point is inside or outside a given polygon?

The idea is to start outside the polygon and march along a row. Every time the polygon’s edge is crossed, update a counter. At a pixel, if this counter is even, then the point is outside the polygon. If it is odd, then it is inside the polygon (see Fig. 2.23).

```java
public boolean contains(double x, double y) {
    int num_xcrossing = 0;
    int num_ycrossing = 0;
    for (int i = 0; i < edges.size(); ++i) {
        // intercepts are null if the edge does not span this x or y
        Double xintercept = edges.get(i).getXIntercept(y);
        Double yintercept = edges.get(i).getYIntercept(x);
        if (yintercept != null) {
            if (yintercept >= y) {
                ++num_ycrossing;
            }
        }
        if (xintercept != null) {
            if (xintercept >= x) {
                ++num_xcrossing;
            }
        }
    }
    // odd number of crossings means inside
    return ((num_xcrossing % 2 == 1) && (num_ycrossing % 2 == 1));
}
```

Fig. 2.23 Testing to see if a pixel is inside a polygon

with the intercepts determined as follows:

```java
public Double getYIntercept(double x) {
    if (!isBetween(lat0, x, lat1)) {
        return null;
    }
    double y;
    if (lat0 != lat1) {
        // linear interpolation along the edge
        y = lon0 + (x - lat0) * (lon1 - lon0) / (lat1 - lat0);
    } else {
        y = (lon1 + lon0) / 2;
    }
    return y;
}
```

The technique described above is straightforward, but comes with a caveat. It works only if you can start at a point that is definitely outside the polygon. This may be a problem if your domain contains partial polygons. The code above avoids this problem by using latitude and longitude as coordinates (instead of using a pixel’s x, y coordinates within the spatial grid). Thus, by starting at a latitude or longitude that is outside the realm of possibility, we avoid having to truncate polygons to the grid and to deal with incomplete polygons. However, this convenience comes with a trade-off of having to deal with longitude wrapping around (i.e., −180 is the same as 180). Since the wrap around happens at the international date line, in the middle of the Pacific, most real-world applications are unaffected and do not need to explicitly deal with this. In the unlikely event that yours is affected, replace the lat and lon in the above code with x and y and work within the coordinate system of your spatial grid.

2.5.5 Geocoding Polygons

Given a set of polygons, then, it is possible to find out if a pixel is inside any of these polygons. Since countries, states, postal codes, etc. are usually available as polygons, this technique forms the basic approach for geocoding gridded spatial data, i.e., determining the address of a particular pixel given its latitude and longitude.

If we are going to be constantly checking whether a pixel is inside a particular polygon or not, it is useful to optimize away the line intersection code for cases where it is obvious that the point cannot lie within the polygon. One way is to compute the bounding box for a polygon and check the bounding box before moving on to the more precise code:

```java
public class Polygon {
    // etc.
    private BoundingBox boundingBox;

    public Polygon(Point[] vertices) {
        // etc.
        boundingBox = new BoundingBox(vertices);
    }

    public boolean contains(double x, double y) {
        // as an optimization, check the bounding box first
        if (!boundingBox.contains(x, y)) {
            return false;
        }
        // normal check here ...
    }
```

where the bounding box just stores the rectangular box that contains the polygon:

```java
public class BoundingBox {
    private double minx;
    private double miny;
    private double maxx;
    private double maxy;

    public BoundingBox(Point[] vertices) {
        // x corresponds to latitude and y to longitude
        ScalarStatistic lat = new ScalarStatistic();
        ScalarStatistic lon = new ScalarStatistic();
        for (int i = 0; i < vertices.length; ++i) {
            lat.update(vertices[i].lat);
            lon.update(vertices[i].lon);
        }
        maxx = lat.getMax();
        maxy = lon.getMax();
        minx = lat.getMin();
        miny = lon.getMin();
    }
```

and given any point, checks whether the point is inside the box:

```java
public boolean contains(double x, double y) {
    return (x >= minx && x <= maxx && y >= miny && y <= maxy);
}
```

Given two polygons, it is useful to get a bounding box that contains both of them:

```java
public void update(BoundingBox a) {
    // grow this box so that it also encloses box a
    minx = Math.min(minx, a.minx);
    miny = Math.min(miny, a.miny);
    maxx = Math.max(maxx, a.maxx);
    maxy = Math.max(maxy, a.maxy);
}
```

Now, given that a country’s boundaries are given as a set of polygons, it is possible to find out if a latitude-longitude pair is within a country (see footnote 9). Note that for efficiency, we maintain a bounding box of all the polygons that form the country:

```java
public class Country {
    public final String name;
    public final List<Polygon> polygon;
    private BoundingBox boundingBox;

    public Country(String name, List<Polygon> polygon) {
        this.name = name;
        this.polygon = polygon;
        // bounding box that encloses all the polygons of the country
        this.boundingBox =
            BoundingBox.copyOf(polygon.get(0).getBoundingBox());
        for (Polygon p : polygon) {
            this.boundingBox.update(p.getBoundingBox());
        }
    }

    public boolean contains(LatLon pt) {
        if (this.boundingBox.contains(pt.getLat(), pt.getLon())) {
            for (Polygon p : polygon) {
                if (p.contains(pt.getLat(), pt.getLon())) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

9 CountryPolygons.java in edu.ou.asgbook.dataset

Given a file of country boundaries (see footnote 10), we can obtain a list of Country objects and use them to create a LatLonGrid. In the spatial grid, each pixel has the index to the country that it belongs to:

```java
public static LatLonGrid asLatLonGrid(Country[] countries,
                                      double latres, double lonres) {
    int nrows = (int) Math.round(180 / latres);
    int ncols = (int) Math.round(360 / lonres);
    LatLon nwCorner = new LatLon(90, -180);
    LatLonGrid result =
        new LatLonGrid(nrows, ncols, -1, nwCorner, latres, lonres);
    for (int i = 0; i < nrows; ++i) {
        for (int j = 0; j < ncols; ++j) {
            LatLon pt = result.getLocation(i, j);
            result.setValue(i, j, result.getMissing());
            // store the index of the first country that contains the pixel
            for (int c = 0; c < countries.length; ++c) {
                if (countries[c].contains(pt)) {
                    result.setValue(i, j, c);
                    break;
                }
            }
        }
        System.out.println("row " + i + " computed.");
    }
    return result;
}
```

If this grid, shown in Fig. 2.24, is stored on disk, it can be used as a lookup table for geocoding pixels without any expensive computations. Now, given any location, finding which country it is part of is as simple as:

```java
LatLon loc = ...;
Country[] countries = ...; // from boundaries file
LatLonGrid grid = ...; // read grid written out
int country = grid.getValue(loc);
if (country >= 0) {
    System.out.println("Location " + loc + " is in " + countries[country]);
} else {
    System.out.println("Location " + loc + " is unclaimed");
}
```

10 See data/countries/countries_world.kml for an example


Fig. 2.24 A lookup spatial grid for geocoding can be precomputed from a file containing country boundaries. The colors in this figure are randomly assigned

2.6 Example Applications

The creation of a geospatial grid is a requisite preliminary step in many spatial analysis applications. Often the human or environmental data to be analyzed are already in gridded form and what needs to be done is to remap or reproject the data into a desired projection. For example, Fraser et al. (2005) used satellite imagery to detect large-scale changes in forest cover. In order to do that, they had to correct the satellite data for atmospheric effects, reproject the data to a Lambert Conformal Conic projection and apply quality control to remove any residual cloud contamination. The conic projection was used rather than Plate Carrée because this study was over Canada, which is close enough to the poles that a cylindrical equal latitude-longitude grid would have introduced severe distortions.

Similarly, in order to relate satellite-derived surface albedo to soils and rock types over the desert regions of Africa and Arabia, Tsvetsinskaya et al. (2002) reprojected all their data (satellite observations from the MODIS instrument, soil information from the United Nations Food and Agriculture Organization (see Fig. 2.25) and rock age and sediment data from the United States Geological Survey) into a common Lambert Azimuthal Equal Area projection and a common resolution of 1 km. This projection is well suited to the equatorial extent of their study domain because their quantification requires low distortion of area measurements.

Sometimes, however, the data are not in gridded form. Instead, only point measurements may be available. Then, it is necessary to interpolate these point observations into a spatial grid. For example, Kumar and Remadevi (2006) applied Kriging to interpolate groundwater levels measured at about 60 points (to measure groundwater levels, one needs to drill a well) in a canal basin to form a spatial grid. Different choices of variogram fitting models (spherical, exponential and Gaussian) were tried and the Gaussian chosen. Then the interpolated groundwater spatial grids at different months were analyzed (see Fig. 2.26) to determine the change in water level even at points where no observation well was located.


Fig. 2.25 Soil data from the United Nations Food and Agriculture Organization, reprojected to Lambert Azimuthal Equal Area projection and resampled on a 1 km grid (Image from Tsvetsinskaya et al. (2002))

Fig. 2.26 Groundwater level contours created by Kriging (Images from Kumar and Remadevi (2006))

The population density grids (SEDAC 2010) that we employed as examples throughout this section were the result of a polygon to grid transformation process as described in Diechmann et al. (2001). Population data are routinely collected by censuses and compiled for political and administrative units such as countries, provinces and districts. The population data grids were created by simply distributing the population evenly within the highest resolution subunit into all the pixels that fell into that unit and by proportionally allocating data into a pixel if it covers multiple subunits (see Fig. 2.27). Because censuses are carried out at different times in different parts of the world, population counts were adjusted for time using a population growth model.

Fig. 2.27 Gridding population density from census data (Images from Diechmann et al. (2001))
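The even distribution step can be sketched in a few lines, given a lookup grid that stores the index of the census unit for each pixel (in the spirit of the country lookup grid of Sect. 2.5.5). This simplified sketch assumes that every pixel belongs wholly to one unit, so the proportional allocation for pixels that straddle several units is left out, and it ignores the time adjustment; the names are hypothetical.

```java
final class PopulationGridder {
    /**
     * unitOfPixel[i][j] is the index of the census unit that contains pixel (i, j),
     * or -1 if the pixel belongs to no unit. Returns the number of people per pixel,
     * spreading each unit's population evenly over the pixels of that unit.
     */
    static double[][] distribute(int[][] unitOfPixel, double[] unitPopulation) {
        // first pass: count the pixels that fall into each unit
        int[] pixelCount = new int[unitPopulation.length];
        for (int[] row : unitOfPixel) {
            for (int unit : row) {
                if (unit >= 0) {
                    ++pixelCount[unit];
                }
            }
        }
        // second pass: give every pixel an equal share of its unit's population
        double[][] result = new double[unitOfPixel.length][];
        for (int i = 0; i < unitOfPixel.length; ++i) {
            result[i] = new double[unitOfPixel[i].length];
            for (int j = 0; j < unitOfPixel[i].length; ++j) {
                int unit = unitOfPixel[i][j];
                result[i][j] = (unit >= 0) ? unitPopulation[unit] / pixelCount[unit] : 0.0;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] units = {{0, 0, 1}, {0, -1, 1}};
        double[] population = {300, 100};
        System.out.println(java.util.Arrays.deepToString(distribute(units, population)));
    }
}
```

Dividing the per-pixel count by the pixel's area would turn the result into a population density, which is the quantity shown in the example grids.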