

Computers & Geosciences 35 (2009) 1940–1949


journal homepage: www.elsevier.com/locate/cageo

Development of a pit filling algorithm for LiDAR canopy height models

Joshua R. Ben-Arie a, Geoffrey J. Hay a,*, Ryan P. Powers a, Guillermo Castilla a, Benoît St-Onge b

a Department of Geography, University of Calgary, 2500 University Dr. NW, Calgary, AB, Canada T2N 1N4
b Département de Géographie, Université du Québec à Montréal, Case postale 8888, succursale Centre-ville, Montréal (Québec), Canada H3C 3P8

Article info

Article history:
Received 7 June 2008
Received in revised form 25 January 2009
Accepted 20 February 2009

Keywords:
LiDAR
Data pits
Noise removal
Canopy height model
Interactive data language (IDL)

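0098-3004/$ - see front matter. Crown Copyright © 2009 Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.cageo.2009.02.003

* Corresponding author. E-mail address: [email protected] (G.J. Hay).

¹ The common term digital elevation model (DEM), as used later in this paper, represents any type of digital model composed of elevation or height values; thus DTMs, DSMs and CHMs are specific types of DEMs.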

Abstract

LiDAR canopy height models (CHMs) can exhibit unnatural-looking holes or pits, i.e., pixels with a much lower digital number than their immediate neighbors. These artifacts may be caused by a combination of factors, from data acquisition to post-processing, that not only result in a noisy appearance to the CHM but may also limit semi-automated tree-crown delineation and lead to errors in biomass estimates. We present a highly effective semi-automated pit filling algorithm that interactively detects data pits based on a simple user-defined threshold, and then fills them with a value derived from their neighborhood. We briefly describe this algorithm and its graphical user interface, and show its result in a LiDAR CHM populated with data pits. This method can be rapidly applied to any CHM with minimal user interaction. Visualization confirms that our method effectively and quickly removes data pits.

Crown Copyright © 2009 Published by Elsevier Ltd. All rights reserved.

1. Introduction

Light detection and ranging (LiDAR) is an active remote sensing technology that emits pulses of near infra-red light and records the backscatter, resulting in a three-dimensional (3D) point cloud. Once collected, this raw point cloud is typically filtered and classified into first and last returns. When captured over a forest environment, the first returns correspond to the energy echoed from the uppermost vegetation layer of a canopy. Once classified, these points are interpolated to a digital surface model (DSM) representing surface elevation above sea level. The last returns correspond to the last detectable signal when a pulse is intercepted by an opaque object, normally the ground (St-Onge et al., 2003). These classified points are interpolated into a digital terrain model (DTM), which represents bare terrain elevation above sea level. When the DTM is subtracted from the DSM, the result is a canopy height model (CHM), which represents absolute canopy height above the terrain surface.¹ As a result, LiDAR technology allows for large-area acquisition of unprecedented 3D structural forest information. However, such data are not without their challenges.

Data pits are typically visible in raster CHMs as (apparently) randomly distributed dark holes that are digitally represented by markedly lower height values than their neighbors. It is believed that these artifacts are caused by a combination of factors, from data acquisition to post-processing, though no specific cause has been defined in the literature.

It is important to distinguish between data pits and canopy gaps. Gaps are natural openings in a forest canopy that range in size (Spies and Franklin, 1989). However, canopy gaps are not simply bare earth; rather they are often populated with shrubs and saplings at different stages of growth. In fact, canopy gaps are expected when studying forests and are integral to a healthy and dynamic ecosystem (Whitmore, 1989). It is easy to visually discriminate small canopy gaps from data pits. Canopy gaps are asymmetrical, are generally composed of many pixels (even small canopy gaps), and appear ‘natural’ in a forest landscape. Pits exhibit none of these characteristics. In Fig. 1, the large dark mass (image lower right) is a natural canopy gap, while the scattered small dark rectangles (more than one pixel) and squares (single pixels) represent data pits.

In addition to giving an image a poor visual appearance (Fig. 1), data pits may lead to inaccurate biophysical and ecological measurements when a pit-ridden dataset is analyzed (elaborated below). The challenge is that not all pits have the same value, or differ by the same amount from their neighbors, so a simple threshold on height values cannot define pits. If a group of contiguous pixels corresponding to a tree crown contains a pit consisting of a single pixel, that pixel value can still be higher than many non-pit pixels in other areas of the CHM.

Fig. 1. Subset of the Campbell River canopy height model (CHM), comparing a canopy gap approximately 10 m wide (lower right) to data pits (small dark rectangles and squares). Grey scale: canopy height, 0–51 m. Scale bar: 15 m.

If one were to simply apply smoothing filters – common in remote sensing image processing software – to a CHM with data pits, pits could certainly be removed, or at least their ‘influence’ in the image visually reduced. However, based on current methods, all pixels in the dataset would also be altered, not just pits. Not only would this result in a visual smoothing of the image – dependent on the size and type of smoothing filter selected – but it is also an inefficient use of LiDAR technology, as smoothing would also alter the vertical accuracy (i.e., height) of the image. As Hyyppä et al. (2000) report, forest LiDAR data have an average vertical standard error of approximately 22 cm (varying with the technology used). Thus, when considering LiDAR’s numerous and growing applications in forestry and landscape ecology (Dubayah and Drake, 2000; Hyde et al., 2005; Lefsky et al., 2002; Lim et al., 2003; Reutebuch et al., 2005; Hilker et al., 2008; St-Onge et al., 2008), it is imperative to obtain accurate CHMs, especially for individual tree-crown delineation (Leckie et al., 2003) and tree height estimation (Popescu and Wynne, 2004). Additionally, the presence of pits will result in average canopy height underestimation, which could lead to measurement errors of above-ground forest biomass and carbon sequestration.
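The underestimation of average canopy height noted above can be made explicit with a short worked expression. The symbols $N$, $P$, $h_i$, $d_i$, $p$ and $\bar{d}$ are introduced here for illustration only and are not notation from the original study; the numbers in the final comment are hypothetical.

```latex
% N: number of CHM pixels; P: set of pit pixels
% h_i: true canopy height at pixel i; d_i > 0: depth of the pit at pixel i
\bar{h}_{\mathrm{obs}}
  = \frac{1}{N}\Big(\sum_{i \notin P} h_i + \sum_{i \in P} (h_i - d_i)\Big)
  = \bar{h} - \frac{1}{N}\sum_{i \in P} d_i
  = \bar{h} - p\,\bar{d},
\qquad p = \frac{|P|}{N}, \quad \bar{d} = \frac{1}{|P|}\sum_{i \in P} d_i .
% Hypothetical example: p = 0.05 and \bar{d} = 10 m give a 0.5 m underestimate.
```

Under these assumptions the bias grows linearly with both the proportion of pits and their mean depth, which is why even a small percentage of deep pits can matter for stand-level estimates.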

The purpose of this study is to report on a pit filling method for LiDAR CHMs and to evaluate its results by visually comparing them to the effect common smoothing filters have on the same datasets. As Wood and Fisher (1993) discuss, visual assessment of CHMs is important. Indeed, it is an effective and often underestimated method in assessing data quality. In the following sections, we provide a more thorough review of data pits within the literature and discuss their potential causes; describe the study site and data; provide details on the methods developed; discuss the results; and provide a conclusion to our work.

2. Background

In this section, we briefly review pertinent literature related to LiDAR pits and discuss their possible causes.

2.1. Insight from the literature

There is a distinct paucity in the literature regarding data pits in LiDAR digital elevation models (DEMs). Leckie et al. (2003) state that these artifacts arise from “ground hits within a tree crown”. Their pre-processing attempt to remove pits involved overlaying a 25 cm grid on a 3D point cloud and building a DSM from the highest hit in each cell. However, some pits remained, causing “artifacts in the surface model”. Hyyppä et al. (2000) describe some pixels as “no data”, and filled them “by using interpolation and a knowledge of near-by pixels.” Neither the interpolation method nor a definition of these problematic pixels is given. Kraus and Pfeifer (1998), in their influential paper on DTM production, encountered “big negative blunders”, and it is relatively clear that data pits are their equivalent. These “blunders” are not described in any detail, but the authors did ‘tweak’ their technique to be more robust in the presence of this problem.

Zhang and Chen (2003), in attempting to remove non-ground LiDAR data points for DTM production, mention that some points have “large negative elevation values drastically lower than those of their neighbors”. They also call this phenomenon “negative blunders”. To fix this problem, a morphological filter was suggested, but not implemented or tested. They also mentioned that the source of negative blunders is not definitively known.

MacMillan et al. (2003) attempted to fill pits in a LiDAR DTM by applying mean filters of varying sizes. They considered this a sub-optimal approach and called for the development of a more flexible DEM editor. Other research has addressed closing pits or minimizing errors in LiDAR DTMs (Briese et al., 2002; Younan et al., 2002). However, these papers do not elaborate on the relative number of pits in the study area or the amount by which pits drop in value relative to their neighbors, nor do they explicitly define data pits or their causes. Additionally, there is scarce mention in the literature of pits occurring in LiDAR DEMs, let alone of the appropriate filling of these pits. In fact, not a single paper was found devoted to the subject of data pits in LiDAR DEMs. The issue of data pits in LiDAR CHMs is far from resolved.

2.2. Causes of data pits

While not specifically defined, it is probable that the cause(s) of data pits are intricately linked to LiDAR technology and the pre-processing of raw point clouds into meaningful raster DEMs. Thus, the origin of data pits is complex and not fully resolved. The process from laser scanning to raster model production is a multi-step procedure uniting three different technologies: (i) laser ranging system, (ii) inertial navigation system (INS) and (iii) differential global positioning system (DGPS). Because of the possible compounding effect of multiple factors, it is difficult to measure the effect individual factors may have on pit formation.

Range error, platform attitude variation, position accuracy and time misregistration can all contribute to errors in a LiDAR dataset. Attitude errors arise from increasing the flying height and scan angle (Baltsavias, 1999). Time misregistration between the laser instrument, INS and DGPS accounts for inaccurate 3D positioning (Baltsavias, 1999; Latypov, 2005). Position accuracy, mainly concerning the quality of DGPS processing, accounts for most of the intrinsic technological error associated with LiDAR use (Baltsavias, 1999; St-Onge et al., 2003). However, it is not clear whether system errors produce data pits.

Leckie et al. (2003) postulate that data pits may form when different flight line datasets are combined. LiDAR spatial resolution may be affected or unevenly distributed by variation in aircraft attitude, scan pattern, structure of the canopy and terrain and deflection of lost returns. Some areas can yield zero to more than ten pulse returns per square meter, resulting in a failure to record many treetops (Gaveau and Hill, 2003; St-Onge et al., 2003). Related to this is that the minimum detectable object depends on reflectivity, not size (Baltsavias, 1999).

Fig. 2. Study site: 3D CHM view of Campbell River study site, Vancouver Island, Canada. Dark areas represent ground level (including cut areas, roads and a river) and bright areas represent vegetation.

Table 1
LiDAR parameters.

Parameter                       Performance
Sensor                          Mark II
Laser scan frequency            25 Hz
Laser impulse frequency         40,000 Hz
Laser power                     <4 W
Maximum scan angle              <20°
Type of scanning mirror         Oscillating
Laser beam divergence           <0.5 milliradians
Measurement density             0.5–0.8 hits per sq. metre
Datum                           NAD 83
Projection                      UTM 10 N
Flight altitude above ground    900 m
Flight speed                    25–30 m s⁻¹

Fig. 3. Pit filling algorithm. Flowchart steps: (1) Laplacian filter applied to the original CHM (with pits); (2) threshold Laplacian of CHM; (3) generate binary mask of Laplacian; (4) median filter applied to original CHM; (5) extract pit mask from median dataset; (6) fill pits in original CHM with filled pit mask values. Laplacian operator:

-1 -1 -1
-1 +8 -1
-1 -1 -1

Regarding the methods used in creating DEMs from LiDAR point data, there is no standard procedure, and companies typically use proprietary algorithms for this purpose. Not only are vegetation returns more complex to interpolate than ground returns, as St-Onge et al. (2003) discuss, but the precise pre-processing method is often glossed over, as seen in Mei and Durrieu (2004).

Published accounts of LiDAR pre-processing focus almost exclusively on DTM production, with very little on DSM or CHM production. For examples, see Briese et al. (2002), Sithole and Vosselman (2004), Lee and Younan (2003) and Kraus and Pfeifer (1998), although this last paper does express an interest in honing the modeling of vegetation laser returns. Furthermore, pre-processing techniques depend on the nature of the topography, sampling density and the type of scanner used (Petzold et al., 1999). For these reasons, causes of data pits in LiDAR CHMs due to pre-processing are difficult to assess. All we know is that they visibly exist. A comparison of different classification and interpolation techniques for DSM production, along with standard recommendations in this area, is highly desirable; a good start is found in Hyyppä et al. (2004), but this is beyond the scope of this paper. We do note, however, that Kraus and Pfeifer (1998) developed a recognized method of pre-processing that takes into account a skewed distribution of laser points on treetops, compared with those on the ground. In determining the weight value for their iterative linear prediction, it was found that one (out of three) of their methods was more robust in dealing with “negative blunders”.

When multiple echoes are returned in forested areas, information loss occurs when interpolating to a DSM, for example where points have different z values but similar x, y values (Axelsson, 1999). Because of this loss of information, Vosselmann and Maas (2001) recommend that only dense pulse returns should be interpolated. However, if datasets with less than a certain pulse return density are interpolated, data pits may result.

Gaveau and Hill (2003) found that linearly interpolating a LiDAR point cloud to create a DSM resulted in data smoothing, and that smoothing increased if a larger pixel size was selected. Conversely, an ‘inappropriate’ selection of too small a pixel size may result in data pits. However, it is important to consider that modeling an optimal spatial resolution (i.e., pixel size) for a forested area is not trivial (Marceau et al., 1994; Marceau and Hay, 1999). Using inverse distance weighting (IDW) to derive interpolated models from point clouds may cause data pits, since IDW does not smooth datasets as well as other interpolation methods such as Kriging (Clark et al., 2004); however, its results may be more accurate, as it models an area within a given distance of known points. We note that IDW was applied to the Campbell River dataset used in this study, and many data pits are evident herein.

Fig. 4. GUI basic view: pit filling graphical user interface basic view applied to a 400 × 400 pixel Campbell River CHM subset at 5% threshold.

3. Data and study area

To evaluate the pit filling algorithm described in this study, a CHM from Vancouver Island, Canada is used. LiDAR data acquisition took place on 8 June 2004 over a forested area 10 km southwest of Campbell River, on the east coast of Vancouver Island, Canada (Fig. 2). At this location, the terrain ranges from 120 to 470 m above sea level, with vegetation consisting of 80% Douglas-fir (Pseudotsuga menziesii), 17% western red cedar (Thuja plicata), 3% hemlock (Tsuga heterophylla) and small stands of red alder (Alnus rubra) in wetter areas. The understory consists of salal (Gaultheria shallon), Oregon grape (Berberis nervosa), vanilla-leaf deer foot (Achlys triphylla) and various ferns and mosses (Morgenstern et al., 2004).

LiDAR data collection was conducted by Terra Remote Sensing (Sidney, British Columbia, Canada). A small footprint LiDAR instrument mounted on a Bell 206 Jet Ranger helicopter achieved hit densities of 0.7 hits/m² and a footprint (spot) size of 0.19 m (based on pulse frequency, lowest sustainable flight speed and altitude). In processing the (raw) LiDAR point cloud, separation of vegetation and terrain was carried out with TerraScan version 004.006 (Terrasolid, Helsinki, Finland) using an iterative algorithm that combines filtering and thresholding methods (Kraus and Pfeifer, 1998; Axelsson, 1999). This resulted in a classification of ground (last returns) and non-ground (first returns).

Fig. 5. Pit filling comparison: Campbell River CHM subscene, an example comparing pit filling (using a 3 × 3 median filter) to standard 3 × 3 smoothing filters. Panels: Subscene (scale bar 250 m), Original (with crown, pits and gap labeled), Pit filled, 3 × 3 Median, 3 × 3 Mean, 3 × 3 Gaussian (scale bar 15 m). Grey scale: canopy height, 0–51 m.

More specifically, the DTM and DSM were produced by interpolation of the appropriate classified return. That is, if more than one return fell within a given pixel, only the lowest elevation value was retained in the case of the DTM and the highest value in the case of the DSM. Pixels without LiDAR returns received a value obtained using an inverse distance weighted interpolation by means of the four closest returns distributed in four quadrants (one per quadrant) around the empty pixel. The CHM was obtained by subtracting the DTM values from the DSM. A 3000 × 3000 pixel subset of the original CHM is used in this study. It has a spatial (pixel) resolution of 0.5 m, making the study area within the scene 1500 m × 1500 m (2.25 km²). See Table 1 for additional LiDAR acquisition metadata.
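The rule used for empty pixels (one nearest return per quadrant, combined by inverse distance weighting) can be sketched as follows. This is a minimal illustration rather than the processing chain actually used for the Campbell River data: the function name `idw_from_quadrants`, the tuple layout of `returns`, and the distance exponent `power=2.0` are all assumptions, since the text does not state the weighting exponent.

```python
import math


def idw_from_quadrants(returns, x, y, power=2.0):
    """Estimate z at (x, y) from the closest return in each of four quadrants.

    `returns` is an iterable of (x, y, z) tuples. The exponent `power` is an
    assumed value; the source does not specify it.
    """
    nearest = {}  # quadrant key -> (distance, z) of the closest return so far
    for rx, ry, rz in returns:
        dx, dy = rx - x, ry - y
        d = math.hypot(dx, dy)
        if d == 0.0:
            return rz  # a return sits exactly on the pixel centre
        quadrant = (dx >= 0, dy >= 0)
        if quadrant not in nearest or d < nearest[quadrant][0]:
            nearest[quadrant] = (d, rz)
    if not nearest:
        raise ValueError("no returns available")
    # Inverse distance weights: closer returns contribute more.
    weighted = [(1.0 / d ** power, z) for d, z in nearest.values()]
    return sum(w * z for w, z in weighted) / sum(w for w, _ in weighted)
```

Because only one return per quadrant contributes, a single low return close to the pixel centre can dominate the estimate, which is consistent with the suggestion in Section 2.2 that IDW smooths less than some other interpolators.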

4. Methods

The pit filling algorithm and its related graphical user interface (GUI) are written in Interactive Data Language (IDL version 6.2, Win32 x86, a product of ITT Visual Information Solutions). The following summary of the pit filling algorithm corresponds to the flowchart in Fig. 3. This is followed by a more detailed account of key components in Sections 4.1–4.3:

(1) A Laplacian (edge) operator is applied to the original CHM.

(2) A user-defined threshold is applied to the cumulative histogram of the resulting Laplacian image.

(3) A binary mask is generated (from the Laplacian image) where data pits equal one, and non-data pit pixels equal zero.

(4) A (3 × 3) median filter is then applied to the original CHM, resulting in a median-filtered (‘median’ filled) CHM.

(5) Pixels marked as data pits (ones) in the binary mask are used to extract the corresponding digital number (DN) from the ‘median’ filled CHM.

(6) These new median values are then used to replace (i.e., fill) the pit values in the original CHM.
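The six steps above can be sketched in a few lines of Python. This is a hedged illustration, not the authors' IDL implementation: the names `neighbourhood` and `fill_pits`, the nested-list raster layout, and the edge handling (indices clamped at the image border) are assumptions made so the example is self-contained.

```python
from statistics import median


def neighbourhood(img, r, c):
    """Return the 3x3 window around (r, c), clamping indices at the edges."""
    rows, cols = len(img), len(img[0])
    return [
        img[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    ]


def fill_pits(chm, percent):
    """Fill the `percent` % of pixels with the most negative Laplacian response."""
    rows, cols = len(chm), len(chm[0])

    # (1) Laplacian with +8 at the centre and -1 elsewhere:
    #     8*centre - sum(8 neighbours) == 9*centre - sum(whole 3x3 window).
    lap = [[9 * chm[r][c] - sum(neighbourhood(chm, r, c)) for c in range(cols)]
           for r in range(rows)]

    # (2) Threshold taken from the cumulative histogram of the Laplacian image.
    ordered = sorted(v for row in lap for v in row)
    n_pits = int(len(ordered) * percent / 100)
    if n_pits == 0:
        return [row[:] for row in chm]
    threshold = ordered[n_pits - 1]

    # (3) Binary mask: 1 = data pit, 0 = everything else.
    mask = [[1 if v <= threshold else 0 for v in row] for row in lap]

    # (4) 3x3 median filter of the original CHM.
    med = [[median(neighbourhood(chm, r, c)) for c in range(cols)]
           for r in range(rows)]

    # (5) + (6) Copy median values into the original CHM at masked pixels only.
    return [[med[r][c] if mask[r][c] else chm[r][c] for c in range(cols)]
            for r in range(rows)]
```

For example, `fill_pits(chm, 5)` would alter roughly the 5% of pixels with the lowest Laplacian values and leave every other pixel at its original height; ties at the threshold value can push the count slightly above the requested percentage.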

4.1. Edge detection

In step 1, pits are identified and extracted from the original CHM to create a binary ‘pit’ mask. However, a user cannot simply threshold the original dataset to create a pit mask based on DNs alone, since the canopy height values in the Campbell River CHM range from 0 to 51 m, and data pits range from approximately 0 to 28 m. Thus, selecting specific DNs (or a range) from the original dataset may exclude many data pits (omission error) and include many non-data pits (commission error). To overcome this, a Laplacian edge detector was applied to the original dataset, as pits can be recognized as abrupt changes (i.e., edges) in DNs. Pits were then defined by thresholding the cumulative histogram of the resulting edge image, where data pits reveal a narrower and exclusive range of DNs.

Fig. 6. Pit filling before and after. (A) Pre- and (B) post-pit filling x-axis profiles of Campbell River CHM and their related images (x-axis locations in each histogram represent DN values defined from dashed line in corresponding CHM image).

A Laplacian operator is an optimal edge detector for this application, since it is not directionally biased. In addition, physiological research suggests that humans visually define edges similar to responses from Laplacian operators (Jensen, 2005), which is helpful as visual acuity plays an important role in the threshold selection (elaborated in the following section). The particular Laplacian filter type used (shown in Fig. 3) was chosen because it defined pits as large negative values, thus producing a stark visual and quantitative contrast with non-data pits.
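As a worked illustration of why pits appear as large negative values, consider the 3 × 3 operator of Fig. 3 applied to a crown with a single-pixel pit. The heights used (20 m for the crown, 5 m for the pit) are assumed values chosen only to make the arithmetic concrete.

```latex
% z_c: centre pixel; z_1, ..., z_8: its eight neighbours
L(z_c) = 8\,z_c - \sum_{k=1}^{8} z_k
% pit pixel (5 m) surrounded by crown pixels (20 m):
L_{\mathrm{pit}} = 8(5) - 8(20) = -120
% crown pixel adjacent to the pit (seven 20 m neighbours, one 5 m neighbour):
L_{\mathrm{adj}} = 8(20) - \big(7(20) + 5\big) = +15
% crown pixel with no pit in its window:
L_{\mathrm{crown}} = 8(20) - 8(20) = 0
```

The pit itself yields a response an order of magnitude larger in magnitude (and opposite in sign) than its neighbours, so it falls at the extreme low end of the Laplacian histogram.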

4.2. Threshold selection

Pit threshold selection was defined after much heuristic evaluation and is based on defining a percentage of the cumulative histogram of the Laplacian image. This is equivalent to specifying an expected percent of pits in the original dataset, which is preferable to using a fixed Laplacian value as a threshold, since it allows for explicit control of the percent of CHM pixels that have their DN altered by the pit filling procedure. This threshold can be selected interactively using a pit filling threshold bar GUI (Fig. 4), which allows the user to observe (and change) in real time the effect of varying thresholds in a subset of the CHM.
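The link between a percentage of the cumulative histogram and the number of altered pixels can be made concrete with a small sketch. The helper names `laplacian_threshold` and `count_flagged` are hypothetical, and the input is assumed to be a flat list of Laplacian values; this is not the GUI code described in the text.

```python
def laplacian_threshold(laplacian_values, percent):
    """Return the Laplacian value at or below which `percent` % of pixels fall."""
    ordered = sorted(laplacian_values)
    count = int(len(ordered) * percent / 100)
    if count == 0:
        return None  # nothing is flagged at this percentage
    return ordered[count - 1]


def count_flagged(laplacian_values, percent):
    """Number of pixels whose DN would be altered at this threshold."""
    cut = laplacian_threshold(laplacian_values, percent)
    if cut is None:
        return 0
    return sum(1 for v in laplacian_values if v <= cut)
```

Sweeping `percent` over a range of candidate values and inspecting the result each time is, in spirit, what the threshold bar does interactively; a fixed Laplacian cut-off would instead flag a number of pixels that varies unpredictably from scene to scene.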

Visual evaluation of the output from different threshold values indicates that there is a relatively narrow range of values that yield a visually satisfying result. Within this range, the threshold will include the most problematic pits, i.e., those with large decreases in value relative to their neighbors. If the majority of pits are visually captured in the threshold, omission errors (i.e., pits not included within the threshold range) are not a great concern, as (i) they will barely (if at all) be visually noticeable and (ii) they represent a very small percentage of total image pixels. Commission errors (i.e., non-pits defined as pits within the threshold range) are also not a great concern, as their numbers will be small and their values will be replaced with the median value of their (non-pit) neighbors. Thus, they will tend to fall within the vertical resolution accuracy of the sensor anyway.

4.3. Pit filling

Once a suitable threshold has been selected, a binary ‘pit’ mask is generated where data pits equal one and all other pixels (defined as non-data pits) equal zero (Fig. 4 shows a subset of the Campbell River CHM and its associated binary ‘pit’ mask based on a 5% threshold). Pixels marked as pits subsequently have their DN value replaced with the value obtained from the same location in a median-filtered CHM image. We have found that a single application of this program provides very satisfactory results. However, if pits remain, the program can simply be run again at an equivalent or smaller threshold.

Fig. 7. GUI 3D view: pit filling graphical user interface 3D view applied to a 400 × 400 pixel Campbell River CHM subset at 5% threshold.

4.4. Application of the pit filling algorithm in IDL

In this study, the pit filling algorithm was applied to the Campbell River CHM using a 3 × 3 median filter. A threshold of 5% was heuristically determined to be suitable for this site. The hardware used was an Intel Pentium(R) 4, 2.8 GHz CPU, with 512 MB of RAM. In total, this program took 18.4 s to process the 36 MB, 3000 × 3000 pixel CHM. In addition to the pit-filled CHM, the program also reports (i) minimum Laplacian value, (ii) maximum Laplacian value, (iii) threshold Laplacian value, (iv) number of data pits defined by threshold, (v) number of negative values in filled dataset (all negative values are set to zero) and (vi) processing time in seconds.

5. Results

5.1. Comparing the pit filling algorithm with smoothing filters

Fig. 5 visually compares the efficacy of the pit filling algorithm to three different 3 × 3 smoothing filters applied to a small subscene of the Campbell River CHM. As illustrated, pits are effectively removed by the 3 × 3 median filter, but as a whole the image appears blurred (i.e., over-smoothed) compared to the original. The 3 × 3 mean filter does remove pits, but in turn introduces (larger) smoothed square artifacts along with increased smoothing of the scene, even more than that generated by the median filter. The 3 × 3 Gaussian image appears visually identical to the original; consequently, data pits are still problematic. In examining the canopy gaps and crowns highlighted in Fig. 5, it appears that their shape is best maintained in the pit-filled image. Not only does the pit filling algorithm effectively remove pits, but it also preserves distinct edges such as canopy gaps and canopy crowns.
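The square artifacts left by the mean filter can be reproduced on a toy raster. The sketch below is illustrative only: `filter3x3` is a hypothetical helper, the 5 × 5 crown at 18 m with a single 0 m pit is made-up data, and border pixels are simply copied. It shows only how each filter responds to an isolated pit, not the blurring of real canopy structure reported above.

```python
from statistics import mean, median


def filter3x3(img, reducer):
    """Apply `reducer` to every interior 3x3 window; border pixels are copied."""
    out = [row[:] for row in img]
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            window = [img[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = reducer(window)
    return out


# Hypothetical 5x5 crown at 18 m with a single 0 m pit in the centre.
crown = [[18.0] * 5 for _ in range(5)]
crown[2][2] = 0.0

mean_img = filter3x3(crown, mean)      # pit is spread over a 3x3 block
median_img = filter3x3(crown, median)  # pit is removed outright
```

In `mean_img` every pixel whose window touches the pit is pulled down to the same intermediate value, producing a depressed 3 × 3 square, whereas in `median_img` the single outlier never reaches the middle of the sorted window and the crown is restored.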

Fig. 8. GUI expert view: pit filling graphical user interface expert view applied to a 400 × 400 pixel Campbell River CHM subset at 5% threshold and four times zoom.

5.2. x-Axis profile pre- and post-pit filling

Another technique to visually assess the efficacy of the proposed pit filling algorithm is illustrated in Fig. 6, which compares a subset of the Campbell River CHM pre- and post-pit filling with its associated x-axis histogram. In the original profile, data pits are easily visible; the most obvious is pointed out with an arrow. In contrast, this data pit is not found in the pit-filled profile. One can also observe that aside from data pits, the x-axis profile structure remains unchanged between the two. In fact, even the structure of canopy gaps, as shown with a circle, remains essentially unchanged.

6. Discussion

Apart from the physical problems associated with pits,including poor visual appearance in LiDAR height models, andtheir potential to lead to the underestimation of related

0� 400 pixel Campbell River CHM subset at 5% threshold and four times zoom.

Page 9: Development of a pit filling algorithm for LiDAR canopy height models

ARTICLE IN PRESS

J.R. Ben-Arie et al. / Computers & Geosciences 35 (2009) 1940–19491948

biophysical measures, the conceptual problem with pits is thatthey are inconsistent with Tobler (1970) First law of Geography,a.k.a ‘‘everything is related to everything else, but near things aremore related than distant things’’. Pits violate this principle, sincethey are visually and quantitatively incongruous with theirsurroundings and do not accurately represent canopy height attheir location, or at immediate neighbors. In an effort to mitigatethese problems, pits need to be located and filled.

Based on our results, this pit filling algorithm is visually superior to the tested smoothing filters for filling data pits while preserving the edges, shape and structure of canopy gaps and canopy crowns. Only the 3×3 median filter is comparable in removing pits; however, the resulting median image does not preserve the height integrity or structural complexity of the original CHM. The 3×3 mean and Gaussian filters are wholly inferior to the pit filling algorithm, in terms of both removing pits and maintaining distinct canopy edges. Gaussian filters are ineffective as data pits remain.

The 5% threshold evaluated for this dataset appears optimal, but it may not be so for other CHMs. To assist in the visualization and threshold selection of the pit filling algorithm for other scenes, the GUI can be used to evaluate pit filling at various spatial extents and thresholds (ranging from 1% to 30% of the cumulative Laplacian histogram). In this example, the GUI uses values extracted from a median 3×3 filter to fill data pits (we note that the filter size can be user defined).
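The effect of moving the threshold across the 1% to 30% range can be sketched in Python by counting how many pixels would be flagged at each setting. As before, this is an illustration under assumptions rather than the program's actual logic: the 4-neighbour Laplacian, the interpretation of the percentage as the upper tail of the sorted Laplacian values, the positive-value guard, and the function names are all hypothetical.

```python
def laplacian(chm):
    """Assumed 4-neighbour Laplacian: sum of neighbours minus 4 * centre."""
    rows, cols = len(chm), len(chm[0])

    def at(i, j):
        # Replicate edge pixels so that border cells also have four neighbours.
        return chm[min(max(i, 0), rows - 1)][min(max(j, 0), cols - 1)]

    return [[at(r - 1, c) + at(r + 1, c) + at(r, c - 1) + at(r, c + 1) - 4 * chm[r][c]
             for c in range(cols)] for r in range(rows)]


def pits_per_threshold(chm, percents):
    """Map each percentage to the number of pixels flagged as pits."""
    values = [v for row in laplacian(chm) for v in row]
    ranked = sorted(values)
    counts = {}
    for percent in percents:
        if not 1 <= percent <= 30:
            raise ValueError("threshold must lie between 1% and 30%")
        # Assumed rule: flag the top `percent` of the sorted Laplacian values.
        k = max(1, round(len(ranked) * percent / 100.0))
        threshold = ranked[-k]
        # Guard (an assumption): flat or convex pixels are never counted as pits.
        counts[percent] = sum(1 for v in values if v >= threshold and v > 0)
    return counts


# Synthetic 10x10 canopy of 20 m with three depressions of decreasing depth.
chm = [[20.0] * 10 for _ in range(10)]
chm[2][2] = 2.0    # deep pit
chm[5][6] = 10.0   # moderate pit
chm[7][3] = 17.0   # shallow dip that may be natural canopy structure

counts = pits_per_threshold(chm, [1, 2, 5, 30])
```

In this synthetic scene a 1% threshold flags only the deepest pit, whereas 5% already includes the shallow dip. This is the kind of trade-off a user would weigh when inspecting the pit mask at different thresholds.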

There are three GUI interfaces in the presented pit filling program: (i) basic, (ii) 3D and (iii) expert, each of which provides a statistical summary of the pit-filled dataset such as minimum, maximum and mean values as well as the absolute number of pixels filled. For the basic and expert options, a lookup table palette is available with user-defined coloring schemes to aid in threshold selection, as different colors will represent different heights in the canopy, providing insight into their origin (i.e., gap or pit). The basic interface (Fig. 4) displays the original image with pits, the pit-filled image and the binary pit mask (at the same spatial extent and window size), the latter two dynamically updated at user-defined thresholds.

The 3D interface (Fig. 7) displays the original 2D image with data pits plus 3D renderings of the original image and the pit-filled image. View direction, texture interpolation and elevation can also be defined. The 3D image can also be rendered in real time at various thresholds. The expert interface (Fig. 8) displays the original 2D image with pits at the largest spatial extent, while two smaller, corresponding windows display a zoomed window of the pit-filled and binary pit mask images. All three interfaces provide an option to save the pit-filled output in GeoTiff format, along with a text file of corresponding statistics.

7. Conclusion

LiDAR canopy height models often exhibit unnatural looking holes or pits. While the exact cause of these pits is unknown, they most likely represent a combination of factors ranging from data acquisition to post-processing. The problem with these pits is that not only are they inconsistent with Tobler's first law of geography (where near things are more related than distant things), but they also result in a visually noisy appearance to the CHM. They can also lead to an underestimation of biophysical measurements, and skew ecological analyses based on these height models. In this paper, we present a robust semi-automated pit filling algorithm (developed in IDL) and compare its results to those generated from median, mean and Gaussian smoothing filters typically found in remote sensing software. We briefly describe our algorithm, its graphical user interface, and the results from applying it to a 3000×3000 pixel LiDAR CHM populated with data pits. Visualization confirms that our method effectively and quickly removes data pits. Future work will include the incorporation of a built-in tiling function so that input size will not be an issue. The IDL binary can be freely downloaded from: http://www.ucalgary.ca/f3gisci/software

Acknowledgments

This research has been generously supported in grants to Dr. Hay from the University of Calgary, the Alberta Ingenuity Fund, and the Natural Sciences and Engineering Research Council (NSERC). We also acknowledge an NSERC-BIOCAP research grant awarded to Dr. Benoît St-Onge from which the original Campbell River data were provided and the problem defined. The opinions expressed here are those of the authors, and do not necessarily reflect the views of their funding agencies. We also thank the anonymous reviewers for their constructive feedback.

References

Axelsson, P., 1999. Processing of laser scanner data - algorithms and applications. ISPRS Journal of Photogrammetry and Remote Sensing 54, 138–147.

Baltsavias, E.P., 1999. Airborne laser scanning: basic relations and formulas. ISPRS Journal of Photogrammetry and Remote Sensing 54, 199–214.

Briese, C., Pfeifer, N., Dorninger, P., 2002. Applications of the robust interpolation for DTM determination. In: Proceedings International Archives of Photogrammetry and Remote Sensing 34. ISPRS symposium, Graz, Austria, pp. A-55–61.

Clark, M.L., Clark, D.B., Roberts, D.A., 2004. Small-footprint lidar estimation of sub-canopy elevation and tree height in a tropical rain forest landscape. Remote Sensing of Environment 91, 68–89.

Dubayah, R.O., Drake, J.B., 2000. Lidar remote sensing for forestry applications. Journal of Forestry 98, 44–46.

Gaveau, D.L.A., Hill, R.A., 2003. Quantifying canopy height underestimation by laser pulse penetration in small-footprint airborne laser scanning data. Canadian Journal of Remote Sensing 29, 650–657.

Hilker, T., Wulder, M.A., Coops, N.C., 2008. Methods to extract and update forest inventory info using airborne LiDAR and high spatial resolution satellite imagery. Canadian Journal of Remote Sensing 34, 5–12.

Hyde, P., Dubayah, R., Peterson, B., Blair, J.B., Hofton, M., Hunsaker, C., Knox, R., Walker, W., 2005. Mapping forest structure for wildlife habitat analysis using waveform lidar: validation of montane ecosystems. Remote Sensing of Environment 96, 427–437.

Hyyppa, J., Hyyppa, H., Litkey, P., Yu, X., Haggren, H., Ronnholm, P., Pyysalo, U., Pitkanen, J., Maltamo, M., 2004. Algorithms and methods of airborne laser scanning for forest measurements. In: Proceedings International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 36. Freiburg, Germany, Part 8/W2, Session 4, pp. 82–89. <http://www.isprs.org/commission8/workshop_laser_forest/>.

Hyyppa, J., Pyysalo, U., Hyyppa, H., Samberg, A., 2000. Elevation accuracy of laser scanning-derived digital terrain and target models in forest environment. In: EARSEL eProceedings LIDAR Workshop, Dresden, FRG, No. 1, pp. 139–147.

Jensen, J.R., 2005. Introductory Digital Image Processing: A Remote Sensing Perspective, third ed. Pearson Prentice Hall, Upper Saddle River, NJ, 526p.

Kraus, K., Pfeifer, N., 1998. Determination of terrain models in wooded areas with airborne laser scanner data. ISPRS Journal of Photogrammetry and Remote Sensing 53, 193–203.

Latypov, D., 2005. Effects of laser beam alignment tolerance on lidar accuracy. ISPRS Journal of Photogrammetry and Remote Sensing 59, 361–368.

Leckie, D., Gougeon, F., Hill, D., Quinn, R., Armstrong, L., Shreenan, R., 2003. Combined high-density lidar and multispectral imagery for individual tree-crown analysis. Canadian Journal of Remote Sensing 29, 633–649.

Lee, H.S., Younan, N.H., 2003. DTM extraction of lidar returns via adaptive processing. IEEE Transactions on Geoscience and Remote Sensing 41, 2063–2069.

Lefsky, M.A., Cohen, W.B., Parker, G.G., Harding, D.J., 2002. Lidar remote sensing for ecosystem studies. BioScience 52, 19–30.

Lim, K., Treitz, P., Wulder, M., St-Onge, B., Flood, M., 2003. Lidar remote sensing of forest structure. Progress in Physical Geography 27, 88–106.

MacMillan, R.A., Martin, T.C., Earle, T.J., McNabb, D.H., 2003. Automated analysis and classification of landforms using high-resolution digital elevation data: applications and issues. Canadian Journal of Remote Sensing 29, 592–606.

Marceau, D.J., Hay, G.J., 1999. Remote sensing contributions to the scale issue. Canadian Journal of Remote Sensing 25, 357–366.

Marceau, D.J., Gratton, D.J., Fournier, R.A., Fortin, J-P., 1994. Remote sensing and the measurement of geographical entities in a forested environment. 2. The optimal spatial resolution. Remote Sensing of Environment 49, 105–117.

Mei, C., Durrieu, S., 2004. Tree-crown delineation from digital elevation models and high resolution imagery. In: Proceedings of the International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 36, working group VIII/2, Laser-Scanners for Forest and Landscape Assessment. Freiburg, Germany. <http://www.isprs.org/commission8/workshop_laser_forest/>.

Morgenstern, K., Black, T.A., Humphreys, E.R., Griffis, T.J., Drewitt, G.B., Cai, T., Nesic, Z., Spittlehouse, D.L., Livingston, N.J., 2004. Sensitivity and uncertainty of the carbon balance of a Pacific Northwest Douglas-fir forest during an El Niño/La Niña cycle. Agricultural and Forest Meteorology 123, 201–219.

Petzold, B., Reiss, P., Stossel, W., 1999. Laser scanning - surveying and mapping agencies are using a new technique for the derivation of digital terrain models. ISPRS Journal of Photogrammetry and Remote Sensing 54, 95–104.

Popescu, S.C., Wynne, R.H., 2004. Seeing the trees in the forest: using lidar and multispectral data fusion with local filtering and variable window size for estimating tree height. Photogrammetric Engineering and Remote Sensing 70, 589–604.

Reutebuch, S.E., Anderson, H., McGaughey, R.J., 2005. Light detection and ranging (LIDAR): an emerging tool for multiple resource inventory. Journal of Forestry 103, 286–292.

Sithole, G., Vosselman, G., 2004. Experimental comparison of filter algorithms for bare-earth extraction from airborne laser scanning point clouds. ISPRS Journal of Photogrammetry and Remote Sensing 59, 85–101.

Spies, T.A., Franklin, J.F., 1989. Gap characteristics and vegetation response in coniferous forests of the Pacific Northwest. Ecology 70, 543–545.

St-Onge, B., Hu, Y., Vega, C., 2008. Mapping the height and above-ground biomass of a mixed forest using lidar and stereo Ikonos images. International Journal of Remote Sensing 29 (5), 1277–1294.

St-Onge, B., Treitz, P., Wulder, M.A., 2003. Tree and canopy height estimation with scanning lidar. In: Wulder, M.A., Franklin, S.E. (Eds.), Remote Sensing of Forest Environments: Concepts and Case Studies. Kluwer Academic Publishers, Boston, pp. 489–509.

Tobler, W.R., 1970. A computer movie simulating urban growth in the Detroit region. Economic Geography 46, 234–240.

Vosselmann, G., Maas, H., 2001. Adjustment and filtering of raw laser altimetry data. In: Proceedings of the OEEPE (The European Organisation for Experimental Photogrammetric Research) Workshop on Airborne Laser scanning and Interferometric SAR for Detailed Digital Terrain Models, Stockholm, Sweden.

Whitmore, T.C., 1989. Canopy gaps and the two major groups of forest trees. Ecology 70, 536–538.

Wood, J.D., Fisher, P.F., 1993. Assessing interpolation accuracy in elevation models. IEEE Computer Graphics and Applications 13, 48–56.

Younan, N.H., Lee, H.S., King, R.L., 2002. DTM error minimization via adaptive smoothing. In: Proceedings IEEE Geoscience and Remote Sensing Symposium, vol. 6, Toronto, Canada, pp. 3611–3613. doi:10.1109/IGARSS.2002.1027266.

Zhang, K., Chen, S., 2003. A progressive morphological filter for removing non-ground measurements from airborne lidar data. IEEE Transactions on Geoscience and Remote Sensing 41, 872–882.