A statistical estimator to propagate height prediction errors into a general volume model


Mathieu Fortin and Josianne DeBlois

Abstract: In most large-scale inventories, volume models rely on predicted heights instead of measured heights. The effect of height predictions over volume estimate uncertainty is therefore a crucial issue. In this study, we designed a statistical estimator that enables the propagation of tree height prediction errors into a general volume model. The proposed estimator relies on the assumption that both the volume model and the height–diameter model that provides the height predictions do not exhibit any lack of fit. This estimator was tested against a Monte Carlo simulation for 90 plots from three typical forest types in Quebec, Canada (hardwood, mixedwood, and softwood). The results show that the estimator provides plot volume predictions with error variances that fit those estimated using the Monte Carlo simulation. The estimator developed in this study may be useful to quickly provide the uncertainty associated with plot volume predictions in large-scale inventories.


Introduction

The availability of commercial wood volumes is central to forest management in most jurisdictions. However, in practice, measuring stem volume is almost impossible without felling the tree. In fact, this variable is usually predicted through a general volume model or a stem profile model (sometimes referred to as a stem taper model). The statistical issues related to volume estimation have been addressed in many papers in the forestry literature (e.g., Sharma and Oderwald 2001; Zhang et al. 2002; Jordan et al. 2005; Fortin et al. 2007).

In the province of Quebec, Canada, tree and plot volume predictions rely on general volume models based on diameter at breast height (DBH) (1.3 m) and tree height. Perron (1985) and Fortin et al. (2007) provide two examples of general merchantable (DBH ≥ 9.1 cm) volume models that are currently in use. The main concern for such volume models is that tree height is usually not available for all trees in a particular plot. In fact, height is measured on a subsample of trees because measuring this variable is also time consuming and may be particularly difficult to obtain in closed canopy stands. This subsample serves to fit models of the height–diameter (HD) relationships, which in turn are used to predict heights of unsampled trees. Predicted heights are then reinserted in the volume model to obtain a volume prediction for every tree in a particular plot. The plot volume is finally predicted as the sum of the individual tree volume predictions. In the province of Quebec, the uncertainty associated with such plot volume predictions is needed to calculate the annual allowable cut volumes.

In forest inventories, the plots represent samples of larger entities, which can be a stand or a stratum. As individual tree volume predictions are summed to obtain predictions of plot volumes, which in turn are averaged to obtain an estimate of the stand volume, several sources of error exist. At the stand or the stratum level, three main sources of errors are usually reported: measurement errors, sampling errors, and prediction errors. The relative contribution of these sources of error to the uncertainty of stand variable predictions and growth projections has been addressed by many authors (e.g., Mowrer and Frayer 1986; Gertner 1990; Mowrer 1991; McRoberts et al. 1994; Parysow et al. 2000; Mäkinen et al. 2009). It is usually assessed through Monte Carlo simulations, although other methods based on first-order Taylor approximations or orthogonal polynomials exist (e.g., Mowrer and Frayer 1986; Mowrer 1991; Parysow et al. 2000). To our knowledge, none of these studies have focused on plot volume predictions and the possible impact of height prediction errors over the uncertainty associated with these plot volume predictions. Lappi (2006) already reported a significant increase in the variance of tree volume predictions due to the propagation of height prediction errors. However, between the tree and the stand remains the plot level, for which no information is available about volume prediction uncertainty.

Received 22 September 2009. Accepted 29 April 2010. Published on the NRC Research Press Web site at cjfr.nrc.ca on 3 September 2010.

M. Fortin¹,² and J. DeBlois. Direction de la recherche forestière, Ministère des Ressources naturelles et de la Faune du Québec, 2700 Einstein, Québec, QC G1P 3W8, Canada.

¹ Corresponding author (e-mail: [email protected]).
² Present address: Institut national de recherche agronomique (INRA), 54280 Champenoux, France.

Can. J. For. Res. 40: 1930–1939 (2010) doi:10.1139/X10-107 Published by NRC Research Press

Even if we assume that species identification and diameter measurements are error free, uncertainty assessment for plot volume predictions remains complex because it involves two different prediction errors, which are associated with height and volume predictions, respectively. These two prediction errors are closely related to each other because the general volume model uses predicted heights instead of observed ones. As a result, the errors associated with the predicted heights propagate throughout the volume predictions. Moreover, height or volume prediction errors may also be correlated among the trees of a particular plot (e.g., Robinson and Wykoff 2004; Fortin et al. 2007).

Should the model of the HD relationships solely rely on DBH, it would be simpler to refit the volume model using only DBH as a predictor. Theoretically, such a model would then produce consistent error variances for the volume predictions. However, models of HD relationships are often fitted using more extensive data sets. These include variables that are traditionally not available for volume modelling but that are significant tree height predictors. In such cases, refitting a volume model using DBH alone would result in a loss of precision.

On the other hand, propagating the errors associated with height predictions throughout the volume model is not trivial. This error propagation can be assessed through Monte Carlo simulations (Efron and Tibshirani 1993), which consist in simulating random errors with theoretical distributions in an iterative manner to obtain a sample of realizations for the variable being examined. An estimate and its variance are then obtained from the sample of Monte Carlo realizations. Monte Carlo simulations have been used extensively in forestry for sensitivity analyses and error propagation studies (e.g., Mowrer and Frayer 1986; Gertner 1987, 1990).

From a practical standpoint, performing a Monte Carlo simulation for every plot in a national or even a regional inventory may be either impossible or too demanding in terms of computer resources. For instance, the provincial inventory in Quebec encompasses nearly 100 000 sample plots (F. Labbé, personal communication (2009)). If a Monte Carlo simulation took 10 s per plot to yield a plot volume prediction and its uncertainty, the whole process would last over 11 days. Repeatedly recompiling such an inventory because of error corrections, new plot measurements, and changing demands from stakeholders is tedious. Using an analytical estimator (an estimator that would use algebra instead of iterative simulations) derived from both the model of HD relationships and the general volume model could prove an efficient alternative to the Monte Carlo simulations.
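As a rough check of the duration quoted above, taking exactly 100 000 plots and 10 s per plot (both are the round figures used in the text):

$$
100\,000 \times 10\ \mathrm{s} = 10^{6}\ \mathrm{s}, \qquad \frac{10^{6}\ \mathrm{s}}{86\,400\ \mathrm{s\,day^{-1}}} \approx 11.6\ \mathrm{days}.
$$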

The purpose of this study was to design such an analytical estimator for plot volume predictions. To do so, we used the model of HD relationships and the general volume model that are available in Fortin et al. (2007, 2009). We first present the statistical development underlying this estimator. Then, the estimated variances produced by the estimator were tested with a case study. Ninety sample plots were selected in three typical forest types (30 plots by forest type) in Quebec. The plot volume predictions and their variances produced by the estimator were compared with those obtained through Monte Carlo simulations.

Statistical developments

Volume model

For consistent notation, let us define $i$ and $j$ as the plot and the tree indices, respectively, such that $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, p_i$, where $n$ is the total number of plots and $p_i$ is the number of trees in plot $i$.

The general volume model in Fortin et al. (2007) is inspired by Assman (1961). It encompasses 26 species that are represented by subscript $m_{\mathrm{vol}}$ such that $m_{\mathrm{vol}} = 1, 2, \ldots, 26$. Also, let us define subscript $l$ as the species group index such that $l = 1$ for coniferous species and $l = 2$ for broadleaf species. The model can be expressed as

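$$
\mathrm{vol}_{ij} = -\beta_1 \frac{h_{ij}}{\mathrm{DBH}_{ij}} + \left(\beta_{2,m_{\mathrm{vol}}} + \beta_{3,m_{\mathrm{vol}}}\,\mathrm{DBH}_{ij}\right) \frac{\pi \cdot \mathrm{DBH}_{ij}^{2}\, h_{ij}}{40} + \mathrm{DBH}_{ij}^{2}\, u_{\mathrm{vol},il} + \varepsilon_{\mathrm{vol},ij} \tag{1a}
$$

$$
u_{\mathrm{vol},i} = \begin{bmatrix} u_{\mathrm{vol},i1} \\ u_{\mathrm{vol},i2} \end{bmatrix} \sim N_2(0, G_{\mathrm{vol}}) \tag{1b}
$$

$$
\varepsilon_{\mathrm{vol},ij} \sim N\left(0, \sigma^{2}_{\mathrm{vol},m_{\mathrm{vol}}}\,\mathrm{DBH}_{ij}^{4}\right) \tag{1c}
$$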

where $\mathrm{vol}_{ij}$ is the volume of tree $j$ in plot $i$ (dm³), $h_{ij}$ is the tree height (m), $\mathrm{DBH}_{ij}$ is the diameter at breast height measured at 1.3 m (cm), $\pi$ is the well-known mathematical constant 3.14159..., $\beta_1$, $\beta_{2,m_{\mathrm{vol}}}$, and $\beta_{3,m_{\mathrm{vol}}}$ are fixed-effect parameters, $u_{\mathrm{vol},il}$ is a plot random effect, and $\varepsilon_{\mathrm{vol},ij}$ is a tree-level residual error term whose variance is conditional on the species and the fourth power of DBH. This fourth power of DBH is required to account for the heteroscedastic pattern of the variances, which was detected during the fitting of this model (Fortin et al. 2007).
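To make the structure of eq. 1a concrete, the following minimal Python sketch evaluates the model for a single tree. The function names and every parameter value are hypothetical placeholders chosen for illustration; they are not the estimates published in Fortin et al. (2007).

```python
import math


def tree_volume_dm3(dbh_cm, height_m, b1, b2, b3, plot_effect=0.0, residual=0.0):
    """Evaluate eq. 1a for one tree and return a volume in dm3.

    b1, b2 and b3 stand for beta_1, beta_2 and beta_3 of the tree's species,
    plot_effect for u_vol,il and residual for epsilon_vol,ij.
    """
    # Volume of a cylinder of diameter DBH (cm) and length h (m), in dm3.
    cylinder = math.pi * dbh_cm ** 2 * height_m / 40.0
    return (-b1 * height_m / dbh_cm
            + (b2 + b3 * dbh_cm) * cylinder
            + dbh_cm ** 2 * plot_effect
            + residual)


def residual_variance(dbh_cm, sigma2):
    """Eq. 1c: the residual variance grows with the fourth power of DBH."""
    return sigma2 * dbh_cm ** 4


# Hypothetical parameter values, for illustration only.
volume = tree_volume_dm3(dbh_cm=20.0, height_m=15.0, b1=5.0, b2=0.4, b3=0.001)
```

With `b1 = 0`, `b2 = 1` and `b3 = 0`, the function reduces to the volume of a cylinder, which is a convenient sanity check on the units (cm, m, dm³).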

Note that the correlations among observations of the trees in a particular plot reduce to plot random effects in this model. Once the random effects are accounted for, the correlations are assumed to be negligible, and therefore, the tree-level residual error terms as shown in eq. 1c are assumed to be independent.


The tree-level residual error term and the vector of plot random effects are assumed to be independent from each other, as in most mixed models (Pinheiro and Bates 2000). Note that the plot random effect ($u_{\mathrm{vol},il}$) is specific to the species group. The original model in Fortin et al. (2007) included a random effect associated with the cruise line, but for the sake of the example, this random effect was merged with the plot random effect. The model was fitted to a data set that encompassed 22 170 stems located in 5016 plots distributed across the entire range of the commercial forest in Quebec.

Model of the height–diameter relationships

The model of the HD relationships was fitted to another data set characterized by repeated tree measurements over time for 20 major species in Quebec. This extensive data set encompassed 11 215 plots for a total of 144 262 height observations. Note that there is no common observation in this data set and the one that served for the fitting of the volume model.

The model involves many explanatory variables: tree DBH, tree species, 1971–2000 mean annual temperature, plot basal area, subecological division, plot drainage, social status index, and the occurrence of human-made or natural disturbance. The model includes a species-specific plot random effect that takes the spatial correlation into account, i.e., the fact that many trees of the same species may have been measured in the same plot. The model also includes a covariance structure, which accounts for the correlations between the measurements of the same tree (Fortin et al. 2009).

In the context of this study, we assumed that the estimator was to be used with nonrepeated plot measurements (i.e., the plots were to be measured once and only once). Consequently, we included the plot random effect in the estimator, but the covariance structure was omitted because the trees were to be measured only once.

For this model, let us define $m_{\mathrm{ht}}$ as the species index such that $m_{\mathrm{ht}} = 1, 2, \ldots, 20$. In matrix notation, the model can be expressed as

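$$
h_{ij} = 1.3 + x_{\mathrm{ht},ij}\gamma + z_{\mathrm{ht},ij}u_{\mathrm{ht},i} + \varepsilon_{\mathrm{ht},ij} \tag{2a}
$$

$$
u_{\mathrm{ht},i} = \begin{bmatrix} u_{\mathrm{ht},i1} \\ u_{\mathrm{ht},i2} \\ \vdots \\ u_{\mathrm{ht},i20} \end{bmatrix} \sim N_{20}(0, G_{\mathrm{ht}}) \tag{2b}
$$

$$
\varepsilon_{\mathrm{ht},ij} \sim N\left(0, \sigma^{2}_{\mathrm{ht},l}\right) \tag{2c}
$$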

where $x_{\mathrm{ht},ij}$ is a vector of explanatory variables for tree $j$ in plot $i$, $\gamma$ is a vector of fixed-effect parameters, $z_{\mathrm{ht},ij}$ is the design vector for the plot random effects, $u_{\mathrm{ht},i}$ is a vector of species-specific plot random effects corresponding to $u_{\mathrm{ht},im_{\mathrm{ht}}}$, and $\varepsilon_{\mathrm{ht},ij}$ is a tree-level residual error term whose variance is different across the species group (coniferous versus broadleaf species). The vector of plot random effects and the vector of tree-level residual error terms are assumed to be independent from each other. For a complete description of this model, see Fortin et al. (2009).

Note that the index $m_{\mathrm{ht}}$ differs from index $m_{\mathrm{vol}}$ by its size. However, the 20 species in the model of HD relationships are all considered in the volume model. In other words, the volume model is slightly more comprehensive in terms of species than is the model of HD relationships. As a result, species imputation is sometimes necessary during the compilation of the provincial inventory for some scarce species that are not considered in the model of HD relationships. For the sake of the example, the effect of this imputation is not considered in this study.

Error propagation

For error propagation, let us assume that both models do not exhibit any lack of fit and that both vectors of parameters ($\beta$ and $\gamma$) are known. For simplification, let us also express model 1a as

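$$
\mathrm{vol}_{ij} = x_{\mathrm{vol},ij}\beta + z_{\mathrm{vol},ij}u_{\mathrm{vol},i} + \varepsilon_{\mathrm{vol},ij} \tag{3}
$$

where $x_{\mathrm{vol},ij}$ is a vector of explanatory variables, $\beta$ is a vector of model parameters, and $z_{\mathrm{vol},ij}$ is the design vector for the plot random effects.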

Predicted heights are commonly substituted for observed ones during volume estimation. From model 2a, let $\hat{h}_{ij}$ be the predicted height for tree $j$ in plot $i$ such that $\hat{h}_{ij} = 1.3 + x_{\mathrm{ht},ij}\hat{\gamma}$. Substituting $\hat{h}_{ij}$ for $h_{ij}$ in model 1a and propagating its associated error yields

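$$
\mathrm{vol}_{ij} = \tilde{x}_{\mathrm{vol},ij}\beta + z_{\mathrm{vol},ij}u_{\mathrm{vol},i} + \varepsilon_{\mathrm{vol},ij} + a_{ij}\beta\left(z_{\mathrm{ht},ij}u_{\mathrm{ht},i} + \varepsilon_{\mathrm{ht},ij}\right) \tag{4a}
$$

$$
a_{ij}\beta = -\frac{\beta_1}{\mathrm{DBH}_{ij}} + \left(\beta_{2,m_{\mathrm{vol}}} + \beta_{3,m_{\mathrm{vol}}}\,\mathrm{DBH}_{ij}\right) \frac{\pi \cdot \mathrm{DBH}_{ij}^{2}}{40} \tag{4b}
$$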

where $\tilde{x}_{\mathrm{vol},ij}$ corresponds to $x_{\mathrm{vol},ij}$ computed with the predicted height instead of the observed one. The term $a_{ij}\beta\left(z_{\mathrm{ht},ij}u_{\mathrm{ht},i} + \varepsilon_{\mathrm{ht},ij}\right)$ can be considered as an additional error on tree volume predictions, which is induced by the use of predicted heights. Note that eq. 4b is obtained by factoring $h_{ij}$ from model 1a.
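Because the fixed part of eq. 1a is linear in height, replacing the observed height by a prediction shifts the volume by exactly $a_{ij}\beta$ times the height error. The minimal Python sketch below illustrates that property; the function names and parameter values are hypothetical and serve only as an illustration.

```python
import math


def height_coefficient(dbh_cm, b1, b2, b3):
    """Eq. 4b: a_ij * beta, the change in volume (dm3) per metre of height."""
    return -b1 / dbh_cm + (b2 + b3 * dbh_cm) * math.pi * dbh_cm ** 2 / 40.0


def fixed_part_volume(dbh_cm, height_m, b1, b2, b3):
    """Fixed part of eq. 1a, written as height times the eq. 4b coefficient."""
    return height_m * height_coefficient(dbh_cm, b1, b2, b3)


# Hypothetical parameter values, for illustration only.
params = dict(b1=5.0, b2=0.4, b3=0.001)
predicted_height = 15.0
height_error = 1.2  # observed height minus predicted height (m)

coef = height_coefficient(20.0, **params)
shift = (fixed_part_volume(20.0, predicted_height + height_error, **params)
         - fixed_part_volume(20.0, predicted_height, **params))
# shift equals coef * height_error up to floating-point rounding.
```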

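Following the previous developments, the variance–covariance matrix of the within-plot volume prediction errors based on predicted heights ($\tilde{V}_{\mathrm{vol},i}$) can be estimated as

$$
\mathrm{var}\left(\mathrm{vol}_i - \tilde{X}_{\mathrm{vol},i}\beta\right) = V_{\mathrm{vol},i} + \mathrm{diag}(A_i\beta) \cdot V_{\mathrm{ht},i} \cdot \mathrm{diag}(A_i\beta) \equiv \tilde{V}_{\mathrm{vol},i} \tag{5}
$$

where $\mathrm{vol}_i$, $\tilde{X}_{\mathrm{vol},i}$, and $A_i$ are matrices and vectors whose rows are, respectively, the $\mathrm{vol}_{ij}$, $\tilde{x}_{\mathrm{vol},ij}$, and $a_{ij}$. Matrices $V_{\mathrm{vol},i}$ and $V_{\mathrm{ht},i}$ correspond to the variance of the volume and height prediction errors, respectively. From models 2a and 3, these matrices can be computed as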

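$$
V_{\mathrm{vol},i} = Z_{\mathrm{vol},i}G_{\mathrm{vol}}Z_{\mathrm{vol},i}^{T} + R_{\mathrm{vol},i} \tag{6a}
$$

$$
V_{\mathrm{ht},i} = Z_{\mathrm{ht},i}G_{\mathrm{ht}}Z_{\mathrm{ht},i}^{T} + R_{\mathrm{ht},i} \tag{6b}
$$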


where $Z_{\mathrm{vol},i}$ and $Z_{\mathrm{ht},i}$ are two design matrices with typical rows $z_{\mathrm{vol},ij}$ and $z_{\mathrm{ht},ij}$, respectively, and $R_{\mathrm{vol},i}$ and $R_{\mathrm{ht},i}$ are two $p_i \times p_i$ diagonal matrices whose elements are the variances of the tree-level residual error terms as defined in eqs. 1c and 2c.
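The matrix operations in eqs. 5, 6a, and 6b can be sketched in a few lines of plain Python. The example below is only an illustration under simplifying assumptions: the plot holds two trees of the same species, so each design matrix has a single column, the height random effect is taken as a random intercept, and all numerical values are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def mixed_model_covariance(z, g, r_diag):
    """Eqs. 6a and 6b: V = Z G Z^T + R, with R diagonal."""
    v = matmul(matmul(z, g), transpose(z))
    for j, r in enumerate(r_diag):
        v[j][j] += r
    return v


def propagate_height_errors(v_vol, v_ht, a_beta):
    """Eq. 5: V_vol + diag(A beta) V_ht diag(A beta)."""
    n = len(a_beta)
    return [[v_vol[j][k] + a_beta[j] * v_ht[j][k] * a_beta[k]
             for k in range(n)] for j in range(n)]


# Two trees of the same species in one plot; all values are hypothetical.
dbh = [20.0, 30.0]
z_vol = [[d ** 2] for d in dbh]        # eq. 1a scales the random effect by DBH^2
g_vol = [[1e-4]]
r_vol = [0.002 * d ** 4 for d in dbh]  # eq. 1c
z_ht = [[1.0], [1.0]]                  # assumed random intercept
g_ht = [[4.0]]
r_ht = [1.0, 1.0]                      # eq. 2c
a_beta = [10.0, 20.0]                  # eq. 4b coefficients, dm3 per metre

v_vol = mixed_model_covariance(z_vol, g_vol, r_vol)
v_ht = mixed_model_covariance(z_ht, g_ht, r_ht)
v_tilde = propagate_height_errors(v_vol, v_ht, a_beta)
```

The off-diagonal elements of `v_tilde` are nonzero because both trees share the same plot random effects, which is what later makes the plot-level variance larger than the sum of the tree-level variances.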

Note that eq. 5 assumes that the model parameters included in vectors $\beta$ and $\gamma$ are known. In practice, however, these vectors are estimated. Under the assumption that the tree-level residual errors and the random effects are normally distributed in both models, the parameter estimates follow multivariate normal (MVN) distributions (Johnston and Dinardo 1997, p. 87). These MVN distributions can be expressed as $\hat{\beta} \sim \mathrm{MVN}(\beta, \Omega_{\mathrm{vol}})$ and $\hat{\gamma} \sim \mathrm{MVN}(\gamma, \Omega_{\mathrm{ht}})$, and using matrix and mathematical expectation properties (see Johnston and Dinardo 1997 and Appendices A and B), the additional variance components can be implemented as

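$$
\tilde{V}_{\mathrm{vol},i} = V_{\mathrm{vol},i} + \tilde{X}_{\mathrm{vol},i}\Omega_{\mathrm{vol}}\tilde{X}_{\mathrm{vol},i}^{T} + \mathrm{diag}(A_i\hat{\beta})\left(V_{\mathrm{ht},i} + X_{\mathrm{ht},i}\Omega_{\mathrm{ht}}X_{\mathrm{ht},i}^{T}\right)\mathrm{diag}(A_i\hat{\beta}) + \left(A_i\Omega_{\mathrm{vol}}A_i^{T}\right) \circ \left(V_{\mathrm{ht},i} + X_{\mathrm{ht},i}\Omega_{\mathrm{ht}}X_{\mathrm{ht},i}^{T}\right) \tag{7}
$$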

where the symbol "$\circ$" denotes the entrywise product (also referred to as the Hadamard product) between two matrices, i.e., the product of the corresponding elements.

The term $V_{\mathrm{vol},i} + \tilde{X}_{\mathrm{vol},i}\Omega_{\mathrm{vol}}\tilde{X}_{\mathrm{vol},i}^{T}$ in eq. 7 represents the basic variance, as if tree height predictions were not considered as predictions but rather as observed heights. The term $\mathrm{diag}(A_i\hat{\beta})\left(V_{\mathrm{ht},i} + X_{\mathrm{ht},i}\Omega_{\mathrm{ht}}X_{\mathrm{ht},i}^{T}\right)\mathrm{diag}(A_i\hat{\beta})$ is the variance component induced by the use of predicted heights. The term $\left(A_i\Omega_{\mathrm{vol}}A_i^{T}\right) \circ \left(V_{\mathrm{ht},i} + X_{\mathrm{ht},i}\Omega_{\mathrm{ht}}X_{\mathrm{ht},i}^{T}\right)$ is a hybrid variance component, i.e., a variance component that is due to both volume prediction and the use of predicted heights. Henceforth, we will refer to those three terms as the basic volume component, the predicted height-induced component, and the hybrid component, respectively. The developments behind eq. 7 are annexed to this paper.

Note that the analytical estimator in eq. 7 takes into account the errors on the parameter estimates (through $\Omega_{\mathrm{ht}}$ and $\Omega_{\mathrm{vol}}$) and the plot random effects and the tree-level residual errors (through $V_{\mathrm{ht},i}$ and $V_{\mathrm{vol},i}$) in both the model of HD relationships and the volume model. Measurement errors, such as errors on species identifications and imputations or DBH measurements, are assumed to be negligible.
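The three variance components of eq. 7 can be evaluated mechanically once the matrices are available. The sketch below, in plain Python, simply transcribes the three terms of the equation; it is not the authors' Java implementation, the helper names are hypothetical, and the single-tree inputs are made-up numbers chosen so that the arithmetic is easy to follow.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def quad(x, omega):
    """Quadratic form X Omega X^T."""
    return matmul(matmul(x, omega), transpose(x))


def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def hadamard(a, b):
    """Entrywise (Hadamard) product of two matrices of equal size."""
    return [[p * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def diag_sandwich(d, m):
    """diag(d) M diag(d) without building the diagonal matrices."""
    n = len(d)
    return [[d[j] * m[j][k] * d[k] for k in range(n)] for j in range(n)]


def variance_components(v_vol, x_vol, omega_vol, a, beta_hat, v_ht, x_ht, omega_ht):
    """Return the basic, height-induced and hybrid terms of eq. 7."""
    w = add(v_ht, quad(x_ht, omega_ht))  # V_ht + X_ht Omega_ht X_ht^T
    a_beta = [sum(aj * b for aj, b in zip(row, beta_hat)) for row in a]
    basic = add(v_vol, quad(x_vol, omega_vol))
    height_induced = diag_sandwich(a_beta, w)
    hybrid = hadamard(quad(a, omega_vol), w)
    return basic, height_induced, hybrid


# Hypothetical single-tree, single-parameter example.
basic, height_induced, hybrid = variance_components(
    v_vol=[[2.0]], x_vol=[[3.0]], omega_vol=[[0.5]],
    a=[[2.0]], beta_hat=[1.5],
    v_ht=[[1.0]], x_ht=[[1.0]], omega_ht=[[0.25]])
v_tilde = add(add(basic, height_induced), hybrid)
```

Keeping the three terms separate, rather than returning only their sum, makes it possible to report the share of each component in the total variance.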

Plot volume predictions and variance of prediction errors

The volume for plot $i$, denoted $y_i$, is commonly estimated as the sum of the individual predicted volumes in plot $i$, i.e., $\hat{y}_i = \sum_{j=1}^{p_i} \tilde{x}_{\mathrm{vol},ij}\hat{\beta}$. Because the individual volume prediction errors follow normal distributions, the prediction error of the plot volume is normally distributed, with its variance estimated as the sum of the elements in matrix $\tilde{V}_{\mathrm{vol},i}$, i.e., $\mathrm{var}(y_i - \hat{y}_i) = \mathrm{sum}(\tilde{V}_{\mathrm{vol},i})$ where $\mathrm{sum}(\cdot)$ is the sum of the elements of the matrix argument. The covariances between the trees of plot $i$ are represented by the off-diagonal elements in matrix $\tilde{V}_{\mathrm{vol},i}$. By summing all of the elements of $\tilde{V}_{\mathrm{vol},i}$, the estimated variance accounts for the correlations among observations of the trees in plot $i$.
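The effect of the off-diagonal elements on the plot-level variance can be illustrated with a small Python sketch. The matrix below is hypothetical, and the conversion to a per-hectare basis assumes a 400 m² plot with tree volumes expressed in dm³.

```python
import math


def plot_variance(v_tilde):
    """Variance of the plot volume prediction error: sum of all elements."""
    return sum(sum(row) for row in v_tilde)


# Hypothetical two-tree covariance matrix of volume prediction errors (dm6).
v_tilde = [[502.0, 800.0],
           [800.0, 2002.0]]

total = plot_variance(v_tilde)
# Summing the diagonal only would ignore the within-plot covariances.
diagonal_only = sum(v_tilde[j][j] for j in range(len(v_tilde)))

std_error_dm3 = math.sqrt(total)
# Assumed 400 m2 plot: 25 plots per hectare, and 1000 dm3 per m3.
std_error_m3_per_ha = std_error_dm3 * 25.0 / 1000.0
```

In this made-up example, ignoring the covariances would understate the variance by 1600 dm⁶, i.e., by twice the off-diagonal element.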

The estimator was implemented in software using the Java object-oriented programming language. This software is based on the Java runtime environment 6 (JRE 6) (available at java.sun.com/javase/downloads/index.jsp).

Case study

Simulation

Plot volumes are rarely observed, except in small-scale studies, because trees have to be felled. Moreover, a cross-validation was impossible because the models had been fitted to two independent data sets having different sampling designs. For instance, many explanatory variables required in the model of HD relationships were not available in the data set that served to fit the volume model. Consequently, we had no data set to serve as a benchmark to test the analytical estimator. On the other hand, if both the model of HD relationships and the volume model are assumed to exhibit no lack of fit, the proposed estimator can be tested through a Monte Carlo simulation. The process consists in simulating the prediction errors for both the model of HD relationships and the volume model for each tree a large number of times such that it reproduces the variability inherent to plot volume estimation (Efron and Tibshirani 1993).

In this case study, three types of errors were simulated for both the model of HD relationships and the volume model: the errors in the parameter estimates, the plot random effects, and the tree-level residual errors. According to the distributional assumptions, these errors follow MVN distributions. For instance, the errors in the parameter estimates of the model of HD relationships and the volume model follow MVN distributions, with means 0 and covariances $\Omega_{\mathrm{ht}}$ and $\Omega_{\mathrm{vol}}$, respectively. The tree-level residual errors and the plot random effects of both models can also be assumed to follow joint MVN distributions with means 0 in both cases and covariances $V_{\mathrm{ht},i}$ and $V_{\mathrm{vol},i}$, respectively. All three sources of errors were assumed to be independent of each other. In other words, the errors in the parameter estimates were simulated independently from the random effects and the tree-level residual errors.

In practice, the covariance matrices are unknown, and consequently, they are replaced by their respective best estimates, which were obtained by maximum likelihood estimation (see Fortin et al. 2007, 2009). A Cholesky decomposition was then used to draw random MVN deviates from such MVN distributions. An example is annexed to this paper.
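The Cholesky approach mentioned above can be sketched as follows. This is a generic illustration in plain Python, not the example annexed to the paper: the covariance matrix is hypothetical, and the function names are introduced here. If $L$ is the lower-triangular factor such that $LL^{T}$ equals the covariance matrix and $z$ is a vector of independent standard normal deviates, then the mean plus $Lz$ follows the desired MVN distribution.

```python
import math
import random


def cholesky(a):
    """Lower-triangular factor L of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_mvn(mean, low, rng):
    """One MVN deviate: mean + L z, with z a vector of standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


# Hypothetical covariance matrix, for illustration only.
cov = [[4.0, 2.0],
       [2.0, 3.0]]
low = cholesky(cov)
rng = random.Random(2010)  # fixed seed so that the draws are reproducible
draws = [draw_mvn([0.0, 0.0], low, rng) for _ in range(20000)]
```

The sample covariance of `draws` should approach `cov` as the number of draws grows, which offers a simple way to check the factorization.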

Once the simulation is over, a plot volume prediction and its variance can be obtained from the realizations of the plot volumes, i.e., the simulated distribution of plot volumes. Normally, a large number of runs (e.g., 10 000 or 50 000) is required to ensure a proper simulated distribution. Once the plot volume prediction and its variance are obtained, they can be compared with those calculated through the analytical estimator. The analytical estimator also makes it possible to assess the magnitude of the three variance components, namely the basic volume, predicted height-induced, and hybrid components.
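The comparison between a simulated and an analytical variance can be illustrated with a toy example in plain Python. The sketch below is a deliberately simplified setting, not the case study itself: the parameters are treated as known (so that eq. 5 applies), the plot holds two trees of the same species, and every numerical value is hypothetical.

```python
import random

# Hypothetical two-tree plot; none of these values come from the fitted models.
Z_VOL = [400.0, 900.0]    # DBH^2 multipliers of the volume plot random effect
A_BETA = [10.0, 20.0]     # eq. 4b coefficients (dm3 per metre of height)
G_VOL, G_HT = 1e-4, 4.0   # variances of the plot random effects
R_VOL = [320.0, 1620.0]   # residual variances of the volume model
R_HT = 1.0                # residual variance of the HD model


def analytical_variance():
    """Sum of the elements of the eq. 5 matrix for this plot."""
    return (sum(Z_VOL) ** 2 * G_VOL + sum(R_VOL)
            + sum(A_BETA) ** 2 * G_HT + sum(a * a for a in A_BETA) * R_HT)


def simulate_plot_error(rng):
    """One Monte Carlo realization of the plot volume prediction error."""
    u_vol = rng.gauss(0.0, G_VOL ** 0.5)  # shared by both trees
    u_ht = rng.gauss(0.0, G_HT ** 0.5)    # shared by both trees
    error = 0.0
    for z, a, r in zip(Z_VOL, A_BETA, R_VOL):
        eps_vol = rng.gauss(0.0, r ** 0.5)
        eps_ht = rng.gauss(0.0, R_HT ** 0.5)
        error += z * u_vol + eps_vol + a * (u_ht + eps_ht)
    return error


def monte_carlo_variance(n_iter, seed=1):
    """Sample variance of n_iter simulated plot volume prediction errors."""
    rng = random.Random(seed)
    errors = [simulate_plot_error(rng) for _ in range(n_iter)]
    mean = sum(errors) / n_iter
    return sum((e - mean) ** 2 for e in errors) / (n_iter - 1)


v_analytical = analytical_variance()
v_mc = monte_carlo_variance(20000)
relative_difference = (v_analytical - v_mc) / v_mc
```

With 20 000 iterations, the simulated variance of a normally distributed error carries a relative standard error of about 1%, so relative differences of a few percent between the two estimates are expected from simulation noise alone.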


Data

To test the analytical estimator, three typical Quebec forest types were first selected: sugar maple (Acer saccharum Marsh.) – yellow birch (Betula alleghaniensis Britt.), yellow birch – balsam fir (Abies balsamea (L.)), and balsam fir – black spruce (Picea mariana (Mill.) BSP). These forest types refer to the stand composition when it reaches a late-successional stage. The forest type is assumed to be constant in time for a particular plot. These forest types are described in Saucier et al. (1998).

For simplification, we will refer to the aforementioned forest types as hardwood, mixedwood, and softwood types, respectively. The hardwood and the softwood forest types are mostly dominated (more than 75% of total crown cover) by species that belong to the broadleaf and coniferous groups, respectively. Also, the mixedwood forest type comprises a mix of species that belong to both species groups in proportions that range from 25% to 75% of total crown cover.

Thirty plots were selected from the provincial network of permanent plots. This network encompasses more than 12 000 permanent plots randomly distributed throughout the province of Quebec, mainly on Crown lands. It was established in the 1970s for monitoring purposes and is thought to be representative of Quebec’s commercial forest. The plots of this network are 400 m² fixed-radius plots. At the plot level, many variables are assessed, such as the drainage class, the dominant height, and the forest type. Within the plots, all trees with a DBH ≥ 9.1 cm are measured. For these trees, the species and the status (alive or dead) are recorded and the DBH is measured to the nearest millimetre using a diameter tape. The measurements are scheduled on a 10-year basis. Further details about the inventory procedures can be found in Ministère des Ressources naturelles et de la Faune (2009).

Within each one of the three typical forest types that we selected, 30 plots were randomly chosen regardless of their geographical locations, densities, ages, or management history. Once the plots were selected, we only used the first measurement of each plot for our case study. Note that these three subsets of plots are in no way representative of the three forest types across Quebec’s commercial forests, nor was it the purpose of this selection. We only wanted to test the estimator with an array of conditions in these three forest types. A summary of the plot data used in this case study is shown in Table 1.

Evaluation

The plot volume prediction was estimated through a 50 000-iteration Monte Carlo simulation for each plot. The plot volume prediction and its error variance obtained through the Monte Carlo simulation were compared with those of the analytical estimator.

To test the performance of the proposed analytical estimator, we compared the elapsed times required to compute the plot volume predictions and their error variances using the estimator and Monte Carlo simulations based on different numbers of iterations. To do so, we used the aforementioned software developed in the Java programming language.

Results

The comparison between the volume predictions and error variances of the Monte Carlo and the analytical estimators is shown in Fig. 1. The relative differences are computed as the differences between the analytical estimator and the Monte Carlo estimator divided by the Monte Carlo estimator. There was no evidence of departures, as the analytical predictions were very close to Monte Carlo predictions in all cases (Figs. 1a, 1c, and 1e). For the variances, some departures were observed (Figs. 1b, 1d, and 1f). However, none of these exceeded the [–2%, +2%] range and no clear pattern could be distinguished.
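In symbols, with $\hat{\theta}$ standing for either the plot volume prediction or its error variance (the symbols $\hat{\theta}$ and $\Delta_{\mathrm{rel}}$ are introduced here for illustration only), the relative difference described above reads

$$
\Delta_{\mathrm{rel}} = \frac{\hat{\theta}_{\mathrm{analytical}} - \hat{\theta}_{\mathrm{MC}}}{\hat{\theta}_{\mathrm{MC}}},
$$

so that a value of +0.02 means that the analytical estimate is 2% larger than the Monte Carlo estimate.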

For both the analytical method and the Monte Carlo simulations, the plot volume is predicted as the sum of the individual tree volume predictions. As a consequence, no large difference should be expected on the plot volume predictions. On the other hand, the analytical estimator provides an estimate of the error variance based on algebra and calculus, whereas the Monte Carlo process emulates the variability through a large number of iterations. The results show that the analytical variance estimates are consistent with simulated ones for the three forest types. There is no evidence that the analytical estimator does not correctly account for the error propagation in the volume estimation.

Table 1. Summary of the selected plots (each forest type encompasses 30 plots).

| Forest type | Variable | Minimum | Mean | Maximum |
|---|---|---|---|---|
| Hardwood | Basal area (m²·ha⁻¹) | 4.1 | 23.4 | 50.9 |
| Hardwood | Stem density (stem·ha⁻¹) | 200 | 698 | 1150 |
| Hardwood | Predicted volume (m³·ha⁻¹) | 20.9 | 168.0 | 543.3 |
| Mixedwood | Basal area (m²·ha⁻¹) | 6.5 | 22.2 | 34.6 |
| Mixedwood | Stem density (stem·ha⁻¹) | 275 | 974 | 1825 |
| Mixedwood | Predicted volume (m³·ha⁻¹) | 22.9 | 125.5 | 232.1 |
| Softwood | Basal area (m²·ha⁻¹) | 1.0 | 18.1 | 40.6 |
| Softwood | Stem density (stem·ha⁻¹) | 125 | 1038 | 2125 |
| Softwood | Predicted volume (m³·ha⁻¹) | 0.8 | 94.8 | 247.2 |

Based on the analytical estimator, the average contribution of each variance component to the total variance of the prediction errors is shown in Fig. 2. Basically, the variance component associated with predicted heights was the most important one, which accounted for 60%–70% of the total variance. The basic volume variance component came next, with proportions of around 30%–40%, and finally, the hybrid variance component was not null, but negligible. These results indicate that using predicted heights instead of observed ones at least doubles and may even triple the variances of the plot volume predictions in the three forest types.

The elapsed times required to compute the analytical estimator and to run the Monte Carlo simulations are shown in Table 2. Note that the ratio between the time to run a Monte Carlo simulation and the time required by the analytical estimator increases with the number of iterations. In terms of time, the analytical estimator is clearly more efficient than any of the Monte Carlo simulations. On average, it requires slightly more than one tenth of a second per plot to provide a variance estimate. Depending on the number of iterations, the Monte Carlo simulations take from 6 to 259 times longer than does the analytical estimator.
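As a quick arithmetic check, the time ratios and the per-plot time can be recomputed from the elapsed times reported in Table 2. The Python sketch below uses only those published, rounded values; the helper name `to_seconds` and the dictionary keys are assumptions introduced for illustration.

```python
def to_seconds(clock):
    """Convert a 'minutes:seconds' string into a number of seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)

# Elapsed times as reported in Table 2 (minutes:seconds) for the 90 plots
elapsed = {
    "Analytical estimator": "0:11",
    "1000-iteration Monte Carlo": "1:08",
    "10 000-iteration Monte Carlo": "10:08",
    "50 000-iteration Monte Carlo": "47:37",
}

reference = to_seconds(elapsed["Analytical estimator"])
ratios = {method: to_seconds(clock) / reference for method, clock in elapsed.items()}
seconds_per_plot = reference / 90   # 90 selected plots

for method, ratio in ratios.items():
    print(f"{method}: {ratio:.1f}")
print(f"Analytical estimator: {seconds_per_plot:.3f} s per plot")
```

Because the published times are rounded to the second, ratios recomputed this way could differ slightly from those derived from unrounded timings.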

Discussion

Predicting volumes for numerous plots in a large forest territory like the province of Quebec, Canada, is challenging from a statistical point of view because many sources of error exist. Among others, let us mention the sampling errors, measurement errors, and prediction errors on both tree height and tree volume. In this study, we focused on the prediction errors and managed to design an analytical estimator that enables the propagation of the prediction errors associated with height predictions from the tree to the plot level. Under the assumption that the model of HD relationships and the volume model that compose the estimator do not exhibit any lack of fit, the analytical estimator provides plot volume predictions with apparently consistent error variances.

Fig. 1. Relative differences between the Monte Carlo estimator and the analytical estimator (relative differences computed as the differences between the analytical and Monte Carlo estimators divided by the Monte Carlo estimators).


Two factors contributed to facilitate the derivation of this estimator. First, there was no transformation of tree height in the volume model. Consequently, factoring this variable from the volume model was trivial. For most transformations, such as logarithmic or quadratic, this factorization would have had to rely on a linear approximation such as a first-degree Taylor series. Second, the model of HD relationships directly produces a predicted height with an additive error term. If the logarithm of tree height had been predicted instead, the error propagation throughout the volume model would have had to take into account biases due to the back transformation as well as a multiplicative error term. An analytical estimator for such complex cases remains to be developed. Meanwhile, the proposed estimator can be applied to any linear model of HD relationships and volume model as long as tree height can be factored from the volume model.

The analysis of the variance components revealed that the variance due to height prediction errors accounts for the largest proportion of the total variance. A common practice consists in considering predicted heights as if they were observed. By doing so, plot volume predictions may exhibit no lack of fit. However, the error variances of the plot volume predictions are largely underestimated because they only include the basic volume component, which accounts for only 30%–40% of the total variance. So, using predicted heights instead of observed ones at least doubles the variance of the plot volume predictions. At the tree level, similar results have been reported by Lappi (2006) when estimating tree volumes using a stem profile model and predicted tree heights.
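The "at least doubles" statement follows directly from the reported shares: if the basic volume component represents a share s of the total variance, the total variance is 1/s times the basic component. The short Python sketch below works through this arithmetic. The function names are assumptions, and the interval-width ratio additionally assumes confidence intervals whose width is proportional to the standard error.

```python
def inflation_factor(basic_share):
    """Total variance divided by the basic volume component."""
    if not 0.0 < basic_share <= 1.0:
        raise ValueError("basic_share must lie in (0, 1]")
    return 1.0 / basic_share

def interval_width_ratio(basic_share):
    """Width of an interval that ignores height errors, relative to the full one."""
    return basic_share ** 0.5

for share in (0.40, 0.30):   # range of the basic component reported in the Results
    print(f"basic share {share:.0%}: variance x{inflation_factor(share):.2f}, "
          f"naive interval width x{interval_width_ratio(share):.2f}")
```

With shares of 30%–40%, the factor lies between 2.5 and about 3.3, which is consistent with the doubling to tripling described in the Results.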

This result was expected because models of HD relationships usually have a large variance that may range, for instance, from 3 to 10 (e.g., López Sánchez et al. 2003). Such a variance implies prediction errors of several metres in some cases. For example, the model of HD relationships in this study had root mean square prediction errors ranging from 1.7 to 3.1 m (Fortin et al. 2009, table 6). Moreover, the plot random effects in this model indicate that the prediction errors do not fully compensate each other at the plot level.

Forest managers usually have more interest in stand- or stratum-level volume estimates. These estimates are based on the aggregation of sample plots. The errors associated with height predictions in the sample plots propagate at the stand level. As a result, height predictions also contribute to the uncertainty of stand volume estimates, in addition to other sources of error, such as the sampling and measurement errors (Gertner 1990). Actually, considering plot volume predictions as observed values results in overoptimistic confidence intervals, i.e., narrower intervals, for the stand volume estimates because the uncertainty due to height predictions is not taken into account. The degree to which the estimated variances of the stand volume estimates are biased depends on the relative contribution of each source of error to the total uncertainty at the stand level. If the sampling variance is overwhelmingly higher than the average variance of the plot volume predictions, the bias may be negligible. On the other hand, the variance might be underestimated for stands with small sampling variance. Gertner (1990) reported that errors of tree volume predictions had a small effect on the total variance of the stand volume estimate in a red pine (Pinus resinosa Ait.) stand in Michigan, USA. However, prediction errors for tree volume were assumed to be uncorrelated, and this may have contributed to underestimating the real contribution of the prediction error to the total variance of the stand volume estimate. In any case, the error variance of the plot volume predictions has to be estimated to determine whether or not it is negligible when compared with other sources of error. The proposed estimator may serve this purpose.

Fig. 2. Average contribution of each variance component to the total variance of the plot volume predictions.


As shown in Fig. 2, height prediction errors account for the lion’s share of the error variance of the plot volume predictions. Height prediction accuracy can be improved by the measurement of a few tree heights in each plot. In fact, tree heights in a particular plot often tend to be correlated (e.g., Robinson and Wykoff 2004; Fortin et al. 2009). Observed tree heights can be considered as prior information and can be used to obtain a best linear unbiased predictor (BLUP) for the plot random effect. Then the BLUP can be used in turn to obtain enhanced predictions for the other trees of this particular plot as suggested by Lappi (2006). Such a process usually decreases the variance of the prediction errors (e.g., Robinson and Wykoff 2004). In the model of HD relationships that we used in this case study, the plot random effect accounted for about 25%–50% of the variance of height predictions. Consequently, it could reasonably be expected that using observed tree heights to compute BLUP would decrease the variance of the plot volume predictions. This remains to be tested, and the estimator proposed in this paper might eventually serve as a framework for this.
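The mechanism can be sketched in Python for the simplest possible case. The sketch assumes a single plot-level random intercept, known variance parameters, and no uncertainty in the fixed effects. These are simplifying assumptions made here for illustration, and the random-effect structure of the actual model of HD relationships may differ. The function name and all numerical values are hypothetical.

```python
def blup_random_intercept(observed, fixed_pred, var_plot, var_resid):
    """BLUP of a plot random intercept from a few observed tree heights.

    observed   : measured heights (m) in the plot
    fixed_pred : fixed-effects height predictions (m) for the same trees
    var_plot   : variance of the plot random intercept (assumed known)
    var_resid  : residual variance (assumed known)
    Returns the BLUP and the variance of its prediction error.
    """
    n = len(observed)
    if n == 0:
        return 0.0, var_plot             # no information: random effect set to 0
    mean_resid = sum(o - p for o, p in zip(observed, fixed_pred)) / n
    shrinkage = n * var_plot / (n * var_plot + var_resid)
    blup = shrinkage * mean_resid
    blup_error_var = var_plot * (1.0 - shrinkage)
    return blup, blup_error_var

# Hypothetical plot: three measured heights and their fixed-effects predictions
observed = [18.2, 20.1, 16.5]
fixed_pred = [17.0, 19.0, 16.0]
blup, blup_error_var = blup_random_intercept(observed, fixed_pred,
                                             var_plot=2.0, var_resid=4.0)
print(f"BLUP = {blup:.2f} m, remaining plot-level variance = {blup_error_var:.2f}")
```

In this toy case, three measured heights reduce the plot-level variance from 2.0 to 0.8, which illustrates why measuring a few heights per plot may reduce the height-related variance component.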

The analytical estimator does not take into account possible measurement errors. Errors in tree and stand variables, such as DBH, tree density, basal area, or site index, are known to propagate throughout growth simulations and inflate the prediction variance (Gertner and Dzialowy 1984; McRoberts et al. 1994; Mäkinen et al. 2009). For the proposed estimator, errors in DBH measurements are likely to inflate the error variance of the plot volume predictions as well. In practice, the model of HD relationships and the general volume model are mostly used with the provincial inventory data, for which DBH is reported in 2 cm diameter classes. Using the median of the 2 cm diameter class clearly implies a measurement error. The robustness of the estimator with respect to the different measurement errors is, however, beyond the scope of this study and remains to be tested.

The analytical estimator relies on the assumption of a null covariance between height and volume prediction errors at both the tree and the plot levels. In reality, the covariances between the plot random effects and the covariances between the tree-level residual error terms could be nonnull, i.e., $\mathrm{Cov}(u_{\mathrm{ht},i}, u_{\mathrm{vol},i}) \neq 0$ and $\mathrm{Cov}(\varepsilon_{\mathrm{ht},ij}, \varepsilon_{\mathrm{vol},ij}) \neq 0$. Because the model of HD relationships and the volume model were fitted to two data sets independent of each other, this assumption cannot be tested for now. Positive correlations would result in an increased error variance for the plot volume predictions. This correlation remains to be evaluated for Quebec. Meanwhile, it represents a limitation of the proposed estimator.
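The direction of this effect follows from the variance of a sum. As a hedged illustration for a single tree, let $e_{\mathrm{vol}}$ denote the part of the volume error that comes from the volume model and $e_{\mathrm{ht}}$ the part that comes from the height prediction (both symbols are introduced here for illustration only):

```latex
\mathrm{Var}(e_{\mathrm{vol}} + e_{\mathrm{ht}})
  = \mathrm{Var}(e_{\mathrm{vol}}) + \mathrm{Var}(e_{\mathrm{ht}})
  + 2\,\mathrm{Cov}(e_{\mathrm{vol}}, e_{\mathrm{ht}})
```

The estimator sets the last term to zero, so a positive covariance would make the true variance larger than the estimated one, and a negative covariance would make it smaller.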

Many authors propose stem profile models as an alternative to general volume models. During the last three decades, these models have become increasingly popular in the literature (e.g., Kozak 1988; Parresol and Thomas 1996; Garber and Maguire 2003; Lappi 2006). Compared with these models, a general volume model may appear simplistic. However, considering the number of species involved here, the presence of broadleaf species, and the extent of the commercial forest in Quebec, using stem profile models for the whole territory and for all species is still out of reach. Even if some stem profile models are being developed for some coniferous species, a general volume model is still required for the remaining species, especially broadleaf ones. In this context, the estimator developed in this study may be useful.

Conclusions

This study proposes an analytical estimator for the plot volume. This estimator propagates the errors associated with height predictions from the tree to the plot level. The estimator is relatively easy to implement and is much faster than a traditional Monte Carlo approach in providing a plot volume prediction and its uncertainty. Moreover, the estimator makes it possible to distinguish three components that compose the uncertainty of plot volume predictions: a first component related to the volume model, a second one related to the use of predicted heights instead of observed ones, and a third one, which is a hybrid component between the two models.

On the other hand, this estimator assumes no lack of fit for the model of HD relationships and the volume model. It also assumes independence between the models as regards their coefficient estimates, their random effects, and their tree-level residual errors. The estimator may be sensitive to measurement errors, particularly to DBH measurement and species identification errors. Its robustness vis-à-vis such measurement errors remains to be evaluated.

Acknowledgements

Thanks to Marie-Claude Lambert (Direction de la recherche forestière, Ministère des Ressources naturelles et de la Faune du Québec), Sylvain Bernier (Direction des inventaires forestiers, Ministère des Ressources naturelles et de la Faune du Québec), Jean-Daniel Bontemps (INRA-ENGREF, France), and two anonymous reviewers for their helpful comments on an early version of this paper.

Table 2. Elapsed times to compute the analytical estimator or to run the Monte Carlo simulations for the 90 selected plots (using an Intel Dual Core P8600 at 2.40 GHz).

Method                                      Elapsed time (minutes:seconds)   Time ratio*
Analytical estimator                                     0:11                     1.0
1000-iteration Monte Carlo simulations                   1:08                     6.2
10 000-iteration Monte Carlo simulations                10:08                    55.3
50 000-iteration Monte Carlo simulations                47:37                   259.7

*Based on the analytical estimator.

References

Assman, E. 1961. The principles of forest yield study. Studies in the organic production, structure, increment and yield of forest stands. Pergamon Press, New York.

Efron, B., and Tibshirani, R.J. 1993. An introduction to the bootstrap. Monographs on statistics and applied probability 57. Chapman and Hall/CRC, New York.

Fortin, M., DeBlois, J., Bernier, S., and Blais, G. 2007. Mise au point d’un tarif de cubage général pour les forêts québécoises: une approche pour mieux évaluer l’incertitude associée aux prévisions. For. Chron. 83: 754–765.

Fortin, M., Bernier, S., Saucier, J.-P., and Labbé, F. 2009. Une relation hauteur–diamètre générale pour 20 espèces commerciales de la province de Québec avec une attention particulière à l’influence des variables climatiques et stationnelles. Mémoire de recherche forestière n° 153. Gouvernement du Québec, Ministère des Ressources naturelles et de la Faune, Direction de la recherche forestière, Québec, Que.

Garber, S.M., and Maguire, D.A. 2003. Modeling stem taper of three central Oregon species using nonlinear mixed effects models and autoregressive error structures. For. Ecol. Manag. 179(1–3): 507–522. doi:10.1016/S0378-1127(02)00528-5.

Gertner, G. 1987. Approximating precision in simulation projections: an efficient alternative to Monte Carlo methods. For. Sci. 33: 230–239.

Gertner, G.Z. 1990. The sensitivity of measurement error in stand volume estimation. Can. J. For. Res. 20(6): 800–804. doi:10.1139/x90-105.

Gertner, G.Z., and Dzialowy, P.J. 1984. Effects of measurement errors on an individual tree-based growth projection system. Can. J. For. Res. 14(3): 311–316. doi:10.1139/x84-057.

Johnston, J., and Dinardo, J. 1997. Econometric methods. 4th ed. McGraw-Hill, New York.

Jordan, L., Berenhaut, K., Souter, R., and Daniels, R.F. 2005. Parsimonious and completely compatible taper, total, and merchantable volume models. For. Sci. 51: 578–584.

Kozak, A. 1988. A variable-exponent taper equation. Can. J. For. Res. 18(11): 1363–1368. doi:10.1139/x88-213.

Lappi, J. 2006. A multivariate, nonparametric stem-curve prediction method. Can. J. For. Res. 36(4): 1017–1027. doi:10.1139/X05-305.

López Sánchez, C.A., Gorgoso Varela, J., Castedo Dorado, F., Rojo Alboreca, A., Rodríguez Soalleiro, R., Álvarez González, J.G., and Sánchez Rodríguez, F. 2003. A height–diameter model for Pinus radiata D. Don in Galicia (Northwest. Spain). Ann. For. Sci. 60: 237–245. doi:10.1051/forest:2003015.

Mäkinen, A., Holopainen, M., Kangas, A., and Rasinmäki, J. 2009. Propagating the errors of initial forest variables through stand- and tree-level growth simulators. Eur. J. For. Res. doi:10.1007/s10342-009-0288-0.

McRoberts, R.E., Hahn, J.T., Hefty, G.J., and van Cleve, J.R. 1994. Variation in forest inventory field measurements. Can. J. For. Res. 24(9): 1766–1770. doi:10.1139/x94-228.

Ministère des Ressources naturelles et de la Faune. 2009. Normes d’inventaire forestier. Placettes-échantillons permanentes, Édition 2009, Version provisoire. Gouvernement du Québec, Ministère des Ressources naturelles et de la Faune, Direction des inventaires forestiers, Québec, Que.

Mowrer, H.T. 1991. Estimating components of propagated variance in growth simulation model projections. Can. J. For. Res. 21(3): 379–386. doi:10.1139/x91-047.

Mowrer, H.T., and Frayer, W.E. 1986. Variance propagation in growth and yield projections. Can. J. For. Res. 16(6): 1196–1200. doi:10.1139/x86-213.

Parresol, B.R., and Thomas, C.E. 1996. A simultaneous density-integral system for estimating stem profile and biomass: slash pine and willow oak. Can. J. For. Res. 26(5): 773–781. doi:10.1139/x26-087.

Parysow, P., Gertner, G., and Westervelt, J. 2000. Efficient approximation for building error budgets for process models. Ecol. Model. 135(2–3): 111–125. doi:10.1016/S0304-3800(00)00347-1.

Perron, J.-Y. 1985. Tarif de cubage général. Volume marchand brut. 2ème publication. Rapport. Gouvernement du Québec, Ministère de l’Énergie et des Ressources, Service de l’inventaire forestier, Québec, Que.

Pinheiro, J.C., and Bates, D.M. 2000. Mixed-effects models in S and S-PLUS. Springer, New York.

Robinson, A.P., and Wykoff, W.R. 2004. Imputing missing height measures using a mixed-effects modeling strategy. Can. J. For. Res. 34(12): 2492–2500. doi:10.1139/x04-137.

Saucier, J.-P., Bergeron, J.-F., Grondin, P., and Robitaille, A. 1998. Les régions écologiques du Québec méridional: un des éléments du système hiérarchique de classification écologique du territoire mis au point par le Ministère des Ressources naturelles. L’Aubelle, février–mars 1998. Ordre des Ingénieurs Forestiers du Québec, Québec, Que.

Sharma, M., and Oderwald, R.G. 2001. Dimensionally compatible volume and taper equations. Can. J. For. Res. 31(5): 797–803. doi:10.1139/cjfr-31-5-797.

Zhang, Y., Borders, B.E., and Bailey, R.L. 2002. Derivation, fitting, and implication of a compatible stem taper – volume – weight system for intensively managed, fast growing loblolly pine. For. Sci. 48: 595–607.

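Appendix A

Let us consider model 4a and eq. 4b (from the main text):

[A1] $$\mathrm{vol}_{ij} = \tilde{x}_{\mathrm{vol},ij}\beta + z_{\mathrm{vol},ij}u_{\mathrm{vol},i} + \varepsilon_{\mathrm{vol},ij} + a_{ij}\beta\,(z_{\mathrm{ht},ij}u_{\mathrm{ht},i} + \varepsilon_{\mathrm{ht},ij})$$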

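[A2] $$a_{ij}\beta = \frac{\beta_1}{\mathrm{DBH}_{ij}} + (\beta_{2,m_{\mathrm{vol}}} + \beta_{3,m_{\mathrm{vol}}}\mathrm{DBH}_{ij})\,\frac{\pi \cdot \mathrm{DBH}_{ij}^2}{40}$$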

Now, let $\varepsilon_\beta$ and $\varepsilon_\gamma$ be two vectors of multivariate normal errors on the estimates of $\beta$ and $\gamma$ such that $\hat{\beta} - \beta = \varepsilon_\beta \sim N(0, U_{\mathrm{vol}})$ and $\hat{\gamma} - \gamma = \varepsilon_\gamma \sim N(0, U_{\mathrm{ht}})$. Using matrix properties as in Johnston and Dinardo (1997, p. 100), the variance–covariance of the tree volume predictions based on height predictions for plot i can be expressed as

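[A3] $$\begin{aligned}
\tilde{V}_{\mathrm{vol},i} ={}& V_{\mathrm{vol},i} + E[(\tilde{X}_{\mathrm{vol},i}\varepsilon_\beta)(\tilde{X}_{\mathrm{vol},i}\varepsilon_\beta)^T] \\
&+ \mathrm{diag}(A_i\hat{\beta})\,(V_{\mathrm{ht},i} + E[(X_{\mathrm{ht},i}\varepsilon_\gamma)(X_{\mathrm{ht},i}\varepsilon_\gamma)^T])\,\mathrm{diag}(A_i\hat{\beta}) \\
&+ E[(\mathrm{diag}(A_i\varepsilon_\beta)(Z_{\mathrm{ht},i}u_{\mathrm{ht},i} + \varepsilon_{\mathrm{ht},i} + X_{\mathrm{ht},i}\varepsilon_\gamma)) \\
&\qquad \times (\mathrm{diag}(A_i\varepsilon_\beta)(Z_{\mathrm{ht},i}u_{\mathrm{ht},i} + \varepsilon_{\mathrm{ht},i} + X_{\mathrm{ht},i}\varepsilon_\gamma))^T]
\end{aligned}$$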

where $E[\cdot]$ is the mathematical expectation of the argument. Using matrix properties, this variance can be expressed as follows:

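[A4] $$\begin{aligned}
\tilde{V}_{\mathrm{vol},i} ={}& V_{\mathrm{vol},i} + E[\tilde{X}_{\mathrm{vol},i}\varepsilon_\beta\varepsilon_\beta^T\tilde{X}_{\mathrm{vol},i}^T] \\
&+ \mathrm{diag}(A_i\hat{\beta})\,(V_{\mathrm{ht},i} + E[X_{\mathrm{ht},i}\varepsilon_\gamma\varepsilon_\gamma^T X_{\mathrm{ht},i}^T])\,\mathrm{diag}(A_i\hat{\beta}) \\
&+ E[\mathrm{diag}(A_i\varepsilon_\beta)(Z_{\mathrm{ht},i}u_{\mathrm{ht},i} + \varepsilon_{\mathrm{ht},i} + X_{\mathrm{ht},i}\varepsilon_\gamma) \\
&\qquad \times (Z_{\mathrm{ht},i}u_{\mathrm{ht},i} + \varepsilon_{\mathrm{ht},i} + X_{\mathrm{ht},i}\varepsilon_\gamma)^T\,\mathrm{diag}(A_i\varepsilon_\beta)]
\end{aligned}$$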

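where terms $E[\tilde{X}_{\mathrm{vol},i}\varepsilon_\beta\varepsilon_\beta^T\tilde{X}_{\mathrm{vol},i}^T]$ and $E[X_{\mathrm{ht},i}\varepsilon_\gamma\varepsilon_\gamma^T X_{\mathrm{ht},i}^T]$ are expressed as $\tilde{X}_{\mathrm{vol},i}U_{\mathrm{vol}}\tilde{X}_{\mathrm{vol},i}^T$ and $X_{\mathrm{ht},i}U_{\mathrm{ht}}X_{\mathrm{ht},i}^T$, respectively.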

Using a property of conditional expectation, termE½diagðAi3bÞðZht;iuht;i þ 3ht;i þ Xht;i3gÞðZht;iuht;i þ 3ht;i þXht;i3gÞTdiagðAi3bÞ� can be expressed as

½A5� E3b ½E½diagðAi3bÞðZht;iuht;i þ 3ht;i

1938 Can. J. For. Res. Vol. 40, 2010

Published by NRC Research Press

þ Xht;i3gÞðZht;iuht;i þ 3ht;i þ Xht;i3gÞTdiagðAi3bÞj3b��¼ E3b ½diagðAi3bÞE½ðZht;iuht;i þ 3ht;i

þ Xht;i3gÞðZht;iuht;i þ 3ht;i þ Xht;i3gÞT �diagðAi3bÞ�¼ E3b ½diagðAi3bÞvar ðZht;iuht;i

þ 3ht;i þ Xht;i3gÞdiagðAi3bÞ�¼ E3b ½diagðAi3bÞðVht;i þ Xht;iUhtX

Tht;iÞdiagðAi3bÞ�

which can be expressed in turn as

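[A6] $$(A_i U_{\mathrm{vol}} A_i^T) \odot (V_{\mathrm{ht},i} + X_{\mathrm{ht},i}U_{\mathrm{ht}}X_{\mathrm{ht},i}^T)$$

where the symbol "$\odot$" denotes the entrywise product (also referred to as the Hadamard product) of two matrices. Substituting these three terms in eq. A4 yields

[A7] $$\begin{aligned}
\tilde{V}_{\mathrm{vol},i} ={}& V_{\mathrm{vol},i} + \tilde{X}_{\mathrm{vol},i}U_{\mathrm{vol}}\tilde{X}_{\mathrm{vol},i}^T \\
&+ \mathrm{diag}(A_i\hat{\beta})\,(V_{\mathrm{ht},i} + X_{\mathrm{ht},i}U_{\mathrm{ht}}X_{\mathrm{ht},i}^T)\,\mathrm{diag}(A_i\hat{\beta}) \\
&+ (A_i U_{\mathrm{vol}} A_i^T) \odot (V_{\mathrm{ht},i} + X_{\mathrm{ht},i}U_{\mathrm{ht}}X_{\mathrm{ht},i}^T)
\end{aligned}$$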
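Eq. A7 can be evaluated with a few matrix operations. The Python sketch below is a minimal, standard-library illustration and not the authors' Java implementation. Matrices are plain lists of lists, the function and variable names are assumptions, and all numerical inputs are hypothetical. The plot-level variance is taken as the sum of all entries of the tree-level matrix, on the assumption that the plot volume is the sum of the tree volumes.

```python
def matmul(a, b):
    """Matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def quad(x, u):
    """Return x u x^T."""
    return matmul(matmul(x, u), transpose(x))

def variance_components(v_vol, x_vol, u_vol, a, beta_hat, v_ht, x_ht, u_ht):
    """Return the basic, height, and hybrid matrices whose sum is eq. A7."""
    n = len(a)
    h = add(v_ht, quad(x_ht, u_ht))             # V_ht + X_ht U_ht X_ht^T
    d = [sum(c * b for c, b in zip(row, beta_hat)) for row in a]  # diagonal of diag(A beta_hat)
    basic = add(v_vol, quad(x_vol, u_vol))      # V_vol + X_vol U_vol X_vol^T
    height = [[d[j] * h[j][k] * d[k] for k in range(n)] for j in range(n)]
    aua = quad(a, u_vol)                        # A U_vol A^T
    hybrid = [[aua[j][k] * h[j][k] for k in range(n)] for j in range(n)]  # entrywise product
    return basic, height, hybrid

def plot_variance(matrix):
    """Variance of the sum of tree-level errors: sum of all matrix entries."""
    return sum(sum(row) for row in matrix)

# Hypothetical two-tree plot with two coefficients in each model
basic, height, hybrid = variance_components(
    v_vol=[[0.5, 0.1], [0.1, 0.5]],
    x_vol=[[1.0, 2.0], [1.0, 3.0]],
    u_vol=[[0.01, 0.0], [0.0, 0.02]],
    a=[[0.5, 1.0], [0.4, 1.5]],
    beta_hat=[0.2, 0.3],
    v_ht=[[4.0, 1.0], [1.0, 4.0]],
    x_ht=[[1.0, 10.0], [1.0, 14.0]],
    u_ht=[[0.04, 0.0], [0.0, 0.001]],
)
parts = {"basic": plot_variance(basic), "height": plot_variance(height),
         "hybrid": plot_variance(hybrid)}
total = sum(parts.values())
for name, value in parts.items():
    print(f"{name}: {value:.4f} ({100 * value / total:.1f}% of total)")
```

Returning the three matrices separately, rather than only their sum, makes it possible to report the share of each component.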

Reference

Johnston, J., and Dinardo, J. 1997. Econometric methods. 4th ed. McGraw-Hill, New York.

Appendix B. Random vector drawn from a multivariate normal distribution

Let us consider the vector $\varepsilon$ as a random vector that follows a multivariate normal distribution with mean $\mu$ and covariance $V$, i.e., $\varepsilon \sim \mathrm{MVN}(\mu, V)$. The Cholesky decomposition provides the lower triangular matrix $A$ that satisfies the condition $V = AA^T$. If $y$ is a vector of independent standard normal random variates (i.e., all independently and normally distributed with mean 0 and variance 1), the sum $\mu + Ay$ yields a random vector from the desired multivariate distribution.

For example, let us consider the bivariate case such that

$$\mu = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{and} \quad V = \begin{bmatrix} 0.700 & 0.500 \\ 0.500 & 2.000 \end{bmatrix}$$

The Cholesky decomposition of $V$ yields

$$A = \begin{bmatrix} 0.837 & 0.000 \\ 0.598 & 1.282 \end{bmatrix}$$

Now, let us draw a random vector of independent standard normal variates such that $y = (0.246, -1.976)^T$. The sum $\mu + Ay$, i.e.,

$$\begin{bmatrix} 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0.837 & 0.000 \\ 0.598 & 1.282 \end{bmatrix} \begin{bmatrix} 0.246 \\ -1.976 \end{bmatrix} = \begin{bmatrix} 0.206 \\ -2.386 \end{bmatrix}$$

yields a vector that follows a MVN distribution with mean $\mu$ and covariance $V$.
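The bivariate example of this appendix can be reproduced with a short Python sketch that relies only on the standard library. The function names `cholesky` and `draw_mvn` are assumptions introduced for illustration, and the decomposition assumes a symmetric positive definite covariance matrix.

```python
import math
import random

def cholesky(v):
    """Lower triangular A such that V = A A^T (V symmetric positive definite)."""
    n = len(v)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(a[i][k] * a[j][k] for k in range(j))
            if i == j:
                a[i][j] = math.sqrt(v[i][i] - s)
            else:
                a[i][j] = (v[i][j] - s) / a[j][j]
    return a

def draw_mvn(mu, a, y):
    """Return mu + A y, where y holds independent standard normal variates."""
    return [m + sum(a[i][k] * y[k] for k in range(len(y)))
            for i, m in enumerate(mu)]

mu = [0.0, 0.0]
v = [[0.700, 0.500],
     [0.500, 2.000]]
a = cholesky(v)

# Same standard normal variates as in the worked example
example = draw_mvn(mu, a, [0.246, -1.976])
print([round(value, 3) for value in example])

# A fresh draw: y is generated from independent standard normal variates
rng = random.Random(2010)
y = [rng.gauss(0.0, 1.0) for _ in mu]
print(draw_mvn(mu, a, y))
```

With the variates of the worked example, the sketch returns the same vector to three decimals; the fresh draw depends on the seed, which is arbitrary here.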