Comparison of a k-NN approach and regression techniques for single tree biomass estimation

8th FIA Symposium of the USDA Forest Service, October 16-19, Monterey, CA

Lutz Fehrmann & Christoph Kleinn


Introduction

• On the way to more general biomass estimationapproaches on single tree level a compilation of readily available datasets is required and useful.– This might be very challaging because the willingness

to share data is not always well developed

• Once a comprehensive enough database isgiven, also instance based methods like the k-NN approach can be applied


The k-NN approach

• The k-NN method is based on a non-parametric pattern recognition algorithm.

• The basic idea is to predict an unknown feature of an instance according to its similarity to other known instances stored in a database.
– Based on a calculated distance, the k nearest (most similar) neighbours to a given query point are identified and, under the assumption that they are also similar with respect to their target values, used to derive an estimate.


The k-NN approach

• In contrast to regression analysis or process model approaches, no functional relationship between the variables has to be formulated.

• The estimates are derived as local approximations rather than as a global function:

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i \, f(x_i)}{\sum_{i=1}^{k} w_i}$$

where x_q is the query point, x_i (i = 1, ..., k) are its k nearest neighbours in the training data, f(x_i) their known target values and w_i the weights assigned to them.
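A minimal sketch of such a locally weighted estimate in Python (the Euclidean distance and the inverse-distance weights below are illustrative choices, not the settings used in the study):

```python
import numpy as np

def knn_estimate(x_query, X_train, y_train, k=3, eps=1e-9):
    """Distance-weighted k-NN estimate of a single tree's target value (e.g. agb).

    X_train : (N, n) array of feature vectors (e.g. dbh and height)
    y_train : (N,)   array of known target values (e.g. agb in kg)
    """
    X = np.asarray(X_train, dtype=float)
    y = np.asarray(y_train, dtype=float)
    # Euclidean distance of the query point to every training instance
    d = np.sqrt(((X - np.asarray(x_query, dtype=float)) ** 2).sum(axis=1))
    # indices of the k nearest (most similar) neighbours
    nn = np.argsort(d)[:k]
    # inverse-distance weights; eps avoids division by zero for exact matches
    w = 1.0 / (d[nn] + eps)
    # weighted mean of the neighbours' known target values
    return float(np.sum(w * y[nn]) / np.sum(w))
```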


Distance function

• As distance function, existing multivariate measures from cluster or discriminant analysis can be used, for example a weighted Minkowski-type distance:

$$d_{w}(x_i, x_j) = \left[\, \sum_{r=1}^{n} \left( w_r \cdot \frac{\left|x_{ir} - x_{jr}\right|}{\delta_r} \right)^{c} \right]^{1/c}$$

d_w = weighted distance between two instances
n = number of variables
w_r = weight assigned to variable r
r = rth variable of an instance
(x_i, x_j) = instances
δ_r = standardisation factor (range of variable r or a multiple of its σ)
c ≥ 1 = Minkowski constant (c = 2 gives the Euclidean distance)
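A direct transcription of this distance function in Python (the example weights and the standardisation by an assumed variable range at the end are illustrative assumptions):

```python
import numpy as np

def weighted_distance(x_i, x_j, weights, delta, c=2.0):
    """Weighted Minkowski-type distance d_w between two instances x_i and x_j.

    weights : w_r, one weight per variable
    delta   : standardisation factors delta_r (e.g. the range of each variable)
    c       : Minkowski constant, c >= 1 (c = 2 corresponds to the Euclidean case)
    """
    x_i, x_j = np.asarray(x_i, float), np.asarray(x_j, float)
    terms = (np.asarray(weights, float) * np.abs(x_i - x_j) / np.asarray(delta, float)) ** c
    return float(terms.sum() ** (1.0 / c))

# example: two trees described by dbh [cm] and height [m]
d = weighted_distance([25.0, 20.0], [30.0, 22.5],
                      weights=[1.0, 0.5], delta=[60.0, 30.0], c=2.0)
```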


Implementation

• To run the k-NN algorithm, a suitable software application and database are necessary.
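As one possible off-the-shelf building block for such an application (not the software used in the study), scikit-learn's KNeighborsRegressor implements a distance-weighted k-NN estimator; the toy data below are hypothetical:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# hypothetical training data: columns are dbh [cm] and height [m], target is agb [kg]
X_train = np.array([[12.0, 11.5], [18.0, 15.0], [25.0, 20.1], [40.0, 27.3]])
y_train = np.array([45.0, 120.0, 320.0, 1100.0])

# p is the Minkowski constant (p = 2: Euclidean); weights="distance" gives inverse-distance weighting
knn = KNeighborsRegressor(n_neighbors=3, weights="distance", p=2)
knn.fit(X_train, y_train)
print(knn.predict(np.array([[30.0, 22.0]])))  # estimated agb [kg] for a query tree
```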


Size of the Neighbourhood

• Instance-based methods come with a typical bias-variance dilemma that is partly influenced by asymmetric neighbourhoods at the edges of the feature space of the training data.

[Figure: observed agb [kg] over dbh [cm] together with k-NN estimates for k = 3 and k = 15.]


Cross validation

• To determine the parameters of the distance and weighting functions as well as k, cross-validation methods are suitable (see the sketch below).
– For this, an estimate for every tree is derived based on the remaining N-1 trees of the training data.
– Optimal weighting factors, the size of the neighbourhood and the parameters of the distance function can be approximated by an iterative process or by means of optimisation algorithms.
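A minimal leave-one-out sketch for choosing k, reusing the knn_estimate function sketched earlier; the RMSE selection criterion is an assumption:

```python
import numpy as np

def loo_rmse(X, y, k):
    """Leave-one-out RMSE of the k-NN estimator for a given neighbourhood size k."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    errors = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i                      # leave tree i out
        pred = knn_estimate(X[i], X[mask], y[mask], k=k)   # estimate it from the rest
        errors.append(pred - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# scan a range of neighbourhood sizes and keep the k with the lowest error, e.g.
# best_k = min(range(1, 21), key=lambda k: loo_rmse(X_train, y_train, k))
```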


Example

• A large dataset of Norway spruce and Scots pine trees (provided by METLA) was used to evaluate the k-NN approach in comparison to regression models.
– The datasets were split into "modelling" (n=143 for spruce, n=145 for pine) and "test" (n=60 each) subsets.
– The modelling subsets were used to estimate the regression coefficients and as training data for the k-NN algorithm (independent variables: dbh and height).


Example

• Predictions for the "test" datasets were used to compare the performance of both approaches by means of different error criteria (sketched below).
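The error criteria reported later (RMSE, rMSE%, MAPE and mean error) can be computed as in the following sketch; the exact definitions used in the study are not stated on the slides, so these formulas are common conventions and should be read as assumptions:

```python
import numpy as np

def error_criteria(y_obs, y_pred):
    """Common error criteria for comparing predictions with observations."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    resid = y_pred - y_obs
    rmse = np.sqrt(np.mean(resid ** 2))            # root mean squared error
    rmse_pct = 100.0 * rmse / np.mean(y_obs)       # RMSE relative to the observed mean
    mape = 100.0 * np.mean(np.abs(resid) / y_obs)  # mean absolute percentage error
    me = np.mean(resid)                            # mean error (bias)
    return {"RMSE": rmse, "rMSE%": rmse_pct, "MAPE": mape, "ME": me}
```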

[Figure: number of trees per dbh class (7-46 cm) in the modelling and test subsets, for spruce (left) and pine (right).]


Example

• Multiple cross-validation was used to minimise the RMSE and bias by approximating optimal feature weights and parameter settings (see the sketch below).
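One simple way to approximate such feature weights is a search over candidate weight combinations scored by the leave-one-out RMSE sketched earlier; the weight grid and the trick of scaling feature columns by their weights are illustrative assumptions, not the optimisation procedure used in the study:

```python
import numpy as np
from itertools import product

def loo_rmse_weighted(X, y, k, weights):
    """Leave-one-out RMSE after scaling each (standardised) feature column by its
    candidate weight; for c = 2 this is equivalent to weighting inside the distance."""
    Xw = np.asarray(X, float) * np.asarray(weights, float)
    return loo_rmse(Xw, y, k)

# crude grid search over weights for dbh and height and over k (grids are arbitrary)
# grid = product([0.25, 0.5, 1.0, 2.0], [0.25, 0.5, 1.0, 2.0], range(1, 16))
# w_dbh, w_h, best_k = min(grid, key=lambda g: loo_rmse_weighted(X_train, y_train, g[2], g[:2]))
```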

[Figure: rMSE% and bias (error %) as a function of the number of neighbours k (1-21) for Scots pine and Norway spruce.]


Example

• As an alternative to a fixed number of neighbours, a kernel method was also applied (see the sketch below).
– In this case, neighbours are considered up to a defined standardised distance.
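A sketch of such a fixed-distance ("kernel") neighbourhood; the inverse-distance weighting, the fallback rule and the threshold value are illustrative assumptions:

```python
import numpy as np

def kernel_estimate(x_query, X_train, y_train, max_dist, eps=1e-9):
    """Estimate based on all neighbours within a standardised distance threshold
    instead of a fixed number k."""
    X = np.asarray(X_train, float)
    y = np.asarray(y_train, float)
    d = np.sqrt(((X - np.asarray(x_query, float)) ** 2).sum(axis=1))
    nn = np.flatnonzero(d <= max_dist)
    if nn.size == 0:                        # no neighbour within the kernel distance:
        nn = np.array([np.argmin(d)])       # fall back to the single nearest neighbour
    w = 1.0 / (d[nn] + eps)                 # inverse-distance weights
    return float(np.sum(w * y[nn]) / np.sum(w))
```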

[Figure: left - standardised distance to the training data, the k nearest neighbours and the kernel distance threshold over agb [kg]; right - resulting number of neighbours k over dbh [cm].]


Example

• Linear mixed-effects models and simple linear models were used as reference.

[Figure: estimated agb [kg] plotted against "observed" agb [kg] for the regression and k-NN predictions; spruce (left) and pine (right).]


Example results

• The RMSE could be reduced in comparison to the regression models for both species:

| Regression model / approach | RMSE | rMSE% | MAPE | ME |
|---|---|---|---|---|
| Scots pine | | | | |
| $\ln agb_{ki} = \alpha + \beta \ln d_{ki} + \chi \ln h_{ki} + e_{ki}$ | 20.68 | 15.79 | 9.67 | -2.562 |
| $\ln agb_{ki} = \alpha + \ln a_k + \beta \ln d_{ki} + \chi \ln h_{ki} + e_{ki}$ | 19.76 | 15.00 | 9.21 | -1.718 |
| k-NN | 19.41 | 14.54 | 12.61 | 0.009 |
| Norway spruce | | | | |
| $\ln agb_{ki} = \alpha + \beta \ln d_{ki} + \chi \ln(d_{ki} h_{ki}) + e_{ki}$ | 22.91 | 19.85 | 13.80 | -1.630 |
| $\ln agb_{ki} = \alpha + \ln a_k + \beta \ln d_{ki} + \chi \ln(d_{ki} h_{ki}) + e_{ki}$ | 20.31 | 17.36 | 13.73 | -0.398 |
| k-NN | 19.19 | 16.42 | 13.98 | -0.493 |


Outlook

• The k-NN method offers the possibility to include additional variables (for example meta-information about sites or tree species) without knowledge of the underlying cause-and-effect relationships.

• When multiple search variables are used, the implementation of optimisation approaches for feature weighting, such as the genetic algorithm (Tomppo and Halme, 2004), is required and useful.


• Thank you!

This study was conducted in close collaboration with the Finnish Forest Research Institute (METLA). We thank Erkki Tomppo and Aleksi Lehtonen!
