Comparison of a k-NN approach and regression techniques for single tree biomass estimation

8th FIA Symposium of the USDA Forest Service, October 16-19, Monterey, CA

Lutz Fehrmann & Christoph Kleinn


Introduction

• On the way to more general biomass estimation approaches at the single-tree level, a compilation of readily available datasets is required and useful.
  – This can be very challenging because the willingness to share data is not always well developed.

• Once a sufficiently comprehensive database is available, instance-based methods such as the k-NN approach can also be applied.


The k-NN approach

• The k-NN method is based on a non-parametric pattern recognition algorithm.

• The basic idea is to classify an unknown feature of an instance according to its similarity to other known instances stored in a database.
  – Based on a calculated distance, the k nearest (most similar) neighbours of a query point are identified. Under the assumption that they are also similar in their target values, they are used to derive an estimate.


The k-NN approach

• Unlike regression analysis or process-model approaches, no functional relationships between the variables have to be formulated.

• The estimates are derived as local approximations, not as a global function.

$$\hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} w_i \, f(x_i)}{\sum_{i=1}^{k} w_i}$$
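A minimal sketch of such a local estimate, assuming it is computed as the weighted mean of the target values of the k nearest neighbours. The function name `knn_estimate` and the numbers are hypothetical and only illustrate the arithmetic.

```python
def knn_estimate(neighbour_values, neighbour_weights):
    """Weighted mean of the target values of the k nearest neighbours."""
    if len(neighbour_values) != len(neighbour_weights):
        raise ValueError("one weight per neighbour is required")
    total_weight = sum(neighbour_weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * v for w, v in zip(neighbour_weights, neighbour_values))
    return weighted_sum / total_weight


# Hypothetical biomass values (kg) of k = 3 neighbours and their weights.
agb_neighbours = [100.0, 120.0, 200.0]
weights = [2.0, 1.0, 1.0]
estimate = knn_estimate(agb_neighbours, weights)  # (200 + 120 + 200) / 4
```

With equal weights the estimate reduces to the plain mean of the neighbours' values.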


Distance function

• As distance function, known multivariate measures from cluster or discriminant analysis can be used:

$$d_w(x_i, x_j) = \left[ \sum_{r=1}^{n} w_r \cdot \left( \frac{\left| x_{ir} - x_{jr} \right|}{\delta_r} \right)^{c} \right]^{\frac{1}{c}}$$

d_w = weighted distance between two instances
n = number of variables
w_r = weight assigned to variable r
r = r-th variable of an instance
(x_i, x_j) = instances
δ_r = standardisation factor (range of variable r or a multiple of its σ)
c = Minkowski constant, c ≥ 1 (c = 2 gives the Euclidean distance)
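The distance can be sketched in a few lines of Python using the quantities defined here (w_r, δ_r, c). Taking the absolute difference per variable is an assumption made so that odd values of c are well defined. The function name, the two example trees and the ranges are hypothetical.

```python
def weighted_minkowski(x_i, x_j, weights, deltas, c=2.0):
    """Weighted, standardized Minkowski distance between two instances."""
    if not (len(x_i) == len(x_j) == len(weights) == len(deltas)):
        raise ValueError("all inputs need one entry per variable")
    if c < 1:
        raise ValueError("the Minkowski constant c must be >= 1")
    total = sum(
        w * (abs(a - b) / delta) ** c  # assumption: absolute difference
        for a, b, w, delta in zip(x_i, x_j, weights, deltas)
    )
    return total ** (1.0 / c)


# Two hypothetical trees described by (dbh in cm, height in m).
tree_a = (30.0, 22.0)
tree_b = (20.0, 17.0)
ranges = (50.0, 25.0)  # hypothetical range of each variable, used as delta_r

d_euclid = weighted_minkowski(tree_a, tree_b, (1.0, 1.0), ranges, c=2.0)
d_manhattan = weighted_minkowski(tree_a, tree_b, (1.0, 1.0), ranges, c=1.0)
```

Dividing by δ_r puts dbh and height on a comparable scale before the weights w_r decide how much each variable contributes.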


Implementation

• Running the k-NN algorithm requires a suitable software application and a database.


Size of the Neighbourhood

• Instance-based methods come with a typical bias-variance dilemma, which is partly influenced by asymmetric neighbourhoods at the edges of the feature space of the training data.

[Figure: above-ground biomass (agb, kg) plotted against dbh (cm); observed values (°) and k-NN estimates for k = 3 (+) and k = 15 (∆).]
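The edge effect can be reproduced with a small synthetic example. The sketch below assumes an unweighted k-NN mean with dbh as the only search variable. The training data follow an invented increasing curve and are not the data of this study. At the largest dbh all neighbours lie on one side of the query point, so the estimate falls below the true value, and more so for larger k.

```python
def knn_mean(train, query_dbh, k):
    """Unweighted k-NN estimate using dbh as the only search variable."""
    nearest = sorted(train, key=lambda tree: abs(tree[0] - query_dbh))[:k]
    return sum(agb for _, agb in nearest) / len(nearest)


# Synthetic training data (dbh in cm, agb in kg) on an invented curve.
train = [(float(d), 0.1 * d ** 2.4) for d in range(10, 61, 2)]

# Query at the upper edge of the feature space.
largest_dbh, true_agb = train[-1]
est_k3 = knn_mean(train, largest_dbh, k=3)
est_k15 = knn_mean(train, largest_dbh, k=15)

print(f"true: {true_agb:.1f}  k=3: {est_k3:.1f}  k=15: {est_k15:.1f}")
```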


Cross validation

• Cross-validation methods are suitable for determining k and the parameters of the distance and weighting functions.
  – For this purpose, an estimate for every tree is derived from the remaining N-1 trees of the training data.
  – Optimal weighting factors, the size of the neighbourhood, and the parameters of the distance function can be approximated by an iterative process or by means of optimization algorithms.
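A minimal sketch of the leave-one-out idea, assuming an unweighted k-NN mean on a single search variable and the RMSE as the criterion to minimize. The data are synthetic and the function names are hypothetical. Each tree is estimated from the remaining N-1 trees, and the k with the lowest RMSE is selected.

```python
import math


def knn_predict(train, query, k):
    """Mean target value of the k training instances closest to the query."""
    nearest = sorted(train, key=lambda tree: abs(tree[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_rmse(data, k):
    """Leave-one-out RMSE: every tree is estimated from the other N-1 trees."""
    squared_errors = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        squared_errors.append((knn_predict(rest, x, k) - y) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic data (dbh in cm, agb in kg) on an invented curve.
data = [(float(d), 0.1 * d ** 2.4) for d in range(10, 61, 2)]

# Iterate over candidate neighbourhood sizes and keep the best one.
rmse_by_k = {k: loo_rmse(data, k) for k in range(1, 11)}
best_k = min(rmse_by_k, key=rmse_by_k.get)
```

The same loop can be wrapped around feature weights or the Minkowski constant instead of k.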


Example

• A large dataset of Norway spruce and Scots pine trees (provided by METLA) was used to evaluate the k-NN approach in comparison to regression models.
  – The datasets were split into "modelling" (n=143 for spruce, n=145 for pine) and "test" (n=60 each) subsets.
  – The modelling subsets were used to estimate the regression coefficients and as training data for the k-NN algorithm (independent variables are dbh and height).


Example

• Predictions for the "test" datasets were used to compare the performance of both approaches by means of different error criteria.

[Figure: number of trees per dbh class in the modelling and test subsets; panels: spruce, pine.]


Example

• Multiple cross-validation was used to minimize the RMSE and bias by approximating optimal feature weights and parameter settings.

[Figure: error (%) against number of neighbours k; series: rMSE% Scots pine, rMSE% Norway spruce, bias Scots pine, bias Norway spruce.]


Example

• As an alternative to a fixed number of neighbours, a kernel method was also applied.
  – In this case, neighbours are considered up to a defined standardized distance.
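The difference to a fixed k can be sketched as follows, assuming the standardized distances to the query point are already known and the estimate is an unweighted mean. The function names and values are hypothetical. The number of neighbours then depends on how densely the training data surround the query point.

```python
def neighbours_within(distances, max_distance):
    """Indices of all training instances within the standardized radius."""
    return [i for i, d in enumerate(distances) if d <= max_distance]


def kernel_estimate(distances, values, max_distance):
    """Mean target value of all neighbours inside the radius, or None."""
    selected = neighbours_within(distances, max_distance)
    if not selected:
        return None  # no training instance is close enough
    return sum(values[i] for i in selected) / len(selected)


# Hypothetical standardized distances to a query tree and agb values (kg).
std_distances = [0.02, 0.05, 0.12, 0.30, 0.45]
agb_values = [110.0, 130.0, 150.0, 400.0, 650.0]

estimate = kernel_estimate(std_distances, agb_values, max_distance=0.15)
```

Here three instances fall inside the radius of 0.15, so the estimate uses three neighbours; a smaller radius can leave the neighbourhood empty, which the sketch reports as `None`.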

[Figure, two panels: normalised distance against agb (kg), showing the training data, the k neighbours and the kernel distance; number of neighbours k against dbh (cm).]


Example

• Linear mixed-effects models and simple linear models were used as reference.

[Figure: estimated agb (kg) against observed agb (kg) for regression and k-NN; panels: spruce, pine.]


Example results

• The RMSE could be reduced in comparison to regression models for both species:

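Regression model / approach    RMSE    rMSE%    MAPE     ME

Scots pine
  Regression model (1)         20.68   15.79     9.67   -2.562
  Regression model (2)         19.76   15.00     9.21   -1.718
  k-NN                         19.41   14.54    12.61    0.009

Norway spruce
  Regression model (3)         22.91   19.85    13.80   -1.630
  Regression model (4)         20.31   17.36    13.73   -0.398
  k-NN                         19.19   16.42    13.98   -0.493

Regression models:
(1) $\ln agb_{ki} = \alpha + \beta \ln d_{ki} + \chi \ln h_{ki} + e_{ki}$
(2) $\ln agb_{ki} = \alpha + a_k + \beta \ln d_{ki} + \chi \ln h_{ki} + e_{ki}$
(3) $\ln agb_{ki} = \alpha + \beta \ln d_{ki} + \chi \ln\left(\frac{h}{d}\right)_{ki} + e_{ki}$
(4) $\ln agb_{ki} = \alpha + a_k + \beta \ln d_{ki} + \chi \ln\left(\frac{h}{d}\right)_{ki} + e_{ki}$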
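The slides do not define the error criteria, so the following sketch rests on assumed, commonly used definitions: RMSE as the root of the mean squared residual, rMSE% as the RMSE relative to the mean observed value, MAPE as the mean absolute percentage error, and ME as the mean of predicted minus observed. The sign convention of ME in particular is an assumption, and the example values are invented.

```python
import math


def error_criteria(observed, predicted):
    """RMSE, rMSE%, MAPE and ME under the assumed definitions above."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    mean_observed = sum(observed) / n
    return {
        "RMSE": rmse,
        "rMSE%": 100.0 * rmse / mean_observed,
        "MAPE": 100.0 * sum(abs(r) / o for r, o in zip(residuals, observed)) / n,
        "ME": sum(residuals) / n,  # assumption: predicted minus observed
    }


# Invented observed and predicted agb values (kg) for three test trees.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
criteria = error_criteria(observed, predicted)
```

In this example the over- and underestimate cancel, so ME is zero while RMSE and MAPE are not. A small ME alone therefore says little about the accuracy for individual trees.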


Outlook

• The k-NN method offers the possibility to include additional variables (for example, meta-information about sites or tree species) without knowledge of the cause-and-effect relationships.

• When multiple search variables are used, optimization approaches such as the genetic algorithm (Tomppo and Halme, 2004) are required and useful for feature weighting.
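To illustrate what feature weighting optimizes, the sketch below replaces the genetic algorithm with a plain grid search over two weights; this is a deliberately simpler stand-in, not the method of Tomppo and Halme. It assumes a range-standardized weighted Euclidean distance, an unweighted k-NN mean and the leave-one-out RMSE as objective. All data and names are hypothetical.

```python
import math


def distance(a, b, weights, ranges):
    """Weighted Euclidean distance on range-standardized variables."""
    return math.sqrt(
        sum(w * ((p - q) / r) ** 2 for p, q, w, r in zip(a, b, weights, ranges))
    )


def loo_rmse(features, targets, weights, ranges, k):
    """Leave-one-out RMSE of an unweighted k-NN mean for given weights."""
    squared_errors = []
    for i, query in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        others.sort(key=lambda j: distance(query, features[j], weights, ranges))
        nearest = others[:k]
        estimate = sum(targets[j] for j in nearest) / len(nearest)
        squared_errors.append((estimate - targets[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic trees: (dbh in cm, height in m) and an invented agb in kg.
features = [(float(d), 5.0 + 0.4 * d + (d % 3)) for d in range(10, 50, 2)]
targets = [0.05 * d ** 2.0 * h ** 0.8 for d, h in features]
ranges = [max(column) - min(column) for column in zip(*features)]

# Grid of candidate weight pairs (w_dbh, w_height) that sum to 1.
candidates = [(w / 4.0, 1.0 - w / 4.0) for w in range(5)]
scores = {w: loo_rmse(features, targets, w, ranges, k=3) for w in candidates}
best_weights = min(scores, key=scores.get)
```

A grid search becomes impractical as the number of search variables grows, which is where a heuristic optimizer such as a genetic algorithm is typically used.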


• Thank you!

This study was conducted in close collaboration with the Finnish Forest Research Institute (METLA). We thank Erkki Tomppo and Aleksi Lehtonen!