Embedded Feature Selection
of Hyperspectral Bands with
Boosted Decision Trees
Sildomar Monteiro and Richard Murphy
The University of Sydney
Rio Tinto Centre for Mine Automation
• Totally Autonomous Mine in 10 years:
– Brings together all elements of systems, perception, machine learning, data fusion and more
– A grand challenge for Field Robotics
• Driven by safety, predictability and efficiency
Dr Sildomar Monteiro, IGARSS 2011
Goal: Mine Picture Compilation
• Provide a complete and accurate model of the mine
– Mine planning and better prediction outcomes
• Maintain and update a multi-scale probabilistic
representation
– Geology
– Geometry
– Equipment
– And other properties of interest for the mining process
Today
• Floor mapping using ripped trench sections
• Geology feedback to batch
• Cone logging
Geology (ground-truth)
Mine Face Scanning
[Nieto, Viejo and Monteiro, 2010]
Hyperspectral sensing for mining
• Geology classification (material identification) still has many challenges
• Environmental conditions
– Illumination, temperature, dust
• Timely data acquisition and processing are needed
– Algorithms and calibration
• High spectral similarity between (ore-bearing) rock types
– Few, if any, distinctive spectral features
Outline
• Hyperspectral classification using Boosting
• Embedded band selection
• Experiments using iron ore data
Hyperspectral Sensors
• VisNIR: 400–970 nm
• SWIR: 970–2500 nm
[Figure: multispectral vs hyperspectral band sampling, bands 1 to n]
Example of Classification and Spectra
[Figure: classification example with four reflectance spectra panels (a–d); reflectance vs wavelength over 500–2250 nm]
Hyperspectral Band Selection
• Feature selection (vs dimensionality reduction)
– Removes correlated inputs
– Preserves physical interpretation (band wavelengths)
• Faster data processing
• Potentially faster data acquisition
• Can be tailored to the application
• Can indicate suitable multispectral bands
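As a hypothetical sketch (not code from the presentation), once per-band importance scores are available, reducing a hyperspectral data set to its k most informative bands is a few lines; the shapes below mirror the 429-band data used later, but the values are made up:

```python
import numpy as np

# Made-up data for illustration: 100 spectra with 429 bands, plus a
# hypothetical per-band importance score (any scoring method works).
rng = np.random.default_rng(0)
X = rng.random((100, 429))
importances = rng.random(429)

k = 20
top_bands = np.argsort(importances)[::-1][:k]  # indices of the k highest-scoring bands
X_reduced = X[:, np.sort(top_bands)]           # keep selected bands in wavelength order

print(X_reduced.shape)  # (100, 20)
```

Sorting the selected indices keeps the bands in wavelength order, which preserves the physical interpretation mentioned above.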
Boosting
• Sound theoretical foundation
– Additive Logistic Regression [Friedman, 2000]
• Empirical studies show that boosting
– Yields small classification error rates
– Is very resilient to overfitting
• State-of-the-art results in many applications, e.g. face
recognition in computer vision
• The idea of boosting is to train many “weak” learners on varying distributions (i.e., weightings) of the input data, and then combine the resulting classifiers into a single “committee”
Decision Trees
• Advantages:
– Robustness and interpretability
• Disadvantages:
– Low accuracy and high variance
• Binary decision trees
• Boosted trees
– Accurate, robust and interpretable
• Weak learner: a single-band decision stump,
f(x; \theta, a, b) = a \cdot 1(x > \theta) + b
• Boosted committee of M trees:
G(x) = \mathrm{sign}\left( \sum_{m=1}^{M} \alpha_m f_m(x) \right)
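A minimal NumPy sketch of this scheme, assuming binary labels in {−1, +1} and AdaBoost-style reweighting with single-band stumps (the presentation does not specify the exact boosting variant, so take this as illustrative only):

```python
import numpy as np

def fit_stump(X, y, w):
    # Weighted decision stump f(x; j, theta, a, b) = a if x[j] > theta else b,
    # found by exhaustive search over bands and thresholds.
    best = None
    n, d = X.shape
    for j in range(d):
        for theta in np.unique(X[:, j]):
            mask = X[:, j] > theta
            for a in (-1, 1):
                pred = np.where(mask, a, -a)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, theta, a, -a)
    return best

def adaboost(X, y, M=10):
    # Train M stumps on successively reweighted versions of the data.
    n = X.shape[0]
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(M):
        err, j, theta, a, b = fit_stump(X, y, w)
        err = max(err, 1e-10)                      # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(X[:, j] > theta, a, b)
        w *= np.exp(-alpha * y * pred)             # up-weight misclassified samples
        w /= w.sum()
        stumps.append((j, theta, a, b))
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    # Committee vote: G(x) = sign(sum_m alpha_m * f_m(x))
    F = np.zeros(X.shape[0])
    for (j, theta, a, b), alpha in zip(stumps, alphas):
        F += alpha * np.where(X[:, j] > theta, a, b)
    return np.sign(F)
```

Note that `fit_stump` scans every band and every candidate threshold, so it is quadratic-ish per round; practical implementations sort each feature once and sweep thresholds incrementally.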
Embedded Feature Selection
• Relative importance of input variables:
I_j = \left( E_x\left[ \frac{\partial \hat{F}(x)}{\partial x_j} \right]^2 \operatorname{var}_x[x_j] \right)^{1/2}
• Approximation for decision trees (heuristic) [Friedman, 1999]:
\hat{I}_j^2(T) = \sum_{t=1}^{J-1} \hat{i}_t^2 \, 1(v(t) = j)
• Least-squares improvement criterion:
i^2(R_l, R_r) = \frac{w_l w_r}{w_l + w_r} \left( \bar{y}_l - \bar{y}_r \right)^2
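The two tree-level formulas above translate directly into code; this is a sketch, and the list-of-splits format is a made-up illustration rather than the authors' data structure:

```python
import numpy as np

def split_improvement(w_l, w_r, ybar_l, ybar_r):
    # i^2(R_l, R_r) = w_l * w_r / (w_l + w_r) * (ybar_l - ybar_r)^2
    return w_l * w_r / (w_l + w_r) * (ybar_l - ybar_r) ** 2

def tree_importance(splits, n_features):
    # I_j^2(T): sum the squared improvements of the internal nodes that
    # split on variable j. `splits` is a list of
    # (feature_index, squared_improvement) pairs, one per internal node.
    imp = np.zeros(n_features)
    for j, i2 in splits:
        imp[j] += i2
    return imp

# Balanced split (weights 2 and 2) with child means 0 and 1:
print(split_improvement(2.0, 2.0, 0.0, 1.0))  # 1.0
```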
Embedded Feature Selection (cont.)
• Boosted decision trees: average over the M trees,
\hat{I}_j^2 = \frac{1}{M} \sum_{m=1}^{M} \hat{I}_j^2(T_m)
• The multi-class case: average over the K classes,
\hat{I}_j = \frac{1}{K} \sum_{k=1}^{K} \hat{I}_{jk}
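Both averages are a single NumPy call; the importance values here are made up for illustration:

```python
import numpy as np

# Squared importances from M = 2 trees over 3 features (hypothetical values).
tree_imps = np.array([[4.0, 0.0, 1.0],
                      [2.0, 2.0, 1.0]])
I2 = tree_imps.mean(axis=0)        # (1/M) * sum_m I_j^2(T_m) -> [3., 1., 1.]

# Multi-class case: per-class importances (K = 3 rows), averaged over classes.
class_imps = np.array([[3.0, 1.0, 0.0],
                       [1.0, 1.0, 1.0],
                       [2.0, 4.0, 2.0]])
I_multi = class_imps.mean(axis=0)  # (1/K) * sum_k I_jk -> [2., 2., 1.]
```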
Experiments
• Hyperspectral data acquired using a field
spectrometer (ASD)
– 429 bands (same as hyperspectral camera)
– Wavelengths from 350 nm to 2500 nm
• Samples of ore-bearing rocks
– Martite, goethite, kaolinite, etc. (9 classes in total)
– Different illumination and physical conditions (direct sunlight, shadow and viewing angles)
• Experimental methodology
– Metrics: accuracy, precision, recall, F-score, Kappa, AUC
– 4-fold cross-validation
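A minimal 4-fold cross-validation loop, sketched in plain NumPy (the presentation does not say which toolkit was used); `fit` and `predict_fn` are placeholder callables standing in for any classifier:

```python
import numpy as np

def kfold_indices(n, k=4, seed=0):
    # Shuffle the sample indices and split them into k roughly equal folds.
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_validate(X, y, fit, predict_fn, k=4):
    # Each fold serves once as the held-out test set; the rest trains.
    accs = []
    for fold in kfold_indices(len(y), k):
        train = np.ones(len(y), dtype=bool)
        train[fold] = False
        model = fit(X[train], y[train])
        accs.append(np.mean(predict_fn(model, X[fold]) == y[fold]))
    return float(np.mean(accs))
```

Precision, recall, F-score, Kappa and AUC would be computed per fold in the same way, from each fold's held-out predictions.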
Hyperspectral data set
[Figure: reflectance spectra of two samples (samples_644-17_1_00000.asd.ref and samples_644-17_1_00035.asd.ref); reflectance 0.0–0.8 vs wavelength 500–2500 nm, spanning the VisNIR and SWIR regions]
Information in spectra
Experimental Results: 9 rock types
• Relative importance of features
• Normalized count of features
[Figure: two panels vs wavelength (400–2400 nm): relative importance (%) of features, and normalized feature count (%)]
Experimental Results: 9 rock types
• Classification performance of selected features
[Figure: bar chart of accuracy, F-score, Kappa and AUC (roughly 0.1–0.9) for bands selected by relative importance vs normalized count]
Experimental Results
• All 9 classes
• Martite
Summary
• Boosting increases the performance of decision trees
while keeping model interpretability
• We presented two approaches to perform feature
selection using boosted decision trees
• Calculating the relative importance of features was more efficient than counting feature occurrences
• The reduced feature set predicts the classes accurately, and more efficiently than using all features
Conclusions
• The standard learning procedure of boosted decision
trees can perform feature selection automatically
• The feature selection is embedded in the internal structure of the model, with no need for extra parameters or a separate selection algorithm
• Instability of the models can be an issue
• Future work: how to determine the optimal number of
features (using statistical tests)
When Things Don’t Work...