
Computer Vision Group Prof. Daniel Cremers

16. Online Learning

PD Dr. Rudolph Triebel

Motivation

• So far, we have dealt with offline learning methods, but very often data is observed in an ongoing process

• Therefore, it would be good to adapt the learned models with newly arriving data without having to consider old data

• This is called online learning

• Major benefits:

• algorithms are adaptive to new observations

• learning is usually faster


Clarification of Notions

There are (at least) three notions:

• incremental learning:

• model is updated with new samples, but old samples might be considered again

• online learning:

• after considering a data sample, it can be disregarded (no need to store it; see the sketch below)

• learning at frame rate:

• learning is fast enough such that it can be performed in one perception cycle

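To make the "online" notion concrete, here is a minimal sketch (not from the slides) of an online learner in Python: each incoming sample triggers one stochastic gradient step on a logistic-regression weight vector and is then discarded.

import numpy as np

def online_logistic_regression(stream, n_features, lr=0.1):
    # One SGD step per incoming sample; the sample is then discarded,
    # so only the weight vector is stored (the "online" property).
    w = np.zeros(n_features + 1)              # weights plus bias
    for x, y in stream:                       # y is 0 or 1
        xb = np.append(x, 1.0)                # append the bias input
        p = 1.0 / (1.0 + np.exp(-w @ xb))     # predicted probability
        w -= lr * (p - y) * xb                # gradient step on the log loss
    return w

# toy usage: a stream of 2D samples from two Gaussian blobs
rng = np.random.default_rng(0)
stream = ((rng.normal(loc=2.0 * y - 1.0, size=2), y)
          for y in rng.integers(0, 2, size=1000))
w = online_logistic_regression(stream, n_features=2)

An incremental learner, in contrast, would be allowed to keep the samples and revisit them in later updates.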

Example: Pedestrian Detection

Offline Approach:

1. Collect and annotate a large data set from different sensor modalities, here: camera and 2D laser

2. Train a classifier for each sensor modality

3. Apply the classifiers and fuse data

[Pipeline diagram: training sets of image and laser data; a codebook with an Extended ISM for the image data and a Boosted CRF for the laser data; at test time, new image and laser data streams are fed to the classifiers and fused.]

[Spinello, Triebel, and Siegwart, IJRR 2010]


Example: Pedestrian Detection

Testing Phase:

[Videos: fused pedestrian detection and tracking; fused detection and tracking of cars and pedestrians.]

[Spinello, Triebel, and Siegwart, IJRR 2010]

But: non-adaptive; large training set required


Major Problem of this Approach

Offline learning:

• cannot adapt to new environments

• cannot learn new classes

• cannot consider new instances of a given class

• is often very slow

• requires a large amount of manually annotated training data

Ideas:

• use online learning for adaptivity

• use active learning to reduce the (human) labeling workload


Active Learning

• To reduce the required training data, the learner can select the data it needs to learn from.

• This is called active learning

• Major advantages:

• only those samples where classification is hard are used for re-training

• humans only need to give ground truth labels for a smaller set of samples

• In principle, active learning can be combined with both offline and online learning, but the combination with online learning makes more sense (a minimal loop is sketched below)

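A minimal sketch of such a selection loop (pool-based, with uncertainty sampling; the model API loosely follows scikit-learn conventions, and all names are illustrative):

import numpy as np

def active_learning_loop(model, X_pool, oracle, n_rounds=10, batch=5):
    # model needs fit(X, y) and predict_proba(X); oracle(x) is the human
    # supervisor returning the true label; X_pool is a numpy array.
    pool = list(range(len(X_pool)))
    labeled = pool[:batch]                    # seed set so the model can be fit
    pool = pool[batch:]
    y_lab = [oracle(X_pool[i]) for i in labeled]
    for _ in range(n_rounds):
        model.fit(X_pool[labeled], np.array(y_lab))
        proba = model.predict_proba(X_pool[pool])
        # normalized entropy of the predicted label distribution
        ent = -(proba * np.log(proba + 1e-12)).sum(axis=1) / np.log(proba.shape[1])
        hardest = np.argsort(ent)[-batch:]    # the most uncertain samples
        for j in sorted(hardest.tolist(), reverse=True):
            i = pool.pop(j)                   # remove from the pool and ...
            labeled.append(i)
            y_lab.append(oracle(X_pool[i]))   # ... query the supervisor
    return model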

Active Learning

[Workflow diagram, built up over several slides: training data (X, Y) enters an optimization step that learns a function f(x); f(x) produces predictions ("Laptop", "Cup") and uncertainties on new data; for the most uncertain samples, a label query is sent to a supervisor ("Present"); the returned labels extend the training data, and the new training data re-enters the optimization step.]

Major Benefits:

• Adapts to new situations

• Requires fewer training samples


Uncertainty Estimates

• A key step in active learning is the selection step

• It requires a good estimate of uncertainties

• Examples for classification algorithms to compute uncertainties:

• Support Vector Machine (“Platt scaling”)

• Gaussian Process Classifier (“predictive variance”)

• Tree-based classifiers (entropy in leaf nodes)


But: what matters is how well the uncertainty correlates with incorrect classifications! One common measure, the normalized entropy of the predicted label distribution, is sketched below.
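Illustrative code, not from the slides:

import numpy as np

def normalized_entropy(p):
    # 0 = completely certain, 1 = uniform, i.e. maximally uncertain
    p = np.asarray(p, dtype=float)
    h = -(p * np.log(p + 1e-12)).sum()
    return h / np.log(len(p))

print(normalized_entropy([0.98, 0.01, 0.01]))   # ~0.1: confident
print(normalized_entropy([1/3, 1/3, 1/3]))      # 1.0: maximally uncertain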


Uncertainty Estimation: An Example


Comparison by evaluation on an unseen class:

[Figure: ground truth vs. GP classification of a 3D scene with the classes building, tree, ground, hedge, car, and background, colored by uncertainty (normalized entropy, 0 to 1); histograms of the normalized entropy of the label distribution (frequency vs. NE) for the GP classifier and the SVM classifier.]

The GP is less overconfident than the SVM!

[Paul, Triebel, Rus, Newman, IROS 2012]


Application: Traffic Sign Classification

[Figure (repeated four times in the extraction): normalised entropy histograms (frequency vs. NE) of the marginal probabilities for BDT, IVM, Linear GPC, Linear SVM, Logit Boost, Non-linear SVM, and Non-linear GPC, trained on the road sign classes stop and lorries prohibited and tested on 200 instances of the unseen classes roadworks ahead and keep left. Higher normalised entropy implies more uncertainty in classifier output, so we expect the classifiers to be certain (low NE) on the trained classes and uncertain (high NE) on the unseen classes.]

The GP is less overconfident than the SVM!

[Grimmett, Paul, Triebel, Posner ICRA 2013]


Over- and Underconfidence


Definition 1: The overconfidence of a classifier is the mean of all confidence values that correspond to incorrect classifications.

Definition 2: The underconfidence of a classifier is the mean of all uncertainty values that correspond to correct classifications.
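A sketch of how Definitions 1 and 2 could be computed on a validation set, under the assumption that confidence is the probability of the predicted class and uncertainty is one minus that value (one possible choice; the slides leave the exact measure open):

import numpy as np

def over_and_underconfidence(proba, y_true):
    # proba: (N, C) predicted class probabilities; y_true: (N,) labels
    pred = proba.argmax(axis=1)
    conf = proba.max(axis=1)                  # confidence of the prediction
    wrong = pred != y_true
    over = conf[wrong].mean() if wrong.any() else 0.0               # Definition 1
    under = (1.0 - conf[~wrong]).mean() if (~wrong).any() else 0.0  # Definition 2
    return over, under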


The 3 Criteria of Learning Algorithms

[Diagram: two criteria so far. Efficiency: memory, run time. Accuracy: precision, recall. Illustrated with a run-time-vs-memory sketch and a precision-recall curve ("P-R: Tracking and fusion, PED class") for camera, laser, and fusion.]


[Diagram: the same two criteria, Efficiency (memory, run time) and Accuracy (precision, recall), extended by a third: Confidence (overconfidence, underconfidence).]


Active Learning for Semantic Mapping

Active Learning is well suited for semantic mapping because:

• it can deal with large amounts of data.

• data is not independent.

• the task is essentially an online learning problem.


Active Learning for Semantic Mapping

[Active learning workflow diagram as before.]

Concrete Approach:

• Use IVM (a sparse online GP) for classification

• Sort classified data by uncertainty

• Request labels for the most uncertain samples

• Remove uninformative samples (“forgetting”); a minimal sketch of this loop follows below

[Pipeline diagram: train the IVM (hyperparameters, active set) → predict on a new batch and sort by uncertainty → query labels for the most uncertain samples → extend and forget training data → repeat. Side sketch: template image patches are correlated (⨂) with windows to produce a feature vector (n_1, n_2, n_3, …, n_N)^T.]

[Triebel, Grimmett, Paul, Posner ISRR 2013]
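A minimal sketch of one round of this loop, with a generic probabilistic classifier standing in for the IVM and oldest-first removal standing in for the forgetting of uninformative samples (all names are illustrative):

import numpy as np

def process_batch(model, batch_X, oracle, train_X, train_y,
                  n_query=5, max_size=500):
    # predict on the new batch and sort by uncertainty (entropy)
    proba = model.predict_proba(batch_X)
    ent = -(proba * np.log(proba + 1e-12)).sum(axis=1)
    order = np.argsort(ent)                   # ascending uncertainty
    for i in order[-n_query:]:                # query the most uncertain
        train_X.append(batch_X[i])
        train_y.append(oracle(batch_X[i]))
    if len(train_X) > max_size:               # "forgetting", here: drop oldest
        del train_X[:len(train_X) - max_size]
        del train_y[:len(train_y) - max_size]
    model.fit(np.array(train_X), np.array(train_y))
    return model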


The Informative Vector Machine

Main differences to standard GP classifier:

• it uses a subset (“active set”) of training points

• the (inverse) posterior covariance matrix is computed incrementally

Decision of inclusion in the active set is based on an information-theoretic criterion; a simplified sketch follows below.

Slight caveat: Training of hyper-parameters needs to be done iteratively.

From: http://staffwww.dcs.shef.ac.uk/people/N.Lawrence/ivm/
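A simplified stand-in for this selection (not the exact IVM criterion, which scores points by entropy reduction and updates the posterior incrementally): greedily add the candidate whose GP predictive variance, and hence differential entropy, is largest given the current active set.

import numpy as np

def rbf(A, B, ls=1.0):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d / ls**2)

def greedy_active_set(X, k, noise=1e-2):
    # X: (N, d) inputs; returns indices of k greedily selected points
    active = [0]
    for _ in range(k - 1):
        A = X[active]
        K = rbf(A, A) + noise * np.eye(len(active))
        Kinv = np.linalg.inv(K)
        rest = [i for i in range(len(X)) if i not in active]
        kstar = rbf(X[rest], A)
        # predictive variance = prior variance minus the explained part
        var = 1.0 - np.einsum('ij,jk,ik->i', kstar, Kinv, kstar)
        active.append(rest[int(np.argmax(var))])
    return active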


Semantic Mapping: Results

➡ GP “overtakes” and stays better than SVM.

➡ Active learning better than passive learning.

➡ Random selection is not better.

[Figure: maps of the passive and the active detector at epochs 0, 2, and 9, colored from low to high number of detections, with false positives and true positives marked; plot of f1 measure vs. epoch for SVM active/passive/random and IVM active/passive/random.]

[Triebel, Grimmett, Paul, Posner ISRR 2013]


Which Classifier Should We Choose?

[Diagram, shown twice: GPC, SVM, and Boosting placed in an efficiency (low to high) vs. accuracy (low to high) plane; in the second version, each classifier is additionally colored by the quality of its confidence estimates (bad to good).]

• SVMs are accurate, but (more) overconfident

• GPCs are less overconfident, but very inefficient

• Boosting is very efficient, but also overconfident



Can We Reduce Over-(Under-)Confidence?

• Start with a very efficient multi-class boosting classifier

• Modify it so that it reduces overconfidence

• Also use the confidence to weight the training samples

Result (“Confidence Boosting”):

• samples that are classified incorrectly but with high confidence receive a higher weight

• overconfidence is reduced in every learning round

• the overall classifier is less overconfident ⇒ better for Active Learning

Note: This is the alternative to the opposite approach, namely making a non-overconfident classifier (such as the GPC) efficient!


Online Gradient Boosting

Online: outer loop over data points, inner loop over weak classifiers

• For each point, all weak classifiers are updated

• The agreement between prediction and ground truth is computed

• A sample receives a low weight if the agreement is high

Note: The agreement does not use the confidence!

Algorithm 1: Online Multi-class Gradient Boost [Saffari et al. 2010]

Data: training data (X, y) with C classes
Input: number of weak learners M, loss function ℓ, agreement function a
Output: weak learners f_1, …, f_M

 1  Initialize(f_1, …, f_M)
 2  for n = 1, …, N do
 3      w_n ← 1
 4      g_n ← 0
 5      for m = 1, …, M do
 6          f_m ← UpdateWeakLearner(f_m, x_n, y_n, w_n)
 7          p_nm ← f_m(x_n)
 8          α_nm ← a(p_nm, y_n)
 9          g_n ← g_n + α_nm
10          w_n ← −∇ℓ(g_n)
11      end
12  end
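A runnable sketch of Algorithm 1. The weak learner (weighted per-class prototypes), the margin-style agreement, and the exponential loss are illustrative choices for the sketch, not necessarily the ones used by Saffari et al.:

import numpy as np

class PrototypeLearner:
    # Toy online weak learner: weighted running mean per class;
    # scores are negative distances to the class prototypes.
    def __init__(self, n_classes, n_features, rng):
        self.protos = 0.01 * rng.normal(size=(n_classes, n_features))
        self.mass = np.full(n_classes, 1e-6)
    def update(self, x, y, w):
        self.mass[y] += w
        self.protos[y] += (w / self.mass[y]) * (x - self.protos[y])
    def predict(self, x):
        return -np.linalg.norm(self.protos - x, axis=1)

def agreement(scores, y):
    # margin between the true class and the best other class
    others = np.delete(scores, y)
    return scores[y] - others.max()

def online_gradient_boost(stream, M, n_classes, n_features, seed=0):
    rng = np.random.default_rng(seed)
    fs = [PrototypeLearner(n_classes, n_features, rng) for _ in range(M)]
    for x, y in stream:                  # outer loop over data points
        w, g = 1.0, 0.0
        for f in fs:                     # inner loop over weak learners
            f.update(x, y, w)            # line 6 of Algorithm 1
            g += agreement(f.predict(x), y)
            w = np.exp(-g)               # -dl/dg for the loss l(g) = exp(-g)
    return fs

Classification of a new sample then sums the weak learners' score vectors and takes the arg max.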


Confidence Boosting

Main idea:

Also use the confidence to compute the agreement

Intuitively:

• if the classification is correct ⇒ use the positive confidence

• if the classification is wrong ⇒ use the negative confidence

Result:

• samples that are classified incorrectly but with high confidence receive a higher weight

• overconfidence is reduced in every learning round

• the overall classifier is less overconfident ⇒ better for Active Learning (a sketch of the modified agreement follows below)

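A hypothetical formalization of this idea: a signed-confidence agreement that could replace the margin-style agreement in the sketch above. With the loss ℓ(g) = exp(−g), the weight w = −∇ℓ(g) = exp(−g) then grows exactly for samples that are classified wrongly but with high confidence.

import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def confidence_agreement(scores, y):
    # confidence of the predicted class, positive if the prediction
    # is correct, negative if it is wrong (illustrative formalization)
    p = softmax(scores)
    pred = int(np.argmax(p))
    return p[pred] if pred == y else -p[pred]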


Example: AL for 3D Object Recognition

Qualitative Results:

[Figure: predicted class distributions at epoch 10 over the object classes (kleenex, lemon, lightbulb, lime, marker, apple, ball, banana, bell pepper, binder, bowl, calculator, camera, cap, cell phone, cereal box, coffee mug) for three test objects. True label lime: Gradient Boost → lemon, Confidence Boost → lime. True label binder: Gradient Boost → calculator, Confidence Boost → binder. True label bowl: Gradient Boost → coffee mug, Confidence Boost → bowl.]


Example: AL for 3D Object Recognition

Confidence Boosting

• classifies better than GradientBoost.

• generates fewer label queries.

• is less overconfident.

• is orders of magnitude faster than GPC.

[Figure: number of label queries vs. epochs for Gradient Boost (GB) and Confidence Boosting (CB) on Pendigits, USPS, and Letter; average classification error (correct and false queries) on USPS, Letter, Pendigits, DNA, Begbroke, and RGBD for Saffari et al., active GradientBoost, active Confidence Boosting, and Lai et al.]


Pool-based and Stream-based AL

Problem: Most often, data occurs in streams (online) and not in pools (offline)

But: Online Random Forests

• require a balanced class distribution

• depend on the order of the data

Idea: use Mondrian Forests [1]

Main differences:

• MFs also store the range of the data in each dimension

• MFs are independent of the class labels


[1] B. Lakshminarayanan, D. M. Roy, and Y. W. Teh, “Mondrian Forests: Efficient Online Random Forests,” in Advances in Neural Information Processing Systems (NIPS), 2014, pp. 3140–3148.


Mondrian Forests

Construction of a Mondrian tree, built up step by step (a minimal sketch of the split sampling follows the list):

• start with two points

• insert a split

• add a third point

• sample a “time” from an exponential distribution

• insert a node above the given node

• insert a random split

• add a new node inside the outer box

• sample again and insert a new node at the end

• this requires extending a split
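A minimal sketch of the split sampling for one node, following the Mondrian process described in [1]: the split time is exponentially distributed with rate equal to the sum of the side lengths of the data's bounding box, the split dimension is chosen with probability proportional to the side lengths, and the split location uniformly within the range. Note that the class labels never enter.

import numpy as np

def sample_mondrian_split(X, budget=np.inf, rng=np.random.default_rng(0)):
    # X: (N, d) points falling into this node
    lower, upper = X.min(axis=0), X.max(axis=0)   # MFs store the data range
    extent = upper - lower
    rate = extent.sum()                           # "linear dimension" of the box
    t = rng.exponential(1.0 / rate)               # split time ~ Exp(rate)
    if t > budget:
        return None                               # node stays a leaf
    dim = rng.choice(len(extent), p=extent / rate)  # dim proportional to extent
    loc = rng.uniform(lower[dim], upper[dim])       # location uniform in range
    return t, dim, loc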


Why the Name “Mondrian” Forests?


Piet Mondrian: “Composition with Red, Yellow, Blue, and Black” 1926

Gemeentemuseum, Den Haag


Application to Real Data

KITTI benchmark data set:

• 3D point clouds

• cars, pedestrians, bikes, trucks, etc.

• segmentation given (“tracklets”)

Goal:

• online classification

• use active learning

Major problems:

• unbalanced classes

• order of appearance


The Data Set

[Figure: class distributions of the real data and of the data uniformly resampled.]


Results (no Active Learning)

[Plots: classification results on the real data and on the data uniformly resampled.]

A. Narr, R. Triebel, D. Cremers “Stream-based Active Learning for Efficient and Adaptive Classification of 3D Objects” in: ICRA 2016


Results Using Active Learning

• Mondrian Forests require only 5% of the data to reach 90% accuracy

• Standard Random Forests cannot reach that accuracy

A. Narr, R. Triebel, D. Cremers “Stream-based Active Learning for Efficient and Adaptive Classification of 3D Objects” in: ICRA 2016


Summary

• Online learning has important advantages over offline learning: efficiency and adaptivity

• In active learning the algorithm selects the data to learn from (human in the loop)

• This requires good confidence estimates

• GP classifiers tend to be less overconfident, but are inefficient

• A faster alternative is Confidence Boosting, which is very efficient and only mildly overconfident

• For stream-based active learning, Mondrian Forests are better suited
