Computer Vision Group Prof. Daniel Cremers
16. Online Learning
PD Dr. Rudolph Triebel, Computer Vision Group — Online Learning
Motivation
• So far, we have covered offline learning methods, but very often data is observed in an ongoing process
• Therefore, it would be good to adapt the learned models with newly arriving data without having to consider old data
• This is called online learning
• Major benefits:
• algorithms are adaptive to new observations
• learning is usually faster
Clarification of Notions
There are (at least) three notions:
• incremental learning:
• model is updated with new samples, but old samples might be considered again
• online learning:
• after considering a data sample, it can be disregarded (no need to store it)
• learning at frame rate:
• learning is fast enough such that it can be performed in one perception cycle
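The difference between incremental and strictly online updates is easiest to see in code. Below is a minimal sketch (not from the lecture) of an online estimator in the strict sense defined above: each sample updates the model and is then discarded.

```python
class OnlineMean:
    """Running mean via the incremental update mean_n = mean_{n-1} + (x - mean_{n-1}) / n."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n  # old samples are never stored or revisited
        return self.mean

est = OnlineMean()
for x in [2.0, 4.0, 6.0]:
    est.update(x)
print(est.mean)  # 4.0
```

The same pattern (state plus a constant-time update) underlies most online learners, e.g. stochastic gradient descent.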
Example: Pedestrian Detection
Offline Approach:
1. Collect and annotate a large data set from different sensor modalities, here: camera and 2D laser
[Diagram: an image-data training set and a laser-data training set, from which a codebook is built]
[Spinello, Triebel, and Siegwart, IJRR 2010]
Example: Pedestrian Detection
Offline Approach:
1. Collect and annotate a large data set from different sensor modalities, here: camera and 2D laser
2. Train a classifier for each sensor modality
[Diagram: the training sets are used to learn per-modality classifiers: an Extended ISM (with codebook) and a Boosted CRF]
[Spinello, Triebel, and Siegwart, IJRR 2010]
Example: Pedestrian Detection
Offline Approach:
1. Collect and annotate a large data set from different sensor modalities, here: camera and 2D laser
2. Train a classifier for each sensor modality
3. Apply the classifiers and fuse data
[Diagram: the learned classifiers (Extended ISM with codebook, Boosted CRF) are applied to new data and their outputs are fused]
[Spinello, Triebel, and Siegwart, IJRR 2010]
Example: Pedestrian Detection
Testing Phase:
[Videos: fused pedestrian detection and tracking; fused detection and tracking of cars and pedestrians]
[Spinello, Triebel, and Siegwart, IJRR 2010]
But: non-adaptive; large training set required
Major Problem of this Approach
Offline learning:
• cannot adapt to new environments
• cannot learn new classes
• cannot consider new instances of a given class
• is often very slow
• requires a large amount of manually annotated training data
Ideas:
• use online learning for adaptivity
• use active learning to reduce (human) work load
Active Learning
• To reduce the required training data, the learner can select the data it needs to learn from.
• This is called active learning
• Major advantages:
• only those samples where classification is hard are used for re-training
• humans only need to give ground truth labels for a smaller set of samples
• In principle, active learning can be combined with both offline and online learning, but it is most useful in the online setting
Active Learning
[Diagram: a function f(x) is optimized on training data (X, Y) with labels such as "Laptop" and "Cup"]
Active Learning
[Diagram: optimization yields f(x); in the prediction step, f(x) is applied to new data and outputs labels (e.g. "Laptop") together with uncertainties]
Active Learning
[Diagram: prediction on the new data now yields a sample the classifier is uncertain about ("?")]
Active Learning
[Diagram: for the uncertain sample, a label query is sent to a supervisor, who returns the ground-truth label (e.g. "Present")]
Active Learning
[Diagram: the newly labeled sample extends the training data, producing a new training set for the next optimization round]
Active Learning
[Diagram: the complete active learning cycle: optimization, prediction, label query to a supervisor, extend training data]
Major Benefits:
• Adapts to new situations
• Requires fewer training samples
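The cycle sketched on these slides can be written down compactly. The following is a hypothetical toy example: the nearest-centroid classifier, the 1-D data, and the `oracle` dictionary (standing in for the human supervisor) are all made up for illustration; only the query loop itself follows the slides.

```python
import math

def train(data):
    """Fit one centroid per class label."""
    cents = {}
    for x, y in data:
        cents.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in cents.items()}

def predict(cents, x):
    """Softmax over negative distances: returns label and class probabilities."""
    scores = {y: -abs(x - c) for y, c in cents.items()}
    z = sum(math.exp(s) for s in scores.values())
    probs = {y: math.exp(s) / z for y, s in scores.items()}
    return max(probs, key=probs.get), probs

def entropy(probs):
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

training_data = [(0.0, "cup"), (1.0, "cup"), (9.0, "laptop"), (10.0, "laptop")]
new_data = [0.4, 5.1, 9.7]                           # unlabeled samples
oracle = {0.4: "cup", 5.1: "laptop", 9.7: "laptop"}  # stands in for the human

model = train(training_data)
# select the new sample the model is most uncertain about ...
query = max(new_data, key=lambda x: entropy(predict(model, x)[1]))
# ... ask the supervisor for its label, extend the training data, retrain
training_data.append((query, oracle[query]))
model = train(training_data)
print(query)  # 5.1 lies between the two classes, so it has the highest entropy
```

Only the hard sample (the one between the class centroids) triggers a label query; the confidently classified ones never reach the human.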
Uncertainty Estimates
• A key step in active learning is the selection step
• It requires a good estimate of uncertainties
• Examples for classification algorithms to compute uncertainties:
• Support Vector Machine (“Platt scaling”)
• Gaussian Process Classifier (“predictive variance”)
• Tree-based classifiers (entropy in leaf nodes)
But: what matters is how well the uncertainty correlates with incorrect classifications!
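A simple, widely used uncertainty measure (and the one used in the comparison that follows) is the normalized entropy of the predicted class distribution, H(p)/log C, which lies in [0, 1] for any number of classes C. A minimal sketch:

```python
import math

def normalized_entropy(probs):
    """Normalized entropy H(p) / log(C) of a class distribution, in [0, 1]."""
    c = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(c)

print(normalized_entropy([0.5, 0.5]))              # 1.0: maximally uncertain
print(normalized_entropy([0.99, 0.01]))            # close to 0: confident
print(round(normalized_entropy([1 / 3] * 3), 6))   # 1.0 for uniform, for any C
```

The normalization makes the measure comparable across classifiers with different numbers of classes, which is why the histograms below use it.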
Uncertainty Estimation: An Example
[Figure: ground truth vs. GP classification of a scene with classes building, tree, ground, hedge, car, and background, colored by uncertainty (normalized entropy, 0 to 1); histograms of the normalized entropy of the label distribution for the GP classifier and the SVM classifier]
The GP is less overconfident than the SVM!
[Paul, Triebel, Rus, Newman, IROS 2012]
Application: Traffic Sign Classification
[Fig. 4 from the paper: normalized entropy histograms (frequency vs. NE) of the marginal probabilities for BDT, IVM, linear and non-linear GPC, linear and non-linear SVM, and LogitBoost, trained on the road sign classes stop and lorries prohibited and tested on 200 instances of the unseen classes roadworks ahead and keep left. Higher normalized entropy implies more uncertainty in the classifier output, so we expect the classifiers to be certain (low NE) on the trained classes and uncertain (high NE) on the unseen classes.]
The GP is less overconfident than the SVM!
[Grimmett, Paul, Triebel, Posner ICRA 2013]
Over- and Underconfidence
Definition 1: The overconfidence of a classifier is the mean of all confidence values that correspond to incorrect classifications.
Definition 2: The underconfidence of a classifier is the mean of all uncertainty values that correspond to correct classifications.
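Both definitions translate directly into code. In this sketch, confidence is taken as the probability of the predicted class and uncertainty as one minus that value; the slides leave the exact measures open, so treat this as one possible instantiation:

```python
def over_and_underconfidence(confidences, predicted, true):
    """Definition 1 and 2: mean confidence on errors, mean uncertainty on correct predictions.
    Uncertainty is assumed to be 1 - confidence here."""
    wrong = [c for c, p, t in zip(confidences, predicted, true) if p != t]
    right = [1 - c for c, p, t in zip(confidences, predicted, true) if p == t]
    over = sum(wrong) / len(wrong) if wrong else 0.0
    under = sum(right) / len(right) if right else 0.0
    return over, under

conf = [0.9, 0.6, 0.95, 0.7]
pred = ["car", "car", "tree", "tree"]
true = ["car", "tree", "tree", "car"]
over, under = over_and_underconfidence(conf, pred, true)
print(round(over, 4))   # mean confidence on the two errors: (0.6 + 0.7) / 2
print(round(under, 4))  # mean uncertainty on the two correct: ((1-0.9) + (1-0.95)) / 2
```

A well-calibrated classifier keeps both values low: it is unsure when wrong and sure when right.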
The 3 Criteria of Learning Algorithms
• Efficiency: memory, run time
• Accuracy: precision, recall
[Figures: precision vs. recall curves for tracking and fusion on the pedestrian class (camera, laser, fusion), and a run time vs. memory diagram]
The 3 Criteria of Learning Algorithms
• Efficiency: memory, run time
• Accuracy: precision, recall
• Confidence: overconfidence, underconfidence
Active Learning for Semantic Mapping
Active Learning is well suited for semantic mapping because:
• it copes with the large amounts of data that mapping produces: only the informative samples need labels.
• the data is not independent: consecutive observations are highly correlated, so many samples are redundant.
• the task is essentially an online learning problem.
Active Learning for Semantic Mapping
[Diagram: the active learning cycle as before: optimization, prediction, label query, extend training data]
Concrete Approach:
• Use IVM (a sparse online GP) for classification
• Sort classified data by uncertainty
• Request labels for the most uncertain samples
• Remove uninformative samples (“forgetting”)
[Diagram: train the IVM (hyperparameters, active set) on a new batch; predict and sort by uncertainty; query labels for the most uncertain samples; extend and forget training data. Features are obtained by template matching (template image correlated with image windows) and stacked into a vector (n_1, n_2, n_3, ..., n_N)]
[Triebel, Grimmett, Paul, Posner ISRR 2013]
PD Dr. Rudolph TriebelComputer Vision GroupOnline Learning
The Informative Vector Machine
Main differences to standard GP classifier:
• it uses a subset (“active set”) of training points
• the (inverse) posterior covariance matrix is computed incrementally
The decision to include a point in the active set is based on an information-theoretic criterion.
Slight caveat: training of the hyper-parameters needs to be done iteratively.
From: http://staffwww.dcs.shef.ac.uk/people/N.Lawrence/ivm/
Semantic Mapping: Results
➡ GP “overtakes” and stays better than SVM.
➡ Active learning better than passive learning.
➡ Random selection is not better.
[Figure: detection maps (low to high number of detections) for the passive and the active detector at epochs 0, 2, and 9, marking false positives and true positives]
[Triebel, Grimmett, Paul, Posner ISRR 2013]
[Plot: F1 measure vs. epoch (0 to 10) for SVM and IVM, each with active, passive, and random sample selection]
Which Classifier Should We Choose?
[Chart: GPC, SVM, and Boosting placed along axes of accuracy (low to high) and efficiency (low to high)]
Which Classifier Should We Choose?
[Chart: the same plot with a third axis added, the quality of the confidence estimate (bad to good)]
• SVMs are accurate, but (more) overconfident
• GPCs are less overconfident, but very inefficient
• Boosting is very efficient, but also overconfident
Can We Reduce Over-(Under-)Confidence?
• Start with a very efficient multi-class boosting classifier
• Modify it so that it reduces overconfidence
• Also use the confidence to weight the training samples
Result (“Confidence Boosting”):
• samples that are classified incorrectly but with high confidence receive a higher weight
• overconfidence is reduced in every learning round
• the overall classifier is less overconfident ⇒ better for Active Learning
Note: this is the complementary approach to making a non-overconfident classifier (such as the GPC) efficient; here, an efficient classifier is made less overconfident instead!
Online Gradient Boosting
Online: outer loop over data points, inner loop over weak classifiers
• For each point, all weak classifiers are updated
• An agreement between prediction and ground truth is computed
• A sample receives a low weight if the agreement is high
Note: the agreement does not use the confidence!
Algorithm 1: Online Multi-class GradientBoost [Saffari et al. 2010]
Data: training data (X, y) with C classes
Input: number of weak learners M, loss function ℓ, agreement function a
Output: weak learners f_1, ..., f_M

 1: Initialize(f_1, ..., f_M)
 2: for n = 1, ..., N do
 3:     w_n ← 1
 4:     g_n ← 0
 5:     for m = 1, ..., M do
 6:         f_m ← UpdateWeakLearner(f_m, x_n, y_n, w_n)
 7:         p_nm ← f_m(x_n)
 8:         α_nm ← a(p_nm, y_n)
 9:         g_n ← g_n + α_nm
10:         w_n ← −∇ℓ(g_n)
11:     end for
12: end for
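The algorithm above can be made concrete. In the sketch below, the weak learner (a weighted nearest-centroid model), the agreement a(p, y) = ±1, and the exponential loss ℓ(g) = exp(−g) (whose negative gradient is again exp(−g)) are illustrative choices, not those of Saffari et al.:

```python
import math

class CentroidLearner:
    """Weak learner: one weighted centroid per class, updated online."""

    def __init__(self):
        self.sums, self.weights = {}, {}

    def update(self, x, y, w):
        self.sums[y] = self.sums.get(y, 0.0) + w * x
        self.weights[y] = self.weights.get(y, 0.0) + w

    def predict(self, x):
        return min(self.sums,
                   key=lambda y: abs(x - self.sums[y] / self.weights[y]))

def agreement(pred, y):
    """Simple agreement without confidence: +1 if correct, -1 if wrong."""
    return 1.0 if pred == y else -1.0

def online_gradient_boost(stream, M=3):
    learners = [CentroidLearner() for _ in range(M)]
    for x, y in stream:                      # outer loop over data points
        w, g = 1.0, 0.0
        for f in learners:                   # inner loop over weak learners
            f.update(x, y, w)
            g += agreement(f.predict(x), y)
            w = math.exp(-g)                 # w_n <- -grad l(g_n) for l(g) = exp(-g)
        # (x, y) can now be discarded: this is online learning
    return learners

def predict_ensemble(learners, x):
    votes = {}
    for f in learners:
        p = f.predict(x)
        votes[p] = votes.get(p, 0) + 1
    return max(votes, key=votes.get)

stream = [(0.0, "a"), (0.2, "a"), (5.0, "b"), (5.3, "b"), (0.1, "a")]
ensemble = online_gradient_boost(stream)
print(predict_ensemble(ensemble, 0.05), predict_ensemble(ensemble, 5.1))  # a b
```

Samples that the growing ensemble handles poorly (cumulative agreement g low) keep a large weight for the later weak learners, exactly as in lines 9 and 10 of the listing.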
Confidence Boosting
Main idea:
Also use the confidence to compute the agreement.
Intuitively:
• if the classification is correct ⇒ use the positive confidence
• if the classification is wrong ⇒ use the negative confidence
Result:
• samples that are classified incorrectly but with high confidence receive a higher weight
• overconfidence is reduced in every learning round
• the overall classifier is less overconfident ⇒ better for Active Learning
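One way to realize this signed-confidence agreement (an interpretation of the slide, not the exact formula from the paper):

```python
import math

def confidence_agreement(probs, y_true):
    """Signed-confidence agreement: +confidence if the prediction is correct,
    -confidence if it is wrong. probs maps class -> predicted probability."""
    pred = max(probs, key=probs.get)
    conf = probs[pred]
    return conf if pred == y_true else -conf

a_right = confidence_agreement({"cup": 0.9, "laptop": 0.1}, "cup")     # 0.9
a_wrong = confidence_agreement({"cup": 0.9, "laptop": 0.1}, "laptop")  # -0.9

# Under the exponential loss from before, w = exp(-agreement):
print(round(math.exp(-a_right), 3))  # small weight when confidently correct
print(round(math.exp(-a_wrong), 3))  # large weight when confidently wrong
```

A confidently wrong prediction thus receives a much larger weight than a hesitantly wrong one, which is what drives the overconfidence down round by round.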
Example: AL for 3D Object Recognition
Qualitative Results:
[Figure: predicted class distributions over the RGB-D object classes (kleenex, lemon, lightbulb, lime, marker, apple, ball, banana, bell pepper, binder, bowl, calculator, camera, cap, cell phone, cereal box, coffee mug) at epoch 10 for three test objects:
• true label lime: Gradient Boost predicts lemon, Confidence Boost predicts lime
• true label binder: Gradient Boost predicts calculator, Confidence Boost predicts binder
• true label bowl: Gradient Boost predicts coffeeMug, Confidence Boost predicts bowl]
Example: AL for 3D Object Recognition
Confidence Boosting
• classifies better than GradientBoost.
• generates fewer label queries.
• is less overconfident.
• is orders of magnitude faster than GPC.
[Plots: number of label queries vs. epoch (1 to 11) for Gradient Boost (GB) and Confidence Boosting (CB) on the Pendigits, USPS, and Letter data sets; average classification error on USPS, Letter, Pendigits, DNA, Begbroke, and RGBD for Saffari et al., active GradientBoost, active Confidence Boosting, and Lai et al.]
Pool-based and Stream-based AL
Problem: Most often, data occurs in streams (online) and not in pools (offline)
But: Online Random Forests
• require a balanced class distribution
• depend on the order of the data
Idea: use Mondrian Forests [1]
Main differences:
• MFs also store the range of the data in each dimension
• MFs are independent of the class labels
[1] B. Lakshminarayanan, D. M. Roy, and Y. W. Teh, “Mondrian Forests: Efficient Online Random Forests,” in Advances in Neural Information Processing Systems (NIPS), 2014, pp. 3140–3148.
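In the stream setting, ranking a whole pool by uncertainty is impossible, since samples arrive one at a time. A common substitute, sketched here with an assumed fixed threshold, is to query the supervisor whenever the normalized entropy of the current prediction is too high:

```python
import math

THRESHOLD = 0.5  # hypothetical value; in practice tuned or adapted online

def should_query(probs):
    """Stream-based query rule: ask for a label if the normalized entropy
    of the predicted class distribution exceeds the threshold."""
    c = len(probs)
    ne = -sum(p * math.log(p) for p in probs if p > 0) / math.log(c)
    return ne > THRESHOLD

print(should_query([0.55, 0.45]))  # True: near-uniform, ask the human
print(should_query([0.98, 0.02]))  # False: confident, keep the prediction
```

Each sample is judged on its own as it streams past, which fits the Mondrian Forest setting described next.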
Mondrian Forests
• start with two points
• insert a split
Mondrian Forests
• start with two points
• insert a split
• add a third point
Mondrian Forests
• start with two points
• insert a split
• add a third point
• sample a “time” from an exponential distr.
• insert a node above the given node
• insert a random split
Mondrian Forests
• start with two points
• insert a split
• add a third point
• sample a “time” from an exponential distr.
• insert a node above the given node
• insert a random split
• add a new node inside the outer box
Mondrian Forests
• start with two points
• insert a split
• add a third point
• sample a “time” from an exponential distr.
• insert a node above the given node
• insert a random split
• add a new node inside the outer box
• sample again and insert new node at the end
• this requires extending a split
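The split-sampling step in this construction can be sketched as follows (simplified from Lakshminarayanan et al. [1]): the split "time" is exponential with rate equal to the sum of the box's side lengths, the split dimension is chosen with probability proportional to side length, and the split location is uniform along that side.

```python
import random

def sample_split(lower, upper, rng=random):
    """Sample one Mondrian split for the axis-aligned box [lower, upper]."""
    extents = [u - l for l, u in zip(lower, upper)]
    rate = sum(extents)
    time = rng.expovariate(rate)        # larger boxes tend to split sooner
    # split dimension proportional to side length, location uniform within it
    dim = rng.choices(range(len(extents)), weights=extents)[0]
    loc = rng.uniform(lower[dim], upper[dim])
    return time, dim, loc

random.seed(0)
t, d, x = sample_split(lower=[0.0, 0.0], upper=[4.0, 1.0])
# for this box, dimension 0 is four times as likely to be split as dimension 1
```

Because the rate depends only on the stored data ranges and not on the labels, new points can be inserted online by comparing split times, which is exactly the label independence noted above.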
Why the Name “Mondrian” Forests?
Why the Name “Mondrian” Forests?
Piet Mondrian: “Composition with Red, Yellow, Blue, and Black” 1926
Gemeentemuseum, Den Haag
Application to Real Data
KITTI benchmark data set:
• 3D point clouds
• cars, pedestrians, bikes, trucks, etc.
• segmentation given (“tracklets”)
Goal:
• online classification
• use active learning
Major problems:
• unbalanced classes
• order of appearance
The Data Set
The real data
The data uniformly resampled
Results (no Active Learning)
The real data The data uniformly resampled
A. Narr, R. Triebel, D. Cremers “Stream-based Active Learning for Efficient and Adaptive Classification of 3D Objects” in: ICRA 2016
Results Using Active Learning
• Mondrian Forests require only 5% of the data to reach 90% accuracy
• Standard Random Forests cannot reach this accuracy
A. Narr, R. Triebel, D. Cremers “Stream-based Active Learning for Efficient and Adaptive Classification of 3D Objects” in: ICRA 2016
Summary
• Online learning has important advantages over offline learning: efficiency and adaptivity
• In active learning the algorithm selects the data to learn from (human in the loop)
• This requires good confidence estimates
• GP classifiers tend to be less overconfident, but are inefficient
• A faster alternative is Confidence Boosting: it is very efficient and only mildly overconfident
• For stream-based active learning, Mondrian Forests are better suited