Lecture 8: Online and Incremental Learning

Pavel Laskov¹  Blaine Nelson¹

¹Cognitive Systems Group
Wilhelm Schickard Institute for Computer Science
Universität Tübingen, Germany

Advanced Topics in Machine Learning, 2012
Why Learn Online?

Non-stationarity of real data
  ⇒ Models must be updated when new data becomes available
  ⇒ The relevance of past data must be re-evaluated

Learning on a tight budget
  ⇒ Some data may be irrelevant
  ⇒ Cost-sensitive learning
Applications of Online Learning
Time series analysis
Security event monitoring
[Figure: EW-CUSUM detection statistic with geometric STD normalization]
Anomaly and change point detection
Object tracking
Computational Considerations

How much do we pay for training?

  PCA: O(n³) (eigenvalue decomposition)
  Ridge regression: O(n³) (matrix inversion)
  SVM: O(n² log n) (theoretical bound on feasible direction decomposition)

How much are we willing to pay?

  An order of magnitude less: to match batch learning
  Constant or linear time: if we are really greedy!
Incremental SVM: Problem Setup

Initial state: An SVM has been trained on a data set X = {(x_1, y_1), ..., (x_k, y_k)}, and the initial solution α⁰ is available.

Goal: Given a new data point x_c, find a solution α* that is optimal for the extended training set X ∪ {x_c}.

Key idea: Ensure that the optimality conditions are enforced at every step of updating the solution from α⁰ to α*.

Notation: All training data are grouped into three sets according to the values of α:

  set S: 0 < α_i < C
  set O: α_i = 0
  set E: α_i = C
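To make this set bookkeeping concrete, here is a minimal sketch (NumPy, with a small tolerance for the comparisons; all names are our own) of how the indices might be partitioned:

  import numpy as np

  def partition(alpha, C, tol=1e-8):
      """Split training indices into the sets S, O and E by their alpha values."""
      O = np.where(alpha <= tol)[0]                        # alpha_i = 0
      E = np.where(alpha >= C - tol)[0]                    # alpha_i = C
      S = np.where((alpha > tol) & (alpha < C - tol))[0]   # 0 < alpha_i < C
      return S, O, E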
KKT Revisited (once again ☺)

Consider the saddle-point formulation of the SVM training problem:

  max_µ min_{0 ≤ α ≤ C}  W = −1⊤α + ½ α⊤Hα + µ (y⊤α)

The KKT conditions for this problem can be expressed as follows:

  g_i = ∂W/∂α_i = Σ_j H_ij α_j + y_i µ − 1   is  ≥ 0 for i ∈ O,
                                                 = 0 for i ∈ S,
                                                 ≤ 0 for i ∈ E

  ∂W/∂µ = Σ_j y_j α_j = 0

These conditions must be met for both α⁰ and α*, i.e., before and after the inclusion of x_c, and at every single intermediate step!
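These conditions are easy to verify numerically. A small sketch (the function name and tolerance are our own, using the partition helper sketched earlier):

  import numpy as np

  def kkt_satisfied(H, y, alpha, mu, C, tol=1e-6):
      """Check g_i >= 0 on O, g_i = 0 on S, g_i <= 0 on E, and y'alpha = 0."""
      g = H @ alpha + y * mu - 1.0          # g_i = sum_j H_ij alpha_j + y_i mu - 1
      S, O, E = partition(alpha, C)         # see the earlier sketch
      return (np.all(g[O] >= -tol) and np.all(np.abs(g[S]) <= tol)
              and np.all(g[E] <= tol) and abs(y @ alpha) <= tol)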
Incremental SVM: Initialization

Which weight α_c should be assigned to x_c at the beginning?

Let's try α_c = 0. What happens?

  The constraints for all g_i, i ≠ c, remain intact.
  The equality constraint remains intact.
  A new constraint is added for g_c:

    g_c = Σ_j H_cj α_j + y_c µ − 1 ≥ 0,  since α_c = 0

If this last constraint is satisfied for α_c = 0, then nothing needs to be done, and α* = α⁰.

Otherwise we need to increase α_c and update all other α's so as to keep all conditions except the one for g_c satisfied.
KKT Conditions in Differential Form

Consider the difference between the optimality conditions in two different SVM states:

  ∆g_i = H_ic ∆α_c + Σ_{j∈S} H_ij ∆α_j + y_i ∆µ,  ∀i ∈ D ∪ {c}

  0 = y_c ∆α_c + Σ_{j∈S} y_j ∆α_j

For i ∈ S, g_i = 0; hence, these conditions can be written in matrix form:

  Q [∆µ, ∆α_s1, ..., ∆α_sk]⊤ = −η_c ∆α_c

where

  Q = ⎡ 0     y_s1    ···  y_sk   ⎤        η_c = ⎡ y_c   ⎤
      ⎢ y_s1  H_s1s1  ···  H_s1sk ⎥              ⎢ H_s1c ⎥
      ⎢ ⋮     ⋮       ⋱    ⋮      ⎥              ⎢ ⋮     ⎥
      ⎣ y_sk  H_sks1  ···  H_sksk ⎦              ⎣ H_skc ⎦
Sensitivity Conditions

The conditions for g_S imply that

  ∆α_S = −Q⁻¹ η_c ∆α_c = β ∆α_c

⇒ the vector β measures the sensitivity of α with respect to ∆α_c.

Substituting the sensitivity of α_S back into the conditions for g, we obtain:

  ∆g_i = γ_i ∆α_c,  ∀i ∈ D ∪ {c},

where

  γ_i = H_ic + Σ_{j∈S} H_ij β_j + y_i β₀

⇒ the vector γ measures the sensitivity of g_i, for i ∈ O ∪ E, with respect to ∆α_c.
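In code, β and γ can be obtained directly from Q⁻¹ and the kernel matrix. A minimal sketch (the function name, the argument layout, and the convention that γ is a full-length vector with zeros on S are our own assumptions):

  import numpy as np

  def sensitivities(Q_inv, H, y, S, rest, c):
      """Sensitivities of (mu, alpha_S) and of g on O, E and c w.r.t. Delta alpha_c."""
      eta_c = np.concatenate(([y[c]], H[S, c]))   # eta_c = [y_c; H_{S,c}]
      beta = -Q_inv @ eta_c                       # beta[0] tracks mu, beta[1:] alpha_S
      gamma = np.zeros(len(y))                    # gamma_i = 0 for i in S by construction
      idx = np.append(rest, c)                    # the points in O, E, and the new point c
      gamma[idx] = H[idx, c] + H[np.ix_(idx, S)] @ beta[1:] + y[idx] * beta[0]
      return beta, gamma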
“Bookkeeping”

Knowing the increment ∆α_c, we can directly forecast:

  the change ∆α_S of the weights for points in S,
  the change ∆g_i of the gradient for points in O ∪ E.

☹ Unfortunately, these forecasts hold only as long as no structural changes occur between S, O and E.

Potential structural changes:

  for i ∈ S, α_i → C: i joins E.
  for i ∈ S, α_i → 0: i joins O.
  for i ∈ O, g_i > 0 → 0: i joins S.
  for i ∈ E, g_i < 0 → 0: i joins S.

Key idea: we can predict the smallest ∆α_c at which, for some i, the first structural change occurs.
Predicting Structural Changes

1. i ∈ S, β_i > 0: α_i grows up to C.
     find ∆α_c¹ = min_i (C − α_i) / β_i
2. i ∈ S, β_i < 0: α_i falls down to 0.
     find ∆α_c² = min_i (−α_i) / β_i
3. i ∈ E, γ_i > 0, or i ∈ O, γ_i < 0: g_i grows up / falls down to 0.
     find ∆α_c³ = min_i (−g_i) / γ_i
4. g_c becomes 0 (termination condition 1):
     compute ∆α_c⁴ = −g_c / γ_c
5. α_c reaches C (termination condition 2):
     compute ∆α_c⁵ = C − α_c

The smallest increment ∆α_c among the 5 possible cases yields the first structural change.
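Continuing the sketch from the sensitivity slide, the five candidate increments can be computed as follows (the tolerances and the extra return value identifying the triggering example are our own choices):

  import numpy as np

  def smallest_increment(alpha, g, beta, gamma, C, S, O, E, c, eps=1e-12):
      """Candidate Delta alpha_c per case; returns the smallest one, its case,
      and the index of the example that triggers the structural change."""
      cand = np.full(5, np.inf)
      who = np.full(5, c)                          # cases 4 and 5 concern c itself
      b = beta[1:]                                 # sensitivities of alpha_S
      up, dn = b > eps, b < -eps
      if up.any():                                 # case 1: some alpha_i in S reaches C
          r = (C - alpha[S][up]) / b[up]
          cand[0], who[0] = r.min(), S[up][r.argmin()]
      if dn.any():                                 # case 2: some alpha_i in S reaches 0
          r = -alpha[S][dn] / b[dn]
          cand[1], who[1] = r.min(), S[dn][r.argmin()]
      rest = np.concatenate((E[gamma[E] > eps], O[gamma[O] < -eps])).astype(int)
      if rest.size:                                # case 3: some g_i in E or O reaches 0
          r = -g[rest] / gamma[rest]
          cand[2], who[2] = r.min(), rest[r.argmin()]
      if gamma[c] > eps:                           # case 4: g_c reaches 0 (termination 1)
          cand[3] = -g[c] / gamma[c]
      cand[4] = C - alpha[c]                       # case 5: alpha_c reaches C (termination 2)
      case = int(cand.argmin())
      return cand[case], case, int(who[case])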
Structural Change: Post-processing

The main operation to be performed after a structural change is the update of the inverse matrix Q⁻¹.

Example: k joins S:
  ⇒ add a row and a column with zero entries to Q⁻¹
  ⇒ perform a recursive update (next slide)

Example: k leaves S:
  ⇒ re-arrange Q⁻¹ to put row and column k “at the end”
  ⇒ update the values (follows)
Recursive Expansion of Q⁻¹

Consider the specific structure of the expanded inverse Q̃⁻¹:

  Q̃⁻¹ = ⎡ 0    y_S⊤  y_k  ⎤ ⁻¹     ⎡ Q     η_k  ⎤ ⁻¹
        ⎢ y_S  H_SS  H_Sk ⎥     =  ⎣ η_k⊤  H_kk ⎦
        ⎣ y_k  H_kS  H_kk ⎦

where η_k⊤ = [y_k  H_kS].

We will make use of the Sherman-Morrison-Woodbury formula:

  ⎡ A  B ⎤ ⁻¹    ⎡ A⁻¹ + A⁻¹BZCA⁻¹   −A⁻¹BZ ⎤
  ⎣ C  D ⎦    =  ⎣ −ZCA⁻¹             Z      ⎦

where Z = (D − CA⁻¹B)⁻¹ = (H_kk − η_k⊤ Q⁻¹ η_k)⁻¹ (a scalar!)
Recursive Expansion of Q⁻¹ (ctd.)

Putting everything together and using the fact that β_k = −Q⁻¹η_k (see the sensitivity conditions), we obtain:

  ⎡ 0    y_S⊤  y_k  ⎤ ⁻¹    ⎡ Q⁻¹ + Z(Q⁻¹η_k)(Q⁻¹η_k)⊤   −Z(Q⁻¹η_k) ⎤
  ⎢ y_S  H_SS  H_Sk ⎥    =  ⎣ −Z(Q⁻¹η_k)⊤                 Z          ⎦
  ⎣ y_k  H_kS  H_kk ⎦
                            ⎡ Q⁻¹  0 ⎤
                         =  ⎣ 0    0 ⎦  +  Z ⎡ β_k ⎤ [β_k⊤  1]
                                             ⎣ 1   ⎦

Hence, the expansion of Q⁻¹ amounts to:

  extending Q⁻¹ with a zero row and column (O(s))
  computing Z (O(s²))
  computing and adding the outer product of [β_k⊤ 1] (O(s²))
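As a sketch (assuming the block layout above, with the new row and column appended last; all names are our own), the expansion is a single rank-one update:

  import numpy as np

  def expand_Q_inv(Q_inv, eta_k, H_kk):
      """Grow Q^{-1} by one row/column when point k joins S."""
      beta_k = -Q_inv @ eta_k                      # beta_k = -Q^{-1} eta_k
      Z = 1.0 / (H_kk + eta_k @ beta_k)            # Z = (H_kk - eta_k' Q^{-1} eta_k)^{-1}
      u = np.append(beta_k, 1.0)                   # [beta_k; 1]
      grown = np.zeros((len(u), len(u)))
      grown[:-1, :-1] = Q_inv                      # extend with a zero row and column, O(s)
      return grown + Z * np.outer(u, u)            # add the scaled outer product, O(s^2)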
Recursive Contraction of Q⁻¹

Before the contraction of Q⁻¹, we have:

  Q̃⁻¹ = ⎡ q₁₁  q₁₂ ⎤     ⎡ Q⁻¹ + Z β_k β_k⊤   Z β_k ⎤
        ⎣ q₂₁  q₂₂ ⎦  =  ⎣ Z β_k⊤              Z     ⎦

Writing this relation component-wise, we obtain:

  Q⁻¹ = q₁₁ − Z β_k β_k⊤
  q₂₁⊤ = q₁₂ = Z β_k
  q₂₂ = Z

Combining the three identities above yields:

  Q⁻¹ = q₁₁ − q₁₂ q₂₁ / q₂₂
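The corresponding contraction, again as a sketch (row/column 0 corresponds to µ and is never removed; pos is the row of the departing point, assumed to have been rotated to wherever it sits):

  import numpy as np

  def contract_Q_inv(Qt_inv, pos):
      """Shrink the expanded inverse by removing row/column pos."""
      keep = np.delete(np.arange(Qt_inv.shape[0]), pos)
      q11 = Qt_inv[np.ix_(keep, keep)]
      q12 = Qt_inv[keep, pos]
      q21 = Qt_inv[pos, keep]
      q22 = Qt_inv[pos, pos]
      return q11 - np.outer(q12, q21) / q22        # Q^{-1} = q11 - q12 q21 / q22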
Incremental SVM: High-Level Algorithm

 1: Read example x_c, compute g_c.
 2: while g_c < 0 and α_c < C do
 3:   Compute β and γ.
 4:   Compute ∆α_c¹, ..., ∆α_c⁵.
 5:   ∆α_c^max ← min{∆α_c¹, ..., ∆α_c⁵}
 6:   α_c ← α_c + ∆α_c^max
 7:   α_S ← α_S + β ∆α_c^max
 8:   g_{c,e,o} ← g_{c,e,o} + γ ∆α_c^max
 9:   Let k be the index of the example yielding the minimum in step 5.
10:   if k ∈ S then
11:     Move k from S to either E or O.
12:   else if k ∈ E ∪ O then
13:     Move k from either E or O to S.
14:   else
15:     {k = c: do nothing, the algorithm terminates.}
16:   end if
17:   Update Q⁻¹ recursively.
18: end while
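For illustration, the loop can be wired together from the earlier sketches roughly as follows (a simplified sketch: tolerances, a degenerate Q, and the final insertion of x_c into the bookkeeping after termination are glossed over):

  import numpy as np

  def increment(H, y, alpha, mu, Q_inv, C, S, O, E, c, tol=1e-12):
      """Add example c to a trained SVM state; a rough sketch of the loop above."""
      g = H @ alpha + y * mu - 1.0
      while g[c] < -tol and alpha[c] < C - tol:     # termination conditions 1 and 2
          beta, gamma = sensitivities(Q_inv, H, y, S, np.concatenate((O, E)), c)
          d, case, k = smallest_increment(alpha, g, beta, gamma, C, S, O, E, c)
          alpha[c] += d                             # step 6
          alpha[S] += beta[1:] * d                  # step 7
          mu += beta[0] * d
          g += gamma * d                            # step 8 (gamma is zero on S)
          if case in (0, 1):                        # slide cases 1-2: k leaves S
              pos = 1 + int(np.nonzero(S == k)[0][0])   # k's row in Q_inv (row 0 is mu)
              Q_inv = contract_Q_inv(Q_inv, pos)
              S = S[S != k]
              E, O = (np.append(E, k), O) if case == 0 else (E, np.append(O, k))
          elif case == 2:                           # slide case 3: k joins S from E or O
              Q_inv = expand_Q_inv(Q_inv, np.concatenate(([y[k]], H[S, k])), H[k, k])
              S = np.append(S, k)                   # the new row/column sits last
              E, O = E[E != k], O[O != k]
      return alpha, mu, Q_inv, S, O, E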
Incremental SVM: Summary

The only general method to exactly update an existing SVM solution
Can be reversed for incremental “unlearning”
Can be extended to regression and anomaly detection

Moderate performance:
  O(s²) time and space (s being the size of the set S)
  suitable for working sets of up to 50,000 examples

Further usage:
  computation of the leave-one-out error
  poisoning attacks against SVMs
Online Learning

An online learning algorithm looks at every example exactly once.

☺ Easy updates of existing solutions
☺ Low storage requirements
☺ (Sometimes) faster learning on batch datasets
☹ Inexact: may converge to a different solution
☹ Potentially slow convergence rate
LASVM: a Simple Online Extension of SMO

Function PROCESS(k):
  1: α_k ← 0, compute g_k, add k to S.
  2: j ← argmin_{s∈S} y_s g_s
  3: Bail out if (j, k) is not a τ-violating pair.
  4: Perform an SMO update step on (j, k).

Function REPROCESS:
  1: i ← argmax_{s∈S} g_s with α_s < C
  2: j ← argmin_{s∈S} g_s with α_s > 0
  3: Bail out if (i, j) is not a τ-violating pair.
  4: Perform an SMO update step on (i, j).
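For concreteness, here is a minimal sketch of the τ-violation test and the SMO pair update in the notation of the LASVM paper [2], where α_s is box-constrained to [A_s, B_s] with A_s = min(0, C·y_s), B_s = max(0, C·y_s), and g_s = y_s − Σ_t α_t K(x_s, x_t) (the function names are our own):

  import numpy as np

  def tau_violating(i, j, alpha, g, A, B, tau):
      """(i, j) is tau-violating if alpha_i can still grow, alpha_j can still
      shrink, and their gradient gap exceeds tau."""
      return alpha[i] < B[i] and alpha[j] > A[j] and g[i] - g[j] > tau

  def smo_step(i, j, alpha, g, K, A, B, S):
      """Maximal-gain update along the feasible direction of the pair (i, j)."""
      lam = min((g[i] - g[j]) / (K[i, i] + K[j, j] - 2 * K[i, j]),
                B[i] - alpha[i], alpha[j] - A[j])   # clip the step to the box
      alpha[i] += lam
      alpha[j] -= lam
      g[S] -= lam * (K[i, S] - K[j, S])             # refresh gradients of the S set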
LASVM: Main Algorithm and Analysis

Initialization:
  Seed S with a few examples of each class and compute the initial gradient g.

Online iterations (repeat a predefined number of times):
  Pick an example k_t
  Run PROCESS(k_t)
  Run REPROCESS once

Finishing:
  Repeat REPROCESS until δ ≤ τ

☺ The online part brings us close enough to the optimal solution at low, linear cost.
☹ Still no hard guarantees on the solution quality: the final stage has an uncontrollable number of iterations.
Stochastic Gradient Descent (SGD): The Best Pill for an Easy Case

Consider a linear SVM in the d-dimensional case. A large number of simple online algorithms implement the following strategy:

  Pick a random example x_t
  Update the weights as:

    w_{t+1} ← w_t − γ_t B_t ∇_w Q(x_t, w_t)

  where γ_t is a learning rate and B_t is a scaling matrix.

Under mild regularity conditions and with appropriate learning rates (decreasing gains satisfying Σ_t γ_t² < ∞), stochastic gradient descent converges to an optimal solution.
SGD Variants and their Convergence Rates

First order, B = λ⁻¹I:

  w_{t+1} ← w_t − (1 / (λ(t + t₀))) g_t(w_t)

  Iterations to reach accuracy ρ: νκ²/ρ + O(1/ρ); iteration cost: O(d)

Second order, B = H⁻¹:

  w_{t+1} ← w_t − (1 / (t + t₀)) H⁻¹ g_t(w_t)

  Iterations to reach accuracy ρ: ν/ρ + O(1/ρ); iteration cost: O(d²)
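As an illustration of the first-order variant, here is a minimal sketch for a linear SVM with hinge loss and regularizer (λ/2)‖w‖², so that g_t(w) = λw − y_t x_t whenever y_t w⊤x_t < 1 (the loss choice and all parameter names are our own assumptions):

  import numpy as np

  def sgd_linear_svm(X, y, lam=1e-4, t0=10, passes=1, seed=0):
      """First-order SGD with the 1/(lambda (t + t0)) gain schedule above."""
      rng = np.random.default_rng(seed)
      n, d = X.shape
      w = np.zeros(d)
      t = 0
      for _ in range(passes):                    # passes=1 is the pure online setting
          for i in rng.permutation(n):
              t += 1
              grad = lam * w                     # gradient of the regularizer
              if y[i] * (w @ X[i]) < 1:          # hinge loss is active at x_i
                  grad -= y[i] * X[i]
              w -= grad / (lam * (t + t0))       # w_{t+1} <- w_t - g_t(w_t)/(lam (t+t0))
      return w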
Summary

In contrast to incremental learning, online learning attempts to achieve a fixed cost per example, potentially at the expense of learning accuracy.

Theoretical analysis shows that high optimization accuracy is not always necessary for good learning accuracy.

Efficient algorithms for online learning exist mainly for linear learning problems.

For non-linear methods, online learning currently cannot guarantee perfect convergence to the optimal solution.
Bibliography

[1] Antoine Bordes, Léon Bottou, and Patrick Gallinari. SGD-QN: Careful quasi-Newton stochastic gradient descent. Journal of Machine Learning Research, 10:1737–1754, 2009.

[2] Antoine Bordes, Seyda Ertekin, Jason Weston, and Léon Bottou. Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6:1579–1619, 2005.

[3] Gert Cauwenberghs and Tomaso Poggio. Incremental and decremental support vector machine learning. In Advances in Neural Information Processing Systems 13, pages 409–415, 2001.

[4] Pavel Laskov, Christian Gehl, Stefan Krüger, and Klaus-Robert Müller. Incremental support vector learning: Analysis, implementation and applications. Journal of Machine Learning Research, 7:1909–1936, 2006.