Online Outlier Detection for Time Series
Tingyi Zhu
September 14, 2016
Tingyi Zhu · Online Time Series Outlier Detection · September 14, 2016
Outline
Overview of Online Machine Learning
Online Time Series Outlier Detection
Demo
Overview of Online Machine Learning
Introduction
Online learning
I a method of machine learning in which data is accessed in sequential order
I trains on the data in consecutive steps
I updates the predictor at each step
Batch (offline) learning
I learn on the entire training data set
I generate the best predictor at once
Two scenarios where online algorithms are useful:
When dealing with a large dataset, it is computationally infeasible to train over the entire dataset
When new data is constantly being generated and predictions are needed instantly; cases include
I Financial markets: stock price prediction, portfolio selection
I Real-time recommendation
I Online ad placement
I Anomaly (outlier) detection
Statistical Learning Model
X : space of inputs
Y : space of outputs
To learn the function f : X → Y
Loss function: V : Y × Y → R
V(f(x), y) measures the difference between the predicted value f(x) and the true value y
Goal: Minimize the expected risk
min_f E[V(f(x), y)]
Given the training set
(x_1, y_1), . . . , (x_n, y_n)
Then minimize the empirical loss
min_f Σ_{i=1}^n V(f(x_i), y_i)
Online Model
Pure online learning model:
I At time t + 1, learning of f_{t+1} depends only on the new input (x_{t+1}, y_{t+1}) and the previous best predictor f_t
I Needs extra stored information, usually of size independent of the training data size
Hybrid online learning model
I Learning of f_{t+1} depends on f_t and all previous data (x_1, y_1), . . . , (x_{t+1}, y_{t+1})
I The extra storage requirement is no longer constant; it depends on the training data size
I Takes less time to compute than batch learning
Example: Linear Least Square
x_j ∈ R^d, y_j ∈ R
Learning a linear function: f(x) = 〈w, x〉
Parameters to learn: w ∈ R^d
Square loss: V(f(x), y) = (f(x) − y)^2
Minimize the empirical loss:
Q(w) = Σ_{j=1}^n Q_j(w) = Σ_{j=1}^n V(〈w, x_j〉, y_j) = Σ_{j=1}^n (x_j^T w − y_j)^2
Batch Learning
Let X be the i × d data matrix
Y be the i × 1 vector of target values after the arrival of the first i datapoints
The covariance matrix: Σ_i = X^T X
The best solution of w is
w = (X^T X)^{-1} X^T Y = Σ_i^{-1} Σ_{j=1}^i x_j y_j
Recompute the solution after the arrival of every datapoint
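As a concrete illustration, this naive recompute-from-scratch scheme can be sketched in Python with NumPy (the function name and data layout are invented for this sketch):

```python
import numpy as np

def batch_ls_stream(points):
    """Recompute the least-squares solution from scratch after each arrival.

    `points` yields (x, y) pairs with x of shape (d,). After every
    datapoint the full normal equations are re-solved on all data
    seen so far.
    """
    xs, ys = [], []
    for x, y in points:
        xs.append(x)
        ys.append(y)
        X = np.vstack(xs)                # i x d data matrix
        Y = np.array(ys)                 # i target values
        # w = (X^T X)^{-1} X^T Y; lstsq also handles the singular early steps
        w, *_ = np.linalg.lstsq(X, Y, rcond=None)
        yield w
```

Each step pays for a full refit, which is exactly the cost the next slide counts.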
Complexity of Batch Learning
After the arrival of the ith datapoint,
Calculating Σ_i takes time O(id^2)
Computing the inverse Σ_i^{-1} of the d × d matrix takes time O(d^3)
Computation time at the ith step: O(id^2 + d^3)
Total time with n data points in the dataset: Σ_i O(id^2 + d^3) = O(n^2 d^2 + nd^3)
Improving on Batch Learning
Keep the matrix Σ_i stored between steps
Update Σ at each step:
Σ_{i+1} = Σ_i + x_{i+1} x_{i+1}^T
which only takes O(d^2) time
Total time reduces to Σ_i O(d^2 + d^3) = O(nd^3)
Additional storage space of O(d^2) is needed to store Σ_i
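A minimal sketch of this incremental scheme (illustrative names; the pseudo-inverse guards the early steps where Σ is still singular):

```python
import numpy as np

def incremental_ls(points, d):
    """Maintain Sigma = X^T X and b = sum_j x_j y_j across arrivals.

    The rank-one update Sigma += x x^T costs O(d^2); only the solve
    still costs O(d^3), giving O(nd^3) total over n arrivals.
    """
    Sigma = np.zeros((d, d))
    b = np.zeros(d)
    for x, y in points:
        Sigma += np.outer(x, x)          # Sigma_{i+1} = Sigma_i + x x^T
        b += x * y
        w = np.linalg.pinv(Sigma) @ b    # pinv handles singular early steps
        yield w
```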
Online Learning: Recursive Least Square
Initialization: w_0 = 0 ∈ R^d, Γ_0 = I ∈ R^{d×d}
The LS solution can be computed by the following iteration:
Γ_i = Γ_{i−1} − (Γ_{i−1} x_i x_i^T Γ_{i−1}) / (1 + x_i^T Γ_{i−1} x_i),
w_i = w_{i−1} − Γ_i x_i (x_i^T w_{i−1} − y_i)
The w computed above is the same as the batch-learning solution, which can be proved by induction on i
Complexity: O(nd^2) for n steps
Requires storage of the matrix Γ, which is constant at O(d^2)
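The recursion can be sketched as follows (illustrative code; note that with Γ_0 = I the iterates exactly solve the ridge-regularized problem with λ = 1, matching the regularized variant on the next slide):

```python
import numpy as np

def recursive_ls(points, d):
    """Recursive least squares: O(d^2) per step, no explicit inversion.

    Gamma tracks (I + X^T X)^{-1} via the Sherman-Morrison identity,
    so each arrival needs only matrix-vector products.
    """
    w = np.zeros(d)
    Gamma = np.eye(d)
    for x, y in points:
        Gx = Gamma @ x
        Gamma = Gamma - np.outer(Gx, Gx) / (1.0 + x @ Gx)
        w = w - Gamma @ x * (x @ w - y)
        yield w
```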
For the case when Σ_i is not invertible, consider the regularized version of the loss function
Q(w) = Σ_{j=1}^n (x_j^T w − y_j)^2 + λ||w||_2^2.
The same recursive algorithm works with a minor change to the initialization:
Γ_0 = (λI)^{-1}
Stochastic Gradient Descent
Recall standard gradient descent (also called batch gradient descent), which deals with the problem of minimizing an objective function Q(w)
The solution w is found through the iteration
w := w − η∇Q(w)
When the objective has the form of a sum Q(w) = Σ_{j=1}^n Q_j(w),
standard gradient descent performs the iteration:
w := w − η Σ_{j=1}^n ∇Q_j(w)
Each summand function Q_j is typically associated with the j-th observation in the data set
When the sample size n is large, standard GD requires expensive evaluations of the gradients of all summand functions, which is redundant because it computes gradients of very similar functions at each update
SGD: update the solution by computing the gradient of a single Q_j
w := w − η∇Qj(w)
SGD for Online Learning
Linear least squares:
min_w Q(w) = Σ_{j=1}^n Q_j(w) = Σ_{j=1}^n V(〈w, x_j〉, y_j) = Σ_{j=1}^n (x_j^T w − y_j)^2
After the arrival of the ith datapoint (xi , yi ), update the solution by
w_i = w_{i−1} − γ_i ∇V(〈w_{i−1}, x_i〉, y_i) = w_{i−1} − γ_i x_i (x_i^T w_{i−1} − y_i)
which is similar to the updating procedure in recursive least squares:
w_i = w_{i−1} − Γ_i x_i (x_i^T w_{i−1} − y_i)
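This SGD update can be sketched as follows (illustrative; the constant step size, and folding the factor 2 of the squared-loss gradient into the step, are assumptions of this sketch):

```python
import numpy as np

def sgd_ls(points, d, step=0.05):
    """Plain SGD for linear least squares: O(d) time and storage per step.

    The matrix Gamma of recursive least squares is replaced by a scalar
    step size, trading exactness for speed.
    """
    w = np.zeros(d)
    for x, y in points:
        w = w - step * x * (x @ w - y)   # gradient of (x^T w - y)^2 / 2
        yield w
```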
The complexity of the SGD algorithm for n steps reduces to O(nd)
The storage requirement is constant at O(d)
Consistency of SGD:
When the learning rate γ decreases at an appropriate rate, SGD shows the same convergence behavior as batch gradient descent
Averaging SGD: choose γ_i = 1/√i and keep track of
w̄_n = (1/n) Σ_{i=1}^n w_i
which is consistent.
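A sketch of averaging SGD (the damping constant c is an assumption of this sketch, added to keep the first steps stable; the slides use γ_i = 1/√i directly):

```python
import numpy as np

def averaged_sgd_ls(points, d, c=0.1):
    """SGD with step gamma_i = c / sqrt(i) and a running iterate average.

    The average w_bar of w_1, ..., w_i smooths the fluctuation of the
    raw SGD path and is the estimate reported at each step.
    """
    w = np.zeros(d)
    w_bar = np.zeros(d)
    for i, (x, y) in enumerate(points, start=1):
        w = w - (c / np.sqrt(i)) * x * (x @ w - y)
        w_bar += (w - w_bar) / i         # running mean of the iterates
        yield w_bar
```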
Figure: SGD fluctuation (Source: Wikipedia)
SGD performs frequent updates and is faster
But its high variance causes heavy fluctuation
This complicates convergence to the exact minimum
Mini-Batch Gradient Descent
Use b datapoints in each iteration
Generalization and compromise between batch GD and SGD
Given b datapoints (x_{i+1}, y_{i+1}), . . . , (x_{i+b}, y_{i+b}), update
w := w − γ · (1/b) Σ_{k=i+1}^{i+b} x_k (x_k^T w − y_k)
b typically ranges from 2 to 100
For online learning, b is much smaller than n
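The mini-batch update in the online setting can be sketched by buffering b arrivals per update (names and the step size are illustrative):

```python
import numpy as np

def minibatch_gd_ls(points, d, b=10, step=0.1):
    """Mini-batch gradient descent for linear least squares.

    Buffers b arrivals, then applies one averaged-gradient update
    w := w - step * (1/b) * sum_k x_k (x_k^T w - y_k).
    """
    w = np.zeros(d)
    batch = []
    for x, y in points:
        batch.append((x, y))
        if len(batch) == b:
            X = np.vstack([p[0] for p in batch])
            Y = np.array([p[1] for p in batch])
            w = w - step * (X.T @ (X @ w - Y)) / b
            batch = []
            yield w
```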
Other topics of Online Learning
Incremental (Online) PCA
Incremental (Online) SVD
Adversarial Model
etc.
Online Time Series Outlier Detection
The Problem
Given: a time series x_1, . . . , x_t, with new data points continually arriving over time
Goal: upon arrival of each new data point, determine whether it is an outlier
Applications:
I Financial market: abrupt change of stock price
I Health data: blood pressure, heart beats
I Intrusion detection: sudden increase of login attempts
The Framework
Initial series x_1, . . . , x_n serves as the training sample
Use a prediction algorithm, one that can be adapted to the online setting, to predict x̂_{n+1}
Set criteria for identifying outliers or outlier events (subsequences), based on the prediction x̂_{n+1} and the newly arrived data point x_{n+1}
Update the training series to x_1, . . . , x_n, x_{n+1}, and predict x̂_{n+2}
Challenge:
The size of the training sample grows over time, leading to longer training times and failure to detect outliers on time
Solutions:
Use a sliding window: drop the earliest data from the training sample while adding the new data, keeping the training sample size constant
Use online learning techniques to update the model, avoiding retraining on the whole sample.
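The sliding-window solution can be sketched end to end; here a simple least-squares autoregressive predictor stands in for the prediction algorithm (the window size, AR order, and tolerance are illustrative choices for this sketch):

```python
import numpy as np

def sliding_window_detector(series, window=50, order=3, eps=1.0):
    """Flag surprises by one-step-ahead prediction on a sliding window.

    At each time t the predictor is refit on the last `window` points
    only, so training cost stays constant as the series grows.
    """
    flags = []
    for t in range(window, len(series)):
        hist = series[t - window:t]
        # lag matrix: row k holds (hist[k], ..., hist[k + order - 1])
        X = np.array([hist[k:k + order] for k in range(window - order)])
        Y = hist[order:]
        w, *_ = np.linalg.lstsq(X, Y, rcond=None)   # refit on the window
        pred = series[t - order:t] @ w              # one-step-ahead forecast
        flags.append(abs(series[t] - pred) > eps)   # surprise criterion
    return flags
```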
Classical Online Time Series Prediction Model
Training of A_{O,B}: a machine learning problem
Linear model: methods from the previous section
Nonlinear model: support vector regression, online SVR [Ma and Perkins, 2003]
Support Vector Regression
Training set: {(x_i, y_i), i = 1, . . . , l}, where x_i ∈ R^B, y_i ∈ R
Construct the regression function
f(x) = W^T Φ(x) + b
where Φ(x) maps the input x to a vector in feature space
W and b are obtained by solving the standard ε-insensitive primal problem:
min_{W, b, ξ, ξ*} (1/2)||W||^2 + C Σ_{i=1}^l (ξ_i + ξ_i^*)
s.t. y_i − W^T Φ(x_i) − b ≤ ε + ξ_i, W^T Φ(x_i) + b − y_i ≤ ε + ξ_i^*, ξ_i, ξ_i^* ≥ 0
The width of the band (between the dashed lines) is 2ε/||W||, so minimize ||W||^2
Also penalize data points whose y-value differs from f(x) by more than ε
The dual optimization problem:
max_{α, α*} −(1/2) Σ_{i,j=1}^l (α_i − α_i^*)(α_j − α_j^*) Q_ij − ε Σ_{i=1}^l (α_i + α_i^*) + Σ_{i=1}^l y_i (α_i − α_i^*)
s.t. Σ_{i=1}^l (α_i − α_i^*) = 0, 0 ≤ α_i, α_i^* ≤ C
where Q_ij = Φ(x_i)^T Φ(x_j) = K(x_i, x_j).
K(·, ·) is a predetermined nonlinear kernel function, and the solution of SVR can be written as
f(x) = Σ_{i=1}^l (α_i − α_i^*) K(x_i, x) + b
Online SVR: [Ma, Theiler and Perkins, 2003]
The trained SVR function
f(x) = Σ_{i=1}^l (α_i − α_i^*) K(x_i, x) + b
Update the (α_i − α_i^*) and b whenever a new data point is added
Faster than the batch SVR algorithm
Outlier Detection Criteria
An example [Ma and Perkins, 2003]:
Occurrence at t: O(t) = 1{x_t − x̂_t ∉ (−ε, ε)}
I 1{·}: indicator function
I x̂_t: prediction of x_t
I ε: tolerance width
A surprise is observed at t if O(t) = 1
Event at time t: E_n(t) = [O(t), O(t + 1), · · · , O(t + n − 1)]^T
I n: event duration
If |E_n(t)| > h, then E_n(t) is defined as a novel event (outlier)
I | · |: 1-norm
I h: a fixed lower bound
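These criteria are easy to state in code; a minimal sketch (function names are invented here):

```python
import numpy as np

def occurrences(actual, predicted, eps):
    """O(t) = 1 when the prediction error at t falls outside (-eps, eps)."""
    err = np.asarray(actual) - np.asarray(predicted)
    return (np.abs(err) >= eps).astype(int)

def novel_events(O, n, h):
    """Times t where E_n(t) = [O(t), ..., O(t+n-1)] has 1-norm above h."""
    O = np.asarray(O)
    return [t for t in range(len(O) - n + 1) if O[t:t + n].sum() > h]
```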