Fast Global Alignment Kernels
05.05.2015
Rodrigo Frassetto Nogueira, Thanos Papadopoulos
Time Series
Agenda
• Motivation
• Dynamic Time Warping (DTW)
• Global Alignment (GA) Kernels
• Experiments
• Conclusion
Example #1
• Stock price forecasting: Apple Inc. (9.2007 to 8.2012)
Example #2
• Caltrans Performance Measurement System (PeMS)
• occupancy rate on San Francisco Bay Area freeways, in [0, 1]
• 963 sensors (different car lanes)
• one measurement every 10 minutes
• one time series per day: dimension 963, length 6 · 24 = 144
Example #3
• Speech signals
Problem Formulation
• Goal: find adequate kernels for time series that
• handle variable lengths
• are positive definite
• have low computational cost
• Where do we start? With a similarity measure written as an inner product of feature maps: $K(x, y) = \langle \Phi(x), \Phi(y) \rangle$
Dynamic Time Warping
• element-by-element (pairwise) comparison fails when sequences have different lengths
• alignment: associate each element of sequence X with one or more elements of sequence Y, and vice versa
DTW Cost Matrix
Optimal Alignment
DTW definition
• warping functions: $\pi_1(i), \pi_2(j)$
• moves: →, ↓, ↘
• cost per alignment π: $D_{x,y}(\pi) = \sum_{i=1}^{|\pi|} \varphi(x_{\pi_1(i)}, y_{\pi_2(i)})$
• optimal alignment: $\mathrm{DTW}(x, y) = \min_{\pi \in \mathcal{A}(n,m)} D_{x,y}(\pi)$
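The DTW recursion can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code; the squared-Euclidean divergence used as the default `phi` is an assumed choice, since the slides leave φ generic:

```python
import numpy as np

def dtw(x, y, phi=lambda a, b: np.sum((a - b) ** 2)):
    """Sketch: DTW(x, y) = min over alignments of the summed local
    divergence phi, via the standard O(n*m) dynamic program."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # three allowed moves into cell (i, j): diagonal, down, right
            D[i, j] = phi(x[i - 1], y[j - 1]) + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]
```

Note how the variable-length problem disappears: x and y may have different lengths, and the alignment absorbs the difference.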
Why not DTW?
• the resulting similarity is not positive definite symmetric (PDS)
• high computational cost: O(dnm) for d-dimensional sequences of lengths n and m
Agenda
• Motivation
• Dynamic Time Warping (DTW)
• Global Alignment (GA) Kernels: Definition, Diagonal Dominance, Positive Definiteness, Computational Cost
• Experiments
• Conclusion
GA kernels
• soft-minimum motivation: replace the hard minimum over alignments by the soft version $-\log \sum_i e^{-x_i}$, which keeps a contribution from every alignment
• by defining $\kappa = e^{-\varphi}$:
$k_{GA}(x, y) = \sum_{\pi \in \mathcal{A}(n,m)} e^{-D_{x,y}(\pi)} = \sum_{\pi \in \mathcal{A}(n,m)} e^{-\sum_{i=1}^{|\pi|} \varphi(x_{\pi_1(i)}, y_{\pi_2(i)})} = \sum_{\pi \in \mathcal{A}(n,m)} \prod_{i=1}^{|\pi|} \kappa(x_{\pi_1(i)}, y_{\pi_2(i)})$
• the kernel reflects the whole spectrum of costs/alignments, not just the best one
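The sum over all alignments can be written directly as a recursion over the last aligned pair. The sketch below is an exponential-time enumeration for intuition only (the fast dynamic program is the subject of the computational-cost slides); `kappa` is whatever local kernel you plug in:

```python
def k_ga_naive(x, y, kappa):
    """Brute-force sketch: sum, over ALL alignments in A(n, m), the
    product of local kernel values along the alignment. Exponential in
    n + m -- use only on tiny inputs."""
    n, m = len(x), len(y)

    def rec(i, j):
        # total weight of alignments whose last aligned pair is (i, j)
        v = kappa(x[i], y[j])
        if i == 0 and j == 0:
            return v
        total = 0.0
        if i > 0 and j > 0:
            total += rec(i - 1, j - 1)   # diagonal move
        if i > 0:
            total += rec(i - 1, j)       # vertical move
        if j > 0:
            total += rec(i, j - 1)       # horizontal move
        return v * total

    return rec(n - 1, m - 1)
```

A sanity check: with a constant local kernel κ ≡ 1, every alignment contributes 1, so the result is simply the number of alignments (the Delannoy number discussed under diagonal dominance).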
Diagonal Dominance
• off-diagonal entries of the Gram matrix are far smaller than the diagonal ones (the trace dominates)
• Solution: introduce a bandwidth λ in the local kernel:
$\kappa = e^{-\lambda \varphi(x_{\pi_1(i)}, y_{\pi_2(i)})}, \quad \lambda > 0$
Behavior in the limits of λ
• Let $\{X_1, X_2, \ldots, X_p\}$ be a sample of time series, with
$k_{GA}(x, y) = \sum_{\pi \in \mathcal{A}(n,m)} \prod_{i=1}^{|\pi|} e^{-\lambda \cdot \varphi(x_{\pi_1(i)}, y_{\pi_2(i)})}$
1. All samples have the same length n:
When $\lambda \to \infty$, $k_{GA} \to I_p$
When $\lambda \to 0$, $k_{GA} \to D(n, n) \cdot 1_{p,p}$
where D(n, m) are the Delannoy numbers, counting the alignments in $\mathcal{A}(n,m)$:
$D(n, m) = D(n, m-1) + D(n-1, m) + D(n-1, m-1)$
Behavior in the limits of λ
2. Samples have different lengths:
When $\lambda \to \infty$, $k_{GA} \to I_p$
When $\lambda \to 0$, $k_{GA}(X_i, X_j) \to D(|X_i|, |X_j|)$
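The Delannoy recursion from the slide is easy to evaluate; as λ → 0 every alignment gets weight 1, so $k_{GA}$ degenerates to these counts. A short memoized sketch (the base case D(n, 0) = D(0, m) = 1 is the usual convention, assumed here):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def delannoy(n, m):
    """Delannoy number D(n, m): the number of monotone paths with
    moves right, down, and diagonal, i.e. the number of alignments."""
    if n == 0 or m == 0:
        return 1
    return delannoy(n, m - 1) + delannoy(n - 1, m) + delannoy(n - 1, m - 1)
```

The central values D(1,1) = 3, D(2,2) = 13, D(3,3) = 63 grow fast, which is exactly why the λ → 0 Gram matrix is swamped by length-dependent counts.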
Diagonal Dominance
• the problem arises when lengths vary and λ tends to 0, e.g. n sequences with lengths 1 to n
• Empirically, for $\lambda \approx 0$ the kernel behaves well only when $\frac{1}{2} \le \frac{n}{m} \le 2$, i.e. when the two lengths differ by at most a factor of two
Positive Definiteness
• if κ is p.d., what about $k_{GA}$?
• mapping kernels: $\kappa_l$ is a local kernel on substructures of x and y, and M is a mapping set:
$k(x, y) = \sum_{(x_i, y_i) \in M(x, y)} \kappa_l(x_i, y_i)$
GA kernels as Mapping kernels
• For GA kernels:
$M_{GA}(x, y) = \{(x_{\pi_1}, y_{\pi_2}) \mid \pi = (\pi_1, \pi_2) \in \mathcal{A}(n, m)\}$
$\kappa_l(x_{\pi_1}, y_{\pi_2}) = \prod_{i=1}^{|\pi|} \kappa(x_{\pi_1(i)}, y_{\pi_2(i)})$
• Theorem 1: the mapping kernel is p.d. for every p.d. κ if and only if M is transitive:
$(x_i, y_i) \in M(x, y),\ (y_i, z_i) \in M(y, z) \Rightarrow (x_i, z_i) \in M(x, z)$
• Lemma 1: $M_{GA}$ is not transitive, so from Theorem 1 alone we don't know whether $k_{GA}$ is p.d.
Constraints on κ
• Theorem 2: If κ is p.d. and $\frac{\kappa}{1+\kappa}$ is p.d., then $k_{GA}$ is p.d.
• Geometric divisibility (g.d.):
• mapping $\tau(x) = \frac{x}{1+x} : \mathbb{R}_+ \to [0, 1)$
• inverse mapping $\tau^{-1}(x) = \frac{x}{1-x} : [0, 1) \to \mathbb{R}_+$
• f is g.d. if $\tau(f)$ is p.d.
• Lemma 2: If κ is g.d., then $k_{GA}$ is p.d.
Constraints on κ
• Infinite divisibility (i.d.): κ is i.d. iff $-\log(\kappa)$ is negative definite (n.d.)
• Lemma 3: For any i.d. kernel κ such that $0 < \kappa < 1$, $\tau^{-1}(\kappa)$ is g.d. and i.d.
• how do we construct such a local kernel?
• Example: the Gaussian kernel $\kappa_\sigma$ is i.d., thus $\tau^{-1}(\frac{1}{2}\kappa_\sigma)$ is i.d. and g.d. Also, $\varphi = -\log(\tau^{-1}(\frac{1}{2}\kappa_\sigma))$ is n.d., and we set $\kappa = e^{-\varphi} = \frac{\kappa_\sigma}{2 - \kappa_\sigma}$
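The example local kernel $\tau^{-1}(\frac{1}{2}\kappa_\sigma)$ works out to $\kappa_\sigma/(2-\kappa_\sigma)$, a geometric series $\sum_{k\ge 1}(\kappa_\sigma/2)^k$ of a p.d. kernel, hence p.d. Here is a quick numerical sanity check of that claim (not a proof) on random points, with σ = 1 as an assumed setting:

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(20, 3))   # 20 random points in R^3
sigma = 1.0

# Gaussian kernel Gram matrix: k_sigma(x, y) = exp(-||x-y||^2 / (2 sigma^2))
sq = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2 * sigma ** 2))

# local kernel tau^{-1}(k_sigma / 2) = k_sigma / (2 - k_sigma)
K_local = K / (2.0 - K)

# all eigenvalues of the transformed Gram matrix should be >= 0
eig = np.linalg.eigvalsh(K_local)
print(eig.min())
```

In exact arithmetic the minimum eigenvalue is non-negative; numerically it may sit a hair below zero at machine precision.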
Time Complexity
• DTW (and GA kernel) time complexity: O(dnm)
• What can we do?
• Ignore "bad" alignments: keep only alignments close to the diagonal, via a weight ω:
$D^{\omega}_{x,y}(\pi) = \sum_{i=1}^{|\pi|} \omega_{\pi_1(i), \pi_2(i)} \cdot \varphi(x_{\pi_1(i)}, y_{\pi_2(i)})$
$\omega_{i,j} = \begin{cases} 1, & |i - j| < T \\ \infty, & |i - j| \ge T \end{cases}$
• dynamic programming recursion for the GA kernel:
$M_{i,j} = \kappa(x_i, y_j)\,(M_{i-1,j-1} + M_{i,j-1} + M_{i-1,j})$
• cells inside the band: $(2T - 1) \cdot \min(n, m) - T(T-1)/2 \Rightarrow O(T \cdot \min(n, m))$
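The banded recursion can be sketched as follows. Cells outside the band keep M = 0, which drops their alignments; with T at least the sequence length the full GA kernel is recovered. This is an illustrative sketch, not the reference implementation:

```python
import numpy as np

def k_ga_band(x, y, kappa, T):
    """Banded GA-kernel dynamic program: fill only cells with
    |i - j| < T, roughly (2T - 1) * min(n, m) cells instead of n * m."""
    n, m = len(x), len(y)
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0                      # empty-alignment boundary
    for i in range(1, n + 1):
        # only columns inside the diagonal band |i - j| < T
        for j in range(max(1, i - T + 1), min(m, i + T - 1) + 1):
            M[i, j] = kappa(x[i - 1], y[j - 1]) * (
                M[i - 1, j - 1] + M[i, j - 1] + M[i - 1, j])
    return M[n, m]
```

With κ ≡ 1 and a wide band this reproduces the Delannoy counts (3 alignments for two length-2 series, 13 for length 3), which is a convenient correctness check.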
Triangular GA Kernels
• if κ is infinitely divisible and ω is a p.d. kernel on ℕ, then $\tau^{-1}(\omega \otimes \kappa)$ as a local kernel gives a p.d. GA kernel
• common choice for ω, the triangular kernel: $\omega(i, j) = \left( 1 - \frac{|i - j|}{T} \right)_+$
• Example: $\tau^{-1}(\omega \otimes \tfrac{1}{2}\kappa_\sigma)(i, x; j, y) = \dfrac{\omega(i, j)\,\kappa_\sigma(x, y)}{2 - \omega(i, j)\,\kappa_\sigma(x, y)}$
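The example formula translates directly into a local kernel function. The sketch below assumes scalar-valued series and a Gaussian $\kappa_\sigma$; note that ω vanishes for $|i - j| \ge T$, so the local kernel is exactly 0 there, which is what truncates distant alignments:

```python
import math

def tga_local(i, x, j, y, sigma=1.0, T=10):
    """Triangular GA local kernel tau^{-1}(omega (x) kappa_sigma / 2):
    omega(i, j) * k(x, y) / (2 - omega(i, j) * k(x, y))."""
    omega = max(0.0, 1.0 - abs(i - j) / T)              # triangular kernel on indices
    k = math.exp(-((x - y) ** 2) / (2 * sigma ** 2))    # Gaussian kernel on values
    return omega * k / (2.0 - omega * k)
```

Since ω·κ lies in [0, 1], the denominator stays in [1, 2], so the expression is numerically safe.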
Kernels to compare
• DTW: $k_{DTW} = e^{-\frac{1}{t}\,\mathrm{DTW}(x, y)}$
• DTW SC (Sakoe-Chiba band): $k_{SC} = e^{-\frac{1}{t}\,\mathrm{DTW}_{SC}(x, y)}$, with $\mathrm{DTW}_{SC}(x, y) = \min_{\pi \in \mathcal{A}(n,m)} D^{\omega}_{x,y}(\pi)$
• DTAK: $k_{DTAK}(x, y) = \max_{\pi \in \mathcal{A}(n,m)} \sum_{i=1}^{|\pi|} \kappa_\sigma(x_{\pi_1(i)}, y_{\pi_2(i)})$
• GA: local kernel $\kappa = e^{-\varphi} = \tau^{-1}(\tfrac{1}{2}\kappa_\sigma)$
• TGA: local kernel $\kappa = \tau^{-1}(\omega \otimes \tfrac{1}{2}\kappa_\sigma)$
Datasets
Classification error rates
Performance and speed
Conclusion
Fast Global Alignment Kernels:
• DTW-based
• PDS
• handle variable-length series
• account for a wide spectrum of alignments
• $O(T \cdot \min(n, m))$ time
• Recommended