Online Stochastic Tensor Decomposition for
Background Subtraction in Multispectral
Video Sequences
Andrews Sobral1, Sajid Javed2, Soon Ki Jung2, Thierry Bouwmans1, and
El-hadi Zahzah1
1Laboratoire MIA (Mathématiques, Image et Applications)
Université de La Rochelle, France
2Virtual Reality Laboratory, School of Computer Science and Engineering
Kyungpook National University, Republic of Korea
18 December, 2015
• Introduction
• Tensor Decomposition
– Methods
– Challenges
• Proposed Methodology
• Experimental Evaluations
• Conclusion
Main Contents
• What is a tensor?
– a multi-dimensional numerical array
• a generalization of conventional arrays
– Matrix
o second-order tensor (two indices)
– Vector
o first-order tensor (one index)
• Higher-order tensors (order ≥ 3) store data in a
multi-dimensional array
– Main operation
• unfolding (matricization)
– reformatting a tensor into a matrix
o frontal, vertical, and horizontal slices
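As a minimal sketch of unfolding, the NumPy snippet below matricizes a small third-order tensor along each mode and folds it back. The helper names `unfold` and `fold` and the row-major ordering of the columns are assumptions made for illustration; other column orderings are equally valid conventions.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix, mode, shape):
    """Inverse of `unfold` for a tensor with the given original shape."""
    full_shape = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape(full_shape), 0, mode)

# Toy third-order tensor, e.g. height x width x frames.
X = np.arange(24).reshape(2, 3, 4)

unfoldings = [unfold(X, m) for m in range(X.ndim)]
shapes = [U.shape for U in unfoldings]
```

Each unfolding keeps one mode as the rows and merges the remaining modes into the columns, so a 2 × 3 × 4 tensor yields matrices of size 2 × 12, 3 × 8, and 4 × 6.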
Introduction
• Can a tensor be decomposed for background subtraction?
– 2 components
• Multi-dimensional low-rank tensor (corresponds to the
background model)
• Multi-dimensional sparse tensor (corresponds to moving objects)
• Matrix-based decomposition
– a matrix captures only a single channel (i.e., grayscale)
– spatial correlation is lost
• erroneous foreground regions
• Tensor-based decomposition
– multi-dimensional data is considered (3rd- or 4th-order tensor)
– multi-aspect generalization of matrices
Tensor Decomposition
• Example: background subtraction via tensor decomposition under a convex
optimization framework
[Figure: frontal slices of the input, low-rank, sparse, and mask tensors]
• Methods
– Tucker/HOSVD
– CANDECOMP/PARAFAC (CP)
– NTF (Non-negative Tensor Factorization)
– NTD (Non-negative Tucker Decomposition)
– NCP (Non-negative CP Decomposition)
• Major Challenges
– Batch optimization
– Higher-order SVD computation
– High computational complexity
– Designed only for monochromatic (i.e., grayscale) or trichromatic (i.e.,
RGB) cameras
– Real-time processing is not feasible
• Is an online tensor decomposition method possible for background
subtraction with RGB as well as multispectral bands?
– Main contributions
• Online Stochastic framework for Tensor Decomposition (OSTD)
– computationally efficient
– lower memory cost
• OSTD for Multi-Spectral Video Sequences (MSVS)
– RGB alone is not sufficient to handle color saturation, shadows, and reflections
– Multi-spectral bands can improve foreground segmentation
Proposed Methodology
Proposed Framework
[Figure: input multi-spectral bands → 𝑁th-order tensor → OSTD (Online Stochastic Tensor Decomposition) → low-rank and sparse components]
• Consider an 𝑁th-order observation tensor
– corrupted by outliers
• Main assumption
– it can be reconstructed as the combination of
• a low-rank component
• a sparse component
– convex optimization framework
• ‖·‖_* denotes the nuclear norm of the 𝑖th-mode unfolding
• ‖·‖_1 denotes the 𝑙1 norm
• Stochastic/online optimization as proposed by [Feng et al. 2013]
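The low-rank plus sparse model can be illustrated numerically. The sketch below evaluates one plausible form of the objective: the sum of the nuclear norms of the mode unfoldings of the low-rank part, plus a weighted 𝑙1 norm of the sparse part. The function names, the weight `lam`, and the exact weighting of the two terms are assumptions for illustration, not the slide's formula.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding (row-major column ordering, an assumed convention)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def nuclear_norm(matrix):
    """Sum of the singular values."""
    return np.linalg.svd(matrix, compute_uv=False).sum()

def objective(low_rank, sparse, lam):
    """Sum of mode-wise nuclear norms plus lam times the l1 norm of the sparse part."""
    nuc = sum(nuclear_norm(unfold(low_rank, m)) for m in range(low_rank.ndim))
    return nuc + lam * np.abs(sparse).sum()

# Rank-1 "background" (outer product of three all-ones vectors) and one outlier.
a, b, c = np.ones(2), np.ones(3), np.ones(4)
low_rank = np.einsum('i,j,k->ijk', a, b, c)
sparse = np.zeros_like(low_rank)
sparse[0, 0, 0] = 5.0

value = objective(low_rank, sparse, lam=0.1)
```

For this toy input every unfolding is an all-ones matrix with 24 entries, so each mode contributes a nuclear norm of √24 and the single outlier contributes 0.1 × 5 to the objective.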
OSTD: The Model
• Main notion
– process only one frame per time instance t
• MSVS: process each band k
• The nuclear norm is reformulated
– the nuclear norm is decomposed into
• an explicit product of basis and coefficients
• the reformulation proposed by [Feng et al. 2013] is used
– p is the ambient dimension and r is the rank
• Stochastic optimization
OSTD: Online Optimization
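$\|X_{[i]}\|_* = \inf_{L_i \in \mathbb{R}^{p \times r},\; R_i \in \mathbb{R}^{n \times r}} \left\{ \tfrac{1}{2}\left( \|L_i\|_F^2 + \|R_i\|_F^2 \right) \;:\; X_{[i]} = L_i R_i^T \right\}$, where $X_{[i]}$ denotes the 𝑖th-mode unfolding of the low-rank tensor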
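The factorized form of the nuclear norm can be checked numerically. The sketch below builds a rank-r matrix, takes the balanced factors from its SVD, and compares half the sum of their squared Frobenius norms with the nuclear norm. The matrix sizes, the seed, and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, r = 8, 6, 2

# A random p x n matrix of rank r.
X = rng.standard_normal((p, r)) @ rng.standard_normal((r, n))

# Balanced factors from the truncated SVD: L = U sqrt(S), R = V sqrt(S).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
L = U[:, :r] * np.sqrt(s[:r])        # p x r basis
R = Vt[:r].T * np.sqrt(s[:r])        # n x r coefficients

nuclear = np.linalg.norm(X, 'nuc')
bound = 0.5 * (np.linalg.norm(L, 'fro') ** 2 + np.linalg.norm(R, 'fro') ** 2)

# Any other factorization of the same X gives a value at least as large.
G = rng.standard_normal((r, r)) + 3.0 * np.eye(r)
L2, R2 = L @ G, R @ np.linalg.inv(G).T
other_bound = 0.5 * (np.linalg.norm(L2, 'fro') ** 2 + np.linalg.norm(R2, 'fro') ** 2)
```

The balanced factors attain the infimum, which is why the nuclear norm can be replaced by a sum of Frobenius norms over an explicit basis and coefficient matrix.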
• Advantages
– no batch processing
– the basis is updated iteratively
– applied to each 𝑖th mode
• Major processing: 3 steps
– Low-rank approximation
• Initialize the basis L
– Bilateral Random Projections (BRP) method
o L, Y, A are all random matrices
o speeds up low-rank recovery: fast convergence
• SVD decays slowly
OSTD cont…
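$L = Y_1 \left( A_2^T Y_1 \right)^{-1} Y_2^T$, with $Y_1 = X A_1$ and $Y_2 = X^T A_2$ for the matrix $X$ being approximated and random matrices $A_1$, $A_2$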
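A minimal sketch of the bilateral random projection step, assuming the standard BRP construction in which a matrix is sketched from the right by one random matrix and from the left by another. The function name `brp`, the Gaussian test matrices, and the problem sizes are assumptions for illustration.

```python
import numpy as np

def brp(X, r, rng):
    """Rank-r approximation of X from two random projections."""
    m, n = X.shape
    A1 = rng.standard_normal((n, r))     # right random matrix
    A2 = rng.standard_normal((m, r))     # left random matrix
    Y1 = X @ A1                          # m x r right projection
    Y2 = X.T @ A2                        # n x r left projection
    # L = Y1 (A2^T Y1)^{-1} Y2^T, using a linear solve instead of an explicit inverse.
    return Y1 @ np.linalg.solve(A2.T @ Y1, Y2.T)

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 3)) @ rng.standard_normal((3, 30))   # exactly rank 3
approx = brp(X, 3, rng)
```

When the input has exact rank r, this construction reproduces it up to round-off without computing an SVD, which is what makes it attractive for initializing the basis.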
• Find the coefficients R as
• Fix R and update the basis
– use the block-coordinate descent method
– updated incrementally
• Sparse outlier estimation
– $M_t = z_t - L r_t$, thresholded element by element (index k), where $z_t$ is the observed sample at time t
• Background Model:
• Sparse Component:
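$e_t^k = \begin{cases} M_t^k - \lambda_2, & \text{if } M_t^k > \lambda_2 \\ M_t^k + \lambda_2, & \text{if } M_t^k < -\lambda_2 \\ 0, & \text{otherwise} \end{cases}$
$r_t = (L^T L + \lambda_1 I)^{-1} L^T (z_t - e_{t-1})$, where $z_t$ is the observed sample at time $t$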
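The per-sample updates can be sketched as follows, in the style of the online robust PCA scheme of [Feng et al. 2013]: a ridge-regression update of the coefficients, a soft-thresholding update of the sparse part, and one block-coordinate descent pass over the basis columns. The symbol `z` for the observed sample, the accumulators `A` and `B`, the function names, and all numeric values are assumptions for illustration.

```python
import numpy as np

def soft_threshold(v, lam):
    """Element-wise shrinkage: entries within [-lam, lam] become zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def project_sample(z, L, lam1, lam2, n_iter=50):
    """Alternate the coefficient update and the sparse update for one sample z."""
    gram = L.T @ L + lam1 * np.eye(L.shape[1])
    e = np.zeros_like(z)
    for _ in range(n_iter):
        r = np.linalg.solve(gram, L.T @ (z - e))   # r = (L^T L + lam1 I)^-1 L^T (z - e)
        e = soft_threshold(z - L @ r, lam2)        # threshold the residual M = z - L r
    return r, e

def basis_objective(L, A, B, lam1):
    """Quadratic surrogate minimized by the basis update."""
    A_tilde = A + lam1 * np.eye(A.shape[0])
    return 0.5 * np.trace(L.T @ L @ A_tilde) - np.trace(L.T @ B)

def update_basis(L, A, B, lam1):
    """One block-coordinate descent pass over the columns of L."""
    L = L.copy()
    A_tilde = A + lam1 * np.eye(A.shape[0])
    for j in range(L.shape[1]):
        L[:, j] += (B[:, j] - L @ A_tilde[:, j]) / A_tilde[j, j]
    return L

rng = np.random.default_rng(0)
L0, _ = np.linalg.qr(rng.standard_normal((100, 3)))   # orthonormal starting basis
spikes = np.array([5, 40, 77])                        # positions of synthetic outliers
z = L0 @ np.array([4.0, -3.0, 2.0])                   # low-rank part of the sample
z[spikes] += np.array([10.0, -10.0, 10.0])            # sparse corruption

r, e = project_sample(z, L0, lam1=0.01, lam2=0.5)
A = np.outer(r, r)                                    # accumulated r r^T (one sample here)
B = np.outer(z - e, r)                                # accumulated (z - e) r^T
L1 = update_basis(L0, A, B, lam1=0.01)
```

In an online setting `A` and `B` would be accumulated over all frames seen so far, so the memory cost depends on the rank and the frame size rather than on the number of frames.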
• Synthetic Evaluation
– A true low-rank tensor of size 30 × 30 × 30 is generated
from rank-3 factor matrices
• $Z_n \in \mathbb{R}^{30 \times 3}$, where $n = 1, 2, 3$
• random entries are corrupted
– The Relative Root Square Error (RRSE) is computed
• Two different cases are considered
– true data of smaller magnitude
– true data of higher magnitude
Experimental Evaluations
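$\mathrm{RRSE} = \|\hat{X} - X\|_F / \|X\|_F$, where $X$ is the true low-rank tensor and $\hat{X}$ its estimate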
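The synthetic setup can be sketched as below, assuming that RRSE is the Frobenius norm of the estimation error divided by the Frobenius norm of the true tensor. The Gaussian factor matrices, the 10% corruption rate, the outlier range, and the seed are assumptions for illustration and are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three 30 x 3 factor matrices define a rank-3 tensor of size 30 x 30 x 30.
Z = [rng.standard_normal((30, 3)) for _ in range(3)]
X_true = np.einsum('ir,jr,kr->ijk', Z[0], Z[1], Z[2])

def rrse(estimate, truth):
    """Relative root square error between an estimate and the true tensor."""
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)

# Corrupt a random subset of entries with outliers (rate and range are assumed).
mask = rng.random(X_true.shape) < 0.10
X_obs = X_true.copy()
X_obs[mask] += rng.uniform(-10.0, 10.0, size=int(mask.sum()))

baseline = rrse(X_obs, X_true)   # error of using the corrupted data as the estimate
```

A decomposition method is useful on this benchmark when the RRSE of its recovered low-rank tensor falls well below `baseline`.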
• Comparison methods: batch processing algorithms
– Bayesian Robust Tensor Factorization (BRTF) [Q. Zhao et al. 2014]
– Higher Order RPCA (HORPCA) [D. Goldfarb et al. 2013]
– Tensor factorization method CP-ALS [T. Kolda 2009]
– Higher Order SVD (HOSVD) [L. De Lathauwer et al. 2013]
• Multispectral Video Sequences (MSVS)
– Acquisition
• commercial camera (FD-1665-MS)
– 7 narrow spectral bands = 6 visible + 1 NIR spectral band
– 5 video sequences
• 1 indoor video sequence
• 4 outdoor scenes
• frame size: [658 × 491 × 3] with 250 to 2300 frames
• frame rate: depends on the overall scene illumination
– 5 fps for dark scenes and 15 fps for brighter ones
– Main Challenges
• gradual illumination changes, shadows, and intermittent
object motion
• camouflage (color similarity between background and objects)
• MSVS dataset
– “integration of MS bands improves the foreground segmentation”
[Figure: sample frames from Video 1 to Video 5]
• Visual Results of Video 1
[Figure: input, low-rank, sparse, and mask (rows) for RGB, VS-1 to VS-6, and NIR (columns)]
• Visual Results of Video 2
[Figure: input, low-rank, sparse, and mask (rows) for RGB, VS-1 to VS-6, and NIR (columns)]
[Figure: Videos 1 to 5 (rows); input, low-rank, ground truth, RGB mask, 6 VSB mask, and 1 NIR mask (columns)]
• Qualitative Comparison
– White: true positive (TP) pixels
– Black: true negative (TN) pixels
– Red: false positive (FP) pixels
– Green: false negative (FN) pixels
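The colour coding above can be reproduced from a predicted mask and a ground-truth mask. The sketch below is one way to do it; the function name `color_code`, the RGB triplets chosen for each colour, and the toy masks are assumptions for illustration.

```python
import numpy as np

# RGB triplets for each outcome (assumed 8-bit values for the named colours).
COLORS = {
    "TP": (255, 255, 255),   # white
    "TN": (0, 0, 0),         # black
    "FP": (255, 0, 0),       # red
    "FN": (0, 255, 0),       # green
}

def color_code(pred, truth):
    """Return an H x W x 3 image that colours each pixel by its outcome."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    out = np.zeros(pred.shape + (3,), dtype=np.uint8)   # starts as all TN (black)
    out[pred & truth] = COLORS["TP"]
    out[pred & ~truth] = COLORS["FP"]
    out[~pred & truth] = COLORS["FN"]
    return out

pred = np.array([[1, 1], [0, 0]])
truth = np.array([[1, 0], [1, 0]])
image = color_code(pred, truth)
```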
[Figure: Videos 2, 3, and 5 (rows); input, ground truth, proposed, BRTF, HORPCA, and CP-ALS (columns)]
• Quantitative Analysis
– The F-measure is computed for RGB and MS bands for comparison
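For reference, the F-measure is the harmonic mean of precision and recall computed from the foreground masks. The sketch below shows the computation on binary masks; the function name `f_measure`, the zero-division convention, and the toy masks are assumptions for illustration.

```python
import numpy as np

def f_measure(pred, truth):
    """F-measure of a predicted foreground mask against the ground truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(pred & truth)        # foreground correctly detected
    fp = np.sum(pred & ~truth)       # background marked as foreground
    fn = np.sum(~pred & truth)       # foreground that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0                   # assumed convention when nothing is detected
    return 2 * precision * recall / (precision + recall)

pred = np.array([1, 1, 0, 0])
truth = np.array([1, 0, 1, 0])
score = f_measure(pred, truth)
```

In this toy case there is one true positive, one false positive, and one false negative, so precision, recall, and the F-measure are all 0.5.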
• Time Complexity
– independent of the number of samples
• grows linearly with the image resolution
• Video demo 1
– color saturation issue
[Video: input, ground truth, RGB mask, and MS mask]
• Video demo 2
– color saturation issue
[Video: input, ground truth, RGB mask, and MS mask]
• Video demo 3
– shadows, dynamic backgrounds, intermittent object motion
[Video: input, ground truth, RGB mask, and MS mask]
• An online stochastic optimization framework is proposed
– tensor decomposition into a low-rank and a sparse tensor
• computationally attractive
• real-time processing achieved
– provides great potential for multi-spectral bands
• Limitation
– the proposed method is not stable for RGB image features
• promising accuracy is achieved by integrating multi-spectral
bands
• Future work
– disparity features will be integrated
– can be extended to visual tracking as low-rank sparse tracking
Conclusion