The Use of Sparsity Hypothesis for Source Separation
Prasad Sudhakar, PhD
Post-doctoral researcher, ICTEAM/ELEN, UCL
13 October, 2011
SPS Seminar Series, ICTEAM/ELEN, UCL
Context
Sources → Mixing process → Mixtures
Source separation
M mixtures → Unmixing process → N source estimates
M < N: underdetermined
Acoustic mixing model
N sources → Filters of length L → M convolutive mixtures
Blind filter estimation and source separation
• Source separation: known mixtures, unknown sources
• Filter estimation: known mixtures, unknown filters
• Mathematically ill-posed, and hence impossible to solve as stated
• Needs a further hypothesis
[Diagram labels: Source Estimation, Filter Estimation]
Enabling hypotheses
• Independent Component Analysis (ICA)
  • Hypothesis: Sources are statistically independent
  • Objective: Estimate the sources by maximising independence, e.g. by minimising mutual information
• Non-negative Matrix Factorisation (NMF)
  • Hypothesis: Mixtures can be factored into two non-negative matrices
  • Objective: Seek factor matrices that approximate the mixtures well
• Sparse Component Analysis (SCA)
  • Hypothesis: Sources and/or mixing filters are sparse
  • Objective: Seek sparse sources and/or filters from the mixtures
Filter types
[Figure: four filter types: Instantaneous, Anechoic, Sparse, Fully convolutive]
Sparse sources
• Many audio sources are sparse in the time-frequency domain
• Short-Time Fourier Transform (STFT): commonly used analysis tool
[Figure labels: STFT, Sparse]
Can Sparsity still help?
Landscape of methods
[Figure: methods arranged by mixing complexity (Instantaneous/Anechoic, Sparse, Fully convolutive) and by relative number of sources (N = 1, N <= M, N = M, N > M). Labelled regions:
• Source sparsity. State of the art: Sparse Component Analysis (SCA)
• State of the art: Cross-Relation (CR) based approaches
• State of the art: 1) Independent Component Analysis (ICA), 2) Non-negative Matrix Factorisation (NMF)]
What is sparsity?
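• Sparsity: few significant coefficients compared to the size of the signal
• True sparsity measure: the $\ell_0$ "norm", $\|x\|_0 = \#\{i : x_i \neq 0\}$
  • Discrete, sensitive to noise
• Relaxed convex norm: the $\ell_1$ norm, $\|x\|_1 = \sum_i |x_i|$
• k-sparse vector: $\|x\|_0 \le k$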
Relevance of sparsity
• Helps reduce problem complexity
• Aids data compression: data can be sparse in a transform domain
• e.g. DCT of natural images, STFT of audio sources, etc.
• Of late, a popular prior for solving linear inverse problems: sparse recovery problems
• compressed sensing
Sparsity for linear underdetermined systems
• Suppose we need to solve a linear system $y = Ax$ with fewer equations than unknowns
• Under-determined: the solution is not unique
• Prior: the solution is sparse
• Ideally:
• Sparse approximation:
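The two problems named on this slide are commonly written as below. This is a sketch under assumed notation ($y$ for the observations, $A$ for the known matrix, $x$ for the unknown sparse vector, $\epsilon$ for a noise tolerance); none of these symbols is taken from the slides.

```latex
% Ideal (noise-free) sparse recovery: the sparsest exact solution
\min_{x} \|x\|_0 \quad \text{subject to} \quad y = Ax

% Sparse approximation: the sparsest solution within a tolerance \epsilon
\min_{x} \|x\|_0 \quad \text{subject to} \quad \|y - Ax\|_2 \le \epsilon
```

Replacing $\|x\|_0$ by $\|x\|_1$ in either problem gives a convex programme, which is the relaxation used by basis pursuit and BPDN.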
Algorithmic families for sparse recovery
Greedy methods
• Principle: Iteratively estimate the non-zero coefficients
• Sub-optimal and heuristic
• Easy on computations
• Theoretically not very well understood
• Ex.: Matching Pursuit, OMP, StOMP, etc.
Convex relaxation
• Principle: Replace the $\ell_0$ norm by convex norms
• Provides provably optimal solutions in several settings
• Computation intensive
• Well-rounded theory, thanks to 'Compressed Sensing'
• Ex.: Basis pursuit, BPDN, etc.
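As an illustration of the greedy family, here is a minimal sketch of Orthogonal Matching Pursuit in NumPy. The function name `omp`, its parameters, and the test dictionary (identity plus a normalised Hadamard matrix) are assumptions made for this example and do not come from the talk; the dictionary is chosen only because its low coherence makes recovery of a 2-sparse vector well behaved.

```python
import numpy as np

def omp(A, y, k, tol=1e-10):
    """Greedy k-sparse approximation of y = A x (columns of A assumed unit-norm)."""
    n = A.shape[1]
    residual = np.asarray(y, dtype=float).copy()
    support = []
    x = np.zeros(n)
    coeffs = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) <= tol:
            break
        # Pick the column most correlated with the current residual
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0   # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Least-squares refit of all selected coefficients
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    if support:
        x[support] = coeffs
    return x

# Hypothetical test dictionary: identity plus normalised 16 x 16 Hadamard matrix
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
H = H2
for _ in range(3):
    H = np.kron(H, H2)
A = np.hstack([np.eye(16), H / 4.0])      # 16 x 32, unit-norm columns

x_true = np.zeros(32)
x_true[3], x_true[20] = 1.5, -2.0         # a 2-sparse vector
y = A @ x_true
x_hat = omp(A, y, k=2)
print(np.flatnonzero(x_hat), x_hat[np.flatnonzero(x_hat)])
```

Each iteration costs one matrix-vector product and a small least-squares solve, which is why greedy methods are "easy on computations" compared with a full convex programme.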
Sparse component analysis
Estimating sparse sources with known parameters
• Suppose the mixing matrix A is known, then
• Likewise, if symbolises convolutive mixtures
What if the mixing coefficients are unknown?
• Model: Instantaneous mixtures of sparse sources
Instantaneous mixtures of sparse sources
• Model: sparse and disjoint sources, instantaneous mixture
• Suppose the sets are known: use time-frequency masking and invert the transform to estimate the sources
• How to obtain the sets? Use geometric ideas
References
• A. Jourjine et al., 2000
• P. Bofill & M. Zibulevsky, 2001
• M. Zibulevsky & B.A. Pearlmutter, 2001
Use of scatter plots to identify (Stereo M=2)
[Figure: scatter plot. Colours: sources; lengths: scaling]
• Up to a permutation and scaling: $\tilde{A} \approx A \Lambda_g P_g$
• Knowledge of can then be used for: 1) filter estimation and 2) separation
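The scatter-plot idea can be sketched as follows for a stereo instantaneous mixture. The example assumes noise-free, exactly disjoint coefficients that are already in a sparse domain (no STFT is computed), and it picks the heaviest histogram bins rather than doing proper peak detection. The function `estimate_directions`, the angles in `theta_true` and all other names are hypothetical.

```python
import numpy as np

def estimate_directions(X, n_sources, n_bins=180):
    """Estimate mixing directions (angles in [0, pi)) from a 2 x T scatter plot."""
    # Fold angles onto [0, pi): a point and its negative lie on the same line
    angles = np.mod(np.arctan2(X[1], X[0]), np.pi)
    weights = np.hypot(X[0], X[1])                 # large points count more
    hist, edges = np.histogram(angles, bins=n_bins,
                               range=(0.0, np.pi), weights=weights)
    top = np.sort(np.argsort(hist)[-n_sources:])   # the heaviest bins
    return 0.5 * (edges[top] + edges[top + 1])     # their centres

rng = np.random.default_rng(0)
theta_true = np.array([0.3, 0.9, 1.4])             # assumed mixing directions
A = np.vstack([np.cos(theta_true), np.sin(theta_true)])   # 2 x 3 mixing matrix
T = 2000
active = rng.integers(0, 3, size=T)                # one active source per point
S = np.zeros((3, T))
S[active, np.arange(T)] = rng.standard_normal(T)
X = A @ S                                          # stereo mixture (M = 2, N = 3)

theta_hat = estimate_directions(X, n_sources=3)
A_hat = np.vstack([np.cos(theta_hat), np.sin(theta_hat)])

# Masking: assign each point to the best-aligned direction, then project
labels = np.argmax(np.abs(A_hat.T @ X), axis=0)
S_hat = np.zeros_like(S)
S_hat[labels, np.arange(T)] = np.sum(A_hat[:, labels] * X, axis=0)
print(theta_hat)
```

The directions are only recovered up to the bin width, and their order and scale are arbitrary, which is the permutation and scaling ambiguity mentioned on the slide.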
Generalisations and improvements
• Anechoic mixtures:
• The mixing coefficients also involve a phase parameter
• In stereo case, only one phase needs to be estimated (fixing the other)
• Extension to quasi-disjoint sources
• In a generic setting, one can allow at most M-1 sources to be active at a given TF location
• Statistical confidence measures to obtain the sets
• Convolutive mixtures
  • Narrowband approximation through a suitable transform, like the STFT:
    $x_i(t) = \sum_{j=1}^{N} (a_{ij} \star s_j)(t) \;\longrightarrow\; x_i(\tau, f) \approx \sum_{j=1}^{N} a_{ij}(f)\, s_j(\tau, f), \quad 1 \le i \le M$
  • One convolutive problem to M complex instantaneous problems
Can we use the techniques developed for instantaneous mixtures?
Convolutive mixtures: Permutation and scaling
Estimated mixing parameters
• Frequency dependent scaling:
Scatter plot
Conclusions
• Scatter plots alone are not sufficient
• Need permutation and scaling correction
• Time / frequency domain ICA approaches also suffer
• Each frequency bin is arbitrarily scaled and permuted
• If : estimated filter coefficients, then
Consequences
Sparse filters for permutation correction
Relevance of sparse filters
• Few reflection paths = time-domain sparsity
  - Underwater acoustics
  - Communications engineering: wide band CDMA
  - Biomedical: ultrasound imaging
[Figure: an example of an underwater communication channel. Picture courtesy: Berger et al., 2009]
Permutation correction problem
• Setting: Permutation only
Permutation correction
• Claim: Suppose the filters are sufficiently sparse; then the permutation-corrected filters have the least norm, and hence the permutations can be recovered by minimising that norm
: Set of permuted frequencies
• Problem:
[Figure: Fail and Success regions]
Permutation correction by minimisation
• $A_{TD}$: true time-domain filter matrix; $\tilde{A}_{TD}$: estimated filter matrix
• Recovery performance measure:
• Conclusion: Sparsity prior on filters helps solve the permutation problem, in the absence of scaling.
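The claim that frequency permutations increase the norm of sparse filters can be checked numerically. The sketch below assumes the $\ell_1$ norm as the sparsity-promoting norm and two filters with disjoint supports; the helper names `permute_bins` and `l1`, the filter length and the tap positions are all hypothetical. For disjoint supports the increase follows from the triangle inequality, so the $\ell_1$ norm of the permuted pair is never smaller than that of the true pair.

```python
import numpy as np

def permute_bins(a1, a2, omega):
    """Swap the DFT coefficients of two filters at the frequency indices in omega."""
    A1, A2 = np.fft.fft(a1), np.fft.fft(a2)
    swapped_1, swapped_2 = A2[omega].copy(), A1[omega].copy()
    A1[omega], A2[omega] = swapped_1, swapped_2
    # The permuted filters are complex in general
    return np.fft.ifft(A1), np.fft.ifft(A2)

def l1(*filters):
    """Sum of the l1 norms of the given filters."""
    return sum(np.abs(f).sum() for f in filters)

L = 64
a1 = np.zeros(L)
a1[[2, 17, 40]] = [1.0, -0.6, 0.3]       # sparse filter 1
a2 = np.zeros(L)
a2[[5, 23, 51]] = [0.8, 0.5, -0.2]       # sparse filter 2, disjoint support

norm_true = l1(a1, a2)
increase = {}
for n_perm in (1, 4, 16):
    omega = np.arange(1, 1 + n_perm)     # indices of the permuted bins
    increase[n_perm] = l1(*permute_bins(a1, a2, omega)) - norm_true
    print(n_perm, increase[n_perm])

# Permuting every bin is a global permutation: the filters are simply exchanged
norm_global = l1(*permute_bins(a1, a2, np.arange(L)))
```

The global permutation leaves the norm unchanged, so norm minimisation can only recover the filters up to a relabelling of the sources.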
Scaling correction
Single input - two output setting
• Model:
• Cross-relation (CR) property:
• Matrix form: (double Toeplitz matrix)
• Consequence: Removes TF dependent source scaling
• Issues:
  1. Normalisation of solution
  2. Possibly non-unique solution
• Conclusion: Needs a prior
References
• H. Liu et al., 1994
• G. Xu et al., 1995
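The cross-relation can be illustrated numerically. The sketch assumes the standard single-input, two-output model $x_i = a_i \star s$, for which $x_2 \star a_1 - x_1 \star a_2 = 0$ because convolution commutes. Stacking the two convolution (Toeplitz) matrices side by side gives a matrix `B` with `B @ [a1; a2] = 0`. The helper `conv_matrix`, the filter taps and the signal length are hypothetical choices for the example.

```python
import numpy as np

def conv_matrix(x, L):
    """Toeplitz matrix T such that T @ a equals np.convolve(x, a) for len(a) == L."""
    T = np.zeros((len(x) + L - 1, L))
    for k in range(L):
        T[k:k + len(x), k] = x           # column k is x delayed by k samples
    return T

rng = np.random.default_rng(0)
L = 8
s = rng.standard_normal(200)             # unknown source
a1 = np.zeros(L)
a1[[0, 3]] = [1.0, 0.5]                  # sparse filter to output 1
a2 = np.zeros(L)
a2[[1, 7]] = [0.8, -0.3]                 # sparse filter to output 2
x1, x2 = np.convolve(a1, s), np.convolve(a2, s)

# Double Toeplitz matrix built from the mixtures only
B = np.hstack([conv_matrix(x2, L), -conv_matrix(x1, L)])
a = np.concatenate([a1, a2])
cr_residual = np.linalg.norm(B @ a)
print(B.shape, cr_residual)
```

The zero vector and any scaled copy of `a` also satisfy `B @ a = 0`, which is why the solution needs a normalisation and, as the slide concludes, a prior.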
Formulation of sparse filter estimation problem
Normalisation
• Issue: How to exploit this approach in multiple source setting?
• In standard sparse recovery problems, $\ell_1$ minimisation seeks sparse solutions
• Convex and can be solved using standard packages
• Is noise aware
where and
• Filter recovery problem formulation
References
• A. Aïssa-El-Bey et al., 2008
Revisiting the landscape
[Figure: the landscape of methods, by mixing complexity (Instantaneous/Anechoic, Sparse, Fully convolutive) and number of sources (N = 1, N <= M, N = M, N > M). Labelled regions:
• Scatter plots
• Source sparsity. State of the art: Sparse Component Analysis (SCA)
• State of the art: Cross-Relation (CR) based approaches
• Permutations
• Scaling
• Multiple filter estimation framework]
Multiple sparse filter estimation
Multiple filter estimation using time-domain disjointness and CR
• Suppose there exist intervals where only one source is active
• The mixtures then contain intervals where only one source contributes
• Mixtures in those intervals satisfy the time-domain CR
• Filters can be estimated by solving the sparse recovery problem
What if the sources are not disjoint?
References
• A. Aïssa-El-Bey et al., 2009
Multiple filter estimation using time-frequency (TF) domain disjointness and CR
• Given the mixtures, if we know which source is active at which TF locations
• and build a matrix $B^{\mathrm{NB}}_{\Omega_j}$ or $B^{\mathrm{WB}}_{\Omega_j}$ for each source such that
• then we can express the CR in the TF domain in two different forms:
  1) narrowband (NB) approximation
  2) wideband (WB) formulation
Narrowband and Wideband CR
• Given: the time-domain CR (CR-TD)
• We have:
  • CR-NB: narrowband CR, with narrowband approximation
  • CR-WB: wideband CR, no narrowband approximation
Multiple filter estimation using TF domain CR
• Further, use the matrix B to solve the following and estimate the filters: (A)
  where $B = B^{\mathrm{NB}}_{\Omega_j}$ or $B = B^{\mathrm{WB}}_{\Omega_j}$
• A single N-filter estimation problem is reduced to N single-filter estimation problems
A two stage framework
• For each source:
  1) Time-frequency clustering: identify the time-frequency regions where only one source is active
  2) Filter estimation: solve the optimisation problem to estimate the filters
• Filter estimation: using generic toolboxes for convex programming
• Clustering: difficult problem, dictates the performance of filter estimation
Experiments with controlled setting
• Setup: [Figure labels: Flute, Guitar, Sparse filter, Instantaneous]
• Main issue: Blind clustering of TF points where only source 2 is active
• Remove the points corresponding to the instantaneously mixed source
Experiments: Blind clustering
• STFT magnitude of a mixture
• Use DEMIX or a similar approach to identify the regions where source 1 is active
• Remove the points corresponding to source 1
• Remaining points correspond to source 2
Reference
• Arberet et al., 2010
Performance measure
• SNR measure of the estimated filters
• Takes care of global shift and global scale ambiguity inherent to problem formulation
Results
• Debiasing: Extract valid support and readjust the coefficients by performing minimisation
• Wideband method with debiasing outperforms state of the art by at least 10 dB
Reference
• C. Knapp & G. Carter, 1976
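The debiasing step can be sketched as follows. The slide does not say which minimisation is used to readjust the coefficients; this example assumes a least-squares ($\ell_2$) refit on the retained support, applied to a generic linear model `y = B x` rather than to the talk's exact filter estimation problem. The function `debias`, the threshold and the synthetic data are all hypothetical.

```python
import numpy as np

def debias(B, y, x_biased, threshold=1e-3):
    """Keep the support of a sparse estimate and refit its values by least squares."""
    support = np.flatnonzero(np.abs(x_biased) > threshold)   # valid support
    x = np.zeros(len(x_biased))
    if support.size:
        coeffs, *_ = np.linalg.lstsq(B[:, support], y, rcond=None)
        x[support] = coeffs
    return x

rng = np.random.default_rng(0)
B = rng.standard_normal((30, 60))
x_true = np.zeros(60)
x_true[[4, 19, 33, 50]] = [1.0, -2.0, 0.5, 1.5]
y = B @ x_true

# Mimic a typical l1 estimate: right support, shrunken amplitudes,
# plus tiny spurious entries that fall below the threshold
x_biased = 0.7 * x_true
x_biased[[7, 41]] = 1e-4
x_debiased = debias(B, y, x_biased)
print(np.abs(x_biased - x_true).max(), np.abs(x_debiased - x_true).max())
```

The refit removes the amplitude shrinkage that a sparsity-promoting penalty introduces, provided the retained support is correct.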
Summary and perspectives
Summary
Sparsity hypothesis
• Can it be used to solve problems beyond standard linear inverse problems?
• Yes: a framework for the multiple filter estimation problem, central to convolutive source separation
  • by combining the notions of time-frequency domain sparsity of sources and time-domain sparsity of filters
• Empirical results show the ability of the two-stage framework to estimate filters
Perspectives (1/2)
• Anechoic approximations using DOA information
• Central issue with the framework: blind clustering
  • Ideas from anechoic settings?
  • Cluster initialisation using filter approximations?
[Figure labels: Clustering, + Noise, Filter Estimation]
• From filter estimation to source estimation
Perspectives (2/2)
• Connections with subspace learning
  • Sparse vector orthogonal to matrix B
  • Subspaces characterised by sparse vectors
• Exploiting sparsity in non-standard domains
  • Connections with synthesis and analysis priors
• Exploiting structured sparsity
• Theoretical analysis of the filter estimation framework
  • Understanding identifiability and recovery conditions
Some relevant publications
1) P. Sudhakar. Sparse models and convex optimisation for convolutive blind source separation. PhD thesis, University of Rennes 1, France, February 2011.
2) A. Benichoux, P. Sudhakar, R. Gribonval. Well-posedness of the frequency permutation problem in sparse filter estimation with lp minimization. In SPARS’11, Jun 2011.
3) P. Sudhakar, S. Arberet and R. Gribonval. Sparse models for multiple mixing filter estimation from stereo convolutive mixtures. Submitted to IEEE TASLP, June 2011.
4) S. Arberet, P. Sudhakar, and R. Gribonval. A wideband doubly-sparse approach for MITO sparse filter estimation. In proceedings of ICASSP 2011, May 2011.
5) P. Sudhakar, S. Arberet, and R. Gribonval. Double Sparsity: Towards Blind Estimation of Multiple Channels. In proceedings of LVA/ICA, 2010, St. Malo, France.
6) P. Sudhakar and R. Gribonval. A sparsity-based method to solve the permutation indeterminacy in frequency domain convolutive blind source separation. In proceedings of ICA, 2009, Paraty, Brazil.
Thanks to my collaborators
• Remi Gribonval, METISS, INRIA Rennes-Bretagne Atlantique, France
• Simon Arberet, LTS2, EPFL, Switzerland
• Clipart from: http://www.clker.com/
Structure of NB matrix
• CR at a given point
• For a given frame index
• If there are frames, then define
  where is the forward Fourier matrix of size F×F
• It satisfies:
Projection of convoluted sequences
Lemma 1: Let be a bounded real valued signal, let be a finite real valued signal and let be a finite signal, possibly complex,
then
where
Structure of WB matrix
TF domain CR
If is a STFT dictionary of one sample shift, then
By lemma 1
Structure of WB matrix
If we have
and
then we can define which satisfies
Time-frequency disjointness in NB formulation
Consider
If then
Time-frequency disjointness in WB formulation
Consider
If then
Note that
By lemma 1
Oracle clustering
True filters satisfy the CR for source k
CR-NB
CR-WB
Blind clustering
• Cluster initialisation using filter approximations
[Figure labels: Clustering, + Noise, Filter Estimation]
Permutation correction: Disjoint time supports
Theorem 1: Let be filters with mutually disjoint supports and let be the filters obtained after frequency domain permutations at frequency indices in , then
• Independent of
  1) Sparsity
  2) Number of permutations
Permutation correction: Sparse filters
Theorem 2: Let and be two sparse filters and let and be the filters obtained after frequency domain permutations at frequency indices in , then
a)
• Doesn’t assume disjoint supports
• Gives a regime of and for which minimisation recovers permutations
Permutation correction: Equality case
b) If and have disjoint supports, then
Further, equality in the above equation implies
EITHER I)
OR II)
• Inequality result comes from Theorem 1
• Equality condition implies global permutation
• Conclusion: Under assumed conditions
  • Permuted filters have larger norm than the corresponding true filters
  • True filters can be uniquely recovered by minimisation
Variation of norm against permutations
• Objective: To assess whether permutations increase the norm of the filters
• Performance measure:
• Results:
• Conclusion: Empirically, the larger the number of permutations, the larger the increase in norm
Sparsity in the thesis
In this work, we use the sparsity hypothesis twice
1. Source sparsity in the time-frequency domain
2. Filter sparsity in the time domain
Plan of the talk
1. Tools
i. Sparse component analysis
ii. Cross-relation based approaches
2. Permutation correction using sparsity
3. Framework for multiple filter estimation
4. Summary and perspectives
Time-frequency masking
• Model: sparse and disjoint sources, instantaneous mixture
• Consequence: we just need to know which source is active at which TF locations for source localisation and separation
• How to identify ? Use scatter plots
References
• A. Jourjine et al., 2000
• P. Bofill & M. Zibulevsky, 2001
• M. Zibulevsky & B.A. Pearlmutter, 2001
Clustering
• In this work, two kinds of experiment are done
Experiments with synthetic data
• Goal: To assess the performance of the filter estimation step
• Oracle clustering:
  • uses the knowledge of true filters (ground truth)
  • depends on a threshold
Experiments with audio data
• Goal: To assess the overall performance in a realistic setting
• Blind clustering: assumes all sources except one are instantaneously mixed
Experiments with synthetic data
• Source model: sum of sinusoids with Gaussian envelopes of random lengths
• Number of sources N = 3
• Sparse filters of length L = 256
• Study the effect of 1) STFT window size F and 2) clustering threshold on the filter recovery performance, using oracle clustering
Results: Effect of STFT window size
[Figure panels: CR-NB, CR-WB]
• NB approach gets better as window size increases relative to filter length
• WB approach performs better when window size is less than or equal to filter length
Results: Effect of clustering threshold
[Figure panels: CR-NB, CR-WB; window size = 1024, window size = 128]
• NB approach degrades when the threshold increases, due to fewer observations
• WB approach performs better when the threshold increases, due to a more accurate CR