Sparse Bayesian Learning for Wavefields from Sensor Array Data
Christoph Mecklenbräuker and Peter Gerstoft
Plan
• MAP + LASSO path (3 slides)
• Performance simulation (6 slides)
• Three acoustic data sets (4 slides)
Gerstoft et al, JASA Oct 2015.
Y = AX + N
CS approach to geophysical data analysis
CS of earthquakes: Yao, GRL 2011; PNAS 2013
Sequential CS: Mecklenbräuker, TSP 2013
[Figure: sequential CS DOA estimates, DOA (deg, 0–180) vs time (1–50 snapshots); a) h0 = 0.5, b) h0 = 0.05.]
CS beamforming: Xenaki, JASA 2014, 2015; Gerstoft, JASA 2015
CS fathometer: Yardim, JASA 2014
CS sound speed estimation: Bianco, JASA 2016
CS matched field: Gemba, JASA 2016
DOA estimation with sensor arrays
Two plane waves, p_1(r,t) = x_1 e^{j(ωt − k_1 r)} and p_2(r,t) = x_2 e^{j(ωt − k_2 r)}, arrive from directions θ_1 and θ_2.

x ∈ C, θ ∈ [−90°, 90°]; k = −(2π/λ) sin θ, λ: wavelength

y_m = Σ_n x_n e^{j(2π/λ) r_m sin θ_n}
m ∈ [1, …, M]: sensor; n ∈ [1, …, N]: look direction

y = A_{M×N} x
y = [y_1, …, y_M]^T, x = [x_1, …, x_N]^T
A = [a_1, …, a_N]
a_n = (1/√M) [e^{j(2π/λ) r_1 sin θ_n}, …, e^{j(2π/λ) r_M sin θ_n}]^T

A. Xenaki (SIO/DTU), Compressive beamforming, JASA 2014. UA 2014, slide 3/16.
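The linear model above can be sketched in plain Python. This is a minimal, hypothetical setup (M = 8 sensors on a half-wavelength ULA with r_m = m·d and a 1° look-direction grid), not the authors' code:

```python
import cmath
import math

M, N = 8, 181                 # sensors, look directions on a 1-degree grid
d_over_lam = 0.5              # assumed lambda/2 element spacing, r_m = m*d

def steering(theta_rad):
    # a_n = (1/sqrt(M)) [exp(j 2*pi (d/lambda) m sin(theta))]_{m=0..M-1}
    return [cmath.exp(1j * 2 * math.pi * d_over_lam * m * math.sin(theta_rad))
            / math.sqrt(M) for m in range(M)]

# sensing matrix A, stored as N steering vectors (columns) of length M
A = [steering(math.radians(-90 + n)) for n in range(N)]

# one unit-amplitude source at broadside (theta = 0, grid index 90)
x = [0j] * N
x[90] = 1.0 + 0j

# forward model y = A x, i.e. y_m = sum_n A[n][m] x_n
y = [sum(A[n][m] * x[n] for n in range(N)) for m in range(M)]
```

At broadside sin θ = 0, so every sensor sees the same value 1/√M, which is a quick sanity check on the steering-vector normalization.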
Beamforming vs compressive sensing

y = A_{M×N} x + n, M < N
n ∈ C^M, SNR = 20 log_10(‖Ax‖_2/‖n‖_2), ‖n‖_2 ≤ ε

Beamforming: simplified ℓ2-norm minimization (A A^H = I_M)
x̂ = A^H y = A^H A x + A^H n

Compressive sensing: ℓ1-norm minimization
x̂ = argmin_{x ∈ C^N} ‖x‖_1 subject to ‖Ax − y‖_2 ≤ ε

ULA: M = 8, d/λ = 1/2, [θ_1, θ_2] = [0, 5]°, SNR = 20 dB
[Figure: beam spectra P (dB re max) vs θ ∈ [−90°, 90°] for the two methods.]

A. Xenaki (SIO/DTU), Compressive beamforming, JASA 2014. UA 2014, slide 7/16.
CBF: x̂ = A^H y        CS: min_x ‖Ax − y‖_2^2 + μ‖x‖_1, with N ≫ M
High resolution, no sidelobes.
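The CBF side of this comparison (x̂ = A^H y) can be sketched in plain Python. The setup is hypothetical (one noise-free unit source at +20° on an 8-element λ/2 ULA); the CS side needs an ℓ1 solver such as CVX and is omitted here:

```python
import cmath
import math

M = 8            # sensors
d = 0.5          # element spacing in wavelengths (assumed)

def steer(theta_deg):
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * 2 * math.pi * d * m * s) / math.sqrt(M) for m in range(M)]

# noise-free measurement from a single unit source at +20 degrees
y = steer(20.0)

# conventional beamformer spectrum P(theta) = |a(theta)^H y|^2
grid = list(range(-90, 91))
P = [abs(sum(a.conjugate() * ym for a, ym in zip(steer(t), y))) ** 2 for t in grid]
best = grid[P.index(max(P))]   # peak of the CBF spectrum, at the true DOA
```

By Cauchy–Schwarz the spectrum peaks exactly at the true direction for noise-free data; the wide mainlobe and sidelobes around that peak are what CS removes.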
MAP

Likelihood (noise complex Gaussian): p(y|x) ∝ exp(−‖Ax − y‖_2^2/σ^2)

Prior (Laplacian): p(x) ∝ exp(−‖x‖_1/ν)

Bayes rule: p(x|y) ∝ p(y|x) p(x) ∝ exp(−‖Ax − y‖_2^2/σ^2 − ‖x‖_1/ν)
solutions. The choice of the (unconstrained) LASSO formulation (8) over the constrained formulation (7) allows the sparse reconstruction method to be interpreted in a statistical Bayesian setting, where the unknowns x and the observations y are both treated as stochastic (random) processes, by imposing a prior distribution on the solution vector x which promotes sparsity [14–16].
The Bayes theorem [32] connects the posterior distribution p(x|y) of the model parameters x conditioned on the data y with the data likelihood p(y|x), the prior distribution of the model parameters p(x), and the marginal distribution of the data p(y),

p(x|y) = p(y|x) p(x)/p(y).  (9)
From the Bayes rule (9), the maximum a posteriori (MAP) estimate is

x̂_MAP = argmax_x ln p(x|y) = argmax_x [ln p(y|x) + ln p(x)] = argmin_x [−ln p(y|x) − ln p(x)],  (10)

where the marginal distribution of the data p(y) is omitted since it is independent of the model x.
Based on a complex Gaussian noise model with i.i.d. real and imaginary parts, n ~ CN(0, σ^2 I), the likelihood of the data is also complex Gaussian distributed, p(y|x) ~ CN(Ax, σ^2 I),

p(y|x) ∝ e^{−‖y − Ax‖_2^2/σ^2}.  (11)
Assuming that the coefficients of the solution vector x have i.i.d. Laplace (i.e., double exponential) priors [33],

p(x) ∝ ∏_{i=1}^N e^{−|x_i|/ν} = e^{−‖x‖_1/ν},  (12)
the LASSO estimate (8) can be interpreted as the maximum a posteriori (MAP) estimate,

x̂_MAP = argmin_x [‖y − Ax‖_2^2 + μ‖x‖_1] = x̂_LASSO(μ),  (13)

where μ = σ^2/ν. The Laplace prior distribution encourages sparse solutions with many zero components since it concentrates more mass near 0 than in the tails. Therefore, the model selected by the LASSO optimization algorithm has the highest posterior probability under the Bayesian framework.
V. REGULARIZATION PARAMETER SELECTION

The choice of the regularization parameter μ in (8), also called the LASSO shrinkage parameter, is crucial, as it controls the balance between the degree of sparsity of the estimated solution and the data fit, determining the quality of the reconstruction.

For large μ, the solution is very sparse (with small ℓ1-norm) but the data fit is poor. As μ decreases towards zero, the data fit gradually improves since the corresponding solutions become less sparse. Note that for μ = 0 the solution (8) becomes the unconstrained least squares solution.
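For a unitary sensing matrix (A^H A = I) the LASSO minimizer is available in closed form as complex soft thresholding of b = A^H y at μ/2, which makes the sparsity-vs-fit trade-off easy to see. A sketch with hypothetical coefficient values:

```python
def soft_threshold(b, mu):
    # closed-form LASSO minimizer of ||b - x||_2^2 + mu*||x||_1 (A^H A = I case):
    # each coefficient shrinks toward zero by mu/2; small ones are zeroed
    return [bi * max(0.0, 1.0 - mu / (2.0 * abs(bi))) if abs(bi) > 0 else 0j
            for bi in b]

b = [3.0 + 4.0j, 0.4 + 0j, 0.1j]   # hypothetical beamformer output A^H y

sparsity = [sum(1 for xi in soft_threshold(b, mu) if abs(xi) > 0)
            for mu in (0.0, 0.5, 1.0, 20.0)]
# increasing mu leaves fewer active components: sparsity == [3, 2, 1, 0];
# mu = 0 returns b unchanged (the least squares solution)
```

Coefficients with |b_i| ≤ μ/2 are set exactly to zero, which is the mechanism behind the sparsity/data-fit balance described above.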
A. The LASSO path

As the regularization parameter μ evolves from ∞ to 0, the LASSO solution (8) changes continuously, following a piecewise smooth trajectory referred to as the solution path or the LASSO path [18,19,34]. In this section, we show that the singularity points in the LASSO path are associated with a change in the degree of sparsity of the solution and can be used to indicate a proper value for μ.

We obtain the full solution path using convex optimization to solve (8) iteratively for different values of μ. We use the cvx toolbox for disciplined convex optimization, which is available in the Matlab environment. It uses interior point solvers to obtain the global solution of a well-defined optimization problem [17,28,29].
Let L(x, μ) denote the objective function in (8),

L(x, μ) = ‖y − Ax‖_2^2 + μ‖x‖_1.  (14)

The value x̂ minimizing (14) is found by differentiation,

g(μ) = inf_{x ∈ C^N} L(x, μ),
∂_x L(x, μ) = 2A^H(Ax − y) + μ ∂_x‖x‖_1,  (15)
where the subdifferential operator ∂_x is a generalization of the partial differential operator for functions that are not differentiable everywhere (Ref. [29], p. 338). The subgradient for the ℓ1-norm is the set of vectors defined as

∂_x‖x‖_1 = {s : ‖s‖_∞ ≤ 1, s^H x = ‖x‖_1},  (16)

which implies

s_i = x_i/|x_i| for x_i ≠ 0,  |s_i| < 1 for x_i = 0,  (17)

i.e., for every active element x_i ≠ 0 of the vector x ∈ C^N, the corresponding element of the subgradient is a unit vector in the direction of x_i. For every null element x_i = 0 the corresponding element of the subgradient has amplitude less than unity. Thus, the amplitude of the subgradient is uniformly bounded by unity, ‖s‖_∞ ≤ 1. Denote
r = 2A^H(y − Ax̂),  (18)

the beamformed residual vector for the estimated solution x̂. The minimum of (15) is attained if

0 ∈ ∂_x L(x, μ)  ⇒  r ∈ μ ∂_x‖x‖_1.  (19)

Then, from (17) and (19), the coefficients r_i = 2a_i^H(y − Ax̂) of the beamformed residual vector r ∈ C^N have amplitudes such that

|r_i| = μ for x̂_i ≠ 0,  |r_i| < μ for x̂_i = 0.  (20)
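Condition (20) can be verified numerically. The sketch below uses the identity sensing matrix A = I purely as an illustration (so the LASSO solution is soft thresholding and r = 2(y − x̂)); the data vector is hypothetical:

```python
def soft_threshold(b, mu):
    # LASSO solution for A = I: complex soft thresholding at mu/2
    return [bi * max(0.0, 1.0 - mu / (2.0 * abs(bi))) if abs(bi) > 0 else 0j
            for bi in b]

mu = 1.0
y = [3.0 + 4.0j, 0.4 + 0j, 0.1j]       # hypothetical data
x_hat = soft_threshold(y, mu)

# beamformed residual r = 2 A^H (y - A x_hat), here with A = I
r = [2.0 * (yi - xi) for yi, xi in zip(y, x_hat)]

on_support = [abs(ri) for ri, xi in zip(r, x_hat) if abs(xi) > 0]
off_support = [abs(ri) for ri, xi in zip(r, x_hat) if abs(xi) == 0]
# |r_i| equals mu on the support and stays strictly below mu off it
```

This is exactly the behavior (20) predicts: the residual amplitude saturates at μ on active components and is subcritical elsewhere.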
CS: LASSO for multiple snapshots

Y = [y_1, …, y_L] ∈ C^{M×L}: data
A = [a_1, …, a_N] ∈ C^{M×N}: sensing matrix
X = [x_1, …, x_L] ∈ C^{N×L}: source amplitudes

Data fit ‖Y − AX‖_F^2 has the least squares solution X = A^H(A A^H)^{−1} Y ≈ A^H Y → a new solution for every snapshot.

Conventional beamforming: x(θ_m) = (1/L) a_m^H (Y Y^H) a_m → one source magnitude for all snapshots.

Row-sparsity constraint: ‖X‖_{2,1} = Σ_{n=1}^N ‖x^n‖_2, with ‖x^n‖_2 = (Σ_{l=1}^L |x_{nl}|^2)^{1/2}

CS: X̂ = argmin_{X ∈ C^{N×L}} ‖Y − AX‖_F^2 + μ‖X‖_{2,1}
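The conventional multi-snapshot beamformer x(θ_m) = (1/L) a_m^H (Y Y^H) a_m can be sketched directly, using the identity a^H (Y Y^H) a = Σ_l |a^H y_l|^2. The data here are hypothetical (two snapshots of one broadside source on a 4-element λ/2 array):

```python
import cmath
import math

M, L = 4, 2       # sensors, snapshots (toy sizes)
d = 0.5           # assumed lambda/2 spacing

def steer(theta_deg):
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * 2 * math.pi * d * m * s) / math.sqrt(M) for m in range(M)]

# Y as a list of L snapshot vectors; one broadside source whose complex
# amplitude changes phase between snapshots (power stays the same)
Y = [[a * p for a in steer(0.0)] for p in (1.0 + 0j, cmath.exp(1j * 0.7))]

def cbf_power(theta_deg):
    # (1/L) a^H (Y Y^H) a  ==  (1/L) sum_l |a^H y_l|^2
    a = steer(theta_deg)
    return sum(abs(sum(ai.conjugate() * yi for ai, yi in zip(a, y_l))) ** 2
               for y_l in Y) / L
```

Averaging over snapshots gives one power estimate per look direction, which is the "one source magnitude for all snapshots" property noted on the slide.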
[Figure: CS DOA estimates (degrees, −80 to 80) vs snapshot index (1–50), power scale 0–18 dB.]
The complex amplitudes in X are allowed to vary across snapshots, but the sparsity pattern is assumed to be constant across snapshots.
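The row-sparsity norm ‖X‖_{2,1} that enforces this shared sparsity pattern is a one-liner. The 3×2 matrix below is hypothetical (rows indexed by look direction, columns by snapshot):

```python
import math

def mixed_norm_21(X):
    # ||X||_{2,1}: l2 norm over snapshots within each row, summed over rows,
    # so a whole row (one look direction) is switched on or off together
    return sum(math.sqrt(sum(abs(x) ** 2 for x in row)) for row in X)

X = [[3.0, 4.0],        # active direction, row l2 norm 5
     [0.0, 0.0],        # inactive direction
     [1.0j, 0.0]]       # active direction, row l2 norm 1
# mixed_norm_21(X) -> 6.0
```

Because the ℓ1 sum acts on row norms rather than individual entries, penalizing it zeroes entire rows, i.e. whole look directions across all snapshots.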
Problem with Degrees of Freedom
• As the number of snapshots (= observations) increases, so does the number of unknown complex source amplitudes.
• PROBLEM: LASSO for multiple snapshots estimates the realizations of the random complex source amplitudes.
• However, we would be satisfied if we just estimated their power γ_m = E{|x_ml|^2}.
• Note that γ_m does not depend on the snapshot index l.
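The power γ_m = E{|x_ml|^2} is naturally estimated by averaging over snapshots, which is the snapshot-independent quantity the Bayesian formulation works with. A minimal sketch with hypothetical realizations:

```python
def estimate_gamma(X):
    # sample estimate of gamma_m = E{|x_ml|^2}, averaging over the L snapshots
    return [sum(abs(x) ** 2 for x in row) / len(row) for row in X]

# amplitudes vary per snapshot, but the per-direction power does not
X = [[1.0 + 1.0j, 1.0 - 1.0j],   # |x|^2 = 2 in both snapshots
     [0.0, 0.0]]                 # silent direction
gamma = estimate_gamma(X)
# gamma is approximately [2.0, 0.0]
```

Only one real parameter per look direction remains, no matter how many snapshots are observed, which is the point of the slide above.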
The problem revisited
Multiple snapshots: l = 1, …, L
Likelihood function (observations conditioned on source amplitudes):
p(Y|X) ∝ ∏_{l=1}^L exp(−‖y_l − A x_l‖_2^2/σ^2)

Sparsity promoted by Gaussian prior?
[Figure: zero-mean prior densities for a = 0.125, 0.25, 0.5; smaller a concentrates more mass near zero.]
Proceeding with Bayes rule
[Posterior equation shown as slide graphic.]

Evidence
To determine the hyperparameters γ_1, γ_2, …, γ_M, and σ^2, the evidence is maximized. The evidence is the product of the likelihood and the prior, integrated over the complex source signals.
Maximizing the Evidence: Covariance Fitting
[Derivation shown as slide graphic; the derivative of the evidence with respect to the hyperparameters is set to 0.]

Exploiting Jaffer's necessary condition:
A.G. Jaffer. Maximum likelihood direction finding of stochastic sources: A separable solution. In IEEE Int. Conf. on Acoust., Speech, and Sig. Proc. (ICASSP-88), vol. 5, pp. 2893–2896, 1988.
Sparse Bayesian Learning Algorithm
[Algorithm steps shown as slide graphics.]

D.P. Wipf, B.D. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. Signal Process., 55(7):3704–3716, 2007.
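A common SBL-style fixed-point update multiplies each γ_m by a ratio of quadratic forms in Σ_y^{−1} = (A Γ A^H + σ^2 I)^{−1}. In the toy case A = I (each look direction mapping to its own sensor, so Σ_y is diagonal) that update reduces to γ_m ← γ_m S_mm/(γ_m + σ^2), whose fixed point is max(0, S_mm − σ^2). This reduction and the numbers below are illustrative assumptions, not the paper's algorithm verbatim:

```python
def sbl_gamma_update(gamma, s_diag, sigma2, iters=200):
    # SBL-style fixed-point iteration reduced to A = I (diagonal Sigma_y):
    # gamma_m <- gamma_m * S_mm / (gamma_m + sigma2)
    g = list(gamma)
    for _ in range(iters):
        g = [gm * sm / (gm + sigma2) for gm, sm in zip(g, s_diag)]
    return g

s_diag = [5.0, 0.1]          # diagonal of the sample covariance S_y
sigma2 = 0.5                 # assumed known noise power
g = sbl_gamma_update([1.0, 1.0], s_diag, sigma2)
# the strong bin converges toward 4.5 (= 5.0 - 0.5); the weak bin is
# driven to zero, i.e. that look direction is pruned from the model
```

The pruning of directions whose sample power falls below the noise floor is what makes the γ_m parameterization sparsity-promoting even though the per-direction prior is Gaussian.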
Example Scenario
In the simulation, we consider an array with N = 20 antenna elements and inter-sensor spacing d = λ/2. The DOAs of the plane-wave arrivals are assumed to lie on a fine angular grid, [−90:0.5:90]°, and L = 50 snapshots are observed. The CS solution is found using LASSO extended to multiple measurement vectors (multiple snapshots). There are 3 sources at DOAs [−3, 2, 75]° with magnitudes [4, 13, 10].
N = 20 elements
Source 1: DOA = −3°, magnitude 4
Source 2: DOA = +2°, magnitude 13
Source 3: DOA = +75°, magnitude 10
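The scenario can be synthesized and sanity-checked with the conventional beamformer in a few lines. This is a sketch (noise-free, single snapshot, plain Python), not the paper's simulation code:

```python
import cmath
import math

M = 20                        # elements, assumed lambda/2 spacing
d = 0.5
doas = [-3.0, 2.0, 75.0]      # degrees
mags = [4.0, 13.0, 10.0]

def steer(theta_deg):
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * 2 * math.pi * d * m * s) for m in range(M)]

# one noise-free snapshot: superposition of the three plane waves
y = [sum(mag * steer(doa)[m] for doa, mag in zip(doas, mags)) for m in range(M)]

def cbf(theta_deg):
    # normalized CBF power |a(theta)^H y|^2 / M^2
    a = steer(theta_deg)
    return abs(sum(ai.conjugate() * yi for ai, yi in zip(a, y))) ** 2 / M ** 2

# the CBF spectrum shows strong power at the source DOAs, far above
# off-source angles; the closely spaced pair at -3 and +2 degrees merges
# under the wide CBF mainlobe, which is what CS/SBL resolve below
```

The merged pair at −3° and +2° is the feature that drives the RMSE differences between CBF and the sparse methods in the results that follow.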
Example Results
[Figure: a) spectra P (dB); b) histograms of DOA estimates at array SNR = 0 dB, with per-source RMSE — CS: 1.2, 9.8, 0.49°; CBF: 16, 20, 0.51°; SBL: 0.64, 0, 0.66°; c) DOA RMSE (°) vs array SNR (−5 to 20 dB) for RVM, SBL-EM, CS, exhaustive search, CBF, and MUSIC.]
Example RMSE Performance
[Figure: DOA RMSE (°) vs array SNR (−5 to 20 dB); legend: RVM-ML, SBL-EM, LASSO, exhaustive search, CBF, MUSIC.]
Example CPU Time
[Figure: CPU time (s) and DOA RMSE (°) vs number of snapshots (1–1000) for RVM-ML, RVM-ML1, and LASSO.]
Conclusions
• Sparse Bayesian Learning for complex-valued array data using evidence maximization.
• In examples it is ~50% faster than the SBL expectation-maximization (SBL-EM) approach.
• For multiple measurement vectors (snapshots) with stationary sources, the benefit of RVM-ML is pronounced:
– For each DOA it uses the hyperparameter γ_m as a proxy, with computational effort independent of the number of snapshots.
– Increasing the number of snapshots improves the RMSE.
– The RMSE performance of RVM and exhaustive search is equal in this example.
References
1. D. Malioutov, M. Cetin, A.S. Willsky. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process., 53(8):3010–3022, 2005.
2. A. Xenaki, P. Gerstoft, K. Mosegaard. Compressive beamforming. J. Acoust. Soc. Am., 136(1):260–271, 2014.
3. A. Xenaki, P. Gerstoft. Grid-free compressive beamforming. J. Acoust. Soc. Am., 137:1923–1935, 2015.
4. H.L. Van Trees. Optimum Array Processing, chapters 1–10. Wiley-Interscience, New York, 2002.
5. G.F. Edelmann, C.F. Gaumond. Beamforming using compressive sensing. J. Acoust. Soc. Am., 130(4):232–237, 2011.
6. C.F. Mecklenbräuker, P. Gerstoft, A. Panahi, M. Viberg. Sequential Bayesian sparse signal reconstruction using array data. IEEE Trans. Signal Process., 61(24):6344–6354, 2013.
7. S. Fortunati, R. Grasso, F. Gini, M.S. Greco, K. LePage. Single-snapshot DOA estimation by using compressed sensing. EURASIP J. Adv. Signal Process., 120(1):1–17, 2014.
8. D.P. Wipf, B.D. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. Signal Process., 55(7):3704–3716, 2007.
9. P. Gerstoft, A. Xenaki, C.F. Mecklenbräuker. Multiple and single snapshot compressive beamforming. J. Acoust. Soc. Am., 138(4):2003–2014, 2015.
10. P. Stoica, P. Babu. SPICE and LIKES: Two hyperparameter-free methods for sparse-parameter estimation. Signal Process., 92(7):1580–1590, 2012.
11. D.P. Wipf, B.D. Rao. Sparse Bayesian learning for basis selection. IEEE Trans. Signal Process., 52(8):2153–2164, 2004.
12. Z. Zhang, B.D. Rao. Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning. IEEE J. Sel. Topics Signal Process., 5(5):912–926, 2011.
13. M.E. Tipping. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learning Res., 1:211–244, 2001.
14. J.F. Böhme. Source-parameter estimation by approximate maximum likelihood and nonlinear regression. IEEE J. Ocean. Eng., 10(3):206–212, 1985.
15. J.F. Böhme. Estimation of spectral parameters of correlated signals in wavefields. Signal Processing, 11:329–337, 1986.
16. A.G. Jaffer. Maximum likelihood direction finding of stochastic sources: A separable solution. In IEEE Int. Conf. on Acoust., Speech, and Sig. Proc. (ICASSP-88), vol. 5, pp. 2893–2896, 1988.
17. P. Stoica, A. Nehorai. On the concentrated stochastic likelihood function in array processing. Circuits Syst. Signal Process., 14(5):669–674, 1995.
18. Z.-M. Liu, Z.-T. Huang, Y.-Y. Zhou. An efficient maximum likelihood method for direction-of-arrival estimation via sparse Bayesian learning. IEEE Trans. Wireless Commun., 11(10):1–11, Oct. 2012.
19. R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58(1):267–288, 1996.
20. A.P. Dempster, N.M. Laird, D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, pp. 1–38, 1977.