Sparse Bayesian Learning for Wavefields from Sensor Array Data
Christoph Mecklenbräuker and Peter Gerstoft
Plan
• MAP + LASSO path (3 slides)
• Performance simulation (6 slides)
• Three acoustic data sets (4 slides)
Gerstoft et al, JASA Oct 2015.
Y = AX + N
CS approach to geophysical data analysis
CS of earthquakes: Yao, GRL 2011; PNAS 2013
Sequential CS: Mecklenbräuker, TSP 2013
[Figure: sequential CS DOA estimates, DOA (deg, 0–180) vs time (1–50 snapshots); a) h0 = 0.5, b) h0 = 0.05.]
CS beamforming: Xenaki, JASA 2014, 2015; Gerstoft, JASA 2015
CS fathometer: Yardim, JASA 2014
CS sound speed estimation: Bianco, JASA 2016
CS matched field: Gemba, JASA 2016
DOA estimation with sensor arrays
Two plane waves, p_1(r,t) = x_1 e^{j(ωt − k_1 r)} and p_2(r,t) = x_2 e^{j(ωt − k_2 r)}, arrive from directions θ_1 and θ_2.

x ∈ C, θ ∈ [−90°, 90°]; k = −(2π/λ) sin θ, λ: wavelength

y_m = Σ_n x_n e^{j(2π/λ) r_m sin θ_n}
m ∈ [1, …, M]: sensor; n ∈ [1, …, N]: look direction

y = A_{M×N} x
y = [y_1, …, y_M]^T, x = [x_1, …, x_N]^T
A = [a_1, …, a_N]
a_n = (1/√M) [e^{j(2π/λ) r_1 sin θ_n}, …, e^{j(2π/λ) r_M sin θ_n}]^T

A. Xenaki (SIO/DTU), Compressive beamforming, JASA 2014. UA 2014, slide 3/16.
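The linear model above can be sketched in plain Python. This is a minimal, hypothetical setup (M = 8 sensors on a half-wavelength ULA with r_m = m·d and a 1° look-direction grid), not the authors' code:

```python
import cmath
import math

M, N = 8, 181                 # sensors, look directions on a 1-degree grid
d_over_lam = 0.5              # assumed lambda/2 element spacing, r_m = m*d

def steering(theta_rad):
    # a_n = (1/sqrt(M)) [exp(j 2*pi (d/lambda) m sin(theta))]_{m=0..M-1}
    return [cmath.exp(1j * 2 * math.pi * d_over_lam * m * math.sin(theta_rad))
            / math.sqrt(M) for m in range(M)]

# sensing matrix A, stored as N steering vectors (columns) of length M
A = [steering(math.radians(-90 + n)) for n in range(N)]

# one unit-amplitude source at broadside (theta = 0, grid index 90)
x = [0j] * N
x[90] = 1.0 + 0j

# forward model y = A x, i.e. y_m = sum_n A[n][m] x_n
y = [sum(A[n][m] * x[n] for n in range(N)) for m in range(M)]
```

At broadside sin θ = 0, so every sensor sees the same value 1/√M, which is a quick sanity check on the steering-vector normalization.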
Beamforming vs compressive sensing

y = A_{M×N} x + n, M < N
n ∈ C^M, SNR = 20 log_10(‖Ax‖_2/‖n‖_2), ‖n‖_2 ≤ ε

Beamforming: simplified ℓ2-norm minimization (A A^H = I_M)
x̂ = A^H y = A^H A x + A^H n

Compressive sensing: ℓ1-norm minimization
x̂ = argmin_{x ∈ C^N} ‖x‖_1 subject to ‖Ax − y‖_2 ≤ ε

ULA: M = 8, d/λ = 1/2, [θ_1, θ_2] = [0, 5]°, SNR = 20 dB
[Figure: beam spectra P (dB re max) vs θ ∈ [−90°, 90°] for the two methods.]

A. Xenaki (SIO/DTU), Compressive beamforming, JASA 2014. UA 2014, slide 7/16.
CBF: x̂ = A^H y        CS: min_x ‖Ax − y‖_2^2 + μ‖x‖_1, with N ≫ M
High resolution, no sidelobes.
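The CBF side of this comparison (x̂ = A^H y) can be sketched in plain Python. The setup is hypothetical (one noise-free unit source at +20° on an 8-element λ/2 ULA); the CS side needs an ℓ1 solver such as CVX and is omitted here:

```python
import cmath
import math

M = 8            # sensors
d = 0.5          # element spacing in wavelengths (assumed)

def steer(theta_deg):
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * 2 * math.pi * d * m * s) / math.sqrt(M) for m in range(M)]

# noise-free measurement from a single unit source at +20 degrees
y = steer(20.0)

# conventional beamformer spectrum P(theta) = |a(theta)^H y|^2
grid = list(range(-90, 91))
P = [abs(sum(a.conjugate() * ym for a, ym in zip(steer(t), y))) ** 2 for t in grid]
best = grid[P.index(max(P))]   # peak of the CBF spectrum, at the true DOA
```

By Cauchy–Schwarz the spectrum peaks exactly at the true direction for noise-free data; the wide mainlobe and sidelobes around that peak are what CS removes.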
MAP

Likelihood (noise complex Gaussian): p(y|x) ∝ exp(−‖Ax − y‖_2^2/σ^2)

Prior (Laplacian): p(x) ∝ exp(−‖x‖_1/ν)

Bayes rule: p(x|y) ∝ p(y|x) p(x) ∝ exp(−‖Ax − y‖_2^2/σ^2 − ‖x‖_1/ν)
solutions. The choice of the (unconstrained) LASSO formulation (8) over the constrained formulation (7) allows the sparse reconstruction method to be interpreted in a statistical Bayesian setting, where the unknowns x and the observations y are both treated as stochastic (random) processes, by imposing a prior distribution on the solution vector x which promotes sparsity [14–16].
The Bayes theorem [32] connects the posterior distribution p(x|y) of the model parameters x conditioned on the data y with the data likelihood p(y|x), the prior distribution of the model parameters p(x), and the marginal distribution of the data p(y),

p(x|y) = p(y|x) p(x)/p(y).  (9)
From the Bayes rule (9), the maximum a posteriori (MAP) estimate is

x̂_MAP = argmax_x ln p(x|y) = argmax_x [ln p(y|x) + ln p(x)] = argmin_x [−ln p(y|x) − ln p(x)],  (10)

where the marginal distribution of the data p(y) is omitted since it is independent of the model x.
Based on a complex Gaussian noise model with i.i.d. real and imaginary parts, n ~ CN(0, σ^2 I), the likelihood of the data is also complex Gaussian distributed, p(y|x) ~ CN(Ax, σ^2 I),

p(y|x) ∝ e^{−‖y − Ax‖_2^2/σ^2}.  (11)
Assuming that the coefficients of the solution vector x have i.i.d. Laplace (i.e., double exponential) priors [33],

p(x) ∝ ∏_{i=1}^N e^{−|x_i|/ν} = e^{−‖x‖_1/ν},  (12)
the LASSO estimate (8) can be interpreted as the maximum a posteriori (MAP) estimate,

x̂_MAP = argmin_x [‖y − Ax‖_2^2 + μ‖x‖_1] = x̂_LASSO(μ),  (13)

where μ = σ^2/ν. The Laplace prior distribution encourages sparse solutions with many zero components since it concentrates more mass near 0 than in the tails. Therefore, the model selected by the LASSO optimization algorithm has the highest posterior probability under the Bayesian framework.
V. REGULARIZATION PARAMETER SELECTION

The choice of the regularization parameter μ in (8), also called the LASSO shrinkage parameter, is crucial, as it controls the balance between the degree of sparsity of the estimated solution and the data fit, determining the quality of the reconstruction.

For large μ, the solution is very sparse (with small ℓ1-norm) but the data fit is poor. As μ decreases towards zero, the data fit gradually improves since the corresponding solutions become less sparse. Note that for μ = 0 the solution (8) becomes the unconstrained least squares solution.
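For a unitary sensing matrix (A^H A = I) the LASSO minimizer is available in closed form as complex soft thresholding of b = A^H y at μ/2, which makes the sparsity-vs-fit trade-off easy to see. A sketch with hypothetical coefficient values:

```python
def soft_threshold(b, mu):
    # closed-form LASSO minimizer of ||b - x||_2^2 + mu*||x||_1 (A^H A = I case):
    # each coefficient shrinks toward zero by mu/2; small ones are zeroed
    return [bi * max(0.0, 1.0 - mu / (2.0 * abs(bi))) if abs(bi) > 0 else 0j
            for bi in b]

b = [3.0 + 4.0j, 0.4 + 0j, 0.1j]   # hypothetical beamformer output A^H y

sparsity = [sum(1 for xi in soft_threshold(b, mu) if abs(xi) > 0)
            for mu in (0.0, 0.5, 1.0, 20.0)]
# increasing mu leaves fewer active components: sparsity == [3, 2, 1, 0];
# mu = 0 returns b unchanged (the least squares solution)
```

Coefficients with |b_i| ≤ μ/2 are set exactly to zero, which is the mechanism behind the sparsity/data-fit balance described above.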
A. The LASSO path

As the regularization parameter μ evolves from ∞ to 0, the LASSO solution (8) changes continuously, following a piecewise smooth trajectory referred to as the solution path or the LASSO path [18,19,34]. In this section, we show that the singularity points in the LASSO path are associated with a change in the degree of sparsity of the solution and can be used to indicate a proper value for μ.

We obtain the full solution path using convex optimization to solve (8) iteratively for different values of μ. We use the cvx toolbox for disciplined convex optimization, which is available in the Matlab environment. It uses interior point solvers to obtain the global solution of a well-defined optimization problem [17,28,29].
Let L(x, μ) denote the objective function in (8),

L(x, μ) = ‖y − Ax‖_2^2 + μ‖x‖_1.  (14)

The value x̂ minimizing (14) is found by differentiation,

g(μ) = inf_{x ∈ C^N} L(x, μ),
∂_x L(x, μ) = 2A^H(Ax − y) + μ ∂_x‖x‖_1,  (15)
where the subdifferential operator ∂_x is a generalization of the partial differential operator for functions that are not differentiable everywhere (Ref. [29], p. 338). The subgradient for the ℓ1-norm is the set of vectors defined as

∂_x‖x‖_1 = {s : ‖s‖_∞ ≤ 1, s^H x = ‖x‖_1},  (16)

which implies

s_i = x_i/|x_i| for x_i ≠ 0,  |s_i| < 1 for x_i = 0,  (17)

i.e., for every active element x_i ≠ 0 of the vector x ∈ C^N, the corresponding element of the subgradient is a unit vector in the direction of x_i. For every null element x_i = 0 the corresponding element of the subgradient has amplitude less than unity. Thus, the amplitude of the subgradient is uniformly bounded by unity, ‖s‖_∞ ≤ 1. Denote
r = 2A^H(y − Ax̂),  (18)

the beamformed residual vector for the estimated solution x̂. The minimum of (15) is attained if

0 ∈ ∂_x L(x, μ)  ⇒  r ∈ μ ∂_x‖x‖_1.  (19)

Then, from (17) and (19), the coefficients r_i = 2a_i^H(y − Ax̂) of the beamformed residual vector r ∈ C^N have amplitudes such that

|r_i| = μ for x̂_i ≠ 0,  |r_i| < μ for x̂_i = 0.  (20)
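Condition (20) can be verified numerically. The sketch below uses the identity sensing matrix A = I purely as an illustration (so the LASSO solution is soft thresholding and r = 2(y − x̂)); the data vector is hypothetical:

```python
def soft_threshold(b, mu):
    # LASSO solution for A = I: complex soft thresholding at mu/2
    return [bi * max(0.0, 1.0 - mu / (2.0 * abs(bi))) if abs(bi) > 0 else 0j
            for bi in b]

mu = 1.0
y = [3.0 + 4.0j, 0.4 + 0j, 0.1j]       # hypothetical data
x_hat = soft_threshold(y, mu)

# beamformed residual r = 2 A^H (y - A x_hat), here with A = I
r = [2.0 * (yi - xi) for yi, xi in zip(y, x_hat)]

on_support = [abs(ri) for ri, xi in zip(r, x_hat) if abs(xi) > 0]
off_support = [abs(ri) for ri, xi in zip(r, x_hat) if abs(xi) == 0]
# |r_i| equals mu on the support and stays strictly below mu off it
```

This is exactly the behavior (20) predicts: the residual amplitude saturates at μ on active components and is subcritical elsewhere.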
CS: LASSO for multiple snapshots

Y = [y_1, …, y_L] ∈ C^{M×L}: data
A = [a_1, …, a_N] ∈ C^{M×N}: sensing matrix
X = [x_1, …, x_L] ∈ C^{N×L}: source amplitudes

Data fit ‖Y − AX‖_F^2 has the least squares solution X = A^H(A A^H)^{−1} Y ≈ A^H Y → a new solution for every snapshot.

Conventional beamforming: x(θ_m) = (1/L) a_m^H (Y Y^H) a_m → one source magnitude for all snapshots.

Row-sparsity constraint: ‖X‖_{2,1} = Σ_{n=1}^N ‖x^n‖_2, with ‖x^n‖_2 = (Σ_{l=1}^L |x_{nl}|^2)^{1/2}

CS: X̂ = argmin_{X ∈ C^{N×L}} ‖Y − AX‖_F^2 + μ‖X‖_{2,1}
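The conventional multi-snapshot beamformer x(θ_m) = (1/L) a_m^H (Y Y^H) a_m can be sketched directly, using the identity a^H (Y Y^H) a = Σ_l |a^H y_l|^2. The data here are hypothetical (two snapshots of one broadside source on a 4-element λ/2 array):

```python
import cmath
import math

M, L = 4, 2       # sensors, snapshots (toy sizes)
d = 0.5           # assumed lambda/2 spacing

def steer(theta_deg):
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * 2 * math.pi * d * m * s) / math.sqrt(M) for m in range(M)]

# Y as a list of L snapshot vectors; one broadside source whose complex
# amplitude changes phase between snapshots (power stays the same)
Y = [[a * p for a in steer(0.0)] for p in (1.0 + 0j, cmath.exp(1j * 0.7))]

def cbf_power(theta_deg):
    # (1/L) a^H (Y Y^H) a  ==  (1/L) sum_l |a^H y_l|^2
    a = steer(theta_deg)
    return sum(abs(sum(ai.conjugate() * yi for ai, yi in zip(a, y_l))) ** 2
               for y_l in Y) / L
```

Averaging over snapshots gives one power estimate per look direction, which is the "one source magnitude for all snapshots" property noted on the slide.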
[Figure: CS DOA estimates (degrees, −80 to 80) vs snapshot index (1–50), power scale 0–18 dB.]
The complex amplitudes in X are allowed to vary across snapshots, but the sparsity pattern is assumed to be constant across snapshots.
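The row-sparsity norm ‖X‖_{2,1} that enforces this shared sparsity pattern is a one-liner. The 3×2 matrix below is hypothetical (rows indexed by look direction, columns by snapshot):

```python
import math

def mixed_norm_21(X):
    # ||X||_{2,1}: l2 norm over snapshots within each row, summed over rows,
    # so a whole row (one look direction) is switched on or off together
    return sum(math.sqrt(sum(abs(x) ** 2 for x in row)) for row in X)

X = [[3.0, 4.0],        # active direction, row l2 norm 5
     [0.0, 0.0],        # inactive direction
     [1.0j, 0.0]]       # active direction, row l2 norm 1
# mixed_norm_21(X) -> 6.0
```

Because the ℓ1 sum acts on row norms rather than individual entries, penalizing it zeroes entire rows, i.e. whole look directions across all snapshots.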
Problem with Degrees of Freedom
• As the number of snapshots (= observations) increases, so does the number of unknown complex source amplitudes.
• PROBLEM: LASSO for multiple snapshots estimates the realizations of the random complex source amplitudes.
• However, we would be satisfied if we just estimated their power γ_m = E{|x_ml|^2}.
• Note that γ_m does not depend on the snapshot index l.
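The power γ_m = E{|x_ml|^2} is naturally estimated by averaging over snapshots, which is the snapshot-independent quantity the Bayesian formulation works with. A minimal sketch with hypothetical realizations:

```python
def estimate_gamma(X):
    # sample estimate of gamma_m = E{|x_ml|^2}, averaging over the L snapshots
    return [sum(abs(x) ** 2 for x in row) / len(row) for row in X]

# amplitudes vary per snapshot, but the per-direction power does not
X = [[1.0 + 1.0j, 1.0 - 1.0j],   # |x|^2 = 2 in both snapshots
     [0.0, 0.0]]                 # silent direction
gamma = estimate_gamma(X)
# gamma is approximately [2.0, 0.0]
```

Only one real parameter per look direction remains, no matter how many snapshots are observed, which is the point of the slide above.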
The problem revisited
Multiple snapshots: l = 1, …, L
Likelihood function (observations conditioned on source amplitudes):
p(Y|X) ∝ ∏_{l=1}^L exp(−‖y_l − A x_l‖_2^2/σ^2)

Sparsity promoted by Gaussian prior?
[Figure: zero-mean prior densities for a = 0.125, 0.25, 0.5; smaller a concentrates more mass near zero.]
Proceeding with Bayes rule
[Posterior equation shown as slide graphic.]

Evidence
To determine the hyperparameters γ_1, γ_2, …, γ_M, and σ^2, the evidence is maximized. The evidence is the product of the likelihood and the prior, integrated over the complex source signals.
Maximizing the Evidence: Covariance Fitting
[Derivation shown as slide graphic; the derivative of the evidence with respect to the hyperparameters is set to 0.]

Exploiting Jaffer's necessary condition:
A.G. Jaffer. Maximum likelihood direction finding of stochastic sources: A separable solution. In IEEE Int. Conf. on Acoust., Speech, and Sig. Proc. (ICASSP-88), vol. 5, pp. 2893–2896, 1988.
Sparse Bayesian Learning Algorithm
[Algorithm steps shown as slide graphics.]

D.P. Wipf, B.D. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. Signal Process., 55(7):3704–3716, 2007.
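A common SBL-style fixed-point update multiplies each γ_m by a ratio of quadratic forms in Σ_y^{−1} = (A Γ A^H + σ^2 I)^{−1}. In the toy case A = I (each look direction mapping to its own sensor, so Σ_y is diagonal) that update reduces to γ_m ← γ_m S_mm/(γ_m + σ^2), whose fixed point is max(0, S_mm − σ^2). This reduction and the numbers below are illustrative assumptions, not the paper's algorithm verbatim:

```python
def sbl_gamma_update(gamma, s_diag, sigma2, iters=200):
    # SBL-style fixed-point iteration reduced to A = I (diagonal Sigma_y):
    # gamma_m <- gamma_m * S_mm / (gamma_m + sigma2)
    g = list(gamma)
    for _ in range(iters):
        g = [gm * sm / (gm + sigma2) for gm, sm in zip(g, s_diag)]
    return g

s_diag = [5.0, 0.1]          # diagonal of the sample covariance S_y
sigma2 = 0.5                 # assumed known noise power
g = sbl_gamma_update([1.0, 1.0], s_diag, sigma2)
# the strong bin converges toward 4.5 (= 5.0 - 0.5); the weak bin is
# driven to zero, i.e. that look direction is pruned from the model
```

The pruning of directions whose sample power falls below the noise floor is what makes the γ_m parameterization sparsity-promoting even though the per-direction prior is Gaussian.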
Example Scenario
In the simulation, we consider an array with N = 20 antenna elements and inter-sensor spacing d = λ/2. The DOAs of the plane-wave arrivals are assumed to lie on a fine angular grid, [−90:0.5:90]°, and L = 50 snapshots are observed. The CS solution is found using LASSO extended to multiple measurement vectors (multiple snapshots). There are 3 sources at DOAs [−3, 2, 75]° with magnitudes [4, 13, 10].
N = 20 elements
Source 1: DOA = −3°, magnitude 4
Source 2: DOA = +2°, magnitude 13
Source 3: DOA = +75°, magnitude 10
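The scenario can be synthesized and sanity-checked with the conventional beamformer in a few lines. This is a sketch (noise-free, single snapshot, plain Python), not the paper's simulation code:

```python
import cmath
import math

M = 20                        # elements, assumed lambda/2 spacing
d = 0.5
doas = [-3.0, 2.0, 75.0]      # degrees
mags = [4.0, 13.0, 10.0]

def steer(theta_deg):
    s = math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * 2 * math.pi * d * m * s) for m in range(M)]

# one noise-free snapshot: superposition of the three plane waves
y = [sum(mag * steer(doa)[m] for doa, mag in zip(doas, mags)) for m in range(M)]

def cbf(theta_deg):
    # normalized CBF power |a(theta)^H y|^2 / M^2
    a = steer(theta_deg)
    return abs(sum(ai.conjugate() * yi for ai, yi in zip(a, y))) ** 2 / M ** 2

# the CBF spectrum shows strong power at the source DOAs, far above
# off-source angles; the closely spaced pair at -3 and +2 degrees merges
# under the wide CBF mainlobe, which is what CS/SBL resolve below
```

The merged pair at −3° and +2° is the feature that drives the RMSE differences between CBF and the sparse methods in the results that follow.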
Example Results
[Figure: a) spectra P (dB); b) histograms of DOA estimates at array SNR = 0 dB, with per-source RMSE — CS: 1.2, 9.8, 0.49°; CBF: 16, 20, 0.51°; SBL: 0.64, 0, 0.66°; c) DOA RMSE (°) vs array SNR (−5 to 20 dB) for RVM, SBL-EM, CS, exhaustive search, CBF, and MUSIC.]
Example RMSE Performance
[Figure: DOA RMSE (°) vs array SNR (−5 to 20 dB); legend: RVM-ML, SBL-EM, LASSO, exhaustive search, CBF, MUSIC.]
Example CPU Time
[Figure: CPU time (s) and DOA RMSE (°) vs number of snapshots (1–1000) for RVM-ML, RVM-ML1, and LASSO.]
Conclusions
• Sparse Bayesian Learning for complex-valued array data using evidence maximization.
• In examples it is ~50% faster than the SBL expectation-maximization (SBL-EM) approach.
• For multiple measurement vectors (snapshots) with stationary sources, the benefit of RVM-ML is pronounced:
– For each DOA it uses the hyperparameter γ_m as a proxy, with computational effort independent of the number of snapshots.
– Increasing the number of snapshots improves the RMSE.
– The RMSE performance of RVM and exhaustive search is equal in this example.
References
1. D. Malioutov, M. Cetin, A.S. Willsky. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process., 53(8):3010–3022, 2005.
2. A. Xenaki, P. Gerstoft, K. Mosegaard. Compressive beamforming. J. Acoust. Soc. Am., 136(1):260–271, 2014.
3. A. Xenaki, P. Gerstoft. Grid-free compressive beamforming. J. Acoust. Soc. Am., 137:1923–1935, 2015.
4. H.L. Van Trees. Optimum Array Processing, chapters 1–10. Wiley-Interscience, New York, 2002.
5. G.F. Edelmann, C.F. Gaumond. Beamforming using compressive sensing. J. Acoust. Soc. Am., 130(4):232–237, 2011.
6. C.F. Mecklenbräuker, P. Gerstoft, A. Panahi, M. Viberg. Sequential Bayesian sparse signal reconstruction using array data. IEEE Trans. Signal Process., 61(24):6344–6354, 2013.
7. S. Fortunati, R. Grasso, F. Gini, M.S. Greco, K. LePage. Single-snapshot DOA estimation by using compressed sensing. EURASIP J. Adv. Signal Process., 120(1):1–17, 2014.
8. D.P. Wipf, B.D. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. Signal Process., 55(7):3704–3716, 2007.
9. P. Gerstoft, A. Xenaki, C.F. Mecklenbräuker. Multiple and single snapshot compressive beamforming. J. Acoust. Soc. Am., 138(4):2003–2014, 2015.
10. P. Stoica, P. Babu. SPICE and LIKES: Two hyperparameter-free methods for sparse-parameter estimation. Signal Process., 92(7):1580–1590, 2012.
11. D.P. Wipf, B.D. Rao. Sparse Bayesian learning for basis selection. IEEE Trans. Signal Process., 52(8):2153–2164, 2004.
12. Z. Zhang, B.D. Rao. Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning. IEEE J. Sel. Topics Signal Process., 5(5):912–926, 2011.
13. M.E. Tipping. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learning Res., 1:211–244, 2001.
14. J.F. Böhme. Source-parameter estimation by approximate maximum likelihood and nonlinear regression. IEEE J. Ocean. Eng., 10(3):206–212, 1985.
15. J.F. Böhme. Estimation of spectral parameters of correlated signals in wavefields. Signal Processing, 11:329–337, 1986.
16. A.G. Jaffer. Maximum likelihood direction finding of stochastic sources: A separable solution. In IEEE Int. Conf. on Acoust., Speech, and Sig. Proc. (ICASSP-88), vol. 5, pp. 2893–2896, 1988.
17. P. Stoica, A. Nehorai. On the concentrated stochastic likelihood function in array processing. Circuits Syst. Signal Process., 14(5):669–674, 1995.
18. Z.-M. Liu, Z.-T. Huang, Y.-Y. Zhou. An efficient maximum likelihood method for direction-of-arrival estimation via sparse Bayesian learning. IEEE Trans. Wireless Commun., 11(10):1–11, Oct. 2012.
19. R. Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58(1):267–288, 1996.
20. A.P. Dempster, N.M. Laird, D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B, pp. 1–38, 1977.