
Digital Signal Processing 23 (2013) 506–513

Contents lists available at SciVerse ScienceDirect

Digital Signal Processing

www.elsevier.com/locate/dsp

Statistical modeling and denoising Wigner–Ville distribution

Maryam Amirmazlaghani a,∗, Hamidreza Amindavar b

a Amirkabir University of Technology, Department of Computer Engineering and Information Technology, Tehran, Iran
b Amirkabir University of Technology, Department of Electrical Engineering, Tehran, Iran

Article info

Article history: Available online 3 September 2012

Keywords: Statistical modeling; 2-D GARCH model; MAP estimation; Wigner–Ville distribution

Abstract

Studying the properties of the Wigner–Ville distribution (WVD) and its smoothed versions, such as the smoothed pseudo-WVD (SPWVD), we demonstrate that they have significantly non-Gaussian statistics. We also investigate the presence of two-dimensional heteroscedasticity in them for different signals by employing the Lagrange multiplier (LM) procedure. We therefore employ a heteroscedastic model, the two-dimensional generalized autoregressive conditional heteroscedastic (2-D GARCH) model, for statistical modeling of these distributions. This modeling captures characteristics of the WVD and SPWVD such as heavy-tailed marginal distributions and the dependencies among coefficients. Since the performance of the WVD and its smoothed versions degrades in the presence of additive noise, we design a novel Bayesian estimator of the clean distributions based on GARCH modeling. Moreover, estimating the instantaneous frequency (IF) curves of signals in the presence of noise based on the WVD and its smoothed versions is an interesting topic in the radar domain, so we apply the denoised distributions to IF estimation. Experimental results demonstrate the efficiency of the proposed method in denoising the WVD and SPWVD, and the performance improvement in IF estimation obtained with the denoised distributions.

© 2012 Elsevier Inc. All rights reserved.

* Corresponding author. E-mail addresses: [email protected] (M. Amirmazlaghani), [email protected] (H. Amindavar).

1051-2004/$ – see front matter © 2012 Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.dsp.2012.08.016

1. Introduction

Almost all physical signals are obtained by receivers recording variations with time; hence, the time representation is usually the first (and most natural) description of a signal. The Fourier transform has become one of the most widely used tools in real-time signal analysis. While it is very useful for stationary signals, it is not adapted to the analysis of non-stationary signals: it tells us which frequencies are contained in the signal, together with their amplitudes and phases, but not at which times these frequencies occur. Many signals encountered in real-world situations are non-stationary, and their frequency content changes over time. The Fourier transform, being a one-dimensional representation, is then insufficient, and one has to consider bi-dimensional functions, i.e., functions of at least two variables such as time and frequency. A first class of such time–frequency representations is the atomic decompositions, also known as linear time–frequency representations (e.g., the short-time Fourier transform, the Gabor expansion, and the wavelet transform). These decompose the signal on a basis of elementary signals (the atoms), which have to be well localized in time and in frequency. Another class of time–frequency representations is the energy distributions, also known as bilinear time–frequency representations. The purpose of energy distributions is to distribute the energy of the signal over the two description variables, time and frequency. This concept has proved useful in a wide range of applications, including radar image processing, biomedical signal processing, speech processing, communications, power analysis, geophysics, and acoustics [1–4]. Bilinear time–frequency representations include a number of distributions, the most common being the Wigner–Ville distribution (WVD) [5,6], because of its simplicity and its better characterization of the signal's time-dependent spectrum than the STFT spectrogram (the squared magnitude of the short-time Fourier transform) and the scalogram (the squared magnitude of the wavelet transform). Given a non-stationary random process x(t), its WVD is defined as

$$W_x(t,\nu) = \int_{-\infty}^{+\infty} x(t+\tau/2)\,x^*(t-\tau/2)\,e^{-j2\pi\nu\tau}\,d\tau. \tag{1}$$
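To make (1) concrete, a minimal discrete sketch follows, written in Python with NumPy (our choice; the paper gives no code). It assumes a complex-valued (for example analytic) input and the common lag convention x[n+m] x*[n−m], so frequency bin k corresponds to k/(2N) cycles per sample; the function name `discrete_wvd` is hypothetical.

```python
import numpy as np


def discrete_wvd(x):
    """Discrete Wigner-Ville distribution of a complex (e.g. analytic) signal.

    Row n is the time sample; column k is frequency bin k / (2N) cycles per
    sample, because the lag product x[n+m] x*[n-m] spans tau = 2m in (1).
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        # Largest lag that stays inside the signal and does not wrap around the DFT
        mmax = min(n, N - 1 - n, N // 2 - 1)
        m = np.arange(-mmax, mmax + 1)
        kernel = np.zeros(N, dtype=complex)
        kernel[m % N] = x[n + m] * np.conj(x[n - m])
        # The kernel is Hermitian in m, so its DFT is real up to rounding error
        W[n] = np.fft.fft(kernel).real
    return W
```

For a complex exponential at 0.125 cycles per sample and N = 64, the row at n = 32 peaks at bin 16 (= 2 · 0.125 · 64), and every row sums to N|x[n]|², a discrete form of the energy property.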

The WVD gives high resolution in the time–frequency domain and has many other desirable properties, such as energy conservation, translation covariance, dilation covariance, and satisfaction of the marginal properties, among others. Despite these properties, since the WVD is a bilinear function of the signal x, the quadratic superposition principle applies:

$$W_{x+n}(t,\nu) = W_x(t,\nu) + W_n(t,\nu) + 2\Re\{W_{x,n}(t,\nu)\}, \tag{2}$$


where

$$W_{x,n}(t,\nu) = \int_{-\infty}^{+\infty} x(t+\tau/2)\,n^*(t-\tau/2)\,e^{-j2\pi\nu\tau}\,d\tau \tag{3}$$

is the cross-WVD of x and n. This can easily be generalized to N components, but for clarity we only consider the two-component case. The WVD cross term $W_{x,n}(t,\nu)$ is non-zero regardless of the time–frequency distance between the two signal terms. Therefore, if n is additive noise, the bilinear nature of the WVD accentuates the effect of the noise. The noise and cross terms ($W_n(t,\nu) + 2\Re\{W_{x,n}(t,\nu)\}$) are troublesome because they may overlap with the auto-terms (the signal terms $W_x(t,\nu)$) and thus make the WVD image difficult to interpret visually. Some methods have been proposed to overcome this problem [7,8]. The effects of cross terms can be suppressed in smoothed versions of the Wigner–Ville distribution, such as the Choi–Williams distribution (CWD), the pseudo-Wigner–Ville distribution (PWVD), and the smoothed pseudo-Wigner–Ville distribution (SPWVD) [9]. Although these smoothed versions can suppress the noise effect, they still contain considerable noise, and the noise terms limit the use of the WVD and its smoothed versions in many applications at low SNR.
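The quadratic superposition principle (2) also holds exactly for a discrete cross-WVD, which can be checked numerically. The sketch below uses a hypothetical helper `cross_wvd` with the common discrete lag convention x[n+m] y*[n−m] (an assumption, not the paper's code) and compares both sides of (2) for a chirp plus complex white noise.

```python
import numpy as np


def cross_wvd(x, y):
    """Discrete cross-WVD W_{x,y}[n, k]: DFT over lag m of x[n+m] * conj(y[n-m])."""
    x = np.asarray(x, dtype=complex)
    y = np.asarray(y, dtype=complex)
    N = len(x)
    W = np.zeros((N, N), dtype=complex)
    for n in range(N):
        mmax = min(n, N - 1 - n, N // 2 - 1)   # same lag range for every pair of signals
        m = np.arange(-mmax, mmax + 1)
        kernel = np.zeros(N, dtype=complex)
        kernel[m % N] = x[n + m] * np.conj(y[n - m])
        W[n] = np.fft.fft(kernel)
    return W


rng = np.random.default_rng(0)
N = 64
t = np.arange(N)
x = np.exp(2j * np.pi * (0.05 * t + 0.002 * t ** 2))               # a chirp
noise = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

lhs = cross_wvd(x + noise, x + noise)
rhs = cross_wvd(x, x) + cross_wvd(noise, noise) + 2 * np.real(cross_wvd(x, noise))
print(np.allclose(lhs, rhs))   # True: W_{n,x} is the complex conjugate of W_{x,n}
```

The check works because swapping the two signals conjugates the cross-WVD, so the two cross terms add up to twice the real part.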

To overcome this problem, we propose a novel approach for reducing the effect of the noise and cross terms in the WVD of a noisy signal. To simplify the explanations, in the following the combined effect of the noise and the cross terms, i.e., $W_n(t,\nu) + 2\Re\{W_{x,n}(t,\nu)\}$, is called the noise–cross term. We consider the digitized WVD and SPWVD. We first study the statistical behavior of the WVD and SPWVD for different signals and demonstrate through extensive modeling that the WVD/SPWVD of different signals have significantly non-Gaussian statistics. In this paper, we propose using 2-D GARCH, a heteroscedastic model, for statistical modeling of the WVD/SPWVD. The one-dimensional GARCH model [10] is widely used for modeling financial time series. Extending the one-dimensional GARCH model into two dimensions has been discussed in [11,12], and a multidimensional GARCH model has been introduced in [13]. GARCH modeling of the WVD/SPWVD provides a location-dependent conditional variance. The 2-D GARCH model is capable of taking into account important statistical characteristics of the WVD/SPWVD for different signals, as will be discussed in Section 2.

Using GARCH modeling, our goal is then to design a Bayesian estimator of the clean WVD/SPWVD coefficients. For a Bayesian estimation process to be successful, the correct choice of priors for the WVD/SPWVD of the noise-free signal ($W_x(t,\nu)$) and for the noise–cross terms ($W_n(t,\nu) + 2\Re\{W_{x,n}(t,\nu)\}$) is a very important factor. Therefore, while we use the 2-D GARCH model for the WVD/SPWVD of the noise-free signal, we must choose an appropriate prior for the noise–cross terms. Previously, the Gaussian distribution has been used for the noise–cross terms [14]. However, as mentioned in [15], the Gaussian distribution does not fit the data precisely. We study this fact and propose using a Gaussian mixture distribution for the noise–cross terms. After statistical modeling of the noise and signal components, we design a Bayesian estimator of the clean WVD/SPWVD coefficients whose formulation is quite different from that of the processor proposed in [14]. Consequently, the denoised WVD or denoised SPWVD can be used in many related applications. As an example, we use the denoised distributions to estimate the instantaneous frequency (IF) of signals [8,16,17].

This paper is organized as follows. In Section 2, we introduce the 2-D GARCH model, study the statistical behavior of the WVD/SPWVD, and show its compatibility with the 2-D GARCH model. In Section 3, we describe a new Bayesian method for denoising the WVD and SPWVD of noisy signals based on 2-D GARCH modeling. Section 4 presents the experimental results, including an important application of time–frequency distributions, IF estimation (Section 4.1). Finally, Section 5 concludes the study.

2. 2-D GARCH modeling of WVD/SPWVD

In this section, we first describe the 2-D GARCH model. Then, we demonstrate that the conditional variance of the WVD coefficients is non-constant; hence, we propose using the GARCH model, a heteroscedastic model, for these coefficients. Finally, we study the statistical characteristics of the WVD/SPWVD coefficients and verify their compatibility with the 2-D GARCH model.

2.1. 2-D GARCH model

The Generalized Autoregressive Conditional Heteroscedastic (GARCH) process allows the conditional variance to change over time, whereas conventional time series models operate under the assumption of constant variance [10,18]. Extending the one-dimensional GARCH model into two dimensions has been discussed in [11,12]. 2-D GARCH processes are zero-mean, serially uncorrelated processes with non-constant conditional variances. Consider a doubly indexed process $z_{ij}$ representing a two-dimensional stochastic process. This process follows a pure 2-D GARCH$(p_1, p_2, q_1, q_2)$ model if $E(z_{ij}) = 0$ and

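$$z_{ij} = \sqrt{h_{ij}}\,\varepsilon_{ij}, \tag{4}$$

$$h_{ij} = \alpha_0 + \sum_{(k,\ell)\in\Lambda_1} \alpha_{k\ell}\, z_{i-k,j-\ell}^2 + \sum_{(k,\ell)\in\Lambda_2} \beta_{k\ell}\, h_{i-k,j-\ell}, \tag{5}$$

where

$$\Lambda_1 = \{(k,\ell) \mid 0 \le k \le q_1,\ 0 \le \ell \le q_2,\ (k,\ell) \neq (0,0)\},$$

$$\Lambda_2 = \{(k,\ell) \mid 0 \le k \le p_1,\ 0 \le \ell \le p_2,\ (k,\ell) \neq (0,0)\},$$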

$h_{ij}$ denotes the conditional variance of $z_{ij}$, and $\varepsilon_{ij} \sim N(0,1)$ is an i.i.d. two-dimensional stochastic process. Let the information set $\psi_{ij}$ be defined as

$$\psi_{ij} = \left\{\{z_{i-k,j-\ell}\}_{(k,\ell)\in\Lambda_1},\ \{h_{i-k,j-\ell}\}_{(k,\ell)\in\Lambda_2}\right\};$$

$z_{ij}$ is therefore conditionally distributed as $z_{ij} \mid \psi_{ij} \sim N(0, h_{ij})$. If $E(z_{ij}) \neq 0$, we use the 2-D GARCH$(p_1,p_2,q_1,q_2)$ regression model [14]. This model is obtained by letting $y_{ij}$ be the innovation in a two-dimensional linear regression:

$$y_{ij} = z_{ij} - r_{ij}^T b, \tag{6}$$

where $r_{ij}$ is a vector of explanatory variables and b is a vector of unknown parameters. The model parameters, $\Gamma = \{\{\alpha_0, \alpha_{01}, \ldots, \alpha_{q_1q_2}, \beta_{01}, \ldots, \beta_{p_1p_2}\}, b\}$, can be estimated using maximum likelihood estimation (MLE) [12].
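As an illustration of (4)–(5), the sketch below simulates a 2-D GARCH(1,1,1,1) field and evaluates the Gaussian conditional log-likelihood that MLE would maximize. Several choices are our assumptions rather than the paper's: Λ1 = Λ2 = {(0,1), (1,0), (1,1)}, raster-scan order, a zero-mean field (no regression term b), and a one-pixel border where z = 0 and h equals the unconditional variance α0/(1 − Σα − Σβ). In practice the log-likelihood would be handed to a numerical optimizer; that step is omitted, and all function names are hypothetical.

```python
import numpy as np

# Neighbour offsets (k, l) assumed for 2-D GARCH(1,1,1,1): Lambda_1 = Lambda_2 = {(0,1), (1,0), (1,1)}.
# alpha and beta below are ordered like OFFSETS, i.e. (alpha_01, alpha_10, alpha_11).
OFFSETS = [(0, 1), (1, 0), (1, 1)]


def _h_bar(alpha0, alpha, beta):
    s = sum(alpha) + sum(beta)
    assert s < 1.0, "sketch assumes sum(alpha) + sum(beta) < 1"
    return alpha0 / (1.0 - s)          # unconditional variance, used on the border


def _h_update(zp, hp, i, j, alpha0, alpha, beta):
    # Recursion (5) at pixel (i, j) of the padded arrays
    return alpha0 + sum(a * zp[i - k, j - l] ** 2 + b * hp[i - k, j - l]
                        for (k, l), a, b in zip(OFFSETS, alpha, beta))


def simulate_garch2d(shape, alpha0, alpha, beta, seed=0):
    """Draw z_ij = sqrt(h_ij) * eps_ij, eps_ij ~ N(0, 1) iid, in raster-scan order."""
    rng = np.random.default_rng(seed)
    rows, cols = shape
    zp = np.zeros((rows + 1, cols + 1))
    hp = np.full((rows + 1, cols + 1), _h_bar(alpha0, alpha, beta))
    eps = rng.standard_normal(shape)
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            hp[i, j] = _h_update(zp, hp, i, j, alpha0, alpha, beta)
            zp[i, j] = np.sqrt(hp[i, j]) * eps[i - 1, j - 1]
    return zp[1:, 1:], hp[1:, 1:]


def conditional_variance(z, alpha0, alpha, beta):
    """Rebuild h_ij from an observed field z with the same recursion and border."""
    rows, cols = z.shape
    zp = np.zeros((rows + 1, cols + 1))
    zp[1:, 1:] = z
    hp = np.full((rows + 1, cols + 1), _h_bar(alpha0, alpha, beta))
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            hp[i, j] = _h_update(zp, hp, i, j, alpha0, alpha, beta)
    return hp[1:, 1:]


def log_likelihood(z, alpha0, alpha, beta):
    """Gaussian conditional log-likelihood, the objective that MLE would maximize."""
    h = conditional_variance(z, alpha0, alpha, beta)
    return -0.5 * np.sum(np.log(2 * np.pi * h) + z ** 2 / h)
```

Because h is rebuilt from z with the same recursion, the conditional variances of a simulated field are recovered exactly, which is a convenient sanity check before plugging the log-likelihood into an optimizer.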

2.2. Interpretation of using 2-D GARCH model for WVD/SPWVD

Here, we show that the conditional variance of the WVD is not constant, i.e., heteroscedasticity exists in the WVD coefficients. We suppose that the mean of the WVD is zero; if this assumption does not hold, the regression model (6), whose output is zero mean, can be used instead. To compute the variance of the WVD coefficients conditioned on their neighborhood ($\psi_{t\nu}$), as defined in the 2-D GARCH model, we start from (1), which gives


Fig. 1. Vertical bars show the normalized histogram of WVD/SPWVD coefficients. The best fitted Gaussian distribution is depicted in black solid lines and the histogram of the best fitted 2-D GARCH model is depicted in red dashed lines. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

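$$E\{|W(t,\nu)|^2 \mid \psi_{t\nu}\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} E\{x(t+\tau_1/2)\,x^*(t-\tau_1/2)\,x^*(t+\tau_2/2)\,x(t-\tau_2/2) \mid \psi_{t\nu}\}\, e^{-j2\pi\nu\tau_1}\, e^{j2\pi\nu\tau_2}\, d\tau_1\, d\tau_2. \tag{7}$$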

Evaluating this variance through (7) requires knowledge of the fourth-order moments of the process x(t); this information is generally not available and, moreover, is generally time-varying. The time–frequency dependence can be made explicit by expanding x(t) in an orthonormal basis $\{\phi_k\}$:

$$x(t) = \sum_k x_k\,\phi_k(t), \qquad \langle \phi_n(t), \phi_m^*(t) \rangle = \delta(n-m). \tag{8}$$

Substituting (8) into (7) gives the conditional variance:

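$$E\{|W(t,\nu)|^2 \mid \psi_{t\nu}\} = \sum_{k_1}\sum_{k_2}\sum_{k_3}\sum_{k_4} E\{x_{k_1}\, x_{k_2}^*\, x_{k_3}^*\, x_{k_4} \mid \psi_{t\nu}\} \left( \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \phi_{k_1}(t+\tau_1/2)\,\phi_{k_2}^*(t-\tau_1/2)\,\phi_{k_3}^*(t+\tau_2/2)\,\phi_{k_4}(t-\tau_2/2)\, e^{-j2\pi\nu\tau_1}\, e^{j2\pi\nu\tau_2}\, d\tau_1\, d\tau_2 \right). \tag{9}$$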

The double integral in (9) cannot be simplified further, and its dependence on time t and frequency ν remains intact; therefore, $E\{|W(t,\nu)|^2 \mid \psi_{t\nu}\}$ is time–frequency dependent, and W(t,ν) is a heteroscedastic process. The distribution of the Wigner–Ville coefficients cannot be obtained in general. Therefore, we propose to model these coefficients with a Gaussian distribution with non-constant variance. This model is sufficiently general to capture the heteroscedasticity of the coefficients. As described in Section 2.1, the GARCH model provides a conditionally Gaussian distribution with varying variance; hence, we are motivated to use the GARCH model for the WVD.

2.3. Statistical properties of WVD/SPWVD using 2-D GARCH model

Here, we study the statistical properties of the WVD/SPWVD coefficients experimentally. First, we assess whether the WVD/SPWVD coefficients of different noise-free signals deviate from the normal distribution. To determine this, we use histogram plots [11] and compute the kurtosis. Then, we determine whether the 2-D GARCH model provides a flexible and appropriate tool for modeling the coefficients within the framework of the WVD/SPWVD. We follow three approaches to check the compatibility between the 2-D GARCH model and the WVD/SPWVD coefficients: (i) comparing the normalized histogram of the WVD/SPWVD coefficients with that of the 2-D GARCH model, (ii) employing Engle's hypothesis test for the presence of ARCH/GARCH effects [18], and (iii) using conditional histograms to check whether the 2-D GARCH model correctly captures the dependencies between the WVD/SPWVD coefficients.

Histograms of the WVD/SPWVD of different non-stationary signals and the best fitted Gaussian probability density functions (black solid lines) are shown in Fig. 1. Compared to a Gaussian, the underlying density is more sharply peaked at zero, with heavier tails. To quantify these results, we use the sample kurtosis (the fourth moment divided by the square of the second moment). All of the estimated kurtoses, as reported in Fig. 1, are larger than three, the value expected for a Gaussian distribution. Therefore, the histograms of the WVD/SPWVD coefficients are not compatible with a Gaussian distribution. In the following, we study the compatibility between the WVD/SPWVD and the 2-D GARCH model.
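The sample kurtosis used above can be computed as follows. This is a minimal sketch that removes the sample mean first (for zero-mean coefficients it matches the definition above); the function name is illustrative.

```python
import numpy as np


def sample_kurtosis(c):
    """Fourth central moment divided by the squared second central moment (3 for a Gaussian)."""
    d = np.asarray(c, dtype=float).ravel()   # accepts a 1-D array or a whole 2-D TFD
    d = d - d.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2
```

Values well above 3 indicate the sharply peaked, heavy-tailed shape seen in Fig. 1; for comparison, a Laplace (double-exponential) distribution has a kurtosis of 6.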


As the first approach to examining the compatibility between the 2-D GARCH model and the WVD/SPWVD, we employ normalized histograms, which give a good indication of whether the 2-D GARCH model matches the data. Fig. 1, which depicts the histograms of the WVD/SPWVD of different signals, also shows the histogram of the corresponding 2-D GARCH model (red dashed lines). A close fit is observed for all the signals shown.

Next, we study the presence of heteroscedasticity in the WVD/SPWVD. Engle proposed the LM procedure [18], a formal test for the presence of ARCH effects. The test is based on the autocorrelation of the squared ordinary least squares residuals and tests the null hypothesis that no ARCH effects exist. Its test statistic is asymptotically chi-square distributed, and in its original form it applies to one-dimensional signals. In Appendix A, we introduce a new extension of this test to two dimensions. Also, comparing the structure of the conditional variance in the one-dimensional and 2-D GARCH models (where we usually use 2-D GARCH(1,1,1,1)), one can conclude that, to test for 2-D GARCH effects, the existence of GARCH effects in the horizontal, vertical, and diagonal scans should be checked. The results of applying this hypothesis test to the WVD/SPWVD of different non-stationary signals are presented in Table 1. In this table, “H” is a Boolean decision variable: “1” indicates rejection of the null hypothesis of no GARCH effects, and “0” indicates that the null hypothesis cannot be rejected. “pValue” is the p-value of the test, i.e., the smallest significance level at which the null hypothesis of no ARCH effects would be rejected, and “GARCHstat” is the ARCH test statistic. The significance level is 0.05 in our experiments. Table 1 shows that the null hypothesis is rejected for every scan of every tested signal, so two-dimensional heteroscedasticity exists in the WVD/SPWVD of these signals and can be captured by the 2-D GARCH model.

In the 2-D GARCH model, both the neighboring sample variances and the neighboring conditional variances contribute to the current conditional variance (5). As the third approach to checking the compatibility between the 2-D GARCH model and the WVD/SPWVD, we study whether this feature exists in the WVD/SPWVD of signals. To this end, we use conditional histograms [12]. We study the conditional histogram H(c|n) of the WVD/SPWVD coefficients c of signals, conditioned on n, one of the neighboring samples of c as defined in the 2-D GARCH model (5). The choice of n does not affect the conditional histogram considerably. We computed the conditional histograms of the WVD/SPWVD for different signals; the form of the histograms H(c|n) is surprisingly robust across a wide range of signals. Fig. 2(a) shows the conditional histogram of the WVD coefficients, H(c|n), for a sinusoidal FM signal.

This histogram illustrates several important aspects of the relationship between the two coefficients. First, the expected value of c is approximately zero for all values of n. Second, the variance of the conditional histogram of c depends on the value of n. Thus, c and n are statistically dependent. The structure of the relationship between c and n becomes more apparent after transforming to the log domain.

Fig. 2(b) shows the histogram of log(c²) conditioned on log(n²), the logarithm of the squared adjacent coefficient as defined in the 2-D GARCH model (5), for the sinusoidal FM signal. The form of this histogram, H(log(c²) | log(n²)), is also robust across the WVD/SPWVD of a wide range of signals. This conditional histogram is concentrated approximately along a straight line, which suggests that the conditional expectation E(c² | linear combination) is approximately proportional to the underlying linear combination of squared neighbors. These observations indicate that the relation between the neighboring sample variances and the neighboring conditional variances in the WVD/SPWVD coefficients of different signals is compatible with the relations underlying the 2-D GARCH model.
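A log-domain conditional histogram like Fig. 2(b) can be computed by binning (log n², log c²) pairs and normalizing each n-bin to unit mass. This is a sketch; the function name, the default horizontal neighbour offset, the bin count and the small constant `eps` that avoids log(0) are our assumptions.

```python
import numpy as np


def log_conditional_histogram(field, offset=(0, 1), bins=30, eps=1e-12):
    """H(log c^2 | log n^2): 2-D histogram whose rows (bins of log n^2) sum to 1."""
    field = np.asarray(field, dtype=float)
    k, l = offset
    rows, cols = field.shape
    c = field[k:, l:]                       # coefficient c at (i, j)
    n = field[:rows - k, :cols - l]         # neighbour at (i - k, j - l)
    H, n_edges, c_edges = np.histogram2d(np.log(n.ravel() ** 2 + eps),
                                         np.log(c.ravel() ** 2 + eps), bins=bins)
    row_mass = H.sum(axis=1, keepdims=True)
    # Normalize each conditioning bin; empty bins stay all-zero
    H = np.divide(H, row_mass, out=np.zeros_like(H), where=row_mass > 0)
    return H, n_edges, c_edges
```

Displaying `H` as an image (rows along the log n² axis) gives the kind of picture shown in Fig. 2(b); a band along a straight line points to the GARCH-type dependence discussed above.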

Table 1. Results of Engle's hypothesis test for the presence of ARCH/GARCH effects in the horizontal, vertical and diagonal scans of the WVD and SPWVD of different signals.

Signal          Distribution  Scan        H  pValue  GARCHstat
Linear FM       WVD           vertical    1  0       1.4332e+003
Linear FM       WVD           horizontal  1  0       1.0651e+004
Linear FM       WVD           diagonal    1  0       6.4581e+003
Linear FM       SPWVD         vertical    1  0       1.4678e+004
Linear FM       SPWVD         horizontal  1  0       1.6013e+004
Linear FM       SPWVD         diagonal    1  0       1.5575e+004
Quadratic FM    WVD           vertical    1  0       1.8366e+003
Quadratic FM    WVD           horizontal  1  0       1.5294e+004
Quadratic FM    WVD           diagonal    1  0       3.1869e+003
Quadratic FM    SPWVD         vertical    1  0       1.4541e+004
Quadratic FM    SPWVD         horizontal  1  0       1.6240e+004
Quadratic FM    SPWVD         diagonal    1  0       1.4895e+004
Cubic FM        WVD           vertical    1  0       1.6660e+003
Cubic FM        WVD           horizontal  1  0       1.6253e+004
Cubic FM        WVD           diagonal    1  0       1.9893e+003
Cubic FM        SPWVD         vertical    1  0       1.4522e+004
Cubic FM        SPWVD         horizontal  1  0       1.6317e+004
Cubic FM        SPWVD         diagonal    1  0       1.2138e+004
Sinusoidal FM   WVD           vertical    1  0       3.9554e+003
Sinusoidal FM   WVD           horizontal  1  0       7.0822e+003
Sinusoidal FM   WVD           diagonal    1  0       7.2565e+003
Sinusoidal FM   SPWVD         vertical    1  0       1.4899e+004
Sinusoidal FM   SPWVD         horizontal  1  0       1.5747e+004
Sinusoidal FM   SPWVD         diagonal    1  0       1.4319e+004
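The scan-wise test summarized in Table 1 can be sketched with Engle's one-dimensional LM statistic: regress the squared demeaned samples on a constant and q lagged squares, and compare T·R² with a χ²(q) distribution. The sketch applies it to the horizontal, vertical and diagonal scans of a 2-D array. It is not the two-dimensional extension of Appendix A; the lag order q, the demeaning step and the helper names are our assumptions, and the χ² tail uses a closed form valid for integer degrees of freedom so that no SciPy dependency is needed.

```python
import math

import numpy as np


def chi2_sf(x, k):
    """Upper tail P(chi2_k > x) for integer k, via the regularized upper incomplete gamma Q(k/2, x/2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Odd k starts from Q(1/2, y) = erfc(sqrt(y)); even k starts from 0 so the first step gives Q(1, y) = exp(-y)
    total, a = (0.0, 0.0) if k % 2 == 0 else (math.erfc(math.sqrt(y)), 0.5)
    while a < k / 2:
        # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return min(total, 1.0)


def engle_arch_lm(y, q=1):
    """Engle's LM test for ARCH(q) effects; returns (T * R^2, p-value)."""
    y = np.asarray(y, dtype=float)
    u = (y - y.mean()) ** 2                 # squared residuals of a mean-only OLS fit
    T = len(u) - q
    target = u[q:]
    lags = [u[q - l: len(u) - l] for l in range(1, q + 1)]
    X = np.column_stack([np.ones(T)] + lags)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
    stat = T * r2
    return stat, chi2_sf(stat, q)


def scan_tests(field, q=1):
    """Apply the 1-D test to horizontal, vertical and diagonal scans of a 2-D array."""
    field = np.asarray(field, dtype=float)
    rows, cols = field.shape
    scans = {
        "horizontal": field.ravel(),            # row by row
        "vertical": field.T.ravel(),            # column by column
        "diagonal": np.concatenate([np.diagonal(field, offset=d)
                                    for d in range(-(rows - 1), cols)]),
    }
    return {name: engle_arch_lm(s, q) for name, s in scans.items()}
```

Under this sketch, a scan would be flagged (H = 1) when its p-value falls below the 0.05 significance level used for Table 1.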

Fig. 2. Brightness corresponds to probability. (a) The conditional histogram H(c|n). (b) The log-domain conditional histogram for squares, H(log(c²) | log(n²)), where “c” is the WVD coefficient of the sinusoidal FM signal and “n” indicates one of the neighboring samples of “c” as defined in the 2-D GARCH model.

2.4. Bootstrap technique for assessing the accuracy of GARCH coefficients

Having demonstrated the efficiency of the 2-D GARCH model in modeling WVD/SPWVD coefficients, we address the statistical properties of the estimated model parameters in this section. A powerful technique for assessing the accuracy of a parameter estimator in situations where conventional techniques are not valid is the bootstrap method [12]. We employ the bootstrap method numerically to assess the accuracy of the 2-D GARCH(1,1,1,1) parameters; it calculates confidence intervals for parameters in circumstances where standard methods cannot be applied. To assess a parameter estimator with the bootstrap method, we examine whether the estimated parameter lies in the corresponding confidence interval. We use the bootstrap method for dependent data [19] and extend it to two dimensions for our purposes. Table 2 shows the results of using the bootstrap method to calculate 95% confidence intervals for the 2-D GARCH coefficients corresponding to the WVD of the sinusoidal FM signal. All of the estimated 2-D GARCH parameters lie within their 95% confidence intervals.


Table 2. 95% confidence intervals for the GARCH coefficients of the WVD of the sinusoidal FM signal.

GARCH coefficient   Estimate   95% confidence interval
β10                 0.1129     (0.0000, 0.1965)
β01                 0.0513     (−1.80e−016, 0.1736)
β11                 0.0000     (−2.43e−021, 7.08e+006)
α10                 0.2842     (0.2296, 0.6436)
α01                 0.3236     (0.3111, 0.4933)
α11                 0.2280     (−2.77e−017, 0.2641)
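The exact two-dimensional bootstrap scheme behind Table 2 is not spelled out here, so the sketch below shows one common choice for dependent data, a 2-D moving-block bootstrap with percentile intervals (an assumption on our part). To reproduce Table 2, `statistic` would refit the 2-D GARCH model on each resampled field and return one coefficient; the lighter `squared_neighbour_corr` is only a hypothetical stand-in that keeps the example fast.

```python
import numpy as np


def block_bootstrap_2d(field, statistic, block=8, n_boot=200, level=0.95, seed=0):
    """Percentile CI of statistic(field) from a 2-D moving-block bootstrap."""
    field = np.asarray(field, dtype=float)
    rows, cols = field.shape
    assert block <= min(rows, cols)
    rng = np.random.default_rng(seed)
    nr, nc = -(-rows // block), -(-cols // block)    # blocks per axis (ceiling division)
    stats = np.empty(n_boot)
    for r in range(n_boot):
        tiled = np.empty((nr * block, nc * block))
        for bi in range(nr):
            for bj in range(nc):
                # Copy a randomly placed block x block patch to keep local dependence
                i0 = rng.integers(0, rows - block + 1)
                j0 = rng.integers(0, cols - block + 1)
                tiled[bi * block:(bi + 1) * block, bj * block:(bj + 1) * block] = \
                    field[i0:i0 + block, j0:j0 + block]
        stats[r] = statistic(tiled[:rows, :cols])
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    return lo, hi, stats


def squared_neighbour_corr(z):
    """Hypothetical stand-in statistic: correlation of z^2 with its upper neighbour's z^2."""
    return np.corrcoef((z[1:, :] ** 2).ravel(), (z[:-1, :] ** 2).ravel())[0, 1]
```

The block size trades off preserving local dependence (larger blocks) against the number of distinct resamples (smaller blocks); the value 8 is arbitrary.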

3. A MAP estimator for reducing noise–cross terms in WVD and SPWVD

In this section, our goal is to design a MAP estimator that recovers the signal component of the WVD of a noisy signal using the 2-D GARCH model; a similar approach can be used for the SPWVD. The proposed processor is motivated by the modeling studies in the previous section. This approach is built on rigorous statistical theory. In the proposed method, we apply the 2-D GARCH model to the WVD and then use a Bayesian processor to estimate the WVD of the clean signal. Let y and x represent a noisy observation and the corresponding noise-free signal, respectively, and let n represent the corrupting additive noise component. We can write

y = x + n. (10)

We compute the WVD of the noisy signal:

$$W_y(t,\nu) = W_x(t,\nu) + W_n(t,\nu) + 2\Re\{W_{x,n}(t,\nu)\} = W_x(t,\nu) + N(t,\nu), \tag{11}$$

where $N(t,\nu) = W_n(t,\nu) + 2\Re\{W_{x,n}(t,\nu)\}$ is the noise–cross term. To simplify the notation, in the following we use $Y_{t\nu}$, $X_{t\nu}$, and $N_{t\nu}$ instead of $W_y(t,\nu)$, $W_x(t,\nu)$, and $N(t,\nu)$, respectively. A Bayesian estimator can properly estimate the clean WVD coefficients $X_{t\nu}$ if it takes into consideration the true statistics of the signal component ($X_{t\nu}$) and of the noise–cross terms ($N_{t\nu}$). In this study, we propose using the 2-D GARCH model for the signal component. In Section 2, we demonstrated that the 2-D GARCH model is compatible with the WVD/SPWVD coefficients and can capture their important characteristics, such as heavy-tailed marginal distributions and the dependencies between them.

Now, we study the distribution of the noise–cross terms $N_{t\nu}$. Previously, the Gaussian distribution has been used for the noise–cross terms [14]. To examine their distribution, we add AWGN to the non-stationary signals and then compute the noise–cross terms. Fig. 3 shows the histograms of the noise–cross terms together with the best fitted Gaussian distributions for cubic and sinusoidal FM signals. The figure shows that the Gaussian distribution does not fit the data precisely. Since Gaussian modeling of the noise–cross terms misses some features of the data, we propose using a Gaussian mixture model to capture them; the parameters of the Gaussian mixture are obtained so as to match the noise–cross terms. We choose the Gaussian mixture distribution because it is simpler than other heavy-tailed distributions and yields a closed-form solution for denoising the WVD. The best fitted Gaussian mixture distributions are also shown in Fig. 3, and they fit the actual data better than the Gaussian. Therefore, we use a Gaussian mixture distribution for the noise–cross terms $N_{t\nu}$ and assume that the mixture contains M Gaussian components. We express:

$$N_{t\nu} \sim \mathcal{GM}(M, w, m, \sigma), \tag{12}$$

where $w = \{w_i\}_{i=1}^{M}$, $m = \{m_i\}_{i=1}^{M}$, and $\sigma = \{\sigma_i\}_{i=1}^{M}$ indicate the weights, the means, and the standard deviations of the Gaussian distributions in the mixture. $\Gamma_n = \{w, m, \sigma\}$ indicates the parameter set of the Gaussian mixture distribution for the noise–cross terms $N_{t\nu}$.

Fig. 3. Vertical bars depict the normalized histogram of WVD/SPWVD noise–cross terms. The best fitted Gaussian distribution and the best fitted Gaussian mixture distribution are depicted in blue and red lines, respectively. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
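For an exploratory fit like the one in Fig. 3, the mixture parameters Γn = {w, m, σ} of (12) can be fitted to observed noise–cross terms with the standard expectation-maximization (EM) algorithm. The text does not say how the Fig. 3 fits were obtained, so EM, the quantile initialization and the variance floor are our assumptions, and `fit_gmm_1d` is a hypothetical name.

```python
import numpy as np


def fit_gmm_1d(x, M=2, n_iter=500, tol=1e-10):
    """EM fit of an M-component 1-D Gaussian mixture; returns w, m, s and the log-likelihood history."""
    x = np.asarray(x, dtype=float).ravel()
    w = np.full(M, 1.0 / M)
    m = np.quantile(x, (np.arange(M) + 0.5) / M)     # spread initial means over the data
    s = np.full(M, x.std())
    history = []
    for _ in range(n_iter):
        # E-step: log(w_i N(x; m_i, s_i^2)) for every sample and component
        logp = np.log(w) - np.log(np.sqrt(2 * np.pi) * s) - 0.5 * ((x[:, None] - m) / s) ** 2
        mx = logp.max(axis=1, keepdims=True)
        lse = mx[:, 0] + np.log(np.exp(logp - mx).sum(axis=1))
        resp = np.exp(logp - lse[:, None])           # responsibilities, rows sum to 1
        history.append(lse.sum())
        # M-step: weighted updates of weights, means and standard deviations
        nk = resp.sum(axis=0)
        w = nk / len(x)
        m = resp.T @ x / nk
        s = np.sqrt(np.maximum((resp * (x[:, None] - m) ** 2).sum(axis=0) / nk, 1e-12))
        if len(history) > 1 and abs(history[-1] - history[-2]) <= tol * abs(history[-1]):
            break
    return w, m, s, np.array(history)
```

EM never decreases the log-likelihood from one iteration to the next, so the returned history doubles as a convergence check. Note that in the method described next, Γn is estimated jointly with the GARCH parameters by maximizing the likelihood of the noisy data, not by this separate fit.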

Now, we can design the MAP estimator of the clean WVD coefficients $X_{t\nu}$, based on the 2-D GARCH model for $X_{t\nu}$ and the Gaussian mixture distribution for $N_{t\nu}$. First, we must estimate the parameters of the 2-D GARCH model, $\Gamma = \{\{\alpha_0, \alpha_{01}, \ldots, \alpha_{q_1q_2}, \beta_{01}, \ldots, \beta_{p_1p_2}\}, b\}$, and the parameters of the Gaussian mixture distribution, $\Gamma_n$. As mentioned in Section 2, we use the maximum likelihood method to estimate the unknown parameters. In this noisy case, the likelihood function is formulated as $LF(\Gamma_T) = \prod_{(t,\nu)\in\Phi} f(Y_{t\nu} \mid \psi_{t\nu}, \Gamma_T)$, where $\Gamma_T = \{\Gamma, \Gamma_n\}$ and Φ is the set of time–frequency indices. Using the 2-D GARCH model for $X_{t\nu}$, we can express:

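$$f(X_{t\nu} \mid \psi_{t\nu}, \Gamma_T) = \frac{1}{\sqrt{2\pi h_{t\nu}}} \exp\left(-\frac{(X_{t\nu} - r_{t\nu}^T b)^2}{2h_{t\nu}}\right), \tag{13}$$

where

$$h_{t\nu} = \sigma_{X_{t\nu}}^2 = \alpha_0 + \sum_{(k,\ell)\in\Lambda_1} \alpha_{k\ell}\,\left(X_{t-k,\nu-\ell} - r_{t-k,\nu-\ell}^T b\right)^2 + \sum_{(k,\ell)\in\Lambda_2} \beta_{k\ell}\, h_{t-k,\nu-\ell}.$$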

Also, from (12), we have:

$$f(N_{t\nu} \mid \psi_{t\nu}, \Gamma_T) = \sum_{i=1}^{M} \frac{w_i}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(N_{t\nu} - m_i)^2}{2\sigma_i^2}\right). \tag{14}$$

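From (13) and (14), since $Y_{t\nu} = X_{t\nu} + N_{t\nu}$ and assuming that $X_{t\nu}$ and $N_{t\nu}$ are conditionally independent, the density of $Y_{t\nu}$ is the convolution of (13) and (14):

$$f(Y_{t\nu} \mid \psi_{t\nu}, \Gamma_T) = \sum_{i=1}^{M} \frac{w_i}{\sqrt{2\pi(\sigma_i^2 + h_{t\nu})}} \exp\left(-\frac{(Y_{t\nu} - m_i - r_{t\nu}^T b)^2}{2(\sigma_i^2 + h_{t\nu})}\right). \tag{15}$$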


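Using (15), we have:

$$LF(\Gamma_T) = \prod_{(t,\nu)\in\Phi} \sum_{i=1}^{M} \frac{w_i}{\sqrt{2\pi(\sigma_i^2 + h_{t\nu})}} \exp\left(-\frac{(Y_{t\nu} - m_i - r_{t\nu}^T b)^2}{2(\sigma_i^2 + h_{t\nu})}\right). \tag{16}$$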
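The likelihood (16) is normally evaluated in the log domain. The sketch below computes log LF for given conditional variances h_tν and regression means r_tνᵀb, using a log-sum-exp over the M components for numerical stability. Treating h and the mean as precomputed inputs is a simplification on our part: in the full method h_tν depends on the GARCH parameters through (13), and the whole expression would be maximized over Γ_T with a numerical optimizer. The function name is hypothetical.

```python
import numpy as np


def gm_garch_loglik(Y, mu, h, w, m, s):
    """log LF from (16): sum over (t, nu) of log sum_i w_i N(Y; mu + m_i, h + s_i^2).

    Y, mu (= r^T b) and h broadcast against each other; w, m, s hold the M mixture
    weights, means and standard deviations of the noise-cross terms.
    """
    Y, mu, h = (np.asarray(a, dtype=float)[..., None] for a in (Y, mu, h))
    w, m, s = (np.asarray(a, dtype=float) for a in (w, m, s))
    var = h + s ** 2                                   # per-component variance sigma_i^2 + h
    logc = np.log(w) - 0.5 * np.log(2 * np.pi * var) - (Y - m - mu) ** 2 / (2 * var)
    mx = logc.max(axis=-1, keepdims=True)              # log-sum-exp over the M components
    return float(np.sum(mx[..., 0] + np.log(np.exp(logc - mx).sum(axis=-1))))
```

With M = 1 and a zero-mean component, this reduces to an ordinary Gaussian log-likelihood with variance h + σ², which is a quick way to validate an implementation.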

By maximizing the above likelihood function, we can estimate the 2-D GARCH model parameters (Γ) and the Gaussian mixture parameters ($\Gamma_n$). After estimating the parameters of the 2-D GARCH model and of the noise–cross term distribution from the data, our goal is to design and implement a maximum a posteriori (MAP) estimator of $X_{t\nu}$ given the noisy observation $Y_{t\nu}$, the parameters, and $\psi_{t\nu}$:

$$\hat{X}_{t\nu} = \arg\max_{X_{t\nu}} f_{X_{t\nu} \mid Y_{t\nu}, \psi_{t\nu}, \Gamma_T}(X_{t\nu} \mid Y_{t\nu}, \psi_{t\nu}, \Gamma_T). \tag{17}$$

Bayes’ theorem gives the a posteriori PDF of $X_{t\nu}$ based on the measured data:

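$$f(X_{t\nu} \mid Y_{t\nu}, \sigma_{X_{t\nu}}^2) = \frac{f(X_{t\nu} \mid \psi_{t\nu}, \Gamma_T)\, f(Y_{t\nu} \mid X_{t\nu}, \psi_{t\nu}, \Gamma_T)}{f(Y_{t\nu} \mid \psi_{t\nu}, \Gamma_T)}. \tag{18}$$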

Since the denominator of (18) does not depend on $X_{t\nu}$, substituting (18) into (17) gives

$$\hat{X}_{t\nu} = \arg\max_{X_{t\nu}} f(X_{t\nu} \mid \psi_{t\nu}, \Gamma_T)\, f(Y_{t\nu} \mid X_{t\nu}, \psi_{t\nu}, \Gamma_T). \tag{19}$$

In the above equation, $f(X_{t\nu} \mid \psi_{t\nu}, \Gamma_T)$ is given by (13). Since $Y_{t\nu} = X_{t\nu} + N_{t\nu}$, the conditional pdf of $Y_{t\nu}$ given $X_{t\nu}$ follows from (14):

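$$f(Y_{t\nu} \mid X_{t\nu}, \psi_{t\nu}, \Gamma_T) = \sum_{i=1}^{M} \frac{w_i}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(Y_{t\nu} - m_i - X_{t\nu})^2}{2\sigma_i^2}\right). \tag{20}$$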

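By substituting (13) and (20) into (19), we obtain the following formula for $\hat{X}_{t\nu}$:

$$\hat{X}_{t\nu} = \sum_{i=1}^{M} \left[ \frac{w_i\, \mathcal{N}(r_{t\nu}^T b + m_i,\ h_{t\nu} + \sigma_i^2)\big|_{Y_{t\nu}}}{\sum_{j=1}^{M} w_j\, \mathcal{N}(r_{t\nu}^T b + m_j,\ h_{t\nu} + \sigma_j^2)\big|_{Y_{t\nu}}} \times \left( \frac{h_{t\nu}}{h_{t\nu} + \sigma_i^2}\,(Y_{t\nu} - m_i) + \frac{\sigma_i^2}{h_{t\nu} + \sigma_i^2}\, r_{t\nu}^T b \right) \right], \tag{21}$$

where $\mathcal{N}(\mu, v)\big|_{Y_{t\nu}}$ denotes the Gaussian density with mean μ and variance v evaluated at $Y_{t\nu}$.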

The MAP estimate can therefore be computed using (21); that is, a closed-form solution for the MAP estimate of the noise-free WVD exists when the signal prior and the noise–cross terms are described by the 2-D GARCH model and a Gaussian mixture distribution, respectively.
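Rule (21) is straightforward to vectorize over the whole time–frequency plane. The sketch below assumes that h_tν and r_tνᵀb have already been obtained (for example, from the fitted model); these are treated as given inputs, and the function name is hypothetical. As a side note, the structure of (21) is that of a posterior mean: posterior component weights multiplied by per-component Wiener-type estimates. For M = 1 this coincides with the MAP estimate of a Gaussian posterior.

```python
import numpy as np


def gm_garch_estimate(Y, mu, h, w, m, s):
    """Closed-form estimate (21) of the clean coefficients X from noisy Y.

    Y, mu (= r^T b) and h broadcast against each other; w, m, s are the
    mixture weights, means and standard deviations of the noise-cross terms.
    """
    Y, mu, h = (np.asarray(a, dtype=float)[..., None] for a in (Y, mu, h))
    w, m, s = (np.asarray(a, dtype=float) for a in (w, m, s))
    var = h + s ** 2
    # Posterior component weights: w_i N(Y; mu + m_i, h + s_i^2), normalized over i
    logc = np.log(w) - 0.5 * np.log(2 * np.pi * var) - (Y - m - mu) ** 2 / (2 * var)
    post = np.exp(logc - logc.max(axis=-1, keepdims=True))
    post /= post.sum(axis=-1, keepdims=True)
    # Per-component combination of (Y - m_i) and the prior mean mu, as in (21)
    per_comp = h / var * (Y - m) + s ** 2 / var * mu
    return np.sum(post * per_comp, axis=-1)
```

For example, with Y = 2, a zero prior mean, h = 3 and a single zero-mean component with σ = 1, the estimate is 3/4 · 2 = 1.5: the observation is shrunk toward the prior mean in proportion to the noise variance.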

4. Experimental results

In this section, we illustrate the performance of the proposed method with simulation results obtained by processing several test signals. We use 2-D GARCH(1,1,1,1) and study the efficiency of the proposed method in denoising the WVD and SPWVD of noisy signals. The numbers of frequency and time bins are equal to the signal length (N). For the SPWVD, the time smoothing window is a Hamming window of size N/10 and the frequency smoothing window is a Hamming window of size N/4. The WVD and SPWVD of a clean sinusoidal frequency-modulated signal are shown in Figs. 4(a) and (d), respectively. We add Gaussian noise to this signal (SNR = 5 dB); Figs. 4(b) and (e) show the WVD and SPWVD of the noisy signal. We then apply the proposed denoising method to the noisy WVD and SPWVD; the results are shown in Figs. 4(c) and (f). To quantify the performance improvement, we use a widely used measure, the mean square error (MSE) between the original (clean) and the processed distributions. We study the WVD and SPWVD denoising of a chirp, a quadratic FM, a cubic FM, and a sinusoidal FM signal at different SNRs. Table 3 shows the results of denoising. Fig. 4 and Table 3 show that the proposed method suppresses the noise successfully while preserving the important information in these distributions.

Fig. 4. (a) WVD of clean signal, (b) WVD of noisy signal, (c) denoised WVD, (d) SPWVD of clean signal, (e) SPWVD of noisy signal, and (f) denoised SPWVD (SNR = 5 dB).

Table 3. MSE results for WVD and SPWVD.

Signal          SNR     Noisy WVD   Denoised WVD   Noisy SPWVD   Denoised SPWVD
Chirp           −6 dB   1.64e+003   90.2591        32.6615       7.6349
Chirp           −3 dB   337.9601    64.6923        17.5430       9.7649
Chirp           0 dB    141.0449    63.0292        6.3862        2.9200
Quadratic FM    −6 dB   1.23e+003   75.0377        37.5986       5.9952
Quadratic FM    −3 dB   493.8924    64.2257        18.3138       7.3920
Quadratic FM    0 dB    170.1426    63.0583        7.2856        4.5057
Cubic FM        −6 dB   1.18e+003   75.5308        38.2339       7.7770
Cubic FM        −3 dB   650.7485    67.3700        20.5317       4.9657
Cubic FM        0 dB    208.8797    63.0002        4.5468        3.0323
Sinusoidal FM   −6 dB   1.34e+003   71.1172        48.5626       8.5062
Sinusoidal FM   −3 dB   569.9030    64.0000        19.9617       7.3270
Sinusoidal FM   0 dB    198.8754    63.1462        5.4123        3.3417
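For reference, a minimal SPWVD sketch using the window lengths quoted above (a Hamming time window of length about N/10 and a Hamming lag window of length about N/4). The discretization is one common convention and an assumption on our part: windows are forced to odd length, the time window is normalized to unit sum over the samples that fall inside the signal, the lag window is applied directly, and the name `spwvd` is hypothetical. Frequency bin k corresponds to k/(2N) cycles per sample, as for a standard discrete WVD.

```python
import numpy as np


def spwvd(x, g_len, h_len):
    """Smoothed pseudo-WVD sketch (rows: time samples, columns: frequency bins k/(2N))."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    hw = np.hamming(2 * (h_len // 2) + 1)   # lag window -> frequency smoothing
    gw = np.hamming(2 * (g_len // 2) + 1)   # time window -> time smoothing
    Lh, Lg = len(hw) // 2, len(gw) // 2
    p = np.arange(-Lg, Lg + 1)              # time offsets covered by gw
    mmax = min(Lh, N // 2 - 1)              # keep lags from wrapping around the DFT
    W = np.zeros((N, N))
    for n in range(N):
        kernel = np.zeros(N, dtype=complex)
        for m in range(-mmax, mmax + 1):
            a, b = n - p + m, n - p - m     # sample indices of x[.] and conj(x[.])
            ok = (a >= 0) & (a < N) & (b >= 0) & (b < N)
            if not ok.any():
                continue
            g = gw[ok]
            # Time-smoothed local autocorrelation at lag m, weighted by the lag window
            acc = np.sum(g * x[a[ok]] * np.conj(x[b[ok]])) / g.sum()
            kernel[m % N] = hw[m + Lh] * acc
        # Symmetric windows keep the kernel Hermitian in m, so the DFT is real
        W[n] = np.fft.fft(kernel).real
    return W
```

For a complex exponential at 0.125 cycles per sample with N = 64, `spwvd(x, N // 10, N // 4)` still peaks at bin 16 in the middle row; the windows broaden the ridge in exchange for weaker cross and noise terms.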

4.1. An application: Instantaneous frequency estimation

An important application of time–frequency distributions (such as the WVD) is instantaneous frequency (IF) estimation, whose accuracy depends strongly on the noise level. The concept of IF has become very useful in many engineering applications, where it is used to describe the time-varying nature of a signal [3,17]. Many radar signal processing problems involve changing frequencies. In a typical radar application, the IF aids in the detection, tracking, and imaging of targets whose radial velocities change with time. When the radial velocity is not constant, the radar's Doppler-induced frequency has a non-stationary spectrum, which can be tracked by IF estimation techniques. The IF can be determined by taking the first moment (over frequency) of the bilinear time–frequency distributions. Because of their quadratic nature, time–frequency distributions accentuate the effects of noise; hence, they are not useful at low SNRs. Therefore, we propose to compute the IF using the denoised versions of the time–frequency distributions. IF estimation based on the WVD is only applicable at high SNRs; using the GARCH-based denoising method, we can apply the denoised WVD or denoised SPWVD to estimate the IF at low SNRs. To evaluate the IF estimation results quantitatively, we compute the mean square error (MSE) between the estimated IFs and the known IF laws. Table 4 gives the MSE for IF estimators based on different time–frequency distributions. In every row, the lowest MSE is obtained with the denoised SPWVD, which shows that the GARCH-based denoising method allows the IF to be estimated more accurately in noisy conditions.

Table 4. MSE results between estimated IFs and known IF laws for IF estimators based on different TFDs (SNR = 0 dB).

Signal          WVD      PWVD     SPWVD       Denoised SPWVD
Chirp           0.0109   0.0109   0.0019      5.14e−004
Quadratic FM    0.0292   0.0292   0.0056      4.38e−004
Cubic FM        0.0490   0.0490   0.0275      0.0058
Sinusoidal FM   0.0262   0.0262   4.74e−004   1.31e−004
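The first-moment IF estimate and the MSE criterion of Table 4 can be sketched as follows. Clipping negative TFD values to zero before taking the moment is our assumption (bilinear distributions can be negative, which can push a ratio estimate outside the frequency band); the paper does not say whether it does this, and both function names are hypothetical.

```python
import numpy as np


def if_first_moment(tfd, freqs, clip_negative=True):
    """IF estimate per time row: the first moment of the TFD over frequency."""
    tfd = np.asarray(tfd, dtype=float)
    if clip_negative:
        # Assumption: discard negative values of the bilinear distribution
        tfd = np.clip(tfd, 0.0, None)
    num = tfd @ np.asarray(freqs, dtype=float)
    den = tfd.sum(axis=1)
    return num / np.where(den > 0, den, np.nan)   # rows with no energy give NaN


def if_mse(estimated, true_if):
    """MSE between an IF estimate and the known IF law, ignoring undefined rows."""
    err = np.asarray(estimated, dtype=float) - np.asarray(true_if, dtype=float)
    return float(np.nanmean(err ** 2))
```

With a discrete WVD or SPWVD computed on N frequency bins, `freqs` would be k/(2N) cycles per sample for k = 0, ..., N − 1.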

5. Conclusion

In this paper, we first studied the statistical behavior of the WVD and SPWVD. We showed that the conditional variance of WVD coefficients is non-constant; hence, these coefficients are heteroscedastic. Consequently, we demonstrated that the 2-D GARCH model, which is heteroscedastic, is compatible with WVD/SPWVD coefficients. In noisy situations, these distributions accentuate the effect of noise. To denoise them, we designed and tested a MAP processor. We modeled the signal component at various scales as a 2-D GARCH process and the noise–cross-term component as a Gaussian mixture process. The processor is derived within a Bayesian (MAP) framework and provides a closed-form solution for denoising WVD/SPWVD. Experimental results demonstrate the good performance of the proposed method in denoising these distributions.

Acknowledgments

The authors would like to thank the reviewers and the associate editor for their comments, which enhanced the quality of this paper. We especially appreciate Professor Ercan Engin Kuruoglu's coordination throughout the review process.

Appendix A. Testing the existence of two-dimensional heteroscedasticity

It is desirable to test whether two-dimensional heteroscedasticity exists in a two-dimensional process before modeling it statistically. As in many similar cases, the LM test procedure is well suited to this task. Since the LM test has been described in the literature [20], we do not repeat its general details; here we derive the LM test statistic for testing the existence of two-dimensional heteroscedasticity. We test for the existence of 2-D ARCH($q_1$,$q_2$) effects because a general test for the GARCH model is not feasible [10].

Suppose that $y_{ij}$ follows a 2-D GARCH(0,0,$q_1$,$q_2$) model, i.e., a 2-D ARCH($q_1$,$q_2$) model as defined in (4) and (5). Up to an additive constant, the log-likelihood function is

$$\log(\mathrm{LF}) = \mathcal{L} = \sum_{ij\in\phi}\left(-\frac{1}{2}\log(h_{ij}) - \frac{y_{ij}^2}{2h_{ij}}\right).$$

Due to (5), we have

$$h_{ij} = \mathbf{z}_{ij}\boldsymbol{\alpha},$$

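where $\boldsymbol{\alpha}$ is the parameter vector and $\mathbf{z}_{ij}$ is the corresponding row vector of regressors:

$$\boldsymbol{\alpha} = [\alpha_0,\ \alpha_{01},\ \ldots,\ \alpha_{0q_2},\ \alpha_{10},\ \ldots,\ \alpha_{1q_2},\ \ldots,\ \alpha_{q_10},\ \ldots,\ \alpha_{q_1q_2}]^T,$$

$$\mathbf{z}_{ij}^T = [1,\ y^2_{i,(j-1)},\ \ldots,\ y^2_{i,(j-q_2)},\ y^2_{(i-1),j},\ \ldots,\ y^2_{(i-1),(j-q_2)},\ \ldots,\ y^2_{(i-q_1),j},\ \ldots,\ y^2_{(i-q_1),(j-q_2)}]^T,$$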

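hence, the score vector $\mathbf{S}$ in the LM test can be computed as

$$\mathbf{S} = \frac{\partial\mathcal{L}}{\partial\boldsymbol{\alpha}} = \sum_{ij\in\phi}\left(-\frac{\mathbf{z}_{ij}^T}{2h_{ij}} + \frac{\mathbf{z}_{ij}^T\,y_{ij}^2}{2h_{ij}^2}\right).$$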
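The regressor rows $\mathbf{z}_{ij}$, the log-likelihood $\mathcal{L}$, and the score $\mathbf{S}$ can be checked numerically. The sketch below is a hypothetical helper set, not code from the paper: it assumes that $\phi$ contains only the sites whose lags all fall inside the array ($i \ge q_1$, $j \ge q_2$ with 0-based indices), orders the columns of $\mathbf{Z}$ like $\boldsymbol{\alpha}$, and compares the analytic score with a central finite-difference gradient of $\mathcal{L}$.

```python
import numpy as np

def build_design(y, q1, q2):
    """Stack z_ij = [1, y^2_{i-k, j-l} for lags (k, l) in alpha order] into Z.

    Assumption: phi holds only sites with all lags inside the array
    (i >= q1, j >= q2, 0-based), visited in raster order.
    """
    y2 = y ** 2
    M, N = y.shape
    # Lag order of alpha: (0, 1..q2), (1, 0..q2), ..., (q1, 0..q2).
    lags = [(k, l) for k in range(q1 + 1) for l in range(q2 + 1) if (k, l) != (0, 0)]
    cols = [np.ones((M - q1) * (N - q2))]
    cols += [y2[q1 - k:M - k, q2 - l:N - l].ravel() for k, l in lags]
    return np.column_stack(cols), y[q1:, q2:].ravel()

def loglik(alpha, Z, y_phi):
    """L = sum over phi of -log(h)/2 - y^2/(2h), with h = Z alpha."""
    h = Z @ alpha
    return float(np.sum(-0.5 * np.log(h) - y_phi ** 2 / (2 * h)))

def score(alpha, Z, y_phi):
    """Analytic score: sum over phi of z^T (y^2/h - 1) / (2h)."""
    h = Z @ alpha
    return Z.T @ ((y_phi ** 2 / h - 1) / (2 * h))

# Small deterministic example: 3 x 4 array, q1 = q2 = 1.
y_small = np.arange(1.0, 13.0).reshape(3, 4)
Z_small, y_phi_small = build_design(y_small, 1, 1)
# First row is site (1, 1): [1, y^2_{1,0}, y^2_{0,1}, y^2_{0,0}], paired with y_{1,1}.
print(Z_small[0], y_phi_small[0])

# Finite-difference check of the score on a random field.
rng = np.random.default_rng(1)
Z, y_phi = build_design(rng.standard_normal((20, 20)), 1, 1)
alpha = np.array([0.8, 0.1, 0.05, 0.02])
step = 1e-6
numeric = np.array([
    (loglik(alpha + step * e, Z, y_phi) - loglik(alpha - step * e, Z, y_phi)) / (2 * step)
    for e in np.eye(alpha.size)
])
analytic = score(alpha, Z, y_phi)
print(np.max(np.abs(numeric - analytic)))
```

The two terms of the score are combined as $\mathbf{z}_{ij}^T(y_{ij}^2/h_{ij} - 1)/(2h_{ij})$, which is the same factorization used in the subsequent derivation of $\mathbf{V}$.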

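Under the null hypothesis ($H_0$) that there is no heteroscedasticity, $h_{ij}$ is a constant, denoted $h_0$, and we can express

$$\mathbf{S} = \frac{1}{2h_0}\sum_{ij\in\phi}\mathbf{z}_{ij}^T\left(\frac{y_{ij}^2}{h_0} - 1\right) = \frac{1}{2h_0}\mathbf{Z}^T\mathbf{f}_0,$$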

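where

$$\mathbf{Z} = \begin{bmatrix}\mathbf{z}_{11}\\ \mathbf{z}_{12}\\ \vdots\\ \mathbf{z}_{1N}\\ \mathbf{z}_{21}\\ \vdots\\ \mathbf{z}_{2N}\\ \vdots\\ \mathbf{z}_{M1}\\ \vdots\\ \mathbf{z}_{MN}\end{bmatrix},\qquad
\mathbf{f}_0 = \begin{bmatrix}y_{11}^2/h_0 - 1\\ y_{12}^2/h_0 - 1\\ \vdots\\ y_{1N}^2/h_0 - 1\\ y_{21}^2/h_0 - 1\\ \vdots\\ y_{2N}^2/h_0 - 1\\ \vdots\\ y_{M1}^2/h_0 - 1\\ \vdots\\ y_{MN}^2/h_0 - 1\end{bmatrix},$$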

and $M \times N$ is the size of the two-dimensional process. To compute the LM test statistic, $W = \mathbf{S}^T\mathbf{V}^{-1}\mathbf{S}$, we first compute $\mathbf{V} = E\{\mathbf{S}\mathbf{S}^T\}$:


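$$\begin{aligned}
\mathbf{V} = E\{\mathbf{S}\mathbf{S}^T\} &= E\left\{\frac{\partial\mathcal{L}}{\partial\boldsymbol{\alpha}}\left(\frac{\partial\mathcal{L}}{\partial\boldsymbol{\alpha}}\right)^T\right\}\\
&= E\left\{\left[\sum_{ij\in\phi}\left(-\frac{\mathbf{z}_{ij}^T}{2h_{ij}} + \frac{\mathbf{z}_{ij}^T\,y_{ij}^2}{2h_{ij}^2}\right)\right]\left[\sum_{i_1j_1\in\phi}\left(-\frac{\mathbf{z}_{i_1j_1}}{2h_{i_1j_1}} + \frac{\mathbf{z}_{i_1j_1}\,y_{i_1j_1}^2}{2h_{i_1j_1}^2}\right)\right]\right\}\\
&= E\left\{\sum_{ij\in\phi}\sum_{i_1j_1\in\phi}\frac{\mathbf{z}_{ij}^T\mathbf{z}_{i_1j_1}}{4h_{ij}h_{i_1j_1}}\left(\frac{y_{ij}^2}{h_{ij}} - 1\right)\left(\frac{y_{i_1j_1}^2}{h_{i_1j_1}} - 1\right)\right\},
\end{aligned}$$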

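under the null hypothesis $H_0$, $\mathbf{V}$ can be computed as

$$\mathbf{V} = E\left\{\sum_{ij\in\phi}\sum_{i_1j_1\in\phi}\frac{\mathbf{z}_{ij}^T\mathbf{z}_{i_1j_1}}{4h_0^2}\left(\frac{y_{ij}^2}{h_0} - 1\right)\left(\frac{y_{i_1j_1}^2}{h_0} - 1\right)\right\}, \tag{A.1}$$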

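and, moving the expectation inside the sums, we can write

$$\mathbf{V} = \sum_{ij\in\phi}\sum_{i_1j_1\in\phi}\frac{1}{4h_0^2}E\left\{\mathbf{z}_{ij}^T\mathbf{z}_{i_1j_1}\left(\frac{y_{ij}^2}{h_0} - 1\right)\left(\frac{y_{i_1j_1}^2}{h_0} - 1\right)\right\}.$$

For the cross terms with $(i,j) \neq (i_1,j_1)$, we define

$$\mathbf{A} = E\left\{\mathbf{z}_{ij}^T\mathbf{z}_{i_1j_1}\left(\frac{y_{ij}^2}{h_0} - 1\right)\left(\frac{y_{i_1j_1}^2}{h_0} - 1\right)\right\},$$

which can be expanded as in (A.2):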

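$$\mathbf{A} = E\left\{\begin{bmatrix}
1 & y^2_{i_1(j_1-1)} & \cdots & y^2_{(i_1-q_1)(j_1-q_2)}\\
y^2_{i(j-1)} & y^2_{i(j-1)}\,y^2_{i_1(j_1-1)} & \cdots & y^2_{i(j-1)}\,y^2_{(i_1-q_1)(j_1-q_2)}\\
\vdots & \vdots & \ddots & \vdots\\
y^2_{(i-q_1)(j-q_2)} & y^2_{(i-q_1)(j-q_2)}\,y^2_{i_1(j_1-1)} & \cdots & y^2_{(i-q_1)(j-q_2)}\,y^2_{(i_1-q_1)(j_1-q_2)}
\end{bmatrix}\left(\frac{y_{ij}^2}{h_0} - 1\right)\left(\frac{y_{i_1j_1}^2}{h_0} - 1\right)\right\}. \tag{A.2}$$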

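Under the null hypothesis, we have $E\{y_{ij}^2/h_0 - 1\} = 0$. Hence, from (A.2) and the uncorrelatedness of $y_{ij}$ (under $H_0$ they are in fact independent), we conclude that if $(i,j) \neq (i_1,j_1)$, $\mathbf{A}$ is equal to zero. Consequently, the cross terms contribute nothing to $\mathbf{V} = E\{\mathbf{S}\mathbf{S}^T\}$, and only the terms with $(i,j) = (i_1,j_1)$ remain.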

For $(i,j) = (i_1,j_1)$, we can rewrite (A.1) as

$$\mathbf{V} = \sum_{ij\in\phi}\frac{1}{4h_0^2}E\left\{\mathbf{z}_{ij}^T\mathbf{z}_{ij}\left(\frac{y_{ij}^2}{h_0} - 1\right)^2\right\}.$$

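Since the $y_{ij}$ are independent under $H_0$ and $\mathbf{z}_{ij}$ does not involve $y_{ij}$, we can rewrite the above equation as

$$\begin{aligned}
\mathbf{V} &= \sum_{ij\in\phi}\frac{1}{4h_0^2}E\{\mathbf{z}_{ij}^T\mathbf{z}_{ij}\}\,E\left\{\left(\frac{y_{ij}^2}{h_0} - 1\right)^2\right\}\\
&= \sum_{ij\in\phi}\frac{1}{4h_0^2}E\{\mathbf{z}_{ij}^T\mathbf{z}_{ij}\}\times 2\\
&= \frac{1}{2h_0^2}E\{\mathbf{Z}^T\mathbf{Z}\},
\end{aligned}$$

where the factor 2 follows because, under $H_0$ with Gaussian $y_{ij}$, $y_{ij}^2/h_0$ is chi-square distributed with one degree of freedom and therefore has variance 2, and the last step uses $\sum_{ij\in\phi}\mathbf{z}_{ij}^T\mathbf{z}_{ij} = \mathbf{Z}^T\mathbf{Z}$.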
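The closed form $\mathbf{V} = E\{\mathbf{Z}^T\mathbf{Z}\}/(2h_0^2)$, including the vanishing cross terms and the factor 2, can be sanity-checked by Monte Carlo simulation. The sketch below is an illustration under stated assumptions: the field is i.i.d. Gaussian with variance $h_0$ (the null hypothesis), the regressors are those of a 2-D ARCH(1,1) model, only sites whose lags fall inside the grid are used, and the replicate count and grid size are arbitrary choices.

```python
import numpy as np

def score_and_mean_gram(y, h0):
    """Per-replicate score S = Z^T f0 / (2 h0) and replicate mean of Z^T Z.

    y has shape (R, M, N): R independent replicates of an M x N field.
    2-D ARCH(1, 1) regressors at sites i, j >= 1:
    z_ij = [1, y^2_{i,j-1}, y^2_{i-1,j}, y^2_{i-1,j-1}].
    """
    y2 = y ** 2
    f0 = y2[:, 1:, 1:] / h0 - 1
    z = [np.ones_like(f0), y2[:, 1:, :-1], y2[:, :-1, 1:], y2[:, :-1, :-1]]
    S = np.stack([(zc * f0).sum(axis=(1, 2)) for zc in z], axis=1) / (2 * h0)
    mean_gram = np.array([[(zc * zd).sum(axis=(1, 2)).mean() for zd in z] for zc in z])
    return S, mean_gram, f0

rng = np.random.default_rng(2)
h0, R = 2.0, 40000
# Under H0 the field is i.i.d. Gaussian with constant variance h0.
y = np.sqrt(h0) * rng.standard_normal((R, 10, 10))
S, mean_gram, f0 = score_and_mean_gram(y, h0)

V_monte_carlo = S.T @ S / R                # sample estimate of E{S S^T} (E{S} = 0)
V_closed_form = mean_gram / (2 * h0 ** 2)  # E{Z^T Z} / (2 h0^2)
fourth = float(np.mean(f0 ** 2))           # sample estimate of E{(y^2/h0 - 1)^2}
print(np.round(V_monte_carlo / V_closed_form, 2))
print(round(fourth, 2))
```

The printed ratio matrix should be close to 1 in every entry, and the fourth-moment estimate should be close to 2.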

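At this step, we can compute $W$:

$$\begin{aligned}
W = \mathbf{S}^T\mathbf{V}^{-1}\mathbf{S} &= \frac{1}{2h_0}\mathbf{f}_0^T\mathbf{Z}\left(\frac{1}{2h_0^2}E\{\mathbf{Z}^T\mathbf{Z}\}\right)^{-1}\frac{1}{2h_0}\mathbf{Z}^T\mathbf{f}_0\\
&= \frac{1}{2}\mathbf{f}_0^T\mathbf{Z}\left(E\{\mathbf{Z}^T\mathbf{Z}\}\right)^{-1}\mathbf{Z}^T\mathbf{f}_0,
\end{aligned}$$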

therefore, the LM test statistic can be consistently estimated by

$$\mathrm{LM} = \frac{1}{2}\mathbf{f}_0^T\mathbf{Z}\left(\mathbf{Z}^T\mathbf{Z}\right)^{-1}\mathbf{Z}^T\mathbf{f}_0.$$

When the null hypothesis is true, the statistic is asymptotically chi-square distributed.
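The statistic can be computed directly from data. The sketch below is a hypothetical implementation, not the authors' code: it replaces $h_0$ by its maximum-likelihood estimate under $H_0$ (the mean of $y_{ij}^2$ over $\phi$), uses only sites whose lags fall inside the array, and reports the degrees of freedom as the number of tested lag coefficients, $(q_1+1)(q_2+1)-1$, which is the usual choice for an LM test of these restrictions. The 2-D ARCH(1,1) simulator, its coefficients, and its zero initial boundary are illustrative assumptions.

```python
import numpy as np

def build_design(y, q1, q2):
    """Rows z_ij = [1, y^2 at lags (k, l) in alpha order]; sites with i >= q1, j >= q2."""
    y2 = y ** 2
    M, N = y.shape
    lags = [(k, l) for k in range(q1 + 1) for l in range(q2 + 1) if (k, l) != (0, 0)]
    cols = [np.ones((M - q1) * (N - q2))]
    cols += [y2[q1 - k:M - k, q2 - l:N - l].ravel() for k, l in lags]
    return np.column_stack(cols), y[q1:, q2:].ravel()

def lm_statistic(y, q1, q2):
    """LM = f0^T Z (Z^T Z)^{-1} Z^T f0 / 2 for H0: no 2-D ARCH(q1, q2) effects."""
    Z, y_phi = build_design(y, q1, q2)
    h0 = np.mean(y_phi ** 2)  # ML estimate of the constant variance under H0
    f0 = y_phi ** 2 / h0 - 1
    Ztf = Z.T @ f0
    lm = 0.5 * Ztf @ np.linalg.solve(Z.T @ Z, Ztf)
    dof = Z.shape[1] - 1  # every coefficient except alpha_0 is restricted under H0
    return float(lm), dof

def simulate_arch2d_11(M, N, a, rng):
    """Hypothetical 2-D ARCH(1, 1) simulator in raster order with a zero boundary.

    h_ij = a0 + a01 y^2_{i,j-1} + a10 y^2_{i-1,j} + a11 y^2_{i-1,j-1}.
    """
    y = np.zeros((M + 1, N + 1))  # row 0 and column 0 form the zero boundary
    eps = rng.standard_normal((M, N))
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            h = a[0] + a[1] * y[i, j - 1] ** 2 + a[2] * y[i - 1, j] ** 2 + a[3] * y[i - 1, j - 1] ** 2
            y[i, j] = np.sqrt(h) * eps[i - 1, j - 1]
    return y[1:, 1:]

rng = np.random.default_rng(3)
white = rng.standard_normal((64, 64))                         # H0 holds
arch = simulate_arch2d_11(64, 64, [0.5, 0.2, 0.2, 0.1], rng)  # H0 violated
lm_white, dof = lm_statistic(white, 1, 1)
lm_arch, _ = lm_statistic(arch, 1, 1)
print(dof, round(lm_white, 2), round(lm_arch, 2))
```

The statistic can then be compared with a chi-square quantile with `dof` degrees of freedom, for example `scipy.stats.chi2.ppf(0.95, dof)` (about 7.81 for three degrees of freedom). Because $h_0$ is estimated by the sample mean of $y_{ij}^2$, the intercept entry of $\mathbf{Z}^T\mathbf{f}_0$ is exactly zero, so only the lag coefficients drive the statistic.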

References

[1] M. Xing, R. Wu, Y. Li, Z. Bao, New ISAR imaging algorithm based on modified Wigner–Ville distribution, IET Radar Sonar Navig. 3 (1) (2009) 70–80.

[2] C. Kun, X. Shengl, Semi-blind fetal electrocardiogram extraction by eliminating the cross-terms of the Wigner–Ville representations, in: 5th International Conference on Bioinformatics and Biomedical Engineering (ICBBE), 2011, pp. 1–4.

[3] C.Y. Mei, A.Z. Sha'ameri, Adaptive windowed cross Wigner–Ville distribution as an optimum phase estimator for PSK signals, Digital Signal Process. 23 (1) (2013) 289–301, http://dx.doi.org/10.1016/j.dsp.2012.06.017.

[4] A.R. Abdullah, A.Z. Sha'ameri, Power quality analysis using smooth-windowed Wigner–Ville distribution, in: 10th International Conference on Information Sciences Signal Processing and Their Applications (ISSPA), 2010, pp. 798–801.

[5] G. Matz, F. Hlawatsch, Wigner distributions (nearly) everywhere: Time–frequency analysis of signals, systems, random processes, signal spaces, and frames, Signal Process. 83 (7) (2003) 1355–1378.

[6] É. Chassande-Mottin, A. Pai, Discrete time and frequency Wigner–Ville distribution: Moyal's formula and aliasing, IEEE Signal Process. Lett. 12 (7) (2005) 508–511.

[7] Y. Dong, Y. Cui, Analysis of a new joint time–frequency distribution of suppressing cross-term, Res. J. Appl. Sci. Eng. Technol. 4 (11) (2012) 1580–1584.

[8] Y. Wang, Y.C. Jiang, New time–frequency distribution based on the polynomial Wigner–Ville distribution and L class of Wigner–Ville distribution, IET Signal Process. 4 (2) (2010) 130–136.

[9] S. Krishnan, A new approach for estimation of instantaneous mean frequency of a time-varying signal, EURASIP J. Appl. Signal Process. 17 (2005) 2848–2855.

[10] T. Bollerslev, Generalized autoregressive conditional heteroscedasticity, J. Econometrics 31 (1986) 307–327.

[11] M. Amirmazlaghani, H. Amindavar, Two novel Bayesian multiscale approaches for speckle suppression in SAR images, IEEE Trans. Geosci. Rem. Sens. 47 (7) (2010) 2980–2993.

[12] M. Amirmazlaghani, H. Amindavar, A.R. Moghaddamjoo, Speckle suppression in SAR images using the 2-D GARCH model, IEEE Trans. Image Process. 18 (2) (2009) 250–259.

[13] A. Noiboar, I. Cohen, Anomaly detection based on wavelet domain GARCH random field modeling, IEEE Trans. Geosci. Rem. Sens. 45 (2007) 1361–1373.

[14] M. Amirmazlaghani, H. Amindavar, Modeling and denoising Wigner–Ville distribution, in: Proc. IEEE DSP/SPE Workshop, Jan. 2009, pp. 457–462.

[15] I. Djurovic, L. Stankovic, J.F. Böhme, Robust L-estimation based forms of signal transforms and time–frequency representation, IEEE Trans. Signal Process. 51 (7) (2003) 1753–1761.

[16] B. Barkat, Instantaneous frequency estimation of nonlinear frequency-modulated signals in the presence of multiplicative and additive noise, IEEE Trans. Signal Process. 49 (10) (2001) 2214–2222.

[17] L. Rankine, M. Mesbah, B. Boashash, IF estimation for multicomponent signals using image processing techniques in the time–frequency domain, Signal Process. 87 (2007) 1234–1250.

[18] R.F. Engle, Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation, Econometrica 50 (1982) 987–1008.

[19] D.N. Politis, Computer intensive methods in statistical analysis, IEEE Signal Process. Mag. 15 (1998) 39–55.

[20] L.G. Godfrey, Misspecification Test in Econometrics: The Lagrange Multiplier Principle and Other Approaches, Andrew Chesher, 2005.

Maryam Amirmazlaghani received the B.S., M.S., and Ph.D. degrees in electrical engineering from Iran University of Science and Technology, Sharif University of Technology, and Amirkabir University of Technology, Tehran, Iran, in 2003, 2005, and 2009, respectively.

She is currently a faculty member in the Computer Engineering and Information Technology Department of Amirkabir University of Technology. Her research interests include statistical signal processing, information hiding, and multiresolution signal analysis.

Hamidreza Amindavar has been a faculty member in the Electrical Engineering Department of Amirkabir University of Technology since 1993.

He obtained his BSEE in 1985, MSEE in 1987, MSc in applied mathematics in 1991, and PhD in electrical engineering in 1991. His research interests include statistical image processing, RADAR and SONAR signal processing, multiresolution signal analysis, and multiuser detection.