Speech Processing Short-Time Fourier Transform Analysis and Synthesis

Ch7-Short-Time Fourier Transform Analysis and Synthesis


Page 1: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Speech Processing

Short-Time Fourier Transform Analysis and Synthesis

Page 2: Ch7-Short-Time Fourier Transform Analysis and Synthesis

April 22, 2023 Veton Këpuska 2

Short-Time Fourier Transform Analysis and Synthesis Minimum-Phase Synthesis

Speech and audio signals vary with time and can be considered stochastic signals that carry information.
This necessitates short-time analysis, since a single Fourier transform (FT) cannot characterize changes in spectral content over time (i.e., time-varying formants and harmonics).
The discrete-time short-time Fourier transform (STFT) consists of a separate FT, for each time instant, of the signal in the neighborhood of that instant.
When the FT in the STFT analysis is replaced by the discrete FT (DFT), the resulting STFT is discrete in both time and frequency.
This discrete STFT is distinguished from the discrete-time STFT, which is continuous in frequency.

In linear prediction and homomorphic processing, an underlying source/filter model is assumed. This leads to model-based analysis/synthesis. Note also that both of those analysis methods implicitly used short-time analysis (to be presented).
In short-time analysis systems no such model restrictions apply.

Page 3: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Short-Time Analysis (STFT)
Two views of the STFT are explored:
1. Fourier-transform view
2. Filter-bank view

Page 4: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Fourier-Transform View
Recall (from Chapter 3):

$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m}$$

w[n] is a finite-length, symmetric sequence (i.e., a window) of length Nw, with w[n] ≠ 0 for n in [0, Nw-1].
w[n] is called the analysis window or analysis filter.

Page 5: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Fourier-Transform View
x[n]: time-domain signal.
fn[m] = x[m]w[n-m]: the short-time section of x[m] at point n, that is, the signal at frame n.
X(n,ω): Fourier transform of the windowed short-time section fn[m].

Computing the DFT:

$$X(n,k) = X(n,\omega)\Big|_{\omega = \frac{2\pi}{N}k}$$

Page 6: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Fourier-Transform View
Thus X(n,k) is the STFT evaluated at the frequencies ω = (2π/N)k.
Frequency sampling interval = 2π/N
Frequency sampling factor = N

DFT:

$$X(n,k) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km}$$
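The DFT form of the discrete STFT can be evaluated directly from its definition. The following is a minimal sketch, not an efficient implementation: the helper name `stft_frame`, the toy signal, and the rectangular window are assumptions made for illustration, and the window is taken to be non-zero on 0..Nw-1 as stated earlier.

```python
import cmath

def stft_frame(x, w, n, N):
    """Return [X(n,k) for k = 0..N-1], where
    X(n,k) = sum_m x[m] w[n-m] exp(-j 2 pi k m / N)."""
    Nw = len(w)
    # w[n-m] is non-zero only for m in [n-Nw+1, n]; x is zero outside its list.
    lo, hi = max(0, n - Nw + 1), min(len(x) - 1, n)
    return [
        sum(x[m] * w[n - m] * cmath.exp(-2j * cmath.pi * k * m / N)
            for m in range(lo, hi + 1))
        for k in range(N)
    ]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # toy signal (assumption)
w = [1.0, 1.0, 1.0]                  # rectangular window, Nw = 3 (assumption)
X = stft_frame(x, w, n=4, N=4)       # frame covering x[2], x[3], x[4]
```

The k = 0 bin is simply the sum of the windowed samples, and for a real signal the bins satisfy X(n, N-k) = conj(X(n,k)).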

Page 7: Ch7-Short-Time Fourier Transform Analysis and Synthesis


Fourier-Transform View

Page 8: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Example 7.1
Let x[n] be a periodic impulse train sequence:

$$x[n] = \sum_{l=-\infty}^{\infty} \delta[n - lP]$$

Also let w[n] be a triangle of length P.

[Figure: impulse train x[n] with impulses at n = ..., -P, 0, P, 2P, 3P, ...; triangular window w[n] spanning P points.]

Page 9: Ch7-Short-Time Fourier Transform Analysis and Synthesis

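Example 7.1

$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\omega m} = \sum_{m=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} \delta[m-lP]\, w[n-m]\, e^{-j\omega m} = \sum_{l=-\infty}^{\infty} w[n-lP]\, e^{-j\omega lP}$$

The term δ[m-lP] is non-zero only for m = lP, so each term of the final sum is a window located at lP with linear phase -ωlP.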

Page 10: Ch7-Short-Time Fourier Transform Analysis and Synthesis

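Example 7.1
Since the shifted windows w[n-lP] do not overlap, for each n the magnitude |X(n,ω)| is constant in ω and the phase ∠X(n,ω) is linear in ω.
Computation of the DFT for N = P gives:

$$X(n,k) = \sum_{m} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km} = \sum_{m} \sum_{l} \delta[m-lP]\, w[n-m]\, e^{-j\frac{2\pi}{P}km} = \sum_{l} w[n-lP]\, e^{-j\frac{2\pi}{P}klP}$$

Because e^{-j2πkl} = 1, the result does not depend on k:

$$X(n,k) = \sum_{l=-\infty}^{\infty} w[n-lP]$$

This is the DFT of translated, non-overlapping windows with a phase shift of zero (due to the frequency sampling).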
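The result of Example 7.1 can be checked numerically. This is a sketch under assumed values: P = 8, a five-period impulse train, and one particular parameterization of a triangular window with P non-zero points. None of these numbers come from the example itself.

```python
import cmath

P = 8                                   # impulse period (assumed value)
x = [1.0 if n % P == 0 else 0.0 for n in range(5 * P)]
# Triangular window with P non-zero points (one assumed parameterization).
w = [1.0 - abs(i - (P - 1) / 2) / (P / 2) for i in range(P)]

def X(n, k, N=P):
    # X(n,k) = sum_m x[m] w[n-m] e^{-j 2 pi k m / N}, w non-zero on 0..P-1
    return sum(x[m] * w[n - m] * cmath.exp(-2j * cmath.pi * k * m / N)
               for m in range(max(0, n - P + 1), min(len(x) - 1, n) + 1))

n = 2 * P + 3
bins = [X(n, k) for k in range(P)]      # every bin should equal w[n mod P]
```

With N = P, every frequency bin at time n equals the single window value w[n mod P]: the magnitude is constant across k and the phase is zero.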

Page 11: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Spectrogram |X(n,ω)|²
If the analysis window length is ≤ the pitch period ⇒ wideband spectrogram ⇒ vertical striations.
Otherwise ⇒ narrowband spectrogram ⇒ horizontal striations.

How often should the analysis window be applied to the signal?
X(n,k) is decimated by a temporal decimation factor L: X(nL,k) = DFT{fnL[m]}.
The sections fnL[m] are a subset of the sections fn[m].

How to choose the sampling rates in time (L) and in frequency (N, the FFT length) is addressed in one of the forthcoming sections.

Page 12: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Analysis window

[Figure: analysis window w[pL-m] applied to x[m] at frame positions p = 1, 2, 3, spaced L samples apart.]

Page 13: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Spectrogram |X(n,ω)|²

Page 14: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Fourier-Transform View
Note that X(n,ω) is periodic in ω with period 2π (same as the Fourier transform) and is Hermitian symmetric for real sequences:
Re{X(n,ω)} and |X(n,ω)| are symmetric in ω.
Im{X(n,ω)} and arg{X(n,ω)} are anti-symmetric in ω.

A time shift results in a linear phase shift (same as in the Fourier transform). For the shifted sequence x[n-n0]:

$$\tilde{X}(n,\omega) = \sum_{m} x[m-n_0]\, w[n-m]\, e^{-j\omega m} = \sum_{q} x[q]\, w[n-n_0-q]\, e^{-j\omega (q+n_0)} = e^{-j\omega n_0} \sum_{q} x[q]\, w[n-n_0-q]\, e^{-j\omega q} = e^{-j\omega n_0}\, X(n-n_0,\omega)$$

Thus, a shift by n0 in the original time sequence introduces a linear phase, but also a shift in time, corresponding to a shift of each short-time section by n0.
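The time-shift property can be spot-checked at a single (n, ω) point. This is a small sketch; the signal, window, shift n0, and the evaluation point are arbitrary assumed values.

```python
import cmath

def stft(x, w, n, omega):
    """X(n, omega) = sum_m x[m] w[n-m] e^{-j omega m}; x is zero outside its list."""
    Nw = len(w)
    return sum(x[m] * w[n - m] * cmath.exp(-1j * omega * m)
               for m in range(max(0, n - Nw + 1), min(len(x) - 1, n) + 1))

x = [0.5, -1.0, 2.0, 1.5, -0.5, 0.25]   # toy signal (assumption)
w = [0.2, 0.6, 1.0, 0.6, 0.2]           # toy symmetric window (assumption)
n0 = 3
x_shift = [0.0] * n0 + x                # the delayed sequence x[n - n0]

n, omega = 6, 0.7
lhs = stft(x_shift, w, n, omega)                               # STFT of shifted signal
rhs = cmath.exp(-1j * omega * n0) * stft(x, w, n - n0, omega)  # e^{-j w n0} X(n - n0, w)
```

Both sides agree to rounding error: the delayed signal produces the original STFT delayed by n0 in time and multiplied by the linear-phase factor.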

Page 15: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Filtering View
In this interpretation w[n] is treated as the impulse response of a filter, so w[n] is referred to as the analysis filter.
Fix the value of ω = ω0:

$$X(n,\omega_0) = \sum_{m=-\infty}^{\infty} \left( x[m]\, e^{-j\omega_0 m} \right) w[n-m]$$

This equation represents the convolution of the sequence x[n]e^{-jω0 n} with the sequence w[n]. Thus:

$$X(n,\omega_0) = \left( x[n]\, e^{-j\omega_0 n} \right) * w[n]$$

Page 16: Ch7-Short-Time Fourier Transform Analysis and Synthesis

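Filtering View
The product x[n]e^{-jω0 n} can be interpreted as a modulation of x[n] by the frequency ω0: the spectrum of x[n] is shifted so that the component at ω0 moves to the origin, where the analysis filter w[n] acts on it.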

Page 17: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Filtering View
Alternate view:

$$X(n,\omega_0) = e^{-j\omega_0 n} \left( x[n] * w[n]\, e^{j\omega_0 n} \right)$$

The discrete STFT can also be interpreted from the filtering viewpoint:

$$X(n,k) = e^{-j\frac{2\pi}{N}kn} \left( x[n] * w[n]\, e^{j\frac{2\pi}{N}kn} \right)$$

This equation leads to the interpretation of the discrete STFT as the output of the filter bank shown in the next slide.
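The two filtering interpretations (modulate then lowpass filter, versus bandpass filter then demodulate) give the same value as the direct definition. The sketch below compares all three at one time instant; the helper `conv_at`, the signal, the filter taps, and ω0 are illustrative assumptions.

```python
import cmath

def conv_at(a, b, n):
    """(a * b)[n] for finite causal sequences stored as lists."""
    return sum(a[m] * b[n - m] for m in range(len(a)) if 0 <= n - m < len(b))

x = [1.0, -0.5, 2.0, 0.0, 1.5, -1.0, 0.75]   # toy signal (assumption)
w = [0.25, 0.75, 1.0, 0.75, 0.25]            # toy analysis filter (assumption)
omega0, n = 0.9, 5

# Direct definition: X(n, w0) = sum_m x[m] w[n-m] e^{-j w0 m}
direct = sum(x[m] * w[n - m] * cmath.exp(-1j * omega0 * m)
             for m in range(len(x)) if 0 <= n - m < len(w))

# Lowpass view: modulate x[n] by e^{-j w0 n}, then filter with w[n]
x_mod = [x[m] * cmath.exp(-1j * omega0 * m) for m in range(len(x))]
lowpass_view = conv_at(x_mod, w, n)

# Bandpass view: filter x[n] with w[n] e^{j w0 n}, then demodulate by e^{-j w0 n}
h = [w[i] * cmath.exp(1j * omega0 * i) for i in range(len(w))]
bandpass_view = cmath.exp(-1j * omega0 * n) * conv_at(x, h, n)
```

Setting ω0 = 2πk/N for k = 0..N-1 turns the bandpass view into one channel per DFT bin, which is the filter-bank structure referred to above.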

Page 18: Ch7-Short-Time Fourier Transform Analysis and Synthesis


Filtering View

Page 19: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Filtering View
General properties:

1. If x[n] has length N and w[n] has length M, then X(n,ω) has length N+M-1 along n.

2. The bandwidth of X(n,ω0) is less than or equal to that of w[n].

3. The sequence X(n,ω0) has its spectrum centered at the origin.

Page 20: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Example 7.2
Consider a Gaussian window of the form:

$$w[n] = e^{-a(n-n_0)^2}$$

The discrete STFT with DFT length N can therefore be considered as a bank of filters with impulse responses:

$$h_k[n] = e^{-a(n-n_0)^2}\, e^{j\frac{2\pi}{N}kn}$$

For x[n] = δ[n], x[n]*hk[n] = hk[n].
If N = 50, corresponding to bandpass filters spaced by 200 Hz for a sampling rate of 10000 samples/s, then:

Page 21: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Example 7.2
For k = 0, 5, 10, 15 the following is obtained:

$$h_0[n] = e^{-a(n-n_0)^2}\, e^{j\frac{2\pi}{50}0n} = e^{-a(n-n_0)^2}$$
$$h_5[n] = e^{-a(n-n_0)^2}\, e^{j\frac{2\pi}{50}5n}$$
$$h_{10}[n] = e^{-a(n-n_0)^2}\, e^{j\frac{2\pi}{50}10n}$$
$$h_{15}[n] = e^{-a(n-n_0)^2}\, e^{j\frac{2\pi}{50}15n}$$
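The center frequencies of these four filters follow from the bin spacing fs/N. The sketch below uses fs and N from the example; the Gaussian parameters a and n0 are not given in the slides and are assumed values chosen only for illustration.

```python
import cmath
import math

fs, N = 10000.0, 50            # sampling rate and DFT length from the example
a, n0 = 0.02, 25               # Gaussian width and center: assumed for illustration

def h(k, n):
    # h_k[n] = e^{-a (n - n0)^2} e^{j 2 pi k n / N}
    return math.exp(-a * (n - n0) ** 2) * cmath.exp(2j * math.pi * k * n / N)

spacing_hz = fs / N                                   # 200 Hz between adjacent filters
centers_hz = [k * fs / N for k in (0, 5, 10, 15)]     # 0, 1000, 2000, 3000 Hz
```

Every filter has the same envelope |h_k[n]| = w[n]; only the complex carrier, and therefore the center frequency, changes with k.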

Page 22: Ch7-Short-Time Fourier Transform Analysis and Synthesis


Example 7.2

Page 23: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Example 7.3
Consider the filter bank of Example 7.2, which was designed with a Gaussian window of the form:

$$w[n] = e^{-a(n-n_0)^2}$$

Figure 7.7 shows the Fourier transform magnitudes of the outputs of the four complex bandpass filters hk[n] for k = 0, 5, 10, and 15, as presented in the previous slides and depicted in Figure 7.6.

Page 24: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Example 7.3
After demodulation, the resulting bandpass outputs have the same spectral shape as in the figure but are centered at the origin.

Page 25: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Time-Frequency Resolution Tradeoffs
As discussed in Chapter 3, the basic issue in analysis window selection is the compromise between a long window, for showing signal detail in frequency, and a short window, for representing fine temporal structure:

$$f_n[m] = x[m]\, w[n-m] \;\longleftrightarrow\; X(n,\omega) = \frac{1}{2\pi} \int_{-\pi}^{\pi} W(\theta)\, e^{j\theta n}\, X(\omega+\theta)\, d\theta$$

Since both X(ω) and W(ω) are periodic with period 2π, the linear convolution is essentially circular.
From the equation above: W(ω) smears (smooths) X(ω).
For good frequency resolution we want W(ω) as narrow as possible, ideally W(ω) = δ(ω).
However, W(ω) = δ(ω) corresponds to an infinitely long w[n], which gives poor time resolution.
These are conflicting goals.

Page 26: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Example 7.4
Figure 7.8 depicts the time-frequency resolution tradeoff:

Page 27: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Time-Frequency Resolution Tradeoffs
The previous example shows that the smoothing interpretation of the STFT is not valid for non-stationary sequences.
For steady signals, long analysis windows are appropriate; they yield good frequency resolution, as depicted in the next figure.

Page 28: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Time-Frequency Resolution Tradeoffs
However, for short and transient signals (plosive speech, flaps, diphthongs, etc.), short windows are preferred in order to capture temporal events.
Shorter windows yield poorer frequency resolution.

Page 29: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Short-Time Synthesis
How can the original sequence be recovered from its discrete-time STFT?
The inversion is represented mathematically by a synthesis equation, which expresses a sequence in terms of its discrete-time STFT.

Recall that for fn[m] = x[m]w[n-m]:

$$X(n,\omega) = \sum_{m=-\infty}^{\infty} f_n[m]\, e^{-j\omega m}$$

Thus:

$$\mathcal{F}^{-1}\{X(n,\omega)\} = f_n[m] = x[m]\, w[n-m]$$

Wherever w[n-m] ≠ 0, x[m] is recovered completely by dividing fn[m] by the window value.

Page 30: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Short-Time Synthesis
For each n, taking the inverse Fourier transform of the corresponding function of frequency yields the sequence fn[m].
Evaluating fn[m] at m = n gives x[n]w[0]. For w[0] ≠ 0, x[n] is obtained as fn[n]/w[0].

The process of taking the inverse Fourier transform of X(n,ω) for a specific n and then dividing by w[0] is represented by the following relation:

$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$

This is the synthesis equation for the discrete-time STFT.

Page 31: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Short-Time Synthesis
In contrast to the discrete-time STFT X(n,ω), the discrete STFT X(n,k) is not always invertible.
Example 1.
Consider the case when w[n] is bandlimited with bandwidth B.

Page 32: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Short-Time Synthesis
Note: if there are frequency components of x[n] that do not pass through any of the filter regions of the discrete STFT, then X(n,k) is not a unique representation of x[n], and x[n] cannot be recovered from it.

Example 2.
Consider X(n,k) decimated in time by a factor L, i.e., the STFT is computed every L samples. w[n] is non-zero over its length Nw.
If L > Nw, there are gaps in time where x[n] is not represented.
Thus in such cases x[n] again cannot be recovered.

Page 33: Ch7-Short-Time Fourier Transform Analysis and Synthesis

[Figure: the case L > Nw. Windows w[pL-m] of length Nw applied to x[m] every L samples leave gaps in time.]

Page 34: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Short-Time Synthesis
Conclusion:
Constraints must be adopted to ensure uniqueness and invertibility:
1. Adequate frequency sampling: B ≥ 2π/Nw (B: window bandwidth)
2. Proper temporal decimation: L ≤ Nw

Page 35: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Filter Bank Summation (FBS) Method
The traditional short-time synthesis method is commonly referred to as the Filter Bank Summation (FBS) method.
FBS is best described in terms of the filtering interpretation of the discrete STFT:
The discrete STFT is considered to be the set of outputs of a bank of filters.
The output of each filter is modulated with a complex exponential.
The modulated filter outputs are summed at each instant of time to obtain the corresponding time sample of the original sequence (see Figure 7.5(b) on slide 18).

Page 36: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Filter Bank Summation (FBS) Method
Recall the synthesis equation given earlier:

$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$

The FBS method carries out a discrete version of this equation by using the discrete STFT X(n,k):

$$y[n] = \frac{1}{N\, w[0]} \sum_{k=0}^{N-1} X(n,k)\, e^{j\frac{2\pi}{N}kn}$$

The goal is to derive the conditions that ensure y[n] = x[n].

Page 37: Ch7-Short-Time Fourier Transform Analysis and Synthesis

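Filter Bank Summation (FBS) Method
From Figure 7.5 (x[n] → analysis followed by synthesis → y[n]):

$$y[n] = \frac{1}{N\, w[0]} \sum_{k=0}^{N-1} \underbrace{\left[ \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j\frac{2\pi}{N}km} \right]}_{X(n,k)} e^{j\frac{2\pi}{N}kn}$$

Interchanging the order of summation, this equation reduces to:

$$y[n] = \frac{1}{N\, w[0]} \sum_{k=0}^{N-1} x[n] * \left( w[n]\, e^{j\frac{2\pi}{N}nk} \right)$$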

Page 38: Ch7-Short-Time Fourier Transform Analysis and Synthesis

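Filter Bank Summation (FBS) Method
Furthermore:

$$y[n] = \frac{1}{N\, w[0]} \sum_{k=0}^{N-1} x[n] * \left( w[n]\, e^{j\frac{2\pi}{N}nk} \right)$$
$$y[n] = \frac{1}{N\, w[0]}\, x[n] * \left( w[n] \sum_{k=0}^{N-1} e^{j\frac{2\pi}{N}nk} \right)$$
$$y[n] = \frac{1}{N\, w[0]}\, x[n] * \left( w[n]\, N \sum_{r=-\infty}^{\infty} \delta[n-rN] \right)$$

The last sum is a periodic impulse train with period N.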

Page 39: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Filter Bank Summation (FBS) Method
Thus:

$$y[n] = \frac{1}{w[0]}\, x[n] * \left( w[n] \sum_{r=-\infty}^{\infty} \delta[n-rN] \right)$$

y[n] is the convolution of x[n] with the product of the analysis window and a periodic impulse sequence.

Note: the product

$$w[n] \sum_{r=-\infty}^{\infty} \delta[n-rN]$$

reduces to w[0]δ[n], so that y[n] = x[n], if:
the window length satisfies Nw ≤ N, or
for Nw > N, the window satisfies w[rN] = 0 for r ≠ 0, that is

$$w[rN] = 0 \quad \text{for } r = \pm 1, \pm 2, \pm 3, \ldots$$
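The FBS condition can be observed numerically. The sketch below synthesizes y[n] from the discrete STFT for one DFT length that satisfies Nw ≤ N and one that does not. Function names, the signal, and the window taps are assumptions made for illustration, and the direct O(N²) sums are chosen for clarity, not speed.

```python
import cmath

def stft_frame(x, w, n, N):
    """X(n,k) = sum_m x[m] w[n-m] e^{-j 2 pi k m / N} for k = 0..N-1."""
    Nw = len(w)
    lo, hi = max(0, n - Nw + 1), min(len(x) - 1, n)
    return [sum(x[m] * w[n - m] * cmath.exp(-2j * cmath.pi * k * m / N)
                for m in range(lo, hi + 1)) for k in range(N)]

def fbs(x, w, N):
    """y[n] = 1/(N w[0]) * sum_k X(n,k) e^{j 2 pi k n / N}."""
    y = []
    for n in range(len(x)):
        X = stft_frame(x, w, n, N)
        s = sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N))
        y.append((s / (N * w[0])).real)
    return y

x = [1.0, -2.0, 0.5, 3.0, -1.5, 2.5, 0.0, 1.0, -0.5, 2.0]   # toy signal (assumption)
w = [1.0, 0.8, 0.6, 0.4, 0.2]                               # Nw = 5 (assumption)

y_ok = fbs(x, w, N=8)    # Nw <= N: FBS constraint met, y[n] = x[n]
y_bad = fbs(x, w, N=4)   # Nw > N with w[4] != 0: constraint violated
```

In the violating case the periodic impulse train also picks up w[N], so the output becomes y[n] = x[n] + (w[4]/w[0]) x[n-4], an echo of the input delayed by N samples.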

Page 40: Ch7-Short-Time Fourier Transform Analysis and Synthesis


Filter Bank Summation (FBS) Method

Page 41: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Filter Bank Summation (FBS) Method
This constraint is known as the FBS constraint. It must be fulfilled to ensure exact signal synthesis with the FBS method.

The constraint is commonly expressed in the frequency domain:

$$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\, w[0]$$

This expression states that the frequency responses of the analysis filters should sum to a constant across the entire bandwidth.

We conclude this discussion by stating that a filter bank with N filters, based on an analysis filter of length less than or equal to N, is always an all-pass system.
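The frequency-domain form of the FBS constraint can be evaluated at a few arbitrary frequencies. This is a sketch with an assumed five-tap window; `W` and `fbs_sum` are hypothetical helper names.

```python
import cmath

def W(w, omega):
    """Fourier transform of the window: W(omega) = sum_n w[n] e^{-j omega n}."""
    return sum(w[n] * cmath.exp(-1j * omega * n) for n in range(len(w)))

def fbs_sum(w, N, omega):
    """sum_{k=0}^{N-1} W(omega - 2 pi k / N)."""
    return sum(W(w, omega - 2 * cmath.pi * k / N) for k in range(N))

w = [1.0, 0.8, 0.6, 0.4, 0.2]      # Nw = 5 (assumption)
N = 8                              # Nw <= N
values = [fbs_sum(w, N, omega) for omega in (0.0, 0.3, 1.7, 2.9)]
violated = fbs_sum(w, 4, 0.3)      # N = 4 < Nw: the sum now depends on omega
```

For Nw ≤ N every entry of `values` equals N·w[0] = 8 regardless of ω. With N = 4 the sum becomes 4(w[0] + w[4]e^{-j4ω}), which is no longer constant.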

Page 42: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Generalized FBS Method

$$x[n] = \frac{1}{2\pi} \int_{-\pi}^{\pi} \left[ \sum_{r=-\infty}^{\infty} f[n, n-r]\, X(r,\omega) \right] e^{j\omega n}\, d\omega$$

Note:
The "smoothing" function f[n,m] is referred to as the time-varying synthesis filter.

It can be shown that any f[n,m] that fulfills the condition below makes the synthesis equation above valid (Exercise 7.6):

$$\sum_{m=-\infty}^{\infty} f[n,-m]\, w[m] = 1$$

Note also that the basic FBS method is obtained by setting the synthesis filter to be a non-smoothing filter:

f[n,m] = δ[m] (with the window normalized so that w[0] = 1)

Page 43: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Generalized FBS Method
Consider the discrete STFT with decimation factor L.
The generalized FBS synthesis of the signal is given by:

$$y[n] = \frac{L}{N} \sum_{r=-\infty}^{\infty} \sum_{k=0}^{N-1} f[n, n-rL]\, X(rL,k)\, e^{j\frac{2\pi}{N}nk}$$

Furthermore, consider a time-invariant smoothing filter:

f[n,m] = f[m]
That is:
f[n,n-rL] = f[n-rL]

Page 44: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Generalized FBS Method
Thus:

$$y[n] = \frac{L}{N} \sum_{r=-\infty}^{\infty} f[n-rL] \sum_{k=0}^{N-1} X(rL,k)\, e^{j\frac{2\pi}{N}nk}$$

This equation yields y[n] = x[n] when the following constraint is satisfied by the analysis and synthesis filters as well as the temporal decimation and frequency sampling factors:

$$L \sum_{r=-\infty}^{\infty} f[n-rL]\, w[rL-n+pN] = \delta[p], \quad \text{for all } n$$

For f[m] = δ[m] and L = 1 this method reduces to the basic FBS method.

Page 45: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Generalized FBS Method
We are interested in the case L > 1 and in using f[n] as an interpolator.
Interpolation FBS methods:
1. Helical interpolation (Portnoff)
2. Weighted overlap-add method (Crochiere)

Page 46: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Overlap-Add (OLA) Method
The FBS method was motivated by the filtering view of the STFT.
The OLA method was motivated by the Fourier-transform view of the STFT.

In the OLA method:
1. The inverse DFT is taken for each fixed time in the discrete STFT.
2. An overlap-and-add operation between the short-time sections is performed.

This works provided that the analysis window is designed such that the overlap-and-add operation effectively eliminates the analysis window from the synthesized sequence.

The basic idea is that the redundancy within overlapping segments and the averaging of the redundant samples remove the effect of windowing.

Page 47: Ch7-Short-Time Fourier Transform Analysis and Synthesis

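Overlap-Add (OLA) Method
Recall the short-time synthesis relation:

$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$

If x[n] is averaged over many short-time segments and normalized by W(0), then

$$x[n] = \frac{1}{2\pi\, W(0)} \int_{-\pi}^{\pi} \sum_{p=-\infty}^{\infty} X(p,\omega)\, e^{j\omega n}\, d\omega$$

where

$$W(0) = \sum_{n=-\infty}^{\infty} w[n]$$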

Page 48: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Overlap-Add (OLA) Method
The discretized version of OLA is given by:

$$y[n] = \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} \underbrace{\left[ \frac{1}{N} \sum_{k=0}^{N-1} X(p,k)\, e^{j\frac{2\pi}{N}kn} \right]}_{\text{IDFT: } f_p[n] = x[n]\, w[p-n]}$$

Note that the IDFT returns fp[n] provided that N ≥ Nw. The expression for y[n] thus becomes:

$$y[n] = \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} x[n]\, w[p-n] = x[n]\, \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} w[p-n]$$

Provided that

$$\sum_{p=-\infty}^{\infty} w[p-n] = W(0)$$

we obtain y[n] = x[n].

This condition is always true, because the sum of the values of a sequence equals the value of its Fourier transform at ω = 0 (the DC value W(0) is by definition the sum of the window samples).

Page 49: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Overlap-Add (OLA) Method
For decimation in time by a factor L, it can be shown (Exercise 7.4) that if

$$\sum_{p=-\infty}^{\infty} w[pL-n] = \frac{W(0)}{L}$$

then x[n] can be synthesized using the following equation:

$$y[n] = \frac{L}{W(0)} \sum_{p=-\infty}^{\infty} \left[ \frac{1}{N} \sum_{k=0}^{N-1} X(pL,k)\, e^{j\frac{2\pi}{N}kn} \right]$$

The first equation is the general constraint imposed by the OLA method. It requires the sum of all the analysis windows (obtained by sliding w[n] in L-point increments) to add up to a constant, as shown in the next figure.
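The decimated OLA synthesis can be sketched end to end: analyze every L samples, invert each frame with an N-point inverse DFT, overlap-add, and scale by L/W(0). The window below is a periodic Hamming window of length 8 with L = 2, an assumed choice for which the sliding windows sum exactly to W(0)/L. The signal and the function name `ola` are also illustrative assumptions.

```python
import cmath
import math

def ola(x, w, L, N):
    """y[n] = L/W(0) * sum_p IDFT_N{ X(pL, k) }[n], assuming N >= len(w)."""
    Nw, W0 = len(w), sum(w)
    y = [0.0] * len(x)
    p = 0
    while p * L - Nw + 1 < len(x):                 # frames that still touch the signal
        lo, hi = max(0, p * L - Nw + 1), min(len(x) - 1, p * L)
        # Analysis: X(pL, k) = sum_m x[m] w[pL - m] e^{-j 2 pi k m / N}
        X = [sum(x[m] * w[p * L - m] * cmath.exp(-2j * math.pi * k * m / N)
                 for m in range(lo, hi + 1)) for k in range(N)]
        # Synthesis: inverse DFT on the frame support, then overlap-add
        for n in range(lo, hi + 1):
            f = sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            y[n] += f.real
        p += 1
    return [L * v / W0 for v in y]

Nw = 8
w = [0.54 - 0.46 * math.cos(2 * math.pi * i / Nw) for i in range(Nw)]  # periodic Hamming
x = [math.sin(0.4 * n) + 0.3 * n for n in range(24)]                   # toy signal
y = ola(x, w, L=2, N=8)
```

Each sample is covered by Nw/L = 4 frames whose window values add up to W(0)/L, so the overlap-add cancels the window and y[n] matches x[n] to rounding error.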

Page 50: Ch7-Short-Time Fourier Transform Analysis and Synthesis


Overlap-Add (OLA) Method

Page 51: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Overlap-Add (OLA) Method
Duality of the OLA constraint and the FBS constraint:

The FBS constraint is satisfied by all finite-length windows whose length Nw does not exceed the number of analysis filters N (Nw ≤ N).

Analogously, it can be shown that the OLA constraint is satisfied by all finite-bandwidth analysis windows whose maximum frequency is less than 2π/L (where L is the temporal decimation factor).
In addition, this finite-bandwidth constraint can be relaxed by allowing the shifted window transform replicas to take on the value zero at the frequency origin ω = 0, i.e.:

$$W(\omega) = 0 \quad \text{at } \omega = \frac{2\pi}{L}k, \quad k = 1, 2, \ldots, L-1$$

This is analogous to the FBS constraint for Nw > N, where the window w[n] is required to take on the value zero at n = N, 2N, 3N, ...

FBS constraint:

$$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\, w[0]$$

OLA constraint:

$$\sum_{p=-\infty}^{\infty} w[pL-n] = \frac{W(0)}{L}$$

Page 52: Ch7-Short-Time Fourier Transform Analysis and Synthesis


Overlap-Add (OLA) Method

Page 53: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Time-Frequency Sampling
This section gives a different, qualitative view of the time-frequency sampling concepts behind the OLA and FBS constraints, from the perspective of classical time-domain and frequency-domain aliasing.

The following discussion serves as an additional summary of the sampling issues for those two methods and motivates the earlier statement that sufficient but not necessary conditions for invertibility of the discrete STFT are:

1. The analysis window is non-zero over its finite length Nw.
2. The temporal decimation factor satisfies L ≤ Nw.
3. The frequency sampling interval satisfies 2π/N ≤ 2π/Nw.

Page 54: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Time-Frequency Sampling
Consider the windowed (short-time) signal fn[m] = x[m]w[n-m], where
X(n,ω) is the Fourier transform of fn[m], and
the analysis window has duration Nw.

From the Fourier-transform point of view:
Reconstruction of fn[m] from X(n,k) requires a frequency sampling interval of 2π/Nw or finer.

From the time-domain (filtering) point of view:
The time decimation interval L must meet the Nyquist criterion based on the bandwidth of the window w[n]. This implies sampling X(n,k) at a time interval L ≤ 2π/ωc to avoid frequency-domain aliasing of the time sequence X(n,ω).
ωc is the bandwidth of W(ω), which occupies [-ωc, ωc].

Page 55: Ch7-Short-Time Fourier Transform Analysis and Synthesis


Time-Frequency Sampling

Page 56: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Time-Frequency Sampling
Sufficient (but not necessary) conditions for signal reconstruction are:
1. The window is non-zero over its length Nw.
2. The temporal decimation factor satisfies L ≤ Nw (L ≤ 2π/ωc).
3. The frequency sampling interval satisfies 2π/N ≤ 2π/Nw.

To avoid aliasing:
I. In the time domain: ensure condition 3.
II. In the frequency domain: ensure condition 2.

Page 57: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Time Decimation Sampling
Implications for the use of practical windows:

I. Rectangular window of length Nw
Assuming a bandwidth equal to the extent of the main lobe, [-2π/Nw, 2π/Nw], B = 4π/Nw:

$$L \le \frac{2\pi}{B} = \frac{N_w}{2}$$

This corresponds to 50% overlap between windows.

II. Hamming window of length Nw
Bandwidth B = 8π/Nw:

$$L \le \frac{2\pi}{B} = \frac{N_w}{4}$$

This corresponds to 75% overlap between windows.
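The two hop-size bounds can be reproduced with a few lines of arithmetic, and the corresponding OLA sums can be checked directly. This sketch assumes Nw = 16 and uses the periodic form of the Hamming window (denominator Nw rather than Nw-1), for which the sum of shifted windows is exactly constant. The helper names are hypothetical.

```python
import math

def max_hop(bandwidth):
    """Largest temporal decimation factor allowed by L <= 2*pi / B."""
    return 2 * math.pi / bandwidth

def ola_sum(w, L, n):
    """sum_p w[pL - n] for a window that is non-zero on 0..Nw-1 (n >= 0)."""
    first = -(-n // L)                   # smallest p with pL >= n
    last = (n + len(w) - 1) // L         # largest p with pL <= n + Nw - 1
    return sum(w[p * L - n] for p in range(first, last + 1))

Nw = 16                                              # assumed window length
rect = [1.0] * Nw
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * i / Nw) for i in range(Nw)]

L_rect = max_hop(4 * math.pi / Nw)                   # Nw / 2 -> 50% overlap
L_hamm = max_hop(8 * math.pi / Nw)                   # Nw / 4 -> 75% overlap

rect_sums = [ola_sum(rect, Nw // 2, n) for n in range(Nw)]
hamm_sums = [ola_sum(hamming, Nw // 4, n) for n in range(Nw)]
```

At these hop sizes both windows satisfy the OLA constraint: the shifted copies add up to W(0)/L at every sample (2 for the rectangular window, 2.16 for the Hamming window).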

Page 58: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Summary
OLA Method (DFT of order N)

1. No time-domain aliasing occurs if the window length Nw satisfies 2π/N ≤ 2π/Nw, i.e., Nw ≤ N.

2. No frequency-domain aliasing occurs if the decimation factor L is small enough that the filter bandwidth satisfies ωc ≤ 2π/L.

3. If zeros are allowed in W(ω), then condition 2 can be relaxed. In this case we can under-sample in time and still recover the sequence.

Page 59: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Summary
FBS Method

1. No frequency-domain aliasing occurs if the decimation factor L meets the Nyquist criterion, i.e., L ≤ 2π/ωc, where ωc is the bandwidth of w[n].

2. No time-domain aliasing occurs if 2π/N ≤ 2π/Nw, i.e., Nw ≤ N.

3. If zeros in w[n] are allowed, then condition 2 can be relaxed. In this case we can under-sample in frequency and still recover the sequence.

Page 60: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Short-Time Fourier Transform Magnitude (STFTM)
The spectrogram is a major tool in speech applications: it is the squared STFT magnitude (STFTM).

It has been suggested that the human ear extracts perceptual information strictly from a spectrogram-like representation of speech (J.C. Anderson, "Speech Analysis/Synthesis Based on Perception", PhD Thesis, MIT, 1984).

Experienced speech researchers have trained themselves to "read" the spectrogram itself (Victor Zue, MIT). This is the primary topic of FIT-ece5528, "Acoustics of American Speech".

Page 61: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Short-Time Fourier Transform Magnitude (STFTM)
The STFTM discards phase information, which makes it useful in a number of application areas:
Time-scale modification
Speech enhancement

In all these applications, estimating the phase of the speech is difficult (e.g., because of noise in the signal).

Furthermore, a number of techniques have been developed to obtain a phase estimate from an STFT magnitude.

This section introduces the STFTM as an alternative time-frequency signal representation.

In addition, analysis and synthesis techniques will be developed for the STFTM.

Page 62: Ch7-Short-Time Fourier Transform Analysis and Synthesis

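Short-Time Fourier Transform Magnitude (STFTM)
Relationship between the squared magnitude and the autocorrelation:

Short-time autocorrelation (m is the autocorrelation "lag"):

$$r[n,m] = \frac{1}{2\pi} \int_{-\pi}^{\pi} |X(n,\omega)|^2\, e^{j\omega m}\, d\omega$$

Short-time squared magnitude:

$$|X(n,\omega)|^2 = \sum_{m=-\infty}^{\infty} r[n,m]\, e^{-j\omega m}$$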

Page 63: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Short-Time Fourier Transform Magnitude (STFTM)
Furthermore, the autocorrelation r[n,m] is given by the convolution of the short-time signal with its time-reversed version:

r[n,m] = fn[m] * fn[-m]
where
fn[m] = x[m]w[n-m]
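The link between the squared magnitude and the short-time autocorrelation also holds for the sampled STFTM, as long as the DFT is long enough to hold all 2Nw-1 lags. The sketch below compares the two computations; the signal, the window, and the choice N = 2Nw are assumptions for illustration.

```python
import cmath
import math

x = [1.0, -2.0, 0.5, 3.0, -1.5, 2.5, 0.0, 1.0]   # toy signal (assumption)
w = [1.0, 0.7, 0.4, 0.2]                         # toy window, Nw = 4 (assumption)
n, Nw = 5, len(w)
N = 2 * Nw                                       # N >= 2*Nw - 1 avoids aliasing of r[n, m]

# Short-time section f_n[m] = x[m] w[n-m], stored over its support m = n-Nw+1 .. n
f = [x[m] * w[n - m] for m in range(n - Nw + 1, n + 1)]

def r_direct(lag):
    # r[n, lag] = sum_q f_n[q] f_n[q - lag]  (f_n convolved with its time reverse)
    return sum(f[i] * f[i - lag] for i in range(lag, Nw))

# Squared STFT magnitude sampled at N frequencies
mag2 = [abs(sum(f[i] * cmath.exp(-2j * math.pi * k * (n - Nw + 1 + i) / N)
                for i in range(Nw))) ** 2 for k in range(N)]

def r_from_stftm(lag):
    # Inverse DFT of |X(n,k)|^2 evaluated at the given lag
    return sum(mag2[k] * cmath.exp(2j * math.pi * k * lag / N)
               for k in range(N)).real / N
```

For every lag from 0 to Nw-1 the two functions return the same value, which is what allows the autocorrelation to be read off a magnitude-only representation.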

Page 64: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Representation
Under what conditions can the STFTM be used to represent a sequence uniquely?
Note that:

|F{x[n]}| = |F{-x[n]}|

⇒ There is a sign ambiguity, so the STFTM is not a unique representation in all cases.

However, by imposing certain mild restrictions on the analysis window and the signal, unique signal representation is indeed possible with the discrete-time STFTM.

Page 65: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Representation
Suppose x[n] is the sum of two signals, x1[n] and x2[n], occupying different regions of the n-axis.

Furthermore, suppose that the gap of zeros between x1[n] and x2[n] is large enough that there is no analysis window position for which the corresponding short-time section includes non-zero samples of both x1[n] and x2[n].

Because of the sign ambiguity, the STFTM of x1[n] + x2[n], x1[n] - x2[n], and -x1[n] + x2[n] is the same.

Page 66: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Representation
Any uniqueness conditions must include a restriction on the length of zero gaps between non-zero portions of the signal x[n].

Sufficient uniqueness conditions are the following:
1. The analysis window w[n] is a known sequence of finite length Nw, with no zeros over its duration.
2. The sequence x[n] is one-sided with at most Nw-2 consecutive zero samples, and the sign of its first non-zero value is known.

Page 67: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Representation
If successive STFTM frames correspond to overlapping signal segments, then:
If the short-time spectral magnitude of the signal segment at time n is known, the spectral magnitude of the adjacent section at time n+1 must be consistent, in the region of overlap, with the known short-time section.

⇒ If the analysis window is non-zero and of length Nw, then after dividing out the analysis window, the first Nw-1 samples of the segment at time n+1 must equal the last Nw-1 samples of the segment at time n (as illustrated in the next slide).

⇒ If the last sample of a segment can be extrapolated from its first Nw-1 values, this process can be repeated to obtain the entire signal x[n].

Page 68: Ch7-Short-Time Fourier Transform Analysis and Synthesis


Signal Representation

Page 69: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Representation
To develop the procedure for extrapolating the next sample of a sequence from its STFTM, assume that the first Nw-1 samples under the analysis window positioned at time n are known.
That is, the sequence x[n] has been obtained up to time n-1 from its STFTM.
The goal is to compute the sample x[n] from these initial samples and the STFT magnitude |X(n,ω)| or, equivalently, r[n,m].

Page 70: Ch7-Short-Time Fourier Transform Analysis and Synthesis

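Signal Representation
Note that r[n, Nw-1], the autocorrelation at the maximum lag, is given by the product of the first and last values of the segment:

$$r[n, N_w-1] = \left( w[0]\, x[n] \right) \left( w[N_w-1]\, x[n-(N_w-1)] \right)$$

Here w[0]x[n] is the last (new) value of the segment and w[Nw-1]x[n-(Nw-1)] is its first (already known) value. Solving for x[n]:

$$x[n] = \frac{r[n, N_w-1]}{w[0]\, w[N_w-1]\, x[n-(N_w-1)]}$$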

Page 71: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Representation
Note that:

$$|X(n,\omega)|^2 = \sum_{m=-\infty}^{\infty} r[n,m]\, e^{-j\omega m}$$

If the first value of the short-time section, x[n-(Nw-1)], happens to be equal to zero, we must find the first non-zero value within the section and again use the product relation, as in the last expression.

Note that such a sample can be found because it was assumed that there are at most Nw-2 consecutive zero samples between any two non-zero samples of x[n].

Page 72: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Representation
Sequential extrapolation algorithm

1. Initialize with x[0].
2. Update the time n.
3. Compute r[n,Nw-1] from the inverse DFT of |X(n,k)|².
4. Compute:

$$x[n] = \frac{r[n, N_w-1]}{w[0]\, w[N_w-1]\, x[n-(N_w-1)]}$$

5. Return to step (2) and repeat.
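The sequential extrapolation steps can be sketched as follows. This is an illustrative implementation under the stated uniqueness conditions, not a robust one: the test signal, window, and DFT length N ≥ 2Nw-1 are assumptions, and the zero handling follows the rule of using the first non-zero known sample inside the current section (so the lag is n-m0 rather than always Nw-1).

```python
import cmath
import math

def stftm2(x, w, n, N):
    """|X(n,k)|^2 for k = 0..N-1 (the only data the reconstruction may use)."""
    Nw = len(w)
    lo, hi = max(0, n - Nw + 1), min(len(x) - 1, n)
    return [abs(sum(x[m] * w[n - m] * cmath.exp(-2j * math.pi * k * m / N)
                    for m in range(lo, hi + 1))) ** 2 for k in range(N)]

def autocorr(mag2, lag):
    """r[n, lag] as the inverse DFT of |X(n,k)|^2."""
    N = len(mag2)
    return sum(mag2[k] * cmath.exp(2j * math.pi * k * lag / N)
               for k in range(N)).real / N

def reconstruct(stftm, w, x0):
    """Sequential extrapolation of x[n] from |X(n,k)|^2 and the known x[0]."""
    Nw, x = len(w), [x0]
    for n in range(1, len(stftm)):
        # First known non-zero sample inside the section that ends at time n
        m0 = next(m for m in range(max(0, n - Nw + 1), n) if abs(x[m]) > 1e-9)
        lag = n - m0
        # r[n, lag] = (w[0] x[n]) (w[lag] x[m0]) because m0 starts the section
        x.append(autocorr(stftm[n], lag) / (w[0] * w[lag] * x[m0]))
    return x

w = [1.0, 0.8, 0.6, 0.4]                                   # Nw = 4, no zeros
x_true = [1.0, -2.0, 0.0, 3.0, 0.5, -1.0, 0.0, 0.0, 2.0, 1.0]
N = 8                                                      # N >= 2*Nw - 1
stftm = [stftm2(x_true, w, n, N) for n in range(len(x_true))]
x_rec = reconstruct(stftm, w, x0=x_true[0])
```

Only the magnitudes and the first sample (with its sign) enter `reconstruct`. The test signal has at most Nw-2 = 2 consecutive zeros, so a non-zero reference sample is always available.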

Page 73: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Reconstruction from Time-Frequency Samples
To carry out STFTM analysis on a digital computer, the discrete STFTM must be used.
The uniqueness theory of the STFTM can be easily extended to the discrete STFTM:
Uniqueness of the STFTM is based on the short-time autocorrelation functions.
The autocorrelation functions can be obtained even if the STFTM is sampled in frequency (discrete STFTM), given adequate frequency sampling.

To consider the effects of temporal decimation with factor L, note that adjacent short-time sections now have an overlap of Nw-L instead of Nw-1.

Page 74: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Reconstruction from Time-Frequency Samples
Sufficient uniqueness conditions for the partial-overlap case:
1. The analysis window w[n] is a known sequence of finite length Nw, with no zeros over its duration.
2. The sequence x[n] is one-sided with at most Nw-2L consecutive zero samples. L consecutive samples of x[n] (from the first non-zero sample) are known.
This is a sufficient but not a necessary condition.

Page 75: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Estimation from the Modified STFT or STFTM
Synthesis of a signal from a time-frequency function formed by modifying an STFT or STFTM is required in many applications.

Modification may arise due to:
1. Quantization errors (e.g., from speech coding)
2. Time-varying filtering
3. Speech enhancement
4. Signal rate modifications

Limitations:
Modifications in frequency should result in time modifications that are restricted to within an analysis window (Figure 7.18, next slide).
Overlapping sections must undergo similar modifications (Figure 7.19).

Page 76: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Estimation from the Modified STFT or STFTM
Example 7.5. Removal of an interfering tone.
Consider modifying a valid X(n,ω) of a short-time segment fn[m] = x[m]w[n-m] by inserting a zero gap where an unwanted interfering sine-wave component is known to lie.

The interfering signal is removed with H(n,ω).

The resulting frequency representation is: Y(n,ω) = X(n,ω)H(n,ω)

Inverse transforming Y(n,ω) gives a modified short-time sequence gn[m] that is non-zero beyond the extent of the original short-time segment fn[m] = x[m]w[n-m].

Page 77: Ch7-Short-Time Fourier Transform Analysis and Synthesis

Signal Estimation from the Modified STFT or STFTM
Example 7.6

At time n:
Suppose a time-decimated STFT, X(nL,ω), is multiplied by a linear phase factor e^{jωn0} to obtain Y(nL,ω) = X(nL,ω)e^{jωn0}.

At time n+1:
X((n+1)L,ω) is multiplied by the negative of this linear phase factor, e^{-jωn0}, to obtain Y((n+1)L,ω) = X((n+1)L,ω)e^{-jωn0}.

The overlapping sections of the inverse Fourier transforms, denoted by gnL[m] and g(n+1)L[m], are not consistent.


Heuristic Application of STFT Synthesis Methods

Although modifications of the STFT or STFTM may violate some principles, the results may still be "reasonable".

The effect of modifying the STFT with another time-frequency function, using either FBS or OLA synthesis, can be shown to be a time-varying convolution between x[n] and a function ĥ[n,m]: x[n]*ĥ[n,m].

Let X(n,ω) be modified by a function H(n,ω):
Y(n,ω) = X(n,ω)H(n,ω)

This corresponds to a new short-time segment:
g_n[m] = f_n[m]*h[n,m]

where h[n,m] is the time-varying system impulse response (Chapter 2).


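Heuristic Application of STFT Synthesis Methods

Consider the FBS method. Discretization in frequency gives:

$$Y(n,k) = Y(n,\omega)\big|_{\omega = \frac{2\pi}{N}k} = X(n,k)\,H(n,k)$$

The N-point IDFT of H(n,k) is periodic over N:

$$\tilde{h}[n,m] = \sum_{l=-\infty}^{\infty} h[n,\, m - lN]$$

The resulting sequence can then be written as:

$$y[n] = \sum_{m=-\infty}^{\infty} x[n-m]\,\hat{h}[n,m]$$

where

$$\hat{h}[n,m] = w[m] \sum_{l=-\infty}^{\infty} h[n,\, m - lN]$$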
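The FBS result can be checked numerically. The sketch below is illustrative and rests on stated assumptions: the STFT convention X(n,k) = Σ_m x[m] w[n-m] e^{-j2πkm/N}, a causal window of length Nw ≤ N, and FBS synthesis normalized by 1/(N·w[0]). Under those assumptions, the output should equal a time-varying convolution of x with the window-weighted, time-aliased response w[m]·h̃[n,m], divided by w[0]. The sizes and the random modification h[n,m] are arbitrary test values.

```python
import numpy as np

N, Nw, T = 16, 12, 60                 # DFT size, window length, signal length (assumed)
rng = np.random.default_rng(2)
x = rng.standard_normal(T)
w = np.hamming(Nw)                    # causal window with w[0] != 0

# Time-varying impulse response h[n, m], deliberately longer than N so that
# sampling its transform at N frequencies time-aliases it.
h_long = rng.standard_normal((T, 2 * N))
H = np.fft.fft(h_long, axis=1)[:, ::2]         # H(n, k) at omega = 2 pi k / N
h_tilde = h_long[:, :N] + h_long[:, N:]        # periodic (aliased) version over N

# FBS synthesis of the modified STFT Y(n, k) = X(n, k) H(n, k).
k = np.arange(N)
y_fbs = np.zeros(T)
for n in range(T):
    m = np.arange(max(0, n - Nw + 1), n + 1)   # samples under the window at time n
    X = (x[m] * w[n - m]) @ np.exp(-2j * np.pi * np.outer(m, k) / N)
    Y = X * H[n]
    y_fbs[n] = np.real(np.sum(Y * np.exp(2j * np.pi * k * n / N))) / (N * w[0])

# Direct time-varying convolution with h_hat[n, m] = w[m] h_tilde[n, m] / w[0].
y_conv = np.zeros(T)
for n in range(T):
    for q in range(min(Nw, n + 1)):
        y_conv[n] += x[n - q] * w[q] * h_tilde[n, q]
y_conv /= w[0]

print("max difference:", np.max(np.abs(y_fbs - y_conv)))
```

The two outputs agree to rounding error, which shows that with FBS the window acts as a multiplicative weight on the lag variable m and that the modification at time n affects only the output sample at time n.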


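Heuristic Application of STFT Synthesis Methods

Using the OLA method, it can be shown (see Exercise 7.11) that:

$$\hat{h}[n,m] = w[n] * \sum_{l=-\infty}^{\infty} h[n,\, m - lN]$$

where * denotes convolution with respect to the time variable n.

Contrasting FBS with OLA:
FBS: multiplication by the window, so a change in the modification takes effect instantaneously.
OLA: convolution with the window, so a change in the modification is smoothed over time.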


Heuristic Application of STFT Synthesis Methods

Example 7.7

Suppose we want to deliberately introduce reverberation into a signal x[n] by convolution with the filter:

h[n] = δ[n] + δ[n-n_0]

whose Fourier transform is:

H(ω) = 1 + e^{-jωn_0}

The modification is applied to the STFT as:

Y(n,ω) = X(n,ω)H(ω)

where

$$X(n,\omega) = \sum_{m=-\infty}^{\infty} x[m]\,w[n-m]\,e^{-j\omega m}$$


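Example 7.7 (cont.)

Using the OLA method (7.21):

$$y[n] = \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} \left[ \frac{1}{N} \sum_{k=0}^{N-1} Y(p,k)\, e^{j\frac{2\pi}{N}kn} \right]$$

It is then possible to express y[n] in terms of the original sequence:

$$y[n] = \frac{1}{W(0)} \sum_{m=-\infty}^{\infty} x[m] \underbrace{\sum_{p=-\infty}^{\infty} w[p-m]}_{W(0)} \; \underbrace{\frac{1}{N} \sum_{k=0}^{N-1} H(k)\, e^{j\frac{2\pi}{N}k(n-m)}}_{\text{IDFT} \;=\; \sum_{r} h[n-m-rN]} = \sum_{m=-\infty}^{\infty} x[m]\,\hat{h}[n-m]$$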


Example 7.7 (cont.)

where

$$\hat{h}[n] = \sum_{r=-\infty}^{\infty} h[n - rN] = \sum_{r=-\infty}^{\infty} \left( \delta[n - rN] + \delta[n - n_0 - rN] \right)$$

is the periodic extension of h[n] over N, of which only the interval [0, N-1] is considered.

This implies that the desired reverberated signal is obtained only when n_0 < N; otherwise temporal aliasing occurs (as illustrated in Figure 7.20).
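The temporal aliasing in Example 7.7 is easy to observe with a frame-based OLA implementation. The sketch below is an illustration under assumptions, not the textbook's procedure: it uses a periodic Hann window with 50% overlap, decimated frames, and assumed sizes N = 256, Nw = 128, L = 64; the function name `ola_reverb` is hypothetical. Because each frame is processed in its own N-point block, this sketch keeps the small delay below N - Nw so that the echo of a frame does not wrap within the block, which is a stricter condition than n_0 < N.

```python
import numpy as np

def ola_reverb(x, n0, N=256, Nw=128, L=64):
    """OLA synthesis after multiplying each frame's N-point DFT by
    H(k) = 1 + exp(-j 2 pi k n0 / N)."""
    w = np.hanning(Nw + 1)[:Nw]           # periodic Hann: copies at hop Nw/2 sum to 1
    k = np.arange(N)
    H = 1.0 + np.exp(-2j * np.pi * k * n0 / N)
    y = np.zeros(len(x) + N)
    for start in range(0, len(x) - Nw + 1, L):
        frame = x[start:start + Nw] * w
        g = np.fft.ifft(np.fft.fft(frame, N) * H).real   # N-point block, circular in time
        y[start:start + N] += g                           # overlap-add
    return y / (w.sum() / L)              # divide by W(0)/L, the sum of shifted windows

rng = np.random.default_rng(3)
x = rng.standard_normal(2048)
n = np.arange(400, 1900)                  # interior samples, away from the edges

errs = {}
for n0 in (40, 300):                      # one delay below N, one above N
    y = ola_reverb(x, n0)
    err_true = np.max(np.abs(y[n] - (x[n] + x[n - n0])))           # intended echo
    err_alias = np.max(np.abs(y[n] - (x[n] + x[n - n0 % 256])))    # echo at n0 mod N
    errs[n0] = (err_true, err_alias)
    print(f"n0={n0}: error vs intended echo {err_true:.2e}, "
          f"vs aliased echo {err_alias:.2e}")
```

For n_0 = 40 the output matches x[n] + x[n-40]. For n_0 = 300 the sampled H(k) is indistinguishable from a delay of 300 mod 256 = 44, so the echo appears at the wrong position.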


Example 7.7 (cont.)


Time-Scale Modification and Enhancement of Speech

The signal construction methods presented in this chapter can be applied in a variety of speech applications.

Time-Scale Modification
For speech, the goal is to change the articulation rate (faster or slower) without changing the pitch.


Time-Scale Modification


Time-Scale Modification

Methods:

Cut & Paste (Fairbanks method)
Discard or duplicate frames in order to speed up or slow down the articulation, respectively.
Problem: Pitch-period mismatch at adjacent frames causes distortion.

Pitch-synchronous OLA (Scott & Gerber)
Select frame size and location synchronous with the pitch periods, which avoids the pitch-period mismatch.
Problem: Pitch synchronization is not always easy.

STFTM Synthesis
To avoid pitch synchronization problems, use only the magnitude of the STFT (i.e., the STFTM):
1. Compute |X(nL,ω)| at an appropriate frame interval, the decimation rate L (e.g., L=128 at Fs=10000 Hz, and N is several T0 long).
2. Replace the decimation rate with a new rate M (e.g., M=L/2, which halves the duration, i.e., a time-scale factor of ½): |Y(nM,ω)| = |X(nL,ω)|.
3. Apply the least-squared-error iterative estimation algorithm until the STFTM of the estimate has converged toward |Y(nM,ω)|.
Problem: An occasional reverberant characteristic is perceived in the synthesized signal due to the lack of STFT phase control.
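The three STFTM synthesis steps can be sketched as follows. This is a minimal illustration under assumptions rather than a reference implementation: the helper names (`stft`, `lse_estimate`, `time_scale`), the periodic Hann window, the window length of 512 samples, the random initial guess, and the fixed iteration count are all choices made here. The update is the iterative least-squared-error estimate usually attributed to Griffin and Lim: keep the target magnitude, borrow the phase of the current estimate, and re-synthesize with a window-weighted overlap-add.

```python
import numpy as np

def stft(x, w, hop):
    """Frames of x at the given hop, windowed by w and transformed (N = len(w))."""
    Nw = len(w)
    return np.array([np.fft.fft(x[s:s + Nw] * w)
                     for s in range(0, len(x) - Nw + 1, hop)])

def lse_estimate(Y, w, hop):
    """Least-squared-error signal estimate from a (possibly invalid) STFT Y."""
    Nw = len(w)
    num = np.zeros(hop * (len(Y) - 1) + Nw)
    den = np.zeros_like(num)
    for i, frame in enumerate(Y):
        num[i * hop:i * hop + Nw] += w * np.fft.ifft(frame).real
        den[i * hop:i * hop + Nw] += w ** 2
    return num / np.maximum(den, 1e-12)      # guard samples where the window is zero

def time_scale(x, L=128, M=64, Nw=512, iters=50, seed=0):
    w = np.hanning(Nw + 1)[:Nw]
    target = np.abs(stft(x, w, L))            # step 1: |X(nL, omega)|
    # Step 2: the same magnitudes are now interpreted at the new frame rate M.
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(M * (len(target) - 1) + Nw)   # arbitrary initial guess
    errors = []
    for _ in range(iters):                    # step 3: iterate the LSE estimate
        Yi = stft(y, w, M)
        errors.append(np.sum((np.abs(Yi) - target) ** 2))
        phase = np.exp(1j * np.angle(Yi))
        y = lse_estimate(target * phase, w, M)
    return y, errors

Fs = 10000
t = np.arange(4096)
# Harmonic test signal with a 100 Hz fundamental (a crude stand-in for voiced speech).
x = sum(np.sin(2 * np.pi * h * 100 * t / Fs) / h for h in range(1, 6))
y, errors = time_scale(x)
print(len(x), len(y), errors[0], errors[-1])
```

With M = L/2 the output is roughly half as long as the input while the harmonic spacing, and hence the pitch, is set by the unchanged magnitude frames. The recorded error is the squared distance between the estimate's STFTM and the target.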


Time-Scale Modification


Noise Reduction

A number of techniques have been developed to remove or reduce additive noise. The noise-corrupted signal is given by:

y[n] = x[n] + b[n]

STFT Synthesis:
Subtract the noise spectrum Ŝ_b(ω):

$$\hat{X}(nL,\omega) = \begin{cases} \left[\, |Y(nL,\omega)|^2 - \alpha\,\hat{S}_b(\omega) \,\right]^{1/2} e^{j\angle Y(nL,\omega)} & \text{if } |Y(nL,\omega)|^2 - \alpha\,\hat{S}_b(\omega) \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

The original phase spectrum ∠Y(nL,ω) is retained because the phase of the noise cannot, in general, be reliably estimated.

The factor α controls the degree of noise reduction.
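The subtraction rule can be sketched in a few lines. The example below is illustrative only and makes several assumptions that are not on the slide: the function name `spectral_subtract`, a periodic Hann window with 50% overlap for analysis and OLA resynthesis, a noise power spectrum Ŝ_b estimated from a separate noise-only recording, and α = 2 for the demonstration.

```python
import numpy as np

def spectral_subtract(Y, S_b, alpha=1.0):
    """Subtract alpha * S_b from |Y|^2, floor at zero, keep the noisy phase."""
    power = np.abs(Y) ** 2 - alpha * S_b
    mag = np.sqrt(np.maximum(power, 0.0))        # zero where the difference is negative
    return mag * np.exp(1j * np.angle(Y))

rng = np.random.default_rng(4)
Nw, L = 256, 128
w = np.hanning(Nw + 1)[:Nw]                      # periodic Hann: sums to 1 at hop Nw/2
t = np.arange(8192)
x = np.sin(2 * np.pi * 0.05 * t)                 # clean signal (assumed test tone)
y = x + 0.5 * rng.standard_normal(len(t))        # y[n] = x[n] + b[n]

starts = range(0, len(y) - Nw + 1, L)
Y = np.array([np.fft.rfft(y[s:s + Nw] * w) for s in starts])

# Assumption: the noise spectrum is measured on an independent noise-only recording.
b_ref = 0.5 * rng.standard_normal(len(t))
S_b = np.mean([np.abs(np.fft.rfft(b_ref[s:s + Nw] * w)) ** 2 for s in starts], axis=0)

X_hat = spectral_subtract(Y, S_b, alpha=2.0)

# OLA resynthesis; the shifted windows sum to 1 away from the signal edges.
x_hat = np.zeros(len(y))
for s, frame in zip(starts, X_hat):
    x_hat[s:s + Nw] += np.fft.irfft(frame, Nw)

interior = slice(Nw, -Nw)
mse_in = np.mean((y[interior] - x[interior]) ** 2)
mse_out = np.mean((x_hat[interior] - x[interior]) ** 2)
print(f"MSE before: {mse_in:.4f}, after: {mse_out:.4f}")
```

Larger α removes more noise but also more of the weak signal components, which is the trade-off the control factor exposes.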


Noise Reduction

STFTM Synthesis:
Ignore the phase and use the sequential extrapolation or the least-squared-error estimation method to construct the clean signal.