
8/15/2015


P. C. Pandey, "Signal processing for hearing aids: Challenges and some solutions," Invited talk, Workshop "Radar and Sonar Signal Processing," NSTL Visakhapatnam, 17-21 Aug 2015

Workshop Coordinator: Ms. M. Vijaya < vijaya.m @ nstl.drdo.in >

Session: 21 Aug 2015, 1100 to 1230

Part B

Sliding-band Dynamic Range Compression

(Ref: N. Tiwari & P. C. Pandey, NCC 2014, Paper No.1569847357)


Overview

1. Introduction

2. Sliding-band Dynamic Range Compression

3. Offline & Real-time Implementations

4. Test Results

5. Summary & Conclusion


1. Introduction

Dynamic range compression

Aim: to present sounds comfortably within the limited dynamic range of the listener by amplifying low-level sounds without making high-level sounds uncomfortably loud.

Processing steps:
• Input level estimation
• Gain calculation based on the input level
• Multiplication of the input by the gain function
• Output resynthesis

Classification of compression schemes:
• On the basis of signal level calculation: single-band or multiband
• On the basis of gain control method: feedback or feed-forward


Single-band dynamic range compression

Processing: gain dependent on the dynamically varying signal level.

Parameters:
• Compression threshold (TH)
• Compression ratio (CR)
• Attack & release times

Problems:
• Compensation for frequency-dependent loudness growth not feasible.
• Power mostly contributed by low-frequency components
→ level of high-frequency components controlled by low-frequency components
→ inaudibility of high-frequency components, distortions in the temporal envelope
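To make the single-band problem concrete, here is a minimal sketch of a feed-forward single-band compressor. It assumes a one-pole power detector with separate attack and release coefficients and a static curve that reduces the gain above TH by the ratio CR; the function name, default values and detector form are illustrative assumptions, not details taken from the talk. Because one level estimate drives one gain for the whole signal, a strong low-frequency component also sets the gain applied to weak high-frequency components.

```python
import math


def single_band_compress(x, fs, th_db=-30.0, cr=2.0, t_attack=0.005, t_release=0.2):
    """Feed-forward single-band compressor (illustrative sketch).

    x        : input samples (full scale = 1.0, levels in dB re full scale)
    fs       : sampling frequency in Hz
    th_db    : compression threshold TH in dB (assumed default)
    cr       : compression ratio CR
    t_attack, t_release : detector time constants in s (assumed one-pole form)
    """
    # One-pole smoothing coefficients for the level detector
    a_att = math.exp(-1.0 / (t_attack * fs))
    a_rel = math.exp(-1.0 / (t_release * fs))
    level = 0.0  # smoothed power estimate of the whole (broadband) signal
    y = []
    for s in x:
        p = s * s
        a = a_att if p > level else a_rel  # rise quickly, decay slowly
        level = a * level + (1.0 - a) * p
        level_db = 10.0 * math.log10(level + 1e-12)
        # Above TH the output level grows by only 1/CR dB per input dB
        over_db = max(0.0, level_db - th_db)
        gain_db = -over_db * (1.0 - 1.0 / cr)
        # A single gain for all frequencies: the single-band limitation
        y.append(s * 10.0 ** (gain_db / 20.0))
    return y
```

A signal below TH passes unchanged, while a loud tone is attenuated once the detector has settled.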


Multiband dynamic range compression

General scheme of processing: spectral components of the input signal divided into multiple bands, and the gain for each band calculated on the basis of the signal power in that band.

Parameters (band specific): compression threshold TH, compression ratio CR, attack & release times for level detection.
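The band-wise gain rule can be sketched on a single short-time power spectrum as follows. This is a minimal illustration, assuming bands given as bin-index edges and a static threshold-and-ratio rule per band (attack and release smoothing omitted); `multiband_gains` is a hypothetical name. Every spectral sample in a band receives the same gain, so the gain can jump at a band boundary, which is the source of the formant distortions listed on the next slides.

```python
import math


def multiband_gains(power, band_edges, th_db, cr):
    """Per-sample power gains for multiband compression (illustrative sketch).

    power      : short-time power of each spectral sample
    band_edges : bin indices [b0, b1, ..., bM] delimiting M bands (assumed layout)
    th_db, cr  : per-band compression threshold (dB) and compression ratio
    """
    gains = [1.0] * len(power)
    for m in range(len(band_edges) - 1):
        lo, hi = band_edges[m], band_edges[m + 1]
        # Band level from the total power of the samples in the band
        band_db = 10.0 * math.log10(sum(power[lo:hi]) + 1e-12)
        over_db = max(0.0, band_db - th_db[m])
        g = 10.0 ** (-over_db * (1.0 - 1.0 / cr[m]) / 10.0)  # power gain
        for k in range(lo, hi):
            gains[k] = g  # same gain for every sample in the band
    return gains
```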


Lippmann et al. (1980): 16-channel compression
9% improvement in recognition score over linear amplification.

Asano et al. (1991): Multiband dynamic range compression realized as a single time-varying FIR filter & implemented on a 32-bit fixed-point DSP processor
Less spectral distortion due to the smoothed frequency response of the FIR filter.

Stone et al. (1999): Comparison of single-channel and four-channel compression schemes & effect of varying CR, TH, and attack & release times
Intelligibility & quality tests showed no clear preference for either scheme.

Li et al. (2000): Wavelet-based compression (analysis into 7 octave sub-bands using a wavelet filter bank & resynthesis after applying logarithmic compression to the wavelet coefficients)
Increase in intelligibility without introducing noticeable distortions.

Magotra et al. (2000): Multiband dynamic range compression using a 16-bit fixed-point processor
Taylor series approximation of the compression function used to reduce computations in gain calculation.


Disadvantages of multiband compression
• Spurious spectral distortions
• Reduction in spectral contrast and modulation depth
• Distortion in the spectral shape of formants lying across band boundaries
• Distortion of formant transitions across adjacent bands
• Time-varying magnitude response without corresponding variation in the phase response, leading to quality degradation
→ Audible distortions, perceptible discontinuities, adverse effect on the perception of certain speech cues


[Figure: Example of distortion due to multiband dynamic range compression during spectral transition. Panels: swept sinusoidal input (constant amplitude, 125–250 Hz linearly swept frequency, 200 ms sweep duration); processed output (multiband compression with 18 auditory critical bands, CR = 30, Ta = 6.4 ms, Tr = 192 ms). Horizontal axis: time (s).]


Investigation objective

Real-time dynamic range compression to compensate for frequency-dependent loudness recruitment associated with sensorineural hearing loss, for use in hearing aids with a low-power processor:
• Low distortions
• Low computational complexity & memory requirement
• Low signal delay (algorithmic + computational)


Proposed scheme: Sliding-band dynamic range compression
• Proposed for significantly reducing the temporal and spectral distortions associated with the single-band and multiband compression currently used in hearing aids.
• Realized with computational complexity acceptable for implementation on a 16-bit fixed-point DSP processor and signal delay acceptable for real-time application.

Investigations using offline & real-time implementations:
• Selection of processing parameters
• Evaluation of the implementations: informal listening, PESQ measure


2. Sliding-band Dynamic Range Compression

Processing: applying a frequency-dependent gain function, with the gain for each spectral sample determined by the short-time power in the auditory critical band centered at it, in accordance with the specified hearing thresholds, compression ratios, and attack and release times.

• Short-time spectral analysis: windowing, zero-padding, DFT calculation
• Spectral modification: gain calculation, output spectrum calculation
• Resynthesis: IDFT calculation, windowing, overlap-add
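The "sliding band" can be illustrated by computing, for every spectral sample k, the power summed over the samples lying within one auditory critical bandwidth centered at k. This is a minimal sketch, assuming the critical-bandwidth formula from the gain-calculation slide (f in kHz, BW in Hz) and band edges rounded inward to whole bins; the function names are hypothetical. Unlike the multiband case, the band moves with k, so neighbouring samples see almost the same band power and the gain varies smoothly across frequency.

```python
import math


def critical_bandwidth_hz(f_hz):
    """BW = 25 + 75 (1 + 1.4 f^2)^0.69 Hz, with f in kHz."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69


def sliding_band_power(power, fs, n_fft):
    """Power in the critical band centered at each spectral sample (sketch).

    power : short-time power of spectral samples 0 .. len(power)-1
    fs    : sampling frequency in Hz; n_fft : DFT size N
    """
    df = fs / n_fft  # spacing of spectral samples in Hz
    out = []
    for k in range(len(power)):
        half = 0.5 * critical_bandwidth_hz(k * df) / df  # half-width in bins
        lo = max(0, math.ceil(k - half))                  # rounded inward
        hi = min(len(power) - 1, math.floor(k + half))
        out.append(sum(power[lo:hi + 1]))
    return out
```

With fs = 10 kHz and N = 512 (the offline parameters), the band at 0 Hz spans 3 samples and the band at about 1.95 kHz (k = 100) spans 15 samples.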


Spectral modification:

[Figure: Block diagram of spectral modification. Input short-time complex spectrum → level estimation over the band samples centered at the k-th sample → target gain calculation (CR(k), Pmc(k)) → gain calculation for the k-th sample (attack time, release time) → multiplication (×) with the input spectrum → modified short-time complex spectrum.]

Pmc(k): power at upper comfortable listening level
CR(k): compression ratio

Short-time spectral analysis: windowing (length L, shift S), zero-padding, N-point DFT
Resynthesis: N-point IDFT, overlap-add
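The analysis-modification-resynthesis chain can be sketched as below. This is a minimal pure-Python illustration, assuming a periodic Hann window (the window type is not stated in the talk) and the least-square error synthesis of Griffin & Lim (1984) mentioned later: each IDFT output is re-windowed, overlap-added, and divided by the summed squared windows. The naive O(N²) DFT and the names `analyze_modify_resynthesize` and `modify` are illustrative only; a real implementation would use an FFT.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def analyze_modify_resynthesize(x, L, S, N, modify=lambda X: X):
    """Windowing (length L, shift S), zero-padding to N, DFT, spectral
    modification, IDFT, synthesis windowing and overlap-add (sketch)."""
    w = [0.5 - 0.5 * math.cos(2 * math.pi * m / L) for m in range(L)]  # periodic Hann
    y = [0.0] * len(x)
    wsum = [0.0] * len(x)
    for start in range(0, len(x) - L + 1, S):
        frame = [x[start + m] * w[m] for m in range(L)] + [0.0] * (N - L)
        z = idft(modify(dft(frame)))
        for m in range(L):  # synthesis window keeps only the first L samples
            y[start + m] += z[m].real * w[m]
            wsum[start + m] += w[m] ** 2
    # Least-square error estimate: divide by the summed squared windows
    return [y[n] / wsum[n] if wsum[n] > 1e-8 else 0.0 for n in range(len(x))]
```

With an unmodified spectrum the interior samples are reconstructed exactly; the talk uses L = 256, S = 64, N = 512, while a quick check can use smaller sizes with the same 75% overlap and 2× zero-padding.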


Gain calculation

Auditory critical bandwidth for spectral sample k at frequency f (f in kHz, BW in Hz):

$$\mathrm{BW}(k) = 25 + 75\,(1 + 1.4 f^2)^{0.69}$$

Target gain calculation
Power at upper comfortable listening level: $P_{mc}(k)$
Compression ratio: $CR(k)$
Input power: $P_{ic}(k)$, output power: $P_{oc}(k)$
Target gain: $G_t(k) = P_{oc}(k) / P_{ic}(k)$

Compression relation
• dB scale: $[P_{oc}(k)/P_{mc}(k)]_{\mathrm{dB}} = [P_{ic}(k)/P_{mc}(k)]_{\mathrm{dB}} / CR(k)$
• linear scale: $P_{oc}(k)/P_{mc}(k) = [P_{ic}(k)/P_{mc}(k)]^{1/CR(k)}$

Target gain for the k-th spectral sample

$$[G_t(k)]_{\mathrm{dB}} = \left[1 - \frac{1}{CR(k)}\right] \left[\frac{P_{mc}(k)}{P_{ic}(k)}\right]_{\mathrm{dB}}$$
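The target-gain formula follows from the compression relation by subtracting the input level from the output level in dB. A small sketch, with a hypothetical function name, evaluates it and checks the result against the linear-scale relation. Note that $G_t$ is a power gain; applying it to the complex spectrum would use its square root as the amplitude factor, an assumption since the slides do not state this step.

```python
import math


def target_gain_db(p_ic, p_mc, cr):
    """[Gt(k)]dB = (1 - 1/CR(k)) * [Pmc(k)/Pic(k)]dB (power gain, sketch)."""
    return (1.0 - 1.0 / cr) * 10.0 * math.log10(p_mc / p_ic)
```

For an input 40 dB below $P_{mc}$ with CR = 2, the gain is 20 dB, placing the output 20 dB below $P_{mc}$ (40 dB / CR); an input at $P_{mc}$ gets 0 dB, and inputs above $P_{mc}$ are attenuated.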


Gain changed in steps from the previous value towards the target value, with settable attack and release times
• Fast attack: to prevent the output level from exceeding the UCL during transients
• Slow release: to avoid the pumping effect or amplification of breathing

Number of steps during attack phase = $s_a$
Number of steps during release phase = $s_r$
Target gain corresponding to min. input level = $G_{max}$
Target gain corresponding to max. input level = $G_{min}$
Gain ratio for attack phase: $\gamma_a = (G_{max}/G_{min})^{1/s_a}$
Gain ratio for release phase: $\gamma_r = (G_{max}/G_{min})^{1/s_r}$

Gain for the i-th window & k-th spectral sample:

$$G(i,k) = \begin{cases} \max[G(i-1,k)/\gamma_a,\ G_t(i,k)] & \text{for } G_t(i,k) < G(i-1,k) \\ \min[G(i-1,k)\,\gamma_r,\ G_t(i,k)] & \text{for } G_t(i,k) > G(i-1,k) \end{cases}$$

Attack time $T_a = s_a S / f_s$, release time $T_r = s_r S / f_s$
[$f_s$ = sampling freq., $S$ = window shift]
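The stepping rule can be sketched for one spectral sample as follows (function names hypothetical). With $s_a = 1$ the gain can fall from $G_{max}$ to $G_{min}$ in a single window, while with $s_r = 30$ it needs 30 windows to climb back, which at S = 64 and fs = 10 kHz gives Ta = 6.4 ms and Tr = 192 ms.

```python
def step_ratios(g_max, g_min, s_a, s_r):
    """Per-window gain ratios gamma_a and gamma_r."""
    return (g_max / g_min) ** (1.0 / s_a), (g_max / g_min) ** (1.0 / s_r)


def smooth_gain(g_prev, g_target, gamma_a, gamma_r):
    """Gain G(i,k) for one window, given G(i-1,k) and target Gt(i,k)."""
    if g_target < g_prev:   # attack: gain falls by at most a factor gamma_a
        return max(g_prev / gamma_a, g_target)
    if g_target > g_prev:   # release: gain rises by at most a factor gamma_r
        return min(g_prev * gamma_r, g_target)
    return g_prev           # already at the target
```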


Implementation-related challenges
• Modifications in the short-time magnitude spectrum without corresponding changes in the phase spectrum can cause audible distortions.
• Computational complexity: gain calculations based on logarithms or series approximations are not suitable for use in sliding-band compression, where a gain is needed for every spectral sample in every frame.

Solutions
• Analysis-synthesis using least-square error based signal estimation from the modified STFT (Griffin & Lim, 1984): processing artifacts reduced by masking the effect of phase discontinuities in the modified short-time complex spectrum.
• Look-up table based gain calculation: a two-dimensional look-up table relating the input power to the gain as a function of frequency. Permits use of the compression function best suited to compensate for the abnormal loudness growth.
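A two-dimensional gain look-up table can be sketched as rows indexed by spectral sample and columns by log-spaced input-power intervals, filled here from the target-gain relation $G_t = (P_{mc}/P_{ic})^{1 - 1/CR}$. The interval layout, the geometric-centre evaluation point and the input range are assumptions for illustration, and the function names are hypothetical. At run time the gain is found by a search and an index, with no logarithms.

```python
import bisect
import math


def build_gain_lut(p_mc, cr, p_lo, p_hi, n_levels=20):
    """Rows: spectral samples k; columns: log-spaced input-power intervals."""
    # Interval edges equally spaced in dB between p_lo and p_hi (assumed range)
    edges = [p_lo * (p_hi / p_lo) ** (j / n_levels) for j in range(n_levels + 1)]
    table = []
    for k in range(len(p_mc)):
        row = []
        for j in range(n_levels):
            p_mid = math.sqrt(edges[j] * edges[j + 1])  # geometric centre
            row.append((p_mc[k] / p_mid) ** (1.0 - 1.0 / cr[k]))
        table.append(row)
    return edges, table


def lookup_gain(edges, table, k, p_ic):
    """Gain for sample k at input power p_ic, clamped to the table range."""
    j = bisect.bisect_right(edges, p_ic) - 1
    j = min(max(j, 0), len(table[k]) - 1)
    return table[k][j]
```

If, say, 20 intervals cover 60 dB of input, each interval is 3 dB wide and the gain changes in 1.5 dB steps for CR = 2; more intervals give finer steps at the cost of a larger table, the trade-off noted for the offline implementation.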


3. Offline & Real-time Implementations

Implementation for offline processing

Implementation using Matlab 7.10 for evaluating the proposed technique and the effect of processing parameters.
• Processing parameters
◦ fs = 10 kHz
◦ Frame length = 25.6 ms (L = 256)
◦ Overlap = 75% (S = 64)
◦ FFT size N = 512
• 2D look-up table for frequency-dependent compression based on a linear relation between input-dB and output-dB, with settable CR(k) and Pmc(k)
◦ Input range: 20 log intervals (trade-off between small gain increments and look-up table size)
◦ Look-up table with 256×20 entries
• Attack and release times
◦ sa = 1, Ta = 6.4 ms: fast attack to avoid uncomfortable levels during transients
◦ sr = 30, Tr = 192 ms: slow release to avoid pumping & amplification of breathing
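The listed durations follow directly from the sample counts; a short calculation with the offline parameters confirms them (variable names are illustrative).

```python
fs = 10_000                 # sampling frequency (Hz)
L, S, N = 256, 64, 512      # frame length, window shift, DFT size (samples)
s_a, s_r = 1, 30            # attack and release steps

frame_ms = 1000 * L / fs             # frame length: 25.6 ms
shift_ms = 1000 * S / fs             # window shift: 6.4 ms
overlap = 1 - S / L                  # 0.75, i.e. 75% overlap
t_attack_ms = 1000 * s_a * S / fs    # Ta = 6.4 ms
t_release_ms = 1000 * s_r * S / fs   # Tr = 192 ms
bin_spacing_hz = fs / N              # spectral sample spacing: 19.53125 Hz
```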


Implementation for real-time processing

Implementation on a 16-bit fixed-point DSP board to examine the suitability of the technique for use in hearing aids.
• DSP chip: TI TMS320C5515
◦ 16 MB memory space (320 KB on-chip RAM with 64 KB dual-access data memory)
◦ Three 32-bit programmable timers
◦ 4 DMA controllers, each with 4 channels
◦ FFT hardware accelerator (up to 1024-point FFT)
◦ Max. clock speed: 120 MHz
• DSP board: eZdsp
◦ 4 MB on-board NOR flash for user program
◦ Stereo codec TLV320AIC3204: 16/20/24/32-bit ADC & DAC, 8–192 kHz sampling
• Software development: C using TI's CCStudio ver. 4.0


Implementation details
• Input-output operations: DMA-based I/O with cyclic buffers
• ADC and DAC: one codec (left channel) with 16-bit quantization
• Processing parameters (same as for offline processing): fs = 10 kHz, L = 256, S = 64, N = 512
• Data representation (input samples, spectral values, processed samples): 16-bit real & 16-bit imaginary


Data transfers & buffering operations (S = L/4)

DMA cyclic buffers
• 5-block S-sample input buffer
• 2-block S-sample output buffer

Pointers (incremented cyclically on DMA interrupt)
• Current input block
• Just-filled input block
• Current output block
• Write-to output block

Signal delay: algorithmic: 1 frame (25.6 ms); computational: ≤ frame shift (6.4 ms)
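One way to read the buffering scheme is sketched below: with S = L/4, the four most recently filled input blocks form the current L-sample frame while DMA fills the fifth, and the output DMA plays one block while processing writes the other. This is a hypothetical model of the pointer logic for illustration, not the DSP code; the class and function names are assumptions.

```python
class CyclicBlocks:
    """Cyclic block pointer advanced on each DMA interrupt (illustrative model)."""

    def __init__(self, n_blocks):
        self.n = n_blocks
        self.current = 0  # block currently being filled (input) or played (output)

    def on_dma_interrupt(self):
        """DMA finished the current block: return it and move to the next one."""
        done = self.current
        self.current = (self.current + 1) % self.n
        return done

    def write_to(self):
        """For the 2-block output buffer: the block not being played."""
        return (self.current + 1) % self.n


def frame_blocks(just_filled, n_blocks=5, blocks_per_frame=4):
    """Input blocks forming the newest L = 4S frame, oldest block first."""
    return [(just_filled - blocks_per_frame + 1 + j) % n_blocks
            for j in range(blocks_per_frame)]
```

After four interrupts the frame is blocks 0 to 3 while DMA fills block 4; each later interrupt slides the frame by one block, i.e. by the window shift S.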


4. Test Results

Tests for verification and evaluation

Offline processing
• Verification of the compression technique for speech input with a large level variation, and examination of the effect of different sets of processing parameters.
• Assessment of output speech quality (using informal listening) for different input speech materials and time-varying levels.
• Comparison of distortions introduced by different compression techniques during spectral transitions.

Real-time processing
• Comparison of the processed outputs from offline & real-time implementations: informal listening, PESQ measure (0–4.5).
• Signal delay & computational requirement.


Results from offline processing

[Figure: Example: "you will mark ut please" concatenated with scaling factors for variation in the input level; CR = 2, Ta = 6.4 ms, Tr = 6.4 & 192 ms. Panels: input waveform; scaling factor; unprocessed waveform; processed with Tr = 6.4 ms, low Pmc; processed with Tr = 192 ms, low Pmc; processed with Tr = 6.4 ms, high Pmc; processed with Tr = 192 ms, high Pmc. Horizontal axis: time (s).]

Processing of different speech materials with varying levels: no audible roughness or distortion during informal listening.


Distortions during spectral transitions: example of swept sinusoidal input.

[Figure: Outputs for a swept sinusoidal input (constant amplitude, 125–250 Hz linearly swept frequency, 200 ms sweep duration), CR = 30, Ta = 6.4 ms, Tr = 192 ms. Panels: sliding-band compression output; multiband compression (18 auditory critical bands) output; single-band compression output. Horizontal axis: time (s).]


Results from real-time processing

Informal listening: real-time output perceptually similar to the offline output
PESQ of the real-time output w.r.t. the offline output: 3.5
Signal delay = 36 ms
Use of processing capacity: 41% (lowest acceptable clock: 50 MHz; max. clock: 120 MHz)
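As a quick consistency check on the reported figures, 41% of the 120 MHz maximum clock corresponds closely to the stated lowest acceptable clock:

```latex
0.41 \times 120\ \mathrm{MHz} \approx 49.2\ \mathrm{MHz} \;\le\; 50\ \mathrm{MHz}
```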

[Figure: Example: "you will mark ut please" concatenated with scaling factors for variation in the input level; CR = 2, Ta = 6.4 ms, Tr = 192 ms, low Pmc. Panels: unprocessed; offline processed; real-time processed. Horizontal axis: time (s).]


5. Summary & Conclusions

Summary: Development & investigation of the sliding-band compression scheme
• Realized using modified fixed-frame analysis-synthesis for low computational complexity & without the distortions associated with phase discontinuities.
• Suitable for speech & non-speech audio, with provision for settable attack time, release time, & compression ratios.
• Implemented using a 16-bit fixed-point DSP chip & tested for satisfactory operation: 36 ms signal delay and 41% use of processing capacity, indicating scope for combination with other processing techniques.

Conclusion: Sliding-band compression can be used to compensate for frequency-dependent loudness recruitment without introducing the distortions associated with single-band & multiband compression.
