8/15/2015
P. C. Pandey, "Signal processing for hearing aids: Challenges and some solutions," Invited talk, Workshop "Radar and Sonar Signal Processing," NSTL Visakhapatnam, 17-21 Aug 2015
Workshop Coordinator: Ms. M. Vijaya < vijaya.m @ nstl.drdo.in >
Session: 21 Aug 2015, 1100 to 1230
Part B
Sliding-band Dynamic Range Compression
(Ref: N. Tiwari & P. C. Pandey, NCC 2014, Paper No.1569847357)
Overview
1. Introduction
2. Sliding-band Dynamic Range Compression
3. Offline & Real-time Implementations
4. Test Results
5. Summary & Conclusion
1. Introduction
Dynamic range compression: presenting sounds comfortably within the limited dynamic range of the listener by amplifying low-level sounds without making high-level sounds uncomfortably loud.
Processing steps
• Input level estimation
• Gain calculation based on input level
• Multiplication of input with gain function
• Output resynthesis
Classification of compression schemes
• On the basis of signal level calculation: single-band or multiband
• On the basis of gain control method: feedback or feed-forward
Single-band dynamic range compression
Processing: gain dependent on the dynamically varying signal level.
Parameters:
• Compression threshold (TH)
• Compression ratio (CR)
• Attack & release times
Problems
• Compensation for frequency-dependent loudness growth not feasible.
• Power mostly contributed by low-frequency components
→ level of high-frequency components controlled by low-frequency components
→ inaudibility of high-frequency components, distortions in temporal envelope
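The single-band scheme can be sketched as a feed-forward compressor: one smoothed level estimate drives one gain that multiplies the entire signal. This is a minimal illustration, not the method of any cited work. The one-pole power smoother, the threshold-based static curve, the function name `single_band_compress` and its default time constants are all assumptions. Because a single gain is shared by all frequencies, a strong low-frequency component sets the gain applied to weak high-frequency components, which is the problem listed above.

```python
import numpy as np

def single_band_compress(x, fs, th_db, cr, ta=0.005, tr=0.1):
    """Feed-forward single-band compressor: one gain shared by all frequencies."""
    # One-pole level smoothers; the attack/release time constants are assumptions.
    a_att = np.exp(-1.0 / (ta * fs))
    a_rel = np.exp(-1.0 / (tr * fs))
    level = 0.0
    y = np.empty(len(x))
    for n, s in enumerate(x):
        p = s * s                                  # instantaneous power
        a = a_att if p > level else a_rel          # rise fast, decay slowly
        level = a * level + (1.0 - a) * p          # smoothed power estimate
        level_db = 10.0 * np.log10(level + 1e-12)
        over_db = max(level_db - th_db, 0.0)       # excess above threshold TH
        gain_db = -over_db * (1.0 - 1.0 / cr)      # output rises 1/CR dB per input dB
        y[n] = s * 10.0 ** (gain_db / 20.0)        # same gain for the whole signal
    return y

fs = 10000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
y_loud = single_band_compress(tone, fs, th_db=-20.0, cr=2.0)
y_quiet = single_band_compress(0.001 * tone, fs, th_db=-20.0, cr=2.0)
```

The quiet tone stays below TH and passes unchanged, while the loud tone is attenuated by roughly (1 − 1/CR) times its excess over TH.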
Multiband dynamic range compression
General scheme of processing: spectral components of the input signal divided into multiple bands, and the gain for each band calculated on the basis of the signal power in that band.
Parameters (band-specific): compression threshold TH, compression ratio CR, attack & release times for level detection.
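The band-wise gain of multiband compression can be sketched for a single DFT frame. This is a hedged illustration: the band edges, the gain cap `g_max` and the function name are hypothetical, and instead of a threshold TH it borrows the power-gain law used later for sliding-band compression, Gt = (Pmc/Pic)^(1 − 1/CR). Whatever the law, every bin inside a band receives the same gain, so the gain can jump at a band boundary.

```python
import numpy as np

def multiband_gains(X, band_edges, p_mc, cr, g_max=100.0):
    """Per-bin power gain for one DFT frame using fixed, non-overlapping bands.

    X          : complex spectrum of one frame (one-sided)
    band_edges : increasing bin indices; band b covers bins [edges[b], edges[b+1])
    """
    P = np.abs(X) ** 2
    G = np.ones(len(X))
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        p_band = P[lo:hi].sum() + 1e-12            # band power
        g = (p_mc / p_band) ** (1.0 - 1.0 / cr)    # same compression law in every band
        G[lo:hi] = min(g, g_max)                   # one gain for all bins of the band
    return G

X = np.zeros(16, dtype=complex)
X[2] = 10.0                                        # a strong component in band 0 only
G = multiband_gains(X, [0, 4, 8, 16], p_mc=1.0, cr=2.0)
```

Here the gain steps from 0.1 (bin 3) to the cap of 100 (bin 4): a component sitting near that boundary, or a formant moving across it, sees an abrupt gain change.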
Lippmann et al. (1980): 16-channel compression
9% improvement in recognition score over linear amplification.
Asano et al. (1991): Multiband dynamic range compression realized as a single time-varying FIR filter & implemented on a 32-bit fixed-point DSP processor
Less spectral distortion due to the smoothed frequency response of the FIR filter.
Stone et al. (1999): Comparison of single-channel and four-channel compression schemes & effect of varying CR, TH, and attack & release times
Intelligibility & quality tests showed no specific preference for either scheme.
Li et al. (2000): Wavelet-based compression (7-octave sub-band analysis using a wavelet filter bank & resynthesis after applying logarithmic compression to the wavelet coefficients)
Increase in intelligibility without introducing noticeable distortions.
Magotra et al. (2000): Multiband dynamic range compression using a 16-bit fixed-point processor
Taylor series approximation of the compression function used to reduce computations in gain calculation.
Disadvantages of multiband compression
• Spurious spectral distortions
• Reduction in spectral contrasts and modulation depth
• Distortion in spectral shape of formants lying across the
band boundaries
• Distortion of formant transitions across the adjacent bands
• Time-varying magnitude response without corresponding
variation in the phase response leading to quality
degradation
→ Audible distortions, perceptible discontinuities, adverse
effect on the perception of certain speech cues
Example of distortion due to multiband dynamic range compression during spectral transition
[Figure: waveforms vs. time (s). Input: swept sinusoid with constant amplitude, frequency linearly swept from 125 to 250 Hz, 200 ms sweep duration. Processed output: multiband compression with 18 auditory critical bands, CR = 30, Ta = 6.4 ms, Tr = 192 ms.]
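The swept-sinusoid test input can be generated by integrating a linearly rising instantaneous frequency. This sketch assumes fs = 10 kHz (the sampling frequency given for the implementations); the function name and the unit amplitude are hypothetical. `scipy.signal.chirp(t, f0, t1, f1, method='linear')` produces an equivalent cosine sweep if SciPy is available.

```python
import numpy as np

def linear_sweep(fs=10000, f0=125.0, f1=250.0, dur=0.2, amp=1.0):
    """Constant-amplitude sinusoid whose frequency rises linearly from f0 to f1."""
    t = np.arange(int(round(fs * dur))) / fs
    # Instantaneous frequency f0 + (f1 - f0) t / dur; the phase is its integral.
    phase = 2.0 * np.pi * (f0 * t + 0.5 * (f1 - f0) * t ** 2 / dur)
    return amp * np.sin(phase)

x = linear_sweep()
crossings = int(np.sum(x[:-1] * x[1:] < 0))        # about 2 x mean freq x duration
```

With a mean frequency of 187.5 Hz over 200 ms the signal completes 37.5 cycles, so about 74 sign changes are expected. Since the sweep stays within a few critical bands, a band-based compressor can still switch bands partway through it.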
Investigation objective
Real-time dynamic range compression to compensate for frequency-dependent loudness recruitment associated with sensorineural hearing loss, for use in hearing aids with a low-power processor.
• Low distortions
• Low computational complexity & memory requirement
• Low signal delay (algorithmic + computational)
Proposed scheme: Sliding-band dynamic range compression
• Proposed for significantly reducing the temporal and
spectral distortions associated with the currently used
single-band and multiband compressions in hearing aids.
• Realized with computational complexity acceptable for
implementation on a 16-bit fixed-point DSP processor and
signal delay acceptable for real-time application.
• Investigations using offline & real-time implementations
◦ Selection of processing parameters
◦ Evaluation of the implementations: informal listening, PESQ measure
2. Sliding-band Dynamic Range Compression
Processing: Applying a frequency-dependent gain function, with the gain for each spectral sample determined by the short-time power in the auditory critical bandwidth centered at it & in accordance with the specified hearing thresholds, compression ratios, and attack and release times.
• Short-time spectral analysis: windowing, zero-padding, DFT calculation
• Spectral modification: gain calculation, output spectrum calculation
• Resynthesis: IDFT calculation, windowing, overlap-add
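The three processing stages can be sketched as a frame loop with a pluggable spectral-modification step. This is a hedged sketch, not the authors' code: it uses the frame parameters of the implementations (L = 256, S = 64, N = 512), assumes a Hamming window (the slides do not name one), and resynthesizes with the least-square-error estimate from a modified STFT of Griffin & Lim (1984), cited under the implementation challenges: x(n) = Σi w(n − iS) yi(n − iS) / Σi w²(n − iS). The function name and the `modify` hook are hypothetical.

```python
import numpy as np

def analysis_modify_resynth(x, modify, L=256, S=64, N=512):
    """Short-time analysis, per-frame spectral modification, LSEE overlap-add."""
    w = np.hamming(L)                     # assumed window; not specified on the slides
    n_frames = 1 + max(0, (len(x) - L) // S)
    num = np.zeros(len(x))
    den = np.zeros(len(x))
    for i in range(n_frames):
        start = i * S
        X = np.fft.rfft(x[start:start + L] * w, n=N)   # window, zero-pad, N-point DFT
        Y = modify(X)                                  # spectral modification (gains)
        y = np.fft.irfft(Y, n=N)[:L]                   # N-point IDFT, first L samples
        num[start:start + L] += w * y                  # synthesis window + overlap-add
        den[start:start + L] += w * w
    out = np.zeros(len(x))
    ok = den > 0
    out[ok] = num[ok] / den[ok]          # least-square-error estimate (Griffin & Lim)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(256 + 64 * 19)
same = analysis_modify_resynth(x, lambda X: X)         # identity modification
doubled = analysis_modify_resynth(x, lambda X: 2 * X)  # flat 6 dB gain
```

With an unmodified spectrum the loop reconstructs the input exactly; in a sliding-band compressor `modify` would multiply each spectral sample by its own gain.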
Spectral modification:
[Block diagram, for the kth spectral sample: input short-time complex spectrum → band samples of the kth-sample-centered band → level estimation → target gain calc. (CR(k), Pmc(k)) → gain calc. (attack time, release time) → × kth sample → modified short-time complex spectrum]
Pmc(k): Power at upper comfortable listening level
CR(k): Compression ratio
Short-time spectral analysis: windowing (length L, shift S), zero-
padding, N-point DFT
Resynthesis: N-point IDFT, overlap-add
Gain calculation
Auditory critical bandwidth for the kth frequency sample at frequency f (f in kHz, BW in Hz):
$BW(k) = 25 + 75\,(1 + 1.4 f^2)^{0.69}$
Target gain calculation
Power at upper comfortable listening level: $P_{mc}(k)$
Compression ratio: $CR(k)$
Input power: $P_{ic}(k)$, Output power: $P_{oc}(k)$
Target gain: $G_t(k) = P_{oc}(k) / P_{ic}(k)$
Compression relation
• dB scale: $[P_{oc}(k)/P_{mc}(k)]_{dB} = [P_{ic}(k)/P_{mc}(k)]_{dB} / CR(k)$
• linear scale: $P_{oc}(k)/P_{mc}(k) = [P_{ic}(k)/P_{mc}(k)]^{1/CR(k)}$
Target gain for kth spectral sample (substituting $P_{oc}(k)$ from the linear relation into $G_t(k)$):
$G_t(k) = [P_{mc}(k)/P_{ic}(k)]^{1 - 1/CR(k)}$, i.e. $[G_t(k)]_{dB} = [1 - 1/CR(k)]\,[P_{mc}(k)/P_{ic}(k)]_{dB}$
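The gain calculation can be sketched directly from these formulas: BW(k) at each bin frequency, Pic(k) as the power summed over the bins lying inside the critical band centered at bin k, and Gt(k) from the compression relation. Assumptions: the band is taken as ±BW/2 around the bin frequency, prefix sums are used for the band sums, power is in arbitrary units (a real system would calibrate Pmc against the analysis scaling), and all function names are hypothetical. Gt(k) is a power ratio, so applying it to a complex spectral sample would presumably use its square root; the slides do not state this step.

```python
import numpy as np

def critical_bandwidth_hz(f_hz):
    """BW = 25 + 75 (1 + 1.4 f^2)^0.69 with f in kHz, BW in Hz."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

def sliding_band_power(X, fs, N):
    """Pic(k): power summed over the critical band centred at each bin k."""
    P = np.abs(X) ** 2
    f = np.arange(len(X)) * fs / N                    # bin centre frequencies (Hz)
    half = critical_bandwidth_hz(f) / 2.0
    csum = np.concatenate(([0.0], np.cumsum(P)))      # prefix sums: O(1) per band
    lo = np.searchsorted(f, f - half, side="left")    # first bin inside the band
    hi = np.searchsorted(f, f + half, side="right")   # one past the last bin inside
    return csum[hi] - csum[lo]

def target_gain(p_ic, p_mc, cr):
    """Gt(k) = (Pmc/Pic)^(1 - 1/CR), i.e. [Gt]dB = (1 - 1/CR) [Pmc/Pic]dB."""
    return (p_mc / p_ic) ** (1.0 - 1.0 / cr)

fs, N = 10000, 512
X = np.zeros(N // 2 + 1, dtype=complex)
X[50] = 1.0                                            # single component near 977 Hz
p_ic = sliding_band_power(X, fs, N)
```

Bins near 977 Hz see the component in their bands while distant bins do not, so the gain varies smoothly from bin to bin instead of in band-sized steps. For CR = 2 an input 20 dB below Pmc gets a target gain of 10 dB.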
Gain changed in steps from the previous value towards the target value, with settable attack and release times
• Fast attack: to prevent the output level from exceeding UCL during transients
• Slow release: to avoid the pumping effect or amplification of breathing
Number of steps during attack phase = $s_a$
Number of steps during release phase = $s_r$
Target gain corresponding to min. input level = $G_{max}$
Target gain corresponding to max. input level = $G_{min}$
Gain ratio for attack phase: $\gamma_a = (G_{max}/G_{min})^{1/s_a}$
Gain ratio for release phase: $\gamma_r = (G_{max}/G_{min})^{1/s_r}$
Gain for ith window & kth spectral sample:
$G(i,k) = \begin{cases} \max[G(i-1,k)/\gamma_a,\; G_t(i,k)], & G_t(i,k) < G(i-1,k) \\ \min[G(i-1,k)\,\gamma_r,\; G_t(i,k)], & G_t(i,k) > G(i-1,k) \end{cases}$
Attack time $T_a = s_a S / f_s$, Release time $T_r = s_r S / f_s$
[$f_s$ = sampling freq., $S$ = window shift]
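The stepwise gain update can be sketched elementwise over the spectral samples of one window. This follows the two-case rule above; keeping the gain unchanged when Gt(i,k) equals G(i−1,k) is an added assumption, as are the function names and the values Gmax = 100, Gmin = 1. The parameters sa = 1, sr = 30, S = 64 and fs = 10 kHz are those used in the offline implementation.

```python
import numpy as np

def step_ratios(g_max, g_min, s_a, s_r):
    """gamma_a = (Gmax/Gmin)^(1/sa), gamma_r = (Gmax/Gmin)^(1/sr)."""
    r = g_max / g_min
    return r ** (1.0 / s_a), r ** (1.0 / s_r)

def update_gain(g_prev, g_target, gamma_a, gamma_r):
    """G(i,k) from G(i-1,k) and Gt(i,k), elementwise over spectral samples k."""
    g_prev = np.asarray(g_prev, dtype=float)
    g_target = np.asarray(g_target, dtype=float)
    attack = np.maximum(g_prev / gamma_a, g_target)    # falling gain, floor at target
    release = np.minimum(g_prev * gamma_r, g_target)   # rising gain, ceiling at target
    return np.where(g_target < g_prev, attack,
                    np.where(g_target > g_prev, release, g_prev))

fs, S, s_a, s_r = 10000, 64, 1, 30
T_a_ms, T_r_ms = 1000 * s_a * S / fs, 1000 * s_r * S / fs   # 6.4 ms, 192 ms
gamma_a, gamma_r = step_ratios(100.0, 1.0, s_a, s_r)        # hypothetical Gmax, Gmin

g = np.array([1.0])
history = []
for _ in range(s_r):                  # input drops to minimum: release from Gmin
    g = update_gain(g, [100.0], gamma_a, gamma_r)
    history.append(float(g[0]))
g_after_attack = update_gain([100.0], [1.0], gamma_a, gamma_r)  # one attack step
```

A full swing from Gmax to Gmin takes one window (6.4 ms), while the reverse swing takes 30 windows (192 ms), reaching Gmax only at the last step.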
Implementation related challenges
• Modifications in the short-time magnitude spectrum without
corresponding changes in the phase spectrum can cause audible
distortions.
• Computational complexity: log or series approximation based gain
calculations not suitable for use in sliding-band compression.
Solutions
• Analysis-synthesis using least-square error based signal estimation from modified STFT (Griffin & Lim, 1984): Processing artifacts reduced by masking the effect of phase discontinuities in the modified short-time complex spectrum.
• Look-up table based gain calculation: Two-dimensional look-up table relating the input power to the gain as a function of frequency. Permits use of the compression function best suited to compensating for the abnormal loudness growth.
3. Offline & Real-time Implementations
Implementation for offline processing
Implementation using Matlab 7.10 for evaluating the proposed
technique and the effect of processing parameters.
• Processing parameters
◦ fs = 10 kHz
◦ Frame length = 25.6 ms (L = 256)
◦ Overlap = 75% (S = 64)
◦ FFT size N = 512
• 2D look-up table for frequency-dependent compression based on a linear
relation between input-dB and output-dB, with settable CR(k) and Pmc(k).
◦ Input range: 20 log intervals (trade-off: small gain increments, look-up table
size).
◦ Look-up table with 256×20 entries
• Attack and release times
◦ sa=1, Ta = 6.4 ms: Fast attack to avoid uncomfortable level during transients
◦ sr=30, Tr = 192 ms: Slow release to avoid pumping & amplification of breathing
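The look-up table can be sketched with the stated size (256 frequency entries × 20 log-spaced input-level intervals) and the linear input-dB/output-dB law. The 5 dB interval width, the 0 dB lower edge, the use of each interval's center level, and the function names are hypothetical choices. At run time the interval is found by comparing Pic(k) with precomputed linear-power boundaries, so no logarithm is evaluated, in line with the remark that log-based gain calculation is unsuitable here.

```python
import numpy as np

N_BINS, N_LEVELS = 256, 20       # table size from the slides (256 x 20 entries)
STEP_DB, FLOOR_DB = 5.0, 0.0     # hypothetical interval width and lower edge (dB)

def build_gain_lut(p_mc_db, cr):
    """Offline: boundaries (linear power) and an N_BINS x N_LEVELS power-gain table."""
    p_mc_db = np.broadcast_to(np.asarray(p_mc_db, dtype=float), (N_BINS,))
    cr = np.broadcast_to(np.asarray(cr, dtype=float), (N_BINS,))
    # N_LEVELS - 1 interior boundaries, equally spaced in dB (i.e. log intervals)
    edges = 10.0 ** ((FLOOR_DB + STEP_DB * np.arange(1, N_LEVELS)) / 10.0)
    centres_db = FLOOR_DB + STEP_DB * (np.arange(N_LEVELS) + 0.5)
    # linear input-dB/output-dB law: [Gt]dB = (1 - 1/CR) ([Pmc]dB - [Pic]dB)
    gain_db = (1.0 - 1.0 / cr[:, None]) * (p_mc_db[:, None] - centres_db[None, :])
    return edges, 10.0 ** (gain_db / 10.0)

def lookup_gain(edges, lut, p_ic):
    """Run time: interval index by comparison with boundaries, no log evaluated."""
    idx = np.searchsorted(edges, p_ic, side="right")   # 0 .. N_LEVELS - 1
    return lut[np.arange(N_BINS), idx]

edges, lut = build_gain_lut(p_mc_db=100.0, cr=2.0)
g = lookup_gain(edges, lut, np.full(N_BINS, 10.0 ** 5.2))   # every bin at 52 dB
```

Per-bin CR(k) and Pmc(k) arrays can be passed in place of the scalars. Finer intervals give smaller gain increments at the cost of a larger table, which is the trade-off noted above.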
Implementation for real-time processing
Implementation on a 16-bit fixed-point DSP board to examine suitability of the technique for use in hearing aids.
• DSP chip: TI/TMS320C5515
◦ 16 MB memory space (320 KB on-chip RAM with 64 KB dual access data memory)
◦ Three 32-bit programmable timers
◦ 4 DMA controllers each with 4 channels
◦ FFT hardware accelerator (up to 1024-point FFT)
◦ Max. clock speed: 120 MHz
• DSP Board: eZdsp
◦ 4 MB on-board NOR flash for user program
◦ Stereo codec TLV320AIC3204: 16/20/24/32-bit ADC & DAC, 8 – 192 kHz sampling
• Software development: C using TI's CCStudio ver. 4.0
Implementation details
• Input-output operations: DMA based I/O with cyclic buffers
• ADC and DAC: one codec (left channel) with 16-bit quantization
• Processing parameters (same as for offline processing): fs = 10 kHz, L = 256, S = 64, N = 512
• Data representation (input samples, spectral values, processed samples): 16-bit real & 16-bit imaginary
Data transfers & buffering operations (S = L/4)
DMA cyclic buffers
• 5-block S-sample input buffer
• 2-block S-sample output buffer
Pointers (incremented cyclically on DMA interrupt)
• Current input block
• Just-filled input block
• Current output block
• Write-to output block
Signal delay: Algorithmic: 1 frame (25.6 ms), Computational ≤ frame shift (6.4 ms)
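The buffer bookkeeping can be modeled in a few lines. The real-time code is in C on the C5515; this Python model only mirrors the pointer logic under an assumed reading of the slide: the DMA fills one input block while a frame of L = 4S samples is assembled from the other four (which would account for the fifth block), and processed samples are written to the output block not currently being played. The class, method names and the stand-in `dma_interrupt` are hypothetical.

```python
import numpy as np

S = 64                       # frame shift in samples (S = L/4)
L = 4 * S                    # frame length
FS = 10000                   # sampling frequency (Hz)
IN_BLOCKS, OUT_BLOCKS = 5, 2

class CyclicIO:
    """Python model of the 5-block input / 2-block output cyclic buffers."""

    def __init__(self):
        self.inbuf = np.zeros((IN_BLOCKS, S), dtype=np.int16)
        self.outbuf = np.zeros((OUT_BLOCKS, S), dtype=np.int16)
        self.cur_in = 0      # input block being filled by DMA
        self.cur_out = 0     # output block being sent to the DAC by DMA

    @property
    def just_filled(self):
        return (self.cur_in - 1) % IN_BLOCKS

    @property
    def write_to(self):
        return (self.cur_out + 1) % OUT_BLOCKS

    def dma_interrupt(self, adc_block):
        """Stand-in for the DMA: store a completed block, advance pointers cyclically."""
        self.inbuf[self.cur_in] = adc_block
        self.cur_in = (self.cur_in + 1) % IN_BLOCKS
        self.cur_out = (self.cur_out + 1) % OUT_BLOCKS

    def latest_frame(self):
        """Last L samples: the four most recently filled blocks, oldest first."""
        order = [(self.just_filled - j) % IN_BLOCKS for j in (3, 2, 1, 0)]
        return self.inbuf[order].reshape(-1)

    def write_output(self, samples):
        """Processed S samples go to the block the DAC will play next."""
        self.outbuf[self.write_to] = samples

io = CyclicIO()
for m in range(6):
    io.dma_interrupt(np.full(S, m, dtype=np.int16))
frame = io.latest_frame()
algorithmic_ms = 1000 * L / FS          # 25.6 ms
computational_max_ms = 1000 * S / FS    # 6.4 ms
```

After six interrupts the frame holds blocks 2 to 5 in order, even though block 5 has wrapped around into buffer slot 0. Processing must finish within one block period (S/fs = 6.4 ms), which bounds the computational delay.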
4. Test Results
Tests for verification and evaluation
Offline processing
• Verification of the compression technique for speech input with a large level variation and examination of the effect of different sets of processing parameters.
• Assessment of output speech quality (using informal listening) for different input speech materials and time-varying levels.
• Comparison of distortions introduced by different compression
techniques during spectral transitions.
Real-time processing
• Comparison of the processed outputs from offline & real-time
implementation: informal listening, PESQ measure (0 – 4.5).
• Signal delay & computational requirement.
Results from offline processing
Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, Ta = 6.4 ms, Tr = 6.4 & 192 ms.
[Figure: waveforms vs. time (s). Panels: input waveform with scaling factor; unprocessed waveform; processed, Tr = 6.4 ms, low Pmc; processed, Tr = 192 ms, low Pmc; processed, Tr = 6.4 ms, high Pmc; processed, Tr = 192 ms, high Pmc.]
Processing of different speech materials with varying levels: No audible roughness or distortion during informal listening.
Distortions during spectral transitions: Example of swept sinusoidal input.
[Figure: waveforms vs. time (s). Panels: sliding-band compression output; multiband compression (18 auditory critical bands) output; single-band compression output. Input: constant amplitude, 125–250 Hz linearly swept frequency, 200 ms sweep duration. CR = 30, Ta = 6.4 ms, Tr = 192 ms.]
Results from real-time processing
Informal listening: real-time output perceptually similar to the offline output
PESQ score of real-time output w.r.t. offline output: 3.5
Signal delay = 36 ms
Use of processing capacity: 41% (lowest acceptable clock: 50 MHz, max = 120 MHz)
Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, Ta = 6.4 ms, Tr = 192 ms, low Pmc.
[Figure: waveforms vs. time (s). Panels: unprocessed; offline processed; real-time processed.]
5. Summary & Conclusions
Summary: Development & investigation of a sliding-band compression scheme
• Realized using modified fixed-frame analysis-synthesis for low computational complexity & without distortions associated with phase discontinuities.
• Suitable for speech & non-speech audio, with provision for settable attack time, release time, & compression ratios.
• Implemented using a 16-bit fixed-point DSP chip & tested for satisfactory operation: 36 ms signal delay, 41% use of processing capacity, indicating scope for combination with other processing techniques.
Conclusion: Sliding-band compression can be used to
compensate for frequency-dependent loudness recruitment
without introducing the distortions associated with single-band
& multiband compression.