A Sliding-band Dynamic Range Compression for Use in Hearing Aids
Nitya Tiwari, Prem C. Pandey
{nitya, pcpandey}@ee.iitb.ac.in
EE Dept., IIT Bombay
NCC 2014, Kanpur, 28 Feb. – 2 Mar. 2014, Paper No. 1569847357 (Session III, Sat., 1st Mar., 1020–1200)
1. Introduction

Sensorineural hearing loss
- Causes: abnormalities in the cochlear hair cells or the auditory nerve
- Characteristics:
  - Increase in hearing thresholds (due to loss of inner hair cells)
  - Loudness recruitment (abnormal loudness growth) and decrease in dynamic range (due to loss of outer hair cells)
  - Increased spectral and temporal masking, leading to degraded speech perception

Signal processing in hearing aids
- Frequency-selective amplification to compensate for frequency-dependent elevation of hearing thresholds
- Amplitude compression to compensate for decreased dynamic range
Slide 5
Dynamic range compression

Objective: to present sounds comfortably within the limited dynamic range of the listener by amplifying low-level sounds without making high-level sounds uncomfortably loud.

Processing steps
- Input level estimation
- Gain calculation based on input level
- Multiplication of input with gain function
- Output resynthesis

Classification
- On the basis of signal level calculation: single-band or multiband
- On the basis of gain control method: feedback or feed-forward
Slide 6
Single-band dynamic range compression

Processing: gain dependent on the dynamically varying signal level.

Parameters
- Compression threshold ($T_H$)
- Compression ratio (CR)
- Attack and release times

Problems
- Does not account for the frequency-dependent loudness growth function
- Power is mostly contributed by low-frequency components, so the amplification of high-frequency components depends on the low-frequency components
- Inaudibility of high-frequency components, distortions in temporal envelope
Slide 7
Multiband dynamic range compression

General scheme of processing: spectral components of the input signal are divided into multiple bands, and the gain for each band is calculated on the basis of signal power in that band.

Parameters (band specific): compression threshold $T_H$, compression ratio CR, attack and release times for detection.
Slide 8
- Lippmann et al. (1980): 16-channel compression. 9% improvement in recognition score over linear amplification.
- Asano et al. (1991): Multiband dynamic range compression realized as a single time-varying FIR filter and implemented on a 32-bit fixed-point DSP processor. Less spectral distortion due to smoothened frequency response of the FIR filter.
- Stone et al. (1999): Comparison of single-channel and four-channel compression schemes and the effect of varying CR, $T_H$, and attack and release times. Intelligibility and quality tests showed no specific preference for schemes.
- Li et al. (2000): Wavelet-based compression (7-octave sub-band analysis using a wavelet filter bank, and resynthesis after applying a logarithmic compression on the wavelet coefficients). Increase in intelligibility without introducing noticeable distortions.
- Magotra et al. (2000): Multiband dynamic range compression using a 16-bit fixed-point processor. Taylor's series approximation used for the compression function to reduce computations in gain calculation.
Slide 9
Disadvantages of multiband dynamic range compression

- Spurious spectral distortions
- Reduction in spectral contrasts and modulation depth
- Distortion in spectral shape of formants lying across the band boundaries
- Distortion of formant transitions across the adjacent bands
- Time-varying magnitude response without corresponding variation in the phase response, leading to quality degradation
- Audible distortions, perceptible discontinuities, adverse effect on the perception of certain speech cues
Slide 10
Example of distortion due to multiband dynamic range compression during spectral transition

- Swept sinusoidal input: constant amplitude, 125–250 Hz linearly swept frequency, 200 ms sweep duration
- Processed output: multiband compression with 18 auditory critical bands, CR = 30, $T_a$ = 6.4 ms, $T_r$ = 192 ms

[Figure: input and processed waveforms; horizontal axis: Time (s)]
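
The swept sinusoidal test input described above can be generated as in the following minimal sketch. The function name `linear_sweep` is hypothetical, and the 10 kHz sampling frequency is taken from the processing parameters listed elsewhere in these slides; the slides do not state how the original test signal was produced.

```python
import math

def linear_sweep(f_start, f_end, duration, fs, amplitude=1.0):
    """Constant-amplitude sinusoid whose frequency rises linearly with time.

    The phase is the integral of the instantaneous frequency
    f(t) = f_start + (f_end - f_start) * t / duration.
    """
    n_samples = int(round(duration * fs))
    rate = (f_end - f_start) / duration  # sweep rate in Hz per second
    samples = []
    for i in range(n_samples):
        t = i / fs
        phase = 2.0 * math.pi * (f_start * t + 0.5 * rate * t * t)
        samples.append(amplitude * math.sin(phase))
    return samples

# 125-250 Hz sweep over 200 ms (fs = 10 kHz is an assumption for this sketch)
sweep = linear_sweep(125.0, 250.0, 0.2, 10000.0)
```

A compressor with a very high compression ratio should ideally leave such a constant-amplitude input with a constant-amplitude output, so any envelope ripple in the processed signal is attributable to the compression scheme.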
Slide 11
Research objective

Real-time dynamic range compression to compensate for frequency-dependent loudness recruitment associated with sensorineural hearing loss, for use in hearing aids with a low-power processor.
- Low distortions
- Low computational complexity and memory requirement
- Low signal delay (algorithmic + computational)
Slide 12
Sliding-band compression

- Proposed for significantly reducing the temporal and spectral distortions associated with the currently used single-band and multiband compressions in hearing aids.
- Realized with computational complexity acceptable for implementation on a 16-bit fixed-point DSP processor and signal delay acceptable for real-time application.

Investigations using offline and real-time implementations
- Selection of processing parameters
- Evaluation of the implementations: informal listening, PESQ measure
Slide 13
2. Sliding-band Dynamic Range Compression

Processing: applying a frequency-dependent gain function, with the gain for each spectral sample determined by the short-time power in the auditory critical bandwidth centered at it, and in accordance with the specified hearing thresholds, compression ratios, and attack and release times.

Processing steps
- Short-time spectral analysis: windowing, zero-padding, DFT calculation
- Spectral modification: gain calculation, output spectrum calculation
- Resynthesis: IDFT calculation, windowing, overlap-add
Slide 14
[Block diagram: short-time spectral analysis, spectral modification, resynthesis]

- Short-time spectral analysis: windowing (length $L$, shift $S$), zero-padding, $N$-point DFT
- Spectral modification, with $P_{mc}(k)$: power at upper comfortable listening level, and $CR(k)$: compression ratio
- Resynthesis: $N$-point IDFT, overlap-add
Slide 15
Gain calculation

Auditory critical bandwidth (frequency sample = $k$, frequency = $f$ in kHz, bandwidth in Hz):
$$BW(k) = 25 + 75\,(1 + 1.4 f^2)^{0.69}$$

Target gain calculation
- Power at upper comfortable listening level: $P_{mc}(k)$
- Compression ratio: $CR(k)$
- Input power: $P_{ic}(k)$, output power: $P_{oc}(k)$
- Target gain: $G_t(k) = P_{oc}(k) / P_{ic}(k)$

Compression relation
- dB scale: $[P_{oc}(k) / P_{mc}(k)]_{dB} = [P_{ic}(k) / P_{mc}(k)]_{dB} \,/\, CR(k)$
- Linear scale: $P_{oc}(k) / P_{mc}(k) = [P_{ic}(k) / P_{mc}(k)]^{1/CR(k)}$

Target gain for the $k$th spectral sample:
$$[G_t(k)]_{dB} = [1 - 1/CR(k)]\,[P_{mc}(k) / P_{ic}(k)]_{dB}$$
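
The relations above can be sketched in code as follows. This is an illustration only: the function names are hypothetical, the band power is taken as a plain sum over whole DFT bins inside the critical band (the slides do not specify how band edges are handled), and the gain is a power gain, consistent with $G_t(k) = P_{oc}(k) / P_{ic}(k)$.

```python
import math

def critical_bandwidth_hz(f_hz):
    """Auditory critical bandwidth in Hz for a centre frequency given in Hz."""
    f_khz = f_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * f_khz ** 2) ** 0.69

def band_power(power_spectrum, k, fs, n_fft):
    """Power in the critical band centred at spectral sample k.

    Assumption: only whole bins within +/- BW/2 of bin k are summed,
    and the band is clipped at the ends of the spectrum.
    """
    bin_spacing = fs / n_fft
    half_width = critical_bandwidth_hz(k * bin_spacing) / 2.0
    half_bins = int(half_width // bin_spacing)
    lo = max(0, k - half_bins)
    hi = min(len(power_spectrum) - 1, k + half_bins)
    return sum(power_spectrum[lo:hi + 1])

def target_gain_db(p_ic, p_mc, cr):
    """Target power gain in dB: (1 - 1/CR) * 10*log10(P_mc / P_ic)."""
    return (1.0 - 1.0 / cr) * 10.0 * math.log10(p_mc / p_ic)
```

With CR = 2, an input 20 dB below $P_{mc}$ receives 10 dB of gain, so the output sits 10 dB below $P_{mc}$; with CR = 1 the gain is 0 dB at every level.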
Slide 16
Gain calculation (contd.)

Gain is changed in steps from the previous value towards the target value, with settable attack and release times.
- Fast attack: to keep the output level from exceeding UCL during transients
- Slow release: to avoid the pumping effect or amplification of breathing

Definitions
- Number of steps during attack phase = $s_a$
- Number of steps during release phase = $s_r$
- Target gain corresponding to min. input level = $G_{max}$
- Target gain corresponding to max. input level = $G_{min}$
- Gain ratio for attack phase: $\gamma_a = (G_{max} / G_{min})^{1/s_a}$
- Gain ratio for release phase: $\gamma_r = (G_{max} / G_{min})^{1/s_r}$

Gain for the $i$th window and $k$th spectral sample:
$$G(i,k) = \begin{cases} \max[\,G(i-1,k)/\gamma_a,\ G_t(i,k)\,], & G_t(i,k) < G(i-1,k) \\ \min[\,G(i-1,k)\,\gamma_r,\ G_t(i,k)\,], & G_t(i,k) > G(i-1,k) \end{cases}$$

Attack time $T_a = s_a S / f_s$, release time $T_r = s_r S / f_s$ ($f_s$ = sampling frequency, $S$ = window shift)
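
The stepwise gain update can be sketched as below. The function names are hypothetical and the example values of $G_{max}$ and $G_{min}$ in the usage lines are placeholders, not values from the slides.

```python
def gain_ratios(g_max, g_min, s_a, s_r):
    """Per-step gain ratios for the attack and release phases."""
    span = g_max / g_min
    return span ** (1.0 / s_a), span ** (1.0 / s_r)

def next_gain(g_prev, g_target, gamma_a, gamma_r):
    """One update of the gain for a single spectral sample.

    Attack (target below previous gain): divide by gamma_a, but do not
    go below the target. Release (target above previous gain): multiply
    by gamma_r, but do not go above the target.
    """
    if g_target < g_prev:
        return max(g_prev / gamma_a, g_target)
    if g_target > g_prev:
        return min(g_prev * gamma_r, g_target)
    return g_prev

def phase_time_s(steps, shift, fs):
    """Attack or release time in seconds: steps * S / f_s."""
    return steps * shift / fs

# Placeholder gain limits; s_a = 1 and s_r = 30 give a one-step attack
# and a 30-step release across the full gain range.
gamma_a, gamma_r = gain_ratios(100.0, 1.0, 1, 30)
```

With a window shift of 64 samples at 10 kHz, `phase_time_s(1, 64, 10000.0)` and `phase_time_s(30, 64, 10000.0)` correspond to 6.4 ms and 192 ms respectively.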
Slide 17
Implementation-related challenges
- Modifications in the short-time magnitude spectrum without corresponding changes in the phase spectrum can cause audible distortions.
- Computational complexity: log or series-approximation based gain calculations are not suitable for use in sliding-band compression.

Solution
- Analysis-synthesis using least-square error based signal estimation from modified STFT (Griffin & Lim, 1984): processing artifacts are reduced by masking the effect of phase discontinuities in the modified short-time complex spectrum.
- Look-up table based gain calculation: two-dimensional look-up table relating the input power with gain as a function of frequency. Reduces computations for real-time implementation. Permits the compression function most suited to compensate for the abnormal loudness growth.
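
The least-square error based resynthesis can be sketched as follows, using the estimate $x(n) = \sum_m w(n - mS)\, y_m(n) \,/\, \sum_m w^2(n - mS)$, where $y_m$ is the inverse DFT of the modified spectrum of frame $m$. This is a minimal pure-Python sketch under stated assumptions: the names are hypothetical, a plain Hamming window is used for simplicity (the window used in the actual implementation is not given in these slides), and a direct $O(N^2)$ DFT stands in for an FFT.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct DFT or inverse DFT of a sequence (O(N^2), for illustration)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    scale = 1.0 / n if inverse else 1.0
    return [scale * sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                        for t in range(n))
            for k in range(n)]

def hamming(length):
    """Plain Hamming window (an assumption; nonzero at every sample)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def lsee_overlap_add(x, L, S, N, modify):
    """Window, zero-pad, transform, modify, and resynthesize a signal.

    Each output sample is the window-weighted sum of the frame outputs
    divided by the sum of squared window values covering that sample.
    """
    w = hamming(L)
    num = [0.0] * len(x)
    den = [0.0] * len(x)
    for start in range(0, len(x) - L + 1, S):
        frame = [w[n] * x[start + n] for n in range(L)] + [0.0] * (N - L)
        y = dft(modify(dft(frame)), inverse=True)
        for n in range(L):  # only the window support is used
            num[start + n] += w[n] * y[n].real
            den[start + n] += w[n] ** 2
    return [a / b if b > 0.0 else 0.0 for a, b in zip(num, den)]
```

When `modify` returns the spectrum unchanged, every sample covered by at least one frame is reconstructed exactly, which is a convenient sanity check before a frequency-dependent gain is applied.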
Slide 18
3. Offline & Real-time Implementations

Implementation for offline processing
- Implementation using Matlab 7.10 for evaluating the performance of the proposed technique and the effect of processing parameters.

Processing parameters
- $f_s$ = 10 kHz
- Frame length = 25.6 ms ($L$ = 256)
- Overlap = 75% ($S$ = 64)
- FFT size $N$ = 512

2D look-up table for frequency-dependent compression
- Based on a linear relation between input dB and output dB, with settable $CR(k)$ and $P_{mc}(k)$.
- Input range: 20 log intervals (trade-off between small gain increments and look-up table size).
- Look-up table with 256 × 20 entries.

Attack and release times
- $s_a$ = 1, $T_a$ = 6.4 ms: fast attack to avoid uncomfortable level during transients
- $s_r$ = 30, $T_r$ = 192 ms: slow release to avoid pumping and amplification of breathing
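
A two-dimensional gain table of this kind might be built and indexed as in the sketch below. Several details are assumptions made for illustration and are not taken from the slides: the input range of −60 dB to 0 dB, the use of interval centres for the tabulated gain, and the function names. The interval is located by comparing the input power against precomputed linear-scale edges, so no logarithm is needed at run time.

```python
import bisect

def build_gain_table(cr, p_mc_db, n_levels=20,
                     level_min_db=-60.0, level_max_db=0.0):
    """Build a [frequency sample][input level interval] table of gains in dB.

    cr and p_mc_db hold one value per spectral sample. The input range
    (an assumption) is split into n_levels intervals of equal width in dB.
    """
    step = (level_max_db - level_min_db) / n_levels
    # Interior interval edges as linear power, for log-free lookup
    edges = [10.0 ** ((level_min_db + j * step) / 10.0)
             for j in range(1, n_levels)]
    centres = [level_min_db + (j + 0.5) * step for j in range(n_levels)]
    table = [[(1.0 - 1.0 / c) * (p - level) for level in centres]
             for c, p in zip(cr, p_mc_db)]
    return table, edges

def lookup_gain_db(table, edges, k, p_in):
    """Gain in dB for spectral sample k at linear input power p_in."""
    return table[k][bisect.bisect_right(edges, p_in)]

# Placeholder parameters: CR = 2 and P_mc = 0 dB at every spectral sample
table, edges = build_gain_table([2.0] * 256, [0.0] * 256)
```

Inputs below the lowest edge or above the highest edge fall into the first or last interval, so the lookup never indexes outside the table.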
Slide 19
Implementation for real-time processing

Implementation on a 16-bit fixed-point DSP board to examine suitability of the technique for use in hearing aids.

DSP chip: TI TMS320C5515
- 16 MB memory space (320 KB on-chip RAM with 64 KB dual access, 128 KB on-chip ROM)
- Three 32-bit programmable timers
- 4 DMA controllers, each with 4 channels
- FFT hardware accelerator (up to 1024-point FFT)
- Max. clock speed: 120 MHz

DSP board: eZdsp
- 4 MB on-board NOR flash for user program
- Stereo codec TLV320AIC3204: 16/20/24/32-bit ADC and DAC, 8–192 kHz sampling

Software development: C, using TI's CCStudio ver. 4.0
Slide 20
Implementation

- Input-output operations: DMA-based I/O with cyclic buffers
- ADC and DAC: one codec (left channel) with 16-bit quantization
- Processing parameters (same as for offline processing): $f_s$ = 10 kHz, $L$ = 256, $S$ = 64, $N$ = 512
- Data representation (input samples, spectral values, processed samples): 16-bit real and 16-bit imaginary
Slide 21
Data transfers and buffering operations ($S = L/4$)

DMA cyclic buffers
- 5-block $S$-sample input buffer
- 2-block $S$-sample output buffer

Pointers (incremented cyclically on DMA interrupt)
- Current input block
- Just-filled input block
- Current output block
- Write-to output block

Signal delay
- Algorithmic: 1 frame (25.6 ms)
- Computational: frame shift (6.4 ms)
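
The block bookkeeping above can be simulated as in the following sketch. It is one plausible reading of the slide, not the actual DSP code: the class and method names are hypothetical, and it is assumed that each frame of $L = 4S$ samples is assembled from the just-filled input block and the three blocks before it, while the fifth block is being filled by DMA.

```python
class BlockBuffers:
    """Simulation of a 5-block input and 2-block output cyclic buffer."""

    N_IN = 5
    N_OUT = 2

    def __init__(self, block_size):
        self.inp = [[0.0] * block_size for _ in range(self.N_IN)]
        self.out = [[0.0] * block_size for _ in range(self.N_OUT)]
        self.cur_in = 0   # input block currently being filled
        self.cur_out = 0  # output block currently being played out

    def on_dma_interrupt(self, new_block):
        """Store a completed input block and advance both pointers cyclically.

        Returns the index of the just-filled input block.
        """
        self.inp[self.cur_in] = list(new_block)
        just_filled = self.cur_in
        self.cur_in = (self.cur_in + 1) % self.N_IN
        self.cur_out = (self.cur_out + 1) % self.N_OUT
        return just_filled

    def frame(self, just_filled):
        """Frame of 4 blocks ending at the just-filled block, oldest first."""
        order = [(just_filled - 3 + i) % self.N_IN for i in range(4)]
        return [sample for b in order for sample in self.inp[b]]

    def write_output(self, block):
        """Write processed samples to the block that is not being played."""
        write_to = (self.cur_out + 1) % self.N_OUT
        self.out[write_to] = list(block)
        return write_to
```

Because the processing of one frame must finish before the next block arrives, the computation time is bounded by one frame shift, which is consistent with the computational delay listed above.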
Slide 22
4. Test Results

Tests for verification and evaluation

Offline processing
- Verification of the compression technique for speech input with a large level variation, and examination of the effect of different sets of processing parameters.
- Assessment of output speech quality (using informal listening) for different input speech materials and time-varying levels.
- Comparison of distortions introduced by different compression techniques during spectral transitions.

Real-time processing
- Comparison of the processed outputs from offline and real-time implementations: informal listening, PESQ measure (0–4.5).
- Signal delay and computational requirement.
Slide 23
Results from offline processing

Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, $T_a$ = 6.4 ms, $T_r$ = 6.4 ms and 192 ms.

[Figure panels: input waveform (scaling factor, unprocessed waveform); processed, $T_r$ = 6.4 ms, low $P_{mc}$; processed, $T_r$ = 192 ms, low $P_{mc}$; processed, $T_r$ = 6.4 ms, high $P_{mc}$; processed, $T_r$ = 192 ms, high $P_{mc}$; horizontal axis: Time (s)]

Processing of different speech materials with varying levels: no audible roughness or distortion during informal listening.
Slide 24
Distortions during spectral transitions: example of swept sinusoidal input

Input: constant amplitude, 125–250 Hz linearly swept frequency, 200 ms sweep duration. CR = 30, $T_a$ = 6.4 ms, $T_r$ = 192 ms.

[Figure panels: single-band compression output; multiband compression (18 auditory critical bands) output; sliding-band compression output; horizontal axis: Time (s)]
Slide 25
Results from real-time processing

- Informal listening: real-time output perceptually similar to the offline output
- PESQ for real-time w.r.t. offline: 3.5
- Signal delay = 36 ms
- Use of processing capacity: 41% (lowest processor clock for satisfactory operation = 50 MHz, max. clock = 120 MHz)

Example: "you will mark ut please" concatenated with scaling factors for variation in the input level. CR = 2, $T_a$ = 6.4 ms, $T_r$ = 192 ms, low $P_{mc}$.

[Figure panels: unprocessed waveform; offline processed waveform; real-time processed waveform; horizontal axis: Time (s)]
Slide 26
5. Summary & Conclusions

- Sliding-band dynamic range compression presented to compensate for frequency-dependent loudness recruitment associated with sensorineural hearing loss without introducing the distortions associated with single-band and multiband compression.
- Realized using modified fixed-frame analysis-synthesis for low computational complexity and without distortions associated with phase discontinuities.
- Suitable for speech and non-speech audio, with provision for settable attack time, release time, and compression ratios.
- Implemented using a 16-bit fixed-point DSP chip and tested for satisfactory operation: 36 ms signal delay and 41% use of processing capacity, indicating scope for combination with other processing techniques.

Further work
- Evaluation of speech perception by hearing-impaired listeners.
- Implementation in combination with other techniques (spectral subtraction, multiband frequency compression, etc.) and evaluation.
National Conference on Communications (NCC 2014), 28th Feb. to 2nd Mar., 2014, Kanpur, India
A Sliding-band Dynamic Range Compression for Use in Hearing Aids
Nitya Tiwari and Prem C. Pandey
Dept. of Electrical Engineering, IIT Bombay, Mumbai, India
Email: {nitya, pcpandey}@ee.iitb.ac.in

Abstract
Sensorineural hearing loss is associated with elevated hearing thresholds, reduced dynamic range, and loudness recruitment. Dynamic range compression in hearing aids is provided for restoring normal loudness of low-level sounds without making high-level sounds uncomfortably loud. A sliding-band compression is presented for significantly reducing the temporal and spectral distortions generally associated with the currently used single-band and multiband compression techniques. It uses a frequency-dependent gain function calculated on the basis of the critical-bandwidth based short-time power spectrum and the specified hearing thresholds, compression ratios, and attack and release times. It is realized using FFT-based analysis-synthesis and can be integrated with other FFT-based signal processing in hearing aids to save computation. The technique is implemented and tested for satisfactory real-time operation, with a sampling frequency of 10 kHz and a window length of 25.6 ms with 75% overlap, on a 16-bit fixed-point DSP processor with on-chip FFT hardware.