Upload
others
View
11
Download
0
Embed Size (px)
Citation preview
EVALUATION OF
INFINITE PEAK CLIPPING
AS A MEANS OF AMPLITUDE COMPRESSION
by
Karen Ann SillettoIf
B.S.E.E., University of California at Davis(1982)
Submitted to the Department ofElectrical Engineering and Computer Science
in Partial Fulfillment of theRequirements for the
Degree of
MASTER OF SCIENCE IN ELECTRICAL ENGINEERING
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
September 1984
Massachusetts Institute of Technology 1984
Signature of AuthorDepartment of Electxtial Engineeringj aad C6mputer Science
July 13, 1984
Certified by:
Accepted by:
Dr. P.M/ Zurek, Thesis Supervisor
A.C.Smith, ChairmanDepartmental Committee on Graduate Students
ivASSACHUSEdiS NSTTUTEOF TFCNOLoGY
OCT 0 4 1984LIBRA1IES
ARCHIVES
EVALUATION OF
INFINITE PEAK CLIPPING
AS A MEANS OF AMPLITUDE COMPRESSION
by
KAREN ANN SILLETTO
Submitted to the Department of Electrical Engineering andComputer Science on July 13, 1984 in partial fulfillment
of the requirements for the Degree ofMaster of Science in Electrical Engineering
ABSTRACT
Infinite peak clipping is a simple and effective
technique for processing speech prior to transmission over
an analog channel with limited dynamic range. However, the
technique has yet to be applied seriously to the problem of
limited auditory dynamic range accompanying sensorineural
hearing loss. In this thesis, an attempt was made to
evaluate the potential for infinite peak clipping as a means
of amplitude compression. Measurements were made of the
compression of the range of speech levels when clipping is
followed by post-filtering (as is presumed to occur when
clipped speech is analyzed by the ear). Spectral
distortions caused by clipping were also assessed.
- 2 -
The widths of level distributions of clipped and
filtered speech were found to be relatively independent of
the characteristics of both the pre-filter and the
post-filter. Measured ranges between the 10% and 90%
cumulative levels were about 10-15 dB as compared to the
input ranges of 30 to 40 dB.
The clipped spectra of unvoiced speech sounds can be
predicted analytically. The clipped spectra of voiced
sounds cannot be easily predicted and so several cases were
examined empirically. In general, despite the radical
distortion of the input time wave, spectral distortions are
not severe. Pre-filtering schemes designed to preserve
relevant cues in the clipped spectra are discussed. It is
concluded that infinite peak clipping as a means of extreme
amplitude compression deserves detailed study with
hearing-impaired listeners.
Thesis Supervisor: Dr. Patrick M. ZurekTitle: Research Scientist, Research Laboratory ofElectronics
- 3 -
ACKNOWLEDGMENT
First, I wish to thank Pat Zurek for his continued
guidance and support on this project. I would also like to
thank all the members of the Communications Biophysics Group
for helping to make the laboratory such an enjoyable place
to work. Although I cannot mention all the people who have
been wonderfully supportive and encouraging, I would like to
thank my office-mates, Dan Leotta, and Jean Reid (not
forgetting Mineola Minos), for making the office such a home
away from home. Also, special thanks to Dan Leotta, Mike
McConnell, and Diane Bustamante for their extra help when I
really needed it. Finally, I would like to thank my parents
and family for their love, and continued support.
- 4 -
TABLE OF CONTENTS
ABSTRACT ......................
ACKNOWLEDGEMENTS ..............
TABLE OF CONTENTS .............
LIST OF FIGURES ...............
CHAPTER 1 INTRODUCTION .......
1.1 Problem Definition
1.2 Previous Research
1.3 Rationale for Pres
CHAPTER 2 METHODS ............
ent Study
......... 0
2.1 Inputs
2.2 Filtering and Clipping
2.2.1 Pre-filters ...
2.2.2 Clippers ......
2.2.3 Post-filters ..
2.3 Dynamic Range Analysis
2.3.1 Level Detection
2.3.2 Probability Distribu
2.4 Spectral Analysis .........
2.4.1 H.P. Spectrum Analyz
2.4.2 ILS Spectral Analysi
..2
..4
..5
..7
..9
..9
12
16
18
18
20
20
21
21
... . 24
... . 24
tion Analysis .25
............... 26
er .............. 26
s ..............27
2.5 Control Measurements and SystemEvaluation...................... .......... 28
- 5 -
........... a 6
.... a
.
.
.
2.5.1 Evaluation of the Level-Distribution
Measurement System....................
2.5.2 Determination of Sample Size forSentences........................
CHAPTER 3 RESULTS ..............................
3.1 Comparison of Distributions in 16Frequency Channels..................
3.2 Comparison of Digital and AnalogImplementations.....................
3.2.1 Digital vs. Analog Sentences
3.2.2 Digital vs. Analog Processin
3.3 Compression Results ...............
3.3.1 Pre-filtered Speech ........
3.3.2 Post-filtered Speech .......
3.4 Spectral Modifications ............
3.4.1 Spectra of Clipped Unvoiced
3.4.2 Phase Dependence of ClippedSounds......................
3.4.3 Spectra of Clipped Vowels ..
CHAPTER 4
4.1
4.2
4. 3
4.4
DISCUSSION ........................
Compression Results .............
Comparison with an AGC System ...
Spectral Modifications ..........
Implications for Hearing Aid Desi
37
37
...... 40
...... 40
g ..... 43
....... 45
... . 45
....... 50
....... 51
Speech 54
Voiced....... 69
....... 73
... .94
... . 95
... . 98
... .. 99
.100
4.5
REFERENCES
APPENDIX A
APPENDIX B
Recommendations for Future Work ......... 101
. . . . . .. . . . ......................102
Probability Densities of Test Signals .... 104
10 Harvard Sentences ..................... 109
- 6 -
a .28
.32
gn
LIST OF FIGURES
1.1 A running spectrum of a two-word utterance, beforeand after infinite peak clipping (Taken from Licklideret al., 1948)..... ....................................... 15
2.1 System Block Diagram................................. 19
2.2 Measured and calculated ranges for narrowband noiseafter squaring and lowpass filtering..................... 30
2.3 Measured and calculated ranges for tones after squaringand lowpass filtering with an RC-lowpass filter.......... 31
2.4 Ranges in the 1/3-octave band centered at 1000 Hz as afunction of'sentence sample size......................... 34
2.5 Ranges across 13 frequency channels comparing inputsof 10 and 20 digital Harvard sentences.................... 36
3.1 Ranges across 13 frequency bands for both the clippingand nonclipping conditions................................. 38
3.2 Clipped and nonclipped ranges comparing inputs ofdigital and analog Harvard sentences...................... 42
3.3 Effect of different pre-filters on ranges in 13frequency bands.......................................... 47
3.4 Effect of different narrowband pre-filters on rangesin the 1000 Hz frequency band............................. 49
3.5 Effect of varying the post-filter bandwidth and slopes.Pre-filter is an allpass filter........................... 52
- 7 -
3.6 Effect of varying the post-filter bandwidth and slopes.Pre-filter is an optimal filter and differentiator....... 53
3.7-3.12 Spectra of synthetic consonant noise burstsbefore and after clipping................................
3.13-3.19 Spectra of filtered noise before and afterclipping.................................................
3.20 The phase dependence of clipped voiced sounds.Phase shifted time-waves.................................
3.21 The magnitude spectra of phase shifted time-waves...
3.22-3.39 Spectra of synthetic vowels before and afterclipping.................................................
55
62
71
72
76
Al Probability density functions for narrowband noise aftersquaring and lowpass filtering for R less than 1........ 107
A2 Probability density functions for narrowband noise aftersquaring and lowpass filtering for R greater than 1..... 108
- 8 -
Chapter 1
INTRODUCTION
1.1 Problem Definition
Sensorineural hearing impairments are characterized by
elevated hearing thresholds without a corresponding
elevation in the discomfort threshold. This reduction in
dynamic range, if severe enough, can necessitate a
compromise between speech intelligibility and comfort.
Consequently, amplitude compression of speech has been
extensively studied as a means of matching the dynamic range
of speech and other acoustic signals to the reduced dynamic
range of the impaired listener.
The most common means of amplitude compression has been
automatic gain control (AGC) in which the gain of -an output
amplifier is controlled by an estimate of the input
amplitude. Results of studies of AGC in hearing aids are
reviewed by Braida et al. (1979). While there are clear
benefits and applications for such systems, there are also
limitations in the hearing-aid application. For instance,
due to the "attack time" of an AGC system, the effectiveness
in protecting the user from high-level transient sounds is
limited. Another limitation is that the actual degree of
- 9 -
compression is less than that specified (DeGennaro et al.,
1981).
Infinite peak-clipping is a simple method of amplitude
compression that drastically reduces the dynamic range of
the speech signal by clipping the input time-wave so that a
rectangular wave of constant amplitude results. Only the
zero-crossings of the original waveform are preserved.
Since the output (envelope) amplitude is constant, infinite
peak clipping produces maximal range reduction of the
wideband signal.
The obvious cost of this range compression is severe
waveform distortion. However, it has been found that,
despite this extreme distortion, infinitely-clipped speech
can be highly intelligible (Licklider, 1946; Licklider,
Bindra, Pollack, 1948). Thus, it would seem that peak
clipping should be considered as a means of amplitude
compression in hearing aids and also perhaps cochlear
implants and tactile aids, since it achieves large
compression of the amplitude range of speech with relatively
little loss in intelligibility.
Though clipping produces a constant-amplitude wideband
signal, the perceived loudness may not be constant because
the ear effectively filters incoming sounds into frequency
bands (critical bands) whose outputs are believed to be
- 10 -
relevant to loudness (Green and Swets, 1966, Chapter 10).
The amplitude range of speech after post-filtering has not
yet been studied and clearly needs to be understood in order
to evaluate clipping as a means of amplitude compression.
The effects of clipping on important cues for speech
intelligibility are also not well understood. In
particular, while it is generally accepted that the
short-time spectral amplitude pattern of speech is
important, changes in this pattern resulting from infinite
peak clipping are not understood. It is possible that, even
though wideband clipping produces fair intelligibility,
multiple narrowband clippers could better preserve spectral
amplitude patterns and consequently increase
intelligibility.
The general problem addressed in this thesis is how
infinite peak clipping might be used to compress the
amplitude range of speech for the hearing-impaired. More
specifically, the thesis will investigate the influence of
pre- and post-filtering on the amplitude distributions and
spectral patterns of clipped speech. The range of
filtered-clipped-filtered speech will be measured and the
spectra of clipped waveforms will be analyzed in order to
evaluate the potential for clipping as a means of amplitude
compression, as well as to understand past studies which
- 11 -
examined the intelligibility of clipped speech.
1.2 Previous Research
Peak clipping was first introduced in radio
transmitters during WWII as a means for maximizing the
intelligibility over a transmission channel with limited
dynamic range. Licklider (1946) studied different types of
amplitude distortion and concluded that peak-clipping was
the least detrimental to intelligibility. Licklider's
experiments showed that since infinite peak clipping
preserves only the zero crossings of the input signal, the
pattern of instantaneous amplitudes in the speech waveform
is not essential for intelligibility.
Pollack (1952) confirmed Licklider's results and showed
that the intelligibility of infinitely-clipped speech,
relative to unmodified speech, is a function of SNR (with
signals equated in terms of peak amplitude and with noise
added after the clipper) and roughly independent of the
frequency range of the speech signal. Clipped speech is
more intelligible than normal speech at low SNR's because
the square speech has more power per unit peak amplitude
than normal speech. At high SNR's, distortion introduced by
the clipper degrades intelligibility relative to unmodified
speech.
- 12 -
Intelligibility of clipped speech is improved by
highpass filtering the speech before clipping (Licklider,
Bindra, Pollack, 1948; Pollack, 1952; Thomas and
Niederjohn, 1970; Thomas and Ravindran, 1971). One
interpretation of this finding (Thomas and Niederjohn, 1970)
is that a highpass pre-filter with cutoff at 1100 Hz reduces
high-amplitude, low-frequency signals and results in better
preservation of the second formant, which is very important
for intelligibility.
There has been one investigation of clipped speech with
hearing-impaired listeners. Thomas and Sparks (1971)
compared the intelligibility of filtered-clipped speech with
linearly-amplified speech by testing 17 ears of 16
hearing-impaired subjects. For 13 of the 17 cases, the
filtered-clipped speech was more intelligible than
linearly-amplified speech with the two types of speech
equated in terms of their overall SPLs. However, these
results are open to the criticism that the linear reference
condition may have been less than optimal.
Hildebrant (1982) investigated a multiband clipping
system that filtered the incoming speech into frequency
regions corresponding to the ranges of the first three
formants. The frequency bands were individually clipped,
filtered again and then summed together. A reduced dynamic
- 13 -
range was created for normal listeners by adding white noise
after processing and restricting the wideband signal to be
below an artificial "discomfort threshold". Intelligibility
with clipped speech was much better than with
linearly-amplified speech. A similar system was studied by
Guidarelli (1981), who found that intelligibility was
greatest with 4 octave-wide or 6 two-third-octave-wide
channels.
The general conclusion that can be drawn from past
studies is that infinite peak clipping reduces the dynamic
range of wideband speech while maintaining a fair degree of
intelligibility. This fundamental result, that the
"amplitude information" in a speech waveform can be
discarded without a drastic degradation of intelligibility,
initially appears quite surprising. However, the key to
understanding this result lies in the realization that
"amplitude information" is also coded in the zero-crossings
of a wave. Although it is not at all obvious from
inspection of the wave, the spectra of clipped signals are
often very similar to the spectra of unclipped signals.
Licklider, Bindra, and Pollack (1948) clearly
emphasized the importance of examining the spectral changes
caused by clipping in order to understand the relatively
small effect of clipping on intelligibility. Figure 1.1,
- 14 -
SW -- 4E SEN --------- CHrIME--m
NO CLIPPINGI..
I.-
Figure 1.1 A running spectrum of a two-word utterance,before and after infinite peak clipping (taken from
Licklider et al., 1948).
- 15 -
SH----- E 8 EN-----------;H
INFINITE PEAK CUPPING
taken from their paper, shows a running spectrum (a 3-D plot
of amplitude vs. time and frequency) of a two-word
utterance before and after infinite peak clipping. From
this frequency analysis it is clear why clipping has little
effect on intelligibility - the spectral patterns of clipped
and unclipped speech are not drastically different.
1.3 Rationale for Present Study
The rationale for studying the effect of pre- and
post-filters on the range of clipped speech and the spectral
distortions caused by clipping follows from an underlying
model and set of assumptions about the operation of the ear
and the perception of speech. First, it is generally
believed that the ear analyzes incoming speech into
frequency bands that are approximately one-third of an
octave wide. In the present study, it is assumed that the
quantity relevant to dynamic range is the envelope of these
bandpass signals. A major portion of this thesis addresses
the question: what are the effects of post-filtering on the
envelope distributions of infinitely-clipped speech waves?
- 16 -
A second important assumption is that the primary cues
to intelligibility lie in the spectral (magnitude) patterns
of speech. In particular, it is assumed that local maxima
in the spectra (formants) are vital to intelligibility. The
effects of clipping on these patterns will be studied as
well as ways, such as pre-filtering, in which the spectral
distortions can be minimized.
- 17 -
Chapter 2
Methods
A block diagram of the system used in the present study
is shown in Figure 2.1. Input signals (speech and other
test signals) were either first filtered with a pre-filter
and clipped, or they were unmodified. The spectra of the
two signals at point 'A' were compared. The dynamic ranges
of these signals were assessed by measuring the
distributions of the signal envelopes after bandpass
filtering.
2.1 Inputs
Harvard sentences (IEEE, 1969) spoken by a male talker
were chosen as input signals. It was desired to use a
speech sample that had an amplitude distribution that was
characteristic of spoken English, but yet was of manageable
size for digital processing. The number of sentences was
determined by preliminary measurements described below
(Section 2.5).
- 18 -
Level Distribution
..-- -- - -0(A)
Input Pre-filter InfiniteO---- -- i eak clipper - -
(d,a) (d,a) (d,a)
d = digital
a = analog
J
Post-filter Level ProbabilityDetector Analysis ' Single Channel
(d,a) (a) (a) I
16 bands
Post-filter Level Probability(Multifilter) Detectors Analysis
(a) * (a) (a) Multiple Channel
H.P. 3582A H.P. 7004B0--Spectrum Analyzer X-Y Recorder
(a) (a)
ILS Analysis
(d)
Figure 2.1 System block diagram
2.2 Filtering and Clipping
Many of the operations shown in Figure 2.1 could be
implemented either digitally (not in real time), or with
analog equipment. The letters 'd' and 'a' in Figure 2.1
indicate digital and analog capability.
2.2.1 Pre-filters
The analog pre-filter was a 2-pole, 1100-Hz highpass
filter suggested by Thomas and Niederjohn (1970). For ease
of reference, this filter is termed "optimal" because Thomas
and Niederjohn found it to give the best intelligibility of
a number of pre-filters. Also for ease of reference, the
condition in which the signal is clipped without
pre-filtering will be termed "allpass" pre-filtering.
A software package, ILS (Interactive Laboratory System)
written by Signal Technology Incorporated, was used to
design the digital filters and filter waveforms. An
infinite impulse response Butterworth design was used for
the digital filters. Digital pre-filters were a digital
"optimal" filter, an allpass filter (i.e. no
pre-filtering), and a single-pole, 6 dB/octave 8000-Hz
highpass filter, which will be termed "differentiator" for
- 20 -
obvious reasons. Speech was also digitally pre-filtered
into the three formant regions that Hildebrant (1982)
described. The first formant filter was a bandpass filter
between 200 Hz and 900 Hz, the second formant filter was 900
Hz to 2800 Hz, and the third formant filter was 2800 Hz to
6000 Hz. Finally, narrowband pre-filters with bandwidths
less than 231 Hz and center frequency at 1000 Hz were used
for the purpose of exploring the effects of the pre-filter
bandwidth and slope.
2.2.2 Clippers
Analog clipping was performed by a.Schmitt trigger with
an adjustable hysteresis "dead zone" (Hildebrant, 1982).
The dead zone was adjusted to a minimum value which
prevented triggering by internal noise so that no output
signal resulted from a null input signal. Digital clipping
was performed by keeping only the sign of the digitized
waveform values (except zero, which remained zero).
2.2.3 Post-filters
The clipped waveform (at point 'A' in Figure 2.1) could
be post-filtered (before detection) into adjacent
"critical-band" filters using a General Radio (1925)
- 21 -
multifilter, or digitally post-filtered into a single
frequency band. The multifilter contains 30 one-third
octave, 6-pole (3-zero), Butterworth filters with center
frequencies that range from 25 Hz to 8000 Hz. The filter
skirt slopes near the passband are about 36 dB/octave and
flatten out to about 18 dB/octave due to the three zeros in
the transfer function.
Only the outputs from 19 one-third octave bands between
125 Hz and 8000 Hz are used, since the most important cues
for intelligibility lie in this frequency range. Because
critical bands are proportionately broader at lower
frequencies, outputs from the three lowest of these
one-third octave bands are summed to form channel 1, and the
outputs from third-octaves centered at 250 Hz and 315 Hz are
summed to form channel 2. The remaining channels are the
outputs of single third-octave bands. The bandwidths and
center frequencies of the 16 frequency channels are listed
in Table I.
For greater flexibility in examining the effect of
post-filter skirts and bandwidths, single band post-filters
were implemented digitally. Post-filter skirts were varied
by changing the number of poles in the filter transform from
2 to 8 poles. The bandwidth of the postfilter was varied
from 231 Hz to .29 Hz. The filtered bandpass output was
- 22 -
Table I
Channel Centernumber frequency
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
160
280
400
500
630
800
1000
1250
1600
2000
2500
3150
4000
5000
6300
8000
Bandwidth
Hz 113
130
93
115
146
185
231
289
371
463
579
729
926
1158
1459
1852
Hz
RC-FilterTime -3 dB
Constant Frequency
17825.0 usecs 9 Hz
9390.0 17
5730.0 28
4456.0 36
3390.0 47
2657.9 60
1989.4 80
2021.0 79
1559.7 102
1368.7 117
859.4 185
859.4 185
251.5 531
198.9 795
99.9 1592
39.9 3979
- 23 -
R
12.50
7.65
3.32
3.19
3.11
3.08
2.89
3.66
3.63
3.95
3.12
3.94
1.74
1.46
0.92
0.47
input into a single analog detector.
2.3 Dynamic Range Analysis
2.3.1 Level Detection
The hardware used to detect the envelope of the signal
is part of a multi-channel Automatic Gain Control
compression system designed and developed by Coln (1979).
Each level detector performs a short-term rms calculation in
which the bandpass signal is first squared and then filtered
by an RC-lowpass filter. The output of one of these
detectors is an estimate of the logarithm of the envelope
squared. There are 16 level detectors, one for each of the
16 bandpass outputs of the multifilter. When a single-band
digital post-filter is employed, the output of the filter
(after D/A conversion) is presented directly to the level
detector with an 80 Hz RC-filter (channel 7).
Each RC-lowpass filter time constant can be adjusted by
varying the filter capacitor. In channels 1 through 12, the
time constants were adjusted to keep the second-harmonic
ripple down to 6 percent at twice the center frequency of
the signal. However, in the two lowest channels the
bandwidth is larger than one-third octave, therefore a
- 24 -
greater percentage of the signal envelope will be filtered,
resulting in a "smoother" signal envelope. In channels 13
through 16, the RC-filter bandwidth is larger in proportion
to the input signal center frequency than in the other
channels, so more second harmonic ripple will be included in
the estimate of the envelope. The filter time-constants are
listed in Table I.
2.3.2 Probability Distribution Analysis
Samples from the 16 level detector outputs were
digitized, and "bins" corresponding to the decibel level and
channel of each sample were incremented. The signal in each
band was sampled at a rate of 400 samples/second, giving a
total sampling rate of 6400 samples/second. The binwidth
was 1/2 dB and the dynamic range, which was determined by
the level detector, was 80 dB in each band.
Normalized histograms of the decibel level of the
signal envelope are plotted for each of the 16 frequency
channels. The distributions are normalized to the largest
histogram value in the 16 channels. The rms level of each
distribution is calculated along with the 10% and 90%
amplitude levels., The 10% level is the amplitude level
which only 10% of the speech samples exceed; 90% of the
samples have amplitudes greater than the 90% point.
- 25 -
Henceforth in this thesis, the term "range" applied to a
level distribution will mean the difference between the 10%
and 90% levels.
Samples of silence or low-level noise may corrupt the
distributions. In cases where the signal and noise
distributions are clearly separated, the noise samples can
be simply deleted. The modified file will produce a more
accurate signal amplitude distribution. Such distributions
will be labeled "edited" in the following sections.
2.4 Spectral Analysis
Spectral analysis was performed using two methods. The
H.P. 3528A spectrum analyzer computed and averaged the
spectral magnitudes of a number of time samples. Spectral
graphs were plotted with the H.P. 7004B X-Y recorder. The
ILS software package was also used to compute and plot the
magnitude spectrum of a single time sample. Both methods
used the Fast Fourier Transform (FFT) algorithm.
2.4.1 H.P. Spectrum Analyzer
The H.P. 3582A spectrum analyzer transforms a finite
segment of discrete time data into a discrete frequency
spectrum using an FFT implementation. In single-channel
- 26 -
mode, 1024 samples of the input signal are transformed by
the FFT into 256 complex values. The duration, T, of a time
frame is determined by the selected frequency spans and the
number of samples,
T = 1024/(4*frequency span) = 256/frequency span.
For example, to compute the spectrum over a 5000 Hz span, a
51.2 msec time segment is sampled. The discrete frequency
points are spaced at equal intervals, vF=l/T. In this
example frequency resolution is 19.5 Hz.
The H.P. analyzer also allows for up to 256 spectra to
be averaged to give a better estimate of the spectrum of a
random signal. Time samples were weighted with a Hanning
window in order to minimize both the amplitude and frequency
uncertainty in the spectrum.
2.4.2 ILS Spectral Analysis
The spectrum of a finite digital wave can be computed
and graphed with an ILS routine which computes a 1024-point
FFT. The 1024-point time sample was weighted with a Hamming
window in order to minimize the frequency and amplitude
distortion. Since only 1024 points are used, static time
waves which repeat periodically in less than 1024 sample
points (such as static vowels) are appropriate signals for
this analysis.
- 27 -
2.5 Control Measurements and System Evaluation
2.5.1 Evaluation of the Level Distribution Measurement
System
In order to assess the performance of the level
detector system, ranges of signals with known distributions
were measured. Analog narrowband random noise and tones
were input directly into the level detector in channel 7,
which has an RC-filter cutoff of 80 Hz. Narrowband random
noise was used as a test signal because its envelope is
known to follow the Rayleigh probability density. The
narrowband noise was centered at 1000 Hz, and the bandwidth
of the noise ranged from 3 Hz to 231 Hz.
The measured range for a 10-Hz band of noise was 11 db,
2 dB smaller than the expected range of 13.4 dB for a
Rayleigh-distributed variable. The measured range for a
10-Hz tone was 13 dB, 3 dB smaller than the expected range
of 16 dB for a tone.
The discrepancy between the measured and expected
ranges of the envelope distributions can be attributed to
the influence of the RC-lowpass filter. Calculated
probability densities for narrowband noise and tones after
squaring and lowpass filtering are given in Appendix A. The
- 28 -
calculated and measured ranges for noisebands and tones are
graphed as a function of R, the ratio of input signal
bandwidth (or frequency) to the RC-filter bandwidth in
Figures 2.2 and 2.3, respectively. It is evident from these
figures that the measured ranges follow the calculated
ranges to within about 1 dB. As the noise bandwidth or the
tone frequency increases, the difference between the input
envelope range (dashed line) and the lowpass-filtered
envelope range increases. If the RC-filter bandwidth is
much larger than the noise bandwidth (or tone frequency),
and if the second-harmonic ripple can be neglected, then the
effect of lowpass filtering will be small.
Residual deviations between the calculated and measured
ranges may be attributable to sampling errors or to circuit
imperfections. A possible reason for the measured range
being slightly greater than the calculated range for noise
with bandwidths larger than approximately 80 Hz is that the
analysis did not include the effects of second harmonic
ripple. As the noise bandwidth increases, the signal
envelope will include more second-harmonic ripple which will
increase the measured level range.
It is expected that the decibel underestimation of the
input envelope range will be the same for signals, such as
speech, with level ranges that are much wider than those of
- 29 -
j. = calculated ranges- o= measu red range
14-
t2 +0 0
1 ~ 0 +- 0C0
++ 0
6 4-
4-
2-
313 .0625 0.2 0.5 .0 2.0 4.0 6.0
= Input Noise Bandwidth/Post-detector Bandwidth
Figure 2.2 Measured and calculated ranges for narrowband noise(centered at 1000 Hz) after squaring and lowpass filteringwith an RC-lowpass filter. Range is graphed as a functionof the input noise bandwidth divided by the RC-filterbandwidth. Dashed line represents the expected range fora Rayleigh distributed variable.
- 30 -
calculated range
measured range
---------------------------------------------------------- 0
0 +
0
0
0+ A-
0
0 --
L. 2.0 4.0 8.0
R = Tone Frequency/Post-detector Bandwidth
Figure 2.3 Measured and calculated ranges for tones after squaringand lowpass filtering with an RC-lowpass filter. Range isgraphed as a function of the tone frequency divided by theRC-filter bandwidth.
- 31 -
20
14
12 -#-
(Y_
a,
c
8
4
2-
0.0 -/, .0313 .062S 0.12S 0.2S 0.S
noise or tones. If we assume that such wide distributions
result from a large but slow modulation of a bandpass
process, then the resulting level distribution should be the
convolution of the distributions of the modulation and the
bandpass envelope. If the modulation is sufficiently slow,
its variation will be unaffected by the lowpass filter.
Thus, for example, if there is a 5-dB underestimation of the
input 13.4 dB range of narrowband noise, the underestimation
will still be about 5 dB if the noise is slowly modulated
over a range of, for instance, 40 dB.
2.5.2 Determination of Sample Size for Sentences
The range of a bandpass speech signal estimated with an
input of n sentences should converge to a constant value as
n is increased. The sample size used in the main study was
chosen by preliminary measurements in which the number of
sentences in the sample was increased until there was only a
small change in the range for an increase in the sample
size.
The level ranges of single sentences were measured
after digitally post-filtering by a one-third octave (231
Hz) filter centered at 1000 Hz. The ranges were found to
- 32 -
vary from 25 dB to 46 dB among a sample of 10 sentences
(Figure 2.4). Since this range, for 10 single sentences,
varied 21 dB, a larger sample of sentences was considered.
The sample size was increased to four, ten, or twenty
sentences and the range was measured (Figure 2.4). (The
10-sentence sample contained the original 4 sentences, and
the 20-sentence sample contained those same 10 sentences.)
It appeared that a sample size of 10 or 20 sentences would
be appropriate since the measured ranges, 36 dB and 38 dB,
were close to the results obtained by Dunn and White (1939)
for one-half octave wide di'stributions above 500 Hz, and by
Krieg (1980) and DeGennaro et al. (1981) for a third-octave
distribution.
The measured ranges in 13 frequency channels for an
input sample of the 10 and 20 sentences were compared
(Figure 2.5). In order to examine all channels, the
sentences were bandpass filtered with the analog one-third
octave multifilter. (The ranges in channels 14, 15, and 16
were omitted because the input was filtered at 4.5 kHz). In
all 13 frequency channels the ranges differed by at most 3
dB. Since the difference was small, 10 Harvard sentences
were chosen for the sample data. The 10 sentences chosen
are listed in Appendix B. The sentences are aproximately 3
seconds long, so 30 seconds of data at 400 samples/second,
- 33 -
- 1 1 1 1 1 1 1 1s0
46 H-
40 I-
301-
I i I I I I I I I I0 2 4 G 9 10 12 14 1G 19 20 22
Number oF Digital Harvard Sentences
Figure 2.4 Ranges in the one-third octave band centered at1000 Hz as a function of sentence sample size.
- 34 -
Cn
20
or 12000 samples of data are included in each histogram from
which range is estimated.
- 35 -
j. = 10 sentences_o = 20 sentences
4S-
40 - 0a
<U
20-
10-
0-
- .16 .28 .4 . .83 .8 1.0 1.2S 1.9 2.0 2.9 3.19 4.0
Band Center Frequency (kHz)
Figure 2.5 Ranges across 13 frequency channels comparinginputs of 10 and 20 digital Harvard sentences.
- 36 -
Chapter 3
Results
3.1 Comparison of Distributions in 16 Frequency Channels
The histograms of one-third octave bands of clipped and
unmodified speech were investigated in the following
experiments. First, the 10 digital sentences were clipped
(pre-filter was an allpass filter) and post-filtered by the
multifilter; second, the sentences were pre-filtered with an
1100-Hz highpass filter (analog), clipped, and post-filtered
by the multifilter. The reference condition of speech
filtered only by the multifilter was also investigated.
The amplitude ranges, shown in Figure 3.1, exhibit a
dependence on band center frequency. Part of this dependence
may be attributed to the variation across bands of the
quantity R, the ratio of input bandwidth to the detector's
RC-filter bandwidth. From the analysis presented in Section
2.5.1 (see Figure 2.3), we expect a smaller range in the
channels centered at 160 Hz and 280 Hz, because R is larger
than in the higher frequency channels (see Table I).
Conversely, there should be relatively less smoothing of the
envelope in channel 13 (centered at 4000 Hz) since the
bandwidth of the RC-filter is larger in proportion to the
- - 37 -
I I 0
0
03
C 03
[= F2= F1/C/F2
0= C/F2
0
001
03
0 0
41- -
1-0 0 -
of-0
00
-1-
00
0
-1-
0
0 0
0I I I I I I I I I I I~ I
- .1 .28 .4 .9 .63 .8 1.0 1.26 1.6 2.0 2.9 3.19 4.0
Band Center Frequency (kHz)
Figure 3.1 Ranges across 13 frequency bands for clipping and.nonclipping conditions. Input is 10 digital Harvard sentences.
F1 = 1100 Hz highpass filter (analog)C = Infinite peak clipper (analog)F2 = Multifilter
- 38 -
45-
ea1
:7)
35-
30-
25-
20-
0
is
10 I-
signal bandwidth (R is smaller) than in the lower-frequency
channels. There will also be more second-harmonic ripple in
channel 13 which will tend to increase the range. (Note that
results for only the first 13 channels are plotted because
the digital signals were lowpass filtered at 4.5 kHz, and the
three highest channels are centered at frequencies above that
cutoff).
The amplitude range for speech filtered by the
multifilter varies at most 15 dB across all frequency
channels and averages 32 dB. The ranges agree reasonably
well with the results of Krieg (1980), who used the present
system, as well as with Dunn and White (1940) who. used a
different measurement procedure (peak pressures in 1/8 second
intervals). The present measurements are slightly smaller
than those reported by DeGennaro et al. (1981), who also
used the same system as used here for measuring bandpass
envelopes. It should be noted that while DeGennaro et al.
(1981) display a 50-dB range for one talker and a single band
(Figure 4 in their paper), their subsequent figures (5, 6,
and 7) for at least two talkers and all sixteen bands show
nearly all ranges to be between 30 and 40 dB.
- 39 -
Clipping reduces the range of speech by about 15 to 20
dB across all frequency bands. The range of bandpass speech
that was pre-filtered with an 1100-Hz highpass filter and
clipped varies 12 dB across frequency bands and averages 15.8
dB. The range of clipped speech with no pre-filtering varies
by 11 dB and averages 9.5 dB across bands. This difference
between highpass pre-filtering and no pre-filtering will be
discussed further in Section 3.3.1.
In subsequent experiments in which a single channel is
analyzed, the band centered at 1000 Hz is used. This band
was chosen as representative because it is approximately in
the middle of the speech frequency range, and also because
the measured ranges in this band are near the respective
averages across bands for the three conditions in Figure 3.1.
3.2 Comparison of Digital and Analog Implementations
3.2.1 Digital vs. Analog Sentences
In order to identify any differences in the measurements
due to digital or analog sentence inputs, histograms made
with tape-recorded analog sentences and digital sentences
were compared. Thirty tape-recorded analog Harvard sentences
and 10 digital Harvard sentences (after D/A conversion and
lowpass filtering at 4.5 kHz) were processed identically with
- 40 -
the analog equipment. Ranges were compared for bandpass
filtered speech and clipped/filtered speech. In the clipping
condition, these sentences were pre-filtered with the 1100-Hz
highpass filter, infinitely peak-clipped, and bandpass
filtered into the sixteen channels by the multifilter. The
analog sentence histograms were "edited" (as described in
Section 2.3.2) to eliminate obvious noise. No editing was
neccessary for the digital sentence histograms. The ranges
are shown in Figure 3.2.
Except for discrepancies at 800 and 2500 Hz, the
difference between the ranges for digital and analog
sentences in the nonclipped condition is less than 5 dB.
(Note that the three bands above 4.5 kHz are included for the
analog sentences.) With the exception of channel 1, a larger
range was measured with analog sentences than with digital
sentences.
In the clipping condition, the difference between the
ranges for digital and analog sentences is 2 to 3 dB. The
smaller discrepancy here may be attributed to the clipper's
"dead-zone" which was adjusted so that there was no output
produced by tape hiss (when the speech was off). Because the
distributions of the unclipped analog sentences are wide (30
to 40 dB), it is more likely for the speech and noise
distributions to overlap in this condition than with clipp.ed
- 41 -
s0
45
33
3- 30
c 2S
20
15
10
0
- .16 .29 .4 .S .G3 .8 1.0 1.2S LB 2.0 2.G3.194.0 S.0 6.3 9.0 -
Band Center Frequency (kHz)
Figure 3.2 Ranges across 13 or 16 frequency bands comparingdigital and analog Harvard sentences as inputs.
Fl = 1100 Hz highpass filter (analog)C = Infinite peak clipper (analog)F2 = Multifilter
- 42 -
S I I I I I I I I I I I I IAnalog Sentences (edited) OLgftel Sentences
3 = F2 4- = F2- = FI/C/F2 x = FI/C/F2
0
3 0 0 0+ 0
- 4++
- + + 0
0 0
x 0x
06o
I I I I I I I I II| | |
speech. Under clipping, the mixing of noise and speech
distributions is reduced because the range of speech is
reduced.
It appears from these measurements that digital
sentences are superior to analog sentences in terms of
signal-to-noise ratio. However, for the clipping condition,
analog and digital sentences produce similar results.
3.2.2 Digital vs. Analog Processing
In order to investigate the effects of a number of
different pre- and post-filters, digital filters were
designed. Histograms after digital and analog processing
were compared. The 10 digital sentences listed in Appendix B
were used as test inputs. The analog multifilter, 1100 Hz
highpass filter, and clipper were used for analog processing.
Counterpart digital filters were designed for digital
processing. The comparison was made on the one-third octave
distribution centered at 1000 Hz (channel 7).
Digital and analog processing were compared with three
different pre-filters (an allpass filter, an 1100-Hz highpass
filter, and a one-third octave (231 Hz) filter centered at
1000 Hz) before clipping as well as for the reference
condition of no clipping.
- 43 -
Table II
A comparison of ranges in channelcenter frequency=1000 Hz) afterprocessing of 10 digital sentences.
7 (bandwidth=231 Hz,digital and analog
F2 = 1/3 octave (231-Hz bandwidth) filter centered at 1000 HzFl = 1100-Hz highpass pre-filterC = Infinite peak clipper
Digitallyprocessed
Analogprocessed
F2
36
34
C/F2
11
14
Fl/C/F2
13
18
F2/C/F2
6
7
- 44 -
Table II shows that the ranges differ between digital
and analog processing by at most 5 dB. An interesting fact
is that pre-filtering the speech with a one-third octave
filter decreases the range from 36 to 6 dB. This will be
discussed in Section 3.3.1.
3.3 Compression Results
3.3.1 Pre-filtering
The following experiments were conducted to determine
the effect of various pre-filters on the range of clipped
speech. The pre-filters used here were those employed in
various studies of clipped speech. The first is an allpass
filter (i.e. no pre-filtering) as used by Licklider (1948),
for example. The second is an RC highpass filter with cutoff
at 8000 Hz (subsequently referred to as a differentiator) as
used by Licklider and Pollack (1948) and Thomas and
Niederjohn (1968). The third is an 1100-Hz (12 dB/octave)
highpass filter (subsequently referred to as the "optimal"
filter) suggested by Thomas and Niederjohn (1968). A fourth
pre-filter is the set of pre-filters described by Hildebrant
(1982) that was designed to separate the formant regions of
speech. The first-formant filter is a 200 Hz to 900 Hz,
4-pole Butterworth bandpass filter (spanning channels 1
- 45 -
through 6), the second-formant filter is a 900 Hz to 2800 Hz,
10-pole Butterworth bandpass filter (spanning channels 7
through 11), and the third-formant filter covers the range
from 2800 Hz to 6000 Hz (channels 12 and 13) and is also a
10-pole Butterworth filter.
The ranges in the thirteen channels under the various
configurations are shown in Figure 3.3. Clipped speech with
no pre-filtering has the lowest overall range, from 9 dB to
20 dB and averaging 13.7 dB across frequency bands.
Differentiating the sentences before clipping increases the
range relative to no pre-filtering by about 6 dB across all
bands; these ranges vary from 13 dB to 27 dB and average
19.8 dB. Pre-filtering with an optimal filter does not
increase the range quite as much as differentiating; the
range is 9 dB to 22 dB across all bands and averages 15.5 dB.
In the low-frequency bands, speech pre-filtered with the
first formant filter has the smallest range of the four types
of pre-filtering (averaging 12.8 dB in the first 6 channels).
The second and third formant filters produce an effect
similar to the differentiator and optimal filters. The
average ranges in these two formant filter regions are 16.6
dB in channels 7 through 11 and 14.5 dB in channels 12 and
13.
- 46 -
Fl/C/F= F2/C/F= F3/C/F
x = C/F= Fopt/C/F= FdtF/C/F
35-
30-
7 +~25-
20 - \CL/ +-, x x Zk\4
0 7-
XV X x"
10 - /
S_
- .16 .28 .4 .5 .S3 .8 10 1.25 1.6 ZO 2.5 3.15 4.0
Band Center Frequency (kHz)
Figure 3.3 Effect of different pre-filters on ranges in 13frequency bands.
F1 = 200-900 Hz bandpass digital filterF2 = 900-2800 Hz bandpass digital filterF3 = 2800-6000 Hz bandpass digital filterFopt = 1100 Hz highpass digital filterFdif = 8000 Hz highpass digital filterC = Infinite peak clipperF = Multifilter
- 47 -
The effects of varying pre-filter bandwidth and
rejection slope were investigated in the following
experiments. The sentences were digitally pre-filtered,
clipped, and digitally post-filtered with a one-third octave
bandpass filter centered at 1000 Hz, which had the same
filter-skirt slope as the pre-filter also centered at 1000
Hz. The pre-filter bandwidth varied from 231 Hz to 29 Hz and
the slope of the pre-filter and post-filter skirt were varied
together by changing the Butterworth filter transfer function
from 2-pole to 8-pole. The signal level was detected in
channel 7 (RC-filter bandwidth is 80 Hz) after bypassing the
multifilter.
Plots comparing the ranges of speech pre-filtered with
the narrow-band filters to speech pre-filtered with the
allpass, optimal, and differentiator filters are shown in
Figure 3.4. Except for the case of a 2-pole Butterworth
filter, the range for clipped speech that was pre-filtered
into a one-third octave (231 Hz) band is 5-8 dB smaller than
speech that was pre-filtered with the differentiator or
optimal filters. Generally, as the bandwidth of the
pre-filter decreases, the range increases. Given that the
pre-filter transform has more than 2 poles, varying the
pre-filter slope does not change the ranges appreciably.
- 48 -
2-pole Butterworth Filter
20 -
t4 -
12 -
to
2
4-pole Butterworth Filter
=1
4
0,M
X
- 2q G - 11S - 1_7 - 231 - - - O -p - All-
Bandwidth (Hz) P-e-FE~ter
6-pole Butterworth Filter
I I I I I I I
14
*
14
14 14
14 14
14
* 2q S - IlS - 173 - 2.31 - - AL-
Bandwidth (Hz) P
--4
20 -
18 -
16-
1-4
2-
11
-2q S2 - 1S - 1-73 - 231 - - - OIFP - ;oat - All -
Bandwidth (Hz) P'eFlIter
S-pole Butterworth Filter
161-
4-1-
2
2q 8 is t73 2.1 - - OFP A
Bandwidth (Hz)
Figure 3.4 Effect of different narrowband pre-filters onranges. Also plotted are the ranges with the differentiator(Diff), the 1100-Hz highpass filter (Fopt), and the allpassfilter (All). Post-filter is 1/3-octave centered at 1000 Hz.Pre-filter and post-filter slopes are the same. For thecases of 'Diff', 'Fopt', and 'All' these slopes apply onlyto the post-filter.
- 49 -
0,
20 -
18 -
12
2
The pre-filters with a 2-pole transfer function will
produce an effect on the low frequency part of the spectrum
similar to the differentiator. Since the speech spectrum is
dominated by low-frequency components, the ranges are similar
(upper left pannel of Figure 3.4). Other discrepancies from
the general trends appear for speech that is pre-filtered
with a 29 Hz, 6-pole Butterworth filter.
Clipping clearly compresses the amplitude range of
speech. Digitally pre-filtering and clipping produces output
ranges of 8 to 16 dB in the 1000-Hz frequency band, a
reduction of more than 20 dB (Figure 3.4). Across all
frequency bands the range is slightly larger, from 10 to 23
dB (Figure 3.3); that is, clipping decreases the range by 15
to 20 dB across all frequency bands.
3.3.2 Post-filtering
The effect of post-filtering the clipped speech was
determined by varying the bandwidths and slopes of the
post-filter. Post-filters were centered at 1000 Hz and the
bandwidth was decreased from one-third octave (231 Hz) to
1/24 octave (29 Hz). The slopes of the post-filter bandpass
skirts were increased by varying the Butterworth filter
transfer function from 2-pole to 8-pole. Variations in
- 50 -
post-filtering were examined with speech that was
pre-filtered with either the allpass filter, the optimal
filter, or the differentiator. With the differentiator as
pre-filter, only the effect of varying postfilter slopes was
examined; the post-filter bandwidth was fixed at one-third
octave.
Figures 3.5 and 3.6 show that for all pre-filters
examined, variations in post-filter bandwidth and skirt
slopes have relatively small effects on compression. Figure
3.5 presents range results with no pre-filter (i.e. an
allpass filter) as a function of post-filter bandwidth and
skirt slopes. The range of the clipped speech varies from
9-15 dB, with the exception of the measurement with the 29 Hz
bandwidth and 8-pole Butterworth filter. Results with
optimal and differentiator pre-filtering, shown in Figure
3.6, exhibit a similar spread of ranges.
3.4 Spectral Modifications
Speech sounds can be classified into two categories
determined by the presence or absence of voicing. This
distinction is important in the analysis of the spectra of
clipped speech sounds. The spectra of the two types of
speech sounds are analyzed in the following sections.
- 51 -
2-pole Butterworth post-filtercenter frequency=OOOHz
is-
1.-
12
C
4-
2-
4-pole Butterworth post-filter
center frequency=OOOHzI I I I I T
I I I I I
*
*
* *
z 2q sa ils 174 232
Post-ftIter Bandwidth (Hz)
6-pole Butterworth post-filtercenter frequency-COOHz
Is-
1.4
2
*
*
I I
-I
a 29 sa Lis 174 232
Post-Fi[ter Bandwidth (Hz)
3-pole Butterworth post-filtercenter frequency=OOHz
te - I1.8-
1.4H
2VY
_2
S 29 1.1U5 174 232
Post-Filter Bandwidth (Hz)
I I I I I I
*
IUs 174 232
Post-rtiter Bandwidth (Hz)
2q G2
Figure 3.5 Effect of varying the post-filter bandwidth andslopes. Pre-filter is an allpass filter.
- 52 -
18-
16-
Is-14 -
12 -
a--
4-
5-12
*
I I
-
I I
2-pole Butterworth post-filtercenter frequency=l00oHz
I I I II8-
Is-t4 -
12 -
t2-
8-
6-
II - I I I I I L3 2q sa lls 174 232
Post-filter Bandwidth (Hz)
6-cole Butterworth post-filtercenter frequency=1000Hz
7 71 1* i I I I I I i I]
** 0*
*A
2 Us :74 232
Post-Filter Bandwidth (Hz)
4-pole Butterworth post-filter
center frequency=100OHz
1 1 6 1 1. I I II:8
ts
14
:2
:4
12
0
IS -
14 -
12
Z 0~
caH
0 20 S a
' *0
*
a 2q a :15 L74 232
Post-Flter Bandwidth (Hz)
3-pole Butterworth post-filtercenter frequency=lOOHz
I t I I
0
,-Is 17.
Post-4'iter Bandwidth (Hz)
232
Figure 3.6 Effect of varying the post-filter bandwidth and
slopes. Pre-filter is an optimal filter (*) ordifferentiator (o).
- 53 -
0
10 -'4
8 0
2 t
I
H-4
-
222
3.4.1 Spectra of Clipped Unvoiced Speech
It has been shown (Papoulis, 1965; Fawe, 1966) that for
gaussian noise as input, the output power spectrum Gy(w)
after infinite clipping can by determined from the input
power spectrum Gx(w) by the "arcsin law" relating their
respective autocorrelation functions, Ry(t) and Rx(t). Thus,
Ry(t)/Ry(O) = 2/TY arcsinLRx(t)/Rx(O)]
This relation was verified by measuring the spectra of
various noises before and after clipping. The first set of
noises consisted of synthesized tokens of the consonants
/f,sh,s/ and the burst portions of /p,t,k/. Clipping was
done by the analog clipper and spectra were analyzed with the
spectrum analyzer. The power spectrum of each synthetic
consonant was also calculated from the filter transfer
function from which it was generated, and the arcsin law was
applied to these spectra using the FFT.
Figures 3.7 through 3.12 show both the measured spectra
and the spectra calculated from the arcsin law. (Since the
noise bursts were generated with a sampling rate of 10 kHz,
only a 5 kHz range is shown). From the figures it is clear
that the clipped spectra follow the arcsin law.
- 54 -
1000
0
2000 3000
Frequency (Hz)
2000 3000FREQUENCY (HZ)
4000
4000
5000
5000
Figure 3.7 Spectra of the synthetic consonant noise burst, /f/,measured before and after clipping. In the top plot, thesolid line is the ideal input spectrum and the dashed lineis the spectrum calculated from the arcsin law. In the bottomplot, the lower trace is the measured input spectrum and theupper trace is the measured clipped spectrum.
- 55 -
Spectrum of /F/
- clipped
I I I
-20
~i%
a
(nCq,40
0
L4-
4)a-
-60
-70
-80
0 100
-10
-20
-30
U, -40
-50CL.
-60
-70
-80 0
1%
clipp-
0
lows 4000
Frequency (Hz)
1000 2000 3000FREQUENCY (HZ)
4000 5000
Figure 3.8 Spectra of the synthetic consonant noise burst, /sh/.Plots are the same as in Figure 3.7.
- 56 -
Spectrum of /SH/
-' '-,. -clipped
--20
-3
4
a
C
C4~i
+A
-so
0
-10
-20
-30
U -50
-60
-70
-80
- clipped
II I -I
0
I I
-
-
-
0
-10
C
C140
C-
V)
Spectrum of /S/. .............. _ T
400
Frequency (Hz)
0
-10
-20
2 -30
Z-4o
S-50
-60
-70
-801000 2000 3000
FREQUENCY (Hz)41000 5000
Figure 3.9 Spectra of the synthetic consonant noise burst, /s/.Plots are the same as in Figure 3.7.
- 57 -
clipped
M 4 12 - ---
- clipped
e
0
000 3000
Frequency (Hz)
2000 3000FREQUENCY (Hz)
40M0
400
5000
Figure 3.10 Spectra of the syntheticPlots are the same as in Figure 3
consonant.7.
noise burst, p/.
- 58
-101
Spectrum of /P/
-T
- clipped
- --------a4J
in
0
U3-.60
0
-10
-20
-30
~-40
CL1
-60
-70
-80
0
0
100
1000
- clipped
-j
-
Spectrum of /T/
1000
1000
20Me 300
Frequency (Hz)
2000 3000FREQUENCY (4Z)
5000
5000
Figure 3.11 Spectra of the synthetic consonant noise burst, /t/.
Plots are the same as in Figure 3.7.
- 59 -
-101--
1-o
C
0
a-(I)
-clipped
I I II _
-60I-
-80
e
0
-10
-20
clipped
'0-30
I-40-j
-50CA)
-60
-70
-80-.0)
I I I I
400r)
3-1Ca
aC
usecn
-88
Spectrum of /K/
Z0
210
41)
SO
-70
-801000
200 .300
Frequency (Hz)
FREQUENCY (HZ)30002000
4000
4000
5000
500
Figure 3.12 Spectra of the synthetic consonant noise burst, /k/.Plots are the same as in Figure 3.7.
- 60 -
0
10 --
- - - ' - clipped
---- --- --- ----- --7
LL.
I-
U~)
clipped
I I
I I
-60
-/0 -
Ivi
In order to look at relative changes in the clipped
spectrum after varying spectral frequencies and amplitudes,
random noise was filtered with the 1925 General Radio 1/3
octave multifilter. The output was clipped and analyzed with
the H.P. analyzer. Power spectra were also calculated by
varying the parameters of the filter transfer function of the
synthesized consonants and the arcsin law was applied.
(Thus, the calculation was performed using a filter transfer
function different from the one used empirically. This
difference can be seen between the unclipped calculated and
measured spectra in the figures.) Spectra with a single peak
at 315 Hz and 1250 Hz are shown in Figures 3.13 and 3.14.
Spectra with two peaks are shown in Figures 3.15 through
3.19.
Several observations can be made about the spectral
modifications caused by clipping. First, clipping tends to
fill in "valleys" in the spectrum, much as an increased noise
floor would. Second, as expected from the Fourier series for
a square wave, harmonics appear at odd multiples of the
spectral peaks. This is especially noticeable in Figures
3.13 and 3.14 which display a single peak. Third, the
amplitude relation between two spectral peaks after clipping,
- 61 -
I I
I I
---
E I I1~ LE0 2QSS
a-50a-3
o
-7
as
25Q
Frequency (Hz)
-10 r-
-60
-70
0 500 100 FREQueNCY ([A500 2000 2500
Figure 3.13 Spectra of filtered narrowband noise centered at315 Hz measured before and after clipping. The top plot iscalculated from the arcsin law using a different transferfunction from that measured for the input noise. The bottomplot is measured with the spectrum analyzer. The format ofthis plot is the same as in Figures 3.7-3.12.
- 62 -
a
-to
IT
0
-20
-30
-80
- d
'
I Ito"M
Z4
9-50ULU156,
W to" 2U06 3UU 4WQ S" Sees 7095 aBG Iees teee
Frequency (Hz)
-10
-20
-30
S-50
-70
-80
Figure 3.14 As in Figure 3.13 but with input noise centeredat 1250 Hz.
- 63 -
a
CO
0
,C-
U -500Q-
If)
-0SI
-7,
-00
_ I I I I I
- t1--
- clipped
I I I I I I I I
-I clipped --
.................-
I I i
IW 200 =7 = m 10000
- 1 1 1 1 1 _
-to
-2S--
200 _258
Frequency (Hz)
-10
-20
-30
-50
-60
-70 1
500FREQUENCY (RO
Figure 3.15 As in Figure 3.13 but withcentered at 250 Hz and 1250 Hz.
2OW 2UO
input noise peaks
- 64
C0-4.
0
aO
-8sf--
-- - - clipped
L.-0S
a
0--ippe
-80
a
0
-10-
-20-
30
- clipped
a.
(n70-
-M -
-80
0 low 2000 3000 4000 5000Frequency (Hz)
-20
-30
-clipped
-60
-70
800FREQUEICY (HZ
Figure 3.16 As in Figure 3.13 but with input noise peakscentered at 250 Hz and 1250 Hz.
- 65 -
200 300
Frequency (Hz)
30 C
FREQiENCy (Hz)
Figure 3.17centered
As in Figure 3.13 but with input noise peaksat 315 Hz and 1250 Hz.
- 66 -
-Wo-
-201
Cq 40
- .- clipped
H- -------- -------
I.-80
0 I000 4000
-10 1-
OT I I I
- clipped
-I
-20
-30
1-50
-80
5000
50YU LWJ LIAAJ
-1 I
I I I
-70
3 i I
-20-
-30
c --- ,- - -clipped140 - -
a.L
I)
-60
-70-
.. -.
0 1000 2008 3000 4000 SamFrequency (Hz)
0
-10
-20
-30
t-50
-70
-80omu 32(W) 3m)
FRE'UENCY (Hz)
Figure 3.18 As in Figure 3.13 but with input noise peakscentered at 315 Hz and 1250 Hz.
- 67 -
-10
-20
'30
Cq 401
a.C-U)
-60
4090 6000
Frequency (Hz)
0
-10 -
-20 -
-30
9-50
-70
-80 10 1 AJ 2W FREl UNY ( OZ
limo
Figure 3.19 As in Figure 3.13 but with input noise peakscentered at 315 Hz and 1250 Hz.
- 68 -
I-
--- -~clipped
iT-I-
- clipped
i I
tese
when neither is obscurred by distortion components, is nearly
the same as the input (Figures 3.16, 3.18, 3.19). Low-level
peaks can be obscurred by the harmonics of higher-level
peaks, especially if the frequency of the low-level peak is
greater than the frequency of the higher-level peak (Figures
3.15 vs. 3.16 and 3.17 vs. 3.19). The low-level peak can
also be enhanced by the harmonics of higher-level peaks
(Figure 3.15).
3.4.2 Phase Dependence of Clipped Voiced Sounds
For the case of voiced sounds, which consist
components, it is expected that the arcsin law
applicable in general, although it may possibly
useful approximation. In this section the
predicting the clipped spectrum of harmonic
demonstrated.
of harmonic
will not be
provide a
problem in
sounds is
The problem can be seen by constructing a signal that is
the sum of harmonically related sine waves and varying the
phases of the components (Figure 3.20). The three frequency
components were chosen in a one-third octave band with center
- 69 -
frequency of 1000 Hz.
Any prediction of the clipped spectrum based solely on
the input auto-correlation function would be independent of
the input phases. It is clear that clipped spectra are
dependent on the phases of the individual frequency
components in the input (panels A-C of Figure 3.21). The
prediction from the arcsin law is shown is Panel D of Figure
3.21. It will be noted that the gross spectral shape is
rather consistent across the three clipped spectra and
follows the shape predicted by the arcsin law. However, the
validity of the arcsin law approximation in other harmonic
cases is unknown. Given that fine details of the spectrum
are lost as a result of "critical-band" filtering by the ear,
an approximation as good as that shown in Figure 3.21 would
probably be adequate. Determining the goodness of the arcsin
approximation for harmonic signals is a worthwhile but
lengthy task which was not pursued here because of time
constraints.
If the tones in the signal are not harmonically related,
phase shifts of components can be viewed as a shift in the
time orgin, and there should be no effect of phase shifts on
the clipped spectrum. However, this case is not relevant to
speech signals, since the components of voiced sounds are
harmonically related.
- 70 -
VALUES1i
8
BEG = .006400 SEC MID = .01640 SEC ENO = .02640 SEC
Figure 3.20 Differing time-waves caused by phase shifting.The signal was generated by summing three harmonicallyrelated tones with frequencies of 900, 1000, and 1100 Hz,and relative amplitudes of 0, -2, and -5 dB respectively.The three tones (900, 1000, and 1100 Hz) in Panel A havezero phase. The phases of the three tones were respectively0, 180, and 90 degrees in Panel B. The phases were 90,270, and 180 degrees in Panel C.
- 71 -
0
0
0 --
0
0
0
0
0 --
| ||
0
0 VV
r VV
(A)
-8
-16VALUES
16(
(B)
-160VALUES
iC
(C)
-8
-1G6
(A)
2000 3000FREQUENCY (HZ)
4000
I Y I
2000 3000FREQUENCY (HZ)
(C)
4000
(B)(OB)T 10 0
f A
I Will I )I 1AIIHIH I If 1
Jill HIA 1 11991111 i v
_ _ A7-..
2000 3000FREQUENCY (HZ)
(08)1 100 too
qG
80
70
90
so
so
40C40
U
30 V) 30
20 20
to
0 t000 2000 3000
4000
4000
Frequency (Hz)
(D)
Figure 3.21 The phase dependence of harmonically related tones.Panels A, B, and C show the clipped. magnitude spectra withthe three signals in Figure 3.20 as input. Panel D showsthe prediction of the arcsin law applied to the input.
- 72
(DB)-1 100
40
s0
70
SO
40
30
20
I A111111 11
MIR h 1.1 1111 1 li wMITI, I r )j 11 U M 11.11111W HIMIll H
41 LU.A. 11111MM 11
0
I-4
20
to
0
3.4.3 Spectra of Clipped Vowels
In order to observe the effects of clipping on vowel
spectra, measurements were made with steady-state synthetic
vowels. The spectra of the vowels were evaluated with the
ILS spectral analysis package.
The vowels were synthesized using a model in which the
glottal source is passed through a filter designed to model
the vocal tract (Rabiner and Schafer, 1978). Parameters of
the model were varied to change the frequencies and
bandwidths of the formants. In the present implementation,
an impulse was used as a glottal pulse waveform in order to
remove the negative spectral tilt associated with the more
natural waveform. It is easier to evaluate the clipped
spectra after eliminating this low-frequency emphasis.
The vowels were synthesized with a fundamental frequency
of 125 Hz. Thus, the spectrum of each vowel is a sequence of
harmonics spaced 125 Hz apart with spectral-envelope peaks at
specified formant frequencies.
The spectra of clipped synthetic vowels are presented in
Figures 3.22 through 3.39. There are some major differences
between the clipped spectra of vowels and noise. A clipped
vowel spectrum is a sequence of harmonics with the same
- 73 -
inter-component spacing of 125 Hz as the unmodified spectrum,
but with components extending to higher frequencies. As with
noise, harmonics appear at odd multiples of the spectral
peaks. However, there is no counterpart to the "noise floor"
seen with clipped noise.
Clipped harmonics of the fundamental frequency
components can emphasize or de-emphasize formant peaks. For
example, in Figures 3.22 through 3.25, the formant is at 1000
Hz, a multiple of the pitch period, and the clipped spectrum
has peaks only at odd harmonics of this formant. In another
example where the formant of the vowel is higher in frequency
and not exactly at a harmonic of the pitch period (Figures
3.26-3.29, the clipped spectrum contains other spectral peaks
at low frequencies. However, the original formant is
preserved as an absolute maximum in the spectral envelope.
The extent to which the formant is retained in the clipped
spectrum, even when the input spectr'al peak is very shallow,
is quite remarkable (Figures 3.25 and 3.29).
The clipped spectra of vowels with two formants have
peaks at not only odd multiples of the formants but at other
frequencies as well (Figure 3.30). Again, clipping seems to
emphasize individual formants that are difficult to see in
the unmodified spectrum (figure 3.31).
- 74 -
In Figures 3.32-3.34, the two formants (at 1000 Hz and
3000 Hz) are not only harmonically related to the pitch, but
the higher frequency formant is an odd multiple of the lower
frequency formant. After clipping, narrow spectral peaks
appear at only odd harmonics of the formants. Even as the
bandwidths of the formants are varied, no new peaks appear in
the spectrum (as seen in Figures 3.30 and 3.31), and the
spectrum remains unchanged except for the spectral height of
the peaks.
When two formants are not harmonics of the fundamental
pitch frequency, the formants are preserved but the clipped
spectrum contains other spectral peaks (Figures 3.35-3.39).
Similarly to noise, lower-amplitude formants in the spectrum
are obscurred after clipping. (Figure 3.37).
- 75 -
(OB)1
0 1000 2000 3000 4000 500FREQUENCY (HZ)
'A(
0 1000 2000 3000FREQUENCY (HZ)
4000 S000
Figure 3.22before
Spectra of a synthetic steady-state vowel measured(top plot) and after (bottom plot) clipping. The
formant frequency is 1000 Hz and bandwidth is 50 Hz.
- 76 -
00
70
6o
30
20
10
08)100
60
70
60
20
10
0
2000 3000FREQUENCY (HZ)
2000 3000FREQUENCY (HZ)
Figure 3.23 As in Figure 3.22 with a 1000 Hz formant frequency
and a 200 Hz bandwidth.
- 77
(DB)100
A h I I I A
11111 H A V IIIIIIIIII IIIIIII III I)IIIII IIIIIIIII 111IM 11 , IIH IIIII
I IIIII 11 111111 11111111 11 H IM
111111111MI IIII
uvd Jj julifill" tin m 4fW 30
20
10
05000
(OB)100
80
70
so
20
10
0
0
0
4000
40001000 5000
80
70
so
I A
2000 3000FREQUENCY (HZ)
1000
4000
2000 3000FREQUENCY (HZ)
5000
I HitIlHil
11111 1 1 H I 111111 11 1111111 11 IN I I I 111 1 11 1 1 11111111 1 1 1 1 IN 1 11 1 11111
1111 1 111 H ill 11 1 11 111 1111 1 11111
1111 11H W -44M III III1IlUV ' N I W II;
s0
30
20
10
0
(0B)
5000
100
70
so
30
20
to
0
Figure 3.24 As in Figure 3.22 with a 1000 Hz formant frequency
and a 500 Hz bandwidth.
- 78
0
0
100
90
80
11111 .1111 H I IJ 11 H IH A III . III JillA il 1 11 111 11 1111 1 1 1 1 1 11 1 11 11 1 1111 ji I I I Ill 1 111111 H I
1 11 1 1 111, 111 1 1 1111111 H I II I M 1111 1 111111111111H I M 11111H 11111111
111111 1 HIHIIN IIHM
I U I I I- v
2000 3000FREQUENCY (HZ)
1000 2000 3000FREQUENCY (HZ)
4000
(MB)100
W 111 IHII F- U A ' I I !111
I )I 11 11 11 H I I III M
1]~~ ~~~ 111- 1IIII I
30
20
10
0
(OB)- 100
80
70
60
40
30
20
10
00100
Figure 3.25 As in Figure 3.22 with a 1000 Hz formant frequencyand a 1000 Hz bandwidth.
- 79
0
* * li.1
j 11
70
s0
1000
1000
2000FREQUENCY
3000(HZ)
2000 3000FREQUENCY (HZ)
4000
4000
.(OB)
IIIII II. I i I f 1 fva
. I I I I l m
30
20
0
(DB)1100
SO
70
60
20
10
0EQO
Figure 3.26 As in Figure 3.22 with a 2700 Hz formant frequencyand a 50 Hz bandwidth.
- 80 -
0
0
I I________ 1. ,I I. - I__
f ______________
-
1 - - ________
-1 - - - @- - - -- - .. -1, 11 / -- , - - - ------------ -z A I
80
70
s0
I
1 0
I
IliI________1_11
2000 3000FREQUENCY (HZ)
2000 3000FREQUENCY (HZ)
40
20
to
0
Figure 3.27 As in Figure 3.22 with a 2700 Hz formant frequencyand a 100 Hz bandwidth.
- 81 -
DB)100
90
80
70
60
40
30
to
04000 5000
III III 1 'lH 1 if iM H
(OB)1 100
60
70
60
0 1.000 S000
II
01 20il30l
2000 3000FREQUENCY (HZ)
HIM IIIHI I M HIM JI 11111 I I 1 11 j
111111 11111 HIM I 1 11 ji il 11 4111 1 H M I( 11 I 111 1 MW HIC HIIIIIII11 . 1111H II .114 111 11 1,111 11111H IIII. 1111111
( H ill 1 11 11 11 U ll I' ll 1 11 11111 1 111 111Ij I I IM I 11 1 '1 H IP 11 1 lj 11 1 IN I I III
I o
1000 2000 3000FREQUENCY (HZ)
Figure 3.28 As in Figure 3.22 with aand a 500 Hz bandwidth.
2700 Hz formant frequency
- 82
80
70
60
40
30
20
04000 Eooo
(OB)100
80
70
60
SO
40
30
20
10
00 4000 5000
0
0
0
1000
1000
2000FREQUENCY
2000
3000(HZ)
4000
3000FREQUENCY (HZ)
5000
III HIM HINIII IIM I III MI9 1 11 11 H 111 11 1111-11 111 11111 1 1 11111 1 111 Jill I I III Jill
1 111 1 1 11 11 Jill
0 UA-1.1 1) [1 Ij 111111( U l
III HIM30
20
t0
0
(DB)t00
80
70
60
30
20
to
05200 -
Figure 3.29 As in Figure 3.22 with a 2700 Hz formant frequencyand a 1000 Hz bandwidth.
- 83
s0
so
I E A
0
0
2000 3000FREQUENCY (HZ)
2000 3000FREQUENCY (HZ)
4000
4000
100
I I I l
Lai
30
10
0
DB)100
70
60
50
30
20
10
05000
Figure 3.30 As in Figure 3.22. First formant frequency is 1000 Hzwith 50 Hz bandwidth. Second formant frequency is 2000 Hz with50 Hz bandwidth.
- 84 -
s0
70
60
ilt(
A
T. IN III I I I 111 J I I fl 11 11 11111 11
11 1 11 IN 11 J llu l I 1 1111111111. 11111 11 1 11P III 1 11
11 HIM I I I], r4 i 4 1111 I 1111i 11 1)11 1 11111 11,11MIll
Ili Ilim , 1 1 11 1 111 1) 111111 1 .. 11 1111
.1. HIP P, I I III HIM 111111
I I JIM Mill Hil I
0 1000 2000 3000 4000 SooFREQUENCY (HZ)
IN I 111 L
II II II I t
IIIIIIIIIII I H 111111 I I I i ill 111111 1 11 H I h ill 11
11111 11 ll 11.1 U1111 i I IIIIH I 11M 11 M ill 1. H il
j 1 1 1 ' Jill i ll 1111I I 1 1111111 1 i llHil 0 Hill L ill
2000 3000FREQUENCY (HZ)
4000
100
60
70
30
10
0
100
70
60
E000
10
0
Figure 3.31with 500
As in Figure 3.22. First formant frequency is 1000 HzHz bandwidth. Second formant frequency is 2000 Hz with
500 Hz bandwidth.
- 85 -
0
A1
0 1000 2000 3000 4000 5000FREQUENCY (HZ)
___(
1000 2000 3000FREQUENCY (HZ)
4000
00
so
70
60
30
20
10
0
OB)100
630
70
60
30
20
0S000
Figure 3.32with 50
As in Figure 3.22. First formant frequency is 1000 HzHz bandwidth. Sencond formant frequency is 3000 Hz with
500 Hz bandwidth.
- 86 -
1000
1000
t
\ 0sL.Jj
100
I H IIII I IH I I
0Sao02000 3000
FREQUENCY (HZ)
2000 3000FREQUENCY (HZ)
4000
(DB)1 100
s0
70
s0
40
S000
20
10
0
Figure 3.33 As in Figure 3.22. First formant frequency is 1000 Hz
with 100 Hz bandwidth. Second formant frequency is 3000 Hz with
100 Hz bandwidth.- 87 -
0
0
30
20
to
--------- ,
.~ .. III 11 11 111M INIIII
a 1000 2000 3000 4000 S00FREQUENCY (HZ)
- -
100 2000 3000FREQUENCY (HZ)
4000
0
Figure 3.34with 500
As in Figure 3.22. First formant frequency is 1000 HzHz bandwidth. Second formant frequency is 3000 Hz with
50 Hz bandwidth.- 88 -
(Ut:$)1 100
90
70
60
30
10
0
(OB)100
80
70
60
40
30
20
10
00
__A I I___ ___
0 1000 2000 3000 4000 SOOFREQUENCY (HZ)
IH I~ II
1000 2000 3000FREQUENCY (HZ)
4000 5000
re 3.35with 5050 Hz ba
As in Figure 3.22. First formant frequency is 1000 HzHz bandwidth. Second formant frequency is 2700 Hz withndwidth.
- 89
100
s0
40
30
20
10
0
OB)100
70
30
50
30
20
to
00
Figu
2000 3000FREQUENCY (HZ)
I A
jt U
H IM I 11 Ill I I W , J 11 J I I I I I I I I
111111 IIJ 11,11 d I I I I I I J M illi I ji H II
11111111111 IN l I Ill H ill H I L IIIIIII 11111111 M ill
2000 3000FREQUENCY (HZ)
Figure 3.36 Aswith 500 Hzwith 500 Hz
in Figure 3.22. First formant frequency is 1000 Hz
bandwidth. Second formant frequency is 2700 Hz-
bandwidth.
- 90 -
\ 1- -
100
AI I A
11111 11 1 H ill M ill
4 11111 11 11111 Jill
60
20
10
50000
0
1000
1000
4000
4000
OB)100
80
70
s0
20
10
0E000
80
2000 3000FREQUENCY (HZ)
2000 3000FREQUENCY (HZ)
Figure 3.37with 50
As in Figure 3.22. First formant frequencyHz bandwidth. Sencond formant frequency is
is 600 Hz2300 Hz
with 500 Hz bandwidth.
- 91
0
1 100
I- J1111 H 11 11 IN I I j I I 1 11.1
I I I U 111 11 (1 IRM O- 11 1 11111 11 1 111111 1111H
I 1 1 114 1 111 1 11111111 1 1 Iffl - t 1 111111 111 1111A I 1 9 111 1.1 U 1111 ti 11 1i 11 1 1 1111111111 1 1
p I P I I I l' I
30
20
10
0S000
(B)100
70
630
30
20
10
05000
4000
4000
- - - I 4 1'i
0 1000
80
70
6o
2000FREQUENCY
3000(HZ)
4000
(
I I HIM 1 A I
... 1 M l I 1 1 l 1 1 1 A l 1 1 1 1 1 : 1
2000 3000FREQUENCY (HZ)
4000
Figure 3.38with 100with 100
As in Figure 3.22. First formantHzHz
frequency is 600 Hzbandwidth. Second formant frequency is 2300 Hzbandwidth.
- 92
0
I W 11 IIM 111
911 1111 - 111 5
J1 i
IIIHIIII I MU HIM IPH 111W40101 HIM 11111 IIIH IIII...11
s0
so
.30
10
0
OB)100
s0
30
20
10
00
80
500
2000 3000FREQUENCY (HZ)
2000 3000FREQUENCY (HZ)
Figure 3.39with 500
As in Figure 3.22. FIrst formant frequency is 600 HzHz bandwidth. Second formant frequency is 2300 Hz
with 50 Hz bandwidth.
- 93
I I AI(
11
4000
DB)100
s0
30
20
10
(DB)100
60
so
30
20
10
0
0
(0 4000 5000
H ill I 1 1111 11 11 1 1
IJ 111 11 11 H 11 U I Ill
i IN A wiftill 1 111'
I
Chapter 4
Discussion
The present study was designed to assess infinite peak
clipping as a means of reducing the amplitude variations in
speech levels. The study was based on the following two
assumptions about hearing and the perception of speech. The
first is that the ear analyzes sound into "critical-band"
filters, the outputs of which are relevant to the perception
of loudness. The second assumption is that the primary cues
to intelligibility lie in the spectral (magnitude) patterns
of speech. The investigation was designed to answer
questions based on these assumptions: what are the effects
of post-filtering speech after infinite-peak clipping, and
how can spectral distortion be minimized? The goal was to
determine if an amplitude compression system based on
infinite peak clipping could be designed which would give
maximum compression and minimum spectral distortion, thereby
maximizing intelligibility for individuals with sensorineural
hearing loss.
- 94 -
4.1 Compression Results
The general result from the study of compression was the
insensitivity of the amplitude range of clipped speech to the
characteristics of the filters that preceded and followed the
clipper. Regardless of which pre-filters and post-filters
were used, clipping reduced the range of speech by at least
15 to 20 dB. Output ranges were on the order of 10 to 20 dB.
The general trends of the effects of pre- and
post-filtering clipped speech can be understood qualitatively
with concepts discussed by DeGennaro et al. (1981). The
fluctuations in a bandpass speech envelope can be modeled as
consisting of slow and rapid components. The rapid
components result from the narrowband filtering (the rate of
fluctuations are on the order of the bandwidth) and account
for 10-15 dB of the measured range. The slow component (with
fluctuations on the order of a few per second) results from
variation in the overall intensity and in spectral
distributions across different speech sounds. The slow
component accounts for about 30 dB of the range. To a first
approximation, the effect of infinite peak clipping on the
range measured after narrow post-filtering is to remove the
slow component. An example is a third-octave band of noise
that is slowly modulated over a large decibel range, then
clipped and post-filtered by a third-octave filter with the
- 95 -
same center frequency. The zero-crossing pattern is
unaffected by the slow modulation, so the clipped noise will
exhibit the same envelope variations as if there were no slow
modulations.
The preceding analysis applies to the case where the
post-filter is in the spectral region that is dominant at the
input to the clipper. If this dominant region changes over
time, variations in the clipped spectral distributions will
contribute to the measured output range.
This discussion aids in interpreting the finding that
with no pre-filtering the 16 band ranges were slightly
smaller than with a highpass pre-filter (the optimal filter
or differentiator), or a bandpass pre-filter (formant
filters). With no pre-filtering, the input speech waveform
(and hence the sequence of zero-crossings) is dominated by
the low-frequency first formant. The harmonics in the higher
frequency regions will be relatively constant across the
various speech sounds after clipping. If the first-formant
is reduced in amplitude by highpass filtering, the input
zero-crossing sequence will exhibit greater variability,
resulting in greater variation in the clipped spectrum across
the different speech sounds.
- 96 -
It was also observed that ranges were reduced further if
the pre-filter bandwidth was narrowed to one-third octave or
less. This effect can be understood by considering the case
of a very small pre-filter bandwidth. The input to the
clipper will approximate a tone with slowly varying amplitude
and frequency modulation. Since the clipped signal will be
frequency modulated with constant amplitude, the output of
the third-octave filter will also be a tone with slow
frequency modulation and an amplitude modulation resulting
from FM-to-AM conversion by the post-filter. Thus, the
narrower the input band, the less frequency modulation and
consequently the less amplitude modulation there will be
after post-filtering.
The finding that the clipped range is not strongly
dependent on the bandwidth or slope of the post-filter
inplies that the exact characteristics of the ear's
critical-band filters need not be known in order to make
meaningful measurements of range.
It is interesting to speculate on whether there is a
minimimum range of speech levels that can be achieved after
(auditory) post-filtering by any compression system. When
one considers that the envelope ranges of narrowband noise
and harmonic complexes are on the order of 10-15 dB, one must
wonder how the variations in the speech envelope could be
- 97 -
further reduced. It would seem that further reduction of
these envelope variations could be achieved only in special
cases.
4.2 Comparison with AGC Systems
Krieg (1980) studied multiband amplitude compression
achieved by independent AGC in each band. The input speech
was CVC lists spoken by a male talker, and the compression
ratios in the 16 frequency bands were 3 (with a deviation of
approximately 0.5 in channels 11 and 16). The input
third-octave ranges were 25 to 42 dB, averaging 36.8 dB
across all 16 channels (Krieg, 1980, Figure 9). After
compression, the ranges were 14 to 38 dB, averaging 24.3 dB
across all frequency bands. The actual amount of compression
was approximately 12 dB (Krieg, 1980, Figures 11 and 15) and
the effective compression ratio was approximately 1.5.
Infinite peak clipping, in the present study, reduced
the range of speech by approximately 20 dB, 8 dB more than
the AGC system used by Krieg (1980). Assuming that greater
intelligibility results from presenting a greater range of
speech within the residual hearing area of the listener, then
clipping should allow superior intelligibility results when
the listener's dynamic range is less than about 20 dB.
- 98 -
4.3 Spectral Modifications
The spectral pattern of clipped unvoiced sounds can be
predicted from the arcsin law (Section 3.4). After clipping,
spectral peaks will appear at odd multiples of the original
spectral peaks. Clipping spreads the spectrum, filling in
the low-level region much as a noise floor would. Clipped
spectra with two formants have harmonic peaks at not only odd
multiples of the formants but at combination frequencies.
The lower-frequency formant, when stronger, tends to obscure
the higher-frequency formant.
The clipped spectra of voiced sounds are dependent on
the phases of the input frequency components, so the arcsin
law cannot be aplied. Except for the fact that the clipped
harmonic spectrum contains frequency components with the same
fundamental spacing as the unmodified spectrum (i.e. there
is no counterpart to the "noise floor" of unvoiced sounds),
the general results are similar with voiced and unvoiced
sounds. Spectral peaks which appear at odd multiples of the
formants and combination frequencies after clipping will
obscure the low-level formants.
A remarkable aspect of clipping vowels is the degree to
which spectral peaks can be enhanced. This finding suggests
that infinite peak clipping could be exploited in schemes for
- 99 -
formant detection and tracking.
4.4 Implications for Hearing Aid Design
As stated above, the present study sought to determine
the effects of infinite peak clipping in order to minimize
both output ranges in critical-band-like filters and spectral
distortions. It was found that output ranges are relatively
insensitive to post-filtering and slightly dependent on
pre-filtering. If minimizing output ranges were the only
criterion, there would be either no pre-filtering or
narrowband pre-filtering. However, pre-filters like Thomas
and Niederjohn's (1968) 1100-Hz highpass filter and
Hildebrant's (1982) formant filters produced ranges only a
few dB larger than no pre-filter and are known to produce
greater intelligibility. The differentiator pre-filter gave
the largest ranges of the pre-filters investigated.
The present results on spectral distortions will help to
determine a choice of pre-filter. Even though the narrowband
filters produce the greatest amount of compression, the
relative spectral qualities among frequency bands would be
lost in such a system. For multiple-formant sounds it would
seem that a pre-filtering scheme that separates the formants,
like Hildebrant's (1982), would be beneficial for preserving
the formants (without losing the relationship between bands)
- 100 -
and avoiding interactions in the clipper. However, for
single-peak sounds, this scheme may result in spurious output
peaks, or a less well-defined peak than in the input. In
this respect, a single-channel clipper with highpass
pre-filtering might prove superior.
Of course, the particular loss of the individual should
also be considered. For example if the individual depends
only on first formant cues, and has virtually no second
formant hearing, then pre-filtering with an optimal filter
will hinder intelligibility by decreasing the saliency of the
cues.
4.5 Recomendations for Future Work
Future work should test intelligibiltiy with a 3-band
clipper and optimal filter/clipper systems on
hearing-impaired listeners with small dynamic ranges.
Ideally, results could be compared with performance on other
amplitude compression systems using the same subjects.
Examination of the types of speech errors for both
hearing-impaired and normal subjects should help to determine
the next course of action.
- 101 -
REFERENCES
Braida, L.D., et al. (1979). "Hearing Aids--A review ofPast Research on Linear Amplification, AmplitudeCompression, and Frequency Lowering," ASHA Monograph No. 19
Coln, M.C.W. (1979). "A Computer Controlled MultibandAmplitude Compressor," Master's Thesis, Mass. Inst. Tech.,Cambridge, MA.
De Gennaro, S., Braida, L.D., Durlach, N.I. (1981). "AStatictical Analysis of Third-Octave Speech AmplitudeDistributions," Paper presented at the 101st meeting of theAccoustical Society of America, Ottawa, Ontario, Canada
De Gennaro, S. Krieg, K.R., Braida L.D., Durlach, N.I.(1981). "Third-Octave Analysis of Multichannel AmplitudeCompressed Speech," Proceedings of IEEE Inter. Conf. onAcoust., Speech and Signal Processing, Atlanta, Georgia
De Gennaro, S. (1982). "An Analytical Study of SyllabicCompression for Severely Impaired Listeners," PhD Thesis,Mass. Inst. Tech., Cambridge, MA.
Dunn, H.K., White, S.D. (1940). "Statistical Measurementson Conversational Speech," JASA 11, pp278-288
Fawe, A.L. (1966). "Interpretation of Infinitely ClippedSpeech Properties," IEEE Trans. on Audio and.Electroacoustics 14, ppl78-183
Green, D.M., Swets, J.A. (1966). Signal Detection Theoryand Psychophysics, (John Wiley and Sons, New York)
Guidarelli, G. (1981). "Intelligibility and NaturalnessImprovement of Infinitely Clipped Speech," Acoustics Letters4, No.7
Hildebrant, E.M. (1982). "An Electronic Device to Reducethe Dynamic Range of Speech," Bachelor's Thesis, Mass.Inst. of Tech., Cambridge, MA
IEEE (1969). "Recommended practice for speech qualitymeasurements." IEEE Trans. Audio Electroacoust. 17,pp225-246
- 102 -
Kac, M., Siegert, A.J.F. (1947). "On the Theory of Noisein Radio Receivers with Square Law Detectors," J. AppliedPhysics 18, pp383-397
Krieg, K.R. (1980). "Third Octave Band Level Distributionsof Amplitude Compressed Speech," Bachelor's Thesis, Mass.Inst. of Tech., Cambridge, MA.
Licklider, J.C.R. (1946). "Effects of Amplitude Distortionupon the Intelligibility of Speech," J. Acoust. Soc. Am.18, pp429-434
Licklider, J.C.R., Bindra, D., Pollack, I. (1948). "TheIntelligibility of Rectangular Speech-Waves," Am. J.Psychology 61, ppl-20
Licklider, J.C.R., Pollack, I. (1948). "Effects ofDifferentiation, Integration, and Infinite Peak Clippingupon the Intelligibility of Speech," JASA 20, pp42-51
Papoulis, A. (1965). Probability, Random Variables andStochastic Processes, (McGraw-Hill, New York)
Pollack, I. (1952). "On the Effect of Frequency andAmplitude Distortion on the Intelligibility of Speech inNoise," JASA 24, pp538-540
Rabiner, L.R., Schafer, R.W. (1978). Digital Processing ofSpeech Signals, (Prentice-Hall, New Jersey)
Thomas,. I.B., Niederjohn, R.J. (1970). "TheIntelligibility of Filtered-Clipped Speech in Noise," J.Audio Engin. Soc. 18, pp299-303
Thomas, I.B., Sparks, D.W. (1971). "Discrimination ofFiltered/Clipped Speech by Hearing-Impaired Subjects," JASA49, ppl881-1887
- 103 -
Appendix A
Probability Densities of Test Signals
Narrow-band Noise
The probability density for the random variable, V,
generated by squaring and then lowpass-filtering a band of
noise centered at a high frequency was calculated by Kac and
Siegert (1947) to be
00 exp(-V/) s)P(V) = ----------------- V>O (Al)
s (1 i3s)
where i is the ith eigenvalue in the solution of an
integral equation. For the specific case where the input
noise spectrum is that of a "simple-tuned" circuit and the
lowpass filter is a simple RC-lowpass filter, the
eigenvalues are found from the zeroes of the (R-1)th order
Bessel function, where R is the ratio of the input noise
bandwidth to the lowpass filter's -3 dB bandwidth.
J L2(R/, i) J = 0. (A2)
R-/
- 104 -
Probability density functions were computed for values of R
between 0 and 1 (Figure Al), and values between 1 and 8
(Figure A2). The case of R=0 corresponds to that of a
Rayleigh random variable squared, or an
exponentially-distributed variable. The range was computed
to be the decibel difference between the the two values, V10
and V90, at which the cumulative distribution function is
equal to 0.10 and 0.90, respectively. The range values are
graphed in Figure 2.2.
Tones
An input sine wave with unit amplitude and cyclic
frequency f=w/2T( is considered in this section. Squaring
and lowpass filtering of this signal results in the variable
1 - /H(2f)//cos(4Thft)V =----------- --------------------- (A3)
2
where
H(f) = 1/ 1+RA
and R is the ratio of f to the -3 dB bandwidth of the
RC-lowpass filter.
- 105 -
Because of symmetry in the time-wave, the calculations
can be limited to the range for c^'= 47'fft of 0< /<17/. The
cumulative distribution function for V is found by
integrating the probability density function on o/ (which is
assumed uniformly distributed) between the limits determined
by the inverse function,
(1-2V)cX =F (V) = cos L ------------ ] . (A4)
/H(2f)/
The cumulative distribution function is derived below.
Pr(V< Vo) = Pr(c/< F (Vo))
/v -
= arccos L(1-2V)// H(2f)/ I (A5)
The two values, V10 and V90, at which the cumulative
distribution is equal to 0.10 and 0.90, were calculated from
equation A5. The range, which is the decibel difference of
these two values, is graphed as a function of R in Figure
2.3.
- 106 -
2.0 -
.8
0.0
~~1
R 1/21/41/80 (exponential density)
t-
0 1 2 3 4V
Figure Al Probability density functions for narrowband noiseafter squaring and lowpass filtering for R less than 1.R is the ratio of the input noise bandwidth to the lowpassfilter's -3dB cutoff.
- 107 -
I
.41
2 .0 -
1.8 R8--1R -- 8
/ \I \
1.2 -x 4 -1
>\ 214
0 1 2 3 4V
Figure A2 Probability density functions for narrowband noiseafter squaring and lowpass filtering for R greater -than orequal to 1.
- 108 -
APPENDIX B
10 Harvard sentences used as input for experiments. Speakeris male.
1. A large size in stockings is hard to sell.
2. The birch canoe slid on the smooth planks.
3. Glue the sheet to the dark, blue background.
4. It's easy to tell the depth of a well.
5. These days a chiken leg is a rare dish.
6. Rice is often served in round bowls.
7. The juice of lemons makes fine punch.
8. The box was thrown beside the parked truck.
9. The hogs were fed chopped corn and garbage.
10. Four hours of steady work faced us.
- 109 -