INFINITE PEAK CLIPPING OF AMPLITUDE COMPRESSION

EVALUATION OF

INFINITE PEAK CLIPPING

AS A MEANS OF AMPLITUDE COMPRESSION

by

Karen Ann SillettoIf

B.S.E.E., University of California at Davis(1982)

Submitted to the Department ofElectrical Engineering and Computer Science

in Partial Fulfillment of theRequirements for the

Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 1984

Massachusetts Institute of Technology 1984

Signature of AuthorDepartment of Electxtial Engineeringj aad C6mputer Science

July 13, 1984

Certified by:

Accepted by:

Dr. P.M/ Zurek, Thesis Supervisor

A.C.Smith, ChairmanDepartmental Committee on Graduate Students

ivASSACHUSEdiS NSTTUTEOF TFCNOLoGY

OCT 0 4 1984LIBRA1IES

ARCHIVES

EVALUATION OF

INFINITE PEAK CLIPPING

AS A MEANS OF AMPLITUDE COMPRESSION

by

KAREN ANN SILLETTO

Submitted to the Department of Electrical Engineering andComputer Science on July 13, 1984 in partial fulfillment

of the requirements for the Degree ofMaster of Science in Electrical Engineering

ABSTRACT

Infinite peak clipping is a simple and effective

technique for processing speech prior to transmission over

an analog channel with limited dynamic range. However, the

technique has yet to be applied seriously to the problem of

limited auditory dynamic range accompanying sensorineural

hearing loss. In this thesis, an attempt was made to

evaluate the potential for infinite peak clipping as a means

of amplitude compression. Measurements were made of the

compression of the range of speech levels when clipping is

followed by post-filtering (as is presumed to occur when

clipped speech is analyzed by the ear). Spectral

distortions caused by clipping were also assessed.

- 2 -

The widths of level distributions of clipped and

filtered speech were found to be relatively independent of

the characteristics of both the pre-filter and the

post-filter. Measured ranges between the 10% and 90%

cumulative levels were about 10-15 dB as compared to the

input ranges of 30 to 40 dB.

The clipped spectra of unvoiced speech sounds can be

predicted analytically. The clipped spectra of voiced

sounds cannot be easily predicted and so several cases were

examined empirically. In general, despite the radical

distortion of the input time wave, spectral distortions are

not severe. Pre-filtering schemes designed to preserve

relevant cues in the clipped spectra are discussed. It is

concluded that infinite peak clipping as a means of extreme

amplitude compression deserves detailed study with

hearing-impaired listeners.

Thesis Supervisor: Dr. Patrick M. ZurekTitle: Research Scientist, Research Laboratory ofElectronics

- 3 -

ACKNOWLEDGMENT

First, I wish to thank Pat Zurek for his continued

guidance and support on this project. I would also like to

thank all the members of the Communications Biophysics Group

for helping to make the laboratory such an enjoyable place

to work. Although I cannot mention all the people who have

been wonderfully supportive and encouraging, I would like to

thank my office-mates, Dan Leotta, and Jean Reid (not

forgetting Mineola Minos), for making the office such a home

away from home. Also, special thanks to Dan Leotta, Mike

McConnell, and Diane Bustamante for their extra help when I

really needed it. Finally, I would like to thank my parents

and family for their love, and continued support.

- 4 -

TABLE OF CONTENTS

ABSTRACT ......................

ACKNOWLEDGEMENTS ..............

TABLE OF CONTENTS .............

LIST OF FIGURES ...............

CHAPTER 1 INTRODUCTION .......

1.1 Problem Definition

1.2 Previous Research

1.3 Rationale for Pres

CHAPTER 2 METHODS ............

ent Study

......... 0

2.1 Inputs

2.2 Filtering and Clipping

2.2.1 Pre-filters ...

2.2.2 Clippers ......

2.2.3 Post-filters ..

2.3 Dynamic Range Analysis

2.3.1 Level Detection

2.3.2 Probability Distribu

2.4 Spectral Analysis .........

2.4.1 H.P. Spectrum Analyz

2.4.2 ILS Spectral Analysi

..2

..4

..5

..7

..9

..9

12

16

18

18

20

20

21

21

... . 24

... . 24

tion Analysis .25

............... 26

er .............. 26

s ..............27

2.5 Control Measurements and SystemEvaluation...................... .......... 28

- 5 -

........... a 6

.... a

.

.

.

2.5.1 Evaluation of the Level-Distribution

Measurement System....................

2.5.2 Determination of Sample Size forSentences........................

CHAPTER 3 RESULTS ..............................

3.1 Comparison of Distributions in 16Frequency Channels..................

3.2 Comparison of Digital and AnalogImplementations.....................

3.2.1 Digital vs. Analog Sentences

3.2.2 Digital vs. Analog Processin

3.3 Compression Results ...............

3.3.1 Pre-filtered Speech ........

3.3.2 Post-filtered Speech .......

3.4 Spectral Modifications ............

3.4.1 Spectra of Clipped Unvoiced

3.4.2 Phase Dependence of ClippedSounds......................

3.4.3 Spectra of Clipped Vowels ..

CHAPTER 4

4.1

4.2

4. 3

4.4

DISCUSSION ........................

Compression Results .............

Comparison with an AGC System ...

Spectral Modifications ..........

Implications for Hearing Aid Desi

37

37

...... 40

...... 40

g ..... 43

....... 45

... . 45

....... 50

....... 51

Speech 54

Voiced....... 69

....... 73

... .94

... . 95

... . 98

... .. 99

.100

4.5

REFERENCES

APPENDIX A

APPENDIX B

Recommendations for Future Work ......... 101

. . . . . .. . . . ......................102

Probability Densities of Test Signals .... 104

10 Harvard Sentences ..................... 109

- 6 -

a .28

.32

gn

LIST OF FIGURES

1.1 A running spectrum of a two-word utterance, beforeand after infinite peak clipping (Taken from Licklideret al., 1948)..... ....................................... 15

2.1 System Block Diagram................................. 19

2.2 Measured and calculated ranges for narrowband noiseafter squaring and lowpass filtering..................... 30

2.3 Measured and calculated ranges for tones after squaringand lowpass filtering with an RC-lowpass filter.......... 31

2.4 Ranges in the 1/3-octave band centered at 1000 Hz as afunction of'sentence sample size......................... 34

2.5 Ranges across 13 frequency channels comparing inputsof 10 and 20 digital Harvard sentences.................... 36

3.1 Ranges across 13 frequency bands for both the clippingand nonclipping conditions................................. 38

3.2 Clipped and nonclipped ranges comparing inputs ofdigital and analog Harvard sentences...................... 42

3.3 Effect of different pre-filters on ranges in 13frequency bands.......................................... 47

3.4 Effect of different narrowband pre-filters on rangesin the 1000 Hz frequency band............................. 49

3.5 Effect of varying the post-filter bandwidth and slopes.Pre-filter is an allpass filter........................... 52

- 7 -

3.6 Effect of varying the post-filter bandwidth and slopes.Pre-filter is an optimal filter and differentiator....... 53

3.7-3.12 Spectra of synthetic consonant noise burstsbefore and after clipping................................

3.13-3.19 Spectra of filtered noise before and afterclipping.................................................

3.20 The phase dependence of clipped voiced sounds.Phase shifted time-waves.................................

3.21 The magnitude spectra of phase shifted time-waves...

3.22-3.39 Spectra of synthetic vowels before and afterclipping.................................................

55

62

71

72

76

Al Probability density functions for narrowband noise aftersquaring and lowpass filtering for R less than 1........ 107

A2 Probability density functions for narrowband noise aftersquaring and lowpass filtering for R greater than 1..... 108

- 8 -

Chapter 1

INTRODUCTION

1.1 Problem Definition

Sensorineural hearing impairments are characterized by

elevated hearing thresholds without a corresponding

elevation in the discomfort threshold. This reduction in

dynamic range, if severe enough, can necessitate a

compromise between speech intelligibility and comfort.

Consequently, amplitude compression of speech has been

extensively studied as a means of matching the dynamic range

of speech and other acoustic signals to the reduced dynamic

range of the impaired listener.

The most common means of amplitude compression has been

automatic gain control (AGC) in which the gain of -an output

amplifier is controlled by an estimate of the input

amplitude. Results of studies of AGC in hearing aids are

reviewed by Braida et al. (1979). While there are clear

benefits and applications for such systems, there are also

limitations in the hearing-aid application. For instance,

due to the "attack time" of an AGC system, the effectiveness

in protecting the user from high-level transient sounds is

limited. Another limitation is that the actual degree of

- 9 -

compression is less than that specified (DeGennaro et al.,

1981).

Infinite peak-clipping is a simple method of amplitude

compression that drastically reduces the dynamic range of

the speech signal by clipping the input time-wave so that a

rectangular wave of constant amplitude results. Only the

zero-crossings of the original waveform are preserved.

Since the output (envelope) amplitude is constant, infinite

peak clipping produces maximal range reduction of the

wideband signal.

The obvious cost of this range compression is severe

waveform distortion. However, it has been found that,

despite this extreme distortion, infinitely-clipped speech

can be highly intelligible (Licklider, 1946; Licklider,

Bindra, Pollack, 1948). Thus, it would seem that peak

clipping should be considered as a means of amplitude

compression in hearing aids and also perhaps cochlear

implants and tactile aids, since it achieves large

compression of the amplitude range of speech with relatively

little loss in intelligibility.

Though clipping produces a constant-amplitude wideband

signal, the perceived loudness may not be constant because

the ear effectively filters incoming sounds into frequency

bands (critical bands) whose outputs are believed to be

- 10 -

relevant to loudness (Green and Swets, 1966, Chapter 10).

The amplitude range of speech after post-filtering has not

yet been studied and clearly needs to be understood in order

to evaluate clipping as a means of amplitude compression.

The effects of clipping on important cues for speech

intelligibility are also not well understood. In

particular, while it is generally accepted that the

short-time spectral amplitude pattern of speech is

important, changes in this pattern resulting from infinite

peak clipping are not understood. It is possible that, even

though wideband clipping produces fair intelligibility,

multiple narrowband clippers could better preserve spectral

amplitude patterns and consequently increase

intelligibility.

The general problem addressed in this thesis is how

infinite peak clipping might be used to compress the

amplitude range of speech for the hearing-impaired. More

specifically, the thesis will investigate the influence of

pre- and post-filtering on the amplitude distributions and

spectral patterns of clipped speech. The range of

filtered-clipped-filtered speech will be measured and the

spectra of clipped waveforms will be analyzed in order to

evaluate the potential for clipping as a means of amplitude

compression, as well as to understand past studies which

- 11 -

examined the intelligibility of clipped speech.

1.2 Previous Research

Peak clipping was first introduced in radio

transmitters during WWII as a means for maximizing the

intelligibility over a transmission channel with limited

dynamic range. Licklider (1946) studied different types of

amplitude distortion and concluded that peak-clipping was

the least detrimental to intelligibility. Licklider's

experiments showed that since infinite peak clipping

preserves only the zero crossings of the input signal, the

pattern of instantaneous amplitudes in the speech waveform

is not essential for intelligibility.

Pollack (1952) confirmed Licklider's results and showed

that the intelligibility of infinitely-clipped speech,

relative to unmodified speech, is a function of SNR (with

signals equated in terms of peak amplitude and with noise

added after the clipper) and roughly independent of the

frequency range of the speech signal. Clipped speech is

more intelligible than normal speech at low SNR's because

the square speech has more power per unit peak amplitude

than normal speech. At high SNR's, distortion introduced by

the clipper degrades intelligibility relative to unmodified

speech.

- 12 -

Intelligibility of clipped speech is improved by

highpass filtering the speech before clipping (Licklider,

Bindra, Pollack, 1948; Pollack, 1952; Thomas and

Niederjohn, 1970; Thomas and Ravindran, 1971). One

interpretation of this finding (Thomas and Niederjohn, 1970)

is that a highpass pre-filter with cutoff at 1100 Hz reduces

high-amplitude, low-frequency signals and results in better

preservation of the second formant, which is very important

for intelligibility.

There has been one investigation of clipped speech with

hearing-impaired listeners. Thomas and Sparks (1971)

compared the intelligibility of filtered-clipped speech with

linearly-amplified speech by testing 17 ears of 16

hearing-impaired subjects. For 13 of the 17 cases, the

filtered-clipped speech was more intelligible than

linearly-amplified speech with the two types of speech

equated in terms of their overall SPLs. However, these

results are open to the criticism that the linear reference

condition may have been less than optimal.

Hildebrant (1982) investigated a multiband clipping

system that filtered the incoming speech into frequency

regions corresponding to the ranges of the first three

formants. The frequency bands were individually clipped,

filtered again and then summed together. A reduced dynamic

- 13 -

range was created for normal listeners by adding white noise

after processing and restricting the wideband signal to be

below an artificial "discomfort threshold". Intelligibility

with clipped speech was much better than with

linearly-amplified speech. A similar system was studied by

Guidarelli (1981), who found that intelligibility was

greatest with 4 octave-wide or 6 two-third-octave-wide

channels.

The general conclusion that can be drawn from past

studies is that infinite peak clipping reduces the dynamic

range of wideband speech while maintaining a fair degree of

intelligibility. This fundamental result, that the

"amplitude information" in a speech waveform can be

discarded without a drastic degradation of intelligibility,

initially appears quite surprising. However, the key to

understanding this result lies in the realization that

"amplitude information" is also coded in the zero-crossings

of a wave. Although it is not at all obvious from

inspection of the wave, the spectra of clipped signals are

often very similar to the spectra of unclipped signals.

Licklider, Bindra, and Pollack (1948) clearly

emphasized the importance of examining the spectral changes

caused by clipping in order to understand the relatively

small effect of clipping on intelligibility. Figure 1.1,

- 14 -

SW -- 4E SEN --------- CHrIME--m

NO CLIPPINGI..

I.-

Figure 1.1 A running spectrum of a two-word utterance,before and after infinite peak clipping (taken from

Licklider et al., 1948).

- 15 -

SH----- E 8 EN-----------;H

INFINITE PEAK CUPPING

taken from their paper, shows a running spectrum (a 3-D plot

of amplitude vs. time and frequency) of a two-word

utterance before and after infinite peak clipping. From

this frequency analysis it is clear why clipping has little

effect on intelligibility - the spectral patterns of clipped

and unclipped speech are not drastically different.

1.3 Rationale for Present Study

The rationale for studying the effect of pre- and

post-filters on the range of clipped speech and the spectral

distortions caused by clipping follows from an underlying

model and set of assumptions about the operation of the ear

and the perception of speech. First, it is generally

believed that the ear analyzes incoming speech into

frequency bands that are approximately one-third of an

octave wide. In the present study, it is assumed that the

quantity relevant to dynamic range is the envelope of these

bandpass signals. A major portion of this thesis addresses

the question: what are the effects of post-filtering on the

envelope distributions of infinitely-clipped speech waves?

- 16 -

A second important assumption is that the primary cues

to intelligibility lie in the spectral (magnitude) patterns

of speech. In particular, it is assumed that local maxima

in the spectra (formants) are vital to intelligibility. The

effects of clipping on these patterns will be studied as

well as ways, such as pre-filtering, in which the spectral

distortions can be minimized.

- 17 -

Chapter 2

Methods

A block diagram of the system used in the present study

is shown in Figure 2.1. Input signals (speech and other

test signals) were either first filtered with a pre-filter

and clipped, or they were unmodified. The spectra of the

two signals at point 'A' were compared. The dynamic ranges

of these signals were assessed by measuring the

distributions of the signal envelopes after bandpass

filtering.

2.1 Inputs

Harvard sentences (IEEE, 1969) spoken by a male talker

were chosen as input signals. It was desired to use a

speech sample that had an amplitude distribution that was

characteristic of spoken English, but yet was of manageable

size for digital processing. The number of sentences was

determined by preliminary measurements described below

(Section 2.5).

- 18 -

Level Distribution

..-- -- - -0(A)

Input Pre-filter InfiniteO---- -- i eak clipper - -

(d,a) (d,a) (d,a)

d = digital

a = analog

J

Post-filter Level ProbabilityDetector Analysis ' Single Channel

(d,a) (a) (a) I

16 bands

Post-filter Level Probability(Multifilter) Detectors Analysis

(a) * (a) (a) Multiple Channel

H.P. 3582A H.P. 7004B0--Spectrum Analyzer X-Y Recorder

(a) (a)

ILS Analysis

(d)

Figure 2.1 System block diagram

2.2 Filtering and Clipping

Many of the operations shown in Figure 2.1 could be

implemented either digitally (not in real time), or with

analog equipment. The letters 'd' and 'a' in Figure 2.1

indicate digital and analog capability.

2.2.1 Pre-filters

The analog pre-filter was a 2-pole, 1100-Hz highpass

filter suggested by Thomas and Niederjohn (1970). For ease

of reference, this filter is termed "optimal" because Thomas

and Niederjohn found it to give the best intelligibility of

a number of pre-filters. Also for ease of reference, the

condition in which the signal is clipped without

pre-filtering will be termed "allpass" pre-filtering.

A software package, ILS (Interactive Laboratory System)

written by Signal Technology Incorporated, was used to

design the digital filters and filter waveforms. An

infinite impulse response Butterworth design was used for

the digital filters. Digital pre-filters were a digital

"optimal" filter, an allpass filter (i.e. no

pre-filtering), and a single-pole, 6 dB/octave 8000-Hz

highpass filter, which will be termed "differentiator" for

- 20 -

obvious reasons. Speech was also digitally pre-filtered

into the three formant regions that Hildebrant (1982)

described. The first formant filter was a bandpass filter

between 200 Hz and 900 Hz, the second formant filter was 900

Hz to 2800 Hz, and the third formant filter was 2800 Hz to

6000 Hz. Finally, narrowband pre-filters with bandwidths

less than 231 Hz and center frequency at 1000 Hz were used

for the purpose of exploring the effects of the pre-filter

bandwidth and slope.

2.2.2 Clippers

Analog clipping was performed by a.Schmitt trigger with

an adjustable hysteresis "dead zone" (Hildebrant, 1982).

The dead zone was adjusted to a minimum value which

prevented triggering by internal noise so that no output

signal resulted from a null input signal. Digital clipping

was performed by keeping only the sign of the digitized

waveform values (except zero, which remained zero).

2.2.3 Post-filters

The clipped waveform (at point 'A' in Figure 2.1) could

be post-filtered (before detection) into adjacent

"critical-band" filters using a General Radio (1925)

- 21 -

multifilter, or digitally post-filtered into a single

frequency band. The multifilter contains 30 one-third

octave, 6-pole (3-zero), Butterworth filters with center

frequencies that range from 25 Hz to 8000 Hz. The filter

skirt slopes near the passband are about 36 dB/octave and

flatten out to about 18 dB/octave due to the three zeros in

the transfer function.

Only the outputs from 19 one-third octave bands between

125 Hz and 8000 Hz are used, since the most important cues

for intelligibility lie in this frequency range. Because

critical bands are proportionately broader at lower

frequencies, outputs from the three lowest of these

one-third octave bands are summed to form channel 1, and the

outputs from third-octaves centered at 250 Hz and 315 Hz are

summed to form channel 2. The remaining channels are the

outputs of single third-octave bands. The bandwidths and

center frequencies of the 16 frequency channels are listed

in Table I.

For greater flexibility in examining the effect of

post-filter skirts and bandwidths, single band post-filters

were implemented digitally. Post-filter skirts were varied

by changing the number of poles in the filter transform from

2 to 8 poles. The bandwidth of the postfilter was varied

from 231 Hz to .29 Hz. The filtered bandpass output was

- 22 -

Table I

Channel Centernumber frequency

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

160

280

400

500

630

800

1000

1250

1600

2000

2500

3150

4000

5000

6300

8000

Bandwidth

Hz 113

130

93

115

146

185

231

289

371

463

579

729

926

1158

1459

1852

Hz

RC-FilterTime -3 dB

Constant Frequency

17825.0 usecs 9 Hz

9390.0 17

5730.0 28

4456.0 36

3390.0 47

2657.9 60

1989.4 80

2021.0 79

1559.7 102

1368.7 117

859.4 185

859.4 185

251.5 531

198.9 795

99.9 1592

39.9 3979

- 23 -

R

12.50

7.65

3.32

3.19

3.11

3.08

2.89

3.66

3.63

3.95

3.12

3.94

1.74

1.46

0.92

0.47

input into a single analog detector.

2.3 Dynamic Range Analysis

2.3.1 Level Detection

The hardware used to detect the envelope of the signal

is part of a multi-channel Automatic Gain Control

compression system designed and developed by Coln (1979).

Each level detector performs a short-term rms calculation in

which the bandpass signal is first squared and then filtered

by an RC-lowpass filter. The output of one of these

detectors is an estimate of the logarithm of the envelope

squared. There are 16 level detectors, one for each of the

16 bandpass outputs of the multifilter. When a single-band

digital post-filter is employed, the output of the filter

(after D/A conversion) is presented directly to the level

detector with an 80 Hz RC-filter (channel 7).

Each RC-lowpass filter time constant can be adjusted by

varying the filter capacitor. In channels 1 through 12, the

time constants were adjusted to keep the second-harmonic

ripple down to 6 percent at twice the center frequency of

the signal. However, in the two lowest channels the

bandwidth is larger than one-third octave, therefore a

- 24 -

greater percentage of the signal envelope will be filtered,

resulting in a "smoother" signal envelope. In channels 13

through 16, the RC-filter bandwidth is larger in proportion

to the input signal center frequency than in the other

channels, so more second harmonic ripple will be included in

the estimate of the envelope. The filter time-constants are

listed in Table I.

2.3.2 Probability Distribution Analysis

Samples from the 16 level detector outputs were

digitized, and "bins" corresponding to the decibel level and

channel of each sample were incremented. The signal in each

band was sampled at a rate of 400 samples/second, giving a

total sampling rate of 6400 samples/second. The binwidth

was 1/2 dB and the dynamic range, which was determined by

the level detector, was 80 dB in each band.

Normalized histograms of the decibel level of the

signal envelope are plotted for each of the 16 frequency

channels. The distributions are normalized to the largest

histogram value in the 16 channels. The rms level of each

distribution is calculated along with the 10% and 90%

amplitude levels., The 10% level is the amplitude level

which only 10% of the speech samples exceed; 90% of the

samples have amplitudes greater than the 90% point.

- 25 -

Henceforth in this thesis, the term "range" applied to a

level distribution will mean the difference between the 10%

and 90% levels.

Samples of silence or low-level noise may corrupt the

distributions. In cases where the signal and noise

distributions are clearly separated, the noise samples can

be simply deleted. The modified file will produce a more

accurate signal amplitude distribution. Such distributions

will be labeled "edited" in the following sections.

2.4 Spectral Analysis

Spectral analysis was performed using two methods. The

H.P. 3528A spectrum analyzer computed and averaged the

spectral magnitudes of a number of time samples. Spectral

graphs were plotted with the H.P. 7004B X-Y recorder. The

ILS software package was also used to compute and plot the

magnitude spectrum of a single time sample. Both methods

used the Fast Fourier Transform (FFT) algorithm.

2.4.1 H.P. Spectrum Analyzer

The H.P. 3582A spectrum analyzer transforms a finite

segment of discrete time data into a discrete frequency

spectrum using an FFT implementation. In single-channel

- 26 -

mode, 1024 samples of the input signal are transformed by

the FFT into 256 complex values. The duration, T, of a time

frame is determined by the selected frequency spans and the

number of samples,

T = 1024/(4*frequency span) = 256/frequency span.

For example, to compute the spectrum over a 5000 Hz span, a

51.2 msec time segment is sampled. The discrete frequency

points are spaced at equal intervals, vF=l/T. In this

example frequency resolution is 19.5 Hz.

The H.P. analyzer also allows for up to 256 spectra to

be averaged to give a better estimate of the spectrum of a

random signal. Time samples were weighted with a Hanning

window in order to minimize both the amplitude and frequency

uncertainty in the spectrum.

2.4.2 ILS Spectral Analysis

The spectrum of a finite digital wave can be computed

and graphed with an ILS routine which computes a 1024-point

FFT. The 1024-point time sample was weighted with a Hamming

window in order to minimize the frequency and amplitude

distortion. Since only 1024 points are used, static time

waves which repeat periodically in less than 1024 sample

points (such as static vowels) are appropriate signals for

this analysis.

- 27 -

2.5 Control Measurements and System Evaluation

2.5.1 Evaluation of the Level Distribution Measurement

System

In order to assess the performance of the level

detector system, ranges of signals with known distributions

were measured. Analog narrowband random noise and tones

were input directly into the level detector in channel 7,

which has an RC-filter cutoff of 80 Hz. Narrowband random

noise was used as a test signal because its envelope is

known to follow the Rayleigh probability density. The

narrowband noise was centered at 1000 Hz, and the bandwidth

of the noise ranged from 3 Hz to 231 Hz.

The measured range for a 10-Hz band of noise was 11 db,

2 dB smaller than the expected range of 13.4 dB for a

Rayleigh-distributed variable. The measured range for a

10-Hz tone was 13 dB, 3 dB smaller than the expected range

of 16 dB for a tone.

The discrepancy between the measured and expected

ranges of the envelope distributions can be attributed to

the influence of the RC-lowpass filter. Calculated

probability densities for narrowband noise and tones after

squaring and lowpass filtering are given in Appendix A. The

- 28 -

calculated and measured ranges for noisebands and tones are

graphed as a function of R, the ratio of input signal

bandwidth (or frequency) to the RC-filter bandwidth in

Figures 2.2 and 2.3, respectively. It is evident from these

figures that the measured ranges follow the calculated

ranges to within about 1 dB. As the noise bandwidth or the

tone frequency increases, the difference between the input

envelope range (dashed line) and the lowpass-filtered

envelope range increases. If the RC-filter bandwidth is

much larger than the noise bandwidth (or tone frequency),

and if the second-harmonic ripple can be neglected, then the

effect of lowpass filtering will be small.

Residual deviations between the calculated and measured

ranges may be attributable to sampling errors or to circuit

imperfections. A possible reason for the measured range

being slightly greater than the calculated range for noise

with bandwidths larger than approximately 80 Hz is that the

analysis did not include the effects of second harmonic

ripple. As the noise bandwidth increases, the signal

envelope will include more second-harmonic ripple which will

increase the measured level range.

It is expected that the decibel underestimation of the

input envelope range will be the same for signals, such as

speech, with level ranges that are much wider than those of

- 29 -

j. = calculated ranges- o= measu red range

14-

t2 +0 0

1 ~ 0 +- 0C0

++ 0

6 4-

4-

2-

313 .0625 0.2 0.5 .0 2.0 4.0 6.0

= Input Noise Bandwidth/Post-detector Bandwidth

Figure 2.2 Measured and calculated ranges for narrowband noise(centered at 1000 Hz) after squaring and lowpass filteringwith an RC-lowpass filter. Range is graphed as a functionof the input noise bandwidth divided by the RC-filterbandwidth. Dashed line represents the expected range fora Rayleigh distributed variable.

- 30 -

calculated range

measured range

---------------------------------------------------------- 0

0 +

0

0

0+ A-

0

0 --

L. 2.0 4.0 8.0

R = Tone Frequency/Post-detector Bandwidth

Figure 2.3 Measured and calculated ranges for tones after squaringand lowpass filtering with an RC-lowpass filter. Range isgraphed as a function of the tone frequency divided by theRC-filter bandwidth.

- 31 -

20

14

12 -#-

(Y_

a,

c

8

4

2-

0.0 -/, .0313 .062S 0.12S 0.2S 0.S

noise or tones. If we assume that such wide distributions

result from a large but slow modulation of a bandpass

process, then the resulting level distribution should be the

convolution of the distributions of the modulation and the

bandpass envelope. If the modulation is sufficiently slow,

its variation will be unaffected by the lowpass filter.

Thus, for example, if there is a 5-dB underestimation of the

input 13.4 dB range of narrowband noise, the underestimation

will still be about 5 dB if the noise is slowly modulated

over a range of, for instance, 40 dB.

2.5.2 Determination of Sample Size for Sentences

The range of a bandpass speech signal estimated with an

input of n sentences should converge to a constant value as

n is increased. The sample size used in the main study was

chosen by preliminary measurements in which the number of

sentences in the sample was increased until there was only a

small change in the range for an increase in the sample

size.

The level ranges of single sentences were measured

after digitally post-filtering by a one-third octave (231

Hz) filter centered at 1000 Hz. The ranges were found to

- 32 -

vary from 25 dB to 46 dB among a sample of 10 sentences

(Figure 2.4). Since this range, for 10 single sentences,

varied 21 dB, a larger sample of sentences was considered.

The sample size was increased to four, ten, or twenty

sentences and the range was measured (Figure 2.4). (The

10-sentence sample contained the original 4 sentences, and

the 20-sentence sample contained those same 10 sentences.)

It appeared that a sample size of 10 or 20 sentences would

be appropriate since the measured ranges, 36 dB and 38 dB,

were close to the results obtained by Dunn and White (1939)

for one-half octave wide di'stributions above 500 Hz, and by

Krieg (1980) and DeGennaro et al. (1981) for a third-octave

distribution.

The measured ranges in 13 frequency channels for an

input sample of the 10 and 20 sentences were compared

(Figure 2.5). In order to examine all channels, the

sentences were bandpass filtered with the analog one-third

octave multifilter. (The ranges in channels 14, 15, and 16

were omitted because the input was filtered at 4.5 kHz). In

all 13 frequency channels the ranges differed by at most 3

dB. Since the difference was small, 10 Harvard sentences

were chosen for the sample data. The 10 sentences chosen

are listed in Appendix B. The sentences are aproximately 3

seconds long, so 30 seconds of data at 400 samples/second,

- 33 -

- 1 1 1 1 1 1 1 1s0

46 H-

40 I-

301-

I i I I I I I I I I0 2 4 G 9 10 12 14 1G 19 20 22

Number oF Digital Harvard Sentences

Figure 2.4 Ranges in the one-third octave band centered at1000 Hz as a function of sentence sample size.

- 34 -

Cn

20

or 12000 samples of data are included in each histogram from

which range is estimated.

- 35 -

j. = 10 sentences_o = 20 sentences

4S-

40 - 0a

<U

20-

10-

0-

- .16 .28 .4 . .83 .8 1.0 1.2S 1.9 2.0 2.9 3.19 4.0

Band Center Frequency (kHz)

Figure 2.5 Ranges across 13 frequency channels comparinginputs of 10 and 20 digital Harvard sentences.

- 36 -

Chapter 3

Results

3.1 Comparison of Distributions in 16 Frequency Channels

The histograms of one-third octave bands of clipped and

unmodified speech were investigated in the following

experiments. First, the 10 digital sentences were clipped

(pre-filter was an allpass filter) and post-filtered by the

multifilter; second, the sentences were pre-filtered with an

1100-Hz highpass filter (analog), clipped, and post-filtered

by the multifilter. The reference condition of speech

filtered only by the multifilter was also investigated.

The amplitude ranges, shown in Figure 3.1, exhibit a

dependence on band center frequency. Part of this dependence

may be attributed to the variation across bands of the

quantity R, the ratio of input bandwidth to the detector's

RC-filter bandwidth. From the analysis presented in Section

2.5.1 (see Figure 2.3), we expect a smaller range in the

channels centered at 160 Hz and 280 Hz, because R is larger

than in the higher frequency channels (see Table I).

Conversely, there should be relatively less smoothing of the

envelope in channel 13 (centered at 4000 Hz) since the

bandwidth of the RC-filter is larger in proportion to the

- - 37 -

I I 0

0

03

C 03

[= F2= F1/C/F2

0= C/F2

0

001

03

0 0

41- -

1-0 0 -

of-0

00

-1-

00

0

-1-

0

0 0

0I I I I I I I I I I I~ I

- .1 .28 .4 .9 .63 .8 1.0 1.26 1.6 2.0 2.9 3.19 4.0


Figure 3.1 Ranges across 13 frequency bands for clipping and.nonclipping conditions. Input is 10 digital Harvard sentences.

F1 = 1100 Hz highpass filter (analog)C = Infinite peak clipper (analog)F2 = Multifilter

- 38 -

45-

ea1

:7)

35-

30-

25-

20-

0

is

10 I-

signal bandwidth (R is smaller) than in the lower-frequency

channels. There will also be more second-harmonic ripple in

channel 13 which will tend to increase the range. (Note that

results for only the first 13 channels are plotted because

the digital signals were lowpass filtered at 4.5 kHz, and the

three highest channels are centered at frequencies above that

cutoff).

The amplitude range for speech filtered by the

multifilter varies at most 15 dB across all frequency

channels and averages 32 dB. The ranges agree reasonably

well with the results of Krieg (1980), who used the present

system, as well as with Dunn and White (1940) who. used a

different measurement procedure (peak pressures in 1/8 second

intervals). The present measurements are slightly smaller

than those reported by DeGennaro et al. (1981), who also

used the same system as used here for measuring bandpass

envelopes. It should be noted that while DeGennaro et al.

(1981) display a 50-dB range for one talker and a single band

(Figure 4 in their paper), their subsequent figures (5, 6,

and 7) for at least two talkers and all sixteen bands show

nearly all ranges to be between 30 and 40 dB.

- 39 -

Clipping reduces the range of speech by about 15 to 20

dB across all frequency bands. The range of bandpass speech

that was pre-filtered with an 1100-Hz highpass filter and

clipped varies 12 dB across frequency bands and averages 15.8

dB. The range of clipped speech with no pre-filtering varies

by 11 dB and averages 9.5 dB across bands. This difference

between highpass pre-filtering and no pre-filtering will be

discussed further in Section 3.3.1.

In subsequent experiments in which a single channel is

analyzed, the band centered at 1000 Hz is used. This band

was chosen as representative because it is approximately in

the middle of the speech frequency range, and also because

the measured ranges in this band are near the respective

averages across bands for the three conditions in Figure 3.1.

3.2 Comparison of Digital and Analog Implementations

3.2.1 Digital vs. Analog Sentences

In order to identify any differences in the measurements

due to digital or analog sentence inputs, histograms made

with tape-recorded analog sentences and digital sentences

were compared. Thirty tape-recorded analog Harvard sentences

and 10 digital Harvard sentences (after D/A conversion and

lowpass filtering at 4.5 kHz) were processed identically with

- 40 -

the analog equipment. Ranges were compared for bandpass

filtered speech and clipped/filtered speech. In the clipping

condition, these sentences were pre-filtered with the 1100-Hz

highpass filter, infinitely peak-clipped, and bandpass

filtered into the sixteen channels by the multifilter. The

analog sentence histograms were "edited" (as described in

Section 2.3.2) to eliminate obvious noise. No editing was

neccessary for the digital sentence histograms. The ranges

are shown in Figure 3.2.

Except for discrepancies at 800 and 2500 Hz, the

difference between the ranges for digital and analog

sentences in the nonclipped condition is less than 5 dB.

(Note that the three bands above 4.5 kHz are included for the

analog sentences.) With the exception of channel 1, a larger

range was measured with analog sentences than with digital

sentences.

In the clipping condition, the difference between the

ranges for digital and analog sentences is 2 to 3 dB. The

smaller discrepancy here may be attributed to the clipper's

"dead-zone" which was adjusted so that there was no output

produced by tape hiss (when the speech was off). Because the

distributions of the unclipped analog sentences are wide (30

to 40 dB), it is more likely for the speech and noise

distributions to overlap in this condition than with clipp.ed

- 41 -

s0

45

33

3- 30

c 2S

20

15

10

0

- .16 .29 .4 .S .G3 .8 1.0 1.2S LB 2.0 2.G3.194.0 S.0 6.3 9.0 -


Figure 3.2 Ranges across 13 or 16 frequency bands comparingdigital and analog Harvard sentences as inputs.

Fl = 1100 Hz highpass filter (analog)C = Infinite peak clipper (analog)F2 = Multifilter

- 42 -

S I I I I I I I I I I I I IAnalog Sentences (edited) OLgftel Sentences

3 = F2 4- = F2- = FI/C/F2 x = FI/C/F2

0

3 0 0 0+ 0

- 4++

- + + 0

0 0

x 0x

06o

I I I I I I I I II| | |

speech. Under clipping, the mixing of noise and speech

distributions is reduced because the range of speech is

reduced.

It appears from these measurements that digital

sentences are superior to analog sentences in terms of

signal-to-noise ratio. However, for the clipping condition,

analog and digital sentences produce similar results.

3.2.2 Digital vs. Analog Processing

In order to investigate the effects of a number of

different pre- and post-filters, digital filters were

designed. Histograms after digital and analog processing

were compared. The 10 digital sentences listed in Appendix B

were used as test inputs. The analog multifilter, 1100 Hz

highpass filter, and clipper were used for analog processing.

Counterpart digital filters were designed for digital

processing. The comparison was made on the one-third octave

distribution centered at 1000 Hz (channel 7).

Digital and analog processing were compared with three

different pre-filters (an allpass filter, an 1100-Hz highpass

filter, and a one-third octave (231 Hz) filter centered at

1000 Hz) before clipping as well as for the reference

condition of no clipping.

- 43 -

Table II

A comparison of ranges in channelcenter frequency=1000 Hz) afterprocessing of 10 digital sentences.

7 (bandwidth=231 Hz,digital and analog

F2 = 1/3 octave (231-Hz bandwidth) filter centered at 1000 HzFl = 1100-Hz highpass pre-filterC = Infinite peak clipper

Digitallyprocessed

Analogprocessed

F2

36

34

C/F2

11

14

Fl/C/F2

13

18

F2/C/F2

6

7

- 44 -

Table II shows that the ranges differ between digital

and analog processing by at most 5 dB. An interesting fact

is that pre-filtering the speech with a one-third octave

filter decreases the range from 36 to 6 dB. This will be

discussed in Section 3.3.1.

3.3 Compression Results

3.3.1 Pre-filtering

The following experiments were conducted to determine

the effect of various pre-filters on the range of clipped

speech. The pre-filters used here were those employed in

various studies of clipped speech. The first is an allpass

filter (i.e. no pre-filtering) as used by Licklider (1948),

for example. The second is an RC highpass filter with cutoff

at 8000 Hz (subsequently referred to as a differentiator) as

used by Licklider and Pollack (1948) and Thomas and

Niederjohn (1968). The third is an 1100-Hz (12 dB/octave)

highpass filter (subsequently referred to as the "optimal"

filter) suggested by Thomas and Niederjohn (1968). A fourth

pre-filter is the set of pre-filters described by Hildebrant

(1982) that was designed to separate the formant regions of

speech. The first-formant filter is a 200 Hz to 900 Hz,

4-pole Butterworth bandpass filter (spanning channels 1

- 45 -

through 6), the second-formant filter is a 900 Hz to 2800 Hz,

10-pole Butterworth bandpass filter (spanning channels 7

through 11), and the third-formant filter covers the range

from 2800 Hz to 6000 Hz (channels 12 and 13) and is also a

10-pole Butterworth filter.

The ranges in the thirteen channels under the various

configurations are shown in Figure 3.3. Clipped speech with

no pre-filtering has the lowest overall range, from 9 dB to

20 dB and averaging 13.7 dB across frequency bands.

Differentiating the sentences before clipping increases the

range relative to no pre-filtering by about 6 dB across all

bands; these ranges vary from 13 dB to 27 dB and average

19.8 dB. Pre-filtering with an optimal filter does not

increase the range quite as much as differentiating; the

range is 9 dB to 22 dB across all bands and averages 15.5 dB.

In the low-frequency bands, speech pre-filtered with the

first formant filter has the smallest range of the four types

of pre-filtering (averaging 12.8 dB in the first 6 channels).

The second and third formant filters produce an effect

similar to the differentiator and optimal filters. The

average ranges in these two formant filter regions are 16.6

dB in channels 7 through 11 and 14.5 dB in channels 12 and

13.

- 46 -

Fl/C/F= F2/C/F= F3/C/F

x = C/F= Fopt/C/F= FdtF/C/F

35-

30-

7 +~25-

20 - \CL/ +-, x x Zk\4

0 7-

XV X x"

10 - /

S_

- .16 .28 .4 .5 .S3 .8 10 1.25 1.6 ZO 2.5 3.15 4.0


Figure 3.3 Effect of different pre-filters on ranges in 13frequency bands.

F1 = 200-900 Hz bandpass digital filterF2 = 900-2800 Hz bandpass digital filterF3 = 2800-6000 Hz bandpass digital filterFopt = 1100 Hz highpass digital filterFdif = 8000 Hz highpass digital filterC = Infinite peak clipperF = Multifilter

- 47 -

The effects of varying pre-filter bandwidth and

rejection slope were investigated in the following

experiments. The sentences were digitally pre-filtered,

clipped, and digitally post-filtered with a one-third octave

bandpass filter centered at 1000 Hz, which had the same

filter-skirt slope as the pre-filter also centered at 1000

Hz. The pre-filter bandwidth varied from 231 Hz to 29 Hz and

the slope of the pre-filter and post-filter skirt were varied

together by changing the Butterworth filter transfer function

from 2-pole to 8-pole. The signal level was detected in

channel 7 (RC-filter bandwidth is 80 Hz) after bypassing the

multifilter.

Plots comparing the ranges of speech pre-filtered with

the narrow-band filters to speech pre-filtered with the

allpass, optimal, and differentiator filters are shown in

Figure 3.4. Except for the case of a 2-pole Butterworth

filter, the range for clipped speech that was pre-filtered

into a one-third octave (231 Hz) band is 5-8 dB smaller than

speech that was pre-filtered with the differentiator or

optimal filters. Generally, as the bandwidth of the

pre-filter decreases, the range increases. Given that the

pre-filter transform has more than 2 poles, varying the

pre-filter slope does not change the ranges appreciably.

- 48 -

2-pole Butterworth Filter

20 -

t4 -

12 -

to

2


=1

4

0,M

X

- 2q G - 11S - 1_7 - 231 - - - O -p - All-

Bandwidth (Hz) P-e-FE~ter


I I I I I I I

14

*

14

14 14

14 14

14

* 2q S - IlS - 173 - 2.31 - - AL-

Bandwidth (Hz) P

--4

20 -

18 -

16-

1-4

2-

11

-2q S2 - 1S - 1-73 - 231 - - - OIFP - ;oat - All -

Bandwidth (Hz) P'eFlIter

S-pole Butterworth Filter

161-

4-1-

2

2q 8 is t73 2.1 - - OFP A

Bandwidth (Hz)

Figure 3.4 Effect of different narrowband pre-filters onranges. Also plotted are the ranges with the differentiator(Diff), the 1100-Hz highpass filter (Fopt), and the allpassfilter (All). Post-filter is 1/3-octave centered at 1000 Hz.Pre-filter and post-filter slopes are the same. For thecases of 'Diff', 'Fopt', and 'All' these slopes apply onlyto the post-filter.

- 49 -

0,

20 -

18 -

12

2

The pre-filters with a 2-pole transfer function will

produce an effect on the low frequency part of the spectrum

similar to the differentiator. Since the speech spectrum is

dominated by low-frequency components, the ranges are similar

(upper left pannel of Figure 3.4). Other discrepancies from

the general trends appear for speech that is pre-filtered

with a 29 Hz, 6-pole Butterworth filter.

Clipping clearly compresses the amplitude range of

speech. Digitally pre-filtering and clipping produces output

ranges of 8 to 16 dB in the 1000-Hz frequency band, a

reduction of more than 20 dB (Figure 3.4). Across all

frequency bands the range is slightly larger, from 10 to 23

dB (Figure 3.3); that is, clipping decreases the range by 15

to 20 dB across all frequency bands.

3.3.2 Post-filtering

The effect of post-filtering the clipped speech was

determined by varying the bandwidths and slopes of the

post-filter. Post-filters were centered at 1000 Hz and the

bandwidth was decreased from one-third octave (231 Hz) to

1/24 octave (29 Hz). The slopes of the post-filter bandpass

skirts were increased by varying the Butterworth filter

transfer function from 2-pole to 8-pole. Variations in

- 50 -

post-filtering were examined with speech that was

pre-filtered with either the allpass filter, the optimal

filter, or the differentiator. With the differentiator as

pre-filter, only the effect of varying postfilter slopes was

examined; the post-filter bandwidth was fixed at one-third

octave.

Figures 3.5 and 3.6 show that for all pre-filters

examined, variations in post-filter bandwidth and skirt

slopes have relatively small effects on compression. Figure

3.5 presents range results with no pre-filter (i.e. an

allpass filter) as a function of post-filter bandwidth and

skirt slopes. The range of the clipped speech varies from

9-15 dB, with the exception of the measurement with the 29 Hz

bandwidth and 8-pole Butterworth filter. Results with

optimal and differentiator pre-filtering, shown in Figure

3.6, exhibit a similar spread of ranges.

3.4 Spectral Modifications

Speech sounds can be classified into two categories

determined by the presence or absence of voicing. This

distinction is important in the analysis of the spectra of

clipped speech sounds. The spectra of the two types of

speech sounds are analyzed in the following sections.

- 51 -

2-pole Butterworth post-filtercenter frequency=OOOHz

is-

1.-

12

C

4-

2-

4-pole Butterworth post-filter

center frequency=OOOHzI I I I I T

I I I I I

*

*

* *

z 2q sa ils 174 232

Post-ftIter Bandwidth (Hz)

6-pole Butterworth post-filtercenter frequency-COOHz

Is-

1.4

2

*

*

I I

-I

a 29 sa Lis 174 232

Post-Fi[ter Bandwidth (Hz)

3-pole Butterworth post-filtercenter frequency=OOHz

te - I1.8-

1.4H

2VY

_2

S 29 1.1U5 174 232

Post-Filter Bandwidth (Hz)

I I I I I I

*

IUs 174 232

Post-rtiter Bandwidth (Hz)

2q G2

Figure 3.5 Effect of varying the post-filter bandwidth andslopes. Pre-filter is an allpass filter.

- 52 -

18-

16-

Is-14 -

12 -

a--

4-

5-12

*

I I

-

I I

2-pole Butterworth post-filtercenter frequency=l00oHz

I I I II8-

Is-t4 -

12 -

t2-

8-

6-

II - I I I I I L3 2q sa lls 174 232

Post-filter Bandwidth (Hz)

6-cole Butterworth post-filtercenter frequency=1000Hz

7 71 1* i I I I I I i I]

** 0*

*A

2 Us :74 232

Post-Filter Bandwidth (Hz)

4-pole Butterworth post-filter

center frequency=100OHz

1 1 6 1 1. I I II:8

ts

14

:2

:4

12

0

IS -

14 -

12

Z 0~

caH

0 20 S a

' *0

*

a 2q a :15 L74 232

Post-Flter Bandwidth (Hz)

3-pole Butterworth post-filtercenter frequency=lOOHz

I t I I

0

,-Is 17.

Post-4'iter Bandwidth (Hz)

232

Figure 3.6 Effect of varying the post-filter bandwidth and

slopes. Pre-filter is an optimal filter (*) ordifferentiator (o).

- 53 -

0

10 -'4

8 0

2 t

I

H-4

-

222

3.4.1 Spectra of Clipped Unvoiced Speech

It has been shown (Papoulis, 1965; Fawe, 1966) that for

gaussian noise as input, the output power spectrum Gy(w)

after infinite clipping can by determined from the input

power spectrum Gx(w) by the "arcsin law" relating their

respective autocorrelation functions, Ry(t) and Rx(t). Thus,

Ry(t)/Ry(O) = 2/TY arcsinLRx(t)/Rx(O)]

This relation was verified by measuring the spectra of

various noises before and after clipping. The first set of

noises consisted of synthesized tokens of the consonants

/f,sh,s/ and the burst portions of /p,t,k/. Clipping was

done by the analog clipper and spectra were analyzed with the

spectrum analyzer. The power spectrum of each synthetic

consonant was also calculated from the filter transfer

function from which it was generated, and the arcsin law was

applied to these spectra using the FFT.

Figures 3.7 through 3.12 show both the measured spectra

and the spectra calculated from the arcsin law. (Since the

noise bursts were generated with a sampling rate of 10 kHz,

only a 5 kHz range is shown). From the figures it is clear

that the clipped spectra follow the arcsin law.

- 54 -

1000

0

2000 3000

Frequency (Hz)

2000 3000FREQUENCY (HZ)

4000

4000

5000

5000

Figure 3.7 Spectra of the synthetic consonant noise burst, /f/,measured before and after clipping. In the top plot, thesolid line is the ideal input spectrum and the dashed lineis the spectrum calculated from the arcsin law. In the bottomplot, the lower trace is the measured input spectrum and theupper trace is the measured clipped spectrum.

- 55 -

Spectrum of /F/

- clipped

I I I

-20

~i%

a

(nCq,40

0

L4-

4)a-

-60

-70

-80

0 100

-10

-20

-30

U, -40

-50CL.

-60

-70

-80 0

1%

clipp-

0

lows 4000

Frequency (Hz)

1000 2000 3000FREQUENCY (HZ)

4000 5000

Figure 3.8 Spectra of the synthetic consonant noise burst, /sh/.Plots are the same as in Figure 3.7.

- 56 -

Spectrum of /SH/

-' '-,. -clipped

--20

-3

4

a

C

C4~i

+A

-so

0

-10

-20

-30

U -50

-60

-70

-80

- clipped

II I -I

0

I I

-

-

-

0

-10

C

C140

C-

V)

Spectrum of /S/. .............. _ T

400

Frequency (Hz)

0

-10

-20

2 -30

Z-4o

S-50

-60

-70

-801000 2000 3000

FREQUENCY (Hz)41000 5000

Figure 3.9 Spectra of the synthetic consonant noise burst, /s/.Plots are the same as in Figure 3.7.

- 57 -

clipped

M 4 12 - ---

- clipped

e

0

000 3000

Frequency (Hz)

2000 3000FREQUENCY (Hz)

40M0

400

5000

Figure 3.10 Spectra of the syntheticPlots are the same as in Figure 3

consonant.7.

noise burst, p/.

- 58

-101

Spectrum of /P/

-T

- clipped

- --------a4J

in

0

U3-.60

0

-10

-20

-30

~-40

CL1

-60

-70

-80

0

0

100

1000

- clipped

-j

-

Spectrum of /T/

1000

1000

20Me 300

Frequency (Hz)

2000 3000FREQUENCY (4Z)

5000

5000

Figure 3.11 Spectra of the synthetic consonant noise burst, /t/.

Plots are the same as in Figure 3.7.

- 59 -

-101--

1-o

C

0

a-(I)

-clipped

I I II _

-60I-

-80

e

0

-10

-20

clipped

'0-30

I-40-j

-50CA)

-60

-70

-80-.0)

I I I I

400r)

3-1Ca

aC

usecn

-88

Spectrum of /K/

Z0

210

41)

SO

-70

-801000

200 .300

Frequency (Hz)

FREQUENCY (HZ)30002000

4000

4000

5000

500

Figure 3.12 Spectra of the synthetic consonant noise burst, /k/.Plots are the same as in Figure 3.7.

- 60 -

0

10 --

- - - ' - clipped

---- --- --- ----- --7

LL.

I-

U~)

clipped

I I

I I

-60

-/0 -

Ivi

In order to look at relative changes in the clipped

spectrum after varying spectral frequencies and amplitudes,

random noise was filtered with the 1925 General Radio 1/3

octave multifilter. The output was clipped and analyzed with

the H.P. analyzer. Power spectra were also calculated by

varying the parameters of the filter transfer function of the

synthesized consonants and the arcsin law was applied.

(Thus, the calculation was performed using a filter transfer

function different from the one used empirically. This

difference can be seen between the unclipped calculated and

measured spectra in the figures.) Spectra with a single peak

at 315 Hz and 1250 Hz are shown in Figures 3.13 and 3.14.

Spectra with two peaks are shown in Figures 3.15 through

3.19.

Several observations can be made about the spectral

modifications caused by clipping. First, clipping tends to

fill in "valleys" in the spectrum, much as an increased noise

floor would. Second, as expected from the Fourier series for

a square wave, harmonics appear at odd multiples of the

spectral peaks. This is especially noticeable in Figures

3.13 and 3.14 which display a single peak. Third, the

amplitude relation between two spectral peaks after clipping,

- 61 -

I I

I I

---

E I I1~ LE0 2QSS

a-50a-3

o

-7

as

25Q

Frequency (Hz)

-10 r-

-60

-70

0 500 100 FREQueNCY ([A500 2000 2500

Figure 3.13 Spectra of filtered narrowband noise centered at315 Hz measured before and after clipping. The top plot iscalculated from the arcsin law using a different transferfunction from that measured for the input noise. The bottomplot is measured with the spectrum analyzer. The format ofthis plot is the same as in Figures 3.7-3.12.

- 62 -

a

-to

IT

0

-20

-30

-80

- d

'

I Ito"M

Z4

9-50ULU156,

W to" 2U06 3UU 4WQ S" Sees 7095 aBG Iees teee

Frequency (Hz)

-10

-20

-30

S-50

-70

-80

Figure 3.14 As in Figure 3.13 but with input noise centeredat 1250 Hz.

- 63 -

a

CO

0

,C-

U -500Q-

If)

-0SI

-7,

-00

_ I I I I I

- t1--

- clipped

I I I I I I I I

-I clipped --

.................-

I I i

IW 200 =7 = m 10000

- 1 1 1 1 1 _

-to

-2S--

200 _258

Frequency (Hz)

-10

-20

-30

-50

-60

-70 1

500FREQUENCY (RO

Figure 3.15 As in Figure 3.13 but withcentered at 250 Hz and 1250 Hz.

2OW 2UO

input noise peaks

- 64

C0-4.

0

aO

-8sf--

-- - - clipped

L.-0S

a

0--ippe

-80

a

0

-10-

-20-

30

- clipped

a.

(n70-

-M -

-80

0 low 2000 3000 4000 5000Frequency (Hz)

-20

-30

-clipped

-60

-70

800FREQUEICY (HZ

Figure 3.16 As in Figure 3.13 but with input noise peakscentered at 250 Hz and 1250 Hz.

- 65 -

200 300

Frequency (Hz)

30 C

FREQiENCy (Hz)

Figure 3.17centered

As in Figure 3.13 but with input noise peaksat 315 Hz and 1250 Hz.

- 66 -

-Wo-

-201

Cq 40

- .- clipped

H- -------- -------

I.-80

0 I000 4000

-10 1-

OT I I I

- clipped

-I

-20

-30

1-50

-80

5000

50YU LWJ LIAAJ

-1 I

I I I

-70

3 i I

-20-

-30

c --- ,- - -clipped140 - -

a.L

I)

-60

-70-

.. -.

0 1000 2008 3000 4000 SamFrequency (Hz)

0

-10

-20

-30

t-50

-70

-80omu 32(W) 3m)

FRE'UENCY (Hz)


- 67 -

-10

-20

'30

Cq 401

a.C-U)

-60

4090 6000

Frequency (Hz)

0

-10 -

-20 -

-30

9-50

-70

-80 10 1 AJ 2W FREl UNY ( OZ

limo


- 68 -

I-

--- -~clipped

iT-I-

- clipped

i I

tese

when neither is obscurred by distortion components, is nearly

the same as the input (Figures 3.16, 3.18, 3.19). Low-level

peaks can be obscurred by the harmonics of higher-level

peaks, especially if the frequency of the low-level peak is

greater than the frequency of the higher-level peak (Figures

3.15 vs. 3.16 and 3.17 vs. 3.19). The low-level peak can

also be enhanced by the harmonics of higher-level peaks

(Figure 3.15).

3.4.2 Phase Dependence of Clipped Voiced Sounds

For the case of voiced sounds, which consist

components, it is expected that the arcsin law

applicable in general, although it may possibly

useful approximation. In this section the

predicting the clipped spectrum of harmonic

demonstrated.

of harmonic

will not be

provide a

problem in

sounds is

The problem can be seen by constructing a signal that is

the sum of harmonically related sine waves and varying the

phases of the components (Figure 3.20). The three frequency

components were chosen in a one-third octave band with center

- 69 -

frequency of 1000 Hz.

Any prediction of the clipped spectrum based solely on

the input auto-correlation function would be independent of

the input phases. It is clear that clipped spectra are

dependent on the phases of the individual frequency

components in the input (panels A-C of Figure 3.21). The

prediction from the arcsin law is shown is Panel D of Figure

3.21. It will be noted that the gross spectral shape is

rather consistent across the three clipped spectra and

follows the shape predicted by the arcsin law. However, the

validity of the arcsin law approximation in other harmonic

cases is unknown. Given that fine details of the spectrum

are lost as a result of "critical-band" filtering by the ear,

an approximation as good as that shown in Figure 3.21 would

probably be adequate. Determining the goodness of the arcsin

approximation for harmonic signals is a worthwhile but

lengthy task which was not pursued here because of time

constraints.

If the tones in the signal are not harmonically related,

phase shifts of components can be viewed as a shift in the

time orgin, and there should be no effect of phase shifts on

the clipped spectrum. However, this case is not relevant to

speech signals, since the components of voiced sounds are

harmonically related.

- 70 -

VALUES1i

8

BEG = .006400 SEC MID = .01640 SEC ENO = .02640 SEC

Figure 3.20 Differing time-waves caused by phase shifting.The signal was generated by summing three harmonicallyrelated tones with frequencies of 900, 1000, and 1100 Hz,and relative amplitudes of 0, -2, and -5 dB respectively.The three tones (900, 1000, and 1100 Hz) in Panel A havezero phase. The phases of the three tones were respectively0, 180, and 90 degrees in Panel B. The phases were 90,270, and 180 degrees in Panel C.

- 71 -

0

0

0 --

0

0

0

0

0 --

| ||

0

0 VV

r VV

(A)

-8

-16VALUES

16(

(B)

-160VALUES

iC

(C)

-8

-1G6

(A)


4000

I Y I


(C)

4000

(B)(OB)T 10 0

f A

I Will I )I 1AIIHIH I If 1

Jill HIA 1 11991111 i v

_ _ A7-..


(08)1 100 too

qG

80

70

90

so

so

40C40

U

30 V) 30

20 20

to

0 t000 2000 3000

4000

4000

Frequency (Hz)

(D)

Figure 3.21 The phase dependence of harmonically related tones.Panels A, B, and C show the clipped. magnitude spectra withthe three signals in Figure 3.20 as input. Panel D showsthe prediction of the arcsin law applied to the input.

- 72

(DB)-1 100

40

s0

70

SO

40

30

20

I A111111 11

MIR h 1.1 1111 1 li wMITI, I r )j 11 U M 11.11111W HIMIll H

41 LU.A. 11111MM 11

0

I-4

20

to

0

3.4.3 Spectra of Clipped Vowels

In order to observe the effects of clipping on vowel

spectra, measurements were made with steady-state synthetic

vowels. The spectra of the vowels were evaluated with the

ILS spectral analysis package.

The vowels were synthesized using a model in which the

glottal source is passed through a filter designed to model

the vocal tract (Rabiner and Schafer, 1978). Parameters of

the model were varied to change the frequencies and

bandwidths of the formants. In the present implementation,

an impulse was used as a glottal pulse waveform in order to

remove the negative spectral tilt associated with the more

natural waveform. It is easier to evaluate the clipped

spectra after eliminating this low-frequency emphasis.

The vowels were synthesized with a fundamental frequency

of 125 Hz. Thus, the spectrum of each vowel is a sequence of

harmonics spaced 125 Hz apart with spectral-envelope peaks at

specified formant frequencies.

The spectra of clipped synthetic vowels are presented in

Figures 3.22 through 3.39. There are some major differences

between the clipped spectra of vowels and noise. A clipped

vowel spectrum is a sequence of harmonics with the same

- 73 -

inter-component spacing of 125 Hz as the unmodified spectrum,

but with components extending to higher frequencies. As with

noise, harmonics appear at odd multiples of the spectral

peaks. However, there is no counterpart to the "noise floor"

seen with clipped noise.

Clipped harmonics of the fundamental frequency

components can emphasize or de-emphasize formant peaks. For

example, in Figures 3.22 through 3.25, the formant is at 1000

Hz, a multiple of the pitch period, and the clipped spectrum

has peaks only at odd harmonics of this formant. In another

example where the formant of the vowel is higher in frequency

and not exactly at a harmonic of the pitch period (Figures

3.26-3.29, the clipped spectrum contains other spectral peaks

at low frequencies. However, the original formant is

preserved as an absolute maximum in the spectral envelope.

The extent to which the formant is retained in the clipped

spectrum, even when the input spectr'al peak is very shallow,

is quite remarkable (Figures 3.25 and 3.29).

The clipped spectra of vowels with two formants have

peaks at not only odd multiples of the formants but at other

frequencies as well (Figure 3.30). Again, clipping seems to

emphasize individual formants that are difficult to see in

the unmodified spectrum (figure 3.31).

- 74 -

In Figures 3.32-3.34, the two formants (at 1000 Hz and

3000 Hz) are not only harmonically related to the pitch, but

the higher frequency formant is an odd multiple of the lower

frequency formant. After clipping, narrow spectral peaks

appear at only odd harmonics of the formants. Even as the

bandwidths of the formants are varied, no new peaks appear in

the spectrum (as seen in Figures 3.30 and 3.31), and the

spectrum remains unchanged except for the spectral height of

the peaks.

When two formants are not harmonics of the fundamental

pitch frequency, the formants are preserved but the clipped

spectrum contains other spectral peaks (Figures 3.35-3.39).

Similarly to noise, lower-amplitude formants in the spectrum

are obscurred after clipping. (Figure 3.37).

- 75 -

(OB)1

0 1000 2000 3000 4000 500FREQUENCY (HZ)

'A(

0 1000 2000 3000FREQUENCY (HZ)

4000 S000

Figure 3.22before

Spectra of a synthetic steady-state vowel measured(top plot) and after (bottom plot) clipping. The

formant frequency is 1000 Hz and bandwidth is 50 Hz.

- 76 -

00

70

6o

30

20

10

08)100

60

70

60

20

10

0



Figure 3.23 As in Figure 3.22 with a 1000 Hz formant frequency

and a 200 Hz bandwidth.

- 77

(DB)100

A h I I I A

11111 H A V IIIIIIIIII IIIIIII III I)IIIII IIIIIIIII 111IM 11 , IIH IIIII

I IIIII 11 111111 11111111 11 H IM

111111111MI IIII

uvd Jj julifill" tin m 4fW 30

20

10

05000

(OB)100

80

70

so

20

10

0

0

0

4000

40001000 5000

80

70

so

I A


1000

4000


5000

I HitIlHil

11111 1 1 H I 111111 11 1111111 11 IN I I I 111 1 11 1 1 11111111 1 1 1 1 IN 1 11 1 11111

1111 1 111 H ill 11 1 11 111 1111 1 11111

1111 11H W -44M III III1IlUV ' N I W II;

s0

30

20

10

0

(0B)

5000

100

70

so

30

20

to

0

Figure 3.24 As in Figure 3.22 with a 1000 Hz formant frequency

and a 500 Hz bandwidth.

- 78

0

0

100

90

80

11111 .1111 H I IJ 11 H IH A III . III JillA il 1 11 111 11 1111 1 1 1 1 1 11 1 11 11 1 1111 ji I I I Ill 1 111111 H I

1 11 1 1 111, 111 1 1 1111111 H I II I M 1111 1 111111111111H I M 11111H 11111111

111111 1 HIHIIN IIHM

I U I I I- v


1000 2000 3000FREQUENCY (HZ)

4000

(MB)100

W 111 IHII F- U A ' I I !111

I )I 11 11 11 H I I III M

1]~~ ~~~ 111- 1IIII I

30

20

10

0

(OB)- 100

80

70

60

40

30

20

10

00100

Figure 3.25 As in Figure 3.22 with a 1000 Hz formant frequencyand a 1000 Hz bandwidth.

- 79

0

* * li.1

j 11

70

s0

1000

1000

2000FREQUENCY

3000(HZ)


4000

4000

.(OB)

IIIII II. I i I f 1 fva

. I I I I l m

30

20

0

(DB)1100

SO

70

60

20

10

0EQO


- 80 -

0

0

I I________ 1. ,I I. - I__

f ______________

-

1 - - ________

-1 - - - @- - - -- - .. -1, 11 / -- , - - - ------------ -z A I

80

70

s0

I

1 0

I

IliI________1_11



40

20

to

0


- 81 -

DB)100

90

80

70

60

40

30

to

04000 5000

III III 1 'lH 1 if iM H

(OB)1 100

60

70

60

0 1.000 S000

II

01 20il30l


HIM IIIHI I M HIM JI 11111 I I 1 11 j

111111 11111 HIM I 1 11 ji il 11 4111 1 H M I( 11 I 111 1 MW HIC HIIIIIII11 . 1111H II .114 111 11 1,111 11111H IIII. 1111111

( H ill 1 11 11 11 U ll I' ll 1 11 11111 1 111 111Ij I I IM I 11 1 '1 H IP 11 1 lj 11 1 IN I I III

I o

1000 2000 3000FREQUENCY (HZ)

Figure 3.28 As in Figure 3.22 with aand a 500 Hz bandwidth.

2700 Hz formant frequency

- 82

80

70

60

40

30

20

04000 Eooo

(OB)100

80

70

60

SO

40

30

20

10

00 4000 5000

0

0

0

1000

1000

2000FREQUENCY

2000

3000(HZ)

4000

3000FREQUENCY (HZ)

5000

III HIM HINIII IIM I III MI9 1 11 11 H 111 11 1111-11 111 11111 1 1 11111 1 111 Jill I I III Jill

1 111 1 1 11 11 Jill

0 UA-1.1 1) [1 Ij 111111( U l

III HIM30

20

t0

0

(DB)t00

80

70

60

30

20

to

05200 -


- 83

s0

so

I E A

0

0



4000

4000

100

I I I l

Lai

30

10

0

DB)100

70

60

50

30

20

10

05000

Figure 3.30 As in Figure 3.22. First formant frequency is 1000 Hzwith 50 Hz bandwidth. Second formant frequency is 2000 Hz with50 Hz bandwidth.

- 84 -

s0

70

60

ilt(

A

T. IN III I I I 111 J I I fl 11 11 11111 11

11 1 11 IN 11 J llu l I 1 1111111111. 11111 11 1 11P III 1 11

11 HIM I I I], r4 i 4 1111 I 1111i 11 1)11 1 11111 11,11MIll

Ili Ilim , 1 1 11 1 111 1) 111111 1 .. 11 1111

.1. HIP P, I I III HIM 111111

I I JIM Mill Hil I

0 1000 2000 3000 4000 SooFREQUENCY (HZ)

IN I 111 L

II II II I t

IIIIIIIIIII I H 111111 I I I i ill 111111 1 11 H I h ill 11

11111 11 ll 11.1 U1111 i I IIIIH I 11M 11 M ill 1. H il

j 1 1 1 ' Jill i ll 1111I I 1 1111111 1 i llHil 0 Hill L ill


4000

100

60

70

30

10

0

100

70

60

E000

10

0

Figure 3.31with 500

As in Figure 3.22. First formant frequency is 1000 HzHz bandwidth. Second formant frequency is 2000 Hz with

500 Hz bandwidth.

- 85 -

0

A1

0 1000 2000 3000 4000 5000FREQUENCY (HZ)

___(

1000 2000 3000FREQUENCY (HZ)

4000

00

so

70

60

30

20

10

0

OB)100

630

70

60

30

20

0S000

Figure 3.32with 50

As in Figure 3.22. First formant frequency is 1000 HzHz bandwidth. Sencond formant frequency is 3000 Hz with

500 Hz bandwidth.

- 86 -

1000

1000

t

\ 0sL.Jj

100

I H IIII I IH I I

0Sao02000 3000

FREQUENCY (HZ)


4000

(DB)1 100

s0

70

s0

40

S000

20

10

0

Figure 3.33 As in Figure 3.22. First formant frequency is 1000 Hz

with 100 Hz bandwidth. Second formant frequency is 3000 Hz with

100 Hz bandwidth.- 87 -

0

0

30

20

to

--------- ,

.~ .. III 11 11 111M INIIII

a 1000 2000 3000 4000 S00FREQUENCY (HZ)

- -

100 2000 3000FREQUENCY (HZ)

4000

0

Figure 3.34with 500

As in Figure 3.22. First formant frequency is 1000 HzHz bandwidth. Second formant frequency is 3000 Hz with

50 Hz bandwidth.- 88 -

(Ut:$)1 100

90

70

60

30

10

0

(OB)100

80

70

60

40

30

20

10

00

__A I I___ ___

0 1000 2000 3000 4000 SOOFREQUENCY (HZ)

IH I~ II

1000 2000 3000FREQUENCY (HZ)

4000 5000

re 3.35with 5050 Hz ba

As in Figure 3.22. First formant frequency is 1000 HzHz bandwidth. Second formant frequency is 2700 Hz withndwidth.

- 89

100

s0

40

30

20

10

0

OB)100

70

30

50

30

20

to

00

Figu


I A

jt U

H IM I 11 Ill I I W , J 11 J I I I I I I I I

111111 IIJ 11,11 d I I I I I I J M illi I ji H II

11111111111 IN l I Ill H ill H I L IIIIIII 11111111 M ill


Figure 3.36 Aswith 500 Hzwith 500 Hz

in Figure 3.22. First formant frequency is 1000 Hz

bandwidth. Second formant frequency is 2700 Hz-

bandwidth.

- 90 -

\ 1- -

100

AI I A

11111 11 1 H ill M ill

4 11111 11 11111 Jill

60

20

10

50000

0

1000

1000

4000

4000

OB)100

80

70

s0

20

10

0E000

80



Figure 3.37with 50

As in Figure 3.22. First formant frequencyHz bandwidth. Sencond formant frequency is

is 600 Hz2300 Hz

with 500 Hz bandwidth.

- 91

0

1 100

I- J1111 H 11 11 IN I I j I I 1 11.1

I I I U 111 11 (1 IRM O- 11 1 11111 11 1 111111 1111H

I 1 1 114 1 111 1 11111111 1 1 Iffl - t 1 111111 111 1111A I 1 9 111 1.1 U 1111 ti 11 1i 11 1 1 1111111111 1 1

p I P I I I l' I

30

20

10

0S000

(B)100

70

630

30

20

10

05000

4000

4000

- - - I 4 1'i

0 1000

80

70

6o

2000FREQUENCY

3000(HZ)

4000

(

I I HIM 1 A I

... 1 M l I 1 1 l 1 1 1 A l 1 1 1 1 1 : 1


4000

Figure 3.38with 100with 100

As in Figure 3.22. First formantHzHz

frequency is 600 Hzbandwidth. Second formant frequency is 2300 Hzbandwidth.

- 92

0

I W 11 IIM 111

911 1111 - 111 5

J1 i

IIIHIIII I MU HIM IPH 111W40101 HIM 11111 IIIH IIII...11

s0

so

.30

10

0

OB)100

s0

30

20

10

00

80

500



Figure 3.39with 500

As in Figure 3.22. FIrst formant frequency is 600 HzHz bandwidth. Second formant frequency is 2300 Hz

with 50 Hz bandwidth.

- 93

I I AI(

11

4000

DB)100

s0

30

20

10

(DB)100

60

so

30

20

10

0

0

(0 4000 5000

H ill I 1 1111 11 11 1 1

IJ 111 11 11 H 11 U I Ill

i IN A wiftill 1 111'

I

Chapter 4

Discussion

The present study was designed to assess infinite peak

clipping as a means of reducing the amplitude variations in

speech levels. The study was based on the following two

assumptions about hearing and the perception of speech. The

first is that the ear analyzes sound into "critical-band"

filters, the outputs of which are relevant to the perception

of loudness. The second assumption is that the primary cues

to intelligibility lie in the spectral (magnitude) patterns

of speech. The investigation was designed to answer

questions based on these assumptions: what are the effects

of post-filtering speech after infinite-peak clipping, and

how can spectral distortion be minimized? The goal was to

determine if an amplitude compression system based on

infinite peak clipping could be designed which would give

maximum compression and minimum spectral distortion, thereby

maximizing intelligibility for individuals with sensorineural

hearing loss.

- 94 -

4.1 Compression Results

The general result from the study of compression was the

insensitivity of the amplitude range of clipped speech to the

characteristics of the filters that preceded and followed the

clipper. Regardless of which pre-filters and post-filters

were used, clipping reduced the range of speech by at least

15 to 20 dB. Output ranges were on the order of 10 to 20 dB.

The general trends of the effects of pre- and

post-filtering clipped speech can be understood qualitatively

with concepts discussed by DeGennaro et al. (1981). The

fluctuations in a bandpass speech envelope can be modeled as

consisting of slow and rapid components. The rapid

components result from the narrowband filtering (the rate of

fluctuations are on the order of the bandwidth) and account

for 10-15 dB of the measured range. The slow component (with

fluctuations on the order of a few per second) results from

variation in the overall intensity and in spectral

distributions across different speech sounds. The slow

component accounts for about 30 dB of the range. To a first

approximation, the effect of infinite peak clipping on the

range measured after narrow post-filtering is to remove the

slow component. An example is a third-octave band of noise

that is slowly modulated over a large decibel range, then

clipped and post-filtered by a third-octave filter with the

- 95 -

same center frequency. The zero-crossing pattern is

unaffected by the slow modulation, so the clipped noise will

exhibit the same envelope variations as if there were no slow

modulations.

The preceding analysis applies to the case where the

post-filter is in the spectral region that is dominant at the

input to the clipper. If this dominant region changes over

time, variations in the clipped spectral distributions will

contribute to the measured output range.

This discussion aids in interpreting the finding that

with no pre-filtering the 16 band ranges were slightly

smaller than with a highpass pre-filter (the optimal filter

or differentiator), or a bandpass pre-filter (formant

filters). With no pre-filtering, the input speech waveform

(and hence the sequence of zero-crossings) is dominated by

the low-frequency first formant. The harmonics in the higher

frequency regions will be relatively constant across the

various speech sounds after clipping. If the first-formant

is reduced in amplitude by highpass filtering, the input

zero-crossing sequence will exhibit greater variability,

resulting in greater variation in the clipped spectrum across

the different speech sounds.

- 96 -

It was also observed that ranges were reduced further if

the pre-filter bandwidth was narrowed to one-third octave or

less. This effect can be understood by considering the case

of a very small pre-filter bandwidth. The input to the

clipper will approximate a tone with slowly varying amplitude

and frequency modulation. Since the clipped signal will be

frequency modulated with constant amplitude, the output of

the third-octave filter will also be a tone with slow

frequency modulation and an amplitude modulation resulting

from FM-to-AM conversion by the post-filter. Thus, the

narrower the input band, the less frequency modulation and

consequently the less amplitude modulation there will be

after post-filtering.

The finding that the clipped range is not strongly

dependent on the bandwidth or slope of the post-filter

inplies that the exact characteristics of the ear's

critical-band filters need not be known in order to make

meaningful measurements of range.

It is interesting to speculate on whether there is a

minimimum range of speech levels that can be achieved after

(auditory) post-filtering by any compression system. When

one considers that the envelope ranges of narrowband noise

and harmonic complexes are on the order of 10-15 dB, one must

wonder how the variations in the speech envelope could be

- 97 -

further reduced. It would seem that further reduction of

these envelope variations could be achieved only in special

cases.

4.2 Comparison with AGC Systems

Krieg (1980) studied multiband amplitude compression

achieved by independent AGC in each band. The input speech

was CVC lists spoken by a male talker, and the compression

ratios in the 16 frequency bands were 3 (with a deviation of

approximately 0.5 in channels 11 and 16). The input

third-octave ranges were 25 to 42 dB, averaging 36.8 dB

across all 16 channels (Krieg, 1980, Figure 9). After

compression, the ranges were 14 to 38 dB, averaging 24.3 dB

across all frequency bands. The actual amount of compression

was approximately 12 dB (Krieg, 1980, Figures 11 and 15) and

the effective compression ratio was approximately 1.5.

Infinite peak clipping, in the present study, reduced

the range of speech by approximately 20 dB, 8 dB more than

the AGC system used by Krieg (1980). Assuming that greater

intelligibility results from presenting a greater range of

speech within the residual hearing area of the listener, then

clipping should allow superior intelligibility results when

the listener's dynamic range is less than about 20 dB.

- 98 -

4.3 Spectral Modifications

The spectral pattern of clipped unvoiced sounds can be

predicted from the arcsin law (Section 3.4). After clipping,

spectral peaks will appear at odd multiples of the original

spectral peaks. Clipping spreads the spectrum, filling in

the low-level region much as a noise floor would. Clipped

spectra with two formants have harmonic peaks at not only odd

multiples of the formants but at combination frequencies.

The lower-frequency formant, when stronger, tends to obscure

the higher-frequency formant.

The clipped spectra of voiced sounds are dependent on

the phases of the input frequency components, so the arcsin

law cannot be aplied. Except for the fact that the clipped

harmonic spectrum contains frequency components with the same

fundamental spacing as the unmodified spectrum (i.e. there

is no counterpart to the "noise floor" of unvoiced sounds),

the general results are similar with voiced and unvoiced

sounds. Spectral peaks which appear at odd multiples of the

formants and combination frequencies after clipping will

obscure the low-level formants.

A remarkable aspect of clipping vowels is the degree to

which spectral peaks can be enhanced. This finding suggests

that infinite peak clipping could be exploited in schemes for

- 99 -

formant detection and tracking.

4.4 Implications for Hearing Aid Design

As stated above, the present study sought to determine

the effects of infinite peak clipping in order to minimize

both output ranges in critical-band-like filters and spectral

distortions. It was found that output ranges are relatively

insensitive to post-filtering and slightly dependent on

pre-filtering. If minimizing output ranges were the only

criterion, there would be either no pre-filtering or

narrowband pre-filtering. However, pre-filters like Thomas

and Niederjohn's (1968) 1100-Hz highpass filter and

Hildebrant's (1982) formant filters produced ranges only a

few dB larger than no pre-filter and are known to produce

greater intelligibility. The differentiator pre-filter gave

the largest ranges of the pre-filters investigated.

The present results on spectral distortions will help to

determine a choice of pre-filter. Even though the narrowband

filters produce the greatest amount of compression, the

relative spectral qualities among frequency bands would be

lost in such a system. For multiple-formant sounds it would

seem that a pre-filtering scheme that separates the formants,

like Hildebrant's (1982), would be beneficial for preserving

the formants (without losing the relationship between bands)

- 100 -

and avoiding interactions in the clipper. However, for

single-peak sounds, this scheme may result in spurious output

peaks, or a less well-defined peak than in the input. In

this respect, a single-channel clipper with highpass

pre-filtering might prove superior.

Of course, the particular loss of the individual should

also be considered. For example if the individual depends

only on first formant cues, and has virtually no second

formant hearing, then pre-filtering with an optimal filter

will hinder intelligibility by decreasing the saliency of the

cues.

4.5 Recomendations for Future Work

Future work should test intelligibiltiy with a 3-band

clipper and optimal filter/clipper systems on

hearing-impaired listeners with small dynamic ranges.

Ideally, results could be compared with performance on other

amplitude compression systems using the same subjects.

Examination of the types of speech errors for both

hearing-impaired and normal subjects should help to determine

the next course of action.

- 101 -

REFERENCES

Braida, L.D., et al. (1979). "Hearing Aids--A review ofPast Research on Linear Amplification, AmplitudeCompression, and Frequency Lowering," ASHA Monograph No. 19

Coln, M.C.W. (1979). "A Computer Controlled MultibandAmplitude Compressor," Master's Thesis, Mass. Inst. Tech.,Cambridge, MA.

De Gennaro, S., Braida, L.D., Durlach, N.I. (1981). "AStatictical Analysis of Third-Octave Speech AmplitudeDistributions," Paper presented at the 101st meeting of theAccoustical Society of America, Ottawa, Ontario, Canada

De Gennaro, S. Krieg, K.R., Braida L.D., Durlach, N.I.(1981). "Third-Octave Analysis of Multichannel AmplitudeCompressed Speech," Proceedings of IEEE Inter. Conf. onAcoust., Speech and Signal Processing, Atlanta, Georgia

De Gennaro, S. (1982). "An Analytical Study of SyllabicCompression for Severely Impaired Listeners," PhD Thesis,Mass. Inst. Tech., Cambridge, MA.

Dunn, H.K., White, S.D. (1940). "Statistical Measurementson Conversational Speech," JASA 11, pp278-288

Fawe, A.L. (1966). "Interpretation of Infinitely ClippedSpeech Properties," IEEE Trans. on Audio and.Electroacoustics 14, ppl78-183

Green, D.M., Swets, J.A. (1966). Signal Detection Theoryand Psychophysics, (John Wiley and Sons, New York)

Guidarelli, G. (1981). "Intelligibility and NaturalnessImprovement of Infinitely Clipped Speech," Acoustics Letters4, No.7

Hildebrant, E.M. (1982). "An Electronic Device to Reducethe Dynamic Range of Speech," Bachelor's Thesis, Mass.Inst. of Tech., Cambridge, MA

IEEE (1969). "Recommended practice for speech qualitymeasurements." IEEE Trans. Audio Electroacoust. 17,pp225-246

- 102 -

Kac, M., Siegert, A.J.F. (1947). "On the Theory of Noisein Radio Receivers with Square Law Detectors," J. AppliedPhysics 18, pp383-397

Krieg, K.R. (1980). "Third Octave Band Level Distributionsof Amplitude Compressed Speech," Bachelor's Thesis, Mass.Inst. of Tech., Cambridge, MA.

Licklider, J.C.R. (1946). "Effects of Amplitude Distortionupon the Intelligibility of Speech," J. Acoust. Soc. Am.18, pp429-434

Licklider, J.C.R., Bindra, D., Pollack, I. (1948). "TheIntelligibility of Rectangular Speech-Waves," Am. J.Psychology 61, ppl-20

Licklider, J.C.R., Pollack, I. (1948). "Effects ofDifferentiation, Integration, and Infinite Peak Clippingupon the Intelligibility of Speech," JASA 20, pp42-51

Papoulis, A. (1965). Probability, Random Variables andStochastic Processes, (McGraw-Hill, New York)

Pollack, I. (1952). "On the Effect of Frequency andAmplitude Distortion on the Intelligibility of Speech inNoise," JASA 24, pp538-540

Rabiner, L.R., Schafer, R.W. (1978). Digital Processing ofSpeech Signals, (Prentice-Hall, New Jersey)

Thomas,. I.B., Niederjohn, R.J. (1970). "TheIntelligibility of Filtered-Clipped Speech in Noise," J.Audio Engin. Soc. 18, pp299-303

Thomas, I.B., Sparks, D.W. (1971). "Discrimination ofFiltered/Clipped Speech by Hearing-Impaired Subjects," JASA49, ppl881-1887

- 103 -

Appendix A

Probability Densities of Test Signals

Narrow-band Noise

The probability density for the random variable, V,

generated by squaring and then lowpass-filtering a band of

noise centered at a high frequency was calculated by Kac and

Siegert (1947) to be

00 exp(-V/) s)P(V) = ----------------- V>O (Al)

s (1 i3s)

where i is the ith eigenvalue in the solution of an

integral equation. For the specific case where the input

noise spectrum is that of a "simple-tuned" circuit and the

lowpass filter is a simple RC-lowpass filter, the

eigenvalues are found from the zeroes of the (R-1)th order

Bessel function, where R is the ratio of the input noise

bandwidth to the lowpass filter's -3 dB bandwidth.

J L2(R/, i) J = 0. (A2)

R-/

- 104 -

Probability density functions were computed for values of R

between 0 and 1 (Figure Al), and values between 1 and 8

(Figure A2). The case of R=0 corresponds to that of a

Rayleigh random variable squared, or an

exponentially-distributed variable. The range was computed

to be the decibel difference between the the two values, V10

and V90, at which the cumulative distribution function is

equal to 0.10 and 0.90, respectively. The range values are

graphed in Figure 2.2.

Tones

An input sine wave with unit amplitude and cyclic

frequency f=w/2T( is considered in this section. Squaring

and lowpass filtering of this signal results in the variable

1 - /H(2f)//cos(4Thft)V =----------- --------------------- (A3)

2

where

H(f) = 1/ 1+RA

and R is the ratio of f to the -3 dB bandwidth of the

RC-lowpass filter.

- 105 -

Because of symmetry in the time-wave, the calculations

can be limited to the range for c^'= 47'fft of 0< /<17/. The

cumulative distribution function for V is found by

integrating the probability density function on o/ (which is

assumed uniformly distributed) between the limits determined

by the inverse function,

(1-2V)cX =F (V) = cos L ------------ ] . (A4)

/H(2f)/

The cumulative distribution function is derived below.

Pr(V< Vo) = Pr(c/< F (Vo))

/v -

= arccos L(1-2V)// H(2f)/ I (A5)

The two values, V10 and V90, at which the cumulative

distribution is equal to 0.10 and 0.90, were calculated from

equation A5. The range, which is the decibel difference of

these two values, is graphed as a function of R in Figure

2.3.

- 106 -

2.0 -

.8

0.0

~~1

R 1/21/41/80 (exponential density)

t-

0 1 2 3 4V

Figure Al Probability density functions for narrowband noiseafter squaring and lowpass filtering for R less than 1.R is the ratio of the input noise bandwidth to the lowpassfilter's -3dB cutoff.

- 107 -

I

.41

2 .0 -

1.8 R8--1R -- 8

/ \I \

1.2 -x 4 -1

>\ 214

0 1 2 3 4V

Figure A2 Probability density functions for narrowband noiseafter squaring and lowpass filtering for R greater -than orequal to 1.

- 108 -

APPENDIX B

10 Harvard sentences used as input for experiments. Speakeris male.

1. A large size in stockings is hard to sell.

2. The birch canoe slid on the smooth planks.

3. Glue the sheet to the dark, blue background.

4. It's easy to tell the depth of a well.

5. These days a chiken leg is a rare dish.

6. Rice is often served in round bowls.

7. The juice of lemons makes fine punch.

8. The box was thrown beside the parked truck.

9. The hogs were fed chopped corn and garbage.

10. Four hours of steady work faced us.

- 109 -

Documents

INFINITE PEAK CLIPPING OF AMPLITUDE COMPRESSION