A Review of Auditory Perceptual Theories and the Prospects for an Ecological Account

Ewan A. Macpherson

Department of Psychology
University of Wisconsin-Madison

(In partial fulfillment of Preliminary Exam requirements)

July 1995

Contents

1 Introduction
    1.1 Motivation
    1.2 Definitions
    1.3 The neglect of auditory perceptual theory

2 Opinions on the role of auditory perception
    2.1 Opinions on the role of perception in general
    2.2 The role of hearing according to Helmholtz
    2.3 The role of hearing according to James
    2.4 The role of hearing according to Gibson
    2.5 Other opinions: identification or source recovery?
    2.6 Summary

3 Theories of Auditory Perception
    3.1 Helmholtz's account of audition
    3.2 James' account of audition
    3.3 Brunswik's probabilistic functionalism
    3.4 Gibson's account of audition
    3.5 Computational accounts of audition

4 Establishment ecological research
    4.1 Brunswik & Mohrmann: loudness constancy
    4.2 Auditory scene analysis & auditory image perception
        4.2.1 Bregman: auditory scene analysis
        4.2.2 Yost: auditory image perception
        4.2.3 Summary
    4.3 Ballas & Howard: interpreting environmental sound

5 Ecological ecological research
    5.1 Time-to-contact: acoustic looming
    5.2 Using auditory information for active contact
    5.3 Transformational invariants: breaking & bouncing
    5.4 Perceiving numbers by audition
    5.5 Acoustic texture in distance perception

6 Prospects for an Ecological account
    6.1 Auditory ecology
    6.2 Superposition
    6.3 Specification
    6.4 Auditory affordances

7 Conclusions & Speculations

References

1 Introduction

1.1 Motivation

During the spring of 1995 I attended a seminar given by Bill Epstein entitled "Thinking and

Perceiving". The discussion centered around the conception of perception as a process of

unconscious inference, and starting with the writings of Helmholtz and Berkeley, continued through

to the computational constructivism of Marr. In addition to the various construals of this notion, we

also dealt with the objections and alternatives which have been offered, and discussed what sorts of

experimental results would stand as evidence for one position or another. While unspecified in the

course title, the 'perceiving' referred to throughout was uniformly visual, with little reference to

audition or the other modalities. Thus, this paper is motivated in part by my speculation about how

the content of the seminar would have changed if hearing were the canonical sense for perceptual

theorizing. In keeping with this theme, I have somewhat liberally included short quotations from

several authors in lieu of "readings" since it is often enlightening to see cited authors' original words.

More specifically, my aims in this paper are threefold: to review perceptual theorizing carried

out within the context of audition; to examine a selection of experiments motivated by the differing

theoretical viewpoints; and finally, to look critically at the difficulties that proponents of direct

perception might face in importing a theory framed primarily in terms of vision into the auditory

modality. Since these aims are rather interdependent, some topics are discussed in more than one

context, but I hope that I have been successful in minimizing repetition. Rather than getting mired

in a full-fledged analysis of the direct perception debate, I have attempted to take each position on

its own merits while (I hope) maintaining a suitable and evenhanded skepticism.

1.2 Definitions

Before beginning the discussion proper, I would like to define what I mean by the terms

'Establishment' and 'Ecological', which I use to characterize the two main styles of perceptual theory.

These refer respectively to accounts of perception which posit mediation by various psychological

processes and those which do not. In terms of a succinct introduction to the direct-indirect debate

I do not feel that I can do better than to offer the following passage by Rock:

"... the essence of a direct theory is that stimulus information is available that uniquely correlates with each particular perception. Thus the specification of such information provides the necessary and sufficient explanation of perception. The essence of an indirect theory is that the stimulus information, while a necessary determinant, is not sufficient, because certain mediating processes must occur, once the stimulus information is registered or picked up, prior to the achievement of the percept. Such mediating processes can be described in psychological language and are a necessary part of the chain of events leading to the final perception. In my opinion, these processes could be either interactive in nature, such as were stressed by the Gestaltists, or they could be cognitive or thoughtlike in character. Examples of such processes are variously referred to as 'organizing' or 'grouping', 'interpreting', 'taking account of', 'computing', 'inferring', 'describing', 'deciding', and the like." (Rock 1980)

The position which sets itself against accounts involving mediation is variously referred to as direct

perception, direct realism, or ecological perception. The program identified by these terms involves

a rather radical redefinition of perception and stimulation, and since words tend to take on special

meanings in this context I will use 'Ecological' to refer to the specific approach and 'ecological' for

environment-directed perception in general. The Ecological approach actively contrasts itself with,

and defines itself negatively with respect to, inferential accounts - it is "indirect" which "wears the

trousers" (Turvey et al. 1981). Therefore it seems reasonable to adopt Fodor and Pylyshyn's use of

the term 'Establishment' to refer to the collection of theories with which the Ecologists take issue

(Fodor & Pylyshyn 1981).

1.3 The neglect of auditory perceptual theory

Also as a preliminary, I would like to briefly discuss the relative neglect of audition in perceptual

theorizing. The roots of the traditional, constructivist view of perception lie in analyses of vision.

Bishop Berkeley discussed the perception of space, and it was in terms of vision that Helmholtz

presented his theory of perception as a process of unconscious inference (Helmholtz 1867). He made

little reference to similar matters in his monograph on auditory perception (Helmholtz 1877), and

in fact the title of the latter refers explicitly to the "sensations of tone". Boring (1942) construed

"auditory theory" to be a framework for discussing the physiology of the inner ear, while more

modern collections with pan-modal titles (Cognitive Approaches to Human Perception (Ballesteros

1994), for example) still cheerfully ignore the non-visual modalities. Contemporary contributors to

cognitive constructivist theory also work primarily with vision (for example Gregory 1993, Rock

1983, and Shepard 1990), as do computational constructivists such as Marr (1982).

In the last three decades an alternative to familiar accounts of perception, inspired by the

work of Gibson, has gained some acceptance. While Gibson addresses all the modalities as

'perceptual systems' (1966), the full exposition of his theory deals explicitly with vision (1979). The

most serious proponents of his program tend to similarly dwell on vision, sometimes restricting their

discussion of audition to a single page (Michaels & Carello 1981).

Unsurprisingly, this visual bias persists in most discussions of the relative merits of

traditional and direct theories of perception. Typical examples are the target article by Ullman and

the resulting commentary (Ullman 1980), the debate between Fodor & Pylyshyn (1981) and Turvey

et al. (1981), and the analyses by Bruce & Green (1990) and Hochberg (1994).

The main thrust of auditory research also seems to have proceeded in the absence of

discussion of the fundamental nature of perceptual processes. Licklider (1959) remarks:

"There is no over-all theory of hearing. No one since Helmholtz has tried to handle anything like all the known problems within a single framework. Each of the several theories of hearing that are extant deals with a restricted set of questions."

This seems true today, and of course the number of "known problems" continues to increase. To

what can we attribute this lack of theorizing, or conversely why has such work been more often

undertaken in the visual domain? The explanation seems to lie partly in beliefs or intuitions about

differences between the two modalities and about sound itself, but more so in the historical roots of

certain lines of experimentation.

Firstly, hearing has traditionally been thought of as passive and vision active. For example,

Dowling et al. (1987) cite Schopenhauer's belief that music's affective power is due to the passive

nature of hearing, which allowed "brain-fibres" to vibrate in synchrony with musical tones.

Secondly, the "products" of hearing were more often described in terms of sensation and rarely in

terms of object perception, and the interest in the perception of musical tones rather than of the

"noises" produced by everyday sources reinforced this emphasis.

Yost (1990) elaborates on this, and suggests that the early direction of hearing research and

experimentation rests on an historical accident of timing. Sound was not considered to be localized

in space, and thus it was unclear how sound sources could be localized except by association with

perceptions derived from sight and touch. Helmholtz's psychoacoustic investigations revealed the

ear to be a sensitive frequency analyzer, and Lord Rayleigh's sound localization experiments came

after interest had been focused on the analysis problem (Strutt 1877). Licklider (1959) also credits

Helmholtz with establishing boundaries of interest within auditory science, and points out that

although von Bekesy's discovery of mechanical tuning within the cochlea disproved Helmholtz's

resonance theory, it merely altered the way frequency selectivity was studied. A final factor may be

that hearing provides a fruitful and "clean" domain for the application of the theory of signal detection

(Green & Swets 1966). Thus the tendency to concentrate on basic psychophysics has persisted

throughout this century.

2 Opinions on the role of auditory perception

2.1 Opinions on the role of perception in general

Before examining some writings on the nature of the auditory process, I would like to survey

comments by a number of authors on the role of hearing. Differing views of what should properly

be considered its function or end-products must have an effect on the types of processes postulated

or required. In general, Ecological and Establishment advocates hold somewhat different views of

the role of perception, which I will present before moving on to comments specifically about

audition.

The problem of comparison is complicated by the fact that the two camps not only ascribe

different roles to perception but also define what counts as perception differently. Both the

Establishment and Ecological accounts acknowledge that perception serves to provide information

about the environment. For the Establishment, perception is the process of deriving mental

representations of the objects and events in the environment - the process of "getting the outside

inside". For example Pylyshyn (1984) defines sensory transducers as mechanisms for producing

symbols which depend on states of the environment, William James refers to perception in terms of

the conscious awareness of external objects (James 1890), and Fodor (1975) makes frequent

reference to "perceptual knowledge". Perception serves to provide knowledge of "what is where"

in the world, and action is guided on the basis of that knowledge.

In the Ecological view, perception is a keeping-in-contact which supports action, while the

emphasis in Establishment theories is more epistemological. In Ecological accounts, action is

'directly' related to perception, while in Establishment theories the relationship is 'mediated' by other

processes. The following passages illustrate the Ecological view:

"Perceiving is an achievement of the individual, not an appearance in the theater of his consciousness. It is a keeping-in-touch with the world, an experiencing of things, rather than a having of experiences. It involves awareness-of instead of just awareness. It may be awareness of something in the environment or something in the observer or both at once, but there is no content of awareness independent of that of which one is aware." (Gibson 1966)

"Fodor and Pylyshyn, as Establishment theorists, concentrate on how one takes the environment, appealing to verbal labels of experience to lead the way in delineating subject matter. When the concentration is shifted to perceptual guidance of activity, however, it is clear that most of this continuous, nested perceiving lacks words for referring to it. ... Fodor and Pylyshyn's kind of perception (in percepts) is whatever eventuates in a perceptual judgement or belief. Gibson's kind of perception, in contrast, is that which eventuates in the 'proper' adjustment or oriented (to various levels of the environment) activity." (Turvey et al. 1981)

This distinction between non-propositional perception-of and propositional perception-as is a major

point for Ecological theorists. The division between those adhering to the -as and -of interpretations

is not cleanly along "mediated" and "direct" lines, however. John Searle, certainly no supporter of

unconscious inference accounts of mental phenomena, explicitly states that "all perception is

perception-as" (Searle 1992).

2.2 The role of hearing according to Helmholtz

Most of Helmholtz's writings on hearing are found in the monograph On the Sensations of Tone As

a Physiological Basis for the Theory of Music. As the title suggests, this is a work with rather

specific aims. In particular, it deals with the perception of "musical tones" (defined as steady-state

combinations of sine-tone partials) and not with everyday sounds, which Helmholtz referred to as

"noises". Despite this emphasis on hearing in a musical context it may be possible to draw some

conclusions about his thinking about the role of hearing in general.

Firstly, sensation is stressed as playing a more dominant role in hearing than in the other

senses (again, in a musical context). In the introduction, Helmholtz writes:

"Music stands in a much closer relation to pure sensation than do the other arts. The latter rather deal with what the senses apprehend, that is with the images of outwards objects, collected by psychical processes from immediate sensation. ... in music, the sensations of tone are the material of the art. So far as these sensations are excited in music, we do not create out of them any external objects or actions. Again, when in hearing a concert we recognize one tone as due to a violin and another as due to a clarinet, our artistic enjoyment does not depend upon our conception of a violin or clarinet, but solely on our hearing of the tones they produce, whereas the artistic enjoyment resulting from viewing a marble statue does not depend on the white light which it reflects into the eye, but upon the mental image of the beautiful human form which it calls up." (Helmholtz 1877)

So although the listener can identify the source of a tone, the "raw" sensation of timbre is very clearly

present in awareness. Source identification is possible, but not necessarily the single overriding goal.

A second emphasis, on the challenge of source separation, does suggest an important place

for the "images of outwards objects" in hearing. As well as considering the ability to follow separate

melodic lines in a piece of music, the reader is also asked to consider a ballroom:

"Here we have a number of musical instruments in action, speaking men and women, rustling garments, gliding feet, clinking glasses, and so on. All these causes give rise to systems of waves, which dart through the mass of air in the room, are reflected from its walls, return, strike the opposite wall, are reflected again, and so on until they die out. ... in short, a tumbled entanglement of the most different kinds of motion, complicated beyond conception. And yet the ear is able to distinguish all the separate constituent parts of this confused whole ..." (Helmholtz 1877)

Presumably this separation is supposed by Helmholtz to allow the listener to "apprehend" the

speaking men and women, the rustling clothes, etc.

2.3 The role of hearing according to James

William James also advanced a "knowing what is where" view of perception's role. Throughout the

chapters on perception in The Principles of Psychology, he discusses both the visual and auditory

modalities in parallel, drawing no fundamental distinction between them. Perception results, in his

view, in conscious ideas suggested by sensation. "The first of these ideas is that of the thing to

which the sensible quality belongs. The consciousness of particular things present to sense is

nowadays called perception" (James 1890). In an auditory example (taken somewhat out of

context), he writes: "Thus, I hear a sound, and say 'a horse-car'". That is, the object is identified by

its sound.

2.4 The role of hearing according to Gibson

Leaping ahead to the mid-20th century, one might expect Gibson to have a somewhat different view

on the role of auditory perception, but the explicit differences to be found are subtle. In The Senses

Considered as Perceptual Systems he writes:

"The function of the auditory system, then, is not merely to permit hearing, if by that is meant the arousal of auditory sensations. Its exteroceptive function is to pick up the direction of an event, permitting orientation to it, and the nature of an event, permitting identification of it." (Gibson 1966)

The obvious difference is the substitution of 'event' for 'object', but since by necessity the production

of sound involves a dynamic event, this might be construed as a difference in terminology. A greater

difference is the proposal that the 'nature' of an event is picked up. This presumably consists of the

shapes, motions, and materials involved in the production of the sound, but it is difficult to interpret

Gibson's usage precisely, and the point is not elaborated in the most mature incarnation of his theory

(1979), which considers only vision. In light of the theory of affordances, the idea that picking up

the nature of an event subserves its identification seems somewhat inconsistent with an Ecological

stance. I will return to the discussion of the problem of auditory affordances in Section 6.4.

2.5 Other opinions: identification or source recovery?

Other writers also stress identification in the auditory modality. Of these some explicitly identify

their viewpoint as Ecological while others do not. As an example of the latter, Schubert (1974)

proposes the Source Identification Theory as an organizing principle of the auditory system, at least

in the processing of non-speech sounds. For speech he extends this to include a principle of Source

Behavior recognition in an effort to embrace the motor theory of speech perception. In this account,

the listener uses the sound stimulus to identify articulatory gestures, and from these derives the

phonemic and semantic content of an utterance. The means by which Schubert suggests this is

accomplished are far from unmediated, however. Another promotion of source identification is

found in Jenkins' ecological but somewhat un-Gibsonian meditation on acoustic information (Jenkins

1985). The majority of the examples given refer to gaining "what is where" knowledge of sound-

producing objects.

The idea that listening to speech is exceptional is challenged by Fowler, a committed direct

realist. While in agreement with Schubert that in this case the auditory system recovers "the causal

source of the acoustic signal" (Fowler 1991), she maintains that it is wholly unspecial in that regard

and that all hearing involves event recovery rather than associating objects with sounds (i.e.,

identification). While admitting that there are situations in which there is no adaptive advantage in

perceiving events directly, her argument is that there frequently is such an advantage and therefore

that evolutionary pressures will have produced an auditory system which attempts to do exactly that.

In addition to Fowler's writings in the context of speech perception, perhaps the most serious

examination of the role of audition from an Ecological perspective is to be found in a pair of papers

by Gaver (1993a, 1993b). Here he proposes that our auditory sense exists to pick up sound-carried

information "...about an interaction of materials at a location in an environment". The sound

reaching a listener's ears is held to bear information about each of these elements: the nature of the

interaction, striking or scraping, say; the materials involved, wood or water; the location relative to

the listener or to the environmental setting; and the nature of the environment itself, in terms of

reflectiveness and configuration of surfaces. The example of sound from a moving car is provided:

"We can hear an approaching automobile, its size and its speed. We can hear where it is and how fast it is approaching. And we can hear the narrow echoing walls of the alley it is driving along. These are the phenomena of concern to an ecological approach to perception." (Gaver 1993a)

Thus what are heard are various physical features of environmental events, but as with Fowler, Gaver

does not attempt to make the case that these are necessarily ecologically-significant features

analogous to Gibson's visual affordances.

2.6 Summary

To review then, there seem to be two views on the role of auditory perception in addition to

Helmholtz's sensation-based account of music perception. The Establishment story is that hearing

serves to localize and identify sound-producing objects, while the Ecological view holds that the

physical nature of sound-producing events is directly perceived - the causal source of the acoustic

signal is recovered. As noted previously this account is not strictly ecological in the way visual

theories of perception-for-action claim to be.

3 Theories of Auditory Perception

Having examined the range of viewpoints on the role of auditory perception, I now turn to

discussions of the processes which are held to underlie the fulfillment of this role. These are quite

varied, including hints of unconscious inference in the writings of Helmholtz, the direct perception

approach of Gibson, and auditory applications of computational constructivism.

3.1 Helmholtz's account of audition

Beginning again with Helmholtz, we find that he devotes little discussion to the mental processes

involved in hearing. This may be largely due to his emphasis on the "sensation of tone", rather than

on adaptive auditory perception outside a musical context. However, a number of passages suggest

that Helmholtz feels that a great deal of work needs to be done on the auditory input in order to

produce separate percepts for the sound sources contributing to it. He is not as explicit as in his

advancement of unconscious inference as a theory of visual perception, but he certainly suggests

ratiomorphic, constructional mental activity. The three quotations which follow give the sense that

the auditory system is involved in analysis, inference, and problem solving respectively. The second

is preceded in the original by a passage describing the visual inspection of the surface of the ocean

and the ease with which the superimposed systems of waves are separated by eye. (The emphases

are not present in the originals).

"We shall see that the ear has no decisive test by which it can in all cases distinguish between the effect of a motion of the air caused by several different music tones arising from different sources, and that caused by the music tone of a single sounding body. Hence the ear has to analyze the composition of single musical tones, under proper conditions, by means of the same faculty which enabled it to analyze the composition of simultaneous music tones."

"I must own that whenever I attentively observe this spectacle [the visual separation of ocean wave systems] it awakens in me a peculiar kind of intellectual pleasure, because it bares to the bodily eye, what the mind's eye [perception in general?] grasps only by the help of a long series of complicated conclusions for the waves of the invisible atmospheric ocean."

"Now there are many circumstances which assist us first in separating the musical tones arising from different sources, and secondly, in keeping together the partial tones of each separate source. Thus when one musical tone is heard for some time before being joined by the second, and then the second continues after the first has ceased, the separation in sound is facilitated by the succession in time. We have already heard the first musical tone by itself and hence know immediately what we have to deduct from the compound effect for the effect of this first tone."

3.2 James' account of audition

James discusses perception as a general process without strongly differentiating between the

modalities, although he does seem to side with Bishop Berkeley in asserting the primacy of touch,

and is quite explicit in his description of the processes. The account is sensation-based and

constructivist, and is well-summarized in the following two quotations:

"Sensational and reproductive brain processes combined, then are what give us the content of our perceptions" (James 1890)

"Perception may then be defined, in Mr. Sully's words, as that process by which the mind supplements a sense-impression by an accompaniment or escort of revived sensations, the whole aggregate of actual and revived sensation being solidified or 'integrated' into the form of a percept, that is, an apparently immediate apprehension or cognition of an object now present in a particular locality or region of space." (James 1890)

Moreover, James' account is also clearly empiricist:

"Infants must go through a long education of the eye and ear before they can perceive the realities which adults perceive. Every perception is an acquired perception." (James 1890)

and continuing in a footnote, he makes special reference to audition:

"The educative process is particularly obvious in the case of the ear, for all sudden sounds seem alarming to babies. The familiar noises of house and street keep them in constant trepidation until such time as they have either learned the objects which emit them, or have become blunted to them by frequent experience of their innocuity." (James 1890)

3.3 Brunswik's probabilistic functionalism

Occupying a position somewhere between traditional, perception-as constructivism and Gibson's

Ecological approach lies Brunswik's probabilistic functionalism, which influenced Gibson's thinking

significantly (Lombardo 1987). In this framework, the emphasis is on the perceptual constancies,

referred to as distal focusing, and on their achievement in non-laboratory, or "representative"

contexts. The perceptual process is held to take the form of statistical inference; proximal cues of

varying reliability are weighted and combined to produce a "best bet" at the distal state of affairs.

The model of the process incorporates three types of weightings or correlations, referred to as

validities. Correlations between distal features and proximal cues are ecological validities; the

weightings placed on cues to produce percept features are criterial validities; and the degree of

correspondence between the distal feature and the percept is the functional validity. This last is a

metric of achievement.
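
The three validities described above can be made concrete with a small simulation. What follows is only a sketch under stated assumptions and is not Brunswik's or Mohrmann's procedure: the function names, the Gaussian distal variable, the additive-noise model of the proximal cues, and the two weighting schemes are all hypothetical choices introduced here for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def lens_model(noise_sds, weights, n_trials=5000, seed=0):
    """Simulate a perceiver who combines noisy proximal cues linearly.

    noise_sds : assumed noise level of each cue (larger = less reliable)
    weights   : assumed weighting the perceiver places on each cue
    Returns (ecological validities, functional validity).
    """
    rng = random.Random(seed)
    # Hypothetical distal feature, e.g. the intensity of a source.
    distal = [rng.gauss(0.0, 1.0) for _ in range(n_trials)]
    # Each proximal cue is the distal value plus independent noise.
    cues = [[d + rng.gauss(0.0, sd) for d in distal] for sd in noise_sds]
    # Ecological validity: correlation between distal feature and each cue.
    ecological = [pearson(distal, cue) for cue in cues]
    # The percept is the weighted combination of the cues; the normalised
    # weights stand in for the weightings placed on the cues.
    total = sum(weights)
    percept = [
        sum(w / total * cue[i] for w, cue in zip(weights, cues))
        for i in range(n_trials)
    ]
    # Functional validity: correlation between distal feature and percept.
    functional = pearson(distal, percept)
    return ecological, functional


noise_sds = [0.5, 1.0, 2.0]
eco, fv_equal = lens_model(noise_sds, [1.0, 1.0, 1.0])
_, fv_weighted = lens_model(noise_sds, [1.0 / sd ** 2 for sd in noise_sds])
print("ecological validities:", [round(e, 2) for e in eco])
print("functional validity, equal weights:", round(fv_equal, 2))
print("functional validity, reliability weights:", round(fv_weighted, 2))
```

Under these assumed noise levels, weighting the cues in inverse proportion to their noise variance should give a higher functional validity than weighting them equally, which is one way to read the claim that the "best bet" depends on how cues of differing reliability are combined.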

While Brunswik himself applied his methods principally to the three canonical visual

constancies (size, shape, and color) the same system has been applied to audition in a study of

loudness constancy (Mohrmann 1939). This work will be described in Section 4.1 as one example

of Establishment-style experimentation.

3.4 Gibson's account of audition

Gibson's account of the basis of auditory perception exactly parallels his treatment of vision, and has

no place for the cues which play such an important role in Brunswik's conception of ecological

perception. The hearing organism is said to use its listening system, "two ears together with the

muscles for orienting them to a source of sound", to sample the 'acoustic array'. This permits the

pick-up of invariants which specify the mechanical sound-producing event. No mediation by

inference, memory, or computation is required. As in any direct theory, the usefulness of such a

process rests on specification, or the one-to-one mapping from sound-field properties to sound-

source properties. For example, interaural time and amplitude differences and their patterns of

change as the head moves are identified as specifiers of the location of a source. Two quotations will

serve as evidence of his belief in acoustic specificity:

"In meaningful sounds, these variables [spectral and temporal features] can be combined to yield higher-order variables of staggering complexity. But these mathematical complexities seem nevertheless to be the simplicities of auditory information, and it is just these variables that are distinguished naturally by an auditory system. Moreover, it is just these variables that are specific to the source of the sound - the variables that identify the wind in the trees or the rushing of water, the cry of the young or the call of the mother. The sounds of rubbing, scraping, rolling, and brushing, for example, are distinctive acoustically and are distinguished phenomenally." (Gibson 1966)

"... the kind of wave train is specific to the kind of mechanical event at the source of the field; that is, the sequence and composition of pressure changes at a point in the air correspond to what happened mechanically... This correspondence is the justification for our metaphorical assertions that the waterfall 'splashes', the wind 'whistles', and the thunder 'cracks'." (Gibson 1966)


[1] The 'computational' in 'computational constructivism' refers specifically to a style of processing involving mathematical manipulations and explicitly geometrical representations. As Pylyshyn has pointed out (1984), all forms of constructivism can be considered computational since inference is couched in terms of propositions, propositions are represented symbolically, and an operation over symbols is computation.


Gibson also repeats his argument against sensations as a basis for perception in the context of

hearing. A sound signal, as a function of time, can be decomposed into a collection of sinusoids, but

he points out that adopting this mode of analysis leads to the dubious assumption that any complex

sound can be reduced to a collection of pitch sensations. The point he makes is similar to what

Jenkins (1985) calls Johansson's Law of Perceptual Richness, which is that mathematically complex

stimuli may be hard to describe, but are information-rich, while mathematically-simple stimuli may

not be so simply dealt with by the perceptual system (Johansson 1985).

3.5 Computational accounts of audition

In Gibson's auditory theory, the pick-up of information is said to be performed by neural structures

which 'resonate' to the invariants of stimulation. By removing these processes from the

psychological "domain of discourse" (Ullman 1980) Gibson left them unanalyzed. Those interested

in artificial intelligence and the development of perceiving machines do not have this luxury,

however, and must face the problem of actually extracting invariants. Despite this component, and

an emphasis on representational transformation, Gibson's Ecological approach is often identified as

a source of inspiration (as well as exasperation) by those who practice computational

constructivism.[1] For example, Sloman attempts to incorporate affordance-like objects of perception

into his computational theory, but writes: "... we need not stick with Gibson's mystifying and

unanalysed notions of direct information 'pickup' and 'resonance', although I shall sketch a design

for such a system that has distant echoes of these notions" (Sloman 1989). Marr holds a similar

view:

"Gibson's important contribution was to take the debate away from the philosophical considerations of sense-data and the affective qualities of sensation and to note instead that the important thing about the senses is that they are channels for perception of the real world outside or, in the case of vision, of the visible surfaces." (Marr 1982)

"Although one can criticize certain shortcomings in the quality of Gibson's analysis, its major, and in my view, fatal shortcoming lies at a deeper level and results from a failure to realize two things. First, the detection of physical invariants, like image surfaces, is exactly and precisely an information-processing problem, in modern terminology. And second, he underestimated the sheer difficulty of such detection." (Marr 1982)

This combination of consideration of ecological constraints and formal computation has been termed

'natural computation' by Richards (1988), and forms yet another class of auditory theory. C.J. Searle

(1982) and Lyon (1983), among others, have applied these methods to auditory processes. Other

major impetuses are soundscape understanding or 'machine listening' (Ellis 1995), and automatic

music transcription (Nunn 1995). Curiously the design of speech recognition systems seems to have

proceeded without much contact with perceptual science, and the techniques used are often general-

purpose pattern recognition algorithms rather than auditory models.

Bruce & Green (1990) offer a possible reconciliation between computational and Ecological

accounts, framed in terms of non-symbolic representation. Neural "maps" can represent variables

of the input and preserve isomorphisms, but as Searle (1992) maintains, once the neurophysiological

bases of these maps are understood, the incentive to characterize the process in terms of symbolic

computation is greatly reduced. For example, it appears that interaural time differences are mapped

to "place" in the medial nucleus of the superior olivary complex (Pickles 1988) - the representation

is not symbolic. Certainly much of Marr's theory of early vision could be read simply as a functional

description of simple neural processing. Hatfield (1990) also proposes a rapprochement between

direct and representational transformation accounts via connectionist "symbol" processing.


4 Establishment ecological research

In the next two sections of the paper I will review a number of examples of individual experiments or

of research programs conducted from the Establishment and Ecological viewpoints. The dual aims

are to compare the style of experimentation within the two camps and to provide some context for

the discussion of the Ecological approach with which I conclude in Section 6. Experiments which

are self-consciously motivated by an anti-direct stance tend to seek the effects of perceiver

knowledge on percepts (Hochberg 1994), while others tacitly working within the classical framework

uncritically offer inference-based explanations of their observations. There are also many examples

of the types of experiments which are a favorite target of Ecologists: snapshot theories of motion

perception, lateralization of sine tone stimuli, fixed-head sound localization, and auditory illusions

using "impoverished stimuli" of various sorts.

Since the subject matter and analyses found in these studies are so obviously different from

those in Ecologically-motivated work, it does not seem particularly illuminating to discuss them

here. Instead I will focus on experiments which address issues relevant to object or event perception

from an Establishment viewpoint. The emphasis in this work is usually on filling unsatisfying gaps

in the direct perception account (e.g. how are cues or invariants extracted?) or on offering alternative

accounts involving mediating processes. I will attempt to show by example that to some extent one

can address auditory perception ecologically without being strictly Ecological.

4.1 Brunswik & Mohrmann: loudness constancy

As mentioned previously, the concern with investigating environmental perception did not originate

with Gibson. Brunswik and his colleagues examined many perceptual constancies within the

framework of his probabilistic functionalism. An example of this approach applied in hearing is a

study of loudness constancy by Mohrmann (1939, described by Postman & Tolman 1959). The task

of the subjects was to report the loudness of the sounds produced by a number of sources while

adopting one of two attitudes. The first, the naive-realistic attitude, was distally focused, and

required the listener to estimate the intensity at the source, while the analytic, or sensorial, attitude

concerned the intensity at the listener's position. The actual intensity was measured using

microphones at the source and listener positions, but the response method is not described.


[2] Warren (1982) discusses an approach in which estimates of loudness are actually held to be disguised estimates of distance.


Achievement of constancy was calculated by correlating the judgements with the physical

measurements. Presumably the proximal intensity was varied by altering the distance to the source

rather than changing its amplitude, since the latter would cause both distal and proximal intensities

to vary in parallel. In addition, the experiment was performed in the dark, with listeners blindfolded

after viewing the source, and with the source in plain view throughout.

If listeners were able to adopt the desired attitudes perfectly, the constancy ratios obtained

should be 1 in the naive-realistic case and 0 in the analytic case. This trend was observed, but

constancy ratios ranged from approximately 0.65 (for tones) to 0.95 (for speech) in the realistic case,

and from about 0.1 to 0.5 in the analytic. This suggests that on the whole observers are more

successful at reporting distal intensities than proximal ones. In addition, constancy was favored

when subjects could see the source and how far away it was no matter which attitude they were

requested to take, but visual cues hindered proximal reporting more than they assisted already-good

distal reporting. That is, listeners could only successfully adopt an analytic stance in the dark

condition. Another feature of the data is that the complex sounds, such as speech and music,

permitted much higher loudness constancy than tones and noise.

These results can of course be interpreted in several ways. In Brunswik's terms, the adaptive

value of perception lies in distal focusing, and therefore it should not be surprising that we have

easier access to distal representations than to the proximal cues from which they are derived.

Unconscious inference could be invoked to explain the achievement of greater constancy in the

visible-source condition, in which vision provides information about the distance to the source. This

could be used by the auditory system, which "knows" how intensity varies with distance, to

determine the source's loudness.[2] This of course begs the question of how the visual system obtains

unambiguous distance information. The advocate of direct perception would explain the difficulty

of reporting proximal intensity as evidence that the auditory system is designed to recover source

properties. The better constancy obtained with speech and music could be attributed to their greater

ecological validity and informational richness in comparison to the lowly sine tone and noise burst,

for which source recovery would be ambiguous. The advantage bestowed by visual information is


less conveniently explained within an Ecological account, but conceivably cross-modal invariants

for loudness could be hypothesized.
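The inference account described above presupposes that the perceiver "knows" how intensity varies with distance. As a rough illustration, assuming an idealized point source in a free field with no reflections, intensity falls with the square of distance (about 6 dB per doubling), so a distal level can be recovered from the proximal level once the distance is known. The function names and the 1 m reference distance below are illustrative assumptions and are not part of Mohrmann's procedure.

```python
import math

def proximal_level_db(source_level_db, distance_m, ref_distance_m=1.0):
    """Level at the listener for an ideal free-field point source.

    source_level_db is the level measured at ref_distance_m from the source.
    Intensity falls with the square of distance (inverse-square law).
    """
    return source_level_db - 20.0 * math.log10(distance_m / ref_distance_m)

def inferred_source_level_db(proximal_db, distance_m, ref_distance_m=1.0):
    """Invert the relation: recover the distal level from level and distance."""
    return proximal_db + 20.0 * math.log10(distance_m / ref_distance_m)

# The same hypothetical 70 dB source heard from three distances.
for d in (1.0, 2.0, 4.0):
    p = proximal_level_db(70.0, d)
    s = inferred_source_level_db(p, d)
    print(f"{d:3.0f} m: proximal {p:5.1f} dB, inferred source {s:5.1f} dB")
```

In this idealization a listener with accurate distance information, whether visual or auditory, would show perfect constancy. Real rooms add reflections, which is one reason the sketch should not be read as a model of the experiment.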

4.2 Auditory scene analysis & auditory image perception

As Helmholtz pointed out, a central issue in auditory research concerns the means by which the

complex superposition of sounds from several sources is processed so that each may be perceived

separately. This process is referred to as source segregation or auditory scene analysis, and is

addressed in an extensive program of research conducted by Bregman and his associates (Bregman

1990) and in a theoretical paper by Yost (1990). Each author is concerned with slightly different

aspects of the problem, and phrases his assumptions and motivations differently. Here I will give

a review of their theoretical orientations and the types of experiments associated with each.

4.2.1 Bregman: auditory scene analysis

For Bregman, like Marr, “perception is the process of using information provided by our senses to

form mental representations of the world around us”, and as in visual scene analysis an important

problem is the grouping of separate pieces of information about the same object together. He writes:

“it is important to emphasize again that the way the sensory inputs are grouped by our nervous

systems determines the patterns that we perceive”. So, the products of perception are in this account

very much influenced by mental activity (or at least alterable neurophysiological activity). The scene

analysis task is posed as a problem to be solved by the auditory system through a process of

representational transformation. Bregman stresses that on one hand it is important to examine the

ecology of audition - the constraints on and commonalities among natural auditory scenes - and

suggests that the auditory system uses ‘knowledge’ of this sort in the form of useful heuristics in

order to achieve source separation. The formation of representations is held to be constrained both

by innate, primitive grouping rules and by learned rule complexes, which he calls schemas.

Grouping occurs both sequentially (on successively-presented segments of a sound pattern)

and in parallel (on sound components present simultaneously). The end result of the grouping

processes is one or more sound streams, which are described variously as the auditory equivalents

of visual objects, perceptual units representing single happenings, or as perceptual representations:

“a computational stage on the way to a full description of an auditory event”. When a stream is

compared to an object, clearly the meaning is not that streams exist in the environment, but that a


stream is a unit of auditory experience with its own properties (rhythm, pitch contour and timbre,

say) just as a visual object is a unit of experience. Despite referring to ecological constraints as a

guide to grouping processes, it is clear that a stream does not necessarily correspond one-to-one with

a sound source. It is possible for the sound from many sources to merge into a single stream, or for

sound from a single source to be segregated into several streams.

The latter effect is revealed in experiments on pitch streaming, a sequential grouping process

(Bregman and Campbell 1971). When a tone sequence consisting of alternating high and low tones

is presented it can appear as a single stream if it is played slowly or the tones are not widely

separated in pitch, or it may split into two streams if played fast or with wide separation. In

situations where a single sequence is grouped into multiple streams, it is very difficult for listeners

to discern the temporal relationships between them. For example, rhythmic patterns perceived in

a single stream can dissolve if pitch manipulations cause it to split into multiple streams. Bregman

writes that we can:

"...look at the streaming effect as the auditory system's description as a mixture of two sources - one high in pitch and the other low. This is the system's best bet as to the deep structure of the situation. The heuristic that seems to be involved here is this: Temporally adjacent segments are not necessarily to be grouped as arising from the same source, especially when the segments themselves have sharp boundaries.... In such cases, the events are to be grouped according to similarity." (Bregman 1981).

The reference to "deep structure" is not accidental; he often compares the heuristics involved in

"parsing" the auditory input to Chomskian grammatical rules. Formal generative grammars have

been used in modelling the perception of music (Lerdahl and Jackendoff 1983), and Ballas (1987,

see Section 4.3) uses a speech metaphor in his account of environmental sound perception, so this

approach is not unique.
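The alternating-tone stimulus used in the pitch streaming experiments described above is simple to construct. The following is a minimal sketch. The frequencies, tone durations, and sample rate are hypothetical values chosen for illustration and are not those of Bregman and Campbell (1971). Shortening `tone_dur` or widening the gap between `low_hz` and `high_hz` corresponds to the manipulations that, on the account above, favor a split into two streams.

```python
import math

def tone(freq_hz, dur_s, sample_rate=16000):
    """One sine tone as a list of samples in the range [-1, 1]."""
    n = int(round(dur_s * sample_rate))
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def alternating_sequence(low_hz, high_hz, tone_dur, n_pairs, sample_rate=16000):
    """Concatenate n_pairs of high-low tone pairs (H L H L ...)."""
    samples = []
    for _ in range(n_pairs):
        samples.extend(tone(high_hz, tone_dur, sample_rate))
        samples.extend(tone(low_hz, tone_dur, sample_rate))
    return samples

# A slow, narrowly spaced sequence and a fast, widely spaced one.
slow_narrow = alternating_sequence(400.0, 450.0, tone_dur=0.25, n_pairs=4)
fast_wide = alternating_sequence(400.0, 1600.0, tone_dur=0.05, n_pairs=20)

# Both sequences last two seconds at the assumed sample rate.
print(len(slow_narrow) / 16000, len(fast_wide) / 16000)
```

The tones here have abrupt onsets and offsets. A stimulus intended for listening would normally apply short amplitude ramps to avoid clicks, which this sketch omits.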

Bregman's experimental program also includes investigations of other streaming phenomena,

such as those based on timbre differences, and of auditory analogs to visual amodal completion

effects. When tone and noise bursts are alternated in sequence, the tone appears to become

continuous when the noise is sufficiently intense that it would have rendered a truly continuous tone


inaudible. This effect also occurs when a tone glide is interrupted by noise - under the appropriate

conditions the glide appears to persist through the noise while continuing to change in pitch, a

phenomenon which has been used to investigate the auditory system's 'assumptions' about the rates

of change of sound source characteristics (Kluender & Jenison 1992). Warren (1982) reviews

several of these illusory continuity effects, and Bregman is in agreement with his suggestion that

their function is to group together sound segments originating from the same source which would

otherwise be separated by masking signals. The ability to elaborate sketchy, temporally-limited

sensory information into temporally-extended stable percepts has also been noted in binaural

experiments (Stellmack 1994).

4.2.2 Yost: auditory image perception

In a paper entitled "Auditory image perception and analysis: The basis for hearing", Yost (1990) also

addresses the scene analysis problem, although he distinguishes his point of view from Bregman's.

His emphasis is on processes which allow the separation of concurrently active sources, under the

premise that the main function of the auditory system is the "determination of sound sources".

'Determination' is explicitly distinguished from 'identification'. It is the generation of an 'auditory image'

corresponding to a single sound source. These images are the objects of the identification process

although identification need not be successful in order for them to be perceived. An auditory image

seems to be approximately the same as a stream although in a sense its identification with a single

physical source suggests that it is a more environmentally-oriented concept. While Bregman's

proposal that streams are the units of auditory experience seems clear, the use of the word 'image'

is rather more confusing. Consider this passage:

"Because the sounds from different sources do not arrive at the auditory system separately, the auditory system must process the neural representation of the complex sound field into elements ('auditory images') that allow the listener to potentially determine the source. The presence of sound sources is inferred or deduced from percepts, the auditory images, based on the information arriving at the ears of a listener. Thus auditory images are the bases for hearing." (Yost 1990)

Such an image is clearly not something which is imagined; it is not the sort of thing studied by those

interested in auditory imagery (Reisberg 1992). Nor is it analogous to a retinal image. If the images


are percepts (i.e. the experiential outcomes of the process of perception) then they are in classical

terms the conscious representations of sound sources in the environment. However, Yost seems to

introduce an extra step of inferring the existence of sound sources from percepts rather than taking

the Helmholtzian position that percepts are the result of inference. To complicate matters Yost

elsewhere states that image perception is sufficient for sound source determination. The proliferation

of levels seems to result from an awkward attempt to keep the discussion outside the realm of

cognition. For example:

"If one reviews the literature on image formation (Handel 1990, Bregman 1990), the topic may appear to be more closely related to cognitive science, or even to phenomenology, than to issues that would be of direct interest to psychoacoustics and auditory physiology. An assumption of this paper is that the auditory system is responsible for auditory image formation and the four questions posed above are amenable for study by auditory scientists."

Yost claims to seek an explanation in terms of neurophysiology or basic psychophysics, but,

although denying it, needs a foot in both camps. If auditory images are not phenomenological

entities, then they are hardly percepts.

Rather than belabor this point, I will press on and discuss the experiment Yost presents as

an example of image formation and briefly describe the means by which he feels this is achieved.

The necessity for scene analysis occurs whenever more than one source is active at the same time.

The experimental stimulus in this case was a mixture of a man uttering the vowel /a/ and a

synthesized pipe organ note. Neither the physical frequency spectrum nor the output of an auditory

filter bank model make it obvious that two and only two sources are present, but all of the subjects

who heard the stimulus reported hearing only two. Identifying the sources was more variable, but

all listeners heard some spoken vowel and a musical note.

The strategy adopted in explaining this ability is to examine the ecology of sound production

for physical attributes of sources which might be encodable in the auditory nerve signals. The seven

physical variables suggested are: spectral separation, intensity profile, harmonicity, spatial

separation, temporal separation, common temporal onsets and offsets, and coherent slow temporal

modulation. While this does not exactly constitute a search for invariants, it is meant as a first step


in a neurophysiological account of source separation. Note that the importance of temporal

separation and common onsets and offsets was recognized by Helmholtz (1877, see Section 3.1).

4.2.3 Summary

The search for an account at this ecological-neural level is perhaps the only feature of this approach

and of Bregman's which Ecological theorists would not object to. Talk of grouping rules,

representations, problem solving, deductions, and inference is the antithesis of a direct theory.

However, no convincing account of source separation in terms of acoustic invariants has yet been

offered. Gibson (1966) proposed that orienting the head so as to synchronize the binaural inputs for

one source while desynchronizing those for others was the basis of 'selective listening'. While spatial

separation and binaural input do assist in source segregation, segregation is clearly quite possible with a

single channel of input, and thus Gibson's account is inadequate. This issue is discussed in more

detail in Section 6.2.

Bregman's work might be subjected to the standard criticism that his stimuli are

impoverished and unnatural and that the results therefore have little or no relevance to ecological

listening. In addition, any account which posits rules is vulnerable both to questions about who is

applying the rules (i.e. the homunculus problem) and to objections about lack of constraints. One can

keep adding rules to explain whatever behavior is observed. However, Bregman states that he is

interested in a functional description of these processes - the rules are tools for predicting percepts

rather than actual constituents of the auditory system. His is an as-if, not an in-fact, rule-following

account. The primitive rules are described as "automatic innate processes that act without conscious

control" (Bregman 1990). On the other hand, his description of the more sophisticated, top-down,

schema-based processes is less reconcilable with direct perception accounts. Here consciously-

directed attention and "the activation of stored knowledge of familiar patterns" are held to play a

role. A Gibsonian explanation would involve an account of perceptual learning, which involves the

discovery of additional variables of stimulation permitting finer discriminations.

4.3 Ballas & Howard: interpreting environmental sound

As a final example of a non-Ecological approach to environmental perception I will examine Ballas

and Howard's paper, "Interpreting the Language of Environmental Sounds" (1987). Their main point


is that a useful analogy can be made between the perception and understanding of speech and of

environmental sound. Both seem to involve bottom-up, data-driven processes combined with top-

down, context-dependent, knowledge-based cognitive processes which serve to resolve ambiguities

and permit the recovery of meaning. This is similar to Bregman's distinction between primitive and

schema-based processes, but the authors' intent is to present evidence that not only is the general

form of perceptual processing similar, but that specific details are too. The claim that environmental

sound can be considered a language is therefore more than metaphorical.

Ballas and Howard discuss four experiments in support of their contention. The first

involved the free-response identification of a number of short recorded sounds, several of which

were intended to represent events in water or steam-pipe systems. It was found that (with the

exception of a water drip sound) actions were much more accurately identified than agents. That is,

listeners could more reliably say whether the event involved an impact, friction or flow than whether

the materials involved were water, wood, metal or air. These results are contrasted with those of

Vanderveer (1979), who obtained much more accurate judgements, but Ballas and Howard offer the

explanation that Vanderveer's stimuli (e.g. jingling keys, fingers drumming on a table) were

presented in an appropriate context, a seminar room, and that this cued the listeners. Their

conclusion is that, taken in isolation, the meanings of individual environmental sounds (i.e. the

identities of the source events) can be ambiguous, as can the meanings of isolated words.

A second study attempted to draw a parallel between sound and speech homonyms using an

Information Theory approach. Listeners were again asked to identify the recorded sounds from the

first experiment and to rate the confidence of their identifications. The responses were sorted into

categories and the "entropy" of each sound calculated based on the number of different categories

into which it was placed. The correlation between confidence and entropy was significant,

suggesting that identification is affected by the number of different causes to which a sound might

be attributed. The authors also suggest that identification might be influenced by the frequency of

occurrence of particular sounds in the same way that word recognition depends on frequency, but

they admit that quantifying this may be difficult.
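The "entropy" measure can be illustrated with a short calculation. This is a minimal sketch that assumes the standard Shannon entropy over the proportions of responses falling in each category. The text above describes the measure only as based on the number of different categories, so the exact formula used by Ballas and Howard is an assumption here, and the response data are invented for illustration.

```python
import math
from collections import Counter

def response_entropy(responses):
    """Shannon entropy (in bits) of the distribution of response categories."""
    counts = Counter(responses)
    total = sum(counts.values())
    # p * log2(1/p) summed over categories; zero when all responses agree.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Hypothetical categorized identifications for two sounds.
unambiguous = ["drip"] * 10                                # every listener agrees
ambiguous = ["hammer", "machine", "car crash", "door"] * 3  # four equally likely causes

print(response_entropy(unambiguous))
print(response_entropy(ambiguous))
```

A sound that every listener places in the same category has zero entropy, and entropy grows as responses spread over more alternative causes, which is the sense in which a high-entropy sound resembles a homonym.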

The final two studies used the same set of sounds presented in sequences and were concerned

with the effect of context on the identification of individual sounds within sequences or the learning


of sequences as whole units. Context was found to influence the interpretation of individual sounds.

For example a hammer striking a pipe was thought to be a factory machine in one sequence and a

car crash in another. This effect is compared to the resolution of homonym meaning in sentences:

"... it appears that the integration of sequences of sounds resembles the integration of sequences of words in a sentence. In the latter case, multiple interpretations of each word might be activated initially and all but one eliminated on the basis of the context provided by the other words." (Ballas & Howard 1987)

Although not mentioned by the authors, the activation and inhibition which they propose could

perhaps be investigated using the established tools of experimental psycholinguistics.

In the final experiment, listeners were asked to learn sequences of two sorts. One set

contained randomly-ordered combinations of drips, clangs, flushes etc., while the other consisted of

causally-sensible structured sequences created using a small finite state grammar. In addition, half

the subjects in each condition were informed that they would hear sounds involving water and half

were given no instruction. The structured sequences were learned more quickly than the random

ones, and there was an interaction with the instructions given. Prior information aided those learning

the structured patterns but hindered those learning the random ones. The interpretation is that the

expectation of causally-logical sequences interfered with the learning of random patterns. In effect

there is held to be a grammar of causality which listeners use to parse environmental sound

sequences. Jackendoff (1987) makes similar claims about the representation of visual events and

their relationship to language.
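The contrast between structured and random sequences can be sketched as follows. The grammar below is a hypothetical toy, since the states and transitions of Ballas and Howard's grammar are not given here. Only the general idea, that each sound constrains which sounds may follow it, is taken from the description above.

```python
import random

# Hypothetical finite state grammar: each sound maps to the sounds that
# may causally follow it. "END" terminates a sequence.
GRAMMAR = {
    "START": ["valve_turn"],
    "valve_turn": ["flow"],
    "flow": ["drip", "flush"],
    "drip": ["drip", "clang", "END"],
    "flush": ["clang", "END"],
    "clang": ["END"],
}

def structured_sequence(rng, max_len=8):
    """Random walk through the grammar from START until END or max_len."""
    state, seq = "START", []
    while len(seq) < max_len:
        state = rng.choice(GRAMMAR[state])
        if state == "END":
            break
        seq.append(state)
    return seq

def random_sequence(rng, length):
    """Sounds drawn independently, ignoring the grammar."""
    sounds = [s for s in GRAMMAR if s != "START"]
    return [rng.choice(sounds) for _ in range(length)]

def is_grammatical(seq):
    """True if every transition in seq is licensed by GRAMMAR."""
    prev = "START"
    for sound in seq:
        if sound not in GRAMMAR.get(prev, []):
            return False
        prev = sound
    return True

rng = random.Random(1)  # fixed seed so the sketch is repeatable
s = structured_sequence(rng)
r = random_sequence(rng, len(s))
print(s, is_grammatical(s))
print(r, is_grammatical(r))
```

A listener who expects grammatical sequences could, on this view, predict each sound from its predecessor in the structured set, whereas no such expectation helps with sequences drawn independently.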

An Ecological response to this might question the validity of results obtained with sounds

taken out of an environmental and causal context. In other words, Ballas and Howard might have

too restricted a view of what should comprise an environmental sound or stimulus. Fodor and

Pylyshyn (1981) discuss this move with respect to the phonemic restoration effect and conclude that

widening the conception of the effective stimulus allows the resolution of ambiguity, but

concomitantly reduces the ability to explain the perceptual similarity which can occur in differing

contexts. For direct theorists who hold that the auditory system seeks to recover the sound-

producing physical events, the existence of sound homonyms may pose no problem, since these


sounds are often produced by similar physical systems. Ballas and Howard give the example of a

loud sharp bang, which could be caused by an engine backfire, a gun, or an explosion. In all cases

the physical cause of the sound is the rapid expulsion of air from an enclosure, but, as I argue in

Section 6.4, the environmental significances of these causes differ significantly, and identification

is important.

5 Ecological ecological research

The number of auditory studies explicitly inspired by the Ecological approach is not large. A

substantial portion of the literature consists of speculative discussions of the applicability of direct

or Ecological accounts to audition rather than descriptions of experimental work. The five examples

discussed below have been chosen to indicate the types of experiments performed and the relative

successes and failures which were encountered.

The influence of the Ecological approach manifests itself in the objectives of particular

experiments or studies and consequently in their design. Characteristic aims are: the discovery of

invariants of stimulation; obtaining evidence that perception is causally related to these invariants,

which in turn is taken as evidence for direct perception; and characterizing the manner in which

perception guides action. The search for invariants involves either mathematical analysis or physical

measurements of a given environmental situation. In order to show that perceptual systems actually

utilize a particular invariant, its presence must be shown to be a sufficient condition for the relevant

percepts to arise. Thus, observers must be shown to be able to perceive the environmental property

which the invariant specifies and their percepts must be alterable by experimental manipulations of

the invariant.

It is of course difficult to prove experimentally that perception is unmediated, particularly

since the putative mediating processes are presumably unconscious and inaccessible to introspection.

The argument for direct perception therefore generally consists of the identification of an invariant,

verification of its efficacy, and a subsequent appeal to parsimony. If perception appears to be a

function of stimulation, why invoke unconscious inference or other processes? Ultimately this is a

somewhat unsatisfying approach since it must proceed case-by-case and leaves open the question

of directness in situations for which no invariant has yet been discovered. However, one may also

hold that since it is an empirical matter, there is no logical inconsistency in simply assuming the

existence of specification until it is disproved (Fowler 1991).

The style of experimentation also differs from the bulk of Establishment perceptual research

in the types of stimuli used and the types of responses required of participants. Typically, the stimuli

are complex or "realistic". Subjects are asked to characterize events or perform certain actions based

on their perceptions. These are frequently more complex or natural actions than the typical

psychophysical discrimination task.

5.1 Time-to-contact: acoustic looming

The derivation and investigation of an acoustic variable for time-to-contact provides a good model

of the Ecological approach. In vision, the inverse of the relative rate of expansion of an object's retinal

projection (r / dr/dt) specifies the time-to-contact if the object is moving directly towards the observer. Shaw,

McGowan and Turvey (1991) derive an acoustic equivalent based on the simplifying assumptions

that the source is a compact monopole, the acoustic medium is non-absorbing, and the surroundings

are anechoic. Under these conditions, acoustic time-to-contact, or tau_a, is equal to twice the inverse

of the relative rate of change in intensity (2I / dI/dt) at the observer's position. If time-to-contact

were to be deduced only from successive "snapshot" judgements of distance, accuracy would suffer,

since estimation of auditory distance is notoriously poor (Gardner 1969). Prior to any experimental

verification of the effectiveness of this invariant, Guski (1992) questioned whether the auditory

system could in principle use this variable since he thought it required access to the absolute intensity

of the sound source. This concern seems to be based on a misapprehension; the intensity in question

is not that of the source, but the proximal intensity. The acoustic tau is independent of overall

intensity and distance, just as the visual one is independent of size.
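
The independence of acoustic tau from source level can be illustrated numerically. The following is only a minimal sketch under the idealized conditions stated above (compact source, inverse-square spreading, no absorption or reflections); the function names, source powers, distance, and approach speed are hypothetical values chosen for illustration and are not taken from Shaw, McGowan and Turvey (1991).

```python
import math

def proximal_intensity(power_w, distance_m):
    """Inverse-square intensity (W/m^2) at the observer for an idealized point source."""
    return power_w / (4.0 * math.pi * distance_m ** 2)

def acoustic_tau(power_w, start_distance_m, speed_m_s, t, dt=1e-4):
    """Estimate 2I / (dI/dt) at time t for a source approaching head-on at constant speed."""
    def intensity(time_s):
        return proximal_intensity(power_w, start_distance_m - speed_m_s * time_s)
    # Central finite difference approximates the rate of change of proximal intensity.
    d_intensity = (intensity(t + dt) - intensity(t - dt)) / (2.0 * dt)
    return 2.0 * intensity(t) / d_intensity

# Hypothetical scenario: source starts 50 m away and approaches at 10 m/s; evaluate at t = 2 s.
true_time_to_contact = (50.0 - 10.0 * 2.0) / 10.0
tau_quiet_source = acoustic_tau(0.001, 50.0, 10.0, 2.0)
tau_loud_source = acoustic_tau(10.0, 50.0, 10.0, 2.0)
print(true_time_to_contact, tau_quiet_source, tau_loud_source)
```

In this sketch the two estimates agree with each other and with the true time-to-contact even though the source powers differ by four orders of magnitude, because the source power cancels in the ratio of intensity to its rate of change.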

A number of studies have examined the ability of listeners to judge the time-of-passage of

a moving sound source (for example Rosenblum et al. 1987), but these do not directly address the

effectiveness of the tau_a invariant since other sources of information such as intensity and Doppler

shift changes also specify the time of closest approach. Tau_a offers prospective information for

time-of-arrival, and therefore the important test is whether arrival time can be accurately predicted from

acoustic information collected before the "collision" when other variables are uninformative.

Rosenblum (1993) describes an experiment in which recordings of cars passing an observer at

various speeds were edited into thirds to evaluate the usefulness of information from different

portions of the stimulus. The results indicate that information available prior to passage is as useful

in estimating arrival time as hearing the actual passage. Jenison (1994) has derived variables

involving intensity, interaural time difference, and Doppler shift which are cues to parameters of the

more general approach problem, in which the source moves past the observer at some distance and

at a particular trajectory angle. Wightman & Jenison (1995) report data from an experiment using

such synthesized stimuli which show that listeners can use prospective information to discriminate

arrival times differing by about 300 ms.

While the effectiveness of this invariant seems to have been established, the assumptions

under which it was derived are actually quite restrictive. Sources radiating short wavelengths cannot

be approximated by compact monopoles and in reverberant environments the invariant applies only

to the direct signal. I shall discuss issues of this sort in the concluding sections of the paper.

5.2 Using auditory information for active contact

The studies of acoustic tau discussed above required subjects to judge time-to-contact independent

of any other action. In an experiment conducted by Heine and Guski (1993), participants were

requested to catch a ball rolling towards them using only acoustic information. The balls were

released on a ramp which they rolled down and continued towards the edge of a table at which the

subject was seated. Only a single reach-and-catch gesture was permitted, so good performance

depended on estimation of time-to-contact from the sound produced by the ball.

While results varied with the size of ball used (and hence the strength of the sound produced),

performance turned out to be quite poor overall. The authors advance various explanations for this,

the first of which is that the experiment was conducted in an anechoic room, a condition under which

it is very difficult to judge distance auditorily. This seems like an unfortunate point to raise, since

the advantage of the "looming" invariant for time-to-contact is that it is independent of distance. If

distance judgements are required, the case for the efficacy of the invariant is undermined. A second

point raised is that sighted humans rarely rely only on acoustic information in natural situations.

Hearing typically aids orientation and preparation for visually-guided action. However the fact that

blind athletes can apparently use similar information to play games involving rolling balls leads the

authors to conclude that sufficient information is present in the acoustic signal, but that their sighted

subjects were not attuned to it.

5.3 Transformational invariants: breaking & bouncing

An early and frequently-cited example of Ecological acoustics is the study of the perception of

breaking and bouncing by Warren and Verbrugge (1984). The emphasis is on identifying

transformational invariants (specifying a dynamic characteristic) for bouncing and breaking events.

It is suggested that a "single damped quasi-periodic pulse train" specifies a bouncing event and that

an "initial rupture burst dissolving into overlapping multiple damped quasi-periodic pulse trains"

specifies breaking. Subjects listened to natural tokens of bottles and jars hitting a linoleum floor and

were asked to identify the type of event independent of the material involved. In addition to the

breaking and bouncing categories, subjects were encouraged to respond "don't know" if they could

not decide or if they perceived some other type of event. Given this three-choice task, correct

identification was better than 98% for both types of tokens. To verify that the hypothesized

invariants do specify the two types of events, synthetic tokens were constructed using recorded

sounds from four single pieces of glass. Here correct identification was 90.7% for bouncing and

86.7% for breaking.

It is of course possible that subjects used prior knowledge of similar events to perform the

classification rather than perceiving them directly via the temporal patternings. As the authors

acknowledge, an additional problem is the response method used. If these temporal structures truly

specify the events, then rates of correct identification should be unaffected by the number of different

sorts of events to be identified. If non-breaking and non-bouncing events were included, would

performance deteriorate? Predefining the categories brings to mind a criticism which has been

leveled at Establishment theorists; Turvey et al. (1981) state that those opposed to Establishment

theory should ask of its proponents "both why and how any given thing comes to be described in just

those predicates that are consonant with the hypothesis mediating its interpretation." By restricting

the responses and the categories, perhaps Warren and Verbrugge have cast the task into the form of

a statistical inference problem.

5.4 Perceiving numbers by audition

Occasionally, as in the ball-catching study, invariants of stimulation may exist, but seem to be

poorly-utilized by observers. The task in this experiment (Heine, Guski & Pittenger 1993) was for

listeners to estimate the number of steel balls dropped and allowed to bounce on a wooden surface.

For a single ball the sound consisted simply of a series of impacts, while with two or more balls there

were also collisions between balls. Recordings were made in an attempt to find acoustical correlates

of the number of balls dropped. Correlation coefficients with magnitudes from 0.95 to 0.99 were

found between the number of balls and the peak sound level, the time interval between the first and

second bounces, and the overall duration of the event.

Although subjects were able to identify the single-ball case reliably, in all other cases the

number of balls tended to be under-estimated, and the variability in responses was high. In fact,

from the data presented, it does not appear that listeners could reliably distinguish between 2 and 9

balls. The explanations offered for this result are similar to those in the ball-catching experiment.

The task is somewhat unnatural and makes atypical demands on the auditory system, which may not

be attuned to pick up the acoustic invariants available. The authors again make an un-Ecological

remark about the subjects' lack of knowledge of the situation. Apparently when shown the

experimental setup before being blindfolded the correlation between judgements and number of balls

increased from 0.73 to 0.84, which suggests that prior, non-auditorily-derived knowledge of the

situation may be as important as "attunement", which was not demonstrated.

5.5 Acoustic texture in distance perception

The final example of Ecologically-inspired experimentation is an investigation of the utility of

providing "acoustic texture" in a distance judgement task (Höger 1993). Gibson (1979) proposed

that texture gradients are invariants for surface slant and that the amount of texture occluded by an

object serves to specify its distance from the observer. By (rather weak) analogy, "it is assumed that

characteristic changes of background sounds from different locations constitute an acoustic texture

gradient of depth". Four loudspeakers were positioned at 4 m increments from the listener, whose

task it was to identify the position from which one of three sounds (truck, dog or ducks) was

presented. In a "texture" condition a recording of singing birds was played in a random order from

each loudspeaker prior to presentation of the test stimulus.

The data revealed no significant effect of adding texture except at one distance for the truck

sound. A second experiment employed monaural recordings of stationary or moving cars at various

distances. These were presented over headphones with and without texture, and listeners were asked

to report the apparent distance to the car. Texture had no effect for the stationary car, but slightly

improved a tendency to underestimate distance in the moving car condition. This bias did not exist

for the stationary car, which is puzzling since moving stimuli contain dynamic Doppler shift and

intensity cues to distance, and hence judgements might be expected to be more accurate.

It is clear that acoustic texture cannot specify distance in the way that visual texture is held

to do. In the visual case, occlusion is essential and this does not exist in the auditory case. In

Section 6, I argue that attempts of this sort to apply the principles of visual ecological theory directly

to the auditory realm are ill-advised.

6 Prospects for an Ecological account

My third aim in this paper is to have a critical look at the current status of Ecological accounts of

audition in order to assess their successes and shortcomings. Since Gibson's approach is so deeply

rooted in vision, the first step taken is to examine the differences between the auditory and visual

ecologies. Following this, I discuss the problem of the superposition of acoustic signals, of acoustic

specificity, and of auditory affordances.

6.1 Auditory ecology

A theory of audition (whether Establishment or Ecological in style) must take account of the

particulars of acoustic ecology. The manner in which sound is usefully structured by the world

differs greatly from the way light is, and therefore auditory systems (and auditory theories) are faced

with many challenges dissimilar to those found in vision. Although many differences can be listed,

I contend that the root cause is the fact that audible sound has very long wavelengths in comparison

to those of light. The range of human hearing covers wavelengths from 10m to 2cm, while light

sensitivity covers wavelengths from approximately 400 to 700 nm. Light and sound are both

wave phenomena, but their differing scales mean that the manner in which they interact with the

same objects in the world is dissimilar.

The first consequence of wavelength is that there can be no "acoustic retina". To achieve

the same spatial resolving power as the eye, an acoustic lens would need a diameter of approximately

200 m for the highest frequencies and 100 km for the lowest. The transduction of sound is therefore

non-directional; sound impinging upon the listener from any direction is "projected" to a single point

- the eardrum. There is no geometrical preservation of space or place-to-place mapping from the

world to a receptor surface as there is in vision. The challenge facing the visual system is often

stated in the form of the inverse projection problem. Given a 2-dimensional retinal projection, there

are infinitely many 3-D surface layouts which could have produced it. Clearly the problem is even

worse in audition since the projection is from three dimensions to a 0-dimensional point. The

situation is ameliorated somewhat by the facts that we possess two ears and that sound travels

relatively slowly, allowing interaural time differences to specify one component of source direction.
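
The lens diameters quoted above can be recovered, to order of magnitude, from the fact that diffraction-limited angular resolution depends on the ratio of wavelength to aperture. The sketch below assumes a pupil diameter of 5 mm and a representative light wavelength of 500 nm; these two values are illustrative assumptions rather than figures stated in the text, and the function name is hypothetical.

```python
# Assumed optical parameters of the eye (illustrative values, not from the text).
PUPIL_DIAMETER_M = 5e-3
LIGHT_WAVELENGTH_M = 500e-9

def matching_aperture_m(sound_wavelength_m):
    """Aperture that keeps the same aperture-to-wavelength ratio as the eye."""
    return sound_wavelength_m * PUPIL_DIAMETER_M / LIGHT_WAVELENGTH_M

# Wavelength limits of hearing as given in the text: 2 cm and 10 m.
aperture_shortest_m = matching_aperture_m(0.02)
aperture_longest_m = matching_aperture_m(10.0)
print(f"{aperture_shortest_m:.0f} m, {aperture_longest_m / 1000:.0f} km")
```

With these assumed values the aperture-to-wavelength ratio is 10,000, which gives roughly 200 m at the shortest audible wavelength and roughly 100 km at the longest.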

A second consequence of sound's large wavelengths is that sound-emitting objects, unless

they are very large, do not occlude other sources in the way that visual objects do. Diffraction permits

sound to sweep past objects and to propagate around corners, and thus occlusion cannot provide

information about the relative distances of interposed objects. Auditory masking is sometimes

compared to visual occlusion, but the processes are really quite different. An intense sound will

mask other sounds independent of their direction of origin, and there is no way to "listen around" a

masker. It is a cotemporal process rather than a codirectional one.

The combination of 3-D to 0-D projection and the lack of occlusion means that the auditory

system is faced with determining the spatial positions and character of sources whose sounds

are superimposed at the receptor. There is no independent access to sounds from different

directions or at different distances, and the information from all concurrently active sources must

pass through a single channel. Somehow this information gives rise to percepts of individual

sources.

The situation is further confounded by a third consequence of wavelength, which is that

sound reflection is specular and maintains the important temporal structure of the original source

signal. It is generally specular because sound-reflecting surfaces are much smoother at the

wavelength scale than are the same surfaces when reflecting light. Frequently there is little to

distinguish an echo from an additional source, and these reflections are themselves superimposed

on the signal at the eardrum.

Because of sound's long wavelength and our lack of acoustic retinae, the information

contained in sound reflected from an object is rather low-resolution. Humans, unlike bats, rely

primarily on the sound-emitting properties of objects rather than their sound-reflecting properties.

Bregman vividly sums up the situation this way:

"This way of using sound has the effect of making acoustic events transparent; they do not occlude energy from what lies behind them. The auditory world is like the visual world would be if all objects were very, very transparent and glowed in sputters and starts by their own light, as well as reflecting the light of their neighbors. This would be a very hard world for the visual system to deal with." (Bregman 1990)

Helmholtz also addresses the problem of superposition in his discussion of the separation of systems

of ripples on the surface of a body of water:

"But the ear is much more unfavorably situated in relation to a system of waves of sound, than the eye for a system of waves of water. The ear is affected only by the motion of that mass of air which happens to be in the immediate neighborhood of its tympanum within the aural passage. ... The ear is therefore in nearly the same condition as the eye would be if it looked at one point of the water through a long narrow tube, which would permit of its seeing its rising and falling, and were then required to undertake an analysis of the compound waves. It is easily seen that the eye would, in most cases, completely fail in the solution of such a problem. The ear is not in a condition to discover how the air is moving at distant spots, whether the waves which strike it are spherical or plane, whether they interlock in one or more circles, or in what direction they are advancing. The circumstances on which the eye chiefly depends in forming a judgement, are all absent for the ear.

If, then, notwithstanding all these difficulties, the ear is capable of distinguishing musical tones arising from different sources - and it really shews a marvelous readiness in so doing - it must employ means and possess properties altogether different from those employed or possessed by the eye." (Helmholtz 1877)

Thus, the auditory system relies mainly on different sorts of structures in stimulation than the visual

system - temporal ones rather than spatial. The acoustic signal therefore supplies information about

very different properties of objects than does light, and this potentially leads to a further source of

ambiguity in the stimulus. I will discuss the problem of acoustic specificity shortly, but first wish

to examine the significance of the superposition problem for an Ecological theory of hearing.

6.2 Superposition

A central tenet of the Ecological approach is the idea that what count as stimuli should be broadened

with respect to the traditional view. Thus the stimulus in vision is taken to be the optic array, rather

than the retinal image. There is no reason why an acoustic array could not be defined to give spectral

content as a function of time and direction over a sphere centered on the listener. However it is not

clear that defining the stimulus in this way is of much use since there is no directional access to this

array prior to transduction. One cannot sample the acoustic array in the same sense that the visual

system can sample the optic array. Directional information such as binaural difference cues and

direction-dependent pinna filtering might be held sufficient to define unambiguously the location

of a source, but these are properties of sounds corresponding to individual sources and not of the

complex superposition of signals at the eardrum.

In general, superposition seems to be an unaddressed and difficult problem for direct realism.

Proposed invariants such as the acoustic tau and Warren and Verbrugge's bounce-specifying

temporal patterns are properties of individual sources or events. If a listener is presented with a

stationary source and a looming one, tau_a of the overall signal does not specify time-to-contact. The

sources must be separated so that only those components belonging to the moving source are

subjected to the looming "computation". Again, suppose a bouncing event is heard simultaneously

with babble from a group of speakers - the stimulus as a whole will not take the form of a quasi-

periodic pulse train.
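
The first of these examples can be sketched numerically. Under the same idealized assumptions used to derive acoustic tau (point sources, inverse-square spreading), and assuming additionally that the intensities of the two sources simply add at the listener, the quantity 2I / (dI/dt) computed on the summed signal no longer equals the time-to-contact of the looming source. All names and numerical values below are hypothetical.

```python
import math

def point_source_intensity(power_w, distance_m):
    """Inverse-square intensity (W/m^2) of an idealized point source."""
    return power_w / (4.0 * math.pi * distance_m ** 2)

def tau_from_signal(intensity_fn, t, dt=1e-4):
    """Apply the looming 'computation' 2I / (dI/dt) to an arbitrary intensity signal."""
    d_intensity = (intensity_fn(t + dt) - intensity_fn(t - dt)) / (2.0 * dt)
    return 2.0 * intensity_fn(t) / d_intensity

def looming(t):
    # Hypothetical source 30 m away at t = 0, approaching head-on at 10 m/s.
    return point_source_intensity(1.0, 30.0 - 10.0 * t)

def stationary(t):
    # Hypothetical source of equal power fixed 5 m from the listener.
    return point_source_intensity(1.0, 5.0)

def mixture(t):
    # Assumption: incoherent sources, so intensities add at the receptor.
    return looming(t) + stationary(t)

tau_looming_alone = tau_from_signal(looming, 1.0)  # true time-to-contact is 2 s
tau_of_mixture = tau_from_signal(mixture, 1.0)
print(tau_looming_alone, tau_of_mixture)
```

In this scenario the stationary source contributes sixteen times the intensity of the looming one at the moment of evaluation, so the value computed on the mixture is about seventeen times the true time-to-contact. The invariant only becomes informative again once the looming source's contribution has been separated out.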

As mentioned previously, Gibson (1966) suggests that orienting to a sound source

synchronizes the arrivals at the two ears, but separation is also possible with monaural listening and

with diffuse, unlocalizable sources. It is hard to imagine how separation occurs without something

like the segregation and fusion processes proposed by Yost and Bregman, but these seem to operate

heuristically and to impose a structure on the stimulation. Ultimately the percepts derived seem to

owe as much to the processes of separation as to the sound-structuring properties of the environment.

This is not the sort of explanation proponents of Ecological perception have in mind, but no serious

alternative has been proposed.

A point to note is that in the domains where the Ecological approach has been most

successfully applied, vision and haptics, the superposition problem does not exist. Only one object

can be in contact with the skin at any point, and only light from the nearest surface in a particular

direction contributes to the optic array. The fact that source separation is less problematic in these

modalities perhaps explains why it has not been dealt with in Ecological accounts of audition.

6.3 Specification

Setting aside the issue of superposition, let us consider specification in the auditory domain. For a

direct account to succeed, detectable properties of the acoustic signal must stand in a one-to-one

relationship with the perceived properties of sound sources. The source-to-sound mapping is clearly

unique, but, even for a single source in a noise-free environment, can we be sure that the reverse

mapping is also unique? Can the inverse problem of recovering the causal source of the acoustic

signal be solved?

For a number of simple sound-producing systems it seems that it cannot. First consider the

Helmholtz resonator, which consists of a vessel enclosing a volume of air with a neck containing a

"plug" of air. The resonant frequency of such a device depends only on the mass of air in the plug

and on the volume of air in the main chamber. Vessels of many shapes and sizes can produce the

same sound, and therefore these parameters cannot be specified. Similarly, the frequency of

vibration of a stretched string depends on its length, mass, and tension. Thus length, for example,

cannot be specified since a change in length can always be compensated by appropriate adjustments

in tension or mass.
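
The string case can be made concrete with the standard ideal-string relation, in which the fundamental frequency is f = (1/2) sqrt(T / (m L)) for tension T, total mass m, and length L. This relation is a textbook idealization (perfectly flexible string, fixed ends) rather than something derived in this paper, and the parameter values in the sketch are hypothetical.

```python
import math

def string_fundamental_hz(length_m, mass_kg, tension_n):
    """Fundamental frequency of an ideal stretched string with fixed ends."""
    return 0.5 * math.sqrt(tension_n / (mass_kg * length_m))

# Two hypothetical strings of equal mass: doubling the length is
# compensated exactly by doubling the tension.
short_string_hz = string_fundamental_hz(length_m=0.5, mass_kg=0.001, tension_n=100.0)
long_string_hz = string_fundamental_hz(length_m=1.0, mass_kg=0.001, tension_n=200.0)
print(short_string_hz, long_string_hz)
```

Because only the combination T / (m L) enters the formula, any number of length, mass, and tension triples yield the same fundamental, so the frequency alone cannot specify any one of them.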

The 2-dimensional counterpart of the string, the stretched membrane or drum, also suffers

from this same ambiguity. While the frequencies of various modes of vibration provide information

about the area of the membrane and the length of its perimeter, it has been proven that drums of

different shapes can vibrate with exactly the same set of frequencies when struck (Cipra 1992,

Driscoll 1995). Hence "one cannot hear the shape of a drum" (Gordon et al. 1992). Finally, it has

been demonstrated that identical vowel spectra can be produced by the human vocal tract in very

different configurations (Ladefoged et al. 1978), and in principle a given set of formants can be

produced by a variety of vocal tract area functions. This is a problem for those who maintain that

speech is perceived on the basis of articulator position recovery.

No amount of sampling or scanning of the acoustic array can resolve the ambiguities, so it

must be assumed that these particular sound-specifying parameters are not specified in the sound

produced. Perhaps these are merely overly-simplified systems, which Gaver might group with

musical sounds:

"Musical sounds are not representative of the range of sounds we normally hear. ... Musical sounds seem to reveal little about their sources, whereas everyday sounds provide a great deal of information about theirs." (Gaver 1993a)

Fowler (1991) states that a claim that we "hear the world" is not a claim that we hear every property

of the world or that every different thing is perceived differently, but claims that specificity exists

nearly always "for relevant properties of objects and events with which we interact". This

assertion seems vaguely circular, since it would be rather lucky for us to live in a world where no

relevant properties of objects are unspecified. It is clear that there are properties which cannot be

specified acoustically - whether these are relevant or not is a matter for debate. There are also

properties which do seem to be specifiable. For example the elasticity of a vibrating material is

indicated by the decay rate of vibration when it is struck (Wildes & Richards 1988). Fowler also

refers to the rareness of "mirages" outside of the laboratory, but in addition to the sound homophones

described by Ballas, one can think of more natural examples. Gibson mentions that thunder 'cracks',

but tree branches also 'crack', and the physical causes are quite dissimilar.

Given that some properties of the systems discussed cannot be specified, it is necessary either

to suggest means of resolving ambiguity, to refine the idea of what it means to recover the source,

or to abandon the inverse problem altogether (Kluender 1991). For an Ecological account, supplying

the perceiver with knowledge of the constraints of the system is not an option. For example if one

knew the possible configurations for a human vocal tract it might be of assistance in recovering

articulator positions, although in modeling this appears to be difficult even with careful X-ray

measurements of one individual speaker (Bailly et al. 1991). Fowler's move to block "premature

allegations of lack of specificity in acoustic speech signals" (1991) is the proposal that in running

speech the situation is different. The requirement that the current configuration must be smoothly

connected to those before and after may constrain the problem enough to yield a unique solution.

A different view is held by Kluender (1991) who refers to work on visual structure-from-motion in

maintaining that once rigidity is given up (and he claims it must) all bets are off in solving the

inverse problem. Gaver (1993b) recognizes the limits of specification and suggests that what are

specified are constraints on solutions to the inverse problem.

Coupling this with Fowler's position that we do not hear everything, the question seems to

be what do we hear? With how fine a brush is the auditory world painted, and can the answer to that

question be accounted for by the information available in acoustic stimulation? Answering these

questions is of course rather difficult since even in free identification tasks it is impossible for

listeners to describe every aspect of their percepts. Discrimination studies leave open the question

of whether responses are based on recovery of source properties or simply on differences in the

acoustic signals. Studies in which subjects are asked to detect source-properties often limit the

domain of responses, and thus do not speak directly to the specificity question. Examples of the

latter are studies of the perception of breaking and bouncing (Warren and Verbrugge 1984), hand-

clapping (Repp 1987), and mallet hardness (Freed 1990).

The foregoing comments mainly concern the specification of shape and vibrational properties

of sound emitters, but questions of specificity also exist in determining the spatial layout of sources

and the environment in which they are active. In determining the direction from which sound is

arriving the auditory system, absent head movements, depends on binaural difference information

and the directional filtering performed by the pinnae. The spectrum of the sound reaching the

eardrum is ambiguous with respect to this spectral cue because the contributions of the source

spectrum and the pinna filtering are not separately available. Yet listeners can localize sounds

without employing head movements. The explanation for this achievement has traditionally been

that listeners employ a priori knowledge of the source spectrum to recover the pinna filter function

and to identify the source position, although this idea has not been tested rigorously.
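
The ambiguity can be made concrete with a minimal numerical sketch. It assumes, purely for illustration, that the eardrum spectrum in decibels is the band-by-band sum of a source spectrum and a direction-dependent pinna gain. The four band values are invented and do not come from any measurement.

```python
# Hypothetical band levels in dB for four frequency bands (illustrative only).

def eardrum_spectrum(source_db, pinna_gain_db):
    """Combine source level and pinna gain per band (log-domain sum)."""
    return [s + g for s, g in zip(source_db, pinna_gain_db)]

# Case A: a spectrally flat source heard through a pinna filter with a notch.
flat_source = [60.0, 60.0, 60.0, 60.0]
notched_pinna = [0.0, 3.0, -10.0, 2.0]

# Case B: a source that itself has the dip, heard through a flat filter.
dipped_source = [60.0, 63.0, 50.0, 62.0]
flat_pinna = [0.0, 0.0, 0.0, 0.0]

eardrum_a = eardrum_spectrum(flat_source, notched_pinna)
eardrum_b = eardrum_spectrum(dipped_source, flat_pinna)

# The two physically different situations yield the same proximal spectrum.
same_at_eardrum = eardrum_a == eardrum_b
print(eardrum_a, eardrum_b, same_at_eardrum)
```

Without an independent constraint on the source spectrum, the eardrum signal alone cannot decide between the two cases.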

The sound field at a listener's position is structured not only by the sources of sound but also

by the layout of reflecting surfaces in the environment, and it is often proposed that by consequence

of this we can hear the location of these surfaces. A previously-mentioned example was that "we

can hear the narrow, echoing walls of the alley it [a car] is driving along" (Gaver 1993a). We can

obviously tell the difference between the interior of a cathedral and a coat closet, but how much

information for the layout of surfaces is actually present? Consider two properties of a sound field

which depend on the characteristics of the enclosure in which events occur: the reverberation time

and the direction of arrival of reflections. It has been shown that reverberation time of a room is

directly proportional to its volume and inversely proportional to the surface area of the walls and

their absorptivity (Morse and Ingard 1968). Therefore the shape of the room cannot be specified by

this parameter. Secondly, in an experiment using synthesized stimuli carried out in our laboratory

(HDRL, Waisman Center) we found that subjects were unable to discriminate between cases in

which wall reflections accurately duplicated those in a rectangular room and those in which the

reflections came from arbitrary directions with the same distribution of time delays. In this case the

auditory system was not sensitive to the locations of the walls, but only to their distances relative to

the listener and the source. Thus only "fuzzy" information about the layout of surfaces in the

environment seems to be present in the acoustic array.
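
The point about reverberation time can be illustrated with a short sketch. It assumes the standard Sabine approximation, T60 = 0.161 V / (S a) in metric units, with a single absorption coefficient a for every surface. The room dimensions and the coefficient are hypothetical.

```python
import math

def sabine_rt60(volume_m3, surface_m2, absorption_coeff):
    """Sabine estimate of reverberation time in seconds (metric units)."""
    return 0.161 * volume_m3 / (surface_m2 * absorption_coeff)

def box_room(length_m, width_m, height_m):
    """Return (volume, total wall surface area) of a rectangular room."""
    volume = length_m * width_m * height_m
    surface = 2 * (length_m * width_m + length_m * height_m + width_m * height_m)
    return volume, surface

ALPHA = 0.2  # assumed uniform absorption coefficient

cube_volume, cube_surface = box_room(6, 6, 6)   # V = 216 m^3, S = 216 m^2
flat_volume, flat_surface = box_room(4, 8, 8)   # V = 256 m^3, S = 256 m^2

rt_cube = sabine_rt60(cube_volume, cube_surface, ALPHA)
rt_flat = sabine_rt60(flat_volume, flat_surface, ALPHA)

# Differently shaped rooms with the same volume-to-surface ratio
# have the same predicted reverberation time.
same_rt = math.isclose(rt_cube, rt_flat)
print(rt_cube, rt_flat, same_rt)
```

Under this approximation only the ratio of volume to absorbing area enters, so any two enclosures sharing that ratio are indistinguishable by reverberation time.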

The final observations I will make about specificity concern the auditory perception of

distance. In an Establishment analysis, the auditory system is faced with an inverse projection

problem exactly analogous to that in vision. An image projected on the retina could arise from an

object at any distance if its size is chosen appropriately, and (in an open space without reflecting

walls) a sound of given proximal intensity could be caused by a source at any distance given the

appropriate sound level. Another feature which varies with distance, the absorption of high

frequencies, is ambiguous in the same way as the pinna filtering cue to direction. Gibson's solution

to the visual problem is to point out that objects are not encountered floating in a featureless void,

but that they generally appear against some sort of textured background (Gibson 1979). The

amount of texture surrounding and occluded by the object is held to specify its distance and size, but

a similar invariant cannot exist in audition because there is generally no such thing as acoustic

occlusion.
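
The intensity ambiguity can be sketched numerically. The example assumes an idealized point source radiating equally in all directions in a free field, so that intensity falls off with the square of distance. The source powers and distances are hypothetical.

```python
import math

def free_field_intensity(power_w, distance_m):
    """Intensity (W/m^2) at a given distance from an ideal point source."""
    return power_w / (4 * math.pi * distance_m ** 2)

def intensity_level_db(intensity_w_m2, reference_w_m2=1e-12):
    """Intensity level in dB relative to the conventional 1e-12 W/m^2."""
    return 10 * math.log10(intensity_w_m2 / reference_w_m2)

# A weak source nearby and a source 16 times as powerful 4 times as far away.
near_weak = free_field_intensity(power_w=0.001, distance_m=1.0)
far_strong = free_field_intensity(power_w=0.016, distance_m=4.0)

# Both produce the same proximal intensity at the listener.
same_intensity = math.isclose(near_weak, far_strong)
print(intensity_level_db(near_weak), intensity_level_db(far_strong), same_intensity)
```

Any pair of power and distance with the same ratio of power to squared distance gives the same value, which is the auditory counterpart of the size and distance trade-off in the retinal image.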

Höger (1993) attempted to devise an auditory counterpart to Gibson's surface texture, but

found little improvement in the accuracy of distance judgements when "texture" was added. I feel

that this experiment is an example of a tendency which, at worst, leads to the assumption that

principles derived for Ecological optics apply equally well in the auditory modality, and at best to

the production of rather strained analogies.

6.4 Auditory affordances

Although not intrinsic to direct perception, affordances are an important constituent of the Ecological

approach. While one might envision an account in which just spatial layout itself is perceived

without mediation, Gibsonians emphasize the ecological significance of certain configurations. This

is necessary since their reconceptualization of perception ties it intimately to action. One can find

various definitions of affordances in the literature, some more straightforward than others:

"The affordances of the environment are what it offers the animal, what it provides or furnishes for good or ill. The verb to afford is found in the dictionary, but the noun affordance is not. I have made it up. I mean by it something that refers to both the environment and the animal in a way which no existing term does. It implies the complementarity of the animal and the environment." (Gibson 1979)

"Affordances are the acts or behaviors permitted by objects, places, and events." (Michaels and Carello 1981)

"A propertied thing X ... affords an activity Y ... for a propertied thing Z ... if and only if certain properties of X ... are dually complemented by certain properties of Z, where dual complementation of properties translates approximately as properties that are related by a symmetrical transformation or duality T such that: $T(P_1) \rightarrow P_2$ and $T(P_2) \rightarrow P_1$." (Turvey et al. 1981)

Although Ecological theorists define affordances in rather general terms, those which are commonly

introduced to explain the idea tend to be of a particular type. For example, we are given climbability,

grabability, crawl-intoability (Turvey et al. 1981), sit-onability, and drink-fromability (Michaels &

Carello 1981). These affordances are said to be the objects of perception specified by variables of

stimulation and as such they share one essential property. This is that the characteristics of surfaces

responsible for structuring the optic array (and thereby providing information for the affordances)

are the same characteristics which underlie their ecological significance. In other words, a group of

surfaces provides certain affordances by virtue of its shape, and it is its shape which structures the

information-bearing light. In fact the case for direct perception of affordances rests on this type of

specification and the additional assertion that there is a one-to-one mapping between layout and

variables of the optic array.

Whether or not affordances are directly perceived in vision, the situation in audition is clearly

somewhat different. In general the ecological significance of a sound-emitter need have little to do

with the means by which it produces sound, although there are certainly exceptions. A snake may

be identified as a threat by hearing its hiss, but it is threatening because it is a snake and not because

it lacks vocal cords. The ringing of a telephone has significance because a telephone is a message-

conveying device and not because it contains a brass bell or a buzzer. In both of these examples it

is perception-as which is important, and not perception-of. Of course counter-examples are also

available; a woodpecker may detect hollows in a tree trunk by tapping and an organism may judge

the approximate size of an enclosure by variables related to reverberation.

The affordances of objects are generally related to their shapes. These may be specified by

light, but need not be specified by sound. Sounds are signifiers as well as specifiers. The sorts of

characteristics which comprise affordances are not always the sorts of things which can be specified

in the acoustic array. Michaels and Carello (1981) state that "to detect affordances is, quite simply,

to detect meaning", but it seems clear that meaning can be detected without affordances as they are

typically construed. If one accepts this, it would seem that we have found an instance of one of

Mace's "five ways to have a theory of indirect perception" (Mace 1977) because meaning is not

specified without an additional step of identification or recognition. Mace points out that a direct

theory of perception must be an Ecological one, although an ecological theory need not be direct

(Fowler 1990). Unless meaning itself, in other words affordance, is specified and picked up,

mediation is required to interface perception with the psychological systems controlling action.

The difficulty of translating the standard concept of affordance into the auditory domain is

reflected in the infrequency with which writers on ecological acoustics use the term. For example

it is not mentioned by Jenkins (1985), Fowler (1990, 1991), or Gaver (1993a, 1993b). When authors

do refer to affordances in an auditory context they seem to do so with some carelessness, or in ways

which distress strict adherents of Gibson's program. For example in Handel's monograph, Listening

(1989), he claims to take an approach inspired in part by Gibson, but makes the following statement

about sound source identification:

"At a third level, we hear objects. [The first two levels being physical features of sounds and more abstract timbral qualities.] I am thinking of 'violinness', 'President Carterness', 'President Reaganness', and 'airplaneness'. What is characteristic is that the sounds seem directly perceived as objects. Gibson (1979) has used the term affordances."

for which he is rightly taken to task by Heine & Guski (1991). Affordances are not objects; they are

what objects afford.

Other explicit examples are few in number. Gibson (1966) mentions that sound sources

afford orientation and localization, meaning that an organism can establish its position and heading

in space relative to a sound-emitter. Note however that this affordance is related to spatial layout

and not to the properties of the object which determine what sort of sound it produces. Michaels &

Carello (1981) briefly discuss the complementarity of perception and action in the context of an

articulatory basis for speech perception, but it is not clear that this really addresses the issue of

affordances. While it is an indispensable part of an Ecological account, the theory of affordances

seems to be one which has yet to be seriously addressed by Ecological acousticians.

7 Conclusions & Speculations

In this paper I have attempted to review some theoretical writings on the nature of auditory

perception, to examine the sorts of experimentation on distally-focused perception carried out by

Establishment and Ecological researchers, and to look critically at the state and prospects of Ecological

acoustics.

In general it seems that despite the emphasis on vision in perceptual theorizing, theories of

audition have paralleled those of vision. Advocates of perception-as and perception-of accounts do

not seem to differentiate between the perceptual systems. Those who propose unconscious inference,

association, or representational transformations as accounts of visual perception do so also in the

case of hearing. Those who maintain that perception is direct and unmediated and that the world is

specified by stimulation apply their analysis with equal conviction to both modalities. These

consistencies are evidence of the desire to develop a theory of perception, either mediated or direct,

in which all the modalities are governed by the same principles.

To date, attempts to devise an Ecological account of audition seem to suffer from three

shortcomings. First, the proposed source-specifying invariants of sound are not in general invariants

of the effective stimulus - the eardrum signal - in which sounds from many sources are

superimposed. Therefore an account of direct source separation is required. Second, many sound-

structuring properties of objects cannot be specified uniquely in the acoustic signal. Therefore it is

important to give an account of what it is that is directly perceived. Finally, no serious attempt has

been made to define auditory affordances, and without them a theory of perception-for-action is a

step short of direct.

I will conclude with some (perhaps ill-advised) speculations about the objects of auditory

perception and the difference between audition and vision. In a series of papers, Diehl & Kluender

and Fowler engage in a lively debate about what should properly be considered the objects of speech

perception (Diehl & Kluender 1989a, Fowler 1989, Diehl & Kluender 1989b, Fowler 1990, Diehl

et al. 1991, Fowler 1991). Fowler's position is that, directly or not, the auditory system attempts to

recover sound-producing events, and thus the objects of speech perception are articulatory. Diehl

& Kluender maintain that speech perception involves decoding auditory information, and that

effective communication does not require access to the vocal tract configurations of one's

conversants. Does the auditory system seek to satisfy the same goals as the visual in general, or are

there natural situations in which we hear sounds and not the properties of sound-emitters?

Gaver makes a distinction between musical listening and everyday listening, in which the

former involves attending to the timbre and other abstract auditory properties of a sound (not

necessarily music), and the latter to hearing objects. The distinction is somewhat akin to the

difference between Brunswik's analytic and naive-realistic attitudes. The type of listening one

indulges in is to some extent under conscious control. Although everyday listening may be the

default, it is quite possible to listen musically to environmental sounds, and in fact contemporary

musical genres like musique concrète and the electroacoustic pieces of Alvin Lucier rely on this

ability. In addition, shifting one's focus from everyday to musical listening does not result in a

relocation of the percept to the ears of the listener. Thus the following criticism of Diehl &

Kluender's account is moot:

"In acoustic perception, Diehl et al. aver, however, stimulus structure in the air that has been caused by an event is heard in itself. Why? And why does this allegedly acoustic-signal perceiving system localize sound, not where the acoustic signal is (in the ear), but where the acoustic-signal causing event is in the world?" (Fowler 1991).

It is simply a fact that this does not happen, whichever style of listening one happens to be involved

in. Nor does it happen in vision. An observer can compare the relative projective sizes or the colors

of objects without the location of the percepts jumping to the retina. Fowler mentions "nonsense

stimuli" such as sinewave analogs to speech, which do not contain enough information to specify

their causal sources, but these stimuli are localized in-the-world to the same degree that real speech

or any other sound is.

So, it is possible to avoid recovering a sound's causal source (or at least to ignore its

recovery), but in the absence of adjusting one's Brunswikian attitude, is it always the case that this

recovery takes place? It seems clear that the visual system attempts to recover light-structuring

properties of the world, that is, surface layout. We do see surfaces and objects, but, regardless of

identification, do we always hear materials and the events they are involved in? It is my feeling that

the answer depends on time scale. We seem to hear events such as bouncing and approach, which

structure sound at a scale of tenths of seconds to seconds, but it is not clear that we hear properties

which structure vibration at smaller time scales.

Consider the following examples. When we hear ventilation noise, do we really hear air

turbulence resonating in a duct? In perceiving speech, do we really hear the vocal cords vibrating?

When we listen to a door swinging shut and slamming do we really hear the "stiction" in the rusty

hinges and the vibration of the door, or just a squeak and a bang? When walking in the park do we

really hear the crickets' little legs rubbing away or just a curious buzz? In fact, the sounds of many

animals seem to pose this sort of problem; even Jenkins provides an example while extolling the

richness of acoustic information:

"From our backyard locale, my wife and I heard a remarkable burst of song - some kind of warbler. At length we located a small bird on a high wire at the end of the yard. Could it be that this tiny bird was the source of the song? We thought it unlikely, but we were rapidly convinced by the synchrony of the bursts of song and the movements of the bird." (Jenkins 1985)

Note that the sound source was identified as a warbler, but even this did not help to specify the size

of the bird.

Is there any a priori reason to think that audition is fundamentally different from vision in

this way? Assuming that these intuitions are correct, why should it be the case that we can hear

sounds in the environment without hearing vibration-structuring properties? A possible explanation

lies in the previously mentioned fact that vision is primarily directed at reflectors of energy, while

audition is primarily directed at sources of energy. What happens when we view sources of light

directly? It seems that the experience is of "a source of light of a particular color and intensity at a

particular location". While one can perhaps identify the spectrum-structuring properties of the

source (it's an LED, it's a sodium lamp etc.) the primary experience is of the radiant light itself.

Gibson makes the following remarks about radiant light:

"Is there any kind of information in radiant light? The answer must be yes, for the spectrum of any radiant beam specifies vibrations in the atoms that emitted the energy. The astronomer with a spectroscope can identify the substance of the star. One could aim the instrument at a luminous object and determine whether it is incandescent, fluorescent, bioluminescent, etc. But note that an eye cannot do this; it cannot register the distribution of wavelengths and cannot measure their absolute intensities. This is not the kind of information an eye can pick up. A single spot of light in darkness conveys only a minimum of information to an eye." (Gibson 1966).

"Radiant light has no structure; ambient light has structure. Radiant light is propagated; ambient light is not, it is simply there. Radiant light comes from atoms and returns to atoms; ambient light depends on an environment of surfaces. Radiant light is energy; ambient light can be information." (Gibson 1979).

The perception of radiant light is an exceptional case in Gibson's visual theory, but radiant sound is

the main stuff of audition. It seems somewhat perverse to hold that radiant light specifies atoms but

that atoms are not perceived while maintaining that radiant sound specifies vibration-structuring

properties of objects and that these can be perceived. If it is indeed the case that we can hear

temporally-extended events but that we sometimes hear only sounds (while still being able to

identify their sources), perhaps Schubert's concepts of Source Identification and Source Behavior

Recognition can serve as a model for a uniquely auditory theory of perception.

Whether or not the foregoing comments are convincing, it is clear that transferring any

perceptual theory wholesale from one modality to another can be problematic. The ecology of

audition poses unique challenges which must be taken seriously by theorists of any stripe.

References

Bailley G, Laboissière R, Schwartz JL (1991): A model of coarticulation based on connectionist sequential networks: can we recover articulatory movements from acoustics. Conference on Current Phonetic Research Paradigms: Implications for Speech Motor Control. Stockholm, Sweden, August 1991. (cited in Kluender 1991)

Ballas JA, Howard Jr. JH (1987): Interpreting the language of environmental sounds. Environment and Behavior 19(1):91-114.

Ballesteros S (ed) (1994): Cognitive approaches to human perception. Lawrence Erlbaum Associates.

Boring EG (1942): Sensation and perception in the history of experimental psychology. Appleton-Century.

Bregman AS (1981): Asking the "what for" question in auditory perception. In Perceptual Organization, ed Kubovy M & Pomerantz JR. Lawrence Erlbaum Associates.

Bregman AS (1990): Auditory scene analysis. MIT Press.

Bregman AS, Campbell J (1971): Primary auditory stream segregation and perception of order in rapid sequences of tones. J.Exp.Psych. 89:244-249.

Bruce V, Green PR (1990): Visual perception: Physiology, psychology and ecology. Lawrence Erlbaum Associates.

Cipra B (1992): You can't hear the shape of a drum. Science 255:1642-1643.

Dowling JW, Lung KM, Herrbold S (1987): Aiming attention in pitch and time in the perception of interleaved melodies. Perception & Psychophysics 41(6):642-656.

Diehl RL, Kluender KR (1989a): On the objects of speech perception. Eco.Psych. 1(2):121-144.

Diehl RL, Kluender KR (1989b): Reply to commentators. Eco.Psych. 1(2):195-225.

Diehl RL, Walsh MA, Kluender KR (1991): On the interpretability of speech/nonspeech comparisons: A reply to Fowler. J.Acoust.Soc.Am. 89(6):2905-2909.

Driscol A (1995): Eigenmodes of isospectral drums. World Wide Web document. URL: http://cam.cornell.edu/~driscol/research/drums.html.

Ellis D (1995): Hard problems in computational auditory scene analysis. World Wide Web document. URL: http://sound.media.mit.edu/~dpwe/writing/hard-probs-1995jul09.html.

Fodor JA (1975): The language of thought. Harvard University Press.

Fodor J, Pylyshyn Z (1981): How direct is visual perception? Some reflections on Gibson's 'Ecological Approach'. Cognition 9:139-196.

Fowler CA (1989): Real objects of speech perception: A commentary on Diehl and Kluender. Eco.Psych. 1(2):145-160.

Fowler CA (1990): Sound-producing sources as objects of perception: Rate normalization and nonspeech perception. J.Acoust.Soc.Am. 88(3):1236-1249.

Fowler CA (1991): Auditory perception is not special: We see the world, we feel the world, we hear the world. J.Acoust.Soc.Am. 89(6):2910-2915.

Freed D (1990): Auditory correlates of perceived mallet hardness for a set of recorded percussive sound events. J.Acoust.Soc.Am. 87:311-322.

Gardner MB (1969): Distance estimation of 0° or apparent 0°-oriented speech signals in anechoic space. J.Acoust.Soc.Am. 45:47-53.

Gaver WW (1993a): What in the world do we hear? An Ecological approach to auditory event perception. Eco.Psych. 5(1):1-29.

Gaver WW (1993b): How do we hear the world?: Explorations in ecological acoustics. Eco.Psych. 5(4):285-313.

Gibson JJ (1966): The senses considered as perceptual systems. Houghton Mifflin.

Gibson JJ (1979): The ecological approach to visual perception. Houghton Mifflin.

Gordon C, Webb D, Wolpert S (1992): One cannot hear the shape of a drum. Bull.Am.Math.Soc. 27:134-138.

Green DM, Swets JA (1966): Signal detection theory and psychophysics. Wiley.

Gregory RL (1993): Seeing and thinking. Italian J.Psych. 20:749-769.

Guski R (1992): Acoustic tau: An easy analogue to visual tau? Eco.Psych. 4(3):189-197.

Handel S (1989): Listening: An introduction to the perception of auditory events. MIT Press.

Hatfield G (1990): Gibsonian representations and connectionist symbol processing: Prospects for unification. Psych.Rev. 52:243-252.

Heine WD, Guski R (1991): Listening: The perception of auditory events? An essay review of Listening: an introduction to the perception of auditory events. by Stephen Handel. Eco.Psych. 3(3):263-275.

Heine WD, Guski R (1993): Using auditory information for active contact with sound sources moving rectilinearly with respect to a listener. In Contributions to psychological acoustics: Results of the 6th Oldenburg Symposium on Psychological Acoustics, ed. Schick A. 349-359.

Heine WD, Guski R, Pittenger JB (1993): Perceiving numbers of steel balls by audition. In Contributions to psychological acoustics: Results of the 6th Oldenburg Symposium on Psychological Acoustics, ed. Schick A. 361-371.

Helmholtz H von (1867/1925): Physiological optics. Vol. 3. Optical Society of America.

Helmholtz H von (1877/1954): On the sensations of tone. Dover.

Hochberg J (1994): Perceptual theory and visual cognition. In Cognitive approaches to human perception. ed. Ballesteros S. Lawrence Erlbaum Associates. 269-289.

Höger R (1993): Acoustic texture in distance perception. In Contributions to psychological acoustics: Results of the 6th Oldenburg Symposium on Psychological Acoustics, ed. Schick A. 337-348.

Jackendoff R (1987): Consciousness and the computational mind. MIT Press.

James W (1890/1950): The principles of psychology Vol.2. Dover.

Jenison RL (1994): On acoustic information for auditory motion. Perception. (in press?).

Jenkins JJ (1985): Acoustic information for objects, places and events. In Persistence and change: Proc. 1st Internat. Conf. on Event Perception, eds. Warren W, Shaw R. Lawrence Erlbaum Associates. 115-138.

Johansson G (1985): About visual event perception. In Persistence and change: Proc. 1st Internat. Conf. on Event Perception, eds. Warren W, Shaw R. Lawrence Erlbaum Associates. 29-54.

Kluender KR (1991): Psychoacoustic complementarity and the dynamics of speech perception and production. Perilus XIV:131-136.

Kluender KR, Jenison RL (1992): Effects of glide slope, noise intensity, and noise duration on the extrapolation of FM glides through noise. Perception & Psychophysics 51(3):231-238.

Ladefoged P, Harshmann R, Goldstein L, Rice L (1978): Generating vocal tract shapes from formant frequencies. J.Acoust.Soc.Am. 64:1027-1035.

Lerdahl F & Jackendoff R (1983): A generative theory of tonal music. MIT Press.

Licklider JCR (1959): Three auditory theories. In Psychology: A study of a science, ed S. Koch. McGraw-Hill.

Lombardo TJ (1987): The reciprocity of perceiver and environment: The evolution of James J. Gibson's ecological psychology. Lawrence Erlbaum Associates, Hillsdale NJ.

Lyon RF (1983): Binaural localization and source separation. Proc. ICASSP 83:1148-1151. (reprinted in Richards 1988)

Mace WM (1977): James J. Gibson's strategy for perceiving: Ask not what's inside your head, but what your head's inside of. In Perceiving, acting, and knowing: Towards an ecological psychology. ed Shaw R, Bransford J. Lawrence Erlbaum Associates.

Marr D (1982): Vision. Freeman.

Michaels CF & Carello C (1981): Direct Perception. Prentice-Hall.

Mohrmann K (1939): Lautheitskonstanz im Entfernungswechsel. Z. Psychol. 145:146-199. (cited in Postman & Tolman 1959).

Morse PM, Ingard KU (1968): Theoretical acoustics. Princeton University Press.

Nunn D (1995): Pictures of some research issues. World Wide Web document. URL: http://capella.dur.ac.uk/doug/pictures.html.

Pickles JO (1988): An introduction to the physiology of hearing. Academic Press.

Postman L & Tolman EC (1959): Brunswik's probabilistic functionalism. In Psychology: A study of a science. ed. Koch S. McGraw-Hill. 502-564.

Pylyshyn ZW (1984): Computation and cognition. MIT Press.

Reisberg D (ed) (1992): Auditory imagery. Lawrence Erlbaum Associates.

Repp BH (1987): The sound of two hands clapping: an exploratory study. J.Acoust.Soc.Am. 81(4):1100-1109.

Richards W (ed) (1988): Natural computation. MIT Press.

Rock I (1980): Difficulties with a theory of direct perception. Behavioral and Brain Sciences 3:398-399. (Commentary on Ullman 1980).

Rock I (1983): The logic of perception. MIT Press.

Rosenblum LD (1993): Acoustical information for controlled collisions. In Contributions to psychological acoustics: Results of the 6th Oldenburg Symposium on Psychological Acoustics, ed. Schick A. 303-322.

Rosenblum LD, Carello C, Pastore RE (1987): Relative effectiveness of three stimulus variables for locating a moving sound source. Perception 16:175-186.

Schubert ED (1974): The role of auditory perception in language processing. In Reading, perception and language. eds Duane DD, Rawson MB. York Press, Baltimore.

Searle CJ (1982): Representing acoustic information. Can.J.Psych. 36:402-419. (reprinted in Richards 1988)

Searle JR (1992): The rediscovery of the mind. MIT Press.

Shaw BK, McGowan RS, Turvey MT (1991): An acoustic variable specifying time-to-contact. Eco.Psych. 3(3):253-261.

Shepard RN (1990): Mind Sights. Freeman.

Sloman A (1989): On designing a visual system: Towards a Gibsonian computational model of vision. J. Experimental & Theoretical Artificial Intelligence 1:289-337.

Stellmack MA (1994): The reduction of binaural interference by the temporal nonoverlap of components. J.Acoust.Soc.Am. 96(3):1465-1470.

Strutt JW (1907): On our perception of sound direction. Philosophical Magazine 13:214-232.

Turvey MT, Shaw RE, Reed ES, Mace WM (1981): Ecological laws of perceiving and acting: In reply to Fodor and Pylyshyn (1981). Cognition 9:237-304.

Ullman S (1980): Against direct perception. (with commentaries). Behavioral and Brain Sciences 3:373-415.

Vanderveer NJ (1979): Ecological acoustics: human perception of environmental sounds. Dissertation Abstracts International, 40: 4543B. (University Microfilms no. 8004002). (Cited by Ballas and Howard, 1987).

Warren RM (1982): Auditory perception: a new synthesis. Pergamon.

Warren WH & Verbrugge RR (1984): Auditory perception of breaking and bouncing events. J.Exp.Psych.: Human Perception and Performance 10:704-712. (reprinted in Richards 1988).

Wightman FL, Jenison RL (1995): Auditory spatial layout. In Handbook of perception and cognition Vol 5: Perception of space and motion. eds Epstein W, Rogers S. Academic Press. (in press?)

Wildes RP, Richards WA (1988): Recovering material properties from sound. In Natural Computation. ed Richards WA. MIT Press. 356-363.

Yost WA (1990): Auditory image perception and analysis: The basis for hearing. Hearing Research 56:8-18.