Auditory Processing and Perception

BMT 925, Course 2013

Theme 5: Auditory Streaming

Farah I. Corona-Strauss

Systems Neuroscience & Neurotechnology Unit

Neurocenter, Saarland University Hospital

Cocktail party problem

The cocktail party problem describes how a listener at a party, surrounded by multiple sound sources, is able to separate out and take part in a single conversation. More generally, it asks how a listener makes sense of any complex auditory environment.

Stream: a perceptual unit that represents a single acoustic source.

Source: the physical generator of a sound.

It is usual for a sequence of sounds originating from the same source to be perceived as a stream.

Sound representation: Spectrogram

A spectrogram is a visual representation of how the frequencies in a sound change over time. The amount of sound energy at each point in time and at each frequency is represented by the darkness at that point in the diagram.

[Figure: spectrogram of a mixture of (a) the spoken words "one, two, three," (b) singing "da-da-da," (c) whistling, and (d) a computer fan.]
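As a rough illustration of how such a time-frequency picture can be computed, the sketch below takes short overlapping-free frames of a signal and computes the magnitude of a discrete Fourier transform for each one. The function name `spectrogram`, the frame length, and the toy two-tone signal are assumptions made for this example; the slides do not specify how their figure was produced.

```python
import cmath
import math

def spectrogram(signal, frame_len, hop):
    """Return one list of DFT magnitudes (bins 0..frame_len//2) per frame."""
    # A Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# Toy input (assumed): a tone that jumps from 1000 Hz to 2000 Hz halfway through.
fs = 8000
half = 256
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(half)]
tone += [math.sin(2 * math.pi * 2000 * n / fs) for n in range(half)]

spec = spectrogram(tone, frame_len=64, hop=64)

# The strongest bin in each frame, converted to Hz (bin width is fs / frame_len).
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
peak_hz = [b * fs / 64 for b in peak_bins]
```

Plotting `spec` with time on one axis, frequency bin on the other, and magnitude as darkness would give the kind of display described above.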

ASA

What is auditory scene analysis?

• A process in which the auditory system takes the mixture of sound that it derives from a complex natural environment and sorts it into packages of acoustic evidence in which each package probably has arisen from a single source of sound.

• This grouping helps pattern recognition not to mix information from different sources. Often, the interest is in a single stream of events, such as a violin playing, a person talking, or a car approaching.

The general idea is to understand how the brain could build separate perceptual descriptions of sound-generating events despite this mixing of evidence. It appears that the first thing it does is to analyze the incoming array of sound into a large number of frequency components.

But then it is left with the following problem: which combination of these components has arisen from a particular source of sound, such as the voice of a particular person continuing over time? Only by putting together the right set of frequency components over time can the identity of the signals be recognized. Otherwise the recognizer might combine the syllables of two talkers to make a spurious word.

Auditory Scene Analysis (by Bregman)

1. Decomposition of sounds into discrete sensory elements

2. Stream forming and segregation: grouping (primitive and schema-driven grouping)

Analysis and Synthesis

The initial states of ASA

*Grouping is the mechanism by which these sensory elements are combined.

Auditory Stream Segregation

• Auditory stream segregation is the phenomenon in which a single sequence of sounds breaks up into two or more parallel perceptual streams.

• The streams appear to be happening at the same time, and each is heard as a separate sequence of sounds, with its own melody and rhythm.

Primitive Grouping


Gestalt-Approach

(contra classical structuralism)

The central Gestalt principle (Prägnanz) is also known as the "Law of Simplicity"

Law of Simplicity

Of several geometrically possible organisations, that one will actually occur which possesses the best, simplest and most stable shape.

Example: Color Selection

The viewer can rapidly and accurately determine whether the target (red circle) is present or absent.

The difference is detected in color.

Example: Shape Selection

The viewer can rapidly and accurately determine whether the target (red circle) is present or absent.

The difference is detected in form (curvature).


Example: Conjunction of Features

The viewer cannot rapidly and accurately determine whether the target (red circle) is present or absent when the target has two or more features, each of which is present in the distractors. The viewer must search sequentially.

All Preattentive Processing figures from Healey 97

http://www.csc.ncsu.edu/faculty/healey/PP/PP.html

Common fate

• Common fate describes the tendency to group components whose properties change in a similar way over time. Stimulus features which are subject to these common fluctuations are generally perceived as one object, making it difficult to focus on the individual components.

• Similarly, common onset and offset are also part of the wider common fate cue, and correspond to the perceptual grouping of components whose onsets and/or offsets occur simultaneously.

– Darwin (1984) has shown that a tone that starts or stops at the same time as a vowel sound is more likely to be heard as part of the vowel complex than if the onset and/or offset times had been different. In support of this grouping principle, Roberts and Moore (1991) demonstrated that tones added to a vowel sound had a significantly stronger effect on vowel quality when presented with identical onset and offset times.
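The common-onset cue can be sketched as a simple clustering rule: components whose onsets fall close together are placed in the same group. This is only a toy illustration; the function name `group_by_onset`, the 30 ms tolerance, and the component labels are assumptions for the example and are not taken from Darwin (1984) or Roberts and Moore (1991).

```python
def group_by_onset(components, tolerance_ms=30.0):
    """Group (name, onset_ms) pairs whose onsets lie within tolerance_ms
    of the first onset in the current group."""
    groups = []
    for name, onset in sorted(components, key=lambda c: c[1]):
        if groups and onset - groups[-1]["onset"] <= tolerance_ms:
            groups[-1]["members"].append(name)
        else:
            groups.append({"onset": onset, "members": [name]})
    return [g["members"] for g in groups]

# Hypothetical components: two vowel formants, a tone starting with the vowel,
# and a tone starting much later.
components = [
    ("vowel_F1", 100.0),
    ("vowel_F2", 102.0),
    ("added_tone", 100.0),
    ("late_tone", 340.0),
]
groups = group_by_onset(components)
```

Here the tone that starts with the vowel lands in the vowel's group, while the late tone forms a group of its own.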


Law of Proximity

Law of Similarity

VSA: Proximity-Similarity

ASA: Proximity-Similarity

• The Gestalt proximity principle states that the grouping strength between elements, or groups of elements, decreases as the distance between them increases: the closer the elements, the stronger the grouping.

• In hearing, similarity usually implies closeness of timbre, pitch, or loudness.

– For example, the relationship between frequency proximity and temporal proximity has been studied extensively using the two-tone streaming phenomenon (see Bregman, 1990, for a review). The closer in frequency two tones are, the more likely it is that they are grouped into the same stream. Similarly, the proximity of two tones in time also determines the likelihood of streaming.
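The frequency-proximity effect can be illustrated with a toy rule that assigns each incoming tone to the stream whose most recent tone is closest in frequency, and opens a new stream when no existing stream is close enough. This is a sketch under strong assumptions, not Bregman's model: the 5-semitone threshold is an arbitrary illustrative value, and temporal proximity is ignored entirely.

```python
import math

def semitones(f1, f2):
    """Absolute distance between two frequencies in semitones."""
    return abs(12.0 * math.log2(f2 / f1))

def assign_streams(tones, max_semitones=5.0):
    """tones: frequencies in Hz in order of presentation.
    Returns a list of streams, each a list of frequencies."""
    streams = []
    for f in tones:
        best = None
        for s in streams:
            d = semitones(s[-1], f)
            if d <= max_semitones and (best is None or d < semitones(best[-1], f)):
                best = s
        if best is None:
            streams.append([f])   # nothing close enough: start a new stream
        else:
            best.append(f)        # join the nearest stream
    return streams

# Alternating two-tone sequences with a small and a large frequency separation.
near = assign_streams([400, 440, 400, 440, 400, 440])
far = assign_streams([400, 1600, 400, 1600, 400, 1600])
```

With the small separation all tones stay in one stream; with the large separation the high and low tones split into two streams, mirroring the two-tone streaming percept.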


Law of Proximity and Similarity: Example

[Audio: track 3]

3. Loss of rhythmic information as a result of stream segregation.

Law of Closure

Visual and Auditory Scene Analysis


Law of Good Continuation

Visual and Auditory Scene Analysis

Continuity & Closure

• Certain interrupted and/or smoothly changing forms are perceptually grouped into a whole. For the continuity perception to occur, there must be sufficient evidence to support the hypothesis that the form is obscured rather than interrupted.

• Similarly, when the figures are considered to be pseudospectrograms with the bar representing a tone and the gray band representing noise, provided the noise is sufficiently loud, the noise band will be perceived as obscuring a single tone, rather than separating two distinct tones. This effect can also be seen for speech signals which are obscured by noise. The speech sounds much more intelligible and continuous when interrupted by noise than when interrupted by silence (Miller and Licklider, 1950; see also Warren et al., 1972).

• Good continuation is also recognised as playing an important role in perceptual grouping. Provided the changes between consecutive elements are smooth, the elements tend to be grouped together.

Law of Closure: Example

12. Effects of connectedness on segregation.

We perceive the first sequence, with the transitions, as more coherent. This demonstration shows that continuity helps hold auditory sequences together.

Law of Good Continuation: Example

28. Apparent continuity.

29. Perceptual continuation of a gliding tone through a noise burst.

Gestalt-Principles

• Common fate: elements that change in a similar way over time tend to be grouped together

• Similarity: elements that are similar in physical attributes (such as timbre, pitch, or loudness) tend to be grouped

• Proximity: grouping strength between elements, or clusters of elements, decreases as the distance between them increases

• Continuity: provided the changes between consecutive elements are smooth, the elements tend to be grouped together

• Closure: elements that form a complete, but possibly partially obscured, object tend to be grouped

Schema-driven Grouping

• The perceptual system can also employ prior knowledge about common sounds to organize the acoustic environment into streams.

• Example: Schema-driven grouping can be demonstrated by the perceptual restoration of a phoneme which has been replaced by a burst of noise (Warren and Warren, 1970).

• The stimulus used was of the form “the *eel was on the axle” or “the *eel was on the orange”, where “*” indicates the noise-masked deleted phoneme.

• For these two examples, listeners reported hearing “wheel” and “peel” respectively: a top-down schema processes the speech before conscious perception occurs, even when the disambiguating word occurred several words later than the deletion.

http://images.google.de/imgres?imgurl=http://billstclair.com/blog/images/broken-axle-464x319.jpg&imgrefurl=http://billstclair.com/blog/broken_axle.html&usg=___XB7oCzlxJ080u37jxYX9j55ISw=&h=319&w=464&sz=27&hl=de&start=3&um=1&tbnid=9Rno83gbekwevM:&tbnh=88&tbnw=128&prev=/images%3Fq%3Daxle%26hl%3Dde%26rlz%3D1T4GZAZ_deDE252DE253%26sa%3DN%26um%3D1

http://images.google.de/imgres?imgurl=http://www.fruechteadam.com/img/Orange.jpg&imgrefurl=http://www.fruechteadam.com/obst_de.html&usg=__dwCLiqyByXwKMQhb2WFEgZYGel8=&h=356&w=363&sz=23&hl=de&start=1&um=1&tbnid=NWR_TUoBCgsLIM:&tbnh=119&tbnw=121&prev=/images%3Fq%3Dorange%26hl%3Dde%26rlz%3D1T4GZAZ_deDE252DE253%26um%3D1

Critique of pure audition:

Sound perception:

Miriam Makeba Click song

Chimaeric sounds

http://images.google.de/imgres?imgurl=http://www.dissidenten.com/exilneu/press/bilder/covers/4958-makeba-reflections.jpg&imgrefurl=http://www.dissidenten.com/exilneu/press/4958-makeba-pi-dt.html&usg=__2NJ-OdAP--Z8EMNPJOBlYQbTmnk=&h=1405&w=1573&sz=822&hl=de&start=5&um=1&tbnid=YbG5xjRSq5JGtM:&tbnh=134&tbnw=150&prev=/images%3Fq%3Dmiriam%2Bmakeba%26hl%3Dde%26rlz%3D1T4GZAZ_deDE252DE253%26sa%3DX%26um%3D1

Example

• Separation of components: Envelope sound

Fine structure + Envelope
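One common way to split a signal into envelope and fine structure is the Hilbert analytic signal; the slides do not say which method was used for their example, so this choice is an assumption. The sketch below works on a single band (processing for chimaeric sounds would normally first split the signal into several frequency bands, which is omitted here), and the name `analytic_signal` and the test tone are illustrative.

```python
import cmath
import math

def analytic_signal(x):
    """Analytic signal via a direct DFT: keep DC and Nyquist, double the
    positive-frequency bins, zero the negative-frequency bins."""
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        if 0 < k < n / 2:
            spectrum[k] *= 2.0
        elif k > n / 2:
            spectrum[k] = 0.0
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# Assumed test signal: a carrier (bin 32) with a slow amplitude modulation (bin 2).
n = 128
x = [(1.0 + 0.5 * math.cos(2 * math.pi * 2 * t / n)) * math.cos(2 * math.pi * 32 * t / n)
     for t in range(n)]

z = analytic_signal(x)
envelope = [abs(v) for v in z]                          # slow amplitude contour
fine_structure = [math.cos(cmath.phase(v)) for v in z]  # unit-amplitude carrier

# Multiplying the two parts back together recovers the original signal.
rebuilt = [e * f for e, f in zip(envelope, fine_structure)]
```

A chimaeric sound would then be formed by multiplying the envelope of one sound with the fine structure of another, band by band.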

CAPD

Central Auditory Processing Disorders = Auditory Processing Disorders

• Disorders of the auditory central nervous system and its perceptual functions.

• Can be caused by a variety of factors: age-related deterioration, congenital and/or hereditary disorders, degenerative and demyelinating diseases, developmental disorders, chemical or drug-induced problems, head trauma, infections, tumors, and even surgically induced lesions. They are often of undetermined origin.

APD

• Patients with APD typically experience one or more of the following problems with perceptual processing of auditory information:

– Disturbances of speech perception when noise, reverberation, or competing signals are present.

– Impairments of auditory discrimination, pattern recognition, and various binaural processes (directional hearing, binaural fusion).

• The diagnosis of APD can be challenging because many of its features are also found in patients with other problems, such as learning disabilities, dyslexia, autistic disorders, language impairment, ADHD, and cognitive disorders.

• The deficits are not necessarily limited to the auditory modality.

CASA

• Computer models that mimic auditory scene analysis have led to the field of study known as COMPUTATIONAL AUDITORY SCENE ANALYSIS (CASA).

• Early work was concentrated on speech separation.

• Computational solutions to the ASA problem are generally motivated by one of two applications:

1. The first is the goal of improving automatic speech recognition (ASR) performance in noisy environments. The accuracy of ASR systems whose input speech has been obtained in all but the quietest of backgrounds is poor compared to that of the human listener, particularly if the noise is non-stationary.

An ASA model that can successfully segregate speech from any number of interfering sound sources (including other speech) could be used as a first stage of pre-processing in a larger recognition system.

2. The second motivation is to produce ‘advanced’, or ‘intelligent’, hearing prostheses. Instead of amplifying the entire frequency range, such devices could segregate the acoustic environment into any number of streams, one of which (for example, a speech stream) could be selected with all the others being attenuated.

Wrigley et al., Comp. Model of Atten., IEEE Trans. on NN, 2005

Practical applications

• Auditory disorders: Some people with normal or near-normal audiograms complain of not being able to understand voices when mixed with other sounds (i.e., they are deficient in ASA). Tests based on scientific knowledge of ASA could assess the patients' residual abilities to use specific kinds of cues for the segregation of signals. This may help in diagnosing the physiological basis of their disorders, and may also permit the fitting of hearing aids that maximize the remaining potential.

• Smart hearing aids: "Smart" hearing aids may be provided for individuals who have difficulty in segregating concurrent sounds. Computers incorporated in these aids would enhance the ASA cues for segregation, such as spatial location, and allow listeners to use their remaining abilities to focus on individual sounds.

• Workplace: Knowledge about ASA can contribute to workplace safety by aiding in the design of alarms and machine-status signals that will tend not to blend with each other or the background.


The Computational Ear

Auditory modelling

Correlogram

• The autocorrelogram, or simply correlogram, is a visual display of sound periodicity and an important representation of auditory temporal activity that combines both spectral and temporal information. It is normally defined as a three-dimensional volumetric function, mapping a frequency channel of an auditory periphery model, temporal autocorrelation delay (or lag), and time to the amount of periodic energy in that channel at that delay and time.

• The periodicity of sound is well represented in the correlogram.

• If the original sound contains a signal that is approximately periodic, such as voiced speech, then each frequency channel excited by that signal will have a high similarity to itself delayed by the period of repetition.
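The definition above can be sketched as a short-time autocorrelation computed separately in each frequency channel. In the sketch below the auditory periphery model is replaced by a stand-in (three half-wave rectified harmonics of a 100 Hz fundamental), which is an assumption made to keep the example self-contained; a real correlogram would use a cochlear filterbank and hair-cell model. The function names, window length, and lag range are likewise illustrative.

```python
import math

def autocorrelation(frame, max_lag):
    """Autocorrelation of one channel over a fixed window, for lags 0..max_lag."""
    window = len(frame) - max_lag
    return [sum(frame[t] * frame[t + lag] for t in range(window))
            for lag in range(max_lag + 1)]

def correlogram_slice(channels, start, window, max_lag):
    """One time slice of the correlogram: one row per channel, one column per lag."""
    return [autocorrelation(ch[start:start + window + max_lag], max_lag)
            for ch in channels]

# Stand-in for an auditory periphery model (assumption): three channels, each a
# half-wave rectified harmonic of a 100 Hz fundamental.
fs = 8000
channels = [[max(0.0, math.sin(2 * math.pi * f * t / fs)) for t in range(400)]
            for f in (200, 300, 400)]

cgram = correlogram_slice(channels, start=0, window=160, max_lag=100)

# Summing over channels gives a summary that peaks at the common period.
summary = [sum(row[lag] for row in cgram) for lag in range(101)]
best_lag = max(range(10, 101), key=summary.__getitem__)  # skip lags near zero
f0_estimate = fs / best_lag
```

Each channel is most similar to itself at multiples of its own period, and only the lag corresponding to the shared repetition period (80 samples here) is a peak in every channel, which is why periodic sounds such as voiced speech stand out in the correlogram.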

HAP: model of the BM

Working model of the basilar membrane (BM) response to arbitrary sound stimuli
