Auditory Processing and Perception
BMT 925, Course 2013
Theme 5: Auditory Streaming
Farah I. Corona-Strauss
Systems Neuroscience & Neurotechnology Unit
Neurocenter, Saarland University Hospital
Cocktail party problem
The cocktail party problem asks how a listener at a party,
exposed to multiple sound sources, is able to separate out,
and take part in, a single conversation. More generally, it
asks how a listener makes sense of any complex
auditory environment.
Stream: a perceptual unit that represents a single acoustic
source.
Source: the physical generator of a sound.
It is usual for a sequence of sounds originating from the same
source to be perceived as a stream.
Sound representation: Spectrogram
A spectrogram is a visual representation of how the
frequencies in a sound change over time. The amount of
sound energy at each point in time and at each frequency is
represented by the darkness at that point in the diagram.
Figures: spectrogram of the spoken words "one, two, three"; spectrogram of a
mixture of (a) the spoken words "one, two, three," (b) singing "da-da-da,"
(c) whistling, and (d) a computer fan.
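A minimal sketch of how such a time-frequency picture can be computed, assuming a short-time Fourier transform with a Hann window (the slides do not name a method). The function name `spectrogram`, the frame length, the hop size, and the two-tone test signal are illustrative assumptions, and a naive DFT is used so that only the standard library is needed.

```python
import cmath
import math

def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of magnitudes (bins 0..frame_len//2) per time frame."""
    # Hann window: tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame (slow, but fine for a sketch).
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def tone(freq_hz, n_samples, fs):
    """Pure sine tone (hypothetical test stimulus)."""
    return [math.sin(2 * math.pi * freq_hz * i / fs) for i in range(n_samples)]

fs = 8000  # assumed sampling rate in Hz
signal = tone(1000, 256, fs) + tone(2000, 256, fs)  # 1 kHz, then 2 kHz
spec = spectrogram(signal)

# Strongest frequency bin in each frame; bin k corresponds to k * fs / 64 Hz.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
```

Each inner list of `spec` is one column of the picture: plotting magnitude as darkness against frame index (time) and bin index (frequency) gives the kind of diagram described above.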
ASA
What is auditory scene analysis?
• A process in which the auditory system takes the mixture of sound that it
derives from a complex natural environment and sorts it into packages of
acoustic evidence in which each package probably has arisen from a single
source of sound.
• This grouping helps pattern recognition not to mix information from different
sources. Often, the interest is in a single stream of events, such as a violin
playing, a person talking, or a car approaching.
The general idea is to understand how the brain could build separate perceptual
descriptions of sound-generating events despite this mixing of evidence. It
appears that the first thing it does is to analyze the incoming array of sound into
a large number of frequency components.
But then it is left with the following problem: which combination of these
components has arisen from a particular source of sound, such as the voice of
a particular person continuing over time? Only by putting together the right set
of frequency components over time can the identity of the signals be
recognized. Otherwise the recognizer might combine the syllables of two
talkers to make a spurious word.
Auditory Scene Analysis (by Bregman)
The initial stages of ASA (analysis and synthesis):
1. Decomposition of sounds into discrete sensory elements
2. Stream forming and segregation: grouping* (primitive and schema-driven grouping)
*Grouping is the mechanism by which these sensory elements are combined.
Auditory Stream Segregation
• Auditory stream segregation is the phenomenon in which
a single sequence of sounds breaks up into two or
more parallel perceptual streams.
• The streams appear to be happening at the same time,
and each is heard as a separate sequence of sounds,
with its own melody and rhythm.
Primitive Grouping
Gestalt-Approach
(contra classical structuralism)
The Gestalt law of Prägnanz is also known as the "Law of Simplicity"
Law of Simplicity
Of several geometrically possible organisations the one will actually
occur which possesses the best, simplest and most stable shape
Example: Color Selection
Viewer can rapidly and accurately determine
whether the target (red circle) is present or absent.
Difference detected in color.
Example: Shape Selection
Viewer can rapidly and accurately determine
whether the target (red circle) is present or absent.
Difference detected in form (curvature)
Example: Conjunction of Features
Viewer cannot rapidly and accurately determine
whether the target (red circle) is present or absent when the
target has two or more features, each of which is
present in the distractors. Viewer must search sequentially.
All Preattentive Processing figures from Healey 97
http://www.csc.ncsu.edu/faculty/healey/PP/PP.html
Common fate
• Common fate describes the tendency to group components whose
properties change in a similar way over time. Stimulus features
which are subject to these common fluctuations are generally
perceived as one object, making it difficult to focus on the individual
components.
• Similarly, common onset and offset are also part of the wider
common fate cue, and correspond to the perceptual grouping of
components whose onsets and/or offsets occur simultaneously.
– Darwin (1984) showed that a tone that starts or stops at the same
time as a vowel sound is more likely to be heard as part of the vowel
complex than a tone whose onset and/or offset times differ. In
support of this grouping principle, Roberts and Moore (1991)
demonstrated that tones added to a vowel sound had a significantly
stronger effect on vowel quality when presented with identical onset and
offset times.
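The common onset/offset cue can be illustrated with a toy grouping rule. This is only a sketch under stated assumptions: the function name `group_by_common_onset`, the 20 ms tolerance, and the stimulus values are hypothetical and are not taken from Darwin (1984) or Roberts and Moore (1991).

```python
def group_by_common_onset(components, tolerance_ms=20.0):
    """Group components whose onset AND offset agree within tolerance_ms.

    components: list of (label, onset_ms, offset_ms) tuples.
    tolerance_ms is an arbitrary illustrative value, not a measured threshold.
    """
    groups = []
    # Visit components in order of onset so earlier sounds seed the groups.
    for label, onset, offset in sorted(components, key=lambda c: c[1]):
        for group in groups:
            _, ref_onset, ref_offset = group[0]  # first member is the reference
            if (abs(onset - ref_onset) <= tolerance_ms
                    and abs(offset - ref_offset) <= tolerance_ms):
                group.append((label, onset, offset))
                break
        else:
            # No existing group shares this onset/offset: start a new one.
            groups.append([(label, onset, offset)])
    return [[label for label, _, _ in group] for group in groups]

# Hypothetical stimulus: three vowel harmonics plus one added tone.
synchronous = [("h1", 0, 300), ("h2", 0, 300), ("h3", 0, 300), ("tone", 0, 300)]
leading = [("h1", 0, 300), ("h2", 0, 300), ("h3", 0, 300), ("tone", -200, 300)]

grouped_sync = group_by_common_onset(synchronous)
grouped_lead = group_by_common_onset(leading)
```

With synchronous timing the added tone lands in the same group as the harmonics; when it starts 200 ms early it forms a group of its own, which mirrors the direction of the effect described above.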
Law of Proximity
Law of Similarity
VSA: Proximity-Similarity
ASA: Proximity-Similarity
• The Gestalt proximity principle states that the grouping
strength between elements, or groups of elements,
decreases as the distance between them increases.
• In hearing, similarity usually implies closeness of timbre,
pitch, or loudness.
– For example, the relationship between frequency proximity and
temporal proximity has been studied extensively using the
two-tone streaming phenomenon (see Bregman, 1990, for a review).
The closer in frequency two tones are, the more likely it is that
they are grouped into the same stream. Similarly, the proximity
of two tones in time also determines the likelihood of streaming.
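The interplay of frequency and temporal proximity can be sketched with a toy rule: a tone joins the stream that requires the smallest frequency change per unit time, and starts a new stream if even that change is too fast. This heuristic, the name `assign_streams`, and the `max_rate` value are assumptions made for illustration; it is not Bregman's model and the threshold is not an empirical one.

```python
import math

def semitones(f1, f2):
    """Absolute frequency distance between two tones in semitones."""
    return abs(12 * math.log2(f1 / f2))

def assign_streams(tones, max_rate=0.05):
    """Greedy toy streaming. tones: list of (onset_ms, freq_hz).

    max_rate (semitones per ms) is a hypothetical threshold.
    """
    streams = []
    for onset, freq in sorted(tones):
        best, best_rate = None, None
        for stream in streams:
            last_onset, last_freq = stream[-1]
            gap = onset - last_onset
            if gap <= 0:
                continue  # simultaneous tones are not handled in this sketch
            rate = semitones(freq, last_freq) / gap
            if best_rate is None or rate < best_rate:
                best, best_rate = stream, rate
        if best is not None and best_rate <= max_rate:
            best.append((onset, freq))      # close enough: same stream
        else:
            streams.append([(onset, freq)])  # too far or too fast: new stream
    return streams

def alternating(f_a, f_b, gap_ms, n=8):
    """A-B-A-B... tone sequence with a fixed onset-to-onset gap."""
    return [(i * gap_ms, f_a if i % 2 == 0 else f_b) for i in range(n)]

near = assign_streams(alternating(500, 530, 100))   # small frequency step
far = assign_streams(alternating(500, 1000, 100))   # one octave, fast
slow = assign_streams(alternating(500, 1000, 400))  # one octave, slow
```

In this sketch the nearby tones stay in one stream, the fast octave alternation splits into a low and a high stream, and slowing the same alternation down merges it back into one stream.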
Law of Proximity and Similarity: Example
Audio demonstration (track 3): Loss of rhythmic information as a result of
stream segregation.
Visual and Auditory Scene Analysis: Law of Closure
Visual and Auditory Scene Analysis: Law of Good Continuation
Continuity & Closure
• Certain interrupted and / or smoothly changing forms are perceptually grouped into a
whole. For the continuity perception to occur, there must be sufficient evidence to
support the hypothesis that the form is obscured rather than interrupted.
• Similarly, when the figures are considered to be pseudospectrograms with the bar
representing a tone and the gray band representing noise, provided the noise is
sufficiently loud, the noise band will be perceived as obscuring a single tone, rather
than separating two distinct tones. This effect can also be seen for speech signals
which are obscured by noise. The speech sounds much more intelligible and
continuous when interrupted by noise than when interrupted by silence (Miller and
Licklider, 1950; see also Warren et al., 1972).
• Good continuation is also recognised as playing an important role in perceptual
grouping. Provided the changes between consecutive elements are smooth, then the
elements tend to be grouped together.
Law of Closure: Example
Audio demonstration 12: Effects of connectedness on segregation.
We perceive the first sequence, with the transitions, as more coherent. This
demonstration shows that continuity helps hold auditory sequences together.
Law of Good Continuation: Example
Audio demonstration 28: Apparent continuity.
Audio demonstration 29: Perceptual continuation of a gliding tone through a
noise burst.
Gestalt Principles
• Common fate: elements that change in a similar way over time tend to be grouped together
• Similarity: elements that are similar in physical attributes (such as timbre, pitch, or loudness) tend to be grouped
• Proximity: grouping strength between elements, or clusters of elements, decreases as the distance between them increases
• Continuity: provided the changes between consecutive elements are smooth, the elements tend to be grouped together
• Closure: elements that form a complete, but possibly partially obscured, object tend to be grouped
Schema-driven Grouping
Schema-driven grouping
• The perceptual system can also employ prior knowledge about common
sounds to organize the acoustic environment into streams.
• Example: Schema-driven grouping can be demonstrated by the perceptual
restoration of a phoneme which has been replaced by a burst of noise
(Warren and Warren, 1970).
• The stimulus used was of the form “the *eel was on the axle” or “the *eel
was on the orange”, where “*” indicates the noise-masked deleted phoneme.
• For these two examples, listeners reported hearing “wheel” and “peel”,
respectively: a top-down schema processes the speech before conscious
perception occurs, even when the disambiguating word occurs several
words after the deletion.
Critique of pure audition
Sound perception:
Miriam Makeba, "Click Song"
Chimaeric sounds
Example
• Separation of components: envelope sound; fine structure + envelope
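The slide does not say how envelope and fine structure are separated. One common formalisation, assumed here, uses the Hilbert transform $\mathcal{H}$ to build the analytic signal; the symbols $s$, $s_a$, $a$, $\varphi$, and $c$ are introduced for this sketch only. In practice the decomposition is usually applied separately within each frequency band of a filterbank and the bands are then summed.

```latex
\begin{align}
  s_a(t)     &= s(t) + i\,\mathcal{H}\{s\}(t)           && \text{(analytic signal of the sound } s\text{)}\\
  a(t)       &= \lvert s_a(t) \rvert                     && \text{(envelope)}\\
  \varphi(t) &= \arg s_a(t)                              && \text{(instantaneous phase)}\\
  s(t)       &= a(t)\,\cos\varphi(t)                     && \text{(envelope} \times \text{fine structure)}\\
  c(t)       &= a^{(1)}(t)\,\cos\varphi^{(2)}(t)         && \text{(chimaera: envelope of sound 1, fine structure of sound 2)}
\end{align}
```

Under this definition the "envelope sound" carries only $a(t)$, the slow amplitude contour, while $\cos\varphi(t)$ carries the rapid oscillations.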
CAPD
Central Auditory Processing Disorders (CAPD) = Auditory
Processing Disorders (APD)
• Disorders of the auditory central nervous system and its
perceptual functions.
• Can be caused by a variety of factors: age-related
deterioration, congenital and/or hereditary disorders,
degenerative and demyelinating diseases, developmental
disorders, chemical or drug-induced problems, head
trauma, infections, tumors, and even surgically induced
lesions. They are often of undetermined origin.
APD
• Patients with APD typically experience one or more of the following
problems with perceptual processing of auditory information:
– Disturbances of speech understanding when noise, reverberation, or
competing signals are present.
– Impairments of auditory discrimination, pattern recognition, and various
binaural processes (directional hearing, binaural fusion).
• The diagnosis of APD can be challenging because many of its
features are also found in patients with other problems, such as
learning disabilities, dyslexia, autistic disorders, language
impairment, ADHD, and cognitive disorders.
• The deficits need not be limited to the auditory modality.
CASA
• Computer models that mimic auditory scene analysis have led to the field of study known as computational auditory scene analysis (CASA).
• Early work was concentrated on speech separation.
• Computational solutions to the ASA problem are generally motivated by one of two applications:
1. The first is the goal of improving automatic speech recognition (ASR) performance in noisy environments. The accuracy of ASR systems whose input speech has been obtained in all but the quietest of backgrounds is poor compared to that of the human listener, particularly if the noise is non-stationary.
An ASA model that can successfully segregate speech from any number of interfering sound sources (including other speech) could be used as a first stage of pre-processing in a larger recognition system.
2. The second motivation is to produce ‘advanced’, or ‘intelligent’, hearing prostheses. Instead of amplifying the entire frequency range, such devices could segregate the acoustic environment into any number of streams, one of which (for example, a speech stream) could be selected with all the others being attenuated.
CASA (figures from Wrigley et al., Comp. Model of Atten., IEEE Trans. on NN, 2005)
Practical applications
• Auditory disorders: Some people with normal or near normal audiograms
complain of not being able to understand voices when mixed with other
sounds (i.e., they are deficient in ASA). Tests based on scientific knowledge
of ASA could assess the patients' residual abilities to use specific kinds of
cues for the segregation of signals. This may help in diagnosing the
physiological basis of their disorders, and may also permit the fitting of
hearing aids that maximize the remaining potential.
• Smart hearing aids: "Smart" hearing aids may be provided for individuals
who have difficulty in segregating concurrent sounds. Computers
incorporated in these aids would enhance the ASA cues for segregation,
such as spatial location, and allow listeners to use their remaining abilities
to focus on individual sounds.
• Workplace: Knowledge about ASA can contribute to workplace safety by
aiding in the design of alarms and machine-status signals that will tend not
to blend with each other or the background.
The Computational Ear
Auditory modelling
Correlogram
• The autocorrelogram, or simply correlogram, is a visual display of sound periodicity and an important representation of auditory temporal activity that combines both spectral and temporal information. It is normally defined as a three-dimensional volumetric function, mapping a frequency channel of an auditory periphery model, temporal autocorrelation delay (or lag), and time to the amount of periodic energy in that channel at that delay and time.
• The periodicity of sound is well represented in the correlogram.
• If the original sound contains a signal that is approximately periodic, such as voiced speech, then each frequency channel excited by that signal will have a high similarity to itself delayed by the period of repetition.
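A minimal sketch of one time slice of a correlogram, assuming a plain normalised autocorrelation per channel. The auditory periphery model is not implemented: the three "channels" below are hypothetical stand-ins, each a pair of adjacent harmonics of a 100 Hz fundamental, and the names `autocorrelation` and `correlogram_frame` are illustrative.

```python
import math

def autocorrelation(x, max_lag):
    """Normalised autocorrelation of one channel for lags 0..max_lag."""
    n = len(x) - max_lag                        # products summed at every lag
    energy = sum(v * v for v in x[:n]) or 1.0   # avoid dividing by zero
    return [sum(x[i] * x[i + lag] for i in range(n)) / energy
            for lag in range(max_lag + 1)]

def correlogram_frame(channels, max_lag):
    """One time slice: rows are frequency channels, columns are lags."""
    return [autocorrelation(ch, max_lag) for ch in channels]

fs = 8000                      # assumed sampling rate in Hz
f0 = 100                       # fundamental of the periodic test sound in Hz
n_samples, max_lag = 360, 120  # window of 240 samples = 3 periods of f0

def harmonic_pair(k):
    """Stand-in for a filterbank channel excited by harmonics k and k+1."""
    return [math.sin(2 * math.pi * k * f0 * i / fs)
            + math.sin(2 * math.pi * (k + 1) * f0 * i / fs)
            for i in range(n_samples)]

channels = [harmonic_pair(k) for k in (2, 3, 4)]
frame = correlogram_frame(channels, max_lag)

# Strongest lag per channel, skipping the trivial peak around lag 0.
best_lags = [max(range(20, max_lag + 1), key=row.__getitem__) for row in frame]
period_ms = [1000 * lag / fs for lag in best_lags]
```

Every channel peaks at the same lag, the repetition period of the sound, which is the "high similarity to itself delayed by the period" described above. Repeating this for successive windows adds the time axis of the three-dimensional correlogram.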
HAP: model of the BM
Working model of the basilar membrane (BM) response
to arbitrary sound stimuli