An introduction to the biology and neurophysiology of human speech. The target audience is researchers and engineers working on speech recognition technology.
Neurophysiology of Speech
T.S. Yo
References
● Carlson, N. R. Audition, the body senses, and the chemical senses. Physiology of Behavior, 6th ed., 1998, pp. 185-223.
● Carlson, N. R. Human communication. Physiology of Behavior, 6th ed., 1998, pp. 477-508.
● Bookheimer, S. Functional MRI of language: New approaches to understanding the cortical organization of semantic processing. Annu. Rev. Neurosci., 2002, pp. 151-188.
● Friederici, A. D. and Alter, K. Lateralization of auditory language functions: A dynamic dual pathway model. Brain and Language, 89 (2004), 267–276.
Outline
● Auditory apparatus
● MFCC
● Lesion study
● Neuroimaging
● Dynamic dual pathway model
● Can we design ASR systems by mimicking organic systems?
Auditory system
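[Figure: anatomy of the ear. Labels: pinna (auricle), tympanic membrane (eardrum), malleus, incus, stapes, Eustachian tube, cochlea, vestibule.]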
Cochlea
Cochlea (2)
Auditory Pathway
Detecting Acoustic Features
● Pitch
– High frequencies: place coding
– Low frequencies: rate coding
● Loudness
– Firing rate of cochlear nerve fibers
● Timbre
– Waveform decomposition
Localization with Neural Circuits
Vestibular System
MFCC
● Mel Frequency Cepstral Coefficients
– Take the Fourier transform of a windowed frame of the signal.
– Map the power spectrum obtained above onto the mel scale, using triangular overlapping windows.
– Take the log of the power in each mel band.
– Take the Discrete Cosine Transform of the list of mel log-powers, as if it were a signal.
– The MFCCs are the amplitudes of the resulting spectrum.
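The MFCC recipe can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the Hamming window, the 2595·log10(1 + f/700) mel formula, the 20 filters, and the 12 coefficients are common choices that the slides do not specify, and all function names are hypothetical. The sketch takes the logarithm after the mel filterbank, which is the usual ordering.

```python
import cmath
import math

def hz_to_mel(f):
    # One common mel formula (an assumption; several variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, naive DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular, overlapping filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values):
    """Unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values)) for k in range(n)]

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # floor avoids log(0)
    return dct2(log_energies)[:n_coeffs]

# Usage: one 32 ms frame of a synthetic 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(256)]
coeffs = mfcc(frame, sample_rate)
```

The naive DFT keeps the sketch dependency-free. A practical front end would use an FFT and process overlapping frames across the whole utterance.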
From the ears to the brain
● Ear
– Spectral signals.
– Fourier transform done by neural circuits.
● Brain
– Two pathways in the two hemispheres
– Left: semantics and syntax
– Right: prosody
Brain Mechanisms for Language
● From lesion study to neuroimaging
● Localization of functions
● Lateralization
● Speech Production and Comprehension
● Prosody
Lesion Studies
● Aphasia
– Difficulty in producing or comprehending speech caused by brain damage.
● Broca's aphasia
– agrammatism
– anomia
● Wernicke's aphasia
– poor speech comprehension
Broca's Aphasia
● Agrammatism:
– difficulty in understanding / using grammar
● Anomia:
– difficulty in finding the appropriate word to describe an object, action, or attribute.
● Apraxia of speech:
– impairment in the ability to program movements of the tongue, lips, and throat required to produce the proper sequence of speech sounds.
Broca's Aphasia Example
● "Yes ... Monday ... Dad, and Dad ... hospital, and ... Wednesday, Wednesday, nine o'clock and ... Thursday, ten o'clock ... doctors, two, two ... doctors and ... teeth, yah."
Broca's Aphasia
Wernicke's Aphasia
● Poor speech comprehension
● Fluent but meaningless speech
● Pure word deafness:
– The ability to hear, to speak, and to read and write without being able to comprehend the meaning of speech.
Wernicke's Aphasia Example
● Examiner: What kind of work have you done?
● Patient: We, the kids, all of us, and I, we were working for a long time in the ... you know ... it's the kind of space, I mean place rear to the spedawn ...
● Examiner: Excuse me, but I wanted to know what work you have been doing.
● Patient: If you had said that, we had said that, poomer, near the fortunate, porpunate, tamppoo, all around the fourth of martz. Oh, I get all confused.
Wernicke's Aphasia
Neuroimaging Studies
● Neuroimaging
– Functional magnetic resonance imaging (fMRI)
– Positron emission tomography (PET)
● Subjects are asked to perform cognitive tasks while being scanned.
Neuroimaging
● fMRI
● PET
Normalizing Neuroimages
● Talairach coordinate space
– Center (origin): anterior commissure
– X: [-65, +65]
– Y: [+70, -90]
– Z: [-40, +65]
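As a small illustration of how these extents can be used, the sketch below checks whether a reported coordinate falls inside the bounding box listed above. It assumes the values are millimetres measured from the anterior commissure. The names `TALAIRACH_BOUNDS` and `in_talairach_bounds` are hypothetical, and the example points are arbitrary rather than taken from any study.

```python
# Extents as listed on the slide, stored as (low, high) per axis.
# Assumption: units are millimetres relative to the anterior commissure.
TALAIRACH_BOUNDS = {
    "x": (-65.0, 65.0),
    "y": (-90.0, 70.0),
    "z": (-40.0, 65.0),
}

def in_talairach_bounds(x, y, z, bounds=TALAIRACH_BOUNDS):
    """Return True if (x, y, z) lies inside the bounding box (edges included)."""
    point = {"x": x, "y": y, "z": z}
    return all(bounds[axis][0] <= point[axis] <= bounds[axis][1]
               for axis in ("x", "y", "z"))

# Usage with arbitrary illustrative points.
inside = in_talairach_bounds(-45.0, 20.0, 10.0)   # within all three ranges
outside = in_talairach_bounds(0.0, 80.0, 0.0)     # y exceeds +70
```

A check like this only catches coordinates that are out of range, for example after a sign or axis mix-up. It says nothing about which anatomical structure a point belongs to.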
Semantic Conditions
● Same
– The lawyer questioned the witness.
– The attorney questioned the witness.
● Different
– The man was attacked by the doberman.
– The man was attacked by the pitbull.
Syntactic Conditions
● Same
– The policeman arrested the thief.
– The thief was arrested by the policeman.
● Different
– The teacher was outsmarted by the student.
– The teacher outsmarted the student.
Summary by Bookheimer, 2002
● The role of the left inferior frontal lobe in semantic processing and dissociations from other frontal lobe language functions.
● The organization of categories of objects and concepts in the temporal lobe.
● The role of the right hemisphere in comprehending contextual and figurative meaning.
Overview by Ahrens, 2007
● Past
– Functional localization (brain damage)
● Present
– Narrower localization + discussion of overlap and integration (neuroimaging techniques)
● Future
– Language as a brain function (integrate knowledge about timing, context, and individual differences)
The Three Myths
● Myth 1: Broca's area deals with syntax/production
– Fact: Semantics and phonology cluster in different areas of the inferior frontal gyrus (IFG); syntax seems to be distributed throughout the IFG.
– Fact: The IFG is activated during non-language tasks.
● Myth 2: Wernicke's area deals with semantics/comprehension
– Fact: There are functional subdivisions for language in the posterior temporal area.
● Myth 3: The right hemisphere is not used when processing language
– Fact: The right hemisphere is called upon for many integrative language processes.
> Figurative language and metaphor
> Linguistic context
> Prosody
Summary of Neuroimaging Studies
Dynamic Dual Pathway Model
● Spoken language comprehension requires the coordination of different subprocesses in time.
● Segmental information: – phonemes, syntactic elements and lexical-semantic
elements.● Suprasegmental information:
– accentuation and intonational phrases, i.e., prosody.
Localization of Different Subsystems
● Segmental information:
– syntactic and semantic information are primarily processed in a left hemispheric temporo-frontal pathway, including separate circuits for syntactic and semantic information.
● Suprasegmental information:
– sentence-level prosody is processed in a right hemispheric temporo-frontal pathway.
Dynamic Interaction
● Corpus callosum
Can we design ASR systems by imitating the brain?
● An open question
– Is it possible? Is it more effective?
● Complexity
– Basic computation rate of a neuron: 60 Hz
– 10^8 inputs, 10^10 neurons in the brain, each with >8000 connections
● Training time
– How long does it take a human being to learn to understand language?
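To make the complexity figures concrete, the following back-of-the-envelope sketch multiplies them out. It rests on crude assumptions that are not in the slides: every one of the 10^10 neurons is taken to have exactly 8000 connections (the slide says "more than"), and every connection is taken to carry one event per cycle at 60 Hz. The constant and function names are hypothetical.

```python
# Figures quoted on the slide.
NEURONS = 10**10                # neurons in the brain
CONNECTIONS_PER_NEURON = 8000   # slide says ">8000", so treat as a lower bound
RATE_HZ = 60                    # basic rate of a single neuron

def total_connections(neurons=NEURONS, per_neuron=CONNECTIONS_PER_NEURON):
    """Lower-bound estimate of the number of connections."""
    return neurons * per_neuron

def connection_events_per_second(neurons=NEURONS,
                                 per_neuron=CONNECTIONS_PER_NEURON,
                                 rate_hz=RATE_HZ):
    """Assumes every connection is active once per cycle (a crude assumption)."""
    return total_connections(neurons, per_neuron) * rate_hz

connections = total_connections()               # 8 * 10**13
events = connection_events_per_second()         # 4.8 * 10**15 per second
```

Under these assumptions the estimate is on the order of 10^13 connections and 10^15 connection events per second. This gives a rough sense of the scale a brain-imitating ASR system would have to approach, though real neurons neither fire continuously nor behave like simple arithmetic units.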
Some factors in the human neural system