The Neurophysiology of Speech


An introduction to the biology and neurophysiology of human speech. The target audience is researchers and engineers working on speech recognition technology.


Neurophysiology of Speech

T.S. Yo

References

Carlson, N. R. (1998). Audition, the body senses, and the chemical senses. In Physiology of Behavior (6th ed., pp. 185–223).

Carlson, N. R. (1998). Human communication. In Physiology of Behavior (6th ed., pp. 477–508).

Bookheimer, S. (2002). Functional MRI of language: New approaches to understanding the cortical organization of semantic processing. Annual Review of Neuroscience, pp. 151–188.

Friederici, A. D., & Alter, K. (2004). Lateralization of auditory language functions: A dynamic dual pathway model. Brain and Language, 89, 267–276.

Outline

● Auditory apparatus
● MFCC
● Lesion studies
● Neuroimaging
● Dynamic dual pathway model
● Can we design ASR systems by mimicking organic systems?

Auditory system

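[Figure: anatomy of the ear, labeled: tympanic membrane (eardrum), pinna (auricle), malleus, incus, stapes, Eustachian tube (auditory tube), cochlea, vestibule]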

Cochlea

Cochlea (2)

Auditory Pathway

Detecting Acoustic Features

● Pitch
– High frequencies: place coding
– Low frequencies: rate coding
● Loudness
– Firing rate of the axons in the cochlear nerve
● Timbre
– Decomposition of the waveform into its component frequencies
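To make the place/rate distinction concrete, here is a toy Python sketch. It is an illustrative simplification, not a model of the real cochlea: place coding is approximated by asking which of a few hypothetical frequency "channels" (positions along the basilar membrane) carries the most spectral energy, and rate coding by a phase-locked "neuron" that fires once per cycle, so its spike rate tracks the frequency. Because real neurons cannot fire arbitrarily fast, this kind of rate code is only plausible for low frequencies. The channel center frequencies and the function names `place_code` and `rate_code` are assumptions made for this example.

```python
import numpy as np

def place_code(signal, sample_rate, channel_centers_hz):
    """Toy place code: return the index of the hypothetical channel
    (a position along the basilar membrane) with the most spectral energy."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Each channel reads the FFT bin nearest its center frequency.
    energies = [spectrum[np.argmin(np.abs(freqs - fc))] for fc in channel_centers_hz]
    return int(np.argmax(energies))

def rate_code(signal, sample_rate):
    """Toy rate code: a phase-locked 'neuron' fires at each upward zero
    crossing (once per cycle), so spikes per second approximate the frequency."""
    spikes = np.count_nonzero((signal[:-1] < 0) & (signal[1:] >= 0))
    return spikes / (len(signal) / sample_rate)

sr = 16000
t = np.arange(sr) / sr  # 1 second of samples
channels = [250.0, 500.0, 1000.0, 2000.0, 4000.0]  # hypothetical channel centers (Hz)
print(place_code(np.sin(2 * np.pi * 1000.0 * t), sr, channels))  # index of the 1000 Hz channel
print(rate_code(np.sin(2 * np.pi * 100.0 * t), sr))  # roughly 100 spikes per second
```

The place code reports *where* the energy lands; the rate code reports *how often* the neuron fires. Neither sketch includes the filtering, adaptation, or population effects of a real auditory nerve.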

Localization with Neural Circuits


Vestibular System

MFCC

● Mel Frequency Cepstral Coefficients
– Take the Fourier transform of a short, windowed frame of the signal.
– Map the powers of the spectrum obtained above onto the mel scale, using triangular overlapping windows.
– Take the log of the power at each mel frequency.
– Take the Discrete Cosine Transform (DCT) of the list of mel log powers, as if it were a signal.
– The MFCCs are the amplitudes of the resulting spectrum.
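The MFCC steps can be sketched for a single frame in Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the Hamming window, the common 2595·log10(1 + f/700) mel formula, 26 filters, 13 coefficients, and the helper names (`mel_filterbank`, `dct_ii`, `mfcc_frame`) are all choices made here. Libraries differ in details such as filter normalization and DCT scaling, so the numbers will not match any particular toolkit exactly.

```python
import numpy as np

def hz_to_mel(f_hz):
    # One widely used mel formula; other variants exist.
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular, overlapping filters whose edges are equally spaced in mels."""
    bin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fbank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        left, center, right = hz_edges[i], hz_edges[i + 1], hz_edges[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_ii(x, n_out):
    """Unnormalized DCT-II of x, keeping the first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    idx = np.arange(n)[None, :]
    return np.sum(x[None, :] * np.cos(np.pi / n * (idx + 0.5) * k), axis=1)

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs of one short frame (e.g. 25 ms) of audio."""
    # 1. Fourier transform of the windowed frame -> power spectrum.
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2
    # 2. Map the powers onto the mel scale with triangular overlapping windows.
    mel_energies = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    # 3. Log of each mel-band energy (a small floor avoids log(0)).
    log_mel = np.log(mel_energies + 1e-10)
    # 4. DCT of the log mel energies, treated as if they were a signal.
    return dct_ii(log_mel, n_coeffs)

sr = 16000
t = np.arange(400) / sr  # one 25 ms frame at 16 kHz
coeffs = mfcc_frame(np.sin(2 * np.pi * 440.0 * t), sr)
print(coeffs.shape)  # (13,)
```

A full front end would slide this over overlapping frames (for example every 10 ms) to produce one MFCC vector per frame.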

From the ears to the brain

● Ear
– Spectral signals.
– Fourier transform performed by neural circuits.
● Brain
– Two pathways in the two hemispheres
– Left: semantics and syntax
– Right: prosody

Brain Mechanisms for Language

● From lesion studies to neuroimaging
● Localization of functions
● Lateralization
● Speech production and comprehension
● Prosody

Lesion Studies

● Aphasia
– Difficulty in producing or comprehending speech caused by brain damage.
● Broca's aphasia
– Agrammatism
– Anomia
● Wernicke's aphasia
– Poor speech comprehension

Broca's Aphasia

● Agrammatism:
– Difficulty in understanding or using grammar.
● Anomia:
– Difficulty in finding the appropriate word to describe an object, action, or attribute.
● Apraxia of speech:
– Impairment in the ability to program the movements of the tongue, lips, and throat required to produce the proper sequence of speech sounds.

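Broca's Aphasia Example

● "Yes ... Monday ... Dad, and Dad ... hospital, and ... Wednesday, Wednesday, nine o'clock and ... Thursday, ten o'clock ... doctors, two, two ... doctors and ... teeth, yah."
● Chinese version (translated): "Yes ... ah ... Monday ... ah ... father and father ... ah ... hospital ... and ah ... Wednesday ... Wednesday nine o'clock ... and, oh ... Thursday ... ten o'clock, ah, doctors ... two ... doctors ... and ah ... teeth ... right."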

Broca's Aphasia

Wernicke's Aphasia

● Poor speech comprehension
● Fluent but meaningless speech
● Pure word deafness:
– The ability to hear, to speak, and to read and write without being able to comprehend the meaning of speech.

Wernicke's Aphasia Example

● Examiner: What kind of work have you done?
● Patient: We, the kids, all of us, and I, we were working for a long time in the ... you know ... it's the kind of space, I mean place rear to the spedawn ...
● Examiner: Excuse me, but I wanted to know what work you have been doing.
● Patient: If you had said that, we had said that, poomer, near the fortunate, porpunate, tamppoo, all around the fourth of martz. Oh, I get all confused.

Wernicke's Aphasia

Neuroimaging Studies

● Neuroimaging
– Functional magnetic resonance imaging (fMRI)
– Positron emission tomography (PET)
● Subjects are asked to perform cognitive tasks while being scanned.
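The basic logic of such studies is a contrast between conditions: activity recorded during a task is compared, voxel by voxel, with activity during a control condition. A minimal NumPy sketch on synthetic data, assuming a simple two-sample (Welch) t-test per voxel; real fMRI analyses typically fit a general linear model with a modeled hemodynamic response and correct for multiple comparisons. The helper `contrast_t`, the data sizes, and the simulated effect are all hypothetical.

```python
import numpy as np

def contrast_t(task, control):
    """Welch t-statistic per voxel for task vs. control scans.

    task, control: arrays of shape (n_scans, n_voxels).
    """
    mean_diff = task.mean(axis=0) - control.mean(axis=0)
    se = np.sqrt(task.var(axis=0, ddof=1) / len(task)
                 + control.var(axis=0, ddof=1) / len(control))
    return mean_diff / se

rng = np.random.default_rng(0)
n_scans, n_voxels = 60, 10
control = rng.normal(0.0, 1.0, size=(n_scans, n_voxels))
task = rng.normal(0.0, 1.0, size=(n_scans, n_voxels))
task[:, 3] += 2.0  # simulated task-related activation in voxel 3 only
t_map = contrast_t(task, control)
print(int(np.argmax(t_map)))  # the voxel with the strongest task > control contrast
```

Swapping the arguments flips the sign of every t value, which is how a "control > task" map would be obtained from the same data.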

Neuroimaging

● fMRI
● PET

Normalizing Neuroimages

● Talairach coordinate space
– Origin: anterior commissure (AC)
– X (left to right): -65 to +65 mm
– Y (posterior to anterior): -90 to +70 mm
– Z (inferior to superior): -40 to +65 mm
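As a deliberately simplified, hypothetical sketch of what normalization does: translate each point so the AC becomes the origin, then linearly stretch each half-axis so this subject's brain extent matches the atlas ranges on the slide. The actual Talairach procedure is piecewise linear over a grid of boxes defined by the AC, the posterior commissure, and the outer brain boundaries, and modern pipelines often use nonlinear registration instead. The function name `to_talairach`, the `subject_bounds` input, and the numbers in the usage line are assumptions.

```python
# Atlas extents from the slide (mm), measured from the anterior commissure.
ATLAS_BOUNDS = {"x": (-65.0, 65.0), "y": (-90.0, 70.0), "z": (-40.0, 65.0)}

def to_talairach(point, ac, subject_bounds):
    """Map a subject-space point (mm) into the atlas box (simplified).

    point, ac: (x, y, z) in the subject's scanner coordinates.
    subject_bounds: per-axis (min, max) extent of this subject's brain,
    measured relative to its AC (min < 0 < max).
    """
    mapped = []
    for axis, p, origin in zip("xyz", point, ac):
        d = p - origin  # 1. translate so the AC becomes the origin
        s_min, s_max = subject_bounds[axis]
        a_min, a_max = ATLAS_BOUNDS[axis]
        # 2. stretch the positive and negative half-axes separately
        mapped.append(d * a_max / s_max if d >= 0 else d * a_min / s_min)
    return tuple(mapped)

subject = {"x": (-70.0, 70.0), "y": (-100.0, 80.0), "z": (-45.0, 75.0)}
print(to_talairach((80.0, -80.0, 30.0), ac=(10.0, 20.0, 30.0), subject_bounds=subject))
# (65.0, -90.0, 0.0): the subject's right edge, posterior edge, and AC level
```

Once every subject's scans are mapped into the same box, activations from different brains can be compared by their coordinates.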

Semantic Conditions

● Same
– The lawyer questioned the witness.
– The attorney questioned the witness.
● Different
– The man was attacked by the doberman.
– The man was attacked by the pitbull.

Syntactic Conditions

● Same
– The policeman arrested the thief.
– The thief was arrested by the policeman.
● Different
– The teacher was outsmarted by the student.
– The teacher outsmarted the student.

Summary by Bookheimer, 2002

● The role of the left inferior frontal lobe in semantic processing and dissociations from other frontal lobe language functions.

● The organization of categories of objects and concepts in the temporal lobe.

● The role of the right hemisphere in comprehending contextual and figurative meaning.

Overview by Ahrens, 2007

● Past
– Functional localization (brain damage)
● Present
– Narrower localization, plus discussion of overlap and integration (neuroimaging techniques)
● Future
– Language as a brain function (integrating knowledge about timing, context, and individual differences)

The Three Myths

● Myth 1: Broca’s area deals with syntax/production
– Fact: Semantics and phonology cluster in different areas of the inferior frontal gyrus (IFG); syntax seems to be distributed throughout the IFG.
– Fact: The IFG is also activated during non-language tasks.
● Myth 2: Wernicke’s area deals with semantics/comprehension
– Fact: There are functional subdivisions for language in the posterior temporal area.
● Myth 3: The right hemisphere is not used when processing language
– Fact: The right hemisphere is called upon for many integrative language processes:
> Figurative language and metaphor
> Linguistic context
> Prosody

Summary of Neuroimaging Studies

Dynamic Dual Pathway Model

● Spoken language comprehension requires the coordination of different subprocesses in time.

● Segmental information:
– Phonemes, syntactic elements, and lexical-semantic elements.
● Suprasegmental information:
– Accentuation and intonational phrases, i.e., prosody.

Localization of Different Subsystems

● Segmental information:
– Syntactic and semantic information are primarily processed in a left-hemisphere temporo-frontal pathway, including separate circuits for syntactic and semantic information.
● Suprasegmental information:
– Sentence-level prosody is processed in a right-hemisphere temporo-frontal pathway.

Dynamic Interaction

● Corpus callosum: the interhemispheric connection through which the left- and right-hemisphere pathways interact

Can we design ASR systems by imitating the brain?

● An open question
– Is it possible? Is it more effective?
● Complexity
– Basic computation rate of a neuron: 60 Hz
– 10^8 inputs; 10^10 neurons in the brain, each with >8,000 connections
● Training time
– How long does it take a human being to learn to understand language?
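Multiplying the slide's complexity figures gives a rough sense of scale. This assumes, as a simplification, that every connection of every neuron carries signals at the quoted 60 Hz; since the connection count is given as ">8,000", the result is an order-of-magnitude estimate, not a measurement:

```latex
N_{\text{events/s}} \approx 10^{10}\ \text{neurons} \times 8\,000\ \frac{\text{connections}}{\text{neuron}} \times 60\ \text{Hz} = 4.8 \times 10^{15}\ \text{s}^{-1}
```

Each unit is slow, but the massive parallelism makes the total large, which is part of why mimicking the brain directly is computationally demanding.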

Some factors in the human neural system
