PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition


Page 1: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

PSY 369: Psycholinguistics

Language Comprehension: Word recognition & speech recognition

Page 2: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

[Diagram: the language processing architecture. Comprehension: Letter/Phoneme Recognition, then Word Recognition, Syntactic Analysis, and Semantic Analysis. Production: Conceptualizer (thought), then Formulator (Grammatical Encoding, Phonological Encoding), then Articulator. Both routes draw on the Lexicon (Lexical Access).]

Page 3: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Lexical access
How do we retrieve linguistic information from long-term memory?
How is the information organized/stored?
What factors are involved in retrieving information from the lexicon?
Models of lexical access


Page 5: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Lexical access: factors affecting lexical access

Some of these may reflect the structure of the lexicon

Some may reflect the processes of access from the lexicon

Page 6: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Role of prior context

Listen to a short paragraph. At some point during the paragraph a string of letters will appear on the screen. Decide if it is an English word or not. Say 'yes' or 'no' as quickly as you can.

Swinney (1979): cross-modal priming task

Page 7: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Role of prior context

ant

“Rumor had it that, for years, the government building has been plagued with problems. The man was not surprised when he found several spiders, roaches and other

bugs in the corner of his room.”

Lexical decision: button press (yes/no)

Swinney (1979)

Page 8: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Role of prior context Swinney (1979)

Lexical decision task. Context related: ant; context inappropriate: spy; context unrelated: sew

Results and conclusions
Within 400 msec of hearing "bugs", both ant and spy are primed
After 700 msec, only ant is primed

"Rumor had it that, for years, the government building has been plagued with problems. The man was not surprised when he found several spiders, roaches and other bugs in the corner of his room."

400 msec: ant faster, spy faster (relative to the unrelated control, sew)
700 msec: ant faster, spy no different

Page 9: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Lexical ambiguity: Hogaboam and Perfetti (1975)

Words can have multiple interpretations; the role of frequency of meaning

Task: is the last word ambiguous?
The jealous husband read the letter (dominant meaning)
The antique typewriter was missing a letter (subordinate meaning)

Results: Participants are faster on the second sentence.

The results may seem counterintuitive, but the task is the key: "is the final word ambiguous?" In the first sentence the dominant meaning is used and the context strongly biases that meaning, so the subordinate meaning may not be active, which in turn makes the ambiguity judgment harder for the first sentence.

Page 10: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Lexical access
How do we retrieve linguistic information from long-term memory?
How is the information organized/stored?
What factors are involved in retrieving information from the lexicon?
Models of lexical access

Page 11: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Lexical access
How do we retrieve linguistic information from long-term memory?
How is the information organized/stored?
What factors are involved in retrieving information from the lexicon?
Models of lexical access

[Diagram repeated: the comprehension/production architecture, with the Lexicon and Lexical Access highlighted]

Page 12: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

(Some) models of lexical access
Serial comparison models
  Search model (Forster, 1976, 1979, 1987, 1989)
Parallel comparison models
  Logogen model (Morton, 1969)
  Cohort model (Marslen-Wilson, 1987, 1990)
Connectionist models
  Interactive Activation Model (McClelland and Rumelhart, 1981)

Important contrasting features: (1) interactive or autonomous, (2) serial or parallel, (3) cascaded or discrete

Page 13: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Search model (e.g., Forster, 1976)

Access of the lexicon is considered autonomous, independent of other systems involved in processing language

A complete perceptual representation of the perceived stimulus is constructed

The representation is compared with representations in access files
Three access files: orthographic, phonological, and syntactic/semantic (the last used for language production)
Access files are organized in a series of bins (keyed by the first syllable or letters)
Position within a bin is determined by lexical frequency
Access files have "pointers" to meaning information in semantic memory
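To make the bin-and-pointer idea concrete, here is a small illustrative sketch in Python (not Forster's actual implementation): bins are keyed by a word's first letter, entries within a bin are ordered by decreasing frequency, and lookup is a serial scan. The mini-lexicon, frequencies, and "meanings" are invented.

```python
# Toy sketch of a Forster-style serial search (illustrative only).
# Bins are keyed by an access code (here simply the first letter), and
# entries within a bin are ordered by decreasing frequency.

from collections import defaultdict

# Hypothetical mini-lexicon: (word, frequency, pointer to meaning)
LEXICON = [
    ("cat", 5000, "feline pet"),
    ("cap", 1200, "head covering"),
    ("cot", 300, "small bed"),
    ("mat", 900, "floor covering"),
    ("mouse", 2500, "small rodent"),
]

def build_access_file(entries):
    bins = defaultdict(list)
    for word, freq, meaning in entries:
        bins[word[0]].append((word, freq, meaning))
    for key in bins:
        bins[key].sort(key=lambda e: e[1], reverse=True)  # most frequent first
    return bins

def serial_search(bins, target):
    """Scan the target's bin top-down; more frequent entries are reached
    sooner, which is this model's account of the frequency effect."""
    for position, (word, freq, meaning) in enumerate(bins.get(target[0], []), start=1):
        if word == target:
            return position, meaning     # position ~ how long access took
    return None, None                    # no entry found: respond "nonword"

bins = build_access_file(LEXICON)
print(serial_search(bins, "cat"))   # found early in the 'c' bin
print(serial_search(bins, "cot"))   # lower frequency, found later in the same bin
```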

Page 14: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

[Diagram: Search model. Visual input ("cat") and auditory input (/kat/) are converted to access codes and matched against the access files; entries within each bin are listed in order of decreasing frequency; pointers lead from the access entries to the corresponding entries (mat, cat, mouse) in the mental lexicon.]

Search model (e.g., Forster, 1976)

Page 15: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Search model (e.g., Forster, 1976)

Frequency effects: bin organization
Repetition priming effects: temporary reordering of bins in response to a recent encounter
Semantic priming effects: accounted for by cross-referencing in the lexicon
Context effects: search is considered autonomous, unaffected by context (so context effects are "post-access")

Search model (Forster, 1976, 1979, 1987, 1989, 1994)

[Diagram repeated: Search model access files, with bin entries in order of decreasing frequency and pointers into the mental lexicon (mat, cat, mouse)]

Page 16: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Logogen model (Morton, 1969)

Logogens specify a word's attributes (e.g., semantic, orthographic, phonological)
Activated in two ways: by sensory input and by contextual information
Access (recognition) occurs when activation reaches a threshold
Thresholds differ depending on factors such as frequency
Access makes the information associated with the word available

The lexical entry for each word comes with a logogen (e.g., cat, dog, cap)
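A minimal sketch of the logogen idea, with invented thresholds and evidence values: each word's counter accumulates evidence from sensory input (and, in the full model, context), and the word is recognized when the counter crosses its threshold; frequency is modeled by giving common words lower thresholds.

```python
# Toy logogen: an evidence counter with a recognition threshold.
# Thresholds and evidence increments are arbitrary illustrative numbers.

class Logogen:
    def __init__(self, word, threshold):
        self.word = word
        self.threshold = threshold   # high-frequency words get lower thresholds
        self.count = 0.0

    def receive(self, evidence):
        """Add sensory or contextual evidence; return True once the logogen fires."""
        self.count += evidence
        return self.count >= self.threshold

cat = Logogen("cat", threshold=5.0)   # frequent word: low threshold
cot = Logogen("cot", threshold=8.0)   # rarer word: higher threshold

# Feed identical sensory evidence to both, one unit per time step:
# the low-threshold (high-frequency) logogen fires first.
for t in range(1, 11):
    if cat.receive(1.0):
        print(f"'cat' fired at step {t}")
        break
for t in range(1, 11):
    if cot.receive(1.0):
        print(f"'cot' fired at step {t}")
        break
```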

Page 17: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Logogen model (Morton, 1969)

[Diagram: auditory stimuli feed an auditory analysis and visual stimuli a visual analysis; both feed the logogen system (e.g., cat, dog, cap), which also receives input from the context system (semantic attributes). Logogens that fire make responses available via an output buffer.]

Page 18: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Think of a logogen as being like a ‘strength-o-meter’ at a fairground

When the bell rings, the logogen has ‘fired’

Page 19: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

‘cat’[kæt]

• What makes the logogen fire?

– seeing/hearing the word

• What happens once the logogen has fired?

– access to lexical entry!

Page 20: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

• So how does this help us to explain the frequency effect?
– High-frequency words have a lower threshold for firing
– e.g., 'cat' [kæt] vs. 'cot' [kot]: the low-frequency word takes longer to fire

Page 21: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

• Spreading activation from doctor lowers the threshold for nurse to fire
– So nurse takes less time to fire

[Diagram: 'doctor' [doktə] and 'nurse' [nə:s] logogens linked within a spreading activation network]
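In the same toy terms, semantic priming can be pictured as contextual activation that covers part of the distance to threshold before the sensory input arrives; the link strength, threshold, and evidence values below are invented for illustration.

```python
# Toy illustration of semantic priming: spreading activation from "doctor"
# gives "nurse" a head start toward its recognition threshold.

THRESHOLD = 10.0
ASSOCIATES = {"doctor": {"nurse": 4.0}}   # invented link strength

def steps_to_fire(threshold, context_boost, evidence_per_step=1.0):
    """How many steps of sensory evidence are needed before firing."""
    count = context_boost
    steps = 0
    while count < threshold:
        count += evidence_per_step
        steps += 1
    return steps

unprimed = steps_to_fire(THRESHOLD, context_boost=0.0)
primed = steps_to_fire(THRESHOLD, ASSOCIATES["doctor"].get("nurse", 0.0))
print(f"'nurse' unprimed: {unprimed} steps; primed by 'doctor': {primed} steps")
```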

Page 22: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Interactive Activation Model (IAM)

McClelland and Rumelhart (1981)

Proposed to account for the word superiority effect

Also goes by the name: Interactive Activation and Competition Model (IAC)

Page 23: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

A fixation cross ("+") is displayed until the participant hits a start key

The Word-Superiority Effect (Reicher, 1969)

Page 24: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

COURSE

Presented briefly … say 25 ms

The Word-Superiority Effect (Reicher, 1969)

Page 25: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

A mask (&&&&&) is then presented, with alternative letters (e.g., U and A) above and below the target position; participants must pick the one they believe was presented in that position.

The Word-Superiority Effect (Reicher, 1969)

Page 26: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

The Word-Superiority Effect (Reicher, 1969)

Three trial types, each beginning with a fixation cross and ending with a mask plus a choice between two letters (E vs. T):
Letter only (e.g., E alone): roughly 60% correct
Letter in a nonword (e.g., KLANE): roughly 65% correct
Letter in a word (e.g., PLANE): roughly 80% correct

Why is identification better when a letter is presented in a word?

Page 27: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Interactive Activation Model (IAM)

McClelland and Rumelhart (1981)

Nodes: (visual) feature detectors, (positional) letter detectors, and word detectors
Inhibitory and excitatory connections between them
Previous models posited a bottom-up flow of information (from features to letters to words)
IAM also posits a top-down flow of information

Also goes by the name: Interactive Activation and Competition Model (IAC)

Page 28: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Inhibitory connections within levels
If the first letter of a word is "a", it isn't "b" or "c" or …
Inhibitory and excitatory connections between levels (bottom-up and top-down)
If the first letter is "a", the word could be "apple" or "ant" or …, but not "book" or "church" or …
If there is growing evidence that the word is "apple", that evidence confirms that the first letter is "a", and not "b" …

Interactive Activation Model (IAM)
McClelland and Rumelhart (1981)
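The sketch below is a heavily simplified, invented-parameter version of this architecture: a few four-letter word nodes and position-specific letter nodes, excitatory links between consistent units across levels (including top-down feedback), and inhibitory links within each level. The vocabulary, weights, and input values are all made up for illustration.

```python
# Toy interactive activation: position-specific letter nodes and word nodes,
# excitatory between-level links (bottom-up and top-down), inhibitory links
# within each level, and decay. All weights are invented for illustration.

WORDS = ["able", "trip", "time", "cart"]
EXCITE, INHIBIT, DECAY = 0.2, 0.1, 0.05

def clip(x):
    return max(0.0, min(1.0, x))

def step(letter_act, word_act):
    """One synchronous update of all letter and word activations."""
    new_word = {}
    for w in WORDS:
        support = sum(letter_act[(i, ch)] for i, ch in enumerate(w))  # bottom-up
        rivals = sum(a for v, a in word_act.items() if v != w)        # within-level
        new_word[w] = clip(word_act[w] + EXCITE * support - INHIBIT * rivals
                           - DECAY * word_act[w])
    new_letter = {}
    for (i, ch), a in letter_act.items():
        feedback = sum(word_act[w] for w in WORDS if w[i] == ch)      # top-down
        rivals = sum(b for (j, d), b in letter_act.items() if j == i and d != ch)
        new_letter[(i, ch)] = clip(a + EXCITE * feedback - INHIBIT * rivals - DECAY * a)
    return new_letter, new_word

# Bottom-up (feature-level) input consistent with "time", plus weak competing
# evidence for "c" in the first position (consistent with "cart").
letters = {(i, ch): 0.0 for i in range(4) for ch in "abcdefghijklmnopqrstuvwxyz"}
for i, ch in enumerate("time"):
    letters[(i, ch)] = 0.6
letters[(0, "c")] = 0.3

words = {w: 0.0 for w in WORDS}
for _ in range(10):
    letters, words = step(letters, words)

print(sorted(words.items(), key=lambda kv: -kv[1]))  # "time" should dominate
```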

Page 29: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

IAM & the word superiority effect

The model processes at the word and letter levels simultaneously
Letters in words benefit from both bottom-up and top-down activation
But letters presented alone receive only bottom-up activation

McClelland and Rumelhart (1981)

Page 30: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Cohort model (Marslen-Wilson & Welsh, 1978)

Specifically for auditory word recognition (covered in chapter 9 of the textbook)

Listeners can recognize a word very rapidly, usually within 200-250 msec
Recognition point (uniqueness point): the point at which a word is unambiguously different from other words and can be recognized (strong emphasis on word onsets)

Three stages of word recognition:
1) activate a set of possible candidates
2) narrow the search to one candidate
3) integrate the single candidate into the semantic and syntactic context
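A few lines of code can illustrate the narrowing of the candidate set and the notion of a uniqueness point; the mini-lexicon is invented, and spelled forms stand in for phoneme strings (so, unlike the spoken case, 'psychologist' never joins the /s/ cohort here).

```python
# Toy cohort model: the candidate set shrinks as each segment arrives, and the
# uniqueness point is the point at which only one candidate remains.
# Spelled forms stand in for phoneme strings in this sketch.

LEXICON = ["soap", "spinach", "psychologist", "spin", "spit", "sun", "spank"]

def cohort(heard_so_far, lexicon=LEXICON):
    """Stage 1: all words still consistent with the input heard so far."""
    return [w for w in lexicon if w.startswith(heard_so_far)]

def uniqueness_point(word, lexicon=LEXICON):
    """How much input is needed before the word is the only candidate left
    (None if it never is, e.g. 'spin', which is also the start of 'spinach')."""
    for n in range(1, len(word) + 1):
        if cohort(word[:n], lexicon) == [word]:
            return n
    return None

for prefix in ["s", "sp", "spi", "spin"]:
    print(prefix, "->", cohort(prefix))
print("uniqueness point of 'spit':", uniqueness_point("spit"))   # 4 segments
print("uniqueness point of 'spin':", uniqueness_point("spin"))   # None
```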

Page 31: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Cohort model. Prior context: "I took the car for a …"

As the input unfolds over time, the cohort narrows:
/s/: … soap, spinach, psychologist, spin, spit, sun, spank …
/sp/: spinach, spin, spit, spank …
/spi/: spinach, spin, spit …
/spin/: spin

Page 32: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Comparing the models
Each model can account for the major findings (e.g., frequency, semantic priming, context), but they do so in different ways.
The Search model is serial, bottom-up, and autonomous
The Logogen model is parallel and interactive (information flows up and down)
IAM is both bottom-up and top-down, and uses both facilitation and inhibition
The Cohort model is bottom-up and parallel initially, but interactive at a later stage

Page 33: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Different signals

Visual word recognition: some parallel input; orthography; letters; clear delineation between units; difficult to learn
Speech perception: serial input; phonetics/phonology; acoustic features; usually no delineation between units; "easy" to learn

Example: "Where are you going" (written with clear breaks between the words, but spoken as a largely continuous signal)


Page 35: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Speech perception

Articulatory phonetics: production-based
Place and manner of articulation

Acoustic phonetics: based on the acoustic signal
Formants, transitions, co-articulation, etc.

Page 36: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Acoustic cues are extracted and stored in sensory memory and then mapped onto linguistic information

Air is pushed from the lungs through the larynx, across the vocal cords, and into the mouth and nose, producing different types of sounds.
The different qualities of the sounds are reflected in the formants
The formants and other features are mapped onto phonemes

Speech production to perception

Page 37: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Acoustic features

Spectrogram: time on the x-axis, frequency on the y-axis
Amplitude is represented by the darkness of the bands

[Spectrogram figure: frequency (y-axis) plotted against time (x-axis)]
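For readers who want to see one, a spectrogram can be generated with standard signal-processing tools; the sketch below (assuming NumPy, SciPy, and Matplotlib are available) uses a synthetic two-tone signal rather than real speech, but the axes are laid out as described above.

```python
# Minimal spectrogram sketch on a synthetic signal (not real speech).
# Requires numpy, scipy and matplotlib.

import numpy as np
from scipy.signal import spectrogram
import matplotlib.pyplot as plt

fs = 16000                                  # sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)
# Crude stand-in for formant-like energy: a steady tone plus a rising tone.
x = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * (800 + 400 * t) * t)

f, times, Sxx = spectrogram(x, fs=fs, nperseg=512, noverlap=384)

plt.pcolormesh(times, f, 10 * np.log10(Sxx + 1e-12), shading="auto", cmap="Greys")
plt.xlabel("time (s)")                      # time on the x-axis
plt.ylabel("frequency (Hz)")                # frequency on the y-axis
plt.title("Toy spectrogram: darker bands carry more energy")
plt.show()
```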

Page 38: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Acoustic features

Formants - bands of resonant frequencies
Formant transitions - upward or downward movement of formants
Steady states - flat formant patterns
Bursts - sudden release of air
Voice onset time (VOT) - when voicing begins relative to the onset of the phoneme

Page 39: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Formants in a wide-band spectrogram

[Wide-band spectrogram with F1, F2, and F3 and their formant transitions labeled]

Formants - bands of resonant frequencies
Formant transitions - upward or downward movement of formants
Steady states - flat formant patterns

Page 40: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Formants in a wide-band spectrogram

[The same spectrogram, with the stop burst also labeled]

Bursts - sudden release of air

Page 41: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Voice-Onset Time (VOT)

[Waveforms: 'bit' with a short VOT (about 5 ms) vs. 'pit' with a long VOT (about 40 ms)]
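Since VOT is just a duration, a toy threshold makes the voiced/voiceless contrast concrete; the 25 ms boundary below is an illustrative approximation for English initial stops, not a measured value.

```python
# Toy VOT classifier: short VOT -> voiced /b/ (as in "bit"),
# long VOT -> voiceless /p/ (as in "pit"). Boundary value is illustrative.

def classify_by_vot(vot_ms, boundary_ms=25.0):
    return "/b/ (voiced)" if vot_ms < boundary_ms else "/p/ (voiceless)"

for vot in [5, 15, 25, 40, 60]:
    print(f"VOT = {vot:2d} ms -> {classify_by_vot(vot)}")
```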

Page 42: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Categorical Perception
Categorical perception is the perception of different sensory phenomena as being qualitatively, or categorically, different.

Liberman et al. (1957)
Used a speech synthesizer to create a series of syllables spanning the categories /b/, /d/, /g/ (each followed by /a/)
This was done by manipulating the F2 formant
The stimuli formed a physical continuum
Result: people didn't "hear" a continuum; instead they classified the stimuli into three categories

Page 43: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Categorical Perception

1. Set up a continuum of sounds between two categories

1 ... 3 … 5 … 7

/ba/ - /da/

Liberman et al (1957)

Page 44: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

2. Run an identification experiment

[Identification curve: % /ba/ responses (from 100 down to 0) plotted across the 1-7 continuum, showing a sharp phoneme boundary]

Categorical Perception: Liberman et al. (1957)

Our perception of phonemes is categorical rather than continuous.
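To see what "categorical rather than continuous" means operationally, here is a mocked-up identification curve (the percentages are invented, chosen only to mimic the typical shape): responses stay near 100% /ba/, then drop abruptly to near 0% at the boundary instead of declining gradually.

```python
# Invented identification data for a 7-step /ba/-/da/ continuum, illustrating
# a sharp category boundary rather than a gradual decline in /ba/ responses.

percent_ba = {1: 98, 2: 97, 3: 95, 4: 55, 5: 6, 6: 3, 7: 2}

for step, pct in percent_ba.items():        # crude text "plot"
    print(f"stimulus {step}: {'#' * (pct // 5):<20} {pct:3d}% /ba/")

# The phoneme boundary falls where responses cross 50%, between steps 4 and 5.
first_da = next(s for s in sorted(percent_ba) if percent_ba[s] < 50)
print("first stimulus labelled mostly /da/:", first_da)
```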

Page 45: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Hard Problems in Speech Perception

Linearity (parallel transmission): Acoustic features often spread themselves out over other sounds

Where does "show" start and "money" end?


Page 46: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Hard Problems in Speech Perception

Invariance: one phoneme should map onto one waveform
But the /i/ ('ee') in 'money' and in 'me' are different
There aren't invariant cues for phonetic segments (although the search continues)


Page 47: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Hard Problems in Speech Perception

Co-articulation: the influence of the articulation (pronunciation) of one phoneme on that of another phoneme.

Essentially, producing more than one speech sound at once


Page 48: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Hard Problems in Speech Perception

Trading relations
Most phonetic distinctions have more than one acoustic cue, as a result of the particular articulatory gesture that gives rise to the distinction.
slit-split: perceiving the /p/ relies on both a silent gap and a rising formant transition; different mixtures of these cues can result in the same perception

Perception must establish some "trade-off" between the different cues.
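One way to picture that trade-off is as a weighted combination of cues, where a weaker value on one cue can be offset by a stronger value on the other; the cue weights and threshold below are invented purely to illustrate the idea for the slit/split case.

```python
# Toy cue integration for the slit/split contrast (all numbers invented).
# Evidence for hearing the /p/ (i.e. "split") is a weighted sum of two cues:
# the duration of the silent gap and the strength of the rising formant transition.

def hears_split(silence_ms, transition_strength,
                w_silence=0.02, w_transition=1.0, threshold=1.5):
    evidence = w_silence * silence_ms + w_transition * transition_strength
    return "split" if evidence >= threshold else "slit"

# Different mixtures of the two cues can yield the same percept:
print(hears_split(silence_ms=80, transition_strength=0.2))   # long gap, weak transition
print(hears_split(silence_ms=30, transition_strength=1.0))   # short gap, strong transition
print(hears_split(silence_ms=20, transition_strength=0.1))   # neither cue strong
```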

Page 49: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Hard Problems in Speech Perception

[Video demos of the McGurk effect]

The McGurk effect: McGurk and MacDonald (1976)
• Showed people a video in which the audio and the video don't match
• Think "dubbed movie"
• Visual /ga/ paired with auditory /ba/ is often heard as /da/

Implications
• Phoneme perception is an active process
• It is influenced by both auditory and visual information

Page 50: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Motor theory of speech perception
A. Liberman (and others; initially proposed in the late 1950s)

Direct translation of acoustic speech into articulatorily defined categories
Holds that speech perception and motor control involve linked (or the same) neural processes

Theory held that categorical perception was a direct reflection of articulatory organization

Categories with discrete gestures (e.g., consonants) will be perceived categorically

Categories with continuous gestures (e.g., vowels) will be perceived continuously

There is a speech perception module that operates independently of general auditory perception

Page 51: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Frontal slices showing differential activation elicited during lip and tongue movements (left), syllable articulation including [p] and [t] (center), and listening to syllables including [p] and [t] (right).

Pulvermüller F et al. PNAS 2006;103:7865-7870

©2006 by National Academy of Sciences

Page 52: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Motor theory of speech perception: some problems for MT

Categorical perception is found for non-speech sounds (e.g., music)
Categorical perception of speech sounds is found in non-humans: chinchillas can be trained to show categorical perception of /t/ vs. /d/ consonant-vowel syllables (Kuhl & Miller, 1975)

Page 53: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Other theories of speech perception

Direct Realist Theory (C. Fowler and others)
As in Motor theory, articulatory representations are key, but here they are directly perceived
Perceiving speech is part of a more general perception of gestures that involves the motor system

General Auditory Approach (e.g., Diehl, Massaro)
Does not invoke special mechanisms for speech perception; instead relies on more general mechanisms of audition and perception

For nice reviews see: Diehl, Lotto, & Holt (2003); Galantucci, Fowler, & Turvey (2006)

Page 54: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Top-down effects on Speech Perception

Phoneme restoration effect
Sentence context effects

Page 55: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Phoneme restoration effect

Participants listen to a sentence containing a word from which a phoneme has been deleted and replaced with another noise (e.g., a cough)

The state governors met with their respective legi*latures convening in the capital city.

* /s/ deleted and replaced with a cough


Page 56: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Phoneme restoration effect

Typical results:

Participants heard the word normally, despite the missing phoneme

Usually failed to identify which phoneme was missing

Interpretation

We can use top-down knowledge to “fill in” the missing information

Page 57: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Phoneme restoration effect

Further experiments (Warren and Warren, 1970):

What if the missing phoneme was ambiguous?

The *eel was on the axle.
The *eel was on the shoe.
The *eel was on the orange.
The *eel was on the table.

Results:
Participants heard the contextually appropriate word (wheel, heel, peel, meal), despite the missing phoneme

Page 58: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Phoneme restoration effect: possible loci of the effect

Perceptual locus: lexical or sentential context influences the way in which the word is initially perceived.
Post-perceptual locus: lexical or sentential context influences decisions about the nature of the missing phoneme information.

Page 59: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Beyond the segment

Shillcock (1990): hear a sentence, make a lexical decision to a word that pops up on computer screen (cross-modal priming)

Hear: "The scientist made a new discovery last year."
Lexical decision probe on screen: NUDIST

Page 60: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Cross-modal priming

Shillcock (1990): hear a sentence, make a lexical decision to a word that pops up on the computer screen (cross-modal priming)

Hear: "The scientist made a novel discovery last year."
Lexical decision probe on screen: NUDIST

Page 61: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Cross-modal priming

Shillcock (1990): hear a sentence, make a lexical decision to a word that pops up on the computer screen (cross-modal priming)

Hear: "The scientist made a new discovery last year." Lexical decision to NUDIST is faster
Hear: "The scientist made a novel discovery last year." (comparison sentence)

Page 62: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Cross-modal priming

Shillcock (1990): hear a sentence, make a lexical decision to a word that pops up on the computer screen (cross-modal priming)

Hear: "The scientist made a new discovery last year." NUDIST is responded to faster than after "The scientist made a novel discovery last year."

NUDIST gets primed by a segmentation error, although listeners give no conscious report of hearing "nudist"

Page 63: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Beyond the segment: prosody and intonation

In English:
Speech is divided into phrases.
Word stress is meaningful.
Stressed syllables are aligned in a fairly regular rhythm, while unstressed syllables take very little time.
Every phrase has a focus.
An extended flat or low-rising intonation at the end of a phrase can indicate that the speaker intends to continue speaking.

A falling intonation sounds more final.

Page 64: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Beyond the segment: prosodic factors (suprasegmentals)

Stress: emphasis on syllables in sentences
Rate: speed of articulation
Intonation: use of pitch to signify different meanings across sentences

Page 65: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Beyond the segment: stress effects

On meaning: "black bird" versus "blackbird"
Top-down effects on perception: better anticipation of upcoming segments when a syllable is stressed

Page 66: PSY 369: Psycholinguistics Language Comprehension Word recognition & speech recognition

Beyond the segment: rate effects

How fast you speak has an impact on the speech sounds
Faster talking: shorter vowels, shorter VOT

Normalization: taking speed and speaker information into account
Rate normalization
Speaker normalization
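As a toy illustration of what rate normalization might involve (a simple proportional rescaling assumed here, not a specific published model), a cue like VOT can be expressed relative to the local speaking rate before it is categorized:

```python
# Toy rate normalization: interpret VOT relative to syllable duration, so the
# same raw VOT can count as long in fast speech but short in slow speech.
# The proportional rescaling and all numbers are invented for illustration.

def normalized_vot(vot_ms, syllable_ms, reference_syllable_ms=200.0):
    return vot_ms * (reference_syllable_ms / syllable_ms)

def classify(vot_ms, syllable_ms, boundary_ms=25.0):
    return "/p/" if normalized_vot(vot_ms, syllable_ms) >= boundary_ms else "/b/"

print(classify(vot_ms=20, syllable_ms=120))   # fast speech: normalized VOT ~33 ms -> /p/
print(classify(vot_ms=20, syllable_ms=250))   # slow speech: normalized VOT ~16 ms -> /b/
```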