Time and Frequency*
Methods in Discourse Prosody
*and Intensity
Dafydd Gibbon
Bielefeld University
Prosody Interfaces Conference, Fudan University, Shanghai
14–15 July 2018
Prosody Conference 14-15 July 2018
D. Gibbon, Time and Frequency: Methods in Discourse Prosody 2
YARD – Yet Another Rhythm Discussion
1. The prevalent view is that rhythm has not been identified in the physical speech signal. True.
2. But there are many rhythms and the methods are inadequate to identify them.
3. Theses:
– Rhythm is perceived oscillation
– The oscillations are an emergent function of many factors
  ● top-down
  ● bottom-up
– There is a phonological basis to the rhythms of speech
– There is a phonetic basis to the rhythms of speech
– The key to understanding rhythms is Modulation Theory
YARD – Yet Another Rhythm Discussion
1. Methods and Models in ‘Rhythmology’
2. Rhythm is Isochrony? Not only!
– 1D, 2D and 3D approaches
3. Phonological Iteration as a Basis for Oscillation
4. Rhythms are Oscillations
– Production as Modulation
– Perception as Demodulation
5. Modulation Theoretic Phonetic Basis for Oscillation
– AEMS: Oscillation of Intensity
– FEMS: Oscillation of Frequency
Methods and Models in ‘Rhythmology’
1. Qualitative
– Methods:
  ● heard data
  ● intuited data
– Disciplines:
  ● Discourse analysis
  ● Phonology
2. Quantitative
– Modelling intensity
  ● in the time domain
  ● in the frequency domain
– Disciplines:
  ● experimental phonetics
  ● corpus phonetics and speech technology
and of course musicology, cardiology, neurology, ...
Rhythms
– complex forms of f0, intensity, timing at different rank levels
  ● phones (f0 perturbations)
  ● syllables (contrastive functions of tones)
  ● words (morphemic functions of tones)
  ● phrases (structuring, attitudinal, iconic meanings of intonation)
  ● discourse (contrast, focus, emphasis; turn-taking, framing)
Melodies
– complex forms of f0 at the different rank levels
Harmonies
– complex function of parallel formants at different rank levels
  ● cf. the harmony of parallel melodies in music
  ● only distantly related to Smolensky’s Harmony Theory
  ● Vowel Harmony defines constraints on formant harmony
  ● cf. Prosodic Phonologies such as Firthian Phonology, Autosegmental Phonology
Rhythm: the Context
Rhythm 1: Events
Where is there a good definition of ‘rhythm’?
How about the following approach:
● A rhythm is a sequence of at least two events
● The events are isochronous, i.e. of equal duration (but fuzzy)
● The events are changes of the same parameter
● The parameter changes induce at least a binary structure (but maybe a much more complex structure)
● The structures are similar, e.g. strong-weak
Model of Rhythm as Sequence of Isochronous Structured Events
One Phonetic Basis of Rhythm:
Annotation Mining for Isochrony
One-dimensional approaches
Two-dimensional approaches
Wagner, Petra (2007). “Visualizing levels of rhythmic organisation.” Proc. International Congress of Phonetic Sciences, Saarbrücken, pp. 1113–1116.
One-dimensional approaches
[Figure: metrical tree with root R and strong (s) / weak (w) branches over the sentence “the man in the car saw Mary”, with the number row 4 3 4 5 2 3 1]
Three-dimensional approaches
Automatically induced numerical parse trees, root at bottom
Implemented in Scheme
Three-dimensional approaches
Gibbon, Dafydd. 2006. “Time types and time trees: Prosodic mining and alignment of temporally annotated data”. In: Stefan Sudhoff, et al., eds. Methods in Empirical Prosody Research. Berlin: Walter de Gruyter, pp. 281–309.
‘Iambic’ deceleration relation: durations get longer
Implemented in Scheme
Duration from adjacent time-stamps: 18.199 − 17.982 = 0.217
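The time-stamp arithmetic above can be sketched in a few lines. This is only an illustration, assuming annotations arrive as an ordered list of boundary time-stamps in seconds; the function name `durations` is hypothetical, not part of the author's Scheme implementation.

```python
def durations(timestamps):
    """Return interval durations between consecutive boundary time-stamps."""
    # Round to millisecond precision, matching the three decimals on the slide.
    return [round(later - earlier, 3)
            for earlier, later in zip(timestamps, timestamps[1:])]


# The two time-stamps shown on the slide.
print(durations([17.982, 18.199]))
```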
Three-dimensional approaches
3-dimensional time-stamp duration analysis: Time-Tree induction
- length ✕ depth with 1-place lookahead (so actually 2D+1)
- hierarchical classification of alternation relations
- several processing options: binary/nonbinary, lower/higher percolated
- related to phrasal and discourse patterns
Cyclical upward percolation of ‘dominant’ duration value. Here: the left-hand shorter value
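A minimal sketch of how such a tree might be induced, under stated assumptions: neighbouring items are merged when the left one is shorter than the right one (the ‘iambic’ relation), the shorter value percolates upward, and passes repeat until nothing more can be merged. The function `time_tree` and its parameters are hypothetical and simplify the options listed above (binary only); this is not the author's Scheme code.

```python
def time_tree(durations, relation=lambda left, right: left < right, percolate=min):
    """Cyclically merge adjacent nodes that satisfy `relation`.

    Each node is a pair (value, tree); `percolate` picks the value handed up.
    Returns the list of trees left when no further merge is possible.
    """
    nodes = [(d, d) for d in durations]
    while len(nodes) > 1:
        merged_any, next_nodes, i = False, [], 0
        while i < len(nodes):
            # One-place lookahead: compare the current node with its right neighbour.
            if i + 1 < len(nodes) and relation(nodes[i][0], nodes[i + 1][0]):
                value = percolate(nodes[i][0], nodes[i + 1][0])
                next_nodes.append((value, (nodes[i][1], nodes[i + 1][1])))
                merged_any = True
                i += 2
            else:
                next_nodes.append(nodes[i])
                i += 1
        if not merged_any:
            break
        nodes = next_nodes
    return [tree for _, tree in nodes]


# Invented durations in seconds, for illustration only.
print(time_tree([0.1, 0.2, 0.15, 0.3]))
```

Swapping `relation` for `lambda left, right: left > right` and `percolate` for `max` would give a ‘trochaic’ variant under the same assumptions.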
Rhythm 2: Oscillators
So rhythms are alternations of similarly structured events with similar durations, i.e. oscillations
But what is the ‘grammar’ of rhythm? How are rhythms generated and perceived?
Clearly,
● by oscillators in production
● by oscillation detectors in perception
● but also by ‘abstract oscillation’ such as iteration in phonology
Research is gradually showing that these oscillators relate to oscillating signals in the brain
The Phonological Basis of Rhythm:
Iteration as ‘Abstract Oscillation’:
English Intonation
Phrasal Oscillator: Pierrehumbert’s Finite Machine Model
Pierrehumbert (1980)
This ‘intonation grammar’ for English intonation underlies the popular ToBI (Tones and Break Indices) intonation transcription system
Phrasal Grammar: Pierrehumbert’s Finite Machine Model
Pierrehumbert (1980)
IP → BT1 PAcc+ PhAcc BT2
BT1, BT2 ∈ {H%, L%}
PAcc ∈ {H*, L*, L*+H-, L-+H*, H*+L-, H-+L*, H*+H-}
PhAcc ∈ {H-, L-}
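The rule above can be read as a small finite-state acceptor. The following is a sketch only: the state numbering and the function name `accepts` are assumptions for illustration, and the tone inventories are copied from the rule as given here.

```python
BOUNDARY = {"H%", "L%"}
PITCH_ACCENT = {"H*", "L*", "L*+H-", "L-+H*", "H*+L-", "H-+L*", "H*+H-"}
PHRASE_ACCENT = {"H-", "L-"}


def accepts(tones):
    """Accept sequences of the form BT1 PAcc+ PhAcc BT2."""
    state = 0  # 0: start, 1: after BT1, 2: inside the PAcc loop, 3: after PhAcc, 4: final
    for tone in tones:
        if state == 0 and tone in BOUNDARY:
            state = 1
        elif state in (1, 2) and tone in PITCH_ACCENT:
            state = 2  # the loop on this state is the iteration ("abstract oscillation")
        elif state == 2 and tone in PHRASE_ACCENT:
            state = 3
        elif state == 3 and tone in BOUNDARY:
            state = 4
        else:
            return False
    return state == 4


print(accepts(["H%", "H*", "L*", "L-", "L%"]))  # one boundary, two accents, phrase accent, boundary
print(accepts(["H%", "L-", "L%"]))              # no pitch accent
```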
Iterative Finite Machine as Abstract Oscillator
Pierrehumbert (1980)
FTN (Finite Transition Network) representing an FSA (Finite State Automaton)
with Tone Lexicon
Iterative Finite Machine as Abstract Oscillator
Pierrehumbert (1980)
Revisions needed:
1. Reset (internal repetition)
2. Insertion of parenthetics
3. Variables for declination/inclination
4. Interpolation of unstressed syllables
5. Constraints on accent sequences
6. Transduction to phonetics
Iterative Finite Machine as Abstract Oscillator
Pierrehumbert (1980)
Formal properties
Iterative, with loops or cycles
1) equivalent to purely right (or purely left) branching regular grammar
2) non-finite maximal length
3) 3 recursions (cycles, loops):
   1) accent sequences
   2) intermediate phrase sequences
   3) intonation phrase sequences
The Phonological Basis of Rhythm:
Iteration as ‘Abstract Oscillation’:
Niger-Congo Tone Sandhi
Niger-Congo Tone Sandhi: Tem (Togo; Gur; ISO 639-3 kth)
Tem (Togo; Gur; ISO 639-3 kth)
Data: transcription
1-tape (1-level) transition network
Finite Machine for Niger-Congo Languages with 2 Lexical Tones
1-tape (1-level) transition network
Formal Properties
Iterative, with loops or cycles
1) equivalent to purely right (or purely left) branching grammar
2) non-finite maximal length
3) 3 recursions (cycles, loops):
   1) accent sequences
   2) intermediate phrase sequences
   3) intonation phrase sequences
Finite Machine for Niger-Congo Languages with 2 Lexical Tones
2-tape (2-level) transition network
Generalised Two-tone Machine with Two-level Phonetic Mapping
3-tape (3-level) transition network
Generalised Two-tone Machine with Three-level Phonetic Mapping
3-tape (3-level) transition network
Generalised Two-tone Machine with Three-level Phonetic Mapping
The functions on the third level can be assigned numerical values:
1) initial ‘start-up’ high or low fuzzy pitch constant
2) multiplication of previous value by an upsweep, downdrift, upstep, or downstep value
3) addition of a baseline value
cf. Liberman & Pierrehumbert (e.g. 1984)
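The three numerical steps can be sketched as follows. All numbers here (start-up constants, transition factors, baseline) are invented for illustration, and mapping each tone transition to a single factor is an assumption; the slide does not give values.

```python
def pitch_sequence(tones, start=None, factors=None, baseline=100.0):
    """Map a string of H/L tones to pitch values in Hz (illustrative values only)."""
    # Step 1: hypothetical start-up constants (Hz above the baseline).
    start = start or {"H": 60.0, "L": 20.0}
    # Step 2: hypothetical multipliers for each (previous tone, current tone) transition.
    factors = factors or {("H", "H"): 0.95, ("L", "L"): 0.95,
                          ("H", "L"): 0.4, ("L", "H"): 2.0}
    values, current, previous = [], None, None
    for tone in tones:
        if previous is None:
            current = start[tone]
        else:
            current = current * factors[(previous, tone)]
        # Step 3: add the baseline value.
        values.append(current + baseline)
        previous = tone
    return values


print(pitch_sequence("HLHLH"))
```

With these particular factors each H after an L is lower than the preceding H, which gives a terraced, downdrifting contour.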
Generalised Two-tone Machine with Three-level Phonetic Mapping
[Figure panels: Anyi, Baule, Ega (1), Ega (2); labels: Discrete level, Terraced]
The Phonological Basis of Rhythm:
Iteration as ‘Abstract Oscillation’:
Tianjin Mandarin Tone Sandhi
Martin Jansche (1998): Tianjin Mandarin tone sandhi
Generalised Two-tone Machine Three-level Machine for Mandarin
The Phonological Basis of Rhythm:
Iteration as ‘Abstract Oscillation’:
English Syllables as Individual Oscillation Cycles
Linear Syllable Grammar (English)
ONSET NUCLEUS CODA
English Monosyllabic Words
Linear Syllable Grammar (English)
ONSET NUCLEUS CODA
English Monosyllabic Words
The syllable hierarchy is simply a grouping of finite linear patterns, and is not recursive in itself; it is the minimal element of a recursive series:
1) finite depth
2) finite maximal length
3) finite set (32883 potential syllables)
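The finiteness of a linear ONSET NUCLEUS CODA grammar can be illustrated by enumeration. The inventories below are toy examples chosen for this sketch, not the English inventories behind the figure of 32883, and the function name `syllables` is hypothetical.

```python
from itertools import product

# Toy inventories (assumptions for illustration; "" stands for an empty slot).
ONSETS = ["", "p", "t", "s", "st", "str"]
NUCLEI = ["i", "a", "u"]
CODAS = ["", "t", "n", "nt"]


def syllables(onsets=ONSETS, nuclei=NUCLEI, codas=CODAS):
    """Enumerate every ONSET + NUCLEUS + CODA combination: a finite set."""
    return ["".join(parts) for parts in product(onsets, nuclei, codas)]


all_syllables = syllables()
# The set size is simply the product of the three inventory sizes.
print(len(all_syllables), len(ONSETS) * len(NUCLEI) * len(CODAS))
```

A real grammar would add phonotactic constraints between the slots, which can only shrink the set, so the finiteness argument is unaffected.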
Benjamin Lee Whorf’s solutions
Carroll, John B. (ed.) (1956). Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf. Cambridge, Mass.: MIT Press, p. 284.
The Phonological Basis of Rhythm:
Iteration as ‘Abstract Oscillation’:
Mandarin Syllables as Individual Oscillation Cycles
Diphone Linear Syllable Grammar (Mandarin)
English Syllables
Linear Syllable Grammar (Mandarin)
ONSET NUCLEUS CODA
Linear Syllable Grammar for Mandarin
ONSET NUCLEUS CODA
Linear Syllable Grammar (Mandarin)
The syllable hierarchy is simply a grouping of finite linear patterns, and is not recursive:
1) finite depth
2) finite maximal length
3) finite set (437 potential syllables)
Linear Syllable Grammar for Mandarin
A Note on Iteration, the Different Kinds of Recursion,
and their Realtime Processing Properties
Prosody Processing: Computational Requirements
Realtime processing requirements:
– finite memory space
– finite or linear processing time
Fulfilment of real time processing requirements:
– iterative grammars have linear processing requirements
– right-branching or left-branching grammars have linear processing time
– finite-depth grammars have constant finite processing time
Nonfulfilment of real time processing requirements:
– non-deterministic grammars (e.g. A → a b | a c)
– centre-embedding phrase structure grammars
Food for thought:
– recursion is not just about a node dominating another node with the same name: that name may be ill-defined and ambiguous, or a generalisation, or vague; this criterion is necessary but not sufficient
– recursion is about describing an infinite number of objects (sentences, words, numbers, …)
– a recursive theory of language and speech must also be realistic:
  ● the Linear Processing Time Constraint: The time required for processing speech must be linear in relation to the length of the input.
  ● the Finite Processing Space Constraint: The memory required for processing speech must be finite.
Processing Time and Processing Space
In the many discussions of recursion over the past 20 years or so, this crucial distinction between two types of recursion with different processing time and space properties has been neglected:
– linear recursion:
  ● left & right branching (computationally equivalent to iteration)
  ● linear recursion is realistic, requiring finite working memory, and processing time which is a linear function of the size of the input
– non-linear recursion:
  ● centre-embedding, cross-serial dependencies
  ● non-linear recursion is unrealistic, requiring unrestricted memory and at least quadratic processing time, thus implausible for speech
Processing Time and Processing Space
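The memory side of this contrast can be illustrated with two toy recognisers. This is a sketch under simplifying assumptions: the iterative pattern is (a b)* and the centre-embedding pattern is aⁿbⁿ; both function names are hypothetical, and the sketch says nothing about processing time.

```python
def accepts_iterative(symbols):
    """Recognise (a b)* with a single remembered symbol: memory stays constant."""
    expecting = "a"
    for symbol in symbols:
        if symbol != expecting:
            return False
        expecting = "b" if symbol == "a" else "a"
    return expecting == "a"


def max_stack_depth(symbols):
    """Recognise a^n b^n with a stack; return the deepest stack, or None if rejected."""
    stack, deepest, seen_b = [], 0, False
    for symbol in symbols:
        if symbol == "a" and not seen_b:
            stack.append(symbol)  # every open dependency must be remembered
            deepest = max(deepest, len(stack))
        elif symbol == "b" and stack:
            seen_b = True
            stack.pop()
        else:
            return None
    return deepest if not stack else None


print(accepts_iterative("abababab"))  # state never grows with the input
print(max_stack_depth("aaaabbbb"))    # stack depth grows with the nesting depth
```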
Linear recursion is unproblematic: the basic principle of creativity in language.
But speakers fail at producing and understanding centre-embedding in spontaneous speech. How can this then be a feature of language?
In rehearsed speech, writing and read speech, a small amount of centre-embedding is possible, due to the additional time and memory space provided by this kind of register.
Processing Time and Processing Space: a Note on Recursion
Where did centre-embedding come from?
Speakers were trying to be clever: generalising linearly recursive sentence-final nominal clauses (e.g. relative clauses, that clauses) to centre-embedding non-final positions.
So centre-embedding is
– derived from right or left recursion
– plus a generalisation: “Use right (or left) branching anywhere”
Unfortunately, processing capacity is too limited to permit more than one application of this generalisation, unless rehearsal or writing are involved. And speakers fail.
Processing Time and Processing Space: a Note on Recursion
Where did centre-embedding come from?
Speakers were trying to be clever: generalising linearly recursive sentence-final nominal clauses (e.g. relative clauses, that clauses) to centre-embedding non-final positions.
1. Linear (right-branching):
– Jim saw the man who found the boy
2. Centre-embedding experiment – tough to process:
– the man who found the boy saw Jim
3. Linear right-branching solution – use the passive:
– Jim was seen by the man who found the boy
Processing Time and Processing Space: a Note on Recursion
Try pronouncing this:
The lady who the girl who the teacher who my friend saw was teaching was visiting employed Jim.
Now try pronouncing this:
Jim was employed by the lady who was being visited by the girl who was being taught by the teacher who was seen by my friend.
For those who claim that recursion is the key feature of language there is surely a responsibility
1. to distinguish between the processing properties of different types of recursion, and
2. to explain, in the manner outlined here, how centre-embedding relates to linear branching, and why it fails.
Processing Time and Processing Space: a Note on Recursion
Rhythm 3: The Modulation Theory of Rhythm
● Speech is based on an oscillating carrier wave
– melodious, generated in the larynx
  ● additionally maybe noisy, generated by oral occlusions
● Unmodulated carrier wave has
– a neutral, relaxed, unmarked frequency characteristic
– a neutral, relaxed, unmarked amplitude characteristic
● Modulators:
– an oscillation of a much lower frequency
  ● either modulates the frequency of the carrier (FM)
  ● or modulates the amplitude of the carrier (AM)
● Demodulators:
– FM demodulation (aka ‘f0 estimation’, ‘pitch tracking’ etc.)
– AM demodulation (detection of syllable, word, … envelope shape)
Modulation Theory
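The two kinds of modulation can be sketched with a synthetic signal. The values are assumptions for illustration: a 200 Hz carrier and a 5 Hz modulator, with a hypothetical modulation depth and frequency deviation; real speech carriers are of course far from sinusoidal.

```python
import math


def am_sample(t, carrier_hz=200.0, mod_hz=5.0, depth=0.5):
    """Amplitude modulation: the slow oscillation scales the carrier's amplitude."""
    envelope = 1.0 + depth * math.sin(2 * math.pi * mod_hz * t)
    return envelope * math.sin(2 * math.pi * carrier_hz * t)


def fm_sample(t, carrier_hz=200.0, mod_hz=5.0, deviation_hz=20.0):
    """Frequency modulation: the slow oscillation shifts the carrier's frequency.

    The phase is the integral of the instantaneous frequency
    carrier_hz + deviation_hz * cos(2*pi*mod_hz*t).
    """
    phase = (2 * math.pi * carrier_hz * t
             + (deviation_hz / mod_hz) * math.sin(2 * math.pi * mod_hz * t))
    return math.sin(phase)


fs = 8000                                      # hypothetical sampling rate in Hz
times = [n / fs for n in range(2 * fs)]        # two seconds of signal
am = [am_sample(t) for t in times]
fm = [fm_sample(t) for t in times]
print(max(abs(x) for x in am), max(abs(x) for x in fm))
```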
Modulation Theoretic Approaches:
Production as Modulation
Barbosa’s Oscillator Model
Def. “rhythm”: speech rhythm is understood as the consequence of the variation of perceived duration along the entire utterance.
Two levels of duration encoding / control / specification, coupling between 2 oscillators:
• syllabic: intrinsic lexical level
• phrasal: extrinsic, properly rhythmic level
• entrainment (coupling) of the oscillators
Emulation of results of other rhythm studies:
● the more like stress-timing: phrasal oscillator dominance
● the more like syllable-timing: syllable oscillator dominance
Barbosa’s Oscillator Model
Barbosa’s Oscillator Model
points of possible influence by phrase factor
Simplified reconstruction of the model:
● entrainment operation: z(t) = ENTRAIN(x(t), y(t))
● phrase pulse generator (“oscillator”): y(t) = PhraseAmpl * pulse({0,1})
● syllable oscillator: x(t) = SyllAmpl * sin(frequency * t + phase)
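The three equations of the simplified reconstruction can be written out directly. Since ENTRAIN is not specified, the version below (the phrase pulse boosts the syllable oscillation while it is active) is purely an assumption, as are the pulse period, pulse width and all amplitudes.

```python
import math


def syllable_oscillator(t, syll_ampl=1.0, frequency=2 * math.pi * 5.0, phase=0.0):
    """x(t) = SyllAmpl * sin(frequency * t + phase); frequency is in rad/s."""
    return syll_ampl * math.sin(frequency * t + phase)


def phrase_pulse(t, phrase_ampl=0.5, period=1.0, width=0.2):
    """y(t) = PhraseAmpl * pulse({0,1}); period and width are hypothetical."""
    return phrase_ampl * (1 if (t % period) < width else 0)


def entrain(x, y):
    """z(t) = ENTRAIN(x(t), y(t)). ASSUMPTION: a simple amplitude boost."""
    return x * (1.0 + y)


# Sample the coupled signal at 100 Hz for two seconds.
z = [entrain(syllable_oscillator(n / 100), phrase_pulse(n / 100)) for n in range(200)]
print(max(z))
```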
Fujisaki’s Oscillator Model
Production based
Pulses: point (phrase), interval (accent)
Fujisaki’s Oscillator Model
Production:
– phrase command: point impulse, smoothing
– accent command: interval impulse, smoothing
– baseline
– combination (e.g. addition)
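The production pipeline can be sketched with the response functions usually given for the Fujisaki model, with the components added in the log-frequency domain. The parameter values (alpha, beta, gamma, command times and amplitudes) are illustrative assumptions, not values from the slides.

```python
import math


def phrase_response(t, alpha=2.0):
    """Smoothed response to a point impulse (phrase command)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Smoothed response to a step; an interval command is onset minus offset."""
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma) if t >= 0 else 0.0


def f0(t, base_hz, phrase_commands, accent_commands):
    """Combine baseline, phrase and accent components by addition in log Hz."""
    log_f0 = math.log(base_hz)
    for onset, amplitude in phrase_commands:
        log_f0 += amplitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_commands:
        log_f0 += amplitude * (accent_response(t - onset) - accent_response(t - offset))
    return math.exp(log_f0)


# Hypothetical commands: one phrase command at 0 s, one accent from 0.2 s to 0.5 s.
contour = [f0(n / 100, 100.0, [(0.0, 0.5)], [(0.2, 0.5, 0.4)]) for n in range(150)]
print(round(max(contour), 1))
```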
Modulation Theoretic Approaches:
Perception as Demodulation
Modulation Theoretic Approaches: Demodulation
Amplitude Envelope Modulation
↓
Amplitude Envelope Demodulation: absolute value of Hilbert transform (or rectification & peak-picking / LP filtering)
↓
Spectral slice (FFT)
↓
Spectral Zone Edge Detection
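The first three stages of this chain can be sketched in plain Python, using the rectification and low-pass route rather than the Hilbert transform. The assumptions are a synthetic test signal (200 Hz carrier, 5 Hz amplitude modulation, 2 s), a moving average as a crude low-pass filter, and a direct DFT over the low-frequency band in place of an FFT; all function names are hypothetical and zone edge detection is not attempted.

```python
import math


def am_test_signal(fs=2000, seconds=2.0, carrier_hz=200.0, mod_hz=5.0, depth=0.5):
    """Synthetic AM signal standing in for speech."""
    return [(1.0 + depth * math.sin(2 * math.pi * mod_hz * i / fs))
            * math.sin(2 * math.pi * carrier_hz * i / fs)
            for i in range(int(fs * seconds))]


def envelope(signal, fs, window_s=0.02):
    """AM demodulation: full-wave rectification, then a moving-average low-pass."""
    width = max(1, int(window_s * fs))
    rectified = [abs(x) for x in signal]
    smoothed, running = [], 0.0
    for i, x in enumerate(rectified):
        running += x
        if i >= width:
            running -= rectified[i - width]
        smoothed.append(running / min(i + 1, width))
    return smoothed


def envelope_spectrum(env, fs, max_hz=20.0, step_hz=0.5):
    """Magnitude spectrum of the mean-removed envelope at low frequencies."""
    mean = sum(env) / len(env)
    centred = [e - mean for e in env]
    spectrum = {}
    for k in range(1, int(max_hz / step_hz) + 1):
        f = k * step_hz
        re = sum(c * math.cos(2 * math.pi * f * i / fs) for i, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * f * i / fs) for i, c in enumerate(centred))
        spectrum[f] = math.hypot(re, im) / len(centred)
    return spectrum


fs = 2000
spectrum = envelope_spectrum(envelope(am_test_signal(fs), fs), fs)
peak_hz = max(spectrum, key=spectrum.get)
print(peak_hz)  # frequency (Hz) of the strongest envelope component
```

For recorded speech the same chain would be applied to the audio samples, and the low-frequency peaks would then be candidates for syllable, phrase and discourse-level rhythms.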
Modulation Theoretic Approaches: Demodulation
[Figure: demodulation of a test signal (2 s, 200×5 Hz AM carrier; light & dark green). Panels: rectified modulated signal (light green, top); demodulated AM envelope (red outline); demodulated FM (‘pitch’) track (red outline); AM and FM spectra; AM and FM spectra as heatmaps; Frequency Zone Edge Detection]
Modulation Theoretic Approaches: Demodulation
[Figure: English (RP), Edinburgh corpus, “The North Wind and the Sun”; Beijing Mandarin, Yu corpus, “bei3 feng1 gen1 tai4 yang2”. Spectral zone labels: Short phrases; Short IPUs; Paratone; IPUs; IPU hierarchy; Phrases; IPUs; 1 Hz]
Modulation Theoretic Approaches: Demodulation
AEMS Frequency Tree
English, Newsreading
Algorithm: L-strong, <
Modulation Theoretic Approaches: Demodulation
AEMS Frequency Tree
English, North Wind & Sun
Algorithm: L-strong, <
Modulation Theoretic Approaches: Demodulation
AEMS Frequency Tree
Mandarin, North Wind & Sun
Algorithm: L-strong, <
Summary and Conclusion
I have tried to persuade you that
– rhythm is an emergent function of many oscillators
– a detailed definition of rhythm is necessary
– rhythm at an abstract, phonological level is iteration
– iteration at a concrete, phonetic level is oscillation
– there is not just ONE rhythm, there are many rhythms, i.e. oscillations of different frequencies
– these frequencies can be detected in terms of Modulation Theory
  ● in the physical signal
  ● by signal processing methods
  ● and by tree construction from numerical data
– therefore pessimism about components of emergent rhythm not being detectable in the physical signal is unjustified
Thank you! 谢谢!