ProZed: an Editor for the Automatic Processing of Prosodic Variation
C. AURAN, C. BOUZON & D.J. HIRST
Laboratoire Parole et Langage, CNRS UMR6057
Université de Provence
Summary
1. Prosodic systems
• Prosody as a multidimensional macro-system
• Levels of representation
2. ProZEd
• General conceptions
• Demonstrations (a few modules):
- Long sound file fragmentation, speaker separation
- Duration manipulation
- Silence detection and fragmentation
- MOMEL-INTSINT coding
- Phonological resynthesis
3. Perspectives
Prosodic systems
Prosody as a macro-system
• Prosody seen as consisting of 3 systems (Di Cristo 2001):
- Tonal system
- Temporal system
- Metrical system
• Close interactions between elements of these 3 systems
• Complex relations between the acoustic, phonetic and phonological levels
"Prosody" does not mean "intonation"
Orthogonal dimensions
• Tonal and temporal systems make use of 2 orthogonal dimensions (Ladd 1996, Di Cristo et al. 2003 and forthcoming):
- Linear dimension (tonal sequences, distribution of syllable lengths, …)
- Frame dimension (register level and span, downtrends, tempo, …)
Both dimensions play a major part in the organisation of discourse and in the linguistic characterisation of dialects (ref.)
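To make the idea of orthogonal frame and linear dimensions concrete, here is a minimal sketch for the tonal system. The measures are assumptions chosen for illustration, not those of Ladd or Di Cristo et al.: register level is taken as the geometric mean of the F0 values, register span as the max-min distance in octaves, and the linear part as each value's position relative to that frame.

```python
import math
from statistics import mean


def split_frame_and_linear(f0_hz):
    """Separate F0 values (Hz) into a frame part and a linear part.

    Assumed, illustrative measures:
      register level = geometric mean of F0, returned in Hz
      register span  = max - min of log2(F0), in octaves
      linear part    = each value centred on the level and scaled by the span
    """
    logs = [math.log2(f) for f in f0_hz]
    level = mean(logs)                  # register level on an octave scale
    span = max(logs) - min(logs)        # register span in octaves
    # Linear part: where each value sits inside the frame (0.0 if the span is flat)
    linear = [(x - level) / span if span > 0 else 0.0 for x in logs]
    return 2 ** level, span, linear
```

Transposing an utterance by an octave (doubling every F0 value) changes only the returned level; the span and the linear part stay the same, which is the sense in which the two dimensions can vary independently.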
Levels of representation (1)
• 4 levels of representation (cf. Hirst et al. 2000):
0. Physical level (acoustic data)
1. Phonetic level (continuous quantitative variables)
2. Surface phonological level (abstract qualitative characteristics)
3. Underlying phonological level
• Interpretability constraint → each level is interpreted locally, in relation to its adjacent levels
• Mapping:
- between level 0 and level 1: phonetic representation
- between level 1 and level 2: surface phonological representation
Levels of representation (2)
• Phonetic representation:
• Temporal system: unit alignment with the speech signal
• Tonal system: quadratic spline modelling of the fundamental frequency (MOMEL algorithm)
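The quadratic spline part of the tonal modelling can be sketched as follows. The sketch assumes that two consecutive (time, F0) targets are joined by two parabolas meeting halfway between them, which yields a smooth curve passing through every target with zero slope there. It does not show how MOMEL selects the targets from the raw F0 track, and it is not ProZEd's code.

```python
from bisect import bisect_right


def quadratic_spline(targets, t):
    """Evaluate a quadratic spline through (time, f0) targets at time t.

    Assumes target times are strictly increasing. Between two targets the
    curve is two parabolas meeting at the midpoint of the interval, so the
    curve is continuous, has a continuous slope, and is flat at each target.
    """
    times = [tt for tt, _ in targets]
    if t <= times[0]:
        return targets[0][1]             # hold the first target before it
    if t >= times[-1]:
        return targets[-1][1]            # hold the last target after it
    i = bisect_right(times, t) - 1       # index of the target at or before t
    (t1, h1), (t2, h2) = targets[i], targets[i + 1]
    x = (t - t1) / (t2 - t1)             # normalised position in [0, 1)
    if x <= 0.5:
        return h1 + 2 * (h2 - h1) * x * x           # first parabola, flat at t1
    return h2 - 2 * (h2 - h1) * (1 - x) ** 2        # second parabola, flat at t2
```

For example, with targets `[(0.0, 100.0), (1.0, 200.0)]` the curve reaches 150 Hz exactly halfway through the interval.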
Levels of representation (3)
• Surface phonological representation:
• Temporal system: categorical coding on a five-step scale (--, -, [unmarked], +, ++)
- Base dimension: raw segment duration
- Frame dimension: tempo factor applied to raw segment duration
• Tonal system: INTSINT coding of MOMEL targets (M, T, B, H, L, S, U, D)
- Purely formal coding (unlike ToBI; closer to a narrow IPA transcription)
- Base dimension + frame dimensions (register level, register span, declination effect)
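The categorical temporal coding can be pictured with a small sketch. It assumes that the raw duration (base dimension) is first divided by the tempo factor (frame dimension) and then compared with an expected duration for that segment type. The reference duration and the four ratio thresholds are hypothetical inputs, not values taken from ProZEd.

```python
def code_duration(raw_ms, reference_ms, tempo=1.0,
                  thresholds=(0.7, 0.85, 1.15, 1.3)):
    """Code a segment duration on the scale --, -, (unmarked), +, ++.

    raw_ms       -- observed segment duration (base dimension)
    reference_ms -- expected duration for this segment type (hypothetical input)
    tempo        -- tempo factor for the stretch of speech (frame dimension);
                    here > 1 means globally slower, i.e. lengthened, speech
    thresholds   -- hypothetical ratio boundaries between the five categories
    """
    ratio = (raw_ms / tempo) / reference_ms   # tempo-normalised relative duration
    very_short, short, long_, very_long = thresholds
    if ratio < very_short:
        return "--"
    if ratio < short:
        return "-"
    if ratio <= long_:
        return ""          # unmarked: close to the expected duration
    if ratio <= very_long:
        return "+"
    return "++"
```

Separating the tempo factor from the raw duration means the same 120 ms segment can be coded "+" in normal-tempo speech but unmarked in slow speech, e.g. `code_duration(120, 100)` versus `code_duration(120, 100, tempo=1.2)`.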
INTSINT: base dimension
• Absolute tones:
- T (Top)
- M (Mid)
- B (Bottom)
• Relative tones:
- non-iterative: H (Higher), L (Lower), S (Same)
- iterative: U (Up), D (Down)
[Figure: example F0 curve (axis 0 to 200) with its INTSINT coding: M T L H L H L H B]
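Reading an INTSINT string back as F0 values shows how the base tones interact with the frame (register level and span). The sketch below assumes a commonly used formulation: T and B lie half the span above and below the key on a log scale, H and L move halfway from the previous target towards T or B, U and D move a quarter of the way, and S repeats the previous value. It is an illustration of the symbols, not ProZEd's INT2PHO module; `key_hz` and `range_oct` are names introduced here.

```python
import math


def intsint_to_f0(tones, key_hz, range_oct):
    """Turn an INTSINT string such as "M T L H B" into F0 targets in Hz.

    key_hz    -- register level (frame dimension), in Hz
    range_oct -- register span (frame dimension), in octaves
    """
    half = 2 ** (range_oct / 2)
    top, bottom = key_hz * half, key_hz / half   # T and B, symmetric around the key
    prev = key_hz  # assumption: a leading relative tone is read relative to the key
    values = []
    for tone in tones.replace(" ", ""):
        if tone == "T":
            value = top
        elif tone == "M":
            value = key_hz
        elif tone == "B":
            value = bottom
        elif tone == "H":      # non-iterative: halfway (log scale) towards the top
            value = math.sqrt(prev * top)
        elif tone == "L":      # non-iterative: halfway towards the bottom
            value = math.sqrt(prev * bottom)
        elif tone == "S":      # same as the previous target
            value = prev
        elif tone == "U":      # iterative: a quarter of the way towards the top
            value = math.sqrt(prev * math.sqrt(prev * top))
        elif tone == "D":      # iterative: a quarter of the way towards the bottom
            value = math.sqrt(prev * math.sqrt(prev * bottom))
        else:
            raise ValueError(f"unknown INTSINT tone: {tone!r}")
        values.append(value)
        prev = value
    return values
```

Because H and L are defined relative to the previous target, the sequence from the figure, e.g. `intsint_to_f0("M T L H L H L H B", key_hz=120, range_oct=1.0)` with hypothetical frame values, produces an alternating contour that keeps its shape when only the key or the span changes.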
INTSINT: frame dimension
• Downdrift
• Register level and register span codings (cf. Portes & Di Cristo 2003)
ProZEd
General conceptions (1)
ProZEd: "Prosodic Editor"
• Multi-functional:
- Preliminary processing (file segmentation, speaker separation, …)
- Specific processing (duration processing, silence detection, intonation processing, resynthesis, …)
• "Theory-independent" (cf. Mixdorf's work)
• Multi-platform (implemented in Praat and Perl), freeware and open source (GPL)
General conceptions (2)
ProZEd: Representation levels
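Reversible mapping (for intonation):
• Analysis: 0. Physical level → MOMEL → 1. Phonetic level → INTSINT → 2. Surface phonological level
• Synthesis: 2. Surface phonological level → INT2PHO → 1. Phonetic level → QSP, MBROLA → 0. Physical level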
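The reversibility of the phonetic-to-phonological step can be sketched as an encode/decode round trip. The code assumes a common INTSINT formulation (T and B half the span above and below the key on a log scale, H and L halfway to them from the previous target, U and D a quarter of the way, S a repeat). Encoding here is a simplified greedy choice of the closest tone for a fixed key and span; a full INTSINT analysis would also estimate key and span, and the MOMEL, QSP and MBROLA steps are not shown. All function names are hypothetical.

```python
import math


def predict(tone, prev, key_hz, top, bottom):
    """F0 value (Hz) implied by one INTSINT tone, given the previous target."""
    rules = {
        "T": lambda: top,
        "M": lambda: key_hz,
        "B": lambda: bottom,
        "H": lambda: math.sqrt(prev * top),
        "L": lambda: math.sqrt(prev * bottom),
        "S": lambda: prev,
        "U": lambda: math.sqrt(prev * math.sqrt(prev * top)),
        "D": lambda: math.sqrt(prev * math.sqrt(prev * bottom)),
    }
    return rules[tone]()


def frame(key_hz, range_oct):
    """Top and bottom of the register, symmetric around the key on a log scale."""
    half = 2 ** (range_oct / 2)
    return key_hz * half, key_hz / half


def encode(targets_hz, key_hz, range_oct):
    """Level 1 -> level 2: greedily code each target with the closest tone."""
    top, bottom = frame(key_hz, range_oct)
    tones, prev = [], None
    for i, f0 in enumerate(targets_hz):
        candidates = "TMB" if i == 0 else "TMBHLSUD"   # first tone must be absolute
        best = min(candidates, key=lambda t: abs(
            math.log2(predict(t, prev, key_hz, top, bottom) / f0)))
        tones.append(best)
        prev = predict(best, prev, key_hz, top, bottom)  # continue from the coded value
    return "".join(tones)


def decode(tones, key_hz, range_oct):
    """Level 2 -> level 1: rebuild targets from tones (the synthesis direction)."""
    top, bottom = frame(key_hz, range_oct)
    values, prev = [], None
    for tone in tones:
        prev = predict(tone, prev, key_hz, top, bottom)
        values.append(prev)
    return values
```

Because the encoder always continues from the value its chosen tone implies, decoding its output reproduces exactly the quantised targets the encoder worked with; the only loss is the quantisation itself.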
Demonstrations
Long sound file fragmentation
Duration manipulation
Silence detection and fragmentation
MOMEL-INTSINT coding
Phonological resynthesis
Perspectives
• Improved modelling of duration (z-score method)
• Automatic generation of both XML and (more easily) human-readable data sheets (polymetrical expressions, for instance)
Ex.: _<M>(nV, <H>)(TIN, <BU>)_
• New modules for:
• automatic pseudo-segment detection and processing (IRIT's Vocalis software)
• automatic complementary information extraction
• automatic alignment using iterative DTW (Di Cristo & Hirst 1997)
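For the alignment perspective, a plain dynamic time warping (DTW) sketch is shown below. It aligns two one-dimensional sequences using an absolute-difference cost and returns the total cost and the warping path. The iterative procedure of Di Cristo & Hirst (1997), and the acoustic features a real aligner would compare, are not reproduced here.

```python
def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Align sequences a and b; return (total cost, list of (i, j) index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # extend the best of: match (diagonal), stay on b, stay on a
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to recover the warping path
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((cost[i - 1][j - 1], i - 1, j - 1),
                   (cost[i - 1][j], i - 1, j),
                   (cost[i][j - 1], i, j - 1))
        i, j = step[1], step[2]
    path.reverse()
    return cost[n][m], path
```

Aligning `[1, 2, 3]` with `[1, 2, 2, 3]` maps the second element of the first sequence onto both middle elements of the second, which is how DTW absorbs local differences in tempo.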
Thank you for your attention
Presentation available from
www.lpl.univ-aix.fr/~EPGA/
(ProZEd modules also available shortly… )