ProZed: an Editor for the Automatic Processing of Prosodic Variation

C. AURAN, C. BOUZON & D.J. HIRST

Laboratoire Parole et Langage, CNRS UMR 6057

Université de Provence

Summary

1. Prosodic systems
   • Prosody as a multidimensional macro-system
   • Levels of representation

2. ProZEd
   • General conceptions
   • Demonstrations (a few modules): long sound file fragmentation, speaker separation, duration manipulation, silence detection and fragmentation, MOMEL-INTSINT coding, phonological resynthesis

3. Perspectives

Prosodic systems

Prosody as a macro-system

• Prosody seen as consisting of 3 systems (Di Cristo 2001):
   • Tonal system
   • Temporal system
   • Metrical system

• Close interactions between elements of these 3 systems

• Complex relations between the acoustic, phonetic and phonological levels

« Prosody » does not mean « intonation »

Orthogonal dimensions

• Tonal and temporal systems make use of 2 orthogonal dimensions (Ladd 1996, Di Cristo et al. 2003 and forthcoming):

   • Linear dimension (tonal sequences, syllable length distribution, …)

   • Frame dimension (register level and span, downtrends, tempo, …)

Both dimensions play a major part in the organisation of discourse and in the linguistic characterisation of dialects (ref.)

Levels of representation (1)

• 4 levels of representation (cf. Hirst et al. 2000):

0. Physical level (acoustic data)

1. Phonetic level (continuous quantitative variables)

2. Surface phonological level (abstract qualitative characteristics)

3. Underlying phonological level

• Interpretability constraint → each level is interpreted locally, in relation to its adjacent levels

• Mapping:
   • between level 0 and level 1: phonetic representation
   • between level 1 and level 2: surface phonological representation


Levels of representation (2)

• Phonetic representation:

   • Temporal system: unit alignment with the speech signal

   • Tonal system: quadratic spline modelling of fundamental frequency (MOMEL algorithm)


• Surface phonological representation:

   • Temporal system: categorical coding (--, -, , +, ++)
      - Base dimension: raw segment duration
      - Frame dimension: tempo factor on raw segment duration

   • Tonal system: INTSINT coding of MOMEL targets (M, T, B, L, H, S, U, D)
      - Purely formal coding (≠ ToBI, but cf. narrow IPA transcription)
      - Base dimension + frame dimensions (register level, register span, declination effect)
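The temporal coding above separates a base dimension (raw segment duration) from a frame dimension (a tempo factor). The following is a minimal sketch of that separation under stated assumptions: the tempo factor is estimated (hypothetically) as the geometric mean of raw/reference duration ratios, per-segment reference durations are assumed to be available, and the category boundaries are invented for illustration. The slide leaves the middle category blank, so the label "0" used here is likewise hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical boundaries on the ratio (raw duration / tempo) / reference duration.
# Illustrative only: the slides do not give the boundaries ProZEd actually uses.
THRESHOLDS = (0.6, 0.85, 1.15, 1.5)
# "0" is a hypothetical label for the middle category, which is blank in the slide.
CATEGORIES = ("--", "-", "0", "+", "++")


def estimate_tempo(raw_ms: Sequence[float], ref_ms: Sequence[float]) -> float:
    """Frame dimension (hypothetical estimate): geometric mean of raw/reference ratios."""
    logs = [math.log(d / r) for d, r in zip(raw_ms, ref_ms)]
    return math.exp(sum(logs) / len(logs))


def code_duration(raw_ms: float, ref_ms: float, tempo: float = 1.0) -> str:
    """Base dimension: code one raw segment duration once the tempo factor is removed."""
    ratio = (raw_ms / tempo) / ref_ms
    for threshold, label in zip(THRESHOLDS, CATEGORIES):
        if ratio < threshold:
            return label
    return CATEGORIES[-1]  # ratio at or above the last boundary


def code_utterance(raw_ms: Sequence[float], ref_ms: Sequence[float]) -> List[str]:
    """Estimate the tempo factor for the utterance, then code every segment relative to it."""
    tempo = estimate_tempo(raw_ms, ref_ms)
    return [code_duration(d, r, tempo) for d, r in zip(raw_ms, ref_ms)]
```

Keeping the two dimensions apart means that an utterance spoken uniformly twice as slowly as the references is absorbed by the tempo factor, so its segments are coded in the middle category rather than all as "++".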

INTSINT: base dimension

Absolute tones
   T (Top)
   M (Mid)
   B (Bottom)

Relative tones
   non-iterative: H (Higher), L (Lower)
   iterative: U (Up), D (Down)
   S (Same)

[Figure: example contour with targets coded M T L H L H L H B]
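To make the absolute/relative distinction concrete, here is a minimal sketch of INTSINT decoding (tones to F0 targets) and a greedy encoding (targets to tones). It assumes the octave-scale formulation Hirst later described for the Praat INTSINT plugin, which may differ from the ProZEd module: with a key (register level) and a range in octaves (register span), top and bottom lie at key ± range/2; T, M and B depend only on this register, while H, U, S, D and L are computed from the previous target. Starting from the key and choosing the closest predicted tone are simplifying assumptions; the real coder also optimises key and range, which this sketch does not do.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _register(key_hz: float, range_oct: float) -> Tuple[float, float, float]:
    """Key, top and bottom on a log2 (octave) scale."""
    key = math.log2(key_hz)
    return key, key + range_oct / 2, key - range_oct / 2


def _predictions(prev: float, key: float, top: float, bottom: float) -> Dict[str, float]:
    """Predicted target (log2 Hz) for each INTSINT tone, given the previous target."""
    return {
        # absolute tones: defined by the register only
        "T": top, "M": key, "B": bottom,
        # relative tones: defined with respect to the previous target
        "H": (prev + top) / 2, "U": (3 * prev + top) / 4, "S": prev,
        "D": (3 * prev + bottom) / 4, "L": (prev + bottom) / 2,
    }


def decode_intsint(tones: Sequence[str], key_hz: float, range_oct: float) -> List[float]:
    """INTSINT tones -> F0 targets in Hz (surface phonological -> phonetic level)."""
    key, top, bottom = _register(key_hz, range_oct)
    prev, targets = key, []  # assumption: a leading relative tone is computed from the key
    for tone in tones:
        preds = _predictions(prev, key, top, bottom)
        if tone not in preds:
            raise ValueError(f"unknown INTSINT tone: {tone!r}")
        prev = preds[tone]
        targets.append(2 ** prev)
    return targets


def encode_intsint(targets_hz: Sequence[float], key_hz: float, range_oct: float) -> List[str]:
    """F0 targets in Hz -> INTSINT tones, greedily picking the closest prediction."""
    key, top, bottom = _register(key_hz, range_oct)
    prev, tones = key, []
    for hz in targets_hz:
        x = math.log2(hz)
        preds = _predictions(prev, key, top, bottom)
        tone = min(preds, key=lambda t: abs(preds[t] - x))
        tones.append(tone)
        prev = preds[tone]  # continue from the coded value so decoding stays consistent
    return tones
```

For example, `decode_intsint("M T L H L H L H B".split(), 200, 1.0)` starts at 200 Hz (M), rises to about 283 Hz (T, half an octave above the key) and ends at about 141 Hz (B). Several tones can predict the same value (for instance M and S right after an M), so encoding a decoded sequence recovers the same targets, though not always the same letters.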

INTSINT: frame dimension

Downdrift

Register level and register span codings (cf. Portes & Di Cristo 2003)

ProZEd

General conceptions (1)

ProZEd: « Prosodic Editor »

• Multi-functional

• Preliminary processing (file segmentation, speaker separation, …)

• Specific processing (duration processing, silence detection, intonation processing, resynthesis, …)

• « Theory independent » (cf. Mixdorf’s work)

• Multi-platform (implemented in Praat and Perl), freeware and open source (GPL)

General conceptions (2)

ProZEd: Representation levels

Reversible mapping (for intonation):

0. Physical level
1. Phonetic level
2. Surface phonological level

Analysis (0 → 1 → 2): MOMEL, then INTSINT
Synthesis (2 → 1 → 0): INT2PHO, QSP, then MBROLA
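In the synthesis direction, target points have to be turned back into a continuous F0 curve before resynthesis. Below is a minimal sketch of the quadratic spline interpolation used at the phonetic level, following the usual description of the MOMEL spline: between two consecutive targets, two parabolas meet at the temporal midpoint, with zero slope at each target. The function name, the (time, Hz) input format and holding the first/last value outside the target span are assumptions; the slide does not say which of the listed modules performs this step.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def quadratic_spline(targets: Sequence[Tuple[float, float]], times: Sequence[float]) -> List[float]:
    """Sample a quadratic spline through (time, f0) targets at the given times."""
    pts = sorted(targets)
    xs = [t for t, _ in pts]
    curve = []
    for t in times:
        if t <= xs[0]:
            curve.append(pts[0][1])  # assumption: hold the first target before it
            continue
        if t >= xs[-1]:
            curve.append(pts[-1][1])  # assumption: hold the last target after it
            continue
        i = bisect_right(xs, t)  # guarantees xs[i - 1] <= t < xs[i], so d > 0
        (t1, h1), (t2, h2) = pts[i - 1], pts[i]
        d = t2 - t1
        if t - t1 <= d / 2:
            # first parabola: zero slope at t1, reaches (h1 + h2) / 2 at the midpoint
            curve.append(h1 + 2 * (h2 - h1) * ((t - t1) / d) ** 2)
        else:
            # second parabola: zero slope at t2
            curve.append(h2 - 2 * (h2 - h1) * ((t2 - t) / d) ** 2)
    return curve
```

Both branches give (h1 + h2) / 2 at the midpoint, so the curve is continuous and passes exactly through every target, which is what makes the phonetic representation reversible into a smooth F0 contour.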

Demonstrations

Long sound file fragmentation

Duration manipulation

Silence detection and fragmentation

MOMEL-INTSINT coding

Phonological resynthesis


Perspectives

• Improved modelling of duration (z-score method)

• Automatic generation of data sheets, both in XML and in a more easily human-readable form (polymetrical expressions, for instance)

Ex.: _<M>(nV, <H>)(TIN, <BU>)_

• New modules for:

   • automatic pseudo-segment detection and processing (IRIT’s Vocalis software)

   • automatic extraction of complementary information

   • automatic alignment using iterative DTW (Di Cristo & Hirst 1997)
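The z-score method listed among the perspectives can be illustrated as follows: each segment's duration is expressed as the number of standard deviations it lies from the mean duration of its segment type, estimated over a reference corpus. This is a minimal sketch under that assumption. The function names and the (label, duration in ms) data format are hypothetical, the population standard deviation is used, and many implementations work on log durations instead.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def segment_stats(corpus: Sequence[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Mean and population standard deviation of duration for each segment label."""
    by_label = defaultdict(list)
    for label, dur in corpus:
        by_label[label].append(dur)
    return {lab: (statistics.mean(ds), statistics.pstdev(ds)) for lab, ds in by_label.items()}


def z_scores(segments: Sequence[Tuple[str, float]],
             stats: Dict[str, Tuple[float, float]]) -> List[float]:
    """Duration of each segment in standard deviations from its label's mean."""
    out = []
    for label, dur in segments:
        mean, sd = stats[label]
        out.append(0.0 if sd == 0 else (dur - mean) / sd)  # guard: no variance observed
    return out
```

Because z-scores are comparable across segment types, lengthening on an intrinsically short consonant and on an intrinsically long vowel can be placed on a single scale before any categorical coding.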

Thank you for your attention

Presentation available from

www.lpl.univ-aix.fr/~EPGA/

(ProZEd modules also available shortly… )
