Speech tools
Jean-Philippe Goldman
03.03.2004
2
Two questions
What kind of data?
Which task?
3
What kind of data?
Speech content (noise, multiple voices, …)
Data file: sound / transcription / pitch curve
Sampling/quantization: 16k, 12k, 8k, 4k; 8-bit
Size: 16 kHz, 16-bit = 256 kbps = 1.9 MB/min = 115 MB/h
Format
Sound: wav, wma, mp3, ogg, aiff, aifc, au, vox, raw, sd, CSL, Ogg/Vorbis, NIST/Sphere
Transcription: HTK, TIMIT, TextGrid, Phondat
Number of files
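The size figures on this slide can be checked with a little arithmetic. The sketch below assumes uncompressed mono PCM and decimal megabytes (1 MB = 10^6 bytes); the function name `pcm_size` is only illustrative.

```python
def pcm_size(sample_rate_hz, bits_per_sample, channels=1):
    """Return (kbit/s, MB/min, MB/h) for uncompressed PCM audio.

    Assumes decimal units: 1 kbit = 1000 bits, 1 MB = 1e6 bytes.
    """
    bits_per_second = sample_rate_hz * bits_per_sample * channels
    bytes_per_second = bits_per_second / 8
    kbps = bits_per_second / 1000
    mb_per_min = bytes_per_second * 60 / 1e6
    mb_per_hour = bytes_per_second * 3600 / 1e6
    return kbps, mb_per_min, mb_per_hour


# Compare the sampling rates listed on the slide at 16-bit quantization.
for rate in (16000, 12000, 8000, 4000):
    kbps, mb_min, mb_h = pcm_size(rate, 16)
    print(f"{rate} Hz, 16-bit: {kbps:.0f} kbps, {mb_min:.2f} MB/min, {mb_h:.1f} MB/h")
```

For 16 kHz at 16 bits this gives 256 kbps, 1.92 MB/min and 115.2 MB/h, in line with the slide.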
4
Which task?
Visualization and editing: record, play, edit, mix, add effects
Analysis: spectral, pitch
Speech manipulation: Filtering, mixing, adding effects, prosodic manipulation
Annotation: segmentation, labeling
Scripting: Batch, communication with outside
Plotting
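As an illustration of the batch-scripting task, here is a minimal sketch that reports the duration of every WAV file in a folder. It uses only the Python standard library; the function names and the folder layout are assumptions made for the example, not part of any tool discussed in these slides.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path):
    """Duration of one WAV file, read from its header (frames / frame rate)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def batch_durations(folder):
    """Map each .wav file name in `folder` to its duration in seconds."""
    wav_files = sorted(Path(folder).glob("*.wav"))
    return {p.name: wav_duration_seconds(p) for p in wav_files}
```

Calling `batch_durations("recordings")` on a hypothetical folder named `recordings` would return a dictionary such as `{"a.wav": 0.5, ...}`; the same loop structure extends to format conversion or analysis over a whole corpus.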
5
Examples of tasks
build stimuli for an experiment (e.g. cross-splicing)
manage a speech database for a TTS engine
create a prosodic database
analyze a speech corpus from experiment recordings
verify/correct an automatic segmentation
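Cross-splicing, mentioned in the first task, amounts to joining the beginning of one recording to the remainder of another. The sketch below is a simplified illustration on plain lists of samples; it assumes both recordings share the same sampling rate, and it does not look for zero crossings or apply any smoothing at the join, which a real stimulus would probably need. The function name and parameters are hypothetical.

```python
def cross_splice(samples_a, samples_b, cut_a_s, cut_b_s, sample_rate_hz):
    """Join recording A up to cut_a_s with recording B from cut_b_s onward.

    Cut points are given in seconds and converted to sample indices.
    Both recordings are assumed to have the same sampling rate.
    """
    index_a = round(cut_a_s * sample_rate_hz)
    index_b = round(cut_b_s * sample_rate_hz)
    return list(samples_a[:index_a]) + list(samples_b[index_b:])
```

For example, with two 10-sample recordings at a (toy) rate of 10 Hz, `cross_splice(a, b, 0.3, 0.6, 10)` keeps the first 3 samples of `a` followed by the last 4 samples of `b`.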
6
Two questions
What kind of data?
Which task?
Two rules
there is no single tool that does everything
there are plenty of ways to do one thing
7
Tool features
Visualization/Editing
Analysis
Speech manipulation
Annotation
Scripting
Plotting
Supported formats
Platform/installation
Evolution/community
Accessibility
Price
8
Software
Goldwave (audio editor)
Esps Xwaves (routines + visualization)
Praat (speech analysis)
Wavesurfer (speech editor)
Transcriber (annotation tool)
Matlab (general-purpose software)
OGI speech tools (routines + application development)
Others: WinPitch, PitchWorks, Phonedit, CoolEdit, …
9
Goldwave
self-defined as “top rated, professional digital audio editor”
10
Goldwave
pros: editing (good memory management for large files), many effects, noise reduction, real-time spectrum and VU meters, various formats, batch conversion, effect chains, easy interface
cons: nothing for speech (pitch, formants), Windows only, no scripting
Good for file editing, not for speech analysis
11
12
Esps - Waves
Developed by Entropic + AT&T. Now public.
The comp.speech FAQ says:
Esps: a comprehensive set of speech analysis/processing tools
Waves: a graphical front end for speech processing (waveforms, spectrograms, pitch) that includes a signal labeling utility
13
14
Esps - Waves
pros: powerful, designed for big files
cons: UNIX only (free BSD), non-standard formats, requires programming skills, development has stopped
15
Praat
Developed by P. Boersma and D. Weenink at the Institute of Phonetic Sciences, University of Amsterdam
general-purpose speech tool: editing, segmentation and labeling, prosodic manipulation
16
17
Praat
pros: designed for speech analysis (not only sound editing or spectrogram visualization), nice GUI, scripting, active development and community, prosodic manipulation
cons: limited scripting language, uses its own native formats for transcription and pitch files
18
WaveSurfer
Open-source tool for sound visualization and manipulation
speech/sound analysis and sound annotation/transcription
platform for more advanced/specialized applications: extending WaveSurfer with new custom plug-ins or embedding WaveSurfer visualization components in other applications
Requires the Snack toolkit
19
20
Transcriber
Authors: C. Barras, E. Geoffrois
Relies on Snack (Tcl/Tk)
Good for annotation
Nice, simple GUI
No speech analysis
21
22
Matlab (Mathworks)
Mathematical environment
Signal processing toolbox: filter design, spectral analysis, waveform generation, linear prediction
voicebox (2002) [email protected]
pitch determination algorithm (2002), Xuejing Sun [email protected]
colea speech editor (1998), Philip Loizou [email protected], Univ. of Texas-Dallas
23
Matlab (Mathworks)
pros: open, powerful, scripting, excellent plotting
cons: small speech community, few standards, not designed for big files
24
OGI speech tools/CSLU Toolkit
development started in 1992 in C on Unix, at the Center for Spoken Language Understanding (CSLU) at OGI
Includes:
an X Window display tool (LYRE) to display and edit speech signals, spectrograms, phoneme labels, and other information
a set of C library routines (LIBNSPEECH), utilities for converting file formats, filtering, neural network training, a vector quantizer, and a database utility to automate speech-database queries
a set of Perl scripts, used mainly to automate the use of the OGI Speech Tools
man pages
RAD (rapid application development)
points of entry: package (C), script (Tcl), and GUI (Tk) levels
free for research use
25
26
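Summary
Columns: Edit, Analysis, Manipulation, Annotation, Script, Plot, Format, OS, Evolution, Community, Price
Legend: [mark] = yes, but requires some development
[Table: the per-feature marks are not recoverable; only the text entries of each row are listed.]
Goldwave: win; $40
Esps/Waves: C, sh; Unix; free
Praat: yes; native; console, sendpraat; src; free
WaveSurfer + Snack: C, Tcl/Tk, Python; src; free
Transcriber: xml; free
OGI Toolkit: free
Matlab + Signal Processing toolbox + packages: native; no BSD; student $100, $40/toolbox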
27
Expect to do conversions
Sound files: Goldwave (Windows), sox (Unix)
Transcription files: scripts to convert text-formatted label files
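A label-file conversion script can be very short. The sketch below converts TIMIT-style label lines to HTK-style label lines. It assumes that each TIMIT line is `start_sample end_sample label`, that HTK label times are expressed in units of 100 ns, and that the recording is sampled at 16 kHz; check these assumptions against your own files. The function name is illustrative.

```python
def timit_to_htk(lines, sample_rate_hz=16000):
    """Convert 'start_sample end_sample label' lines to HTK-style label lines.

    Assumption: HTK label times are integers in units of 100 ns,
    so one second corresponds to 10,000,000 units.
    """
    def to_htk_time(sample_index):
        return round(sample_index * 10_000_000 / sample_rate_hz)

    converted = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        start, end, label = int(parts[0]), int(parts[1]), parts[2]
        converted.append(f"{to_htk_time(start)} {to_htk_time(end)} {label}")
    return converted


print(timit_to_htk(["0 3050 h#", "3050 4559 sh"]))
# prints ['0 1906250 h#', '1906250 2849375 sh']
```

At 16 kHz one sample lasts 62.5 µs, i.e. 625 units of 100 ns, which is where the converted values come from.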
28
Links
www.goldwave.com
www.speech.kth.se/software/#esps
www.praat.org
www.speech.kth.se/software/#wavesurfer
www.cse.ogi.edu/toolkit
www.mathworks.com (Matlab)
www.lpl.univ-aix.fr/~sqlab/ (phonedit)
www.sciconrd.com/pworks.htm (PitchWorks)
www.winpitch.com (WinPitch)
www.adobe.com (CoolEdit > Audition)