[Advanced] Speech & Audio Signal Processing ES 157/257: Speech and Audio Processing Prof. Patrick Wolfe, Harvard DEAS 02 February 2006


[Advanced] Speech & Audio Signal Processing

ES 157/257: Speech and Audio Processing
Prof. Patrick Wolfe, Harvard DEAS

02 February 2006


State of the Art in Speech/Audio

Speech and audio processing may be divided into “low-level” and “high-level” inference
  Speech enhancement, compression, and coding are all widely used technologies
  This low-level work is the most mature

High-level tasks will drive future advances
  Speech/music database information retrieval
  Automatic speaker and speech recognition

But low-level issues also remain…


Fundamental Questions

How to obtain highly structured representations of speech and audio signals?
  Time-frequency “atoms” as building blocks

How can statistical inference enable advances in speech signal processing?
  A means to obtain an “atomic decomposition”
  Statistical modeling of time-frequency coefficients provides a principled solution
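As one concrete reading of time-frequency "atoms", the sketch below computes short-time Fourier coefficients and checks that each one is the inner product of the signal with a windowed complex exponential. It is only an illustration: the Hann window, the frame length, the hop size, and the function names `gabor_atom` and `stft_coefficients` are assumptions, not anything specified on the slide, and no statistical model is fitted here.

```python
import numpy as np

def gabor_atom(n_fft, k, window):
    """Windowed complex exponential at frequency bin k (one time-frequency atom)."""
    n = np.arange(n_fft)
    return window * np.exp(2j * np.pi * k * n / n_fft)

def stft_coefficients(x, n_fft=256, hop=64):
    """Return an (n_frames, n_fft//2 + 1) array of short-time Fourier coefficients."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    coeffs = np.empty((n_frames, n_fft // 2 + 1), dtype=complex)
    for m in range(n_frames):
        segment = x[m * hop : m * hop + n_fft]
        coeffs[m] = np.fft.rfft(segment * window)
    return coeffs

# Each coefficient equals the inner product <atom, segment>.
rng = np.random.default_rng(0)
signal = rng.standard_normal(1024)
C = stft_coefficients(signal)
m, k = 2, 5
atom = gabor_atom(256, k, np.hanning(256))
inner = np.vdot(atom, signal[m * 64 : m * 64 + 256])  # vdot conjugates the atom
```

The coefficients `C[m, k]` are the quantities a statistical model of time-frequency coefficients would act on; which prior or inference scheme to use is left open here.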


Representative Applications

Missing data in the context of VoIP
  Audio examples: Original, Missing, Restored

Source / Speaker Separation
  Audio examples: Source 1, Source 2; Mixture 1, Mixture 2; Recovery 1, Recovery 2


Digital Speech/Audio Processing


Speech Production


Time-Scale Modification


Time-Scale Modification

Male & Female Speaker: Original, Fast, Faster, Slower

Trumpet: Original, Fast, Slow

Speech and Quasi-Periodic Audio
  Sinewave-based Modification
  Voicing-dependent Rate Factor
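The idea behind sinewave-based time-scale modification can be sketched as follows: if a signal is described by slowly varying frequency and amplitude tracks, playing those tracks back over a longer or shorter time axis changes the duration while leaving the frequencies, and hence the pitch, untouched. This is a minimal sketch under stated assumptions: the tracks are taken as given (no analysis stage), the rate factor `rho` is fixed rather than voicing-dependent, and the function name and arguments are hypothetical.

```python
import numpy as np

def stretch_sinusoid_tracks(freqs_hz, amps, frame_rate, fs, rho):
    """Resynthesize sinusoidal tracks at a new time scale.

    freqs_hz, amps : arrays of shape (n_frames, n_partials), one row per frame
    frame_rate     : frames per second of the tracks
    fs             : output sampling rate in Hz
    rho            : time-scale factor (rho > 1 is slower, rho < 1 is faster)
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    amps = np.asarray(amps, dtype=float)
    n_frames, n_partials = freqs_hz.shape
    frame_times = np.arange(n_frames) / frame_rate
    n_out = int(round(rho * frame_times[-1] * fs))
    # Original-time position that each output sample corresponds to.
    t_src = np.arange(n_out) / (fs * rho)
    y = np.zeros(n_out)
    for p in range(n_partials):
        f = np.interp(t_src, frame_times, freqs_hz[:, p])
        a = np.interp(t_src, frame_times, amps[:, p])
        # Integrate frequency to get a continuous phase at the new time scale.
        phase = 2.0 * np.pi * np.cumsum(f) / fs
        y += a * np.cos(phase)
    return y

# One steady 440 Hz partial, one second long, slowed down by a factor of 2.
freqs = np.full((101, 1), 440.0)
amps = np.ones((101, 1))
slow = stretch_sinusoid_tracks(freqs, amps, frame_rate=100, fs=8000, rho=2.0)
```

A voicing-dependent rate factor, as named on the slide, would replace the constant `rho` with a value that changes from frame to frame; that refinement is not shown.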


More Time-Scale Modification

Falling Can, Bongo Drums, Loon: Original, Slow

Complex Non-Speech Signals
  Phase-Vocoder-based Modification
  Event-Dependent Phase Coherence
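For orientation, a basic phase vocoder can be sketched as below: analysis frames are taken at a fixed hop, magnitudes are interpolated at fractional frame positions, and each bin's phase is advanced by its measured per-hop increment before overlap-add resynthesis. This is the textbook form only. It makes no attempt at the event-dependent phase coherence named on the slide, and the window, FFT size, hop, and function name are assumptions.

```python
import numpy as np

def phase_vocoder_stretch(x, stretch, n_fft=1024, hop=256):
    """Time-stretch x by `stretch` (> 1 is longer and slower) at unchanged pitch."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Phase advance per hop expected for a sinusoid at each bin centre.
    omega = 2.0 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft

    # Fractional analysis-frame positions to read, one per synthesis frame.
    steps = np.arange(0, n_frames - 1, 1.0 / stretch)
    out = np.zeros(len(steps) * hop + n_fft)
    norm = np.zeros_like(out)
    acc_phase = phase[0].copy()
    for k, pos in enumerate(steps):
        i = int(pos)
        frac = pos - i
        m = (1.0 - frac) * mag[i] + frac * mag[i + 1]
        frame = np.fft.irfft(m * np.exp(1j * acc_phase), n=n_fft) * window
        out[k * hop : k * hop + n_fft] += frame
        norm[k * hop : k * hop + n_fft] += window ** 2
        # Deviation from the expected advance, wrapped to [-pi, pi].
        dphi = phase[i + 1] - phase[i] - omega
        dphi -= 2.0 * np.pi * np.round(dphi / (2.0 * np.pi))
        acc_phase += omega + dphi
    # The floor keeps the tapered edges from being amplified.
    return out / np.maximum(norm, 0.1)

fs = 8000
tone = np.sin(2.0 * np.pi * 440.0 * np.arange(fs) / fs)
slow_tone = phase_vocoder_stretch(tone, 2.0)
```

On transient-rich material such as the drum and impact examples, this plain version tends to smear onsets, which is presumably the problem that event-dependent phase handling is meant to address.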


Pitch and Vocal Tract Change

Male & Female Speaker: Original, Low pitch/Long vocal tract, High pitch/Short vocal tract

Male Speaker: Original and Monotone

Sinewave-based Modification


Speech Coding

Female Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps

Male Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps

Sinewave-based (Sine) and Code-Excited Linear Prediction (CELP)
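CELP coders are built on linear prediction, so a short sketch of that one ingredient may help: estimating predictor coefficients from a frame with the autocorrelation method and the Levinson-Durbin recursion. This is not a CELP coder (there is no codebook search, quantization, or bit allocation), and the function name `lpc` and the synthetic test signal are assumptions for illustration.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method linear prediction via Levinson-Durbin.

    Returns (a, err) where a[0] == 1 and the prediction error is
    e[n] = sum_k a[k] * x[n - k]; err is the final error energy.
    """
    frame = np.asarray(frame, dtype=float)
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:])
                  for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Synthetic second-order autoregressive signal with known coefficients.
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros(20000)
for n in range(2, 20000):
    x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]
a_hat, err = lpc(x, order=2)  # a_hat should be close to [1, -1.3, 0.6]
```

In a real coder the analysis would run on short windowed speech frames at a higher order; the order and framing used by the coders in the audio examples are not given in the source.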


Noise Reduction

Cell Phone Noise, Cocktail Party, Automobile Noise: Original, Enhanced

Adaptive Wiener Filter
  Adaptation Based on Spectral Change
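The core of a frequency-domain Wiener filter is a per-bin gain SNR / (1 + SNR). The sketch below applies that gain to a single frame, using a simple power-subtraction estimate of the SNR. It assumes the noise power spectrum is already known, and it does not implement the adaptation based on spectral change named on the slide; the function names and the gain floor are hypothetical choices.

```python
import numpy as np

def wiener_gain(noisy_power, noise_power, floor=0.05):
    """Per-bin Wiener gain from noisy-signal power and a noise power estimate."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    # Power-subtraction SNR estimate, clipped at zero.
    snr = np.maximum(noisy_power / noise_power - 1.0, 0.0)
    # The floor limits how far any bin is attenuated.
    return np.maximum(snr / (1.0 + snr), floor)

def enhance_frame(frame, noise_power, floor=0.05):
    """Apply the Wiener gain to one windowed frame; returns the filtered frame."""
    frame = np.asarray(frame, dtype=float)
    window = np.hanning(len(frame))
    spectrum = np.fft.rfft(frame * window)
    gain = wiener_gain(np.abs(spectrum) ** 2, noise_power, floor)
    return np.fft.irfft(gain * spectrum, n=len(frame))

gains = wiener_gain([1.0, 2.0, 11.0], 1.0, floor=0.0)  # SNRs of 0, 1 and 10
```

An adaptive version would update the noise estimate or the smoothing of the gain over time, for example more quickly when the spectrum changes rapidly; how the source does this is not stated.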


Compression

Low-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction

High-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction

Reduction of Peak-to-RMS amplitude ratio
  Based on Sinewave Analysis/Synthesis
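The peak-to-RMS ratio (crest factor) that these examples reduce can be measured as in the sketch below. The sketch only measures the ratio; it does not reproduce the sinewave analysis/synthesis method used to lower it, and the function name is an assumption.

```python
import numpy as np

def peak_to_rms_db(x):
    """Peak-to-RMS amplitude ratio of a signal, in decibels."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x ** 2))
    return 20.0 * np.log10(peak / rms)

# A pure sine has a peak-to-RMS ratio of 20*log10(sqrt(2)), about 3.01 dB.
n = np.arange(1000)
sine = np.sin(2.0 * np.pi * 10.0 * n / 1000.0)
sine_ratio = peak_to_rms_db(sine)
```

A "1.5 dB Reduction" in the examples above would then mean that this figure is 1.5 dB lower for the processed signal than for the original.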