Prof. Dr. Karlheinz Brandenburg, [email protected] Page 1
State of the Art in Perceptual Coding: MPEG-2/4 Advanced Audio Coding (AAC)
History
• 1994: Official start of AAC development
• Goal: Development of a new, powerful, state-of-the-art multichannel coder without compatibility constraints
• 1997: AAC becomes an International Standard (IS)
• 1999: AAC becomes part of the MPEG-4 standard
• Today: favored coder for many application areas such as Internet audio, solid-state players, ISDN music transmission, high-definition TV (HDTV), and satellite and terrestrial digital audio broadcasting
Overview (1)
• Next-generation mono/stereo/multichannel coding
• Same quality at half the bit rate
• International cooperation between the Fraunhofer Institute and companies such as AT&T, Sony and Dolby
• Most efficient MPEG method for audio data compression to date
• The driving force behind AAC was the quest for an efficient coding method for surround signals, such as 5-channel signals (cinemas)
Overview (2)
• Makes use of the signal masking properties of the human ear in order to reduce the amount of data
• Quantization noise is distributed to frequency bands in such a way that it is masked by the total signal
• Iterative encoder structure using Huffman coding and non-uniform quantization
– Features found in Layer 3 and PAC
• Window type and block switching
– Features found in AC-3, Layer 3, PAC, + new
• Temporal Noise Shaping (TNS)
– New technique
Overview (3)
• Prediction
• Bit reservoir
• M/S stereo coding
• Intensity stereo coding
• Gain control
MPEG-AAC: Basic Features
• High frequency resolution filter bank-based coder (1024-subband MDCT with 50% overlap)
• 1:8 block switching (1024/128-subband MDCT)
• Non-uniform quantizer
• Noise shaping in half critical bands (scalefactor bands)
• Huffman coding of scalefactors and spectral coefficients
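To make the "non-uniform quantizer" bullet concrete, here is a minimal sketch of a power-law quantizer. The exponent 3/4, the rounding offset 0.4054 and the single `step` parameter are assumptions for illustration (they follow common descriptions of AAC encoders and are not given on the slides); scalefactor handling and Huffman coding are left out.

```python
def quantize(x, step):
    """Power-law quantizer: compress the magnitude with exponent 3/4, then round.

    The exponent and the 0.4054 rounding offset are assumptions, not slide content.
    """
    mag = (abs(x) / step) ** 0.75
    q = int(mag + 0.4054)
    return q if x >= 0 else -q


def dequantize(q, step):
    """Inverse mapping: expand the integer index with exponent 4/3."""
    mag = abs(q) ** (4.0 / 3.0) * step
    return mag if q >= 0 else -mag


def step_sizes(step, max_q):
    """Distance between neighbouring reconstruction levels for indices 0..max_q."""
    return [dequantize(q + 1, step) - dequantize(q, step) for q in range(max_q)]
```

The reconstruction levels move further apart as the magnitude grows, so large coefficients are coded more coarsely than small ones; that is what "non-uniform" means here.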
MPEG-2 AAC: Advanced Coding Tools
• Window shape adaptation: usually fixed (sine or KBD)
• Temporal noise shaping (TNS): often used
• Gain control (SSR, Scalable Sampling Rate profile only): not often used
• Backward-adaptive prediction: not often used
Frequency response of sine and Fielder (KBD) windows
[Figure: frequency responses of the Fielder (KBD) window and the sine window]
MPEG-2 AAC: Joint Stereo Tools
• Mid/Side stereo (MS) per scalefactor band
• Intensity stereo coding between channel pairs
• Coupling channel(s)
Other Features:
• Flexible bitstream format for up to 48 channels
• Low Frequency Enhancement (LFE) channel(s)
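As an illustration of the Mid/Side tool listed above, the following sketch applies the usual sum/difference transform to the spectral coefficients of one band. The function names and the 1/2 normalization are assumptions for this example; the per-band on/off decision and its signalling are omitted.

```python
def ms_encode(left, right):
    """Convert a left/right coefficient pair of one band into mid/side."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right from mid/side (exact inverse of ms_encode)."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def energy(values):
    """Sum of squares, used here to compare channel energies."""
    return sum(v * v for v in values)
```

For strongly correlated channels the side signal carries very little energy, which is why coding mid/side can be cheaper than coding left/right in such bands.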
MPEG-2 AAC Performance
Test results:
• Broadcast quality at 320 kbit/s for 5 channels (better than MPEG-2 Layer II at 640 kbit/s)
• Broadcast quality at 128 kbit/s stereo
• Comparison to other codecs: AAC at 96 kbit/s stereo is comparable to
– AC-3 at 160 kbit/s
– Layer II at 192 kbit/s
– Layer III at 128 kbit/s
(Hence: AAC is successful in providing higher compression ratios)
• Very low bitrates (comparison within MPEG): AAC is the best audio coder at bitrates down to 16 kbit/s for mono and stereo
Differences MPEG-2 AAC and MPEG Audio Layer-3
• Filter bank
– ISO/MPEG Audio Layer-3 uses a hybrid filter bank, chosen for reasons of compatibility
– MPEG-2 AAC uses a plain Modified Discrete Cosine Transform (MDCT) to reduce aliasing
– Together with the increased number of subbands (1024 instead of 576), the MDCT outperforms the filter banks of previous coding methods
Differences MPEG-2 AAC and MPEG Audio Layer-3
• Temporal Noise Shaping (TNS)
– Shapes the distribution of quantization noise in time by prediction in the frequency domain
– Voice signals in particular improve considerably with TNS
• Prediction (within each band, in the time domain)
– A technique well established in speech coding systems
– Benefits from the fact that stationary audio signals are predictable to a certain extent
– But requires higher computational complexity
Differences MPEG-2 AAC and MPEG Audio Layer-3
• Quantization
– By allowing finer control of quantization resolution, the given bit rate can be used more efficiently
• Bit-stream format
– The information to be transmitted undergoes entropy coding in order to keep redundancy as low as possible
– The optimization of these coding methods, together with a flexible bit-stream structure, has made further improvement of the coding efficiency possible
Filter Bank Details
• MDCT (Princen / Bradley)
– TDAC, MLT, cosine-modulated filter bank
– critical sampling
– time domain aliasing cancellation
• Block switching to adjust the impulse response
• Window type switching
– sine window
– Kaiser-Bessel-derived (KBD) window
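The properties listed above (critical sampling, 50% overlap, time domain aliasing cancellation) can be checked numerically with a small sketch. It uses a direct O(N²) MDCT with a sine window and a tiny block size; the function names, the 2/N scaling in the inverse transform and the zero padding at the signal edges are choices made for this example only, and real coders use fast algorithms and 1024 or 128 bands.

```python
import math


def sine_window(n_coef):
    """Sine window of length 2*n_coef; satisfies w[n]^2 + w[n + n_coef]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_coef)) for n in range(2 * n_coef)]


def _basis(n, k, n_coef):
    return math.cos(math.pi / n_coef * (n + 0.5 + n_coef / 2) * (k + 0.5))


def mdct(block, window):
    """2*n_coef windowed input samples -> n_coef coefficients (critical sampling)."""
    n_coef = len(block) // 2
    return [sum(block[n] * window[n] * _basis(n, k, n_coef) for n in range(2 * n_coef))
            for k in range(n_coef)]


def imdct(coefs, window):
    """n_coef coefficients -> 2*n_coef windowed, time-aliased output samples."""
    n_coef = len(coefs)
    return [window[n] * (2.0 / n_coef)
            * sum(coefs[k] * _basis(n, k, n_coef) for k in range(n_coef))
            for n in range(2 * n_coef)]


def analysis_synthesis(signal, n_coef):
    """Run MDCT and IMDCT with 50% overlap-add; len(signal) must be a multiple of n_coef."""
    window = sine_window(n_coef)
    padded = [0.0] * n_coef + list(signal) + [0.0] * n_coef
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coef + 1, n_coef):
        block = padded[start:start + 2 * n_coef]
        for n, value in enumerate(imdct(mdct(block, window), window)):
            out[start + n] += value  # aliasing of adjacent blocks cancels here
    return out[n_coef:n_coef + len(signal)]
```

Each block of 2N input samples yields only N coefficients, yet the overlap-add of neighbouring blocks reproduces the input, because the aliasing terms of adjacent blocks have opposite signs.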
MPEG-4 General Audio Coding
MPEG-4 Audio
• Interactivity
• High compression
• Universal accessibility
MPEG-4 addresses applications in the shaded area.
[Figure: overlapping application areas ‘TV/film’, ‘Telecom’ and ‘Computer’, with labels: interactivity, wireless, AV-data]
wide range of bit rates, more channels possible
A short view into MPEG-4 Audio (1)
• Very diverse requirements, so no single algorithm:
– Music synthesis (Structured Audio), a kind of extension of MIDI
– Very low rate parametric coding (HILN, HVXC)
– Speech coding (CELP)
– Perceptual coding ("General Audio") over a wide range of bitrates
• High-quality coding is done via AAC with additional coding tools:
– TwinVQ, scalability tools
– Perceptual Noise Substitution (PNS)
• Backwards compatibility, no new coding paradigm for high-quality audio
A short view into MPEG-4 Audio (2)
• MPEG-4 General Audio Coding: the “all-round coder” in MPEG-4 audio
• MPEG-4 extensions:
– Perceptual Noise Substitution (PNS)
– Long Term Prediction
– TwinVQ coding core (important for low bitrates)
MPEG-4 Audio Algorithms
[Figure: overview of MPEG-4 audio algorithms; legible labels: DSL, text to speech]
MPEG-4 Audio Profiles
• Speech Audio Profile
– Parametric speech coder (HVXC)
– CELP speech coder
• Synthesis Audio Profile
– generates speech and sound
• Scalable Audio Profile
– contains the Speech Audio Profile
– General audio coding (AAC)
– TwinVQ tools
– Scalable coding of speech and music
• Main Audio Profile
– contains all other profiles
Temporal Noise Shaping (1)
[Figure: time-domain plots over one frame for two examples (castanet, speech), showing the original signal and the quantization noise with TNS]
Why use TNS instead of block switching?
A low number of subbands leads to a higher bit rate. This is acceptable if it happens only occasionally, because a buffer can then be used. It becomes a problem if there are many peaks, such as the glottal pulses in speech (every few ms!): either the bit rate or the quantization noise would become too high.
An alternative approach is needed: TNS
But: TNS is not really a replacement for block switching
Temporal Noise Shaping (2)
• Solutions for avoiding the spread of quantization noise:
– Make smaller frames (works for attacks but not for speech: decrease of coding efficiency)
– Higher time resolution to shape quantization noise
– TNS
• Limitation of TNS:
– Time domain aliasing
Speech Coding as Model for TNS
• In the frequency domain the prediction error is flat
• TNS predicts in the frequency domain instead of the time domain, and shapes the noise in the time domain.
[Figure: speech-coding analogy. Encoder: compute LPC coefficients and transmit them to the decoder; a linear FIR filter (the predictor only knows past samples) outputs the prediction error, which should have a flat spectrum, so the frequency response of the structure is the inverse of the signal's. Decoder: the structure should have a frequency response like the signal's, so its output should be the original spectrum; quantization noise is shaped accordingly in the frequency domain.]
TNS
• Switch the roles of time and frequency domain:
– predict not over time, but over frequency, across the subbands
– the quantization error is shaped (after decoding) in the time domain (instead of the frequency domain) like the signal
– this hopefully reduces pre-echo artifacts
• But: aliasing in the time domain limits effectiveness (peaks are mirrored over time)
Structure of TNS (encoder)
[Figure: Audio → MDCT (1024 bands) → TNS prediction loop (z^-1, pred) → Q; the predictor coefficients are sent as side info]
• The input to TNS is the sequence of subband values (already in the AAC subbands)
• TNS predicts from one subband to the next, starting at the lowest subband: from subband 0 it predicts subband 1, …
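The encoder structure described here, prediction from one subband to the next with the coefficients sent as side information, can be sketched as follows. This is an illustration under stated assumptions and not the standardized tool: the LPC coefficients come from a plain autocorrelation and Levinson-Durbin recursion, the names `tns_analysis` and `tns_synthesis` are invented, and coefficient quantization, order limits and the restriction to a frequency range are all omitted.

```python
import math


def autocorr(x, order):
    """Autocorrelation of the coefficient sequence for lags 0..order."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin: predictor a[0..order-1] with x[k] ~ sum(a[i] * x[k-1-i])."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def tns_analysis(spec, a):
    """Encoder: prediction error across frequency (FIR filter over subband index)."""
    out = []
    for k in range(len(spec)):
        pred = sum(a[i] * spec[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0)
        out.append(spec[k] - pred)
    return out


def tns_synthesis(res, a):
    """Decoder: inverse (all-pole) filter over subband index restores the spectrum."""
    out = []
    for k in range(len(res)):
        pred = sum(a[i] * out[k - 1 - i] for i in range(len(a)) if k - 1 - i >= 0)
        out.append(res[k] + pred)
    return out


def energy(values):
    return sum(v * v for v in values)


# A spectrum that oscillates smoothly over frequency, as a short time-domain
# transient would produce, is highly predictable across subbands.
spectrum = [math.cos(0.3 * k) for k in range(64)]
coefs = levinson(autocorr(spectrum, 2), 2)
residual = tns_analysis(spectrum, coefs)
```

In a real coder the residual is what gets quantized; because the decoder runs the inverse filter over frequency, the quantization noise then follows the temporal envelope of the signal.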
TNS Decoder
[Figure: subbands → inverse prediction loop (pred, z^-1) → synthesis MDCT (1024 bands) → Audio]
Extension: Perceptual Noise Substitution (PNS)
Perceptual Noise Substitution (1)
Background:
• Parametric coding of signals gives a very compact signal representation
• Parametric coding of noise-like signal components has been widely used, e.g. in speech coding
• Can similar techniques be used in perceptual audio coding?
MPEG-4:
• Perceptual Noise Substitution (PNS) permits frequency-selective parametric coding of noise-like signal components
Perceptual Noise Substitution (2)
Perceptual Noise Substitution (3)
Principle:
• Noise-like signal components are detected on a scalefactor band basis
• Corresponding groups of spectral coefficients are excluded from quantization/coding
• Instead, only a "noise substitution flag" plus the total power of the substituted band is transmitted in the bitstream
• The decoder inserts pseudo-random vectors with the desired target power as spectral coefficients
Highly compact representation for noise-like spectral components
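The principle above can be sketched in a few lines: the encoder keeps only the band power, and the decoder fills the band with pseudo-random values scaled to that power. The function names, the uniform noise source and the exact scaling are assumptions for this illustration; noise detection and the coding of the power value are not shown.

```python
import math
import random


def pns_encode_band(coefs):
    """Encoder side: the only value kept for a noise-like band is its total power."""
    return sum(c * c for c in coefs)


def pns_decode_band(power, width, rng):
    """Decoder side: pseudo-random coefficients rescaled to the transmitted power."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(width)]
    noise_power = sum(v * v for v in noise)
    scale = math.sqrt(power / noise_power) if noise_power > 0.0 else 0.0
    return [v * scale for v in noise]
```

The decoded coefficients differ from the originals value by value, but the band carries the same power, which is the property the listener is assumed to perceive for noise-like content.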
Extension: Long Term Prediction
Prediction (1)
Background:
• Tone-like signals require much higher coding precision than noise-like signals (e.g. 20 dB vs. 6 dB SNR)
A high bit rate is necessary to code signals with many tonal components (e.g. harpsichord, pitch pipe)
• Stationary tonal signal components are predictable, even in the downsampled subbands
Further quality enhancement by predictive coding
Prediction (2)
MPEG-2 AAC:
• Prediction of each spectral coefficient; backward-adaptive 2nd-order lattice predictor
• High complexity (ca. 50% of decoder computation and RAM)
Prohibitive for cost-sensitive applications, used in “MPEG-2 Main Profile” only
MPEG-4 AAC:
• Long Term Predictor (LTP) as known from speech coding, applied before the MDCT on the time-domain signal
• New: integration into a perceptual audio coder
• Lower complexity: saving of approx. 50% in terms of computation and memory over MPEG-2 predictors
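A long term predictor of the kind known from speech coding searches the already decoded past signal for the lag and gain that best predict the current frame. The sketch below shows only that search on the time-domain signal; the function names, the lag range and the restriction to lags of at least one frame length are assumptions for this example, and the integration with the MDCT and the perceptual model is not shown.

```python
import math


def ltp_search(history, frame, min_lag, max_lag):
    """Return (lag, gain) maximizing the normalized correlation with past samples.

    Only lags >= len(frame) are tried, so the prediction uses past samples only.
    """
    n = len(frame)
    best_lag, best_gain, best_score = None, 0.0, 0.0
    for lag in range(max(min_lag, n), min(max_lag, len(history)) + 1):
        start = len(history) - lag
        segment = history[start:start + n]
        corr = sum(f * s for f, s in zip(frame, segment))
        seg_energy = sum(s * s for s in segment)
        if seg_energy > 0.0 and corr > 0.0:
            score = corr * corr / seg_energy
            if score > best_score:
                best_lag, best_gain, best_score = lag, corr / seg_energy, score
    return best_lag, best_gain


def ltp_residual(history, frame, lag, gain):
    """Frame minus the scaled, delayed copy of the past signal."""
    start = len(history) - lag
    return [f - gain * s for f, s in zip(frame, history[start:start + len(frame)])]


# Stationary tone with a period of 50 samples: the past predicts the present well.
tone = [math.sin(2.0 * math.pi * n / 50.0) for n in range(464)]
past, current = tone[:400], tone[400:]
lag, gain = ltp_search(past, current, 64, 200)
```

For a stationary tonal signal the residual is far smaller than the frame itself, which is the source of the coding gain.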
Long Term Prediction
Transform-Domain Weighted Interleave VQ (1)
Background:
• Audio coding at extremely low bitrates (≥ 6 kbit/s)
MPEG-4:
• Transform-Domain Weighted Interleave Vector Quantization (TwinVQ) as additional coding kernel
• Fully integrated into the MPEG-4 AAC coding system:
– Uses the same spectral representation as the AAC coder
– Makes use of other MPEG-4 tools (e.g. LTP, TNS, joint stereo coding)
– Possible core coder for MPEG-4 scalable coding
Transform-Domain Weighted Interleave VQ (2)
Structure:
• Normalization of spectral coefficients:
– LPC envelope (overall spectral shape)
– Periodic component coding (harmonic components)
– Bark-scale envelope coding (additional flattening)
• Vector Quantization (VQ) process:
– Interleaving of spectral coefficients into new sub-vectors
– Vector quantization (two sets of codebooks, weighted distortion measure)
– allows distortion control by the perceptual model
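The two VQ steps named above, interleaving into sub-vectors and a codebook search under a weighted distortion measure, can be sketched generically. The stride-based interleaving rule, the single codebook and all names below are assumptions for illustration; the actual interleaving pattern, the two codebook sets and the weights derived from the perceptual model are defined by the standard and not reproduced here.

```python
def interleave(coefs, n_sub):
    """Spread neighbouring coefficients over n_sub sub-vectors (stride n_sub)."""
    return [coefs[i::n_sub] for i in range(n_sub)]


def deinterleave(subvectors):
    """Inverse of interleave: put every coefficient back at its original index."""
    n_sub = len(subvectors)
    out = [0.0] * sum(len(v) for v in subvectors)
    for i, vec in enumerate(subvectors):
        out[i::n_sub] = vec
    return out


def nearest_codeword(vec, weights, codebook):
    """Index of the codeword with the smallest weighted squared error."""
    def distortion(index):
        return sum(w * (x - c) ** 2 for x, c, w in zip(vec, codebook[index], weights))
    return min(range(len(codebook)), key=distortion)
```

Interleaving gives every sub-vector a sample from all parts of the spectrum, and the weights let perceptually important coefficients dominate the codeword choice.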
Transform-Domain Weighted Interleave VQ (3)
Conclusions
• The “all-round” coding system among the MPEG-4 audio schemes, providing a set of powerful tools
• Based on the MPEG-2 Advanced Audio Coding kernel
• Several enhancements for improved coding efficiency
• Perceptual Noise Substitution (PNS): exploiting noise-like components in the signal
• Long Term Prediction (LTP): taking advantage of very stationary / tonal signals
• TwinVQ coder kernel supports audio coding at extremely low data rates (MPEG-4 scalable coding)
MPEG-4 Scalable Audio Coding
Scalable Audio Coding
• Embed a lower-quality (e.g. lower-bandwidth) bitstream in a higher-bandwidth bitstream
• Key functionality for MPEG-4 audio
• Main types of scalability:
– Small step scalability: enhancement layers of ~1 kbit/s
– Large step scalability: enhancement layers of 8 kbit/s and more
• All natural audio coding in MPEG-4 supports scalability
The Scalable Audio Profile
• Four levels defined according to
– sampling frequency
– number of channels / objects
• Objects in the Scalable Audio Profile
– AAC LC
– AAC LTP
– AAC Scalable
– TwinVQ
– CELP
– HVXC
– TTSI
Speech Coding
• HVXC (Harmonic Vector eXcitation Coding)
– Bit rates typically 2 kbps to 4 kbps
– Parametric speech coding: high quality for coding of clean speech
– Speed change / pitch change capability
• CELP (Code Excited Linear Prediction)
– Bit rates typically 6 kbps to 24 kbps
– Very flexible configuration possibilities
– Support for 8 kHz and 16 kHz sampling
MPEG-4 Low Delay Audio Coding
MPEG-4 Version 2 Low Delay Audio Coding
Target:
• High audio and speech quality
• Low bitrate and low algorithmic delay (20 ms)
Solution:
• MPEG-4 Version 2 Low Delay Audio Coder:
– Derived from MPEG-2/4 "Advanced Audio Coding" (AAC)
– Specific modifications for low-delay operation
Delay Sources in Perceptual Audio Coding
• Framing delay
• Filter bank delay
• Look-ahead delay for block switching
• Use of bit reservoir
Overall delay:
Example: Delay of AAC Codec (48 kHz / 64 kbps)
• Framing delay: 1024 samples
• Filter bank delay: 1024 samples
• Look-ahead delay for block switching: 576 samples
• Use of bit reservoir: 74.7 ms
Overall delay:
Low Delay AAC Codec (48 kHz, min. delay mode)
• Reduced filter bank delay: 959 samples
• No block switching, so no look-ahead delay: 0 samples
• Minimal bit reservoir: 0...32 bits
Overall delay:
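The contributions listed on the two delay slides can be added up as a worked example. The sketch assumes the overall delay is simply the sum of the listed items, converts samples to milliseconds at 48 kHz, and, for the low-delay case, converts the bit reservoir size to time at 64 kbps; that bitrate is carried over from the AAC example as an assumption, since the low-delay slide does not state one.

```python
SAMPLE_RATE_HZ = 48000
BITRATE_BPS = 64000  # assumption for the low-delay case, taken from the AAC example


def samples_to_ms(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration of a number of samples in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


def bits_to_ms(bits, bitrate_bps=BITRATE_BPS):
    """Time needed to transmit a number of bits at a constant bitrate."""
    return 1000.0 * bits / bitrate_bps


# AAC at 48 kHz / 64 kbps: framing + filter bank + look-ahead + bit reservoir.
aac_delay_ms = samples_to_ms(1024 + 1024 + 576) + 74.7

# Low Delay AAC, min. delay mode: 959 samples plus a reservoir of 0...32 bits.
ld_delay_min_ms = samples_to_ms(959) + bits_to_ms(0)
ld_delay_max_ms = samples_to_ms(959) + bits_to_ms(32)

print(f"AAC: {aac_delay_ms:.1f} ms")
print(f"AAC-LD: {ld_delay_min_ms:.1f} to {ld_delay_max_ms:.1f} ms")
```

Under these assumptions the standard AAC configuration ends up at roughly 129 ms, while the low-delay configuration stays at about 20 ms, which is consistent with the 20 ms algorithmic delay quoted as the target.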
Delay vs. Bitrate
Pre-echo Reduction by Window Shape Adaptation
Summary
The MPEG-4 V2 low-delay coder provides
• High audio quality for music and speech
• Algorithmic delay of 20 ms, which enables two-way communication
• Audio quality that scales with bitrate
• Stereo and multichannel capabilities (inherited from MPEG-2/4 AAC)
• Compares well with MP3
• Low computational and memory complexity
• Error robustness