joseph-fisher
CD Quality
CD Audio: 2 channels (stereo), 16-bit encoding, 44.1 kHz sampling rate
Data Rate: This leads to a data rate of 1.4 - 1.54 Mbps
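The lower end of that range follows directly from the CD parameters. A minimal sketch of the arithmetic is below; the function name is hypothetical, and it counts raw PCM samples only, so it does not account for anything that would push the figure toward the upper end of the quoted range.

```python
def pcm_bit_rate(channels, bits_per_sample, sample_rate_hz):
    """Return the raw PCM bit rate in bits per second."""
    return channels * bits_per_sample * sample_rate_hz


# CD audio: 2 channels, 16 bits per sample, 44.1 kHz sampling rate
cd_rate = pcm_bit_rate(channels=2, bits_per_sample=16, sample_rate_hz=44_100)
print(f"{cd_rate} bit/s = {cd_rate / 1e6:.4f} Mbps")
```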
AUDIO ENCODER!
Bit rate: as low as 1 bit per sample or less
Based on a Perceptual model
Capable of high fidelity audio
Lossy or Lossless?
‘CD Quality’
Many perceptual tests were conducted to verify the quality of the audio output
Takes advantage of perceptual irrelevancies as well as statistical redundancies.
Moving Picture Experts Group
MPEG
MPEG is a family of encoding standards for digital multimedia information
• MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media (e.g., CD-ROM).
Layer I
Layer II
Layer III (aka MP3)
• MPEG-2: a standard for digital television, including high-definition television (HDTV), and for addressing multimedia applications.
• Advanced Audio Coding (AAC)
• MPEG-4: a standard for multimedia applications, with very low bit-rate audio-visual compression for those channels with very limited bandwidths (e.g., wireless channels).
• MPEG-7: a content representation standard for information search
Back to the Encoder!
Generic Audio Encoder Architecture
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000
Psychoacoustic Model
Critical Listening Threshold: The absolute threshold of hearing is defined as the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment.
This criterion assumes that the volume control on the decoder will be set such that the smallest possible output signal will be presented at 0 dB SPL
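To get a feel for the shape of the threshold, the sketch below evaluates a widely used closed-form approximation of the absolute threshold of hearing (commonly attributed to Terhardt). It is an assumption that this matches the curve shown in the deck; the function name is hypothetical.

```python
import math


def absolute_threshold_db_spl(freq_hz):
    """Approximate threshold in quiet (dB SPL) for a pure tone at freq_hz.

    Closed-form approximation, assumed here for illustration only.
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


for f in (100, 1000, 3300, 10000, 16000):
    print(f"{f:>6} Hz: {absolute_threshold_db_spl(f):7.2f} dB SPL")
```

Under this approximation the ear is most sensitive around 3 to 4 kHz, and the threshold rises steeply toward both ends of the audible range.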
Psychoacoustic Model
Absolute threshold of hearing
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000, Vol. 88, No. 4
Psychoacoustic Model
Critical Bands
The ear has a limited frequency selectivity that varies in acuity from less than 100 Hz for the lowest audible frequencies to more than 4 kHz for the highest.
As a result the audible spectrum can be partitioned into critical bands that reflect the resolving power of the ear as a function of frequency.
Due to this limited frequency resolving power, the threshold for noise masking at any given frequency is solely dependent on the signal activity within a critical band of that frequency.
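The growth of critical bandwidth with frequency can be illustrated with two common analytical approximations (usually attributed to Zwicker): one for the critical bandwidth around a centre frequency, and one mapping frequency onto the Bark scale. Both formulas and the function names are assumptions for illustration, not taken from the slides; note that this bandwidth approximation flattens out at roughly 100 Hz for low frequencies.

```python
import math


def critical_bandwidth_hz(freq_hz):
    """Approximate critical bandwidth (Hz) centred on freq_hz."""
    khz = freq_hz / 1000.0
    return 25.0 + 75.0 * (1.0 + 1.4 * khz ** 2) ** 0.69


def hz_to_bark(freq_hz):
    """Approximate critical-band rate (Bark) for freq_hz."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


for f in (100, 500, 1000, 5000, 15000):
    print(f"{f:>6} Hz: bandwidth {critical_bandwidth_hz(f):7.1f} Hz, "
          f"{hz_to_bark(f):5.2f} Bark")
```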
Psychoacoustic Model
MPEG/Audio filter banks vs. Critical Bands
Pan, D.Y. Digital Audio Compression, Digital Technical Journal, 1993, Vol. 5
Psychoacoustic Model
Auditory Masking
Auditory masking is a perceptual weakness of the ear that occurs whenever the presence of a strong audio signal makes a spectral neighbourhood of weaker audio signals imperceptible.
Two types of masking:
• Simultaneous Masking
• Temporal Masking
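For simultaneous masking, one way to picture the "spectral neighbourhood" is a spreading function that gives the relative masking level as a function of distance from the masker in Bark. The sketch below uses an analytical spreading function often attributed to Schroeder; treating it as the model behind these slides is an assumption, and the function name is hypothetical.

```python
import math


def spreading_db(delta_bark):
    """Relative masking level (dB) at delta_bark Bark away from the masker.

    Positive delta_bark means the maskee lies above the masker in frequency.
    """
    x = delta_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)


for dz in (-3, -2, -1, 0, 1, 2, 3):
    print(f"{dz:+d} Bark: {spreading_db(dz):7.2f} dB")
```

The function is close to 0 dB at the masker itself and falls off more slowly toward higher frequencies than toward lower ones, which is the usual asymmetry of simultaneous masking.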
Psychoacoustic Model
Audio Masking
Pan, D.Y. Digital Audio Compression, Digital Technical Journal, 1993, Vol. 5
Psychoacoustic Model
Perceptual Entropy
Johnston at Bell Labs has combined notions of psychoacoustic masking with signal quantization principles to define perceptual entropy (PE), a measure of perceptually relevant information contained in any audio record.
Expressed in bits per sample, PE represents a theoretical limit on the compressibility of a particular signal.
Reported PE measurements suggest that a wide variety of CD quality audio source material can be transparently compressed at approximately 2.1 bits per sample.
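To put 2.1 bits per sample in context, the sketch below converts it to a bit rate and compares it with 16-bit PCM. It assumes the figure applies per channel at the CD sampling rate of 44.1 kHz; the function name is hypothetical.

```python
def bit_rate_bps(bits_per_sample, sample_rate_hz=44_100, channels=2):
    """Bit rate in bits per second for a given average bits per sample."""
    return bits_per_sample * sample_rate_hz * channels


pe_rate = bit_rate_bps(2.1)           # rate at the reported PE figure
pcm_rate = bit_rate_bps(16)           # uncompressed CD audio
compression_ratio = pcm_rate / pe_rate

print(f"PE-limited rate: {pe_rate / 1000:.1f} kbps")
print(f"Compression ratio vs. 16-bit PCM: {compression_ratio:.2f}:1")
```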
Time - Frequency Analysis
Filter Banks
The filter bank divides the signal spectrum into frequency sub-bands and generates a time-indexed series of coefficients representing the frequency localized signal power within each band.
Masking thresholds are applied to the resulting frequency sub-band signals.
By providing explicit information about the distribution of signal and hence masking power over the time-frequency plane, the filter bank plays an essential role in the identification of perceptual irrelevancies when used in conjunction with a perceptual model.
Time - Frequency Analysis
Pseudo QMF - M Band Banks
Used in all MPEG-1 encoders
Signal is separated into sub-bands, the widths of which are equal over the entire frequency range
The resulting sub-band signals are then down-sampled in order to conserve bandwidth (they are up-sampled again at the decoder).
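The equal-width split and down-sampling can be sketched with a simple cosine-modulated analysis bank. This is a simplified illustration, not the MPEG-1 filter bank: the Hann-windowed sinc prototype, the tap count, and all names are assumptions, and the phase terms needed for alias cancellation at the decoder are left out.

```python
import math

NUM_BANDS = 32   # M equal-width sub-bands (assumed)
NUM_TAPS = 512   # prototype filter length (assumed)


def prototype_lowpass(num_taps, num_bands):
    """Hann-windowed sinc low-pass with cutoff pi / (2 * num_bands)."""
    cutoff = math.pi / (2 * num_bands)
    center = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - center
        ideal = cutoff / math.pi if t == 0 else math.sin(cutoff * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def analysis_filters(num_taps, num_bands):
    """Shift the prototype to the centre of each equal-width band."""
    proto = prototype_lowpass(num_taps, num_bands)
    center = (num_taps - 1) / 2
    return [[2 * proto[n] * math.cos((2 * k + 1) * math.pi / (2 * num_bands) * (n - center))
             for n in range(num_taps)]
            for k in range(num_bands)]


def analyze(signal, filters, decimation):
    """Filter into sub-bands, keeping one output per `decimation` inputs."""
    num_taps = len(filters[0])
    subbands = [[] for _ in filters]
    for start in range(0, len(signal) - num_taps + 1, decimation):
        frame = signal[start:start + num_taps]
        for k, taps in enumerate(filters):
            subbands[k].append(sum(c * s for c, s in zip(taps, frame)))
    return subbands


filters = analysis_filters(NUM_TAPS, NUM_BANDS)
omega = (2 * 5 + 1) * math.pi / (2 * NUM_BANDS)   # centre of sub-band 5
tone = [math.sin(omega * n) for n in range(2048)]
subbands = analyze(tone, filters, NUM_BANDS)
energies = [sum(v * v for v in band) for band in subbands]
loudest = max(range(NUM_BANDS), key=energies.__getitem__)
print("sub-band with most energy:", loudest)
print("samples per sub-band:", len(subbands[0]))
```

With a test tone placed at the centre of sub-band 5, the energy should concentrate in that band, and each sub-band carries roughly 1/32 as many samples as the input.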
Pre Echo Distortion
Pre-echoes occur when a signal with a sharp attack begins near the end of a transform block immediately following a region of low energy.
This situation can arise when coding recordings of percussive instruments such as the triangle, the glockenspiel, or the castanets.
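The effect can be reproduced in a few lines: quantization noise introduced in the transform domain is spread over the whole block when it is transformed back, so it also lands in the silent region before the attack. The sketch below uses a plain DCT and a uniform quantizer as stand-ins; block size, step size, and all names are assumptions, not the transform or quantizer of any particular codec.

```python
import math


def dct(block):
    """Unnormalized DCT-II of a block."""
    n = len(block)
    return [sum(block[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def idct(coeffs):
    """Inverse of dct() above (scaled DCT-III)."""
    n = len(coeffs)
    return [(coeffs[0] / 2
             + sum(coeffs[k] * math.cos(math.pi * (t + 0.5) * k / n)
                   for k in range(1, n))) * 2 / n
            for t in range(n)]


def quantize(coeffs, step):
    """Uniform quantizer: round each coefficient to a multiple of step."""
    return [round(c / step) * step for c in coeffs]


BLOCK = 64    # transform block length (assumed)
ATTACK = 48   # the sharp attack starts near the end of the block

# Silence followed by a decaying burst, as with a percussive hit.
signal = [0.0] * ATTACK + [
    math.exp(-(t - ATTACK) / 8) * math.cos(2 * math.pi * 0.2 * (t - ATTACK))
    for t in range(ATTACK, BLOCK)
]

decoded = idct(quantize(dct(signal), step=0.5))
pre_echo_energy = sum(v * v for v in decoded[:ATTACK])
print(f"energy before the attack, original: {sum(v * v for v in signal[:ATTACK]):.6f}")
print(f"energy before the attack, decoded:  {pre_echo_energy:.6f}")
```

The original is exactly silent before the attack, while the decoded block is not; that leaked noise is what is heard as a pre-echo.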
Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000