AUDIO COMPRESSION TOOLS & TECHNIQUES Gautam Bhattacharya

CD Quality

CD Audio: 2 channels (stereo), 16-bit encoding, 44.1 kHz sampling rate

Data Rate: This leads to a data rate of 1.4 - 1.54 Mbps
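As a quick check on the lower figure, the raw PCM rate follows directly from the three CD parameters. The sketch below is only illustrative arithmetic; the function name `pcm_bit_rate` is made up for this example, and the result covers the audio samples alone, without any framing or error-correction overhead.

```python
def pcm_bit_rate(channels: int, bits_per_sample: int, sample_rate_hz: int) -> int:
    """Raw PCM bit rate in bits per second (audio samples only)."""
    return channels * bits_per_sample * sample_rate_hz


# CD audio parameters as listed on the slide.
cd_rate_bps = pcm_bit_rate(channels=2, bits_per_sample=16, sample_rate_hz=44_100)
print(f"{cd_rate_bps} bit/s = {cd_rate_bps / 1e6:.3f} Mbps")
```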

AUDIO ENCODER!

Bit rate: as low as 1 bit per sample or less

Based on a Perceptual model

Capable of high fidelity audio

Lossy or Lossless?

‘CD Quality’

Many perceptual tests were conducted to verify the quality of the audio output

Takes advantage of perceptual irrelevancies as well as statistical redundancies.

Moving Picture Experts Group

MPEG

MPEG is a family of encoding standards for digital multimedia information

• MPEG-1: a standard for storage and retrieval of moving pictures and audio on storage media (e.g., CD-ROM).

Layer I

Layer II

Layer III (aka MP3)

• MPEG-2: standard for digital television, including high-definition television (HDTV), and for addressing multimedia applications.

• Advanced Audio Coding (AAC)

• MPEG-4: a standard for multimedia applications, with very low bit-rate audio-visual compression for those channels with very limited bandwidths (e.g., wireless channels).

• MPEG-7: a content representation standard for information search

Back to the Encoder!

Generic Audio Encoder Architecture

Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000

http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=842996&isnumber=18261


Psychoacoustic Model

Critical Listening Threshold: The absolute threshold of hearing is defined as the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment.

This criterion assumes that the volume control on the decoder will be set such that the smallest possible output signal will be presented at 0 dB SPL.
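The threshold curve is often approximated by a closed-form expression. The sketch below uses the widely quoted approximation (usually attributed to Terhardt and reproduced in Painter & Spanias); the slides do not give a formula, so treat the choice of approximation and the function name `absolute_threshold_db_spl` as assumptions made for illustration.

```python
import math


def absolute_threshold_db_spl(freq_hz: float) -> float:
    """Approximate threshold in quiet (dB SPL) for a pure tone at freq_hz.

    Assumed formula (common approximation, not taken from the slides):
    T(f) = 3.64 (f/1000)^-0.8 - 6.5 exp(-0.6 (f/1000 - 3.3)^2) + 1e-3 (f/1000)^4
    """
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


# The ear is most sensitive around 3 to 4 kHz and much less so at the extremes.
for f in (100, 1000, 3300, 10000):
    print(f"{f:>6} Hz: {absolute_threshold_db_spl(f):6.2f} dB SPL")
```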


Absolute threshold of hearing

Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000, Vol. 88, No. 4

Critical Bands

The ear has a limited frequency selectivity that varies in acuity from less than 100 Hz for the lowest audible frequencies to more than 4 kHz for the highest. As a result, the audible spectrum can be partitioned into critical bands that reflect the resolving power of the ear as a function of frequency.

Due to this limited frequency resolving power, the threshold for noise masking at any given frequency is solely dependent on the signal activity within a critical band of that frequency.
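To make the growth of critical bandwidth with frequency concrete, the sketch below evaluates two standard approximations (Zwicker's Bark-scale mapping and critical-bandwidth formula, as reproduced in Painter & Spanias). Neither formula appears on the slides, so both, along with the function names, are assumptions used for illustration.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Map frequency in Hz to critical-band rate in Bark (assumed approximation)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)


def critical_bandwidth_hz(freq_hz: float) -> float:
    """Approximate critical bandwidth in Hz around freq_hz (assumed approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69


# Bandwidth is roughly constant at low frequencies and widens rapidly above ~500 Hz.
for f in (100, 500, 1000, 5000, 15000):
    print(f"{f:>6} Hz: {hz_to_bark(f):5.2f} Bark, bandwidth ~{critical_bandwidth_hz(f):7.1f} Hz")
```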


MPEG/Audio filter banks vs. Critical Bands

Pan, D.Y. Digital Audio Compression, Digital Technical Journal, 1993, Vol. 5

http://eceftp.niu.edu.tw/mhyeh/teach.files/media92/data/paper/Digital_Audio_Compression_1993.pdf


Auditory Masking

Auditory masking is a perceptual weakness of the ear that occurs whenever the presence of a strong audio signal makes a spectral neighbourhood of weaker audio signals imperceptible.

Two types of masking:

• Simultaneous Masking

• Temporal Masking


Audio Masking

Pan, D.Y. Digital Audio Compression, Digital Technical Journal, 1993, Vol. 5

http://eceftp.niu.edu.tw/mhyeh/teach.files/media92/data/paper/Digital_Audio_Compression_1993.pdf


Perceptual Entropy

Johnston at Bell Labs has combined notions of psychoacoustic masking with signal quantization principles to define perceptual entropy (PE), a measure of perceptually relevant information contained in any audio record.

Expressed in bits per sample, PE represents a theoretical limit on the compressibility of a particular signal.

Reported PE measurements suggest that a wide variety of CD-quality audio source material can be transparently compressed at approximately 2.1 bits per sample.
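For a sense of scale, the 2.1 bits per sample figure can be turned into a bit rate and a compression ratio relative to 16-bit PCM. This is illustrative arithmetic only; it assumes the CD sampling rate of 44.1 kHz and two channels, and the variable names are made up for the example.

```python
SAMPLE_RATE_HZ = 44_100      # CD sampling rate
CHANNELS = 2                 # stereo
PCM_BITS_PER_SAMPLE = 16     # CD word length
PE_BITS_PER_SAMPLE = 2.1     # approximate figure quoted above

# Bit rate implied by coding every sample at the PE figure.
pe_rate_bps = PE_BITS_PER_SAMPLE * SAMPLE_RATE_HZ * CHANNELS

# Reduction relative to uncompressed 16-bit PCM.
compression_ratio = PCM_BITS_PER_SAMPLE / PE_BITS_PER_SAMPLE

print(f"about {pe_rate_bps / 1000:.0f} kbps, roughly {compression_ratio:.1f}:1")
```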

Time - Frequency Analysis

Filter Banks

The filter bank divides the signal spectrum into frequency sub-bands and generates a time-indexed series of coefficients representing the frequency-localized signal power within each band.

Masking thresholds are applied to the resulting frequency sub-band signals.

By providing explicit information about the distribution of signal power, and hence masking power, over the time-frequency plane, the filter bank plays an essential role in the identification of perceptual irrelevancies when used in conjunction with a perceptual model.

Time - Frequency Analysis

Pseudo-QMF M-Band Banks

Used in all MPEG-1 encoders

Signal is separated into sub-bands, the widths of which are equal over the entire frequency range

The resulting sub-band signals are then down-sampled to conserve bandwidth (they are up-sampled again at the decoder).
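The layout of such a uniform bank is easy to tabulate. The sketch below assumes M = 32 sub-bands (the value used by the MPEG-1 audio filter bank, though not stated on the slide), a 44.1 kHz input, and critical down-sampling by M; the function name is hypothetical. It only computes band edges and rates and does not implement the filters themselves.

```python
def uniform_band_layout(sample_rate_hz: float, num_bands: int):
    """Return (band_edges_hz, band_width_hz, decimated_rate_hz) for an M-band uniform bank."""
    nyquist_hz = sample_rate_hz / 2.0
    band_width_hz = nyquist_hz / num_bands          # every band has the same width
    band_edges_hz = [k * band_width_hz for k in range(num_bands + 1)]
    decimated_rate_hz = sample_rate_hz / num_bands  # critical sampling: keep 1 of every M samples
    return band_edges_hz, band_width_hz, decimated_rate_hz


edges, width, sub_rate = uniform_band_layout(44_100, 32)
print(f"band width: {width} Hz, per-band sample rate: {sub_rate} Hz")
print("first three bands:", list(zip(edges[:3], edges[1:4])))
```

Because each of the M bands runs at 1/M of the input rate, the total number of sub-band samples per second equals the input sample rate, so the split itself adds no data.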

Pre Echo Distortion

Pre-echoes occur when a signal with a sharp attack begins near the end of a transform block immediately following a region of low energy.

This situation can arise when coding recordings of percussive instruments such as the triangle, the glockenspiel, or the castanets.
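The effect can be reproduced with a toy transform coder. The sketch below is a simplified illustration, not how an MPEG encoder works: it uses a plain FFT over one block instead of the coder's actual filter bank or transform, a fixed uniform quantizer step, and a synthetic decaying sinusoid standing in for a percussive attack. All names and parameter values are assumptions. Because quantization error in the frequency domain spreads across the whole block in time, the silent region before the attack is no longer silent after decoding.

```python
import numpy as np


def pre_echo_demo(block_len: int = 512, attack_at: int = 448, step: float = 2.0):
    """Return pre-attack energy of the original block and of the coded/decoded block."""
    # Silence followed by a sharp attack near the end of the block.
    t = np.arange(block_len - attack_at)
    x = np.zeros(block_len)
    x[attack_at:] = np.exp(-0.05 * t) * np.sin(2 * np.pi * 0.12 * t)

    # "Encode": transform the block and quantize coefficients coarsely.
    X = np.fft.rfft(x)
    Xq = step * (np.round(X.real / step) + 1j * np.round(X.imag / step))

    # "Decode": inverse transform of the quantized coefficients.
    y = np.fft.irfft(Xq, n=block_len)

    original_pre = float(np.sum(x[:attack_at] ** 2))
    decoded_pre = float(np.sum(y[:attack_at] ** 2))
    return original_pre, decoded_pre


original_pre, decoded_pre = pre_echo_demo()
print(f"energy before attack: original={original_pre}, decoded={decoded_pre:.4f}")
```

Shortening the block (smaller `block_len`) limits how far ahead of the attack the noise can spread, which is the intuition behind switching to shorter blocks around transients.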

Painter, T. & Spanias, A. Perceptual Coding of Digital Audio, Proceedings of IEEE, 2000

AUDIO COMPRESSION

THANK YOU!
