DESCRIPTION
Hi, I'm Sirhan Shafahath. This is my presentation on Advanced Audio Coding (AAC), one of the most efficient audio coding algorithms available today and the successor of MP3. I prepared it as my seminar topic in partial fulfillment of my B.Tech degree, and I hope it is useful to you. For detailed information and sources of references, contact me at "[email protected]". I will always be there to help. Some details on the presentation: it begins with an introduction to CD audio and the need for audio compression. It then covers the technology used for compression, i.e. psychoacoustics, followed by AAC coding with block diagrams and the SBR and PS technologies that give AAC its quality. It ends with a comparison with MP3, the applications, and a short conclusion.
ADVANCED AUDIO CODING [AAC]
Presented By Sirhan Shafahath
00606002S7 EC
INTRODUCTION
• Advanced Audio Coding (AAC) is a standardized lossy compression and encoding scheme for digital audio.
• It is standardized in ISO/IEC 13818-7 (MPEG-2) and ISO/IEC 14496-3 (MPEG-4).
• It was developed with the cooperation and contributions of companies including Fraunhofer IIS, AT&T Bell Laboratories, Dolby, Sony Corporation and Nokia.
• It was designed to be the successor of the well-known audio compression format MP3.
• Filename extensions: .m4a, .m4b, .m4p, .m4v, .m4r, .3gp, .mp4, .aac
• It is currently the most powerful multichannel audio coding algorithm in the MPEG family.
INTRODUCTION TO DIGITAL AUDIO
• Before the introduction of digital audio, audio signals were represented in analog form.
• Main disadvantages of analog audio: it is difficult to compress, to render, and to enhance in quality.
• Representing audio signals in digital form makes all of these goals much easier to achieve.
• The idea behind digital audio is to use numbers to represent the physical sound, via an analog-to-digital (A/D) conversion process.
• The A/D conversion process involves sampling and quantization.
Continued…
• Sampling: each sample's amplitude is recorded as a function of a discrete time index. The rate at which samples are taken is called the sampling frequency or sampling rate, and is expressed in samples per second, or hertz (Hz).
• Quantization: the sample resolution, or bit depth, determines how precisely each sample's amplitude is recorded or stored. An n-bit sample resolution allows 2^n different amplitude values.
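As a rough illustration of the 2^n rule, the sketch below uniformly quantizes amplitudes in [-1.0, 1.0] to signed n-bit integer codes. The function names and the two's-complement clipping convention are assumptions made for illustration, not something specified in this presentation.

```python
def quantize(x: float, bits: int) -> int:
    """Uniformly quantize an amplitude x in [-1.0, 1.0] to a signed n-bit code."""
    scale = 2 ** (bits - 1)              # 32768 for 16-bit audio
    code = round(x * scale)
    # Clip to the two's-complement range [-2^(n-1), 2^(n-1) - 1]
    return max(-scale, min(scale - 1, code))


def dequantize(code: int, bits: int) -> float:
    """Map an integer code back to an amplitude in [-1.0, 1.0)."""
    return code / 2 ** (bits - 1)


if __name__ == "__main__":
    # Sweeping the full amplitude range with a 3-bit quantizer
    # should produce exactly 2**3 distinct codes.
    codes = {quantize(i / 500 - 1.0, 3) for i in range(1001)}
    print(len(codes), sorted(codes))
    print(quantize(1.0, 16), quantize(-1.0, 16))
```

Each extra bit doubles the number of levels and halves the step size, which is why 16-bit CD audio can represent amplitudes far more finely than, say, 8-bit audio.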
Continued…
• Encoding: the sampled and quantized signals are encoded using error-correction codes and stored on a medium.
• CD AUDIO: the most commonly used medium for storing and distributing digital audio.
  – Sampling rate: 44,100 Hz (satisfies the Nyquist criterion for signals up to 20 kHz, since 44,100 > 2 × 20,000)
  – Sample resolution: 16 bits (A/D converter)
  – Size (1 minute, stereo): 60 s × 2 channels × 44,100 samples/s × 16 bits = 84,672,000 bits ≈ 10.584 MB per minute
  – Filename extensions: .cda, .cdda
• CD audio is generally stored as uncompressed PCM data.
• This large amount of data makes it unsuitable for internet streaming and digital broadcasting, where bandwidth is limited.
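The CD figures above can be checked with a few lines of arithmetic. The sketch below computes the uncompressed PCM bit rate, the storage for one minute, and the compression ratio needed to reach a given coded bit rate. The 128 and 64 kbit/s values used in the demo are the MP3 and AAC "CD quality" rates quoted in the comparison slide; the helper names are illustrative only.

```python
SAMPLE_RATE_HZ = 44_100   # CD sampling rate
BITS_PER_SAMPLE = 16      # CD sample resolution
CHANNELS = 2              # stereo


def pcm_bitrate_bps(rate: int = SAMPLE_RATE_HZ, bits: int = BITS_PER_SAMPLE,
                    channels: int = CHANNELS) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return rate * bits * channels


def bytes_per_minute(bitrate_bps: int) -> int:
    """Storage needed for one minute of audio at the given bit rate."""
    return bitrate_bps * 60 // 8


def compression_ratio(coded_kbps: float) -> float:
    """How many times smaller a coded stream is than CD-quality PCM."""
    return pcm_bitrate_bps() / (coded_kbps * 1000)


if __name__ == "__main__":
    cd = pcm_bitrate_bps()
    print(f"CD PCM: {cd / 1000:.1f} kbit/s, {bytes_per_minute(cd) / 1e6:.3f} MB/min")
    print(f"Nyquist limit: {SAMPLE_RATE_HZ / 2:.0f} Hz")
    for kbps in (128, 64):
        print(f"{kbps} kbit/s -> {compression_ratio(kbps):.1f}:1")
```

CD audio runs at 1411.2 kbit/s, so a 128 kbit/s stream is about 11 times smaller and a 64 kbit/s stream about 22 times smaller.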
HENCE ARISES THE NEED FOR COMPRESSION
Compression Techniques
• Every compression technique is either lossy or lossless.
• Lossless compression
  – If data is losslessly compressed, the original data can be recovered exactly from the compressed data.
  – As the name implies, it involves no loss of information.
• Lossy compression
  – Involves some loss of information: data that have been lossy compressed generally cannot be recovered exactly.
  – By accepting this loss, we can achieve higher compression ratios than with lossless compression.
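The difference between the two classes can be shown in a few lines. In the sketch below, zlib (a general-purpose DEFLATE compressor, not an audio codec) stands in for lossless compression, and snapping samples to a coarse grid stands in for lossy coding. Both helper functions are hypothetical illustrations, not parts of AAC.

```python
import zlib


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress with zlib (DEFLATE), a general-purpose lossless coder."""
    return zlib.decompress(zlib.compress(data, 9))


def lossy_roundtrip(samples: list[float], step: float) -> list[float]:
    """Crude lossy coding: snap every sample to the nearest multiple of `step`.

    Only the integer indices would need to be stored, but the original
    values can no longer be recovered exactly.
    """
    return [round(s / step) * step for s in samples]


if __name__ == "__main__":
    pcm = bytes(range(256)) * 16                 # highly repetitive test data
    packed = zlib.compress(pcm, 9)
    print(len(pcm), "->", len(packed), "bytes")
    print(lossless_roundtrip(pcm) == pcm)         # exact recovery

    samples = [0.12, -0.37, 0.51, 0.98]
    print(lossy_roundtrip(samples, 0.25))          # close, but not identical
```

Real audio is far less repetitive than this test data, which is one reason lossless coders achieve only modest ratios on music and perceptual (lossy) coders are needed for low bit rates.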
Perceptual Audio Coding
• One of the key elements in the development of reduced-bit-rate audio is the understanding and application of psychoacoustics.
• All current perceptual audio coders achieve high compression rates by exploiting the fact that signal information that cannot be detected even by a well-trained listener can be discarded.
• Human hearing is insensitive to quiet frequency components that accompany other, stronger frequency components.
• Stereo audio streams contain largely redundant information.
• Irrelevant signal information is identified during signal analysis by incorporating several psychoacoustic principles into the coder.
Principles of Psychoacoustics
1. Absolute Threshold of Hearing
The absolute threshold of hearing characterizes the amount of energy a pure tone needs in order to be detected by a listener in a noiseless environment. It can be expressed as a non-linear function of the frequency f (in Hz):
$$T_q(f) = 3.64\left(\frac{f}{1000}\right)^{-0.8} - 6.5\,e^{-0.6\left(\frac{f}{1000} - 3.3\right)^{2}} + 10^{-3}\left(\frac{f}{1000}\right)^{4} \quad \text{(dB SPL)}$$
[Figure: Equal loudness contours for pure tones]
Continued…
• When applied to signal compression, the threshold can be interpreted as the maximum allowable energy level for coding distortions introduced in the frequency domain.
• Using this information, the encoder tries to keep the quantization noise below this threshold.
• As a result, the quantization noise does not become audible.
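The threshold formula translates directly into code. The sketch below evaluates T_q(f) and checks whether a noise component at a given level stays below it. The helper names are hypothetical. In practice an encoder must also assume a playback level to relate digital sample values to dB SPL, and it combines this threshold with masking effects; both steps are outside this sketch.

```python
import math


def absolute_threshold_db(f_hz: float) -> float:
    """Threshold in quiet T_q(f) in dB SPL, for a frequency f_hz > 0 in Hz."""
    k = f_hz / 1000.0  # frequency in kHz
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)


def noise_is_inaudible(noise_db_spl: float, f_hz: float) -> bool:
    """True if a quantization-noise component at f_hz stays below the threshold in quiet."""
    return noise_db_spl < absolute_threshold_db(f_hz)


if __name__ == "__main__":
    for f in (100, 1000, 3300, 10000, 16000):
        print(f"{f:>6} Hz: {absolute_threshold_db(f):6.1f} dB SPL")
```

The curve dips below 0 dB SPL around 3 to 4 kHz, where the ear is most sensitive, and rises steeply at both ends. This is why an encoder can tolerate much more quantization noise at very low and very high frequencies.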
2. Critical Band
• The human ear can be viewed as a discrete set of band-pass filters that covers the entire frequency range up to 20 kHz.
• The inner ear, called the cochlea, contains frequency-sensitive positions. Any tone entering the cochlea travels along it until it reaches the position where it resonates.
• The "critical bandwidth" is a function of frequency that quantifies the pass bands of these cochlear filters (unit: Bark, where one Bark corresponds to one critical band).
• As the center frequency increases, the critical bandwidth also increases.
• Spectral analysis of audio content is performed using critical bands.
The critical bandwidth at center frequency f (in Hz) is given by
$$BW_c(f) = 25 + 75\left[1 + 1.4\left(\frac{f}{1000}\right)^{2}\right]^{0.69} \quad \text{Hz}$$
[Figure: Idealized critical band filter bank]
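The critical-bandwidth formula can be evaluated directly. The sketch below also counts how many uniformly spaced transform lines fit into one critical band. It assumes 1024 lines at 44.1 kHz, matching the AAC long block size mentioned in the comparison slide. This shows, roughly, why a coder that analyses audio "using critical bands" groups only a few lines per band at low frequencies and many lines per band at high frequencies. Both helper names are hypothetical.

```python
def critical_bandwidth_hz(f_hz: float) -> float:
    """Approximate critical bandwidth BW_c(f) in Hz around center frequency f_hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


def lines_per_critical_band(f_hz: float, fs_hz: float = 44_100, n_lines: int = 1024) -> float:
    """How many uniformly spaced transform lines (spacing fs/(2N)) fit in one critical band."""
    line_spacing = fs_hz / (2 * n_lines)
    return critical_bandwidth_hz(f_hz) / line_spacing


if __name__ == "__main__":
    for f in (100, 1000, 5000, 10000):
        print(f"{f:>6} Hz: BW_c = {critical_bandwidth_hz(f):7.1f} Hz, "
              f"~{lines_per_critical_band(f):5.1f} lines")
```

At 1 kHz the formula gives about 162 Hz. Near 10 kHz the critical bandwidth grows to more than 2 kHz.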
3. Masking
• Masking refers to a process in which one sound is rendered inaudible by the presence of another sound.
Advanced Audio Coding
Modular encoding: AAC takes a modular approach to encoding. Depending on the complexity of the bitstream to be encoded, the desired performance and the acceptable output, implementers may create profiles that define which specific set of tools they want to use for a particular application. The standard offers four default profiles:
• Low Complexity (LC): the simplest, and the most widely used and supported
• Main Profile (MAIN): like the LC profile, with the addition of backward-adaptive prediction
• Sample-Rate Scalable (SRS), also known as Scalable Sample Rate (MPEG-4 AAC-SSR)
• Long Term Prediction (LTP): added in the MPEG-4 standard; an improvement on the MAIN profile that uses a forward predictor with lower computational complexity
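In MPEG-4 systems (for example .mp4/.m4a files), the tool set in use is signalled as an "audio object type". This appears in the first bits of the AudioSpecificConfig defined in ISO/IEC 14496-3: 5 bits of object type, 4 bits of sampling-frequency index and 4 bits of channel configuration. The sketch below decodes only that simple case. Escape codes, explicitly coded frequencies and the extra fields that follow for SBR/PS are deliberately not handled, and the function name is hypothetical. Object types 5 (SBR) and 29 (PS) correspond to the HE-AAC and HE-AAC v2 tools described later in this presentation.

```python
# Subset of MPEG-4 Audio Object Type IDs from ISO/IEC 14496-3
AUDIO_OBJECT_TYPES = {
    1: "AAC Main",
    2: "AAC LC",
    3: "AAC SSR",
    4: "AAC LTP",
    5: "SBR",
    29: "PS",
}

# samplingFrequencyIndex table; indices 13 and 14 are reserved, 15 means "explicit"
SAMPLING_FREQUENCIES_HZ = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                           22050, 16000, 12000, 11025, 8000, 7350]


def parse_audio_specific_config(data: bytes) -> tuple[str, int, int]:
    """Decode the leading fields of an AudioSpecificConfig (simple case only)."""
    if len(data) < 2:
        raise ValueError("need at least 2 bytes")
    bits = int.from_bytes(data[:2], "big")
    object_type = bits >> 11                  # top 5 bits
    freq_index = (bits >> 7) & 0x0F           # next 4 bits
    channel_config = (bits >> 3) & 0x0F       # next 4 bits
    if object_type == 31 or freq_index == 15:
        raise NotImplementedError("escape/explicit values are not handled in this sketch")
    if freq_index >= len(SAMPLING_FREQUENCIES_HZ):
        raise ValueError(f"reserved sampling frequency index {freq_index}")
    name = AUDIO_OBJECT_TYPES.get(object_type, f"other ({object_type})")
    return name, SAMPLING_FREQUENCIES_HZ[freq_index], channel_config


if __name__ == "__main__":
    print(parse_audio_specific_config(bytes([0x12, 0x10])))
```

The two bytes 0x12 0x10 describe an LC stream at 44.1 kHz with channel configuration 2 (stereo).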
[Figure: MPEG-2 AAC block diagrams]
[Figure: MPEG AAC family]
MPEG-4 AAC LC
Perceptual Noise Substitution [PNS]
• Instead of trying to reproduce a waveform similar to the input signal, this model-based coding tries to generate an output sound that is perceptually similar.
• PNS encoding consists of two steps:
  (1) Noise detection: for each frame of the input signal, the encoder performs an analysis and determines whether the spectral data in each scale-factor band is a noise component.
  (2) Noise compression: all spectral samples in the noise-like scale-factor bands are excluded from the subsequent quantization and entropy-coding module. Instead, only a PNS flag and the energy of these samples are included in the bitstream.
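The two PNS steps can be sketched for a single scale-factor band as follows. As an assumption, noise detection here uses a spectral flatness measure (geometric mean over arithmetic mean of the band's power) with a hypothetical threshold of 0.5. Real encoders choose their own detection rule. On the decoder side, the band is refilled with random noise scaled to the transmitted energy. All names are illustrative and do not come from the standard.

```python
import math
import random

# Hypothetical decision threshold; real encoders use their own tonality analysis.
FLATNESS_THRESHOLD = 0.5


def spectral_flatness(band: list[float]) -> float:
    """Geometric mean / arithmetic mean of the band's power (1.0 = perfectly flat)."""
    power = [c * c + 1e-12 for c in band]     # epsilon avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in power) / len(power))
    arithmetic = sum(power) / len(power)
    return geometric / arithmetic


def pns_encode_band(band: list[float]) -> dict:
    """Step (1) noise detection and step (2) noise compression for one band."""
    if spectral_flatness(band) > FLATNESS_THRESHOLD:
        # Noise-like: send only a flag and the band energy
        return {"pns": True, "energy": sum(c * c for c in band), "width": len(band)}
    # Tonal: the coefficients would go on to quantization and entropy coding
    return {"pns": False, "coeffs": list(band)}


def pns_decode_band(payload: dict, rng: random.Random) -> list[float]:
    """Substitute random noise scaled to the transmitted energy."""
    if not payload["pns"]:
        return payload["coeffs"]
    noise = [rng.gauss(0.0, 1.0) for _ in range(payload["width"])]
    gain = math.sqrt(payload["energy"] / sum(n * n for n in noise))
    return [n * gain for n in noise]


if __name__ == "__main__":
    flat_band = [1.0 if i % 2 else -1.0 for i in range(16)]   # flat power spectrum
    tonal_band = [10.0 if i == 5 else 0.0 for i in range(16)]  # single strong peak
    rng = random.Random(0)
    for band in (flat_band, tonal_band):
        payload = pns_encode_band(band)
        decoded = pns_decode_band(payload, rng)
        print(payload["pns"], round(sum(c * c for c in decoded), 6))
```

The substituted band has a completely different waveform but the same energy, which is all the ear can judge for a noise-like band.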
MPEG-4 HE-AAC
Spectral Band Replication [SBR]
• Developed by the company Coding Technologies
• SBR is a bandwidth extension tool.
• The key property exploited is the high correlation between the low- and high-frequency content of an audio signal.
• In an SBR-based coding system, waveform audio coding is used only for the lower frequencies of the audio signal. This low-frequency content is then used to recreate the high-frequency content on the decoding side.
• This is done with a state-of-the-art transposition method.
• The reconstruction of the high band is guided by transmitted information, such as the spectral envelope of the original input signal, or additional information that compensates for potentially missing high-frequency components.
• This guiding information is referred to as SBR data.
• The recreated high-frequency content undergoes some frequency- and time-domain adjustment before it is combined with the low-frequency part of the audio signal.
• HE-AAC, also known as aacPlus v1
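The SBR principle (waveform-code the low band, send only an envelope for the high band, then rebuild the high band from the low band) can be illustrated with a toy model. This is a heavily simplified sketch under stated assumptions. It works on a plain real-valued spectrum rather than the QMF subband samples that real SBR uses. It uses a simple "copy-up" patch as the transposition, and adjusts only whole-band energies, with no noise-floor or time-grid handling. Function names are hypothetical.

```python
import math


def sbr_encode(spectrum: list[float], crossover: int, band_width: int):
    """Keep the low band for the core coder; reduce the high band to an energy envelope."""
    high = spectrum[crossover:]
    if len(high) % band_width:
        raise ValueError("high band must split into whole envelope bands")
    envelope = [sum(c * c for c in high[i:i + band_width])
                for i in range(0, len(high), band_width)]
    return spectrum[:crossover], envelope


def sbr_decode(low: list[float], envelope: list[float], band_width: int) -> list[float]:
    """Recreate the high band by copying up low-band content and matching the envelope."""
    high: list[float] = []
    for b, target in enumerate(envelope):
        # Simple "copy-up" transposition, wrapping around the low band if needed
        patch = [low[(b * band_width + k) % len(low)] for k in range(band_width)]
        energy = sum(c * c for c in patch)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high.extend(c * gain for c in patch)
    return list(low) + high


if __name__ == "__main__":
    spectrum = [4.0, -3.0, 2.5, -2.0, 1.5, -1.2, 1.0, -0.8,   # low band (waveform-coded)
                0.6, -0.5, 0.4, -0.3, 0.2, -0.2, 0.1, -0.1]   # high band (replicated)
    low, env = sbr_encode(spectrum, crossover=8, band_width=4)
    rebuilt = sbr_decode(low, env, band_width=4)
    print([round(x, 3) for x in rebuilt])
```

Only two envelope values replace eight high-band coefficients here. The rebuilt high band has the right energy per band but not the original fine structure, which matches the trade-off SBR makes.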
MPEG-4 HE-AAC v2
Parametric Stereo
• It is also a contribution from Coding Technologies.
• In the encoder, only a monaural downmix of the original stereo signal is coded, after the Parametric Stereo data has been extracted.
• As with SBR data, these parameters are embedded as PS side information in the ancillary part of the bitstream.
• In the decoder, the monaural signal is decoded first. The stereo signal is then reconstructed from the stereo parameters embedded by the encoder.
• Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image:
  – Inter-channel Intensity Difference (IID), describing the intensity difference between the channels
  – Inter-channel Cross-Correlation (ICC), describing the cross-correlation or coherence between the channels. The coherence is measured as the maximum of the cross-correlation as a function of time lag or phase.
  – Inter-channel Phase Difference (IPD), describing the phase difference between the channels
• HE-AAC v2, also known as aacPlus v2
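The three stereo parameters can be computed for one frequency band of complex subband samples as shown below. This is a sketch under stated assumptions. Taking the magnitude of the complex cross-correlation is one way to realise "maximum over phase". Parameter quantization and the time/frequency grids used in real HE-AAC v2 are omitted. The function names are hypothetical. The plain average used as the downmix is also an illustrative choice.

```python
import cmath
import math


def ps_parameters(left: list[complex], right: list[complex]) -> tuple[float, float, float]:
    """Return (IID in dB, ICC, IPD in radians) for one band of complex subband samples."""
    e_left = sum(abs(x) ** 2 for x in left)
    e_right = sum(abs(x) ** 2 for x in right)
    cross = sum(xl * xr.conjugate() for xl, xr in zip(left, right))
    iid_db = 10.0 * math.log10(e_left / e_right)
    # |cross| equals the maximum of the normalized real correlation over all phase rotations
    icc = abs(cross) / math.sqrt(e_left * e_right)
    ipd = cmath.phase(cross)
    return iid_db, icc, ipd


def mono_downmix(left: list[complex], right: list[complex]) -> list[complex]:
    """The mono signal that the core coder actually transmits."""
    return [(xl + xr) / 2 for xl, xr in zip(left, right)]


if __name__ == "__main__":
    left = [1 + 0j, 2 + 0j]
    right = [0.5j, 1j]     # left at half amplitude, rotated by 90 degrees
    print(ps_parameters(left, right))
    print(mono_downmix(left, right))
```

For this example the right channel is the left channel at half amplitude, rotated by 90 degrees. The parameters come out as an IID of about 6 dB, full coherence (ICC = 1) and an IPD of -π/2, so a few numbers per band describe the stereo image.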
Advantages Over MP3
AAC
1. Multichannel audio: up to 48 audio channels
2. Sampling frequencies from 8 kHz to 96 kHz
3. Simpler filter bank (pure MDCT)
4. Better stationary and transient response, owing to block sizes of 1024 and 128 samples
5. Excellent handling of high-frequency signals
6. CD-quality audio at 64 kbit/s
7. Much better audio quality at low bit rates (down to 32 kbit/s)
MP3
1. Stereo signal: a maximum of only 2 channels
2. Sampling frequencies from 16 kHz to 48 kHz
3. Hybrid filter bank (requires more computational power)
4. Poorer stationary and transient response, owing to block sizes of 576 and 192 samples
5. Signal handling up to 15.5/15.8 kHz
6. CD-quality audio at 128 kbit/s
7. Poorer audio quality at low bit rates, with possible coding artifacts
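The block-size entries can be made concrete with a little arithmetic. The sketch below treats each block as N uniformly spaced transform lines with a hop of N samples, as in an MDCT. It reports the time step and the spectral line spacing at an assumed sampling rate of 48 kHz. For MP3's hybrid filter bank this is only an approximation. The helper name is hypothetical.

```python
def block_resolution(n_lines: int, fs_hz: float) -> tuple[float, float]:
    """(time step in ms, spectral line spacing in Hz) for a block of n_lines coefficients."""
    time_step_ms = 1000.0 * n_lines / fs_hz
    line_spacing_hz = fs_hz / (2 * n_lines)
    return time_step_ms, line_spacing_hz


if __name__ == "__main__":
    fs = 48_000
    for codec, n in (("AAC long", 1024), ("AAC short", 128),
                     ("MP3 long", 576), ("MP3 short", 192)):
        t, df = block_resolution(n, fs)
        print(f"{codec:>9}: {t:6.2f} ms per block, {df:6.1f} Hz per line")
```

Long blocks give fine frequency resolution for stationary signals (about 23 Hz per line for AAC at 48 kHz). Short blocks give fine time resolution for transients (about 2.7 ms for AAC versus 4 ms for MP3). The wider spread between AAC's two block sizes is what the comparison above credits for its better stationary and transient behaviour.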
Disadvantages
• Transparency is lost at very low bit rates when SBR is used.
• There is a small loss of stereo image when PS is used.
APPLICATIONS
• HE-AAC was chosen as the audio coding for DAB+, the upgraded version of DAB (Digital Audio Broadcasting).
• HE-AAC is the audio coding used in DRM (Digital Radio Mondiale).
• It is the default format of Apple's iPod.
• It is used in mobile phones to store songs.
• It is the audio coding used in the 3GP and 3GPP formats.
• It is the audio coding used in DTH services [MPEG-4].
• It is used for internet streaming.
• It is an audio format for Bluetooth stereo/mono headsets [A2DP – Advanced Audio Distribution Profile] (optional).
Conclusion
AAC – the perceptual audio coding the world is going to adopt completely
References
Sites
• www.wikipedia.org
• www.hydrogenaudio.org
• www.codingtechnologies.com
• www.mp3-tech.org/aac.html
Books
• High-Fidelity Multichannel Audio Coding – Dai Tracy Yang, Chris Kyriakakis and C.-C. Jay Kuo
• Introduction to Data Compression – Khalid Sayood
Papers
• ISO/IEC Standards [13818-7, 14496-3]
• MP3 and AAC Explained – Karlheinz Brandenburg [Father of MP3]
• CT-aacPlus – a state-of-the-art audio coding scheme – Martin Dietz and Stefan Meltzer
• MPEG-4 HE-AAC v2 – audio coding for today's media world – Stefan Meltzer and Gerald Moser
• ………
THANK YOU
INTRODUCTIONbull Advanced Audio Coding (AAC) is a standardized lossy compression and
encoding scheme for digital audio
bull Its standardized (defined) in ISOIEC 13818-7 [MPEG-2] ISOIEC 14496-3 [MPEG-4]
bull Developed with the cooperation and contribution of companies including Fraunhofer IIS ATampT Bell Laboratories Dolby Sony Co and Nokia
bull Designed to be the successor of the well-known audio compression format MP3
bull Filename extension m4a m4b m4p m4v m4r 3gp mp4 aac
bull It is currently the most powerful multichannel audio coding algorithm in MPEG family
INTRODUCTION TO DIGITAL AUDIO
bull Before the introduction of digital audio audio signals have been represented in analog form
bull Main disadvantages of analog audio Compression Rendering Quality Enhancement
bull Representing audio signals in digital form allows us to achieve the above goals more easily
bull The idea behind digital audio is to use numbers to represent the physical sound via an analog-to-digital (AD) conversion process
bull The AD conversion process involves sampling and quantization
Continuehellip
bull Sampling Each samplersquos amplitude as a function of a discrete index the rate at which each sample is extracted the sampling frequency or the sampling rate which is described in terms of number of samples per second or Hertz (Hz)
bull Quantization Sample resolution or bit depth determines how precisely the samplersquos amplitude is recorded or stored An n-bit sample resolution allows 2^n different possible amplitude values
Continuehellipbull Encoding The sampled and quantized signals are encoded using
some error correction codes and are stored in a media
bull CD AUDIO Itrsquos the most commonly used media for storing and transporting of digital audio
Sampling Rate 44100Hz (Nyquist Criteria satisfied for 20KHz) Sample Resolution 16-bit (ADC) Size (1minStereo) 60 x 2 x 44100 x 16 = 10584 MBmin Filename cda cdda
bull Generally they are uncompressed PCM data
bull The large amount of data makes them not suitable for internet streaming and digital broadcasting because of large bandwidth
HERE ARISE THE NEED FOR COMPRESSION
Compression Techniques
bull Any compression technique belongs to either lossy compression or lossless compression
bull Lossless Compression ndash If data is losslessly compressed the original data can be recovered
exactly from the compressed datandash As name implies involve no loss of information
bull Lossy compression ndash Involves some loss of informationndash Data that have been lossy compressed generally cannot be
recovered exactlyndash By accepting the above we can achieve higher compression ratios
than lossless compression
Perceptual Audio Coding
bull One of the key elements in the development of reduced bit rate audio is the understanding and application of psychoacoustics
bull All of the current perceptual audio coders achieve high compression rates by exploiting the fact that signal information that cannot be detected by even a well-trained listener can be discarded
bull Human hearing is insensitive to quiet frequency components to sound accompanying other stronger frequency components
bull Stereo audio streams contain largely redundant information
bull Irrelevant signal information is identified during signal analysis by incorporating into the coder several psychoacoustic principles
Principles of Psychoacoustics
1 Absolute Threshold of Hearing
The absolute threshold of hearing characterizes the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment
It can be expressed with a non-linear function
Tq(f) = 364(f1000)-08 - 65e-06(f1000-33)2 + 10-3(f1000)4 (dB SPL)
Equal loudness contours for pure tones
Continuehellip
bull When applied to signal compression it could be interpreted as a maximum allowable energy level for coding distortions introduced in the frequency domain
bull So using this information the noise levels during quantization are tried to fit below this threshold
bull Due to this quantization noise does not become audible
2 Critical Band
bull Human ear can be viewed as a discrete set of band pass filters which covers the entire 20kHz frequency range
bull The inner ear called as rdquoCochleardquo contains frequency sensitive positions Whenever any tone enters the cochlea it moves until it reaches the position where it resonates
bull The ldquocritical bandwidthrdquo is a function of frequency that quantifies the cochlear filter pass bands (unit ndash Bark)
bull As the center frequency goes on increasing the bark-width also goes on increasing
bull Spectral analysis of audio content is performed using critical bands
Bark-width with center frequency lsquofrsquo is gives as hellip BWc(f) = 25 + 75(1 + 14(f100)2)069 Hz
Idealized critical band filter bank
3 Masking
bull Masking refers to a process where one sound is rendered inaudible because of the presence of another sound
Advanced Audio CodingModular encoding AAC takes a modular approach to encoding Depending on the
complexity of the bitstream to be encoded the desired performance and the acceptable output implementers may create profiles to define which of a specific set of tools they want use for a particular application The standard offers four default profiles
bull Low Complexity (LC) - the simplest and most widely used and supported
bull Main Profile (MAIN) - like the LC profile with the addition of backwards prediction
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
INTRODUCTION TO DIGITAL AUDIO
bull Before the introduction of digital audio audio signals have been represented in analog form
bull Main disadvantages of analog audio Compression Rendering Quality Enhancement
bull Representing audio signals in digital form allows us to achieve the above goals more easily
bull The idea behind digital audio is to use numbers to represent the physical sound via an analog-to-digital (AD) conversion process
bull The AD conversion process involves sampling and quantization
Continuehellip
bull Sampling Each samplersquos amplitude as a function of a discrete index the rate at which each sample is extracted the sampling frequency or the sampling rate which is described in terms of number of samples per second or Hertz (Hz)
bull Quantization Sample resolution or bit depth determines how precisely the samplersquos amplitude is recorded or stored An n-bit sample resolution allows 2^n different possible amplitude values
Continuehellipbull Encoding The sampled and quantized signals are encoded using
some error correction codes and are stored in a media
bull CD AUDIO Itrsquos the most commonly used media for storing and transporting of digital audio
Sampling Rate 44100Hz (Nyquist Criteria satisfied for 20KHz) Sample Resolution 16-bit (ADC) Size (1minStereo) 60 x 2 x 44100 x 16 = 10584 MBmin Filename cda cdda
bull Generally they are uncompressed PCM data
bull The large amount of data makes them not suitable for internet streaming and digital broadcasting because of large bandwidth
HERE ARISE THE NEED FOR COMPRESSION
Compression Techniques
bull Any compression technique belongs to either lossy compression or lossless compression
bull Lossless Compression ndash If data is losslessly compressed the original data can be recovered
exactly from the compressed datandash As name implies involve no loss of information
bull Lossy compression ndash Involves some loss of informationndash Data that have been lossy compressed generally cannot be
recovered exactlyndash By accepting the above we can achieve higher compression ratios
than lossless compression
Perceptual Audio Coding
bull One of the key elements in the development of reduced bit rate audio is the understanding and application of psychoacoustics
bull All of the current perceptual audio coders achieve high compression rates by exploiting the fact that signal information that cannot be detected by even a well-trained listener can be discarded
bull Human hearing is insensitive to quiet frequency components to sound accompanying other stronger frequency components
bull Stereo audio streams contain largely redundant information
bull Irrelevant signal information is identified during signal analysis by incorporating into the coder several psychoacoustic principles
Principles of Psychoacoustics
1 Absolute Threshold of Hearing
The absolute threshold of hearing characterizes the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment
It can be expressed with a non-linear function
Tq(f) = 364(f1000)-08 - 65e-06(f1000-33)2 + 10-3(f1000)4 (dB SPL)
Equal loudness contours for pure tones
Continuehellip
bull When applied to signal compression it could be interpreted as a maximum allowable energy level for coding distortions introduced in the frequency domain
bull So using this information the noise levels during quantization are tried to fit below this threshold
bull Due to this quantization noise does not become audible
2 Critical Band
bull Human ear can be viewed as a discrete set of band pass filters which covers the entire 20kHz frequency range
bull The inner ear called as rdquoCochleardquo contains frequency sensitive positions Whenever any tone enters the cochlea it moves until it reaches the position where it resonates
bull The ldquocritical bandwidthrdquo is a function of frequency that quantifies the cochlear filter pass bands (unit ndash Bark)
bull As the center frequency goes on increasing the bark-width also goes on increasing
bull Spectral analysis of audio content is performed using critical bands
Bark-width with center frequency lsquofrsquo is gives as hellip BWc(f) = 25 + 75(1 + 14(f100)2)069 Hz
Idealized critical band filter bank
3 Masking
bull Masking refers to a process where one sound is rendered inaudible because of the presence of another sound
Advanced Audio CodingModular encoding AAC takes a modular approach to encoding Depending on the
complexity of the bitstream to be encoded the desired performance and the acceptable output implementers may create profiles to define which of a specific set of tools they want use for a particular application The standard offers four default profiles
bull Low Complexity (LC) - the simplest and most widely used and supported
bull Main Profile (MAIN) - like the LC profile with the addition of backwards prediction
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
Perceptual Audio Coding
bull One of the key elements in the development of reduced bit rate audio is the understanding and application of psychoacoustics
bull All of the current perceptual audio coders achieve high compression rates by exploiting the fact that signal information that cannot be detected by even a well-trained listener can be discarded
bull Human hearing is insensitive to quiet frequency components to sound accompanying other stronger frequency components
bull Stereo audio streams contain largely redundant information
bull Irrelevant signal information is identified during signal analysis by incorporating into the coder several psychoacoustic principles
Principles of Psychoacoustics
1 Absolute Threshold of Hearing
The absolute threshold of hearing characterizes the amount of energy needed in a pure tone such that it can be detected by a listener in a noiseless environment
It can be expressed with a non-linear function
Tq(f) = 364(f1000)-08 - 65e-06(f1000-33)2 + 10-3(f1000)4 (dB SPL)
Equal loudness contours for pure tones
Continuehellip
bull When applied to signal compression it could be interpreted as a maximum allowable energy level for coding distortions introduced in the frequency domain
bull So using this information the noise levels during quantization are tried to fit below this threshold
bull Due to this quantization noise does not become audible
2 Critical Band
bull Human ear can be viewed as a discrete set of band pass filters which covers the entire 20kHz frequency range
bull The inner ear called as rdquoCochleardquo contains frequency sensitive positions Whenever any tone enters the cochlea it moves until it reaches the position where it resonates
bull The ldquocritical bandwidthrdquo is a function of frequency that quantifies the cochlear filter pass bands (unit ndash Bark)
bull As the center frequency goes on increasing the bark-width also goes on increasing
bull Spectral analysis of audio content is performed using critical bands
Bark-width with center frequency lsquofrsquo is gives as hellip BWc(f) = 25 + 75(1 + 14(f100)2)069 Hz
Idealized critical band filter bank
3 Masking
bull Masking refers to a process where one sound is rendered inaudible because of the presence of another sound
Advanced Audio CodingModular encoding AAC takes a modular approach to encoding Depending on the
complexity of the bitstream to be encoded the desired performance and the acceptable output implementers may create profiles to define which of a specific set of tools they want use for a particular application The standard offers four default profiles
bull Low Complexity (LC) - the simplest and most widely used and supported
bull Main Profile (MAIN) - like the LC profile with the addition of backwards prediction
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
MPEG-4 HE-AAC v2
Parametric Stereo
• It is also a contribution from “Coding Technologies”
• In the encoder, the Parametric Stereo data are extracted first, and then only a monaural downmix of the original stereo signal is coded
• Just like the SBR data, these parameters are embedded as PS side information in the ancillary part of the bitstream
• In the decoder, the monaural signal is decoded first; the stereo signal is then reconstructed from the stereo parameters embedded by the encoder
• Three types of parameters can be used in a Parametric Stereo system to describe the stereo image:
• Inter-channel Intensity Difference (IID), describing the intensity difference between the channels
• Inter-channel Cross-Correlation (ICC), describing the cross-correlation or coherence between the channels; the coherence is measured as the maximum of the normalized cross-correlation as a function of time lag or phase
• Inter-channel Phase Difference (IPD), describing the phase difference between the channels
• HE-AAC v2 is also known as aacPlus v2
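The three stereo parameters can be estimated from one time/frequency tile of the two channels. This minimal sketch assumes the channels are already available as complex subband samples. HE-AAC v2 obtains these from a hybrid QMF bank; here they are simply given as lists. It also uses a plain average as the downmix, which is an assumption rather than the standard's encoder. IID is taken as the left-to-right energy ratio in dB. ICC is the magnitude of the normalized complex cross-correlation, which equals its maximum over all phase rotations. IPD is the angle of that same cross-correlation. Function names are hypothetical.

```python
import cmath
import math

def ps_parameters(left, right):
    """Hypothetical helper: IID (dB), ICC and IPD (radians) for one band
    of complex subband samples from the left and right channels."""
    e_l = sum(abs(x) ** 2 for x in left)
    e_r = sum(abs(x) ** 2 for x in right)
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    iid = 10.0 * math.log10(e_l / e_r)          # intensity difference
    icc = abs(cross) / math.sqrt(e_l * e_r)     # max of normalized cross-correlation over phase
    ipd = cmath.phase(cross)                    # phase difference
    return iid, icc, ipd

def ps_downmix(left, right):
    """Mono signal that is actually coded (plain average: an assumption)."""
    return [(l + r) / 2 for l, r in zip(left, right)]
```

In a real system these parameters are quantized and sent per band. The decoder then uses them, together with a decorrelated copy of the mono signal, to rebuild the stereo image.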
Advantages Over MP3
AAC
1. Multichannel audio: up to 48 audio channels
2. Sampling frequencies from 8 kHz to 96 kHz
3. Simpler filter bank (pure MDCT)
4. Better stationary and transient response, due to block sizes of 1024 and 128 samples
5. Excellent handling of high-frequency signals
6. CD-quality audio at 64 kbit/s
7. Much better audio quality at low bit rates (down to 32 kbit/s)
MP3
1. Stereo signal: a maximum of only 2 channels
2. Sampling frequencies from 16 kHz to 48 kHz
3. Hybrid filter bank (polyphase filter bank followed by an MDCT; needs more computational power)
4. Poorer stationary and transient response, due to block sizes of 576 and 192 samples
5. Signal handling only up to 15.5/15.8 kHz
6. CD-quality audio at 128 kbit/s
7. Poorer audio quality at low bit rates, with possible coding artifacts
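Items 3 and 4 both come down to AAC's filter bank. It is a single MDCT that switches between long blocks (1024 coefficients from 2048 windowed samples) and short blocks (128 coefficients from 256 samples). Below is a minimal sketch of the MDCT itself. It is written as the direct O(N^2) sum rather than the FFT-based form real encoders use. It uses a sine window (AAC also defines a Kaiser-Bessel-derived window) and omits the transition windows needed to switch between block sizes. The sketch shows the property that makes a pure-MDCT filter bank practical. Consecutive frames overlap by 50%, and the time-domain aliasing each frame introduces cancels on overlap-add (TDAC), so the signal is reconstructed exactly. Function names are hypothetical.

```python
import math

def sine_window(n2):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / n2) for n in range(n2)]

def mdct(block, window):
    """2N samples -> N coefficients (windowed, direct O(N^2) form, sqrt(2/N) scaling)."""
    n = len(block) // 2
    scale = math.sqrt(2.0 / n)
    return [scale * sum(window[i] * block[i] *
                        math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                        for i in range(2 * n))
            for k in range(n)]

def imdct(coeffs, window):
    """N coefficients -> 2N windowed samples that still contain time-domain aliasing."""
    n = len(coeffs)
    scale = math.sqrt(2.0 / n)
    return [window[i] * scale * sum(coeffs[k] *
                                    math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                                    for k in range(n))
            for i in range(2 * n)]

def mdct_roundtrip(signal, n):
    """Frames of 2N samples with hop N; overlap-add cancels the aliasing (TDAC)."""
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * (2 * n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        frame = imdct(mdct(padded[start:start + 2 * n], window), window)
        for i, v in enumerate(frame):
            out[start + i] += v
    return out[n:n + len(signal)]
```

In AAC, N would be 1024 for long blocks and 128 for short blocks. Small values keep the direct loops fast here. Long blocks give fine frequency resolution for stationary sounds, and short blocks give fine time resolution around transients.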
Disadvantages
• Transparency is lost at very low bit rates when SBR is used
• Small loss of stereo image when PS is used
APPLICATIONS
• HE-AAC was chosen as the audio coding for DAB+ (the upgraded version of Digital Audio Broadcasting)
• HE-AAC is the coding used in DRM (Digital Radio Mondiale)
• It is the default format in Apple's iPod
• It is used in mobile phones to store songs
• It is one of the audio codings used in the 3GP/3GPP file formats
• It is the audio coding used in DTH (direct-to-home) services [MPEG-4]
• It is used for Internet streaming
• It is an audio format for Bluetooth stereo/mono headsets [A2DP: Advanced Audio Distribution Profile] (optional)
Conclusion
AAC – the perceptual audio coding scheme the world is going to adopt completely
THANK YOU
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
Equal loudness contours for pure tones
Continuehellip
bull When applied to signal compression it could be interpreted as a maximum allowable energy level for coding distortions introduced in the frequency domain
bull So using this information the noise levels during quantization are tried to fit below this threshold
bull Due to this quantization noise does not become audible
2 Critical Band
bull Human ear can be viewed as a discrete set of band pass filters which covers the entire 20kHz frequency range
bull The inner ear called as rdquoCochleardquo contains frequency sensitive positions Whenever any tone enters the cochlea it moves until it reaches the position where it resonates
bull The ldquocritical bandwidthrdquo is a function of frequency that quantifies the cochlear filter pass bands (unit ndash Bark)
bull As the center frequency goes on increasing the bark-width also goes on increasing
bull Spectral analysis of audio content is performed using critical bands
Bark-width with center frequency lsquofrsquo is gives as hellip BWc(f) = 25 + 75(1 + 14(f100)2)069 Hz
Idealized critical band filter bank
3 Masking
bull Masking refers to a process where one sound is rendered inaudible because of the presence of another sound
Advanced Audio CodingModular encoding AAC takes a modular approach to encoding Depending on the
complexity of the bitstream to be encoded the desired performance and the acceptable output implementers may create profiles to define which of a specific set of tools they want use for a particular application The standard offers four default profiles
bull Low Complexity (LC) - the simplest and most widely used and supported
bull Main Profile (MAIN) - like the LC profile with the addition of backwards prediction
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
Continuehellip
bull When applied to signal compression it could be interpreted as a maximum allowable energy level for coding distortions introduced in the frequency domain
bull So using this information the noise levels during quantization are tried to fit below this threshold
bull Due to this quantization noise does not become audible
2 Critical Band
bull Human ear can be viewed as a discrete set of band pass filters which covers the entire 20kHz frequency range
bull The inner ear called as rdquoCochleardquo contains frequency sensitive positions Whenever any tone enters the cochlea it moves until it reaches the position where it resonates
bull The ldquocritical bandwidthrdquo is a function of frequency that quantifies the cochlear filter pass bands (unit ndash Bark)
bull As the center frequency goes on increasing the bark-width also goes on increasing
bull Spectral analysis of audio content is performed using critical bands
Bark-width with center frequency lsquofrsquo is gives as hellip BWc(f) = 25 + 75(1 + 14(f100)2)069 Hz
Idealized critical band filter bank
3 Masking
bull Masking refers to a process where one sound is rendered inaudible because of the presence of another sound
Advanced Audio CodingModular encoding AAC takes a modular approach to encoding Depending on the
complexity of the bitstream to be encoded the desired performance and the acceptable output implementers may create profiles to define which of a specific set of tools they want use for a particular application The standard offers four default profiles
bull Low Complexity (LC) - the simplest and most widely used and supported
bull Main Profile (MAIN) - like the LC profile with the addition of backwards prediction
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
2 Critical Band
bull Human ear can be viewed as a discrete set of band pass filters which covers the entire 20kHz frequency range
bull The inner ear called as rdquoCochleardquo contains frequency sensitive positions Whenever any tone enters the cochlea it moves until it reaches the position where it resonates
bull The ldquocritical bandwidthrdquo is a function of frequency that quantifies the cochlear filter pass bands (unit ndash Bark)
bull As the center frequency goes on increasing the bark-width also goes on increasing
bull Spectral analysis of audio content is performed using critical bands
Bark-width with center frequency lsquofrsquo is gives as hellip BWc(f) = 25 + 75(1 + 14(f100)2)069 Hz
Idealized critical band filter bank
3 Masking
bull Masking refers to a process where one sound is rendered inaudible because of the presence of another sound
Advanced Audio CodingModular encoding AAC takes a modular approach to encoding Depending on the
complexity of the bitstream to be encoded the desired performance and the acceptable output implementers may create profiles to define which of a specific set of tools they want use for a particular application The standard offers four default profiles
bull Low Complexity (LC) - the simplest and most widely used and supported
bull Main Profile (MAIN) - like the LC profile with the addition of backwards prediction
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
Idealized critical band filter bank
3 Masking
bull Masking refers to a process where one sound is rendered inaudible because of the presence of another sound
Advanced Audio CodingModular encoding AAC takes a modular approach to encoding Depending on the
complexity of the bitstream to be encoded the desired performance and the acceptable output implementers may create profiles to define which of a specific set of tools they want use for a particular application The standard offers four default profiles
bull Low Complexity (LC) - the simplest and most widely used and supported
bull Main Profile (MAIN) - like the LC profile with the addition of backwards prediction
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LC
Perceptual Noise Substitution [PNS]
• Instead of trying to reproduce a waveform that is similar to the input signal, model-based coding tries to generate a perceptually similar sound as output
• The encoding of PNS includes two steps:
(1) Noise detection: for the input signal in each frame, the encoder performs some analysis and determines whether the spectral data in a scale-factor band belong to a noise component.
(2) Noise compression: all spectral samples in the noise-like scale-factor bands are excluded from the subsequent quantization and entropy-coding module. Instead, only a PNS flag and the energy of these samples are included in the bitstream.
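The two PNS steps can be sketched as follows, together with the matching decoder step, which substitutes random noise scaled to the transmitted energy. The detection rule is an assumption. MPEG-4 standardizes only the bitstream and the decoder, so this sketch uses a simple spectral-flatness test with a hypothetical threshold. The band layout, function names and seed are also illustrative. A real codec transmits a quantized energy value rather than a float.

```python
import math
import random

FLATNESS_THRESHOLD = 0.2  # hypothetical tuning value


def spectral_flatness(band):
    """Geometric mean / arithmetic mean of the coefficient powers (0..1)."""
    powers = [x * x + 1e-12 for x in band]  # tiny offset avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in powers) / len(powers))
    arithmetic = sum(powers) / len(powers)
    return geometric / arithmetic


def pns_encode(bands):
    """Step 1 (detection) and step 2 (compression) for each scale-factor band."""
    coded = []
    for band in bands:
        if spectral_flatness(band) > FLATNESS_THRESHOLD:
            # Noise-like band: send only a PNS flag plus (energy, width).
            coded.append(("noise", (sum(x * x for x in band), len(band))))
        else:
            # Tonal band: keep the coefficients for normal quantization/coding.
            coded.append(("tonal", list(band)))
    return coded


def pns_decode(coded, seed=0):
    """Decoder: substitute random noise carrying the transmitted band energy."""
    rng = random.Random(seed)
    spectrum = []
    for kind, payload in coded:
        if kind == "noise":
            energy, width = payload
            noise = [rng.gauss(0.0, 1.0) for _ in range(width)]
            scale = math.sqrt(energy / sum(n * n for n in noise))
            spectrum.extend(n * scale for n in noise)
        else:
            spectrum.extend(payload)
    return spectrum


if __name__ == "__main__":
    tonal = [0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    noisy = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
    coded = pns_encode([tonal, noisy])
    print([kind for kind, _ in coded])  # ['tonal', 'noise']
```

The decoded noise band has different sample values from the input, but it has the same energy. That is all PNS aims to preserve for noise-like bands.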
MPEG-4 HE-AAC
Spectral Band Replication [SBR]
• Developed by a German-based company, “Coding Technologies”
• SBR is a bandwidth extension tool
• It exploits the high correlation between the low- and high-frequency content of an audio signal
• In an SBR-based coding system, waveform audio coding is used only for the lower frequencies of an audio signal. This low-frequency content is then used to recreate the high-frequency content at the decoder
• This is done by a transposition method that shifts low-band content up in frequency
• The reconstruction of the high band is guided by transmitted information. This includes the spectral envelope of the original input signal, plus additional information that compensates for potentially missing high-frequency components
• This guiding information is referred to as SBR data
• The recreated high-frequency content undergoes frequency- and time-domain adjustment before it is combined with the low-frequency part of the audio signal
• HE-AAC is also known as aacPlus v1
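The following is a toy version of the SBR decoder steps described above. First, the decoded low band is transposed (copied) upward. Each high-band segment is then scaled so that its energy matches the transmitted envelope. This simplified sketch works on a plain list of spectral values. Real SBR operates on QMF subband samples. It also uses noise-floor and sinusoid data, time-varying envelopes and patching rules, none of which are modelled here. The function name and the equal-width segmentation are assumptions.

```python
import math


def sbr_reconstruct(low_band, envelope, n_high):
    """Hypothetical SBR-style high-band regeneration.

    low_band: coefficients decoded by the core (waveform) coder
    envelope: transmitted target energies, one per equal-width high-band segment
    n_high:   number of high-band coefficients to regenerate
    """
    if n_high < len(envelope):
        raise ValueError("need at least one coefficient per envelope segment")
    # Transposition: patch the low band upward, repeating it if necessary.
    high = [low_band[i % len(low_band)] for i in range(n_high)]
    # Envelope adjustment: scale each segment to its transmitted energy.
    seg = n_high // len(envelope)
    for k, target in enumerate(envelope):
        start = k * seg
        end = n_high if k == len(envelope) - 1 else start + seg
        energy = sum(x * x for x in high[start:end])
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        high[start:end] = [x * gain for x in high[start:end]]
    return list(low_band) + high


if __name__ == "__main__":
    full = sbr_reconstruct([1.0, 2.0, 3.0, 4.0], envelope=[8.0, 2.0], n_high=4)
    print(len(full))  # 8: low band followed by regenerated high band
```

Only the envelope energies cross the channel for the high band, which is why SBR data is so compact compared with waveform-coding those frequencies.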
MPEG-4 HE-AAC v2
Parametric Stereo
• It is also a contribution from “Coding Technologies”
• In the encoder, the Parametric Stereo data are extracted first, and then only a monaural downmix of the original stereo signal is coded
• Just like the SBR data, these parameters are then embedded as PS side information in the ancillary part of the bitstream
• In the decoder, the monaural signal is decoded first. The stereo signal is then reconstructed from the stereo parameters embedded by the encoder
• Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image:
  ◦ Inter-channel Intensity Difference (IID), describing the intensity difference between the channels
  ◦ Inter-channel Cross-Correlation (ICC), describing the cross-correlation or coherence between the channels. The coherence is measured as the maximum of the cross-correlation as a function of time or phase
  ◦ Inter-channel Phase Difference (IPD), describing the phase difference between the channels
• HE-AAC v2 is also known as aacPlus v2
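The encoder/decoder split described above can be sketched for a single band in a single frame, under these assumptions:

- samples are real-valued;
- IID is computed from the band energies;
- ICC is the normalized cross-correlation at zero lag only, rather than the maximum over time lag or phase;
- IPD is omitted;
- the decoder upmixes using level gains alone, so it reproduces the input only when the channels are fully correlated and in phase (ICC = 1).

Actual HE-AAC v2 decoders also use a decorrelator to restore ICC values below 1. The function names are hypothetical.

```python
import math


def ps_encode(left, right):
    """Extract (mono downmix, IID in dB, ICC) for one band of one frame."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    if e_left == 0 or e_right == 0:
        raise ValueError("both channels need non-zero energy in this sketch")
    iid_db = 10.0 * math.log10(e_left / e_right)
    icc = sum(l * r for l, r in zip(left, right)) / math.sqrt(e_left * e_right)
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    return mono, iid_db, icc


def ps_decode(mono, iid_db):
    """Upmix using only the level difference (assumes ICC = 1, no IPD)."""
    ratio = 10.0 ** (iid_db / 20.0)  # amplitude ratio left/right
    g_left = 2.0 * ratio / (1.0 + ratio)
    g_right = 2.0 / (1.0 + ratio)
    return [g_left * m for m in mono], [g_right * m for m in mono]


if __name__ == "__main__":
    left = [1.0, -2.0, 3.0, 0.5]
    right = [0.5 * x for x in left]  # same signal, 6 dB quieter
    mono, iid, icc = ps_encode(left, right)
    print(round(iid, 2), round(icc, 3))  # 6.02 1.0
```

Only the mono signal and two numbers per band are transmitted. This is why PS saves so many bits, and also why some stereo image is lost when the channels are not fully correlated.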
Advantages over MP3
AAC
1. Multichannel audio: up to 48 audio channels
2. Sampling frequencies from 8 kHz to 96 kHz
3. Simpler filter bank (pure MDCT)
4. Better stationary and transient response due to block sizes of 1024 and 128 samples
5. Excellent handling of high-frequency signals
6. CD-quality audio at 64 kbit/s
7. Much better audio quality at low bit rates (down to 32 kbit/s)
MP3
1. Stereo signal: a maximum of only 2 channels
2. Sampling frequencies from 16 kHz to 48 kHz
3. Hybrid filter bank (needs more computational power)
4. Poorer stationary and transient response due to block sizes of 576 and 192 samples
5. Signal handling only up to 15.5/15.8 kHz
6. CD-quality audio at 128 kbit/s
7. Audio quality is poorer at low bit rates, and coding artifacts may appear
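Two of the figures in this comparison can be made concrete with a little arithmetic:

- **Block sizes → time resolution.** Shorter blocks localize transients better.
- **Bit rates → compression ratios.** The quoted rates are compared against CD audio (44.1 kHz, 16-bit, stereo PCM, which is 1411.2 kbit/s).

The sketch assumes a 44.1 kHz sampling rate and counts only the block length in samples. Actual MDCT windows overlap, so the analysis windows are longer than the block lengths shown.

```python
SAMPLE_RATE_HZ = 44_100
CD_BITRATE_KBPS = SAMPLE_RATE_HZ * 16 * 2 / 1000  # 16-bit stereo PCM: 1411.2 kbit/s


def block_duration_ms(block_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Time span covered by one block of new samples."""
    return 1000.0 * block_samples / sample_rate_hz


def compression_ratio(coded_kbps: float) -> float:
    """How many times smaller than CD PCM the coded stream is."""
    return CD_BITRATE_KBPS / coded_kbps


if __name__ == "__main__":
    for codec, long_block, short_block in [("AAC", 1024, 128), ("MP3", 576, 192)]:
        print(f"{codec}: long {block_duration_ms(long_block):.1f} ms, "
              f"short {block_duration_ms(short_block):.1f} ms")
    for codec, kbps in [("AAC", 64), ("MP3", 128)]:
        print(f"{codec} at {kbps} kbit/s: about {compression_ratio(kbps):.0f}:1")
```

At 44.1 kHz, the results are:

- **AAC blocks:** about 23.2 ms (long) and 2.9 ms (short).
- **MP3 blocks:** about 13.1 ms (long) and 4.4 ms (short).
- **Compression:** the quoted CD-quality bit rates correspond to roughly 22:1 for AAC and 11:1 for MP3.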
Disadvantages
• Transparency is lost at very low bit rates when SBR is used
• Small loss of stereo image when PS is used
APPLICATIONS
• HE-AAC was chosen as the audio coding for DAB+, the upgraded version of DAB (Digital Audio Broadcasting)
• HE-AAC is the audio coding used in DRM (Digital Radio Mondiale)
• It is the default format on Apple's iPod
• Used in mobile phones to store songs
• It is the audio coding used in the 3GP and 3GPP formats
• It is the audio coding used in DTH services [MPEG-4]
• Internet streaming
• Audio format in Bluetooth stereo/mono headsets [ A2DP: Advanced Audio Distribution Profile ] (optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
3 Masking
bull Masking refers to a process where one sound is rendered inaudible because of the presence of another sound
Advanced Audio CodingModular encoding AAC takes a modular approach to encoding Depending on the
complexity of the bitstream to be encoded the desired performance and the acceptable output implementers may create profiles to define which of a specific set of tools they want use for a particular application The standard offers four default profiles
bull Low Complexity (LC) - the simplest and most widely used and supported
bull Main Profile (MAIN) - like the LC profile with the addition of backwards prediction
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
Advanced Audio CodingModular encoding AAC takes a modular approach to encoding Depending on the
complexity of the bitstream to be encoded the desired performance and the acceptable output implementers may create profiles to define which of a specific set of tools they want use for a particular application The standard offers four default profiles
bull Low Complexity (LC) - the simplest and most widely used and supported
bull Main Profile (MAIN) - like the LC profile with the addition of backwards prediction
bull Sample-Rate Scalable (SRS) - aka Scalable Sample Rate (MPEG-4 AAC-SSR)
bull Long Term Prediction (LTP) - added in the MPEG-4 standard - an improvement of the MAIN profile using a forward predictor with lower computational complexity
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
MPEG-2 AAC BLOCK DIAGRAMS
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
MPEG AAC FAMILY
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
MPEG-4 AAC LCPerceptual Noise Substitution [PNS ]
bull Instead of trying to reproduce a waveform that is similar as input signals the model-based coding tries to generate a perceptually
similar sound as output
bull The encoding of PNS includes two steps (1) Noise detection For input signals in each frame the encoder
performs some analysis and determines if the spectral data in a scale-factor band belongs to noise component
(2) Noise compression All spectral samples in the noise-like scale-factor bands are excluded from the following quantization and entropy coding module Instead only a PNS flag and the energy of these samples are included in the bitstream
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
bull Its also a contribution from ldquoCoding Technologiesrdquo
bull In the encoder only a monaural downmix of the original stereo signal is coded after extraction of the Parametric Stereo data
bull Just like SBR data these parameters are then embedded as PS side information in the ancillary part of the bit-stream
bull In the decoder the monaural signal is decoded first After that the stereo signal is reconstructed based on the stereo parameters embedded by the encoder
bull Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image
1048705bull Inter-channel Intensity Difference (IID) describing the intensity
difference between the channels
bull Inter-channel Cross-Correlation (ICC) describing the cross correlation or coherence between the channels The coherence is measured as the maximum of the cross-correlation as a function of time or phase
bull Inter-channel Phase Difference (IPD) describing the phase difference between the channels
bull HE-AACv2 aka aacPlus v2
Continuehellip
Continuehellip
Advantages Over MP3 AAC
1 Multi Channel Audio ndash up to 48 audio channels
2 Sample frequencies from 8KHz ~ 96KHz
3 Simpler filter bank (pure MDCT used)
4 Better stationary and transient response due to block sizes of 1024 and 128 samples
5 Excellent handling of high frequency signals
6 CD quality audio at 64Kbitssec
7 Much better quality of audio at lower bit rates (down to 32Kbps)
MP3
1 Stereo signal ndash maximum of only 2 channels
2 Sampling frequencies from 16KHz ~ 48KHz
3 Hybrid filter bank ( more computational power)
4 Poorer stationary and transient response due to block sizes of 576 and 192 samples
5 Signal handling up to 155158 KHz
6 CD quality audio at 128Kbitssec
7 Audio quality is poorer at low bit rates and may present coding artifacts
Disadvantages
bull Transparency is lost at very low bit rates when SBR is used
bull Small loss of stereo image when PS is used
APPLICATIONS
bull HE-AAC was chosen as the coding used in DAB (Digital Audio Broadcasting)
bull HE-AAC is the coding used in DRM (Digital Radio Mondiale)bull Itrsquos the default format in Apples i-PODbull Used in mobile phone to store songsbull Itrsquos the audio coding used in 3gp and 3gpp formatbull Itrsquos the audio coding used in DTH services [MPEG-4]bull For Internet Streamingbull Audio format in Bluetooth StereoMono headsets
[ A2DP ndash Advanced Audio Distribution profile ] (Optional)
Conclusion
AAC ndash The perceptual audio coding the world is going to adapt completely
ReferencesSitesbull wwwwikipediaorgbull wwwhydrogenaudioorgbull wwwcodingtechnologiescombull wwwmp3-techorgaachtml
Booksbull High-Fidelity Multichannel Audio Coding - Dai Tracy Yang Chris Kyriakakis and
C-C Jay Kuobull Introduction To Data Compression - Khalid Sayood
Papersbull ISOIEC Standards [13818-7 14496-3]bull MP3 and AAC Explained Karlheinz Brandenburg [Father of MP3]bull CT-aacPlus - a state-of-the-art audio coding scheme Martin Dietz and Stefan
Meltzerbull MPEG-4 HE-AAC v2 - audio coding for todayrsquos media world Stefan Meltzer and
Gerald Moserbull helliphelliphellip
THANK YOU
MPEG-4 HE-AAC
Spectral Band Replication [ SBR ]bull Developed by a German based company ldquoCoding Technologiesrdquo
bull SBR is a bandwidth extension tool
bull The main effect used is the high correlation between the low- and high-frequency content in an audio signal
bull In an SBR-based coding system waveform audio coding is only used to code the lower frequencies of an audio signal This low frequency content is used to recreate the high frequency content at the decoding side
bull This is done by state-of-the-art transposition method
bull The reconstruction of the high band is conducted by transmitting guiding information such as the spectral envelope of the original input signal or additional information to compensate for potentially missing high-frequency components
bull This guiding information is referred to as SBR data
bull The recreated high-frequency content undergoes some frequency and time domain adjustment before it is combined with the low-frequency part of the audio signal
bull HE-AAC aka aacPlus v1
Continuehellip
Continuehellip
Continuehellip
MPEG-4 HE-AAC v2
Parametric Stereo
• It is also a contribution from "Coding Technologies"
• In the encoder, the Parametric Stereo (PS) data is extracted first, and then only a monaural downmix of the original stereo signal is coded
• Just like the SBR data, these parameters are embedded as PS side information in the ancillary part of the bitstream
• In the decoder, the monaural signal is decoded first; the stereo signal is then reconstructed from the stereo parameters embedded by the encoder
• Three types of parameters can be employed in a Parametric Stereo system to describe the stereo image:
  • Inter-channel Intensity Difference (IID), describing the intensity difference between the channels
  • Inter-channel Cross-Correlation (ICC), describing the cross-correlation or coherence between the channels; the coherence is measured as the maximum of the cross-correlation as a function of time or phase
  • Inter-channel Phase Difference (IPD), describing the phase difference between the channels
• HE-AAC v2 (HE-AAC combined with PS) is also known as aacPlus v2
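The parameter extraction and reconstruction described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the HE-AAC v2 algorithm:

* The parameters are computed once per broadband frame, whereas the real tool works per time/frequency tile.
* The downmix is assumed to be (L + R) / 2.
* ICC is taken as the maximum normalized cross-correlation over a few time lags.
* IPD is read from a single DFT bin.
* The decoder uses only the IID.

All function names are hypothetical, and the frames are assumed to be non-silent.

```python
import cmath
import math


def iid_db(left, right):
    """Inter-channel Intensity Difference: left/right energy ratio in dB."""
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)


def icc(left, right, max_lag=4):
    """Inter-channel Cross-Correlation: maximum of the normalized
    cross-correlation over time lags -max_lag..max_lag."""
    n = len(left)
    norm = math.sqrt(sum(x * x for x in left) * sum(x * x for x in right))
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        acc = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        best = max(best, acc / norm)
    return best


def ipd(left, right, k):
    """Inter-channel Phase Difference (radians) at DFT bin k of the frame."""
    n = len(left)

    def dft_bin(x):
        return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))

    return cmath.phase(dft_bin(left) * dft_bin(right).conjugate())


def ps_encode(left, right):
    """Extract the stereo parameters, then keep only a mono downmix."""
    params = {"iid": iid_db(left, right), "icc": icc(left, right)}
    mono = [(x_l + x_r) / 2.0 for x_l, x_r in zip(left, right)]
    return mono, params


def ps_decode_iid_only(mono, iid):
    """Rebuild L/R from the mono signal using only the IID: the gains keep
    g_left / g_right = 10^(IID/20) and g_left^2 + g_right^2 = 2.
    ICC (which needs a decorrelator) and IPD are ignored in this sketch."""
    ratio = 10.0 ** (iid / 20.0)
    g_right = math.sqrt(2.0 / (1.0 + ratio * ratio))
    g_left = ratio * g_right
    return [g_left * m for m in mono], [g_right * m for m in mono]
```

For example, suppose the right channel is the left channel scaled by 0.5. The sketch then gives an IID of about 6.02 dB, an ICC of 1 (fully coherent) and an IPD of 0. The decoder reproduces the 6 dB level difference from the mono signal alone.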
Advantages over MP3
AAC
1. Multichannel audio: up to 48 audio channels
2. Sampling frequencies from 8 kHz to 96 kHz
3. Simpler filter bank (a pure MDCT)
4. Better stationary and transient response, thanks to block sizes of 1024 samples (stationary signals) and 128 samples (transients)
5. Excellent handling of high-frequency signals
6. CD-quality audio at 64 kbit/s
7. Much better audio quality at low bit rates (down to 32 kbit/s)
MP3
1. Stereo signal: a maximum of only 2 channels
2. Sampling frequencies from 16 kHz to 48 kHz
3. Hybrid filter bank (polyphase plus MDCT; needs more computational power)
4. Poorer stationary and transient response, due to block sizes of 576 and 192 samples
5. Signal handling only up to about 15.5/15.8 kHz
6. CD-quality audio at 128 kbit/s
7. Poorer audio quality at low bit rates, with possible coding artifacts
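Items 3 and 4 of the comparison can be made concrete with a small sketch. It rests on these assumptions:

* It uses a direct O(N^2) MDCT with a sine window. AAC also defines a Kaiser-Bessel-derived window.
* Frames overlap by 50%.
* The test signal is made up.

The helper names are hypothetical, and this is not the AAC reference filter bank. The round trip shows that overlap-adding windowed MDCT frames cancels the time-domain aliasing (TDAC), so one transform is enough. `resolution` turns a block size N into a nominal line spacing fs/(2N) and a hop length N/fs.

```python
import math


def sine_window(length):
    """Sine window; for length 2N it satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame, window):
    """Direct O(N^2) MDCT: 2N windowed samples -> N spectral lines."""
    two_n = len(frame)
    n = two_n // 2
    x = [s * w for s, w in zip(frame, window)]
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5)) for i in range(two_n))
        for k in range(n)
    ]


def imdct(coeffs, window):
    """Inverse MDCT (scaled by 2/N) followed by the synthesis window."""
    n = len(coeffs)
    out = []
    for i in range(2 * n):
        s = sum(c * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                for k, c in enumerate(coeffs))
        out.append(2.0 / n * s * window[i])
    return out


def mdct_roundtrip(signal, n):
    """50%-overlapped MDCT analysis/synthesis with hop N; overlap-add cancels
    the time-domain aliasing and returns the original samples."""
    assert len(signal) % n == 0 and n % 2 == 0
    window = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * n  # every sample lies in two frames
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        frame = padded[start:start + 2 * n]
        for i, v in enumerate(imdct(mdct(frame, window), window)):
            out[start + i] += v
    return out[n:n + len(signal)]


def resolution(sample_rate, lines):
    """Nominal line spacing (Hz) and hop duration (ms) for N spectral lines."""
    return sample_rate / (2 * lines), 1000.0 * lines / sample_rate
```

At 48 kHz, `resolution` gives the following figures:

* AAC 1024-line blocks: 23.4 Hz spacing, 21.3 ms hop.
* AAC 128-line blocks: 187.5 Hz spacing, 2.7 ms hop.
* MP3 576-line blocks: about 41.7 Hz spacing, 12 ms hop.
* MP3 192-line blocks: 125 Hz spacing, 4 ms hop.

MP3's hybrid filter bank only approximates this line spacing, so treat the MP3 figures as nominal.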
Disadvantages
• Transparency is lost at very low bit rates when SBR is used
• Small loss of stereo image when PS is used
APPLICATIONS
• HE-AAC was chosen as the audio coding for DAB+, the upgraded version of DAB (Digital Audio Broadcasting)
• HE-AAC is the audio coding used in DRM (Digital Radio Mondiale)
• It is the default format of Apple's iPod
• It is used in mobile phones to store songs
• It is one of the audio codings used in the 3GP and 3GPP formats
• It is the audio coding used in MPEG-4 based DTH (direct-to-home) services
• Internet streaming
• Audio format in Bluetooth stereo/mono headsets [A2DP: Advanced Audio Distribution Profile] (optional)
Conclusion
AAC: the perceptual audio coding scheme the world is set to adopt completely
THANK YOU