EE 5359 Multimedia Processing Project Proposal Study and implementation of G.719 audio codec and performance analysis of G.719 with AAC (advanced audio codec) and HE-AAC (high efficiency-advanced audio codec) Name: Yashas Prakash Student ID :1000803680 Instructor: Dr. K. R. Rao Date: 04-25-2012


EE 5359 Multimedia Processing Project Proposal

Study and implementation of G.719 audio codec and performance analysis of G.719 with AAC (advanced audio codec) and HE-AAC (high efficiency-advanced audio codec)

 

Name: Yashas Prakash

Student ID :1000803680

Instructor: Dr. K. R. Rao

Date: 04-25-2012


List of acronyms

• AAC - Advanced audio coding
• ATSC - Advanced television systems committee
• AES - Audio Engineering Society
• EBU - European broadcasting union
• FLVQ - Fast lattice vector quantization
• HE-AAC - High efficiency advanced audio coding
• HRQ - Higher rate lattice vector quantization
• IMDCT - Inverse modified discrete cosine transform
• ISO - International organization for standardization
• ITU - International telecommunication union
• JAES - Journal of the Audio Engineering Society
• LC - Low complexity
• LRQ - Lower rate lattice vector quantization
• LFE - Low frequency enhancement
• LTP - Long term prediction
• MDCT - Modified discrete cosine transform
• MPEG - Moving picture experts group
• SBR - Spectral band replication
• SMR - Signal-to-mask ratio
• SRS - Sample rate scalable
• TDA - Time domain aliased
• WMOPS - Weighted million operations per second


Introduction to codecs

• A codec is a device or computer program capable of encoding or decoding a digital data stream or signal.

• It can be thought of as a compressor/decompressor or encoder/decoder.

• Codec programs are required for the media player to play audio/video files.

• A codec encodes a data stream or signal for transmission, storage or encryption and decodes it for playback or editing.

• Codecs are used in videoconferencing, streaming media and video editing applications.


Introduction to G.719 codec [1]

• G.719 is an ITU-T standard audio codec providing high-quality, moderate bit rate (32 to 128 kbit/s) fullband (20 Hz - 20 kHz audio bandwidth, 48 kHz audio sample rate) audio coding at low computational load [1].

• It was produced through a collaboration between Polycom and Ericsson.

• G.719 incorporates elements of Polycom's Siren22 codec (22 kHz) and Ericsson codec technology, as well as Polycom's Siren7 and Siren14 codecs (G.722.1 and G.722.1 Annex C), which have been used in videoconferencing systems for many years [1].


Advantages and Applications of G.719 [3]

• The algorithm is designed to provide 20 Hz - 20 kHz audio bandwidth using a 48 kHz sample rate, operating at 32 - 128 kbps [3].

• This codec features very high audio quality and low computational complexity and is suitable for use in applications such as videoconferencing, teleconferencing, and streaming audio over the Internet [3].


OVERVIEW OF THE G.719 CODEC [1]

• The G.719 codec is a low-complexity transform-based audio codec and can provide an audio bandwidth of 20 Hz to 20 kHz at 32 - 128 kbps.

• The codec features very high audio quality and extremely low computational complexity compared to other state-of-the-art audio coding algorithms.

• G.719 is optimized for both speech and music.

• It is based on transform coding with adaptive time resolution, adaptive bit allocation and low-complexity lattice vector quantization [1].

• The computational complexity is quite low (18 floating-point MIPS) for an efficient high-quality compressor [1].


G.719 (contd.)

• The codec operates on 20 ms frames, and the algorithmic delay end-to-end is 40 ms [2].

• The encoder input and decoder output are sampled at 48 kHz [2].
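The frame length, delay and sample rate quoted above translate directly into sample and bit counts. The short Python sketch below works through that arithmetic; the constant and function names are illustrative only and are not taken from the G.719 reference code.

```python
# Figures quoted on the slide [2]; the names are illustrative assumptions.
SAMPLE_RATE_HZ = 48_000        # encoder input / decoder output sample rate
FRAME_MS = 20                  # frame length
ALGORITHMIC_DELAY_MS = 40      # end-to-end algorithmic delay


def samples_in(duration_ms, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of samples covered by a duration given in milliseconds."""
    return sample_rate_hz * duration_ms // 1000


def bits_per_frame(bitrate_bps, frame_ms=FRAME_MS):
    """Bit budget available for one frame at a given bit rate."""
    return bitrate_bps * frame_ms // 1000


print("samples per frame:", samples_in(FRAME_MS))
print("delay in samples:", samples_in(ALGORITHMIC_DELAY_MS))
for rate in (32_000, 48_000, 64_000, 128_000):
    print(rate, "bit/s ->", bits_per_frame(rate), "bits per frame")
```

At 48 kHz a 20 ms frame therefore holds 960 samples, and the 40 ms delay corresponds to two frames (1920 samples).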


Block diagram of G.719 encoder

Block diagram of the G.719 encoder [1].


ADAPTIVE TIME-FREQUENCY-TRANSFORM

• The adaptive time-frequency transform is based on the detection of transient sounds [3].

• In the case of transient sounds, the time-frequency transform increases its time resolution, which allows a better representation of the rapid changes in the input signal characteristics [3].
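To make the idea of transient detection concrete, here is a minimal Python sketch of a generic energy-based detector. It is not the detector specified in G.719: the function name, the four-block split and the threshold value are all assumptions chosen for illustration.

```python
import math


def is_transient(frame, num_blocks=4, ratio_threshold=8.0):
    """Flag a frame as transient when the energy of one block jumps well
    above the average energy of the blocks before it.

    Generic illustration only; block count and threshold are assumptions.
    """
    size = len(frame) // num_blocks
    energies = [
        sum(s * s for s in frame[i * size:(i + 1) * size])
        for i in range(num_blocks)
    ]
    for i in range(1, num_blocks):
        previous_mean = sum(energies[:i]) / i
        # Small constant avoids a zero threshold after digital silence.
        if energies[i] > ratio_threshold * (previous_mean + 1e-12):
            return True
    return False


# A steady 1 kHz tone at 48 kHz (one 20 ms frame = 960 samples).
steady = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(960)]
# Silence followed by a sudden burst in the last quarter of the frame.
burst = [0.0] * 720 + [1.0] * 240

print(is_transient(steady), is_transient(burst))
```

A codec would use such a flag to switch the transform to its higher time resolution for the flagged frame.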


G.719 Decoder

Block diagram of the G.719 decoder [1].


Complexity in G.719

• Complexity is a paramount parameter for a codec. Complex codecs require more powerful and more expensive digital signal processors (DSPs) to run on [1].

• This increases the product cost and power consumption, which limits the codec usability [1].

• The fixed-point C-code implementation of G.719, which is an integral part of the recommendation by ITU-T, is based on a set of instructions that mimics a generic DSP instruction set [1].


An overview of AAC codec [9]

• The advanced audio coding (AAC) scheme was a joint development by Dolby, Fraunhofer, AT&T, Sony and Nokia [9].

• It is a digital audio compression scheme for medium to high bit rates which is not backward compatible with earlier moving picture experts group (MPEG) audio standards [9].

• AAC is a second-generation coding scheme which is used for stereo and multichannel signals. Compared to earlier perceptual coders, AAC provides more flexibility and uses more coding tools [12].


AAC codec (contd.)

• The AAC encoding follows a modular approach, and the standard defines four profiles, which can be chosen based on factors such as the complexity of the bitstream to be encoded and the desired performance and output [9]:
– Low complexity (LC)
– Main profile (MAIN)
– Sample-rate scalable (SRS)
– Long term prediction (LTP)


An overview of the HE-AAC codec [9]

• High-efficiency advanced audio coding (HE-AAC) is a lossy data compression scheme.

• It is an extension of low-complexity AAC, optimized for low bit rate applications such as streaming audio.

• HE-AAC uses spectral band replication (SBR) technology to enhance the compression efficiency in the frequency domain.

• Scientific testing by the European Broadcasting Union has indicated that HE-AAC at 48 kbit/s was ranked as "Excellent" quality using the MUSHRA scale [9].

• Testing indicates that material decoded from 64 kbit/s HE-AAC does not yet have similar audio quality to material decoded from MP3 at 128 kbit/s using high quality encoders.


Block diagram of the SBR encoder [15]


Block diagram of the SBR decoder [15]


Subjective performance of G.719 [1]


Explanation for the subjective performance of G.719 codec

• DMOS (degradation mean opinion score): a subjective measurement in which listeners sit in a quiet room and score the audio quality as they perceive it.
– Requirements: the talker should be seated in a quiet room with a reverberation time of less than 500 ms, and the room noise level should be below 30 dBA.

• DMOS ratings: 5 = excellent, 4 = good, 3 = fair, 2 = poor, 1 = annoying.

• Experiment 1: speech.

• Experiment 2: mixed content and music (speech, music and noise).

• The reference test vectors used in these experiments are in MPEG audio format.

• Studying the above graphs:
– In experiment 1, the G.719 codec performed better at all bit rates.
– In experiment 2, the G.719 codec performed better than the reference codec at the lowest bit rate and almost the same as the reference at all other bit rates.

• An additional subjective listening test for the G.719 codec was conducted later to evaluate the quality of the codec at rates higher than those described in the ITU-T test plan. Because the quality expectation at these high rates is high, a pre-selection of critical items, for which the quality at the lower bit rate range was most degraded, was conducted prior to testing. The test results are shown in Figure 7. They indicate that transparency was reached for critical material at 128 kbps.
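A DMOS value is the average of the individual listener votes on the 1 to 5 scale. The Python sketch below shows that calculation together with an approximate 95% confidence half-width. The votes are made-up example data, not results from the G.719 tests, and the function name is an assumption.

```python
import math
import statistics


def dmos_summary(votes):
    """Return (mean score, approximate 95% confidence half-width).

    Uses the normal approximation 1.96 * s / sqrt(n); needs >= 2 votes.
    """
    if any(v < 1 or v > 5 for v in votes):
        raise ValueError("votes must lie on the 1-5 rating scale")
    mean = statistics.mean(votes)
    half_width = 1.96 * statistics.stdev(votes) / math.sqrt(len(votes))
    return mean, half_width


# Hypothetical votes from eight listeners for one codec condition.
example_votes = [5, 4, 4, 3, 5, 4, 4, 5]
mean_score, half_width = dmos_summary(example_votes)
print(f"DMOS = {mean_score:.2f} +/- {half_width:.2f}")
```

Two conditions are usually considered to differ only when their confidence intervals do not overlap, which is why listening-test plots show error bars.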


Framework of G.719 audio codec

• The G.719 framework is defined by the transformation of time-domain signals into frequency-domain spectra.

• The transform is a modulated lapped transform (MLT), which is performed differently depending on the mode selected by the transient detection.

• The MLT consists of windowing followed by a modified DCT.

• The transient mode consists of further time segmentation into four sub-frames to improve the time resolution.

• Transients are detected from the time-domain signal in order to select the appropriate resolution: a fine time resolution for transients and a fine frequency resolution for stationary signals.

• The switching between the stationary and transient modes is instantaneous and does not require the use of a transient window.

• The MLT is applied to blocks of two consecutive frames, as shown in the frame buffering figure [16].

• The signal is sampled at 48 kHz, which satisfies the Nyquist criterion for the 20 kHz audio bandwidth; consecutive windowed blocks overlap.

• Due to the large frequency spread of the rectangular window, the frequency analysis can be contaminated by aliasing.

• In order to reduce the frequency spread and suppress the aliasing effect, windows without sharp discontinuities are used.
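The windowing, modified DCT and overlap between consecutive blocks can be sketched in a few lines of Python. This is a textbook sine-windowed MDCT with 50% overlap-add, not the G.719 reference implementation; the function names and the small block size used in the demo are assumptions (in G.719 one frame would be 960 samples, so each block would span 1920 samples).

```python
import math


def sine_window(m):
    """Sine window of length 2*m; satisfies w[n]^2 + w[n+m]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def mdct(block, m):
    """MDCT: 2*m windowed input samples -> m coefficients."""
    n0 = 0.5 + m / 2
    return [
        sum(block[n] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
            for n in range(2 * m))
        for k in range(m)
    ]


def imdct(coeffs, m):
    """Inverse MDCT: m coefficients -> 2*m time-aliased samples."""
    n0 = 0.5 + m / 2
    return [
        (2.0 / m) * sum(coeffs[k] * math.cos(math.pi / m * (n + n0) * (k + 0.5))
                        for k in range(m))
        for n in range(2 * m)
    ]


def analysis_synthesis(x, m):
    """Window, transform, inverse-transform, window again and overlap-add.

    Each block covers two consecutive frames of m samples (hop size m).
    """
    w = sine_window(m)
    out = [0.0] * len(x)
    for start in range(0, len(x) - 2 * m + 1, m):
        block = [x[start + n] * w[n] for n in range(2 * m)]
        y = imdct(mdct(block, m), m)
        for n in range(2 * m):
            out[start + n] += y[n] * w[n]
    return out


# Demo with a small, even frame size (assumption: m = 8 for readability).
M = 8
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(8 * M)]
rebuilt = analysis_synthesis(signal, M)
# Samples covered by two overlapping blocks are reconstructed; the first
# and last frame are covered by one block only and stay aliased.
max_error = max(abs(a - b) for a, b in zip(signal[M:-M], rebuilt[M:-M]))
print("max interior reconstruction error:", max_error)
```

The time-domain aliasing introduced by each block cancels when neighbouring blocks are added, which is why the overlap between frames is needed.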


Frame buffering and windowing with overlap [16]


Explanation for windowing of sub-frames

• In the transient mode of G.719, the time-aliased signal block is reversed in time and divided into four sub-frames.

• The reversal recreates the temporal coherence of the input signal that was destroyed by time-domain aliasing.

• The first and the last sub-frames are windowed by half sine windows with a fourth of zero padding, while the second and third sub-frames are windowed with an ordinary sine window.

• The overlap between windowed sub-frames is 50% and each segment is MDCT transformed.

• The transform lengths are equal in the stationary and transient mode of G.719.


Windowing of sub-frames in transient mode [16]


Steps for implementation

• Use a C compiler such as Dev-C++ to compile the code. Any C compiler can be used to generate the executable files.

• The encoder code is compiled to obtain encoder.exe, which is used to encode the input test vectors at 32, 48 and 64 kbps.

• The decoder code is compiled to obtain decoder.exe, which is used to decode the encoded test vectors at 32, 48 and 64 kbps, respectively.

• The decoded file is compared with the original test vector in the console to check whether the two are the same.


Console commands for execution

• The console command to encode a test vector at 32 kbps is as follows:

    encoder.exe -r 32000 -i *input file path\test_vector.raw -o *output file path\test_32000_en.bs

• The console command for the decoder at the same bit rate is as follows (note: the input file here is the file encoded at 32000 bps):

    decoder.exe -r 32000 -i *path of the input file\test_32000_en.bs -o *specific path of the output file\test_32000_dec.raw

• Note: It is advisable to keep the encoded and decoded files in the same root folder, as this makes it easier to compare the files once the sound frames have been encoded and decoded.

• To compare the files, type the console command:

    comp test_32000_dec.raw test_vector.raw

• This command validates that the decoded file is in fact the same as the test_vector.raw from which the file was encoded.

• Screenshots of the above commands run in the console are shown on the following slides.
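The encode, decode and compare steps can also be scripted. The Python sketch below only builds the argument lists and performs the byte-wise comparison. The executable names and the r, i and o options (written here with plain hyphens) are copied from the commands on this slide; the helper function names are hypothetical, and the paths passed in are placeholders.

```python
import filecmp


def encoder_command(bitrate_bps, input_path, output_path, exe="encoder.exe"):
    """Argument list for the encoder, using the options shown on the slide."""
    return [exe, "-r", str(bitrate_bps), "-i", input_path, "-o", output_path]


def decoder_command(bitrate_bps, input_path, output_path, exe="decoder.exe"):
    """Argument list for the decoder, using the options shown on the slide."""
    return [exe, "-r", str(bitrate_bps), "-i", input_path, "-o", output_path]


def files_identical(path_a, path_b):
    """Byte-by-byte comparison, similar in spirit to the console 'comp'."""
    return filecmp.cmp(path_a, path_b, shallow=False)


# Print the commands for the three bit rates used in this project.
for rate in (32000, 48000, 64000):
    print(" ".join(encoder_command(rate, "test_vector.raw", f"test_{rate}_en.bs")))
    print(" ".join(decoder_command(rate, f"test_{rate}_en.bs", f"test_{rate}_dec.raw")))
```

Each argument list can be handed to `subprocess.run(...)` once the executables have been built, after which `files_identical` replaces the manual `comp` step.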


Instructions to implement the encoder and decoder


Implemented encoder


Implementation of decoder


Comparison of decoded sequence with default test vector


References

• [1] M. Xie, P. Chu, A. Taleb and M. Briand, " A new low-complexity full band (20kHz) audio coding standard for high-quality conversational applications ", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp.265-268, Oct. 2009.

• [2] A. Taleb and S. Karapetkov, " The first ITU-T standard for high-quality conversational fullband audio coding ", IEEE communications magazine, vol.47, pp.124-130, Oct. 2009.

• [3] J. Wang, B. Chen, H. He, S. Zhao and J. Kuang, " An adaptive window switching method for ITU-T G.719 transient coding in TDA domain", IEEE International Conference on Wireless, Mobile and Multimedia Networks, pp.298-301, Jan. 2011.

• [4] J. Wang, N. Ning, X. Ji and J. Kuang, " Norm adjustment with segmental weighted SMR for ITU-T G.719 audio codec ", IEEE International Conference on Multimedia and Signal Processing, vol.2, pp.282-285, May 2011.


• [5] K. Brandenburg and M. Bosi, “ Overview of MPEG audio: current and future standards for low-bit-rate audio coding ” JAES, vol.45, pp.4-21, Jan/Feb. 1997.

• [6] A/52 B ATSC Digital Audio Compression Standard: http://www.atsc.org/cms/standards/a_52b.pdf

• [7] F. Henn , R. Böhm and S. Meltzer, “ Spectral band replication technology and its application in broadcasting ”, International broadcasting convention, 2003.

• [8] M. Dietz and S. Meltzer, “CT-AACPlus – a state of the art audio coding scheme”, Coding Technologies, EBU Technical review, July 2002.

• [9] ISO/IEC IS 13818-7, “ Information technology – Generic coding of moving pictures and associated audio information Part 7: advanced audio coding (AAC) ”, 1997.


• [10] M. Bosi and R. E. Goldberg, “ Introduction to digital audio coding and standards ”, Norwell, MA, Kluwer, 2003.

• [11] H. S. Malvar, “ Signal processing with lapped transforms ”, Artech House, Norwood, MA, 1992.

• [12] D. Meares, K. Watanabe and E. Scheirer, “ Report on the MPEG-2 AAC stereo verification tests ”, ISO/IEC JTC1/SC29/WG11, Feb. 1998.

• [13] Super (c) v.2012.build.50: A simplified universal player encoder and renderer, A graphic user interface to FFmpeg, Mencoder, Mplayer, x264, Musepack, Shorten audio, True audio, Wavpack, Libavcodec library and Theora/vorbis real producers plugin: www.erightsoft.com

• [14] T. Ogunfunmi and M. Narasimha, “ Principles of speech coding ”, Boca Raton, FL: CRC Press, 2010.

• [15] P. Ekstrand, " Bandwidth extension of audio signals by spectral band replication ", IEEE Workshop on model based processing and coding of audio, pp.53-58, Nov. 2002.

• [16] T. Johnson, " Stereo coding for ITU-T G.719 codec ", Uppsala university, May 2011.