Audio Coding

A Comparison of MPEG-1, MP3, MPEG-2, & MPEG-4

Applications and Performance Analysis

Sherida Subrati

Anthony Caliendo

The MPEG codecs evolved as the needs of the industry and the readily available technology changed over time. MPEG-1 was developed to distribute audio on CD media; coupled with the video portion of the codec, it provided VHS-quality audio and video distributable on CDs. The main features of MPEG-2 audio were the ability to encode multiple audio channels and improved algorithms for reducing encoding noise. This variation catered to the surge in surround sound, which was becoming standard in cinemas and home theatres. The Advanced Audio Coding (AAC) revision of MPEG-2 offers such high-quality encoding that it remains practically untouched even in the newer MPEG-4 standard, which mainly involves improvements in video encoding technology. MPEG-4 has the advantage of a simple mathematical structure, enabling rapid encoding and decoding, as well as a Low Delay (LD) version that enables high-quality audio over a narrow bandwidth. Perhaps the most widely known revision is MP3, or MPEG-1 Layer III, which is used throughout various industries to distribute high-quality, low-bit-rate stereo sound.

The intention of this research and comparative study is to determine where the efficiencies of each standard lie, despite their overlapping abilities. Each codec is geared towards a different audience, so, despite backwards compatibility, older codecs may actually provide more efficient encoding for a given application. MPEG-4, with its innovations and its attempt to encompass all applications of its codec “ancestors”, may make telephony or even music delivery services possible over very low bandwidth communication channels. The computational load, or complexity, of these encoding/decoding schemes will also be explored in order to show how certain codecs, being mathematically “simple”, are cheaper to implement in hardware. This will be shown through sample compressions of music and speech, while monitoring processor and memory usage as well as encoding duration.

Project Plan:

  • Subrati & Caliendo – Researching & analyzing MPEG-1/2/4 codec structures
  • Caliendo – Explore the effects and improvements made by AAC
  • Subrati – Obtain comparison tools/software that can be used to test the performance of each codec
  • Caliendo – Further in-depth analysis of each codec's encoding structure and the basic technology behind it, such as signal processing and psychoacoustics
  • Subrati – Establish a set of benchmarks to perform
  • Caliendo – Perform the benchmarks
  • Subrati & Caliendo – Quality comparison of the performance of each codec using music files, voice files, etc.
  • Subrati – From the previous comparison, establish the strengths and weaknesses of each codec for certain applications
  • Caliendo – Identify new applications made possible through codec improvements
  • Caliendo – Prepare presentation
  • Subrati – Prepare final report

Project Accomplishments:

  1. MPEG-1: This standard consists of three layers of increasing complexity, delay, and quality/performance. Each higher layer uses the layers below it as a platform. MPEG-1 supports four modes: mono, stereo, dual channel (two separate channels), and joint stereo.
  a. Layers I and II:

-Both use a polyphase filter bank with 32 subbands of equal width.

-Layer I uses a 512-point FFT, and Layer II a 1024-point FFT, to generate the psychoacoustic masking models.

-Layer II performs better, reaching a given quality at a lower bit rate, because it uses a redundancy reduction technique for the transmission of scalefactors.
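The filter bank and FFT sizes above translate into concrete frequency resolutions once a sampling rate is fixed. The following is a minimal sketch, assuming the 44.1 kHz rate used for our samples; the function names are our own and are not taken from the standard.

```python
def subband_width_hz(sample_rate_hz, num_subbands=32):
    """Width of each equal-size subband covering 0 Hz up to the Nyquist frequency."""
    return (sample_rate_hz / 2) / num_subbands


def fft_bin_spacing_hz(sample_rate_hz, fft_size):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / fft_size


def bins_per_subband(sample_rate_hz, fft_size, num_subbands=32):
    """How many FFT bins of the psychoacoustic model fall inside one subband."""
    return subband_width_hz(sample_rate_hz, num_subbands) / fft_bin_spacing_hz(
        sample_rate_hz, fft_size
    )


SAMPLE_RATE_HZ = 44_100  # assumption: the rate of our test samples

print("subband width (Hz):", subband_width_hz(SAMPLE_RATE_HZ))
for layer, fft_size in (("Layer I", 512), ("Layer II", 1024)):
    print(
        layer,
        "bin spacing (Hz):", fft_bin_spacing_hz(SAMPLE_RATE_HZ, fft_size),
        "bins per subband:", bins_per_subband(SAMPLE_RATE_HZ, fft_size),
    )
```

Under this assumption, the Layer II masking model sees each subband with twice as many spectral bins as the Layer I model does.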

  b. Layer III:

-Uses a hybrid filter bank, in which the 32 subband signals are further subdivided by a modified DCT (MDCT) block transform.

-Utilizes an analysis-by-synthesis approach (the scaling, quantization, and coding of data are performed within two nested iteration loops), pre-echo control, and non-uniform quantization with entropy coding.

-Also has a buffer technique, known as the bit reservoir, which saves additional bit rate: frames that need fewer bits leave the surplus for more demanding frames, while the decoder buffer neither overflows nor underflows when the bitstream is presented at a constant rate.

-Is the only layer with mandatory decoder support for variable bit rate coding.
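The bit reservoir idea can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the Layer III bitstream mechanism itself: the function name, the per-frame "demand" figures, and the reservoir cap are all hypothetical and chosen only for illustration.

```python
def allocate_with_reservoir(demands, frame_budget, reservoir_max):
    """Return the bits granted to each frame when unused bits carry over.

    demands       -- bits each frame would ideally use (hypothetical values)
    frame_budget  -- bits the constant-rate channel delivers per frame
    reservoir_max -- cap on saved bits, standing in for the decoder buffer limit
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = frame_budget + reservoir
        used = min(demand, available)
        granted.append(used)
        # Leftover bits are saved for later frames, but never beyond the cap.
        reservoir = min(available - used, reservoir_max)
    return granted


demands = [300, 300, 700, 400]  # the third frame is the "difficult" one
print("with reservoir:   ", allocate_with_reservoir(demands, 400, 500))
print("without reservoir:", allocate_with_reservoir(demands, 400, 0))
```

In this toy run the difficult third frame receives 600 bits with the reservoir but only 400 without it, while the total never exceeds what the constant-rate channel delivers.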

  2. MPEG-2: MPEG-2 aims at lifelike sound for audio-related applications, including video conferencing, electronic cinema, multimedia services, multilingual channels, and channels for the hearing impaired and visually impaired. This standard is the second phase of MPEG and supports two multichannel audio coding standards.
  a. Forward and Backward Compatibility with MPEG-1:

-The forward and backward compatibility aspect of MPEG means that new developments will not necessarily make existing equipment and programming obsolete.

  b. Non-backward Compatible (NBC):

-NBC coders do not provide a meaningful bit stream for an MPEG-1 stereo decoder; however, they produce high-quality reproduction of audio signals.

  c. MPEG-2 AAC:

-Basic modules have been developed that support the following features: optional preprocessing, time-to-frequency mapping (filter bank), psychoacoustic modeling, prediction, quantization and coding, noiseless coding, and bit stream formatting.

-This standard also offers three profiles:

  1. Main profile offers a scalable DCT block.
  2. Low complexity profile eliminates time domain prediction and limits temporal noise shaping to a lower order.
  3. Scalable sampling rate profile adds a preprocessing module.

  3. MPEG-4: MPEG-4 is one of the most recent iterations, combining all previous MPEG standards into one standard that covers a multitude of applications. It achieves higher quality for a given bit rate while sparing the computational complexity of MP3. Through optimized code, low-delay encoding and decoding of audio is also possible, and high-fidelity audio can be transmitted over a narrow bandwidth.

Three core coders are used, either individually or in tandem, to achieve high coding efficiency across a wide range of bit rates:

-a parametric coding scheme for low bit rate speech coding

-an analysis-by-synthesis coding scheme for medium bit rates (6 to 16 kb/s)

-a subband/transform-based coding scheme for higher bit rates[1]
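As a rough illustration of how the three core coders divide the bit rate range, the sketch below picks one coder per target bit rate. The 6 and 16 kb/s boundaries come from the medium range quoted above; treating them as hard cutoffs, and choosing exactly one coder rather than combining several, are our simplifying assumptions, and the function name is hypothetical.

```python
def pick_core_coder(bit_rate_kbps):
    """Pick one MPEG-4 core coder family for a target bit rate (simplified)."""
    if bit_rate_kbps < 6:
        return "parametric"
    if bit_rate_kbps <= 16:
        return "analysis-by-synthesis"
    return "subband/transform"


for rate in (2, 6, 12, 16, 64):
    print(rate, "kb/s ->", pick_core_coder(rate))
```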

The Low-Delay component of the codec opens this format to new applications such as IP telephony and other real-time voice communications. At very low bit rates, the codec substitutes sounds rather than trying to reproduce them.

  4. Comparisons: MPEG-1 Layers I, II, III and MPEG-4 low complexity and long-term prediction encoders

-Hardware: Pentium III 1 GHz CPU, 512 MB PC133 RAM, Windows 2000 with SP3

-Software: AVI2MP, LAMEwin32, NERO MPEG-4 AAC Encoder, GOLDWAVE sound editor.

-We used two sample music files originally in PCM 16-bit 44.1 kHz stereo format and one recorded voice sample having the format PCM 8-bit 44.1 kHz. These were all coded at various constant bit rates using the five codecs mentioned above. The bit rates, relative file sizes, encoding times, and quality ratings are all available in Appendix A.

-The file sizes at each bit rate did not vary between codecs, but quality did. The newer the codec, the better the quality at a given bit rate, and the quality of certain codecs, such as MP3 and MPEG-4 LTP, scaled up much more rapidly with bit rate than that of the others. Waveform comparison also showed that MP3 output had a 50 ms delay relative to the original waveform, while MPEG-1 Layers I and II had a delay of only about 10 ms. MPEG-4 waveforms could not be compared in the same fashion, since software that read these files accurately could not be obtained.
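A delay between an original and a decoded waveform can also be estimated numerically rather than read off a waveform display. The sketch below uses a plain cross-correlation search; it is an illustrative alternative, not the procedure used to obtain the figures above, and the short click signal is synthetic test data.

```python
def estimate_delay_samples(reference, decoded, max_lag):
    """Lag (in samples) at which the decoded signal best matches the reference."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Correlate reference[i] with decoded[i + lag] over the overlapping part.
        n = min(len(reference), len(decoded) - lag)
        score = sum(reference[i] * decoded[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def samples_to_ms(samples, sample_rate_hz):
    """Convert a delay in samples to milliseconds."""
    return 1000.0 * samples / sample_rate_hz


# Synthetic example: a short decaying click, delayed by 25 samples.
reference = [0.0] * 200
reference[10:13] = [1.0, -0.5, 0.25]
decoded = [0.0] * 25 + reference

lag = estimate_delay_samples(reference, decoded, max_lag=50)
print("estimated delay:", lag, "samples")
print("at 44.1 kHz that is", samples_to_ms(lag, 44_100), "ms")
```

At 44.1 kHz, a 50 ms delay corresponds to 2205 samples and a 10 ms delay to 441 samples, so `max_lag` would need to be at least that large for real recordings.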

Conclusion:

Data obtained through both research and actual encoding of sound samples provided largely expected results for factors such as quality and file size. As expected with constant bit rate encoding, the file sizes for each sample were the same regardless of codec. Quality increased steadily with bit rate, and newer codecs achieved much higher quality at each bit rate. The encode times were understandable as well. There are very few freeware MPEG-1 Layer I and II encoders, and equally few MPEG-4 encoders. The former is an older codec; the latter is so new, and its parameters are so far from settled because of licensing disputes, that few are willing to release actual encoders. MPEG-1 Layer III, however, is widespread and has been developed by countless people, so its encoder has gone through countless refinements over the years. An Excel spreadsheet with all of this data, as well as screenshots of certain waveforms, has been included in a zip file. (Note: The file name S2-M1L1064… means sample 1, mpeg1 layer 1 at 64kbps. All samples were encoded at 44.1 kHz.)
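The file size result follows from simple arithmetic: at a constant bit rate, the size depends only on the bit rate and the duration, not on the codec. The sketch below shows the calculation. The 60-second duration and the 128 and 192 kb/s rates are hypothetical values for illustration (the measured figures are in Appendix A), and container or header overhead is ignored.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kb/s (1 kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def cbr_size_bytes(bit_rate_kbps, duration_s):
    """Payload size of a constant bit rate stream, ignoring header overhead."""
    return bit_rate_kbps * 1000 * duration_s / 8


# Source format of the music samples: PCM 16-bit 44.1 kHz stereo.
source_kbps = pcm_bit_rate_kbps(44_100, 16, 2)
DURATION_S = 60  # assumption: hypothetical clip length

for rate_kbps in (64, 128, 192):
    size = cbr_size_bytes(rate_kbps, DURATION_S)
    ratio = source_kbps / rate_kbps
    print(f"{rate_kbps} kb/s: {size:.0f} bytes, compression ratio {ratio:.2f}:1")
```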

References:

  • Bouvigne, Gabriel. “MPEG-2/MPEG-4 – AAC”
  • Howitt, Wil. “Sub-Band Coding” Otolith 18th October 1995.
  • Noll, Peter. “MPEG Digital Audio Coding Standards” CRC Press LLC 1999. mpeg%20audio%20coding.pdf
  • “AAC-LD: MPEG-4 Audio Coding for Low Delay, High Quality Sound Applications”
  • “Introduction to MPEG-2”
  • “MPEG-4”
  • “MPEG-4: Scalable AAC Coding”
    download/scal_aac2.pdf
  • “MPEG AAC”

[1] Noll, Peter. “MPEG Digital Audio Coding Standards” CRC Press LLC 1999. mpeg%20audio%20coding.pdf