Texas Instruments Audio and Video/Imaging Series
The goal of digital compression algorithms is to produce a digital representation of an audio signal which, when decoded and reproduced, sounds the same as the original signal, while using a minimum of digital information (bit-rate) for the compressed (or encoded) representation. Different types of compression algorithms exist. This article provides an introduction to the AC-3 and MP3 algorithms. Any application that uses compression needs at least two components: a chip or coder for compression and a decoder for decompression.
For example, let's consider a 1D (audio) signal to demonstrate the two different domains. Consider a complicated sound such as an ambulance alarm. We can describe this sound in two related ways:
- Sample the amplitude of the sound many times a second, which gives an approximation to the sound as a function of time (time domain).
- Analyze the sound in terms of the pitches of the notes, or frequencies, and record the amplitude of each frequency (frequency domain).
In Figure 1, we have a signal that consists of a sinusoidal wave at 8 Hz. A frequency of 8 Hz means that the wave completes 8 cycles in 1 second.
Figure 1: Example of time domain
In Figure 2, we can see that our signal is composed of a single wave (one peak) at a frequency of 8 Hz with a magnitude/fraction of 1.0. In other words, that one wave is the entire signal.
Figure 2: Example of frequency domain
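The relationship between the two figures can be reproduced with a few lines of Python. This is only a minimal sketch: the 64 Hz sample rate and the `dft_magnitudes` helper are assumptions made for illustration, and a plain discrete Fourier transform is used instead of an optimized FFT.

```python
import cmath
import math

SAMPLE_RATE = 64  # samples per second (assumed for this sketch)
DURATION = 1      # seconds
FREQ = 8          # Hz, the sinusoid described for Figure 1

# Time domain: the amplitude sampled at regular instants.
n = SAMPLE_RATE * DURATION
signal = [math.sin(2 * math.pi * FREQ * t / SAMPLE_RATE) for t in range(n)]

def dft_magnitudes(samples):
    """Return the normalized magnitude of each frequency bin below Nyquist."""
    count = len(samples)
    mags = []
    for k in range(count // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / count)
                  for i, x in enumerate(samples))
        mags.append(2 * abs(acc) / count)
    return mags

# Frequency domain: one magnitude per frequency bin.
magnitudes = dft_magnitudes(signal)
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / n
print(f"peak at {peak_hz:.0f} Hz with magnitude {magnitudes[peak_bin]:.2f}")
```

With these assumed parameters the only significant bin is the one at 8 Hz, with a magnitude of about 1.0, which matches the single peak described for Figure 2.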
Another example of digital compression is the popular music files available on the Internet. The introduction of MP3 to the market made these possible by reducing the file size, and therefore the download time, of a transmitted song. What once took hours to download and occupied huge amounts of space could be downloaded much faster over the same modem and stored in around one-tenth the disk space, while retaining high-quality sound reproduction. As a result, music downloads have become popular on the Internet. It is worth mentioning, however, that it is the file content, not the file type, that may breach copyrights.
In this section, we compare at a high level how these two algorithms handle compression.
MP3 (MPEG-1 Audio Layer 3) is a form of compression that was standardized by MPEG in 1991. The MP3 compression technique is based on a psychoacoustic model of human sensitivity to frequencies. This model keeps audible frequencies and rejects all others. Human beings can hear frequencies in the range between 20 Hz and 20 kHz, and are most sensitive between 2 and 4 kHz. MP3 files thus consist of frequencies in the audible range and omit frequencies that cannot be heard. This is known as destructive (lossy) compression. The disadvantage is that frequencies rejected or eliminated in the creation of an MP3 cannot be restored.
While encoding a file into MP3, you can choose different compression levels. For a given recording, audio quality generally increases with file size: the larger the file, the higher the quality. An MP3 file encoded at 128 kbit/s will be of higher quality, and larger, than one encoded at 56 kbit/s. An advantage of MP3 compression is that people can back up their own collection of songs to their hard drive or CDs.
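The link between bit rate and file size is simple arithmetic, sketched below. The four-minute song length and the `file_size_megabytes` helper are hypothetical, and the uncompressed reference assumes standard CD audio (44.1 kHz, 16-bit, stereo), which the text above does not specify.

```python
def file_size_megabytes(bitrate_kbps, duration_seconds):
    """Size of a constant-bit-rate stream in megabytes (10**6 bytes)."""
    bits = bitrate_kbps * 1000 * duration_seconds
    return bits / 8 / 1_000_000

SONG_SECONDS = 4 * 60  # hypothetical four-minute song

# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
cd_kbps = 44_100 * 16 * 2 / 1000

for kbps in (cd_kbps, 128, 56):
    size = file_size_megabytes(kbps, SONG_SECONDS)
    print(f"{kbps:7.1f} kbit/s -> {size:5.2f} MB")
```

Under these assumptions the 128 kbit/s file is 3.84 MB, roughly one-eleventh of the uncompressed size, which is in line with the "around one-tenth" figure quoted earlier.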
The MP3 format uses, at its heart, a “hybrid transform” to transform a time domain signal into a frequency domain signal. This basic model is the same for all three layers of audio defined by MPEG, but codec complexity increases with each layer. The codec divides data into frames. A Layer 1 frame contains 384 samples, 12 samples from each of the 32 filtered sub-bands; Layer 2 and Layer 3 frames contain 1152 samples.
Steps in the MP3 algorithm:
- Use convolution filters to divide the audio signal (for example, sound sampled at 48 kHz) into 32 frequency sub-bands that approximate the critical bands (sub-band filtering).
- Determine the amount of masking for each band caused by nearby bands (the psychoacoustic model).
- If the power in a band is below the masking threshold, it is rejected.
- If the power is above the masking threshold, determine the number of bits needed to represent the coefficient such that the noise introduced by quantization is below the masking effect (each bit of quantizer resolution changes the noise level by about 6 dB).
- Format bitstream.
A high-quality critical band filter is used (non-equal frequencies) for MP3. In addition, the psychoacoustic model includes temporal masking effects, takes into account stereo redundancy, and uses a Huffman coder.
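The masking and bit-allocation steps listed above can be sketched as follows. This is an illustration under stated assumptions rather than the MP3 allocation procedure itself: the `allocate_bits` function, the 15-bit cap, and the per-band decibel values are all hypothetical, and the rule of roughly 6 dB per bit is applied directly.

```python
import math

DB_PER_BIT = 6.02  # approximate signal-to-noise gain per quantizer bit

def allocate_bits(band_power_db, mask_threshold_db, max_bits=15):
    """Return bits per band; 0 means the band is dropped as inaudible."""
    allocation = []
    for power, mask in zip(band_power_db, mask_threshold_db):
        if power <= mask:
            allocation.append(0)  # below the masking threshold: reject
            continue
        # Enough bits that quantization noise falls below the mask.
        needed = math.ceil((power - mask) / DB_PER_BIT)
        allocation.append(min(needed, max_bits))
    return allocation

# Hypothetical power and masking levels for four sub-bands, in dB.
power = [60.0, 35.0, 12.0, 48.0]
mask = [30.0, 40.0, 10.0, 36.0]
print(allocate_bits(power, mask))  # prints [5, 0, 1, 2]
```

The second band receives no bits because its power lies below the mask, while the first band, which stands 30 dB above its mask, needs five.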
Texas Instruments has successfully ported an MP3 decoder to the C5409, which offers 32 Kwords of RAM along with a 16 Kword ROM for holding tables and constants. The code development was accomplished using the Spectrum Digital C54x EVM. For real-time code development, the C5410 Internet Audio EVM developed within the Portable Audio Group at TI was used. This board is a microless design with a compact flash for holding music and a user interface for controlling playback. Support is provided for audio decompression, graphic equalization, and volume control. The DSP also handles user interface operations and compact flash I/O.
AC-3, developed by the Digital Coder group at Dolby Labs, is a high-quality, low-complexity multi-channel audio coder. By coding multiple channels as a single entity, it can operate at lower data rates for a given level of audio quality than an ensemble of equivalent single-channel coders.
Although AC-3 algorithms are independent of the number of channels coded, current implementations have standardized on the SMPTE-recommended 5.1 channel arrangement: five full-bandwidth channels representing Left, Center, Right, Left-Surround, and Right-Surround, plus a limited-bandwidth low-frequency Subwoofer channel. AC-3 conveys this channel arrangement with a high degree of transparency at data rates as low as 320 kbps. AC-3 has been implemented using available, cost-effective DSP hardware, and is designed to be readily ported to new DSP platforms.
The central philosophy behind AC-3 is that all channels are compressed together as an ensemble, where the total number of bits that can be accommodated by the media (which in this case is film) is distributed among the channels. The input to the AC-3 encoder is six-channel PCM audio (16- to 24-bit resolution and 48 kHz sampling rate).
Steps in the AC-3 algorithm:
- Transform each of the channels from the time to the frequency domain, using Time Domain Aliasing Cancellation (TDAC). Blocks of 512 samples (10.7 ms of audio at 48 kHz) are normally used, yielding 256 spectral coefficients. However, when a transient signal is detected, the block length is halved to 256 samples (about 5.3 ms) to minimize pre-echo.
- A mantissa and an exponent are then obtained by converting each of the spectral coefficients from all of the channels from a fixed-point binary number to floating-point notation. The mantissa is a fractional amount of the fixed-point number, and the exponent is a scaling factor by which the mantissa is multiplied to obtain the fixed-point number. Resolution is set by the word length of the mantissa, and the exponent determines the quantizing step size of the frequency component. Expressing spectral coefficients in floating-point form is advantageous because the exponents and mantissas can then be coded separately, each with its own strategy.
- The output compressed data stream consists of the mantissa and exponent information from all of the channels, plus auxiliary data for exponent coding, coupling coefficients, and bit allocation.
Decoding is the reverse of the encoding process.
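The exponent and mantissa split described in the steps above can be illustrated with a short sketch. It assumes that coefficients are fractions in the range (-1, 1) and that the exponent counts the leading zero bits of the fraction. The cap of 24 and both function names are assumptions made for this example, not values taken from the text.

```python
import math

MAX_EXPONENT = 24  # assumed cap on the shift count for this sketch

def to_exponent_mantissa(coefficient, mantissa_bits):
    """Split a coefficient in (-1, 1) into a shift count and a quantized mantissa."""
    # math.frexp returns (m, e) with coefficient == m * 2**e and 0.5 <= |m| < 1.
    _, binary_exp = math.frexp(coefficient)
    exponent = min(MAX_EXPONENT, max(0, -binary_exp))
    mantissa = coefficient * 2.0 ** exponent
    steps = 2 ** (mantissa_bits - 1)  # one bit is spent on the sign
    quantized = max(-steps, min(steps - 1, round(mantissa * steps)))
    return exponent, quantized

def from_exponent_mantissa(exponent, quantized, mantissa_bits):
    """Rebuild the coefficient: the mantissa scaled by 2**-exponent."""
    steps = 2 ** (mantissa_bits - 1)
    return (quantized / steps) * 2.0 ** (-exponent)

coefficient = 0.01
for bits in (4, 8):
    exponent, quantized = to_exponent_mantissa(coefficient, bits)
    rebuilt = from_exponent_mantissa(exponent, quantized, bits)
    print(f"{bits}-bit mantissa: exponent={exponent}, rebuilt={rebuilt:.6f}")
```

The exponent is the same for both word lengths, because it only fixes the scale of the coefficient. The mantissa word length sets how closely the rebuilt value matches the original.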
AC-3 uses several strategies to code the floating-point numbers. For a steady audio signal, the exponent information can be reused across as many as six blocks, a duration of about 64 ms. Dolby also determined that if the differences between exponents at adjacent frequencies are coded instead of the actual exponent values, only about two-bit resolution is required.
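A minimal sketch of differential exponent coding follows. The limit of two steps up or down per frequency bin, the function names, and the sample exponents are assumptions chosen for illustration. With that limit each difference takes one of five values, which is in the neighborhood of the two-bit figure mentioned above.

```python
def encode_exponent_deltas(exponents, max_step=2):
    """Code the first exponent as-is and the rest as clamped differences."""
    first = exponents[0]
    deltas = []
    previous = first
    for value in exponents[1:]:
        delta = max(-max_step, min(max_step, value - previous))
        deltas.append(delta)
        previous += delta  # track the value the decoder will reconstruct
    return first, deltas

def decode_exponent_deltas(first, deltas):
    """Rebuild the exponents by accumulating the differences."""
    exponents = [first]
    for delta in deltas:
        exponents.append(exponents[-1] + delta)
    return exponents

# Hypothetical exponents for eight adjacent frequency bins.
exps = [4, 5, 5, 7, 6, 6, 8, 9]
first, deltas = encode_exponent_deltas(exps)
print(first, deltas)
print(decode_exponent_deltas(first, deltas) == exps)
```

The encoder tracks the decoder's reconstruction rather than the true previous exponent, so a clamped difference does not cause the two sides to drift apart.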
Bit allocation measures are used to obtain more data reduction. The set of spectral coefficient exponents spanning the frequency range is a representation of the signal power along the spectrum. This set is grouped into bands, whose width increases with frequency, similar to the critical bands (psychoacoustic model of masking).
Each band has a common exponent. The word length of the mantissas within each of the bands is then determined by a bit allocation routine, which is based on a predicted masking curve over the entire spectral range. This masking curve is determined for each frequency band. If a band exponent lies above or below the masking curve (as calculated from a model), the value of the curve at that band frequency is accordingly incremented or decremented. The results from each of the bands are then combined to obtain the predicted masking curve. After making sure that all parts of this curve exceed the threshold of human auditory sensitivity, the mantissa for each frequency component (not for each band) is re-quantized, with the resolution corresponding to the extent to which its exponent exceeds the predicted masking value.
After the mantissas have been re-quantized, the number of bits consumed for all of the channels is counted. If the total number of bits available has not been exceeded, the mantissas can be quantized with greater accuracy. If the total has been exceeded, two measures can be invoked. The first is simply to decrease the resolution of the mantissas. The second is a technique known to Dolby as coupling, which, unlike the data reduction considered so far, does not treat each channel independently.
In the coupling technique, the mantissa information for frequency bands across multiple channels is combined into a single coupling channel, based on the average signal power. For each band, the ratio between the signal power in the coupling channel and in each separate channel (known as the coupling coefficient) is substituted for the mantissa and exponent in each channel, which in turn requires fewer bits. Then, the original spectral coefficients for each channel are recovered upon decoding, by multiplying the mantissas from the coupling channel by the appropriate coupling coefficients. Coupling occurs only for frequency bands above 10 kHz.
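The coupling idea can be sketched for a single band shared by two channels. This is a simplified illustration, not Dolby's procedure: the coupling channel is taken as a plain average of the channels, and each coupling coefficient is taken as the square root of the power ratio so that it can be applied directly to amplitudes. Both choices, along with the function names and sample values, are assumptions.

```python
import math

def band_power(coefficients):
    """Sum of squared spectral coefficients in one band."""
    return sum(c * c for c in coefficients)

def couple_band(channels):
    """Combine one band from several channels into a shared coupling channel."""
    count = len(channels)
    coupling = [sum(bins) / count for bins in zip(*channels)]
    coupling_power = band_power(coupling)
    coefficients = [
        math.sqrt(band_power(ch) / coupling_power) if coupling_power else 0.0
        for ch in channels
    ]
    return coupling, coefficients

def decouple_band(coupling, coefficients):
    """Rebuild each channel by scaling the coupling channel."""
    return [[c * scale for c in coupling] for scale in coefficients]

# Hypothetical high-frequency band from two channels.
left = [0.20, -0.10, 0.05]
right = [0.10, -0.05, 0.025]

coupling, coefficients = couple_band([left, right])
rebuilt = decouple_band(coupling, coefficients)
print([round(c, 4) for c in coefficients])
print([[round(v, 4) for v in ch] for ch in rebuilt])
```

In this example the right channel is a scaled copy of the left, so both are recovered almost exactly. For unrelated channels only the band power of each channel is preserved, which is the trade-off that makes coupling a data-reduction measure.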
Different coding strategies can be used to achieve low data rates. AC-3 is by far the more complex of the two codecs. The steps for decoding are essentially the reverse of those for encoding, and require the ancillary data for parameters and information on reconstructing the original channels.
The AC-3 Decoder CS5063, offered in the form of library/source code, is an efficient solution from Cute Solutions for a large suite of embedded applications that require AC-3 decoding. The solution is based on Texas Instruments' TMS320C64x architecture.
TI and Dolby are in the process of jointly developing the TI-based Dolby Digital Professional Encoder platform. Based on Dolby's digital-audio encoding technology and TI's TMS320C67x floating-point digital signal processing (DSP) technology, the Dolby-certified platform offers up to three times the channel density of existing fixed solutions in equipment that creates professional digital audio for DVDs, HDTV broadcasts, and cable/satellite transmissions.
TI's TMS320C6713 floating-point DSP is the basis for its reference platform. Running at 300 MHz, a single programmable C6713 encodes three stereo pairs and enables a universal broadcast encoder platform, allowing broadcasters to standardize on audio encoding hardware platforms.
Previous implementations of the professional version of AC-3 required users to have a separate DSP for each stereo pair. With the increased capabilities of the TI DSP-based platform, developers will be able to reduce the board space required for end-user equipment.
Created by Lyrtech, a reference design is available for evaluation of the AC-3 Professional Encoder platform. This design supports six S/PDIF inputs (12 digital audio channels) and three S/PDIF outputs (six digital audio channels), with the ability to embed Timecode and Metadata information into the digital audio output streams. The reference board uses a USB 2.0 interface to connect to a host computer, enabling real-time configuration of AC-3 encoder parameters as well as transfer of data.
Surround sound for motion pictures uses AC-3. In the future, Dolby might extend its uses to television and video production as well. MPEG-4, which is used in Digital Television broadcasting, is a more advanced system that not only allows interactivity, but also allows greater protection for intellectual property rights (in other words, copyright), compared to other MPEG formats.
MP3, despite its disadvantage of not being able to recover the rejected frequencies, may become the standard due to its current momentum and popularity.