21 November 2008

MPEG-4 toolkit improves on motion vector calculation

By Paul Fernandez, Member of the Technical Staff, and Mark Nadeski, Software Development Manager, Imaging and Audio Group, Texas Instruments, Dallas
Planet Analog
January 8, 2003 (2:00 PM EST)




Digital video, with its inherent quality, reliability and flexibility advantages over traditional analog video, is showing up in a wide range of applications. Unlike analog video, signals in the digital domain allow users to view, access and manipulate content in entirely new ways.

MPEG-1 and MPEG-2, the first two video standards from the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO), were fundamental in creating widespread acceptance of digital video formats. Their successor, MPEG-4, is leading the march onto the Internet -- and the "mobile Internet" -- with lower bit rates, more flexibility, and a range of new features.

MPEG-1 was designed to code progressively scanned video at bit rates up to about 1.5 Mbps, providing quality similar to VHS and stereo audio at 192 kbps. Primary applications include CD-ROM interactive and Video CD systems for storing video and audio on CD-ROM. Building on the original MPEG-1 work started in 1988, the ISO started work on MPEG-2 in 1990, recognizing the need for a second standard suitable for coding video for broadcast formats at higher data rates. The MPEG-2 standard can code standard-definition television at bit rates from about 3 to 15 Mbps and high-definition television at 15 to 30 Mbps. MPEG-2 builds on the stereo audio capabilities of MPEG-1 to add multi-channel surround sound coding. To address low bit rate applications such as videophones, the International Telecommunication Union (ITU) also developed the H.263 standard.

While perfectly adequate to work in the environments for which they were designed, MPEG-1, MPEG-2 and H.263 don't feature the flexibility needed to efficiently address the requirements of the myriad of multimedia applications now making their way to market. It is at this juncture, where flexibility meets high performance, that the recently adopted MPEG-4 standard makes its entry. As a multimedia standard, MPEG-4 is designed to interoperate among a large number of applications with very different requirements. In general, most multimedia applications have in common the need to support interactivity with different kinds of data. Variations in visual data include data type, source, type of communication and desired functionality with the visual image. MPEG-4 gives designers a technological foundation to provide multimedia functionality across all these data requirements. A few application areas include:

Digital TV - Widespread interest in digital television is a natural outgrowth of widespread growth in the Internet. As on the web, interactivity is the operative word in new digital television systems. Additional text, pictures, audio, or graphics can be controlled by the user to augment the entertainment value of programming, and user-controlled graphics give the viewer the ability to gather information unrelated to the programming they are watching.

Mobile Multimedia - Next-generation cell phones and palm computers will provide a rich array of digital video and multimedia applications to enhance the end-user experience.

Streaming video - Streaming video applications over the Internet are already massively popular. Standards-based formats increase error resilience and improve coding efficiency to improve the user's streaming video experience.

Games - Sophisticated formats like MPEG-4 allow the addition of video objects into traditional 3D-graphics-based games. In addition to making the user experience more compelling, standards-based technology also makes it possible to personalize games using personal video databases linked in real time to the game.

Television Production - As with all digital applications, flexibility is the key to opening up new television production possibilities. New digital standards allow scenes and actors to be recorded separately and then mixed with computer-generated special effects. By coding objects digitally instead of capturing them in traditional linear video frames, a scene can be rendered in higher quality and with more flexibility.

The MPEG-4 Toolkit

MPEG-4 is composed of a collection of "tools" to support and enhance these applications. The standard provides tools for shape coding, motion estimation and compensation, texture coding, error resilience, sprite coding and scalability. For those not wishing to implement the whole standard, MPEG-4 provides a rich suite of well-defined subsets of itself, called "conformance points". These allow the freedom to optimize system cost without sacrificing interoperability. Together, these capabilities give designers a highly flexible and interoperable way to render high-quality digital video graphics across a wide spectrum of multimedia applications.

Features and functionality

The MPEG-4 standard is made up of a set of tools that support several classes of functionality. In general, they can be put into a few categories.

Compression efficiency - MPEG-4 improves on the coding efficiency of previous standards, which encourages acceptance of MPEG-4 based applications.

Content-based interactivity - Representing video as objects rather than as video frames enables content-based applications. This, in turn, provides new levels of content interactivity based on efficient representation of objects, object manipulation, bit stream editing, and object-based scalability.

Universal access - Because MPEG-4 is highly robust in error-prone environments, it can be used across a wide range of media, including mobile networks and wired connections.

STRUCTURE AND SYNTAX

An MPEG-4 visual scene may consist of one or more video objects, and each video object is characterized by temporal and spatial information in the form of shape, motion, and texture. Some applications may not be able to use all the MPEG-4 tools because of either the associated overhead or the difficulty of generating video objects. Here, MPEG-4 video allows coding of rectangular frames, which represent a degenerate case of an arbitrarily shaped object.

An MPEG-4 visual bit stream provides a hierarchical description of a visual scene. Start codes, which are special code values, give access to each level of the hierarchy in the bit stream. Hierarchical levels include:

* Visual Object Sequence (VS) : This is the complete MPEG-4 scene, and it may include any 2-D or 3-D natural or synthetic objects as well as their enhancement layers.

* Video Object (VO) : A video object is linked to a certain 2-D element in the scene. A rectangular frame provides the simplest example, or it can be an arbitrarily shaped object that corresponds to an object or background of the scene.

* Video Object Layer (VOL) : Video object encoding takes place in one of two modes, scalable or non-scalable, depending on the application represented in the video object layer (VOL). The VOL provides support for scalable coding.

* Group of Video Object Planes (GOV) : Optional in nature, GOVs enable random access points into the bitstream by providing points where video object planes are independently encoded.

* Video Object Plane (VOP) : A VOP is a video object sampled in time. It can be coded either independently or dependently, using motion compensation. A rectangular VOP can represent a conventional video frame.
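
The hierarchy above can be explored by scanning a bit stream for start codes. The following Python sketch assumes the byte-aligned 0x000001 prefix and the start code values as we understand them from the MPEG-4 Visual specification (0x00-0x1F for video objects, 0x20-0x2F for video object layers, 0xB0 for the visual object sequence, 0xB3 for a GOV and 0xB6 for a VOP). The function names are hypothetical, and the values should be checked against the standard before being relied on.

```python
START_PREFIX = b"\x00\x00\x01"  # assumed 24-bit start code prefix


def classify_start_code(code: int) -> str:
    """Map the byte that follows the prefix to a hierarchy level."""
    if 0x00 <= code <= 0x1F:
        return "VO"
    if 0x20 <= code <= 0x2F:
        return "VOL"
    if code == 0xB0:
        return "VS"
    if code == 0xB3:
        return "GOV"
    if code == 0xB6:
        return "VOP"
    return "other"


def find_start_codes(data: bytes):
    """Return (byte offset, level) for every start code found in data."""
    found = []
    pos = data.find(START_PREFIX)
    while pos != -1 and pos + 3 < len(data):
        found.append((pos, classify_start_code(data[pos + 3])))
        pos = data.find(START_PREFIX, pos + 3)
    return found
```

A decoder that needs random access could use such a scan to jump to the next GOV or VOP without parsing the data in between.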

Although there are several ways a video object plane can be used, the most common way has the VOP containing the encoded video data of a time sample of a video object. Each VOP consists of macroblocks, each of which contains four 8x8 luminance blocks, and two 8x8 chrominance blocks.

MPEG-4 TOOLS

Video Compression Tools

A video codec achieves compression by removing spatial and temporal redundancy.

Intra-coded VOPs (I-VOPs) are coded using only information within the VOP, which removes some of the spatial redundancy. Inter coding exploits temporal redundancy between frames through motion estimation and compensation. Two inter coding modes are provided: prediction from a previous VOP (P-VOPs) and prediction from both a previous VOP and a future VOP (B-VOPs). Both techniques are inherited from previous video standards, with MPEG-4 providing additional tools for increased compression efficiency, error resilience, and coding of different types of video objects.

Shape coding tools

MPEG-4 provides tools for encoding arbitrarily shaped objects. Binary shape information defines which portions (pixels) of the scene belong to the video object at a given time, and is encoded by a motion-compensated, block-based technique that allows both lossless and lossy coding. Gray scale shape information is much like binary shape except that every pixel or element of the matrix can take on a range of values (usually zero to 255) representing the degree of transparency of that pixel. Block-based motion compensation is also used to encode gray scale shape information, but only lossy coding is supported.

Sprite coding

A sprite consists of those regions of a VO that are present in the scene throughout the video segment. A "background sprite" is a typical example, consisting of all pixels belonging to the background in a camera-panning sequence. It is essentially a static image that needs to be transmitted only once, at the beginning of the transmission. Sprites have been included in MPEG-4 primarily because they provide high compression efficiency in these instances. Sprite-based coding works well for synthetic objects, but it can also be used for objects in natural scenes that undergo rigid motion.

Scalability

In MPEG-4, spatial scalability and temporal scalability are implemented using several VOLs. In the example of spatial scalability with two VOLs, a base layer and an enhancement layer, the enhancement layer improves upon the spatial resolution of a VOP provided by the base layer. Similarly, with temporal scalability, the enhancement layer can be decoded if the target frame rate is higher than what the base layer offers. Consequently, temporal scalability smooths the sequence's motion.

CONFORMANCE POINTS

Conformance points are the basis for interoperability and the primary force behind standardization. Products that share a common conformance point are by definition able to read and write each other's bitstreams correctly.

In MPEG-4, conformance points take the form of profiles and levels. A profile specifies a set of video tools, which can be selectively applied. Within the profile, a level determines constraints on parameters in the bit stream and corresponding tools.

The most basic profile, the Simple Visual Profile, supports coding of rectangular objects with H.263 baseline tools as the minimum. In addition, it provides for four motion vectors per macroblock, unrestricted motion vectors, AC/DC prediction and error resilience. The Simple Scalable Visual Profile supports temporally and spatially scalable objects in addition to the Simple Visual Profile tools. The Advanced Simple Profile, one of the newer profiles, is based on the Simple Visual Profile, with additional tools to improve compression efficiency.

Support for non-rectangular video objects is in the higher profiles - Core Visual and Main profiles. The Core Visual Profile supports arbitrary-shaped objects and temporally scalable objects in addition to Simple Visual Profile tools. The Main Visual Profile adds support for interlaced coding, semi-transparent, and sprite objects over Core Visual Profile tools.

Most applications to date primarily take advantage of the compression and error resilience tools for rectangular objects that are present in the Simple, Simple Scalable and Advanced Simple Profiles. Hence we primarily discuss implementation issues related to these profiles.

IMPLEMENTATION ON DSP

Video Processing

In typical hand-held devices, data is captured in RGB format at 30 fps. Prior to encoding, it is converted to 4:2:0 YCbCr format, usually via an intermediate format such as 4:2:2. The format conversion typically involves subsampling, which can be performed by a digital signal processor (DSP). Likewise, the output of an MPEG-4 decoder, which is in 4:2:0 format, is converted to RGB for display by a series of interpolation operations, some of which can be done on a DSP. The memory requirements for captured and decoded images are large, so these images are typically stored in memory external to the DSP.
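
As an illustration of the format conversion described above, the following Python sketch converts an RGB image to YCbCr and subsamples the chroma planes to 4:2:0 in a single step. It assumes full-range BT.601-style conversion coefficients and 2x2 averaging for the subsampling; neither choice comes from the article, the intermediate 4:2:2 stage is skipped, and the function names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr (assumed full-range BT.601 weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return tuple(min(255, max(0, round(v))) for v in (y, cb, cr))


def to_420(rgb):
    """rgb: rows of (r, g, b) tuples with even width and height.

    Returns a full-resolution Y plane plus Cb and Cr planes at half
    resolution in both directions (each sample averages a 2x2 area).
    """
    h, w = len(rgb), len(rgb[0])
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in rgb]
    y_plane = [[px[0] for px in row] for row in ycc]

    def subsample(ch):
        return [[round(sum(ycc[2 * i + di][2 * j + dj][ch]
                           for di in (0, 1) for dj in (0, 1)) / 4)
                 for j in range(w // 2)]
                for i in range(h // 2)]

    return y_plane, subsample(1), subsample(2)
```

The chroma planes end up with a quarter of the samples of the luminance plane, which is where the memory saving of 4:2:0 comes from.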

Intra coding

Intra-coded VOPs (I-VOPs) are coded with information contained within the particular VOP. As in other image coding algorithms, the intra coding process involves performing a discrete cosine transform (DCT), followed by quantization. Compression is achieved in quantization, where most of the higher frequency coefficients are set to zero and therefore need not be encoded. Both the DCT and quantization involve linear transformation or scaling, which is amenable to implementation on a DSP.

The quantized coefficients are predictively coded with respect to those of neighboring blocks to further remove spatial redundancy, and then scanned to produce a set of zero runs and coefficients, which are variable length encoded.
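
A minimal Python sketch of the transform and quantization steps follows. It uses a direct (slow) orthonormal 8x8 DCT-II and a plain uniform quantizer with rounding; a real implementation would use a fast transform and the quantizer defined by the standard, so both functions should be read as illustrative assumptions.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Direct orthonormal 2-D DCT-II of an N x N block (O(N^4), for clarity)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * s
    return out


def quantize(coeffs, scale):
    """Uniform quantizer (an assumption, not the standard's exact rule)."""
    return [[round(value / scale) for value in row] for row in coeffs]
```

For a perfectly flat block, all of the energy lands in the single DC coefficient, so every other quantized coefficient is zero; smooth natural image blocks behave similarly, with only a few low-frequency coefficients surviving quantization.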
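
The scan step can be sketched in Python as follows. The example assumes the conventional zigzag scan order and emits simple (run, level) pairs; the standard also defines alternate scans and a richer event format, which are not shown, and the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Block positions (row, col) in zigzag order, lowest frequencies first."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level(block):
    """Convert a quantized block into (zero run, nonzero level) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros produce no pair
```

Because quantization zeroes most high-frequency coefficients, a block typically reduces to a handful of pairs, and these pairs are what the variable length coder encodes.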

Motion estimation

Motion estimation and compensation make use of temporal redundancies between frames, by predicting macroblocks from the previous reference frames as shown in Figure 1. Since this process has to be repeated at the decoder, a reconstruction of the previously coded frame, instead of the source frame, is used as the prediction candidate. A block matching method is commonly used for prediction whereby the best match from the reference frame is the one with the lowest error between itself and the current source macroblock being coded. This error obviously contains less information than the original macroblock, and as such can be encoded with fewer bits.

Figure 1: Inter-Frame Prediction

A block matching algorithm computes the sum of absolute differences between the current source macroblock and candidate blocks of the reference frame. This operation can be implemented on a DSP. It typically takes up a significant portion of encoding time; hence, to reduce complexity, the matching is performed on a window surrounding the co-sited macroblock in the reference frame, which is smaller than the entire frame. For CIF (352x288) sequences, a search range of +/-16 gives good picture quality. Due to limited DSP memory, typically only the reference "window" required for motion estimation is stored in DSP memory, while the entire reference frame is stored in external memory.

Several block matching algorithms exist; two of the better known are the expensive exhaustive search algorithm and the lower-complexity telescopic search algorithm.

An exhaustive algorithm searches all possible candidate blocks within the search window. The telescopic search performs matching in stages: at each stage, block matching is performed on a sparse grid of candidate blocks, and the best candidate vector is used as the starting point of the next stage, where a denser grid is used. The telescopic search algorithm has much lower complexity than the exhaustive search, but slightly lower compression efficiency. When implementing an algorithm on a DSP, this trade-off between quality and complexity must be taken into account.
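
The two search strategies can be compared with a short Python sketch. The staged search shown here is a simple coarse-to-fine search in the style of the well-known three-step search, which we assume is close in spirit to the telescopic search described above; it is not taken from any particular encoder, and all function names are hypothetical.

```python
def sad(cur, ref, cx, cy, dx, dy, size):
    """Sum of absolute differences for the block at (cx, cy) displaced by (dx, dy)."""
    return sum(abs(cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i])
               for j in range(size) for i in range(size))


def inside(ref, cx, cy, dx, dy, size):
    """True if the displaced block lies entirely inside the reference frame."""
    return (0 <= cx + dx and cx + dx + size <= len(ref[0])
            and 0 <= cy + dy and cy + dy + size <= len(ref))


def exhaustive_search(cur, ref, cx, cy, size, rng):
    """Test every displacement within +/-rng; (2*rng + 1)**2 candidates at most."""
    candidates = [(dx, dy)
                  for dy in range(-rng, rng + 1)
                  for dx in range(-rng, rng + 1)
                  if inside(ref, cx, cy, dx, dy, size)]
    return min(candidates, key=lambda v: sad(cur, ref, cx, cy, v[0], v[1], size))


def staged_search(cur, ref, cx, cy, size, steps=(4, 2, 1)):
    """Coarse-to-fine search: 9 candidates per stage, grid spacing shrinks each stage."""
    best = (0, 0)
    for step in steps:
        candidates = [(best[0] + sx * step, best[1] + sy * step)
                      for sy in (-1, 0, 1) for sx in (-1, 0, 1)]
        candidates = [v for v in candidates if inside(ref, cx, cy, v[0], v[1], size)]
        best = min(candidates, key=lambda v: sad(cur, ref, cx, cy, v[0], v[1], size))
    return best
```

With the default steps, the staged search evaluates 27 candidates to cover a range of +/-7, where the exhaustive search would evaluate 225. The price is that a poor choice at a coarse stage can steer the search away from the true best match, which is the source of the slightly lower compression efficiency.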

The advanced prediction mode of the MPEG-4 Simple Profile allows one motion vector per 8x8 block. The selection of these vectors can require up to four times the computing power of the macroblock motion vector search algorithm; thus, to reduce complexity, the search can be done on a grid around the macroblock (16x16) motion vector, without sacrificing much quality. For improved quality, half-pixel-accurate motion vectors are computed by predicting from the interpolated reference window. Interpolation of the reference window is done in the vertical, horizontal and diagonal directions, with the "full pixel" 16x16 or 8x8 vector typically used as the starting address for interpolation.

Motion vectors calculated in the above stages are first differentially coded using the vectors of three neighboring blocks. The motion vector differences are then encoded using variable length coding.

In fast motion sequences, or at the edge of a frame, a "good match" may not be found in the reference window, in which case the source macroblock is intra-coded.
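
Half-pixel interpolation can be sketched in Python as below. The example assumes simple bilinear averaging with upward rounding, which corresponds to one setting of the rounding control that the standard provides; coordinates are given in half-pixel units, the caller must ensure that the neighboring full pixels exist, and the function name is hypothetical.

```python
def half_pel_sample(ref, x2, y2):
    """Sample ref at position (x2 / 2, y2 / 2), given in half-pixel units."""
    x, y = x2 // 2, y2 // 2
    frac_x, frac_y = x2 % 2, y2 % 2
    a = ref[y][x]
    if not frac_x and not frac_y:        # full-pixel position
        return a
    if frac_x and not frac_y:            # horizontal half-pixel
        return (a + ref[y][x + 1] + 1) // 2
    if frac_y and not frac_x:            # vertical half-pixel
        return (a + ref[y + 1][x] + 1) // 2
    # diagonal half-pixel: average of the four surrounding full pixels
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) // 4
```

A half-pixel refinement would typically evaluate the eight half-pixel positions around the best full-pixel vector and keep whichever gives the lowest matching error.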
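
The differential coding of motion vectors can be illustrated with a few lines of Python. The sketch assumes that the predictor is the component-wise median of the left, above and above-right neighbor vectors, as in H.263; the article itself says only that three neighboring blocks are used, so the exact choice of neighbors and the function names are assumptions.

```python
def median3(a, b, c):
    """Middle value of three numbers."""
    return sorted((a, b, c))[1]


def mv_difference(mv, left, above, above_right):
    """Return the (dx, dy) difference between mv and its median predictor."""
    pred_x = median3(left[0], above[0], above_right[0])
    pred_y = median3(left[1], above[1], above_right[1])
    return (mv[0] - pred_x, mv[1] - pred_y)
```

Because neighboring blocks tend to move together, the differences cluster around zero and can be coded with short variable length codes.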

Motion Compensation and Texture coding

In the motion compensation stage, the motion vector is used to copy the best matching block from the reference window. The residual error between this motion-compensated prediction block and the current source macroblock is then encoded by performing DCT, quantization and variable length encoding, as shown in Figure 2.

Figure 2: MPEG encoding

MPEG-4 Simple Profile quantization is performed by dividing macroblock coefficients by a scale factor. The size of this scale factor influences the number of zero coefficients and the level of non-zero coefficients, and hence the number of bits used to code a macroblock. An encoder can therefore control bit rate and quality by varying this quantization scale factor.

To enable predictive coding, the encoder contains a decoding loop consisting of inverse quantization and the inverse discrete cosine transform, which reconstruct the current macroblock. This macroblock is later moved to the reconstructed frame store in external memory, which is used in encoding the next P-VOP.
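
The effect of the scale factor can be seen in a small Python sketch. It uses a plain division that truncates toward zero as a stand-in for the standard's quantizer, and the coefficient values are made up for illustration.

```python
def quantize(coeffs, qscale):
    """Divide each coefficient by the scale factor, truncating toward zero."""
    return [int(c / qscale) for c in coeffs]


def dequantize(levels, qscale):
    """Approximate reconstruction, as performed in the decoder."""
    return [level * qscale for level in levels]


def count_nonzero(levels):
    return sum(1 for level in levels if level != 0)


# Hypothetical coefficients of one block, already in scan order.
coeffs = [620, -95, 48, -30, 17, -9, 5, -2]
for qscale in (4, 16, 32):
    levels = quantize(coeffs, qscale)
    print(qscale, count_nonzero(levels), dequantize(levels, qscale))
```

A larger scale factor leaves fewer non-zero levels to code, at the cost of a coarser reconstruction after inverse quantization.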

Decoding

The decoder performs the inverse process of the encoder. The coefficients and overhead information are variable length decoded. Coefficients undergo inverse quantization followed by the inverse discrete cosine transform to give the error coefficients. The motion compensation step involves copying or interpolating the reference block corresponding to the motion vector. The motion-compensated prediction is then added to the error coefficients to give the output. A diagram of these functions is shown in Figure 3.

Figure 3: MPEG decoding

The MPEG-4 Simple Profile additionally supports unrestricted motion vectors, which allow a motion vector to point outside the VOP. Padding of the boundary pixels prior to motion compensation enables this tool.

In typical Simple Profile decoder implementations, the transfer of the reference window from external memory to DSP memory can account for a significant portion of the decoding time, because each macroblock can have up to four motion vectors (one for each 8x8 block), thus requiring four memory transfers per macroblock. Hence, much effort goes into optimizing these memory transfers.
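
For rectangular VOPs, padding amounts to repeating the nearest edge pixel. The Python sketch below achieves the same result by clamping coordinates at fetch time rather than by physically extending the frame; this is an implementation choice assumed for illustration, and the function names are hypothetical.

```python
def padded_pixel(ref, x, y):
    """Read ref at (x, y), repeating edge pixels outside the frame."""
    height, width = len(ref), len(ref[0])
    return ref[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]


def fetch_block(ref, x0, y0, size):
    """Fetch a size x size prediction block whose origin may lie outside the frame."""
    return [[padded_pixel(ref, x0 + i, y0 + j) for i in range(size)]
            for j in range(size)]
```

With this in place, a motion vector that points partly outside the frame still yields a valid prediction block, which helps for objects entering or leaving the picture.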

Rate Control

In typical applications, a video sequence must be encoded at a constant bit rate and have uniform quality across frames. However, due to the non-uniform complexity and predictability of a picture, the bit rate tends to vary at the local level. To enable a constant bit rate, the encoder is allowed to have a buffer store, which can be filled at a variable rate by the encoder, but emptied at a constant rate during transmission. Similarly, the decoder has a buffer store of the same size as the encoder's. The maximum size of the buffer is specified in the standard according to the level supported. The buffer is prevented from overflowing or underflowing by adjusting the bit rate. Methods of varying the bit rate include adjusting the quantization scale factor, skipping frames and using stuffing bits. The larger the buffer, the more the bit rate can vary, and hence the more uniform the quality across frames.
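
The buffer mechanism can be illustrated with a toy rate controller in Python. Everything in it is an assumption made for illustration: the model in which a frame's bits equal its "complexity" divided by the scale factor, the thresholds at one quarter and three quarters of the buffer, and the fixed adjustment step. Real rate control algorithms are considerably more elaborate.

```python
def rate_control(complexities, channel_bits, buffer_size,
                 q=8, q_step=4, q_min=1, q_max=31):
    """Simulate the encoder buffer; return (qscale used, bits, fullness) per frame."""
    fullness = buffer_size // 2
    log = []
    for complexity in complexities:
        bits = complexity // q            # toy model: bits fall as the scale rises
        if fullness + bits - channel_bits > buffer_size:
            bits = 0                      # skip the frame rather than overflow
        # The channel drains a constant amount per frame; clamping at zero
        # stands in for stuffing bits when the buffer would underflow.
        fullness = max(0, fullness + bits - channel_bits)
        log.append((q, bits, fullness))
        if fullness > 3 * buffer_size // 4:
            q = min(q_max, q + q_step)    # buffer filling up: quantize more coarsely
        elif fullness < buffer_size // 4:
            q = max(q_min, q - q_step)    # buffer draining: spend more bits
    return log
```

Feeding the function a sequence with a burst of complex frames shows the scale factor rising while the buffer is nearly full; by the time the complex frames have passed, the scale factor is higher than it was when they arrived.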

Error resilience

Error resilience is particularly important in error-prone environments like mobile communications.

MPEG-4 integrates several mechanisms for error resilience with different degrees of robustness and complexities. MPEG-4's four error resilience tools include resynchronization, data partitioning, header extension code, and reversible variable length codes.

1. Resynchronization: The most frequent way to bring error resilience to a bit stream, resynchronization consists of inserting unique markers in the bit stream so that in the case of an error, the decoder can skip the remaining bits until the next marker and restart decoding from that point on.

2. Data partitioning: Here, MPEG-4 separates the bits for motion and DC coefficients from those for texture information, so that in the event of an error in the texture information, the motion and DC information can allow for better concealment.

3. Header extension code: With the header extension code, optional redundant header information can be included for decoding the data. This reduces the chance that corrupted header information forces large portions of the bit stream to be skipped.

4. Reversible VLCs: RVLCs are code words that can be decoded in the forward as well as the reverse direction. When an error occurs and the bit stream is skipped until the next resynchronization marker, it is still possible to decode portions of the corrupted bit stream in reverse order to limit the impact of the error.
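
The idea behind reversible VLCs can be demonstrated with a toy code table in Python. The four palindromic code words below are invented for this example and are not the table defined by the standard; because each word reads the same in both directions and none is a prefix of another, the same table decodes the stream from either end.

```python
# Hypothetical reversible code: every code word is a palindrome and the
# set is prefix-free, so it is suffix-free as well.
RVLC = {"A": "0", "B": "11", "C": "101", "D": "1001"}


def encode(symbols):
    return "".join(RVLC[s] for s in symbols)


def decode_forward(bits):
    """Decode from the first bit onward; incomplete trailing bits are dropped."""
    inverse = {code: symbol for symbol, code in RVLC.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return out


def decode_backward(bits):
    """Decode from the last bit backward, returning symbols in stream order."""
    return decode_forward(bits[::-1])[::-1]
```

If an error is detected part-way through a packet, the decoder can start from the next resynchronization marker and call `decode_backward` on the tail of the packet, recovering the symbols that follow the damaged region instead of discarding them.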

IMPLEMENTATION TRENDS

While MPEG-4 provides several video coding tools, most current applications use the Simple Visual Profile tools.

The standard allows an encoder to selectively use tools within a profile. Therefore, for applications where computational power is at a premium, a Simple Profile implementation with only H.263 baseline tools can give a 25% saving in computational complexity, while degrading quality by only 10%, compared with a complete Simple Profile implementation. The decoder, on the other hand, has to support all tools within a profile.
