The convergence of the fourth-generation multimedia standard (MPEG-4) and third-generation (3G) wireless standards could result in the next great killer application: mobile entertainment devices that generate and deliver high-quality multimedia streams. MPEG-4 is also the enabler for high-bandwidth video over the Internet, which to date has been restricted to wire-line implementations.
Several attempts at such video capabilities have been made, but with little success. Designers learned from these deployments that the additional silicon area required to implement the multimedia standard resulted in devices that were bulky and power hungry, draining batteries too quickly. Furthermore, the 2G standards allow only a 9.6-kbit/second data link, which produces very small images at frame rates of 1 to 2 frames per second. Consequently, these first 2G designs have not been popular with consumers.
To combat these problems, next-generation devices will use 2.5G and 3G wireless standards that allow data rates from 64 to 384 kbits/s in a mobile environment and up to 2 Mbits/s in an environment where the device is not moving. The greater computing complexity of the 2.5/3G standards allows for better usage of the available wireless spectrum, such as adaptive modulation in the 2.5G Enhanced Data Rates for Global System for Mobile Communication (GSM) Evolution (Edge) standard, along with better error correction, such as turbocoding in the 3G standards.
In addition, MPEG-4 will be used as the multimedia standard of choice for high-bandwidth video. MPEG-4 comes with the essential tools and features to increase bandwidth reliability and reduce errors. Its built-in error-resilience capabilities include flexible resynchronization markers, data partitioning, header protection, reversible variable-length coding and forced intraframe refresh. Moreover, MPEG-4 offers superior video and audio compression compared with previous standards, resulting in either lower data rates or higher image quality.
However, while both the MPEG-4 multimedia standard and 2.5/3G wireless standards exist, the implementations are immature. That's because it is not yet known which implementation is best or what subset of the standards will be deployed. But most critical of all, 3G and MPEG-4 present more computational complexity than the early 2G experiments, raising the specter of bulky, power-hungry devices with limited features. This status quo poses a host of difficult challenges for wireless system engineers developing mobile devices for high-bandwidth video over the Internet.
Armed only with conventional ASIC, DSP and microprocessor chip technologies, the system engineer faces a host of complex design issues. A large variety of features and functions can potentially be implemented, as well as a multiplicity of standards; however, based on current IC technology and design practices, the system engineer can select only a small subset for step-by-step incremental development.
MPEG-4 is an object-based standard in which audio/visual scenes are composed of different scalable objects. A major aspect of MPEG-4 is profiles and levels, which don't define the complexity per individual MPEG-4 object but rather provide bounds on the total of all objects in the scene. A profile defines the set of tools of a given type that can be used in a given MPEG-4 terminal.
There are audio, visual, graphics, scene description and object descriptor profiles. A level is a specification of the constraints and performance criteria on these profiles and, thus, on the corresponding tools.
Profiles can contain more than one object, and each object can be of a different nature. Therefore, the concept of an object type is introduced as an intermediate level of definition between tools and profiles. Object types define the tools needed to create an object in a scene and how these tools can best be combined.
Media profiles define the kinds of audio and visual objects that the MPEG-4 terminal needs to decode and hence, give a list of admissible elementary stream types. Media profiles describe object types used to create a scene and the tools used to create those object types.
Visual profiles use five visual object types to represent natural video information: Simple, Simple Scalable, Core, Main and N-bit.
The Simple object type, the least compute intensive, is an error-resilient, rectangular natural video object of arbitrary height/width ratio developed for low bit rates. It uses simple and inexpensive coding tools based on intra (I) and predicted (P) video object planes, or VOPs, the MPEG-4 term for frames.
A scalable extension of Simple, the Simple Scalable object type gives temporal and spatial scalability using Simple as the base layer. The Core object type uses a tool superset of Simple to give better quality through the use of bidirectional interpolation (B-VOPs). It has binary shape and supports scalability based on sending extra P-VOPs. The binary shape can include a constant transparency, but excludes the variable transparency offered by gray-scale shape coding.
The Main object type is the video object that gives the highest quality. It supports gray-scale shape, sprites and interlaced content in addition to progressive material. Lastly, the N-bit object type is equal to the Core object type, but can vary the pixel depth from 4 to 12 bits for both the luminance and chrominance planes.
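The five natural-video object types described above can be summarized as a lookup table. This is a sketch, not part of any MPEG-4 API: the field values paraphrase the article's descriptions (N-bit mirrors Core, so it inherits B-VOP support), and the dictionary layout is purely illustrative.

```python
# The five MPEG-4 natural-video object types, as described in the text.
# "vops" lists the video-object-plane types each supports; "shape" is the
# supported shape coding; field values paraphrase the article, not the spec.
VISUAL_OBJECT_TYPES = {
    "Simple":          {"vops": ("I", "P"),      "shape": None,
                        "note": "error-resilient, low bit rate"},
    "Simple Scalable": {"vops": ("I", "P"),      "shape": None,
                        "note": "temporal/spatial scalability over Simple"},
    "Core":            {"vops": ("I", "P", "B"), "shape": "binary",
                        "note": "scalability via extra P-VOPs"},
    "Main":            {"vops": ("I", "P", "B"), "shape": "gray-scale",
                        "note": "sprites, interlaced and progressive content"},
    "N-bit":           {"vops": ("I", "P", "B"), "shape": "binary",
                        "note": "Core with 4- to 12-bit pixel depth"},
}

# Example query: which object types support bidirectional (B) VOPs?
b_vop_types = [t for t, info in VISUAL_OBJECT_TYPES.items()
               if "B" in info["vops"]]
print(b_vop_types)  # ['Core', 'Main', 'N-bit']
```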
Main is the most complex and compute-intensive profile. Still, there are many profiles and levels between Simple and Main. For example, the Main profile at Level 3 corresponds to standard-definition TV. Main profile at Level 4 requires 21 times as many Mips as the Simple profile, 23 times as much memory and 26 times as much bandwidth. As a baseline, a 160 x 160-pixel, full-color, 15-frame/s encode and decode session (for example, a mobile video teleconferencing application) requires more than 1.3 billion operations/s from a conventional DSP. As a reference point, a 160 x 160 screen size is considered the low-end range for PDAs; PDAs currently shipping sport screens of 320 x 220 pixels.
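A quick back-of-envelope check puts the 1.3 billion operations/s figure in perspective. The ops-per-pixel cost below is derived from the cited numbers, not taken from the MPEG-4 specification, and assumes every pixel is processed once per frame for the combined encode/decode session.

```python
# Sanity-check the cited DSP workload for a 160 x 160, 15-frame/s
# encode-and-decode session (assumption: one pass per pixel per frame).
width, height, fps = 160, 160, 15
pixel_rate = width * height * fps      # pixels processed per second
total_ops = 1.3e9                      # cited workload, operations/s

ops_per_pixel = total_ops / pixel_rate
print(f"pixel rate: {pixel_rate:,} pixels/s")            # 384,000 pixels/s
print(f"implied cost: {ops_per_pixel:,.0f} ops/pixel")   # ~3,385 ops/pixel
```

Several thousand operations per pixel is plausible for motion estimation plus transform coding, which is why even this small-screen case saturates a conventional DSP.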
The high computational power and design complexity dictate to the system engineer a choice of either placing a fixed-function silicon intellectual-property (IP) core on the die or using a very high-powered RISC/DSP array. Each has its disadvantages. Adaptive-computing IC technology is emerging as a means of providing system engineers a more efficient and flexible design route than conventional IC best practices of mixing DSPs and fixed-function silicon accelerators.
There are currently about 13 MPEG-4 profiles, expanding to over 20 within the next 18 to 24 months. The majority of current-generation MPEG-4 implementations support only one profile, with the latest announced IP cores implementing only two. The decision about which to implement is critical, and the correct subset will not be known until mass deployments.
The wireless network infrastructure is also undergoing changes that directly affect mobile devices. At present, the 2G wireless standards include IS-95, IS-136, Global System for Mobile Communication (GSM) and Pacific Digital Cellular (PDC), all with very low data rates, typically 14 kbits/s or below. Each standard has its own set of infrastructure basestations and mobile terminals.
In Europe, some older 1G legacy systems still exist, but the predominant standard is GSM. The carriers in Europe have plans to deploy higher-data-rate versions of GSM, including High Speed Circuit Switched Data, General Packet Radio Service (GPRS) and Edge, as well as wideband CDMA (UMTS-2000). In Japan, the older PDC, Personal Handyphone System (PHS) and IS-95 standards exist, but the higher-data-rate cdma2000, data-only PHS and UMTS-2000-based 3G systems are being deployed. The United States is evolving from IS-136, GSM and IS-95 into cdma2000, wideband CDMA, IS-136 Edge, and GSM/GPRS and GSM/Edge. Each of the higher-speed services is meant to deliver MPEG-4.
Forklift upgrades of these new network overlays are expensive, but they can be implemented over a short time frame. The catch is that mobile devices have about an 18-month life span, and many remain on the market for as long as five to six years. Therefore, it's difficult to deploy new network overlays when millions of wireless customers have devices operating under earlier standards.
Another issue is the computational intensity that newer infrastructures demand. The 2G standards enabled data capabilities from 9.6 to 14.4 kbits/s. As demand for higher data rates emerged in the mid-1990s, 2.5G standards evolved with higher data rates enabled by multislot data and different modulation schemes. GSM is now being enhanced through GPRS, which allows data rates up to 170 kbits/s, and Edge, which allows GSM operators to use the existing GSM bands to offer wireless multimedia Internet Protocol-based services and applications at a rate of 384 kbits/s, with a bit rate of 48 kbits/s per time slot, and up to 69.2 kbits/s per time slot under good RF conditions.
Edge satisfies the bit rates of 384 kbits/s for pedestrian and low-speed vehicular movement, 144 kbits/s for high-speed vehicular movement and 2 Mbits/s for indoor wideband. However, GSM-Enhanced GPRS at 384 kbits/s requires an extremely high 2,500 Mips of processing power.
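The per-slot figures above account for the headline Edge rate. The sketch below assumes a full eight-timeslot allocation (a GSM frame has eight slots; real networks grant a device fewer), multiplying out the cited per-slot rates.

```python
# Edge throughput arithmetic from the per-slot rates cited above.
# Assumption: all 8 GSM timeslots are allocated to one device; actual
# allocations are typically smaller.
slots = 8
nominal_per_slot = 48.0   # kbits/s per time slot (cited)
best_per_slot = 69.2      # kbits/s per slot under good RF conditions (cited)

nominal = slots * nominal_per_slot
best = slots * best_per_slot
print(f"nominal: {nominal:.0f} kbits/s")    # 384 kbits/s, the pedestrian target
print(f"best case: {best:.1f} kbits/s")     # 553.6 kbits/s
```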
The additional bits per megahertz have severe implications on the computational power required in both the mobile device and the network infrastructure. For example, in wideband CDMA, three algorithms used to gain higher spectrum utilization are matched filter, blind minimum mean square error (MMSE) and exact decorrelator. Each has extremely high computational requirements. Depending on how complex the MPEG-4 design is, one to three orders of magnitude of computational Mips can be burned up, based on the algorithmic scheme used.
For instance, 124 multiplications are required to perform matched filter using an 8-bit word length, 496 to perform blind MMSE using 12-bit word length and 230,000 for exact decorrelator using 16-bit word length. These higher-computational-power algorithms, as well as others in 3G, bring the total requirements for 3G to roughly 11,000 millions of operations per second (Mops). As a comparison, this is more than 10 times as many computations as the 2G GSM standard.
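The spread among those three detectors is easier to see as ratios. The multiply counts and word lengths below are the article's figures; normalizing against the matched filter is an illustrative choice, not part of any standard.

```python
# Relative multiply cost of the three wideband-CDMA detection schemes
# cited above (counts and word lengths from the text).
algorithms = {
    "matched filter":     (124,     8),   # (multiplies, word length in bits)
    "blind MMSE":         (496,    12),
    "exact decorrelator": (230_000, 16),
}

base_mults, _ = algorithms["matched filter"]
for name, (mults, bits) in algorithms.items():
    ratio = mults / base_mults
    print(f"{name}: {mults:,} multiplies, {bits}-bit words "
          f"({ratio:,.0f}x matched filter)")
```

The exact decorrelator costs roughly 1,855 times as many multiplies as the matched filter, which is where the "one to three orders of magnitude" range comes from.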
Based on the Mops computations and the power dissipation equation, system engineers must make some difficult decisions. Placing algorithms in a DSP provides flexibility but at the cost of higher power dissipation. Placing algorithms in fixed-function silicon, such as an ASIC, lowers the power dissipation but at the cost of flexibility. Because of these trade-offs, most first-generation 3G chip sets consume significantly more power and silicon real estate than 2G designs. Unfortunately for the consumer, this has meant more expensive, bulkier mobile devices and short battery life.
The adaptive-computing machine (ACM) is a new class of IC that offers higher performance, lower power consumption, smaller size, lower cost and a remarkably high level of adaptability as compared to conventional, rigid IC technologies.
Adaptive computing is extremely specialized because it brings into existence, for as long or short a time as required (clock cycle by clock cycle if needed), the exact hardware implementation that the software requires. This results in higher efficiency in executing the specified algorithms, which manifests itself in lower power dissipation, high performance and software flexibility.
As an example of efficiency, the discrete cosine transform algorithmic element in the MPEG-4 video codec takes approximately 115 cycles on average to perform when implemented on a conventional DSP; an ACM-based implementation takes 10 cycles on average.
The ACM also plays a major role in minimizing problems associated with adding network overlays. ACM-enabled devices provide the user with multi-mode operation, so that changing the network overlays doesn't pose a problem. The user's mobile device rapidly adapts to the new standards and modes because the ACM's circuitry adapts on the fly, resulting in high-quality video-on-demand.
Computational power efficiency (CPE) provides an analysis of the ACM's efficiency as compared with conventional microprocessor, DSP and ASIC approaches. CPE is a metric that illustrates the advantages and disadvantages of today's design techniques. It is defined as the ratio of the number of gates actively working to solve the given problem to the total number of gates in the device.
The CPE of a typical DSP or microprocessor is about 5 to 10 percent. This means that only 5 to 10 percent of the gates on a DSP or processor are performing real work at any one time; the rest are overhead, existing only to keep that small working portion of the chip busy.
The CPE for an ASIC averages about 25 percent, a 2.5-times advantage over the DSP solution. But the ASIC's gains, which can either increase performance or reduce power dissipation, come at the expense of flexibility. Any changes not anticipated during the design cycle of the ASIC result in a time-consuming and costly respin of the entire ASIC.
The CPE for an ACM is approximately 60 to 70 percent. This means algorithms no longer need to be changed to fit the predefined hardware architecture that exists on a processor. Instead, the optimum hardware needed for an algorithm comes into existence, accomplishes its task extremely fast and then goes away. Since the CPE metric is independent of silicon processing technology, an ACM will always be more efficient than a DSP or ASIC at the same silicon process geometry.
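The CPE metric itself is just the ratio defined earlier. The sketch below applies the article's cited efficiency figures to a hypothetical 10-million-gate device; the gate count is made up purely to illustrate the metric, and the DSP figure uses the midpoint of the cited 5-to-10-percent range.

```python
# Computational power efficiency (CPE): active gates / total gates.
def cpe(active_gates: int, total_gates: int) -> float:
    """Fraction of gates doing useful work at a given moment."""
    return active_gates / total_gates

# Hypothetical 10-million-gate device (assumed size, not from the text)
# at each technology's cited efficiency.
total = 10_000_000
cited = [("DSP/microprocessor", 0.075),   # midpoint of 5-10 percent
         ("ASIC",               0.25),
         ("ACM",                0.65)]    # midpoint of 60-70 percent

for tech, efficiency in cited:
    active = int(total * efficiency)
    print(f"{tech}: {cpe(active, total):.0%} of gates active "
          f"({active:,} of {total:,})")
```

Because CPE is a dimensionless ratio of gates, it holds regardless of the silicon process geometry, which is the basis of the claim that the ACM's advantage persists across process generations.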
With streaming video and its more advanced profiles demanding greater computational power, savvy designers will soon begin targeting the best CPE possible by relying on newer, more efficient technologies such as ACM. Otherwise, system engineers holding on to conventional DSPs, ASICs and microprocessors will continue building less efficient designs by staying with their step-by-step incremental developments.