Computer graphics research has always pushed the boundaries of software and hardware, forcing us to ask the question: "If the technology were fast enough, what else could be done?"
Thankfully, as hardware and software technologies advance, some of these questions can now be answered. New-generation Graphics Processing Units (GPUs) with dedicated high-speed memory pipelines can perform billions of operations per second, and their powerful shading instruction sets enable the development of remarkably complex effects. Through their integrated dual vertex shaders and advanced pixel shaders, graphics rendering engines (like NVIDIA's GeForce4 family) have unleashed ferocious graphics power on many of the computer graphics research topics currently being studied.
The power behind the new-generation GPUs can be understood in their ability to solve one of the classic computer graphics problems of all time: realistic rendering of hair and fur. Animals with bare skin have traditionally been easier to render than animals with fur. For example, the dinosaurs in Jurassic Park were easier to make lifelike and believable than the animals in the safari movie Jumanji. Trying to render hair and fur in real time has historically resulted in animals that looked plastic, stiff, and constrained. Computer-generated hair and fur should give the appearance of texture and movement, and should show the subtle reflections and light absorption that real hair or fur would show in a given situation.
A modern graphics engine - one which includes dual vertex shaders, pixel shader pipelines, a 3D textures library, shadow buffers and z-correct bump mapping - makes it possible to render complex, high quality hair and fur at very fast frame rates.
California Institute of Technology's Fuzzy Bear
In the early days of research, scientists at the California Institute of Technology created a new approach for rendering hair and fur by using a 3D hair texture volume. This 3D texture would have a surface frame, and would use normal, tangent, and binormal 3D vectors to calculate lighting and reflections. The parameters of the lighting model, however, would be freely distributed throughout the volume of hair. This volume was then rendered in software by walking through the volume and accumulating the densities along the viewing vector (effectively, a ray-casting approach).
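The idea of walking through a volume and accumulating densities along the viewing vector can be sketched in a few lines. This is only an illustration, not the Caltech implementation: the nested-list voxel grid, the fixed step size, and the exponential attenuation model are all assumptions made for the example.

```python
import math

def accumulate_opacity(density, origin, direction, step, n_steps):
    """March along a ray through a voxel grid and return accumulated opacity.

    density   -- hypothetical 3D nested list of non-negative hair densities
    origin    -- ray start (x, y, z) in voxel units
    direction -- unit-length viewing vector
    """
    transmittance = 1.0  # fraction of light that still gets through
    x, y, z = origin
    dx, dy, dz = direction
    for _ in range(n_steps):
        i, j, k = math.floor(x), math.floor(y), math.floor(z)
        inside = (0 <= i < len(density)
                  and 0 <= j < len(density[0])
                  and 0 <= k < len(density[0][0]))
        if inside:
            # Assumed attenuation model: each step absorbs light
            # exponentially in proportion to the local density.
            transmittance *= math.exp(-density[i][j][k] * step)
        x, y, z = x + dx * step, y + dy * step, z + dz * step
    return 1.0 - transmittance

# A uniform 4x4x4 block of density 1.0, viewed along the x axis.
volume = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
opacity = accumulate_opacity(volume, (0.5, 0.5, 0.5), (1.0, 0.0, 0.0), 1.0, 4)
```

Every pixel needs one such march, and every march touches many voxels, which helps explain why a software renderer of that era needed hours per frame.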
Figure 1. The Fuzzy Bear
This approach produced the "Fuzzy Bear" picture, which became a reference of sorts in the computer graphics industry back in 1989. It is worth noting that when this research was conducted, a single frame took over two hours to render on a network of large IBM mainframes containing 12 IBM 3090 processors and four 3081 processors. And, while the bear was a graphics breakthrough for its time, a rendering time of two hours per frame was not acceptable for any real-time application; that is, for anything depicting motion.
In early 2001, researchers at Princeton University and Microsoft Corporation published a paper on the subject of real time fur. The principal contribution of this paper was the concept of rendering the 3D fur volume by generating a series of concentric "shells". These shells were created by scaling the base "skin" layer along the vertex normal. The textures for these shells would be created from samples of the 3D hair texture at different depths in the volume. These textures would then be applied to their corresponding concentric shell layers, and the concentric shells would then be blended together to produce the final appearance of hair or fur.
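The shell construction described above can be sketched as follows. This is a minimal illustration, not the researchers' code: the function names, the even spacing of shells, and the grayscale "over" blending are assumptions made for the example.

```python
def make_shells(vertices, normals, n_shells, fur_length):
    """Offset every skin vertex along its unit normal, once per shell."""
    shells = []
    for s in range(1, n_shells + 1):
        offset = fur_length * s / n_shells  # assumed: evenly spaced shells
        shells.append([
            (vx + nx * offset, vy + ny * offset, vz + nz * offset)
            for (vx, vy, vz), (nx, ny, nz) in zip(vertices, normals)
        ])
    return shells

def blend_shells(skin_value, layers):
    """Composite shell samples over the skin, innermost shell first.

    layers -- list of (value, alpha) pairs sampled from each shell texture
    """
    result = skin_value
    for value, alpha in layers:
        result = alpha * value + (1.0 - alpha) * result
    return result

shells = make_shells([(0.0, 0.0, 0.0)], [(0.0, 0.0, 1.0)], 4, 1.0)
pixel = blend_shells(0.0, [(1.0, 0.5), (1.0, 0.5)])
```

Each shell is just the skin mesh pushed outward a little further, so the cost of the technique grows roughly with the number of shells drawn.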
Lastly, in addition to the shell geometry, the researchers added specially created fin geometry and textures to improve the object's appearance in silhouette. These fin textures produce the wispy bits of hair and fur that add realism to the animal, especially along its outline.
The result of their research became known as "Furry Bunny" - a rendering containing 5,000 faces (or polygons), 7,547 edges, and 306 patches. (In motion on an NVIDIA nfiniteFX II engine, this can run between 12 and 23 frames per second, depending on the number of concentric shells turned on.)
Figure 2. The Furry Bunny
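One plausible way to decide where fins matter is to look for edges on the outline of the model. The test below is an assumption for illustration, since the text does not spell out how fins are selected: an edge is treated as a silhouette edge when one of its two adjacent faces points toward the viewer and the other points away.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def is_silhouette_edge(normal_a, normal_b, view_dir):
    """True when the two faces sharing an edge face opposite ways.

    normal_a, normal_b -- normals of the two faces adjacent to the edge
    view_dir           -- vector from the surface toward the viewer
    """
    return dot(normal_a, view_dir) * dot(normal_b, view_dir) < 0.0

toward_viewer = (0.0, 0.0, 1.0)
on_outline = is_silhouette_edge((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), toward_viewer)
interior = is_silhouette_edge((0.0, 0.0, 1.0), (0.0, 1.0, 1.0), toward_viewer)
```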
Research on hair and fur rendering underlined two key technologies: vertex shaders and pixel shaders. Ordinarily, every object in a computer graphics image is a "wireframe" made of a certain number of triangles. A "vertex" is a corner of one of the triangles that make up each 3D object. Each vertex carries a lot of information used in rendering, including the position coordinates x, y, z and w (weight), and color data coded in the common 'RGBA' format (for "red," "green," "blue" and "alpha"). There are also texture coordinates s, t, r and q, which represent the texture and its position for the vertex. A vertex can in fact have several sets of texture coordinates when more than one texture is applied to it. Additionally, there might be fog and point size information, and more.
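The per-vertex data listed above could be modeled as a simple record. The field names and default values here are hypothetical and chosen only to mirror the description; real graphics APIs define their own vertex layouts.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec4 = Tuple[float, float, float, float]

@dataclass
class Vertex:
    position: Vec4                       # x, y, z, w
    color: Vec4 = (1.0, 1.0, 1.0, 1.0)   # red, green, blue, alpha
    # One (s, t, r, q) tuple per texture applied to this vertex.
    tex_coords: List[Vec4] = field(default_factory=list)
    fog: float = 0.0
    point_size: float = 1.0

corner = Vertex(position=(0.0, 1.0, 0.0, 1.0))
corner.tex_coords.append((0.5, 0.5, 0.0, 1.0))   # base fur texture
corner.tex_coords.append((0.25, 0.75, 0.0, 1.0)) # second texture layer
```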
Figure 3. Vertex shaders cull tiling information (lighting, texture, etc.), while Pixel shaders tailor each dot on the screen
Figure 4. The vertex refers to the corners of a tile, and includes weighted RGB values
Programmable vertex shaders enable developers of 3D applications to apply special programs (affecting color, position, lighting, fog and texture) to each vertex of an object or to a complete scene (frame). These programs are executed on the graphics processor without requiring CPU resources. Within the coming year, even commodity graphics hardware will have programmable vertex shaders. Such shaders will be ideal for accelerating shell rendering, since they can directly evaluate the geometric offsets between shells, and implement them with a small number of instructions.
Programmable pixel shaders render the actual pixels that make up the 3D image. The vertices of the 3D scene leave the vertex shader transformed and lit, and several steps then precede pixel shading. Clipping removes all geometry that is not within the area of the screen. Back-face culling removes all triangles that face 'back', away from the viewer, and thus will not show up on the screen. Viewport mapping finally transforms the x and y coordinates of the vertices to viewport coordinates. Triangle setup is where the life of the vertices ends and the life of the pixels begins. It also marks the change of the 3D scene from 'real' 3D to 'virtual' 3D or 2D. (The computer screen is only 2D after all, so the final frame has to be 2D as well.) Future programmable pixel shaders may be able to perform per-pixel lighting, which would be useful for wavy and curly hair patterns.
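Two of the steps named above, back-face culling and viewport mapping, are simple enough to sketch. The conventions used here are assumptions for the example: counter-clockwise triangles count as front-facing, coordinates arrive in a normalized range of -1 to 1, and pixel row 0 is at the top of the screen.

```python
def signed_area(a, b, c):
    """Twice-halved cross product of a 2D triangle; positive if counter-clockwise."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def is_back_facing(a, b, c):
    """Assumed convention: counter-clockwise winding means front-facing."""
    return signed_area(a, b, c) <= 0.0

def viewport_map(x_ndc, y_ndc, width, height):
    """Map normalized coordinates in [-1, 1] to pixel coordinates."""
    x_pixel = (x_ndc + 1.0) * 0.5 * width
    y_pixel = (1.0 - y_ndc) * 0.5 * height  # flip so +y points up on screen
    return (x_pixel, y_pixel)

front = is_back_facing((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
back = is_back_facing((0.0, 0.0), (0.0, 1.0), (1.0, 0.0))
center = viewport_map(0.0, 0.0, 640, 480)
```

The winding test is applied to the normalized coordinates, before the vertical flip in `viewport_map`, since flipping an axis reverses the apparent winding.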
Figure 5. A hardwired texture engine speeds the calculation of pixel values from tiles. A per-pixel processor would render hair curls and kinks
Figure 6. Texture mapping becomes part of the rasterization process, one that paints the CRT screen pixel-by-pixel
NVIDIA's Wolfman
The NVIDIA GeForce4 GPU family and nfiniteFX II engines are among the first graphics processors to render fur with per-pixel lighting realistically in real time. This capability can be applied to complex animated characters and run at high frame rates.
The nfiniteFX II engine's dual vertex shaders are able to drive more than 100 million vertices per second. An NVIDIA graphics character called "the Wolfman" contains over 100,000 polygons, and is more than 20 times the complexity of the "Furry Bunny" research project of a short year ago. (Figure 7 shows a wire mesh of the Wolfman before the concentric shells have been applied; the inset version is fully rendered.)
Figure 7. Wolfman Wiremesh
The Wolfman uses eight concentric fur shells. The color and density of the fur are controlled using a separate texture map that covers the entire body, which gives the fur its distinct look, rather than a uniform pattern. A pixel shader's support for multiple textures accelerates this type of rendering. One of the unique properties of stranded material such as hair and fur is that it reflects light more in some directions than others. This is known as "anisotropic" lighting, and is computationally expensive to reproduce. It requires ferocious graphics power.
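To give a feel for anisotropic lighting, here is a minimal sketch of a strand lighting model that depends on the hair's tangent direction rather than on a surface normal. It is not taken from the Wolfman demo; the formulas are one commonly used strand model (a sine-based diffuse term and a half-vector specular term), and the function names and the shininess parameter are assumptions for the example.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(v):
    length = math.sqrt(dot(v, v))
    if length == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return (v[0] / length, v[1] / length, v[2] / length)

def strand_diffuse(tangent, to_light):
    """Brightest when the light is perpendicular to the strand."""
    t, l = normalize(tangent), normalize(to_light)
    cos_tl = dot(t, l)
    return math.sqrt(max(0.0, 1.0 - cos_tl * cos_tl))  # sine of the angle

def strand_specular(tangent, to_light, to_eye, shininess):
    """Highlight peaks when the half vector is perpendicular to the strand."""
    t, l, e = normalize(tangent), normalize(to_light), normalize(to_eye)
    half = (l[0] + e[0], l[1] + e[1], l[2] + e[2])
    if dot(half, half) == 0.0:
        return 0.0  # light and eye exactly opposite: no defined half vector
    cos_th = dot(t, normalize(half))
    return math.sqrt(max(0.0, 1.0 - cos_th * cos_th)) ** shininess

across = strand_diffuse((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
along = strand_diffuse((1.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

Because the result changes as the strand direction rotates under a fixed light, the same patch of fur can look bright from one side and dark from another, which is the directional behavior the text describes.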
Figure 8. Wolfman
Table 1 shows the content and performance of various fur projects:
Table 1. Furry Creatures Comparison Chart
Thus, the Wolfman is not a mere static model. Rather, it is a completely skinned animation on top of a 61-bone skeleton. The complexity of this model - eight layers of vertex deformations - is on par with that used in television and film special effects production. Each and every vertex of the skin, fur layers, and fin geometry is deformed in real time to match the movement of the underlying skeleton.
As hardware and software technologies evolve, so do the research and remedies. In the case of rendering hair and fur, graphics processor engines provide a vehicle to apply the latest research and display the results in real time. In addition, these renderings occur at very high frame rates with stunning antialiased visuals.
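Deforming a vertex to follow a skeleton is often done by blending the positions that several bones would give it. The sketch below uses that approach (linear blend skinning) as an assumption; the text does not say which skinning method the Wolfman uses, and the 3x4 matrix layout and function names are hypothetical.

```python
def transform_point(matrix, point):
    """Apply a 3x4 bone matrix (rotation columns plus translation) to a point."""
    x, y, z = point
    return tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)

def skin_vertex(position, influences, bone_matrices):
    """Blend the vertex position produced by each influencing bone.

    influences -- list of (bone_index, weight) pairs; weights should sum to 1
    """
    blended = [0.0, 0.0, 0.0]
    for bone_index, weight in influences:
        moved = transform_point(bone_matrices[bone_index], position)
        for axis in range(3):
            blended[axis] += weight * moved[axis]
    return tuple(blended)

identity = ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))
shifted = ((1.0, 0.0, 0.0, 2.0), (0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0))
skinned = skin_vertex((1.0, 1.0, 1.0), [(0, 0.5), (1, 0.5)], [identity, shifted])
```

With eight fur shells plus fins layered over the skin, this blend has to run for every vertex of every layer on each frame, which is where vertex shader throughput matters.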
A machine like the nfiniteFX II, with dual vertex shaders, is capable of driving the complex type of geometries required for a skinned character like the Wolfman (100k+ polygons); advanced pixel shaders help finish the job.
To view a streaming movie of the Wolfman demo, visit:
http://www.nvidia.com/view.asp?PAGE=power_demos
A Glossary of Computer Graphics Terms
Bit Depth
The bit depth refers to the number of bits of precision for the color and z-values associated with each pixel on the screen. More bits of precision improve the visual realism and accuracy of the rendered frame. The two most common bit depths in modern graphics hardware are 16-bit and 32-bit. Each of these values can be associated with color or Z-values. Color that is 32-bit (for example) typically is used to represent red, green, blue and alpha (or transparency) values with up to 8 bits per component, or 256 "values" for each of those components. A 32-bit z-value is typically allocated as 24-bits of Z precision (or depth precision) and 8 bits of stencil or "mask" precision.
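As an illustration of these bit allocations, the sketch below packs four 8-bit color components into one 32-bit value, and a 24-bit depth plus an 8-bit stencil into another. The byte order within each word is an assumption; actual hardware formats vary.

```python
def pack_rgba8888(r, g, b, a):
    """Pack four 0-255 components into one 32-bit integer (assumed RGBA order)."""
    return (r << 24) | (g << 16) | (b << 8) | a

def unpack_rgba8888(value):
    return ((value >> 24) & 0xFF, (value >> 16) & 0xFF,
            (value >> 8) & 0xFF, value & 0xFF)

def pack_depth24_stencil8(depth, stencil):
    """Pack a depth in [0.0, 1.0] as 24 bits, followed by an 8-bit stencil."""
    depth_bits = round(depth * 0xFFFFFF)
    return (depth_bits << 8) | (stencil & 0xFF)

opaque_red = pack_rgba8888(255, 0, 0, 255)
far_plane = pack_depth24_stencil8(1.0, 0)
```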
Depth Complexity
Depth complexity is a measure of the complexity of a scene. It refers to the number of times any given pixel must be rendered before the frame is done. For example, a rendered image of a wall has a depth complexity of one. An image of a person standing in front of a wall has a depth complexity of two. An image of a dog behind the person but in front of the wall has a depth complexity of three, and so on. As depth complexity increases, more rendering horsepower and bandwidth are needed to render each pixel or scene. The average depth complexity of today's graphics applications is two to three, meaning that every pixel you end up seeing gets rendered two or three times by the graphics processor.
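The wall, person, and dog example can be turned into a small calculation. The four-pixel "screen" and the coverage flags below are made up purely for illustration.

```python
def depth_complexity(layers):
    """Count how many objects are drawn at each pixel.

    layers -- one list of 0/1 coverage flags per object, all the same length
    Returns (per-pixel counts, average count over the screen).
    """
    n_pixels = len(layers[0])
    per_pixel = [sum(layer[i] for layer in layers) for i in range(n_pixels)]
    return per_pixel, sum(per_pixel) / n_pixels

wall = [1, 1, 1, 1]    # covers the whole screen
person = [0, 1, 1, 0]  # covers the middle two pixels
dog = [0, 0, 1, 0]     # overlaps the person at one pixel
counts, average = depth_complexity([wall, person, dog])
```

Here the pixel covered by all three objects is drawn three times, while the average over the whole screen is lower because most pixels show only the wall.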
Fill Rate
Fill rate is the rate at which pixels are drawn into the screen memory. Fill rate is a common measure used to illustrate the pixel processing capabilities of today's 3D graphics processors. Fill rate is usually measured in millions of pixels per second (Mpixels/sec.). In 1997, 50-70 Mpixels/sec. was considered state of the art. In 2002, the leading 3D graphics processors will be capable of more than 1200 Mpixels/sec. While this improvement is an incredible achievement, it is still barely enough to create a compelling 3D environment. Rendering pixels at such a high rate consumes enormous amounts of memory bandwidth.
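A rough estimate of the fill rate a scene demands is the number of pixels on screen, times the frame rate, times the depth complexity. The resolution, frame rate, and depth complexity below are illustrative assumptions, and the estimate ignores extra passes such as multiple fur shells.

```python
def required_fill_rate(width, height, fps, depth_complexity):
    """Pixels that must be drawn per second for the given scene load."""
    return width * height * fps * depth_complexity

pixels_per_second = required_fill_rate(1024, 768, 60, 3)
mpixels_per_second = pixels_per_second / 1_000_000
```

At these assumed settings the requirement is about 142 Mpixels/sec. Layered effects multiply it quickly, since every additional translucent layer drawn over a pixel adds to the effective depth complexity.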
Frames per Second
Frames per second (fps), or frame rate, refers to how many times per second the scene is updated by the graphics processor. Higher frame rates yield smoother, more realistic animation. It is generally accepted that 30fps provides an acceptable level of animation, but increasing the performance to 60fps results in significantly improved interaction and realism. Beyond 75fps it is difficult to detect any performance improvement. Displaying images faster than the refresh rate of the monitor wastes graphics computing power, because the monitor is unable to update its phosphors (or display) that fast.
Memory Bandwidth
Memory bandwidth refers to the rate at which data is transferred between the graphics processor and graphics memory. Memory bandwidth limitations are one of the key bottlenecks that must be overcome to deliver truly realistic 3D environments. Delivering truly stunning 3D requires high resolution and 32-bit color depth at high frame rates, with rich geometry, sophisticated texture mapping, and complex vertex and pixel shading.
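A back-of-the-envelope estimate shows how quickly pixel throughput turns into memory traffic. The traffic model below is a deliberate simplification and an assumption: each rendered pixel is charged one depth read, one depth write, one color write, and one texture read of 4 bytes each, with no caching or compression.

```python
def framebuffer_bandwidth(pixels_per_second, bytes_color=4, bytes_z=4,
                          bytes_texture=4):
    """Estimated bytes per second moved between the GPU and its memory."""
    # Per pixel: z read + z write + color write + one texture read.
    bytes_per_pixel = 2 * bytes_z + bytes_color + bytes_texture
    return pixels_per_second * bytes_per_pixel

bytes_per_second = framebuffer_bandwidth(1_200_000_000)
gigabytes_per_second = bytes_per_second / 1_000_000_000
```

Under these assumptions, a fill rate of 1200 Mpixels/sec. would imply roughly 19.2 GB/sec. of traffic, which illustrates why memory bandwidth becomes a bottleneck.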
Resolution
Resolution is the number of pixels on a screen. Higher resolutions can create a more realistic 3D environment because more scene detail can be displayed. Most modern displays are capable of at least 1280 horizontal pixels x 1024 vertical pixels, while many larger or more expensive displays are capable of 2048x1536 pixels. Most graphics applications support a variety of resolutions, allowing the end user to run at higher resolutions (and hence higher level of detail) with the trade-off being increased load on the graphics processing system.
Texture Mapping
Texture mapping is the technique of projecting a 2D image (typically a bitmap) onto a 3D object. Texture mapping allows substantial increases in visual detail without significant increases in polygon count. Because of the improved realism that can be obtained with a very small increase in computational cost, texture mapping is one of the most common techniques for displaying realistic 3D objects. In order to render a texture-mapped pixel, the texture data for that pixel needs to be read into the graphics processor, consuming memory bandwidth.
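The basic texture lookup can be sketched as a nearest-texel fetch. The wrapping behavior and the row and column conventions are assumptions for the example; real hardware also offers filtered lookups that read several texels per pixel.

```python
def sample_nearest(texture, s, t):
    """Return the texel nearest to texture coordinates (s, t).

    texture -- 2D list of texel values, indexed as texture[row][column]
    s, t    -- coordinates where 0.0 to 1.0 spans the image; values outside
               that range wrap around (assumed repeat mode)
    """
    height, width = len(texture), len(texture[0])
    s, t = s % 1.0, t % 1.0
    column = min(int(s * width), width - 1)
    row = min(int(t * height), height - 1)
    return texture[row][column]

checker = [[1, 2],
           [3, 4]]
texel = sample_nearest(checker, 0.75, 0.25)
```

Each call stands for one texture read from graphics memory, which is the bandwidth cost the definition above refers to.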