Graphics Wisdom, Part IV

by Len "Viking1" Hjalmarson

Article Type: Hardware - Graphics
Article Date: June 20, 2002


GeForce Series
With Matrox bowing out of the 3-D accelerator business about two years ago, ATI and NVIDIA have had the playground to themselves. That will change this year with the introduction of the Matrox Parhelia 512 and 3DLabs's P10.

In late 1999, NVIDIA introduced its GeForce line of video accelerators with integrated transformation and lighting. This was close to a paradigm shift in game acceleration, and NVIDIA rightly marked the shift by labeling its GeForce chips “GPUs” (Graphics Processing Units).

The GeForce2 (GF2) series topped out with the GF2 Ultra, which carried 64MB of on-board memory and scored nearly 8,000 3DMarks on a 1 GHz-class CPU.

GeForce3 (GF3) arrived in 2001 with more robust features, nearly completing the transition to a true GPU. Its programmable pixel and vertex shaders, introduced alongside DirectX 8, give developers more flexibility than they have ever had. Antialiasing has also become a standard feature rather than an extra; most gamers now demand some form of it in their favorite games.

While GF3 was revolutionary, GeForce4 (GF4) is evolutionary: it builds on GF3's features while improving speed, efficiency, and memory bandwidth. GF4 arrives in four major variants, ranging from the flagship Ti4600 with 128MB to the GF4 MX with 64MB.

GF4 is roughly 40 percent faster than GF3 in games, and more than 25 percent more efficient at skipping texture work for pixels that will be hidden (occluded) in the final scene. Antialiasing, meanwhile, finally lives up to the hype.
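The general idea behind skipping work on occluded pixels can be sketched as an early depth test: compare a fragment's depth against the depth buffer before texturing it, and drop it if something closer is already there. This is a generic illustration under our own assumptions (a simple per-pixel depth buffer, smaller depth meaning closer), not NVIDIA's actual hardware method; `shade_with_early_z` and its fragment tuples are hypothetical.

```python
from typing import Callable, List, Tuple

# A fragment is (x, y, depth); smaller depth = closer to the viewer (assumption).
Fragment = Tuple[int, int, float]


def shade_with_early_z(
    fragments: List[Fragment],
    width: int,
    height: int,
    texture_lookup: Callable[[int, int], int],
) -> Tuple[List[List[int]], int]:
    """Depth-test each fragment *before* texturing it, so hidden pixels cost no texture work.

    Returns the final color buffer and the number of texture lookups performed.
    """
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[0] * width for _ in range(height)]
    lookups = 0
    for x, y, z in fragments:
        if z >= depth[y][x]:
            continue  # occluded by something already drawn: skip the texture fetch
        depth[y][x] = z
        color[y][x] = texture_lookup(x, y)
        lookups += 1
    return color, lookups


# Two surfaces cover pixel (0, 0); drawn front to back, the farther one is never textured.
frags = [(0, 0, 0.2), (0, 0, 0.5), (1, 0, 0.9), (1, 0, 0.1)]
color, lookups = shade_with_early_z(frags, width=2, height=1, texture_lookup=lambda x, y: 10 * x + 1)
print(color, lookups)  # [[1, 11]] 3
```

The savings depend on draw order: front-to-back drawing lets the test reject the most work, while back-to-front drawing (as at pixel (1, 0) above) still textures every layer.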


From GPU to VPU

3DLabs P10

3DLabs could have labeled their new chip an advanced GPU, but the company feels it is distinctive enough to warrant a new acronym: VPU. In 3DLabs's view, current GPUs aren't flexible enough, and the VPU (Visual Processing Unit) represents a new approach to the problem.

What makes the P10 VPU so different from existing graphics solutions is not new pipelines, but rather what happens at each stage in the 3-D pipeline.

The most obvious improvement is one that Matrox Parhelia can also claim: a 256-bit memory bus. As with Matrox's new chip, 3DLabs claims more than 20 GB/s of memory bandwidth, twice that of a GF4.
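Why a wider bus doubles bandwidth comes down to simple arithmetic: peak bandwidth is the bytes moved per transfer times the transfers per second. The sketch below assumes figures the article does not state, namely a 128-bit bus at 650 MHz effective for the GF4 Ti4600; since the P10's memory clock isn't given, the 256-bit case simply reuses the same clock to isolate the effect of bus width.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, effective_clock_mhz: float) -> float:
    """Peak bandwidth = bytes moved per transfer x transfers per second.

    effective_clock_mhz is the data rate; for DDR memory it is twice the physical clock.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_clock_mhz * 1e6 / 1e9


# Assumed figures: GF4 Ti4600 with a 128-bit bus at 650 MHz effective.
gf4 = peak_bandwidth_gb_s(128, 650)
# Hypothetical: the same memory clock on a 256-bit bus.
wide = peak_bandwidth_gb_s(256, 650)
print(f"GF4: {gf4:.1f} GB/s, 256-bit: {wide:.1f} GB/s, ratio {wide / gf4:.1f}x")
```

At equal clocks this prints 10.4 GB/s against 20.8 GB/s, which matches the "more than 20 GB/s, twice a GF4" claim.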

The other notable difference is the addition of a pipeline stage called the Command Processor. The full specifications are as follows:

  • 0.15-micron manufacturing process (same as GF4)
  • 76M transistors (fewer than Parhelia but more than GF4)
  • 860-ball HSBGA package
  • 4 pixel rendering pipelines, each able to process two textures
  • 256-bit DDR SDRAM memory interface
  • up to 256MB of memory on-board
  • AGP 4X support
  • Full DX8 pixel and vertex shader support


The P10's 3-D pipeline

Command and Vertex Processors
The new Command Processor is dedicated to handling multiple 3-D threads. Today's games don't do this, but future operating systems will. When Longhorn (the codename for Microsoft's next-generation version of Windows) arrives and DirectX-accelerated desktops become the norm, different threads will send work to the graphics card at the same time. The Command Processor is designed to manage them efficiently.
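3DLabs hasn't described how the Command Processor arbitrates between threads, so the following is only a hypothetical sketch of the problem it solves: several contexts (say, a game and an accelerated desktop) each submit a stream of commands, and a simple round-robin arbiter interleaves them so no one context starves the others. Real hardware would more likely switch per command buffer or time slice than per command; `round_robin_schedule` is an illustrative name, not a 3DLabs API.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


def round_robin_schedule(streams: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Interleave command streams from several 3-D contexts, one command per turn.

    Hypothetical policy: a context with work left goes to the back of the line,
    so no single application can starve the others.
    """
    queues: Deque[Tuple[str, Deque[str]]] = deque(
        (name, deque(cmds)) for name, cmds in streams.items() if cmds
    )
    order: List[Tuple[str, str]] = []
    while queues:
        name, cmds = queues.popleft()
        order.append((name, cmds.popleft()))
        if cmds:
            queues.append((name, cmds))  # still has work: rejoin at the back
    return order


# A game and a hypothetical DirectX-accelerated desktop share the card.
print(round_robin_schedule({"game": ["draw1", "draw2", "draw3"], "desktop": ["blit1"]}))
```

The desktop's single command gets serviced right after the game's first one instead of waiting for the whole game stream to drain.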

The most powerful GPU currently available is the GF4, which has two vertex shaders. 3DLabs's P10 instead has sixteen 32-bit floating-point processors that handle vertex processing. Unfortunately, a straight comparison between NVIDIA's approach and the P10's isn't possible. If we converted the GF4's vector vertex shaders to 3DLabs's scalar approach, each would count as four scalar units, so we could claim that the GF4 has eight vertex processors.

Instead of using very powerful dedicated units, 3DLabs uses sixteen 32-bit scalar vertex processors (VPs). Each one of these processors can complete a scalar operation in one clock cycle, but they take four clock cycles to complete a typical vertex operation. However, with sixteen of these VPs in parallel, the P10 has the potential to run at twice the vertex processing speed of the GF4. Alternatively, one could have lighting of double the complexity and run the scene at the same speed. From the standpoint of the programmer, the software only sees a single virtual vertex processor.
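The throughput argument above can be checked with a little arithmetic. The sketch assumes, as the "twice the speed" claim implies, that each of the GF4's two vector shaders completes one 4-component vertex operation per clock, and that both chips run at the same clock; the function names are illustrative only.

```python
from typing import Sequence, Tuple


def dp4_on_scalar_unit(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """A 4-component dot product computed one component per clock on a scalar unit."""
    acc, clocks = 0.0, 0
    for ai, bi in zip(a, b):
        acc += ai * bi  # one scalar multiply-add per clock
        clocks += 1
    return acc, clocks


def vertex_ops_per_clock(units: int, clocks_per_op: int) -> float:
    """Aggregate 4-component vertex operations completed per clock across all units."""
    return units / clocks_per_op


p10 = vertex_ops_per_clock(units=16, clocks_per_op=4)  # sixteen scalar VPs, 4 clocks per op
gf4 = vertex_ops_per_clock(units=2, clocks_per_op=1)   # two vector shaders, 1 clock per op (assumed)
print(dp4_on_scalar_unit([1, 2, 3, 4], [1, 1, 1, 1]))  # (10.0, 4)
print(p10 / gf4)  # 2.0
```

Each scalar unit is four times slower per vertex operation, but with sixteen of them in parallel the P10 finishes four operations per clock against the GF4's two.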

Better still, with DirectX 9 on the way, 3DLabs believes it will be able to claim full support for DX9's new vertex shader specification.


P10 and "Pixel Shaders"
One key distinction between NVIDIA's GF4 and the 3DLabs P10 is in pixel shaders. 3DLabs claims that the ATI and NVIDIA approach allows flexibility but falls short of real programmability.

3DLabs admits that NVIDIA's implementation is fairly powerful, but felt it could do better. As a result, the P10 takes a unique approach. Its programmable texture processor supports the current DX8 pixel shader specification, but, like its vertex engine, this pixel shader is built from a number of 32-bit processors working in parallel.

The P10 has sixty-four floating-point processors that help determine texture coverage, while another sixty-four integer processors calculate final pixel colors. All of these processors are fully programmable.
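The article doesn't say exactly what "texture coverage" involves. One plausible reading, assumed here, is the per-pixel texture footprint used to choose a mipmap level, a calculation that benefits from floating point, while blending 8-bit color values fits integer math. The two functions below are hypothetical illustrations of that split, not 3DLabs's design.

```python
import math


def mip_level(du_dx: float, dv_dy: float, tex_size: int) -> float:
    """Floating-point stage: estimate how many texels one pixel covers and pick a mip level."""
    footprint = max(abs(du_dx), abs(dv_dy)) * tex_size  # texels spanned per pixel step
    return math.log2(footprint) if footprint > 1.0 else 0.0


def modulate_8bit(texel: int, shade: int) -> int:
    """Integer stage: multiply two 8-bit color values, rounded to the nearest 8-bit result."""
    return (texel * shade + 127) // 255


print(mip_level(1 / 64, 1 / 64, 256))  # 2.0: each pixel spans 4 texels
print(modulate_8bit(255, 128), modulate_8bit(128, 128))  # 128 64
```

The footprint calculation needs fractions and a logarithm, while the color multiply stays exact in integers because every input and output is an 8-bit value.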

Although the P10 offers enhanced DX8 pixel shaders, it isn't a true DX9 part. DX9 requires floating-point pixel pipelines, which the P10 lacks. They would require far more gates, and 3DLabs estimates that at least a 0.13-micron process, and perhaps a 0.1-micron process, will be needed before they become possible and economically feasible. This suggests we are likely to see a P15 (or similar designation) with full floating-point pixel pipelines in 2003.
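Why floating-point pixel pipelines matter can be shown with a toy two-step shader: darken a color, then brighten it back. In 8-bit integer math the intermediate result is truncated and the lost precision never returns; in floating point it survives. This is a generic illustration assuming 8 bits per channel, not a model of the P10's actual datapath.

```python
def darken_then_brighten_int8(value: int, factor: int) -> int:
    """Two shader steps in 8-bit integer math: the division truncates, losing low bits."""
    darkened = value // factor
    return min(255, darkened * factor)


def darken_then_brighten_float(value: float, factor: float) -> float:
    """The same two steps in floating point keep the intermediate fraction."""
    return (value / factor) * factor


print(darken_then_brighten_int8(200, 16))       # 192: eight levels lost
print(darken_then_brighten_float(200.0, 16.0))  # 200.0
```

Longer shader programs chain many such steps, so integer rounding errors accumulate into visible banding.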

This gets more interesting when you consider NVIDIA's position. With the NV30 (GF5) due in the fall, NVIDIA will reach full DX9 compliance and may retake any lead it loses to the P10 and Parhelia 512 this summer.


And That’s Not All
The P10's programmable texture processor can apply 8 textures in a single pass. ATI's Radeon 8500 can apply 6, and the GF4 manages 4.
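The per-pass limits matter because a surface with more texture layers than the chip can apply at once must be drawn again, with each extra pass re-sending the geometry and blending into the frame buffer. The sketch uses the article's figures and assumes every layer is a single texture that can be split freely across passes, which isn't true of every effect.

```python
import math
from typing import Dict

# Textures applied per pass, as quoted in the article.
TEXTURES_PER_PASS: Dict[str, int] = {"P10": 8, "Radeon 8500": 6, "GF4": 4}


def passes_needed(num_layers: int, textures_per_pass: int) -> int:
    """Each pass re-sends the geometry and blends into the frame buffer."""
    return math.ceil(num_layers / textures_per_pass)


def passes_by_chip(num_layers: int) -> Dict[str, int]:
    return {chip: passes_needed(num_layers, k) for chip, k in TEXTURES_PER_PASS.items()}


print(passes_by_chip(8))  # {'P10': 1, 'Radeon 8500': 2, 'GF4': 2}
```

For an 8-layer surface, the P10 finishes in one pass while the other two chips need two.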

But the final stage in the P10's pipeline is also programmable, a feature unique to the VPU; no other chip is built like this. After a pixel has been textured and filtered, processes such as antialiasing are applied. In all other GPUs these final processes are hard-wired into the pipeline. In the P10 they are not.

Obviously, this greatly increases the flexibility of the architecture. The P10 can support any existing type of antialiasing, and any other you can dream up. Edge antialiasing, a method that offers much better performance and is already hard-wired into Matrox's new Parhelia 512, is supported under OpenGL. Supersampling and multisampling are also supported.
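A programmable final stage means the step that turns a pixel's subsamples into one output color becomes a swappable function rather than fixed hardware. The sketch below models that idea with grayscale values and four subsamples per pixel (both assumptions); `resolve`, `box_filter`, and `weighted_filter` are hypothetical names, not a 3DLabs interface.

```python
from typing import Callable, List

Samples = List[float]  # one pixel's subsample intensities (grayscale, assumed)


def box_filter(samples: Samples) -> float:
    """Plain average of all subsamples, the classic supersampling resolve."""
    return sum(samples) / len(samples)


def weighted_filter(samples: Samples) -> float:
    """Hypothetical custom filter: inner subsamples count three times as much as outer ones."""
    weights = (0.125, 0.375, 0.375, 0.125)
    return sum(w * s for w, s in zip(weights, samples))


def resolve(buffer: List[List[Samples]], pixel_filter: Callable[[Samples], float]) -> List[List[float]]:
    """Collapse each pixel's subsamples with whatever filter the programmer supplies."""
    return [[pixel_filter(px) for px in row] for row in buffer]


# One row, two pixels, four subsamples each: an edge crosses the first pixel.
buffer = [[[0.0, 1.0, 1.0, 0.0], [1.0, 1.0, 1.0, 1.0]]]
print(resolve(buffer, box_filter))       # [[0.5, 1.0]]
print(resolve(buffer, weighted_filter))  # [[0.75, 1.0]]
```

On a hard-wired chip only the first filter would be available; here the second is just another function passed in.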

Thanks to the programmable pixel stage, the P10 can support much higher color depths, even 64-bit color. As with the Matrox Parhelia, this includes support for gamma-correct 10-bit RGB outputs.
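The benefit of 10-bit outputs shows up when a smooth gradient is gamma-encoded and quantized: more bits per channel means more distinct steps and less banding. The sketch assumes a simple power-law gamma of 2.2, since the article doesn't specify the transfer curve; `encode` is an illustrative helper.

```python
def encode(linear: float, bits: int, gamma: float = 2.2) -> int:
    """Gamma-encode a linear intensity in [0, 1] and quantize it to an n-bit output code."""
    max_code = (1 << bits) - 1
    return round((linear ** (1.0 / gamma)) * max_code)


# Feed a smooth 4096-step linear ramp through 8-bit and 10-bit outputs.
ramp = [i / 4095 for i in range(4096)]
levels_8 = len({encode(v, 8) for v in ramp})
levels_10 = len({encode(v, 10) for v in ramp})
print(levels_8, levels_10)  # distinct output steps actually used by each
print(2 ** 24, 2 ** 30)     # total colors: 8 bits vs 10 bits per channel
```

At 10 bits per channel the card can show over a billion colors instead of about 16.7 million, and the same ramp lands on several times as many distinct steps.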


Virtually Yours
Finally, 3DLabs's VMS (Virtual Memory System) is a new application of existing technology. VMS stores all textures in main system memory and treats the memory on the graphics card itself as a large dedicated cache.

In a traditional memory architecture, when you are flying through some hills and the textures for that area are requested, each texture must be downloaded to the card in its entirety. With VMS, a 256x256 block of 32-bit pixels can instead be pulled in locally and accessed. This is far more efficient, since in most 3-D environments only a small part of a texture is actually visible on screen. With the P10's VMS the entire texture remains in system memory, and only the part being seen is transferred to video memory.
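The "card memory as a cache" idea can be sketched as a tile cache: a texel access maps to a 256x256 tile, and only tiles not already on the card are copied over. The tile size comes from the article; least-recently-used eviction and the `TileCache` class are our own assumptions, since 3DLabs's replacement policy isn't described.

```python
from collections import OrderedDict
from typing import Tuple

TILE = 256                        # tile edge in texels, from the article's 256x256 example
BYTES_PER_TILE = TILE * TILE * 4  # 32-bit pixels -> 256 KB per tile

TileKey = Tuple[str, int, int]    # (texture name, tile column, tile row)


class TileCache:
    """On-board memory modeled as a cache of texture tiles; whole textures stay in system memory.

    Least-recently-used eviction is an assumption, not 3DLabs's documented policy.
    """

    def __init__(self, capacity_tiles: int) -> None:
        self.capacity = capacity_tiles
        self.tiles: "OrderedDict[TileKey, None]" = OrderedDict()
        self.misses = 0  # each miss means one tile copied from system memory

    def touch(self, texture: str, u: int, v: int) -> None:
        """Record an access to texel (u, v), fetching its tile on a miss."""
        key = (texture, u // TILE, v // TILE)
        if key in self.tiles:
            self.tiles.move_to_end(key)  # now the most recently used
            return
        self.misses += 1
        self.tiles[key] = None
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)  # evict the least recently used tile

    def bytes_transferred(self) -> int:
        return self.misses * BYTES_PER_TILE


cache = TileCache(capacity_tiles=3)
for u, v in [(0, 0), (10, 10), (300, 0), (0, 300), (5, 5)]:
    cache.touch("hills", u, v)
print(cache.misses, cache.bytes_transferred())  # 3 786432
```

Five texel accesses cost three tile transfers (768 KB), whereas downloading a hypothetical 2048x2048 32-bit texture whole would move 16 MB.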

Game developers such as John Carmack are applauding this change as a long-awaited revolution, comparable to when Microsoft freed its OS from the limits of physical RAM by moving to virtual memory on the hard disk. VMS will let developers use far larger textures than we have yet seen.


Conclusion
3DLabs originally developed the P10 for high-end professional markets, but Creative Labs recognized that the technology could easily be adapted to break back into the game and consumer markets. Creative purchased 3DLabs and has stated its intention to release a gaming board in the fall.

With performance likely to surpass GF4, the P10 will become Creative's answer to NV30 this fall. This is a huge benefit for virtual pilots, since the competition means NV30 (GF5) may launch at a lower price than GF4 did. It also means that both NV30 and the Matrox Parhelia 512 are likely to arrive sooner rather than later.

These developments are also welcome from another standpoint. When NVIDIA purchased 3dfx, 3dfx's technology and research went with it. While NVIDIA benefited directly from the acquisition, the industry's R&D inevitably narrowed a bit. With both Matrox and 3DLabs continuing to develop hardware for high-end graphics and gaming, we'll see the pace of hardware advances increase again.

The P10 is a powerful chip. Just as NVIDIA rightly considered its GPU a revolution, the P10 offers some revolutionary technology, and 3DLabs is justified in coining a new acronym, VPU. The technology will truly come of age in 2003, when the entire pipeline becomes programmable and reaches full DX9 compliance.

These advances won't benefit the games of 2002, but from 2003 onward, as games become fully DX8 compliant and later DX9 compliant, we may see a graphics revolution as great as the one we saw from 1999 to 2001. The sky is the limit, and that is just where virtual pilots want to be.


Notes: ATI R300
ATI had a sample of its latest RADEON board running at Computex in Taipei at the VIA booth (though ATI wasn't very happy when it found out the board was there). Since the chip is already up and running, it is likely a 0.15-micron part rather than a 0.13-micron part. This fueled speculation that it would not be a fully DX9-compliant solution, but rather something of a hybrid like the Matrox Parhelia.

However, it seems that ATI has somehow pulled it off. There is now good reason to believe that ATI will be first to market with a fully DX9-compatible video board. The R300 will sport 107 million transistors, about 70 percent more than the roughly 63 million in NVIDIA's current leader, the GF4 Ti. NVIDIA's next part, the NV30, is expected to sport 105-110 million transistors. Fill rate is expected to be around 2.0 Gpixels per second, which is lower than the expectation for the NV30.

No testing was possible, but the memory chips on the card were rated at 2.86ns (350MHz), giving an effective DDR memory clock of 700MHz. Rumor has it that the chip has a 256-bit memory bus, like the Parhelia 512 and 3DLabs P10, which would yield 22.4GB/s of raw memory bandwidth.
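The memory figures can be checked from the chip rating alone: a 2.86ns cycle time implies roughly 350MHz, DDR doubles that to 700MHz effective, and a 256-bit bus moves 32 bytes per transfer. The sketch assumes the rumored 256-bit bus and two transfers per clock, as the article implies; the function names are illustrative.

```python
def clock_from_cycle_time_mhz(cycle_time_ns: float) -> float:
    """A memory chip rated at t ns per cycle runs at up to 1000 / t MHz."""
    return 1000.0 / cycle_time_ns


def raw_bandwidth_gb_s(bus_width_bits: int, clock_mhz: float, transfers_per_clock: int = 2) -> float:
    """Bytes per transfer x transfers per second; DDR memory makes 2 transfers per clock."""
    return (bus_width_bits / 8) * clock_mhz * transfers_per_clock / 1000.0


print(round(clock_from_cycle_time_mhz(2.86)))  # 350
print(raw_bandwidth_gb_s(256, 350))            # 22.4
```
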

With parts running on silicon this early, it seems reasonable to expect to see production boards this summer.




© 2014 COMBATSIM.COM - All Rights Reserved