ATI's RADEON 9700: Do Ya Wanna Revolution?

by Len "Viking1" Hjalmarson

Article Type: Review
Article Date: August 09, 2002

Product Info

Product Name: ATI Radeon 9700
Category: Hardware: Video
Manufacturer: ATI
Release Date: August 19, 2002

* * *



You Say You Want a Revolution?
3dfx did it in 1996.

NVIDIA did it in 1999.

This year, ATI has done it. With the RADEON 9700, ATI takes 3-D game acceleration into the stratosphere.

3dfx introduced the world's first dedicated 3-D accelerator for games in 1996. The first simulations enhanced under Glide, the dedicated 3dfx API, were simply astonishing. Suddenly we were running 16-bit color at 640x480. EF2000 under Glide was transformed from a good experience to an outstanding one, with frame rates of 25 fps or better.

NVIDIA introduced the first video hardware with onboard transformation and lighting in 1999. Simulations like Jane's WW2 Fighters and USAF suddenly looked better and flew faster. NVIDIA’s GeForce series effectively created a dual processor system with a CPU and “GPU,” or graphics processing unit.

What was left to do? With dedicated graphics processors already handling most of the work, where would the next revolution take us?

This time the quantum leap in 3-D acceleration comes not from a new concept, but from taking proven concepts to perfection and extending others to their natural conclusions. ATI leaps to the front of the 3-D gaming world with the first DX9-compliant part, fully programmable, and with nearly twice the transistors of the current leader.

ATI has been around for as long as PC games themselves. Founded in 1985, ATI was making gamers smile long before 3dfx or NVIDIA existed. With their experience in the industry, it isn't a great surprise that these technological wizards make NVIDIA’s GeForce series look like yesterday’s news.

Logging performance anywhere from 50 to 250 percent better than the GeForce4 Ti4600, ATI's RADEON 9700 redefines the race.

Better still, antialiasing has never looked this good, and the performance penalty is the smallest yet. Image quality also takes a step up, with greater precision in the pixel shaders and the ability to apply sixteen textures in a single pass (compared to six for GeForce4). This is like taking that old VW 1600, doubling its capacity to 3.2 liters and adding a turbocharger!


It’s About Time
In truth, it’s about time that ATI came through. They have been promising to take the lead for two or three years now, but each time they have come up short. The sheer pace maintained by NVIDIA has been incredible, and even the RADEON 8500 wasn’t up to the challenge. Furthermore, ATI drivers have lacked a certain panache, while the driver building team at NVIDIA has shown both ability and determination.

ATI formally introduced two new products on the 20th of July: the evolutionary RADEON 9000 and the revolutionary RADEON 9700. The RADEON 9700 is the stuff of which revolutions are made, and it is the first ATI part to relegate NVIDIA's dominance to history.

The new RADEON is a next generation product. One industry watcher has compared the technological leap from the 8500 to the 9700 to NVIDIA moving from the GeForce 256 straight to the GeForce3!


The RADEON 9700
The big challenge for any new design is delivering greater power now, not just future promise. This may prove to be the golden thread for ATI, leading them out of the lackluster backroom maze of 3-D gaming and into the blazing light of the future.

The RADEON 9700 not only promises full DX9 compliance, thus ensuring it will be a workhorse both this year and next, but also offers substantial performance gains in current game environments. Just how much performance gain? Anywhere from 50 percent to a whopping 250 percent, depending on the game and the video settings. But before we look at specific numbers, let’s take a closer look at ATI’s technology. The following chart illustrates the main features of the GeForce4 Ti 4600 and the new RADEON 9700.

Feature Comparison

Turbocharged Rendering Engine
The chart tells all. The RADEON 9700 doubles the vertex shader pipelines, doubles the pixel pipelines, has a transform rate 250 percent greater than GeForce4, a fill rate more than twice that of GeForce4, and an antialiased fill rate 300 percent greater than GeForce4. Whew!

The rendering engine is composed of eight parallel 128-bit pixel pipelines. That's not all…each pipeline also has its own independent texture unit and pixel shader engine, allowing the chip to apply up to 16 textures per pass.

There aren't any games that can take full advantage of this ability yet, but those games will be quick in coming. In the meantime, the additional horsepower does benefit current games.

The 2.6 Gpixel/second fill rate is dependent on the clock speed of the RADEON 9700, and all those who have previewed the new board have made it clear that the clock speed has not yet been written in stone. Review boards are shipping at 325 MHz, and ATI has stated that production boards will ship at a minimum of 315 MHz. All indications are that ATI will reach their goal and ship the new board with a main clock of 325 MHz.
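For the number crunchers, that fill rate figure falls straight out of the specs: one pixel per pipeline per clock cycle. A quick back-of-the-envelope sketch in C, using the eight pipelines and 325 MHz quoted above (the shipping clock may of course end up different):

```c
#include <stdio.h>

int main(void)
{
    /* Figures quoted in this article: eight pixel pipelines at ~325 MHz.
       Peak fill rate is simply one pixel per pipeline per clock. */
    const int    pipelines      = 8;
    const double core_clock_mhz = 325.0;

    double mpixels_per_sec = pipelines * core_clock_mhz;   /* 2600 Mpixels/s */
    printf("Peak fill rate: %.1f Gpixels/s\n", mpixels_per_sec / 1000.0);
    return 0;
}
```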

How in the world are they running with a clock setting this high?

The RADEON 9700 is a .15 micron part, the same as GeForce4 and Matrox's new Parhelia 512. The Parhelia is not a DX9 part, but sports close to 90 million transistors. Its performance is around two-thirds that of GeForce4, while the RADEON 9700 exceeds GeForce4 performance by a wide margin.

It's a mystery how ATI is able to reach a clock rate exceeding GeForce4 on a .15 micron part. Granted, most GeForce4 Ti4600 parts will clock to 330 MHz, but the increased size and complexity of the RADEON 9700 make ATI's accomplishment an engineering feat.


SMARTSHADER 2.0
The RADEON 8500 debuted SMARTSHADER® technology, but like NVIDIA’s first expression of its Quincunx technology in the GeForce3, the original SMARTSHADER was less than impressive.

The RADEON 9700 incorporates SMARTSHADER 2.0, the second generation of ATI's programmable vertex/pixel shader technology. SMARTSHADER 2.0 allows for movie-quality effects in real time for games and other 3D applications.

One of the key improvements in the pixel shader engines is the ability to handle floating-point calculations—a necessity under DX9 specifications. This brings movie quality effects within reach of every gamer. The RADEON 9700’s pixel shaders can process three simultaneous instructions: texture look-up, address, and color operation.

The move to fully floating point pipelines is critical as 3-D graphics move towards greater fidelity. Consider a typical situation where a floating point number is handed off to an integer pipeline.

If the original number being sent into an integer pipeline is 10.0123432544850, the number that actually gets used is rounded or truncated: a pure integer format keeps only 10, and even a fixed-point format might keep only 10.012343. So? This isn't a problem if the result is being used to generate a relatively simple image, but when your goal is a photorealistic image an error of that size can mean the difference between an image that looks computer generated and an image that looks like real life.
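To make that concrete, here is a small illustration in C of what happens when that value is forced into a whole-number representation. This is ordinary C truncation, not a model of ATI's pipeline:

```c
#include <stdio.h>

int main(void)
{
    /* The example value from the text above. */
    double precise = 10.0123432544850;

    /* A pure integer representation can only carry whole numbers,
       so the fractional detail is simply thrown away. */
    int whole = (int)precise;    /* 10 */

    printf("floating point value : %.13f\n", precise);
    printf("integer value        : %d (error of %.13f)\n",
           whole, precise - (double)whole);
    return 0;
}
```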

A greater range of lighting values

This image shows what is possible with floating point precision.

Microsoft's Pixel Shader 2.0 specification is a component of DirectX 9. The improvements over DX 8.1 are substantial, as shown in the table below.

DirectX Comparison

The first difference you see is that the new Pixel Shader specification calls for a maximum of 16 texture inputs. This translates into the ability to apply 16 textures per pass, a whopping 250 percent increase over GeForce4 and the RADEON 8500.

Even more important, and a factor in the complexity of current and soon-to-arrive games, a fully floating-point rendering pipeline translates directly into higher quality lighting.

As good as the current simulations we fly look in comparison to early sims like European Air War, the simulations that take advantage of these new lighting features in 2003 will make our current sims look dated and worn. We still have a long way to go for lifelike lighting; the RADEON 9700 sets the stage.

With the old 32-bit integer pipelines every RGBA channel was limited to 8 bits, or 256 distinct values. 128 bits of floating point precision means a vastly greater range of distinct values, with the resulting extremes of bright and dark suddenly separated by lifelike differences. Some of the blurriness we experience near the extremes of current lighting in our games will simply disappear.
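For the programmers in the audience, here is a tiny sketch of why the extremes matter. The intensity values are purely illustrative (not taken from any real renderer), but they show how two clearly different bright sources collapse into the same pixel once clamped and quantized to 8 bits, while a floating point pipeline keeps them distinct:

```c
#include <stdio.h>

/* Convert a lighting intensity to an 8-bit channel: clamp to [0, 1]
   and quantize to one of 256 levels. */
static unsigned char to_8bit(float intensity)
{
    if (intensity < 0.0f) intensity = 0.0f;
    if (intensity > 1.0f) intensity = 1.0f;   /* bright detail is clipped here */
    return (unsigned char)(intensity * 255.0f + 0.5f);
}

int main(void)
{
    float sun   = 4.75f;   /* an extremely bright source (illustrative) */
    float glint = 3.90f;   /* a dimmer, but still very bright, reflection */

    /* In an 8-bit pipeline both collapse to the same pure white... */
    printf("8-bit : sun=%d  glint=%d\n", to_8bit(sun), to_8bit(glint));

    /* ...while a floating point pipeline preserves the difference for
       any later shading passes to work with. */
    printf("float : sun=%.2f  glint=%.2f\n", sun, glint);
    return 0;
}
```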


Double the Bus, and a 256 bit Memory Interface
The RADEON 9700 will support the AGP 8X specification, making it the first high-end graphics chip to double the current bus. The available bandwidth on the bus moves from the 1.0 GB/second of AGP 4X, to a full 2.0 GB/second.

The RADEON 9700 will be the second market entry this year with a 256-bit memory interface, Matrox being the first out with their Parhelia. But the performance of Parhelia is underwhelming, in spite of huge memory bandwidth. This time around the board is capable of utilizing that bandwidth, and performance is staggering.

The RADEON 9700's memory controller has four independent 64-bit memory channels, which can read and write data simultaneously. The plan is to use standard DDR memory, but the chip includes support for the new DDR-II format.

It’s a reasonable guess that about the time NVIDIA ships their GeForce5 (November?) ATI will move to a 0.13 micron process, higher clock speeds and the faster DDR-II technology. Bandwidth could hit 30 GB/s.
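The arithmetic behind those bandwidth numbers is simple enough to sketch: bus width in bytes multiplied by the effective transfer rate. The transfer rates below are my own illustrative assumptions, not ATI specifications, but they show how a 256-bit interface gets to roughly 20 GB/s with today's DDR and into the 30 GB/s range with faster DDR-II:

```c
#include <stdio.h>

/* Peak memory bandwidth = bus width in bytes x effective transfer rate.
   The 256-bit width (4 x 64-bit channels) is from the article; the
   transfer rates are illustrative assumptions only. */
static double bandwidth_gb_per_s(int bus_bits, double megatransfers_per_s)
{
    return (bus_bits / 8.0) * megatransfers_per_s / 1000.0;
}

int main(void)
{
    printf("DDR    at ~620 MT/s : %.1f GB/s\n", bandwidth_gb_per_s(256, 620.0));
    printf("DDR-II at ~940 MT/s : %.1f GB/s\n", bandwidth_gb_per_s(256, 940.0));
    return 0;
}
```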


HYPERZ III
It was PowerVR, Videologic and the KYRO II that brought occlusion culling to the fore. The technology is simple in concept. Virtually any scene, particularly an indoor scene but also most outdoor scenes with any depth, includes components that obscure other components. With occlusion technology, any part of the scene which is hidden by other parts of the scene is not drawn. The earlier in the pipeline this occlusion is detected, the more pixel pushing power is saved to draw the parts of the scene the gamer will actually see.

While it was PowerVR that developed the most powerful occlusion culling routines we know, ATI was actually the first to market with the technology in an earlier RADEON. HyperZ was fairly effective, and has grown in efficiency with each new generation.

The third generation of HyperZ is present in the RADEON 9700 and it contains faster versions of all three of the occlusion culling components present in HyperZ II.

ATI's HyperZ technology is composed of three features that work in conjunction with one another to increase effective bandwidth. The three features are Hierarchical Z, Z-Compression and Fast Z-Clear. To place these features in context we need to recall how the Z-buffer works.

The Z-buffer is a portion of memory dedicated to holding the z-values of rendered pixels. These z-values dictate which pixels, and therefore which polygons, appear in front of one another when the scene is sent to the screen. In mathematical terms, the z-values indicate a position along the z-axis.

A typical 3-D accelerator processes each polygon as it is sent to the hardware, and has no way of knowing anything about the rest of the scene. Lacking this knowledge, every forward facing polygon must be shaded and textured. Pixels that will be hidden but drawn anyway are called “overdraw.”

It is not unusual to see overdraw by a factor of three. But the z-buffer has the information needed to know which pixels will be occluded. Accessing and using that information effectively is the key to improved efficiency.
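For readers who like to see the mechanism spelled out, here is a bare-bones software sketch of the conventional per-pixel depth test. It illustrates the concept only; it is not a model of how the silicon is actually wired:

```c
#define WIDTH  640
#define HEIGHT 480

static float        zbuf[HEIGHT][WIDTH];    /* depth of the nearest pixel drawn so far */
static unsigned int frame[HEIGHT][WIDTH];   /* final colors */

/* Reset every depth entry to "as far away as possible" before each frame. */
void clear_zbuffer(void)
{
    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x)
            zbuf[y][x] = 1.0f;              /* 1.0 = far plane in normalized depth */
}

/* The classic depth test: keep the incoming pixel only if it is closer
   to the viewer than whatever is already stored at that location. */
void write_pixel(int x, int y, float z, unsigned int rgba)
{
    if (z < zbuf[y][x]) {
        zbuf[y][x]  = z;
        frame[y][x] = rgba;
    }
    /* otherwise the pixel is occluded and discarded, but by this point it
       may already have been shaded and textured: exactly the waste that
       occlusion culling tries to avoid */
}
```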

PowerVR, with their "Tile Based Rendering," attacked this problem from one angle. ATI doesn't implement that architecture, but it does borrow some of its ideas to increase efficiency. Instead of attacking the root issue of overdraw, ATI's HyperZ goes after the issue of frequent Z-buffer access, making those accesses far more efficient.

The Hierarchical Z feature allows the pixel being rendered to be checked against the z-buffer before it actually hits the rendering pipelines. Early Z sub-divides the Z-buffer down to the pixel level so that the card can achieve close to 100 percent efficiency in discarding occluded (hidden) pixels. Z-Compression uses a special algorithm to compress the data in the Z-buffer.

The Fast Z-Clear feature allows for the quick clearing of all data in the Z-buffer after a scene has been rendered. All of this contributes to more efficient use of memory, and the effects become quite dramatic when antialiasing is thrown into the mix.
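Exactly how ATI lays this out in silicon isn't public, but the idea behind Hierarchical Z can be sketched in a few lines. Keep one coarse value per block of pixels, the farthest depth already written there; if an incoming triangle's nearest point is farther away still, every pixel it would touch in that block is guaranteed to be hidden, and the per-pixel Z-buffer never has to be read. A greatly simplified illustration:

```c
#define TILE 8   /* examine the screen in 8x8 pixel tiles (illustrative size) */

/* One coarse value per tile: the farthest depth already written there. */
static float tile_farthest_z[480 / TILE][640 / TILE];

/* If the nearest point of the incoming geometry is still farther away than
   everything already drawn in this tile, the whole tile can be skipped. */
int tile_rejects(int tx, int ty, float geometry_nearest_z)
{
    return geometry_nearest_z > tile_farthest_z[ty][tx];
}
```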


Smoothvision 2.0
With Smoothvision 2.0 ATI finally arrives at a multisampling technique for antialiasing. Up to this point ATI has used Supersampling, a method that required a huge fill rate and incurred a greater performance penalty than NVIDIA’s multisampling method.

Multisampling works by looking at the z-value (depth) of each pixel, then taking a weighted average of the foreground and background colors to compute the final color of the pixel. Multisampling increases the number of z-buffer accesses, but since ATI uses Z-Compression, these accesses barely impact the RADEON 9700's memory bandwidth.
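A simplified picture of the resolve step looks like this. Each pixel keeps several coverage samples; samples covered by the foreground triangle hold its color, the rest keep the background, and the final on-screen color is their average. ATI's actual sample positions and weights aren't documented here, so treat this even-weighted version as an illustration only:

```c
typedef struct { float r, g, b; } Color;

/* Resolve one multisampled pixel: average its coverage samples.
   Edge pixels end up as a blend of foreground and background,
   which is what smooths out the "jaggies". */
Color resolve_pixel(const Color samples[], int num_samples)
{
    Color out = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < num_samples; ++i) {
        out.r += samples[i].r;
        out.g += samples[i].g;
        out.b += samples[i].b;
    }
    out.r /= num_samples;
    out.g /= num_samples;
    out.b /= num_samples;
    return out;
}
```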

Furthermore, ATI claims that their method is superior to NVIDIA's current multisampling, particularly where transparent textures are used. The improvement over the earlier RADEON 8500 will be so great that there is no comparison. But the improvement over even the GeForce4 Ti4600 should be amazing. Gains in both quality and speed are the best we could hope for, and ATI claims to have achieved both.


TRUFORM 2.0
TRUFORM technology uses tessellation (subdividing surfaces into additional triangles) to smooth out curved surfaces. This is most noticeable on the arm joints of troopers, or on the conical nose of an aircraft.

With more and more ground troops appearing in simulations, and with other games blurring the distinction between simulation, RTS and FPS (e.g., Battlefield: 1942), we’re going to greatly benefit from improvements in these higher order surface instructions.

Displacement Mapping and Tessellation

TRUFORM 2.0 offers additional tessellation options, with continuous and adaptive tessellation added. Displacement mapping is another new feature, analogous to an enhanced and more powerful version of bump mapping, offering more control over 3D surfaces. These features all work toward the goal of lifelike 3-D images, particularly when objects are in motion.

When you are hiding in the grass somewhere in Europe, you may appreciate the improvements as you see the grass waving in the breeze. When you are flying low through the trees, features like these will make those trees look like more than collections of pixels and textures.
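If you're curious what tessellation looks like in code, the simplest possible version just splits each triangle into four by adding a vertex at the middle of each edge. This is a generic illustration, not ATI's actual N-Patch math; TRUFORM additionally uses the vertex normals to push the new vertices outward so the surface genuinely curves:

```c
typedef struct { float x, y, z; } Vec3;

static Vec3 midpoint(Vec3 a, Vec3 b)
{
    Vec3 m = { (a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f, (a.z + b.z) * 0.5f };
    return m;
}

/* One step of naive tessellation: split a triangle into four smaller ones
   by inserting a vertex at the midpoint of each edge. */
void tessellate(Vec3 a, Vec3 b, Vec3 c, Vec3 out[4][3])
{
    Vec3 ab = midpoint(a, b), bc = midpoint(b, c), ca = midpoint(c, a);

    Vec3 tris[4][3] = {
        { a,  ab, ca },   /* corner triangles */
        { ab, b,  bc },
        { ca, bc, c  },
        { ab, bc, ca },   /* center triangle */
    };
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 3; ++j)
            out[i][j] = tris[i][j];
}
```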


Old Issues
I haven’t personally run an ATI board for almost five years. Why not? Mostly because of driver issues. But there were other complaints with the earlier RADEON 8500.

The RADEON 8500 introduced ATI's adaptive anisotropic filtering. This algorithm determined the number of samples to take based on the rotation along the x and y axes. The result was that in situations where the highest degree of anisotropic filtering was necessary, the GPU would take the maximum number of samples set in the driver (16X). The problem was that ATI failed to take into account what would happen if there was rotation along the x, y and z axes. When rotation along the z axis was added to the mix, the GPU defaulted to the lowest filtering method supported: bilinear.

The resulting poor quality image was the bane of many a flight simulation fan. ATI has fixed the z-rotation issue and improved the filtering algorithm. Early reports are that this is by far the best implementation yet seen. Furthermore, efficiency is improved and the performance penalty is smaller than NVIDIA’s implementation.


What About Drivers?
With a list of advances like this, and with performance figures on early drivers already showing astonishing results, the big question in everyone’s mind is compatibility. ATI has had problems with drivers in the past…will they release solid drivers this time around?

That remains to be seen, with only a few early review boards having shipped. But with the on-shelf date targeted for late August, those boards will be hitting the hands of more reviewers very, very soon. My guess is that after a short teething period the new boards will perform very well indeed.

3DMark2001 SE

Tests with early drivers using 3DMark2001 SE build 330 reveal scores 70 percent higher than GeForce4 Ti4600.

Wha…???

It's one thing to release technical information; it's another to run hands-on tests that achieve scores this high.

As far as I know, this is a first. I don't believe ANY video hardware release has ever shown this much of an improvement over the current leader, unless it was 3dfx's first release. At that time, however, there were few standard benchmarks and 3DMark did not yet exist.

RADEON 9700

With 110 million transistors on a .15 micron die, this chip is huge, and it has more transistors than your current CPU! Like an earlier GPU before it, it will require an external power connector, since it isn't wise to draw too much power over the AGP bus.


Conclusion
As always, it's best to wait for the early reviews before investing in video hardware, so you can be sure your favorite games will run. My guess is that within a few months there will be a lot of used NVIDIA hardware available…at least until NVIDIA releases the GeForce5 late in the fall or early in 2003.

It’s great to see ATI’s return to front line competition, and even to see them take the lead. Competition is good in this industry. None of us want to lay out $350 for a new video board every six months. With ATI and NVIDIA going head to head, and with a new entry by 3D Labs looming in the background, increased competition will mean more pressure on prices, and faster and better looking games for all of us.


Visit ATI’s website to access educational demos.




