GeForce 8 Series Architecture
By Gabriel Torres on November 9, 2006


Introduction

NVIDIA has just released its new GeForce 8 series, formerly known by its codename, G80. This new series uses a completely different architecture from all other graphics chips: a unified shader engine. In this article we will explain in detail everything that is new in this graphics chip series.

GeForce 8 is the first series to support the forthcoming DirectX 10 (Shader Model 4.0). So first let's talk about what is new in DirectX 10.

One of the new things in DirectX 10 is geometry processing. Until now the system CPU was in charge of this task; in DirectX 10 it can be done by the graphics chip. Geometry processing smooths curved surfaces, giving better quality in character animation, facial expressions, and hair.

DirectX 10 also provides more resources to the GPU, improving 3D performance. The key differences in GPU resources between DirectX 9 and DirectX 10 are shown in the table below.

Resources            | DirectX 9     | DirectX 10
Temporary Registers  | 32            | 4,096
Constant Registers   | 256           | 16 x 4,096
Textures             | 16            | 128
Render Targets       | 4             | 8
Maximum Texture Size | 4,096 x 4,096 | 8,192 x 8,192

In the table below you can see a comparison between Shader Models 1.x (DirectX 8.1), 2.0 (DirectX 9.0), 3.0 (DirectX 9.0c) and 4.0 (DirectX 10).

                    | Shader 1.x | Shader 2.0 | Shader 3.0     | Shader 4.0
Vertex Instructions | 128        | 256        | 512            | 65,536 *
Pixel Instructions  | 4 + 8      | 32 + 64    | 512            | 65,536 *
Vertex Constants    | 96         | 256        | 256            | 16 x 4,096 *
Pixel Constants     | 8          | 32         | 224            | 16 x 4,096 *
Vertex Temps        | 16         | 16         | 16             | 4,096 *
Pixel Temps         | 2          | 12         | 32             | 4,096 *
Vertex Inputs       | 16         | 16         | 16             | 16
Pixel Inputs        | 4 + 2      | 8 + 2      | 10             | 32
Render Targets      | 1          | 4          | 4              | 8
Vertex Textures     | -          | -          | 4              | 128 *
Pixel Textures      | 8          | 16         | 16             | 128 *
2D Texture Size     | -          | -          | 2,048 x 2,048  | 8,192 x 8,192
Int Ops             | -          | -          | -              | Yes
Load Ops            | -          | -          | -              | Yes
Derivatives         | -          | -          | Yes            | Yes
Vertex Flow Control | -          | Static     | Static/Dynamic | Dynamic *
Pixel Flow Control  | -          | -          | Static/Dynamic | Dynamic *

* Since DirectX 10 implements a unified architecture, as we will discuss in the next page, this number is the total for the whole unified architecture, not for this individual spec.

Besides the increase in resource capacity, there are several new features in DirectX 10, but in summary the goal of this new programming model is to reduce the system CPU's role in 3D graphics performance – i.e., it tries to avoid using the system CPU as much as possible.
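To put that resource increase in perspective, here is a small sketch that computes the growth factor for each limit, using the values from the first table above:

```python
# DirectX 9 vs. DirectX 10 per-shader resource limits, copied from
# the table above, with the growth factor computed for each one.
dx9_vs_dx10 = {
    # resource: (DirectX 9 limit, DirectX 10 limit)
    "Temporary registers": (32, 4_096),
    "Constant registers":  (256, 16 * 4_096),
    "Textures":            (16, 128),
    "Render targets":      (4, 8),
}

for resource, (dx9, dx10) in dx9_vs_dx10.items():
    print(f"{resource}: {dx9} -> {dx10} ({dx10 // dx9}x)")
# Temporary registers: 32 -> 4096 (128x)
# Constant registers: 256 -> 65536 (256x)
# Textures: 16 -> 128 (8x)
# Render targets: 4 -> 8 (2x)
```

The temporary-register limit alone grows 128-fold, which is why shaders written for DirectX 10 can be far longer and more complex than DirectX 9 ones.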

Of course, we will have to wait until DirectX 10 games are released to see these new features in action. Some of the upcoming DirectX 10 games include Crysis and Hellgate: London.

But the main difference between the GeForce 8 series and all other graphics chips available today is its unified shader engine, another feature introduced by the Shader 4.0 programming model. Let's talk more about it.

Architecture

Instead of the graphics chip having separate shader engines for each task – for instance, separate pixel shader and vertex shader units – the GPU now has just one big engine that can be programmed on the fly according to the task at hand: pixel shading, vertex shading, geometry, or physics. This new architecture also makes it easier to add new shader types in the future.

The reason behind the unified architecture is that in some situations the GPU was using all shader engines of one type (pixel shader engines, for example), even queuing tasks for them, while engines of another type (vertex shader engines, for example) sat idle but could not be used to perform a different task, since they were dedicated to a specific procedure type.

So Shader 4.0 allows any shader engine to be used by any shader process – pixel, vertex, geometry, or physics.
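A toy model illustrates why this matters (the unit counts and job mix below are invented for illustration, not NVIDIA's actual scheduler): with a pixel-heavy workload, dedicated vertex units sit idle while pixel jobs queue, whereas a unified pool keeps every unit busy.

```python
import math

# Invented example workload: 12 pixel-shader jobs and 2 vertex-shader
# jobs in a frame, each job taking one cycle on one shader unit.
pixel_jobs, vertex_jobs = 12, 2

# Dedicated design: 4 pixel-only units and 4 vertex-only units.
# Each unit type can only drain its own queue, so the frame takes as
# long as the slowest queue.
dedicated_cycles = max(math.ceil(pixel_jobs / 4), math.ceil(vertex_jobs / 4))

# Unified design (Shader 4.0): the same 8 units accept any job type,
# so the work is spread across all of them.
unified_cycles = math.ceil((pixel_jobs + vertex_jobs) / 8)

print(dedicated_cycles)  # 3 (vertex units idle after the first cycle)
print(unified_cycles)    # 2 (every unit stays busy)
```

The same total hardware finishes the frame faster simply because no unit is locked to a task type it has run out of.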

In Figure 1 you can see the block diagram of the GeForce 8800 GTX, the highest-end model in the new GeForce 8 series. As you can see, this GPU has eight shader units, none of them dedicated to a specific task. Each shader unit has 16 streaming processors (the green boxes labeled SP), eight texture filtering units (the blue boxes labeled TF), four texture address units (not drawn in Figure 1) and one L1 memory cache (the orange box). This GPU also has six memory interface busses, each one 64 bits wide and with its own L2 memory cache. The streaming processors run at a different (higher) clock rate than the rest of the chip.

Figure 1: GeForce 8800 GTX block diagram.

So the GeForce 8800 GTX has 128 shader engines (i.e., streaming processors: 16 streaming processors x 8 units) and a 384-bit memory interface (64 bits x 6).

The GeForce 8800 GTS, the other GeForce 8 model that was released, has six shader units and five 64-bit memory interface busses, so it has 96 shader engines (16 streaming processors x 6 units) and a 320-bit memory interface (64 bits x 5).
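The arithmetic for both chips can be checked in a couple of lines:

```python
# Shader-engine count and memory-bus width for the two released chips,
# following the per-unit figures given in the article.
SPS_PER_UNIT = 16        # streaming processors per shader unit
BITS_PER_BUS = 64        # width of each memory interface bus

gtx_sps = SPS_PER_UNIT * 8   # GTX: 8 shader units
gtx_bus = BITS_PER_BUS * 6   # GTX: 6 memory busses

gts_sps = SPS_PER_UNIT * 6   # GTS: 6 shader units
gts_bus = BITS_PER_BUS * 5   # GTS: 5 memory busses

print(gtx_sps, gtx_bus)  # 128 384
print(gts_sps, gts_bus)  # 96 320
```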

We will discuss the technical specs for these two chips in a moment.

Other features found on the GeForce 8 series include:

Released Models

Two GeForce 8 models were released this week: the GeForce 8800 GTX and the GeForce 8800 GTS, both targeted at PCI Express x16. The GeForce 8800 GTX requires two auxiliary power connectors and has two SLI connectors (we wonder whether this hints at future support for SLI mode with four video cards).

As we mentioned before, the shader engines use a different clock rate from the rest of the GPU.

We list below the main specs for these two new video cards.

                                      | GeForce 8800 GTX | GeForce 8800 GTS
Core Clock                            | 575 MHz          | 500 MHz
Streaming Processors (Shader Engines) | 128              | 96
Streaming Processors Clock            | 1.35 GHz         | 1.2 GHz
Memory Clock                          | 1.8 GHz          | 1.6 GHz
Memory Capacity                       | 768 MB           | 640 MB
Memory Interface                      | 384-bit          | 320-bit
MSRP                                  | USD 599          | USD 499
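From the spec table we can derive the theoretical memory bandwidth of each card, assuming the memory clocks quoted are the effective (DDR) data rates:

```python
# Theoretical memory bandwidth = effective memory clock x bus width / 8,
# using the figures from the spec table above.
def bandwidth_gb_s(mem_clock_ghz: float, bus_width_bits: int) -> float:
    """GB/s moved per second across the memory interface."""
    return round(mem_clock_ghz * bus_width_bits / 8, 1)

print(bandwidth_gb_s(1.8, 384))  # GeForce 8800 GTX -> 86.4 GB/s
print(bandwidth_gb_s(1.6, 320))  # GeForce 8800 GTS -> 64.0 GB/s
```

The wider bus and faster memory together give the GTX roughly 35% more memory bandwidth than the GTS.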

Originally at http://www.hardwaresecrets.com/article/GeForce-8-Series-Architecture/398


© 2004-13, Hardware Secrets, LLC. All Rights Reserved.
