GeForce GTX 200 Series Architecture
By Gabriel Torres on June 16, 2008
Like GeForce 8 and GeForce 9, the new GeForce GTX 200 series is DirectX 10 (i.e., Shader Model 4.0) hardware, but it uses a different architecture. Let’s see what is new.
The main thing with the GeForce GTX 200 series is that NVIDIA is now officially pushing GPGPU (General-Purpose computing on Graphics Processing Units) – the use of the GPU located on the graphics card to process regular programs – into the consumer market. Thus you will see NVIDIA saying that chips from this series are a kind of “2-in-1” or have “parallel computing capabilities.” NVIDIA’s marketing department is calling this “beyond gaming.”
In fact, all video cards can be used like this. The trick is the compiler, and NVIDIA has had its own, called CUDA, available for download for some time now. CUDA allows regular C/C++ programs to be compiled to run mainly on the GPU. Before the release of CUDA, programmers had to write programs in the GPU’s own language, which meant learning something completely new and unusual.
GPGPU has been used in academia for a while now, improving performance in several specific applications. With the release of the Tesla platform – basically GeForce 8 video cards meant specifically to process regular programs and not to produce video – NVIDIA showed its commitment to moving GPGPU from a mere research stage to a serious tool for the scientific community. Now, with the new GTX 200 series, NVIDIA wants to go one step further, encouraging software developers to incorporate the GPGPU concept into applications available to the general public.
The rationale behind this idea is that today’s GPUs – especially high-end models – have far more computing power than CPUs. In Figure 1, you can see a comparison between the new GeForce GTX 280 chip, a mainstream CPU (Core 2 Duo E8400) and a high-end CPU (Core 2 Extreme QX9650). GFLOPS stands for billions of floating-point (mathematical) operations per second and measures the maximum math performance of a chip.
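As a rough illustration of where such peak figures come from, theoretical peak GFLOPS can be estimated as processing units x clock (in GHz) x floating-point operations per unit per cycle. The Python sketch below is only that, a sketch: the clock speeds and per-cycle operation counts are assumptions based on publicly quoted specifications, not numbers given in this article, and the function name `peak_gflops` is made up for the example.

```python
def peak_gflops(units: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak, in billions of floating-point operations per second."""
    return units * clock_ghz * flops_per_cycle


# Assumed figures (not stated in the article):
# - GeForce GTX 280: 240 processing units at 1.296 GHz,
#   3 single-precision operations per unit per cycle.
# - Core 2 CPUs: 3.0 GHz, 8 single-precision operations per core per cycle (SSE).
chips = {
    "GeForce GTX 280": peak_gflops(240, 1.296, 3),
    "Core 2 Duo E8400": peak_gflops(2, 3.0, 8),
    "Core 2 Extreme QX9650": peak_gflops(4, 3.0, 8),
}

for name, gflops in chips.items():
    print(f"{name}: about {gflops:.0f} GFLOPS peak")
```

These are theoretical ceilings; a real program reaches only part of them, on either kind of chip.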
During its Spring 2008 Editor’s Day, NVIDIA gave some demonstrations of how moving processing that is usually done by the CPU to the GPU can improve the performance of regular applications. In Figure 2, you can see how long it takes to encode a 2-hour high-definition movie in H.264 format on several different hardware configurations. The performance boost is amazing, but keep in mind that this isn’t a feature exclusive to the GTX 200 series: any video card can do that, although programs compiled with CUDA will only work on GeForce 8 and above (notice how they compare the GTX 280’s performance to that of a GeForce 9600 GT). You also need a program that uses GPGPU, so don’t get excited thinking that by just adding the new GTX 280 you will have this performance gain. This specific demonstration was done with a program called BadaBOOM, which converts movies into portable media player formats. This program isn’t available to the general public yet, but the same company offers an encoder called RapidHD, which is available for Adobe Premiere Pro and also uses GPGPU.
Another example given by NVIDIA was Folding@Home, the distributed computing program for analyzing proteins sponsored by Stanford University. Each person who downloads and installs this program adds his or her own computer to the program’s network, building a supercomputer that uses PCs all around the globe as nodes. In Figure 3 you see the performance increase on this program, which is capable of using the graphics card to do processing.
In summary, GPGPU isn’t something new and it isn’t exclusive to the new GeForce GTX 200 series, but expect to see more regular programs capable of using the video card’s GPU for processing. For instance, Adobe announced that the new version of Photoshop, to be released in the second half of this year, will use the GPU to do some processing and thus increase the performance of the program.
Now let’s talk about the GeForce GTX 200 series architecture.
NVIDIA is launching two chips in the GeForce GTX 200 family today: GTX 280 and GTX 260. In Figure 4, you can see a block diagram of the new GeForce GTX 280 and in Figure 5 a photo of the GeForce GTX 280 die showing the location of the main blocks.
In the middle of the block diagram in Figure 4 you can see 10 blocks. Each of these blocks is called a Thread Processing Cluster, or simply TPC, and in Figure 7 you have a more detailed view of one of them.
Each TPC has one L1 memory cache and three sets of processing units. NVIDIA calls each of these sets a Streaming Multiprocessor (SM). Each set has eight processing units (labeled as “core” by NVIDIA; they are also known as Streaming Processors or SP) sharing a small piece of RAM (labeled as “local memory” by NVIDIA). The addition of these small pieces of RAM is one of the main differences between the architecture used on the GeForce GTX 200 series and the one used by the GeForce 8 and GeForce 9 series. You can learn more about the architecture of these two series in our article GeForce 8 Series Architecture (despite its name, GeForce 9 is based on the GeForce 8 architecture).
The main idea behind DirectX 10 – i.e., the Shader Model 4.0 programming model – is that each processing unit is a “generic” unit, allowing any kind of processing (this concept helped GPGPU a lot). Previously the GPU had specific units for each kind of processing (most notably, separate processing units for pixel shaders and for vertex shaders).
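To see why generic units help, here is a deliberately simplified toy model in Python. It assumes every job takes one time step on one unit, and the workload and unit counts are invented for illustration; none of this comes from NVIDIA or describes how the real scheduler works.

```python
import math


def frame_time_fixed(vertex_jobs, pixel_jobs, vertex_units, pixel_units):
    """Dedicated units: each pool can only run its own kind of job."""
    return max(math.ceil(vertex_jobs / vertex_units),
               math.ceil(pixel_jobs / pixel_units))


def frame_time_unified(vertex_jobs, pixel_jobs, total_units):
    """Generic units: any unit can run any job."""
    return math.ceil((vertex_jobs + pixel_jobs) / total_units)


# Hypothetical pixel-heavy workload on 32 units in total:
# 8 vertex + 24 pixel units versus 32 generic units.
fixed = frame_time_fixed(vertex_jobs=40, pixel_jobs=960,
                         vertex_units=8, pixel_units=24)
unified = frame_time_unified(vertex_jobs=40, pixel_jobs=960, total_units=32)
print(f"fixed: {fixed} steps, unified: {unified} steps")
```

In this toy case the dedicated vertex units sit idle most of the time while the pixel units are the bottleneck; with generic units the same total hardware finishes sooner.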
Since each set inside the TPC has eight processing units, each TPC has 24 processing units, for a total of 240 processing units (10 TPCs) on the GeForce GTX 280. The GeForce GTX 260 has fewer units, 192, because it has eight TPCs instead of 10.
Inside each TPC you can also find eight texture filtering units (labeled as “TF” in Figure 7), for a total of 80 texture units on GeForce GTX 280 and 64 on GeForce GTX 260.
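The unit counts above follow directly from the number of TPCs. A minimal Python sketch of that arithmetic, using only the per-TPC figures given in the text (the `Chip` class and constant names are made up for this example):

```python
from dataclasses import dataclass

SMS_PER_TPC = 3        # Streaming Multiprocessors per TPC
SPS_PER_SM = 8         # processing units ("cores") per SM
TEX_UNITS_PER_TPC = 8  # texture filtering units per TPC


@dataclass(frozen=True)
class Chip:
    name: str
    tpcs: int

    @property
    def processing_units(self) -> int:
        return self.tpcs * SMS_PER_TPC * SPS_PER_SM

    @property
    def texture_units(self) -> int:
        return self.tpcs * TEX_UNITS_PER_TPC


for chip in (Chip("GeForce GTX 280", 10), Chip("GeForce GTX 260", 8)):
    print(f"{chip.name}: {chip.processing_units} processing units, "
          f"{chip.texture_units} texture units")
```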
As you can see in Figure 4, the GeForce GTX 280 has eight memory interface units, each one 64 bits wide. This means that the GeForce GTX 280 has a 512-bit (64-bit x 8) memory interface – it was about time: GeForce 8800 GTX uses a 384-bit memory interface and GeForce 9800 GTX uses a 256-bit interface. This model supports 1 GB of video memory, with two 64 MB (512 Mbit) chips attached to each memory interface unit. The GeForce GTX 260 has seven memory interface units, meaning that this version uses a 448-bit memory interface (64-bit x 7) and comes with 896 MB of video memory (64 MB per chip x 2 x 7).
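The same arithmetic can be written as a short Python sketch. It uses only the figures from the paragraph above; the function and constant names are invented for the example.

```python
INTERFACE_WIDTH_BITS = 64  # width of one memory interface unit
CHIPS_PER_INTERFACE = 2    # memory chips attached to each unit
CHIP_SIZE_MB = 64          # 512 Mbit per chip


def memory_config(interface_units: int) -> tuple[int, int]:
    """Return (bus width in bits, total video memory in MB)."""
    bus_width = interface_units * INTERFACE_WIDTH_BITS
    total_mb = interface_units * CHIPS_PER_INTERFACE * CHIP_SIZE_MB
    return bus_width, total_mb


for name, units in (("GeForce GTX 280", 8), ("GeForce GTX 260", 7)):
    width, size = memory_config(units)
    print(f"{name}: {width}-bit interface, {size} MB")
```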
The GeForce GTX 200 series finally supports double-precision floating point (i.e., 64-bit floating-point values).
Chips from the new GeForce GTX 200 series bring the updated 2D video processing engine, called VP2 or “2nd generation PureVideo HD,” which has a fully hardware-based H.264 decoder (used to decode high-definition movies such as Blu-ray and HD-DVD), freeing the system CPU from this task. This same decoder is found on all video cards from the GeForce 8 and 9 series except those using "G80" chips (GeForce 8800 GTS, GTX and Ultra), which are based on the previous PureVideo HD engine, VP1, and still partially use the system CPU for decoding.
GeForce GTX 200 series also has more power saving modes. Four modes are available:
As mentioned, two models are being released at this moment: GeForce GTX 280 and GeForce GTX 260. Below you can see pictures of the reference model for these two cards – as you can see, they have the same external appearance. Both cards require two auxiliary power connectors, with the GTX 280 requiring one 6-pin connector and one 8-pin connector and the GTX 260 requiring two 6-pin connectors. Both video cards are based on PCI Express 2.0, naturally, so for best performance you should install them on a motherboard supporting PCI Express 2.0. Both of them support three-way SLI.
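The connector requirements hint at how much power each card may draw. The Python sketch below adds up the maximum power the slot and auxiliary connectors can deliver; the wattage limits are the commonly quoted PCI Express figures, assumed here rather than taken from this article, and the result is a ceiling, not the card’s actual consumption.

```python
# Commonly quoted PCI Express power limits (assumed, not stated in the article).
SLOT_W = 75        # PCI Express x16 slot
SIX_PIN_W = 75     # each 6-pin auxiliary connector
EIGHT_PIN_W = 150  # each 8-pin auxiliary connector


def max_available_power(six_pin: int, eight_pin: int) -> int:
    """Upper bound, in watts, on what slot plus connectors can supply."""
    return SLOT_W + six_pin * SIX_PIN_W + eight_pin * EIGHT_PIN_W


print("GeForce GTX 280:", max_available_power(six_pin=1, eight_pin=1), "W")
print("GeForce GTX 260:", max_available_power(six_pin=2, eight_pin=0), "W")
```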
In the table below we summarize the main specs of these two cards.
Specification | GeForce GTX 280 | GeForce GTX 260
Streaming Processors (Shader Engines) | 240 | 192
Streaming Processors Clock | |
Memory Clock (Real) | |
Memory Clock (DDR) | |
Memory | 1 GB GDDR3 | 896 MB GDDR3