GeForce GTX 200 Series Architecture

[nextpage title=”Introduction”]

Like GeForce 8 and GeForce 9, the new GeForce GTX 200 series is DirectX 10 (i.e., Shader Model 4.0) hardware, but it uses a different architecture. Let’s see what is new.

The main thing with the GeForce GTX 200 series is that NVIDIA is now officially pushing GPGPU (General-Purpose computing on Graphics Processing Units) – which is the use of the GPU located on the graphics card to process regular programs – into the consumer market. Thus you will see NVIDIA saying that chips from this series are kind of “2-in-1” or have “parallel computing capabilities.” NVIDIA’s marketing department is calling this “beyond gaming.”

In fact, all video cards can in principle be used like this. The trick is the development tools, and NVIDIA has had its toolkit and compiler, called CUDA, available for download for some time now. CUDA allows programs written in C (with a few extensions) to be compiled to run mainly on the GPU. Before the release of CUDA, programmers needed to write their programs in the GPU’s own shading languages, which meant learning something completely new and unusual.

GPGPU has been used in academia for a while now, improving performance in several specific applications. With the release of the Tesla platform – basically GeForce 8 video cards meant specifically to process regular programs and not to produce video – NVIDIA showed its commitment to moving GPGPU from a mere research stage to a serious tool for the scientific community. Now, with the new GTX 200 series, NVIDIA wants to go one step further, encouraging software developers to incorporate the GPGPU concept into applications available to the general public.

The rationale behind this idea is that today’s GPUs – especially high-end models – have far more raw computing power than CPUs. In Figure 1, you can see a comparison between the new GeForce GTX 280 chip, a mainstream CPU (Core 2 Duo E8400) and a high-end CPU (Core 2 Extreme QX9650). GFLOPS stands for billions of floating-point (mathematical) operations per second and measures the theoretical maximum math performance of a chip.

Figure 1: Comparison between GeForce GTX 280, Core 2 Duo E8400 and Core 2 Extreme QX9650.

During its Spring 2008 Editor’s Day, NVIDIA gave some demonstrations of how offloading processing that is usually done by the CPU to the GPU can improve the performance of regular applications. In Figure 2, you can see how long it takes to encode a 2-hour high-definition movie in H.264 format on several different hardware configurations. The performance boost is amazing, but keep two things in mind. First, this isn’t a feature exclusive to the GTX 200 series: notice that NVIDIA compares the GTX 280’s performance to that of a GeForce 9600 GT. Any video card can in principle do this, though programs compiled with CUDA will only work on GeForce 8 and above. Second, you need a program that uses GPGPU, so don’t get excited thinking that just by adding the new GTX 280 you will have this performance gain. This specific demonstration was done with a program called BadaBOOM, which converts movies into portable media player formats. This program isn’t available to the general public yet, but the same company offers an encoder called RapidHD, which is available for Adobe Premiere Pro and also uses GPGPU.

Figure 2: Transcoding performance increase.

Another example given by NVIDIA was Folding@Home, the distributed computing program for analyzing proteins sponsored by Stanford University. Each person who downloads and installs this program adds his or her own computer to the program’s network, building a supercomputer that uses PCs all around the globe as nodes. In Figure 3, you can see the performance increase in this program, which is capable of using the graphics card to do processing.

Figure 3: Folding@Home performance increase.

In summary, GPGPU isn’t something new and it isn’t exclusive to the new GeForce GTX 200 series, but expect to see more regular programs capable of using the video card’s GPU to do part of their processing. For instance, Adobe announced that the new version of Photoshop, to be released in the second half of this year, will use the GPU to do some processing and thus increase the performance of the program.

Now let’s talk about the GeForce GTX 200 series architecture.

[nextpage title=”Architecture”]

NVIDIA is launching two chips in the GeForce GTX 200 family today: GTX 280 and GTX 260. In Figure 4, you can see a block diagram of the new GeForce GTX 280, and in Figure 5 a photo of the GeForce GTX 280 die showing the location of the main blocks.

Figure 4: GeForce GTX 280 block diagram.

Figure 5: Location of the main blocks on the GeForce GTX 280 die.

In the middle of the block diagram in Figure 4 you can see 10 blocks. Each of these blocks is called a Thread Processing Cluster, or simply TPC, and in Figure 6 you have a more detailed view of one of them.

Figure 6: Thread Processing Cluster (TPC).

Each TPC has one L1 memory cache and three sets of processing units. Each one of these sets is called a Streaming Multiprocessor (SM) by NVIDIA. Each set has eight processing units (labeled as “core” by NVIDIA; they are also known as Streaming Processors or SPs) sharing a small piece of RAM (labeled as “local memory” by NVIDIA). The addition of these small pieces of RAM is one of the main differences between the architecture used on the GeForce GTX 200 series and the one used by the GeForce 8 and GeForce 9 series. You can learn more about the architecture of these two series in our article GeForce 8 Series Architecture (despite its name, GeForce 9 is based on the GeForce 8 architecture).

The main idea behind DirectX 10 – i.e., the Shader Model 4.0 programming model – is that each processing unit is a “generic” unit, allowing any kind of processing (this concept helped GPGPU a lot). Previously the GPU had specific units for each kind of processing (most notably, separate processing units for pixel shaders and for vertex shaders).

Since each set inside the TPC has eight processing units, each TPC has 24 processing units, for a total of 240 processing units (10 TPCs) on GeForce GTX 280. GeForce GTX 260 has fewer units, 192, achieved by having eight TPCs instead of 10.

Inside each TPC you can also find eight texture filtering units (labeled as “TF” in Figure 6), for a total of 80 texture units on GeForce GTX 280 and 64 on GeForce GTX 260.
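The unit counts above follow directly from the per-TPC figures. Here is a minimal Python sketch of that arithmetic; the constant and function names are ours, chosen only for illustration, and the numbers are the ones quoted in the text.

```python
# Unit counts per chip, derived from the per-TPC figures quoted in the article.
# All names here are illustrative; they are not NVIDIA terminology or an API.
SMS_PER_TPC = 3            # Streaming Multiprocessors in each TPC
SPS_PER_SM = 8             # processing units ("cores") in each SM
TEXTURE_UNITS_PER_TPC = 8  # texture filtering units in each TPC

TPC_COUNT = {"GeForce GTX 280": 10, "GeForce GTX 260": 8}


def unit_counts(tpcs):
    """Return (processing units, texture units) for a chip with `tpcs` TPCs."""
    processing_units = tpcs * SMS_PER_TPC * SPS_PER_SM
    texture_units = tpcs * TEXTURE_UNITS_PER_TPC
    return processing_units, texture_units


for name, tpcs in TPC_COUNT.items():
    sps, tex = unit_counts(tpcs)
    print(f"{name}: {sps} processing units, {tex} texture units")
```

Running it reproduces the totals given above: 240 processing units and 80 texture units for GTX 280, and 192 and 64 for GTX 260.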

As you can see in Figure 4, GeForce GTX 280 has eight memory interface units, each one 64 bits wide. This means that GeForce GTX 280 has a 512-bit (64-bit x 8) memory interface – it was about time: GeForce 8800 GTX uses a 384-bit memory interface and GeForce 9800 GTX uses a 256-bit interface. This model supports 1 GB of video memory, with two 64 MB (512 Mbit) chips attached to each memory interface unit (64 MB per chip x 2 x 8). GeForce GTX 260 has seven memory interface units, meaning that this version uses a 448-bit memory interface (64-bit x 7) and comes with 896 MB of video memory (64 MB per chip x 2 x 7).
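The same kind of arithmetic gives the memory configuration of each model. The sketch below is only an illustration of the calculation described above, with names of our own choosing; it assumes every memory interface unit carries two 64 MB chips, as stated in the text.

```python
# Memory bus width and capacity from the number of memory interface units.
# Names are illustrative; the figures come from the article text.
BITS_PER_INTERFACE = 64   # each memory interface unit is 64 bits wide
CHIPS_PER_INTERFACE = 2   # two memory chips attached to each unit
MB_PER_CHIP = 64          # 512 Mbit chips


def memory_config(interfaces):
    """Return (bus width in bits, capacity in MB) for `interfaces` units."""
    bus_width_bits = interfaces * BITS_PER_INTERFACE
    capacity_mb = interfaces * CHIPS_PER_INTERFACE * MB_PER_CHIP
    return bus_width_bits, capacity_mb


for name, interfaces in {"GeForce GTX 280": 8, "GeForce GTX 260": 7}.items():
    width, capacity = memory_config(interfaces)
    print(f"{name}: {width}-bit interface, {capacity} MB")
```

Eight units give a 512-bit interface and 1,024 MB (1 GB), while seven units give 448 bits and 896 MB.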

The GeForce GTX 200 series finally supports double-precision floating-point math (i.e., 64-bit floating-point values).

Chips from the new GeForce GTX 200 series bring the updated 2D video processing engine, called VP2 or “2nd generation PureVideo HD,” which has a fully hardware-based H.264 decoder (used to decode high-definition movies like Blu-ray and HD-DVD), releasing the system CPU from this task. This same decoder is found on all video cards from the GeForce 8 and 9 series except those using “G80” chips (GeForce 8800 GTS, GTX and Ultra), which are based on the previous PureVideo HD engine, VP1, and still partially use the system CPU for decoding.

The GeForce GTX 200 series also has more power-saving modes. Four modes are available:

Idle/2D power mode: Used when you are in Windows working with regular programs, like word processing and internet browsing. The video card consumes around 25 W in this mode.
Video playback mode: Used when you play back movies using the hardware-based decoder incorporated in the graphics chip instead of the system CPU. The video card consumes around 35 W in this mode.
Full 3D performance mode: When you play games the video card activates its 3D engine. Power consumption can reach its maximum in this mode (up to 236 W on GeForce GTX 280 and 182 W on GeForce GTX 260).
HybridPower: This is a technology where 2D video is produced by the motherboard (i.e., on-board video) and the video card is automatically turned off when you are not playing games. Thus the power consumption of the video card is zero when you are not playing games. You need a HybridPower-compliant motherboard in order to use this feature.
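
To get a feeling for what these modes mean in practice, here is a rough Python sketch that estimates the daily energy used by a GeForce GTX 280. The wattages are the ones listed above; the usage pattern (hours per day in each mode) is purely hypothetical, and since 236 W is the maximum for 3D mode, the result is an upper bound rather than a measurement.

```python
# Rough daily energy estimate for a GeForce GTX 280 using the wattages above.
# The mode names and the usage hours below are hypothetical, for illustration.
POWER_W = {
    "idle_2d": 25,         # Idle/2D power mode (approximate)
    "video_playback": 35,  # video playback mode (approximate)
    "full_3d": 236,        # full 3D mode, maximum value (upper bound)
    "card_off": 0,         # HybridPower: card turned off outside games
}


def daily_energy_kwh(hours_by_mode):
    """Sum power x time over all modes and convert Wh to kWh."""
    watt_hours = sum(POWER_W[mode] * hours for mode, hours in hours_by_mode.items())
    return watt_hours / 1000


# Hypothetical day: 6 h of desktop work, 2 h of movies, 2 h of gaming.
without_hybrid = daily_energy_kwh({"idle_2d": 6, "video_playback": 2, "full_3d": 2})
# Same day with HybridPower: the card is off for the 8 non-gaming hours.
with_hybrid = daily_energy_kwh({"card_off": 8, "full_3d": 2})

print(f"Without HybridPower: {without_hybrid:.3f} kWh")
print(f"With HybridPower:    {with_hybrid:.3f} kWh")
```

In this made-up scenario gaming dominates the total, which shows why the low-power modes matter mostly for computers that spend many hours a day on the desktop.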

[nextpage title=”Released Models”]

As mentioned, two models are being released at this moment: GeForce GTX 280 and GeForce GTX 260. Below you can see pictures of the reference models for these two cards – as you can see, they have the same external appearance. Both cards require two auxiliary power connectors, with GTX 280 requiring one 6-pin connector and one 8-pin connector and GTX 260 requiring two 6-pin connectors. Both video cards are based on PCI Express 2.0, naturally, so for best performance you must install them on a motherboard supporting PCI Express 2.0. Both of them support three-way SLI.

Figure 7: GeForce GTX 260 reference model.

Figure 8: GeForce GTX 280 reference model.

In the table below we summarize the main specs of these two cards.

Feature	GeForce GTX 280	GeForce GTX 260
Core clock	602 MHz	576 MHz
Streaming Processors (Shader Engines)	240	192
Streaming Processors Clock	1,296 MHz	1,242 MHz
Memory Clock (Real)	1,107 MHz	999 MHz
Memory Clock (DDR)	2,214 MHz	1,998 MHz
Memory	1 GB GDDR3	896 MB GDDR3
Memory Interface	512-bit	448-bit
TDP	236 W	182 W
MSRP	USD 649	USD 399
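
Two derived figures that are often quoted for video cards, peak memory bandwidth and peak GFLOPS, can be estimated from the table above. The Python sketch below is only an approximation under stated assumptions: the function names are ours, and the value of three floating-point operations per processing unit per clock cycle (a multiply-add plus a multiply) is an assumption not given in this article, so treat the GFLOPS results as theoretical upper limits.

```python
# Theoretical peak figures derived from the spec table above.
# Function names are illustrative. FLOPS_PER_CLOCK is an ASSUMPTION (a
# multiply-add plus a multiply per processing unit per cycle), not a value
# taken from the article.
FLOPS_PER_CLOCK = 3

SPECS = {
    # name: (streaming processors, SP clock MHz, bus width bits, DDR clock MHz)
    "GeForce GTX 280": (240, 1296, 512, 2214),
    "GeForce GTX 260": (192, 1242, 448, 1998),
}


def peak_bandwidth_gbs(bus_width_bits, ddr_clock_mhz):
    """Bytes per transfer x million transfers per second, in GB/s."""
    return bus_width_bits / 8 * ddr_clock_mhz / 1000


def peak_gflops(processors, sp_clock_mhz, flops_per_clock=FLOPS_PER_CLOCK):
    """Processing units x clock x operations per clock, in GFLOPS."""
    return processors * sp_clock_mhz * flops_per_clock / 1000


for name, (sps, sp_clock, bus, ddr_clock) in SPECS.items():
    print(f"{name}: {peak_bandwidth_gbs(bus, ddr_clock):.1f} GB/s, "
          f"{peak_gflops(sps, sp_clock):.1f} GFLOPS (theoretical peak)")
```

Note that the “DDR” memory clock in the table is simply twice the real clock (1,107 MHz x 2 = 2,214 MHz), because GDDR3 memory transfers two data per clock cycle; this is the clock that matters for bandwidth.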
