Radeon HD 2900 XT

Radeon HD 2900 XT runs at 740 MHz and accesses its 512 MB of GDDR3 memory at 825 MHz (1.65 GHz DDR) through a new 512-bit memory interface, which boosts the maximum theoretical memory transfer rate to 105.6 GB/s. For comparison, Radeon X1950 XTX has a maximum memory transfer rate of 64 GB/s and GeForce 8800 GTX reaches 86.4 GB/s, while the new GeForce 8800 Ultra reaches 103.6 GB/s.
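The 105.6 GB/s figure follows directly from the bus width and the effective memory clock. The short Python sketch below reproduces the arithmetic; the function and variable names are ours, chosen only for illustration, and 1 GB is taken as 10^9 bytes.

```python
def memory_bandwidth_gb_s(bus_width_bits: int,
                          memory_clock_mhz: float,
                          transfers_per_clock: int = 2) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 10**9 bytes).

    transfers_per_clock is 2 for DDR-type memory such as GDDR3.
    """
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# Radeon HD 2900 XT: 512-bit interface, 825 MHz memory clock (1.65 GHz DDR)
hd2900xt_bandwidth = memory_bandwidth_gb_s(512, 825)
print(f"{hd2900xt_bandwidth:.1f} GB/s")  # prints "105.6 GB/s"
```

In other words, 64 bytes are moved per transfer, 1.65 billion times per second.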

Its unified shader architecture has 320 shader units or “streaming processors” – GeForce 8800 GTX has 128.

Figure 1 gives an overview of the architecture used by Radeon HD 2900 XT.

Figure 1: Radeon HD 2900 XT architecture.

Figure 2 gives a more in-depth look at how it works. The dispatch unit can send up to eight shader instructions to the streaming processors, plus up to two vertex or texture instructions, per clock cycle. As we will explain below, each of these shader instructions can actually represent up to six instructions.

Figure 2: Inside Radeon HD 2900 XT architecture.

The streaming processors are divided into four main groups (called “SIMD arrays”) with 80 processors each, and each group is connected to two ports of the dispatch unit. Each group is subdivided into 16 units, and each unit contains five streaming processors and one branch processing unit. The architecture of one of these units can be seen in Figure 3.
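As a quick sanity check, the hierarchy described above multiplies out to the 320 streaming processors quoted earlier. The Python sketch below simply restates the numbers from the text; the constant names are ours, and the totals for units and branch processing units are derived from those numbers rather than quoted from ATI.

```python
# Figures taken from the article text
SIMD_ARRAYS = 4              # main groups of streaming processors
UNITS_PER_ARRAY = 16         # units inside each SIMD array
PROCESSORS_PER_UNIT = 5      # streaming processors inside each unit
BRANCH_UNITS_PER_UNIT = 1    # branch processing units inside each unit

processors_per_array = UNITS_PER_ARRAY * PROCESSORS_PER_UNIT
total_units = SIMD_ARRAYS * UNITS_PER_ARRAY
total_processors = SIMD_ARRAYS * processors_per_array
total_branch_units = total_units * BRANCH_UNITS_PER_UNIT

print(processors_per_array)  # 80 processors per SIMD array
print(total_units)           # 64 units in the whole chip
print(total_processors)      # 320 streaming processors
print(total_branch_units)    # 64 branch processing units
```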

Figure 3: Architecture of each streaming processor unit, containing five processors each.

These units are superscalar, meaning that each unit can process several instructions in parallel. All five processors handle multiply-add instructions, which are the most common instruction type, while only one (the first one in Figure 3) can also handle transcendental instructions, i.e., logarithmic and trigonometric instructions like SIN, COS, LOG, EXP, etc. It is very interesting to note that each streaming processor is, in fact, a small 32-bit floating-point unit.

Another very interesting thing is that each instruction sent to a unit packs six instructions (five math instructions plus one flow control instruction) into a single instruction. So instead of having to send up to six separate instructions to each unit, the dispatch unit can fill all six execution units with just one big instruction. This concept is called VLIW (Very Long Instruction Word).
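To make the VLIW idea more concrete, here is a minimal Python sketch of how instructions could be packed into six-slot bundles. It is a toy model and not how ATI's compiler or hardware actually works. It assumes all math instructions are independent of each other (a real compiler must check data dependencies), and the class names, the greedy packing strategy and the "MAD" and "BRANCH" mnemonics are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MATH_SLOTS = 5                 # five streaming processors per unit
TRANSCENDENTAL_SLOT = 0        # only one processor handles SIN, COS, LOG, EXP
TRANSCENDENTAL_OPS = {"SIN", "COS", "LOG", "EXP"}


@dataclass
class Bundle:
    """One VLIW word: up to five math slots plus one flow control slot."""
    math: List[Optional[str]] = field(
        default_factory=lambda: [None] * MATH_SLOTS)
    flow: Optional[str] = None

    def try_add_math(self, op: str) -> bool:
        """Place op in a free compatible slot; return False if none is left."""
        if op in TRANSCENDENTAL_OPS:
            candidates = [TRANSCENDENTAL_SLOT]
        else:
            # Fill the plain slots first so the special slot stays available.
            plain = [i for i in range(MATH_SLOTS) if i != TRANSCENDENTAL_SLOT]
            candidates = plain + [TRANSCENDENTAL_SLOT]
        for slot in candidates:
            if self.math[slot] is None:
                self.math[slot] = op
                return True
        return False


def pack(math_ops: List[str], flow_op: Optional[str] = None) -> List[Bundle]:
    """Greedily pack independent math ops; attach flow_op to the last bundle."""
    bundles = [Bundle()]
    for op in math_ops:
        if not bundles[-1].try_add_math(op):
            bundles.append(Bundle())       # current word is full, start a new one
            bundles[-1].try_add_math(op)
    bundles[-1].flow = flow_op
    return bundles


# "MAD" stands for a multiply-add instruction (hypothetical mnemonic).
bundles = pack(["MAD", "MAD", "SIN", "MAD", "MAD", "MAD", "COS"], "BRANCH")
for b in bundles:
    print(b.math, b.flow)
```

In this example, seven math instructions and one flow control instruction fit into two big instructions instead of eight separate ones. Note that two transcendental instructions can never share a bundle, because only one slot can execute them.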

Gabriel Torres is a Brazilian best-selling ICT expert, with 24 books published. He started his online career in 1996, when he launched Clube do Hardware, which is one of the oldest and largest websites about technology in Brazil. He created Hardware Secrets in 1999 to expand his knowledge outside his home country.