AMD ATI Radeon HD 2000 Series Architecture
By Gabriel Torres on July 9, 2007 Page 2 of 6

Radeon HD 2900 XT

Radeon HD 2900 XT runs at 740 MHz and accesses its 512 MB GDDR3 memory at 825 MHz (1.65 GHz DDR) through a new 512-bit memory interface, which boosts the maximum theoretical memory transfer rate to 105.6 GB/s. For comparison, Radeon X1950 XTX has a maximum memory transfer rate of 64 GB/s and GeForce 8800 GTX reaches 86.4 GB/s, while the new GeForce 8800 Ultra reaches 103.6 GB/s.
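The transfer rates quoted above follow from multiplying the bus width by the effective memory data rate. Here is a minimal sketch of that arithmetic. The function name is illustrative, and the bus widths and memory clocks used for the three comparison cards are their commonly published specifications, not figures taken from this page.

```python
def peak_bandwidth_gbs(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # DDR memory moves data twice per clock, hence transfers_per_clock=2.
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# (bus width in bits, memory clock in MHz)
cards = {
    "Radeon HD 2900 XT": (512, 825),     # from the text above
    "Radeon X1950 XTX": (256, 1000),     # assumed: published spec
    "GeForce 8800 GTX": (384, 900),      # assumed: published spec
    "GeForce 8800 Ultra": (384, 1080),   # assumed: published spec
}

for name, (bits, mhz) in cards.items():
    print(f"{name}: {peak_bandwidth_gbs(bits, mhz):.2f} GB/s")
```

With these inputs the GeForce 8800 Ultra works out to 103.68 GB/s, which the text above truncates to 103.6 GB/s.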

Its unified shader architecture has 320 shader units or “streaming processors” – GeForce 8800 GTX has 128.

Figure 1 gives an overview of the architecture used by Radeon HD 2900 XT.

Figure 1: Radeon HD 2900 XT architecture.

Figure 2 gives a more in-depth look at how it works. As you can see, it has a dispatch unit that can send up to eight shader instructions to the streaming processors and up to two vertex or texture instructions per clock cycle. And as we will explain below, each one of these shader instructions can actually represent up to six instructions.


Figure 2: Inside Radeon HD 2900 XT architecture.

The streaming processors are divided into four main groups (called “SIMD arrays”) with 80 processors each, and each group is connected to two ports of the dispatch unit. Each group is subdivided into 16 units, and each unit contains five streaming processors and one branch processing unit. The architecture of one of these units can be seen in Figure 3.

Figure 3: Architecture of a streaming processor unit, which contains five processors.
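The counts given for this hierarchy can be cross-checked with a few multiplications. The sketch below uses only numbers stated in the text, and the constant names are illustrative.

```python
# Figures stated in the text; the names are illustrative.
SIMD_ARRAYS = 4
UNITS_PER_ARRAY = 16
PROCESSORS_PER_UNIT = 5
DISPATCH_PORTS_PER_ARRAY = 2

processors_per_array = UNITS_PER_ARRAY * PROCESSORS_PER_UNIT
total_processors = SIMD_ARRAYS * processors_per_array
total_units = SIMD_ARRAYS * UNITS_PER_ARRAY  # one branch unit in each
shader_ports = SIMD_ARRAYS * DISPATCH_PORTS_PER_ARRAY

print(processors_per_array)  # processors in one SIMD array
print(total_processors)      # streaming processors on the chip
print(total_units)           # five-processor units (and branch units)
print(shader_ports)          # dispatch ports feeding the SIMD arrays
```

The results are 80 processors per array and 320 in total, matching the figures quoted earlier. The eight dispatch ports also line up with the limit of eight shader instructions per clock cycle mentioned above.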

These units are superscalar, meaning that each unit can process several instructions in parallel, one per streaming processor. All five processors handle multiply-add instructions, which are the most common instruction type, while only one (the first one in Figure 3) can also handle transcendental instructions, i.e. logarithmic and trigonometric instructions such as SIN, COS, LOG and EXP. It is very interesting to note that each streaming processor is, in fact, a small 32-bit floating-point unit.
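Since every streaming processor can handle multiply-add instructions, a rough theoretical peak for floating-point throughput can be estimated from the processor count and the 740 MHz core clock given earlier. This is a sketch under two assumptions that are not stated in the text: each processor completes one multiply-add per clock cycle, and a multiply-add is counted as two floating-point operations, as is conventional. The function name is illustrative.

```python
def peak_gflops(processors, core_clock_mhz, flops_per_instruction=2):
    """Theoretical peak in GFLOPS, assuming one instruction per processor per clock."""
    # processors * FLOPs per instruction * clock (MHz) gives MFLOPS.
    return processors * flops_per_instruction * core_clock_mhz / 1000


# 320 streaming processors at 740 MHz, both figures from the text.
print(f"{peak_gflops(320, 740):.1f} GFLOPS")
```

Under these assumptions the result is 473.6 GFLOPS. This is an upper bound that real shaders reach only when all five processors in every unit are kept busy.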

Another very interesting thing is that each instruction sent to a unit actually packs six instructions (five math instructions plus one flow control instruction). So instead of having to send up to six separate instructions to each unit, the dispatch unit can fill all six execution units with just one big instruction. This concept is called VLIW (Very Long Instruction Word).
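To illustrate the idea, here is a toy model of packing math operations into VLIW bundles with five math slots and one flow control slot. It is a simplified sketch, not the actual hardware or compiler behaviour. The class and function names are hypothetical, the operations are assumed to be independent of each other, and the rule that transcendental operations go only into slot 0 mirrors the description of the first processor above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MATH_SLOTS = 5
TRANSCENDENTAL = {"SIN", "COS", "LOG", "EXP"}


@dataclass
class Bundle:
    """One very long instruction word: five math slots plus a flow control slot."""
    math: List[Optional[str]] = field(default_factory=lambda: [None] * MATH_SLOTS)
    flow: Optional[str] = None  # left empty in this sketch

    def try_add(self, op: str) -> bool:
        # Transcendental ops may only use slot 0. Other ops prefer
        # slots 1..4 so that slot 0 stays free for a transcendental op.
        candidates = [0] if op in TRANSCENDENTAL else [1, 2, 3, 4, 0]
        for i in candidates:
            if self.math[i] is None:
                self.math[i] = op
                return True
        return False


def pack(ops: List[str]) -> List[Bundle]:
    """Greedily pack independent math ops into bundles, in program order."""
    bundles: List[Bundle] = []
    for op in ops:
        if not bundles or not bundles[-1].try_add(op):
            bundle = Bundle()
            bundle.try_add(op)  # always succeeds on an empty bundle
            bundles.append(bundle)
    return bundles


for bundle in pack(["MAD", "SIN", "MAD", "COS", "MAD"]):
    print(bundle.math)
```

In this example the second transcendental operation (COS) forces a new bundle because slot 0 is already taken by SIN, which shows why a VLIW design depends on the compiler finding the right mix of instructions to fill every slot.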


Originally at http://www.hardwaresecrets.com/article/448/2

© 2004-9, Hardware Secrets, LLC. All Rights Reserved.
