Everything You Need to Know About the HyperTransport Bus
By Gabriel Torres on October 13, 2011
Current AMD processors have two external busses. One is used for communication between the CPU and the memory and is simply called the “memory bus.” The other is used for communication between the CPU and all other PC components through the motherboard chipset and is called HyperTransport, an I/O (Input/Output) bus. In this tutorial, we will explain how the HyperTransport bus works and clear up common misconceptions about it.
This bus was introduced with the AMD64 architecture. In older AMD processors not based on this architecture (such as the original Athlon, Athlon XP, and Sempron socket 462 processors), the CPU has only one external bus, known as the front side bus (FSB). In this approach, the external bus carries both memory and I/O communications. Since there is only one datapath out of the processor, memory and I/O transfers compete with each other for the use of the bus, thus lowering I/O performance.
In Figure 1, you can see how an AMD processor communicates with the external world. The “bridge” chip is the motherboard chipset. Depending on the chipset, you can have one or two chips. On two-chip solutions, all peripherals (such as hard disk drives, add-on cards, sound cards, etc.) are connected to the second chip. (This second chip is called south bridge, not shown in Figure 1.) On single-chip solutions, everything is connected to this single chip.
AMD CPUs targeted at servers (i.e., Opteron processors) can have one, two, or three HyperTransport busses, depending on the model. The extra busses are used to interconnect several CPUs, allowing them to talk to each other on servers with more than one CPU on the motherboard. Since desktop and notebook CPUs do not support this kind of configuration, they have only one HyperTransport bus.
For a more in-depth explanation of AMD64 architecture, please read our “Inside AMD64 Architecture” tutorial.
Besides providing AMD processors with separate datapaths for memory and I/O, HyperTransport brings another advantage: it provides separate links for the CPU input and output operations, allowing the CPU to transmit (“write”) and receive (“read”) I/O data at the same time (i.e., in parallel). In the traditional architecture using a single external bus, since the external bus is used for both input and output operations, reads and writes cannot be done simultaneously.
The HyperTransport bus can operate under several different clock and width (i.e., the number of bits that are transmitted per time) configurations. This is probably the source of most of the misconceptions and mistakes that are said and written about HyperTransport.
HyperTransport is a bus created by a consortium composed of several companies including AMD, NVIDIA, and Apple. This bus can be used in several applications, and it is not limited to AMD processors.
This means that the actual configuration of the HyperTransport bus will depend on the hardware developer.
Also, some developers announce an exaggerated transfer rate of the HyperTransport bus they are using.
AMD processors use 16-bit links, even though HyperTransport allows the use of 32-bit links.
HyperTransport 1.x (“HT1”) is used on all socket 754 processors and socket AM2 Sempron processors. (Other AM2-based processors use HyperTransport 2.0.)
Here is a breakdown of all possible clock and transfer rates on HyperTransport 1.x:
HyperTransport transfers two data per clock cycle, a concept also known as DDR, double data rate.
The formula to find the maximum theoretical transfer rate is:
Transfer rate = width (number of bits) x clock x number of data per clock cycle / 8
Thus, with socket 754 processors, the HyperTransport bus can work at up to 800 MHz, which gives 3,200 MB/s (16 bits x 800 MHz x 2 / 8). Some people advertise this clock and transfer rate using other numbers, generating a lot of confusion in the market.
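The formula above can be checked with a few lines of Python. This is only a sketch: the function name `transfer_rate_mb_s` and its parameter names are ours, not part of any HyperTransport specification, and the result is the maximum theoretical rate for one direction of the link.

```python
def transfer_rate_mb_s(width_bits, clock_mhz, data_per_clock=2):
    """Maximum theoretical transfer rate in MB/s for one direction.

    Implements: width x clock x number of data per clock cycle / 8.
    """
    return width_bits * clock_mhz * data_per_clock / 8


# Socket 754 case from the text: 16-bit link, 800 MHz, two data per clock cycle.
print(transfer_rate_mb_s(16, 800))  # 3200.0
```

Dividing by eight converts bits to bytes, which is why a clock given in MHz yields a result in MB/s.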
Some say that the clock rate used by HyperTransport 1.x is 1,600 MHz. This occurs because, since two data are transferred on each clock cycle, the performance obtained is equivalent to that of a 1,600 MHz clock transferring only one data per clock cycle. In the end, the transfer rate is the same: in the above formula, “number of data per clock cycle” becomes “1” instead of “2,” while the clock doubles. This is the same thing that happens with DDR memories, where the announced clock rate is double the actual clock rate (e.g., DDR3-1600 memories work, in fact, at 800 MHz, transferring two data per clock cycle).
AMD says that the HyperTransport bus of its socket 754 CPUs works at 1,600 MT/s. MT/s stands for Mega Transfers per second, or Millions of Transfers per second. This is the correct way to express the above idea. Transfers per second are equal to the clock rate times the number of data transferred per clock cycle.
Some say that the maximum transfer rate of HyperTransport 1.x is 6,400 MB/s. This occurs because the announced transfer rate is for each datapath (i.e., 3,200 MB/s for the input datapath and 3,200 MB/s for the output datapath), so some people simply multiply the transfer rate by two to cover the two datapaths. We don’t agree with this methodology. In brief, it is as if we said that a highway has a speed limit of 130 mph just because there is a speed limit of 65 mph in each direction. It doesn’t make any sense.
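To make the MHz versus MT/s distinction concrete, here is a small sketch. The helper names `mega_transfers_per_s` and `rate_from_mt_s` are made up for this example. It shows that the real 800 MHz clock with two data per cycle and the “1,600 MHz” reading with one data per cycle describe the same 1,600 MT/s link.

```python
def mega_transfers_per_s(clock_mhz, data_per_clock=2):
    """Transfers per second (MT/s) = clock rate x data per clock cycle."""
    return clock_mhz * data_per_clock


def rate_from_mt_s(width_bits, mt_s):
    """Per-direction transfer rate in MB/s, starting from MT/s."""
    return width_bits * mt_s / 8


actual = mega_transfers_per_s(800, 2)       # real clock, two data per cycle
announced = mega_transfers_per_s(1600, 1)   # "1,600 MHz" reading, one data per cycle

print(actual, announced)           # 1600 1600
print(rate_from_mt_s(16, actual))  # 3200.0
```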
Another misunderstanding is saying that the external bus or FSB (Front Side Bus) of Athlon 64 (or any other CPU based on HyperTransport 1.x bus) is 1,600 MHz. This is partially correct. We can say this regarding I/O operations, but not for memory, as processors based on AMD64 architecture have two separated external busses, as we discussed. Thus, it is better to say “HyperTransport” or “HT” rather than “external bus” or “FSB” in order to avoid confusion.
It is important to note that AMD processors can work with several other clock rates below the announced 1,600 MT/s (800 MHz). In fact, they can work with any of the speeds on the list published above.
The chipset can negotiate a lower clock rate with the CPU and even an eight-bit width instead of the default 16-bit one. In fact, when the first Athlon 64 chipsets came out, VIA claimed that their chipset for the Athlon 64, the K8T800, was superior to the competition for working with the HyperTransport bus at 1,600 MT/s. VIA accused competing products (without mentioning names) of not working at the maximum transfer rate that the HyperTransport allows, but rather at one of those inferior rates, or even using eight-bit instead of 16-bit links.
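One way to picture this negotiation is that the link ends up at the highest clock and the widest width that both ends support. The sketch below models only that idea. The `LinkCaps` type, the `negotiate` function, and the chipset values are hypothetical and do not reproduce the actual HyperTransport link initialization sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkCaps:
    """Hypothetical description of what one end of the link supports."""
    max_clock_mhz: int
    max_width_bits: int


def negotiate(cpu: LinkCaps, chipset: LinkCaps) -> LinkCaps:
    """Pick the fastest clock and widest link that both ends can handle."""
    return LinkCaps(
        max_clock_mhz=min(cpu.max_clock_mhz, chipset.max_clock_mhz),
        max_width_bits=min(cpu.max_width_bits, chipset.max_width_bits),
    )


def rate_mb_s(link: LinkCaps) -> float:
    """Per-direction transfer rate, assuming two data per clock cycle."""
    return link.max_width_bits * link.max_clock_mhz * 2 / 8


cpu = LinkCaps(max_clock_mhz=800, max_width_bits=16)
slow_chipset = LinkCaps(max_clock_mhz=600, max_width_bits=8)  # illustrative values only

link = negotiate(cpu, slow_chipset)
print(link.max_clock_mhz, link.max_width_bits, rate_mb_s(link))  # 600 8 1200.0
```

With these made-up numbers, a CPU capable of 3,200 MB/s would be held to 1,200 MB/s by the chipset, which is the kind of difference VIA was pointing at.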
At http://www.hypertransport.org, HyperTransport’s official website, you will see that they announce a maximum transfer rate of 12.8 GB/s for the HyperTransport 1.x. This maximum transfer rate is achieved by using 32-bit links. As we explained, AMD processors use 16-bit links. However, if you do the math, you will find 6,400 MB/s (32 bits x 800 MHz x 2 / 8). Here the consortium doubled the maximum transfer rate just because there are two datapaths available (one for transmitting data and another for receiving it). As we said before, we do not agree with this methodology of calculating transfer rates.
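The gap between the two figures is easy to reproduce. In this sketch (the function and variable names are ours), the consortium's number comes out as the 32-bit per-direction rate counted once for each direction.

```python
def per_direction_mb_s(width_bits, clock_mhz, data_per_clock=2):
    """Maximum theoretical rate in MB/s for one direction of the link."""
    return width_bits * clock_mhz * data_per_clock / 8


amd_link = per_direction_mb_s(16, 800)    # what AMD processors actually use
wide_link = per_direction_mb_s(32, 800)   # 32-bit link, one direction
advertised = wide_link * 2                # both directions added together

print(amd_link, wide_link, advertised)  # 3200.0 6400.0 12800.0
print(advertised / amd_link)            # 4.0
```

In other words, the advertised HyperTransport 1.x figure is four times what a 16-bit AMD link can move in one direction.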
HyperTransport 2.0 (“HT2”) adds new clock rates (and thus new transfer rates) along with a new feature, PCI Express mapping, which makes it easier for the CPU to “talk” to PCI Express devices.
The new clock and transfer rates introduced by HyperTransport 2.0 are the following, assuming 16-bit links (which is the configuration used by AMD processors):
HyperTransport 2.0 devices can also work with HyperTransport 1.x transfer rates.
AMD uses HyperTransport 2.0 on all CPUs based on sockets 939 and AM2 (except on Sempron CPUs, which continue to use HyperTransport 1.x), but supports only the lower HT2 speed. In fact, AMD was more interested in the PCI Express mapping feature than in a higher transfer speed. So even though these processors are based on HT2, the maximum transfer rate of their HT links is 4,000 MB/s.
To make things a little confusing, AMD sometimes uses the name “HT1” to describe the HyperTransport bus of CPUs that have their HyperTransport links working at 1,000 MHz. This is probably done to avoid people assuming that, since they are HT2 parts, they can work at up to 1,400 MHz (5,600 MB/s), which is not the case, as explained above.
Also, some people refer to this 1,000 MHz (4,000 MB/s) HyperTransport link as:
Another misunderstanding is saying that the external bus or FSB (Front Side Bus) of an AMD processor based on HT2 is 2,000 MHz. This is partially right. We can say this regarding I/O operations but not for memory, as processors based on AMD64 architecture have two separated external busses, as we saw. So it is better if you say “HyperTransport” or “HT,” rather than “external bus” or “FSB.”
As with the HyperTransport 1.x, it is important to keep in mind that AMD processors based on HyperTransport 2.0 can work with any of the clock rates below 1,000 MHz.
Once again, official values for HyperTransport 2.0 are inflated, as the HyperTransport consortium announces them assuming 32-bit links and multiplies the transfer rate by two, since there are two links available (one for transmitting and another for receiving data). As we previously mentioned, we do not agree with this methodology. Because of it, the HT2 maximum theoretical transfer rate is advertised as 22.4 GB/s (1,400 MHz x 32 x 2 / 8 x 2 links).
Besides adding new clock rates – and thus new transfer rates – the HyperTransport 3.0 brings several new features over HyperTransport 2.0, such as AC operating mode, Link Splitting (a.k.a. Un-Ganging), Hot Plugging, and Dynamic Link Clock/Width Adjustment. Current AMD processors, such as the Phenom, Phenom II, Athlon II, and FX, use the latest version of the HyperTransport bus.
HyperTransport 3.0 adds the following new clock rates, keeping compatibility with HT1 and HT2 rates (transfer rates assuming 16-bit links, which is the configuration used by AMD processors):
Sometimes you will see the MT/s numbers published as MHz, as already discussed.
Socket AM2+ and AM3 processors and their companion chipsets, however, are limited to the 8,000 MB/s transfer rate. Only socket AM3+ CPUs and chipsets are capable of using all the speeds published above. Of course, all CPUs and chipsets are compatible with the lower transfer rates available.
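The same formula can be run backwards to see which clock an 8,000 MB/s limit corresponds to. The sketch below assumes a 16-bit link and two data per clock cycle, and the function name `clock_from_rate_mhz` is ours.

```python
def clock_from_rate_mhz(rate_mb_s, width_bits=16, data_per_clock=2):
    """Clock in MHz needed to reach a per-direction transfer rate in MB/s."""
    return rate_mb_s * 8 / (width_bits * data_per_clock)


clock = clock_from_rate_mhz(8000)
print(clock)      # 2000.0 (MHz)
print(clock * 2)  # 4000.0 (MT/s, two data per clock cycle)
```

So the 8,000 MB/s ceiling of socket AM2+ and AM3 parts is a 2,000 MHz link, which you may also see quoted as 4,000 MT/s.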
Keep in mind that socket AM2+ processors can still be installed on socket AM2 motherboards. However, their HyperTransport bus will be limited to HT2 speeds.
Once again, the transfer rates announced by the HyperTransport consortium are highly exaggerated. They announce HyperTransport 3.0 as having a maximum transfer rate of 41.6 GB/s. To reach this number, they considered 32-bit links (and not 16-bit links) and doubled the result because there are two links available. The math used was 2,600 MHz x 32 x 2 / 8 x 2 links. As we have already explained, AMD processors use 16-bit links, not 32-bit ones, and we don’t agree with the methodology of doubling the transfer rate, done because there is one link for transmitting and another for receiving data. We would only agree with this if the links were in the same direction.
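Putting the three generations side by side shows how consistent the inflation is. This sketch uses only the top clock of each generation quoted in this tutorial. The names `TOP_CLOCKS_MHZ` and `rate_mb_s` are ours, and the 16-bit column is the per-direction rate of a 16-bit link at that clock, not necessarily a speed that a given AMD processor supports.

```python
# Top clock of each HyperTransport generation, as quoted in the text (MHz).
TOP_CLOCKS_MHZ = [("HT1", 800), ("HT2", 1400), ("HT3", 2600)]


def rate_mb_s(width_bits, clock_mhz, links=1):
    """width x clock x 2 data per clock / 8, times the number of links counted."""
    return width_bits * clock_mhz * 2 / 8 * links


for name, clock in TOP_CLOCKS_MHZ:
    one_way_16 = rate_mb_s(16, clock)               # 16-bit link, one direction
    consortium = rate_mb_s(32, clock, links=2)      # 32-bit link, both directions
    print(f"{name}: {one_way_16:,.0f} MB/s vs {consortium / 1000:.1f} GB/s advertised")
```

For every generation, the advertised figure is exactly four times the 16-bit, single-direction rate.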
Now let’s talk about the extra features brought by HyperTransport 3.0.
The new AC operating mode (i.e., a signaling system similar to the one used in networks) allows the HyperTransport bus to work over longer distances. The goal is to allow HyperTransport to be used directly to interconnect cases, boards, and backplanes. Processors won’t use this feature.
Link splitting, also called un-ganging, allows the 16-bit link to be accessed as two independent eight-bit links. This can be used for increasing the number of links available, thereby allowing more CPUs to be interconnected without using any extra fancy hardware.
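As a rough illustration of what un-ganging does to the numbers, the sketch below splits a 16-bit link into two eight-bit links. It assumes both halves keep running at the same clock as the original link, which is our simplification, and the `split_link` helper is hypothetical.

```python
def split_link(width_bits=16, parts=2):
    """Return the widths of the independent sub-links after un-ganging."""
    if width_bits % parts != 0:
        raise ValueError("width must divide evenly between the sub-links")
    return [width_bits // parts] * parts


def rate_mb_s(width_bits, clock_mhz):
    """Per-direction rate in MB/s, two data per clock cycle."""
    return width_bits * clock_mhz * 2 / 8


sub_links = split_link(16)                            # [8, 8]
rates = [rate_mb_s(w, 2600) for w in sub_links]       # one rate per sub-link

print(sub_links, rates)                  # [8, 8] [5200.0, 5200.0]
print(sum(rates) == rate_mb_s(16, 2600)) # True
```

Under that assumption, total bandwidth does not change. It is simply divided between two links that can each go to a different device.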
Hot Plugging allows HyperTransport devices to be installed and removed with the bus running. It won’t allow you to replace your CPU with the system turned on because the CPU has several other pins besides the HyperTransport, but this feature may be used on storage servers based on HT3.
Finally, the Dynamic Link Clock/Width Adjustment is used by HT3-based AMD CPUs when installed on a motherboard using an HT3 chipset. This feature allows the CPU to change the clock and the number of bits that are transmitted per clock cycle dynamically, i.e., “on the fly.” The idea here is to reduce power consumption. For example, if the CPU senses that running its HyperTransport bus at 2,600 MHz (10,400 MB/s) is too much for what it is doing at the moment, it can reduce the bus to 1,000 MHz (4,000 MB/s) or whatever rate it considers more suitable. The same is true for the number of bits transferred per clock cycle, which can be reduced from 16 to a narrower width based on the current system usage.
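The selection logic can be imagined as “use the slowest clock that still covers the current demand.” The sketch below is purely illustrative and is not AMD's actual algorithm. The `pick_clock` function, the `CLOCKS_MHZ` list (built from clock rates mentioned or implied in this tutorial), and the demand figures are all assumptions.

```python
# Illustrative clock steps in MHz; a real CPU may support a different set.
CLOCKS_MHZ = [800, 1000, 1400, 2000, 2600]


def rate_mb_s(width_bits, clock_mhz):
    """Per-direction rate in MB/s, two data per clock cycle."""
    return width_bits * clock_mhz * 2 / 8


def pick_clock(demand_mb_s, width_bits=16, clocks=CLOCKS_MHZ):
    """Lowest clock whose rate covers the demand, else the highest clock."""
    for clock in sorted(clocks):
        if rate_mb_s(width_bits, clock) >= demand_mb_s:
            return clock
    return max(clocks)


print(pick_clock(3500))  # 1000 (4,000 MB/s is enough)
print(pick_clock(9000))  # 2600 (needs the full 10,400 MB/s)
```

A fuller model would also shrink the link width when demand is low, but the principle is the same: spend only as much bandwidth, and therefore power, as the workload needs.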