Everything You Need to Know About The QuickPath Interconnect (QPI)
By Gabriel Torres on August 25, 2008
Intel CPUs have always used an external bus called the Front Side Bus, or simply FSB, which is shared between memory and I/O requests. The next generation of Intel CPUs will have an embedded memory controller and thus will provide two external busses: a memory bus for connecting the CPU to the memory and an I/O bus for connecting the CPU to the external world. This I/O bus is a new bus called QuickPath Interconnect (QPI), and in this tutorial we will explain how it works.
In Figures 1 and 2 we compare the traditional architecture used by Intel CPUs with the new architecture that will be used by Intel CPUs with an integrated memory controller.
This is exactly the same idea that AMD has been using since 2003, when they released their first Athlon 64 CPU. Currently all CPUs from AMD have an integrated memory controller and use a bus called HyperTransport for I/O communication. Though QuickPath Interconnect and HyperTransport have the same goal and work in a very similar fashion, they are not compatible.
By the way, technically speaking both QuickPath Interconnect and HyperTransport aren’t busses but point-to-point connections. A bus is a set of wires that allows several components to be connected to it at the same time, while a point-to-point connection is a path connecting only two devices. Even though it is technically wrong to call these connections “busses,” we will keep calling them this way for simplicity and to make the text easier to follow for laymen who call these connections this way.
We will now explain how the QuickPath Interconnect works. If you are interested, you can read our tutorial The HyperTransport Bus Used By AMD Processors to compare these two external busses.
Just like HyperTransport, QuickPath Interconnect provides two separate lanes for the communication between the CPU and the chipset, as you can see in Figure 3. This allows the CPU to transmit (“write”) and receive (“read”) I/O data at the same time (i.e., in parallel). On the traditional architecture, the single external bus is used for both input and output operations, so reads and writes cannot be done at the same time.
Speaking of chipsets, Intel will initially launch single-chip solutions. Since on CPUs with embedded memory controllers the equivalent of the north bridge chip is embedded inside the CPU, the chipset works as the south bridge chip, which Intel calls the “I/O Hub” or simply “IOH.”
So, how does the QuickPath Interconnect work?
Each lane transfers 20 bits at a time. Of these 20 bits, 16 are used for data and the remaining 4 are used for an error-detection code called CRC (Cyclic Redundancy Check), which allows the receiver to check whether the received data is intact.
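To make the idea of 16 data bits plus 4 check bits more concrete, here is a minimal sketch in Python. It is only an illustration: the text does not say which CRC polynomial QPI uses or exactly how the check bits are laid out, so the polynomial (x^4 + x + 1) and the function names `crc4`, `pack` and `is_intact` are our own assumptions.

```python
# Illustrative only: the polynomial below is an assumption, NOT taken from the QPI spec.
CRC_POLY = 0b10011  # x^4 + x + 1


def crc4(data: int) -> int:
    """Return the 4-bit CRC remainder of a 16-bit data word."""
    reg = (data & 0xFFFF) << 4        # make room for the 4 check bits
    for bit in range(19, 3, -1):      # walk the 16 data bit positions, high to low
        if reg & (1 << bit):
            reg ^= CRC_POLY << (bit - 4)
    return reg & 0xF


def pack(data: int) -> int:
    """Build a 20-bit unit: 16 data bits followed by 4 CRC bits."""
    return ((data & 0xFFFF) << 4) | crc4(data)


def is_intact(unit: int) -> bool:
    """Receiver side: recompute the CRC and compare it with the received one."""
    return crc4(unit >> 4) == (unit & 0xF)


unit = pack(0xBEEF)
print(is_intact(unit))             # True
print(is_intact(unit ^ (1 << 7)))  # False: the flipped bit is detected
```

The receiver never needs the original data to run the check; it only recomputes the code from the bits it received and compares the result with the check bits that came along.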
The first version of the QuickPath Interconnect will work with a clock rate of 3.2 GHz, transferring two data items per clock cycle (a technique called DDR, Double Data Rate), which makes the bus work as if it were using a 6.4 GHz clock rate (Intel uses the GT/s unit, which means giga transfers per second, to represent this). Since 16 bits are transmitted at a time, we have a maximum theoretical transfer rate of 12.8 GB/s on each lane (6.4 GHz x 16 bits / 8). You will see some people saying that the QuickPath Interconnect has a maximum theoretical transfer rate of 25.6 GB/s because they simply multiply the transfer rate by two to cover the two datapaths. We don’t agree with this methodology. In brief, it is as if we said that a highway has a speed limit of 130 MPH just because there is a speed limit of 65 MPH in each direction. It makes no sense.
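The arithmetic above can be reproduced with a few lines of Python. This is just a sketch of the calculation described in the text; the function name `peak_rate_gb_s` and its parameter names are our own and not anything defined by Intel.

```python
def peak_rate_gb_s(clock_ghz: float, transfers_per_clock: int, data_bits: int) -> float:
    """Maximum theoretical transfer rate, in GB/s, for one direction."""
    giga_transfers = clock_ghz * transfers_per_clock  # GT/s
    return giga_transfers * data_bits / 8             # 8 bits per byte


# First QPI version: 3.2 GHz clock, two transfers per cycle, 16 data bits.
qpi_per_direction = peak_rate_gb_s(clock_ghz=3.2, transfers_per_clock=2, data_bits=16)
print(f"QPI, each direction: {qpi_per_direction:.1f} GB/s")

# The 25.6 GB/s figure quoted elsewhere is simply both directions added together.
print(f"Both directions summed: {2 * qpi_per_direction:.1f} GB/s")
```

Note that only the 16 data bits enter the calculation; the 4 CRC bits travel on the link but do not carry payload.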
So compared to the front side bus, QuickPath Interconnect transmits fewer bits per clock cycle but works at a far higher clock rate. Currently the fastest front side bus available on Intel processors runs at 1,600 MHz (actually 400 MHz transferring four data items per clock cycle, so QuickPath Interconnect works with a base clock eight times higher), meaning a maximum theoretical transfer rate of 12.8 GB/s, the same as QuickPath. QPI, however, offers 12.8 GB/s in each direction, while a 1,600 MHz front side bus shares this bandwidth between read and write operations, which cannot be executed at the same time on the FSB, a limitation not present on QPI. Also, since the front side bus carries both memory and I/O requests, there is always more data being transferred on this bus than on QPI, which carries only I/O requests. So QPI will be “less busy” and thus have more bandwidth available.
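The comparison with the front side bus can be checked the same way. In the sketch below, the 64-bit FSB data width is our assumption: the text does not state it, but it is the width implied by 12.8 GB/s at 1,600 million transfers per second. The function and variable names are also ours.

```python
def peak_rate_gb_s(clock_mhz: float, transfers_per_clock: int, data_bits: int) -> float:
    """Maximum theoretical transfer rate in GB/s (1 GB/s = 1000 MB/s here)."""
    return clock_mhz * transfers_per_clock * data_bits / 8 / 1000


FSB_DATA_BITS = 64  # assumption: implied by the figures in the text, not stated there

fsb_shared = peak_rate_gb_s(400, 4, FSB_DATA_BITS)  # reads and writes share this
qpi_each_way = peak_rate_gb_s(3200, 2, 16)          # available in EACH direction
clock_ratio = 3200 / 400                            # base clock comparison

print(f"FSB (shared):        {fsb_shared:.1f} GB/s")
print(f"QPI (per direction): {qpi_each_way:.1f} GB/s")
print(f"QPI base clock is {clock_ratio:.0f}x the FSB base clock")
```

Both results come out to the same number, which is the point: the difference is not the peak figure but the fact that the FSB has to share it between reads, writes and memory traffic.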
QuickPath Interconnect is also faster than HyperTransport. The maximum transfer rate of HyperTransport technology is 10.4 GB/s (which is already slower than QuickPath Interconnect), but current Phenom processors use a lower transfer rate of 7.2 GB/s. So the Intel Core i7 CPU will have an external bus 78% faster than the one used on AMD Phenom processors. Other CPUs from AMD, like Athlon (formerly known as Athlon 64) and Athlon X2 (formerly known as Athlon 64 X2), use an even lower transfer rate, 4 GB/s; QPI is 220% faster than that.
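The percentages above follow directly from the transfer rates quoted in the text. Here is a small sketch of that calculation; the helper name `percent_faster` and the dictionary labels are ours.

```python
def percent_faster(rate: float, baseline: float) -> float:
    """How much faster `rate` is than `baseline`, as a percentage."""
    return (rate / baseline - 1) * 100


QPI_GB_S = 12.8  # per direction, as calculated in the text

# Transfer rates quoted in the text, in GB/s.
hypertransport = {
    "HyperTransport maximum": 10.4,
    "Phenom": 7.2,
    "Athlon / Athlon X2": 4.0,
}

for name, rate in hypertransport.items():
    print(f"QPI vs {name}: {percent_faster(QPI_GB_S, rate):.0f}% faster")
```

Keep in mind that "220% faster" means 3.2 times the rate, not 2.2 times.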
Going down to the electrical transmission, each bit is transferred using a differential pair, as shown in Figure 4 (please read this tutorial to understand how differential transmission works). So for each bit two wires are used. QuickPath Interconnect uses a total of 84 wires (including the two lanes), which is roughly half the number of wires used on the front side bus of current Intel CPUs (150 wires). So the third advantage of QuickPath Interconnect over the front side bus is that it uses fewer wires (in case you are wondering, the first advantage is providing separate datapaths for memory and I/O requests and the second advantage is providing separate datapaths for reads and writes).
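The wire count can be broken down with simple arithmetic. In the sketch below, the 20 bits, two wires per bit, two lanes, 84 wires and 150 wires all come from the text. The explanation for the wires left over (one differential clock pair per direction) is our assumption and is not stated in the text.

```python
BITS_PER_LANE = 20   # 16 data + 4 CRC, per the text
WIRES_PER_BIT = 2    # one differential pair per bit
LANES = 2            # one lane in each direction
QPI_TOTAL_WIRES = 84
FSB_WIRES = 150

data_wires = BITS_PER_LANE * WIRES_PER_BIT * LANES
other_wires = QPI_TOTAL_WIRES - data_wires  # assumption: a clock pair per direction

print(f"Wires carrying the 20 bits of both lanes: {data_wires}")
print(f"Remaining wires: {other_wires}")
print(f"QPI uses {QPI_TOTAL_WIRES / FSB_WIRES:.0%} of the FSB wire count")
```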
QuickPath uses a layered architecture (i.e., similar to the architecture used in networks) with four layers: Physical, Link, Routing and Protocol.
Now let’s talk about some advanced techniques introduced on QuickPath Interconnect.
QuickPath Interconnect offers three power modes, called L0, L0s and L1. L0 is the mode where QPI is fully operational. In the L0s state, the data wires and the circuits that drive these wires are turned off to save energy. In L1 everything is turned off, saving even more energy. Of course, the L1 state has a longer wake-up time than L0s.
We mentioned that each QuickPath Interconnect lane is 20 bits wide. What we didn’t mention is that QuickPath allows each lane to be treated as four 5-bit lanes. This division is available to improve reliability, especially in the server market. You won’t see this feature implemented on desktops.
When this feature is implemented, if the receiver detects that the connection between it and the transmitter is physically damaged, it can shut down the damaged portion of the bus and keep operating while transmitting fewer bits at a time. This will of course lower the transfer rate, but on the other hand the system won’t fail, which is what would happen on a system not implementing this feature.
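The trade-off between transfer rate and reliability can be sketched in a few lines. This is a deliberately simplified model: we assume the transfer rate scales in proportion to the number of working 5-bit portions, which the text does not state, and the text also does not say which reduced widths the hardware actually allows. The name `degraded_link` is ours.

```python
PORTIONS = 4            # each 20-bit lane can be treated as four 5-bit lanes
BITS_PER_PORTION = 5
FULL_RATE_GB_S = 12.8   # full-width rate per direction, from the text


def degraded_link(healthy: list[bool]) -> tuple[int, float]:
    """Return (usable width in bits, estimated rate in GB/s).

    Assumption: the rate scales linearly with the usable width.
    """
    if len(healthy) != PORTIONS:
        raise ValueError("expected one flag per 5-bit portion")
    width = sum(healthy) * BITS_PER_PORTION
    full_width = PORTIONS * BITS_PER_PORTION
    return width, FULL_RATE_GB_S * width / full_width


print(degraded_link([True, True, True, True]))    # all four portions working
print(degraded_link([True, True, False, True]))   # one damaged portion shut down
```

Without this feature, the second case would simply be a failed link; with it, the link keeps running at a reduced rate.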