Inside the Intel Haswell Microarchitecture
By Gabriel Torres on May 31, 2013
The Haswell microarchitecture expands the Ivy Bridge microarchitecture by adding a few new features, such as a new graphics engine, the new AVX2 instruction set, new dispatch ports, and more. Let’s see what is new.
For a better understanding of this tutorial, we recommend you read our “Inside the Intel Ivy Bridge Microarchitecture” tutorial before continuing.
The bad news for users who like to upgrade their computers by simply replacing the CPU is that CPUs based on the new Haswell microarchitecture will use a different socket type (LGA1150 on desktop models), making it impossible for you to replace your current CPU with a Haswell-based model.
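The Haswell microarchitecture expands the Ivy Bridge microarchitecture with the new features covered in this tutorial: new instruction sets (AVX2, FMA3, BMI, and TSX), two additional dispatch ports, 256-bit datapaths between the Reservation Station and the execution units, a new 2D video engine, and a new DirectX 11.1 graphics engine.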
Other features remain the same as the Ivy Bridge microarchitecture.
Let’s now talk a little more about these new features.
The AVX2 instruction set expands the existing AVX instruction set to allow the use of 256-bit registers with integer operations. With the AVX instruction set, integer operations are limited to 128-bit registers, and 256-bit registers are only used with floating-point operations.
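As a rough illustration of what the wider registers mean for integer work, the Python sketch below models a packed 32-bit integer add, the kind of operation that is applied to every element of a register at once. The function names `lanes` and `packed_add` are made up for this example; real programs would use compiler intrinsics or assembly, not Python.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of integer elements that fit in one SIMD register."""
    return register_bits // element_bits


def packed_add(a, b, element_bits=32):
    """Element-wise add with wraparound, modeling one packed integer add."""
    if len(a) != len(b):
        raise ValueError("both operands must have the same number of lanes")
    mask = (1 << element_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


# 128-bit integer operations: 4 lanes of 32 bits per instruction.
# 256-bit integer operations: 8 lanes of 32 bits per instruction.
avx_lanes = lanes(128, 32)
avx2_lanes = lanes(256, 32)
```

With 32-bit elements, a 256-bit register holds twice as many integers as a 128-bit register, so the same loop needs half as many packed add instructions.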
Alongside AVX2, the Haswell microarchitecture supports three-operand Fused Multiply-Add (FMA) instructions (a.k.a. FMA3), which are able to execute operations such as a × b + c with a single instruction. These instructions were already supported by AMD CPUs based on the Piledriver microarchitecture. Two FMA execution units were added to the microarchitecture, as we will show later.
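Besides saving an instruction, a fused multiply-add rounds the result only once, while a separate multiply followed by an add rounds twice. The sketch below models that difference in plain Python using exact fractions; it does not execute a hardware FMA instruction, and the function names are ours.

```python
from fractions import Fraction


def multiply_then_add(a: float, b: float, c: float) -> float:
    """Two operations: the product is rounded, then the sum is rounded."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model of FMA: compute a*b+c exactly, then round once to a float."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product, 1 - 2**-60, cannot be stored
# in a double and is rounded to 1.0 before the add takes place.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

separate = multiply_then_add(a, b, c)   # the small term is lost
fused = fused_multiply_add(a, b, c)     # the small term survives
```

In this example the separate version returns zero, while the fused model keeps the tiny negative remainder, which is why FMA can improve accuracy as well as throughput.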
Fifteen new bit manipulation instructions (BMI) were added. These instructions, which are listed in Figure 1, may be used for cryptography, indexing, and data conversion.
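To give an idea of what such instructions do, here is a Python model of five of them (ANDN, BLSI, BLSR, TZCNT, and PEXT) on 32-bit values. This is only a sketch of the documented behavior of those instructions, written by us for illustration; each function below corresponds to a single instruction on the CPU.

```python
MASK32 = 0xFFFFFFFF


def andn(a: int, b: int) -> int:
    """ANDN: bitwise AND of the inverted first operand with the second."""
    return (~a & b) & MASK32


def blsi(x: int) -> int:
    """BLSI: isolate the lowest set bit."""
    return (x & -x) & MASK32


def blsr(x: int) -> int:
    """BLSR: clear the lowest set bit."""
    return (x & (x - 1)) & MASK32


def tzcnt(x: int, width: int = 32) -> int:
    """TZCNT: count trailing zero bits; returns the width for zero."""
    x &= (1 << width) - 1
    if x == 0:
        return width
    return (x & -x).bit_length() - 1


def pext(value: int, mask: int) -> int:
    """PEXT: gather the bits of value selected by mask into the low bits."""
    result = 0
    out_bit = 0
    while mask:
        lowest = mask & -mask          # next selected bit position
        if value & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1               # drop that position from the mask
    return result
```

Without these instructions, each of the operations above takes several ordinary instructions (or a loop, in the case of PEXT).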
The third new instruction set added to the Haswell microarchitecture is called TSX, or Transactional Synchronization eXtensions. It is used to help solve data synchronization issues when the same data may be used by different threads that are running at the same time.
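The general idea behind transactional synchronization is to attempt an update optimistically, discard it if someone else changed the data in the meantime, and fall back to a conventional lock if retries keep failing. The Python sketch below is a loose software analogy of that idea, not an implementation of TSX: the `VersionedCell` class, the `transactional_update` function, and the retry limit are all hypothetical names and choices made for this example.

```python
import threading


class VersionedCell:
    """Shared value with a version counter used to detect conflicts."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    def try_commit(self, seen_version, new_value) -> bool:
        """Store new_value only if nobody committed since seen_version."""
        with self._lock:
            if self.version != seen_version:
                return False           # conflict: the attempt is aborted
            self.value = new_value
            self.version += 1
            return True

    def locked_update(self, update) -> None:
        """Fallback path: do the whole update while holding the lock."""
        with self._lock:
            self.value = update(self.value)
            self.version += 1


def transactional_update(cell, update, max_attempts=3) -> int:
    """Try optimistic commits first; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        seen_version, seen_value = cell.version, cell.value
        if cell.try_commit(seen_version, update(seen_value)):
            return attempt
    cell.locked_update(update)         # give up on optimism, take the lock
    return max_attempts + 1
```

When conflicts are rare, most updates succeed on the first attempt, and the cost of serializing every access through a lock is avoided.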
All new instruction sets are described in detail in the “Intel Architecture Instruction Set Extensions Programming Reference.” (The file downloads without an extension; it is a PDF file.)
To use any of those new instruction sets, the program you are running must support them, of course.
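Software usually checks whether the CPU supports an instruction set before using it. As a minimal sketch, assuming a Linux system where `/proc/cpuinfo` reports feature flags under the names `avx2`, `fma`, `bmi1`, `bmi2`, `hle`, and `rtm`, the check could look like the code below. The grouping in `HASWELL_FLAGS` and the function names are our own; native programs normally query the CPUID instruction directly instead.

```python
# Assumed mapping from the instruction sets discussed here to Linux flag names.
HASWELL_FLAGS = {
    "AVX2": ["avx2"],
    "FMA3": ["fma"],
    "BMI": ["bmi1", "bmi2"],
    "TSX": ["hle", "rtm"],
}


def parse_cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags of the first CPU listed in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supported_extensions(flags: set) -> dict:
    """Map each instruction set name to True if all of its flags are present."""
    return {
        name: all(flag in flags for flag in needed)
        for name, needed in HASWELL_FLAGS.items()
    }


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the raw cpuinfo text (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On a Linux machine, `supported_extensions(parse_cpu_flags(read_cpuinfo()))` returns a dictionary showing which of the four instruction sets the installed CPU reports.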
From the Nehalem microarchitecture on, Intel CPUs have had six dispatch ports connecting the CPU’s Reservation Station (where microinstructions waiting to be processed are stored) to the CPU’s execution units. The Haswell microarchitecture adds two new dispatch ports, for a total of eight, increasing the number of microinstructions the Reservation Station can send to the execution units per clock cycle by 33%.
The Haswell microarchitecture has a total of 17 execution units, while the Sandy Bridge and Ivy Bridge microarchitectures have 15 and the Nehalem microarchitecture has 12.
An important enhancement added to the Haswell microarchitecture is the addition of 256-bit datapaths between the Reservation Station and the execution units. The Sandy Bridge and Ivy Bridge microarchitectures use 128-bit datapaths and, therefore, when 256-bit AVX instructions have to be executed, two execution engines must be combined. This doesn’t happen with the Haswell microarchitecture.
The new 2D video engine expands the capabilities of the video engine present in the Ivy Bridge microarchitecture. It adds support for 4K resolution (up to 3840 x 2160 @ 60 Hz on DisplayPort or up to 4096 x 2304 @ 24 Hz on HDMI); the integrated media encoder now supports MPEG-2 and SVC encoding; the integrated media decoder now supports MJPEG and SVC decoding; and there is support for new image enhancements, as shown in Figure 3.
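To put the two 4K limits side by side, the short Python calculation below multiplies each resolution by its refresh rate. It counts visible pixels only and ignores blanking intervals and color depth, so it is a rough comparison rather than an exact bandwidth figure; the function name is ours.

```python
def pixels_per_second(width: int, height: int, refresh_hz: int) -> int:
    """Visible pixels the display output must deliver each second."""
    return width * height * refresh_hz


displayport_4k = pixels_per_second(3840, 2160, 60)   # 497,664,000
hdmi_4k = pixels_per_second(4096, 2304, 24)          # 226,492,416
```

Although the HDMI mode has more pixels per frame, its lower refresh rate means the DisplayPort mode moves a little more than twice as many pixels per second.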
The Haswell microarchitecture comes with a new DirectX 11.1 graphics engine, and its main block diagram can be seen in Figure 4. The exact number of execution engines (“EU,” in Figures 4 and 5) varies according to the CPU model. Also, building blocks can be duplicated in order to achieve CPUs with higher 3D performance, as shown in Figure 5. Internally, Intel calls the three configurations they will offer “GT1,” “GT2,” and “GT3.” The commercial names are listed in Figure 6. The “GT3e” part (Iris Pro Graphics 5200) will be a “GT3” part with 128 MiB of L4 memory cache for use as video memory, integrated in the same package as the CPU but not built into the CPU chip itself. This part should be targeted at the mobile market.