Inside the Intel Haswell Microarchitecture
By Gabriel Torres on May 31, 2013


The Haswell microarchitecture expands the Ivy Bridge microarchitecture by adding a few new features, such as a new graphics engine, the new AVX2 instruction set, new dispatch ports, and more. Let’s see what is new.

For a better understanding of this tutorial, we recommend you read our “Inside the Intel Ivy Bridge Microarchitecture” tutorial before continuing.

The bad news for users who like to upgrade their computers by simply replacing the CPU is that CPUs based on the new Haswell microarchitecture will use a different socket type (LGA1150 on desktop models), making it impossible for you to replace your current CPU with a Haswell-based model.

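The Haswell microarchitecture expands the Ivy Bridge microarchitecture by adding new instruction sets (AVX2, FMA3, BMI, and TSX), two new dispatch ports with additional execution units, a new 2D video engine, and a new 3D engine.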

Other features remain the same as the Ivy Bridge microarchitecture.

Let’s now talk a little more about these new features.

New Instructions

The AVX2 instruction set expands the existing AVX instruction set to allow the use of 256-bit registers with integer operations. With the AVX instruction set, integer operations are limited to 128-bit registers, and 256-bit registers are only used with floating-point operations.
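To make the register-width difference concrete, here is a minimal Python sketch that counts how many 32-bit integers fit in one register and emulates a packed add across those lanes. It is only an illustration of the idea, not real AVX2 code; the helper names `lanes` and `packed_add` are our own assumptions.

```python
def lanes(register_bits, lane_bits=32):
    """Number of integer lanes that fit in one register of the given width."""
    return register_bits // lane_bits


def packed_add(a, b, lane_bits=32):
    """Add two equal-length lists lane by lane, wrapping around on overflow
    the way a packed unsigned integer add does."""
    mask = (1 << lane_bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]


avx_int_lanes = lanes(128)    # integer operations under AVX: 128-bit registers
avx2_int_lanes = lanes(256)   # integer operations under AVX2: 256-bit registers

# One emulated AVX2-wide operation covers eight 32-bit integers at once.
a = list(range(avx2_int_lanes))
b = [10] * avx2_int_lanes
result = packed_add(a, b)

print(avx_int_lanes, avx2_int_lanes)  # 4 8
print(result)                         # [10, 11, 12, 13, 14, 15, 16, 17]
```

With 32-bit integers, a single 256-bit operation handles eight values where a 128-bit operation handles four.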

Alongside AVX2, Haswell supports three-operand Fused Multiply-Add (FMA) instructions (a.k.a. FMA3), which are able to execute operations such as a × b + c with a single instruction. These instructions were already supported by AMD CPUs based on the Piledriver microarchitecture. Two FMA execution units were added to the microarchitecture, as shown in Figure 2.
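Besides saving an instruction, a fused multiply-add rounds the result only once, while a separate multiply followed by an add rounds twice. The sketch below emulates that single rounding in Python using exact rational arithmetic. The function name `fma_emulated` is our own, and the code does not call the hardware instruction; it only reproduces the numerical behavior under that assumption.

```python
from fractions import Fraction


def fma_emulated(a, b, c):
    """Compute a * b + c exactly, then round once to a double.
    This mimics the single rounding of a fused multiply-add."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

# Separate multiply and add: a * b is rounded to 1.0 before c is added.
separate = a * b + c

# Fused: the exact product 1 - 2**-60 survives until the final rounding.
fused = fma_emulated(a, b, c)

print(separate)  # 0.0
print(fused)     # -8.673617379884035e-19, which is -2**-60
```

The inputs are chosen so that the exact product is too close to 1.0 to survive an intermediate rounding, which is where the two approaches diverge.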

Fifteen new bit manipulation instructions (BMI) were added. These instructions, which are listed in Figure 1, may be used for cryptography, indexing, and data conversion.

Figure 1: New bit manipulation instructions
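To give an idea of what such instructions do, the following Python sketch emulates a handful of them (ANDN, BLSI, BLSR, BLSMSK, TZCNT, PEXT, and PDEP) on 32-bit values. The behavior follows Intel's published definitions of these instructions as we understand them. The Python function names are ours, and each function stands in for what the CPU does with a single instruction.

```python
MASK32 = 0xFFFFFFFF  # assumption: we emulate the 32-bit forms


def andn(a, b):
    """ANDN: bitwise AND of b with the inverse of a."""
    return (~a & b) & MASK32


def blsi(x):
    """BLSI: isolate the lowest set bit."""
    return (x & -x) & MASK32


def blsr(x):
    """BLSR: clear the lowest set bit."""
    return (x & (x - 1)) & MASK32


def blsmsk(x):
    """BLSMSK: mask of all bits up to and including the lowest set bit."""
    return (x ^ (x - 1)) & MASK32


def tzcnt(x, bits=32):
    """TZCNT: count trailing zero bits (returns the operand size for 0)."""
    if x == 0:
        return bits
    return (x & -x).bit_length() - 1


def pext(src, mask):
    """PEXT: gather the bits of src selected by mask into the low bits."""
    result, out = 0, 0
    while mask:
        low = mask & -mask          # lowest remaining mask bit
        if src & low:
            result |= 1 << out
        out += 1
        mask &= mask - 1            # drop that mask bit
    return result


def pdep(src, mask):
    """PDEP: scatter the low bits of src to the positions selected by mask."""
    result, i = 0, 0
    while mask:
        low = mask & -mask
        if (src >> i) & 1:
            result |= low
        i += 1
        mask &= mask - 1
    return result


x = 0b10110000
print(bin(blsi(x)), bin(blsr(x)), bin(blsmsk(x)), tzcnt(x))
# 0b10000 0b10100000 0b11111 4
print(bin(pext(0b10110010, 0b11110000)))  # 0b1011
print(bin(pdep(0b1011, 0b11110000)))      # 0b10110000
```

Operations like these otherwise take several shifts, masks, and loops, which is why they help in the indexing and data conversion code mentioned above.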

Another new instruction set added to the Haswell microarchitecture is called Transactional Synchronization Extensions (TSX), and is used to help solve data synchronization issues when the same data may be accessed by different threads that are running at the same time.
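The general idea behind transactional execution can be sketched in software: do the work speculatively, commit only if nobody else touched the data in the meantime, and otherwise abort and retry, falling back to a conventional lock after too many failures. The Python code below is only a loose analogy under that assumption. It does not use TSX instructions, and every name in it (`VersionedCell`, `transactional_update`, the `interfere` hook) is hypothetical.

```python
import threading


class VersionedCell:
    """A shared value plus a version counter that changes on every write."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()


def transactional_update(cell, fn, max_retries=3, interfere=None):
    """Apply fn to the cell optimistically; retry on conflict, then fall back."""
    for attempt in range(max_retries):
        seen_version = cell.version
        new_value = fn(cell.value)          # speculative work, no lock held
        if interfere is not None:
            interfere(attempt)              # test hook: simulate another writer
        with cell.lock:                     # short commit step
            if cell.version == seen_version:
                cell.value = new_value
                cell.version += 1
                return ("committed", attempt + 1)
        # Someone else wrote in between: discard the result and retry.
    with cell.lock:                         # fallback: do the work under the lock
        cell.value = fn(cell.value)
        cell.version += 1
        return ("fallback", max_retries)


cell = VersionedCell(1)


def other_writer(attempt):
    # Simulate a conflicting write during the first attempt only.
    if attempt == 0:
        cell.value = 100
        cell.version += 1


outcome = transactional_update(cell, lambda v: v + 1, interfere=other_writer)
print(outcome, cell.value)  # ('committed', 2) 101
```

In this run the first attempt is aborted because of the simulated conflict, and the second attempt commits using the updated value.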

All new instruction sets are described in detail in the “Intel Architecture Instruction Set Extensions Programming Reference.” (The file downloads without an extension; it is a PDF file.)

To use any of those new instruction sets, the program you are running must support them, of course.

New Dispatch Ports and Execution Units

From the Nehalem microarchitecture on, Intel CPUs have had six dispatch ports to connect the CPU’s Reservation Station (where microinstructions waiting to be processed are stored) to the CPU’s execution units. The Haswell microarchitecture adds two new dispatch ports, for a total of eight, increasing the number of microinstructions the Reservation Station can send to the execution units per clock cycle by 33%.

The Haswell microarchitecture has a total of 17 execution units, while the Sandy Bridge and Ivy Bridge microarchitectures have 15 and the Nehalem microarchitecture has 12.
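The percentages behind these figures are simple to check. The short Python sketch below works them out from the port and execution unit counts quoted above; the helper name `percent_increase` is our own.

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


ports = percent_increase(6, 8)              # dispatch ports: six before, eight on Haswell
units_vs_ivy = percent_increase(15, 17)     # execution units vs. Sandy Bridge / Ivy Bridge
units_vs_nehalem = percent_increase(12, 17) # execution units vs. Nehalem

print(round(ports, 1))             # 33.3
print(round(units_vs_ivy, 1))      # 13.3
print(round(units_vs_nehalem, 1))  # 41.7
```

Note that these are increases in peak resources, not measured performance gains.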

An important enhancement added to the Haswell microarchitecture is the addition of 256-bit datapaths between the Reservation Station and the execution units. The Sandy Bridge and Ivy Bridge microarchitectures use 128-bit datapaths and, therefore, when 256-bit AVX instructions have to be executed, two execution engines must be combined. This doesn’t happen with the Haswell microarchitecture.

Figure 2: New dispatch ports and execution units of the Haswell microarchitecture

New 2D Video Engine

The new 2D video engine expands the capabilities of the video engine present in the Ivy Bridge microarchitecture. It adds support for 4K resolution (up to 3840 x 2160 @ 60 Hz on DisplayPort or up to 4096 x 2304 @ 24 Hz on HDMI), MPEG2 and SVC encoding in the integrated media encoder, MJPEG and SVC decoding in the integrated media decoder, and new image enhancements, as shown in Figure 3.

Figure 3: New image quality enhancements
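To put the two 4K modes quoted above in perspective, the following Python sketch computes how many pixels per second each one requires. It counts visible pixels only and ignores blanking intervals and color depth, so it is a rough comparison rather than an actual link bandwidth figure. The function name `pixel_rate` is our own.

```python
def pixel_rate(width, height, refresh_hz):
    """Visible pixels delivered per second for a given display mode."""
    return width * height * refresh_hz


dp_rate = pixel_rate(3840, 2160, 60)    # DisplayPort mode
hdmi_rate = pixel_rate(4096, 2304, 24)  # HDMI mode

print(dp_rate)    # 497664000
print(hdmi_rate)  # 226492416
print(round(dp_rate / hdmi_rate, 2))  # 2.2
```

Although the HDMI mode has more pixels per frame, its lower refresh rate means the DisplayPort mode moves roughly twice as many pixels per second.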

New 3D Engine

The Haswell microarchitecture comes with a new DirectX 11.1 graphics engine, whose main block diagram can be seen in Figure 4. The exact number of execution units (“EU” in Figures 4 and 5) varies according to the CPU model. Building blocks can also be duplicated to produce CPUs with higher 3D performance, as shown in Figure 5. Internally, Intel calls the three configurations it will offer “GT1,” “GT2,” and “GT3.” The commercial names are listed in Figure 6. The “GT3e” part (Iris Pro Graphics 5200) will be a “GT3” part with 128 MiB of L4 cache for use as video memory, integrated in the same package as the CPU but not built into the CPU chip itself. This part should be targeted at the mobile market.

Figure 4: The 3D video engine

Figure 5: The 3D video engine

Figure 6: Commercial names of the available engines


© 2004-13, Hardware Secrets, LLC. All Rights Reserved.
