Pentium 4 Pipeline
Pipeline is a list of all stages a given instruction must go through in order to be fully executed. On 6th generation Intel processors, like Pentium III, their pipeline had 11 stages. Pentium 4 has 20 stages! So, on a Pentium 4 processor a given instruction takes much longer to be executed then on a Pentium III, for instance! If you take the new 90 nm Pentium 4 generation processors, codenamed “Prescott”, the case is even worse because they use a 31-stage pipeline! Holy cow!
This was done in order to increase the processor clock rate. By having more stages each individual stage can be constructed using fewer transistors. With fewer transistors is easier to achieve higher clock rates. In fact, Pentium 4 is only faster than Pentium III because it works at a higher clock rate. Under the same clock rate, a Pentium III CPU would be faster than a Pentium 4 because of the size of the pipeline.
Because of that, Intel has already announced that their 8th generation processors will use Pentium M architecture, which is based on Intel’s 6th generation architecture (Pentium III architecture) and not on Netburst (Pentium 4) architecture. This arquitecture, called Core, can be studied on our Inside Core Microarchitecture tutorial.
In Figure 2, you can see Pentium 4 20-stage pipeline. So far Intel didn’t disclosure Prescott’s 31-stage pipeline, so we can’t talk about it.
Here is a basic explanation of each stage, which explains how a given instruction is processed by Pentium 4 processors. If you think this is too complex for you, don’t worry. This is just a summary of what we will be explaining in the next pages.
- TC Nxt IP: Trace cache next instruction pointer. This stage looks at branch target buffer (BTB) for the next microinstruction to be executed. This step takes two stages.
- TC Fetch: Trace cache fetch. Loads, from the trace cache, this microinstruction. This step takes two stages.
- Drive: Sends the microinstruction to be processed to the resource allocator and register renaming circuit.
- Alloc: Allocate. Checks which CPU resources will be needed by the microinstruction – for example, the memory load and store buffers.
- Rename: If the program uses one of the eight standard x86 registers it will be renamed into one of the 128 internal registers present on Pentium 4. This step takes two stages.
- Que: Queue. The microinstructions are put in queues according to their types (for example, integer or floating point). They are held in the queue until there is an open slot of the same type in the scheduler.
- Sch: Schedule. Microinstructions are scheduled to be executed according to its type (integer, floating point, etc). Before arriving to this stage, all instructions are in order, i.e., on the same order they appear on the program. At this stage, the scheduler re-orders the instructions in order to keep all execution units full. For example, if there is one floating point unit going to be available, the scheduler will look for a floating point instruction to send it to this unit, even if the next instruction on the program is an integer one. The scheduler is the heart of the out-of-order engine of Intel 7th generation processors. This step takes three stages.
- Disp: Dispatch. Sends the microinstructions to their corresponding execution engines. This step takes two stages.
- RF: Register file. The internal registers, stored in the instructions pool, are read. This step takes two stages.
- Ex: Execute. Microinstructions are executed.
- Flgs: Flags. The microprocessor flags are updated.
- Br Ck: Branch check. Checks if the branch taken by the program is the same predicted by the branch prediction circuit.
- Drive: Sends the results of this check to the branch target buffer (BTB) present on the processor’s entrance.
- 1. Introduction
- 2. Pentium 4 Pipeline
- 3. Memory Cache and Fetch Unit
- 4. Decoder
- 5. Allocator and Register Renamer
- 6. Scheduler
- 7. Dispatch and Execution Units