| Inside Pentium 4 Architecture | |
| By Gabriel Torres on October 18, 2005 | Page 7 of 7 |
![]()
Dispatch and Execution Units As we’ve seen, Pentium 4 has four dispatch ports numbered 0 thru 3. Each port is connected to one, two or three execution units, as you can see on Figure 6.
The units marked as “clock x2” can execute two microinstructions per clock cycle. Ports 0 and 1 can send two microinstructions per clock cycle to these units. So the maximum number of microinstructions that can be dispatched per clock cycle is six:
Keep in mind that complex instructions may take several clock cycles to be processed. Let’s take an example of port 1, where the complete floating point unit is located. While this unit is processing a very complex instruction that takes several clock ticks to be executed, port 1 dispatch unit won’t stall: it will keep sending simple instructions to the ALU (Arithmetic and Logic Unit) while the FPU is busy. So, even thought the maximum dispatch rate is six microinstructions, actually the CPU can have up to seven microinstructions being processed at the same time. The two double-speed ALUs can process two microinstructions per clock cycle. The other units need at least one clock cycle to process the microinstructions they receive. So, Pentium 4 architecture is optimized for simple instructions. As you can see on Figure 6, dispatch ports 2 and 3 are dedicated to memory operations: load (read data from memory) and store (write data to memory), respectively. As for memory operation, it is interesting to note that port 0 is also used during store operations (see Figure 5 and the list of operations on Figure 6). On such operations, port 3 is used to send the memory address, while port 0 is used to send the data to be stored at this address. This data can be generated by either the ALU or the FPU, depending on the kind of data to be stored (integer or floating point/SSE). On Figure 6 you have a complete list of the kinds of instructions each execution unit deals with. FXCH and LEA (Load Effective Address) are two x86 instructions. Actually Intel’s implementation for FXCH instruction on Pentium 4 caused a great deal of surprise to all experts, because on processors from previous generation (Pentium III) and processors from AMD this instruction can be executed at zero clock cycle, while on Pentium 4 it takes some clock cycles to be executed. That’s it. If you want to compare Pentium 4 architecture to Athlon 64's, read our Inside AMD64 Architecture tutorial. | |
| Originally at http://www.hardwaresecrets.com/article/235/7 | Pages (7): 1 2 3 4 5 6 7 » |
© 2004-8, Hardware Secrets, LLC. All Rights Reserved. Total or partial reproduction of the contents of this site, as well as that of the texts available for downloading, be this in the electronic media, in print, or any other form of distribution, is expressly forbidden. Those who do not comply with these copyright laws will be indicted and punished according to the International Copyrights Law. We do not take responsibility for material damage of any kind caused by the use of information contained in Hardware Secrets. | |