Inside Pentium M Architecture
Author: Gabriel Torres
Type: Tutorials
Last Updated: January 4, 2006
Page: 2 of 7
Pentium M Pipeline

A pipeline is the sequence of stages a given instruction must go through in order to be fully executed. Intel didn’t disclose the Pentium M’s pipeline, so we will talk about the Pentium III’s instead. The Pentium M’s pipeline probably has more stages than the Pentium III’s, but analyzing the Pentium III’s will give you a good idea of how the Pentium M’s architecture works.

As a reminder, the Pentium 4 pipeline has 20 stages, and the pipeline of newer Pentium 4 CPUs based on the “Prescott” core has 31 stages!

In Figure 1, you can see Pentium III’s 11-stage pipeline.

Figure 1: Pentium III pipeline.
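
To put the stage counts mentioned above in perspective, here is a minimal sketch in Python of an idealized pipeline. It assumes that one instruction enters the pipeline per clock cycle and that nothing ever stalls, which no real CPU achieves. The function name `ideal_cycles` is our own and does not come from Intel.

```python
def ideal_cycles(stages: int, instructions: int) -> int:
    """Clock cycles an idealized pipeline needs to finish `instructions`.

    Assumption (ours): one instruction enters per cycle and there are no
    stalls or branch mispredictions. The first instruction needs `stages`
    cycles; each following instruction completes one cycle later.
    """
    if stages < 1 or instructions < 0:
        raise ValueError("stages must be >= 1 and instructions >= 0")
    if instructions == 0:
        return 0
    return stages + instructions - 1


# Stage counts as given in the text.
for name, depth in [("Pentium III", 11),
                    ("Pentium 4", 20),
                    ("Pentium 4 (Prescott)", 31)]:
    print(name, ideal_cycles(depth, 1), ideal_cycles(depth, 1000))
```

In this simplified model a deeper pipeline only costs a few extra cycles to fill. The real price of a long pipeline shows up whenever it has to be emptied and refilled, for example after a mispredicted branch, which this sketch deliberately leaves out.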

Here is a basic explanation of each stage, showing how a given instruction is processed by P6-class processors. If you think this is too complex for you, don’t worry. This is just a summary of what we will be explaining in the next pages.

  • IFU1: Loads one line (32 bytes, i.e., 256 bits) from L1 instruction cache and stores it in the Instruction Streaming Buffer.
  • IFU2: Identifies the instruction boundaries within 16 bytes (128 bits). Since x86 instructions don’t have a fixed length, this stage marks where each instruction starts and ends within the loaded 16 bytes. If there is any branch instruction within these 16 bytes, its address is stored in the Branch Target Buffer (BTB), so the CPU can later use this information in its branch prediction circuit.
  • IFU3: Marks to which instruction decoder unit each instruction must be sent. There are three different instruction decoder units, as we will explain later.
  • DEC1: Decodes the x86 instruction into one or more RISC microinstructions (a.k.a. micro-ops). Since the CPU has three instruction decode units, it is possible to decode up to three instructions at the same time.
  • DEC2: Sends the micro-ops to the Decoded Instruction Queue, which is capable of storing up to six micro-ops. If the instruction was converted into more than six micro-ops, this stage must be repeated in order to handle the remaining micro-ops.
  • RAT: Since the P6 microarchitecture implements out-of-order execution (OOO), the value of a given register could be altered by an instruction executed before its “correct” (i.e., original) place in the program flow, corrupting the data needed by another instruction. To solve this kind of conflict, at this stage the original register used by the instruction is changed to one of the 40 internal registers that the P6 microarchitecture has.
  • ROB: At this stage three micro-ops are loaded into the Reorder Buffer (ROB). If all data necessary for the execution of a micro-op are available and if there is an open slot at the Reservation Station micro-op queue, then the micro-op is moved to this queue.
  • DIS: If the micro-op wasn’t sent to the Reservation Station micro-op queue, this is done at this stage. The micro-op is sent to the proper execution unit.
  • EX: The micro-op is executed at the proper execution unit. Usually each micro-op needs only one clock cycle to be executed.
  • RET1: Checks the Reorder Buffer for any micro-op that can be flagged as “executed”.
  • RET2: When all micro-ops related to the previous x86 instruction have already been removed from the Reorder Buffer and all micro-ops related to the current x86 instruction have been executed, these micro-ops are removed from the Reorder Buffer and the x86 registers are updated (the inverse of the process done at the RAT stage). The retirement process must be done in order. Up to three micro-ops can be removed from the Reorder Buffer per clock cycle.
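
The register renaming done at the RAT stage is probably the least intuitive step in the list above, so here is a small Python sketch of the idea. It is a toy model under our own assumptions, not Intel’s actual design: the class `RenameTable` and its method names are hypothetical, internal registers are handed out in simple numeric order, and they are never freed (a real CPU recycles them at retirement). Only the count of 40 internal registers comes from the text.

```python
class RenameTable:
    """Toy register alias table (RAT); a simplified illustration only."""

    def __init__(self, arch_regs, num_internal=40):
        # 40 internal registers, the figure given in the text for P6.
        self.free = list(range(num_internal))
        # Each x86 register starts out mapped to its own internal register.
        self.alias = {reg: self.free.pop(0) for reg in arch_regs}

    def rename(self, dest, sources):
        """Rename one micro-op; returns (internal dest, internal sources)."""
        # Sources read whichever internal register currently holds the value.
        renamed_sources = [self.alias[s] for s in sources]
        if not self.free:
            # Simplification: this sketch never frees internal registers.
            raise RuntimeError("no free internal register (a real CPU would stall)")
        # The destination always gets a fresh internal register, so a later
        # write to the same x86 register cannot clobber an earlier value.
        new_dest = self.free.pop(0)
        self.alias[dest] = new_dest
        return new_dest, renamed_sources


rat = RenameTable(["eax", "ebx"])
first = rat.rename("eax", [])               # mov eax, [mem]
second = rat.rename("ebx", ["ebx", "eax"])  # add ebx, eax
third = rat.rename("eax", [])               # mov eax, 5
print(first, second, third)
```

In this example the add reads the internal register written by the first mov, while the second mov writes to a different internal register. Because both writes to eax land in separate places, the second mov could be executed early without corrupting the value the add still needs.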

Don’t worry if all this sounded confusing to you. We will explain all this better in the next pages.
