Hardware Secrets
Inside Pentium M Architecture
Author: Gabriel Torres
Type: Tutorials Last Updated: January 4, 2006
Page: 2 of 7
Pentium M Pipeline

A pipeline is the list of all stages a given instruction must go through in order to be fully executed. Intel didn’t disclose the details of Pentium M’s pipeline, so we will talk about Pentium III’s instead. Pentium M’s pipeline probably has more stages than Pentium III’s, but analyzing Pentium III’s will give you a good idea of how Pentium M’s architecture works.

As a reminder, the Pentium 4 pipeline has 20 stages, and the pipeline of newer Pentium 4 CPUs based on the “Prescott” core has 31 stages!

In Figure 1, you can see Pentium III’s 11-stage pipeline.

Figure 1: Pentium III pipeline.
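
To put the stage counts quoted above in perspective, here is a minimal sketch, assuming an idealized pipeline that starts one instruction per clock cycle and never stalls or mispredicts a branch. That assumption does not hold for real CPUs (P6-class processors handle several instructions per cycle and do stall), and the helper name `ideal_cycles` is ours, not Intel's. The point is only that a deeper pipeline adds a fixed start-up cost that matters little once the pipeline is full.

```python
def ideal_cycles(stages: int, instructions: int) -> int:
    """Clock cycles needed to finish `instructions` in an idealized pipeline.

    Assumption: one instruction enters per cycle, with no stalls or flushes.
    The first instruction needs `stages` cycles; each later one finishes
    one cycle after its predecessor.
    """
    if stages < 1 or instructions < 0:
        raise ValueError("stages must be >= 1 and instructions >= 0")
    if instructions == 0:
        return 0
    return stages + instructions - 1


# Stage counts taken from the text; everything else is illustrative.
PIPELINE_DEPTHS = {
    "Pentium III": 11,
    "Pentium 4": 20,
    "Pentium 4 (Prescott)": 31,
}

for cpu, depth in PIPELINE_DEPTHS.items():
    print(f"{cpu}: 1 instruction = {ideal_cycles(depth, 1)} cycles, "
          f"1000 instructions = {ideal_cycles(depth, 1000)} cycles")
```

In this simplified model the extra depth only hurts when the pipeline has to be refilled, which is why branch prediction becomes more important as the number of stages grows.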

Here is a basic explanation of each stage, showing how a given instruction is processed by P6-class processors. If you think this is too complex for you, don’t worry. This is just a summary of what we will be explaining in the next pages.

  • IFU1: Loads one line (32 bytes, i.e., 256 bits) from the L1 instruction cache and stores it in the Instruction Streaming Buffer.
  • IFU2: Identifies the instruction boundaries within 16 bytes (128 bits). Since x86 instructions don’t have a fixed length, this stage marks where each instruction starts and ends within the loaded 16 bytes. If there is any branch instruction within these 16 bytes, its address is stored in the Branch Target Buffer (BTB), so the CPU can later use this information in its branch prediction circuit.
  • IFU3: Marks to which instruction decoder unit each instruction must be sent. There are three different instruction decoder units, as we will explain later.
  • DEC1: Decodes the x86 instruction into a RISC microinstruction (a.k.a. micro-op). Since the CPU has three instruction decoder units, it is possible to decode up to three instructions at the same time.
  • DEC2: Sends the micro-ops to the Decoded Instruction Queue, which is capable of storing up to six micro-ops. If the instruction was converted into more than six micro-ops, this stage must be repeated in order to catch the missing micro-ops.
  • RAT: Since the P6 microarchitecture implements out-of-order execution (OOO), the value of a given register could be altered by an instruction executed before its “correct” (i.e., original) place in the program flow, corrupting the data needed by another instruction. To solve this kind of conflict, at this stage the original register used by the instruction is replaced with one of the 40 internal registers that the P6 microarchitecture has.
  • ROB: At this stage three micro-ops are loaded into the Reorder Buffer (ROB). If all data necessary for the execution of a micro-op are available and if there is an open slot at the Reservation Station micro-op queue, then the micro-op is moved to this queue.
  • DIS: If the micro-op wasn’t sent to the Reservation Station micro-op queue at the previous stage, it is sent at this stage. The micro-op is then sent to the proper execution unit.
  • EX: The micro-op is executed at the proper execution unit. Usually each micro-op needs only one clock cycle to be executed.
  • RET1: Checks the Reorder Buffer for any micro-op that can be flagged as “executed”.
  • RET2: When all micro-ops related to the previous x86 instruction have already been removed from the Reorder Buffer and all micro-ops related to the current x86 instruction have been executed, these micro-ops are removed from the Reorder Buffer and the x86 registers are updated (the inverse of the process done at the RAT stage). The retirement process must be done in order. Up to three micro-ops can be removed from the Reorder Buffer per clock cycle.
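
The register renaming done at the RAT stage can be illustrated with a small sketch. This is a simplified model, not Intel's actual circuit: the class name `RenameTable`, the method `rename`, and the choice to give every x86 register an initial slot in the 40-entry pool are all assumptions made for illustration, and the sketch never frees a register (in the real CPU that happens at retirement). Only the count of 40 internal registers comes from the text.

```python
from collections import deque

NUM_INTERNAL_REGISTERS = 40  # count quoted in the text for the P6 microarchitecture


class RenameTable:
    """Toy register alias table mapping x86 register names to internal registers."""

    def __init__(self, x86_registers):
        self.free = deque(range(NUM_INTERNAL_REGISTERS))
        # Assumption: each x86 register starts out mapped to its own internal register.
        self.mapping = {reg: self.free.popleft() for reg in x86_registers}

    def rename(self, dest, sources):
        """Return (internal destination, internal sources) for one micro-op."""
        # Sources are looked up first, so they see the most recent writer.
        internal_sources = [self.mapping[src] for src in sources]
        if not self.free:
            raise RuntimeError("no free internal register: the pipeline would stall")
        # Every write gets a fresh internal register, so a later write to the
        # same x86 register cannot overwrite a value an older micro-op still needs.
        internal_dest = self.free.popleft()
        self.mapping[dest] = internal_dest
        return internal_dest, internal_sources


rat = RenameTable(["eax", "ebx", "ecx"])
first = rat.rename("eax", ["ebx"])   # eax <- f(ebx)
second = rat.rename("ecx", ["eax"])  # ecx <- g(eax), reads the new eax
third = rat.rename("eax", ["ebx"])   # eax <- h(ebx), a second write to eax
print(first, second, third)
```

Because the two writes to `eax` land in different internal registers, the third micro-op could execute before the second without corrupting the value the second one reads.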

Don’t worry if all this sounded confusing to you. We will explain all this better in the next pages.
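
For readers who prefer code to prose, the in-order retirement rule from the RET2 stage can be sketched as follows. This is a rough model under stated assumptions: the Reorder Buffer is a simple queue of dictionaries, the names `retire_cycle` and `RETIRE_WIDTH` are hypothetical, and the grouping of micro-ops by x86 instruction is ignored. The limit of three micro-ops per clock cycle comes from the stage description above.

```python
from collections import deque

RETIRE_WIDTH = 3  # up to three micro-ops leave the Reorder Buffer per clock cycle


def retire_cycle(rob):
    """Retire executed micro-ops from the head of the Reorder Buffer, oldest first.

    Retirement stops at the first micro-op that has not executed yet, even if
    younger micro-ops behind it are already done.
    """
    retired = []
    while rob and rob[0]["executed"] and len(retired) < RETIRE_WIDTH:
        retired.append(rob.popleft()["name"])
    return retired


# Micro-ops in program order; uop1 has not executed yet, the others have.
rob = deque(
    {"name": name, "executed": done}
    for name, done in [
        ("uop0", True),
        ("uop1", False),
        ("uop2", True),
        ("uop3", True),
        ("uop4", True),
    ]
)

cycle1 = retire_cycle(rob)      # only uop0: uop1 blocks everything behind it
rob[0]["executed"] = True       # uop1 finishes executing
cycle2 = retire_cycle(rob)      # three micro-ops, the per-cycle limit
cycle3 = retire_cycle(rob)      # the remaining micro-op
print(cycle1, cycle2, cycle3)
```

Execution may finish in any order, but results become visible in program order because only the head of the queue is ever allowed to leave.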
