Intel’s Next Generation Microarchitecture
At this IDF we could get more details on the next microarchitecture, which will be used on all new Intel CPUs starting in the second half of 2006 and which was developed by the Haifa team. As we mentioned, this microarchitecture is based on Pentium M’s and not on Pentium 4’s. We have already posted a tutorial about the Pentium 4 microarchitecture, so you may want to read it to better understand this subject and the technical terms used below. This will be Intel’s 8th x86 microarchitecture generation.
First, it will use a 14-stage pipeline, as opposed to the 20-stage pipeline of the original Pentium 4 and the 31-stage pipeline of Prescott. In this respect the new microarchitecture is closer to Intel’s 6th generation CPUs, such as Pentium III, than to Intel’s 7th generation CPUs.
It will have four dispatch units, against the three used on Pentium 4. Simply put, it will be able to send more microinstructions per clock cycle to the CPU execution units, which raises peak throughput whenever the code has enough independent instructions to keep those units busy.
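The effect of a wider dispatch stage can be put into numbers with a little arithmetic. The sketch below is only a best-case bound: it assumes every microinstruction is independent and that the execution units never stall, which real code does not guarantee. The function name `min_cycles` and the figure of 1,200 microinstructions are made up for illustration.

```python
import math

def min_cycles(micro_ops, dispatch_width):
    """Best-case number of clock cycles needed to dispatch `micro_ops`
    microinstructions when `dispatch_width` can be sent per cycle.
    Assumes no dependencies and no stalls (an idealization)."""
    return math.ceil(micro_ops / dispatch_width)

# Hypothetical workload of 1,200 microinstructions.
three_wide = min_cycles(1200, 3)  # three dispatch units
four_wide = min_cycles(1200, 4)   # four dispatch units
print(three_wide, four_wide)
```

Under these assumptions the four-wide design needs a quarter fewer cycles for the same work; in practice the gain is smaller, because dependencies between instructions leave some dispatch slots empty.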
On the cache side, it will use an L2 cache shared by all the CPU cores, just like Yonah, the dual-core Pentium M CPU manufactured in the new 65-nm process that will be released at the beginning of next year. This was done in order to decrease the cache-miss rate, i.e., the number of times the CPU does not find the data it needs in the cache and has to go to the slow RAM to grab it. With an L2 cache shared between all CPU cores, the CPU can dynamically give more or less L2 cache to each core depending on demand. In a dual-core CPU with separate L2 caches, if one core runs out of cache, it must get data directly from RAM, even if the other core’s L2 cache has plenty of space available. In the shared model, if the CPU has 2 MB of total L2 cache, for example, one core may be using 1.5 MB and the other 0.5 MB of it, thus decreasing the number of times the CPU needs to grab data directly from RAM and increasing performance.
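The 1.5 MB / 0.5 MB example can be reproduced with a toy simulation. The sketch below is not how Intel’s cache works internally: it assumes a simple LRU (least recently used) replacement policy, models the 2 MB cache as 8 blocks of 256 KB each, and has one core loop over 6 blocks of data while the other loops over 2. All names (`LRUCache`, `total_misses`) are invented for this illustration.

```python
from collections import OrderedDict

class LRUCache:
    """Toy cache that evicts the least recently used block when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.misses = 0

    def access(self, tag):
        if tag in self.blocks:
            self.blocks.move_to_end(tag)         # hit: mark as most recent
        else:
            self.misses += 1                     # miss: fetch from RAM
            self.blocks[tag] = True
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict the oldest block

def total_misses(shared, total_blocks=8, passes=10):
    # Core 0 loops over 6 blocks (1.5 MB), core 1 over 2 blocks (0.5 MB).
    working_sets = {0: 6, 1: 2}
    if shared:
        cache = LRUCache(total_blocks)           # one cache for both cores
        caches = {0: cache, 1: cache}
    else:
        # Each core gets its own private half of the total capacity.
        caches = {core: LRUCache(total_blocks // 2) for core in working_sets}
    for _ in range(passes):
        for core, size in working_sets.items():
            for block in range(size):
                caches[core].access((core, block))
    # Count each distinct cache object once.
    return sum(c.misses for c in {id(c): c for c in caches.values()}.values())

print("separate L2 caches:", total_misses(shared=False))
print("shared L2 cache:   ", total_misses(shared=True))
```

In this model the separate caches keep missing because core 0 needs 6 blocks but owns only 4, while 2 of core 1’s 4 blocks sit idle. With the shared cache all 8 blocks fit, so only the first pass goes to RAM.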
Intel is also promising that the L1 memory caches of each core will be able to communicate directly with each other, and that there will be a higher bandwidth between the CPU core and the L2 memory cache.
Another novelty of this architecture will be a new multimedia instruction set (SSE4?), the fifth multimedia instruction set counting MMX, which was announced back in 1996.
All CPUs will incorporate the 64-bit addressing extension, EM64T.
The differences between Merom (for the mobile market), Conroe (for desktops) and Woodcrest (for servers) will basically be the L2 memory cache size, the TLB (Translation Look-aside Buffer) size and the amount of RAM the CPU can address. The TLB is a small cache inside the CPU used by the virtual memory system: it holds recently used entries of the page table, each giving the physical page number associated with a virtual page number, so the CPU does not have to read the page table from RAM on every memory access.
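To make the role of the TLB more concrete, here is a minimal sketch of the lookup it performs. It is a software model only: it assumes 4 KB pages, an LRU replacement policy and a page table stored as a plain dictionary, none of which is a statement about how Merom, Conroe or Woodcrest implement it. The class name `TLB` and the example page numbers are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KB pages

class TLB:
    """Toy TLB: caches a few virtual-to-physical page translations."""
    def __init__(self, entries, page_table):
        self.entries = entries        # how many translations fit
        self.page_table = page_table  # virtual page -> physical page
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page in self.cache:
            self.hits += 1
            self.cache.move_to_end(page)             # mark as most recent
        else:
            self.misses += 1                         # slow page-table lookup
            self.cache[page] = self.page_table[page]
            if len(self.cache) > self.entries:
                self.cache.popitem(last=False)       # drop the oldest entry
        return self.cache[page] * PAGE_SIZE + offset

# Hypothetical page table: virtual pages 0, 1, 2 map to physical pages 7, 3, 9.
tlb = TLB(entries=2, page_table={0: 7, 1: 3, 2: 9})
first = tlb.translate(0x0010)   # page 0, first use: miss
second = tlb.translate(0x1004)  # page 1, first use: miss
third = tlb.translate(0x0020)   # page 0 again: hit
print(first, second, third, tlb.hits, tlb.misses)
```

A larger TLB keeps more translations at hand, which matters most for programs that touch a lot of memory. That is consistent with TLB size being one of the things that varies between the mobile, desktop and server parts.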