Memory Cache and Fetch Unit
As a reminder, the memory cache is high-speed memory (static RAM, or SRAM) embedded inside the CPU and used to store data that the CPU may need. If the data the CPU requires isn’t in the cache, the CPU must go all the way to the main RAM, which slows it down, because RAM is accessed at the CPU external clock rate. For example, on a 3.2 GHz CPU, the memory cache is accessed at 3.2 GHz but the RAM is accessed at only 800 MHz.
The Core microarchitecture was designed with the multi-core concept in mind, i.e., more than one processing core per package. On the Pentium D, which is the dual-core version of the Pentium 4, each core has its own L2 memory cache. The problem is that at some point one core may run out of cache while the other has unused space in its own L2 memory cache. When this happens, the first core must fetch data from the main RAM, even though the second core’s L2 memory cache has free space that could have held that data and spared the first core the trip to RAM.
The Core microarchitecture solves this problem. The L2 memory cache is shared, meaning that both cores use the same L2 memory cache, and the amount each core takes is adjusted dynamically. On a CPU with 2 MB of L2 cache, one core may be using 1.5 MB while the other uses 512 KB (0.5 MB), in contrast to the fixed 50/50 split used on previous dual-core CPUs.
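The difference between a fixed split and a shared cache can be illustrated with a toy capacity model. The sketch below is only an illustration: the function name `spilled_kb` and the idea of a per-core "working set" are assumptions made for this example, and real hardware assigns cache space line by line through its replacement policy rather than by whole-KB budgets.

```python
def spilled_kb(working_sets_kb, total_cache_kb, shared):
    """Return how many KB of data do not fit in L2 and must live in RAM.

    working_sets_kb: data each core wants cached, in KB (hypothetical input).
    total_cache_kb:  total L2 capacity across the package, in KB.
    shared:          True for one shared L2, False for a fixed equal split.
    """
    if shared:
        # Shared L2: only the combined demand matters.
        return max(0, sum(working_sets_kb) - total_cache_kb)
    # Private L2s: each core is limited to its own equal slice,
    # even if a neighbor's slice is partly empty.
    per_core_kb = total_cache_kb // len(working_sets_kb)
    return sum(max(0, ws - per_core_kb) for ws in working_sets_kb)


# One core wants 1.5 MB, the other 0.5 MB, with 2 MB of L2 in total.
demand = [1536, 512]
print(spilled_kb(demand, 2048, shared=False))  # fixed 1 MB per core
print(spilled_kb(demand, 2048, shared=True))   # one shared 2 MB cache
```

With the fixed split, the first core overflows its 1 MB slice by 512 KB while half of the second core's slice sits idle; with the shared cache, the same demand fits entirely.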
That is not all. Prefetched data is shared between the cores, i.e., if the memory cache system loaded a block of data for the first core, the second core can also use the data already in the cache. On the previous architecture, if the second core needed data located in the cache of the first core, it had to access it through the external bus (which runs at the CPU external clock, far lower than the CPU internal clock) or even fetch the required data directly from the system RAM.
Intel has also improved the CPU prefetch unit, which watches for patterns in the way the CPU is currently fetching data from memory in order to “guess” which data the CPU will try to load next and load it into the memory cache before the CPU requires it. For example, if the CPU has just loaded data from address 1, then asked for data at address 3, and then asked for data at address 5, the prefetch unit will guess that the running program will next load data from address 7 and will load data from that address before the CPU asks for it. This idea isn’t new: all CPUs since the Pentium Pro use some kind of prediction to feed the L2 memory cache. On the Core microarchitecture, Intel has enhanced this feature by making the prefetch unit look for patterns in data fetching instead of relying only on static indicators of what data the CPU would ask for next.
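The address 1, 3, 5 example can be sketched as a simple stride detector. This is a minimal illustration of the general idea, not Intel's actual prefetcher design: the class name `StridePrefetcher` and its `observe` method are hypothetical, and the sketch tracks only a single access stream with whole-number addresses.

```python
class StridePrefetcher:
    """Toy stride detector: predicts the next address once the same
    gap between consecutive accesses has been seen twice in a row."""

    def __init__(self):
        self.last_addr = None    # most recent address accessed
        self.last_stride = None  # gap between the two previous accesses

    def observe(self, addr):
        """Record an access and return the address to prefetch, or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # A repeated, non-zero stride counts as a pattern.
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction


prefetcher = StridePrefetcher()
for address in (1, 3, 5):
    print(address, "->", prefetcher.observe(address))
```

The first two accesses produce no prediction, because one gap is not yet a pattern. After the access to address 5 the stride of 2 has repeated, so the sketch predicts address 7, matching the example in the text.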