Here is how the memory cache works. The CPU fetch unit looks for the next instruction to be executed in the L1 instruction cache. If it isn’t there, the unit looks for it in the L2 cache. If it isn’t there either, the CPU has to go to RAM to fetch the instruction.
We call it a “hit” when the CPU loads a required instruction or piece of data from the cache, and a “miss” when the required instruction or data isn’t there and the CPU has to access the system RAM directly.
Of course, when you turn your PC on, the caches are empty, so accessing RAM is required – this is an inevitable cache miss. But after the first instruction is loaded, the show begins.
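The lookup order and the hit/miss terminology can be illustrated with a minimal sketch. Everything here is hypothetical: the `fetch` function, the dictionaries standing in for the L1 cache, the L2 cache and RAM, and the sample addresses are assumptions made for illustration, not a model of any real CPU.

```python
def fetch(address, l1, l2, ram):
    """Return (value, source), searching L1 first, then L2, then RAM."""
    if address in l1:
        return l1[address], "L1"    # hit in the L1 cache
    if address in l2:
        return l2[address], "L2"    # L1 miss, hit in the L2 cache
    return ram[address], "RAM"      # miss in both caches: go to RAM

# Toy memory contents (hypothetical addresses and instructions).
ram = {0x100: "mov", 0x104: "add", 0x108: "jmp"}
l2 = {0x104: "add"}
l1 = {0x100: "mov"}

sources = [fetch(a, l1, l2, ram)[1] for a in (0x100, 0x104, 0x108)]
print(sources)  # ['L1', 'L2', 'RAM']
```

Only the last request goes all the way to RAM, which is what the text calls a miss.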
When the CPU loads an instruction from a certain memory position, a circuit called the memory cache controller loads into the memory cache a small block of data that follows the position the CPU has just loaded. Since programs usually flow sequentially, the next memory position the CPU requests will probably be the one immediately after the position it has just loaded. Because the memory cache controller has already loaded some data following the first memory position read by the CPU, the next data will probably be inside the memory cache, so the CPU doesn’t need to go outside to grab it: it is already loaded in the memory cache embedded in the CPU, which the CPU can access at its internal clock rate.
This block of data is called a line, and it is usually 64 bytes long (more on that on the next page).
Besides loading this small amount of data, the memory cache controller is always trying to guess what the CPU will ask for next. A circuit called the prefetcher, for example, loads more data located after these first 64 bytes from RAM into the memory cache. If the program continues to load instructions and data from sequential memory positions, the instructions and data that the CPU asks for next will already be loaded in the memory cache.
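To make the idea of a line concrete, here is a small sketch of the arithmetic involved. It assumes 64-byte lines and, beyond the simplified description in the text, that lines are aligned to multiples of their size, as is typical in real hardware. The helper names `line_base` and `same_line` are made up for this example.

```python
LINE_SIZE = 64  # bytes per cache line, the typical value quoted in the text

def line_base(address, line_size=LINE_SIZE):
    """First address of the aligned line that contains `address`."""
    return address - (address % line_size)

def same_line(a, b, line_size=LINE_SIZE):
    """True if both addresses fall inside the same cache line."""
    return line_base(a, line_size) == line_base(b, line_size)

print(line_base(130))       # 128
print(same_line(130, 191))  # True
print(same_line(130, 192))  # False
```

Under these assumptions, once address 130 has been loaded, every address from 128 to 191 is already in the cache, and address 192 belongs to the next line.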
So we can summarize how the memory cache works as:
1. The CPU asks for instruction/data stored in address “a.”
2. Since the contents of address “a” aren’t inside the memory cache, the CPU has to fetch them directly from RAM.
3. The cache controller loads a line (typically 64 bytes) starting at address “a” into the memory cache. This is more data than the CPU requested, so if the program continues to run sequentially (i.e., asks for address a+1), the next instruction/data the CPU asks for will already be loaded in the memory cache.
4. A circuit called the prefetcher loads more data located after this line, i.e., it starts loading the contents from address a+64 onward into the cache. To give you a real example, Pentium 4 CPUs have a 256-byte prefetcher, which loads the next 256 bytes after the line already loaded into the cache.
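The four steps above can be sketched as a toy simulation. This is only an illustration under loud assumptions: the `ToyCache` class is hypothetical, the cache never runs out of space, lines are tracked by number (address divided by 64), and prefetching happens only when a miss occurs. The 64-byte line and the 256-byte prefetch are the figures given in the text.

```python
LINE_SIZE = 64        # bytes per line (typical value from the text)
PREFETCH_BYTES = 256  # Pentium 4 example from the text

class ToyCache:
    """Hypothetical cache model: unlimited capacity, no eviction."""

    def __init__(self):
        self.lines = set()  # numbers of the lines currently cached
        self.hits = 0
        self.misses = 0

    def access(self, address):
        line = address // LINE_SIZE  # step 1: the CPU asks for an address
        if line in self.lines:
            self.hits += 1
            return "hit"
        # Steps 2 and 3: a miss, so go to RAM and fill the whole line.
        self.misses += 1
        self.lines.add(line)
        # Step 4: the prefetcher loads the lines that follow.
        for extra in range(1, PREFETCH_BYTES // LINE_SIZE + 1):
            self.lines.add(line + extra)
        return "miss"

cache = ToyCache()
outcomes = [cache.access(a) for a in range(0, 320)]  # sequential run
print(cache.misses, cache.hits)  # 1 319
```

In this run the first access misses and brings in 64 + 256 = 320 bytes, so the remaining 319 sequential accesses are all hits.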
If programs always ran sequentially, the CPU would never need to fetch data directly from RAM – except to load the very first instruction – as the instructions and data required by the CPU would always be inside the memory cache before the CPU asked for them.
However, programs do not run like this: from time to time they jump from one memory position to another. The main challenge for the cache controller is guessing which address the CPU will jump to and loading the contents of that address into the memory cache before the CPU asks for them, so that the CPU does not have to go to RAM, which slows the system down. This task is called branch prediction, and all modern CPUs have this feature.
Modern CPUs have a hit rate of at least 80%, meaning that at least 80% of the CPU’s memory requests are served by the memory cache rather than by RAM directly.
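To see why jumps hurt, the sketch below compares the hit rate of a purely sequential access pattern with one that jumps to random addresses. It is a rough model under stated assumptions: the `hit_rate` function is hypothetical, the cache has unlimited capacity, each miss loads one 64-byte line plus the four lines that follow it, and there is no branch prediction at all.

```python
import random

LINE_SIZE = 64      # bytes per line (typical value from the text)
PREFETCH_LINES = 4  # 256 bytes of prefetch, expressed in lines

def hit_rate(addresses):
    """Fraction of accesses found in a toy cache with no eviction."""
    cached = set()
    hits = 0
    for address in addresses:
        line = address // LINE_SIZE
        if line in cached:
            hits += 1
        else:
            # Miss: load this line and the lines the prefetcher adds.
            cached.update(range(line, line + PREFETCH_LINES + 1))
    return hits / len(addresses)

sequential = list(range(0, 4096))
rng = random.Random(0)  # fixed seed so the run is repeatable
jumpy = [rng.randrange(0, 1 << 30) for _ in range(4096)]

print(f"sequential: {hit_rate(sequential):.3f}")
print(f"jumping:    {hit_rate(jumpy):.3f}")
```

The sequential pattern misses only once every 320 bytes, while the random jumps almost never land on data that is already cached, which is the situation branch prediction tries to avoid.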