Memory cache is a high-performance kind of memory, also called static memory. The kind of memory used for the computer's main RAM is called dynamic memory. Static memory consumes more power, is more expensive and is physically bigger than dynamic memory, but it is a lot faster. It can work at the same clock as the CPU, which dynamic memory is not capable of.
Since going to the “external world” to fetch data makes the CPU work at a lower clock rate, the memory cache technique is used. When the CPU loads data from a certain memory position, a circuit called the memory cache controller (not drawn in Figure 6 for the sake of simplicity) loads into the memory cache a whole block of data following the position that the CPU has just loaded. Since programs usually flow sequentially, the next memory position the CPU requests will probably be the one immediately after the position it has just loaded. Because the memory cache controller has already loaded a lot of data following the first memory position read by the CPU, the next data will be inside the memory cache, so the CPU doesn’t need to go outside to grab it: it is already loaded in the memory cache embedded in the CPU, which the CPU can access at its internal clock rate.
The cache controller is always observing the memory positions being loaded and loading data from several memory positions after the one that has just been read. To give you a real example, if the CPU loaded data stored at address 1,000, the cache controller will load data from “n” addresses after address 1,000. This number “n” is called the page; if a given processor is working with 4 KB pages (which is a typical value), it will load data from the 4,096 addresses after the memory position being loaded (address 1,000 in our example). By the way, 1 KB equals 1,024 bytes, which is why 4 KB is 4,096 and not 4,000. In Figure 7 we illustrate this example.
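The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the simplified model described above, not of how any particular cache controller is built; the function name `prefetch_range` and its parameters are hypothetical names chosen for illustration.

```python
KB = 1024  # 1 KB equals 1,024 bytes


def prefetch_range(address, page_size=4 * KB):
    """Return the first and last address loaded after `address`.

    Hypothetical helper: it models the text's simplified description,
    where the controller loads `page_size` addresses following the one
    the CPU has just read.
    """
    first = address + 1
    last = address + page_size
    return first, last


first, last = prefetch_range(1000)
print(4 * KB)       # 4096
print(first, last)  # 1001 5096
```

Under this model, a read at address 1,000 leaves addresses 1,001 through 5,096 in the cache, so the next sequential reads do not need to go to RAM.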
The bigger the memory cache, the higher the chances that the data required by the CPU is already there, so the CPU needs to access RAM directly less often, which increases system performance (just remember that every time the CPU needs to access RAM directly it has to lower its clock rate for this operation).
We call it a “hit” when the CPU loads required data from the cache, and a “miss” when the required data isn’t there and the CPU needs to access the system RAM.
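To make hits, misses and the effect of cache size concrete, here is a toy simulation. It rests on several assumptions that are not in the text: memory is divided into aligned 4,096-byte blocks, a miss loads the whole block containing the address, and the least recently used block is discarded when the cache is full. The function name `simulate` is hypothetical.

```python
from collections import OrderedDict


def simulate(addresses, cache_blocks, block_size=4096):
    """Count (hits, misses) for a toy cache holding `cache_blocks` blocks.

    Assumptions for illustration only: aligned blocks, and least
    recently used (LRU) eviction when the cache is full.
    """
    cache = OrderedDict()  # block number -> None, ordered by recency
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark block as most recently used
        else:
            misses += 1
            cache[block] = None  # load the whole block on a miss
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits, misses


# Sequential reads: only the first access to each block is a miss.
print(simulate(range(1000, 11000), cache_blocks=1))  # (9997, 3)

# A program alternating between two distant addresses, 100 times each.
pattern = [0, 8192] * 100
print(simulate(pattern, cache_blocks=1))  # (0, 200)
print(simulate(pattern, cache_blocks=2))  # (198, 2)
```

The last two lines show why a bigger cache helps: with room for only one block, the two addresses keep evicting each other and every access is a miss, while with room for two blocks only the first access to each one misses.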
L1 and L2 mean “Level 1” and “Level 2”, respectively, and refer to how far each cache is from the CPU core (execution unit). A common question is why there are three separate cache memories (L1 data cache, L1 instruction cache and L2 cache). Pay attention to Figure 6 and you will see that the L1 instruction cache works as an “input cache”, while the L1 data cache works as an “output cache”. The L1 instruction cache, which is usually smaller than the L2 cache, is particularly efficient when the program starts to repeat a small part of itself (a loop), because the required instructions will be closer to the fetch unit.
On the specs page of a CPU, the L1 cache can be listed in different ways. Some manufacturers list the two L1 caches separately (sometimes labeling the instruction cache “I” and the data cache “D”); some add the two amounts and write “separated”, so “128 KB, separated” would mean a 64 KB instruction cache and a 64 KB data cache; and some simply add the two, leaving you to guess that the amount is a total and that you should divide it by two to get the capacity of each cache. The exception is the Pentium 4 and the newer Celeron CPUs based on sockets 478 and 775.
Pentium 4 processors (and Celeron processors using sockets 478 and 775) don’t have an L1 instruction cache; instead, they have a trace execution cache, which is a cache located between the decode unit and the execution unit. So the L1 instruction cache is there, but with a different name and in a different location. We mention this because it is a very common mistake to think that Pentium 4 processors don’t have an L1 instruction cache. When comparing the Pentium 4 to other CPUs, people then conclude that its L1 cache is much smaller, because they count only the 8 KB L1 data cache. The trace execution cache of Pentium 4 and Celeron CPUs is 150 KB and should of course be taken into account.
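The spec-sheet arithmetic above is simple enough to write down. The sketch below assumes a combined L1 figure is split evenly between instructions and data, as in the “128 KB, separated” example; the function and constant names are hypothetical, and the Pentium 4 values are the ones quoted in the text.

```python
def split_combined_l1(total_kb):
    """Split a combined L1 figure into (instruction_kb, data_kb).

    Assumes an even split, as in the "128 KB, separated" example.
    """
    half = total_kb // 2
    return half, half


instruction_kb, data_kb = split_combined_l1(128)
print(instruction_kb, data_kb)  # 64 64

# Pentium 4 figures quoted in the text, in KB.
P4_L1_DATA_KB = 8
P4_TRACE_CACHE_KB = 150

# Counting the trace execution cache alongside the L1 data cache.
print(P4_L1_DATA_KB + P4_TRACE_CACHE_KB)  # 158
```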