Inside AMD K10 Architecture
By Gabriel Torres on September 3, 2007
K10 is the name of the new architecture that upcoming AMD processors will use, such as the forthcoming Phenom and the Opteron based on the much-anticipated “Barcelona” core. Many people mistakenly call the K10 architecture “Barcelona,” but Barcelona is only one of the CPUs that will use this new architecture. In this tutorial we will explain what is new in the K10 architecture and present a complete AMD roadmap showing all K10-based products planned so far.
The new K10 architecture is based on the K8 (a.k.a. AMD64) architecture with some enhancements, so we recommend that you read our Inside AMD64 Architecture tutorial before continuing. By the way, AMD never released an architecture called K9; it jumped from K8 straight to K10.
The slide presented in Figure 1 shows the main enhancements that the K10 microarchitecture brings over K8.
The main points that were enhanced were:
In Figure 2, you can see a list of the new features introduced by the K10 architecture. We will explain each of them on the following pages.
As a reminder, the memory cache is a high-speed memory (static RAM, or SRAM) embedded inside the CPU and used to store data that the CPU may need. If the data required by the CPU isn’t in the cache, the CPU must go all the way to the main RAM, which slows it down, because RAM is accessed at the CPU external clock rate. For example, on a 3 GHz AMD CPU, the memory cache is accessed at 3 GHz, but RAM is accessed at an effective 800 MHz (if you are using DDR2-800 memory) or less.
On the Pentium D and on AMD dual-core CPUs based on the K8 architecture, each CPU core has its own L2 memory cache. On Intel dual-core CPUs based on the Core and Pentium M microarchitectures, there is only one L2 memory cache, which is shared between the two cores.
Intel says that this shared architecture is better because, with separate caches, one core may at some point run out of cache while the other still has unused space in its own L2 memory cache. When this happens, the first core must fetch data from the main RAM, even though there was free space in the second core’s L2 memory cache that could have stored that data and spared the first core the trip to RAM. So on a Core 2 Duo processor with a 4 MB L2 memory cache, one core may be using 3.5 MB while the other uses 512 KB (0.5 MB), in contrast to the fixed 50%-50% division used on other dual-core CPUs.
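To make the 3.5 MB / 0.5 MB example concrete, here is a minimal sketch in Python. It is a toy capacity model only: the function name `overflow_mb` and the idea of handing out cache space in whole megabytes are our assumptions for illustration, while real caches allocate space per cache line through a replacement policy.

```python
def overflow_mb(demands_mb, total_cache_mb, shared):
    """Return, per core, how many MB of its working set do not fit in L2.

    demands_mb: working-set size of each core in MB (hypothetical values).
    total_cache_mb: total L2 capacity summed over all cores.
    shared: True for one shared cache, False for a fixed equal split.
    """
    if shared:
        # Shared cache: all cores draw from a single pool of space.
        remaining = total_cache_mb
        spills = []
        for demand in demands_mb:
            granted = min(demand, remaining)
            remaining -= granted
            spills.append(demand - granted)
        return spills
    # Separate caches: each core is limited to its own fixed slice.
    slice_mb = total_cache_mb / len(demands_mb)
    return [max(0.0, demand - slice_mb) for demand in demands_mb]


demands = [3.5, 0.5]  # the example from the text
print(overflow_mb(demands, 4.0, shared=False))  # [1.5, 0.0]
print(overflow_mb(demands, 4.0, shared=True))   # [0.0, 0.0]
```

With two separate 2 MB caches, the first core has 1.5 MB that must come from RAM; with a single shared 4 MB cache, both working sets fit.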
On the other hand, current quad-core Intel CPUs like the Core 2 Extreme QX and Core 2 Quad use two dual-core chips, meaning that this sharing only occurs between cores 1 and 2 and between cores 3 and 4. In the future, Intel plans to launch quad-core CPUs using a single chip. When this happens, the L2 cache will be shared between the four cores.
In Figure 3, you can see a comparison between these three L2 memory cache solutions.
The K10 architecture adds a shared L3 memory cache inside the CPU, as shown in Figure 4. The size of this cache will depend on the CPU model, just as the size of the L2 cache does.
AMD calls this approach “Balanced Smart Cache.”
By the way, the L1 memory cache remains unchanged: 64 KB for instructions and 64 KB for data per core (in Figure 1 AMD shows “512 KB,” but this is the total figure for a quad-core CPU).
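The “512 KB” figure is easy to check with a few lines of Python. The constant and function names below are ours, chosen only for illustration; the sizes and the core count come from the text above.

```python
L1_INSTRUCTION_KB = 64  # per core, from the text
L1_DATA_KB = 64         # per core, from the text


def total_l1_kb(cores, instr_kb=L1_INSTRUCTION_KB, data_kb=L1_DATA_KB):
    """Total L1 capacity (instructions + data) summed over all cores."""
    return cores * (instr_kb + data_kb)


print(total_l1_kb(4))  # 512 (quad-core)
print(total_l1_kb(1))  # 128 (what each core actually has)
```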
The more data the CPU fetches from RAM per clock cycle, the faster the system will be. As we explained on the previous page, the CPU is a lot faster than RAM, so the fewer times it needs to fetch data from memory, the better. Loading more data at once reduces the number of fetches needed.
Memory modules are 64-bit devices. Instead of launching 128-bit memory modules, CPU and chipset manufacturers came up with the idea of dual-channel memory, a way to access two memory modules simultaneously, as if these two 64-bit modules were a single 128-bit module. This doubles the memory transfer rate, because two 64-bit pieces of data can now be loaded per clock cycle instead of one.
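As a rough illustration of what this doubling means in numbers, the sketch below computes the theoretical peak transfer rate for DDR2-800, the memory type mentioned earlier in this tutorial. The function name is ours, and the result is a theoretical maximum under the assumption of one 64-bit transfer per channel at the effective 800 MHz rate, not a measured figure.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, bus_width_bits, channels=1):
    """Theoretical peak transfer rate in MB/s (1 MB = 1,000,000 bytes)."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * bytes_per_transfer * channels


# DDR2-800: 800 million transfers per second over a 64-bit module.
print(peak_bandwidth_mb_s(800, 64, channels=1))  # 6400  (single channel)
print(peak_bandwidth_mb_s(800, 64, channels=2))  # 12800 (dual channel)
```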
The problem with dual-channel technology is that the second 64-bit piece of data, which is loaded together with the data originally requested, is necessarily the one stored at the following address. For example, if the CPU asks for data A stored at address 1, the memory controller will automatically load data A and data B, which is stored at address 2.
If the CPU has no use for data B, this second load is completely wasted, because the memory controller cannot use the parallel load to read data stored at any address other than the following one.
The memory controller used in the K10 architecture allows the CPU to load data stored at an address other than the next one. This independence will increase CPU performance by not wasting memory loads. Figure 5 illustrates this feature with a CPU that wants to load data A and F. On the K8 architecture, illustrated on the left side, two data fetches are needed (and two of the pieces of data loaded are completely useless), while on the K10 architecture only one data fetch is needed.
Informally the independent architecture used on K10 is called "un-ganged", while the previous implementation that is used nowadays is called "ganged".
AMD calls this feature “AMD Memory Optimizer Technology.”
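The difference between the two modes can be sketched with a small Python model of the A and F example. This is a simplification under our own assumptions, not a description of the real memory controller: data A to F are mapped to addresses 1 to 6, a "ganged" fetch always returns the requested address plus the next one, and an "un-ganged" fetch can return any two addresses at once. The function names are hypothetical.

```python
def ganged_fetches(wanted):
    """Count fetches and wasted loads when each fetch returns
    the requested address plus the following address."""
    pending = sorted(set(wanted))
    loaded = set()
    fetches = 0
    wasted = 0
    for addr in pending:
        if addr in loaded:
            continue  # already came along with a previous fetch
        fetches += 1
        loaded.update({addr, addr + 1})
        if addr + 1 not in pending:
            wasted += 1  # the second 64-bit load was not needed
    return fetches, wasted


def unganged_fetches(wanted, channels=2):
    """Count fetches when each channel can load an independent address."""
    unique = set(wanted)
    return -(-len(unique) // channels)  # ceiling division


A, F = 1, 6  # assumed addresses for data A and data F
print(ganged_fetches([A, F]))    # (2, 2): two fetches, two wasted loads
print(unganged_fetches([A, F]))  # 1: both loaded in a single fetch
```

When the CPU wants two neighboring pieces of data, such as A and B, the ganged model also needs only one fetch, which is why the gain depends on the access pattern.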
By the way, it seems that AMD fixed the “broken divider” problem found on current socket AM2 CPUs. Let’s wait to see if that is really true.
The majority of the new features introduced by the K10 architecture are aimed at saving energy, and thus at making the CPU produce less heat.
Here are these features:
Now let’s talk about the CPUs that will use the new K10 architecture.
You can see the roadmap for K10-based server CPUs in Figures 9 and 10.
As expected, the first CPU launched using the new K10 architecture will be a quad-core Opteron processor based on the “Barcelona” core. In Figure 11, you can see the Opteron “Barcelona” models AMD plans to launch, and below it a table containing all models released so far.
AMD Opteron 2300 Series
AMD Opteron 8300 Series
Here is a quick summary of the cores that will be launched for the server market based on K10 architecture:
Now let’s see the planned desktop models.
You can see the roadmap for K10-based desktop CPUs in Figure 12.
AMD has not announced the model numbers that will be released.
Here is a quick summary of the cores that will be launched for the desktop market based on K10 architecture:
Socket AM2+ and socket 1207+ are sockets AM2 and 1207 (socket F) with added support for the HyperTransport 3.0 and Dual Dynamic Power Management (DDPM) technologies. As we said before, you can install K10-based processors on older socket AM2 or socket F motherboards; however, the CPU won’t have access to the new transfer rates and features provided by HyperTransport 3.0, nor to the separate voltage for the memory controller: both the CPU and the memory controller will be fed with the same voltage.