| Inside AMD64 Architecture | |
| By Gabriel Torres on May 16, 2006 | Page 6 of 9 |
Memory Cache and Fetch Unit On AMD64 architecture the datapath between the L2 memory cache and L1 data cache is 128-bit wide. On Intel’s 7th generation CPUs (Pentium 4) this datapath is of 256 bits and on Intel’s 6th generation CPUs (Pentium Pro, Pentium II, Pentium III and Pentium M) this datapath is of 64 bits.
The L1 instruction cache of AMD64 processors include a pre-decode logic, i.e. each byte store inside the L1 instruction cache has some bits to mark the start and the end of each instruction. Since x86 instructions don’t have a fixed length (they can have anything from 1 byte to 15 bytes*), the process of detecting where each instruction starts and ends is very important to the CPU decoder. * You can find yourself quite lost by this statement, since you were always told that x86 architecture uses 32-bit (i.e. 4 bytes) instructions, so further explanation is necessary in order to clarify this affirmation. Inside the CPU what is considered an instruction is the instruction opcode (the machine language equivalent of the assembly language instruction) plus all its required data. This is because in order to be executed, the instruction must enter the execution engine “completed”, i.e. together with all its required data. Also, the size of each x86 instruction opcode is variable and not fixed at 32 bits, as you may think. For example, an instruction like mov eax, (32-bit data), which stores the (32-bit data) into the CPU’s EAX register is considered internally as a 40-bit length instruction (mov eax translates into a 8-bit opcode plus the 32 bits from its data). Actually, having instruction with several different lengths is what characterizes a CISC (Complex Instruction Set Computing) instruction set. If you want to learn more about this subject, read AMD64 Architecture Programmer’s Manual Vol. 3: General Purpose and System Instructions. The L1 instruction cache provides 76 extra bits to the fetch unit: 52 pre-decode bits, eight parity bits and 16 branch selectors. The branch selector bits are used by the CPU to try to predict branches on the program running. So in reality the L1 instruction cache is bigger than the 64 KB announced, since it stores pre-decode and branching information. In fact, the real size of AMD64 L1 instruction cache is of 102 KB (64 KB instruction cache + 4 KB parity + 26 KB pre-decode data + 8 KB branch data). AMD64 architecture uses a 2,048-entry BTB (Branch Target Buffer), the same size used on AMD’s previous architecture, K7. BTB is a small memory that lists all identified branches on the program. Pentium 4’s BTB is of 4,096 entries while on 6th generation processors from Intel this buffer was of 512 entries. Another branching register, BHT (Branch History Table) – which AMD calls GHBC (Global History Bimodal Counter) – has 16,384 entries on AMD64 architecture, while on Pentium 4 it is of 4,096 entries, the same size of the BHT found on AMD K7 architecture. This two-bit register is used to track down conditional branches: "strongly taken", "taken", "not taken" and "strongly not taken". | |
| Originally at http://www.hardwaresecrets.com/article/324/6 | Pages (9): 1 2 3 4 5 6 7 8 » ... Last » |
© 2004-8, Hardware Secrets, LLC. All Rights Reserved. Total or partial reproduction of the contents of this site, as well as that of the texts available for downloading, be this in the electronic media, in print, or any other form of distribution, is expressly forbidden. Those who do not comply with these copyright laws will be indicted and punished according to the International Copyrights Law. We do not take responsibility for material damage of any kind caused by the use of information contained in Hardware Secrets. | |