AMD CPUs have used a hybrid CISC/RISC architecture since their 5th-generation CPUs (namely the K5). Intel adopted this approach only from its 6th-generation CPUs on. The processor must accept CISC instructions, also known as x86 instructions, since all PC software available today is written using this kind of instruction. A RISC-only CPU couldn’t be created for the PC because it wouldn’t run the software we have available today, like Windows and Office.
So, the solution used by all processors available on the market today from both AMD and Intel is to use a CISC/RISC decoder. Internally the CPU processes RISC-like instructions, but its front-end accepts only CISC x86 instructions.
CISC x86 instructions are referred to as “instructions”, while the internal RISC instructions are referred to as “microinstructions”, “micro-ops”, “µops” or “ROPs”. The AMD64 architecture has a third instruction type, called the macro-op or “MOP”, which is the instruction that results from the instruction decoder. AMD64 deals internally with macro-ops. When a macro-op reaches the appropriate scheduler, it is further decoded into micro-ops, and then these micro-ops are executed. If you pay attention, this is somewhat similar to what Intel is doing on its new Core architecture with the macro-fusion feature. However, while macro-fusion on Core-based processors only works with branch instructions, on AMD64 macro-ops are used for all instructions.
The RISC microinstructions, however, cannot be accessed directly, so we couldn’t create software based on these instructions to bypass the decoder. Also, each CPU uses its own RISC instructions, which are not publicly documented and are incompatible with the microinstructions from other CPUs. That is, AMD64 microinstructions are different from Pentium 4 microinstructions, which are different from the microinstructions of AMD’s K7 architecture.
Depending on its complexity, an x86 instruction may have to be converted into several RISC microinstructions.
On the AMD64 architecture, x86 instructions can be converted into macro-ops in three different ways: using a simple decoder, called DirectPath Single, which translates one common x86 instruction into a single macro-op; using another simple decoder, called DirectPath Double, which translates one x86 instruction into two macro-ops; or using a complex decoder, called VectorPath, which translates one complex x86 instruction into several macro-ops. VectorPath has to call a ROM (called the Microcode Sequencer) to convert the x86 instruction.
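To make the three decoding paths easier to picture, here is a minimal toy model in Python. It is only a sketch: the instruction names (`simple_op`, `two_part_op`, `complex_op`) are placeholders rather than real x86 mnemonics, and the `microcode_len` field is a hypothetical stand-in for whatever the Microcode Sequencer would return for a VectorPath instruction. Only the one/two/several macro-op split comes from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class DecodePath(Enum):
    DIRECTPATH_SINGLE = "DirectPath Single"   # one x86 instruction -> 1 macro-op
    DIRECTPATH_DOUBLE = "DirectPath Double"   # one x86 instruction -> 2 macro-ops
    VECTORPATH = "VectorPath"                 # one x86 instruction -> several macro-ops


@dataclass(frozen=True)
class ToyInstruction:
    name: str               # placeholder label, not a real x86 mnemonic
    path: DecodePath        # which decoder path handles this instruction
    microcode_len: int = 0  # hypothetical macro-op count supplied by the microcode ROM


def decode(instr: ToyInstruction) -> list[str]:
    """Return toy macro-op labels produced for a single instruction."""
    if instr.path is DecodePath.DIRECTPATH_SINGLE:
        count = 1
    elif instr.path is DecodePath.DIRECTPATH_DOUBLE:
        count = 2
    else:
        # VectorPath: in this sketch the count is just a field on the
        # instruction; real hardware looks it up in the Microcode Sequencer.
        count = instr.microcode_len
    return [f"{instr.name}.mop{i}" for i in range(count)]


program = [
    ToyInstruction("simple_op", DecodePath.DIRECTPATH_SINGLE),
    ToyInstruction("two_part_op", DecodePath.DIRECTPATH_DOUBLE),
    ToyInstruction("complex_op", DecodePath.VECTORPATH, microcode_len=4),
]

# Flatten the per-instruction results into one macro-op stream.
macro_ops = [mop for instr in program for mop in decode(instr)]
print(macro_ops)
```

The model deliberately ignores timing, so it says nothing about how many instructions each path can handle per clock cycle.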
Here is how the AMD64 decoder works. In the Pick stage, also known as Scan, the CPU examines and separates the instructions present in its Instruction Byte Buffer, deciding which path each one will use: DirectPath or VectorPath.
Then comes the Decode stage, which is broken into two steps and is where the x86 instructions are actually converted into macro-ops. This stage is the equivalent of the Align stage found on K7 processors. The maximum decoder output rate is six macro-ops per clock cycle: three for DirectPath and three for VectorPath.
The macro-ops then go to the Pack stage (the equivalent of the Decode 1 stage on the K7 architecture), where they are packed together so that three macro-ops are sent to the next stage, pack/decode. This stage does some more decoding and sends the macro-ops to the Instruction Control Unit, which is the name AMD gives to what Intel calls the Reorder Buffer (ROB).
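The packing step can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a description of the real hardware: the `pack` function and the `mopN` labels are hypothetical, and any restrictions the actual Pack stage may place on which macro-ops can travel together are not modeled. The only detail taken from the text is that macro-ops move on in groups of three.

```python
def pack(macro_ops, width=3):
    """Group a flat list of macro-ops into bundles of at most `width` entries."""
    return [macro_ops[i:i + width] for i in range(0, len(macro_ops), width)]


# Seven toy macro-ops coming out of the Decode stage.
stream = [f"mop{i}" for i in range(7)]

# Each bundle stands for the macro-ops handed to the next stage together.
bundles = pack(stream)
for step, bundle in enumerate(bundles):
    print(step, bundle)
```

With seven macro-ops in the stream, the last bundle is only partially filled, which is the kind of case where a real decoder would lose some of its peak throughput.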

Figure 13: AMD64 Decoder Unit.