The Core microarchitecture introduced a new concept: macro-fusion. Macro-fusion is the ability to join two x86 instructions into a single micro-op. This improves CPU performance and lowers power consumption, since the CPU executes only one micro-op instead of two.
This scheme, however, is limited to a compare instruction (CMP or TEST) followed by a conditional branch (Jcc). For example, consider this piece of a program:
…
mov eax, [mem1]
cmp eax, [mem2]
jne target
…
This loads the 32-bit register EAX with the data stored at memory position 1 and compares its value with the data stored at memory position 2. If the values are different (jne = jump if not equal), the program jumps to the address “target”; if they are equal, execution continues with the next instruction.
With macro-fusion, the comparison (cmp) and branch (jne) instructions are merged into a single micro-op. So after passing through the instruction decoder, this part of the program will look something like this:
…
mov eax, [mem1]
cmp eax, [mem2] + jne target
…
As we can see, the CPU now has one less micro-op to execute. The fewer micro-ops there are to execute, the sooner the CPU finishes the task and the less power it consumes.
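The pairing rule described above can be sketched as a toy model in Python. This is only an illustration, not how the hardware works: the function name `macro_fuse` and the string representation of instructions are assumptions made for this example, every instruction is treated as a single micro-op (as in the simplified listing above), and any mnemonic starting with "j" other than "jmp" is treated as a Jcc.

```python
# Toy model of macro-fusion. All names here are hypothetical and exist
# only for this illustration; real decoders apply further restrictions.
FUSIBLE_FIRST = {"cmp", "test"}


def is_conditional_jump(mnemonic):
    # Simplification: any "j..." mnemonic except the unconditional "jmp".
    return mnemonic.startswith("j") and mnemonic != "jmp"


def macro_fuse(instructions):
    """Return a list of micro-ops, merging each CMP/TEST + Jcc pair into one."""
    micro_ops = []
    i = 0
    while i < len(instructions):
        mnemonic = instructions[i].split()[0].lower()
        if mnemonic in FUSIBLE_FIRST and i + 1 < len(instructions):
            next_mnemonic = instructions[i + 1].split()[0].lower()
            if is_conditional_jump(next_mnemonic):
                # The compare and the branch leave the decoder as one micro-op.
                micro_ops.append(instructions[i] + " + " + instructions[i + 1])
                i += 2
                continue
        # Anything else passes through unchanged (one micro-op each in this model).
        micro_ops.append(instructions[i])
        i += 1
    return micro_ops


program = ["mov eax, [mem1]", "cmp eax, [mem2]", "jne target"]
fused = macro_fuse(program)
for op in fused:
    print(op)
```

Running it on the three-instruction example yields two entries, with the compare and the branch merged into one.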
The instruction decoder in the Core microarchitecture can decode four instructions per clock cycle, while previous CPUs such as the Pentium M and Pentium 4 can decode only three.
Because of macro-fusion, the Core microarchitecture instruction decoder pulls five instructions at a time from the instruction queue, even though it can decode only four instructions per clock cycle. This way, if two of these five instructions are fused into one, the decoder can still deliver four micro-ops per clock cycle. Otherwise it would be partially idle whenever a macro-fusion took place, i.e. it would deliver only three micro-ops on its output while being capable of delivering up to four.
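The arithmetic behind pulling five instructions can be checked with a small Python sketch. It is a rough model under stated assumptions: instructions are reduced to bare mnemonics, each one counts as a single micro-op, at most one fusion happens per cycle (a modeling assumption here), and the names `count_micro_ops` and `delivered_per_cycle` are hypothetical.

```python
# Toy throughput model; all names and limits are assumptions of this sketch.
FUSIBLE_FIRST = {"cmp", "test"}


def count_micro_ops(instructions, max_fusions=1):
    """Micro-ops produced after fusing up to max_fusions CMP/TEST + Jcc pairs."""
    count = 0
    fusions = 0
    i = 0
    while i < len(instructions):
        first = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else ""
        is_pair = first in FUSIBLE_FIRST and nxt.startswith("j") and nxt != "jmp"
        if is_pair and fusions < max_fusions:
            fusions += 1
            i += 2  # two instructions consumed, one micro-op produced
        else:
            i += 1
        count += 1
    return count


def delivered_per_cycle(queue, pull_width, decode_width=4):
    """Micro-ops delivered in one cycle when pull_width instructions are pulled."""
    pulled = queue[:pull_width]
    return min(count_micro_ops(pulled), decode_width)


queue = ["mov", "cmp", "jne", "add", "sub"]
four_wide = delivered_per_cycle(queue, pull_width=4)  # mov, cmp+jne, add
five_wide = delivered_per_cycle(queue, pull_width=5)  # mov, cmp+jne, add, sub
print(four_wide, five_wide)
```

With only four instructions pulled, the fused pair leaves one decoder slot unused (three micro-ops); pulling five fills all four slots, which is the situation the paragraph above describes.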
In Figure 1 you can see a brief summary of what we explained on this page and on the previous one.

Figure 1: Fetch unit and instruction decoder on Core microarchitecture.