Penryn Core New Features
By Gabriel Torres on March 29, 2007 Page 4 of 5

FPU Enhancements

The new Penryn core brings two enhancements to the CPU floating-point unit (FPU), one for its divider engine and another for its shuffle engine.

Fast Radix-16 Divider

This enhancement changes the way the CPU floating-point unit (FPU) handles division operations. On Core 2 CPUs, the divider processes two bits per clock cycle. The new divider circuit implemented on Penryn processes four bits per clock cycle (hence “radix-16”, since four bits encode 16 values), meaning it can be up to two times faster than Core 2 CPUs on division operations.
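To get a feel for why processing more bits per clock cycle shortens a division, here is a minimal Python sketch of a digit-by-digit divider. It is only an illustration and not Intel's circuit: the function name `digit_recurrence_divide` and its parameters are our own, it works on unsigned integers only, and it counts loop iterations rather than the real latency of either CPU.

```python
def digit_recurrence_divide(dividend, divisor, width, bits_per_step):
    """Divide two unsigned integers, producing `bits_per_step` quotient bits
    per iteration. Returns (quotient, remainder, iterations).

    Hypothetical model: bits_per_step=2 mimics a radix-4 divider,
    bits_per_step=4 mimics a radix-16 divider.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if width % bits_per_step != 0:
        raise ValueError("width must be a multiple of bits_per_step")
    if dividend < 0 or divisor < 0 or dividend >= (1 << width):
        raise ValueError("operands must be unsigned and fit in `width` bits")

    mask = (1 << bits_per_step) - 1
    quotient = 0
    remainder = 0
    iterations = 0

    # Walk the dividend from its most significant group of bits to the least.
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Bring down the next group of dividend bits.
        group = (dividend >> shift) & mask
        remainder = (remainder << bits_per_step) | group

        # Select the next quotient digit (0 .. radix-1) by repeated
        # subtraction. Real hardware picks the digit in a single cycle;
        # this loop is only a software stand-in for that selection.
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1

        quotient = (quotient << bits_per_step) | digit
        iterations += 1

    return quotient, remainder, iterations


if __name__ == "__main__":
    for bits in (2, 4):
        q, r, n = digit_recurrence_divide(1000, 7, 32, bits)
        print(f"{bits} bits per step: quotient={q} remainder={r} iterations={n}")
```

With a 32-bit dividend, the two-bit version needs 16 iterations and the four-bit version needs 8, which is where the "two times faster" figure comes from. Real instructions also carry fixed overhead that this sketch ignores.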

In Figure 7 you can see a comparison between the FPU of the Core 2 Duo CPU and the FPU of the new Penryn core. The “y” axis represents clock cycles, so the lower the bar, the better (less time is spent processing an instruction). The “x” axis shows the division instructions selected for this comparison.

Here is a small glossary for understanding Figure 7 if you are not familiar with CPU instructions:

  • int = Integer
  • SP = Single Precision (32-bit numbers)
  • DP = Double Precision (64-bit numbers)
  • EP = Double Extended Precision (80-bit numbers)

Figure 7: Performance comparison of the new divider engine used on Penryn Core.

Super Shuffle Engine

This enhancement changes the way the CPU floating-point unit (FPU) handles the shuffle operations used by SSE data formatting instructions, allowing Penryn-based CPUs to execute some instructions in fewer clock cycles than the core currently used by Core 2 Duo processors (Merom).
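If you have never seen a shuffle operation, the following Python sketch models what one of them does to the data. We picked PSHUFD, an SSE2 instruction that rearranges the four 32-bit lanes of a 128-bit register, purely as an example; we are not claiming it is one of the instructions measured by Intel. The function name `pshufd_model` is our own, and the sketch models only the result of the instruction, not its timing.

```python
def pshufd_model(src, imm8):
    """Model the data movement of PSHUFD on four 32-bit lanes.

    `src` is a list of four values, with index 0 as the lowest lane.
    `imm8` is the 8-bit control byte: bits 1:0 select the source lane
    for result lane 0, bits 3:2 for lane 1, and so on.
    """
    if len(src) != 4:
        raise ValueError("src must contain exactly four lanes")
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in 8 bits")

    # Each result lane reads its own 2-bit selector from the control byte.
    return [src[(imm8 >> (2 * lane)) & 0b11] for lane in range(4)]


if __name__ == "__main__":
    lanes = [10, 20, 30, 40]
    print(pshufd_model(lanes, 0x1B))  # reverse the lane order
    print(pshufd_model(lanes, 0x00))  # copy lane 0 into every lane
    print(pshufd_model(lanes, 0xE4))  # leave the lanes unchanged
```

The control byte 0x1B reverses the four lanes, 0x00 broadcasts the lowest lane, and 0xE4 leaves the register unchanged. Multimedia code uses this kind of rearrangement constantly to line data up before doing arithmetic on it.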

In Figure 8 you can see a comparison between the number of clock cycles these two cores take to perform each of these instructions. The smaller the bar, the better: fewer clock cycles means less time spent, and thus higher speed.

As you can see, several 128-bit SSE instructions that took more than one clock cycle to be processed are now processed in just one clock cycle, improving SSE performance. SSE (Streaming SIMD Extensions) is an instruction set used mainly by multimedia applications, so programs that use these instructions are the ones that benefit.

Figure 8: Performance comparison of the new shuffle engine used on Penryn Core.


Originally at http://www.hardwaresecrets.com/article/434/4

© 2004-8, Hardware Secrets, LLC. All Rights Reserved.
