NVIDIA Tesla Technology
By Gabriel Torres on November 14, 2007
With the processing power of GPUs (i.e., the graphics chips located on video cards) increasing every day, to the point that they are now more powerful than regular CPUs for math calculations, people have been discussing for quite some time whether GPUs could be used like a CPU to run regular programs. The idea, known as GPGPU (General-Purpose Computation on GPUs), is to hand off to the GPU calculations that would otherwise be done by the CPU in order to increase performance.
The problem is how to do this: a programmer would have to know how to program for a specific GPU in order to write a program that could use the system GPU, and this program wouldn’t work with a different GPU.
To solve this issue NVIDIA launched CUDA, a free C compiler for its GeForce 8800 series. With CUDA, programmers can compile programs written in C to use the power of the system GPU.
Going one step further, NVIDIA launched a series of “video cards” called Tesla. These “video cards” feature GeForce 8800 GPUs but they do not produce video: they are meant to be used like CPUs, running programs. In this article we will tell you everything you need to know about Tesla, including a lot of pictures of Tesla solutions.
These programs must be compiled with CUDA, of course. So regular users won’t benefit from this technology, i.e., don’t think that by installing one of these cards in your PC your processing performance will automatically increase.
Any kind of heavy-calculation program that does a lot of things in parallel can benefit from the use of GPGPU, as long as it is compiled to use the GPU, of course. This includes mostly simulations (physics, financial, medical, biological and chemical, for example).
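To make “a lot of things in parallel” more concrete, here is a minimal sketch in plain C (not CUDA code) of the kind of loop we are talking about. The function name saxpy and its parameters are our own illustrative choices, not something taken from NVIDIA. The point is only that each element is calculated independently of the others; under CUDA, the body of a loop like this would typically become a function run by many GPU threads, one element per thread.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for every element.
 * No iteration reads a value written by another iteration, so the
 * iterations could run in any order, or all at the same time.
 * The arrays x and y are assumed not to overlap. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

A loop where each step depends on the result of the previous one (a running total, for example) does not fit this pattern as directly.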
One very interesting thing about CUDA is that you don’t need to have a Tesla card installed to use it. So a programmer can buy any video card from the GeForce 8800 series and try it out to see if using the GPU instead of the CPU will in fact improve the performance of the application that is being written. If it works out fine, then the programmer can think of buying a more powerful system, namely a Tesla solution.
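One simple way to run this kind of “try it out” comparison is to time the same job with and without the GPU and divide one time by the other. The sketch below is only an assumption about how such a test could be organized, written in standard C11; the helper names wall_seconds, time_run and speedup are hypothetical and are not part of CUDA.

```c
#include <time.h>

/* Current wall-clock time in seconds (C11 timespec_get). */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs work() once and returns how long it took, in seconds.
 * If work() uses the GPU, it must wait for the GPU to finish
 * before returning, otherwise the time measured is too short. */
double time_run(void (*work)(void))
{
    double start = wall_seconds();
    work();
    return wall_seconds() - start;
}

/* How many times faster the candidate is than the baseline.
 * candidate_seconds must be greater than zero. */
double speedup(double baseline_seconds, double candidate_seconds)
{
    return baseline_seconds / candidate_seconds;
}
```

With made-up numbers, a job that takes 90 seconds on the CPU alone and 10 seconds with the GPU gives speedup(90.0, 10.0), i.e., a 9x gain.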
So far NVIDIA has launched three Tesla products. The first is a basic card called C870, which is a GeForce 8800 video card without a video output. The “C” in its name stands for “card.” This card has 1.5 GB of memory and a math processing performance of 500 GFLOPS (billions of floating-point operations per second). Because it uses a standard PCI Express x16 connector, this card can be installed in any desktop computer with a free PCI Express x16 slot.
This basic card is the building block for the other two Tesla products available: D870 and S870.
D870, where the “D” in its name stands for “Desktop,” is a small external case containing two C870 cards, so the processing power of this solution is 1 TFLOPS (trillions of floating-point operations per second). This case is connected to the main PC through a cable, which is basically an extension of the PCI Express bus.
Then we have the highest-end model, Tesla S870, which holds four C870 cards inside. We will talk about this product on the next page.
Tesla S870, where the “S” stands for “Server,” is a 1U rack-mount server case containing four C870 cards, so this product is meant to be connected to servers. In Figures 4 and 5 you can see a Tesla S870 opened.
Tesla S870 is connected to a server through a cable, which is basically a PCI Express extension. In Figure 6, you can see the connector available on S870 for its connection to a server, and in Figures 7 and 8 you can see the PCI Express card that is installed on the server.
In Figure 9, you can see the cable used to connect Tesla S870 to a server.
During SC-07 (Supercomputing Show 2007, where we took these pictures, by the way) NVIDIA displayed a cluster containing four Tesla S870 systems attached to eight servers for a total of 16 Tesla C870 cards or around 8 TFLOPS of available performance.
This system was running a biological simulation comparing the performance of the server cluster with and without the Tesla systems. As you can see in Figure 12, the cluster got a 9x performance gain with the use of the four S870 systems.
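The TFLOPS figures quoted throughout this article all come from multiplying the number of C870 cards by the 500 GFLOPS of a single card. Here is a small sketch of that arithmetic in C. The names are our own, and keep in mind that this is a theoretical peak obtained by simple addition; real programs will typically get less.

```c
/* Figures taken from this article. */
#define C870_PEAK_GFLOPS 500u  /* peak quoted for one C870 card */
#define CARDS_PER_D870   2u    /* D870 holds two C870 cards     */
#define CARDS_PER_S870   4u    /* S870 holds four C870 cards    */

/* Theoretical peak, in GFLOPS, for a given number of C870 cards. */
unsigned peak_gflops(unsigned cards)
{
    return cards * C870_PEAK_GFLOPS;
}
```

So peak_gflops(CARDS_PER_D870) gives 1,000 GFLOPS (1 TFLOPS) for D870, peak_gflops(CARDS_PER_S870) gives 2,000 GFLOPS for a single S870, and peak_gflops(4 * CARDS_PER_S870) gives 8,000 GFLOPS (8 TFLOPS) for the 16 cards in the SC-07 cluster.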