NVIDIA CUDA: Kepler Vs. Fermi Architecture

The era of next-generation graphics is finally upon us. If you’ve been hankering after a graphics card upgrade lately and wanted to see NVIDIA’s reply to AMD’s Radeon HD 7970, wait no longer. The green squad has taken the wraps off its GeForce GTX 680, a new graphics card aimed at consumers and enthusiasts and based on its latest Kepler architecture. NVIDIA claims that the GTX 680 is the fastest and most power-efficient GPU ever made, offering significant performance gains over its rivals.

The GeForce GTX 580 was the company’s last big launch. Built on the ground-breaking Fermi architecture, it rewrote the record books at the time as the fastest single-GPU graphics card. The latest release, the GTX 680, is based on the Kepler architecture, named after the German mathematician and astronomer Johannes Kepler. In this article we’ll look at the improvements Kepler offers over the Fermi architecture.

Core Improvements in Kepler:

  • More CUDA cores, up from 512 to 1536
  • GPU Boost for automatic overclocking
  • Refined instruction pipeline
  • Adaptive Vsync
  • Lower power consumption

Kepler Vs. Fermi Architecture

NVIDIA’s Kepler architecture is built on the foundation of the Fermi GPU architecture, first introduced in 2010. Fermi brought a completely new parallel geometry pipeline optimized for tessellation and displacement mapping. According to NVIDIA, Kepler continues to provide the best tessellation performance and combines this with new features specifically designed to deliver a smoother, richer and faster gaming experience.

The big jump in the architecture is the number of CUDA cores per SM, which has been increased from 32 to 192. The GTX 680 has 8 streaming multiprocessors, for a total of 1536 CUDA cores, in contrast to the 512 cores of the GTX 580. NVIDIA has managed to squeeze so many cores onto one die largely because Kepler is the firm’s first chip produced on the smaller 28nm process (Fermi was built on 40nm). Despite tripling the number of cores, the physical die is a little over half the size of the GTX 580’s (roughly 294 mm² versus 520 mm²) and has just 500 million more transistors (3.5 billion compared to 3 billion).
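The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers quoted in this article; the constant and function names are made up for illustration and are not part of any NVIDIA tool or API.

```python
# Figures quoted in the article; all names here are illustrative only.
KEPLER_CORES_PER_SM = 192            # CUDA cores per SM on the GTX 680
KEPLER_SM_COUNT = 8                  # streaming multiprocessors on the GTX 680
FERMI_TOTAL_CORES = 512              # CUDA cores on the GTX 580
KEPLER_TRANSISTORS = 3_500_000_000   # approximate, as quoted
FERMI_TRANSISTORS = 3_000_000_000    # approximate, as quoted


def total_cores(cores_per_sm: int, sm_count: int) -> int:
    """Total CUDA cores is simply cores per SM times the number of SMs."""
    return cores_per_sm * sm_count


kepler_cores = total_cores(KEPLER_CORES_PER_SM, KEPLER_SM_COUNT)
core_ratio = kepler_cores / FERMI_TOTAL_CORES
extra_transistors = KEPLER_TRANSISTORS - FERMI_TRANSISTORS
transistor_ratio = KEPLER_TRANSISTORS / FERMI_TRANSISTORS

print(f"GTX 680 CUDA cores:        {kepler_cores}")
print(f"Core count vs. GTX 580:    {core_ratio:.1f}x")
print(f"Extra transistors:         {extra_transistors:,}")
print(f"Transistor count vs. 580:  {transistor_ratio:.2f}x")
```

The point of the comparison is that the core count triples while the transistor budget grows by only about one sixth, which suggests that each Kepler core is a much simpler unit than a Fermi core.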

But that’s not the whole story. Each SM in Kepler (now known as SMX) also contains NVIDIA’s new PolyMorph Engine 2.0 along with 16 texture units. Across the 8 SMX units, this gives a total of 128 texture units.
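The same per-SMX bookkeeping gives the texture unit total. The sketch below again relies only on figures from this article, with variable names chosen purely for illustration.

```python
# Per-SMX figures quoted in the article; names are illustrative only.
SMX_COUNT = 8
TEXTURE_UNITS_PER_SMX = 16
CUDA_CORES_PER_SMX = 192

# Chip-wide total: every SMX contributes the same number of texture units.
total_texture_units = SMX_COUNT * TEXTURE_UNITS_PER_SMX

# How many CUDA cores share each texture unit inside one SMX.
cores_per_texture_unit = CUDA_CORES_PER_SMX // TEXTURE_UNITS_PER_SMX

print(f"Total texture units:    {total_texture_units}")
print(f"Cores per texture unit: {cores_per_texture_unit}")
```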

Compute Capability and Software Model

The device compute capability has changed as well: the GTX 680 reports compute capability 3.0, up from 2.0 on the GTX 580. An interesting feature for developers is the new bindless textures in the Kepler architecture. According to NVIDIA, the number of textures that can be used for rendering a scene has been increased from 128 to over 1 million.

The launch of NVIDIA’s Kepler GPU architecture will allow developers to incorporate even greater levels of geometric density, physical simulation, stereoscopic 3D processing, and advanced antialiasing effects into their next generation of DX11 titles.

These are the major differences between Kepler and the previous architecture. If you’d like to know more, read From Fermi to Kepler.

Read the complete comparison: CUDA Differences b/w Architectures and Compute Capability