Starting off a new section: ‘CUDA with Examples’

Hello everyone!

Learning by example is often the easiest way to pick up a new programming skill. Since we have been exploring the hidden gems of CUDA (C for CUDA specifically) for some time now, we thought: why not share our experiences, tips, tricks, and workarounds with others in the CUDA developer community?

In short, we will be posting small tips and tricks as well as detailed tutorials almost daily in our new section titled ‘CUDA with Examples’. Your feedback, comments, and questions about the section would be greatly appreciated!

NVIDIA’s CUDA – Differences between Fermi and Previous Architectures

Note: See the updated, complete list of differences between all CUDA compute capabilities.

The release of the next-generation CUDA architecture, Fermi, shows that CUDA is still evolving. Fermi, with compute capability 2.0, differs from previous architectures in several ways. In addition to raising the maximum number of threads per block from 512 to 1024 and packing 512 cores onto a single chip, Fermi can also run multiple kernels concurrently. Shared memory per SM has grown from 16 KB to as much as 48 KB and, most importantly, the number of streaming processors (CUDA cores) in one SM has been increased from 8 to 32. The comparison below, by NVIDIA, gives a complete picture of the differences between compute capabilities 1.0, 1.1, 1.2, 1.3 and 2.0 of NVIDIA’s CUDA-enabled devices.
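To see where your own card falls among these compute capabilities, you can ask the CUDA runtime directly. The following is a minimal sketch, not part of NVIDIA's comparison: it assumes the CUDA toolkit is installed and uses the standard runtime calls cudaGetDeviceCount and cudaGetDeviceProperties to print a few of the limits discussed above. The choice of fields and the output layout are ours.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Ask the runtime how many CUDA-capable devices are visible.
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < deviceCount; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties failed for device %d: %s\n",
                    dev, cudaGetErrorString(err));
            continue;
        }

        // major.minor is the compute capability (2.0 for the first Fermi parts).
        printf("Device %d: %s\n", dev, prop.name);
        printf("  Compute capability:      %d.%d\n", prop.major, prop.minor);
        printf("  Multiprocessors (SMs):   %d\n", prop.multiProcessorCount);
        printf("  Max threads per block:   %d\n", prop.maxThreadsPerBlock);
        printf("  Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
        printf("  Concurrent kernels:      %s\n", prop.concurrentKernels ? "yes" : "no");
    }
    return 0;
}
```

Compile it with nvcc (for example, nvcc query.cu -o query, where the file name is just a placeholder). The runtime does not report the number of cores per SM, so that figure still has to be derived from the compute capability.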

Continue reading “NVIDIA’s CUDA – Differences between Fermi and Previous Architectures”