A preview of CUDA Toolkit 5 is already available to registered developers, and NVIDIA is expected to roll out the production release soon. Besides the usual addition of more image processing functionality, the new toolkit offers some notable features, including:
- Dynamic parallelism
- GPUDirect for clusters (RDMA)
- GPU object linking
- NVIDIA Nsight, Eclipse Edition
Dynamic parallelism, supported by the Kepler architecture (compute capability 3.5 and later), removes some long-standing bottlenecks from CUDA programming. Developers are no longer restricted to launching GPU kernels from the host side alone. With dynamic parallelism, a kernel can launch another CUDA kernel, or call a CUDA library function, directly from the device. You can also create and use streams and CUDA events without CPU involvement. Dynamic parallelism will help further optimize algorithms such as fluid dynamics simulation and will also benefit recursion-dependent problems.
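To make the idea concrete, here is a minimal sketch of a kernel launching another kernel. The names (childKernel, parentKernel, runDynamicParallelismDemo) and the sizes are made up for illustration, and the code assumes a device of compute capability 3.5 or higher.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Child kernel: each thread doubles one element of the segment it was given.
__global__ void childKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Parent kernel: one thread per segment launches a child grid from the device.
__global__ void parentKernel(int *data, int segments, int segmentSize)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s < segments) {
        int threads = 64;
        int blocks = (segmentSize + threads - 1) / threads;
        // Device-side launch: no round trip to the CPU.
        childKernel<<<blocks, threads>>>(data + s * segmentSize, segmentSize);
    }
}

// Host helper (hypothetical): returns true if every element was doubled.
bool runDynamicParallelismDemo()
{
    const int segments = 4;
    const int segmentSize = 256;
    const int n = segments * segmentSize;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyHostToDevice);

    parentKernel<<<1, segments>>>(dev, segments, segmentSize);
    // The parent grid is not complete until all of its child grids finish,
    // so one host-side synchronize covers both levels.
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (host[i] != 2 * i) return false;
    return true;
}

int main()
{
    bool ok = runDynamicParallelismDemo();
    printf("dynamic parallelism demo %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

Device-side launches need relocatable device code and the device runtime library, so a build line would look something like `nvcc -arch=sm_35 -rdc=true demo.cu -lcudadevrt` (the file name is a placeholder; pick the -arch value that matches your GPU).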
GPUDirect in CUDA 5 allows one GPU to directly access another GPU’s memory over the network without copying a single byte into CPU memory. This will result in a significant performance improvement in data transfer on Tesla- and Quadro-based cluster systems. To use GPUDirect, all you need is two Fermi (or later architecture) cards on the same PCI Express bus.
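The sketch below illustrates only the simplest case mentioned above: two GPUs in one machine copying directly between their memories. It does not cover the RDMA path across a network, which also depends on the network adapter and its driver. The function name peerCopyDemo and its return codes are made up for this example; the runtime calls are the standard peer-access ones.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper. Returns 0 if the copy was verified,
// 1 if it was skipped (fewer than two GPUs), -1 on error.
int peerCopyDemo()
{
    int deviceCount = 0;
    if (cudaGetDeviceCount(&deviceCount) != cudaSuccess || deviceCount < 2)
        return 1;

    // Ask whether device 1 can address device 0's memory directly.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);

    const int n = 1024;
    const size_t bytes = n * sizeof(int);
    int host[n], result[n];
    for (int i = 0; i < n; ++i) { host[i] = i; result[i] = -1; }

    int *src = nullptr, *dst = nullptr;

    // Source buffer lives on device 0.
    cudaSetDevice(0);
    if (cudaMalloc(&src, bytes) != cudaSuccess) return -1;
    cudaMemcpy(src, host, bytes, cudaMemcpyHostToDevice);

    // Destination buffer lives on device 1.
    cudaSetDevice(1);
    if (cudaMalloc(&dst, bytes) != cudaSuccess) {
        cudaSetDevice(0);
        cudaFree(src);
        return -1;
    }
    if (canAccess)
        cudaDeviceEnablePeerAccess(0, 0);  // current device (1) may access device 0

    // GPU-to-GPU copy; no host buffer appears in this call. Without peer
    // access the runtime may stage the data through host memory internally.
    cudaError_t err = cudaMemcpyPeer(dst, 1, src, 0, bytes);

    // Read back from device 1 only to check the result.
    cudaMemcpy(result, dst, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);

    if (err != cudaSuccess) return -1;
    for (int i = 0; i < n; ++i)
        if (result[i] != i) return -1;
    return 0;
}

int main()
{
    int status = peerCopyDemo();
    printf("peer copy demo status: %d\n", status);
    return status < 0 ? 1 : 0;
}
```

On a machine with a single GPU the helper simply reports that it skipped the copy, so it is safe to try anywhere.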
With toolkit 5, individual .cu files can be compiled to .o object files and linked later, rather than compiling a massive heap of .cu source files at once. The toolkit also ships with Nsight Eclipse Edition, a first-of-its-kind IDE that will help you develop CUDA applications more quickly and productively. See the demo of the Nsight Eclipse IDE in the video below.
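As a rough illustration of the object linking mentioned above, the sketch below marks where the code could be split into two .cu files. The file names (scale.cu, main.cu) and function names are hypothetical. The listing also compiles as a single file, so you can try it before splitting it up.

```cuda
// ---- scale.cu (hypothetical file): a device function defined on its own ----
__device__ float scale(float x, float factor)
{
    return x * factor;
}

// ---- main.cu (hypothetical file): uses scale() from the other object ----
#include <cstdio>
#include <cuda_runtime.h>

// Declaration only; once the files are split, the definition is resolved
// at device link time.
extern __device__ float scale(float x, float factor);

__global__ void scaleAll(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = scale(data[i], factor);
}

// Host helper (hypothetical): returns true if every element was halved.
bool runScaleDemo()
{
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = (float)i;

    float *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);

    scaleAll<<<(n + 127) / 128, 128>>>(dev, n, 0.5f);
    cudaError_t err = cudaDeviceSynchronize();

    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (host[i] != i * 0.5f) return false;  // halving small integers is exact
    return true;
}

int main()
{
    bool ok = runScaleDemo();
    printf("separate compilation demo %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}

// Possible build steps once the two parts live in separate files
// (-dc compiles to an object containing relocatable device code):
//   nvcc -arch=sm_35 -dc scale.cu -o scale.o
//   nvcc -arch=sm_35 -dc main.cu  -o main.o
//   nvcc -arch=sm_35 scale.o main.o -o app
```

The -arch value is a placeholder; use the one that matches your GPU and toolkit version.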