It was just another day at work for Brent, scanning the ocean floor off the coast of Florida. His company, Queens Jewels LLC, owns the “salvaging rights” to the 1715 Fleet, a group of 11 Spanish ships that sank somewhere off the Florida shore.

And then Brent struck gold, literally. He unearthed treasure from one of the sunken ships, gold coins to be precise, worth over 1 million dollars.



Undersea imagery is awesome. It shows us the parallel world of creatures, coral reefs and whatnot that covers roughly 70% of our planet Earth. There is a plethora of cameras that record and stream underwater video and images, ranging from analog cameras, which are still widely used in commercial diving, to HD and even 2K cameras streaming digital undersea video. However, water scatters light differently from air, and visibility drops as a diver or an ROV goes further down. The visibility and the colors we see in undersea imagery are very sensitive to the depth as well as the equipment used. Most cameras do not have any “undersea filter” to clean up the footage, so your awesome scuba dive comes out murky or, in the case of commercial diving, you may miss visual details of the important underwater infrastructure you were trying to inspect.

before
after


We heard you and we responded. One dominant request from our users was to be able to quickly test out the library without setting it up first, since they sometimes ran into issues due to machine incompatibility. Today, we announce the release of a live CUVI demo page where you can quickly try out the most common functions of the CUVI GPU Imaging SDK without downloading or installing anything.

This saves you hours you would have otherwise spent on installing and configuring CUVI to try out simple functions from our Color Operations module, for example.

The demo runs on a beefy virtual server in the cloud with an NVIDIA GRID K520, which is a pretty decent card for testing the performance of different CUVI functions. Note, though, that performance on your own machine with an NVIDIA CUDA-enabled card of similar capability should be a bit better, since the demo runs on virtualized hardware, not dedicated bare-metal silicon.



NVIDIA plans to drop support for GPUs with the Tesla architecture (compute capability 1.x) in upcoming releases of the CUDA Toolkit. In fact, GPUs with compute capability 1.0 have already been removed as a target device in CUDA Toolkit 6.5, released in August 2014. With toolkit 6.5, you can no longer specify compute_10, sm_10 for code generation. On top of that, NVIDIA has also removed CC 1.0 from the comparison tables in the Programming Guide 6.5.

The default architecture has been changed to compute_20, sm_20 in the rules file of CUDA Toolkit 6.5. The rest of the Tesla architectures, i.e. CC 1.1, 1.2 and 1.3, are still supported as targets but are marked as deprecated. The compiler generates the following warning if we attempt to compile code for the Tesla architecture with CUDA 6.5:

CUDACOMPILE : nvcc warning : The 'compute_11', 'compute_12', 'compute_13', 'sm_11', 'sm_12', and 'sm_13' architectures are deprecated, and may be removed in a future release.
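The rules described above can be written down as a small lookup. The following is only a sketch in plain C++: the names `TargetStatus`, `targetStatusInCuda65` and `smName` are made up for illustration and are not part of the CUDA Toolkit, and the function assumes that every capability from 2.0 upward is a regular target without checking that the capability actually exists in toolkit 6.5.

```cpp
#include <cassert>
#include <string>

// Status of a compute capability as a compilation target in CUDA Toolkit 6.5,
// as described in the text (hypothetical helper, not an NVIDIA API).
enum class TargetStatus { Removed, Deprecated, Supported };

// Classify a compute capability given as major.minor.
// Assumption: anything from 2.0 upward is treated as a regular target.
TargetStatus targetStatusInCuda65(int major, int minor) {
    assert(major >= 1 && minor >= 0);
    if (major == 1 && minor == 0) return TargetStatus::Removed;     // compute_10 / sm_10
    if (major == 1)               return TargetStatus::Deprecated;  // 1.1, 1.2, 1.3
    return TargetStatus::Supported;                                 // 2.0 and later
}

// Build the architecture name used on the nvcc command line,
// for example (2, 0) gives "sm_20".
std::string smName(int major, int minor) {
    return "sm_" + std::to_string(major) + std::to_string(minor);
}
```

A build script could call something like this before choosing code generation flags, falling back to sm_20 for any card that is reported as removed or deprecated.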



Since its first release back in 2007 with compute capability 1.0, CUDA has seen three more architectural releases and eight more compute capabilities, which shows that it is an ever-evolving architecture. Although CUDA is forward compatible, every new release comes with new features worth using and increased thread and memory support. As a rule of thumb, every new architecture runs CUDA code faster than the previous generation, given that both cards have the same number of cores.

The comparison below gives a list of feature/functionality support between compute capabilities of NVIDIA’s CUDA enabled devices. Note that atomic operations weren’t supported in the first release and since they are so important, NVIDIA now practically compares architectures from 1.1 and later.



Continuing our legacy of providing the best imaging algorithms at lightning-fast speed, we are proud to announce the addition of the DFPD debayer algorithm to CUVI. It is more robust than the existing demosaic and shows no artifacts in high-feature areas. The previous demosaic implementation (which uses bilinear interpolation) is super fast, giving a throughput of more than 500 fps on a full HD image on a common GPU, yet it has its downside.

Since the color planes have severe aliasing, a simple interpolation (or HQ bilinear interpolation, for that matter) of the individual planes does little to remove the artifacts that appear in high-feature regions. Hence we need a better reconstruction approach:

li_vs_dfpd

Not only does the new algorithm remove artifacts in high-feature regions, the colors also come out more natural and crisp. This is because the DFPD (directional filtering with a posteriori decision) algorithm estimates the green plane better by taking the natural edges of the image into account, and then reconstructs the missing red/blue pixels from that reconstructed green image instead of calculating all values directly.
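To make the idea of filtering in two directions and deciding afterwards more concrete, here is a simplified CPU sketch of the green-plane step in plain C++. It is not CUVI's GPU implementation: the names `Plane`, `mirror`, `isGreen` and `reconstructGreen` are made up for this illustration, the decision window is an unweighted 5x5 sum, only an RGGB layout is handled, and the red/blue reconstruction and the refinement step are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mirror an index back into [0, n) without changing its parity, so a mirrored
// pixel has the same Bayer color as the pixel it stands in for.
int mirror(int i, int n) {
    if (i < 0) i = -i;
    if (i >= n) i = 2 * (n - 1) - i;
    return i;
}

// Single-channel float image with mirrored border access (illustrative type).
struct Plane {
    int w, h;
    std::vector<float> v;
    Plane(int w_, int h_) : w(w_), h(h_), v(static_cast<std::size_t>(w_) * h_, 0.0f) {}
    float& at(int x, int y) { return v[mirror(y, h) * w + mirror(x, w)]; }
    float at(int x, int y) const { return v[mirror(y, h) * w + mirror(x, w)]; }
};

// RGGB layout: red at (even, even), blue at (odd, odd), green elsewhere.
bool isGreen(int x, int y) { return (x + y) % 2 != 0; }

// Reconstruct the full green plane of an RGGB mosaic.
Plane reconstructGreen(const Plane& m) {
    assert(m.w >= 5 && m.h >= 5);  // mirror() only handles small overshoots
    Plane gH(m.w, m.h), gV(m.w, m.h), cH(m.w, m.h), cV(m.w, m.h);

    // Step 1: two candidate green values at every red/blue site, one filtered
    // along the row and one along the column, plus the color difference
    // (R-G or B-G) that each candidate implies.
    for (int y = 0; y < m.h; ++y)
        for (int x = 0; x < m.w; ++x) {
            if (isGreen(x, y)) continue;
            float c = m.at(x, y);
            gH.at(x, y) = 0.5f * (m.at(x - 1, y) + m.at(x + 1, y))
                        + 0.25f * (2 * c - m.at(x - 2, y) - m.at(x + 2, y));
            gV.at(x, y) = 0.5f * (m.at(x, y - 1) + m.at(x, y + 1))
                        + 0.25f * (2 * c - m.at(x, y - 2) - m.at(x, y + 2));
            cH.at(x, y) = c - gH.at(x, y);
            cV.at(x, y) = c - gV.at(x, y);
        }

    // Step 2: decide after the fact. Sum the color-difference gradients of each
    // candidate over a 5x5 window and keep the direction that varies less.
    Plane g = m;  // green sites already hold measured green values
    for (int y = 0; y < m.h; ++y)
        for (int x = 0; x < m.w; ++x) {
            if (isGreen(x, y)) continue;
            float dH = 0.0f, dV = 0.0f;
            for (int n = -2; n <= 2; ++n)
                for (int k = -2; k <= 2; ++k) {
                    if ((n + k) % 2 != 0) continue;  // green site, no color difference
                    int px = x + k, py = y + n;
                    dH += std::fabs(cH.at(px, py) - cH.at(px + 2, py));
                    dV += std::fabs(cV.at(px, py) - cV.at(px, py + 2));
                }
            g.at(x, y) = (dV < dH) ? gV.at(x, y) : gH.at(x, y);
        }
    return g;
}
```

On a mosaic with a sharp vertical edge, the row-wise candidate overshoots next to the edge, its color differences jump, and the decision step falls back to the column-wise candidate, which is why the edge stays free of zipper artifacts.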

This huge improvement over the existing implementation comes at a price: more computational cost. The DFPD algorithm runs at roughly half the speed of the previous one; however, it still delivers a whopping 263 fps on a full HD image. Note that this figure excludes memory transfers. And, as always with CUVI, you can use this GPU-accelerated DFPD debayer with just three lines of code:

CuviImage input("D:/bayer.tif", CUVI_LOAD_IMAGE_GRAYSCALE_KEEP_DEPTH), output;

cuvi::colorOperations::demosaic_DFPD(input, output, CUVI_BAYER_RGGB);

cuvi::io::saveImage(output, "D:/debayered.png");

There’s an optional refinement step that comes with DFPD to further refine the pixel values and cut down the unnatural high frequencies. By default, it’s set to false, but you can enable it with a flag:

// Further refine the results
cuvi::colorOperations::demosaic_DFPD(input, output, CUVI_BAYER_RGGB, true);
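To put the frame rates quoted above in perspective, the short sketch below converts them into a per-frame time budget and a pixel throughput. It is plain arithmetic, not a CUVI call: the helper names are made up for illustration, full HD is assumed to mean 1920x1080, and the results cover kernel time only, since the quoted figures exclude memory transfers.

```cpp
#include <cassert>
#include <cmath>

// Time available per frame, in milliseconds, at a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Pixel throughput in megapixels per second for a given frame size and rate.
double megapixelsPerSecond(int width, int height, double fps) {
    assert(width > 0 && height > 0 && fps > 0.0);
    return static_cast<double>(width) * height * fps / 1e6;
}
```

With these helpers, `frameTimeMs(263.0)` comes to about 3.8 ms per frame for DFPD, against at most 2 ms for the bilinear demosaic at 500 fps or more, and `megapixelsPerSecond(1920, 1080, 263.0)` is roughly 545 megapixels per second.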

Download the latest CUVI from here or get more information on the features at our wiki.



CUVILib provides out-of-the-box, hyper-accelerated imaging functionality, ready for use in your film scanning, restoration and recoloring applications. With CUVI, you can deliver supercomputing-like performance to your users without the need to set up expensive high-end CPUs.

house_restored



Nsight Eclipse Edition is a full-featured IDE, powered by the Eclipse platform, that provides a complete integrated development environment to edit, build, debug and profile CUDA C/C++ applications on Mac and Linux platforms. The combination of a CUDA-aware source editor and powerful debugging and profiling tools makes Nsight Eclipse Edition the ultimate development platform for heterogeneous computing.



A preview of CUDA Toolkit 5 is already available to registered developers, and NVIDIA is expected to roll out the production release soon. Besides the habitual addition of more image processing functionality, the new toolkit offers some great features, including:

  1. Dynamic parallelism
  2. GPUDirect for clusters (RDMA)
  3. GPU object linking
  4. NVIDIA Nsight, Eclipse Edition


It’s one thing to compare GPU code performance with CPU code performance. If the algorithm is parallel, the GPU will beat the CPU any day. In our case, CUVI beats the best (performance-wise) CPU primitives library on the planet, Intel® IPP. Take a look at the performance figures.