Aerial Object Tracking on CUDA with CUVI

Unmanned aerial vehicles (UAVs) with a real-time video feed enable numerous vision applications, ranging from civil to military use cases. Many of these applications depend on detecting objects and tracking them across the scene. Feedback from the tracking data can also be used to automatically steer the UAV to follow an object or person of interest. In this article we’ll use a code example to show how this can be done with CUVI. We will use the following video feed captured by a UAV, courtesy of the IVUL lab at KAUST.

The above video was captured at a resolution of 1280×720 at 24 fps and is sped up to 30 fps for viewing purposes. In this example, we will track the black SUV across the feed using CUVI’s goodFeaturesToTrack and trackFeatures functions.

Select Features to Track

We start by finding good (trackable) features on our object of interest, the vehicle. We manually extracted the location of the vehicle in the first frame, which lies between coordinates (430, 260) and (460, 310), so we use that region as our ROI (region of interest). The following code loads the first frame from disk and defines the ROI.

CuviImage frame1("image_path_on_disk", CUVI_LOAD_IMAGE_GRAYSCALE);
//Region of interest that contains vehicle
CuviRect roi(430, 260, 30, 50);

Next, we define the feature selection criteria, which comprise several parameters. The most important of them all is featureMinDistance, which controls how sparsely or densely packed the features should be in a given region. Since our image is small and the ROI even smaller, we select a small value to get as many features as we can.

//Parameters for feature selection
static const float featureQuality = 0.006f; //Quality of a feature
static const int featureMinDistance = 2; //Minimum distance between two features
static const int blockSize = 3;	//block size for computing Eigen Matrix
static const float k = -2.0f;	//k for Harris feature detector

CuviFeaturesCriteria feature_criteria(CUVI_FEATURES_HARRIS, featureQuality, featureMinDistance, blockSize, k);

Later, we will see that we are able to extract only 12 usable features in the given ROI despite requesting 200. These 12, however, are sufficient to track the object across the frames. We now run goodFeaturesToTrack on our first frame with the selected criteria to get the features.

//Number of requested features
int feature_count = 200;
//Run the selected feature detector (KLT | HARRIS | PETER) on the first frame;
//features1 receives the detected points
cuvi::computerVision::goodFeaturesToTrack(frame1, roi, features1, feature_count, feature_criteria);

Here’s the CLI output of the 12 extracted features. We draw a red rectangle that encloses all the extracted features, such that the feature point with the lowest (x, y) coordinates becomes the top-left corner of the box and the feature point with the highest (x, y) coordinates becomes the bottom-right corner.

No. of features found = 12
Feature: 451 268
Feature: 450 268
Feature: 436 266
Feature: 437 266
Feature: 450 262
Feature: 435 267
Feature: 437 267
Feature: 451 262
Feature: 437 274
Feature: 432 280
Feature: 449 280
Feature: 455 278

Track Features

It’s important to note here that since we plan to track the vehicle, not the road or surroundings, it’s better to select an ROI that contains only the vehicle, or a subsection of it, and certainly not the background. We will now take these 12 viable features and track them along the complete length of the video using trackFeatures. We start by defining the tracking criteria and then track the features on consecutive frames until we reach the end of the feed. The parameters that define the criteria are self-explanatory and elaborated in the comments below.

//Parameters for feature tracking
static const int pyramidLevels = 2; //Level Of Scaling
static const CuviSize trackingWindow = CuviSize(12, 12); //Size of tracking window
static const float residue = 10.0f; //Absolute Difference Between Original Location Window & Tracked Location Window
static const int iterations = 100; //Maximum number of iterations to find a feature

CuviTrackingCriteria tracking_criteria(pyramidLevels, trackingWindow, iterations, residue);

//Track features of Frame#1 onto Frame#2 using the tracker
cuvi::computerVision::trackFeatures(frame1, frame2, features1, features2, feature_count, tracking_criteria);

In the trackFeatures call, features1 contains the previously extracted 12 features, while features2 receives the locations of those features tracked in the next frame. The same process continues with each succeeding frame until we reach the end of the feed. We draw a green rectangle that encloses the previous frame’s features that were successfully tracked in the current frame, in the same manner as before. The size of the rectangle tends to change as some features are lost while others change location. The final result is below:

Benchmark on Jetson Nano

Speed is CUVI’s second name. Our primitives enable users to build real-time vision applications such as this for images up to 8K. The reason for selecting trackFeatures for this example is that it is blazing fast due to its sparse nature, making it ideal for use cases like the one above, where the person or object of interest is only a small subsection of the image. To benchmark goodFeaturesToTrack, we use the whole image as the ROI and request 200 features per megapixel of image size. To benchmark trackFeatures, we track 200 features per megapixel of image size. We use a Jetson Nano (in low-power mode) as the benchmark hardware, since it can easily be housed in many UAVs.

Function              Image Size             Time (ms)   FPS
goodFeaturesToTrack   1280×720 (HD)          0.461483    2,166.93
goodFeaturesToTrack   1920×1080 (FHD)        1.028       972.76
goodFeaturesToTrack   3840×2160 (4K)         4.102       243.78
trackFeatures         HD, 200 features       0.580831    1,721.67
trackFeatures         FHD, 400 features      0.698912    1,430.80
trackFeatures         4K, 800 features       0.966848    1,034.29

Benchmark performed on Jetson Nano (low-power mode)

Multi-Object Tracking

The same concept can be used to track multiple objects in real time. For each object of interest, we compute its features and track them independently of the other objects. Each additional track adds only a very small latency to the overall pipeline, as shown in the benchmark section above. Note that this algorithm works well as long as the object remains within the frame at all times and is not occluded.

Rally Points

A Rally Point or RTL (Return to Launch) point helps a UAV return safely in case of loss of contact. When GPS satellites are unavailable, computer vision can maintain a track of the launch/rally point and help the UAV return to it at the end of the mission.

CUVIlib – CUDA Vision & Imaging Library – is a simple-to-use, GPU-accelerated computer vision SDK. The library is available to download free for unlimited personal use. For more information, visit our website.