Unmanned aerial vehicles (UAVs) with a real-time video feed offer numerous vision applications, ranging from civil to military use cases. Many of these applications depend on detecting objects and tracking them across scenes. Feedback from the tracking data can also be used to automatically steer the UAV to follow an object or person of interest. In this article we’ll use a code example to show how this can be done with CUVI. We will use the following video feed captured via a UAV, courtesy of the IVUL lab at KAUST.
The above video was captured at a resolution of 1280×720 at 24 fps and is played back at 30 fps for viewing purposes. In this example, we will track the black SUV across the feed using CUVI’s goodFeaturesToTrack and trackFeatures functionality.
Select Features to Track
We start by finding good, trackable features on our object of interest, the vehicle. We manually extracted the location of the vehicle in the first frame as the region between coordinates (430, 260) and (460, 310), so we use that as our ROI (region of interest). The following code loads the first frame from disk and defines the ROI:
CuviImage frame1("image_path_on_disk", CUVI_LOAD_IMAGE_GRAYSCALE);

//Region of interest that contains the vehicle
CuviRect roi(430, 260, 30, 50);
Next, we define the feature selection criteria, which comprise several parameters. The most important of them all is featureMinDistance, which controls how sparsely or densely packed the features in a given region should be. Since our image is small and the ROI even smaller, we choose a small value to extract as many features as we can.
//Parameters for feature selection
static const float featureQuality = 0.006f;  //Quality of a feature
static const int featureMinDistance = 2;     //Minimum distance between two features
static const int blockSize = 3;              //Block size for computing the Eigen matrix
static const float k = -2.0f;                //k for the Harris feature detector

CuviFeaturesCriteria feature_criteria(CUVI_FEATURES_HARRIS, featureQuality, featureMinDistance, blockSize, k);
Later, we will see that we are able to extract only 12 usable features in the given ROI despite requesting 200. These 12, however, are sufficient to track the object across the frames. We now run goodFeaturesToTrack on our first frame using the selected criteria to get the features.
//Number of requested features
int feature_count = 200;

//Call any feature detector on the first frame ( KLT | HARRIS | PETER )
cuvi::computerVision::goodFeaturesToTrack(frame1, roi, features1, feature_count, feature_criteria);
Here’s the CLI output listing the 12 extracted features. We draw a red rectangle that encloses all the extracted features: the feature point with the lowest (x, y) coordinates forms the top-left corner of the box, and the feature point with the highest (x, y) coordinates forms the bottom-right corner.
No. of features found = 12
Feature: 451 268
Feature: 450 268
Feature: 436 266
Feature: 437 266
Feature: 450 262
Feature: 435 267
Feature: 437 267
Feature: 451 262
Feature: 437 274
Feature: 432 280
Feature: 449 280
Feature: 455 278
It’s important to note that since we are tracking the vehicle and not the road or its surroundings, the ROI should cover only the vehicle, or a subsection of it, and certainly not the background. We will now take these 12 viable features and track them along the complete length of the video using trackFeatures. We start by defining the tracking criteria and then track the features across consecutive frames until we reach the end of the feed. The parameters that define the criteria are self-explanatory and elaborated further in the comments below.
//Parameters for feature tracking
static const int pyramidLevels = 2;                       //Levels of scaling
static const CuviSize trackingWindow = CuviSize(12, 12);  //Size of the tracking window
static const float residue = 10.0f;                       //Absolute difference between the original location window and the tracked location window
static const int iterations = 100;                        //Maximum number of iterations to find a feature

CuviTrackingCriteria tracking_criteria(pyramidLevels, trackingWindow, iterations, residue);

//Track features from frame 1 onto frame 2
cuvi::computerVision::trackFeatures(image1, image2, features1, features2, feature_count, tracking_criteria);
In the trackFeatures call, features1 contains the 12 previously extracted features, while features2 receives the positions of those features as tracked into the next frame. The same process continues with each succeeding frame until we reach the end of the feed. We draw a green rectangle around the previous frame’s features that were successfully tracked into the current frame, in the same manner as before. The size of the rectangle changes over time as some features are lost while others change location. The final result is below:
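The frame-to-frame loop just described can be sketched as follows. To keep the sketch self-contained and library-free, the per-pair tracking step is abstracted behind a callback; this is a simplification, not CUVI’s API. In a real build the callback body would load the two frames and make the trackFeatures invocation shown above:

```cpp
#include <functional>
#include <vector>

// A tracked feature position in a frame.
struct Feature { float x, y; };

// Given feature positions in frame prevFrame, return the subset that was
// successfully tracked into frame currFrame (at their new positions).
// In the real pipeline this wraps cuvi::computerVision::trackFeatures.
using TrackPair = std::function<std::vector<Feature>(
    int prevFrame, int currFrame, const std::vector<Feature>&)>;

// Runs the tracker across frames [0, frameCount) starting from the features
// found in frame 0, and returns the features surviving in the last frame.
// Stops early if every feature has been lost.
std::vector<Feature> trackAcrossFeed(int frameCount,
                                     std::vector<Feature> features,
                                     const TrackPair& trackPair) {
    for (int f = 1; f < frameCount && !features.empty(); ++f) {
        // The tracked positions in frame f become the input for frame f+1;
        // features that could not be tracked are simply dropped.
        features = trackPair(f - 1, f, features);
    }
    return features;
}
```

The key point is the hand-off: each iteration’s output feature list becomes the next iteration’s input, so the tracker never re-detects features unless you explicitly re-run goodFeaturesToTrack.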
Benchmark on Jetson Nano
Speed is CUVI’s second name. Our primitives enable users to build real-time vision applications such as this one for images up to 8K. We chose trackFeatures for this example because it is blazing fast thanks to its sparse nature, making it ideal for use cases like the one above, where the person or object of interest is only a small subsection of the image. To benchmark goodFeaturesToTrack, we use the whole image as the ROI and request 200 features per megapixel of image size. To benchmark trackFeatures, we likewise use 200 trackable features per megapixel. We use a Jetson Nano (in low-power mode) as the benchmark hardware, since it can easily be housed in many UAVs.
| Function | Image size and features | Time (ms) | FPS |
|---|---|---|---|
| trackFeatures | HD, 200 features | 0.580831 | 1,721.67 |
| trackFeatures | FHD, 400 features | 0.698912 | 1,430.80 |
| trackFeatures | 4K, 800 features | 0.966848 | 1,034.29 |
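For readers who want to reproduce numbers like these on their own hardware, a simple average-over-iterations harness is sketched below. This is a generic std::chrono sketch, not CUVI’s own profiling tooling; when timing GPU code, make sure the measured call synchronizes the device before returning, otherwise you measure only launch overhead:

```cpp
#include <chrono>
#include <functional>

// Average wall-clock time of fn over iters runs, plus the derived FPS.
struct BenchResult { double ms; double fps; };

BenchResult benchmark(const std::function<void()>& fn, int iters) {
    fn();  // warm-up run, excluded from timing (first GPU call pays init costs)
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) fn();
    auto t1 = std::chrono::steady_clock::now();
    double totalMs = std::chrono::duration<double, std::milli>(t1 - t0).count();
    double ms = totalMs / iters;
    return {ms, 1000.0 / ms};
}
```

Wrapping the trackFeatures call from the earlier snippet in the lambda passed to benchmark would give per-call latency and throughput comparable to the table above.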
The same concept can be used to track multiple objects in real time. For each object of interest, we extract its features and track them independently of the other objects. Each additional track adds only a very small latency to the overall pipeline, as shown in the benchmark section above. Note that this algorithm works well as long as the object remains within the frame at all times and is not occluded.
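Structurally, multi-object tracking is just the single-object loop run once per object. The sketch below uses hypothetical names and a stand-in callback for the per-object tracking step, which in the real pipeline would be CUVI’s trackFeatures:

```cpp
#include <functional>
#include <vector>

// A tracked feature position in a frame.
struct Feature { float x, y; };

// One independently tracked object: its id and its current feature positions.
// An empty feature list means the object has been lost.
struct TrackedObject {
    int id;
    std::vector<Feature> features;
};

// Maps an object's previous feature positions to the successfully tracked
// ones in the current frame (stand-in for the trackFeatures call).
using Step = std::function<std::vector<Feature>(const std::vector<Feature>&)>;

// Advances every object by one frame; each object is tracked independently,
// so per-object cost adds up linearly and stays small for sparse features.
void advanceAll(std::vector<TrackedObject>& objects, const Step& step) {
    for (TrackedObject& obj : objects) {
        if (!obj.features.empty())
            obj.features = step(obj.features);
    }
}
```

Because each object carries its own feature list, objects can be added or dropped mid-feed without disturbing the other tracks.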
A rally point or RTL (Return to Launch) point helps a UAV return safely in case of loss of contact. When GPS satellites are unavailable, computer vision can help keep track of the launch/rally point and guide the UAV back to it at the end of the mission.
CUVIlib – CUDA Vision & Imaging Library – is a simple to use, GPU accelerated computer vision SDK. The library is available for download for free for personal unlimited use. For more information, visit our website at cuvilib.com.