July 6, 2021
Challenge accepted
Author: Luuk van Dijk, founder & CEO of Daedalean
Hats off to our friends at Iris Automation who posted a video of their traffic detection system with over 100 minutes of high-resolution video showing 300 aerial encounters.

As the skies see more intense traffic from manned and unmanned aerial vehicles, better Detect and Avoid (DAA) systems will become ever more important to ensure the safety of life and goods. Iris' Casia system demonstrates an impressive capability, suitable for mounting on the lighter drones (typically below 25 kg) that in most countries can be flown under simpler regulations than those governing 14 CFR Part 23 and heavier aircraft. We congratulate them on their accomplishments.

In their latest newsletter, they write "Test yourself against Casia and see if you can spot the aircraft first!", which is not the kind of thing you can just put out there on a Friday afternoon without seriously distracting some of us for a couple of hours.

So we ran Daedalean's VXS Visual Awareness System against their footage to see how we compare. Before we present the results, first some background.
* The video appears to have been posted in late March 2021, but the mail was sent on June 29, 2021.

Certified visual awareness
Daedalean's VXS performs a wider range of functions than just Detect and Avoid. It also provides absolute positioning (independent of GPS/GNSS), weather avoidance, and landing guidance (independent of ILS) towards runways, vertipads, or on-the-fly emergency landing sites. Unlike Iris' Casia, Daedalean's VXS targets certified manned and unmanned aircraft.

Our visual awareness suite covers all functions required to demonstrably outperform a human pilot in VFR operations on every measurable dimension, and it is being built for airworthiness certification to DAL-C for application as pilot assistance and to higher DALs for full autonomy.

One consequence of this is that at Daedalean we need to target certifiable hardware. Iris' Casia is based on the off-the-shelf NVIDIA Jetson TX2 or Xavier: a light, powerful and relatively cheap platform, and an excellent choice for consumer applications. Daedalean cannot use these, because consumer-grade and even most industrial-grade hardware and software are simply not engineered to aerospace standards, which are tougher even than automotive ones. Daedalean's hardware is therefore heavier but also more powerful than Casia's, putting us at a somewhat unfair advantage in this challenge.
The contest of Who Can See The Farthest
The dataset provided by Iris is of high quality, but it consists of compressed 8-bit grayscale video. We assume that their system, like ours, works on raw 12-bit pixels (although we use color) and uses all the frames, and that the video compression may have fuzzed details that matter when trying to recognize small objects, so it is not a given that we have as much signal available as Casia had when we run our algorithm on this video. Also, we have to stop as soon as Casia outputs a bounding box, since the overlays would obviously distort the input to our system. For that reason, we only check whether and how often we detect a track before Casia does. To avoid excessive processing, we subsample the video at 6 fps. Other than that, we just run the pixels through our unchanged detection algorithm without any special tuning.
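The bookkeeping above can be sketched in a few lines. This is an illustration only: the 6 fps subsampling follows the text, but the green-overlay stop condition is our own heuristic (the pixel thresholds are assumptions, not Iris' actual overlay rendering).

```python
import numpy as np

def subsample_indices(n_frames, src_fps, target_fps=6.0):
    """Indices of source frames to keep when subsampling to target_fps."""
    step = src_fps / target_fps
    indices, t = [], 0.0
    while int(round(t)) < n_frames:
        indices.append(int(round(t)))
        t += step
    return indices

def has_green_overlay(frame_rgb, min_pixels=50):
    """Stop condition: a Casia bounding box has appeared.
    Heuristic only: count near-pure-green pixels (assumed thresholds)."""
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    mask = (g > 200) & (r < 100) & (b < 100)
    return int(mask.sum()) > min_pixels
```

For a one-second clip at 30 fps, `subsample_indices(30, 30.0)` keeps every fifth frame, i.e. six frames per second.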

We didn't have the opportunity to examine every single encounter by hand (encounters are delimited by the timestamp in the top left, which we extracted to identify the encounter boundaries), so we assume that every encounter showing a green frame is a true positive for Casia and every encounter showing no green frame is a false negative. We may have been sloppy: we only detected 258 resets of that timestamp. We also assumed there were no false positives for Casia in this dataset, either because Casia is really precise or because the dataset was filtered afterwards. Whenever we see something first and can match our detection with theirs, we call it an uncontested earlier match for us; otherwise, the point goes to Casia. When we are sure about the match, we compute how much earlier we saw it and what the object size was.
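The scoring rule described above reduces to a small decision function. A minimal sketch, with our own labels for the outcomes (`None` meaning a system never fired on that encounter):

```python
def score_encounter(ours_first, casia_first, matched):
    """Score one encounter.
    ours_first / casia_first: frame index of our first track / Casia's
    first green box (None = never fired).
    matched: True if we are sure both tracks are the same object.
    """
    if casia_first is None:
        return "casia_fn"            # Casia missed the encounter entirely
    if ours_first is None or ours_first >= casia_first:
        return "casia_point"         # the point goes to Casia
    # we fired first: uncontested only when we can match the detections
    return "ours_earlier" if matched else "undecided"
```

Only the `"ours_earlier"` encounters enter the how-much-earlier and object-size statistics, since those require a confirmed match.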

The complete processed video is here (1:46:17).

And here are the stats:


[Table: Daedalean VXP statistics. Rows: total number of encounters; encounters retrieved (TPs); cannot say because of the overlay; encounters missed (FNs); encounters retrieved earlier, out of Casia TPs (including undecided / only when sure); VXP earlier match (uncontested): avg. frames per encounter; VXP earlier match (uncontested): avg. object size at track initialisation per encounter; avg. false-positive tracks longer than 10 frames per encounter.]


As pointed out above, this is not an entirely apples-to-apples comparison, as we have a bit more computational power available, but we can see that our system is tuned to pick up things that are smaller in apparent size than Casia can. For 13 of the 22 encounters that Casia missed entirely, we had a track longer than 60 frames. On average, we detect a target* 47 frames (@6 fps), or nearly 8 seconds, before Casia, which in real aviation applications can make the difference between meeting or not meeting the Minimum Operational Performance Standards, which define declaration ranges extending to 3.4 nautical miles (6.3 km).
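The arithmetic behind those numbers is worth spelling out. The closure rate below is an assumption of ours for illustration (two general-aviation aircraft head-on at roughly 125 kt each), not a figure from the post or the MOPS:

```python
FPS = 6.0                 # subsampling rate used in this comparison
KT_TO_MS = 0.514444       # knots to metres per second
NM_TO_KM = 1.852          # nautical miles to kilometres

frames_earlier = 47
lead_s = frames_earlier / FPS            # ~7.8 s head start over Casia

# Assumed closure rate: two aircraft head-on at ~125 kt each -> 250 kt.
closure_ms = 250 * KT_TO_MS              # ~129 m/s
extra_margin_m = lead_s * closure_ms     # ~1 km of extra separation

declaration_range_km = 3.4 * NM_TO_KM    # the 3.4 NM MOPS figure, ~6.3 km
```

Under that assumed geometry, an 8-second head start buys roughly a kilometre of additional separation, a sizeable fraction of the declaration range.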

Of course, our increased sensitivity comes at a price: we pick up some false detections, mostly on the ground, which in production we filter out by other means (more about that in another post someday). But we interpret the challenge to be about recall, not precision (within reason), and despite this dataset being significantly different (8-bit, no color, desert terrain) from the sets we train on, even if we filter our false positives more aggressively, the VXS clearly sees almost everything significantly earlier and at a greater distance than Casia.
*In Aviation it's also called a "target" if you try to not hit it.
Neural Networks at work
How small objects are when they are first detected is an important performance metric of the Neural Network that does the hard work of scrutinizing every patch of pixels for the presence or absence of an aircraft. The certifiability of these so-called Deep Convolutional Neural Networks has been the subject of great interest and discussion in the industry, and Daedalean has been working with the regulators to define how to think about this at all. One of the fundamental requirements we identified is that the datasets you test on should be representative of the conditions you expect to see when you deploy the system in flights for real, and that they should not have biases that skew your test results in favourable ways. If that sounds intuitive, it is also founded on sound statistical learning theory.

The published video has an obvious bias in that almost all encounters are seen above the horizon, against blue sky or a background of clouds. For a low-flying drone this may be representative, but since most aircraft maintain their altitude most of the time, these are also the encounters most likely to be harmless, and at the same time easier to spot than below-horizon ones.

A second bias is that we appear to be in the Nevada desert, where hot and dry air typically makes for fantastic visibility. In coastal areas or further inland, representative conditions include a haze that reduces contrast at greater distances. Distinguishing a white object from a black one, even when the camera resolution permits, becomes impossible if the dynamic range of the pixels is too low. This is where the difference between 8- and 12-bit pixel cameras becomes very apparent.
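A toy illustration of why the extra bits matter. The radiance numbers are made up: heavy haze compresses a distant white and black object to signals only 0.05% of full scale apart, which an 8-bit sensor cannot tell apart while a 12-bit one still can:

```python
import numpy as np

def quantize(signal, bits):
    """Quantize a normalised [0, 1] signal to 2**bits levels."""
    levels = 2 ** bits - 1
    return np.round(np.asarray(signal) * levels).astype(int)

# Assumed haze-compressed radiances, 0.05% of full scale apart.
sky, obj = 0.7001, 0.7006

q8 = quantize([sky, obj], 8)    # both land on the same 8-bit code
q12 = quantize([sky, obj], 12)  # two distinct 12-bit codes
```

At 8 bits both values collapse onto one grey level and the object vanishes into the sky; at 12 bits the two codes remain separable, so a detector still has signal to work with.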

A third, more subtle bias in this dataset is that most encounters are not on collision trajectories. A collision trajectory is characterized by the fact that the target appears to remain in the same position relative to the ownship*, only slowly getting bigger. So the fact that we can see most targets move means that on this dataset we only show that we are pretty good at detecting things we are unlikely to fly into, provided both we and the target maintain level and heading.
*Aerospeak for "ourselves"
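The constant-bearing criterion can be sketched as a simple image-space proxy. The pixel-drift threshold and the use of image coordinates (rather than proper angular coordinates) are simplifications of ours for illustration:

```python
import math

def bearing_drift(track):
    """Max image-plane displacement (pixels) of a target relative to its
    first observation. A collision-course target stays nearly fixed in
    the frame while growing; a crossing target drifts across it."""
    x0, y0 = track[0]
    return max(math.hypot(x - x0, y - y0) for x, y in track)

def looks_like_collision(track, sizes, drift_px=5.0):
    """Near-constant bearing plus growing apparent size.
    drift_px is an assumed threshold, for illustration only."""
    growing = sizes[-1] > sizes[0]
    return bearing_drift(track) < drift_px and growing
```

A target that wanders twenty pixels across the frame is easy to pick out against the background, which is exactly why a dataset dominated by such encounters understates the hard case.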
Seeing is not the same as believing
So while we don't think Iris' dataset suffices to prove that any DAA system is good enough for operating safely in real conditions other than flying a small drone at low altitude in the Nevada desert, we do agree with the main point: the human eye and visual cortex may be a truly amazing piece of equipment suitable for a wide range of tasks, but machines can be made to outperform it on the specific task of spotting a small thing far away, and the thought that the human eye is currently supposed to be the last line of defence against mid-air collisions is not a comforting one.

Better than human or not: for the types of operations performed by certified aircraft, we cannot assume either system is good enough to allow autonomous unguided flight based only on evidence like this. Daedalean's VXS targets the certified market while using Neural Networks in critical parts of the system. As part of the path to certification that we laid out in our work with EASA, it is therefore trained and tested on below-horizon encounters as much as on above-horizon ones, and the system is carefully designed to rigorously guarantee the same performance on collision as on non-collision trajectories.

A couple of examples of our system in action over central Switzerland on a dreary day are posted below, along with some of the raw images from those sequences. We post them in two formats, a pretty one and one that gives you all the raw bits that come out of the camera.

Test yourself against the Daedalean VXS and see if you can spot the aircraft first. Or at all.


Raw Bayer pattern RGGB data

16-bit PortableGrayMap (PGM). Every even row has Red/Green/Red/Green… pixels; every odd row has Green/Blue/Green/Blue… Every pixel was sampled at 12 bits. You may have to download the file and open it in a viewer like GIMP to manipulate the contrast, as your monitor likely cannot display the full dynamic range.
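If you want to turn that mosaic into a viewable image yourself, a naive sketch follows. This is plain 2×2 binning at half resolution with a linear contrast stretch, not our production demosaicing:

```python
import numpy as np

def demosaic_rggb(raw):
    """Naive 2x2 binning demosaic of an RGGB Bayer mosaic (half resolution).
    Even rows are R/G/R/G..., odd rows G/B/G/B..., as described above;
    values are 12-bit samples stored in a 16-bit container."""
    r = raw[0::2, 0::2]
    g = (raw[0::2, 1::2].astype(np.uint32) + raw[1::2, 0::2]) // 2
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1).astype(np.uint16)

def stretch(img, bits=12):
    """Linearly map the 12-bit range down to 8 bits for display."""
    return (img.astype(np.uint32) * 255 // (2 ** bits - 1)).astype(np.uint8)
```

Averaging the two green samples per 2×2 cell throws away half the spatial resolution; proper demosaicing interpolates all three channels at every pixel, but the sketch is enough to inspect the images.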

