Annotation, in the context of Daedalean's software development, means labeling images to make the objects in them recognizable to computer vision systems. To learn to 'see' and categorize objects, a neural network needs to be fed millions of pre-labeled images. The team processes them in full compliance with the strict requirements imposed by RTCA DO-200B, a standard used by EASA, the FAA, and other aviation regulators for the processing of aeronautical data.
The annual KPI for the annotation team's first year was to process 500,000 images. They delivered twice as many: a million images by the end of the year (including reannotation for uniformity and regulatory compliance). This was achieved thanks to carefully designed work processes, a high level of automation, specialized custom software tools, and regular training. The tools they use have, in part, been developed within Daedalean and tailored to the industry's requirements.
Why has this job proven too tricky for third-party annotators?
Lack of specialized knowledge
In software development for aerospace products, knowledge of the subject area is essential, especially in safety-critical fields. This requirement applies to data annotation, too: annotators cannot simply follow a set of formal instructions; they need to understand what they are seeing from a subject-matter point of view. For example, an airport runway carries many markings and designations unfamiliar to anyone outside aviation. During landing, the flight control software will be guided by 'knowledge' derived from the data fed to it during training. That means the accuracy of identifying the exact borders of the runway in flight depends on the annotators' work: on how accurately and thoroughly they identified the runway designations and labeled them on the images.
Sometimes an annotation task can be so nuanced that it is impossible to complete without the help of professionals with many years of hands-on experience. In medicine, if we teach neural networks to read X-ray images, the annotation should be performed at least in part by practicing surgeons; interns, whose knowledge is still mostly theoretical, won't do.
Compare this to labeling the ground for possible helicopter landing sites. To do this, an annotator must understand very clearly where a helicopter can land and where it cannot, and review the whole related sequence of images. The task is hard to formalize because, in reality, whether a helicopter can safely land on a particular spot depends on many factors, including things as subtle as the height of the grass. An experienced pilot with hundreds of flight hours can tell there is a hidden pit or a rock concealed by grass just by looking down from the cockpit and noticing slight changes in the grass's color and height; a fresh graduate of a pilot school cannot. At Daedalean, when a case exceeds the annotators' qualifications, they always have pilots to consult; currently, the team is mixed at an 80/20 ratio (80% 'general' annotators and 20% subject-matter specialists).
From these examples, it is clear that the services of highly specialized, qualified annotators cannot be cheap. As for agencies offering affordable services, their workflow is often based on frequent team rotation: a randomly assigned worker receives only a brief task description each time. This might be enough for developing funny Instagram apps, but not for aerospace. And when an agency offers a permanent team assigned to a client, the costs are comparable to maintaining an in-house team, while the latter gives a company more control over the process.
Failure to comply with certification requirements
Data management for safety-critical aviation applications is regulated by DO-200B, Standards for Processing Aeronautical Data. One of the most critical requirements of this standard is traceability: in case of any error, we must always be able to trace a piece of data back to its source and to the person who introduced it. This means the process must be performed in a strictly defined way.
To understand this requirement, consider an analogy from the defense industry, where there is a notion of military acceptance. A manufacturer building a submarine for the navy must draw up a report for every little detail, almost for each screw, weld, or smear of paint: who installed or applied it, and when. This way, you always know who is responsible if something goes wrong.
The same principle applies to data processing in the development of certifiable software. For each piece of data fed to a neural network, we must know its entire life cycle: if we have a photo, we also know where and when it was taken, who processed it, who annotated it, with which software and settings, and so on. We know the whole history of the image from the moment it was captured to the moment it was processed by a neural network.
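To make the idea of end-to-end traceability concrete, here is a minimal sketch of what such a provenance record could look like in code. The structure and field names (ImageProvenance, ProcessingStep, and so on) are illustrative assumptions, not Daedalean's actual data model or tooling.

```python
# Illustrative sketch only: the structure and field names below are
# assumptions chosen to show the idea of end-to-end traceability,
# not Daedalean's actual data model.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class ProcessingStep:
    """One step in the data's life cycle: who did what, with which tool."""
    performed_at: datetime
    performed_by: str      # annotator or engineer responsible for the step
    tool_name: str         # e.g. the annotation tool used
    tool_version: str      # exact software version, so the step is reproducible
    settings: dict         # tool configuration used for this step


@dataclass
class ImageProvenance:
    """Traceability record for a single training image."""
    image_id: str
    captured_at: datetime  # when the photo was taken
    captured_where: str    # location or flight reference
    capture_device: str    # camera or sensor that produced the image
    history: List[ProcessingStep] = field(default_factory=list)

    def add_step(self, step: ProcessingStep) -> None:
        """Append a processing or annotation step to the image's history."""
        self.history.append(step)
```

Each time an image is touched, a new step would be appended to its history, so that the full chain from capture to training remains auditable.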
In short: to label hundreds of thousands of images quickly and efficiently while complying with the strict certification requirements of DO-200B, the standard for processing aeronautical data that certifiable aviation software must follow, Daedalean needed a full-time team of dedicated people undergoing extensive specialized training tailored to the industry's needs.