"With traditional software, we have lines of code and software development assurance systems. When it fails, we can look at the code and understand where and why it failed. Artificial intelligence and neural networks are not like this. How can we establish the trust that artificial agents will behave safely?
AI can't be understood. It can't be explained. It is non-deterministic. Thus, it can neither be trusted by the general public nor certified by authorities."
If you trust your life to something, you have a right to demand proof that it is safe. However, not everybody has to understand how it all works. In reality, the details of every complex technology are known only by a few specialists anyway. For example, very few aviation passengers understand how a jet engine works. Regulators such as EASA and the FAA are responsible, on behalf of the public, for ensuring that such systems are safe.
In traditional software development, work often starts with a minimum viable product, possibly with some bugs, released as an alpha (first) version. But in aerospace, we simply cannot afford to ship a minimum viable product. The system must be proven safe from the very beginning of operation: regulators must be fully convinced that it will perform robustly before it is actually allowed to enter operation!
Although this sounds like a chicken-and-egg problem, it can actually be achieved using statistical and mathematical methods. The current goal of Daedalean is to show that its systems have a failure rate of less than 10^-9 per flight hour. In other words, the system is expected to fail less than once in 10^9 (one billion) flight hours.
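To make that target concrete, here is a small back-of-the-envelope calculation (the 10^-9 rate is the stated target; the service-life figure is a hypothetical number chosen purely for illustration):

```python
# Illustrative arithmetic only; the service-life figure is a made-up example.
rate = 1e-9        # certification target: failures per flight hour
hours = 100_000    # hypothetical service life of one airframe, in flight hours

# Probability of at least one failure over the whole service life.
# For small rates this is approximately rate * hours.
p_any_failure = 1 - (1 - rate) ** hours
print(f"P(at least one failure in {hours} flight hours) = {p_any_failure:.2e}")
```

Under these assumptions, an individual airframe flying its entire life would face odds of roughly one in ten thousand of ever experiencing such a failure.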
Design assurance standards exist for traditional software engineering, along with more specific standards from aviation regulators, but these do not apply to neural networks. That is why Daedalean is working with EASA under an Innovation Partnership Contract (IPC) to develop methods for ensuring that any neural networks incorporated into aircraft systems are at least as safe, predictable, and reliable as existing flight-critical software. In March 2020, Daedalean and EASA published Concepts of Design Assurance for Neural Networks (CoDANN), containing the first guidelines for evaluating the safety of neural networks. These concepts can be applied in any safety-critical field, not only in aviation.
First, let us define the terms used. Artificial intelligence (AI) is a rather broad and vague term. Throughout its history, it has referred to computer systems able to perform tasks that ‘normally’ require human intelligence, and which scientists and engineers of the time did not yet know how to build. As ‘normally’ is a shifting standard, the definition of what constitutes AI has changed over the years. In the 1950s, a computer playing chess (then considered the peak of human intellectual achievement) was labelled AI. Once the problem was solved, it simply became part of computer science. In the 1980s, Go was considered the hardest intellectual game humans could play. Eventually, this too was solved by AI. With further advances in computing power, and the new types of computation they made possible, it turned out that recognising a cat in a video is no less difficult; this challenge, however, has also been solved. Whenever these challenging 'AI' problems are cracked, they are no longer labelled AI but become branches of science and engineering.
Machine learning is a much more specific, though still quite broad, term. It is the branch of engineering in which a computer algorithm finds a solution to an engineering problem by automatically exploring the space of possible solutions, rather than by following explicitly programmed rules. The answer may take the form of another computer program, or of a function whose outputs humans can evaluate.
A subset of systems in the field of machine learning is the deep convolutional neural network. First, a neural network is a form of computer program inspired by how biological brain cells work. It consists of a group of algorithms that you can train to recognise things by providing lots of labelled examples, so that the program can 'learn' to answer questions such as, "Is there a cat in this picture?" or "Where is the runway?"
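As a minimal illustration of 'learning from labelled examples', here is a single artificial neuron (a perceptron, the simplest ancestor of a neural network) trained in pure Python. This is a toy sketch, not Daedalean's architecture; the data and labels are invented for the example:

```python
# Toy sketch: a single artificial neuron learns a rule from labelled
# examples instead of being given explicit 'if ... then' statements.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Learn weights and a bias from (features, label) pairs."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred                      # 0 when correct, +/-1 when wrong
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Invented 2-D data: class 1 points lie above the line x1 + x2 = 1.
samples = [(0.1, 0.2), (0.3, 0.1), (0.9, 0.8), (0.7, 0.9)]
labels = [0, 0, 1, 1]
w, b = train_perceptron(samples, labels)

def predict(x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

The learned `predict` function now answers a yes/no question about points it has never seen, which is the essence of what the far larger networks described below do with images.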
Traditional computer programs built from simple ‘if ... then’ statements are not capable of solving this class of problems. But neural networks can be trained to 'see' the difference between safe and unsafe in real-world situations. For example, a program can process a set of pixels representing an image and split the image into fragments that look like a runway and those that do not. Our neural networks, trained in this manner on millions of images, currently do this correctly 96% of the time, and there are networks whose accuracy reaches 98%.
You may ask: 96% is a high probability, but does that mean the aircraft will crash in 4% of cases because the neural networks in its software made an error? No – because the final decision does not rest on a single image: the neural network does not decide yes or no just once, but continues to ‘look’ and refine its judgement throughout the process, just like a human pilot. Throughout the approach, it keeps receiving and processing new images of the runway coming into view. Each individual decision has the same 96% probability of being correct, so combining many of them raises the overall confidence of the decision and drives the probability of an error towards zero.
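The effect of fusing many imperfect decisions can be sketched numerically. The actual fusion method is not described here, so the snippet below assumes a simple majority vote over independent per-frame decisions, purely for illustration:

```python
from math import comb

def majority_error(p_correct, n_frames):
    """Probability that a majority vote over n_frames independent
    decisions is wrong, given per-frame accuracy p_correct.
    (Independence between frames is an idealising assumption.)"""
    p_err = 1.0 - p_correct
    # The vote fails when more than half of the frames are classified wrongly.
    return sum(comb(n_frames, k) * p_err ** k * p_correct ** (n_frames - k)
               for k in range(n_frames // 2 + 1, n_frames + 1))

for n in (1, 5, 15, 45):
    print(f"{n:2d} frames: error probability {majority_error(0.96, n):.2e}")
```

A single 96%-accurate decision is wrong 4% of the time, but under these idealised assumptions the error probability of the combined decision falls off very rapidly as more frames are fused.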
In the end, it does not matter that the lines of code in a neural network are not as easily readable as traditional, ‘handwritten’ software. What matters is that its results are fully deterministic (it will always produce the same output for the same input data or image, and its results will not change over time), and that its performance in real-world operation can be mathematically demonstrated up to a known probability of success.