a) Collection and labelling of the data. (See our past blog post on the specifics of data labelling for aviation, including the traceability requirements for the data.)
b) Distribution of the data. We should ensure that there are enough data, that the data are not biased, that their distribution corresponds to the real world, and that they cover all possible cases and scenarios, including rare edge cases.
c) Independence of the training, validation, and testing datasets. An NN must be evaluated on a dataset different from the one it was trained on.
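
To make points b) and c) more concrete, here is a minimal Python sketch of one way to keep the three datasets independent and to compare their label distributions. It rests on an assumption that is ours, not a stated requirement: samples recorded in the same flight are correlated, so whole flights are assigned to a single split. The field names `flight_id` and `label`, the helper names `split_by_group` and `label_distribution`, and the 70/15/15 split are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def split_by_group(samples, group_of, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign whole groups to train/validation/test so no group spans two splits."""
    groups = sorted({group_of(s) for s in samples})
    random.Random(seed).shuffle(groups)  # seeded, so the split is reproducible

    n_train = round(len(groups) * fractions[0])
    n_val = round(len(groups) * fractions[1])

    assignment = {}
    for i, group in enumerate(groups):
        if i < n_train:
            assignment[group] = "train"
        elif i < n_train + n_val:
            assignment[group] = "validation"
        else:
            assignment[group] = "test"

    splits = {"train": [], "validation": [], "test": []}
    for s in samples:
        splits[assignment[group_of(s)]].append(s)
    return splits


def label_distribution(samples, label_of):
    """Return the relative frequency of each label in a list of samples."""
    counts = Counter(label_of(s) for s in samples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Synthetic, purely illustrative data: 20 hypothetical flights, 5 frames each.
samples = [
    {"flight_id": f, "frame": k, "label": "runway" if (f + k) % 2 == 0 else "no_runway"}
    for f in range(20)
    for k in range(5)
]

splits = split_by_group(samples, group_of=lambda s: s["flight_id"])

# Independence check: no flight may appear in more than one split.
flights = {name: {s["flight_id"] for s in part} for name, part in splits.items()}
assert flights["train"].isdisjoint(flights["validation"])
assert flights["train"].isdisjoint(flights["test"])
assert flights["validation"].isdisjoint(flights["test"])

# Distribution check: print label frequencies per split for comparison.
for name, part in splits.items():
    print(name, len(part), label_distribution(part, lambda s: s["label"]))
```

Splitting by group rather than by individual sample is a design choice: a purely random per-sample split could place near-identical frames from the same flight in both the training and the testing dataset, which would make the evaluation look better than it is. Comparing label frequencies is only a first sanity check; it does not by itself show that the data match the real-world distribution or cover rare cases.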