Thursday, 25 July 2013

Software Complexity and What Brought Down AF447


After the recovery of the black boxes from the ill-fated Air France flight 447, it was concluded that pilot error, coupled with a Pitot-tube malfunction, was the major cause of the tragedy. It appears, however, that this is yet another "loss of control" accident. Based on black box data, the aircraft stalled at very high altitude. But you cannot stall an A330. By definition. The airliner (like many other fly-by-wire aircraft) is packed with software to such an extent that it won't let you stall it even if you wanted to commit suicide. That's the theory. But in reality, you don't fly an airliner - you fly the software. The degree of automation is phenomenal. That is precisely the problem.
Pilots say that they have become button pushers. Here are some comments on the AF447 accident taken verbatim from a Professional Pilots blog:


"We need to get away from the automated flight regime that we are in today."

"Pilots must be able to fly. And to a better standard than the autopilot!"

"To be brutally honest, a great many of my co-pilot colleagues could NOT manage their flying day without the autopilot. They would be sorely taxed."

"It will cost a lot of money to retrain these 'button pushers' to fly again, ..."

"It appears as if the sheer complexity of the systems masked the simplicity of what was really going on. "

"Just so I understand correctly, then there is no way to take direct control of the aircraft unless the computer itself decides to let you, or perhaps more correctly stated, decides you should. Sounds like Skynet in "The Terminator". "


This accident is a very complex one. It is not going to be easy to understand why the plane really came down. It will take time to analyse the data thoroughly and to understand why highly trained pilots pulled the nose up when the stall alarm went off. The theory is that they must have received a large volume of highly confusing information in order to do so. Apparently, they managed to crash a flyable aircraft.



We have our own view on the nature of the problem, if not on its cause. We believe that the excessive complexity of the system is to blame. Modern aircraft carry over 4 million lines of code. That is a huge amount of real-time code. The code, organised into modules, runs in a myriad of modes: "normal law", "alternate law", "approach", "climb", etc. The point, however, is this. No matter what system you're talking about, high complexity manifests itself in a very unpleasant manner - the system is able to produce surprising behaviour. Unexpectedly. In other words, a highly complex system can suddenly switch its mode of behaviour, often due to minute changes in its operating conditions. When you manage millions of lines of code and, in addition, you feed faulty measurements of speed, altitude, temperature, etc. into the system, what can you expect? But is it possible to analyse the astronomical number of conditions and combinations of parameters that a modern autopilot is ever going to have to process? Of course not. The more sophisticated a SW module is - number of inputs, outputs, IF statements, GOTOs, reads, writes, COMMON blocks, lines of code, etc. - the more surprises it can potentially deliver.

But how can you know if a piece of SW is complex or not? Size is not sufficient. You need to measure its complexity before you can say that it is highly complex. We have a tool to do precisely that - OntoSpace. It works like this. Take a SW module like the one depicted below.

It will have a certain number of entry points (inputs) and produce certain results (outputs). The module is designed on the assumption that each input will lie within certain (min and max) bounds. The module is then tested in a number of scenarios. Of great interest are "extreme" conditions, i.e. situations in which the module (and the underlying algorithms) and, ultimately, the corresponding HW system is "under pressure". The uneducated public - just like many engineers - believes that the worst conditions are reached when the inputs take on extreme (min or max) values. This is not the case. Throw hundreds of thousands or millions of combinations of inputs at your SW module - you can generate them very efficiently using Monte Carlo Simulation techniques - and you will see extreme conditions that do not involve end values of the inputs emerge by the dozens. Once you have the results of a Monte Carlo sweep, just feed them into OntoSpace. An example with 6 inputs and 6 outputs is shown below.
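
The claim about extreme conditions away from the end values can be illustrated, independently of the figure, with a minimal sketch. Everything in it is an assumption made for illustration: `toy_module` is a hypothetical two-input stand-in for a SW module, the bounds are arbitrary, and plain uniform random sampling is used. It only shows that a sweep restricted to min/max combinations can miss the worst case that a Monte Carlo sweep finds in the interior of the input ranges.

```python
import random

# Hypothetical input bounds (min, max); not taken from any real module.
BOUNDS = [(0.0, 1.0), (0.0, 1.0)]


def toy_module(x, y):
    """Hypothetical stand-in for a SW module with an interior peak."""
    # The response is largest near mid-range and zero at every corner.
    return 4.0 * x * (1.0 - x) + 4.0 * y * (1.0 - y)


def corner_sweep(func, bounds):
    """Evaluate func only at combinations of min/max input values."""
    corners = [[]]
    for lo, hi in bounds:
        corners = [c + [v] for c in corners for v in (lo, hi)]
    return max(func(*c) for c in corners)


def monte_carlo_sweep(func, bounds, n_samples, seed=0):
    """Evaluate func at random points inside the bounds.

    Returns the largest output seen and the inputs that produced it.
    """
    rng = random.Random(seed)
    worst_value, worst_inputs = float("-inf"), None
    for _ in range(n_samples):
        point = [rng.uniform(lo, hi) for lo, hi in bounds]
        value = func(*point)
        if value > worst_value:
            worst_value, worst_inputs = value, point
    return worst_value, worst_inputs


corner_worst = corner_sweep(toy_module, BOUNDS)
mc_worst, mc_inputs = monte_carlo_sweep(toy_module, BOUNDS, n_samples=10_000)
print("worst output at min/max corners:", corner_worst)
print("worst output in Monte Carlo sweep:", round(mc_worst, 3))
print("inputs at that point:", [round(v, 3) for v in mc_inputs])
```

In this toy case the corner sweep sees nothing at all, while the random sweep locates a peak near the middle of both ranges. A real module would of course have many more inputs and a far less tidy response.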

The module, composed of four blocks (routines), has been plugged into a Monte Carlo loop (Updated Latin Hypercube Sampling has been used to generate the random values of the inputs). As can be observed, the module obtains a 5-star complexity rating. Its complexity is 24.46. The upper complexity bound - the so-called critical complexity - is equal to 34.87. In the proximity of this threshold the module will deliver unreliable results. Both these values of complexity should be specified on the back of every SW DDD or ADD (Detailed Design Document and Architectural Design Document). So, this particular module is not highly complex. The idea, of course, is simply to illustrate the process and to show a Complexity Map of a SW module. In other words, we know how to measure the complexity of a piece of SW and to measure its inclination to misbehave (robustness).
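
For readers unfamiliar with the sampling scheme mentioned above, here is a minimal sketch of plain Latin Hypercube Sampling in pure Python. It is not the "Updated" variant used for the example in this post, and it is not how OntoSpace generates or processes its data. The function name `latin_hypercube`, the six input bounds and the sample count are all assumptions chosen for illustration. The idea is simply that each input range is cut into as many equal strata as there are samples, and each stratum is visited exactly once.

```python
import random


def latin_hypercube(bounds, n_samples, seed=0):
    """Plain Latin Hypercube Sampling (illustrative, not the 'Updated' variant).

    Each input range is divided into n_samples equal strata, and every
    stratum is used exactly once for that input.
    """
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        # One random point inside each of the n_samples strata of [0, 1).
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so the stratum order differs from one input to the next.
        rng.shuffle(unit)
        columns.append([lo + u * (hi - lo) for u in unit])
    # Transpose per-input columns into per-sample rows.
    return [list(row) for row in zip(*columns)]


# Six hypothetical inputs, all assumed to lie in [0, 1] for illustration.
INPUT_BOUNDS = [(0.0, 1.0)] * 6
samples = latin_hypercube(INPUT_BOUNDS, n_samples=10)
for row in samples:
    print([round(v, 2) for v in row])
```

Each printed row is one combination of inputs to feed to the module under test. Compared with purely random sampling, this stratification spreads a given number of samples more evenly over every input range.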
 

But how complex is a system of 4 million lines of code? Has anyone ever measured that? Or its capacity to behave in an unexpected manner? We believe that the fate of AF447 was buried in the super-sophisticated SW which runs modern fly-by-wire airliners and which has the hidden and intrinsic ability to confuse highly trained pilots. You simply cannot, and should not, design highly sophisticated systems without keeping an eye on their complexity. Imagine purchasing an expensive house without knowing what it really costs, or embarking on a long journey without knowing how far you will need to go. If you design a super-sophisticated system and you don't know how sophisticated it really is, it will one day turn its back on you. It sounds a bit like buying complex derivatives and seeing them explode (or implode!) together with your favourite bank. Sounds familiar, doesn't it?



www.ontonix.com


www.design4resilience.com