Sunday, May 19, 2013

AF447 -- The Final Report

"To an even greater extent than the sea, the sky is incredibly unforgiving of human carelessness, incapacity, or neglect." (Unknown)

With little notice, last year the French Civil Aviation Safety Investigation Authority released the final report on AF447. (I have written about the mishap investigation here and here.)

First, a few words about how crash investigations proceed. They start with an initial report, which primarily serves as an official notice of the accident and known circumstances. Then, depending on the severity of the crash and the complexity of the investigation, there will be one or more interim reports. Their purpose is to provide the accumulated list of facts. The final report, based upon the accumulated evidence, provides a theory of the accident that incorporates the facts, along with recommendations to prevent similar mishaps.

At over 250 pages, the report is definitely not something you would look to for light beach reading. Nor, as I am about to demonstrate, is it a natural fit for a blog post that won't soon remind readers that the internet is indeed big, then shortly thereafter convince them to direct their attention somewhere, anywhere else.

In order to avoid various and tedious means of citation, I will simply preface everything I sourced from the final report with (FR). Text prefaced with (DE) is descriptive for a non-specialist audience. Everything else is in my humble, but very expert, opinion. Since the report follows a specific format, which is organizational rather than narrative, to a great extent this analysis will, too.

For those unwilling to subject themselves to a slog, I'll cut to the chase. I thought the report failed to understand the underlying cause of the mishap, engaged in unwarranted speculation, completely missed a few "wait, what?" moments, and didn't question existing procedures. Now, the slog.

In order to substantiate those criticisms, I'm going to become unavoidably abstruse.

Mishap Summary

While flying through an area with super-cooled water droplets, AF447 lost all airspeed indications due to icing of the ram air pressure sensing devices. The flying pilot (PF) then commanded full nose up, which resulted in the aircraft climbing outside its flight envelope, whereupon it entered an aft-stick stall. Neither the PF nor the monitoring pilot (PM) recognized the stall. The aircraft remained in the aft stick stall until impact.

History Bites

(FR) With respect to the A330, there had been 13 previous incidents sufficiently well documented for analysis and comparison. In all cases, unintended altitude variations were less than 1000 feet. In five cases, crews deliberately descended up to 3500 feet in response to stall warnings; all but one of those warnings were due to a combination of flight control reversion to Alternate Law and turbulence. (Alternate Law is a degraded flight control mode that uses arbitrary values instead of air data inputs to the flight control computer; also, in AL there are no flight envelope protections.) Within that seemingly benign group, though, is one instance where the crew made inappropriate high-amplitude control inputs, sometimes from both pilots, over four minutes. The inputs, although extreme, weren't sustained, and the altitude deviations were less than 600 feet.

(FR) None of the affected crews applied the memory items from the unreliable airspeed procedure. They didn't manually disconnect the flight directors, disengage autothrust, or set the pitch attitude to 5º per the procedure.

At this point, someone — heck, everyone — should be raising their hands. What are memory items, and what good are they?

(DE) Memory items are procedural steps to a small number of emergencies considered too time critical for reliance on the Quick Reference Handbook. (The QRH is largely dedicated to abnormal conditions. It also has supplemental checklists for normal but non-routine operations, and tabular performance data.)

While seemingly sensible, memory items disregard human limitations when responding to extremely rare events. If the response is simple (e.g., loss of cabin pressure, put on the O2 mask), they are unnecessary. Where the required procedure is more extensive, it is extremely unlikely that the pilots will be able to fly/monitor the airplane while reciting a frequently verbose laundry list during a situation completely hostile to that very thing.

(DE) My previous airline acknowledged that fact by putting a red bordered card on the glare shield containing the immediate action steps for time critical situations. What the pilots were required to memorize was merely which abnormals were on the red border checklist: the flying pilot would direct the monitoring pilot to read the appropriate critical action items, in case the non-flying pilot wasn't already doing so.

The universal failure to apply what was a list of memory items for the A330 should be waving a giant red flag, already clearly visible for those not thoroughly stunned by habit, that history has comprehensively rejected the entire concept. (My current airline is similarly stunned.) If the crew had a red-bordered card with a list of steps to be read in the event of any problem with a primary flight instrument, you probably would never have heard of AF447. The report frequently refers to the "startle effect" such situations create, while simultaneously seeming oblivious to the obvious implications for the very notion of memory items.

When you need a pilot, you don't want an operator

(DE) Stripped to its essence, piloting an airplane requires three things: continuous and accurate situational awareness; decisions based upon that awareness about what to do with the airplane; and implementing those decisions. A popular term for this is OODA Loop: Observe, Orient, Decide, Act. All pretty self-evident and, prior to glass cockpit airplanes, more or less essential.

(DE) Enter the FMS and FCC, TLAs for the Flight Management System and the Flight Control Computer. In modern airplanes, the FMS maintains positional "awareness", and calculates the vector required to null the difference between the current and required position and speed. When the autopilot is engaged, the FCC then moves the flight controls and throttles in order to null the difference between the aircraft vector and the FMS commanded vector. [Note: The FMS actually works in mixed modes: following programmed route, altitudes and speeds; various permutations with pilot assigned heading, altitude, and speed; or, solely pilot assigned heading, altitude and speed. Full automation is common during climbout and en route; a combination mixed mode and solely pilot assigned during arrival.]

(DE) From the pilot's point of view, the FMS commands are shown by the Flight Director (FD) vertical and horizontal (fly-to) bars on the Primary Flight Display. Centering the fly-to bars, whether via the FCC or manual flight control input, creates the "nulling" vector.

There are two serious problems here. 1. The FMS and FCC are very good. 2. They are very reliable. Wait. Problems … how?

Yes, that is counterintuitive. But the generally accurate guidance and compliance of modern auto flight systems has meant there is almost no externally imposed incentive to not rely upon them. Moreover, the continual presence of FMS flight guidance, especially when hand flying, has removed the piloting from flying. Remember, part — by far the biggest — of being a pilot requires deciding, continuously and in detail, where the airplane is, what the airplane needs to do, and what it can do. Manipulating the flight controls is the relatively trivial consequence of everything else.

However, as long as the FD is in view, the FMS has taken the OOD out of OODA. Even when hand flying, pilots are merely being self-propelled FCCs, doing what the FMS tells them to do. So long as the FD is working, which is almost always, it makes all the control (attitude, power) and performance (heading, horizontal & vertical speed) instruments practically redundant. Pilots don't decide, for instance, what thrust setting and pitch attitude produces the required horizontal and vertical speeds; instead, they leave speed control to the autothrust system, while centering the FD pitch and roll command bars. The consequence of extensively relying upon the FD is that it is easy for pilots to stop deciding for themselves what the airplane should be doing, which is a long step towards losing sight of what it both must and can do.

When glass cockpit (shorthand for FMS/FCC) airliners first arrived, the operational philosophy — initially — was to always operate them at the highest possible level of automation consistent with the situation. However, what should have been apparent at the outset became more or less quickly, depending on the airline, glaring enough: flying skills were deteriorating. Various airlines addressed this in various ways: scarcely; a fair amount, but not enough; and just right.

Northwest Airlines, where I flew the A320 more than a decade ago, had by then taken the lesson fully on board. The Operations Manual actively encouraged, when conditions were permissive (light traffic, good weather, on your A game), doing the flying version of the full Monty. Not just turning off the autoflight and autothrust systems, but also the FD. Probably half the guys I flew with took the opportunity at least once per trip. The other half either didn't feel the need, or doubted their abilities.

My current employer, where I have been for six and a half years, started near the Air France position, and as the consequence of some painful (but not infamous) experiences, shifted towards the position Northwest had long since adopted. Flight Ops hasn't quite gotten to the point of actively encouraging the full flying Monty, but at least it is no longer prohibited, as it was up until a couple years ago. Most pilots at my base, but, oddly, far fewer at my company's main hub, hand fly from takeoff through about 20,000', and the last ten minutes or so of the flight. In my experience, almost none (one Capt in the last couple years, and yours truly) routinely do the full flying Monty. (Note: glass cockpit airplanes in some respects have higher workloads than their steam gauge predecessors. In the terminal environment, for reasons that don't bear going into, the monitoring pilot's (PM) workload increases significantly if the flying pilot (PF) is hand flying. This means in a busy terminal environment, most pilots will have AFS fully engaged to better distribute tasking.)

Air France (SFAIK, I am speculating a bit here) didn't. Until AF447, their philosophy was always maximum automation. The consequence was an airline staffed more by operators than pilots. Indeed, a design goal of the A320/330/340/380 (A32+) series was the elimination of pilot skill, to the point that when first writing the A320 flight manuals, Airbus wanted to use the term "flight manager" instead of "pilot". Their flight test department saw that one off, but the underlying assumptions remained.

Nettle, ungrasped

A final report is supposed to contain data pertinent to the mishap, and causal theories that explain the data. It should not engage in speculation, unless there are otherwise unbridgeable gaps in the data; that is not the case with AF447. Yet the report does just that. It invokes something it calls "the startle effect", and combines that with an excessive emphasis in training about over-speeding the airplane to produce an explanation of the PF's reactions. Of course the startle effect exists: it happens every time something goes bang in flight. Yet virtually all flights with bangs don't end up in pieces, so this is an explanation that, on its own, explains nothing.

(FR) In fairness to the report, it does go into some detail about how Air France's training with regard to over-speed was, pretty much by definition, excessive: it was wrong. The A32+ aircraft do not have enough power in level flight to overspeed the A32+ airfoil. Additionally, the report found that there was no discussion of stall recognition or recovery in the manual. Since one of the design features of the A32+ is extensive flight envelope protection, it isn't possible to stall the airplane. Clearly, that was at least one presumption too far.

However, it is an unacceptable leap to then explain the flying pilot's comprehensively incorrect reaction to the loss of airspeed information as being motivated by his conclusion that the airplane was in the process of exceeding its maximum operating Mach.

Rather, the proximate cause is a quite simple nettle the Final Report leaves firmly ungrasped. The only explanation for the PF putting the aircraft out of control, and the PM failing to satisfactorily notice and correct that situation is that neither pilot knew how to fly an airplane.

Consequently, the PF was completely incapable of performing the very first step of any abnormal situation: maintain aircraft control. He was unable to correctly identify not just the proper pitch attitude for level flight, but also a pitch attitude wildly inappropriate at cruise altitude.

He compounded this problem, already easily bad enough, by lacking any apparent knowledge of the basic physics of flight. The airplane was in level, unaccelerated, flight when the pitot systems went tango uniform. Therefore, it was not possible for the airplane to stall or overspeed. To non-pilots, the first reaction must be "that's crazy talk". But it's true. If the PF had firewalled the throttles, rather than accelerate, the airplane would have climbed. If he had instead pulled the throttles back, it would have descended. (A simplification, but close enough for this discussion.)

Nearly as appalling, neither pilot had any awareness of the airplane's performance margin at cruise altitude. At the start of the mishap sequence, AF447 was flying at, or very near to, its optimum altitude. That means, among other things, that the difference between cruise thrust and max thrust is very small. That means the aircraft only has enough excess power available to maintain horizontal speed at a vertical speed of roughly 1000' per minute. Consequently when climbing to a new optimum cruise altitude — which typically happens up to a half dozen times during a long flight as the gross weight decreases — the pitch change is a degree and a half, or less. Anything greater than that must result in a loss in airspeed.
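For anyone who wants to check the arithmetic, here is a minimal sketch of the geometry behind that "degree and a half, or less". It assumes a true airspeed of about 490 knots (an illustrative cruise value, not a figure from the report) and treats the change in flight path angle as equal to the change in pitch, which is only true if angle of attack stays constant. The function name and constant are mine.

```python
import math

# One knot expressed in feet per minute (1 nautical mile = 6076.12 ft).
KNOT_TO_FT_PER_MIN = 6076.12 / 60.0


def climb_angle_deg(true_airspeed_kt: float, vertical_speed_fpm: float) -> float:
    """Flight path angle, in degrees, for a steady climb at constant true airspeed.

    Simplification: still air, and the vertical speed is the vertical
    component of the true airspeed vector.
    """
    airspeed_fpm = true_airspeed_kt * KNOT_TO_FT_PER_MIN
    return math.degrees(math.asin(vertical_speed_fpm / airspeed_fpm))


# Assumed cruise true airspeed of 490 kt, climbing at 1000 ft per minute.
angle = climb_angle_deg(490.0, 1000.0)
print(f"{angle:.2f} degrees")
```

Under those assumptions the answer comes out a little over one degree, which is why any pitch change much bigger than that has to be paid for in airspeed.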

To show what I mean, here is a picture of the MD11 primary flight display at optimum cruise altitude. The A330 display is similar:

(DE) The black dot in the center of the artificial horizon (the aircraft reference) shows the FCC nulling the FMS commands. The aircraft attitude is ~1º right bank and 3º pitch. The black bars on either side are the aircraft wings. Just above them are barbed cyan lines. These are the pitch limit indicator. The lines show the angle of attack remaining — 2º — to stickshaker. The barbs show the AOA remaining to stall, 4º. The aircraft speed is 294 knots (338 mph) indicated, and Mach 0.823, which is about 490 knots true (about 560 mph over the ground in still air). At 294 knots, the airplane is 20 knots above the 1.3G buffet limit, and 18 knots below max operating Mach at 32,960 feet.

(It is perhaps worth noting, although the report doesn't, that the A32+ doesn't have a pitch limit indicator. This is symptomatic of a design philosophy that seems to have pervaded the Airbus fly-by-wire aircraft: because FBW control systems are able to extensively incorporate flight envelope protection, the airplane will always know better than the pilot.)

(DE) In this regime, the airplane is operating in what is sometimes referred to as the "coffin corner". The gap between too fast and too slow is quite small. In your car, it would be like if you let your speed slow to 65 from 70, the doors would fall off; if you accelerated to 75, the wheels would; and if you turned the steering wheel more than a couple degrees, you would end up in a ditch.

(FR) When the pitot system succumbed to icing, the pilot flying (PF) applied full aft stick, pulling the aircraft to well over 10 degrees nose high, a pitch attitude that first caused a gross altitude deviation, then inevitably and fairly quickly, an aft-stick stall.

The excessive nature of the PF's inputs can be explained by the startle effect and the emotional shock at the autopilot disconnection, amplified by the lack of practical training for crews in flight at high altitude, together with the unusual flight control laws.

(FR) The list of explanations is: distraction; unconscious initiation of a previous plan to climb above the weather; the attraction of clear sky (the aircraft was flying at the edge of the cloud layer); task saturation; the turbulence at the time (which was at the upper limit of what is defined as "light"); concern about overspeeding the airplane; setting a pitch attitude appropriate for airspeed failure at low altitude, where avoiding the ground is the highest priority; and responding to contradictory flight director commands.

To do this in the first place is so incomprehensible to me that I can't think of a metaphor, simile or analogy that comes even close to conveying both the required ignorance and ineptitude. The report's list of explanations sidesteps the elephant in the room, which is this: somehow there were two guys on the flight deck who completely lacked basic airmanship. In particular, the PF failed to observe, orient and decide. The combination of a highly automated airplane and an operations culture that emphasized total reliance on the autoflight system produced a pilot who could only act, which is the same as saying he wasn't a pilot at all, but merely an ambulatory stick actuator.

(FR) The PM initially noted the pitch attitude and altitude deviation, but then failed to perform his primary duty: monitoring aircraft performance, announcing deviations to the PF, and ensuring the PF corrects the deviations. Any competent PM, when faced with the PF's pitch input, would have dropped everything and directed the PF to the correct pitch attitude and altitude.

Other Issues

(DE) Until the A32+ series, aircraft design required positive longitudinal stability. That is, putting the center of lift (CL) far enough aft of the center of gravity (CG) so that a pitch change in one direction creates a torque around the CG in the other. There are two goals: keeping the airplane from being too sensitive in pitch and, second, to make the airplane speed stable. The distance between the CG and the CL creates an upward vector that, left unbalanced, would pitch the nose down. The horizontal stabilizer, with a downward lift vector, is the balance. However, the cost is the wings having to support both the aircraft weight, and the effective weight of the balancing vector. More weight means more lift required which means more drag which means more fuel burn.

(FR) [In alternate law] the airplane does not have positive longitudinal stability — it is not necessary to make or increase a nose-up input to compensate for a loss of speed while maintaining altitude.

This behavior, even if it may appear contrary to [certification provisions] was judged acceptable by taking into account special conditions and interpretation material. Indeed, the presence of flight envelope protection makes neutral static stability acceptable. The specific consequence of [alternate law is that the airplane will stall if there is insufficient thrust to maintain level flight, without any flight control input]. It appears this absence of positive static stability could have contributed to the PF not identifying the approach to stall.

In other words, the A32+ is designed in a way that would be unacceptable in a non-FBW airplane. Which is just fine, until your FBW airplane isn't. This isn't to say the A32+ are un-flyable in alternate law; rather, flying them there means maintaining the proper pitch attitude, which one must know in the first place. By contrast, stalling a conventional aircraft requires a concerted effort both to get it, and keep it, there.

(FR) In the absence of speed indications, stall warning consists only of a synthetic "STALL, STALL" and an illuminated master caution light, which generally illuminates for many other reasons. The flight manual makes no mention of airframe buffet associated with stall. There is insufficient awareness of the proximity of the stall angle of attack when cruising at high altitude.

This is another sign that Airbus didn't take seriously enough the possibility of ending up in alternate law. Not only do conventional airliners have positive stability, they also have impossible-to-ignore stick shakers that kick in when the airplane approaches stall.

As day to day hands-on-flying goes, the A320 is outstanding, and not just because of FBW. Flying with a stick is far more precise than a yoke. But, as previously noted, one consideration to keep firmly in mind is that without static stability, the airplane will not seek flying speed, which makes intrusive stall warning even more important. Why the designers could get away without adding haptic feedback is a mystery the report never mentions, because it never notes the shortcoming in the first place.

(FR) While on the subject of things bizarre and ignored, the FR noted in explaining the mishap sequence that the stall warning, triggered by the Angle of Attack (AOA) approaching the stall AOA, silenced when the indicated airspeed was less than 60 knots.
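The gating described above can be sketched in a few lines. This is my own illustration of the logic, not Airbus code: the function names, the stall AOA, and the sample readings are all made up for the example. The only number taken from the report is the 60 knot cutoff.

```python
# Indicated airspeed below which, per the report, the warning went silent.
MIN_VALID_IAS_KT = 60.0


def stall_warning_speed_gated(aoa_deg: float, stall_aoa_deg: float, ias_kt: float) -> bool:
    """Warning logic as the report describes it: AOA is ignored below a minimum airspeed."""
    if ias_kt < MIN_VALID_IAS_KT:
        return False  # AOA treated as invalid, so the warning is silenced
    return aoa_deg >= stall_aoa_deg


def stall_warning_aoa_only(aoa_deg: float, stall_aoa_deg: float) -> bool:
    """Hypothetical alternative: the warning depends on angle of attack alone."""
    return aoa_deg >= stall_aoa_deg


# Illustrative deep-stall readings: very high AOA, indicated airspeed collapsed.
print(stall_warning_speed_gated(aoa_deg=40.0, stall_aoa_deg=10.0, ias_kt=45.0))
print(stall_warning_aoa_only(aoa_deg=40.0, stall_aoa_deg=10.0))
```

With the gate in place, the deeper the stall and the lower the indicated airspeed, the quieter the cockpit gets.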

Now, while this isn't typically apparent in transport aircraft, there is no direct relationship between airspeed and angle of attack. That is (and I have done this), it is possible to have both zero airspeed, and not be stalled: just get the airplane going straight up. Tying AOA validity to some minimum airspeed value is a silly schoolboy error. Not only does it ignore the physics of flight, it also means that multiple simultaneous airspeed failures also take away AOA. Which segues nicely to …

Remember where you first read this

The report recommended making angle-of-attack information available: "Only a direct readout of the angle of attack could enable crews to rapidly identify the aerodynamic situation of the airplane and take the actions that may be required. Consequently, the BEA recommends that the EASA and FAA evaluate the relevance of requiring the presence of an angle of attack indicator directly accessible to pilots on board [sic] aeroplanes."

For long time reader-sufferers, I wrote this very thing years ago. (Along with criticizing the lack of flying skills among FMS pilots.)

And it is easy to do. Here is how the logic goes: If true airspeed differs from ground speed plus the wind component (both sensed by the inertial platform) by more than (system tolerance) then replace airspeed on the primary flight display with AOA.
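Spelled out as a rough sketch, with everything in it an assumption on my part: the function names, the 20 knot tolerance, the sign convention (headwind positive), and the idea that the wind figure is independent of the suspect airspeed reading (say, the last value held before the disagreement appeared). No real avionics API is implied.

```python
from typing import Tuple


def airspeed_is_plausible(
    true_airspeed_kt: float,
    ground_speed_kt: float,
    headwind_kt: float,
    tolerance_kt: float = 20.0,  # placeholder for "system tolerance"
) -> bool:
    """Cross-check sensed true airspeed against ground speed plus the wind component."""
    expected_tas_kt = ground_speed_kt + headwind_kt
    return abs(true_airspeed_kt - expected_tas_kt) <= tolerance_kt


def select_display(
    true_airspeed_kt: float,
    ground_speed_kt: float,
    headwind_kt: float,
    aoa_deg: float,
    tolerance_kt: float = 20.0,
) -> Tuple[str, float]:
    """Return what the speed tape should show: airspeed if credible, otherwise AOA."""
    if airspeed_is_plausible(true_airspeed_kt, ground_speed_kt, headwind_kt, tolerance_kt):
        return ("AIRSPEED", true_airspeed_kt)
    return ("AOA", aoa_deg)


# Healthy pitot data: sensed airspeed agrees with the inertial picture.
print(select_display(488.0, 470.0, 20.0, aoa_deg=2.5))
# Iced pitot tubes: sensed airspeed collapses while ground speed does not.
print(select_display(60.0, 470.0, 20.0, aoa_deg=2.5))
```

The point of the sketch is only that the comparison is a subtraction and a threshold; picking a sensible tolerance and handling stale wind data are the parts that would need real engineering.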

Some parting shots

For those sufficiently geeked out, the AF447 Final Report is very interesting because it lays out the whole process of investigating this tragedy. And, ultimately, it does make some good recommendations in various areas.

However, to my eye those who wrote the report went to amazing, but unconscious, lengths to avoid making eye contact with what was staring them right in the face: Air France has pilots whose flying skills aren't just weak, they are non-existent. Further, that situation is due to a culture that extols planning — no need to think for yourself, flight envelope protection will always be there — while failing to comprehend the possibility of plans failing, even when it had happened numerous times.

It's almost a metaphor.