Note: this is a longer version of a column that appeared in EE Times, June 12, 2006
We all know that software has bugs, even after diligent testing. PCs crash, cell phones hang, and we do those soft and hard resets to get things going again. The impact ranges from minor annoyance to major hassle, and can even be life-threatening. Authors such as Jack Ganssle, and publications and web sites such as Embedded Systems Design (www.embedded.com), have discussed the problem extensively, along with ways to minimize it.
But I was truly scared, yet only a little surprised, when I read that today's commercial airliners, with their millions of lines of code supporting navigation, fly-by-wire, and hands-off automated everything, can suffer major software glitches. The Wall Street Journal, May 30, 2006, looked at the problem ("Incidents prompt new scrutiny of airplane software glitches"), starting with the case of a Boeing 777 which developed a mind of its own and did a power climb and then a dive; it took the pilot 45 seconds to regain control.
When I read stories like this, I mentally retreat to the comfortable niche of electronic hardware, where solid ICs, mostly analog, rule. But it appears I've been living in a fool's paradise. Why? Because of counterfeit electronic components.
A recent EE Times story (“On the trail of counterfeit chips”, May 22, 2006) and IEEE Spectrum story ( “Bogus”, May 2006) showed how counterfeit ICs, batteries, and passive components are worming their way into almost every manufactured product, even mil/aero/hi-reliability ones. The problem is a very tough one to solve, since there are so many ways to get these bogus parts into the supply chain, and so many unscrupulous people willing and able to do so. There is no magic bullet solution.
Ironically, while counterfeit ICs and high-end components offer the most illicit gain, there's good money to be made with fake lowly capacitors and resistors; there are typically 10 to 20 of these passives for each major IC. And worse, the final board and system will likely pass functional tests with those passives, then fail in the field in a few months, due to temperature drift, basic reliability issues, or EMI zapping. So an analog signal chain may be no better than a software-driven system, in terms of actual reliability.
What really scared me was learning that most complex systems really don't have an override, even if they do have a manual override switch. The switch itself is sensed by software, so a serious software glitch or a counterfeit processor means that you can't regain control unless there's a very, very good watchdog function or backup control. I suppose I shouldn't be surprised, since the so-called “on/off” switch on most of today's consumer products doesn't interrupt the power rail, but only signals the device to go into a sleep mode.
Still, I'd feel a lot better if the emergency reset on an aircraft or machinery forced a hard reset, or even better, allowed some basic, direct manual control to resume. Those cable-based mechanical controls and linkages are starting to look very good to me.
Bill Schweber, Site Editor, Planet Analog