Analog Angle Article

Coincidental failures complicate debug

Debug is tough, no doubt. Experienced engineers know that design gets the glory, but the real project battle is at the debug level. Perhaps that's unfortunate and not at all fair, but that's the way it is.

It's hard enough to debug when you're looking at a single fault or problem situation. Is the problem caused by hardware, software, some interaction between the two, a test set-up problem, an inadequate power or voltage rail, something subtle, or something so obvious you can't see it? Is the problem at least consistent and repeatable, or is it a dreaded intermittent that shows up only occasionally?

Even simple designs and problems can really trip you up when two unrelated problems happen to coincide. Here's a mundane yet illustrative example: Last week, a floodlight-style bulb in my kitchen's recessed ceiling fixture burned out, which is no big deal. I bought a replacement, screwed it in, and it flickered, then went poof! Just to make sure it was blown, I checked the dc resistance and yes, it was an open circuit.

I took the replacement bulb back to the store, got another one, brought it home, and screwed it in. This one didn't light up at all. My first thought was that I had somehow happened on a bad batch of bulbs, and I was about to head out for yet another bulb swap. But then my engineering mindset kicked in, and I said to myself, “Wait a minute, don't assume.” I checked this bulb and found a complete circuit internally, so I screwed a non-floodlight bulb into the socket, and it came on just fine.

After a little more thinking and investigation, I found the problem. The center contact “tang” in the bottom of the socket was not touching the bulb's base contact. Perhaps the bulb was slightly out of spec; perhaps the tang was pushed down too far. Either way, it doesn't matter whether the resultant gap is large or almost invisible: it's still an open circuit. I pulled the tang out slightly using a pair of pliers, and the replacement bulb worked fine.

Once again, relatively simple problems occurring at the same time almost had me jumping to a very wrong conclusion. Just think how much more difficult it is when you have unrelated, simultaneous problems due to circuitry, software, noise, and intermittents. Think of how many times you genuinely believe you have located the problem, only to find out later you were wrong.

This is why the lead debugging engineer on the team should get a lot more credit than he or she likely gets, and why assigning a junior member to work without proper guidance and (dare I use the worn-out phrase?) mentoring is a bad idea. Unfortunately, neither more credit nor better mentoring is likely in today's time-pressed, slimmed-down design environment.
