To follow up on one of my previous blogs (Component Count & Reliability), let's dig into the numbers a bit more. We'll use a simple example: consider a device that has just 1,000 things that can go wrong with it, which is not even as complex as a 4-bit MCU. Let's say that each thing has a 0.1 percent probability of going wrong per hour. What does this tell you?
Well, it tells you that, statistically, you should expect about one thing to go wrong every hour you try to use it (1,000 × 0.001 = 1 expected failure per hour)! Obviously, not a very happy experience for you if it were a WiFi router, cellphone, or any other modern convenience.
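To make that arithmetic concrete, here is a minimal Python sketch. It assumes the 1,000 failure modes are independent of one another, which the example above does not state, and the function and variable names are made up for illustration.

```python
# Numbers from the example: 1,000 things, each with a 0.1 percent
# chance of going wrong in any given hour.
# Assumption (not from the original text): the failures are independent.
n_things = 1000
p_fail_per_hour = 0.001


def expected_failures(n, p):
    """Average number of things that go wrong per hour."""
    return n * p


def prob_at_least_one(n, p):
    """Chance that at least one thing goes wrong in a given hour."""
    return 1.0 - (1.0 - p) ** n


print("expected failures per hour:", expected_failures(n_things, p_fail_per_hour))
print("P(at least one failure in an hour):",
      round(prob_at_least_one(n_things, p_fail_per_hour), 3))
```

Under that independence assumption, the average works out to one failure per hour, and the chance of getting through any particular hour with at least one failure is roughly 63 percent.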
We still often see advertisements for six-sigma quality people on sites like Monster or LinkedIn, and these mostly have to do with manufacturing products. At a previous job, I was in charge of support engineering for wireless phone production on a line that was turning out 200k phones a month, which made it a slow mover. It was manufactured to seven-sigma for the design. First Pass Yield at final test was nothing like seven-sigma, and fall-out and early returns usually ran in the double-digit percentages. Why is this?
It was a very complex product boiled down to about half a dozen ICs and RF semiconductors: an ASIC that included two fast, low-power DSPs and a fast ARM processor, plus a large amount of cache and RAM; almost as much flash as the address space of a high-end PC of the late 1980s; and complex mixed-signal ASICs for RF and baseband.
We got many chipsets that would pass all tests and even V&V (verification & validation) qualifications, and then fail one function or another in a full-up phone. These were escapes from all of the silicon vendors' testing. We would re-ball the chips and put them back in the problem phone, and that one function would not work. It was always being exercised in some way that was different from how the ATE and the qualification tests exercised it. Thousands of prototypes had been built, tested, and alpha- and beta-tested all over the country, and all of that failed to uncover this small flood of “ankle biter” problem phones that would come across my desk.
Modern FPGAs and SoCs exceed this level of complexity. There are very complex mixed-signal ASICs and SoCs on the market in high-volume products. Modern PC processors exceed 1 billion transistors! ICs aren't built on perfect processes in perfect clean rooms. The processes are good enough to do the job well, so yield is not too bad at wafer probe and at test, and the parts are a good economical solution in a high-volume market.
In making an IC, the process can drift toward the marginal side, driving yield down and escapes up. Also, impurities are always present in the clean room. The IC is really like a 20-layer circuit board sitting on top of a bunch of components formed in the board material. Any little speck of stuff in the clean-room air can narrow a line, widen a line, weaken a via, or drive up its resistance. Contaminants can also alter the N and P regions in the features, causing a device to switch poorly in a digital gate or to have altered parameters in an analog feature, among other “bad” things.
Given a billion transistors and about three billion interconnects in a PC processor, is even nine-sigma enough, when each of these four billion items can have many parameters that can fail in many different ways? What is your experience? Some companies will even sell you chips, for less money, that don't pass all the tests, as long as they work with your present version of code. What is your take on all this six-sigma stuff?
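For anyone who wants to play with that question, here is a rough Python sketch. It rests on several assumptions that are mine and not from the text above: a sigma level is mapped to a per-item defect probability using the one-sided tail of a normal distribution, the conventional 1.5-sigma long-term shift is optional, every item fails independently, and each item has just one parameter that matters. The function names are hypothetical.

```python
import math


def tail_probability(sigma_level, shift=0.0):
    """One-sided normal tail beyond (sigma_level - shift) standard deviations.

    Assumption: this is the per-item defect probability for a process
    running at the given sigma level.
    """
    z = sigma_level - shift
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_defects(n_items, sigma_level, shift=0.0):
    """Average number of defective items on one chip."""
    return n_items * tail_probability(sigma_level, shift)


def prob_all_good(n_items, sigma_level, shift=0.0):
    """Chance that every item on the chip is good (independence assumed)."""
    p = tail_probability(sigma_level, shift)
    # log1p keeps precision when p is extremely small.
    return math.exp(n_items * math.log1p(-p))


# About 1 billion transistors plus about 3 billion interconnects.
n_items = 4_000_000_000

for level in (6, 7, 9):
    for shift in (0.0, 1.5):
        print(f"{level}-sigma, shift {shift}: "
              f"{expected_defects(n_items, level, shift):.3g} expected defects, "
              f"P(all good) = {prob_all_good(n_items, level, shift):.6f}")
```

With the 1.5-sigma shift, six-sigma corresponds to the familiar figure of about 3.4 defects per million, which on four billion items is roughly 13,600 expected defects per chip. To model many parameters per item, multiply `n_items` by however many parameters you think each transistor or interconnect really has.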