FAILURE ANALYSIS WHEN “FAILURE IS NOT AN OPTION” [1]

As in the quotation from the movie “Apollo 13” [2], and in real life, we design things not to fail. But your “friend” Murphy and Murphy’s Law [3] (see Figure 1 below) are out there waiting for you.

Figure 1

Murphy's Law

As a consultant, I get called on during various phases of a project to perform “failure analysis”.

Each phase of a project has its own failure signature patterns. As the bathtub curve in Figure 2 shows, we see more “failures” in the initial design, development, and early production “ramp up” phases.

As these failures are resolved, fewer failures occur throughout most of the product life cycle. Toward the “wear out” or legacy “end of life” builds, failures increase again until the end of the product life.
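The three regions of the bathtub curve can be sketched numerically. The example below is only an illustration: it assumes the common textbook approach of summing three Weibull hazard terms (shape below 1 for early failures, equal to 1 for the flat middle, above 1 for wear out). The function names and every parameter value are hypothetical and are not taken from any real product.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_rate(t):
    """Sum of three hazard terms; all parameter values are illustrative assumptions."""
    early = weibull_hazard(t, shape=0.5, scale=2000.0)      # falling rate: early failures
    constant = weibull_hazard(t, shape=1.0, scale=10000.0)  # flat rate: useful life
    wear_out = weibull_hazard(t, shape=5.0, scale=20000.0)  # rising rate: wear out
    return early + constant + wear_out


if __name__ == "__main__":
    # Print the failure rate at a few operating times (hours, hypothetical).
    for hours in (10, 100, 1000, 5000, 15000, 25000):
        print(f"{hours:>6} h  failure rate = {bathtub_rate(hours):.2e} per hour")
```

With these assumed parameters the rate starts high, drops through the middle of life, and climbs again at the end, which is the shape the curve describes.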

Figure 2

FAILURE RATE BATHTUB CURVE [4]

“Failure analysis is the process of collecting and analyzing data to determine the cause of a failure. It is an important discipline in many branches of manufacturing industry, such as the electronics industry, where it is a vital tool used in the development of new products and for the improvement of existing products. The failure analysis process relies on collecting failed components for subsequent examination of the cause or causes of failure using a wide array of methods, especially microscopy and spectroscopy.” [5]

What is Root Cause Analysis?

Root cause analysis (see Figure 3) can be summarized by a series of questions: What is the problem? Why did it happen? And what will be done to prevent it in the future?

Generally, the root cause failure analysis process can be divided into four parts or steps [6]:

The first is defining the “problem”. For example, the failure and its type (reliability, specification compliance, qualification, design to cost, delivery, functionality, or catastrophic) are typical failure mode signatures.

The second is performing the root cause analysis. Typical Six Sigma techniques such as the 5 Whys, brainstorming, scatter diagrams, flow charts, run charts, histograms, control charts, tree charts, design of experiments, and Ishikawa fishbone diagrams (see Figure 4) are used.

Figure 4

Fishbone Diagram

The third is root cause identification. This is the forensic portion where the underlying cause or causes are identified.

The fourth is generating recommendations and implementing the solution.
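One of the tools named in the second step, the fishbone diagram, is easy to keep as plain data while the team brainstorms. The sketch below is only one possible way to do that: the function name is hypothetical, the category names follow a commonly used fishbone layout, and the example causes are invented for illustration rather than taken from a real investigation.

```python
def fishbone_outline(problem, causes_by_category):
    """Return the diagram as indented text: the problem, then each category and its causes."""
    lines = [f"Problem: {problem}"]
    for category, causes in causes_by_category.items():
        lines.append(f"  {category}")
        for cause in causes:
            lines.append(f"    - {cause}")
    return lines


# Hypothetical brainstorming results; empty lists are branches nobody has filled in yet.
example_causes = {
    "Materials": ["component sourced from a second vendor"],
    "Methods": ["no in-circuit qualification of alternate parts"],
    "Measurement": ["voltage transients never captured on a scope"],
    "Machines": [],
    "People": [],
    "Environment": ["failures cluster at low temperature"],
}

if __name__ == "__main__":
    for line in fishbone_outline("sporadic field failures", example_causes):
        print(line)
```

Keeping the empty branches visible is deliberate: it shows the team which categories have not been examined yet.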

From my experience, here are some recommendations.

The first recommendation is to take good notes, document all experiments, guard all failed units and keep them intact as evidence for as long as possible, assemble a team from the different disciplines (engineering, quality, manufacturing, test, etc.), and summarize the failures and failure modes.

The second is to understand the failure mode. Using a simple chart as shown in Figure 5 can help speed up and direct the failure analysis effort, which can be time consuming. Typically most companies are “under the gun” to resolve the problem quickly, usually because it is a financial burden or because their customer needs product urgently.

The third is to keep an open mind about the root cause(s) and not jump to conclusions. Always ask the question: what changed?

As an example, I was called in when a client was having sporadic catastrophic MOSFET failures on a legacy product. After spinning our wheels for several days, we noticed that the MOSFET vendor had been changed from the original supplier due to delivery problems. Parts from both suppliers met the specified breakdown voltage, but the original supplier’s parts were 20%-30% higher. Of course (Murphy’s Law), the voltage spike in the circuit was just above the new MOSFET’s breakdown voltage but below the original MOSFET’s.

Thus, the root cause was “not specifying the MOSFET breakdown correctly and with adequate margin”, not the fact that purchasing switched to a new vendor. A contributing root cause was not evaluating and qualifying the new supplier in the specific product. The corrective action was to analyze the voltage spike in the circuit to determine an adequate breakdown voltage, and then to specify a MOSFET with that higher breakdown voltage.
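The arithmetic behind this example can be sketched in a few lines. All of the voltages below are hypothetical numbers chosen only to reproduce the situation described; the 20% required margin is likewise an assumed derating policy, not a figure from the actual project, and the function names are illustrative.

```python
SPEC_MIN_BV = 100.0         # V, specified minimum breakdown voltage (hypothetical)
ORIGINAL_ACTUAL_BV = 125.0  # V, original supplier's parts, well above the spec (hypothetical)
NEW_ACTUAL_BV = 102.0       # V, new supplier's parts, just above the spec (hypothetical)
PEAK_SPIKE = 110.0          # V, worst-case spike measured in the circuit (hypothetical)
REQUIRED_MARGIN = 0.20      # assumed policy: keep 20% headroom below the rating


def survives(actual_bv, peak_spike):
    """A part survives only if its real breakdown voltage exceeds the spike."""
    return actual_bv > peak_spike


def margin(rated_bv, peak_spike):
    """Fractional headroom between a rating and the spike; negative means no headroom."""
    return (rated_bv - peak_spike) / rated_bv


def required_rating(peak_spike, required_margin):
    """Minimum rating that keeps the spike below the rating by the required margin."""
    return peak_spike / (1.0 - required_margin)


if __name__ == "__main__":
    print("original part survives:", survives(ORIGINAL_ACTUAL_BV, PEAK_SPIKE))
    print("new part survives:     ", survives(NEW_ACTUAL_BV, PEAK_SPIKE))
    print(f"margin of the specification: {margin(SPEC_MIN_BV, PEAK_SPIKE):+.0%}")
    print(f"rating to specify: {required_rating(PEAK_SPIKE, REQUIRED_MARGIN):.1f} V or higher")
```

The point of the sketch is that the specification itself never covered the spike: the original parts worked only because they happened to exceed it, so the check has to be made against the specified minimum rather than against whatever a given vendor ships.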

With increased integration of analog and digital functions and new microelectronic technologies, our products are becoming very complex. This results in failure modes that are very difficult to analyze and fix, but with Six Sigma tools and a team effort there is always a solution.

References:

[1] Origin of the Quote “Failure is not an option”

[2] Narrative on Apollo 13 Movie

[3] Murphy’s Law

[4] Failure analysis definition

[5] Greg Caswell, IMAPS presentation, April 2, 2013

[6] Root Cause Analysis For Beginners, by James J. Rooney and Lee N. Vanden Heuvel

For Further Reading and Study:

Failure Analysis: A Practical Guide for Manufacturers of Electronic … By Marius Bazu, Titu Bajenescu

Microelectronics Failure Analysis: Desk Reference By EDFAS Desk Reference Committee

Root Cause Analysis: The Core of Problem Solving and Corrective Action By Duke Okes

RCA Root Cause Analysis Excellence in Problem Solving By Deepak Kumar Sahoo

10 comments on “FAILURE ANALYSIS WHEN ‘FAILURE IS NOT AN OPTION’”

  1. eafpres
    April 26, 2015

    Hi Tom, nice article and nice to meet you. Your thoughts triggered a few neurons in my head. First, when I was taught Murphy's Laws by my father, as a young person in the early 70s, there was always one more appended:

     

    “Murphy was an optimist”.

     

    The other thought I had is that a complement to Root Cause is FMEA. I'm not sure whether today's new grads are taught FMEA, but I learned most of what I know of it working as a Tier 1 supplier to automotive. In theory, if your FMEA (Failure Modes and Effects Analysis) is complete and unbiased, the failure you are trying to determine the cause of would already have been predicted. Of course, in the real world this often does not happen. What should happen at this point is that the FMEA for the system under analysis is reviewed and improved with the knowledge of the new failure modes, and that knowledge is propagated to everyone involved in similar designs and incorporated in future FMEAs as best practice.

    In the automotive industry, they have done this for so long that they have a deep understanding of the risk of almost any manufacturing operation and/or component type, developed over decades of such feedback.

  2. Victor Lorenzo
    April 27, 2015

    I would like to know how other companies make the product transition to production.

    In one of my previous job positions we (the designers from the R&D department) were in charge of almost everything from specification and design to first batch production. After that, colleagues from our production department (a sister company) took control of regular production. We had several incidents with several batches, and the main root causes were:

    1) From time to time silicon manufacturers sent notes about product changes or excessively long lead times, and the guys from production decided not to ask for our opinion when looking for replacement parts.

    2) Some components had a high cost due to their requirements and specs, but the person from the purchasing department asked production for a cost reduction, which was handled the same way as in the previous cause. One incident was a real brain-melting task: in place of some NP0 capacitors, they ordered another type (X7R/X5R) with a very poor temperature coefficient and very wide tolerance, so the device (an RFID reader) stopped working below 5 ºC.

    3) Of course we had responsibility in all of that, we did not write a full specification for every component.

    Now I always specify preferred part and two or more alternative parts. Not perfect but results are much better.

  3. eafpres
    April 29, 2015

    Hi Victor. In a previous company the design to manufacturing transition was sometimes a struggle. As you said, we (the design side) were responsible through at least limited production. However, for exactly the reasons you said, components on BOMs could get changed later by purchasing etc. One reason that happened (thinking root cause analysis here) is that design would buy parts from a distributor, often basing the vendor choice (i.e. the manufacturer of the part) on what was available ASAP. Unfortunately, some of those vendors could not meet price when the BOM was handed over to manufacturing and they were trying to meet BOM cost targets. So they would switch to the same part but another vendor. All of our work was in RF, mainly > 1 GHz, and little things matter.

    Over time, several things evolved to improve the situation. One was that purchasing created a function called design sourcing, which allowed us to work with purchasing from the very start of a project. This eventually helped us to live by rules that production could not change parts out without involving design. Related to these processes was the AVL (approved vendor list) which, as we rolled out certain technologies, required design to help sourcing in qualifying new vendors with the needed capability. This came up in PCB raw material suppliers, board build (photo etch/plate/stack up etc.) houses, and even assembly houses (if we were not doing SMT in house for the project). The last was really challenging as we needed to qualify production parts before full ramp up, but often the SMT shops were on the other side of the globe from us. Often this involved a lot of FedEx International Air Freight.

  4. Victor Lorenzo
    April 30, 2015

    Thanks for your comments Blaine.

    We tried to apply the principle of “design for manufacturing” and part of components selection phase was the direct contact with several major distributors (our suppliers) for passives, actives and mechanical parts. One component pre-requisite was availability and MOQ. We used Farnell, RS, Digikey and Mouser only for acquiring some components for prototyping, always after selection was done. We were able to obtain many new components for evaluation through manufacturers sampling programs.

    SMT was kind of a nightmare too. Surprisingly, the shop with the most up-to-date technology was the one with the worst results for us. Fortunately we had ten or more SMT shops less than an hour from our offices. I was able to inspect every first batch in the SMT shop (and did, in fact). That allowed us to correct any manufacturing error detected in this early production phase, as we made an in-depth visual inspection and an extensive hardware test.

    Unfortunately that was not the methodology for boards already in the production phase; after changing the subcontracted SMT shop, the people from production found themselves a couple of times with tens or hundreds of non-functional boards to diagnose and repair.

    Once again, thanks for sharing your experience.

  5. ammarajashekar
    May 5, 2015

    nice one thank you

  6. Tom_Terlizzi
    June 11, 2015

    Victor,

    I have been very busy and didn't notice all these posts.

    FMEAs are very long and tedious. To my knowledge there is no automated way of performing an FMEA. Maybe someone out there in Planet Analog knows of an automated way.

    The results can certainly help the design, but in most cases the FMEA is an afterthought, done after the design is completed.

     I hope to add a page on my web site to collect more links on this topic because in almost any situation “failure is not an option”

    regards

    Tom Terlizzi

    GM Systems LLC

    kings Park, NY

    gmsystems-dot-com

    631-2693820

     

  8. Tom_Terlizzi
    June 11, 2015

    Victor,

    What I have instituted in several companies I have consulted for is a first article or first piece inspection before we build the whole production lot.

    In every case we found minor errors, either by the contract manufacturer or in the OEM's documentation.

    This proverb applies in many phases of life and is so simple but true: “Measure twice, cut once” means you've only got one shot at cutting the piece of wood.

    When I was a boy first learning woodwork my Dad pounded this into my head: you always double check all of your measurements just to be safe.

    The same goes for building hardware.

    thanks for sharing your thoughts with us.

    regards

    Tom Terlizzi

    GM Systems LLC

    kings Park, NY

    gmsystems-dot-com

    631-2693820

     

  9. Tom_Terlizzi
    June 11, 2015

    Victor,

    We all have been burnt by the same situation. Sometimes ICs that are supposed to be “2nd sources” don't work exactly as advertised, or their parameters are skewed to one side of the tolerance. In most cases we don't do a full worst case design as it takes too much time, but Murphy's Law always shows this up. I had this problem with a 2N2222 transistor where the switching times caused havoc. Later we found out that purchasing had switched vendors without telling anyone. So that 2 cent savings cost us $100,000, and we lost one of our key customers because we were late. After that we instituted a policy that all changes had to be evaluated in the circuit, not just “pencil whipped”.

    A painful experience but being cautious is the way to go.

     

    regards

    Tom Terlizzi

    GM Systems LLC

    kings Park, NY

    gmsystems-dot-com

    631-2693820

  10. Tom_Terlizzi
    June 11, 2015

    eafpres1,

    Thanks for your kind comments.

    Your Dad was “right on” in saying that “Murphy was an optimist”.

    “Lessons learned” have to be exported to the team so that we don't keep designing in problems.

    In a recent case study from the automotive world, GM's faulty ignition switch (cost ~57 cents) caused 13 deaths and cost GM over $1.2 billion in recalls (not to mention bad press).

    Top corporate management has to understand when we find a problem that we have to fix it or pay the consequences later on down the road.

    regards

    Tom Terlizzi

    GM Systems LLC

    kings Park, NY

    gmsystems-dot-com

    631-2693820
