In the everyday world, there is a common saying: “Measure twice, cut once.” Clearly this stresses the importance of checking things before you take action; if you are sloppy with the measurement, it can lead to expensive mistakes.
So, how does this apply to the subject of analog chip design and integration? In the crudest terms, perhaps it means making sure you verify adequately before you commit to taping out the design. But how much verification is enough, and how much is too much? To quote another common saying, “When is it time to shoot the engineer and ship it?” Perhaps more importantly, how do you ensure that the time in verification is spent wisely, especially given the extremely high costs associated with analog verification?
Coming at this as a primarily digital guy, I am always surprised at how ad hoc the analog world seems to be. While many things are abstracted away in the digital world, there are also more discrete functionalities that have to be verified, and this means that choices have to be made. Let me give you an example. Consider a FIFO (first in, first out). It is a way to sequentially store data in a pipeline-like manner; the contents are made available over time in the same order in which they were delivered (hence the name). There are some special kinds of FIFOs that allow out-of-order access (so not truly a FIFO), but I am not talking about those in this context.
We could verify that the FIFO has been in every possible state from empty to full and that each operation has happened at every level of fullness. However, this is wasteful and is unlikely to find bugs efficiently. Problems are most likely to exist in one of three cases: when it is full or nearly full, when it is empty or nearly empty, and when it is somewhere in between. The “somewhere-in-between” covers all states of somewhat empty, somewhat full, and about half full. We now have only three conditions to look for, and a coverage metric is defined for them.
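The three fullness conditions can be sketched as coverage bins. This is a minimal illustration, not production verification code; the function name, the `margin` parameter, and the sampled occupancy values are all hypothetical choices for the sketch.

```python
# Hypothetical coverage bins for FIFO fullness. A sampled occupancy count
# is mapped to one of the three bins described in the text; `margin`
# decides how close to empty/full still counts as "nearly".
def fullness_bin(occupancy, depth, margin=2):
    if occupancy <= margin:
        return "empty_or_nearly_empty"
    if occupancy >= depth - margin:
        return "full_or_nearly_full"
    return "somewhere_in_between"

# Track which bins a simulation run has hit.
hit_bins = set()
for occupancy in [0, 1, 7, 8, 15, 16]:  # occupancies sampled during a test
    hit_bins.add(fullness_bin(occupancy, depth=16))

print(sorted(hit_bins))  # all three bins covered by this run
```

Once a run has hit all three bins, further tests that only revisit the same bins add nothing to the metric.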
Similar metrics are defined for other aspects of the design. Now when a simulation is performed, the metrics are tracked. Test_1 may have verified conditions when the FIFO was full, for example, so that condition does not need to be verified again. When a test does not check off any new metric, it is dropped from future verification runs. In this way a more compact set of tests is created.
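That test-dropping step can be sketched as a simple greedy pass over the regression suite. The test names and metric names below are made up for illustration; a real flow would read this mapping from coverage databases.

```python
# Hypothetical mapping from each test to the coverage metrics it checks off.
test_coverage = {
    "test_1": {"fifo_full"},
    "test_2": {"fifo_empty", "fifo_mid"},
    "test_3": {"fifo_full", "fifo_mid"},  # adds nothing beyond tests 1 and 2
}

def compress(test_coverage):
    """Keep, in run order, only the tests that check off at least one
    metric not already covered by an earlier test."""
    covered, kept = set(), []
    for name, metrics in test_coverage.items():
        new_metrics = metrics - covered
        if new_metrics:
            kept.append(name)
            covered |= new_metrics
    return kept

print(compress(test_coverage))  # test_3 is dropped from future runs
```

This greedy order-dependent pass is only one way to compress a suite; an optimal minimum-test selection is a set-cover problem, but the greedy version captures the feedback loop the text describes.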
Now, if I look at the analog world — especially the integrated analog parts — it would seem as if no measurements are being taken about the effectiveness of any test. Conditions under which the circuit could operate are chosen and the design is put into a particular configuration, but does that run take the design into a different aspect of operation than previous tests have done? There is no feedback in the process and no way to optimize the test suite, so after every design change is made, all tests have to be rerun.
It seems strange to me (but perhaps I am wrong) that all tests in the analog world are treated as equally useful. I see similarities. A transistor can be in the cut-off, saturation, or linear region. We could equally set up metrics to report whether each of the thousands of transistors in these large-scale devices has operated in each of these regions.
Finding out that a transistor has been in cut-off may indicate a problem, or the fact that it has not gone into cut-off may identify a case that has not been adequately verified. We could add metrics for a voltage passing through various thresholds. There are many possible metrics. As these integrated analog devices get more and more complex, the tests will necessarily get more complex. They must, since the costs associated with bad silicon for such large-scale devices leave little room for error.
I know some EDA vendors are trying to define coverage metrics for the analog world, so I guess my question to you is — would they actually make a difference if they were adopted? What is the best way to define analog verification completeness? In both worlds the verification problem is intractable, so we have to find ways to limit the amount we do, and that means we have to define an efficient set of tests.