Chapter 15

Troubleshooting

The general troubleshooting method is divided into six steps. A seventh step is added when relevant:

  1. Describe the problem
  2. Identify root cause
  3. List possible solutions
  4. Choose solution
  5. Implement solution
  6. Validate solution
  7. [Prevent recurrence]

This method applies whenever an issue occurs. During the development of an analog IC, issues arise mainly during the design phase and during the validation phase. A design issue is generally a specification item that is not met. A validation issue is generally an unexpected difference between simulation and measurement.

15.1 Describe the problem

This may seem obvious, but defining a problem accurately and unambiguously, and properly specifying its conditions of occurrence where applicable, is a difficult task that requires method and training. A powerful method is to answer the standard questions:

  • What? What does not work as expected?
  • When? When does the issue occur, and under what conditions?
  • How? How can one see that something is wrong? How often does the issue occur?

Very often, answering the initial list of questions raises additional questions.
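The answers to these questions can be kept in a small structured record so that an unanswered question is easy to spot. The Python sketch below is only one possible layout: the class name `ProblemReport`, its fields, and the example values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProblemReport:
    """Hypothetical record holding the answers to What / When / How."""
    what: str            # what does not work as expected
    when: str            # conditions of occurrence
    how_observed: str    # how one can see that something is wrong
    how_often: str       # how often the issue occurs
    open_questions: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of the answers that are still empty."""
        answers = {
            "what": self.what,
            "when": self.when,
            "how_observed": self.how_observed,
            "how_often": self.how_often,
        }
        return [name for name, text in answers.items() if not text.strip()]

    def summary(self):
        """One sentence: undesired behavior plus conditions of occurrence."""
        return f"{self.what} when {self.when} ({self.how_observed}; {self.how_often})."


# Made-up example values, for illustration only.
report = ProblemReport(
    what="Reference output voltage is above its specified maximum",
    when="the supply is at its minimum and the temperature at its maximum",
    how_observed="DC measurement at the output pin",
    how_often="",  # not answered yet
)
print(report.missing_fields())
```

A report with an empty `missing_fields()` list is not necessarily complete, but a non-empty list shows that the description step is not finished.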

15.2 Identify root cause

This is probably the most difficult step in troubleshooting. Of course, possible root causes depend on the addressed issue. However, there are several standard cause classes:

  • Tools
  • Environment
  • Principle
  • Implementation
  • Failure
  • Interaction

15.2.1 Tools

The issue can be real on the circuit under evaluation, or it can result from the tools used for the evaluation. The tools depend on the phase in progress: simulators and models during the design phase, measurement instruments during prototype validation. Tools must be questioned and checked to make sure that the issue is real. Using another simulator, another simulation technique, or another measurement instrument can help.
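As a simple illustration of cross-checking a tool, two readings of the same quantity taken with different instruments can be compared against their stated accuracies. The sketch below assumes absolute tolerances expressed in the same unit as the reading. The function name `readings_agree` and the numbers are hypothetical.

```python
def readings_agree(value_a, tol_a, value_b, tol_b):
    """Return True if two readings are compatible.

    Each reading is assumed to come with an absolute tolerance in the
    same unit. The readings are considered compatible when their
    difference does not exceed the sum of the two tolerances.
    """
    return abs(value_a - value_b) <= tol_a + tol_b


# Made-up example: the same DC voltage read by two different meters.
print(readings_agree(1.250, 0.002, 1.251, 0.003))  # small difference
print(readings_agree(1.250, 0.002, 1.200, 0.003))  # large difference
```

If the two readings disagree, the instrument rather than the circuit becomes the first suspect.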

15.2.2 Environment

During design, the design kit can be updated, resulting in changes in the circuit behavior. Such a change does not necessarily result from a design change. However, the new design kit is normally supposed to be more accurate, and the circuit must be changed in order to work properly in the new environment.

During prototype evaluation, some “oscillations” may not come from the circuit but from a cellphone or from another signal source in the surroundings.

Parasitic elements are a special subclass of environment items. They should be considered during design, and tools should take them into account, but this is not always the case. In the lab, the parasitic elements of sockets can impact the circuit behavior.

15.2.3 Principle

The issue can result from the principle that has been used for designing the circuit. During design, this should normally occur only at early stages. If this occurs during validation, it indicates that something really went wrong in the development process.

15.2.4 Implementation

The issue can result from a bad implementation of a good principle.

15.2.5 Failure

The issue can result from a component failure or from a design failure.

15.2.6 Interaction

A difficult situation is when the issue does not result from a simple cause but from an interaction between two or more causes.

15.3 Design debug

When one or more specification items are not met

15.4 Silicon debug

If something does not work as expected in the lab, either during verification or during characterization, the designer must start a troubleshooting or debugging phase. Overall, the approach is based on the general seven-step problem-solving method:


  1. Describe the problem
  2. Identify root cause
  3. List possible solutions
  4. Choose solution
  5. Implement solution
  6. Validate solution
  7. Prevent recurrence

Of course, when applied to silicon debugging, the general method is adapted to the particular constraints of that field.

15.4.1 Define the problem

Steps are:

  1. List issues.
  2. For each issue, check consistency and find boundaries.
    1. Run the experiment again in order to rule out spurious phenomena.
    2. Run the same experiment on a couple of other parts in order to rule out a part artifact.
    3. Run the experiment with another measurement instrument in order to rule out an instrument artifact.
    4. Vary supply voltage, temperature, or other signal parameters in order to find a possible region of correct operation.

Each issue should be summarized in a sentence describing the undesired behavior and the conditions of occurrence.
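The search for boundaries (varying supply voltage and temperature) can be organized as a two-dimensional sweep, often displayed as a shmoo plot. The sketch below is a minimal illustration: `part_passes` is a hypothetical stand-in for a real bench measurement, and the voltage and temperature values are placeholders, not recommendations.

```python
from itertools import product


def part_passes(vdd, temp_c):
    """Hypothetical stand-in for a bench measurement.

    A real implementation would set the supply and the temperature,
    run the experiment, and compare the result with the specification.
    Here a made-up failing corner (low supply, high temperature) is faked.
    """
    return not (vdd < 2.8 and temp_c > 100)


def sweep(vdd_values, temp_values, measure=part_passes):
    """Return {(vdd, temp): True/False} for every combination."""
    return {(v, t): measure(v, t) for v, t in product(vdd_values, temp_values)}


def format_shmoo(results, vdd_values, temp_values):
    """Render the sweep as a text grid: '.' = pass, 'X' = fail."""
    lines = []
    for t in temp_values:
        row = "".join("." if results[(v, t)] else "X" for v in vdd_values)
        lines.append(f"{t:>5} C  {row}")
    return "\n".join(lines)


vdd_values = [2.5, 2.7, 3.0, 3.3, 3.6]   # volts, placeholder values
temp_values = [-40, 25, 85, 125]          # degrees Celsius, placeholder values
results = sweep(vdd_values, temp_values)
failing = sorted(cond for cond, ok in results.items() if not ok)
print(format_shmoo(results, vdd_values, temp_values))
```

The list `failing` gives the conditions of occurrence directly, which is the information needed for the one-sentence summary of the issue.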

15.4.2 Find root cause

A root cause has to be found for each issue. A single cause may be the root of several issues. The method for each issue is:

  1. Imagine scenarios that could lead to the issue.
  2. Check scenarios.
    1. Force the circuit into conditions (temperature, voltage, frequency...) where a given scenario predicts that something should change, and check the prediction.
    2. Use the on-chip test circuitry in order to identify the faulty block.
    3. Use simulations to reproduce the assumed cause and compare the results with measurements (circuit signature).

When two or three signatures match the expectations of a given scenario, that scenario is a good candidate. The investigation to find the root cause may reveal additional issues. These should be considered as well, but as separate issues that must be analyzed in their own right. Only careful analysis can establish whether different issues are related to the same root cause. A complex problem should never be oversimplified.
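Checking scenarios against signatures can be pictured as a comparison between what each scenario predicts and what the bench shows. The following sketch is purely illustrative. The scenario names, experiment labels, and outcomes are made up. The threshold of two confirmed signatures follows the rule of thumb above, while rejecting a scenario as soon as one measurement contradicts it is an additional assumption of this example.

```python
def score_scenarios(predictions, observations):
    """Count confirmed and contradicted predictions for each scenario.

    predictions:  {scenario: {experiment: predicted outcome}}
    observations: {experiment: measured outcome}
    Experiments that have not been run yet are ignored.
    """
    scores = {}
    for scenario, predicted in predictions.items():
        confirmed = 0
        contradicted = 0
        for experiment, outcome in predicted.items():
            if experiment not in observations:
                continue
            if observations[experiment] == outcome:
                confirmed += 1
            else:
                contradicted += 1
        scores[scenario] = (confirmed, contradicted)
    return scores


def good_candidates(scores, min_confirmed=2):
    """Scenarios with enough confirmed signatures and no contradiction."""
    return [name for name, (ok, bad) in scores.items()
            if ok >= min_confirmed and bad == 0]


# Hypothetical scenarios and measurements.
predictions = {
    "bias current too low": {
        "raise_temperature": "worse",
        "raise_supply": "better",
        "test_mode_bias_readout": "low",
    },
    "coupling from clock": {
        "raise_temperature": "unchanged",
        "lower_clock_frequency": "better",
    },
}
observations = {
    "raise_temperature": "worse",
    "raise_supply": "better",
    "lower_clock_frequency": "unchanged",
}
scores = score_scenarios(predictions, observations)
print(good_candidates(scores))
```

A scenario that comes out of such a comparison is only a candidate. It still has to be validated before it is treated as the root cause.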

Once a root cause is identified, it should be validated using all available tools and methods. Simulation is a powerful tool, but some techniques such as FIB (Focused Ion Beam) allow in-place circuit modifications by cutting wires and creating new connections.

15.4.3 List possible solutions

Just as in design, the right solution is the best choice from a list. Depending on the context (metal fix or full re-spin), the list of possible solutions may vary. Usually, the more expensive the solution, the more powerful it is.

15.4.4 Choose a solution and implement it

The solution must be validated extensively through simulation. A design change leads to a new design that ideally has the expected effect on the problem. But the new design may also exhibit some undesired behavior, or performance that could be worse than that of the original design!

15.4.5 Validate the solution

Full silicon validation is required again to check that the original problem is fixed, and also that the fix neither created a new issue nor broke something that worked fine before.
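The comparison between the original silicon and the fixed silicon can be sketched as a per-item classification against the specification limits. The example below is a minimal illustration. The item names, limits, and measured values are invented, and the four status labels are an assumption of this sketch.

```python
def compare_runs(limits, before, after):
    """Classify each specification item after a fix.

    limits: {item: (min, max)}
    before, after: {item: measured value} on original and fixed silicon
    Returns {item: status}, where status is one of
    'ok', 'fixed', 'regression', 'still failing'.
    """
    def in_spec(item, value):
        low, high = limits[item]
        return low <= value <= high

    status = {}
    for item in limits:
        was_ok = in_spec(item, before[item])
        is_ok = in_spec(item, after[item])
        if is_ok:
            status[item] = "ok" if was_ok else "fixed"
        else:
            status[item] = "regression" if was_ok else "still failing"
    return status


# Invented limits and measurements, for illustration only.
limits = {
    "vref_V": (1.19, 1.21),
    "iq_uA": (0.0, 50.0),
    "psrr_dB": (60.0, float("inf")),
}
before = {"vref_V": 1.25, "iq_uA": 42.0, "psrr_dB": 65.0}
after = {"vref_V": 1.20, "iq_uA": 55.0, "psrr_dB": 66.0}

status = compare_runs(limits, before, after)
for item, state in status.items():
    print(f"{item}: {state}")
```

In this made-up case the original problem is fixed, but one item that was within limits before the fix is now out of specification, which is exactly the situation that full revalidation is meant to catch.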
