Michael Eriksson
A Swede in Germany

Debugging

This is a discussion of various techniques, observations, and issues relating to debugging, with a main focus on finding a known bug, not necessarily by using a debugger. A debugger is an enormously valuable tool; however, most developers have an over-focus on debuggers, and lack the skills to use other techniques. Correspondingly, parts of this page will deliberately preach the use of such other techniques over debuggers: Praising debuggers would merely be preaching to the choir.

The articles on bug tracking and test cases might also be of interest.

Understand the problem

The single most important issue when searching for the cause of a well-hidden bug: Make sure that you have understood the problem, the error message, and what both the expected and the actual behaviour are, that any indications in log files have been considered, etc. Spend time on considering various plausible causes, make an educated choice of technique to search for them, try to recall experiences with similar bugs in the past, ...

Do not just start up the debugger to run around like a blind hen looking for a grain of corn. (Note, however, that for an easily found bug this might work well; some ability to judge the “difficulty” of the bug in advance can be helpful.)

Incorrect assumptions are the main cause of bugs

A key realization is that most bugs are, one way or another, caused by the developer making incorrect assumptions. It can pay to simply read the code, while explicitly thinking about what assumptions are made: return values, throwing of exceptions, operator precedence, ... Even such trivial things as the assumption that a particular statement is written correctly can play in: That a particular statement is a trivial if that a first year college student would have no problem with, does not imply that it is written correctly: After all, if college graduates can need proof-reading to catch natural language errors like “there” vs. “their” vs. “they’re”, they can equally well miss similar trivialities in programming. (Note that almost all college graduates will be easily able to decide which is the correct word; however, the wrong word can still slip in—and, once there, it can be surprisingly hard to spot.)


Side-note:

I am very prone to such language issues, myself, and often make and fail to detect easy errors. For instance, a portion of the above originally read “miss a similar trivialities”, which is almost self-referential. I certainly do not guarantee that the current version of this text would be error free.

(Those familiar with my writings might have seen me use self-referential texts on occasions, including to illustrate an error. This was not such an occasion.)


In the end, if the cause of a bug cannot be identified, then it is almost certainly the case that the right (wrong?) assumption has not yet been scrutinized. This is also one of the reasons why it pays to take a break and come back with a fresh mind, try an entirely new angle, or invite someone else to have even a ten-second (!) look at a problem that has cost half-an-hour with no progress. Similarly, one of the benefits of binary search is that the possible place of the faulty (and possibly near invisible) assumption can be pin-pointed without having to identify it beforehand—and once only one line of code remains, finding the fault is usually easy.

Beware that as proficiency increases, the amount of thought spent in doing something decreases. While this will, overall, lead to fewer errors made per unit of work, it can lead to the introduction of some errors that would otherwise not be made, and it can make it a whole lot harder to identify some errors. The reason for this is that the brain skips over things and does things automatically where little thought is needed, which results in “blind spots”. (See the above side-note for a related example.)

Common incorrect assumptions include that pre-existing code works, that the documentation of pre-existing code is correct, that the input/output and pre-/post-conditions have the right values, that the right code version is used, that the build is consistent, that the developer has not made any embarrassingly amateurish errors (we all do from time to time), that a loop exits at the right time, that the return value of a statement is actually assigned to the right variable, that a certain method does (not) modify object state, and so on and so forth—even the assumption that the original error message was understood correctly is often faulty.
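As an illustration of one of the assumptions listed above (that a return value is actually assigned to the right variable), consider the following minimal sketch. The class and method names are hypothetical and chosen only for this example; the point is merely that Java strings are immutable, so that calling trim() or toLowerCase() without using the result changes nothing.

```java
import java.util.Locale;

final class UnassignedResult {
    /** Hypothetical helper: intended to trim and lower-case its input. */
    static String normaliseBuggy(String input) {
        String value = input;
        value.trim();                      // result discarded: String is immutable
        value.toLowerCase(Locale.ROOT);    // same mistake again
        return value;                      // still the original input
    }

    /** The same helper with the results actually assigned. */
    static String normaliseFixed(String input) {
        String value = input;
        value = value.trim();
        value = value.toLowerCase(Locale.ROOT);
        return value;
    }

    public static void main(String[] args) {
        // Brackets make leading and trailing whitespace visible in the output.
        System.out.println("[" + normaliseBuggy("  Hello ") + "]");
        System.out.println("[" + normaliseFixed("  Hello ") + "]");
    }
}
```

Both versions compile without error, and the buggy one looks entirely plausible at a quick glance, which is exactly why it pays to question the assumption explicitly.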

Third-party products can also be a source of problems; in particular, sadly, the commercial ones—for open source products the reputation of the project and the closeness to the last major release play in when judging the risk. In contrast, the assumption that the compiler, the virtual machine, or a similar component is working well enough is almost always valid (at least with Java; other languages, e.g. PHP, can have a different situation). Looking for errors there should be a last resort.

Various techniques

Writing a new test case

Writing a new test case that reproduces the problem will not immediately help with finding the bug; however, it can provide great indirect help—as well as a safe-guard against later re-occurrences of the bug (or against the same bug in another module that implements the same interface). Depending on developments, an individual bug might give rise to several test cases testing different components or on different levels of abstraction.

Consider e.g.:

  1. Shorter turn-around for the use of other techniques; in particular, when a re-start of the application would be needed to manually investigate the effects of a code change.

  2. Ability to access the problem more directly through investigating individual modules—not the application in its entirety.

  3. Ability to prove that particular components are not the cause of the problem by verifying that they adhere strictly to an agreed upon interface.

On the downside, a test case need not always have the ability to reproduce a particular bug, e.g. due to dummy implementations of supporting components that behave “too well” or a different timing of execution.
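A reproducing test case could look roughly like the following sketch. It deliberately avoids any test framework in order to be self-contained; the interface Averager, the two implementations, and the integer-division bug are all assumptions made up for the illustration. Because the checks are written against the interface, they can be re-used for any other module implementing it.

```java
import java.util.ArrayList;
import java.util.List;

final class ReproductionTest {
    /** Hypothetical interface shared by several implementations. */
    interface Averager {
        double average(int[] values);
    }

    /** Stand-in for the faulty module: integer division truncates the result. */
    static final Averager BUGGY = values -> {
        int sum = 0;
        for (int v : values) sum += v;
        return sum / values.length;
    };

    /** Stand-in for the corrected module. */
    static final Averager FIXED = values -> {
        int sum = 0;
        for (int v : values) sum += v;
        return (double) sum / values.length;
    };

    /** Runs the checks against any implementation and returns the failures. */
    static List<String> check(Averager averager) {
        List<String> failures = new ArrayList<>();
        expect(failures, "even sum", 2.0, averager.average(new int[] {1, 3}));
        expect(failures, "odd sum", 1.5, averager.average(new int[] {1, 2}));
        return failures;
    }

    private static void expect(List<String> failures, String name, double expected, double actual) {
        if (Math.abs(expected - actual) > 1e-9) {
            failures.add(name + ": expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + check(BUGGY));
        System.out.println("fixed: " + check(FIXED));
    }
}
```

Note that only the “odd sum” check reproduces the problem: a test case that happens to use convenient input values can pass despite the bug.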

Binary search

The possibly most powerful debugging technique there is, is a plain binary search (in various guises): Remove half the code from the equation and see whether the error is still there, re-install and remove half of the half where the error is now known to be, etc. (In rare cases, one problem might be sufficiently spread out that this fails.) Sometimes the removal is abstract, e.g. by printing pre- and post-conditions at a half-way point; on other occasions it is literal; on others yet, a dummy or mock-up can be plugged in at various points. The principle of binary search remains the same.
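The halving can be sketched in code for one common guise: an ordered list of candidates (say, revisions or processing steps) where everything up to some point is good and everything after it is bad. The class Bisect, the revision names, and the check used in main are hypothetical; in practice the predicate would build and test the candidate in question.

```java
import java.util.List;
import java.util.function.Predicate;

final class Bisect {
    /**
     * Returns the index of the first candidate for which isBad holds, assuming
     * that all good candidates come before all bad ones.
     * Returns candidates.size() if no candidate is bad.
     */
    static <T> int firstBad(List<T> candidates, Predicate<T> isBad) {
        int low = 0;                   // everything before low is known to be good
        int high = candidates.size();  // everything from high onwards is known to be bad
        while (low < high) {
            int mid = low + (high - low) / 2;
            if (isBad.test(candidates.get(mid))) {
                high = mid;            // the first bad candidate is at mid or earlier
            } else {
                low = mid + 1;         // the first bad candidate is after mid
            }
        }
        return low;
    }

    public static void main(String[] args) {
        List<String> revisions = List.of("r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8");
        // Hypothetical check: pretend that the error appears from r6 onwards.
        Predicate<String> failsTest = r -> Integer.parseInt(r.substring(1)) >= 6;
        int index = firstBad(revisions, failsTest);
        System.out.println("first bad revision: " + revisions.get(index));
    }
}
```

With eight candidates, three checks suffice here; with a thousand, about ten would. The sketch breaks down when the good/bad boundary is not clean, e.g. for sporadic errors.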


Side-note:

Generally, binary search might well be the single most important algorithmic principle to know, in any context. It has a wide range of applications, some of them even in daily life. The main competitor is likely (the related) divide-and-conquer.


Unfortunately, many developers consider binary search a last brute-force solution after using a debugger has failed: This is usually a sign of lack of experience with, or understanding of, different techniques; however, cases do exist where using another technique is faster, or where binary search is unsuitable—as they do with any technique.

Isolate components from each other

Similar to binary search, it can pay to isolate different components from each other, rather than trying to investigate a system as a whole. A good starting point is to pick a (potentially) offending layer, module, class, whatnot, and just check whether its in- and outputs are correct. If they both are, then the problem is likely to be further down the line; if the inputs are flawed, then it is likely earlier; if the inputs are correct and the outputs flawed, then we are looking at the right culprit.

Note, in particular, that by using appropriate techniques more than one instance can be checked at once. Consider e.g. using a capable debugger or an aspect-oriented extension to print out the inputs to and outputs from a relevant sub-set of all methods (say all public methods, or all methods from several interacting classes).
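Where neither a capable debugger nor an aspect-oriented extension is at hand, a similar effect can be had for interface-based code with the dynamic proxies of the Java standard library (java.lang.reflect.Proxy). The following is a minimal sketch only: the class CallTracer, the interface Tokenizer, and the in-memory LOG are assumptions made for the example, and the approach only covers calls that go through an interface.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CallTracer {
    /** Hypothetical component interface. */
    interface Tokenizer {
        List<String> split(String line);
    }

    /** Collected trace lines (kept in memory to keep the sketch simple). */
    static final List<String> LOG = new ArrayList<>();

    /** Wraps target so that every call through the interface is logged. */
    @SuppressWarnings("unchecked")
    static <T> T trace(Class<T> type, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object result;
            try {
                result = method.invoke(target, args);
            } catch (InvocationTargetException e) {
                // Log and re-throw the exception raised by the real method.
                LOG.add(method.getName() + Arrays.toString(args) + " threw " + e.getCause());
                throw e.getCause();
            }
            LOG.add(method.getName() + Arrays.toString(args) + " -> " + result);
            return result;
        };
        return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] {type}, handler);
    }

    public static void main(String[] args) {
        Tokenizer real = line -> Arrays.asList(line.split(","));
        Tokenizer traced = trace(Tokenizer.class, real);
        traced.split("a,b");
        LOG.forEach(System.err::println);
    }
}
```

Wrapping several interacting components in this manner gives the inputs and outputs of each in one place, which is often enough to tell on which side of a boundary things go wrong.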


Side-note:

In fact, some of the easiest debugging I have ever done was as a beginner doing Scheme: The trace mechanism provided by the interpreter made finding (at least) a ball-park position of the error trivial. A caveat is that the programs involved were very small, and I cannot guarantee that this particular tool for tracing would scale to “adult” applications.


Print statements

Print statements are another technique that is often ridiculed by the ignorant as a “poor man’s debugger”. However, they have several advantages over a debugger that make them well worth using in the appropriate circumstances (note that as debuggers grow more powerful, similar capabilities are, or might later be, integrated):

  1. It is easier and quicker to do repeated tests, e.g. running through a piece of code with a number of input values and seeing what value a certain variable has for each input.

  2. Binary searches are typically easier to implement.

  3. Data can be reviewed more easily after a run. It is, in particular, possible to group all relevant printed information in an easily overviewed paragraph, where a debugger would often require several clicks per variable and only allow viewing one variable at a time, and only during the run.

  4. The printed output can be piped into commands like grep to further facilitate information handling.

  5. The (sometimes considerable) overhead of a debugger is not present.
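Several of the points above can be seen in a small sketch: a sweep over a number of input values, with one self-contained and easily filtered line of output per value. The class PrintProbe, the method discountPercent (standing in for the code under investigation), and the “DEBUG discount” prefix are hypothetical.

```java
final class PrintProbe {
    /** Hypothetical code under investigation: bulk discount in percent. */
    static int discountPercent(int quantity) {
        if (quantity > 100) return 20;
        if (quantity > 10) return 10;
        return 0;
    }

    /** One line per observation, with a fixed prefix and name=value pairs. */
    static String probeLine(int quantity) {
        return "DEBUG discount quantity=" + quantity + " percent=" + discountPercent(quantity);
    }

    public static void main(String[] args) {
        // Values around the suspected boundaries are the most informative.
        for (int quantity : new int[] {0, 9, 10, 11, 99, 100, 101}) {
            System.err.println(probeLine(quantity));
        }
    }
}
```

Assuming that the file is saved as PrintProbe.java and that a JDK with single-file source launching (Java 11 or later) is used, something like `java PrintProbe.java 2>&1 | grep 'percent=10'` would show only the inputs that received the 10 percent discount.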

Trial-and-error

Haphazard trial-and-error is one of the worst ways to debug; systematic and thought-through trial-and-error is a legitimate technique. Legitimate uses include making small changes in input and observing changes in output (and thinking about how these changes can be explained), identifying a group of cases (of input, pre-conditions, method calls, ...) that work and another that does not work, etc.
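The identification of a working and a non-working group can be sketched as follows. Here, Integer.parseInt merely stands in for whatever code is under investigation, and the class name WorksOrFails and the chosen inputs are assumptions for the illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class WorksOrFails {
    /** True if the code under investigation accepts the input. */
    static boolean works(String input) {
        try {
            Integer.parseInt(input);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Splits the inputs into a working (true) and a failing (false) group. */
    static Map<Boolean, List<String>> classify(List<String> inputs) {
        return inputs.stream().collect(Collectors.partitioningBy(WorksOrFails::works));
    }

    public static void main(String[] args) {
        // Small, deliberate variations of one input that is known to work.
        Map<Boolean, List<String>> groups =
                classify(List.of("5", "-5", "+5", "05", " 5", "5 ", "5.0"));
        System.out.println("works: " + groups.get(true));
        System.out.println("fails: " + groups.get(false));
    }
}
```

The actual thinking starts once the two groups are on the table: What do the failing cases have in common that the working ones lack?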

Best use of a debugger

Arguably, the best use of a debugger is not debugging (although this is very valuable, too), but simply to walk interactively through the code once it is done: More likely than not, one finds that some statement is off, that some assumption does not hold (e.g. concerning a return code), that one does not actually entirely understand the code, or similar, even if the code does not actually fail during the execution! Finding a potential later bug before release in this manner is very cheap. (A further benefit is that the change in timing induced by the debugger can provoke some bugs that do not occur without the debugger.)

This is particularly true when one's own code is interacting with code written by others (which is almost always the case in modern software development), when large parts of the runtime environment are configured, when dynamic class-loading is used, etc. Even knowing with what implementation of an interface the code interacts can be hard without a dynamic check. Further, the assumption that already existing code is correct is very often false.


Side-note:

This type of walk-through is also a very valuable aid when trying to get to know an unfamiliar system; in particular, when written by poor developers or with inadequate documentation.


How to use a debugger

A modern debugger (at least for Java) has many capabilities that are often wasted by the beginner, the most important likely being the “frame drop”: the ability to throw away a set of changes since a particular point of time and start over from there. This is immensely useful e.g. when one sees that the execution is already past the point of error, when one wants to dynamically test a certain scenario with different input or output values, etc., without having to start over from the beginning. One typical case is finding an erroneous line of code during debugging, correcting it, dropping the frame, resuming execution, and confirming that (or checking whether) everything is now in order.

Other important capabilities include “inspection”, tracing of variable values, and dynamic execution of code input during debugging. Generally, it pays to read the documentation of the current debugger sufficiently closely to know what capabilities it has (these will vary somewhat, as will the names of the different capabilities), and to take full advantage of these. Too many developers restrict themselves to setting break points, walking through the code, and just looking at the values displayed in an “objects” window; in fact, some of them lose time by not even using all types of break points available, restricting themselves to the line based ones...


Side-note:

Many of these capabilities can be very helpful when using non-debugger techniques; in particular, with applications that have a long build or start-up phase.


Problem resolved: What now?

Depending on the exact cause of a bug, it can be worthwhile to look for other instances of the same underlying problem. Consider the trivial case of a newcomer to Java having used & where && would have been appropriate: If he did this once, chances are that he has done so consistently. It is also likely that he is unaware of the difference between | and ||, and this too should be checked.
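For those unsure of the difference: with boolean operands, & always evaluates both sides, while && skips the right-hand side when the left-hand side is already false. A minimal sketch, with hypothetical class and method names:

```java
final class ShortCircuit {
    /** Buggy: '&' evaluates s.isEmpty() even when s is null. */
    static boolean hasTextBuggy(String s) {
        return s != null & !s.isEmpty();
    }

    /** Fixed: '&&' never evaluates the right-hand side when s is null. */
    static boolean hasTextFixed(String s) {
        return s != null && !s.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(hasTextFixed(null));   // false
        try {
            System.out.println(hasTextBuggy(null));
        } catch (NullPointerException e) {
            System.out.println("buggy version threw NullPointerException");
        }
    }
}
```

Note that the two versions behave identically for all non-null input, which is why such an error can go unnoticed for a long time.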

It can (in one's own code) be worth the effort to pay attention to any underlying cause for error, and to draw lessons for the future. The newcomer above would be wise to not only learn the difference between & and &&, but to grab a book and read up on the exact semantics of Java operators in general. (Obviously, someone else finding this error should make him aware of it.) Someone making an error when synchronizing multiple threads might find it beneficial to read up on issues relating to concurrency. Etc. Similarly, many developers have simple mistakes that they are very prone to make, and which it pays to check for both during development and debugging. I, for instance, have forgotten to actually assign a calculated value to a variable on several dozen occasions, and awareness of this weakness makes it possible for me to overcome it.

If the search was lengthy, some time can be spent on re-tracing steps to identify which measures were a waste of time and which eventually led to the goal.

Various observations

Beware that, when programming, two wrongs can make a right; and that when one wrong is corrected, the other can become manifest. Similarly, the two wrongs can make a right most of the time, leading to sporadic errors when they do not.
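A constructed example of two wrongs making a right, with hypothetical names throughout: one method returns a position that is off by one, and its caller silently compensates. Correcting only the first makes the second manifest.

```java
import java.util.List;

final class TwoWrongs {
    /** Wrong no. 1: meant to return a 0-based position, but returns a 1-based one. */
    static int positionOf(List<String> items, String wanted) {
        return items.indexOf(wanted) + 1;
    }

    /** The same method after a well-meaning correction. */
    static int positionOfFixed(List<String> items, String wanted) {
        return items.indexOf(wanted);
    }

    /** Wrong no. 2: compensates for wrong no. 1 by subtracting one. */
    static String lookup(List<String> items, int position) {
        return items.get(position - 1);
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c");
        // Both wrongs present: the result is correct.
        System.out.println(lookup(items, positionOf(items, "b")));
        // Only one wrong corrected: the result is now the wrong element.
        System.out.println(lookup(items, positionOfFixed(items, "b")));
    }
}
```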

In contrast, a set of errors need not have one cause: Where one bug is present, more are likely to exist.

Do not dig down in the trenches and try to out-last the bug: Take a break, gain a fresh perspective, and start over with a fresh mind.

For non-trivial bugs, it typically pays to think, not just run around in a debugger: What could cause failures like this? What methods could reasonably be responsible? What does not explain it?

For newly discovered bugs, the first step should almost always be to check what has been changed recently. This might seem too trivial to mention, but many developers fail in this regard.

Using a debugger can severely change timing and concurrency situations, considerably reducing the value of a debugger. The same applies to print statements (however, typically to a lesser degree) and test cases.

Make sure that all warnings and error messages have been accounted for: All too often, beginners tend to ignore warnings as harmless, only to be severely bitten two weeks later. Similarly, applying various lint-like tools and other quality checkers, running compilers in the strictest possible modes, etc., can help by pointing out potential problem sources. (Then again, a good developer does this when developing, not when debugging.)

If you are a newcomer to a language, make very, very sure to have addressed all arrays (respectively the corresponding construct) correctly: “Off-by-one” errors occur even among the veterans of the language—for someone switching from a 0–n-1 language to a 1–n language (or conversely) they are near unavoidable.
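A small sketch of what this can look like in Java, where array indices run from 0 to n-1. The names are hypothetical. The buggy variant is the more treacherous kind of off-by-one error, because it does not fail with an exception but silently skips the first element.

```java
final class OffByOne {
    /** Buggy: a habit from a 1-to-n language; the element at index 0 is never added. */
    static int sumSkippingFirst(int[] values) {
        int sum = 0;
        for (int i = 1; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    /** Fixed: indices run from 0 to values.length - 1. */
    static int sumAll(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {1, 2, 3};
        System.out.println("buggy: " + sumSkippingFirst(values));
        System.out.println("fixed: " + sumAll(values));
    }
}
```

Where the language allows it, iterating over the elements directly (in Java: `for (int v : values)`) avoids the index altogether.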