Michael Eriksson
A Swede in Germany

Debugging

This is a discussion of various techniques, observations, and issues relating to debugging, with a main focus on finding a known bug (not necessarily by using a debugger). A debugger is an enormously valuable tool; however, most developers over-focus on debuggers and lack the skills to use other techniques. Correspondingly, parts of this page will deliberately preach the use of such other techniques over debuggers: Praising debuggers would merely be preaching to the choir.

The articles on bug tracking and test cases may also be of interest.

Understand the problem

The single most important issue when searching for the cause of a well-hidden bug: Make sure that you have understood the problem, the error message, what both the expected and the actual behaviour are, that any indications in log files have been considered, etc. Spend time on considering various plausible causes, make an educated choice of technique to search for the cause, try to recall experiences with similar bugs in the past, ...

Do not just start up the debugger to run around like a blind hen looking for a grain of corn. (Note, however, that for an easily found bug, this may work well; some ability to judge the “hardness” of the bug in advance can be helpful.)

Incorrect assumptions are the main cause of bugs

A key realization is that most bugs are, one way or another, caused by the developer making incorrect assumptions. It can pay to simply read the code while explicitly thinking about what assumptions are being made: return values, throwing of exceptions, operator precedence, ... Even such trivial things as the assumption that a particular statement is written correctly can play in: Just because a particular statement is a trivial if that a first-year college student would have no problem with, it does not follow that it is written correctly. After all, if college graduates can need proof-reading to catch natural-language errors like “there” vs. “their” vs. “they’re”, they can equally well miss similar trivialities in programming. (Note that almost all college graduates will easily be able to decide which is the correct word; however, the wrong word can still slip under the radar, and, once there, it can be surprisingly hard to spot.)

In the end, if the cause of a bug cannot be identified, then it is almost certainly the case that the right (wrong?) assumption has not yet been scrutinized. This is also one of the reasons why it pays to take a break and come back with a fresh mind, try an entirely new angle, or invite someone else to have even a ten-second (!) look at a problem that has cost half an hour with no progress. Similarly, one of the benefits of binary search is that the possible place of the faulty (and possibly near-invisible) assumption can be pin-pointed without having to identify it beforehand; and once only one line of code remains, finding the fault is usually easy.

Beware that as proficiency increases, the amount of thought spent on doing something decreases. While this will, overall, lead to fewer errors made per unit of work, it can lead to the introduction of some errors that would otherwise not be made, and it can make it a whole lot harder to identify some errors. The reason is that the brain skips over things and does things automatically where little thought is needed, which results in “blind spots”.

Common incorrect assumptions include that pre-existing code works, that the documentation of pre-existing code is correct, that the input/output and pre-/post-conditions have the right values, that the right code version is used, that the build is consistent, that the developer has not made any embarrassingly amateurish errors (we all do from time to time), that a loop exits at the right time, that the return value of a statement is actually assigned to the right variable, that a certain method does (not) modify object state, and so on and so forth. Even the assumption that an original error message was understood correctly is often faulty.

Third-party products can also be a source of problems; in particular, ironically, the commercial ones. (For open-source products, the reputation of the project and the closeness to the last major release play in when judging the risk.) In contrast, the assumption that the compiler, virtual machine, or similar component is working well enough is almost always valid (at least with Java; other languages, e.g. PHP, can have a different situation). Looking for errors there should be a last resort.

Various techniques

Possibly the most powerful debugging technique there is, is a plain binary search (in various guises): Remove half the code from the equation and see if the error is still there, restore it and remove half of the half where the error is now known to be, etc. (In rare cases, one problem may be sufficiently spread out that this fails.) Sometimes the removal is abstract, e.g. by printing pre- and post-conditions at a half-way point; on other occasions it is literal; on others yet, a dummy or mock-up can be plugged in at various points. The principle of binary search remains the same.
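
As a minimal sketch of the “abstract” variant, assume that the suspect code can be viewed as a list of steps applied in order, and that some invariant holds for the raw input but not for the final result. The class and method names below are made up for the illustration, and the search assumes that the invariant, once broken, stays broken:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical helper: bisects a sequence of steps to find the one that breaks an invariant.
class BisectSteps {

    /**
     * Returns the zero-based index of the first step after which the invariant
     * no longer holds, or -1 if it still holds after all steps.
     * Assumption: once the invariant is broken, later steps do not repair it.
     */
    static <T> int firstBreakingStep(T input, List<UnaryOperator<T>> steps, Predicate<T> invariant) {
        if (!invariant.test(input)) {
            throw new IllegalArgumentException("invariant already broken in the input");
        }
        if (invariant.test(runPrefix(input, steps, steps.size()))) {
            return -1; // nothing to find
        }
        int good = 0;            // invariant known to hold after this many steps
        int bad = steps.size();  // invariant known to be broken after this many steps
        while (bad - good > 1) {
            int mid = (good + bad) >>> 1;
            if (invariant.test(runPrefix(input, steps, mid))) {
                good = mid;
            } else {
                bad = mid;
            }
        }
        return bad - 1; // the step that took us from "good" to "bad"
    }

    /** Applies only the first 'count' steps, i.e. "removes" the rest from the equation. */
    static <T> T runPrefix(T input, List<UnaryOperator<T>> steps, int count) {
        T value = input;
        for (int i = 0; i < count; i++) {
            value = steps.get(i).apply(value);
        }
        return value;
    }
}
```

With n steps, roughly log2(n) partial runs are needed, and no guess about which step is at fault has to be made beforehand.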


Side-note:

Generally, binary search may well be the single most important algorithm to know, in any context. It has a wide range of applications, some of them even in daily life. The main competitor is likely (the related) divide-and-conquer.


Unfortunately, many developers consider binary search a last brute-force solution after using a debugger has failed: This is usually a sign of lack of experience with, or understanding of, different techniques; however, problems do exist where using another technique is faster, or where binary search is unsuitable—as they do with any technique.

Isolate components from each other

Similar to binary search, it can pay to isolate different components from each other, rather than trying to investigate a system as a whole. A good starting point is to pick a (potentially) offending layer, module, class, whatnot, and just check whether its inputs and outputs are correct. If both are, then the problem is likely to be further down the line; if the inputs are flawed, then it is likely earlier; if the inputs are correct and the outputs flawed, then we are looking at the right culprit.

Note, in particular, that by using appropriate techniques more than one instance can be checked at once. Consider e.g. using a capable debugger or an aspect-oriented extension to print out the inputs to and outputs from a relevant sub-set of all methods (say all public methods, or all methods from several interacting classes).
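
Where neither a capable debugger nor an aspect-oriented extension is at hand, something similar can be approximated for interfaces with the standard java.lang.reflect.Proxy. The following is only a sketch: the class CallTracer and the interface Pricing are hypothetical, and the approach only reaches calls made through an interface.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical tracer: wraps any interface implementation and records inputs and outputs.
class CallTracer {

    /** Hypothetical component under suspicion. */
    interface Pricing {
        int total(int unitPrice, int quantity);
    }

    /** Everything recorded so far; each line is also echoed to stderr. */
    static final List<String> LOG = new ArrayList<>();

    /** Returns a proxy that behaves like 'target' but records every call made through 'iface'. */
    @SuppressWarnings("unchecked")
    static <T> T trace(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            String call = method.getName() + Arrays.toString(args == null ? new Object[0] : args);
            try {
                Object result = method.invoke(target, args);
                record(call + " -> " + result);
                return result;
            } catch (InvocationTargetException e) {
                // The wrapped method itself failed: record the cause and rethrow it unchanged.
                record(call + " threw " + e.getCause());
                throw e.getCause();
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    static void record(String line) {
        LOG.add(line);
        System.err.println("TRACE " + line);
    }

    public static void main(String[] args) {
        Pricing traced = trace(Pricing.class, (unitPrice, quantity) -> unitPrice * quantity);
        traced.total(3, 4); // recorded as: total[3, 4] -> 12
    }
}
```

Swapping the traced proxy in at the boundary of a suspect layer shows at a glance whether flawed values enter it or leave it.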


Side-note:

In fact, some of the easiest debugging I have ever done was as a beginner doing Scheme: The trace mechanism provided by the interpreter made finding (at least) a ball-park position of the error trivial. A caveat is that the programs involved were very small, and I cannot guarantee that this particular tool for tracing scales.


Print statements

Print statements are another technique that is often ridiculed by the ignorant as a “poor man’s debugger”. However, they have several advantages over a debugger that make them well worth using in the appropriate circumstances (note that as debuggers grow more powerful, similar capabilities are, or may later be, integrated):

  1. It is easier and quicker to do repeated tests, e.g. running through a piece of code with a number of input values and seeing what value a certain variable has for each input.

  2. Binary searches are typically easier to implement.

  3. Data can be reviewed more easily after a run. It is, in particular, possible to group all relevant printed information in an easily overviewed paragraph, where a debugger would often require several clicks per variable and only allow viewing one variable at a time, and only during the run time.

  4. The printed output can be piped into commands like grep to further facilitate information handling.

  5. The (sometimes considerable) overhead of a debugger is not present.
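
Points 1 and 4 can be illustrated with a small sketch. The method midpoint is a made-up stand-in for whatever code is under suspicion (here, an integer average that overflows for large inputs), and the PROBE prefix and key=value layout are just one convention that happens to filter well:

```java
// Hypothetical probe: runs a suspect method over several inputs and prints one line per input.
class PrintProbe {

    /** Stand-in for the code under suspicion: the sum overflows for large values. */
    static int midpoint(int low, int high) {
        return (low + high) / 2;
    }

    /** Builds one self-contained line per call, so that the output can be filtered line by line. */
    static String probe(int low, int high) {
        return "PROBE midpoint low=" + low + " high=" + high + " result=" + midpoint(low, high);
    }

    public static void main(String[] args) {
        int[][] inputs = { {0, 10}, {5, 6}, {-4, 4}, {1_000_000_000, 2_000_000_000} };
        for (int[] input : inputs) {
            System.out.println(probe(input[0], input[1]));
        }
    }
}
```

Piping the output through, say, grep 'result=-' leaves only the lines with a negative result, which here singles out the overflowing input among otherwise unremarkable lines.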

Trial-and-error

Haphazard trial-and-error is one of the worst ways to debug; systematic and thought-through trial-and-error is a legitimate technique. Legitimate uses include making small changes in input and observing changes in output (and thinking about how these changes can be explained), identifying a group of cases (of input, pre-conditions, method calls, ...) that work and another that does not work, etc.

Best use of a debugger

Arguably, the best use for a debugger is not debugging (although this is very valuable too), but simply to walk through the code interactively once it is done: More likely than not, one finds that some statement is off, that some assumption does not hold (e.g. concerning a return code), that one does not actually entirely understand the code, or similar, even if the code does not actually fail during the execution! Finding a potential later bug before release in this manner is very cheap. (A further benefit is that the change in timing induced by the debugger can provoke some bugs that do not occur without the debugger.)

This is particularly true when one's own code is interacting with code written by others (which is almost always the case in modern software development), when large parts of the runtime environment are configured, when dynamic class-loading is used, etc. Even knowing with what implementation of an interface the code interacts can be hard without a dynamic check. Further, the assumption that already existing code is correct is very often false.


Side-note:

This type of walk-through is also a very valuable aid when trying to get to know an unfamiliar system; in particular, when written by poor developers or with inadequate documentation.


How to use a debugger

A modern debugger (at least for Java) has many capabilities that are often wasted by the beginner, the most important likely being the “frame drop”: the ability to throw away a set of changes since a particular point in time and start over from there. This is immensely useful e.g. when one sees that the execution is already past the point of error, when one wants to dynamically test a certain scenario with different input or output values, etc., without having to start over from the beginning. One typical case is finding an erroneous line of code during debugging, correcting it, dropping the frame, resuming execution, and confirming that (or checking whether) everything is now in order.

Other important capabilities include “inspection”, tracing of variable values, and dynamic execution of code input during debugging. Generally, it pays to read the documentation of the current debugger sufficiently closely to know what capabilities it has (these will vary somewhat, as will the names of the different capabilities), and to take full advantage of these. Too many developers restrict themselves to setting break points, walking through the code, and just looking at the values displayed in an “objects” window; in fact, some of them lose time by not even using all types of break points available, restricting themselves to the line-based ones...


Side-note:

Many of these capabilities can be very helpful when using non-debugger techniques; in particular, with applications that have a long build or start-up phase.


Problem resolved: What now?

Depending on the exact cause of a bug, it can be worthwhile to look for other instances of the same underlying problem. Consider the trivial case of a newcomer to Java having used & where && would have been appropriate: If he did this once, chances are that he has done so consistently. It is also likely that he is unaware of the difference between | and ||, and this too should be checked.
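
For readers unsure of the difference: on boolean operands, & always evaluates both sides, while && skips the right-hand side when the left-hand side is already false. A minimal sketch (the class and method names are made up for the illustration):

```java
// Hypothetical example of the & vs. && confusion on boolean operands.
class ShortCircuit {

    /** Newcomer's version: & evaluates both operands, so s.isEmpty() runs even when s is null. */
    static boolean nonEmptyEager(String s) {
        return s != null & !s.isEmpty();
    }

    /** Intended version: && stops after "s != null" is false, so null is handled safely. */
    static boolean nonEmptyLazy(String s) {
        return s != null && !s.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(nonEmptyLazy(null));  // false
        System.out.println(nonEmptyEager("x"));  // true: the two agree for non-null input
        System.out.println(nonEmptyEager(null)); // throws NullPointerException
    }
}
```

Note that the two versions agree on all non-null input, which is why such an error can survive casual testing.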

It can (in own code) be worth the effort to pay attention to any underlying cause for error, and to draw lessons for the future. The newcomer above would be wise to not only learn the difference between & and &&, but to grab a book and read up on the exact semantics of Java operators in general. (Obviously, someone else finding this error should make him aware of it.) Someone making an error when synchronizing multiple threads may find it beneficial to read up on issues relating to concurrency. Etc. Similarly, many developers have simple mistakes that they are very prone to make, and which it pays to check for both during development and debugging. I, for instance, have forgotten to actually assign a calculated value to a variable on several dozen occasions, and awareness of this weakness makes it possible for me to overcome it.
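
One common shape of the forgotten assignment, sketched here with Java strings (which are immutable, so that every “modifying” method returns a new object); the names are hypothetical:

```java
import java.util.Locale;

// Hypothetical example of a computed value that is never assigned.
class DroppedResult {

    /** Buggy: both results are computed and then thrown away; 'name' is unchanged. */
    static String normalizeBuggy(String raw) {
        String name = raw;
        name.trim();
        name.toLowerCase(Locale.ROOT);
        return name;
    }

    /** Fixed: each result is assigned back before the next step uses it. */
    static String normalizeFixed(String raw) {
        String name = raw;
        name = name.trim();
        name = name.toLowerCase(Locale.ROOT);
        return name;
    }
}
```

The buggy version compiles without complaint from the compiler proper, which is one reason to check for this pattern deliberately.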

If the search was lengthy, some time can be spent on re-tracing steps to identify which measures were a waste of time and which eventually led to the goal.

Various observations

Beware that, when programming, two wrongs can make a right; and that when one wrong is corrected, the other can become manifest. Similarly, the two wrongs can make a right most of the time, but not always.
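
A contrived sketch of two wrongs making a right: a parser that forgets to divide a percentage by 100, and a consumer that wrongly divides by 100 once more. All names are made up for the illustration.

```java
// Hypothetical example: two bugs that cancel until one of them is fixed.
class TwoWrongs {

    /** Wrong: should turn "25%" into the fraction 0.25, but returns 25.0. */
    static double parsePercentBuggy(String text) {
        return Double.parseDouble(text.replace("%", ""));
    }

    /** Right: "25%" becomes 0.25. */
    static double parsePercentFixed(String text) {
        return Double.parseDouble(text.replace("%", "")) / 100.0;
    }

    /** Wrong: documented to take a fraction, but divides by 100 as if given a percentage. */
    static double discountedBuggy(double price, double fraction) {
        return price * (1.0 - fraction / 100.0);
    }

    /** Right: takes a fraction and uses it as one. */
    static double discountedFixed(double price, double fraction) {
        return price * (1.0 - fraction);
    }

    public static void main(String[] args) {
        // Both wrong: the errors cancel, and the result looks correct (150.0).
        System.out.println(discountedBuggy(200.0, parsePercentBuggy("25%")));
        // Only the parser fixed: the second error becomes manifest (roughly 199.5).
        System.out.println(discountedBuggy(200.0, parsePercentFixed("25%")));
        // Both fixed: correct again (150.0).
        System.out.println(discountedFixed(200.0, parsePercentFixed("25%")));
    }
}
```

Someone who fixes only the parser will see a previously “working” discount break, and may be tempted to conclude that the fix was wrong.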

In contrast, a set of errors need not have one cause: Where one bug is present, more are likely to exist.

Do not dig down in trenches and try to out-last the bug: Take a break, gain a fresh perspective, and start over with a fresh mind.

For non-trivial bugs, it typically pays to think, not just run around in a debugger: What could cause failures like this? What methods could reasonably be responsible? What does not explain it?

For newly discovered bugs, the first stop should almost always be to check what has been changed recently. This may seem too trivial to mention, but many developers fail in this regard.

Using a debugger can severely change timing and concurrency situations, which can severely reduce its value for bugs that depend on them. The same applies to print statements (however, typically to a lesser degree) and test cases.

Make sure that all warnings and error messages have been accounted for: All too often, beginners ignore warnings as harmless, only to be severely bitten two weeks later. Similarly, applying various lint-like tools and other quality checkers, running compilers in the strictest possible modes, etc., can help by pointing out potential problem sources. (Then again, a good developer does this when developing, not when debugging.)

If you are a newcomer to a language, make very, very sure that you have addressed all arrays (respectively the corresponding construct) correctly: “Off-by-one” errors occur even among the veterans of the language; for someone switching from a 0–n-1 language to a 1–n language (or conversely) they are near unavoidable.
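
Two typical shapes of the error in Java, where arrays run from 0 to n-1, sketched with hypothetical method names. The first is the more dangerous, because it fails silently:

```java
// Hypothetical examples of off-by-one errors when looping over a Java array.
class OffByOne {

    /** Habit from a 1..n language: starts at 1 and silently skips values[0]. */
    static int sumOneBasedHabit(int[] values) {
        int sum = 0;
        for (int i = 1; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    /** Inclusive upper bound: reads values[values.length], which does not exist. */
    static int sumInclusiveBound(int[] values) {
        int sum = 0;
        for (int i = 0; i <= values.length; i++) {
            sum += values[i]; // throws ArrayIndexOutOfBoundsException on the last pass
        }
        return sum;
    }

    /** Correct for Java: indices 0 to values.length - 1. */
    static int sumCorrect(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }
}
```

For the array {10, 20, 30}, the first version returns 50 without complaint, the second throws, and the third returns 60.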

TODO Bug management: Testing any particular scenario gives a chance of finding a bug—and a radically overhauled version of a piece of software is likely to have more than its fair share.