Michael Eriksson
A Swede in Germany

Advice for writing better tickets and bug reports

Notes on terminology

For simplicity, I will use “ticket” and similar expressions throughout, regardless of the nature of the tickets and including cases where “bug report”, or some other term, might be more common.

For the creation of a ticket, I separate two different roles: (a) The user who wishes to report a problem. (b) The author of the ticket, who in the context below is usually a member of first-level support and not the same person as the user.

First-level support (call-center/service-desk/whatnot employees) will be referred to by “FLS”.

See also the introduction below regarding the use of terms such as “developer”.

Introduction

Over the years, I have gained considerable experience with tickets. All-too-often, these are abysmally poorly written. The consequences include problems with understanding the ticket, misunderstandings that lead to faulty solutions, time wasted when requesting additional information, etc. Notably, these consequences are often more damaging to the user (sometimes to the author) than to the developers, because the user’s problem is not resolved in a timely and correct manner.

Below follows a discussion of some related issues and advice on how to do it better, with a strong focus on FLS.

Much of what is written below applies when users are also authors, at least when they want to see their problems successfully resolved. However, not everything applies, and such user authors must be treated differently from non-user authors: They do not have the quality obligations that FLS has, but do have a legitimately different perspective on topics like time (FLS is paid to provide support; users are not), usually have little experience with writing tickets, etc. Indeed, here some advice for developers in an older article takes precedence.

Beware, however, that not all advice will be relevant in all contexts. Notably, below I focus on issues relating to software development that do not always translate (without modification or at all) to other areas where tickets are written. Even when the advice does apply, the terms used (e.g. “developer”) might need modification to refer to, e.g., a system/database administrator or a member of higher support levels. As a special case, some of the advice is based on my experiences with server-side development, and the details might differ for e.g. a developer of desktop software, say, in that log files are either not present or exist on a per-user basis on the user’s own computer.

Moreover, the text is partly written with an organisation in mind that has no, or only a rudimentary, layer between FLS and development, which is the situation that I am most familiar with. However, especially with large businesses, intermediary layers can exist. While the overall principles hold true, some change in detail might be needed when such layers exist. (And additional issues of bureaucracy, territoriality, and similar become more likely the more layers there are.)

Main rule

Future readers of a ticket are only human: The risk of misunderstandings is great; the possibility of mind-reading negligible. Tickets must be written with this in mind—and any problems caused by tickets that are unclear, misleading, or missing mandatory information rightfully go on the shoulders of the ticket’s author.

Subject

The subject (title, short description, whatnot) of a ticket must be sufficiently clear that readers of ticket listings have at least a general idea of the contents without actually opening the individual tickets. Not having this idea can lead to a considerable increase in the work needed for other parties—while the ticket’s author often saves no time at all.

Adding various classifiers to the subject is usually a bad practice. These properly belong in attributes or fields of their own—and the ticket software should have the capability both to add custom attributes and to filter by these. However, with inferior software, putting them in the subject can be a valid work-around, as long as the number is kept low and a consistent standard is used.

Main description

The main description of a ticket is the most important part, and it is very important to pay attention to exactness and understandability of language. In my personal experience, these factors are more important than completeness.

At a minimum, the ticket’s author should read it through once and correct errors, straighten out ambiguities, etc. Attempts to save time here will cost others far more time down the line—and might even backfire, forcing the author to re-work the ticket a few days later.

Completeness, however, is important, and it is better to err on the side of too much than too little information. In particular, care should be taken not to exclude “obvious” or “obviously unimportant” information without having a very fine-tuned sense of judgment: Surprisingly often, these categories contain that crucial piece of information that makes the difference between a 30-second and a 30-minute search.

Attachments

Attachments to tickets can be a great help; however, they are not a replacement for the main description. The central points of a ticket should always be copied into the main description—even when they are also present in the attachments.

This small effort at the time of writing can save considerable time over the accumulated lifetime of a ticket: readers do not have to open the attachment over and over again, no additional applications are necessary, and the built-in search mechanisms of various tools will work better.

Particular care should be taken with attachments that come in non-standard, proprietary, and/or vulnerable-to-viruses formats; and with attachments that in turn contain further files. For instance, I have encountered tickets with an attached email, which, in turn, had a ZIP-file attached, which, in turn, contained a CSV-file—all to provide three lines of data that could (and should!) have been copied into the actual ticket.

Worse, I have seen ticket authors who, as a matter of course, create their tickets by taking an email complaint from an end-user, attaching it to the ticket, and adding a description reading in its entirety “S. attachment”. Inexcusable!

Prefer identifiers to names

From a developer’s point of view, identifiers are often more interesting than personal names and similar: Yes, he might be able to find out which account belongs to “Karim Papadopolous”, but it will take him (potentially, far) longer than for “account number 185034503” or for a suitable username. When it comes to “John Smith” an identification might be entirely impossible or require careful conclusions from various other data. (However, giving both account number and name can be a good precaution, to reduce the risk that a typo causes work on the wrong account.)

As a rule-of-thumb, it is better to provide too many identifiers than too few. For instance, the user of an online shop might have more than one current order, each with several items, each going to a separate address, and it can make a major difference whether just the user or the user, order, item, address, and whatnot can be easily and uniquely identified. (And, yes, we might even have the same product in several orders, e.g. because someone has bought some particular book to be delivered to several recipients in time for Christmas.)


Side-note:

From a UI point of view, it is important to ensure that all identifiers and whatnot that can be relevant are actually accessible. (Above, e.g., that the identifier of the order is visible in the web interface.)

Similar remarks can apply elsewhere, but are off-topic and will not normally be discussed.

I do caution against the common trend to hide information from the user, however. This applies especially to the destructive form in which a user must never, ever, under any circumstance see an error message beyond the equivalent of “something went wrong”.


The actions of users

Problems will often depend strongly on what exact actions the user takes in what order—and having this information can be vital to identifying the problem. Notably, there is often more than one way to achieve a certain end, and a developer might choose a different way, conclude that “it works for me” and “is not reproducible”, after which the ticket will be returned to FLS for further action.

To take a trivial example: The menus for a particular application are broken and “File/Save” has no effect. A ticket complains that nothing happens when a user tries to save a file. The testing developer tries to reproduce the error, uses CTRL-S (or whatever the local keyboard shortcut is)—and finds everything working.


Side-note:

This is a good example of how important the interpreter role of FLS can be: Not only are such differences in use between users on various experience levels common, but each might see “his” way as “the” way.


Alternatively, consider a user who saves by navigating menus with the keyboard (often by something like ALT-F ALT-S)—but who falls on his face because this combination does something else in a certain set of circumstances. (Notably, some applications have the bad habit of changing what keys have what effect in the same menu when circumstances change. The “S” will often go to the first entry in a menu that starts with an “S” and when the number of entries varies... Then there is the issue of different language versions of a particular software using different keys for this-and-that.)

In both cases, it is vital to know not that “the user tries to save”, but that “the user [performed a particular set of lower level operations] in order to save”.


Side-note:

Generally, FLS should be careful about using overly high-level descriptions and should instead ask the user for the exact, lower-level steps used.


Similarly, it can be vital to know what exact inputs are used. For instance, not “user performs a search”, but “user performs a search with the search phrase [whatever applies]”.

Reproducibility

Knowledge about the reproducibility of a problem is often critical to the investigations—or to the decision whether the problem needs an investigation. Correspondingly, the reproducibility should always be clearly stated.

In particular, a simple “yes”/“no” or “reproducible”/“not reproducible” is insufficient—starting with the problem that not all ticket authors necessarily mean the same thing by these words. At least the following cases need to be differentiated:

  1. A problem is reproducible all of the time, most of the time, some of the time, every once in a blue moon, or not at all.

  2. A problem is reproducible with the same data and/or with the same set of operations, but not with all data/operations.

  3. A problem with a “broken” piece of data reproducibly leads to a certain error and/or a certain set of operations reproducibly “breaks” a given piece of data.

Obviously, a complete knowledge of the above is only possible in approximation; however, it is important that at least some attempts to gather the information have been made—and that this information is actually added to the ticket.

Multiple concerns/wishes

As a rule, any ticket should deal with exactly one issue. If this is not the case, it will be hard, possibly impossible, to make correct categorizations and plan well, various actions in the ticket tools might be inapplicable, individual concerns might be accidentally forgotten, ...

This applies even when the user has reported several problems in one session, one email, or similar. Imagine, in particular, that several problems are discussed in one email and that the email is attached to a single ticket with that lazy “S. attachment”—what are the developers to do with such nonsense?!?

Checklists

If FLS have been given a checklist with specific questions to ask, it is important to realize that this checklist is not a replacement for the main description—it is a means to ensure that a certain minimal set of information is always present in a reasonably consistent and easily scannable form. Filling in the checklist does not remove the obligation to give a good prose description of the problem at hand.

Conversely, if there are fields for a particular piece of information in the checklist, that field should typically be filled even when the information is already present in the main description. (However, here considerable variation from case-to-case and company-to-company might apply.) These fields do not only make it easier to scan for the information, but can also be searchable in a more targeted manner than the main description.


Side-note:

Unfortunately, checklists are often poorly thought-through and excellent examples of one-size-does-not-fit-all. A discussion of this problem is beyond the scope of this article; however, beware of making them too inflexible or too detailed and think twice before making any given field mandatory: Not all of them need be relevant in all scenarios and sometimes there simply is no good answer to give to a particular question. It is also not certain that a checklist suitable for one product is so for another.

A strong contributor to such problems is when checklists are written by someone like the head of TLS, some manager somewhere, or similar. These can rarely judge what data are actually relevant to the developers (and are not necessarily all that bright to begin with).

Under no circumstances should checklists be abused to gather irrelevant information for secondary purposes, e.g. to profile users or to sell email addresses to third parties.


Error messages (and other messages)

Error messages must be given verbatim. This is necessary for at least two reasons: Firstly, a verbatim error message makes it easier to search the code for the exact place of the error. Secondly, a non-verbatim error message can include distortions of the actual message that lead the developers down the wrong road. If the error message reads “User account X is invalid”, the ticket should not claim e.g. “The error message says that the user’s account was not valid”: The correct version is “The error message says ‘User account X is invalid’”.


Side-note:

A considerable simplification, especially for multi-language software, is to have the application print an error number for all known errors (the “ORA-...” numbers used by the Oracle DBMS are a good example)—something that I strongly recommend whenever feasible. Just giving the number also immediately tells what kind of error occurred, and the right place in the code can often be found more directly. However, the individual error message can contain information not encodable in the error number. For instance, an error number can imply “File not found!”, but cannot generally tell what file was not found. Here the overall textual message (e.g. “File [file name] was not found!”) must still be given.
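As an illustration of the error-number idea, here is a minimal Python sketch. The catalogue, the `APP` prefix, and the function `format_error` are hypothetical; the point is only that the number identifies the type of error, while the filled-in template carries the details (such as the file name) that the number alone cannot encode.

```python
# Hypothetical error catalogue: each known error type gets a stable number
# (in the spirit of Oracle's "ORA-..." codes) and a message template.
ERROR_CATALOGUE = {
    1001: "File {file_name} was not found!",
    1002: "User account {account} is invalid",
}

PREFIX = "APP"  # hypothetical application prefix


def format_error(number: int, **details: str) -> str:
    """Return the user-visible message: error number plus filled-in text.

    The number identifies the *type* of error; the details (e.g. the file
    name) carry information that the number cannot encode.
    """
    template = ERROR_CATALOGUE[number]
    return f"{PREFIX}-{number:05d}: {template.format(**details)}"


if __name__ == "__main__":
    print(format_error(1001, file_name="/tmp/report.csv"))
```

A ticket quoting “APP-01001: File /tmp/report.csv was not found!” thus gives the developer both a searchable error type and the specific file, and the number survives even if the text is translated for another language version.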


Screenshots

The claim that a picture would be worth a thousand words does not usually apply when it comes to tickets. However, screenshots can still be of immense value by reducing the problem of inexact and confusing statements. A developer looking for the exact error message, what exact page the user was on, whatnot, is often better off with a screenshot—simply because the textual claims made are often faulty.

Similarly, omitted information (including information that might quite legitimately have seemed irrelevant at the time) can often be retrieved later from a screenshot.


Side-note:

However, it is important that all immediately relevant information be added to the main description—even when a screenshot is present. For one thing, the discussion of attachments above applies; for another, absence of data in the description can both introduce unnecessary errors and unnecessarily move leg-work to the developers. Ideally, the user just copies the data (e.g. an error message or an account number) from the application; failing that, FLS should extract the information from the screenshot (manually or through OCR); only as a last resort should this be the job of the developers.


Source code for HTML pages

Where HTML (and some similar formats) is concerned, it might even pay to send the actual source code of the page where the error occurred. For instance, it might be possible to see that some internal link points to the wrong place, that an internal variable has an unexpected value, or similar.


Addendum:

The original version of this page was written in 2012. As of 2023, so many web pages contain such utterly absurd amounts of overhead (and usually pointless overhead, at that) that this advice might no longer be viable.


Time of problem

Knowing what time a problem occurred (with a granularity no greater than roughly one minute) can be an enormous help in finding the right log-file entry—and the log-file entry can be an enormous help in finding the problem. (In e.g. Java, such an entry can often pin-point the exact line of code where a particular problem manifested itself.) Further, even with a coarser granularity, important information can be implicitly provided, including e.g. whether the problem coincided with a known, more general, system error, a denial-of-service attack, or other exceptional events.
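To show how a reported time narrows down a log file, here is a minimal sketch under stated assumptions. It assumes (hypothetically) that every log entry starts with an ISO 8601 timestamp including a UTC offset; real log formats vary. The reported time must carry its own time zone, since a user’s local clock and the server’s log need not agree. The function name `entries_near` and the two-minute window are illustrative choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Iterator


def entries_near(log_lines: Iterable[str], reported: datetime,
                 window_minutes: int = 2) -> Iterator[str]:
    """Yield timestamped log lines within +/- window_minutes of `reported`.

    Assumes (hypothetically) that each entry starts with an ISO 8601
    timestamp with a UTC offset, e.g. "2023-05-04T12:31:10+00:00 ERROR ...".
    `reported` must be timezone-aware, so that the user's local time and
    the server's time are compared on the same basis.
    """
    window = timedelta(minutes=window_minutes)
    for line in log_lines:
        stamp = line.split(" ", 1)[0]
        try:
            logged = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # e.g. stack-trace continuation lines carry no timestamp
        if logged.tzinfo is None:
            continue  # no offset given: cannot be compared safely
        if abs(logged - reported) <= window:
            yield line


if __name__ == "__main__":
    log = [
        "2023-05-04T12:30:55+00:00 INFO request started",
        "2023-05-04T12:31:10+00:00 ERROR order could not be saved",
        "    at com.example.OrderService.save(OrderService.java:42)",
        "2023-05-04T12:40:00+00:00 INFO unrelated request",
    ]
    # The user reports "about 14:31" local time, in a zone at UTC+02:00.
    reported = datetime(2023, 5, 4, 14, 31, tzinfo=timezone(timedelta(hours=2)))
    for entry in entries_near(log, reported):
        print(entry)
```

Here the two entries from 12:30:55 and 12:31:10 UTC are selected and the 12:40 entry is not. A real tool would keep continuation lines (such as the stack trace) attached to their entry rather than dropping them, as this sketch does.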


Side-note:

Here the problem of different time settings becomes important: It is not uncommon for different computers to differ by several minutes in terms of the time baseline—and the time actually displayed can be off by several hours due to different time zones.

To avoid such complications, it can pay to display the official system time of the server with every error message—or, better yet, display a unique, per-event error id that is also present in any log entries. (Not to be confused with the error number mentioned above, which refers to the type of the error—not the individual occurrence of the error. However, the error number can obviously be made part of the error id.)
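A per-event error id might be produced along the following lines. This is a sketch only: the `E<number>-<random>` format, the logger name, and the function `report_error` are hypothetical, and a real system would route the log entry through its own logging setup.

```python
import logging
import uuid
from datetime import datetime, timezone

logger = logging.getLogger("app")  # hypothetical logger name


def report_error(error_number: int, message: str) -> str:
    """Log one occurrence of an error and return the text to show the user.

    The id combines the error number (the *type* of error) with a random
    component (this *occurrence*). The same id goes into the log and into
    the user-visible text, next to the server's own UTC time.
    """
    # 12 hex digits of a random UUID: short enough to read out on the phone;
    # collisions are unlikely but not impossible (use the full UUID if needed).
    occurrence = uuid.uuid4().hex[:12]
    error_id = f"E{error_number}-{occurrence}"
    server_time = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")
    logger.error("[%s] %s", error_id, message)
    return f"{message} (error id {error_id}, server time {server_time})"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(report_error(1001, "File /tmp/report.csv was not found!"))
```

If FLS copies the id into the ticket, the developer can search the logs for exactly that string, regardless of clock drift or time zones on the user’s side.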


Recurring problems

It is common for certain (application specific) problems to be recurring and it obviously becomes tempting for FLS to not take down all information for each and every user who makes a certain complaint.

To a certain degree, this is perfectly acceptable and a great time saver; however, great care must be taken in several regards. Most notably, any short-cuts in this direction must be discussed with development before they are implemented. This is to ensure that development knows about the short-cut, that it is not implemented before sufficient data has been gathered overall (data from multiple users can be very helpful for finding a pattern that eventually pin-points the problem), and that different problems with merely similar symptoms can be kept apart.

Other issues include that data-gathering might still be necessary in order to take user dependent actions, to later verify that a particular bug fix actually solved a particular problem, whatnot; that new problems can be accidentally classified as standard cases; and that it must be possible for those not around when the ticket was created to understand it six months later.

Ideally, a classification to standard cases is done through some sort of formal relationship or attribute in a ticket tool. Failing this, I very strongly recommend the use of unique identifiers including a meta-description and a number (e.g. “Standard problem #5”). Failing this, too, the information provided must be kept very highly consistent between tickets, so that there is no risk of misinterpretation or confusion; in particular, different FLS members must not use different standard formulations for the same problem.


Side-note:

Will not such needs be very rare? After all, most recurring problems (e.g. common user errors and many misconfigurations) can be handled entirely by FLS, while a user with a problem caused by a bug is usually just told to wait for a future bug fix.

One might think so, but for the customer that I worked for at the time of writing, such standard problems were common (and not coordinated with development) and often required individual intervention by developers, e.g. to correct entries in a central database. For those where a central one-off solution might have been possible, too little data for efficient work had usually been gathered, even on the first few tickets. (Arguably, an independent problem. Indeed, much of this text goes back to depressing experiences with this particular customer.)


Excursion on FLS as translator between developers and users

A core task of FLS is to serve as a translator and liaison between users and developers. These two groups typically have radically different computer skills and views on/understanding of the application. In many cases, they will have problems even with differences in basic terminology. Further, FLS has (should have...) considerable experience in dealing with end-users with little computer skills—developers are, at best, amateurs in this area.


Side-note:

An unrelated reason why FLS should serve as an intermediary is economics:

Almost all developers are paid more than almost all FLS members and almost always do work that brings a greater business value and has a greater time criticality. When external IT consultants are hired to work on a project, something more and more common, the difference increases further. It rarely makes economic sense e.g. that a developer repeatedly spends time trying to reach a user on the telephone, just to clarify the details of an error message—such tasks are better left to FLS.


Correspondingly, it is the job of FLS to make a sufficiently in-depth interview with the user that the ticket reaches a certain quality and completeness. In particular, it is entirely unacceptable to just take down a user’s monologue verbatim.

What happens in later steps will depend on the organisation and the circumstances, but often it should also be the task of FLS to handle later user contacts, including e.g. requesting more information. This especially when the user does not have access to the ticket tool and it is not possible for the developers to contact the user via that tool.


Side-note:

In those cases where it makes sense for FLS to handle future contacts, it is important to actually hold FLS to that responsibility: It is not uncommon that individual members (due to laziness or a high work-burden) or managers (e.g. due to over-focus on their own team, rather than a more holistic view of the company/project/whatnot) try to refuse this responsibility—even when the responsibility has been formally assigned by upper management. Given reasons range from none (combined with obstinate refusal) to short-sighted argumentation, e.g. that deeper application knowledge would make the discussion with the user easier for a developer. (Can be the case, but the opposite often applies for the reasons given above.) Indeed, in one organisation I was regularly faced with the problem of FLS writing inexcusably poor tickets—and then refusing to even contact the user to fill in the mandatory information it neglected to collect the first time around...

At the same time, it is important not to take a certain distribution of responsibility religiously: In other settings, it might make more sense for the developers to take on a greater part of the burden—and the really tricky cases often need a developer.

Notably, if FLS has done its job properly to begin with, the chances that direct developer–user contact makes sense are considerably increased.


Excursion on time per ticket/call

It is not uncommon for FLS (in particular, low-level call-center members) to be subject to strict time constraints, e.g. “no more than fifteen minutes per caller” or “at least fifty calls taken per day”.

These constraints tend to do more harm than good, leaving members of FLS stressed out, users dissatisfied, and tickets in a sub-optimal state—leading to greater time-losses at later stages than originally saved.

I strongly recommend removing such constraints and instead using other mechanisms to avoid actual waste of time (say, an employee engaging in excessive small-talk with a user after his problem has been resolved). Such mechanisms could include reviews of employees with a below-average number of calls handled or simply having a supervisor listen in to any call that exceeds, say, twenty minutes. This while still bearing in mind that it is not quantity alone that matters, but the combination of sufficient quantity and sufficient quality.


Side-note:

The basic principle is so important that I considered making it a more prominent theme of this page. However, it applies over so wide a range of work and topics that it is more suitable for a later separate page of its own.

In short:

Producing shit is easy and can be done in great quantities in a short time—but doing so brings little or no value.

(Excepting some special cases, e.g. the fertilizer industry.)


Excursion on copying

When possible, it is important to actually copy error messages, inputs, and similar, not to blindly retype or blindly rely on screenshots. (Even apart from the extra effort and risk of typos when typing, and even apart from what is said above.)

What appears to be a particular piece of text can be something else and the risk that information is lost is reduced by copying.

For instance, there are many Unicode characters that are rendered identically in a typical font, and we might have situations like a user entering data in “Cyrillic” that looks like, but is not, “Latin”. If an application now has a check that all input must be “Latin” (leaving aside whether this is a good idea), the resulting error message might seem inexplicable.
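A rough way to spot such look-alike characters is to list every letter whose Unicode name does not begin with “LATIN”. The sketch below uses the character names from Python’s `unicodedata` module as a heuristic stand-in for a proper script check, which the standard library does not provide; the function name `non_latin_letters` is hypothetical.

```python
import unicodedata
from typing import List, Tuple


def non_latin_letters(text: str) -> List[Tuple[int, str, str]]:
    """Return (position, character, Unicode name) for every letter in `text`
    whose Unicode name does not start with "LATIN".

    Heuristic: the character name stands in for the Unicode script property,
    which Python's standard library does not expose.
    """
    suspects = []
    for position, ch in enumerate(text):
        name = unicodedata.name(ch, "UNNAMED")
        if ch.isalpha() and not name.startswith("LATIN"):
            suspects.append((position, ch, name))
    return suspects


if __name__ == "__main__":
    # Looks like "paypal", but the second letter is U+0430 (Cyrillic).
    print(non_latin_letters("p\u0430ypal"))
```

For the example string, position 1 is reported as CYRILLIC SMALL LETTER A, although the text renders like plain “paypal”. Such a check only helps if the original characters reach the developer, which is exactly why copying beats retyping.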

Similar issues include the possibility of “unprintable” characters, two spaces that are taken for one space, a single space that goes unnoticed (e.g. at the beginning of a user name), a tab character that is taken for several spaces, and even confusions of e.g. “o”, “O”, and “0”.
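Similarly, a small helper can make such easily overlooked characters visible before a value is pasted into a ticket or compared by a developer. The markers chosen here (“·” for a space, `<TAB>`, `<U+XXXX>`) are arbitrary, and the function `reveal` is hypothetical. Note that it does nothing about “o”, “O”, and “0”, which are ordinary characters that only a suitable font or an explicit code-point dump can tell apart.

```python
import unicodedata


def reveal(text: str) -> str:
    """Make easily overlooked characters visible.

    Spaces become a middle dot, tabs become "<TAB>", and any other
    whitespace, control (Cc), or format (Cf) character, such as a
    non-breaking or zero-width space, is shown as <U+XXXX>.
    """
    out = []
    for ch in text:
        if ch == " ":
            out.append("\u00b7")  # middle dot
        elif ch == "\t":
            out.append("<TAB>")
        elif ch.isspace() or unicodedata.category(ch) in ("Cc", "Cf"):
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # A user name with a leading space and a trailing zero-width space.
    print(reveal(" jsmith\u200b"))
```

For instance, `reveal(" jsmith")` gives `·jsmith`, exposing the leading space that is easily missed on screen.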

I have even encountered rare cases where a portion of a text was not visible in a browser, due to unfortunate color combinations, but was both there and copyable.


Side-note:

Unfortunately, such lack of visibility can also make it impossible to realize what needs to be copied, but copying can still be a help.

Consider a text like “Abc def ghi”, where “def” has been given emphasis through coloring, and text and background coincidentally have the same color, rendering the “def” unreadable. The user might now see “Abc [three spaces] ghi”. If this is typed, the “def” is lost; if it is copied, the “def” is preserved. To boot, it stands a good chance of being readable immediately after pasting.