Michael Eriksson
A Swede in Germany

The big ticket disaster

2024 introduction

This is one of many texts written in 2012 but only published beginning in 2023.

In this case, the 2012 text contained a number of to-dos and keywords for future expansion. These have either been silently cut to leave the already done text or been replaced with brief comments. (My memory of the events is, by now, too vague to allow a deeper treatment—or, sometimes, to even guess at what my intention with a particular keyword was.)

Apart from such changes, the text has been slightly polished, including language improvements; and the odd addendum has been written.

For mentions of Remedy, Redmine, and whatnot, keep in mind that I speak of the state in 2012 and make no claims about the state in 2024 (or a later time of reading).

And, yes, the other 2012 texts in this category deal with the same project. And, yes, the below claim of “most disastrous ticket situation” still, a dozen years later, holds true—by a great distance.

2012/original introduction

During a recent project, I witnessed the most disastrous ticket situation that I have ever seen. This page deals with my experiences and the lessons to be learned.

Note that this project took place within a highly political and bureaucratic organization. Among the problems this caused (important to bear in mind below) was that we developers had only a very limited ability to influence decision making (even for decisions relating directly to us), planning, methods used, what other departments did (even when grossly incorrect), etc. Our feedback on requirements, possible sensible extensions to the application developed, and possible complications of feature X (or lack of feature Y) was almost uniformly ignored.

The product was a special purpose tool, intended for a brief time-frame (roughly one year) of use only, in a government setting. During this time, intensity of use and what functionality was predominantly used varied considerably.


Addendum:

The “roughly one year” might have been the intention. In reality, due to various delays within the overall project, the period of use was much longer.


Due to the circumstances described below, I focused my own efforts on tickets, including much boring non-development legwork, feeling that this was in the best interest of the end-users and the overall project—and knowing that the other team members were too governed by instructions from above that were highly detrimental to ticket work (cf. below).

Hotline and service desk

General

The tickets, describing problems of end users, were usually provided by one of two instances, both fraught with problems: The hotline and the service desk.

Interestingly, even from an organisational POV, no-one seemed to be clear on exactly which of the two was responsible for what—but surprisingly (considering the rough descriptions present) the former was the lesser evil. This largely due to the head (HSD) of the service desk (SD), who was extremely problematic, and whose behaviour resulted in a number of complaints from development and several meetings between him and development. During the meetings, he seemed very understanding and cooperative; after the meetings, he continued exactly as before.

Uncooperative use of Remedy

An interesting example of the above is HSD’s repeated, destructive, and conceptually false use of a Remedy (our main ticket tool) feature for refusing transfers of tickets:


Addendum:

As an explanatory note:

Users of Remedy were divided into groups (e.g. service desk, hotline, development, administrators, ...) and each ticket was assigned to a group for further action in a “someone from group X, please do something” sense (as opposed to an ownership or “it’s your problem, not ours” sense).

During work on a ticket, the need often arose to transfer it to another group. Such a transfer set the status to “transferred” and indicated to the new group that a need for action was present. A common example was that development transferred the ticket in order to get a clarification of some claim made by the hotline or SD, to get more data from end users (to whom we had no direct access), or similar; after which the new group was supposed to provide the needed information (take the needed action, whatnot) and then transfer the ticket back to development.

However, if the transfer was inappropriate, a member of the new group could “refuse” the ticket, which then went back to the old group with the status of “refused”. Such an inappropriate transfer could include sending the ticket to someone who was simply not responsible for the requested action, e.g. if an administrator was told to clarify some statement made by SD. However, critically, it did not include the cases discussed below.

(This addendum is partially based on memories, partially on a hard-to-understand side-note in the original draft. I make reservations for memory errors and misunderstandings of that side-note.)
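The transfer and refusal semantics described in this addendum can be sketched in a few lines of Python. This is an illustration only: the class, function, and status names are hypothetical and say nothing about Remedy's actual data model or API. The point of the sketch is that a refusal is legitimate only when the receiving group is not responsible for the requested action.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    ASSIGNED = "assigned"
    TRANSFERRED = "transferred"
    REFUSED = "refused"


@dataclass
class Ticket:
    ticket_id: str
    group: str                              # group currently asked to act
    status: Status = Status.ASSIGNED
    previous_group: Optional[str] = None    # group that made the last transfer
    history: list = field(default_factory=list)


def transfer(ticket: Ticket, new_group: str, request: str) -> None:
    """Hand the ticket to another group with a request for action."""
    ticket.previous_group = ticket.group
    ticket.group = new_group
    ticket.status = Status.TRANSFERRED
    ticket.history.append(f"{ticket.previous_group} -> {new_group}: {request}")


def refuse(ticket: Ticket, responsible_groups: set) -> None:
    """Send a transferred ticket back to the old group.

    Allowed only if the current group is not among the groups
    responsible for the requested action (hypothetical rule check).
    """
    if ticket.status is not Status.TRANSFERRED or ticket.previous_group is None:
        raise ValueError("only a freshly transferred ticket can be refused")
    if ticket.group in responsible_groups:
        raise ValueError("group is responsible: accept the transfer and act")
    ticket.history.append(f"{ticket.group} refused the transfer")
    ticket.group, ticket.previous_group = ticket.previous_group, None
    ticket.status = Status.REFUSED
```

Under these assumptions, an administrator asked to clarify a statement made by SD may refuse, while SD asked to clarify its own statement may not.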


On a number of occasions, when we transferred tickets back to SD with a request for more information (including missing checklist items, cf. below), clarification of formulations, or the providing of an attachment mentioned in the ticket description (but not actually attached), he simply “refused” the transfer with the (automatically generated) claim that SD was not responsible. This despite the reason for the transfer being deficits caused by SD... Further, on several occasions, he weirdly supplied the requested information and declined the transfer. From our offline discussions, it is clear that his reasoning was that SD could not help (disputable; if true, not our problem) and therefore the transfers must be refused (incomprehensible; not compatible with the semantics of “refused”; contrary to the intended workflows). The result was that the ticket ended up with development again, where we, for lack of information, usually could do nothing sensible with the ticket. The correct procedure, obviously, would have been for HSD (or another SD member) to accept the transfer and either to clarify the issue or to inform the customer (not development!) that SD was at an end.

I do not know the reason for this misbehavior, including whether it was caused by incompetence (e.g. an inability to understand how the workflows were supposed to work) or something more deliberate. However, a possibility is that the motivation simply was a wish to keep tickets away, to artificially create the impression that SD was on top of its work, to look good in some metric, or similar. If so, it is a great example of how too much territoriality and too great a focus on e.g. a metric can be harmful. Here, the result might have been that SD looked better; however, this came at the cost of SD not actually doing its job, which hurt the other parties that SD was supposed to assist.

This misbehavior also gave SD the wrong incentives, as it reduced the apparent need to address the quality of the tickets. (While the true need, of course, remained the same.)

Competence and training

That the members of first-level support were not the most competent is hardly surprising: This is one of the first places where organizations tend to save costs—and the stressful and unrewarding work gives those competent strong incentives to move on to greener pastures.

However, in addition, they were not given any form of training with the product (!), nor did they have access to the product (!), nor did they have access to the handbook for the product (!)—an almost surreal situation, which severely reduced their possibilities both to help the end users (who often knew more on the topic than the support) and to write understandable tickets.

Notably, HSD deliberately turned down our offers to provide handbooks, claiming that they were out-of-date. They were, but not in a significant way, and even somewhat outdated handbooks would have been far better than none at all. (The handbooks were neither written nor maintained by development, but we, and the end-users, had ready access.) A few individual members of the SD were very thankful when I sent a PDF-version in an off-the-record email. Similarly, we repeatedly offered to give an informal training and at least a demonstration of the product, an offer that HSD never took us up on.

Checklists

The one major help that the first-level support had was a few checklists to be filled out. For some reason, these were different for hotline and SD, poorly structured, too inflexible for the task, and not truly understood by the support.

A particular example is the repeated use of the field for “error message” to describe what the end user considered the problem, not to give the actual error message that he was faced with. (Further, when an actual error message was given, then almost invariably in a rough, often incorrect, paraphrase, where we needed and repeatedly requested the literal error message.)

A common issue was the inability to understand that the checklists were never intended to be the complete error description, but just a complement and a way to ensure that vital standard information was present, e.g. what page of the web-based application was shown at the time of the error. The result was that a prose explanation was typically missing and that the ticket was impossible to understand without further clarifications.
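The intended relationship between checklist and prose description can be illustrated with a small Python sketch. Everything here is hypothetical (the actual checklists had other and more fields, and no such check existed in Remedy); the sketch only shows the idea that the checklist complements the prose description and that both are needed before a ticket is usable.

```python
from dataclasses import dataclass


@dataclass
class TicketDraft:
    description: str    # free prose: what the user did and what went wrong
    page_shown: str     # checklist item: page shown at the time of the error
    error_message: str  # checklist item: literal message; empty if none was shown


def missing_items(draft: TicketDraft) -> list:
    """Return the vital items that are absent from a ticket draft."""
    problems = []
    if not draft.description.strip():
        problems.append("prose description of the problem")
    if not draft.page_shown.strip():
        problems.append("page shown at the time of the error")
    # An empty error_message is acceptable: not every problem produces one.
    return problems
```

A draft with a filled-out checklist but no prose would be flagged, which is exactly the case that made so many tickets impossible to understand.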


Side-note:

The problem was compounded (and at least partially caused) by an artificial time-limit (IIRC, 5 minutes) on every support call, which made it hard for even the few conscientious members to write adequate, let alone good, tickets.

Such time-limits are extremely shortsighted and counterproductive.



Addendum:

Here several keywords followed, without elaboration, but often with a theme of “poor attitude” (presumably, among SD/hotline workers).

One references “my guidelines”: I and a colleague wrote a set of guidelines on how to write better tickets that we distributed to SD and/or the hotline. The result, as we were later told, was that the female head of the hotline threw a fit, because she felt that we had encroached on her authority... To this: Firstly, a guideline is a guideline, not e.g. an order. This alone makes her reaction wrong. Secondly, whatever she was doing, including any alternate guidelines that she might have provided, worked exceptionally poorly and did massive harm to the project. Chances are that it was not truly her authority but her ego that had been encroached upon. (Something supported by repeated signs of great incompetence and how the incompetent tend to be the ones that handle criticism the worst.)


Development prevented from working on tickets

I strongly believe (even before this project; the more so, after it) that it is important to work on tickets continually, to fix bugs, to clear the slate of service requests, etc.—and this even when a naive first impression might be that there is no time for this. The time lost today will be repaid by losing less time tomorrow, in a week, and in a month.

Here, however, we were kept from working on tickets through at least three mechanisms:

  1. Undue prioritization of planned features over making sure that the old ones worked. In effect, there were roughly six months (!) from my entering the project until we were allowed (!) to work on tickets in any significant manner. The only exceptions were blockers and errors that were not politically acceptable.


    Addendum:

    An over-prioritization of new features over fixing old features is by no means unique to this project. However, even by 2024, I can recall no other project where a formulation like “allowed” would have been warranted. Moreover, in other projects, the problems have often related less to bug fixing and more to resistance against elimination of “technical debt”, improvements of code quality through refactoring, and similar.

    Also note that this restriction did not begin when I entered the project, implying that the “six months” is a lower bound on the length of the delay. The true length could be twice that for all I know.


  2. Enormous amounts of time were spent on, by hand, transferring tickets from Remedy to Redmine (see below for a fuller discussion) and keeping the tools synchronised.


  3. Addendum:

    Where the below currently reads “X”, the draft read “TODO”. I am very uncertain what/who was intended here, but the general idea of political limitations should be clear.

    (The reason was likely that I had some German word/title/whatnot in mind and had not yet gotten around to finding an English translation.)


    Even when we were finally allowed to work on tickets (and had the time to do so), we were restricted by the prioritization of the X. Working on tickets with a low priority was frowned upon for political reasons: X might complain. This even when working on tickets with a lower priority would have filled waiting times (e.g. because higher-priority tickets needed clarifications or could not be treated further before some other pre-condition was met) or when they could be done in five minutes, making a user somewhere happy (or, as the case might be, less frustrated, angry, and desperate). The result was that we did not get as much done as we could have in another climate. On a few occasions, we were even left twiddling our thumbs, despite there being a ton of work that we could have done, had we not had these restrictions.

    The problem was made worse by the attitude of some of my fellow developers: They correctly observed that tickets without a certain priority or planned release version would likely never require solution from management’s point of view—and incorrectly concluded that solving them would be unnecessary work. (Is it truly a waste of effort, just because management does not care—even when there are users who would be made happy? Other aspects to consider include the benefit of increased quality and how removing a small problem might, as a side-effect, reveal or remove a larger.)


Addendum:

Here another set of keywords followed. Most are pointless, but some fall into an extended family of “more haste, less speed”, “an ounce of prevention is worth a pound of cure”, whatnot.

Observations in this family are not only very worthwhile in general but also illustrate well what type of problems we encountered and how much could have been done better at little or no extra cost.


Remedy and Redmine

For political reasons and as an organisational standard, BMC Remedy was used for ticket tracking—and was the only tool ever used by the hotline and SD.

Unfortunately, Remedy was woefully inadequate—without a doubt the worst ticket- or bug-tracking tool that I have ever worked with. (I cannot say, however, to what degree this was an inherent problem with Remedy and to what it was caused by the considerable local customizations.)

With these inadequacies apparent even to project management, they decided to run a second tool (the vastly superior, if by no means flawless, Redmine) in parallel. While the use of Redmine made life easier in many regards, including opening roads previously closed (e.g. by providing a good way to group individual tickets by topic), we were stuck with the task of moving a backlog of several thousand (!) tickets to Redmine by hand (!), while also taking care of a daily in-flow of some fifty to a hundred new tickets...


Side-note:

There might have been a way to automate this, but my suggestions to investigate the possibility were not met with enthusiasm, we “did not have the time” (a short-sightedness that I disagree with), and there were some doubts that we would be allowed to access Remedy through its Java API (without which there would be no chance with so limiting a tool).

In short, no steps in this direction were actually taken.
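Purely as an illustration of what such an automation might have involved, here is a minimal Python sketch of the mapping step. It assumes that a Remedy ticket could have been exported as a simple record with the hypothetical fields `id`, `summary`, and `description`; whether the local installation allowed any such export is exactly what was never investigated. The target format follows the JSON body that Redmine's REST API accepts when creating an issue (`project_id`, `subject`, `description`); the sketch builds that body and does not contact either tool.

```python
import json


def to_redmine_issue(remedy_record: dict, project_id: int) -> dict:
    """Map one exported Remedy record (hypothetical fields) to a Redmine issue body."""
    remedy_id = remedy_record["id"]
    return {
        "issue": {
            "project_id": project_id,
            # Keeping the Remedy id in the subject makes it possible to find
            # the counterpart ticket when the two tools must be kept in sync.
            "subject": f"[Remedy {remedy_id}] {remedy_record['summary']}",
            "description": remedy_record.get("description", ""),
        }
    }


# Example with made-up data; the id format and project number are invented.
record = {"id": "R-0001", "summary": "Cannot undo cancellation", "description": "..."}
body = json.dumps(to_redmine_issue(record, project_id=7))
```

The mapping itself is the trivial part; the open questions at the time were access to Remedy and the handling of the local customizations.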


User errors, etc.

A clear majority of the tickets did not stem from software errors, but from various user errors or the results of insufficiently thought-through requirements (e.g. too limited abilities to undo actions by GUI, failure to display vital information to the end users). Through the postponed ticket work, these were left to gather dust for months, where an (often) quick extension of the GUI would have stopped the flow within one release cycle, and often cleared out the already present tickets for this issue at the same time.

A good example is the marking of to-be-treated entities as cancelled (“Ausfall”) and not to be treated: A functionality to undo this cancellation had been suggested by the developers even before I entered the project, but was only added some eight or nine months afterwards. In the meantime, there were hundreds of incorrect cancellations (mostly through user errors) that needed to be undone and that prevented the end-users from completing their allotted work.

To make matters worse, these “unnecessary” tickets still had to be administrated, which cost a lot of effort and delayed the time when we were finally allowed to start large-scale ticket work.

Help at the wrong time

The best situation, in terms of ticket work, that we had was the time after my mastering enough of the application, database, and domain knowledge to make a brief pre-categorisation investigation of each ticket. This led to roughly half the new tickets being sent back with simple solutions (including pointers to workarounds, user errors, or that something was simply not a problem, but just seemed to be so through a sub-optimal display of data in the application) within a day, instead of lying around, untreated, for months on end.

This situation did not last long, however: After one or two months, we were given a helper to clear up the administrative backlog (which was almost gone by that time...) and to take over much of the legwork involved with the Remedy->Redmine transfer. Simultaneously, I was made the lead on the development of an import interface to a third-party product, which swallowed most of my time. The result was that I was no longer able to do the pre-check on most tickets, and had to waste quite a bit of time on correcting the helper’s mistakes. (That he made mistakes is natural: The classification was hard and he had to start from scratch. The fault lies with management, who sent a “helper” at a time when he did more harm than good. In contrast, had he been added to the team at the beginning of the Remedy->Redmine transfer, he would have been an asset.)