Less stress, more thinking

2024 Introduction/disclaimers

This text is one of many written in 2012 but only published beginning in 2023.

In this particular case, I suspect that parts of the details can be confusing, especially through use of technical and organisational terminology that might be unnecessarily specific or context dependent. It might be that the 2012 draft had not yet been sufficiently polished at the time that it was last touched.

I considered a re-write, but I believe that the general point should be clear even without an understanding of the details. Moreover, I would have been hampered by the vagueness of my own memories, and might, then, accidentally have falsified events through a “Chinese Whispers” effect.

Some minor language and similar corrections have taken place, however.

The choice of category is up for dispute, as the general point is more a matter of general approaches to work, problems, thinking, whatnot, than of software development specifically. I stuck with this category partially because a number of other texts deal with the same, highly problematic, project. (TODO add a list once all relevant texts are published.)

Main/2012 text

I am a great fan of thinking things through and gathering relevant data first and going to work second (“measure twice; cut once”). One particular incident provides a beautiful illustration of the waste of time and lack of productivity that can result when doing it the other way around:

It was around 15:30 on a Friday; of the four module-team members, I was the only one left in the office; and in stormed possibly the greatest liability of the larger project team, followed by two colleagues from the business side of the operation.

Side-note:

This article is not about specific, individual incompetence, but I note that he was of dubious technical competence, downright incompetent when it came to communication, had no sense of boundaries or respect for others' priorities, and was entirely lacking in the insights that are at the core of this page.

Indeed, a few days earlier I was performing a database comparison between our respective modules that project management had requested. I found it near impossible to get him to say what he meant and mean what he said. Conversely, I told him explicitly on at least two occasions that his preconception, that my evaluation involved only one specific sub-module, was incorrect; yet he kept referring to this one sub-module... He later admitted to having listened only “30 %” to my explanations and questions, so no wonder there were such problems. (This is also the reason why I allow myself this particular excursion.)

In his normal way, hyper-stressed and only semi-coherent, he went off about some database query or other that had to be done immediately for the business side—with no information on the background or what was actually wanted. He stayed long enough to ensure that I was influenced by his undue and infectious stress (something other colleagues have explicitly complained about; he, himself, appears to belong to the category of workers who work hard but still have problems getting things done) and then dashed off again, leaving me with the two others.

Fortunately, these were a little more forthcoming: Apparently, they needed to find out what data would be transmitted from the module developed by my team to the module of Mr. Liability during the upcoming nightly batch run.

To my surprise, they did not then leave to let me go to work, but actually hung around in my office, metaphorically looking over my shoulder.

Side-note:

While an audience does not automatically lead to premature action at the expense of thinking, this trap is psychologically hard to avoid. Further, an audience lacking relevant experience can often be a (non-psychological) hindrance, e.g. by misinterpreting silent inaction as an inability to proceed and offering “helpful” suggestions, or, in the case of truly troublesome managers, by verbally encouraging developers to “get to work”.

Well, I had had very little to do with this particular interface myself, but I knew enough to join together two materialized views, limit the result set to entries with the right status values, and (as requested) group them by region.
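To make the shape of such a query concrete, here is a minimal sketch in Python with an in-memory SQLite database. Everything specific in it is an assumption made for illustration: the names `mv_entries` and `mv_status`, the columns, and the status values `READY` and `RETRY` do not come from the actual project. SQLite also has no materialized views, so ordinary tables stand in for them.

```python
import sqlite3

# Hypothetical stand-ins for the two materialized views; all table names,
# column names, and status values are assumptions made for illustration.
SCHEMA = """
CREATE TABLE mv_entries (entry_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE mv_status  (entry_id INTEGER PRIMARY KEY, status TEXT);
"""

# Join the two views, keep only the assumed "to be transmitted" statuses,
# and count the remaining entries per region.
QUERY = """
SELECT e.region, COUNT(*) AS pending
FROM mv_entries AS e
JOIN mv_status AS s ON s.entry_id = e.entry_id
WHERE s.status IN ('READY', 'RETRY')
GROUP BY e.region
ORDER BY e.region
"""


def pending_by_region(conn):
    """Return (region, count) rows for entries awaiting transmission."""
    return conn.execute(QUERY).fetchall()


def demo_connection():
    """Build a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO mv_entries VALUES (?, ?)",
        [(1, "North"), (2, "North"), (3, "South"), (4, "South")],
    )
    conn.executemany(
        "INSERT INTO mv_status VALUES (?, ?)",
        [(1, "READY"), (2, "DONE"), (3, "READY"), (4, "RETRY")],
    )
    return conn


if __name__ == "__main__":
    for region, pending in pending_by_region(demo_connection()):
        print(region, pending)
```

Note that a query of this kind can only report what the views contained when they were last built; it says nothing about changes made to the underlying tables since then.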

Having done this, I duly wrote a “service request” ticket to have the query executed in the production database (to which none of the developers had direct access)—and went about a second task given to me at the same time: To start a similar batch run on one of the test servers for yet another party (and for unrelated reasons).

Side-note:

Normally, such parallelism is not a big issue (assuming that the tasks are very few and at most one requires a greater amount of thinking); however, in combination with an already stressful situation, it can make a bad situation worse.

While I was waiting for the results, it occurred to me that my query would give the results for last night’s run: It takes hours for these materialized views to build and they are only built once a day, at 20:00. I was well aware of this, but it had initially slipped my mind due to the artificial stress of the situation.

I explained this situation to the others. Fortunately, this did not need to be a problem, because the “missing” data that had caused the commotion had turned out to not be the one or two thousand entries originally reported from one of the regional centers, but just some one or two hundred—as an interim telephone call to the center had revealed. (How they had arrived at these numbers, be they right or wrong, is not known to me.) Further, apparently, the older numbers would have some value on their own.

Meanwhile, I turned to the definition of the materialized views, to see whether there was some possibility of constructing a query incorporating the central parts of the view definitions—but without having to wait for hours for the execution of the query to complete. The obvious thing to do was to restrict by status at an earlier stage; however, since the status was actually calculated by the views, this was not realistically possible. (Indeed, the need to calculate the status in a non-trivial manner using several tables, each containing millions of entries, was the very reason for the long build time.) Knowing that the experienced developer in charge of the views had already spent considerable time considering optimizations and had discussed the issue with the DB-team, I safely concluded that my chances of finding a revolutionizing optimization in the next five minutes were negligible—and, one way or another, no in-depth statement about the upcoming run could be made with less than a few hours of time.

The original production query came back, we briefly inspected it, and the conclusion was that the house was unlikely to be on fire—and that the actual results of the nightly batch run could be awaited.

Soon after, by now past four, I went home for the weekend, decided to take a casual stroll, enjoy the solitude, and stretch my legs a little. Five minutes later my thoughts drifted back to the events—and I more-or-less immediately saw several things that I could have tried:

As I had gleaned over time, not all sixteen regional centers were involved, but just a few—and only one in an urgent manner. By limiting the query in the view definition to this one center, the execution time would have dropped to roughly 1/16 of the several hours: For two hours, seven or eight minutes; for even a full four hours, no more than a quarter of an hour.
Yes, these results would not have been complete, but they would likely have been “good enough”—and they would also have been available “fast enough”.
Side-note:
The estimate of 1/16 is very rough: For one thing, not all regional centers have the same number of entries; for another, the execution time of a complex query over a great many rows might not have scaled linearly with the number of rows.
The status was calculated during the build of the materialized views; however, the calculation was only complicated in some cases. If the “missing” data did not belong to these cases, the status could (at least to a high degree of approximation) be read directly from the main application tables—with an execution time of just a few minutes.
Depending on exactly what was wanted (something which was never stated in detail...), other investigations might have been available. For instance, we have a particular table for logging all entries that have already been transmitted, which might or might not have been helpful. Further, given a few specific sample cases by “business id”, I could have checked these individual cases without needing the materialized views in any way.
Side-note:
With too little information forthcoming, I should have explicitly requested more details. However, this too is one of the things that are very easily forgotten when extra stress is present. Besides, the original request was simple enough, the details only becoming important when the execution time needed to be cut.
Generally, I recommend including the “why” in any request (for information, a new feature, an unusual service, whatnot). It costs little extra effort and can have a great positive effect.
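The 1/16 estimate from the first item above is simple proportional arithmetic, which can be sketched as follows. The function name and parameters are hypothetical, and the calculation assumes exactly what the side-note warns about: that entries are spread evenly over the sixteen centers and that build time scales linearly with the number of entries.

```python
def scaled_minutes(full_build_hours, centers_total=16, centers_queried=1):
    """Estimate the build time in minutes for a subset of regional centers.

    Assumes (roughly, and quite possibly wrongly) that build time is
    proportional to the number of centers included.
    """
    return full_build_hours * 60 * centers_queried / centers_total


if __name__ == "__main__":
    for hours in (2, 4):
        print(f"{hours} h full build -> {scaled_minutes(hours)} min for one center")
```

For a two-hour build this gives 7.5 minutes and for a four-hour build 15 minutes, matching the figures in the text. Both are best read as orders of magnitude and not as predictions.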

Now, contrast the above with the following set of events (preferably at an earlier part of the week and/or day...):

Mr. Liability calmly walks in, states that Mr. X and Ms. Y have a problem, and asks whether I can take over.
Mr. X explains that one of the regional centers fears that data is missing, includes all relevant details (ideally with a few sample cases), asks me to do the best I can within, say, the next hour and to let him know the results (even if a failure).
I am left alone and can take the time to think things through—with the option of calling for more information or for a choice between different alternatives, should the need arise.

Needless to say, I would have produced better to far better results and the others would have been correspondingly more satisfied.

Side-note:

As a further annoyance, but unrelated to the circumstances under discussion, it later turned out that I was somewhat misinformed concerning the times needed to build the materialized views: On our test systems, with smaller amounts of data, the view with the complicated logic (cf. above) was built in ten minutes and the other in a fraction of that time. For natural reasons, I had assumed that the former would still dominate the time on the production system; however, there the proportions were reversed... (This was due to the relative amounts of data present for the respective view on the respective systems.) Correspondingly, an early constraint on status might have been feasible after all. Further, the overall time might have been closer to one than to two hours.

In this case, just a stroke of additional bad luck; however, a few more lessons can be drawn, including:

Even highly plausible assumptions can be wrong, and it can pay to consciously become aware of assumptions and actually check their correctness.
Requests for (e.g.) information should be made early enough that they can be answered by those most knowledgeable in the sub-area at hand: No one knows everything about every module/interface in a larger application, and the devil is often in the details. Notably, most cases of “has to be done now” only arise because one or several parties did not act when a certain need was or could have been recognized, but only at the last minute.
