Michael Eriksson
A Swede in Germany

Strive for re-use

One of the main differences between good and poor software developers: The former strives for re-use; the latter re-invents the wheel. Not only is re-use usually a time saver, but it also typically reduces the complexity of the code, diminishes the number of bugs, allows for central points of control, makes it likelier that others will understand the code from the get-go, and increases the chance that there will be decent documentation.

Unfortunately, all too many developers do not understand when re-use is beneficial—in fact, they may not even recognize the opportunity, except in the most obvious cases.

To take a specific example: My first task at E3, a long time ago, was to provide a basic queue for use in Java. As a first step, I looked through the Java collections classes to check whether a queue was already available. This was not the case at the time (java.util.Queue has since been added). I then had a look at the other collections and wrote a thin wrapper class around one of them (probably just adding and removing elements to/from a LinkedList), and had my queue done with little coding of my own and most of the “leg work” done in the collections framework. I could, in particular, rely on the very thoroughly tested standard collections having eliminated most of the potential bugs that a hand-coded solution would have risked. As a bonus, extensions and modifications, e.g. to allow for other data types than Strings (to which the original requirements were limited), would be easy. I even remember being proud of myself for having so neatly solved the problem.
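
For illustration, a minimal sketch of what such a wrapper might have looked like (the names are mine, not the actual E3 code; generics are used for readability, although the Java of that day did not yet have them):

    import java.util.LinkedList;

    // Illustrative sketch only: a String queue that delegates all storage to
    // LinkedList, so that the collections framework does the "leg work".
    public class StringQueue {
        private final LinkedList<String> elements = new LinkedList<String>();

        // Append an element at the tail of the queue.
        public void enqueue(String element) {
            elements.addLast(element);
        }

        // Remove and return the element at the head of the queue;
        // returns null if the queue is empty.
        public String dequeue() {
            return elements.isEmpty() ? null : elements.removeFirst();
        }

        public boolean isEmpty() {
            return elements.isEmpty();
        }

        public int size() {
            return elements.size();
        }
    }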

My boss took one look at the code, made a statement along the lines of “That is nice, but the task was to write your own queue. Use an array of Strings as the underlying data structure.”, and sent me back to the drawing board—thereby demonstrating a highly disputable judgement.

Now, it had become clear from context that the task was not just intended to produce code for the application being developed, but also had the unstated purpose of testing my abilities. This is perfectly valid in itself; however, had he had his priorities straight, he would have seen a very positive answer to the test: I am a developer who strives for re-use and low complexity, and who does not unnecessarily re-invent the wheel. In fact, trying to implement my own data types for everything is a beginner’s mistake that I committed and grew out of during my undergraduate days... (As is, I suspect that he drew entirely the wrong conclusions: Michael was uncertain how to handle the task on his own, and resorted to re-use to cover up this fact.)

Being able to handle complexity is good; avoiding complexity is better. Being able to write functionality from scratch is good; re-using functionality is better. (The respective formers may, arguably, be pre-requisites for being a software developer; however, that an F1 driver can handle skidding does not imply that he should not try to avoid it.)

But what about the speed of the solution? Usually not an issue: Firstly, optimization should be done when it is actually needed (TODO article on this); secondly, many ready-made components (including Java’s collections) are already sufficiently optimized that a hand-made solution often needs considerable tweaking before it can compete.

To illustrate the same principle with a similar example: How does one take an existing List and provide a counterpart with unique elements? Go through it checking for duplicates? Use a hashing mechanism? No; create a Set based on the original List, and the rest is automatically taken care of. If a List is required as output, just create a List based on the Set... Something like new ArrayList(new HashSet(originalList)) should do the trick. In the unlikely event that this turns out to be too much overhead, consider using e.g. a hand-crafted loop. Possible exceptions would be when very long lists are involved, when the code is called very often, when sorted lists are involved, or when strict real-time responses are needed—calling this code with a list of ten elements a hundred times per second may not even have a measurable effect, let alone a noticeable one, on a typical application on a modern computer.
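
To make the idea concrete, a small sketch (I use a LinkedHashSet here in order to preserve the original order; the plain HashSet from the text is enough where the order does not matter):

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.LinkedHashSet;
    import java.util.List;

    // Sketch: a List with unique elements, obtained by re-using Set semantics
    // rather than hand-coding the duplicate check.
    public class UniqueListExample {
        public static void main(String[] args) {
            List<String> originalList = Arrays.asList("a", "b", "a", "c", "b");

            // A LinkedHashSet removes duplicates while keeping the original order.
            List<String> unique =
                    new ArrayList<String>(new LinkedHashSet<String>(originalList));

            System.out.println(unique); // prints [a, b, c]
        }
    }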


Addendum:

At the time of original writing, I was using a measurability scale based on milliseconds, which, via System.currentTimeMillis(), was long the standard resolution for measurements in Java—and even this was often optimistic: Windows NT, e.g., refused to do better than hundredths of a second.

As indirectly pointed out by a reader who disagreed with my statement about measurability, System.nanoTime() is now available, and higher standards for what is measurable should be used. (Beware that, as with NT of old, there is no guarantee that any individual system will make full use of nanosecond resolution and accuracy. A brief test on my own computer gives nothing better than microseconds, which would still be more than enough to measure the above, however.)
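
For illustration, a rough sketch of how such a measurement could be made (the class is mine, and the exact numbers will of course vary from system to system):

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;

    // Sketch: timing a hundred de-duplications of a ten-element list with
    // System.nanoTime(). The point is merely that the cost is tiny compared
    // to e.g. file or database access.
    public class TimingExample {
        public static void main(String[] args) {
            List<Integer> list = new ArrayList<Integer>();
            for (int i = 0; i < 10; i++) {
                list.add(i % 7); // ten elements, some of them duplicates
            }

            long start = System.nanoTime();
            for (int i = 0; i < 100; i++) {
                List<Integer> unique = new ArrayList<Integer>(new HashSet<Integer>(list));
            }
            long elapsedNanos = System.nanoTime() - start;

            System.out.println("100 calls took " + elapsedNanos + " ns");
        }
    }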

Even so, there is a fair chance that the corresponding extra cost would have no noticeable effect in even a typical desktop application, let alone the ones discussed immediately below in the original text.


Certainly, in the applications I have typically been involved with, it would have been a triviality: JEE-based server-side applications with file and database access, JDBC overhead, dynamic generation of HTML from JSP, latencies for the Internet connection to the user, ...


Side-note:

While I do not advocate being wasteful with CPU resources, it astounds me how many alleged professionals have no concept of how many operations a modern CPU turns over per second, how much more e.g. a typical file access costs—or what difference an algorithm with a higher or lower level of (computational) complexity can make. Notably, the above method of creating a list with unique elements is linear in the size of the list, just like a manual filtering would be—and the actual use of the filtered and unfiltered lists is likely to be at least linear in the first place...

(In contrast, if we wanted to find the smallest element of a list by first sorting the list and then picking the first element, the complexity would be N·log(N) or higher; whereas a direct search for the smallest element could be done in linear time—here an a priori accusation of wastefulness could be justified. OTOH, even here it would take a rather large list before the effects became noticeable.)
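
To illustrate, a small sketch of the two approaches (Collections.min is, fittingly, itself an example of re-use; a hand-written loop would have the same linear complexity):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    // Sketch: finding the smallest element. Sorting first costs N*log(N);
    // a single linear pass (here via Collections.min()) costs N.
    public class MinExample {
        public static void main(String[] args) {
            List<Integer> list = new ArrayList<Integer>();
            Random random = new Random();
            for (int i = 0; i < 1000; i++) {
                list.add(random.nextInt(1000000));
            }

            // Wasteful: sort a copy, then take the first element.
            List<Integer> sorted = new ArrayList<Integer>(list);
            Collections.sort(sorted);
            int minViaSort = sorted.get(0);

            // Direct: one linear pass over the list.
            int minDirect = Collections.min(list);

            System.out.println(minViaSort + " == " + minDirect);
        }
    }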