Michael Eriksson
A Swede in Germany

Java style guide

Disclaimer: This document is very far from being finished. Further, a sizable part of the contents (in particular concerning Exceptions) were written on paper during a recent period when my computer malfunctioned, and were subsequently added in bulk with sub-optimal integration. This will be improved over time.

Introduction

There are many existing style guides for Java (e.g. http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html); and this page is not an attempt to provide yet another. Instead, it should be considered an add-on to existing style guides, with some finer points typically not included in these. It further contains some particularly important points that would typically be included, and a few issues that, IMO, are often handled sub-optimally elsewhere.

As a general disclaimer: Style guides are, by their very nature, highly subjective; and there is often no one right answer. Instead, the answer will depend on personal preference, circumstances that vary from organization to organization, backwards compatibility and what people are used to, etc. In particular, this discussion of rules and rule-breaking applies. Nevertheless, a “usually the best way to go” can often be found, and this page should be seen as an attempt to identify such cases based on rational criteria.

The recommendations will often not be limited to Java, but reflect more general principles; however, care should be taken not to compare apples and oranges, when generalizing to other languages/areas. A particular issue is that different languages may have developed different idioms that may be contrary to general principles, but be so established that it would be foolish to ban them.

Note that there are cases below where it can be disputed whether “style guide entry” or e.g. “best practice” is the better term. (However, the legitimate scope of a style guide is not limited to naming conventions and layout.)

Recommendations

Below is a continually growing list of recommendations with a corresponding rationale.

Make ample use of final

Declare any variable as final, where it is not known in advance that it will be changed.

Rationale: This allows the compiler to provide checks against accidental changes, gives other developers a hint not to make unwise alterations, and prevents re-use of variables (a common source of bugs).

Beware, however, that a declaration as final only affects which object is pointed to: The object can no longer be switched for another; however, it can still be altered.
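To illustrate both halves of this point, a minimal sketch (the class and variable names are merely illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class FinalDemo {
    public static void main(String[] args) {
        // The reference is final...
        final List<String> names = new ArrayList<>();

        // ...yet the object it points to can still be altered:
        names.add("Alice");

        // ...while the reference itself cannot be switched:
        // names = new ArrayList<>();   // would be a compile-time error

        // Accidental re-use of a variable is likewise prevented:
        final int originalSize = names.size();
        // originalSize = 0;            // would be a compile-time error

        System.out.println(names + " / " + originalSize);
    }
}
```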


Side-note:

General principle: Help the compiler to help you.


Prefer accessor methods over direct access

Reduce direct access to member variables to a bare minimum, preferably only to getter/setter methods and, possibly, constructors.

Rationale: Direct access may look like a convenient time saver for the developer, but it is also a potential source of bugs and maintenance problems. Instead, always use getters and setters to provide a “single point of access”. This makes it easier to enforce particular behaviours (e.g. preventing that a variable is set to null), allows centralized changes, and facilitates debugging and logging.

An access from a class defined in another file should never use anything but getters and setters. (Note that this does not apply to constants, e.g. java.lang.Math.PI.) Ideally, only one getter and one setter per variable are present; if several are used, only one should access the variable directly.

Note that with a good IDE, or an editor like Vim or Emacs, the extra work involved in setting up the accessor methods will be minimal; and even were it not, this would be a small initial one time cost.
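As a minimal sketch of the “single point of access” idea (the Customer class and its field are purely illustrative):

```java
public class Customer {
    private String name;

    public String getName() {
        return this.name;
    }

    // Single point of access: the null check (and any future logging,
    // validation, or change notification) lives in exactly one place.
    public void setName(String name) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        this.name = name;
    }
}
```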

Always use this

Always use the this keyword when accessing an instance variable or method. Benefits include:

  1. It is easier for a reader to distinguish between instance and class members.

  2. If the programmer mistakenly thinks that a member is on the instance level, this gives the compiler a chance to catch that error. (Note that the worst-case effects of such an error are far from trivial.)

  3. Many editors/IDEs help with code completion after an input of this.; starting immediately with the member name does not activate this help. (And if it does, the number of potential matches is much larger, which reduces the value of the help.)
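The first two points can be illustrated with the classic setter idiom (the class and field names are hypothetical):

```java
public class Account {
    private String owner;

    // Without "this", the body would read "owner = owner;", a silent
    // self-assignment of the parameter that leaves the field null.
    public void setOwner(String owner) {
        this.owner = owner;
    }

    // Had the programmer mistakenly assumed an instance member "balance"
    // existed, "this.balance" would fail to compile, while a bare
    // "balance" might silently resolve to e.g. an inherited static member.
    public String getOwner() {
        return this.owner;
    }
}
```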

Counter-argument (I actually had a team leader who was emphatically, but, to me, not convincingly, against use of this):

Use of the this keyword would be a sign that the programmer does not understand the language sufficiently to work without crutches. This is, of course, absolute nonsense. Not only is there no such connection, but the argument also leaves later readers out of the equation: One of the most important considerations, when writing code, is the later readability and maintainability—and there is no guarantee that later readers are on the same competence level as the original author.

Avoid naming schemes like theXXX

Do not name variables according to schemes like theString, theName, etc. (Other common and equally bad prefixes are “tmp” and “a”.)

Rationale: Such schemes almost always lead to uninformative and too generic names of variables—use descriptive names like textToTranslate instead. An additional complication is that there is no good way of handling collections: compare e.g. aNames or aCollectionWithNames with namesToIndex.

A common rationale in favour of these prefixes is that their use would make it easier to keep variables of different kinds apart (e.g. by using “tmp” for method-local variables and “the” for method arguments). This is partially correct, but the benefit is not large enough to outweigh the poor names; further, the same effect can be achieved by using a prefix without “natural language” meaning, which is affixed to an already formed logical name. For instance, “l” for local variables and “p” for arguments/parameters can be used. (Not, however, “a” for “argument”—this could lead to confusion with the indefinite article.) Instead of e.g. theName and tmpName, we would now have pNameToIndex and lCanonicalName.

Why not simply instruct the developers to make sure that the names are informative, even when they do begin with e.g. “the”?

Simply put, it will not work: I have seen such schemes used in practice on several occasions, and more often than not the names degenerate into prefix–type combinations like aString, aCollection, etc. Even “tmp”, which may seem more abstract, has this problem, because a semantic connection is made and the “tmp” is seen as an actual part of the logical name. The advantage of entirely abstract prefixes, like “l”, is that this problem goes away—at least as long as the “l” is not explicitly read out as “local”. (Why not use e.g. “q” to avoid even this risk? A prefix with no connection whatsoever would increase the risk of mistakes and inconsistencies: With e.g. “l”, both the coder and later readers can easily recognize the right category based on the prefix.)


Side-note:

This is an example of the general principle that style guides should make it easy for developers to do their job well. Cruise controls and alcolocks in cars are good examples of the same principle—when they work correctly. Unfortunately, some not uncommon guidelines, like the naming schemes above, do the opposite—like a cruise control that automatically increases the maintained speed by 5 km/hour at random intervals.


Avoid interface names like IXXX

A somewhat common practice is to signify interfaces with an “I” (e.g. ICircle and ISquare), with the “plain” names reserved for the implementations (e.g. Circle and Square). Variations include CircleI. Do not follow this bad example.

Rationale: A general principle in object-oriented development is to always program against interfaces and to avoid explicit implementation names in any public context. By implication, if a variable corresponds to a circle, if a method returns or operates on a circle, etc., then the interface for circles, not a specific circle implementation should be used. In this situation, a name for the interface like ICircle is cumbersome, redundant, and misses the point; Circle, OTOH, is spot on. Additionally, an interface may have several independent implementations, and using Circle for exactly one of these introduces a lot of arbitrariness. (Using Circle for more than one, through different packages, would in turn be a source of confusion and errors.) Lastly, it is possible that a class hierarchy is refactored to make e.g. a class into an interface or vice versa (this should, obviously, be avoided for externally used classes/interfaces). The forced additional name change is an unnecessary complication, and lacks any logical reason.


Side-note:

The last points to a subtle complication: An interface in a generic sense need not correspond to an interface in the specific Java sense. (However, for any non-trivial or widely used hierarchy, I recommend that they do.)


The rationale for the bad practice is usually along the lines that the developer knows whether he is dealing with an interface or not. This, however, is a specious argument: Firstly, the developer should not need to know this. Secondly, the same benefit can be provided by using suitable names for the implementations (e.g. by including an “impl”, “abstract”, “concrete”, or similar in the name; or by using a more specific name, e.g. List for the interface and ArrayList for an implementing class).
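As a minimal sketch (all names, including the “impl” convention chosen here, are merely illustrative; the other naming options mentioned above would serve equally well):

```java
public class CircleDemo {
    // The interface carries the plain, client-facing name.
    interface Circle {
        double radius();
    }

    // Implementations carry the marked or more specific names.
    static class CircleImpl implements Circle {
        private final double radius;

        CircleImpl(double radius) {
            this.radius = radius;
        }

        @Override
        public double radius() {
            return this.radius;
        }
    }

    // Client code programs against the interface only; it neither
    // knows nor cares which implementation it receives.
    static double diameter(Circle circle) {
        return 2 * circle.radius();
    }
}
```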

Prefer empty collections over null collections

Prefer empty collections over null collections, and only allow the latter in cases where both are necessary for semantic reasons (hypothetically, for a value that can be unset, set-and-empty, or set-and-non-empty).

Rationale: The possibility of encountering a null value forces the developers to use additional if statements to check for this case, while an empty collection can often be handled generically by the same code that handles the non-empty collections, e.g.:

Collection<Place> placesToVisit = ...;
for (Place currentPlace : placesToVisit) {
    currentPlace.visit(this);
}

In fact, cases where empty collections need special treatment are very rare outside of UI programming, when this idiom is applied.

Premature optimizers may complain that a null check is faster or that the extra collections waste resources: This is a highly naive opinion in all but the most extreme cases, and only goes to prove that they are a danger to software development.
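The producing side can be sketched as follows (the class name and the returned data are illustrative only):

```java
import java.util.Collections;
import java.util.List;

public class PlaceFinder {
    // Returning an empty list instead of null lets callers iterate
    // directly, without a preceding null check.
    public List<String> findPlaces(String query) {
        if (query == null || query.isEmpty()) {
            return Collections.emptyList(); // not: return null;
        }
        return List.of("Berlin", "Munich"); // illustrative data only
    }
}
```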

Do not duplicate the contents of constructors

Do not duplicate the contents of constructors, but always have one constructor call another (typically from fewer arguments to more arguments) until the one “true” constructor is reached. An exception: When a sub-class wants to provide constructors from a super-class, it is often better to have each individual constructor call its super-correspondent; however, this is a judgment call, depending on the amount of logic that is or is not added.

The same applies to methods of the same name, but with different arguments. Here, however, errors are rare to begin with.

Rationale: This makes it easier to avoid duplication and to keep a neat logical structure. It also gives a single point of control.

Consider:

public class Gizmo {
    ...

    public Gizmo() {
        this(null);
    }

    public Gizmo(String name) {
        this(name, null);
    }

    public Gizmo(String name, String inventor) {
        super();
        this.name = name;
        this.inventor = inventor;
    }

    ...

Note that going the other way (by having the two-argument constructor call the one-argument, which in turn calls the zero argument) would make for messier and harder-to-maintain code—even if it could look like a good idea of “incremental development” at a casual glance.

Some care must be taken when the constructors have arguments of different type or semantic: For constructors with String name and Name name arguments a simple conversion is almost always the best solution (the direction will depend on the internals of the class); however, for highly different types some other solution may be necessary. (Then again, if the types are that different, there is a fair chance that they do not belong in the same class to begin with, but should be put in two separate implementations of the same interface.)

Document design decisions

Any design decision that is not obvious should be documented, including (but not limited to) optimization tricks, unexpected data types, hacks of various kinds, choice of algorithm, whether something is a “for now” solution, what could break if a more expected solution had been chosen, ... Unfortunately, this is something that even experienced developers tend to be too lazy to do.

Rationale: There is no guarantee that the current developer will do all the future work—nor that any future developers will have psychic powers. It is also quite possible that the original developer himself has forgotten his reasons a few months later. The result, more likely than not, is that later changes unintentionally break something, be it by causing a bug, a compatibility problem, or a worsening of performance by a factor of ten. If nothing else: By explicitly telling others why a decision was made, the original developer can avoid looking like a fool in the eyes of others (e.g. for having used a Vector instead of an ArrayList).

I once read a very illustrative anecdote on this topic: A family recipe for pot roast included cutting off large parts of the roast. The latest family member in line to cook was confused as to why this was done. Several older family members were consulted, before the habit was traced to a still living member of an older generation. Her eventual answer: “I do not know why you do it. I did so, because otherwise the roast would not fit into my pan.”


Side-note:

I have tried in vain to dig up the original reference for this story. However, interestingly, a Google search for “"pot roast" "why you do"” yielded three apparent references as the first three hits—with two different versions and an unreadable “Google Books Result” which may have contained a third.

Correspondingly, the factuality of the story may be low. Nevertheless, the principle holds.



Use small code units

Within reasonable limits, code units should be as small as possible. This includes, but is not limited to, methods, classes, and class files: If a method grows too long, split the contents over several methods. If a class becomes too large, try to remodel it (likely it has been given too much responsibility—a sign of poor modeling); if this fails, divide the work onto internal helpers. If a file is too large, break it up into smaller units (this includes files in general, e.g. HTML, XML, build files for ANT, and MS Word—although the last is easier said than done...).

Rationale: Larger units are harder to read and overview, making a division forces the developer to think interfaces through, calls to well-chosen method names are much more informative than the corresponding blocks of code, smaller units reduce the risk of conflicting changes, smaller units facilitate re-use and centralization of control, ...

A counter-argument that I have occasionally heard is that, contrarily, having all code in one place (e.g. a method) would make it easier to overview. These developers have been weak in their ability to group information, abstract things, and similar; and have relied on having the code immediately in front of their respective nose in order to process it linearly. Linear reading, however, is not the way to read code—code is not a novel. A good developer learns to think more abstractly, divide the code into distinct chunks that are investigated separately, etc. For him, a higher degree of division (using good names) will be helpful, not detrimental. As for the linear readers: This is a sufficiently large deficiency that it is only acceptable in beginners; and when someone who has worked as a developer for more than, possibly, a year has not learned non-linear reading, then he should seriously reconsider his career choices.


Side-note:

The above is a completely different topic from spaghetti code, which often forces the reader to follow a linear thread-of-content which is distributed in a file in a non-linear manner. What is discussed here is the ability to get a higher level overview and investigate lower levels when and if the information becomes relevant. Spaghetti code resembles gamebooks; the above, a book with a good table of contents, an overall overview, individual summaries for each chapter, descriptive names for each chapter/section, etc.

A recommendation to enable linear reading over spaghetti-code reading has my full support—but it is even better to strive directly for support of “chunking”.


Avoid paradigms and idioms from other languages

... or at least be very careful when using them.

Rationale: These may not be understood by everyone else, typically fit poorly within the framework of Java (more generally, any “foreign” language), and often lead to absurd code.

Example: One of my previous employers started as a SmallTalk specialist. When a move to Java was made, the then developers missed several features present in SmallTalk (notably, anonymous code-blocks), and tried to emulate them in Java. The result was many cases of hard-to-read code, with disadvantages far outweighing the advantages. (Although they may have worked as long as everyone had a solid SmallTalk background. This state, however, did not last long; and ten years later...)


Side-note:

In fact, they were so stuck in their SmallTalk thinking that they used formulations like “send object X the message Y” when discussing the Java method call X.Y() in documentation intended for the public. Not only does this indicate an inability to think in the proper paradigm, but it is also highly likely to confuse readers—most of whom will not be familiar with SmallTalk, but will often be familiar with message passing in other senses (e.g. between two threads).


Notably, this applies even if the paradigms are beneficial in and by themselves: The anonymous code blocks of SmallTalk, e.g., are wonderful—in SmallTalk. That language has a built-in support for them; Java does not. In an analogy: It can be disputed whether electric razors are superior or inferior to classic ones; but the point is moot when no electricity is available.

Wrap third-party products

Always wrap third-party products in an abstracting interface—including logging, DB-access, and similar. Notably, for a first-time use, a perfunctory, internal mini-class may be enough.

Rationale: There is no telling when a change may be needed, and single points of control are invaluable. This will also help in keeping modularization up, and to keep “feature discipline” high (developers will think twice before using features of the third-party product that are not available elsewhere). A common issue is that the third-party products are often too low-level for them to be used directly.


Side-note:

Their being low-level is not in any way wrong—it can even be necessary in order to preserve sufficient flexibility. This does not mean, however, that it is always a good decision to use the provided interfaces directly; instead, they should be considered a generic framework from which to build one's own (more abstract and limited) interfaces.


The built-in language features and standard libraries, e.g. file access, can almost always be used directly; however, exceptions can exist. For instance, logging in Java has two frameworks of considerable popularity (the built-in and Log4J), and there is a non-trivial probability that a switch or dual use will occur in any one application. Further, the extra layer can be highly beneficial for other reasons, e.g. simplifications and to have a single point of control.
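A minimal sketch of such a wrapper around logging (the interface and class names are hypothetical; only standard java.util.logging calls are used, and a Log4J-backed implementation could be substituted without touching client code):

```java
// The rest of the code base depends only on this interface, not on
// java.util.logging or Log4J directly.
interface AppLogger {
    void info(String message);
    void error(String message, Throwable cause);
}

// One implementation, backed by java.util.logging.
class JulLogger implements AppLogger {
    private final java.util.logging.Logger delegate;

    JulLogger(String name) {
        this.delegate = java.util.logging.Logger.getLogger(name);
    }

    @Override
    public void info(String message) {
        this.delegate.info(message);
    }

    @Override
    public void error(String message, Throwable cause) {
        this.delegate.log(java.util.logging.Level.SEVERE, message, cause);
    }
}
```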

Explicitly state input/output conditions

Explicitly state input/output conditions, how null values are treated, and similar. In particular, state which conditions on input are actively checked and which are (at least potentially) silently assumed to be correct.

Similarly, explicitly document which methods are idempotent and which are not, whenever relevant (e.g. for a deleteItem method, but not e.g. getXXX).

Rationale: A developer must be able to know and rely upon the behaviour of the existing code he uses. Note, in particular, that methods with similar names and conventions do not always behave similarly—often they vary even within the same team or for the same developer. Making an explicit statement helps overcome this problem. Any unstated side-effects (and similar) must be sufficiently trivial that they will only be relevant in extremely rare cases (say the writing of something to a log file, which could cause a concurrency problem to be triggered, but will almost never do so).

The issue of what checks are performed is not secondary: If a method actively checks for its pre-conditions, developers can rely on these checks to catch any errors they make (after writing test cases with decent coverage); if it does not, they must themselves take corresponding precautions—and they must know of this need in advance. Notably, illegal input to a (non-checking) method need not lead to an Exception or other visible error, but can result in a data inconsistency or other hard-to-detect problem.
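For illustration, a small sketch of such documentation in Javadoc form (the ItemStore class and its methods are purely hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class ItemStore {
    private final Map<String, Object> items = new HashMap<>();

    public void putItem(String id, Object value) {
        this.items.put(id, value);
    }

    /**
     * Deletes the item with the given ID.
     *
     * Input: id must not be null; this is actively checked.
     * Output: returns silently if no such item exists, i.e. the method
     * is idempotent and may safely be called repeatedly.
     */
    public void deleteItem(String id) {
        if (id == null) {
            throw new IllegalArgumentException("id must not be null");
        }
        this.items.remove(id); // removing an absent key is a no-op
    }

    public boolean hasItem(String id) {
        return this.items.containsKey(id);
    }
}
```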


Side-note:

Obviously, it is also highly recommended for the local style guide to lay down a standard (and for the developers to adhere to that standard...); however, even then there will always be methods and cases that do not fit the corresponding scheme—and external users of a certain module may follow a different convention.


Various points relating to Exceptions

  1. Do use checked Exceptions.

    Rationale: Unchecked Exceptions are almost always a bad idea, because the developers have no reliable way of being aware of them, of catching them appropriately, and so on. In effect, an unchecked Exception will be caught when the first “catch-all” occurs, where it can rarely be treated in a good way—or it will not be caught at all. As a side-effect, this makes it harder to build good hierarchies of Exceptions and use module-specific Exceptions.

    A good test is whether anyone in the course of normal development (as opposed to e.g. testing) could reasonably want to catch and treat the Exception: If so, he should be helped to do so.

    (Note that a “catch-all” should be a last resort for unexpected events, nothing more. Further, that even a hypothetical tool that informs the developer of the currently thrown unchecked Exceptions will be extremely unreliable, because it can guarantee neither completeness nor immunity to future changes.)

  2. Let a module only throw its own (checked) Exceptions; not just pass through the Exceptions of others, but catch and wrap.

    Rationale: This makes for a more consistent, simpler, and easier-to-handle interface. In particular, any developer using the module will know that he has only a limited set of Exceptions that he needs to be concerned about. Further, the change resistance is increased, in that no additional effort is needed when something changes in a lower layer.

  3. Use fine-grained Exceptions (in a suitable tree-hierarchy) for any module. While it is rarely possible (or a good idea) to give every special case its own class, every major category, at a minimum, should have one (hypothetically: IO Exceptions, security Exceptions, incorrect-input Exceptions—this will vary from case to case). Particularly important individual Exceptions should also have their own classes. When in doubt, err on the side of too many individual classes.

    Rationale: This enables other classes to make their treatment correspondingly fine-grained. They may not do so, but the decision should be left with them, not with you.

  4. Never use catch–do-nothing: Minimum is a log message. (Some minor exceptions can be allowed, like failed numerical conversions, where the Exception is used as a flow control feature, rather than a “true” Exception—this, however, is usually a bad practice.)

    Rationale: With a catch–do-nothing, there is no way to find out what went wrong after the fact. Indeed, it will often be hard to find out that something went wrong at all.

  5. Configure logging to send emails (or similar messages) upon any Exception (or other noticed problem) above a certain severity.

    Rationale: Too often log files are not checked (time constraints, lazy co-workers, a sloppy mentality). Even more often, they are not checked in a timely manner. The result is errors that remain undetected and uncorrected, that are detected too late, that re-occur half-a-dozen times even though they could have been fixed after the first occurrence, etc. Explicitly sending an email reduces the risk for this considerably—and can even allow developers to correct a problem before the customers become aware of it.

    A case in point: I was once a member of a team that developed and maintained one of Germany’s leading online auctions for a customer. My code was written to send emails, as discussed above. Once I received an error email triggered by someone at the customer’s playing around with the newsletter interface. The error was entirely uncritical for (and probably went unnoticed by) him—but for us the alarm bells went off: The error was caused by the ISP’s failure to mount all NFS file-systems correctly. If we had not noticed this, a major malfunction would have occurred at a later stage; as is, we called the ISP and had it correct its error before any non-trivial problems ensued.


    Side-note:

    Note that for consumer applications this must be restricted to development: Sending emails without the users knowledge and full control is evil. Further, there is always a risk that one single error results in a hundred thousand emails, once the application has been released.


  6. A FileNotFoundException (the same applies, m.m., to other specific Exceptions) should only be thrown by methods that have a very clear association with files, e.g. a convenience method around Java IO file calls. In contrast, a generic resource or stream class should throw a special purpose Exception, e.g. ResourceNotFound, when a needed file (stream, database entry, whatnot) is not found. This Exception, in turn, should wrap the original Exception.

    Rationale: Doing anything else leads to poor abstraction, forces other classes to do a lot of special-case handling, etc.

  7. Make sure each module has its own Exception hierarchy with one “root” per module (which, on occasion, may inherit from a more abstract root—if logically reasonable). If several roots are needed, this is usually a sign that the module does too much and should be split into several; however, the existence of several sub-roots, to reflect the different conceptual needs of the module, is entirely acceptable.

    Another module should inherit the hierarchies only when this is conceptually justified—not when it happens to be convenient. Using the same Exception will hardly ever be justified.

  8. A common recommendation is “Handle Exceptions/errors as early as possible. Do not let higher levels see the problem.” This recommendation is fundamentally flawed: A problem should be handled at the most appropriate level—and this can be quite high: Consider e.g. a file-not-found error after the user has manually requested that file. The pertinent issues are who knows most about the current situation, who could find an alternative solution, who could fix the error, who needs to know, etc.

    Obviously, this does not mean that an Exception should be propagated willy-nilly; the relevant information should be brought to the right point in a controlled manner. Further, the right point can be where the problem is first detected.

    Another problem is that this recommendation is often misinterpreted to forbid a propagation at all—or to require hiding vital information by a try-catch-log-continue. This is only very rarely an acceptable, let alone appropriate, solution.

    (See also my discussion of error messages.)

  9. When logging Exceptions, make sure to include the stack trace and all “recursive” Exceptions.

    Rationale: The stack trace is a developer's best friend when it comes to debugging. Not having the corresponding information, or only having it in an incomplete form, makes debugging much harder (and is extremely frustrating).
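Several of the above points (checked Exceptions, catch-and-wrap, a module-own hierarchy, preserving the cause for later stack traces) can be combined into one sketch; all class names here are hypothetical:

```java
// Hypothetical module-level hierarchy: one checked "root" per module,
// with finer-grained sub-classes for the major categories.
class StorageException extends Exception {
    StorageException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ResourceNotFoundException extends StorageException {
    ResourceNotFoundException(String message, Throwable cause) {
        super(message, cause);
    }
}

class Storage {
    // The module throws only its own checked Exceptions; lower-level
    // Exceptions are caught and wrapped, preserving the cause so that
    // the full "recursive" stack trace remains available for logging.
    byte[] read(String resourceName) throws StorageException {
        try {
            return java.nio.file.Files.readAllBytes(
                    java.nio.file.Path.of(resourceName));
        } catch (java.nio.file.NoSuchFileException e) {
            throw new ResourceNotFoundException(
                    "no such resource: " + resourceName, e);
        } catch (java.io.IOException e) {
            throw new StorageException(
                    "could not read resource: " + resourceName, e);
        }
    }
}
```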

On additional work

Some developers like to complain that this or that code convention leads to unnecessary work. This, however, is seldom a valid concern: If a developer wants to save time for himself now, he does so at the price of time lost later—often by others, often through more difficult maintenance or unnecessary bugs, often outweighing the short-term saving by several orders of magnitude. Notably, this “later” can be in the near future, often even the same day...


Side-note:

As a general rule, good software developers focus on making life easy for others, not for themselves: They make sure that they choose good names, they write comments, structure their code well, write test cases, etc.—even if this slows them down in the short term. The reason: They know that this extra effort will pay off in the long run.

Being able to hold the workings of a nightmare class in one's head does not make one a good developer—making sure that no class is a nightmare, so that no-one needs to, does. Also note that as good as everyone who considers himself intelligent enough to handle the complexities of software is wrong; and that, even if he were right, not all of his colleagues would be.


On brevity

A sometimes heard argument against use of some conventions is brevity, e.g. that use of this or accessor methods leads to cluttered code or too long lines; however, this should not be an issue for good coders: Problems arise when statements are too complex, line-breaks are not used correctly, or similar. In effect, if adding a this in front of a name leads to problems with reading or maintaining the code, then the code already is in a severe need of clean-up.

Neither should the additional keystrokes be an issue: Any negative effects here can be neutralized by learning touch typing or how to use macros; further, the additional hints for code-completion systems will often lead to a net decrease in the number of key strokes.

Automatic style-compliance checkers

There are a number of tools available for automatic checks of compliance with a certain style. These make it easy to check for and enforce the right use, and will help in avoiding future problems. Checkstyle is one example of such a checker, also available as an Eclipse plug-in.

Notably, the single greatest problem with style guides is how to get people to follow them: Attitudes like “I know what I am doing, and do not need a style guide.”, “I don’t give a f-ck!”, and “Style guides are good in theory, but I do not have the time and memory to learn ours by heart.” are very common—never mind innocent oversights and errors. Just mandating that the style guide be followed will not work with all employees, threatening with repercussions is likely to backfire, and attempts at creating awareness and understanding are time-consuming and will not work with everyone. By instead having an automatic compliance check, e.g. in daily builds or as a pre-requisite before it is (technically) possible to commit a change set, these problems can be reduced—while at the same time making it easier for the developers to follow the style guide through direct feedback on errors.


Side-note:

It can be interesting to have a closer look at the attitudes mentioned:

“I know what I am doing...” can actually be true; however, it overlooks the benefits of consistency throughout a group of developers or a product. There are often several equally worthy solutions, yet keeping to one of them throughout makes it easier for individual developers to read and understand each other's code, and reduces the risk of misunderstandings. Further, in my experience, a clear majority of people who consider themselves competent, or even highly competent, usually are not; thus, merely having this attitude does not automatically put the developer in the group where a limited justification of the attitude is possible.

“I don’t give a f-ck!” (and, yes, this feeling is actually occurring) indicates someone with a severe attitude problem. Unless there are mitigating circumstances, or the developer is extremely competent in other regards, his position in the company may need re-thinking. Actually getting someone of this mindset to change is very hard. (However, I stress that no-one should be let go without having been explicitly told what the problem is, and given an opportunity to change. Further, great care should be taken to ensure that the mindset has not been misunderstood.)

“Style guides are ...” is often an excuse; however, when it is honestly meant, the chances of a change in attitude are comparatively large. Apart from the use of tools (as above), it can help to stress that later time savings will outweigh the short-term problems, and that the difficulty of learning a style guide is comparatively small—in particular, if gradual learning is allowed.