Michael Eriksson
A Swede in Germany

Validation of HTML, and other automatic checks

What is validation and why should it be done?

Validation is basically just a check that, e.g., an HTML document conforms to the corresponding specification: effectively the checks that a compiler for, e.g., C or Java would do, but which are not (automatically) made during development of HTML, XML, CSS, ... Not doing these tests effectively ensures that any non-trivial document will be non-conforming, and increases the risk that user clients will interpret the pages incorrectly, that web-crawlers from search engines will choke and leave a site out of their respective indexing, etc.

To continue the analogy with programming languages, validation does not imply linting, nor does it make any statements about quality, style, degradability, or similar. However, for the purposes of this page, I group lint-like automatic checks together with validation. A separate discussion of non-automatic checks is available.


Side-note:

As a general rule: Try to get hold of as many automatic checkers as possible for whatever you are doing (and actually use them). It is better to find a problem automatically ahead of time than to have an end-user stumble into it later. Further, even a problem that has no obvious symptoms can be a liability: Problems add up over time and, in the end, the sheer mass of them can make an application disastrously error-prone, impossible to maintain, or otherwise catastrophic.


The vicious circle of broken HTML

Very many websites are very broken. This has forced browser developers to build in work-arounds and increase compatibility with broken HTML, which has made it harder to detect errors without validation, which has caused more websites to be broken, which ...

By taking the initiative to only publish validated HTML, we can help fight this vicious circle, which has cost enormous amounts of money over the years.


Side-note:

It should not come as a surprise that Microsoft has been one of the major villains, both through the non-compliant Internet Explorer and through deliberate attempts to make life hard for non-Microsoft standards.


Correct errors manually

Some tools (e.g. tidy) offer to repair HTML automatically. Do not take this offer. Instead make sure to correct the underlying templates, the generation mechanism, whatnot, so that the HTML is made correctly to begin with. Automatic repair only makes sense when existing HTML files have to be repaired, e.g. because of a legacy left by an incompetent developer. Even here it can pay to make the corrections manually, in order to learn from previous errors or to ensure a correct repair.

Generating defective HTML that is then corrected is a bad practice: Eventually, the number of defects will grow until the correcting program has problems keeping up, makes bad repairs, or similar. Further, a version change in the program can change the way repairs are made, leading to compatibility problems.


Side-note:

As a rule: Do not write HTML directly for anything but the smallest tasks—put the contents in templates with a logical structure and generate the HTML instead. Exceptions only occur for odd stand-alone pages, pages that do not fit well in the framework (possibly error pages), and similar. Even using HTML in JSP pages should be avoided (although poor middle-management decisions can make this unavoidable).
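
To illustrate the idea of generating HTML from a template, here is a minimal shell sketch. The function names escape_html and render_page are hypothetical, the single fixed template is an assumption made for brevity, and a real site would normally use a proper template system rather than a here-document.

```shell
#!/bin/sh
# escape_html TEXT: replace the characters that are special in HTML text.
# (Hypothetical helper for this sketch; "&" must be handled first.)
escape_html() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# render_page TITLE BODY_TEXT: fill one fixed XHTML 1.0 Strict template.
# The markup lives in exactly one place, so a markup error is fixed once
# in the template instead of in every generated page.
render_page() {
    title=$(escape_html "$1")
    body=$(escape_html "$2")
    cat <<EOF
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>$title</title></head>
<body><p>$body</p></body>
</html>
EOF
}
```

A call such as `render_page 'Validation' 'Some text with < and &' > page.html` would then write a complete page, with the special characters in the arguments escaped before they reach the markup.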


What I use

At the time of writing (2009-08-01, beware that this may change over time), I use the following means of validation for this website:

  1. Local, pre-deployment validation of all HTML files with

    tidy -q -utf8
    xmllint --valid --dtdvalid "$DTD" --html
    validate.pl -w

    Here $DTD is the path to a local copy of http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd, xmllint is usually included in Linux distributions (and can be downloaded from http://xmlsoft.org/downloads.html), and validate.pl is a Perl script from http://www.htmlhelp.com/tools/validator/offline/index.html.en. (tidy is presumed to be known...)

  2. Additional, occasional checks of individual pages with W3C’s online validator for HTML.

  3. Stylesheet validation with W3C’s online validator for CSS.

  4. Occasional post-deployment link checks with W3C’s online link-checker. I recommend installing a local checker capable of output requiring less manual inspection; however, I have not yet gotten around to this myself.
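
The local checks in item 1 could be wired together along the following lines. This is only a sketch under stated assumptions, not the script actually used for this website: run_checker is a hypothetical helper name, and the sketch assumes that each checker signals problems through a non-zero exit status.

```shell
#!/bin/sh
# run_checker is a hypothetical helper name for this sketch.
# Usage: run_checker "CHECKER COMMAND LINE" FILE...
# Assumption: the checker signals problems through a non-zero exit status.
run_checker() {
    checker=$1
    shift
    failures=0
    for file in "$@"; do
        # $checker is left unquoted on purpose so that its options are
        # split into separate words; paths containing spaces inside the
        # checker string are therefore not supported.
        if ! $checker "$file"; then
            echo "FAILED: $checker $file" >&2
            failures=$((failures + 1))
        fi
    done
    # Succeed only if no file failed.
    [ "$failures" -eq 0 ]
}
```

With the commands from item 1, a run over the current directory could look like `run_checker 'tidy -q -utf8' *.html` followed by `run_checker "xmllint --valid --dtdvalid $DTD --html" *.html`. Note that tidy exits with status 1 for warnings and 2 for errors, so warnings count as failures here. Whether validate.pl reports problems through its exit status is not verified by this sketch and should be checked before relying on it.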

Further, I have made a few tests with various other checkers for e.g. accessibility, mobile-phone compatibility, and similar. Unfortunately, these are of very limited benefit at the moment, e.g. in that they complain about HTML code that is currently needed for backwards compatibility, or prescribe solutions (design-wise) that would lead to reduced usability for “normal” surfers. Trying a few of these can be a good idea, just to get a greater perspective and to consider alternatives; however, only when developing specifically with an eye to e.g. mobile devices should they be given great weight. (Note that this is likely to change over time.)