Michael Eriksson
A Swede in Germany
Home » Software development » Databases | About me Impressum Contact Sitemap

Tips and tricks around SQL

Introduction

This page contains a few tricks with Oracle SQL that I have found to be useful over the years. Some of them might apply directly to other SQL dialects, or do so after only minor changes; however, I make no such guarantee in any given case. For natural reasons, the main intended audience consists of software developers using SQL on regular basis as a complement to their main languages (Java, C, whatnot). Many things mentioned here will be elementary to an SQL specialist.

If you know a better way to achieve some of the below (always a possibility with SQL), please let me know.

Disclaimer and meta-information

This page was originally written in 2012, but went unpublished until 2024. Even the 2024 publishing is in so far a little premature as there were several to-dos left in, including a “check code for correctness and format it”, and as there are plenty of other tips that could potentially be included. (I suspect that I, even back then, could have found one or two dozen tips more valuable than the discussion of DB links towards the end. I would speculate that I had simply encountered that topic shortly before the time of writing.)

As it has been several years since I last dealt with SQL (in general or specifically for Oracle) in a non-trivial manner, I find it best to publish what I have and be rid of the text for now.

While I have done some polishing, most was a matter of (the English) language, I have not checked various details for correctness (I am hampered both by being rusty and by not having an Oracle DB available), and I have left the code “as is” (contrary to the aforementioned to-do—caveat lector).

I make corresponding reservations for sloppy errors by 2012 me, which 2012 me would have found and corrected during polishing and proof-reading, but which 2024 me might not have.

I make further reservations for what might have changed since the time of writing.

Things not discussed here (but still useful)

Someone very new to Oracle or SQL in general might want to read up on the many functions available, including e.g. substr, nvl, and nvl2; and other syntactical niceties, e.g. the case statement (the more handy and compact, but Oracle specific and slightly weaker, decode is used repeatedly below). Knowledge of aggregate functions is a near must (their complements, the analytical functions, are partially discussed below).

PL/SQL brings a lot of power, but is off-topic. I will likely add another article on the topic in the future.

Learning some of the specifics of SQL*Plus can also be very beneficial, but since other SQL clients (even Oracle’s own SQL-Developer) are not sufficiently compatible, it too remains off-topic.

Typically, I will not give instruction on the basics of the used functionality, but assume that the basics are already known. (If in doubt, the reader who does not know can easily find the corresponding information online.)

Use of temporary tables

Using a temporary table can be a great saver of both time and effort when using variations of the same (exploratory) query or making several such queries with similar base data. For instance, when comparing two sets of data I often find myself using one or several temporary tables to keep various partial steps (the first often containing the result of a join of two tables, identifying the relevant entries and giving some minimal data).

(The “exploratory” is to differ from queries used by production code, queries called over officials views, and similar. In such cases, temporary tables are only very rarely suitable and the priority should be to find and enforce good execution plans. Note that the means to do this are also better, as e.g. the creation of a new index is a far more reasonable action than for an “exploratory” query. In a pinch, a “materialized view”, which has the advantage of staying up-to-date, is a better choice for production code.)

A particularly useful trick is to replace a select ... from ... where ... in ([long list of identifiers]) with a join on a temporary table. Normally, this is a waste of time—but consider wanting to select the identifiers not found in the original table. By using a join instead, this becomes very simple (see the following section for an example; this technique, obviously, works wonderfully on regular tables too).


Side-note:

To reduce cluttering in the main schema, especially when a production system is concerned, I recommend putting such tables in a separate schema—ideally one specific to the querying user.



Side-note:

Why not use a with clause rather than a temporary table?

A with clause can be quite handy when only one eventual select is to be executed only once. Here the need to create and later drop the table is removed; and if several temporary tables would have been needed, it can be easier to keep track. However, when repeated executions are intended (usually the case when working interactively and adapting the statements) or when the temporary table is intended for several queries (very often the case), then the temporary table wins out.

Further, a temporary table can be more flexible, not being limited to a single (if possibly complex) SQL statement on existing tables (consider e.g. wanting to access data delivered in a CSV file) and allowing easier incremental combinations of data (say, selecting from table A, inspecting the results, then combining with table B, inspecting the results, and so on).


Use of selects to create a pseudo-table

When the creation of a temporary table with external data is not an option on a given system (e.g. due to insufficient rights or a wish to reduce the risk of accidental interference with the official data model), selects from dual in combination with unions can be an option.

Example to find all account numbers in a given list not present in a table:

select account_number from ( (select ’2340923’ as account_number from dual) union (select ’2340925’ as account_number from dual) union ... (select ’2346723’ as account_number from dual) ) as account_numbers left outer join account_table on account_table.account_number = account_numbers.account_number where account_table.id is null;

If a sufficiently large other table is present, the set of unions can be replaced by something like

select decode(rownum, 1, ’2340923’, 2, ’2340925’, ...) as account_number from other_table where rownum < ...

Beware that the time needed for parsing the actual SQL statement can go into the seconds when even a few hundred entries are concerned. Further, obviously, it is not possible to add an index to this type of “table”. Correspondingly, it is best used with smaller data sets. (Indeed, there is likely to be an upper limit in the number of entries available in this manner; however, I have not yet used so large a set. The first technique works up to at least 10.000 entries in Oracle 11.)


Addendum:

Looking back from 2024, I am uncertain about this parsing claim (and I do not have the time to do research). Even adjusting for the performance increases in the interim, seconds sounds like a lot. It might be that I intended the full pre-execution phase, including the building of an execution plan, and expressed myself sloppily. Network delays might also be a contributor, but only rarely by that much.



Side-note:

Actually generating the corresponding statements is not hard. For instance, given a file with one account number per row, the Vim command

%s/.*/(select ’&’ as account_number from dual) union/

will take care of the issue, leaving only the “frame” for manual insertion.


Using the dictionary

Oracle provides an extensive “dictionary” that can be used to query information about the database and its contents. For instance, a select * from user_tables gives information on various tables, while select * from user_constraints gives information on constraints.

(Ditto for all_tables and dba_tables, etc., which might be a better choice, depending on what information is wanted and what rights the current user has. A deeper discussion goes beyond the scope of this page. For a full overview of the dictionary, see Oracle documentation.)

Formatting dates for better comparisons

Comparisons of dates can be tricky; in particular, as the default display only has a granularity of a day. (Note that Oracle differs from ANSI in that dates correspond more to timestamps.) A naive date1 = date2 might then fail even if two dates appear to be the same. By first formatting the dates with to_char, this problem is avoided (and any arbitrary granularity can be chosen, e.g. month, day, or second).

Use set operators

Oracle supports a number of set operators—of which most developers only use union. Others, e.g. minus, can be very handy. The corresponding result can often be found by some mixture of joins and selection criteria. but using a set operator is usually more elegant and easier to read, and can sometimes bring a performance benefit.

Use analytical functions

In addition to the aggregate functions (e.g. counts and sums with an optional group by ... having ...) that should be common knowledge, Oracle has analytical functions that can be used for similar tasks, but sometimes are more flexible. Here only the select clause is altered to contain e.g. sum(price) over (partition by invoiceNumber). A particular benefit is the ability to print various results together with the “normal” data of a query: An aggregate function can only occur with non-aggregated fields that the query groups by; an analytical function can appear with arbitrary fields.

Use of decode together with aggregate or analytical functions

decode is almost magical when used correctly. Consider e.g. wishing to have a single query give the counts of all invoices, all invoices in state 2, and all invoice in state 5, per city (the meaning of these state numbers is not relevant):

select count(*), sum(decode(state, 2, 1, 0)), sum(decode(state, 5, 1, 0)) from invoice group by city;


Side-note:

Indeed, the number of tricks to be found with decode is only limited by the imagination. Even most case statements can be replaced by it. (Whether actually doing so is a good idea will depend on the circumstances.)

For those wondering, the arguments are, in order, the value to decode, arbitrary pairs of before and after values, and an optional default value. (If no default value is given, null is used.) The first use above, then, yields 1 if state = 2 and else 0. (After which the summing amounts to a count.) Much of the magic stems from the ability to use fields, function calls, whatnot, as values—not just constants.


Comparisons with better null treatment

Comparisons with a null value always yield false, leading many an unwary developer to faulty queries—and those wary to extra work.


Side-note:

Strictly speaking, at the time of writing, Oracle does not have a proper boolean datatype.

What actually happens is that a great many expressions, including comparisons, that involve a null value always result in null—even when null is compared with it self. In a next step, a null value is treated approximately as false in a “boolean context”. Notable exceptions to this include the below decode, nvl, and nvl2.

For simplicity, I will gloss over this.


Our friend decode can help with e.g.

decode(field1, field2, 1, 0) = 1

Depending on the data in the rows being compared, this statement can then evaluate as e.g.

decode(’rabbit’, ’hare’, 1, 0) = 1 => false

decode(’rabbit’, null, 1, 0) = 1 => false

decode(’rabbit’, ’rabbit’, 1, 0) = 1 => true

or

decode(null, null, 1, 0) = 1 => true

While this is not as easy to understand and read as field1 = field2 (with a different semantic, yielding false in the last example), it is at least as good as nvl(field1, X) = nvl(field2, X) (where X refers to some non-null value that is guaranteed not to be present in its own right) and there is no need for the X—the choice of which can be tricky and cannot be kept identical from case to case.

Further, comparison through nvl is less flexible, including that it is not suited for use in the “select” part of a query, and that it cannot be extended to handle less trivial comparisons. Assume e.g. that we want to display the result of a comparison using a logic of equal (in the sense of “=”) => 1, both values null => -3, the first \{null} => -1, the second => -2, otherwise unequal => 0:

select decode(field1, field2, nvl2(field1, 1, -3), null, -1, nvl2(field2, 0, -2)) from ...

(Where nvl2(a, b, c) could, obviously, be replaced with decode(a, null, c, b). The former is the better choice stylistically, which is why I used it. The latter, however, demonstrates that I did not cheat when I used another function.)

While this is a bit hard to read, it was written off the top of my head, demonstrating the power of decode—and a more readable version can possibly be found with a little more thought. (Not to mention that the rules were unusually convoluted to begin with.)

Doing the same with nvl?

No way occurs to me. Indeed, even with the slightly more powerful nvl2, I see no way (without using other functionality, notably the case statement—but with case, nvl/nvl2 would no longer be needed).

What about case? Well, the result could be better in at least some circumstances; however, it would also be harder to write, take more space, and might wreck the flow and readability of the surrounding SQL. Indeed, if the above statement is given a comment to declare the underlying intention, it will likely always be the better choice...


Side-note:

Normally, I am strongly for writing code with a focus on readability, including avoiding division/multiplication by two through a shift operator, and a certain skepticism towards the C and Java “ternary operator” (x ? y : z).

SQL, however, is an odd beast, where the rules for what makes good code overall seem to be different. This for, at least, two reasons:

  1. Optimizing the readability of one part of a query (e.g. the above) can affect the rest of the query negatively.

  2. There is a great potential for compactness (e.g. through decode) not present in most higher-level languages. Compactness, in turn, can in its own ways lead to better readability—provided that the reader is sufficiently familiar with the language and its standard idioms.


Links between databases can be very handy, allowing access to tables from a foreign database (“there”) as if they were present in the current one (“here”). However, they can be a performance disaster. My own experiences are not deep enough to give very specific advice (in particular, because I lack knowledge of the way the query optimizers make decisions in this area), but the one golden rule is to move as little data as possible over the link.

Now, this is not as easy as it sounds, because the actual evaluation of criteria often takes place “here” even when the data is “there”. In other words, if a million-row table “there” is accessed to retrieve just one row, it could be that the full million rows are still sent over the link...

Ways to avoid such waste include:

  1. Setting up a view “there” that includes the select criteria. This has the side-effect that all the criteria in the view are evaluated “there” (while “here” need not even realize that a view is used instead of a table).

  2. Not making repeated queries on “there” data, instead copying all relevant data in one go and then making the repeated queries “here”. Yes, it might even pay off to copy entire tables. (This, however, is rarely the optimal solution and should be reserved for quick-and-dirty work.)

  3. The driving_site hint can specify where a particular query should be executed, which can optimize the network transports by moving a little data from “here” to “there”, restricting the data, and then moving a little data from “there” to “here”—rather than moving a lot of data from “there” to “here” and then restricting the data. (select /*+DRIVING_SITE(table)*/ [rest of query])