Jump to content

How to Convert Subqueries to JOINs

+ 1
  drgrussell's Photo
Posted Oct 24 2009 09:19 PM

Sometimes you want to query one table, use that result to query another table, and then use that result to query yet another table. It’s tempting to do this as three separate queries, but the right solution is to chain them yourself into one SQL statement.

Consider a database where employees have job titles, and job titles have ranks, and a rank has a salary, as shown in Table 1, Table 2, and Table 3.

Table 1. The jobs table

EmployeeTitle
Gordon RussellLecturer
Andrew CummingTeaching fellow
Jim SmithTechnician

Table 2. The ranks table

TitleRank
LecturerLECT1
Teaching fellowLECT2
TechnicianTECH1

Table 3. The salary table

RankPayment
LECT12000.00
LECT23000.00
TECH15000.00
TECH26000.00

Determining how much to pay Andrew Cumming would require three steps. First, you’d need to determine Andrew’s title:

mysql> SELECT title FROM jobs WHERE employee = 'Andrew Cumming';

+-----------------+

| title           |

+-----------------+

| Teaching Fellow |

+-----------------+

Next, you’d need to determine the pay rank for a teaching fellow:

mysql> SELECT rank FROM ranks WHERE title = 'Teaching Fellow';

+-------+

| rank  |

+-------+

| LECT2 |

+-------+

Finally, you’d need to look up the salary for someone at the LECT2 pay grade:

mysql> SELECT payment FROM salary WHERE rank = 'LECT2';

+---------+

| payment |

+---------+

| 3000.00 |

+---------+

That’s not efficient, because you’d need to pass three different queries to the database and process the results in between. If a table is updated during this process the answer might be wrong, or the query might even return an error. Combining queries can make people nervous. Nervous programmers often use subqueries:

mysql> SELECT payment FROM salary WHERE rank =

    ->   (SELECT rank FROM ranks WHERE title =

    ->     (SELECT title FROM jobs WHERE employee = 'Andrew Cumming'));

+---------+

| payment |

+---------+

| 3000.00 |

+---------+

The preceding code is good in that you have reduced the problem to a single query and thus removed many of the overhead problems, but subquery statements can be slow. When your subquery statements contain no aggregate functions (such as MAX(  )), chances are you don’t need a subquery—you need a JOIN. If you have a working subquery arrangement, follow these steps to make it a JOIN:

  1. Mark all columns with the table name they come from.

  2. If you use the same table in two different FROM clauses, use aliases (not needed in this example).

  3. Move all FROM statements together to form a single FROM.

  4. Delete the occurrences of (Select.

  5. Substitute WHERE for AND after the first occurrence of WHERE.

Here’s an intermediate stage:

SELECT payment FROM salary,ranks,jobs WHERE salary.rank = 

  (Select ranks.rank  from grades AND ranks.title = 

    (Select jobs.title from jobs AND jobs.employee = 'Andrew Cumming'))

         

Ultimately, you get this:

SELECT payment FROM salary,ranks,jobs

WHERE  salary.rank = ranks.rank

AND    ranks.title = jobs.title

AND    jobs.employee = 'Andrew Cumming'

Another approach is to take the conditions inside the subquery statements and make them JOIN ON conditions:

SELECT payment

FROM salary JOIN ranks ON (salary.rank = ranks.rank)

     JOIN jobs ON (ranks.title = jobs.title)

WHERE jobs.employee = 'Andrew Cumming'

Looking for What’s Not There

Often programmers can handle this join-instead-of-subquery approach for inclusive matches, but it starts to get a little shakier with exclusive matches (looking for things that do not exist). For instance, how do you determine whether there are any ranks not currently allocated to a title? The brute force approach is to query the database for all ranks from the salary table, and then query each one in the ranks table. Needless to say, such an approach would result in bad performance. The next general approach is to use a subquery with NOT IN, but this too may not perform well:

mysql> SELECT salary.rank FROM salary

    -> WHERE rank NOT IN (SELECT rank FROM ranks);

+-------+

| rank  |

+-------+

| TECH2 |

+-------+

These may be performance losers because the subquery will likely be executed first, creating an intermediate temporary table in the database. This temporary table is then used to solve the outer query. However, in creating the temporary table, all indexes that may have existed on salary won’t be used, and the database will have to perform a full scan on the temporary table.

The query is the opposite of the nested subquery problem considered earlier, because now you want to find nonmatching rows between tables. Strangely, trying the earlier technique but using != rather than = results in a huge mess of results which don’t mean anything. Instead, you need to rely on OUTER JOIN. Put all tables required into a single FROM clause using an OUTER JOIN to link the tables. You are looking for things in salary which are not in ranks, and with OUTER JOIN the rows that don’t match will have NULL values for ranks.rank:

mysql> SELECT salary.rank

    -> FROM salary LEFT OUTER JOIN ranks ON (salary.rank = ranks.rank)

    -> WHERE ranks.rank IS NULL;

+-------+

| rank  |

+-------+

| TECH2 |

+-------+

You also can use this technique to eliminate EXISTS and NOT EXISTS. Without the subquery, the optimizer finds it much easier to use your indexes.

Cover of SQL Hacks
Learn more about this topic from SQL Hacks. 

Whether you're running Access, MySQL, SQL Server, Oracle, or PostgreSQL, this book will help you push the limits of traditional SQL to squeeze data effectively from your database. SQL Hacks offers 100 hacks -- unique tips and tools -- that bring you the knowledge of experts who apply what they know in the real world to help you take full advantage of the expressive power of SQL. You'll find practical techniques to address complex data manipulation problems.

Learn More Read Now on Safari


Tags:
0 Subscribe


0 Replies