Sometimes you want to query one table, use that result to query another table, and then use that result to query yet another table. It’s tempting to do this as three separate queries, but the right solution is to chain them yourself into one SQL statement.
Consider a database where employees have job titles, and job titles have ranks, and a rank has a salary, as shown in Table 1, Table 2, and Table 3.
Table 1. The jobs table
| Employee | Title |
|---|---|
| Gordon Russell | Lecturer |
| Andrew Cumming | Teaching fellow |
| Jim Smith | Technician |
Determining how much to pay Andrew Cumming would require three steps. First, you’d need to determine Andrew’s title:
mysql> SELECT title FROM jobs WHERE employee = 'Andrew Cumming';
+-----------------+
| title |
+-----------------+
| Teaching Fellow |
+-----------------+Next, you’d need to determine the pay rank for a teaching fellow:
mysql> SELECT rank FROM ranks WHERE title = 'Teaching Fellow';
+-------+
| rank |
+-------+
| LECT2 |
+-------+Finally, you’d need to look up the salary for someone at the LECT2 pay grade:
mysql> SELECT payment FROM salary WHERE rank = 'LECT2';
+---------+
| payment |
+---------+
| 3000.00 |
+---------+That’s not efficient, because you’d need to pass three different queries to the database and process the results in between. If a table is updated during this process the answer might be wrong, or the query might even return an error. Combining queries can make people nervous. Nervous programmers often use subqueries:
mysql>SELECT payment FROM salary WHERE rank =->(SELECT rank FROM ranks WHERE title =->(SELECT title FROM jobs WHERE employee = 'Andrew Cumming'));+---------+ | payment | +---------+ | 3000.00 | +---------+
The preceding code is good in that you have reduced the problem to
a single query and thus removed many of the overhead problems, but
subquery statements can be slow. When your subquery statements contain
no aggregate functions (such as MAX( )), chances are you don’t need a
subquery—you need a JOIN. If you have
a working subquery arrangement, follow these steps to make it a JOIN:
Mark all columns with the table name they come from.
If you use the same table in two different
FROMclauses, use aliases (not needed in this example).Move all
FROMstatements together to form a singleFROM.Delete the occurrences of
(Select.Substitute
WHEREforANDafter the first occurrence ofWHERE.
Here’s an intermediate stage:
SELECT payment FROM salary,ranks,jobs WHERE salary.rank = (Select ranks.rank from grades AND ranks.title = (Select jobs.title from jobs AND jobs.employee = 'Andrew Cumming'))
Ultimately, you get this:
SELECT payment FROM salary,ranks,jobs WHERE salary.rank = ranks.rank AND ranks.title = jobs.title AND jobs.employee = 'Andrew Cumming'
Another approach is to take the conditions inside the subquery
statements and make them JOIN
ON conditions:
SELECT payment
FROM salary JOIN ranks ON (salary.rank = ranks.rank)
JOIN jobs ON (ranks.title = jobs.title)
WHERE jobs.employee = 'Andrew Cumming'Often programmers can handle this join-instead-of-subquery
approach for inclusive matches, but it starts to get a little shakier
with exclusive matches (looking for things that do
not exist). For instance, how do you determine whether there are any
ranks not currently allocated to a title? The brute force approach is
to query the database for all ranks from the salary table, and then query each one in the
ranks table. Needless to say, such
an approach would result in bad performance. The next general approach
is to use a subquery with NOT IN, but this too may not perform
well:
mysql>SELECT salary.rank FROM salary->WHERE rank NOT IN (SELECT rank FROM ranks);+-------+ | rank | +-------+ | TECH2 | +-------+
These may be performance losers because the subquery will likely
be executed first, creating an intermediate temporary table in the database. This temporary table is
then used to solve the outer query. However, in creating the temporary
table, all indexes that may have existed on salary won’t be used, and the database will
have to perform a full scan on the temporary table.
The query is the opposite of the nested subquery problem
considered earlier, because now you want to find nonmatching rows
between tables. Strangely, trying the earlier technique but using
!= rather than = results in a huge mess of results which
don’t mean anything. Instead, you need to rely on OUTER JOIN.
Put all tables required into a single FROM clause using an OUTER JOIN to link the tables. You are looking for
things in salary which are not in
ranks, and with OUTER JOIN the rows that don’t match will have
NULL values for ranks.rank:
mysql>SELECT salary.rank->FROM salary LEFT OUTER JOIN ranks ON (salary.rank = ranks.rank)->WHERE ranks.rank IS NULL;+-------+ | rank | +-------+ | TECH2 | +-------+
You also can use this technique to eliminate EXISTS and NOT EXISTS. Without the subquery, the optimizer
finds it much easier to use your indexes.
Whether you're running Access, MySQL, SQL Server, Oracle, or PostgreSQL, this book will help you push the limits of traditional SQL to squeeze data effectively from your database. SQL Hacks offers 100 hacks -- unique tips and tools -- that bring you the knowledge of experts who apply what they know in the real world to help you take full advantage of the expressive power of SQL. You'll find practical techniques to address complex data manipulation problems.




Help






