Understanding All Relational Result Sets

an article added by: Daniel Carlson at 03252008



In: Categories » » Programming » Understanding All Relational Result Sets

  

This article can help you better understand complicated SQL containing multitable joins, group-related clauses, and aggregate functions—by far the most problematic topics for most database developers. You also learn the correct way to handle database exceptions and incorporate them as actual functionality in your ColdFusion application. You learn, too, how to increase performance by caching queries in memory for fast access. You can memorize SQL clauses and Bachus-Naur forms until you’re blue in the face, but you still end up playing “ready-fire-aim” trying to write complicated SQL that actually works—unless you have a clear picture of exactly how SQL processes statements. Your first step in clearly visualizing SQL is to understand the nature of relational result sets. The code in this article is not supported by Access, so if you want to follow along with the listings in this article, run the files This new database contains additional data and tables to help illustrate the various types of joins and the results that they produce. We added All to the title of this part for an important reason that was already stated in earlier. All SQL query results are in the form of a single table of rows and columns. Whether the result is from the query of a single table or a complicated relational query involving 15 tables doesn’t matter—all query results are in the form of a single table of rows and columns. Consider the code, which produces a relational query result that, like all query results, is in the form of a single table.

SELECT
   c.CompanyID,
   c.CompanyName,
   e.Lastname,
   e.Firstname,
   e.Salary
   FROM
   Company c INNER JOIN Employee e
 ON c.CompanyID = e.CompanyID

Notice that the result of joining these two tables results in a single table. What’s this? Isn’t this one of those Venn diagrams? Remember when Mr. Suber, your junior high school math teacher, told you that you really would use them one day in the real world, and you balked? Time to call him up and apologize! An INNER JOIN is the intersection of the two data sets contained in the two tables being joined. Why is this conceptualization critical to writing flawless SQL? Because we now can easily visualize any type of join between two tables: We can now easily describe what is really happening with each of these types of JOINs. An INNER JOIN result consists of the columns selected from both tables, but it contains only those rows that match the JOIN key values (specified in the ON clause) in both joined tables. A LEFT OUTER JOIN result consists of the columns selected from both tables, and it contains all the rows in the left-side table regardless of whether any of them match the JOIN key values in the right-side table. The query columns selected from the left-side table contain data from that table. For each row in the query result, if a JOIN key value from the left-side table matches a JOIN key value from the right-side table, the query columns selected from the right-side table contain column data from that table; if not, those columns contain nulls.

SELECT
   c.CompanyID,
   c.CompanyName,
   e.Lastname,
   e.Firstname,
   e.Salary
   FROM
   Company c LEFT OUTER JOIN Employee e
 ON c.CompanyID = e.CompanyID

A RIGHT OUTER JOIN result consists of the columns selected from both tables, and it contains all the rows in the right-side table regardless of whether any of them match the JOIN key values in the left-side table. The query columns selected from the right-side table contain data from that table. For each row in the query result, if a JOIN key value from the right-side table matches a JOIN key value from the left-side table, the query columns selected from the left-side table contain column data from that table; if not, those columns contain nulls. A FULL OUTER JOIN result consists of the columns selected from both tables, and it contains all the rows from both tables regardless of whether any of them match their JOIN key values. For each query result row extracted from the left-side table, if the JOIN key value from the left-side table has no match in the right-side table, that query result row contains nulls in the columns selected from the right-side table. Similarly, in a query result row extracted from the right-side table, if the JOIN key value from the right-side table has no match in the left-side table, that query result row contains nulls in the columns selected from the left-side table. If a query result row’s JOIN key values match between the left- and right-side tables, the data from both tables is contained in that query result row.

SELECT
   c.CompanyID,
   c.CompanyName,
   e.Lastname,
   e.Firstname,
   e.Salary
   FROM
   Company c RIGHT OUTER JOIN Employee e
 ON c.CompanyID = e.CompanyID
SELECT
   c.CompanyID,
   c.CompanyName,
   e.Lastname,
   e.Firstname,
   e.Salary
   FROM
   Company c FULL OUTER JOIN Employee e
 ON c.CompanyID = e.CompanyID

Okay, you can now precisely visualize the various types of joins and are more comfortable with the SQL syntax that creates them. That’s great! But in the real world, you often deal with joining more than two tables together in a single relational query. How are you going to write the complicated SQL to join seven related tables and get it right the first time, every time? Actually, it’s a piece of cake. In fact, writing a query that performs a combination of INNER, LEFT OUTER, RIGHT OUTER, and FULL OUTER joins between 15 tables is really no harder to write than a query containing a single INNER JOIN between two tables. The trick’s all in how you visualize it!

SELECT
   c.CompanyID,
   c.CompanyName,
   e.Lastname,
   e.Firstname,
   e.Salary,
   d.FullName,
   d.Relationship
   FROM
   Company c INNER JOIN Employee e
   ON c.CompanyID = e.CompanyID
   INNER JOIN Dependant d
 ON e.SSN = d.SSN

The trick to visualizing the joining of these three tables is to decompose the process into multiple joins between pairs of tables. You may remember that a relationship between two tables is called a binary relationship because it relates two tables. You may also remember, from the beginning of this article, that we restate the fact that all query results are in the form of a single table. But what is a query result? Certainly, a query result is the data returned from your database server to ColdFusion as the result of a CFQUERY call, but the intermediate work product of an SQL statement in process is a query result, too. Conceptually, each step in processing an SQL statement produces an internal, intermediate work product in the form of a single table. This intermediate work product is fed to the next step in the process, which in turn transforms it into its own intermediate work product in the form of another table. This continues until the SQL statement finishes processing steps; the final result is, again, a single table. In fact, the only difference between an intermediate work product and a final query result is the fact that you don’t have any more steps to process in the latter. So if you think of the intermediate work product of the first JOIN (between Company and Employee) as one table, and you think of the second JOIN as joining this “table” to the Dependant table.. In a nutshell, the joining of the Company and Employee tables produces an intermediate “table” that is then joined to the Dependant table, which in turn produces another table that is the final query result. Hey, wait a minute! Doesn’t this mean that, no matter how many complicated joins appear in a query, you can still decompose the whole mess into an easy-to-understand series of simple two-table joins? You bet! In fact, no such thing as a “three-table join” or a “multiple-table join” really exists, because all joins are between just two tables. Even a self-join between a table and itself is conceptually a join between two tables; the table is actually joined to an “image” of itself. So, joining N tables is really a series of N-1 simple two-table joins. Think of it that way, and you never experience another problem creating even the most complicated joins ever again. One final piece of advice: Use the exact SQL code indenting format that we use throughout this article, and you have significantly fewer problems writing perfect SQL the first time through. We mean don’t even deviate from our format one iota. Do everything—capitalization, tabbing, aliasing, putting each column in the SELECT clause on a line by itself, and so on—just as we’ve done in the listings for this article. Trust us—it works. The SQL code formatting technique we use really helps you visually separate each clause so you can see the statement as a collection of logical operations, rather than a haphazardly-typed string of code. In the following parts, you get to tackle once and for all an even more confusing matter: grouping and aggregate functions.

To understand exactly what you can and cannot do with SQL, you must first understand the order in which SQL processes the clauses of an SQL statement. You can use the code as an example of the order that SQL follows.

SELECT
   c.CompanyName,
   e.LastName + ‘, ‘ + e.FirstName AS EmployeeName,
   e.Salary,
   count(*) as NumDependants
   FROM
   Company c INNER JOIN Employee e
   ON c.CompanyID = e.CompanyID
   INNER JOIN Dependant d
   ON e.SSN = d.SSN
   WHERE
   e.DateOfBirth < ‘01/01/82’
   GROUP BY
   c.CompanyName,
   e.LastName,
   e.FirstName,
   e.Salary
   HAVING
   Count(*) >= 2
   ORDER BY
 EmployeeName ASC

Before SQL can do anything with data, it must get that data, so the FROM clause is the first to be processed in the statement. Remember that the result of the FROM clause is a single set of data in the form of a simple table, regardless of how many tables are joined together in the FROM clause. Refer to earlier in this article, for the details of how JOINs are processed in the FROM clause. The intermediate work product of the FROM clause is then “fed” to the WHERE clause, which further processes this set of data. The purpose of the WHERE clause is to select rows that satisfy a specific criterion or criteria. Only those rows in the intermediate work product of the FROM clause that satisfy the WHERE clause criteria can pass to the next clause in SQL’s processing order of precedence. If an SQL statement doesn’t contain a WHERE clause, the intermediate work product of the FROM clause instead passes to the next clause in the statement. The intermediate work product then passes to the GROUP BY clause, if one is present in the statement. The purpose of the GROUP BY clause is to group together rows in the intermediate work product based on homogenous values in the columns that define the group. In other words, if a statement groups by the CompanyName column, all rows containing the same value as CompanyName belong to the same group after the GROUP BY clause finishes processing. Similarly, if a statement groups by the LastName and FirstName columns, all rows containing the same combination of LastName and FirstName values belong to the same group after the GROUP BY clause is finished processing. But what do we mean by groups? If the result of every clause of an SQL statement is a simple table, where do these groups fit in the table that results from a GROUP BY clause? Actually, each group becomes a single row of the intermediate work product of the GROUP BY clause, and each group’s constituent rows are like separate little invisible tables turned sideways and sitting behind each group. This “invisible third dimension” to the intermediate work product of the GROUP BY clause is available to the HAVING clause and aggregate functions. The invisible third dimension is an internal working structure for these purposes only and is never returned as part of a final query result set. If an SQL statement doesn’t contain a GROUP BY clause, the intermediate work product instead passes to the next clause in the statement. The intermediate work product then passes to the HAVING clause, if one is present in the statement. Where the purpose of the WHERE clause is to select rows that satisfy a specific criterion or criteria, the purpose of the HAVING clause is to select groups that satisfy a specific criterion or criteria. These are the groups that sit “invisibly behind each row” in the intermediate work product of the GROUP BY clause. Notice how few rows remain in the intermediate work product of the HAVING clause? That’s because the criterion of the HAVING clause selects only those groups containing two or more elements—in other words, Employees with two or more Dependants. If an SQL statement doesn’t contain a HAVING clause, the intermediate work product instead passes to the next clause in the statement. The intermediate work product then passes to the SELECT clause, which selects from all the columns in the intermediate work product those columns that are part of the final query result. The intermediate work product then passes to the ORDER BY clause, which sorts all the columns in the intermediate work product according to the columns and directions specified in the clause. And that’s that! Not exactly the way that you envisioned SQL working, eh? Who ever thought that the SELECT clause was one of the last things to be processed? But it makes sense if you see it all laid out as here, doesn’t it? In fact, the processing order of precedence that you read here and in the preceding parts solves a few mysteries that you may be wondering about. You can refer to table-name aliases, for example, anywhere in an SQL statement, but if you try to use a column-name alias in the WHERE, GROUP BY, or HAVING clauses, an error is thrown. This is because table-name aliases are defined in the FROM clause which is processed first, so aliases defined there can be referenced by all the clauses that follow. Column-name aliases, however, are defined in the SELECT clause, which is processed next-to-last, so the only clause that can reference a column name alias is the ORDER BY clause, which is the only clause processed after the SELECT clause. After you understand how SQL processes statements, you have a much easier time learning how and when to use the more complicated aspects of SQL.

legal notice

Our website is not responsible for the information contained by this article. Web-articles is a free articles resource.
Suggestion: If you need fresh, daily updated content for your website, feel free to use our service. Click here for more information.

Useful tools and features

Understanding All Relational Result Sets  
If you like this article (tutorial), please link to it from your web page using the information above.

related articles

1. Modifying a Company in the Database
The data that CFQUERY requests from the database comes back in a result set, and the Name attribute tells ColdFusion what that result set is to be named. The SQL statement consists of three clauses: SELECT, FROM, and ORDER BY. SELECT tells the database which columns to retrieve from the database; FROM tells the database which table to retrieve those columns from; and ORDER BY tells the database how to sort the results. The result set returned from CFQUERY contains multiple rows of data, and each row has multiple columns. It would ...

2. Building the company edit action template
The code in CompanyEditAction.cfm is like the code in CompanyAddAction.cfm, but the edit action updates rather than inserts. Create a file named CompanyEditAction.cfm inside the Ch02 directory, type the code into the file’s editing window, and save the file. <cfquery name=”UpdateCompany” datasource=”#Request.MainDSN#”> UPDATE Company SET CompanyName = ‘#Trim(Form.CompanyName)#’, Address = ‘#Trim(Form.Address)#’, City = ‘#Trim(Form.City)...

3. Adding a New Employee to the Database
The CFQUERY in CompanyDeleteAction.cfm uses a DELETE statement with two SQL clauses: DELETE and WHERE. DELETE tells the database the table from which to delete a record, and WHERE tells the database which record to delete. You can watch the company delete process in action. Point your Web browser to http://<yourserver>/CFMXBible/Ch02/CompanyGetDeleteForm.cfm and enter a CompanyID. (To get a valid ID, go to the company list and pick a number from the ID column.) Click Submit to see the chosen company’s informatio...

4. Modifying an Employee in the Database
The DateFormat() function around the DateOfBirth column in the codereturns the employee’s birth date reformatted according to a display mask. DateOfBirth normally comes back from the database in the following format: 2002-01-01 00:00:00.0 That format is not very user-friendly. Calling DateFormat() with a mask of “mm/dd/yyyy” returns the date as follows: 01/01/2002 This version is, of course, more natural and easy to read. The same is true for the employee edit process...

5. Removing an Employee From the Database
The user must have the capability to remove employees from the database. The employee delete process is a simple combination of techniques that you have already learned, such as retrieving a record from the database, displaying that record in a template, and so on. The first page in this process is nearly identical to the employee get edit form. Create a file named EmployeeGetDeleteForm.cfm inside the Ch02 directory, type the code into the file’s editing window, and save the file. <html> <head> <ti...

6. Making direct links to the forms
Say that you want to modify or delete a company. Right now, you need to remember the company’s ID, go back to the launch pad, click Company Edit, and enter the company ID, all just to get to the edit form. Wouldn’t you rather click a company in the list and go directly to the edit form? ... <table> <tr> <td><b>ID</b></td> <td><b>Name</b></td> <td><b>Address</b></td> <td><b>City</b></t...

7. Using a Custom Tag
Custom tags are reusable, developer-authored extensions to the ColdFusion language. The custom tag that you create in the following parts displays today’s date in a familiar format. Create a file named TodaysDate.cfm inside the Ch02 directory, type the code into the file’s editing window, and save the file. <cfoutput>#DateFormat(Now(), “ddd, mmm d, yyyy”)#</cfoutput> Now() returns the current date and time, and DateFormat() reformats the date. To call your custom tag, open index.cfm, a...