SQL Server 2005: Common Table Expressions (CTE) in SQL Server 2005

When crafting a query in SQL, there are often times when we may need to operate over a set of data that doesn't inherently exist within the system. For example, the database for an eCommerce web application would have the standard tables - Products, Customers, Orders, OrderDetails, and so on - but we may need to run reports on a particular subset of the data or against aggregate data across these tables. Or the reporting queries we need might need to group or filter by results returned by scalar subqueries. Typically, views are used to break down complex queries into digestible chunks or to provide scalar subquery results that can be grouped and filtered. Views, however, are sometimes overkill, as they are permanent objects at the system-level. If we only need to reference this complex query in a single stored procedure or UDF, another option is to use a derived table. Unfortunately, derived tables muddle the readability of the query and must be repeated for each use in a statement.

Common Table Expressions, or CTE, are a new construct introduced in Microsoft SQL Server 2005 that offer a more readable form of the derived table that can be declared once and referenced multiple times in a query. Moreover, CTEs can be recursively defined, allowing a recursive entity to be enumerated without the need for recursive stored procedures. In this article we'll examine the benefits, uses, and syntax of both recursive and non-recursive CTEs. Read on to learn more!

A Simple Common Table Expression Example
Before we dive into the syntax or gritty details of CTEs, let's start by looking at a simple example. I think you'll agree that even without knowing the syntax of CTEs, they are pretty readable and straightforward (the hallmark of a well-designed programming language construct).

 WITH ProductAndCategoryNamesOverTenDollars (ProductName, CategoryName, UnitPrice) AS
(
   SELECT
      p.ProductName,
      c.CategoryName,
      p.UnitPrice
   FROM Products p
      INNER JOIN Categories c ON
         c.CategoryID = p.CategoryID
   WHERE p.UnitPrice > 10.0
)

SELECT *
FROM ProductAndCategoryNamesOverTenDollars
ORDER BY CategoryName ASC, UnitPrice ASC, ProductName ASC

This query creates a CTE named ProductAndCategoryNamesOverTenDollars that returns the name, category name, and price of those products whose unit price exceeds $10.00. Once the CTE has been defined, it must then immediately be used in a query. The query treats the CTE as if were a view or table in the system, returning the three fields defined by the CTE (ProductName, CategoryName, and UnitPrice), ordered alphabetically by category, then by price, and then alphabetically by product name.

In short, a Common Table Expression allows us to define a temporary, view-like construct. We start by (optionally) specifying the columns it returns, then define the query. Following that, the CTE can be used in a SELECT, INSERT, UPDATE, or DELETE statement.

Common Table Expression Syntax
A Common Table Expression contains three core parts:

The CTE name (this is what follows the WITH keyword)
The column list (optional)
The query (appears within parentheses after the AS keyword)

The query using the CTE must be the first query appearing after the CTE. That is, you cannot do the following:

 WITH ProductAndCategoryNamesOverTenDollars (ProductName, CategoryName, UnitPrice) AS
(
   SELECT
      p.ProductName,
      c.CategoryName,
      p.UnitPrice
   FROM Products p
      INNER JOIN Categories c ON
         c.CategoryID = p.CategoryID
   WHERE p.UnitPrice > 10.0
)

SELECT *
FROM Products

SELECT *
FROM ProductAndCategoryNamesOverTenDollars
ORDER BY CategoryName ASC, UnitPrice ASC, ProductName ASC

The ProductAndCategoryNamesOverTenDollars CTE only applies to the first query following it. So when the second query is reached, ProductAndCategoryNamesOverTenDollars is undefined, resulting in an "Invalid object name 'ProductAndCategoryNamesOverTenDollars'" error message.

You can, however, define multiple CTEs after the WITH keyword by separating each CTE with a comma. For example, the following query uses two CTEs. The subsequent SELECT query then uses an INNER JOIN to match together the records from the two CTEs:

 WITH CategoryAndNumberOfProducts (CategoryID, CategoryName, NumberOfProducts) AS
(
   SELECT
      CategoryID,
      CategoryName,
      (SELECT COUNT(1) FROM Products p
       WHERE p.CategoryID = c.CategoryID) as NumberOfProducts
   FROM Categories c
),

ProductsOverTenDollars (ProductID, CategoryID, ProductName, UnitPrice) AS
(
   SELECT
      ProductID,
      CategoryID,
      ProductName,
      UnitPrice
   FROM Products p
   WHERE UnitPrice > 10.0
)

SELECT c.CategoryName, c.NumberOfProducts,
      p.ProductName, p.UnitPrice
FROM ProductsOverTenDollars p
   INNER JOIN CategoryAndNumberOfProducts c ON
      p.CategoryID = c.CategoryID
ORDER BY ProductName

Unlike a derived table, CTEs can be defined just once, yet appear multiple times in the subsequent query. To demonstrate this, consider the following example: the Northwind database's Employees table contains an optional ReportsTo column that, if specified, indicates the employee's manager. ReportsTo is a self-referencing foreign key, meaning that, if provided, it refers back to another EmployeeID in the Employees table. Imagine that we wanted to display a list of employees including how many other employees they directly managed. This could be done using a simple, CTE-free SELECT statement, but let's use a CTE for now (for reasons which will become clear soon):

 WITH EmployeeSubordinatesReport (EmployeeID, LastName, FirstName, NumberOfSubordinates, ReportsTo) AS
(
   SELECT
      EmployeeID,
      LastName,
      FirstName,
      (SELECT COUNT(1) FROM Employees e2
       WHERE e2.ReportsTo = e.EmployeeID) as NumberOfSubordinates,
      ReportsTo
   FROM Employees e
)

SELECT LastName, FirstName, NumberOfSubordinates
FROM EmployeeSubordinatesReport

This query will return the employees records, showing each employee's last and first name along with how many other employees they manage. As the figure below shows, only Andrew Fuller and Steven Buchanan are manager material.

Now, imagine that our boss (Andrew Fuller, perhaps) comes charging into our office and demands that the report also lists each employee's manager's name and number of subordinates (if the employee has a manager, that is - Mr. Fuller is all to quick to point out that he reports to no one). Adding such functionality is a snap with the CTE - just add it in a LEFT JOIN!

 WITH EmployeeSubordinatesReport (EmployeeID, LastName, FirstName, NumberOfSubordinates, ReportsTo) AS
(
   SELECT
      EmployeeID,
      LastName,
      FirstName,
      (SELECT COUNT(1) FROM Employees e2
       WHERE e2.ReportsTo = e.EmployeeID) as NumberOfSubordinates,
      ReportsTo
   FROM Employees e
)

SELECT Employee.LastName, Employee.FirstName, Employee.NumberOfSubordinates,
   Manager.LastName as ManagerLastName, Manager.FirstName as ManagerFirstName, Manager.NumberOfSubordinates as ManagerNumberOfSubordinates
FROM EmployeeSubordinatesReport Employee
   LEFT JOIN EmployeeSubordinatesReport Manager ON
      Employee.ReportsTo = Manager.EmployeeID

With this additional LEFT JOIN, the employee's manager's results are returned; if there's no manager for the employee, NULLs are returned instead.

When to Use Common Table Expressions

When to Use Common Table Expressions
Common Table Expressions offer the same functionality as a view, but are ideal for one-off usages where you don't necessarily need a view defined for the system. Even when a CTE is not necessarily needed (as when listing just the employees and their subordinate count in the example above), it can improve readability. In Using Common Table Expressions, Microsoft offers the following four advantages of CTEs: Create a recursive query. Substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata. Enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access. Reference the resulting table multiple times in the same statement. Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries. The query can be divided into separate, simple, logical building blocks. These simple blocks can then be used to build more complex, interim CTEs until the final result set is generated. Using scalar subqueries (such as the `(SELECT COUNT(1) FROM ...)` examples we've looked at in this article) cannot be grouped or filtered directly in the containing query. Similarly, when using SQL Server 2005's ranking functions - `ROW_NUMBER()`, `RANK()`, `DENSE_RANK()`, and so on - the containing query cannot include a filter or grouping expression to return only a subset of the ranked results. For both of these instances, CTEs are quite handy. (For more on SQL Server 2005's ranking capabilities, be sure to read: Returning Ranked Results with Microsoft SQL Server 2005.) CTEs can also be used to recursively enumerate hierarchical data. We'll examine this next!

Common Table Expressions offer the same functionality as a view, but are ideal for one-off usages where you don't necessarily need a view defined for the system. Even when a CTE is not necessarily needed (as when listing just the employees and their subordinate count in the example above), it can improve readability. In Using Common Table Expressions, Microsoft offers the following four advantages of CTEs:

Create a recursive query.
Substitute for a view when the general use of a view is not required; that is, you do not have to store the definition in metadata.
Enable grouping by a column that is derived from a scalar subselect, or a function that is either not deterministic or has external access.
Reference the resulting table multiple times in the same statement.
Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries. The query can be divided into separate, simple, logical building blocks. These simple blocks can then be used to build more complex, interim CTEs until the final result set is generated.

Using scalar subqueries (such as the (SELECT COUNT(1) FROM ...) examples we've looked at in this article) cannot be grouped or filtered directly in the containing query. Similarly, when using SQL Server 2005's ranking functions - ROW_NUMBER(), RANK(), DENSE_RANK(), and so on - the containing query cannot include a filter or grouping expression to return only a subset of the ranked results. For both of these instances, CTEs are quite handy. (For more on SQL Server 2005's ranking capabilities, be sure to read: Returning Ranked Results with Microsoft SQL Server 2005.)

CTEs can also be used to recursively enumerate hierarchical data. We'll examine this next!

Recursive Common Table Expressions
Recursion is the process of defining a solution to a problem in terms of itself. For example, a teacher needs to sort a stack of tests alphabetically by the students' names. She could process the tests one at a time and, for each test, insert it into the appropriate spot to the left (called insertion sort), probably the way most people sort a hand of cards (at least that's the way I do it). However, depending on the distribution of the tests, the size of the work space, the number of tests to sort, and so on, it may be far more efficient to break down the problem into pieces. Rather than doing an insertion sort right off the bat, it might first make sense to divide the stack of papers in half, and then do an insertion sort on one half, an insertion sort on the second half, and then a merge of the two piles. Or perhaps it would make sense to divide the tests into four piles, or eight piles. (This approach is referred to as merge sort.)

With a recursive solution you will always have the following two pieces:

The base case - what to do when you're done recursing. After dividing the tests into separate piles of say, eight elements per pile, the base case is to sort these piles via insertion sort.
The recursive step - the action to perform that involves plugging the input "back into" the system. For merge sort, the recursive step is the division of one pile into two. Then into four. Then into eight, and so on, until the base case is reached.

For more on recursion, see Recursion, Why It's Cool.

Returning to CTEs... the Employees database table holds the corporate hierarchy within its rows. Imagine that good ol' Andrew Fuller has come back and insisted on a report that would list all persons in the company along with their position in the hierarchy. Since the Employees table can capture an arbitrary number of hierarchy levels, we need a recursive solution. Enter CTEs...

Like any recursive definition, a recursive Common Table Expression requires both a base case and the recursive step. In SQL parlance, this translates into two SQL queries - one that gets the "initial" data UNIONed with one that performs the recursion. For the Employees example, the base case is returning those employees that have no manager:

 SELECT ...
FROM Employees
WHERE ReportsTo IS NULL

The recursion includes a query on the CTE itself. The following shows the CTE - with both the base case and recursive step - along with a SELECT query that returns the rows from the CTE:

 WITH EmployeeHierarchy (EmployeeID, LastName, FirstName, ReportsTo, HierarchyLevel) AS
(
   -- Base case
   SELECT
      EmployeeID,
      LastName,
      FirstName,
      ReportsTo,
      1 as HierarchyLevel
   FROM Employees
   WHERE ReportsTo IS NULL

   UNION ALL

   -- Recursive step
   SELECT
      e.EmployeeID,
      e.LastName,
      e.FirstName,
      e.ReportsTo,
      eh.HierarchyLevel + 1 AS HierarchyLevel
   FROM Employees e
      INNER JOIN EmployeeHierarchy eh ON
         e.ReportsTo = eh.EmployeeID
)

SELECT *
FROM EmployeeHierarchy
ORDER BY HierarchyLevel, LastName, FirstName

The recursion occurs in the second query in the CTE by joining the results of Employees against the CTE itself (EmployeeHierarchy) where the employees' ReportsTo field matches up to the CTE's EmployeeID. Included in this query is the HierarchyLevel field, which returns 1 for the base case and one greater than the previous level for each recursive step down the hierarchy. As requested, this resultset clearly shows that Mr. Fuller is the alpha male in this organization. Furthermore, we can see that Steven, Laura, Nancy, Janet, and Margaret make up the second tier in the organizational hierarchy, while poor Anne, Robert, and Michael are down at the bottom:

Alternatives to Recursive Common Table Expressions
As we saw in this article, enumerating hierarchical data recursively can be accomplished via CTEs (for more on using recursive CTEs, don't forget to check out the official documentation - Recursive Queries Using Common Table Expressions). However, there are other options as well. One choice is to perform the recursion at the ASP/ASP.NET layer. That is, read in all employee information to a Recordset of DataSet in code, and then recurse there. My article Efficiently DisplayingParent-Child Data discusses this approach. If you need to perform the recursion in SQL, you can use recursive stored procedures, as discussed in The Zen of Recursion. If you are designing a data model that needs to support hierarchical data, your best bet is to bake in some lineage information directly into the table from the get-go. See SQL for Threaded Discussions and More Trees & Hierarchies in SQL for more information.

Alternatives to Recursive Common Table Expressions

As we saw in this article, enumerating hierarchical data recursively can be accomplished via CTEs (for more on using recursive CTEs, don't forget to check out the official documentation - Recursive Queries Using Common Table Expressions). However, there are other options as well. One choice is to perform the recursion at the ASP/ASP.NET layer. That is, read in all employee information to a Recordset of DataSet in code, and then recurse there. My article Efficiently DisplayingParent-Child Data discusses this approach.

If you need to perform the recursion in SQL, you can use recursive stored procedures, as discussed in The Zen of Recursion. If you are designing a data model that needs to support hierarchical data, your best bet is to bake in some lineage information directly into the table from the get-go. See SQL for Threaded Discussions and More Trees & Hierarchies in SQL for more information.

Conclusion
One of the many new features in SQL Server 2005 are Common Table Expressions (CTEs), which provide a more readable and usable approach to derived tables. Additionally, CTEs may be recursively defined, allowing a recursive entity to be enumerated without the need for recursive stored procedures. For more on the new features found in SQL Server 2005, be sure to also check out Returning Ranked Results with Microsoft SQL Server 2005 and TRY...CATCH in SQL Server 2005.

SQL Server 2005

Wednesday, June 3, 2009

Common Table Expressions (CTE) in SQL Server 2005

No comments:

Followers

Blog Archive

Links