SQL Blog of Zach Stagers

www.scratchbox.co.uk

Using Derived Tables

No Comments »

A derived table is essentially a correlated subquery within your FROM clause. If used correctly, a derived table can drastically improve the efficiency of your query.

Imagine that you’re going to be querying a large table with over a million rows in it and 20 columns, but you know your query is only ever going to be using 50,000 of those rows and 3 of the columns. That’s 950,000 rows and 17 columns that have to be scanned and rejected by your query, in order to get the data you need. By using a derived table, you can limit the table reference to those 50,000 rows.

Syntax and Explanation

SELECT P.* 

FROM
(
SELECT ProductId,
       ProductName,
       ProductDescription
FROM Products
WHERE ProductCategoryId = 4
) AS P JOIN
Orders AS O ON O.ProductId = P.ProductId

WHERE O.OrderDate >= DATEADD(Y, -1, GETDATE()

The above code will return all products with a ProductCategoryId of 4 from the products table, which have been ordered within the last year.

This will help query performance by cutting down the rows read out from the Products table to begin with. The FROM clause within a SQL statement is actually the first to be executed, then the WHERE. So in the above query, only the 50,000 records have to be joined to orders table, rather than the full million rows.

Subqueries – Correlated and Noncorrelated

2 Comments »

Subqueries are a great way to easily increase the scope of your result set, and in this post I’m going to explain the difference between Correlated and Noncorrelated Subqueries, as well as provide examples for them both.

Correlated Subqueries

Correlated Subqueries create a link between the outer query and the subquery. This means that the subquery is dependent on the outer query, and won’t execute as a stand alone query.

SELECT *
FROM Products AS P1
WHERE P1.Price >
              (
               SELECT AVG(P2.Price)
               FROM Products AS P2
               WHERE P2.ProductCategoryId = P1.ProductCategoryId
              )

The above query will return all Products that have a price higher than the average price for each product category. The WHERE clause of the subquery almost acts as if it were a GROUP BY for the outer query in this instance. This is because the subquery and outer query join together on the ProductCategoryId – meaning that the average price is found for each Product Category. Then products with a price greater than the average, for that product category, will be returned.

Noncorrelated Subqueries

Noncorrelated Subqueries are completely independent of the outer query, and will execute as a stand alone query without any issues. These are commonly used within a WHERE clause – much like the example below:

SELECT *
FROM Products AS P
WHERE P.ProductId IN (SELECT DISTINCT O.ProductId FROM Orders AS O)

The above code will return all rows from the Products table that have atleast one reference in the Orders tables. It does this by returning all of the ProductId’s in the Orders table, to be used in the WHERE clause of the outer query.

Recursive CTEs

No Comments »

Recursive Common Table Expressions allow you to perform a recursive query, without the use of a resource intensive Cursor.

In this post, I’ll show you how to write a very simple Recursive CTE, and explain how it works.

The example that I’ll be working with in this post is as below:

;WITH RCTE AS
       (
        SELECT 1 AS [Recursion Level], 'A' [Type]

        UNION ALL

        SELECT [Recursion Level] +1 AS [Recursion Level], 'R' [Type]
        FROM RCTE
        WHERE [Recursion Level] < 5
       )

SELECT * FROM RCTE
OPTION(MAXRECURSION 4)

This example obviously has no practical use, but it's good for showing how a Recursive CTE works.

The query starts in the exact same way as a normal CTE, ;WITH RCTE AS. The semicolon terminates any SQL defined before the CTE, and it's best practice to add a semicolon here for that reason. If any preceding code (Commonly the declaration of variables) isn't terminated by a semicolon, an error will occur.

The second line, SELECT 1 AS [Recursion Level], 'A' [Type] is known as the "Anchor Member". This query defines the base level item that you would like to build upon. This simple select just creates a single record with two columns.

UNION ALL joins the two parts of the recursive query together, and must be defined after your anchor query.

The query defined after the UNION ALL is known as the "Recursive Member". This query must reference the CTE in its FROM clause. This is so the query knows what base values it has, and can therefore build upon. The where clause, WHERE [Recursion Level] < 5, is a terminator statement, which stops the query from looping indefinitely. The CTE is then closed off with a closing a bracket.

A simple SELECT * FROM RCTE is all that is needed to pull back all of the required information from the CTE in this example.

OPTION(MAXRECURSION 4) is essentially another safety net to stop the query from spiralling into an infinite loop. This stops the query from running at a maximum of 4 recursions, but be aware that if 4 recursions isn't enough to extract all of the data within the loop - an error will occur. For example, If I changed my WHERE clause to less than 6 in the Recursive Member, and left my Max Recursion option at 4 I would recieve the following error:

The statement terminated. The maximum recursion 4 has been exhausted before statement completion.

The Max Recursion clause is optional.

The result set from the example query would like the below:

RCTE Result

As you can see, 5 results were returned - even though WHERE [Recursion Level] < 5 was specified in the Recursive Member, and OPTION(MAXRECURSION 4) is defined in the outer query. This is because the Anchor Member (First record in the result set, type "A") doesn't count as a recursion. There are, however, 4 recursion records (type "R").

A more practical use of a Recursive CTE would be to build something like a family tree, or a companies employee structure.

Stop X-Axis labels skipping values in SSRS 2K8

No Comments »

I came across an issue today involving a chart I’d added to a report. The chart was working as
expected, but the values along the X-Axis were unclear, below is a screenshot of the graph with
those values:

Graph Axis Label Issue

At a glance, the graph looks fine. Not until you take a good look at the values on the X-Axis do
you see that there’s a problem. Spotted it? The values go from 1 to 4, and there’s a bar in between
with no way of identifying if this is assigned to 2, 3, or both!

In order to resolve this, I went into the Axis Properties by clicking into the chart, right
clicking on the X-Axis, and selecting “Axis Properties…” at the very bottom of the list. This
will bring up the Category Axis Properties window, as below:

Graph Axis Properties

From this screen, tick “Scalar Axis” to force the axis to span all values that have data. For
example, if you have a chart with an X-Axis value of 1-5 available, and all of the available data
falls in either 3 or 5, then 3, 4, and 5 will be displayed along the X-Axis, but not 1 and 2.

The result of ticking this box is now the below:

Graph Axis Label Issue - Fixed (0-11)

Much better! I can now see exactly which value each bar is assigned to. However, this has created
another slight issue in that the values 0 and 11 are now displaying, even though the only available
values for X-Axis are 1 – 10. To get around this one, go back into the Axis Properties, but this
time go to the second tab – “Labels”:

Graph Axis Labels Properties

Check “Hide first and last labels along this axis”. This will remove the first and last X-Axis
values from the list, so long as no data is assigned to them. In this example, no data will ever be
assigned to 0 or 11, and will therefore be hidden – giving me exactly what I want:

Graph Axis Label Issue - Fixed

Finding a list of user defined tables within a SQL Database

1 Comment »

Like most companies working with development, we have a production database server and a test database server. The developers do their testing from the test server, and once they’ve finished their development phase and the project is signed off – the database is backed up and restored on the live server.

Occasional, additional development is required after the site is live – which could mean more database tables.

The above situation happened to me today, and one of the developers asked me to compare the two databases tables, and create any that weren’t already on the production server.

I found a quick way of comparing the two servers, rather than sitting there writing out a list of the two servers tables.

That quick way was the below query:

USE database
SELECT *
FROM sys.all_objects
WHERE type = 'U' AND
      is_ms_shipped = 0
ORDER BY name

This will find all of the user defined tables associated with the database its run against.

If you remove AND is_ms_shipped = 0 from the statement, then you’ll notice that a table named “dtproperties” is listed. This is actually a system table used to store information about database diagrams. It comes back as a user defined table due to the way it’s stored internally. Luckily Microsoft have added the is_ms_shipped flag to the table, so that we can easily filter items like this out.

Finding Distinct Values in Excel 2010

2 Comments »

This is something that I’ve just found myself, and I wish I would have known about it a few days when I was asked to extract the distinct values from a column.

Start off by highlighting the range of data that you would like to find the distinct values in, I’ll be using the below simple sample data for the purpose of this post:

Example Data

On the “Data” tab of Excel 2010, find the “Remove Duplicates” button:

Remove Duplicates Button

Click the button, and the Remove Duplicates Dialogue will appear:

Remove Duplicates Dialogue

On the top right of the Remove Duplicates Dialogue box, you’ll see a check box – “My data has headers” – you can toggle this check box depending on whether or not you have headers in your data. I’m going to leave it checked for the purpose of this example. The check box “Column” is the name of the column I’m removing duplicates from, if multiple columns are selected in the first step – they can be managed from here.

Once you’re ready, click OK, and the below message will appear – giving you a few details about what has occurred:

Remove Duplicates Message Box

The result:

Remove Duplicates Result