Returning a Random Result Set Using Order By

The Order By Clause is generally used to make data sets a little tidier, but a lesser known fact is that it can also be used to return a random set of data!

An Order By will always be placed at the very end of your SQL statement:

SELECT * 
 
FROM People
 
ORDER BY FirstName, SecondName ASC

The above statement will order the result set by FirstName in ascending (Smallest to Largest, A-Z) order, then by SecondName. For example, John Adams would be ahead of John Smith in the result set.

If you would like to order the result set from Largest to Smallest or Z-A, simply add DESC (For Descending) in place of ASC.

You don’t have to specify ASC or DESC at the end, but in not doing so, SQL Server will assume the default behaviour which is ascending.

The ORDER BY can also be used to specify that you would like to return a random result set. This is usually used in conjunction with a TOP Clause:

SELECT TOP 10 * 
 
FROM Person
 
ORDER BY NEWID()

The above will select the top 10 rows from within the Person table, but because we have specified ORDER BY NewId(), we’ll return a different set of 10 rows each time the query is run.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Merge Statements

Merge statements allow you to perform three different actions depending on the results of a table join.

There are three possible boolean (yes or no) checks that can be used to determine which action you’d like to be taken. They Are:

When Matched Then – This will contain the statement that you’d like to execute when there is a match in both tables. Up to two “When Matched Then” clauses can be specified within a single Merge statement using an additional join statement added with an AND. When two “When Matched Then” statements are specified, you must make sure that one performs an Update, and the other a Delete. The second will only ever be executed when the first is not.

When Not Matched By Source – This section will contain the SQL that will be executed when there is a record held in the Target table, but not in the Source table.

When Not Matched By Target – This is the opposite of the above, and will contain the code executed when a record exists in the Source table, but not the Target table.

An example of the syntax may make the above clearer:

MERGE INTO Person P
USING Employees E ON E.EmployeeId = P.PersonId
WHEN MATCHED THEN
   UPDATE
   SET LastUpdated = GETDATE()
WHEN NOT MATCHED BY SOURCE THEN
   DELETE
WHEN NOT MATCHED BY TARGET THEN
   INSERT (Name, Age, EmployementStatus)
   VALUES (E.Name, E.Age, E.EmployementStatus);

Notice the semicolon (;) at the end of the statement. This is essential for the statement to execute.

The above example will update the LastUpdated column in the Person table if there is a match in both the Person and Employees table.

When there isn’t a match in the Employees table (The Source) for a record that exists in the Person table (The Target), the record that’s in the Person table will be deleted.

When there isn’t a match in the Person table (The Target) for a record that exists in the Employees table (The Source), the record that’s in the Employees table will be inserted.

Each of the DML (Data Manipulation Langauge – Delete, Update, Insert (and Merge, actually)) statements can have the Output and Output Into clauses applied to them within the Merge statement.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

SQL LIKE ESCAPE Character

If you need to do a search on a field for a character that’s used by the LIKE syntax, the % for example, then you can specify an ESCAPE Character to allow you search to for the symbol.

To do this, simply specify your escape character, typically a value which you know won’t be returned in the result set, before the character you’re searching for.

For example, if a column containing a percentage was created as a VarChar, and users were inserting the % symbol along with the rate, you might want to find and correct these records before updating the columns data type to a numeric form. To do something like this, you would use a query like the below:

SELECT FirstName,
       Surname,
       TaxBand
 
WHERE TaxBand LIKE '%=%%' ESCAPE '='

This query specifies = as our ESCAPE symbol, therefore the % symbol listed after it can be searched for.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Subqueries – Correlated and Noncorrelated

Subqueries are a great way to easily increase the scope of your result set, and in this post I’m going to explain the difference between Correlated and Noncorrelated Subqueries, as well as provide examples of them both.

Correlated Subqueries

Correlated Subqueries create a link between the outer query and the subquery. This means that the subquery is dependent on the outer query, and won’t execute as a stand alone query.

SELECT *
 
FROM Products AS P1
 
WHERE P1.Price >
              (
               SELECT AVG(P2.Price)
               FROM Products AS P2
               WHERE P2.ProductCategoryId = P1.ProductCategoryId
              )

The above query will return all Products that have a price higher than the average price for each product category. The WHERE clause of the subquery almost acts as if it were a GROUP BY for the outer query in this instance. This is because the subquery and outer query join together on the ProductCategoryId – meaning that the average price is found for each Product Category. Then products with a price greater than the average, for that product category, will be returned.

Noncorrelated Subqueries

Noncorrelated Subqueries are completely independent of the outer query, and will execute as a stand alone query without any issues. These are commonly used within a WHERE clause – much like the example below:

SELECT *
 
FROM Products AS P
 
WHERE P.ProductId IN (SELECT DISTINCT O.ProductId FROM Orders AS O)

The above code will return all rows from the Products table that have atleast one reference in the Orders tables. It does this by returning all of the ProductId’s in the Orders table, to be used in the WHERE clause of the outer query.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

SQL LIKE Clause Syntax

The SQL LIKE clause is a very handy little tool, and it’s a very simple one to use.

A LIKE clause is a wildcard search placed in the WHERE clause, generally used to find a range of data by searching for a character string within a column.

There are several different searchs you can run with a LIKE, and I’ll explain each of them below.

The most common is probably the percent (%) symbol. The % replaces zero or more characters in a string, then searches for matchs of your string.

For example,

SELECT FirstName 
FROM People 
WHERE FirstName LIKE '%ach'

would return ‘Zach’ and anything else that ends in “ach”. The % can be used on either side of the string, or on both sides. You also don’t actually need to specify a string (LIKE ‘%%’), but doing this will return absolutely everything. I can’t see a use for this myself, but it’s there if you need it!

The next wildcard search is the underscore (_). This replaces one character in a character string. For instance “LIKE ‘_at’” would return “At”, “Eat”, “Cat”, “Hat”, “Sat”.. etc.

Square Brackets ([ ]) is a third option. Within the square brackets you specify a range or set of characters which will replace a single character within a string. There are a couple of ways to use the square brackets as below;

LIKE '[abcd]'
LIKE '[a,b,c,d]'
LIKE '[a-d]'

All of the above would return the exact same results. In example one, we are specifying all of the characters we would like to look for in our string. Example two is exactly the same, but has comma’s between the letters to improve readability. Example three is slightly different, instead of specifying the individual characters, we are using a range, which will find anything that equals the two characters you input, and anything inbetween.

The above square bracket examples are actually fairly useless unless you store single letters in a column. To get around this, you can combine the symbols to give you more flexibility in your search.

LIKE ‘[a-e]%’ would return anything that starts with the letters a-e (Air, Beer, Cricket, Dozen, Easy are all examples of what this search would return).

LIKE ‘[0-9][0-9][0-9]‘ can be used to confirm that a value is a 3 digit number.

The last symbol available to the LIKE clause is the caret (^). This can be used in conjuction with square brackets to rule out a set or range of characters.

LIKE ‘[^a-e]%’ will return anything that starts with F-Z.

The NOT key word can also be specified with the LIKE clause to do a wildcard search for things that aren’t similar to your search conditions

NOT LIKE ‘%Stagers%’ would return anything that doesn’t contain the character string “Stagers”.

The NOT keyword also gives you the ability to create a double negative! Again, this is a little pointless, but it’s possible!

NOT LIKE ‘[^a-d]%’ would return anything that starts with an A, B, C, or D.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Finding a list of user defined tables within a SQL Database

Like most companies working with development, we have a production database server and a test database server. The developers do their testing from the test server, and once they’ve finished their development phase and the project is signed off – the database is backed up and restored on the live server.

Occasionally, additional development is required after the site is live (Phase 2, say) – which could mean more database tables to be added.

The above situation happened to me today, and one of the developers asked me to compare the two databases tables, and create any that weren’t already on the production server.

I found a quick way of comparing the two servers, rather than sitting there writing out a list of the two servers tables.

That quick way was the below query:

USE Database1
 
SELECT *
 
FROM sys.all_objects
 
WHERE type = 'U' AND
      is_ms_shipped = 0
 
ORDER BY name

This will find all of the user defined tables associated with the database its run against.

If you remove “AND is_ms_shipped = 0″ from the statement, then you’ll notice that a table named “dtproperties” is listed. This is actually a system table used to store information about database diagrams. It comes back as a user defined table due to the way it’s stored internally. Luckily Microsoft have added the is_ms_shipped flag to the table, so that we can easily filter items like this out.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Working with Optional Parameters in Stored Procedures

Using optional parameters within a stored procedure is a good way to give the caller a little more control over what is being returned.

For example, if you were writing a stored procedure which was going to be used by a reporting system, and within the system, a user will have the ability to narrow down a search by choosing options from various drop down lists. Instead of writing many different stored procedures to handle each of the different drop down lists, you can do it all within one using optional parameters.

An “optional parameter” is a parameter which is set to NULL by default.

CREATE PROCEDURE dbo.GetSearchResults
 
@Franchise	VARCHAR(20) = NULL
@Make		VARCHAR(20) = NULL
@Colour		VARCHAR(20) = NULL
 
AS
 
SELECT Franchise,
       Make,
       Colour
 
FROM Vehicles
 
WHERE (@Franchise IS NULL OR Franchise = @Franchise) AND
      (@Make IS NULL OR Make = @Make) AND
      (@Colour IS NULL OR Colour = @Colour)

In the above example, if a value isn’t passed in for one of the parameters, then the default of NULL is used. This means that the first condition in the WHERE for that parameter (The “@Franchise IS NULL”, for example) is true because the variable is NULL, and execution continues on to the next line.

The above stored procedure can also be called in a few different ways, because of the added flexibility gained by using optional parameters:

1)

EXEC dbo.GetSearchResults

This method simply uses all of the default values for each parameter, and the query results aren’t filtered in the where clause at all

2)

EXEC dbo.GetSearchResults 'Volkswagen', 'Golf'

This execution method uses the values passed in and assigns them to the parameter in the same position that the parameter is declared. By that I mean, @Franchise is declared first in the procedure, so Volkswagen would be assigned to this variable as it’s passed in first. This method is a little dangerous, as if the value intended for @Colour is passed in first, no results will be returned.

3)

EXEC dbo.GetSearchResults @Franchise = 'Volkswagen', @Colour = 'Red'

This method is probably the safest of the bunch, as it explicitly defines the value to the parameter. Any parameters that aren’t defined in the EXEC use the default value set in the procedure.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Finding Duplicates With SQL

I’ve been asked several times over the last few days to help people find duplicates within a particular table, and below is a very basic bit of SQL which will do exactly that:

SELECT Column1,
       COUNT(Column1) [Data Count]
 
FROM Table
 
GROUP BY Column1
 
HAVING COUNT(Column1) > 1

Fairly straight forward, every where that I’ve written “Column1”, put the name of the column that you’d like to check for duplicates. You’ll obviously also need to replace “table” with the name of your table.

The “HAVING” clause just makes sure that only records with more than 1 record are returned, this can be modified as you please, or removed completely.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

SQL Server Queries – Order of Operations

I’m going to be performing a lot of query optimization this year, and knowing the order in which SQL Server executes queries is going to play a major part in the decisions I make when re-writing slow running queries.

The Order

Below is the exact order in which a query is executed:

FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY

I found it interesting that the FROM and WHERE were executed first, mainly because before I found this out – I’d always assumed that SQL worked in the same way that many other programming languages work, in that they execute from top to bottom, but the above does make perfect sense.

You can’t get data until you know where it comes from, hence the FROM clause being executed first.

The data from the tables your query now has access to is then filtered down by the WHERE clause, as it would be inefficient to return everything, and then filter it out.

The GROUP BY and HAVING clauses are then executed, if specified. Again, this makes perfect sense as the HAVING clause is essentially another WHERE, but with the aggregation of a column, and you can’t have a HAVING without a GROUP BY to get that aggregation!

Next the SELECT is executed to bring back all of the filtered and / or aggregated data.

Lastly, the ORDER BY is called.

The fact that the ORDER BY is called after the SELECT, but the GROUP BY isn’t, makes it clear as to why an aliase can be used within the ORDER BY clause, but not the GROUP BY.

Because the FROM is executed first, it’s a good idea to do as much filtering here as possible. Doing this will improve effeciency by having less data returned in the first place to be filtered by the WHERE.

Below is an example of a query filtering in the FROM, and below that is the same query written with the filtering in the WHERE.

SELECT *
 
FROM Orders O 
     JOIN Products P ON P.ProductId = O.ProductId 
          AND O.IsShipped = 1

Is equivalent to:

SELECT *
 
FROM Orders O 
     JOIN Products P ON P.ProductId = O.ProductId
 
WHERE O.IsShipped = 1
Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)

Ranking Functions

There are four different system functions within SQL 2008 that allow you to rank your data sets.

First of all I’d like to mention the OVER() clause which is applied to all ranking functions. The OVER() clause essentially tells SQL which way (ascending or descending) rows should be ranked, and if the rankings should be grouped, or partitioned, in any way. SQL keyword “OVER()” has other uses, and the above only applies to its use within a ranking function.

ROW_NUMBER()

The first ranking function I’d like to discuss, and probably the easiest to grasp, is ROW_NUMBER(). This function applies a rank based on where in the result set the result appears. If it’s second in the result set, rank 2 will be applied.

SELECT Name,
       Age,
       ROW_NUMBER() OVER(ORDER BY Age ASC)
 
FROM Person

The above example would return a persons name and age, as well as their “rank” within the result set. OVER(ORDER BY Age ASC) is ordering the ranking by age in an ascending order. For example, if the result set consisted of 3 people – one aged 1, one aged 4, and one aged 16. The record with the Age of 1 would have a rank of 1, the record with the age of 4 would have a rank of 2, and the final record with an age of 16 would have a rank of 3. If we replaced ASC with DESC in the query, the result set would be the same – but with different ranks. 16 would be rank 1, 4 ranked 2 (as it’s in the middle, it wouldn’t change), and 1 ranked 3.

RANK()

Next we have RANK():

SELECT Name,
       Age,
       RANK() OVER(ORDER BY Age ASC)
 
FROM Person

If the above query was applied to a result set with two people aged 12, one person aged 14, and one person aged 18 – then both of the people aged 12 would be rank 1, the person aged 14 would be rank 3 (because two people have already been returned, making the next tank available 3), and the person aged 18 would be rank 4. Again, if we changed ASC with DESC then the person aged 18 would be rank 1, 14 rank 2, and the two people age 12 would both be rank 3.

DENSE_RANK()

DENSE_RANK() is very similar to RANK(), I’ll be using the same example from the RANK() explanation to explain DENSE_RANK(). :

SELECT Name,
       Age,
       DENSE_RANK() OVER(ORDER BY Age ASC)
 
FROM Person

This time, the rankings are worked out in the exact same way as with RANK(), but instead of skipping a number if there are two people with the same age – it will apply the next value directly after the previous value.

Both people aged 12 would be rank 1, the person aged 14 would be rank 2, and the person aged 18 would be rank 3 in the case. Making the ranking values more “dense”.

NTILE()

SELECT Name,
       Age,
       NTILE(2) OVER(ORDER BY Age ASC)
 
FROM Person

This function is the “odd one out”, if you will, of all the ranking functions. NTILE() is the only ranking function which accepts a parameter. The value passed in must be an integer value of 1 or greater. The value passed in is used to break the result sets into groups, and rank by those groups.

For example, if we passed in a value of 1, and the above query returned 4 results, then all 4 rows would have a rank of 1. If we passed in 2, like in the example code, then the first 2 records (the two with the lowest ages) would be rank 1, and the other two (the older two) would be rank 2.

For the inquisitive types who may be wondering what happens when there isn’t an even number of results – if there are 7 records, and you pass in an even number (such as 2), then the first 4 records would be rank 1, and the last 3 would be rank 2.

PARTITION BY

All of the ranking functions have another option which can be placed in the OVER() clause. This is the PARTITION BY clause, and it’s used to apply ranks within groups, or partitions.

SELECT Name,
       Age,
       ROW_NUMBER() OVER(PARTITION BY Age ORDER BY Age ASC)
 
FROM Person

Using the above query, imagine that we have a result of three people aged 10, another three people aged 20, and one person aged 25. The three people aged 10 will be ranked from 1 to 3, then the three people aged 20 will again be ranked 1 to 3, the person aged 25 will be ranked 1. This is because they have been separated into their own ranking groups because they have the same age.

Digg This
Reddit This
Stumble Now!
Buzz This
Vote on DZone
Share on Facebook
Bookmark this on Delicious
Kick It on DotNetKicks.com
Shout it
Share on LinkedIn
Bookmark this on Technorati
Post on Twitter
Google Buzz (aka. Google Reader)