SQL Blog of Zach Stagers

www.scratchbox.co.uk

Scalar Functions to Return First / Last Day of Month

No Comments »

Last Day of the Month

Below is the syntax used to create an extremely simple scalar function to return the last day of the month for the date passed in.

CREATE FUNCTION dbo.fnLastDayOfMonth (@Date DATE)
RETURNS DATE
AS
BEGIN

SELECT @Date = DATEADD(M, 1, DATEADD(D, -DAY(@Date), @Date))

RETURN @Date

END

The function does take leap years into consideration, as executing the below select against it, does return “2012-02-29″.

SELECT dbo.fnLastDayOfMonth('2012-02-22')

First Day of the Month

Below is the SQL used to create a function which returns the first day of the month for the date passed in.

CREATE FUNCTION dbo.fnFirstDayOfMonth (@Date DATE)
RETURNS DATE
AS
BEGIN

SELECT @Date = DATEADD(D, -DAY(@Date)+1, @Date)

RETURN @Date

END

Using the IN clause

No Comments »

Using the IN clause with a WHERE is a simple way to tidy up a query, making it much more readable, and potentially increasing query performance.

SELECT *
FROM Person
WHERE PersonId IN (1, 2, 3)

The above is equivalent to

SELECT *
FROM PERSON
WHERE PersonId = 1 OR
       PersonId = 2 OR
       PersonId = 3

As you can see, the second query contains a lot more syntax for the same result.

You can also use a Noncorrelated Subquery within the brackets of the IN clause, as below:

SELECT *
FROM Person
WHERE PersonId IN (
                   SELECT DISTINCT
                          PersonId
                   FROM Orders
                  )

The above code would return all Person records, who have a record in the Orders table.

Alternatively, you can use NOT IN to exclude specific records.

SELECT *
FROM Person
WHERE PersonId NOT IN (1, 2, 3)

This would return all records that do not have a PersonId of 1, 2, or 3.

TRY…CATCH Error Handling in SQL

No Comments »

The TRY…CATCH code was introduced in SQL 2005 as an easy way to manage errors. There are several system functions that go along with the TRY…CATCH which can much of the necessary information to resolve any errors that may arise.

I’ll first discuss the how the TRY…CATCH works, and its syntax.

There are two parts to a TRY…CATCH, and if it isn’t already obvious enough they are the TRY, and the CATCH. The TRY just encapsulates your SQL, for example the code within a stored procedure. The CATCH is where things get much more interesting.

First, the SQL contained within the TRY block is executed (The TRY tries to execute the SQL – hence the name). If the TRY fails for any reason, then control is immediately passed to the CATCH block where you can output any relevant information, or do as much damage recovery as possible. Note that only errors with a severity over 10 will push control over to the CATCH, any error with a severity below 10 will just output its usual error message.

The syntax is easy enough, as shown in the example below:

BEGIN TRY
	PRINT 7/0
END TRY
BEGIN CATCH
	INSERT INTO ErrorLog
	(ErrorLine, ErrorMessage, ErrorSeverity, ErrorState, ErrorNumber)
	VALUES(ERROR_LINE(), ERROR_MESSAGE(), ERROR_SEVERITY(), ERROR_STATE(), ERROR_NUMBER())
        PRINT 'An error occured, and has been logged'
END CATCH

Executing the above PRINT statement would normally produce the “Divide by zero error encountered.“, but using the Error functions listed in the CATCH, you can control what you do with the values. In the example, the details of the error are inserted into an error log table, and a message is displayed letting the user know so.

The Error Functions

I’ve used them in an example, but you may be wondering exactly what each of them return. Some of them are more useful than others, and this changes depending on the situation. But I’ll give a brief description of each;

  • ERROR_LINE() – This is one of my favorites, and can be very handy in stored procedures with many lines of code. This function will return the exact line that the error occured on. See “Enabling Line Numbers in SSMS 2008
  • ERROR_MESSAGE() – This provides the text that is normally produced by the error that occured, for example earlier in this post, “Divide by zero error encountered.” would be returned
  • ERROR_SEVERITY() – Returns the severity of the error. Remember that only errors with a severity over 10 will actually pass control to the CATCH. The CATCH will also not be invoked if the error produced causes you to lose your connection with the database.
  • ERROR_STATE() – If the same error number & message is capable of being raised by different causes, this will return an integer value which should help you find more information on the error you’ve recieved and what caused it.
  • ERROR_NUMBER() – Returns the errors number, again for our divide by zero error – “8134″ would be returned here.

There are a two other functions that I’d like to mention, but have not yet given examples of.

The first is XACT_STATE(). This function returns a small integer (-1, 0, or 1), and knowing that they mean allows you to have further control over your queries.

If the XACT_STATE() returns a 0, then there isn’t currently an open transaction.
1 would mean that there is an open transaction, and it’s in a commitable state.
-1 means that there is an open transaction, but it isn’t in a commitable state.

Knowing this allows you to do something like the below in your CATCH if you’re using transactions:

BEGIN CATCH
      IF (XACT_STATE()) = 1
        BEGIN
         COMMIT TRANSACTION
        END

      IF (XACT_STATE()) = -1
        BEGIN
         ROLLBACK TRANSACTION
        END
END CATCH

The next isn’t so much a function, but an option that can be switched on and off – XACT_ABORT.

SET XACT_ABORT ON

When XACT_ABORT is turned on, if a run-time error is raised, the current transaction will be rolled back automatically.

Note that this only occurs for run-time errors, not syntax errors.

Ranking Functions

No Comments »

There are four different system functions within SQL 2008 that allow you to rank your data sets.

First of all I’d like to mention the OVER() clause which is applied to all ranking functions. The OVER() clause essentially tells SQL which way (ascending or descending) rows should be ranked, and if the rankings should be grouped, or partitioned, in any way. SQL keyword “OVER()” has other uses, and the above only applies to its use within a ranking function.

ROW_NUMBER()

The first ranking function I’d like to discuss, and probably the easiest to grasp, is ROW_NUMBER(). This function applies a rank based on where in the result set it is. If it’s second in the result set, rank 2 will be applied.

SELECT Name,
       Age,
       ROW_NUMBER() OVER(ORDER BY Age ASC)
FROM Person

The above example would return a persons name and age, as well as their “rank” within the result set. OVER(ORDER BY Age ASC) is ordering the ranking by age in an ascending order. For example, if the result set consisted of 3 people – one aged 1, one aged 4, and one aged 16. The record with the Age of 1 would have a rank of 1, the record with the age of 4 would have a rank of 2, and the final record with an age of 16 would be rank of 3. If we replaced ASC with DESC in the query, the result set would be the same – but with different ranks. 16 would be rank 1, 4 ranked 2 (as it’s in the middle, it wouldn’t change), and 1 ranked 3.

RANK()

Next we have RANK():

SELECT Name,
       Age,
       RANK() OVER(ORDER BY Age ASC)
FROM Person

If the above query was applied to a result set with two people aged 12, one person aged 14, and one person aged 18 – then both of the people aged 12 would be rank 1, the person aged 14 would be rank 3 (because two people have already been returned, making the next tank available 3), and the person aged 18 would be rank 4. Again, if we changed ASC with DESC then the person aged 18 would be rank 1, 14 rank 2, and the two people age 12 would both be rank 3.

DENSE_RANK()

DENSE_RANK() is very similar to RANK(), I’ll be using the same example from the RANK() explanation to explain DENSE_RANK(). :

SELECT Name,
       Age,
       DENSE_RANK() OVER(ORDER BY Age ASC)
FROM Person

This time, the rankings are worked out in the exact same as RANK(), but instead of skipping a number if there are two people with the same age – it will apply the next value directly after the previous value.

Exactly the same as with RANK(), both people aged 12 would be rank 1, the person aged 14 would be rank 2, and the person aged 18 would be rank 3. Making the ranking values more “dense”.

NTILE()

SELECT Name,
       Age,
       NTILE(2) OVER(ORDER BY Age ASC)
FROM Person

This function is the “odd one out”, if you will, of all the ranking functions. NTILE() is the only ranking function which accepts a parameter. The value passed in must be an integer value of 1 or greater. The value passed in is used to break the result sets into groups, and rank by those groups.

For example, if we passed in a value of 1, and the above query returned 4 results, then all 4 rows would have a rank of 1. If we passed in 2, like in the example code, then the first 2 records (the two with the lowest ages) would be rank 1, and the other two (the older two) would be rank 2.

For the inquisitive types who may be wondering what happens when there isn’t an even number of results – if there are 7 records, and you pass in an even number (such as 2), then the first 4 records would be rank 1, and the last 3 would be rank 2.

PARTITION BY

All of the ranking functions have another option which can be placed in the OVER() clause. This is the PARTITION BY clause, and it’s used to apply ranks within groups, or partitions.

SELECT Name,
       Age,
       ROW_NUMBER() OVER(PARTITION BY Age ORDER BY Age ASC)
FROM Person

Using the above query, imagine that we have a result of three people aged 10, another three people aged 20, and one person aged 25. The three people aged 10 will be ranked from 1 to 3, then the three people aged 20 will again be ranked 1 to 3, the person aged 25 will be ranked 1.

SQL Server Queries – Order of Operations

No Comments »

I’m going to be performing a lot of query optimization this year, and knowing the order in which SQL Server executes queries is going to play a major part in the decisions I make when re-writing slow running queries.

The Order

Below is the exact order in which a query is executed:

  • FROM
  • WHERE
  • GROUP BY
  • HAVING
  • SELECT
  • ORDER BY

I found it interesting that the FROM and WHERE were executed first, mainly because before I found this out – I’d always assumed that SQL worked in the same way that many other programming languages work, in that they execute from top to bottom, but the above does make perfect sense.

You can’t get data until you know where it comes from, hence the FROM clause being executed first.

The data from the tables your query now has access to is then filtered down by the WHERE clause, as it would be inefficient to return everything, and then filter it out.

The GROUP BY and HAVING clauses are then executed, if specified. Again, this makes perfect sense as the HAVING clause is essentially another WHERE, but with the aggregation of a column, and you can’t have a HAVING without a GROUP BY to get that aggregation!

Next the SELECT is executed to bring back all of the filtered and / or aggregated data.

Lastly, the ORDER BY is called.

The fact that the ORDER BY is called after the SELECT, but the GROUP BY isn’t, makes it clear as to why an aliase can be used within the ORDER BY clause, but not the GROUP BY.

Because the FROM is executed first, it’s a good idea to do as much filtering here as possible. Doing this will improve effeciency by having less data returned in the first place to be filtered by the WHERE.

Below is an example of a query filtering in the FROM, and below that is the same query written with the filtering in the WHERE.

SELECT *
FROM Orders O JOIN
     Products P ON P.ProductId = O.ProductId AND
                   O.IsShipped = 1

Is equivalent to:

SELECT *
FROM Orders O JOIN
     Products P ON P.ProductId = O.ProductId
WHERE O.IsShipped = 1