Derek Colley: February 2014

Getting Around Strict GROUP BY Restrictions in SQL Server

SQL Server adheres to the ANSI-SQL standard, in that column names in queries with GROUP BY clauses (aggregate) queries must be included in the GROUP BY or in the aggregate clause itself.

However, sometimes you don't want to group by a certain column. Why would that be so? Say, for example, you want to query the msdb.dbo.backupset column for information on your last successful backup for each database, and you also want to return the backup size for that backup, too. Grouping the database name on MAX(backup_date) is fine, but throw in the backup size and you'll end up with a separate row for each database name, distinct backup size and maximum backup date. I'll show you what I mean.

Let's try writing a query to return the database name, backup size and the last backup date, grouped by database name.

SELECT database_name, backup_size, MAX(backup_finish_date)
FROM msdb.dbo.backupset
GROUP BY database_name, backup_size

All very well ... except we have multiple rows per database. So, let's try the next logical step - putting a MAX(backup_size) into the mix:

SELECT database_name, MAX(backup_size), MAX(backup_finish_date)
FROM msdb.dbo.backupset
GROUP BY database_name

Did this work? On the face of it, yes. But let's not be so hasty - let's query the whole dataset on our interesting columns to see if the information presented is, in fact, true:

SELECT database_name, backup_size, backup_finish_date
FROM msdb.dbo.backupset

No! As you can see, the former query with two MAXes returned a row saying that SANDBOX was last backed up on 2014-02-25 20:22:28.000 with a size of 9325877248 bytes. This is not true. The last backup date was 2014-02-25 20:22:28.000 with a backup size of 2727645184 bytes, as seen below. The double MAXes have mangled the result set, displaying incorrect data! Why? Because we simply concatenated both MAX values - in the former query, they are independent, and we need them to be coupled.

Why is this query so difficult to write? This is because T-SQL has a strict adherence to the GROUP BY standards defined in ANSI-SQL unlike some other database engines, such as MySQL. The way to get around this is to query back on the data in a different way - a self-join, but on a slightly altered GROUP BY syntax. This will effectively allow us to JOIN one aggregated query on the data (by backup_finish_date) on another aggregated query (by backup_size), as follows:

SELECT b1.database_name, b2.backup_size, b1.last_backup_date
FROM (
SELECT b.database_name,
MAX(b.backup_finish_date) [last_backup_date]
FROM msdb.dbo.backupset b
GROUP BY b.database_name ) b1
INNER JOIN
(
SELECT b.database_name, b.backup_size,
b.backup_finish_date
FROM msdb.dbo.backupset b ) b2
ON b1.database_name = b2.database_name
AND b1.last_backup_date = b2.backup_finish_date

Why does this work? Basically, the first sub-query contains the last backup date per database. By inner joining on both the database name and the last backup date to the ungrouped result set from msdb.dbo.backupset, we can bring the backup size for that database name and backup date (the maximum per database) into the result set, resulting in a listing of database name, backup size and last backup date (with the backup size correct for the backup date).

This is one way of getting around the GROUP BY restrictions in SQL Server - I'd be keen to hear more. If you have any alternatives, please feel free to leave a comment below.

Update - New Consultancy Launched!

I am pleased to announce the expansion of Optimal Computing into database consultancy - find the site over at http://www.optimalcomputing.co.uk . You can get in touch by e-mailing enquiries@optimalcomputing.co.uk or calling on +44 (0) 161 731 0037. Optimal Computing specialises in data management, with an offering at the moment on SIMS application support for schools and colleges.

More Stuff Published!

The good people over at www.MSSQLTips.com have been kind enough to publish many of my articles over the last couple of years, and the articles have been building up. Find my profile over here --> http://www.mssqltips.com/sqlserverauthor/100/derek-colley/ .

I Went Contracting!

Took the leap in September and I'm contracting now, instead of being a permanent DBA. On my second contract so far and doing well, having fun and earning money. If you know of any contracts in the North West of England, do get in touch - you can find out all about me on my LinkedIn profile which redirects from here -> http://www.derekcolley.co.uk

Get In Touch!
I might be an introverted, slightly aggressive, permanently angry misanthrope in real life but that's just my DBA side showing through :-) No, really, I'm uber-social and welcome contact from anyone in the IT arena - from 'let's be friends' (recruiters, normally) to 'Help! It's all falling down around my ears!'. You can reach me at derek@derekcolley.co.uk or follow me on Twitter: @dcolleySQL.

Derek Colley

Tuesday, February 25, 2014

Thursday, February 20, 2014