Monday, September 10, 2012

SQL Saturday #162 - Cambridge, UK

After a perilous drive across the foggy moors of the Pennines, and a nice, straight run down the Great North Road, I arrived in Cambridge at 07:45 on Saturday morning and - wow, what a crowd!

Being my first SQL Saturday, I was a bit ignorant of the drill, so waded right in with my SpeedPASS admission and raffle tickets, and went and found some familiar faces.

The sponsors were about, doing their bit - Fusion IO had the most notable stall, with a $16,000 PCI-e flash memory card on display and a live demonstration of the speed increase their kit delivers (log flushes measured in GB/sec!).  I'm definitely sold, although at the current high price of Fusion IO kit, my manager might not yet be.  Still, I can see it replacing the SAN, offering long-term durability and many more GB per cm3 than rotational drives can currently manage.

Red Gate were out in force too, with an army of red-clad salespeople with leaflets and live demos of their SQL Monitor and SQL Storage Compress technologies.  I'm not the biggest fan of Red Gate products, finding that some are excellent (SQL Prompt, Data Generator, Search in particular) and some are just not right for me - SQL Backup (2 licenses required to restore to a secondary server - nice to find that out at 2am) and SQL Monitor (too simple, not enough depth or breadth).

SQL Sentry provided the best demo for me - the sheer amount of complex information available from one screen, enabling me to make the correct decisions in a disaster scenario (or simply check the health of my databases at a glance), was astounding.  Their software has features that would take me weeks to code up individually.  Note to Red Gate: I like complexity!  Give me complexity!  The K.I.S.S. principle is NOT for me!

The sessions were great, too.  I started with Neil Hambly's session on Extended Events, which I'd read a little about before but had trouble getting to grips with some of the concepts.  Neil explained it well; unfortunately my core environment is (temporarily) 2005 Standard, but I look forward to applying some of his tips.  Buck Woody followed Neil's session with a well-delivered and uplifting keynote speech, including a brief history of SQL Server (tip: did you know that the silhouette icon in Outlook is modelled on the mugshot of Bill Gates as a teenager?).

Next up for me was Tobiasz Koprowski, an engaging Polish DBA with a great session on disaster recovery.  He spoke from experience, detailing the steps you need to take in an emergency (be prepared, basically), and including tips on the things you would perhaps never normally think about.  How to open a server rack when you don't have the keys.  Who to get out of bed when your server falls over at 2am.  Why hand warmers are important in a server room. 

I followed this with Niko Neugebauer's talk on inheriting a database, aimed at developers.  I was particularly interested in this, since I have also inherited a large number of databases from developers, with little DBA involvement before I joined.  He spoke about improving performance and about checking how tables have been built and maintained, and although geared towards developers the talk was very relevant to my work as a DBA, covering, for example, the importance of keeping statistics updated and of managing index fragmentation.

Straight afterwards was Martin Cairns' session on Partitioning Design for Performance and Maintainability.  Martin gave an overview of partitioning, explained at length how to design partition functions and schemes, and showed how to improve performance with techniques such as partition elimination (similar to the segment elimination used by Denali's COLUMNSTORE index).  He also covered the different types of partitioning (horizontal, vertical and filegroup) and when to use each.  What Martin lacked in engagement he made up for with content, as I scribbled down 3 or 4 pages of useful notes during his talk.

Straight after, I delved into the Fusion IO sponsor session, where I found out the difference between Fusion IO cards and SSDs (there IS a difference!) and just how quickly this new technology works.  

After the sponsor session, I had to stop for lunch - information overload.  The Crowne Plaza had provided some bagged lunch, so I took it outside and got chatting to a few people.  

After lunch, another marathon session.  I went for Hugo Kornelis' session on Making Your Queries Fly with COLUMNSTORE Indexes - this was my first introduction to the 2012 index type and I was extremely impressed.  Hugo had set up working demos, with performance gains demonstrated of 75x the speed of ordinary clustered indexes.  This was a very popular session, with the room hitting maximum occupancy.

Back in to Tobiasz Koprowski's licensing session (of relevance to me, since I recently had to navigate the murky waters of re-licensing under the 2012 per-core model) then onto Phil Quinn's session on XML.  Having had a bad experience with the XQuery features of 2005/08 recently (XML queries blackboxing as 'Remote Query' in the execution plan, leading to unacceptable delays during shredding) I was keen to get his views.  Phil went through 4 or 5 methods of using XML effectively in SQL Server, and referenced the very performance problem I had noticed.  Very helpful session with lots of reference material to take away and a big thank you to him for his time afterwards, patiently answering my XML questions.

After Phil, straight into Mark Broadbent's session (he is the SQL Cambs PASS chapter leader / user group organiser) on READPAST and the true atomicity (as per ACID) of SQL Server.  He went through several demos of where SQL Server does NOT treat transactions atomically (i.e. 'all or nothing') and despite some good-natured heckling from the back (Hugo!) he stepped through an innovative method of processing bucketised data using READPAST, as an alternative to READ COMMITTED / SERIALIZABLE-based locks, to ensure clean data without waits.

After this last session was the prize draw, where some lucky soul won both the OCZ card from Fusion IO AND an Amazon Kindle (fix!) and we heard again from Buck Woody and the organisers.  I left with three new books, a ton of other swag (laptop stickers, stress balls, cups, you name it) and the firm resolve to go again - an excellent experience that I'd recommend to anyone.

Tuesday, September 4, 2012

Auditing Stored Procedure Use in 2005



TL;DR: Generate your own list of extinct stored procedures, or create a frequency map of stored procedure calls.


Another blog post, and a long delay since my last one - my apologies to regular readers (if I have any, of course! :-) ).  If this isn't your first visit, I hope you found value in my previous posts.  Now, I've dreamt up something else.  Today's topic is how to get a list of stored procedures in your database that simply aren't used any more (or, conversely, that are used, and you need a frequency map of how often, by whom, etc.).

Some background: My current shop has around 3000 stored procedures spread over two dozen databases.  I inherited most of these, and they came with no documentation or description of any kind.  Formatting, comments, structured development, even naming conventions were largely absent.  Sadly, nobody knew what ran what – or, more accurately, those who did, left.

So I created the following, which helped me identify those stored procedures in my architecture that were simply no longer used.  Procedures that, to paraphrase John Cleese, were ex-stored procedures.  They were no more.  They were, in fact, extinct.

Follow the steps below, modifying them to suit your environment.  If you have any suggestions for improvement, I'd be glad to hear them - a day where I don't learn something new is a day wasted.


Step 1:  Set up the server-side trace.  Define the trace as follows, or use Profiler to create your own.  You'll need to capture all SP:Starting and RPC:Starting events, with the TextData and DatabaseName columns as a minimum.  I've got some extra columns here so I can mess with the data at a later date.

-- Create a Queue
declare @rc int
declare @TraceID int
declare @maxfilesize bigint
set @maxfilesize = 50000      -- maximum size per trace file, in MB

exec @rc =
sp_trace_create @TraceID output, 0, N'<output_path>\file', @maxfilesize, NULL
if (@rc != 0) goto error

-- Set the events: 11 = RPC:Starting, 42 = SP:Starting
-- Columns: 1 = TextData, 12 = SPID, 14 = StartTime, 35 = DatabaseName, 60 = IsSystem
declare @on bit
set @on = 1
exec sp_trace_setevent @TraceID, 11, 1, @on
exec sp_trace_setevent @TraceID, 11, 14, @on
exec sp_trace_setevent @TraceID, 11, 35, @on
exec sp_trace_setevent @TraceID, 11, 12, @on
exec sp_trace_setevent @TraceID, 11, 60, @on
exec sp_trace_setevent @TraceID, 42, 1, @on
exec sp_trace_setevent @TraceID, 42, 14, @on
exec sp_trace_setevent @TraceID, 42, 35, @on
exec sp_trace_setevent @TraceID, 42, 12, @on
exec sp_trace_setevent @TraceID, 42, 60, @on


-- Set the Filters (none defined here - add sp_trace_setfilter calls if you want to narrow the capture)
declare @intfilter int
declare @bigintfilter bigint

-- display trace id for future references
select TraceID=@TraceID
goto finish

error:
select ErrorCode=@rc

finish:
go
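
At this point the trace exists but isn't running - sp_trace_create leaves it stopped.  If you want to sanity-check the definition before starting it, a quick look at sys.traces (your trace should show with a status of 0) will do:

SELECT id, status, path, max_size FROM sys.traces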





Step 2:  When ready, start the trace.  The @TraceID variable won't survive the GO above, so substitute the TraceID value returned by Step 1.

exec sp_trace_setstatus @traceid = <TraceID from Step 1>, @status = 1




Step 3:  Collect the trace information for as long as you wish.  I recommend 24 hours as a minimum; to get a true picture of SP usage, you would run this trace for around a week, uninterrupted, to capture all scheduled jobs in that time-frame.  Make sure you know about jobs that run on a longer interval, and that the planned run doesn't coincide with anything that would skew it, such as planned downtime.
The size of the trace output is proportional to the SP call load on the database, so plan for enough disk space.  For me, 24 hours of tracing an average of 210 RPC and local SP calls/sec produced a 23GB trace file.
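
If you want a quick way to spot jobs on longer schedules before picking your trace window, a sketch along these lines against msdb can help (the column list is just a starting point - adjust to taste):

-- List enabled SQL Agent jobs and their schedule frequency, so weekly or
-- monthly jobs that a short trace window would miss stand out.
SELECT      j.[name] AS job_name,
            s.[name] AS schedule_name,
            CASE s.freq_type
                  WHEN 1  THEN 'One-off'
                  WHEN 4  THEN 'Daily'
                  WHEN 8  THEN 'Weekly'
                  WHEN 16 THEN 'Monthly'
                  WHEN 32 THEN 'Monthly (relative)'
                  WHEN 64 THEN 'On Agent start'
                  ELSE 'Other'
            END AS frequency
FROM  msdb.dbo.sysjobs j
      INNER JOIN msdb.dbo.sysjobschedules js ON j.job_id = js.job_id
      INNER JOIN msdb.dbo.sysschedules s ON js.schedule_id = s.schedule_id
WHERE j.[enabled] = 1
ORDER BY j.[name]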




Step 4:  You can monitor what's going on by querying sys.traces directly.  However, I've used a bit of arithmetic to make the output more meaningful.  E.g.

DECLARE @starttime DATETIME, @now DATETIME, @delta BIGINT
-- exclude the default trace, which also runs with status = 1
SET @starttime = ( SELECT start_time FROM sys.traces WHERE status = 1 AND is_default = 0 )
SET @now = GETDATE()
SET @delta = DATEDIFF(ss,@starttime,@now)
select      st.start_time,
            @now [time_now],
            st.file_position, st.event_count, @delta [seconds_elapsed],
            ((24 * 60 * 60) - @delta ) [seconds_left],
            CAST((CAST(ROUND((CAST([file_position] AS DECIMAL(18,2)) / CAST(@delta AS DECIMAL(18,2))),2) AS DECIMAL(18,2)) / 1048576 * 60) AS DECIMAL(18,4)) [MB_per_min],
            CAST((CAST(ROUND((CAST([file_position] AS DECIMAL(18,2)) / CAST(@delta AS DECIMAL(18,2))),2) AS DECIMAL(18,2)) / 1048576 * 60 * 60) AS DECIMAL(18,4)) [MB/hr],
            CAST((CAST(event_count AS DECIMAL(18,2)) / CAST(@delta AS DECIMAL(18,2))) AS DECIMAL(18,2)) [avg_events_per_sec],
            CAST((CAST(@delta AS DECIMAL(18,2)) / ((24 * 60 * 60)) * 100) AS DECIMAL(18,3)) [trace_%_complete],
            (CAST((CAST(ROUND((CAST([file_position] AS DECIMAL(18,2)) / CAST(@delta AS DECIMAL(18,2))),2) AS DECIMAL(18,2)) / 1048576 * 60) AS DECIMAL(18,4)) * 60 * 24) [est_total_file_size_MB]
from sys.traces st
where st.status = 1 and st.is_default = 0

This code can be simplified – I think I’ve gotten a bit CAST-happy – so feel free to amend to suit your purpose.  This will output useful columns, such as the # of events so far, the seconds elapsed since starting, the seconds to go to a predefined target, % complete, and a forecast of how much disk space will be required (relies on reasonably linear load pattern).
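
If you'd rather start from something leaner, here's a minimal sketch of the same idea (like the query above, it assumes a 24-hour target and treats file_position as a fair proxy for bytes written so far):

SELECT      st.start_time,
            st.event_count,
            DATEDIFF(ss, st.start_time, GETDATE()) [seconds_elapsed],
            st.event_count * 1.0 / NULLIF(DATEDIFF(ss, st.start_time, GETDATE()), 0) [avg_events_per_sec],
            st.file_position / 1048576.0 [MB_so_far],
            (st.file_position / 1048576.0) / NULLIF(DATEDIFF(ss, st.start_time, GETDATE()), 0) * (24 * 60 * 60) [est_total_file_size_MB]
FROM  sys.traces st
WHERE st.status = 1 AND st.is_default = 0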




Step 5:  When you're finished, stop your trace, then close it to remove the server-side definition.  Note the order: status 0 stops the trace, status 2 closes it and deletes its definition, and a trace must be stopped before it can be closed.  Substitute your own trace ID (mine happened to be 2).

exec sp_trace_setstatus @traceid = 2, @status = 0      -- stop the trace
exec sp_trace_setstatus @traceid = 2, @status = 2      -- close it and remove the definition




Step 6:  Now you need to put your trace information into a table so it can be queried.  This is straightforward.  My work setup is (sadly) 2005 Standard Edition at the moment, but if you're on 2008+ then consider capturing with Extended Events instead, since SQL Trace (and with it fn_trace_gettable) is on the deprecation path.

-- NB: the procedure in Step 7 expects this data as ( database_id INT, text_data VARCHAR(MAX) );
-- if you keep DatabaseName as below, either convert it with DB_ID(DatabaseName) when loading,
-- or adjust the procedure to join on the database name instead.
CREATE TABLE SANDBOX.dbo.spAuditTraceData ( DatabaseName NVARCHAR(MAX), TextData NVARCHAR(MAX) )

INSERT INTO SANDBOX.dbo.spAuditTraceData
      SELECT DatabaseName, TextData
      FROM fn_trace_gettable('F:\del\sp_audit_output.trc',1)

(Note: this takes a while.  One useful method I've found of estimating how long is to use perfmon and measure an average of the Disk Write Bytes/sec counter, then knock off a reasonable amount depending on how busy your server is (hopefully you're doing this on a quiet dev box).  Take your total trace file size, divide it by your average write rate, then divide again by 60: the result is roughly how many minutes you can expect to wait for this step to complete.  For me (23GB), about 40 minutes - yes, I have high I/O contention and a shitty SAN.)
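
As a worked example (the write rate here is just an assumed round figure, roughly consistent with my 40 minutes):

23 GB of trace file ≈ 23,552 MB
assumed sustained write rate ≈ 10 MB/sec
23,552 MB / 10 MB/sec ≈ 2,355 seconds
2,355 / 60 ≈ 39 minutes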




Step 7:  Now you need to create your procedure to compare your captured trace data with the stored procedures in your database, and count up how many times each procedure was executed.  Here’s my take on doing this.  I’ve included the SP header so you know how it works.  Hopefully, you’ll be able to modify and improve upon it, particularly as I’m using two nested CURSORs (for the many-to-many comparisons), which is hardly efficient.  However, like my aging rustbucket of a Ford Escort, it works.

In the code below, I'm using the database 'SANDBOX_DEV' and have tweaked the object names and parameters to suit my environment.  You will need to do the same.

CREATE PROCEDURE dbo.FindDeadSprocs
      ( @truncateExistingData BIT = 1, @database_name VARCHAR(100) )
AS BEGIN

--------------------------------------------------------------------------------
--  SP:     SANDBOX_DEV.dbo.FindDeadSprocs
--  Author: Derek Colley
--  Updated:      04/09/2012 by Derek Colley.
--
--  Purpose:      Will return the count of times a sproc has been called.
--                      Separates out into two output tables.
--                      Useful for identifying extinct procedures in a database.
--
--  Inputs:       Trace data loaded into SANDBOX_DEV.dbo.spAuditTraceData
--                      in format ( [database_id] BIGINT, [text_data] VARCHAR(MAX) )
--
--  Outputs:      SANDBOX_DEV.dbo.deadSPs
--                      SANDBOX_DEV.dbo.aliveSPs (includes [count] column).
--
--    Parameters:       @truncateExistingData (BIT) = 1
--                            @database_name VARCHAR(100) -- no default, mandatory.
--------------------------------------------------------------------------------

SET NOCOUNT ON
-- get a list of all stored procedures in non-system databases into a temp table var
DECLARE @t1_94875234 TABLE (
      [uid] BIGINT IDENTITY(1,1) PRIMARY KEY NOT NULL,
      [database_name] VARCHAR(100), [schema_name] VARCHAR(MAX),         [proc_name] VARCHAR(MAX),
      [concat_string] AS ( [schema_name] + '.' + [proc_name] ),
      [count] BIGINT DEFAULT 0 NULL
      )
DECLARE @sql NVARCHAR(MAX)
SET @sql=N'
IF ''?'' NOT IN (''MASTER'',''MODEL'',''MSDB'',''TEMPDB'')
BEGIN
      SELECT      ''?'' AS [database_name], SCHEMA_NAME(p.[schema_id]) [schema_name],      p.[name], 0
      FROM  ?.sys.procedures p WITH (NOLOCK)
      WHERE p.type_desc = ''SQL_STORED_PROCEDURE''
END
'
INSERT INTO @t1_94875234
      EXEC sp_msforeachdb @command1 = @sql

-- target for trace data
-- NB: if you've already loaded real trace data into this table (Step 6), skip this block
-- and the sample inserts below - otherwise the TRUNCATE will wipe what you just loaded.
IF NOT EXISTS(SELECT [name] FROM SANDBOX_DEV.sys.tables WHERE [name] = 'spAuditTraceData') BEGIN
      CREATE TABLE SANDBOX_DEV.dbo.spAuditTraceData (
            database_id INT, text_data VARCHAR(MAX) ) END
ELSE BEGIN
      TRUNCATE TABLE SANDBOX_DEV.dbo.spAuditTraceData END

-- sample data, remove for real.  I included this for testing the SP.
INSERT INTO SANDBOX_DEV.dbo.spAuditTraceData
      SELECT 5, 'EXEC dbo.BuildPrincipalsMap'
      UNION ALL
      SELECT 5, 'EXEC dbo.showJobStats'
      UNION ALL
      SELECT 5, 'EXEC dbo.showJobStats'
      UNION ALL
      SELECT 5, 'EXEC dbo.RebuildIndexes'
      UNION ALL
      SELECT 6, 'EXEC dbo.sp_index_defrag' -- this shouldn't show in results, wrong DBID
      UNION ALL
      SELECT 5, 'EXEC dbo.whatalotofbollocks' -- nor should this, right DB but not a valid SP

-- compare trace data against procedure definition data, add a notch for every match, dependent on given DBID
DECLARE @currentDBID INT, @textdata VARCHAR(MAX), @procdata VARCHAR(MAX), @doesmatch BIT, @puid BIGINT
SET @currentDBID = ( SELECT d.[database_id] FROM sys.databases d WHERE d.[name] = @database_name )  -- put the current DB in here
DECLARE cur_ForEachCall CURSOR LOCAL FAST_FORWARD
FOR SELECT text_data FROM SANDBOX_DEV.dbo.spAuditTraceData WHERE database_id = @currentDBID
OPEN cur_ForEachCall
FETCH NEXT FROM cur_ForEachCall INTO @textdata

WHILE @@FETCH_STATUS = 0
BEGIN
      DECLARE cur_ForEachProc CURSOR LOCAL FAST_FORWARD
      FOR SELECT [uid], concat_string FROM @t1_94875234 t INNER JOIN sys.databases d ON t.database_name = d.[name]
            WHERE d.[database_id] = @currentDBID
      OPEN cur_ForEachProc
      FETCH NEXT FROM cur_ForEachProc INTO @puid, @procdata
      WHILE @@FETCH_STATUS = 0
      BEGIN
            SET @doesmatch = CASE   WHEN CHARINDEX(@procdata,@textdata,1) IS NOT NULL
                                                AND CHARINDEX(@procdata,@textdata,1) <> 0 THEN 1 ELSE 0 END
            IF    @doesmatch = 1
            BEGIN
                  UPDATE @t1_94875234 SET [count] = [count] + 1 WHERE [uid] = @puid
            END
            FETCH NEXT FROM cur_ForEachProc INTO @puid, @procdata
      END
      CLOSE cur_ForEachProc
      DEALLOCATE cur_ForEachProc
      FETCH NEXT FROM cur_ForEachCall INTO @textdata
END
CLOSE cur_ForEachCall
DEALLOCATE cur_ForEachCall

IF NOT EXISTS(SELECT [name] FROM SANDBOX_DEV.sys.tables WHERE [name] = 'deadSPs') BEGIN
      CREATE TABLE SANDBOX_DEV.dbo.deadSPs ( database_name VARCHAR(100), [schema_name] VARCHAR(100), proc_name VARCHAR(MAX) ) END
ELSE BEGIN
      IF @truncateExistingData = 1 BEGIN
            TRUNCATE TABLE SANDBOX_DEV.dbo.deadSPs END END

INSERT INTO SANDBOX_DEV.dbo.deadSPs
      SELECT database_name, [schema_name], [proc_name] FROM @t1_94875234 WHERE [count] = 0 AND database_name = @database_name

IF NOT EXISTS(SELECT [name] FROM SANDBOX_DEV.sys.tables WHERE [name] = 'aliveSPs') BEGIN
      CREATE TABLE SANDBOX_DEV.dbo.aliveSPs ( database_name VARCHAR(100), [schema_name] VARCHAR(100), proc_name VARCHAR(MAX), [count] BIGINT ) END
ELSE BEGIN
      IF @truncateExistingData = 1 BEGIN
            TRUNCATE TABLE SANDBOX_DEV.dbo.aliveSPs END END

INSERT INTO SANDBOX_DEV.dbo.aliveSPs
      SELECT database_name, [schema_name], [proc_name], [count] FROM @t1_94875234 WHERE [count] <> 0 AND database_name = @database_name

PRINT 'Comparison has finished.  Check dbo.deadSPs and dbo.aliveSPs for results.'

END
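
Creating the procedure doesn't run it, of course.  A call would look something like this (the database name is a placeholder - use the database whose stored procedures you traced):

EXEC dbo.FindDeadSprocs @truncateExistingData = 1, @database_name = '<your_database>'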




Step 8:  Using my sample data (6 ‘calls’ in my trace table) I get results like the following:

‘Comparison has finished.  Check dbo.deadSPs and dbo.aliveSPs for results.’

When we query the tables dbo.deadSPs and dbo.aliveSPs, we get (YMMV):

dbo.deadSPs (screenshot)

dbo.aliveSPs (screenshot)

Now you can see how easy it is to generate a list of the stored procedures that have (or haven't) been used in the last X minutes, hours or days.  You can query the dbo.deadSPs and dbo.aliveSPs tables with your own filters to generate your lists.
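
For instance, a quick frequency map of the busiest procedures from the output above might be as simple as:

SELECT database_name, [schema_name], proc_name, [count]
FROM SANDBOX_DEV.dbo.aliveSPs
ORDER BY [count] DESC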

If you have found this useful / found problems with the scripts / want further clarification / fume at my shameless use of RBAR (delete as appropriate), please feel free to leave a comment below.

Until next time…

Del