Monday, July 23, 2012

Bug - Rebuilding temp indexes on-the-fly

Bug - Rebuilding temp indexes on-the-fly

Recently I was working on a query that used temporary tables (#this) extensively to shuffle data around in preparation for some large INSERT batches.  I didn't code up the query originally, but was looking for ways to make it more efficient.  After examining the way in which the temp tables were being built and populated, I decided to try adding an index, populating the temp tables then rebuilding the index to make the tables read-efficient.  When the SP got to the stage where these temp tables were SELECTed from to build the INSERT statement, table reads would then be replaced by a clustered index scan.
Here's a simplified example of part of the original code:

 SELECT a.col1, a.col2, b.col1, b.col2, b.col3, c.col1
 INTO #temp1
 FROM tableA a  INNER JOIN tableB b ON a.id = b.id
   INNER JOIN tableC c ON a.id = c.id
 WHERE a.dateField BETWEEN @startDate AND @endDate

 -- returns 500,000 rows.  @startDate and @endDate are passed-in parameters.
 -- There's a few of these builds into #tempX tables going on.


 INSERT INTO finalTable (valA, valB, valC, valD, valE, valF, valG)
  SELECT x.col1, x.col2, x.col3, y.col1, y.col2, z.col1, z.col2
  FROM #temp1 x  INNER JOIN #temp2 y ON x.col2 = y.col2
    LEFT JOIN #temp3 z ON y.col4 = z.col

 -- Example query that's a bit over-complex and ridiculous.
 -- Complicated in real life as the code is couched in dynamic SQL, passing in table names.
 -- And there are multiple JOINs and external dependencies i.e. JOINs on tables in linked servers.


To make a start on optimising this, I threw in some reindexing, since the execution plan showed table scans of epic proportions when doing the INSERT.  I put an index on the column with unique values, to create a clustered index (with PK).  I then created another non-clustered index INCLUDing the columns referenced by the SELECT:

 CREATE CLUSTERED INDEX ix_temp1 ON #temp1 (col1);
 CREATE NON-CLUSTERED INDEX ix_temp1_nc ON #temp1 (col1) INCLUDE (col2, col3);


No problems there.  Examining the execution plan again, the clustered index seek was still there but performance of the query (execution time) was still dire.
I realised I was creating the indexes on empty tables, before they were populated.  So the indexes were as useful as an ashtray on a motorbike:

I put this into the code after the population:

 ALTER INDEX ix_temp1 ON #temp1 REBUILD WITH (ONLINE=OFF);
 ALTER INDEX ix_temp1_nc ON #temp1 REBUILD WITH (ONLINE=OFF);


And here's the interesting bit:

 Msg 608, Level 16, State 1, Procedure sp_MyProcedure, Line 149 No catalog entry found for partition ID 72059596684984320 in database 2. The metadata is inconsistent. Run DBCC  CHECKDB to check for a metadata corruption.
Metadata corruption?  Wtf?  A quick run through of DBCC CHECKDB on the affected databases, and on TEMPDB, showed no corruption whatsoever.


So I looked into it, and here's what I found.  There's a thread on SQLServerCentral that Paul Randal and Gail Shaw both contributed to, discussing this very issue.  It seems that rebuilding indexes on the fly inside stored procedures on the version of SQL Server I'm using - 2005 - cannot be done.  The contributor to the original article posted this as a bug to Microsoft, and Paul Randal confirmed it to be a bug (he and his team wrote most of the indexing functionality in SQL 2005). For those interested, the thread is here -> http://www.sqlservercentral.com/Forums/Topic770808-149-1.aspx

Microsoft have stated that they do not intend to provide a hotfix for this issue, but it will be fixed in 2008 onwards.

The way in which I got around this was to use permanent tables and rebuild the indexes on these each time, instead of using TEMPDB.  The index changes are then written to the (permanent) datafiles rather than the temporary page allocations of TEMPDB. 

Additionally, TEMPDB doesn't get hit as much (TEMPDB SGAM contention is a real bottleneck in my system). 

For more info on indexing, there's a great series of technical articles here -> http://www.sqlservercentral.com/articles/Indexing/68439/


 '

 

Tuesday, July 10, 2012

DELETE vs. TRUNCATE with IDENTITY - A Note of Caution


A quick post about the differences between DELETE and TRUNCATE when dealing with IDENTITY columns, in response to a recent question from a colleague.

Let's suppose we have the following table:

 CREATE TABLE TestIdentityValues (
  uid  INT IDENTITY (1,1) PRIMARY KEY NOT NULL,
  data  VARCHAR(MAX) )

Now we populate the table with some sample data:

 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 1')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 2')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 3')

Now we SELECT the data out of the table.  As you'd expect, the IDENTITY column [uid] has values 1, 2 and 3.

 UID DATA
 --- ----
 1 Test # 1
 2 Test # 2
 3 Test # 3

Let's say we need to remove all the data from the table.  In real life, this could be because the data is a staging table, or tied to a temporal dataset, or is a temporary structure only.

 TRUNCATE TABLE TestIdentityValues;

Now let's re-populate the table with the next three test values. 

 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 4')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 5')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 6')

Here's the result set:

 UID DATA
 --- ----
 1 Test # 4
 2 Test # 5
 3 Test # 6

Notice what's happened?  The IDENTITY column has 'reset' itself.  When the TRUNCATE occurred, the records were removed from last to first.  Hence, with an empty table, the next available IDENTITY value is 1, since the order of removal was 3, 2, 1.

Now let's re-test this theory with DELETE.

 DELETE FROM TestIdentityValues;

The table is now empty. Let's populate it:

 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 7')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 8')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 9')

As the table is empty, we should get UID values 1, 2, 3, right?  Wrong.

 UID DATA
 --- ----
 4 Test # 7
 5 Test # 8
 6 Test # 9

The DELETE has deleted the existing records in-place, and doesn't 'reset' the IDENTITY values.  Instead, new, unused IDENTITY values are assigned.

Be careful when using IDENTITY.  If you are using a staging table, for example, that logs advertisement clicks on your website, or traffic, or some other high-volume statistic, and you use IDENTITY with a DELETE when the data has been processed, you may find yourself running out of values.  IDENTITY is limited to the maximum values of the datatype associated with it - INT or BIGINT.  The maximum value of an INT column is 2.1bn - sounds like a lot, but it's very little in this day and age.  Consider how many page hits Amazon gets in a month, perhaps.

However DELETE isn't always bad.  If you're also using the IDENTITY column as a FK on another table, you'll find that a TRUNCATE will ruin your constraints, if you even get it past the parser. 

Choose your method carefully.  Where possible, IDENTITY columns should not be directly INSERTed into (you can do it with explicit INSERT syntax and SET IDENTITY_INSERT ON).  Ideally, of course, you'd have a real PK and not a surrogate.

Visit my website:  http://www.derekcolley.co.uk/ for SQL Server database consultancy.