Bug - Rebuilding temp indexes on-the-fly
Recently I was working on a query that used temporary tables (#this) extensively to shuffle data around in preparation for some large INSERT batches. I didn't code up the query originally, but was looking for ways to make it more efficient. After examining the way in which the temp tables were being built and populated, I decided to try adding an index, populating the temp tables then rebuilding the index to make the tables read-efficient. When the SP got to the stage where these temp tables were SELECTed from to build the INSERT statement, table reads would then be replaced by a clustered index scan.Here's a simplified example of part of the original code:
SELECT a.col1, a.col2, b.col1, b.col2, b.col3, c.col1
INTO #temp1
FROM tableA a INNER JOIN tableB b ON a.id = b.id
INNER JOIN tableC c ON a.id = c.id
WHERE a.dateField BETWEEN @startDate AND @endDate
-- returns 500,000 rows. @startDate and @endDate are passed-in parameters.
-- There's a few of these builds into #tempX tables going on.
INSERT INTO finalTable (valA, valB, valC, valD, valE, valF, valG)
SELECT x.col1, x.col2, x.col3, y.col1, y.col2, z.col1, z.col2
FROM #temp1 x INNER JOIN #temp2 y ON x.col2 = y.col2
LEFT JOIN #temp3 z ON y.col4 = z.col
-- Example query that's a bit over-complex and ridiculous.
-- Complicated in real life as the code is couched in dynamic SQL, passing in table names.
-- And there are multiple JOINs and external dependencies i.e. JOINs on tables in linked servers.
To make a start on optimising this, I threw in some reindexing, since the execution plan showed table scans of epic proportions when doing the INSERT. I put an index on the column with unique values, to create a clustered index (with PK). I then created another non-clustered index INCLUDing the columns referenced by the SELECT:
CREATE CLUSTERED INDEX ix_temp1 ON #temp1 (col1);
CREATE NON-CLUSTERED INDEX ix_temp1_nc ON #temp1 (col1) INCLUDE (col2, col3);
No problems there. Examining the execution plan again, the clustered index seek was still there but performance of the query (execution time) was still dire.
I realised I was creating the indexes on empty tables, before they were populated. So the indexes were as useful as an ashtray on a motorbike:
I put this into the code after the population:
ALTER INDEX ix_temp1 ON #temp1 REBUILD WITH (ONLINE=OFF);
ALTER INDEX ix_temp1_nc ON #temp1 REBUILD WITH (ONLINE=OFF);
And here's the interesting bit:
Msg 608, Level 16, State 1, Procedure sp_MyProcedure, Line 149 No catalog entry found for partition ID 72059596684984320 in database 2. The metadata is inconsistent. Run DBCC CHECKDB to check for a metadata corruption.
Metadata corruption? Wtf? A quick run through of DBCC CHECKDB on the affected databases, and on TEMPDB, showed no corruption whatsoever.
So I looked into it, and here's what I found. There's a thread on SQLServerCentral that Paul Randal and Gail Shaw both contributed to, discussing this very issue. It seems that rebuilding indexes on the fly inside stored procedures on the version of SQL Server I'm using - 2005 - cannot be done. The contributor to the original article posted this as a bug to Microsoft, and Paul Randal confirmed it to be a bug (he and his team wrote most of the indexing functionality in SQL 2005). For those interested, the thread is here -> http://www.sqlservercentral.com/Forums/Topic770808-149-1.aspx
Microsoft have stated that they do not intend to provide a hotfix for this issue, but it will be fixed in 2008 onwards.
The way in which I got around this was to use permanent tables and rebuild the indexes on these each time, instead of using TEMPDB. The index changes are then written to the (permanent) datafiles rather than the temporary page allocations of TEMPDB.
Additionally, TEMPDB doesn't get hit as much (TEMPDB SGAM contention is a real bottleneck in my system).
For more info on indexing, there's a great series of technical articles here -> http://www.sqlservercentral.com/articles/Indexing/68439/
'
No comments:
Post a Comment