Tuesday, July 10, 2012

DELETE vs. TRUNCATE with IDENTITY - A Note of Caution


A quick post about the differences between DELETE and TRUNCATE when dealing with IDENTITY columns, in response to a recent question from a colleague.

Let's suppose we have the following table:

 CREATE TABLE TestIdentityValues (
  uid  INT IDENTITY (1,1) PRIMARY KEY NOT NULL,
  data  VARCHAR(MAX) )

Now we populate the table with some sample data:

 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 1')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 2')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 3')

Now we SELECT the data out of the table.  As you'd expect, the IDENTITY column [uid] has values 1, 2 and 3.

 UID DATA
 --- ----
 1 Test # 1
 2 Test # 2
 3 Test # 3

Let's say we need to remove all the data from the table.  In real life, this could be because the data is a staging table, or tied to a temporal dataset, or is a temporary structure only.

 TRUNCATE TABLE TestIdentityValues;

Now let's re-populate the table with the next three test values. 

 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 4')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 5')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 6')

Here's the result set:

 UID DATA
 --- ----
 1 Test # 4
 2 Test # 5
 3 Test # 6

Notice what's happened?  The IDENTITY column has 'reset' itself.  When the TRUNCATE occurred, the records were removed from last to first.  Hence, with an empty table, the next available IDENTITY value is 1, since the order of removal was 3, 2, 1.

Now let's re-test this theory with DELETE.

 DELETE FROM TestIdentityValues;

The table is now empty. Let's populate it:

 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 7')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 8')
 INSERT INTO TestIdentityValues (data)
 VALUES ('Test # 9')

As the table is empty, we should get UID values 1, 2, 3, right?  Wrong.

 UID DATA
 --- ----
 4 Test # 7
 5 Test # 8
 6 Test # 9

The DELETE has deleted the existing records in-place, and doesn't 'reset' the IDENTITY values.  Instead, new, unused IDENTITY values are assigned.

Be careful when using IDENTITY.  If you are using a staging table, for example, that logs advertisement clicks on your website, or traffic, or some other high-volume statistic, and you use IDENTITY with a DELETE when the data has been processed, you may find yourself running out of values.  IDENTITY is limited to the maximum values of the datatype associated with it - INT or BIGINT.  The maximum value of an INT column is 2.1bn - sounds like a lot, but it's very little in this day and age.  Consider how many page hits Amazon gets in a month, perhaps.

However DELETE isn't always bad.  If you're also using the IDENTITY column as a FK on another table, you'll find that a TRUNCATE will ruin your constraints, if you even get it past the parser. 

Choose your method carefully.  Where possible, IDENTITY columns should not be directly INSERTed into (you can do it with explicit INSERT syntax and SET IDENTITY_INSERT ON).  Ideally, of course, you'd have a real PK and not a surrogate.

Visit my website:  http://www.derekcolley.co.uk/ for SQL Server database consultancy.

 

No comments:

Post a Comment