Thursday, October 30, 2014

Notes on Nested Transactions

Last week I was approached by one of our developers with a troublesome piece of code.  Essentially, the T-SQL he was attempting to run was inside a transaction, using a SELECT ... INTO clause to put the result set into a target table.  Unfortunately, it wasn't working as anticipated - he could open the transaction and run the code, but when he checked the row count he found 0 rows had been inserted into the table before he committed - and he was curious why.

My immediate suspicion was that because the transaction wasn't committed, the rows weren't visible yet, so an alternative method of getting the row count would be to print the value of the @@ROWCOUNT system function at the appropriate time.  However, in the course of testing this, I made a bit of a mistake - I decided to try a nested transaction, to get a better understanding of the problem.

So here's some example code of a straightforward use of the INTO clause.
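Something like this, using sys.objects as a convenient source - the table name is illustrative, not the developer's actual code:

```sql
-- Illustrative stand-in for the original code: SELECT ... INTO creates
-- the target table on the fly and populates it from the result set.
IF OBJECT_ID('dbo.tmpRows') IS NOT NULL DROP TABLE dbo.tmpRows;

SELECT TOP 10 name, object_id
INTO dbo.tmpRows
FROM sys.objects;

PRINT @@ROWCOUNT;   -- 10 rows inserted
```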

Working as expected.  So now let's open up a transaction and do the same.  
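The same again, this time inside an explicit transaction (names illustrative, as before):

```sql
-- Same SELECT ... INTO, but inside an open transaction.
IF OBJECT_ID('dbo.tmpRows') IS NOT NULL DROP TABLE dbo.tmpRows;

BEGIN TRAN a;

SELECT TOP 10 name, object_id
INTO dbo.tmpRows
FROM sys.objects;

PRINT @@ROWCOUNT;   -- reports the rows inserted so far
-- note: no COMMIT yet - the transaction is deliberately left open
```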

We have ten rows reported.  Let's query the table and see if the rows actually exist.

Querying the table returns the rows.  Yet the transaction remains open.  This behaviour is not what I was seeing from the developer's code, so I decided to play with it a little and see if we could force this expected behaviour by using a nested transaction - that is, to introduce ANOTHER transaction that deals explicitly with this insert, then leave the outer transaction open so all other behaviours could be rolled back.  So here's what I did.
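In sketch form (again, table names are my own illustration):

```sql
IF OBJECT_ID('dbo.tmpRows') IS NOT NULL DROP TABLE dbo.tmpRows;

BEGIN TRAN a;
    -- some other stuff here
    BEGIN TRAN b;

    SELECT TOP 10 name, object_id
    INTO dbo.tmpRows
    FROM sys.objects;

    COMMIT TRAN b;   -- @@TRANCOUNT drops from 2 to 1
-- outer transaction a deliberately left open
```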

So far, so good.  The COMMIT was hit, the inner transaction committed, and using DBCC OPENTRAN confirms this (not shown).  So now I needed to test whether this behaviour would hold if the ROLLBACK was hit instead of the COMMIT.  So I modified the code as follows:
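With the COMMIT swapped for a ROLLBACK (still illustrative):

```sql
IF OBJECT_ID('dbo.tmpRows') IS NOT NULL DROP TABLE dbo.tmpRows;

BEGIN TRAN a;
    -- some other stuff here
    BEGIN TRAN b;

    SELECT TOP 10 name, object_id
    INTO dbo.tmpRows
    FROM sys.objects;

    ROLLBACK TRAN;   -- intended to roll back only b...

SELECT @@TRANCOUNT;  -- returns 0 - BOTH transactions have gone
```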

This has worked, right?  The inner transaction would have rolled back, leaving one open transaction?  Right?  Wrong.

In fact BOTH transactions were rolled back.  It didn't selectively roll back one transaction, even though the COMMIT only commits one transaction.  And here's the lesson.  When using ROLLBACK, all transactions are rolled back.  This is actually documented in BOL, and I kicked myself for not checking before messing with it.  Here's the quote from BOL:

"When nesting transactions, this [ROLLBACK] statement rolls back all inner transactions to the outermost BEGIN TRANSACTION statement." 

Here's an interesting thing.  I could have attempted to avoid this by using a transaction name, e.g. BEGIN TRAN b and ROLLBACK TRAN b.  It would have errored, though - it appears (through playing with this) that transaction b is subsumed by transaction a - @@TRANCOUNT is 2 as expected, but I cannot roll back b, and DBCC OPENTRAN only shows the oldest active transaction, a:
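A sketch of what that looks like (the error text below is what I saw in testing; treat the exact wording as version-dependent):

```sql
BEGIN TRAN a;
    BEGIN TRAN b;

SELECT @@TRANCOUNT;   -- 2, as expected

ROLLBACK TRAN b;
-- Msg 6401, Level 16, State 1
-- Cannot roll back b. No transaction or savepoint of that name was found.

DBCC OPENTRAN;        -- reports only the oldest active transaction, a
```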

So why was this such a bad thing?  Well, in the code above, -- some other stuff here was actually a couple of DELETE statements that weren't supposed to be run without an explicit COMMIT from the developer.  Because the nested transaction did NOT commit and a ROLLBACK was issued, the rollback undid BOTH open transactions, and the batch then proceeded as normal and executed the DELETE statements outside of any transaction context - leaving us no way of getting the data back, bar recovery from a backup.

Luckily this wasn't a disaster in this case as the data wasn't of huge importance.  But it was a lesson for me on using nested transactions.  Many other sources on the web call them 'evil' and 'not to be trusted' - and now I know why.

Wednesday, October 15, 2014

The Symbiosis of Managers and Engineers

I was struck by a blog entry recently.  A senior executive from a company in the same field as a recent client of mine (am I being suitably anonymous yet?) wrote about his recent experiences trying to find more information on becoming an effective product manager.  Having first self-identified his deficiencies in this area, he was frustrated to find, after Googling the term, visiting a website about becoming a better product manager and downloading the recommended e-book, that the text in question was around 200 pages long with over 40 chapters.  In the remaining couple of paragraphs, he vented his disgust that there wasn't a five-minute summary, a short synopsis (beyond the blurb), or a summary chapter which he could read to distil the essence, perhaps returning to the finer detail of the chapters at a later date, as and when required.  He wrote that he wasn't 'the kind of person' to whom that format would appeal, and went further in slating the publishers for failing to consider the end-user (him) - cleverly relating this back to the very principles of good product management that the publishers had failed to meet.

Now I got a little suspicious at this point - I tend to now, having gotten that bit older.  I'm a lot more suspicious of unsubstantiated claims.  So I tracked through, found the link, found the book.  In three short clicks I found the table of contents.  And the final chapter in the contents?  A summary of the key points from the entire book.  Three pages.  And it took me under a minute to find.  Why, oh why, I opined, could this manager not take the trouble to read just SOME of the detail?  He would have discovered, had he scrolled to the end of the contents pages, this summary chapter.  Alternatively, from Googling the title and author I was able to find detailed Scribd notes with bullet-point synopses of each chapter, and customer reviews on Amazon gave subjective opinions from readers who had already bought the book.  None of what I did was very technical, just a case of looking for details.  As you might have gathered from the evident negative bias I'm already placing on this account, I'm a technician, or an engineer, and this kind of attention-deficit behaviour I find incomprehensible.  Alas, this lack of focus on detail is symptomatic of the management class, the big-picture people.

Now before you think, 'here we go, another rant against managers', I'd like to explain why I'm writing this -  I want to lay out some of the fundamental differences between the management and engineering camps as I see them, and provide an argument about why co-operation is vital between these two warring factions.  Feel free to substitute your own terms as you read - manager can be replaced with leader or non-technician, and the terms engineer and technician are used interchangeably anyway.  I'm interested in exploring my own thoughts on the dynamic between these two groups in a semi-structured way, yet I won't apologise for any tangents or rambling; although I'll try and keep the grammatical errors to a minimum. If you're still reading, hopefully you'll come along to the end.  

Sometimes, the worlds of managers and engineers seem miles apart.  Indeed, if you'll permit my generalisations, almost every trait varies between the two groups.  Take workplace fashion as a surface example: managers dressed for business in smart shirts, ties and trousers, and engineers (or technicians, if you prefer the term) typically dressed for an evening in front of the Xbox.  It's not just physical characteristics that vary - the manager tends to be an extrovert, comfortable in front of large audiences, excited and energetic in the face of crisis, and the driving force behind many a team.  In contrast, the typical engineer will be quiet, introverted, even withdrawn; some may say sullen, opinionated, comfortable in their own intellect but a poor team player and a worse leader.

Now before I get a hundred comments flaming me, accusing me of making crass generalisations, let me say that I'm working with just the stereotypes of the manager and engineer here - I fully recognise that each type has edge cases, and the condition of 'manager' or 'engineer' actually encompasses two broad spectra, on which can be observed many of the characteristics of people.  However for the sake of argument (and because you all probably know at least one manager and one engineer that conform to my definitions) let's accept these definitions and roll on. 

So let's look at the differences in time management between the manager and the engineer.  The manager fills their day rushing from meeting to meeting, with occasional stops at their desk to read their emails and dash off brief replies.  In transit, they stride along broken-necked, staring at their iThings.  They may carry around pieces of paper with fragments of project plans, or keep a notebook with minutes of meetings.  These meetings can happen in dedicated rooms, at desks, in corridors or on the phone.  These are people that 'do lunch'.  To an outsider observing management behaviour, it may seem that no real work is being done, that the manager is simply an outmoded and inefficient paradigm left over from the days of Filofaxes, Rolodexes and dinosaurs, rushing around talking to people about nothing at all, when everyone knows the REAL work these days is in technical roles, don't they?  Don't they?

Let's try and describe the job of a manager.  Managers have a very specific role, and the clue's in the title.  They manage (well, duh).  So what does this mean?  They communicate, empathise, associate, organise, motivate, prioritise.  They drive others, assess others, recruit others and sometimes fire others.  These are qualitative verbs, 'doing' words that don't immediately conjure tangible results.  What I mean by this is that organisation, communication, understanding - these don't seem 'real' in many ways, they don't present you with a finished product, a piece of code, a software module, a completed project - they simply seem too intangible to matter.  What IS it that managers actually DO?  Why do they have to spend the day communicating with others instead of producing quantitative, measurable output?  Compare these verbs with some engineering verbs.  Program.  Test.  Build.  Measure.  Analyse.  Maintain.  Diagnose.  The engineering verbs immediately suggest measurable, tangible, quantifiable output.  

But this isn't entirely fair.  This argument implies that the work of managers is somehow worth less than the work of engineers.  That the manager is there to make the bustle and the engineer is there in the background to deliver the product.  But counter to this line of argument, there is arguably a case that without the business bullshit, there'd be no work for engineers at all.

The skills of, for example, a project manager must include the ability to communicate complex ideas to different groups with different skill sets.  The PM must learn to 'talk tech' with the developers, the DBAs, the architects.  They need to understand the business data too, able to converse with the analysts and understand their needs.  The successful project manager must be able to take a thousand pieces of information and coalesce the ideas into meaningful, aggregated communications for different audiences, from the boardroom to the development team.  Can you imagine the terror of a new manager standing in front of his new technical team, their average IQ 148, with 3 PhDs in the room, introducing himself and subliminally trying to convince them how his addition to the team is a net benefit?  Then switching focus to the boardroom and doing it all over again?  Not many engineers could deal with the stress.  Can any management-resenting developer or engineer honestly claim to possess these skills?  

Indeed, I'll go further and argue that the stereotypical engineer is a self-confessed sociopath; anti-social and misanthropic, self-absorbed and arrogant.  The typical engineer isn't remotely equipped to use qualitative skills.  If you're a non-technician, stand up right now, find your nearest software developer and ask them to explain why TRUE isn't equal to NULL in simple terms.  Watch them squirm as they try and fail to find non-technical terms, to analogise, to explain.  They likely know the answer.  But they may laugh, or stutter, or struggle with embarrassment.  This works especially well if you're female and attractive.  Now repeat the experiment, but find a project manager and ask them to explain S.M.A.R.T. objectives.  They'll likely deliver a comprehensible, smooth, balanced answer and engage you in conversation about it afterwards.  Now do you see the difference between the two camps, and the benefits that management can bring to facilitating communication?  Given the right preparation, a non-technical manager is also capable of delivering complex TECHNICAL ideas too,  thus acting as an effective translation medium between the board and the shop floor. All he needs are the right facts.
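(For the record, the answer they're squirming over is three-valued logic: a comparison involving NULL yields UNKNOWN, not TRUE or FALSE.  A one-liner demonstrates it:)

```sql
-- NULL means 'unknown', so even NULL = NULL does not evaluate to TRUE
-- (assuming the default SET ANSI_NULLS ON behaviour):
SELECT CASE WHEN NULL = NULL THEN 'TRUE' ELSE 'not TRUE' END;
-- returns 'not TRUE' - the comparison yields UNKNOWN, which fails the WHEN test
```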

Let's lighten the mood.  Here's one of my favourite quotes from the film Armageddon.  Billy Bob Thornton plays the hardcore military leader looking for answers from his geeky technical team on the approaching asteroid.  The technician is bearded, pale, unhealthily plump and terrified.  He is summoned to the war room and everyone's looking at him.  The dialogue goes something like this:

<General> So give me a summary.
<Technician> <splutters - is shaking> Uh.. well... (panics) ... We've been.. uh.. looking at... uh...
<General> <purposefully, not unkind> Okay, I need someone who's had a little less caffeine this morning.

This is flippant, I know, but in this fictional example, the technician crumbles under pressure - lots of valuable technical information crammed into his head but without the ability to summarise and express it when it matters.  Billy Bob is playing the leader, the manager, the organiser.  There's nothing he personally, practically, using his hands, can do about the asteroid except marshal and drive the team that WILL do something about it, but he's confident in his actions and clear on his objectives.  And conversely, without his guidance, his orders, the drilling team who do the work would never get the chance to go to space, to get up there and sort out the problem in the first place as they don't have the necessary qualitative skills to arrange it.  Look at the misfits sent to space in Armageddon - of one, 'our toxicology results revealed ketamine'.  Co-operation between technicians and managers has to be a symbiotic relationship to work.

Without management providing other qualities such as the goals, the motivation, the team spirit and the drive, technicians like to think they would be coding to save the world, working on cutting-edge projects and developing new and exciting technologies, creating the new Facebook or inventing the hoverboard.  In reality, engineers would sit around on Reddit all day, or head home for an afternoon's nap.  Devoid of the ability to work in larger teams with disparate groups of people, devoid of the ability to plan for an event longer than the next guild raid, devoid of the ability to organise even their own wardrobe, let alone a complex juggernaut of a critical business project, technicians quickly lose business value.  Without external motivation, an engineer quickly becomes bored, restless, even depressed.  Anecdotally, I've seen a lot more developers quit their jobs than managers.  Managers tend to stick it out, fix what's broken.  Engineers will cut and run when they reach some threshold of alienation from the business, from the environment around them.  I know, I've been one of them myself.  Because we lack longer-term focus, and can get frustrated by complex social situations (such as endless, endless meetings), we're more inclined to rage-quit, to get our coats on, shout fuck it and go home to bed.  Managers have a longer-term focus, and are better at juggling these longer-term priorities and goals.

Both camps have a lot to learn about each other in order to maximise the benefit and minimise the cost of association.  Management must realise that engineers normally enjoy their job, often to the point of obsession.  They enjoy wandering into work in an Atari t-shirt and trainers clutching a coffee-stained mug.  They enjoy the complexity of long, difficult problems to solve.  They will often work 12-hour days, then go home and spend the evening behind the computer.  They get job satisfaction from untangling knotty problems, rewriting software, designing hardware, creating TANGIBLES.  There's nothing less appealing for an engineer than having the impression that his whole day has been for nought.  And if you as a manager strive to create the ideal conditions for engineers to work (with yourself as a supporting actor and a large quantity of free coffee available), you'll get far more productivity, lower staff turnover and a higher level of morale from these teams than otherwise.

And us technicians, in turn, must realise that the job of management isn't to code up our data import module, nor advise us on code reuse or answer whether a function or a procedure is preferred.  Their job is to guide, provide an escalation point, help and support us into achieving the goals that we've been assigned.  It's to translate instructions between different groups, facilitate open communication, observe business procedures, obtain approvals and a hundred other intangibles that assist us in doing our jobs.

Tuesday, July 22, 2014

Back in the game...

Well, that was a long hiatus.  In the last 5 months I've completed 3 short-term DBA contracts and been everywhere from Chester to Manchester, Stoke-on-Trent to Coventry, Trafford to Milton Keynes.  What with my busy day-to-day role and being Dad to my three Lords of Misrule - my kids - I haven't found time to actually sit down and write.  So here I am, in a charming(!) hotel room in a town called Binley, finally finding time to re-acquaint myself with my keyboard.

Luckily, my American employers over at have been very understanding about my complete lack of output for the last 6 months.  I'm making amends: I've just sent over a fresh article on using OLE automation extended stored procedures in SQL Server - procedures which allow you to access OLE objects such as FileSystemObject, enabling you to call methods and get properties from these objects back into SQL Server.  A little easier than using PowerShell, and hopefully of use to some data hacks out there.  Watch this space.
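As a taster - and this is a sketch of the technique rather than the article itself - the OLE automation procedures look like this (assuming the 'Ole Automation Procedures' configuration option is enabled, and substituting your own file path):

```sql
-- Ask FileSystemObject whether a file exists, from inside T-SQL.
-- Prerequisite: sp_configure 'Ole Automation Procedures', 1; RECONFIGURE;
DECLARE @fso INT, @hr INT, @exists INT;

-- Create the COM object and capture its token
EXEC @hr = sp_OACreate 'Scripting.FileSystemObject', @fso OUTPUT;

-- Call a method on the object; the return value comes back as an OUTPUT
EXEC @hr = sp_OAMethod @fso, 'FileExists', @exists OUTPUT, 'C:\temp\test.txt';

SELECT @exists AS [FileExists];   -- 1 if the file exists, 0 otherwise

-- Always destroy the object to release the COM reference
EXEC @hr = sp_OADestroy @fso;
```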

What else this week?  Once again, I find myself supporting unsupported SQL Server installations - my primary database servers this time are 2008, which isn't too bad, but there are a few linked servers including SQL Server 2000.  I've just unlearned how to use DATABASEPROPERTYEX, and I've no wish to go rooting through archived TechNet articles again, but that's the way it goes sometimes.

I was shocked to find out Monsters Inc. is 13 years old!  The actress who played Boo is now approaching 15, a far cry from the 2-year-old who had to be chased around the set with a mic as she wouldn't stand in one place long enough to say her lines.  This film is more than three times older than my youngest child ... I can feel the grey hairs growing already.  I'll soon be buying one of those horrible grey cardigans and perfecting the comb-over, then there's no turning back.

I was sad that I couldn't attend SQLBits XII this year, despite registering to attend months ago.  What with the timing of a new contract and my recent wedding, it simply wasn't possible to go.  I'm hoping to catch up with some live sessions over on the website and see if I can get hold of last year's, too.

Recently I read a great book by Isaiah Hankel PhD, called 'Black Hole Focus'.  I'm a sucker for self-help books, and recognise my weakness, but this truly is a great volume.  One key lesson I took from it was how to motivate yourself - Dr. Hankel suggests creating a motivation board, a board that you hang on your wall with your goals in a brainstorm around your name, or the words 'My Goals'.  A little cheesy, yes, but I'm giving it a try - I got the Pritt-Stick and scissors out on Sunday night and created my first collage since art lessons at school.  And ... it works.  I've hung it in front of my office desk (at home) where I'm now spending three days a week.  When you're looking at it every day, it really helps you to focus, and I've got a clear plan for the next few months.

Next up, identifying those skills which are going rusty and getting a plan together to toughen up my training in them.  I'm thinking data science, particularly with respect to statistics and using R.  I have a strong suspicion the new 'BI', the new 'Big Data' is going to be 'Data Science', and analysts who can interpret data, analyse it and draw conclusions (MI, in other words, and exactly what I'm lined up to do for the next three months) are going to be in high demand.

Tuesday, February 25, 2014

Getting Around Strict GROUP BY Restrictions in SQL Server

SQL Server adheres to the ANSI SQL standard in that, in queries with a GROUP BY clause (aggregate queries), every column in the select list must appear either in the GROUP BY clause or inside an aggregate function.

However, sometimes you don't want to group by a certain column.  Why would that be so?  Say, for example, you want to query the msdb.dbo.backupset table for information on the last successful backup of each database, and you also want to return the size of that backup.  Grouping by database name and taking MAX(backup_finish_date) is fine, but throw in the backup size and you'll end up with a separate row for each distinct combination of database name and backup size, each with its maximum backup date.  I'll show you what I mean.

Let's try writing a query to return the database name, backup size and the last backup date, grouped by database name.

SELECT   database_name, backup_size, MAX(backup_finish_date) 
FROM   msdb.dbo.backupset 
GROUP BY  database_name, backup_size

All very well ... except we have multiple rows per database.  So, let's try the next logical step - putting a MAX(backup_size) into the mix:

SELECT   database_name, MAX(backup_size), MAX(backup_finish_date) 
FROM   msdb.dbo.backupset 
GROUP BY  database_name

Did this work?  On the face of it, yes.  But let's not be so hasty - let's query the whole dataset on our interesting columns to see if the information presented is, in fact, true:

SELECT database_name, backup_size, backup_finish_date 
FROM msdb.dbo.backupset 

No!  As you can see, the former query with two MAXes returned a row saying that SANDBOX was last backed up on 2014-02-25 20:22:28.000 with a size of 9325877248 bytes.  This is not true.  The last backup date was 2014-02-25 20:22:28.000 with a backup size of 2727645184 bytes, as seen below.  The double MAXes have mangled the result set, displaying incorrect data!  Why?  Because the two MAX values are computed independently - MAX(backup_size) returns the largest backup ever taken for the database, not the size of the latest backup - and we need the two values to be coupled.

Why is this query so difficult to write?  Because T-SQL adheres strictly to the GROUP BY rules defined in ANSI SQL, unlike some other database engines such as MySQL.  The way around this is to query the data in a different way: join an aggregated subquery (giving the latest backup_finish_date per database) back to the ungrouped data, so we can pick up the backup_size belonging to that date, as follows:

SELECT b1.database_name, b2.backup_size, b1.last_backup_date
FROM (  SELECT b.database_name, 
               MAX(b.backup_finish_date) [last_backup_date]
        FROM msdb.dbo.backupset b
        GROUP BY b.database_name ) b1 
INNER JOIN (
        SELECT b.database_name, b.backup_size, b.backup_finish_date
        FROM msdb.dbo.backupset b ) b2
ON b1.database_name = b2.database_name 
AND b1.last_backup_date = b2.backup_finish_date 

Why does this work?  Basically, the first sub-query contains the last backup date per database.  By inner joining on both the database name and the last backup date to the ungrouped result set from msdb.dbo.backupset, we can bring the backup size for that database name and backup date (the maximum per database) into the result set, resulting in a listing of database name, backup size and last backup date (with the backup size correct for the backup date).

This is one way of getting around the GROUP BY restrictions in SQL Server - I'd be keen to hear more.  If you have any alternatives, please feel free to leave a comment below.
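For what it's worth, here's one alternative of my own: a ROW_NUMBER() windowed query (SQL Server 2005 and later), which keeps size and date coupled by picking the entire latest row per database.

```sql
-- Number the backups per database, newest first, then keep row 1 of each.
SELECT database_name, backup_size, backup_finish_date
FROM (
    SELECT  database_name, backup_size, backup_finish_date,
            ROW_NUMBER() OVER ( PARTITION BY database_name
                                ORDER BY backup_finish_date DESC ) AS rn
    FROM    msdb.dbo.backupset
) x
WHERE rn = 1;
```

Because the whole row is selected together, the backup size always belongs to the backup date shown.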

Thursday, February 20, 2014

Update - New Consultancy Launched!

I am pleased to announce the expansion of Optimal Computing into database consultancy - find the site over at .  You can get in touch by e-mailing or calling on +44 (0) 161 731 0037.  Optimal Computing specialises in data management, with an offering at the moment on SIMS application support for schools and colleges. 

More Stuff Published!

The good people over at have been kind enough to publish many of my articles over the last couple of years, and the articles have been building up.  Find my profile over here --> .

I Went Contracting!

Took the leap in September and I'm contracting now, instead of being a permanent DBA.  On my second contract so far and doing well, having fun and earning money.  If you know of any contracts in the North West of England, do get in touch - you can find out all about me on my LinkedIn profile which redirects from here ->

Get In Touch!
I might be an introverted, slightly aggressive, permanently angry misanthrope in real life but that's just my DBA side showing through :-)  No, really, I'm uber-social and welcome contact from anyone in the IT arena - from 'let's be friends' (recruiters, normally) to 'Help!  It's all falling down around my ears!'.  You can reach me at or follow me on Twitter: @dcolleySQL. 

Friday, December 13, 2013

SQL Server - Auditing sessions over time - Code example

So today I had a requirement to provide a quick and dirty way of collecting data on the sessions connected to SQL Server.  Specifically, I had to write a job that collected basic data on the current sessions and, if these sessions had not been previously recorded, recorded them into a table.  If a session was identical in all ways other than the session ID, then a counter called 'session count' was incremented for that session.

The challenging part here was providing what is basically a mashup of an INSERT and UPDATE statement, sometimes called an UPSERT.  In SQL Server 2008 and later versions, this functionality is provided using the MERGE statement.  My requirement here is to update a table with values if they exist; otherwise, insert them.  

So here's an example of how to use this MERGE statement in the context of my requirement. The example code is below.  You can modify it to suit your purpose - for example, write out to a permanent table, put it in a job, query the table in a different way, monitor requests rather than sessions, etc. or re-write it entirely, or simply use it as an example of MERGE for reference.

There's also a self-join in there which provides the columns on which to match.  This isn't strictly essential, but I needed to match on those columns and then update only the last login time and last request end time when a match was found.  The outer query simply aggregates the results in the target table to provide a good view of the data.

Note this won't collect data on sessions which are opened and closed between runs of the collection.  So if you modify this code to use a permanent table and put it into a job (a temp table can't be used in that context), run the job frequently.  If you run it, say, every 15 minutes, any session created and destroyed between the job's execution intervals won't be captured.

 -- ---------------------------------------------------------------------------------------------  
 -- Script to audit sessions on the server, accumulate session counts per distinct group of   
 -- values from sys.dm_exec_sessions, will ignore duplicate sessions (by session_id and unique   
 -- set of values), will increment when a new session is started that has the key values   
 -- matching, will insert when a brand new session is opened. Uses MERGE.  
 -- Run inside a stored procedure, or on the query window, multiple times over the course of X   
 -- hours to get a full list of sessions.  
 -- Add filters if necessary on the final SELECT.   
 -- Change #Logins to a permanent table if required.  
 -- Requires SQL Server 2008 or later (MERGE is not available in earlier versions).  
 -- Author: Derek Colley, 13/12/2013  
 -- ---------------------------------------------------------------------------------------------  
 -- TRUNCATE TABLE #Logins -- for debug  
 IF NOT EXISTS ( SELECT name FROM tempdb.sys.tables WHERE name LIKE ('#Logins%') )  
     CREATE TABLE #Logins (   
         uqid INT IDENTITY(1,1),   
         session_id INT,   
         last_login_time DATETIME,   
         host_name NVARCHAR(128),   
         program_name NVARCHAR(128),   
         client_interface_name NVARCHAR(32),   
         login_name NVARCHAR(128),   
         last_request_end_time DATETIME )  
 DECLARE @CurrentSessions TABLE (   
     session_id INT,   
     login_time DATETIME,   
     host_name NVARCHAR(128),   
     program_name NVARCHAR(128),   
     client_interface_name NVARCHAR(32),   
     login_name NVARCHAR(128),   
     last_request_end_time DATETIME )   
 INSERT INTO @CurrentSessions  
     SELECT        session_id, login_time, host_name, program_name, client_interface_name,   
                 login_name, last_request_end_time  
     FROM        sys.dm_exec_sessions  
     WHERE        session_id <> @@SPID   
     AND            session_id > 50  
 MERGE INTO #Logins   
 USING (   
         SELECT        c.session_id, c.login_time, c.host_name, c.program_name, c.client_interface_name,   
                     c.login_name , c.last_request_end_time  
         FROM        @CurrentSessions c  
         LEFT JOIN    #Logins l   
         ON            c.session_id = l.session_id   
         AND            c.host_name = l.host_name   
         AND            c.program_name = l.program_name   
         AND            c.client_interface_name = l.client_interface_name   
         AND            c.login_name = l.login_name    ) AS CurrentSessions  
     ON        #Logins.session_id = CurrentSessions.session_id   
     AND        #Logins.host_name = CurrentSessions.host_name   
     AND        #Logins.program_name = CurrentSessions.program_name   
     AND        #Logins.client_interface_name = CurrentSessions.client_interface_name   
     AND        #Logins.login_name = CurrentSessions.login_name  
 WHEN MATCHED THEN  
     UPDATE  
     SET         last_login_time = CurrentSessions.login_time,  
                 last_request_end_time = CurrentSessions.last_request_end_time  
 WHEN NOT MATCHED BY TARGET THEN  
     INSERT      (   session_id, last_login_time, host_name, program_name, client_interface_name,   
                     login_name, last_request_end_time )      
     VALUES      (   CurrentSessions.session_id, CurrentSessions.login_time, CurrentSessions.host_name,   
                     CurrentSessions.program_name, CurrentSessions.client_interface_name,   
                     CurrentSessions.login_name, CurrentSessions.last_request_end_time );  
 SELECT        COUNT(*) [session_count], MAX(last_login_time) [last_login_time], host_name, program_name,   
             client_interface_name, login_name, MAX(last_request_end_time) [last_request_end_time]  
 FROM        #Logins   
 GROUP BY    host_name, program_name, client_interface_name, login_name  

Monday, June 24, 2013

Vendors! Wake up and hire a DBA!

Time to have a rant about third-party vendor databases.  Yes, I'm looking at you Sage, VersionOne, and most especially, YOU, Hybris CRM.

Yes, you.  Mr. 'Let's not use clustered indexes.'  'Let's model our schemata using ORM.'  'Let's call our primary key column 'PK' on almost every table.'  'Let's not bother naming any of our constraints.'  'And using stored procedures or functions - that's so 1990s, man.'

I mean, why NOT use BIGINT for every single integer column.  Or VARCHAR(255).  No, wait a minute, that's way too efficient.  Let's use NVARCHAR(255) instead.  For everything.

Take a look at this, part of the table definition from a vanilla Hybris CRM installation:

CREATE TABLE [dbo].[addresses](
[hjmpTS] [bigint] NULL,
[TypePkString] [bigint] NOT NULL,
[PK] [bigint] NOT NULL,
[createdTS] [datetime] NOT NULL,
[modifiedTS] [datetime] NULL,
[OwnerPkString] [bigint] NULL,
[aCLTS] [int] NULL,
[propTS] [int] NULL,
[p_dateofbirth] [datetime] NULL,
[p_middlename2] [nvarchar](255) NULL,
[p_streetname] [nvarchar](255) NULL,
[p_contactaddress] [tinyint] NULL,
[titlepk] [bigint] NULL,
[p_phone1] [nvarchar](255) NULL,
[p_remarks] [nvarchar](255) NULL,
[p_firstname] [nvarchar](255) NULL,
[p_phone2] [nvarchar](255) NULL,
[originalpk] [bigint] NULL,
[p_fax] [nvarchar](255) NULL,
[p_shippingaddress] [tinyint] NULL,
[p_streetnumber] [nvarchar](255) NULL,
[p_gender] [bigint] NULL,
[p_url] [nvarchar](255) NULL,
[p_district] [nvarchar](255) NULL,
[countrypk] [bigint] NULL,
[p_lastname] [nvarchar](255) NULL,
-- continues on for a loooooong time...

'p_gender' - BIGINT??? Really?  So there's 2^63 - 1, or 9,223,372,036,854,775,807 possible (positive) genders, are there?  On what planet?  Or how about 'PK' for the primary key?  In column 3, of all places?  I sincerely hope the user isn't intending to put some long ASP.NET-generated URL in 'p_url', since it's limited to 255 characters.  And 'p_dateofbirth' clearly needs to store time information too, to the millisecond, as that's important when profiling your customers.

I'm dreading the day when we actually start using your hated system in production and your table scans and parameter-sniffed execution plans slam into my servers like a colossal tidal wave of shit.  I can absolutely guarantee that when I ring the vendor to complain I'll get through to some poor sod on a mere handful of Vietnamese Dong per hour telling me 'prease to call back in mornink'.  Believe me when I say I'm beginning to batten down the hatches now, and when 'main screen turn on' I'll be hiding in the server room, watching our monitoring software with my fist in my mouth.

Sage, you're not off the hook.  No matter how hard you insist, a database is not a 'company'.  It's a database.  I admire your overall design, placing your metadata in a separate DB and company / entity-specific information in separate DBs.  I don't appreciate you writing metadata to 'master' and bursting into tears when I remove it.  Nor, frankly, are your table structures much cop either.  It's not hilarious to dress up your SQL Server-related errors in another error wrap, spreading confusion and delay among the various support teams:

'An unexpected error has occurred.  SQL code 208.' (Sage)
'Msg 208, Level 16, State 1, Line 1 - Invalid object name ...' (SQL Server)

Why not just return the second message?  The problem will land in the lap of the DBAs anyway, won't it?  Save time!  'An unexpected error has occurred' simply means 'Something went wrong and I (the developer) can't be fucked to write a coherent error message, preferring to let support teams scramble to look up SQL Server error codes in sys.messages until the application dies a death and they switch to something written properly.'

Vendors!  Wake up!  It's time to start reconsidering old-fashioned concepts like 'testing' and 'good design'.  I know we're all about Agile now, but 30 years ago we had good, solid texts like 'The Theory of Relational Databases' (David Maier - free eText here) full of things like ... wait for it ... relations!  Functional dependencies!  Normalisation!  I know, yawn, yawn, why not just let nHibernate create it all for us, yeah, yeah ... But there's good, solid systems out there right now using databases built on these basic principles.  

Vendors, don't give up your day job.  Develop great apps and leave the database administration to the professionals.

The information and views expressed in this article are my own and do not represent the opinion or views of my employer or any third-party.