Supporting Multiple Databases in Applications

Tuesday, March 29, 2011

In general, having a persistence-ignorant application provides a lot of flexibility.  Allowing us to easily port between various RDBMS’s, NoSQL data stores, text files, the cloud, or simply storing our data on papyrus managed and maintain by a group of beer-making monks provides us with tangible opportunities to cross-sell our (non-hosted) application to multiple clients.

As any write once, run anywhere scheme, support for multiple databases can be notoriously difficult.  Consider the following SQL Server T-SQL code:

Wait…what?  I just created FOO, yet SQL Server is telling me that the table foo doesn’t exist.  What’s going on?  In this case, the database default collation has been changed to Latin1_General_Bin, a binary collation, so FOO != ‘foo’.  As a result, the following query also provides no results:

To get any data back from this table, we need to match exactly:

Great, so I’ve jacked the default installation collation in SQL Server.  Not a lot of people or organizations do this?  But…is this an academic exercise?  If you want to support multiple databases and clients, the answer is a resounding NO.  Consider:

  1. Some organizations may want the application to work on a database with binary collation.  Rare, but it could happen.
  2. Organizations might be using another database, and not all databases are case insensitive, even by default.

If you want a portable application, #2 is your much more concerning issue.  Recently I verified default installations on a number of popular database systems to determine what behavior they produced.  Here are my findings (in alphabetical order):

Table/Column Names Data
Oracle Insensitive Sensitive
Postgres Insensitive Sensitive
MySQL Sensitive Insensitive
SQLLite Insensitive Sensitive
SQL Server Insensitive Insensitive

The most curious finding is MySQL.  I’d love to know what reasoning was used to come to the conclusion that Table and Column names should be case sensitive, but data should be considered insensitive.  In any case, programming for the least common denominator requires handling these situations. 

Ideally you’d leave the SQL to an ORM.  If your application is unlikely to switch database platforms and an ORM is out of the question, I’d recommend a RDBMS-specific data access layer.  If you’re application absolutely must have a generic data layer, though, you must keep this in mind as part of design.


Emil's Wicked Cool Blog