Why enums in the database are a good idea

I had a discussion about database enums on StackOverflow at one point, and I decided to expand on it and write down why I think they are a good idea.

There are times when your schema is fairly simple and you only work with the database through your application and have a nice GUI, so you don’t mind that your tables are filled with endless columns of opaque numbers.

However, I had a job where I spent heaps of time in psql running queries to track down issues or find patterns in data or produce one-off reports. I was using Postgres 8.3 at the time, and I found myself wishing for enum fields as they would give me

  • simpler, shorter, more readable queries
  • easier to read query results in psql
  • fewer joins to type out and make mistakes in
  • fewer table names to remember.

Database enums produce similar benefits in your application code, regardless of whether you are using an ORM library, so I was pleased that they were introduced in Postgres 8.4.

I’d like to address some concerns about enums:

1. Enums are case sensitive, spaces are significant.

This is no different than a lot of programming languages, so developers are well equipped to handle this issue.

2. Enums are sometimes not comparable.

True, and that’s why you choose the right tool for the right job - don’t use an enum when you need to compare values, or write your own comparison function. I would say that there are heaps of cases where you have 2-3 possible values of an attribute and you don’t need to compare them - enums are great for that.

3. Enums do not allow translation into other languages.

Absolutely, so you should treat enum values same as you would treat integers - they are internal values, and you look up the translation for the value as needed. Enums are there for the developer’s benefit (i.e. better readability and simpler queries), and their values probably shouldn’t go straight to the UI.

4. Enums are not ANSI standard and are therefore implemented differently (if at all) in different database engines.

Sure, and this is where a layer of abstraction like an ORM library can hide those details.

5. When using a text string to compare within code it is possible to misspell it and the language will not warn you.

People say similar things about dynamic languages but it turns out that better unit testing and the increase in productivity more than offset this issue.

6. Enums are ordered in sorts in the order in which they were created.

If you’ve got integer constants, your sorts are only going to work until requirements change and you need to insert a new value right in the middle of your carefully arranged range to get the right sort order. In practice, you’ll probably end up calculating an extra sort field in queries regardless of whether you have an enum or an int column, unless you are happy to update all the values. Otherwise it’s the same as point 2 - don’t use an enum if it’s not going to do the job.

7. Enums are not easily maintained, and lock the database into a version of the world when the database was designed.

It is possible to alter tables, and database migrations (e.g. in Ruby on Rails) are not a difficult task to handle. The previous point shows that integers have their share of problems anyway. As a side note, everyone is using automated migrations, right? I have run into issues with altering the schema when working with large tables (tens and hundreds of millions of rows) as it was simply taking unacceptable amounts of time. However, this is hardly an enum specific issue, and I maintain that at that scale you need to architect your database and applications with these issues in mind (e.g. by including a switchover node to handle schema changes).

To summarise, I am not advocating mixing identity and description with enums, what I am advocating is ease of use and enums provide that in a lot of cases. However, like any other tool they have to be used appropriately.