When you are data modeling, entities have attributes. During the early part of a project, it’s possible to map out many of these attributes, but often you miss some–the requirements change, the customer changes their mind, or the architects missed something at the beginning. Depending on how you’ve modeled those attributes, the pain of adding, modifying or removing them can be mellow or intense. In addition, often the values stored in these attributes need to be queried, or modified themselves.
Suppose you have a employee table (I’m trying to do this with SQL 92 syntax but I am not a DBA):
create table emp (emp_id numeric, first_name varchar(100), last_name varchar(100), dept varchar(50));
and you suddenly need to add middle name and salary to this table. There are three ways to create an extensible schema in a relational database. Each of these has their pluses and minuses.
1. The DDL method of extension
alter table emp add middle_name varchar(100), salary numeric;
Here, you simply add another column. For querying and clarity, this method is fantastic. These columns can be indexed and it’s clear that employees now have two additional attributes. However, this also means that any mapping you have between your model objects and your database needs to be updated; probably code needs to be regenerated to add these two attributes to your model objects. It also means that you have to version your database–since code that expects a middle_name attribute on employee will probably die a horrible death if that column is missing. In addition, depending on the size of the system, you might need to get a DBA involved.
2. The DML method of extension
create table emp_attributes (emp_id numeric references emp(emp_id), name varchar(100), value varchar(100));
insert into emp_attributes (1, "middle_name", "Sam");
insert into emp_attributes (1, "salary", "100000");
In this case, you can add attributes without getting a DBA involved–you simply add columns into this table. However, there is no referential integrity on the name of the attribute (is middle_name the same as mid_name the same as MIDDLE_NAME?–though, to be fair, you can put constrains on the values of the
name column). In additional, the
value column is not typed; though almost any data type can be stored as a string, you can lose precision and waste time converting from string to the actual type you want. In addition, querying based on these attributes is tedious:
select first_name from emp e, emp_attributes ea where e.emp_id = ea.emp_id and ea.name = "name" and ea.value ="Sam"
If you want to get all employees paid more than Sam, you need to resort to database specific functions to convert that string to a number.
3. The stored object method
alter table emp add objectdata varbinary;
With this method, you create a object or data structure in memory and serialize it to a stream of bytes which you then store in the objectdata column. This is great because you can add whatever attributes you like and the database structure doesn’t need to change at all. However, the data is unreadable by normal SQL tools and other programming languages. Querying on this data also becomes very difficult and slow, as you end up having to recreate each employees data object and test conditions in the programming language–you’re not taking advantage of the database.
There are a couple of questions that can help you determine which method you should use: How often will attributes be added? How hard is the process for that? How difficult is it to regenerate your data layer? Will you want to use SQL tools?
In general, the DDL method is the best option. It’s just the cleanest, easiest to understand and query against. The DML method is the second best, as you can still use most of the SQL toolset, even if it’s more complicated. The stored object method for extending attributes in a relational database should be used carefully, when there are a large number of attributes which can change often and will never be queried upon.