
Extending attributes of entities in relational databases

When you are data modeling, entities have attributes. During the early part of a project, it’s possible to map out many of these attributes, but often you miss some–the requirements change, the customer changes their mind, or the architects missed something at the beginning. Depending on how you’ve modeled those attributes, the pain of adding, modifying or removing them can be mellow or intense. In addition, often the values stored in these attributes need to be queried, or modified themselves.

Suppose you have an employee table (I’m trying to do this with SQL 92 syntax, but I am not a DBA):
create table emp (emp_id numeric primary key, first_name varchar(100), last_name varchar(100), dept varchar(50));
and you suddenly need to add middle name and salary to this table. There are three ways to create an extensible schema in a relational database. Each of these has its pluses and minuses.

1. The DDL method of extension
alter table emp add column middle_name varchar(100);
alter table emp add column salary numeric;
Here, you simply add the new columns. For querying and clarity, this method is fantastic. These columns can be indexed, and it’s clear that employees now have two additional attributes. However, this also means that any mapping you have between your model objects and your database needs to be updated; code probably needs to be regenerated to add these two attributes to your model objects. It also means that you have to version your database–since code that expects a middle_name attribute on employee will probably die a horrible death if that column is missing. In addition, depending on the size of the system, you might need to get a DBA involved.
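For example, here is a rough sketch of the kind of change that ripples into the model layer. The Employee class, its field names, and the choice of BigDecimal for the salary are illustrative assumptions, not the output of any particular code generator or mapping tool.

// Hypothetical model object regenerated after the DDL change; class and
// accessor names are illustrative only.
public class Employee {
    private long empId;
    private String firstName;
    private String lastName;
    private String dept;
    // the two new attributes that mirror the new columns
    private String middleName;
    private java.math.BigDecimal salary;

    public String getMiddleName() { return middleName; }
    public void setMiddleName(String middleName) { this.middleName = middleName; }
    public java.math.BigDecimal getSalary() { return salary; }
    public void setSalary(java.math.BigDecimal salary) { this.salary = salary; }
    // ...remaining getters and setters omitted for brevity
}

Every layer that moves data between this object and the emp table has to learn about the two new fields, which is where the regeneration and versioning pain comes from.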

2. The DML method of extension
create table emp_attributes (emp_id numeric references emp(emp_id), name varchar(100), value varchar(100));
insert into emp_attributes (emp_id, name, value) values (1, 'middle_name', 'Sam');
insert into emp_attributes (emp_id, name, value) values (1, 'salary', '100000');

In this case, you can add attributes without getting a DBA involved–you simply add rows to this table. However, there is no referential integrity on the name of the attribute (is middle_name the same as mid_name the same as MIDDLE_NAME?–though, to be fair, you can put constraints on the values of the name column). In addition, the value column is not typed; though almost any data type can be stored as a string, you can lose precision and waste time converting from the string to the actual type you want. Finally, querying based on these attributes is tedious:
select first_name from emp e, emp_attributes ea where e.emp_id = ea.emp_id and ea.name = 'middle_name' and ea.value = 'Sam';
If you want to get all employees paid more than Sam, you have to resort to a cast or a database-specific function to convert that string to a number; a rough sketch of what that looks like from Java follows.
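This is a sketch only, under a few assumptions: the emp and emp_attributes tables above, a database where a standard CAST to numeric works on the value column (some databases need a vendor-specific conversion function instead), and a DataSource configured elsewhere. The EmpAttributeQueries class name is made up for illustration.

// Sketch: find employees paid more than a given amount by casting the
// string-valued 'salary' attribute to a number.
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

public class EmpAttributeQueries {

    private final DataSource dataSource; // assumed to be configured elsewhere

    public EmpAttributeQueries(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public List<String> findFirstNamesPaidMoreThan(BigDecimal amount) throws SQLException {
        // the cast is the part that varies from database to database
        String sql = "select e.first_name from emp e, emp_attributes ea "
                + "where e.emp_id = ea.emp_id and ea.name = 'salary' "
                + "and cast(ea.value as numeric) > ?";
        List<String> names = new ArrayList<>();
        try (Connection conn = dataSource.getConnection();
                PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setBigDecimal(1, amount);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("first_name"));
                }
            }
        }
        return names;
    }
}

The equivalent query against the DDL-extended table is just a comparison against a typed, indexable salary column, which is the difference this section is getting at.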

3. The stored object method
alter table emp add column objectdata blob;
With this method, you create an object or data structure in memory and serialize it to a stream of bytes, which you then store in the objectdata column. This is great because you can add whatever attributes you like, and the database structure doesn’t need to change at all. However, the data is unreadable by normal SQL tools and by other programming languages. Querying on this data also becomes very difficult and slow, as you end up having to recreate each employee’s data object and test conditions in the programming language–you’re not taking advantage of the database.
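To make the trade-off concrete, here is a minimal sketch of the round trip, assuming the extra attributes are kept in a map and serialized with plain Java serialization (XML or any other format would illustrate the same point). The EmployeeExtraData class and its method names are made up for illustration.

// Sketch: serialize an attribute map to bytes for the objectdata column,
// and deserialize it when the row is read back.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

public class EmployeeExtraData {

    public static byte[] toBytes(HashMap<String, Object> attributes) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attributes);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    public static HashMap<String, Object> fromBytes(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, Object>) in.readObject();
        }
    }
}

The byte array from toBytes is what gets bound to the objectdata column (for example with PreparedStatement.setBytes). Finding, say, every employee with a salary over some threshold then means reading every row, calling fromBytes, and filtering in Java, which is exactly the querying penalty described above.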

There are a couple of questions that can help you determine which method you should use: How often will attributes be added? How hard is the process for that? How difficult is it to regenerate your data layer? Will you want to use SQL tools?

In general, the DDL method is the best option: it’s the cleanest and the easiest to understand and query against. The DML method is second best, as you can still use most of the SQL toolset, even if the queries are more complicated. The stored object method for extending attributes in a relational database should be used sparingly, and only when there are a large number of attributes that change often and will never be queried against.

Problem solving via email composition

I’m currently struggling with an open source project, working to make the UI look like the one the designer showed the client. It’s a bit frustrating, because it’s sleuth work, and the clues are all in the code. Code I’ve not worked with, but with which I am rapidly gaining familiarity.

I cannot imagine how frustrating this would be if it were a closed source tool. Check that, I can. I worked on a project a few years ago with a closed source portal product, and it simply wouldn’t do what we asked it to do. Well, when you’re a small company, you do what you need to do in order to get the job done. This included some decompilation I’m sure voided our warranty, but delivered the client what was promised. It’s awful when you’re put in that position.

Regardless, I’ve been able to tweak this open source project to make it do what I need. When I find myself facing an issue that is not covered in the online documentation, I’ve found that the best way to solve the problem is to start writing an email to the user list.

Now, because it’s written communication to a group of peers, some of whom are quite knowledgeable about the subject, I’m fairly careful about the content. I outline the problem carefully, explain the searches of the mailing list and web that I’ve already undertaken, and detail any avenues of solution that I’ve already explored. This rather rigorous analysis of the problem often leads to other glimmers of solutions, log files I have forgotten to check, and more hypotheses to prove or disprove. As I build the email, the problem becomes more and more manageable.

In my efforts to not appear a fool to my peers, I reckon about seventy percent of the emails I’ve composed have gone unsent–either I found another solution, I realized I had missed some crucial bit before I started the email, or, after doing due diligence on the problem, I discovered the answer on the mailing list. It’s a bit weird, but crafting an excellent ‘can you help me’ email to a user list helps me even if the email isn’t sent.

Coding Standards

We’ve all been caught up in the religious war of coding standards. Whether it’s ‘tab vs spaces’ or ‘3 spaces vs 4 spaces’ or ‘curly brace on same line as if clause or curly brace on line below if clause’, these arguments can take up a significant amount of time on a project. Ken Arnold has an interesting post where he recommends that coding standards be integrated into the language specification, and enforced by the compiler. His reasons:

1. There’s no real productivity difference between one coding style and another.

2. There’s real productivity lost in setting up pretty printers, re-formatting code manually, and arguing about the better style.

3. If programmers have latitude, they’ll use it. (See perl.)

So, take away the freedom of programmers to format their code in a non-standard manner. Make coding style part of the compiler–not as a warning, though (as Bruce Eckel says, “when you discover you can ignore them [warnings], you do”)–actually have the compiler error out if the formatting is incorrect.

Sure, I’d wail along with the rest of the coding community, for a while. But then I’d adjust, especially if the tools were there. And I’d probably be more productive. Talk about tough love.

Right tool, right job, part II

Here’s a nice explication of ‘choose the right tool for the job,’ by a pundit. The entire J2EE stack is often overkill, and I’ve seen shops that would have been much better served by a series of PHP applications, perhaps with java on the backend for heavier duty processing and legacy code reuse.

My only quarrel with his analysis is that he assumes we know how long an application is going to live. How many applications that started as prototypes get rolled into production and become the cause of endless headaches five years down the road?

As in any complex situation, when designing an application, and choosing how complicated to make its architecture, you need to balance between two principles:

1. It’s easier to do things right the first time. You’re in the code, you have an understanding of the requirements, there’s a green field, you have a budget–therefore making sure the application works is easier when you’re first writing it.

2. You aren’t gonna need it. This slogan, from the Agile folks, says that you have no way of knowing what you’re going to need five years in the future, so why try to design for that?

Taken too far, each principle is disastrous. If you follow principle number one fully, you’ll end up in analysis paralysis. If you take point two to heart, you’ll miss out on big chunks of functionality that have to be retrofitted later, to the detriment of the schedule and possibly the design.

That’s why they (architects, not pundits) get paid the big bucks.

How to evaluate an open source project

There are a huge number of open source projects out there, on SourceForge, Apache, and elsewhere. This is fantastic because you can often leverage work that other folks have done, as well as the knowledge they’ve gained and the mistakes they’ve already made. However, it can be extremely difficult to evaluate accurately how much such projects can help you, especially if you’ve not used them before, or if they are obscure. In addition, you probably don’t have a lot of time to choose a solution–clients that go with open source solutions tend to have budget constraints. I present here a quick checklist that I use to evaluate projects that I’m going to depend upon:

1. Know your drop dead features. These are features that the software package must have in order to be considered. Be careful not to make this too long. The primary purpose of this list is to allow you to quickly exclude packages that you know won’t work for you, and if you make it too long, you might be left with no options.

2. Look at the documentation attached to the project. This is the first place to start ruling a project out–if it doesn’t promise the features you need, move on. Also, look at a demo or screen shots, if possible. This lets you see how the package works. Compare behavior with the list of needed features.

3. Install the software. If you have difficulty installing it, that’s not an insurmountable issue–often open source projects aren’t the smoothest installations. However, installing it and spending a few hours playing around with this software that is going to be a significant part of your project can let you know if the impressions you received from the demo and documentation are correct–is it going to be easy enough to tweak/deploy/extend this software package?

4. In the world of open source support, the mailing list is king. Does this project have a mailing list? Is it archived? Is it indexed by Google? How active is it? If there’s no mailing list, is there a set of forums? The mailing list (or forum) is where you’re going to go when you have a smart question to ask, and you will, so you want this resource to be as strong as possible.

5. Look at the documentation again. The first time you did so, you were just looking to exclude a project based on feature set. This time, you want to see how much the documentation can help you. Is there a tutorial? Are the advanced topics that concern you covered? For java projects, is there javadoc? Is it more than just the methods and arguments that are automatically generated? What version of the software is the documentation for?

Of course, the more I depend on a piece of software, the more time I want to spend on evaluation. On the other hand, the process laid out above is entirely technical in nature, and, as we know, there may be other reasons for choosing, or not choosing, software. Installed base, existing team experience and knowledge, project timeline, or the fact that the CEO’s brother is on the board of a company with a rival offering all can influence software package choice. There are many factors to be considered, but this list is how I start off.

Expresso authentication and authorization

I’ve only briefly worked with Expresso, but I’ve heard good things about it. However, one ‘feature’ is really chapping my hide at the moment. Apparently, the only way to authenticate someone is to call the attemptLogin method on a ‘Controller’ object (a subclass of a Struts Action), which is protected and takes, among other things, the http request and response. There’s no way I can find to just pass in a username and password and authenticate. The authorization system is not broken out either: in OO fashion, you ask an object whether a user can access it, and the object knows enough to reply.

I’m not trying to rag on the Expresso developers. After all, they are giving away a fine, full featured java web framework for free. But this just drove home to me how important it is in web applications to have the classes that talk http be nothing more than a thin translating layer around business classes. For instance, all a struts action should do is convert http forms to domain specific value objects, and then call business methods on business objects.
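Here’s a minimal sketch of that separation, in the spirit of a Struts 1 action. The LoginForm, Credentials, and LoginService names are made up for illustration; they are not Expresso or Struts classes, and only the Action/ActionForm plumbing is real.

// Sketch of a thin translating layer: the action only converts the http
// form into a domain value object and delegates to a business object.
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class LoginAction extends Action {

    private final LoginService loginService = new LoginService();

    public ActionForward execute(ActionMapping mapping, ActionForm form,
            HttpServletRequest request, HttpServletResponse response) {
        LoginForm loginForm = (LoginForm) form;

        // translate the http form into a domain specific value object
        Credentials credentials =
                new Credentials(loginForm.getUsername(), loginForm.getPassword());

        // the business object does the real work and knows nothing about http
        boolean authenticated = loginService.authenticate(credentials);

        return mapping.findForward(authenticated ? "success" : "failure");
    }
}

// hypothetical Struts form bean holding the raw request parameters
class LoginForm extends ActionForm {
    private String username;
    private String password;
    public String getUsername() { return username; }
    public void setUsername(String username) { this.username = username; }
    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }
}

// hypothetical domain specific value object
class Credentials {
    private final String username;
    private final String password;
    Credentials(String username, String password) {
        this.username = username;
        this.password = password;
    }
    String getUsername() { return username; }
    String getPassword() { return password; }
}

// hypothetical business object; this is the method a non-http caller
// (a test, a web service, a batch job) could also use
class LoginService {
    boolean authenticate(Credentials credentials) {
        // stand-in logic only; a real implementation would check a user store
        return credentials.getUsername() != null && credentials.getPassword() != null;
    }
}

With that split, authentication is just a call to loginService.authenticate(credentials), whether the caller is a Struts action, a unit test, or some other entry point; nobody has to fake up an http request and response to use it.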

If this were the case in Expresso, it’d be trivial for me to leverage Expresso’s existing authentication model–I’d just have to call the methods on the business object, perhaps after creating a domain specific value object. Now, however, I’ll probably have to monkey around with the http request and response, decode exactly what parameters it wants, and fake those up.

Open source portal search

I’ve been looking at some open source portals. My client has an existing java application, written in Expresso that has some reasonably complex logic embedded in it. Additionally, it’s massively internationalized, with dynamic international content coming from a database, and static content coming from a set of resource bundles. There’s an existing process around updating both of these sets of data. And when we’re talking internationalization, we’re talking Asian character sets as well as the European character sets.

So, the criteria for the portal were:

1. Support for multi-byte character sets and easy localization.

2. Ability to integrate with Expresso’s authentication and authorization systems.

3. Support for normal portal features–adding/moving/removing portlets, minimize/maximize portlets.

4. Documentation.

I looked at a fair number of portals, including jcorporate’s own ePortal, eXo, Liferay, Jetspeed 1, Jetspeed 2, and Pluto (a last alternative, to be avoided if possible, is to roll our own portal-like application). First, I looked at ePortal, but that’s a dead project, with no releases. Then I installed Pluto, which seemed like it would be a perfect fit for integration with Expresso. However, integrating Pluto looked complex, and after installing it (fantastic instructions for installing Pluto here), I realized that Pluto did not have a layout manager that would allow for adding, rearranging, or removing portlets.

I then battled with Jetspeed 2, which involved installing a subversion client and building from source. This looked to be pretty cool, but the sheer lack of documentation, and the fact that there have been no releases, caused me to shy away. This is no failure of Jetspeed 2–this is what projects in development are like; I think it will be a fine project when done, but my client just doesn’t need to be on the bleeding edge. I also took a quick look at Liferay, which seems to be a much more full featured portal application than we needed. After reading this blog on portals I decided to take a closer look at eXo. However, the documentation wasn’t fantastic, and it wasn’t readily apparent how to plug in authentication.

I also downloaded and installed Jetspeed 1; if you download the src distribution, you get the helpful tutorial. While Jetspeed 1 is not a standards-based solution (I expect most of the portlets will be custom developed anyway), the user community is fairly active, as indicated by the mailing list, and I’ve found the documentation to be extensive. In addition, it meets the localization requirements and has pluggable authentication and authorization systems.

I’m less than thrilled about having to use maven for builds. Others have said it better than I, but it’s just too much for my needs. However, I was able to get an independent directory tree for my project by copying over the maven.xml, project.properties, and project.xml from the tutorial directory to an empty directory. Then I tweaked the project.* files, ran maven jetspeed:genapp, adjusted a few settings in TurbineResources.properties to make sure the localization settings were correct, and, voila, I have a working project tree that, using the Jetspeed maven plugin, is one command away from a deployable war file.