Skip to content

Useful tools: p6spy

This entry kicks off a series of entries where I’ll examine some of my favorite tools for development. Some of them will be long, some short, but all of them will highlight software I use to make my life a bit easier.

A large, large chunk of the development I do is taking data from a relational database to a an HTML screen, and back again. Often there are business rules for transforming the data, or validation rules, but making sure the data is stored safely and consistently is a high priority, and that means a relational database.

However, I do much of my work in java, which means that the relational-OO impedance mismatch is a common problem. One common way to deal with it is to use an OR tool–something like OJB or JDO. These tools provide object models of your database tables, usually with some help from you. You then have the freedom to pretend like your database doesn’t exist, and use these objects in your application. The OR framework takes care of the dirty work like SQL updates and caching.

Every convenience has its price, however, and OR mapping tools are no exception. The same abstraction that lets you pretend that you’re simply dealing with objects means that you cannot easily examine the SQL that is generated. In addition, the way that you’re using the objects may cause performance issues, because you’re treating the data as objects, rather than rows.

It’s much the same issue as calling methods over the network via RMI or accesing files via NFS: the abstraction is great and means that programmers don’t have to think about the consequences of remote access. But the failure of the abstraction can be catastrophic, all the more so because the programmer was not expecting to have to deal with the grotty details under the abstraction (that’s the whole point, right?).

OR tools do not fail often, or have many catastrophic failure modes, but they sure can be slow. With open source software, you can dig around and see how SQL is being generated, but that’s tedious and time consuming. With commercial products, you don’t even have that option. (Some OR tools may have their own ‘Show me the SQL’ switch–I haven’t run into them.)

Enter p6spy. p6spy can be used in place of any JDBC driver. You point it to the the real driver and it passes on any configuration or SQL calls to that driver. But p6spy logs every SQL statement passed to it and every result set passed back. (A fine non object oriented example of the Decorator pattern.)

It took me about 15 minutes to figure out how to use p6spy, the software is open source with decent documentation, the latest version has data source support, and it scratches an itch that most, if not all, java developers will have at some time. With p6spy, you can find out what that OR tool is doing under the covers–it’s an easy way to peel back some of the abstraction if needed.

Koders.com–search source code

Koders.com has apparently indexed many open source software projects. (Link via Dion.) I played around with it a bit and I think it’s a very slick application. I’m of two minds about this, though.

The good:

Code reuse is good. A co-worker of mine once called it ‘editor inheritance’–in a world where people time is expensive and disk space is cheap, it can make sense (not always) to just copy code rather than figure out how to make a piece of code re-usable. Koders lets you do this in a more effective way.

It also lets coders easily compare and contrast styles between real live projects. And I can only imagine that soon some researcher will sink his teeth into all the code and publish on First Monday.

The bad:

As the linux-SCO lawsuits have shown, it’s technically awfully easy to cut and paste code, but the results end up being illegal. I can only see this repository, even though it differentiates by license, exacerbating this problem. And mixing and matching code from different licenses becomes all the easier as they show up side by side in a search engine. If I were a company concerned with legal ramifications, I’d tread softly around this tool.

The possibilities:

Regardless, I have to say it’s a very cool application. I’ll be interested to find out how much people will use it. What would be really cool is further analysis–after all google gets its power from the links between websites–what would we learn by examining the links between code? For one, you’d have a better idea how useful and stable a project is, if you could know how many other projects used it. Having a plugin into a UML modelling tool would be pretty slick too.

Testing Korean content

I’m currently working on a site that needs to be truly localized for a large number of languages (tens of them). This is accomplished with large numbers of ResourceBundles, the MessageFormat class when variable text layout is needed, an Oracle backend which understands and doesn’t muck with UTF-8, an Access database which generates said bundles, and a crack team of translators.

However, how to test? Luckily, it’s fairly easy to have IE use a different language: clear instructions live here. One issue with the instructions is that they don’t tell you how to actually install a language pack. But this is easy too: I only had to right click on a page, choose the encoding menu, then choose more, and then the encoding I wanted (Korean, because I want to test double byte characters). I was then prompted to install a language pack. I accepted, and Windows downloaded a bunch of DLLs and other files. From then on I could view Korean characters (the encoding menu says I’m viewing ‘Unicode (UTF-8)’). Here’s a random site about mining that you can use to test your Korean language pack.

Don’t forget to test both the input and output of your application–saving user input, and being able to redisplay it is at least as important as being able to display what you draw from your ResourceBundle initially. As a bonus, the Korean character set that I installed via IE was made available to Firefox. This was done on the fly, not only did I not need to restart Windows, I didn’t even need to restart Firefox; I just needed to reload the page.

Mobile phones as examples of computing in context

Here’s an interesting 20 page paper examining some of the issues surround context and computing.

A few choice quotes:
“…answering the telephone has something of a moral compulsion.”

“…as much of this problem [a phone that understands context] reduces to that of building an intelligent computer.”

“…it is better to have machines which act in predictable ways so users can understand how they work” rather than unpredictable machines that ‘do the right thing.’

“A technology, like mobile phones, with its combination of voice mail, text messaging and the like, is something we dwell with in that it becomes part of the fibre of our practices and lives, even for those without(sic) reject them.” As someone who swore off mobile phones for a long time and now doesn’t know how life continued without them, I can sympathize with that sentiment.

“When people use a technology over time, and get used to seeing other people
using the technology, the actions of the technology come to be seen not as actions of technology but of the user themselves.”

Interesting reading.

Via Mobile Community Design

Extending attributes of entities in relational databases

When you are data modeling, entities have attributes. During the early part of a project, it’s possible to map out many of these attributes, but often you miss some–the requirements change, the customer changes their mind, or the architects missed something at the beginning. Depending on how you’ve modeled those attributes, the pain of adding, modifying or removing them can be mellow or intense. In addition, often the values stored in these attributes need to be queried, or modified themselves.

Suppose you have a employee table (I’m trying to do this with SQL 92 syntax but I am not a DBA):
create table emp (emp_id numeric, first_name varchar(100), last_name varchar(100), dept varchar(50));
and you suddenly need to add middle name and salary to this table. There are three ways to create an extensible schema in a relational database. Each of these has their pluses and minuses.

1. The DDL method of extension
alter table emp add middle_name varchar(100), salary numeric;
Here, you simply add another column. For querying and clarity, this method is fantastic. These columns can be indexed and it’s clear that employees now have two additional attributes. However, this also means that any mapping you have between your model objects and your database needs to be updated; probably code needs to be regenerated to add these two attributes to your model objects. It also means that you have to version your database–since code that expects a middle_name attribute on employee will probably die a horrible death if that column is missing. In addition, depending on the size of the system, you might need to get a DBA involved.

2. The DML method of extension
create table emp_attributes (emp_id numeric references emp(emp_id), name varchar(100), value varchar(100));
insert into emp_attributes (1, "middle_name", "Sam");
insert into emp_attributes (1, "salary", "100000");

In this case, you can add attributes without getting a DBA involved–you simply add columns into this table. However, there is no referential integrity on the name of the attribute (is middle_name the same as mid_name the same as MIDDLE_NAME?–though, to be fair, you can put constrains on the values of the name column). In additional, the value column is not typed; though almost any data type can be stored as a string, you can lose precision and waste time converting from string to the actual type you want. In addition, querying based on these attributes is tedious:
select first_name from emp e, emp_attributes ea where e.emp_id = ea.emp_id and ea.name = "name" and ea.value ="Sam"
If you want to get all employees paid more than Sam, you need to resort to database specific functions to convert that string to a number.

3. The stored object method
alter table emp add objectdata varbinary;
With this method, you create a object or data structure in memory and serialize it to a stream of bytes which you then store in the objectdata column. This is great because you can add whatever attributes you like and the database structure doesn’t need to change at all. However, the data is unreadable by normal SQL tools and other programming languages. Querying on this data also becomes very difficult and slow, as you end up having to recreate each employees data object and test conditions in the programming language–you’re not taking advantage of the database.

There are a couple of questions that can help you determine which method you should use: How often will attributes be added? How hard is the process for that? How difficult is it to regenerate your data layer? Will you want to use SQL tools?

In general, the DDL method is the best option. It’s just the cleanest, easiest to understand and query against. The DML method is the second best, as you can still use most of the SQL toolset, even if it’s more complicated. The stored object method for extending attributes in a relational database should be used carefully, when there are a large number of attributes which can change often and will never be queried upon.

Problem solving via email composition

I’m currently struggling with an open source project, working to make the UI look like the one the designer showed the client. It’s a bit frustrating, because it’s sleuth work, and the clues are all in the code. Code I’ve not worked with, but with which I am rapidly gaining familiarity.

I cannot imagine how frustrating this would be if it were a closed source tool. Check that, I can. I worked on a project a few years ago with a closed source portal product, and it simply wouldn’t do what we asked it to do. Well, when you’re a small company, you do what you need to do in order to get the job done. This included some decompilation I’m sure voided our warranty, but delivered the client what was promised. It’s awful when you’re put in that position.

Regardless, I’ve been able to tweak this open source project to make it do what I need. When I find myself facing an issue that is not covered in the on line documentation, I’ve found that the best way to solve the problem is to start writing an email to the user list.

Now, because it’s written communication to a group of peers, some of whom are quite knowledgeable about the subject, I’m fairly careful about the content. I outline the problem carefully, explain the searches of the mailing list and web that I’ve already undertaken, and detail any avenues of solution that I’ve already explored. This rather rigorous analysis of the problem often leads to other glimmers of solutions, log files I have forgotten to check, and more hypotheses to prove or disprove. As I build the email, the problem becomes more and more manageable.

In my efforts to not appear a fool to peers, I reckon I have not sent about seventy percent of the emails I’ve composed–either I found another solution, I missed some crucial bit when before I started the email, or after doing due diligence on the problem I discovered the answer on the mailing list. It’s a bit weird, but crafting an excellent ‘can you help me’ email to a user list helps me even if the email isn’t sent.

Coding Standards

We’ve all been caught up in the religious war of coding standards. Whether it’s ‘tab vs spaces’ or ‘3 spaces vs 4 spaces’ or ‘curly brace on same line as if clause or curly brace on line below if clause’, these arguments can take up a significant amount of time on a project. Ken Arnold has an interesting post where he recommends that coding standards be integrated into the language specification, and enforced by the compiler. His reasons:

1. There’s no real productivity difference between one coding style and another.

2. There’s real productivity lost in setting up pretty printers, re-formatting code manually, and arguing about the better style.

3. If programmers have latitude, they’ll use it. (See perl.)

So, take away the freedom of programmers to format their code in a non standard manner. Make coding style part of the compiler–not as a warning though (as Bruce Eckel says “when you discover you can ignore them [warnings], you do”)–actually have the compiler error out if the formatting is incorrect.

Sure, I’d wail along with the rest of the coding community, for a while. But then I’d adjust, especially if the tools were there. And I’d probably be more productive. Talk about tough love.

Right tool, right job, part II

Here’s a nice explication of ‘choose the right tool for the job,’ by a pundit. The entire J2EE stack is often overkill, and I’ve seen shops that would have been much better served by a series of PHP applications, perhaps with java on the backend for heavier duty processing and legacy code reuse.

My only quarrel with his analysis is that he assumes we know how long an application is going to live. How many applications that started as prototypes get rolled to production and are the cause of endless headaches five years down the road?

As in any complex situation, when designing an application, and choosing how complicated to make its architecture, you need to balance between two principles:

1. It’s easier to do things right the first time. You’re in the code, you have an understanding of the requirements, there’s a green field, you have a budget–therefore making sure the application works is easier when you’re first writing it.

2. You aren’t gonna need it. This slogan, from the Agile folks, says that you have no way of knowing what you’re going to need five years in the future, so why try to design for that?

Taken too far, each principle is disastrous. If you follow principle number one fully, you’ll end up in analysis paralysis. If you take point two to heart, you’ll miss out on big chunks of functionality that have to be retro-fitted on later, to the detriment of a schedule and possibly the design.

That’s why they (architects, not pundits) get paid the big bucks.