Dan Moore! – Page 93 – My place on the web

PL/SQL redux

I’ve written about PL/SQL before but recently have spent a significant amount of time writing stored procedures. Unlike some of my previous experiences, this time PL/SQL seemed like a great fit for the problem set, which was twofold.

In the first case, some of the stored procedures push data from stage tables, which are loaded via ODBC or SQL*Loader, into tables which the application accesses. PL/SQL is great for this type of task because cursors, especially when used with parameters, make row driven data transformations a pleasure, and fast as well. Handling deltas via updates instead of inserts was alright, and the fact is that PL/SQL code that manipulates data can be positively terse when compared to JDBC PreparedStatements and at least as fast. In addition, these stored procedures can be easily called over an ODBC connection, giving the client the capability to load new data to the stage tables and then call the stored procedure to update or insert the data as needed. (You could definitely do the same thing with a servlet and have the client hit a URL, but that’s a bit less self-contained.)

PL/SQL was also used to implement complex logic that was likely to change. Why do that in PL/SQL in the database rather than in Java in the application server? Well, changes to PL/SQL programs don’t require a server restart, which can be quite an issue when a server needs high levels of uptime. Instead, you just recompile the PL/SQL. Sure, you can use the reloadable attribute of the context to achieve the same thing (if you’re using Tomcat) but recompiling PL/SQL doesn’t have the same performance hit as monitoring class files for changes.

Use the right tool for the job. Even if PL/SQL ties your application to Oracle, a judicious use of this language can have significant benefits.

Programming | moore | January 17, 2005

Under Pressure

In almost every software project of any length that I’ve participated in, the last few weeks before a release are tense and pressure filled. (Please note that I write custom business software; that’s what these conclusions are drawn from.) Being in the middle of a project release myself, I thought I’d muse on the causes of this pressure. Why are the last few weeks before the deadline so tense? Because software is, above all else, about the details. Joel puts it well in his interview with Salon.com:

The fundamental problem that you’re trying to solve here is that humans think of things in vague, mushy terms. In order to visualize something, they don’t have to actually visualize every part of it. Whereas the programmer, in order to actually implement that thing, to create it, needs to have every part specified.

What happens on projects over a certain level of complexity is that this specification is pushed off, often until a decision must be made, or even past that point. This occurs for a number of reasons: programmers want to start coding, the client doesn’t have the information at the moment the issue is raised and it is never revisited, the answers to certain questions (or the questions themselves) are dependent on answers to other questions. In the beginning of a project, big questions are decided, but the small niggling details, which the compiler most certainly needs to know about, are perhaps noted, but not dealt with.

Why not specify how the system will work before building any of it, to every exacting detail? Some software processes try to do this, but in general, unless the problem is very well understood (in which case the client will almost always be better served by off-the-shelf software), the requirements will change as the project progresses. (Incidentally, if they don’t, the project is a great candidate for offshoring.) The client will better understand the problem and technology and the software team will likewise better understand the problem and domain space. So specifying the entire system up front will likely leave the customers unhappy or the system unused.

Because business software is actually business process crystallization, it matters very much that things are correct. Because business software is implemented by a group of people with specialized skills and a different focus from the users, at best, or no understanding of the business, at worst, software delivery is unlike other deadline driven industries in that changes are expensive and mysterious. I think every software engineer has an example of a simple change request that turned out to have massive implications throughout the system, and this effect is mysterious to normal users.

What matters is not why the details crop up, but that they do. So, the last few weeks of every project consist of mentally running around and nailing down every detail. I expect this is true of every job with fixed deadlines (ever been around a retail store the day before Thanksgiving?). Every issue should be resolved or acknowledged when the software is released, and while some facets are less important than others, no detail is unimportant.

Programming | moore | January 15, 2005

Options for connecting Tomcat and Apache

Many of the Java web applications I’ve worked on run in the Tomcat servlet engine, fronted by an Apache web server. Valid reasons for wanting to run Apache in front of Tomcat are numerous and include increased clickstream statistics, Apache’s ability to quickly and efficiently serve static content such as images, the ability to host other dynamic solutions like mod_perl and PHP, and Apache’s support for SSL certificates. This last is especially important–any site with sensitive data (credit card information, for example) will usually have that data encrypted in transit, and SSL is the default manner in which to do so.

There are a number of different ways to deal with the Tomcat-Apache connection, in light of the concerns mentioned above:

Don’t deal with the connection at all. Run Tomcat alone, responding on the typical http and https ports. This has some benefits; configuration is simpler and fewer software interfaces tend to mean fewer bugs. However, while the documentation on setting up Tomcat to respond to SSL traffic is adequate, Apache handling SSL is, in my experience, far more common. For better or worse, Apache is seen as faster, especially when confronted with numeric challenges like encryption. Also, as of Jan 2005, Apache serves 70% of websites while Tomcat does not serve an appreciable amount of http traffic. If you’re willing to pay, Netcraft has an SSL survey which might better illuminate the differences in SSL servers.

If, on the other hand, you choose to run some version of the Apache/Tomcat architecture, there are a few different options. mod_proxy, mod_proxy with mod_rewrite, and mod_jk all give you a way to manage the Tomcat-Apache connection.

mod_proxy, as its name suggests, proxies http traffic back and forth between Apache and Tomcat. It’s easy to install, set up and understand. However, if you use this method, Apache will decrypt all SSL data and proxy it over http to Tomcat. (There may be a way to proxy SSL traffic to a different Tomcat port using mod_proxy–if so, I was unable to find the method.) That’s fine if they’re both running on the same box or in the same DMZ, the typical scenario. A byproduct of this method is that Tomcat has no means of knowing whether a particular request came in via secure or insecure means. If using a tool like the Struts SSL Extension, this can be an issue, since Tomcat needs such information to decide whether redirection is required. In addition, if any of the dynamic generation in Tomcat creates absolute links, issues may arise: Tomcat receives requests for localhost or some other hidden hostname (via request.getServerName()), rather than the request for the public host, which Apache has proxied, and may generate incorrect links.

Updated 1/16: You can pass through secure connections by placing the proxy directives in certain virtual hosts:

<VirtualHost _default_:80>
ProxyPass /tomcatapp http://localhost:8000/tomcatapp
ProxyPassReverse /tomcatapp http://localhost:8000/tomcatapp
</VirtualHost>

<VirtualHost _default_:443>

SSLProxyEngine On
ProxyPass /tomcatapp https://localhost:8443/tomcatapp
ProxyPassReverse /tomcatapp https://localhost:8443/tomcatapp
</VirtualHost>

This doesn’t, however, address the getServerName issue.

Updated 1/17:

Looks like the Tomcat Proxy Howto can help you deal with the getServerName issue as well.

Another option is to run mod_proxy with mod_rewrite. Especially if the secure and insecure parts of the dynamic application are easily separable (for example, if the application was split into /secure/ and /normal/ chunks), mod_rewrite can be used to rewrite the links. If a user visits this URL: https://www.example.com/application/secure and traverses a link to /application/normal, mod_rewrite can send them to http://www.example.com/application/normal/, thus sparing the server from the strain of serving pages needlessly encrypted.
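
To make that decision concrete, here is a minimal sketch in plain JavaScript rather than mod_rewrite syntax. It assumes the /application/normal path from the example above and default ports; the function name redirectTarget is hypothetical, and a real deployment would express the same rule as rewrite directives in the Apache configuration.

```javascript
// Sketch only: mirrors the decision a rewrite rule would make.
// Returns the URL to redirect to, or null if the request is served as-is.
function redirectTarget(requestUrl) {
    var url = new URL(requestUrl);
    var isSecureRequest = url.protocol === 'https:';
    var wantsNormal = url.pathname.indexOf('/application/normal') === 0;

    // Encrypted request for a page that doesn't need encryption:
    // send the browser to the plain http version of the same path.
    if (isSecureRequest && wantsNormal) {
        return 'http://' + url.hostname + url.pathname + url.search;
    }
    return null;
}

console.log(redirectTarget('https://www.example.com/application/normal/'));
console.log(redirectTarget('https://www.example.com/application/secure'));
```

The first call yields the http version of the URL; the second yields null, because the secure chunk should stay encrypted.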

mod_jk is the usual way to connect Apache and Tomcat. In this case, Tomcat listens on a different port and a piece of software known as a connector enables Apache to send the requests to Tomcat with more information than is possible with a simple proxy. For instance, certain variables are sent via the connector when Apache receives an SSL request. This allows Tomcat full knowledge of the state of the request, and makes using a tool like the aforementioned Struts SSL Extension possible. The documentation is good. However, using mod_jk is not always the best choice; I’ve seen some performance issues with some versions of the software. You almost always have to build it yourself: binary releases of mod_jk are few and far between, I’ve rarely found the appropriate version for my version of Apache, and building mod_jk is confusing. (Even though mod_jk 1.2.8 provides an ant script, I ended up using the old ‘configure/make/make install’ process because I couldn’t make the ant script work.)

In short, there are plenty of options for connecting Tomcat and Apache. In general, I’d start out using mod_jk, simply because that’s the option that was built specifically to connect the two; mod_proxy doesn’t provide quite the same level of integration.

Java Tomcat Web Applications | moore | January 14, 2005

sqlldr

I’ve been writing SQL*Loader scripts to load a fair bit of data into Oracle. I have a set of load tables with minimal constraints on them, into which SQL*Loader pushes the rows. Then I have written some PL/SQL which pulls from the load tables to the real database.

This architecture was chosen because the PL/SQL procedures can be written to allow incremental as well as full data loads. In the incremental case, it’s conceivable that there’d be a different way of pushing data over to the load tables (via ODBC or JMS, for example). In addition, the load tables can be denormalized, and you can put enough intelligence in the PL/SQL to turn your data structures into something at which a DBA won’t cringe.

Anyway, I thought I’d share a few tips, gleaned through the process. I’m definitely no SQL*Loader guru, but here are some useful links: the sqlldr FAQ, full of good information and recently updated, the Oracle Utilities page which does a great job of explaining all the options of SQL*Loader, and this case study which outlines internationalization with sqlldr. All very useful.

Two other tips: If you are loading delimited character data that is longer than 255 characters, you need to specify the length in your control file (for example, declaring it in the control file as char(4000)), or else you’ll get an aggravating error message warning that the data you’re loading is longer than the column in which you’re trying to load it. I spent some time looking very carefully at the load table trying to see what I was missing before I googled and found out that char fields do have default sizes in sqlldr control files.

And the bindsize and rows parameters are related, in terms of the amount of data that sqlldr can push into a table before it commits. You can make rows very big, but if bindsize is too small (it defaults to 64k, apparently) the commits will happen sooner than they need to. For more explanation and other performance tips, see this page.
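
A rough way to see the interaction is to work out how many rows actually fit in the bind array. The sketch below is plain arithmetic in JavaScript, not anything sqlldr provides. The function name, the 65536 byte default (taken from the 64k figure above) and the per-row byte estimate are all assumptions for illustration.

```javascript
// Back-of-the-envelope estimate, not sqlldr's exact algorithm:
// the rows sent per commit are capped by how many rows fit into bindsize.
function effectiveRowsPerCommit(rows, maxRowBytes, bindsize) {
    if (bindsize === undefined) bindsize = 65536; // assumed default, per the text
    var rowsThatFit = Math.floor(bindsize / maxRowBytes);
    return Math.max(1, Math.min(rows, rowsThatFit));
}

// rows=10000 with roughly 4000-byte rows
console.log(effectiveRowsPerCommit(10000, 4000));          // 16
console.log(effectiveRowsPerCommit(10000, 4000, 8000000)); // 2000
```

With the assumed default, asking for 10000 rows still commits about every 16 rows; raising bindsize is what lets the larger rows value take effect.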

Overall, I’ve been very happy with how easy it is to load a fair bit of data, quickly (both in terms of load time and in development time) using sqlldr.

Databases Oracle Programming | moore | January 10, 2005

javascript and checkboxes

Ran into an interesting problem while I was using javascript today. I had a (dynamically generated) group of checkboxes that I wanted to be able to check and uncheck as a group. This was the code I had originally, which I had cribbed from one of the many fine javascript sites on the web:


function checkAll(field) {
   for (i = 0; i < field.length; i++) field[i].checked = true ;
}

This method was called by a link like this:


<a href="javascript:checkAll(document.form.checkboxes);">Check All</a>

All well and good, as long as the field that is passed into the function is an array of checkboxes. However, since javascript is a loosely typed language, you can reference any property on an object, and depending on how egregious the error is, the user might never see an error message. In this case, when the dynamically generated group of checkboxes has only one element, document.form.checkboxes is not an array of checkboxes but the checkbox itself, and its length attribute is undefined. The for loop is not executed at all, and the box is never checked.

The solution is simple enough, just check the type of object passed in:

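function checkAll(field) {
    if (field.type != 'checkbox') {
        // Several checkboxes: field is a collection, so check each one.
        for (var i = 0; i < field.length; i++) field[i].checked = true;
    } else {
        // A single checkbox: field is the checkbox itself.
        field.checked = true;
    }
}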

It makes a bit of sense why one checkbox wouldn't be an array of size one, but the switch caught me a bit off guard.  I'm trying to think of an analogous situation in the other dynamic languages I've used, but in most cases, you're either controlling both the calling and receiving code, or, in the case of libraries, the API is published.  Perhaps the javascript API documenting this behavior is published--a quick google did not turn anything up for me.
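
One way to keep this special case from leaking into every function is to normalize the field first. The helper below is only a sketch: toCheckboxArray and setAll are hypothetical names, and the plain objects in the usage lines merely stand in for what document.form.checkboxes returns in a browser.

```javascript
// Always return an array of checkbox-like objects, whether the form
// lookup produced a single element, a collection, or nothing at all.
function toCheckboxArray(field) {
    if (!field) return [];
    if (field.type === 'checkbox') return [field];
    return Array.prototype.slice.call(field);
}

// Check or uncheck every box in the group; returns how many were touched.
function setAll(field, checked) {
    var boxes = toCheckboxArray(field);
    for (var i = 0; i < boxes.length; i++) boxes[i].checked = checked;
    return boxes.length;
}

// Stand-ins for DOM objects, so the sketch runs outside a browser.
var single = { type: 'checkbox', checked: false };
var group = {
    0: { type: 'checkbox', checked: false },
    1: { type: 'checkbox', checked: false },
    length: 2
};
setAll(single, true);
setAll(group, true);
```

With the normalization in one place, the same function can also uncheck the group by passing false.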

Programming | moore | January 10, 2005

Useful tools: the catch all email address

When working on a web application that requires authentication, email address is often chosen as a username. It’s guaranteed to be unique, it’s something that the user knows rather than another username they have to remember, and communication to the user is built in–if they’re having trouble, just send them an email.

However, when developing the initial registration portion of a site that depends on email address for the username, you often run through many email addresses as you tackle development and bugs. Now, it is certainly easy enough to get more email addresses through Yahoo or hotmail. But that’s a tedious process, and you’re probably violating their terms of service.

Two other alternatives arise: you can delete the emails you want to reuse from the web application’s database. This is unsavory for a number of reasons. One is that mucking around in a database when you’re in the middle of testing registration is likely to distract you. Of course, if you have the deletes scripted, it’s less of an issue. You’ll need to spend some time ensuring you’ve reset the state back to a true pure place; I’ve spent time debugging issues that arose from anomalous user state that could never be achieved without access to the back end.

Which is why I favor the other option. If you own your own domain name and have the catch all key set, all email for your domain that does not have a specified user goes to the catch all account. (I wasn’t able to find out much about how this is set up, other than this offhanded reference to the /etc/mail/virtusertable file.)

I find having this available tremendously useful. You have an infinite number (well, perhaps not infinite, but very large) of addresses to throw away. At times, the hardest part is remembering which address I actually used, which is why having a system of some kind is useful. For example, for my dev database on my current project, I start all users with foo and then a number. For the stage database, I start all users with bar and then a number.
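
The naming system can be as simple as a prefix per environment plus a counter. Here is a sketch, assuming a catch all is already configured for the domain. The domain example.com, the function name and the environment keys are placeholders; the foo and bar prefixes follow the convention above.

```javascript
// Prefix per environment, following the foo/bar convention in the text.
var PREFIXES = { dev: 'foo', stage: 'bar' };

// Build the nth throwaway address for an environment on a catch all domain.
function throwawayAddress(environment, n, domain) {
    var prefix = PREFIXES[environment];
    if (!prefix) throw new Error('unknown environment: ' + environment);
    return prefix + n + '@' + domain;
}

console.log(throwawayAddress('dev', 1, 'example.com'));    // foo1@example.com
console.log(throwawayAddress('stage', 12, 'example.com')); // bar12@example.com
```

Because the prefix encodes the environment, a stray registration email in the catch all account immediately shows which database it came from.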

In addition to helping with development, it’s useful to have these throwaway email addresses when you are signing up for other web applications or posting on the web. For example, my jaas@mooreds.com account, which was posted on my JAAS and Struts paper, is hopelessly spammed. If I had put my real address on that paper, I would have much more spam than I do now, as jaas@mooreds.com simply goes to /dev/null courtesy of procmail. I’ve also used distinctive email addresses for blog comments and for subscribing to various mailing lists; this way I can find out if everyone really keeps their data as private as they say they will. Of course, there are free services out there that let you have throwaway email addresses but having your own domain gives you a bit more security and longevity.

All in all, I find that having a catch all email address set up for a given domain is a very useful development tool, as well as a useful general web browsing technique.

Programming Useful Tools | moore | December 29, 2004

Book Review: The Beast In the Garden

I just finished The Beast In the Garden: A Modern Parable of Man and Nature by David Baron. This non-fiction book is a quick read and outlines the comeback of the mountain lion, or cougar, along the Front Range, during the late 1980s to early 1990s. The cougar had been nearly wiped out by government bounties in the early part of the twentieth century, but the explosion of deer along the Front Range, along with revocation of that bounty, led to a comeback. In parts of the Denver metro area, mountain lions came to co-exist with human beings. This was especially true in Boulder, where the nature loving Boulderites assured a plentiful meat supply when they wouldn’t cull deer herds. The mountain lions grow familiar with human habits, learn that humans don’t mean danger, and end up mauling a high school student.

I really enjoyed the way the events were outlined, and Baron does a good job of making sure the science and character development are well balanced. He follows a few of the key players for the entire time, while bringing in other interesting characters, like the cougar hunter, as they appear. The science seems reasonable to me, though I haven’t taken a biology class since high school: large animals don’t have a natural aversion to humanity. Rather, this is a learned trait passed from generation to generation. Remove the killing that caused the aversion, and the animals will become more and more comfortable around humanity, to the point of considering humans a food source.

In the larger sense, though, this book is about managing wilderness, and realizing that as soon as you put a house up in a forest, you’ve changed the stakes. Humans love being around nature, but bleat for help as soon as nature threatens. In some ways, we want a Disneyland version of the forest–all of the beauty with none of the danger. You see this all the time with folks who build around national forest; as soon as fire season comes, they need to be protected. This is a thorny problem, and answers aren’t simple. The Beast In the Garden really is a parable, and I’m not sure we’ve learned the lessons.

Books | moore | December 26, 2004

ITConversations and business models

ITConversations is a great resource for audio conversations about technology. Doug Kaye, the owner/manager/executive assistant of IT Conversations, started a wiki conversation last month about that constant bugbear of all websites with free content: funding. Now, when you use 4.2 terabytes a month of bandwidth, that problem is more intense than average; the conversation is still a worthwhile read for anyone trying to monetize their weblog or open content source.

Technology | moore | December 24, 2004

New vs old technologies

Compare the truths outlined here: “…for many businesses, sticking with what they have is the cheapest choice and best ROI” with Rands’ comments on tool cruft.

Of course, engineers aren’t businesses. But they operate under some of the same constraints–deadlines, limited money, etc. Despite what Rands says, there’s a balance to be struck between the new and the old. Of course, most folks, including myself, tend to lean towards the old and the known because it feels safer. But the known is (often) safer. Dion talks about it here and likewise doesn’t come to any conclusions.

I don’t want to sound like an old fogey, but I’ve been burned in the past by short deadlines, new technologies and inexperienced users (of which I was one). I’m looking at Spring, having heard it praised to the sky, and want to use it on my next project. (Spring, incidentally, reminds me of a supercharged version of ATG’s Nucleus; what’s old is new again.) New tech is great, but not because it’s new. And old tech is safe, but not because it’s old. Each of these is appropriate when it’s the right tool for the job, but it’s hard to divorce that choice from my knee-jerk reactions and emotions–that’s what methods like ROI and research are designed to do.

Programming | moore | December 24, 2004

Precision and Accuracy in Software

Back in college, when I took first year physics lab, there was a section of the course that focused on teaching the difference between precision and accuracy in measurement. This distinction was crucial in experimental physics, since measurement is the bedrock of such experimentation. Basically, precision is how many digits of a measurement actually mean something. If I’m measuring the length of a room with my stride (and find it to be 30 feet long), the precision is less than if I were to measure the length of the room with a tape measure (and find it to be 33 feet, 6 and ¾ inches long). However, it’s possible that the stride measurement is more accurate than the length found with the tape measure, that is, it reflects how long the room actually is. (Perhaps there’s clothing on the floor which adds to the tape measurement, but which I stride over.)

These concepts aren’t just valid in physics; I think they’re also useful in software. When building a piece of software, I am precise if I build what I say I am going to build, and I am accurate if what I build actually meets the client’s business needs, that is, it solves the business problem. Almost every development tool either makes development more precise or more accurate.

The concept of precision lends itself easily to automation. For example, unit testing is rapidly gaining credence as a useful software technique. With unit testing, a developer writes test cases for each part of their code (often at the method level). The running of these tests ensures that code is actually doing what the developer thinks it is doing. I like writing unit tests; it gives me comfort to know that corner cases are taken care of and that changes to code can be fairly easily regression tested. Other techniques besides unit testing that help ensure precision include:

Round tripping: using a tool like TogetherJ, I can ensure that the model (often described in UML) and the code are in sync. This makes it easier for me to verify my mental model against the code.

Specification writing: The more precise a spec is, the easier it is to translate into code.

Compilers: the checking that occurs at compilation time can be very helpful in ensuring that the code is doing what I think it is doing–at a very low level. Obviously, this technique depends on the language used.
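
To make the unit testing point concrete, here is a minimal sketch that uses no test framework. The function under test, feetAndInches, and the tiny check helper are invented for illustration; a real project would more likely use a test runner.

```javascript
// Function under test: convert a length in inches to feet plus inches.
function feetAndInches(totalInches) {
    if (typeof totalInches !== 'number' || totalInches < 0) {
        throw new RangeError('length must be a non-negative number');
    }
    return { feet: Math.floor(totalInches / 12), inches: totalInches % 12 };
}

// Minimal assertion helper standing in for a test framework.
function check(description, condition) {
    if (!condition) throw new Error('FAILED: ' + description);
}

// Test cases, including the corner cases that are easy to get wrong.
check('zero length', feetAndInches(0).feet === 0 && feetAndInches(0).inches === 0);
check('exact feet', feetAndInches(360).feet === 30 && feetAndInches(360).inches === 0);
check('fractional inches',
    feetAndInches(402.75).feet === 33 && feetAndInches(402.75).inches === 6.75);

var rejectedNegative = false;
try { feetAndInches(-1); } catch (e) { rejectedNegative = e instanceof RangeError; }
check('negative length is rejected', rejectedNegative);
```

Each check pins down one thing the developer believes the code does, which is what makes a later change cheap to regression test.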

Now, precision is needed, because if I am not confident that I understand what the code is doing, then I’m in real trouble. However, accuracy is much more important. Having a customer onsite is a great example of a technique to ensure accuracy: you have a business domain expert available all the time for developers’ questions. In this situation, when a developer stumbles across a part of the business problem that they don’t quite understand, they don’t do what developers normally do (in order of decreasing accuracy):

1. Ask another developer, which works great if the target audience is developers, but not so well otherwise.
2. Best approximation (read: guess at the correct answer).
3. Ignore the issue. (‘I’ve got a lot more code to write before I can go home today, and we’re shipping in two weeks. We’ll just let the customer discover it and deal with it as a bug.’)

Instead, they have a real live business person, to whom this software really matters (hopefully), who they can ask. Doing this makes it much more likely that the final solution will actually solve the business problem. Other techniques to help improve accuracy include:

Issue tracking software (I use Bugzilla): Having a place where questions and conversations are recorded is truly helpful in making sure the mental model of the business user and the programmer are in sync. Using a web based tool means that non-technical users can participate and contribute.

Specification writing: A well written spec allows both the business user and developer to have a sense of what is being built, which means that the business user can correct invalid notions at an early stage. However, if a spec is too detailed, it can be used to justify precision at the cost of accuracy (‘hey, the code does exactly what’s specified’ is the excuse you’ll hear).

Spring and other dependency injection tools, as well as IDEs: These tools help accuracy by decreasing the costs of changing code.

Precision and accuracy are both important in software engineering. Perhaps the best way to characterize the two concepts is that precision is the mapping of the programmer’s model of the problem to the computer’s model, whereas accuracy is the mapping of the business’ needs to the programmer’s model. However, though both are needed, accuracy is much harder to obtain. Knowing that I’m building precisely what I think I’m building is beneficial only insofar as what I think I’m building is actually what the customer needs.

Programming | moore | December 20, 2004
