Programming – Page 16

sqlldr

I’ve been writing SQL*Loader scripts to load a fair bit of data into Oracle. I have a set of load tables with minimal constraints on them, into which SQL*Loader pushes the rows. Then I have written some PL/SQL which pulls from the load tables to the real database.

This architecture was chosen because the PL/SQL procedures can be written to allow incremental as well as full data loads. In the incremental case, it’s conceivable there there’d be a different way of pushing data over to the load tables (via ODBC or JMS, for example). In addition, the load tables can be denormalized, and you can put enough intelligence in the PL/SQL to turn your data structures into something at which a DBA won’t cringe.

Anyway, I thought I’d share a few tips, gleaned through the process. I’m definitely no SQL*Loader guru, but here are some useful links: the sqlldr FAQ, full of good information and recently updated, the Oracle Utilities page which does a great job of explaining all the options of SQL*Loader, and this case study which outlines internationalization with sqlldr. All very useful.

Two other tips: If you are loading delimited character data that is longer that 255 characters, you need to specify the length in your control file (for example, declaring it in the control file as char(4000)), or else you’ll get an aggravating error message warning that the data you’re loading is longer than the column in which you’re trying to load it. I spent some time looking very carefully at the load table trying to see what I was missing before I googled and found out that char fields do have default sizes in sqlldr control files.

And the bindsize and rows parameters are related, in terms of the amount of data that sqlldr can push into a table before it commits. You can make rows very very big, but if bindsize is too small (it defaults to 64k, apparently) the commits will happen sooner than they need to. For more explanation and other perforamance tips, see this page.

Overall, I’ve been very happy with how easy it is to load a fair bit of data, quickly (both in terms of load time and in development time) using sqlldr.

Databases Oracle Programming | moore | January 10, 2005

javascript and checkboxes

Ran into an interesting problem while I was using javascript today. I had a (dynamically generated) group of checkboxes that I wanted to be able to check and uncheck as a group. This was the code I had originally, which I had cribbed from one of the many fine javascript sites on the web:


function checkAll(field) {
   for (i = 0; i < field.length; i++) field[i].checked = true ;
}

This method was called by a link like this:


<a href="javascript:checkAll(document.form.checkboxes);">Check All</a>

All well and good, as long as the field that is passed into the function is an array of checkboxes. However, since javascript is a typeless language, you can call any method on an object, and depending on how egregarious the error is, the user might never see an error message. In this case, when the dynamically generated group of checkboxes has only one element, document.form.checkboxes is not an array of checkboxes, and its length attribute doesn’t return anything. The for loop is not executed at all, and the box is never checked.

The solution is simple enough, just check the type of object passed in:

function checkAll(field) {

    if (field.type != 'checkbox') {

        for (i = 0; i 

It makes a bit of sense why one checkbox wouldn't be an array of size one, but the switch caught me a bit off guard.  I'm trying to think of an analogous situation in the other dynamic languages I've used, but in most cases, you're either controlling both the calling and receiving code, or, in the case of libraries, the API is published.  Perhaps the javascript API documenting this behavior is published--a quick google did not turn anything up for me.

Programming | moore | January 10, 2005

Useful tools: the catch all email address

When working on a web application that requires authentication, email address is often chosen as a username. It’s guaranteed to be unique, it’s something that the user knows rather than another username they have to remember, and communication to the user is built in–if they’re having trouble, just have send them an email.

However, when developing the initial registration portion of a site that depends on email address for the username, you often run through many email addresses as you tackle development and bugs. Now, it is certainly easy enough to get more email addresses through Yahoo or hotmail. But that’s a tedious process, and you’re probably violating their terms of service.

Two other alternatives arise: you can delete the emails you want to reuse from the web application’s database. This is unsavory for a number of reasons. One is that mucking around in a database when you’re in the middle of testing registration is likely to distract you. Of course, if you have a the deletes scripted, it’s less of an issue. You’ll need to spend some time ensuring you’ve reset the state back to a true pure place; I’ve spent time debugging issues that arose for anomalous user state that could never be achieved without access to the back end.

Which is why I favor the other option. If you own your own domain name and have the catch all key set, all email for your domain that does not have a specified user goes to the catch all account. (I wasn’t able to find out much of hose this is set up, other than this offhanded reference to the /etc/mail/virtusertable file.)

I find having this available tremendously useful. You have an infinite number (well, perhaps not infinite, but very large) of addresses to throw away. At times, the hardest part is remembering which address I actually used, which is why having a system of some kind is useful. For example, for my dev database on my current project, I start all users with foo and then a number. For the stage database, I start all users with bar and then a number.

In addition to helping with development, it’s useful to have these throwaway email addresses when you are signing up for other web applications or posting on the web. For example, my jaas@mooreds.com account, which was posted on my JAAS and Struts paper, is hopelessly spammed. If I had put my real address on that paper, I would have much more spam than I do now, as jaas@mooreds.com simply goes to /dev/null courtesy of procmail. I’ve also used distinctive email addresses for blog comments and for subscribing to various mailling lists; this way I can find out if everyone really keeps their data as private as they say they will. Of course, there are free services out there that let you have throwaway email addresses but having your own domain gives you a bit more security and longevity.

All in all, I find that having a catch all email address set up for a given domain is a very useful development tool, as well as a useful general web browsing technique.

Programming Useful Tools | moore | December 29, 2004

New vs old technologies

Compare the truths outlined here: “…for many businesses, sticking with what they have is the cheapest choice and best ROI” with Rands’ comments on tool cruft.

Of course, engineers aren’t businesses. But they operate under some of the same constraints–deadlines, limited money, etc. Despite what Rands says, there’s a balance to be struck between the new and the old. Of course, most folks, including myself, tend to lean towards the old and the known because it feels safer. But the known is (often) safer. Dion talks about it here and likewise doesn’t come to any conclusions.

I don’t want to sound like an old fogey, but I’ve been burned before in the past by short deadlines, new technologies and inexperienced users (of which I was one). I’m looking at Spring, having heard it praised to the sky, and want to use it on my next project. (Spring, incidentally, reminds me of a supercharged version of ATG’s Nucleus; what’s old is new again.) New tech is great, but not because it’s new. And old tech is safe, but not because it’s old. Each of these is appropriate when it’s the right tool for the job, but it’s hard to divorce that choice from my kneed jerk reactions and emotions–that’s what methods like ROI and research are designed to do.

Programming | moore | December 24, 2004

Precision and Accuracy in Software

Back in college, when I took first year physics lab, there was a section of the course that focused on teaching the difference between precision and accuracy in measurement. This distinction was crucial in experimental physics, since measurement is the bedrock of such experimentation. Basically, precision is how many digits of a measurement actually mean something. If I’m measuring the length of a room with my stride (and found it to be 30 feet long), the precision is less than if I were to measure the length of the room with a tape measure (and found it to be 33 feet, 6 and ¾ inches long). However, it’s possible that the stride measurement is more accurate than the length found with the tape measure, that is, it reflects how long the room actually is. (Perhaps there’s clothing on the floor which adds tape measurement, but which I stride over.)

These concepts aren’t just valid in physics; I think they’re also useful in software. When building a piece of software, I am precise if I build what I say I am going to build, and I am accurate if what I build actually meets the client’s business needs, that is, it solves the business problem. Almost every development tool either makes development more precise or more accurate.

The concept of precision lends itself easily to automation. For example, unit testing is rapidly gaining credence as a useful software technique. With unit testing, a developer writes test cases for each part of their code (often at the method level). The running of these tests ensures that code is actually doing what the developer thinks it is doing. I like writing unit tests; it gives me comfort to know that corner cases are taken care of and that changes to code can be fairly easily regression tested. Other techniques besides unit testing that help ensure precision include:

Round tripping: using a tool like TogetherJ, I can ensure that the model (often described in UML) and the code are in sync. This makes it easier for me to verify my mental model against the code.

Specification writing: The more precise a spec is, the easier it is to translate into code.

Compilers: the checking that occurs at compilation time can be very helpful in ensuring that the code is doing what I think it is doing–at a very low level. Obviously, this technique depends on the language used.

Now, precision is needed, because if I am not confident that I understand what the code is doing, then I’m in real trouble. However, accuracy is much more important. Having a customer onsite is a great example of a technique to ensure accuracy: you have a business domain expert available all the time for developers’ questions. In this situation, when a developer stumbles across a part of the business problem that they don’t quite understand, the don’t do what developers normally do (in order of decreasing accuracy):

1. Ask another developer, which works great if the target audience is developers, but not so well otherwise.
2. Best approximation (read: guess at the correct answer).
3. Ignore the issue. (‘I’ve got a lot more code to write before I can go home today, and we’re shipping in two weeks. We’ll just let the customer discover it and deal with it as a bug.’)

Instead, they have a real live business person, to whom this software really matters (hopefully), who they can ask. Doing this makes it much more likely that the final solution will actually solve the business problem. Other techniques to help improve accuracy include:

Issue tracking software (I use Bugzilla): Having a place where questions and conversations are recorded is truly helpful in making sure the mental model of the business user and the programmer are in sync. Using a web based tool means that non-technical users can participate and contribute.

Specification writing: A well written spec allows both the business user and developer to have a sense of what is being built, which means that the business user can correct invalid notions at an early stage. However, if a spec is too detailed, it can be used to justify precision at the cost of accuracy (‘hey, the code does exactly what’s specified’ is the excuse you’ll hear).

Spring and other dependency injection tools, as well as IDEs: These tools help accuracy by decreasing the costs of changing code.

Precision and accuracy are both important in software engineering. Perhaps the best way to characterize the two concepts is that precision is the mapping of the programmer’s model of the problem to the computer’s model, whereas accuracy is the mapping of the business’ needs to the programmer’s model. However, though both are needed, accuracy is much harder to obtain. Knowing that I’m building precisely what I think I’m building is beneficial only insofar as what I think I’m building is actually what the customer needs.

Programming | moore | December 20, 2004

Why did the multithreaded chicken cross the road?

to To other side. get the

(from Some Assembly Required.)

Programming | moore | December 11, 2004

Useful tools: javap

javap lets you examine java class files and jar files in a number of ways. See this web page for more information. For me, it’s an API reference. I use it in two ways:

1. When I’m coding, and I need to know the exact syntax of a method, I shell out: javap java.util.StringTokenizer. (Yes, I know that any modern IDE will do this for you without shelling out, but javap will work anywhere java is installed and with any editing tool. You trade portability for convenience.) One large catch is that inherited methods are not shown:

$ javap java.io.BufferedReader
Compiled from "BufferedReader.java"
public class java.io.BufferedReader extends java.io.Reader{
public int read();
throws java/io/IOException
static {};
public void close();
throws java/io/IOException
public void reset();
throws java/io/IOException
public boolean markSupported();
public boolean ready();
throws java/io/IOException
public void mark(int);
throws java/io/IOException
public long skip(long);
throws java/io/IOException
public int read(char[],int,int);
throws java/io/IOException
public java.io.BufferedReader(java.io.Reader);
public java.io.BufferedReader(java.io.Reader,int);
public java.lang.String readLine();
throws java/io/IOException
java.lang.String readLine(boolean);
throws java/io/IOException
}

Running javap on java.io.BufferedReader does not show the method read(char[]), inherited from java.io.Reader. (This example is from the J2SE 1.4 libraries.)

2. Sometimes, the javadoc is too up-to-date (or your jar files are too old) to answer questions about an API. For example, I’m working on a project with Jetspeed which depends on Turbine version 2.2. Unfortunately, this is an extremely old version of Turbine (release 16-Aug-2003), and the javadoc doesn’t appear to be available. (Updated Dec 11: It looks like the Turbine 2.2 javadoc is indeed online. Whoops.) Generating the javadoc with ant is certainly an possibility, and if I found myself going time and again to verify the API of Turbine 2.2, I’d do that. But for a quick one- or two-off question about an API that no web search turns up, javap can be very handy.

In short, if you have a quick question about an API, javap can help you out.

Java Programming Useful Tools | moore | December 10, 2004

Useful tools: p6spy

This entry kicks off a series of entries where I’ll examine some of my favorite tools for development. Some of them will be long, some short, but all of them will highlight software I use to make my life a bit easier.

A large, large chunk of the development I do is taking data from a relational database to a an HTML screen, and back again. Often there are business rules for transforming the data, or validation rules, but making sure the data is stored safely and consistently is a high priority, and that means a relational database.

However, I do much of my work in java, which means that the relational-OO impedance mismatch is a common problem. One common way to deal with it is to use an OR tool–something like OJB or JDO. These tools provide object models of your database tables, usually with some help from you. You then have the freedom to pretend like your database doesn’t exist, and use these objects in your application. The OR framework takes care of the dirty work like SQL updates and caching.

Every convenience has its price, however, and OR mapping tools are no exception. The same abstraction that lets you pretend that you’re simply dealing with objects means that you cannot easily examine the SQL that is generated. In addition, the way that you’re using the objects may cause performance issues, because you’re treating the data as objects, rather than rows.

It’s much the same issue as calling methods over the network via RMI or accesing files via NFS: the abstraction is great and means that programmers don’t have to think about the consequences of remote access. But the failure of the abstraction can be catastrophic, all the more so because the programmer was not expecting to have to deal with the grotty details under the abstraction (that’s the whole point, right?).

OR tools do not fail often, or have many catastrophic failure modes, but they sure can be slow. With open source software, you can dig around and see how SQL is being generated, but that’s tedious and time consuming. With commercial products, you don’t even have that option. (Some OR tools may have their own ‘Show me the SQL’ switch–I haven’t run into them.)

Enter p6spy. p6spy can be used in place of any JDBC driver. You point it to the the real driver and it passes on any configuration or SQL calls to that driver. But p6spy logs every SQL statement passed to it and every result set passed back. (A fine non object oriented example of the Decorator pattern.)

It took me about 15 minutes to figure out how to use p6spy, the software is open source with decent documentation, the latest version has data source support, and it scratches an itch that most, if not all, java developers will have at some time. With p6spy, you can find out what that OR tool is doing under the covers–it’s an easy way to peel back some of the abstraction if needed.

Java Programming Useful Tools | moore | November 23, 2004

Koders.com–search source code

Koders.com has apparently indexed many open source software projects. (Link via Dion.) I played around with it a bit and I think it’s a very slick application. I’m of two minds about this, though.

The good:

Code reuse is good. A co-worker of mine once called it ‘editor inheritance’–in a world where people time is expensive and disk space is cheap, it can make sense (not always) to just copy code rather than figure out how to make a piece of code re-usable. Koders lets you do this in a more effective way.

It also lets coders easily compare and contrast styles between real live projects. And I can only imagine that soon some researcher will sink his teeth into all the code and publish on First Monday.

The bad:

As the linux-SCO lawsuits have shown, it’s technically awfully easy to cut and paste code, but the results end up being illegal. I can only see this repository, even though it differentiates by license, exacerbating this problem. And mixing and matching code from different licenses becomes all the easier as they show up side by side in a search engine. If I were a company concerned with legal ramifications, I’d tread softly around this tool.

The possibilities:

Regardless, I have to say it’s a very cool application. I’ll be interested to find out how much people will use it. What would be really cool is further analysis–after all google gets its power from the links between websites–what would we learn by examining the links between code? For one, you’d have a better idea how useful and stable a project is, if you could know how many other projects used it. Having a plugin into a UML modelling tool would be pretty slick too.

Programming | moore | November 18, 2004

Testing Korean content

I’m currently working on a site that needs to be truly localized for a large number of languages (tens of them). This is accomplished with large numbers of ResourceBundles, the MessageFormat class when variable text layout is needed, an Oracle backend which understands and doesn’t muck with UTF-8, an Access database which generates said bundles, and a crack team of translators.

However, how to test? Luckily, it’s fairly easy to have IE use a different language: clear instructions live here. One issue with the instructions is that they don’t tell you how to actually install a language pack. But this is easy too: I only had to right click on a page, choose the encoding menu, then choose more, and then the encoding I wanted (Korean, because I want to test double byte characters). I was then prompted to install a language pack. I accepted, and Windows downloaded a bunch of DLLs and other files. From then on I could view Korean characters (the encoding menu says I’m viewing ‘Unicode (UTF-8)’). Here’s a random site about mining that you can use to test your Korean language pack.

Don’t forget to test both the input and output of your application–saving user input, and being able to redisplay it is at least as important as being able to display what you draw from your ResourceBundle initially. As a bonus, the Korean character set that I installed via IE was made available to Firefox. This was done on the fly, not only did I not need to restart Windows, I didn’t even need to restart Firefox; I just needed to reload the page.

Programming | moore | November 12, 2004

Programming - 16. page

sqlldr

javascript and checkboxes

Useful tools: the catch all email address

New vs old technologies

Precision and Accuracy in Software

Why did the multithreaded chicken cross the road?

Useful tools: javap

Useful tools: p6spy

Koders.com–search source code

Testing Korean content

Letters to a New Developer

Pages

Subscribe

Socials

Categories

Archives