
Moving a Paradox application to PostgreSQL

I have a client that has an existing Paradox database. This database is used to keep track of various aspects of their customers, and is based on a database system I originally wrote on top of Notebook, so I’m afraid I have to take credit for all of the design flaws present in the application. This system was a single user Paradox database, with the client portion of Paradox installed on every computer and the working directory set to a shared drive location. It wasn’t a large system; the biggest table had about 10k records.

This system had worked for them for years, but recently they'd decided they needed a bit more insight into their customer base. Expanding the role of this database was going to allow them to do that, but the current setup was flawed. Paradox (version 10) often crashed, and only one user could be in at a time. I took a look at the system and decided that moving to a real client-server database would be a good move. This would also allow them to move to a different client later, whether Access (if they ever decided to install it) or possibly a local web server. This document attempts to detail the issues I ran into and the steps I followed to enable a legacy Paradox application to communicate with a modern RDBMS.

I chose PostgreSQL as the DBMS for the back end. I wasn't aware at the time that MySQL was recently freed for commercial use, but I still would have chosen PostgreSQL because of the larger feature set. The client had a Windows 2000 server; we considered installing a Linux box in addition, but the cost of new hardware and the increased maintenance risk led me to install PostgreSQL on the Windows 2000 server. With Cygwin's installer, it was an easy task. I followed the documentation to get the database up and running after Cygwin installed it. They even have directions for installing the database as a Windows service (it's in the documentation with the install), but since this was going to be a low-use installation, I skipped that step.
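
For what it's worth, once the Cygwin installer finished, getting a server running came down to a couple of commands along these lines (the data directory path is just an example, and the Cygwin PostgreSQL docs also have you start their IPC service first):

# create a data directory (example path) and start the server, logging to a file
initdb -D /usr/local/pgsql/data
pg_ctl start -D /usr/local/pgsql/data -l /usr/local/pgsql/server.log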

After PostgreSQL was up and running, I had to make sure that the clients could access it. This consisted of three steps:

1. Making sure that clients on the network could access the database. I had to edit the pg_hba.conf file and start PostgreSQL with the -i switch. The client's computers are all behind a firewall, so I set up the database to accept any connections from that local network without a password (a sketch of the relevant pg_hba.conf entry follows this list).

2. Installing the PostgreSQL ODBC driver and creating a system ODBC DSN (the link is for creating an Access db, but it's a similar process) for the new database on each computer.

3. Creating an alias in Paradox that pointed to the ODBC DSN.
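
For step 1, the pg_hba.conf change boiled down to one line like the following (the 192.168.1.0 network is only an example, and the exact column layout differs a bit between PostgreSQL versions, so follow the comments in the file itself), after which the postmaster gets restarted with -i so it listens for TCP/IP connections:

# trust connections from any machine on the (example) local network, no password
host    all    all    192.168.1.0    255.255.255.0    trust

pg_ctl restart -D /usr/local/pgsql/data -o "-i"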

Once these steps were done, I was able to query a test table that I had created in the PostgreSQL database. One thing that I learned quickly was that two different computers could indeed access PostgreSQL via the Paradox front end. However, in order to see each other's changes to the database, I had to hit Ctrl-F3, which refreshed from the server.

The next step was to move the data over. There are several useful articles about moving databases from other RDBMS to PostgreSQL here, but I used pxtools to output the data to plain text files. I then spent several days cleansing the data, using vi. I:

1. Converted the exported table names, which were in mixed case, to lower case. PG handles mixed case, but only if you put " around the table names, I believe.
2. Tried to deal with a complication from the database structure. I had designed it with two major tables, which shared a primary key. The client had been editing the primary key, and this created a new row in the database for one of the tables, but not the other. In the end, matching these up became too difficult, and the old data (older than a couple of years) was just written off.
3. Removed some of the unused columns in the database.
4. Added constraints (mostly not null) and foreign key relationships to the tables. While these had existed in the previous application, they weren’t captured in the export.
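
The constraint work was ordinary DDL; something along these lines, with hypothetical table and column names standing in for the real ones:

-- hypothetical names; the real tables came out of the Paradox export
ALTER TABLE customer ADD PRIMARY KEY (customer_id);
ALTER TABLE customer ALTER COLUMN name SET NOT NULL;
ALTER TABLE customer_detail
    ADD CONSTRAINT customer_detail_customer_fk
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id);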

Then I changed the data access forms to point to the new database. The first thing I did was copy each of the data access forms, so that the original forms would still work with the original database. Most of the forms were very simple to port—they were just lookup tables. I found the automatic form generator to be very helpful here, as I added a few new lookup tables and this quickly generated the needed update/insert forms.

However, I did have one customized form that caused problems. It did inserts into three different tables. After the database rationalization, it only inserted into two, but that was still an issue. Paradox needed a value for the insert into each table (one because it was a primary key, the other because it was a foreign key). I couldn't figure out how to have Paradox send the key to both inserts without writing custom code. So, that's what I did. I added code to insert first into the table for which the value was a primary key, and later to insert the value into the table for which it was a foreign key. It wasn't a pretty solution, and I think the correct answer was to combine the two tables, but that wasn't an option due to time and money constraints. I also made heavy use of the self.dataSource technique to keep lists limited to known values.
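
To be clear about what that custom code had to accomplish (the ObjectPAL itself isn't worth reproducing), the effective SQL was just two inserts sharing one key value, with made-up table and column names here:

-- the first insert creates the row that owns the key...
INSERT INTO customer (customer_id, name) VALUES (1001, 'New Customer');
-- ...and the second reuses the same value as a foreign key
INSERT INTO customer_detail (customer_id, notes) VALUES (1001, 'initial entry');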

After moving the forms over, I had to move one or two queries over (mostly query-by-examples, QBEs, which generated useful tables), but that was relatively straightforward; this was a helpful article regarding setting up some more complicated QBEs. Also, I found a few good resources here and here.

I also updated a few documents that referenced the old system, and tried to put instructions for using the new system onto the forms that users would use to enter data. I moved the original database to a different directory on the shared drive, and had the client start using the new one. After a bit of adjusting to small user interface issues, as well as the idea that more than one user could use the database, the client was happy with the results.

Customer service automation

Customer service, like everything else, has undergone a revolution in the past two hundred years. In the olden days there was a corner grocer who knew people personally and therefore could render excellent, customized service. Then the large department and grocery stores appeared on the scene. These large corporations could offer goods cheaply, but didn't know or care who their customers were nor what the customers were willing to buy. Now, companies are trying, via software and databases, to recreate the corner store insight and knowledge, but in a scalable fashion. Whether it is frequent shopper cards, coupons, or automated recommendations of books, companies are trying to use software to scale their ability to know what the customer wants, and hence give it to them.

This is good for the company, because if companies only try to sell what is wanted, they do not have to spend as much money advertising and maintaining unwanted inventory. In addition, such tactics aren’t as likely to annoy the customer as trying to sell the customer something undesired. This also builds customer loyalty, since the company gives the impression of caring about customers’ perceived needs. This is not a false impression: the company does care about customers’ needs, because satisfying these needs is the only way the company makes money.

This is also good for the customer because it gives them what they want, with minimal fuss. It also makes for cheaper goods in the long run, since companies aren't spending excessive amounts of money on ill-targeted customers; a single twenty-something with no children has no need for diapers, and sending them coupons for diapers only wastes resources.

However, there is a fly or two in the ointment of customer awareness via large databases. In stark contrast to the grocer, large companies are not the peers of their customers, and this inequality can lead to issues. In addition, the quality of the customer service provided by software, while better than no customer service, isn’t a replacement for human interaction. I am explicitly leaving aside the issues of privacy since they are murky and still being defined.

The corner grocer, whom companies are trying to emulate in service, was a member of the community. If he cheated a customer, word got around. If he was doing anything unethical, his peers and customers could apply neighborly pressure in order to rectify his behavior. And, most importantly, the knowledge he gained about his customers was counterbalanced by their knowledge of him as a neighbor. Few of these constraints operate on modern, large companies in anywhere near the same fashion. I'm not denying that people can affect the behavior of retailers with words, activism and lawsuits. However, changing the behavior of a large corporation is never going to be as easy as changing the actions of a local shop owner.

In addition to the difference in resources and power between customers and companies, it's also clear that the quality of service suffers. This isn't strictly related to the gathering of customer data, but the existence of such data inspires companies to cut costs by automating. In general, I believe that the service provided by any software is inferior to that provided by a real live human being. And by building these databases, companies are being seduced by the siren call of reducing human interactions—if software (a [relatively] fixed cost that scales well) can recommend a good book, who needs an employee (a recurring cost that scales poorly)? I realize that this may sound Luddite, but certainly in the current incarnation, I've found such software doesn't match up well with recommendations from real people.

In short, I believe that more and more companies are customizing and tailoring the customer experience in order to cut costs and build loyalty. But I also feel that there are significant downsides to such tailoring and I’m not sure that it’s worth it.

Turn off the download manager in Mozilla

I hate the download manager that Mozilla turns on by default. It's another window you have to Alt-Tab through, and it rarely has useful information for me. Granted, if I was on a modem, or downloaded files often, it might be more useful. But as it is, 90% of the time that it pops open, I don't even look at it until the download is done. In fact, I can't think of a single time when having the download manager has been useful for me, even though it's high on other people's lists of cool features in Mozilla (or, in this case, Firefox).

Luckily, you can turn it off in Mozilla 1.6 (I haven’t tried in earlier versions). Go to edit / preferences / download / … and choose the behavior you want when downloading files.

jad

If you’ve never used jad then you’re missing out on a great tool. Jad lets you easily decompile java class files. It may be shady legally, depending on what contracts you’ve signed, but it’s definitely useful in debugging and understanding behavior of java applications. It couldn’t be simpler to use. Just run

jad classfile.class

from the command line, and you get a java file (named classfile.java) in the same directory. The names of the variables aren’t fantastic (s1, s2…) but it sure beats reading the bytecode output of javap -c.
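
If you want to decompile a whole tree of class files rather than one at a time, jad takes a few flags for that; the usual incantation is something like the following, though you should double-check the flags against jad's own help output, since I'm going from memory:

# -o overwrite, -r restore package directories, -sjava use the .java extension, -dsrc put output under src/
find tree -name '*.class' -print | xargs jad -o -r -sjava -dsrc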

Note, it's free for noncommercial use, but if you want to use it commercially, contact the author for terms. And if you get a chance to download it from the above tripod.com link, grab it and store it someplace else, because the page is often unavailable due to exceeding its bandwidth limits.

Windows frustrations

I’m reading Hackers by Steven Levy right now. This book is about the first people to really program computers with enthusiasm and an eye towards some of the anarchic possibilities of the machine. And the obstacles they overcame were tremendous. Writing entire video games in assembly language, re-implementing FORTRAN for different platforms (heck, writing anything in FORTRAN at all is a trial), working with computers the size of entire building floors, dealing with the existing IBM priesthood… There were plenty of obstacles to getting work done with a computer back then.

And, there still are, I have to say. I'm currently writing this from my first laptop ever. I love it. The mobility, the freedom, especially when combined with a wireless network card. This computer came with Windows XP and I plan to leave Windows on this box, primarily so that I can do more J2ME development.

Now, the first thing any Unix user learns is that you shouldn’t log in as root any more than you absolutely have to. The reasons for this are many: you can delete system files unintentionally, there’s no log file to recreate disaster scenarios, and in general, you just don’t need to do this. The first thing I do every time I’m on a new desktop Unix box is download a copy of sudo and install it. Then I change the root password to something long and forgettable, preferably with unpronounceable characters. I do this so that there’s never any chance of me logging in as the super user again. I will say that this has caused me to have to boot from a root disk a time or two, but, on the other hand, I’ve never deleted a device file unintentionally.

Anyway, the purpose of that aside was to explain why I feel that you should always run your day to day tasks as a less privileged user. Even more so on Windows than on Unix, given the wider spread of Windows viruses and, to be honest, my lack of experience administering Windows. So, the first thing I did when I got this new computer was to create a non administrative user. Of course, for the first couple of days, I spent most of my time logged in as the administrative user, installing OpenOffice, Vim and other software. I also got my wireless card to work, which was simple. Plug in the card, have it find the SSID, enter the WEP key and I was in business.

That is, until I tried to access the Internet via my wireless card when logged in as the limited user. The network bounces up and down, up and down, and there doesn't seem to be anything I can do about it. Every second, the network changes status. To be honest, I haven't searched Google because I can't even think of how to describe the phenomenon. But, when I'm logged in as the administrator, it's smooth sailing. There are some things I plan to try, like creating another administrator and seeing if that account has similar problems. If that's the case, it's probably not the fact that my limited privilege account has limited privileges, but rather that the network software hasn't been made accessible to it. However, this situation is especially frustrating because the time when I least want to be logged in as an administrative user is when I'm most vulnerable to worms, viruses and rogue email attachments–that is to say, when I'm connected to the Internet.

I remember fighting this battle 3 years ago, when I was using Windows NT on a team of software developers. I was the only one (so far as I know) to regularly create and use a non-privileged account. Eventually, I just said 'screw it' and did everything as the administrative user, much as I'll do now after a few more attempts to make the unprivileged user work. Windows just doesn't seem to be built for this deep division between administrators and users, and that doesn't seem to have changed.

Dimensions of Application Configuration

Tom Malaher has written an excellent rant about the state of installing and configuring third party software. Since most programmers are definitely not at the bleeding edge of technology ("we need you to build another order entry system"), we all use third party software and understand some of his frustration. After all, it would be nice to be able to configure such software in any way we deemed fit, rather than having to deal with the dictates of the vendor.

Alas, such flexibility is not often found. Even among open source software, you can find rigidity. Of course, if you take the time, you can fix the problems, but the entire point of third party software is that you can use it 'out of the box,' thus saving time.

Tom gave a masterful analysis of the structural components of third party software. Though he repeatedly asks for comments and suggestions, I don’t have any to make regarding his ‘types of data’ delineation. However, I thought it would be worthwhile to examine configuration data more closely. (Eric S Raymond also covers configuration in general here.) In fact, I think there are a number of interesting facets that tie into making configuration data easy to version, store, and separate from other types of data.

1. App specific vs universal format

You can either have one configuration file (or one set of files) shared by every application (a la config.sys and win.ini), or you can have application-specific configuration files for every substantial installed application (a la sendmail.conf and /etc/*).

One set of files makes it easy for the user to know where the application they just installed is configured. It also ensures that all applications use roughly the same type of configuration: the same comment character, the same sectioning logic, the same naming conventions. It also means that you can use the operating system to manage the configuration files, rather than having each application write its own code to create and manage its configuration.

Having each application manage its own configuration files ensures that the configuration will be tailored to the application's needs. Some applications might need a hierarchical configuration file, where some sections inherit from others. Others can get by with a simple text file of name-value pairs. Another advantage of having separate configuration files is that, well, they are separate. This makes it easier to version them, as well as making it easier to tweak the configuration files, possibly to run multiple instances of one application.
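
To make the contrast concrete, here's roughly what the two styles look like; both files and all the settings in them are made up for the example:

# simple name/value pairs, one per line
log_file = /var/log/myapp.log
max_connections = 20

# hierarchical, with sections; here the application treats [reports]
# as inheriting anything it doesn't override from [defaults]
[defaults]
timeout = 30
output_dir = /tmp
[reports]
output_dir = /var/reports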

2. User vs system

This is closely related to the first differentiation. However, it is distinct, as it's possible to have a system-wide format for configuration that has specific areas for users, and to have an app-specific format that excludes any other application running on a given system. The crucial question is whether each user can have an independent configuration of a given application.

It's hard to argue against allowing each user to have an individual configuration, but in certain situations, disallowing it may make sense. If, for example, there are parameters whose change may drastically affect the performance of a system (the size of a TCP packet), or which may govern specific limited resources (the allocation of ports), then it may make sense to limit user-specific configuration. You may notice that my examples are all drawn from the operating system, and this may be one application where user-specific configuration is not a good idea, since the OS underlies all the other applications.

3. Binary vs text

There are two possible formats in which to store configuration information. One is eminently computer readable, minimizes disk usage, and increases the speed of the application. The other one is superior.

Binary configuration formats are quicker for the computer to read and take up less space on disk. However, they are prone to rot, as only the application that wrote the file can read and manipulate it. No one else can, and this unfortunately includes the poor programmer who needs to modify some behavior of the application years after it was written.

Text configuration files, on the other hand, parse more slowly and are bulkier. However, they can also be self-describing (check out this sample sendmail configuration file for a counterexample). This in itself is a win, because it gives a human being a chance to understand the file. In addition, such configuration files can be manipulated by the bevy of tools that can transmogrify them into something else (a bit of perl, anyone?). They can also be easily version controlled and diffed. Pragmatic programmers like text files (section 3.14) for many of the above reasons.
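
The version control point is worth a quick illustration: change one value in a text configuration file and the diff reads like plain English (the file and setting are invented for the example):

$ diff -u myapp.conf.orig myapp.conf
--- myapp.conf.orig
+++ myapp.conf
@@ -1,3 +1,3 @@
 # number of worker processes to start
-max_workers = 4
+max_workers = 8
 log_file = /var/log/myapp.log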

It’s clear that there are several different options when it comes to configuring any one particular application. Some of these are related, and some are orthogonal, but all of them deserve consideration when designing any application.

Checking the status of your files, using CVS

When I used CVS a few years ago, I remember a colleague writing a tremendous perl script that you could run from anywhere in the CVS source tree. It would let you know whether you had files that weren’t in CVS, needed to be updated, or were going to be merged. It was quite a nice piece of perl code, which essentially parsed the output of cvs status, and the information it output was quite useful at the end of a long bug fixing or coding session (“hey, what files did I change again?”). However, it also needed to be maintained and documented, as well as explained to users.

The other day, I stumbled on something which works almost as well, but is part of CVS already: cvs -qn up. The q option tells CVS to be quiet, and not chat about all the directories that it sees. The n option tells CVS not to make any changes on the filesystem, but just tell you what changes it would have made. Here’s some sample output:

[moore@localhost guide]$ cvs -qn up
? securityTechniques/NewStuff.rtf
M securityTechniques/InputValidation.rtf
M securityTechniques/SessionManagement.rtf
U securityTechniques/AuthenticationWorkingDraft.doc

M means that the file has been changed locally. ? means that the file exists locally, but not in the repository. U means that the file has changed in the repository, but has not yet been updated locally. For more information on the output of update, look here.
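
And since each line starts with a single status letter, the output pipes nicely into grep; for instance, to list only the files you've modified locally:

cvs -qn up | grep '^M '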

Use this command and never lose track of the files in your CVS tree again.

Book Review: Enterprise J2ME

Update 2/25/07: added Amazon link.

I go to Java Users Groups (yes, I’m struggling to get in touch with my inner geek) once every two or three months. Sometimes there’s an engaging speaker, but most of the time the fellow up front looks like he’s just swallowed a hot pepper, speaks like he has a permanent stutter, and answers questions like I’m speaking Greek. (I’m not making fun; I had a hard time when I was in front of a JUG too.) Regardless of the quality of the speaker, I gain something just by watching the presentation–he points out interesting technologies and usually has a list of resources at the end that I can use for further research.

I think Michael Yuan would be a great speaker at a JUG, as he seems to have a masterful understanding of Java 2 Platform, Micro Edition (J2ME). However, the true value of his book, Enterprise J2ME, was in its introduction of new ideas and concepts, and the extensive resource listings. This book is a survey of the current state of the art in mobile java technology. Whatever your topic is, except for gaming development, you’ll find some coverage here. Securing information on the device or network, XML parsing strategies, messaging architectures, and data synchronization issues are all some of the topics that Yuan covers.

My favorite chapter was Chapter 7, ‘End to End Best Practices.’ Here, Yuan covers some of the things he’s learned in developing his own enterprise applications, and offers some solutions to five issues that differ between the J2ME world and the worlds familiar to most Java developers: J2EE and J2SE. He offers capsule solutions to the issues of “limited device hardware, slow unreliable networks, pervasive devices, ubiquitous integration [and] the impatient user.” Later in the book, he explores various architectures to expand on some of these capsules.

However, the strength of this book, exposing the reader to a number of different mobile technologies, is also its weakness. JUG speakers very rarely dive into a technology to the point that I feel comfortable using it without additional research; I usually have to go home, download whatever package was presented, and play with it a bit to get a real feel for its usefulness. This book was much the same. Some of the chapters, like chapters 12 and 13, which cover issues with databases on mobile devices (CDC devices, not CLDC devices), weren't applicable to my kind of development, but you can hardly fault Yuan for that. Some of the later chapters felt like a series of 'hello world' applications for various vendors. This is especially true of chapter 12, and also of chapter 20, which is a collection of recipes for encryption on the device.

Additionally, I feel like some of the points he raised in Chapter 7 are never fully dealt with. An example of this is section 7.3.3, "Optimize for many devices." The project I'm on is struggling with this right now, but I had trouble finding any further advice on this important topic beyond this one-paragraph section. However, these small issues don't take away from the overall usefulness of the book–if you are developing enterprise software, you'll learn enough from this book to make its purchase worthwhile.

However, I wouldn't buy the book if you're trying to learn J2ME. Yuan gives a small tutorial on basic J2ME development in Appendix A, but you really need an entire book to learn the various packages, processes and UI concerns of J2ME, whether or not you have previously programmed in Java. Additionally, if you're trying to program a standalone game, this book isn't going to have a lot to offer you, since Yuan doesn't spend a lot of time focused on UI concerns and phone compatibility issues. Some of the best practices about limited hardware may be worth reading, and if it's a networked game, you may gain from his discussion in Chapter 6, "Advanced HTTP Techniques." In general though, I'm not sure there's enough to make it worth a game developer's while.

I bought this book because I'm working on a networked J2ME application, and it stands alone in its discussion of the complex architectural issues that such applications face. It covers more than that, and isn't perfect, but it is well worth the money, should you be facing the kind of problems I am. Indeed, I wish I had had this book months ago, as I'm sure it would have improved my current application.

Link to book on Amazon.

An IP address is to DNS as a URL is to Google

I just read this post from Mike Clark. Now, I agree with some of what he says. It’s true that it is a whole lot easier to remember terms you were searching for than a URL. Words and concepts are just plain easier to remember than strings where the slightest mistype will give you a 404 error. That’s why we use DNS rather than just typing in IP addresses everywhere. However, IP addresses work almost all the time, even when the DNS server is down or misconfigured. If I know the IP address of a mail server, then I can still check my email even when I can’t resolve its domain name.

This is true of the search engine/URL dichotomy as well. Have you noticed the size of the uproar when Google changes PageRank? Every time a search engine changes its ranking algorithms, it wreaks havoc on any sites you've memorized via search terms. And search engines change their systems more often than DNS goes down. But cool URIs [URLs] don't change.

Another issue is that when it’s so easy to search vast amounts of information, you don’t end up looking anywhere else. This rant, which circulated a few months ago, highlights that issue. It’s almost like, if you can’t find something online, you can’t be bothered to find out about it. I do it myself. Even results of search engine queries don’t get fully explored. How often have you viewed anything other than the first page at google?

I understand the power and love of search engines, but folks, including myself, need to be sure to understand the implications of using them as shorthand for permanent links and/or shortcuts for true research.

How can you keep a website out of a search engine?

It's an interesting problem. Usually, you want your site to be found, but there are cases where you'd rather not have your website show up in a search engine. There are many reasons for this: perhaps because google never forgets, or perhaps because what is on the website is truly private information, such as personal photos or business documents. There are several ways to prevent indexing of your site by a search engine. However, the only surefire method is to password protect your site.

If you require some kind of username and password to access your website, it won't be indexed by any search engine robots. Even if a search engine finds it, the robot doing the indexing won't be able to move past the login page, as it won't have a username and password. Use a .htaccess file if you have no other method of authentication, since even simple text authentication will stop search engine robots. Intranets and group weblogs will find this kind of block useful. However, if it's truly private information, make sure that you use SSL, because .htaccess access control sends passwords in clear text. You'll be defended from search engines, but not from people snooping for interesting passwords.
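
If you're running Apache, a minimal .htaccess for this looks something like the following; the realm name and password file path are placeholders, the password file itself comes from the htpasswd utility, and the server has to permit overrides (AllowOverride AuthConfig) for the directory:

# require a valid login for everything in this directory
AuthType Basic
AuthName "Private Photos"
AuthUserFile /home/username/.htpasswd
Require valid-user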

What if you don't want people to be forced to remember a username and password? Suppose you want to share pictures of your baby with Grandma and Grandpa, but don't want to force them to remember anything, nor to allow the entire world to see your child dressed in a pumpkin suit. In this case, it's helpful to understand how search engines work.

Most search engines start out with a given set of URLs, often submitted to them, and then follow all the links in a relentless search for more content (for more on this, see this excellent tutorial). Following the links means that submitters do not have to give the search engine each and every page of a site, and it also implies that any page linked to by a submitted site will eventually be indexed. Therefore, if you don't want your site to be searched, don't put the web site's URL any place it could be picked up. This includes archived email lists, Usenet news groups, and other websites. Make sure you make this requirement crystal clear to any other users who will be visiting this site, since all it takes is one person posting a link somewhere on the web, or submitting the URL to a search engine, for your site to be found and indexed. I'm not sure whether search engines look at domain names from whois and try to visit those addresses; I suspect not, simply because of the vast number of domains that are parked, along with the fact that robots have plenty of submitted and linked sites to visit and index.

It's conceivable that you'd have content that you didn't want searched, but did want public. For example, information that changes rapidly: a forum or bulletin board where the content quickly gets out of date, or you're eBay. You still want people to come to the web site, but you don't want any deep links. (Such 'deep linking' has been an issue for a while, from 1999 to 2004.) Dynamic content (that is, content generated by a web server, usually from a relational database) is indexable when linked from elsewhere, so that's no protection.

There are, however, two ways to tell a search engine, "please, don't index these pages." Both of these are documented here. You can put this meta tag: <meta name="robots" content="none"> in the <head> section of your HTML document. This lets you exclude certain documents easily. You can also create a robots.txt file, which allows you to disallow indexing of documents on a directory level. It also is sophisticated enough to do user-agent matching, which means that you can have different rules for different search engines.
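
For example, a robots.txt that asks every robot to stay away from the entire site is just two lines, placed at the top level of the site:

User-agent: *
Disallow: /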

Both of these latter approaches depend on the robot being polite and following conventions, whereas the first two solutions guarantee that search engines won’t find your site, and hence that strangers will have a more difficult time as well. Again, if you truly want your information private, password protect it and only allow logins over SSL.