Three tech tips

Here are three items that I’ve found useful in the past, but aren’t worth an individual post because of their triviality.

1. Sometimes file archives are only available in .zip format. There are Unix programs out there that can unzip such archives, and Linux often ships with one. But sometimes it’s not installed. Lately, I’m almost always doing some kind of Java development, in which case you can use the jar command to extract the archive.

2. I generate an html page of all my rss feeds, using a custom perl hack (I wouldn’t go so far as to term it a script). (No newsgator for me! Did I mention I still use pine for email?) This can produce quite a big file, since I’m querying around 80 feeds. In an effort to reduce my bandwidth, which I pay for, I now gzip my rss feeds page, using CPU that I don’t pay for (well, not directly). And, while gzip may not be the most efficient of compressors, files in gzipped format can be transparently read in all the browsers I cared to test: Mozilla, Firefox, IE, and even lynx.
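A minimal sketch of the idea (the file names are examples): compress the generated page once, and serve the .gz with a Content-Encoding: gzip header so browsers decompress it transparently.

```shell
# Compress the generated feeds page; -c keeps the original around.
echo "<html>feeds</html>" > feeds.html        # stand-in for the real page
gzip -9 -c feeds.html > feeds.html.gz         # -9: best compression
gunzip -c feeds.html.gz                       # round-trips to the original
```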

3. Sometimes you just want the data from a mysql query in an easy format that you can pull into a spreadsheet and manipulate further. In the past, I would have written a quick perl script, using DBI, but after investigating the client options, I found another way. mysql -u user -B -ppass -e 'select * from my_data' databasename gives you nice tab-delimited output. I’ve used this with the mysql 4 client; since I couldn’t track down the mysql 3 manual, I’m not clear what version of the mysql client first supported these features.

Firefox customization

Firefox, the lightweight browser based on Mozilla, has been garnering quite a bit of attention lately. I’ve been a Mozilla user since 0.5, but only use the browser component, so I thought I’d give Firefox a try. It works great, and is very similar to IE (by design, no doubt). But browsing is a habit of mine, and, like anybody else, I don’t like to change my habits. Luckily, it was easy to change Firefox to fit my needs.

1. Have the search bar respond to my shortcuts (i for google images, g for google search, q for qwestdex search). This was no different than setting it up for Mozilla.

2. Firefox by default saves form entries. I don’t like that–it’s the paranoid in me. Easily changed: go to Tools / Options / Privacy / Saved Form Information and deselect the “Save information…” checkbox.

3. Firefox blithely closes a window when there’s more than one tab open. Wow! I don’t like that at all–Mozilla gives me a warning, and 99% of the time I was aiming at the wrong window or had forgotten that I had multiple tabs open. Feedster handed me this post, so I knew I wasn’t alone; a bit of searching on MozDev turned up this handy extension: tab warning. Installing this was a snap, and now my browsing experience is back to what I expected.

One problem I haven’t figured out how to fix: in Mozilla, when you open a link in a new tab, the new tab gains focus. In Firefox, the old tab remains in front.

Spamorama

I just ran across one of the most virulent pieces of weblog spam I’ve ever seen. It was an innocuous comment: ‘please help with my website…’ and the URL wasn’t obviously bad:

pseudobreccia60 DOT tripod DOT com DOT ve (please don’t visit this site!)

pseudobreccia, in case you’re wondering, is a kind of rock. ve is the Venezuelan country code. tripod DOT com DOT ve points to ns4.hotwired.com as its authoritative name server. The comment wasn’t blatantly off topic. So, I wasn’t super suspicious of the site.

Being a bit curious, I visited it. What you get is some kind of flash application. It seems innocent enough, just an ad and an under-construction sign. Viewing source shows you nothing, but every time you close the window, or change the location in the address bar, it pops up a new window with the same URL in it (I ended up having to shut the browser down entirely via the Process Manager before it would go away). But the payload is a periodic full-size pop-up window with advertisements for, what else, p0rn. Shocking, I know. But the persistence of the app was amazing. I almost wish I had a flash decompiler just to take a look at what it was doing.

I was doing all this in Mozilla–I can’t imagine what it tries to do to Internet Explorer (sets up itself as your homepage, adds itself to your favorites) and I don’t want to find out.

The people’s voice

Tim Bray points out Radio Vox Populi which is a really cool idea:

weblogs + web crawler + text-to-speech + mp3 streaming = talk radio for everyone.

Of course it could do with some filtering, or categorization, but it’s a cool idea. It actually jibes with an idea I’ve had for a long time, which is to use text-to-speech, perhaps Festival, to burn CDs of Project Gutenberg texts to create cheap books on CD (oh, should I listen to Boy Scouts on Motorcycles, by G. Harvey Ralphson or Armenian Literature, by Anonymous today?). That’d be cool, if you can handle listening to a robot voice.

An IM application server

I’ve written before about IM in the workplace. It’s becoming more and more prevalent, and other people have noticed this as well. IM is something that’s easy to use, and gives you the immediate response of the phone without being nearly as intrusive.

Now, in the past, using IRC, it was relatively easy to have a program, or bot, that would listen to conversations, or that you could ask questions of. They were dumb, but they worked. In the world of IM, I wasn’t aware of any easy way to do this. However, browsing freshmeat yesterday I discovered an easy way to write IM applications.

It’s called the SDBA Revolution Instant Messaging Application Server and building IM applications is fantastically easy if you use this perl framework. I was able to download it, and build a simple application in about 30 minutes. And that includes signing up for the usernames from AOL. It uses a perlish syntax and doesn’t support extremely complicated applications, but does offer enough to be useful. If you can code a php website, you can build an IM application. The author even provides six or so sample applications, including a database interface (scary!). The only issues I found with the IM app server were:

1. It doesn’t support Yahoo! That’s because the Yahoo! IM perl module has been unmaintained since the last Yahoo! protocol update.

2. I’m not sure of the legality of using a bot on a public service like AIM, MSN, or Yahoo!. Violations of these license agreements happen all the time, but, if you’re a stickler for those darn license agreements, this application server appears to work with Jabber.

Just goes to show you that 30 minutes a week browsing freshmeat or SourceForge will almost never be wasted. A bit of slack to do this will probably pay off in the long run.

Moving a Paradox application to PostgreSQL

I have a client that has an existing Paradox database. This database is used to keep track of various aspects of their customers, and is based on a database system I originally wrote on top of Notebook, so I’m afraid I have to take credit for all of the design flaws present in the application. This system was a single user Paradox database, with the client portion of Paradox installed on every computer and the working directory set to a shared drive location. It wasn’t a large system; the biggest table had about 10k records.

This system had worked for them for years, but recently they’d decided they needed a bit more insight into their customer base. Expanding the role of this database was going to allow them to do that, but the current setup was flawed. Paradox (version 10) often crashed, and only one user could be in at a time. I took a look at the system and decided that moving to a real client-server database would be a good move. This would also allow them to move to a different client if they ever decided to get Access installed, or possibly a local web server. This document attempts to detail the issues I ran into and the steps I followed to enable a legacy Paradox application to communicate with a modern RDBMS.

I chose PostgreSQL as the DBMS for the back end. I wasn’t aware at the time that MySQL had recently been freed for commercial use, but I still would have chosen PostgreSQL because of the larger feature set. The client had a Windows 2000 server; we considered installing a Linux box in addition, but the new hardware costs and increased maintenance risk led me to install PostgreSQL on the Windows 2000 server instead. With Cygwin’s installer, it was an easy task. I followed the documentation to get the database up and running after Cygwin installed it. They even have directions for installing the database as a Windows service (it’s in the documentation with the install), but since this was going to be a low-use installation, I skipped that step.

After PostgreSQL was up and running, I had to make sure that the clients could access it. This consisted of three steps:

1. Make sure that clients on the network could access the database. I had to edit the pg_hba.conf file and start PostgreSQL with the -i switch. The client’s computers are all behind a firewall, so I set up the database to accept any connections from that local network without a password.

2. Install the PostgreSQL ODBC driver and create a system ODBC DSN (link is for creating an Access db, but it’s a similar process) for the new database on each computer.

3. Creating an alias in Paradox that pointed to the ODBC DSN.
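Step 1 amounts to a pg_hba.conf record along these lines (the subnet below is an example, and the exact column layout varies between PostgreSQL versions):

```
# pg_hba.conf -- trust any connection from the local network, no password
# TYPE  DATABASE  IP-ADDRESS     IP-MASK          METHOD
host    all       192.168.1.0    255.255.255.0    trust
```

The -i switch (or tcpip_socket = true in postgresql.conf) is still needed so the postmaster listens on TCP/IP at all.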

Once these steps were done, I was able to query a test table that I had created in the PostgreSQL database. One thing I learned quickly was that two different computers could indeed access PostgreSQL via the Paradox front end. However, in order to see each other’s changes to the database, I had to hit Ctrl-F3, which refreshed from the server.

The next step was to move the data over. There are several useful articles about moving databases from other RDBMS to PostgreSQL here, but I used pxtools to output the data to plain text files. I then spent several days cleansing the data, using vi. I:

1. Exported table names were in mixed case; I converted them to lower case. PG handles mixed case, but only if you put double quotes around the table names, I believe.
2. Tried to deal with a complication from the database structure. I had designed it with two major tables, which shared a primary key. The client had been editing the primary key, and this created a new row in the database for one of the tables, but not the other. In the end, matching these up became too difficult, and the old data (older than a couple of years) was just written off.
3. Removed some of the unused columns in the database.
4. Added constraints (mostly not null) and foreign key relationships to the tables. While these had existed in the previous application, they weren’t captured in the export.

Then I changed the data access forms to point to the new database. The first thing I did was copy each of the data access forms, so that the original forms would still work with the original database. Most of the forms were very simple to port—they were just lookup tables. I found the automatic form generator to be very helpful here, as I added a few new lookup tables and this quickly generated the needed update/insert forms.

However, I did have one customized form that caused problems. It did inserts into three different tables. After the database rationalization, it only inserted into two, but that was still an issue. Paradox needed a value for the insert into each table (one because it was a primary key, the other because it was a foreign key). I couldn’t figure out how to have Paradox send the key to both inserts without writing custom code. So, that’s what I did. I added code to insert first into the table for which the value was a primary key, and later to insert the value into the table for which it was a foreign key. It wasn’t a pretty solution, and I think the correct answer was to combine the two tables, but that wasn’t an option due to time and money constraints. I also made heavy use of the self.dataSource technique to keep lists limited to known values.

After moving the forms over, I had to move one or two queries over (mostly query-by-examples, QBEs, which generated useful tables), but that was relatively straightforward; this was a helpful article regarding setting up some more complicated QBEs. Also, I found a few good resources here and here.

I also updated a few documents that referenced the old system, and tried to put instructions for using the new system onto the forms that users would use to enter data. I moved the original database to a different directory on the shared drive, and had the client start using the new one. After a bit of adjusting to small user interface issues, as well as the idea that more than one user could use the database, the client was happy with the results.

Turn off the download manager in Mozilla

I hate the download manager that Mozilla turns on by default. It’s another window you have to Alt-Tab through, and it rarely has useful information for me. Granted, if I were on a modem, or downloaded files often, it might be more useful. But as it is, 90% of the time that it pops open, I don’t even look at it until the download is done. In fact, I can’t think of a single time when having the download manager has been useful for me, even though it’s high on other people’s lists of cool features in Mozilla (or, in this case, Firefox).

Luckily, you can turn it off in Mozilla 1.6 (I haven’t tried in earlier versions). Go to edit / preferences / download / … and choose the behavior you want when downloading files.

Windows frustrations

I’m reading Hackers by Steven Levy right now. This book is about the first people to really program computers with enthusiasm and an eye towards some of the anarchic possibilities of the machine. And the obstacles they overcame were tremendous. Writing entire video games in assembly language, re-implementing FORTRAN for different platforms (heck, writing anything in FORTRAN at all is a trial), working with computers the size of entire building floors, dealing with the existing IBM priesthood… There were plenty of obstacles to getting work done with a computer back then.

And, there still are, I have to say. I’m currently writing this from my first laptop ever. I love it. The mobility, the freedom, especially when combined with a wireless network card. This computer came with Windows XP and I plan to leave windows on this box, primarily so that I can do more J2ME development.

Now, the first thing any Unix user learns is that you shouldn’t log in as root any more than you absolutely have to. The reasons for this are many: you can delete system files unintentionally, there’s no log file to recreate disaster scenarios, and in general, you just don’t need to do this. The first thing I do every time I’m on a new desktop Unix box is download a copy of sudo and install it. Then I change the root password to something long and forgettable, preferably with unpronounceable characters. I do this so that there’s never any chance of me logging in as the super user again. I will say that this has caused me to have to boot from a root disk a time or two, but, on the other hand, I’ve never deleted a device file unintentionally.
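That routine boils down to one line in /etc/sudoers (the username here is hypothetical; the file should only be edited via visudo):

```
# /etc/sudoers fragment -- full sudo rights for one account,
# so the root password never needs to be typed again
alice    ALL=(ALL) ALL
```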

Anyway, the purpose of that aside was to explain why I feel that you should always run your day-to-day tasks as a less privileged user. Even more so on Windows than on Unix, given the wider spread of Windows viruses and, to be honest, my lack of experience administering Windows. So, the first thing I did when I got this new computer was to create a non-administrative user. Of course, for the first couple of days, I spent most of my time logged in as the administrative user, installing OpenOffice, Vim and other software. I also got my wireless card to work, which was simple. Plug in the card, have it find the SSID, enter the WEP key and I was in business.

That is, until I tried to access the Internet via my wireless card when logged in as the limited user. The network bounced up and down, up and down, and there didn’t seem to be anything I could do about it; every second, the network changed status. To be honest, I haven’t looked on Google because I can’t even think of how to describe the phenomenon. But when I’m logged in as the administrator, it’s smooth sailing. There are some things I plan to try, like creating another administrator and seeing if that account has similar problems. If that’s the case, it’s probably not the fact that my limited-privilege account has limited privileges, but rather that the network software hasn’t been made accessible to it. However, this situation is especially frustrating because the time when I least want to be logged in as an administrative user is when I’m most vulnerable to worms, viruses and rogue email attachments–that is to say, when I’m connected to the Internet.

I remember fighting this battle 3 years ago, when I was using Windows NT on a team of software developers. I was the only one (so far as I know) to regularly create and use a non-privileged account. Eventually, I just said ‘screw it’ and did everything as the administrative user, much as I’ll do now after a few more attempts to make the unprivileged user work. Windows just doesn’t seem to be built for this deep division between administrators and users, and that doesn’t seem to have changed.

How can you keep a website out of a search engine?

It’s an interesting problem. Usually, you want your site to be found, but there are cases where you’d rather not have your website show up in a search engine. There are many reasons for this: perhaps because google never forgets, or perhaps because what is on the website is truly private information: personal photos or business documents. There are several ways to prevent indexing of your site by a search engine. However, the only sure fire method is to password protect your site.

If you require some kind of username and password to access your website, it won’t be indexed by any search engine robots. Even if a search engine finds it, the robot doing the indexing won’t be able to move past the login page, as it won’t have a username and password. Use an .htaccess file if you have no other method of authenticating, since even simple text authentication will stop search engine robots. Intranets and group weblogs will find this kind of block useful. However, if it’s truly private information, make sure that you use SSL, because .htaccess basic authentication sends passwords in clear text. You’ll be defended from search engines, but not from people snooping for interesting passwords.
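A minimal .htaccess sketch for basic authentication (the realm name and path are examples; the password file is created with the htpasswd utility):

```
AuthType Basic
AuthName "Private Area"
AuthUserFile /home/example/.htpasswd
Require valid-user
```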

What if you don’t want people to be forced to remember a username and password? Suppose you want to share pictures of the baby with Grandma and Grandpa, but don’t want to force them to remember anything, or to allow the entire world to see your child dressed in a pumpkin suit. In this case, it’s helpful to understand how search engines work.

Most search engines start out with a given set of URLs, often submitted to them, and then follow all the links in a relentless search for more content (for more on this, see this excellent tutorial). Following the links means that submitters do not have to give the search engine each and every page of a site, and it also implies that any page linked to by a submitted site will eventually be indexed as well. Therefore, if you don’t want your site to be searched, don’t put the web site’s URL any place it could be picked up. This includes archived email lists, Usenet news groups, and other websites. Make sure you make this requirement crystal clear to any other users who will be visiting this site, since all it takes is one person posting a link somewhere on the web, or submitting the URL to a search engine, for your site to be found and indexed. I’m not sure whether search engines look at domain names from whois and try to visit those addresses; I suspect not, simply because of the vast number of domains that are parked, along with the fact that robots have plenty of submitted and linked sites to visit and index.

It’s conceivable that you’d have content that you didn’t want searched, but that you did want public. For example, if the information is changing rapidly: a forum or bulletin board, where the content rapidly gets out of date, or you’re eBay. You still want people to come to the web site, but you don’t want any deep links. (Such ‘deep linking’ has been an issue for a while, from 1999 to 2004.) Dynamic content (that is, content generated by a web server, usually from a relational database) is indexable when linked from elsewhere, so that’s no protection.

There are, however, two ways to tell a search engine, "please, don’t index these pages." Both of these are documented here. You can put this meta tag: <meta name="robots" content="none"> in the <head> section of your HTML document. This lets you exclude certain documents easily. You can also create a robots.txt file, which allows you to disallow indexing of documents on a directory level. It also is sophisticated enough to do user-agent matching, which means that you can have different rules for different search engines.
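A small robots.txt sketch (the directory names are examples), showing a per-robot rule followed by a catch-all:

```
# robots.txt -- served from the site root; robots read it before crawling
User-agent: Googlebot
Disallow: /forum/

User-agent: *
Disallow: /photos/
Disallow: /private/
```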

Both of these latter approaches depend on the robot being polite and following conventions, whereas the first two solutions guarantee that search engines won’t find your site, and hence that strangers will have a more difficult time as well. Again, if you truly want your information private, password protect it and only allow logins over SSL.