moore – Page 86 – Dan Moore!

The Economist citing Wikipedia

Well, Wikipedia has hit the big time, as far as I’m concerned. Check out this article on Bayesian reasoning and the human mind. It’s an interesting article, given that Bayesian filtering is used to fight spam. But what really blew me away was the figure entitled “Vital Statistics”, which is drawn from Wikipedia. The fact that The Economist, a major publication, is using it as a source is even more compelling than The Onion mocking it.

Technology and Society | moore | January 5, 2006

On ‘The Perils of Java Schools’

Joel has another interesting article, The Perils of Java Schools, where he laments the fact that many CS degrees are focusing on Java. His main points are that if you don’t focus on the harder parts of CS (recursion and pointers) then you don’t weed out inadequate programmers, and that Java doesn’t allow for adequate examination of those harder parts. Weeding is needed, even though the harder parts aren’t–for most jobs. (His example of the need for pointers is working on operating systems–how many programmers really need to do that? Darn few. [Of course, for those who are interested in working on operating systems, I’d recommend avoiding a Java-based CS degree.]) In addition to weeding out those who don’t have a talent for programming, recursion and pointers are great interview topics (see his Guerrilla Guide to Interviewing) for finding smart people to hire.

When I read his article, I had two related responses to his lament. The first is that coding isn’t the most important thing for many ‘programming jobs’ anymore. For a large number of them, the ability to relate to business problems and solve business needs is much more important. See this article for a related discussion on how to avoid being outsourced. A pure coder is more likely to be outsourced than a coder who also knows the business. I’d argue that at many organizations, a brilliant pure coder who can’t relate to the business folks is less effective than a decent coder who can extract requirements.

I don’t have a CS degree. It has definitely hurt me at times: I’m not as comfortable with some of the lower level constructs (parse trees, pointers) as other colleagues with a traditional CS degree. However, my liberal arts education has benefited me, because the writing and oral communication skills that I honed at college help me pinpoint what non-technical folks want to build. In fact, while building a system is fun, the real challenge and reward of software engineering is finding out what needs to be built, and figuring out how to build it. Both types of skills are necessary.

But the other point of his lament remains. How do you find intelligent software engineers, and how do you distinguish those who talk a good game from those who can actually play it? I was sitting around the poker table a few days ago and some friends were discussing MVC and n-tier architectures. It’s one thing to toss around those high falutin’ words; it’s another to understand the nitty gritty of building such systems, and I don’t. But I don’t think anyone who hasn’t worked on those large scale systems really does–CS degree or not.

I don’t know any one way to distinguish good workers. The closest thing to a methodology I have is to ask them questions about real world situations that stonkered me, and see if their answers make sense. Joel still makes sense in the Guerrilla Guide when he says you want people who are “smart and get things done”. But I believe that the focus of development has changed enough that the lack of C knowledge is not a loss. Just as the lack of punch card skills is not a loss.

(Note that I’ve used software engineers, developers, and programmers synonymously above, which may or may not be a justifiable abuse of the English language.)

Programming | moore | December 30, 2005

Mozilla, XPCOM and xpcshell

Most people know about mozilla through Firefox, their IE browser replacement. (Some geeks may remember the Netscape source code release.) But mozilla is a lot more than just a browser–there’s an entire API set, XPCOM and XUL, that you can use to build applications. (There are books about doing so, but mozilla development seems to run ahead of them.) I’m working on a project that needs some custom browser action, so looking at XPCOM seemed a wise idea.

XPCOM components can be written in a variety of languages, but most of the articles out there focus on C++. While I’ve had doubts about scripting languages and large scale systems, some others have had success heading down the javascript path. I have no desire to delve into C++ any more than I have to (I like memory management), so I’ll probably be writing some javascript components. Unfortunately, because XPCOM allows javascript to talk to C++, I won’t be able to entirely avoid the issue of memory management.

xpcshell is an application bundled with mozilla that allows me to interact with mozilla’s platform in a very flexible manner. It’s more than just another javascript shell because it gives me a way to interact with the XPCOM API (examples). To install xpcshell (on Windows) make sure you download and install the zip file, not the Windows Installer. (I tried doing the complete install and the custom install, and couldn’t figure out a way to get the xpcshell executable.)

One cool thing you can do with xpcshell is write command line javascript scripts. Putting this:

var a = "foobar";
print(a);
a=a.substr(1,2);
print(a);

in a file named test.js gives this output:

$ cat test.js | ./xpcshell.exe
foobar
oo

Of course, this code doesn’t do anything with XPCOM–for that, see the aforementioned examples.

I did run into some library issues running the above code on linux–I needed to execute it in the directory where xpcshell was installed. On windows that problem doesn’t seem to occur.
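As an aside on the substr call in test.js: substr takes a start index and a length, which is why "foobar" becomes "oo". A minimal sketch of the same extraction written with slice, which takes a start and an end index instead (the variable names here are only for illustration, and nothing in it is specific to xpcshell):

```javascript
const original = "foobar";

// substr(start, length): the form used in test.js above.
const viaSubstr = original.substr(1, 2);

// slice(start, end): the end index is exclusive, so end = start + length.
const viaSlice = original.slice(1, 1 + 2);

// substring(start, end) behaves like slice for non-negative indexes.
const viaSubstring = original.substring(1, 3);
```

All three calls select the same two characters; only the meaning of the second argument differs.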

A few other interesting links: installing xpcshell for firefox, firefox extensions with the mozilla build system, a javascript library easing XPCOM development, and another XPCOM reference.

Technology | moore | December 20, 2005

On contracting

I recently (within the last couple of months) took a full time job as a software developer. After two years of contract software development, I have found it to be quite a change. I think that both contracting and full time employment have good things to offer; if you can manage it economically, contracting is well worth doing for a while. Why? A number of reasons:

When you contract, you’re responsible for finding your own work and making sure you get paid. This alone is worthwhile, as it gives you a fantastic appreciation for sales and accounting. The networking to find your next gig is great experience for moving up the food chain in a company.

You also get exposed to different technologies and businesses. In my two years, I did work in 6 different programming languages, a number of different frameworks (both free and expensive) and build environments, and 3 or 4 operating systems. That was great–I grew technically and learned how to tell if another techie was competent (and what they were competent at!) relatively quickly.

But more important was the variety of business situations I worked in. I did work alone, with one other technical person, on a team with no dedicated QA, and on a team with dedicated QA and build staff. I did work with small companies, medium sized companies and enormous companies. I worked with startups and with established firms. Some of the organizations had excellent management structures–in others the inmates were running the asylum. And I paid much more attention than I would have if I had been an employee, because my paycheck rested on making sure that whoever was in charge was happy. This breadth of business experience is something that you cannot come by if you’re a full time employee.

Beyond that experience, there are also the time and money factors. If you want to control your time, contracting is great–I regularly took multiple week vacations, because I was willing to sacrifice earnings for them. On the flip side, if money is more important than time, you can certainly earn significant amounts of money when contracting, as you get paid by the hour.

Contracting is also a lot less stressful than an employee-employer relationship. In my opinion, in a proper employee-employer relationship, the employee is loyal to the goals of the company (which implies understanding them–itself a difficult matter) and helps achieve those goals. In addition, if an employee makes a technology decision, they are living with that decision for the foreseeable future. These factors combine to make work as a full time employee more stressful; more rewarding, but more stressful. In the contractor-client relationship, you still want to do a great job. But the permanence of technology decisions isn’t there–either you’re doing work that fits into an existing technology stack or you’re making technology decisions but will be moving on in some finite period of time. Your stress is limited to whether your invoices will get paid!

There are downsides to contracting, too. For one, you’re not part of a team. You may work in a team, and they may make an effort to include you, but you’re not really part of that team–you’re just a hired gun who will do the work and then head off. When the inevitable bug comes up 5 months down the road, they can call you, but doing so incurs a higher transaction cost than if they just had to grab a fellow employee for a minute or two. The flexibility of money and time that you have can cause resentment too.

While the breadth of technologies and business methods can be great to experience, it can also be difficult to process. To hop in and be productive on the first day or two of a new contract can be hard, because you want to make sure you’re fitting into the existing processes.

All of the above comments are based solely on my experience. But I’d say that if you’re considering contracting, do it! It’s a great experience. Make sure you have a buffer of 6 months of pay, and then jump in.

Programming | moore | December 18, 2005

Set up your own geocode service

Update, 2/9/06: this post only outlines how to set up a geocode engine for the United States. I don’t know how to do it for any other countries.

Geocoder.us provides you with a REST based geocoding service, but their commercial services are not free. Luckily, the data they use is public domain, and there are some helpful perl modules which make setting up your own service a snap. This post steps you through setting up your own geocoding service (for the USA), based on public domain census data. You end up with a Google map of any address in the USA, but of course the lat/long you find could be used with any mapping service.

First, get the data.


$ wget -r -np -w 5 --random-wait ftp://www2.census.gov/geo/tiger/tiger2004se/

If you only want the data for one state, put the two letter state code at the end of the ftp:// url above (eg ftp://www2.census.gov/geo/tiger/tiger2004se/CO/ for Colorado’s data).

Second, install the needed perl modules. (I did this on cygwin and linux, and it was a snap both times. See this page for instructions on installing to a nonstandard location with the CPAN module and don’t forget to set your PERL5LIB variable.)

$ perl -MCPAN -e shell
cpan> install S/SD/SDERLE/Geo-Coder-US-1.00.tar.gz
cpan> install S/SM/SMPETERS/Archive-Zip-1.16.tar.gz

Third, import the tiger data (this code comes from the Geo::Coder::US perldoc, and took 4.5 hours to execute on a 2.6ghz pentium4 with 1 gig of memory). Note that if you install via the CPAN module as shown above, the import_tiger_zip.pl file is under ~/.cpan/:


$ find www2.census.gov/geo/tiger/tiger2004se/CO/ -name \*.zip \
  | xargs -n1 perl /path/to/import_tiger_zip.pl geocoder.db

Now you’re ready to find the lat/long of an address. Find one that you’d like to map, like say, the Colorado Dept of Revenue: 1375 Sherman St, Denver, CO.


$ perl -MGeo::Coder::US -e 'Geo::Coder::US->set_db( "geocoder.db" );
my($res) = Geo::Coder::US->geocode("1375 Sherman St, Denver, CO" ); 
print "$res->{lat}, $res->{long}\n\n";'

39.691702, -104.985361

And then you can map it with Google maps.
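As a minimal sketch of that last step, here is some javascript that turns the printed "lat, long" line into a map link. The function names parseLatLong and mapUrl are hypothetical, and the URL shape assumes the present-day Google Maps search URL (api=1 plus a query parameter) rather than whatever was current when this was written, so treat it as an illustration only:

```javascript
// Parse a line such as "39.691702, -104.985361" into numbers.
// parseLatLong is a hypothetical helper, not part of Geo::Coder::US.
function parseLatLong(line) {
  const fields = line.split(",").map((s) => s.trim());
  const numbers = fields.map(Number);
  const valid =
    fields.length === 2 &&
    fields.every((s) => s !== "") &&
    numbers.every((n) => Number.isFinite(n));
  if (!valid) {
    throw new Error("expected 'lat, long', got: " + line);
  }
  return { lat: numbers[0], long: numbers[1] };
}

// Build a map link for a coordinate pair (assumed URL format, see above).
function mapUrl({ lat, long }) {
  const params = new URLSearchParams({ api: "1", query: `${lat},${long}` });
  return "https://www.google.com/maps/search/?" + params.toString();
}

const link = mapUrl(parseLatLong("39.691702, -104.985361"));
```

The same pair of numbers could just as easily be handed to any other mapping service; only mapUrl would change.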

Now, why wouldn’t you just use Yahoo!’s service (which provides geocoding and mapping APIs)? Perhaps you like Google’s maps better. Perhaps you don’t want to use a mapping service at all; you just want to find lat/longs without reaching out over the network.

Technology | moore | December 1, 2005

Running Tomcat on port 80

The typical java web application is fronted by a web server (usually Apache) for a number of reasons. Apache handles static content well, and also is easier to configure to listen on privileged ports (under 1024). I’ve written before about different options for connecting Tomcat and Apache, but there are times when all you need is a servlet engine, and installing Apache is overkill. If you don’t want users to see a nonstandard port in their url (http://foo.com:8080/webapp/), then you have a couple of options.

You can run tomcat as root. This is probably not a good idea, since anyone who can write a jsp can now execute arbitrary commands as root. I don’t know how Tomcat’s security is, but in general, the fewer applications running with super user privileges, the better.

If you share my dislike of Tomcat running as root, here’s an excellent rundown of the options for running Tomcat on port 80. I went the route of jsvc. This seemed to work just fine, though every time we shut down tomcat, we would get an entry in the error log file: jsvc.exec error: Service exit with a return value of 143.

That didn’t start to disturb me until I realized that the destroy methods of our servlets weren’t being called. These methods cleaned up after the servlets and it was important that they get executed. A bit of googling turned up a discussion of this very problem. The version of jsvc that ships with Tomcat 5.0.27 doesn’t shut down Tomcat very nicely.

I downloaded and compiled subversion, because that’s the version control system that the daemon jakarta project (of which jsvc is a part) used. I then checked out the version of the source tagged daemon-1_0_1 (svn co http://svn.apache.org/repos/asf/jakarta/commons/proper/daemon/tags/daemon-1_0_1/) and rebuilt jsvc. This new version allows tomcat to call the destroy methods of servlets, and everything seems to be happy.
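A side note on the number in that log line: by the usual Unix convention, a process terminated by a signal reports an exit status of 128 plus the signal number, and 143 - 128 = 15, which is SIGTERM on Linux. That is consistent with the JVM being killed rather than shut down in an orderly way, though I’m assuming that jsvc passes the JVM’s status along unchanged. A minimal sketch of the arithmetic in javascript (the function name and the small signal table are hypothetical, for illustration only):

```javascript
// Signal numbers as used on Linux; only a few common ones are listed.
const SIGNAL_NAMES = { 1: "SIGHUP", 2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM" };

// decodeExitStatus is a hypothetical helper, not part of jsvc or Tomcat.
// It applies the "128 + signal number" convention to an exit status.
function decodeExitStatus(status) {
  if (status > 128 && status < 256) {
    const signal = status - 128;
    return { signal: signal, name: SIGNAL_NAMES[signal] || "unknown" };
  }
  return null; // an ordinary exit code, not a signal
}

const decoded = decodeExitStatus(143);
```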

Java | moore | November 29, 2005

Amazon’s Mechanical Turk

I did some work a long time ago with Amazon Web Services; I gave them an email address and they periodically send me newsletters about their web services. The most recent one contained a link to an article about a new service: Amazon Mechanical Turk. This service provides ‘Artificial Artificial Intelligence’ and lets developers place tasks in front of humans in a scalable, standardized manner. Amazon, with their infrastructure, makes sure that the task is completed and pays the human who completes the task. Right now, I see only one set of tasks, sponsored by Amazon, so I’m not sure of the uptake. But this is certainly a fascinating idea–an interesting inverse of the normal computer/human relationship.

Technology and Society | moore | November 27, 2005

The Ghost of Missing Requirements

I read OK/Cancel sporadically, but the Halloween cartoon was just too good to not call attention to:

I think we’ve all been on such haunted projects.

Programming | moore | November 14, 2005

unescaping a string with PL/SQL

I’ve written about PL/SQL before, but I’ve recently started working on a project that uses it heavily. Given the amount of code written for Oracle databases, I’m rather surprised that there’s not a PL/SQL Cookbook, like the Perl Cookbook and the Java Cookbook (more cookbooks from O’Reilly are listed here). There is an Oracle Cookbook, but based on a quick scan of Amazon, it’s focused, as you’d expect, more on database design than on PL/SQL programming. (Interestingly, there is an Oracle+PHP cookbook, and a PL/SQL sample code page, but neither of those is quite what I’m looking for.)

The reason that I’d like a PL/SQL cookbook is that there are large sets of problems that routinely need to be solved in PL/SQL, but the language is so low level (though they just added some regex support in 10g; bravo!) that doing these routine tasks and making sure they’re correctly implemented can be difficult and tedious. This is especially true when it’s a programmer from a different language who’s used to higher levels of abstraction (like, for example, the good folks who author CPAN modules provide)–it’d be well worth my $70 to make sure that I never had to deal with a problem like, say, unescaping a string.

For that’s the problem I recently had to solve. Essentially, we have a string that looks like this: yellow,apple. This string represents two values, which need to be put in different places by splitting them up into ‘yellow’ and ‘apple’. All well and good until the possibility of embedded commas arises, for it’s possible that the desired end values were ‘yellow,blue’ and ‘apple,banana’. The answer, of course, is to escape the commas on the way in (turning the second input into something like this: yellow:,blue,apple:,banana), and when processing to unescape those special characters (both the comma and the escape character, which in the example is the colon). That’s what these three functions do. They take a string like the above examples and parse it into a table, to be iterated over at your leisure.

/* ------------------- function splitit ------------------*/
FUNCTION splitit(p_str VARCHAR2, p_del VARCHAR2 := ',', p_idx PLS_INTEGER, p_esc VARCHAR2 := ':')
RETURN INTEGER
IS
  l_idx                PLS_INTEGER;
  l_chars_before       VARCHAR2(32767);
  l_escape_char        VARCHAR2(1) := p_esc;
  l_chars_before_count PLS_INTEGER := 0;
BEGIN
  <<outer>>
  LOOP
    -- find the next delimiter at or after position p_idx
    l_idx := instr(p_str, p_del, p_idx);
    IF l_idx > 0 THEN
      -- count the run of escape characters immediately before the delimiter;
      -- the position check stops the scan at the start of the string, since
      -- substr treats position 0 as 1 and negative positions as offsets from the end
      WHILE l_idx - l_chars_before_count - 1 > 0
        AND substr(p_str, l_idx - l_chars_before_count - 1, 1) = l_escape_char LOOP
        l_chars_before_count := l_chars_before_count + 1;
      END LOOP;

      IF mod(l_chars_before_count, 2) = 0 THEN
        -- if chars_before_count is even, then we're at a segment boundary
        RETURN l_idx;
      ELSE
        -- if odd, then we're at an escaped delimiter, want to move past
        RETURN splitit(p_str, p_del, l_idx+1, p_esc);
      END IF;
      l_chars_before_count := 0;
    ELSE
      -- no delimiter left: return 0
      RETURN l_idx;
      EXIT outer;
    END IF;
  END LOOP;
END splitit;
/* ------------------- function splitit ------------------*/

/* ------------------- function unescape ------------------*/

FUNCTION unescape(p_str VARCHAR2, p_del VARCHAR2 := ',', p_esc VARCHAR2 := ':')
RETURN VARCHAR2
IS
  l_str VARCHAR2(32767);
BEGIN
  -- an escaped delimiter becomes a plain delimiter
  l_str := replace(p_str, p_esc||p_del, p_del);
  -- a doubled escape character becomes a single one
  l_str := replace(l_str, p_esc||p_esc, p_esc);
  RETURN l_str;
END unescape;
/* ------------------- function unescape ------------------*/

/* ------------------- function split ------------------*/

FUNCTION split(p_list VARCHAR2, p_del VARCHAR2 := ',')
RETURN split_tbl
IS
  l_idx          PLS_INTEGER;
  split_idx      PLS_INTEGER     := 0;
  l_list         VARCHAR2(32767) := p_list;
  l_chars_before VARCHAR2(32767);
  l_escape_char  VARCHAR2(1) := ':';
  -- the result table is pre-sized to ten empty elements
  l_array split_tbl := split_tbl('','','','','','','','','','');
BEGIN
  l_list := p_list;
  LOOP
    split_idx := split_idx + 1;
    -- stop once all ten slots are filled; any remaining input is dropped
    IF split_idx > 10 THEN
      EXIT;
    END IF;

    l_idx := splitit(l_list, p_del, 1, l_escape_char);
    IF l_idx > 0 THEN
      l_array(split_idx) := unescape(substr(l_list,1,l_idx-1), p_del, l_escape_char);
      l_list := substr(l_list,l_idx+length(p_del));
    ELSE
      -- last segment: unescape it too, so every element is treated alike
      l_array(split_idx) := unescape(l_list, p_del, l_escape_char);
      EXIT;
    END IF;
  END LOOP;
  RETURN l_array;
END split;
/* ------------------- function split ------------------*/

/* in the header file, split_tbl is defined */
TYPE split_tbl IS TABLE OF VARCHAR2(32767);

Not all of this code is mine–I built on a solution from a colleague. But I hope this saves one other person from the afternoon I just endured. And if you are a PL/SQL expert and care to critique this solution, please feel free.
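For comparison, here is a minimal sketch of the same escaping scheme in javascript. It is not a translation of the PL/SQL above: it scans the string once from left to right instead of counting escape characters backwards from each delimiter, and it assumes a single character delimiter and a single character escape. The function names (escapeValue, joinValues, splitValues) are hypothetical, chosen only for this illustration.

```javascript
const DELIM = ",";
const ESC = ":";

// Escape one value: prefix every delimiter or escape character with ESC.
function escapeValue(value, del = DELIM, esc = ESC) {
  let out = "";
  for (const ch of value) {
    if (ch === del || ch === esc) out += esc;
    out += ch;
  }
  return out;
}

// Build the combined string from a list of raw values.
function joinValues(values, del = DELIM, esc = ESC) {
  return values.map((v) => escapeValue(v, del, esc)).join(del);
}

// Split the combined string and unescape each piece in one pass.
function splitValues(str, del = DELIM, esc = ESC) {
  const parts = [];
  let current = "";
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === esc && i + 1 < str.length) {
      // An escape character: take the next character literally.
      current += str[i + 1];
      i++;
    } else if (ch === del) {
      // An unescaped delimiter ends the current value.
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

const combined = joinValues(["yellow,blue", "apple,banana"]);
const values = splitValues(combined);
```

With the inputs from the example above, combined is yellow:,blue,apple:,banana and values holds the two original strings again.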

Databases Oracle Programming | moore | November 12, 2005

Article on open formats

Gervase Markham has written an interesting article about open document formats. I did a bit of lurking on the bugzilla development lists for a while and saw Gervase in action–quite a programmer and also interested in the end user’s experience. I think he raises some important issues–if html had been owned by a company, the internet (as the web is commonly known, even though it’s only a part of the internet) would not be where it is today. If Microsoft Word (or WordPerfect) had opened up their document specification (or worked with other interested parties on a common one), other companies could have competed on features and consumers would have benefited. More on OpenDocument, including a link to a marked up version of a letter from Microsoft regarding the standard.

Technology | moore | November 3, 2005
