
Amazon’s Mechanical Turk

I did some work a long time ago with Amazon Web Services; I gave them an email address and they periodically send me newsletters about their web services. The most recent one contained a link to an article about a new service: Amazon Mechanical Turk. This service provides ‘Artificial Artificial Intelligence’ and lets developers place tasks in front of humans in a scalable, standardized manner. Amazon, with their infrastructure, makes sure that the task is completed and pays the human who completes the task. So far, I’ve seen only one set of tasks, sponsored by Amazon, so I’m not sure of the uptake. But this is certainly a fascinating idea–an interesting inverse of the normal computer/human relationship.

Unescaping a string with PL/SQL

I’ve written about PL/SQL before, but I’ve recently started working on a project that uses it heavily. Given the amount of code written for Oracle databases, I’m rather surprised that there’s not a PL/SQL Cookbook along the lines of the Perl Cookbook and the Java Cookbook (more cookbooks from O’Reilly are listed here). There is an Oracle Cookbook, but based on a quick scan of Amazon, it’s focused, as you’d expect, more on database design than on PL/SQL programming. (Interestingly, there is an Oracle+PHP cookbook, and a PL/SQL sample code page, but neither of those is quite what I’m looking for.)

The reason that I’d like a PL/SQL cookbook is that there are large sets of problems that routinely need to be solved in PL/SQL, but the language is so low level (though they just added some regex support in 10g; bravo!) that doing these routine tasks and making sure they’re correctly implemented can be difficult and tedious. This is especially true for a programmer coming from a different language who’s used to higher levels of abstraction (like those provided by the good folks who author CPAN modules)–it’d be well worth my $70 to make sure that I never had to deal with a problem like, say, unescaping a string.

For that’s the problem I recently had to solve. Essentially, we have a string that looks like this: yellow,apple. This string represents two values, which need to be put in different places by splitting them up into ‘yellow’ and ‘apple’. All well and good until the possibility of embedded commas arises, for it’s possible that the desired end values were ‘yellow,blue’ and ‘apple,banana’. The answer, of course, is to escape the commas on the way in (turning the second input into something like this: yellow:,blue,apple:,banana), and then, when processing, to unescape those special characters (both the comma and the escape character, which in the example is the colon). That’s what these three functions do. They take a string like the above examples and parse it into a table, to be iterated over at your leisure.

/* ------------------- function splitit ------------------*/
-- Returns the position of the first unescaped delimiter at or after p_idx,
-- or 0 if there is none.
FUNCTION splitit(p_str VARCHAR2, p_del VARCHAR2 := ',', p_idx PLS_INTEGER, p_esc VARCHAR2 := ':')
RETURN INTEGER
IS
    l_idx                 PLS_INTEGER;
    l_chars_before        VARCHAR2(32767);
    l_escape_char         VARCHAR2(1) := p_esc;
    l_chars_before_count  PLS_INTEGER := 0;
BEGIN
    <<outer>>
    LOOP
        l_idx := instr(p_str, p_del, p_idx);
        IF l_idx > 0 THEN
            -- count the escape characters immediately before the delimiter;
            -- the position check stops the scan at the start of the string
            -- (substr treats position 0 as 1 and negative positions as
            -- offsets from the end, which would skew the count)
            WHILE l_idx - l_chars_before_count - 1 >= 1
              AND substr(p_str, l_idx - l_chars_before_count - 1, 1) = l_escape_char LOOP
                l_chars_before_count := l_chars_before_count + 1;
            END LOOP;

            IF mod(l_chars_before_count, 2) = 0 THEN
                -- if chars_before_count is even, then we're at a segment boundary
                RETURN l_idx;
            ELSE
                -- if odd, then we're at an escaped delimiter, want to move past
                RETURN splitit(p_str, p_del, l_idx + 1, p_esc);
            END IF;
            l_chars_before_count := 0;
        ELSE
            RETURN l_idx;
            EXIT outer;
        END IF;
    END LOOP;
END splitit;
/* ------------------- function splitit ------------------*/

/* ------------------- function unescape ------------------*/

FUNCTION unescape(p_str VARCHAR2, p_del VARCHAR2 := ',', p_esc VARCHAR2 := ':')
RETURN VARCHAR2
IS
    l_str VARCHAR2(32767);
BEGIN
    l_str := replace(p_str, p_esc || p_del, p_del);
    l_str := replace(l_str, p_esc || p_esc, p_esc);
    RETURN l_str;
END unescape;
/* ------------------- function unescape ------------------*/

/* ------------------- function split ------------------*/

FUNCTION split(p_list VARCHAR2, p_del VARCHAR2 := ',')
RETURN split_tbl
IS
    l_idx           PLS_INTEGER;
    split_idx       PLS_INTEGER     := 0;
    l_list          VARCHAR2(32767) := p_list;
    l_chars_before  VARCHAR2(32767);
    l_escape_char   VARCHAR2(1)     := ':';
    -- the table is pre-sized to 10 elements, which caps the number of segments
    l_array split_tbl := split_tbl('','','','','','','','','','');
BEGIN
    l_list := p_list;
    LOOP
        split_idx := split_idx + 1;
        -- stop after 10 segments; any remaining input is dropped
        IF split_idx > 10 THEN
            EXIT;
        END IF;

        l_idx := splitit(l_list, p_del, 1, l_escape_char);
        IF l_idx > 0 THEN
            l_array(split_idx) := unescape(substr(l_list, 1, l_idx - 1), p_del, l_escape_char);
            l_list := substr(l_list, l_idx + length(p_del));
        ELSE
            -- last segment: it may still contain escaped characters, so unescape it too
            l_array(split_idx) := unescape(l_list, p_del, l_escape_char);
            EXIT;
        END IF;
    END LOOP;
    RETURN l_array;
END split;
/* ------------------- function split ------------------*/

/* in the header file, split_tbl is defined */
TYPE split_tbl IS TABLE of varchar2(32767);
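
The functions above cover only the reading side. For the escaping "on the way in", here is a minimal sketch, assuming the same comma delimiter and colon escape character. The helper name escape_value is hypothetical and not part of the code above; it is declared locally in an anonymous block so the example runs on its own (with server output enabled).

```plsql
DECLARE
    l_joined VARCHAR2(32767);

    -- hypothetical helper: the inverse of unescape
    FUNCTION escape_value(p_str VARCHAR2, p_del VARCHAR2 := ',', p_esc VARCHAR2 := ':')
    RETURN VARCHAR2
    IS
    BEGIN
        -- double the escape character first, so that the escapes added
        -- in front of each delimiter afterwards are not doubled as well
        RETURN replace(replace(p_str, p_esc, p_esc || p_esc), p_del, p_esc || p_del);
    END escape_value;
BEGIN
    -- escape each value separately, then join them with the bare delimiter
    l_joined := escape_value('yellow,blue') || ',' || escape_value('apple,banana');
    dbms_output.put_line(l_joined);  -- prints yellow:,blue,apple:,banana
END;
/
```

The order of the two replace calls matters: escaping the delimiter first and the escape character second would also double the colons that were just added in front of each comma.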

Not all of this code is mine–I built on a solution from a colleague. But I hope this saves one other person from the afternoon I just endured. And if you are a PL/SQL expert and care to critique this solution, please feel free.

Article on open formats

Gervase Markham has written an interesting article about open document formats. I did a bit of lurking on the Bugzilla development lists for a while and saw Gervase in action–quite a programmer and also interested in the end user’s experience. I think he raises some important issues–if HTML had been owned by a company, the internet (as the web is commonly known, even though it’s only a part of the internet) would not be where it is today. If Microsoft Word (or WordPerfect) had opened up their document specification (or worked with other interested parties on a common one), other companies could have competed on features and consumers would have benefited. More on OpenDocument, including a link to a marked up version of a letter from Microsoft regarding the standard.

XHTML Compatibility in the mobile world

Here is an interesting outline of some of the issues faced in making mobile user interfaces work well with today’s technologies. What’s old is new again–browsers on cell phones are dealing with the same standards compliance and diversity issues that desktop browsers were faced with 10 years ago. The difference is that there’s no one (yet) with large enough market share to rule by fiat (like Navigator and then Internet Explorer did).

Cross browser javascript/css development issues

I’m working on an application that needs to be supported on a wide variety of browsers, and unfortunately includes some interesting javascript and css. There are three problems we’ve encountered so far.

1. Finding Browser share

When you want to support most users, you have to try to figure out what they’re using. There are at least three or four different sites which give you their browser share, but I think you have to pay if you want really accurate, detailed information; here’s one source, here’s another, and here’s one last site. Update, 11/3: here are stats for the www.bbc.co.uk homepage.

2. Javascript specifications

Perhaps it’s just me, but I’ve had a devil of a time finding a list of javascript events supported by various browsers. I’ll give it to Microsoft, they have some documentation on supported events; I couldn’t find a similar list of events anywhere on the mozilla site. Here’s the Mozilla Javascript page but I don’t see anything resembling an API there. (All I want is a javascript javadoc!) Here is the best comparison of event support on modern browsers that I found. Update 10/31: here is a list of events that Gecko recognizes.

3. Getting ahold of old browsers and older operating systems, so you can test

Luckily, this is fairly easy to solve. VMWare (which I’ve written about previously) takes care of the various operating systems (well, that and a Mac mini) that we need to test under. And a simple Google search turned up a fantastic archive of old browsers: browsers.evolt.org, which has many different browsers going all the way back to NCSA Mosaic!

JBoss at Work ships

JBoss at Work, by Thomas Marrs and Scott Davis, a book I technically reviewed, is shipping. Having read all of it, I’d say it’s worth a look both for the technical content–the authors take the reader all the way through a standard J2EE application, pointing out all the JBoss specific configurations and gotchas–and for the slightly whimsical and easy to read style. Sometimes it was a bit repetitive, but that’s not bad for a book aimed at getting folks without much experience up and running. Read the sample chapter on EAR building and deployment and see if it fits with your needs.

A quick survey of online map generation options

I have a client who wants to put some maps on his commercial website. I’ve done a bit of looking around, and it’s not clear to me what the best way to do it is. There are really two types of mapping services out there. One depends on URL creation, like MapQuest, MapBlast, Yahoo and Google–you don’t register or do much coding at all, you just create a GET string with the address encoded in it. The other is a web service where you register for a key and use an API to generate a map, like Yahoo, Google and MapPoint. You’ll note that Yahoo and Google appear on both of those lists–that’s because they provide both a URL interface and a more formal API.

Now, even though I am not a lawyer, it seems to me, via looking around at the various Terms Of Service (TOS), that commercial use of any of the URL interfaces is not an option. The Yahoo Maps TOS says

The data included in Yahoo! Maps, including but not limited to maps, routes, and/or directions (“Data”), is provided for your personal use only…

while the MapQuest TOS says

…MapQuest grants you a nonexclusive, non-transferable license to view and print the Materials solely for your own personal non-commercial use.

Google Maps, which has been extensively mashed up with other sorts of data, appears to abide by the general Google TOS, which says

The Google Services are made available for your personal, non-commercial use only. You may not use the Google Services to sell a product or service, or to increase traffic to your Web site for commercial reasons, such as advertising sales.

To be fair, you may contact Google about commercial services: [i]f you want to make commercial use of the Google Services, you must enter into an agreement with Google to do so in advance. Updated, 10/30: apparently the maps API is governed by a different TOS, which apparently allows commercial use “as long your site is generally accessible to consumers without charge”. My apologies. I didn’t look at MapBlast too carefully, because it’s built on the MapPoint web service, which has a noncommercial provision.

Luckily, at least for those of us in the United States of America, there are two services provided by the Census Bureau (see, those taxes you’re paying are worth something!) which provide mapping data. As far as I could find, these services have no limits on commercial or noncommercial use, but they are a bit hokier than the ones I laid out above. (Here are the Tiger TOS and the general Census position on resale.) The Tiger site was the preferable of the two, because it actually gives you a marker for your location. Of course, you have to geocode your address to find your lat/long, but Geocoder.us makes this easy, and even provides instructions on making your own service. The data for Tiger is from 1998, however. If you’re looking for more recent data, FactFinder is worth looking at. It didn’t work for my client because it provided no way to pinpoint a particular address, though it did allow you to recenter on one without geocoding it.

Neither of these provides directions, as far as I could see, so if you’re looking for that, or if you want the cooler interfaces of the private sector, you need to look to the web services.

MapPoint, which is a Microsoft service, explicitly denies external commercial use in its TOS:

MapPoint Web Service is for your individual use, solely for internal use by you for your business, or for your own personal use.

Yahoo and Google, however, take a somewhat more flexible position. For each of these services, according to their TOS, you need to contact them to use the web service they provide in a commercial context. (Yahoo Maps Web Service TOS, and the Google TOS for the web service, which is the same as that for the URL interface service.) I have no idea what kind of licensing agreement will emerge from talks with these companies, but, from reading their TOSes, it appears to me that if you want to use their data in a commercial manner, you need to have that conversation.

I’ve covered all the services that provide maps that I know of. Please let me know if there are any that I’ve missed or anything I’ve misinterpreted.