

February Boulder Denver New Tech Meetup Notes

I went to the Boulder Denver New Tech meetup tonight and, boy, was it a good time. I've been to a few Boulder Java Users Groups, and some more academic talks at CU, but this was different from both. At the BJUGs, it's typically a bunch of geeks and a very technical topic. At the CU colloquia, it's an academic crowd, with a lot of focus on academic questions, and an even more technical topic. This meetup, on the other hand, had, I felt, a nice mixture of technical folks and business folks. This was the sixth one held in Boulder–for more information, check out their website. I believe the format (5 minutes of presentation followed by questions) eliminates a lot of fluff. The presenters tonight were:

* David Cohen: www.techstars.org

David talked about his new organization, which recently was in the local press. Techstars.org will select 10 teams with technology ideas and fund them for a summer (to the tune of $15k). During that time, they’ll be mentored by a wide variety of successful local entrepreneurs and, at the end, have a chance to pitch their idea to angel investors. The strength of the team will be a large factor in determining the winners, and applications are due on March 31 (they’ve already had over 100 teams apply).

* Russ Bryant: GoUrban.net

Russ talked about the website his company is building, which focuses on the urban lifestyle demographic (he mentioned blacks and latinos in particular). It's particularly aimed at a segment of the population that has bad credit or none at all, and a large part of the business plan depends on selling prepaid debit cards to that population. Such a debit card will allow users to participate in internet purchasing, as well as gain the other benefits of debit cards (users can charge up the cards at stores around the country). GoUrban.net will also be a drop ship ecommerce site, and has a long list of partnerships.

* Elliot Turner: Orchestr8.net

Elliot focused on mashups for the ordinary user. His application, Alchemy Point, is a Firefox toolbar that aims to provide some of the functionality of GreaseMonkey, but for normal users. He mentioned that mashups are a great way to create a user-centric web, by allowing users to grab only what is interesting to them (as opposed to most websites, which serve a much broader audience). The toolbar comes with a number of preconfigured mashups ('make this text bold', 'put a map next to this address') that have been written by his company. Users can also create their own, using a simple XML syntax (a graphical UI is yet to come), and share them.

* Dennis Yu: www.thesocialcorp.com

Dennis talked about search engine marketing, which is the business of building campaigns on the major search engine sites. He said a campaign consists of keyword + ad copy + bid for the keyword. He showed an example of a campaign his company built for a New Year’s Eve ticket seller and said for every 9 cents the company spent on SEM, they got one dollar in sales. He also mentioned that there are a ton of MFA (made for AdSense) sites out there, and as an advertiser, you need to be aware of sites sending you worthless referrals and block them as soon as possible, as the click fraud happening is tremendous. Oh yeah, they’re also looking to help nonprofits use SEM.

* Fernando Cardenas: appventure.com

Fernando discussed the application his company is building to combat software failure. He cited figures that software failure costs $60 billion a year and that 25% of all software projects are just plain abandoned, and attributed much of that to the disconnect between developers and business folks. His solution is to put better tools in the hands of business folks, and have the tools create a set of business rules and a UI that the business people are happy with. Then, with the app 60-80% done, the business users can hand it off to the developers for fine tuning. I'm a bit skeptical, because I think there's no silver bullet and the hard part of any development conversation is grinding out the requirements. Plus, I've seen one too many scary Access applications (and built a scary Paradox application myself), and I'm not sure how maintainable the code generated by the tool will be. But I hope that his application succeeds, as there are a ton of places where simple applications could save a lot of scutwork. Thingamy is in the same space, I think.

All in all it was quite a nice night, and well worth attending. A lot of exciting energy in the air–it reminded me a bit of 1999 (I even saw a fellow with a Netscape fleece on!).

MolhadoRef: A refactoring-aware version control system

This past Thursday, I went to one of the CU Comp Sci Colloquia. The speaker was Danny Dig, and he discussed Refactoring-Aware Software Merging and Version Control for Object-Oriented Programs, in particular, MolhadoRef. Mr Dig billed this several times as the first refactoring aware source control system ‘that we are aware of’.

Being relatively new to IDEs in general and Eclipse in particular, I find that the automatic refactoring is one of the largest productivity enhancements that I’ve discovered. The ability to almost effortlessly change a variable name to more accurately reflect its purpose, and to know that such a change will propagate everywhere it is used, lets you update variable names more often. I was recently editing a piece of PHP code and one of the variables had changed slightly in purpose. I could have done a search and replace, but I wouldn’t be positive that the changes hadn’t broken something subtle. With Eclipse, there’s no such hesitation.

This is almost worth the cost of having to use the mouse and not being able to use vi keystrokes (at least, without the vi plugin). But when you throw in the other refactorings, including move method and rename method, well, refactoring becomes invaluable.

What follows are my notes from the talk.

Mr Dig discussed how refactoring is becoming more prevalent in today's development environments, and how this is challenging text-based version control systems such as CVS and SVN. The main issues are that refactoring causes global changes, while version control systems excel at tracking local changes, and that refactoring affects program elements, while version control systems focus on files. In short, refactorings are more likely than normal edits to create conflicts and/or merge errors, because with refactoring tools, changes are regularly no longer limited to a single file.

In addition, when you move members between classes, you can lose version history–the only connection back to the original location is a commit comment, which is neither dependable nor robust.

MolhadoRef aims to change this by making the version control system refactoring aware. Currently MolhadoRef exists in 'research project' form as an Eclipse plugin, and is part of Mr Dig's PhD thesis work. (By 'research form', I think he means not ready for prime time. For example, in his demo, when you checked changes in, there was no support for a commit message.) There are plans to release it as an open source project, but it is not currently available. MolhadoRef is based on Molhado, an SCM system with support for OO software, on the back end, with Eclipse and a refactoring engine on the front end.

Basically, MolhadoRef can handle API-level merging, where 'API level' means any member declarations. It also supports normal text edits in the same manner as CVS or SVN, and it detects both compile-time and runtime merge conflicts.

On checkin, it follows this algorithm:

  1. Detect API change operations, using a 3-way comparison. These operations can also be tracked with tools such as Catchup or RefactoringCrawler.
  2. Detect and resolve conflicts. It attempts to do so in an automated fashion, but sometimes requires user input.
  3. Undo refactorings. Sometimes this is not entirely possible, but if it fails, MolhadoRef is intelligent enough to fall back to a text merge.
  4. Merge text changes. This happens on a method level, not on a file level.
  5. Replay refactorings. This can be hard, because you need to serialize what may have been two parallel series of refactorings. He went into some things I didn't understand here, including using graph theory to reorder the refactorings, or turning a refactoring into an enhanced refactoring (which changes not only the code, but also future refactorings in the chain).

Mr Dig conducted a case study and an experiment to prove the effectiveness of MolhadoRef. In the case study, he and a partner developed MolhadoRef separately for 3 weeks, and then merged the changes using both MolhadoRef and CVS. With 1267 lines of changed code, CVS had 36 conflicts, 41 compile errors, 7 runtime errors (determined by unit test coverage), and it took 105 minutes (including human time) to merge the two trees. In contrast, MolhadoRef had 1 conflict, 0 compile and runtime errors, and it took 1 minute to merge. Pretty impressive.

The experiment was to split 10 students (fairly experienced with Eclipse and Java) into two groups. Each group had a common task. At the end, the changes made by each person in the first group were merged with the changes made by each person in the second group, producing 25 merge sessions. Across these sessions, MolhadoRef led to 3.6 times fewer conflicts, 11.6 times fewer compiler errors, and 1.5 times fewer runtime errors. In addition, the time to merge was 3.5 times faster with MolhadoRef.

The larger picture that Mr Dig is pursuing is to perform automated upgrades of component based applications (pdf). If you’ve ever upgraded from one version of a java web framework to another, you’ll know how painful it can be, especially if the API has changed. According to Mr Dig, 80% of the API changes in 5 projects he’s examined so far (4 open source, 1 proprietary) were due to refactorings. He told us why MolhadoRef fits into this picture, but I didn’t note it. Bummer.

He did say that both Microsoft and IBM were interested in his work. He also mentioned, I think, that Eclipse has already accepted some of his refactoring code.

/Notes

What does this mean for those of us who don't build version control systems? Well, not much right now, but if some of the bigger providers start to use this approach, it means that bigger, longer-lived projects can be better managed, that code history won't be lost as often, and that merge conflicts will decrease in severity and number. I for one can't wait for that day.

[tags]refactoring, version control systems[/tags]

Article on information security

This article views the software insecurity problem from an economics perspective. Makes sense to me, except he totally ignores the other costs that come along with liability–lawyers, lost productivity, insurance costs. Even so, those costs seem reasonable to me. I have a friend who is a general contractor and builds houses. He was astonished to find that computer consultants don't have to carry liability insurance. Perhaps it's time for that.

[tags]security, software liability[/tags]

Large varchar columns can lead to huge ESRI exports when using ogr2ogr

I was recently using ogr2ogr to convert, on the fly, some data in a PostGIS database to other standard formats (ESRI and MapInfo). The ESRI export in particular had some problems–it took about 4 minutes to export a table with 11K rows and 37 columns, and it generated a 700M dbf file. This file was then zipped (with the other config files), and in around 6 minutes was compressed to a 7M zip file, which was sent to the browser. Now, you can imagine how thrilled a user would be to wait 10 minutes for an export. Apache was timing out (the default timeout is 5 minutes) and I was at a loss as to how to address the performance issue.

I mentioned this to a colleague who has significantly more experience with GIS tools, and he pointed out that in the source table there were several varchar(4000) fields. Now, in PostgreSQL, [i]f the string to be stored is shorter than the declared length … values of type character varying [varchar] will simply store the shorter string. But the ESRI export does not do that–each varchar(4000) field was padded to a length of 4000, even though none of the fields approached that length.

The solution? A few simple select max(length(colname)) from table queries and alter table statements, and the varchar(4000) columns were decreased in size. The dbf file shrank to 50M, uncompressed, and the entire zip file decreased to 5M. As you can guess, the download time was slashed.
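
For reference, the cleanup looked roughly like this. This is a sketch rather than the exact statements I ran–the table and column names are invented, and it assumes PostgreSQL 8 or later for the ALTER TABLE ... TYPE syntax. Pick a new length comfortably above the observed maximum.

-- find the longest value actually stored in the oversized column (names are illustrative)
SELECT max(length(description)) FROM parcels;
-- suppose that returns 173; shrink the column to something sane
ALTER TABLE parcels ALTER COLUMN description TYPE varchar(256);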

Update 2/16: The kind members of the GDAL mailing list pointed me to a document listing all the limitations of the ESRI driver for ogr2ogr.  Check out the “Creation Issues Section”.

[tags]PostGIS, ESRI, ogr2ogr[/tags]

Uploading a file in php using an iframe

I recently wrote a PHP file upload application. Yawn, right? There’s a manual entry on this task, which tells you just how often folks need to do it.

Well, I threw in a few new tricks–they were new to me, at least. Basically, instead of posting to a page in the typical way, I post to a 1px by 1px invisible iframe, which then generates some javascript:

<form action="/upload.php" enctype="multipart/form-data"
method="post" target="target_upload">
<!-- file field and submit button; the field name here is illustrative -->
<input type="file" name="uploadfile"/>
<input type="submit" value="Upload"/>
</form>
<iframe id="target_upload" name="target_upload"
style="border: 0pt none; width: 1px; height: 1px"></iframe>

On the server side, upload.php processes the file and then writes a small script back into the iframe:

<script type="text/javascript">
if (parent && parent.uploadFinished) {
    parent.uploadFinished(<?php echo $result; ?>);
}
</script>

The next question is how to communicate from the iframe, which is handling the processing, to the parent window, where the user interface is. The upload process parses the file and puts it in a database. If there are any errors, the parsing code outputs "0"; otherwise, it outputs "1". This value is passed to a javascript function which is written to the iframe, as shown above, and the uploadFinished function basically shows divs containing an appropriate message.
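
For completeness, here is roughly what the parent page's side of that handshake can look like. This is a sketch: the div ids and the exact messages are invented, but the idea is simply to toggle a success or error message based on the value the iframe passes in.

// defined in the parent window; called by the script the iframe writes out
// the element ids are illustrative, not from the original application
function uploadFinished(result) {
    var successDiv = document.getElementById('upload-success');
    var errorDiv = document.getElementById('upload-error');
    if (result == 1) {
        successDiv.style.display = 'block';
        errorDiv.style.display = 'none';
    } else {
        successDiv.style.display = 'none';
        errorDiv.style.display = 'block';
    }
}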

How is this superior to a regular page post? Not by much. It's not truly asynchronous–you still see the browser wheel spin. The only browser actions you can take while an upload is proceeding are those that open in a new window. That's not much, but it's more than a normal post allows you to do. In addition, this is relatively easy to do, and you don't have to deal with creating a separate 'successful upload' page.

[tags]PHP, file upload[/tags]

Installing the median user defined function on MySQL

I just re-read "How To Lie With Statistics", which is so good I think it should be required reading in every middle school. In it, the author makes the point that there are three kinds of 'averages': arithmetic mean, median and mode (here I am, contributing to Wikipedia's dominance, due to my laziness in looking up alternative definitions of statistical concepts). In general, the median is the most informative average, because it's not skewed by a small number of outliers; for example, the median of {1, 2, 2, 3, 100} is 2, while the arithmetic mean is 21.6. But mysql (and other databases I've worked on) doesn't natively support the median, whereas I believe most support average (by which they mean 'arithmetic mean'). Sure, you can use a stored procedure (as suggested here for PostgreSQL). However, I'm working with MySQL 4, which does not support stored procedures. There is another solution: user defined functions. These are like stored procedures, except you have to write them in C (or C++).

Now, I'm not a C programmer. Luckily, someone has written and released a set of mysql user defined functions that includes median (as well as many other statistical manipulations). The bad news is that it hasn't been updated in years. The good news is that with a bit of luck and many downloads, I was able to get the median function working on mysql, both on Windows as a DLL and on Linux as a shared library. To repeat, I am not a C programmer, so if you see any head-thumping errors below, please let me know and I'll update this document.

First off, I was working with these versions of mysql: c:\Program Files\MySQL\MySQL Server 4.1\bin\mysql.exe Ver 14.7 Distrib 4.1.10a, for Win95/Win98 (i32) and mysql Ver 14.7 Distrib 4.1.7, for pc-linux (i686)

To get median working on windows, you need to:

  1. Download the mysql-udf tarball.
  2. Patch the files if you're running a version of mysql greater than 4.1.1. The patch is available here, or the patched tarball is here.
  3. Download and install Visual C++ Express. (If you have a C compiler on Windows, you can skip this step and the next. Oh, and the ones following that will probably be different.) Here's a blog post about creating a UDF using Visual Studio C++ 2003.
  4. Download and install the platform SDK; I only followed through step 3. If you don’t, you’ll get ‘windows.h’ errors when you try to compile the UDF.
  5. Untar the mysql-udf tarball. Patch if needed.
  6. Install the mysql header files. I was able to do this via the Windows Installer, which let me modify my existing mysql installation; I had to add the ‘C Include Files / Lib Files’ feature.
  7. Create a new directory. Copy udf_median.cc from the untarred directory to this new directory.
  8. Create a new file in that directory called udf_median.def. This file contains all the methods the UDF is exporting. Or you can just download the file I used here.
  9. Open Visual C++ Express
  10. Choose File / New / Project From Existing Code. Hit Next. Browse to the directory you just created. Create a name for the project. Hit Finish
  11. Edit the udf_median.cc file and comment out the #ifdef HAVE_DLOPEN line as well as the corresponding #endif. If I didn’t do this, I kept getting link errors, as I guess everything between those preprocessor directives was not being compiled.
  12. Add the mysql include files: right click on the project and choose properties. Expand ‘Configuration Properties’ then ‘C/C++’ and click ‘General’. On the right, add an include directory. Navigate to the Mysql include directory and add that.
  13. Add the module definition file: right click on the project and choose properties. Expand ‘Configuration Properties’ then ‘Linker’ and click ‘Input’. Add ‘udf_median.def’ to the key ‘Module Definition File’.
  14. Make sure VC knows this is a DLL: right click on the project and choose properties. Expand ‘Configuration Properties’ and click ‘General’. Choose ‘Dynamic Library (.dll)’ for Configuration Type. If you don’t do this, you’ll get errors like: error LNK2019: unresolved external symbol _WinMain because the compiler thinks you’re trying to build an application.
  15. Right click on the project and choose ‘Build’. This gives you a DLL in the Debug directory.
  16. Copy the DLL to the bin directory of your mysql installation.
  17. Create the function by logging in to mysql and running this command: CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.dll';. (The user you log in as will need to have the ability to insert rows into the mysql tables.)
  18. Test and enjoy.

Deploying the UDF to linux is much simpler, mostly because you don’t have to install a compiler, linker, etc. I used ‘gcc (GCC) 3.3.4’.

  1. Download the mysql-udf tarball.
  2. Patch the files if you're running a version of mysql greater than 4.1.1. The patch is available here, or the patched tarball is here.
  3. Untar the mysql-udf tarball. Patch if needed.
  4. Edit the udf_median.cc file and comment out the #ifdef HAVE_DLOPEN line as well as the corresponding #endif.
  5. Compile and link the code. Do not use the instructions on the mysql-udf homepage; if you compile with those flags, you'll get this error when you try to add the function: mysql> CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';
     ERROR 1126 (HY000): Can't open shared library 'udf_median.so' (errno: 22 /usr/lib/udf_median.so: undefined symbol: _Znwj)
     Instead, use the command from this bug report: gcc -shared -lstdc++ -I /usr/include -I /usr/local/include -I /usr/local/mysql/include/ -o udf_median.so udf_median.cc
  6. Copy the shared library to a directory where mysql will see it. I put it in /usr/lib.
  7. Create the function by logging in to mysql and running this command: CREATE AGGREGATE FUNCTION median RETURNS REAL SONAME 'udf_median.so';. (The user you log in as will need to have the ability to insert rows into the mysql tables.)
  8. Test and enjoy (a quick test query is sketched below).
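
As a quick smoke test once the function is registered, something like the following should work. The table and values here are hypothetical; any numeric column will do.

-- hypothetical table, just to exercise the UDF
CREATE TABLE test_median (val INT);
INSERT INTO test_median (val) VALUES (1), (2), (2), (3), (100);
SELECT median(val) FROM test_median;  -- should return 2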

We have pushed this UDF to production with replicated servers and haven’t seen any issues with it yet.

I want to extend my thanks to:

[tags]median, mysql, user defined functions[/tags]

Comments on upgrading to version two of Google Maps

I recently upgraded a simple Google Map that I built last spring to display some of the cross country skiing around Boulder. You can see the original version here. I built this based on this XML.com article, using XMLHttpRequest to retrieve the data from the server and GMarker.openInfoWindowXSLT() with this XSL stylesheet to present the data.

I decided to upgrade this map to version two last week. Since openInfoWindowXSLT is no longer supported on every browser, I feared that the upgrade would take significant effort, even though the map is very simple. However, the upgrade ended up being easier than I thought it would be. To get started, I read the Google Upgrade Guide–this document explains just what changes were made in the API. The changes that affected my map (a short before-and-after sketch follows the list) included:

  • A few method name changes–centerAndZoom becomes setCenter
  • GPoint is no longer used to indicate a latitude and longitude location on a map, and its replacement, GLatLng, reverses the order of the constructor’s arguments.
  • Zoom levels are flipped around, with larger numbers now signifying higher resolutions
  • The biggest effort was modifying the code not to use the XSLT process for generating infoWindows. However, this was easier than I thought it would be. I simply wrote a javascript method that mimicked what the XSL had previously done. Sure, accessing the DOM elements was a bit of a hassle that required some debugging (that's the win of XSL–declarative DOM access), but the alternatives were either to ignore browsers that don't have built-in XSLT support (Safari) or to integrate AJAXSLT, a Google sponsored project that provides cross browser XSLT support. If this were a larger project that depended on more XSLT, I probably would have done the latter.
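
To illustrate the first three changes, here's a small before-and-after sketch. The coordinates and zoom levels are made up; the point is the renamed method, the GPoint-to-GLatLng switch with reversed argument order, and the flipped zoom scale.

// version 1 style: GPoint takes (longitude, latitude) and smaller zoom numbers mean higher resolution
var map = new GMap(document.getElementById("map"));
map.centerAndZoom(new GPoint(-105.28, 40.01), 4);

// version 2 style: GLatLng takes (latitude, longitude) and larger zoom numbers mean higher resolution
var map2 = new GMap2(document.getElementById("map"));
map2.setCenter(new GLatLng(40.01, -105.28), 13);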

Upgrading my (admittedly very simple) map took about 1.5 hours. Visit the new map and take a look at the code.

[tags]google maps upgrade[/tags]

Book Review: Google Maps API V2

Seven months ago, I wrote about Google Maps Gotchas. I mentioned Scott Davis’ Google Maps API Pragmatic Friday article, published by the Pragmatic Programmer folks. Well, a few things have happened since then. In April, Google released version two of their maps API (though they still haven’t set a date when version one will no longer be supported), Scott revised his article and I spent a tax deductible $8.50 to give it a read. What you’ll find below is my take on his article.

The good: first, the ordering was easy, and I received my custom PDF (complete with "Prepared Exclusively for Daniel Scott Moore" as a footer on every page) in less than 20 minutes. Scott explains in a very easy to understand fashion how to create a map. He also covers each of the API's javascript objects and how to use them. In particular, I thought the list of events and the objects that fire them (in the 'Events' chapter) was a good reference. Now, Google provides a class reference, but Scott's explanations are a bit easier to understand; here's a comparison for the GMarker class:

Google API:

A GMarker marks a position on the map. It implements the GOverlay interface and thus is added to the map using the GMap2.addOverlay() method. A marker object has a point, which is the geographical position where the marker is anchored on the map, and an icon. If the icon is not set in the constructor, the default icon G_DEFAULT_ICON is used.

After it is added to a map, the info window of that map can be opened through the marker. The marker object will fire mouse events and infowindow events.

Davis’ Book:

In the Core Objects section, we introduced the GLatLng. A GLatLng stores a Latitude / Longitude coordinate, but it doesn't offer you a way to visualize it on a map. A GMarker is the way to add GLatLngs to the map for display purposes. The GMarker constructor takes a GLatLng as the only required argument. Once we have the marker, we need to tell the map to display it; map.addOverlay(myMarker) should do the trick. (Objects that you superimpose over the map are called Overlays.) You can remove the marker using map.removeOverlay(myMarker). To remove all overlays, use map.clearOverlays().

var myPoint = new GLatLng(38.898748, -77.037684);
var myMarker = new GMarker(myPoint);
map.addOverlay(myMarker);

Theoretically a map can support an unlimited number of markers, but anecdotal evidence suggests that performance starts to slow down significantly after a hundred or so markers. (File under, “Doc, it hurts when I do this.”)

I liked the real world examples–the fact that you could click through and see the code Scott was writing about in action on his website is a real plus. In addition, he builds a decently complex example in Chapter 7 where the user can add and delete cities. He also gives a good warning about examples that use Gmap, rather than Gmap2.

However, there were some issues. Scott's coverage of the upgrade to version two of the API is, unfortunately, rather spotty (see his blog, the June release of that feature, and the April revision of the book). He also doesn't cover GDownloadURL, a convenience method for XMLHttpRequest processing, or the GUnload method. I'll freely admit that the maps API is a moving target, and some of the omissions above may be due to that.

However, there are other problems. Though billed as a beginner book, he omits what I consider to be one of the fundamental challenges of Google Maps development–the performance obstacles posed by large numbers of database driven markers (other than the comment mentioned above in the GMarker reference). In addition, he doesn't cover design options, nor cross browser issues (like the transparent PNG in IE issue).

In the last chapter, he mentions good examples of mapping websites, but omits references to other useful resources–something that even dead tree books provide. In particular, he doesn't mention mapki.com (a wiki full of useful user provided data) nor the Google Maps group (which some users consider a primary differentiator between Google and Yahoo Maps).

One final gripe is that the 75 pages of content that I expected were really only 45–text only filled about 60% of the column width. I expect that in articles I read for free on the web, but in books that I pay for, I like a bit higher content to page ratio.

In short, this ebook is a good choice for the first time Google Maps builder. This is due to the tutorial nature of much of the book, the examples, and the explanation of typical good javascript code, such as using anonymous functions for the event handlers. It is not entirely adequate in covering version 2 of the API, possibly due to API changes, and it ignores some of the more complex aspects of the API.

If you're looking for a folksy introduction to the Google Maps API, it's worth the $8.50 to have a coherent guide. If you've muddled through one Google Maps project, piecing together things from the API docs and various blogs, it becomes less worthwhile. And if you want some kind of discussion about complex Google Maps issues, this document is not the right place to look.

[tags]Scott Davis, Google Maps, Pragmatic Fridays[/tags]

11 Tips for Managing Virtual Employees

Via WebWorkerDaily, here are 11 tips for managing virtual workers. (Update, 6/2009: this link is dead, but here's the wayback machine's archived version.)
I have been working virtually on and off for over three years (mostly as a contractor). I think that most of Scott’s comments are spot on, except two of them.

#1 (It's not cheaper) says that you'll need hardware like webcams and wireless keyboards. I've done perfectly well with a laptop and cellphone. In fact, one of the great things about working virtually is that everyone can choose their equipment to suit their needs.

#4 (Metrics, Metrics, Metrics) is one I’d amend, rather than totally discount. I think what Scott is really saying is that you need some means of verifying that work is getting done. And if it is not, you want to know sooner rather than later, so the more incremental feedback you can get, the better. In software, this can be accomplished by metrics, but frequent releases can also serve the same purpose.

That said, all of his post is worth reading.

[tags]working virtually,virtual software development[/tags]