
MolhadoRef: A refactoring-aware version control system

This past Thursday, I went to one of the CU Comp Sci Colloquia. The speaker was Danny Dig, and he discussed Refactoring-Aware Software Merging and Version Control for Object-Oriented Programs, in particular MolhadoRef. Mr Dig billed this several times as the first refactoring-aware source control system ‘that we are aware of’.

Being relatively new to IDEs in general and Eclipse in particular, I find that the automatic refactoring is one of the largest productivity enhancements that I’ve discovered. The ability to almost effortlessly change a variable name to more accurately reflect its purpose, and to know that such a change will propagate everywhere it is used, lets you update variable names more often. I was recently editing a piece of PHP code and one of the variables had changed slightly in purpose. I could have done a search and replace, but I wouldn’t be positive that the changes hadn’t broken something subtle. With Eclipse, there’s no such hesitation.

This is almost worth the cost of having to use the mouse and not being able to use vi keystrokes (at least, without the vi plugin). But when you throw in the other refactorings, including move method and rename method, well, refactoring becomes invaluable.

What follows are my notes from the talk.

Mr Dig discussed how refactoring is becoming more prevalent in today’s development environments, and how this challenges text-based version control systems such as CVS and SVN. The main issue is that refactoring causes global changes, while version control systems excel at tracking local changes. In addition, refactoring affects program elements, while version control systems focus on files. In short, refactorings are more likely than normal edits to create conflicts and/or merge errors: with refactoring tools, a single change is often no longer limited to one file.

In addition, when you move members between classes, you can lose version history–the only connection back to the original location is a commit comment, which is neither dependable nor robust.

MolhadoRef aims to change this, by making the version control system become refactoring aware. Currently MolhadoRef exists in ‘research project’ form as an Eclipse plugin, and is part of Mr Dig’s PhD thesis work. (By ‘research form’, I think he means not ready for prime time. For example, in his demo, when you checked changes in, there was no support for a commit message.) There are plans to release it as an open source project, but it is currently not so available. MolhadoRef is based on Molhado, which is an SCM system with support for OO software, on the backend, and Eclipse and a refactoring engine on the front end.

Basically, MolhadoRef can handle API level merging, where ‘API level’ means any member declarations. It also supports normal text edits in the same manner as CVS or SVN. It also tracks compile time as well as runtime merge conflicts.

On checkin, it uses the following algorithm:

  1. Detect API change operations, using a 3-way comparison. They are also tracked with tools such as Catchup or RefactoringCrawler.
  2. Detect and resolve conflicts. It attempts to do so in an automated fashion, but sometimes requires user input.
  3. Undo refactorings. Sometimes this is not entirely possible, but if it fails, MolhadoRef is intelligent enough to fall back to a text merge.
  4. Merge text changes. This happens on a method level, not on a file level.
  5. Replay refactorings. This can be hard, because you need to serialize what may have been two parallel series of refactorings. He went into some things I didn’t understand right here, including using graph theory to reorder the refactoring, or making a refactoring into an enhanced refactoring (which not only changes code, but future refactorings in the chain).
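To make steps 3 through 5 a bit more concrete, here is a toy sketch in Python. It is my own illustration, not MolhadoRef's code: a "program" is just a dict of method name to body, the only refactoring is a rename, and I assume the two branches never rename the same method (which sidesteps the hard reordering problem from the talk). All function names are made up.

```python
# Toy model only: a program is a dict mapping method name -> method body,
# and a refactoring is a rename expressed as (old_name, new_name).

def apply_renames(program, renames):
    """Replay renames in order, returning a new program."""
    result = dict(program)
    for old, new in renames:
        result[new] = result.pop(old)
    return result

def undo_renames(program, renames):
    """Undo renames by applying their inverses in reverse order."""
    inverse = [(new, old) for old, new in reversed(renames)]
    return apply_renames(program, inverse)

def merge_methods(base, left, right):
    """Three-way merge at the method level; returns (merged, conflicts)."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(name), left.get(name), right.get(name)
        if l == r:
            chosen = l            # both sides agree
        elif l == b:
            chosen = r            # only the right side changed it
        elif r == b:
            chosen = l            # only the left side changed it
        else:
            conflicts.append(name)
            chosen = l            # keep left's version and report the conflict
        if chosen is not None:
            merged[name] = chosen
    return merged, conflicts

def refactoring_aware_merge(base, left, left_renames, right, right_renames):
    """Undo renames, merge method bodies, then replay the renames."""
    left_plain = undo_renames(left, left_renames)
    right_plain = undo_renames(right, right_renames)
    merged, conflicts = merge_methods(base, left_plain, right_plain)
    # Assumption: the two rename lists touch different methods.
    return apply_renames(merged, left_renames + right_renames), conflicts
```

With this, one developer can rename a method while the other edits its body, and the edit lands under the new name instead of showing up as a conflict. A plain text merge would see the rename and the edit as competing changes to the same lines.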

Mr Dig conducted a case study and an experiment to prove the effectiveness of MolhadoRef. In the case study, he and a partner developed MolhadoRef separately for 3 weeks, and then merged the changes using both MolhadoRef and CVS. With 1267 lines of changed code, CVS had 36 conflicts, 41 compile errors, 7 runtime errors (determined by unit test coverage), and it took 105 minutes (including human time) to merge the two trees. In contrast, MolhadoRef had 1 conflict, 0 compile and runtime errors, and it took 1 minute to merge. Pretty impressive.

The experiment was to split 10 students (fairly experienced with Eclipse and Java) into two groups. Each group had a common task. At the end, the changes made by each person in the first group were merged with the changes made by each person in the second group, producing 25 merge sessions. Across these sessions, MolhadoRef led to 3.6 times fewer conflicts, 11.6 times fewer compiler errors, and 1.5 times fewer runtime errors. In addition, merging was 3.5 times faster with MolhadoRef.
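The 25 comes from pairing every member of one group with every member of the other. With 10 students and 25 sessions, the groups must have been 5 and 5. A quick sketch, with made-up participant labels:

```python
from itertools import product

# Hypothetical labels; the talk did not name the participants.
group_a = [f"A{i}" for i in range(1, 6)]
group_b = [f"B{i}" for i in range(1, 6)]

# One merge session per (first group member, second group member) pair.
merge_sessions = list(product(group_a, group_b))
print(len(merge_sessions))  # 25
```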

The larger picture that Mr Dig is pursuing is to perform automated upgrades of component based applications (pdf). If you’ve ever upgraded from one version of a Java web framework to another, you’ll know how painful it can be, especially if the API has changed. According to Mr Dig, 80% of the API changes in 5 projects he’s examined so far (4 open source, 1 proprietary) were due to refactorings. He told us how MolhadoRef fits into this picture, but I didn’t note it. Bummer.

He did say that both Microsoft and IBM were interested in his work. He also mentioned, I think, that Eclipse has already accepted some of his refactoring code.


What does that mean for us developers who don’t build version control systems? Well, not much right now, but if some of the bigger providers start to use this, it means that bigger, longer lived projects can be better managed, that code history won’t be lost as often, and that merge conflicts will decrease in severity and number. I for one can’t wait for that day.

[tags]refactoring, version control systems[/tags]

Initialize your GWT widgets

I’m a big fan of using GWT to increase web application usability in an incremental fashion. It may be fine to use GWT to build a full-blown application, but I’ve never done that. When you take the widget approach, often you want to configure the widget, perhaps based on the page it is on. Kevin Jansz talks about how to give a GWT module init params (very much like init-param elements in web.xml). He suggests using the Dictionary class, which is in the i18n module. For a sweet example (that is not even related to i18n), read the Dictionary doc linked to above. There are some caveats. From the aforementioned documentation:

…the Dictionary class is fully dynamic. As a result, a variety of error conditions (particularly those involving key mismatches) cannot be caught until runtime. Similarly, the GWT compiler is unable to discard unused dictionary values since the structure cannot be statically analyzed.

To me, using a Dictionary is a better way of getting configuration information from a host page than what I’ve done in the past: write a value to a hidden span and use the DOM GWT class to access it. Much clearer and no unneeded DOM elements. In fact, if you wanted to get fancy, you could generate the javascript object properties dynamically (this is conjecture, I’ve not tested this).

Nice find Kevin!

Is the tech contractor to salaried employee ratio a sign of the tech business cycle?

Nat Torkington seems to think so: “…when the clever contractors head for salaried positions then bad times are coming.” (The last paragraph is cut off due to an HTML error, but basically asks if a shortage of engineering talent affects this indicator in one way or another.)

It is a complicated problem because there are a lot of other factors influencing whether being a contractor or an employee is a better fit for a person’s situation: need for income stability, contacts in the industry, demand for a given skill set, even preference for new situations. Even so, it’d be interesting to look back and see what the ratio was over the last, say, 20 years.

I took a look at the BLS web site, but couldn’t find any reports delineating contract employees and salaried employees.
Via Infectious Greed.

[tags]contracting, business cycle indicators[/tags]

Article on information security

This article views the software insecurity problem from an economics perspective. Makes sense to me, except he totally ignores the other costs that come along with liability–lawyers, lost productivity, insurance costs. However, these costs seem reasonable. I have a friend who is a general contractor and builds houses. He was astonished to find that computer consultants don’t have to carry liability insurance. Perhaps it’s time for that.

[tags]security, software liability[/tags]

Transcript of GWT talk

Here is a transcript of a tech talk about GWT by Bruce Johnson, who is apparently a tech lead over at Google. You can view the video as well (I guess–I didn’t). Some very interesting stuff in there, even though it’s about two months old. Here are some excerpts I found interesting.

On why Java was chosen for javascript development:

We want a more mature language [than javascript]. The Java language has a lot of years on it now. We have a lot of developers that know Java. There are a lot of books, other supporting technologies, things like debuggers and JUnit, and there is a tool called ‘FindBugs’ which does static analysis. Have you guys heard of FindBugs? It is fantastic. It is like Lint on steroids. So, you really just point it at your Java source code and it says, ‘Oh, by the way, here is like 200 bugs,’ and actually most of them are really truly bugs. Code coverage, Javadoc, really good things, all available out of the box if you use Java.

FindBugs has an Eclipse plugin, by the way.

On integrating GWT into existing applications:

Probably the single biggest thing that we have screwed up so far when telling people about GWT is all these demos are like from scratch demos, and we have worked so hard to make sure that you did not have to write applications from scratch to use GWT. So, I think so far that is the biggest flub. GWT does not require you to start over. So, for example, if you have a wizzy travel service application for example that is say, based on JSP, all you need in order to add GWT logic to a page is to drop a meta tag into your head and then have a DIV or multiple DIVs that act like place holders for where you want to insert GWT behavior, which means that your Java source could not be more loosely coupled with the page. Basically, it is only connection of the page is based on ID and then, however, many assumptions you want to make.

I’ve definitely found the above to be the truth. Using GWT to build standalone components is a low risk way to explore the technology and add value to your website.

All in all, an interesting talk, err, read. (Here’s another talk by Bruce on TSS.)

Webapp performance tuning tool list

Here’s a great article about performance tuning web applications. In short, have a goal, and measure, measure, measure. Otherwise, you’re just shooting in the dark at a pin in a haystack. Or something like that.

I’ve touched on the complexity of performance testing web applications before, but this article goes me one better by outlining various tools that can be used to actually test different pieces of the stack.

I did notice one missing piece, though. The SitePen folks outline tools to test from the browser to the web server, and then the database server. But they don’t mention any app server or web server profilers. I wonder whether that’s an unintentional oversight, or whether they haven’t needed to tune dynamic business logic, either in the app server or web server layer.

I don’t have any business logic layer performance tuning tools to suggest, either. Looks like has a number of profilers–anyone have experience using one?

[tags]performance tuning[/tags]

Large varchar columns can lead to huge ESRI exports when using ogr2ogr

I was recently using ogr2ogr to convert, on the fly, some data in a PostGIS database to other standard formats (ESRI and MapInfo). The ESRI export in particular had some problems–it took about 4 minutes to export a table with 11K rows and 37 columns, and it generated a 700M dbf file. This file was then zipped (with the other config files), which took around 6 minutes and produced a 7M zip file that was sent to the browser. Now, you can imagine how thrilled a user would be to wait 10 minutes for an export. Apache was timing out (the default timeout is 5 min) and I was at a loss as to how to address the performance issue.

I mentioned this to a colleague who has significantly more experience with GIS tools, and he pointed out that in the source table there were several varchar(4000) fields. Now, in PostgreSQL, [i]f the string to be stored is shorter than the declared length … values of type character varying [varchar] will simply store the shorter string. But the ESRI export does not do that–each varchar(4000) field was padded to a length of 4000, even though none of the fields approached that length.

The solution? A few simple select max(length(colname)) from table and alter table statements, and the varchar(4000) columns were decreased in size. The dbf file decreased to a 50M file, uncompressed, and the entire zip file decreased to 5M. As you can guess, the download time was slashed.
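Here is a rough back-of-the-envelope version of why the padding hurt, plus the kind of statements I mean. This is only a sketch: the table and column names are made up, the size estimate assumes one byte per character and ignores the dbf header, and the 40 is a stand-in for whatever max(length(...)) actually returns for your data.

```python
def padded_bytes(rows, declared_width):
    """Bytes one fixed-width column takes when every row is padded out."""
    return rows * declared_width

def max_length_query(table, column):
    """SQL to find the longest value actually stored in a column."""
    return f"SELECT max(length({column})) FROM {table};"

def shrink_statement(table, column, new_width):
    """PostgreSQL statement to narrow a varchar column."""
    return f"ALTER TABLE {table} ALTER COLUMN {column} TYPE varchar({new_width});"

# 11K rows, one varchar(4000) column padded to full width.
print(padded_bytes(11_000, 4_000))            # 44000000 bytes, about 44M
# Hypothetical table and column names.
print(max_length_query("parcels", "notes"))
print(shrink_statement("parcels", "notes", 40))
```

So each padded varchar(4000) column costs about 44M on its own for a table this size, which is how a handful of them balloon the export.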

Update 2/16: The kind members of the GDAL mailing list pointed me to a document listing all the limitations of the ESRI driver for ogr2ogr.  Check out the “Creation Issues Section”.

[tags]PostGIS, ESRI, ogr2ogr[/tags]

XO doesn’t allow file uploads using php

I’m doing some work for a client using the MODx CMS. I will be writing more about that cool framework later, but I wanted to let the world know that XO hosting does not allow file upload via php scripts.

It doesn’t matter what your php.ini file says, the hosting environment doesn’t allow it. I was so astonished by an email telling me this that I called their customer service. Very politely, the fellow on the other end of the line repeated the prohibition. I asked “So, if I need file uploads, the only way to get them is to leave XO.” He was pretty uncomfortable, but said that was the case.

I guess has spoiled me. I simply can’t believe that a modern hosting service wouldn’t allow that kind of fundamental functionality.

Am I off base here? Do most hosting providers prohibit this functionality? Did I just not talk to the correct folks at XO?

[tags]php, file upload,[/tags]

Does any other blogging platform approach WordPress?

This person’s answer is ‘No!’. Looks like someone in the blog platform world has declared that the WordPress community has learned the lessons the Struts community learned a few years ago: If you document an open source system, provide plenty of examples and a supportive community, you can distance yourself from your competitors. Make it easy for the developers (QT) to choose you!

He states:

…the blogging market is c.l.o.s.e.d. – as in no more room, and most importantly, no more competition… [emphasis his]

(Regarding the strength of Struts, as of today, Dice has 1965 jobs matching ‘struts’, versus 176 for ‘rails’, 1481 for ‘spring’ and 493 for ‘JSF’. Now, it’s been a while since I commented on web frameworks, but it’s a pleasant surprise to see Spring approach Struts. Yes, yes, my methodology for documenting the ‘distance’ of Struts from its competitors is somewhat suspect. I don’t have access to book trends data, and what I can find doesn’t break things down to the framework level. Thanks for caring.)

However, Spring looks to be on the rise; even the most popular packages and/or platforms can fall from popularity. Especially in technology, where “new” is often a feature. Hence, I disagree with the statement that WordPress has locked down the blogging application market. My point is not argued from a knowledge of WordPress, but rather a knowledge of technology and tech trends.

Via sogrady.