This past Thursday, I went to one of the CU Comp Sci Colloquia. The speaker was Danny Dig, and he discussed Refactoring-Aware Software Merging and Version Control for Object-Oriented Programs, in particular, MolhadoRef. Mr Dig billed this several times as the first refactoring aware source control system ‘that we are aware of’.
Being relatively new to IDEs in general and Eclipse in particular, I find that the automatic refactoring is one of the largest productivity enhancements that I’ve discovered. The ability to almost effortlessly change a variable name to more accurately reflect its purpose, and to know that such a change will propagate everywhere it is used, lets you update variable names more often. I was recently editing a piece of PHP code and one of the variables had changed slightly in purpose. I could have done a search and replace, but I wouldn’t be positive that the changes hadn’t broken something subtle. With Eclipse, there’s no such hesitation.
This is almost worth the cost of having to use the mouse and not being able to use vi keystrokes (at least, without the vi plugin). But when you throw in the other refactorings, including move method and rename method, well, refactoring becomes invaluable.
What follows are my notes from the talk.
Mr Dig discussed how refactoring is becoming more prevalent in todays development environments, and how this is challenging text based version control systems such as CVS and SVN. The main issues are that refactoring causes global changes, while version control systems excel at tracking local changes. In addition, refactoring impacts program elements, while version control systems focus on files. In short, refactorings are more likely than normal edits to create conflicts and/or merge errors–due to refactoring tools, changes are regularly no longer limited to files.
In addition, when you move members between classes, you can lose version history–the only connection back to the original location is a commit comment, which is neither dependable nor robust.
MolhadoRef aims to change this, by making the version control system become refactoring aware. Currently MolhadoRef exists in ‘research project’ form as an Eclipse plugin, and is part of Mr Dig’s PhD thesis work. (By ‘research form’, I think he means not ready for prime time. For example, in his demo, when you checked changes in, there was no support for a commit message.) There are plans to release it as an open source project, but it is currently not so available. MolhadoRef is based on Molhado, which is an SCM system with support for OO software, on the backend, and Eclipse and a refactoring engine on the front end.
Basically, MolhadoRef can handle API level merging, where ‘API level’ means any member declarations. It also supports normal text edits in the same manner as CVS or SVN. It also tracks compile time as well as runtime merge conflicts.
On checkin, it follows the following algorithm:
- Detect API change operations, using a 3 way comparison. They are also tracked with tools such as Catchup or RefactoringCrawler
- Detect and resolve conflicts. It attempts to do so in an automated fashion, but sometimes requires user input.
- Undo refactorings. Sometimes this is not entirely possible, but if it fails, MolhadoRef is intelligent enough to fall back to a text merge.
- Merge text changes. This happens on a method level, not on a file level.
- Replay refactorings. This can be hard, because you need to serialize what may have been two parallel series of refactorings. He went into some things I didn’t understand right here, including using graph theory to reorder the refactoring, or making a refactoring into an enhanced refactoring (which not only changes code, but future refactorings in the chain).
Mr Dig conducted a case study and an experiment to prove the effectiveness of MolhadoRef. In the case study, he and a partner developed MolhadoRef separately for 3 weeks, and then merged the changes using both MolhadoRef and CVS. With 1267 lines of changed code, CVS had 36 conflicts, 41 compile errors, 7 runtime errors (determined by unit test coverage), and it took 105 minutes (including human time) to merge the two trees. In contrast, MolhadoRef had 1 conflict, 0 compile and runtime errors, and it took 1 minute to merge. Pretty impressive.
The experiment was to split 10 students (fairly experienced with Eclipse and java) into two groups. Each group had a common task. At the end, the changes made by each person in the first group were merged with the changes made by each person in the second group, producing 25 merge sessions. Using these episodes, using MolhadoRef led to 3.6 times fewer conflicts, 11.6 times fewer compiler errors, and 1.5 fewer runtime errors. In addition, the time to merge was 3.5 times faster with MolhadoRef.
The larger picture that Mr Dig is pursuing is to perform automated upgrades of component based applications (pdf). If you’ve ever upgraded from one version of a java web framework to another, you’ll know how painful it can be, especially if the API has changed. According to Mr Dig, 80% of the API changes in 5 projects he’s examined so far (4 open source, 1 proprietary) were due to refactorings. He told us why MolhadoRef fits into this picture, but I didn’t note it. Bummer.
He did say that both Microsoft and IBM were interested in his work. He also mentioned, I think, that Eclipse has already accepted some of his refactoring code.
/Notes
What does that mean for us developers who don’t use build version control systems? well, not much right now, but if some of the bigger providers start to use this, it means that bigger, longer lived projects can be better managed; that code history won’t be lost as often, and that merge conflicts will decrease in severity and number. I for one can’t wait for that day.
[tags]refactoring, version control systems[/tags]