Maven 2.0.7 Released!

Things seem to be speeding up again in Maven land ever since a bunch of the core Maven developers left Mergere to start Sonatype. It seems likely that development at Mergere was focused on value-add propositions like Maestro (based on Maven), while Sonatype focuses on Maven itself and makes its money from training and partnerships. Consider the following timeline:

  • Maven 2.0.2 (January 2006)
  • Maven 2.0.3 (March 2006)
  • Maven 2.0.4 (April 2006)
  • …zzzz…zz…zzzzzz…
  • Maven 2.0.5 (February 2007)
  • Maven 2.0.6 (April 2007)
  • Maven 2.0.7 (June 2007)
  • Maven 2.0.8 (August 2007 [tentative])

As you can see, nothing was released for about ten months there.

In his announcement on The Server Side, Jason mentions the team’s intention to ship monthly maintenance releases from here on out. I’ve also noticed increased activity on a couple of the issues I’m watching in Maven’s issue tracker, including features slated for the 2.1 release. I really like Maven 2, and I’m glad to see these signs of resurgent activity.

Congratulations, guys!

Research Paper: HtmlUnit Refactoring

The other day I stumbled across a research paper entitled Digging the Development Dust for Refactorings [1], which addresses software repository data mining. Specifically, the paper identifies four types of data which can be examined — source code metrics, identifiers, ROI estimates, and design differencing — and examines their use in building a refactoring history for a software project. Which project, you ask? HtmlUnit!

From the abstract:

Software repositories are rich sources of information about the software development process. Mining the information stored in them has been shown to provide interesting insights into the history of the software development and evolution. Several different types of information have been extracted and analyzed from different points of view. However, these types of information have not been sufficiently cross-examined to understand how they might complement each other. In this paper, we present a systematic analysis of four aspects of the software repository of an open source project — source-code metrics, identifiers, return-on-investment estimates, and design differencing — to collect evidence about refactorings that may have happened during the project development. In the context of this case study, we comparatively examine how informative each piece of information is towards understanding the refactoring history of the project and how costly it is to obtain.

The authors evaluate their proposed refactoring detection methodology by trying it out on the HtmlUnit repository:

To evaluate the effectiveness of our lightweight refactoring method, we examined an open-source system HTMLUnit. HTMLUnit is a realistic representative example of open-source development. There are nine releases in its history from May 22, 2002 to March 17, 2005. It is quite well documented; in fact, examining the log comments in its CVS-repository history, we found many references to refactorings and their rationale, which is critical for our understanding of the system lifecycle.

So we get kudos on our commit logs. Continuing on into the conclusion:

Based on our HTMLUnit case study, we have found that a heuristic combination of source-code metrics and identifiers-movement analysis — using information easily available on any repository platform — can be quite effective in recovering specific refactorings in the software evolutionary lifecycle, albeit not as accurate as structural analysis of the logical system design and less computationally intensive. An even more interesting finding was that the refactorings omitted by the developers in the system’s documentation were found to be “bad investments of development time” according to our ROI estimate, which implies that developers’ documentation is a good description of the developers’ intention if not of their actual work.

Apparently the authors’ analysis identified 11 refactorings, three of which were not documented in the commit logs. These same three undocumented refactorings were also found to have negative ROIs and less than 50% relevance. The assumption made by the authors is that these three refactorings were accidental: code cleanups or bug fixes that got a little too bloated. So basically we’re pretty good about documenting our refactorings, except when we “accidentally refactor”. Interesting stuff!

[1] C. Schofield, B. Tansey, Z. Xing, and E. Stroulia, Digging the Development Dust for Refactorings, Proc. of the 14th International Conference on Program Comprehension, Athens, Greece, June 14-16, 2006.

New Tapestry 5 Feature

One of the key features in Tapestry 5 is something which Howard has dubbed the adaptive API. The idea is that the framework adapts to your code, rather than the other way around. IoC containers get you halfway there by allowing you to code POJOs with distinct responsibilities which are later wired together by the container. However, when your code is part of a larger whole (i.e. you’re using some framework), best practice usually dictates that the relevant boundaries are formed by some well-defined set of interfaces.

Tapestry 5 has taken another route: do whatever you want to do, and the framework will use every trick in the book to accommodate you. One obvious example of this principle is the event handler API. Specifically, the return value of an event handler determines the next page that will be rendered. In most web frameworks, you would most definitely need to return an instance of IPage or ILink or whatever well-defined return type your framework understands. In T5 land, however, you can return almost anything that might make sense: null, a page name, a page instance, a link, even a stream. And now, thanks to Howard’s genius and my copy-and-paste skills, a page class :-)
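
To make the idea concrete, here is a rough sketch of what “accept almost anything” looks like from the framework’s side. This is not Tapestry’s code or API: the class AdaptiveResultSketch, the interpret method, and the StartPage stand-in are all made up for illustration, a link is modeled as a plain java.net.URI, and I’m assuming null means “stay on the current page”.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;

// Hypothetical model of an "adaptive" return-value dispatcher.
// None of these names come from Tapestry; they only illustrate the idea.
class AdaptiveResultSketch {

    /** Stand-in for a page class (hypothetical, not a Tapestry type). */
    static class StartPage {}

    /** Inspects whatever an event handler returned and decides what to do next. */
    static String interpret(Object result) {
        if (result == null) {
            // Assumption: null means "no navigation requested".
            return "re-render current page";
        }
        if (result instanceof String) {
            return "render page named " + result;
        }
        if (result instanceof Class<?>) {
            return "render page of class " + ((Class<?>) result).getSimpleName();
        }
        if (result instanceof URI) {
            // A link is modeled as a URI here purely for the sketch.
            return "redirect to " + result;
        }
        if (result instanceof InputStream) {
            return "stream response body";
        }
        // Anything else is treated as a page instance.
        return "render page instance " + result.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(interpret(null));
        System.out.println(interpret("Start"));
        System.out.println(interpret(StartPage.class));
        System.out.println(interpret(new StartPage()));
        System.out.println(interpret(URI.create("http://example.org/")));
        System.out.println(interpret(new ByteArrayInputStream(new byte[0])));
    }
}
```

The point is that the handler’s signature stays free-form while the framework carries the burden of figuring out what the return value means.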

Hibernate, Spring and ASM

As far as I can tell, there isn’t much love lost between the folks at JBoss (including the Hibernate team) and the guys at Interface21 (the people behind Spring). Of course, it’s hard to gauge these things based on third-party reports and miscellaneous blog posts, but I think I’ve found confirmation — and in the process, discovered Gavin’s secret weapon against the Spring onslaught.

The secret weapon is ASM. No, not the American School of Madrid (where yours truly spent three years of his high school career). Rather, the Java bytecode manipulation library which allows Hibernate (and Spring) to attain that level of awesomeness we’ve all come to love. So both projects are cool — the problem is that they’re incompatibly cool. Spring uses ASM 2.2.3, which is the latest version of the stable branch. Hibernate, on the other hand, uses ASM 1.5.3, which is the latest version of the, ah, antediluvian branch.

The Spring forums are full of people dealing with this issue, which presents itself in the form of NoSuchMethodErrors, AbstractMethodErrors and other unintuitive errors. If you’re using Maven, as many people are, the solution is to exclude certain transitive dependencies in your POM files. I’ve had to do this at least four times in the past couple of months.
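
Before excluding anything, it helps to know which ASM jar actually won on your classpath. Here is a small diagnostic sketch; the class name WhichJar and the locate helper are my own invention, and I’m assuming org.objectweb.asm.ClassVisitor is a reasonable class to probe, since it lives in the core ASM package.

```java
import java.security.CodeSource;

// Hypothetical helper: reports which jar (or directory) a class was loaded from.
class WhichJar {

    /** Returns the load location of the named class, or a short note if it is unknown. */
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers cannot fail here.
            Class<?> type = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Classes shipped with the JDK typically have no code source.
                return className + " -> (no code source; likely a JDK class)";
            }
            return className + " -> " + source.getLocation();
        } catch (ClassNotFoundException e) {
            return className + " -> not on the classpath";
        }
    }

    public static void main(String[] args) {
        // Pass a class name as the first argument, or probe ASM by default.
        String name = args.length > 0 ? args[0] : "org.objectweb.asm.ClassVisitor";
        System.out.println(locate(name));
    }
}
```

Run it with the same classpath as your application. If the location printed is not the ASM jar you expected, that is the transitive dependency to exclude.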

But why should this even be necessary? I’ve found no clue as to why the Hibernate team has not upgraded from the 1.X branch to the 2.X branch in the roughly 20 years it has been available [1]. Which is why I’ve come to the conclusion that the only possible rationale behind this state of affairs is that the Hibernate team is using it as a stumbling block to Spring adoption. Forward the rumor mill!

[1] Time estimate in dog / internet years.