Hibernate: Trouble in Paradise

I’ve written before about the problems which seem to crop up when you introduce Hibernate into your project. Unfortunately, I’m becoming more and more convinced that these are not isolated issues. Hibernate has a dependency management problem.

Things were not always this way. Hibernate was once independent, unbeholden to outside influences. Sure, you’ve always needed a somewhat-thicker-than-average skin in order to file bugs or ask questions in the forums. An extra helping of patience never hurt, either. But nowadays it seems like you need all of these things, plus the time and skill necessary to patch a JAR or two.

We recently upgraded to the latest Hibernate production JARs: Hibernate Core 3.2.6.GA, Hibernate Annotations 3.3.0.GA and Hibernate EntityManager 3.3.1.GA. Having already benefited from our earlier experience regarding conflicts between Spring and Hibernate, I figured this would take about five minutes: modify our root Maven2 POM, do a full build, verify everything works. Done! Not quite.

The first problem we encountered, reported 4 months ago, is that the production Hibernate EntityManager POM excludes a transitive dependency which it should not exclude. Net result? You have to hack together a custom version of Hibernate EntityManager which does not exclude this transitive dependency. Of course, this transitive dependency exists in the JBoss Maven2 repo, but not in the central repo. Nice.

Next, if you’re using a standard Maven2 Windows installation, you’ll run into this bug, because Hibernate now refuses to load JARs from directories with spaces in them (the standard location for Maven2 local repositories in Windows is in “Documents and Settings”). Very nice. Welcome to 1998!

You may notice a trend here: an old ASM dependency which causes conflicts with other libraries, a required dependency that is mistakenly excluded, and a dependency on a deprecated class in a buggy JBoss utility library.

Two of my co-workers are already suggesting we switch to TopLink. Jokingly, of course. For now. Napoleon is reputed to have said that “if they want peace, nations should avoid the pin-pricks which precede cannon shots.” Third-party libraries should likewise avoid annoying their users with irritating minutiae, or they may find these users mobilizing the artillery.

Space vs Time

A long, long time ago I took a college course, the title of which was Languages and Translation. The content of this sophomore-level course? A smörgåsbord of systems programming, heaps and stacks, pointers, *nix system calls, compilers, lex and yacc, grammars, lexical and semantic analysis, code optimization, and data representation — all taught and learned in C. While learning C. Oh, and Lisp.

This course made quite an impression, mainly because of my initial inexperience. I should explain that the path which led me to L&T went something like this:

  1. I’m a senior in high school. I’m applying to college. I need to choose a major. Hmm… writing that Blackjack game on my TI calculator was pretty cool. It was a whole 50 lines of code! Plus I’m in that typing class, closing in on 20 words per minute. Maybe I’ll try Computer Science!
  2. I’m a freshman at Georgia Tech. First semester. I’m taking Intro to Computing. Man, this HTML nonsense sure uses a lot of brackets! And this Microsoft Access program is impossible to use!
  3. Still a freshman at Tech. Second semester. This Intro to Programming class is pretty crazy! We’re using Java for the assignments. The TA mentioned in passing that there are no pointers in Java, but I have no idea what a pointer is, so I couldn’t care less. I’m beginning to grok object-oriented programming.
  4. Welcome to Languages and Translation! Malloc, Malloc, Malloc! Realloc, Calloc, Malloc! Bwahahahahahaha!

Psychological damage aside, this was a great class. Jim Greenlee, who taught the course, was both an evil bastard and a great teacher. One of the tenets of code optimization which he often highlighted was “space versus time,” the idea that you can often optimize for one at the expense of the other, but rarely for both at the same time.

For example, a compiler can decide to inline a short function in order to avoid time-consuming stack allocations, but the compiled program will be larger (less time, but more space). Of course, if space (memory) is at a premium, your compiler might instead try to recognize common code sequences and hoist them into artificial functions (less space, but more time).
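To make the inlining half of that trade-off concrete, here is a small hand-written sketch in Java. The class and method names (InlineDemo, square, sumOfSquares, sumOfSquaresInlined) are made up for illustration, and the "inlined" version is written by hand: in practice the decision belongs to the compiler (or, for Java, the JIT at runtime), not to the programmer.

```java
public class InlineDemo {

    // A short helper: a natural candidate for inlining.
    public static int square(int x) {
        return x * x;
    }

    // Un-inlined form: the multiply exists once, but every use pays for a call.
    public static int sumOfSquares(int a, int b) {
        return square(a) + square(b);
    }

    // Hand-written equivalent of the inlined form: no calls, but the body
    // of square() is now duplicated at each place it was used.
    public static int sumOfSquaresInlined(int a, int b) {
        return a * a + b * b;
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(3, 4) + " " + sumOfSquaresInlined(3, 4)); // prints "25 25"
    }
}
```

Both methods compute the same thing. The only difference is where the cost lands: call overhead in the first, duplicated code in the second.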

Flash forward 8 years. We have a client/server application at work which uses DTOs to transfer data to and from the client, and we use Hibernate on the server to persist our BOs. A specific server call, invoked in the presence of a large amount of data, brings the application to its knees.

Immediately we jumped to conclusions — Arrgh! Hibernate is such a hog! If you don’t code things perfectly, you can’t scale! And sometimes not even then! A quick profiling session confirmed our fears. Three hours later we had bypassed Hibernate in this specific instance, coding to the JDBC API instead. Unfortunately, this wasn’t the last of our performance problems. A second profiling session indicated that we had another bottleneck in our DTO-to-BO conversion routines!

Now, something which must be understood about Hibernate’s collection semantics is that when you use Hibernate to load BO A, which has an X-to-N relationship with BO B, you should (usually) use the collection of Bs provided by Hibernate. For example, if you use Hibernate to load a UserGroup, and you want to modify the list of Users associated with said UserGroup, you should modify the existing list of Users. You should not create a new list, add Users to it, and then give the UserGroup the new list of Users. Why? Because creating a new list results in a one-shot delete, followed by N insert statements. This is usually not desired.

However, a naive approach to modifying the collection provided by Hibernate (clearing it and then adding the BOs which you know you want) is just as bad, because a call to collection.clear() also results in a one-shot delete. The best approach is one of minimal modification to the existing collection.

In the case of DTO-to-BO conversion, where the DTO representation of an object is being transferred to the corresponding BO representation, this means adding items to the BO’s collection that are in the DTO’s collection but not in the BO’s collection, and removing items from the BO’s collection that are in the BO’s collection but not in the DTO’s collection. Elements that are in both collections are simply ignored.

The obvious implementation of this algorithm looks something like this:

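// Add every DTO that has no matching BO yet.
for (DTO dto : dtos) {
    if (!contains(bos, dto)) add(bos, dto);
}

// Remove every BO that no longer has a matching DTO. An explicit Iterator
// is used because removing from bos inside a for-each loop over bos
// typically throws ConcurrentModificationException.
for (Iterator<BO> it = bos.iterator(); it.hasNext();) {
    if (!contains(dtos, it.next())) it.remove();
}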

Unfortunately, when using lists, the contains() calls above hide a nested loop, resulting in O(n²) performance. Once n gets into the thousands, things start to get sluuuuggish. Veeeeery sluuuuggish.

The solution? Trade a little space for a lot of time! By constructing HashMaps which contain all of the BOs in the lists, keyed on business keys which uniquely identify the BOs, the contains() calls above can be performed in constant time by invoking map.containsKey(). The result is O(n) performance. Much better!
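A minimal sketch of that idea in modern Java (9 or later) might look like the following. It is not our production code. The class name CollectionSync, the sync method and its key-extractor parameters are all hypothetical, and it uses HashSets of business keys rather than full HashMaps, since only membership checks are needed here. Plain strings stand in for the BOs and DTOs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public class CollectionSync {

    /**
     * Updates bos in place so that it holds exactly the business keys found
     * in dtos, touching only the elements that actually differ.
     */
    public static <K, B, D> void sync(List<B> bos, List<D> dtos,
                                      Function<B, K> boKey,
                                      Function<D, K> dtoKey,
                                      Function<D, B> toBo) {
        // The "space" we spend: a hash set of the DTOs' business keys.
        Set<K> dtoKeys = new HashSet<>();
        for (D dto : dtos) {
            dtoKeys.add(dtoKey.apply(dto));
        }

        // Remove BOs whose key no longer appears among the DTOs.
        bos.removeIf(bo -> !dtoKeys.contains(boKey.apply(bo)));

        // A second hash set, holding the keys of the BOs that survived.
        Set<K> boKeys = new HashSet<>();
        for (B bo : bos) {
            boKeys.add(boKey.apply(bo));
        }

        // Add a BO for each DTO whose key is not present yet.
        // Set.add returns false when the key was already there.
        for (D dto : dtos) {
            if (boKeys.add(dtoKey.apply(dto))) {
                bos.add(toBo.apply(dto));
            }
        }
    }

    public static void main(String[] args) {
        List<String> bos = new ArrayList<>(List.of("alice", "bob", "carol"));
        List<String> dtos = List.of("bob", "carol", "dave");
        sync(bos, dtos, s -> s, s -> s, s -> s);
        System.out.println(bos); // prints [bob, carol, dave]
    }
}
```

Each membership check is now an expected constant-time hash lookup, and elements present on both sides ("bob" and "carol" above) are never touched. Whether the removal pass itself stays linear depends on the list implementation. ArrayList handles removeIf in a single pass, but I have not verified how a Hibernate-managed collection behaves here.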

Hibernate, Spring and ASM

As far as I can tell, there isn’t much love lost between the folks at JBoss (including the Hibernate team) and the guys at Interface21 (the people behind Spring). Of course, it’s hard to gauge these things based on third-party reports and miscellaneous blog posts, but I think I’ve found confirmation — and in the process, discovered Gavin’s secret weapon against the Spring onslaught.

The secret weapon is ASM. No, not the American School of Madrid (where yours truly spent three years of his high school career). Rather, the Java bytecode manipulation library which allows Hibernate (and Spring) to attain that level of awesomeness we’ve all come to love. So both projects are cool — the problem is that they’re incompatibly cool. Spring uses ASM 2.2.3, which is the latest version of the stable branch. Hibernate, on the other hand, uses ASM 1.5.3, which is the latest version of the, ah, antediluvian branch.

The Spring forums are full of people dealing with this issue, which presents itself in the form of NoSuchMethodErrors, AbstractMethodErrors and other unintuitive exceptions. If you’re using Maven, as many people are, the solution is to exclude certain transitive dependencies in your POM files. I’ve had to do this at least 4 times in the past couple of months.

But why should this even be necessary? I’ve found no clue as to why the Hibernate team has not upgraded from the 1.X branch to the 2.X branch in the roughly 20 years it has been available [1]. Which is why I’ve come to the conclusion that the only possible rationale behind this state of affairs is that the Hibernate team is using it as a stumbling block to Spring adoption. Forward the rumor mill!

[1] Time estimate in dog / internet years.
