The Future of Groovy Interoperability?

Last week I experimented with Groovy, one of the more popular dynamically typed languages available on the JVM. Specifically, I was trying to mix and match Java and Groovy, starting with a pure Java codebase and migrating certain sections to Groovy where it made sense, beginning with the domain objects. In a nutshell, I found that while Groovy is pretty tubular, my expectations with regard to its interoperability with Java were somewhat optimistic.

After a little thought and some prodding from Marc and Guillaume, it became clear that from an interoperability perspective Groovy use can be categorized into four broad use case scenarios. From simplest to most complex, they are:

  • Pure Groovy.
  • Groovy invoking Java.
  • Java invoking Groovy.
  • Groovy invoking Java invoking Groovy invoking… a.k.a. circular dependency.

The first use case is obviously the most trivial. If Groovy code didn’t work, there would be no Groovy project. This is the baseline from which the other use cases are built. The second scenario is where things start to get interesting. The implications of successfully covering the second use case are huge: bytecode generation, JVM as a platform (HotSpot!), and access to third party libraries representing over 10 years of development. Just in case some of you have been hiding under a rock, I should note that this has been possible for a while ;-)

Moving from the second scenario to the third is trivial, as far as Groovy is concerned. Unfortunately, build tools and project management packages such as Maven need to be coerced into compiling the Groovy code before the Java code. This is a challenge, at least for Maven. This inflexibility, though not Groovy’s fault, does pose a problem. It seems the standard way of dealing with this dilemma is to hide all of your Groovy code behind pure Java interfaces, and then move the Java code, the interfaces, and the Groovy code into three separate modules that can be built serially. Obviously, this amounts to erecting an artificial barrier between your Java codebase and your Groovy codebase. However, if the future does indeed hold a harmonious coexistence of multiple first-class languages on the JVM, as Groovy enthusiasts hope, then these issues will fade as Java developers become multilingual and the more popular build systems begin to cater to their needs.
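Here is a minimal sketch of the interface trick described above, written in plain Java. Every name in it (Account, AccountFactory, GroovyBackedAccount) is made up for illustration, and the reflective lookup is just one way to wire things together; the Java module could equally well depend directly on the already-compiled Groovy module. The implementation class is a Java stand-in for what would really be a Groovy class living in its own module.

```java
/** Lives in the pure-Java "interfaces" module; the other two modules depend only on this. */
interface Account {
    String getOwner();
    long getBalanceInCents();
}

/** Lives in the Java module. It never names the implementation class at compile time. */
final class AccountFactory {
    private AccountFactory() {}

    /** Looks up the implementation by class name at runtime (an assumption of this sketch). */
    static Account create(String implementationClassName) {
        try {
            Class<?> type = Class.forName(implementationClassName);
            return (Account) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load " + implementationClassName, e);
        }
    }
}

/** Stand-in for the Groovy class; in a real build this would be Groovy source in the third module. */
class GroovyBackedAccount implements Account {
    public String getOwner() { return "alice"; }
    public long getBalanceInCents() { return 1250L; }
}
```

The Java code compiles against Account alone, so the build order of the Java and Groovy modules no longer matters. The price is exactly the artificial barrier mentioned above: the Java side can only see what the interface exposes.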

Now we arrive at the fourth and final scenario: circular dependency. Looking back, this is the problem I somewhat naively assumed had already been solved by the Groovy team. The Java domain object which I converted to Groovy implemented an interface defined in Java, and the domain object itself was obviously used throughout the application, implemented in Java. Thus, I couldn’t compile the Java code before compiling the Groovy code because the domain object referenced by the Java code had not and could not yet be compiled. Furthermore, I wouldn’t have been able to fix the problem by compiling the Groovy code first, because the domain object referenced an interface defined in Java. Dierk Koenig’s Groovy in Action calls this “the chicken and egg dependency problem”:

Groovy and Java both have no problem accessing, extending, or implementing compiled classes or interfaces from the other language. But at the source code level, neither compiler is really aware of the other language’s source files. If you want to work seamlessly between the two languages, the trick is to always compile dependent classes using the appropriate compiler prior to compiling a class that uses a dependent class.

This sounds simple, but in practice, there are many tricky scenarios, such as compiling a Java file that depends on a Groovy file that depends on a Java file. Before you know it, you can quickly end up with intricate dependencies crossing the boundaries of each language. In the best scenario, you may have to alternate back and forth between the two language compilers until all the relevant classes are compiled. A more likely scenario is that it will become difficult to determine which compiler to call when. The worst case scenario, and it’s not uncommon, occurs when you have circular dependencies. You will reach a deadlock where neither language will compile because it needs the other language to be compiled first.
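To make the "which compiler do I call when" problem concrete, here is a small Java sketch that orders source files so that each one comes after the files it depends on, and gives up when it finds a cycle. The class name CompileOrder and the file names are invented for the example; it only plans an order and does not invoke javac or groovyc.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CompileOrder {
    private CompileOrder() {}

    /** Returns the files in an order where every file follows its dependencies (depth-first topological sort). */
    static List<String> plan(Map<String, List<String>> dependencies) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String file : dependencies.keySet()) {
            visit(file, dependencies, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String file, Map<String, List<String>> dependencies,
                              Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(file)) {
            return;
        }
        // Seeing a file again while it is still being visited means we walked in a circle.
        if (!inProgress.add(file)) {
            throw new IllegalStateException("Circular dependency involving " + file);
        }
        for (String dependency : dependencies.getOrDefault(file, Collections.<String>emptyList())) {
            visit(dependency, dependencies, done, inProgress, order);
        }
        inProgress.remove(file);
        done.add(file);
        order.add(file);
    }

    public static void main(String[] args) {
        Map<String, List<String>> chain = new LinkedHashMap<>();
        chain.put("CustomerService.java", Arrays.asList("Customer.groovy"));
        chain.put("Customer.groovy", Arrays.asList("Identifiable.java"));
        chain.put("Identifiable.java", Collections.<String>emptyList());
        // prints [Identifiable.java, Customer.groovy, CustomerService.java]
        System.out.println(plan(chain));
    }
}
```

The chain in main() is the "alternate back and forth" case: javac, then groovyc, then javac again. Feed it two files that depend on each other and it throws, which is the deadlock from the quote.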

The obvious solution is to punt: get rid of the circular dependency, revert to either use case two or use case three, and rant about “open sores” software for a couple of days to make yourself feel better. But let’s dig a little deeper and leave the ranting for next week… what would really need to be implemented in order to achieve the final interoperability scenario? JSR 223 (Scripting for the Java Platform) is all about implementing script engines in Java, which then host some target scripting language; not exactly what we’re looking for. JSR 292 (Supporting Dynamically Typed Languages on the Java Platform) is looking to add a new JVM instruction called invokedynamic, allowing method invocation verification to occur dynamically, at runtime. This new bytecode would make compilation of dynamically typed languages easier, but doesn’t solve our present conundrum.

What, then, is the solution? Well, we would probably need to use a single compiler (or compiler framework) for both languages. Classes defined in Java syntax would need to be aware of classes defined in Groovy syntax, and vice versa, before any bytecode is serialized into .class files. I tried to stay away from the compiler classes in college, but it seems fairly obvious that, assuming Eric isn’t lying, you would need a compiler framework providing:

  • Pluggable lexical and syntactical analysis (one per language).
  • Unified semantic analysis, code optimization and generation.
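A toy Java sketch of that split might look like the following. It is nowhere near a real compiler: the "front ends" are regular expressions, there is no code generation, and every name (ClassDecl, FrontEnd, RegexFrontEnd, UnifiedCompiler) is hypothetical. The only point is to show why a shared symbol table helps: all declarations are collected first, whatever the syntax, and references are resolved afterwards, so a Java class and a Groovy class can refer to each other.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** A class declaration as the shared back end sees it, regardless of source syntax. */
final class ClassDecl {
    final String name;
    final String language;
    final List<String> references;

    ClassDecl(String name, String language, List<String> references) {
        this.name = name;
        this.language = language;
        this.references = references;
    }
}

/** Hypothetical plug-in point: one implementation per source language. */
interface FrontEnd {
    boolean accepts(String fileName);
    ClassDecl parse(String source);
}

/** Toy front end; a real one would run a proper lexer and parser. */
final class RegexFrontEnd implements FrontEnd {
    private static final Pattern DECLARATION = Pattern.compile("class\\s+(\\w+)");
    private static final Pattern REFERENCE = Pattern.compile("(?:extends|implements|new)\\s+(\\w+)");
    private final String language;
    private final String extension;

    RegexFrontEnd(String language, String extension) {
        this.language = language;
        this.extension = extension;
    }

    public boolean accepts(String fileName) {
        return fileName.endsWith(extension);
    }

    public ClassDecl parse(String source) {
        Matcher declaration = DECLARATION.matcher(source);
        if (!declaration.find()) {
            throw new IllegalArgumentException("No class declaration found");
        }
        List<String> references = new ArrayList<>();
        Matcher reference = REFERENCE.matcher(source);
        while (reference.find()) {
            references.add(reference.group(1));
        }
        return new ClassDecl(declaration.group(1), language, references);
    }
}

final class UnifiedCompiler {
    private final List<FrontEnd> frontEnds;

    UnifiedCompiler(List<FrontEnd> frontEnds) {
        this.frontEnds = frontEnds;
    }

    /** Phase 1 collects every declaration; phase 2 resolves references against the full table. */
    Map<String, ClassDecl> analyze(Map<String, String> sourcesByFileName) {
        Map<String, ClassDecl> symbols = new LinkedHashMap<>();
        for (Map.Entry<String, String> file : sourcesByFileName.entrySet()) {
            ClassDecl declaration = frontEndFor(file.getKey()).parse(file.getValue());
            symbols.put(declaration.name, declaration);
        }
        for (ClassDecl declaration : symbols.values()) {
            for (String reference : declaration.references) {
                if (!symbols.containsKey(reference)) {
                    throw new IllegalStateException(declaration.name + " refers to unknown type " + reference);
                }
            }
        }
        return symbols;
    }

    private FrontEnd frontEndFor(String fileName) {
        for (FrontEnd frontEnd : frontEnds) {
            if (frontEnd.accepts(fileName)) {
                return frontEnd;
            }
        }
        throw new IllegalArgumentException("No front end for " + fileName);
    }
}
```

Hand it an Order.java that creates an Invoice and an Invoice.groovy that creates an Order, and both resolve, because nothing is looked up until everything has been declared.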

JSR 199 (Java Compiler API) looks interesting, but it doesn’t provide this sort of pluggable architecture. However, given that Sun GPL’ed the Java compiler not too long ago, it doesn’t seem too far-fetched to wonder if someone from Groovy, JRuby, Jython or Rhino might not start exploring the possibilities.

There’s also a second possibility that might require a little less tinkering. According to Tom Ball, a technology director at Sun who works on Java programming language tools, an ingenious hacker might make use of the JSR 199 compiler API to create a custom JavaFileManager implementation such that calls to getJavaFileForInput() requesting JavaFileObjects of type JavaFileObject.Kind.CLASS would forward the requests to the Groovy compiler. This custom JavaFileManager would behave similarly to the existing GroovyClassLoader, providing a convenient facade for the Groovy compiler, but could possibly be written in such a way that circular dependencies would be resolved correctly.
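For the curious, here is roughly what the skeleton of such a file manager could look like in Java. The javax.tools types are the real JSR 199 API; ForeignCompiler, BytecodeFileObject and ForeignAwareFileManager are names I made up, and ForeignCompiler in particular is a placeholder for whatever would actually drive the Groovy compiler. I have not tried this against a real compilation. My understanding is that javac finds most classes through list() and inferBinaryName() rather than getJavaFileForInput(), so a working version would probably have to override those as well.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Optional;
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;

/** Hypothetical hook: turns a class name into bytecode, e.g. by asking the Groovy compiler. */
interface ForeignCompiler {
    Optional<byte[]> compile(String className);
}

/** An in-memory .class file that can be handed back to the Java compiler. */
class BytecodeFileObject extends SimpleJavaFileObject {
    private final byte[] bytecode;

    BytecodeFileObject(String className, byte[] bytecode) {
        super(URI.create("mem:///" + className.replace('.', '/') + JavaFileObject.Kind.CLASS.extension),
              JavaFileObject.Kind.CLASS);
        this.bytecode = bytecode;
    }

    @Override
    public InputStream openInputStream() {
        return new ByteArrayInputStream(bytecode);
    }
}

/** Delegates to the wrapped file manager and falls back to the foreign compiler for missing classes. */
class ForeignAwareFileManager extends ForwardingJavaFileManager<JavaFileManager> {
    private final ForeignCompiler foreign;

    ForeignAwareFileManager(JavaFileManager delegate, ForeignCompiler foreign) {
        super(delegate);
        this.foreign = foreign;
    }

    @Override
    public JavaFileObject getJavaFileForInput(JavaFileManager.Location location, String className,
                                              JavaFileObject.Kind kind) throws IOException {
        JavaFileObject found = super.getJavaFileForInput(location, className, kind);
        if (found != null || kind != JavaFileObject.Kind.CLASS) {
            return found;
        }
        // Nothing on disk: ask the other language's compiler for the class instead.
        return foreign.compile(className)
                .map(bytes -> (JavaFileObject) new BytecodeFileObject(className, bytes))
                .orElse(null);
    }
}
```

You would wrap the file manager returned by JavaCompiler.getStandardFileManager() in one of these and pass it to JavaCompiler.getTask(). Whether circular dependencies can actually be resolved this way depends on the foreign compiler being able to produce at least a stub of the class before the Java side has been compiled, which this sketch does not attempt.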

As Dierk puts it:

Until the Java compiler is aware of classes not yet compiled in other languages, you have to use intermediary interfaces or abstract classes in Java to make the interaction between Java and Groovy smoother during the compilation process. Let’s hope some day the Java compilers will provide hooks for interacting with foreign compilers of alternate languages for the JVM.

So. The code is all out there. Does anyone want to bet on how long it takes someone to stop hoping and start scratching their itch?

Giving Groovy a Chance

So Howard Lewis Ship (HLS) recently gave Groovy a shot by trying it out with the brand-spanking-new Tapestry 5, and the results were relatively promising. Then Marc Guillemot, of HtmlUnit and Canoo WebTest fame, started trying to get me to buy Dierk’s recently-published Groovy In Action ;-) I had considered using Groovy to reduce the LOC in the domain objects for one of my side projects, but I’m using JPA annotations as part of my persistence configuration. As of this writing, only the most bleeding-edge Groovy bits will support annotations, so I decided it wasn’t worth pursuing. However, I recently realized that this particular project is going to require integration with different legacy database systems, which means I’d probably be better off doing my mapping in hbm.xml files.

Now that it’s highly unlikely that I will be using JPA annotations to map my domain objects to the backing database, I decided that it was time to give Groovy a try, and see if I could get rid of all the boilerplate getters and setters in my domain objects. I chose one of the simpler domain objects for the trial. It contained a little over 150 LOC, comprising 7 properties, overridden equals() / hashCode() / toString(), and a couple of trivial convenience methods.

Initial setup consisted of installing the Groovy Eclipse Plugin and took about 1 minute. My initial experience was similar to Howard’s — everything just worked. I immediately halved the number of lines in the class, since I could get rid of all my getters and setters. I should mention that before moving this particular domain object over into Groovy, I wrote extensive unit tests for it. I just can’t get over the feeling that giving up all the compile-time checks that Java has to offer somehow puts me on shaky ground.

After making sure that my unit tests still passed, I tried to figure out if there was anywhere else I could cut down on the LOC. I settled on the equals() and hashCode() methods, and decided that surely there must be a way in Groovy to declaratively express the fields that should be used as a bean’s identity. Unfortunately, the only thing I was able to find was GROOVY-27, an open enhancement wish filed four years ago. Eventually I had a pass at writing a GroovyBean superclass with generic equals() and hashCode() methods relying on an abstract getIdentity() method, which returns a list of the values comprising the bean’s identity. I could have done this in Java, and it was 5 times slower than the older implementation (which was already relatively slow due to the use of EqualsBuilder and HashCodeBuilder), but it reduced my LOC to 35. Plus, the root of all evil is premature optimization, right? ;-)
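Since I mentioned this could have been done in Java, here is a rough Java rendition of the idea. It is not the Groovy class I actually wrote: IdentityBean and Customer are illustrative names, and only getIdentity() comes from the description above.

```java
import java.util.Arrays;
import java.util.List;

/** Superclass providing generic equals() and hashCode() based on a declared identity. */
abstract class IdentityBean {

    /** Returns the values that make up this bean's identity, in a fixed order. */
    protected abstract List<Object> getIdentity();

    @Override
    public final boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (other == null || getClass() != other.getClass()) {
            return false;
        }
        return getIdentity().equals(((IdentityBean) other).getIdentity());
    }

    @Override
    public final int hashCode() {
        return getIdentity().hashCode();
    }
}

/** Example bean: email and country form the identity, nickname does not. */
class Customer extends IdentityBean {
    private final String email;
    private final String country;
    private String nickname;

    Customer(String email, String country, String nickname) {
        this.email = email;
        this.country = country;
        this.nickname = nickname;
    }

    @Override
    protected List<Object> getIdentity() {
        return Arrays.<Object>asList(email, country);
    }
}
```

Building a fresh list on every call is the obvious cost of this approach, and is presumably part of why my version came out slower than the hand-written methods.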

Up to this point everything had gone relatively smoothly, but unfortunately what lay ahead was frustration upon frustration. First, in implementing the equals() and hashCode() methods as described above, I naively decided to use Groovy’s cool closure support for looping. Bad decision. Not surprisingly, there’s a lot of magic going on under the covers to get closures to work, and I got ClassNotFoundExceptions, ClassCastExceptions and StackOverflowErrors until I got rid of the closures. Apparently some of that magic itself relies on these essential methods, and Groovy is not happy to be using magic at this basic level.

Second, I’m using Maven 2 to manage this project, and apparently most Groovy enthusiasts use Ant. The result? Second-class support for Maven. Supposedly there’s a Maven plugin out there somewhere that worked every third Tuesday in 2004, but the “standard” way of integrating Maven 2 and Groovy is via Groovy’s ant task. It was painful enough to see the 100 lines of Java code I had just annihilated resurrected as ghastly XML code in my pom.xml file, but unfortunately the Groovy compilation ends up occurring after the standard Java compilation. This would be OK if I were writing Groovy code that never got used by Java code, but unfortunately, domain objects are the bedrock of any application and are used everywhere else. I suppose I could bind the Groovy compilation to some other build lifecycle phase which occurs before the standard compile phase, or put all my Groovy files in a sub-module referenced by the main project, or any number of other hacks, but the point is that I don’t want to spend my time fighting Groovy so that I can be three times as productive writing my domain objects.

Finally, there is the issue of the Groovy Eclipse plugin itself. It works well out of the box, but there is some room for improvement. For example, why must it compile into the bin-groovy directory? Why can’t it just use the project’s default output folder? Also, while you can right-click on a project and choose “Add Groovy Nature”, there’s not an option to “Remove Groovy Nature”. I’m also slightly OCD when it comes to code formatting, and it’s nice to be able to auto-format my code. Allowing the IDE to remove those trailing tabs for me probably doubles my productivity. However, the Groovy editor can’t auto-format.

All in all, aside from the undocumented no-closures-in-hashCode-or-equals issue, I found Groovy to be a well-polished alternative to Java on the JVM. As I mentioned earlier, the LOC in my domain objects was cut in half, and a declarative identity feature for beans would be absolutely killer. However, when you consider the need to interact with the entire Java ecosystem, in this case Eclipse and Maven 2, it’s obvious that Groovy could use some lovin’. A solid Maven 2 plugin that doesn’t require 50 lines of XML (convention over configuration, baby!) and some polish on the Eclipse plugin would go a long way to making a la carte Groovy use possible. As things stand, however, I’m going to have to continue writing my domain objects in Java.

UPDATE: Apparently the JRuby guys are working on IDE integration and a Maven 2 plugin