JavaScript Isn’t a Toy Language (Anymore)!

So… JavaScript. When did one begin to feel that this crufty, popup-enabling, slightly-better-than-VB programming language for the unwashed masses might actually merit a second look?

Was it the first time you used Google Maps and realized you were moving the map without reloading the entire page?

Was it when Sun decided to include Rhino in the JDK?

Or when you browsed the Dojo codebase and realized that Java doesn’t have a monopoly on obtuse, enterprisey, over-architected design?

No? Maybe you figured it out when 60+ companies got together and decided it was worth the effort to start the OpenAjax Alliance in order to formalize common sense best practices for JavaScript libraries.

I know! It was when you (and your mother, and your coworkers, and all of their extended families) read Steve Yegge’s NBL blog post.

Actually, maybe it was when you decided to add John Resig to your blogroll.

Me?

The other week I read this article by John, in which he mentions the big O performance characteristics of a certain JavaScript benchmark. It doesn’t matter what benchmark, just focus on the important part here: big O. In an article on JavaScript. Big O. And JavaScript. Big O. JavaScript. And nary a raised eyebrow among the comments; almost the complete opposite, actually!

It’s almost like it’s respectable, or som’n.

Thomas Paine on Software Design

I draw my idea of the form of government software from a principle in nature which no art can overturn, viz. that the more simple any thing is, the less liable it is to be disordered, and the easier repaired when disordered;…

Thomas Paine, Common Sense (1776)

Space vs Time

A long, long time ago I took a college course, the title of which was Languages and Translation. The content of this sophomore-level course? A smörgåsbord of systems programming, heaps and stacks, pointers, *nix system calls, compilers, lex and yacc, grammars, lexical and semantic analysis, code optimization, and data representation — all taught and learned in C. While learning C. Oh, and Lisp.

This course made quite an impression, mainly because of my initial inexperience. I should explain that the path which led me to L&T went something like this:

  1. I’m a senior in high school. I’m applying to college. I need to choose a major. Hmm… writing that Blackjack game on my TI calculator was pretty cool. It was a whole 50 lines of code! Plus I’m in that typing class, closing in on 20 words per minute. Maybe I’ll try Computer Science!
  2. I’m a freshman at Georgia Tech. First semester. I’m taking Intro to Computing. Man, this HTML nonsense sure uses a lot of brackets! And this Microsoft Access program is impossible to use!
  3. Still a freshman at Tech. Second semester. This Intro to Programming class is pretty crazy! We’re using Java for the assignments. The TA mentioned in passing that there are no pointers in Java, but I have no idea what a pointer is, so I could care less. I’m beginning to grok object-oriented programming.
  4. Welcome to Languages and Translation! Malloc, Malloc, Malloc! Realloc, Calloc, Malloc! Bwahahahahahaha!

Psychological damage aside, this was a great class. Jim Greenlee, who taught the course, was both an evil bastard and a great teacher. One of the tenets of code optimization which he often highlighted was “space versus time,” the idea that you can often optimize for one at the expense of the other, but rarely for both at the same time.

For example, a compiler can decide to inline a short function in order to avoid time-consuming stack allocations, but the compiled program will be larger (less time, but more space). Of course, if space (memory) is at a premium, your compiler might instead try to recognize common code sequences and hoist them into artificial functions (less space, but more time).

Flash forward 8 years. We have a client/server application at work which uses DTOs to transfer data to and from the client, and we use Hibernate on the server to persist our BOs. A specific server call, invoked in the presence of a large amount of data, brings the application to its knees.

Immediately we jumped to conclusions — Arrgh! Hibernate is such a hog! If you don’t code things perfectly, you can’t scale! And sometimes not even then! A quick profiling session confirmed our fears. Three hours later we had bypassed Hibernate in this specific instance, coding to the JDBC API instead. Unfortunately, this wasn’t the last of our performance problems. A second profiling session indicated that we had another bottleneck in our DTO-to-BO conversion routines!

Now, something which must be understood about Hibernate’s collection semantics is that when you use Hibernate to load BO A, which has an X-to-N relationship with BO B, you should (usually) use the collection of Bs provided by Hibernate. For example, if you use Hibernate to load a UserGroup, and you want to modify the list of Users associated with said UserGroup, you should modify the existing list of Users. You should not create a new list, add Users to it, and then give the UserGroup the new list of Users. Why? Because creating a new list results in a one-shot delete, followed by N insert statements. This is usually not desired.

However, a naive approach to modifying the collection provided by Hibernate (clearing it and then adding the BOs which you know you want) is just as bad, because a call to collection.clear() also results in a one-shot delete. The best approach is one of minimal modification to the existing collection.

In the case of DTO-to-BO conversion, where the DTO representation of an object is being transferred to the corresponding BO representation, this means adding items to the BO’s collection that are in the DTO’s collection but not in the BO’s collection, and removing items from the BO’s collection that are in the BO’s collection but not in the DTO’s collection. Elements that are in both collections are simply ignored.

The obvious implementation of this algorithm looks something like this:

for(DTO dto : dtos) {
 if(!contains(bos, dto)) add(bos, dto);
}

for(BO bo : bos) {
 if(!contains(dtos, bo)) remove(bos, bo);
}

Unfortunately, when using lists, the contains() calls above hide a nested loop, resulting in O(n2) performance. Once n gets into the thousands, things start to get sluuuugish. Veeeeery sluuuugish.

The solution? Trade a little space for a lot of time! By constructing HashMaps which contain all of the BOs in the lists, keyed on business keys which uniquely identify the BOs, the contains() calls above can be performed in constant time by invoking map.containsKey(). The result is O(n) performance. Much better!

Creating a DTO DSL With ANTLR 3

We recently began a new development cycle at my day job. The project in question is a JEE client/server application which uses DTOs to transfer information between the client and the server. One of the lessons learned from the previous cycle was that developers tend to want to generalize and reuse code, including DTO code.

Code reuse is generally a good thing. It’s one of a handful of principles which can more generally be summarized as “lazy programmers are good programmers” [1]. But in our specific scenario, DTOs were being extended and reused to such a degree that we were ending up with a near 1:1 correspondence between DTOs and the BOs from which they were derived.

As a result, we ended up with two problems. First, because the DTOs were no longer owned exclusively by a specific use case, screen or scenario, it became much harder to modify them. Developers who were responsible for use cases whose requirements changed no longer had the luxury of a quick DTO change. They had to get in touch with all the other consumers of the affected DTOs and come to some sort of consensus prior to implementing the change.

The second problem was a bit more insidious. Can you guess what it was? A hint: We’re using Hibernate to map our BOs to a database. Another hint: One of the biggest factors in making ORM scalable (and even feasible) is lazy loading.

You can probably guess what our problem was at this point. By developing these swiss-army knife DTOs which essentially replicated the BO object graph, we forfeited all lazy loading advantages. In fact, because we continued to use lazy loading, we were actually getting worse performance than we could have achieved by enabling eager loading throughout the entire BOM!

The solution is a strict ban of (most) code reuse in DTOs. Unfortunately, that means a lot of replicated code. Even more unfortunately, this replicated code is boilerplate snooze-code that you couldn’t pay an intern to write, let alone real ;-) programmers. Even more double-plus unfortunately, our system is implemented in Java, which is nearly as verbose as my wife [2].

So what’s the solution to the solution? Well, if Groovy supported bound properties, it might be an option [3]. But it doesn’t. JRuby, focusing as it does on the “Ruby” side of the equation and not so much on the “J” side of things, isn’t an option either. There we stood, lost in a sea of problematic solutions and questionable answers, when Martin Fowler and Neal Ford came to me in a vision and told me write my own DSL! Not really, but I did decide to write my own DSL.

The DSL is just an ANTLR grammar which allows a programmer to specify the core pieces of information regarding a DTO in a format which Java programmers will find vaguely familiar. Our DTOs require four or five artifacts per property: a constant for the property name (to be used during property binding on the client), an instance variable, a setter, a getter and possibly inclusion in the hashCode, equals and toString methods (depending on whether or not the property is part of the entity’s business key).

During testing, I found that a large (but not uncommonly so) DTO with 30 properties took a little over 1000 lines of Java code. A lot of that was comments, but meaningful comments are required everywhere in this project. That same DTO could be written in 62 lines of code using the DTO DSL: 2 for the class itself and 2 per property. Here’s an example of the DSL:

sorting.ReceptacleDto {
   /** The receptacle number. */
   BIZ int number;
   /** The name of the receptacle type. */
   String type;
   /** The name of the receptacle status. */
   String status;
   /** The receptacle's gross weight. */
   Quantity grossWeight;
   /** The receptacle's destination. */
   String destination;
}

The first line identifies the DTO’s package and class name. Each line thereafter is part of a two-line comment/property combination. The comment is used to generate meaningful JavaDoc comments for the property name constants, instance variables, setters and getters. Each property declaration identifies the property type and name. The property declarations may also include an optional BIZ keyword which flags the property as being part of the entity’s business key (and therefore part of the equals, hashCode and toString implementations).

I should note that this is explicitly not a round-trip solution. Developers write their DTO using the DSL, generate the Java code, and from that point on maintain the Java code manually. Further, no type checks are performed — in the example above, the grammar has no idea whether or not String is a valid type. It just generates the Java syntax, devoid of semantic checks, and the developer can use an IDE to automatically organize the required imports.

The ANTLR grammar file [4] which parses the DSL is only 340 lines of code, 60 lines of which are lexer and parser rules. Play with it, modify it, change it to fit your needs. Enjoy!

[1] Also: Don’t Repeat Yourself; Premature optimization is the root of all evil; etc.

[2] I kid, I kid!

[3] Bound properties (i.e. setters trigger property change events) are required because we bind the DTOs directly to the view via the JGoodies Binding library.

[4] The original filename is Dto.g, but my blogging software considers grammar files to be dangerous beasts, so I had to disguise it as a text file.

SQL Injection Illustrated

Heh.

Code as Art

I figure that as a programmer, I should be aware of the classic literature in the field. Most lists of must-read computer science books include The Mythical Man-Month, a collection of essays by Frederick Brooks. First published in 1975, it presents the author’s views and opinions on the subject of software engineering, as informed by his experiences in managing the deveopment of OS/360 at IBM.

I haven’t gotten through the whole thing yet, but I’ve found it very interesting and informative thus far. There is one section in particular, in the essay called The Tar Pit, which I like. It provides a nice description of the creativeness which is often involved in programming:

As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God’s delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake…

The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures…

Yet the program construct, unlike the poet’s words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

It’s a good thing that there are people out there who can verbalize ideas this well. Especially when, like Bach, they sign their work SDGSoli Deo Gloria.