JavaScript Iteration Order

The blogosphere has been flooded with Google Chrome-related posts for the past week or so, especially in the niches catering to web developers, browser developers and JavaScript enthusiasts. I haven’t tried it out yet, but I don’t see why that should keep me from joining the fracas ;-)

The Chrome team’s vision is refreshing. Their technology is intriguing. Their future? Uncertain. However, these topics have been explored in such depth that any effort at an original contribution in these areas, at this time, is almost certainly useless.

I did, however, have to smile when I read John Resig’s post on JavaScript in Chrome. In it, he highlights a number of JavaScript issues in this new browser, the third one being for loop ordering:

Currently all major browsers loop over the properties of an object in the order in which they were defined. Chrome does this as well, except for a couple cases… If an object contains a value which is not a primitive… then its properties will be enumerated in a different order from which they were defined.

This is an interesting bug due to one fact: This behavior is explicitly left undefined by the ECMAScript specification… However, specification is quite different from implementation. All modern implementations of ECMAScript iterate through object properties in the order in which they were defined. Because of this the Chrome team has deemed this to be a bug and will be fixing it.

The reason I find this amusing is that the Rhino codebase suffered from the exact same problem, only recently adhering to the de facto standard which in this area subsumes the de jure ECMAScript standard.

We had a number of HtmlUnit users complain about this lapse, and even now the fix is not available in any official Rhino releases — hence HtmlUnit’s Rhino fork.
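Since Rhino is written in Java, the gap between the de jure and de facto behaviors can be pictured as the gap between HashMap (no ordering promise) and LinkedHashMap (insertion order). The sketch below is only an analogy under that assumption; OrderedScriptObject and its methods are hypothetical names, not Rhino's or Chrome's actual property storage.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a script object; not Rhino's real implementation.
class OrderedScriptObject {
    // LinkedHashMap iterates in insertion order, mirroring the de facto
    // "definition order" behavior. A plain HashMap would make no such
    // promise, much like the ECMAScript specification.
    private final Map<String, Object> properties = new LinkedHashMap<>();

    void put(String name, Object value) {
        properties.put(name, value);
    }

    // Rough equivalent of a JavaScript for-in loop over property names.
    List<String> propertyNames() {
        return new ArrayList<>(properties.keySet());
    }

    public static void main(String[] args) {
        OrderedScriptObject o = new OrderedScriptObject();
        o.put("zebra", 1);
        o.put("apple", new Object()); // a non-primitive value changes nothing here
        o.put("mango", 3);
        System.out.println(o.propertyNames()); // [zebra, apple, mango]
    }
}
```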

It appears implementers are doomed to repeat this mistake until such time as this unofficial convention is given official sanction. The recent outbreak of common sense, pragmatism and brotherly love on the ECMAScript committee is cause for optimism.

As John Resig mentions in his post on ECMAScript 3.1, much of the committee’s recent work has focused on standardization of such conventions:

Probably the most important development within all this [ECMAScript Harmony news] is the codification of existing de facto standards. For example, the concept of JavaScript getters and setters (implemented by Mozilla, Apple, and Opera) are going to be quickly fast-tracked into the specification (in the case of getters and setters they already have been). Seeing real-world code quickly make a bee-line for standardization is truly heartwarming. We’ll probably see more of this for topics like ‘let’ and ‘expression closures’ – but which will arrive post-ECMAScript 3.1 (since they require new syntax).

So will definition-order property iteration make it into ECMAScript 3.1? One can only hope, for the sake of the V9, V10 and V11 development teams…

UPDATE: The current (September 1st, 2008) ECMAScript 3.1 spec contains the following comment regarding the text in question:

We considered specifying the enumeration order but there were too many issues with existing implementations that optimize the representation of arrays.

Argh…

From the Annals of Leaky Abstractions

Last week I created a new Java package named “net.gredler.app.converter” during a bit of refactoring. I know, I know. Pretty impressive stuff. But there’s more.

If you’ve used Eclipse before, you know that it provides feedback as you type, alerting you if your package name is not valid as is. For example, if you type “net.gredler.app.”, Eclipse will helpfully throw up the following error:

Invalid package name. A package name cannot start or end with a dot.

Well, I eventually got to “net.gredler.app.con”, and received the following error message:

Invalid package name. con is an invalid name on this platform.

Weird, no? It turns out that there are some limitations on directory names in Windows: you can’t have directories named “con”, “prn”, “aux” or “nul”, among others.

Apparently these were reserved words in DOS back in the day, and this restriction has propagated to the latest versions of Windows in the name of backward compatibility.

So if you’re coding in Linux or Mac OS X and want to ensure that your Java web application isn’t deployable on Windows, adding a package named “con” ought to do the trick (assuming your servlet container explodes WARs) ;-)
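If you would rather catch this before a Windows user does, a check along the following lines could run as part of a build. It is a minimal sketch, not Eclipse's actual validation: WindowsNames and firstReservedSegment are made-up names, and the list covers the device names mentioned above plus COM1 through COM9 and LPT1 through LPT9.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; Eclipse's real validation logic is not shown in this post.
class WindowsNames {
    // Device names that Windows refuses to use as file or directory names.
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
        "con", "prn", "aux", "nul",
        "com1", "com2", "com3", "com4", "com5", "com6", "com7", "com8", "com9",
        "lpt1", "lpt2", "lpt3", "lpt4", "lpt5", "lpt6", "lpt7", "lpt8", "lpt9"));

    // Returns the first package segment that is reserved on Windows, or null if there is none.
    static String firstReservedSegment(String packageName) {
        for (String segment : packageName.split("\\.")) {
            // Windows treats these names case-insensitively.
            if (RESERVED.contains(segment.toLowerCase(Locale.ROOT))) {
                return segment;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(firstReservedSegment("net.gredler.app.con"));       // con
        System.out.println(firstReservedSegment("net.gredler.app.converter")); // null
    }
}
```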

Maven Enhancements to Keep an Eye On

MNG-3397: More concise POM syntax (more info here).
MNG-3379: Concurrent artifact resolution (more info here).
MNG-2315: Easy mass transitive dependency exclusions.
MNG-1977: Global transitive dependency exclusions.

Have I missed any?

HtmlUnit 2.2 Released

A new version of HtmlUnit, the Java headless browser, has been released. The main purpose of this library is to enable scalable, performant pure-Java integration testing of web applications. HtmlUnit can also be used to scrape the web, and drives a number of other open source libraries, including Canoo WebTest, WebDriver, Celerity, Schnell, and JSFUnit.

Highlights of changes incorporated in version 2.2 include:

- Better handling of ill-formed HTML.
- Enhancements in the areas of performance and memory usage.
- Enhanced API for dealing with attachments.
- Enhanced API for dealing with proxies.
- Use of a (temporary) forked version of Rhino to fix many JavaScript bugs.
- More than 80 bugfixes and enhancements overall.

Please see the changelog for more information.

HtmlUnit 2.2 is available via the central Maven repository, or may be downloaded directly here.

JavaScript Isn’t a Toy Language (Anymore)!

So… JavaScript. When did one begin to feel that this crufty, popup-enabling, slightly-better-than-VB programming language for the unwashed masses might actually merit a second look?

Was it the first time you used Google Maps and realized you were moving the map without reloading the entire page?

Was it when Sun decided to include Rhino in the JDK?

Or when you browsed the Dojo codebase and realized that Java doesn’t have a monopoly on obtuse, enterprisey, over-architected design?

No? Maybe you figured it out when 60+ companies got together and decided it was worth the effort to start the OpenAjax Alliance in order to formalize common sense best practices for JavaScript libraries.

I know! It was when you (and your mother, and your coworkers, and all of their extended families) read Steve Yegge’s NBL blog post.

Actually, maybe it was when you decided to add John Resig to your blogroll.

Me?

The other week I read this article by John, in which he mentions the big O performance characteristics of a certain JavaScript benchmark. It doesn’t matter what benchmark, just focus on the important part here: big O. In an article on JavaScript. Big O. And JavaScript. Big O. JavaScript. And nary a raised eyebrow among the comments; almost the complete opposite, actually!

It’s almost like it’s respectable, or som’n.

URL.hashCode() Considered Harmful

I just cut HtmlUnit’s build time by about 20% by changing four lines of code. How? HtmlUnit keeps a small cache of web requests in a HashMap, keyed on the request URL. The problem with this is twofold:

  1. The URL.hashCode() method is synchronized.
  2. The URL.hashCode() method triggers DNS lookups for the URL hosts.

The impact of item 2 was magnified by the fact that some of the HtmlUnit unit tests use a mock web connection to connect to fake URLs. DNS (non)resolution of these fake URLs took an especially long time.

The fix was to key the map entries on the value of URL.toString() instead. Apparently I’m not the first person to stumble across this problem. So think twice before coding your next HashMap<URL, XXX> ;-)
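In code, the change amounts to something like the sketch below. ResponseCache is a hypothetical class written for this post, not HtmlUnit's actual cache, and the String payload stands in for whatever response object is really stored.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache for illustration; not HtmlUnit's actual class.
class ResponseCache {
    // Keyed on the URL's string form. String.hashCode() is a pure computation
    // over the characters: no locking and no DNS lookup.
    private final Map<String, String> cache = new HashMap<>();

    void put(URL url, String response) {
        cache.put(url.toString(), response);
    }

    String get(URL url) {
        return cache.get(url.toString());
    }

    public static void main(String[] args) throws MalformedURLException {
        ResponseCache cache = new ResponseCache();
        // A fake host, like the ones the mock web connection uses in tests.
        URL url = URI.create("http://no-such-host.invalid/page.html").toURL();
        cache.put(url, "<html></html>");
        System.out.println(cache.get(url)); // <html></html>
    }
}
```

One trade-off to keep in mind: string keys compare textually, so two spellings of the same address (for example, hosts that differ only in case) become separate entries, whereas URL.equals() would try to resolve the hosts before comparing them.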

HtmlUnit 2.1 Released

The HtmlUnit team is pleased to announce a new release of HtmlUnit. This latest version includes a number of bug fixes and performance enhancements, and sports excellent support for GWT, jQuery and Sarissa, decent support for Prototype and Dojo, and basic support for YUI. Please see the changelog for more details.

In related news, we’ve (temporarily) forked the Rhino JavaScript engine in order to add browser-compatible JavaScript behavior which is slowly making its way into the Rhino project proper. The most important of these changes (so far) is definition-order property iteration. All of this should be available in the next version; many thanks to Marc Guillemot for his work in this area.

Anyway, give it a whirl and let us know what you think!

Thomas Paine on Software Design

I draw my idea of the form of government software from a principle in nature which no art can overturn, viz. that the more simple any thing is, the less liable it is to be disordered, and the easier repaired when disordered;…

Thomas Paine, Common Sense (1776)

Hibernate: Trouble in Paradise

I’ve written before about the problems which seem to crop up when you introduce Hibernate into your project. Unfortunately, I’m becoming more and more convinced that these are not isolated issues. Hibernate has a dependency management problem.

Things were not always this way. Hibernate was once independent, unbeholden to outside influences. Sure, you’ve always needed a somewhat-thicker-than-average skin in order to file bugs or ask questions in the forums. An extra helping of patience never hurt, either. But nowadays it seems like you need all of these things, plus the time and skill necessary to patch a JAR or two.

We recently upgraded to the latest Hibernate production JARs: Hibernate Core 3.2.6.GA, Hibernate Annotations 3.3.0.GA and Hibernate EntityManager 3.3.1.GA. Having already benefited from our earlier experience regarding conflicts between Spring and Hibernate, I figured this would take about five minutes: modify our root Maven2 POM, do a full build, verify everything works. Done! Not quite.

The first problem we encountered, reported 4 months ago, is that the production Hibernate EntityManager POM excludes a transitive dependency which it should not exclude. Net result? You have to hack together a custom version of Hibernate EntityManager which does not exclude this transitive dependency. Of course, this transitive dependency exists in the JBoss Maven2 repo, but not in the central repo. Nice.

Next, if you’re using a standard Maven2 Windows installation, you’ll run into this bug, because Hibernate now refuses to load JARs from directories with spaces in them (the standard location for Maven2 local repositories in Windows is in “Documents and Settings”). Very nice. Welcome to 1998!

You may notice a trend here: an old ASM dependency which causes conflicts with other libraries, a required dependency that is mistakenly excluded, and a dependency on a deprecated class in a buggy JBoss utility library.

Two of my co-workers are already suggesting we switch to TopLink. Jokingly, of course. For now. Napoleon is reputed to have said that “if they want peace, nations should avoid the pin-pricks which precede cannon shots.” Third-party libraries should likewise avoid annoying their users with irritating minutiae, or they may find these users mobilizing the artillery.

Space vs Time

A long, long time ago I took a college course, the title of which was Languages and Translation. The content of this sophomore-level course? A smörgåsbord of systems programming, heaps and stacks, pointers, *nix system calls, compilers, lex and yacc, grammars, lexical and semantic analysis, code optimization, and data representation — all taught and learned in C. While learning C. Oh, and Lisp.

This course made quite an impression, mainly because of my initial inexperience. I should explain that the path which led me to L&T went something like this:

  1. I’m a senior in high school. I’m applying to college. I need to choose a major. Hmm… writing that Blackjack game on my TI calculator was pretty cool. It was a whole 50 lines of code! Plus I’m in that typing class, closing in on 20 words per minute. Maybe I’ll try Computer Science!
  2. I’m a freshman at Georgia Tech. First semester. I’m taking Intro to Computing. Man, this HTML nonsense sure uses a lot of brackets! And this Microsoft Access program is impossible to use!
  3. Still a freshman at Tech. Second semester. This Intro to Programming class is pretty crazy! We’re using Java for the assignments. The TA mentioned in passing that there are no pointers in Java, but I have no idea what a pointer is, so I couldn’t care less. I’m beginning to grok object-oriented programming.
  4. Welcome to Languages and Translation! Malloc, Malloc, Malloc! Realloc, Calloc, Malloc! Bwahahahahahaha!

Psychological damage aside, this was a great class. Jim Greenlee, who taught the course, was both an evil bastard and a great teacher. One of the tenets of code optimization which he often highlighted was “space versus time,” the idea that you can often optimize for one at the expense of the other, but rarely for both at the same time.

For example, a compiler can decide to inline a short function in order to avoid time-consuming stack allocations, but the compiled program will be larger (less time, but more space). Of course, if space (memory) is at a premium, your compiler might instead try to recognize common code sequences and hoist them into artificial functions (less space, but more time).

Flash forward 8 years. We have a client/server application at work which uses DTOs to transfer data to and from the client, and we use Hibernate on the server to persist our BOs. A specific server call, invoked in the presence of a large amount of data, brings the application to its knees.

Immediately we jumped to conclusions — Arrgh! Hibernate is such a hog! If you don’t code things perfectly, you can’t scale! And sometimes not even then! A quick profiling session confirmed our fears. Three hours later we had bypassed Hibernate in this specific instance, coding to the JDBC API instead. Unfortunately, this wasn’t the last of our performance problems. A second profiling session indicated that we had another bottleneck in our DTO-to-BO conversion routines!

Now, something which must be understood about Hibernate’s collection semantics is that when you use Hibernate to load BO A, which has an X-to-N relationship with BO B, you should (usually) use the collection of Bs provided by Hibernate. For example, if you use Hibernate to load a UserGroup, and you want to modify the list of Users associated with said UserGroup, you should modify the existing list of Users. You should not create a new list, add Users to it, and then give the UserGroup the new list of Users. Why? Because creating a new list results in a one-shot delete, followed by N insert statements. This is usually not desired.

However, a naive approach to modifying the collection provided by Hibernate (clearing it and then adding the BOs which you know you want) is just as bad, because a call to collection.clear() also results in a one-shot delete. The best approach is one of minimal modification to the existing collection.

In the case of DTO-to-BO conversion, where the DTO representation of an object is being transferred to the corresponding BO representation, this means adding items to the BO’s collection that are in the DTO’s collection but not in the BO’s collection, and removing items from the BO’s collection that are in the BO’s collection but not in the DTO’s collection. Elements that are in both collections are simply ignored.

The obvious implementation of this algorithm looks something like this:

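// Add the items that are in the DTO collection but not yet in the BO collection.
for (DTO dto : dtos) {
  if (!contains(bos, dto)) add(bos, dto);
}

// Remove the items that are in the BO collection but no longer in the DTO collection.
// Iterate over a copy so that removing from bos does not invalidate the loop.
for (BO bo : new ArrayList<BO>(bos)) {
  if (!contains(dtos, bo)) remove(bos, bo);
}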

Unfortunately, when using lists, the contains() calls above hide a nested loop, resulting in O(n²) performance. Once n gets into the thousands, things start to get sluuuuggish. Veeeeery sluuuuggish.

The solution? Trade a little space for a lot of time! By constructing HashMaps which contain all of the BOs in the lists, keyed on business keys which uniquely identify the BOs, the contains() calls above can be performed in constant time by invoking map.containsKey(). The result is O(n) performance. Much better!
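A minimal sketch of the map-based version follows. The SyncDto, SyncBo and CollectionSync classes and the String business key are assumptions made for illustration; the real conversion code is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical DTO and BO types, each identified by a business key.
class SyncDto {
    final String key;
    SyncDto(String key) { this.key = key; }
}

class SyncBo {
    final String key;
    SyncBo(String key) { this.key = key; }
}

class CollectionSync {
    // Modifies bos in place so that it holds exactly the keys found in dtos,
    // touching only the elements that actually need to change.
    static void sync(List<SyncBo> bos, List<SyncDto> dtos) {
        // The extra space: one index per collection, keyed on the business key.
        Map<String, SyncBo> bosByKey = new HashMap<>();
        for (SyncBo bo : bos) bosByKey.put(bo.key, bo);
        Map<String, SyncDto> dtosByKey = new HashMap<>();
        for (SyncDto dto : dtos) dtosByKey.put(dto.key, dto);

        // Remove the BOs that no longer have a matching DTO (constant-time lookups).
        bos.removeIf(bo -> !dtosByKey.containsKey(bo.key));

        // Add a BO for every DTO that has no matching BO yet.
        for (SyncDto dto : dtos) {
            if (!bosByKey.containsKey(dto.key)) {
                SyncBo bo = new SyncBo(dto.key);
                bos.add(bo);
                bosByKey.put(dto.key, bo); // guards against duplicate DTO keys
            }
        }
    }

    public static void main(String[] args) {
        List<SyncBo> bos = new ArrayList<>(Arrays.asList(
            new SyncBo("a"), new SyncBo("b"), new SyncBo("c")));
        List<SyncDto> dtos = Arrays.asList(
            new SyncDto("b"), new SyncDto("c"), new SyncDto("d"));
        sync(bos, dtos);
        for (SyncBo bo : bos) System.out.println(bo.key); // b, c, d (one per line)
    }
}
```

The BOs for "b" and "c" are the same instances as before the call, which is the whole point: the existing collection is only touched where it differs from the DTOs.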
