JRuby + HtmlUnit

I have to say, JRuby is cool stuff.

About a year ago the guys at FINN.no decided to wrap HtmlUnit in a Watir-ish API; Celerity was born, and HtmlUnit was introduced into the Ruby ecosystem.

Now Celerity is itself being wrapped by Culerity, which integrates Celerity and Cucumber.

How cool is that?

JavaScript Performance: Rhino beats IE?

I’ve been examining HtmlUnit’s performance from a couple of different angles lately. As a pure-Java headless browser intended for integration testing, one of HtmlUnit’s big draws is improved performance vis-a-vis native browsers and libraries which drive native browsers (Selenium, WebDriver, etc).

On the one hand, it’s easy to see that HtmlUnit reduces overhead by forgoing a GUI. No layout, no drawing, no problem. If you poke around a little bit, you’ll also find that HtmlUnit does not download most images (there are some exceptions), nor does it download external CSS files if CSS has been disabled. These are all advantages in terms of network usage.

However, as you get closer to the RIA end of the web application spectrum, these performance advantages become increasingly overshadowed by JavaScript performance. HtmlUnit relies on Rhino to do the JavaScript heavy lifting behind the scenes, so as web applications become more functional, we’re going to be relying more and more on Rhino’s muscle.

Google just released version 3 of their V8 JavaScript Benchmark Suite, which tests pure JavaScript and pretty much ignores the DOM manipulation side of things — making it a perfect worst-case scenario benchmark with which to compare HtmlUnit to native browsers. In other words, if you use HtmlUnit such that all of its traditional performance advantages are negated (unlikely though that may be), how does it stack up against the native browsers?

Not too bad, as it turns out (bigger numbers are better):

[Figure: javascript-benchmark1]

The good news is that Rhino is more performant than IE 6 or IE 7, so HtmlUnit still beats these browsers in this unrealistic worst-case scenario.

The bad news is that IE is by far the slowest native browser out there in terms of JavaScript execution speed; we can’t assume that it will remain slow forever… can we?

HtmlUnit 2.4 Released

HtmlUnit 2.4 has been released. See the changelog for more information about all of the improvements made since September, when version 2.3 was released. The TSS announcement is probably a good place to comment or ask questions about this release.

It’s exciting to see the steady improvement in JavaScript support. From the HtmlUnit main page:

The unit tests of some well-known JavaScript libraries are included in HtmlUnit’s own unit tests; based on these unit tests, the following libraries are known to work well with HtmlUnit:

  • jQuery 1.2.6: Full support (see unit test here)
  • MochiKit 1.4.1: Full support (see unit tests here)
  • GWT 1.5.3: Full support (see unit test here)
  • Sarissa 0.9.9.3: Full support (see unit test here)
  • Prototype 1.6.0: Very good support (see unit test here)
  • Ext JS 2.2: Very good support (see unit test here)
  • Dojo 1.0.2: Good support (see unit test here)
  • YUI 2.3.0: Good support (see unit test here)

HtmlUnit 2.2 Released

A new version of HtmlUnit, the Java headless browser, has been released. The main purpose of this library is to enable scalable, performant pure-Java integration testing of web applications. HtmlUnit can also be used to scrape the web, and drives a number of other open source libraries, including Canoo WebTest, WebDriver, Celerity, Schnell, and JSFUnit.

Highlights of changes incorporated in version 2.2 include:

- Better handling of ill-formed HTML.
- Enhancements in the areas of performance and memory usage.
- Enhanced API for dealing with attachments.
- Enhanced API for dealing with proxies.
- Use of a (temporary) forked version of Rhino to fix many JavaScript bugs.
- More than 80 bugfixes and enhancements overall.

Please see the changelog for more information.

HtmlUnit 2.2 is available via the central Maven repository, or may be downloaded directly here.

URL.hashCode() Considered Harmful

I just cut HtmlUnit’s build time by about 20% by changing four lines of code. How? HtmlUnit keeps a small cache of web requests in a HashMap, keyed on the request URL. The problem with this is twofold:

  1. The URL.hashCode() method is synchronized.
  2. The URL.hashCode() method triggers DNS lookups for the URL hosts.

The impact of item 2 was magnified by the fact that some of the HtmlUnit unit tests use a mock web connection to connect to fake URLs. DNS (non)resolution of these fake URLs took an especially long time.

The fix was to key the map entries on the value of URL.toString() instead. Apparently I’m not the first person to stumble across this problem. So think twice before coding your next HashMap<URL, XXX> ;-)
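
Here is a minimal sketch of that kind of fix, assuming a simple in-memory cache. The names `ResponseCache`, `put`, `get` and `url` are made up for illustration and are not HtmlUnit's actual code. The map is keyed on the URL's string form, so lookups hash a String and never call URL.hashCode():

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache keyed on URL strings rather than on URL objects. */
class ResponseCache {
    // String keys: hashing a String needs no DNS lookup and no lock on the URL.
    private final Map<String, String> responses = new HashMap<>();

    void put(URL url, String body) {
        responses.put(url.toString(), body);
    }

    String get(URL url) {
        return responses.get(url.toString());
    }

    int size() {
        return responses.size();
    }

    /** Small helper that turns a string into a URL without a checked exception. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        ResponseCache cache = new ResponseCache();
        // A fake host, similar to the ones used with a mock web connection.
        cache.put(url("http://fake.host.invalid/page.html"), "<html></html>");
        System.out.println(cache.get(url("http://fake.host.invalid/page.html")));
    }
}
```

One trade-off to keep in mind: URL.equals() treats two hosts as equivalent when they resolve to the same IP address, whereas string keys compare text only, so two spellings of the same address become two cache entries.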

HtmlUnit 2.1 Released

The HtmlUnit team is pleased to announce a new release of HtmlUnit. This latest version includes a number of bug fixes and performance enhancements, and sports excellent support for GWT, jQuery and Sarissa, decent support for Prototype and Dojo, and basic support for YUI. Please see the changelog for more details.

In related news, we’ve (temporarily) forked the Rhino JavaScript engine in order to add browser-compatible JavaScript behavior which is slowly making its way into the Rhino project proper. The most important of these changes (so far) is definition-order property iteration. All of this should be available in the next version; many thanks to Marc Guillemot for his work in this area.

Anyway, give it a whirl and let us know what you think!

HtmlUnit in the Wild: New Features

I’ve been using HtmlUnit to crawl the web for the past couple of weeks. This interesting experience has led to two new features:

First, I’ve added an insecure SSL handler which trusts anyone and everyone. Why? Because websites often have misconfigured or expired SSL certificates, and the standard Java behavior is to throw a bunch of exceptions when this happens. Not very nice. So now you can call WebClient.setUseInsecureSSL(true) instead and continue crawling, happily oblivious to the webmaster’s incompetence.
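
For readers curious what “trusts anyone and everyone” looks like in plain Java, here is a rough sketch using the standard JSSE classes. It illustrates the general technique only and is not HtmlUnit's actual implementation; the names `InsecureSsl`, `TrustAll` and `newContext` are invented for this example.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

/** Sketch of a trust-everything SSL setup. Only suitable for tests and crawlers. */
class InsecureSsl {

    /** Trust manager that accepts any certificate chain without inspecting it. */
    static final class TrustAll implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: every client certificate is accepted.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: expired or self-signed server certificates pass.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

    /** Builds an SSLContext whose only trust manager is TrustAll. */
    static SSLContext newContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            context.init(null, new TrustManager[] { new TrustAll() }, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not set up TLS", e);
        }
    }

    public static void main(String[] args) {
        SSLContext context = newContext();
        System.out.println("Created context for protocol " + context.getProtocol());
    }
}
```

Socket factories obtained from such a context skip certificate validation entirely. Hostname verification is typically a separate check, so a sketch like this does not cover it.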

Second, I’ve added a popup blocker. Lots of sites send a bunch of popups your way, and even though they’re not quite as annoying when you’re using a headless browser like HtmlUnit, they still waste time and bandwidth. So now you can call WebClient.setPopupBlockerEnabled(true), and your crawler will be that much faster.

These features will be available in HtmlUnit 1.14, or you can just grab the latest snapshot build here. Enjoy!

Assertions in HtmlUnit

I’ve been going back through the pros and cons of JWebUnit as part of my research for the HtmlUnit vs Foo series of articles I’m writing. One of JWebUnit’s big draws is the set of easy-to-use assertion methods provided by its base test case class, WebTestCase. HtmlUnit doesn’t provide such a thing, because it doesn’t provide a base test case class.

There has always been something of a trade-off here: use JWebUnit and tie yourself to a specific unit testing framework (JUnit) while benefiting from a more domain-specific set of assertions (assertCookiePresent, assertFormPresent, assertLinkPresent, etc), or fly free with HtmlUnit but perform assertions using only the primitive utility methods provided by your unit testing framework (assertNull, assertNotNull, assertEquals, etc).

However, I’ve long thought that it would be nice for HtmlUnit to have the best of both worlds by using an assertion utility class, similar to TestNG’s Assert class. Experiencing the convenience of JWebUnit’s API again has given me the final kick in the pants, and the first version of HtmlUnit’s new WebAssert class is now in SVN. It will be included as part of HtmlUnit 1.14, or you can always grab the latest build here. I’m sure the set of available assertions will grow, but here is the initial list:

  • assertTitleEquals(HtmlPage, String)
  • assertTitleContains(HtmlPage, String)
  • assertTitleMatches(HtmlPage, String)
  • assertElementPresent(HtmlPage, String)
  • assertElementPresentByXPath(HtmlPage, String)
  • assertElementNotPresent(HtmlPage, String)
  • assertElementNotPresentByXPath(HtmlPage, String)
  • assertTextPresent(HtmlPage, String)
  • assertTextPresentInElement(HtmlPage, String, String)
  • assertTextNotPresent(HtmlPage, String)
  • assertTextNotPresentInElement(HtmlPage, String, String)
  • assertLinkPresent(HtmlPage, String)
  • assertLinkNotPresent(HtmlPage, String)
  • assertLinkPresentWithText(HtmlPage, String)
  • assertLinkNotPresentWithText(HtmlPage, String)
  • assertFormPresent(HtmlPage, String)
  • assertFormNotPresent(HtmlPage, String)
  • assertInputPresent(HtmlPage, String)
  • assertInputNotPresent(HtmlPage, String)
  • assertInputContainsValue(HtmlPage, String, String)
  • assertInputDoesNotContainValue(HtmlPage, String, String)
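
To show the shape of the idea (static assertion helpers instead of a base test case class), here is a small self-contained sketch. It is not the WebAssert source: `PageAssert` and its nested `Page` class are hypothetical stand-ins, with `Page` holding just a title and some text in place of HtmlUnit's HtmlPage.

```java
import java.util.regex.Pattern;

/** Static assertion helpers in the spirit of WebAssert; no base test class needed. */
final class PageAssert {

    /** Hypothetical stand-in for a loaded page (HtmlUnit's real type is HtmlPage). */
    static final class Page {
        final String title;
        final String text;

        Page(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    private PageAssert() {
        // Utility class: not meant to be instantiated.
    }

    static void assertTitleEquals(Page page, String expected) {
        if (!expected.equals(page.title)) {
            throw new AssertionError("Expected title '" + expected + "' but was '" + page.title + "'");
        }
    }

    static void assertTitleContains(Page page, String fragment) {
        if (!page.title.contains(fragment)) {
            throw new AssertionError("Title '" + page.title + "' does not contain '" + fragment + "'");
        }
    }

    static void assertTitleMatches(Page page, String regex) {
        // Pattern.matches requires the whole title to match the expression.
        if (!Pattern.matches(regex, page.title)) {
            throw new AssertionError("Title '" + page.title + "' does not match '" + regex + "'");
        }
    }

    static void assertTextPresent(Page page, String text) {
        if (!page.text.contains(text)) {
            throw new AssertionError("Page does not contain text '" + text + "'");
        }
    }

    public static void main(String[] args) {
        Page page = new Page("HtmlUnit - Welcome", "A headless browser for Java");
        assertTitleContains(page, "Welcome");
        assertTextPresent(page, "headless");
        System.out.println("All assertions passed");
    }
}
```

Because the helpers throw a plain AssertionError, they can be called from JUnit, TestNG or a bare main method alike, which is the point of keeping them out of a base class.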

HtmlUnit vs HttpUnit

There’s a lot of misinformation out there regarding web application test tools, so I’ve decided to post a series of short articles comparing some of the open source options available here in Java-land, circa 2007. The first of these articles will focus on HtmlUnit and HttpUnit. Please take my criticism and praise with a grain of salt, as I’m a committer to the HtmlUnit project and thus probably biased. Nevertheless, I will do my best to be objective. I may even overcompensate in the other direction!

Confusion

The HtmlUnit and HttpUnit projects are often confused due to the similarity of their names. And the similarity doesn’t end there: they are both open source projects; they are both 100% Java frameworks, rather than drivers for native browsers like IE or Firefox; and they are both fairly mature projects.

This confusion is compounded by the fact that many test frameworks which once used HttpUnit under the covers have since switched to using HtmlUnit, mainly in order to benefit from its excellent JavaScript support. Examples include JWebUnit, whose FAQ briefly explains the switch, and Canoo WebTest, which switched in 2004 due to JavaScript support issues and an unresponsive development team [1].

HttpUnit

HttpUnit is the granddaddy of web app testing frameworks. Started in the summer of 2000 by Russ Gold [2], it was the first project to focus on this niche area. The project has since stagnated somewhat, with nearly 40% of bugs remaining open, some of them nearly three years old. Its latest maintenance release is about a year and a half old.

The API is fairly low-level, modeling web interactions at something approaching the HTTP request and response level. The following is a slightly modified example from the HttpUnit Cookbook:

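// Start a conversation (HttpUnit's stand-in for a browser session).
WebConversation wc = new WebConversation();
WebResponse resp = wc.getResponse("http://www.google.com/");
// Find a link by its text and follow it.
WebLink link = resp.getLinkWith("About Google");
link.click();
// The conversation keeps track of the page that the click loaded.
WebResponse resp2 = wc.getCurrentPage();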

As you can see, things center around WebConversations, WebRequests and WebResponses. Unfortunately, any page with a decent amount of JavaScript is likely to break HttpUnit, and you can absolutely forget testing any pages which use third-party JavaScript libraries.

Nevertheless, HttpUnit continues to generate 3,000 to 4,000 downloads per month. A good analogy, if I may be allowed a brief subjective comment, is that HttpUnit is to the web app testing world what Struts is to the web app framework world: there are many “better” options out there, but it just won’t go away! ;-)

HtmlUnit

HtmlUnit is itself a fairly old project, having been started by Mike Bowler in early 2002. Mike has since ceased active development, but the project currently boasts three or four active developers and a total of seven committers (whereas HttpUnit remains a one-man show). It averages about three releases per year, and has seen increased developer activity in the past six months or so, especially in the area of JavaScript support.

HtmlUnit’s API is a bit more high-level than HttpUnit’s, modeling web interaction in terms of the documents and interface elements which the user interacts with:

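// The WebClient plays the role of the browser.
WebClient wc = new WebClient();
HtmlPage page = (HtmlPage) wc.getPage("http://www.google.com");
// Look up the search form and its submit button by name.
HtmlForm form = page.getFormByName("f");
HtmlSubmitInput button = (HtmlSubmitInput) form.getInputByName("btnG");
// Clicking the button returns the resulting page.
HtmlPage page2 = (HtmlPage) button.click();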

As you can see, the code centers around WebClients, as well as pages, links, forms, buttons, etc. Pages with a modicum of custom JavaScript will probably work when tested with HtmlUnit. Unfortunately, pages which use third-party libraries might or might not work when tested via HtmlUnit. As of the current version, Prototype, script.aculo.us, DWR and jQuery are known to be supported fairly well, Dojo is a bit of an unknown, YUI is known to be unsupported, and GWT is known to work with fairly simple applications. Most of this compatibility has been achieved in the past two or three releases, so obviously things are fairly fluid.

Conclusion

If you’re using HttpUnit for legacy reasons, it’s a fairly solid package, but don’t expect to get much support when you need to report a bug or submit a patch for a new feature. If you’re starting a new project and are trying to decide between these two frameworks, HtmlUnit wins hands down. It has the features, the community and the momentum.

Of course, if you’re considering web application testing tools, you’re probably looking at more than just these two options. Canoo WebTest, TestMaker, JWebUnit, Selenium, WebDriver and JMeter are all likely to be on your list. Depending on your project budget and requirements, Squish and Mercury QTP may also be under consideration. If that’s the case, stay tuned, because I intend to post a series of web app testing framework comparisons in the coming months — all of them involving HtmlUnit, of course!

[1] It’s interesting to note that both Marc Guillemot and I (two HtmlUnit committers) began by using HttpUnit, submitting patches for missing features — but settled on HtmlUnit when the patches were not applied in a timely manner.

[2] The HttpUnit website states that Russ currently works for Oracle, developing the OC4J application server. Coincidentally, this is the production application server we’re using at my day job. Thanks, Russ! :-)

HtmlUnit 1.12 Released

As per Marc’s announcement on the mailing lists and my post to TSS, HtmlUnit 1.12 has been released.

It contains a really mind-blowing number of bugfixes, a couple of very important performance improvements (including one last-minute change which cut the build time by a third), and a couple of new features like Marc’s experimental AJAX controller. The change log has all the details.

Progress has been made in the compatibility department on a number of fronts: Marc Guillemot has been working on script.aculo.us drag’n’drop support, more Prototype unit tests are passing, I’ve gotten all the jQuery unit tests to pass, and Ahmed Ashour has committed support for basic GWT applications.

Robert Di Marco and I are both interested in investigating YUI compatibility, so there may be some news to look forward to in that area.

Enjoy!
