HtmlUnit in the Wild: New Features

I’ve been using HtmlUnit to crawl the web for the past couple of weeks. This interesting experience has led to two new features:

First, I’ve added an insecure SSL handler which trusts anyone and everyone. Why? Because websites often have misconfigured or expired SSL certificates, and Java's default behavior is to fail the SSL handshake with an exception when this happens. Not very nice. So now you can call WebClient.setUseInsecureSSL(true) instead and continue crawling, happily oblivious to the webmaster’s incompetence.
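For the curious, here is a rough idea of what "trusts anyone and everyone" usually looks like in plain Java. This is only a sketch using the standard JSSE classes, not necessarily how HtmlUnit does it internally; the class and method names (InsecureSslSketch, TrustEveryone, insecureContext) are made up for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

// Hypothetical illustration; these names are not part of HtmlUnit.
final class InsecureSslSketch {

    /** Trust manager that accepts every certificate chain without inspecting it. */
    static final class TrustEveryone implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: never rejects a client certificate.
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // Intentionally empty: expired or self-signed server certificates pass.
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

    /** Builds an SSLContext whose only trust manager is TrustEveryone. */
    static SSLContext insecureContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            // null key managers and null SecureRandom select the platform defaults.
            context.init(null, new TrustManager[] { new TrustEveryone() }, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create insecure SSL context", e);
        }
    }
}
```

Sockets created from insecureContext().getSocketFactory() skip certificate validation entirely, which is fine for a crawler that only reads public pages but not something to use where the identity of the server matters.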

Second, I’ve added a popup blocker. Lots of sites send a bunch of popups your way, and even though they’re not quite as annoying when you’re using a headless browser like HtmlUnit, they still waste time and bandwidth. So now you can call WebClient.setPopupBlockerEnabled(true), and your crawler will be that much faster.

These features will be available in HtmlUnit 1.14, or you can just grab the latest snapshot build here. Enjoy!


1 Comment

  1. December 2, 2007 at 3:43 pm

    [...] Selenium was based on an older application driver called JWebUnit, which in turn was originally based on HttpUnit. JWebUnit and most other application drivers are now based on HtmlUnit, and all of the history and other detailed information can be read about on this wonderful blog entry entitled HtmlUnit vs. HttpUnit (a must read!). That article is also very good to hear about support of external Javascript libraries, and the author’s blog also has updates on new HtmlUnit features. [...]

