I've recently been working with an SEO firm to improve the keyword density, structure, and several other aspects of our public website. In their long list of recommendations was the task of producing nice, pretty URLs with relevant keywords, dashes instead of underscores, and so on. Easily said, not so easily executed, or so I thought. Our architecture, in a nutshell, is Apache web servers fronting WebSphere application servers running a Struts-based web application. Now if you know Struts, nine times out of ten your URLs are ugly, because a bunch of programmers didn't think at all about the impact the URLs would have on natural search when they developed your application, and the framework developers pretty much left you with a bunch of "do.do". Very quickly the SEO firm was recommending 70+ rewrite rules on the Apache server to resolve the friendly URLs to the real URLs in the application, plus custom work for each individual URL to rewrite it to its friendly form, so that when Googlebot crawls the site it would traverse these friendly URLs. I cringed at this suggestion: not only is it unmaintainable, but when I run a local server I can't use the rewritten URLs, since my development environment doesn't have a full-blown HTTP server with rewrite capabilities. I knew there had to be a better solution; I just wasn't sure what it was.
I happened upon the UrlRewriteFilter while researching regular expressions and rewrite rules: a rules engine that lets you set up inbound and outbound rules to modify URLs using regular expressions. It handles every occurrence of the URL on both sides of the equation. The library comes as a jar with an XML configuration file that goes in your WEB-INF directory; the only other things that need to be done are a little configuration for the filter in web.xml and the use of response.encodeURL when creating URLs in your application. I had the second taken care of already, since all my URLs were encoded via the html:link tag included with Struts.
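For reference, the web.xml configuration is just a standard servlet filter registration along these lines (the filter-name is arbitrary; the filter-class is UrlRewriteFilter's actual class):

```xml
<!-- Register UrlRewriteFilter in WEB-INF/web.xml so it sees every request -->
<filter>
    <filter-name>UrlRewriteFilter</filter-name>
    <filter-class>org.tuckey.web.filters.urlrewrite.UrlRewriteFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>UrlRewriteFilter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
```

Mapping it to /* is what lets it intercept both the inbound requests and the outbound URLs passed through response.encodeURL.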
The rules in the urlrewrite.xml look like this:
<rule>
    <from>^/tidy/page$</from>
    <to>/old/url/scheme/page.do</to>
</rule>
<outbound-rule>
    <from>^/old/url/scheme/page.do$</from>
    <to>/tidy/page</to>
</outbound-rule>
Bye bye, ugly Struts URLs. Some other notable features: you can set up conditional rules, you can call custom code to manipulate the structure of URLs, you can execute multiple rules in a row, and you have access to several other application tidbits, such as attributes and parameters in the request. This was exactly what I needed. I'd like to thank Paul Tuckey for a much-needed and well-written open source rewrite rules engine, and for saving me many hours of pain and suffering.
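To sketch a couple of those features in urlrewrite.xml terms (the specific paths, the user-agent pattern, and the com.example.RewriteHelper class are hypothetical examples of mine, not from our actual config):

```xml
<!-- A conditional rule: only fires when the user-agent request header
     matches the pattern, so mobile browsers get a different page -->
<rule>
    <condition name="user-agent">.*Mobile.*</condition>
    <from>^/home$</from>
    <to>/mobile/home.do</to>
</rule>

<!-- Calling custom code: a run element invokes a method on your own
     class while the rule executes, before the rewrite target is served -->
<rule>
    <from>^/report/(.*)$</from>
    <run class="com.example.RewriteHelper" method="recordHit" />
    <to>/report.do?name=$1</to>
</rule>
```

The $1 back-reference carries the captured group from the from pattern into the to target, which is what makes keyword-bearing paths map cleanly onto Struts action URLs with parameters.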