One Byte At A Time

Whilst working on Postvorta, one of the things I've tried to do is make the code as efficient as possible so that search results are returned as quickly as possible. Mostly this has involved caching data where possible and using efficient data structures and algorithms. Of course, with Postvorta being a web application, part of the time taken to show search results depends on the amount of data that is actually returned to the browser, including HTML pages, JavaScript files, style sheets, and images. I am already using JAWR to minify and compress JavaScript and CSS files, which makes a real difference to the amount of data that you have to download each time you search, but in this post I want to talk about a small issue I uncovered when trying to trim just a few bytes from the HTML pages.

I've recently been reading a book on Java Performance by Charlie Hunt and Binu John. While it covers quite a few aspects of performance that I was already aware of there is also quite a lot of information that is new to me. One chapter is devoted to performance tuning for web applications and as well as mentioning minifying and compressing static files (JavaScript, CSS etc.) it devotes a section to considering whitespace in the dynamically generated pages.

When you save a file of text, whitespace characters (spaces, tabs, new lines) all take up the same amount of disk space as any other character, i.e. 1 byte (I know this isn't entirely accurate but I don't want to get into a long discussion of line endings and encoding formats, so this assumption will suffice for what follows). This is acceptable if you want to use whitespace for formatting, but HTML specifically doesn't use whitespace in this way. Any sequence of whitespace in an HTML file is normally converted by the browser into a single space character, so it is wasteful to transmit more whitespace than is needed for the page to be understood and rendered. Of course, most people use whitespace not just for formatting but to make the HTML code easier to understand and debug. There are filters that I could add to Postvorta that would strip out all extraneous whitespace before transmitting the results back to the browser, but a) this would make debugging the page tricky and b) each filter I add has its own performance overhead. My plan, therefore, has been to re-work the code where possible to eliminate some whitespace while leaving the code readable, without adding an extra filter. In most cases this is easy, but there is one area where eliminating whitespace is more difficult.
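To get a feel for how much whitespace a page is carrying, the following is a minimal sketch that collapses every run of whitespace into a single space and reports the difference in bytes. It is only a measuring aid, not something Postvorta uses: the class and method names are made up for illustration, the page is assumed to be available as a string, and blindly collapsing whitespace is not safe for content such as pre blocks or inline scripts.

```java
import java.nio.charset.StandardCharsets;

public class WhitespaceEstimate {

    // Collapse every run of whitespace into a single space, roughly what a
    // browser does when rendering ordinary text. Not safe for <pre>,
    // <textarea> or inline scripts, so use this for estimates only.
    public static String collapse(String html) {
        return html.replaceAll("\\s+", " ");
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Upper bound on the bytes that removing extra whitespace could save.
    public static int potentialSaving(String html) {
        return utf8Length(html) - utf8Length(collapse(html));
    }

    public static void main(String[] args) {
        // Hypothetical page fragment with indentation and blank lines.
        String page = "<html>\n  <head>\n\n\n    <title>Results</title>\n  </head>\n</html>\n";
        System.out.println("original:  " + utf8Length(page) + " bytes");
        System.out.println("collapsed: " + utf8Length(collapse(page)) + " bytes");
        System.out.println("saving:    " + potentialSaving(page) + " bytes");
    }
}
```

The figure this produces is an upper bound, since some of that whitespace is what keeps the page source readable.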

When switching between HTML and Java in a JSP page, whitespace is often inserted to ensure that the resulting page can be properly interpreted. Unfortunately, in almost every case this whitespace is superfluous. Fortunately, there is an easy way of removing the blank lines that these whitespace characters introduce into the output. The easiest way is to add the following page directive to a JSP page:
<%@ page trimDirectiveWhitespaces="true" %>
As Postvorta currently contains only two pages (the results page and the advanced syntax page), this is easy to do, but in a more complex application there may be tens or hundreds of pages, at which point this approach becomes less appealing. You can, however, enable the same feature for every page by editing the application's web.xml to add the following:
<jsp-config>
  <jsp-property-group>
    <url-pattern>*.jsp</url-pattern>
    <trim-directive-whitespaces>true</trim-directive-whitespaces>
  </jsp-property-group>
</jsp-config>
I tried both approaches and they do indeed produce the same output, which in my test case brought the page size down from the original 13,300 bytes to 13,238, saving me a total of 62 bytes! Now, 62 bytes might not be very much, but this is per page view and so can quickly mount up. Looking at the differences between the old and new pages, I noticed that there were still quite a large number of blank lines in the head section of the HTML file that I thought should have been removed. It turns out that the problem is related to how I style the pages, but it is easy to solve.
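Comparing two versions of a page by eye is tedious, so a small helper that counts the blank lines left in the generated output can make leftovers like these easier to spot. This is a sketch only, assuming the page has been saved or read into a string; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class BlankLineReport {

    // Count the lines that are empty or contain only whitespace.
    public static int countBlankLines(String page) {
        int blank = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(page))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    blank++;
                }
            }
        } catch (IOException e) {
            // Reading from a string should never fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return blank;
    }

    public static void main(String[] args) {
        // Hypothetical output with blank lines left behind in the head.
        String before = "<head>\n\n\n<title>Results</title>\n\n</head>\n";
        String after = "<head>\n<title>Results</title>\n</head>\n";
        System.out.println("blank lines before: " + countBlankLines(before));
        System.out.println("blank lines after:  " + countBlankLines(after));
    }
}
```

Running it against the output before and after a change shows whether the blank lines have actually gone.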

I use SiteMesh (I'm using v2.4.1) to style all the pages within Postvorta. This allows me to define the main layout of the pages once and then use this to display all pages. For those of you who use Blogger, you can consider a SiteMesh layout to be equivalent to your blog template. The layout is applied via a SiteMesh-specific filter and it appears that, when using the web.xml approach to enable the trimming of whitespace, the layout is applied after the whitespace has been trimmed. This means that whitespace within the main body of the page is removed but not within the head section. The trick is to use the page directive approach within the SiteMesh layout. This applies to the entire page, only needs to be specified once, and in my test case saves another 15 bytes, which brings the total saving to 77 bytes and the page size down to 13,223 bytes.
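To put those numbers in perspective, here is a small sketch of the arithmetic. The page sizes are the ones measured above; the page view count is a made-up figure for illustration, and the class and method names are hypothetical.

```java
public class SavingsMath {

    // Page sizes in bytes, taken from the measurements in this post.
    public static final int ORIGINAL = 13_300;
    public static final int AFTER_WEB_XML = 13_238;
    public static final int AFTER_LAYOUT_DIRECTIVE = 13_223;

    // Bytes saved per page view between two measurements.
    public static int saved(int before, int after) {
        return before - after;
    }

    // Saving as a percentage of the original page size.
    public static double percentSaved(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    // Total bytes saved across a number of page views.
    public static long savedOverViews(int bytesPerView, long pageViews) {
        return bytesPerView * pageViews;
    }

    public static void main(String[] args) {
        int perView = saved(ORIGINAL, AFTER_LAYOUT_DIRECTIVE);
        long views = 1_000_000L; // hypothetical traffic, not a real figure
        System.out.println("saved per view: " + perView + " bytes");
        System.out.printf("percentage:     %.2f%%%n",
                percentSaved(ORIGINAL, AFTER_LAYOUT_DIRECTIVE));
        System.out.println("saved in total: " + savedOverViews(perView, views) + " bytes");
    }
}
```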

The total savings are small, but if Postvorta ever becomes really popular, shaving a few bytes here and there might well make a noticeable difference to performance.

2 comments:

  1. Duh! I'd like to pretend I understood at least two sentences but that would have been bragging!

    1. Sorry about that! Mind you the reason I started this blog was, at least in part, to make it clear that not everyone would understand every post.

      I guess the take-home message is that the smaller I can make the page, the quicker it will load. In fact, since I wrote this post I've done some more work and the test page is now down to just 12,578 bytes, a total saving of 722 bytes.
