<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-7025390077222270096</id><updated>2012-03-06T05:47:44.654Z</updated><category term='JSpell'/><category term='SVGRoundTrip'/><category term='Simple QuickTime Embedding'/><category term='Cranium'/><category term='CSS'/><category term='Quick As A Flash'/><category term='3DAssembler'/><category term='GATE'/><category term='Mímir'/><category term='Postvorta'/><category term='JavaScript'/><category term='Java'/><category term='Date Parser'/><title type='text'>Code from an English Coffee Drinker</title><subtitle type='html'>Rather than bore my family with code snippets I'll post them here!</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>40</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-8496890202747131915</id><published>2012-03-04T17:24:00.000Z</published><updated>2012-03-04T17:39:01.157Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Postvorta'/><title type='text'>Postvorta: WordPress Support</title><content type='html'>&lt;img style="float:left; padding: 0 10px 10px 0;" src="http://4.bp.blogspot.com/-6fuioV9heXA/T1OiYt8UdDI/AAAAAAAAA-c/PYJQ3J-hYo0/s400/postvorta-wordpress.png" /&gt;Just a quick &lt;a href="http://englishjavadrinker.blogspot.com/search/label/Postvorta"&gt;Postvorta&lt;/a&gt; related announcement (which I'm sure you'll have figured out from the title and image); Postvorta now supports indexing WordPress blogs! Currently this is limited to blogs hosted on &lt;a href="http://wordpress.com/"&gt;WordPress.com&lt;/a&gt; although I'm hoping to extend the support to self hosted WordPress blogs shortly.&lt;br /&gt;&lt;br /&gt;Whilst the search works the same as it does for Blogger hosted blogs there are a couple of things to note. Firstly there are currently no image thumbnails in the search results. While WordPress blogs do include image information within the data I use to index the blogs it's not in the same format or as easily usable as the support provided by Blogger. I'll try and rectify this at some point, as I extend support to self hosted WordPress blogs and possibly other blogging platforms. Also free WordPress.com blogs don't allow custom gadgets to include JavaScript or HTML forms and so I don't currently have a way of providing a search gadget like I do on Blogger. 
Other than those two issues, it seems to work fine, and I've already had positive feedback from one happy user.&lt;br /&gt;&lt;br /&gt;So if you have a WordPress.com blog and you'd like to use Postvorta then feel free to &lt;a href="http://www.postvorta.co.uk/register.jsp"&gt;sign up to the beta programme&lt;/a&gt;.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-8496890202747131915?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/8496890202747131915/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/03/postvorta-wordpress-support.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8496890202747131915'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8496890202747131915'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/03/postvorta-wordpress-support.html' title='Postvorta: WordPress Support'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-6fuioV9heXA/T1OiYt8UdDI/AAAAAAAAA-c/PYJQ3J-hYo0/s72-c/postvorta-wordpress.png' height='72'
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-3509075256729243063</id><published>2012-03-04T12:42:00.000Z</published><updated>2012-03-04T14:28:08.415Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='GATE'/><title type='text'>Disposable Memories</title><content type='html'>&lt;i&gt;This post is going to be a very technical look at &lt;a href="http://gate.ac.uk"&gt;GATE&lt;/a&gt;'s memory consumption. This will involve discussion of Java's class loading mechanism and garbage collection. If none of that interests you, or if that first sentence didn't make sense, then can I make a friendly suggestion that you stop reading about now!&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;img style="float:left; width: 220px; padding: 10px 10px 0 0;" src="http://2.bp.blogspot.com/-EBUDbIwof9I/T1NAg6MttTI/AAAAAAAAA9U/U3FIxDIIuQc/s400/memories.jpg" /&gt;Before we dive in you need a little background on how memory is used in Java. This short description isn't entirely accurate (a full and accurate description would probably require an entire book) but will suffice for the purposes of this post.&lt;br /&gt;&lt;br /&gt;There are, broadly speaking, two types of information that Java keeps in memory when running an application: the definitions of each class that has been used and the information about each instance of a particular class. The instance level information is created when you use the &lt;code&gt;new&lt;/code&gt; keyword and is made available for garbage collection when there is no longer any way of accessing the specific instance. The class definitions, on the other hand, are only ever released when the classloader instance that loaded them into memory is garbage collected. If the Java application doesn't do any custom class loading then all class definitions will be loaded into the system classloader which is never garbage collected.
Class definitions are stored in the PermGen area of the Java heap which is why, if you load too many class definitions, you will eventually get the message:&lt;br /&gt;&lt;pre class="brush: java"&gt;Exception in thread "main" java.lang.OutOfMemoryError: PermGen space&lt;/pre&gt;The common way of avoiding this problem is simply to increase the amount of available memory (either the total heap size or just the PermGen area). While this often makes the problem go away, it actually just increases the time until the problem will occur.&lt;br /&gt;&lt;br /&gt;&lt;img style="float: right; padding-left: 10px;" src="http://1.bp.blogspot.com/-8CEWNsYPKBc/T1NK_urp1OI/AAAAAAAAA9g/UfEIOGDji0U/s1600/memory-classloaders-1.png" /&gt;GATE itself contains a lot of classes (over 2000 just in the main source tree) but also supports dynamically loading &lt;a href="http://gate.ac.uk/userguide/sec:developer:plugins"&gt;plugins&lt;/a&gt; and compiling new classes from &lt;a href="http://gate.ac.uk/userguide/chap:jape"&gt;JAPE grammars&lt;/a&gt;. This all means that there is no maximum number of classes that might be loaded and hence no way of ensuring that the PermGen is always big enough. Fortunately GATE doesn't dynamically load classes into the system classloader but into a custom classloader as you can see from this diagram. Unfortunately this classloader is a singleton instance which is never released.&lt;br /&gt;&lt;br /&gt;One of the side effects of this is that if you re-initialize (or close and re-create) a JAPE transducer a new copy of the class definitions is created and added to the PermGen area. This is one of the reasons that when using GATE as part of a web service we suggest using a pool of pipelines rather than creating a new pipeline for each request. Not only does this reduce response times (assuming an adequately sized pool) but it also prevents exhaustion of the PermGen area.
To show just how quickly this can become a problem I wrote the following short piece of code.&lt;br /&gt;&lt;pre class="brush: java"&gt;Gate.runInSandbox(true);&lt;br /&gt;Gate.init();&lt;br /&gt;&lt;br /&gt;Gate.getCreoleRegister().registerDirectories(&lt;br /&gt;  (new File("/home/mark/gate-top/externals/gate/plugins/ANNIE/"))&lt;br /&gt;    .toURI().toURL());&lt;br /&gt;&lt;br /&gt;FeatureMap params = Factory.newFeatureMap();&lt;br /&gt;Transducer jape = (Transducer)Factory.createResource(&lt;br /&gt;  "gate.creole.ANNIETransducer", params);&lt;br /&gt;&lt;br /&gt;long count = 1;&lt;br /&gt;while (true) {&lt;br /&gt;  System.out.println(count);&lt;br /&gt;  jape.reInit();&lt;br /&gt;  ++count;&lt;br /&gt;}&lt;/pre&gt;This simply initializes GATE, creates a single instance of the ANNIE NE Transducer and then repeatedly re-initializes it. On my machine Java defaults to using 82MB for the PermGen and this was exhausted after loading the transducer just 104 times. I was monitoring the performance using &lt;a href="http://visualvm.java.net/"&gt;VisualVM&lt;/a&gt; and you can see from the following screen shot how the memory was quickly exhausted.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-N7OksApMMJQ/T1NPOZsbmyI/AAAAAAAAA9s/W049oine0-M/s1600/normal-gate.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img src="http://2.bp.blogspot.com/-N7OksApMMJQ/T1NPOZsbmyI/AAAAAAAAA9s/W049oine0-M/s600/normal-gate.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;There are a number of other issues with using a singleton classloader but they boil down to the fact that once a class has been defined it can never be forgotten or redefined. The practical aspects of this are that unloading a plugin doesn't actually result in the class definitions being forgotten, so you can't unload, recompile, reload a plugin. If you want to make a change you have to close and restart GATE. 
Another problem is that if two plugins try and load different versions of the same class only the first version will be used. This is particularly problematic when dealing with complex plugins which may use multiple third-party libraries. With the plugins in the main distribution we try and keep to just one version of each library but clearly we have no control over third-party plugins.&lt;br /&gt;&lt;br /&gt;&lt;img style="padding-left: 10px; float: right;" src="http://2.bp.blogspot.com/-xbYYxrR7yLY/T1NTq1SN-oI/AAAAAAAAA-E/V9v3mvcmMjk/s400/memory-classloaders-2.png" /&gt;These issues have annoyed me for a while, but I haven't had either the time (this has all been done in my own time) or in some cases the technical knowledge to do anything about them. A few weeks ago, having read a book on &lt;a href="http://www.amazon.co.uk/gp/product/0137142528/ref=as_li_ss_tl?ie=UTF8&amp;tag=markgreenwoos-21&amp;linkCode=as2&amp;camp=1634&amp;creative=19450&amp;creativeASIN=0137142528"&gt;Java Performance&lt;/a&gt;, a couple of pieces of the puzzle started to fall into place and I realized that I could probably have a good crack at a new classloader architecture that would solve all of these problems. You can see the architecture I've adopted in the diagram to the right. There are two things this diagram doesn't show. Firstly there can be any number of plugin or JAPE classloaders, and secondly those classloaders are what is known as &lt;i&gt;parent last&lt;/i&gt;. The reason for having any number of plugin or JAPE classloaders (they are actually the same class; I'm just splitting them up on the diagram to show that it handles both plugin loading and the compiling of JAPE grammars) should be obvious, as it allows us to throw one away when it is no longer required (i.e. we don't want to have to unload all plugins just to unload one).
The idea of &lt;i&gt;parent last&lt;/i&gt; classloading, however, requires a more detailed explanation.&lt;br /&gt;&lt;br /&gt;Traditionally classloaders in Java take a &lt;i&gt;parent first&lt;/i&gt; approach. This means that when they are asked to load a class they start by asking their parent classloader (follow the arrows upwards) to load the class, which in turn asks its parent, and so on. It's only if this fails that a classloader will itself try and load a class. Changing to using &lt;i&gt;parent last&lt;/i&gt; means that two classloaders can now have different copies of the same class defined within them, and hence we can support different versions of the same third-party library appearing within different plugins. This works because when a class in a plugin tries to load a class it looks within its own plugin before looking in either the main GATE classloader or the classloaders associated with other plugins or JAPE grammars. This mechanism also allows classes in different plugins to refer to one another (i.e. a JAPE grammar can refer to classes loaded via a plugin).&lt;br /&gt;&lt;br /&gt;To ensure that classloaders can be released and garbage collected properly, I've also made a change to the way plugins are unloaded. Currently in GATE unloading a plugin simply removes the definition of the resources it contains from the CREOLE register, but it doesn't unload any instances of the resources that are currently in use. This does tend to lead to some funny behaviour (often weird AWT error messages) and to ensure a thorough cleanup I've updated this code so that it does now unload any instances of resources that depend on the plugin before unloading the plugin itself.&lt;br /&gt;&lt;br /&gt;It's taken quite a bit of time hunting through heap dumps using VisualVM but I'm now confident that this new approach works well and that classloaders, and hence classes, can be thrown away when they are no longer needed.
As an example, we can use this modified version of GATE along with the sample code I showed before (to re-initialize a JAPE grammar over and over again). This time I let it run for over 1000 iterations (almost 10 minutes) before I stopped it as there was no sign of it running out of memory. A quick look at VisualVM explains why.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-I8hFe5wSDxQ/T1NQ3wgOD_I/AAAAAAAAA94/Yx3EAzXU09Q/s1600/memory-clever-gate.png" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img src="http://3.bp.blogspot.com/-I8hFe5wSDxQ/T1NQ3wgOD_I/AAAAAAAAA94/Yx3EAzXU09Q/s600/memory-clever-gate.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;You can clearly see that every time the PermGen is almost exhausted garbage collection kicks in freeing some of the previously loaded and discarded class definitions.&lt;br /&gt;&lt;br /&gt;This work has all been done in &lt;a href="https://gate.svn.sourceforge.net/svnroot/gate/gate/branches/cl-experiment"&gt;this separate SVN branch&lt;/a&gt; so as not to disrupt the main source tree.  This means that to try this modified, memory friendly version of GATE you will need to check out the branch and compile it yourself. Mind you if you have read all the way to here then I'm going to guess that won't be a problem. My hope is that this will eventually be merged back into the trunk but I'd prefer to wait until this branch has been tested by a number of other people and not just me. Also there are currently 9 tests that fail when using this branch, although I've looked at the failing code and in every case it's actually the test that is at fault. Essentially the tests load each plugin twice, once into the system classloader (as the classes are on the main Java classpath) and once into the GATE classloader (when the plugin is loaded via the API). 
Because of the change to &lt;i&gt;parent last&lt;/i&gt; classloading, this means that two different definitions of the same class can end up being used which results in apparently nonsensical error messages such as: &lt;code&gt;Error converting class gate.chineseSeg.RunMode to class gate.chineseSeg.RunMode&lt;/code&gt;. This situation never happens during normal use of GATE Developer, and can be fixed by altering the &lt;code&gt;creole.xml&lt;/code&gt; files used to define the plugins. In other words the failed tests shouldn't stop you trying this branch, although I also wouldn't suggest using this branch in a production environment until it has seen further testing.&lt;br /&gt;&lt;br /&gt;So on that note, please try the branch if you have been concerned about memory consumption or had problems with clashing library versions and let me know if you find any problems, or if you have any further suggestions for improvements.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-3509075256729243063?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/3509075256729243063/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/03/disposable-memories_04.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3509075256729243063'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3509075256729243063'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/03/disposable-memories_04.html' title='Disposable 
Memories'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-EBUDbIwof9I/T1NAg6MttTI/AAAAAAAAA9U/U3FIxDIIuQc/s72-c/memories.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-4665331959861512641</id><published>2012-02-26T15:55:00.001Z</published><updated>2012-02-26T15:56:29.921Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='JavaScript'/><title type='text'>Disable Dynamic Views: An Update</title><content type='html'>&lt;img style="float: left;" src="http://1.bp.blogspot.com/-hg5f7BHj6IY/Tx_GJUOu0pI/AAAAAAAAA58/MmyP8sqgTXw/s400/disable_dynamic_views.png" /&gt;Just over a month ago I &lt;a href="http://englishjavadrinker.blogspot.com/2012/01/putting-spanner-in-works.html"&gt;blogged about&lt;/a&gt; a small GreaseMonkey script that you could install so that you didn't have to see any of Blogger's new &lt;a href="http://buzz.blogger.com/2011/09/dynamic-views-seven-new-ways-to-share.html"&gt;dynamic view templates&lt;/a&gt; ever again. Well it turns out that there was a small bug in that version of the script.&lt;br /&gt;&lt;br /&gt;If you remember, the script essentially works by tricking Blogger into thinking you are viewing the blog without JavaScript by adding &lt;code&gt;v=0&lt;/code&gt; to the query string of each link within the blog. Well it turns out that there was one case where this wasn't working properly; the link that jumps you to the comments within the post specific page. The problem was that these links point not only to a page but a specific anchor by adding &lt;code&gt;#comments&lt;/code&gt; to the URL. 
Unfortunately my script was adding the query string after the anchor reference (i.e. &lt;code&gt;post-page.html#comments?v=0&lt;/code&gt;) when it should add it to the page location (i.e. &lt;code&gt;post-page.html?v=0#comments&lt;/code&gt;). I've updated the script to v1.1 which contains a fix for this. In theory if you have already installed the script your browser should eventually pick up the new version. But if you haven't yet installed the script or just want to make sure you have the latest version then you can install/upgrade by simply &lt;a href="http://apps.englishcoffeedrinker.co.uk/greasemonkey/disable_dynamic_views.user.js"&gt;clicking this link&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-4665331959861512641?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/4665331959861512641/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/02/disable-dynamic-views-update.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/4665331959861512641'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/4665331959861512641'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/02/disable-dynamic-views-update.html' title='Disable Dynamic Views: An Update'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-hg5f7BHj6IY/Tx_GJUOu0pI/AAAAAAAAA58/MmyP8sqgTXw/s72-c/disable_dynamic_views.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-5494192523222334785</id><published>2012-02-25T12:49:00.000Z</published><updated>2012-02-25T15:11:57.782Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Postvorta'/><title type='text'>One Byte At A Time</title><content type='html'>&lt;a href="http://cookecitychronicle.blogspot.com/2011/12/one-bite-at-time.html" style="float:left; padding: 10px 10px 10px 0;"&gt;&lt;img border="0" height="225" width="300" src="http://2.bp.blogspot.com/-ibzJClaqfEs/T0i8o-nEAGI/AAAAAAAAA60/_US3OJcJd_U/s400/one_bite_at_a_time.jpg" /&gt;&lt;/a&gt;Whilst working on &lt;a href="http://englishjavadrinker.blogspot.com/search/label/Postvorta"&gt;Postvorta&lt;/a&gt; one of the things I've tried to do is to make the code as efficient as possible so that search results are returned as quickly as possible. Mostly this has involved caching data where possible as well as using efficient data structures and algorithms. Of course with Postvorta being a web application part of the time taken to show search results is dependent on the amount of data that is actually returned to a browser, including HTML pages, JavaScript files, style sheets, and images.
I am already using &lt;a href="http://jawr.java.net/"&gt;JAWR&lt;/a&gt; to minify and compress JavaScript and CSS files which makes a real difference to the amount of data that you have to download each time you search but in this post I want to talk about a small issue I uncovered when trying to trim just a few bytes from the HTML pages.&lt;br /&gt;&lt;br /&gt;I've recently been reading a book on &lt;a href="http://www.amazon.co.uk/gp/product/0137142528/ref=as_li_ss_tl?ie=UTF8&amp;tag=markgreenwoos-21&amp;linkCode=as2&amp;camp=1634&amp;creative=19450&amp;creativeASIN=0137142528"&gt;Java Performance&lt;/a&gt; by Charlie Hunt and Binu John. While it covers quite a few aspects of performance that I was already aware of there is also quite a lot of information that is new to me. One chapter is devoted to performance tuning for web applications and as well as mentioning minifying and compressing static files (JavaScript, CSS etc.) it devotes a section to considering whitespace in the dynamically generated pages.&lt;br /&gt;&lt;br /&gt;When you save a file of text, whitespace characters (spaces, tabs, new lines) all take up the same amount of disk space as any other character, i.e. 1 byte (I know this isn't entirely accurate but I don't want to get into a long discussion of line endings and encoding formats so this assumption will suffice for what follows). This is acceptable if you want to use whitespace for formatting but HTML specifically doesn't use whitespace in this way. Any sequence of whitespace in an HTML file is converted by the browser into a single space character, so it is wasteful to transmit more whitespace than is needed for the page to be understood and rendered. Of course most people use whitespace not just for formatting but to make the HTML code easier to understand and debug.
There are filters that I could add to Postvorta that would strip out all extraneous whitespace before transmitting the results back to the browser but a) this would make debugging the page tricky and b) each filter I add has its own performance overhead. My plan, therefore, has been to try and re-work the code where possible to eliminate some whitespace while leaving the code readable, and not to add an extra filter. In most cases this is easy, but there is one area where eliminating whitespace is more difficult.&lt;br /&gt;&lt;br /&gt;When switching between HTML and Java in a JSP page whitespace is often inserted to ensure that the resulting page can be properly interpreted. Unfortunately in almost every case this whitespace is superfluous and can be removed. Fortunately there is an easy way of removing the blank lines from the output that these whitespace characters introduce. The easiest way is to add the following page directive to a JSP page:&lt;br /&gt;&lt;pre class="brush: java"&gt;&amp;lt;%@ page trimDirectiveWhitespaces="true" %&amp;gt;&lt;/pre&gt;While Postvorta currently only contains two pages (the results page and the advanced syntax page) this is easy to do, but in a more complex application there may be tens or hundreds of pages at which point this approach becomes less appealing.
You can, however, enable the same feature for every page by editing the application's &lt;code&gt;web.xml&lt;/code&gt; to add the following:&lt;br /&gt;&lt;pre class="brush: xml"&gt;&amp;lt;jsp-config&amp;gt;&lt;br /&gt;  &amp;lt;jsp-property-group&amp;gt;&lt;br /&gt;    &amp;lt;url-pattern&amp;gt;*.jsp&amp;lt;/url-pattern&amp;gt;&lt;br /&gt;    &amp;lt;trim-directive-whitespaces&amp;gt;true&amp;lt;/trim-directive-whitespaces&amp;gt;&lt;br /&gt;  &amp;lt;/jsp-property-group&amp;gt;&lt;br /&gt;&amp;lt;/jsp-config&amp;gt;&lt;/pre&gt;I tried both approaches and they do indeed produce the same output, which in my test case brought the page size down to 13,238 bytes from the original 13,300, saving me a total of 62 bytes! Now 62 bytes might not be very much but this is per page view and so can quickly mount up. Looking at the differences between the old and new pages I noticed that there were still quite a large number of blank lines in the &lt;code&gt;head&lt;/code&gt; section of the HTML file that I thought should have been removed. It turns out that the problem is related to how I style the pages but is easy to solve.&lt;br /&gt;&lt;br /&gt;I use &lt;a href="http://www.opensymphony.com/sitemesh/"&gt;SiteMesh&lt;/a&gt; (I'm using v2.4.1) to style all the pages within Postvorta. This allows me to define the main layout of the pages once and then use this to display all pages. For those of you who use Blogger, you can consider a SiteMesh layout to be equivalent to your blog template. The layout is applied via a SiteMesh specific filter and it appears that, when using the &lt;code&gt;web.xml&lt;/code&gt; approach to enable the trimming of whitespace, the layout is applied after the whitespace has been trimmed. This means that whitespace within the main body of the page is removed but not within the &lt;code&gt;head&lt;/code&gt; section. The trick is to use the page directive approach within the SiteMesh layout.
This has the advantage of being applied to the entire page, only needs to be specified once, and in my test case saves another 15 bytes which brings the total page size down by 77 bytes to 13,223 bytes.&lt;br /&gt;&lt;br /&gt;The total savings are small, but if Postvorta ever becomes &lt;i&gt;really&lt;/i&gt; popular, shaving a few bytes here and there might well make a noticeable difference to performance.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-5494192523222334785?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/5494192523222334785/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/02/one-byte-at-time.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/5494192523222334785'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/5494192523222334785'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/02/one-byte-at-time.html' title='One Byte At A Time'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-ibzJClaqfEs/T0i8o-nEAGI/AAAAAAAAA60/_US3OJcJd_U/s72-c/one_bite_at_a_time.jpg' height='72'
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-3876483977753777944</id><published>2012-02-15T20:15:00.000Z</published><updated>2012-02-15T20:15:05.029Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Postvorta'/><category scheme='http://www.blogger.com/atom/ns#' term='Mímir'/><title type='text'>Postvorta Mk II: Faster And With More Features!</title><content type='html'>The &lt;a href="http://en.wikipedia.org/wiki/Supermarine_Spitfire_%28early_Merlin_powered_variants%29#Mk_II_.28Type_329.29"&gt;Spitfire Mk II&lt;/a&gt; was essentially the same as the original model, just with an upgraded Merlin engine.  Today I've done something similar to &lt;a href="http://wwww.postvorta.co.uk"&gt;Postvorta&lt;/a&gt; my &lt;i&gt;"intelligent blog search engine"&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;Those of you who read the &lt;a href="http://englishjavadrinker.blogspot.com/2011/10/postvorta-providing-intelligent-blog.html"&gt;initial blog posting&lt;/a&gt; I did on Postvorta may remember that underneath it all Postvorta relies on &lt;a href="http://gate.ac.uk/mimir/"&gt;GATE Mímir&lt;/a&gt; for indexing and search. Yesterday we (i.e. the &lt;a href="http://gate.ac.uk/"&gt;GATE group&lt;/a&gt; at the &lt;a href="http://www.shef.ac.uk"&gt;University of Sheffield&lt;/a&gt;) released new versions of most of the software we develop, including Mímir 4. While I'm not heavily involved with the development of Mímir I do use it quite substantially at work and I've been slowly updating all the systems I'm involved with, including Postvorta, to use this new version. Not only is this new version of Mímir faster it also takes a slightly different approach in the way it handles search results which is more suited to Postvorta than the old approach. I've also added some extra code to Postvorta to cache more information locally. 
Together these changes have resulted in Postvorta returning results an awful lot faster than before. You will also notice that switching between pages of results is significantly faster than it was before. Of course all these changes are "under the hood" so just like with the Spitfire Mk II, Postvorta should look roughly the same but work much faster. There is one new feature though that is worth talking about: result visualization.&lt;br /&gt;&lt;br /&gt;&lt;img style="float:left; padding: 0 5px 0 0; width:250px; height:250px;" src="http://4.bp.blogspot.com/-EHg-kQITtl8/Tzt2tfJp4vI/AAAAAAAAA6k/_1DydXuOA8I/s400/postvorta-graph.png" /&gt;When you search a blog using Postvorta it returns a list of relevant documents ordered from most recent to oldest. Combined with the different ways you can search a blog this ordering is usually the most useful. In some cases, however, especially when a search returns lots of results, it can be difficult to hunt through the posts to find the one you are interested in. To help with this I've started to think about different ways of visualizing the results. Whilst I've had a number of ideas the first to make it as far as a working, stable implementation is a date distribution graph.&lt;br /&gt;&lt;br /&gt;A date distribution graph (in this context anyway) is simply a vertical bar chart showing how the results are distributed by month. The graph, just like the results, works backwards so the most recent month is on the left. The bars of the graph can be clicked on to go to the result page containing the first result for that month. Essentially it allows you to jump to posts from a given month directly without having to page through lots of irrelevant results. 
Currently, depending upon the number of search results, the graph can take a moment or two to be produced, but this is done asynchronously to the normal page loading to allow you to see the actual results as soon as possible.&lt;br /&gt;&lt;br /&gt;As always I'd be interested in knowing what you think of this new feature. You can play with it either by searching this blog (the search box is just over on the right), or &lt;a href="http://www.postvorta.co.uk/register.jsp"&gt;your own blog&lt;/a&gt;.&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-3876483977753777944?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/3876483977753777944/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/02/postvorta-mk-ii-faster-and-with-more.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3876483977753777944'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3876483977753777944'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/02/postvorta-mk-ii-faster-and-with-more.html' title='Postvorta Mk II: Faster And With More Features!'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://4.bp.blogspot.com/-EHg-kQITtl8/Tzt2tfJp4vI/AAAAAAAAA6k/_1DydXuOA8I/s72-c/postvorta-graph.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-1589146841938392189</id><published>2012-01-25T10:23:00.000Z</published><updated>2012-01-25T10:24:05.476Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='JavaScript'/><title type='text'>Putting A Spanner In The Works</title><content type='html'>&lt;img style="float: left;" src="http://1.bp.blogspot.com/-hg5f7BHj6IY/Tx_GJUOu0pI/AAAAAAAAA58/MmyP8sqgTXw/s400/disable_dynamic_views.png" /&gt;I'm really not a &lt;a href="http://en.wikipedia.org/wiki/Luddite"&gt;luddite&lt;/a&gt;; I have no problem with new technical inventions that make life easier or cheaper. What I object to is using technology simply for the sake of it, when the result is inferior to what there was before. Unfortunately, in my view, a lot of recent "innovations" on Blogger seem to fall into this second category. Today my object of hate is &lt;a href="http://buzz.blogger.com/2011/09/dynamic-views-seven-new-ways-to-share.html"&gt;dynamic views&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Dynamic views are a nice way of showcasing the new web standards, HTML5 and CSS3, but personally I think they are horrible. In my experience they are slow to load (and believe me my machine ranks above average on CPU and memory), and they remove any sense of individuality from the blogs on which they are used. Whilst the presentation shouldn't be as important as the content, I think we all judge books by their covers and blogs by their layout. Unfortunately, once a blog owner has decided to use dynamic views we as readers have no option to view the original template unless we turn off JavaScript. Even then, you will be able to see the main page of the blog using the normal template but you won't be able to look at older posts and none of the widgets will work. 
Fortunately the rest of this post introduces a solution that means if you hate dynamic views as much as I do you can banish them for good!&lt;br /&gt;&lt;br /&gt;When you turn JavaScript off Blogger doesn't gracefully show you the old template for a dynamic blog. What it does is give you a link to the front page of the blog with &lt;code&gt;?v=0&lt;/code&gt; appended to the end. It turns out that adding &lt;code&gt;v=0&lt;/code&gt; to the query string of the URL for any page within a blog causes it to be displayed using the old template. Unfortunately Blogger doesn't re-write the links in a page when you add &lt;code&gt;v=0&lt;/code&gt; to the URL, so they all point back to the dynamic view. And even if you have JavaScript turned off they just send you back to the homepage, not the actual page you wanted to look at. This is a horrid mess, but it does give us a way of fixing things.&lt;br /&gt;&lt;br /&gt;My solution is a GreaseMonkey script that runs as the page loads, redirects you away from the dynamic view and back to the original template pages, and re-writes all the outgoing links, where necessary, to include the &lt;code&gt;v=0&lt;/code&gt; parameter so that they all just work. If you use Firefox then you will need to &lt;a href="https://addons.mozilla.org/en-US/firefox/addon/greasemonkey/"&gt;install the GreaseMonkey addon&lt;/a&gt; before you can use the script; if you use Google Chrome then support for GreaseMonkey scripts is built in. Once you are ready you can install my script by &lt;a href="http://apps.englishcoffeedrinker.co.uk/greasemonkey/disable_dynamic_views.user.js"&gt;clicking this link&lt;/a&gt;. 
Currently it will only be applied to blogspot hosted blogs so unfortunately you might still see the odd dynamic view blog (&lt;a href="http://buzz.blogger.com"&gt;Blogger Buzz&lt;/a&gt; for example) but hopefully there shouldn't be too many.&lt;br /&gt;&lt;br /&gt;Let me know what you think and if you have any suggestions for improvements, and certainly links to any blogspot hosted blogs where you still see the dynamic views.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-1589146841938392189?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/1589146841938392189/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/01/putting-spanner-in-works.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/1589146841938392189'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/1589146841938392189'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/01/putting-spanner-in-works.html' title='Putting A Spanner In The Works'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-hg5f7BHj6IY/Tx_GJUOu0pI/AAAAAAAAA58/MmyP8sqgTXw/s72-c/disable_dynamic_views.png' height='72' 
width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-572519665493955717</id><published>2012-01-16T09:02:00.000Z</published><updated>2012-01-16T09:02:15.027Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='CSS'/><title type='text'>Threaded Comments</title><content type='html'>&lt;img style="width: 300px; float: left; padding: 0 5px 5px 0;" src="http://2.bp.blogspot.com/-T1ARBiTjjSw/TxPdNv0OlcI/AAAAAAAAA48/pvLVt-s9R6c/s400/threads.jpg" /&gt;It would appear that a few days ago Blogger &lt;a href="http://buzz.blogger.com/2012/01/engage-with-your-readers-through.html"&gt;introduced another new feature&lt;/a&gt; to our blogs: threaded comments.&lt;br /&gt;&lt;br /&gt;Now while I actually quite like the idea (much like the &lt;a href="http://englishjavadrinker.blogspot.com/2011/09/fixing-bloggers-mistakes.html"&gt;lightbox fiasco&lt;/a&gt;) it has been introduced as the default if you use an embedded comment form and publish full feeds. The other problem (and the one that affected this blog) is that the HTML elements and CSS styles used to display the comments have changed, which (depending on your template) might mean that they look different or even wrong. So while I quite like the idea here are two little tips that might make life easier.&lt;br /&gt;&lt;br /&gt;Firstly, let's assume that while they display correctly you just don't like threaded comments and want to turn them off. This is actually easily achievable with the following piece of CSS.&lt;br /&gt;&lt;pre class="brush: css"&gt;.comment-actions {&lt;br /&gt;   display: none;&lt;br /&gt;}&lt;/pre&gt;You need to add this to your blog template, either by editing the HTML of your template and adding it anywhere inside a &lt;code&gt;style&lt;/code&gt; element, or by going to the advanced section of the Template Designer where you can add arbitrary CSS. 
Essentially this just removes the reply link, so to leave a comment everyone is forced to use the main comment form at the bottom of the post. This has, however, the unfortunate side effect of also removing the default delete comment link. Personally I don't think that's too much of an issue as it is easy to moderate comments through the dashboard. If I have the time I'll try and figure out how to reinstate the delete link and then I'll publish another post explaining how.&lt;br /&gt;&lt;br /&gt;On this blog I actually don't mind the comments being threaded, but I do mind them not displaying properly, which is exactly what was happening. When I customized the template I use for this blog I added some CSS styles to control how &lt;code&gt;blockquotes&lt;/code&gt; appear. I wasn't intending to use them for structure within the template so I styled them to look like an actual quote. You can see the styles in action in &lt;a href="http://englishjavadrinker.blogspot.com/2011/12/i-do-exist-honest.html"&gt;this post&lt;/a&gt; where I quote part of an e-mail. The problem is that Blogger are now using &lt;code&gt;blockquote&lt;/code&gt; to lay out comments. This meant that each comment was styled in the same way as that e-mail quote, and because the surrounding context was different the comments were actually overlapping one another slightly. All in all it looked pretty horrible. Fortunately there is an easy fix. Because I know that I only want to use my styles within the body of a post, I can limit the CSS to that section of the page by prefixing each selector with &lt;code&gt;.post-body&lt;/code&gt;, as the post is always contained within a &lt;code&gt;div&lt;/code&gt; with the class &lt;code&gt;post-body&lt;/code&gt;. 
So my three &lt;code&gt;blockquote&lt;/code&gt; styles now look like this:&lt;br /&gt;&lt;pre class="brush: css"&gt;.post-body blockquote {&lt;br /&gt;   font-style: italic;&lt;br /&gt;   width: 500px;&lt;br /&gt;   margin: 0 auto;&lt;br /&gt;   text-align: justify;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;.post-body blockquote:before {&lt;br /&gt;   display: block;&lt;br /&gt;   content: open-quote;&lt;br /&gt;   font-size: 300%;&lt;br /&gt;   height: 0;&lt;br /&gt;   margin-left: -0.8em;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;.post-body blockquote:after {&lt;br /&gt;   display: block;&lt;br /&gt;   font-size: 300%;&lt;br /&gt;   content: close-quote;&lt;br /&gt;   margin-top: -0.8em;&lt;br /&gt;   margin-left: 500px;&lt;br /&gt;}&lt;/pre&gt;Of course, depending on the template you use, you may have different problems that show up because of the new comment system, so you may need to experiment a little to find exactly what CSS tricks you need to apply to put things back the way they were before Blogger introduced the new feature/bug (delete as you see fit).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-572519665493955717?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/572519665493955717/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/01/threaded-comments.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/572519665493955717'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/572519665493955717'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2012/01/threaded-comments.html' title='Threaded 
Comments'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-T1ARBiTjjSw/TxPdNv0OlcI/AAAAAAAAA48/pvLVt-s9R6c/s72-c/threads.jpg' height='72' width='72'/><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-79621175357633454</id><published>2011-12-15T15:08:00.000Z</published><updated>2011-12-15T15:08:59.131Z</updated><title type='text'>I Do Exist, Honest!</title><content type='html'>After a stressful day I appear to exist once again. If you tried to visit my blog yesterday you would have seen that it had been removed. Worse than that, Google had decided to suspend my entire account. That meant I lost access not only to all my blogs, but also to my photos in Picasa, my calendars, my Gmail account, and my &lt;a href="https://plus.google.com/u/0/100530048286198152742"&gt;Google+ profile&lt;/a&gt;. It was almost as if I no longer existed (online). I filled in the contact us form to request my account back and then almost a day later I got this e-mail from Google:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;We apologize for any inconvenience you may have experienced. The issue you described should now be resolved.&lt;/i&gt;&lt;/blockquote&gt;I don't think I've ever been quite so relieved. While the whole experience was a bit of a nightmare it did get me thinking about how I could keep backups of all my Google data in case something similar happened again in the future. 
After a little bit of hunting around the web (via Google of course) I came across the Data Liberation Front.&lt;br /&gt;&lt;br /&gt;&lt;img style="float:left; width: 200px; padding: 0 5px 5px 0;" src="http://2.bp.blogspot.com/-XOoXo1LPbkc/TuoKrMrRy4I/AAAAAAAAA4U/XAIls0YpZUo/s400/dl-bg.png" /&gt;The &lt;a href="http://www.dataliberation.org/home"&gt;Data Liberation Front&lt;/a&gt; is a Google engineering team who are trying to make it easy to liberate your data from Google products. This is useful for either moving your data to a competing service or for simply backing it up locally.&lt;br /&gt;&lt;br /&gt;They have step by step instructions on exporting your data from most of Google's products, but they are also developing &lt;a href="http://google.com/takeout/"&gt;Google Takeout&lt;/a&gt;. Google Takeout brings together a number of the export tools into a simple interface allowing you to select which data you want to download. It then collects the data together into a single zip file that you can easily download and archive. Currently it supports: +1’d sites, Buzz, Contacts and Circles, Picasa Web Albums, Profile, and Google+ Stream. 
As far as I can tell the aim is to add more products to this list and I'd certainly appreciate Blogger being included.&lt;br /&gt;&lt;br /&gt;So if you don't have a recent backup of your Google hosted data maybe now is the time to do something about it.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-79621175357633454?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/79621175357633454/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/12/i-do-exist-honest.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/79621175357633454'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/79621175357633454'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/12/i-do-exist-honest.html' title='I Do Exist, Honest!'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-XOoXo1LPbkc/TuoKrMrRy4I/AAAAAAAAA4U/XAIls0YpZUo/s72-c/dl-bg.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-924105340094330669</id><published>2011-11-27T16:53:00.000Z</published><updated>2011-11-27T20:00:03.500Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><title type='text'>Striped 
Clouds</title><content type='html'>&lt;a href="http://www.flickr.com/photos/arenamontanus/1252651164/" title="Striped Cirrocumulus Undulatus Clouds by Arenamontanus, on Flickr"&gt;&lt;img src="http://farm2.staticflickr.com/1365/1252651164_36ecfaefa7.jpg" style="float: left; width: 300px; padding: 5px 5px 5px 0;" alt="Striped Cirrocumulus Undulatus Clouds"&gt;&lt;/a&gt;I've spent quite a bit of my spare time over the last week or two doing some Java GUI programming, the reasons for which will become clear in a later post. Quite a lot of the GUI is table based and so I've spent quite a bit of time playing with custom rendering code for different data types to make things easier to visualize and edit. There are &lt;a href="http://docs.oracle.com/javase/tutorial/uiswing/components/table.html#renderer"&gt;plenty of tutorials&lt;/a&gt; on how to do this spread over the web but one thing I found quite difficult was writing a renderer that worked reliably across different Java Look and Feels (L&amp;F). The one renderer I wrote that highlights most of the problems I had was for displaying a checkbox in a table.&lt;br /&gt;&lt;br /&gt;By default Java will display a checkbox for boolean data types, but unfortunately it doesn't disable the checkbox when it can't be edited. This leads to a situation where you can't change the state of the checkbox but there is no visual feedback to tell you this. So I wrote a simple renderer that would disable the checkbox if it wasn't editable. The first problem I found was that under the GTK+ L&amp;F the background of the cell didn't change when the row was selected. It did under the default Metal L&amp;F and after a little bit of debugging I discovered the problem. Every Swing component has an opaque property which determines if its background is drawn or not. It turns out that the default value is dependent on the L&amp;F. So under Metal checkboxes have an opaque background while under GTK+ they don't. 
Fortunately this is easy to fix simply by calling &lt;code&gt;setOpaque(true)&lt;/code&gt;. A similar problem occurs with the focus rectangle around the outside of each table cell, but again it's easily fixed by calling &lt;code&gt;setBorderPainted(true)&lt;/code&gt;. These tweaks gave me a cell renderer which seemed to work well until, that is, I tried it under the &lt;a href="http://docs.oracle.com/javase/tutorial/uiswing/lookandfeel/nimbus.html"&gt;Nimbus L&amp;F&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The Nimbus L&amp;F was introduced in Java SE 6 Update 10, and was meant to be the new cross platform L&amp;F that would replace the aging Metal which has been the default since Swing was first developed. One of the nice things about Nimbus is that it is resolution independent and uses vector graphics rather than bitmaps. This should, in theory, lead to a crisper interface. Personally I'm not a fan, and as yet it hasn't replaced Metal as the default L&amp;F. It seemed, however, sensible to make sure my rendering code worked correctly under Nimbus as well as Metal, GTK+ and CDE/Motif (these are the four L&amp;Fs available by default when running Java under Ubuntu). Unfortunately it didn't.&lt;br /&gt;&lt;br /&gt;Nimbus, in an attempt to be different, colours the background of table rows in alternating colours -- by default white and a light gray. This is instead of drawing a border around the cells. The problem is that my renderer (and almost every example I've seen) gets the background colour for the cell from the table by calling either &lt;code&gt;table.getBackground()&lt;/code&gt; or &lt;code&gt;table.getSelectionBackground()&lt;/code&gt;. The selected background colour works correctly but the unselected cells get drawn with a dark gray background. There are three tricks to work around this while leaving the code working under the other L&amp;Fs. The first is to get the alternative background colour from the UIManager class. 
The second is to recreate the unselected background colour to make it display correctly. Finally we use the modulus operator to determine which row colour we should be using. Adding these workarounds gives me the following cell renderer which seems to work under the four L&amp;Fs available by default under Ubuntu as well as the Windows L&amp;F.&lt;br /&gt;&lt;pre class="brush: java"&gt;import java.awt.Color;&lt;br /&gt;import java.awt.Component;&lt;br /&gt;&lt;br /&gt;import javax.swing.BorderFactory;&lt;br /&gt;import javax.swing.JCheckBox;&lt;br /&gt;import javax.swing.JTable;&lt;br /&gt;import javax.swing.UIManager;&lt;br /&gt;import javax.swing.border.Border;&lt;br /&gt;import javax.swing.table.TableCellRenderer;&lt;br /&gt;&lt;br /&gt;/**&lt;br /&gt; * A TableCellRenderer for JCheckBox that disables the checkbox when the&lt;br /&gt; * cell isn't editable to make it clear that you can't click on it&lt;br /&gt; * &lt;br /&gt; * @author Mark A. Greenwood&lt;br /&gt; */&lt;br /&gt;@SuppressWarnings("serial")&lt;br /&gt;public class CheckBoxTableCellRenderer extends JCheckBox implements&lt;br /&gt;                                                    TableCellRenderer {&lt;br /&gt;&lt;br /&gt;  private static final Border NO_FOCUS =&lt;br /&gt;    BorderFactory.createEmptyBorder(1, 1, 1, 1);&lt;br /&gt;&lt;br /&gt;  public CheckBoxTableCellRenderer() {&lt;br /&gt;    super();&lt;br /&gt;    setHorizontalAlignment(JCheckBox.CENTER);&lt;br /&gt;    setBorderPainted(true);&lt;br /&gt;    setOpaque(true);&lt;br /&gt;  }&lt;br /&gt;&lt;br /&gt;  public Component getTableCellRendererComponent(JTable table,&lt;br /&gt;    Object value, boolean isSelected, boolean hasFocus,&lt;br /&gt;    int row, int column) {&lt;br /&gt;&lt;br /&gt;    // this is needed for Nimbus which has alternative rows in different&lt;br /&gt;    // colors; hopefully other L&amp;Fs that also do this use the same key&lt;br /&gt;    Color alternate = UIManager.getColor("Table.alternateRowColor");&lt;br 
/&gt;&lt;br /&gt;    // strangely the background color from nimbus doesn't render properly&lt;br /&gt;    // unless we convert it in this way. I'm guessing the problem is to do&lt;br /&gt;    // with the DerivedColor class that Nimbus uses&lt;br /&gt;    Color normal = new Color(table.getBackground().getRGB());&lt;br /&gt;&lt;br /&gt;    if(isSelected) {&lt;br /&gt;      setForeground(table.getSelectionForeground());&lt;br /&gt;      setBackground(table.getSelectionBackground());&lt;br /&gt;    } else {&lt;br /&gt;      setForeground(table.getForeground());&lt;br /&gt;      setBackground(alternate != null &amp;&amp; row % 2 == 0 ? alternate : normal);&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    setEnabled(table.isCellEditable(row, column));&lt;br /&gt;    setSelected(value != null &amp;&amp; (Boolean)value);&lt;br /&gt;&lt;br /&gt;    if(hasFocus) {&lt;br /&gt;      setBorder(UIManager.getBorder("Table.focusCellHighlightBorder"));&lt;br /&gt;    } else {&lt;br /&gt;      setBorder(NO_FOCUS);&lt;br /&gt;    }&lt;br /&gt;&lt;br /&gt;    return this;&lt;br /&gt;  }&lt;br /&gt;}&lt;/pre&gt;One thing to note is that I've used the alternative colour for the even numbered rows (i.e. when &lt;code&gt;row % 2 == 0&lt;/code&gt;). I've seen some web pages suggesting that you need to use the alternative colour on the odd rows. I'm not sure how Nimbus decides which colour to use for which rows so if you see them switched around for some reason you'll need to tweak the code slightly (i.e. 
use &lt;code&gt;row % 2 == 1&lt;/code&gt;).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-924105340094330669?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/924105340094330669/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/11/striped-clouds.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/924105340094330669'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/924105340094330669'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/11/striped-clouds.html' title='Striped Clouds'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-3214242262616306640</id><published>2011-11-23T19:14:00.000Z</published><updated>2011-11-23T19:21:50.848Z</updated><title type='text'>Quick As A Fox</title><content type='html'>&lt;img style="float:left; padding-right: 5px;" src="http://1.bp.blogspot.com/-_a4hgqmvnVs/Ts09fe8IQEI/AAAAAAAAA3g/ZgsIjpL98N4/s400/firefox-quicktime.png" /&gt;I like watching films and I buy a lot of DVDs (there are almost 500 movies in the house). I tend to find though, that often I have a completely different view to most movie critics so I don't tend to read movie reviews or magazines. 
I do, however, find watching movie trailers useful (the one exception being The Break Up, which I thought looked great but &lt;a href="http://englishcoffeedrinker.blogspot.com/2009/02/what-not-to-watch-break-up.html"&gt;definitely wasn't&lt;/a&gt;) so I can easily waste an hour or so on the &lt;a href="http://trailers.apple.com/"&gt;iTunes Movie Trailers&lt;/a&gt; website. Yesterday, having read an &lt;a href="http://www.bbc.co.uk/news/entertainment-arts-15756623"&gt;article about possible Oscar contenders&lt;/a&gt; I went to hunt down the trailer for &lt;a href="http://trailers.apple.com/trailers/fox_searchlight/thedescendants/"&gt;The Descendants&lt;/a&gt; only to find that the trailer wouldn't play.&lt;br /&gt;&lt;br /&gt;My main PC runs &lt;a href="http://www.ubuntu.com/"&gt;Ubuntu&lt;/a&gt; and in the past Apple have gone out of their way to make it difficult to watch the trailers on Linux so I just assumed they had deliberately &lt;i&gt;broken&lt;/i&gt; something. I had a quick hunt around the web and couldn't see anyone else complaining that things had changed recently so I decided it had to be a problem on my machine. It took me a while to track down the problem but I thought it worth mentioning here just in case it trips up anyone else. Essentially &lt;a href="http://www.videolan.org/vlc/"&gt;VLC&lt;/a&gt; was the cause of the problem.&lt;br /&gt;&lt;br /&gt;I use VLC as the main video player on my computer as it works well with DVDs and almost any file format/codec combination you throw at it. However, I'd recently had a problem with the audio and video going out of sync. While trying to fix that problem I had done a complete reinstall of VLC. As well as reinstalling the main application this had also reinstalled two Firefox plugins. It turns out that one of these plugins seems to interfere with the &lt;a href="http://projects.gnome.org/totem/"&gt;Totem&lt;/a&gt; based QuickTime plugin. 
I'm guessing that the VLC plugin takes precedence and then fails for some reason. The easy solution is to simply disable one of the VLC based Firefox plugins. You can do this from the addons page in Firefox (accessed by entering &lt;code&gt;about:addons&lt;/code&gt; in the address bar). You should find two plugins named 'VLC Multimedia Plug-in'. The difference between them is that one states it is compatible with Totem. Leave this one alone and disable the other one and hey presto! QuickTime movies should start playing again.&lt;br /&gt;&lt;br /&gt;Given that before the reinstall of VLC trailers had been playing properly, I'm guessing that I'd worked this problem out before and then forgotten all about it (memory like Swiss cheese and all that), so hopefully this post should at least remind me of the solution next time I have the same problem even if it doesn't help anyone else.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-3214242262616306640?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/3214242262616306640/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/11/quick-as-fox.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3214242262616306640'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3214242262616306640'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/11/quick-as-fox.html' title='Quick As A Fox'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-_a4hgqmvnVs/Ts09fe8IQEI/AAAAAAAAA3g/ZgsIjpL98N4/s72-c/firefox-quicktime.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-7639362454719374061</id><published>2011-11-05T15:38:00.001Z</published><updated>2011-11-05T15:38:50.681Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><title type='text'>Off The Tracks</title><content type='html'>Error handling is always important. Nobody likes it when an application they are using crashes so badly that it stops working. Of course some application crashes are more embarrassing or public than others. On my way back from &lt;a href="http://englishcoffeedrinker.blogspot.com/search/label/Prague"&gt;a work meeting in Prague&lt;/a&gt; last week I saw a perfect example in Huddersfield train station.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-t107twYg72g/TrVWtcUF_oI/AAAAAAAAA0c/YHL1elk7egE/s1600/TrainOutOfBoundsException.jpg" imageanchor="1" style="margin-left:1em; margin-right:1em"&gt;&lt;img border="0" height="281" width="400" src="http://2.bp.blogspot.com/-t107twYg72g/TrVWtcUF_oI/AAAAAAAAA0c/YHL1elk7egE/s400/TrainOutOfBoundsException.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Fortunately a simple &lt;a href="http://download.oracle.com/javase/6/docs/api/java/lang/ArrayIndexOutOfBoundsException.html"&gt;ArrayIndexOutOfBoundsException&lt;/a&gt; didn't seem to stop the trains running on time.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-7639362454719374061?l=englishjavadrinker.blogspot.com' 
alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/7639362454719374061/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/11/off-tracks.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7639362454719374061'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7639362454719374061'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/11/off-tracks.html' title='Off The Tracks'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-t107twYg72g/TrVWtcUF_oI/AAAAAAAAA0c/YHL1elk7egE/s72-c/TrainOutOfBoundsException.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-2361563978668910357</id><published>2011-10-21T19:51:00.000+01:00</published><updated>2011-10-23T16:28:58.391+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='JavaScript'/><title type='text'>Browser Detection: How Not To Do It</title><content type='html'>&lt;img style="float: left; width: 320px; padding-right: 10px;" src="http://3.bp.blogspot.com/-nqMo4t0XBpQ/TqG-NX4GUGI/AAAAAAAAAyo/0j0PFsn3LyY/s400/what-version-are-you-running.jpg" /&gt;Web browsers used to behave so differently from one another that it was common to write code to detect which browser type and version was being used to display the page so that appropriate 
code could be run. Fortunately, modern browsers tend to support web standards better and so apart from a few CSS tweaks it is unusual to come across browser detection code (the one exception to this rule being the expense system I have to use at work which doesn't seem to like Firefox at all). So I was a little surprised the other day when I suddenly found myself bounced to an unsupported browser page on a shopping site that I'd been browsing around a few days before. So being the inquisitive kind I had a dig around and came across this gem.&lt;br /&gt;&lt;pre class="brush: javascript"&gt;if (navigator.userAgent.match(/Firefox\/[12]/)) {&lt;br /&gt;   window.location.href = "unsupported.html";&lt;br /&gt;}&lt;/pre&gt;For those of you who don't speak JavaScript, this essentially checks (using a regular expression) for the presence of either "Firefox/1" or "Firefox/2" in the useragent string, which identifies the make and version of the web browser requesting the page.&lt;br /&gt;&lt;br /&gt;Now I like living on the bleeding edge of browser development and so I run the &lt;a href="http://nightly.mozilla.org/"&gt;nightly builds of Firefox&lt;/a&gt;. Given the recent change in the release cycle of Firefox, the version number has climbed quite rapidly and the nightly build now sends the following useragent string (you can find out what your browser sends using &lt;a href="http://whatsmyuseragent.com/"&gt;this website&lt;/a&gt;):&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;Mozilla/5.0 (X11; Linux x86_64; rv:10.0a1) Gecko/20111012 Firefox/10.0a1&lt;/div&gt;&lt;br /&gt;So from reading this you should be able to see that I'm actually running version 10.0a1 of Firefox. It should also be clear why the check of my browser resulted in me being bounced to the unsupported browser page: Firefox/10.0a1 contains the string "Firefox/1". 
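&lt;br /&gt;&lt;br /&gt;To see the difference, here is a rough sketch of a check that pulls out the whole major version and compares it as a number rather than matching on the first digit. The function names firefoxMajorVersion and isUnsupportedFirefox are made up for illustration and have nothing to do with the site in question, and I'm assuming that only versions 1 and 2 were ever meant to be blocked.&lt;br /&gt;&lt;br /&gt;

```javascript
// Hypothetical helper (not from the shopping site): returns the major
// Firefox version as a number, or null when the string is not Firefox.
function firefoxMajorVersion(userAgent) {
  // Capture every digit after "Firefox/" up to the first non-digit, so
  // "Firefox/10.0a1" gives "10" rather than stopping at the leading "1".
  var match = /Firefox\/(\d+)/.exec(userAgent);
  return match ? parseInt(match[1], 10) : null;
}

// Assumption: only major versions 1 and 2 should be treated as unsupported.
function isUnsupportedFirefox(userAgent) {
  var version = firefoxMajorVersion(userAgent);
  return version === 1 || version === 2;
}

var nightly = 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0a1) Gecko/20111012 Firefox/10.0a1';
console.log(firefoxMajorVersion(nightly));
console.log(isUnsupportedFirefox(nightly));
```

&lt;br /&gt;&lt;br /&gt;With the nightly useragent string the major version comes out as the number 10, so the redirect would not fire. The site's check, on the other hand, only ever looks at the first digit after the slash. 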
This is a good example of why you really shouldn't write your own browser detection code, especially as there are a number of well-written and up-to-date scripts out there that correctly extract the make and version number and which are easy to use. And if you subscribe to the &lt;a href="http://en.wikipedia.org/wiki/Not_Invented_Here"&gt;not-invented-here&lt;/a&gt; school of thought, then at least make sure you actually implement a sensible solution!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-2361563978668910357?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/2361563978668910357/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/10/browser-detection-how-not-to-do-it.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/2361563978668910357'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/2361563978668910357'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/10/browser-detection-how-not-to-do-it.html' title='Browser Detection: How Not To Do It'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-nqMo4t0XBpQ/TqG-NX4GUGI/AAAAAAAAAyo/0j0PFsn3LyY/s72-c/what-version-are-you-running.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-6616211816949525271</id><published>2011-10-18T19:27:00.000+01:00</published><updated>2011-10-18T19:27:20.677+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Postvorta'/><category scheme='http://www.blogger.com/atom/ns#' term='Mímir'/><title type='text'>Postvorta: Providing Intelligent Blog Search</title><content type='html'>&lt;img style="float:left;" src="http://4.bp.blogspot.com/-WGHtRFDiDIY/Tp21z2p9YnI/AAAAAAAAAyc/W5KwtvIj2Us/s400/postvorta.png" /&gt;The eagle-eyed amongst you may have noticed that about a month ago the search box in the sidebar of this blog changed. I used to use the standard Google search gadget but I now use a gadget powered by &lt;a href="http://www.postvorta.co.uk"&gt;Postvorta&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Postvorta was built specifically to enable intelligent searching of blogs. How do I know this, you ask? Well, I spent the past year building Postvorta in my spare time. The initial motivation was a number of conversations with fellow bloggers about the inadequacies of the Google search gadget; coupled with the fact that my job involves processing natural language documents (I work as part of the &lt;a href="http://gate.ac.uk"&gt;GATE&lt;/a&gt; group at the &lt;a href="http://www.shef.ac.uk"&gt;University of Sheffield&lt;/a&gt;), I thought I was in a position to provide something better.&lt;br /&gt;&lt;br /&gt;It is difficult to know exactly how the standard Google search gadget works, but as far as I can tell (both from personal experimentation and from talking to others) it appears to only index the main content of each post. For example, it certainly doesn't index the labels associated with posts. This means that while you can view all posts with a given label you can't search for them using the search gadget. 
Postvorta, however, indexes all the important content from your blog posts: title, article, labels, and comments. Importantly it does not index the pages you see when you view the blog in a web browser; instead it accesses the underlying data (via the Google Data APIs) which means that it can ignore the repeated information in the blog template. For example, many blogs contain a gadget which lists recent post titles; these shouldn't be indexed with each post as that makes it much more difficult to search for the actual post. A search can also be restricted by date and/or by the people who commented on a post. I've tried to provide as much flexibility as possible while keeping the full interface relatively simple.&lt;br /&gt;&lt;br /&gt;Fortunately when building Postvorta I didn't have to start from scratch. One advantage of working in a research group that makes their software available under an open-source license is that I can make use of software I use at work in my own projects. In this case the main indexing and search facilities behind Postvorta are built upon &lt;a href="http://gate.ac.uk/family/mimir.html"&gt;GATE Mímir&lt;/a&gt;. I've talked about Mímir &lt;a href="http://englishjavadrinker.blogspot.com/2011/07/people-in-news.html"&gt;before on this blog&lt;/a&gt; and if you've read that post then you shouldn't be surprised that as well as searching for words, like Google and the other search engines, you can use Postvorta to search your blogs semantically, i.e. for things. So you can search for any posts containing, for example, the name of a person without knowing what the name was in advance. 
If you are new to Mímir then Postvorta provides a comprehensive description of the &lt;a href="http://www.postvorta.co.uk/search/syntax.html"&gt;query syntax&lt;/a&gt; which becomes available when you choose to use it through the search interface (by default searches are treated as a simple bag-of-words just as with other search engines).&lt;br /&gt;&lt;br /&gt;Feel free to have a play with Postvorta through the search gadget on this blog. I'm also using it on my &lt;a href="http://englishcoffeedrinker.blogspot.com/"&gt;main blog&lt;/a&gt; where there are a lot more posts to search through. Postvorta is currently being run as a closed beta (while I evaluate performance, reliability etc.) but if you like what you see then you can &lt;a href="http://www.postvorta.co.uk/register.jsp"&gt;register your interest&lt;/a&gt; and I'll try and index your blog as soon as possible -- note that currently Postvorta only supports Blogger blogs, although WordPress support should be coming soon.&lt;br /&gt;&lt;br /&gt;Let me know what you think.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-6616211816949525271?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/6616211816949525271/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/10/postvorta-providing-intelligent-blog.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6616211816949525271'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6616211816949525271'/><link rel='alternate' type='text/html' 
href='http://englishjavadrinker.blogspot.com/2011/10/postvorta-providing-intelligent-blog.html' title='Postvorta: Providing Intelligent Blog Search'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-WGHtRFDiDIY/Tp21z2p9YnI/AAAAAAAAAyc/W5KwtvIj2Us/s72-c/postvorta.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-2025135353176390792</id><published>2011-10-13T10:23:00.000+01:00</published><updated>2011-10-13T10:25:22.178+01:00</updated><title type='text'>Blogger's Lightbox Returns To Haunt Our Blogs!</title><content type='html'>&lt;img style="float:left;" src="http://3.bp.blogspot.com/-Lj21UTiXN2g/TpasTnxsm8I/AAAAAAAAAxs/e5h7L4JsmzI/s400/killlightbox.png" /&gt;Blogger have now reintroduced lightbox to all blogs. While they claim to have fixed a lot of the bugs/issues that were reported before, they have still turned lightbox on by default. Fortunately we don't need to &lt;a href="http://englishjavadrinker.blogspot.com/2011/09/fixing-bloggers-mistakes.html"&gt;apply a hacky workaround&lt;/a&gt; this time.&lt;br /&gt;&lt;br /&gt;If you don't want to use lightbox then you can turn it off for each blog you write through your dashboard. Simply select “No” next to Lightbox in the Settings &gt; Posts and Comments section (new interface) or the Settings &gt; Formatting section (old interface). 
I still think this feature should be opt-in rather than opt-out but at least they are allowing us to opt-out, so I guess we should be grateful.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-2025135353176390792?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/2025135353176390792/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/10/bloggers-lightbox-returns-to-haunt-our.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/2025135353176390792'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/2025135353176390792'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/10/bloggers-lightbox-returns-to-haunt-our.html' title='Blogger&apos;s Lightbox Returns To Haunt Our Blogs!'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-Lj21UTiXN2g/TpasTnxsm8I/AAAAAAAAAxs/e5h7L4JsmzI/s72-c/killlightbox.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-510433656252636129</id><published>2011-10-07T19:09:00.000+01:00</published><updated>2011-10-07T19:09:37.409+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Mímir'/><title type='text'>Don't Look Down</title><content type='html'>&lt;img style="width: 200px; float:left; padding-right: 5px;" src="http://3.bp.blogspot.com/-hiBl0se5jZE/To8kvfrghZI/AAAAAAAAAxk/tKTXhnyOlJY/s400/39527-animals_fear_heights.jpg" /&gt;And now for something completely different -- a posting not at all related to Blogger!&lt;br /&gt;&lt;br /&gt;I've recently been spending quite a bit of my free time playing around with &lt;a href="http://gate.ac.uk/family/mimir.html"&gt;GATE Mímir&lt;/a&gt; (the reasons for which will become clear in a later post). As I've &lt;a href="http://englishjavadrinker.blogspot.com/2011/07/people-in-news.html"&gt;mentioned before&lt;/a&gt;, Mímir is a multi-paradigm indexing and retrieval system which allows us to combine text, annotations and knowledge base data in a single index. Text within Mímir is indexed using &lt;a href="http://mg4j.dsi.unimi.it/"&gt;MG4J&lt;/a&gt; and by default is processed (at both indexing and search time) by a &lt;a href="http://mg4j.dsi.unimi.it/docs/it/unimi/dsi/mg4j/index/DowncaseTermProcessor.html"&gt;DowncaseTermProcessor&lt;/a&gt; which ensures that searches are case insensitive. Unfortunately while case insensitive searching is great there are other common problems when searching text collections, one of which can be nicely illustrated just from the name Mímir.&lt;br /&gt;&lt;br /&gt;Whilst the name of the system, Mímir, contains an accented character, most people when searching would probably not go to the bother of figuring out how to enter the accented i and would instead try searching for Mimir. But just as Mímir and Mimir are visually different, so are they different when stored in an MG4J index. In other words if we search using the unaccented version we won't get any results! 
Whilst Mímir is a slightly unusual case I'm sure that we can all agree that a search for cafe should also bring back documents which mention café.&lt;br /&gt;&lt;br /&gt;Now for Latin alphabets I could come up with a mapping that would reduce most accented characters down to an unaccented version, but it would be time consuming to build and wouldn't handle the different ways in which accented characters can be encoded using Unicode. So I had a bit of a hunt around and discovered a simple and, I think, elegant way of converting accented characters to their unaccented forms courtesy of a &lt;a href="http://glaforge.appspot.com/article/how-to-remove-accents-from-a-string"&gt;blog posting by Guillaume Laforge&lt;/a&gt;. Creating a custom MG4J term processor using this code was trivial and so I now have a way of ensuring that accented characters don't cause me any problems. The one issue was getting Mímir to use the new term processor.&lt;br /&gt;&lt;br /&gt;&lt;img style="float: right; padding-left: 5px; padding-bottom: 5px;" src="http://4.bp.blogspot.com/-eVOy3cDvu4U/To8iyFSKR9I/AAAAAAAAAxc/3y8LZYQn87g/s400/classloaders.png"/&gt;I deploy Mímir in a &lt;a href="http://tomcat.apache.org/"&gt;Tomcat&lt;/a&gt; instance by building a WAR file and while I could simply add a JAR file containing my custom term processor to the WEB-INF/lib folder before creating the WAR I'd prefer not to have to. If I included my code within the Mímir WAR then any change would require rebuilding the WAR and redeploying, which seems to be more work than necessary. Fortunately the Mímir config file allows you to specify GATE plugins that should be loaded when the web app is started. So it is trivial to create a GATE plugin which references a JAR containing my custom term processor. Unfortunately when I tried this, MG4J threw a ClassNotFoundException. 
The problem is that Java never looks down.&lt;br /&gt;&lt;br /&gt;On the right you can see the ClassLoader hierarchy that is created when Mímir is deployed in Tomcat -- I've added little icons to show which are created by the Java runtime environment, which by Tomcat and which by the GATE embedded within Mímir. As you can see the GATE classloader, which is responsible for loading the plugin containing my custom term processor, is right at the bottom of the hierarchy. The MG4J libraries live in the Mímir WEB-INF/lib folder, which is the responsibility of the Web App classloader. Each classloader only knows about its parent and not about any children and when asked to load a class first asks its parent classloader and only if the class cannot be loaded does it then try loading it itself. The problem I was facing was that when MG4J tried to load my custom term processor it did so by asking the Web App classloader and as it is loaded by a child classloader the class couldn't be found and hence a ClassNotFoundException was thrown. Rather than giving up and simply adding the term processor to the WEB-INF/lib folder I decided to see if I could find a way of injecting the term processor into the right classloader.&lt;br /&gt;&lt;br /&gt;Now before we go any further I should point out that one of my colleagues has described what follows as evil, and I have to say I agree with him. That said it works and given the way Mímir works I can't see any problems arising, but I wouldn't suggest this as a general solution to the class loading problem described above for reasons I'll detail later. 
However....&lt;br /&gt;&lt;br /&gt;Each class in Java knows which ClassLoader instance was responsible for creating it and we can use this information to forcibly inject code into the right place, using the following method.&lt;br /&gt;&lt;br /&gt;&lt;pre style="clear: both;" class="brush: java"&gt;private static void codeInjector(Class&amp;lt;?&amp;gt; c1, Class&amp;lt;?&amp;gt; c2)&lt;br /&gt;{&lt;br /&gt;  try&lt;br /&gt;  {&lt;br /&gt;    // Get the class loader which loaded MG4J&lt;br /&gt;    ClassLoader loader = c2.getClassLoader();&lt;br /&gt;&lt;br /&gt;    if (!loader.equals(c1.getClassLoader()))&lt;br /&gt;    {&lt;br /&gt;      //Assuming we aren't running inside the MG4J class loader...&lt;br /&gt;&lt;br /&gt;      //Get an input stream we can use to read the byte definition of this class				&lt;br /&gt;      InputStream inp = c1.getClassLoader().getResourceAsStream(c1.getName().replace('.', '/') + ".class");&lt;br /&gt;&lt;br /&gt;      if (inp != null)&lt;br /&gt;      {&lt;br /&gt;        //If we could get an input stream then...&lt;br /&gt;&lt;br /&gt;        //read the class definition into a byte array&lt;br /&gt;        byte[] buf = new byte[1024 * 100]; //assume that the class is no larger than 100KB, this one is only 3.5KB&lt;br /&gt;        int n = inp.read(buf);&lt;br /&gt;        inp.close();&lt;br /&gt;&lt;br /&gt;        //get the defineClass method&lt;br /&gt;        Method method = ClassLoader.class.getDeclaredMethod("defineClass", String.class, byte[].class, int.class, int.class);&lt;br /&gt;&lt;br /&gt;        //defineClass is protected so we have to make it public before we can call it&lt;br /&gt;        method.setAccessible(true);&lt;br /&gt;&lt;br /&gt;        try&lt;br /&gt;        {&lt;br /&gt;          //call defineClass to inject ourselves into the MG4J class loader&lt;br /&gt;          method.invoke(loader, null, buf, 0, n);&lt;br /&gt;        }&lt;br /&gt;        finally&lt;br /&gt;        {&lt;br /&gt;          //set the 
defineClass method back to being protected&lt;br /&gt;          method.setAccessible(false);&lt;br /&gt;        }&lt;br /&gt;      }&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;  catch (Exception e)&lt;br /&gt;  {&lt;br /&gt;    //hmm, something has gone badly wrong so throw the exception&lt;br /&gt;    throw new UndeclaredThrowableException(e, "Unable to inject " + c1.getName() + " into the same class loader as " + c2.getName());&lt;br /&gt;  }&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;In essence this method injects the definition of class &lt;code&gt;c1&lt;/code&gt; into the classloader responsible for class &lt;code&gt;c2&lt;/code&gt;. So in my term processor I call it from a static initializer as follows:&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: java;"&gt;static&lt;br /&gt;{&lt;br /&gt;  codeInjector(NormalizingTermProcessor.class,&lt;br /&gt;    it.unimi.dsi.mg4j.index.Index.class);&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;So how does this all work? Well, hopefully the code comments will help but...&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Firstly we check that the classes are defined by different classloaders (lines 6-8)&lt;/li&gt;&lt;li&gt;Then we convert the class name into the path to the class file and try and open a stream to read from that file (line 13). If we can't read the class file then it means we have already injected the class which is why the classloader can't find the class file.&lt;/li&gt;&lt;li&gt;We then read the class file into a byte array (lines 20-22)&lt;/li&gt;&lt;li&gt;To inject code into a classloader we need to use the &lt;code&gt;defineClass&lt;/code&gt; method, which unfortunately is protected. 
So we retrieve a handle to the method and remove the protected restriction (lines 25-29)&lt;/li&gt;&lt;li&gt;We now call &lt;code&gt;defineClass&lt;/code&gt; on the classloader we want to know about the class, passing in the bytes we read in from the original class file (line 33)&lt;/li&gt;&lt;li&gt;Finally we put the protected restriction back so we leave things as they were when we found them (line 38)&lt;/li&gt;&lt;/ol&gt;Now there are a couple of things to be aware of which could trip you up if you try and do something similar:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;If there is a security manager in place then you may find that you can't call the &lt;code&gt;defineClass&lt;/code&gt; method even when the protected restriction is removed.&lt;/li&gt;&lt;li&gt;This code will result in the same class being defined in two classloaders (which after all was the whole point) but instances of the class cannot be shared between the classloaders. If you try to, you will get an exception (can't actually remember which one).&lt;/li&gt;&lt;/ol&gt;Neither of these seems to be an issue with loading custom MG4J term processors into Mímir, so this seems to be a nice, albeit evil, way of allowing me to add functionality without having to add to the Mímir WAR file. 
Success!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-510433656252636129?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/510433656252636129/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/10/dont-look-down.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/510433656252636129'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/510433656252636129'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/10/dont-look-down.html' title='Don&apos;t Look Down'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-hiBl0se5jZE/To8kvfrghZI/AAAAAAAAAxk/tKTXhnyOlJY/s72-c/39527-animals_fear_heights.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-2050503637903850060</id><published>2011-09-22T19:48:00.001+01:00</published><updated>2011-09-25T08:43:56.738+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='JavaScript'/><title type='text'>Followers, What Followers?</title><content type='html'>&lt;img style="float:left; margin-right: 5px;" src="http://3.bp.blogspot.com/-Z3mpMsUdS9Y/Tnt5iLQZfdI/AAAAAAAAAwc/7s9JBSXTMYc/s400/followers.jpg" /&gt;Now I don't want people 
to think that this blog is just about code for fixing problems with Blogger but... for this post I'll be fixing a problem with Blogger!&lt;br /&gt;&lt;br /&gt;I don't tend to read my own blog that often, which means I don't spend much time looking at it. So it wasn't until I started investigating &lt;a href="http://englishjavadrinker.blogspot.com/2011/09/fixing-bloggers-mistakes.html"&gt;how to disable Blogger's forced lightbox viewer&lt;/a&gt; that I spotted I had a problem with the followers gadget on some of my blogs. Developing my original fix to that problem involved reloading my blog over and over again as I tried different things. Sometimes the followers gadget appeared and sometimes it didn't. There didn't seem to be any pattern but there was definitely a problem.&lt;br /&gt;&lt;br /&gt;Given that a lot of the gadgets you find on blogs are JavaScript based the first thing I checked was the JavaScript console (I use &lt;a href="http://getfirebug.com/"&gt;Firebug&lt;/a&gt; in Firefox for most of my web development work) which showed the following two errors whenever the followers gadget failed to appear: &lt;code&gt;window.googleapisv0 is undefined&lt;/code&gt; and &lt;code&gt;google.friendconnect.container is undefined&lt;/code&gt;.&lt;br /&gt;&lt;br /&gt;Given that the problem was intermittent I made a guess that this was some form of &lt;a href="http://en.wikipedia.org/wiki/Race_condition"&gt;race condition&lt;/a&gt;. Most browsers will try and load at least two files required to render a web page in parallel. This means that sometimes files download and become available to the browser in a different order. Usually this doesn't matter but my gut feeling was that this was causing the problem. A quick look at the list of JavaScript files associated with my blog showed that there were quite a few related to the followers widget and that they were all loaded at roughly the same time. 
I experimented by manually adding each of these scripts to the head section of the HTML template until I eventually found a solution.&lt;br /&gt;&lt;br /&gt;Strangely it wasn't a script associated directly with the followers gadget that solved the problem, but rather a script for the +1 button. This did, however, explain why I was only seeing the problem on my blogs which included the sharing buttons below each post.  So if you find that your followers gadget sometimes doesn't appear then it might be worth trying the following fix.&lt;br /&gt;&lt;br /&gt;You need to edit the HTML version of your template (you might be able to do this via a HTML/JavaScript gadget as well but I've had less success) so in the old style Blogger dashboard go to the design page or in the new style go to the template page and then click to edit the HTML version of your template. Now directly after &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; insert the following:&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: js"&gt;&amp;lt;script src='https://apis.google.com/js/plusone.js' type='text/javascript'/&amp;gt;&lt;/pre&gt;&lt;br /&gt;This fixed the problem for me, and I've suggested the fix in a couple of threads on the Blogger forum and it seems to have worked there as well, so hopefully it should stop your followers gadget disappearing for good!&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-2050503637903850060?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/2050503637903850060/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/09/followers-what-followers.html#comment-form' title='25 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/7025390077222270096/posts/default/2050503637903850060'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/2050503637903850060'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/09/followers-what-followers.html' title='Followers, What Followers?'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-Z3mpMsUdS9Y/Tnt5iLQZfdI/AAAAAAAAAwc/7s9JBSXTMYc/s72-c/followers.jpg' height='72' width='72'/><thr:total>25</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-8624343517774422810</id><published>2011-09-16T09:41:00.000+01:00</published><updated>2011-09-22T20:07:39.257+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='JavaScript'/><title type='text'>Fixing Blogger's Mistakes</title><content type='html'>&lt;b style="font-size:125%;"&gt;UPDATE: Blogger have finally stopped forcing the lightbox viewer upon us, which means the fix detailed in this post is no longer required! I'll be watching though and if it reappears as the default option, with no ability to turn it off, then I'll update the fix so that we can continue to choose how our images are displayed.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Yesterday Blogger &lt;a href="http://www.google.co.uk/support/forum/p/blogger/thread?tid=73493b0be3a4116d&amp;amp;hl=en"&gt;introduced a new 'feature' to some blogs&lt;/a&gt;. Now images appear in a &lt;a href="http://en.wikipedia.org/wiki/Lightbox_%28JavaScript%29"&gt;Lightbox&lt;/a&gt; powered overlay. 
Unfortunately a lot of people think that this feature is actually a bug. On &lt;a href="http://englishcoffeedrinker.blogspot.com/"&gt;one of my other blogs&lt;/a&gt;, it is a real problem due to the fact that I was already using a different script to achieve a similar effect. With the new feature I now get two popup copies of each image which really is horrid. So I spent a good hour trying to find a hack or workaround, until Blogger sees fit to allow us to disable the bug.&lt;br /&gt;&lt;br /&gt;The main part of the fix is a simple piece of JavaScript.&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: js"&gt;&amp;lt;script type='text/javascript'&amp;gt;&lt;br /&gt;//&amp;lt;![CDATA[&lt;br /&gt; var images = document.getElementsByTagName('img');&lt;br /&gt;  for (var i = 0 ; i &amp;lt; images.length ; ++i) {&lt;br /&gt;    images[i].parentNode.innerHTML = images[i].parentNode.innerHTML;&lt;br /&gt;  }&lt;br /&gt;//]]&amp;gt;&lt;br /&gt;&amp;lt;/script&amp;gt;&lt;/pre&gt;&lt;br /&gt;The fix works because the new Blogger code adds an onClick function to the actual image, whereas most people wrap the images in a link. What I wanted to do was simply remove the onClick function but I couldn't figure out how (and believe me I tried). Fortunately, simply recreating the image removes any registered events. The problem is ensuring that this code runs after the code Blogger used to add the lightbox viewer.&lt;br /&gt;&lt;br /&gt;The trick to getting this code in the right place (thanks to &lt;a href="http://www.google.co.uk/support/forum/p/blogger/user?userid=13245839567042854012&amp;amp;hl=en"&gt;Bonjour Tristesse&lt;/a&gt; for this bit) involves editing the HTML version of your template. 
From the Design page in the old Blogger dashboard or from the Template page in the new version bring up the HTML version of your template and then place the code almost at the very end, right between &lt;code&gt;&amp;lt;/body&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;/html&amp;gt;&lt;/code&gt;.&lt;br /&gt;&lt;br /&gt;If you aren't happy editing the HTML version of your template then you can also add the fix via a gadget. Simply go to the layout editor and add a new HTML/JavaScript gadget (it doesn't matter where). Leave the title of the gadget blank and paste in the following code.&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: js"&gt;&amp;lt;script type="text/javascript"&amp;gt;&lt;br /&gt;//&amp;lt;![CDATA[&lt;br /&gt;var lightboxIsDead = false;&lt;br /&gt;function killLightbox() {&lt;br /&gt;  if (lightboxIsDead) return;&lt;br /&gt;  lightboxIsDead = true;&lt;br /&gt;  var images = document.getElementsByTagName('img');&lt;br /&gt;  for (var i = 0 ; i &amp;lt; images.length ; ++i) {&lt;br /&gt;     images[i].parentNode.innerHTML = images[i].parentNode.innerHTML;&lt;br /&gt;  }&lt;br /&gt;}&lt;br /&gt; &lt;br /&gt;if (document.addEventListener) {&lt;br /&gt;  document.addEventListener('DOMContentLoaded', killLightbox, false);&lt;br /&gt;} else {&lt;br /&gt;  document.attachEvent('onDOMContentLoaded', killLightbox);&lt;br /&gt;  window.attachEvent('onload', killLightbox);&lt;br /&gt;}&lt;br /&gt;//]]&amp;gt;&lt;br /&gt;&amp;lt;/script&amp;gt;&lt;/pre&gt;&lt;br /&gt;Save the gadget and you are done. The fix will have been applied and things should be back to how they were before Blogger introduced this bug/feature. If/when Blogger see sense and allow us to disable this feature then you can easily delete my workaround simply by deleting the gadget from your layout. 
Note that applying the fix by editing the HTML version of your template is slightly more reliable, but in most cases you won't see any difference between the two.&lt;br /&gt;&lt;br /&gt;Now I'm quite happy to let each individual blog owner choose how to display their photos, and some might even like the new photo viewer. From reading the forums, however, it is clear that some people just really hate the new viewer and would prefer not to see it even on other people's blogs. Well it turns out that the above fix also works when used as a &lt;a href="http://en.wikipedia.org/wiki/Greasemonkey"&gt;Greasemonkey script&lt;/a&gt;. If you already have Greasemonkey installed in your browser then you can simply &lt;a href="http://www.dcs.shef.ac.uk/%7Emark/blog/blog_files/software/KillLightbox/killlightbox.user.js"&gt;install the script&lt;/a&gt; to kill Blogger's lightbox on all Blogspot hosted blogs. If you don't have Greasemonkey installed then the Wikipedia page should point you to a version for your favorite browser.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 17th September:&lt;/b&gt; I've simplified the script slightly and added a fix so that if the mouse was already within an image when the page loaded the fix will still apply if you click the image, assuming you move the mouse at least one pixel in any direction.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 17th September:&lt;/b&gt; I've edited the post to suggest that the fix is used via a HTML/Javascript gadget so that new readers don't have to wade through the comments to find this out.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 17th September:&lt;/b&gt; Now we specify false in the &lt;code&gt;addEventListener&lt;/code&gt; call to ensure better backwards compatibility with older versions of Firefox.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 20th September:&lt;/b&gt; Added Bonjour Tristesse's much better fix as the main suggested workaround.&lt;/i&gt;&lt;br 
/&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 21st September:&lt;/b&gt; Added the section on using the newest fix as a Greasemonkey script to kill Lightbox on all Blogspot hosted blogs.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;strike&gt;&lt;i&gt;&lt;b&gt;UPDATED 21st September:&lt;/b&gt; Simplified the new fix slightly to do the replace inside body instead of the main div. This means that it will work even if you have heavily modified a template to no longer have the named div assumed by the previous version.&lt;/i&gt;&lt;/strike&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 21st September:&lt;/b&gt; The old method now registers the function so it is fired when the DOM is loaded not the page. This should mean it works even before the page has fully loaded.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 21st September:&lt;/b&gt; Simplified the short fix, as the replacement isn't actually required to make it work. This cuts down on the number of bytes served and should run quicker as well!&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 21st September:&lt;/b&gt; Switched back to recommending the gadget based fix (albeit a simpler version) because Bonjour Tristesse's version actually breaks other widgets within the posts, such as the Google +1 button in the post sharing widget. Fortunately the new and improved gadget version is applied much quicker and so seeing the viewer is much less likely than before.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 22nd September:&lt;/b&gt; Only replace the actual image, not the entire content of the parent element. 
This should reduce the number of situations in which there is a chance of breaking any other scripts or gadgets.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 22nd September:&lt;/b&gt; Attach to both onDOMContentLoaded and onLoad when running under IE to ensure the code gets run regardless of which version of IE we are using, but make sure we don't try and run twice as that is pointless.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 22nd September:&lt;/b&gt; Rewrote the post to show that the same fix can be applied both by editing the HTML template or by adding a gadget. The difference from before is that now the HTML template based fix won't break the sharing buttons etc.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;&lt;b&gt;UPDATED 22nd September:&lt;/b&gt; No longer use cloneNode as IE actually clones the event handlers so the viewer still appears.&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-8624343517774422810?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/8624343517774422810/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/09/fixing-bloggers-mistakes.html#comment-form' title='113 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8624343517774422810'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8624343517774422810'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/09/fixing-bloggers-mistakes.html' title='Fixing Blogger&apos;s 
Mistakes'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><thr:total>113</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-8958133432998954089</id><published>2011-09-11T15:34:00.000+01:00</published><updated>2011-09-11T15:34:27.502+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><title type='text'>Trawling The Heap</title><content type='html'>&lt;img src="http://4.bp.blogspot.com/-42ppzywmTWY/TmzByd4EmCI/AAAAAAAAAv0/Cf3_z2FThMo/s400/dump_2.jpg" style="float: left; margin-right: 5px;" width="350" /&gt;I've spent a good few hours over the last week trying to track down a memory leak in a web application I've been working on. As far as I could tell from the code all the relevant resources were being freed when finished with, but still after a few hours the tomcat instance in which the app was running would grind to a halt as the available free memory inched ever closer to zero. In the end I decided that the only solution was to trawl through a heap dump to find out exactly what was being leaked and what was holding a reference to it.&lt;br /&gt;&lt;br /&gt;Now it used to be that exploring the Java heap was a tedious and horrid process. Fortunately, the JDK now comes with &lt;a href="http://visualvm.java.net/"&gt;VisualVM&lt;/a&gt; that makes working with the heap really easy.&lt;br /&gt;&lt;br /&gt;VisualVM can attach to any running Java process and monitor its memory usage, which in itself can be useful, but it can also take a heap dump and then provides an easy tool for navigating through the often vast amount of information provided. 
Now in theory you should be able to use VisualVM to examine the heap of the tomcat server running a troublesome web app. Try as I might, though, I couldn't get this to work. The problem stems from the fact that I'm running tomcat under a different user account than my own, an account that you can't actually log in to (for the curious I installed tomcat under Ubuntu using the default package which runs tomcat under the tomcat6 user). I could monitor the memory usage but no matter what I tried (and believe me I tried all sorts of things) I couldn't manage to get a heap dump.&lt;br /&gt;&lt;br /&gt;In the end I resorted to manually creating a core dump using the unix gcore utility and then loading this into VisualVM which could then generate a heap dump. This actually works quite nicely. The only downside is that it requires you to know the process ID of the tomcat web server and this changes every time the server is restarted, which if you are debugging a problem, can be quite often. So to make my life a little easier I've written a small bash script that makes tomcat dump its heap, which I've cleverly called tomscat!&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: bash"&gt;#!/bin/bash&lt;br /&gt;&lt;br /&gt;# ps right-aligns the PID column, so strip any leading spaces before cutting at the first space&lt;br /&gt;pid=`ps -u tomcat6 | grep java | sed 's/^ *//; s/ .*$//'`&lt;br /&gt;count=0&lt;br /&gt;&lt;br /&gt;ls -1 tomcat*.$pid &gt; /dev/null 2&gt;&amp;1&lt;br /&gt;&lt;br /&gt;[ $? -eq 0 ] &amp;&amp; count=`ls -1 tomcat*.$pid | wc -l`&lt;br /&gt;&lt;br /&gt;gcore -o tomcat$count $pid&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;This script firstly finds the pid for the tomcat process then works out if there are already any core dumps for this instance of tomcat and then generates a core dump into a nicely named file. Currently there is little in the way of error handling so if it doesn't work any errors may be cryptic! 
Anyway hopefully other people might find this script useful; I know it made the process of creating a bunch of heap dumps quite easy, and once I had the heap dumps tracking down the leak was fairly easy (turns out that the leak was due to large cache objects associated with database connections not being made available for garbage collection).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-8958133432998954089?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/8958133432998954089/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/09/trawling-heap.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8958133432998954089'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8958133432998954089'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/09/trawling-heap.html' title='Trawling The Heap'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-42ppzywmTWY/TmzByd4EmCI/AAAAAAAAAv0/Cf3_z2FThMo/s72-c/dump_2.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-8533462753787496692</id><published>2011-07-21T07:24:00.000+01:00</published><updated>2011-07-21T07:24:59.836+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Mímir'/><category scheme='http://www.blogger.com/atom/ns#' term='GATE'/><title type='text'>People in the News</title><content type='html'>&lt;img style="float:right; padding: 0 0px 5px 5px; width: 200px;" src="http://2.bp.blogspot.com/-hiTUFvmIaY4/TicLZPzMYOI/AAAAAAAAAts/m15_f7I-yDs/s400/pin.png" /&gt;Back in May I was involved in producing a demo for a show-and-tell session at the &lt;a href="http://gate.ac.uk/conferences/fig/fig4.html"&gt;GATE training course&lt;/a&gt;. The idea was to try and demonstrate the process of defining an application, developing an annotation pipeline, annotating a large corpus, and then providing search over the documents, annotations and associated semantic information.&lt;br /&gt;&lt;br /&gt;The idea we settled upon was to extend the basic &lt;a href="http://gate.ac.uk/userguide/chap:annie"&gt;ANNIE&lt;/a&gt; application, that is bundled with GATE, to annotate BBC News articles and to link the entities within them to &lt;a href="http://dbpedia.org"&gt;DBpedia&lt;/a&gt;. This would then allow us to search the documents not only for textual information, the same as any other search engine, but also to restrict the search based on information that might not be present in the documents but which is encoded in DBpedia. 
This worked well and allowed us to demonstrate the use of &lt;a href="http://gate.ac.uk/family/developer.html"&gt;GATE Developer&lt;/a&gt;, the &lt;a href="http://gate.svn.sourceforge.net/viewvc/gate/gcp/trunk/"&gt;GATE Cloud Parallelizer&lt;/a&gt; (GCP) and &lt;a href="http://gate.ac.uk/family/mimir.html"&gt;GATE Mímir&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;The combination of text, annotations and semantic information allows us to search the documents in interesting ways. You can play with the &lt;a href="http://demos.gate.ac.uk/mimir/gpd/search/gus"&gt;basic Mímir interface&lt;/a&gt; (referred to as GUS) over the demo index to see for yourself how useful the combination can be. Given that not many people reading this will already know the Mímir query syntax, and those that do probably won't know what annotations etc. are in the index, here are a few example queries to get you started:&lt;br /&gt;&lt;dl&gt;&lt;dt&gt;People Born in Sheffield&lt;/dt&gt;&lt;dd&gt;&lt;code&gt;{Person sparql = "SELECT ?inst WHERE { ?inst :birthPlace &amp;lt;http://dbpedia.org/resource/Sheffield&amp;gt;}"}&lt;/code&gt;&lt;/dd&gt;&lt;dt&gt;The Location of Steel Industries&lt;/dt&gt;&lt;dd&gt;&lt;code&gt;{Organization sparql = "SELECT ?inst WHERE { ?inst :industry &amp;lt;http://dbpedia.org/resource/Steel&amp;gt;}"} [0..4] in {Location}&lt;/code&gt;&lt;/dd&gt;&lt;dt&gt;A BBC Scotland document, written after the start of 2011, in which a Labour Party member is being quoted&lt;/dt&gt;&lt;dd&gt;&lt;code&gt;({Person sparql = "SELECT ?inst WHERE { ?inst :party &amp;lt;http://dbpedia.org/resource/Labour_Party_%28UK%29&amp;gt;}"} root:say) IN ({Document date &amp;gt; 20110000} OVER {DocumentClassification sparql = "SELECT ?inst WHERE { ?inst a bbc:Classification . FILTER (?inst = bbc:Scotland)}"})&lt;/code&gt;&lt;/dd&gt;&lt;/dl&gt;As you can see from these examples, as the queries get more complex they quickly become unwieldy. 
The problem is that Mímir provides a very rich query syntax and the basic GUS interface does nothing to hide the syntax from the user. Whenever we demo Mímir people love it but we always have to stress that GUS is not an end user search tool -- it is a development tool to enable you to check the contents of an index and to develop complex queries. In other words...&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center; font-size: 120%;"&gt;&lt;img src="http://1.bp.blogspot.com/-Akh8Tdyrg6s/TicWExYU7MI/AAAAAAAAAt0/VqIss3ogQm0/s400/mindtrick.jpg" /&gt;&lt;div&gt;GUS is &lt;i&gt;not&lt;/i&gt; the interface you are looking for!&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;Now I really like the demo we put together but trying to teach people the Mímir query syntax is difficult, especially if they don't already know any SPARQL. Also it is difficult to explain to potential partners/customers how they could take a Mímir index and produce their own custom interfaces. Whilst these thoughts have been festering in the back of my mind for a while I've only just found the time to go back to the demo and to build a custom interface (partly because next week I'm going to be teaching some people how to build custom Mímir interfaces, so I thought it best to have built at least one).&lt;br /&gt;&lt;br /&gt;Given how rich the query syntax is, it is unlikely that a custom interface will be able to expose all the information within the index. Instead a number of interfaces may be developed, for the same index, in order to provide different types of search. Given this I decided to focus on searching for people within the BBC News articles. I used GUS to explore the index (which is what GUS is really for) and built up a number of complex person related queries. 
I then set about breaking these queries down into sections that could be easily represented in a form based fashion.&lt;br /&gt;&lt;br /&gt;Once the form was complete it was trivial to reconstruct the complex queries from the form elements. All that was left to do was to interface with the Mímir index. Fortunately as well as GUS Mímir comes with an XML based &lt;a href="http://en.wikipedia.org/wiki/Representational_State_Transfer"&gt;RESTful&lt;/a&gt; interface. So the demo now builds complex queries from the form elements, submits the query to Mímir via its RESTful interface and then displays the results, all without the user having to know anything about Mímir's query syntax.&lt;br /&gt;&lt;br /&gt;The completed demo is unimaginatively called &lt;a href="http://demos.gate.ac.uk/pin/"&gt;People in the News&lt;/a&gt; and you should feel free to play around with it. Some example queries include: &lt;a href="http://demos.gate.ac.uk/pin/?name=Jonathan&amp;bornIn=&amp;famousAs=Criminal&amp;after=&amp;before=&amp;within=&amp;content=true"&gt;criminals called Jonathan&lt;/a&gt;, &lt;a href="http://demos.gate.ac.uk/pin/?name=&amp;bornIn=Russia&amp;famousAs=Astronaut&amp;after=&amp;before=&amp;within=&amp;content=true"&gt;Russian astronauts&lt;/a&gt;, and (my favourite complex example) &lt;a href="http://demos.gate.ac.uk/pin/?name=&amp;bornIn=Sheffield&amp;famousAs=Politician|OfficeHolder&amp;after=01%2F04%2F2011&amp;before=30%2F04%2F2011&amp;within=Scotland&amp;content=true"&gt;politicians born in Sheffield mentioned in BBC Scotland documents from April 2011&lt;/a&gt;. The nice thing about the new interface is how easy it is to fill in the form to run these queries. That last example would otherwise entail you entering the following into GUS:&lt;br /&gt;&lt;dl&gt;&lt;dd&gt;&lt;code&gt;(({Person sparql="SELECT DISTINCT ?inst WHERE { ?inst :birthPlace &amp;lt;http://dbpedia.org/resource/Sheffield&amp;gt; . { ?inst a :Politician } UNION { ?inst a :OfficeHolder . 
?inst a &amp;lt;http://xmlns.com/foaf/0.1/Person&amp;gt; } }"}) IN {Content}) IN ({Document date &amp;gt;= 20110401 date &amp;lt;= 20110430} OVER {DocumentClassification sparql = "SELECT ?inst WHERE { ?inst a bbc:Classification . FILTER (?inst = bbc:Scotland)}"})&lt;/code&gt;&lt;/dd&gt;&lt;/dl&gt;It is still something of a work-in-progress so if you have any ideas for improvements, or you find any bugs/oddities please do let me know.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-8533462753787496692?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/8533462753787496692/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/07/people-in-news.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8533462753787496692'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8533462753787496692'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/07/people-in-news.html' title='People in the News'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-hiTUFvmIaY4/TicLZPzMYOI/AAAAAAAAAts/m15_f7I-yDs/s72-c/pin.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-3308999214758907128</id><published>2011-06-22T18:31:00.000+01:00</published><updated>2011-06-22T18:31:04.188+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Cranium'/><title type='text'>Cranium</title><content type='html'>&lt;img style="display: none;" src="http://2.bp.blogspot.com/-Dd3HDQ4CrEc/TgIkKL6t2KI/AAAAAAAAAqI/VPdzIrltZIg/s400/cranium.png" /&gt;Over the last few weeks I've been trying to hunt down a memory leak in a servlet based web application. Periodically the Java virtual machine in which Tomcat was running would inexplicably run out of PermGen space and become so unresponsive that the only solution was to kill and restart the server process. After a lot of hunting through logs and trawling the Internet for pointers, I've found that the problem actually occurs when a web application is redeployed, although the out of memory error may occur later (which is why it was difficult to spot in the logs).&lt;br /&gt;&lt;br /&gt;It turns out that when an application is redeployed the old classloader should be garbage collected, which should free up both heap and PermGen memory by removing all the information related to the discarded web application. Unfortunately if something outside your web application holds a reference to even one class within the application which was loaded via the application's classloader then the classloader itself, and hence all the class information it has loaded, will not become eligible for garbage collection and this, eventually, results in exhaustion of the PermGen memory pool. 
If that isn't initially clear, never fear, as Frank Kieviet wrote a &lt;a href="http://blogs.oracle.com/fkieviet/entry/classloader_leaks_the_dreaded_java"&gt;brilliant article&lt;/a&gt; (with diagrams) which explains the problem in more detail.&lt;br /&gt;&lt;br /&gt;Looking back through the Tomcat logs it seems as if something within one of the libraries I was using is leaking a Timer instance which stops the classloader being garbage collected. I haven't actually managed to fix the problem yet but I did learn quite a few things along the way which I've collected together and turned into....&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img src="http://2.bp.blogspot.com/-4soiTQqxQgU/TgIOquJ3QqI/AAAAAAAAAqA/QCkoGaNhPJ4/s400/cranium.png" /&gt;&lt;/div&gt;Cranium is a web application (distributed as a WAR file) that provides information on the memory usage of the servlet container in which it is being hosted. This includes information on all the memory pools (both heap and non-heap) as well as class loading and garbage collection. It also incorporates two different ways of triggering garbage collection to help monitor for memory leaks etc. Rather than trying to explain in detail what Cranium allows you to monitor I'm hosting it as a &lt;a href="http://demos.englishcoffeedrinker.co.uk/cranium/"&gt;demo&lt;/a&gt; for you to look at (although I've disabled the garbage collection tools so that they cannot be used to make the server unstable).&lt;br /&gt;&lt;br /&gt;As with most of my software Cranium is open-source and you can grab the code from my &lt;a href="http://greenwoodma.servehttp.com/svn/repos/open-source/list/Cranium/trunk/"&gt;SVN repository&lt;/a&gt; or you can simply grab a &lt;a href="http://demos.englishcoffeedrinker.co.uk/cranium/cranium.war"&gt;pre-built WAR file&lt;/a&gt;. 
If you want to track development of Cranium then you can monitor it via my &lt;a href="http://greenwoodma.servehttp.com/jenkins/job/Cranium/"&gt;Jenkins server&lt;/a&gt; which also produces a &lt;a href="http://greenwoodma.servehttp.com/jenkins/job/Cranium/lastSuccessfulBuild/artifact/cranium.war"&gt;bleeding edge WAR file&lt;/a&gt; on each build.&lt;br /&gt;&lt;br /&gt;I know a lot of the information Cranium displays is available through other tools but I'm already finding it really useful and I hope that at least one other person does too!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-3308999214758907128?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/3308999214758907128/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/06/cranium.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3308999214758907128'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3308999214758907128'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/06/cranium.html' title='Cranium'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-Dd3HDQ4CrEc/TgIkKL6t2KI/AAAAAAAAAqI/VPdzIrltZIg/s72-c/cranium.png' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-1592885830637907245</id><published>2011-04-09T19:14:00.000+01:00</published><updated>2011-04-09T19:14:17.553+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Quick As A Flash'/><title type='text'>Gordon's Alive!</title><content type='html'>&lt;img style="float:left; width: 250px; padding: 10px;" src="http://3.bp.blogspot.com/-Xs-AzHbklLU/TaCdBCnkhQI/AAAAAAAAAow/TQ5w1p9L1ag/s400/flashgordon.jpg" /&gt;A few years ago I wrote a small servlet to allow QuickTime movies to be converted into Flash video on the fly, specifically to support playback on the Wii -- I gave it the rather unimaginative name &lt;a href="http://englishcoffeedrinker.blogspot.com/2008/11/bringing-quicktime-to-wii.html"&gt;Quick As A Flash&lt;/a&gt;. I've had little need to update the code until recently when I &lt;a href="http://englishcoffeedrinker.blogspot.com/2011/02/meerkats-have-taken-over.html"&gt;upgraded my main PC from Windows to Ubuntu&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I have a web app that I wrote and use to index/search all the DVDs I own. It interfaces with Amazon to get artwork and reviews and allows for linking trailers to each film. I had been using Apple's &lt;a href="http://en.wikipedia.org/wiki/QuickTime_for_Java"&gt;QuickTime for Java&lt;/a&gt; to get the dimensions and duration of the QuickTime trailers I was adding to the index. Unfortunately this has a) never been available under Linux and b) has been deprecated by Apple. So I decided to revisit Quick As A Flash and add support for extracting this information to the servlet.&lt;br /&gt;&lt;br /&gt;Quick As A Flash uses &lt;a href="http://www.ffmpeg.org/"&gt;FFmpeg&lt;/a&gt; to do the transcoding to Flash and it is trivial to read the dimensions and duration of the movie from the FFmpeg output. 
In a simple case of I-could-so-I-did I've also added support for generating a thumbnail image from the QuickTime movie.&lt;br /&gt;&lt;br /&gt;I've no idea if anyone else is using this code or would ever find it useful but if you are interested then you can grab a &lt;a href="http://www.dcs.shef.ac.uk/%7Emark/blog/blog_files/software/web/quickflash.zip"&gt;binary copy&lt;/a&gt; or track progress on the &lt;a href="http://greenwoodma.servehttp.com/jenkins/job/Quick%20As%20A%20Flash/"&gt;Jenkins build page&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-1592885830637907245?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/1592885830637907245/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/04/gordons-alive.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/1592885830637907245'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/1592885830637907245'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/04/gordons-alive.html' title='Gordon&apos;s Alive!'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-Xs-AzHbklLU/TaCdBCnkhQI/AAAAAAAAAow/TQ5w1p9L1ag/s72-c/flashgordon.jpg' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-7100625605170081642</id><published>2011-03-27T19:23:00.001+01:00</published><updated>2011-07-20T18:11:01.666+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Mímir'/><category scheme='http://www.blogger.com/atom/ns#' term='GATE'/><title type='text'>About The Size Of It</title><content type='html'>&lt;img src="http://2.bp.blogspot.com/-O6tgo7bvm1k/TY8VAj0m-eI/AAAAAAAAAoE/02kk-p3LwWI/s1600/measuring_belt.jpgg" style="float: left; padding: 0 10px;" width="300" /&gt;After quite a lot of work I've now managed to bring some semblance of order (and documentation) to the last of the GATE plugins that I've been trying to clean up for general release. So as of the most recent &lt;a href="http://hudson.gate.ac.uk/job/GATE-Nightly/lastSuccessfulBuild/"&gt;nightly build of GATE&lt;/a&gt; there is now a Measurements Tagger which you can load from the Tagger_Measurements plugin. I'm not going to attempt to give a full description of the PR here, so if you want the full details have a look at the &lt;a href="http://gate.ac.uk/userguide/sec:misc-creole:measurements"&gt;user guide where there are three whole pages you can read&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;In essence the PR annotates measurements appearing in text and normalizes the extracted information to allow for the easy comparison of measurements defined using different units. Now while that description is accurate it probably doesn't make much sense so here are a few examples.&lt;br /&gt;&lt;br /&gt;Imagine that you wanted to find all distance measurements less than 3 metres appearing in a document. The Measurements Tagger makes this really simple. 
You could annotate your documents and then look at the &lt;code&gt;unit&lt;/code&gt; and &lt;code&gt;value&lt;/code&gt; features of all the Measurement annotations to find those where the unit is "metre" and the value is less than 3, but this would miss lots of valid measurements. For example, 3cm is less than three metres but uses a prefix to make writing the measurement easier. Or how about 4.5 inches? This is clearly less than 3 metres but is specified in an entirely different system of units. Fortunately, as well as annotating measurements with the unit and value specified in the document, this new PR also normalizes (where possible) the measurement to its base form. &lt;br /&gt;&lt;br /&gt;The base form of a unit usually consists solely of &lt;a href="http://en.wikipedia.org/wiki/International_System_of_Units"&gt;SI units&lt;/a&gt;. This means, for example, that all lengths are normalized to metres, times to seconds, and speeds to metres per second (which is classed as a derived unit but is made up only of SI units).&lt;br /&gt;&lt;br /&gt;In our example this means that 3cm is normalized to 0.03m and 4.5 inches to 0.1143m which allows them to both be recognized as being less than 3 metres. Under the hood the PR uses a modified version of the &lt;a href="http://units-in-java.sourceforge.net/"&gt;Java port&lt;/a&gt; of the &lt;a href="http://www.gnu.org/software/units/"&gt;GNU Units&lt;/a&gt; package to recognize and normalize the measurements. 
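&lt;br /&gt;&lt;br /&gt;The normalization idea itself can be sketched in a few lines of Java. This is only an illustration of converting lengths to metres so that they become comparable; the class LengthNormalizer and its tiny hard-coded unit table are assumptions of mine and have nothing to do with how the PR or the GNU Units port are actually implemented.

```java
// Hypothetical illustration of normalizing lengths to their SI base unit.
final class LengthNormalizer {
  /** Metres per one of the given unit; only a handful of example units are known. */
  static double metresPer(String unit) {
    switch (unit) {
      case "m":    return 1.0;
      case "cm":   return 0.01;
      case "km":   return 1000.0;
      case "inch": return 0.0254; // exact by definition
      case "foot": return 0.3048; // exact by definition
      default: throw new IllegalArgumentException("unknown unit: " + unit);
    }
  }

  /** Converts a value in the given unit to metres. */
  static double toMetres(double value, String unit) {
    return value * metresPer(unit);
  }

  public static void main(String[] args) {
    // The two examples from the text: 3cm and 4.5 inches
    System.out.println(toMetres(3, "cm"));
    System.out.println(toMetres(4.5, "inch"));
  }
}
```

Once every length is expressed in metres, finding the measurements shorter than 3 metres is a single numeric comparison. The PR itself relies on the GNU Units port rather than a hand-written table like this one.&lt;br /&gt;&lt;br /&gt;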
Building on GNU Units makes it easy to add new units or to customize the parser for a specific domain, providing a very flexible solution.&lt;br /&gt;&lt;br /&gt;The PR doesn't actually contain code for recognizing the value of a measurement, rather it relies on the annotations produced by the &lt;a href="http://gate.ac.uk/userguide/sec:misc-creole:numbers:numbers"&gt;Numbers Tagger&lt;/a&gt; I cleaned up and &lt;a href="http://englishjavadrinker.blogspot.com/2011/02/numbers-have-real-value.html"&gt;released back in February&lt;/a&gt;. This means that this new PR can also recognize numbers written in many different ways allowing for measurements such as "forty-five miles per hour", "three thousand nanometres" and "2 1/2 pints".&lt;br /&gt;&lt;br /&gt;Both the Numbers and Measurement taggers were originally developed for annotating a large corpus of patent documents. Once annotated the corpus could then be searched via another GATE technology called &lt;a href="http://gate.ac.uk/family/mimir.html"&gt;Mímir&lt;/a&gt;. Mímir is a multiparadigm IR system which allows searching over text, annotations, and knowledge base data. There are a couple of &lt;a href="http://services.gate.ac.uk/mimir/"&gt;demo indexes&lt;/a&gt; (including a subset of the patent corpus) that you can try, and this video does a good job of explaining how the measurement annotations can be really useful.&lt;br /&gt;&lt;br /&gt;&lt;div style="clear:both; text-align:center;"&gt;&lt;iframe src="http://player.vimeo.com/video/11334635?title=0&amp;amp;byline=0&amp;amp;color=0065FF" width="500" height="281" frameborder="0"&gt;&lt;/iframe&gt;&lt;/div&gt;&lt;br /&gt;If you find the whole topic of measurements interesting then I'd recommend reading &lt;a href="http://www.amazon.co.uk/gp/product/0330450301/ref=as_li_ss_tl?ie=UTF8&tag=markgreenwoos-21&linkCode=as2&camp=1634&creative=19450&creativeASIN=0330450301"&gt;"About The Size Of It"&lt;/a&gt; by Warwick Cairns. 
It's only a short book but it explains why we use the measurements we do and how they have evolved over time. I found it interesting, but then I quite like &lt;a href="http://writingtheblurb.blogspot.com/search/label/non-fiction"&gt;reading non-fiction&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Hopefully the new measurement PR will turn out to be really useful for a lot of people/projects. If you benefit from using GATE in general, or these new PRs in particular, then why not consider &lt;a href="http://gate.ac.uk/donate.html"&gt;making a donation&lt;/a&gt; to help support future development.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-7100625605170081642?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/7100625605170081642/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/03/about-size-of-it.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7100625605170081642'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7100625605170081642'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/03/about-size-of-it.html' title='About The Size Of It'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/-O6tgo7bvm1k/TY8VAj0m-eI/AAAAAAAAAoE/02kk-p3LwWI/s72-c/measuring_belt.jpgg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-62885893768459331</id><published>2011-03-16T19:06:00.000Z</published><updated>2011-03-16T19:06:38.303Z</updated><title type='text'>Hudson Becomes Jenkins</title><content type='html'>I've upgraded the Hudson instance I use to compile most of my software to the newest version which, &lt;a href="http://jenkins-ci.org/content/hudsons-future"&gt;after a dispute with Oracle&lt;/a&gt;, is now called Jenkins. As well as upgrading the software I've changed the URL to match. I'm using &lt;a href="http://j2ep.sourceforge.net/"&gt;J2EP&lt;/a&gt; in order to rewrite the old URLs to their new forms so hopefully all existing links will continue to work as before, but if you spot anything that doesn't seem to work properly please leave a comment so I can fix things.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-62885893768459331?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/62885893768459331/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/03/hudson-becomes-jenkins.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/62885893768459331'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/62885893768459331'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/03/hudson-becomes-jenkins.html' title='Hudson Becomes 
Jenkins'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-7436667169735055102</id><published>2011-03-03T10:18:00.000Z</published><updated>2011-03-03T10:18:56.090Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='Date Parser'/><category scheme='http://www.blogger.com/atom/ns#' term='GATE'/><title type='text'>When Was Yesterday?</title><content type='html'>&lt;img src="http://2.bp.blogspot.com/-imt_o3O950c/TW6h8DqhMXI/AAAAAAAAAmw/T3mce1clk2k/s800/calendar.jpg" style="float: left; padding: 0 10px;" width="300" /&gt;Today sees the release of another of the GATE plugins I've been working on cleaning up over the last few months. Unlike the other plugins I've talked about recently this one has a much longer history as I wrote the core code back when I was a PhD student.&lt;br /&gt;&lt;br /&gt;Many information extraction (IE) tasks benefit from or require the extraction of accurate date information. While ANNIE (the IE system that comes with GATE) does produce Date annotations no attempt is made to normalize these dates, i.e. to firmly fix all dates, even partial or relative ones, to a timeline using a common date representation. My PhD focused on &lt;a href="http://www.dcs.shef.ac.uk/~mark/nlp/pubs/thesis.pdf"&gt;open-domain question answering&lt;/a&gt;, an IE task in which dates can play an important role; any "when" question, or questions starting "who is..." benefit from accurate date information. 
The problem was that I couldn't find a good Java library for parsing dates into a common format, so of course I set about writing one.&lt;br /&gt;&lt;br /&gt;The library I wrote is unimaginatively called Date Parser and has been freely available since around 2005. You can currently find the parser being built by my &lt;a href="http://greenwoodma.servehttp.com/hudson/job/Date%20Parser/"&gt;Hudson server&lt;/a&gt;. Without going into too many technical details (the &lt;a href="http://greenwoodma.servehttp.com/hudson/job/Date%20Parser/javadoc"&gt;Javadoc&lt;/a&gt; is available for those who like that kind of thing) the parser takes a string and attempts to parse it as a date starting from a given offset. Unlike the built-in &lt;a href="http://download.oracle.com/javase/1.5.0/docs/api/java/text/DateFormat.html"&gt;DateFormat&lt;/a&gt; class which is limited to parsing one date format at a time my parser attempts to handle as many date formats as possible. Of course there are only so many ways you can re-arrange three pieces of information, but the parser also handles relative dates and dates which are not fully specified. For example, "April 2011" would be parsed into a Date object representing the 1st of April 2011. Possibly more interesting though is the fact that words/phrases such as yesterday, today, next Wednesday, and 3 days ago are all also parsed and recognized. In these instances the actual date being mentioned is calculated based upon a context date supplied to the parser. So if the word yesterday appears in the context of the 3rd of March 2011 the string will be recognized as referring to the 2nd of March 2011.&lt;br /&gt;&lt;br /&gt;The parser worked really well during my PhD work and has seen numerous improvements since then as well. It started to be used in GATE projects a year or so ago and was initially used in conjunction with ANNIE. 
ANNIE adds Date annotations to documents and I wrote a JAPE grammar that would find these annotations and then run the parser over the underlying text adding the normalized date value (if found) as a new feature. The code eventually moved to being a PR (rather than JAPE) for performance reasons and to support some new features. The problem, however, was that the dates the parser could handle and the dates that ANNIE finds don't always align. This meant that adding a new date format required changes to both ANNIE and the Date Parser. So when I started to clean up the code for release I made the decision to re-write the PR as a standalone component that no longer relies on ANNIE.&lt;br /&gt;&lt;br /&gt;Surprisingly it was very easy to convert the existing code to remove the reliance on ANNIE and I think the performance (both time and accuracy) has been improved as a result. This isn't to say that ANNIE is bad at finding dates, just that it does some things differently and it also annotates times with Date annotations which for this task can confuse the issue.&lt;br /&gt;&lt;br /&gt;Full documentation is available in the &lt;a href="http://gate.ac.uk/userguide/sec:misc-creole:datenormalizer"&gt;user guide&lt;/a&gt; and the PR is already available in the &lt;a href="http://hudson.gate.ac.uk/job/GATE-Nightly/lastSuccessfulBuild/"&gt;nightly builds&lt;/a&gt; of GATE (you need to load the Tagger_DateNormalizer plugin) so feel free to have a play and let me know what you think.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-7436667169735055102?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/7436667169735055102/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://englishjavadrinker.blogspot.com/2011/03/when-was-yesterday.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7436667169735055102'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7436667169735055102'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/03/when-was-yesterday.html' title='When Was Yesterday?'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-imt_o3O950c/TW6h8DqhMXI/AAAAAAAAAmw/T3mce1clk2k/s72-c/calendar.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-68164419254650173</id><published>2011-02-22T17:24:00.002Z</published><updated>2011-02-23T07:13:25.498Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><title type='text'>More Ice In Your Tea?</title><content type='html'>&lt;img src="http://3.bp.blogspot.com/-TZbFpdi0CI8/TWPt6tPDXgI/AAAAAAAAAmQ/zsKVA_pU9vY/s400/BugsLifeWallpaper800.jpg" style="float: left; padding: 0 10px;" width="300" /&gt;I really shouldn't blog when I'm angry or annoyed as I tend to rant a little more than I intend! 
In retrospect I was a little harsh in my &lt;a href="http://englishjavadrinker.blogspot.com/2011/02/why-you-shouldnt-drink-icedtea.html"&gt;last post&lt;/a&gt; -- anyone who freely gives their time to developing free software shouldn't have to put up with me disparaging their work.&lt;br /&gt;&lt;br /&gt;So as penance I've now tracked down the source of the weird class loading bug I highlighted and have submitted a detailed bug report, including a proposed fix, to the &lt;a href="http://icedtea.classpath.org/wiki/IcedTea-Web"&gt;IcedTea netx project&lt;/a&gt; (netx is the name of the open-source Web Start replacement). You can monitor the &lt;a href="http://icedtea.classpath.org/bugzilla/show_bug.cgi?id=638"&gt;progress of the bug&lt;/a&gt; through their public bug tracker. If I had the right permissions it's such a simple fix that I'd be happy to do it myself, but you have to earn the respect of project maintainers before getting the right to commit code.&lt;br /&gt;&lt;br /&gt;&lt;em&gt;Update, 23rd February: it's now been fixed in the main code tree although it will take a while before it makes it into an Ubuntu update.&lt;/em&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-68164419254650173?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/68164419254650173/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/02/more-ice-in-your-tea.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/68164419254650173'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/68164419254650173'/><link rel='alternate' type='text/html' 
href='http://englishjavadrinker.blogspot.com/2011/02/more-ice-in-your-tea.html' title='More Ice In Your Tea?'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-TZbFpdi0CI8/TWPt6tPDXgI/AAAAAAAAAmQ/zsKVA_pU9vY/s72-c/BugsLifeWallpaper800.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-6323156334144004065</id><published>2011-02-20T20:28:00.000Z</published><updated>2011-02-20T20:28:47.593Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><title type='text'>Why You Shouldn't Drink The IcedTea</title><content type='html'>&lt;img border="0" height="320" src="http://2.bp.blogspot.com/--udbpeO4-cc/TWFcULGWG2I/AAAAAAAAAmM/wjtW1mv6ems/s320/bigstockphoto_glass_of_iced_tea_with_a_pitch_1620479.jpg" style="float: left; padding: 10px;" width="212" /&gt;I'm all for supporting open-source software but there are limits. I've recently &lt;a href="http://englishcoffeedrinker.blogspot.com/2011/02/meerkats-have-taken-over.html"&gt;switched to using Ubuntu&lt;/a&gt; on my main machine at home and have run into two bugs in the same piece of open-source software.&lt;br /&gt;&lt;br /&gt;If you are a regular reader of this blog then you are probably aware that I do most of my software development using Java. A default install of Ubuntu (10.10) includes the OpenJDK based IcedTea version of Java 6. 
This is a version of Java that is covered by an open-source license -- in contrast to the Sun/Oracle version of Java for which you can read the source but which was not covered by an open-source licence (it's now "mostly" covered by GPL v2 with the classpath exception). I've never really understood the philosophical argument behind IcedTea and the need for a clean room implementation of Java, although Oracle's recent attack on Android provides some explanation. Anyway, given that it was the default installation of Java I was willing to give it a try. Within minutes though I'd found two show stopping bugs and so have switched back to using the reliable Sun/Oracle release of Java 6.&lt;br /&gt;&lt;br /&gt;The first bug is visual and one that I knew existed in earlier versions of IcedTea but which I hoped had been fixed by now. In essence the ImageIO JPEG reader in IcedTea doesn't properly handle JPEG images with embedded colour profiles. What you end up with is an image that looks like a photographic negative rather than the image you tried to load. This bug basically means that you can't use IcedTea for any application that allows users to load arbitrary JPEG files. For me this means I can't recommend it for running &lt;a href="http://englishcoffeedrinker.blogspot.com/2007/10/convert4frame.html"&gt;Convert4Frame&lt;/a&gt;, &lt;a href="http://englishcoffeedrinker.blogspot.com/2008/10/tagme.html"&gt;TagME&lt;/a&gt;, &lt;a href="http://englishcoffeedrinker.blogspot.com/2009/11/trotternish-pumpkin.html"&gt;PhotoGrid&lt;/a&gt; or &lt;a href="http://englishcoffeedrinker.blogspot.com/2010/07/3dassembler.html"&gt;3DAssembler&lt;/a&gt;. Also I can't use IcedTea to run the Tomcat server in which I host my &lt;a href="http://greenwoodma.servehttp.com/cookbook/"&gt;cookbook&lt;/a&gt; web app. 
What is really annoying about this bug is that it was originally in the main Sun/Oracle distribution, reported all the way back in 2003, but was fixed in Java 5 update 4; you can read all about it in the &lt;a href="http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4881314"&gt;bug report&lt;/a&gt;. If the open-source version can't fix a bug that is around eight years old then why do they even bother!&lt;br /&gt;&lt;br /&gt;The second bug is a little stranger but no less annoying. The &lt;a href="http://download.oracle.com/javase/1.5.0/docs/api/java/lang/ClassLoader.html#loadClass%28java.lang.String%29"&gt;documentation&lt;/a&gt; for the method &lt;code&gt;ClassLoader.loadClass(String name)&lt;/code&gt; states that either it returns the resulting &lt;code&gt;Class&lt;/code&gt; object or throws a &lt;code&gt;ClassNotFoundException&lt;/code&gt; if (wait for it) the class was not found. That all seems perfectly logical to me. Unfortunately there appears to be at least one situation in which IcedTea returns &lt;code&gt;null&lt;/code&gt; instead of throwing an exception when the class cannot be found.&lt;br /&gt;&lt;br /&gt;I distribute a lot of the open-source Java software that I develop in my spare time via Web Start and once I had Ubuntu up and running I thought I'd check Java by launching &lt;a href="http://englishcoffeedrinker.blogspot.com/2010/07/3dassembler.html"&gt;3DAssembler&lt;/a&gt;. Unfortunately it failed to load and gave me a rather strange &lt;code&gt;NullPointerException&lt;/code&gt;. 
After a bit of digging around (the version of the app on my website doesn't match my development version and hence the line numbers were out) I eventually tracked the problem back to this try/catch block.&lt;br /&gt;&lt;pre class="brush: java"&gt;try {&lt;br /&gt;  Class rmClass = Assemble3D.class.getClassLoader().loadClass("org.jdesktop.swinghelper.debug.CheckThreadViolationRepaintManager");&lt;br /&gt;  RepaintManager.setCurrentManager((RepaintManager)rmClass.getConstructor().newInstance());&lt;br /&gt;  System.err.println("EDT Debug Mode Is Active");&lt;br /&gt;}&lt;br /&gt;catch (ClassNotFoundException e) {&lt;br /&gt;  // the debug classes from SwingHelper are not available&lt;br /&gt;}&lt;/pre&gt;This code tries to load a class, via reflection, that catches EDT violations (painting Swing components from the wrong thread) and that I only use during development to aid in debugging. I load the class via reflection so that when I distribute the application I can simply leave out the JAR file containing the debug class and everything will continue to work -- the class isn't found so an exception is thrown, caught and ignored and the application continues on. The problem with IcedTea is that when running as a Web Start application the call to &lt;code&gt;loadClass&lt;/code&gt; in line 2 returns &lt;code&gt;null&lt;/code&gt; instead of throwing a &lt;code&gt;ClassNotFoundException&lt;/code&gt;. This means that the catch block isn't triggered and the resulting &lt;code&gt;NullPointerException&lt;/code&gt; is thrown all the way out of the main method, killing the application. It seems to only be a Web Start issue as running my development copy locally under IcedTea doesn't cause &lt;code&gt;loadClass&lt;/code&gt; to return &lt;code&gt;null&lt;/code&gt;. 
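&lt;br /&gt;&lt;br /&gt;The failure is easy to reproduce without Web Start. The following is a minimal sketch under my own assumptions: NullReturningLoader is a made-up loader that deliberately breaks the &lt;code&gt;loadClass&lt;/code&gt; contract in the way described above, com.example.DoesNotExist is a placeholder class name, and loadOptional is a hypothetical helper showing one defensive way of coping with either behaviour. None of it is code from 3DAssembler or IcedTea.

```java
// Hypothetical demonstration of a loader that returns null instead of throwing.
final class NullSafeLoading {
  /** Mimics the faulty behaviour: null is returned for a class that cannot be found. */
  static final class NullReturningLoader extends ClassLoader {
    @Override
    public Class loadClass(String name) {
      try {
        return super.loadClass(name);
      } catch (ClassNotFoundException e) {
        return null; // violates the documented contract of loadClass
      }
    }
  }

  /** Returns the named class, or null if it is unavailable for either reason. */
  static Class loadOptional(ClassLoader loader, String name) {
    try {
      // the result may already be null if the loader misbehaves
      return loader.loadClass(name);
    } catch (ClassNotFoundException e) {
      return null; // the documented way of reporting a missing class
    }
  }

  public static void main(String[] args) {
    Class debug = loadOptional(new NullReturningLoader(), "com.example.DoesNotExist");
    if (debug == null) {
      // an explicit null check replaces the reliance on the exception alone
      System.err.println("debug classes not available, continuing without them");
    }
  }
}
```

Checking the returned value before using it means the caller no longer depends on the exception alone, whichever loader it ends up running under.&lt;br /&gt;&lt;br /&gt;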
Of course I can fix this problem by changing the catch block to trap all exceptions, but the point is I shouldn't have to!&lt;br /&gt;&lt;br /&gt;As I said at the beginning of this post I'm all for open-source software, but I believe there are cases where developers who give their time freely to projects should think more about the merits of the project and if it is really needed. The "official" Oracle release of Java is now, for all intents and purposes, under an open-source license for the development of desktop applications (mobile and embedded uses are a different kettle of fish). Given this, is there really any need for a clean room implementation, especially when that implementation is so buggy as to render it useless in many situations?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-6323156334144004065?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/6323156334144004065/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/02/why-you-shouldnt-drink-icedtea.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6323156334144004065'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6323156334144004065'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/02/why-you-shouldnt-drink-icedtea.html' title='Why You Shouldn&apos;t Drink The IcedTea'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/--udbpeO4-cc/TWFcULGWG2I/AAAAAAAAAmM/wjtW1mv6ems/s72-c/bigstockphoto_glass_of_iced_tea_with_a_pitch_1620479.jpg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-7917922513386542386</id><published>2011-02-13T09:07:00.000Z</published><updated>2011-02-13T09:07:36.293Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='GATE'/><title type='text'>What's Actually Worth Reading?</title><content type='html'>Another day, another GATE processing resource -- as you can tell I've been busy tidying up the PRs that I've developed recently. One of the reasons for this spurt of cleaning and documenting code is that a project I'm currently working on is ending soon and the information extraction pipeline we have developed needs to be fully documented. Being able to just point to multiple sections of the GATE user guide for more details on each PR in the application makes the documentation much easier to write. Of course that means that the PRs have to actually have documentation in the user guide!&lt;br /&gt;&lt;br /&gt;I won't go into details about the project I'm currently working on with &lt;a href="http://www.nationalarchives.gov.uk/"&gt;The National Archives&lt;/a&gt; (if you want the details then there was &lt;a href="http://www.kable.co.uk/national-archives-web-searches-upgrade-28jun10"&gt;a press release&lt;/a&gt; and the head of the GATE group, i.e. 
my boss, &lt;a href="http://computingtext.blogspot.com/2010/05/open-data-at-national-archives.html"&gt;has blogged about it&lt;/a&gt;) suffice it to say that it involves processing millions of web pages drawn from hundreds of different web sites.&lt;br /&gt;&lt;br /&gt;We can extract an awful lot of information from the web pages we are processing, so much so in fact that it can be difficult to search through the information. We have multiple tools to help with searching but one thing we quickly realised is that it would be nice to ignore information extracted from boilerplate content. Most web pages contain text that isn't really part of the content; headers, menus, navigation links etc. These sections can contain entities that we might extract but it is highly unlikely that they will be relevant to the main content of the page. For this reason it would be nice to be able to exclude these in some way when searching through the extracted information.&lt;br /&gt;&lt;br /&gt;The approach we chose was to keep everything extracted using the IE pipeline but to also determine the sections of the document that were actually content. This allows us to search for entities within content. It also means that if our ability to determine what is useful content and what isn't is flawed in any way we have still extracted the entities appearing in other parts of the document.&lt;br /&gt;&lt;br /&gt;Rather than implementing a content detection system from scratch I decided to base the PR on an existing Java library called &lt;a href="http://code.google.com/p/boilerpipe/"&gt;boilerpipe&lt;/a&gt;. The boilerpipe library contains a number of different algorithms for detecting content, most of which are available through the new GATE PR. 
A few features are not available because it is not currently possible to map them directly onto a GATE document.&lt;br /&gt;&lt;br /&gt;To give you a better idea of what the new PR does here is a screen shot of a &lt;a href="http://www.bbc.co.uk/news/world-middle-east-12388844"&gt;web page&lt;/a&gt; loaded into both a browser and GATE. In the GATE window you can see the pink sections that have been marked as content (click on the image for a larger, easier to read version).&lt;br /&gt;&lt;br /&gt;&lt;div style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-6auvZqHsJBA/TVa5Y4p2wzI/AAAAAAAAAl0/1eOpIm6TRQY/s1600/boilerpipe-example.png"&gt;&lt;img src="http://2.bp.blogspot.com/-6auvZqHsJBA/TVa5Y4p2wzI/AAAAAAAAAl0/1eOpIm6TRQY/s600/boilerpipe-example.png" style="border: 1px solid gray; margin: 0pt auto;" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Whilst this kind of approach is never going to be perfect it seems, from initial testing, that it does indeed help to filter out erroneous results when searching through information extracted from large web based corpora.&lt;br /&gt;&lt;br /&gt;If you want to try it out yourself then it's already in the main GATE svn repository and the &lt;a href="http://hudson.gate.ac.uk/job/GATE-Nightly/lastSuccessfulBuild/"&gt;nightly builds&lt;/a&gt;. 
Details of how to configure the PR can be found in the relevant &lt;a href="http://gate.ac.uk/userguide/sec:misc-creole:boilerpipe"&gt;section of the GATE user guide&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-7917922513386542386?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/7917922513386542386/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/02/whats-actually-worth-reading.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7917922513386542386'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7917922513386542386'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/02/whats-actually-worth-reading.html' title='What&apos;s Actually Worth Reading?'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-6auvZqHsJBA/TVa5Y4p2wzI/AAAAAAAAAl0/1eOpIm6TRQY/s72-c/boilerpipe-example.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-5694283745181297903</id><published>2011-02-11T19:35:00.002Z</published><updated>2011-02-12T14:26:40.397Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' 
term='GATE'/><title type='text'>Numbers Have Real Value</title><content type='html'>&lt;img src="http://4.bp.blogspot.com/-5cRSKLjJBUk/TVWC_lX6PiI/AAAAAAAAAls/cF0xo3heTXg/s400/numbers.jpg" style="float: left; padding: 5px; width: 250px;" /&gt;So here is a question for you...&lt;br /&gt;&lt;br /&gt;What do the following numbers all have in common? 3^2, 2&lt;sup&gt;3&lt;/sup&gt;, 101, 3.3e3, 1/4, 9&lt;sup&gt;1/2&lt;/sup&gt;, 4x10^3, 5.5*4^5, thirty one, three hundred, four thousand one hundred and two, 3 million, and fünfundzwanzig.&lt;br /&gt;&lt;br /&gt;The answer is that they can all be recognized, annotated and converted to a real number representation (a Java Double) by a new GATE PR that has just been released and that I've just finished &lt;a href="http://gate.ac.uk/userguide/sec:misc-creole:numbers:numbers"&gt;documenting for the user guide&lt;/a&gt;. You may never have really thought about this before but it turns out that there are so many ways of writing numbers in text that recognising them is actually really quite difficult. If you also want to know the value of the number you have recognised then this adds an extra layer of complexity especially when the number is written out in words rather than digits.&lt;br /&gt;&lt;br /&gt;The PR actually started life back in 2009 for recognising numbers in patent documents as a precursor to recognising and normalizing measurements but since then has seen lots of development to extend the range of numbers that can be recognised. 
This new version is being used on a number of projects both to recognise numbers simply for the sake of finding numbers and to help find drug doses, government spending and lots of generic measurements.&lt;br /&gt;&lt;br /&gt;Requests for code to recognise numbers and determine their value have cropped up a number of times on the GATE mailing list and whilst we had been using this code internally for a while we knew that there were issues with it and it had never been tidied up or documented to the extent where we would be happy to show it to other people! Having discovered yet-another-bug in the code a fortnight ago I decided to take the time to rewrite large chunks of the code in order to fix most of the outstanding issues and to increase the range of numbers we could recognise. Hopefully this has led to a more useful PR. If you'd like to try it out then you can find this PR in the Tagger_Numbers plugin within the main GATE svn repository and it's in the &lt;a href="http://hudson.gate.ac.uk/job/GATE-Nightly/lastSuccessfulBuild/"&gt;nightly builds&lt;/a&gt; as well.&lt;br /&gt;&lt;br /&gt;The plugin actually contains two PRs; Numbers Tagger and Roman Numerals Tagger. As you can guess by the name this second PR annotates Roman numerals appearing in documents. As with the main PR this also calculates the numeric value of the Roman numerals. I'm guessing that this PR is probably less useful than the main Numbers Tagger but we have found it to be helpful in the past when trying to recognise document sections, tables, figures etc. which can often be labelled with Roman numerals instead of Arabic numbers, e.g. Section VI, Table IV, Figure IIIa. 
If you are interested in the Roman Numerals Tagger then you can find more details in the &lt;a href="http://gate.ac.uk/userguide/sec:misc-creole:numbers:roman"&gt;user guide&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-5694283745181297903?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/5694283745181297903/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/02/numbers-have-real-value.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/5694283745181297903'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/5694283745181297903'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/02/numbers-have-real-value.html' title='Numbers Have Real Value'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-5cRSKLjJBUk/TVWC_lX6PiI/AAAAAAAAAls/cF0xo3heTXg/s72-c/numbers.jpg' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-6722444652032833932</id><published>2011-01-30T12:03:00.000Z</published><updated>2011-01-30T12:03:48.646Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='GATE'/><title 
type='text'>Schema Enforcer</title><content type='html'>&lt;img src="http://2.bp.blogspot.com/_xY_30cvz3_A/TURODmmhNZI/AAAAAAAAAk4/8OI6oTDESdw/s400/no-entry.png" style="background: white; border: 1px solid gray; float: left; margin: 5px; padding: 5px; width: 200px;" /&gt;In my &lt;a href="http://englishjavadrinker.blogspot.com/2011/01/gate-general-architecture-for-text.html"&gt;previous post&lt;/a&gt; I introduced you to &lt;a href="http://gate.ac.uk/"&gt;GATE&lt;/a&gt;, the software I use and help to develop at work. Over the last ten years I've developed a number of processing resources (PRs are like plugins) for GATE. Some of these plugins have made it into the main GATE distribution (the &lt;a href="http://gate.ac.uk/userguide/sec:parsers:chemistrytagger"&gt;Chemistry Tagger&lt;/a&gt; and the &lt;a href="http://gate.ac.uk/userguide/sec:parsers:npchunker"&gt;Noun Phrase Chunker&lt;/a&gt; being the most successful) whilst I've allowed others to slowly die. I still have quite a few that I've developed either for my own pet projects or for work that should really be made available for everyone to use. The problem tends to be that they need cleaning up and documenting before they are released. I've now made a start on cleaning up the PRs that I think are useful and in this post I'll introduce you to the first of these that I've managed to commit to the main GATE SVN repository: the Schema Enforcer.&lt;br /&gt;&lt;br /&gt;The idea for the Schema Enforcer started to germinate in my head during a long afternoon trying to teach people how to manually annotate documents using &lt;a href="http://gate.ac.uk/teamware/"&gt;GATE Teamware&lt;/a&gt;. In essence we want people who are familiar with a set of documents to mark up the entities within the documents that they believe are interesting/relevant to a given task. 
We then treat these manually annotated documents as a gold standard for evaluating automatic systems that create the same annotations.&lt;br /&gt;&lt;br /&gt;It turns out that if you can pre-annotate the documents with an automatic system and have the annotators correct and add to existing annotations they not only find the task easier to understand but they tend to be able to annotate a document quicker, which usually saves us money.&lt;br /&gt;&lt;br /&gt;When processing a document in GATE you tend to find that applications create a lot of annotations that are not actually required. For example, GATE creates a &lt;code&gt;SpaceToken&lt;/code&gt; annotation over each blank space. These can be really useful when creating other more complex annotations but no human is ever going to need to look at them. So when pre-annotating documents for Teamware what I (and most other people) do is to simply create a new annotation set into which we copy any annotation types which we are asking the annotators to create or correct (we usually do this using the &lt;a href="http://gate.ac.uk/userguide/sec:misc-creole:ast"&gt;Annotation Set Transfer PR&lt;/a&gt; rather than by hand). The problem with simply copying annotations from one set to another is that this does nothing to check that the annotation features conform to any set of guidelines. Whilst odd features are less of an issue than intermediate or temporary annotations they can still be quite distracting.&lt;br /&gt;&lt;br /&gt;In Teamware, when starting an annotation process, you specify the annotations that can be created using &lt;a href="http://gate.ac.uk/userguide/sec:corpora:schemas"&gt;XML based annotation schemas&lt;/a&gt;. These define the type of the annotation, its features, and for some features the set of permitted values. 
For example here is a schema for defining a Location annotation.&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: xml"&gt;&amp;lt;?xml version="1.0"?&amp;gt;&lt;br /&gt;&amp;lt;schema xmlns="http://www.w3.org/2000/10/XMLSchema"&amp;gt;&lt;br /&gt;  &amp;lt;element name="Location"&amp;gt;&lt;br /&gt;    &amp;lt;complexType&amp;gt;&lt;br /&gt;      &amp;lt;attribute name="locType" use="required" value="other"&amp;gt;&lt;br /&gt;        &amp;lt;simpleType&amp;gt;&lt;br /&gt;          &amp;lt;restriction base="string"&amp;gt;&lt;br /&gt;            &amp;lt;enumeration value="region"/&amp;gt;&lt;br /&gt;            &amp;lt;enumeration value="airport"/&amp;gt;&lt;br /&gt;            &amp;lt;enumeration value="city"/&amp;gt;&lt;br /&gt;            &amp;lt;enumeration value="country"/&amp;gt;&lt;br /&gt;            &amp;lt;enumeration value="county"/&amp;gt;&lt;br /&gt;            &amp;lt;enumeration value="other"/&amp;gt;&lt;br /&gt;          &amp;lt;/restriction&amp;gt;&lt;br /&gt;        &amp;lt;/simpleType&amp;gt;  &lt;br /&gt;      &amp;lt;/attribute&amp;gt;&lt;br /&gt;&lt;br /&gt;      &amp;lt;attribute name="requires-attention" use="optional" type="boolean"/&amp;gt;         &lt;br /&gt;      &amp;lt;attribute name="comment"  use="optional" type="string"/&amp;gt;&lt;br /&gt;    &amp;lt;/complexType&amp;gt;&lt;br /&gt;  &amp;lt;/element&amp;gt;&lt;br /&gt;&amp;lt;/schema&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;You should be able to see from this that a Location annotation can have three features (referred to as attributes in the schema); locType, requires-attention, and comment. The last two features are fairly self explanatory but the locType feature requires a little explanation. Basically locType is an enumerated feature, that is it can only take on one of the six values specified in the schema. What this means is that an annotator cannot decide to create a Location annotation with a locType set to, for instance, beach as that is not one of the defined values. 
In this case they would probably set locType to other and use the comment feature to say that it is actually a beach. Also note that locType is a required feature which means you can't choose not to set its value.&lt;br /&gt;&lt;br /&gt;The idea I had should now be obvious; why not use the schemas to drive the copying of annotations from one annotation set to another? After a little bit of experimenting this idea became the Schema Enforcer PR. Details of exactly how to use the PR can be found in the &lt;a href="http://gate.ac.uk/userguide/sec:misc-creole:schemaenforcer"&gt;main GATE manual&lt;/a&gt; but in essence the Schema Enforcer will copy an annotation if and only if....&lt;br /&gt;&lt;ul&gt;&lt;li&gt;the type of the annotation matches one of the supplied schemas, and&lt;/li&gt;&lt;li&gt;all required features are present and valid (i.e. meet the requirements for being copied to the 'clean' annotation)&lt;/li&gt;&lt;/ul&gt;Each feature of an annotation is copied to the new annotation if and only if....&lt;br /&gt;&lt;ul&gt;&lt;li&gt;the feature name matches a feature in the schema describing the annotation,&lt;/li&gt;&lt;li&gt;the value of the feature is of the same type as specified in the schema, and&lt;/li&gt;&lt;li&gt;if the feature is defined, in the schema, as an enumerated type then the value must match one of the permitted values&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;I've now made use of this PR in two different projects and it really does make life easier. Not only can I be sure that annotations people get to correct in Teamware actually match the annotation guidelines, but it also provides a really easy way of producing a 'clean' annotation set as the output of a GATE application. But don't just take my word for it!&lt;br /&gt;&lt;blockquote&gt;nice one, mark - very useful! 
i've had these problems before too, but used jape grammars instead - your approach is much nicer!&lt;/blockquote&gt;&lt;blockquote&gt;I think it would be nice if whoever gets to teach Teamware at FIG doesn't get snagged by the non-standard annotations that came up on Tuesday.  ;-)&lt;/blockquote&gt;So if you already develop GATE applications and think that you'd like to add the Schema Enforcer to your pipeline you can find it in the main &lt;a href="http://sourceforge.net/projects/gate/develop"&gt;GATE SVN repository&lt;/a&gt; or just grab a recent &lt;a href="http://hudson.gate.ac.uk/job/GATE-Nightly/lastSuccessfulBuild/"&gt;nightly build&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-6722444652032833932?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/6722444652032833932/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/01/schema-enforcer.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6722444652032833932'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6722444652032833932'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/01/schema-enforcer.html' title='Schema Enforcer'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://2.bp.blogspot.com/_xY_30cvz3_A/TURODmmhNZI/AAAAAAAAAk4/8OI6oTDESdw/s72-c/no-entry.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-6214787133118856573</id><published>2011-01-22T17:53:00.000Z</published><updated>2011-01-23T18:33:18.592Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='GATE'/><title type='text'>GATE: General Architecture for Text Engineering</title><content type='html'>&lt;img border="0" height="150" src="http://2.bp.blogspot.com/_xY_30cvz3_A/TTsTUU9dwzI/AAAAAAAAAkU/4cdrtDye3XA/s200/splash.png" style="float: left; padding: 10px;" width="200" /&gt;So far I've only talked about code that I've developed or played around with in my own time. In preparation for future blog posts I thought I'd spend a little time talking about the code I'm paid to work on.&lt;br /&gt;&lt;br /&gt;As some of you may already know I work in the &lt;a href="http://www.shef.ac.uk/dcs/"&gt;Department of Computer Science&lt;/a&gt; at the &lt;a href="http://www.shef.ac.uk/"&gt;University of Sheffield&lt;/a&gt;. I work in the &lt;a href="http://nlp.shef.ac.uk/"&gt;Natural Language Processing Group&lt;/a&gt; (NLP) where my interests have focused on information extraction -- getting useful information about entities and events from unstructured text such as newspaper articles or blog posts. The main piece of software that makes this work possible is &lt;a href="http://gate.ac.uk/"&gt;GATE&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;GATE is a General Architecture for Text Engineering. 
This means that it provides both the basic components required for building applications that work with natural language as well as a framework in which these components can be easily linked together and reused.&amp;nbsp; The fact that I never have to worry about basic processing such as tokenization (splitting text into individual words and punctuation), sentence splitting, and part-of-speech tagging means that I'm free to concentrate on extracting information from the text. I've used GATE since 2001 when I started work on my PhD. For the last two years I've been employed as part of the &lt;a href="http://gate.ac.uk/people/"&gt;core GATE team&lt;/a&gt;. Technically I'm not paid to develop GATE (I don't think any of us actually are) but the projects we work on all rely on GATE and so we contribute new plugins or add new features as the need arises.&lt;br /&gt;&lt;br /&gt;One of the things I really like about working on GATE is that it is open-source software (released under the &lt;a href="http://www.gnu.org/licenses/lgpl-3.0.html"&gt;LGPL&lt;/a&gt;) which means not only am I free to talk about the work I do but also anyone is able to freely use and contribute to the development. This also means that GATE has been adopted by a large number of &lt;a href="http://gate.ac.uk/commercial.html"&gt;companies&lt;/a&gt; and universities around the world for all sorts of interesting tasks -- I'm currently involved in three projects that involve GATE being used for cancer research, mining of medical records and government transparency.&lt;br /&gt;&lt;br /&gt;So if you are interested in text engineering and you haven't heard of GATE 1) shame on you and 2) go &lt;a href="http://gate.ac.uk/download/"&gt;try it out&lt;/a&gt; and see just what it can do. 
And for those of you who don't need to process text at least you'll know what I'm talking about when I refer to it in future posts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-6214787133118856573?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/6214787133118856573/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/01/gate-general-architecture-for-text.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6214787133118856573'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6214787133118856573'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/01/gate-general-architecture-for-text.html' title='GATE: General Architecture for Text Engineering'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_xY_30cvz3_A/TTsTUU9dwzI/AAAAAAAAAkU/4cdrtDye3XA/s72-c/splash.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-7252077889800045337</id><published>2011-01-04T12:41:00.000Z</published><updated>2011-01-04T12:41:45.854Z</updated><title type='text'>SVN Paths Are Case Sensitive</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a 
href="http://3.bp.blogspot.com/_xY_30cvz3_A/TSMRnqSe5DI/AAAAAAAAAi4/twa6HC8j5MY/s1600/svn.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="172" src="http://3.bp.blogspot.com/_xY_30cvz3_A/TSMRnqSe5DI/AAAAAAAAAi4/twa6HC8j5MY/s200/svn.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;Over the last couple of days I've been busy re-installing the computer that runs my SVN repository (it runs other things as well but that isn't so important). It's a Windows machine and it had finally reached the point where the only solution to the BSODs it kept suffering was a full re-install.&lt;br /&gt;&lt;br /&gt;I've never been particularly good at making regular backups of things and while I've suffered a fair number of hardware failures over the years I've never really lost anything important. In fact the SVN repository itself has saved me from a disk crash recently. So at the same time as the re-install I thought I should set up a proper backup schedule and organize my data a little more carefully.&lt;br /&gt;&lt;br /&gt;So I now have two disks in the machine that are not used for day-to-day stuff. One drive holds the live copy of the SVN repository (as well as the Hudson home dir, Tomcat webapps and associated MySQL data). The other drive holds backup copies of everything.&lt;br /&gt;&lt;br /&gt;This re-organization meant that local path access to the SVN repository changed (the external &lt;code&gt;svn://&lt;/code&gt; URL stayed the same), which meant I had to update the Hudson configurations (which use local file access for performance) to use the new paths. &lt;br /&gt;&lt;br /&gt;So I went through each of the 12 jobs in Hudson and changed the paths accordingly. I checked a few of the projects and they built without any problems so I assumed that was job done. 
Then this morning I noticed that all 12 jobs were being built every 10 minutes as polling SVN always reported that the workspace didn't contain a checkout of the correct folder. The path it was showing was correct so at first glance nothing appeared wrong. After messing around at the command line for a bit I eventually figured out the problem.&lt;br /&gt;&lt;br /&gt;Basically I'd changed from URLs starting &lt;code&gt;file:///z:/SVN&lt;/code&gt; to URLs starting &lt;code&gt;file:///L:/SVN&lt;/code&gt;. For some reason I'd typed in the drive letter as a capital (the way Windows displays it) rather than in lower case. It turns out that while SVN is happy to do a checkout from the capital letter version it stores the URL in the checked out copy using a lowercase drive letter, and hence on future updates the two don't match. Fixing the jobs to access URLs starting &lt;code&gt;file:///l:/SVN&lt;/code&gt; fixed the problem.&lt;br /&gt;&lt;br /&gt;Bizarrely Hudson didn't complain about the problem, so the builds all succeeded; it's just that there was an awful lot of wasted CPU time over the last day or so!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-7252077889800045337?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/7252077889800045337/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2011/01/svn-paths-are-case-sensitive.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7252077889800045337'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7252077889800045337'/><link rel='alternate' type='text/html' 
href='http://englishjavadrinker.blogspot.com/2011/01/svn-paths-are-case-sensitive.html' title='SVN Paths Are Case Sensitive'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xY_30cvz3_A/TSMRnqSe5DI/AAAAAAAAAi4/twa6HC8j5MY/s72-c/svn.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-8941337043517498321</id><published>2010-12-17T17:10:00.000Z</published><updated>2010-12-17T19:47:49.918Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='JSpell'/><title type='text'>JSpell Added To Softpedia</title><content type='html'>&lt;img src="http://1.bp.blogspot.com/_xY_30cvz3_A/TQuXHtZS2NI/AAAAAAAAAiY/WxBE7VoycKI/s320/softpedia.jpg" style="float: right; height: 240px; padding-bottom: 5px; padding-right: 5px;" /&gt;A long time ago, and for reasons I can't now remember, I wrote a small spelling correction library called JSpell. It's been available on &lt;a href="http://www.dcs.shef.ac.uk/%7Emark/dev/java/index.html"&gt;my website&lt;/a&gt; since 2005 and hasn't seen any updates since then. In fact it was all so long ago I'd completely forgotten about it until Sunday when I received an e-mail.&lt;br /&gt;&lt;blockquote&gt;&lt;i&gt;Congratulations,&lt;br /&gt;&lt;br /&gt;JSpell, one of your products, has been added to Softpedia's database of software programs for Mac OS. 
It is featured with a description text, screenshots, download links and technical details on this page:&lt;br /&gt;&lt;a href="http://mac.softpedia.com/get/Development/Libraries/JSpell.shtml"&gt;http://mac.softpedia.com/get/Development/Libraries/JSpell.shtml&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The description text was created by our editors, using sources such as text from your product's homepage, information from its help system, the PAD file (if available) and the editor's own opinions on the program itself.&lt;/i&gt;&lt;/blockquote&gt;So after nearly six years of neglect I've revisited JSpell. It's now available from &lt;a href="http://greenwoodma.servehttp.com/svn/repos/open-source/list/JSpell/trunk"&gt;my subversion repository&lt;/a&gt; and is &lt;a href="http://greenwoodma.servehttp.com/hudson/job/JSpell/"&gt;being compiled by Hudson&lt;/a&gt;. I've already fixed a couple of bugs that were present in the previous version and have a few ideas for improvements.&lt;br /&gt;&lt;br /&gt;I'm not entirely sure why it's only been added to the Mac OS section (especially as the build file is a Windows batch file), but I'll raise that with them once I have a new version ready for release -- one with a platform agnostic build script.&lt;br /&gt;&lt;br /&gt;It might be old code but it's nice to know that at least one person thinks it might be useful.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-8941337043517498321?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/8941337043517498321/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/12/jspell-added-to-softpedia.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8941337043517498321'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8941337043517498321'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/12/jspell-added-to-softpedia.html' title='JSpell Added To Softpedia'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_xY_30cvz3_A/TQuXHtZS2NI/AAAAAAAAAiY/WxBE7VoycKI/s72-c/softpedia.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-819505553610721667</id><published>2010-12-04T16:15:00.000Z</published><updated>2010-12-05T10:38:20.395Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='3DAssembler'/><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><title type='text'>3D Portraits</title><content type='html'>&lt;img height="255" src="http://2.bp.blogspot.com/_xY_30cvz3_A/TPpnkM5sLII/AAAAAAAAAhs/R5Haa03sBi4/s400/Wall+Heads.png" style="float: left; padding-bottom: 5px; padding-right: 5px;" width="400" /&gt;On my recent trip to &lt;a href="http://englishcoffeedrinker.blogspot.com/search/label/Beijing"&gt;Beijing&lt;/a&gt; I took a number of 3D images, quite a few of which were portrait instead of landscape. 
For example, here is a 3D view of some building decoration in the Forbidden City.&lt;br /&gt;&lt;br /&gt;It seemed daft to have to rotate the images before loading them into &lt;a href="http://englishcoffeedrinker.blogspot.com/2010/07/3dassembler.html"&gt;3DAssembler&lt;/a&gt; so on the flight home I implemented the required functionality. So today sees the release of v2.1.0 of 3DAssembler which includes the ability to produce portrait as well as landscape views and also sees a fix to a rather nasty memory leak. Enjoy!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-819505553610721667?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/819505553610721667/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/12/3d-portraits.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/819505553610721667'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/819505553610721667'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/12/3d-portraits.html' title='3D Portraits'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_xY_30cvz3_A/TPpnkM5sLII/AAAAAAAAAhs/R5Haa03sBi4/s72-c/Wall+Heads.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-6589169478400208051</id><published>2010-11-26T18:24:00.000Z</published><updated>2010-11-26T18:26:03.252Z</updated><category scheme='http://www.blogger.com/atom/ns#' term='JavaScript'/><category scheme='http://www.blogger.com/atom/ns#' term='Simple QuickTime Embedding'/><title type='text'>Alternative Content</title><content type='html'>I haven't had lots of free time recently, hence the lack of posts on this blog. Hopefully that will change over Christmas and I'll have the time to catch up on some of my own projects. Today's post is just a quick note about a new version of my &lt;a href="http://englishcoffeedrinker.blogspot.com/2008/01/simple-quicktime-movie-embedding.html"&gt;Simple QuickTime Embedding&lt;/a&gt; script.&lt;br /&gt;&lt;br /&gt;In the previous version if QuickTime wasn't installed then the script could be configured to either redirect to a given URL or to call a JavaScript function. Whilst this was quite flexible it did require some programming skills in order to provide useful alternative content. The new version of the script adds an easier method.&lt;br /&gt;&lt;br /&gt;Now if QuickTime isn't installed and within the player &lt;code&gt;DIV&lt;/code&gt; there is a &lt;code&gt;DIV&lt;/code&gt; with the CSS class &lt;code&gt;alt-content&lt;/code&gt; then the content of this &lt;code&gt;DIV&lt;/code&gt; will be used to fully replace the content of the main player &lt;code&gt;DIV&lt;/code&gt;. It's easier to view the &lt;a href="http://www.dcs.shef.ac.uk/%7Emark/blog/blog_files/software/web/quicktime/index.html"&gt;examples&lt;/a&gt; than it is to explain it! 
Hopefully this will make the script more flexible and hence more useful to a wider range of users.&lt;br /&gt;&lt;br /&gt;With this release I've also moved the code into my &lt;a href="http://greenwoodma.servehttp.com/svn/repos/open-source/list/SimpleQuickTimeEmbedding/trunk/"&gt;SVN repository&lt;/a&gt; so interested parties can monitor future development (and I can more easily figure out what I changed between releases).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-6589169478400208051?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/6589169478400208051/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/11/alternative-content.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6589169478400208051'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6589169478400208051'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/11/alternative-content.html' title='Alternative Content'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-4271581707988114318</id><published>2010-09-25T21:25:00.000+01:00</published><updated>2010-09-25T21:25:43.129+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category 
scheme='http://www.blogger.com/atom/ns#' term='SVGRoundTrip'/><title type='text'>SVGRoundTrip, Part 3b: Convert SVG to Java 2D</title><content type='html'>Having re-read the previous post I realised that the stuff on supporting unsupported SVG elements could have been clearer. Somehow I doubt I could write it any better so I thought I'd show an example instead.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;img border="0" src="http://1.bp.blogspot.com/_xY_30cvz3_A/TJ5ZqaPiOrI/AAAAAAAAAfY/Qr8A3TIx55E/s800/unsupported-elements.png" /&gt;&lt;/div&gt;&lt;br /&gt;The SVG file used in these three examples contains both embedded raster images and text nodes which, by default, are unsupported. If we specify that we wish to skip unsupported SVG elements (&lt;code&gt;-u skip&lt;/code&gt;) then what we get is a Java class that will draw the left hand image. If we specify that we want to add image support (&lt;code&gt;-i&lt;/code&gt;) then we get the middle image. 
And finally, if we say that unsupported SVG elements should be treated as images (&lt;code&gt;-u image&lt;/code&gt;) then we get the right hand image.&lt;br /&gt;&lt;br /&gt;Whilst it may look like the right hand image is best, remember that only the elements in the leftmost image will scale without a loss of quality.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-4271581707988114318?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/4271581707988114318/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/svgroundtrip-part-3b-convert-svg-to.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/4271581707988114318'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/4271581707988114318'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/svgroundtrip-part-3b-convert-svg-to.html' title='SVGRoundTrip, Part 3b: Convert SVG to Java 2D'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_xY_30cvz3_A/TJ5ZqaPiOrI/AAAAAAAAAfY/Qr8A3TIx55E/s72-c/unsupported-elements.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-2380735792796027822</id><published>2010-09-25T18:38:00.000+01:00</published><updated>2010-09-25T20:47:55.569+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='SVGRoundTrip'/><title type='text'>SVGRoundTrip, Part 3: Convert SVG to Java 2D</title><content type='html'>As I mentioned in the &lt;a href="http://englishjavadrinker.blogspot.com/2010/09/svgroundtrip-part-1-or-why-use-svg.html"&gt;first post on SVGRoundTrip&lt;/a&gt;, my initial interest in SVG images came from wanting scalable images that I could use within the UI of my applications.&lt;br /&gt;&lt;br /&gt;When I originally started work on &lt;a href="http://englishcoffeedrinker.blogspot.com/2010/07/3dassembler.html"&gt;3DAssembler&lt;/a&gt; I used PNG versions of the icons from the Tango and Gnome Desktop icon sets as I didn't know how to deal with the original SVG versions. During development I noticed that the &lt;a href="https://flamingo.dev.java.net/"&gt;Flamingo component suite&lt;/a&gt; (which I was using for the ribbon) also contained support for converting SVG files into Java 2D based classes that would draw the icons at the size we required. For version 2 of 3DAssembler I changed from all PNG images to all SVG and the different sized icons look much better. Whilst most of the SVG files I wanted to use converted without any problems, some didn't, and so I set about figuring out why.&lt;br /&gt;&lt;br /&gt;The first problem that I noticed was that the clipping wasn't correct. This usually showed up where I had an image that went outside the bounds of the SVG page size. When exporting to a PNG from &lt;a href="http://inkscape.org/"&gt;Inkscape&lt;/a&gt; everything outside the page boundaries was removed and I expected the same behavior when the images were converted to Java code. 
It was definitely a bug as the image being drawn in my app was a lot wider than it should have been and, because it wasn't being clipped, was spilling out over other components. I initially fixed the issue by simply setting a clip on the &lt;code&gt;Graphics2D&lt;/code&gt; object before passing it to the generated Java code. Whilst this fixed the overflow issue it didn't fix clipping issues within the bounds of the image. Each element in an SVG file can specify a clip and this was being ignored when the code within Flamingo was converting the SVG file to Java code. After a little trial and error I managed to add support for clipping any SVG element that required it.&lt;br /&gt;&lt;br /&gt;The second problem related to scaling. Whilst SVG images can be scaled at will without a loss in quality they are originally drawn at a specific size. Knowing how to scale all the SVG elements requires knowing this original size. Unfortunately, the code in the Flamingo library didn't extract the page size from the original SVG file; instead it used the bounds of the image (i.e. the rectangle that fully encloses all the elements) for scaling. Usually (at least in the files I was using) the bounding rectangle only differed from the page size by a pixel or so and therefore the images drawn using the wrong information were almost identical to those using the correct page size. The problem really only appeared when I fixed the clipping issue. With correct clipping I kept seeing the right/bottom column/row of the image being clipped. It turns out that the correct page size information is available (although it's well buried) in the information provided by &lt;a href="http://xmlgraphics.apache.org/batik/"&gt;Batik&lt;/a&gt; when it parses the SVG file so it was fairly easy to fix this problem as well.&lt;br /&gt;&lt;br /&gt;The final problem that I noticed was that the code wasn't transcoding alpha values properly either. 
Each SVG element can have an alpha value associated with it to set the transparency. The problem arose when elements were nested. So, for example, if you have a set of nested shapes and you set alpha to 0.5 for the root element, all the nested elements should start with this value before any other values are applied. Again, once I knew what the problem was, fixing it was easy.&lt;br /&gt;&lt;br /&gt;These three fixes allowed me to convert all the SVG files I wanted to use for 3DAssembler without any problems. I patched my copy of Flamingo and then posted the patch to the &lt;a href="https://flamingo.dev.java.net/servlets/ProjectForumMessageView?forumID=3339&amp;amp;messageID=36936"&gt;Flamingo discussion forum&lt;/a&gt;. I would probably have left it at that, but a few days later a message was posted by the developer of Flamingo to say that he was no longer supporting or developing the library, and so SVGRoundTrip was born to hold these fixes but also as a place for new features and other SVG related ideas.&lt;br /&gt;&lt;br /&gt;The original library (as well as having the bugs outlined above) doesn't support all SVG files as it doesn't support embedded raster images or text -- there might be other things as well but these are the things I know about. So I set about adding support for these as well. Supporting embedded raster images is easy: I just store the images on disk and then add code to the generated classes to read the images back when they are needed. Supporting all other unsupported elements (including text) is also easy -- I just convert the elements to images and then treat them like raster nodes. 
This isn't ideal, as these elements no longer scale well, and so by default raster images and other elements not supported by the original library remain unsupported.&lt;br /&gt;&lt;br /&gt;Anyway, enough waffle, I've wrapped up all the SVG to Java 2D code into a simple to use command line application (it can also be accessed via an API), the usage of which is given below:&lt;br /&gt;&lt;br /&gt;&lt;pre&gt;SVGBatchConverter: Convert SVG files into Java2D based classes&lt;br /&gt;&lt;br /&gt;Usage: java -jar SVGRoundTrip.jar [OPTIONS] SVGFile1 ... SVGFileN&lt;br /&gt;  -f       by default classes are not generated if they exist and are newer&lt;br /&gt;           than the SVG files, use this option to force them to be regenerated&lt;br /&gt;  -i       by default raster images embedded in SVG files are not supported&lt;br /&gt;           enabling this option generates PNG files for raster nodes which&lt;br /&gt;           are loaded and drawn as required by the generated code&lt;br /&gt;  -n name  classname format -- by default the generated class will have the&lt;br /&gt;           same name as the SVG file use this option to specify&lt;br /&gt;           'prefix+suffix', if the format doesn't contain a + then it just&lt;br /&gt;           specifies a prefix&lt;br /&gt;  -o dir   output directory (required), this should be the root of the source&lt;br /&gt;           tree the correct sub-directory will be used based upon specified&lt;br /&gt;           package name&lt;br /&gt;  -p name  the name of the package the generated classes should be a member of&lt;br /&gt;  -t name  the template name or a file containing a template -- valid template&lt;br /&gt;           names are currently 'plain' and 'flamingo', see the docs for details&lt;br /&gt;           if unspecified the 'plain' template will be used&lt;br /&gt;  -u name  specify how to treat unsupported SVG elements -- possible values are&lt;br /&gt;           'fail' the default behaviour which causes conversion to 
fail&lt;br /&gt;           'skip' unsupported elements will be skipped, conversion will succeed&lt;br /&gt;           'image' unsupported elements will be converted to images to maintain&lt;br /&gt;           the look of the original SVG file -- image support (-i) must&lt;br /&gt;           also be enabled for this to work&lt;br /&gt;&lt;br /&gt;If directories are specified instead of single SVG files then all SVG files in&lt;br /&gt;the directories will be converted&lt;/pre&gt;So now you know how I convert SVG files into Java 2D based classes for use in my applications. If you want to try this yourself then &lt;a href="http://greenwoodma.servehttp.com/svn/repos/open-source/list/SVGRoundTrip/trunk/"&gt;check out the code&lt;/a&gt; and have a go.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-2380735792796027822?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/2380735792796027822/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/svgroundtrip-part-3-convert-svg-to-java.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/2380735792796027822'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/2380735792796027822'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/svgroundtrip-part-3-convert-svg-to-java.html' title='SVGRoundTrip, Part 3: Convert SVG to Java 2D'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-8565836718254111756</id><published>2010-09-21T19:21:00.000+01:00</published><updated>2010-09-21T19:21:53.900+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='SVGRoundTrip'/><title type='text'>SVGRoundTrip, Part 2: In, Out, and Shake It All About!</title><content type='html'>I've done quite a bit more work on SVGRoundTrip, and while I don't yet want to highlight specific API features (as they may well change over the next few days), I did want to show an example.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;img border="0" src="http://lh4.ggpht.com/_xY_30cvz3_A/TJj1sM33jbI/AAAAAAAAAec/kL5FAUq50Dg/s800/live-svg-in.png" /&gt;&lt;/div&gt;&lt;br /&gt;This shows three versions of a test application I've developed for SVGRoundTrip. 
The first image is a standard screen shot of the running test app, the second image is from the exported SVG file generated by the library (I loaded the SVG file into &lt;a href="http://inkscape.org/"&gt;Inkscape&lt;/a&gt; and then exported a PNG), and the last image shows the exported SVG being loaded into a different test application and then saved as an image.&lt;br /&gt;&lt;br /&gt;It's not perfect but, as work-in-progress, I'm pretty happy with the results.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-8565836718254111756?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/8565836718254111756/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/svgroundtrip-part-2-in-out-and-shake-it.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8565836718254111756'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/8565836718254111756'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/svgroundtrip-part-2-in-out-and-shake-it.html' title='SVGRoundTrip, Part 2: In, Out, and Shake It All About!'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_xY_30cvz3_A/TJj1sM33jbI/AAAAAAAAAec/kL5FAUq50Dg/s72-c/live-svg-in.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-6514023548200023915</id><published>2010-09-19T16:15:00.000+01:00</published><updated>2010-09-19T16:18:51.736+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><category scheme='http://www.blogger.com/atom/ns#' term='SVGRoundTrip'/><title type='text'>SVGRoundTrip, Part 1: Why Use SVG?</title><content type='html'>Until recently, whenever I was designing the interface for a new application I'd usually use raster images, normally PNG files, for button icons etc. The advantage of raster images is that I can use many different graphics programs to edit the files and I have control over the appearance of every single pixel. The downside is that the images don't scale well. For example, let's say that I start out using 16x16 pixel images for toolbar buttons. I then realise that as I don't have many buttons it would make more sense to have bigger buttons, say 32x32 pixels. I could just scale the small images up but they would look horrid, so I'd have to go back and create a whole new set of images.&lt;br /&gt;&lt;br /&gt;SVG files (&lt;a href="http://en.wikipedia.org/wiki/Scalable_Vector_Graphics"&gt;Scalable Vector Graphics&lt;/a&gt;) do, however, scale well. Their ability to scale comes from the fact that instead of recording pixel information they store shape and colour information -- a square is stored by knowing the points of the four corners. This makes it easy to scale the images without losing quality. The downside is that there are fewer applications I can use to edit them (&lt;a href="http://inkscape.org/"&gt;Inkscape&lt;/a&gt; is the best I've found) and more importantly Java has no native support for displaying them. Fortunately the &lt;a href="http://xmlgraphics.apache.org/batik/"&gt;Batik&lt;/a&gt; library from Apache can load and convert SVG files so I can use them in my applications. 
Unfortunately there are a couple of downsides to using Batik:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The Batik distribution is huge, just the main JAR file weighs in at over 3MB. Given that I tend to release applications via Java Web Start I don't really want to add 3MB to the download time.&lt;/li&gt;&lt;li&gt;Converting SVG files into a format that Java can display seems to be quite CPU and memory intensive, which leads to slow startup times as soon as you have just a few files to convert.&lt;/li&gt;&lt;/ol&gt;The &lt;a href="https://flamingo.dev.java.net/"&gt;Flamingo component suite&lt;/a&gt; (that I've used to develop the GUI for both &lt;a href="http://englishcoffeedrinker.blogspot.com/2009/11/trotternish-pumpkin.html"&gt;PhotoGrid&lt;/a&gt; and &lt;a href="http://englishcoffeedrinker.blogspot.com/2010/07/3dassembler.html"&gt;3DAssembler&lt;/a&gt;) sidesteps these issues by converting the SVG files into Java classes at development time. This means you don't need Batik at run-time (reducing the download size of the application) and the time and resources needed to convert the files are spent once and not at each application startup. Perfect! Unfortunately the developer of Flamingo has recently announced that he is &lt;a href="https://substance.dev.java.net/servlets/ReadMsg?list=users&amp;amp;msgNo=1546"&gt;suspending support and development&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;As I don't want to rely on code that is no longer being supported or developed, especially as I know there are some issues with the SVG related code, I've decided to start work on a new library called SVGRoundTrip. The library will be based on Batik and incorporate ideas from Flamingo with the aim of supporting both the display of SVG files in Java applications as well as the production of SVG files from within Java applications. 
I'm starting by adding support for things I need but will be happy to add support for more features as time goes by.&lt;br /&gt;&lt;br /&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_xY_30cvz3_A/TJYl8R8NyaI/AAAAAAAAAeI/nvJsKMZ4Wi0/s320/logos.png" style="float: left; padding-bottom: 5px; padding-right: 10px;" /&gt;I'm going to write a number of posts highlighting different aspects of the library but for now I just wanted to show an example of why using SVG files is a much better idea than relying on raster image formats. The image on the left shows five different renderings of the same &lt;a href="http://greenwoodma.servehttp.com/svn/repos/open-source/get/SVGRoundTrip/trunk/docs/Logo.svg?revision=HEAD"&gt;SVG file&lt;/a&gt;: 16x16, 32x32, 48x48, 64x64 and 200x200. The SVG file stores the 48x48 pixel version but as you can see there is no loss in quality as we scale up or down in size. &lt;br /&gt;&lt;br /&gt;Whilst SVGRoundTrip is by no means finished the example image was actually produced using the library to load an SVG file into a Java application, draw it at different sizes and then save the image to a PNG file. 
If you can't wait until the next exciting installment of this blog then you can follow development from the &lt;a href="http://greenwoodma.servehttp.com/hudson/job/SVGRoundTrip/"&gt;SVGRoundTrip page on my Hudson server&lt;/a&gt;, or grab the source from the &lt;a href="http://greenwoodma.servehttp.com/svn/repos/open-source/list/SVGRoundTrip"&gt;subversion repository&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-6514023548200023915?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/6514023548200023915/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/svgroundtrip-part-1-or-why-use-svg.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6514023548200023915'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/6514023548200023915'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/svgroundtrip-part-1-or-why-use-svg.html' title='SVGRoundTrip, Part 1: Why Use SVG?'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_xY_30cvz3_A/TJYl8R8NyaI/AAAAAAAAAeI/nvJsKMZ4Wi0/s72-c/logos.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-7745293719776486105</id><published>2010-09-12T16:27:00.000+01:00</published><updated>2010-09-12T16:27:59.691+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Java'/><title type='text'>PNG Keywords</title><content type='html'>I've written a number of applications for manipulating photos and most produce PNG files as output. I output PNG instead of JPEG for two reasons: support for transparency and lossless compression. The downside to using PNG files is that there is no default support for adding metadata, such as EXIF information, to the images. In most situations this isn't a problem, but I decided it would be nice to allow the user to add a title or copyright information to the images. Now PNG files don't support EXIF but they do support textual key-value pairs.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.libpng.org/pub/png/spec/iso/index-object.html#11textinfo"&gt;Section 11.3.4&lt;/a&gt; of the &lt;a href="http://www.libpng.org/pub/png/spec/iso/index-object.html"&gt;PNG Specification&lt;/a&gt; details the support within the file format for textual information. 
Text is stored within PNG files as key-value pairs and the specification gives the following list of default keywords:&lt;br /&gt;&lt;table style="margin: 0 auto;" summary="This table defines the keywords defined for tEXt, iTXt and zTXt chunks"&gt;&lt;tbody&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Title&lt;/td&gt; &lt;td class="Regular"&gt;Short (one line) title or caption for image&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Author&lt;/td&gt; &lt;td class="Regular"&gt;Name of image's creator&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Description&lt;/td&gt; &lt;td class="Regular"&gt;Description of image (possibly long)&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Copyright&lt;/td&gt; &lt;td class="Regular"&gt;Copyright notice&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Creation Time&lt;/td&gt; &lt;td class="Regular"&gt;Time of original image creation&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Software&lt;/td&gt; &lt;td class="Regular"&gt;Software used to create the image&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Disclaimer&lt;/td&gt; &lt;td class="Regular"&gt;Legal disclaimer&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Warning&lt;/td&gt; &lt;td class="Regular"&gt;Warning of nature of content&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Source&lt;/td&gt; &lt;td class="Regular"&gt;Device used to create the image&lt;/td&gt; &lt;/tr&gt;&lt;tr&gt; &lt;td class="Regular"&gt;Comment&lt;/td&gt; &lt;td class="Regular"&gt;Miscellaneous comment&lt;/td&gt; &lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;Whilst it makes sense to stick with these keywords (so other software can make use of the information) the specification also states that &lt;i&gt;other keywords may be defined for other purposes&lt;/i&gt;. 
Currently I can't think of any information that I want to add to PNG files that isn't covered by the default keyword list, so all I needed to do was figure out how to actually add the information.&lt;br /&gt;&lt;br /&gt;I assumed that I'd be able to quickly find some code on the Internet for doing this kind of thing. Unfortunately it turns out that there are plenty of web sites that describe in detail how to add/retrieve metadata from JPEG images (including EXIF and IPTC), but I couldn't find a single useful example of adding information to PNG files and so I had to figure it out for myself. The applications I wanted to add this feature to are all written in Java and so I headed to the documentation to see what I could find.&lt;br /&gt;&lt;br /&gt;I was writing PNG files using the static convenience methods of &lt;a href="http://download.oracle.com/javase/6/docs/api/javax/imageio/package-summary.html"&gt;javax.imageio.ImageIO&lt;/a&gt; which don't allow for much customization; you pass an image, a file handle and the format name and it uses default values to write the image to disk. Fortunately you can use the classes directly and have a lot more control over the processing, including altering any associated metadata.&lt;br /&gt;&lt;br /&gt;The ImageIO package uses an XML tree structure to represent metadata and there is a DTD describing the supported metadata for each image format. 
The DTD describing &lt;a href="http://download.oracle.com/javase/6/docs/api/javax/imageio/metadata/doc-files/png_metadata.html"&gt;PNG metadata&lt;/a&gt; includes the elements for storing textual information and it was fairly straightforward to write code to add new elements to the structure.&lt;br /&gt;&lt;br /&gt;While testing the code I noticed that as well as the native PNG metadata there was also support for a &lt;a href="http://download.oracle.com/javase/6/docs/api/javax/imageio/metadata/doc-files/standard_metadata.html"&gt;plugin neutral metadata format&lt;/a&gt;. 
I converted my code to use this format instead and got the same results as before. So why, you ask, would I want to do this?&lt;br /&gt;&lt;br /&gt;If you use the neutral metadata format then the image writers convert this into their own metadata format when writing out the image. This means that I could specify, for example, the title of the image and it would appear in a PNG file but it would also get converted into a JPEG header comment if I switched output formats. There is no guarantee that information in the standard metadata format will be preserved by the different plugins so you need to experiment a little (for example, if you specify multiple text elements only one of them gets retained as the JPEG comment element).&lt;br /&gt;&lt;br /&gt;So without further ado here is the method I wrote to save a PNG file with embedded keywords.&lt;br /&gt;&lt;pre class="brush: java"&gt;// imports needed by the method below (they belong at the top of the&lt;br /&gt;// source file that contains it)&lt;br /&gt;import java.awt.image.RenderedImage;&lt;br /&gt;import java.io.File;&lt;br /&gt;import java.io.FileOutputStream;&lt;br /&gt;import java.io.IOException;&lt;br /&gt;import java.io.OutputStream;&lt;br /&gt;import java.util.Iterator;&lt;br /&gt;import java.util.Map;&lt;br /&gt;import javax.imageio.IIOImage;&lt;br /&gt;import javax.imageio.ImageIO;&lt;br /&gt;import javax.imageio.ImageTypeSpecifier;&lt;br /&gt;import javax.imageio.ImageWriteParam;&lt;br /&gt;import javax.imageio.ImageWriter;&lt;br /&gt;import javax.imageio.metadata.IIOMetadata;&lt;br /&gt;import javax.imageio.metadata.IIOMetadataNode;&lt;br /&gt;import javax.imageio.stream.ImageOutputStream;&lt;br /&gt;&lt;br /&gt;public static void writeImage(RenderedImage image,&lt;br /&gt;        Map&amp;lt;String, String&amp;gt; keywords, File file) throws IOException {&lt;br /&gt;&lt;br /&gt;    ImageWriter writer = null;&lt;br /&gt;    OutputStream out = null;&lt;br /&gt;    ImageOutputStream ios = null;&lt;br /&gt;&lt;br /&gt;    try {&lt;br /&gt;        // find a writer for the image format&lt;br /&gt;        Iterator&amp;lt;ImageWriter&amp;gt; iter = ImageIO.getImageWritersByFormatName("png");&lt;br /&gt;        if (iter.hasNext())&lt;br /&gt;            writer = iter.next();&lt;br /&gt;&lt;br /&gt;        if (writer == null)&lt;br /&gt;            throw new IOException("Can't Write PNG Files!");&lt;br /&gt;&lt;br /&gt;        // get the default writer parameters&lt;br /&gt;        ImageWriteParam iwparam = writer.getDefaultWriteParam();&lt;br /&gt;&lt;br /&gt;        // get the default metadata that we will add to&lt;br /&gt;        IIOMetadata metadata = writer.getDefaultImageMetadata(&lt;br /&gt;                new ImageTypeSpecifier(image), iwparam);&lt;br /&gt;&lt;br /&gt;        // if there are 
keywords then...&lt;br /&gt;        if (keywords != null &amp;amp;&amp;amp; keywords.size() &amp;gt; 0) {&lt;br /&gt;            // if we are not allowed to edit the standard metadata then...&lt;br /&gt;            if (metadata.isReadOnly()&lt;br /&gt;                    || !metadata.isStandardMetadataFormatSupported())&lt;br /&gt;                throw new IOException("Metadata Cannot Be Edited!");&lt;br /&gt;&lt;br /&gt;            // create a "Text" node to hold the keywords&lt;br /&gt;            IIOMetadataNode text = new IIOMetadataNode("Text");&lt;br /&gt;&lt;br /&gt;            for (Map.Entry&amp;lt;String, String&amp;gt; keyword : keywords.entrySet()) {&lt;br /&gt;                // copy each keyword/value pair into a node&lt;br /&gt;                IIOMetadataNode node = new IIOMetadataNode("TextEntry");&lt;br /&gt;                node.setAttribute("keyword", keyword.getKey());&lt;br /&gt;                node.setAttribute("value", keyword.getValue());&lt;br /&gt;&lt;br /&gt;                // plain PNG text (tEXt) chunks only support Latin-1&lt;br /&gt;                // characters hence the value for the encoding attribute&lt;br /&gt;                node.setAttribute("encoding", "ISO-8859-1");&lt;br /&gt;&lt;br /&gt;                // the spec seems to say that we don't need to specify&lt;br /&gt;                // these but if you don't you get an exception&lt;br /&gt;                node.setAttribute("language", "en");&lt;br /&gt;                node.setAttribute("compression", "none");&lt;br /&gt;&lt;br /&gt;                // add the keyword node to the "Text" node&lt;br /&gt;                text.appendChild(node);&lt;br /&gt;            }&lt;br /&gt;&lt;br /&gt;            // the text node has to be in the right place in the&lt;br /&gt;            // tree before we can merge it&lt;br /&gt;            IIOMetadataNode root = new IIOMetadataNode("javax_imageio_1.0");&lt;br /&gt;            root.appendChild(text);&lt;br /&gt;&lt;br /&gt;            // merge the keywords 
into the existing metadata&lt;br /&gt;            metadata.mergeTree("javax_imageio_1.0", root);&lt;br /&gt;        }&lt;br /&gt;&lt;br /&gt;        // set up the writer ready for output&lt;br /&gt;        out = new FileOutputStream(file);&lt;br /&gt;        ios = ImageIO.createImageOutputStream(out);&lt;br /&gt;        writer.setOutput(ios);&lt;br /&gt;&lt;br /&gt;        // write out the image with its metadata&lt;br /&gt;        writer.write(null, new IIOImage(image, null, metadata), iwparam);&lt;br /&gt;        ios.flush();&lt;br /&gt;        out.flush();&lt;br /&gt;    } finally {&lt;br /&gt;        // release the writer and close the streams&lt;br /&gt;&lt;br /&gt;        if (writer != null)&lt;br /&gt;            writer.dispose();&lt;br /&gt;&lt;br /&gt;        if (ios != null)&lt;br /&gt;            ios.close();&lt;br /&gt;&lt;br /&gt;        if (out != null)&lt;br /&gt;            out.close();&lt;br /&gt;    }&lt;br /&gt;}&lt;/pre&gt;You can also &lt;a href="http://greenwoodma.servehttp.com/svn/repos/open-source/get/blog-code/PNGKeywords/src/PNGKeywords.java?revision=HEAD"&gt;download a fully working example&lt;/a&gt; if you would prefer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-7745293719776486105?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/7745293719776486105/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/png-keywords_12.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7745293719776486105'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/7745293719776486105'/><link rel='alternate' type='text/html' 
href='http://englishjavadrinker.blogspot.com/2010/09/png-keywords_12.html' title='PNG Keywords'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-7025390077222270096.post-3380119924590513421</id><published>2010-09-09T19:07:00.000+01:00</published><updated>2010-09-12T15:33:48.061+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='CSS'/><title type='text'>Hiding Hidden Objects</title><content type='html'>I thought I'd start this new blog with something simple -- a CSS layout problem that has bugged me for a long time and that I finally solved for the design of this blog.&lt;br /&gt;&lt;br /&gt;When I've released applications through my &lt;a href="http://englishcoffeedrinker.blogspot.com/"&gt;main blog&lt;/a&gt; I've always tried to make them easy for people to use, and my approach has usually been to use &lt;a href="http://en.wikipedia.org/wiki/Java_Web_Start"&gt;Java Web Start&lt;/a&gt; to download, install and run the programs. This works really well as the &lt;a href="http://download.oracle.com/javase/6/docs/technotes/guides/jweb/deployment_advice.html"&gt;deployment script&lt;/a&gt; checks for the correct version of Java and builds the appropriate links into the web page for me. I have, however, noticed that there are a couple of problems with the deployment script.&lt;br /&gt;&lt;br /&gt;Firstly the JavaScript file cannot be included in the head section of the blog template with the other scripts I use. The problem is that the script embeds an object into the page at the point it is included. 
Objects can't be included in the head section of an HTML page, so many browsers assume that if they find an object then you have forgotten the closing head tag and close the head section for you. This is annoying but easy to work around -- just include the script at the beginning of the page body.&lt;br /&gt;&lt;br /&gt;The second problem is that although the object that is embedded is marked as hidden it still takes up a single line of vertical space, and of course depending on where you embed the script that could be a lot of space (imagine a 60pt title for instance).&lt;br /&gt;&lt;br /&gt;It turns out that setting the hidden attribute to true is the same as setting the visibility CSS style to hidden -- the element isn't displayed but space is reserved for it. The trick, therefore, is to use CSS to turn off display of the element altogether. Adding the following CSS rule to your style sheet will do the trick and hide the Java deployment plugin so that it doesn't take up any space.&lt;br /&gt;&lt;pre class="brush: css"&gt;#deployJavaPlugin {&lt;br /&gt;   visibility: hidden;&lt;br /&gt;   display: none;&lt;br /&gt;}&lt;/pre&gt;It isn't really necessary to set the visibility property but I like keeping it in for completeness.&lt;br /&gt;&lt;br /&gt;Why this isn't done by the deployment script I don't know but at least now I have control over the white space in my layout.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/7025390077222270096-3380119924590513421?l=englishjavadrinker.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://englishjavadrinker.blogspot.com/feeds/3380119924590513421/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/hiding-hidden-objects.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3380119924590513421'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/7025390077222270096/posts/default/3380119924590513421'/><link rel='alternate' type='text/html' href='http://englishjavadrinker.blogspot.com/2010/09/hiding-hidden-objects.html' title='Hiding Hidden Objects'/><author><name>Mark</name><uri>http://www.blogger.com/profile/02418527698793489162</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://2.bp.blogspot.com/_xY_30cvz3_A/TF7Iesij9lI/AAAAAAAAAaQ/EIhq7s9Z730/S220/profile.jpg'/></author><thr:total>0</thr:total></entry></feed>
