Postvorta: Providing Intelligent Blog Search

The eagle-eyed amongst you may have noticed that about a month ago the search box in the sidebar of this blog changed. I used to use the standard Google search gadget but I now use a gadget powered by Postvorta.

Postvorta was built specifically to enable intelligent searching of blogs. How do I know this, you ask? Well, I spent the past year building Postvorta in my spare time. The initial motivation was a number of conversations with fellow bloggers about the inadequacies of the Google search gadget. Given that my job involves processing natural language documents (I work as part of the GATE group at the University of Sheffield), I thought I was in a position to provide something better.

It is difficult to know exactly how the standard Google search gadget works, but as far as I can tell (both from personal experimentation and from talking to others) it appears to index only the main content of each post. For example, it certainly doesn't index the labels associated with posts. This means that while you can view all posts with a given label, you can't search for them using the search gadget. Postvorta, however, indexes all the important content from your blog posts: title, article, labels, and comments. Importantly, it does not index the pages you see when you view the blog in a web browser. Instead it accesses the underlying data (via the Google Data APIs), which means that it can ignore the repeated information in the blog template. For example, many blogs contain a gadget which lists recent post titles. These shouldn't be indexed with each post, as that makes it much more difficult to search for the actual post. A search can also be restricted by date and/or by the people who commented on a post. I've tried to provide as much flexibility as possible while keeping the full interface relatively simple.
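To make the idea concrete, here is a rough sketch in Python of indexing a post's structured fields rather than its rendered page. It is not how Postvorta is implemented (that is built on Mímir). The `Post` class, the `tokens`, `indexed_terms` and `search` functions, and the sample posts are all made up for illustration, and requiring every query word to match is my own simplifying assumption.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Post:
    """Hypothetical stand-in for one post fetched as structured data."""
    title: str
    article: str
    labels: list[str] = field(default_factory=list)
    # Each comment is an (author, text) pair.
    comments: list[tuple[str, str]] = field(default_factory=list)
    published: date = date.min


def tokens(text):
    """Lower-case the words in text and strip surrounding punctuation."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def indexed_terms(post):
    """Collect terms from title, article, labels and comments only.

    Nothing from the blog template (sidebars, recent-post lists) gets in,
    because we never look at the rendered page.
    """
    terms = tokens(post.title) | tokens(post.article)
    for label in post.labels:
        terms |= tokens(label)
    for _author, text in post.comments:
        terms |= tokens(text)
    return terms


def search(posts, query, since=None, until=None, commenter=None):
    """Bag-of-words search, optionally restricted by date and commenter.

    Assumption for this sketch: every query word must appear in the post.
    """
    wanted = tokens(query)
    hits = []
    for post in posts:
        if since and post.published < since:
            continue
        if until and post.published > until:
            continue
        if commenter and commenter not in {a for a, _ in post.comments}:
            continue
        if wanted <= indexed_terms(post):
            hits.append(post)
    return hits


# Made-up sample data.
POSTS = [
    Post("Walking the dog", "A long walk in the hills.", ["outdoors"],
         [("Alice", "Lovely photos")], date(2012, 3, 1)),
    Post("Indexing blogs", "Notes on search engines.", ["search", "GATE"],
         [("Bob", "Very useful")], date(2012, 6, 15)),
]
```

With this sample data, `search(POSTS, "gate")` finds the second post purely through its label, and `search(POSTS, "photos", commenter="Bob")` finds nothing because the only post mentioning photos was commented on by someone else.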

Fortunately, when building Postvorta I didn't have to start from scratch. One advantage of working in a research group that makes its software available under an open-source license is that I can make use of software I use at work in my own projects. In this case the main indexing and search facilities behind Postvorta are built upon GATE Mímir. I've talked about Mímir before on this blog, and if you've read that post then you shouldn't be surprised that, as well as searching for words like Google and the other search engines, you can use Postvorta to search your blogs semantically, i.e. for things. So you can search for any posts containing, for example, the name of a person without knowing what the name was in advance. If you are new to Mímir then Postvorta provides a comprehensive description of the query syntax, which becomes available when you choose to use it through the search interface (by default searches are treated as a simple bag-of-words, just as with other search engines).

Feel free to have a play with Postvorta through the search gadget on this blog. I'm also using it on my main blog, where there are a lot more posts to search through. Postvorta is currently being run as a closed beta (while I evaluate performance, reliability, etc.) but if you like what you see then you can register your interest and I'll try to index your blog as soon as possible -- note that currently Postvorta only supports Blogger blogs, although WordPress support should be coming soon.

Let me know what you think.

3 comments:

  1. I have found it very much better than Google and am grateful for the blogs you've done for me, thanks Mark.

  2. Thanks for the tips. I am redesigning my blog; how do I remove the Google Search that came with the template?
    Here is my blog URL: http://www.incme-booster.tk

  3. Hi Info-Marshal, I think you misspelt the URL (the e is missing from income). Looking at your actual blog (via the link in your profile), it looks as if the search box has been added as an HTML widget, so it should be a case of just deleting it from the layout editor.
