A Non-Conformant Head

UPDATE 21st April: It would appear that the snippet of malformed HTML is no longer being included in our blogs! The metadata doesn't seem to be included in any other fashion, so I'm assuming it will be back once Blogger have decided how to do it properly.

I've been messing about with Blogger templates over the last few days and I've spotted that Blogger seem to have broken the HTML of every blog they host! They have added an invalid element within the head section of each page, which causes the head section to be closed prematurely. Depending on which scripts and other resources you use in your template, this may cause a problem.

The new code that Blogger have added to our templates is aimed at adding extra metadata to each page, which in turn will enable Google to have more information about each page within a blog when they are included in a search result. They are using the schema.org metadata format to achieve this. Specifically each page of this blog now contains the following in the head section:
<itemscopetag itemscope='itemscope' itemtype='http://schema.org/Blog'>
   <meta content='Code from an English Coffee Drinker' itemprop='name'/>
</itemscopetag>
As you can probably gather, this snippet essentially tags the page as being from a blog whose title is "Code from an English Coffee Drinker". From the full Blog schema you can see that there is actually a whole set of properties that Google could set for each blog, and I'm guessing that at some point in the future they will add more information, which in turn will enrich their search result pages. Now I'm all for adding extra metadata (I've even written a GATE application that runs ANNIE over webpages and then embeds appropriate schema.org metadata), but unfortunately Blogger have messed up their implementation.

The problem is that they have used an itemscopetag tag, which isn't valid in any version of the HTML specification. The specification also tells us that if we encounter an unknown tag while parsing the head section of a page, we should "act as if an end tag token with the tag name "head" had been seen, and reprocess the current token". This closes the head section prematurely, and everything that follows becomes part of the body instead. Depending on what has been forced out of head and into body, and which browser you are using, you may see different results. For example, it looks as if links to the Chrome Web Store are broken by this.
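To make the effect of that rule concrete, here is a rough Python sketch. It is not a real HTML5 parser: it only collects the start tags found in a chunk of head markup and splits them at the first tag the head is not allowed to contain. The names split_head and StartTagCollector, and the link and script lines in the sample, are made up for illustration and are not taken from Blogger's output.

```python
from html.parser import HTMLParser

# Start tags the HTML parsing rules accept while still inside <head>.
HEAD_ELEMENTS = {"base", "basefont", "bgsound", "link", "meta", "title",
                 "noscript", "noframes", "style", "script", "template"}


class StartTagCollector(HTMLParser):
    """Records the name of every start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def split_head(head_markup):
    """Return (tags that stay in head, tags pushed into body)."""
    collector = StartTagCollector()
    collector.feed(head_markup)
    collector.close()
    kept, pushed = [], []
    head_open = True
    for tag in collector.tags:
        # The first unknown tag acts like an implicit </head>.
        if head_open and tag not in HEAD_ELEMENTS:
            head_open = False
        (kept if head_open else pushed).append(tag)
    return kept, pushed


# Hypothetical head content: only the itemscopetag block comes from the post.
sample = """
<title>Code from an English Coffee Drinker</title>
<itemscopetag itemscope='itemscope' itemtype='http://schema.org/Blog'>
   <meta content='Code from an English Coffee Drinker' itemprop='name'/>
</itemscopetag>
<link rel='stylesheet' href='style.css'/>
<script src='widgets.js'></script>
"""

kept, pushed = split_head(sample)
print(kept)    # ['title']
print(pushed)  # ['itemscopetag', 'meta', 'link', 'script']
```

Everything from the itemscopetag onwards lands in the second list, which is why the position of the invalid tag within the head matters so much.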

What Blogger should have done was add the information to the body tag or to one of the main content div tags instead. For example, they could have started the body as follows:
<body itemscope='itemscope' itemtype='http://schema.org/Blog'>
   <meta content='Code from an English Coffee Drinker' itemprop='name'/>
This would have embedded exactly the same metadata but in a format conformant with the HTML specification, and which follows the instructions given on the schema.org site.
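As a quick sanity check of the "exactly the same metadata" claim, the following Python sketch reads a single item out of each snippet and compares the results. It is a deliberately simplified reading of microdata (one top-level item, properties given only as meta tags), not the full extraction algorithm, and the names SingleItemReader and read_item are invented for this example.

```python
from html.parser import HTMLParser


class SingleItemReader(HTMLParser):
    """Reads one top-level item whose properties are <meta> tags."""

    def __init__(self):
        super().__init__()
        self.item = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.item is None:
            # The first element carrying itemscope starts the item.
            if "itemscope" in attrs:
                self.item = {"type": attrs.get("itemtype"), "properties": {}}
        elif tag == "meta" and "itemprop" in attrs:
            self.item["properties"][attrs["itemprop"]] = attrs.get("content")


def read_item(markup):
    reader = SingleItemReader()
    reader.feed(markup)
    reader.close()
    return reader.item


blogger_version = """
<itemscopetag itemscope='itemscope' itemtype='http://schema.org/Blog'>
   <meta content='Code from an English Coffee Drinker' itemprop='name'/>
</itemscopetag>
"""

body_version = """
<body itemscope='itemscope' itemtype='http://schema.org/Blog'>
   <meta content='Code from an English Coffee Drinker' itemprop='name'/>
</body>
"""

# Both snippets describe the same Blog item with the same name property.
print(read_item(blogger_version) == read_item(body_version))
```

The only difference between the two is the element carrying the itemscope attribute, which is the part that decides whether the page is valid HTML.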

Unfortunately there doesn't appear to be any way to remove this code from our blogs. The best we can do is to move the piece of template code that generates the invalid tag (as well as lots of other code) as late in the head section as possible, so that it pushes as little code as possible into the body. To do this you need to edit the HTML version of your template and move the line:
<b:include data='blog' name='all-head-content'/>
to just before the closing head tag, so that it looks like:
   <b:include data='blog' name='all-head-content'/>
</head>
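If you keep a copy of your template in a text file, the same move can be sketched in a few lines of Python. This is only an illustration of the edit described above, assuming the include and the closing head tag each sit on a line of their own; the function names and the sample template (including example.js) are hypothetical, and you would still paste the result back into Blogger's template editor by hand.

```python
INCLUDE = "<b:include data='blog' name='all-head-content'/>"


def index_of(lines, wanted):
    """Index of the first line that equals `wanted`, ignoring indentation."""
    for i, line in enumerate(lines):
        if line.strip() == wanted:
            return i
    raise ValueError(f"line not found: {wanted}")


def move_include_to_end_of_head(template):
    lines = template.splitlines()
    # Take the include line out of its current position...
    include_line = lines.pop(index_of(lines, INCLUDE))
    # ...and put it back immediately before the closing head tag.
    lines.insert(index_of(lines, "</head>"), include_line)
    return "\n".join(lines)


# Hypothetical, heavily cut-down template used only for this example.
template = """<head>
  <b:include data='blog' name='all-head-content'/>
  <title><data:blog.pageTitle/></title>
  <script src='example.js' type='text/javascript'></script>
</head>
<body>
</body>"""

moved = move_include_to_end_of_head(template)
print(moved)
```

After the move, the title and script lines come before the include, so they are parsed while the head is still open.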
Hopefully Blogger will fix the code they generate soon, but until then we just have to minimize the damage they inflict on our blogs any way we can.

1 comment:

  1. I can't pretend that I fully understood the mechanics but I think I got the gist of the effect. You'd think an organisation as big as Google could get that right.