(File) Size Does Matter!

I like PDF files. Everyone (or as near as makes no difference) can open them, and you can be fairly certain that the file will look the same on every computer. Sometimes though I struggle to produce PDF files that are of a high enough quality, but small enough to e-mail or host on a website.

The most recent problem I've faced involved producing a PDF from an SVG file (authored using Inkscape) which included linked digital photos. The SVG file itself is just 77kB in size, and the five photos it links to total just over 7MB. Even if the images were simply included as is, I wouldn't expect the resulting PDF to be more than 8MB in size, although I would hope that some reasonable down sampling would be applied. Printing to a PDF from within Inkscape, however, results in a file of 22MB in size! That's over three times the size of the included photos, which is just ridiculous. Exporting to a PDF instead of printing, results in a file of the same size so isn't much help.

Now I don't know enough about the inner workings of a PDF file, but a 22MB file (which is incidentally a single A4 page) just isn't right, and certainly isn't sensible for trying to send around via e-mail etc. I've tried searching the web for suggestions, but none of the ideas I've found have turned out to be much use; either they don't alter the size of the file, or they reduce the quality so far that the text becomes unreadable.

The first alternative I tried was to produce a PNG image from the SVG instead of a PDF. While an image file wasn't really what I wanted I thought it might be a good place to start. Using Inkscape to produce a 300 DPI image resulted in a file of just 1.7MB in size; a file, I should add, with a quality that is perfectly adequate. The problem with an image file is that it is easy to alter and can be a pain to print at the right size, so I'd still like to use a PDF file. Fortunately, on linux at least, it is easy to create a PDF from a PNG, using the simple command
convert input.png output.pdf
Given that Inkscape can also be controlled from the command line I can easily write a reusable script that converts an SVG direct to a PDF of a reasonable size:
#!/bin/sh

inkscape -f "$1.svg" -e out.png -C -d 300
convert out.png "$1.pdf"
rm out.png
I can then use this by passing in the name of the SVG file (minus the .svg extension) and out will pop a PDF file. Now I know this approach isn't perfect but given my requirements (a PDF of a reasonable size that is really only needed for printing) this works well. I'm intending to see if I can find a better solution (one that will keep the text as text) so if you have any suggestions please feel free to leave a comment.

CORPSE: COld stoRage backuP SoftwarE

I've recently started using Amazon Glacier to store the backup of last resort of all my digital photos (I've discussed the motivation behind this on my other blog). While I think the idea of cold storage, and Glacier in particular, is great there is just one problem; Amazon don't provide a friendly user interface for Glacier. Glacier is part of the Amazon Web Service (AWS) framework, and as such is designed as a service around which other people can develop applications.

A quick web search will bring up links to quite a few different tools for using Glacier. I'm currently uploading my data using SAGU (the Simple Amazon Glacier Uploader). While SAGU does make uploading data to Glacier easy it has a number of shortcomings which the developer is working to address. I've proposed a couple of patches, but I'm also looking at implementing more useful functionality (for example, range based downloads are a must have feature if you want to keep the costs under control). While I'm hopping that some of the ideas I'm working on might be incorporated into SAGU, I'm currently developing them in a separate project which I'm calling CORPSE (which of course stands for COld stoRage backuP SoftwarE).

Even if CORPSE doesn't result in a full application, I'm using it to a) experiment with a number of Java libraries I've not used before (including AWS itself, Jackson, Flyway) and b) using git for version control. In the past I've always used a centralised version control system (be that CVS, PVCS or SVN), but the decentralised nature of git is intriguing and with a free account on GitHub I don't have anything to lose. I'm sure I'll mention CORPSE again on this blog in the future but if you want to take a look at what is there now or to follow development then you can find it in this GitHub project.

To Track, Or Not To Track, That Is The Question

If you read any of my other blogs, then you may well be aware that I've recently opened a shopfront on Shapeways to make a few 3D models available for sale. While I really like this feature of Shapeways I wanted to customize the experience a little and so built a seperate, more customizable, shopfront. One of the options when setting up the Shapeways store was to provide a Google Analytics tracking code. I've never used any tracking code before on any of the websites I've built as I've never really seen a strong reason to. Given I was being prompted to though, I thought I'd investigate, especially as it would allow me to see how often people looked at my models, separate to how often people buy them.

Registering for a Google Analytics account is free and easy (as with most Google services) and within a few minutes I had the tracking information to register with Shapeways. Having got this far I decided to also add the tracking information to the separate shop front I'd built, and of course this is where life got interesting.

The first problem I faced was that I couldn't get Google to recognise that I'd added the tracking code to my website. No matter what I tried it simply kept telling me that I hadn't yet added the relevant JavaScript to my pages, yet I knew it was there. After a lot of head scratching and web searching I eventually found the problem and an easy solution. The domain I purchased doesn't actually include the initial www of most web addresses, but the way I have the DNS records and forwarding configured if you type the bare domain name you will be forwarded to the www version automatically. So when completing the Google Analytics form I included the www, which turns out was my mistake. I'm still not sure why this should make a difference (I'm guessing that Google aren't following the forwarding information when checking the page) but anyway removing the www from the default URL field fixed the problem.

The second problem was that I then wanted to turn the tracking off. Not so that I could comply with the EU directive on cookies or anything (I'm taking the applied consent rule, on the basis that if you visit an online shop you have to assume, at the bare minimum, that your page views are being tracked), but so that I wouldn't end up tracking my own page views.

I tend to have two copies of every website I develop running at the same time; one is the publicly visible (hopefully stable) version, while the second is my internal development version. Once I'm happy with updates to the development version I then push them to the live version. So when it comes to tracking page views not only do I not want to track my page views on the live site, but I definitely don't want to track page views on the development version.

My solution to this is to use server side code to only embed the tracking related JavaScript into the page, if it should be used. For this I wrote a utility function to determine if I should be tracking the page view or not:
public static boolean trackPageViews(HttpServletRequest request)
      throws Exception {

   //get the address the request is coming from
   InetAddress client = InetAddress.getByName(request.getRemoteHost());

   //the client is the same machine (an address such as 127.0.0.1)
   if (client.isLoopbackAddress()) return false;

   //the client is within the same site (i.e. not using a world visible IP)
   if (client.isSiteLocalAddress()) return false;

   //the client is at the same world visible IP address as the server
   if (client.toString().equals(getServerIP(false))) return false;

   //track all other page views
   return true;
}
Essentially this code checks the request and says not to track page views if it is coming from a loop back address (i.e. 127.0.0.1 always refers to the local machine, no matter what the machine is), a site local address (i.e. within the same local network and hence using a non-public IP address, often something in the 192.168.*.* range), or if the request is from the same public IP as the server is running on. The first two checks (lines 8 and 11) use standard Java libraries to do the checking, and catch all page views of the development versions of my websites (which are always only accessible from within my local network). The third check is, however, slightly more complex.

The problem is that you're machine itself probably doesn't have a world visible IP address; general a home network will be connected to a broadband router which will have a world visible address. So what we need to do is find the external IP address of the router. There are a number of websites available that will show you the world visible IP from which you are connecting and so we could use one of those to figure out the external address of the server, or we could write our own.
package utils;

import java.io.IOException;
import java.net.InetAddress;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class IPAddressServlet extends HttpServlet
{
   @Override
   public void doGet(HttpServletRequest request,
      HttpServletResponse response) throws IOException {

      InetAddress client = InetAddress.getByName(request.getRemoteHost());
      response.getWriter().println(clientAddress.toString());
   }
}
Assuming you access this servlet via a public IP address then it will simply echo the address back to you. We can now fill out the missing getServerIP method from our trackPageViews method.
private static String myip = null;

private static String getServerIP(boolean refresh)
      throws IOException {

   if (myip == null || refresh) {
      URL url = new URL("http", HOST, "/myip");
      BufferedReader in = new BufferedReader(
         new InputStreamReader(url.openStream()));
      myip = in.readLine();
      in.close();
   }
  
   return myip;
}
I'm assuming this will work in all cases where you connect to the server using it's standard web address, as that will always be converted via a DNS server to a world visible IP address, although if you have a strange network setup where you might have multiple world visible IPs through which you could connect then it might not work correctly. The important point is it works for me, so I can now happily browse the public version and development version of the site from anywhere within my home network safe in the knowledge that my own page views aren't being tracked, so any analytics Google collects for me are form real visitors; albeit only those with JavaScript enabled, but that's an issue for another day.

Before I forget, thanks to GB for the photo I used to brighten up this post!

Generating Sitemaps

A long time ago (okay 2009) on an entirely different blog, I introduced v0.1 of the SitemapGenerator. This was a command line tool that I was running once an hour to monitor my blog, and if anything had changed it would re-generate a sitemap and notify Google of the changes. I wanted this predominantly so that a Google custom search engine would stay up-to-date, something I no longer need to worry about given that I use Postvorta to index all my blogs. I haven't used the code in a number of years but recently I again need to generate a sitemap. This time though I wanted to generate the file from within the web application I was building rather than via a separate recurring script.

I did a quick search for a better version than my original, and while I found a few Java libraries none really met my requirements; I wanted something lightweight that would allow me to take advantage of the extensions to the protocol Google has introduced for documenting images, videos and news articles. So I went back and re-wrote the guts of my original application. The new and improved version now supports a simple API that I can call from within my web application (although I've maintained the command line tool in case anybody was still using it).

If you are interested then you can grab the new version direct from SVN.

Back To Front But Functional

While the circuit required for programming an ATmega328P-PU chip (i.e. a standalone Arduino) is fairly simply (regardless of whether you are running at 8MHz or 16MHz) it would still be a pain to have to set everything up on a breadboard each time I wanted to programme a chip. The solution was to build a shield for the Arduino which I could simply slot on whenever I wanted to programme a new ATmega328P-PU.

Now I could have gone all fancy and designed a custom PCB for this project, but for such a simple project (where I wouldn't ever need more than one shield) this seemed overkill. Instead I built the shield out of a budget prototyping shield from oomlout. Obviously I don't want to solder an ATmega328P-PU to the shield as that would defeat the purpose, so I've added a 28pin DIP socket in its place. It took me a while to figure out the best placement of the components and to cut appropriate sized pieces of wire (lots of the connections are actually made simply by soldering the leads together on the underside of the board), although actually soldering everything in place was fairly quick (the advantage of lots of planning time). Most of the components were held in place by bending their leads slightly. The DIP socket, however, kept falling out. Unfortunately one of those occasions was just as I was about to solder it into place, and I managed to put it in back to front :(

Fortunately, the only problem with having the DIP socket back to front is that the notched end of the chip doesn't match up with the notch on the socket, which is why I've added a white paint mark to the socket to remind me to put the chip in the wrong way around. Anyway I now have a functioning shield which I can use to programme an ATmega328P-PU to run at either 8MHz or 16MHz and to upload sketches to it. I'd call that a success.

Double The Speed For Just 35p

In my previous post I explained how I'd eventually managed to figure out the procedure for configuring and programming an ATmega329P-PU micro-controller. The configuration in that post was deliberately designed to be as minimal as possible, which was why I was limited to using the internal 8MHz RC oscillator. While 8MHz is possibly fast enough for a lot of projects, there are occasions where speed is important; a lightening trigger for a camera for example. In these cases running the chip at 16MHz would be preferable. Fortunately adding an external 16MHz crystal to double the speed is both easy and cheap!

Before we get to the hardware though, we need add a new entry to the boards.txt file defining the new setup (see the previous post for details on this and how to upload code etc.).
atmega328p16mhz.name=ATmega328P (16 MHz external crystal)

atmega328p16mhz.upload.protocol=stk500v1
atmega328p16mhz.upload.maximum_size=30720
atmega328p16mhz.upload.speed=9600

atmega328p16mhz.bootloader.low_fuses=0xD7
atmega328p16mhz.bootloader.high_fuses=0xDF
atmega328p16mhz.bootloader.extended_fuses=0x07
atmega328p16mhz.bootloader.unlock_bits=0x3F
atmega328p16mhz.bootloader.lock_bits=0x0F

atmega328p16mhz.build.mcu=atmega328p
atmega328p16mhz.build.f_cpu=16000000L
atmega328p16mhz.build.core=arduino:arduino
atmega328p16mhz.build.variant=standard
Most of this is identical to the definition for the 8MHz version. The two main changes are to the low fuse (to specify that we are using an external full-swing crystal) and to the processing speed (changed from 8000000 to 16000000).

To make this work we need to add three small components to the previous circuits: a 16MHz crystal and two 22pF capacitors. Fortunately these are very cheap components and add just 36p to the cost of the circuit (making a total cost of £5.54 or £3.87 if you ordered parts for ten circuits). You can see how these are wired up in these diagrams (for programming on the left and in use on the right).


Configuring the fuses and uploading sketches to this new circuit is exactly the same procedure as for the 8MHz version, other than ensuring you select the 16MHz version from the boards menu.

There is, however, one important thing to note. Once you have configured the chip to run at 16MHz, you can't then use it without a 16MHz crystal present. You can't even re-programme the fuses to set it back to run at 8MHz. So if you want to go back to using the internal RC oscillator re-programme the fuses with the crystal present and then simplify the circuit.

Honey, I Shrunk The Arduino

As you might be able to tell from recent posts, I've been doing quite a bit of work with an Arduino. I've now got at least one project that I'd like to make a little more permanent, rather than it just being a bunch of components on a breadboard.

The easiest option would probably be to solder the components onto a prototyping shield (for example, these budget models from oomlout). There are two problems with this approach though. Firstly not every component is ideally suited for the form factor of an Arduino shield (for example a 16x2 character LCD takes up most of the prototyping space), and secondly to be any use a shield still needs to be connected to an Arduino. While Arduino's aren't extortionately expensive at just over £20 (depending where you buy them from), that's quite a cost to add to a project.

There are cheaper versions of the Arduino available, such as the Arduino Mini, which would bring the cost down to around £10 and which might be ideal for some projects. If you want to move your project onto a custom PCB though, you probably don't want to have to solder on a second PCB. The solution is to build the Arduino components directly into your circuit rather than relying on a pre-built solution. This should be cheaper than buying an Arduino and allows you to lay out the components in the most efficient way on a custom PCB.

The problem of course is figuring out how to construct an Arduino from the component parts. I spent quite a while trawling the internet trying to find the answers, but couldn't find a single page that collected everything you needed to know in one place, so I'm sure you can guess what the rest of this post is going to cover :)

The first decision you have to make is which Arduino compatible microcontroller you are going to base your circuit around. This will depend on the number of input/output pins you need as well as the size of the compiled sketch you intend to use. Given that I'm prototyping my ideas using an Arduino UNO, I decided to stick with the ATmega328P-PU chip the UNO is based around. This is by far the most expensive component you will need to buy, but you can pick them up for around £3. You will find that a number of places sell them with the Arduino bootloader programmed in, for which they charge a premium. Unless you are simply wanting to replace the chip in an existing Arduino UNO you don't actually need to use the bootloader, so save yourself a few pounds and buy the plain un-programmed chips.

Most of the tutorials on using a ATmega328P-PU expect you to run the chip at 16MHz (the same speed as the UNO) but this requires you to provide an external clock signal using a 16MHz crystal. As components mean costs, I decided I'd be a cheapskate and run the chip just using the internal 8MHz clock. We don't need any other components in order to programme the chip, but we do need to tell the Arduino software how to work with the chip.

Extra boards/chips are made available through the Arduino software by adding them to the boards.txt file in the hardware/arduino/ folder of the Ardunio installation (for full details of the format of this file try this Arduino wiki page). After a fair amount of messing around (starting from this example) I eventually figured out the following definition:
atmega328p8mhz.name=ATmega328P (8 MHz internal clock)

atmega328p8mhz.upload.protocol=stk500v1
atmega328p8mhz.upload.maximum_size=30720
atmega328p8mhz.upload.speed=9600

atmega328p8mhz.bootloader.low_fuses=0xE2
atmega328p8mhz.bootloader.high_fuses=0xDF
atmega328p8mhz.bootloader.extended_fuses=0x07
atmega328p8mhz.bootloader.unlock_bits=0x3F
atmega328p8mhz.bootloader.lock_bits=0x0F

atmega328p8mhz.build.mcu=atmega328p
atmega328p8mhz.build.f_cpu=8000000L
atmega328p8mhz.build.core=arduino:arduino
atmega328p8mhz.build.variant=standard
I'm sure that to most people (me included) that isn't self explanatory, so let's step through it to see what it means.

Firstly, line 1 simply assigns a human readable name to the board definition (this is what will apeartr on the Boards menu). Lines 3-5 state that we will be uploading to the chip using the right protocol to use the Arduino as the ISP (In System Programmer), the maximum sketch size, and the speed at which we will upload to the chip. Unfortunately, at least in version 1.0 of the Arduino software the upload speed seems to be ignored which will come back to bite us later.

Lines 7 to 11 specify the fuse settings. These are essentially used to configure the chip. I used Engbedded's fuse calculator to figure out the correct settings. Note that I'm not using the default fuse settings that the calculator suggests as I've turned off the Divide clock by 8 internally option. I found that otherwise the chip seemed to be running at 1MHz rather than 8MHz (i.e. the blink example sketch while setup to blink at 1 second intervals appeared to be actually blinking at 8 second intervals). I also specified the smallest bootloader space option, given that we aren't going to use one. Lines 13 and 14 specify the chip type and speed and finally lines 15 and 16 specify which internal libraries to compile the sketch against (in this example the standard variant which is the UNO).

So now that the Arduino software knows how to programme the board, we need to turn our existing Arduino UNO into an ISP which we can use to programme the chip. Now there is an example sketch called ArduinoISP bundled with the Arduino software, but try as I might I couldn't get it to work. Instead I'm using a slightly improved version released by Adafruit. Simply download the new version into your sketchbook folder and then upload it to your Arduino UNO the same way you would any normal sketch. Now wire the ATmega382P-PU you want to programme to the Arduino as shown to the left (this is also documented in the comments at the top of the ArduinoISP sketch, which also talks about adding LEDs for feedback; feel free to add these if you wish). Now in the IDE, from the Tools menu set the Programmer to ArduinoISP and the board to ATmega328P (8 MHz internal clock).

Now before we can actually programme the chip we need to set the fuses to the correct values as discussed above. To do this simply choose Burn Bootloader from the Tools menu (note that this is slightly misleading in that we aren't burning a bootloader as we didn't specify one in boards.txt, but this menu item sets the fuses as well as burning the bootloader if specified). Hopefully this will "just work", but for some reason my version of the IDE uses the wrong upload speed and as such this fails. Fortunately it's fairly easy to do manually. The easiest way is to get the IDE to generate the command line for you. To do this open the Preferences dialog from the File menu, and turn on the verbose output during upload. Now try to burn the bootloader again. It will still fail, but the first message will be the command line it is trying to use. On my machine this is:

/usr/share/arduino/hardware/tools/avrdude -C/usr/share/arduino/hardware/tools/avrdude.conf -v -v -v -v -patmega328p -cstk500v1 -P/dev/ttyACM0 -b19200 -e -Ulock:w:0x3F:m -Uefuse:w:0xFF:m -Uhfuse:w:0xD9:m -Ulfuse:w:0x62:m

This command is setting the baud rate to 19200 which is an awful lot faster than both the upload speed we specified in boards.txt and the baud rate specified in the ArduinoISP sketch, both of which use 9600. So simply change the -b19200 to -b9600 and execute the command manually (from a terminal or command prompt etc.). Hopefully that should work correctly. Once you have the fuses set, assuming you don't want to change the way the chip is configured, you won't need to do this step in the future.

Now you are ready to actually upload a sketch to the new chip. I usually test that everything is working using the simple Blink example that comes with the Arduino IDE. As you have already connected the Aruino to pin 13 of the new chip (this doubles as the SCK pin) I usually change the sketch to use pin 8 which is the bottom right pin if you have yours in the same orientation as the above diagram, so that you can see the sketch working without having to change any of the wiring used for programming. Now to upload to the new chip you need to use the "Upload Using Programmer" option from the File menu (making sure that the programmer and board are still set correctly) and not the Upload button on the toolbar. If setting the fuses failed the first time and you had to manually change the baud rate then you will probably run into the same problem here as well, but the solution is the same (run the command from the IDE then copy it out, change the baud rate, and then run it manually). Once it has uploaded successfully you should find your LED is blinking on and off at one second intervals.

Now, as the main aim was to remove the need to dedicate a full Arduino UNO to each project we need to power the chip in some other way than by being connected to the UNO. Fortunately this is also fairly simple as you can see from the setup to the left. This uses just four components; a voltage regulator, two capacitors, and a 9V battery connector.

I've actually found that the supply from a fresh 9V battery is stable enough that you can do without the two capacitors, but as they are stipulated by the datasheet for the voltage regulator I've included them.

Minimal Arduino Part List
PartUnit Cost
ATMEL ATmega328P-PU£3.02
5V Voltage Regulator£0.40
100µF Capacitor£0.151
9V Battery Connector£0.60
These extra components are, as you can see from this table, much cheaper than the main ATmega328P-PU, and adding the two capacitors adds just 30p to the cost. In total the components add up to just £4.32 + VAT for a grand total of £5.19 -- or around half the cost of the Arduino Mini, or a quarter of the cost of an Arduino UNO. The circuit could be made even cheaper if you were willing to buy the components in bulk; for example if you buy 10 ATmega328P-PU then they cost just £1.63 each, so you could make 10 of these circuits for just £3.52 each.

The one thing I haven't tried yet is using any analog inputs with this setup. The datasheet for the ATmega328P-PU says that the AREF pin needs to be connected to power via a low-pass filter to remove noise if the analog to digital converter is being used. I've seen a number of websites that at least suggest this isn't necessary, especially if powering via a battery, rather than an alternating current based power source. I'll figure this out in due course but even if I do need to add extra components they will be priced in the pence so won't add much to the overall cost.