Write Once, Debug Everywhere

One of the reasons that I quite enjoy developing software in Java is the promise of WORA or write once, run anywhere. In principle this means that I can develop software on one machine and it should work exactly the same on any other computer, regardless of operating system, as long as a copy of Java is installed. I've been writing Java code for over a decade and in all that time I don't think I've ever personally seen a case where the principal falls down -- and certainly not if the code is restricted to just using the core Java libraries. So this morning, with a deadline fast approaching, I hit the opposite of WORA, WODE; write once, debug everywhere!

I'd written and debugged some code on my PC, and it was working flawlessly. I'd integrated the code into a tomcat hosted web app, and again testing on my own machine showed no problems at all. I uploaded the web app to one of the servers at work (more processing power and a better internet connection for demo purposes) and it instantly crashed! A quick look at the log showed the problem to be a ConcurrentModificationException. I've seen this exception before when I've tried to modify a list (or some other Collection instance) while looping through the list, but in this case a) I couldn't see anything wrong with my code and b) I had a machine where the code was working fine!

After a little debugging it became clear that the exception was being thrown by the following line of code:
items.removeAll(items.subList(0, items.size()/2));
Essentially this should halve the original list, retaining the second half (the items in the list were sorted based on an assigned score and I wanted to throw away the lower scoring items). It's worth noting that the subList call returns a view backed by the original list. In other words changes to one list should be reflected in the other. As the code worked fine on my machine I was more than a little confused.

Given that uploading the full web app to the server at work was a relatively time consuming process I wrote a small test application instead to try and figure out what was happening, including the expected output in the comments.
// create a list of four strings
List<String> items = new ArrayList<String>(
  Arrays.asList(new String[]{"a","b","c","d"}));
System.out.println(items); // [a, b, c, d]

// get a view over the first half of the list
List<String> someItems = items.subList(0, items.size()/2);  
System.out.println(items); // [a, b, c, d]
System.out.println(someItems); // [a, b]

// change the first item in the sub-list to "e"
System.out.println(someItems.set(0, "e")); // a
System.out.println(items); // [e, b, c, d]
System.out.println(someItems); // [e, b]

// change the first item in the original list to "f"
System.out.println(items.set(0,"f")); // e
System.out.println(items); // [f, b, c, d]
System.out.println(someItems); // [f, b]

// remove all items in the sub-list from the original list
System.out.println(items.removeAll(someItems)); // true
System.out.println(items); // [c, d]
System.out.println(someItems); // will probably throw an exception
I then ran this on my local machine, where my original code had been working and got the following result:
mark@mag:~$ java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.4) (6b24-1.11.4-1ubuntu0.12.04.1)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

mark@mag:~$ java CMETest
[a, b, c, d]
[a, b, c, d]
[a, b]
a
[e, b, c, d]
[e, b]
e
[f, b, c, d]
[f, b]
true
[c, d]
Exception in thread "main" java.util.ConcurrentModificationException
  at java.util.ArrayList$SubList.checkForComodification(ArrayList.java:1091)
  at java.util.ArrayList$SubList.listIterator(ArrayList.java:972)
  at java.util.AbstractList.listIterator(AbstractList.java:300)
  at java.util.ArrayList$SubList.iterator(ArrayList.java:968)
  at java.util.AbstractCollection.toString(AbstractCollection.java:431)
  at java.lang.String.valueOf(String.java:2838)
  at java.io.PrintStream.println(PrintStream.java:788)
  at CMETest.main(CMETest.java:24)
So far so good. The output matches what I expected to happen, including the fact that the last line throws an exception. It makes sense to throw an exception at this point as I'm essentially trying to access a list consisting of items which I've deleted. So having checked that my test case behaved as I expected I uploaded this to the server at work and ran it.
mark@derwent:~$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

mark@derwent:~$ java CMETest
[a, b, c, d]
[a, b, c, d]
[a, b]
a
[e, b, c, d]
[e, b]
e
[f, b, c, d]
[f, b]
Exception in thread "main" java.util.ConcurrentModificationException
  at java.util.SubList.checkForComodification(AbstractList.java:752)
  at java.util.SubList.listIterator(AbstractList.java:682)
  at java.util.AbstractList.listIterator(AbstractList.java:284)
  at java.util.SubList.iterator(AbstractList.java:678)
  at java.util.AbstractCollection.contains(AbstractCollection.java:82)
  at java.util.AbstractCollection.removeAll(AbstractCollection.java:336)
  at CMETest.main(CMETest.java:22)
Fortunately for my sanity this shorter example failed to behave in the same way on the server as it did on my machine. You can see from the exception that it is failing on line 22 which is the call to removeAll. While that at least proves that the removeAll call is at fault it still leaves the question as to why it works on one machine and fails on another?

Even when the Java language wasn't available under an open source license the source code to the core APIs has always been available. Back in the early days of Java I even used this fact on a number of occasions to submit bug reports to Sun (who originally developed Java before they were bought by Oracle) along with suggested fixes. Unfortunately I haven't been able to track down the source bundles for the exact two versions of Java involved but I did manage to track down the OpenJDK source for version 1.6.0 build 26. This doesn't match either version exactly as I'm running b12 of OpenJDK whereas the server is running buld 26, but the Oracle version rather than OpenJDK. Having looked at the stack trace and this version of the source code I think I can now see why it fails.

The stack trace shows that when I call removeAll on list (which is actually an instance of ArrayList), the implementation in java.util.AbstractCollection is the one that is actually used. This method is a very simple iterator based implementation:
public boolean removeAll(Collection<?> c) {
  boolean modified = false;
  Iterator<?> e = iterator();
  while (e.hasNext()) {
    if (c.contains(e.next())) {
      e.remove();
      modified = true;
    }
  }
  return modified;
}
Now the documentation for the subList method states that
The semantics of the list returned by this method become undefined if the backing list (i.e., this list) is structurally modified in any way other than via the returned list. (Structural modifications are those that change the size of this list, or otherwise perturb it in such a fashion that iterations in progress may yield incorrect results.)
Clearly the call to e.remove() on line 6 structurally modifies the backing list, which will render the sublist, c undefined, which explains why an exception is thrown (probably on the call to contains on line 5). So now I know why it fails on the server, but why does it work on my machine.

I'm guessing that in the version of Java running on my machine, when I call removeAll, rather than using the default implementation in AbstractCollection, it instead uses the ArrayList specific version.
public boolean removeAll(Collection<?> c) {
  return batchRemove(c, false);
}

private boolean batchRemove(Collection<?> c, boolean complement) {
  final Object[] elementData = this.elementData;
  int r = 0, w = 0;
  boolean modified = false;
  try {
    for (; r < size; r++)
      if (c.contains(elementData[r]) == complement)
        elementData[w++] = elementData[r];
  } finally {
    // Preserve behavioral compatibility with AbstractCollection,
    // even if c.contains() throws.
    if (r != size) {
      System.arraycopy(elementData, r,
                       elementData, w,
                       size - r);
      w += size - r;
    }
    if (w != size) {
      for (int i = w; i < size; i++)
        elementData[i] = null;
      modCount += size - w;
      size = w;
      modified = true;
    }
  }
  return modified;
}
As you can see this implementation is slightly more complex, but I'm guessing that it's a lot more efficient. Essentially this implementation modifies the underlying array directly rather than via the public API -- it works by shifting each entry towards the beginning of the array overwriting any entries that need to be removed. Similarly the implementation of contains called from line 11, access the array directly meaning that the entire removeAll call never results in a check for a concurrent modification and hence never throws an exception.

So now I understand why the same code behaves differently on two different machines. Well I almost understand. The missing piece of the puzzle is why the rather naive version of removeAll is used on the server? Without access to the source code of the exact versions of Java involved I can't be certain, but I'm assuming that the ArrayList specific implementation is an OpenJDK efficiency improvement that isn't in the Oracle branded version.

Fortunately my original code is easy to fix, I can achieve the same effect (deleting the items in the first half of the list) using the following line of code which does appear to run anywhere.
items.subList(0, items.size()/2).clear();

0 comments:

Post a Comment