Ricky's technical blog

Saturday, May 15, 2010

Refactor: Lazy Initialisation to Eager

A field is being initialised when used, rather than when the object is created.

Extract the initialisation into a separate method that returns the value to store in the field, move the assignment to the field declaration, and inline any private use of the getter.

becomes

Motivation

The lazy initialisation is no longer necessary, and harms readability. Additionally, incorrect lazy initialisation can cause problems for concurrent programs.

Mechanics

Where the field is assigned, redeclare a variable shadowing the original field. Rename that variable including its uses, and at the end of the block assign the new variable to the field:

Extract a method for the operations on the new variable. This should be static, at least during the refactor, to prevent accidental operations on other fields. Other fields' values should be passed in explicitly as parameters. This helps to prevent initialisation-order problems (where y gets initialised before x, but while initialising y, we read x and get a big fat null).

Get rid of the == null check, and make the field final:

Wednesday, April 28, 2010

Using types to help refactor

In some real code, I changed an int that was used to index into an array of ViewerPanels, so that the ViewerPanel was passed around instead of the int. I missed some cases in the first pass, so I had a think about how to make sure I don't do that again.

The cases I missed were PropertyChangeListeners. If you're not familiar with that type, you can use it with a PropertyChangeSupport to listen for changes in values. It has Object in its API, rather than using generics. I don't particularly like the idea of listening for changes, particularly if the listeners then end up interfering with the objects they're listening to, but that's what I have in existing code, so I need to get on with it for now.

The tests didn't pick it up, I found it by coming across a firePropertyChange and wondering what the listening end now looked like. The tests could be better, sure.

Here follows four versions of a simpler code sample. The refactor we want to do in this case is just to go from double to float. Version 2 shows what happens when one forgets to check the PropertyChangeListeners. Version 3 steps back to Version 1 but alters it to use a type-safe version of PropertyChangeListener. Version 4 does the refactor again, but this time the compiler forces the ChangeListener to be altered, because we're no longer based on PropertyChangeListener.

The below code is executable.

Monday, January 18, 2010

JNI + applets = pain!

I mostly work on GUI software for controlling and viewing video from security cameras. The software is written in Java, and mostly written by people who are no longer with the company. The usual app I work on is a Swing application, but there is also an applet, which displays just a viewer; the controls for it are HTML buttons around the applet.

We use MPEG-4 (and are adding H.264) extensively, and Java's support for that is, well, non-existent, so we use ffmpeg, via JNI bindings generated by SWIG.

If I was starting again, and for some reason still thought using Java was a good idea, I'd try xuggler out. Xuggler appears to provide the kind of wrapping library around ffmpeg that I would hope to evolve our code towards. But even Xuggler wouldn't fix these recent issues.

The first problem, which was mostly solved before I got involved, is that you cannot load a native library directly from your JAR file. You need to extract it (typically into java.io.tmpdir), and then load it. Not too hard, but it does start to feel icky especially when you consider that an abnormal termination of the browser will cause the file to be left taking up space until manually deleted.

The second problem is the strange interaction between classloaders and JNI - even within trusted code, two classloaders in the same JVM cannot each load the same library. I'm sure somebody somewhere thinks this is a feature rather than a bug. Our code does nothing fancy at all with classloaders; it just gets affected by the fact that two applets executed by the same browser instance will share a JVM. This is still true after Java 6 Update 10, in which the JVM that applets run in is no longer the same process as the browser. If the codebase that you specify to the applet tag is different between two instantiations of the applet, then each instantiation occurs in a different classloader, and the JVM prevents the second instantiation from loading the same library (from the same filename).

This wasn't too bad, and we changed our web pages so that the codebase was the same in each instantiation. As it happened, we weren't loading anything other than the applet jar itself, so that was quite okay.

The applets and web pages are hosted on our image servers, which are units that can deliver data from and to multiple cameras. So then when one browses from an applet hosted on one image server to an applet hosted on another, despite the codebase in the applet tag being the same, two classloaders are used, and the JVM prevents the second applet from loading its native library.

The obvious solution to this is to, when writing the native library to disk, choose a unique name. But that's where the third problem comes in..

Most of our users run Windows, and on Windows if you try to delete a .dll that is in use the OS prevents that. This time, however, the relevant users are all using Solaris, and Solaris allows the .so to be deleted even while it's in use. Our library-extraction code would try to delete libffmpeg-1.so, and if it succeeded (which on Windows would be because the library was not in use), it would then use that as the filename to give the library extracted from the jar file, and then load it. As you can probably see, this works well on Windows, and like a wooden saw elsewhere, because the JVM then believes two classloaders are trying to load the same library.

The fourth problem is that the applet is self-contained; we have a separate build for Windows, Linux, OS X and Solaris, so JavaScript in the web pages has to detect the OS. Detecting the OS is ok, but we support Solaris Intel and Solaris Sparc, and there's no way of detecting those (please correct me!) from JavaScript. We have a mechanism for dealing with this, but I'm not happy with it.

I do plan on making the applet no longer self-contained, so that we distribute one platform-independent section, which detects the OS it's on and loads the correct native library, but really, this is just a case where Java doesn't come with batteries included. I realise applets are not even on the backburner anymore; they've been behind the stove getting mouldy for years, but this does seem like something that should be a lot more straightforward.

An approach that looks promising is that of applet-launcher, which wraps your applet up in another applet that handles all the library loading. I couldn't try this easily as I only control the Java code, not our web pages - it would have required the addition of lots of .getSubApplet() calls in the JavaScript code.

Tuesday, December 08, 2009

Deleting code, what first?

I have about 200kloc of Java code to work with, and I often stumble across bits that are more complex than they need to be, costing me time when I'm trying to solve something else.

Within reason, having a smaller body of code is an improvement. Here are my guidelines, mainly for myself, to shrinking it.

1. Use IDEA's analysis tools to find unused code, and delete it. If it's public-facing API, and you know of no code that uses it, it's probably broken anyway. Deprecate it now and delete it at some later date. If you deleted it and somebody was using it, they'll complain at you, you'll reinstate it and include at least a comment that team X at company Y is using it, and at best an automated test that the code does what team X requires.

Watch out for main methods when you delete code - there are a few ad-hoc utilities in our codebase.

Always ensure that deprecated code is only called by deprecated code, i.e., that deprecation forms a clique.

2. Look at 'throws' clauses. If one low-level method has, say, 'throws IOException' in its signature, and all callers either pass it on to their caller or catch and do something dumb with it like logging, then you may as well do that within the low-level method, or rethrow it as a RuntimeException. This can remove a surprising amount of bullshit code.

3. Look for non-final static variables, a reasonable indicator of mutable singletons. Try to stamp them out. While this might actually increase the amount of code you have, it will do wonders for the maintainability. I did this today and found a potential bug (which I've yet to prove). The worst case is that you find out why you can't stamp a particular one out - you'll have learned something useful about the application's architecture and flaws along the way.

4. Use IDEA's "field could be local variable" inspection to simplify classes. As a general rule, a class with fewer fields is simpler.

5. Private getters and setters that do nothing beyond this.foo = foo; and return foo; are useless; inline them.

6. Search for repeated code; the IDEA Ultimate Edition can do this, but the Community Edition cannot. Unfortunately, it can only find them, not suggest a solution, and in my experience it doesn't get the ordering right; we had 15 almost-identical classes that it didn't even notice. It's still very useful though.

Even if you can't think of a decent abstraction, get rid of the repetition anyway. If needed, a better name or better way of doing it will be easy to use later. It's far easier to refactor 1 copy than 10 copies of the same code.

After a discussion at work, we decided that duplicated code has a cost proportional to its age multiplied by its size multiplied by the number of repetitions. That seems a reasonable rule; the cost is paid when using it or removing it.

7. Try to replace setters with constructor parameters, until it gets unwieldy or simply impossible as the objects get mutated after initialisation. When it gets unwieldy because the parameter list gets too long, consider splitting the class up.

8. Immutables remove the need for defensive copying. Lots of awful code is dedicated to copying values so that they can be passed to third parties. If you can make those values immutable this gets a lot easier.

9. Immutables remove the need for synchronization. Lock-free programming is far simpler if you can get away with it. Most programmers, myself included, get locks wrong, and the problems grow exponentially. That said, I don't have a general answer for what to do instead; I'm still at the stage where I need to work it out per-case.

10. Delete commented-out code. It's revision-controlled; if anyone actually ever remembers about it they can easily restore it. It's usually traces of previous experiments, and isn't of any real use. Similarly, delete the IDE-generated crap, like "Created on Mar 13 2005". You might want to keep the @author info, unless it's obviously useless, like @author Administrator, but even that can be of limited value years later when the author has left the company.

11. This isn't really about code, but: commit little and often. It's easy to do 3 refactors in one go and screw the third up, then have to revert and repeat the first 2. It's easier if you committed the first two. Similarly, if you find that a commit caused a regression, it's easier to work out what change caused it if each commit is small. That's not to say you should commit known broken code, at least not to anything other than an experimental branch.

In short: commits are game save points, not presentations.

Monday, November 30, 2009

My co-worker wouldn't get it

I'm continually unimpressed by a certain form of argument that I've seen in many places, but especially in the continuing discussions about closures in Java.

It goes like this: Feature X is great, and will make lots of code simpler, but there will be a lot of worse code because of it. Sometimes co-workers are mentioned, sometimes contractors who have to maintain the code without having months to learn its innards.

I think this boils down to two issues:

1. *I* would write bad code with Feature X.

2. I'm worried that I'll have to deal with bad code that uses Feature X.

Any healthy programming environment will not freeze code that is known to be bad in some way. It can be fixed, and lessons can be learned. If you seriously worry about your co-workers, watch their commits. If you worry about yourself, write some non-critical code that uses the feature a lot; perhaps a prototype, some unit tests or something unrelated to work.

With some exceptions, most programmers will write bad code before they learn how to write good code. Let them do so, and help them to get past it.

My co-workers would get it.

Tuesday, October 27, 2009

100 Bugs (ok, tickets) in 100 Days

I've now closed 100 tickets in the last 100 calendar days at work, which might not mean a lot, but it's been a personal target of mine to get to this point, so I'll celebrate it by, um, blogging!

Saturday, September 26, 2009

Which language shall we learn?

Hi Jose,

We decided, while drinking an overly-priced red wine the other night, that I'd help you to learn how to program, but without making it sound complicated. So, I thought I'd show you a few different language syntaxes and let you choose.

You can do anything you want with any of these languages, so feel free to pick the prettiest or whichever makes you laugh, or use whatever other criteria you feel like.

I chose the top-10 current mainstream languages, with the exception of PHP, which is too focused on one kind of task, and added Scala, Haskell and Erlang, because I like them.


1.
import java.util.Scanner;

public class Main {
  public static void main(String[] args) {
    Scanner scanner = new Scanner(System.in);
    System.out.println("Please enter your name.");  
    String line = scanner.nextLine();
    System.out.println("Nice to meet you, " + line + ".");
  }
}

2.
#include <stdio.h>

int main(int argc, char **argv) {
  char name[256];

  printf("Please enter your name.\n");
  scanf("%s", name);
  printf("Nice to meet you, %s.\n", name);
  return 0;
}

3.
#include <iostream>
using namespace std;

int main() {
  string name;
  cout << "Please enter your name.\n");
  getline(cin, name);
  cout << "Nice to meet you, " << name << ".\n";
  return 0;
}

4.
Module example
  Public Sub Main()
    Dim line As String
    Console.WriteLine("Please enter your name.")
    line = Console.ReadLine()
    Console.WriteLine("Nice to meet you, " + line + ".")
  End Sub
End Module

5.
#!/usr/bin/perl -w
use strict;
print "Please enter your name.\n";
my $name = <>;
chomp $name;
print "Nice to meet you, $name.\n";

6.
using System;
class Hello {
  static void Main() {
    Console.WriteLine("Please enter your name.\n");
    var name = Console.ReadLine();
    Console.WriteLine("Nice to meet you, " + name + ".\n");
  }
}

7.
name = raw_input("Please enter your name: ")
print "Nice to meet you, ", name, "."

8.
importPackage(java.util);
scanner = new Scanner(System['in']);
System.out.println("Please enter your name.");
name = scanner.nextLine();
System.out.println("Nice to meet you, " + line + ".");

9.
puts "Please enter your name."
name = gets.chomp
puts "Nice to meet you, " + name

10.
main = do
  putStrLn "Please enter your name."
  name <- getLine
  putStrLn ("Nice to meet you, " ++ name ++ ".")

11.
val name = Console.readLine("Please enter your name.")
println("Nice to meet you, " + name + ".")

12.
Name = get_line("Please enter your name.").
put_chars("Nice to meet you, " ++ Name).

Making Methods Testable

I'm going through my old drafts and deleting or publishing them. This one I no longer agree with, I'd rather rewrite the methods to be side-effect free. And the 'magic' I referred to is probably mocks.

"OOP to me means only messaging, local retention and protection and hiding of state-process, and extreme late-binding of all things." -- Alan Kay.

Messaging seems to imply behaviour. From wikipedia: "in object-oriented programming languages such as Smalltalk or Java, a message is sent to an object, specifying a request for action."

It seems fairly trivial that, in, say, Java, Math.cos(double) is not object-oriented. It doesn't take any objects, it doesn't return any objects. Even if double was an object type, it wouldn't be object-oriented. One reason is that it isn't late-bound; the other is that it isn't a message that demands action, it's just a function call.

It's certainly possible to imagine an action that has no effects other than returning a value; I don't dispute that; but it does seem overblown to say "sending the message 'cos' to the Math object", instead of saying that Math has a function called cos, and we're calling that. In fact, function is a much better fit for this, because Math.cos is as close as we can get to a mathematical function. Whenever I say function from now on I'll mean a mathematical function.

Java actually has two implementations of all the maths functions, e.g., Math.cos and StrictMath.cos. We could make it so that which one we used was dynamically configurable (again, I'll cover late binding later), but that doesn't make it any more of an action. It's still a function. I assert that most of what we consider an object's behaviour can be validly and usefully thought of as functions over that object's data. In a simple way, you can see this is true; if an object has state X, and you call method x() on it, that changes the object to be in state Y, and you later return the object to state X and call the method again, it will do the same thing. The big difference between methods and functions seems to be that you can't see what side effects a method causes, without knowing the code.

Methods that have real effects on the world, or change objects, are really actions. Testing these is harder than testing functions. They're also harder to reason about and actually unnecessary.

I can think of two ways of testing them, the standard one of which is to simulate the environment and test for evidence of the action. This has the problem that any unwanted action isn't easily detected, except in the most carefully controlled environments. For example, you might set up a chroot environment for automatically testing a program that manipulates files, and not notice that it erases /etc/passwd (because you never use the chrooted filesystem again), or you might test what a method sends to System.out but neglect to also check System.err.

Another way of testing actions would be to first look at the code differently. Code doesn't do anything. It describes things. A program that writes data to a file doesn't really; it requests that data be written to a file. Intercept all interactions with the outside world. If someone wants to write to System.err, they do it through your testing code. The implementation is a little difficult to conceive; perhaps in Java it has to be done through aspects, or DI, if it is to be reasonably brief.

Once you have all the effects that an action produces, you can see if they match your expectations. There is a complication though; you could end up simulating too much of your environment. If a method asks how big a file is, you need to provide an answer. Ideally you would give a different value for each test, e.g., some exception saying the file is inaccessible, lengths 0, 1, and any other lengths that seem relevant to the method.

What we've actually done is to cocoon the method; we've now made it into a function! The perfect test suite for a method makes the method into a function. Well, that's all well and good for I/O, etc., but how about when we're testing a mutable object - it might have changed more than is usual to test for. E.g.:

public void broken()
{
    setX(5);  // we're happy with this line.
    setY(10); // but this one seems wrong.  We want the test to fail while this line is present.
}

.Let's magically jump in the way and grab a list of changes to the object that this method causes. We know x has been changed and y has been changed, so the test fails. In fact, for this case we never need to execute the actions, because we can see that they fail without executing them. For more interesting cases, we can again simulate the effect. Besides my using this technique to demonstrate a point, it could be used to test real running objects without changing them.

So again, the ideal test suite for a method makes it look like a function. If we adapt our code to explicitly state what effects it should cause, instead of going off and causing them, we then actually have functions in our code. Here's the above broken(), but fixed..

public List<Action> fixed()
{
    return asList(new AssignAction("x",5),new AssignAction("y",10));
}

The code is significantly uglier and doesn't use static typing, but it's now actually a function. Because I've adapted it to return stuff instead of do stuff, it is now far easier to test. Or perhaps all I've done is to make message passing even more explicit. Well, message passing involves actions, and to some extent we can invoke the above actions and observe their results without changing anything (recurse through the list constructing a chain of values, rather like SICP's discussion of chained environments).

It's now clearly far easier to reason about the method, because you can trivially observe all of its side-effects. You could even decide which ones to allow, externally to the code. In the usual case that you want to execute the effects immediately, you can still do that.