General musings on programming languages, and Java.

Friday, December 15, 2006

Preventing NullPointerExceptions, Maybe

Null has always bothered me. I can write code without causing NullPointerExceptions, fairly easily, but without the techniques documented here, some still slip through. Of course, my automated tests are entirely comprehensive (joke), so there's no problem, right?

Wrong. Writing tests doesn't solve the problem that null exists in the first place. If we place a bollard in the middle of a street, and test all the cars to make sure that they can get around it without hitting the houses, that doesn't make the bollard acceptable.

One rule absolutely solves this. Assign a value to each field as soon as it's declared. A non-null value. To be picky, you'd have to also ban the new Object[x] form of array creation, and never give a local variable a null value. Let's not be picky.

The instinctive reaction to this is to say that you don't always have a value to put in the field, and therefore that null is the best value.

Partly true. However, null is not the best value. The likely first thought is the NullObject pattern. For example, if we have a java.sql.Connection field, we might set it up with java.lang.reflect.Proxy, so that we can call methods on the Connection, though they do nothing. This only hides the problem, in obscure runtime behaviour. Usually, we'd rather see clear runtime behaviour (a NullPointerException) than obscure runtime behaviour ("I thought I'd saved to the DB, but it was the NullConnection"). NullObject isn't going to help.
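To make the objection concrete, here is a minimal sketch of a do-nothing Connection built with java.lang.reflect.Proxy. The names NullConnectionDemo and nullConnection are made up for illustration, and the handler only supplies defaults for the primitive return types it explicitly checks for (boolean and int).

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;

class NullConnectionDemo {
    // Hypothetical helper, not from the post: a Connection whose methods all do nothing.
    static Connection nullConnection() {
        InvocationHandler doNothing = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) {
                Class<?> returnType = method.getReturnType();
                // A proxy may not return null for a primitive type, so pick neutral defaults.
                if (returnType == boolean.class) {
                    return Boolean.FALSE;
                }
                if (returnType == int.class) {
                    return Integer.valueOf(0);
                }
                return null; // void methods and methods that return objects
            }
        };
        return (Connection) Proxy.newProxyInstance(
                NullConnectionDemo.class.getClassLoader(),
                new Class<?>[] { Connection.class },
                doNothing);
    }

    public static void main(String[] args) throws Exception {
        Connection connection = nullConnection();
        connection.setAutoCommit(false); // accepted and silently ignored
        connection.commit();             // "succeeds", but nothing was saved
        // The null has only moved one call further away:
        System.out.println(connection.prepareStatement("select 1") == null);
    }
}
```

Nothing in this sketch fails at the point of the mistake, which is exactly the obscure runtime behaviour described above.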

It's better to have a real distinction between a useful value and a useless value - one that forces you to 'check', or even checks for you. There are a couple of ways of doing this. The @NotNull and @Nullable annotations introduced by IntelliJ IDEA are one - though I haven't used those myself. Another way is possible, using only the Java language. Though it has to be said, the Java 7 language will make this more comfortable.

And Now To The Meat

The following concept was shamelessly stolen from Haskell.

Given a field that may have a Connection, or null, I'll change it to 'maybe a Connection', or Maybe<Connection>. There are two implementations of Maybe - one of them does have a Connection (well, T), and one of them has Nothing.

Then, rather than testing it to see whether it really has a Connection, I tell it what I want it to do if it has a Connection, and what I want it to do if it doesn't have a Connection. Oh, and for maximum flexibility, return me the result.

Let's go with a less flexible version for a moment, as an explanation.

interface Maybe<T> {
    void apply(SideEffect<T> runThisIfTheresAnObject, Runnable runThisIfThereIsnt);
}

interface SideEffect<T> {
    void run(T input);
}

So, when I call maybeConnection.apply(saveStuffToDB, initialiseConnectionAndSaveStuff), if maybeConnection is 'just' a Connection, it will call saveStuffToDB with that Connection, and if it is Nothing, it will call initialiseConnectionAndSaveStuff.

However, there are two problems with this approach. One is that I tend towards functional programming, and this stuff relies on side effects, so it irritates me. The other is that programming side effects with anonymous classes can really be a pain in Java, thanks to the 'final' requirement on enclosing local variables.
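As a rough illustration of the second problem, here is one way the side-effect version might be implemented and used. The class names Just and Nothing, and the SideEffectDemo.describe method, are assumptions made for this sketch. Because an anonymous class cannot assign to an enclosing local variable, the result is smuggled out through a final one-element array.

```java
interface SideEffect<T> {
    void run(T input);
}

interface Maybe<T> {
    void apply(SideEffect<T> runThisIfTheresAnObject, Runnable runThisIfThereIsnt);
}

// Hypothetical implementation names; the post only says that there are two.
final class Just<T> implements Maybe<T> {
    private final T value;

    Just(T value) {
        this.value = value;
    }

    public void apply(SideEffect<T> runThisIfTheresAnObject, Runnable runThisIfThereIsnt) {
        runThisIfTheresAnObject.run(value);
    }
}

final class Nothing<T> implements Maybe<T> {
    public void apply(SideEffect<T> runThisIfTheresAnObject, Runnable runThisIfThereIsnt) {
        runThisIfThereIsnt.run();
    }
}

class SideEffectDemo {
    static String describe(Maybe<String> maybeName) {
        // A plain local String could not be assigned from inside the anonymous
        // classes, so a final one-element array acts as a mutable holder.
        final String[] result = new String[1];
        maybeName.apply(
                new SideEffect<String>() {
                    public void run(String name) {
                        result[0] = "Hello, " + name;
                    }
                },
                new Runnable() {
                    public void run() {
                        result[0] = "Nobody here";
                    }
                });
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(describe(new Just<String>("Ricky")));
        System.out.println(describe(new Nothing<String>()));
    }
}
```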

So what I really want to do is to change apply so that it returns something. I could make it return Object, but then I'm just reintroducing the old ClassCastException possibility. I could make Maybe take two type parameters, T and R, R being the return type of apply. However, that would mean that each Maybe would only be able to run 'functions' that return one type - impractical.

Generics allows you to declare type parameters on methods, not just whole classes/interfaces, so let's do that. If you don't like the look of this, skip to the bottom and eye up the alternative implementation (visitor).

interface Maybe<T> {
    <R> R apply(Function<T, R> ifT, R ifNothing);
}

Let's just walk through that syntax. <R> just declares a type parameter. If you don't like that, simply ignore it. apply takes in a Function, which has one method, R run(T), and it takes an R. If there is a 'real' object, a T, the Function's run method will be invoked, and the R that it returns will be returned from apply. If there isn't a real object, then ifNothing is returned.

It's rather like encapsulating an if statement. By taking the responsibility for checking null away from the user of Maybe, we're taking the possible bug away too. Note that we're only taking it as far as Maybe - of course, if the two implementations of Maybe are broken, then the bug will be everywhere.
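The two implementations mentioned above might look something like the following. This is a sketch under assumptions, not the Functional Peas source: MaybeUtility.nothing() appears in the post's usage example, but MaybeUtility.just, the null check inside it, and the MaybeDemo class are invented here for illustration.

```java
interface Function<T, R> {
    R run(T input);
}

interface Maybe<T> {
    <R> R apply(Function<T, R> ifT, R ifNothing);
}

final class MaybeUtility {
    private MaybeUtility() {
    }

    // Hypothetical factory: wraps a real value. Rejecting null here is an
    // assumption of this sketch, so that null cannot sneak back in.
    static <T> Maybe<T> just(final T value) {
        if (value == null) {
            throw new IllegalArgumentException("just(null) would smuggle null back in");
        }
        return new Maybe<T>() {
            public <R> R apply(Function<T, R> ifT, R ifNothing) {
                return ifT.run(value); // there is a T, so run the function
            }
        };
    }

    static <T> Maybe<T> nothing() {
        return new Maybe<T>() {
            public <R> R apply(Function<T, R> ifT, R ifNothing) {
                return ifNothing; // no T, so hand back the fallback
            }
        };
    }
}

class MaybeDemo {
    static String greet(Maybe<String> maybeName) {
        return maybeName.apply(new Function<String, String>() {
            public String run(String name) {
                return "Hello, " + name;
            }
        }, "Nobody here");
    }

    public static void main(String[] args) {
        System.out.println(greet(MaybeUtility.just("Ricky")));
        System.out.println(greet(MaybeUtility.<String>nothing()));
    }
}
```

The only place a branch on "is there a value?" exists is inside the two anonymous classes, and neither of them needs an if statement to do it.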

And now for example usage:

Maybe<Connection> maybeConnection = MaybeUtility.nothing();
// ... some code, might set maybeConnection to something else, might not.
String outputToUser = maybeConnection.apply(new Function<Connection, String>() {
    public String run(Connection connection) {
        // some code that uses a PreparedStatement etc. and returns a String.
    }
}, "Er, some fool forgot to connect to the database.  Fire Fred");

What we're doing here is implementing dynamic dispatch. It's another way of implementing the visitor pattern. In fact, Maybe can be implemented easily via the standard idiom for the visitor pattern - the only reason I don't is that I like single-method interfaces. I find that they fit my thinking better. They also fit the closure proposal better, which is probably worth bearing in mind now.

Here's Maybe implemented with a more obvious visitor approach:

interface Maybe<T> {
    <R> R accept(MaybeVisitor<T, R> visitor);
}

interface MaybeVisitor<T, R> {
    R ifJust(T t); // in Haskell, the opposite of Nothing is Just, in terms of the Maybe type.
    R ifNothing();
}

Maybe and friends can all be found in Functional Peas, which is currently a placeholder for some useful bits and pieces of functional (or nearly-functional) code.
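For comparison, the visitor form of Maybe shown above could be filled in roughly as follows. Again this is only a sketch: Just, Nothing, VisitorDemo and lengthOrZero are names assumed for the example, not taken from Functional Peas.

```java
interface MaybeVisitor<T, R> {
    R ifJust(T t);

    R ifNothing();
}

interface Maybe<T> {
    <R> R accept(MaybeVisitor<T, R> visitor);
}

// Hypothetical implementation names.
final class Just<T> implements Maybe<T> {
    private final T value;

    Just(T value) {
        this.value = value;
    }

    public <R> R accept(MaybeVisitor<T, R> visitor) {
        return visitor.ifJust(value);
    }
}

final class Nothing<T> implements Maybe<T> {
    public <R> R accept(MaybeVisitor<T, R> visitor) {
        return visitor.ifNothing();
    }
}

class VisitorDemo {
    static int lengthOrZero(Maybe<String> maybeText) {
        return maybeText.accept(new MaybeVisitor<String, Integer>() {
            public Integer ifJust(String text) {
                return text.length();
            }

            public Integer ifNothing() {
                return 0;
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(lengthOrZero(new Just<String>("bollard")));
        System.out.println(lengthOrZero(new Nothing<String>()));
    }
}
```

One practical difference from apply: here the 'nothing' result is computed only when ifNothing() is actually called, whereas apply takes an already-computed R as its second argument.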

Yeah, but..

If you think that this is wasteful in terms of programmer time, I might agree with you - until we have good syntax for closures, using Maybe isn't syntactically that attractive. This can be dealt with to an extent - such as by reducing the need for null in the original code, or by choosing the visitor approach. I will blog about techniques for doing that, probably under the heading 'Reducing Mutability'. Another way is to prefer function composition over always writing 'closures', though personally I'm not very good at that yet.
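As a small sketch of what preferring function composition could mean here: a hypothetical Functions.compose helper builds a new Function out of two existing ones, so the anonymous-class boilerplate is written once per building block rather than once per call. Functions, ComposeDemo, TRIM, LENGTH and TRIMMED_LENGTH are all names made up for this example.

```java
interface Function<T, R> {
    R run(T input);
}

final class Functions {
    private Functions() {
    }

    // Builds a function that runs 'first', then feeds its result to 'second'.
    static <A, B, C> Function<A, C> compose(final Function<A, B> first,
                                            final Function<B, C> second) {
        return new Function<A, C>() {
            public C run(A input) {
                return second.run(first.run(input));
            }
        };
    }
}

class ComposeDemo {
    static final Function<String, String> TRIM = new Function<String, String>() {
        public String run(String text) {
            return text.trim();
        }
    };

    static final Function<String, Integer> LENGTH = new Function<String, Integer>() {
        public Integer run(String text) {
            return text.length();
        }
    };

    // Composed once, then reusable wherever a Function<String, Integer> is
    // wanted, for example as the first argument to Maybe.apply.
    static final Function<String, Integer> TRIMMED_LENGTH = Functions.compose(TRIM, LENGTH);

    public static void main(String[] args) {
        System.out.println(TRIMMED_LENGTH.run("  null is a bollard  "));
    }
}
```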

If you think this is useless, because people don't make mistakes if they test enough, I refer you back to the bollard analogy at the beginning.

If you think this is useless, because I am not on a large team, I'm young, I work in a University, and therefore don't know what I'm talking about, then please don't bother commenting, and have a nice life.

If you think that this is useful, but that your colleagues won't understand or agree, just discuss it with them. They might have a better idea.


dibblego said...

Hi Ricky,
Instead of "Reducing Mutability" for a title, I suggest "Controlled Side Effects - Monads for Java Users".

You'll note that the monadic operations cannot be generalised with Java's weak type system - you'll have to resort to casting.

Ricky Clarkson said...

I haven't come across a problem yet. Perhaps you need to look at self-referential bounds (like Enum<E extends Enum<E>>).

Feel free to give an example, anyway.

Anonymous said...

Here's a diatribe against null in the same spirit.

dibblego said...

Since everyone is starting to wake up to the fact that "null is a problem", I figured I'd put my two cents in - hope you don't mind Ricky: Maybe in Java

Ricky, in your response, I am not sure what you are saying - are you suggesting that it is possible to generalise monadic operations in Java? If so, can you please show me how bind looks? I claim it isn't possible and that it can be proven so by reductio ad absurdum, but if you have found something that I have overlooked, I am most interested.

dibblego said...

Oops, Maybe in Java and Revisiting Maybe in Java

About Me

A salsa dancing, DJing programmer from Manchester, England.