General musings on programming languages, and Java.

Monday, July 28, 2008

Optional Values in Java

If you take some Java code and write pseudocode representing it, you'll probably find that you don't bother with null checks and you don't bother with getters and setters. Sure, in pseudocode you're lazy, but it's more than that - null is usually wrong, so much so that intentional uses of null look like sloppy code.

In fact, if you're writing an API, you probably want to keep null out of your interactions with your users: you want them to realise their mistake if they pass you null, and you don't want to hand them null, lest they forget to check for it. But there are times when you genuinely need some way of representing an optional value.

One particularly popular approach is to use sentinel values - let's say "" for Strings, Double.NaN for doubles, -1 for ints. Now everywhere you read the value you need to check for the sentinel, or be sure that not checking for it won't cause you problems.
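Sentinels are also easy to get subtly wrong. Here is a small illustration; the class and method names are made up for the example. Double.NaN can't be detected with ==, because NaN compares unequal to everything, itself included, so Double.isNaN is needed. String.indexOf, a real JDK method, uses -1 as its "not found" sentinel.

```java
// Hypothetical helpers showing sentinel checks; not from the post.
class Sentinels {
    // Looks like a sentinel check, but NaN == NaN is false, so this always returns false.
    static boolean isMissingWrong(double value) {
        return value == Double.NaN;
    }

    // The check that actually works for a Double.NaN sentinel.
    static boolean isMissing(double value) {
        return Double.isNaN(value);
    }

    // String.indexOf returns -1 when ':' is absent. Without the check,
    // substring(i + 1) would become substring(0) and silently return the whole string.
    static String afterColon(String s) {
        int i = s.indexOf(':');
        if (i == -1) {
            return ""; // "" as our own sentinel for "no suffix"
        }
        return s.substring(i + 1);
    }

    public static void main(String[] args) {
        System.out.println(isMissingWrong(Double.NaN)); // false
        System.out.println(isMissing(Double.NaN));      // true
        System.out.println(afterColon("localhost:25")); // 25
    }
}
```

Note that afterColon("host:") also returns "", so a caller can't tell "no colon" from "empty suffix". That is a common weakness of sentinels drawn from the value's own range.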

Another approach is to use an empty list to represent no value, and a list of 1 element otherwise. Again you need to check whether the list is empty before getting the result out.

You could make a class that might hold a value, with methods called hasValue() and getValue(). Again, this requires a check.

In all these you need to remember to check before you get the value - not much of an improvement over using null directly.
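The list and holder approaches can be sketched side by side. Holder, CheckFirst and the method names are hypothetical. In both cases the code still compiles if the isEmpty() or hasValue() check is left out, and it only fails at run time.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical holder with the hasValue()/getValue() methods described above.
final class Holder<T> {
    private final T value;
    private final boolean present;

    private Holder(T value, boolean present) {
        this.value = value;
        this.present = present;
    }

    static <T> Holder<T> of(T value) { return new Holder<T>(value, true); }

    static <T> Holder<T> empty() { return new Holder<T>(null, false); }

    boolean hasValue() { return present; }

    T getValue() {
        if (!present) throw new IllegalStateException("getValue() called on an empty Holder");
        return value;
    }
}

class CheckFirst {
    // Empty list means "no value"; a one-element list holds it.
    // Dropping the isEmpty() check compiles fine, but get(0) then throws at run time.
    static int lengthFromList(List<String> maybe) {
        if (maybe.isEmpty()) return 0;
        return maybe.get(0).length();
    }

    // The same check, spelled hasValue(); getValue() throws if it is forgotten.
    static int lengthFromHolder(Holder<String> maybe) {
        if (!maybe.hasValue()) return 0;
        return maybe.getValue().length();
    }

    public static void main(String[] args) {
        System.out.println(lengthFromList(Collections.<String>emptyList()));  // 0
        System.out.println(lengthFromList(Collections.singletonList("abc"))); // 3
        System.out.println(lengthFromHolder(Holder.of("hello")));             // 5
    }
}
```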

If I categorise some code including null checks (no, not nunchucks), then we'll have something to toy with:

1. foreach

if (x != null) {
 doStuffWith(x);
}
2. map
String s;
if (x == null) {
 s = null;
}
else {
 s = x.toString();
}
3. fold
int length;
if (s == null) {
 length = 0;
}
else {
 length = s.length();
}
Those were some strange names I gave to these categories! Let's tackle foreach first: think of a value that might be null as a collection containing 0 or 1 elements. foreach is then a loop that runs 0 or 1 times to do something with the value.
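The "collection of 0 or 1 elements" view can be tried out with the JDK's own Collections.emptyList and Collections.singletonList. Here is a sketch; wrap is a made-up helper name.

```java
import java.util.Collections;
import java.util.List;

class ZeroOrOne {
    // Hypothetical helper: view a possibly-null reference as a list of 0 or 1 elements.
    static <T> List<T> wrap(T x) {
        if (x == null) return Collections.emptyList();
        return Collections.singletonList(x);
    }

    public static void main(String[] args) {
        String present = "hello";
        String absent = null;
        // Runs once: prints "doing stuff with hello".
        for (String s : wrap(present)) {
            System.out.println("doing stuff with " + s);
        }
        // Runs zero times, with no explicit null check at the call site.
        for (String s : wrap(absent)) {
            System.out.println("never printed");
        }
    }
}
```

The null check hasn't gone away; it has moved into wrap, where it is written once.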

map is a mapping from a domain containing null to a co-domain containing null. For example, a mapping from rectangular coordinates to polar coordinates should probably yield null for a null input, if it doesn't throw an exception.
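Here is the coordinate example as a null-preserving map, sketched with hypothetical Point and Polar classes. A null input gives a null output, and anything else is converted with Math.hypot and Math.atan2.

```java
// Hypothetical coordinate types, for illustration only.
final class Point {
    final double x, y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

final class Polar {
    final double r, theta;
    Polar(double r, double theta) { this.r = r; this.theta = theta; }
}

class Coordinates {
    // A null-preserving map: null in, null out; otherwise convert.
    static Polar toPolar(Point p) {
        if (p == null) return null;
        // Math.atan2 takes (y, x) in that order.
        return new Polar(Math.hypot(p.x, p.y), Math.atan2(p.y, p.x));
    }
}
```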

fold is a more manageable name for a 'catamorphism', which is a transformation that tends to yield a simpler value than the collection it's applied to (which seems the opposite of a fold in origami). In the case of a possibly-null value, the result is simpler because it is (usually) a non-null value.
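To make "catamorphism" a little more concrete: a fold over an ordinary list starts from a seed and combines each element into an accumulator. Applied to a list of 0 or 1 elements, the seed is exactly the default for the missing case. This is a sketch; Combine, foldLeft and totalLength are names invented for the example.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical combining-function interface for this sketch.
interface Combine<A, R> { R combine(R acc, A element); }

class Folds {
    // A left fold: start from the seed and combine each element into the accumulator.
    static <A, R> R foldLeft(List<A> list, R seed, Combine<A, R> f) {
        R acc = seed;
        for (A element : list) {
            acc = f.combine(acc, element);
        }
        return acc;
    }

    // Total length of the strings in the list. For a list of 0 or 1 elements the
    // result is either the seed (0) or 0 plus the single string's length.
    static int totalLength(List<String> strings) {
        return foldLeft(strings, 0, new Combine<String, Integer>() {
            public Integer combine(Integer acc, String s) {
                return acc + s.length();
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Collections.<String>emptyList()));    // 0
        System.out.println(totalLength(Collections.singletonList("hello"))); // 5
    }
}
```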

Being responsible non-repetitive Java programmers, we'd like to encapsulate our possibly-null value plus the checks into an object with three methods, foreach, map and fold, rather than repeating them everywhere:

interface Optional<T> {
 void foreach(Task<T> task);
 <R> R map(Conversion<T,R> conversion);
 <R> R fold(R theDefault, Conversion<T,R> conversion);
}
(As an implementation detail, you might prefer to make Optional implement Iterable, so that Java's own for-each loop works on it, rather than providing a foreach method.)
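Here is a sketch of the Iterable route, assuming a class of our own called IterableOptional with hypothetical some and none factories. Java's for-each loop then does the job of foreach.

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical Optional that implements Iterable, so for-each plays the role of foreach.
final class IterableOptional<T> implements Iterable<T> {
    // Absence is represented internally by null, but callers never see it.
    private final T value;

    private IterableOptional(T value) { this.value = value; }

    static <T> IterableOptional<T> some(T value) {
        if (value == null) throw new NullPointerException("some(null)");
        return new IterableOptional<T>(value);
    }

    static <T> IterableOptional<T> none() { return new IterableOptional<T>(null); }

    // Iterates over zero elements when empty, one element otherwise.
    public Iterator<T> iterator() {
        if (value == null) return Collections.<T>emptyList().iterator();
        return Collections.singletonList(value).iterator();
    }
}
```

With this, a loop over IterableOptional.some("hi") runs its body once, and a loop over IterableOptional.none() runs it zero times.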

In the same way that java.util.Collections.sort can take a Comparator, each of these methods takes in an object that has a method that gets called if and when it needs to be.

interface Task<T> { void execute(T value); }
interface Conversion<T,R> { R convert(T value); }
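Here is a minimal sketch of these interfaces with two concrete implementations, one for "has a value" and one for "no value". The class names Some and None are my own choice for this sketch. There is one deliberate departure from the interface above: map returns Optional<R> rather than a bare R, as Functional Java's Option.map does. That way the empty case never has to fall back on null, and the map call would read Optional<String> s = x.map(toString).

```java
// Task and Conversion as declared in the post. Some and None are hypothetical
// class names for this sketch; Some never holds null.
interface Task<T> { void execute(T value); }
interface Conversion<T, R> { R convert(T value); }

interface Optional<T> {
    void foreach(Task<T> task);
    // Departure from the post: map returns Optional<R> rather than a bare R.
    <R> Optional<R> map(Conversion<T, R> conversion);
    <R> R fold(R theDefault, Conversion<T, R> conversion);
}

// The "has a value" case: every operation uses the value.
final class Some<T> implements Optional<T> {
    private final T value;

    Some(T value) {
        // Fail fast rather than smuggling null into a Some (this also applies
        // when a Conversion passed to map returns null).
        if (value == null) throw new NullPointerException("Some requires a non-null value");
        this.value = value;
    }

    public void foreach(Task<T> task) { task.execute(value); }

    public <R> Optional<R> map(Conversion<T, R> conversion) {
        return new Some<R>(conversion.convert(value));
    }

    public <R> R fold(R theDefault, Conversion<T, R> conversion) {
        return conversion.convert(value);
    }
}

// The "no value" case: foreach does nothing, map stays empty, fold yields the default.
final class None<T> implements Optional<T> {
    public void foreach(Task<T> task) { }

    public <R> Optional<R> map(Conversion<T, R> conversion) {
        return new None<R>();
    }

    public <R> R fold(R theDefault, Conversion<T, R> conversion) {
        return theDefault;
    }
}
```

Because Some and None each implement all three methods, the check happens once, when the right class is chosen, instead of at every call site.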
Let's look at how we can convert the earlier null-using code to code using Optional.

1. foreach

x.foreach(doStuff);
2. map
String s = x.map(toString);
3. fold
int length = x.fold(0, lengthOf);
Of course, the likelihood is that you're not lucky enough to already have doStuff stored as a Task, and toString and lengthOf stored as Conversions, so you'd probably use anonymous classes to provide them. (The Conversion can't simply be called length here, because the local variable being declared would shadow it.) Unfortunately, the syntax for anonymous classes bloats the code too much to be readable in a blog (or an IDE).
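For comparison, here is roughly what the three calls look like with anonymous classes, in a self-contained sketch. To keep it short, the optional here is a single hypothetical class, OptionalValue, that hides a nullable field behind foreach, map and fold. As an assumption of the sketch, its map returns another OptionalValue rather than a bare value, so absence stays explicit.

```java
// Task and Conversion as in the post; OptionalValue is a hypothetical single-class optional.
interface Task<T> { void execute(T value); }
interface Conversion<T, R> { R convert(T value); }

final class OptionalValue<T> {
    private final T value; // null means "no value"; callers never see it

    private OptionalValue(T value) { this.value = value; }

    static <T> OptionalValue<T> of(T value) { return new OptionalValue<T>(value); }

    static <T> OptionalValue<T> empty() { return new OptionalValue<T>(null); }

    void foreach(Task<T> task) {
        if (value != null) task.execute(value);
    }

    // Returns another optional rather than a bare R.
    <R> OptionalValue<R> map(Conversion<T, R> conversion) {
        if (value == null) return OptionalValue.<R>empty();
        return of(conversion.convert(value));
    }

    <R> R fold(R theDefault, Conversion<T, R> conversion) {
        return value == null ? theDefault : conversion.convert(value);
    }
}

class AnonymousClassDemo {
    public static void main(String[] args) {
        OptionalValue<Object> x = OptionalValue.<Object>of(42);

        // 1. foreach
        x.foreach(new Task<Object>() {
            public void execute(Object value) {
                System.out.println("doing stuff with " + value);
            }
        });

        // 2. map
        OptionalValue<String> s = x.map(new Conversion<Object, String>() {
            public String convert(Object value) {
                return value.toString();
            }
        });

        // 3. fold
        int length = s.fold(0, new Conversion<String, Integer>() {
            public Integer convert(String value) {
                return value.length();
            }
        });

        System.out.println(length); // prints 2
    }
}
```

Most of the demo is anonymous-class scaffolding wrapped around three one-line operations. Hiding the null inside a single class is an alternative to having separate classes for the present and absent cases. Callers still never touch it directly.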

It would be useful to have good syntax for using foreach, map and fold in Java, so that there was at last an attractive alternative to null. For now we'll have to settle for attractive semantics rather than attractive syntax though.

I think this is beautiful because it provides a level of abstraction that gets you further from a potential source of bugs, makes your code more expressive about what it accepts, and lets you do in objects what otherwise would be repetitive.

A complete implementation of Optional is available in Functional Java under the name Option. There, Task is called E, and Conversion is called F. Option is most widely known as Maybe, from Haskell.

May your nulls rest in peace.

5 comments:

Reinier Zwitserloot said...

Not quite as flexible and powerful, but it has the advantage of actually being perfectly doable in current java:

For query methods, provide a second version of the method that also accepts a default, to be used when the query method would normally return null/sentinel.

System.getProperty() actually has this, and it makes coding for reading out properties 1-liners, instead of 2/3-liners:

String mailHost = System.getProperty("myprog.mailhost", "localhost"); //legal, and looks good.

versus:

String mailHost = System.getProperty("myprog.mailhost") != null ? System.getProperty("myprog.mailhost") : "localhost"; //urrrgh! Repeating a string constant?

String mailHost = System.getProperty("myprog.mailhost");
if ( mailHost == null ) mailHost = "localhost"; //forget this and you get something that compiles but is broken and your program will throw an NPE waaay later, when you first try to mail something, making it somewhat hard to chase this problem down.


It's a shame System.getProperty() is one of the few places that offers it. java.util.Map doesn't, for example.
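The pattern above, a query method with a second version that takes a default, can be retrofitted onto java.util.Map with a small static helper. This is a sketch, and getOrDefault here is a hypothetical helper of our own. Since Java 8, Map itself has a getOrDefault method, but that one returns the default only when the key is absent, not when it is mapped to null.

```java
import java.util.HashMap;
import java.util.Map;

class Defaults {
    // Hypothetical helper: the "second version of the query method that takes a default".
    // Treats both "absent" and "mapped to null" as missing.
    static <K, V> V getOrDefault(Map<K, V> map, K key, V theDefault) {
        V value = map.get(key);
        return value != null ? value : theDefault;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<String, String>();
        config.put("myprog.port", "2525");
        System.out.println(getOrDefault(config, "myprog.mailhost", "localhost")); // localhost
        System.out.println(getOrDefault(config, "myprog.port", "25"));            // 2525
    }
}
```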

Also, I think you gloss over the value of sentinels a bit. You should definitely use sentinels where the sentinel value is itself likely to just work in client code without special checks. For example, never return 'null' to indicate there are no elements from a multi-element search function that returns a list of results. Just return the empty list; 9 times out of 10, the client code can just iterate over the result and the right thing happens without an explicit .isEmpty() check.

Even in situations where that isn't the case, a sentinel can be better than null, because null is so unwieldy. You can call a method on a sentinel. You can store sentinels in, e.g., maps as the key or the value. (You can store null too, but this complicates matters: you can only have one null key, and as a value you can no longer use get() normally, because you can't tell the difference between 'not in map' and 'mapped to null'.) That's another advantage of sentinels: unlike null, sentinels carry their own context, so it's harder to confuse two different sentinels. This isn't news to you, Ricky, but perhaps it is for your other readers.
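The get() ambiguity is easy to demonstrate with a HashMap, which permits null values. This sketch uses describe as a hypothetical helper. get() alone returns null both for a missing key and for a key mapped to null, and containsKey() is the extra check that tells them apart.

```java
import java.util.HashMap;
import java.util.Map;

class NullInMaps {
    // get() alone conflates "not in map" with "mapped to null"; containsKey() separates them.
    static String describe(Map<String, String> map, String key) {
        String value = map.get(key);
        if (value != null) {
            return "value " + value;
        }
        return map.containsKey(key) ? "mapped to null" : "not in map";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<String, String>();
        map.put("host", "localhost");
        map.put("proxy", null);
        System.out.println(describe(map, "host"));  // value localhost
        System.out.println(describe(map, "proxy")); // mapped to null
        System.out.println(describe(map, "port"));  // not in map
    }
}
```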

Unknown said...

Is it just me, or haven't you covered Maybe-in-Java a couple of times already?

Does it not always turn out to be a bit of a clunk bolted on? (Compared to Haskell, or perhaps Scala)

I try to avoid the whole shebang by never returning null (tell-don't-ask helps here) and doing either fail-fast or using a built-in default whenever I get a null value from a 3rd party.

Ricky Clarkson said...

Reinier, your getOrElse solution does work, but I think it encourages get, which is the problem. For an Optional, it is not unusual or an error case to have no value - the no value case shouldn't be treated as a second class citizen. I use Scala a lot, and Scala's Option does have getOrElse, but I never use that. Yes, part of the reason might be that the other ways are not ugly in Scala, but I've never used or wanted a getOrElse equivalent in Java either.

"You should definitely use sentinels where the sentinel value is itself likely to just work in client code without special checks." - Agreed, but then it's no longer a sentinel. Your empty list case is fine, but you should not use an empty list to report that the database could not be connected to, etc.

Ricky Clarkson said...

Christian,

Yes, I've covered Maybe before, but I knew less then. I was hoping to make it clearer and contain some information I didn't show before. Also, I may or may not point my manager at this new post as an explanation of it if I find the need to.

I think it's only as much a bolt-on as anything that benefits from higher order functions in Java, such as lists, event handling, I/O, database access and graphics.

Ricky Clarkson said...

Christian,

"I try to avoid the whole shebang by never returning null (tell-don't-ask helps here) and doing either fail-fast or using a built-in default whenever I get a null value from a 3rd party."

Using a default whenever you get a null from a 3rd party is the kind of thing I'd like to prevent - you don't know whether the null came in by accident or not.

About Me

A salsa dancing, DJing programmer from Manchester, England.